GraymanRe - RokRat Loader MalOps Challenge

Introduction

Today we analyze RokRat while solving the RokRat Loader malware challenge by MalOps. RokRat is a tool used by APT37, a state-sponsored hacking group linked to North Korea. It is a multifaceted tool, and one of its features is a shellcode loader. This write-up focuses on that loading mechanism: we try to understand how it decrypts, loads and executes the embedded shellcode blob in memory.

Q1) What is the MD5 hash of the binary?

We can retrieve the MD5 hash of any file on a Windows system using the built-in PowerShell cmdlet Get-FileHash. Running the full command, Get-FileHash -Algorithm MD5 .\sample, returns the following MD5: CF28EF5CEDA2AA7D7C149864723E5890.
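For readers who are not on Windows, the same digest can be computed with a few lines of Python. This is a minimal sketch using only the standard library; the function name md5_of_file and the chunk size are our own choices for illustration.

```python
import hashlib


def md5_of_file(path, chunk_size=65536):
    """Return the upper-case hex MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read in chunks so large samples do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest().upper()
```

Calling md5_of_file on the sample should return the same value that Get-FileHash reports, since both compute a plain MD5 over the file contents.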

Q2) What is the entry point address of the binary in hex?

The entry point address of a binary is calculated by adding the value of the AddressOfEntryPoint field to the value of the ImageBase field, both of which are part of the Optional Header of a PE/COFF file. AddressOfEntryPoint is located 16 bytes from the start of the Optional Header and has a size of 4 bytes. The location and size of ImageBase depend on whether the file is PE32 or PE32+: in a PE32 file it is located at offset 28 and has a size of 4 bytes, whilst in a PE32+ file it is located at offset 24 and has a size of 8 bytes. We can use tools such as PEstudio to retrieve the values for these fields. Alternatively, we could load the binary into IDA. IDA will, by default, place us at the entry point of the binary and display the corresponding address. Regardless of the chosen method, the entry point is located at 0x401000.
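The calculation can also be done by hand. The sketch below parses the headers with Python's struct module and adds the two fields together. The function name entry_point_va is hypothetical, and the code assumes a well-formed PE file; it reads ImageBase at offset 28 (4 bytes) for PE32 and at offset 24 (8 bytes) for PE32+, which it tells apart using the Magic field at the start of the Optional Header.

```python
import struct

PE32_MAGIC = 0x10B       # Optional Header magic for PE32
PE32_PLUS_MAGIC = 0x20B  # Optional Header magic for PE32+


def entry_point_va(data):
    """Return ImageBase + AddressOfEntryPoint for a PE image held in `data`."""
    # e_lfanew (offset 0x3C in the DOS header) points at the PE signature.
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[:2] != b"MZ" or data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("not a PE file")
    # Skip the 4-byte signature and the 20-byte File Header.
    opt = e_lfanew + 4 + 20
    (magic,) = struct.unpack_from("<H", data, opt)
    (entry_rva,) = struct.unpack_from("<I", data, opt + 16)
    if magic == PE32_MAGIC:
        (image_base,) = struct.unpack_from("<I", data, opt + 28)
    elif magic == PE32_PLUS_MAGIC:
        (image_base,) = struct.unpack_from("<Q", data, opt + 24)
    else:
        raise ValueError("unknown Optional Header magic")
    return image_base + entry_rva
```

Passing the raw bytes of the sample, for example open("sample", "rb").read(), should give the same address that IDA displays.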

Q3) What XOR key is used to decrypt the embedded shellcode in hex?

XOR, exclusive OR, is a logical operation that is available as the xor assembly instruction. Therefore, with the sample loaded into IDA, we can search for all XOR instructions within the binary. Since the binary for this challenge is relatively small, the search should not return an overwhelming number of results. In larger samples, more refined techniques might be required instead of the simple IDAPython code below:


import idaapi
import ida_allins
import ida_bytes
import ida_ida

# Scan the whole address range of the loaded database.
start_ea = ida_ida.inf_get_min_ea()
end_ea   = ida_ida.inf_get_max_ea()

insn = idaapi.insn_t()
ea = start_ea
while ea != idaapi.BADADDR and ea < end_ea:
    if ida_bytes.is_code(ida_bytes.get_flags(ea)):
        # Decode the instruction and report it if it is an XOR.
        length = idaapi.decode_insn(insn, ea)
        if length > 0 and insn.itype == ida_allins.NN_xor:
            print("XOR at 0x%X" % ea)
        # Step past the instruction, or one byte if decoding failed.
        ea += length or 1
    else:
        # Not code: jump to the next defined item.
        ea = idaapi.next_head(ea, end_ea)

A total of 14 XOR instructions were found. For most of them, the source and destination operands are the same register, which simply zeroes that register. We can therefore filter out all XOR instructions where the source and destination operands are the same register. This leaves us with a total of 2 results (0x4010EA and 0x401149).
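The filtering step can be sketched outside of IDA as well. The snippet below assumes the disassembly has been exported as plain text lines such as "xor eax, eax"; that export format and the helper names is_zeroing_xor and interesting_xors are our own assumptions for illustration, not part of the challenge.

```python
def is_zeroing_xor(line):
    """True for lines such as 'xor eax, eax' where both operands are identical."""
    parts = line.strip().lower().split(None, 1)
    if len(parts) != 2 or parts[0] != "xor":
        return False
    operands = [op.strip() for op in parts[1].split(",")]
    return len(operands) == 2 and operands[0] == operands[1]


def interesting_xors(lines):
    """Keep only XOR lines whose two operands differ."""
    return [
        line for line in lines
        if line.strip().lower().startswith("xor ") and not is_zeroing_xor(line)
    ]
```

Anything that survives this filter combines two different values, which is what we expect from a decryption routine.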

To narrow down our results, we reflect on the challenge question. Since we know that some form of data (embedded shellcode) is being decrypted, we are looking for an XOR operation inside a small loop that iterates over a blob of data. Furthermore, based on the challenge question, we can assume that the XOR value remains constant.

We see that within the loop containing the XOR at 0x4010EA, the XOR value in esi is updated. Therefore, we can safely conclude that the XOR operation at 0x401149, which does not update the value in bl at all, uses the key that we are looking for. Based on static analysis alone, it is hard to identify the XOR key. As dynamic analysis yields results faster, we place a breakpoint on 0x401149 and run the binary to identify the value. Our breakpoint is hit upon execution, and using IDAPython we can retrieve the value of the bl register:


import idaapi
rv = idaapi.regval_t()
idaapi.get_reg_val("bl", rv)
bl_val = rv.ival
print("BL = 0x%02X" %bl_val)

The result is: BL = 0x29

Q4) What is the memory protection constant used when allocating memory for the payload in hex?

Whilst answering the previous question, we identified that shellcode is being decrypted. Working on the assumption that the decrypted shellcode will be executed, we can analyze how this is achieved. One way to execute shellcode is to inject it into an existing process. In its most straightforward form, shellcode injection follows a simple three-step process:

Allocate space in a process, often done using VirtualAlloc (or VirtualAllocEx when the target is another process),
Write the shellcode into the allocated memory space, using WriteProcessMemory,
Execute the injected shellcode by calling CreateRemoteThread.

Since this method is well known and referred to as classic injection, detection engines are capable of detecting these three calls in succession and will block the injection behavior. One way malware authors obfuscate these calls is by using Windows API hashing. API hashing is used to avoid straightforward inspection of API function calls in the Import Address Table of a binary. Malware authors use the Process Environment Block (PEB), which contains information about the modules (like kernel32.dll) loaded into the address space. At runtime, the Process Environment Block can be used to dynamically resolve functions in a particular DLL. To obfuscate the names that are looked up, the function names (APIs) are hashed by the malware authors.

Based on this knowledge, we can continue our analysis. We already determined that the shellcode decryption takes place in sub_401134. We see that directly after the decryption, sub_4012C2 is called. This function calls sub_401041 multiple times, where the first argument is always a hexadecimal constant. We see that at the beginning of sub_401041, the Flink of the InLoadOrderModuleList is retrieved from the PEB. As a result, we can conclude that the first argument for each call to sub_401041 is an API hash.

Now, the question specifically asks for the memory protection constant. This is the final argument of the VirtualAlloc API (flProtect), as documented by Microsoft. In order to both write and execute the shellcode, the allocated memory must be readable, writable and executable, which corresponds to the PAGE_EXECUTE_READWRITE constant, or 0x40 in hex.

If we closely inspect the VirtualAlloc function, we see that the second to last argument is the memory allocation type parameter (flAllocationType). To reserve and commit memory in a single call, this parameter combines MEM_COMMIT and MEM_RESERVE, or 0x3000 in hex. Upon closer inspection of all calls to sub_401041, we discover that the first one exactly matches both the allocation type and the protection constant.
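The two values are easy to verify from the documented Windows constants. The short sketch below combines the allocation flags and decodes a raw value back into flag names; the helper describe_allocation_type is hypothetical and only knows the two flags relevant here.

```python
# Documented Windows memory constants.
MEM_COMMIT = 0x1000
MEM_RESERVE = 0x2000
PAGE_EXECUTE_READWRITE = 0x40

ALLOCATION_TYPES = {MEM_COMMIT: "MEM_COMMIT", MEM_RESERVE: "MEM_RESERVE"}


def describe_allocation_type(value):
    """Return the names of the known allocation flags set in `value`."""
    return [name for flag, name in sorted(ALLOCATION_TYPES.items()) if value & flag]


allocation_type = MEM_COMMIT | MEM_RESERVE
print(hex(allocation_type), describe_allocation_type(allocation_type))
```

Seeing 0x3000 and 0x40 pushed next to each other before an indirect call is therefore a strong hint that the resolved function is VirtualAlloc.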

Based on this finding, we can assume that the memory protection constant used for allocating memory for the shellcode is: 0x40.

Q5) What is the hash value used to find the VirtualAlloc function in hex?

By answering question 4, we identified that sub_401041 is the function responsible for finding the correct API to perform the shellcode injection. Additionally, we identified that the first function was VirtualAlloc as it used the memory protection constant. Therefore, the hash value used to find the VirtualAlloc function was: 0xAA7ADB76.

Q6) How many bits does the DLL name hash algorithm rotate right (ROR) by in hex?

Now that we have identified the DLL name hash algorithm function, we can easily search for the ror instruction. We identify that the ror instruction is used twice. Tracing the input hash value, which is supplied as one of the function's arguments, we can conclude that the ror instruction uses a fixed rotation count of 0xB (11 bits) to rotate the DLL name hash right.
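To make the rotation concrete, here is a minimal sketch of a 32-bit rotate right and a generic rotate-and-add hash built on top of it. Only the rotation count of 0xB comes from the sample. The surrounding hash loop (ror_add_hash) is a common textbook construction that we assume for illustration; it is not confirmed to match the sample's routine and should not be expected to reproduce its hash values.

```python
def ror32(value, count):
    """Rotate a 32-bit value right by `count` bits."""
    count %= 32
    value &= 0xFFFFFFFF
    return ((value >> count) | (value << (32 - count))) & 0xFFFFFFFF


def ror_add_hash(name, rotation=0xB):
    """Generic rotate-and-add hash over the bytes of `name` (illustrative only)."""
    result = 0
    for byte in name:
        # Rotate the running hash, then mix in the next character.
        result = ror32(result, rotation)
        result = (result + byte) & 0xFFFFFFFF
    return result
```

For example, ror32(0x800, 0xB) moves the single set bit eleven positions to the right and yields 1.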

Q7) What value is checked to verify a valid PE header in hex?

Previously, we identified that the function at 0x401134 decrypts the embedded shellcode. This function subsequently calls the function at 0x4012C2, which we identified as being responsible for calling the import hashing function. However, during our analysis we disregarded the first call made by 0x4012C2, which goes to the function at 0x4011F9. If we dive into this function, we see that it loads the first 2 bytes of the decrypted shellcode into registers before calling 0x401164.

By analyzing 0x401164, we learn that a check takes place to ensure the first two bytes correspond to 0x5A4D, the well-known MZ header. However, the MZ header is not part of the PE header. If we continue our analysis, we see a second function being called from 0x4011F9. If the MZ header cannot be found, it will call 0x401197, which will first try another method to identify the MZ header. If that fails, it will read a 32-bit value (4 bytes) from the decrypted shellcode at offset 0x3C. This is the e_lfanew field of the DOS header, which holds the offset of the PE identifier: PE\0\0.

We subsequently see a comparison taking place against 0x4550, which is the PE identifier read as a little-endian value. Therefore, we can conclude that the value being checked to verify a valid PE header is 0x4550 in hex.
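The same two checks can be reproduced in a few lines of Python. This sketch follows the standard PE layout (MZ at offset 0, e_lfanew at offset 0x3C, PE signature at the offset it points to) rather than the exact control flow of the sample; the function name has_valid_pe_headers is our own.

```python
import struct

IMAGE_DOS_SIGNATURE = 0x5A4D  # "MZ" read as a little-endian 16-bit value
IMAGE_NT_SIGNATURE = 0x4550   # "PE\0\0" read as a little-endian 32-bit value


def has_valid_pe_headers(buf):
    """Check the MZ magic and the PE signature that e_lfanew points to."""
    if len(buf) < 0x40:
        return False
    (e_magic,) = struct.unpack_from("<H", buf, 0)
    if e_magic != IMAGE_DOS_SIGNATURE:
        return False
    # e_lfanew holds the file offset of the PE signature.
    (e_lfanew,) = struct.unpack_from("<I", buf, 0x3C)
    if e_lfanew + 4 > len(buf):
        return False
    (signature,) = struct.unpack_from("<I", buf, e_lfanew)
    return signature == IMAGE_NT_SIGNATURE
```

Running this over the decrypted payload is a quick way to confirm that decryption produced a PE file rather than garbage.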

Q8) What is the hexadecimal offset value used in the code to access the export directory in the PE file's Optional Header data directory?

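By default, in a PE32 image the Export Directory's Relative Virtual Address can be found at offset 0x78 from the start of the NT headers. This follows from the header layout: 4 bytes for the PE signature, 20 bytes for the File Header, and 96 bytes (0x60) into the Optional Header, where the data directory array starts with the export directory entry.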
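The offset can be derived from the documented header sizes. The sketch below does the arithmetic for both PE32 and PE32+; the constant and function names are our own.

```python
PE_SIGNATURE_SIZE = 4
FILE_HEADER_SIZE = 20
# Offset of the first data directory entry (the export directory)
# inside the Optional Header.
DATA_DIRECTORY_OFFSET_PE32 = 96
DATA_DIRECTORY_OFFSET_PE32_PLUS = 112


def export_directory_offset(pe32_plus=False):
    """Offset of the export directory RVA from the start of the NT headers."""
    data_directory = (
        DATA_DIRECTORY_OFFSET_PE32_PLUS if pe32_plus else DATA_DIRECTORY_OFFSET_PE32
    )
    return PE_SIGNATURE_SIZE + FILE_HEADER_SIZE + data_directory


print(hex(export_directory_offset()))
```

For a 64-bit (PE32+) image the same calculation gives 0x88, which is why shellcode targeting x64 uses a different constant.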

Q9) How many API functions are resolved using hashing in the entire binary?

We have previously identified the function at 0x401041 as the import hashing function. Using IDA's cross-references, we can count the number of times this function is called. To do so, we navigate (CTRL+G) to the address 0x401041. We then click on sub_401041 and hit "X". This brings up the cross-references window, showing that the function is called a total of 7 times.

Q10) How many bytes of headers are skipped to reach the start of the decrypted data?

So far, we have not described the decryption function sub_401134 in detail. If we take a closer look at this function, we see that the pointer to the shellcode blob resides in the eax register within the decryption loop. However, it originates from ecx. If we then look at all usages of ecx within sub_401134, we see that the first byte of the blob is used to set bl, which holds the XOR decryption key. Furthermore, we see that the subsequent 4 bytes are stored in edx, which is later moved to esi and decremented for each byte that gets decrypted. Then, the pointer in ecx is incremented by 5, moving it to the actual start of the encrypted data. Therefore, we can conclude that 5 bytes of headers (1 byte for the XOR key and 4 bytes for the length of the data) are skipped to reach the start of the decrypted data.
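Putting the pieces together, the blob layout described above can be parsed and decrypted offline. The sketch below assumes the 4-byte length is stored little-endian, as is usual on x86, and that the key is applied unchanged to every byte; the function name decrypt_blob is hypothetical.

```python
import struct

HEADER_SIZE = 5  # 1 byte XOR key + 4 bytes payload length


def decrypt_blob(blob):
    """Decrypt a blob laid out as [key:1][length:4][encrypted data:length]."""
    if len(blob) < HEADER_SIZE:
        raise ValueError("blob is too short to contain the header")
    key = blob[0]
    # Assumption: the length field is a little-endian 32-bit integer.
    (length,) = struct.unpack_from("<I", blob, 1)
    encrypted = blob[HEADER_SIZE:HEADER_SIZE + length]
    if len(encrypted) != length:
        raise ValueError("blob is truncated")
    # Single-byte XOR with a constant key.
    return bytes(byte ^ key for byte in encrypted)
```

With the blob carved out of the sample, the first byte should match the key observed in bl, and the output should start with the MZ header checked in question 7.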
