Analyzing Simda Botnet Malware - MalOps Challenge


Introduction

MalOps released a new challenge and that means it is CTF time again! This time around, we are analyzing the Simda botnet malware in IDA Pro, using both static analysis and dynamic debugging. Strap in, because we are going on a rollercoaster ride of encrypted payloads and anti-analysis techniques.

Q1) What is the first Windows API used by the malware to allocate memory?

The first Windows API that the malware uses to allocate memory is VirtualAllocEx. After analyzing the imports, it appears very likely that the required APIs are resolved dynamically using a combination of LoadLibraryA and GetProcAddress. When cross-referencing calls to GetProcAddress, we see that it is only used once, in the function sub_4016B0, where it resolves VirtualAllocEx, which is called soon after.

Q2) What does the second parameter given to RegOpenKeyA call point to?

The second parameter given to RegOpenKeyA points to the string clsid\{d66d6f99-cdaa-11d0-b822-00c04fc9b31f}. In the function sub_401170, we identify that the address of RegOpenKeyA is stored in the variable dword_4CA0DC. If we cross-reference this variable, we see that it is called twice.
Windows APIs such as RegOpenKeyA use the stdcall convention, which pushes arguments from last to first. This means that the second argument given to the function is the second-to-last push before the call in our IDA view. Because this is an x86 executable, the arguments are passed to the function on the stack using the push instruction. Thus, we see that the second-to-last push obtains its value from the variable off_4CA040, which points to the string clsid\{d66d6f99-cdaa-11d0-b822-00c04fc9b31f}.
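To make the push-order reasoning concrete, here is a minimal Python sketch that maps push operands, listed in the order they appear in the disassembly, to parameter positions. The parameter names follow the documented RegOpenKeyA prototype (hKey, lpSubKey, phkResult); apart from off_4CA040, the operand strings are hypothetical placeholders.

```python
from typing import Dict, List

# Documented RegOpenKeyA parameter order: (hKey, lpSubKey, phkResult)
REG_OPEN_KEY_A_PARAMS = ["hKey", "lpSubKey", "phkResult"]


def pushes_to_arguments(pushes: List[str], params: List[str]) -> Dict[str, str]:
    """Map push operands (in disassembly order) to parameter names.

    stdcall pushes arguments right to left, so the push closest to the
    call instruction supplies the first argument.
    """
    if len(pushes) != len(params):
        raise ValueError("number of pushes does not match parameter count")
    return dict(zip(params, reversed(pushes)))


# Hypothetical operands, in the order IDA lists them before the call
pushes = ["phkResult_operand", "off_4CA040", "hKey_operand"]
args = pushes_to_arguments(pushes, REG_OPEN_KEY_A_PARAMS)
print(args["lpSubKey"])  # off_4CA040
```

Reversing the list once is all it takes: the last push lines up with the first parameter, and the second-to-last push with lpSubKey.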

Q3) The malware dynamically resolves Windows API function names in memory and decrypts a large blob of data. Which function is responsible for grabbing the encrypted blobs? Provide address in hex

The function responsible for grabbing the encrypted blobs is located at address 0x4011B0.
We previously identified that sub_4016B0 resolves VirtualAllocEx and calls it. The result is the base address of the allocated region of pages. This base address is then passed to the function sub_4011B0 inside a while (1) loop. By analyzing this function, we see that it reads 0x20B bytes.

Q4) The malware uses a dynamic key for decryption. What is the initial decryption key used to decrypt the encrypted blobs (word size)?

The initial decryption key is 0xB0B6. In function sub_401650, we identify a XOR operation that is performed on the encrypted blobs. The static value initially supplied to this function is the initial decryption key. Since sub_401000 is the caller, we inspect it and find that the value 0xB0B6 is passed to sub_401650.
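As a rough illustration of what an XOR with a word-sized key looks like, the following Python sketch XORs a buffer 16 bits at a time with 0xB0B6. The little-endian word layout and the handling of a trailing odd byte are my own assumptions, not details confirmed from sub_401650.

```python
import struct


def xor_words(data: bytes, key: int) -> bytes:
    """XOR data with a 16-bit key, one little-endian word at a time.

    Assumption: a trailing odd byte is XORed with the low byte of the key.
    """
    key &= 0xFFFF
    out = bytearray()
    even_len = len(data) - (len(data) % 2)
    for (word,) in struct.iter_unpack("<H", data[:even_len]):
        out += struct.pack("<H", word ^ key)
    if len(data) % 2:
        out.append(data[-1] ^ (key & 0xFF))
    return bytes(out)


INITIAL_KEY = 0xB0B6
ciphertext = xor_words(b"GetProcAddress", INITIAL_KEY)
# XOR is its own inverse, so applying the same key again restores the plaintext
print(xor_words(ciphertext, INITIAL_KEY))  # b'GetProcAddress'
```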

Q5) What is the name of the first Windows API function decrypted?

So far, we know that function sub_4016B0, through its calls to sub_4011B0, is responsible for retrieving the encrypted blobs. Furthermore, we know that the pointer to the encrypted blobs is then stored in dword_4CA0C8. This variable is passed as input to function sub_401000, which retrieves a fixed number of bytes and passes them, together with the decryption key, to sub_401650.

We can then use dynamic debugging to follow the decryption process. In sub_401000, the stack argument [ebp+arg_0] holds the pointer to the encrypted blob. So we can set a breakpoint at address 0x401046 and inspect eax to see the encrypted data. Following along, we see that the hardcoded key 0xB0B6 gets incremented before it is passed to sub_401650.

Since all of this happens in a loop, we can safely step through the loop a few times and watch the changes being written to the address that dword_4CA230 points to. Note that the offset is calculated using the incremented value stored in dword_4CA230. Finally, we discover that the first decrypted value is the name of a Windows API: GetProcAddress.
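The decryption loop can be sketched in Python as follows: the key starts at 0xB0B6 and is bumped before each chunk is handed to a word-wise XOR routine. The chunk size of 16 bytes and the step of 1 are hypothetical placeholders, since the analysis only establishes that a fixed number of bytes is read and that the key is incremented; both would need to be confirmed in sub_401000. The NUL-separated name table used for the round trip is made up for the demo.

```python
import struct


def xor_words(data: bytes, key: int) -> bytes:
    """XOR data with a 16-bit key per little-endian word (assumed layout)."""
    out = bytearray()
    even_len = len(data) - (len(data) % 2)
    for (word,) in struct.iter_unpack("<H", data[:even_len]):
        out += struct.pack("<H", word ^ (key & 0xFFFF))
    if len(data) % 2:
        out.append(data[-1] ^ (key & 0xFF))
    return bytes(out)


def rolling_decrypt(blob: bytes, key: int = 0xB0B6,
                    chunk_size: int = 16, step: int = 1) -> bytes:
    """Decrypt fixed-size chunks, incrementing the key before each XOR call.

    chunk_size and step are hypothetical placeholders, not recovered values.
    """
    if chunk_size <= 0 or chunk_size % 2:
        raise ValueError("chunk_size must be a positive even number")
    out = bytearray()
    for offset in range(0, len(blob), chunk_size):
        key = (key + step) & 0xFFFF  # the key is incremented before use
        out += xor_words(blob[offset:offset + chunk_size], key)
    return bytes(out)


# XOR with the same key sequence is symmetric, so "encrypting" a made-up
# name table and decrypting it again gives back the original bytes.
table = b"GetProcAddress\x00LoadLibraryA\x00"
recovered = rolling_decrypt(rolling_decrypt(table))
print(recovered.split(b"\x00")[0])  # b'GetProcAddress'
```

Because the key sequence does not depend on the data, the same function serves as both encryptor and decryptor, which is handy for validating a recovered routine against bytes dumped from the debugger.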

Q6) What is the address of the ret instruction responsible for jumping to decrypted shellcode?

Once 0x401167 returns, we see that the target address in the shellcode is stored in dword_4CA094. Subsequently, the address of sub_401130 is stored in eax and pushed onto the stack, after which start returns. This means that, instead of exiting, the ret instruction pops the pushed address into eip and transfers execution to sub_401130.

By analyzing sub_401130, we see that the shellcode pointer in dword_4CA094 is pushed onto the stack and the function then returns. Once again, this means that execution is transferred, this time into the shellcode.
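The push/ret trick can be modeled with a toy stack machine in Python. This is only a conceptual sketch, not an emulator of the sample, and SHELLCODE_ENTRY is a hypothetical stand-in for the pointer stored in dword_4CA094.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToyCPU:
    """Toy model of eip and the stack; not an emulator of the sample."""
    eip: int = 0
    stack: List[int] = field(default_factory=list)

    def push(self, value: int) -> None:
        # Real x86 decrements esp by 4 and stores the dword; a list is enough here
        self.stack.append(value & 0xFFFFFFFF)

    def ret(self) -> None:
        # ret pops the top of the stack into eip, whatever that value is
        self.eip = self.stack.pop()


SUB_401130 = 0x401130
SHELLCODE_ENTRY = 0x12345678  # hypothetical stand-in for the value in dword_4CA094

cpu = ToyCPU()
cpu.push(SUB_401130)       # start: push eax (address of sub_401130)
cpu.ret()                  # start "returns" into sub_401130
assert cpu.eip == SUB_401130
cpu.push(SHELLCODE_ENTRY)  # sub_401130: push the shellcode pointer
cpu.ret()                  # ...and "returns" into the shellcode
print(hex(cpu.eip))  # 0x12345678
```

The point of the model is that ret never checks where the value came from, so a push followed by ret behaves like an indirect jump that a casual reading of the disassembly can easily miss.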

Q7) Based on the memory allocated by the malware, what is the offset of the first instruction executed after decryption in hex?

By answering question 6, we identified that the address pointing to the start of the decrypted shellcode blob was stored in dword_4CA094. This address was incremented by a hardcoded offset: 0x86ED0.
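The arithmetic is simple, but a small Python sketch helps keep base addresses and offsets apart. The allocation base below is hypothetical: VirtualAllocEx can return a different base between runs, which is why the question asks for the offset rather than an absolute address.

```python
ENTRY_OFFSET = 0x86ED0  # hardcoded offset added to the allocation base


def entry_address(alloc_base: int, offset: int = ENTRY_OFFSET) -> int:
    """Absolute address of the first instruction executed after decryption."""
    return (alloc_base + offset) & 0xFFFFFFFF


def entry_offset(entry: int, alloc_base: int) -> int:
    """Recover the offset from an entry address observed in the debugger."""
    return entry - alloc_base


base = 0x3A0000  # hypothetical allocation base
entry = entry_address(base)
print(hex(entry))                       # 0x426ed0
print(hex(entry_offset(entry, base)))   # 0x86ed0
```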

Q8) What is the second API called by the malware after decryption?

We are still debugging and have now entered the shellcode. If we analyze the memory buffer in ecx, we see that we are at the top of the initial/main function of the shellcode. Furthermore, the instructions contain multiple calls to lower addresses within the same buffer. With some further analysis, we can determine the exact range of the shellcode: it starts at 0x2306580 and ends at 0x2306FC8.

To make my own life easier, I select this entire block in memory, save it to a new file and open that file in IDA. Then, I go to the last function, sub_950, because this is the entrypoint of the shellcode. Now, we can analyze which functions are called and identify the second API call.
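If you prefer scripting the carve over selecting it by hand, the same step can be sketched in Python against a raw memory dump. The dump file and its base address are hypothetical inputs; only the start and end addresses come from the analysis above, and treating the end address as exclusive is my assumption.

```python
from pathlib import Path

SHELLCODE_START = 0x2306580
SHELLCODE_END = 0x2306FC8  # treated as exclusive (assumption)


def carve(dump: bytes, dump_base: int, start: int, end: int) -> bytes:
    """Slice [start, end) out of a memory dump that begins at dump_base."""
    if not dump_base <= start < end <= dump_base + len(dump):
        raise ValueError("requested range is outside the dump")
    return dump[start - dump_base:end - dump_base]


def save_carved(dump_path: Path, dump_base: int, out_path: Path) -> int:
    """Write the shellcode region to out_path and return its size in bytes."""
    region = carve(dump_path.read_bytes(), dump_base, SHELLCODE_START, SHELLCODE_END)
    out_path.write_bytes(region)
    return len(region)


print(hex(SHELLCODE_END - SHELLCODE_START))  # 0xa48
```

Assuming the carved file is loaded in IDA at base 0, function names such as sub_950 are offsets into the carve, so sub_950 corresponds to 0x2306580 + 0x950 in the live process.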

The second call the malware makes is to sub_F0, where we see that LoadLibraryExA is stored in [ebp+var_28], which is then loaded into eax. We also see that kernel32.dll is stored in a variable. Therefore, we can conclude that a pointer to the function LoadLibraryExA in kernel32.dll is most likely resolved and subsequently called.

Q9) The malware decrypts another part in memory with another dynamic key, what is the fixed addition value to the key in hex (word size)?

By now, we can either continue statically analyzing the binary or perform dynamic analysis by running and debugging the shellcode. Whichever option we choose, we will notice that the main function performs multiple actions, one of which is setting up the functions that allocate memory in which the second encrypted part is stored. Eventually, we notice that this block is decrypted in sub_900. In this decryption loop, we see a straightforward add operation that adds 0x03E9 to the dynamic key.
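To show how a fixed addition drives the key, here is a Python sketch of a 16-bit key schedule that adds 0x03E9 per step and wraps at 16 bits. The starting key and the point at which the update happens (per word, per block) are hypothetical, as the analysis only establishes the addend.

```python
from itertools import islice
from typing import Iterator, List

KEY_ADDEND = 0x03E9


def key_schedule(initial_key: int, addend: int = KEY_ADDEND) -> Iterator[int]:
    """Yield k, k + addend, k + 2*addend, ... modulo 2**16."""
    key = initial_key & 0xFFFF
    while True:
        yield key
        key = (key + addend) & 0xFFFF  # word-size arithmetic wraps past 0xFFFF


def first_keys(initial_key: int, count: int) -> List[int]:
    return list(islice(key_schedule(initial_key), count))


print([hex(k) for k in first_keys(0x1000, 3)])  # ['0x1000', '0x13e9', '0x17d2']
print(hex(first_keys(0xFFFF, 2)[1]))            # 0x3e8 (wrapped)
```

Masking with 0xFFFF mirrors what a word-sized register does on overflow; forgetting it is a common reason a reimplemented decryptor drifts out of sync with the sample.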

Q10) There are 3 hardcoded IPs, list them in the format: IP1,IP2,IP3 (same order as found)

While debugging, we set a breakpoint on the mov esp, ebp instruction after the decryption block. This allows us to select the decrypted memory, which for me starts at 0x2590000. This time around, the binary data starts with the MZ header, which clearly denotes an executable. Once again, we select the memory section containing the newly decrypted data, save it to another executable and open it in a new IDA session.

We work with the hypothesis that 'hardcoded' means the IP addresses are stored as plain strings, so running the strings utility will very likely reveal them. This is our starting point and, as expected, we find the following hardcoded IP addresses: 212.117.176.187,79.133.196.94,69.57.173.222
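The strings-based approach can be approximated in Python: check for the MZ signature, then scan the dumped bytes for ASCII dotted-quad IPv4 addresses, preserving the order in which they appear. The file name in the usage comment is hypothetical, and this sketch will miss addresses stored as wide strings or assembled at runtime.

```python
import ipaddress
import re
from typing import List

# Four 1-3 digit groups separated by dots, not glued to other digits or dots
IPV4_RE = re.compile(rb"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])")


def is_pe(data: bytes) -> bool:
    """A PE file starts with the DOS header magic 'MZ'."""
    return data[:2] == b"MZ"


def find_ipv4(data: bytes) -> List[str]:
    """Return valid IPv4 strings in file order, without duplicates."""
    found: List[str] = []
    for match in IPV4_RE.finditer(data):
        candidate = match.group().decode("ascii")
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue  # e.g. 999.1.1.1 has an out-of-range octet
        if candidate not in found:
            found.append(candidate)
    return found


# Usage (hypothetical file name):
# data = open("decrypted_payload.bin", "rb").read()
# if is_pe(data):
#     print(",".join(find_ipv4(data)))
```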

Q11) What is the address of the function that performs anti-analysis checks?

To answer this question, we are working with the hypothesis that the anti-analysis checks are performed shortly after the malware starts executing, so as not to give away too many features and other indicators of compromise.
By analyzing the graph, we notice that at 0x402728 a large, potentially endless loop is entered. As such, we use this address as the cut-off when looking for the anti-analysis checks. Then, we go through all calls prior to this address and end up in function sub_401B98. Here, we see that the existence of a specific file, c:\\cgvi5r6i\\vgdgfd.72g, is checked. If the file does not exist, the malware first loops over a list of running processes and compares them against known anti-malware processes. Subsequently, it loops over a list of registry keys related to sandboxes and malware processes and checks whether they exist.
Therefore, we can conclude that the function sub_401B98 performs the anti-analysis checks.
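The control flow described for sub_401B98 can be summarized as a Python sketch. Nothing here queries a real system: whether the marker file exists, the running processes and the existing registry keys are passed in, and every blacklist entry is a hypothetical placeholder rather than a value recovered from the sample. Treating the marker file as a bypass and comparing names case-insensitively are my reading of the branch, not confirmed behavior.

```python
from typing import Iterable, Set

# Hypothetical placeholders; the real lists are embedded in the sample
BLACKLISTED_PROCESSES: Set[str] = {"example_av.exe", "example_monitor.exe"}
BLACKLISTED_REG_KEYS: Set[str] = {r"SOFTWARE\ExampleSandbox"}


def analysis_detected(marker_exists: bool,
                      running_processes: Iterable[str],
                      existing_reg_keys: Iterable[str]) -> bool:
    """Model of the checks: marker file (c:\\cgvi5r6i\\vgdgfd.72g) first,
    then the process list, then the registry keys."""
    if marker_exists:
        return False  # assumption: the marker file skips the checks
    running = {name.lower() for name in running_processes}
    if running & {p.lower() for p in BLACKLISTED_PROCESSES}:
        return True
    keys = {key.lower() for key in existing_reg_keys}
    return bool(keys & {k.lower() for k in BLACKLISTED_REG_KEYS})


print(analysis_detected(False, ["explorer.exe", "EXAMPLE_AV.EXE"], []))  # True
```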

Q12) The malware will use 3 IPs completely different from the hardcoded ones, list them in order: 7x.xxx.xx.xxx,2xx.xx.xx.xx,1xx.xxx.xx.xxx

We continue our analysis from 0x402295 onwards. If the anti-analysis checks fail, we end up at 0x402728, which results in the malware halting. If we pass the checks, we continue towards 0x4022B0. We previously identified the hardcoded IP addresses, two of which were found in 0x40A839, which is called at location 0x40243D. Before this call, though, we see multiple instances where dynamic strings are loaded. If we run the debugger up to this point, we identify that the dynamic strings at 0x4023E0 and 0x4023E6 contain two IP addresses. If we then inspect the adjacent memory, we identify the third IP address, which matches the expected pattern: 79.142.66.239,217.23.12.63,109.236.87.106
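Because the question gives a digit mask for each address, a quick Python check confirms that the three recovered IPs fit it. The helper name is my own; x stands for any single digit.

```python
import re


def matches_mask(ip: str, mask: str) -> bool:
    """Return True if ip fits a mask such as '7x.xxx.xx.xxx' (x = any digit)."""
    pattern = "".join(r"\d" if ch == "x" else re.escape(ch) for ch in mask)
    return re.fullmatch(pattern, ip) is not None


masks = ["7x.xxx.xx.xxx", "2xx.xx.xx.xx", "1xx.xxx.xx.xxx"]
recovered = ["79.142.66.239", "217.23.12.63", "109.236.87.106"]
print(all(matches_mask(ip, m) for ip, m in zip(recovered, masks)))  # True
```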

Q13) To which Windows environment variable-based folder does the malware copy itself?

The lazy approach I opted for in this case was to search the hardcoded strings for the % character, as it is commonly used to delimit environment variables in Windows paths. Luckily for us, only one result is found: %appdata%\ScanDisc.exe. When we analyze the code of the function sub_405C5F, we see that the string is passed to ExpandEnvironmentStringsA, and the result is then passed to the CopyFileA API. CopyFileA takes as input the existing file, which points to the malware sample in the current directory, and the destination file, which is located in the %appdata% folder.
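To illustrate what ExpandEnvironmentStringsA does with %appdata%\ScanDisc.exe, here is a small Python approximation. It looks variable names up case-insensitively, as Windows does, and leaves unknown %NAME% references unexpanded. The environment values are hypothetical, and this is not a full reimplementation of the API.

```python
import re
from typing import Mapping

_VAR_RE = re.compile(r"%([^%]+)%")


def expand_windows_env(path: str, env: Mapping[str, str]) -> str:
    """Approximate ExpandEnvironmentStrings: replace %NAME% with env values."""
    upper_env = {k.upper(): v for k, v in env.items()}

    def substitute(match: re.Match) -> str:
        # Unknown variables are left as-is, including the percent signs
        return upper_env.get(match.group(1).upper(), match.group(0))

    return _VAR_RE.sub(substitute, path)


env = {"APPDATA": r"C:\Users\analyst\AppData\Roaming"}  # hypothetical
print(expand_windows_env(r"%appdata%\ScanDisc.exe", env))
# C:\Users\analyst\AppData\Roaming\ScanDisc.exe
```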

Q14) What is the registry key the malware uses for persistence?

To answer this question, we are working with the hypothesis that the malware uses at least one of the RegCreateKey* functions. By looking at all cross-references to this Windows API function, we notice three distinct ones, and the one in sub_4038E3 immediately jumps out to me, as the RunOnce key is a popular persistence mechanism.
We debug the code and set a breakpoint on the creation of the registry key. When we then analyze the remaining part of the function, we see a subsequent call to RegSetValueExW. Just before this API call, wsprintfW is used to combine the value opt with the name of the new binary stored in the %appdata% folder. Therefore, we can conclude that the registry key HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\RunOnce is used for persistence.

Q15) What is the argument the malware will launch itself with?

By answering the previous question, we noticed that the value written to the RunOnce key is the malware executable combined with the argument opt. We can further verify this by going back to the section where the command line is parsed, where we notice that the supplied value is compared against the static string opt.
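Tying the persistence entry to the launch argument, the sketch below composes the kind of RunOnce value data described above and checks an argument vector for opt. The exact wsprintfW format string was not recovered in this write-up, so the quoting in build_runonce_data is a hypothetical choice, as is the assumption that opt arrives as the first argument.

```python
from typing import List

RUNONCE_KEY = r"HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\RunOnce"
LAUNCH_ARG = "opt"


def build_runonce_data(exe_path: str, arg: str = LAUNCH_ARG) -> str:
    """Compose value data as '"<path>" <arg>' (hypothetical format string)."""
    return f'"{exe_path}" {arg}'


def launched_from_runonce(argv: List[str]) -> bool:
    """Mirror the comparison against the static string 'opt' (assumed argv[1])."""
    return len(argv) > 1 and argv[1] == LAUNCH_ARG


data = build_runonce_data(r"C:\Users\analyst\AppData\Roaming\ScanDisc.exe")
print(data)  # "C:\Users\analyst\AppData\Roaming\ScanDisc.exe" opt
print(launched_from_runonce(["ScanDisc.exe", "opt"]))  # True
```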