Introduction
MalOps released a new challenge and that means it is CTF time again! This time around, we are analyzing the Simda botnet malware in IDA Pro, using both static analysis and dynamic debugging. Strap in, because we are going on a rollercoaster ride of encrypted payloads and anti-analysis techniques.
Q1) What is the first windows API used by the malware to allocate memory?
The first Windows API that is used by the malware to allocate memory is VirtualAllocEx. After analyzing
the imports, it appears very likely that all required APIs are resolved using a combination of
LoadLibraryA and GetProcAddress. When cross-referencing calls to GetProcAddress, we see that it is
actually only used once, in the function sub_4016B0, where it is used to import VirtualAllocEx,
which gets called soon after.
Q2) What does the second parameter given to RegOpenKeyA call point to?
The second parameter given to RegOpenKeyA points to the string
clsid\{d66d6f99-cdaa-11d0-b822-00c04fc9b31f}.
In the function sub_401170, we identify that the address of RegOpenKeyA is stored in the
variable dword_4CA0DC. If we cross-reference this variable, we see that it is called twice.
Now, arguments are pushed from last to first, meaning that the second argument given to the function
will be the second to last push before the call in our IDA view. Furthermore, because this is an x86
executable, the arguments are passed to a function on the stack using the push instruction. Thus, we
see that the second to last push operation obtains its value from the variable off_4CA040, which
represents the string clsid\{d66d6f99-cdaa-11d0-b822-00c04fc9b31f}.
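To make the push order concrete, here is a minimal Python sketch that models how the three documented arguments of RegOpenKeyA (hKey, lpSubKey, phkResult) would be pushed. The helper name pushes_for_call is made up for illustration and this is only a model of the calling convention, not code taken from the sample.

```python
def pushes_for_call(args):
    """Return the order in which arguments are pushed for a stack-based x86 call."""
    # Arguments are pushed right to left, so the first argument is pushed last.
    return list(reversed(args))


# Documented parameter order of RegOpenKeyA.
args = ["hKey", "lpSubKey", "phkResult"]

pushes = pushes_for_call(args)

# The second argument is always the second to last push before the call.
second_argument = pushes[-2]

print(pushes)
print(second_argument)
```

Reading the disassembly bottom-up from the call instruction gives the same result: the last push is the first argument, the one before it is the second argument.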
Q3) The malware dynamically resolves Windows API function names in memory, and decrypts a large blob of data, which function is responsible for grabbing the encrypted blobs? Provide address in hex
The function responsible for grabbing the encrypted blobs is located at address 0x4011B0.
It was previously identified that sub_4016B0 imports the VirtualAllocEx function and calls it. The
result is the base address of the allocated region of pages. This result is used in a while (1) loop,
where it is passed to the function sub_4011B0.
By analyzing this function, we see that it reads 0x20B bytes.
Q4) The malware uses a dynamic key for decryption, What is the initial decryption key used to decrypt the encrypted blobs (word size)?
The initial decryption key that is being used is 0xB0B6. In function sub_401650, we identify a XOR
operation which is performed on the encrypted blobs. The static value initially supplied to this
function represents the initial decryption key. Its caller is sub_401000, where we see that the
value 0xB0B6 is passed to it.
Q5) What is the name of the first Windows API function decrypted?
So far, we know that function sub_4016B0 (through sub_4011B0) is responsible for retrieving the
encrypted blobs. Furthermore, we know that the pointer to the encrypted blobs is then stored in
dword_4CA0C8. This variable is passed as input to function sub_401000, which subsequently retrieves
a fixed number of bytes and supplies these, together with the initial decryption key, to sub_401650.
We can then use dynamic debugging to follow the decryption process. In sub_401000, the stack argument [ebp+arg_0]
holds the pointer to the encrypted blob, so we can set a breakpoint on address 0x401046 and inspect eax
to see the encrypted data. Following along, the hardcoded key 0xB0B6 gets incremented before it is passed to sub_401650.
Since all of this happens in a loop, we can safely traverse the loop a few times and watch the
changes being written to the address which dword_4CA230 points to. Note that the offset is
calculated using the incremented value stored in dword_4CA230.
Finally, we discover that the first decrypted value is a Windows API corresponding to GetProcAddress.
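For readers who want to replay this kind of decryption offline, below is a rough Python sketch of a rolling XOR over 16-bit words. Only the XOR operation, the word-sized key 0xB0B6 and the fact that the key is incremented before use come from the analysis above. The increment of 1, the little-endian word layout and the function name rolling_xor_words are my assumptions, so treat this as a template to adjust against the real loop rather than the sample's exact algorithm.

```python
import struct


def rolling_xor_words(data: bytes, key: int, step: int = 1) -> bytes:
    """XOR every little-endian 16-bit word with a key that changes per word.

    Assumptions (not confirmed by the sample): the key grows by `step`
    before each word is processed and wraps around at 16 bits.
    """
    if len(data) % 2:
        raise ValueError("expected an even number of bytes")
    out = bytearray()
    for (word,) in struct.iter_unpack("<H", data):
        key = (key + step) & 0xFFFF  # update the key before it is used
        out += struct.pack("<H", word ^ key)
    return bytes(out)


# Round trip on a made-up buffer: XOR with the same key stream is its own inverse.
plain = b"GetProcAddress\x00\x00"
blob = rolling_xor_words(plain, 0xB0B6)
recovered = rolling_xor_words(blob, 0xB0B6)
print(recovered)
```

Because the routine is symmetric, the same function can be pointed at bytes dumped from the debugger once the real step and word layout have been confirmed.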
Q6) What is the address of the ret instruction responsible for jumping to decrypted shellcode?
Once 0x401167 returns, we see that the offset to jump to in the
shellcode is stored in dword_4CA094.
Subsequently, the address of sub_401130 is loaded into eax and pushed onto the stack, after which
start returns. Instead of exiting, the ret instruction pops that address into eip, so execution
continues in sub_401130.
By analyzing sub_401130, we see that the pointer to the shellcode
in dword_4CA094 is pushed onto the stack
and then the function returns. Once again, this means that execution is transferred to the
shellcode.
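The push followed by ret trick can be modelled with a toy stack machine. The sketch below is purely illustrative: the TinyCpu class is invented for this example and the shellcode entry address is just a sample value from one debugging session, which will differ on every run.

```python
class TinyCpu:
    """Toy model of the two x86 operations involved: push and ret."""

    def __init__(self):
        self.stack = []
        self.eip = None

    def push(self, value):
        self.stack.append(value)

    def ret(self):
        # ret pops the value on top of the stack into the instruction pointer.
        self.eip = self.stack.pop()


cpu = TinyCpu()

# start: push the address of sub_401130, then return.
cpu.push(0x401130)
cpu.ret()
first_hop = cpu.eip

# sub_401130: push the pointer held in dword_4CA094, then return.
shellcode_entry = 0x2306ED0  # example value only, changes per run
cpu.push(shellcode_entry)
cpu.ret()
second_hop = cpu.eip

print(hex(first_hop), hex(second_hop))
```

No call or jmp instruction is needed in either hop, which is why this pattern is easy to miss when only following cross-references.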
Q7) Based on the memory allocated by the malware, what is the offset of the first instruction executed after decryption in hex?
By answering question 6, we identified that the address pointing to the start of the decrypted
shellcode blob was stored in
dword_4CA094. This address was incremented by a hardcoded
offset: 0x86ED0.
Q8) What is the second API called by the malware after decryption?
Now, we are still debugging and have entered the shellcode. If we analyze the memory buffer in
ecx, we see that we are at the top of the initial/main function in the shellcode. Furthermore, by
analyzing the instructions, we see multiple calls to lower addresses in memory. With some further
analysis, we can determine the exact range of the shellcode: it starts at 0x2306580 and ends at
0x2306FC8.
To make my own life easier, I select this entire block in memory, save it to a new executable
and open this new executable in IDA.
Then, I go to the last function sub_950 because this is the
entrypoint of the shellcode. Now, we can
analyze which functions are called and what the second API call is.
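As a sanity check, the numbers from my debugging session can be tied together with a bit of arithmetic. The sketch assumes the dump is loaded in IDA at base 0, so that sub_950 sits 0x950 bytes into the dumped block. The allocation base it derives is an inference from these values, not something read directly from the debugger, and all absolute addresses will differ per run.

```python
shellcode_start = 0x2306580    # first byte of the dumped block
shellcode_end = 0x2306FC8      # end of the dumped block
entry_offset_in_dump = 0x950   # sub_950, assuming the dump is loaded at base 0
hardcoded_offset = 0x86ED0     # offset from question 7

# Absolute address of the shellcode entrypoint in this session.
entry_address = shellcode_start + entry_offset_in_dump

# Subtracting the hardcoded offset should give the allocation base.
implied_alloc_base = entry_address - hardcoded_offset

dump_size = shellcode_end - shellcode_start

print(hex(entry_address), hex(implied_alloc_base), hex(dump_size))
```

The implied base comes out page aligned, which is what we would expect from memory returned by VirtualAllocEx and suggests the offsets line up.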
The second call the malware makes is to sub_F0, where we see that LoadLibraryExA is stored in
[ebp+var_28], which is then loaded into eax. Furthermore, we also see that kernel32.dll is stored
in a variable. Therefore, we can conclude that a pointer to the function LoadLibraryExA in
kernel32.dll is likely resolved and subsequently called.
Q9) The malware decrypts another part in memory with another dynamic key, what is the fixed addition value to the key in hex (word size)?
By now, we can either continue statically analyzing the binary or perform dynamic analysis
by running and debugging the shellcode.
Whichever option is preferred, we will notice that the main function performs multiple actions,
one of which is setting up the functions to allocate some memory in which the second encrypted
part is stored. Eventually, we notice that this block is decrypted in sub_900. In this decryption
loop, we see a straightforward add operation which adds 0x03E9 to the dynamic key.
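To illustrate what a fixed addition on a word-sized key looks like, the sketch below generates the key sequence with 16-bit wraparound. Only the addend 0x03E9 comes from the analysis. The seed 0xFF00 is an arbitrary placeholder chosen to show the wraparound, and whether the sample applies the addition before or after using the key is not something I verified here.

```python
from itertools import islice


def key_stream(seed, addend=0x03E9):
    """Yield successive 16-bit keys, adding a fixed value each round."""
    key = seed & 0xFFFF
    while True:
        yield key
        key = (key + addend) & 0xFFFF  # keep the key word sized


# Placeholder seed, not the real one used by the shellcode.
keys = list(islice(key_stream(0xFF00), 3))
print([hex(k) for k in keys])
```

Masking with 0xFFFF mirrors what happens when the addition is performed on a 16-bit register: the carry is simply dropped.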
Q10) There are 3 hardcoded IPs, list them in the format: IP1,IP2,IP3 (same order as found)
While debugging, we set a breakpoint on the mov esp, ebp instruction after the decryption block. At that point, we can select the newly decrypted memory, which for me starts at 2590000, and save it to another file. This time around, the binary data starts with the MZ header, which clearly denotes an executable, so we open it in a new IDA session. We are working with the hypothesis that 'hardcoded' means that running the strings utility will very likely show us the IP addresses. This is our starting point and, as expected, we find the following hardcoded IP addresses: 212.117.176.187,79.133.196.94,69.57.173.222
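If you prefer not to scroll through the full strings output, a small script can pull IPv4-looking ASCII strings out of a dump in the order they appear. This is a generic sketch: the function name find_ipv4 and the sample buffer are made up for the example, and it only covers ASCII strings, so wide (UTF-16) strings would need separate handling.

```python
import re

# Four groups of 1-3 digits separated by dots, not glued to other digits or dots.
IPV4 = re.compile(rb"(?<![\d.])(?:\d{1,3}\.){3}\d{1,3}(?![\d.])")


def find_ipv4(data: bytes):
    """Return unique IPv4-looking ASCII strings in order of first appearance."""
    found = []
    for match in IPV4.finditer(data):
        text = match.group().decode("ascii")
        # Discard candidates with an octet above 255.
        if all(int(part) <= 255 for part in text.split(".")) and text not in found:
            found.append(text)
    return found


# Made-up buffer that mimics a dumped executable containing the three addresses.
sample = (
    b"MZ\x90\x00junk212.117.176.187\x00more79.133.196.94\x00\x00"
    b"69.57.173.222\x00tail"
)
ips = find_ipv4(sample)
print(",".join(ips))
```

On a real dump, replace the sample buffer with the bytes read from the saved file, for example via open(path, "rb").read().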
Q11) What is the address of the function that perform anti analysis checks
To answer this question, we are working with the hypothesis that whenever the anti-analysis is
performed, it will be performed shortly after initial execution of the malware, so as not to give
away too many features and other indicators of compromise.
By analyzing the graph, we notice that at 0x402728 a large
potentially continuous loop is entered. As such,
we set the cut-off for looking for the anti-analysis checks at this address. Then, we go through
all calls prior to this address and
we end up in function sub_401B98. Here, we see that the existence of a specific
file, c:\\cgvi5r6i\\vgdgfd.72g, is checked.
If the file does not exist, the malware first loops over a
list of processes and compares them against known
anti-malware processes. Subsequently, the malware loops over a list of registry keys for
sandboxes and malware processes and checks if they exist.
Therefore, we can conclude that the function sub_401B98 is
performing anti-analysis checks.
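The control flow described above can be summarised in a few lines of Python. This is a simplified model, not a decompilation: the function name, the example process and registry key names are all hypothetical, and I am assuming that the presence of the marker file causes the checks to be skipped.

```python
def checks_trip(marker_file_exists, processes, existing_reg_keys,
                bad_processes, bad_reg_keys):
    """Return True when the modelled anti-analysis checks would fire."""
    # Assumption: when the marker file exists, the checks are skipped.
    if marker_file_exists:
        return False

    # First pass: compare running processes against the blocklist.
    running = {name.lower() for name in processes}
    if running & {name.lower() for name in bad_processes}:
        return True

    # Second pass: look for registry keys that indicate an analysis setup.
    present = {key.lower() for key in existing_reg_keys}
    return any(key.lower() in present for key in bad_reg_keys)


# Hypothetical lists, chosen only to exercise the logic.
bad_processes = ["analyzer.exe"]
bad_reg_keys = [r"SOFTWARE\ExampleSandbox"]

clean = checks_trip(False, ["explorer.exe"], [], bad_processes, bad_reg_keys)
by_process = checks_trip(False, ["Analyzer.exe"], [], bad_processes, bad_reg_keys)
by_key = checks_trip(False, [], [r"software\examplesandbox"], bad_processes, bad_reg_keys)
skipped = checks_trip(True, ["analyzer.exe"], [], bad_processes, bad_reg_keys)
print(clean, by_process, by_key, skipped)
```

The real lists can be recovered from the strings referenced in sub_401B98 and dropped into the two blocklists.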
Q12) The malware will use completely different 3 IPs than the hardcoded ones, list them in order: 7x.xxx.xx.xxx,2xx.xx.xx.xx,1xx.xxx.xx.xxx
We continue our analysis from 0x402295 onwards. If the anti-analysis checks fail, we end up at 0x402728, which results in the malware halting. If we pass the checks, we continue on towards 0x4022B0. Now, we previously identified the hardcoded IP addresses, two of which were found in 0x40A839, which is called at location 0x40243D. Before this call though, we see multiple instances where dynamic strings are loaded. If we run the debugger to this point, we identify that the dynamic strings at 0x4023E0 and 0x4023E6 contain two IP addresses. If we then inspect the adjacent memory, we identify the third IP address, which matches the expected pattern: 79.142.66.239,217.23.12.63,109.236.87.106
Q13) To which Windows environment variable-based folder does the malware copy itself?
The lazy approach I opted for in this case was to search the hardcoded strings for the % character, as this
is commonly used to denote environment variables. Luckily for us, only one result is found:
%appdata%\ScanDisc.exe.
When we analyze the code of the function sub_405C5F, we see that
the string is passed to ExpandEnvironmentStringsA
before the result is passed to the CopyFileA API, which takes
as input the original file, which points to
the malware sample in the current directory, as well as the destination file, which is located in the %appdata% folder.
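To see what the expanded destination path looks like without running the sample, the expansion step can be approximated in Python. The helper below is a rough stand-in for ExpandEnvironmentStringsA written for this post, not the API itself, and the APPDATA value is a made-up example for a user called analyst.

```python
import re


def expand_percent_vars(template, env):
    """Replace %NAME% placeholders using a case-insensitive lookup.

    Unknown variables are left untouched, which matches the documented
    behaviour of the Windows API.
    """
    lookup = {name.upper(): value for name, value in env.items()}

    def substitute(match):
        return lookup.get(match.group(1).upper(), match.group(0))

    return re.sub(r"%([^%]+)%", substitute, template)


# Hypothetical environment for illustration.
fake_env = {"APPDATA": r"C:\Users\analyst\AppData\Roaming"}

destination = expand_percent_vars(r"%appdata%\ScanDisc.exe", fake_env)
print(destination)
```

The result is the path to look for on disk (and in the registry) when checking whether a machine has been infected.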
Q14) What is the registry key the malware uses for persistence?
To answer this question, we are working with the hypothesis that the malware uses at least RegCreateKey*.
By looking for all cross-references of this Windows API function, we notice three distinct ones.
The one in sub_4038E3 immediately jumps out to me, as the RunOnce key is a popular
persistence mechanism.
We debug the code and set a breakpoint on the creation of the registry key. When we then analyze
the remaining part of the function, we see a subsequent
call to RegSetValueExW. Just before this API call, wsprintfW is used to combine the value
opt and the new binary name stored in the %appdata% folder.
Therefore, we can conclude that the registry key HKEY_CURRENT_USER\SOFTWARE\Microsoft\Windows\CurrentVersion\RunOnce
is used to create persistence.
Q15) What is the argument the malware will launch itself with?
By answering the previous question, we noticed that the value written to the RunOnce key is the malware executable
combined with the argument opt.
We can further verify this by going back to the section where the command line is parsed, as here
we notice that the supplied value is compared
against the static string opt.