Search This Blog

Showing posts with label code injection. Show all posts
Showing posts with label code injection. Show all posts

Monday, October 22, 2012

Exception Driven "Debugging": Getting behind anti debugging tricks.

Of course, every debugging is exception driven. At least because a breakpoint generates debug exception wich is passed to debugger. In this article, however, I will refer to regular exceptions.

There are tens if not hundreds of software protectors used by software vendors around the globe. Some are good, some are less good, in either case, vendors rarely use them in a proper way, thinking that simply enabling anti-debugging features, provided by protector of their choice, is enough. I have seen it myself - a widely known commercial application, protected using Themida (which is one of the most complicated protectors) remains SOOO unprotected, that Themida is not even notices during the extraction of relatively sensitive information using the application itself.

However, the purpose of this article is not to discuss pros and cons of Themida or any other protector, nor do I have any intention to disgrace any of the software vendors. The purpose is to describe a relatively easy way of bypassing common anti debugging tricks (including Windows DRM protection)  with DLL injection.

As the term "anti debugging" states - such methods target modern debuggers. There are several commonly known tricks:
  1. IsDebuggerPresent() - you would be surprised to know how many vendors rely on this API alone;
  2. Additional methods of debugger presence detection;
  3. IAT modification - which is not really worth trying;
  4. Redirection of debugging API (e.g. to an infinite loop).
  5. And some more.
Point #4 does not let you to implement your own debugger in a hope that it would not be noticed by the victim program (many beginners fall out at this point).

Point #3 - how much can you modify the IAT? I mean, system loader has to still be able to parse it, thus, if system loader can - everyone can.

Point #1 is not even worth further mentioning here.

In this article I am going to describe a simple way (although, some may cry and say it is a hard way) to get around most of anti debugging tricks without even noticing their presence by implementing a simple pseudo debugger dll, which is to be injected into the target process.


Step #1. Preparations

In order to use any debugger, you have to know where to set your breakpoints. Otherwise, the whole process is meaningless. But how can you define proper locations if the executable on disc is encrypted (e.g. with Themida) and you still cannot attach a debugger to see what is going on inside?

The solution is quite simple. Simple in deed. Windows provides us with all the instruments to read the memory of another process (given that you have sufficient access rights) with OpenProcess(), ReadProcessMemory() and NtQueryInformationProcess() API functions. Using those, you can simply dump the decrypted executable and any of its modules (DLLs) to a separate file on disc.

NtQueryInformationProcess() provides you with the address of the PEB (see this post for more information on PEB) of the target process. Then you simply parse the linked list of loaded modules, get the base address (module handle) and the image size for each, then use ReadProcessMemory to copy the image to a file. One complication, though, you will have to use ReadProcessMemory in order to access the PEB of the remote process.

Once you have dumped the target image to a file, such file can be easily loaded into IDA Pro, disassembled and researched statically.


Step #2. Injector and DLL

I do not see any reason to describe the DLL injection process here, as it has been described many times, even in this blog. You are free to use standard injection method, advanced DLL injection method or use this method if you have problems with the two previously mentioned.


DllMain()

It is suggested not to perform any heavy action in this function, however, we do not really have a choice (although, you can launch a separate thread). First thing to do is to suspend all running threads (except the current one of course). The problem is that Windows has no API function that would allow you to enumerate threads of a single process, instead, it lets you go through all the threads in the system. See MSDN pages for Thread32First and Thread32Next - there should be a perfect example of getting threads of the current process. Once all the threads are suspended, you are ready to proceed.


Installation of breakpoints 

No, we are not going to use regular 0xCC software breakpoints, neither are we going to make any use of hardware breakpoints here. Instead, we are going to place an instruction that would raise an exception to the location of desired breakpoint. To keep such instruction short and to avoid changing the values of the registers, 'AAM 0' seems to be a perfect candidate. It only takes two bytes 0xD4 0x00 and raises the EXCEPTION_INT_DIVIDE_BY_ZERO exception (exception code 0xC0000094).

Use the VirtualProtect() function to change the access rights of the target address, so you can alter its content, backup the original two bytes from that address and overwrite them with 0x00D4

VirtualProtect((LPVOID)(target & ~0xFFF), 0x1000, PAGE_EXECUTE_READWRITE, (PDWORD)&prevProtect);
*((unsigned short*)target) = 0x00D4;
VirtualProtect((LPVOID)(target & ~0xFFF), 0x1000, prevProtect, (PDWORD)&prevProtect);

Now the victim process is almost ready to be continued. One thing left - exception handler. We will use vectored exception handling mechanism as it allows our handler to be (at least among) the first to handle an exception. Once the handler has been added with AddVectoredExceptionHandler(), you may resume the suspended threads of the process.



Handler

One important thing to do once your handler gets control, is to check for the address where the exception occurred and for the exception code, as we have no intention to deal with irrelevant exceptions:

LONG CALLBACK handler(PEXCEPTION_POINTERS ep)
{
   if(ep->ContextRecord->Eip == target && ep->ExceptionRecord->ExceptionCode == 0xC0000094)
   {
      // Do your stuff here
   }
   else
      // Optionally log other exceptions
      return EXCEPTION_CONTINUE_SEARCH;
   return EXCEPTION_CONTINUE_EXECUTION;
}


Your Stuff

One of the parameters you get with your handler is the pointer to the CONTEXT structure, which provides you with the content of all the registers at the time of the exception. Needless to mention, that you have the access to the process' memory as well. Just as you were in a debugger with the only difference - you have to implement the routine that would show you the data you are interested in. Do not forget to emulate the original instruction replaced by the pseudo breakpoint and advance the Eip accordingly before returning from handler.

One more thing to mention - it may be a good idea to suspend all other threads of the victim process while in the 'your stuff' portion of the handler.


Stability

I am not claiming this method to be bullet proof and I am more than sure ( I simply know) - there are ways to defeat it, however, personally, I have not yet met such software. In addition - this method is tested and stable.


Hope this article was helpful. See you at the next.

P.S. Lazy guys, nerds, etc., do not cry for sources. This method is really simple. Besides, if copy/paste is the only programming technique you are aware of, then, probably, this blog is not the right place for you.

Wednesday, May 30, 2012

CreateRemoteThread. Bypass Windows 7 Session Separation

Internet is full of programmers' forums and those forums are full with questions about CreateRemoteThread Windows API function not working on Windows 7 (when trying to inject a DLL). Those posts made by lucky people, somehow, redirect you to the MSDN page dedicated to this API, which says: "Terminal Services isolates each terminal session by design. Therefore, CreateRemoteThread fails if the target process is in a different session than the calling process." and, basically, means - start the process from your injector as suspended, inject your DLL and then resume the process' main thread. This works... Most of the time... But sometimes you really need to inject your code into a running process. Isn't there a way to do that? Well, there is. As a matter of fact, it is so easy, that I decided not to attach my source code to this article (mainly, because I am too lazy to make it look readable :) ). It appears to be that I am not the only one lazy here :), so I have uploaded the source code.

Let me start as usual, with a note for nerds in order to avoid meaningless comments and stupid discussions. 
The code provided within the article is for example purposes only. Error checks have been omitted on purpose. Yes, there may be another, probably even better, way of doing this. No, manual DLL mapping is not better unless you have plenty of time and nothing to do with it.

All others, let's get to business :)


Opening the Victim Process

This is the easiest part. At this stage you will see whether you are able to inject your code or not (in case of a system process, for example). Nothing unusual here - you simply invoke the good old OpenProcess API

HANDLE WINAPI OpenProcess(
       DWORD dwDesiredAccess, /* in our case PROCESS_ALL_ACCESS */
       BOOL  bInheritHandle, /* no need, so FALSE */
       DWORD dwProcessId /* self explanatory enough */
);

which opens the process specified by dwProcessId and returns a handle to that process, unless, you have no sufficient rights to access that process.


Reading the Shellcode

What you usually see in the examples of shellcode over the internet, is an unsigned char array of hexadecimal values somewhere in the C code. Helps to keep the amount of files smaller, but is not really comfortable to deal with. I decided to store the shellcode in a separate binary file, produced with FASM (Flat Assembler):

use32
   ; offset of the LoadLibraryA address within the shellcode
   dd    func
   ; save all registers
   push  eax ebx ecx edx ebp edi esi
   ; get your EIP
   call  next
next:
   pop   eax
   mov   ebx, eax
   ; get the address of the DLL name
   mov   eax, string - next
   ; do this to avoid possible negative values (due to sign extend)
   movzx eax, al
   add   eax, ebx
   ; pass it to the LoadLibraryA API
   push  eax
   ; get the address of the LoadLibraryA function
   mov   eax, func - next
   movzx eax, al
   add   eax, ebx
   mov   eax, [eax]
   ; call LoadLibraryA
   call  eax
   ; restore registers
   pop   esi edi ebp edx ecx ebx eax
   ; return
   ret
func     dd 0x12345678 ; placeholder for the address
string:

Compiling this code with FASM.EXE will produce a raw binary file, where all offsets are 0 - based. There are some parts in the code above, that may require some additional explanation (for example, why does it not end with ExitThread()). I am aware of this and I will provide you with the explanation a little bit later.

For now, allocate an unsigned char buffer for your shellcode. Make this buffer large enough to contain the shellcode and the name of the DLL (my assumption is, that you passed that name as a command line parameter to your injector). with it's terminating zero.

Once you have read the shellcode into that buffer - append the name of the DLL (which may be a full path to the DLL) to the end of the shellcode with, for example, memcpy() function. Half done with it. Now we still have to "tell" the shellcode where the LoadLibraryA API function is located in memory. Fortunately, the load address randomization in Windows is far from being perfect (addresses  of loaded modules may vary between subsequent reboots, but are the same for all processes). This means that, just as in usual DLL injection, we obtain the address of this API in our process by calling good old GetProcAddress(GetModuleHandleA("kernel32.dll"), "LoadLibraryA") and save it to the "func" variable of the shellcode. Due to the fact that our shellcode may vary in size from time to time (that depends on the needs), we saved the offset to that variable in the first four bytes of the shellcode, which eliminates the need to hardcode the offset. Simply do the following:

*(unsigned int*)(shellcode_ptr + *(int*)(shellcode_ptr)) = (unsigned int)LoadLibraryA_address;

Our shellcode is ready now.


"Create remote thread" without CreateRemoteThread()

As the title of this paragraph suggests - we are not going to use the CreateRemoteThread(). In fact, we are not going to create any thread in the victim process (well, the injected DLL may, but the shellcode won't).


Code Injection

Surely, we need to move our shellcode into the victim process' address space in order to load or library. We are doing it in the same manner, as we would copy the name of the DLL in regular DLL injection procedure:
  1. Allocate memory in the remote process with
    LPVOID WINAPI VirtualAllocEx(
       HANDLE hProcess, /* the handle we obtained with OpenProcess */
       LPVOID lpAddress, /* preferred address; may be NULL */
       SIZE_T dwSize, /* size of the allocation in bytes */
       DWORD  flAllocationType, /* MEM_COMMIT */
       DWORD  flProtect /* PAGE_EXECUTE_READWRITE */
    );
    This function returns the address of the allocation in the address space of the victim process or NULL if it fails.
  2. Copy the shellcode into the buffer we've just allocated in the address space of the victim process:
    BOOL WINAPI WriteProcessMemory(
       HANDLE   hProcess, /* same handle as above */
       LPVOID   lpBaseAddress, /* address of the allocation */
       LPCVOID  lpBuffer, /* address of the local buffer with the shellcode */
       SIZE_T   nSize, /* size of the shellcode together with the appended                                 NULL-terminated string */
  3.    SIZE_T   *lpNumberOfBytesWritten /* if this is zero - check your code */
    );
    If the return value of this function is non zero - we have successfully copied our shellcode into the victim process' address space. It may also be a good idea to check the value returned in the lpNumberOfBytesWritten.

Make It Run
So, we have copied our shell code. The only thing left, is to make it run, but we cannot use the CreateRemoteThread() API... Solution is a bit more complicated.

First of all, we have to suspend all threads of the victim process. In general, suspending only one thread is enough, but, as we cannot know for sure what is going on there, we should suspend them all. There is no specific API that would provide us with the list of threads for a specified process, instead, we have to create a snapshot with CreateToolhelp32Snapshot, which provides us with the list of all currently running threads of all processes running in the system:

HANDLE WINAPI CreateToolhelp32Snapshot(
   DWORD dwFlags, /* TH32CS_SNAPTHREAD = 0x00000004 */
   DWORD th32ProcessID /* in this case may be 0 */
);

This function returns the handle to the snapshot, which contains information on all present threads. Once we have this, we "iterate through the list" with Thread32First and Thread32Next API functions:

BOOL WINAPI Thread32First(
   HANDLE hSnapshot, /* the handle to the snapshot */
   LPTHREADENTRY32 lpte /* pointer to the THREADENTRY32 structure */
);

The Thread32Next has the same prototype as Thread32First.

typedef struct tagTHREADENTRY32{
   DWORD dwSize; /* size of this struct; you have to initialize this field before use */
   DWORD cntUsage; 
   DWORD th32ThreadID; /* use this value to open thread for suspension */
   DWORD th32OwnerProcessID; /* compare this value against the PID of the victim 
                              to filter out threads of other processes */
   LONG  tpBasePri;
   LONG  tpDeltaPri;
   DWORD dwFlags;
} THREADENTRY32, *PTHREADENTRY32;

For each THREADENTRY32 with matching th32OwnerProcessID, open it with OpenThread() and suspend with SuspendThread:

HANDLE WINAPI OpenThread(
   DWORD dwDesiredAccess, /* THREAD_ALL_ACCESS */
   BOOL  bInheritHandle, /* FALSE */
   DWORD dwThreadId /* th32ThreadID field of THREADENTRY32 structure */
);

and

DWORD WINAPI SuspendThread(
   HANDLE hThread, /* Obtained by OpenThread() */
);

Don't forget to CloseHandle(openedThread) :)

Take the first thread, once it is opened (actually, you can do that with any thread that belongs to the victim process) and suspended, and get its CONTEXT (see "Community Additions" here) using the GetThreadContext API:

BOOL WINAPI GetThreadContext(
   HANDLE    hThread, /* handle to the thread */
   LPCONTEXT lpContext /* pointer to the CONTEXT structure */
);

Now, when all the threads of the victim process are suspended, we are may do our job. The idea is to redirect the execution flow of this thread to our shellcode, but make it in such a way, that the shellcode would return to where the suspended thread currently is. This is not a problem at all, as we have the CONTEXT of the thread. The following code does that just fine:

/* "push" current EIP of the thread onto its stack, so that the ret instruction in the shellcode returns the execution flow to this address (which is somewhere in WaitForSingleObject for suspended threads) */
ctx.Esp -= sizeof(unsigned int);
WriteProcessMemory(victimProcessHandle, 
                   (LPVOID)ctx.Esp, 
                   (LPCVOID)&ctx.Eip,
                   sizeof(unsigned int),
                   &bytesWritten);
/* Set the EIP to our injected shellcode; do not forget to skip the first four bytes */
ctx.Eip = remoteAddress + sizeof(unsigned int);

Almost there. All we have to do now, is resume the previously suspended threads in the same manner (iterating with Thread32First and Thread32Next with the same snapshot handle).

Don't forget to close the victim process' handle with CloseHandle() ;)


Shellcode

After all this, the execution flow in the selected thread of the victim process reaches our shellcode, which source code should be pretty clear now. It simply calls the LoadLibraryA() API function with the name/path of the DLL we want to inject.

One important note - it is a bad practice to do anything "serious" inside the DllMain() function. My suggestion is - create a new thread in DllMain() and do all the job there, so that it may return safely.

Hope this article was helpful.

Have fun injecting and see you at the next.




Wednesday, March 21, 2012

Linux Threads Through a Magnifier: Remote Threads

Source code for this article may be found here.

Sometimes, a need may rise to start a thread in a separate process and the need is not necessarily malicious. For example, one may want to replace library functions or to place some code between the executable and a library function. However, Linux does not provide a system call that would do anything similar to CreateRemoteThread Windows API despite the fact that I see people searching for such functionality. You may google for "CreateRemoteThread equivalent in Linux" yourself and see that at least 90% of the results end up with something like "why would you want to do that?" There is a certain type of people in forums, most likely, thinking if they do not have an answer, then, probably, it does not exist and no one would ever need it. Others truly believe, that if they know why, they can tell you how to do that in another way. The latest is sometimes true, but most of the time, the solution being requested is the only one acceptable and that's what people refuse to understand.

So, let's say, you need to inject a thread into a running process for whatever reason (may be you want to perform a "DLL injection" the Linux way - your business). Although, there is no specific system call to allow you that, there are plenty of other system calls and library functions that would "happily" assist you.

Unavoidable ptrace()
First time you take a look at ptrace() it is a bit frightening (just like ioctl()) - one function, lots of possible requests and go figure out when and which parameter is being ignored. In practice, it quite simple. This function is used by debuggers and in cases when one needs to monitor the execution of a process for whatever reason. We will use this function for thread injection in this article.

The first thing you would want to do is to attach to the target process:

   ptrace(PTRACE_ATTACH, pid, NULL, NULL);

PTRACE_ATTACH - request to attach to a running process;
pid - the ID of the process you want to attach to.

If the return value is equal to the pid of the target process - voila, you are attached. If it is -1, however, this means that an error has occurred and you need to check errno to know what has happened. you should keep in mind, that on certain systems you may not be able to attach to a process which is not a descendant of the attaching one or has not specified it as tracer (using prctl()). For example, in Ubuntu, since Ubuntu 10.10 this is exactly the situation. If you want to change that, however, you then need to locate your ptrace.conf file and set ptrace scope to 0.

Since I am using Ubuntu and I can only attach to a child process (unless I want some additional headache) and this is what I am going to cover in this article.


Preparations
The first step, just like in case of Windows, you need to write an injector. It will load the victim process, inject the shellcode and exit. This is the simplest part and the skeleton of such loader would look like this:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/ptrace.h>
#include <sys/wait.h>
#include <sys/user.h>

int   main(int argc, char** argv)
{
   pid_t   pid;
   int     status;

   if(0 == (pid = fork()))
   {
      // We are in the child process, so we just ptrace() and execl()
      ptrace(PTRACE_TRACEME, 0, NULL, NULL);
      execl(*(argv+1), NULL, NULL);
   }
   else
   {
      // We are in the parent (injector)
      ptrace(PTRACE_SETOPTIONS, pid, PTRACE_O_TRACEEXEC, NULL);
      // Wait for exec in the child
      waitpid(pid, &status, 0);
      
      // The rest of the code comes here

   }
   return 0;
}

As you can see, the loader forks and then behaves depending on the return value of the fork() function. If it returns 0, this means that we are in the child process (actually, you should check whether it returned -1, which would indicate an error), otherwise, it is a pid of the child process and we are in the parent.

Child
The child code does not have too many things to do. All that needs to be done is to tell the OS that it may be traced and replace itself with the victim executable by calling execl().

Parent
In case of parent, the situation is much different and much more complicated. You should tell the OS, that you want to get notification when the victim process issues sys_execve by calling ptrace() with PTRACE_SETOPTIONS  and PTRACE_O_TRACEEXEC. Then you simply waitpid().

When waitpid() returns (and you should check the return value for -1, which means error), it is still not the best time to start the injection. Especially, given that you may have no idea of what is where in the victim process. The next step is to wait for a system call to occur by telling the OS (and it would be good to skip a couple of system calls, so that the victim may initialize properly):

ptrace(PTRACE_SYSCALL, pid, NULL, NULL);

followed by a loop:

while(1)
{
   if(-1 == waitpid(pid, &status, 0))
   {
      //Some error occurred. Print a message and
      break;
   }

   if(WIFEXITED(status))
   {
      //The victim process has terminated. Print a message and
      break;
   }

   if(WIFSTOPPED(status))
   {
      // Here comes the actual injection code. Actually, all its stages.
   }
   
   if(WIFSIGNALED(status))
   {
      // The victim process received a signal and terminated. Print a message and
      break;
   }

   // All done.
   return 0;
}


Injection 
You should introduce a variable to count stages. Let's name it step

Stage 0 (step = 0)
I have not mentioned it, but ptrace() would notify you twice during a system call. First time right before the system call (so you can inspect registers), the second notification would arrive right after system call's completion (so you can inspect the return value). Therefore, this time we do nothing, but resume the traced victim:

ptrace(PTRACE_SYSCALL, pid, NULL, NULL);

and increment the stage variable.


Stage 1 (step = 1)
Backup victim's registers, portion of victim's code that would be overwritten with your shellcode and, finally, inject your shellcode.

Use ptrace(PTRACE_GETREGS, pid, NULL, regs) where regs is a pointer to struct user_regs (declared in sys/user.h). The content of the victim's registers would be copied there.

Use ptrace(PTRACE_PEEKTEXT, pid, address_in_victim, NULL) to copy the executable code from the victim (to make a backup) and ptrace(PTRACE_POKETEXT, pid, address_in_victim, shellcode) where address_in_victim is what its name suggests (you obtain the initial value from victim's RIP on 64 or EIP on 32 bit systems). Shellcode, however, contains bytes of the code being injected packed into an unsigned long value. You, most probably, would have to make those calls for several iterations, as I do not think your shellcode would be at most 8 bytes.

The start of your shellcode will allocate memory for the thread function (unless you are going to run code that already is there).

start:
   mov   rax, 9      ;sys_mmap
   mov   rdi, 0      ;requested address
   mov   rsi, 0x1000 ;one page
   mov   rdx, 7      ;PROT_READ | PROT_WRITE | PROT_EXEC
   mov   r10, 0x22   ;MAP_ANON | MAP_PRIVATE
   mov   r8, -1      ;fd
   mov   r9, 0       ;offset
   syscall
   db 0xCC

Increment stage variable. Resume the victim process with

ptrace(PTRACE_SINGLESTEP, pid, NULL, NULL);


Stage 2 (step = 2)
Ignore all stops until

0xCC == (unsigned char)(ptrace(PTRACE_PEEKTEXT, pid,
      ptrace(PTRACE_PEEKUSER, pid, offsetof(struct user, regs.rip), NULL), NULL) & 0xFF

which would mean that you have reached your break point. Check victim's rax register for return value

retval = ptrace(PTRACE_PEEKUSER, pid, offsetof(struct user, regs.rax), NULL);

and abort if it contains an error code.

You have to increment the Instruction Pointer (RIP/EIP) before letting the victim to resume:

ptrace(PTRACE_POKEUSER, pid, offsetof(struct user, regs.rip),
       ptrace(PTRACE_PEEKUSER,pid, offsetof(struct user, regs.rip), NULL) + 1);


Increment stage counter and 

ptrace(PTRACE_SINGLESTEP, pid, NULLNULL);


Stage 3 (step = 3)
After allocating memory, your shellcode should copy the thread function there and, actually, create a thread (similar to this).

You should, again, ignore all stops as long as

0xCC != (unsigned char)(ptrace(PTRACE_PEEKTEXT, pid,

      ptrace(PTRACE_PEEKUSER, pid, offsetof(struct user, regs.rip), NULL), NULL) & 0xFF

Once you get to this breakpoint, you know that the thread has been initiated and the injector has done what it was written for.

Now you have to restore the victim to its initial, pre-injection state by restoring the values of the registers:

ptrace(PTRACE_SETREGS, pid, NULL, regs);

and, which is even more important - you have to restore the backed up code by copying back the backed up unsigned longs.

The last thing would be detaching from the victim process:

ptrace(PTRACE_DETACH, pid, NULL, NULL);

At this point, your injector may safely exit letting the victim to continue execution.

Voila! You have just injected a thread into another process.

Output of the injector, victim program and the injected thread























P.S. Shared Object Injection (a la DLL injection)
Although, injection of executable code is quite simple, injection of shared object is a different story. Despite the fact, that Linux kernel provides sys_uselib system call, it may be unavailable on some systems. In such case, you have several options:

  • Check whether the victim uses libdl (dlopen(), dlsym() and dlclose() functions, parse the image and obtain addresses of relevant functions. However, not every program uses libdl.
  • Use sys_uselib system call. However, it may be unavailable.
  • Write your own shared object loader. This may be a real pain, but you would be able to reuse it whenever you need.

Hope this post was helpful. See you at the next.

Friday, March 2, 2012

Defeating Packers for Static Analysis of Malicious Code

I doubt whether there is anybody in either AV industry or among reverse engineers who does not know what a software packer is (for those who don't - this article may help). Malware research and reverse engineering forums are full of packers' related questions, descriptions thereof, unpacking suggestions and links to both packers and unpackers. In short - people have been doing a lot of precious work on defeating packers and protectors.

However, for those of us who are not afraid of static analysis, there is an easier way (I'd dare to say "generic") to handle packers and protectors and retrieve the unpacked form of the executable (cannot hold my self from adding a note for nerds: no, this does not include reversing virtual machines like Oreans' one. This is up to you). So, the main problem is obtaining the unpacked version of code as all the rest may be well handled from there. What we actually need is a dump of unpacked executable. There are lots of memory dumping programs, but some protectors "know" how to handle them, therefore, this article explains a simple and short way of obtaining such dump without teasing the implemented protections.

Knock Knock
First of all, we need to, let's say, get into the process. There are at least two ways to do that:

  • Use the OpenProcess Windows API with, preferably, PROCESS_ALL_ACCESS and read/write from/to process' memory.
  • Inject our code into the process' memory space (simply a code injection or a DLL injection).
My preference is the second one as you mostly have more power operating from inside.

There are several ways to inject a DLL into another process, e.g. calling LoadLibraryA as a remote thread in the victim process or even this one (my preference is the second one again). This is in deed the easiest part. My personal suggestion would be to create a suspended process, inject your DLL and then resume the main thread of the created process as this provides you with greater flexibility.

Set the Trap
No, I am not referring to 0xCC (trap to debugger). Trap, in this case, means something that would trigger the dumper embedded into the injected DLL and cause it to dump the unpacked image, for example, patch one of the API functions with a jmp instruction, which would redirect the execution flow to where we want. Be careful with this approach, as your patch may be well "overpatched" by the protection mechanism of the target executable. Let me give you a couple of suggestions: never patch the first bytes of the API (I assume this code is not a production code, so you may let it be bound to your version of Windows); patch as deep as possible - meaning leave the kernel32.dll alone and go further to ntdll.dll where possible. 

For example, if your target executable outputs a string to a console, that may be a good idea to patch either WriteConsoleA or WriteConsoleW API  function. However, it may be an even better idea to go deeper and patch WriteConsoleInternal (Win7) and install your notification jump there. Once that API is called - chances are that the executable has been fully unpacked. As an alternative, you may simply create a new thread in your DLL and Sleep it for several milliseconds (or even seconds) and then dump the memory.

You may perform these actions in the DllMain of your injected DLL, on the other hand, you may create a separate procedure for this, but then you'd have to use this approach or something similar.

Dumper Triggered
No matter how (either by the API patch or our thread) the dumper is triggered. Sure thing - we are not going to dump the whole memory allocated by the process. We just need the image. The easiest way of getting the information on ImageBase and SizeOfImage of the target module (usually the main module of the process) is to find the corresponding entry in PEB (you may want to check the "Hiding Injected DLL in Windows" post to get more information on PEB and related structures). However, it is important to mention that you HAVE to be careful with that, as the information in PEB may be altered by the protection scheme of the victim executable. Having found the base address and the size of the image, just write the content of that memory region to a file (make sure to take note of image's base address if you are dumping DLL). Quite simple, isn't it? Well, not really. You have to check for memory protection of every region you are currently going to dump as it may have either PAGE_WRITE or PAGE_EXECUTE access rights only, meaning that you cannot access it for reading. Once done with this, you may either let the program execution to continue or terminate the process.

In addition, it is strongly recommended to suspend all the threads of the process, except the thread our code is running in.

Using Dump
Nothing's easier - load the dump into IDA Pro and see how good it handles it.

P.S. Suspending/Resuming Threads
Suspending threads is a bit annoying as you have to get the IDs of all the threads currently running in the system, then select those with process ID of your process and suspend then. The same procedure is applicable for resuming suspended threads.

First of all CreateToolhelp32Snapshot (MSDN):

HANDLE WINAPI CreateToolhelp32Snapshot(
              DWORD wdFlags,
              DWORD th32ProcessID
       );

You have to specify TH32_SNAPTHREAD as flags in order to get threads information. If the return value is not NULL, the you may proceed with Thread32First:

BOOL WINAPI Thread32First(
            HANDLE          hSnapshot,
            LPTHREADENTRY32 lpte
            );

followed by subsequent calls to Thread32Next (has the same arguments) until the return value is FALSE.

The functions Thread32First and Thread32Next fill the THREADENTRY32 structure which has the following format:

typedef struct tagTHREADENTRY32
{
   DWORD dwSize; /* Should be set to sizeof(THREADENTRY32) prior 
                    to calling Thread32First */
   DWORD cntUsage;
   DWORD th32ThreadID;
   DWORD th32OwnerProcessID;
   LONG  tpBasePri;
   LONG  tpDeltaPri;
   DWORD dwFlags;
} THREADENTRY32, *PTHREADENTRY32;

Fields of interest for you would be the th32OwnerProcessID and the th32ThreadID. Compare the th32OwnerProcessID with the ID of the process (previously obtained with GetCurrentProcessId()) your code is running in. If those values are equal, then you have to open the thread with:

HANDLE WINAPI OpenThread(
              DWORD dwDesiredAccess, /* Would be THREAD_ALL_ACCESS */
              BOOL  bInheritHandle, /* FALSE */
              DWORD dwThreadId      /* th32ThreadID */
              );

Then suspend the thread with:

DWORD WINAPI SuspendThread( HANDLE hThread );

while passing the handle obtained with OpenThread().
You have to resume threads once you have saved the dump by calling:

DWORD WINAPI ResumeThread( HANDLE hThread );

Don't forget to close each thread handle with CloseHandle.


That's it. Hope this post was helpful (at least I used this method a lot).
See you at the next.


Friday, December 16, 2011

Executable Code Injection the Interesting Way

So. Executable code injection. In general, this term is associated with malicious intent. It is true in many cases, but in, at least, as many, it is not. Being malware researcher for the most of my career, I can assure you, that this technique appears to be very useful when researching malicious software, as it allows (in most cases) to defeat its protection and gather much of the needed information. Although, it is highly recommended not to use such approach, sometimes it is simply unavoidable.

There are several ways to perform code injection. Let's take a look at them.

DLL Injection
The most simple way to inject a DLL into another process is to create a remote thread in the context of that process by passing the address of the LoadLibrary API as a ThreadProc. However, it appears to be unreliable in modern versions of Windows due to the address randomization (which is currently not true, but who knows, may be once it becomes real randomization).

Another way, a bit more complicated, implies a shell code to be injected into the address space of another process and launched as a remote thread. This method offers more flexibility and is described here.

Manual DLL Mapping
Unfortunately, it has become fashionable to give new fancy names to the old good techniques. Manual DLL Mapping is nothing more than a complicated code injection. Why complicated, you may ask - because it involves implementation of custom PE loader, which should be able to resolve relocations. Adhering the Occam's Razor principle, I take the responsibility to claim, that it is much easier and makes more sense to simply allocate memory in another process using VirtualAllocEx API and inject the position independent shell code. 

Simple Code Injection
As the title of this section states, this is the simplest way. Allocate a couple of memory blocks in the address space of the remote process using VirtualAllocEx (one for code and one for data), copy your shell code and its data into those blocks and launch it as a remote thread.

All the methods listed above are covered well on the Internet. You may just google for "code injection" and you will get thousands of well written tutorials and articles. My intention is to describe a more complex, but also a more interesting way of code injection (in a hope that you have nothing else to do but try to implement this).

Before we start:
Another note for nerds. 
  • The code in this article does not contain any security checks unless it is needed as an example.
  • This is not malware writing tutorial, so I do not care whether the AV alerts when you try to use this method.
  • No, manual DLL mapping is not better ;-).
  • Neither do I care about how stable this solution is. If you decide to implement this, you will be doing it at your own risk.
Now, let's have some fun.




Disk vs Memory Layout
Before we proceed with the explanation, let's take a look at the PE file layout, whether on disk or in memory, as our solution relies on that.

This layout is logically identical for both PE files on disk and PE files in memory. The only differences are that some parts may not be present in memory and, the most important for us, on disk items are aligned by "File Alignment" while in memory they are aligned by "Page Alignment" values, which, in turn may be found in the Optional Header. For full PE COFF format reference check here.

Right now, we are particularly interested in sections that contain executable code ((SectionHeader.characteristics & 0x20000020) != 0). Usually, the actual code does not fill the whole section, leaving some parts simply padded by zeros. For example, if our code section only contains 'ExitProcess(0)', which may be compiled into 8 bytes, it will still occupy FileAlignment bytes on disk (usually 0x200 bytes). It will take even more space in memory, as the next section may not be mapped closer than this_section_virtual_address + PageAlignement  (in this particular case), which means that if we have 0x1F8 free bytes when the file is on disk, we'll have 0xFF8 free bytes when the file is loaded in memory.
The "formula" to calculate available space in code section is next_section_virtual_address - (this_section_virtual_address + this_section_virtual_size) as virtual size is (usually) the amount of actual data in section. Remember this, as that is the space that we are going to use as our injection target.
It may happen, that the target executable does not have enough spare space for our shell code, but let this not bother you too much. A process contains more than one module (the main executable and all the DLLs). This means that you can look for spare space in the code sections of all modules. Why only code sections? Just in order not to mess too much with memory protection.

Shellcode
The first and the most important rule for shellcode - it MUST be position independent. In our case, this rule is especially unavoidable (if you may say so) as it is going to be spread all over the memory space (depends on the size of your shell code, of course). 

The second, but not less important rule - carefully plan your code according to your needs. The less space it takes, the easier the injection process would be.

Let's keep our shell code simple. All it would do is interception of a single API (does not matter which one, select whichever you want from the target executable's import section), and show a message box each time that API is called (you should probably select ExitProcess for interception if you do not want the message box popping up all the time).

Divide your shellcode into independent functional blocks. By independent, I mean that it should not have any direct or relative calls or jumps. Each block should have one data field, which would contain the address of the table containing addresses of all our functions (and data if needed). Such mechanism would allow us to spread the code all over the available space in different modules without the need to mess with relocations at all.

The picture on the left and the diagram below will help you to better understand the concept. 
Init - our initialization function. Once the code is injected, you would want to call this function as a remote thread.
Patch - this block is responsible for actually patching the import table with the address of our Fake.

The code in each of the above blocks will have to access Data in order to retrieve addresses of functions from other blocks.

Your initialization procedure would have to locate the KERNEL32.DLL in memory in order to obtain the addresses of LoadLibrary (yes, it would be better to use LoadLibrary rather then GetModuleHandle), GetProcAddress and VirtualProtect API functions which are crucial even for such a simple task as patching one API call. Those addresses would be stored in Data.

The Injector
While the shellcode is pretty trivial (at least in this particular case), the injector is not. It will not allocate memory in the address space of another process (if possible, of course). Instead, it will parse the the PEB (Process Environment Block) of the victim in order to get the list of loaded modules. Once that is done, it will parse section headers of every module in order to create list of available memory locations (remember, we prefer code sections only) and fill the Data block with appropriate addresses. Let's take a look at each step.

First of all, it may be a good idea to suspend the process by calling SuspendThread function on each of its threads. You may want to read this post about threads enumeration. One more thing to remember is to open the victim process with the following flags: PROCESS_VM_READ | PROCESS_VM_OPERATION | PROCESS_VM_WRITE | PROCESS_QUERY_INFORMATION | PROCESS_SUSPEND_RESUME in order to be able to perform all the following operations. The function itself is quite simple:

DWORD WINAPI SuspendThread(__in HANDLE hThread);

Don't forget to resume all threads with ResumeThread once the injection is done.

The next step would be calling the NtQueryInformationProcess function from the ntdll.dll. The only problem with it is that it has no associated import library and you will have to locate it with GetProcAddress(GetModuleHandle("ntdll.dll"), "NtQueryInformationProcess"), unless you have a way to explicitly specify it in the import table of your injector. Also, try LoadLibrary if the GetModuleHandle does not work for you.

NTSTATUS WINAPI NtQueryInformationProcess(
   __in      HANDLE ProcessHandle,
   __in      PROCESSINFOCLASS ProcessInformationClass, /* Use 0 in order to 
                                               get the PEB address */
   __out     PVOID ProcessInformation,  /* Pointer to the PROCESS_BASIC_INFORMATION
                                                       structure */
   __in      ULONG ProcessInformationLength, /* Size of the PROCESS_BASIC_INFORMATION
                                                     structure in bytes */
   __out_opt PULONG ReturnLength
);

typedef struct _PROCESS_BASIC_INFORMATION
{
   PVOID     ExitStatus;
   PPEB      PebBaseAddress;
   PVOID     AffinityMask;
   PVOID     BasePriority;
   ULONG_PTR UniqueProcessId;
   PVOID     InheritedFromUniqueProcessId;
} PROCESS_BASIC_INFORMATION;

The NtQueryInformationProces will provide you with the address of the PEB of the victim process. This post will explain you how to deal with PEB content. Of course, you will not be able to access that content directly (as it is in the address space of another process), so you will have to use WriteProcessMemory and ReadProcessMemory functions for that.

BOOL WINAPI WriteProcessMemory(
   __in   HANDLE   hProcess,
   __in   LPVOID   lpBaseAddress,  /* Address in another process */
   __in   LPCVOID  lpBuffer,  /* Local buffer */
   __in   SIZE_T   nSize,  /* Size of the buffer in bytes */
   __out  SIZE_T*  lpNumberOfBytesWritten
};

BOOL WINAPI ReadProcessMemory(
   __in   HANDLE   hProcess,
   __in   LPCVOID  lpBaseAddress, /* Address in another process */
   __out  LPVOID   lpBuffer,  /* Local buffer */
   __in   SIZE_T   nSize,  /* Size of the buffer in bytes */
   __out  SIZE_T*  lpNumberOfBytesRead
};

Due to the fact that you are going to deal with read only memory locations, you should call VirtualProtectEx in order to make those locations writable (PAGE_EXECUTE_READWRITE).  Don't forget to restore memory access permissions to PAGE_EXECUTE_READ when you are done. 

BOOL WINAPI VirtualProtectEx(
   __in  HANDLE hProcess,
   __in  LPVOID lpAddress, /* Address in another process */
   __in  SIZE_T dwSize,  /* Size of the range in bytes */
   __in  DWORD  flNewProtect, /* New protection */
   __out PDWORD lpflOldProtect
};

You may also want to change the VirtualSize of those sections of the victim process you used for injection in order to cover the injected code. Just adjust it in the headers in memory.

That's all folks. Let me leave the hardest part (writing the code) up to you this time. 

Hope this post was interesting and see you at the next.