# Theodosius - Jit linker, Mapper, Mutator, and Obfuscator
Theodosius (Theo for short) is a jit linker created entirely for obfuscation and mutation of both code, and code flow. The project is extremely modular in design and supports
both kernel and usermode projects. Since Theo inherits HMDM (highly modular driver mapper), any vulnerable driver that exposes arbitrary MSR writes, or physical memory read/write can be used with this framework to map unsigned code into the kernel. This is possible since HMDM inherits VDM (vulnerable driver manipulation), and MSREXEC (elevation of arbitrary MSR writes to kernel execution).
Since Theo is a jit linker, unexported symbols can be jit linked. Resolving such symbols is open ended and allows the programmer of this framework to handle how they want to resolve symbols. More on this later (check out example projects).
A linker is a program which takes object files produces by a compiler and generates a final executable native to the operating system. A linker interfaces with not only object files but also static libraries, "lib" files. What is a "lib" file? Well a lib file is just an archive of obj's. You can invision it as a zip/rar without any compression, just concatination of said object files.
Theo is a jit linker, which means it will link objs together and map them into memory all at once. For usability however, instead of handling object files, Theo can parse entire lib files and extract the objects out of the lib.
If you define a c++ file called "main.cpp" the compiler will generate an object file by the name of "main.obj". When you refer to data or code defined in another c/c++ file, the linker uses a symbol table to resolve the address of said code/data. In this situation I am the linker and I resolve all of your symbols :).
Static linking is when the linker links entire routines not created by you, into your code. Say `memcpy` (if its not inlined), will be staticlly linked with the CRT. Static linking also allows for your code to be more independant as all the code you need you bring with you. However, with Theo, you cannot link static libraries which are not compiled with `mcmodel=large`. Theo supports actual static linking, in other words, using multiple static libraries at the same time.
Dynamic linking is when external symbols are resolved at runtime. This is done by imports and exports in DLL's (dynamiclly linked libraries). Theo supports "dynamic linking", or in better terms, linking against exported routines. You can see examples of this inside of both usermode and kernelmode examples.
For integration with visual studios please open install [llvm2019](https://marketplace.visualstudio.com/items?itemName=MarekAniola.mangh-llvm2019) extension, or [llvm2017](https://marketplace.visualstudio.com/items?itemName=LLVMExtensions.llvm-toolchain) extension. Once installed, create or open a visual studio project which you want to use with LLVM-Obfuscator and Theo. Open ***Properties*** --> ***Configuration Properties*** ---> ***General***, then set ***Platform Toolset*** to ***LLVM***.
Once LLVM is selected, under the ***LLVM*** tab change the clang-cl location to the place where you extracted [clang-cl.rar](https://githacks.org/_xeroxz/theodosius/-/blob/cc9496ccceba3d1f0916859ddb2583be9362c908/resources/clang-cl.rar). Finally under ***Additional Compiler Options*** (same LLVM tab), set the following: `-Xclang -std=c++1z -Xclang -mcode-model -Xclang large -Xclang -fno-jump-tables -mllvm -split -mllvm -split_num=4 -mllvm -sub_loop=4`.
Please refer to the [LLVM-Obfuscator Wiki](https://github.com/obfuscator-llvm/obfuscator/wiki) for more information on commandline arguments.
Theodosius uses the same class structure as HMDM does. Its a highly modular format which allows for extreme usage, supporting almost every idea one might have. In order to use Theo, you must first define three lambdas, `theo::memcpy_t` a method of copying memory, `theo::malloc_t` a method to allocate executable memory, and lastely `theo::resolve_symbol_t` a lamdba to resolve external symbols.
This uses VDM to syscall into memcpy exported by ntoskrnl... If you want to do something in usermode you can proxy memcpy to `WriteProcessMemory` or any other method of writing memory.
This lambda is used to allocate executable memory. Any method will do as long as the memcpy lambda can write to the allocated memory. An MSREXEC example for this lambda is defined below.
This lambda uses MSREXEC to allocate kernel memory via ExAllocatePool. However this is completely open ended on how you want to do it, you can allocate your memory into discarded
sections, you can allocate your memory in another address space, etc... Its extremely modular.
Another, yet simple, usermode example for this lambda is defined below.
### `theo::resolve_symbol_t` - resolve external symbol
This lambda will try and resolve external symbols. Symbols which are not defined inside of any object files. For example `PiddbCacheTable`, an unexported ntoskrnl symbol, which normally people sig scan for, can now be jit linked. This is possible by parsing a MAP file, however you can recode this to support PDB's, etc. Again its completely opened ended on how you want to resolve symbols.
Another example of this lambda can be viewed in the usermode examples. This routine simply loops over every single module mapped into the specific process you want to map/link with.
Once all three lambdas are defined, you can then create a `theo::hmm_ctx` (highly modular mapper context). This class is like the one from HMDM however it requires an extra lambda to resolve external symbols.
The entry point of the mapped code is not invoked by `hmm_ctx`, but rather its left up to you to call. An example of calling the entry point can be seen below.
In order to allow for a routine to be scattered throughout a 64bit address space, RIP relative addressing must not be used. In order to facilitate this, a very special version
of clang-cl is used which can use `mcmodel=large`. This will generate instructions which do not use RIP relative addressing when referencing symbols outside of the routine in which the
instruction itself resides. The only exception to this is JCC instructions, (besides call) also known as branching instructions. Take this c++ code for an example:
```cpp
ObfuscateRoutine
extern "C" int ModuleEntry()
{
MessageBoxA(0, "Demo", "Hello From Obfuscated Routine!", 0);
UsermodeMutateDemo();
UsermodeNoObfuscation();
}
```
This c++ function, compiled by clang-cl with `mcmodel=large`, will generate a routine with the following instructions:
```
0x00: ; void UsermodeNoObfuscation(void)
0x00: public ?UsermodeNoObfuscation@@YAXXZ
0x00: ?UsermodeNoObfuscation@@YAXXZ proc near ; CODE XREF: ModuleEntry+42↓p
As you can see from the code above, (sorry for the terrible syntax highlighting), references to strings and calls to functions are done by first loading the address of the symbol into a register and then interfacing with the symbol.
Each of these instructions can be anywhere in virtual memory and it would not effect code execution one bit. However this is not the case with routines which have conditional branches. Take the following c++ code for example.
Uh oh, `jnb loc_99`?, thats RIP relative! In order to handle branching operations, a "jump table" is generated by `obfuscation::obfuscate` explicit default constructor. Instead of branching to the RIP relative code, it will instead branch to an inline jump (`JMP [RIP+0x0]`). As demonstrated below, the branching operation is altered to branch to an asbolute jump.
The linker is able to get the address of the branching code by taking the rip relative virtual address of the branching operation, which is a signed number, and adding it to the current byte offset into the current routine, plus the size of the branching instruction. For example `LoopDemo@17` + size of the branching instruction, which is six bytes, then adding the signed relative virtual address (0x2A). The result of this simple calculation gives us `LoopDemo@65`, which is correct, the branch goes to `add rsp, 28h` in the above example.
The usage of the word obfuscation in this project is use to define any changes made to code, this includes code flow. `obfuscation::obfuscate`, a base class, which is inherited and expanded upon by `obfuscation::mutation`, obfuscates code flow by inserting `JMP [RIP+0x0]` instructions after every single instruction. This allows for a routine to be broken up into unique allocations of memory and thus provides more canvas room for creative ideas.
The base class, as described in the above section, contains a handful of util routines and a single explicit constructor which is the corner stone of the class. The constructor fixes JCC relative virtual addresses so that if the condition is met, instead of jumping instruction pointer relativitly, it will jump to an addition jmp (`JMP [RIP+0x0]`).
LEA's, nor CALL's are rip relative, even for symbols defined inside of the routine in which the instruction is compiled into. In other words JCC instructions are the only instruction pointer relative instructions that are generated.
This class inherits from `obfuscate` and adds additional code, or "mutation". This class is a small example of how to use inheritance with `obfuscate` base class. It generates a stack push/pop palindrome. The state of the stack is restored before the routines actual instruction is executed. The assembly will now look like this in memory:
This example uses MSREXEC and Theodosius to map unsigned code into the kernel. This example is inside of the "Examples" folder. I would also like to note that in this demo external unexported ntoskrnl symbols are resolved by using a MAP file. This map file looks like this:
Mind the space at the beginning of each line. If you want to generate a file like this, put ntoskrnl.exe into IDA Pro and then click File ---> Produce File ---> Create MAP File, dont select "Segment Information", but do select "Demangled Names". After the MAP file is generate, please delete all of the garbage at the beginning of the file. I.E delete all spaces and "Address, Public By Value" stuff.
Once you have generated a map file for ntoskrnl.exe, or any other binary you want to link with, you can then use it to resolve external symbols. In the `DemoDrv` project, I reference two external symbols. One being `PiddbCacheTable`, and the other being a win32kfull.sys export.
```cpp
// this is a demo of resolving non-exported symbols...
// win32kfull.sys export example...
extern "C" void NtUserRegisterShellPTPListener();
extern "C" void* PiDDBCacheTable;
```
These two symbols are simply printed out via DbgPrint.
// example of calling other obfuscated/non obfuscated routines...
PrintCR3();
LoopDemo();
}
```
Once compiled the assembly will look like this. Note that each reference to symbols is done via a relocation to an absolute address. This means strings can (and will) be mapped into their own allocation of memory.
Theo calculates the size of each symbol by subtracting the address of the next symbol (in the same section), from the address of the symbol itself. If the symbol is the last one in a section, the distance between the start of the symbol and the end of the section is used. Now lets take a look at what happens when we link/map this routine. Theo starts by allocating space for all non-obfuscated symbols.
> ??_C@_0BG@GFEIGDHO@?$DO?5Current?5CR3?5?$DN?50x?$CFp?6?$AA@ allocated at = 0xFFFF998BC5361FB0, size = 22
> ??_C@_0BB@HGKDPLMC@?$DO?5Loop?5Demo?3?5?$CFd?6?$AA@ allocated at = 0xFFFF998BC5364FA0, size = 17
> ??_C@_0BA@LBLNBFIC@?$DO?5Hello?5World?$CB?6?$AA@ allocated at = 0xFFFF998BC5365FA0, size = 16
> ??_C@_0BK@PLIIADON@?$DO?5PiDDBCacheTable?5?$DN?50x?$CFp?6?$AA@ allocated at = 0xFFFF998BC5366EA0, size = 26
> ??_C@_0DE@FLODGMCP@?$DO?5win32kfull?$CBNtUserRegisterShell@ allocated at = 0xFFFF998BC5366EE0, size = 52
> ??_C@_0BD@JGNLDBEI@?$DO?5DrvEntry?5?$DN?50x?$CFp?6?$AA@ allocated at = 0xFFFF998BC5366F40, size = 19
> ?PrintCR3@@YAXXZ allocated at = 0xFFFF998BC5366F80, size = 58
```
As you can see, each string gets its own pool, each global variable does too, and every non-obfuscated routine is mapped into its own pool. The memory however, has not been copied yet since there are relocations that need to happen before they are copied into memory (in PrintCr3).
The next thing Theo does is allocate space for obfuscated routines. In the `DemoDrv`, there is a demo for each type of obfuscation (just mutation and control flow obfuscation for now).
As you can see, Theo uses Zydis to go over all routines marked for obfuscation and generates new symbols for each instruction inside of the routine. The symbol goes by `[RoutineName]@[Instruction Offset]`. Note that JCC's are indeed rip relative, these need to be fixed.
Note that in DemoDrv there is a function called "LoopDemo" which is obfuscated. Instead of the JCC instruction branching to the conditional code, it instead branches to an inline jmp. If it doesnt branch, then it simply jumps to the next instruction like normal.
As you can see above, this is what Theo generates for JCC's. Also note that this clang compiler does not generate RIP relative LEA's or CALL's. The only RIP relative stuff Theo deals with are JCC's.
This example uses WinAPI's to allocate virtual memory in another process and also to copy virtual memory. Only exported routines from loaded DLL's in the target process can be resolved.