Metamorphic engines

Kewin Rausch

Jan 22, 2017

CPOL


How an application performs metamorphism to adapt and survive in a "hostile" environment.

Download the source here

Introduction

Metamorphic techniques aim to give applications the ability to physically change their code without moving away from their objectives. This means that the actual code and structure of the application are dynamic and subject to change, but what the application wants to do remains unchanged. If your program's objective is, for example, to perform some mathematical operations or manage I/O streams, then it will continue to do that and produce the same results… but how it reaches this objective changes.

Why should an application perform such a transformation on itself? Well, this technique was developed some years ago (a lot of years ago, actually), primarily to fool antiviruses. It is one of the techniques that raise the level of harm of viral applications, because it basically invalidates antivirus signature databases, which are the most important information used to locate, and later quarantine or delete, viral infections on your PC. Together with polymorphic engines (see my introduction to them here), they are the methods that allow an application to run undetected on an Operating System.

Background

All the concepts found in this article are for advanced users; using this technique without a deep knowledge of CPU instruction sets will likely result in a total failure of your application. C programming skills are a must, since this is the language chosen to develop the engine (my favourite language and my first choice in terms of programming, actually). Knowledge of the common build tools (like GCC and linkers) is mandatory, together with assemblers (NASM in this article) and additional utilities provided by the Linux command line (objdump or readelf).

You must also know how an application is compiled and packed to be loaded and run in an Operating System. This means understanding the PE and ELF file formats, for Windows and Linux platforms respectively. Since the technology developed here is valid for the x86 processor family, basically all Operating Systems using this platform can be affected by the following software.

As always, I’ll try to keep the description of the application internals as simple as I can, to allow a wider range of readers to understand the following concepts.

Warning: discretion is required.

These techniques can be embedded in any application to metamorphose a piece of code, an entire application, or even the engine itself. As I said, they are mainly involved in malware development, but can be used more generically for both bad and good purposes. It is up to you to decide in which direction you want to proceed. You can design applications which hide in plain sight, or antivirus utilities which escape malware detection and destruction (some viruses disable antiviruses, or render them harmless).

Basically, what is presented in the next chapters is something similar to a real-world weapon, in these days of Virtual Wars.

How does metamorphism work?

The main objective of a metamorphic engine is to change part of the code of the routines in certain areas of the application so that they change "shape", but not their job. Since this concept is not immediately obvious, let's introduce it with a metaphor: metamorphism in natural language. Applying metamorphism to a sentence written in English means finding synonyms with which you can change the shape of the sentence, while keeping its meaning unchanged.

If we take the following sentence, for example:

The device will operate under water

and we instruct our metamorphic engine with the rule:

device <--> apparatus

what we will end up having after the sentence transformation will be something like:

The apparatus will operate under water

Notice that, even though the words of the sentence changed, the meaning remains the same. I'm not a native English writer/speaker, so please try to understand the concept behind the previous example; it is not so easy for me to play with metaphors or English grammar.

If we transpose the previous sentence into CPU instructions, so that every word is an opcode that the processing unit can understand and execute, what we will end up having is something of the following format:

Thedevicewilloperateunderwater

This is because CPUs do not need any spaces, since their vocabulary is far smaller than the human one. Every word does exactly one job, which ultimately results in performing operations on the registers. This is a rough picture of how instruction sets work in the CPU, and I have reduced the complexity behind it a lot. It allows me to introduce the following important problem with CPU instructions and metamorphism: the loss of alignment with the original code.

Losing alignment with the original code means starting to read the first executable opcode from the wrong point of the sentence. If, for example, the initial 'T' is removed from the sentence, what we will find is something like:

hedevicewilloperateunderwater

which is then translated as the following sentence:

hed evicew illo perateu nderw ater

While your brain, the result of millions of years of evolution, realizes that something is wrong and that all the words are "shifted" by one, a CPU does not. It will really try to run the operation "hed", and in the worst case this will generate an exception. At this point an OS usually handles the exception by terminating the application and reporting the error to the user, who will realize that something nasty is happening on their computer.

The ability to interpret the instruction set correctly, together with the location of a good starting point in the procedures to modify, are essential requirements for this technique. Usually metamorphic engines are "specially crafted" directly in assembly language, in such a way that they can only change their own behavior or swap parts of their own code to perform metamorphism.

I did not like such restrictions, and so I built a general-purpose metamorphic engine.

ISA

The Instruction-Set Analyzer (ISA) is a compact x86 scanner which provides the ability to navigate through CPU instructions without losing alignment with them. This means that, once a reliable starting point is located, you can compute how long the next x86 instruction is. If you know that, you can locate the right "spaces" between the CPU words, and thus follow the execution path correctly.

ISA is neither a disassembler nor a CPU emulator (but it can be extended to cover such roles); its only job is to determine the length of the opcodes provided, in order not to lose alignment with the code while scanning for patterns to transform into other instructions. ISA has been instructed to evaluate opcode sizes and formats for 32- and 64-bit architectures, following the Intel manual 325462 (search for it on the Intel web site for more information or to download your copy).

You can find the ISA source code in the isa folder, located in the project root. The source basically contains some tables which summarize the logic of the Intel instruction set. Using these tables, the analyzer can detect whether an opcode is a prefix, whether it has ModR/M or SIB bytes, whether displacement bytes follow them, and whether the instruction requires an immediate value.

The prefix table, for example, is organized as follows:

#define TT            (X86_ARCH_32)
#define SF            (X86_ARCH_64)
#define AL            (TT | SF)

static char x86_pre[256] = {
/*       00  01  02  03  04  05  06  07  08  09  0a  0b  0c  0d  0e  0f       */
/* 00 */  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, AL,
/* 10 */  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
/* 20 */  0,  0,  0,  0,  0,  0, AL,  0,  0,  0,  0,  0,  0,  0, AL,  0,
/* 30 */  0,  0,  0,  0,  0,  0, AL,  0,  0,  0,  0,  0,  0,  0, AL,  0,
/* 40 */  0,  0,  0,  0,  0,  0,  0,  0, SF, SF, SF, SF, SF, SF, SF, SF,
/* 50 */  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
/* 60 */  0,  0,  0,  0, AL, AL, AL, AL,  0,  0,  0,  0,  0,  0,  0,  0,
/* 70 */  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
/* 80 */  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
/* 90 */  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
/* a0 */  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
/* b0 */  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
/* c0 */  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
/* d0 */  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
/* e0 */  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,
/* f0 */ AL,  0, AL, AL,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0
};

and identifies that 0x26 (row 2, column 6) is a valid prefix for both 32- and 64-bit architectures. Similar tables (for the ModR/M byte and the immediates of one-, two- and three-byte opcodes) can be found in the same source file, together with comments that help to understand how the overall logic has been organized.

The only public procedure in the x86.h header is the following:

int x86_decode(unsigned char * buf, x86arch arch, x86op * op);

and allows you to decode the opcode pointed to by buf, given the architecture to take into account and a pointer to an opcode structure that will be filled with useful information. The return value is negative on error; otherwise it is the length of the evaluated opcode.

ISA has been debugged to make sure it won't lose alignment while evaluating opcodes, but since this is a project made in my free time, I am not 100% sure that it covers all possible opcodes without errors. In the test directory you can find the tests carried out on ISA, which are composed of specially crafted assembly source files for both 32- and 64-bit architectures. These files list all the valid operations (one per opcode entry) that a CPU can perform, each aligned to 16 bytes and padded with int3.

Iseta is a debugging utility that you can use to inspect the raw binary created by make; it uses ISA to navigate through the instructions and test whether something is wrong with the analyzer's internal mechanisms. You can invoke it without any arguments to see a quick help, or with a command line like the following to debug one of the compiled binaries:

piku@HAL:/M1/tests$ ./iseta x86/iset32.bin 0 4128 32

The expected result of this execution is something like:

Istruction Set Analizer debugging utility.
Operating in 32 bits mode.
Analyzing file chunk at 0:
00 d8 cc cc cc cc cc cc cc cc cc cc cc cc cc cc
OPCODE RESUME:
OP code:00
OP size:2
N.of prefixes: 0
1-byte operation!
ModR/M detected: d8

Press ENTER for next opcode...

Analyzing file chunk at 16:
01 d9 cc cc cc cc cc cc cc cc cc cc cc cc cc cc
OPCODE RESUME:
OP code:01
OP size:2
N.of prefixes: 0
1-byte operation!
ModR/M detected: d9

Press ENTER for next opcode...

and so on until the end of the binary file. If everything is fine, you should always see exactly one opcode per ENTER key press, starting from 0x00 and ending at 0xFF. There are some holes in the tables, but they are there according to the Intel manuals, since some opcodes are not allowed and are probably reserved for future use.

M1 simple metamorphic engine

After having introduced all the mechanisms necessary to understand the problems and behavior of metamorphic engines, let's see how they actually work. M1 is a simple, general-purpose metamorphic engine which uses ISA to scan opcodes and change operations according to hardcoded rules. Since I want to keep this simple, the only rule which has been built into M1 is arithmetic operation swapping for addition and subtraction. This means that additions of a value n will be changed into subtractions of the value -n, and subtractions of the value n will be translated into additions of the value -n. Alongside register swapping, this is one of the most basic operations that a metamorphic engine can perform.

To restrict the case even more, I will focus on additions/subtractions operating on 1-byte immediate data, and only on certain registers. This means that the metamorphic engine will scan for the 0x83 opcode family, and will affect only this type of operation when it occurs on the selected registers.

If you invoke make in the project root, two applications will be generated: one is M1 and the other is our famous dummy. Dummy will be the initial target on which to test the metamorphosis. This application doesn't do anything smart; it only asks for a value, which will be incremented and printed on the standard output five times by five different routines (the "something" family of routines, see the sources).

If you try it just after the make, you can see the following output:

piku@HAL:/M1$ ./dummy
Give me a value!!!
piku@HAL:/M1$ ./dummy 1
Value is now 1
Value is now 2
Value is now 3
Value is now 4
Value is now 5

Another text document, called dummy.pre.txt, is generated alongside the applications. This file contains the disassembled code section of our dummy application, which is necessary to understand what actually happens after M1 runs on it. If you concentrate on the something procedure, present at offset 40066e, line 197, you will see in detail how the C code has been translated, compiled and then disassembled back from CPU opcodes.

000000000040066e <something>:
  40066e:    55                       push   %rbp
  40066f:    48 89 e5                 mov    %rsp,%rbp
  400672:    48 83 ec 10              sub    $0x10,%rsp
  400676:    89 7d fc                 mov    %edi,-0x4(%rbp)
  400679:    8b 45 fc                 mov    -0x4(%rbp),%eax
  40067c:    89 c6                    mov    %eax,%esi
  40067e:    bf 74 07 40 00           mov    $0x400774,%edi
  400683:    b8 00 00 00 00           mov    $0x0,%eax
  400688:    e8 03 fe ff ff           callq  400490 <printf@plt>
  40068d:    8b 45 fc                 mov    -0x4(%rbp),%eax
  400690:    83 c0 01                 add    $0x1,%eax
  400693:    89 c7                    mov    %eax,%edi
  400695:    e8 a6 ff ff ff           callq  400640 <something_1>
  40069a:    c9                       leaveq
  40069b:    c3                       retq

Here the instructions at 400672 and 400690 will be changed by M1.

NOTE: For this and the following opcode listings, I assume you are operating on a 64-bit architecture. The case for 32 bits is a little different and is not covered in this document.

Now it's time to run M1 while pointing to the unfortunate dummy utility. If you run the metamorphic engine with the following arguments:

piku@HAL:/M1$ ./m1 /M1/dummy 1646 45 2

You will evaluate dummy starting from byte 1646 (0x66e in hex), for 45 bytes (0x2d in hex, so up to 0x66e + 0x2d = 0x69b). This is exactly the offset where our "something" procedure is located! By invoking make dump after having run M1, you will dump another text file with the disassembled target application. By scrolling the newly created file to the "something" procedure (still at 40066e), you will now find that the two operations we pointed out, at 400672 and 400690, are different:

000000000040066e <something>:
  40066e:    55                       push   %rbp
  40066f:    48 89 e5                 mov    %rsp,%rbp
  400672:    48 83 c4 f0              add    $0xfffffffffffffff0,%rsp
  400676:    89 7d fc                 mov    %edi,-0x4(%rbp)
  400679:    8b 45 fc                 mov    -0x4(%rbp),%eax
  40067c:    89 c6                    mov    %eax,%esi
  40067e:    bf 74 07 40 00           mov    $0x400774,%edi
  400683:    b8 00 00 00 00           mov    $0x0,%eax
  400688:    e8 03 fe ff ff           callq  400490 <printf@plt>
  40068d:    8b 45 fc                 mov    -0x4(%rbp),%eax
  400690:    83 e8 ff                 sub    $0xffffffff,%eax
  400693:    89 c7                    mov    %eax,%edi
  400695:    e8 a6 ff ff ff           callq  400640 <something_1>
  40069a:    c9                       leaveq
  40069b:    c3                       retq

This shows that M1 did its job and swapped the operations correctly. But does the affected application still work? Well, why don't you invoke the previous command on dummy again and see for yourself?

piku@HAL:/M1$ ./dummy 1
Value is now 1
Value is now 2
Value is now 3
Value is now 4
Value is now 5

Exactly the same output is produced, but the application has now physically changed! Congratulations, you just performed your first metamorphic operation. :-)

What if you want to mutate more than one procedure? Well, basically nothing stops you from selecting a bigger area, but you always have to be sure that you start evaluating from a valid opcode and that nothing in the middle destroys your alignment. M1 has been left dumb intentionally, since I just wanted to demonstrate how metamorphic engines work, not really build a powerful one. It happened to me a few times that I modified sections of code where, in the middle, there was stored what I believe was random data of some sort (well, invalid opcodes); this misaligned ISA's computation from that point on, invalidating the following modified procedures (which crashed the application with a Segmentation Fault).

If you rebuild dummy invoking make again, and run M1 using the command line:

piku@HAL:/M1$ ./m1 /M1/dummy 1469 407 2

You will perform metamorphosis starting from procedure something4 and including main. If you dump the dummy application again now, you will notice that all the compatible operations have been swapped as instructed in M1. Again, if you invoke the dummy utility after its metamorphosis, you will still get valid output like:

piku@HAL:/M1$ ./dummy 1
Value is now 1
Value is now 2
Value is now 3
Value is now 4
Value is now 5

Moving forward

Now that we have an application which is capable of applying metamorphosis to other applications without destroying them, what is the next step? What else can we do to go on and validate M1 further?

Well, obviously modifying a legacy application. :-)

During the tests I performed, I tried, with success, to modify the Explorer.exe application, but although the code inside its text section was legal, Windows (version 10) detected the changes and prevented it from running (InPageError, 0xc0000428). Looking inside the Windows folder, I then copied and tried to run other applications in order to find which ones were able to run outside the classic "C:\Windows" domain, and I found write.exe.

Write.exe is the simple WordPad, an application which is halfway between Notepad and Microsoft Office Word (well, way closer to the Notepad side, I would say). To perform this test you will have to copy it to another folder so that the original file is not affected by the changes (so from C:\Windows to wherever you want).

As a first operation I scanned write.exe for its headers, and saved (as for dummy) a backup objdump trace by invoking the following commands (note that I saved the copy of write.exe in the project root):

piku@HAL:/M1$ objdump -x write.exe
piku@HAL:/M1$ objdump -S write.exe > write.pre.dump

The first command is necessary to obtain the information with which you will find the right locations at which to modify the procedures, while the second is used to keep a copy of the original internals to compare against after the metamorphosis. The header scan gave me the following output:

piku@HAL:/M1$ objdump -x write.exe | more

write.exe:     file format pei-x86-64
write.exe
architecture: i386:x86-64, flags 0x0000012f:
HAS_RELOC, EXEC_P, HAS_LINENO, HAS_DEBUG, HAS_LOCALS, D_PAGED
start address 0x0000000140001420

Characteristics 0x22
        executable
        large address aware

Time/Date               Sat Jul 16 04:28:49 2016
Magic                   020b    (PE32+)
MajorLinkerVersion      14
MinorLinkerVersion      0
SizeOfCode              00000a00
SizeOfInitializedData   00002200
SizeOfUninitializedData 00000000
AddressOfEntryPoint     0000000000001420
BaseOfCode              0000000000001000
ImageBase               0000000140000000
SectionAlignment        0000000000001000
FileAlignment           0000000000000200
MajorOSystemVersion     10
MinorOSystemVersion     0
MajorImageVersion       10
MinorImageVersion       0
MajorSubsystemVersion   10
MinorSubsystemVersion   0
Win32Version            00000000
SizeOfImage             00007000
SizeOfHeaders           00000400
CheckSum                00011d73
Subsystem               00000002        (Windows GUI)
DllCharacteristics      0000c160
SizeOfStackReserve      0000000000080000
SizeOfStackCommit       0000000000002000
SizeOfHeapReserve       0000000000100000
SizeOfHeapCommit        0000000000001000
LoaderFlags             00000000
NumberOfRvaAndSizes     00000010

In particular, what is needed here are the AddressOfEntryPoint and BaseOfCode values. By looking at the .text section header you will find that the section is located, as usual, at offset 0x400 from the start of the file. Armed with this information you can finally inspect and locate an area on which to test the metamorphic engine. Since you can never know whether the modified part is exercised by the normal startup of the application (where the OS loader loads it into memory and starts its execution from the entry point), we will aim to modify a part which is immediately near the entry point (in terms of code path), so that if something is wrong the application will fail immediately and you will know of your failure.

First, we need to read what is actually at location 0x1420. You can quickly do it by looking at the write.pre.dump file and searching for location 140001420; again, for the laziest ones I'll report the output here:

   14000141c:    cc                       int3   
   14000141d:    cc                       int3   
   14000141e:    cc                       int3   
   14000141f:    cc                       int3   
   140001420:    48 83 ec 28              sub    $0x28,%rsp
   140001424:    e8 5b 02 00 00           callq  0x140001684
   140001429:    48 83 c4 28              add    $0x28,%rsp
   14000142d:    e9 7e fd ff ff           jmpq   0x1400011b0
   140001432:    cc                       int3   
   140001433:    cc                       int3   
   140001434:    cc                       int3   
   140001435:    cc                       int3

As you can see, it does nothing special and immediately moves execution to another location. So let's move to 140001684 and see what we can find there:

   140001682:    cc                       int3   
   140001683:    cc                       int3   
   140001684:    48 89 5c 24 20           mov    %rbx,0x20(%rsp)
   140001689:    55                       push   %rbp
   14000168a:    48 8b ec                 mov    %rsp,%rbp
   14000168d:    48 83 ec 20              sub    $0x20,%rsp
   140001691:    48 83 65 18 00           andq   $0x0,0x18(%rbp)
   140001696:    48 bb 32 a2 df 2d 99     movabs $0x2b992ddfa232,%rbx
   14000169d:    2b 00 00
   1400016a0:    48 8b 05 61 19 00 00     mov    0x1961(%rip),%rax        # 0x140003008
   1400016a7:    48 3b c3                 cmp    %rbx,%rax
   1400016aa:    0f 85 8f 00 00 00        jne    0x14000173f
   1400016b0:    48 8d 4d 18              lea    0x18(%rbp),%rcx
   1400016b4:    ff 15 76 0a 00 00        callq  *0xa76(%rip)        # 0x140002130
   1400016ba:    48 8b 45 18              mov    0x18(%rbp),%rax
   1400016be:    48 89 45 10              mov    %rax,0x10(%rbp)
   1400016c2:    ff 15 78 0a 00 00        callq  *0xa78(%rip)        # 0x140002140
   1400016c8:    8b c0                    mov    %eax,%eax
   1400016ca:    48 31 45 10              xor    %rax,0x10(%rbp)
   1400016ce:    ff 15 64 0a 00 00        callq  *0xa64(%rip)        # 0x140002138
   1400016d4:    8b c0                    mov    %eax,%eax
   1400016d6:    48 31 45 10              xor    %rax,0x10(%rbp)
   1400016da:    ff 15 48 0a 00 00        callq  *0xa48(%rip)        # 0x140002128
   1400016e0:    8b c0                    mov    %eax,%eax
   1400016e2:    48 c1 e0 18              shl    $0x18,%rax
   1400016e6:    48 31 45 10              xor    %rax,0x10(%rbp)
   1400016ea:    ff 15 38 0a 00 00        callq  *0xa38(%rip)        # 0x140002128
   1400016f0:    8b c0                    mov    %eax,%eax
   1400016f2:    48 8d 4d 10              lea    0x10(%rbp),%rcx
   1400016f6:    48 33 45 10              xor    0x10(%rbp),%rax
   1400016fa:    48 33 c1                 xor    %rcx,%rax
   1400016fd:    48 8d 4d 20              lea    0x20(%rbp),%rcx
   140001701:    48 89 45 10              mov    %rax,0x10(%rbp)
   140001705:    ff 15 3d 0a 00 00        callq  *0xa3d(%rip)        # 0x140002148
   14000170b:    8b 45 20                 mov    0x20(%rbp),%eax
   14000170e:    48 b9 ff ff ff ff ff     movabs $0xffffffffffff,%rcx
   140001715:    ff 00 00
   140001718:    48 c1 e0 20              shl    $0x20,%rax
   14000171c:    48 33 45 20              xor    0x20(%rbp),%rax
   140001720:    48 33 45 10              xor    0x10(%rbp),%rax
   140001724:    48 23 c1                 and    %rcx,%rax
   140001727:    48 b9 33 a2 df 2d 99     movabs $0x2b992ddfa233,%rcx
   14000172e:    2b 00 00
   140001731:    48 3b c3                 cmp    %rbx,%rax
   140001734:    48 0f 44 c1              cmove  %rcx,%rax
   140001738:    48 89 05 c9 18 00 00     mov    %rax,0x18c9(%rip)        # 0x140003008
   14000173f:    48 8b 5c 24 48           mov    0x48(%rsp),%rbx
   140001744:    48 f7 d0                 not    %rax
   140001747:    48 89 05 c2 18 00 00     mov    %rax,0x18c2(%rip)        # 0x140003010
   14000174e:    48 83 c4 20              add    $0x20,%rsp
   140001752:    5d                       pop    %rbp
   140001753:    c3                       retq   
   140001754:    cc                       int3   
   140001755:    cc                       int3

OK, now things are getting interesting, since this is a proper procedure. Some basic math then gives us its location in the file on the hard disk and the size of the procedure (0x1684 - 0x1000 + 0x400 = 0xa84, which in decimal is 2692, while the size of this procedure is 206 bytes). You can verify that this is right by opening the executable file with vim and switching to hex view, like this:

piku@HAL:/M1$ vim ./write.exe
:%!xxd -c16

What is left is to invoke M1 with the right arguments to modify the procedure, as follows (this time I also report the output of M1):

piku@HAL:/M1$ ./m1 /M1/write.exe 2692 206 2
Read 206 bytes from location a84
Found one at 0x9
To add: 8b --> c0
Found one at 0xd
Found one at 0xca
Written 206 bytes to location a84
File mutated!

As you can see, only one change is made, and the two other possible candidates have been dropped because they do not respect the limits we imposed on the metamorphic engine.

Time now to repeat the objdump operation and check what has changed:

piku@HAL:/M1$ objdump -S write.exe > write.dump

Now, if you align the write.pre.dump file with the write.dump file at location 0x140001684, what you will get is the following procedure (the morphed operation is at position 0x14000168d):

   140001682:    cc                       int3   
   140001683:    cc                       int3   
   140001684:    48 89 5c 24 20           mov    %rbx,0x20(%rsp)
   140001689:    55                       push   %rbp
   14000168a:    48 8b ec                 mov    %rsp,%rbp
   14000168d:    48 83 c4 e0              add    $0xffffffffffffffe0,%rsp
   140001691:    48 83 65 18 00           andq   $0x0,0x18(%rbp)
   140001696:    48 bb 32 a2 df 2d 99     movabs $0x2b992ddfa232,%rbx
   14000169d:    2b 00 00
   1400016a0:    48 8b 05 61 19 00 00     mov    0x1961(%rip),%rax        # 0x140003008
   1400016a7:    48 3b c3                 cmp    %rbx,%rax
   1400016aa:    0f 85 8f 00 00 00        jne    0x14000173f
   1400016b0:    48 8d 4d 18              lea    0x18(%rbp),%rcx
   1400016b4:    ff 15 76 0a 00 00        callq  *0xa76(%rip)        # 0x140002130
   1400016ba:    48 8b 45 18              mov    0x18(%rbp),%rax
   1400016be:    48 89 45 10              mov    %rax,0x10(%rbp)
   1400016c2:    ff 15 78 0a 00 00        callq  *0xa78(%rip)        # 0x140002140
   1400016c8:    8b c0                    mov    %eax,%eax
   1400016ca:    48 31 45 10              xor    %rax,0x10(%rbp)
   1400016ce:    ff 15 64 0a 00 00        callq  *0xa64(%rip)        # 0x140002138
   1400016d4:    8b c0                    mov    %eax,%eax
   1400016d6:    48 31 45 10              xor    %rax,0x10(%rbp)
   1400016da:    ff 15 48 0a 00 00        callq  *0xa48(%rip)        # 0x140002128
   1400016e0:    8b c0                    mov    %eax,%eax
   1400016e2:    48 c1 e0 18              shl    $0x18,%rax
   1400016e6:    48 31 45 10              xor    %rax,0x10(%rbp)
   1400016ea:    ff 15 38 0a 00 00        callq  *0xa38(%rip)        # 0x140002128
   1400016f0:    8b c0                    mov    %eax,%eax
   1400016f2:    48 8d 4d 10              lea    0x10(%rbp),%rcx
   1400016f6:    48 33 45 10              xor    0x10(%rbp),%rax
   1400016fa:    48 33 c1                 xor    %rcx,%rax
   1400016fd:    48 8d 4d 20              lea    0x20(%rbp),%rcx
   140001701:    48 89 45 10              mov    %rax,0x10(%rbp)
   140001705:    ff 15 3d 0a 00 00        callq  *0xa3d(%rip)        # 0x140002148
   14000170b:    8b 45 20                 mov    0x20(%rbp),%eax
   14000170e:    48 b9 ff ff ff ff ff     movabs $0xffffffffffff,%rcx
   140001715:    ff 00 00
   140001718:    48 c1 e0 20              shl    $0x20,%rax
   14000171c:    48 33 45 20              xor    0x20(%rbp),%rax
   140001720:    48 33 45 10              xor    0x10(%rbp),%rax
   140001724:    48 23 c1                 and    %rcx,%rax
   140001727:    48 b9 33 a2 df 2d 99     movabs $0x2b992ddfa233,%rcx
   14000172e:    2b 00 00
   140001731:    48 3b c3                 cmp    %rbx,%rax
   140001734:    48 0f 44 c1              cmove  %rcx,%rax
   140001738:    48 89 05 c9 18 00 00     mov    %rax,0x18c9(%rip)        # 0x140003008
   14000173f:    48 8b 5c 24 48           mov    0x48(%rsp),%rbx
   140001744:    48 f7 d0                 not    %rax
   140001747:    48 89 05 c2 18 00 00     mov    %rax,0x18c2(%rip)        # 0x140003010
   14000174e:    48 83 c4 20              add    $0x20,%rsp
   140001752:    5d                       pop    %rbp
   140001753:    c3                       retq   
   140001754:    cc                       int3   
   140001755:    cc                       int3

As you can see, again, M1 morphed a subtraction into an addition which has the same effect on the CPU registers. Does write.exe work now? Why don't you run it and see for yourself?

Summary

Binary metamorphism is an interesting technique which lets you instruct an engine to morph a desired target set of instructions. The techniques range from the simplest change, like the example shown in the previous chapters, to more complex ones which require .text section extension, relocation recomputation and other kinds of adjustment to avoid destroying the application.

These techniques allow the creator to hide an application in plain sight, because any attempt to identify it by a hash will be defeated by the morphing of the internal instructions. This of course does not stop Operating Systems and antiviruses from performing integrity checks, as I saw in the test on explorer.exe. As you have seen, this is anything but simple, because it requires precise instruction-set analysis software, correct morphing rules and alignment detection routines that avoid changing the wrong part of the application.

This is all, and I hope you enjoyed this simple introduction to metamorphism. :-)