Nowhere to Hide: Three methods of XOR obfuscation
Joshua Cannell
A couple of months ago, I did an article on generic obfuscation techniques used to hide malware. It continues to be no surprise that malware tries to hide using an array of techniques that are easy to implement.
I wanted to elaborate on one of the techniques I mentioned earlier: the exclusive or, more commonly abbreviated as “XOR”, logical operation. In computer science, XOR is a bitwise operation used to manipulate values, alongside others such as AND, OR, and NOT. Back when I had my first lesson in Discrete Mathematics, I remember creating what is known as a truth table to help me better understand how these bitwise operations worked. A truth table uses Boolean logic to compute the value of an expression. Here is a simple one for an XOR operation.
XOR Truth Table
Input A | Input B | Output |
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 0 |
As you can see from the table above, the input values must differ for the result to be true. If they are the same, the result is false.
Let’s try a practical example using the XOR operation. This time we’ll XOR the letter ‘J’ with the letter ‘v’ and observe the results. The first thing we’ll need to do is consult the standard ASCII table and see which numeric values correspond to these two letters.
So it looks like a ‘J’ is 0x4A and a ‘v’ is 0x76. A byte is 8 bits and a single hexadecimal (hex) digit represents 4 bits, so two hex digits make up one byte. If we convert 0x4A and 0x76 to binary we have 01001010 and 01110110, respectively.
Now, if you’re looking at the chart and are confused by all the numbers, you may need to brush-up on different numbering systems. The numbers corresponding to the values here are actually all the same number, just represented differently. Discussing various numbering systems is outside the scope of this blog, so if you need help understanding the difference, I suggest doing some research on hexadecimal first, since that’s what we’ll primarily be using.
Ok, now we can XOR these two values at the bit level, so let’s go ahead and do that.
01001010 (0x4A)
01110110 (0x76)
00111100 (0x3C)
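The same calculation can be checked in a few lines of Python. This is only an illustrative sketch; the variable names are my own and not part of any original script.

```python
# XOR the ASCII codes of 'J' and 'v', as in the worked example above.
a = ord("J")   # 0x4A
b = ord("v")   # 0x76
result = a ^ b

# Print each operand and the result as 8 binary digits plus hex.
print(f"{a:08b} (0x{a:02X})")
print(f"{b:08b} (0x{b:02X})")
print(f"{result:08b} (0x{result:02X}) -> {chr(result)!r}")

# XOR is its own inverse: applying the same value again restores the input.
restored = result ^ b
```

That last line is what makes XOR convenient for obfuscation: the same operation both hides and recovers a byte.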
Our result is 0x3C, which in ASCII is a less-than sign (<). Pretty neat, huh?
A malicious approach
After reading the examples above, you might have been able to figure out that this same technique could be used as a simple form of encryption/obfuscation in malware.
In fact, most malware I look at nowadays has some form of XOR obfuscation. Whether it’s to decode strings, an embedded file, or self-modifying code, using the XOR operator is good at getting the job done. Why is it used so much? Well, when compared to cryptography, bitwise operations and rotations are much easier to implement while programming.
In this article I’m going to cover three examples where files have been obfuscated using an XOR sequence. These three scenarios will all use the XOR operation to obfuscate the malware, but all in different ways. These obfuscated samples were all found in the wild and have all been identified as some form of malware.
Scenario 1: I received the file below from a user on our forums and soon discovered something wasn’t right after opening it in a hex editor.
One skill that is useful to have when trying to decrypt a file is pattern recognition. What is the pattern you see in the image above? It’s also important to know file structure, especially for a Portable Executable (PE) file. In some cases, knowing what a byte is supposed to be in its de-obfuscated form can be the difference in finding a reliable pattern or not.
If there is one flaw in using XOR to obfuscate a file, it’s that XORing the key with 0 leaves the key unchanged. Therefore, if you XOR an entire file with the same byte, every zero byte in the original shows up in the obfuscated file as the XOR key itself.
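To make that flaw concrete, here is a small sketch using a made-up buffer (not one of the samples in this article). The helper name `guess_single_byte_key` is hypothetical, and it rests on the assumption that the original file contains more 0x00 bytes than any other value, which is common for a PE but not guaranteed.

```python
from collections import Counter

def guess_single_byte_key(data: bytes) -> int:
    """Return the most frequent byte value in data.

    Assumption: the original file is dominated by 0x00 bytes, so the most
    common obfuscated byte is likely 0x00 ^ key, which is the key itself.
    """
    return Counter(data).most_common(1)[0][0]

# Toy buffer: a few header-like bytes surrounded by zero padding.
plain = b"MZ" + bytes(14) + b"PE" + bytes(6)
key = 0x33
obfuscated = bytes(b ^ key for b in plain)

# Every zero byte in the original now shows up as the key.
print(obfuscated.hex(" "))
print(hex(guess_single_byte_key(obfuscated)))
```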
Ok, getting back to the point, this first one is pretty easy. After examining the file it appears that every other byte needs an XOR of 0x33 (a ‘3’ in ASCII) applied. The easiest way to fix the file is to write a quick script. I like Python, but you could really use any language for this. First I renamed the obfuscated file to malware and then I wrote the following code.
It’s a very simple script, and just reads one byte at a time until the end of the file (EOF) is reached, performing an XOR 0x33 against every other byte. Below we have our decrypted file, which I just called ‘decode’. As you can see, the decrypted file now looks like a normal PE.
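For readers who want to try this themselves, a script along these lines could look like the sketch below. It is a reconstruction under assumptions, not the original script: it reads the whole file at once instead of one byte at a time, and the `start` parameter is hypothetical, since the text does not say whether the even or the odd offsets carry the XOR.

```python
def xor_every_other(data: bytes, key: int = 0x33, start: int = 0) -> bytes:
    """XOR every second byte of data with key, beginning at offset start."""
    out = bytearray(data)
    for i in range(start, len(out), 2):
        out[i] ^= key
    return bytes(out)

def decode_file(src: str = "malware", dst: str = "decode", start: int = 0) -> None:
    """Read src, undo the every-other-byte XOR, and write the result to dst."""
    with open(src, "rb") as f:
        data = f.read()
    with open(dst, "wb") as f:
        f.write(xor_every_other(data, start=start))
```

Calling `decode_file()` with the defaults reads `malware` and writes `decode`, matching the file names used here. If the output does not begin with the familiar `MZ` bytes, trying `start=1` is the obvious next step.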
It turned out this file was incomplete, which was kind of a bummer. Nonetheless, it was still good practice and a good warm-up for what’s coming in our second scenario.
Scenario 2: This next scenario is going to be a little harder than the first. I received this file from a researcher, which was dropped by a web exploit. Unlike most web exploits, this file was written to disk in an encrypted format. I’m unsure if the exploit went wrong or this was intentional, but after some digging I found it used XOR obfuscation.
When we take our first look, we can immediately see that this is nothing like the first example. Every single byte is obfuscated, and there are no obvious patterns here.
Since the file is dropped from an exploit and executed, we know it’s likely a PE, which means there should be several zero-value bytes littered throughout the headers. There is one thing, however, that we can instantly rule out: it’s not a single-byte XOR value, like the first sample. If it were, we would see the value repeat itself many times sequentially.
Let’s think about those zero-value bytes for a minute. Recall that I mentioned any value XOR 0 is equal to that value, or in mathematical terms, x ^ 0 = x. If we look at our decoded file from the first scenario, we see that the third row is nothing but zeros. When observing the third row in our new file, we have the following values: 96 08 FA EC DE C0 22 84 66 58 4A BC 2E 90 72 54. Let’s go ahead and search the entire file for this 16-byte sequence and see what we find.
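A hex editor's search function does this job, but the same search can be sketched in Python. The byte sequence is the one quoted above; the helper name `find_all` is my own, and the spacing between hits is only a hint at the key length, not proof.

```python
# The third row of the obfuscated file, as quoted in the text.
ROW = bytes.fromhex("96 08 FA EC DE C0 22 84 66 58 4A BC 2E 90 72 54")

def find_all(data: bytes, pattern: bytes) -> list[int]:
    """Return every offset at which pattern occurs in data (overlaps included)."""
    offsets = []
    pos = data.find(pattern)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(pattern, pos + 1)
    return offsets

def gaps(offsets: list[int]) -> list[int]:
    """Distances between consecutive hits; a recurring gap suggests the key period."""
    return [b - a for a, b in zip(offsets, offsets[1:])]
```

Running `find_all(data, ROW)` on the file contents and then looking at `gaps(...)` is one way to spot that the sequence recurs at a regular interval.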
As can be seen from the image, this pattern is found many times. What can also be seen is a repeating pattern starting with 56 48 BA 2C, and then continuing to make a 128-byte sequence. Now we’re onto something here; let’s perform a test and take the first 64 bytes of that pattern and try an XOR against the first 64 bytes of the file. Bitwise operators like XOR are available in most hex editors you’ll find, including one of my favorites, ICY Hexplorer.
We have a winner! Now we just need to write a script and use our 128-byte pattern to decode our file. I first renamed my file to malware128.exe and then modified my existing script, adding the byte pattern as a list.
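A repeating-key decoder of this kind can be sketched as follows. This is not the original script: the function name is hypothetical, the full 128-byte pattern is not reproduced here, and the sketch assumes the key is aligned with the start of the file.

```python
from itertools import cycle

def xor_repeating(data: bytes, key: bytes) -> bytes:
    """XOR data against key, repeating the key for the length of data.

    Assumption: the first key byte lines up with offset 0 of the data.
    """
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))

# Toy demonstration with a made-up 4-byte key, not the recovered 128-byte one.
toy_key = bytes([0x56, 0x48, 0xBA, 0x2C])
obfuscated = xor_repeating(b"MZ" + bytes(6), toy_key)
restored = xor_repeating(obfuscated, toy_key)
```

To decode the real sample, `key` would be the recovered 128-byte pattern and `data` the contents of malware128.exe.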
This sample was a little more challenging, but still not too difficult. Our last sample will be the toughest so far, so let’s move on and see what’s in store.
Scenario 3: For the last scenario, we have something a little more unusual than our previous samples. Every now and then I perform a search on VirusTotal to see if I can find any obfuscated malware that may have been uploaded by a user or perhaps a honeypot. A few days ago I came across a sample that I found particularly interesting. At first glance, I thought this file would be pretty easy to de-obfuscate.
If you followed the first two scenarios and understood them, you should instantly be able to recognize the 0x29 pattern here. We’ll go ahead and XOR the file with 0x29 and we should be done.
Hmm…well that’s definitely not right. I don’t see a DOS stub or a PE header here, but it does look like the zero values are in the right places. So, we can conclude that zero values need an XOR 0x29 applied, but we’re still unsure about the rest.
After some further inspection, it’s easy to notice this file is a PE file, even though it’s obfuscated. If you compare the format of this obfuscated file with that of the de-obfuscated one in our first scenario, you’ll notice the similarities in file structure; both might have been passed through the same type of compiler.
Remember, it’s useful to know what values are supposed to be de-obfuscated. Now we can use a simple equation to determine what the XOR value is for some of these bytes. Suppose we start with the string ‘This program cannot be run in DOS mode.’ that’s located in the DOS stub. First we’ll decode the letter ‘o’, which in ASCII is a 0x6F (0xBA in our obfuscated file).
XORKEY = 0x6F ^ 0xBA
XORKEY = 0xD5
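The same known-plaintext trick works for any byte whose de-obfuscated value you can guess. A minimal sketch, with a hypothetical helper name:

```python
def derive_keys(plain: bytes, obfuscated: bytes) -> list[int]:
    """For each known plaintext byte, recover key = plain ^ obfuscated."""
    return [p ^ o for p, o in zip(plain, obfuscated)]

# The single pair worked out above: 'o' (0x6F) appears as 0xBA in the file.
xor_key = 0x6F ^ 0xBA
print(hex(xor_key))
```

Passing the expected DOS stub string and the matching obfuscated bytes to `derive_keys` would give one XOR value per character.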
After some time I managed to start building a lookup table ranging from 0x00 to 0xFF. For every byte I encountered in the obfuscated file, I placed a corresponding XOR value to retrieve the de-obfuscated byte. Here is what I managed to find out after a little work. Notice you can see the 0xBA value needs an XOR 0xD5 applied, just like we determined above.
If you look at the area I circled, you’ll see that the second hex value in the byte follows a pattern: 1155995511559955…and so on. Using this pattern and a little trial and error, I managed to fill in the rest of the table and successfully de-obfuscate the file with another script.
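A table-driven decoder of this sort can be sketched as below. It is not the script used here: only the two table entries stated in this article are filled in (0x29 needs XOR 0x29, 0xBA needs XOR 0xD5), and both `XOR_TABLE` and `decode_with_table` are names chosen for illustration.

```python
# Obfuscated byte -> XOR value needed to recover the original byte.
# Only the two entries stated in the text are known here; the remaining
# entries of the 0x00-0xFF table would have to be recovered separately.
XOR_TABLE = {0x29: 0x29, 0xBA: 0xD5}

def decode_with_table(data: bytes, table: dict[int, int]) -> tuple[bytes, set[int]]:
    """Decode data byte by byte using a per-value XOR lookup table.

    Bytes without a table entry are copied unchanged and reported back,
    so you can see which entries still need to be worked out.
    """
    out = bytearray()
    missing = set()
    for b in data:
        if b in table:
            out.append(b ^ table[b])
        else:
            out.append(b)
            missing.add(b)
    return bytes(out), missing
```

Reporting the unmapped byte values makes it easy to fill in the table incrementally, which fits the trial-and-error process described above.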
The file turned out to be a Zeus Trojan, which is nothing unique. However, we just learned yet another interesting obfuscation technique that we might see in the future.
Conclusion
While this article focuses mainly on the XOR operator, this is only the tip of the iceberg for binary obfuscation. As mentioned earlier, bit rotations and other operators can also be used to hide data, but it’s safe to say that XOR is likely the most popular. For more advanced malware, encryption like DES, RC4, or AES might be used, and if that’s the case, you’re going to need more than the techniques in this article to break anything that complex.
Regardless, I hope this article has given you some additional insight into how exclusive or (XOR) obfuscation works. As I hope you gleaned from this article, the XOR operator is popular for obfuscation since it’s easy to use, does the job well, and offers flexible implementations. Obfuscating malware using XOR techniques continues to be a popular way to avoid detection by antivirus/antimalware products and other network detection systems. We can expect to see it used to hide nasty programs for years to come.
Joshua Cannell is a Malware Intelligence Analyst at Malwarebytes where he performs research and in-depth analysis on current malware threats. He has over 5 years of experience working with US defense intelligence agencies where he analyzed malware and developed defense strategies through reverse engineering techniques. His articles on the Unpacked blog feature the latest news in malware as well as full-length technical analysis. Follow him on Twitter @joshcannell