JSON Smuggling: A far-fetched intrusion detection evasion technique

Grimminck
5 min readFeb 7, 2024

TL:DR Insignificant whitespaces in the JSON standard can be used to encode data without breaking the format. This could aid malicious actors in covert lateral movement or data exfiltration.

Mastering the art of intrusion involves a delicate balance — gaining access to critical systems and doing so in a covert manner. This drives attackers to find ways to move files, data, and executables around networks to either gain access, gather data or navigate through different parts of the network in a concealed way.

In this blog post, I try to shed some light on JSON smuggling, a technique that uses the standard implementation of the JSON specification [RFC 8259] itself to discreetly transfer data across networks, presenting a formidable challenge to conventional IDS and IPS detection mechanisms.

This technique resembles similar methods observed in real-world scenarios, where attackers have utilized tactics like whitespace steganography, image steganography, covert channels within protocols, concealed data within file formats, and DNS tunneling to fulfill their malicious goals. By using these covert communication techniques, attackers can bypass traditional security measures and execute their actions without being noticed.

The JSON specification as defined in https://datatracker.ietf.org/doc/html/rfc8259 allows for “insignificant whitespace” characters to be used within the JSON objects.

Insignificant whitespace is allowed before or after any of the six
structural characters.

ws = *(
%x20 / ; Space
%x09 / ; Horizontal tab
%x0A / ; Line feed or New line
%x0D ) ; Carriage return

This smuggling technique uses these characters as data encoding for injecting a valid JSON object with additional data, without altering the source data itself. Remarkably, this process doesn’t compromise the integrity of the JSON data or trigger alarms in most intrusion detection systems. By using combinations of spaces, tabs, newlines, and carriage returns, data can be covertly embedded within JSON files, flying under the radar of parsers and detection systems.

An example:

Normal JSON object:

{
"Month": "February",
"year": 2024
}

JSON object with smuggled data (hint: select the text):

{
"Month": "February",
"year": 2024
} ⏎

To the human eye, and almost all systems that interpret JSON these two examples are exactly the same, as the standard allows for insignificant whitespaces characters to exist. They both contain one valid JSON object, and are interpreted as such. However, the second example contains the encoded data “Hello World!” in tabs and spaces.

A hexdump of the last objects reveals us the encoded data:

00000000  7b 0a 20 20 22 4d 6f 6e  74 68 22 3a 20 22 46 65  |{.  "Month": "Fe|
00000010 62 72 75 61 72 79 22 2c 0a 20 20 22 79 65 61 72 |bruary",. "year|
00000020 22 3a 20 32 30 32 34 0a 7d 0a 20 09 20 20 09 20 |": 2024.}. . . |
00000030 20 20 20 09 09 20 20 09 20 09 20 09 09 20 09 09 | .. . . .. ..|
00000040 20 20 20 09 09 20 09 09 20 20 20 09 09 20 09 09 | .. .. .. ..|
00000050 09 09 20 20 09 20 20 20 20 20 20 09 20 09 20 09 |.. . . . .|
00000060 09 09 20 09 09 20 09 09 09 09 20 09 09 09 20 20 |.. .. .... ... |
00000070 09 20 20 09 09 20 09 09 20 20 20 09 09 20 20 09 |. .. .. .. .|
00000080 20 20 20 20 09 20 20 20 20 09 20 20 20 20 09 20 | . . . |
00000090 09 20 |. |
00000092

While not inherently remarkable, the intriguing aspect lies in the potential for encoding data within a JSON file. I’ve developed a script that reads data from various file formats, including executables, images, and videos. This script then transforms this data into the “insignificant” whitespace characters which can be added to a normal JSON object. Subsequently, a decoding program can revert this encoding to the original data from the JSON file, making the possibilities quite expansive.

In the following example I add an executable to a JSON object, validate it against a JSON processor and then decode it back to the original executable.

In scenarios where attackers seek to exchange data between compromised network segments for lateral movement or data exfiltration, especially in the presence of perimeter monitoring, JSON smuggling emerges as a potent technique. By leveraging this method, adversaries can effectively sidestep conventional malware detection rules, creating files that remain valid and innocuous on the surface.

Fortunately, YARA can aid in detecting JSON smuggling by identifying consecutive whitespace characters within the JSON structure. However, detecting these whitespace characters is a computationally intensive operation, so it might not be logical for your security setup. To illustrate, the following YARA rule provides a means to accomplish this detection:

/*
This Yara ruleset is under the GNU-GPLv2 license (http://www.gnu.org/licenses/gpl-2.0.html) and open to any user or organization, as long as you use it under this license.
*/

rule Detect_Hidden_Data_In_Files {
meta:
author = "Stefan Grimminck"
description = "Detects hidden data in files by identifying sequences of at least 8 consecutive whitespace characters."
date = "2024-02-05"

strings:
$hidden_data = /[\x20\x09\x0A\x0D]{8,}/ // Pattern for at least 8 consecutive whitespace characters

condition:
$hidden_data
}

For many, the JSON smuggling technique may not initially pose a significant threat to their infrastructure. However, when integrated with intrusion frameworks, this method transforms into a potent tool for moving data within or beyond monitored networks, all while maintaining a low risk of detection.

Moreover, by employing more sophisticated encoding techniques utilizing the six insignificant characters, attackers can further minimize the size of the encoded payload, enhancing their stealth and efficacy in covert operations.

In summary, JSON smuggling is an intrusion evasion technique that utilizes insignificant whitespaces in the JSON specification to allow covert data transfer across monitored networks. It poses a risk for lateral movement and data theft within monitored perimeters. However, due to its niche nature and likely infrequent usage, detection may be challenging and resource-intensive. Despite these challenges, attackers can still exploit common formats for malicious purposes (XML isn’t save either ;) ).

--

--