Insecure deserialization

In this section, we introduce what insecure deserialization is and describe how it can expose websites to high-severity attacks. We’ll focus on typical scenarios and demonstrate some concrete examples of deserialization in PHP, Ruby, and Java. Finally, we’ll look at some ways to avoid insecure deserialization vulnerabilities.

Exploiting insecure deserialization has a reputation for being difficult. However, it can sometimes be much simpler than you might think. If you are not familiar with deserialization, this section contains some important background information that you should familiarize yourself with first. If you already know the basics of deserialization, you can jump ahead and learn how to exploit it.

What is serialization?

Serialization is the process of converting complex data structures, such as objects and their fields, into a “flatter” format that can be sent and received as a sequential stream of bytes. Serializing data makes it much simpler to:

  • Write complex data to inter-process memory, a file, or a database
  • Send complex data, for example, over a network, between different components of an application, or in an API call

Crucially, when an object is serialized, its state is also persisted. In other words, the object’s attributes are preserved, along with their assigned values.

Serialization vs deserialization

Deserialization is the process of restoring the byte stream to an exact copy of the original object. The logic of the site can then interact with this deserialized object just as it would with any other object.

Many programming languages provide native support for serialization. Exactly how objects are serialized depends on the language. Some languages serialize objects into binary formats, whereas others use string formats with varying degrees of human readability. Note that all of the original object’s attributes are stored in the serialized data stream, including any private fields. To prevent a field from being serialized, it must be explicitly excluded in the class declaration, for example by marking it “transient” in Java.

Note that, depending on the language, serialization may be referred to as marshalling (Ruby) or pickling (Python). In this context, these terms are synonymous with “serialization”.
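As a minimal sketch of the round trip described above, the following Python snippet pickles an object and restores it. The object and its attribute names are hypothetical stand-ins, and the snippet only ever unpickles data it created itself.

```python
import pickle
from types import SimpleNamespace

# A stand-in for an application object with two attributes.
user = SimpleNamespace(name="carlos", is_logged_in=True)

# Serialize: the object and its state become a flat sequence of bytes.
blob = pickle.dumps(user)

# Deserialize: a new, separate object is rebuilt with the same state.
restored = pickle.loads(blob)

print(type(blob).__name__)   # bytes
print(restored == user)      # True: same attributes and values
print(restored is user)      # False: it is a replica, not the original
```

The restored object is a distinct replica whose attributes match the original, which is exactly what makes tampering with the byte stream in between so interesting to an attacker.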

What is insecure deserialization?

Insecure deserialization is when user-controllable data is deserialized by a website. This potentially enables an attacker to manipulate serialized objects in order to pass harmful data into the application code.

It is even possible to replace a serialized object with an object of an entirely different class. Alarmingly, objects of any class that is available to the website will be deserialized and instantiated, regardless of which class was expected. For this reason, insecure deserialization is sometimes known as an “object injection” vulnerability.

An object of an unexpected class might cause an exception. By this time, however, the damage may already be done. Many deserialization-based attacks are completed before deserialization is finished. This means that the deserialization process itself can initiate an attack, even if the website’s own functionality does not directly interact with the malicious object. For this reason, websites whose logic is based on strongly typed languages can also be vulnerable to these techniques.

How do insecure deserialization vulnerabilities arise?

Insecure deserialization typically arises because of a general lack of understanding of how dangerous deserializing user-controllable data can be. Ideally, user input should never be deserialized at all.

Some site owners think they are secure because they perform some form of additional checking on deserialized data. However, this approach is usually ineffective because it is almost impossible to verify or anticipate all possible scenarios. These checks are also fundamentally flawed because they rely on checking data after it has been deserialized, which in many cases is too late to prevent attacks.

Vulnerabilities can also arise because deserialized objects are often assumed to be trustworthy. Especially when using languages with a binary serialization format, developers might think that users cannot read or manipulate the data effectively. However, while it may require more effort, it is just as possible for an attacker to exploit binary serialized objects as it is to exploit string-based formats.

Deserialization-based attacks are also made possible by the number of dependencies that exist in modern websites. A typical site might use many different libraries, each with its own dependencies, creating a massive pool of classes and methods that is difficult to manage securely. Because an attacker can create instances of any of these classes, it is hard to predict which methods can be invoked on the malicious data. This is especially true if an attacker is able to chain together a long series of unexpected method invocations, passing data into a sink that is completely unrelated to the initial source. As a result, it is almost impossible to anticipate the flow of malicious data and plug every potential hole.

In short, it can be argued that it is not possible to securely deserialize untrusted input.

What is the impact of insecure deserialization?

The impact of insecure deserialization can be very severe because it provides an entry point to a massively increased attack surface. It allows an attacker to reuse existing application code in harmful ways, resulting in numerous other vulnerabilities, often remote code execution.

Even in cases where remote code execution is not possible, insecure deserialization can lead to privilege escalation, arbitrary file access, and denial-of-service attacks.

How to exploit an insecure deserialization vulnerability

More on that below.

How do I prevent insecure deserialization vulnerabilities?

Generally speaking, deserialization of user input should be avoided unless absolutely necessary. In many cases, the potentially high severity of the exploits it may enable, and the difficulty of protecting against them, outweigh the benefits.

If you do need to deserialize data from untrusted sources, incorporate robust measures to make sure that the data has not been tampered with. For example, you could implement a digital signature to check the integrity of the data. However, remember that any checks must take place before beginning the deserialization process. Otherwise, they are of little use.
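The integrity check described above might be sketched as follows in Python, using an HMAC over a JSON payload. The key, the token layout (payload, a dot, then a hex signature), and the function names are all assumptions made for illustration. The important part is the ordering: the signature is verified before any parsing takes place.

```python
import hashlib
import hmac
import json

# Hypothetical key; a real application would load this from secure configuration.
SECRET_KEY = b"replace-with-a-real-secret"

def sign(payload: bytes) -> str:
    """Return a hex HMAC-SHA256 signature for the payload."""
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()

def dump_signed(data: dict) -> str:
    """Serialize to JSON and append a signature after a dot."""
    payload = json.dumps(data, sort_keys=True)
    return payload + "." + sign(payload.encode())

def load_signed(token: str) -> dict:
    """Verify the signature first; deserialize only if it matches."""
    payload, _, signature = token.rpartition(".")
    expected = sign(payload.encode())
    if not hmac.compare_digest(expected.encode(), signature.encode()):
        raise ValueError("signature mismatch: refusing to deserialize")
    return json.loads(payload)  # only reached for untampered data
```

A token whose payload has been edited after signing is rejected before `json.loads` ever sees it.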

If possible, you should avoid using generic deserialization features altogether. Serialized data from these methods contains all attributes of the original object, including private fields that potentially contain sensitive information. Instead, you could create your own class-specific serialization methods so that you can at least control which fields are exposed.
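One possible shape for such class-specific methods is sketched below in Python. The `User` class, its fields, and the method names are hypothetical. Only an explicit allow-list of fields is written out, and on the way back in, anything outside that allow-list is ignored and types are checked.

```python
import json

class User:
    def __init__(self, username, is_admin=False, password_hash=""):
        self.username = username
        self.is_admin = is_admin            # decided server-side, never serialized
        self.password_hash = password_hash  # sensitive, never serialized

    def serialize(self) -> str:
        # Explicit allow-list: only the username leaves the server.
        return json.dumps({"username": self.username})

    @classmethod
    def deserialize(cls, text: str) -> "User":
        data = json.loads(text)
        username = data.get("username") if isinstance(data, dict) else None
        if not isinstance(username, str):
            raise ValueError("expected an object with a string 'username'")
        # Extra keys such as "is_admin" are ignored rather than trusted.
        return cls(username=username)
```

With this design, a client that adds `"is_admin": true` to the serialized data gains nothing, because that field is never read back from user input.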

Finally, remember that the vulnerability is the deserialization of user input, not the presence of gadget chains that subsequently handle the data. Don’t rely on trying to eliminate gadget chains that you identify during testing. This is impractical because of the web of cross-library dependencies. At any given time, publicly documented memory corruption exploits may also mean that your application is vulnerable regardless.


Exploiting insecure deserialization vulnerabilities

In this section, we’ll teach you how to exploit some common scenarios using examples from PHP, Ruby, and Java deserialization. We hope to demonstrate that exploiting insecure deserialization is actually much easier than many people believe. This is even the case during black-box testing, if you are able to use pre-built gadget chains.

We’ll also guide you through the process of creating your own high-severity deserialization-based attacks. Although these typically require source code access, they can also be easier to learn than you might think once you understand the basic concepts. In particular, we’ll cover the following topics:

  • How to identify insecure deserialization
  • Modifying serialized objects expected by the website
  • Passing malicious data into dangerous website features
  • Injecting arbitrary object types
  • Chaining method invocations to control the flow of data into dangerous sink gadgets
  • Manually creating your own advanced exploits
  • PHAR deserialization

Note: Although many of the examples are based on PHP, most of the exploitation techniques are equally valid for other languages.

How to identify insecure deserialization

Identifying insecure deserialization is relatively simple, regardless of whether you are white-box or black-box testing.

During auditing, you should look at all data being passed into the website and try to identify anything that looks like serialized data. Serialized data can be identified relatively easily if you know the format that different languages use. In this section, we’ll show examples from both PHP and Java serialization. Once you identify serialized data, you can test whether you are able to control it.

PHP serialization format

PHP uses a mostly human-readable string format, with letters representing the data type and numbers representing the length of each entry. For example, consider a User object with the following attributes:

$user->name = "carlos";
$user->isLoggedIn = true;

After serialization, the object might look like this:

O:4:"User":2:{s:4:"name";s:6:"carlos";s:10:"isLoggedIn";b:1;}

This can be interpreted as follows:

  • O:4:"User" - An object with the 4-character class name "User"
  • 2 - The object has 2 attributes
  • s:4:"name" - The key of the first attribute is the 4-character string "name"
  • s:6:"carlos" - The value of the first attribute is the 6-character string "carlos"
  • s:10:"isLoggedIn" - The key of the second attribute is the 10-character string "isLoggedIn"
  • b:1 - The value of the second attribute is the boolean value true

The native functions for PHP serialization are serialize() and unserialize(). If you have source code access, you should start by looking for unserialize() anywhere in the code and investigating further.
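To make the role of the type letters and length prefixes concrete, here is a small Python sketch that produces the same format for a flat object. It is an illustration only: the function names are made up, and it handles just public string, integer, and boolean attributes, not the full PHP format.

```python
def php_serialize_value(value) -> str:
    """Encode a scalar in PHP's serialization format (subset only)."""
    if isinstance(value, bool):   # check bool first: bool is a subclass of int
        return f"b:{int(value)};"
    if isinstance(value, int):
        return f"i:{value};"
    if isinstance(value, str):
        # The length prefix counts bytes, not characters.
        return f's:{len(value.encode("utf-8"))}:"{value}";'
    raise TypeError(f"unsupported type: {type(value).__name__}")

def php_serialize_object(class_name: str, props: dict) -> str:
    """Encode a flat object with public attributes."""
    body = "".join(
        php_serialize_value(key) + php_serialize_value(val)
        for key, val in props.items()
    )
    name_len = len(class_name.encode("utf-8"))
    return f'O:{name_len}:"{class_name}":{len(props)}:{{{body}}}'

print(php_serialize_object("User", {"name": "carlos", "isLoggedIn": True}))
# O:4:"User":2:{s:4:"name";s:6:"carlos";s:10:"isLoggedIn";b:1;}
```

Every string carries its own length, which is why hand-edited values need their prefixes updated to stay valid.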

Java serialization format

Some languages, such as Java, use binary serialization formats. This is more difficult to read, but you can still identify serialized data if you know how to recognize a few tell-tale signs. For example, serialized Java objects always begin with the same bytes, which are encoded as ac ed in hexadecimal and rO0 in Base64.

Any class that implements the interface java.io.Serializable can be serialized and deserialized. If you have source code access, take note of any code that uses the readObject() method, which is used to read and deserialize data from an InputStream.
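The tell-tale signs for both formats can be turned into a rough triage helper. The Python sketch below is a heuristic under stated assumptions: the function names and the regular expression are illustrative, it only tries the raw value and a strict Base64 decoding, and a match is a hint to investigate rather than proof.

```python
import base64
import re

JAVA_MAGIC = bytes.fromhex("aced")  # first two bytes of a Java serialization stream
PHP_PATTERN = re.compile(rb'(?:O:\d+:"|a:\d+:\{|s:\d+:"|[ib]:-?\d+;)')

def guess_format(raw: bytes) -> str:
    """Classify raw bytes as 'java', 'php', or 'unknown'."""
    if raw.startswith(JAVA_MAGIC):
        return "java"
    if PHP_PATTERN.match(raw):
        return "php"
    return "unknown"

def inspect_value(value: str) -> str:
    """Classify a cookie or parameter value, trying raw and Base64 forms."""
    candidates = [value.encode()]
    try:
        candidates.append(base64.b64decode(value, validate=True))
    except ValueError:  # not valid Base64; keep only the raw form
        pass
    for raw in candidates:
        found = guess_format(raw)
        if found != "unknown":
            return found
    return "unknown"

print(base64.b64encode(bytes.fromhex("aced0005")).decode())  # rO0ABQ==
print(inspect_value("rO0ABQ=="))                             # java
print(inspect_value('O:4:"User":0:{}'))                      # php
```

Values in real traffic are often URL-encoded or compressed as well, so a helper like this would need extra decoding steps in practice.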

Manipulating serialized objects

Exploiting some deserialization vulnerabilities can be as easy as changing properties in a serialized object. When the object state is persisted, you can explore the serialized data to identify and edit the property values of interest. The malicious object is then passed to the site through a deserialization process. This is the initial step in a basic deserialization attack.

Broadly speaking, there are two approaches to manipulating serialized objects. You can edit the object directly as a byte stream, or you can create and serialize the new object yourself by writing a short script in the appropriate language. The latter approach is usually easier when using a binary serialization format.

Modifying object properties

When tampering with the data, as long as the attacker preserves a valid serialized object, the deserialization process will create a server-side object with the modified attribute values.

As a simple example, consider a website that uses a serialized User object to store data about a user’s session in a cookie. If an attacker spotted this serialized object in an HTTP request, they might decode it to find the following byte stream:

O:4:"User":2:{s:8:"username";s:6:"carlos";s:7:"isAdmin";b:0;}

The isAdmin attribute is an obvious point of interest. An attacker could simply change the boolean value of the attribute to 1 (true), re-encode the object, and overwrite their current cookie with this modified value. In isolation, this has no effect. However, suppose the website uses this cookie to check whether the current user has access to certain administrative functionality:

// "session" is a hypothetical cookie name used for illustration
$user = unserialize($_COOKIE['session']);
if ($user->isAdmin === true) {
    // allow access to admin interface
}

This vulnerable code would instantiate a User object based on the data from the cookie, including the attacker-modified isAdmin attribute. At no point is the authenticity of the serialized object checked. This data is then passed into the conditional statement and, in this case, would allow for an easy privilege escalation.
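The decode, edit, and re-encode steps can be scripted in a few lines. This Python sketch assumes, purely for illustration, that the cookie value is the Base64 encoding of the serialized string. Real applications may use URL encoding or another scheme, and the function name is made up.

```python
import base64

def flip_is_admin(cookie_value: str) -> str:
    """Decode the cookie, set isAdmin to true, and re-encode it."""
    serialized = base64.b64decode(cookie_value).decode()
    # b:0 and b:1 have the same length, so no length prefix needs updating.
    tampered = serialized.replace('s:7:"isAdmin";b:0;', 's:7:"isAdmin";b:1;')
    return base64.b64encode(tampered.encode()).decode()

original = 'O:4:"User":2:{s:8:"username";s:6:"carlos";s:7:"isAdmin";b:0;}'
cookie = base64.b64encode(original.encode()).decode()
print(base64.b64decode(flip_is_admin(cookie)).decode())
# O:4:"User":2:{s:8:"username";s:6:"carlos";s:7:"isAdmin";b:1;}
```

A signature check performed before deserialization would cause a cookie modified like this to be rejected.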

This simple scenario is not common in the wild. However, editing an attribute value in this way demonstrates the first step towards a more serious attack.

Modifying data types

In addition to modifying property values in serialized objects, we can also provide unexpected data types.

PHP-based logic is particularly vulnerable to this kind of manipulation because of the behavior of its loose comparison operator (==) when comparing different data types. For example, if you perform a loose comparison between an integer and a string, PHP attempts to convert the string to an integer, meaning that 5 == "5" evaluates to true.

Unusually, in PHP 7.x and earlier, this also works for any alphanumeric string that starts with a number. In this case, PHP effectively converts the entire string to an integer value based on the initial number, and the rest of the string is ignored completely. Therefore, 5 == "5 of something" is in practice treated as 5 == 5.

This becomes even stranger when comparing a string with the integer 0:

0 == "Example string" // true

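Because the string does not start with a number, PHP 7.x and earlier treats the entire string as the integer 0. PHP 8 changed this behavior: a non-numeric string is now compared as a string, so this comparison evaluates to false.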
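For readers who do not have an old PHP interpreter to hand, the following Python sketch approximates how PHP 7.x and earlier compared a number with a string. It is a simplified model, not PHP’s actual algorithm: the function names are made up, and edge cases such as hexadecimal strings, overflow, and trailing whitespace are not covered.

```python
import re

# Leading numeric prefix: optional whitespace and sign, digits, optional
# fraction and exponent. Anything after the prefix is ignored.
_NUMERIC_PREFIX = re.compile(r"\s*[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")

def php7_number_from_string(text: str) -> float:
    """Approximate the legacy string-to-number cast: prefix value, else 0."""
    match = _NUMERIC_PREFIX.match(text)
    return float(match.group(0)) if match else 0.0

def php7_loose_equals(number, text: str) -> bool:
    """Approximate `number == text` as evaluated by PHP 7.x and earlier."""
    return number == php7_number_from_string(text)

print(php7_loose_equals(5, "5"))                # True
print(php7_loose_equals(5, "5 of something"))   # True
print(php7_loose_equals(0, "Example string"))   # True
print(php7_loose_equals(0, "1abc"))             # False
```

The last line shows why the trick depends on the string not starting with a digit.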

Consider a case where this loose comparison operator is used in conjunction with user-controllable data from a deserialized object. This could potentially result in dangerous logic flaws.

// "login" is a hypothetical cookie name used for illustration
$login = unserialize($_COOKIE['login']);
if ($login['password'] == $password) {
    // log in successfully
}

Suppose an attacker modified the password attribute so that it contained the integer 0 instead of the expected string. As long as the stored password does not start with a number, the condition would always return true, enabling an authentication bypass. Note that this is only possible because deserialization preserves the data type. If the code fetched the password from the request directly, the 0 would be converted to a string and the condition would evaluate to false.

Be aware that when modifying data types in any serialized object format, it is important to remember to update any type labels and length indicators in the serialized data too. Otherwise, the serialized object will be corrupted and will not be deserialized.
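Keeping length indicators in sync by hand is error-prone, so it can help to script the edit. The Python sketch below replaces one string value in a PHP-serialized object and recomputes its length prefix. The function name is hypothetical, and it assumes the key occurs once and currently holds a string value.

```python
import re

def set_string_property(serialized: bytes, key: bytes, new_value: bytes) -> bytes:
    """Replace the string stored under `key` and fix up its length prefix."""
    key_token = b's:%d:"%s";' % (len(key), key)
    start = serialized.index(key_token) + len(key_token)
    header = re.match(rb's:(\d+):"', serialized[start:])
    if header is None:
        raise ValueError("property value is not a string")
    old_len = int(header.group(1))
    # Skip the header, the old value, the closing quote, and the semicolon.
    end = start + header.end() + old_len + 2
    replacement = b's:%d:"%s";' % (len(new_value), new_value)
    return serialized[:start] + replacement + serialized[end:]

original = b'O:4:"User":2:{s:8:"username";s:6:"carlos";s:7:"isAdmin";b:0;}'
print(set_string_property(original, b"username", b"administrator").decode())
# O:4:"User":2:{s:8:"username";s:13:"administrator";s:7:"isAdmin";b:0;}
```

Working on bytes rather than text keeps the prefixes correct for multi-byte characters, since the format counts bytes.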

When working directly with binary formats, we recommend using the Hackvertor extension, available from the BApp Store. With Hackvertor, you can modify the serialized data as a string, and it will automatically update the binary data, adjusting the offsets accordingly. This can save you a lot of manual effort.

Using application functionality

As well as simply checking attribute values, a website’s functionality might also perform dangerous operations on data from a deserialized object. In this case, you can use insecure deserialization to pass in unexpected data and leverage the related functionality to do damage.

For example, as part of a website’s “Delete user” functionality, the user’s profile picture is deleted by accessing the file path in the $user->image_location attribute. If this $user was created from a serialized object, an attacker could exploit this by passing in a modified object with image_location set to an arbitrary file path. Deleting their own user account would then delete this arbitrary file as well.

This example relies on the attacker manually invoking the dangerous method via user-accessible functionality. However, insecure deserialization becomes much more interesting when you create exploits that pass data into dangerous methods automatically. This is made possible by the use of “magic methods”.

Magic methods

Magic methods are a special subset of methods that you do not have to explicitly invoke. Instead, they are invoked automatically whenever a particular event or scenario occurs. Magic methods are a common feature of object-oriented programming in various languages. They are sometimes indicated by prefixing or surrounding the method name with double underscores.

Developers can add magic methods to classes to determine in advance what code should be executed when the corresponding event or scenario occurs. The exact time and reason for calling a magic method varies from method to method. One of the most common examples in PHP is __construct(), which is called when an object of a class is instantiated, similar to Python’s __init__. Typically, constructor magic methods like this include code to initialize instance properties. However, developers can customize magic methods to execute any code they want.

Magic methods are widely used and do not represent a vulnerability on their own. However, they can become dangerous when the code that they execute handles attacker-controllable data, for example, from a deserialized object. This can be exploited by an attacker to automatically invoke methods on the deserialized data when the corresponding conditions are met.

Most importantly in this context, some languages have magic methods that are invoked automatically during the deserialization process. For example, PHP’s unserialize() function looks for and invokes an object’s __wakeup() magic method.

In Java deserialization, the same applies to the readObject() method, which essentially acts like a constructor for “re-initializing” a serialized object. The ObjectInputStream.readObject() method is used to read data from the initial byte stream. However, serializable classes can also declare their own readObject() method as follows:

private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {...}

This gives the class tighter control over the deserialization of its fields. Most importantly, the readObject() method declared in this way acts as a magic method called during deserialization.

You should keep an eye out for any classes that contain this kind of magic method. They allow you to pass data from the serialized object to the site code before the object is fully deserialized. This is a starting point for more advanced exploits.
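Python’s pickle module has a comparable hook: if a class defines `__setstate__`, pickle calls it automatically while rebuilding the object. The sketch below uses a hypothetical `Profile` class and a harmless list to record the call, and it only unpickles data created in the same script.

```python
import pickle

calls = []  # records hook invocations so the effect is visible

class Profile:
    def __init__(self, avatar_path):
        self.avatar_path = avatar_path

    def __setstate__(self, state):
        # Invoked automatically by pickle while the object is being rebuilt.
        self.__dict__.update(state)
        calls.append(f"__setstate__ saw {self.avatar_path}")

blob = pickle.dumps(Profile("/tmp/avatar.png"))
restored = pickle.loads(blob)  # no explicit call to __setstate__ anywhere

print(calls)  # ['__setstate__ saw /tmp/avatar.png']
```

If the hook did something riskier with `avatar_path` than appending to a list, whoever controlled the serialized bytes would control that operation.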

Injecting arbitrary objects

As we have seen, insecure deserialization can occasionally be exploited by editing objects provided by a web site. Injecting arbitrary objects, however, opens up more possibilities.

In object-oriented programming, the methods available to an object are determined by its class. Therefore, if an attacker can manipulate which class of object is being passed in as serialized data, they can influence what code is executed after, and even during, deserialization.

Deserialization methods do not typically check what they are deserializing. This means that you can pass in objects of any serializable class that is available to the website, and the object will be deserialized. This effectively allows an attacker to create instances of arbitrary classes. The fact that this object is not of the expected class does not matter. The unexpected object type might cause an exception in the application logic, but the malicious object will already be instantiated by then.
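The same behavior can be observed with Python’s pickle module, along with one way to constrain it. In this sketch, the first part shows that the class named in the stream is instantiated regardless of what the application expected. The second part overrides `Unpickler.find_class` with an allow-list, an approach described in the Python documentation. The class name and the empty allow-list are illustrative choices, and this reduces rather than removes the risk.

```python
import datetime
import io
import pickle

# The application expects a plain dict, but the stream names another class.
unexpected = pickle.dumps(datetime.date(2020, 1, 1))
obj = pickle.loads(unexpected)  # instantiated regardless of expectations
print(type(obj).__name__)       # date

class AllowListUnpickler(pickle.Unpickler):
    # Empty allow-list: only built-in containers and scalars can be rebuilt,
    # because those never go through find_class.
    ALLOWED = frozenset()

    def find_class(self, module, name):
        if (module, name) not in self.ALLOWED:
            raise pickle.UnpicklingError(f"class {module}.{name} is not allowed")
        return super().find_class(module, name)

def restricted_loads(data: bytes):
    """Unpickle data while refusing any class outside the allow-list."""
    return AllowListUnpickler(io.BytesIO(data)).load()

print(restricted_loads(pickle.dumps({"user": "carlos"})))  # {'user': 'carlos'}
```

Calling `restricted_loads(unexpected)` raises `pickle.UnpicklingError` before any `datetime.date` object is created.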

If an attacker has access to the source code, they can study all of the available classes in detail. To construct a simple exploit, they would look for classes containing deserialization magic methods, then check whether any of them perform dangerous operations on controllable data. The attacker can then pass in a serialized object of this class to use its magic method for an exploit.

Classes containing these deserialization magic methods can also be used to initiate more complex attacks involving a long series of method invocations, known as a “gadget chain”.

Gadget chains

A “gadget” is a snippet of code that exists in the application and can help an attacker achieve a particular goal. An individual gadget may not directly do anything harmful with user input. However, the attacker’s goal might simply be to invoke a method that will pass their input into another gadget. By chaining multiple gadgets together in this way, an attacker can potentially pass their input into a dangerous “sink gadget”, where it can cause maximum damage.

It is important to understand that, unlike some other types of exploit, a gadget chain is not a payload of chained methods constructed by the attacker. All of the code already exists on the website. The only thing the attacker controls is the data that is passed into the gadget chain. This is typically done using a magic method that is invoked during deserialization, sometimes known as a “kick-off gadget”.

In the wild, many insecure deserialization vulnerabilities are only exploitable through the use of gadget chains. This can sometimes be a simple one- or two-step chain, but constructing high-severity attacks will likely require a more elaborate sequence of object instantiations and method invocations. Therefore, being able to construct gadget chains is one of the key aspects of successfully exploiting insecure deserialization.
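The idea of a two-step chain can be sketched with a toy model in Python. Everything here is hypothetical: the classes, the `wakeup()` hook, and the JSON-based `toy_unserialize` function merely imitate a deserializer that fires a magic method, and the “dangerous” sink only appends to a list. Note that the attacker supplies nothing but data; both gadgets are code that already exists in the application.

```python
import json

deleted = []  # stand-in for the file system, so the demo is harmless

class TempFile:
    """Sink gadget: cleanup() does something dangerous with its data."""
    def __init__(self):
        self.path = None

    def cleanup(self):
        deleted.append(self.path)  # real code might delete the file here

class Session:
    """Kick-off gadget: its hook runs automatically during deserialization."""
    def __init__(self):
        self.resource = None

    def wakeup(self):
        if self.resource is not None:
            self.resource.cleanup()

REGISTRY = {"TempFile": TempFile, "Session": Session}

def toy_unserialize(node):
    """Rebuild objects from {"class": ..., "props": ...} and fire wakeup()."""
    if not (isinstance(node, dict) and "class" in node):
        return node
    obj = REGISTRY[node["class"]]()
    for key, value in node["props"].items():
        setattr(obj, key, toy_unserialize(value))
    if hasattr(obj, "wakeup"):
        obj.wakeup()
    return obj

# Attacker-controlled data: a Session whose resource is a crafted TempFile.
payload = json.dumps({
    "class": "Session",
    "props": {"resource": {"class": "TempFile",
                           "props": {"path": "/var/www/index.php"}}},
})
toy_unserialize(json.loads(payload))
print(deleted)  # ['/var/www/index.php']
```

Neither class is harmful in normal use; the damage comes from combining them with attacker-chosen attribute values.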

Working with pre-built gadget chains

Manually identifying gadget chains can be a fairly arduous process, and is almost impossible without source code access. Fortunately, there are a few options for working with pre-built gadget chains that you can try first.

There are several tools available that provide a range of pre-discovered chains that have been successfully exploited on other websites. Even if you don’t have access to the source code, you can use these tools to both identify and exploit insecure deserialization vulnerabilities with relatively little effort. This approach is made possible by the widespread use of libraries that contain exploitable gadget chains. For example, if a gadget chain in Java’s Apache Commons Collections library can be exploited on one website, any other website that uses this library may also be exploitable using the same chain.

One such tool for Java deserialization is “ysoserial”. You simply specify a library that you think the target application is using, then provide a command that you want to try to execute, and the tool creates an appropriate serialized object based on a known gadget chain for the given library. This still involves a certain amount of trial and error, but it is considerably less labor-intensive than constructing your own gadget chains manually.

Most languages that frequently suffer from insecure deserialization vulnerabilities have equivalent proof-of-concept tools. For example, for PHP-based sites, you can use “PHP Generic Gadget Chains” (PHPGGC).

It is important to note that the vulnerability is not the presence of a gadget chain in the website’s code or any of its libraries. The vulnerability is the deserialization of user-controllable data, and the gadget chain is simply a means of manipulating the flow of this data once it has been injected. This also applies to various memory corruption vulnerabilities that rely on the deserialization of untrusted data. Therefore, a website may still be vulnerable even if its developers somehow managed to plug every possible gadget chain.

Use a documented gadget chain

You can see if there are any documented exploits that could be used to attack your target website. Even without a specialized tool for automatically generating serialized objects, you can still find documented gadget chains for popular frameworks and adjust them manually.

If you can’t find a gadget chain to work with, you can still gain valuable knowledge that you can use to create your own custom exploits.

Create your own exploit

When off-the-shelf gadget chains and documented exploits don’t work, you need to create your own exploits.

In order to successfully build your gadget chain, you will almost certainly need access to the source code. The first step is to examine the source code to identify the classes that contain the magic methods called during deserialization. Evaluate the code executed by this magic method to see if it does anything dangerous directly with user-controlled properties.

If the magic method is not exploitable on its own, it can serve as the kick-off gadget for a gadget chain. Study any methods that the kick-off gadget invokes. Do any of these do something dangerous with data that you control? If not, take a closer look at each of the methods that they subsequently invoke, and so on.

Repeat this process, tracking the values you can access until you reach a dead end or identify a dangerous sink gadget to which your controllable data is passed.

Once you work out how to successfully construct a gadget chain within the application code, the next step is to create a serialized object containing your payload. This is simply a case of studying the class declaration in the source code and creating a valid serialized object with the appropriate values required for your exploit. As we saw in the earlier examples, this is relatively simple when working with string-based serialization formats.

Working with binary formats, such as when constructing a Java deserialization exploit, can be particularly cumbersome. When making minor changes to an existing object, you might be comfortable working directly with the bytes. However, when making more significant changes, such as passing in a completely new object, this quickly becomes impractical. It is often much simpler to write your own code in the target language in order to generate and serialize the data yourself.

When creating your own gadget chain, look out for opportunities to use this extra attack surface to trigger secondary vulnerabilities.

By carefully studying the source code, you can discover longer gadget chains that potentially allow you to construct high-severity attacks, often including remote code execution.

PHAR deserialization

So far, we have mainly looked at exploiting deserialization vulnerabilities where the website explicitly deserializes user input. However, in PHP it is sometimes possible to exploit deserialization even if there is no obvious use of the unserialize() function.

PHP provides several URL-style wrappers that you can use for handling different protocols when accessing file paths. One of these is the phar:// wrapper, which provides a stream interface for accessing PHP Archive (.phar) files.

The PHP documentation reveals that PHAR manifest files contain serialized metadata. Crucially, in PHP 7.x and earlier, if you perform any filesystem operation on a phar:// stream, this metadata is implicitly deserialized. PHP 8.0 and later no longer unserialize it automatically. This means that a phar:// stream can potentially be a vector for exploiting insecure deserialization, provided that you can pass this stream into a filesystem method.

In the case of obviously dangerous filesystem methods, such as include() or fopen(), websites are likely to have implemented countermeasures to reduce the potential for them to be used maliciously. However, methods such as file_exists(), which are not so overtly dangerous, may not be as well protected.

This technique also requires you to upload the PHAR to the server somehow. One approach is to use an image upload function, for example. If you are able to create a polyglot file, with a PHAR masquerading as a simple JPG, you can sometimes bypass the website’s validation checks. If you can then force the website to load this polyglot “JPG” from a phar:// stream, any harmful data you inject via the PHAR metadata will be deserialized. As the file extension is not checked when PHP reads a stream, it does not matter that the file uses an image extension.

As long as the class of the object is supported by the website, both the __wakeup() and __destruct() magic methods can be invoked in this way, allowing you to potentially kick off a gadget chain using this technique.
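On the defensive side, an upload handler could look for signs of such a polyglot before storing the file. The Python sketch below is a heuristic under stated assumptions: it relies on the `__HALT_COMPILER();` token that ends the stub of a native-format PHAR, so it would not catch tar- or zip-based archives, and the function name and list of image signatures are illustrative.

```python
# Token that concludes the stub of a native-format PHAR archive.
PHAR_STUB_MARKER = b"__HALT_COMPILER();"

# Leading bytes of common image formats: JPEG, PNG, GIF.
IMAGE_MAGIC = (b"\xff\xd8\xff", b"\x89PNG\r\n\x1a\n", b"GIF8")

def is_suspicious_upload(data: bytes) -> bool:
    """Flag files that start like an image but embed a PHAR stub marker."""
    looks_like_image = data.startswith(IMAGE_MAGIC)
    return looks_like_image and PHAR_STUB_MARKER in data

polyglot = b"\xff\xd8\xff\xe0" + b"<?php __HALT_COMPILER(); ?>" + b"manifest..."
print(is_suspicious_upload(polyglot))                      # True
print(is_suspicious_upload(b"\xff\xd8\xff\xe0 pixels"))    # False
```

A check like this complements, rather than replaces, refusing to pass user-influenced paths to filesystem functions.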

Exploiting deserialization using memory corruption

Even without the use of gadget chains, it is still possible to exploit insecure deserialization. If all else fails, there are often publicly documented memory corruption vulnerabilities that can be exploited via insecure deserialization. These typically lead to remote code execution.

Deserialization methods, such as PHP’s unserialize(), are rarely hardened against these kinds of attacks and expose a huge amount of attack surface. This is not always considered to be a vulnerability in its own right because these methods are not intended to handle user-controllable input in the first place.