About the tool

Existing tools

There are already plenty of tools available for taint analysis. Among them, Triton and Bincat interested me most, because both are quite mature. However, we could not use either of them, because neither supports the MIPS architecture used by the target device.

Using angr for symbolic execution

So we focused on building our own tool with angr, a Python-based binary analysis framework. We chose angr because it supports most architectures, including the MIPS and ARM architectures we targeted. Earlier, @Puzzor had made some custom modifications to angr for static taint analysis: angr simulates the program through symbolic execution, and static analysis is then performed on the program trace information generated from the VEX IR. In this way, we successfully found a command injection vulnerability in the test firmware.

However, we soon ran into a problem: to generate the program trace information, angr had to simulate every function instruction by instruction and use symbolic execution to decide whether to follow each branch.

Specifically, angr maintains a stack of states. A state contains information such as register values and memory contents. When simulating a function, angr starts with a single state. When it encounters a branch instruction and cannot tell whether the branch will be taken, it duplicates the state: one copy follows the branch and the other does not.

Most functions contain loops. If a loop condition depends on user input, the stack of states “explodes”: since angr can never be sure whether to continue or break out of the loop, it keeps duplicating states. Note also that these states are not simulated in parallel; only one state is simulated at a time. As a result, it can take a long time for any state to reach the vulnerable code, and if the function contains no vulnerable code at all, the simulation may never end.
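
To get a feel for why the state stack blows up, here is a toy model in plain Python. It does not use angr. It assumes a hypothetical function whose loop has an input-dependent exit condition and one input-dependent `if` in its body, and it only counts the states a symbolic executor would have to keep around.

```python
def explore_loop(iterations):
    """Count the states forked by a loop with a symbolic exit condition.

    Toy model (not angr): the loop body holds one input-dependent `if`,
    and the loop condition itself also depends on user input.
    """
    in_loop = [()]   # each state is the tuple of branch decisions made so far
    exited = []      # states that left the loop and still need simulating
    for _ in range(iterations):
        next_in_loop = []
        for path in in_loop:
            for body_branch in (True, False):   # fork on the `if` in the body
                forked = path + (body_branch,)
                next_in_loop.append(forked)     # copy that stays in the loop
                exited.append(forked)           # copy that breaks out
        in_loop = next_in_loop
    return len(in_loop), len(exited)


if __name__ == "__main__":
    for n in (1, 5, 10, 12):
        print(n, explore_loop(n))
```

In this model, n loop iterations leave 2^n states inside the loop and 2^(n+1) - 2 states past it, and a real engine would still have to simulate them one at a time.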

As a symbolic execution framework, angr provides a variety of customizable settings (called exploration techniques) for deciding which state to simulate first. But even after trying many different techniques, we could not improve the execution time.

For example, with a two-minute timeout set for each function in the binary under analysis, some binaries were still not fully analyzed after two hours (a function that is not vulnerable is simulated until the timeout expires). To make matters worse, angr had an unknown memory leak, so the computer ran out of memory long before the two hours were up.

Remember, the goal was to make this tool faster than the manual approach. This approach was clearly not workable, so we kept looking for improvements or alternatives.

Using angr's Reaching Definitions analysis

Eventually, we stumbled upon angr’s Reaching Definitions analysis, which caught our interest, and we studied the following material:

◼ A reaching definition engine for binary analysis built-in in angr

◼ Handle function calls during static analysis in angr

◼ CSE545 Guest Lecture: Binary Analysis

Use-def relationships

In short, this analysis generates the use-def relationships between the atoms in a function. Atoms are analogous to variables and come in several types, such as registers, stack variables, and heap variables. For most purposes, you can simply think of atoms as variables. Here’s an example:

In the above function, there is an obvious command injection vulnerability: the vulnerable code is system(command), where the injected command comes from the name parameter of the query string. In this function, the use-def relationships between queryString and the other atoms look like this:

First, we see that queryString is defined as a parameter of the function vuln and is used as an argument by get_queryString_value. In addition, get_queryString_value uses a second argument, the string name. Finally, the return value of get_queryString_value is defined using these two arguments.

Next, as the figure below shows, sprintf is called with the name variable (the return value of get_queryString_value) and the string echo %s >> /tmp/log. This time, things are slightly different. Since we know that the first argument of sprintf is the destination buffer, we must define command so that it uses both of the other arguments passed to sprintf, not just the function's return value. The generated use-def relationship looks like this:

 

Following the same idea, the analysis generates use-def relationships for all atoms in the function. As we saw above, these relationships can be modeled as a graph in which definitions are nodes and uses are edges. This turns the task into a graph analysis problem.

In taint analysis terminology, a source is a place in the program where tainted data is introduced, and a sink is a place where tainted data may or may not arrive. Taint analysis determines whether data from a source reaches a sink. In the example above, get_queryString_value is the source because it extracts a value from user input, and system is the sink. In this case, data from the source does reach the sink.
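
To make the source-to-sink flow concrete, the following Python sketch rebuilds the command string the same way the sprintf call does and tokenizes it roughly as a POSIX shell would, without executing anything. The attacker value and the quoted variant are assumptions added for illustration; the quoting is one possible mitigation, not the vendor's fix.

```python
import shlex


def build_command(name):
    # Same shape as sprintf(command, "echo %s >> /tmp/log", name): no escaping.
    return "echo %s >> /tmp/log" % name


def build_command_quoted(name):
    # One possible mitigation (an assumption): quote the value before use.
    return "echo %s >> /tmp/log" % shlex.quote(name)


def shell_tokens(command):
    # Tokenize roughly the way a POSIX shell would, without running anything.
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True
    return list(lexer)


if __name__ == "__main__":
    tainted = "x; reboot"   # hypothetical attacker-controlled `name` value
    print(shell_tokens(build_command(tainted)))
    print(shell_tokens(build_command_quoted(tainted)))
```

In the unquoted case the `;` shows up as a separate token, which means the shell would treat whatever follows it as a second command. In the quoted case the whole value stays a single argument to echo.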

In our use-def graph, we identify the definitions (nodes) that correspond to the source and the sink, and then traverse the graph using some heuristics to determine whether data from the source is used by the sink. If so, we flag the finding as potentially vulnerable and triage it further.
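
As a minimal sketch of this idea, the Python below encodes the use-def relationships from the example as a plain adjacency list and checks whether the source definition reaches the sink. The node labels are made up for readability, and a simple breadth-first search stands in for the heuristics the tool actually uses, which are not described here.

```python
from collections import deque

# Edge A -> B means "the definition B uses the definition A".
USE_DEF_GRAPH = {
    "queryString (parameter of vuln)": ["return value of get_queryString_value"],
    "'name' (string constant)": ["return value of get_queryString_value"],
    "return value of get_queryString_value": ["command (written by sprintf)"],
    "'echo %s >> /tmp/log' (string constant)": ["command (written by sprintf)"],
    "command (written by sprintf)": ["argument of system"],
}

SOURCE = "return value of get_queryString_value"
SINK = "argument of system"


def find_taint_path(graph, source, sink):
    """Breadth-first search; returns the definitions from source to sink, or None."""
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == sink:
            return path
        for user in graph.get(node, []):
            if user not in seen:
                seen.add(user)
                queue.append(path + [user])
    return None


if __name__ == "__main__":
    print(find_taint_path(USE_DEF_GRAPH, SOURCE, SINK))
```

Returning the whole path rather than a boolean makes triage easier, since each hop can be checked against the decompiled code.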

Summary of the tool

In summary, our tool first uses angr’s Reaching Definitions analysis to generate a use-def graph for the functions in the router firmware. It then analyzes the graph to detect possible vulnerabilities. For example, if user input (from a source) reaches a dangerous function (a sink) such as system, we consider it a potential vulnerability.

In fact, our tool works much like engines such as CodeQL or Joern, except that it lacks their powerful query interface.

Test results

As mentioned earlier, with symbolic execution it could sometimes take more than two hours to analyze a program. With the approach described above, analyzing the same program takes about two minutes, so it did seem like a promising tool. After improving the tool to eliminate false positives and reduce false negatives, we tested it on DLink and PROLiNK routers.

PROLiNK PRC2402M

Using the tool, we immediately discovered nearly 20 command injection vulnerabilities, 10 of which required no authentication and were reachable directly through the WAN interface. We reported the bugs to PROLiNK right away, and they responded quickly. After the vulnerabilities were fixed, we applied for CVE IDs, and CVE-2021-35400 through CVE-2021-35409 were assigned. Here are some vulnerable code snippets, where the sources and sinks are:

◼ Sources: web_get

◼ Sinks: system, do_system, popen

A hard-coded password? A backdoor?

In the process, I discovered some other security vulnerabilities. There appeared to be a hard-coded or backdoor password that could be used to log in to the router’s administration panel. The administration page sends the MD5 hash of the user-supplied password to login.cgi for authentication, and the corresponding pseudo-code looks like this:

However, it is followed by a suspicious piece of code:

By using user as the password, we managed to log in to the administration page. After logging in this way, the dashboard looks slightly different and seems to offer less functionality than it does for a user who logs in with the actual password. However, we could still access prc2402m.setup/setting.sht…

We reported this to the vendor, and they quickly updated the firmware. To make sure the backdoor was gone, I opened the same function again. This time, instead of strcat(salted_password, "user"), I saw the following:

 

In fact, the value of Password_backup is easy to retrieve from NVRAM.

We promptly reported this problem to the vendor as well. Fortunately, after the second fix, no debug or backdoor password could be found.

Stack-based buffer overflow

We also found a number of stack-based buffer overflow vulnerabilities caused by missing bounds checks. By exploiting them, an attacker can overwrite the return address on the stack and take control of program execution. As some of the command injection examples above show, user input is copied into a string with sprintf rather than snprintf.

Denial of service vulnerability

While testing the PoC for the buffer overflow vulnerability, I also found a vulnerability that caused the router to stop responding to requests until it was manually restarted with the power button. In the pseudocode below, cli_num is passed as an argument to the /sbin/sta_qos.sh script.

 

By examining the script contents, I found the following for loop, where $sta_num is used to hold the value of cli_num.

If cli_num is a large value such as 999999999, the script stays in the loop for so long that it is effectively an infinite loop. By sending such requests to the router, many instances of the script are started and get stuck in the loop. After a while, the router stops responding to any requests, and only a manual reboot restores normal operation.

Timeline

● June 9: 10 command injection vulnerabilities were reported to the vendor.

● June 11: The vendor fixed the corresponding bugs.

● June 11: We advised the vendor to add some extra filtering to prevent such vulnerabilities.

● June 28: The vendor applied fixes based on our suggestions.

● July 9: Three more vulnerabilities (backdoor, buffer overflow, DoS) were reported to the vendor.

● July 23: The vendor fixed them.

DLink DIR-1960

In addition to the PROLiNK router, we also ran the tool on the DIR-1960 firmware. This time, the tool returned nearly 200 results. After triaging them, however, only four turned out to be command injection vulnerabilities, all in the HNAP API (we had reported these earlier) and all requiring authentication. (So there is a lot of room for improvement in eliminating false positives!)

Here’s a quick introduction to HNAP: it stands for Home Network Administration Protocol, a SOAP-based protocol for communicating with the router’s administration panel.

DLink DIR-X1560

Next, I decided to try the tool on the DIR-X1560 firmware as well. Both of the routers above are MIPS-based, whereas the DIR-X1560 runs on an ARM processor. With a few tweaks, the tool analyzed the ARM firmware correctly. I was happy to see that the tool works across architectures.

However, with so many layers of abstraction, identifying vulnerabilities in this firmware is not that simple, and the tool was a great help in reverse engineering it. I’m not sure of the exact name of the framework on which the firmware is based, but I managed to find some source code on GitHub, which was very helpful because it contains a lot of comments. The closest terms I found in the source code are CMS (CPE Management System), CPE (Customer Premises Equipment), and TR-069. Note, however, that this repo does not contain any DLink-specific code, so some reverse engineering is still needed.

In my opinion, it resembles an MVC (Model-View-Controller) architecture, although I may be wrong.

 

For further explanations of terms and abbreviations, see here.

The DAL (Data Aggregation Layer) API, as the name suggests, is used to interact with data, mainly the router configuration. The actual storage of the data, however, is handled by the MDM (Memory Data Model) and ODL (Object Dispatch Layer) APIs. DAL uses the cmsObj_get and cmsObj_set functions (or variants of them) as its interface to MDM/ODL to get or set the value of certain objects. For example, the code to get the IP_PING_DIAG MDM object, store it in ipPingObj, and then save it back after modification looks like this:

 

Here are the parameters used:

◼ MDMOID_DEV2_IP_PING_DIAG: an enum value that selects the IP_PING_DIAG object.

◼ iidStack: internal data that we don’t need to care about.

◼ ipPingObj: the contents of the IP_PING_DIAG object.

In addition, there are the RCL (Runtime Configuration Layer) and RUT (Runtime Utilities) APIs. Each MDM object (such as MDMOID_DEV2_IP_PING_DIAG) has a corresponding RCL handler (rcl_dev2IpPingDiagObject). Each time cmsObj_set is called, ODL calls the object’s RCL handler, which in turn calls utility functions in RUT.

Through reverse engineering, we found that the workflow is as follows:

1. The user makes a POST request to the HNAP API (e.g. SetTimeSettings).

2. The HNAP API handler calls the DAL API (e.g. cmsDal_setNtpCfgDLink_dev2).

3. The DAL API calls the MDM/ODL API (cmsObj_set) to set the MDM object (e.g. Dev2TimeDlinkObject), for example cmsObj_set(MDMOID_DEV2_TIME_DLINK, &iidStack, 0, &timeDlinkObj).

4. The ODL API calls the RCL handler (e.g. rcl_dev2TimeDlinkObject).

5. The RCL handler calls the RUT API (e.g. rut_TZ_Nvram_update).

If we look at the HNAP and RUT functions mentioned above, we will see:

 

After a long journey, the NTPServer parameter finally appears in a command passed to system.

As we saw above, a user-supplied string (from HNAP) passes through a number of functions and eventually reaches a call to system (in RUT), resulting in a command injection vulnerability. Had I looked through the firmware manually, it would have taken me almost forever to find this unless I was very lucky. With the help of the tool, although it cannot connect HNAP to RUT directly, I was at least able to list the relevant DAL functions, which saved me a lot of time.

Relationship between DAL and RCL/RUT

Here, we take a closer look at the relationship between the DAL API and the RCL/RUT APIs. The pseudo-code of cmsDal_setNtpCfgDLink_dev2 (the DAL function called by the HNAP API, as described earlier) is as follows:

 

The code snippet above shows the typical procedure a DAL function follows to set or update an MDM object. Note that cmsObj_get is called with an MDMOID (MDM object ID) of 0x416. Without the source code, I can only see the value (0x416), not the enum name (MDMOID_DEV2_TIME_DLINK), which I inferred from the function name and some strings in the firmware.

As mentioned earlier, when cmsObj_set is called, the ODL API calls the corresponding RCL handler, in this case rcl_dev2TimeDlinkObject. I won’t go into the implementation details of cmsObj_set because it is quite complex, with a lot of checks and function calls. If you’re interested, check out this line, which calls the RCL handler.

Obtaining the mapping between MDMOIDs and RCL handlers is not difficult, as it is stored in the firmware’s OID table, as shown below:

 

In this table, it is easy to see that 0x416 is the MDMOID of the TimeDlink MDM object and that rcl_dev2TimeDlinkObject is the RCL handler. Here, too, we see something called the STL handler, but it doesn’t do much.

Now, the RCL handler rcl_dev2TimeDlinkObject looks like this:

 

We see that newMdmObj is passed to the vulnerable function rut_TZ_Nvram_update (described earlier). newMdmObj is the same timeDlinkObj that the DAL function passed to cmsObj_set. The relationship between DAL and RCL is therefore as shown in the figure below:

 

The DAL function passes an MDMOID and an object to cmsObj_set, and then:

◼ The MDMOID determines which RCL handler is invoked.

◼ The object is handed to that RCL handler for processing.
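
The dispatch described above can be modeled in a few lines of Python. This is only a sketch of the control flow under stated assumptions: the Python names are stand-ins for the firmware functions, the `ntpServer` field name is assumed, and the real cmsObj_set does far more checking than a dictionary lookup.

```python
MDMOID_DEV2_TIME_DLINK = 0x416   # value observed in the firmware, per the text


def rut_tz_nvram_update(obj, trace):
    # Stand-in for rut_TZ_Nvram_update: the real function builds a command
    # for system(); here we only record which value would end up in it.
    trace.append("rut_TZ_Nvram_update uses %r" % obj["ntpServer"])


def rcl_dev2_time_dlink_object(new_mdm_obj, trace):
    # Stand-in for rcl_dev2TimeDlinkObject: forwards the object to RUT.
    trace.append("rcl_dev2TimeDlinkObject")
    rut_tz_nvram_update(new_mdm_obj, trace)


# Hypothetical in-memory OID table: MDMOID -> RCL handler.
OID_TABLE = {MDMOID_DEV2_TIME_DLINK: rcl_dev2_time_dlink_object}


def cms_obj_set(mdmoid, mdm_obj, trace):
    # The MDMOID selects the handler; the object is passed through unchanged.
    handler = OID_TABLE[mdmoid]
    handler(mdm_obj, trace)


def cms_dal_set_ntp_cfg(ntp_server, trace):
    # Stand-in for the DAL function: fill the object, then call cmsObj_set.
    time_dlink_obj = {"ntpServer": ntp_server}   # field name is an assumption
    cms_obj_set(MDMOID_DEV2_TIME_DLINK, time_dlink_obj, trace)


if __name__ == "__main__":
    calls = []
    cms_dal_set_ntp_cfg("pool.example; id", calls)
    print(calls)
```

The point of the model is that nothing in the DAL function names the RCL handler directly. The only link between the two is the MDMOID, which is why the OID table matters so much when tracing data by hand.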

As we can see, from a DAL function, it is not difficult to find out which RCL function is called, because we not only have the MDMOID, but also can refer to the OID table above. But when it comes to finding a command injection vulnerability, the steps are reversed.

First, using the tool, I find a potentially vulnerable RCL/RUT function, where the source is the function’s argument and the sink is system (or a variant of it). Nothing new here. Next, however, I need to find the DAL function that accesses the relevant MDM object. In other words, I know the MDMOID as before, but instead of looking for an RCL handler, I now have to answer the following question: which DAL functions call cmsObj_set with this MDMOID?

 

At first, I took a naive approach: I went through the cross-references to cmsObj_set one by one, looking for calls made with the right MDMOID. There were over 200 cross-references, so I gave up after a few minutes. Instead, I decided to use the tool to filter for functions that use a particular MDMOID. In particular, I only care about MDM object fields that are strings/buffers; a field that holds an integer is useless for command injection or buffer overflow.
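
Conceptually, the filtering amounts to a reverse lookup, which the following Python sketch illustrates. It assumes the cmsObj_set call sites and their MDMOID arguments have already been extracted somehow; apart from cmsDal_setNtpCfgDLink_dev2, rcl_dev2TimeDlinkObject, and 0x416, every name and value in it is hypothetical.

```python
from collections import defaultdict

# Hypothetical call sites: (calling DAL function, MDMOID passed to cmsObj_set).
CMS_OBJ_SET_CALLS = [
    ("cmsDal_setNtpCfgDLink_dev2", 0x416),
    ("cmsDal_hypotheticalA", 0x123),
    ("cmsDal_hypotheticalB", 0x416),
    ("cmsDal_hypotheticalC", 0x200),
]

# Hypothetical slice of the OID table: MDMOID -> RCL handler name.
OID_TABLE = {0x416: "rcl_dev2TimeDlinkObject", 0x123: "rcl_hypotheticalObject"}


def callers_by_mdmoid(call_sites):
    """Group the calling functions by the MDMOID they pass to cmsObj_set."""
    index = defaultdict(list)
    for caller, mdmoid in call_sites:
        index[mdmoid].append(caller)
    return index


def dal_functions_for_handler(handler, oid_table, call_sites):
    """Map a flagged RCL handler back to the DAL functions that can trigger it."""
    index = callers_by_mdmoid(call_sites)
    result = []
    for mdmoid, name in oid_table.items():
        if name == handler:
            result.extend(index.get(mdmoid, []))
    return result


if __name__ == "__main__":
    print(dal_functions_for_handler("rcl_dev2TimeDlinkObject",
                                    OID_TABLE, CMS_OBJ_SET_CALLS))
```

With over 200 cross-references, building the index once and querying it per handler is far less tedious than checking each call site by hand.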

To review, the string field of an MDM object is set as follows:

 

Therefore, I didn’t need to change the tool much: I just set the source to cmsObj_get and the sink to cmsMem_free. It worked. For each MDMOID, I filtered out a handful of DAL functions that modify the associated MDM object. I then checked the cross-references to these DAL functions to see how the HNAP API calls them and how user input is passed into the MDM object. With the help of the tool, I was able to find four command injection vulnerabilities in the firmware.

Summary

At present, the tool is still at an early stage of development: it can only detect command injection vulnerabilities. In addition, analyzing complex firmware such as the DIR-X1560 still involves some manual work, as the tool cannot automatically determine which HNAP functions are vulnerable. I will continue to improve the tool in the hope that it will one day help discover other classes of firmware vulnerabilities, such as buffer overflows and use-after-free (UAF).
