I’ve just been reading about the CSV Injection attack – a new one to me, and somewhat surprising that it exists. The source of the problem seems to be a gap between what the CSV format was originally conceived to be (a flat-file format containing only “safe” alphanumeric data) and how modern applications such as Microsoft Excel and OpenOffice/LibreOffice Calc actually interpret the contents of a CSV file.
Many web applications allow users to download bulk exports in CSV format. If this attack isn’t mitigated, an attacker who can get their own data into such an export can include a malicious formula that, when the file is opened in a spreadsheet application, could execute an arbitrary command on the user’s machine or exfiltrate data. In most cases the user will be presented with one or more warnings, but these are normally worded in such a way that users are likely to dismiss them, because they deem the CSV file to have originated from a “trusted” source.
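To make this concrete, here is a minimal Python sketch of how attacker-supplied text can pass straight through a naive export. The function names (`export_rows`, `risky_cells`), the sample rows and the list of trigger characters are all my own assumptions for illustration, and the payload is a deliberately harmless `=1+1` rather than a real exploit.

```python
import csv
import io

# Assumed set of leading characters that spreadsheet applications
# commonly treat as the start of a formula (illustrative, not exhaustive).
FORMULA_TRIGGERS = ("=", "+", "-", "@")


def export_rows(rows):
    """Serialise rows to CSV text with no sanitisation (hypothetical export)."""
    buffer = io.StringIO()
    csv.writer(buffer).writerows(rows)
    return buffer.getvalue()


def risky_cells(csv_text):
    """Return every cell that starts with one of the trigger characters."""
    return [
        cell
        for row in csv.reader(io.StringIO(csv_text))
        for cell in row
        if cell.startswith(FORMULA_TRIGGERS)
    ]


# The third row stands in for data submitted by an attacker.
rows = [
    ["name", "comment"],
    ["alice", "Great product"],
    ["mallory", "=1+1"],
]

exported = export_rows(rows)
flagged = risky_cells(exported)  # ['=1+1']
```

The point is that the CSV writer does nothing wrong here: `=1+1` is perfectly valid CSV data. It only becomes a problem when a spreadsheet application decides to evaluate it.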
The mitigation is to ensure that no field within a CSV file begins with the equals sign (=) or any of the other characters that may cause a spreadsheet application to interpret the contents as a formula to be executed, such as plus (+), minus (-) and at (@). Additionally, field separator characters such as comma (,) and semicolon (;) should be disallowed anywhere within the user input, or the field should be properly quoted, as otherwise an attacker could use them to start a new field and defeat the protection above.
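A rough sketch of this mitigation in Python might look like the following. The names `neutralise` and `safe_export` are hypothetical, and the trigger list (which adds tab and carriage return to the characters above) is an assumption rather than a definitive set. Note also that instead of rejecting separators outright, this version leans on the `csv` module to quote every field, and it prefixes risky values with a single quote so that the spreadsheet treats them as text.

```python
import csv
import io

# Assumed leading characters that may trigger formula evaluation.
FORMULA_TRIGGERS = ("=", "+", "-", "@", "\t", "\r")


def neutralise(value):
    """Prefix a risky value with a single quote so it is read as text."""
    text = str(value)
    if text.startswith(FORMULA_TRIGGERS):
        return "'" + text
    return text


def safe_export(rows):
    """Write rows as CSV, neutralising each cell and quoting every field."""
    buffer = io.StringIO()
    # QUOTE_ALL wraps each field in double quotes, so an embedded comma
    # or semicolon cannot start a new field.
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL)
    for row in rows:
        writer.writerow([neutralise(cell) for cell in row])
    return buffer.getvalue()


sample = safe_export([["a,b", "=SUM(A1)"]])
```

One trade-off worth knowing about: a blanket prefix like this also alters legitimate values such as the negative number `-5`, so an application that exports numeric data would probably need a more careful rule for those columns.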