Sharing content on the Internet almost always involves one unavoidable operation: file upload. Every time we post to Weibo or WeChat Moments, the pictures we attach go through an image upload. Users upload local images, videos, and audio files to the application server so that other users can view or download them. As a result, a large amount of data flows into a website every day. That data brings users, but it also brings security problems.
Website developers often find all kinds of XML, HTML, APK, and other junk files in their storage space. These files either inject ads or spread pornographic videos and similar content, which seriously disrupts the website's business. They get into the storage space through the file upload function: if the upload handler does not validate or filter what users submit, attackers can easily upload tampered data to the server.
File upload is one of the most easily exploited weak points in data security. To reduce malicious uploads, we first need to understand how they work.
The roles of file types and file extensions
Computer data is usually stored on hardware such as a hard disk. A hard disk is like a huge warehouse. To make data easier to store and manage, we use the concept of a file: the operating system wraps a piece of data stored on the disk in a file format.
As the Internet developed, the files we store grew from plain text into all kinds of multimedia, such as images, audio, and video. File types became more varied and files became larger. Without a way to tell them apart, finding the right one would be a chore. This is where file formats (or file types) come in. Each type of file can be stored in one or more file formats, and each format is usually identified by one or more extensions that help users and applications recognize it.
For example, in a file named readme.txt, .txt is the extension, and TXT denotes a plain text file. Such a file is probably a plain-text description document.
Extensions also help the operating system decide how to open a file. For example, score.doc can be opened with Word. When a Windows user double-clicks a .doc file, Windows looks up the extension "doc" in the association table it maintains to find a program that can open the file, such as Word. It then launches that program and tells it to load the file.
In other words, when Windows opens a file, it relies only on the extension in the file name to choose a program. Changing a file's extension therefore changes how the system opens it by default. If the file's content does not match the format the program expects, the file may open with errors or produce unexpected results.
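The following minimal Java sketch makes this concrete. It models an extension-based association table in which the program is picked from the file name alone. The table entries and program names are hypothetical, and a real operating system keeps a much larger table.

```java
import java.util.Locale;
import java.util.Map;

// Toy model of a file-association table: the program is chosen from the extension alone.
class ExtensionAssociation {
    // Hypothetical entries, for illustration only.
    private static final Map<String, String> PROGRAMS = Map.of(
            "txt", "Notepad",
            "doc", "Word",
            "jpg", "Photos");

    // Returns the text after the last dot, lower-cased, or "" when there is no dot.
    static String extensionOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    // Only the name is consulted; the file's bytes are never read.
    static String programFor(String fileName) {
        return PROGRAMS.getOrDefault(extensionOf(fileName), "(ask the user)");
    }

    public static void main(String[] args) {
        System.out.println(programFor("score.doc")); // Word
        System.out.println(programFor("score.txt")); // Notepad, even if the bytes are identical
    }
}
```

Renaming score.doc to score.txt changes the program that is launched, even though the content is unchanged.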
How does the browser identify file types
As Internet tools improve, you are more likely to open a file in a browser than locally. So how does the browser determine the type of the resource it is accessing? It relies on the response headers.
When a user enters a URL, the server hosting the resource responds with a Content-Type header. Its value is the file's media type (MIME type). If the browser supports that format, it tries to render the file.
Unlike Windows, browsers generally rely on the MIME type rather than the file extension in the URL. Setting the correct MIME type in the response header is therefore important. If it is misconfigured, the browser may misinterpret the file's content or mishandle a download, which can disrupt the normal operation of the website.
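On the server side, this usually means mapping each stored file to a Content-Type value before sending it. The minimal Java sketch below shows one way to do that. The small lookup table is a hypothetical stand-in for a real server's MIME registry. The fallback to `application/octet-stream` reflects a common convention: browsers usually download that type rather than render it.

```java
import java.util.Locale;
import java.util.Map;

// Chooses the Content-Type header value a server would send for a stored file.
class ContentTypeMapper {
    // Hypothetical, deliberately small table; real servers ship a much larger list.
    private static final Map<String, String> MIME_BY_EXTENSION = Map.of(
            "html", "text/html; charset=utf-8",
            "txt", "text/plain; charset=utf-8",
            "jpg", "image/jpeg",
            "jpeg", "image/jpeg",
            "png", "image/png",
            "apk", "application/vnd.android.package-archive");

    // Unknown extensions fall back to a generic binary type.
    static String contentTypeFor(String path) {
        int dot = path.lastIndexOf('.');
        String ext = dot < 0 ? "" : path.substring(dot + 1).toLowerCase(Locale.ROOT);
        return MIME_BY_EXTENSION.getOrDefault(ext, "application/octet-stream");
    }

    public static void main(String[] args) {
        System.out.println("Content-Type: " + contentTypeFor("/img/test.jpg")); // Content-Type: image/jpeg
    }
}
```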
How are malicious files uploaded
As mentioned at the beginning, some malicious resources reach a site through file uploads. These files appear to be in normal formats, but when opened they behave unexpectedly, for example by redirecting to another web page. How does this work?
The principle is simple: the attacker manipulates the MIME type. Take a file named test.jpg as an example. Although the URL points to a resource with a .jpg suffix, the resource is actually served as text/html, which is the web page type.
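One common defense against this kind of disguise is magic-number (file signature) sniffing. The server inspects the first bytes of the content instead of trusting the name. The minimal Java sketch below checks the standard JPEG signature (FF D8 FF) and the 8-byte PNG signature, then looks for obvious HTML markers. The marker list and the 512-byte window are illustrative assumptions, not an exhaustive detector.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Locale;

// Guesses a MIME type from the first bytes of a file instead of trusting its name.
class MagicSniffer {
    private static final byte[] JPEG = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF};
    private static final byte[] PNG = {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};

    private static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
                && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    static String sniff(byte[] data) {
        if (startsWith(data, JPEG)) return "image/jpeg";
        if (startsWith(data, PNG)) return "image/png";
        // Decode the head as Latin-1 (one char per byte) and look for HTML markers.
        String head = new String(data, 0, Math.min(data.length, 512), StandardCharsets.ISO_8859_1)
                .stripLeading()
                .toLowerCase(Locale.ROOT);
        if (head.startsWith("<!doctype html") || head.startsWith("<html") || head.contains("<script")) {
            return "text/html";
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        byte[] fakeJpg = "<html><script>location.href='https://example.com'</script></html>"
                .getBytes(StandardCharsets.UTF_8);
        // The name says .jpg, but the content is a web page.
        System.out.println("test.jpg -> " + sniff(fakeJpg)); // test.jpg -> text/html
    }
}
```

A server could reject the upload whenever the sniffed type disagrees with the type implied by the extension.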
A web page of this type can embed JavaScript, which can redirect the user to a specified website as soon as the file is opened. This may look similar to DNS hijacking, but it is a different problem. For more information about DNS hijacking, check out ** ** ** ** ** *
Since these malicious files work by altering the file's MIME type, can we reduce uploads of and access to them by restricting the MIME type and similar measures? Yes, and there are several methods. We will use cloud storage as an example and go through them one by one.
Ways to prevent malicious file uploads
Identity traceability – TOKEN upload
TOKEN authentication computes a token from the uploader's ID, limits how long the upload stays valid, and can pin the upload directory or file suffix. In the common approach, every user uploads through the credentials of a single operator stored on the server. The cloud storage system's TOKEN authentication differs from that approach and provides finer-grained permission control.
With the TOKEN feature enabled, each user can be assigned an independent ID, and the files that user uploads are stored in a separate directory based on that ID.
```
X-Upyun-Uri-Prefix = /<service-name>/client_37ASCII   // User ID prefix; maps to a directory in storage, e.g. /client_37ASCII/
X-Upyun-Uri-Postfix = .jpg                            // Restricts the suffix of uploaded files
```
Files uploaded this way can be traced by ID. If someone uploads a large number of malicious files, you can quickly identify the uploader and deal with them.
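The article does not say how the cloud storage system derives its tokens. The minimal Java sketch below is therefore a hypothetical HMAC-SHA256 scheme, not UPYUN's algorithm. It shows the general idea: a server-side secret signs the user's directory prefix, the allowed suffix, and an expiry time, and an upload is accepted only if the signature verifies and the target path fits those limits. The payload layout and the `sign`/`allows` names are illustrative assumptions.

```java
import java.nio.charset.StandardCharsets;
import java.security.InvalidKeyException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical upload token: payload "prefix|postfix|expiresAt" plus an HMAC-SHA256 signature.
class UploadToken {
    private static byte[] hmac(String secret, String data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException | InvalidKeyException e) {
            throw new IllegalStateException(e);
        }
    }

    // Issued by the application server for one user, e.g. prefix "/client_37ASCII/", postfix ".jpg".
    static String sign(String secret, String prefix, String postfix, long expiresAtEpochSec) {
        String payload = prefix + "|" + postfix + "|" + expiresAtEpochSec;
        Base64.Encoder enc = Base64.getUrlEncoder().withoutPadding();
        return enc.encodeToString(payload.getBytes(StandardCharsets.UTF_8))
                + "." + enc.encodeToString(hmac(secret, payload));
    }

    // Checked at upload time: valid signature, not expired, path inside the user's directory.
    static boolean allows(String secret, String token, String path, long nowEpochSec) {
        String[] parts = token.split("\\.");
        if (parts.length != 2 || path.contains("..")) return false; // also reject path traversal
        byte[] payloadBytes;
        byte[] givenSig;
        try {
            payloadBytes = Base64.getUrlDecoder().decode(parts[0]);
            givenSig = Base64.getUrlDecoder().decode(parts[1]);
        } catch (IllegalArgumentException e) {
            return false;
        }
        String payload = new String(payloadBytes, StandardCharsets.UTF_8);
        // Constant-time comparison of the expected and supplied signatures.
        if (!MessageDigest.isEqual(hmac(secret, payload), givenSig)) return false;
        String[] fields = payload.split("\\|", -1);
        if (fields.length != 3) return false;
        long expiresAt;
        try {
            expiresAt = Long.parseLong(fields[2]);
        } catch (NumberFormatException e) {
            return false;
        }
        return nowEpochSec <= expiresAt && path.startsWith(fields[0]) && path.endsWith(fields[1]);
    }

    public static void main(String[] args) {
        String token = sign("server-secret", "/client_37ASCII/", ".jpg", 2_000L);
        System.out.println(allows("server-secret", token, "/client_37ASCII/cat.jpg", 1_000L)); // true
        System.out.println(allows("server-secret", token, "/client_99/cat.jpg", 1_000L));      // false
    }
}
```

Because each token encodes the user's own prefix, every stored object can be traced back to the ID that was allowed to write it.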
File type enforcement – Content-Type
The second method restricts the file's type: we can limit the MIME type of uploaded files. For example, uploaded images can be forced to an image Content-Type. Then, even if a malicious file gets uploaded, the browser will parse it as an image when it is accessed.
Both the REST API and the FORM API of the cloud storage system support forcing the Content-Type. The FORM API supports several kinds of restrictions.
Here is an example using the Form API upload in the Java SDK:
```java
import java.util.HashMap;
import java.util.Map;

import com.upyun.FormUploader;
import com.upyun.Params;

// Initialize the uploader with the bucket (service) name and operator credentials
FormUploader uploader = new FormUploader(BUCKET_NAME, OPERATOR_NAME, OPERATOR_PWD);

// Initialize the policy parameter map
final Map<String, Object> paramsMap = new HashMap<String, Object>();

// Storage path of the uploaded file
paramsMap.put(Params.SAVE_KEY, savePath);

// Force the file's MIME type
paramsMap.put(Params.CONTENT_TYPE, "image/jpeg");

// Restrict the allowed file extensions
paramsMap.put(Params.ALLOW_FILE_TYPE, "jpg,jpeg,png");

// Restrict the file size range, in bytes
paramsMap.put(Params.CONTENT_LENGTH_RANGE, "102400,1024000");

// Run the upload
uploader.upload(paramsMap, file);
```
If any of these parameters are set, the cloud storage system checks the uploaded file against the specified values. If everything matches, the upload is allowed. Otherwise, the system returns status code 403.
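Such a check can be sketched in Java. The sketch below mirrors two of the policy values from the example above: the `jpg,jpeg,png` extension allowlist and the `102400,1024000` byte range. It returns 200 or 403. It is a hypothetical illustration of the rule, not the cloud storage system's implementation, and it checks only the name and size. A stricter check would also inspect the content bytes.

```java
import java.util.Arrays;
import java.util.Locale;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical server-side policy check modeled on allow-file-type and content-length-range.
class UploadPolicy {
    private final Set<String> allowedTypes;
    private final long minBytes;
    private final long maxBytes;

    // allowFileType like "jpg,jpeg,png"; contentLengthRange like "102400,1024000".
    UploadPolicy(String allowFileType, String contentLengthRange) {
        this.allowedTypes = Arrays.stream(allowFileType.split(","))
                .map(s -> s.trim().toLowerCase(Locale.ROOT))
                .collect(Collectors.toSet());
        String[] range = contentLengthRange.split(",");
        this.minBytes = Long.parseLong(range[0].trim());
        this.maxBytes = Long.parseLong(range[1].trim());
    }

    // Returns 200 if every rule matches, 403 otherwise.
    int check(String fileName, long sizeBytes) {
        int dot = fileName.lastIndexOf('.');
        String ext = dot < 0 ? "" : fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        boolean typeOk = allowedTypes.contains(ext);
        boolean sizeOk = sizeBytes >= minBytes && sizeBytes <= maxBytes;
        return typeOk && sizeOk ? 200 : 403;
    }

    public static void main(String[] args) {
        UploadPolicy policy = new UploadPolicy("jpg,jpeg,png", "102400,1024000");
        System.out.println(policy.check("cat.jpg", 200_000)); // 200
        System.out.println(policy.check("ad.html", 200_000)); // 403
    }
}
```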
Deny access – Edge rule
Both methods above keep malicious files out at upload time. What about files that have already been uploaded?
For malicious resources disguised as images, we can use edge rules. If an implicit cloud image-processing parameter is appended to every image link, a file that is not really an image cannot be processed or displayed, and the request fails with a 405 error.
Beyond images, if a large number of malicious APKs turn up in the image storage space, edge rules can also quickly block access to them.
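The evaluation of such a rule can be sketched in a few lines of Java. The `Rule` shape below is hypothetical: a path prefix, a suffix, and the status code to return instead of the file. It does not reflect the cloud storage system's actual edge-rule syntax. The example rule denies every `.apk` under `/images/` with 403.

```java
import java.util.List;
import java.util.Locale;

// Hypothetical edge-rule evaluation: the first matching rule decides the response status.
class EdgeRules {
    static final class Rule {
        final String pathPrefix;
        final String suffix;
        final int status;

        Rule(String pathPrefix, String suffix, int status) {
            this.pathPrefix = pathPrefix;
            this.suffix = suffix.toLowerCase(Locale.ROOT);
            this.status = status;
        }

        boolean matches(String path) {
            return path.startsWith(pathPrefix) && path.toLowerCase(Locale.ROOT).endsWith(suffix);
        }
    }

    // Returns the status of the first matching rule, or 200 to serve the file normally.
    static int decide(List<Rule> rules, String requestUri) {
        int q = requestUri.indexOf('?');
        String path = q < 0 ? requestUri : requestUri.substring(0, q); // ignore the query string
        for (Rule rule : rules) {
            if (rule.matches(path)) {
                return rule.status;
            }
        }
        return 200;
    }

    public static void main(String[] args) {
        List<Rule> rules = List.of(new Rule("/images/", ".apk", 403));
        System.out.println(decide(rules, "/images/game.apk?v=2")); // 403
        System.out.println(decide(rules, "/images/cat.jpg"));      // 200
    }
}
```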
Hotspots – Statistical analysis
With many files of many types, troubleshooting takes a long time, yet the problem needs to be found immediately. In that case, you can try UPYUN's log analysis feature. It collects daily access statistics for each service's domain names and can produce TOP 1000 rankings of hot files, hot clients, hot referrers, resource status codes, file sizes, and hot IP addresses.
Statistical analysis gives you an overall picture of the service. If a resource or IP address is accessed abnormally often, you can locate and fix the problem promptly.
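The core of such a ranking is a count-and-sort over access-log lines. The minimal Java sketch below assumes a hypothetical space-separated log format of `clientIp path status`. Real access logs have more fields. The `field` argument selects which column to rank (0 for IPs, 1 for files), and ties are broken alphabetically.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Top-N ranking over access-log lines (hypothetical format: "clientIp path status").
class HotList {
    static List<Map.Entry<String, Long>> topN(List<String> logLines, int field, int n) {
        Map<String, Long> counts = logLines.stream()
                .map(line -> line.trim().split("\\s+"))
                .filter(parts -> parts.length > field)          // skip malformed lines
                .collect(Collectors.groupingBy(parts -> parts[field], Collectors.counting()));
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .limit(n)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> log = List.of(
                "1.2.3.4 /a.jpg 200",
                "1.2.3.4 /b.apk 200",
                "5.6.7.8 /a.jpg 200");
        System.out.println(topN(log, 0, 1)); // [1.2.3.4=2]
    }
}
```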
Pornography, violence, and terrorism – Content recognition
If your site has heavy traffic, limiting access this way may not be practical. You can also look at two content recognition tools from UPYUN and Fanwei: Tianqing and Tianbeta.
Both tools focus on AI-based security detection. They use machine-learning classifiers to review images, videos, and other content automatically, gradually turning the job of a human content reviewer into an algorithm and a model. This frees up manpower, greatly improves processing efficiency, and helps businesses cut costs. The tools provide content-security early warnings, content-security data, and content-security risk ratings, forming a complete network information and content security solution.
They already serve a number of Internet companies and government departments with low-latency, high-precision, visualized one-stop content security services, from engine recognition through manual review.
Recommended reading
Let’s talk about DNS
Net fraud? Internet streaking? All because of HTTP?