In this article, Node.js is used to implement chunked (fragmented) file upload, without any Node.js framework. The front end is implemented in plain JavaScript, also without a framework, and MongoDB is used as the database. (This code is an exercise, not a production project.)
Preparation
- Install Node.js and MongoDB.
- Run `npm init -y` to initialize the project.
- Run `npm install mongodb --save` to install the mongodb package.
Implementation approach
The main idea is to slice the file into fragments and upload them; after the back end has received all the fragments, it merges them into a complete file. The flow chart is as follows:
Front end: when the upload button is clicked, a first request is initiated, and all subsequent processing waits for it to complete, because its response describes the current upload state of the file; the follow-up handling depends on that state.
Back end: each fragment of the file carries its index value. When the fragment with index 1, say, is stored as a file on the server, its storage path is recorded at position 1 of the uploadedFilesUrls array. Once all fragments have been uploaded, they are merged into the complete file in index order.
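For reference, here is a sketch of what a file's record in the database might look like under this scheme. The field names match the ones used later in the article; the concrete values and the exact shape are illustrative assumptions:

```javascript
// A hypothetical record for a file split into 3 fragments, 2 of which have been uploaded
const record = {
  fileId: '9b71d224bd62...',            // Hash string identifying the file
  fileName: 'video.mp4',
  blockLength: 3,                       // Total number of fragments
  uploadedIndexs: { 0: true, 2: true }, // Which fragment indexes have been uploaded
  uploadedFilesUrls: ['/upload/video.mp4.0', undefined, '/upload/video.mp4.2'], // Fragment storage paths by index
  fileUrl: null                         // Path of the merged file, set once all fragments are in
};
```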
The front-end implementation
- When a file is selected through the input box, its information is combined into a string, and a hash generated from that string is uploaded to the back end as the file's id (the id here is not unique; more on that later). The jsSHA library is used to generate the hash string that represents the file.
```javascript
// Handle a change of the currently selected file
function changeSelectedFile (event) {
  let fileUploadElement = event.target;
  let selectedFile = fileUploadElement.files[0]; // Get the currently selected file
  globalData.selectedFile = selectedFile;
  // Use the SHA-512 algorithm to generate a hash string that identifies the file
  const { name, lastModified, size, type, webkitRelativePath } = selectedFile;
  let fileStr = `${name}${lastModified}${size}${type}${webkitRelativePath}`;
  let shaObj = new jsSHA('SHA-512', 'TEXT'); // Create a jsSHA instance: SHA-512 algorithm, text-format input
  shaObj.update(fileStr); // Pass in the data to be hashed
  globalData.selectedFileHash = shaObj.getHash('HEX'); // Get the hash value representing the file
}
```
- A wrapper around the request logic, used to initiate requests and upload fragment data. fetch is used to make the requests, together with async/await and AbortController.
```javascript
// Upload a file fragment
function uploadBlock (body) {
  const controller = globalData.controller;
  let url = '/api/uploadFile';
  // Note: when posting a FormData body, browsers normally set this header themselves,
  // including the boundary parameter; setting it manually omits the boundary
  let headersObj = { 'Content-Type': 'multipart/form-data' };
  return fetch(url, {
    method: 'POST',
    body,
    headers: new Headers(headersObj),
    signal: controller.signal // The controller's signal lets the request be aborted
  }).then(res => res.json())
    .catch(error => ({ error }))
    .then(response => ({ response }));
}
```
The controller in the code above is defined in the globalData object that holds the global variables. It is an instance of AbortController, used to abort all outstanding requests when the "Pause upload" button is clicked.
```javascript
let globalData = {
  // ...
  controller: new AbortController() // Controller instance used to abort requests
}
```
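A minimal sketch of what the "Pause upload" handler might look like (the button id is an assumption; note that an aborted AbortController cannot be reused, so a fresh instance is needed before resuming):

```javascript
document.querySelector('#pauseUpload').addEventListener('click', () => {
  globalData.controller.abort(); // Abort every request that carries this controller's signal
  // An aborted controller stays aborted, so create a new one for the next resume
  globalData.controller = new AbortController();
});
```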
- Once the first file fragment is prepared, the upload request is initiated. selectedFile is a File (a kind of Blob), so it can be sliced directly with the slice method. A FormData instance is created, the data is put into it, and it is uploaded to the server. This is the first request initiated after clicking "Upload file"; we need to wait for it to complete and handle the response as shown in the flow chart above.
```javascript
let start = 0, end = blockSize;
let blockContent = selectedFile.slice(start, end);
let formData = new FormData();
formData.set('fileId', selectedFileHash);
formData.set('fileName', fileName);
formData.set('blockLength', blockLength);
formData.set('blockIndex', 0);
formData.set('blockContent', blockContent);
const { response } = await uploadBlock(formData);
```
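blockSize and blockLength are not shown above; a sketch of how they might be derived (the 10 MB fragment size is an arbitrary assumption):

```javascript
const blockSize = 10 * 1024 * 1024;                // Assumed fragment size: 10 MB
let fileName = selectedFile.name;
let fileSize = selectedFile.size;
let blockLength = Math.ceil(fileSize / blockSize); // Total number of fragments
```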
- The specific handling when the request succeeds: take fileUrl and uploadedIndexs from the response. fileUrl is the path where the file is stored on the server once the whole file has been uploaded, and uploadedIndexs records the indexes of the fragments uploaded so far.

If fileUrl exists, the file has already been uploaded, so the progress bar is set to 100. Otherwise, the indexes of the fragments that have not yet been uploaded are derived from uploadedIndexs, and uploadBlock is called to upload them in turn.
```javascript
if (response.code === 1) { // The request succeeded
  const { data } = response;
  const { fileUrl, uploadedIndexs } = data;
  if (fileUrl) { // The file has already been uploaded completely
    setProgress(100);
  } else {
    let uploadedIndexsArr = Object.keys(uploadedIndexs); // Indexes of the uploaded fragments
    let allIndexs = Array.from({ length: blockLength }, (item, index) => `${index}`); // Array of all fragment indexes
    let notUploadedIndexsArr = allIndexs.filter((item) => uploadedIndexsArr.indexOf(item) === -1); // Indexes of the fragments not yet uploaded
    let notUploadedIndexsArrLength = notUploadedIndexsArr.length;
    for (let i = 0; i < notUploadedIndexsArrLength; i++) {
      let index = Number(notUploadedIndexsArr[i]); // Convert the string index to a number before doing arithmetic
      start = index * blockSize;
      end = (index + 1) * blockSize;
      end = end > fileSize ? fileSize : end;
      let blockContent = selectedFile.slice(start, end);
      formData.set('blockIndex', index);
      formData.set('blockContent', blockContent);
      const { response } = await uploadBlock(formData);
      const { data } = response;
      const { fileUrl, uploadedIndexs } = data;
      if (fileUrl) {
        setProgress(100);
      } else {
        let completedPart = Math.ceil((Object.keys(uploadedIndexs).length / blockLength) * 100);
        setProgress(completedPart);
      }
    }
  }
}
```
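setProgress above is not shown in this article; a minimal sketch, assuming a `<progress max="100">` element with a hypothetical id exists on the page:

```javascript
// Update the upload progress bar; `value` is a percentage from 0 to 100
function setProgress (value) {
  document.querySelector('#uploadProgress').value = value; // The element id is an assumption
}
```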
That’s the main code for the front end.
The back-end implementation
- Create a simple Node.js server. Because this is an exercise, only one server is created, and both resource requests and interface requests are handled by it; whether a request targets an interface or a resource is determined simply by whether the request path begins with /api.

Create a MongoClient instance, dbClient, and call its connect method to connect to the MongoDB database. dbClient.db(dbName) returns the database object, stored here in the mongoDB variable. `const collection = mongoDB.collection('documents');` then gets a collection, where 'documents' is the collection name. All create/read/update/delete operations go through the collection object; for example, collection.insertOne adds a piece of data to the database.

Create the server with http.createServer() and listen for its request event. The code in the callback executes whenever the server receives a request.
```javascript
const http = require('http');
const url = require('url');
const MongoClient = require('mongodb').MongoClient;
const assert = require('assert');

let mongoDB = null; // Will hold the database object
const dbUrl = 'mongodb://127.0.0.1:27017';
const dbName = 'practice';
const dbClient = new MongoClient(dbUrl, { useNewUrlParser: true, useUnifiedTopology: true });
const serverPort = 8080;

const server = http.createServer();
server.on('request', (req, res) => {
  const { url: requestUrl } = req;
  let parsedUrl = url.parse(requestUrl);
  let pathName = parsedUrl.pathname;
  if (/^\/api/.test(pathName)) { // An interface request begins with /api
    fileUploadRequest(req, res, parsedUrl);
  } else { // Otherwise it is a resource request
    requestResource(req, res, parsedUrl)
      .catch(err => { handleError(JSON.stringify(err), res); });
  }
});

server.listen(serverPort, () => {
  console.log(`Server is running at http://127.0.0.1:${serverPort}`);
  // Connect to the database
  dbClient.connect((err) => {
    assert.equal(null, err);
    mongoDB = dbClient.db(dbName);
    console.log('mongodb connected successfully');
  });
});
```
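As an example of the collection operations mentioned above, here is a sketch of how a record for a newly seen fileId might be looked up or inserted. findOne and insertOne are standard MongoDB driver methods; the surrounding helper and its name are assumptions based on the flow described earlier:

```javascript
// Look up the file by its id; insert a fresh record if this is the first fragment we see
async function findOrCreateFileRecord (collection, fileId, fileName, blockLength) {
  let result = await collection.findOne({ fileId });
  if (!result) {
    result = { fileId, fileName, blockLength, fileUrl: null, uploadedIndexs: {}, uploadedFilesUrls: [] };
    await collection.insertOne(result); // Add a piece of data to the database
  }
  return result;
}
```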
- The following is the handling of resource requests. Based on the requested path, the file is read from the corresponding location and its content is returned to the browser.
```javascript
// Handle a resource request
async function requestResource (req, res, parsedUrl) {
  let pathname = parsedUrl.pathname;
  let filePath = pathname.substr(1);
  filePath = filePath === '' ? 'index.html' : filePath;
  let suffix = path.extname(filePath).substr(1);
  let fileData = await readFile(filePath);
  res.writeHead(200, { 'Content-Type': `text/${suffix}` });
  res.write(fileData.toString());
  res.end();
}
```
readFile in the code above is a wrapper method. Asynchronous operations such as file reads/writes and database reads/writes are wrapped in Promises like this, so that async/await can be used.
```javascript
// Read a file, wrapped in a Promise so it can be awaited
function readFile (path) {
  return new Promise((resolve, reject) => {
    fs.readFile(path, (err, data) => {
      if (err) {
        reject(err); // Pass the error along so callers can report it
      } else {
        resolve(data);
      }
    });
  });
}
```
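writeFile, used later to store fragment data, can be wrapped the same way. A sketch, assuming the third argument toggles appending and that a getAbsolutePath helper (used elsewhere in the article) resolves the stored path; the exact signature is inferred from how writeFile is called later:

```javascript
const fs = require('fs');

// Write data to a file and resolve with the file's path
function writeFile (path, data, isAppend) {
  return new Promise((resolve, reject) => {
    let write = isAppend ? fs.appendFile : fs.writeFile; // Assumed meaning of the third argument
    write(path, data, (err) => {
      if (err) {
        reject(err);
      } else {
        resolve(getAbsolutePath(path)); // getAbsolutePath is assumed to return the absolute path
      }
    });
  });
}
```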
- When the request's Content-Type is multipart/form-data, the message body is transmitted in binary. The body is collected by listening for the data and end events.
```javascript
// Handle a file upload request
function fileUploadRequest (req, res) {
  req.on('error', (err) => {
    handleError(err.message, res);
  });
  let body = [];
  req.on('data', (chunk) => {
    body.push(chunk);
  });
  req.on('end', () => {
    body = Buffer.concat(body);
    let formattedData = formatData(body);
    storeFile(formattedData, res);
  });
}
```
formatData parses the received binary data into an object. The figure below shows the data format of the message body, where the fileContent part is the file's binary data. The concrete implementation can be found in the formatData.js file in the source code.
storeFile stores a fragment's data as a file on the server.
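While the parsing itself lives in formatData.js, a sketch of the object shape it presumably returns may help; the field names are taken from how the result is used in storeFile below, and the shape itself is an assumption:

```javascript
// Hypothetical shape of formatData's return value
const formattedData = {
  fileId: '9b71d224bd62...',     // Hash identifying the file
  fileName: 'video.mp4',
  blockLength: 3,                // Total number of fragments
  blockIndex: 1,                 // Index of this fragment
  blockContent: Buffer.from([])  // The fragment's binary data, as a Buffer
};
```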
- storeFile processes the request according to the flow chart above. The main part is what happens while the file's upload is still incomplete. After all fragments are uploaded, they are merged into one file and the fragment files are deleted.

writeFile is a Promise-wrapped method similar to readFile, used to write fragment data to a file. Whether all fragments have been uploaded is determined by checking whether the number of keys in the uploadedIndexs object stored in the database equals the total number of fragments. Once the upload is complete, all the fragment files are merged into one file and then deleted, and the fileUrl value of the corresponding record is updated.
```javascript
let { fileUrl, blockLength, uploadedIndexs, uploadedFilesUrls } = result;
if (fileUrl || uploadedIndexs[blockIndex]) { // The file or this fragment has already been uploaded
  handleSuccess(result, res);
} else {
  let path = `./upload/${fileName}.${blockIndex}`;
  let blockUrl = await writeFile(path, blockContent, false);
  uploadedFilesUrls[blockIndex] = blockUrl;
  uploadedIndexs[blockIndex] = true;
  let blocksUploadCompleted = Object.keys(uploadedIndexs).length === blockLength;
  if (blocksUploadCompleted) { // All fragments uploaded: merge them into one file, then delete the fragment files
    let blockFileUrls = uploadedFilesUrls.slice(0); // Copy the array of fragment file paths
    let path = `./upload/${fileName}`;
    let uploadedFileUrl = await combineBlocksIntoFile(uploadedFilesUrls, path);
    storageData.fileUrl = uploadedFileUrl;
    await updateData(collection, { fileId }, storageData);
    blockFileUrls.forEach((item) => { // Delete the fragment files
      fs.unlink(item, (err) => {
        if (err) throw err;
      });
    });
    handleSuccess(storageData, res);
  } else {
    storageData.uploadedFilesUrls = uploadedFilesUrls;
    storageData.uploadedIndexs = uploadedIndexs;
    await updateData(collection, { fileId }, storageData);
    handleSuccess(storageData, res);
  }
}
```
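updateData in the code above is another Promise wrapper in the spirit of readFile. A sketch, assuming it maps onto the MongoDB driver's updateOne; the exact implementation is not shown in the article:

```javascript
// Update the record matching `query` with the fields in `newData`
function updateData (collection, query, newData) {
  return new Promise((resolve, reject) => {
    collection.updateOne(query, { $set: newData }, (err, result) => {
      if (err) {
        reject(err);
      } else {
        resolve(result);
      }
    });
  });
}
```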
- combineBlocksIntoFile is the method that merges the fragments into a single file.
```javascript
// Merge the fragments into a single file
function combineBlocksIntoFile (uploadedFilesUrls, fileUrl) {
  return new Promise((resolve) => {
    let writeStream = fs.createWriteStream(fileUrl); // Create a writable stream at the fileUrl location
    let readStream; // Declare a readable stream
    combineFile();
    function combineFile () {
      if (!uploadedFilesUrls.length) { // All fragments have been merged
        writeStream.end(); // Finish writing here
        let uploadedFileUrl = getAbsolutePath(fileUrl);
        resolve(uploadedFileUrl);
      } else {
        let currentBlockUrl = uploadedFilesUrls.shift();
        readStream = fs.createReadStream(currentBlockUrl);
        // Pipe the readable stream into the writable stream. { end: false } keeps the writable
        // stream open when this read finishes, because data is read from multiple files in turn
        readStream.pipe(writeStream, { end: false });
        readStream.on('end', () => {
          combineFile(); // Move on to the next fragment once the current one has been fully read
        });
      }
    }
  });
}
```
Problems
Problems encountered:
- It was a dark night. In order to iterate over the indexes of the fragments that had not been uploaded and upload the corresponding fragments, I wrote the following code:
```javascript
notUploadedIndexsArr.forEach(async (item) => {
  // ...
  const { response } = await uploadBlock(formData);
});
```
The callback passed to forEach is defined as an async function. The idea was to send all fragment requests in parallel and reduce the user's wait time. The requests are indeed all sent within milliseconds of each other, but the more fragments there are, the longer each request waits. With fragments of 500 MB each it was fine, but with fragments of 50 MB each it froze the browser completely.
My guess is that this happens because the Node.js server in this article is single-threaded and each uploaded fragment has to be stored as a file. Take a file divided into 200 fragments: the server receives 200 requests at practically the same time (within a few milliseconds of each other), has to do some processing for each, and then write each fragment's data to a file. Because file reads and writes are asynchronous, the fs module is asked to write the data of 200 requests into 200 files all at once. It's the difference between carrying 1 brick at a time and carrying 200 bricks at once.
Solution: use a for loop to upload the fragments one by one.
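In other words, the fix is to await each request before starting the next; a sketch of the sequential version (a for...of loop works equally well as the indexed for loop shown earlier):

```javascript
// Inside the same async function as before: upload the fragments sequentially
for (let item of notUploadedIndexsArr) {
  // ... build formData for this fragment, as in the code shown earlier ...
  const { response } = await uploadBlock(formData);
  // The next iteration starts only after this request has resolved
}
```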
- A Node.js Buffer cannot hold more than 2^31 - 1 bytes, which is about 2 GB of data.
Solution: use streams to merge the files.
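The limit can be checked at run time; a sketch, noting that the printed value holds for 64-bit Node v10 and may differ on other versions or platforms:

```javascript
const { constants } = require('buffer');
console.log(constants.MAX_LENGTH); // 2147483647 bytes (2^31 - 1, about 2 GB) on 64-bit Node v10
```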
Remaining issues:
- The generated hash value is not unique to the file's content. To hash the actual content, the file first has to be converted to an ArrayBuffer, and that conversion is time-consuming for large files, so I simply combined some of the file's metadata into a string and generated the hash from that. The following is how the file content itself would be converted to a hash value:
```javascript
function changeSelectedFile (event) {
  let fileUploadElement = event.target;
  let selectedFile = fileUploadElement.files[0];
  let reader = new FileReader();
  reader.readAsArrayBuffer(selectedFile); // Convert the file (Blob data) to an ArrayBuffer
  reader.addEventListener('load', (event) => {
    let fileArrayBuffer = event.target.result;
    let shaObj = new jsSHA('SHA-512', 'ARRAYBUFFER'); // Create a jsSHA instance: SHA-512 algorithm, ArrayBuffer input
    shaObj.update(fileArrayBuffer); // Pass in the data to be hashed
    globalData.selectedFileHash = shaObj.getHash('HEX'); // The output type is HEX
    globalData.selectedFile = selectedFile;
  });
}
```
- Many cases are still not handled. For example, instead of returning a specific status code for each specific error, the server simply returns 500.
PS: this exercise was done with Google Chrome 78.0.3904.70 (the latest version at the time of writing), which supports async/await; the Node version used is v10.16.0, which also supports async/await.
Source address.
References
- Using Fetch
- Using Node.js
- Using MongoDB with Node.js
- POST file upload
- Node.js merging multiple files
- Aborting an in-progress request