The story background
One day, a new teammate who loves asking questions suddenly asked me: how do I receive uploaded file data in a Koa service?
I answered: that's simple. Enable multipart in koa-body, then read the files from ctx.request.files.
He asked again: OK. Why are they available on ctx.request.files once multipart is configured?
I said: because koa-body does it for you.
He asked again: OK. So what exactly does it do?
Me: ... I'll take a look at the source code and get back to you.
Sorting out our thoughts
Three questions can be distilled from the conversation above:
- WHAT: what lets Koa parse file uploads?
  A: koa-body
- HOW: how do you configure it?
  A: Turn on the multipart option (note: the formidable options allow more detailed configuration)
- WHY: why can koa-body parse the uploaded file, and when is it attached to ctx?
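As a rough illustration of the HOW answer, the options could look like the sketch below. The multipart and formidable keys are koa-body options, and whatever sits under formidable is handed on to formidable. The concrete values (upload directory, size limit) are assumptions chosen for the example, and the variable name koaBodyOptions is hypothetical.

```javascript
const os = require('os');

// Option object for koa-body; the values are illustrative assumptions.
const koaBodyOptions = {
  multipart: true, // turn on multipart/form-data parsing
  formidable: {
    // everything in here is handed to formidable
    uploadDir: os.tmpdir(),        // where uploaded files are written
    keepExtensions: true,          // keep the original file extension
    maxFileSize: 10 * 1024 * 1024, // reject files larger than 10 MB
  },
};

// With koa and koa-body installed, it would be applied roughly like this:
//   app.use(koaBody(koaBodyOptions));
// after which a later middleware can read ctx.request.body and ctx.request.files.
console.log(koaBodyOptions);
```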
We now have answers to the first two questions. As developers, once we have mastered the WHAT and the HOW, it is time to settle down and explore the principle behind the WHY: know not only how, but also why.
For this kind of question about principles, the approach needs little explanation: read the source code.
A brief analysis of the koa-body source code
The entry file
When analyzing an npm dependency, we start from the entry file, which is given by the main field in package.json and is usually index.js.
According to the Koa middleware convention, we can learn the following from the entry file:
- The function returned by the requestBody method is the middleware that actually executes
- When the service starts, the requestBody method initializes the configuration and returns the middleware implementation
Let's look at the code that actually implements the middleware.
I have folded some irrelevant code to fit the screen, so let's focus on the part in the red box.
- When opts.multipart (the configuration check) and ctx.is('multipart') (the request header check) are both true, the request is treated as a file upload and the formy method is called
- When the promise returned by the formy method resolves, the data it returns is attached to ctx.request.body and ctx.request.files
The second part of the WHY puzzle is solved: once the promise that does the actual processing resolves, koa-body attaches the returned data to ctx.request.
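The flow described above can be sketched without Koa at all. This is a simplified illustration, not the actual koa-body source: fakeFormy and the hand-built ctx object are hypothetical stand-ins so that the snippet runs on its own.

```javascript
// Hypothetical stand-in for the real formy method: resolves with parsed data.
function fakeFormy(ctx, opts) {
  return Promise.resolve({
    fields: { title: 'hello' },
    files: { avatar: { size: 3 } },
  });
}

function requestBody(opts) {
  // Runs once, when the service starts: initialize the configuration.
  opts = Object.assign({ multipart: false }, opts);

  // Runs per request: this returned function is the middleware.
  return function middleware(ctx, next) {
    if (opts.multipart && ctx.is('multipart')) {
      return fakeFormy(ctx, opts).then((body) => {
        // Attach the parsed data only after the promise resolves.
        ctx.request.body = body.fields;
        ctx.request.files = body.files;
        return next();
      });
    }
    return next();
  };
}

// Minimal fake ctx so the sketch runs without Koa.
const ctx = {
  request: {},
  is: (type) => type === 'multipart',
};

const finished = requestBody({ multipart: true })(ctx, () =>
  Promise.resolve('next called'),
);
finished.then((result) => {
  console.log(result, ctx.request.body, ctx.request.files);
});
```

The point of the sketch is the ordering: configuration is handled when the factory is called, and the data lands on ctx.request before next() runs.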
The formy method implementation
From the screenshot in the previous section, you can see that the parsing logic for the file all lives in the formy method. So how is it implemented?
Here is the code:
```js
function formy(ctx, opts) {
  return new Promise(function (resolve, reject) {
    var fields = {};
    var files = {};
    var form = new forms.IncomingForm(opts);
    form.on('end', function () {
      return resolve({
        fields: fields,
        files: files
      });
    }).on('error', function (err) {
      return reject(err);
    }).on('field', function (field, value) {
      if (fields[field]) {
        if (Array.isArray(fields[field])) {
          fields[field].push(value);
        } else {
          fields[field] = [fields[field], value];
        }
      } else {
        fields[field] = value;
      }
    }).on('file', function (field, file) {
      if (files[field]) {
        if (Array.isArray(files[field])) {
          files[field].push(file);
        } else {
          files[field] = [files[field], file];
        }
      } else {
        files[field] = file;
      }
    });
    if (opts.onFileBegin) {
      form.on('fileBegin', opts.onFileBegin);
    }
    form.parse(ctx.req);
  });
}
```
There is not much logic here, so let's summarize it:
- Create a new IncomingForm instance, passing in the formidable-related configuration
- Call the instance's parse method, and listen for events such as end, error, file, and field
- Resolve the promise in the end event
Where does forms.IncomingForm come from?
```js
const forms = require('formidable');
```
It turns out that koa-body relies on a third-party dependency, formidable.
The data attached to the ctx.request object is processed inside the formidable.IncomingForm instance, received through the file and field event callbacks, and returned in the end event callback.
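To see the pattern in isolation, here is a minimal sketch that wraps an event emitter in a Promise the same way formy does. A bare EventEmitter stands in for the IncomingForm instance, and the helper names collect and collectForm are hypothetical; only the accumulation rule (a repeated name turns into an array) is taken from the formy code above.

```javascript
const { EventEmitter } = require('events');

// Collect repeated keys into arrays, mirroring the field/file handlers in formy.
function collect(store, key, value) {
  if (store[key]) {
    if (Array.isArray(store[key])) {
      store[key].push(value);
    } else {
      store[key] = [store[key], value];
    }
  } else {
    store[key] = value;
  }
}

// Wrap any emitter that fires field/file/end/error events in a Promise.
function collectForm(form) {
  return new Promise((resolve, reject) => {
    const fields = {};
    const files = {};
    form
      .on('field', (name, value) => collect(fields, name, value))
      .on('file', (name, file) => collect(files, name, file))
      .on('error', reject)
      .on('end', () => resolve({ fields, files }));
  });
}

// A bare EventEmitter stands in for the IncomingForm instance here.
const form = new EventEmitter();
const result = collectForm(form);

form.emit('field', 'tag', 'a');
form.emit('field', 'tag', 'b'); // same name twice: stored as an array
form.emit('field', 'title', 'hi');
form.emit('file', 'avatar', { size: 3 });
form.emit('end');

result.then(({ fields, files }) => console.log(fields, files));
```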
A brief analysis of the formidable source code
The entry file
From the previous analysis, we know that koa-body delegates file handling to formidable. Again, we start from the entry file.
The entry code is very simple, and the core logic appears to be in Formidable.js.
Formidable.js analysis
Let's start with a high-level impression of Formidable.js:
- It defines and exports the IncomingForm class
- The IncomingForm class inherits from EventEmitter
- The IncomingForm class defines many methods
By inheriting from EventEmitter, an IncomingForm instance can emit the relevant events at the appropriate moments, and listeners can handle them.
The parse process
As noted above, the IncomingForm instance calls the parse method, so let's start there.
As you can see from the red box, the parse method has two responsibilities:
- Parse the request header and set the parser
- Listen for the data event of the req parameter and process the data.
We know that the req parameter is ctx.req, the Node.js native request object.
Another piece of the puzzle: how does koa-body get access to the uploaded file data? By listening for data events on the Node.js native request object.
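Listening for data events is all it takes to read a request body in Node.js. The sketch below shows the idea on its own; readBody is a hypothetical helper, and Readable.from is used as a stand-in for ctx.req so that the snippet runs without starting a server.

```javascript
const { Readable } = require('stream');

// Collect every chunk a readable emits and resolve with the full body.
function readBody(req) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    req.on('data', (chunk) => chunks.push(chunk));
    req.on('end', () => resolve(Buffer.concat(chunks)));
    req.on('error', reject);
  });
}

// Stand-in for ctx.req: a readable that delivers the body in two chunks.
const fakeReq = Readable.from([Buffer.from('hello '), Buffer.from('upload')]);
const body = readBody(fakeReq);
body.then((buf) => console.log(buf.toString()));
```

formidable does not buffer the whole body like this; it hands each chunk to its parser as it arrives, which is what the next section walks through.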
The write process
This section covers a number of nested method calls, which I refer to collectively as the write process.
The call chain to the writeHeaders method looks like this:
writeHeaders()
-> _parseContentType()
-> initMultipart()
-> MultipartParser
After this series of nested calls, the writeHeaders method has completed its mission: parse the request headers and set the parser.
MultipartParser source analysis
As you can see, the MultipartParser class inherits from the Transform stream.
Let's take a quick look at how the official documentation defines a Transform stream.
By defining the _transform method, we can process the buffer written into the stream:
```js
_transform(buffer, _, done) {
  let i = 0;
  let prevIndex = this.index;
  let { index, state, flags } = this;
  const { lookbehind, boundary, boundaryChars } = this;
  const boundaryLength = boundary.length;
  const boundaryEnd = boundaryLength - 1;
  this.bufferLength = buffer.length;
  let c = null;
  let cl = null;

  const setMark = (name, idx) => {
    this[`${name}Mark`] = typeof idx === 'number' ? idx : i;
  };

  const clearMarkSymbol = (name) => {
    delete this[`${name}Mark`];
  };

  const dataCallback = (name, shouldClear) => {
    const markSymbol = `${name}Mark`;
    if (!(markSymbol in this)) {
      return;
    }

    if (!shouldClear) {
      this._handleCallback(name, buffer, this[markSymbol], buffer.length);
      setMark(name, 0);
    } else {
      this._handleCallback(name, buffer, this[markSymbol], i);
      clearMarkSymbol(name);
    }
  };

  for (i = 0; i < this.bufferLength; i++) {
    c = buffer[i];
    switch (state) {
      case STATE.PARSER_UNINITIALIZED:
        return i;
      case STATE.START:
        index = 0;
        state = STATE.START_BOUNDARY;
      // falls through
      case STATE.START_BOUNDARY:
        if (index === boundary.length - 2) {
          if (c === HYPHEN) {
            flags |= FBOUNDARY.LAST_BOUNDARY;
          } else if (c !== CR) {
            return i;
          }
          index++;
          break;
        } else if (index - 1 === boundary.length - 2) {
          if (flags & FBOUNDARY.LAST_BOUNDARY && c === HYPHEN) {
            this._handleCallback('end');
            state = STATE.END;
            flags = 0;
          } else if (!(flags & FBOUNDARY.LAST_BOUNDARY) && c === LF) {
            index = 0;
            this._handleCallback('partBegin');
            state = STATE.HEADER_FIELD_START;
          } else {
            return i;
          }
          break;
        }

        if (c !== boundary[index + 2]) {
          index = -2;
        }
        if (c === boundary[index + 2]) {
          index++;
        }
        break;
      case STATE.HEADER_FIELD_START:
        state = STATE.HEADER_FIELD;
        setMark('headerField');
        index = 0;
      // falls through
      case STATE.HEADER_FIELD:
        if (c === CR) {
          clearMarkSymbol('headerField');
          state = STATE.HEADERS_ALMOST_DONE;
          break;
        }

        index++;
        if (c === HYPHEN) {
          break;
        }

        if (c === COLON) {
          if (index === 1) {
            // empty header field
            return i;
          }
          dataCallback('headerField', true);
          state = STATE.HEADER_VALUE_START;
          break;
        }

        cl = lower(c);
        if (cl < A || cl > Z) {
          return i;
        }
        break;
      case STATE.HEADER_VALUE_START:
        if (c === SPACE) {
          break;
        }

        setMark('headerValue');
        state = STATE.HEADER_VALUE;
      // falls through
      case STATE.HEADER_VALUE:
        if (c === CR) {
          dataCallback('headerValue', true);
          this._handleCallback('headerEnd');
          state = STATE.HEADER_VALUE_ALMOST_DONE;
        }
        break;
      case STATE.HEADER_VALUE_ALMOST_DONE:
        if (c !== LF) {
          return i;
        }
        state = STATE.HEADER_FIELD_START;
        break;
      case STATE.HEADERS_ALMOST_DONE:
        if (c !== LF) {
          return i;
        }

        this._handleCallback('headersEnd');
        state = STATE.PART_DATA_START;
        break;
      case STATE.PART_DATA_START:
        state = STATE.PART_DATA;
        setMark('partData');
      // falls through
      case STATE.PART_DATA:
        prevIndex = index;

        if (index === 0) {
          // boyer-moore derived algorithm to safely skip non-boundary data
          i += boundaryEnd;
          while (i < this.bufferLength && !(buffer[i] in boundaryChars)) {
            i += boundaryLength;
          }
          i -= boundaryEnd;
          c = buffer[i];
        }

        if (index < boundary.length) {
          if (boundary[index] === c) {
            if (index === 0) {
              dataCallback('partData', true);
            }
            index++;
          } else {
            index = 0;
          }
        } else if (index === boundary.length) {
          index++;
          if (c === CR) {
            // CR = part boundary
            flags |= FBOUNDARY.PART_BOUNDARY;
          } else if (c === HYPHEN) {
            // HYPHEN = end boundary
            flags |= FBOUNDARY.LAST_BOUNDARY;
          } else {
            index = 0;
          }
        } else if (index - 1 === boundary.length) {
          if (flags & FBOUNDARY.PART_BOUNDARY) {
            index = 0;
            if (c === LF) {
              // unset the PART_BOUNDARY flag
              flags &= ~FBOUNDARY.PART_BOUNDARY;
              this._handleCallback('partEnd');
              this._handleCallback('partBegin');
              state = STATE.HEADER_FIELD_START;
              break;
            }
          } else if (flags & FBOUNDARY.LAST_BOUNDARY) {
            if (c === HYPHEN) {
              this._handleCallback('partEnd');
              this._handleCallback('end');
              state = STATE.END;
              flags = 0;
            } else {
              index = 0;
            }
          } else {
            index = 0;
          }
        }

        if (index > 0) {
          // when matching a possible boundary, keep a lookbehind reference
          // in case it turns out to be a false lead
          lookbehind[index - 1] = c;
        } else if (prevIndex > 0) {
          // if our boundary turned out to be rubbish, the captured lookbehind
          // belongs to partData
          this._handleCallback('partData', lookbehind, 0, prevIndex);
          prevIndex = 0;
          setMark('partData');

          // reconsider the current character even so it interrupted the sequence
          // it could be the beginning of a new sequence
          i--;
        }

        break;
      case STATE.END:
        break;
      default:
        return i;
    }
  }

  dataCallback('headerField');
  dataCallback('headerValue');
  dataCallback('partData');

  this.index = index;
  this.state = state;
  this.flags = flags;

  done();
  return this.bufferLength;
}
```
The overall logic of _transform is simple: it iterates through the buffer, handles each byte according to the current parsing state, and fires different events via the _handleCallback method to trigger the corresponding callbacks.
Note: the implementation of the _handleCallback method is quite interesting; interested readers can look at it themselves.
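To make the byte-by-byte state machine easier to follow, here is a much smaller Transform stream built on the same idea. It is a sketch and not formidable's parser: HeaderLineParser, its four states, and the shape of the pushed objects are all assumptions made for the example. It only parses ASCII "Name: value\r\n" lines, but it keeps its state between writes, just as MultipartParser does.

```javascript
const { Transform } = require('stream');

const CR = 13;
const LF = 10;
const COLON = 58;
const SPACE = 32;
const STATE = { FIELD: 0, VALUE_START: 1, VALUE: 2, VALUE_ALMOST_DONE: 3 };

class HeaderLineParser extends Transform {
  constructor() {
    // Bytes go in, plain objects come out.
    super({ readableObjectMode: true });
    this.state = STATE.FIELD;
    this.field = '';
    this.value = '';
  }

  _transform(buffer, _, done) {
    for (let i = 0; i < buffer.length; i++) {
      const c = buffer[i];
      switch (this.state) {
        case STATE.FIELD:
          if (c === COLON) {
            this.state = STATE.VALUE_START;
          } else {
            this.field += String.fromCharCode(c);
          }
          break;
        case STATE.VALUE_START:
          if (c === SPACE) break; // skip spaces after the colon
          this.state = STATE.VALUE;
        // falls through: this byte already belongs to the value
        case STATE.VALUE:
          if (c === CR) {
            this.state = STATE.VALUE_ALMOST_DONE;
          } else {
            this.value += String.fromCharCode(c);
          }
          break;
        case STATE.VALUE_ALMOST_DONE:
          if (c !== LF) return done(new Error('expected LF after CR'));
          this.push({ name: 'header', field: this.field, value: this.value });
          this.field = '';
          this.value = '';
          this.state = STATE.FIELD;
          break;
      }
    }
    done();
  }
}

const parser = new HeaderLineParser();
const headers = [];
parser.on('data', (event) => headers.push(event));
const parsed = new Promise((resolve, reject) => {
  parser.on('end', () => resolve(headers));
  parser.on('error', reject);
});

// The first header is deliberately split across two writes.
parser.write(Buffer.from('Content-Type: text/'));
parser.end(Buffer.from('plain\r\nX-Id: 7\r\n'));
parsed.then((list) => console.log(list));
```

Because the state lives on the instance, a header that straddles two chunks is still parsed correctly, which is the reason _transform saves index, state and flags back onto this before returning.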
parser.on('data')
As mentioned above, when data is written to the parser stream, the _transform method is called to process it, and the callback for each stage's event is fired.
The code for the event callbacks looks like this:
What we need to focus on is the headersEnd event, whose callback calls the onPart method of the IncomingForm instance.
Why do we say that this.onPart is a method of the IncomingForm instance? If you look at the earlier code, there is a step that uses call to bind this.
After all the nested calls, we are finally back in the IncomingForm logic.
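The effect of binding this with call can be shown in a few lines. The names below (initParser, Form, seen) are hypothetical and the code is not formidable's; it only demonstrates why a free-standing function can end up calling a method on the form instance.

```javascript
const { EventEmitter } = require('events');

// A free-standing function: its `this` depends entirely on how it is called.
function initParser(parser) {
  // The arrow function keeps whatever `this` initParser was called with.
  parser.on('headersEnd', (part) => this.onPart(part));
}

class Form {
  constructor() {
    this.seen = [];
  }

  onPart(part) {
    this.seen.push(part.name);
  }
}

const form = new Form();
const parser = new EventEmitter();

// call() binds `this` inside initParser to the form instance.
initParser.call(form, parser);
parser.emit('headersEnd', { name: 'avatar' });

console.log(form.seen);
```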
A brief summary of the write process
Let's review the write process again:
- this.writeHeaders(req.headers): sets the parser
- req.on('data'): calls the parser's write method to write data into the stream
- parser.on('headersEnd'): once the headers are parsed, calls the this.onPart method
The save process
_handlePart method source analysis
Once the headers are parsed, the this.onPart method is called. Its core logic lives in the _handlePart method, so next let's analyze the _handlePart source code.
```js
_handlePart(part) {
  if (part.originalFilename && typeof part.originalFilename !== 'string') {
    this._error(
      new FormidableError(
        `the part.originalFilename should be string when it exists`,
        errors.filenameNotString,
      ),
    );
    return;
  }

  if (!part.mimetype) {
    let value = '';
    const decoder = new StringDecoder(
      part.transferEncoding || this.options.encoding,
    );

    part.on('data', (buffer) => {
      this._fieldsSize += buffer.length;
      if (this._fieldsSize > this.options.maxFieldsSize) {
        this._error(
          new FormidableError(
            `options.maxFieldsSize (${this.options.maxFieldsSize} bytes) exceeded, received ${this._fieldsSize} bytes of field data`,
            errors.maxFieldsSizeExceeded,
            413, // Payload Too Large
          ),
        );
        return;
      }
      value += decoder.write(buffer);
    });

    part.on('end', () => {
      this.emit('field', part.name, value);
    });
    return;
  }

  if (!this.options.filter(part)) {
    return;
  }

  this._flushing += 1;

  const newFilename = this._getNewName(part);
  const filepath = this._joinDirectoryName(newFilename);
  // Create a new File instance
  const file = this._newFile({
    newFilename,
    filepath,
    originalFilename: part.originalFilename,
    mimetype: part.mimetype,
  });
  file.on('error', (err) => {
    this._error(err);
  });
  this.emit('fileBegin', part.name, file);

  // Call the open method
  file.open();
  this.openedFiles.push(file);

  // Listen for the data event
  part.on('data', (buffer) => {
    this._fileSize += buffer.length;
    if (this._fileSize < this.options.minFileSize) {
      this._error(
        new FormidableError(
          `options.minFileSize (${this.options.minFileSize} bytes) inferior, received ${this._fileSize} bytes of file data`,
          errors.smallerThanMinFileSize,
          400,
        ),
      );
      return;
    }
    if (this._fileSize > this.options.maxFileSize) {
      this._error(
        new FormidableError(
          `options.maxFileSize (${this.options.maxFileSize} bytes) exceeded, received ${this._fileSize} bytes of file data`,
          errors.biggerThanMaxFileSize,
          413,
        ),
      );
      return;
    }
    if (buffer.length === 0) {
      return;
    }
    this.pause();
    // Write to the file
    file.write(buffer, () => {
      this.resume();
    });
  });

  // Listen for the end event
  part.on('end', () => {
    if (!this.options.allowEmptyFiles && this._fileSize === 0) {
      this._error(
        new FormidableError(
          `options.allowEmptyFiles is false, file size should be greather than 0`,
          errors.noEmptyFiles,
          400,
        ),
      );
      return;
    }

    // Close the file
    file.end(() => {
      this._flushing -= 1;
      this.emit('file', part.name, file);
      this._maybeEnd();
    });
  });
}
```
From the comments I added at the key points of the code, we can see the core logic of the _handlePart method:
- Create a new File instance, file
- Open file
- Listen for part's data event and write the data into file
- Listen for part's end event and close file
Before reading this core logic, we need to be clear about two important points:
- What is the part parameter passed in?
  part is a readable stream created in the initMultipart method; it passes data to the outside through its data event.
- What is file, the instance of File?
  It is a writable stream created from the filepath passed in.
With these two premises in place, everything finally makes sense!
The _handlePart method opens a file stream, writes the parsed data into it as the data arrives, and then closes it.
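The open, write, close sequence can be tried out with Node.js built-ins alone. In this sketch savePart is a hypothetical helper, a PassThrough stream stands in for part, and the file goes to a temporary path. One simplification to note: formidable pauses the form itself (this.pause()), whereas the sketch pauses the part stream directly.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { PassThrough } = require('stream');

// Pipe a readable "part" into a file by hand, pausing while each chunk is flushed.
function savePart(part, filepath) {
  return new Promise((resolve, reject) => {
    const file = fs.createWriteStream(filepath); // open the file
    file.on('error', reject);

    part.on('data', (buffer) => {
      if (buffer.length === 0) return;
      part.pause(); // stop reading until this chunk is written
      file.write(buffer, () => part.resume());
    });

    part.on('end', () => {
      file.end(() => resolve(filepath)); // close the file, then report
    });
  });
}

// Stand-in for the part stream created by the parser.
const part = new PassThrough();
const target = path.join(os.tmpdir(), `upload-sketch-${process.pid}.txt`);

const saved = savePart(part, target).then((p) => {
  const text = fs.readFileSync(p, 'utf8');
  fs.unlinkSync(p); // clean up the temporary file
  return text;
});

part.write('hello ');
part.end('file');
saved.then((text) => console.log(text));
```

Pausing before each write and resuming in its callback is a simple form of backpressure: the source never gets ahead of the disk.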
The end of the process
_maybeEnd
Our analysis is almost at its end!
How does the whole process finish?
Notice that there is a method called _maybeEnd, which fires the end event when its condition is met.
So let's follow the this._flushing variable and work out how that condition becomes satisfied.
As you can see, this._flushing starts at 0. Each time the _handlePart method handles a file, this._flushing += 1; each time a file is closed, this._flushing -= 1.
When all uploaded files have been processed, this._flushing returns to 0, which triggers the end event.
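The counting idea is easy to reproduce. The following sketch is not formidable's implementation: UploadTracker and its method names are hypothetical, and the ended flag is an assumption standing in for "the request body has been fully parsed". It shows why end fires exactly once, after the last file is closed.

```javascript
const { EventEmitter } = require('events');

class UploadTracker extends EventEmitter {
  constructor() {
    super();
    this._flushing = 0; // files that are still being written
    this.ended = false; // assumption: set once the request is fully parsed
  }

  fileStarted() {
    this._flushing += 1;
  }

  fileClosed() {
    this._flushing -= 1;
    this._maybeEnd();
  }

  requestParsed() {
    this.ended = true;
    this._maybeEnd();
  }

  _maybeEnd() {
    // Only finish when parsing is done and no file is still flushing.
    if (!this.ended || this._flushing > 0) return;
    this.emit('end');
  }
}

const tracker = new UploadTracker();
let endCount = 0;
tracker.on('end', () => {
  endCount += 1;
});

tracker.fileStarted();
tracker.fileStarted();
tracker.requestParsed(); // two files still flushing: no end yet
tracker.fileClosed(); // one file left: no end yet
const beforeLast = endCount;
tracker.fileClosed(); // counter back to 0: end fires

console.log(beforeLast, endCount);
```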
You should be excited to see the end event, because we’re finally at the end.
The real ending
We already know who is listening for the end event.
That's right, we are back in the koa-body code. The rest of the process is as follows:
- In the promise returned by the formy method, the IncomingForm instance emits the end event and the promise resolves
- In the promise's then, the resolved data is received and attached to the ctx.request object
- The middleware finishes and calls next()
Answering the question
In this article, we started from a file upload example and analyzed the core logic koa-body and formidable use to handle file uploads.
I believe you now have the answer to the question we left open.
In short, how does koa-body handle file uploads?
A:
- Receive the data through req.on('data')
- Parse the headers and parse the boundary
- Write to a local file through a file stream
Analysis summary
Besides a clear understanding of the koa-body file upload process, there are a few other things we can take away from this exploration, such as:
- When something is unknown, read the source code
- For some dependencies, the original source code is in a different format from the code downloaded into node_modules; reading the two together works wonders
- We should get into the habit of using streams to manipulate files
- EventEmitter is a great tool for communication, and the same idea can be used in business code
Thank you for reading