iOS Network Monitoring

Preface

Recently our company needed to monitor network traffic. This article records how I solved some of the problems I ran into while implementing it.

Many articles on network monitoring are already available online. This article mainly draws on the following:

  1. iOS traffic monitoring analysis
  2. iOS performance monitoring solution Wedjat
  3. Analysis of technical principle of mobile terminal monitoring system
  4. Proxy-connection in the Http request header
  5. Hypertext Transfer Protocol -- HTTP/1.1 (RFC 2616)

Traffic calculation

Response traffic calculation

An HTTP Response consists of three parts: the status line, the headers, and the body. The following sections analyze how to calculate the traffic of each part.

Status-Line

Using the method provided by the author of "iOS traffic monitoring analysis", we can obtain the status-line of the Response:

#import <Foundation/Foundation.h>
#import <CFNetwork/CFNetwork.h>
#import <dlfcn.h>

// Signature of the private CFNetwork function CFURLResponseGetHTTPResponse
typedef CFHTTPMessageRef (*DMURLResponseGetHTTPResponse)(CFURLRef response);

// Category method on NSURLResponse
- (NSString *)statusLineFromCF {
    NSURLResponse *response = self;
    NSString *statusLine = @"";
    // Look up CFURLResponseGetHTTPResponse at runtime
    NSString *funName = @"CFURLResponseGetHTTPResponse";
    DMURLResponseGetHTTPResponse originURLResponseGetHTTPResponse =
        (DMURLResponseGetHTTPResponse)dlsym(RTLD_DEFAULT, [funName UTF8String]);
    SEL theSelector = NSSelectorFromString(@"_CFURLResponse");
    if ([response respondsToSelector:theSelector] && NULL != originURLResponseGetHTTPResponse) {
        // Get the _CFURLResponse backing this NSURLResponse
        CFTypeRef cfResponse = CFBridgingRetain([response performSelector:theSelector]);
        if (NULL != cfResponse) {
            // Convert the CFURLResponseRef into a CFHTTPMessageRef
            CFHTTPMessageRef messageRef = originURLResponseGetHTTPResponse((CFURLRef)cfResponse);
            if (NULL != messageRef) {
                statusLine = (__bridge_transfer NSString *)CFHTTPMessageCopyResponseStatusLine(messageRef) ?: @"";
            }
            CFRelease(cfResponse);
        }
    }
    return statusLine;
}

This method returns the Response status-line correctly. I only made a small addition when calculating its length.

Here is the definition of the Status-Line in RFC 2616 (HTTP/1.1):

The first line of a Response message is the Status-Line, consisting of the protocol version followed by a numeric status code and its associated textual phrase, with each element separated by SP characters. No CR or LF is allowed except in the final CRLF sequence.

       Status-Line = HTTP-Version SP Status-Code SP Reason-Phrase CRLF

The Status-Code element is a 3-digit integer result code of the attempt to understand and satisfy the request. These codes are fully defined in section 10. The Reason-Phrase is intended to give a short textual description of the Status-Code. The Status-Code is intended for use by automata and the Reason-Phrase is intended for the human user. The client is not required to examine or display the Reason-Phrase.

In the formula above, SP stands for space, CR for carriage return, and LF for line feed. In other words, a complete status-line looks like this:

Status-Line = HTTP-Version space Status-Code space Reason-Phrase \r\n

The client is not required to examine or display the Reason-Phrase. In addition, the grammar defines Reason-Phrase as *<TEXT, excluding CR, LF>, meaning zero or more characters, so the Reason-Phrase may be empty. When capturing packets with Wireshark, I did find Responses whose status-line contains no Reason-Phrase.

Comparing these captures shows that even without a Reason-Phrase, a space still follows the Status-Code. In iOS, however, the status-line returned by the method above has no trailing space when the Reason-Phrase is missing, so this case needs special handling. The length must also include the CRLF at the end of the status-line. The adjusted status-line calculation is as follows:

- (NSUInteger)ep_getStatusLineLengths:(NSString *)statusLine {
    NSMutableString *lineStr = [statusLine mutableCopy];
    NSArray *statusLineArr = [statusLine componentsSeparatedByString:@" "];
    // Only "HTTP-Version Status-Code" without a trailing space: the Reason-Phrase is missing,
    // but a space still follows the Status-Code on the wire, so add it back
    if (statusLineArr.count == 2 && ![statusLine hasSuffix:@" "]) {
        [lineStr appendString:@" "];
    }
    // Check whether the status-line ends with \r\n; if not, append \r\n
    if (![lineStr hasSuffix:@"\r\n"]) {
        [lineStr appendString:@"\r\n"];
    }
    // UTF-8 encode the Status-Line and take its length
    NSData *lineData = [lineStr dataUsingEncoding:NSUTF8StringEncoding];
    return lineData.length;
}
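The same rule can be written without Foundation. Below is a minimal sketch in plain C, which is also valid Objective-C. It takes the status-line text as CFNetwork returned it and computes the on-the-wire length. The function name is hypothetical. Like the method above, it assumes that exactly one space means the Reason-Phrase is missing.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: wire length of a Status-Line, given the text that
 * CFNetwork returned (which may lack the SP before an empty Reason-Phrase
 * and the final CRLF). */
size_t status_line_wire_length(const char *line)
{
    size_t n = strlen(line);
    int has_crlf = n >= 2 && line[n - 2] == '\r' && line[n - 1] == '\n';
    size_t body = has_crlf ? n - 2 : n; /* length without the CRLF */

    size_t spaces = 0;
    for (size_t i = 0; i < body; i++) {
        if (line[i] == ' ') {
            spaces++;
        }
    }

    size_t total = body;
    /* "HTTP/1.1 200": the SP before the empty Reason-Phrase is missing. */
    if (spaces == 1 && line[body - 1] != ' ') {
        total += 1;
    }
    return total + 2; /* the terminating CRLF */
}
```

For example, "HTTP/1.1 200 OK" counts as 17 bytes. "HTTP/1.1 200" counts as 15 bytes, because the space before the empty Reason-Phrase is added back.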

Response Header

As before, let's first look at the definition of message headers in HTTP/1.1 before calculating the Response Header length:

Each header field consists of a name followed by a colon (“:”) and the field value. Field names are case-insensitive. The field value MAY be preceded by any amount of LWS, though a single SP is preferred. Header fields can be extended over multiple lines by preceding each extra line with at least one SP or HT. Applications ought to follow “common form”, where one is known or indicated, when generating HTTP constructs, since there might exist some implementations that fail to accept anything beyond the common forms.

       message-header = field-name ":" [ field-value ]
       field-name     = token
       field-value    = *( field-content | LWS )
       field-content  = <the OCTETs making up the field-value
                        and consisting of either *TEXT or combinations
                        of token, separators, and quoted-string>

The sentence "The field value MAY be preceded by any amount of LWS, though a single SP is preferred" means that any amount of linear white space (LWS), such as spaces, may precede the field value, and that a single space is preferred. This is not mandatory. In Wireshark captures, though, the field value of HTTP/1.1 messages is almost always preceded by exactly one space, as shown in the figure below:

In addition, the Message Types section of the specification defines how a message is built from multiple message-headers:

Both types of message consist of a start-line, zero or more header fields (also known as “headers”), an empty line (i.e., a line with nothing preceding the CRLF) indicating the end of the header fields, and possibly a message-body.

        generic-message = start-line
                          *(message-header CRLF)
                          CRLF
                          [ message-body ]
        start-line      = Request-Line | Status-Line

Each message-header is followed by a CRLF, and an empty line (a bare CRLF) marks the end of the header fields. Given this format, suppose we have the following headers:

{
    "Connection" = "keep-alive";
    "Content-Type" = "application/json; charset=UTF-8";
    "Date" = "Sat, 14 Jul 2018 02:31:00 GMT";
    "Server" = "openresty/1.13.6.1";
    "Transfer-Encoding" = "Identity";
}

On the wire, these HTTP/1.1 header fields look like this:

Connection: keep-alive\r\nContent-Type: application/json; charset=UTF-8\r\nDate: Sat, 14 Jul 2018 02:31:00 GMT\r\nServer: openresty/1.13.6.1\r\nTransfer-Encoding: Identity\r\n\r\n

Following this definition, the length of the message headers is computed as follows:

- (NSUInteger)ep_getHeadersLength:(NSDictionary *)headers {
    NSDictionary<NSString *, NSString *> *headerFields = headers;
    NSMutableString *headerStr = [NSMutableString string];
    for (NSString *key in headerFields.allKeys) {
        // Each header line is "field-name: field-value\r\n" (one SP after the colon)
        [headerStr appendString:key];
        [headerStr appendString:@": "];
        NSString *value = headerFields[key];
        if (value) {
            [headerStr appendString:value];
        }
        [headerStr appendString:@"\r\n"];
    }
    // The empty line (bare CRLF) that terminates the header fields
    [headerStr appendString:@"\r\n"];
    // Measure the UTF-8 encoded byte length
    NSData *headerData = [headerStr dataUsingEncoding:NSUTF8StringEncoding];
    return headerData.length;
}
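A plain-C sketch of the same header arithmetic makes it easy to check individual fields against Wireshark. It assumes exactly one space after each colon and no folded (multi-line) values. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical type: one header field as name/value strings. */
struct header_field {
    const char *name;
    const char *value; /* NULL means an empty field-value */
};

/* Bytes of the header block: "name: value\r\n" per field, plus the
 * empty line (CRLF) that ends the headers. */
size_t headers_wire_length(const struct header_field *fields, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += strlen(fields[i].name) + 2; /* name + ": " */
        if (fields[i].value != NULL) {
            total += strlen(fields[i].value);
        }
        total += 2; /* CRLF after each field */
    }
    return total + 2; /* final empty line */
}
```

With this function, the lines "Transfer-Encoding: chunked" and "Transfer-Encoding: Identity" come out one byte apart (28 and 29 bytes), because "chunked" has 7 characters and "Identity" has 8.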

However, if you calculate the length of the Response Headers with the method above, you will find that no matter what you do, the result never matches the length shown in Wireshark, as in the figure below:

Clicking each message header line in Wireshark shows its length (circled in red). Adding up the lengths of all the message headers (remember to include the final \r\n line) gives 28 + 37 + 46 + 28 + 30 + 2 = 171, while the length computed by the program is 172, as shown below:

Comparing the Response headers obtained by the program with those captured by Wireshark shows that their contents actually differ:

Wireshark reports Transfer-Encoding as chunked, but our app sees Identity. Since "chunked" is 7 bytes and "Identity" is 8, this difference alone accounts for the one extra byte. I searched for a long time for the reason the value was changed but found no authoritative source. My guess is that Apple decodes the data in the CFNetwork layer, so for the layers above CFNetwork the body is no longer chunked.

Although we do not know why, we can use the HTTP/1.1 protocol definition to infer the real value of Transfer-Encoding ourselves. Someone on Stack Overflow has already worked out the answer.

RFC 2616, section 3.6, states:

Whenever a transfer-coding is applied to a message-body, the set of transfer-codings MUST include “chunked”, unless the message is terminated by closing the connection. When the “chunked” transfer-coding is used, it MUST be the last transfer-coding applied to the message-body. The “chunked” transfer-coding MUST NOT be applied more than once to a message-body. These rules allow the recipient to determine the transfer-length of the message.

RFC 7230 contains a similar statement:

If any transfer coding other than chunked is applied to a response payload body, the sender MUST either apply chunked as the final transfer coding or terminate the message by closing the connection.

For example,

    Transfer-Encoding: gzip, chunked

indicates that the payload body has been compressed using the gzip coding and then chunked using the chunked coding while forming the message body.

In other words, whenever a transfer coding is applied to the message-body, the set of transfer codings must include chunked, and chunked must be the last (outermost) coding applied. The exception is a message terminated by closing the connection, for example when the Connection or Proxy-Connection header is close.

RFC 2616, section 4.4 (Message Length), defines how the message length is determined:

2. If a Transfer-Encoding header field (section 14.41) is present and has any value other than “identity”, then the transfer-length is defined by use of the “chunked” transfer-coding (section 3.6), unless the message is terminated by closing the connection.

3. If a Content-Length header field (section 14.13) is present, its decimal value in OCTETs represents both the entity-length and the transfer-length. The Content-Length header field MUST NOT be sent if these two lengths are different (i.e., if a Transfer-Encoding header field is present).

     If a message is received with both a
     Transfer-Encoding header field and a Content-Length header field,
     the latter MUST be ignored.

That is, if a Transfer-Encoding header field is present and its value is anything other than "identity", the transfer length is determined by the "chunked" encoding.

If a Content-Length header field is present, its decimal value, counted in octets (8-bit bytes), gives both the entity length and the transfer length. In other words, the entity length equals the transfer length. Content-Length must not be sent when the two lengths differ, for example when a Transfer-Encoding header field is present. If a message arrives with both header fields, Content-Length must be ignored.
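The two rules quoted above can be condensed into a small decision function. This minimal plain-C sketch covers only rules 2 and 3 of RFC 2616 section 4.4, plus the fallback of reading until the connection closes. It ignores 1xx/204/304 responses, HEAD requests and multipart/byteranges. The enum and function names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

enum body_framing {
    FRAMING_CHUNKED,        /* transfer-length given by the chunked coding */
    FRAMING_CONTENT_LENGTH, /* transfer-length equals Content-Length */
    FRAMING_UNTIL_CLOSE     /* body ends when the server closes the connection */
};

/* Case-insensitive ASCII string equality. */
static int equals_ignore_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b)) {
            return 0;
        }
        a++;
        b++;
    }
    return *a == *b;
}

/* transfer_encoding: header value, or NULL if absent.
 * content_length: header value, or -1 if absent. */
enum body_framing response_body_framing(const char *transfer_encoding, long content_length)
{
    /* Rule 2: any Transfer-Encoding other than "identity" means chunked;
     * a Content-Length that is also present MUST be ignored. */
    if (transfer_encoding != NULL && !equals_ignore_case(transfer_encoding, "identity")) {
        return FRAMING_CHUNKED;
    }
    /* Rule 3: Content-Length gives both the entity-length and the transfer-length. */
    if (content_length >= 0) {
        return FRAMING_CONTENT_LENGTH;
    }
    /* Otherwise a response is terminated by the server closing the connection. */
    return FRAMING_UNTIL_CLOSE;
}
```

Note that the Identity value iOS reports would be classified here by its Content-Length. That is exactly why the reported header cannot be trusted for traffic calculation.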

Taking all of the above definitions together, we can conclude the following:

  1. If a Transfer-Encoding header field is present, the Content-Length value is meaningless, because the transfer length is then determined by the chunked encoding.
  2. The entity length and the transfer length are guaranteed to be equal only when Transfer-Encoding is absent and Content-Length is present. Chunked encoding is not used in this case, because it would make the two lengths differ.
  3. If the message is terminated by closing the connection, chunked encoding is not required. Chunked encoding presupposes a persistent (keep-alive) HTTP connection.
  4. If a Transfer-Encoding header field is present, the HTTP connection is persistent, and the reported value is identity, then the Transfer-Encoding value actually sent over the network must have been chunked.
  5. If a Transfer-Encoding header field is present, the HTTP connection is persistent, and the value is not identity (for example gzip), then the value must end with chunked. This follows from the specification; in my packet captures I have never seen a Transfer-Encoding other than chunked. If CFNetwork decodes whatever transfer codings were used and then sets Transfer-Encoding to Identity, chunked may not have been the only coding applied.

So, if the Response Header contains a Transfer-Encoding header field and the HTTP connection is persistent, the body must have been chunked on the network. Header names must be compared case-insensitively.

    // headers must be mutable, e.g. [httpResponse.allHeaderFields mutableCopy]
    BOOL headerKeysContainsTransferEncodingHeaderField = [self stringArray:[headers allKeys] containsStringCaseInsensitive:@"Transfer-Encoding"];
    // Treat the connection as persistent when any header value is "keep-alive"
    BOOL headerValuesContainsKeepAliveString = [self stringArray:[headers allValues] containsStringCaseInsensitive:@"Keep-alive"];
    NSString *transferEncodingKey = [self getStringFromArray:[headers allKeys] caseInsensitiveString:@"Transfer-Encoding"];
    if (headerKeysContainsTransferEncodingHeaderField && headerValuesContainsKeepAliveString) {
        NSString *transferEncodingValue = headers[transferEncodingKey];
        if ([transferEncodingValue.lowercaseString isEqualToString:@"identity"]) {
            // CFNetwork reports Identity, but the body was chunked on the wire
            headers[transferEncodingKey] = @"chunked";
        } else if (![transferEncodingValue.lowercaseString hasSuffix:@"chunked"]) {
            // Any other coding must be followed by chunked as the final transfer coding
            headers[transferEncodingKey] = [transferEncodingValue stringByAppendingString:@", chunked"];
        }
    }

// Returns YES if stringArray contains string, ignoring case
- (BOOL)stringArray:(NSArray<NSString *> *)stringArray containsStringCaseInsensitive:(NSString *)string {
    for (NSString *value in stringArray) {
        if ([value caseInsensitiveCompare:string] == NSOrderedSame) {
            return YES;
        }
    }
    return NO;
}

// Returns the element of stringArray that equals string ignoring case, or nil
- (NSString *)getStringFromArray:(NSArray<NSString *> *)stringArray caseInsensitiveString:(NSString *)string {
    for (NSString *value in stringArray) {
        if ([value caseInsensitiveCompare:string] == NSOrderedSame) {
            return value;
        }
    }
    return nil;
}
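To test the rule outside Objective-C, here is the same adjustment as a minimal plain-C sketch. It assumes the caller has already decided whether the connection is persistent; the Objective-C code approximates this by looking for a keep-alive header value. The function name and the buffer-based interface are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Case-insensitive check that s ends with suffix (ASCII only). */
static int ends_with_ignore_case(const char *s, const char *suffix)
{
    size_t n = strlen(s), m = strlen(suffix);
    if (m > n) {
        return 0;
    }
    for (size_t i = 0; i < m; i++) {
        if (tolower((unsigned char)s[n - m + i]) != tolower((unsigned char)suffix[i])) {
            return 0;
        }
    }
    return 1;
}

static int equals_ignore_case(const char *a, const char *b)
{
    return strlen(a) == strlen(b) && ends_with_ignore_case(a, b);
}

/* Writes the Transfer-Encoding value most likely sent on the wire into out.
 * Returns 0 on success, -1 if out is too small. */
int wire_transfer_encoding(const char *reported, int persistent, char *out, size_t out_size)
{
    int written;
    if (persistent && equals_ignore_case(reported, "identity")) {
        /* CFNetwork reports Identity; the wire used chunked. */
        written = snprintf(out, out_size, "%s", "chunked");
    } else if (persistent && !ends_with_ignore_case(reported, "chunked")) {
        /* chunked must be the final transfer coding. */
        written = snprintf(out, out_size, "%s, chunked", reported);
    } else {
        written = snprintf(out, out_size, "%s", reported);
    }
    return (written < 0 || (size_t)written >= out_size) ? -1 : 0;
}
```

A value that already ends in chunked is left alone, so applying the rule twice does not produce "chunked, chunked".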

While researching this, I came across the Proxy-Connection and Connection header fields. For details, see the article "Proxy-connection in the Http request header", which is an interesting read.

Response Body

As for the Response Body, many articles online point out that Content-Length can be inaccurate and that the Content-Encoding value affects the data size. The data delivered to -(void)connection:(NSURLConnection *)connection didReceiveData:(NSData *)data has already been decoded (for example, gunzipped). A length computed from it will therefore always be somewhat larger than what was actually transmitted. To get a length closer to the real transmitted size, you need to simulate gzip compression at the application layer.

Besides Content-Encoding, Transfer-Encoding also affects the traffic calculation. Here is the difference between the two, from RFC 2616, section 3:

Content coding values indicate an encoding transformation that has been or can be applied to an entity.

Transfer-coding values are used to indicate an encoding transformation that has been, can be, or may need to be applied to an entity-body in order to ensure “safe transport” through the network. This differs from a content coding in that the transfer-coding is a property of the message, not of the original entity.

Content coding is a property of the entity and generally stays the same across the whole network path. Transfer coding applies to transmission between two directly connected nodes. On its way from the server to the client, a message may pass through many nodes, and different hops can use different transfer codings. For example, suppose the headers of an HTTP reply show Content-Encoding gzip and Transfer-Encoding chunked. The receiver must first undo the chunked coding and then decompress the gzip data to recover the entity the server actually sent.

Under the HTTP/1.1 definition, any transfer coding applied on a persistent connection must include chunked, and chunked must be the outermost coding used to transmit the data. So when calculating the data length of the response body, we also need to account for the transfer coding.

Chunked Transfer Coding

Chunked encoding turns the message body into a series of chunks, each with its own size indicator. The chunks may be followed by an optional trailer containing entity-header fields. This lets dynamically generated content be sent along with the information the recipient needs to verify that it has received the full message. The format of Chunked-Body is as follows:

       Chunked-Body   = *chunk
                        last-chunk
                        trailer
                        CRLF
       chunk          = chunk-size [ chunk-extension ] CRLF
                        chunk-data CRLF
       chunk-size     = 1*HEX
       last-chunk     = 1*("0") [ chunk-extension ] CRLF
       chunk-extension= *( ";" chunk-ext-name [ "=" chunk-ext-val ] )
       chunk-ext-name = token
       chunk-ext-val  = token | quoted-string
       chunk-data     = chunk-size(OCTET)
       trailer        = *(entity-header CRLF)

chunk-size is a string of hex digits giving the size of the chunk-data in octets. The chunked encoding ends with a last-chunk whose chunk-size is 0, followed by the trailer and an empty line. In effect, this is an empty chunk.
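To see how this grammar maps to bytes, here is a minimal plain-C sketch of a chunked-body parser. It walks a complete Chunked-Body held in memory and skips chunk-extensions and trailer lines. It reports both the decoded payload size and the number of bytes the whole chunked body occupied on the wire. The function names are hypothetical, and error handling is deliberately simple (for example, there is no overflow check on huge chunk sizes).

```c
#include <stddef.h>

/* Returns the index just past the next CRLF at or after pos, or -1. */
static long skip_line(const char *buf, size_t len, size_t pos)
{
    for (size_t i = pos; i + 1 < len; i++) {
        if (buf[i] == '\r' && buf[i + 1] == '\n') {
            return (long)(i + 2);
        }
    }
    return -1;
}

static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Returns the bytes consumed by the Chunked-Body (including the final
 * empty line), or -1 if buf does not hold a complete, valid body.
 * *payload receives the sum of all chunk-data sizes. */
long parse_chunked_body(const char *buf, size_t len, size_t *payload)
{
    size_t pos = 0;
    *payload = 0;
    for (;;) {
        /* chunk-size = 1*HEX */
        size_t size = 0, digits = 0;
        while (pos < len && hex_value(buf[pos]) >= 0) {
            size = size * 16 + (size_t)hex_value(buf[pos]);
            pos++;
            digits++;
        }
        if (digits == 0) {
            return -1;
        }
        /* Skip any chunk-extension and the CRLF ending the size line. */
        long next = skip_line(buf, len, pos);
        if (next < 0) {
            return -1;
        }
        pos = (size_t)next;
        if (size == 0) {
            break; /* last-chunk */
        }
        /* chunk-data followed by CRLF */
        if (size > len - pos || len - pos - size < 2 ||
            buf[pos + size] != '\r' || buf[pos + size + 1] != '\n') {
            return -1;
        }
        *payload += size;
        pos += size + 2;
    }
    /* trailer = *(entity-header CRLF), then the terminating CRLF */
    for (;;) {
        if (pos + 1 < len && buf[pos] == '\r' && buf[pos + 1] == '\n') {
            return (long)(pos + 2);
        }
        long next = skip_line(buf, len, pos);
        if (next < 0) {
            return -1;
        }
        pos = (size_t)next;
    }
}
```

For the body 4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n, the parser consumes 24 bytes and reports a 9-byte payload, so the chunked framing alone adds 15 bytes.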

The trailer allows the sender to include additional header fields at the end of the message.

A server using chunked transfer-coding in a response must not use the trailer for any header fields unless at least one of the following is true:

  1. The request included a TE header field indicating that trailers are acceptable in the transfer-coding of the response.
  2. The server is the origin server for the response, the trailer fields consist entirely of optional metadata, and the recipient could use the message (in a manner acceptable to the origin server) without receiving this metadata. In other words, the origin server is willing to accept that the trailer fields might be silently discarded along the path to the client.

In general, responses received by the client carry no trailer, so the trailer's effect on the traffic calculation can be ignored.

With this definition in mind, I captured an HTTP response with Wireshark to check whether practice matches the theory:

Unfortunately, iOS offers no way to obtain the number and sizes of the chunks. As a workaround, I simply treat each call to -(void)connection:(NSURLConnection *)connection didReceiveData:(NSData *)data as one received chunk. This is very inaccurate; later I will try hooking some CFNetwork methods to see whether the real chunks can be observed. I record the size of the data received in each call, then replay the encoding process defined by the protocol to calculate the length of the encoded data. The code is as follows:

- (NSUInteger)ep_bodyLength:(NSData *)body headers:(NSDictionary *)headers chunkedSizes:(NSArray<NSString *> *)chunkedSizes {
    NSString *transferEncodingKey = [EPUtils getStringFromArray:headers.allKeys caseInsensitiveString:@"Transfer-Encoding"];
    NSString *contentEncodingKey = [EPUtils getStringFromArray:headers.allKeys caseInsensitiveString:@"Content-Encoding"];
    NSString *transferEncoding = [headers[transferEncodingKey] lowercaseString];
    NSString *contentEncoding = [headers[contentEncodingKey] lowercaseString];
    NSArray<NSString *> *transferEncodingArr = [transferEncoding componentsSeparatedByString:@", "];
    NSUInteger length = body.length;
    if ([contentEncoding isEqualToString:@"gzip"]) {
        // Simulate gzip encoding of body and use the encoded data length (not shown)
    }
    NSUInteger CRLFSize = [@"\r\n" dataUsingEncoding:NSUTF8StringEncoding].length;
    // Walk every transfer coding; a plain "chunked" value must be handled too
    for (NSString *coding in transferEncodingArr) {
        if ([coding isEqualToString:@"gzip"]) {
            // Simulate gzip transfer coding and use the encoded data length (not shown)
        } else if ([coding isEqualToString:@"chunked"]) {
            NSUInteger totalChunkedHeaderAndFooterSize = 0;
            // chunkedSizes holds each chunk's size as a hex string
            for (NSString *chunkedSizeHex in chunkedSizes) {
                // chunk-size line (hex digits + CRLF) before the data, CRLF after it
                totalChunkedHeaderAndFooterSize += [chunkedSizeHex dataUsingEncoding:NSUTF8StringEncoding].length + CRLFSize;
                totalChunkedHeaderAndFooterSize += CRLFSize;
            }
            // last-chunk "0" + CRLF, then the empty line that ends the body (no trailer)
            NSUInteger endChunkedHeaderSize = [@"0" dataUsingEncoding:NSUTF8StringEncoding].length + CRLFSize;
            NSUInteger endChunkedFooterSize = CRLFSize;
            totalChunkedHeaderAndFooterSize += endChunkedHeaderSize + endChunkedFooterSize;
            length += totalChunkedHeaderAndFooterSize;
        }
    }
    return length;
}
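The chunked part of this calculation is simple arithmetic over the chunk sizes. Here is a minimal plain-C sketch of it. It assumes, as the Objective-C code does, that each recorded data callback corresponds to one chunk, and that there are no chunk-extensions and no trailer. Unlike the Objective-C version, it takes the sizes as numbers and counts the hex digits itself. The function names are hypothetical.

```c
#include <stddef.h>

/* Number of hex digits needed to write n (at least one, for "0"). */
static size_t hex_digits(size_t n)
{
    size_t digits = 1;
    while (n >= 16) {
        n /= 16;
        digits++;
    }
    return digits;
}

/* Total wire bytes of a chunked body: for each chunk, hex size + CRLF +
 * data + CRLF, then the last-chunk "0\r\n" and the final "\r\n". */
size_t chunked_wire_length(const size_t *chunk_sizes, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; i++) {
        total += hex_digits(chunk_sizes[i]) + 2 + chunk_sizes[i] + 2;
    }
    return total + 3 + 2;
}
```

For chunks of 4 and 5 bytes this gives 24 bytes on the wire for a 9-byte payload. A single 4096-byte chunk (hex 1000) gives 4109 bytes.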

This simulates the whole Content-Encoding and Transfer-Encoding process, but the result is only closer to the real length of the transmitted data; considerable error remains. The code handles gzip for both Content-Encoding and Transfer-Encoding. In practice, though, if the entity has already been gzip-compressed, Transfer-Encoding does not compress it again and usually uses only chunked encoding. The compression can run on a background thread, and the result can be saved to the database once it finishes.

Request Traffic Calculation

Request traffic can be calculated by following "iOS traffic monitoring analysis". Below I record a few points I learned while calculating it.

Request-Line

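Because no API exposes it, the request line can only be estimated from empirical values. It is usually calculated with code like the following:

- (NSUInteger)ep_getLineLength {
    // Request-Line = Method SP Request-URI SP HTTP-Version CRLF, estimated with the abs_path form
    NSString *requestURI = self.URL.path.length > 0 ? self.URL.path : @"/";
    // URL.path does not include the query string, which is also part of the Request-URI
    if (self.URL.query.length > 0) {
        requestURI = [requestURI stringByAppendingFormat:@"?%@", self.URL.query];
    }
    NSString *lineStr = [NSString stringWithFormat:@"%@ %@ %@\r\n", self.HTTPMethod, requestURI, @"HTTP/1.1"];
    NSData *lineData = [lineStr dataUsingEncoding:NSUTF8StringEncoding];
    return lineData.length;
}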

This calculation matches the request-line format we usually see, but in some cases it is off. The sources of error are recorded below.

The format and content of the Request-Line are defined in RFC 2616, section 5.1:

        Request-Line   = Method SP Request-URI SP HTTP-Version CRLF

The Method and HTTP-Version parts introduce no error. The code always writes HTTP/1.1 for HTTP-Version, but HTTP/1.0 has the same length after UTF-8 encoding, so the estimate is unaffected. HTTP/2 does not send a textual request line at all, so this estimate only applies to HTTP/1.x. The main source of error is therefore the Request-URI.

RFC 2616, section 5.1.2, defines the Request-URI as follows:

       Request-URI    = "*" | absoluteURI | abs_path | authority

So self.URL.path in the code above covers only the third of the four Request-URI options (abs_path). When the actual Request-URI takes one of the other three forms, the estimate is wrong.

*: the request applies to the server itself rather than to a particular resource. It is allowed only when the method does not necessarily apply to a resource, for example OPTIONS * HTTP/1.1.

absoluteURI: required when the request is sent to a proxy, which needs the full request URL.

abs_path: the path on the server, used in most cases.

authority: used only by the CONNECT method.

Because abs_path is by far the most common case, computing the request line this way is generally good enough.
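The effect of the Request-URI form on the estimate is easy to quantify. The following minimal plain-C sketch computes the length of a Request-Line for a given request target. The function name and the example URLs in the note below are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Request-Line = Method SP Request-URI SP HTTP-Version CRLF */
size_t request_line_length(const char *method, const char *request_uri, const char *version)
{
    return strlen(method) + 1 + strlen(request_uri) + 1 + strlen(version) + 2;
}
```

For example, "GET /index.html?q=1 HTTP/1.1" is 30 bytes. The absoluteURI form sent to a proxy, "GET http://example.com/index.html?q=1 HTTP/1.1", is 48 bytes, because the scheme and authority add 18 bytes on top of the abs_path estimate.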

Header

A request created at the Cocoa layer lacks several header fields that are added when the request is actually sent, including but not limited to:

1. Accept
2. Connection / Proxy-Connection
3. Host

If you need more accurate traffic figures, you can fill these in with empirical values.

For requests sent by an iOS client, Accept is usually */*. Without a proxy, HTTP/1.1 connections are persistent by default, but the header is usually still sent as Connection: keep-alive. With a proxy configured, it becomes Proxy-Connection: keep-alive. Host is simply the host of the request URL, with :port appended when a non-default port is used.
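These empirical values can be turned into a byte estimate. The minimal plain-C sketch below assumes exactly the three headers and values described above: Accept: */*, Connection or Proxy-Connection: keep-alive, and Host. Real requests may carry more fields, so the result is only a lower bound on what the Cocoa-level request is missing. The function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Bytes of one "name: value\r\n" header line. */
static size_t header_line_length(const char *name, const char *value)
{
    return strlen(name) + 2 + strlen(value) + 2;
}

/* host: the Host header value (host name, plus ":port" for a non-default port). */
size_t missing_request_headers_length(const char *host, int via_proxy)
{
    size_t total = header_line_length("Accept", "*/*");
    total += header_line_length(via_proxy ? "Proxy-Connection" : "Connection", "keep-alive");
    total += header_line_length("Host", host);
    return total;
}
```

For a Host of example.com, this adds 56 bytes without a proxy and 62 bytes with one.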

Cookie

The request does not contain the cookies either, so I fetch them myself, add them to the headers, and include them in the calculation:

- (NSDictionary<NSString *, NSString *> *)dgm_getCookiesByUrl:(NSURL *)url {
    NSDictionary<NSString *, NSString *> *cookiesHeader;
    NSHTTPCookieStorage *cookieStorage = [NSHTTPCookieStorage sharedHTTPCookieStorage];
    NSArray<NSHTTPCookie *> *cookies = [cookieStorage cookiesForURL:url];
    if (cookies.count) {
        cookiesHeader = [NSHTTPCookie requestHeaderFieldsWithCookies:cookies];
    }
    return cookiesHeader;
}
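requestHeaderFieldsWithCookies: returns a dictionary with a single Cookie field. Assuming its value uses the usual "name=value; name=value" format, the bytes this header adds can be computed as in the following minimal plain-C sketch. The function name is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Bytes of "Cookie: n1=v1; n2=v2\r\n"; returns 0 when there are no cookies,
 * matching the nil dictionary returned above when there are no cookies. */
size_t cookie_header_length(const char *const *names, const char *const *values, size_t count)
{
    if (count == 0) {
        return 0;
    }
    size_t total = strlen("Cookie: ");
    for (size_t i = 0; i < count; i++) {
        if (i > 0) {
            total += 2; /* "; " separator */
        }
        total += strlen(names[i]) + 1 + strlen(values[i]); /* name=value */
    }
    return total + 2; /* CRLF */
}
```

Two cookies, a=1 and b=2, produce "Cookie: a=1; b=2\r\n", which is 18 bytes.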

Body

I use NSURLSession for network requests, and the request body is not re-encoded in transit the way the response body is. The exact value can therefore be taken directly from the NSURLSessionTaskDelegate callback:

- (void)URLSession:(NSURLSession *)session task:(NSURLSessionTask *)task
   didSendBodyData:(int64_t)bytesSent
    totalBytesSent:(int64_t)totalBytesSent
totalBytesExpectedToSend:(int64_t)totalBytesExpectedToSend;

Time statistics

Besides traffic, timing statistics are another important part of monitoring network requests. The timings to collect include:

  1. DNS resolution time
  2. TCP connection time
  3. SSL/TLS handshake time
  4. Request start time
  5. Response time

On iOS 10 and later, you can get these directly from the NSURLSession callback below. Plenty of material on it is available online, so I will not go into detail.

- (void)URLSession:(NSURLSession *)session task:(NSURLSessionTask *)task didFinishCollectingMetrics:(NSURLSessionTaskMetrics *)metrics 
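NSURLSessionTaskTransactionMetrics exposes timestamps such as domainLookupStartDate, connectStartDate, secureConnectionStartDate, requestStartDate and responseEndDate. The minimal plain-C sketch below derives the durations in the list above from such timestamps. The struct is a hypothetical mirror of the relevant properties, expressed as seconds since the task started; it is not an Apple type. A negative value means the phase did not happen, for example on a reused connection. The sketch also assumes that connectEndDate is reached only after the TLS handshake, so TCP time is measured up to secureConnectionStartDate when TLS was used.

```c
/* Hypothetical mirror of NSURLSessionTaskTransactionMetrics timestamps,
 * in seconds since the task started; negative means "did not happen". */
struct transaction_times {
    double domain_lookup_start, domain_lookup_end;
    double connect_start, connect_end;
    double secure_connection_start, secure_connection_end;
    double request_start, request_end;
    double response_start, response_end;
};

struct phase_durations {
    double dns, tcp, tls, request, response; /* seconds; -1 if unavailable */
};

static double span(double start, double end)
{
    return (start >= 0 && end >= start) ? end - start : -1;
}

struct phase_durations phase_durations_from(const struct transaction_times *t)
{
    struct phase_durations d;
    d.dns = span(t->domain_lookup_start, t->domain_lookup_end);
    d.tls = span(t->secure_connection_start, t->secure_connection_end);
    /* With TLS, the TCP phase ends where the TLS handshake starts. */
    d.tcp = span(t->connect_start, d.tls >= 0 ? t->secure_connection_start : t->connect_end);
    d.request = span(t->request_start, t->request_end);
    d.response = span(t->response_start, t->response_end);
    return d;
}
```

For a reused connection, the DNS, TCP and TLS phases all come out as -1, while the request and response durations are still available.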

If you also need timings on iOS 9, every approach I found online uses fishhook to hook the BSD socket connect function, but it did not work in my tests. The question of how to hook socket or connect has also been raised in fishhook itself.

The Wedjat iOS performance monitoring solution mentions hooking CFNetwork's CFReadStreamCreateForHTTPRequest and returning a proxy of your own. I tried this for a long time without success. Using fishhook, I confirmed that the function had been hooked. When requests are made through NSURLSession, however, my hooked function is never called; it is called only when CFReadStreamCreateForHTTPRequest is invoked directly. The reason is probably the same one that keeps fishhook from hooking connect. fishhook works by rebinding external symbol pointers in Mach-O images, so calls that a system library makes internally without going through those pointers cannot be intercepted.

Therefore, on iOS 9, timing statistics can only record when the request starts at the application layer and when the application layer receives the response.

If you know a better way to obtain TCP connection timing on iOS 9,