1. Foreword

In the information age, data has become a valuable resource for mobile Internet companies, and collecting, reporting, storing, analyzing, and visualizing data have all become important areas of work. The core of big data analysis is the data itself, and the source of that data matters most. Ensuring that data is uploaded to the specified server accurately, promptly, and completely is the core problem that the Shence Analytics Android SDK has to solve.

For data transmission, the Shence Analytics Android SDK weighs integrity, correctness, and efficiency, and implements a network transmission scheme designed for data collection. The following sections describe the SDK's network module in detail, in the hope of providing a useful reference.

2. Network request scheme

Most apps interact with servers and therefore need a network connection to work. A data collection SDK has to upload data to a specified server, so it depends on the network as well. There are many ways to implement network requests on Android: you can use a mature network framework to get the functionality quickly, or you can use the network access APIs provided by the Android system. The advantages and disadvantages of the two approaches are described below.

2.1 Based on open source network framework

There are many excellent open source network frameworks in Android, such as Volley, OkHttp, Retrofit + RxJava, NoHttp, etc. Network request functions can be easily and quickly implemented based on the open source network framework.

Building on an open source network framework has the following advantages:

  1. It reduces the amount of code, so you can focus on the business logic instead of spending time on technical plumbing.
  2. It is feature-rich and has a low barrier to entry.
  3. Popular open source network frameworks have been proven in many applications, so their behavior is relatively stable.

However, there are also some disadvantages:

  1. More features mean more complex code logic and a higher learning cost.
  2. Internal defects are hard to fix yourself, and you may have to wait for the maintainers to release an update.
  3. They contain functionality you may never use and redundant code, which increases the size of the app when the framework is included.

Both sides have to be weighed when building a network request scheme on an open source network framework. Choose a framework according to actual needs.

2.2 System based approach

The network request scheme based on the system method usually uses HttpURLConnection or HttpClient:

  • HttpURLConnection: A versatile, lightweight HTTP access class in the JDK java.net package. Most applications use this interface for network access requests.
  • HttpClient: a network access class provided by the Apache open source organization. It encapsulates the details of the HTTP protocol implementation. Before Android 6.0, it was included in the system API.

Both offer a large number of APIs and are relatively stable. Both network request classes have the following capabilities:

  1. Support for HTTPS network requests.
  2. Configurable timeouts.
  3. Support for IPv6.
  4. Support for connection pooling.
  5. Stream-based upload and download.

Before Android 6.0, most applications used HttpClient to implement network requests. Compared with HttpURLConnection, HttpClient had the following advantages:

  1. Ease of use: HttpClient encapsulates the details of the HTTP protocol and is convenient to use. HttpURLConnection is a standard Java class with little encapsulation, which makes it harder to use.
  2. Stability: HttpClient is more powerful and more stable, and makes it easy to control details, whereas earlier versions of HttpURLConnection had persistent version compatibility problems.

Android 6.0 removed HttpClient from the system API. To keep using HttpClient on Android 6.0 or later, you need to declare the legacy library in the build.gradle of the corresponding module. The configuration is as follows:

android {
     useLibrary 'org.apache.http.legacy'
}

Therefore, HttpURLConnection is recommended on Android 6.0 or later. HttpClient used to be preferred because HttpURLConnection was unstable, but Google has since fixed some of the problems in HttpURLConnection. Table 2-1 shows its advantages over HttpClient.

Table 2-1 Comparison of functions of HttpURLConnection and HttpClient

Therefore, the network request scheme based on system method is generally implemented by HttpURLConnection.

3. SDK network module

If the SDK network module were built on an open source network framework, maintainability and version control would carry some risk, and the SDK would also become much larger. Users find these drawbacks hard to accept, so an open source network framework is not a suitable basis for the SDK's network module.

For these reasons, the SDK network module is implemented on top of HttpURLConnection.

HttpURLConnection is a network access API provided by the system. It meets the SDK's network request requirements, and as a system API it is also more stable and easier to extend.

3.1 Principles

3.1.1 Implementation Principle

Android network requests are built on the HTTP protocol, currently the most widely used and most important protocol on the Internet. HTTP follows a typical request-response model:

  1. The client establishes a connection and sends a request.
  2. The server accepts and processes the request.
  3. The server sends a response.
  4. The client receives and processes the response.

It helps to understand HttpURLConnection before building a network request scheme on it. HttpURLConnection extends the abstract class URLConnection, and the underlying network connection is made through the Socket class. A socket abstracts complex network operations into a simple interface for upper layers to call. HttpURLConnection is not the underlying connection itself but a request made over that connection, so callers do not need to set up a Socket themselves.

HttpURLConnection supports GET, POST, PUT, and DELETE requests, of which GET and POST are the most commonly used. The two compare as follows in terms of data length and security:

  1. Data transfer length: generally speaking, the amount of data a GET request can carry is limited (because URL length is limited), while a POST request has no such limit.
  2. Security: GET requests are less secure because the data is appended to the URL, where it is visible. POST requests are more secure in this respect because the data is carried in the request body rather than in the URL.

Because the SDK collects a large amount of data and has high requirements for data security, its network requests use HttpURLConnection in POST mode.

3.1.2 Usage mode

Figure 3-1 shows the procedure for using HttpURLConnection.

Figure 3-1 Process for using HttpURLConnection
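
The usual call sequence can be sketched in plain Java as follows. This is a minimal illustration rather than the SDK's own code: the names PostSketch, prepare, and post, the 30-second timeouts, and the Content-Type value are assumptions chosen for the example.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.StandardCharsets;

final class PostSketch {
    /** Opens and configures the connection. No network traffic happens yet. */
    static HttpURLConnection prepare(URL url, int bodyLength) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setConnectTimeout(30_000);   // assumed value, in milliseconds
        conn.setReadTimeout(30_000);      // assumed value, in milliseconds
        conn.setDoOutput(true);           // the request will carry a body
        conn.setUseCaches(false);
        conn.setFixedLengthStreamingMode(bodyLength);
        conn.setRequestProperty("Content-Type", "application/x-www-form-urlencoded");
        return conn;
    }

    /** Sends the body and returns the HTTP status code. */
    static int post(URL url, String body) throws IOException {
        byte[] bytes = body.getBytes(StandardCharsets.UTF_8);
        HttpURLConnection conn = prepare(url, bytes.length);
        try {
            try (OutputStream out = conn.getOutputStream()) {
                out.write(bytes);         // connects and writes the request body
            }
            return conn.getResponseCode(); // waits for the server's response
        } finally {
            conn.disconnect();
        }
    }
}
```

On Android, post would have to be called from a background thread, because network access is not allowed on the main thread.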

Because network access is involved, you need to add network access permissions to the Manifest file:

<uses-permission android:name="android.permission.INTERNET"/>

That covers the principles of HttpURLConnection and how it is used. The following sections describe how network requests are implemented in the SDK.

3.2 Specific Implementation

3.2.1 Network Configuration

The SDK exposes a set of configuration options for data reporting. Developers can tune them to the characteristics of their app to make reporting as efficient as possible. Configuration is done when the SDK is initialized. The following parameters can be configured:

  • mServerUrl: the server address to which locally collected data is reported.
  • mFlushInterval: the minimum interval between two data uploads, specified in milliseconds. The default is 15 seconds (15000 ms).
  • mFlushBulkSize: the maximum number of entries kept in the local cache. When the number of cached entries reaches mFlushBulkSize, data is reported. The default value is 100.
  • mNetworkTypePolicy: the network upload policy. It can be configured to report data on 3G, 4G, 5G, or WIFI networks.
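
As a rough sketch, these options could be held in a small immutable object. The class ReportConfigSketch, its NetworkType enum, and the choice to allow every network type by default are hypothetical names and assumptions made for illustration. Only the four parameters and the defaults of 15 seconds and 100 entries come from the article.

```java
import java.util.EnumSet;
import java.util.Set;

final class ReportConfigSketch {
    /** Hypothetical enum standing in for the network types named in the article. */
    enum NetworkType { TYPE_3G, TYPE_4G, TYPE_5G, TYPE_WIFI }

    final String serverUrl;                    // corresponds to mServerUrl
    final long flushIntervalMs;                // corresponds to mFlushInterval
    final int flushBulkSize;                   // corresponds to mFlushBulkSize
    final Set<NetworkType> networkTypePolicy;  // corresponds to mNetworkTypePolicy

    /** Uses the defaults: 15 seconds, 100 entries, all network types (assumed). */
    ReportConfigSketch(String serverUrl) {
        this(serverUrl, 15_000L, 100, EnumSet.allOf(NetworkType.class));
    }

    ReportConfigSketch(String serverUrl, long flushIntervalMs, int flushBulkSize,
                       Set<NetworkType> policy) {
        this.serverUrl = serverUrl;
        this.flushIntervalMs = flushIntervalMs;
        this.flushBulkSize = flushBulkSize;
        // EnumSet.copyOf rejects an empty non-EnumSet collection, so handle that case.
        this.networkTypePolicy = policy.isEmpty()
                ? EnumSet.noneOf(NetworkType.class)
                : EnumSet.copyOf(policy);
    }

    /** True if uploads are allowed on the network the device is currently using. */
    boolean mayUploadOn(NetworkType current) {
        return networkTypePolicy.contains(current);
    }
}
```

For example, a configuration that only uploads over WIFI would pass EnumSet.of(NetworkType.TYPE_WIFI) as the policy.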

3.2.2 Worker thread encapsulation

SDK data reporting is done on a background thread. When the collected data satisfies the reporting policy, asynchronous reporting is triggered. The Worker class manages and schedules upload tasks. When Worker is initialized, it creates a HandlerThread instance. HandlerThread is a subclass of Thread that has its own Looper object and runs a message loop on it. By passing the HandlerThread's Looper to a Handler, tasks can be executed asynchronously in the Handler's handleMessage method.

AnalyticsMessageHandler extends Handler. In handleMessage it receives the messages sent by the Worker and reports or deletes data accordingly.

Because the Looper of the HandlerThread is passed to the AnalyticsMessageHandler object, the network task runs asynchronously in handleMessage. The AnalyticsMessageHandler code is as follows:

private class AnalyticsMessageHandler extends Handler {
    ......
    Worker() {
        final HandlerThread thread = new HandlerThread(
                "com.sensorsdata.analytics.android.sdk.AnalyticsMessages.Worker",
                Thread.MIN_PRIORITY);
        thread.start();
        mHandler = new AnalyticsMessageHandler(thread.getLooper());
    }

    @Override
    public void handleMessage(Message msg) {
        ......
        if (msg.what == FLUSH_QUEUE) {
            sendData();
        } else if (msg.what == DELETE_ALL) {
            try {
                mDbAdapter.deleteAllEvents();
            } catch (Exception e) {
                com.sensorsdata.analytics.android.sdk.SALog.printStackTrace(e);
            }
        } else {
            SALog.i(TAG, "Unexpected message received by SensorsData worker: " + msg);
        }
        ...
    }
    ...
}

Worker encapsulates two methods, runMessage and runMessageOnce:

  • The runMessage method is used to report data immediately.
  • The runMessageOnce method is used to run a reporting task after a delay.

Delayed reporting relies on the sendMessageDelayed() method of Handler.
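
The two methods can be illustrated with a plain-Java analogue. This sketch deliberately swaps Android's HandlerThread and Handler for a single-threaded ScheduledExecutorService so that it runs on any JVM, and it is not the SDK's implementation. The class name WorkerSketch is hypothetical, and skipping a delayed task while another is still pending is an assumption suggested by the name runMessageOnce.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class WorkerSketch {
    // One low-priority background thread runs tasks in order, much like the
    // message loop of a HandlerThread.
    private final ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "report-worker");
                t.setDaemon(true);
                t.setPriority(Thread.MIN_PRIORITY);
                return t;
            });

    // True while a delayed task is scheduled but has not started yet.
    private final AtomicBoolean delayedPending = new AtomicBoolean(false);

    /** Runs the task as soon as the worker thread is free. */
    void runMessage(Runnable task) {
        executor.execute(task);
    }

    /**
     * Schedules the task to run after delayMs, unless a delayed task is
     * already waiting. Returns true if the task was scheduled.
     */
    boolean runMessageOnce(Runnable task, long delayMs) {
        if (!delayedPending.compareAndSet(false, true)) {
            return false;
        }
        executor.schedule(() -> {
            delayedPending.set(false);
            task.run();
        }, delayMs, TimeUnit.MILLISECONDS);
        return true;
    }

    void shutdown() {
        executor.shutdownNow();
    }
}
```

With this shape, many events arriving in quick succession result in a single delayed upload instead of one per event.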

3.2.3 Data reporting Policy

Data collection and storage policies are described in “SDK data store parsing”: collected data is first saved locally and is reported only when the reporting policy is satisfied.

  • When the number of entries stored locally on the client exceeds a certain value (100 by default), data is reported.

During SDK initialization, the mFlushBulkSize parameter controls this threshold. If it is not specified, the default value of 100 is used. If a value below 50 is specified, 50 is used instead. The SDK collects a lot of data. If the threshold is too small, network requests are issued too frequently, which hurts performance. If it is too large, each upload carries too much data, takes a long time, and may fail. Unless you have special requirements, use the default value.

  • After data is collected, it is reported at a certain interval (15 seconds by default).

During SDK initialization, the mFlushInterval parameter controls this interval. If the number of cached entries has not reached the threshold, the SDK schedules a delayed reporting task that runs after the time specified by mFlushInterval.
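
The two rules combine into a small decision function. The sketch below is only an illustration of the policy as described here: FlushPolicySketch and its method names are hypothetical, and it uses a strict greater-than comparison to follow the "exceeds" wording.

```java
final class FlushPolicySketch {
    static final int MIN_BULK_SIZE = 50;

    /** Values below 50 are raised to 50, as described above. */
    static int effectiveBulkSize(int configuredBulkSize) {
        return Math.max(MIN_BULK_SIZE, configuredBulkSize);
    }

    /**
     * Returns 0 if data should be reported right away, otherwise the delay in
     * milliseconds after which the delayed reporting task should run.
     */
    static long delayBeforeFlush(int cachedCount, int configuredBulkSize, long flushIntervalMs) {
        boolean overThreshold = cachedCount > effectiveBulkSize(configuredBulkSize);
        return overThreshold ? 0L : flushIntervalMs;
    }
}
```

With the defaults, 101 cached entries trigger an immediate report, while 100 or fewer wait for the 15-second delayed task.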

In addition to the policies above, the SDK reports all locally cached data, in blocking fashion, when any of the following events occurs:

  • The $AppEnd event is collected.
  • An app exception is captured.
  • An activation event is triggered.

3.2.4 Data security

3.2.4.1 Data Encryption

The data reported by the SDK involves user privacy, and protecting that privacy is the developer's responsibility and obligation. The SDK provides a data encryption policy that encrypts reported data to prevent user information from leaking during transmission.

The SDK's data encryption strategy is to encrypt the collected data with the RSA + AES algorithms when caching it locally. The main process is as follows:

  1. The app holds an RSA public key and its key number (call the public key A), either built into the app or obtained from the server. Obtaining the public key from the server makes it easy to replace, but it consumes more bandwidth and increases the initialization cost. In addition, the server then uses multiple key pairs at the same time and has to select the matching private key when decrypting. If there are too many keys, import performance may suffer.
  2. Randomly generate a 128-bit AES symmetric key (call it B), and encrypt AES symmetric key B with RSA public key A.
  3. When the user triggers an event, JSON data is generated. AES symmetric key B is used to encrypt the collected event (that is, the whole JSON data), producing the ciphertext.
  4. Assemble the data in the format agreed with the backend and store it locally.

The assembled format is as follows:

{
  "pkv": key number of the RSA public key,
  "ekey": "ciphertext generated by encrypting AES symmetric key B with RSA public key A",
  "payload": "ciphertext generated by encrypting the collected event (the whole JSON data) with AES symmetric key B"
}
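
Steps 2 to 4 can be sketched with the standard javax.crypto APIs. This is an illustration under stated assumptions, not the SDK's code: the class name HybridEncryptSketch, the cipher transformations (RSA with OAEP, AES in GCM mode with a random 12-byte IV prepended to the ciphertext), and the use of a plain Map in place of a JSON object are all choices made for the example, since the article only specifies RSA + AES and the three field names.

```java
import java.nio.charset.StandardCharsets;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

final class HybridEncryptSketch {
    private static final String RSA = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding"; // assumed
    private static final String AES = "AES/GCM/NoPadding";                     // assumed
    private static final int IV_LEN = 12;

    private final int pkv;
    private final SecretKey aesKeyB;
    private final String ekey;

    /** Step 2: generate AES key B once and encrypt it with RSA public key A. */
    HybridEncryptSketch(int pkv, PublicKey rsaPublicKeyA) throws Exception {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        this.pkv = pkv;
        this.aesKeyB = generator.generateKey();
        Cipher rsa = Cipher.getInstance(RSA);
        rsa.init(Cipher.ENCRYPT_MODE, rsaPublicKeyA);
        this.ekey = Base64.getEncoder().encodeToString(rsa.doFinal(aesKeyB.getEncoded()));
    }

    /** Steps 3 and 4: encrypt one event with key B and assemble the record. */
    Map<String, Object> encryptEvent(String eventJson) throws Exception {
        byte[] iv = new byte[IV_LEN];
        new SecureRandom().nextBytes(iv);
        Cipher aes = Cipher.getInstance(AES);
        aes.init(Cipher.ENCRYPT_MODE, aesKeyB, new GCMParameterSpec(128, iv));
        byte[] ciphertext = aes.doFinal(eventJson.getBytes(StandardCharsets.UTF_8));
        byte[] ivAndCiphertext = new byte[iv.length + ciphertext.length];
        System.arraycopy(iv, 0, ivAndCiphertext, 0, iv.length);
        System.arraycopy(ciphertext, 0, ivAndCiphertext, iv.length, ciphertext.length);

        Map<String, Object> record = new LinkedHashMap<>();
        record.put("pkv", pkv);
        record.put("ekey", ekey);
        record.put("payload", Base64.getEncoder().encodeToString(ivAndCiphertext));
        return record;
    }

    /** Server side: recover key B with the RSA private key, then decrypt the payload. */
    static String decrypt(PrivateKey rsaPrivateKey, Map<String, Object> record) throws Exception {
        Cipher rsa = Cipher.getInstance(RSA);
        rsa.init(Cipher.DECRYPT_MODE, rsaPrivateKey);
        byte[] keyBytes = rsa.doFinal(Base64.getDecoder().decode((String) record.get("ekey")));
        byte[] ivAndCiphertext = Base64.getDecoder().decode((String) record.get("payload"));
        Cipher aes = Cipher.getInstance(AES);
        aes.init(Cipher.DECRYPT_MODE, new SecretKeySpec(keyBytes, "AES"),
                new GCMParameterSpec(128, ivAndCiphertext, 0, IV_LEN));
        byte[] plain = aes.doFinal(ivAndCiphertext, IV_LEN, ivAndCiphertext.length - IV_LEN);
        return new String(plain, StandardCharsets.UTF_8);
    }
}
```

Because key B is generated once per instance, every record produced by the same instance carries the same ekey value.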

When reporting, the SDK reads the data back from disk. Under this encryption scheme the ekey field is very long (its length depends on the length of the RSA key), and carrying one ekey with every record is redundant. Data is therefore merged before reporting: records with the same “ekey” are merged into one array. The data sent to the server has the following format:

[
  {
    "pkv": key number of RSA public key 1,
    "ekey": "ciphertext generated by encrypting AES symmetric key B with RSA public key A",
    "payloads": ["encrypted event data 1", "encrypted event data 2"]
  },
  {
    "pkv": key number of RSA public key 2,
    "ekey": "ciphertext generated by encrypting AES symmetric key B with RSA public key A",
    "payloads": ["event data 3", "event data 4"]
  }
]
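
The merge step amounts to grouping records by their ekey value. A minimal sketch, assuming each stored record is represented as a Map with the three fields of the single-record format (EkeyMergeSketch is a hypothetical name, and a real implementation would work on JSON objects instead):

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class EkeyMergeSketch {
    /**
     * Groups records that share the same "ekey" into one entry whose
     * "payloads" list keeps the original order of the payloads.
     */
    static List<Map<String, Object>> merge(List<Map<String, Object>> records) {
        Map<String, Map<String, Object>> byEkey = new LinkedHashMap<>();
        for (Map<String, Object> record : records) {
            String ekey = (String) record.get("ekey");
            Map<String, Object> group = byEkey.get(ekey);
            if (group == null) {
                group = new LinkedHashMap<>();
                group.put("pkv", record.get("pkv"));
                group.put("ekey", ekey);
                group.put("payloads", new ArrayList<Object>());
                byEkey.put(ekey, group);
            }
            @SuppressWarnings("unchecked")
            List<Object> payloads = (List<Object>) group.get("payloads");
            payloads.add(record.get("payload"));
        }
        return new ArrayList<>(byEkey.values());
    }
}
```

The long ekey string is then transmitted once per group rather than once per event.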

Finally, the server uses the private key corresponding to “pkv” to decrypt the “ekey” field and obtain the AES symmetric key, then uses that key to decrypt payloads and recover the original messages.

3.2.4.2 HTTPS Two-way Authentication

HTTPS is HTTP layered on top of SSL/TLS encryption, which encrypts the transmitted data. It is the secure version of HTTP.

Plain HTTP has the following security problems:

  1. Data is transmitted in plaintext and may be intercepted.
  2. Data integrity cannot be verified.
  3. The identities of the communicating parties cannot be confirmed.

Using the HTTPS protocol effectively prevents these problems:

  1. The content is encrypted with a uniquely generated encryption key.
  2. Data integrity can be verified.
  3. The identities of the communicating parties can be confirmed.

The SDK supports HTTPS network requests and ensures data security through HTTPS bidirectional authentication.
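
With HttpURLConnection, two-way authentication is typically configured by giving the connection an SSLSocketFactory that holds both the client's own certificate and the set of trusted server certificates. The sketch below uses only standard javax.net.ssl APIs. The class and method names are hypothetical, and how the two KeyStore objects are loaded (from app assets, for instance) is left out.

```java
import java.io.IOException;
import java.net.URL;
import java.security.KeyStore;
import javax.net.ssl.HttpsURLConnection;
import javax.net.ssl.KeyManagerFactory;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLSocketFactory;
import javax.net.ssl.TrustManagerFactory;

final class MutualTlsSketch {
    /**
     * clientKeys holds the client certificate and private key presented to the server.
     * trustedServers holds the certificates the client accepts; passing null
     * falls back to the platform's default trust store.
     */
    static SSLSocketFactory socketFactory(KeyStore clientKeys, char[] keyPassword,
                                          KeyStore trustedServers) throws Exception {
        KeyManagerFactory kmf =
                KeyManagerFactory.getInstance(KeyManagerFactory.getDefaultAlgorithm());
        kmf.init(clientKeys, keyPassword);

        TrustManagerFactory tmf =
                TrustManagerFactory.getInstance(TrustManagerFactory.getDefaultAlgorithm());
        tmf.init(trustedServers);

        SSLContext context = SSLContext.getInstance("TLS");
        context.init(kmf.getKeyManagers(), tmf.getTrustManagers(), null);
        return context.getSocketFactory();
    }

    /** Opens an HTTPS connection that uses the given socket factory. */
    static HttpsURLConnection open(URL url, SSLSocketFactory factory) throws IOException {
        HttpsURLConnection conn = (HttpsURLConnection) url.openConnection();
        conn.setSSLSocketFactory(factory);
        return conn;
    }
}
```

The server must also be configured to request and verify the client certificate, otherwise only the server side is authenticated.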

3.2.5 Data reporting process

During data collection, an asynchronous task is created for each event and added to a task queue. TrackTaskManagerThread schedules the order in which tasks are executed. When a task runs on the background thread, the preset properties are collected first. The preset and custom properties are then packaged into the JSON format required by Shence and stored in the database.

If the SDK is in Debug mode or the database cache exceeds its maximum limit, data is reported immediately. If the triggered event is $SignUp or the number of entries in the local cache is larger than mFlushBulkSize, data is also reported immediately. Otherwise, delayed reporting is triggered and data is reported after the interval specified by mFlushInterval. Figure 3-2 shows the reporting process.

Figure 3-2 Data reporting process

Data is not sent in the following cases:

  • mEnableNetworkRequest is set to false. You can use the enableNetworkRequest method to control whether data is reported.
  • The serverURL is empty.
  • The code is not running in the main process.
  • No network is available.
  • The reporting policy specified by the SDK is not satisfied.

If the conditions are met, the SDK reports all local data. Transferring a large amount of data in one request increases the chance of upload failure and degrades performance, so the SDK reads at most 50 records at a time. The original event data that was read is compressed with Gzip to keep the transfer small, and the compressed content is then Base64-encoded for transmission. For data integrity and security, the hashCode value of the original data is also sent to the server so that it can verify the integrity of the data.
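
The encoding step can be sketched with the JDK's own Gzip and Base64 classes. PayloadCodecSketch and its method names are hypothetical. The checksum follows the description above by hashing the original string, and the decode method is included only to show that the encoding is reversible on the receiving side.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

final class PayloadCodecSketch {
    /** Gzip-compresses the raw event JSON, then Base64-encodes the compressed bytes. */
    static String encode(String rawJson) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(buffer)) {
            gzip.write(rawJson.getBytes(StandardCharsets.UTF_8));
        }
        return Base64.getEncoder().encodeToString(buffer.toByteArray());
    }

    /** Reverses encode: Base64-decodes, then decompresses. */
    static String decode(String encoded) throws IOException {
        byte[] compressed = Base64.getDecoder().decode(encoded);
        try (GZIPInputStream gzip = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            int n;
            while ((n = gzip.read(chunk)) != -1) {
                out.write(chunk, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }

    /** Integrity value sent alongside the data: String.hashCode of the original data. */
    static int checksum(String rawJson) {
        return rawJson.hashCode();
    }
}
```

On Android, java.util.Base64 is only available from API level 26, so code targeting older versions would use android.util.Base64 instead.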

After a report attempt, the SDK uses the HTTP status code to decide whether it succeeded. If the status code is in the 2xx range (at least 200 and below 300), the SDK treats the data as reported successfully and deletes the successfully reported records from the local cache. If the request fails, the local data is not deleted. The sending loop keeps reading data from the local cache until all of it has been uploaded.
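
The loop can be sketched as follows, with an in-memory list standing in for the local database and a function standing in for the network request. SendLoopSketch and its members are hypothetical, and stopping at the first failed batch (leaving the rest for a later attempt) is an assumption.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntFunction;

final class SendLoopSketch {
    static final int BATCH_SIZE = 50;

    /** A status code of at least 200 and below 300 counts as success. */
    static boolean isSuccess(int statusCode) {
        return statusCode >= 200 && statusCode < 300;
    }

    /**
     * Uploads cached events in batches of up to 50. The sender returns the HTTP
     * status code for a batch. Returns the number of events uploaded and removed.
     */
    static int flush(List<String> cache, ToIntFunction<List<String>> sender) {
        int uploaded = 0;
        while (!cache.isEmpty()) {
            int size = Math.min(BATCH_SIZE, cache.size());
            List<String> batch = new ArrayList<>(cache.subList(0, size));
            int status = sender.applyAsInt(batch);
            if (!isSuccess(status)) {
                break; // keep the data locally so it can be sent again later
            }
            cache.subList(0, size).clear(); // delete only what was uploaded
            uploaded += size;
        }
        return uploaded;
    }
}
```

With 120 cached events and a server that always answers 200, the loop sends three batches of 50, 50, and 20 events.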

4. Data reporting and verification

While integrating the SDK, developers need to verify that it reports data to the server correctly. The SDK supports two ways of verifying reported data: Logcat console logs and real-time query in Debug mode.

4.1 Logcat Local Log Verification

After initializing the SDK, call enableLog(true) to turn on the SDK's log output. When an event is triggered, the SDK automatically collects it and periodically sends it to the Shence Analytics backend. You can view the logs in Logcat to verify the data. Filter on “SA.” in Logcat to see the logs for event collection and reporting in the following scenarios:

  • If a tracking event is triggered successfully, the event data is output, starting with “Track Event”.
  • If a tracking event fails to trigger, the cause of the error is output.
  • If event data is reported successfully, the event data is output, starting with “Valid Message”.
  • If event data fails to be reported, the event data is output, starting with “invalid Message”, together with the cause of the error.

During development, you can check whether data is reported normally by using logs.

4.2 Debug Real-time Data Query

The SDK provides a Debug mode for reporting data, which makes data verification easier during SDK integration. In Debug mode, data collected by the SDK is reported in real time. The SDK provides two modes, DEBUG_ONLY and DEBUG_AND_TRACK:

  • DEBUG_ONLY: Collected data is reported to the server but not stored in the database. You can view the reported data in the real-time Debug query, which keeps dirty data generated during testing out of the database.
  • DEBUG_AND_TRACK: Data is reported in real time and also stored in the database.

Debug real-time data query lets you check in the Shence Analytics system whether data is reported properly. To use the Debug mode, the scheme must be configured correctly as described in the documentation. On Android, a scheme is a protocol for navigating to pages inside an app: after you define your own scheme, a link can launch the app or jump to a specific page in it.

The purpose of configuring the scheme in the SDK is to launch the app by scanning a QR code and thereby enable the Debug mode. The steps are:

  1. Use the debugging device to scan the QR code on the web page and enable “Debugging mode” on the device.
  2. Click start refresh, then operate the app to trigger events.
  3. If an event is uploaded successfully, it is displayed in the real-time debugging data view.

5. Summary

This article introduced the implementation of the network module in the Shence Analytics Android SDK. The SDK's network requests are not built on an open source network framework, which avoids excessive size growth and redundant code. By wrapping the system class HttpURLConnection and adding a complete reporting policy together with data compression, encoding, and verification, the SDK achieves timely, accurate, and high-performance data reporting.

Finally, I hope this article gives you a systematic understanding of the network module in the Shence Analytics Android SDK.

Article source: Shence Technology Community