To collect statistics on user behavior and product data, apps often need to report logs, and this reporting can consume a lot of traffic. How should an app report logs?
Voice-over: Is log reporting really the bulk of the user's traffic?
Can an app gather user behavior and product data from server logs alone, without reporting anything itself? No. Some user actions, such as “card switching”, never interact with the server, so server logs cannot cover every statistic.
How does an app report logs? There are several common methods.
(1) Use a third-party tool such as Google Analytics;
Advantages: No development required
Disadvantages: Cannot produce customized statistics
(2) Define a proprietary protocol for reporting;
Advantages: Saves traffic
Disadvantages: High development cost
Voice-over: For example, a custom binary protocol over TCP saves traffic.
(3) Use the HTTP protocol and pass the data to be reported in GET parameters.
How to report through HTTP?
You can place a file on the web server. The app sends an HTTP request for that file and passes the data in GET parameters; you then analyze the access log to extract the data you want.
How is data passed through GET parameters?
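The access-log step can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes the web server writes the request line in double quotes (as common log formats do), and the sample log line, IP address, and parameter names are made up.

```python
from urllib.parse import urlsplit, parse_qs

def params_from_access_log_line(line: str, report_path: str = "/up") -> dict:
    """Extract the GET parameters of a report request from one access-log line.

    Assumes the request is logged as "GET /path?query HTTP/1.1" inside quotes.
    Returns an empty dict for lines that are not report requests.
    """
    try:
        request = line.split('"')[1]              # text between the first pair of quotes
        method, target, _version = request.split(" ")
    except (IndexError, ValueError):
        return {}
    if method != "GET":
        return {}
    parts = urlsplit(target)
    if parts.path != report_path:
        return {}
    # parse_qs returns a list per key; keep the first value of each
    return {key: values[0] for key, values in parse_qs(parts.query).items()}

# Hypothetical log line, for illustration only
sample = '1.2.3.4 - - [01/Jan/2024:10:00:00 +0800] "GET /up?city=bj&uid=123 HTTP/1.1" 200 0'
print(params_from_access_log_line(sample))
```

Because the file itself carries no logic, the server only needs to answer the request cheaply; all the statistics come from processing the log offline.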
Generally, there are two ways:
(1) Convention format method;
(2) KV method.
What is the convention format method?
Convention format method: agree in advance on the delimiter, the placeholders, and the meaning of each field, for example:
daojia.com/up?[bj][201…
The convention is as follows:
(1) The file accessed is up;
(2) The delimiter is [];
(3) The first field, [bj], is the city; the second is the date; the third is the time; the fourth is the user ID; and the fifth is the behavior.
The disadvantage of this method is poor extensibility. Because the meaning of each field is fixed by its position, a field with no value must still keep an empty placeholder in its position, and new statistics can only be added by appending more [] fields at the end of the GET parameters.
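To make the placeholder problem concrete, here is a small sketch of encoding and decoding the positional format. The field names and sample values are illustrative assumptions, and in a real URL the brackets may need percent-encoding.

```python
import re

# Field order agreed in advance; the names are illustrative
FIELDS = ["city", "date", "time", "user_id", "action"]

def build_positional(record: dict) -> str:
    """Encode a record as [v1][v2]...; a missing field still emits an empty []."""
    return "".join(f"[{record.get(name, '')}]" for name in FIELDS)

def parse_positional(query: str) -> dict:
    """Decode [v1][v2]... back into named fields, purely by position."""
    values = re.findall(r"\[([^\]]*)\]", query)
    return dict(zip(FIELDS, values))

# Hypothetical record with no date or time: the placeholders must stay
encoded = build_positional({"city": "bj", "user_id": "123", "action": "login"})
print(encoded)
print(parse_positional(encoded))
```

If the empty placeholders were dropped, the user ID would be read as the date, which is why every position has to be sent even when it carries nothing.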
What is the KV method?
KV method: data is reported as key-value pairs, so the GET parameters are self-describing.
Reported with the KV method, the example above takes the form:
daojia.com/up?city=bj&…
Advantages: Good extensibility.
Disadvantages: More data is reported, which consumes traffic.
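A minimal sketch of the KV form, using Python's standard urllib.parse. The record contents are made-up values; only the city=bj pair comes from the example above.

```python
from urllib.parse import urlencode, parse_qsl

# Hypothetical record; fields with no value are simply left out
record = {"city": "bj", "uid": "123", "action": "login"}

query = urlencode(record)                  # "key=value" pairs joined with "&"
url = "http://daojia.com/up?" + query
print(url)

# The receiver needs no positional convention: each value names itself
decoded = dict(parse_qsl(query))
print(decoded)

# Adding a new statistic later does not disturb existing fields
extended = urlencode({**record, "version": "1.2"})
print(extended)
```

The cost of this flexibility is visible in the output: every report repeats every key name.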
Why does it consume so much traffic?
The main reasons for traffic consumption are as follows:
(1) Invalid traffic: HTTP packets carry a lot of data that is useless for statistics;
(2) URL redundancy: the URL must be sent with every report;
(3) Key redundancy: the keys must be sent with every report;
(4) High reporting frequency: if every user operation triggers a report, the number of reports becomes very large.
Is there a way to save traffic?
For each of the four points above, there is a common optimization.
Pain point 1: HTTP requests carry a lot of invalid data.
Solution: Construct the HTTP request manually and strip as much unneeded data from it as possible.
Voice-over:
If you use a third-party library to construct HTTP requests, it may attach data you do not need, such as the UA (User-Agent).
For example, the request line itself can be as short as: GET /up HTTP/1.1
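One way to construct the request by hand is to write the bytes directly to a socket. The sketch below is an assumption about how this could look, not the author's code. It keeps only the request line and the Host header (which HTTP/1.1 requires), plus Connection: close so the server ends the exchange. The host and path are placeholders.

```python
import socket

def build_minimal_get(host: str, path_and_query: str) -> bytes:
    """Build a bare HTTP/1.1 GET with no User-Agent, Accept, or cookie headers."""
    return (
        f"GET {path_and_query} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Connection: close\r\n"
        "\r\n"
    ).encode("ascii")

def send_report(host: str, path_and_query: str, port: int = 80, timeout: float = 5.0) -> bytes:
    """Send the minimal request and return the first bytes of the response."""
    request = build_minimal_get(host, path_and_query)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        return sock.recv(1024)

# Inspect the bytes that would go on the wire (nothing is sent here)
request_bytes = build_minimal_get("s.example.com", "/a?c=bj")
print(len(request_bytes), request_bytes)
```

Comparing the length printed here with what an HTTP client library sends by default gives a rough idea of the per-report overhead being removed.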
Pain point 2: URL redundancy.
Solution: Use the shortest possible domain name to receive reported logs.
Voice-over: for example, s.daojia.cn/a
Pain point 3: KEY redundancy.
Solution: Use the shortest possible key for each field. Whoever owns log collection must standardize the keys.
Voice-over: For example, city=bj can be optimized to c=bj
A bad case: for lack of a standard, one department instrumented the same user information separately in different projects, so it was reported four times under different keys:
name=shenjian&user_id=123&uid=123&user_name=shenjian
Here name, user_id, uid, and user_name all report the same user.
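A standardized key registry can be enforced in code so that duplicates like these are caught before they ship. The sketch below is one possible approach; the registry contents and short key names are hypothetical, apart from city being shortened to c as in the example above.

```python
from urllib.parse import urlencode

# Hypothetical registry: one canonical long name per field, one short key each
SHORT_KEYS = {"city": "c", "user_id": "u", "action": "a", "date": "d"}

def shorten(record: dict) -> dict:
    """Map canonical field names to short keys; reject names outside the registry."""
    unknown = set(record) - set(SHORT_KEYS)
    if unknown:
        raise KeyError(f"keys not in the registry: {sorted(unknown)}")
    return {SHORT_KEYS[name]: value for name, value in record.items()}

record = {"city": "bj", "user_id": "123", "action": "login"}
long_query = urlencode(record)
short_query = urlencode(shorten(record))
print(len(long_query), long_query)
print(len(short_query), short_query)
```

With such a check, a project that tries to report uid alongside user_id fails immediately instead of silently sending the same value twice.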
Pain point 4: High reporting frequency.
Solution: First save the data in the app's local storage, then report it periodically. This optimization is particularly effective for PV, SUM and AVG statistics.
For example, to count clicks on the login button, the traditional approach might report three times:
daojia.com/up?date=201…
daojia.com/up?date=201…
daojia.com/up?date=201…
After optimization, one extra parameter is added (for instance, a count), and only a single report is needed:
daojia.com/up?date=201…
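The merging step can be sketched as a local counter that is flushed into one report per distinct event. The parameter names (date, action, count) and the date value are assumptions for illustration, since the URLs above are truncated.

```python
from collections import Counter
from urllib.parse import urlencode

class ClickAggregator:
    """Count identical events locally and emit one query string per distinct event."""

    def __init__(self):
        self._counts = Counter()

    def record(self, date: str, action: str) -> None:
        self._counts[(date, action)] += 1

    def flush(self) -> list:
        """Return the merged reports and reset the local counters."""
        queries = [
            urlencode({"date": date, "action": action, "count": count})
            for (date, action), count in self._counts.items()
        ]
        self._counts.clear()
        return queries

agg = ClickAggregator()
for _ in range(3):                      # three clicks on the login button
    agg.record("2024-01-01", "login")   # hypothetical date value
reports = agg.flush()
print(reports)                          # one report instead of three
```

The same idea extends to SUM and AVG: keep a running sum and count locally and report both, so the server can still compute the average.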
If logs are not reported in real time, when should they be reported?
Merging or batching reports may affect the timeliness of the data.
Voice-over: If the strategy is right, the data error is very small.
Common strategies are:
(1) Report at special points in time: for example, when the app is opened, closed, or moved to the background;
(2) Batch report by time: for example, report every 10 minutes;
(3) Batch report by data volume: for example, report once for every 10 records collected.
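The three triggers can be combined in one small buffer. This is a sketch under stated assumptions: the send callback, class name, and thresholds are hypothetical, and the time check only runs when a record is added, so a real app would likely pair it with a timer.

```python
import time

class BatchReporter:
    """Buffer records and flush by count, by age, or on demand."""

    def __init__(self, send, max_records=10, max_age_seconds=600.0, clock=time.monotonic):
        self._send = send                  # callback that uploads a list of records
        self._max_records = max_records    # strategy (3): batch by data volume
        self._max_age = max_age_seconds    # strategy (2): batch by time
        self._clock = clock
        self._buffer = []
        self._last_flush = clock()

    def add(self, record) -> None:
        self._buffer.append(record)
        too_many = len(self._buffer) >= self._max_records
        too_old = self._clock() - self._last_flush >= self._max_age
        if too_many or too_old:
            self.flush()

    def flush(self) -> None:
        """Strategy (1): also call this when the app opens, closes, or goes to the background."""
        if self._buffer:
            self._send(list(self._buffer))
            self._buffer.clear()
        self._last_flush = self._clock()

sent = []
reporter = BatchReporter(sent.append, max_records=3)
for name in ["a", "b", "c", "d"]:
    reporter.add(name)
print(sent)          # the first three records went out as one batch
reporter.flush()     # e.g. the app is being closed
print(sent)
```

Injecting the clock keeps the time-based rule testable without waiting ten minutes.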
What other optimizations are there? Batch reporting and data compression.
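As a rough illustration of combining batching with compression, the sketch below packs a batch as compact JSON and compresses it with zlib. The record shape is made up, and sending a compressed body implies a POST-style upload rather than GET parameters, which is an assumption beyond what the article describes.

```python
import json
import zlib

def pack(records: list) -> bytes:
    """Serialize a batch as compact JSON and compress it."""
    raw = json.dumps(records, separators=(",", ":")).encode("utf-8")
    return zlib.compress(raw, 9)

def unpack(blob: bytes) -> list:
    """Reverse of pack(), as the receiving server would do."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))

# Hypothetical batch of similar records; repetition is what compresses well
batch = [{"c": "bj", "a": "login", "n": i} for i in range(50)]
raw_size = len(json.dumps(batch, separators=(",", ":")).encode("utf-8"))
packed = pack(batch)
print(raw_size, len(packed))
```

Compression pays off mainly on batches: a single short report has too little repetition to shrink, which is one more reason to batch first.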
Hopefully, the logic of the article is clear.
The Architect’s Path – Share easy-to-understand technical articles
Survey:
Do you have a standard for tracking points (instrumentation)?
Have different projects ever instrumented the same tracking point more than once?