Introduction: What is remote logging? What does it do, and how is it implemented internally? This article covers the evolution and outlook of remote logging from three angles: functionality, architecture, and experience optimization.
Aliyun Cloud Native Application R&D Platform EMAS Zhang Yue (here)
Preface
After an app is released, developers can only observe its running state through business and stability monitoring. These monitoring platforms report online issues (crash stacks, abnormal network requests, and so on) and business data to the server, then present aggregated metrics and other BI data to users. This process tends to lose detail and does not directly reveal the cause of a problem, which makes online issues difficult to troubleshoot.
You might reasonably ask: why not just report all logs to the backend? Doing so would spend a lot of the app's network traffic on data that is mostly never read, and it would put considerable pressure on network and storage. Aliyun remote logging (www.aliyun.com/product/ema.) is built for this scenario: logs are stored locally in the app and pulled only when needed. At the expense of real-time delivery, this addresses the storage cost of logs while still avoiding the opposite problem, where logs are never reported and issues cannot be tracked down.
What exactly does remote logging do here, and how is it implemented internally? I will walk through the evolution of remote logging in terms of functionality, architecture, and experience optimization, and finish with what we are looking forward to.
Function
As described in the preface, logs are saved locally in the app and pulled back for analysis and review when needed. The whole process can be broken down into the following steps:
Mobile features
- C-level implementation: better performance, and a single codebase supports multiple platforms
- Encrypted storage: Asymmetric encryption is used to secure log storage and reporting
- Log rotation: Logs can be stored for a maximum of 7 days, with a maximum of 10 MB per day
- mmap mechanism: Avoids losing buffered logs and improves performance. (Note: mmap maps a file directly into the process's address space, avoiding the extra copy between a user-space buffer and the page cache. For details, go to www.cnblogs.com/huxiao-tee/…)
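The log rotation limits in the list above can be sketched as a small retention policy. This is only an illustration in Python, not the SDK's C implementation: the function names are hypothetical, the 7-day window is assumed to include today, 10 MB is assumed to mean 10 * 1024 * 1024 bytes, and what the SDK actually does once the daily cap is reached is not stated in the article.

```python
from datetime import date, timedelta

MAX_DAYS = 7                          # from the article: keep at most 7 days of logs
MAX_BYTES_PER_DAY = 10 * 1024 * 1024  # from the article: at most 10 MB per day (MiB assumed)


def days_to_delete(log_days, today, max_days=MAX_DAYS):
    """Return the days whose log files fall outside the retention window.

    Assumption: the window covers today plus the previous max_days - 1 days.
    """
    oldest_kept = today - timedelta(days=max_days - 1)
    return sorted(d for d in log_days if d < oldest_kept)


def can_append(current_size, record_size, max_bytes=MAX_BYTES_PER_DAY):
    """Return True if one more record still fits into today's log file."""
    return current_size + record_size <= max_bytes


if __name__ == "__main__":
    today = date(2024, 1, 10)
    stored = [date(2024, 1, d) for d in range(1, 11)]
    print(days_to_delete(stored, today))      # the three oldest days
    print(can_append(MAX_BYTES_PER_DAY, 1))   # False: today's file is full
```

Keeping the policy as pure functions over dates and sizes makes it easy to check independently of the file system code that would apply it.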
Backend features
Remote logs are pulled back from mobile devices asynchronously for analysis. Depending on the usage scenario, users weigh device precision, pull timeliness, and pull success rate differently, so we have developed different approaches to pull modes and product linkage.
Pull mode
Precise pull: Specify the device list
For example, you arrive at work on a sunny morning to find your boss frowning because the app crashed on his phone last night. Your options are to fix the problem or pack up and leave. This is where remote logging comes in: you enter the boss's device ID, pull the logs, inspect the app log from his device to locate the cause, and fix the problem.
In this mode, users face two experience problems:
- Slow pulls. After a task is issued, the end user has to open the app again before the logs are uploaded, and when that happens is outside our control.
- Low pull success rate. After a pull task is issued, some devices are inactive and never receive the pull command, so their logs cannot be uploaded.
To solve these problems, we optimized pulling and implemented intelligent filtering.
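The two problems of precise pull can be made concrete with a toy model of the task flow. The sketch below assumes the device picks up its pending task the next time the app starts; the article does not say whether delivery is a poll or a push, and the `PullTaskBoard` class and its methods are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class PullTaskBoard:
    """Hypothetical in-memory stand-in for the server side of precise pull."""

    targets: dict = field(default_factory=dict)   # task_id -> set of targeted device IDs
    uploaded: dict = field(default_factory=dict)  # task_id -> set of devices that uploaded
    pending: dict = field(default_factory=dict)   # device_id -> task_id waiting for it

    def create_task(self, task_id, device_ids):
        """Register a pull task for an explicit device list."""
        self.targets[task_id] = set(device_ids)
        self.uploaded[task_id] = set()
        for device_id in device_ids:
            self.pending[device_id] = task_id

    def on_app_start(self, device_id):
        """Called when a device opens the app; returns the task it served, if any."""
        task_id = self.pending.pop(device_id, None)
        if task_id is not None:
            # Stands in for the real log upload performed by the device.
            self.uploaded[task_id].add(device_id)
        return task_id

    def success_rate(self, task_id):
        """Fraction of targeted devices that have uploaded their logs so far."""
        targets = self.targets[task_id]
        return len(self.uploaded[task_id]) / len(targets) if targets else 0.0


if __name__ == "__main__":
    board = PullTaskBoard()
    board.create_task("t1", ["boss-phone", "idle-phone"])
    board.on_app_start("boss-phone")  # the boss opens the app again
    print(board.success_rate("t1"))   # idle-phone never came back
```

In this model the task only completes when each targeted device happens to come online, which is why both speed and success rate depend on user behavior rather than on the server.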
Intelligent filter: Specify filter criteria
In this mode the user does not specify a target device list but instead cares about detailed logs from devices that match one or more dimensions. The user sets a combination of pull conditions, and the system selects the devices automatically.
To improve pull speed and success rate, remote logging applies the following policies when selecting devices:
- Automatically prefers the most recently started devices when pulling.
- Automatically adjusts the pool of selected devices, expanding the pull scale up to 8 times the original size.
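The two selection policies above can be sketched as follows. This is a minimal illustration, not the product's algorithm: the `Device` fields, the `select_devices` function, and the idea that the caller raises an `expansion` factor are all assumptions. Only the preference for recently started devices and the 8x upper bound come from the article.

```python
from dataclasses import dataclass

MAX_EXPANSION = 8  # from the article: the pool grows to at most 8x the original size


@dataclass(frozen=True)
class Device:
    """Hypothetical device record; field names are illustrative."""

    device_id: str
    app_version: str
    os_version: str
    last_start: int  # Unix timestamp of the most recent app start


def select_devices(devices, wanted, expansion=1, **criteria):
    """Pick devices matching all criteria, most recently started first.

    `expansion` scales the pool beyond `wanted`, capped at MAX_EXPANSION.
    """
    if not 1 <= expansion <= MAX_EXPANSION:
        raise ValueError(f"expansion must be between 1 and {MAX_EXPANSION}")
    matches = [
        d for d in devices
        if all(getattr(d, key) == value for key, value in criteria.items())
    ]
    # Recently started devices are the most likely to come online again soon.
    matches.sort(key=lambda d: d.last_start, reverse=True)
    return matches[: wanted * expansion]


if __name__ == "__main__":
    fleet = [
        Device(f"d{i}", "2.0" if i % 2 == 0 else "1.9", "14", last_start=i)
        for i in range(40)
    ]
    first_try = select_devices(fleet, wanted=2, app_version="2.0")
    print([d.device_id for d in first_try])
    widened = select_devices(fleet, wanted=2, expansion=8, app_version="2.0")
    print(len(widened))
```

A caller could start with `expansion=1` and widen the pool only when too few of the selected devices upload their logs; when and how the real system widens it is not described in the article.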
Crash data linkage
When a problem occurs on the device side, it is usually a crash or some other abnormal behavior. To support this scenario, we link remote logging with crash analysis data, which breaks down the data silo between the Crash Analysis and Remote Logging products and opens up more options for troubleshooting.
Data linkage: Crash device list
Crash Analysis provides the list of devices that crashed, so remote logging can directly determine which devices to pull from, saving users the effort. From the Crash Analysis list page in EMAS, you can click through to the log pull page.
This greatly simplifies selecting the device list when troubleshooting crash-related problems, but you still need to create a log pull task for each crash issue. The pull itself also takes time, and that delay not only interrupts an engineer's train of thought but also dampens the motivation to investigate. Can we skip the pull step and simply have the device logs ready for each crash issue? Smart pull is designed to solve exactly this.
Smart pull: Pull ahead of time
We took the data linkage a step further. For first-occurrence crashes and top crashes, we create pull tasks at 7 o'clock every day. By the time developers get to work, the logs have generally already been pulled, which greatly improves their troubleshooting efficiency.
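The selection step of smart pull might look like the sketch below. It assumes a hypothetical `CrashIssue` record and a `top_n` cutoff; the article does not say how many top crashes are covered or how a first occurrence is detected. A scheduler (for example a daily job at 7 o'clock) would call `plan_smart_pull` and create one pull task per returned issue; that scheduling part is not shown.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CrashIssue:
    """Hypothetical crash issue as provided by crash analysis."""

    issue_id: str
    crash_count: int
    first_seen_today: bool
    device_ids: tuple  # devices on which this crash occurred


def plan_smart_pull(issues, top_n=3):
    """Return {issue_id: device_ids} for top crashes plus first-occurrence crashes.

    top_n is an illustrative assumption, not a value from the article.
    """
    top = sorted(issues, key=lambda issue: issue.crash_count, reverse=True)[:top_n]
    first_seen = [issue for issue in issues if issue.first_seen_today]
    plan = {}
    for issue in top + first_seen:
        # setdefault keeps an issue from being planned twice when it is both
        # a top crash and a first occurrence.
        plan.setdefault(issue.issue_id, issue.device_ids)
    return plan


if __name__ == "__main__":
    issues = [
        CrashIssue("npe", 120, False, ("a", "b")),
        CrashIssue("oom", 80, False, ("c",)),
        CrashIssue("anr", 5, False, ("d",)),
        CrashIssue("new-bug", 1, True, ("e",)),
    ]
    for issue_id, devices in plan_smart_pull(issues, top_n=2).items():
        print(issue_id, devices)
```

Because the device list comes straight from the crash data, each planned task is effectively a precise pull that was created before anyone asked for it.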
Architecture
Experience optimization
In addition to polishing the internals, we have made a number of improvements to the product's user experience, which I would like to share with you.
- Task notifications, so that you know as soon as logs are reported
- The task details and log details pages are integrated to facilitate viewing task logs
- Improved task, device, and log query filtering
- The task timeline is visible, making the task life cycle easy to follow
- The top menu bar is now a sidebar, unifying the console style with EMAS
- Pull tasks are persisted for better fault tolerance
That concludes our current overview of remote logging’s functionality, architecture, and experience, and brings us to our plans for the future.
Looking ahead
More reporting modes
At present, all tasks are issued by the server, and this “passive reporting” mode, in which the client reports only after receiving a task from the server, has certain limitations. Next, we hope to support “active reporting” of logs by the client in some cases, such as repeated crashes or when users submit feedback in the app.
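A client-side trigger for active reporting could be as simple as the sketch below. The article only names this as a future direction, so everything here is an assumption: the function name, the threshold of 3 crashes, and the 10-minute window are illustrative values, not product behavior.

```python
def should_report_actively(crash_times, now, threshold=3, window_seconds=600,
                           user_feedback=False):
    """Decide whether the client should upload logs without a server task.

    crash_times: Unix timestamps of recent crashes recorded on the device.
    threshold and window_seconds are hypothetical tuning values.
    """
    if user_feedback:
        # The user reported a problem in the app, so upload right away.
        return True
    recent = [t for t in crash_times if 0 <= now - t <= window_seconds]
    return len(recent) >= threshold


if __name__ == "__main__":
    print(should_report_actively([100, 200, 300], now=400))   # three crashes in the window
    print(should_report_actively([100, 200, 300], now=2000))  # crashes are too old
```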
Enrich collected data
Currently our logging is limited to logs written explicitly by the user. We also need to support more codeless instrumentation that records user operation paths and network I/O, enriching the log data so that the user's operation flow and changes in device state can be reconstructed from the logs.
More product linkage
Crash analysis is the first step in linking our products. Beyond crashes and exceptions, app developers pursuing quality will pay more and more attention to performance problems; after all, “functionality decides the present, performance decides the future”. Going forward, we will explore how to connect performance analysis with remote logging to serve our customers better.
Mobile development platform EMAS
Alibaba’s application development platform EMAS is a leading cloud native application development platform in China (for mobile apps, H5 applications, mini programs, web applications, and more). It builds on a wide range of cloud native technologies (Backend as a Service, Serverless, DevOps, low code, and others) and is committed to providing one-stop application R&D management services for enterprises and developers, covering the whole application life cycle: development, testing, operations and maintenance, and business operations. You are welcome to try it: cn.aliyun.com/product/ema…