Abstract:

Background

A CDN (content delivery network) is a key piece of Internet infrastructure that lets users quickly access images, videos, and other resources on the network. Serving these requests generates a large amount of log data, and as the network environment grows more complex and services grow rapidly, that log data becomes larger and more multidimensional. This data often informs the user’s next business decision.

In our conversations with CDN users, we find that they usually face the following difficulties:

• No direct access to data: CDN access logs are generated by the CDN vendors and cannot be obtained directly by users. At present, most CDN vendors only offer offline log download, and it takes tens of minutes to several hours before log data is generated and can be downloaded. Such a long delay greatly reduces the value of the logs for real-time stream processing, alerting, and other scenarios with strict real-time requirements.

• Multiple analysis requirements: To meet various customized analysis requirements, users usually have to build and maintain a set of open source systems, such as Kafka as the data channel, Storm or Flink for stream processing, and Spark or Hadoop for data analysis.

• Visualization requirements: To display the final analysis results, users rely on databases (for small result sets) or HBase (for large result sets) to store the results, and then connect them to various visualization tools.

In short, demand for more real-time and more detailed log analysis keeps emerging, but real-time and offline analysis of CDN logs is not easy for the average user. It carries the cost of building, operating, and managing the infrastructure, it sometimes requires writing code, and in the end it does not necessarily produce good results. The whole CDN real-time log pipeline involves many stages and has strict quality-of-service requirements, which poses great technical challenges. Is there a better solution?

The one-stop CDN real-time log solution is now online

Recently, Aliyun CDN launched the real-time log function, which delivers the logs collected by CDN to Log Service (SLS) in less than 60 seconds for real-time, interactive analysis and report presentation. With real-time analysis of CDN logs, problems can be quickly discovered and located, and log data can be mined to support data-driven decisions and push the business to a new height. Click the CDN Real-time log page to learn more about the function.

CDN Real-time log service and log download

CDN real-time log data is collected in real time, and the average log delay is less than 30 seconds. CDN is also integrated with the analysis capabilities of Log Service and provides four customized analysis reports, so customers can quickly analyze logs, find problems, and make timely decisions. By contrast, the offline log download that CDN provides only covers log data generated at least four hours earlier.

The simplified data flow in CDN real-time log system is as follows:

• Real-time data collection: Live streaming and playback generate a large number of logs, which must be collected into the log center in real time, with a delay on the order of seconds.

• Data cleaning: After logs are collected, data is cleaned to meet processing requirements in different scenarios (for example, customized analysis of logs of different domain names).

• Data processing and storage: The data processing and storage methods vary according to application scenarios.

  1. Real-time processing: multi-dimensional aggregation and statistical analysis of massive data with second-level latency.
  2. Table storage: stores monitoring metrics collected in real time.
  3. Object storage: logs are packaged and compressed for offline download.
  4. Data warehouse: offline data analysis, user behavior analysis, and property reports.
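
The cleaning step above can be illustrated with a minimal sketch. It assumes a hypothetical space-separated log line (timestamp, domain name, client IP, HTTP status, bytes sent); the actual CDN log format is not described in this article, and the function names `clean` and `route_by_domain` are illustrative only. The sketch drops malformed lines and groups the cleaned records by domain name so that each domain can get its own customized analysis.

```python
from collections import defaultdict

# Hypothetical raw log lines (assumed format, not the real CDN log format):
# "<unix_time> <domain> <client_ip> <http_status> <bytes_sent>"
RAW_LINES = [
    "1541000000 a.example.com 203.0.113.5 200 1024",
    "1541000001 b.example.com 203.0.113.9 404 0",
    "malformed line",
    "1541000002 A.example.com 203.0.113.5 200 2048",
]


def clean(line):
    """Parse one raw line into a dict, or return None if it is malformed."""
    parts = line.split()
    if len(parts) != 5:
        return None
    ts, domain, ip, status, size = parts
    try:
        return {
            "time": int(ts),
            "domain": domain.lower(),  # normalize case so domains group together
            "client_ip": ip,
            "status": int(status),
            "bytes": int(size),
        }
    except ValueError:
        # A numeric field did not parse; treat the whole line as malformed.
        return None


def route_by_domain(lines):
    """Drop malformed lines and group the cleaned records by domain name."""
    grouped = defaultdict(list)
    for line in lines:
        record = clean(line)
        if record is not None:
            grouped[record["domain"]].append(record)
    return dict(grouped)


ROUTED = route_by_domain(RAW_LINES)
```

In a real pipeline the grouped records would then be handed to the processing and storage stages listed above rather than kept in memory.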

Value of CDN real-time logs

1. Real time

In the traditional log analysis mode, you need to download logs and upload them to a data warehouse. Analysis can only start after a series of cleaning and data model definition steps in the data warehouse. This process requires a lot of maintenance and takes a long time.

CDN real-time logs are collected in real time from tens of thousands of nodes in multiple regions around the world. The delay usually does not exceed 60 seconds, because a longer delay would greatly reduce the real-time value of the logs. Once the service is enabled, CDN automatically delivers log data to Log Service (SLS), which eliminates the tedious traditional log analysis process and lets you view analysis results in real time.

2. No need to write code, no need for operation and maintenance

As mentioned above, building a logging system yourself to meet customized business needs carries relatively high development, operation, maintenance, and management costs. Using the CDN real-time log system lets developers return their focus to business innovation and performance and reduces unnecessary investment.

3. Multi-dimensional SQL analysis of 1 billion+ records within seconds

The CDN real-time log system supports uninterrupted 24/7 collection of hundreds of billions to trillions of logs per day, real-time multi-dimensional analysis of massive logs, and a stream computing system that responds at the millisecond level. This frees users from the miscellaneous “trivia” of log analysis, so they can focus on data “analysis” that is closer to the business and more valuable.

In addition, real-time logs can cope with scenarios that involve many combined dimensions, high computational complexity, and various traffic peaks. The object storage system (OSS) that stores logs for users to download provides high-throughput download capability, and complex analysis scenarios can be supported by the data warehouse system.

4. Data visualization and big data mining

The presentation of the final analysis results is also very important. CDN real-time logs provide users with business-oriented visual reports, so users can easily keep track of business health, cache hit ratio, average download speed, traffic status, network speed, carrier, latency distribution, and so on.

5. One-stop solution for log, monitoring, and alarm linkage

CDN scenarios place demanding requirements on service availability and performance, and they need real-time, accurate alerts for all kinds of anomalies, which in turn requires reliable monitoring and alerting systems. In the future, the CDN log system will be linked with monitoring, alerting, and handling mechanisms to automatically resolve routine problems, shorten service failures, and avoid losses for users.

Typical application scenarios

1. Live streaming

In the live streaming scenario, once CDN logs are delivered to Log Service in real time, several typical real-time analyses become possible.

Stream push data is very important. With stream push logs, you can keep track of the real-time state of each stream:

• Stream push overview: know in real time the current number of pushed streams, the traffic and speed of each stream, and related statistics.

• Stream push quality: multi-dimensional quality statistics, for example by province and carrier, and real-time quality monitoring of key streams.

• Error source tracking: quickly locate the source of an error (live source, server, client, or carrier).

The following figure shows monitoring statistics for stream pushes. In terms of overall quality, more than 99% of stream pushes are normal, which indicates that stream push quality is very good.

The following table shows the causes of various types of errors. The largest source of errors is clients actively disconnecting.

2. CDN downlink

The playback side (CDN downlink) is what users touch directly, and its quality directly determines users’ viewing experience. Downlink logs can also be analyzed along multiple dimensions:

• Overall quality:

  • Health: how many of all requests are successful.
  • Cache hit ratio: the higher the hit ratio, the lower the access latency.
  • Download speed: another important factor in playback quality.

• Multi-dimensional analysis:

  • Top domain names by access count and traffic: access quality of key domain names.
  • Region and carrier statistics: quality of each link.
  • Download speed and latency: multiple key metrics.

• Error diagnosis:

  • Real-time error QPS and ratio: the overall error situation.
  • Top error domain names and URIs: errors related to the service itself.
  • Top error regions and carriers: errors related to external factors.
  • Error clients: errors caused by a new release.

In the figure below, you can see that most of the errors come from a single client version.
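
As a rough illustration of the overall quality metrics and the error-client breakdown described above, here is a minimal sketch over a handful of made-up downlink records. The field names (`status`, `cache_hit`, `bytes`, `seconds`, `client`) and the rule that any status of 400 or above counts as a failure are assumptions for the example, not the actual log schema or the way Log Service computes its reports.

```python
from collections import Counter

# Hypothetical downlink log records; field names are illustrative only.
RECORDS = [
    {"status": 200, "cache_hit": True, "bytes": 4_000_000, "seconds": 2.0, "client": "app/2.1"},
    {"status": 200, "cache_hit": True, "bytes": 1_000_000, "seconds": 1.0, "client": "app/2.1"},
    {"status": 502, "cache_hit": False, "bytes": 0, "seconds": 0.5, "client": "app/2.2"},
    {"status": 200, "cache_hit": False, "bytes": 3_000_000, "seconds": 3.0, "client": "app/2.2"},
]


def overall_quality(records):
    """Return health, cache hit ratio, and average download speed for a batch."""
    total = len(records)
    if total == 0:
        raise ValueError("no records to analyze")
    # Assumption: any 4xx/5xx response counts as a failed request.
    ok = [r for r in records if r["status"] < 400]
    hits = sum(1 for r in records if r["cache_hit"])
    sent = sum(r["bytes"] for r in ok)
    spent = sum(r["seconds"] for r in ok)
    return {
        "health": len(ok) / total,
        "cache_hit_ratio": hits / total,
        "avg_bytes_per_second": sent / spent if spent else 0.0,
    }


def errors_by_client(records):
    """Count failed requests per client version."""
    return Counter(r["client"] for r in records if r["status"] >= 400)


QUALITY = overall_quality(RECORDS)
ERROR_CLIENTS = errors_by_client(RECORDS)
```

With this sample data, three of the four requests succeed and two of the four hit the cache, and the only failed request comes from the hypothetical client version `app/2.2`.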

3. User behavior analysis

Users’ access behavior ultimately shows up in the logs. By analyzing the logs, you can learn how users visit and which resources are popular. By analyzing where users come from, you can understand your user sources more clearly and make later operations more targeted. In addition, monitoring for abnormal IP addresses helps you detect anomalies earlier, for example by checking whether a high-frequency IP address is suspected of crawling data.
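
The two checks mentioned above, finding popular resources and flagging high-frequency IP addresses, can be sketched as simple counts. This is only an illustration: the record fields (`client_ip`, `uri`), the sample data, and the fixed request threshold are assumptions, and a real detector would also need a time window.

```python
from collections import Counter

# Hypothetical access records; field names and values are illustrative only.
ACCESS = (
    [{"client_ip": "198.51.100.7", "uri": "/video/a.mp4"}] * 5
    + [{"client_ip": "203.0.113.5", "uri": "/img/logo.png"}] * 2
    + [{"client_ip": "203.0.113.9", "uri": "/video/a.mp4"}]
)


def hot_resources(records, top_n):
    """Return the top_n most requested URIs with their request counts."""
    return Counter(r["uri"] for r in records).most_common(top_n)


def high_frequency_ips(records, threshold):
    """Return client IPs with at least `threshold` requests, sorted by address."""
    counts = Counter(r["client_ip"] for r in records)
    return sorted(ip for ip, n in counts.items() if n >= threshold)


HOT = hot_resources(ACCESS, 1)
SUSPECTS = high_frequency_ips(ACCESS, 5)
```

An IP address that appears in `SUSPECTS` is only a candidate for further inspection, not proof of crawling.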

Demo

When the system raises an alert or users complain, the troubleshooting process is usually similar:

  • Overview: Is the overall access normal?
  • Narrow down: Is the error localized? To which domain name, which region, or just a single user?
  • Pinpoint: After narrowing the scope of the investigation, compare the affected data against earlier comparable periods, inspect more detailed logs, and run ad hoc queries along multiple dimensions.

The whole analysis proceeds from top to bottom and from the broad picture down to a single point. It is interactive and involves operations such as drill down and roll up, so the system needs to be flexible and convenient. The following video shows how to interactively analyze CDN logs in Log Service.

In addition, we provide a demo for a hands-on look at mock CDN log analysis: Demo link
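
The overview, narrow down, and pinpoint flow described above can be sketched in a few lines. This is an illustration over made-up records, not the Log Service query interface: the field names, the sample data, the helper names `error_rate_by` and `drill_down`, and the rule that a status of 500 or above counts as an error are all assumptions.

```python
from collections import defaultdict

# Hypothetical log records; field names and values are illustrative only.
LOGS = [
    {"domain": "a.example.com", "region": "east", "status": 200},
    {"domain": "a.example.com", "region": "east", "status": 200},
    {"domain": "a.example.com", "region": "west", "status": 502},
    {"domain": "a.example.com", "region": "west", "status": 503},
    {"domain": "b.example.com", "region": "east", "status": 200},
    {"domain": "b.example.com", "region": "west", "status": 200},
]


def error_rate_by(records, dimension):
    """Roll up: error rate per value of one dimension, e.g. 'domain' or 'region'."""
    totals, errors = defaultdict(int), defaultdict(int)
    for r in records:
        key = r[dimension]
        totals[key] += 1
        if r["status"] >= 500:  # assumption: 5xx responses are the errors of interest
            errors[key] += 1
    return {key: errors[key] / totals[key] for key in totals}


def drill_down(records, **filters):
    """Keep only the records that match every filter, e.g. domain='a.example.com'."""
    return [r for r in records if all(r[k] == v for k, v in filters.items())]


# Overview: which domain has the highest error rate?
BY_DOMAIN = error_rate_by(LOGS, "domain")
WORST_DOMAIN = max(BY_DOMAIN, key=BY_DOMAIN.get)

# Narrow down: within that domain, which region is affected?
BY_REGION = error_rate_by(drill_down(LOGS, domain=WORST_DOMAIN), "region")
```

Each step reuses the same two helpers with a different dimension or filter, which is the kind of flexibility an interactive drill down and roll up analysis relies on.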

Access process

The real-time log function is now available in the CDN console, and users can enable it quickly with a few simple operations. The main steps are as follows:

  1. Log in to the CDN console.
  2. On the left navigation bar, click Log.
  3. On the Log page, click Real-time Log Push.
  4. Click to create a log service.
  5. Configure Project, Logstore, and region, and click Next.
  6. Select the associated domain name and bind it, then click Create.

Billing method and promotions

In general, real-time logs are billed at 0.06 yuan per 10,000 log entries, based on the number of entries pushed successfully, and this price includes the cost of Log Service analysis. Therefore, up to a certain usage limit, you do not have to pay any separate Log Service fees.

However, you still need to pay Log Service fees in the following cases:

  1. Log storage for more than 7 days, which is charged by Log Service.
  2. Internet (public network) read and write traffic of Log Service.

For details about logging service charges, see Price details.
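
To make the pricing concrete, here is a small arithmetic sketch. It uses the 0.06 yuan per 10,000 entries price stated above and assumes the fee is simply proportional to the number of successfully pushed entries; any rounding rules, free quota, or discounts are not modeled, and the Log Service charges for storage beyond 7 days and for Internet traffic are excluded.

```python
from decimal import Decimal

# Price stated in the text: 0.06 yuan per 10,000 successfully pushed log entries.
PRICE_PER_10K = Decimal("0.06")


def real_time_log_fee(pushed_entries):
    """Fee in yuan, assuming billing is strictly proportional to pushed entries."""
    return PRICE_PER_10K * Decimal(pushed_entries) / Decimal(10_000)


# Example: one billion log entries pushed successfully.
FEE_FOR_ONE_BILLION = real_time_log_fee(1_000_000_000)
```

Under these assumptions, one billion pushed entries cost 6,000 yuan.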

In November, the CDN real-time log service is running a limited-time promotion at 50% off. Click to buy.
