Author: Aliyun MVP Tian Liang
I. Product Background: KilaKila is an interactive entertainment content community app built around ACG (anime, comics, and games) culture and aimed at young users in China. It offers interactive voice live streaming, short-video dubbing, dialogue novels, and other features to meet young users' personalized, fragmented entertainment needs. The user level system is a core part of community building and plays a crucial role in raising community activity and improving product retention. As the business grew, real-time collection and computation over massive user behavior logs became an increasingly prominent bottleneck. Because a single server has limited processing capacity, analysis at this scale has to move to a distributed computing model. After technical research and architecture selection, the final solution combines Alibaba Cloud Log Service with the open-source Storm framework.
II. Real-time Log Collection: LogHub supports multiple lossless log collection methods, including clients, web pages, protocols, and SDKs/APIs. All of these methods are built on RESTful APIs. Service logs are written to the local server in real time, and Logtail can be deployed on the log server to collect them without loss. Logs can be classified by topic according to the service scenario, so that each service gets the computation it needs. In addition, LogHub's built-in delivery service can be configured to synchronize massive volumes of logs to a data warehouse for permanent storage.
Figure 1: Log collection flow chart
To collect user behavior logs from Nginx into a Logstore with Logtail, you only need to configure the machine group where the logs reside and the absolute path of the log files. Logs travel from the server's disk to the Logstore within one second. Logstore also provides a full-featured log search service for quickly querying user behavior. We store logs of different topics in different Logstores so that each downstream business can consume its own logs in real time.
Figure 2: Log structure of a KilaKila topic
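The topic-to-Logstore split described above can be sketched in a few lines. This is only an illustration of the routing idea, not Logtail or LogHub configuration: the topic names, the Logstore names, and the record fields are all hypothetical, since the article does not list them.

```python
from collections import defaultdict

# Hypothetical topic -> Logstore mapping; the real names are not given in the article.
TOPIC_TO_LOGSTORE = {
    "live": "logstore-live",
    "novel": "logstore-novel",
    "video": "logstore-video",
}


def route_by_topic(records, default_logstore="logstore-misc"):
    """Group log records by the Logstore that their topic maps to."""
    routed = defaultdict(list)
    for record in records:
        # Records with a missing or unknown topic fall back to the default Logstore.
        logstore = TOPIC_TO_LOGSTORE.get(record.get("topic"), default_logstore)
        routed[logstore].append(record)
    return dict(routed)


# Sample records with assumed field names (topic, user_id, action).
records = [
    {"topic": "live", "user_id": "u1", "action": "send_gift"},
    {"topic": "novel", "user_id": "u2", "action": "read_chapter"},
    {"topic": "live", "user_id": "u2", "action": "enter_room"},
    {"topic": "unknown", "user_id": "u3", "action": "ping"},
]
routed = route_by_topic(records)
```

Keeping one Logstore per topic means each downstream consumer reads only the logs it cares about, instead of filtering a shared stream.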
Figure 3: Log retrieval
III. Real-time Business Scenarios:
To support KilaKila's user community goals, this business mainly performs real-time calculation of user experience values across more than 100 behavior scenarios in three business lines (live streaming, novels, and video), that is, creating, deleting, updating, and reading those values. For the real-time computing layer, KilaKila chose Storm, an open-source distributed real-time big data processing framework, which Alibaba Cloud Log Service supports and integrates with well.
Figure 4: KilaKila real-time computing framework
Figure 5: Relationship between LogHub and Storm
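As a rough illustration of the per-behavior experience calculation, the sketch below maps each (business line, action) pair to a point value and sums the points per user. The rule table, point values, and event fields are hypothetical, and the code is plain Python rather than Storm; in a Storm topology this kind of logic would typically live in a bolt that receives log tuples from a spout.

```python
from collections import Counter

# Hypothetical experience points per (business line, action).
# The article mentions 100+ behavior scenarios but does not list them.
EXPERIENCE_RULES = {
    ("live", "send_gift"): 10,
    ("live", "enter_room"): 1,
    ("novel", "read_chapter"): 2,
    ("video", "publish_dub"): 5,
}


def experience_deltas(events):
    """Sum the experience points earned by each user in a batch of events."""
    totals = Counter()
    for event in events:
        points = EXPERIENCE_RULES.get((event["line"], event["action"]), 0)
        if points:
            # Behaviors without a rule earn nothing and are skipped.
            totals[event["user_id"]] += points
    return dict(totals)


events = [
    {"user_id": "u1", "line": "live", "action": "send_gift"},
    {"user_id": "u1", "line": "novel", "action": "read_chapter"},
    {"user_id": "u2", "line": "live", "action": "enter_room"},
    {"user_id": "u2", "line": "video", "action": "unknown_action"},
    {"user_id": "u1", "line": "live", "action": "enter_room"},
]
deltas = experience_deltas(events)
```

Expressing the scenarios as a lookup table keeps the processing code unchanged when a new behavior is added; only the table grows.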
IV. Real-time Data Storage: KilaKila's real-time computing framework has many scenarios that need data caching and permanent storage. To handle them, KilaKila adopted Alibaba Cloud's OTS component. OTS, also known as Table Store, is a NoSQL multi-model database developed by Alibaba Cloud that provides massive structured data storage with fast query and analysis. Its distributed storage and powerful indexing engine support petabyte-scale storage, TPS on the order of ten million, and millisecond-level latency. The data storage needs of the Storm computations were implemented with the Java SDK provided by OTS.
Figure 6: Example of KilaKila OTS storage
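To make the storage operations concrete, here is a minimal in-memory stand-in for a table keyed by user ID. It is not the OTS SDK and makes no claim about the actual table schema: the class, its method names, and the single `experience` column are assumptions used only to show the create, read, update, and delete operations that the computing layer needs.

```python
class InMemoryExperienceTable:
    """Stand-in for a table keyed by user_id; hypothetical, not the OTS SDK."""

    def __init__(self):
        self._rows = {}

    def add_experience(self, user_id, delta):
        """Create the row if needed, add delta, and return the new total."""
        row = self._rows.setdefault(user_id, {"experience": 0})
        row["experience"] += delta
        return row["experience"]

    def get_experience(self, user_id):
        """Return the stored experience, or 0 for an unknown user."""
        row = self._rows.get(user_id)
        return row["experience"] if row else 0

    def delete_user(self, user_id):
        """Remove the row; return True if a row was actually deleted."""
        return self._rows.pop(user_id, None) is not None


table = InMemoryExperienceTable()
table.add_experience("u1", 13)
table.add_experience("u1", 2)
table.add_experience("u2", 1)
```

Hiding storage behind a small interface like this lets the processing logic be tested locally and then pointed at the real table service.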