Original technical content from Hornet’s Nest (Mafengwo). For more practical articles, follow the WeChat public account: MFWtech
Advertising is one of the most important means of monetization on the Internet.
Take the Hornet’s Nest travel App as an example. When users open our App, they may see ads pushed on the first screen, in the information feed, or in product lists. If they happen to be interested in an ad’s content, they may click it to learn more, and then perform the follow-up action the ad is meant to drive, such as downloading the App it recommends.
The task of an ad monitoring platform is to continuously and accurately collect the information users carry when browsing and clicking ads, including source, time, device, and location, and to process and analyze it, providing the basis for advertisers’ billing and settlement and for evaluating ad effectiveness.
A reliable and accurate monitoring service is therefore essential. To better protect the interests of both the platform and advertisers, and to support improving the effectiveness of Hornet’s Nest’s advertising services, we have been continuously exploring suitable solutions to strengthen our ad monitoring capabilities.
Part.1 The Initial Form
In the beginning, our ad monitoring had not yet been shaped into a complete externally available service, so its implementation and capabilities were relatively simple. It consisted of two parts: client-side event reporting, and transcoding and archiving of exposure and click links plus parsing the redirect when a request came in.
The disadvantages of this approach were soon exposed, mainly in the following aspects:
- **Data collection accuracy:** data forwarding could only be completed through middleware, which increased the probability of packet loss across multiple stages; compared with third-party monitoring services, the gap was large;
- **Data processing capability:** the collected data came from various business systems and lacked a unified standard; its many attributes made parsing very complicated and made comprehensive secondary use of the data difficult;
- **Burst traffic:** when traffic spiked, Redis suffered from high memory consumption and frequent service drops;
- **Complex deployment:** as devices and ad slots changed, event tracking tended to become complicated, and some cases might not even be covered;
- **Development efficiency:** the early monitoring features were limited; for example, real-time conditional computation and queries required extra development, which greatly hurt efficiency.
Part.2 An Architecture Implementation Based on OpenResty
Against this backdrop, we built the Hornet’s Nest advertising data monitoring platform, ADMonitor, hoping to gradually deliver a stable, reliable, and highly available ad monitoring service.
2.1 Design Roadmap
To solve the various problems of the old system, we introduced a new monitoring process, designed as follows:
- The new monitoring service (ADMonitor) generates a unique monitoring link for each ad, attached alongside the existing customer links;
- All exposure and click links delivered from the server rely on the service provided by ADMonitor in parallel;
- The client requests exposure tracking in parallel; click behavior jumps to ADMonitor first, and ADMonitor then performs the second-hop redirect.
In this way, monitoring depends entirely on ADMonitor, which greatly improves the flexibility of monitoring deployment and the performance of the overall service. To further verify data accuracy, we also retained client-side event tracking for comparison.
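The second-hop click redirect could be sketched as follows. This is a minimal illustration, not the real protocol: the parameter names (`mid`, `target`) and the shared zone `ad_events` are assumptions.

```lua
-- Hypothetical sketch of ADMonitor's click handler (content_by_lua phase):
-- record the click event, then 302-redirect to the original landing URL.
local args = ngx.req.get_uri_args()
local monitor_id = args.mid          -- unique monitoring link ID (assumed name)
local target     = args.target       -- original customer landing URL (assumed name)

if not monitor_id or not target then
    return ngx.exit(ngx.HTTP_BAD_REQUEST)
end

-- Stash the click event in shared memory; a background timer drains it later,
-- so no network call happens on the hot path.
local queue = ngx.shared.ad_events
queue:rpush("clicks", ngx.var.remote_addr .. "\t" .. monitor_id
            .. "\t" .. ngx.now())

-- Second hop: send the user on to the advertiser's landing page.
return ngx.redirect(ngx.unescape_uri(target), ngx.HTTP_MOVED_TEMPORARILY)
```

The key property is that the user-visible redirect does not wait on any storage or middleware write.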
2.2 Technology Selection
To implement the above process, the ad monitoring traffic entrance must offer high availability and high concurrency while minimizing unnecessary network requests. Since multiple internal systems need this traffic, and to reduce the labor cost of system integration and avoid disturbing online services during system iteration, the first thing we did was split out a separate traffic gateway.
There are many technologies for C10K-scale programming, such as OpenResty, Java Netty, Golang, and NodeJS. Their common trait is handling many requests simultaneously with a single process or thread, whether via thread pools, coroutines, or event-driven callbacks, achieving non-blocking I/O.
We chose to build the ad monitoring platform on OpenResty for the following reasons:
First, OpenResty works at layer 7 of the network stack and supports regular-expression rules more powerful and flexible than HAProxy’s. It can implement traffic-splitting and forwarding policies based on an HTTP application’s domain name and directory structure, serving as both a load balancer and a reverse proxy.
Second, OpenResty combines Lua coroutines with Nginx’s event-driven loop and callbacks in its core cosocket mechanism: remote backends such as MySQL, Memcached, and Redis can be accessed with synchronous-style code while the I/O remains non-blocking.
Third, LuaJIT’s just-in-time compiler compiles frequently executed code paths into machine code and caches them, so subsequent calls execute machine code directly, which is far more efficient than interpreting VM instructions one by one; code that runs only once is still interpreted.
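As an illustration of the cosocket model (not our production code), a handler can talk to Redis with synchronous-looking calls while the coroutine yields on I/O; the host, port, and key are placeholders. This uses the bundled lua-resty-redis library:

```lua
-- Sketch of the cosocket model: synchronous-looking code, non-blocking I/O.
local redis = require "resty.redis"
local red = redis:new()
red:set_timeout(100)  -- ms; on timeout the coroutine yields instead of blocking

local ok, err = red:connect("127.0.0.1", 6379)  -- placeholder address
if not ok then
    ngx.log(ngx.ERR, "redis connect failed: ", err)
    return
end

red:incr("ad:exposure:total")  -- illustrative counter key

-- Return the connection to the pool (idle timeout 10s, pool size 100)
-- instead of closing it, so later requests reuse it.
local ok2, err2 = red:set_keepalive(10000, 100)
if not ok2 then
    ngx.log(ngx.ERR, "set_keepalive failed: ", err2)
end
```

While this coroutine waits on the socket, the worker’s event loop keeps serving other requests, which is what makes the synchronous style safe at high concurrency.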
2.3 Architecture Implementation
The overall solution relies on OpenResty’s processing mechanism, customized on the server side. It is divided into three parts: data collection, data processing, and data archiving, achieving asynchronous request splitting and I/O communication. The overall architecture is shown below:
We store multi-worker log information in the master’s shared memory as a double-ended queue, enable millisecond-level worker timers, and parse the traffic offline.
2.3.1 Data Collection
The collection part bears the bulk of the traffic pressure. We use Lua for overall validation, filtering, and pushing. In our scenario, data collection does not need to consider ordering or aggregation, so Lua shared memory can serve as the core push medium: in-memory I/O replaces the network calls that accessing other middleware would require, reducing network requests and meeting immediacy requirements, as shown below:
In the OpenResty configuration, here are some of the optimizations we made to the server nodes:
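A minimal sketch of this push path, assuming a shared zone named `ad_events` (declared with `lua_shared_dict`) and illustrative event fields:

```lua
-- In the collection handler: validate, then push the event into the shared
-- double-ended queue; no network call happens on the hot path.
local event = ngx.var.remote_addr .. "\t"
              .. (ngx.var.http_user_agent or "-") .. "\t" .. ngx.now()
local len, err = ngx.shared.ad_events:rpush("queue", event)
if not len then
    ngx.log(ngx.ERR, "queue push failed: ", err)  -- e.g. "no memory"
end

-- In init_worker_by_lua: a millisecond-level timer drains the queue offline.
local function drain(premature)
    if premature then return end          -- worker is shutting down
    for _ = 1, 1000 do                    -- bounded batch per tick
        local item = ngx.shared.ad_events:lpop("queue")
        if not item then break end
        ngx.log(ngx.INFO, "event: ", item)  -- hand off to the log pipeline here
    end
end
ngx.timer.every(0.05, drain)              -- every 50 ms
```

The request path only touches shared memory; everything slower happens in the timer, off the user’s critical path.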
- **lua_code_cache:**
  (1) When enabled, Lua files are cached in memory to speed up access, but code changes require a reload
  (2) Try to avoid creating global variables
  (3) When disabled, each worker process spawns its own fresh Lua VM
- **Resolvers:** configuring resolvers helps when outbound network requests are heavy; use good DNS nodes or self-built ones:
  (1) Add the company’s internal DNS nodes, with public-network nodes as a fallback
  (2) Use shared memory to cache results and reduce per-worker query counts
- **epoll settings (multi_accept / accept_mutex / worker_connections):**
  (1) Configure the I/O model and prevent the thundering-herd problem
  (2) Avoid service nodes wasting resources on useless processing that drags down overall throughput
- **keepalive:**
  (1) Including connection duration, per-connection request limits, etc.
Configuration optimization serves two purposes: matching the current request scenario, and helping Lua perform better. Nginx server parameters also depend on the operating system environment; on Linux, for example, everything is a file, so the open-file limit, TCP buckets, TIME_WAIT connections, and so on all matter.
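An illustrative nginx.conf fragment combining these tuning points; the numbers are placeholders to be sized against real traffic, not our production values.

```nginx
worker_processes  auto;

events {
    use epoll;
    multi_accept on;          # accept as many pending connections as possible per wakeup
    accept_mutex on;          # avoid the thundering-herd problem across workers
    worker_connections 65535;
}

http {
    lua_code_cache on;        # cache compiled Lua in memory (code changes need a reload)
    lua_shared_dict ad_events 100m;   # shared zone used by the collection path

    # Internal DNS first, public resolver as fallback for cosocket requests
    resolver 10.0.0.2 114.114.114.114 valid=300s;

    keepalive_timeout  30s;   # connection duration
    keepalive_requests 1000;  # request upper limit per connection
}
```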
2.3.2 Data Processing
In this part of the process, the collected data first goes through ETL; then an internal log location, combined with a custom Lua log_format, uses Nginx’s subrequest feature to complete the offline data drop while keeping data latency at the millisecond level.
Processing of the parsed data is done in two parts: ETL and Count.
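Under stated assumptions (the location name `/internal_log`, the format name `monitor_line`, and the log path are all illustrative), the internal log location plus subrequest might look like:

```nginx
log_format monitor_line '$msec\t$remote_addr\t$http_user_agent\t$args';

server {
    location = /internal_log {
        internal;                       # reachable only via subrequests
        access_log /data/logs/admonitor.log monitor_line;
        return 204;
    }

    location = /v.gif {                 # hypothetical collection endpoint
        content_by_lua_block {
            -- subrequest: the processed data lands in the log file while
            -- the client immediately gets an empty 204 response
            ngx.location.capture("/internal_log",
                                 { args = ngx.req.get_uri_args() })
            ngx.exit(ngx.HTTP_NO_CONTENT)
        }
    }
}
```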
(1) ETL
Main process:
- After logs are uniformly formatted, the parts containing actual parameters are extracted for parsing
- The extracted data is filtered and processed against the overall character set, IP, device, UA, and related label information
- The converted data is reloaded and the logs redirected
Lua uses FFI to resolve IP addresses through the IP library: the library is loaded into memory with C, and Lua queries complete at the millisecond level.
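A sketch of the FFI approach, with an entirely hypothetical C library (`libipdb`, and its two functions) standing in for the real IP library:

```lua
-- LuaJIT FFI: load the IP database into memory once via a C library,
-- then query it directly with no subprocess or network hop.
local ffi = require "ffi"

ffi.cdef[[
    int         ipdb_load(const char *path);    /* hypothetical API */
    const char *ipdb_lookup(const char *ip);    /* returns region string */
]]

local ipdb = ffi.load("ipdb")                   -- finds libipdb.so
assert(ipdb.ipdb_load("/data/ip/ipdb.dat") == 0, "ip db load failed")

local function lookup_region(ip)
    local res = ipdb.ipdb_lookup(ip)
    -- C pointers must be converted to Lua strings before use
    return res ~= nil and ffi.string(res) or nil
end
```

Because FFI calls are compiled by LuaJIT into near-native code, per-query cost stays well under a millisecond once the database is resident in memory.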
(2) Count
For advertising data, most business needs come from statistics. Here we use Redis plus InfluxDB for storage, with the following key technical points:
- Redis combined with Lua sets connection timeouts and configures connection pools to increase connection reuse
- The Redis cluster service is decentralized to spread node pressure; AOF with delayed persistence ensures reliability
- InfluxDB keeps the data logs queryable in time-series order, with better performance for aggregate statistics and real-time reports
2.3.3 Data Archiving
Data archiving requires writing the full data into tables, which involves filtering out some invalid data. The whole system connects to the company’s big data platform. The process splits into online and offline processing, and data can be backtracked. We use Flink online and Hive offline, paying attention to:
- ES indexes and data are maintained regularly
- Kafka consumption
- Automatic scripts restart machines and raise alarms on failure
Real-time data pipeline: Data collection service → Filebeat → Kafka → Flink → ES
Offline data pipeline: HDFS → Spark → Hive → ES
Reuse of the parsed data:
The parsed data is already valuable for reuse, with two main application scenarios.
One is OLAP: analyzing changes in the attribute tags of the audience visiting ads according to business scenario and data performance, including region, device, and the distribution and growth of audience segments; at the same time predicting future audience inventory ratios, ultimately influencing actual ad delivery.
The other is OLTP, where the main scenarios are:
- Determining whether a user belongs to an ad’s target audience region
- Parsing UA information to obtain terminal details and determine whether the traffic is low-level crawler traffic
- Marking device IDs: fetching real-time user portraits from Redis, real-time tagging, etc.
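As an illustration only (the real rules, shared-dict name `portraits`, and header name are assumptions), these OLTP checks could run in the access phase:

```lua
-- Illustrative access-phase filter for the OLTP checks above.
local ua = (ngx.var.http_user_agent or ""):lower()

-- Crude low-level-crawler detection from the UA string (placeholder rules)
if ua == "" or ua:find("curl", 1, true) or ua:find("python", 1, true) then
    return ngx.exit(ngx.HTTP_FORBIDDEN)
end

-- Real-time device marking: look the device up in a hypothetical shared-memory
-- cache of user-portrait data that a background job syncs from Redis.
local device_id = ngx.var.arg_device_id
if device_id then
    local tag = ngx.shared.portraits:get(device_id)
    ngx.req.set_header("X-Audience-Tag", tag or "unknown")
end
```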
2.4 Other OpenResty Application Scenarios
OpenResty plays an important role throughout our ad data monitoring service:
- init_worker_by_lua phase: service configuration
- access_by_lua phase: CC protection, permission checks, and timed traffic monitoring
- content_by_lua phase: rate limiting, traffic splitting, WebAPI, and traffic collection
- log_by_lua phase: log persistence and similar services
Two of these applications deserve a closer look.
2.4.1 Traffic-Splitting Service
The NodeJS service reports its servers’ current CPU and memory usage to the OpenResty gateway. A Lua script queries the Redis cluster for the NodeJS cluster’s usage within a time window and identifies heavily loaded NodeJS machines. OpenResty then applies circuit breaking, degradation, and rate limiting to traffic toward the NodeJS cluster, and syncs the monitoring data to InfluxDB for time-series monitoring.
2.4.2 A Small Web Firewall
It uses the third-party open-source lua-resty-waf library to support IP whitelists and blacklists, URL whitelists, UA filtering, and CC-attack defense. On top of this, we added InfluxDB support to the WAF for time-series monitoring and service alerting.
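A minimal usage sketch of lua-resty-waf in the access phase; our actual rule set and the InfluxDB hook are not shown, and the option values here are illustrative:

```lua
-- access_by_lua phase: run the WAF before the request reaches content handlers
local lua_resty_waf = require "resty.waf"

local waf = lua_resty_waf:new()
waf:set_option("mode", "ACTIVE")   -- SIMULATE only logs; ACTIVE actually blocks
waf:exec()                         -- evaluate rules against the current request
```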
2.5 Summary
In summary, ADMonitor, our OpenResty-based ad monitoring service, has the following features:
- **High availability:** relies on OpenResty as the gateway, with multi-node HA
- **Immediate response:** after data is parsed, it is processed asynchronously through I/O requests, avoiding unnecessary network communication
- **Decoupled function modules:** requests, data processing, and forwarding are decoupled, reducing the serial processing time of a single request
- **Service assurance:** important data results are stored separately with third-party components
The complete technical scheme is as follows:
Part.3 Summary
ADMonitor is now connected to the company’s advertising business systems, and overall operation has been quite satisfactory:
1. Performance
- Reaches high-throughput, low-latency standards
- High forwarding success rate: exposure counting success rate > 99.9%, click success rate > 99.8%
2. Business
- Data comparison with mainstream third-party monitoring institutions: exposure data gap < 1%, click data gap < 3%
- Provides real-time search and aggregation services
In the future, we will continue to improve the service as the business and its scenarios develop, and we look forward to more exchanges with you.
About the author: Jiang Minghui, R&D engineer in the brand advertising data server group at Mafengwo (Hornet’s Nest) travel network.