Advertising, value-added services, and commissions are the three most common ways Internet companies make money. Of the three, advertising holds the largest market share and is the main revenue channel for most Internet platforms, so its business importance is self-evident.
From a technical point of view, the advertising business involves AI algorithms, big data processing, search engines, and high-performance, high-availability engineering architecture, which also makes it technically appealing.
I started working on the advertising business about a year ago. This article combines my personal experience with excellent cases from the industry to describe an architecture for advertising systems in practice, and I hope you find it useful. It covers the following three parts:
- Introduction to Advertising Business
- Technical challenges
- Detailed explanation of advertising system architecture
01 Introduction to Advertising Business
Advertising is everywhere. WeChat, Douyin, Bilibili, Baidu, Taobao, and the other apps where users spend most of their time all carry ads.
What is the business logic behind the ads we see everywhere every day? Before I share the architecture of the advertising system, let me give you a quick overview of the business.
1.1 The core of advertising business is balance
Why is the core point of advertising business “balance”? It can be understood from the standard definition of advertising.
Advertising is defined as the means by which advertisers pay to communicate information about goods or services to users through Internet platforms. This definition involves three subjects: advertisers, platforms and users, but the interest concerns of these three subjects are different.
- Advertisers: focus on ROI, that is, whether the money spent delivers the expected return.
- Platform: owns the traffic and focuses on maximizing revenue.
- Users: focus on experience. Are the ads relevant enough? Do they get in the way of normal functionality?
The interests of the three sometimes conflict. For example, if the platform adds more ad slots, revenue will increase, but the user experience may deteriorate. The advertising business therefore ultimately has to find a balance among the three parties.
From the platform’s perspective, a healthy advertising ecosystem first protects the user experience, then looks after the ROI of most advertisers (so that they can make money), and only then maximizes platform revenue.
1.2 Understanding the essence of advertising through the revenue breakdown formula
Over several decades of development, the advertising business has produced many ways of settling ad fees. The most common are:
- CPT: charged by time for exclusive use of an ad slot
- CPM: charged per thousand impressions
- CPC: charged per click
- CPA: charged per action (e.g. download, registration)
The different settlement methods grew out of the development of the advertising market. In the beginning, traffic was scarce and the platform was dominant. Today it has gradually become a buyer’s market, and advertisers, as the demand side, have more bargaining power.
As the chart above shows, CPA is the most favorable to the advertiser and the least favorable to the platform, because it charges only for the conversion the advertiser ultimately wants. Today’s settlement methods are themselves the result of a balance, which is why CPM and CPC, which sit near the balance point, are the most common.
Taking CPC as an example, revenue can be decomposed as follows:
Revenue = PV × PVR × ASN × CTR × ACP
Here PV is the number of page views of the system, PVR and ASN together represent the ad fill rate (the share of page views that show ads, and the average number of ads shown per such page view), CTR is the click-through rate of the ads, and ACP is the average price per click.
Each of these metrics can be improved through advertising strategy. For example, the fill rate can be raised by acquiring more advertisers, CTR can be raised through precise, AI-driven ad delivery, and ACP can be raised by charging a premium for well-targeted traffic or by improving advertisers’ ROI.
Understanding this revenue breakdown is crucial to understanding the advertising business, because almost any business action can be tied to one of the metrics in the formula.
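As a quick illustration, the sketch below plugs made-up numbers into the product form Revenue = PV × PVR × ASN × CTR × ACP implied by the metric definitions above. The function name `cpc_revenue` and every input value are hypothetical and serve only to show how a change in one metric propagates to revenue.

```python
def cpc_revenue(pv: float, pvr: float, asn: float, ctr: float, acp: float) -> float:
    """CPC revenue as the product PV x PVR x ASN x CTR x ACP."""
    return pv * pvr * asn * ctr * acp


# Hypothetical daily figures, not real data.
baseline = cpc_revenue(pv=1_000_000, pvr=0.5, asn=2, ctr=0.02, acp=0.5)

# Same traffic and prices, but CTR raised from 2% to 2.5%.
improved = cpc_revenue(pv=1_000_000, pvr=0.5, asn=2, ctr=0.025, acp=0.5)

print(f"baseline revenue: {baseline:.2f}")
print(f"revenue with higher CTR: {improved:.2f}")
print(f"relative lift: {improved / baseline - 1:.1%}")
```

Because the formula is a pure product, a given relative gain in any single factor yields the same relative gain in revenue, which is why each metric is worth optimizing on its own.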
1.3 Core business process of advertising
As advertisers keep raising their expectations for ad performance, precise targeting combined with real-time bidding has become the mainstream business form.
In their early stages, Internet platforms generally monetize through a “self-owned bidding ad network”, which simply means using the platform’s own traffic and the advertisers it has acquired itself to form a closed business loop. The advertising architecture shared in this article focuses on this form of business, and its core business process is shown in the figure below.
- The advertiser first publishes an ad through the platform and can set a series of targeting conditions, such as city, time, audience tags, and bid.
- Once the campaign is set up, the ad is stored in the ad library and indexed so that the ad search engine can recall it.
- When a request arrives from the C end (the consumer side), the ad engine runs a series of steps such as recall, algorithm strategies, and bid ranking, and finally selects the top N ads, so that each user sees ads personalized to them.
- When a user clicks an ad, the ad deduction process is triggered, and only then does the platform actually earn revenue.
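The request-time part of this flow can be sketched in a few lines. This is a minimal illustration under simplifying assumptions: the `Ad` and `Request` classes, their field names, and the sample data are all hypothetical, an empty targeting set is taken to mean "no restriction", and ranking uses the bid alone, whereas a real engine would also apply algorithm strategies.

```python
from dataclasses import dataclass, field


@dataclass
class Ad:
    ad_id: int
    bid: float                                    # advertiser's bid per click
    cities: set = field(default_factory=set)      # empty set means "any city"
    hours: set = field(default_factory=set)       # empty set means "any hour"
    crowd_tags: set = field(default_factory=set)  # empty set means "any user"


@dataclass
class Request:
    city: str
    hour: int
    user_tags: set


def matches(ad: Ad, req: Request) -> bool:
    """An ad is eligible when every targeting condition it sets is satisfied."""
    city_ok = not ad.cities or req.city in ad.cities
    hour_ok = not ad.hours or req.hour in ad.hours
    crowd_ok = not ad.crowd_tags or bool(ad.crowd_tags & req.user_tags)
    return city_ok and hour_ok and crowd_ok


def top_n_ads(ads, req, n):
    """Recall eligible ads, rank them by bid (highest first), keep the top N."""
    recalled = [ad for ad in ads if matches(ad, req)]
    return sorted(recalled, key=lambda ad: ad.bid, reverse=True)[:n]


ads = [
    Ad(1, bid=1.2, cities={"Beijing"}),
    Ad(2, bid=0.8, hours={9, 10, 11}),
    Ad(3, bid=2.0, cities={"Shanghai"}),
    Ad(4, bid=1.5, crowd_tags={"car_buyer"}),
]
req = Request(city="Beijing", hour=10, user_tags={"car_buyer"})
winners = top_n_ads(ads, req, n=2)
print([ad.ad_id for ad in winners])
```

Ad 3 has the highest bid but targets a different city, so it is filtered out before ranking; targeting always runs before bidding.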
That is the core business process of advertising. As a platform’s traffic and its number of advertisers grow, the business tends to evolve from the “self-owned ad network” model toward “alliance advertising and RTB real-time bidding”, as with Alimama, Tencent Guangdiantong, and Toutiao’s Ocean Engine. The business and technical architecture then become considerably more complex. This article does not cover that stage; I will share more details later.
02 Technical challenges
With an initial understanding of the advertising business, let’s take a look at the technical challenges facing advertising systems:
1. High concurrency: the ad engine faces C-end traffic directly. Request volume is large (often tens of thousands of QPS even outside peak hours) and responses must be real time, with results returned within tens of milliseconds.
2. Complex business logic: a single ad request involves multi-path recall, algorithm model scoring, bid ranking, and other complex steps, with many strategies and a long execution chain.
3. High stability requirements: the advertising system is tied directly to revenue, so core systems such as the ad engine and the billing platform need high stability, with availability of at least three nines (99.9%).
4. Big data storage and computation: as the business grows, the number of promotions and deduction orders can easily reach tens or even hundreds of millions. In addition, revenue reports are aggregated along many dimensions, and a single report table may reach tens of billions of records.
5. Accounting accuracy: ad fee deduction is a financial operation, so deductions must be neither lost nor repeated, or one party’s interests will be harmed. Inaccurate revenue data can also mislead business decisions.
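To make the availability target concrete, the small calculation below converts an availability figure into a yearly downtime allowance. The function name `allowed_downtime_hours` is hypothetical; the arithmetic simply assumes a 365-day year.

```python
def allowed_downtime_hours(availability: float, period_hours: float = 365 * 24) -> float:
    """Maximum downtime within the period that still meets the availability target."""
    return (1 - availability) * period_hours


three_nines = allowed_downtime_hours(0.999)
print(f"99.9% availability allows about {three_nines:.2f} hours of downtime per year")
```

At three nines the whole year's budget is under nine hours, which is why the core systems need the degradation and fallback measures described later.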
03 Detailed explanation of advertising system architecture
With the goals and technical challenges of the advertising business in mind, this section describes the overall architecture and technical solutions of the advertising system in detail.
The diagram above shows our company’s current advertising system architecture. It suits the early stage of an advertising business, targets a self-owned bidding network with in-site traffic, and does not involve alliance advertising.
The following is a description of each subsystem:
- Advertising system: used by advertisers. Core functions include membership renewal, ad library management, setting promotion conditions, setting ad bids, and viewing promotion results.
- Advertising operation back office: used by the platform’s product operations staff. Core functions include ad slot management, ad strategy management, and various operation tools.
- Ad retrieval platform: handles high-concurrency requests from the C end and selects a few or a few dozen ads from the massive ad library, with strict real-time requirements. This platform usually consists of multiple microservices.
- A/B experiment platform: the stabilizer of the advertising business. Any change to an ad strategy can first be run as a small-traffic experiment on this platform to observe how revenue metrics change.
- Ad billing platform: faces C-end traffic and is responsible for real-time fee deduction. It is tied directly to revenue and has high availability requirements.
- Account management center: the financial system of the advertising business, responsible for everything involving money, including top-up, freezing, and deduction.
- Big data platform: the foundation of the entire advertising system. It aggregates heterogeneous data sources, performs offline and real-time analysis and statistics, and produces business reports, model features, and so on.
3.1 Storage of advertising data
The advertising system stores many kinds of data with different characteristics, so it uses several storage engines.
- OLTP scenarios, including the ad library, creative library, membership library, ad product library, and ad strategy library, are stored in MySQL. The ad library and creative library hold the most data and are sharded into multiple tables by a hash of the advertiser ID.
- OLAP scenarios involve a large number of reports, and a single table may reach the 10-billion-record level. HDFS and HBase are used as the underlying storage.
- Index data for ad retrieval, including forward and inverted indexes, is stored in Redis and Elasticsearch (ES).
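Sharding by a hash of the advertiser ID can be sketched as follows. This is only an illustration: the shard count, the table-name pattern, and the choice of CRC32 as the hash are assumptions, not the scheme actually used in the system described here.

```python
import zlib

NUM_SHARDS = 16  # hypothetical shard count


def shard_table(base_table: str, advertiser_id: int, num_shards: int = NUM_SHARDS) -> str:
    """Map an advertiser to a fixed sub-table using a stable hash of its ID."""
    # zlib.crc32 is deterministic across processes, unlike Python's built-in
    # hash() for strings, which is randomized per process by default.
    digest = zlib.crc32(str(advertiser_id).encode("utf-8"))
    return f"{base_table}_{digest % num_shards:02d}"


table = shard_table("ad", 10086)
print(table)
```

Because the routing key is the advertiser ID, all of one advertiser's ads land in the same sub-table, so advertiser-scoped queries touch a single shard.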
Another storage issue is ad synchronization. After an ad is published, it is first stored in MySQL and then propagated to the retrieval system in real time to update the forward and inverted indexes.
A few points about the index update service are worth noting:
- Each business system sends an MQ message when promotion, balance, or other information changes. The index update service subscribes to MQ to detect changes and perform incremental synchronization.
- The message body does not carry the changed fields; it only carries the ID of the ad that changed. The index update service then reads the latest data in real time to perform the update, which effectively avoids data inconsistency caused by out-of-order messages.
- When index update concurrency reaches a certain level, overall update speed can be improved by merging changes to the same ad or by separating forward index updates from inverted index updates.
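The ID-only message design and the merging of changes can be sketched as below. Everything here is an in-memory stand-in: `ad_db` plays the role of MySQL, the two dictionaries play the role of the forward and inverted indexes, and the single `city` targeting field and function name `apply_changes` are hypothetical.

```python
from collections import defaultdict

# Stand-in for MySQL: the source of truth, keyed by ad ID (hypothetical data).
ad_db = {
    1: {"city": "Beijing", "bid": 1.2},
    2: {"city": "Shanghai", "bid": 0.8},
}

forward_index = {}                 # ad ID -> latest ad document
inverted_index = defaultdict(set)  # "city:<name>" -> set of ad IDs


def apply_changes(changed_ad_ids):
    """Process a batch of change messages that carry only ad IDs."""
    # Merging duplicates means each ad is re-read and re-indexed once per batch.
    for ad_id in set(changed_ad_ids):
        old = forward_index.get(ad_id)
        if old is not None:
            inverted_index[f"city:{old['city']}"].discard(ad_id)
        latest = ad_db.get(ad_id)  # always read the current row, never a message payload
        if latest is None:
            forward_index.pop(ad_id, None)  # the ad was deleted
            continue
        forward_index[ad_id] = dict(latest)
        inverted_index[f"city:{latest['city']}"].add(ad_id)


apply_changes([1, 2, 1])           # ad 1 changed twice but is indexed once
ad_db[1]["city"] = "Shanghai"      # a later edit in the source of truth
apply_changes([1])
print(sorted(inverted_index["city:Shanghai"]))
```

Since every update re-reads the current row, processing messages late or out of order still converges on the latest state.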
3.2 The overall process of advertising retrieval platform
The advertising retrieval platform is responsible for receiving the traffic request from the C terminal, screening the most suitable top N ads from the massive advertising library, and returning the results within tens of milliseconds. It is a process of multi-level screening and sorting.
The Recall layer focuses on algorithm models, while the Search layer focuses on business. From bottom to top, the computational complexity increases layer by layer and the candidate set decreases layer by layer. (Note: There are differences between search advertising scene and recommendation advertising scene in some sub-modules, but the overall process is basically the same, so it will not be expanded here)
Performance design is the focus of the retrieval platform and usually relies on the following means:
- Layer the services well so that each layer can scale horizontally.
- Use Redis caches so that high-concurrency requests never hit the database directly. Multiple cache clusters can be deployed, split by service.
- Use multithreading to parallelize sub-processes such as multi-path recall and multi-model scoring.
- Cache hot data locally, such as ad slot configuration and strategy configuration. It can be preloaded when the service starts and refreshed periodically.
- Give non-core steps timeouts and circuit breakers with degradation logic, such as the premium strategy (skipping the premium only means earning less and does not affect ad recall).
- Run logic unrelated to the main flow asynchronously, such as caching deduction information and caching recall results.
- Slim down RPC responses and Redis cache objects by removing unnecessary fields, which reduces the size of I/O packets.
- Tune GC, including JVM heap sizing, garbage collector selection, and GC frequency and pause-time optimization.
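Two of the techniques above, parallel multi-path recall and timeout-based degradation of a non-core step, can be sketched together. This is a simplified illustration in Python rather than the JVM stack the platform runs on: the recall functions, the premium multiplier, and the timeout value are all hypothetical, and the code shows only a timeout fallback, not a full circuit breaker.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError

executor = ThreadPoolExecutor(max_workers=8)


def recall_by_keyword(req):
    return {101, 102}            # hypothetical ad IDs


def recall_by_user_tag(req):
    return {102, 103}


def premium_strategy(req):
    time.sleep(0.5)              # simulate a slow non-core dependency
    return 1.2


def retrieve(req, premium_timeout=0.05):
    # Core recall paths run in parallel and their results are merged.
    recall_futures = [executor.submit(fn, req)
                      for fn in (recall_by_keyword, recall_by_user_tag)]
    premium_future = executor.submit(premium_strategy, req)

    candidates = set()
    for fut in recall_futures:
        candidates |= fut.result()

    # Non-core step: wait briefly, then degrade to "no premium" instead of failing.
    try:
        premium = premium_future.result(timeout=premium_timeout)
    except TimeoutError:
        premium = 1.0
    return sorted(candidates), premium


result = retrieve({"query": "suv"})
print(result)
executor.shutdown(wait=True)
```

The request still returns the full candidate set when the premium step is slow; only the price multiplier falls back to its neutral value.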
3.3 Technical solution of billing platform
The billing platform is also a core system; its main job is real-time fee deduction. For example, under CPC settlement, if an advertiser sets a budget of 50 yuan and each click costs 1 yuan, the ad must be taken offline promptly once the deducted amount reaches the budget.
In addition, the billing platform must support other settlement methods such as CPM and CPT, as well as anti-cheating, handling of balances that hit their limit, and the amortization and reconciliation of deduction orders.
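The 50-yuan budget example can be sketched as follows, together with the "neither lost nor repeated" requirement mentioned among the technical challenges. The class `CampaignBudget` and the use of a click ID for deduplication are assumptions made for illustration; the sketch keeps state in memory and assumes the budget is a whole multiple of the click price.

```python
from decimal import Decimal


class CampaignBudget:
    def __init__(self, budget: Decimal, price_per_click: Decimal):
        self.budget = budget
        self.price_per_click = price_per_click
        self.spent = Decimal("0")
        self.online = True
        self._charged_clicks = set()   # click IDs that were already billed

    def charge(self, click_id: str) -> bool:
        """Bill one click; return True only if money was actually deducted."""
        if not self.online or click_id in self._charged_clicks:
            return False               # offline ad, or a duplicate click message
        self._charged_clicks.add(click_id)
        self.spent += self.price_per_click
        if self.spent >= self.budget:
            self.online = False        # budget exhausted: take the ad offline
        return True


campaign = CampaignBudget(budget=Decimal("50"), price_per_click=Decimal("1"))
first = campaign.charge("click-0")
retry = campaign.charge("click-0")     # the same click delivered a second time
charged = 1 + sum(campaign.charge(f"click-{i}") for i in range(1, 60))
print(first, retry, charged, campaign.spent, campaign.online)
```

Amounts use `Decimal` rather than floats so that repeated additions of money stay exact.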
The billing platform is characterized by high concurrency, large data volume, and high availability requirements, and deductions must be neither lost nor repeated. The following uses real-time CPC click deduction as an example to explain the technical solution.
First, the whole deduction process is asynchronous. When a real-time deduction request arrives, the system first caches the information needed for the deduction in Redis and then sends an MQ message. Once these two steps are done, the request returns, and the actual deduction is completed by the MQ consumer.
This keeps the billing interface fast while relying on MQ’s reliable delivery and retry mechanism to ensure eventual consistency of the billing process.
To improve availability, both Redis and MQ have fallback plans. When Redis is unavailable, the system switches to TiKV for persistence; when MQ delivery fails, it falls back to asynchronous processing in a thread pool.
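The fast path and its two fallbacks can be sketched with in-memory stand-ins. No real Redis, TiKV, or MQ client is used here: `FlakyStore`, `FlakyQueue`, `accept_click`, and `process_deduction` are hypothetical names, and an outage is simulated with a flag.

```python
from concurrent.futures import ThreadPoolExecutor


class Unavailable(Exception):
    """Raised by a stand-in component to simulate an outage."""


class FlakyStore:
    """In-memory stand-in for Redis or TiKV; up=False simulates an outage."""
    def __init__(self, up=True):
        self.up, self.data = up, {}

    def put(self, key, value):
        if not self.up:
            raise Unavailable
        self.data[key] = value


class FlakyQueue:
    """In-memory stand-in for MQ."""
    def __init__(self, up=True):
        self.up, self.messages = up, []

    def send(self, message):
        if not self.up:
            raise Unavailable
        self.messages.append(message)


processed = []                      # deductions handled by the fallback path
pool = ThreadPoolExecutor(max_workers=2)


def process_deduction(click_id):
    processed.append(click_id)      # the real deduction logic would go here


def accept_click(click_id, info, cache, fallback_store, queue):
    """Fast path of the billing interface: persist context, then hand off."""
    try:
        cache.put(click_id, info)
    except Unavailable:
        fallback_store.put(click_id, info)               # degrade: Redis -> TiKV
    try:
        queue.send(click_id)
    except Unavailable:
        return pool.submit(process_deduction, click_id)  # degrade: MQ -> thread pool
    return None


redis_down, tikv, mq_down = FlakyStore(up=False), FlakyStore(), FlakyQueue(up=False)
future = accept_click("click-1", {"ad_id": 7, "price": "1.00"}, redis_down, tikv, mq_down)
future.result()                     # wait for the fallback worker in this demo
pool.shutdown()
print(tikv.data, processed)
```

Note that work queued in an in-process thread pool does not survive a restart, so this path trades MQ's durability for availability and is best treated as a last resort.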
In addition, every valid click generates a deduction order, which creates a large-volume storage problem. We currently shard MySQL by database and table, and may move to distributed storage such as HBase in the future. To keep the orders consistent with the accounting system, the big data platform extracts the daily increments, and Hive tasks perform reconciliation and monitoring.
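The comparison at the heart of such a daily reconciliation can be sketched in plain Python. This is not the Hive job itself: the row layout `(day, advertiser_id, amount)`, the function names, and the sample rows are hypothetical and only illustrate the check.

```python
from collections import defaultdict
from decimal import Decimal


def daily_totals(rows):
    """Sum amounts per (day, advertiser_id)."""
    totals = defaultdict(Decimal)
    for day, advertiser_id, amount in rows:
        totals[(day, advertiser_id)] += Decimal(amount)
    return totals


def reconcile(orders, account_entries):
    """Return the keys whose totals differ, with (order_total, account_total)."""
    order_totals = daily_totals(orders)
    account_totals = daily_totals(account_entries)
    mismatches = {}
    for key in order_totals.keys() | account_totals.keys():
        order_sum = order_totals.get(key, Decimal("0"))
        account_sum = account_totals.get(key, Decimal("0"))
        if order_sum != account_sum:
            mismatches[key] = (order_sum, account_sum)
    return mismatches


# Hypothetical sample rows: (day, advertiser_id, amount).
orders = [("2024-01-01", 1, "1.00"), ("2024-01-01", 1, "1.00"), ("2024-01-01", 2, "0.50")]
accounts = [("2024-01-01", 1, "2.00"), ("2024-01-01", 2, "1.00")]
mismatches = reconcile(orders, accounts)
print(mismatches)
```

Any non-empty result would be what the monitoring side alerts on, since it means the orders and the account ledger disagree for that advertiser and day.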
3.4 Technical solution of OLAP mass data report
Data reporting is also a core business of the advertising platform and is the basis on which advertisers and platform operators optimize and make decisions. Let’s first look at the layers of the ad data warehouse:
- Source data layer: the raw source data, including front-end and back-end logs collected into HDFS in real time, and MySQL business tables synchronized incrementally or in full.
- Data warehouse layer: contains dimension tables and fact tables, usually wide tables produced by cleaning the source data, such as behavior log tables, promotion wide tables, and user wide tables.
- Data mart layer: lightly aggregated summary tables, such as the ad performance table, the full-link user behavior table, and the user group analysis table.
- Data application layer: tables used directly by upper-layer applications, including the revenue reports produced by multidimensional analysis and the model features and profile data produced by Spark tasks.
This layering is similar in spirit to layered software architecture and improves the maintainability and reusability of the data.
Now for the challenges that application-layer reports face: there are many aggregation dimensions, including time period, ad slot, and promotion; a single table can reach the 10-billion-record level; and queries over time ranges must be answered in real time.
This part is maintained by the company’s big data department and uses open-source solutions. The offline part uses Kylin, with data stored in HBase. The real-time part uses Flink and Spark Streaming, with data stored in Druid.
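The idea behind answering multidimensional report queries quickly is to pre-aggregate the fact data so that a query becomes a lookup. The sketch below illustrates that idea in the spirit of precomputed cubes; it is not how Kylin is implemented, and the dimension names and sample facts are hypothetical.

```python
from collections import Counter
from itertools import combinations

DIMENSIONS = ("hour", "ad_slot", "campaign")   # illustrative dimension names


def build_rollups(facts):
    """Pre-aggregate clicks for every combination of dimensions."""
    rollups = {}
    for size in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, size):
            counter = Counter()
            for fact in facts:
                counter[tuple(fact[d] for d in dims)] += fact["clicks"]
            rollups[dims] = counter
    return rollups


facts = [
    {"hour": 9, "ad_slot": "home_top", "campaign": "c1", "clicks": 3},
    {"hour": 9, "ad_slot": "list_mid", "campaign": "c1", "clicks": 2},
    {"hour": 10, "ad_slot": "home_top", "campaign": "c2", "clicks": 5},
]
rollups = build_rollups(facts)
slot_clicks = rollups[("ad_slot",)][("home_top",)]        # clicks for one ad slot
hour_campaign_clicks = rollups[("hour", "campaign")][(9, "c1")]
print(slot_clicks, hour_campaign_clicks)
```

The number of combinations doubles with each added dimension, which is the usual trade-off of this approach: faster queries in exchange for more storage and build time.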
Closing remarks
This article has described the early-stage architecture and core technologies of an advertising system in detail. As the business evolves, the architecture becomes more complex, but big data storage, high concurrency, and high availability remain the central technical difficulties of the advertising business.
I will cover other topics in later articles, such as stability guarantees for the advertising system, extensible design of advertising strategies, and the system architecture of RTB real-time bidding. If you have any questions or suggestions about this article, feel free to leave a comment.
About the author: master’s degree from a 985 university, former Amazon engineer, now a technical director at 58
You are welcome to follow my personal official account, IT career advancement, for a steady stream of original articles!