This is Peng Wenhua’s 110th original article.

I previously shared the ins and outs of the DMP (Data Management Platform). The DMP supplies the underlying user data for advertising, the most lucrative business on the Internet. Baidu’s Phoenix Nest, Alibaba’s Alimama (which runs Taobao’s Zhitongche) and Tencent’s Guangdiantong are the three long-established online advertising platforms, and now there is also Toutiao’s Ocean Engine.

Alimama once generated 80% of Alibaba’s cash flow. Toutiao’s traffic is now rising too, and Ocean Engine brings Toutiao more than 100 billion in advertising revenue every year!

To put it simply, every advertisement we see on the Internet, and I do mean every one, has a DSP behind it.

Today I will share the architecture design of the DSP (Demand Side Platform), the core of the advertising system.

What is DSP?

The DSP is the demand-side platform for advertisers. It is the most important system in the advertising stack, comparable to the trading platform in an e-commerce system. The difference is that on a DSP the buyer is the advertiser purchasing display opportunities (ad slots), the seller is the owner of the ad slot (the media), and the product is the user’s click (a business opportunity). Put simply, the DSP is the super weapon that Internet companies with traffic use to monetize it!

The overall Internet advertising system looks like this:

As shown in the figure, the whole advertising system works as a platform, with advertisers on one side and the ad slots provided by the media on the other. Media that own ad slots connect to an SSP (Supply Side Platform) and list their inventory there. Advertisers who want to run ads create a delivery plan on the DSP and look for inventory. The ADN (ad network) and the ADX (ad exchange) then match the needs of both sides and deliver the ads.

So the requirements of the advertising system can be listed roughly as follows:

  • Provide advertisers with a list of the ad slots offered by the media
  • Let advertisers buy and display ads on demand
  • Report impressions and click performance to advertisers
  • Stop showing an advertiser’s ads when their budget runs out, and provide both parties with a list of costs
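
The last requirement, stopping delivery once the budget is spent while keeping a cost list, can be sketched as follows. This is only a minimal illustration and not any platform’s actual billing logic: the `BudgetLedger` class, its fields, and the amounts are all hypothetical.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class BudgetLedger:
    """Hypothetical per-advertiser ledger: tracks spend and records every charge."""
    budget: Decimal
    spent: Decimal = Decimal("0")
    charges: list = field(default_factory=list)  # cost list shown to both parties

    def can_serve(self, price: Decimal) -> bool:
        # Serve only if the next charge still fits in the remaining budget.
        return self.spent + price <= self.budget

    def charge(self, ad_id: str, price: Decimal) -> bool:
        if not self.can_serve(price):
            return False  # budget exhausted: stop showing this advertiser's ads
        self.spent += price
        self.charges.append((ad_id, price))
        return True


# Assumed numbers: a budget of 1.00 and a price of 0.40 per charge.
ledger = BudgetLedger(budget=Decimal("1.00"))
results = [ledger.charge("ad-1", Decimal("0.40")) for _ in range(3)]
```

With these assumed numbers the third charge is refused, because 1.20 would exceed the budget of 1.00. `Decimal` is used instead of floats so that money amounts do not pick up rounding errors.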

How should DSP data flow be structured?

Now that we have an overview of the whole advertising system, let’s zoom in on the DSP. Following the usual logic of building an information system, we first sort out the DSP’s business process, then divide up its functions, and only then work out the data logic and architecture.

We start with the business flow chart. Taking Baidu promotion as an example, the delivery process is as follows:

An advertiser’s workflow in Baidu Phoenix Nest goes like this: first create a plan, create units under the plan, then manage creatives, select keywords, bid on the keywords, and confirm that the plan is complete. Phoenix Nest then serves ads according to the rules the advertiser has set. Whenever someone searches for a keyword, the ad is shown to that user according to the rules. If the user clicks, the advertiser is charged. Finally, the statistics are shown on the advertiser’s page.
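
The plan, unit, keyword, and bid hierarchy described above can be sketched in a few lines. This is a toy model under stated assumptions, not Phoenix Nest’s real ranking or pricing: the class names, the "highest bid wins" rule, and charging the full bid on a click are all hypothetical simplifications.

```python
from dataclasses import dataclass, field


@dataclass
class Unit:
    """An ad unit: one creative plus its keyword bids (keyword -> bid price)."""
    creative: str
    bids: dict = field(default_factory=dict)


@dataclass
class Plan:
    """A delivery plan owned by one advertiser, containing one or more units."""
    advertiser: str
    units: list = field(default_factory=list)


def pick_ad(plans, query):
    """Return (advertiser, creative, bid) for the highest bid on the keyword."""
    candidates = [
        (unit.bids[query], plan.advertiser, unit.creative)
        for plan in plans
        for unit in plan.units
        if query in unit.bids
    ]
    if not candidates:
        return None
    bid, advertiser, creative = max(candidates)
    return advertiser, creative, bid


def serve(plans, query, clicked, stats):
    """Show the winning ad, charge on click, and update the statistics."""
    ad = pick_ad(plans, query)
    if ad is None:
        return None
    stats["impressions"] += 1
    if clicked:
        stats["clicks"] += 1
        stats["cost"] += ad[2]  # assumption: the click costs the full bid
    return ad


plans = [
    Plan("shop-a", [Unit("Fresh flowers today", {"flowers": 1.5})]),
    Plan("shop-b", [Unit("Roses, 20% off", {"flowers": 2.0, "roses": 1.0})]),
]
stats = {"impressions": 0, "clicks": 0, "cost": 0.0}
first = serve(plans, "flowers", clicked=True, stats=stats)
second = serve(plans, "cars", clicked=False, stats=stats)
```

Here the search for "flowers" is won by shop-b’s higher bid and the click is charged, while the search for "cars" matches no keyword and shows nothing. The `stats` dictionary stands in for the results page the advertiser sees at the end of the process.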

Now that the business process is clear, we can begin to map out the system architecture. The DSP actually covers:

  • Ad channel access: connecting to the ADX;
  • User tag data access: connecting to the DMP;
  • Ad strategy management: managing the delivery strategy for the materials advertisers put in, which raises the click probability from the demand side (the advertiser side);
  • Algorithmic capability: intelligently matching resources so that the right people see the right ads at the right place and time, which raises the click probability from the supply side (the user side);
  • Real-time capability: real-time monitoring and dynamic adjustment, so that no impression opportunity is wasted and not a single extra one is served;
  • Statistics and presentation: real-time calculation and statistics, plus long-term traceability and analysis.

The figure above shows the key capabilities a DSP needs. The bottom layer needs the ability to connect to ad channels and user data, the core is the strategy and algorithm capability, and the top layer must provide real-time monitoring and the presentation of statistics.

Now that we have the system architecture, let’s draw the data architecture. First, let’s sort out the thinking. User data comes from the DMP and ad inventory comes from the ADX, so neither is our concern. Algorithms and strategy each have their own team: those parts are transactional, and in any case they are not the data team’s responsibility. We only need to consider real-time monitoring and the presentation of statistics.

At the front there must be real-time data collection, plus user tag data from the DMP. In the middle are two applications, one for listing ad inventory and one for buying ads (RTB, real-time bidding), both of which now work by bidding (these are not part of the DSP and are included only to aid understanding). Statistics and presentation services come after them. The system looks something like this:

While sorting out the data architecture diagram, we hand the ad business applications in the middle over to the business development team. For real-time data collection you can use Nginx or Flume, but distribution must go through Kafka. Because money is involved, you may need to trace data far back in time, so the Lambda architecture is recommended; Kappa has a hard time reprocessing old data. For the real-time path, Flink + Kafka is recommended, with Redis for caching and HBase for persistence. Computed results should go straight into the real-time data warehouse. For the offline path you can use Hive + Spark. The real-time data warehouse can be ClickHouse (CK), Doris, Druid, and so on. For presentation, DataV, Davinci, and similar tools can be used. So here’s the data architecture diagram:
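
To make the Lambda recommendation concrete, here is a minimal sketch of its three roles: a batch layer that recomputes from the full log, a speed layer that covers only recent events, and a serving step that merges the two. It is plain Python with hypothetical data and function names. In the stack described above, Hive + Spark and Flink + Kafka would play the batch and speed roles.

```python
from collections import Counter

# Immutable master log of (event_time, ad_id, cost) records. Hypothetical data.
LOG = [
    (1, "ad-1", 0.5),
    (2, "ad-2", 0.3),
    (3, "ad-1", 0.5),
    (4, "ad-1", 0.5),
]


def batch_view(log, up_to):
    """Batch layer: recompute spend per ad from the full log up to a cutoff."""
    view = Counter()
    for ts, ad_id, cost in log:
        if ts <= up_to:
            view[ad_id] += cost
    return view


def speed_view(log, after):
    """Speed layer: cover only the events the last batch run has not seen."""
    view = Counter()
    for ts, ad_id, cost in log:
        if ts > after:
            view[ad_id] += cost
    return view


def query(ad_id, batch, speed):
    """Serving layer: merge both views to answer with up-to-date totals."""
    return batch[ad_id] + speed[ad_id]


cutoff = 2  # last event time covered by the most recent batch run
batch = batch_view(LOG, cutoff)
speed = speed_view(LOG, cutoff)
total_ad1 = query("ad-1", batch, speed)
```

Because the batch layer always recomputes from the retained log, an old billing period can be recalculated simply by rerunning `batch_view` over it. That is the traceability argument for preferring Lambda here.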

Of course, this architecture diagram is still very simple and leaves out many details. For example, what should we do when third-party data does not arrive in time? How should each metric be managed and calculated? How do we guarantee exactly-once processing, handle deep pagination, and so on at every stage? These are the problems the consumption (ad-spend) billing system has to solve.

  • Late data: rely on Flink’s built-in watermark mechanism, add retries along the pipeline, and wait a little longer;
  • Exactly-once: use two-phase commit, three-phase commit, or Paxos to guarantee end-to-end exactly-once processing;
  • Deep paging: bucket splitting + predicate pushdown + parallel queries.
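
The idea behind the watermark approach to late data can be illustrated without Flink. The sketch below is plain Python and does not use Flink’s API. The `WindowedCounter` class, the window size, and the delay are hypothetical. The watermark trails the largest event time seen by `max_delay`, and a window is finalized only once the watermark passes its end.

```python
from collections import defaultdict


class WindowedCounter:
    """Tumbling event-time windows closed by a watermark (illustrative only)."""

    def __init__(self, window_size, max_delay):
        self.window_size = window_size
        self.max_delay = max_delay        # how long we "wait a little longer"
        self.watermark = float("-inf")
        self.open = defaultdict(int)      # window start -> click count so far
        self.closed = {}                  # finalized windows
        self.dropped = 0                  # events whose window was already final

    def on_event(self, event_time):
        start = event_time - event_time % self.window_size
        if start + self.window_size <= self.watermark:
            self.dropped += 1             # too late: the window is already final
            return
        self.open[start] += 1
        # The watermark never moves backwards.
        self.watermark = max(self.watermark, event_time - self.max_delay)
        # Finalize every open window whose end is at or before the watermark.
        for s in sorted(self.open):
            if s + self.window_size <= self.watermark:
                self.closed[s] = self.open.pop(s)


counter = WindowedCounter(window_size=10, max_delay=5)
for t in [1, 3, 12, 8, 16, 4, 27]:       # 8 and 4 arrive out of order
    counter.on_event(t)
```

In this run the event at time 8 is late but still counted, because the watermark had only reached 7. The event at time 4 arrives after window [0, 10) has been finalized and is dropped. In a billing pipeline such dropped events would presumably be reconciled on the offline path rather than discarded.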

Real-time OLAP choices for DSPs at major companies

Baidu Phoenix Nest uses Doris (formerly Baidu Palo), which Baidu developed in-house and open-sourced in 2017. Later, the founder Ye Qian led his R&D team out to start a business. It is already a popular MPP-style real-time OLAP product. Their official account is DorisDB, which is worth following; Meituan, Xiaomi, JD.com and other large companies have shared their experience there, so I will not copy and paste it here. I have shared a little secret before about getting into a big company: find the products that only big companies use, learn them, and practice with them, and getting in becomes much easier. It’s a secret I don’t usually tell people!

Alibaba’s Youku uses a customized Kylin, which is very interesting. Kylin normally relies on precomputation, which does not suit real-time scenarios, but they keep data builds efficient through micro-batching plus precomputed rollup materialized views. With minute-level micro-batch computation in Blink and minute-level incremental builds in Kylin, they achieve real-time performance. It turned out very well.
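
The combination of micro-batching and precomputed rollups can be sketched as follows. This is not Kylin’s or Blink’s API, only a plain-Python illustration with hypothetical names and data. Each micro-batch of raw events is pre-aggregated to (minute, ad) granularity and merged incrementally into a rollup table, so queries read a few pre-aggregated rows instead of the raw log.

```python
from collections import defaultdict

# Rollup table: (minute, ad_id) -> [impressions, clicks]. Precomputed, query-ready.
rollup = defaultdict(lambda: [0, 0])


def ingest_micro_batch(events):
    """Aggregate one micro-batch of raw events and merge it into the rollup."""
    partial = defaultdict(lambda: [0, 0])
    for ts_seconds, ad_id, clicked in events:
        key = (ts_seconds // 60, ad_id)
        partial[key][0] += 1
        partial[key][1] += int(clicked)
    # Incremental merge: only the keys touched by this batch are updated.
    for key, (imps, clicks) in partial.items():
        rollup[key][0] += imps
        rollup[key][1] += clicks


def ctr(ad_id):
    """Click-through rate for one ad, computed from rollup rows only."""
    rows = [v for (minute, a), v in rollup.items() if a == ad_id]
    imps = sum(r[0] for r in rows)
    clicks = sum(r[1] for r in rows)
    return clicks / imps if imps else 0.0


# Two hypothetical micro-batches of (timestamp in seconds, ad_id, clicked).
ingest_micro_batch([(5, "ad-1", True), (30, "ad-1", False), (50, "ad-2", False)])
ingest_micro_batch([(65, "ad-1", False), (70, "ad-1", True), (119, "ad-2", True)])
```

Six raw events collapse into four rollup rows here. With real traffic the reduction is far larger, which is what makes minute-level incremental builds practical.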

Toutiao’s Ocean Engine uses Druid, a time-series-oriented database. Because all advertising data is log data, it is naturally ordered by time and a good match for Druid. Records can basically be appended one after another, which is quite efficient. Druid has very limited support for joins, so it places high demands on the scenario and the design.

By the way, I have also prepared introductory books on the DSP business, classic books on computational advertising, and Youku’s DSP practice write-up for you to take a look at. Reply “DSP” to the official account to download them.
