background
Online advertising is a common way of business realization in Internet industry. From the engineering point of view, the structure and implementation of advertising index directly determine the service performance of the whole system. Based on the search advertising system of Meituan Dianping, this paper discusses the engineering secrets of advertising system with readers.
Field problem
An advertising index must have the following basic features:
- Hierarchical index structure
- Real-time index updates
Hierarchical placement model
Generally, the advertising system can be abstracted into the following delivery model, and achieve retrieval, filtering and other processing logic.
There is a one-to-many relationship between the upper and lower levels of the hierarchy. An advertiser typically creates several promotion plans, each of which corresponds to a KPI with a longer cycle, such as a monthly budget and location. Multiple promotion units within a promotion plan are used for finer delivery control, such as the highest bid per click, daily budget, targeted conditions, and so on. Advertising creativity is the material used for advertising exposure, which can be subordinate to the level of advertisers or promotion plans according to the business characteristics.
Real-time update mechanism
The hierarchical structure can more accurately and timely reflect the advertising control needs of advertisers. Each layer of the placement model defines several fields for implementing various types of placement controls. Most of the fields in the advertising system need to support real-time updates, such as audit status, budget up-down status, etc. For example, when a promotion unit changes from publishable to suspended, if the change does not take effect in time in the index, it will result in a large number of invalid posts.
The industry research
At present, most of the production open source indexing systems are designed for general search engines, which basically cannot meet the above conditions.
- Apache Lucene
- Full text retrieval, dynamic script support; Implemented as a Library
- Real-time indexes are supported, but hierarchies are not
- Sphinx
- Full-text retrieval; Implementation as a full Binary is difficult to develop again
- Real-time indexes are supported, but hierarchies are not
As a result, the advertising industry is either customizing based on open source solutions or developing its own closed source systems from scratch. After considering the cost and benefit, we decided to design our own indexing system for the advertising system.
Index design
Engineering practice focuses on stability, scalability, high performance and other indicators.
Design decomposition
The design phase can be decomposed into sub-requirements.
Real-time indexes
The update flow of advertising scenes involves the real-time update of index fields and various attributes. In particular, the property fields related to the offline state need to be updated within several milliseconds, which has high requirements on real-time performance.
Index fields used for recall conditions can be updated with a lag, such as within seconds. Adopting a divide-and-conquer strategy can greatly reduce system complexity.
- Property field update: Directly modify the field value of the forward table, which can be completed in milliseconds
- Update of index field: it involves real-time calculation of update stream, inversion of index, etc., and only needs to be completed in seconds
In addition, index snapshots ensure data correctness by periodically switching full indexes and appending incremental indexes.
hierarchy
The main entities of the advertising model are Advertiser, Campaign, Adgroup and Creative, etc. Among them:
- Advertisers and promotion programs: Define the various status fields used to control advertising delivery
- AD group: describes AD related attributes, such as bid keywords, highest bid, etc
- Creativity: fields related to AD presentation, click, etc., such as title, creative address, click address, etc
Generally, advertising search and ranking are based on the granularity of advertising group, and the inverted index of advertising is also built on the level of advertising group. Referring to the concept of relational database, we can take the Adgroup as the main table (that is, an Adgroup is a doc) and build an inverted index to it. Take advertisers, promotion plans, etc., as auxiliary tables. Primary and secondary tables are associated by foreign keys.
- Search the related docID list from the inverted index by query criteria
- For each docID, relevant field information can be obtained from the main table
- Use the foreign key field to obtain the field information of the corresponding secondary table
In the retrieval process, the synchronization filtering of various field values is realized.
Reliable and efficient
In order to avoid performance jitter caused by dynamic memory management and garbage collection mechanism of Java virtual machine, C++11 was adopted as the development language. Although Java can use out-of-heap memory, copies of out-of-heap and in-heap data are still costly for high concurrent access. The project strictly follows the Google C++ Style and significantly lowers the programming barrier.
In the service scenario of read more than write less, the read performance must be prioritized. Retrieval is a memory search process, which is a computationally intensive service. To ensure high CONCURRENCY of CPU, it is generally designed as a lock-free structure. Can use “write read” and delay deletion technology to ensure efficient and stable operation of the system. In addition, the clever use of array structure, also further optimize the read performance.
Flexible extend
The relationship between main and secondary tables is relatively stable, and the field types in the table need to support extension, such as user-defined data types. Even inversion table types need to support extensions, such as geolocation indexes, keyword indexes, inversion indexes that carry load information, and so on. By inheriting interfaces, more customization functions can be achieved.
Logical structure
From a functional perspective, an Index consists of two parts: Table and Index. As shown in the figure above, Index implements the transformation from Term to the main table docID; Table implements the storage of forward row data, and implements the association between the primary Table and secondary Table through docID.
Layered architecture
The index library is divided into three layers:
- Interface layer: provides functions such as index construction, update, retrieval and filtering through API
- Ability layer: realize index function based on inverted table and forward table, is the core of the system
- Storage layer: Memory layout of index data and persistent storage to files
The index to achieve
This section describes the design details and challenges of each layer from the bottom up, starting with the storage layer.
Storage layer
The storage layer is responsible for memory allocation and data persistence. Mmap can be used to map virtual memory space, and memory and file synchronization can be realized by the operating system. In addition, MMAP makes it easy for external tools to access or verify data for correctness.
The storage layer is abstracted as an Allocator. Different allocators can be customized for different memory usage scenarios, such as requirements for memory continuity and whether memory needs to be reclaimed.
The following are all types of MMAP-based allocators, where “memory” refers to the virtual address space of the calling process. The actual code logic also involves complex Metadata management, which is not covered below.
Simple allocation strategy
-
LinearAllocator
-
Memory allocated contiguous address space, that is, a large block of memory; When the space needs to be expanded, new MMAP file mappings are adopted and the old file mappings are deferred
-
New mappings cause page table reloads, and large memory mappings cause performance jitter due to physical memory loads
-
Typically used for scenarios with relatively fixed space requirements, such as the bucket array of a HashMap
-
SegmentAllocator
-
To solve the performance jitter problem of the LinearAllocator expansion, you can divide the memory into segments. That is, each expansion involves only one segment to ensure stable performance
-
Fragmentation results in discontinuous memory space, but common application scenarios, such as storage of inverted indexes, are suitable for this method
-
The default segment size is 64MB
Intensive allocation strategies
Frequent data addition, deletion, and modification will result in a large amount of external fragmentation. With compression, the memory footprint is more compact, but the cost of moving objects is difficult to balance between performance and complexity. In engineering practice, multiple allocators that are more suitable for business scenarios are realized independently by referring to Linux physical memory allocation strategy.
- PageAllocator
- The page size is 4KB, and the idea of Buddy System is used for page allocation and recycling
- Page allocation is based on SegmentAllocator, which is segmented before paging
The processing of the partner allocator is briefly described here. For efficient management of free blocks, each level of order holds a free block FreeList. Set the maximum level order=4, i.e. from order=0, from low to high, the number of pages in each level order block is 1, 2, 4, 8, 16, etc. When distributing, find the minimum block that meets the condition first; If not, the larger block is looked for at the upper level and divided into two “partners,” one of which is allocated and the other placed on a FreeList at the lower level.
The following figure shows the state change before and after allocating a page-sized block of memory. Before allocating, the allocator looks for a FreeList starting with order=0 and does not find a free block until order=4.
Divide the free block into two partners with 8 pages, use the first half, and mount the second half to the FreeList of order=3; This process is repeated step by step until the required block of memory is returned, and the free block with page number 1 is hung on the FreeList to order=0.
When a block is released, it checks to see if its partner is free and merges the two free partners into a larger free block if possible. This is the reverse of the allocation process and will not be repeated.
While PageAllocator effectively avoids external fragmentation, it does not solve the problem of internal fragmentation. To solve the allocation problem of such small objects, object cache locator (SlabAllocator) is implemented.
- SlabAllocator
- The object cache is allocated based on PageAllocator, and the slab size is in pages
- Free objects are defined as multiple SlabManagers according to their memory size, and each SlabManager holds a PartialFreeList, which is used to place a slab containing free objects
Object memory allocation process, that is, get a slab containing free objects from the corresponding PartialFreeList and allocate objects from this slab. On the contrary, the releasing process is the reverse process of distribution.
To sum up, real-time index storage combined with PageAllocator and SlabAllocator effectively solves the problem of external fragmentation and internal fragmentation of memory management and ensures the efficient and stable long-term operation of the system.
Ability to layer
The capability layer implements basic storage capabilities such as forward and inverted tables, and supports flexible expansion of index capabilities.
Positive index
A Forward Index, also known as a Forward Index, is used to retrieve Doc contents by the primary Key, and is referred to as a Forward Table or Table. Unlike the search engine’s lined Table data structure, tables can also be used in NoSQL scenarios alone, similar to the Kyoto Cabinet’s hash Table.
Table not only provides operations such as adding, deleting, modifying, and querying by pressing the primary key, but also realizes functions such as searching, filtering, and reading with the inverted Table. As a core data structure, Table must support frequent field reads and various types of forward filtering, requiring efficient and compact implementation.
To support random access by docID, the Table is designed as a large array structure (data area). Each doc is an element of the array and has a fixed length. Variable-length fields are stored in the extension area (ext area), and their offset and length in the extension area are stored only in doc. Unlike the column storage of most search engines, the data area is stored in rows so that the cache between CPU and memory can be used for business scenarios to improve access efficiency.
In addition, for NoSQL scenarios, primary key to docID mapping (IDX files) can be implemented via HashMap, which enables random primary key to document access. Since the docID list of inverted indexes has direct access to the forward table, this IDX is not used for inverted searches.
Inverted index
Also known as Inverted Index, the contents of a document are retrieved by Keyword. The Indexer interface is abstracted here to support complex business scenarios, such as algorithmic coarse-ordering logic for traversing index tables.
A specific Indexer only needs to implement each interface method and register the type with an IndexerFactory. An Indexer instance can be obtained through the factory’s NewIndexer method. The class diagram is as follows:
Three commonly used indexers are currently implemented
- NoPayloadIndexer: The simplest inverted index. The inverted table is a simple docID list
- DefaultPayloadIndexer: In addition to docID, the inversion table stores the payload information of the keyword in each doc. For business scenarios, POI can be stored at static quality points or highest bids per Node granularity. In this way, some inversion preference filtering can be done before accessing the forward row table
- GEOHashIndexer: A geographically based Hash index
The above indexers have similar design ideas, and only two common characteristics are described:
- Term: store keywords, signature hashes, offset and length of Posting files, etc. Different from Lucene’s prefix-compressed tree structure, it is implemented as a hash table, which can guarantee stable access performance although it wastes space
- Posting: Stores information such as the docID list and Payload. The retrieval operation is to scan the inverted list in sequence, and perform some filtering based on Payload or Boolean operation between inverted chains in the scanning process. How to make full use of cache to achieve high-performance index reading is an important factor to be considered in the design and implementation. In this paper, segmented memory allocation based on segmentAllocator is implemented to achieve a delicate balance between efficiency and complexity
Lucene’s Skip List structure is not used for business reasons, because advertising scenarios don’t have as many Doc’s as search engines and are usually a single inverted list operation. In addition, if the number of doc grows too fast and the index changes frequently, it can be considered to build a B+ tree structure for the elements in the inverted list to achieve rapid location and modification of the inverted elements.
The interface layer
The interface layer interacts with the outside world through apis and hides the details of internal processing. Its core function is to provide retrieval and update services.
The configuration file
The configuration file is used to describe the Schema of indexes, including Metadata, Table, and Index definitions. The format and content are as follows:
As you can see, Index is built into the Table, but it is not mandatory. The definition of each field in a Table is the core of the Schema. When the Schema changes, such as adding fields or indexes, the indexes need to be rebuilt. Space is limited and the details of the definition are not expanded here.
Retrieve the interface
Retrieval consists of lookup, which produces a collection of docID found, and filtering, which performs basic and business filtering on doc one by one.
- Search: Returns a normalized filtered ResultSet, which internally combines calls to DoSearch and DoFilter
- DoSearch: Query doc, returns the original ResultSet, but does not filter the results forward
- DoFilter: Filters the ResultSet returned by DoSearch
Generally, you only need to call Search to achieve all the functions; DoSearch and DoFilter can be used to implement more complex business logic.
The syntax description of the retrieval is as follows:
/{table}/{indexer|keyfield}? query=xxxxxx&filter=xxxxx
The first part is the path, which specifies the table and index. The second part is parameters. Multiple parameters are separated by & and are in the same format as URI parameters. Query, filter, Payload_filter, and Index_filter are supported.
The query parameter defines the retrieval rule for the inverted index. Currently, only single index can be retrieved. Index_filter can be used to retrieve combined indexes. Supports Boolean operations such as AND, OR, AND NOT, as shown below:
query=(A&B|C|D)! E
The query syntax tree generates code based on Bison. Aiming at the docID union operation of multiple terms commonly used in business scenarios, the temporary storage for the doc merge result of two adjacent terms is eliminated by modifying the Bison grammar rules, and the DOC of the previous term is directly merged into the current result set. This optimization greatly reduces the overhead of temporary objects.
The filter parameter is used to filter the field values of the table. Multiple key-value pairs are filtered by;. Segmentation, support single – value field operation and multi – value field set operation.
The Payload_filter parameter defines the Payload index filtering function. Currently, only single-value field relational computation is supported. Multiple key-value pairs are defined by; Segmentation.
The detailed filtering syntax is as follows:
In addition, index filtering defined by the index_filter parameter will directly operate on the inverted chain. Because constructing the retrieved data structure is more complex than forward filtering, this parameter is only applicable to scenarios where the recalled docList is particularly long but the docList filtered by index is very short.
The result set
The implementation of ResultSet refers to the Java.sql.ResultSet interface. The result set is traversed by cursor, using the overhead of frequent calls to inline functions.
Implemented as a C++ template class, the main interface is defined as follows:
- Next: Move cursor to the Next doc, returning true on success, false otherwise. Returns false if it is already the last record in the collection
- GetValue: Reads the value of a single-valued field whose type is specified by the generic parameter T. Return the default def_value if it fails to get
- GetMultiValue: Reads the value of a multi-value field and returns a pointer to an array of values, the size of which is returned by the size argument. Return null on read failure, size equals 0
Update the interface
Updates include adding, modifying, and deleting doc files. Parameter type Document: indicates a DOC record. The content is the content of the DOC field to be updated. Key is the name of the field, and value is the corresponding value of the field. Returns 0 on success or non-0 on failure. Error messages can be obtained via the GetErrorString interface.
- Add interface Add: Add new doc to Table and Index
- Interface modification Update: Modifies existing doc contents, including Table and Index changes
- Delete interface Delete: Deletes an existing DOC, which involves deleting data from Table and Index
The update service connects to the real-time update stream to realize the real real-time advertising index.
Update the system
In addition to the index implementation mechanism described above, the production system also needs to get through the update flow of online delivery engine and merchant side, budget control, anti-cheating, etc.
Challenges and Goals
The main work of data update system is to aggregate, tiled and calculate the original multi-dimension information, and finally output the dimension and content required by the search engine online.
Upstream triggering in business scenarios can be very irregular. To avoid jitter in the update stream, the throughput of real-time updates must be optimized to leave sufficient performance margin to deal with the triggered spikes. In addition, updating the system involves many-to-many dimensional transformation, and maintaining the maintainability of logic such as computation and update trigger is the main challenge facing the system.
Throughput design
Although updating the system requires a lot of computing resources, it is still IO intensive because it requires queries against dozens of external data sources. Optimizing the external data source access mechanism is the primary goal of throughput optimization.
Here, the classic batch method is adopted, that is, within the cluster, all the data sources that can be batch queried are collected and processed by a specific worker. In a short time, the worker aggregates data sources and returns them to each data flow that needs data one by one. There can be multiple workers dealing with a data source, which are collected to the same worker for batch query according to the same type of query and then returned. After this partition, a series of logical optimizations can be made to improve throughput.
Layered abstract
In addition to generating merchant side release model data, the update system also needs to deal with filtering for various business scenarios and all kinds of exclusive information for advertising presentation. Business change may involve logical adjustment of multiple data sources. Only simple and clear abstraction can cope with the complexity of business iteration.
In engineering practice, the external data source is abstracted into a unified Schema, which not only makes the data source transparent to the business logic, but also realizes complete verification with the help of the compiler and type system, so that more problems can be solved in advance at the compilation time.
Data is defined as abstract types such as Table, Record, Field and Value, and is defined as Scala Path Dependent Type, which is convenient for the compiler to verify the logic inside the program.
Reusable design
In multi-pair multi-dimensional computing scenarios, the processing function (DFP) for each field should be as simple and reusable as possible. For example, the DFP for each output field (DF) only describes the required source data field (SF) and the calculation logic for that field, and does not describe the required query or routing relationship between SF(1) and SF(n).
In addition, DFP is not bound to the hierarchy of the final output. The hierarchical binding is done by defining the fields contained in the output message, that is, defining the level at which the primary key of the message is defined, and binding a series of DFPS to the message.
Thus, the DFP simply describes the logic for generating the field content. If a business scenario needs to tile the same DF to different levels, simply reference the same DFP on the output messages of that level.
triggering
The update system needs to receive the status change of the data source, determine whether to trigger the update, which index fields need to be updated, and finally generate the update message.
To implement automatic triggering of data source changes, the following information needs to be described:
- Associative relation between data: implement the syntax to describe the associative relation, that is, describe the associative relation when describing the external data source, and the routing of the subsequent field query will be processed by the framework
- DFP dependent SF information: Simple DFP processed only by monad segment can be configured to solidify the dependent SF at compile time; For a complex DFP with multiple data sources, the SF of the DFP dependency can be obtained through source code analysis without the need for users to maintain the dependency
The production practice
Early search advertising is based on the natural search system architecture, with the development of business, according to the characteristics of advertising system transformation. The new AD index, which implements a pure real-time update and hierarchical structure, has been launched on Meituan Dianping search ads. This architecture is also applicable to various non-search business scenarios.
System architecture
As the core of the whole system, the advertising retrieval and filtering service (RS) based on real-time index undertakes the functions of advertising retrieval and various business filtering. Daily service iterations can be completed through the upgrade index configuration.
In addition, to improve the throughput of the system, several modules have implemented server asynchronization.
Performance optimization
The following is the performance curve of the monitoring system. The number of doc in the index is in millions and the delay is in milliseconds.
Subsequent planning
In addition to further performance optimizations and functionality extensions, we plan to implement several features to facilitate the integration of real-time index with other production systems.
JNI
With JNI, the Table is treated as a separate NoSQL to provide local caching for Java. For example, in the real-time estimation module of advertising system, Table can be used to store advertising features used by the model.
SQL
Provides SQL syntax, provides simple SQL support, and further lowers the barrier to use. Provides JDBC to further simplify Java calls.
The resources
- Apache Lucene http://lucene.apache.org/
- Sphinx http://sphinxsearch.com/
- “Understanding the Linux Virtual Memory Manager” https://www.kernel.org/doc/gorman/html/understand/
- Kyoto Cabinet http://fallabs.com/kyotocabinet/
- GNU Bison https://www.gnu.org/software/bison/
Author’s brief introduction
Cang Kui: Architect of advertising platform search advertising engine group, leading the design and implementation of real-time advertising index system. Good at C++, Java and other programming languages, have in-depth research on asynchronous system, background service tuning and so on.
Xiao Hui: Advertising platform search advertising engine group core development, responsible for the design and implementation of real-time update stream. Experimenting with Scala on advertising platforms and using it in large-scale engineering practice.
Liu Zheng: The person in charge of search advertising engine group of advertising platform, with many years of experience in Internet background development, and has led many system reconstruction.
CAI Ping: The person in charge of the review side of the advertising platform search advertising engine group, in charge of the architecture and optimization of the review side system.
, recruiting
Interested in Linux background development, computing advertising, high performance computing, distributed systems and other interested friends, please contact us via email [email protected]. If you are interested in our team, you can follow our column.