Hello everyone, I am Tong Yan Wuji, an engineer who does not always stick to his day job. I believe that “real knowledge comes from practice, and life is simpler than it looks,” and I yearn for freedom.

Introduction

Anyone who often browses news in the Toutiao app will notice that, after a few swipes, a product advertisement appears between two news items. These ads promote products from Pinduoduo, JD.com and other platforms, enticing you to click and buy.

Sure enough, “the end of the universe is selling goods.” But have you ever wondered how those ads end up on your phone screen, and on everyone else’s?

To answer this question, let’s look at the relationship between product advertising and news. News apps produce articles, not commodities, so they need to obtain commodity data from other providers. The commodity providers (JD.com, Pinduoduo, Taobao, etc.) in turn hope to use the news app’s traffic to expose their commodities and increase sales. How the two sides settle payment is not the point of this discussion.

What we are talking about is designing a system that can connect to N commodity providers and, at the same time, deliver their commodities to mobile app users.

The “Bronze” system design

Simplified, the system has three directions of data flow (following the map convention “north at the top, south at the bottom, west on the left, east on the right”):

  1. North to south: CP (content provider) data, such as JD.com e-commerce, Autohome and Ctrip hotels, passes through the system’s validation, processing and other steps, and is finally persisted to the storage layer;
  2. East: a web interface lets system operators manage commodity data manually, for example tagging certain commodities as “promotion” or querying commodity details by commodity ID;
  3. West: the commodity data is exposed to end users in a personalized way, based on recall strategies, user profiles, models, collaborative filtering and so on.

Careful readers may already have spotted the problem.

Different CPs carry different commodity data, for example: JD.com e-commerce (shop name shop_title, commodity price, main picture main_photo, shop_URL, etc.), Autohome (model, engine, seat_capacity, discount price, etc.), and Ctrip hotels (country, city, location, star, review_score, etc.).

  1. Because each CP’s commodity data is different, the validation, processing and other logic must understand each CP’s specific fields and is written separately for every CP; the web interface is likewise built around specific fields.

    One day the product manager says, “For KPI reasons, we need to add a field,” and both the front end and the back end have to change their code. Another day the product manager cheerfully says, “We have signed another CP,” which means new CP commodity data must be brought in and the system adapted yet again. As the saying goes, “pull one hair and the whole body moves.”

    Before long, the programmers are ready to take a knife to the architect and the product manager.

  2. The storage layer is one big wide table that holds the data of every CP. Every time a field is added, every other CP sees it and has to fill it with null, which causes serious data redundancy; and as the commodity data grows, query performance slowly degrades.
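To make the coupling concrete, here is a minimal Python sketch of what per-CP validation tends to look like in a design like this. The function name `validate_bronze`, the CP keys and the exact checks are hypothetical; only the field names are taken from the examples above.

```python
def validate_bronze(cp: str, item: dict) -> bool:
    """Hard-coded, per-CP validation: every branch knows concrete field names."""
    if cp == "jd":
        return (isinstance(item.get("shop_title"), str)
                and isinstance(item.get("price"), (int, float)))
    if cp == "autohome":
        return (isinstance(item.get("model"), str)
                and isinstance(item.get("seat_capacity"), int))
    if cp == "ctrip":
        return (isinstance(item.get("city"), str)
                and isinstance(item.get("review_score"), (int, float)))
    # A new CP means a new branch here, and again in processing and in the web UI.
    raise ValueError(f"unknown CP: {cp}")
```

Every new field or new CP forces an edit to this function and to its counterparts in the other stages, which is exactly the pain described above.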

The question

When one field or one CP is added, how can we decouple the front end from the back end and change as little code as possible?

Stop and think about what makes the system so tightly coupled that it’s hard to scale.

The answer is the data. Because CP data differs, every stage of the system has to understand the meaning of specific fields in order to respond correctly. As the saying goes, “to kill a snake, strike it seven inches from the head”: if every stage of the system could avoid dealing with CP data fields directly, wouldn’t the problem be solved?

At long last, it appears

“Not dealing with CP data fields directly” means dealing with them indirectly, and indirectly means finding a middleman to act as an agent. Why, this is just like buying and selling second-hand homes. If a buyer had to talk to every seller one by one, the buyer would run their legs off and talk themselves hoarse. This is where the real estate agency comes bustling in. It tells the sellers: list your house with us and we will help you sell it. The sellers are happy and have nothing to worry about; the buyers are happy too, because they can go straight to the agency to choose a house, saving time and effort.

So who is the middleman in our system design?

Metadata: here it comes.

So what is metadata? In 2021, Facebook changed its company name to Meta, after the “metaverse”. With the term that popular, the “meta” prefix should feel familiar to everyone. Simply put, metadata is data that describes data.

In our system, metadata is the structured data extracted from the many data fields of the CPs to define their characteristics and content. For example: price = 100 yuan, product name = watch, product coupon = 2 yuan off orders over 50 yuan, description = watch with display screen, store name = the first store in Southern Jiangnan, positive review rate = 98%, and so on. Together, the values of these metadata items describe one concrete product.

The “Epic” system design

With a middleman like metadata, we can redesign the system.

  1. North to south: the validation, processing and other logic applied to each CP’s data is no longer written separately per CP; instead, the data is interpreted through metadata and then flows through unified validation, processing and other logic.
  2. East: the web interface no longer needs to understand the specific fields of each CP either; instead, it understands the keys and values of the metadata and renders the page in a unified way, so that pages can be “cookie-cutter”.
  3. West: for personalized recommendation, data is no longer recalled from the big wide table but from each CP’s own storage table. How do we know which CP’s data to recall? From the metadata definition.

Now, with this redesigned system, let’s go back and check: have we solved the earlier pain points?

  • If a CP adds a new field, only that CP’s storage table gets a new column, and other CPs’ storage tables are unaffected. The CP’s metadata definition also needs a rule for the new field;
  • If the system brings in a new CP, it only needs to create that CP’s metadata definition and its storage-layer table.
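The two cases above can be sketched in Python as a small registry. This is only an illustration under assumptions: `CommodityDefinition`, `REGISTRY`, `register_cp`, `add_field`, `recall_table` and the table names are all hypothetical, and a real system would keep definitions in persistent storage rather than in a module-level dictionary.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class CommodityDefinition:
    table: str                                            # per-CP storage table (hypothetical name)
    fields: Dict[str, str] = field(default_factory=dict)  # field name -> metadata type


REGISTRY: Dict[str, CommodityDefinition] = {}


def register_cp(cp: str, table: str, fields: Dict[str, str]) -> None:
    """Bringing in a new CP: one definition plus one storage table."""
    REGISTRY[cp] = CommodityDefinition(table=table, fields=dict(fields))


def add_field(cp: str, name: str, meta_type: str) -> None:
    """Adding a field touches only that CP's definition."""
    REGISTRY[cp].fields[name] = meta_type


def recall_table(cp: str) -> str:
    """The definition tells the recall stage which table to read."""
    return REGISTRY[cp].table


register_cp("jd", "jd_commodity", {"title": "text", "price": "amount"})
register_cp("ctrip", "ctrip_commodity", {"city": "text", "star": "number"})
add_field("jd", "coupon", "text")  # ctrip's definition is untouched
```

The validation, processing and rendering code never appears in this sketch, which is the point: those stages read the registry instead of being edited.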

It can be seen that with metadata as the intermediary, the front and back ends are completely decoupled, and adding another CP’s data requires far fewer code changes in each stage. Well done, congratulate yourself on a qualitative leap.

So how should the metadata definition be defined so that it unifies every stage?

How exactly is the metadata definition defined?

Approach

For the business system in the example above, here is one way to design the metadata definition. There are certainly better designs out there, and I look forward to your comments.

  1. Commodity type. This corresponds to the concept of a CP. Each CP’s data is ultimately recommended to end users, so it can be understood as a commodity.

  2. Metadata types. There are many of them, but no matter how many new CPs arrive, the set of types stays small: text, country, number, amount and so on. To put it in perspective, there are billions of people in the world, but they are commonly grouped as either men or women.

  3. Query operation types for metadata. Take price: end users want to query JD.com commodities priced within [100, 200] yuan, i.e. SELECT * FROM jd WHERE price BETWEEN 100 AND 200;

    In general terms: select commodity information from a CP’s commodity table, where a specific field matches the query condition. Some readers will ask: the query is so simple, is there any need to generalize it?

    • With MySQL it really is not complicated. But if the storage layer is not MySQL but something else, such as Elasticsearch or HDFS, the API is not so simple!
    • Also, the sample statement covers only a single field and a single CP. As app users, our filter conditions when choosing goods are far more complex. Once query operation types are defined, query statements can be generated generically, which saves writing a lot of code.
  4. Validation rule types for metadata. For example, if the commodity price is of type amount, which must be a floating-point number, then the validation rule for amount is that the value must be of type float.

    This means that whatever the commodity type and whatever the field is named, once its metadata type is determined, its rules are determined too.
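As a rough illustration of point 3, the Python sketch below generates a query from a list of conditions instead of hand-writing one per field and per CP. Everything here is an assumption made for the example: the operation names (`eq`, `range`, `like`), the `ALLOWED_OPS` mapping, the `build_query` function and the `?` placeholder style (the one used by Python’s sqlite3 module; other drivers use a different marker).

```python
# Which query operations each metadata type allows (hypothetical choice).
ALLOWED_OPS = {
    "amount": {"eq", "range"},
    "number": {"eq", "range"},
    "text": {"eq", "like"},
}


def build_query(table, field_types, conditions):
    """Build a parameterized SELECT from (field, op, value) conditions.

    table and field names are expected to come from the declared metadata
    definition, not from user input; values are passed as parameters.
    """
    clauses, params = [], []
    for name, op, value in conditions:
        meta_type = field_types[name]  # KeyError if the field is not declared
        if op not in ALLOWED_OPS[meta_type]:
            raise ValueError(f"{op!r} is not allowed on {meta_type} field {name!r}")
        if op == "eq":
            clauses.append(f"{name} = ?")
            params.append(value)
        elif op == "range":
            low, high = value
            clauses.append(f"{name} BETWEEN ? AND ?")
            params.extend([low, high])
        elif op == "like":
            clauses.append(f"{name} LIKE ?")
            params.append(f"%{value}%")
    where = " AND ".join(clauses) if clauses else "1 = 1"
    return f"SELECT * FROM {table} WHERE {where}", params


sql, params = build_query(
    "jd",
    {"price": "amount", "title": "text"},
    [("price", "range", (100, 200)), ("title", "like", "album")],
)
```

A different storage layer would need its own builder with the same inputs, so the calling code would not have to change.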

Details

With this approach in mind, we arrive at a metadata definition declaration.
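The declaration itself is not reproduced in this text, so the Python sketch below is only a guess at what one could contain, built from the four ingredients listed above (commodity type, metadata type, query operations, validation rules). The structure, the key names and the `describe` helper are hypothetical; the field names and the 64-character limit on title come from the JD.com example.

```python
# Metadata types: a small, fixed set shared by every CP.
METADATA_TYPES = {
    "text":   {"python_type": str,   "query_ops": ("eq", "like")},
    "number": {"python_type": int,   "query_ops": ("eq", "range")},
    "amount": {"python_type": float, "query_ops": ("eq", "range")},
}

# One declaration per commodity type (CP); fields only name a metadata type.
JD_DEFINITION = {
    "commodity_type": "jd",
    "fields": {
        "title": {"type": "text", "max_length": 64},
        "price": {"type": "amount"},
        "shop_title": {"type": "text"},
        "score": {"type": "number"},
    },
}


def describe(definition):
    """Return (field, metadata type, allowed query ops) for each declared field."""
    return [
        (name, spec["type"], METADATA_TYPES[spec["type"]]["query_ops"])
        for name, spec in definition["fields"].items()
    ]
```

With a shape like this, `describe(JD_DEFINITION)` tells any stage what it may do with a field without that stage knowing what “shop_title” means.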

Based on this metadata definition declaration, every part of the system can unify its business logic without understanding the specific data fields of any specific CP. That sentence is a bit of a mouthful; no matter, let’s break it down piece by piece:

For example, take a JD.com e-commerce product {title = exquisite photo album, price = 50, shop_title (store name) = Banyuyanyungang grocery store, score = 10, etc.}. How does each stage apply the metadata definition to do its part of the system’s work?

  1. Validation: when this data enters the system, first determine its commodity type, then validate it against the metadata definition declaration; only data that satisfies the validation rules moves on to the next stage.
    • title = exquisite photo album: the value “exquisite photo album” is a String and shorter than 64 characters, so this field passes.
    • If the JD.com commodity has 10 fields but the metadata definition declaration has only 9, the actual commodity data contains an undeclared field; it fails validation and is discarded.
  2. Processing: raw CP data can be wrapped according to the metadata definition declaration, adding the business’s own flavor.
  3. Web presentation: ignore which CP or which field is involved, and render by metadata type so that pages look uniform.
    • The amount type can be rendered as “number + currency”, e.g. $10, $20;
    • The image type can be rendered left-aligned or right-aligned;
    • The latitude-and-longitude type can be rendered directly as a small 3:2 tile showing the specific location.
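The validation and presentation steps above can be sketched in Python as follows. This is a simplified illustration, not the system’s actual code: `DEFINITION`, `check_value`, `validate` and `render` are hypothetical names, only the text and amount types are covered, and treating every declared field as required is an assumption the text does not state.

```python
# Hypothetical declaration for two fields of the JD.com example.
DEFINITION = {
    "title": {"type": "text", "max_length": 64},
    "price": {"type": "amount"},
}


def check_value(value, rule):
    """Apply the rule of a metadata type, regardless of the field's name."""
    if rule["type"] == "text":
        limit = rule.get("max_length")
        return isinstance(value, str) and (limit is None or len(value) < limit)
    if rule["type"] == "amount":
        return isinstance(value, float)  # the rule stated earlier: amount must be a float
    return False  # metadata type not covered by this sketch


def validate(item, definition):
    """Discard items with undeclared fields, then check each declared field."""
    if set(item) - set(definition):
        return False  # more fields than the declaration knows about
    # Assumption: every declared field must be present.
    return all(
        name in item and check_value(item[name], rule)
        for name, rule in definition.items()
    )


def render(value, meta_type):
    """Render by metadata type only; amounts become currency plus number."""
    if meta_type == "amount":
        return f"${value:g}"
    return str(value)
```

Note that under the strict float rule, a price sent as the integer 50 would be rejected; a real system might coerce it during processing instead.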

That’s a wrap, with more to come

Writing all this up was not easy. Dear readers, if you found it enlightening or useful, please give it a like, and comments are welcome too. How can the storage layer be designed to support diverse queries, scalability and performance? See you next time.