Introduction | Search and recommendation are the two main ways users access information, and at Shell (Beike) they are also the main means of helping customers find a home. What are the similarities and differences between the two? Can they be implemented on the same architecture? What are the benefits of a unified architecture? This article is organized from a talk by Gao Pan, head of platform architecture in the Shell Search and Recommendation Department, given at a Cloud + Community online salon, and is shared here in the hope of exchanging ideas with readers.

[Click here to watch the full live replay](https://cloud.tencent.com/developer/salon/live-1263?channel=salonbanner)

I. Shell search and recommendation usage scenarios

1. Matching and connecting people, houses, and customers

Shell provides everyone with housing and home-related services. Buying a house is a very important and complicated matter that takes a long time to complete; it is not like buying books or clothes online. The process of buying a house usually involves an offline broker, commonly known as an "agent".

So Shell's main business scenario is the connection and matching of people, houses, and customers. "People" refers to brokers, "houses" to housing listings, and "customers" to our C-end users.

The connection and matching among the three give rise to several core search scenarios. For the "people-customer" connection, for example, we have a customer-source search system (brokers searching for customers) and a broker search system (customers searching for brokers).

The "people-house" connection mainly corresponds to B-end house search, that is, house search provided to brokers. For example, when you go to an offline Lianjia store and tell a broker what kind of house you want, the broker usually finds suitable listings for you through the B-end house search system.

B-end search is more complex than C-end search because it is built specifically for experienced brokers. It is a separate search system covering B-end house search for new houses, second-hand houses, rentals, home decoration, overseas properties, and other scenarios. All of these belong to the "people-house" connection.

The "house-customer" match is the C-end search and recommendation we are all familiar with. Whether on the Shell app, the PC site, or the mini program, you will often see channels such as second-hand houses, new houses, rentals, overseas properties, and map search, as well as all kinds of recommendation pages: home-page recommendation, related recommendation, "guess you like", and so on.

For us, the C-end is now the more core scenario, because C-end search and recommendation directly affect the company's online business-opportunity conversion rate. We need to keep optimizing the search and recommendation results to improve click-through rate, conversion rate, and so on, so the introduction below will focus mainly on the C-end.

To better support these core business scenarios, as a search and recommendation platform we mainly focus on three things: efficiency, cost, and stability. Efficiency includes both house-customer matching efficiency and R&D iteration efficiency; cost includes personnel cost and machine cost; and stability means the service must guarantee better than 99.99% availability.

2. Scenario example

The picture below shows the common search and recommendation scenarios you can see on the Shell app: the main search box, second-hand houses, new houses, rentals, overseas properties, must-see listings, commercial offices, transaction queries, community search, map search, and so on.

If you enter a channel, such as the second-hand housing channel, and type in the name of a residential community, a business district, and so on, it will return the results you want.

If you do not enter a search channel but scroll down on the home page instead, you enter the recommendation feed: without entering any keywords, it recommends communities, houses, and other items you might be interested in.

3. Scenario Overview

As a platform, we enable many scenarios beyond our core business; the search platform currently serves more than 500 scenarios.

C-end search includes the channels mentioned above, such as new houses and rentals, and currently carries 60% of Shell's online business opportunities. B-end search includes house search, customer search, decoration search, and so on. In addition, we also support the search needs of many other internal business units, such as the contract-signing platform, the trading platform, personnel and administration, and so on.

On the recommendation side, more than 300 scenarios are enabled, mainly on the C-end, including second-hand houses, new houses, rentals, and so on, accounting for 15% of online business opportunities. The main scenarios include home-page recommendation, related recommendation, "guess you like", feed streams, and so on.

Like many companies, search and recommendation at Shell used to be developed separately by two different teams, and the overall code architectures were very different. Therefore, I will first introduce the evolution of the two platforms separately, and then describe how the search and recommendation architectures were unified.

II. Evolution of the Shell search platform architecture

The Shell search platform has gone through four main stages: search service, search platform, search cloud platform, and search middle platform.

In 2017 it was just a simple search service, mainly used for Lianjia's second-hand house search. As the company's business grew rapidly, many other business lines also needed search capabilities. Following the principle of not reinventing the wheel, we turned the search service into a platform, opened up its capabilities to empower each business line, and it became a search platform.

By 2018, the search platform had connected more than 100 businesses and handled 500 million requests a day.

After becoming a search platform, we found that more and more businesses were being connected, and each one took a certain amount of time to onboard. As a result, most of our time went into business integration, leaving little time for iterating on the platform technology itself. In the long run that makes it hard to build up and consolidate technology, which is very bad both for the platform and for the people on the team.

Therefore, in 2019 we streamlined the business-onboarding part of the original search platform and made it online, productized, and self-service, upgrading it into a search cloud platform.

By 2019, the search cloud platform had connected more than 300 services and handled 1 billion requests a day. With the cloud platform, business teams can complete most of the onboarding and launch work by themselves, which freed up most of our search R&D manpower and let us devote more R&D resources to optimizing search quality and stability.

So in 2020 the search cloud platform was further upgraded into a search middle platform; by now it has connected more than 500 businesses and handles 2 billion requests a day. As you can see, the architecture of the whole system has kept evolving iteratively along with the business.

1. Stage one: a simple search service

The initial architecture of the search service was very simple: SolrCloud at the bottom, with two services on top, a write service and a query service.

The write service provided full and incremental data updates. The query service contained simple query parsing, recall, and ranking. On top of them was a unified API layer providing write, read, and configuration-change interfaces. It was a very simple search service.

2. Platform stage

After upgrading to a platform, in order to reduce the cost of business onboarding and connect business data quickly, we made a big improvement to the data flow: data changes on the business side can be captured directly from the MySQL binlog and synchronized to the search platform. The underlying engine was also upgraded to an Elasticsearch (ES) cluster.
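
To make the data flow concrete, here is a minimal sketch of the idea: a row change captured from the MySQL binlog is turned into an index operation on the search side. The event and indexer types below are hypothetical placeholders rather than Shell's actual classes; a real pipeline would typically sit behind a binlog connector or a message queue.

```java
import java.util.Map;

/** Hypothetical row-change event parsed from the MySQL binlog (sketch only). */
record RowChangeEvent(String table, String op, String primaryKey, Map<String, Object> row) {}

/** Hypothetical abstraction over the search platform's write service / ES cluster. */
interface SearchIndexer {
    void upsert(String index, String id, Map<String, Object> doc);
    void delete(String index, String id);
}

/** Routes binlog changes of a business table into the corresponding search index. */
class BinlogSyncHandler {
    private final SearchIndexer indexer;
    private final String index;

    BinlogSyncHandler(SearchIndexer indexer, String index) {
        this.indexer = indexer;
        this.index = index;
    }

    void onEvent(RowChangeEvent e) {
        switch (e.op()) {
            // INSERT and UPDATE both become an upsert of the full document.
            case "INSERT", "UPDATE" -> indexer.upsert(index, e.primaryKey(), e.row());
            // DELETE removes the document, so off-market listings stop appearing.
            case "DELETE" -> indexer.delete(index, e.primaryKey());
            default -> throw new IllegalArgumentException("unknown op: " + e.op());
        }
    }
}
```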

The query service was split into two layers: the upper layer is the effect-oriented query service, SearchService, which includes query parsing, recall, and ranking; the lower layer is a basic query service, BasicSearch, which connects directly to the ES cluster and performs basic recall.

Businesses that do not need special recall or ranking policies can query BasicSearch directly. On top there is a unified gateway that collects all traffic, authenticates requests from every business side, and then dispatches them to the lower-level services.

As mentioned above, after becoming a search platform, more and more businesses were onboarded, which meant R&D spent a lot of time on business integration.

In the early days, onboarding a business took many steps: first an email to communicate requirements, then an email to schedule them, then R&D development and joint debugging, then an email confirming that joint debugging had passed, then QA joint debugging, then an email confirming the QA test had passed, and only after all that could the business go online.

As you can see, the whole process was very cumbersome: from raising the initial requirement to the final launch took eight steps. With many businesses, if onboarding always follows this offline, manual process, efficiency is very low.

3. Search cloud platform

To change this situation, we developed the search cloud platform. Its core idea is to make the whole business onboarding process online, productized, and self-service. Previously, after R&D joint debugging passed, the QA test environment's configuration had to be modified manually, and after QA verification passed, the online environment's configuration had to be modified manually as well.

This had two problems: overall efficiency was low, and because most of the configuration was changed by hand, mistakes were easy to make and could lead to online failures.

To solve this, our search cloud platform stores the configuration in MongoDB. After joint debugging passes, the R&D environment's configuration can be synchronized to the QA environment with one click; after QA verification passes, it is synchronized to the online environment with one click. This removes the manual editing of test and online environment configurations and greatly improves efficiency.
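
The "one-click" promotion can be sketched roughly as copying the exact configuration document that passed verification from one environment's store to the next. The store interface and environment names below are illustrative assumptions, not the platform's actual MongoDB schema.

```java
import java.util.Map;

/** Hypothetical configuration store, e.g. backed by one MongoDB collection per environment. */
interface ConfigStore {
    Map<String, Object> load(String business, String env);
    void save(String business, String env, Map<String, Object> config);
}

/** One-click promotion: copy a business's verified config from one environment to the next. */
class ConfigPromoter {
    private final ConfigStore store;

    ConfigPromoter(ConfigStore store) { this.store = store; }

    void promote(String business, String fromEnv, String toEnv) {
        // The same document that passed joint debugging is written to the target environment,
        // so nothing is retyped by hand and manual-edit mistakes cannot creep in.
        Map<String, Object> config = store.load(business, fromEnv);
        store.save(business, toEnv, config);
    }
}
```

A caller would run something like `promote("some-business", "rd", "qa")` after joint debugging passes and `promote("some-business", "qa", "online")` after QA verification; the business and environment names here are made up for the sketch.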

The second part is productizing the whole business-onboarding workflow. On top of it we built various visualization modules, including visualization of word segmentation: you can directly see the output of different tokenizers and choose the one you want. There is visualization of the data flow: you can see how well the data is synchronized, including performance, how much data has not yet been synchronized, and so on. There is also SLA visualization, data change logs, configuration change logs, and so on.

Below that are time statistics for each stage, including business-side R&D time, business-side QA time, search R&D time, search QA time, and the amount of time that required manual intervention.

The whole point of the search cloud platform is to speed up business onboarding. With these time statistics, it is easy to see which step is the bottleneck and to optimize that link, much like slow-query optimization.

In terms of platform management, the first step was to open up the data-flow dependencies end to end, followed by self-service onboarding and self-service operations, including index management, cluster management, tokenizer management, service replication, and other functions.

These functions greatly improved R&D onboarding and O&M efficiency, so we went further to improve QA testing efficiency by building self-service testing and automatic review-and-launch functions.

At the bottom is the monitoring and alerting platform, including full-link tracing, monitoring, alerting, and on-call management. Below is the functional module diagram of our entire search cloud platform.

For example, the business side fills in its requirements on the platform and applies for onboarding. Once the requirement enters development, the corresponding configurations are filled in according to the requirements; the business side can then further complete the configuration, such as the data-source address, and the data will then be automatically synchronized to the ES cluster.

The business side can also create its own table structure through the platform, specifying which fields exist, which fields should be tokenized, which fields should be indexed, and so on. After the data source to monitor, the callback address, the index structure, and data verification have been configured, the configuration takes effect and the corresponding search interface is returned to the business. The business side can then do joint debugging on its own, and once the configuration has been synchronized through the development, test, and online environments, the whole process is complete. In a good case, a business can go from onboarding to launch in as little as half a day.
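
To make the "which fields are tokenized, which are indexed" part concrete, the snippet below shows the kind of Elasticsearch mapping such a field declaration might translate into: tokenized fields become `text` with an analyzer, exact-match fields become `keyword`, and fields that should not be searchable set `"index": false`. The field names and the `ik_max_word` Chinese analyzer are illustrative assumptions, not Shell's actual index definition.

```java
/** Illustrative ES mapping generated from a business's field declaration (field names invented). */
class ListingIndexMapping {
    static final String MAPPING = """
        {
          "mappings": {
            "properties": {
              "title":         { "type": "text",    "analyzer": "ik_max_word" },
              "community":     { "type": "text",    "analyzer": "ik_max_word" },
              "district":      { "type": "keyword" },
              "price":         { "type": "double" },
              "area_sqm":      { "type": "float" },
              "listed_at":     { "type": "date" },
              "internal_note": { "type": "keyword", "index": false }
            }
          }
        }
        """;
}
```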

Finally, with the launch of the cloud platform, overall business onboarding efficiency improved threefold: an onboarding that used to take 9 days on average now takes only 3. Search R&D efficiency improved sixfold: launches that used to rely on manual configuration changes are now synchronized automatically by the platform, and the failure rate dropped by 60%.

These efficiency gains freed up a lot of R&D manpower, which could be invested in effect optimization and stability optimization, allowing the platform to be further upgraded into a search middle platform.

4. Search middle platform

Below is the architecture diagram of the search middle platform. The upper-layer gateway is, as before, responsible for unified authentication, dispatching, rate limiting, circuit breaking, and degradation. On the data-flow side, data is written into the distributed search engine through modules such as event construction and data construction.

At the query layer, a central control module calls the other services: it performs query error correction, rewriting, classification, and understanding; then it calls recall, where the recall module retrieves underlying data according to recall strategies or recall models; then it calls the ranking module, where a real-time ranking model performs the final fine ranking before the results are returned to the user.

At the same time, we further improved the unified service governance platform, including the registry, configuration center, load balancing, message bus, circuit breaking and degradation, link tracing, monitoring and alerting, and service orchestration modules, which together form our search middle platform.

III. Architecture evolution of the Shell recommendation platform

The architecture of the Shell recommendation platform has also gone through four major iterations. The earliest version was a simple recommendation engine based on content and rules. Personalized recommendation based on user profiles and collaborative filtering was then added, followed by real-time computation and real-time models to achieve real-time personalized recommendation. Finally, to improve business onboarding and iteration efficiency, the recommendation platform underwent a major upgrade and refactoring to support configuration-based business onboarding, and was eventually upgraded to an intelligent recommendation platform.

1. Content-based recommendations

Early content-based recommendation was very simple. At the bottom, offline computation was run over housing data (second-hand listings, rental listings, and so on): content-based recommendation algorithms directly computed similar listings and popular listings offline and wrote the results to Redis.

The online recommendation service then looked up the offline-computed listings a user might be interested in from Redis and returned them directly to the user as recommendations.

2. Real-time personalized recommendation

On top of content-based recommendation, we introduced listing features, real-time user profiles, and real-time user behavior records, upgrading to real-time personalized recommendation.

At the bottom layer of personalized recommendation, brokers' operational data, user behavior logs, and other data were added; offline computation then performed data cleaning and feature engineering to produce listing features and user profiles.

Collaborative filtering was then used to generate collaborative-filtering recommendations, and the data was updated in batches to the online storage engine, including the offline-computed recall data, the feature pool, and the filter sets.

Similar to the previous architecture, each business line had its own independent recommendation service, which queried the online storage directly for recall data and feature data, applied its strategies, and returned the results to users.

The business system splits traffic through the A/B experiment platform to run effect-iteration experiments. At the same time, both the business system and the recommendation service feed real-time event-tracking logs back into the real-time computing service and the offline data warehouse, so that recall data and features are updated in real time, achieving real-time personalized recommendation.

3. Intelligent recommendation platform

To improve business onboarding efficiency and effect-iteration efficiency, real-time personalized recommendation was upgraded and iterated once more. The online recommendation service was split and refactored, while the lower-level offline and real-time computation stayed basically unchanged.

The main purpose of the refactoring was to eliminate the early "chimney" (siloed) model: each business scenario no longer has its own independent recommendation service; instead, one set of recommendation services supports all upper-layer businesses. New businesses directly reuse the online services rather than developing and deploying a new one, which greatly improves efficiency.

To achieve this, we split the whole recommendation service into logical layers: an application layer, a computing layer, a data layer, and a model layer.

The application layer provides the API and handles simple business rules and configuration management. The computing layer contains the core recommendation flow, such as recall, fusion, ranking, and filtering, and calls the data layer and the model layer. The data layer provides unified queries over the basic data in the underlying online storage systems. The model layer performs online feature engineering and then calls the model service for online prediction. The computing layer takes the results returned by the data layer, performs strategy fusion, calls the model layer for model-based fine ranking, and finally returns the results to the business system.
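
A highly simplified sketch of how these layers could be expressed as interfaces is shown below; the type and method names are invented for illustration and say nothing about the real code.

```java
import java.util.List;
import java.util.Map;

/** Data layer: unified access to candidates and features in the online stores. */
interface DataLayer {
    List<String> recallCandidates(String userId, String scene);
    Map<String, float[]> features(List<String> itemIds);
}

/** Model layer: online feature engineering plus a call to the model service for prediction. */
interface ModelLayer {
    Map<String, Double> score(String userId, Map<String, float[]> itemFeatures);
}

/** Computing layer: recall -> fusion -> model ranking, called by the application layer's API. */
class ComputingLayer {
    private final DataLayer data;
    private final ModelLayer model;

    ComputingLayer(DataLayer data, ModelLayer model) {
        this.data = data;
        this.model = model;
    }

    List<String> recommend(String userId, String scene, int limit) {
        List<String> candidates = data.recallCandidates(userId, scene);
        Map<String, Double> scores = model.score(userId, data.features(candidates));
        return candidates.stream()
                .sorted((a, b) -> Double.compare(scores.getOrDefault(b, 0.0),
                                                 scores.getOrDefault(a, 0.0)))
                .limit(limit)
                .toList();
    }
}
```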

IV. Unified Shell search and recommendation architecture

Recalling the overall architectures of the search platform and the recommendation platform, we can see that they have a great deal in common. Let us first compare the similarities and differences between the search system and the recommendation system.

1. Search vs. recommendation

Both systems exist to solve the problem of information overload, and in Shell's business scenario their purpose is the same: to match houses with customers and improve the online conversion rate of business opportunities.

In terms of process, both contain several core modules: recall and fusion, model ranking, business re-ranking, and result/recommendation reasons.

In terms of data, Shell's search and recommendation in particular share the same core data: listing details, listing features, user profiles, user behavior features, and so on. Algorithm models can also be reused; for example, the WDL (Wide & Deep) and DeepFM models we use now serve both search and recommendation scenarios.

Platform tools can be reused as well: both search and recommendation use the A/B experiment platform, the machine learning platform, the model management platform, and the effect analysis platform.

Looking at the differences: in terms of behavior, search is a very active behavior while recommendation is passive. In terms of intent, search intent is usually explicit, while recommendation only needs vague preferences. A query is explicit and present in most search scenarios, whereas recommendation has no query.

Search has a relatively weak need for personalization, while recommendation depends heavily on personalization based on user profiles. Diversity requirements are also weaker for search and stronger for recommendation. Search demands high relevance, whereas recommendation does not need to be as strictly relevant and may even want to deliver some "surprises".

Search has particularly high requirements on data freshness, with second-level updates required: for example, a house must no longer be findable once it has been sold. Much of the recommendation data does not need to be that fresh. Another difference is read filtering: in recommendation, items that have already been viewed are generally not recommended again, while search does not filter them and will still show items that have been viewed.

2. Why unify the architecture

The comparison of similarities and differences above already partly explains why we unified the architecture; let me elaborate here.

The first reason is that, as introduced before, they can in fact be unified: they are the same or similar in overall purpose, function, process, and architecture.

The second reason is the core purpose of the unification: reducing cost and improving efficiency, which is also the title of this talk.

Since their purpose, process, and functional architecture are all similar, completing them with the same architecture and codebase will certainly improve overall efficiency: engineering and algorithm people can be shared, and code, data, and feature models can be reused, which lowers development and maintenance costs.

In the past, the search team had its own engineering and algorithm R&D, and the recommendation team likewise developed its own engineering and algorithms and maintained its own system, so there was inevitably a lot of duplicated work. After unification, the platforms and tools used by both sides can be shared, avoiding reinventing the wheel.

The first three points above can be addressed directly by unifying the architecture; the last two are things we hope to optimize along the way. For example, effect iteration of routine strategies should support configuration-based launches through the interface, simplifying the process and reducing launch costs.

Second, the recall, ranking, re-ranking, and reason modules need to be decoupled to support layered experiments, so that dedicated people can each focus on their own part, for example some optimizing recall and others optimizing ranking, further improving overall R&D efficiency.

So overall, our core purpose is that after unifying the search and recommendation architecture, 1 + 1 > 2: costs go down and efficiency goes up across the board.

There are also side benefits, such as improved overall stability. Because the stability requirements of search are higher than those of recommendation, and search traffic is much larger than recommendation traffic, the search team's service governance was more mature and had a complete service governance system, whereas the recommendation side had less. After the architectures were unified, recommendation could directly reuse the complete service governance system built for search.

In addition, performance can be further improved. Previously, the recall of the Shell recommendation system was built on top of search: recommendation recall called the search gateway, and the search service then called the underlying engine such as ES, so every request went through several network hops.

With a unified architecture there is no need to distinguish search from recommendation: the recommendation service can query the underlying ES directly, just like the search service, which reduces network calls and improves the performance of the recommendation system.

3. Unified architecture scheme

The figure above shows the overall architecture after the unification of search and recommendation. It is actually similar to the previous architectures, just with search and recommendation integrated. The top layer is still the business lines: second-hand houses, new houses, overseas, rentals, and so on call a unified gateway, which performs traffic dispatching, authentication, circuit breaking, rate limiting, and degradation, and then calls the underlying services.

The search cloud platform mentioned above provides unified business onboarding, overall configuration management, and launch. It reuses the whole service governance stack built earlier for search: the registry, the configuration center, and so on. The data flow monitors business data changes and synchronizes the data to the online storage engines in real time.

The major refactoring we did was at the query layer, where all the modules of the original search and recommendation systems were merged.

The new query layer is divided into six core modules. A request first goes through the central control module, which handles parameter validation, scheduling strategy, caching, and degradation. Central control then calls the lower modules, starting with the intent parsing module (used by search; not needed by recommendation). After getting the intent parsing result, it calls the recall module, which fetches the user's profile and features, performs multi-path recall and fusion filtering, and then returns the results to central control.

After obtaining the recalled data, central control calls ranking, which includes coarse ranking and fine ranking, and then re-ranking. After that, the reason module is called to attach reasons to the results, such as "owner's only home, held for five years" (a tax-related label) or "near the subway". Once the reasons are attached, the result is returned to the business side, and the whole search/recommendation call is complete.

The central control is responsible for scheduling all of these modules; for example, it can call recall directly and then ranking, re-ranking, and so on.
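
As a rough sketch of this orchestration, the skeleton below shows a central controller driving the modules, with per-scene switches for the steps that can be skipped (NLU for recommendation, re-ranking or reasons for some scenes). Everything here, names and structure alike, is an illustrative assumption rather than the platform's real interfaces.

```java
import java.util.List;

/** Per-scene switches configured in central control (illustrative). */
record SceneConfig(boolean useNlu, boolean useRerank, boolean useReason) {}

interface Module<I, O> { O call(I input); }

/** Central control skeleton: schedules intent parsing, recall, ranking, re-ranking and reasons. */
class CentralControl {
    // Module wiring (e.g. via dependency injection) is omitted in this sketch.
    Module<String, String> nlu;                 // query understanding (search only)
    Module<String, List<String>> recall;        // multi-path recall + fusion filtering
    Module<List<String>, List<String>> rank;    // coarse + fine ranking
    Module<List<String>, List<String>> rerank;  // business re-ranking rules
    Module<List<String>, List<String>> reason;  // attach result/recommendation reasons

    List<String> handle(String query, SceneConfig cfg) {
        String parsed = cfg.useNlu() ? nlu.call(query) : query;   // recommendation skips NLU
        List<String> items = rank.call(recall.call(parsed));
        if (cfg.useRerank()) items = rerank.call(items);          // some scenes skip re-ranking
        if (cfg.useReason()) items = reason.call(items);          // some scenes skip reasons
        return items;
    }
}
```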

At the same time, in terms of storage, we have added several new engine capabilities, previously only text retrieval ES engine, later added vector retrieval engine and graph retrieval engine.

The remaining modules are the same as the previous recommendation and search, and will also reflux the buried point log of the business side in real time and then carry out real-time calculation and offline calculation. The above is the new search recommendation architecture after the unification of our architecture.

Let me introduce some of the core services:

(1) Central control service

The design principle of the central control service is to keep business logic out of it as far as possible and to minimize iteration on it, so as to guarantee its stability.

The core job of central control is to decide how the lower modules are scheduled and to degrade them when needed, so that an exception in a downstream module does not affect the overall search or recommendation request. If central control itself has a problem, however, it can affect the stability of the whole online service, so we must do everything we can to keep the central control service stable.

Central control is mainly responsible for parameter validation, scheduling, caching, degradation, and similar functions. For example, recommendation requests skip the NLU module entirely, and some scenes can be configured in central control to skip re-ranking or reasons.

Second, central control can cache the results of some modules; for example, NLU results and reason results can be cached.

Finally, the most important function of central control is degradation: a timeout or exception in any downstream service must not cause the business side's query to fail. Each module has a default timeout, but at the same time the remaining time budget is computed in real time.

Take a normal call chain, for example: it starts with intent parsing, then recall, and so on, and finally returns to the business side. If, after re-ranking, we call the reason service and find it is down or the response times out, central control skips the reason module and returns directly, which amounts to a degraded response.

If the recall module times out, central control likewise skips it, queries ES or Redis directly, and sends those results through the subsequent steps. This is equivalent to skipping the recall logic entirely and feeding the recall data returned by the basic engine straight into the later stages.

In the worst case, if the underlying storage engine fails, central control directly looks up Redis cache data or default data and returns that to the user.

The timeout of each downstream module is determined by how much time the previous calls have already consumed. The business side generally sets the overall timeout to 1 second, while our actual average response time is around 50 ms.

For example, in an abnormal case, suppose 950 ms has already been consumed by the time re-ranking returns. Since only 50 ms remains, when we then call the reason module, its timeout is set to 50 ms in real time, overriding its default timeout.
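
A minimal sketch of this remaining-time calculation, under assumed names and structure: each downstream call gets the smaller of its default timeout and whatever is left of the overall budget, and a timeout, error, or exhausted budget simply degrades that step instead of failing the whole request.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Tracks how much of the caller's overall timeout (e.g. 1000 ms) is still unspent. */
class TimeBudget {
    private final long deadlineMillis;

    TimeBudget(long totalMillis) { this.deadlineMillis = System.currentTimeMillis() + totalMillis; }

    long remaining() { return Math.max(0, deadlineMillis - System.currentTimeMillis()); }

    /** Effective timeout for the next module: its default, capped by what is left. */
    long timeoutFor(long defaultTimeoutMillis) { return Math.min(defaultTimeoutMillis, remaining()); }
}

class Degrader {
    /**
     * Call a module with the capped timeout; on timeout or exception return empty so the
     * central control can skip the step (a degraded return) instead of failing the query.
     */
    static <T> Optional<T> callOrSkip(Supplier<T> module, long timeoutMillis) {
        if (timeoutMillis <= 0) return Optional.empty();      // budget exhausted: skip the module
        try {
            return Optional.ofNullable(
                    CompletableFuture.supplyAsync(module).get(timeoutMillis, TimeUnit.MILLISECONDS));
        } catch (Exception e) {
            return Optional.empty();                          // timeout or error: degrade
        }
    }
}
```

With a 1000 ms budget, if 950 ms has been spent by the time re-ranking returns, `timeoutFor` gives the reason module only the remaining 50 ms, matching the example above.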

(2) Recall service

The recall service receives the NLU result, builds the request, performs correction and rewriting of the request, fetches the user profile, listing features, and so on, and then performs multi-path recall, fusion, and filtering.

Text recall queries Elasticsearch, strategy recall queries Redis, vector recall queries Milvus, and business recall calls the business side's own interfaces. Filter recall is specific to recommendation, for example filtering out items the user has already read. After the multi-path recall, an overall fusion and filtering step is performed, and the results are returned to central control for the next stage of the process.
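
The sketch below illustrates this multi-path recall under stated assumptions: each path (text, strategy, vector, business) is submitted to a thread pool, the results are merged with simple de-duplication, and a recommendation-side read filter drops already-viewed items. The path interface and timeout value are placeholders; real paths would call Elasticsearch, Redis, Milvus, and the business API respectively.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.*;

interface RecallPath { List<String> recall(String query, String userId); }

class MultiPathRecall {
    private final ExecutorService pool = Executors.newFixedThreadPool(8);
    private final List<RecallPath> paths;     // e.g. text (ES), strategy (Redis), vector (Milvus), business API
    private final Set<String> alreadyViewed;  // read filter, recommendation-specific

    MultiPathRecall(List<RecallPath> paths, Set<String> alreadyViewed) {
        this.paths = paths;
        this.alreadyViewed = alreadyViewed;
    }

    List<String> recall(String query, String userId) throws InterruptedException {
        // Run every recall path in parallel with a shared per-request timeout (value is illustrative).
        List<Callable<List<String>>> tasks = new ArrayList<>();
        for (RecallPath p : paths) tasks.add(() -> p.recall(query, userId));
        List<Future<List<String>>> futures = pool.invokeAll(tasks, 100, TimeUnit.MILLISECONDS);

        // Fusion: merge the paths in order, de-duplicate, then apply the read filter.
        LinkedHashSet<String> merged = new LinkedHashSet<>();
        for (Future<List<String>> f : futures) {
            try {
                merged.addAll(f.get());
            } catch (CancellationException | ExecutionException e) {
                // A timed-out or failed path is simply dropped; the other paths still contribute.
            }
        }
        merged.removeAll(alreadyViewed);
        return new ArrayList<>(merged);
    }
}
```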

(3) Re-ranking service

The re-ranking service involves a very large number of business rules; every business line is different, and some rules can be reused while others cannot, for example forced insertion, pinning to the top, mixed arrangement, and so on.

To make it easy to combine and reuse rule logic, re-ranking implements a Workflow mechanism. For example, the default configuration carries default rules such as deduplication, fusion, score calculation, and sorting by field; opt-in adds rules and opt-out removes them.

Through this workflow mechanism we can reuse many methods: a simple configuration decides which rules to apply and which to skip, so the vast majority of scenarios can go online through configuration alone.
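
A small sketch of what such an opt-in/opt-out rule workflow could look like is below; the rule names mirror the examples above (deduplication, sorting by score), but the mechanism shown is an assumption for illustration, not the actual implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

/** A re-ranking rule is simply a transformation over the current result list. */
interface RerankRule extends UnaryOperator<List<Map<String, Object>>> { }

class RerankWorkflow {
    private final LinkedHashMap<String, RerankRule> rules = new LinkedHashMap<>();

    /** Default rules applied to every scene unless explicitly opted out. */
    RerankWorkflow() {
        rules.put("dedup", items -> new ArrayList<>(new LinkedHashSet<>(items)));
        rules.put("sortByScore", items -> {
            List<Map<String, Object>> sorted = new ArrayList<>(items);
            sorted.sort(Comparator.<Map<String, Object>>comparingDouble(
                    m -> -((Number) m.getOrDefault("score", 0)).doubleValue()));
            return sorted;
        });
    }

    RerankWorkflow optIn(String name, RerankRule rule) { rules.put(name, rule); return this; }

    RerankWorkflow optOut(String name) { rules.remove(name); return this; }

    List<Map<String, Object>> run(List<Map<String, Object>> items) {
        List<Map<String, Object>> current = items;
        for (RerankRule rule : rules.values()) current = rule.apply(current);  // apply in configured order
        return current;
    }
}
```

A scene configuration then reduces to a list of opt-in and opt-out rule names read from the platform, rather than new code for each business line.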

In fact we are still in the middle of unifying the architecture, because we have a lot of services, but we have already achieved some initial results: staff efficiency has at least doubled.

The original search project had six people and recommendation had four, ten in total; after the merger only five are needed. The efficiency of effect iteration has also improved threefold: adjusting some policy rules used to take ten days on average from development through testing to launch, while now it can go online in three days through configuration.

V. Future planning

There are two main points for future planning.

First, consolidate general combinations of strategies and models into something like "strategy packages" that non-core businesses can pick on their own and reuse quickly.

Our resources, whether algorithm or engineering, are still limited, while our business volume is large, involving hundreds or even thousands of businesses; it is impossible to optimize every business individually.

We hope that by consolidating some general strategy-and-model combinations and packaging them, non-core businesses can choose a suitable package and reuse the already-tuned models and algorithms, further improving overall optimization efficiency.

Second, we hope to create a one-stop effect optimization platform.

Many of the systems mentioned above, such as the cloud platform, the experiment platform, and the model management platform, are currently scattered. Some platform tools have been built and polished, but others are not yet complete, such as sample management, feature management, and experiment management.

In the future we will unify and complete all of these platforms and tools and integrate them into a one-stop effect optimization platform, including business management, effect metrics, machine learning, effect prediction, traffic experiments, intervention operations, and other modules, to further improve the efficiency of effect-optimization iteration.

VI. Q&A

Q: How is the cloud platform monitored? What indicators are monitored?

**A:** Our cloud platform monitors many indicators. For the data flow: write latency, write volume, write QPS, and the real-time rate and loss rate of each module. For queries: overall query latency, per-module latency, average latency, 99th-percentile and 99.99th-percentile latency, query QPS, overall traffic volume, traffic stability, and the count and ratio of each status code (200, 400, 499, 5xx), and so on. There is also slow-query monitoring, such as the number and ratio of queries over 100 ms or over 500 ms.

These are mainly performance and stability metrics, but there are also effect metrics such as CTR, exposure, and conversion rate. Some of these are monitored through logs and some through reported metrics, using several different monitoring systems.

Q: Can self-service editing of data structures produce unreasonable structures? How do you avoid that?

**A:** Yes, it can, and we have run into it. As you just saw, our cloud platform has automatic verification of the data interface; before we had that feature, we often found problems after users had edited and filled in the table structure, its fields and types.

When we pulled the data, we would discover that the field structure in their MySQL data source differed from what they had edited: sometimes there was an extra field, sometimes a field was missing, sometimes a field declared as a string was actually a number. Such problems were common, and the data could not be ingested or synchronized, blocking the data flow.

So we later added automatic verification of the data interface: after the configuration is filled in, the system immediately pulls a small sample of data, a few hundred or a few thousand records, and verifies it. Only when the sampled data is fully consistent with the declared table structure can the user proceed to the next step.
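
The idea can be sketched roughly as follows: pull a small sample of rows from the business data source and check that every declared field exists with a compatible type before letting the onboarding flow proceed. The field/type representation here is invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Verifies a small sample of source rows against the table structure declared on the platform. */
class SchemaVerifier {
    /** declaredTypes: field name -> expected type, e.g. "price" -> Double.class. */
    static List<String> verify(Map<String, Class<?>> declaredTypes, List<Map<String, Object>> sampleRows) {
        List<String> problems = new ArrayList<>();
        for (Map<String, Object> row : sampleRows) {
            for (Map.Entry<String, Class<?>> field : declaredTypes.entrySet()) {
                Object value = row.get(field.getKey());
                if (value == null && !row.containsKey(field.getKey())) {
                    problems.add("missing field: " + field.getKey());           // declared but absent
                } else if (value != null && !field.getValue().isInstance(value)) {
                    problems.add("type mismatch on " + field.getKey()
                            + ": expected " + field.getValue().getSimpleName()
                            + ", got " + value.getClass().getSimpleName());     // e.g. string vs. number
                }
            }
            for (String key : row.keySet()) {
                if (!declaredTypes.containsKey(key)) {
                    problems.add("undeclared extra field: " + key);             // present but not declared
                }
            }
        }
        return problems;  // onboarding proceeds to the next step only if this list is empty
    }
}
```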

Q: How does the link tracing work? Is it full-link?

**A:** We now do full-link tracing based on Elastic (ES) APM. If you have used it, you know it can automatically collect the time spent in each module and write it into an ES log cluster, from which you can analyze per-stage latency, overall latency, and so on.

Q: How is multi-path parallel recall implemented in recommendation?

**A:** Multi-path parallel recall mainly sends multiple requests, or queries multiple engines such as the vector engine and the text engine, in parallel through a thread pool; after the results of all the recall paths come back, they are merged according to the fusion strategy.

Q: Is the gateway OpenResty + Lua?

**A:** We use Zuul, the gateway component from Spring Cloud.

Q: Is there any good way to guarantee the quality of search and recommendation?

**A:** Which aspect of quality do you mean? Stability has already been covered: we do a lot there. As mentioned earlier, there is a complete service governance system for stability assurance. Our services are deployed on multiple machines in a distributed way, so the failure of a single instance does not affect the overall service, and there is a thorough circuit-breaking and degradation mechanism. Our underlying storage engines are also deployed across two machine rooms that back each other up.

We have also built a sound monitoring and alerting system: for example, if the proportion of 499 or 5xx responses exceeds one in a thousand, it sends messages or phone alerts. There are also alerts on effect metrics: for example, if CTR or the conversion rate suddenly drops significantly, an alert is raised so we can locate and analyze the problem in time.

On one hand we keep improving monitoring and alerting; on the other hand, these issues have to be considered from the design stage, such as the central-control degradation just mentioned: at design time we fully consider what to do when each service goes down. What we want to achieve is that the failure of any single service does not affect the overall query. In addition, we run online load tests every quarter to uncover hidden risks in time and to find out the actual online throughput.

Q: Is a data middle platform necessary? How should one choose?

**A:** The data middle platform is actually not closely related to what we talked about today, but since you ask: personally I don't think it is a must; it depends on the company's scenario. If there is not that much data, a data middle platform is unnecessary; if it is a large company with a great deal of data that every department needs, then a data middle platform probably is needed.

Q: Should the underlying servers be self-built or on the public cloud? And once staff efficiency improves, does everyone still have to work overtime?

**A:** Shell now has both its own machine rooms and Tencent Cloud machine rooms, with dual-machine-room backup. In the worst case, if one machine room goes down, services are not affected; we can switch to the other machine room in real time.

As for overtime, that has little to do with staff efficiency: if the business is urgent, overtime is generally still needed; if it is not urgent, we basically don't work much overtime.

As efficiency improves, people can do more things and explore newer technologies. For example, after the overall improvement, what used to need ten people now needs only five, and the five people freed up can explore new technical directions, such as new vector engines, new graph database engines, and research into new model algorithms. The scope gets bigger and more gets done, so naturally the output is greater.

Q: Is all the monitoring developed in-house? Are there any third-party products?

**A:** Both. There are some open-source components, some self-developed ones, and some unified monitoring systems provided by other platforms within the company.

Q: What is the essential difference between the search platform and the search middle platform?

**A:** Actually we don't talk much about the "middle platform" concept internally. If I have to name a difference, I would say the middle platform goes deeper into the business than a platform does. When building a platform, we think more about generality and try to keep business logic out of the platform as much as possible, otherwise the platform does not feel general enough. When building a middle platform, we think more about consolidating solutions to problems that are common across businesses. If only one business has a particular problem, it may be enough for that business to solve it by itself; but when hundreds of businesses are connected, as in our case, many of them share the same or similar problems, and rather than having each business solve it separately, it is better for the middle platform to think through a unified solution.

Q: Are internal referrals available?

**A:** We are indeed short of people at the moment, so you are welcome to send your resume. Email address: [email protected]