Introduction: Search has always been one of the main traffic entry points in e-commerce. Building e-commerce search and improving its results is a persistent challenge for developers in the industry. A basic search service can be built on a traditional database or an open source engine, but as commodity data and business traffic grow, it inevitably runs into bottlenecks in both performance and search quality. Meanwhile, with the continued development of e-commerce, live streaming, cloud computing, and related technologies, more and more traditional retail enterprises are moving online and onto the cloud. Driven in particular by the epidemic of the past two years, apps and mini programs have become an important source of business growth for retailers. In this context, how to quickly build an efficient search service has also become a problem for retailers that are moving to the cloud.

The author of this article is Liu Zhijia, intelligent product manager at Alibaba Cloud.

To solve these two problems, the Alibaba Cloud Computing Platform Business Division launched a search solution for the e-commerce and retail industries based on MaxCompute and Open Search. It is a search development platform that covers commodity storage, database building, search, and search optimization.

This article describes how to build an e-commerce search service quickly and efficiently on MaxCompute and Open Search, in four parts: product introduction, characteristics of the e-commerce industry, industry search development practice, and more solutions.

I. Product introduction

MaxCompute profile

Simple, easy-to-use, fully managed open service

MaxCompute is a simple, easy-to-use, fully managed, analytics-oriented enterprise SaaS cloud data warehouse from Alibaba Cloud. It scales flexibly to match business growth. For developers on the cloud, MaxCompute supports multiple analysis scenarios, including machine learning, data lakes, traditional data warehouses, and near-real-time data warehouses, and provides an open development ecosystem.

Serverless elastic data warehouse

To minimize cost while meeting differentiated requirements, MaxCompute provides a fast, fully managed online data warehouse service on a serverless architecture, removing the resource scalability and elasticity limits of traditional data platforms. It addresses needs such as business agility, scheduling around periodic load fluctuations, guarantees for critical tasks, and stable, predictable behavior, while minimizing operations and maintenance (O&M) investment, so that users can analyze and process massive data economically and efficiently. These features make MaxCompute a good fit for e-commerce and retail applications and for the computing and storage requirements of developers in these industries.

In addition, MaxCompute provides serverless data access services, multiple computing environments, storage services, and resource management, which greatly reduces O&M costs and lets users focus on expanding and developing their own business.

Open ecosystem

In terms of ecosystem, MaxCompute is open at every level: the product's own open interfaces, the Alibaba Cloud solution ecosystem, the data application ecosystem, and integration with open source engines and tools. On top of MaxCompute, developers can freely choose how to develop their business and customize solutions more flexibly.

Continuously building an open product ecosystem

MaxCompute is a data warehouse that integrates offline processing, real-time processing, analysis, and serving. It is especially suited to enterprise real-time data warehouses, interactive queries for BI reports, user profile analysis, and similar scenarios, all of which are indispensable for storing commodity data and for guiding and analyzing user behavior in e-commerce.

Within Alibaba Group, MaxCompute is the best practice for real-time query scenarios during Double 11. It supports write throughput of hundreds of millions of TPS and sub-second query response over PB-scale data, which meets the strict timeliness requirements of e-commerce promotions. Based on these features, MaxCompute has become the preferred storage and computing service for cloud developers in the e-commerce industry.

As mentioned above, MaxCompute supports integration with open source projects, mainstream commercial software, and other open ecosystems. It can also be combined with other Alibaba Cloud products into one-stop solutions for the big data applications that e-commerce commonly needs, such as search and recommendation. For search in e-commerce and retail in particular, MaxCompute connects to another cloud product, Open Search, to form a one-stop search development platform.

Introduction to Open Search

Open Search is the middle platform for Alibaba Group's search business and an intelligent search cloud service built on a large-scale online deep learning system. Within Alibaba Group, more than 500 businesses are connected to it, including Taobao, Tmall, Hema, and Cainiao, and it supports about 10 billion search visits per day. During Singles' Day (Double 11), it stably supported the search services of products across Alibaba Group, with peak QPS for a single business exceeding one million. Open Search has been commercially available on Alibaba Cloud since 2014 and has provided search services for thousands of customers, including hundreds of e-commerce and retail enterprises.

One-stop intelligent search business development platform

Open Search provides the core engine, recall and sorting, search guidance, and other services and capabilities that cover the stages before, during, and after a search, enabling one-stop development of search business. For experienced search developers, Open Search opens up the application structure, recall, sorting, algorithms, and other stages to meet personalized customization needs. For beginners with no search background, as well as product and operations staff, Open Search provides industry templates for e-commerce, education, and other industries, so that a search service with good results can be built with one click to help enterprises reach their business goals.

For the e-commerce industry in particular, Open Search provides search methods and solutions for multiple scenarios, including product search, order search, store search, and database acceleration and analysis.

II. Characteristics of the e-commerce industry

E-commerce is strongly transaction-oriented and GMV-oriented. Its ultimate goal is to guide more and higher-value purchases, so that the platform, buyers, and sellers all win. Search and recommendation are currently the most important traffic entry points in e-commerce. The three apps in the picture all place the search entry in the most prominent position of the app so that users find it immediately. Below it are other sub-applications or product category filters, and further below is the recommendation feed. According to the data, more than 90% of GMV comes from search and recommendation traffic.

When a user opens an e-commerce app with a clear purchase need, they will very likely search for the target product. In this scenario, the purchase guidance rate and conversion rate are very high, so search quality is critical for e-commerce.

So how do you measure the effectiveness of search? Based on years of experience with e-commerce search, the core metrics fall into two groups: effect metrics and performance metrics. Effect metrics include the hit rate and the no-result rate. Performance metrics include search response time and data synchronization response time. Put simply, the goal is to let end users find the target product faster and more accurately.
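
As a rough illustration of the two metric groups, the following Python sketch computes a no-result rate, a hit rate, and an average response time from a search log. The log layout (`SearchLogEntry` and its fields) and the definition of hit rate as the share of searches that led to a click are assumptions made for this example. They are not the definitions used by Open Search.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SearchLogEntry:
    """One search request (hypothetical log layout for this sketch)."""
    query: str
    result_count: int   # number of products returned
    clicked: bool       # whether the user clicked any result
    latency_ms: float   # search response time in milliseconds


def search_metrics(log: List[SearchLogEntry]) -> Dict[str, float]:
    """Compute simple effect and performance metrics over a search log."""
    total = len(log)
    if total == 0:
        raise ValueError("log is empty")
    no_result = sum(1 for e in log if e.result_count == 0)
    hits = sum(1 for e in log if e.clicked)
    return {
        "no_result_rate": no_result / total,   # effect metric
        "hit_rate": hits / total,              # effect metric (assumed definition)
        "avg_latency_ms": sum(e.latency_ms for e in log) / total,  # performance metric
    }


sample_log = [
    SearchLogEntry("huawei mobile phone", 120, True, 35.0),
    SearchLogEntry("organic millet", 8, False, 41.0),
    SearchLogEntry("nike baskteball shoes", 0, False, 28.0),  # typo, no results
    SearchLogEntry("apple headset", 15, True, 36.0),
]
metrics = search_metrics(sample_log)
```

Tracking these values before and after a change to query analysis or sorting gives a simple way to check whether users are finding products faster and more accurately.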

Queries in e-commerce also differ from queries in other industries. E-commerce users tend to stack keywords: when a query fails to find the intended product, they append more terms to filter the results. As a result, word order matters less in e-commerce queries than elsewhere. For example, "Huawei mobile phone" and "mobile phone Huawei" can be treated as the same search. In addition, because many general e-commerce apps carry products from every sector, the same word can mean different things in different contexts. When "Xiaomi" is followed by "mobile phone", it is a phone brand; when "xiaomi" (which also means millet) is preceded by "organic", it is a product category.

Because of these query characteristics, users who build their own search on a database or an open source engine often run into problems such as low recall for colloquial queries, poor document relevance, and unsatisfactory sorting, which hurt search quality and even purchase conversion.

In terms of user intent recognition, the same word entered by different users in different scenarios may cover products in many fields. For example, a user who types "Apple" may mean a phone, a fruit, a tablet, a headset, a laptop, or many other categories. This is one of the bad cases often encountered in the early stage of a self-built e-commerce search based on open source solutions.

So how can these problems and bad cases be solved, the search quality of e-commerce optimized, and the GMV guided by search increased?

III. Industry search development practice

MaxCompute + Open Search industry search development practice

E-commerce search involves several dimensions, such as commodity data, search queries, and user behavior, and several stages before, during, and after the search. When working with different enterprises, we hear a wide range of questions. Customers with no search experience ask: how do I build a commodity database? How do I accurately understand the intent behind a user's query? Seasoned developers ask: how do I personalize the search experience? How do I guarantee performance under high concurrency?

To help developers in e-commerce and retail solve these problems faster and better, MaxCompute and Open Search together offer a corresponding industry search solution.

In general, the user transfers the commodity data and behavior data stored in MaxCompute to Open Search, either through automatic database synchronization or through the API/SDK, and then customizes query analysis, sorting, search guidance, intervention, and extension functions in Open Search. The result is an e-commerce search solution with better search quality, high performance, high real-time capability, and high reliability that is fully managed and free of O&M.

Following the user's actual search behavior, the solution breaks down into five key stages: building the search application, the user entering a query, user intent recognition, retrieval by the search engine, and returning search results. These correspond to five development modules: MaxCompute database building, search guidance, query analysis, search engine, and sorting service.

Commodity database building

In the commodity database building stage, users store commodity data and user behavior data in MaxCompute. To make this easier for e-commerce developers, Open Search provides an e-commerce industry template, so that the search application structure can be created with one click and the database built quickly. Next, based on the fields in MaxCompute or a custom application structure in Open Search, you define the field types and meanings in each table and the associations between tables. Then, according to the search requirements of each business scenario, different fields are combined into a target index, and searches run against the corresponding index. For example, product name, store name, and product category are all common search fields in e-commerce, and they can be combined into a single index. When a user enters a query, the search looks for product and store information across these fields. Once the index structure is defined, the search service is built for the user; when the application reaches the "available" state, the basic version of the search service is ready.
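
The idea of combining several fields into one target index can be sketched in a few lines of Python. This is only a toy inverted index under stated assumptions: the product rows, the field names (`title`, `shop`, `category`), and whitespace tokenization are all hypothetical, and the sketch does not reflect how Open Search stores or builds its indexes.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Set

# Hypothetical product rows, as they might be exported from a commodity table.
PRODUCTS = [
    {"id": 1, "title": "nike high top basketball shoes",
     "shop": "nike flagship store", "category": "basketball shoes"},
    {"id": 2, "title": "huawei mobile phone 128g",
     "shop": "huawei official store", "category": "mobile phone"},
    {"id": 3, "title": "organic millet 1kg",
     "shop": "green farm store", "category": "grain"},
]

# Fields merged into one combined index (field names are assumptions).
INDEX_FIELDS = ["title", "shop", "category"]


def build_index(rows: Sequence[dict], fields: Sequence[str]) -> Dict[str, Set[int]]:
    """Map each term found in any indexed field to the ids of matching products."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for row in rows:
        for field in fields:
            for term in row[field].lower().split():
                index[term].add(row["id"])
    return index


def search(index: Dict[str, Set[int]], query: str) -> List[int]:
    """Return ids of products that contain every query term in the combined index."""
    terms = query.lower().split()
    if not terms:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(matches)


index = build_index(PRODUCTS, INDEX_FIELDS)
```

Because the lookup intersects per-term sets, `search(index, "huawei phone")` and `search(index, "phone huawei")` return the same products, which matches the observation that word order matters little in e-commerce queries.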

Search guidance

Before the user types a query, e-commerce apps often offer preset queries, a process known as search guidance. Common pre-search guidance modules are hot search and shading. Hot search offers popular search terms derived from recent hot events and user search behavior, which users can select directly. Shading is a preset query shown in the search box before the user types anything; the user can tap search to run it directly. Both are an important part of the search chain. On the one hand, they steer users' search behavior and make the later stages easier to tune. On the other hand, they can serve different operational goals at different times and so improve search-driven purchases. Open Search supports automatic training of hot search and shading models, and also supports manual intervention through blacklists and whitelists, for example to show a term at a fixed time or position, so that operations staff can guide the results.
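
A minimal sketch of a hot search list with manual intervention might look like the following. It assumes that popularity is simply the count of recent queries, that whitelisted terms are pinned to the top, and that blacklisted terms are never shown. The function name `hot_search` and its parameters are hypothetical; the trained models used by Open Search are not reproduced here.

```python
from collections import Counter
from typing import Iterable, List, Sequence


def hot_search(recent_queries: Iterable[str], top_n: int,
               pinned: Sequence[str] = (), blocked: Sequence[str] = ()) -> List[str]:
    """Return up to top_n hot terms: pinned terms first, then by frequency."""
    blocked_set = {q.lower() for q in blocked}
    counts = Counter(q.lower().strip() for q in recent_queries)

    result: List[str] = []
    # Whitelist: operations staff pin these terms to the leading positions.
    for q in pinned:
        q = q.lower()
        if q not in blocked_set and q not in result:
            result.append(q)
    # Fill the remaining slots with the most frequent recent queries.
    for q, _count in counts.most_common():
        if len(result) >= top_n:
            break
        if q in blocked_set or q in result:
            continue  # blacklist, or already pinned
        result.append(q)
    return result[:top_n]


recent = (["nike shoes"] * 3 + ["huawei phone"] * 2
          + ["bad word"] * 5 + ["millet"])
hot = hot_search(recent, top_n=3, pinned=["double 11 sale"], blocked=["bad word"])
```

In this example the most frequent query is blocked and never appears, while the pinned promotion term takes the first slot regardless of its search volume.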

Another common form of search guidance is the drop-down prompt: while the user is typing, candidate queries are suggested automatically, which reduces input cost and steers traffic. Open Search supports several ways of building drop-down prompt models, along with extensions such as high-frequency search terms, historical search terms, intelligent sorting, and manual intervention.
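
The drop-down prompt can be approximated by prefix matching over known queries. The sketch below assumes a table of query frequencies and a per-user search history, and ranks the user's own history ahead of globally popular queries. The function `suggest` and this ranking rule are illustrative assumptions, not the Open Search model.

```python
from typing import Dict, List, Sequence


def suggest(prefix: str, query_counts: Dict[str, int],
            history: Sequence[str] = (), limit: int = 5) -> List[str]:
    """Suggest queries starting with prefix: user history first, then by frequency."""
    prefix = prefix.lower().strip()
    if not prefix:
        return []
    from_history = [q for q in history if q.lower().startswith(prefix)]
    popular = sorted(
        (q for q in query_counts if q.lower().startswith(prefix)),
        key=lambda q: (-query_counts[q], q),  # most frequent first, ties by name
    )
    merged: List[str] = []
    for q in from_history + popular:
        if q not in merged:  # drop duplicates, keep first position
            merged.append(q)
    return merged[:limit]


counts = {
    "nike shoes": 50,
    "nike basketball shoes": 80,
    "nike socks": 10,
    "huawei phone": 90,
}
suggestions = suggest("nik", counts, history=["nike socks"], limit=3)
```

A production system would typically use a trie or a dedicated suggestion index rather than scanning every query, but the ranking idea is the same.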

Search guidance through hot search, shading, and drop-down prompts improves the search experience, enables manual operation, and drives purchase conversion.

User intent recognition

Once the user selects a guided query or types one manually, a search request is issued.

First, we need to understand what the user actually intends to search for. As mentioned earlier, e-commerce users sometimes type colloquial expressions or stack keywords. We therefore need to turn the query, which the user phrases in terms of a purchase need, into a structured, relatively clear, and normalized expression. This is the user intent recognition process.

Common user intent recognition techniques include synonym expansion, stop word removal, spelling correction and rewriting, entity recognition, and category prediction.

Next, we will introduce user intent identification in detail through an example.

For example, suppose the user enters the query "NIKE's blue shoe high top". First, punctuation and letter case are normalized, giving "Nike's blue shoe high top". Next, e-commerce word segmentation splits the query into terms: Nike / 's / blue shoe / high top. Stop words are then removed: once the possessive particle ("of") is configured as a meaningless word, the query becomes "Nike blue shoe high top". Spelling correction follows and fixes the typo, turning the query into "Nike basketball shoes high top". Entity recognition, which is widely used in the industry, then labels the meaning of each term: Nike: brand, basketball shoes: category, high top: style. Open Search also supports category prediction. Based on these results, each term in the query is given a weight: Nike: high, basketball shoes: medium, high top: medium. Finally, synonym expansion is applied, for example (Nike OR 耐克) basketball shoes high top, where 耐克 is the Chinese name of the brand. The final output, after this layer of rewriting, is a query that the engine can understand, and it is passed to the search engine.
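
The rewriting steps in this example can be sketched as a small Python pipeline. Everything here is a toy under stated assumptions: the phrase, stop word, correction, entity, and synonym dictionaries are hypothetical hand-written data, and greedy two-word matching stands in for a real e-commerce word segmenter. It illustrates the order of the steps, not the algorithms used by Open Search.

```python
import re
from typing import Dict, List, Tuple

# All dictionaries below are toy, hypothetical data for this sketch.
PHRASES = {"blue shoe", "basketball shoes", "high top"}  # segmentation dictionary
STOP_WORDS = {"s", "of", "the"}                          # "s" is left over from "'s"
CORRECTIONS = {"blue shoe": "basketball shoes"}          # typo -> intended term
ENTITY_TYPES = {"nike": "brand", "basketball shoes": "category", "high top": "style"}
TYPE_WEIGHTS = {"brand": "high", "category": "medium", "style": "medium"}
SYNONYMS = {"nike": ["耐克"]}                             # Chinese name of the brand


def normalize(query: str) -> str:
    """Lowercase the query and replace punctuation with spaces."""
    return re.sub(r"[^\w\s]", " ", query.lower()).strip()


def segment(text: str) -> List[str]:
    """Greedy longest-match segmentation (at most two words per term)."""
    words = text.split()
    terms: List[str] = []
    i = 0
    while i < len(words):
        pair = " ".join(words[i:i + 2])
        if i + 1 < len(words) and pair in PHRASES:
            terms.append(pair)
            i += 2
        else:
            terms.append(words[i])
            i += 1
    return terms


def rewrite(query: str) -> Tuple[str, List[Dict[str, str]]]:
    """Run normalization, segmentation, stop word removal, correction,
    entity labeling, term weighting, and synonym expansion in order."""
    terms = segment(normalize(query))
    terms = [t for t in terms if t not in STOP_WORDS]
    terms = [CORRECTIONS.get(t, t) for t in terms]

    analysis: List[Dict[str, str]] = []
    parts: List[str] = []
    for t in terms:
        etype = ENTITY_TYPES.get(t, "unknown")
        weight = TYPE_WEIGHTS.get(etype, "low")
        analysis.append({"term": t, "type": etype, "weight": weight})
        alternatives = [t] + SYNONYMS.get(t, [])
        if len(alternatives) > 1:
            parts.append("(" + " OR ".join(alternatives) + ")")
        else:
            parts.append(t)
    return " ".join(parts), analysis


engine_query, analysis = rewrite("NIKE's blue shoe high top")
```

Keeping each step as a separate function makes it easy to replace one stage, for example swapping the dictionary-based correction for a trained model, without touching the others.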

Search engine recall

After the query is rewritten, the search engine recall phase begins. Open Search provides several recall strategies, including text recall, personalized recall, and vector recall. Text recall is the most common strategy in search: it measures the text relevance between the rewritten query and the commodity data and uses an inverted index to retrieve matches. Open Search uses qantian 3, a text search engine developed by Alibaba Group, which handles search under high concurrency and heavy writes with high performance and returns results quickly. Personalized recall adds the user's personal information to the rewritten query and returns results tailored to each individual user. Vector recall adds vector information to the rewritten query and returns results according to the vector similarity between the query and the commodity data. It addresses a weakness of traditional text search, which may miss results that look unrelated textually but are what the user wants. Multi-path recall that combines text recall and vector recall can greatly reduce the no-result rate and improve search quality.
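
To make the benefit of multi-path recall concrete, here is a small Python sketch that merges a text recall path with a vector recall path. The catalog, the three-dimensional embeddings, and the similarity threshold are made up for illustration, and a brute-force cosine scan stands in for a real vector index. None of this reflects the internals of the Open Search engine.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

# Hypothetical catalog: id -> (title, embedding). Embeddings are made-up 3-d vectors.
CATALOG: Dict[int, Tuple[str, List[float]]] = {
    1: ("nike basketball shoes", [0.9, 0.1, 0.0]),
    2: ("high top sneakers", [0.8, 0.2, 0.1]),
    3: ("huawei mobile phone", [0.0, 0.1, 0.9]),
}


def text_recall(query: str) -> Set[int]:
    """Recall products whose title contains every query term."""
    terms = query.lower().split()
    return {pid for pid, (title, _vec) in CATALOG.items()
            if terms and all(t in title.split() for t in terms)}


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vector_recall(query_vec: Sequence[float], min_sim: float = 0.9) -> Set[int]:
    """Recall products whose embedding is close enough to the query embedding."""
    return {pid for pid, (_title, vec) in CATALOG.items()
            if cosine(query_vec, vec) >= min_sim}


def multi_path_recall(query: str, query_vec: Sequence[float]) -> List[int]:
    """Union of the text path and the vector path."""
    return sorted(text_recall(query) | vector_recall(query_vec))


shoe_results = multi_path_recall("basketball shoes", [1.0, 0.1, 0.0])
laptop_results = multi_path_recall("laptop", [0.0, 0.0, 1.0])
```

In the first call, text recall alone finds only product 1, while the vector path also brings in the textually different "high top sneakers". In the second call, text recall finds nothing, and the vector path still returns the nearest product instead of an empty result page.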

Result sorting

After the recall phase, we have a set of products related to the user's search. Next, the recalled products must be sorted and returned in the most reasonable order, so that the results the user is most likely to click appear first, which improves search-driven conversion and GMV. Open Search provides coarse sorting and fine sorting, supports sort expressions, custom plug-ins, and algorithm models, and fully opens the internal sorting process to developers so that they can tailor sorting policies to their own business.
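
Coarse and fine sorting can be pictured as a two-stage ranking: a cheap score trims the recalled set, and a more expensive score orders the survivors. In the Python sketch below, the feature names (`text_score`, `ctr`, `sales_norm`) and the weights are assumptions chosen for illustration, and a weighted formula stands in for whatever sort expression, plug-in, or model a real application would configure.

```python
from typing import Dict, List

Product = Dict[str, float]


def coarse_score(p: Product) -> float:
    """Cheap first-pass score: text relevance only."""
    return p["text_score"]


def fine_score(p: Product) -> float:
    """Second-pass score; a weighted formula stands in for a ranking model."""
    return 0.5 * p["text_score"] + 0.3 * p["ctr"] + 0.2 * p["sales_norm"]


def two_stage_sort(candidates: List[Product], coarse_keep: int,
                   page_size: int) -> List[Product]:
    """Keep the top coarse_keep products by coarse score, then order them by fine score."""
    shortlisted = sorted(candidates, key=coarse_score, reverse=True)[:coarse_keep]
    return sorted(shortlisted, key=fine_score, reverse=True)[:page_size]


recalled = [
    {"id": 1, "text_score": 0.9, "ctr": 0.1, "sales_norm": 0.1},
    {"id": 2, "text_score": 0.8, "ctr": 0.9, "sales_norm": 0.8},
    {"id": 3, "text_score": 0.7, "ctr": 0.5, "sales_norm": 0.5},
    {"id": 4, "text_score": 0.2, "ctr": 1.0, "sales_norm": 1.0},
]
first_page = two_stage_sort(recalled, coarse_keep=3, page_size=2)
first_page_ids = [p["id"] for p in first_page]
```

The split keeps the expensive score off most of the recalled set: product 4 is dropped in the coarse stage for low relevance despite strong sales, and only the shortlisted products are scored in detail.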

For custom plug-ins, Open Search provides the Cava language and Cava plug-ins. Cava is a compiled language developed by Alibaba. It has Java-like syntax and C++-like performance, and it supports object-oriented programming. The Open Search console includes an IDE that supports Cava compilation, so users can write and compile custom Cava plug-ins directly in the console, which makes debugging and modification easier.

To sum up, with MaxCompute and Open Search, users can implement commodity database building, search guidance, user intent recognition, search engine recall, and result sorting for e-commerce and retail search, and obtain a fully customizable search service with better performance. The next question is how to measure and optimize the search results.

Solution features and effect optimization

First, word segmentation is the most basic and indispensable part of Chinese search. For e-commerce and retail, Open Search integrates the e-commerce word segmenter of the Taobao search team, whose model was trained on millions of annotated e-commerce records accumulated by Taobao search over the years. We compared the general e-commerce segmenter in Open Search with the open source IK segmenter. On 100 real e-commerce queries, the Open Search segmentation was better for 63 queries, and the ratio of better to worse results was more than 4:1.

Building on the general e-commerce segmenter, we worked with the natural language processing team of DAMO Academy to optimize the e-commerce industry template and introduced the e-commerce enhanced analyzer and the corresponding query analysis algorithms. Specifically, the F1 score of e-commerce word segmentation rose to 95%, the F1 score of entity recognition rose to 80%, the FAR of spelling correction fell to 1.4%, and more than 100,000 e-commerce synonyms were added, all of which are at a leading level in e-commerce NLP.

Here are some comparisons between the general analyzer and the e-commerce enhanced analyzer. In addition, for e-commerce and retail customers in different fields and vertical categories, we offer dedicated algorithm customization services, including customer-specific query analysis, CTR prediction, vector models, and personalization models, to improve search quality across the board.

One-click configuration

For e-commerce users, and especially for retailers that are just beginning their move to the cloud, we provide one-click configuration. Users only need to select the functions they want in the console, such as recall, query analysis, sorting, and peripheral search services, and the corresponding application structure, index structure, and function policies are generated automatically, giving a complete one-click configuration of e-commerce search.

Customer cases

Customers in e-commerce industry

The following briefly introduces two typical customer cases in e-commerce and retail search. The first customer is an e-commerce shopping platform app that offers product search, coupon guidance, and similar functions. At first, the customer developed its own search but soon hit bottlenecks. For example, with an index of hundreds of millions of products, complex search and filtering requirements often hurt search performance, especially during e-commerce promotions, when peak traffic rises sharply. After evaluating a range of products and solutions, the customer chose the MaxCompute + Open Search solution. The elastic O&M mechanism of MaxCompute suits e-commerce well, and Open Search ensures both the performance and the quality of the search service. After a period of continuous use, the customer has given positive feedback, particularly on the stability of the engineering and O&M, which lets its team concentrate on business and algorithm research and drive product revenue and growth.

Retail customers

The other customer is a retailer that connected recently: a supermarket brand with more than 10,000 stores around the world. With China's new retail market developing rapidly, online business is especially important for expanding quickly and strengthening brand influence. At first, this customer also built its own search and used it in its online mall, but the results fell far short of expectations and the shopping experience was poor. The customer then adopted the Open Search e-commerce industry template, whose built-in multi-path recall, personalized sorting, and other functions greatly improved search quality. Half a month after adoption, the overall add-to-cart conversion rate had increased by 10%, and the no-result rate had dropped from 29% to 7.5%. The customer also noted that the fully managed cloud service model of MaxCompute + Open Search greatly reduces staffing and O&M costs, which makes the overall solution highly cost-effective.

IV. More solutions

Multi-mode, multi-scene search effect optimization

In e-commerce, besides product search, there are many simple conditional search scenarios, such as order search, favorites search, and category search. In these scenarios, MaxCompute + Open Search provides database search acceleration with high performance and high real-time capability.

In addition, the vector recall capability of Open Search can power search by image, such as photographing a product to find it, which has become another typical application scenario: reverse image search.

On this basis, combined with Intelligent Recommendation and other cloud products from Alibaba Cloud, e-commerce applications can be supported end to end across search, recommendation, and advertising.

More open engine capabilities

In another direction, Open Search is working to open up more of its engine capabilities on the cloud for developers to use. This is expected to launch at the end of September and will provide a more open ecosystem and comprehensive customization capabilities.

This article is the original content of Aliyun and shall not be reproduced without permission.