Background
With the advent of the digital age, the food delivery market has grown rapidly in recent years. Delivery efficiency and user experience are critical for a takeout logistics system. Because the actual delivery is ultimately carried out by the rider, truly improving delivery efficiency takes more than an intelligent scheduling system (ETA estimation, order assignment, path planning, capacity adjustment). The riders’ own “additional” abilities must also be strengthened, so that they deliver with more familiarity, more smoothly, and faster. With this as the starting point, the R&D team of Meituan-Dianping designed an intelligent assistant for riders to improve their abilities in all respects.
At the AICon Global Artificial Intelligence and Machine Learning Technology Conference in January, He Renqing, head of delivery artificial intelligence at Meituan-Dianping, gave a talk titled “Technology and Practice of the Meituan-Dianping Rider Intelligent Assistant”. Faced with a complex environment and a diverse user base, the intelligent assistant uses a smart headset and voice interaction as its carrier and draws on big data mining, machine learning, natural language processing and other technologies to provide precise recognition of complex scenes, intelligent service push, intelligent guidance and fully voice-driven operation. The aim is to improve riders’ delivery ability along multiple dimensions such as intelligence, safety, convenience and accuracy, and thereby the overall delivery efficiency and user experience. The following is a summary of the talk:
The business value of AI technology to same-city delivery
In general, logistics business is a traditional industry, but with the rise of e-commerce, mobile Internet and mobile payment, the whole logistics industry has achieved sustained and rapid development in recent years.
The figure above is from a report released by the China Federation of Logistics and Purchasing in 2016. According to the survey data, the number of logistics items in China increased by more than 50% month-on-month, reaching over 30 billion pieces.
At the same time, the cost of logistics is very high. As the figure shows, logistics cost accounts for 15% of GDP, whereas in Europe, the United States and Japan the proportion is only about 8% ~ 9%, so China’s logistics industry has a lot of room for optimization. This is also an important reason why many companies invest in logistics: the industry is developing rapidly, and there is considerable room for optimization in experience, efficiency and cost.
The following figure mainly introduces the current development of Meituan Takeout:
Launched in 2013, Meituan Waimai now serves 250 million users and more than 2 million merchants in over 1,300 cities, with peak daily orders exceeding 18 million. Meituan’s intelligent delivery scheduling system matches orders to more than 500,000 riders every day and, based on massive data and artificial intelligence algorithms, keeps the average delivery time to no more than 28 minutes. It is also the largest and most complex multi-person, multi-point real-time intelligent delivery scheduling system in the world.
Our positioning for Meituan delivery is to make it the largest instant delivery platform.
Compared with traditional logistics, instant delivery has the following advantages:
- First, it is very fast. Delivering a takeout order from the merchant to the user takes 30 minutes on average, and about an hour at the slowest. Speed is one of its most important characteristics, and it also raises the requirements on the whole service and greatly improves service quality.
- Second, it connects users and merchants directly. Traditional logistics starts from the merchant and passes through many links, including warehousing, transport scheduling and staff allocation, before the goods finally reach the user. The goods change hands several times along the way, sometimes between different companies or franchisees. Instant delivery connects users with merchants directly and therefore reaches the target audience directly, which is of great value.
- Third, it can serve a variety of delivery scenarios. It can deliver not only takeout but also supermarket goods, fresh produce and so on. Basically, all same-city express delivery can fall within its service scope.
In general, distribution is a very complex business. In order to facilitate everyone’s understanding, I have abstracted and simplified this business model, which can be illustrated by the following picture.
In essence, distribution is the process of matching users’ distribution needs with various offline transport capabilities (such as riders or vehicles). Matching is divided into offline matching and online matching. Offline matching mainly depends on operation, while online is some systems constructed by our technical department. At this level, what we need to solve is how to achieve the optimal matching between the demand and the capacity.
This is also a fairly classic problem. Advertising and recommendation face it too: the demand is the products to be recommended, the supply is the ad slots, and the slots are limited. Achieving the best match between supply and demand is itself an efficiency optimization problem. But whereas advertising and recommendation typically rely on CTR prediction, the methods used in logistics are more complex.
The complexities in distribution, in particular, are:
- It is an NP-hard problem, and the computational complexity grows exponentially with scale. Examples are path planning for N orders carried by one rider and order allocation between M orders and K riders; both are exponentially complex, and they are interrelated.
- It is not only a multi-point pickup, multi-point delivery problem. New orders arrive at any time, so there are very strong real-time computing requirements: when a new order is created, the scheduling computation has to finish within tens of milliseconds. Compared with traditional logistics, which has tens of minutes or more of computation time, an instant delivery system is much harder to design.
- The delivery scene is very complex, involving dozens of factors such as weather, road conditions, rider proficiency and restaurant preparation speed. This greatly increases the randomness and complexity of the solution space and poses great challenges to the stability and adaptability of the delivery algorithm.
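To make the growth in the first point concrete, here is a minimal sketch of exhaustive route planning for a single rider. It is an illustration only, not Meituan's scheduling algorithm: the function names, the straight-line distance metric and the coordinates are all assumptions. Each order contributes a pickup stop and a drop-off stop, and the pickup must come first, which leaves (2N)!/2^N valid sequences for N orders.

```python
from itertools import permutations
from math import dist, factorial


def count_valid_sequences(n_orders):
    """Number of stop orderings in which every pickup precedes its drop-off."""
    return factorial(2 * n_orders) // 2 ** n_orders


def best_route(start, orders):
    """Exhaustively search pickup/drop-off sequences for one rider.

    orders: list of (pickup_xy, dropoff_xy) tuples (hypothetical planar coordinates).
    Returns (total_distance, sequence_of_stop_labels).
    """
    stops = {}
    for i, (pickup, dropoff) in enumerate(orders):
        stops[("P", i)] = pickup
        stops[("D", i)] = dropoff

    best = (float("inf"), ())
    for seq in permutations(stops):
        # A meal must be picked up before it can be dropped off.
        if any(seq.index(("P", i)) > seq.index(("D", i)) for i in range(len(orders))):
            continue
        length, here = 0.0, start
        for stop in seq:
            length += dist(here, stops[stop])
            here = stops[stop]
        best = min(best, (length, seq))
    return best
```

With five orders there are already 113,400 valid sequences for one rider, before the question of which rider should take which order is even considered, so exhaustive search of this kind cannot meet a budget of tens of milliseconds at real scale.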
For Meituan delivery, accomplishing this task involves roughly three layers, as shown on the far right of the figure above.
- The first layer is logistics infrastructure: how to set up stations in a city, how to deploy manpower, and how to arrange merchant supply. This infrastructure not only deeply affects the scale, cost and efficiency of delivery but is also the basis of logistics management and operation; for example, franchisee management and rider operations must be built on this structure. Infrastructure is therefore very important, and because it is hard to adjust in real time, it puts the technology’s long-term forecasting and planning ability to the test.
- The second layer is the dynamic balancing of supply and demand, which regulates the market through the pricing mechanism. One aspect is basic pricing: when an order arrives, how much to charge the user, how much to charge the merchant, and how much to subsidize the rider. Many factors have to be considered to keep pricing reasonable and fair. The other aspect is supply and demand balance: in emergencies such as bad weather, user demand and capacity supply can be adjusted in real time through dynamic pricing to keep the whole system stable and protect the user experience.
- The third layer is the real-time matching of orders and riders, that is, dispatch. Within tens of milliseconds of an order appearing, it is assigned to the most suitable rider and the path planning for that rider’s multiple orders is completed. This is an NP-hard problem, and because new orders are generated constantly, it must be computed in real time, which places great demands on the parallel computing engine. The optimization goal of dispatch is to improve overall delivery efficiency while protecting the user experience, and it is one of the core modules of the entire delivery system.
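As a rough illustration of the third layer, the sketch below assigns a new order to whichever rider would incur the smallest extra travel distance if the pickup and drop-off were appended to the end of their current route. This greedy rule is an assumption chosen for brevity and is not the algorithm described in the talk; a production dispatcher would also re-plan each route and weigh time windows and user experience. All names are hypothetical.

```python
from math import dist


def route_length(points):
    """Total straight-line length of a route given as a list of (x, y) points."""
    return sum(dist(a, b) for a, b in zip(points, points[1:]))


def assign_order(riders, pickup, dropoff):
    """Greedily assign one order and return the chosen rider id.

    riders: dict mapping rider id to a route, i.e. the rider's current
    position followed by the stops already planned (hypothetical structure).
    The chosen rider's route is extended in place.
    """
    def added_cost(route):
        return route_length(route + [pickup, dropoff]) - route_length(route)

    best_id = min(riders, key=lambda rider_id: added_cost(riders[rider_id]))
    riders[best_id] = riders[best_id] + [pickup, dropoff]
    return best_id
```

Each call is linear in the number of riders, which is the kind of cost a real-time system can afford, but the result is only locally optimal: it never reconsiders earlier assignments.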
The above is our understanding of delivery as a whole. Next comes the question of how to put it into practice by technical means.
Viewed as an AI problem, how should delivery as a whole be classified? The diagram below provides an explanation.
There are two dimensions to an AI problem. One dimension is whether the machine is faster than humans and whether its results are better than those of humans.
The other dimension is the role AI plays. The first role is perception: can it perceive the world? Image recognition, speech recognition and OCR, for example, perceive the world as humans do. The second is cognition: given “What’s the weather like today?”, the system must not only transcribe the speech into text but also understand qualifiers such as “today”. The third is decision making. The most popular artificial intelligence applications today are at the “how to make decisions” level, with the aim of making better decisions than people. Typical applications include smart assistants, especially those that help people make decisions and complete tasks better (chatbots do this less well); driverless cars; logistics, for example how to dispatch orders and deliver them by driverless vehicles or other means; medical care, where AI helps doctors make decisions; and games, where game AI can fight monsters and level up on behalf of users who are offline.
It can be seen that at the level of distribution, we will involve intelligent assistant, intelligent logistics, unmanned driving and other dimensions. In order to improve the overall intelligence of distribution, we built our own “Meituan Distribution AI”, which can be divided into two parts:
The first part is informatization, that is, data collection. What kind of data is collected? We need data down to the business district, the residential community and even the individual building: where a building is, and whether riders are allowed into the community. We also collect weather data such as wind speed, temperature and fog, because all of this affects delivery efficiency and users’ ordering behavior. For example, if it is foggy today, takeout orders in Beijing are expected to rise.
The second part is intelligence, that is, building a set of intelligent modules that together form an intelligent delivery system covering every link of delivery.
In order to achieve the challenging goal of “Meituan delivery AI” and consider the long-term development of the whole industry, our overall AI layout is as follows:
- The first is breadth. Our goal is to apply AI to the whole delivery process and chain, covering every step from the moment the user places an order. Our overall technical direction is therefore very broad: it spans three major disciplines, and it carries out technical research and business implementation in prediction, mining, pricing, planning, scheduling and hardware.
- The second is depth. This covers not only technical aspects such as basic computing frameworks and model research, but also the deep integration of technology with the delivery business. One example is the delivery simulation platform, which can simulate multiple delivery scenarios and accurately estimate the effect of different business strategies without putting them online. At the same time, intelligent solutions should be tailored to the state of the industry, for example by designing more effective rider incentive and retention mechanisms for rider operations.
The Meituan takeout voice assistant is a good example of how we combine breadth and depth. Next, I would like to share some experience from designing and building the intelligent assistant on how to put artificial intelligence technology into practice, both in the assistant itself and in the logistics business as a whole.
Meituan Takeout intelligent voice assistant positioning
Why do we need an intelligent voice assistant? Under what circumstances does a rider need its services, and what is the key to the whole service? Let us first explain the problem. As shown in the picture above, a rider goes through a number of steps during delivery, and they fall into two parts.
The first part is online decision making, which involves a variety of decisions. For example, a rider carrying an order to a user has to decide whether to call the user. In some places no call is needed: in a residential building, the rider can assume with high probability that the user is at home. In other places a call is required: in an office building the rider cannot go upstairs, so the user has to be called down in advance.
But how much in advance? Is it one minute earlier, two minutes earlier, five minutes earlier? This problem is very critical. If the call time is early, users will come down early, which will cause users to wait for the rider. The user experience is not good, and there may be complaints. If the rider is very conservative and calls downstairs, but the user lives on the 10th floor, it may take 10 minutes for the user to get down, including waiting for the elevator, which becomes very inefficient.
The second part is the rider’s operating process, because the rider interacts with the phone frequently. Just checking an order is complicated: it takes five or six steps to take out the phone, unlock it, open the App, check the information, perform the action (such as tapping Finish) and finally put the phone back. Even when done quickly, this takes 10 to 20 seconds. Many riders do it while riding, which can be very dangerous.
To sum up, the difficulties encountered by delivery riders can be summarized into three major levels:
- First, the task is complex and requires a lot of decision making, although the complexity varies with the rider’s proficiency.
- Second, the operation is cumbersome: it takes about five or six steps and at least 10 to 20 seconds, maybe longer.
- Third, it is very dangerous for riders to operate mobile phones while riding. For a platform with half a million riders, we have to consider the safety of riders throughout the ride.
Based on these considerations, we developed Meituan Takeout voice assistant, whose positioning mainly includes the following three points:
- The first point is safety. We need a voice interaction scheme for the whole process. Every step of delivery can be operated by voice, so the rider does not need to look at the phone; this frees the hands and makes the rider safer. For example, when an order arrives while the rider is on the road, the system asks whether to accept it, and the rider only has to answer “yes”, “no” or “OK” to complete the whole process. There is no need to pull out the phone and operate it, which is very popular with riders.
- The second point is a design with minimal steps. Every operation can be completed in one or two steps: first the information is announced, then the rider completes the operation with a voice command. The original five or six steps are reduced to one or two.
- The third point is that it provides many intelligent services. The most typical case is when the rider is heading to a building and the user may be on the 4th or 5th floor. How long will it take the user to come down? An intelligent recommendation is made according to the user’s address information.
The analysis above covers the most critical points of how to put an intelligent voice assistant into practice in this scenario. The core of what we want to deliver is helping riders complete delivery tasks, rather than “chat” or “Q&A”. This requires the whole voice interaction process to be both very convenient and very intelligent.
The first challenge we encountered was how to design interaction patterns.
As shown in the figure above, the general voice assistant scheme on the left requires four steps: wake-up, reply, request and second reply. It does not meet the requirements of delivery scenarios. First, the rider’s surroundings are very noisy, with wind noise, traffic noise, shopping mall noise and so on, which makes wake-up difficult. Second, four steps are too tedious given the rider’s working state.
So what can be done? We asked ourselves whether we could come up with a solution that needs no wake-up. The answer is yes!
First, we have a lot of data, covering riders, users and merchants. This data is real-time, and we know much more about the overall delivery situation than the riders do. Second, we can make accurate predictions. By using machine learning, intelligent scheduling and other technologies, we can identify the rider’s next operating scene.
For example, a rider may be carrying several orders and moving toward some location. Through scene analysis we know which user the rider is about to deliver to, which floor of the building that user is on, and roughly how many minutes it takes to come down, so we can calculate the best moment to remind the rider to call. In this way we can skip the wake-up and reply steps and send the reminder directly to the rider, who simply answers yes or no. Only a design like this matches riders’ actual offline delivery conditions and truly solves practical problems for them, and only then can it truly be called “intelligent”.
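The reminder logic in this example can be sketched as a single comparison: prompt the rider once the remaining travel time to the building is no longer than the time the user is expected to need to come downstairs. This is a minimal sketch under assumptions; the function name, its parameters and the idea that both durations are already available as predictions in seconds are illustrative and not taken from the talk.

```python
def should_remind_to_call(rider_eta_s, user_descent_s, needs_call, already_reminded=False):
    """Return True when it is time to prompt the rider to call the user.

    rider_eta_s: predicted seconds until the rider reaches the building.
    user_descent_s: predicted seconds the user needs to come downstairs.
    needs_call: whether this address type calls for a phone call at all.
    """
    if not needs_call or already_reminded:
        return False
    # Prompt once the rider's remaining travel time is no longer than
    # the time the user is expected to need to come downstairs.
    return rider_eta_s <= user_descent_s
```

Because the system pushes the prompt at the right moment, the rider only has to answer yes or no, which is what removes the wake-up and reply steps from the interaction.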
AI Core Technology
The technology is divided into several main parts. The first part is infrastructure, including speech recognition and semantic understanding. There is so much open source material available now that general speech recognition is not too hard to do.
In our scenario we have to deal with all kinds of environmental noise. The rider may not be talking at all, but nearby noise, such as traffic or even a song playing on the street, can be recognized as the rider speaking, so VAD (voice activity detection) requires a lot of work.
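For readers unfamiliar with VAD, the following is a minimal energy-threshold sketch of the idea: split the audio into frames and keep runs of frames whose RMS energy stays above a threshold. It is a textbook baseline, not the method used in the rider assistant, and the frame length, threshold and minimum run length are arbitrary assumed values.

```python
from math import sqrt


def frame_rms(frame):
    """Root-mean-square energy of one frame of audio samples."""
    return sqrt(sum(s * s for s in frame) / len(frame))


def detect_speech(samples, frame_len=160, threshold=0.1, min_frames=3):
    """Return (start, end) sample ranges judged to contain speech.

    A range is reported when at least min_frames consecutive frames have
    RMS energy at or above threshold. Trailing samples that do not fill
    a whole frame are ignored.
    """
    segments, run_start = [], None
    n_frames = len(samples) // frame_len
    for i in range(n_frames + 1):
        # The extra iteration (i == n_frames) closes a run that reaches the end.
        loud = i < n_frames and frame_rms(samples[i * frame_len:(i + 1) * frame_len]) >= threshold
        if loud and run_start is None:
            run_start = i
        elif not loud and run_start is not None:
            if i - run_start >= min_frames:
                segments.append((run_start * frame_len, i * frame_len))
            run_start = None
    return segments
```

A detector like this cannot tell a rider's voice from loud traffic or music, because both simply exceed the energy threshold. That limitation is exactly the difficulty described above, and it is part of why so much additional work is needed.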
Another basic component is NLU, natural language understanding. Suppose the rider says to call the number ending in 6551. First, the system has to recognize that the rider’s intent is to make a phone call, and then trigger the call operation. Second, it has to know who is being called: the user, not the merchant, which means looking up the user information. Third, it has to validate the request. For example, if the rider has already delivered that order, calling the user again is probably a mistake, so the rider needs to be reminded.
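The three steps above (intent, target lookup, validation) can be sketched with a regular expression and a lookup over the rider's current orders. This is an illustration under assumptions: the `Order` fields, the English command phrasing and the returned action labels are all hypothetical, and a real NLU component would use a trained model rather than a single pattern.

```python
import re
from dataclasses import dataclass


@dataclass
class Order:
    order_id: str
    user_phone: str
    delivered: bool


# Intent: the word "call" followed by exactly four digits (the phone tail number).
CALL_PATTERN = re.compile(r"\bcall\b\D*(\d{4})(?!\d)", re.IGNORECASE)


def handle_utterance(text, orders):
    """Map recognized text to an (action, detail) pair."""
    match = CALL_PATTERN.search(text)
    if match is None:
        return ("unknown", None)
    tail = match.group(1)
    # Target lookup: find the user whose phone number ends with the tail digits.
    candidates = [o for o in orders if o.user_phone.endswith(tail)]
    if not candidates:
        return ("no_match", tail)
    order = candidates[0]
    # Validation: calling about an order that is already delivered is probably a mistake.
    if order.delivered:
        return ("confirm_already_delivered", order.order_id)
    return ("call_user", order.order_id)
```

Returning a separate action for the already-delivered case lets the assistant ask the rider to confirm instead of silently placing a call that is likely unintended.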
The instant delivery scenario is a typical time series problem. As the figure above shows, the scene is contextually correlated: a rider’s past behavior and decisions affect the present, and present decisions and behavior affect the future.
Scene identification has two main objectives. One is event prediction: knowing what will happen at the next moment, such as whether the rider has arrived at the merchant and whether the merchant has finished preparing the meal. The other is timing prediction: when, in the near future, is the best time to call?
To illustrate, take the case of a telephone call.
First of all, it is necessary to determine whether a phone call is needed. If you are frequently reminded to call when you do not need it, it is harassment for both the rider and the user. The figure above shows the proportion of calls made by riders in different address types. It can be seen that the proportion is very high in enterprises and office buildings, but very low in residential areas, where most users are at home.
Secondly, for each community and building type, we give an appropriate time to call, that is, how far in advance gives the best experience for both riders and users. If the call is too early, the user waits downstairs for the rider and the experience is poor. If it is too late, the rider waits downstairs for the user, which is inefficient. We have precise rider trajectory data. For each building and each neighborhood, we know how long riders wait downstairs when the call is made at different times, so we can plot a curve. The appropriate interval lies between the two red lines.
The first two steps are mainly big data analysis. The last step is to decide in real time which order to call about and when. Based on the rider’s real-time data, including order status, trajectory status and environment, and combined with the preceding big data analysis, the rider’s next delivery location and delivery task are predicted in real time, and the voice assistant gives a reminder at the appropriate moment.
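The offline part of this analysis, choosing a call lead time for one building from historical records, might be sketched as follows. The record layout is an assumption: the talk only mentions rider waiting time, and the user waiting time is added here so that calling too early is also penalized. The bucket size is arbitrary.

```python
from collections import defaultdict
from statistics import mean


def best_call_lead_time(records, bucket_s=60):
    """Pick the lead-time bucket with the lowest average total waiting.

    records: iterable of (lead_s, rider_wait_s, user_wait_s) for one building,
    where lead_s is how many seconds before arrival the call was made.
    Returns the lower bound of the best bucket in seconds.
    Raises ValueError if records is empty.
    """
    buckets = defaultdict(list)
    for lead_s, rider_wait_s, user_wait_s in records:
        bucket = int(lead_s // bucket_s) * bucket_s
        buckets[bucket].append(rider_wait_s + user_wait_s)
    return min(buckets, key=lambda b: mean(buckets[b]))
```

Computing this per building rather than per city matters because, as discussed later in the talk, the time to get up and down differs widely between buildings.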
In terms of implementation, scene recognition requires three technologies: rider track mining, machine learning and data mining.
Let us start with trajectories. We collect billions of location records every day, and we can do a lot with that data.
First, we can accurately determine the best route between A and B. Compared with a third-party map, trajectory data can uncover a better riding route between A and B.
Second, trajectory data alone is not enough. We also need to solve indoor positioning: GPS is not adequate indoors, so a new technical system is needed. This means designing hardware and deploying it in stores in order to determine whether a rider has arrived at the store.
Third, we use sensors. Whether indoors or outdoors, we need to know not only the rider’s precise position but also the mode of movement, such as standing still, walking, riding, climbing stairs or taking the elevator. This information tells us what the rider is doing, and it is also very valuable for pricing and scheduling because it accurately describes how difficult a delivery is.
We can correct navigation and positioning using riding trajectories. Let’s look at two examples.
The first example (left) shows the distribution of user locations at the time of ordering. Because people place orders indoors, the location error is very large. After correction with rider trajectories, however, there turn out to be only about four points, each of which can be regarded as an entrance to the building. This greatly improves the accuracy of user positioning and makes delivery easier for the rider.
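One simple way to obtain a handful of entrance points from many noisy positions is to cluster the places where riders actually stopped to hand over orders. The sketch below uses a greedy single-pass clustering with a fixed radius. It is an assumed illustration, not the method from the talk, and it expects planar coordinates in metres rather than raw latitude and longitude.

```python
from math import dist


def cluster_points(points, radius):
    """Greedy clustering: join the first cluster whose centroid is within radius.

    points: iterable of (x, y) positions in metres (hypothetical planar coordinates).
    Returns a list of (centroid_x, centroid_y, count) tuples.
    """
    clusters = []  # each entry is [sum_x, sum_y, count]
    for x, y in points:
        for c in clusters:
            centroid = (c[0] / c[2], c[1] / c[2])
            if dist((x, y), centroid) <= radius:
                c[0] += x
                c[1] += y
                c[2] += 1
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append([x, y, 1])
    return [(c[0] / c[2], c[1] / c[2], c[2]) for c in clusters]
```

The result depends on the order in which points are visited, so a production system would more likely use a density-based method; the sketch is only meant to show how scattered order locations collapse into a few candidate entrances.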
In the second example (right), the riding path between points A and B is corrected using rider trajectories. In the upper image, trajectory analysis shows that cutting through the residential community is shorter and saves time. In the lower image, the original map navigation crosses the overpass in the middle, but the trajectories show that more riders go around it, which is more realistic.
Here are some machine learning-related techniques that can be applied to various time prediction levels.
Only a high-precision ETA (estimated time of arrival) makes it possible to predict rider behavior accurately. We estimate three components in detail: the horizontal travel time, the time spent going up and down stairs, and the time merchants take to prepare meals. In this way the rider’s delivery process can be described comprehensively and at a fine granularity.
To this end, we have done a lot of foundational work, such as a real-time feature platform and a machine learning platform, including deep learning models and other machine learning related work. We are also building a delivery knowledge graph, for example with fine-grained address resolution.
The address is very important information for delivery. Through NLP and map search it can be parsed into a hierarchical structure, which is very helpful for profiling business districts and buildings. We divide an address into four levels: community, building number, unit number and floor. There are many practical problems to solve, such as non-standard and ambiguous information entered by users.
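The four-level structure can be illustrated with a small parser. This is a sketch under strong assumptions: it handles only a hypothetical comma-separated English format such as "Wangjing Garden, Building 3, Unit 2, Floor 5", whereas the real system parses free-form user input with NLP and map search. Anything that does not fit the pattern is rejected rather than guessed.

```python
import re

ADDRESS_PATTERN = re.compile(
    r"^\s*(?P<community>[^,]+?)\s*"
    r"(?:,\s*Building\s+(?P<building>\w+)\s*)?"
    r"(?:,\s*Unit\s+(?P<unit>\w+)\s*)?"
    r"(?:,\s*Floor\s+(?P<floor>-?\d+)\s*)?$",
    re.IGNORECASE,
)


def parse_address(text):
    """Split an address into community, building, unit and floor.

    Returns a dict with those four keys (missing levels are None and
    floor is an int), or None when the text does not match the format.
    """
    match = ADDRESS_PATTERN.match(text)
    if match is None:
        return None
    parts = match.groupdict()
    parts["community"] = parts["community"].strip()
    if parts["floor"] is not None:
        parts["floor"] = int(parts["floor"])
    return parts
```

Returning None for input that does not fit, instead of a partial guess, makes the non-standard and ambiguous cases visible so that they can be routed to a more capable resolver.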
What the data actually shows once this is done is interesting. We analyze it through the specific scenario of “time spent going up and down stairs”.
The first picture shows the time for going up and down in different buildings. The two on the left are two buildings in Xiamen, and the two on the right are the Xiamen average and the national average. The time varies greatly from building to building and cannot simply be replaced by a city-level or national average.
The second picture shows the time for going up and down by floor, from B2 to the 8th floor. Interestingly, the time is not linear in height: between the second, third and fourth floors the increments are large, but between the fifth, sixth and seventh floors the differences are very small. The reason is simple: for lower floors riders may choose to climb the stairs, while for upper floors they take the elevator. The elevator’s time between floors is short, and the higher you go, the shorter the increment.
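The non-linear shape described here can be reproduced with a simple piecewise model: stairs for the first few levels, and above that a fixed elevator wait plus a small per-level travel time. The function and every numeric default below are assumptions chosen only to show the shape; they are not measured values from the talk, and the model covers one direction of travel only.

```python
def vertical_time_s(levels, stairs_max_levels=3, stair_s_per_level=20.0,
                    elevator_wait_s=60.0, elevator_s_per_level=3.0):
    """Estimate one-way vertical travel time in seconds.

    levels: number of storeys between the entrance and the user's floor
    (3 levels corresponds to the 4th floor when the entrance is on the 1st).
    """
    if levels <= stairs_max_levels:
        # Low floors: the rider climbs the stairs, so time grows quickly per level.
        return levels * stair_s_per_level
    # Higher floors: a fixed wait for the elevator plus a small per-level cost.
    return elevator_wait_s + levels * elevator_s_per_level
```

With these assumed defaults each extra level costs 20 seconds up to the 4th floor but only 3 seconds beyond the 5th, which matches the steep-then-flat pattern in the picture.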
The third chart shows the distribution of the duration of going up and down stairs in different cities. The most interesting line is the yellow line, that is, the overall duration of going up and down stairs in Chongqing is obviously longer. Because Chongqing is a mountain city, houses are often halfway up the hill, and compared with the plain, it is of course more difficult to get up and down the stairs.
The overall effect
The sections above introduced the scene recognition technology that the voice assistant relies on. Now let’s look at the overall effect of the voice assistant. The voice assistant provides four core functions: customized earphones, voice interaction, scene recognition and intelligent guidance.
Why custom headphones? In the rider’s environment, there is a lot of noise to overcome, which is hard to do with software and programs, but has to be done with hardware. So we cooperated with the manufacturer to customize the hardware with good denoising effect.
The second function is voice interaction, which covers the whole delivery process, such as accepting orders, making inquiries, picking up food and making phone calls. Riders do not need to look at their phones at any point and can complete the delivery simply by following the headset prompts.
The third is the intelligent guidance function, including safe driving reminders, information broadcasts and task map guidance. Its main purpose is to make riding safer, provide comprehensive information services, and make delivery more convenient and efficient for riders.
The following figure shows some actual data in the offline promotion of intelligent voice.
The blue line is the number of phone operations by riders who use the voice assistant, and the green line is the number by riders who do not. As you can see, the number of operations has dropped significantly. It has not fallen to zero for two reasons: riders do not need the voice assistant when they are at rest, and some riders’ Bluetooth headsets are not yet in place. Let’s look at the next picture:
The figure on the left shows the distribution of the time riders take to accept an order. The farther to the right, the longer the rider takes to accept the order and the worse the user experience. The green line is the distribution for manual order acceptance, which has a heavy long tail. With voice acceptance, the distribution shifts clearly to the left and the overall acceptance time is significantly reduced, which improves the user experience.
The graph on the right shows the distribution of the time riders spend handing the order over to the user. The horizontal axis is the time the rider waits downstairs for the user; the farther to the right, the longer the wait. With voice reminders, long waits are clearly reduced, which saves riders a lot of time.
Closing remarks
To sum up, speech recognition and voice assistants face many challenges in the actual implementation process, and most of them are related to the scene. Scene recognition is very important, even more important than speech recognition.
Speech recognition is now a fairly general technology, with many specialized vendors offering services as well as hardware, so it is relatively easy to customize. In terms of basic technology, building a voice assistant that combines hardware and software is therefore no longer a problem, and there are no major technical obstacles to producing a demo.
What really needs thought is how to implement a voice assistant within a specific business and its scenarios. In other words, how to take the voice assistant from “usable” to “easy to use”, and then make users “willing to use” it: these are the real challenges facing voice assistants in the future.
In order to realize the comprehensive intelligence of distribution, Meituan-Dianping has done a lot of work and attempts. It is not only about machine learning, but also about how to better optimize real-time operations, real-time spatial data mining and human-computer interaction.
About the author
He Renqing is head of delivery algorithm strategy at Meituan-Dianping. He joined Meituan-Dianping in 2016 and is in charge of the overall algorithm direction of Meituan’s delivery business, including the intelligent scheduling system, intelligent network planning system, machine learning platform and delivery simulation platform, which fully support the development of Meituan’s dedicated delivery, fast delivery, errand and other businesses. Before joining Meituan-Dianping, he was a T9 architect on Baidu’s Phoenix Nest team, working on NLP, data mining and search technology for search advertising.
If you are interested in our team, you can follow our column. Meituan-dianping takeout delivery team welcomes talents in big data, algorithms, machines and other fields to join us. Please send your CV to: sunbiqi#meituan.com
The original address: tech.meituan.com/herenqing_a…