The transcript of the talk follows:
Hello, I'm Wenkel. I'm very glad to share my experience in the e-commerce industry with you today through the Greedy Technology platform.
I used to work at KPMG in Southern California in data-related information services, and then moved to Revolve, a popular fashion e-commerce company based in Los Angeles and well known across North America. Last year I moved to Drinks, a wine e-commerce startup in Beverly Hills.
Today is Goddess Day (International Women's Day, March 8), so the ladies in the audience are presumably no strangers to e-commerce, and it's worth the men learning more about it too. Today's topic is the new favorites of e-commerce: data science and AI. Both fields have surged in recent years, and e-commerce has embraced them enthusiastically.
So let's take a look at what e-commerce is, along with its classification and current status in North America.
What is e-commerce? Personally, I think the following definition is appropriate: a business model that enables a firm or individual to conduct business over an electronic network, typically the World Wide Web / Internet that we are all familiar with.
E-commerce is everywhere now; you could say it runs through everyone's food, clothing, housing, and transportation. Like the shopping cart in the picture on the right, almost nothing is separable from it: buying and selling either happens in that cart or revolves around it.
The traditional classification divides e-commerce into four categories: B2C, C2C, B2B, and C2B. The two key parties are the Business (B, the merchant or enterprise) and the indispensable Consumer (C).
The most common model is B2C: buying and selling between merchants and individual consumers. In North America, the two biggest players are probably familiar to everyone: Amazon.com and Netflix.
Amazon.com stock has gone from around $900 last year to $1,500-$1,600 now and keeps climbing; its value gets higher and higher. Netflix is a video e-commerce provider: it started out by mailing discs to your home, and as a member you could pick movies each month for free and swap them for the ones you wanted to watch. These two are the most typical B2C companies today.
Traditional retailers like Walmart (Walmart.com), Target, and Best Buy all started as offline businesses, and now they are eager, or forced, to do everything online as well: you can browse and order on their .com sites and then go straight to the store for pickup.
The second common pattern is consumer-to-consumer (C2C): transactions between individuals. The picture is rough, but it should make sense.
The originator of C2C exchange is eBay, where individuals list the items they want to sell online and eBay provides the platform for others to purchase and bid on them.
Another example is Airbnb, a Bay Area unicorn that has become popular in recent years. It is a rental marketplace for idle homes and personal space: users can rent out rooms or houses they consider spare or temporarily unneeded.
Etsy is an interesting website and app for DIY and craft lovers. You can put your art and purely handmade pieces on the platform so that other craft lovers and artists can trade with each other.
Uber should be familiar to everyone; it is like Didi in China (Didi has since absorbed Uber's China business). Uber started as ride-sharing with private cars, drivers acting as temporary chauffeurs to give users a lift. Now it also has Uber Eats, which, like Meituan and Ele.me, delivers food as well as picking up passengers.
I would also like to mention Amazon again: it launched the Prime Now service and began trying food and grocery delivery, selling vegetables and fruit, and users can also open their own storefronts on Amazon to trade with one another.
The third type is business-to-business (B2B): direct transactions between businesses, at the enterprise level.
Let me give you an example: Amazon Web Services (AWS), Amazon's cloud technology service.
Some of AWS's most popular offerings:
The first is databases, such as Redshift, a fast, parallel data warehouse in the cloud;
The second is storage: S3 (Simple Storage Service), cloud storage that claims to be unlimited, so you can keep storing as long as you keep paying. There is also EC2 (Elastic Compute Cloud), enterprise-class servers in the cloud. With plenty of EC2 compute servers in the cloud, a company no longer needs its own machine room or data center to host data and web servers; it just pays Amazon to keep everything running, and this is where Amazon is decidedly profitable.
Microsoft offers Microsoft Azure, with cloud storage and cloud computing services that compete directly with AWS; the two don't differ much in what they offer. Google offers Google Analytics, but that is a little different: GA is not a cloud-server solution; it mainly provides tracking and records of data flow. In other words, it captures every move you make on an e-commerce site: where you click, how long you browse, what kind of device you are on, whether a tablet or a computer, and it records all of those user actions. Many e-commerce websites use GA for related reporting, that is, data analysis and processing.
Another example is Square, an interesting and novel B2B model aimed at small and medium-sized businesses, such as the food trucks common in North America, and at the self-employed. They can take card payments on a mobile phone, now also via NFC and Apple Pay, over a network or 4G connection, and Square charges a service fee. The service is very mobile and does not require a dedicated POS terminal.
The last common category is consumer-to-business (C2B): transactions flowing from consumers to merchants.
A few common examples:
The first is Google's advertising business. Open a free web page or free app in North America: although many say they are free, they come with ads, which are served through Google AdSense.
For example, links often pop up at the bottom left or bottom right of a web page, usually chosen automatically by AdSense to surface ads you might be interested in. When you click on these ads you are making money for Google, but you are also getting the features for free, so you are essentially trading your time and attention for free features.
The next two examples should be easier to understand: SurveyMonkey and SurveyGizmo run business surveys. After users complete surveys online, merchants pay for the feedback, and the platforms take a commission. As a consumer, if you spend your time and personal information on this research, you may also get some reward, such as discount cards, coupons, or vouchers rather than cash back. That is the consumer's side of the cooperation.
Let’s take a look at the application of data science and AI in e-commerce.
Data science, as described here, is data-driven science: an interdisciplinary field that combines algorithms and systems from multiple domains to extract support and insight from data in its various forms, as shown in the diagram on the left.
Generally speaking, an e-commerce transaction starts with browsing items at the computer, adding them to the shopping cart, placing the order, and paying by card. Then the order is received, the card charge succeeds, and confirmation emails go out: both the customer and the merchant receive the corresponding emails, and the warehouse packs the goods and prepares them for delivery.
The whole process generates a lot of data. One kind is traffic data, i.e. user activity and behavior, including impressions, click-throughs (CTR), and sessions, which measure how long you spend looking at a given item. Email activity is traffic on the mail side, including opens, clicks, subscribes, and unsubscribes; these are all traffic data.
Order history is the record of orders placed and what was bought and sold. Membership subscription records when you subscribed, for how long, and what kind of membership you have.
There is also user information data, which relates to the individual. The examples I cite here are demographic and geographic data. Demographic means the user's gender, age, income, where he or she lives, and whether he or she owns a house or a car. Geographic is location information: whether you live in a big city, a second- or third-tier city, or in the suburbs.
Similarly, each customer has his or her own traffic and transaction data: records of browsing and of completed transactions. E-commerce has many different kinds of data, and that is exactly where data science and AI come in: to extract knowledge and useful insights from that data.
Let’s look at three more interesting examples.
- Amazon GO, which Amazon officially opened to the public earlier this year;
- the smart speaker, which has become very popular recently;
- Netflix Artwork: a previous instructor has already covered user-level film recommendation.
Amazon Go is an unmanned store; Alibaba also has similar unmanned stores in China, and the underlying technology is presumably similar.
Amazon Go is currently running a trial in Seattle, where there is reportedly a line to get in because so many people want to see it.
As shown in this picture, the shop has no cashiers to check you out when you leave. Its main technology, based on the materials I have reviewed, is computer vision with deep learning for dynamic recognition of actions and behavior, plus various sensors and sensor fusion for identifying members and detecting items.
To enter the store, you need to download the Amazon Go app. After you scan the app's QR code at the gate, the system can use its sensors to identify you, pull up relevant information such as past purchases and whether you are a member, and let you in. Once inside, there should be cameras all over the ceiling (not visible in the photos), and your every move is fully recorded.
Let’s take a look at Computer Vision dynamic recognition. What is the core of this technology?
The core technique in computer vision here is the Convolutional Neural Network (CNN).
Here is a simple example. You usually start with a picture, which can be split into three color channels: RGB. Sometimes an image also carries transparency: RGBA. Each channel, red, green, and blue, has its own values, and a convolution is essentially a filter applied to each color layer.
For example, use a 3×3 filter to extract a feature from each patch of the image. As shown in the figure, the 3×3 filter moves across the image with a stride, here a stride of 1. After it sweeps from left to right and then from top to bottom, the layer is reduced to a convolved feature, a smaller matrix, which is often called a convolution layer.
After a convolution like this you can also do simple max pooling: take a 2×2 window with a stride of 2 and keep only the maximum value from each 2×2 box, for example a 6 from one box and the corresponding 8 from the next, for each layer. This shrinks the whole image while keeping the strongest features extracted from each color channel.
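To make the convolution-plus-pooling step concrete, here is a minimal NumPy sketch, assuming a single color channel, a 3×3 filter with stride 1, and 2×2 max pooling with stride 2 as described above; the random image and filter values are just placeholders.

```python
import numpy as np

def conv2d(image, kernel, stride=1):
    """Slide a k x k filter over a 2-D image and return the convolved feature map."""
    k = kernel.shape[0]
    out_h = (image.shape[0] - k) // stride + 1
    out_w = (image.shape[1] - k) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+k, j*stride:j*stride+k]
            out[i, j] = np.sum(patch * kernel)
    return out

def max_pool(feature_map, size=2, stride=2):
    """Keep only the maximum value in each size x size window."""
    out_h = (feature_map.shape[0] - size) // stride + 1
    out_w = (feature_map.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = feature_map[i*stride:i*stride+size, j*stride:j*stride+size].max()
    return out

image = np.random.rand(6, 6)        # one color channel of a tiny image
kernel = np.random.rand(3, 3)       # a 3x3 filter, stride 1 as in the talk
feature = conv2d(image, kernel)     # 4x4 convolved feature
pooled = max_pool(feature)          # 2x2 after max pooling
print(feature.shape, pooled.shape)
```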
From here you can see that different filters and strides produce different reduced matrices; after convolution and pooling, the resulting layers feed directly into the next layer of the network, which extracts features from all angles and directions.
Here is an example of what a convolutional neural network does: split a static image into 3 channels, apply convolution and max pooling to each to extract the strongest features, repeat convolution and pooling several times over different combinations, compress everything further in the later layers, and finally tell you whether the picture contains a dog, a cat, a boat, or a bird.
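As a rough sketch of such a stack of convolution and max-pooling layers followed by a classifier, here is a toy PyTorch model; the layer sizes, the 64×64 input, and the four classes (dog, cat, boat, bird) are illustrative assumptions rather than any production architecture.

```python
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    """Two conv + max-pool stages, then a fully connected classifier."""
    def __init__(self, num_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 14 * 14, num_classes),   # matches a 64x64 RGB input
        )

    def forward(self, x):
        # Return a probability per class (dog / cat / boat / bird in the example).
        return self.classifier(self.features(x)).softmax(dim=1)

probs = TinyCNN()(torch.randn(1, 3, 64, 64))
print(probs)
```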
Each prediction comes with a probability value, and the higher the probability, the more likely that kind of object appears in the picture. Here the top class is boat, and you can see the picture does indeed contain two boats.
The convolutional neural network just described classifies a static picture, telling us what kinds of things it contains. In Amazon Go, however, everyone is moving around while choosing goods, so we need to go further and build dynamic recognition on top of the CNN.
Dynamic recognition algorithms have also developed rapidly over the past decade. Let's start with the earliest approach, the sliding window algorithm. The idea is: in each static picture, define a window, like a filter, and scan it continuously from left to right and top to bottom. Each window crop is classified separately to decide whether the item I want, here a car, is inside.
In the sliding window algorithm the window size is basically fixed, and you sweep until the target object is found. For example, if the match score for a car reaches about 0.90, the window with the highest probability can be located and we can confirm there is indeed a car there.
The disadvantage is that it is very inefficient: a large image is decomposed into many smaller crops, and the continuous scanning is a real challenge for computing power, because it takes a lot of time to scan every crop and decide which one is most likely to contain the item.
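A minimal sketch of the sliding-window idea, assuming a `classify` function (in practice a CNN) that maps an image crop to a probability that the target object, e.g. a car, is present; the window size, stride, and stand-in classifier below are placeholders.

```python
import numpy as np

def sliding_window_detect(image, window=64, stride=16, classify=None):
    """Scan a fixed-size window over the image and return the crop where
    the classifier is most confident the target object is present."""
    best_prob, best_box = 0.0, None
    h, w = image.shape[:2]
    for top in range(0, h - window + 1, stride):
        for left in range(0, w - window + 1, stride):
            crop = image[top:top + window, left:left + window]
            prob = classify(crop)
            if prob > best_prob:
                best_prob, best_box = prob, (top, left, window, window)
    return best_prob, best_box

# Hypothetical usage with a stand-in classifier (a real one would be a trained CNN):
image = np.random.rand(256, 256, 3)
prob, box = sliding_window_detect(image, classify=lambda crop: crop.mean())
print(round(float(prob), 3), box)
```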
An algorithm that has become popular in recent years is YOLO, which stands for You Only Look Once: given an image, or a frame sampled from a video, it can quickly identify the objects in it.
How does it work? The main idea is to analyze the picture on a grid: divide it into many small cells and, for each cell, predict where the center of an object would fall. Using this car as an example, the network figures out where those centers are. It also predicts whether a cell contains an object at all, giving a high probability when it does.
When all the small cells containing parts of the car are combined into a larger box, a threshold has to be set: if a cell's score exceeds the threshold, it really does contain part of the object. Connecting all these small cells yields a larger box, and the CNN judges again whether the combined probability goes up or down, which tells us that the large box is a complete object pieced together from the small cells.
Of course, there are complications in this process. For example, several boxes may each appear to represent the same car, and many combinations of cells could form a reasonably complete object, so you need to decide which box best represents it.
In short, with the YOLO algorithm running on today's popular GPUs, each frame of the video can quickly yield small boxes around the items in it and a label for what each one is, whether a person or a product. From that, it can quickly be determined whether the user in the frame has picked something up.
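The "which box best represents the object" step is typically handled with non-maximum suppression over the candidate boxes. Below is a minimal sketch, assuming boxes are given as (x1, y1, x2, y2) corner coordinates with a confidence score each; the thresholds and example values are illustrative.

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, conf_threshold=0.5, iou_threshold=0.5):
    """Drop low-confidence boxes, then keep only the highest-scoring box
    among overlapping candidates for each object."""
    candidates = [i for i, s in enumerate(scores) if s >= conf_threshold]
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in kept):
            kept.append(i)
    return [boxes[i] for i in kept]

boxes = [(10, 10, 60, 60), (12, 8, 58, 62), (100, 100, 150, 150)]
scores = [0.9, 0.75, 0.3]
print(non_max_suppression(boxes, scores))   # keeps one box per detected object
```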
Moving on to the patent: Amazon filed a related patent in 2014.
From it you can see that each item is numbered, and the cameras and the network are all tied to that code. The video recorders and cameras process everything in real time: whenever a customer walks up to a shelf and takes an item with a given code, the system knows whether that item is still there, because its weight changes and the corresponding image changes.
Each item's code helps the system determine whether it has been removed, so there are many data sources for judging what happened to the goods. First, we can tell whether anything is left in a shelf compartment from the images fed through the convolutional neural network and from changes in weight and pressure. At the same time, we can also estimate whether a purchase is taking place based on the user's past transaction records.
Deep learning can also be used here: with all of these signals as input, it can judge whether the user has bought an item, taken it, or put it back, and make the corresponding decision.
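Purely as a hypothetical illustration of fusing these signals (Amazon's actual system is not public), here is a sketch that combines a camera-based probability, a shelf weight change, and the customer's purchase history into a single take / not-take decision; the feature scaling and weights are invented for the example and would be learned in practice.

```python
def item_taken(camera_prob, weight_delta_grams, past_purchase_rate):
    """Fuse three signals into one decision. All weights below are illustrative
    assumptions; a real system would learn them from labeled data."""
    features = [
        camera_prob,                                # CNN's belief the item left the shelf
        min(abs(weight_delta_grams) / 500.0, 1.0),  # normalized shelf weight change
        past_purchase_rate,                         # how often this user buys this item
    ]
    weights = [0.6, 0.3, 0.1]
    score = sum(f * w for f, w in zip(features, weights))
    return score > 0.5

print(item_taken(camera_prob=0.85, weight_delta_grams=-320, past_purchase_rate=0.4))
```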
Let’s move on to the Smart Speaker.
Since 2014, Amazon has invested heavily in Alexa, its smart speaker product. After everyone saw how good the market was, Google launched Google Assistant, and Microsoft launched Cortana as a speaker in 2016. By the end of last year Amazon had introduced further upgrades, including the Echo Show and Echo Look, which have cameras; Apple introduced the HomePod, tied to Siri, and looks set to release an updated version this year; Samsung has launched Bixby, which it mentions often in its advertisements.
Smart speakers are a hot product, and the major e-commerce and technology giants are all launching them to help people place orders and buy and sell things online. Let's focus on the Amazon Echo, the first smart speaker to break into the market.
The Echo now has very powerful functions. It can connect to various mobile apps for listening to music, radio, and news, watching TV, and calling a car, and it can be used with Amazon Fire TV at home. Smart appliances such as thermostats and light switches can also be controlled through the speaker.
The main technologies behind this are speech recognition and speech analysis. Speech recognition lets the smart speaker understand what people want to do, whether they speak English, Chinese, Arabic, or Japanese.
Here’s a look at speech recognition technology, which has been developing rapidly in recent years:
Originally, each audio band, like the one in the picture, was processed to extract phonemes, the basic units of pronunciation, much like the vowels and consonants in pinyin. Features were extracted from each phoneme, and the phonemes were pieced together into the corresponding words or phrases. That was the early, more engineering-driven approach to speech recognition; it has since evolved to use RNNs, recurrent neural networks.
Recurrent neural networks differ from traditional neural networks in that they are recurrent: the cells (see the figure above) are connected to each other in sequence. The activation a produced at one step is passed directly into the next step, and so on. Meanwhile, each input, a different word or phrase, enters its own cell individually, but every cell is connected to the previous one, forming a more complicated network that can be connected in the forward direction or, in bidirectional variants, in the opposite direction as well.
When you train this network, each cell outputs something like y1, y2, ..., and you work out what each y means. If you need to extract a name like "Teddy Bear", the outputs might look like 0011000, with a 1 for each word that belongs to the name. In a more complicated case, each y can be an entire vector: if the word "Teddy" is present it gets a 1 at its position in the vocabulary, and every unrelated word in the vocabulary is judged to be 0.
In other words, the network decides that the sound at a given step is close to a particular word. It skips the traditional phoneme step and works out what the audio says by looking directly at the whole waveform. That is the general working principle of modern voice recognition.
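Here is a minimal NumPy sketch of the recurrent step described above: each word vector enters its own cell while the hidden activation is carried forward, and each step emits a 0-1 score, as in the "Teddy Bear" 0011000 example. The vector sizes and random parameters are arbitrary placeholders, not a trained model.

```python
import numpy as np

def rnn_forward(xs, Wax, Waa, Wya, ba, by):
    """Simple forward pass: the hidden state a is passed from cell to cell."""
    a = np.zeros((Waa.shape[0], 1))
    outputs = []
    for x in xs:                                   # one time step per word
        a = np.tanh(Wax @ x + Waa @ a + ba)        # recurrent hidden state
        y = 1 / (1 + np.exp(-(Wya @ a + by)))      # e.g. "is this word part of a name?"
        outputs.append(round(y.item(), 3))
    return outputs

rng = np.random.default_rng(0)
n_x, n_a = 8, 16                                   # assumed word-vector / hidden sizes
xs = [rng.standard_normal((n_x, 1)) for _ in range(7)]   # a 7-word sentence as vectors
Wax = rng.standard_normal((n_a, n_x)) * 0.1
Waa = rng.standard_normal((n_a, n_a)) * 0.1
Wya = rng.standard_normal((1, n_a)) * 0.1
ba, by = np.zeros((n_a, 1)), np.zeros((1, 1))
print(rnn_forward(xs, Wax, Waa, Wya, ba, by))      # one score per word, like 0011000
```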
With voice recognition in place, we can take the recognized text and do NLP, natural language processing, as well as semantic analysis.
Semantic analysis is also a hot field and can do all kinds of things; the most common include:
- Word frequency statistics: count how many times a word appears in a given file or document, or across the whole corpus, to get its (relative) term frequency, which can then be used as data input.
- NER, Named Entity Recognition, which specializes in recognizing names and other named entities.
- POS, Part-of-Speech tagging: labeling each word's grammatical role, such as subject, verb, and object in Chinese, or noun, adjective, and pronoun in English.
- N-grams, combinations of adjacent words: "cat" is a single word, while "running cat" is a two-word phrase, a 2-gram; n-gram analysis digs out these high-frequency word combinations (see the sketch after this list).
- Word embedding is very interesting: it maps words into a vector space according to their relationships, so that, for example, "man" and "woman" can be related along a gender direction; each word is expanded into a vector and thereby given deeper meaning.
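As a small illustration of the word-frequency and n-gram ideas above, here is a minimal sketch using only the Python standard library; the example sentence and the bigram size are arbitrary choices for demonstration.

```python
from collections import Counter

text = "the cat chased the cat and the running cat sat"
words = text.split()

# Word frequency statistics: raw counts and relative frequency within this "document".
counts = Counter(words)
relative = {w: c / len(words) for w, c in counts.items()}
print(counts.most_common(2))          # [('the', 3), ('cat', 3)]

# 2-grams: adjacent word pairs, then the most frequent combinations.
bigrams = Counter(zip(words, words[1:]))
print(bigrams.most_common(2))         # [(('the', 'cat'), 2), ...]
```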
With all of this, we can do further analysis, such as sentiment analysis: like or dislike, positive or negative tone. Many scoring systems are built on sentiment analysis.
We can also go further with GloVe or Word2Vec, using word embeddings to map each word into a vector space, so we can judge its role in a whole sentence, its specific meaning within a paragraph, or make predictions about the upcoming context.
We can also build chatbots that use these NLP techniques to chat with real people or solve simple problems for them, for example turning the lights on and off or turning on the TV: the key intent is extracted from the voice analysis and the requested action is carried out.
Let’s take one last look at Netflix Artwork.
It's an interesting example of data science and AI. The picture on the left is from a popular science fiction TV series in North America whose main plot has children adventuring between the real world and an imaginary space.
How should the posters be tailored to different groups of people? You can extract artwork from different scenes of the show to use as posters. Viewers who like horror may see a bloody or fiery one; viewers who like kids might see a poster full of children; and if someone likes a particular star, a poster featuring that star should appeal to them as a fan.
Here we need an effective recommendation engine: after the user sees the poster, will they click play?
The traditional method is to collect data first, build a good model, run various A/B tests, and then put the model into production on the front end. That can take a long time, weeks or even months. During that period the user's preferences may change a great deal, so that by launch time the model's earlier predictions of what the user would like no longer match the user at that point in time, and what gets recommended is not what the user really wants.
Netflix's latest approach is a form of reinforcement learning, characterized by rapid iteration and continuous optimization.
Let's look at how it is implemented, starting simple. The core idea is the multi-armed bandit: in RL terms, you define a state and obtain rewards through a value function Q. Like the octopus in the figure, each action taken has a different effect. Each machine is like a slot machine: pulling machine A or machine C may lose or may win, and their values differ.
The formula in the figure says that value equals the immediate reward plus gamma times the future value, i.e. Q = r + γ · Q_future. If gamma is 0, we ignore the expected value of the future, only look at the most recent reward, and keep acting that way without considering other possibilities.
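As a minimal sketch of the multi-armed bandit idea (not Netflix's actual system), here is an epsilon-greedy simulation in which each "arm" is one artwork variant and the reward is 1 if the user clicks play; the click-through rates below are made up for illustration.

```python
import random

true_click_rates = [0.05, 0.12, 0.08]            # unknown to the algorithm
q = [0.0] * 3                                    # estimated value per artwork
n = [0] * 3                                      # times each artwork was shown
epsilon = 0.1

for _ in range(10_000):
    if random.random() < epsilon:
        arm = random.randrange(3)                # explore a random artwork
    else:
        arm = max(range(3), key=lambda a: q[a])  # exploit the best-looking artwork
    reward = 1 if random.random() < true_click_rates[arm] else 0
    n[arm] += 1
    q[arm] += (reward - q[arm]) / n[arm]         # running-average value update

print([round(v, 3) for v in q])                  # should approach the true click rates
```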
For now, the approach Netflix reportedly finds best is the contextual bandit algorithm.
Context variables describe the user, and each user has a different background and preferences. The equation becomes more complicated here with the addition of a learning rate, and the customer's future Q now has to be predicted by a model.
After adding the learning rate α: if α equals 1, the old estimate drops out and the update reduces to the plain MAB algorithm; if the model does not need to think much about the future, you just use the simple optimization and pick whatever gives the highest utility each time. If you want the RL model to learn more and explore more unknown situations, α is set below 1 so that the effect of Q′ comes into play. Q′ is predicted with something like a deep neural network, taking the customer's browsing history, personal background, and perhaps age, gender, and other relevant information as input. You might have information on a very large number of customers, perhaps millions, and run deep learning training on it.
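Assuming the update takes the common form Q(s, a) ← (1 − α)·Q(s, a) + α·(r + γ·Q′), a small sketch shows how the learning rate controls how much the old estimate and the predicted future value each contribute; the numbers are illustrative.

```python
def q_update(q_old, reward, q_future, alpha=0.1, gamma=0.9):
    """One step of Q(s, a) <- (1 - alpha) * Q(s, a) + alpha * (reward + gamma * Q')."""
    return (1 - alpha) * q_old + alpha * (reward + gamma * q_future)

# With alpha = 1 the old estimate disappears entirely, which matches the point
# above that the update then reduces to the simple bandit case.
print(q_update(q_old=0.4, reward=1.0, q_future=0.3, alpha=1.0))   # 1.27
print(q_update(q_old=0.4, reward=1.0, q_future=0.3, alpha=0.1))   # 0.487
```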
The example I use here is stock market trading. With a budget, share holdings, and transaction records, a deep learning network can make decisions: buy, sell, or hold. The prediction of future returns is where Q′ comes in at the end: with the whole system you know the recent return and can predict the future return. Of course this Q is only a starting point and gets refined through continuous iteration. In this way the approach takes more variables into account than the one-off modeling mentioned earlier, and it iterates much faster.