A few days ago I wrote an article, “Which occupations are easily replaced by machine algorithms”, and many readers were not convinced: I am online every day, so why have I never noticed any machine algorithm? If algorithms are really so smart, why does registering an account still take a password, security questions and two-step verification? Why can’t that be made a little smarter?
At this stage, machine algorithms do not mean robots with advanced intelligence, nor bionic people with human emotions, but algorithms do play a variety of roles in our lives. For example, when you open a browser to wander around the Internet, an ad on some website shows a picture of badminton shoes from your favorite brand, and one click takes you to a page where you can buy them directly. The recommendation appears because a few days ago you bought a badminton racket of the same brand on that site. Or, to learn about artificial intelligence, you buy a book called Deep Learning. When you pay for it, several books such as Machine Learning in Action and Python Machine Learning pop up at the bottom of the page, and you can’t help buying another one…
This is the power of algorithms, specifically, recommendation algorithms.
With the development of information technology and the Internet, people have gradually moved from an era of information scarcity into an era of information overload. Recommendation systems, with recommendation algorithms as their core technology, are now widely used because they personalize what each user sees and effectively cut down information noise. Examples include Google and Facebook abroad and Toutiao in China.
However, just as many people assume that programmers and engineers are the people who repair computers, many people, especially those outside IT, badly misunderstand algorithms, seeing them at one extreme as mere arithmetic and at the other as magic. Next, I will take Toutiao in content recommendation and Amazon in product recommendation as examples to talk about recommendation algorithms, so as to help readers better understand online life in this era.
Myth 1: Recommendation algorithms are based on click-through rates
This is probably one of the biggest misconceptions about algorithms.
We often say that recommendation algorithms deliver personalized results, so that everyone sees something different. This overlooks an important fact: what most people like is actually highly similar, such as the hottest pop songs or the latest celebrity gossip.
Many years ago, Toutiao appeared, announcing that it would show you what you are interested in. The portal sites thought this was nothing special and did not follow up. They, too, had fallen into the trap of equating algorithms with clicks: ranking news by popularity was a feature every major portal had long had, so what was new?
Personalized recommendation that can really mine the long tail actually works against pure click ranking; otherwise it is hard to uncover individual demand. The system needs to track more dimensions of user information and use a variety of algorithmic models to discover and mine long-tail needs. The Long Tail gives a famous example. In 1988, Joe Simpson wrote a mountaineering book, “Touching the Void”, which sold only modestly. A decade later, another book about a mountaineering disaster, Into Thin Air, became a publishing sensation in the United States. After noticing that readers of Into Thin Air were mentioning “Touching the Void” and reviewing it highly, Amazon began recommending it to them. Soon, after a decade of lackluster sales, “Touching the Void” became a huge success.
In effect, Amazon was already doing what recommendation algorithms do today. The recommendation process should consider not only the user’s reading history but also the user’s gender, age, even phone model and other information. It should also weigh the timeliness of the news, the user’s geographical location and similar signals when choosing what to recommend. By contrast, “Touching the Void” might never have been surfaced if ranking had been based on clicks (sales) alone.
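The contrast between ranking by raw popularity and recommending by what similar readers bought can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the purchase baskets and the function names `most_popular` and `also_bought` are made up for the example, and it is not a description of Amazon’s actual system.

```python
from collections import Counter

# Hypothetical purchase histories: each set is one reader's basket.
baskets = [
    {"bestseller", "thriller"},
    {"bestseller", "cookbook"},
    {"bestseller", "thriller"},
    {"into_thin_air", "bestseller"},
    {"into_thin_air", "old_climbing_memoir"},
    {"into_thin_air", "old_climbing_memoir"},
]

def most_popular(baskets, exclude, k=1):
    """Click/sales-style ranking: the same top sellers for everyone."""
    counts = Counter(item for basket in baskets for item in basket)
    ranked = [item for item, _ in counts.most_common() if item not in exclude]
    return ranked[:k]

def also_bought(baskets, seed, k=1):
    """Co-occurrence ranking: what did buyers of `seed` also buy?"""
    co_counts = Counter()
    for basket in baskets:
        if seed in basket:
            co_counts.update(basket - {seed})
    return [item for item, _ in co_counts.most_common(k)]

print(most_popular(baskets, exclude={"into_thin_air"}))  # ['bestseller']
print(also_bought(baskets, "into_thin_air"))             # ['old_climbing_memoir']
```

Ranked by overall sales, the slow-selling memoir never reaches the top; ranked by what readers of the seed book also bought, it comes first.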
Myth 2: I already bought a refrigerator and it still recommends refrigerators, I clicked “not interested” and it still recommends the same thing, so the algorithm is not smart
If you had only one contact on WeChat, would you find it fun?
WeChat Moments needs more friends, and algorithmic recommendation likewise needs more data. The amount of content a system or platform could recommend to a new user is astronomical. Take Taobao: in 2013 the number of goods listed on Taobao exceeded 800 million. With 800 million candidates, which one should it push?
In that situation, the goods or articles the user has just clicked or browsed obviously carry the highest weight. This is especially true for e-commerce companies that sell goods directly. So whether it is Amazon abroad or Taobao and JD.com in China, practice has shown that what the user is currently browsing is the most important recommendation factor.
What’s more, recommending another refrigerator is not necessarily a sign of a stupid algorithm. It may simply be a matter of strategy: if you have just bought a refrigerator, your friends might ask you about it, and if you see a new refrigerator you like better, you may return the original one within the return period and buy the new one. A strategy like this may well produce a significant increase in final sales.
The same goes for clicking “not interested” on a news item. The first time you see a story about a speech by Mr Obama and click “not interested”, the system cannot tell whether you are not interested in Mr Obama, not interested in speeches, or simply dislike the topic of that speech. So it keeps recommending related topics for a while, and judged by the overall numbers, this strategy sometimes performs better.
Of course, to avoid over-fitting to a single interest, personalized recommendation also uses rigorous mathematical analysis of readers’ reading records to predict the preferences of similar users, and uses the degree of association between interest tags to infer what else those users may like, so that it can make “associative” recommendations. For example, if the machine finds that a large share of the users who read “presidential election” news also follow “stock” news, it will recommend “stock” news to those “presidential election” readers who have not yet followed it, rather than recommending only “presidential election” news.
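One common way to measure such an association between interest tags is with confidence and lift, as used in association-rule mining. The sketch below is a minimal illustration, assuming a hypothetical table of users and tags; the thresholds `min_confidence` and `min_lift` and all function names are assumptions chosen for the example, not values from any real system.

```python
# Hypothetical interest tags per user.
users = {
    "u1": {"presidential election", "stock"},
    "u2": {"presidential election", "stock"},
    "u3": {"presidential election", "stock"},
    "u4": {"presidential election"},
    "u5": {"sports"},
    "u6": {"sports", "stock"},
    "u7": {"sports"},
    "u8": {"sports"},
}

def confidence(users, a, b):
    """Share of users interested in tag `a` who are also interested in tag `b`."""
    has_a = [tags for tags in users.values() if a in tags]
    if not has_a:
        return 0.0
    return sum(b in tags for tags in has_a) / len(has_a)

def lift(users, a, b):
    """How much more common `b` is among `a` readers than among all users."""
    base_rate = sum(b in tags for tags in users.values()) / len(users)
    return confidence(users, a, b) / base_rate if base_rate else 0.0

def associative_targets(users, a, b, min_confidence=0.6, min_lift=1.2):
    """Users who follow `a` but not yet `b`, if the a -> b association is strong enough."""
    if confidence(users, a, b) < min_confidence or lift(users, a, b) < min_lift:
        return []
    return sorted(u for u, tags in users.items() if a in tags and b not in tags)

print(associative_targets(users, "presidential election", "stock"))  # ['u4']
print(associative_targets(users, "sports", "stock"))                 # []
```

In this made-up data, three of the four election readers also follow stocks, so the one who does not yet follow them gets a “stock” recommendation, while sports readers do not.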
Myth 3: Recommendation algorithms create “information cocoons”
One theory is that algorithms create information cocoons because they only push you what you like.
Broadly speaking, this argument has two parts. The first is that people only care about their own small world and miss more important and meaningful public events. The second is that the algorithm gets to know you better and better: if you like Trump, it only recommends good news about Trump. The end result is an “information cocoon” and a lopsided information diet.
In fact, that is not the case. In practice, it is hard for an algorithm to produce an “information cocoon”. First, public events are public precisely because they concern everyone, which gives them a natural ability to reach across audiences. Any algorithm will give such events a high weight; otherwise it would defeat its own goal of accuracy.
Second, consider attitudes. Because each person may be interested in a great many articles but interacts with only a few of them, the data is, in professional terms, very sparse. For the algorithm, therefore, both positive and negative feelings count as a positive correlation with a topic; relevance matters more than sentiment. In plain terms, whether you hate Trump or love Trump, statistically you look highly relevant to the topic. For the algorithm, anything important about Trump would normally be recommended to you first.
As a matter of philosophical speculation, the “information cocoon” may be a meaningful idea, but in practice such an extreme situation is all but impossible. Besides, in the Internet era information is so abundant that any choice already filters and screens information. Your Weibo feed and WeChat Moments are also an information cocoon, because what you see there is what your friends care about.
Myth 4: Recommendation algorithms are not technically sophisticated; they just feed Cookie information into an algorithmic model
First, strictly speaking, an algorithm is a procedure for solving a problem, with specific inputs and specific outputs. The mathematical formulas people talk about are only the theoretical basis of an algorithm. Both recommendation algorithms and deep learning networks need not only that theoretical basis, the formulas, but also a corresponding mathematical model that implements them, and this implementation is dynamic and needs constant adjustment.
In fact, self-correction and learning are very important for algorithms. For example, AlphaGo keeps playing games of Go to optimize its own model and improve its accuracy. Recommendation algorithms are no exception. Personalized recommendation is refined through feedback from users’ reading history and behavior records, and its accuracy gradually improves. Public information shows that Toutiao makes some optimizations and adjustments to its algorithmic model every week, and has carried out four major model iterations in the past year. Amazon has also made numerous improvements and optimizations to its recommendation system over the past two decades, resulting in today’s highly accurate recommendations.
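The idea of feedback optimization can be illustrated with a very small sketch: an interest score per tag that is nudged up after a click and down after a skip. This is an assumption-laden toy (the function `update_interest`, the neutral starting value 0.5 and the learning rate 0.2 are all made up), not how Toutiao or Amazon actually update their models.

```python
def update_interest(profile, tag, clicked, rate=0.2):
    """Nudge the stored interest score for `tag` toward 1 on a click, toward 0 on a skip."""
    old = profile.get(tag, 0.5)           # unknown tags start at a neutral 0.5
    target = 1.0 if clicked else 0.0
    profile[tag] = old + rate * (target - old)
    return profile

# Hypothetical behavior record: two clicks on badminton items, one skipped gossip item.
profile = {}
for tag, clicked in [("badminton", True), ("badminton", True), ("gossip", False)]:
    update_interest(profile, tag, clicked)

print(profile)
```

Each new behavior record moves the profile a little, so the scores drift toward what the user actually does rather than staying fixed.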
In the PC era, recommendations were very primitive and amounted to keyword matching against Cookie data in the browser. Many people assume that today’s algorithms are much the same, only with more attributes: take the user’s age, gender and preferences, plug them into a formula such as gender * 0.3 + age * 0.5 + preference * 0.2, add geographical location and a few other attributes, and a recommendation comes out.
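Written out as code, that imagined formula looks like the sketch below. The weights come from the formula in the text; everything else is an assumption for illustration, in particular the reading of each term as a 0-to-1 score for how well an item matches the user on that attribute, and the two made-up candidate items.

```python
# Fixed, hand-picked weights, exactly as in the imagined formula.
WEIGHTS = {"gender": 0.3, "age": 0.5, "preference": 0.2}

def naive_score(match):
    """Weighted sum of per-attribute match scores (each assumed to lie in [0, 1])."""
    return sum(WEIGHTS[name] * match[name] for name in WEIGHTS)

# Hypothetical match scores between one user and two candidate items.
candidates = {
    "badminton shoes": {"gender": 1.0, "age": 0.5, "preference": 1.0},
    "gardening gloves": {"gender": 0.0, "age": 1.0, "preference": 0.0},
}

ranked = sorted(candidates, key=lambda item: naive_score(candidates[item]), reverse=True)
print(ranked)  # ['badminton shoes', 'gardening gloves']
```

Note what is missing: the weights are set by hand and never change, however the user reacts to what is shown.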
In fact, that is recommendation 1.0, from roughly 20 years ago. Nowadays, building, using and optimizing a recommendation system is a very complex process. For example, recommendation systems can be built on user-based, association-rule-based and model-based recommendation. A good recommendation system today does not rely on a single recommendation mechanism or strategy, but usually combines several methods to achieve better results.
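As one example of these methods, user-based recommendation can be sketched as follows: find users whose ratings resemble yours, then suggest what they liked and you have not seen. This is a minimal sketch assuming a hypothetical ratings table and cosine similarity as the similarity measure; the names `cosine` and `user_based_recommend` are made up, and real systems combine this with other methods.

```python
import math

# Hypothetical ratings (1 to 5) given by three users.
ratings = {
    "ann": {"Deep Learning": 5, "Machine Learning in Action": 4},
    "bob": {"Deep Learning": 5, "Machine Learning in Action": 5,
            "Python Machine Learning": 4},
    "cat": {"Cookbook": 5, "Travel Guide": 4},
}

def cosine(a, b):
    """Cosine similarity between two sparse rating dictionaries."""
    shared = set(a) & set(b)
    dot = sum(a[item] * b[item] for item in shared)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def user_based_recommend(ratings, user, k=1):
    """Score unseen items by the ratings of other users, weighted by similarity."""
    scores = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        similarity = cosine(ratings[user], theirs)
        for item, rating in theirs.items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + similarity * rating
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(user_based_recommend(ratings, "ann"))  # ['Python Machine Learning']
```

Here “ann” rates books much as “bob” does and shares nothing with “cat”, so bob’s extra book is what gets recommended.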
Myth 5: Recommendation algorithms are developing so fast that they will soon be able to see into human nature
The emergence of recommendation algorithms has improved the efficiency of information distribution and eased the problem of information overload. Although personalized recommendation does use certain user characteristics, it relies mainly on shared, group-level characteristics and targeted content, so it is hard for it to describe a person comprehensively, let alone understand human nature. To truly understand human nature, an algorithm would need to know you better than you know yourself, and with today’s technology it is impossible for algorithms to achieve the kind of insight into human nature seen in science fiction.
More importantly, any algorithm will have counterexamples. Simply put, if a classification algorithm were to separate men and women based solely on hair length, some men with longer hair would be classified incorrectly. As a new technology, machine recommendation still needs to be optimized and improved, and many scientists are working in that direction. Of course, a good algorithm will classify the great majority of cases correctly and make effective recommendations to users.
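The hair-length example is easy to try out. The sketch below uses eight made-up people and an assumed threshold of 20 cm; both are hypothetical and chosen only to show that a single-feature rule gets most cases right and some cases wrong.

```python
# Hypothetical samples: (hair length in cm, actual label).
people = [
    (3, "man"), (5, "man"), (35, "man"), (40, "woman"),
    (30, "woman"), (4, "woman"), (2, "man"), (45, "woman"),
]

def classify(hair_cm, threshold=20):
    """Single-feature rule: long hair means 'woman', short hair means 'man'."""
    return "woman" if hair_cm >= threshold else "man"

def accuracy(samples):
    """Fraction of samples the rule labels correctly."""
    correct = sum(classify(hair_cm) == label for hair_cm, label in samples)
    return correct / len(samples)

misclassified = [(hair_cm, label) for hair_cm, label in people
                 if classify(hair_cm) != label]

print(accuracy(people))   # 0.75
print(misclassified)      # [(35, 'man'), (4, 'woman')]
```

The rule is right for six of the eight people, and the two counterexamples are exactly the long-haired man and the short-haired woman.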
Myth 6: Algorithms are open and barriers to competition are low
First, data is a very important barrier. A recommendation system that is really used in industry needs a great deal of data for modeling and computation; a small amount of data is not enough. In general, hundreds of millions of records and hundreds of millions of attributes and features are needed to make recommendations. Without data, theory alone is just empty talk.
Therefore, to build a good recommendation model you need a very large and mature engineering team working on top of big data. Google and Microsoft employ large numbers of highly skilled people to optimize their recommendation algorithms, some of them solely to do specialized feature engineering for specific areas of knowledge. At Toutiao in China, nearly half of the employees are technical engineers.
Some algorithms may perform very well in recommendation-algorithm competitions, but that does not make them the best algorithmic model. It is quite possible that the machine has learned every feature of the sample data, picking up too many local and spurious features along the way, which is overfitting. When such a model is used on new data samples, its recommendation accuracy can be very low.
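Overfitting can be shown in its most extreme form with a model that simply memorizes its training samples. The sketch below is an illustration under stated assumptions: the data is made up, the second feature is a meaningless session id standing in for a spurious feature, and the class names `Memorizer` and `ThresholdRule` are hypothetical.

```python
# Hypothetical samples: (minutes of tech news read per day, session id) -> clicked a tech article?
train = [((50, 101), True), ((40, 102), True), ((5, 103), False),
         ((10, 104), False), ((45, 105), True), ((8, 106), False)]
test = [((55, 201), True), ((42, 202), True), ((6, 203), False), ((12, 204), False)]

class Memorizer:
    """Overfit model: remembers every training sample, spurious session id included."""
    def fit(self, samples):
        self.table = dict(samples)
        return self

    def predict(self, features):
        return self.table.get(features, False)  # anything unseen gets a default guess

class ThresholdRule:
    """Simple model: uses only the one meaningful feature."""
    def fit(self, samples):
        positives = [f[0] for f, label in samples if label]
        negatives = [f[0] for f, label in samples if not label]
        self.threshold = (min(positives) + max(negatives)) / 2
        return self

    def predict(self, features):
        return features[0] >= self.threshold

def accuracy(model, samples):
    """Fraction of samples the model predicts correctly."""
    return sum(model.predict(f) == label for f, label in samples) / len(samples)

for model in (Memorizer().fit(train), ThresholdRule().fit(train)):
    print(type(model).__name__, accuracy(model, train), accuracy(model, test))
```

Both models are perfect on the data they were trained on, but on new samples the memorizer falls back to guessing while the simple rule still holds up.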
Algorithmic models have to learn and evolve from large amounts of data, and no single machine model can serve as an authoritative rule. This learning and evolution is itself a barrier. In other words, even if Zhang Yiming himself left Toutiao and built a new set of recommendation algorithms, they could not reach the level of Toutiao’s current ones.
The peach seller said:
If you’re an engineer and you’re reading this, don’t you think data, algorithms and math are important? Anyway, I’m off to study algorithms.