Lu Yanxiong graduated from Xidian University with a master’s degree. He has many years of experience in natural language processing and machine learning. He formerly headed robot technology at WeChat and now leads WeChat’s integrated search algorithm group.

Lu Yanxiong, who has studied and worked on natural language processing for many years, writes in his book “Algorithms on Text: Natural Language Processing in Its Simplest Form”: “I still remember when I started working after graduation. I had almost no professional knowledge but a strong desire to learn, so in my spare time I read all kinds of books and papers. These notes are a record of my growth, together with some personal summaries and reflections, but they were always scattered, so I wanted to organize them into a more formal document for easy reference. This knowledge comes up in our daily study and work, and once organized into a document it can also serve as a reference for others. Apart from the essential formulas, I also hoped to explain things in a more colloquial way, skipping tedious proofs, getting to the core of each algorithm, and keeping it as simple as possible. When I finished organizing the document, I put it on the Internet, where it received unanimous praise from readers online; that was an unexpected reward and made me very happy. So I enriched some of the content and wrote this book. Seen from a higher level, natural language processing is still in its infancy and far from truly understanding language, so I hope this book will spark more people’s interest in advancing natural language processing technology.”

1. Asynchronous Community: Please briefly introduce yourself to the readers of Asynchronous Community. What have you been up to lately? What projects are you working on?

My name is Lu Yanxiong. I have worked on natural language processing and machine learning ever since I graduated from Xidian University with a master’s degree. I previously worked on a dialogue system project at WeChat in Beijing, and I am currently working on the Sou Yi Sou (WeChat Search) project in WeChat’s Search Application Department.

2. Asynchronous Community: What inspired you to write “Algorithms on Text: Natural Language Processing in Its Simplest Form”? What are the features of this book? I heard that the electronic version went through several iterations before the paper version came to be published, almost by chance. Is there an interesting story behind it?

Most of the content comes from notes I wrote in the past, which I later organized into a slightly more formal document and put on the Internet. To my surprise, the response was very good, and I heard that some people even printed out the electronic version to read, so I decided to polish the content and turn it into a book. As for its characteristics, I personally think the book is readable, down-to-earth, and comprehensive. I hope it can help and inspire you.

3. Asynchronous Community: As the leader of WeChat’s integrated search algorithm group, what are your thoughts on WeChat’s search algorithms? What are the challenges? Can you elaborate?

In fact, it is very hard to do search well. On the one hand it depends heavily on the ecosystem; on the other it involves many technical areas, especially natural language processing. WeChat search is only just getting started and still has plenty of room to grow. Early on, the ecosystem consisted mainly of official accounts and their articles, Moments, stickers, and so on. Now there is another very important carrier: Mini Programs. They can provide rich services and content, which we can make convenient for users to reach through search. As Mini Programs grow in scale, our service capabilities and content data will become richer and richer, and Sou Yi Sou will keep improving and meet users’ needs better. In WeChat, our data has strong social attributes (we do not touch private data), so we can design different algorithmic models to handle it. For example, we designed the PeopleRank and TrustRank models for document quality and ranking. There are many technical points worth digging into and exploring. We also welcome talented people to join us in building WeChat search.

4. Asynchronous Community: One of the hottest keywords of 2017 was artificial intelligence. As an important branch of artificial intelligence, natural language processing has a very wide range of applications. Can you share your views on natural language processing and its algorithms?

I discuss this at length in the book. Some individual tasks in natural language processing have achieved good results, such as word segmentation, entity recognition, and document classification. But measured against truly understanding human language, the field still falls far short. Quantifying language, resolving ambiguity, incorporating scenarios, integrating knowledge, and many other difficulties still need to be overcome. So there is a long way to go before machines really understand language, and getting there will take a joint effort.

5. Asynchronous Community: This book reads very smoothly and easily. Can you summarize your philosophy of writing and working, and the experience you would most like to share with others?

In essence, it comes down to one saying: do unto others as you would have them do unto you. I like to read articles that are down-to-earth and accessible, so I try to make my own writing as close to that as possible. In life, try to be the kind of person you like. Conversely, try not to be the kind of person you don’t like.

6. Asynchronous Community: When did you first get into machine learning? When did you start blogging about what you learned? How is writing a book different from writing a tech blog?

I first came into contact with machine learning in college, but my understanding then was still very shallow. It came mostly from books, without much thinking or practice of my own. After I started working, I studied and practiced more deeply, and I also posted summaries of what I learned on a blog (though it has since been abandoned). A blog can be changed at any time: anything wrong, or any new thinking, can be updated whenever you like. A book, however, is not easy to change once it is published, and that is the big difference.

7. Asynchronous Community: What professional qualities do you think young people need in order to enter this field? What tools do you recommend for getting started?

A lot of foundational knowledge is essential. What you learned in school about probability theory, matrix theory, and numerical analysis must not be forgotten. Then, guided by your own work projects, deliberately build up your theoretical grounding in machine learning, natural language processing, and related areas. Over time you will gradually develop the ability to model problems abstractly and to summarize what you have learned. As for tools, there are many open source tools and articles on the Internet, so the best you can do is search for and read them deliberately, according to your current level. You must learn more and think more; the ability to find and solve problems matters more than any particular tool.

8. Asynchronous Community: As news distribution has moved from manual editorial recommendation to recommendation by machine learning algorithms, some people have objected that algorithmic news recommendation narrows people’s access to knowledge. What is your opinion on this issue?

This is a matter of opinion. In the past, the form of news apps meant that users mostly saw popular news articles picked by human editors. Now some news-reading apps use recommendation algorithms to surface more long-tail articles to users, which is a big difference. Whether algorithmic recommendation narrows people’s access to knowledge comes down to how a recommender system balances precision against diversity. Anything in this world that requires striking a balance is difficult. Making the right choice in the right situation is best, but that is hard for a machine; for example, it is technically difficult to capture and represent a user’s current interests and thoughts.

9. Asynchronous Community: What do you expect to be the biggest developments and challenges in natural language processing in 2018?

It is very difficult to predict. As I say in the book, natural language processing is still in its infancy, and many things are worth exploring and breaking through on, which will require everyone’s joint effort. We can do more: 1. 2. Combine the requirements of specific scenarios with existing technologies to build products that improve people’s efficiency in certain areas.

10. Asynchronous Community: Do you have any writing plans going forward? Are there any new works you can give readers a sneak preview of?

Because of my work, I won’t have much time to write in the short term, but I will pay attention to readers’ feedback on this book. When I have time, I may expand the content and release a second edition of “Algorithms on Text”, but I can’t say when.

