On July 8, 2017, Dr. Wan Xiaojun of the Institute of Computer Science and Technology, Peking University, delivered a talk titled “Machine Writing Technology and Application” at the AI+ special session of the “CCF-GAIR 2017 Global Artificial Intelligence and Robotics Summit”. As the exclusive video partner, IT Dakashuo (WeChat ID: Itdakashuo) was authorized to release the video after review and approval by the organizers and the speaker.
t.cn/RnvWoea
Abstract
At CCF-GAIR, Dr. Wan shared the background and current state of machine writing. Looking ahead, he believes machine writing will be used not only in the media industry but also in the gaming and intelligence industries. Getting machines to reason, generalize, and write truly in-depth stories remains the hardest part, and that is the goal of the next stage.
Status of machine writing abroad
Machine writing took off abroad a few years ago, and a number of well-known companies have been established, such as Arria, Automated Insights (AI), and Narrative Science. Their core technology is a natural language generation engine, applied mainly to weather forecasts, air quality reports, medical reports, finance, sports, and other fields. Automated Insights has generated hundreds of millions of stories for the Associated Press and other clients, and Narrative Science continues to generate stories for Forbes. These systems mainly target English and a few other Western languages.
Status of machine writing in China
With the development of artificial intelligence, machine writing has attracted growing attention in China in recent years. Some media organizations have partnered with academic institutions to launch writing robots, and Internet giants such as Baidu, Microsoft, and Tencent are developing their own machine writing technology. These efforts mainly focus on sports, finance and economics, people’s livelihood, and entertainment news.
Original creation vs. secondary creation
We see two approaches to machine writing: original creation and secondary creation. In original creation there is no existing manuscript, only structured data, and we generate a new manuscript from that data. Examples include weather forecasts, air quality reports, financial reports, and product descriptions.
Secondary creation produces a new manuscript from the content of existing manuscripts. Examples include news summaries and news rewriting.
NLP technologies involved in machine writing
The two modes of creation rely on different technologies. Natural language generation produces natural-language sentences directly from structured data or semantic representations, which suits original creation. Automatic summarization builds manuscripts from existing text material, which suits secondary creation.
There are also related technologies, such as text recommendation. When a writer wants to quote a famous saying or a line of Tang or Song poetry, the system recommends suitable quotations and automatically inserts them at the current position in the manuscript.
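The talk does not describe how quotation recommendation works internally. A minimal sketch, assuming a small hand-tagged quotation library and plain bag-of-words cosine similarity between the writer’s current context and each entry, might look like this. The library `QUOTES`, its placeholder entries, and all function names are hypothetical; a real system for Chinese would also need a word segmenter instead of whitespace splitting.

```python
import math
from collections import Counter

# Hypothetical quotation library: each entry has the quotation text and topic tags.
QUOTES = [
    {"text": "Hypothetical quote about homesickness", "tags": "moon home night homesick"},
    {"text": "Hypothetical quote about perseverance", "tags": "persist victory effort comeback"},
]

def bag_of_words(text):
    """Toy tokenizer: whitespace split, strip punctuation, lower-case."""
    tokens = (t.strip(".,;:!?\"'").lower() for t in text.split())
    return Counter(t for t in tokens if t)

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def recommend_quotes(context, quotes=QUOTES, k=1):
    """Return up to k quotations most similar to the current writing context."""
    ctx = bag_of_words(context)
    scored = [(cosine(ctx, bag_of_words(q["tags"] + " " + q["text"])), q["text"])
              for q in quotes]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Only recommend entries that share at least one term with the context.
    return [text for score, text in scored[:k] if score > 0]

print(recommend_quotes("After a late comeback victory, the coach praised the players' effort"))
# ['Hypothetical quote about perseverance']
```

Returning nothing when no entry overlaps the context avoids inserting an irrelevant quotation; a production recommender would more likely use learned semantic embeddings than raw word overlap.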
Another is text paraphrasing (retelling). Because of copyright, copying original content directly risks plagiarism, so the same meaning must be restated in different wording.
Traditional media vs. We Media
Different publishers have different requirements for manuscripts. Traditional media have very strict requirements, with zero tolerance for errors, and every manuscript must be reviewed manually before publication.
We Media also have relatively high content requirements, but they can tolerate some quality problems, such as a few incoherent sentences or occasional typos.
These different quality requirements lead to different choices of machine writing methods.
Machine writers vs. journalists
For now, the relationship between robots and journalists should be a division of labor. Robots write fast, never tire, and are good at short news items, but they can only perform low-level, repetitive tasks. Journalists, by contrast, can think deeply and write in-depth reports; they are capable of high-level, creative work.
A journalist clearly understands what he or she is writing, whereas a robot does not understand what it writes, even though it produces every sentence.
Our research and application in machine writing
We do a lot of basic research, covering automatic summarization, natural language generation, and related technologies. We have also done substantial applied research, such as automatic generation of news stories, news summaries, and user comments.
Automatic news generation
The input to our writing system is structured data plus any available text material. Output length is controllable: the system can produce short news items of a few dozen characters or long reports of thousands of characters. It covers many domains, including sports, people’s livelihood, and entertainment.
Automatic generation of sports event briefs
We crawl sports event data from the Internet, analyze it, and then perform document planning and sentence realization to produce simple event reports. These are short, a few dozen characters each. To make the stories more vivid, we use different wordings for the same kind of news.
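The data analysis, document planning, and sentence realization steps can be sketched with simple templates. Everything below is a hypothetical illustration rather than the system described in the talk: the match fields, the three-goal threshold for a “big” win, the rule for mentioning a top scorer, and the template wordings. Varied wording is obtained by drawing one of several templates per message with a seeded random generator.

```python
import random

def analyze(match):
    """Data analysis: derive the result, winner and margin from raw match data."""
    h, a = match["home_score"], match["away_score"]
    facts = {"margin": abs(h - a), "score": f"{max(h, a)}-{min(h, a)}"}
    if h == a:
        facts["result"] = "draw"
    else:
        facts["result"] = "win"
        facts["winner"], facts["loser"] = (
            (match["home"], match["away"]) if h > a else (match["away"], match["home"]))
    return facts

# Several wordings per message type so repeated briefs do not read identically.
TEMPLATES = {
    "big_win": ["{winner} routed {loser} {score}.", "{winner} cruised past {loser} {score}."],
    "close_win": ["{winner} edged {loser} {score}.", "{winner} narrowly beat {loser} {score}."],
    "draw": ["{home} and {away} drew {score}.", "{home} and {away} shared the points at {score}."],
    "star": ["{top_scorer} scored {top_goals} goals.", "{top_scorer} found the net {top_goals} times."],
}

def plan(match, facts):
    """Document planning: decide which messages the brief contains, in order."""
    if facts["result"] == "draw":
        messages = ["draw"]
    else:
        messages = ["big_win" if facts["margin"] >= 3 else "close_win"]
    # Hypothetical rule: mention a player only for a standout performance.
    if match.get("top_scorer") and match.get("top_goals", 0) >= 2:
        messages.append("star")
    return messages

def realize(messages, slots, rng):
    """Sentence realization: pick one wording per message and fill its slots."""
    return " ".join(rng.choice(TEMPLATES[m]).format(**slots) for m in messages)

def write_brief(match, seed=None):
    facts = analyze(match)
    return realize(plan(match, facts), {**match, **facts}, random.Random(seed))

print(write_brief({"home": "Team A", "away": "Team B", "home_score": 2,
                   "away_score": 1, "top_scorer": "Player X", "top_goals": 2}))
```

Keeping planning (what to say) separate from realization (how to say it) is what allows the wording to vary while the facts stay fixed.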
Automatic generation of long sports reports
A very important source material is live sports text commentary. Every high-profile match has a live text broadcast that turns the live video into text, and it often includes the commentator’s descriptions of the game’s highlights. We use machine learning to select the best sentences from the commentary and put them into the report. These reports are longer, more than a thousand characters.
Live text commentary is very common and covers all important games, and it has three advantages. First, it is rich in information: it covers every important event in the game. Second, it is flexible: different styles of news can be built for different games. Third, it is timely: news can be built and published at any point during the game.
To generate a report, we rank the sentences of the live text and select among them with machine learning, finally producing an event report that averages more than 1,000 characters.
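The rank-then-select step can be sketched as follows. The keyword weights in `WEIGHTS` are a hypothetical stand-in for a trained scoring model, and greedy selection by score under a character budget, followed by restoring chronological order, is just one simple way to hit a target length; it may differ from the method used in the talk.

```python
# Hypothetical keyword weights standing in for a trained sentence-scoring model.
WEIGHTS = {"goal": 3.0, "scores": 3.0, "red card": 2.5, "penalty": 2.0, "save": 1.0}

def score(line):
    """Importance score of one commentary line (sum of matched keyword weights)."""
    text = line.lower()
    return sum(w for keyword, w in WEIGHTS.items() if keyword in text)

def select_lines(live_text, budget):
    """Greedily keep the highest-scoring lines that fit within `budget` characters,
    then restore chronological order so the report follows the game."""
    ranked = sorted(range(len(live_text)), key=lambda i: score(live_text[i]), reverse=True)
    chosen, used = [], 0
    for i in ranked:
        if score(live_text[i]) <= 0:
            break  # remaining lines carry no highlight signal
        if used + len(live_text[i]) <= budget:
            chosen.append(i)
            used += len(live_text[i])
    return [live_text[i] for i in sorted(chosen)]

live = [
    "12' Kickoff delayed by rain.",
    "23' Player X scores a goal from the edge of the box.",
    "40' Great save by the keeper.",
    "67' Red card for Player Y.",
    "80' Substitution: Player Z comes on.",
]
print("\n".join(select_lines(live, budget=1000)))
```

Raising or lowering `budget` is what makes the report length controllable; a tighter budget keeps only the top-ranked highlights.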
Automatic generation of entertainment news
Entertainment news can be generated from Weibo posts. Celebrities post on Weibo frequently, and some of those posts can become entertainment news. We therefore use a machine learning model to decide automatically whether each celebrity post is newsworthy. It then judges which comments under the post are valuable, and combines the post, those comments, and related background information into an entertainment news story.
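A sketch of the three steps (newsworthiness judgment, comment selection, assembly) follows. The logistic weights, cue phrases, and comment filters are hypothetical and hand-set for illustration; in the described system these judgments come from trained machine learning models.

```python
import math
from dataclasses import dataclass, field

# Hypothetical cue phrases; a trained classifier would learn its own features.
NEWS_CUES = ("new film", "wedding", "album", "tour")

@dataclass
class Post:
    author: str
    text: str
    reposts: int
    comments: list = field(default_factory=list)  # (comment_text, likes) pairs

def newsworthiness(post):
    """Logistic score in (0, 1); the weights are illustrative, not learned."""
    has_cue = any(cue in post.text.lower() for cue in NEWS_CUES)
    z = -3.0 + 0.8 * math.log10(1 + post.reposts) + (2.0 if has_cue else 0.0)
    return 1.0 / (1.0 + math.exp(-z))

def valuable_comments(post, k=2, min_len=10):
    """Drop very short comments, then keep the k most-liked ones."""
    kept = [(text, likes) for text, likes in post.comments if len(text) >= min_len]
    kept.sort(key=lambda c: c[1], reverse=True)
    return [text for text, _ in kept[:k]]

def compose_story(post, background, threshold=0.5):
    """Assemble post + comments + background, or None if not newsworthy."""
    if newsworthiness(post) < threshold:
        return None
    parts = [f'{post.author} posted on Weibo: "{post.text}"', background]
    comments = valuable_comments(post)
    if comments:
        parts.append("Fans responded: " + " / ".join(f'"{c}"' for c in comments))
    return " ".join(p for p in parts if p)

post = Post("Star A", "Excited to announce my new film!", 9999, [
    ("Can't wait to see it in cinemas!", 120),
    ("lol", 500),
    ("Congratulations, the trailer looks great", 80),
])
print(compose_story(post, "Hypothetical background paragraph about Star A."))
```

Filtering before ranking by likes keeps a highly liked but contentless reply such as "lol" out of the story.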
Automatic generation of news summaries
Automatic news summarization takes multiple reports about the same event and automatically generates a long summary of that event.
Because the summary is long, it is organized by subtopics rather than individual sentences. The news is divided into subtopics, each subtopic corresponds to a paragraph, and the subtopics are ranked by importance. Finally, subtopics are selected and combined into a complete event summary, which can be thousands of characters long.
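A toy sketch of subtopic-based multi-document summarization is shown below. It assumes greedy clustering by Jaccard word overlap to form subtopics and uses cluster size (how many sentences across the reports cover a subtopic) as importance; both choices are illustrative assumptions, and each paragraph is represented by a single sentence rather than a fully generated paragraph.

```python
def content_words(sentence):
    """Lower-cased words longer than three characters (a crude stopword filter)."""
    return {w.strip(".,").lower() for w in sentence.split() if len(w) > 3}

def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def cluster_subtopics(sentences, threshold=0.3):
    """Greedy single pass: join the first subtopic whose seed sentence is similar
    enough, otherwise start a new subtopic."""
    clusters = []
    for s in sentences:
        words = content_words(s)
        for cluster in clusters:
            if jaccard(words, content_words(cluster[0])) >= threshold:
                cluster.append(s)
                break
        else:
            clusters.append([s])
    return clusters

def summarize(reports, max_subtopics=2):
    """One paragraph per subtopic, most important (largest) subtopics first."""
    sentences = [s.strip() + "." for r in reports for s in r.split(".") if s.strip()]
    ranked = sorted(cluster_subtopics(sentences), key=len, reverse=True)
    # Stand-in for paragraph generation: use the subtopic's longest sentence.
    return "\n\n".join(max(c, key=len) for c in ranked[:max_subtopics])

reports = [
    "A magnitude 6 earthquake struck the coastal city on Monday. Rescue teams arrived within hours.",
    "The earthquake struck the coastal city early Monday morning. Officials announced school closures.",
    "Rescue teams arrived quickly and searched collapsed buildings.",
]
print(summarize(reports))
```

With `max_subtopics=2`, the least-covered subtopic (school closures) is dropped, mirroring the select-and-combine step described above.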
Automatic generation of user comments
Beyond factual news, we have also tried generating user comments. This work focuses mainly on product reviews and uses a deep learning model.
Our applications of machine writing
We cooperated with Toutiao, Southern Metropolis Daily, and Guangzhou Daily to launch the writing robots Xiaoming, Xiaonan, and A Tong, respectively.
The Xiaoming robot writes short news items and long reports of thousands of characters based on sports event data and live text commentary.
The Xiaonan robot writes people’s livelihood news and Two Sessions news for the Southern Metropolis Daily app.
A Tong, launched in cooperation with Guangzhou Daily, analyzes and interprets the hot words and key data in the various work reports released during the Two Sessions.
Outlook
Machine writing will be used ever more widely, not only to write news in the media but also in other industries.
We hope to give manuscripts an attitude and a point of view, making them more human, and to write in-depth reports through induction and reasoning.
That’s all for today’s talk. Thank you!