Authors | Jeremy Howard, Sebastian Ruder
Translator | Liu Zhiyong
Editors | Natalie, Vincent
AI Front Line introduction: This article is an introduction to fast.ai's new paper, aimed at a general audience, explaining how to automatically classify documents with higher accuracy and with less data than previous methods. It explains, in simple terms, natural language processing, text classification, transfer learning, and language modeling, and how this new approach brings these ideas together. If you are already familiar with natural language processing and deep learning, you can visit the project home page for the technical details: http://nlp.fast.ai/category/classification.html

Preface

We published a paper, Universal Language Model Fine-tuning for Text Classification (ULMFiT) (https://arxiv.org/abs/1801.06146), along with pre-trained models and a complete Python implementation. The paper has been peer-reviewed and accepted for the Annual Meeting of the Association for Computational Linguistics (ACL 2018, http://acl2018.org/). Related videos are available at http://nlp.fast.ai/category/classification.html, providing an in-depth discussion of the method, all of the Python modules used, the pre-trained models, and the scripts for building your own models.
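
To give a flavor of what using the released models looks like, here is a minimal sketch based on the fastai library's v1 text API, which packages the ULMFiT approach. The class and function names (`untar_data`, `TextLMDataBunch`, `language_model_learner`, `AWD_LSTM`) and the bundled `IMDB_SAMPLE` data belong to that library version rather than the paper's original training scripts, so treat this as an illustration, not a reference implementation:

```python
from fastai.text import *  # fastai v1 text API

# Download a small sample of the IMDb movie-review dataset that ships with fastai.
path = untar_data(URLs.IMDB_SAMPLE)

# Data for the two stages: language-model fine-tuning and classifier training.
data_lm = TextLMDataBunch.from_csv(path, 'texts.csv')
data_clas = TextClasDataBunch.from_csv(path, 'texts.csv', vocab=data_lm.train_ds.vocab)

# Stage 1: fine-tune an AWD-LSTM language model pre-trained on Wikitext-103.
learn_lm = language_model_learner(data_lm, AWD_LSTM, drop_mult=0.5)
learn_lm.fit_one_cycle(1, 1e-2)
learn_lm.save_encoder('ft_enc')

# Stage 2: build a classifier on top of the fine-tuned encoder.
learn_clas = text_classifier_learner(data_clas, AWD_LSTM, drop_mult=0.5)
learn_clas.load_encoder('ft_enc')
learn_clas.fit_one_cycle(1, 1e-2)

# Classify a new review (returns the predicted label and class probabilities).
print(learn_clas.predict("I really loved this movie!"))
```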

This approach is a significant improvement over previous text classification methods, and anyone can use our code and pre-trained models to better solve problems such as:

  • Search for documents related to legal cases;

  • Identify spam, bot responses and offensive comments;

  • Categorize positive and negative reviews of products;

  • Classify articles according to political orientation;

  • And more.

ULMFiT requires a much smaller amount of data than previous methods. (Figure 3 in the paper)

So how does this new technique improve on previous approaches? Let's first look at part of the paper's abstract to see what it claims, and then in the rest of this article we'll unpack it and explain exactly what it means:

Transfer learning has greatly influenced computer vision, but existing approaches in natural language processing still require task-specific modifications and training from scratch. We propose an effective transfer learning method that can be applied to any task in natural language processing, and introduce techniques that are key for fine-tuning a language model. Our method significantly outperforms existing techniques on six text classification tasks, reducing the error by 18-24% on the majority of datasets. Furthermore, with only 100 labeled examples, it matches the performance of training from scratch on 100 times more data.

Natural language processing, deep learning and classification

Natural Language Processing (NLP) is a field of computer science and artificial intelligence that, as the name implies, deals with using computers to process natural language. Natural language refers to the ordinary languages we use to communicate, such as English or Chinese, as opposed to specialized languages like computer code or musical notation. NLP is used in a wide variety of applications, such as search, personal assistants, summarization, and so on. In general, natural language processing is challenging because the strict rules we rely on when writing computer code do not cope well with the nuance and flexibility of language. You may have run into these limitations yourself, for example in the frustrating experience of talking to an automated phone answering system, or the rather limited abilities of early conversational assistants like Siri.

Over the past few years, we have seen deep learning make big advances in areas where computers previously had only limited success. Rather than requiring programmers to define a fixed set of rules, deep learning uses neural networks to learn rich nonlinear relationships directly from data. Most notable has been deep learning's breakthrough success in computer vision, for example in the ImageNet image classification competition.

Deep learning has also had some success in natural language processing, for example in automatic translation, as discussed in the New York Times article. A common feature of the successful natural language processing tasks is that large amounts of labeled data are available for training models. Until now, however, such applications have been limited to organizations that can collect and label huge datasets and that have the computing resources to process them on clusters of computers for long periods of time.

Oddly enough, one area in which natural language processing still struggles with deep learning is the very one where deep learning has had its greatest success in computer vision: classification. Classification refers to any problem where the goal is to sort things (such as images or documents) into categories (such as pictures of cats versus dogs, or positive versus negative reviews). A huge number of important real-world problems are primarily about classification, which is why deep learning's success on ImageNet (a classification problem) has spawned so many commercial applications. In natural language processing, current methods can do a good job of identifying, for example, whether a movie review is positive or negative, a problem known as sentiment analysis. However, once things get more ambiguous, the models struggle, usually because there is not enough labeled data to learn from.
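
To make the classification setting concrete, here is a minimal sketch of the kind of conventional, non-transfer-learning baseline alluded to above: a bag-of-words sentiment classifier built with scikit-learn. The toy reviews are invented for illustration; this is the sort of approach that needs plenty of labeled data, and it is not the ULMFiT method itself:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A few toy reviews; a real dataset would contain thousands of labeled examples.
reviews = ["A wonderful, moving film", "Great acting and a clever plot",
           "Dull, predictable and far too long", "I want those two hours back"]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

# Classic baseline: bag-of-words features fed to a linear classifier.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(reviews, labels)

print(model.predict(["What a great movie"]))  # expected: [1]
```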

Transfer learning

Our goal is to solve these two problems: (a) dealing with natural language processing problems when we do not have large amounts of data or computing resources, and (b) making classification with natural language processing easier. As it happens, both of us (Jeremy and Sebastian) work in exactly the field that addresses this: transfer learning. Transfer learning means using a model trained to solve one problem (such as classifying ImageNet images) as the basis for solving other, similar problems. A common approach is to fine-tune the original model (for example, classifying CT scans as cancerous or not, an application of transfer learning that Jeremy developed when he founded Enlitic). Because a fine-tuned model does not need to learn from scratch, it can generally reach higher accuracy with far less data and computation time than a model trained without transfer learning.
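
As a sketch of what fine-tuning means in the computer-vision setting just described, the following uses PyTorch and torchvision to take a ResNet pre-trained on ImageNet, freeze its layers, and swap in a new final layer for a hypothetical two-class task (say, cancerous vs. non-cancerous scans); the class count and learning rates are placeholders:

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from a ResNet-34 whose weights were learned on ImageNet.
model = models.resnet34(pretrained=True)

# Freeze the pre-trained layers so their weights are not disturbed at first.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with one sized for our own task (here: 2 classes).
model.fc = nn.Linear(model.fc.in_features, 2)

# Initially only the new layer is trained; the earlier layers can later be
# unfrozen and fine-tuned with a much smaller learning rate.
optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
```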

Very simple transfer learning, using just a single layer of weights (known as embeddings), has been popular for several years, for example with Google's word2vec embeddings. In practice, however, a complete neural network consists of many layers, so transferring only a single layer clearly only scratches the surface of what is possible.
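
To show what "transferring only a single layer of weights" looks like, here is a sketch in PyTorch: a matrix of pre-trained word vectors (stood in for by random numbers below; in practice it would be loaded from a word2vec- or GloVe-style file) initializes the embedding layer, while every other layer of the network still starts from scratch:

```python
import numpy as np
import torch
import torch.nn as nn

vocab_size, emb_dim = 10000, 300

# Stand-in for pre-trained word vectors: one row per word in the vocabulary.
pretrained_vectors = np.random.randn(vocab_size, emb_dim).astype("float32")

# Single-layer transfer: only the embedding layer starts from pre-trained weights.
embedding = nn.Embedding.from_pretrained(torch.from_numpy(pretrained_vectors), freeze=False)

# Everything else is still randomly initialised and trained from scratch.
encoder = nn.LSTM(emb_dim, 128, batch_first=True)
head = nn.Linear(128, 2)

# Forward pass for a batch of one 12-token document.
tokens = torch.randint(0, vocab_size, (1, 12))
hidden, _ = encoder(embedding(tokens))
logits = head(hidden[:, -1])  # classify from the last hidden state
```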

So the question is: to solve natural language processing problems, what can we transfer learning from? The answer came to Jeremy when his friend Stephen Merity announced the AWD-LSTM language model (https://github.com/salesforce/awd-lstm-lm), a significant improvement over previous approaches to language modeling. A language model is a natural language processing model that learns to predict the next word in a sentence. For example, when your phone's keyboard guesses the next word you are going to type, it is using a language model. This matters because for a language model to be good at guessing what you will say next, it needs a lot of world knowledge (e.g. "I ate a hot" → "dog"; "It is very hot" → "weather") and a deep understanding of grammar, semantics, and other elements of natural language. That is exactly the kind of knowledge we draw on, usually without realizing it, when we read and classify documents.
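
To make the language-modeling objective concrete, here is a toy next-word predictor in PyTorch. It is not the AWD-LSTM (which adds careful regularisation and other refinements on top of this idea), but it is trained on exactly the same task: at every position, predict the token that comes next.

```python
import torch
import torch.nn as nn

class TinyLanguageModel(nn.Module):
    """A toy next-word predictor illustrating the language-modeling objective."""
    def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, tokens):
        hidden, _ = self.lstm(self.embed(tokens))
        return self.out(hidden)  # one next-word distribution per position

vocab_size = 1000
model = TinyLanguageModel(vocab_size)
loss_fn = nn.CrossEntropyLoss()

# Training objective: at each position, predict the following token.
tokens = torch.randint(0, vocab_size, (8, 20))   # batch of 8 sequences, 20 tokens each
inputs, targets = tokens[:, :-1], tokens[:, 1:]
logits = model(inputs)
loss = loss_fn(logits.reshape(-1, vocab_size), targets.reshape(-1))
loss.backward()
```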

We found that, in practice, this approach to transfer learning has the characteristics needed to make it a universal method for transfer learning in natural language processing:

  • Suitable for tasks with different document sizes, numbers, and label types;

  • Use of a single framework and training process;

  • No custom feature engineering or preprocessing required;

  • No additional in-domain documents or labels are required.

Make it work

The high-level ULMFiT approach (IMDb example)

This idea has been tried before, but it used to take millions of documents to reach adequate performance. We found that we could do much better by fine-tuning our language model more cleverly. In particular, we found that if we carefully control how fast the model learns and update the pre-trained model in a way that does not cause it to forget what it has previously learned, the model adapts much better to a new dataset. One thing we are particularly excited about is how well the model learns from a limited number of examples. On a two-class text classification dataset, we found that training our approach with only 100 labeled examples (while also giving it access to about 50,000 unlabeled examples) achieved the same performance as training a model from scratch with 10,000 labeled examples.
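
The paper names these techniques discriminative fine-tuning, slanted triangular learning rates, and gradual unfreezing. The sketch below shows roughly how those ideas look with the fastai v1 Learner API, continuing from the `learn_clas` classifier in the earlier sketch; the specific learning rates and the 2.6 scaling factor follow the recipe commonly used with that library and are illustrative, not prescriptive:

```python
# Continuing from the classifier learner built in the earlier sketch (fastai v1).

# Train only the newly added classifier head first.
learn_clas.freeze()
learn_clas.fit_one_cycle(1, 2e-2)

# Gradual unfreezing: release one more layer group and keep training,
# using smaller learning rates for the earlier (more general) layers.
learn_clas.freeze_to(-2)
learn_clas.fit_one_cycle(1, slice(1e-2 / (2.6 ** 4), 1e-2))

# Finally unfreeze everything and fine-tune the whole network gently, so the
# pre-trained layers are not overwritten (avoiding "catastrophic forgetting").
learn_clas.unfreeze()
learn_clas.fit_one_cycle(2, slice(1e-3 / (2.6 ** 4), 1e-3))
```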

Another important insight is that we can use any reasonably large and general language corpus to create a universal language model, and then fine-tune it on the target corpus of whatever natural language processing problem we care about. We decided to use Stephen Merity's Wikitext 103 dataset, a preprocessed large subset of English Wikipedia.

Research in natural language processing has mostly focused on English, and training models for non-English languages brings a number of challenges. In general, there are few publicly available datasets for non-English languages, so if you want to train a text classification model for a language such as Thai, you will almost certainly need to collect your own data. Gathering data in a non-English language often also means you need to annotate it yourself or find your own annotators, since crowdsourcing services such as Amazon Mechanical Turk mostly employ English-speaking annotators.

With ULMFiT, training text classification models in languages other than English becomes much easier, because all we need is access to Wikipedia, which currently exists in 301 languages, plus a small number of documents that can easily be annotated by hand, and optionally additional unlabeled documents. To make this even easier, we will soon launch a model zoo (http://nlp.fast.ai/category/model_zoo.html) providing pre-trained language models for many languages.

The future of ULMFiT

We have found that this approach works well across different tasks with the same settings. In addition to text classification, there are many other important natural language processing problems, such as sequence labeling or natural language generation, that we hope ULMFiT will make easier to solve in the future. We will update this site once we have completed our experiments and built models in those areas.

In computer vision, the success of transfer learning and the availability of pre-trained ImageNet models have transformed the field. Many people, including entrepreneurs, scientists, and engineers, are using fine-tuned ImageNet models to solve important problems involving computer vision: from boosting crop yields in Africa to building robots that sort Lego bricks. Now that the same tools can be applied to natural language, we expect to see the same kind of proliferation in this field.

Although we have described state-of-the-art results in text classification, a lot of work remains before we can truly take full advantage of transfer learning in NLP. In computer vision, many important and insightful papers have analyzed transfer learning in depth. In particular, Yosinski et al. tried to answer the question of how transferable features in deep neural networks are (https://arxiv.org/abs/1411.1792), and Huh et al. studied what makes ImageNet good for transfer learning (https://arxiv.org/abs/1608.08614). Yosinski even built a rich visualization toolkit (https://github.com/yosinski/deep-visualization-toolbox) to help practitioners better understand the features in their computer vision models, as demonstrated in the video accompanying that toolkit.


Original link:

http://nlp.fast.ai/classification/2018/05/15/introducting-ulmfit.html