A recommended open-source machine learning roadmap from ApacheCN:

github.com/apachecn/Ai…

The roadmap

Follow the steps in order, 1 => 2 => 3, and you can become an expert!

1. Machine learning – Basics

Machine Learning in Action | ApacheCN (Apache Chinese community)
Machine Learning in Action, Chinese edition with table of contents (PDF)
- Thanks to Feilong for generating the e-book "Machine Learning Practice-Apachecn.pdf"
The videos have been updated. If you find them valuable, please star the repository. [Follow-up study activities: scikit-learn, Kaggle, PyTorch and TensorFlow]
- Video sites: Youku, Bilibili, AcFun and NetEase Cloud Classroom; all can be played online. (Links at the bottom)
Red Stone: notes on the machine learning course by Lin Xuantian (Hsuan-Tien Lin), National Taiwan University

A recommended set of machine learning notes:

feisky.xyz/machine-lea…

Module	Chapter
Machine learning	Chapter 1: Fundamentals of machine learning
Machine learning	Chapter 2: k-nearest neighbors (KNN)
Machine learning	Chapter 3: Decision trees
Machine learning	Chapter 4: Naive Bayes
Machine learning	Chapter 5: Logistic regression
Machine learning	Chapter 6: Support vector machines (SVM)
Compiled from online sources	Chapter 7: Ensemble methods (random forest and AdaBoost)
Machine learning	Chapter 8: Regression
Machine learning	Chapter 9: Tree regression
Machine learning	Chapter 10: K-means clustering
Machine learning	Chapter 11: Association analysis with the Apriori algorithm
Machine learning	Chapter 12: Finding frequent itemsets efficiently with FP-growth
Machine learning	Chapter 13: Using PCA to simplify data
Machine learning	Chapter 14: Using SVD to simplify data
Machine learning	Chapter 15: Big data and MapReduce
Real-world ML project	Chapter 16: Recommender systems (migrated)
First-round summary	2017-04-08: Summary of the first study round

How to get into machine learning?

Which videos should you watch?

Theory-oriented: we recommend Andrew Ng's videos, which are widely regarded as the authoritative introduction.
Strong coding ability: watch our "Machine Learning in Action, Teaching Edition".
Weak coding ability: we suggest our "Machine Learning in Action, Discussion Edition", but for the theory, watch the theory part of the Teaching Edition. The Discussion Edition contains a lot of chatter, but it walks through the code line by line. Mix the two freely according to your own needs.

Khan Academy introductory courses

Khan Academy on NetEase Open Courses

Probability	Statistics	Linear algebra
Khan Academy (Probability)	Khan Academy (Statistics)	Khan Academy (Linear Algebra)

Machine Learning Video – ApacheCN Teaching Edition


AcFun	Bilibili

Youku	NetEase Cloud Classroom

Machine/Deep Learning Video by Andrew Ng

Machine learning	Deep learning
Ng machine learning	Neural networks and deep learning

2. Deep learning – Basics

Required topics in deep learning

Backpropagation:
www.cnblogs.com/charlotte77…
CNN principles:
www.cnblogs.com/charlotte77…
RNN principles:
blog.csdn.net/qq_39422642…
LSTM:
blog.csdn.net/roslei/arti…

3. Natural language processing

The learning process: mixed feelings!

Since I started learning NLP, I have noticed some typical differences between China and other countries:

1. Attitudes towards resources are opposite:
1) Domestic: conferences seem to be held for fame and show, with no real substance; they are all token slide presentations, not meant for people who actually do the work.
2) Abroad: people seem intent on advancing NLP, sharing all kinds of solid material and concrete implementations. (For example: Natural Language Processing with Python)
2. Paper implementations:
1) Domestic: all kinds of lofty-sounding paper implementations, yet I still have not seen a decent GitHub project! (Maybe I am just bad at searching, so I have not found one.)
2) Abroad: (I cannot follow them!)
3. Open-source frameworks:
1) Abroad: TensorFlow/PyTorch
2) Domestic: honestly, I cannot name one, although they are hyped as being just as good as the foreign ones! (MXNet has many Chinese developers, but it does not count as a domestic open-source framework. The Chinese tutorial for the MXNet-based Dive into Deep Learning (zh.diveintodeeplearning.org) was taught, recorded and publicly released by Mu Li and Aston Zhang: documentation + season 1 tutorial + videos.)

Every time I dig deeper I have to get around the firewall and use Google. At home we keep hearing how impressive Harbin Institute of Technology, iFlytek, USTC, Baidu and Alibaba are, yet the material still has to be found abroad! Sometimes it is truly frustrating, and it makes me think less of the domestic technical environment. That said, many thanks to the many bloggers in China, especially for introductory demos and basic concepts. [My own depth is limited, so there is a lot I do not understand.]

Must-read introductory tutorial materials [with competition links]:

github.com/apachecn/Ai…
Natural Language Processing with Python, 2nd edition:

usyiyi.github.io/nlp-py-2e-z…
A comprehensive NLP knowledge system compiled by liuhuanyong is also recommended:

liuhuanyong.github.io

1. Usage Scenarios (Baidu Open Courses)

Part 1: Introduction

1.) Introduction to natural language processing

Part 2: Machine translation

2.) Machine translation

Part 3: Discourse analysis

3.1.) Discourse analysis: content overview
3.2.) Discourse analysis: content tags
3.3.) Discourse analysis: sentiment analysis
3.4.) Discourse analysis: automatic summarization

Part 4: UNIT, language understanding and interaction technology

4.) UNIT: language understanding and interaction technology

Application areas

Chinese word segmentation:

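Build a DAG (directed acyclic graph) of all candidate words in the sentence
Search it with dynamic programming, combining both directions (weights accumulated in one direction, output read in the other), to obtain the maximum-probability path through the DAG
Train an HMM on a corpus tagged with the SBME scheme (Single, Begin, Middle, End) and decode it with Viterbi to handle out-of-vocabulary words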
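
The first two steps can be sketched in a few lines. This is only a minimal illustration under stated assumptions: the word-frequency dictionary FREQ is a made-up toy, the function names are hypothetical, and the HMM step for unknown words is not shown.

```python
import math

# Hypothetical toy word-frequency dictionary (not taken from the article).
FREQ = {"去": 10, "北": 2, "京": 2, "北京": 20, "大": 5, "学": 5,
        "大学": 15, "北京大学": 8, "玩": 6}
TOTAL = sum(FREQ.values())


def build_dag(sentence):
    """Map each start index to the end indices of dictionary words starting there."""
    n = len(sentence)
    dag = {}
    for i in range(n):
        ends = [j for j in range(i, n) if sentence[i:j + 1] in FREQ]
        dag[i] = ends or [i]  # unknown character: fall back to a one-character word
    return dag


def best_route(sentence, dag):
    """Dynamic programming from right to left: route[i] = (best log-prob, word end)."""
    n = len(sentence)
    route = {n: (0.0, 0)}
    for i in range(n - 1, -1, -1):
        route[i] = max(
            (math.log(FREQ.get(sentence[i:j + 1], 1)) - math.log(TOTAL)
             + route[j + 1][0], j)
            for j in dag[i]
        )
    return route


def segment(sentence):
    """Read the maximum-probability path from left to right."""
    route = best_route(sentence, build_dag(sentence))
    words, i = [], 0
    while i < len(sentence):
        j = route[i][1]
        words.append(sentence[i:j + 1])
        i = j + 1
    return words


print(segment("去北京大学玩"))
```

With this toy dictionary the single word 北京大学 has a higher probability than the product of 北京 and 大学, so the path keeps it whole.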

1. Text Classification

Text classification assigns a label to a sentence or document, as in e-mail spam classification and sentiment analysis.

Here are some good text classification datasets for beginners.

Reuters Newswire Topic Classification (Reuters-21578). A collection of news documents that appeared on Reuters in 1987, indexed by category. See also RCV1, RCV2 and TRC2.
IMDB Movie Review Sentiment Classification (Stanford). A collection of movie reviews from imdb.com, labeled with positive or negative sentiment.
News Group Movie Review Sentiment Classification (Cornell). A collection of movie reviews from imdb.com, labeled with positive or negative sentiment.

For more information, see the post: Datasets for single-label text categorization.

Sentiment analysis

Competition Address:

www.kaggle.com/c/word2vec-…

Scheme 1 (0.86): word counts + naive Bayes
Scheme 2 (0.94): LDA + a classification model (KNN / decision tree / logistic regression / SVM / XGBoost / random forest)
- a) Decision trees do not work well here; they are a poor fit for these continuous features
- b) After parameter tuning, 200 topics preserved the information relatively well (number of computed topics)
Scheme 3 (0.72): Word2vec + CNN
- To be honest: without a good machine, I could not tune it to a good result
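
To make Scheme 1 concrete, here is a minimal word-count + multinomial naive Bayes sketch in plain Python. The four training sentences, the function names and the add-one smoothing are assumptions for illustration; this is not the competition code and it will not reproduce the 0.86 score.

```python
import math
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Count documents per class and words per class."""
    class_docs = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = doc.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def predict_nb(model, doc):
    """Return the class with the highest log posterior (add-one smoothing)."""
    class_docs, word_counts, vocab = model
    n_docs = sum(class_docs.values())
    scores = {}
    for label in class_docs:
        total = sum(word_counts[label].values())
        score = math.log(class_docs[label] / n_docs)
        for tok in doc.lower().split():
            if tok in vocab:  # ignore words never seen in training
                score += math.log((word_counts[label][tok] + 1) / (total + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


# Toy data: 1 = positive review, 0 = negative review.
docs = ["a great and moving film", "great acting great story",
        "a boring and dull film", "dull plot terrible acting"]
labels = [1, 1, 0, 0]
model = train_nb(docs, labels)
print(predict_nb(model, "great story"), predict_nb(model, "terrible and boring"))
```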

Model performance was evaluated with AUC (area under the ROC curve).
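
For reference, AUC can be read as the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one. The sketch below computes it by comparing every positive/negative pair; the function name is my own, and the quadratic loop is only suitable for small inputs.

```python
def auc(labels, scores):
    """Pairwise AUC: fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```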

2. Language Modeling

Language modeling means building a statistical model that predicts the next word in a sentence, or the next character within a word. It is a prerequisite for tasks such as speech recognition and machine translation.
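
The simplest such model is a bigram model, which estimates the next word from counts of adjacent word pairs. The sketch below is a toy illustration; the three-sentence corpus and the function names are assumptions, and no smoothing is applied.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count how often each word follows each other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts


def next_word_probs(counts, prev):
    """Maximum-likelihood estimate of P(next word | previous word)."""
    following = counts.get(prev, Counter())
    total = sum(following.values())
    return {w: c / total for w, c in following.items()} if total else {}


corpus = ["the cat sat on the mat", "the cat ate the fish", "the dog sat on the rug"]
counts = train_bigram(corpus)
probs = next_word_probs(counts, "the")
print(max(probs, key=probs.get), probs)
```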


Here are some good language modeling datasets for beginners.

Project Gutenberg. A large collection of free books that can be retrieved as plain text in a variety of languages.
There are also more formal, well-studied corpora, for example: the Brown University Standard Corpus of Present-Day American English (a large sample of English words) and the Google 1 Billion Word Corpus.

New word discovery

New word discovery for Chinese word segmentation
A Python 3 implementation that uses mutual information and left/right information entropy to discover new words for Chinese word segmentation
github.com/zhanzecheng…
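
The idea behind the two statistics can be sketched as follows. This is my own minimal version restricted to two-character candidates, not the code of the linked project; the function names and thresholds are assumptions. A candidate is kept when its characters co-occur more than chance would predict (high pointwise mutual information) and when it appears in varied contexts (high left and right neighbor entropy).

```python
import math
from collections import Counter, defaultdict


def entropy(counter):
    """Shannon entropy (natural log) of a neighbor-count distribution."""
    total = sum(counter.values())
    if total == 0:
        return 0.0
    return -sum(c / total * math.log(c / total) for c in counter.values())


def score_bigrams(text):
    """Return {bigram: (PMI, min(left entropy, right entropy))}."""
    n = len(text)
    char_freq = Counter(text)
    bigram_freq = Counter(text[i:i + 2] for i in range(n - 1))
    left, right = defaultdict(Counter), defaultdict(Counter)
    for i in range(n - 1):
        gram = text[i:i + 2]
        if i > 0:
            left[gram][text[i - 1]] += 1
        if i + 2 < n:
            right[gram][text[i + 2]] += 1
    scores = {}
    for gram, count in bigram_freq.items():
        p_gram = count / (n - 1)
        p_a, p_b = char_freq[gram[0]] / n, char_freq[gram[1]] / n
        pmi = math.log(p_gram / (p_a * p_b))
        scores[gram] = (pmi, min(entropy(left[gram]), entropy(right[gram])))
    return scores


def discover(text, min_pmi, min_entropy):
    """Keep bigrams that are both cohesive (PMI) and free-standing (entropy)."""
    return sorted(g for g, (pmi, ent) in score_bigrams(text).items()
                  if pmi >= min_pmi and ent >= min_entropy)


text = "吃葡萄不吐葡萄皮喝葡萄汁"
print(discover(text, min_pmi=1.0, min_entropy=0.5))
```

In this toy string only 葡萄 appears with several different left and right neighbors, so it is the only candidate that passes the entropy threshold.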

Sentence similarity recognition

Project address: www.kaggle.com/c/quora-que…
Solution: Word2vec + Bi-GRU

Text error correction

Bigram language model + Levenshtein (edit) distance
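
One way to combine the two pieces, sketched under my own assumptions: use edit distance to generate candidate corrections from a vocabulary, then let bigram counts with the previous word pick among them. The vocabulary, the counts and the function names below are invented for illustration.

```python
def levenshtein(a, b):
    """Edit distance with unit-cost insertion, deletion and substitution."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def correct(word, prev_word, vocab, bigram_counts, max_dist=1):
    """Pick the in-vocabulary candidate that most often follows prev_word."""
    if word in vocab:
        return word
    candidates = [w for w in vocab if levenshtein(word, w) <= max_dist]
    if not candidates:
        return word
    return max(candidates, key=lambda w: (bigram_counts.get((prev_word, w), 0), w))


# Hypothetical vocabulary and bigram counts.
vocab = {"their", "there", "three"}
bigram_counts = {("over", "there"): 5, ("over", "three"): 1, ("lost", "their"): 4}
print(levenshtein("kitten", "sitting"), correct("thre", "over", vocab, bigram_counts))
```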

3. Image Captioning

Image captioning is the task of generating a text description for a given image.

Here are some good image captioning datasets for beginners.

Common Objects in Context (COCO). A collection of more than 120,000 images with descriptions.
Flickr 8K. A collection of 8,000 described images taken from flickr.com.
Flickr 30K. A collection of 30,000 described images taken from flickr.com.

For more, see the post: Exploring Image Captioning Datasets, 2016

4. Machine Translation

Machine translation is the task of translating text from one language to another.

Here are some good machine translation datasets for beginners.

Aligned Hansards of the 36th Parliament of Canada. Pairs of English and French sentences.
European Parliament Proceedings Parallel Corpus 1996-2011. Sentence pairs for a set of European languages.

Many standard datasets are used for the annual machine translation challenges; see: Statistical Machine Translation

Machine translation

Encoder + decoder (with attention)
Reference example:
pytorch.apachecn.org/cn/tutorial…
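
As a rough picture of what the attention step does: at each decoding step the decoder state is compared with every encoder state, the scores are normalized with softmax, and the weighted sum of encoder states becomes the context vector. The sketch below uses plain dot-product scoring, which is an assumption on my part; the linked tutorial may score the states differently.

```python
import math


def softmax(xs):
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, encoder_states):
    """Dot-product attention: return (weights, context vector)."""
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    weights = softmax(scores)
    dim = len(encoder_states[0])
    context = [sum(w * state[d] for w, state in zip(weights, encoder_states))
               for d in range(dim)]
    return weights, context


# Toy decoder state and three encoder states (made-up numbers).
weights, context = attend([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0], [5.0, 0.0]])
print(weights, context)
```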

5. Question Answering

Question answering is a task in which a sentence or passage of text is provided, questions are asked about it, and they must be answered.

Here are some good question answering datasets for beginners.

Stanford Question Answering Dataset (SQuAD). Answering questions about Wikipedia articles.
DeepMind Question Answering Corpus. Answering questions about news articles from the Daily Mail.
Amazon question/answer data. Answering questions about Amazon products.

For more information, see the post: Datasets: How can I get a corpus from a Q&A site such as Quora, Yahoo Answers or Stack Overflow to analyze answer quality?

6. Speech Recognition

Speech recognition is the task of converting spoken audio into human-readable text.

Here are some good speech recognition datasets for beginners.

TIMIT Acoustic-Phonetic Continuous Speech Corpus. Not free, but listed because of its widespread use. Spoken American English with associated transcriptions.
VoxForge. A project to build an open-source database for speech recognition.
LibriSpeech ASR Corpus. A large collection of English audiobooks taken from LibriVox.

7. Automatic Document Summarization

Document summarization is the task of creating a short, meaningful description of a larger document.

Here are some good document summarization datasets for beginners.

Legal Case Reports Data Set. A collection of 4,000 legal cases and their summaries.
TIPSTER Text Summarization Evaluation Conference Corpus. A collection of nearly 200 documents and their summaries.
AQUAINT Corpus of English News Text. Not free, but widely used. A corpus of news articles.

For more information: Document Understanding Conference (DUC) tasks; Where can I find a good dataset for text summarization?

Named entity recognition

Bi-LSTM + CRF
Reference example:

pytorch.apachecn.org/cn/tutorial…
Recommended reading on CRF:

www.jianshu.com/p/55755fc64…
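
In a Bi-LSTM + CRF tagger, the LSTM produces a score for each tag at each position and the CRF layer adds tag-to-tag transition scores; the best tag sequence is then found with Viterbi decoding. The sketch below shows only that decoding step, with hand-written scores standing in for the network outputs (the tag set, numbers and function name are assumptions).

```python
def viterbi(emissions, transitions):
    """Best tag path given emissions[t][k] and transitions[i][j] (from tag i to tag j)."""
    n_tags = len(transitions)
    score = list(emissions[0])
    backpointers = []
    for emit in emissions[1:]:
        step, new_score = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            step.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
        backpointers.append(step)
        score = new_score
    best_last = max(range(n_tags), key=lambda k: score[k])
    path = [best_last]
    for step in reversed(backpointers):
        path.append(step[path[-1]])  # follow backpointers to the start
    return path[::-1], score[best_last]


# Tags: 0 = O, 1 = B, 2 = I. The transition O -> I is heavily penalized.
transitions = [[0.0, 0.0, -100.0],
               [0.0, 0.0, 0.0],
               [0.0, 0.0, 0.0]]
emissions = [[2.0, 1.5, 0.0],
             [0.0, 0.0, 2.0],
             [1.0, 0.0, 0.0]]
path, best_score = viterbi(emissions, transitions)
print(path, best_score)
```

Picking the best tag at each position independently would give O, I, O here; the transition penalty makes the decoder prefer B, I, O instead.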

Text summarization

Extractive
word2vec + TextRank
Word2vec:

www.zhihu.com/question/44…
Recommended reading on TextRank:

blog.csdn.net/BaiHuaXiu12…
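
A minimal sketch of the extractive approach: build a graph whose nodes are sentences and whose edge weights are sentence similarities, run a PageRank-style iteration, and keep the top-ranked sentences. Note the swap: to stay self-contained, this sketch uses word-overlap (Jaccard) similarity in place of word2vec vectors, and the sentences and function names are made up.

```python
def jaccard(a, b):
    """Word-overlap similarity; a stand-in for word2vec-based similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def textrank(similarity, damping=0.85, iterations=50):
    """PageRank-style scores over a weighted sentence-similarity matrix."""
    n = len(similarity)
    scores = [1.0 / n] * n
    out_weight = [sum(row[j] for j in range(n) if j != i)
                  for i, row in enumerate(similarity)]
    for _ in range(iterations):
        scores = [
            (1 - damping) / n + damping * sum(
                similarity[j][i] / out_weight[j] * scores[j]
                for j in range(n) if j != i and out_weight[j] > 0)
            for i in range(n)
        ]
    return scores


sentences = ["the cat sat on the mat",
             "the cat ate the fish",
             "stock prices fell sharply"]
similarity = [[jaccard(a, b) for b in sentences] for a in sentences]
scores = textrank(similarity)
summary = sentences[max(range(len(sentences)), key=lambda i: scores[i])]
print(scores, summary)
```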

Graph computing

Dataset: data/nlp/graph
Spark graphX Practice.pdf
