This is a summary of the machine learning, data mining, and data analysis books I have read. Some suit beginners and some are more advanced. I have not sorted them strictly by level, but the list runs roughly from introductory to advanced. I have read basically every book listed below; I would not dare write about books I have not read, for fear of misleading people.
Data analysis
When I was an intern, I only knew Matlab. The company was small and had no money to buy licensed copies, so my manager asked me to learn R in two weeks. These are the books I read at that time.
1. R in Action
Comments: A good primer covering installation, getting started, basic statistical analysis, plotting commands, and common classification, regression, and dimensionality reduction methods.
Rating: five stars
2. Data Analysis: R Language Practice
Evaluation: A data analysis book written specifically around R. Read it after mastering the basics of R. It focuses on basic data analysis methods and introduces some common techniques; the material is fairly elementary.
Rating: four and a half stars
3. Exploratory Data Analysis
Review: Written by a foreign author, but the translation is really bad. The content is not very substantial either; you could look up concepts such as quantiles and distributions directly in a statistics textbook.
Rating: Three stars
4. The Art of R Programming
Review: I stumbled across this great book in the library. It is strong on data structures and performance tuning in R.
Recommended rating: four stars
5. Python for Data Analysis
Comments: Written by the author of the pandas library. In one sentence: it is the manual for pandas, which is an essential package for data analysis in Python.
Recommended rating: four stars
Data mining/machine learning
4. Data Mining in the Era of Big Data: R Language Practice
Evaluation: This seems to belong to the same series as "Data Analysis: R Language Practice" above. It introduces most of the common data mining methods, with both theory and examples, and is suitable for beginners.
Recommended rating: four stars
5. Data Mining: Concepts and Techniques
Evaluation: An introductory book, heavy on theory, and apparently the textbook many graduate students use to learn data mining. It is very detailed. Meng Xiaofeng's translation is decent, certainly better than the many badly translated books out there.
Recommended rating: four stars
6. Machine Learning in Action
Comments: The code is written in Python, so learn Python first if you have no background in it. The book is almost entirely examples; the code is very detailed and easy to understand, and it can be downloaded from GitHub.
Rating: five stars
7. Programming Collective Intelligence
Evaluation: Like Machine Learning in Action, it is mostly examples. The translation is acceptable, and much better than that of "Exploratory Data Analysis"! It comes with code you can practice on, and if you truly master it you can handle most everyday data mining needs.
Rating: five stars
8. Statistical Learning Methods
Evaluation: Written by Dr. Li Hang. The mathematical derivations of the common machine learning algorithms are quite detailed, which makes it very good for understanding the mathematical foundations. If you lack the mathematical background, first read books on mathematical analysis, advanced algebra, and convex optimization. Suitable for readers who already have some foundation.
Rating: five stars
9. Recommender System Practice
Evaluation: If you don't know what a recommender system is, this is worth a careful read. After finishing it you will basically understand the general framework and workflow of a recommender system. It includes some examples, but both the examples and the theory are shallow, so it only suits beginners.
Recommended rating: four stars
10. Introduction to Data Mining
Evaluation: This was the undergraduate textbook of a fellow intern. It is also written by leading figures in the field, foreign authors, and it is very easy to understand and extremely detailed.
Recommended rating: four stars
I'll stop here for now. These are basically a few introductory books; I have notes on some others in Evernote and will summarize them later. Next time I'll write about Hadoop/Python/Spark books and some good papers.
# 4.12 update
11. Spark Fast Big Data Analysis
Evaluation: This is the Chinese edition of Learning Spark. It is a very simple book that introduces Spark's basic syntax and commands.
Rating: five stars
12. Spark Advanced Data Analysis
Evaluation: There are few reviews on Douban, but after buying it I found it quite good. It basically uses examples to explain classification, clustering, recommendation, and credit investigation. The explanations are detailed and it reads quickly. I think highly of it.
Rating: five stars
13. Hadoop: The Definitive Guide
Evaluation: 7.8/10. Very thick, and it covers Hadoop in great depth. It is not really suitable for beginners and is better suited to people working on data warehouses; those doing data mining can start with "Hadoop in Actual Combat".
Rating: Three stars
14. Hadoop in Actual Combat
Evaluation: 7.0/10. The one I read was written by a professor in China; it is not the Chinese translation of "Hadoop in Action". It is quite shallow and suitable for beginners, but I feel Hadoop in Action is the better book.
Recommended rating: three and a half stars
15. Hive Programming Guide
Evaluation: 7.4/10. This covers how to operate Hive. Honestly, if you just want to know how to run Hive, you can skip this book and simply search for a list of Hive commands. The book is better suited to people doing ETL; if you are only getting started with data mining, you can put it off for now. The book itself is very good, though.
Recommended rating: four stars
16. R Language and Website Analysis
Evaluation: 7.4/10. I just happened to pick this up when I went to Guitu, but after a few chapters I found it very clearly written. The examples in the later chapters were also very good, so I bought the Kindle e-book on Amazon.
Recommended rating: four stars
17. R's Geek Ideal: Tools Volume
Evaluation: 7.5/10. The author is Zhang Dan. I first followed his blog, where the writing is very clear and the steps are easy to follow. The later part of the book is mainly about R performance and how to combine R with databases, Hadoop, and Hive. Worth a look.
Recommended rating: four stars
18. MySQL Crash Course
Comments: 8.4/10. Not much to say: a must-read for getting started with MySQL, and a very thin booklet.
Rating: five stars
19. High Performance MySQL
Review: 8.7/10. A professional MySQL book suitable for advanced readers, but the Chinese translation is very poor. Buy the English edition.
Recommended rating: two stars (and three for the English version)
20. Convex Optimization
Evaluation: 9.4/10. A very good and comprehensive textbook. It includes a lot of material I had learned earlier in a numerical analysis course, and many machine learning concepts can be found in it, so it gives you a deeper understanding of machine learning than just calling packages.
Recommended rating: 5 stars!
21. Pattern Recognition and Machine Learning
Comments: 9.6/10. PRML is a classic machine learning textbook and well worth reading! Someone has translated it into Chinese; if you need it, leave a message and I will send the link ~
Rating: five stars
22. Statistical Natural Language Processing
Evaluation: 8.8/10. An introduction to natural language processing. The book is very thick and largely conceptual, yet it does not feel boring. The only drawback, probably because it is a classic textbook, is that it has few worked examples and reads a bit like a survey. If you want hands-on practice, have a look at the Python-based book Natural Language Processing with Python.
Recommended rating: four stars
Here are a few popular science books I recommend; read them in your spare time to build interest:
1. Zero to One
2. The Age of Big Data
3. Top of the Wave
4. The Beauty of Mathematics
5. Top of the Data
I can't remember the others right now; I'll add them in the next update ~
# 4.19 update
23. Machine Learning
Evaluation: I somehow forgot this one last time. It is Professor Zhou's new book, with detailed formula derivations and a Douban score of 9.2. Before getting to the data mining algorithms, Professor Zhou covers how to evaluate and select models, which gives a high-level grounding in the basics of machine learning. When you later study each algorithm, you also get a general sense of the scenarios it applies to. The author's thinking is very clear. Strongly recommended!!
Rating: five stars