I have been writing this blog for almost a year, from October 24 (1024, Programmers’ Day) last year to this October. Looking back at the original articles I published this year, most of them cover mainstream big data technologies or the tools around them. This post introduces several classic, must-read books on big data. If you find it useful, remember to like, favorite, and share.
1. Hadoop: The Definitive Guide
I believe nobody here will object to putting this book first ~
Hadoop is a software framework for distributed processing of large amounts of data in a reliable, efficient, and scalable manner. In the narrow sense, Hadoop consists of three parts: HDFS, MapReduce, and YARN. In the broad sense, Hadoop refers to an ecosystem of open source components or products related to big data technologies, such as HBase, Hive, Spark, ZooKeeper, Kafka, and Flume.
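To make the MapReduce part a little more concrete, here is a minimal single-process sketch of the map, shuffle, and reduce steps, using word count as the example. It is plain Python rather than the Hadoop API, and the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration. A real job would distribute these steps across a cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key, as the framework would."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in one process (illustration only)."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data big ideas", "data beats opinions"]))
```

The point of the split is that each map call and each reduce call only looks at its own input, which is what lets a framework like Hadoop run them on different machines.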
Today “Hadoop” has almost become synonymous with big data, so if you are a beginner, Hadoop should be the first thing you learn. Its importance is self-evident.
2. HBase: The Definitive Guide
As you may have noticed, HBase is a member of the Hadoop ecosystem. However, Hadoop: The Definitive Guide does not cover it in detail. If you are interested in HBase’s underlying source code, advanced architecture, performance tuning, cluster management, and other advanced operations, this is a must-read classic!
Here is the book’s introduction on Douban for reference:
HBase: The Definitive Guide explores how to simplify HBase scaling by using Hadoop, with which HBase is tightly integrated; how to distribute large data sets across clusters of relatively inexpensive commodity servers; and how to access HBase through the native Java client or through gateway servers that provide REST, Avro, and Thrift APIs. It covers the details of the HBase architecture, including storage formats, the write-ahead log, and background processes; integrating the MapReduce framework with HBase; and how to tune clusters, design schemas, copy tables, bulk-import data, and remove nodes.
HBase: The Definitive Guide is suitable for advanced database developers who build on HBase.
3. Spark: The Definitive Guide
Spark is a unified, memory-based analytics engine for large-scale data processing, covering offline computing, real-time computing, and fast interactive queries. In recent years it has also become popular in machine learning and artificial intelligence.
For copyright reasons, very few electronic copies of this book are available in China. Even so, some enthusiastic readers have shared their own Chinese translation on GitHub, so interested readers can browse it and learn from the original authors’ ideas.
4. Flink Basic Tutorial (Introduction to Apache Flink)
With Spark on the list, Flink cannot be left out! As a new-generation open source stream processor, Flink is a rising star among big data processing frameworks. It supports both stream processing and batch processing with the same technology and can meet requirements for high throughput, low latency, and fault tolerance.
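One way to picture “stream and batch with the same technology” is to treat a batch as a stream that happens to end. The toy sketch below is plain Python, not the Flink API, and the names tumbling_sum and sensor_stream are hypothetical. It applies one windowed aggregation to both a finite list and an endless generator.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def tumbling_sum(events: Iterable[int], window_size: int) -> Iterator[int]:
    """Yield the sum of each consecutive group of `window_size` events."""
    window: List[int] = []
    for value in events:
        window.append(value)
        if len(window) == window_size:
            yield sum(window)
            window = []
    if window:
        # A bounded input can end mid-window; flush the partial result.
        yield sum(window)


def sensor_stream() -> Iterator[int]:
    """Hypothetical event source standing in for an unbounded stream."""
    value = 0
    while True:
        value += 1
        yield value


if __name__ == "__main__":
    # Batch: the input is finite, so the computation finishes on its own.
    print(list(tumbling_sum([1, 2, 3, 4, 5], window_size=2)))
    # Stream: the input never ends, so we only take the first two windows.
    print(list(islice(tumbling_sum(sensor_stream(), window_size=3), 2)))
```

The same function handles both cases because it consumes events one at a time and never needs to know whether more are coming. A real engine adds what this sketch leaves out, such as event time, state checkpoints, and parallelism.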
Most importantly, this book was written by core members of the Flink project and systematically explains Flink’s application scenarios, design concepts, features, usage, and performance advantages. And if some of it is hard to follow at first, do not worry: the translator of the Chinese edition is a senior technical expert at Alibaba and an Apache Flink committer, known inside Alibaba by the nickname “Big Sand”. He has visited data Artisans, the company founded by Flink’s creators, and has worked extensively with its CEO, Kostas Tzoumas (one of the book’s authors), and its CTO, Stephan Ewen.
So reading this book closely feels like a conversation with the masters themselves ~
5. Apache Kylin: The Definitive Guide
Apache Kylin is an open source OLAP engine on the Hadoop big data platform. It can speed up big data queries and raise concurrency by a factor of a hundred or more, opening the door to interactive analysis on very large data sets. Written by the Apache Kylin core development team, this book systematically covers Apache Kylin installation, getting started, visualization, model tuning, operations and maintenance, secondary development, and more. It is the authoritative guide to Apache Kylin.
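Much of Kylin’s speed comes from precomputing aggregates into a cube ahead of time, so that a query becomes a lookup instead of a scan. The sketch below shows that idea in plain Python. It is not Kylin’s implementation or API, and build_cube, query, and the sample rows are all made up for illustration.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Tuple

Row = Dict[str, object]
Cuboid = Dict[Tuple[object, ...], float]
Cube = Dict[Tuple[str, ...], Cuboid]


def build_cube(rows: List[Row], dimensions: List[str], measure: str) -> Cube:
    """Precompute SUM(measure) for every combination of dimensions."""
    cube: Cube = {}
    for size in range(len(dimensions) + 1):
        for dims in combinations(dimensions, size):
            cuboid: Cuboid = defaultdict(float)
            for row in rows:
                key = tuple(row[d] for d in dims)
                cuboid[key] += float(row[measure])
            cube[dims] = dict(cuboid)
    return cube


def query(cube: Cube, dims: Tuple[str, ...], key: Tuple[object, ...]) -> float:
    """Answer a GROUP BY style lookup from the cube without scanning rows.

    `dims` must list dimensions in the same order used to build the cube.
    """
    return cube[dims].get(key, 0.0)


if __name__ == "__main__":
    rows = [
        {"city": "Beijing", "year": 2019, "sales": 10},
        {"city": "Beijing", "year": 2020, "sales": 5},
        {"city": "Xiamen", "year": 2020, "sales": 7},
    ]
    cube = build_cube(rows, ["city", "year"], "sales")
    print(query(cube, ("city",), ("Beijing",)))
    print(query(cube, ("year",), (2020,)))
```

The trade-off is visible even in this toy version: n dimensions produce 2^n cuboids, so build time and storage grow quickly, which is one reason model tuning matters for this kind of engine.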
6. Hadoop Source Code Analysis
This technical book gives a comprehensive and detailed introduction to Hadoop’s source code and internal working mechanisms. Through detailed analysis of the source code, it helps readers quickly understand how Hadoop works internally, grasp the architecture of its source code, get productive with Hadoop, and build a deep understanding of it. It is also the first book in China to cover the Hadoop source code in detail.
7. The Road to Big Data: Alibaba’s Big Data Practice
As you can probably guess from the title, “The Road to Big Data: Alibaba’s Big Data Practice” was organized and written by the Data Technology and Products Department of Alibaba, the Chinese Internet giant. This book is an important cornerstone of Alibaba’s effort to share its big data knowledge and build data intelligence together with its ecosystem partners.
The book mainly describes Alibaba’s technical practice and thinking in data technology, data modeling, data management, and data applications. I believe it will be a great source of inspiration and reference for peers in the industry.
8. Big Data Architect’s Guide
Do not be put off by the word “architect” in the title. Everyone’s current skill level is different, but most of us are ultimately working toward becoming an architect, right ~
The book aims to help readers systematically grasp the technical frameworks related to big data in the shortest possible time, and to build technical thinking and principles at the system architecture level. It is suitable for enterprise IT and big data practitioners, IT and big data sales staff, and enterprise chief technology officers (CTOs) and chief information officers (CIOs).
So if you are a novice, I do not recommend starting with architecture directly. Lay a solid foundation first; it will not be too late to develop architectural thinking afterward ~
9. User Portrait: Methodology and Engineering Solutions
This book systematically explains, from a technical perspective, the methodology of user portraits (user profiling) and some common engineering solutions. It is suitable for readers who are interested in user portraits and have some development ability.
Some readers who already have experience developing user portrait projects may turn up their noses at it. But here is a spoiler: when you finish this book, you will realize how little you actually knew. Why? Because it covers everything you have thought of and plenty you have not. I believe that once you read it, you will fall in love with learning!
10. Principles and Applications of Big Data Technology
For the last spot, I chose “Principles and Applications of Big Data Technology”, not because it is especially authoritative, but because in my opinion it is the most suitable book for big data beginners!! All of the books mentioned above require readers to have some big data theory or development ability. Big data has only gradually been established as a university major in recent years, and as far as I know, many universities, including Project 985 universities such as Xiamen University, use this book as a course textbook. If you are a newcomer to big data, I strongly recommend starting with this book. I was lucky enough to buy a copy and read some chapters, and I cannot help feeling that it is really easy to understand.
Conclusion
These are just 10 books that I personally consider very good. There are many more that I have not covered, such as “Illustrated Spark: Core Technologies and Case Studies”, “Spark Big Data Processing: Technology, Application and Performance Optimization”, and “Massive Data Processing and Big Data Technology in Practice”, written by the well-known blogger Glacier, which I have only recently started reading. All of them are books I can get completely absorbed in.
If I get the chance in the future, I will recommend more good books to you. That is all for this article. The 10 books detailed above can be obtained by replying “Big data Books” in the background of the blogger’s personal public account [Big Data Dreamer].
More exciting content 👇 “Big data dreamers” 🔥 :
A big data enthusiast who likes reading, writing, and reviewing what I have learned. Besides sharing the basic principles of big data, technical practice, architecture design, and prototype implementation, I also like to share interesting and practical programming content and reading notes.