So far, Hadoop has evolved through two generations, Hadoop 1.0 and Hadoop 2.0. Compared with the third edition of Hadoop: The Definitive Guide, the fourth edition focuses on Hadoop 2.0 and adds dedicated coverage of currently popular Hadoop technologies such as YARN, Parquet, Flume, Crunch, and Spark, helping Hadoop developers better understand the background, principles, and usage of each. The fourth edition also introduces recent applications of Hadoop in health care and molecular biology, with case studies for each, which gives Hadoop users more practical guidance.

Today, the open source Hadoop project has become an important platform for big data research and for developing big data applications. A large Hadoop user community has formed in China, and its members have a strong demand for learning, mastering, and improving Hadoop. The Hadoop: The Definitive Guide series meets that need, and the popularity of the book since its first edition confirms its usefulness and value.

Combining theory and practice, this book gives a comprehensive introduction to Hadoop, a high-performance platform for processing and analyzing massive data sets. It consists of 24 chapters in 5 parts:

Part I introduces the fundamentals of Hadoop, including an overview of Hadoop, MapReduce, the Hadoop Distributed File System (HDFS), YARN, and Hadoop I/O operations.

Part II covers MapReduce in depth: MapReduce application development, how MapReduce works, MapReduce types and formats, and MapReduce features.

Part III covers Hadoop operations and maintenance, mainly setting up Hadoop clusters and administering Hadoop.

Part IV introduces Hadoop-related open source projects, including Avro, Parquet, Flume, Sqoop, Pig, Hive, Crunch, Spark, HBase, and ZooKeeper.

Part V provides three case studies: one from the healthcare information technology provider Cerner, one on the genomics project ADAM (large-scale processing of biological data), and one on the open source project Cascading (a data processing API layered on MapReduce).

The book is an authoritative and comprehensive Hadoop reference that describes the latest developments and applications in the Hadoop ecosystem. Programmers can explore how to store and analyze massive data sets, and administrators can learn how to install, operate, and maintain Hadoop clusters.

Table of contents overview

Because of space limits, only the table of contents and part of the content are shown here. Programmers who need the complete document can get it by sharing this post and following the public account listed at the end.

Part 1: Hadoop fundamentals

Chapter 1: Getting to know Hadoop

Chapter 2: MapReduce

Chapter 3: Hadoop Distributed File System

Chapter 4: YARN

Chapter 5: Hadoop I/O operations

Part 2: MapReduce

Chapter 6: MapReduce application development

Chapter 7: How MapReduce works

Chapter 8: MapReduce types and formats

Chapter 9: MapReduce features

Part 3: Hadoop operations