So far, Hadoop has evolved through two generations, Hadoop 1.0 and Hadoop 2.0. Compared with the third edition of Hadoop: The Definitive Guide, the fourth edition focuses on Hadoop 2.0 and adds dedicated coverage of currently popular Hadoop technologies such as YARN, Parquet, Flume, Crunch, and Spark, which helps Hadoop developers better understand the background, principles, and usage of these technologies. The fourth edition also introduces recent applications of Hadoop in health care and molecular biology and adds related case studies, which offer practical guidance for Hadoop users.

Today, the Hadoop open source project has become an important platform for big data research and for developing big data applications. A large Hadoop user community has formed in China, with strong demand for learning, mastering, and improving Hadoop. The Hadoop: The Definitive Guide series meets that need, and the popularity of the book since its first edition attests to its usefulness and value.

Combining theory and practice, this book gives a comprehensive introduction to Hadoop, a high-performance platform for processing and analyzing massive data sets. The book consists of 24 chapters in 5 parts:

Part I introduces Hadoop fundamentals, including an overview of Hadoop, MapReduce, the Hadoop Distributed File System (HDFS), YARN, and Hadoop I/O operations.

Part II covers MapReduce in depth, including MapReduce application development, how MapReduce works, MapReduce types and formats, and MapReduce features.

Part III covers Hadoop operations, mainly building Hadoop clusters and administering Hadoop.

Part IV introduces Hadoop-related open source projects, including Avro, Parquet, Flume, Sqoop, Pig, Hive, Crunch, Spark, HBase, and ZooKeeper.

Part V provides three case studies: one from the healthcare information technology provider Cerner, one on ADAM (a genomics data analysis project covered in the biological data science chapter), and one on the open source project Cascading (a data processing API built on top of MapReduce).

The book is an authoritative and comprehensive Hadoop reference that describes the latest developments and applications in the Hadoop ecosystem. Programmers can use it to explore the storage and analysis of massive data sets, and administrators can learn how to install, operate, and maintain Hadoop clusters.

Table of contents overview

Part I: Hadoop Fundamentals

Chapter 1: Getting to Know Hadoop

Chapter 2: MapReduce

Chapter 3: The Hadoop Distributed File System

Chapter 4: YARN

Chapter 5: Hadoop I/O Operations

Part II: MapReduce

Chapter 6: MapReduce Application Development

Chapter 7: How MapReduce Works

Chapter 8: MapReduce Types and Formats

Chapter 9: MapReduce Features

Part III: Hadoop Operations

Chapter 10: Building a Hadoop Cluster

Chapter 11: Managing Hadoop

Part IV: Hadoop-Related Open Source Projects

Chapter 12: Avro

Chapter 13: Parquet

Chapter 14: Flume

Chapter 15: Sqoop

Chapter 16: Pig

Chapter 17: Hive

Chapter 18: Crunch

Chapter 19: Spark

Chapter 20: HBase

Chapter 21: ZooKeeper

Part V: Case Studies

Chapter 22: Composable Data at Cerner

Chapter 23: Biological Data Science: Saving Lives with Software

Chapter 24: The Open Source Project Cascading

Appendixes

Appendix A: Installing Apache Hadoop

Appendix B: CDH

Appendix C: Preparing the NCDC Meteorological Data

Appendix D: The Old and New Java MapReduce APIs
