I. Overview of big data
Concept
Big Data refers to data sets that cannot be captured, managed, and processed by conventional software tools within an acceptable time frame. It is a massive, fast-growing, and diversified information asset that requires new processing modes to provide stronger decision-making power, insight, discovery, and process-optimization capabilities. Big data technology mainly addresses the storage of massive data and the analysis and computation of massive data.
Characteristics
- Volume (large amount of data)
- Velocity (high speed of generation and processing)
- Variety (diverse data types)
- Value (low value density)
II. The Hadoop framework
What is Hadoop
- Hadoop is a distributed system infrastructure developed by the Apache Foundation.
- It mainly solves the problems of massive data storage and massive data analysis and computation.
- In broad terms, Hadoop usually refers to a broader concept — the Hadoop ecosystem.
The history of Hadoop
Hadoop's core components trace back to three Google papers:
- GFS => HDFS
- MapReduce => MR
- BigTable => HBase
The three major Hadoop distributions
Hadoop has three major distributions: Apache, Cloudera, and Hortonworks.
- Apache is the original (most basic) distribution and the best choice for getting started;
- Cloudera is widely used by large Internet companies;
- Hortonworks is well documented;
Advantages of Hadoop
- High reliability: Hadoop maintains multiple copies of data, so data is not lost even if a computing element or storage unit fails;
- High scalability: tasks and data are distributed across the cluster, which can easily scale to thousands of nodes;
- Efficiency: under the MapReduce model, Hadoop processes tasks in parallel, which speeds them up;
- High fault tolerance: failed tasks are automatically reassigned;
III. Hadoop composition
In Hadoop 1.x, MapReduce handles both business-logic computation and resource scheduling, which couples them tightly, while HDFS is responsible for data storage.
In Hadoop 2.x, YARN is added for resource scheduling, so MapReduce only handles business-logic computation and HDFS remains responsible for data storage.
HDFS
- NameNode (NN): stores file metadata, such as the file name, directory structure, file attributes (creation time, number of replicas, permissions), the block list of each file, and the DataNodes on which each block resides.
- DataNode (DN): stores the file block data and the checksums of that data in the local file system.
- Secondary NameNode (2NN): an auxiliary daemon that monitors HDFS status and takes snapshots of HDFS metadata at regular intervals.
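As a concrete (if simplified) illustration of how a client interacts with these roles, the sketch below uses Hadoop's Java FileSystem API to write and then read a small file. The fs.defaultFS address and the path are placeholders, not values taken from this article.

```java
import java.nio.charset.StandardCharsets;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class HdfsQuickStart {
    public static void main(String[] args) throws Exception {
        // The client asks the NameNode for metadata only; the actual
        // block data is streamed to/from the DataNodes.
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode-host:8020"); // placeholder address

        try (FileSystem fs = FileSystem.get(conf)) {
            Path path = new Path("/tmp/hello.txt"); // placeholder path

            // Write: the NameNode records the file's metadata and block list,
            // while the DataNodes store the block replicas.
            try (FSDataOutputStream out = fs.create(path, true)) {
                out.write("hello hdfs".getBytes(StandardCharsets.UTF_8));
            }

            // Read: the client learns the block locations from the NameNode,
            // then reads the blocks directly from the DataNodes.
            try (FSDataInputStream in = fs.open(path)) {
                IOUtils.copyBytes(in, System.out, 4096, false);
            }
        }
    }
}
```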
MapReduce
MapReduce divides the computation process into two phases, Map and Reduce (a minimal word-count sketch follows this list):
- The Map phase processes the input data in parallel;
- The Reduce phase aggregates the results of the Map phase.
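The split between the two phases is easiest to see in the classic word-count example. The sketch below follows the standard Hadoop WordCount closely (trimmed for brevity); the input and output directories are taken from the command line and are assumptions.

```java
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: each MapTask processes one input split in parallel
    // and emits (word, 1) for every word it sees.
    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer it = new StringTokenizer(value.toString());
            while (it.hasMoreTokens()) {
                word.set(it.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: all counts for the same word are grouped together
    // and summed into the final result.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input dir (assumed)
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output dir (assumed)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```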
YARN
ResourceManager (RM) provides the following functions:
- Handle client requests;
- Monitor the NodeManager;
- Start or monitor ApplicationMaster;
- Allocate and schedule resources;
NodeManager (NM) has the following functions:
- Manage resources on a single node;
- Process commands from the ResourceManager;
- Handle commands from the ApplicationMaster;
ApplicationMaster (AM) does the following:
- Starts MapTasks according to the split information (job.split);
- Requests resources for the application and allocates them to its internal tasks;
- Monitors tasks and handles fault tolerance;
- Container: a resource abstraction in YARN that encapsulates multi-dimensional resources on a node, such as memory, CPU, disk, and network.
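To see these roles from a client's perspective, the hedged sketch below uses the YarnClient API to ask the ResourceManager for the list of applications it knows about. It assumes a yarn-site.xml with the ResourceManager address is on the classpath; no hostnames here are taken from this article.

```java
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.yarn.api.records.ApplicationReport;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ListYarnApps {
    public static void main(String[] args) throws Exception {
        // The client talks to the ResourceManager, which schedules Containers
        // on NodeManagers and launches one ApplicationMaster per application.
        Configuration conf = new YarnConfiguration();

        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(conf);
        yarnClient.start();
        try {
            // Ask the RM for all applications it is currently tracking.
            List<ApplicationReport> apps = yarnClient.getApplications();
            for (ApplicationReport app : apps) {
                System.out.printf("%s\t%s\t%s%n",
                        app.getApplicationId(), app.getName(), app.getYarnApplicationState());
            }
        } finally {
            yarnClient.stop();
        }
    }
}
```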
IV. Hadoop ports
9870: NameNode Web access port
```xml
<!-- 9870: HDFS NameNode web UI port -->
<property>
  <name>dfs.namenode.http-address</name>
  <value>0.0.0.0:9870</value>
  <description>
    The address and the base port where the dfs namenode web ui will listen on.
  </description>
</property>
```
8088: Web access port of ResourceManager
```xml
<!-- 8088: YARN ResourceManager web UI port -->
<property>
  <description>The http address of the RM web application.</description>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>${yarn.resourcemanager.hostname}:8088</value>
</property>

<!-- yarn-site.xml configuration -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>centos7202</value>
</property>
```
9868: Secondary NameNode Web access port
```xml
<!-- 9868: HDFS Secondary NameNode web UI port -->
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>0.0.0.0:9868</value>
  <description>
    The secondary namenode http server address and port.
  </description>
</property>

<!-- hdfs-site.xml configuration -->
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>centos7203:9868</value>
</property>
```
19888: Web access port of the JobHistory Server
```xml
<!-- 19888: MapReduce JobHistory Server web UI port -->
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>0.0.0.0:19888</value>
</property>

<!-- mapred-site.xml -->
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>centos7203:19888</value>
</property>
```
10020: IPC connection port of the JobHistory Server
```xml
<!-- 10020: MapReduce JobHistory Server IPC port -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>0.0.0.0:10020</value>
  <description>MapReduce JobHistory Server IPC host:port</description>
</property>

<!-- mapred-site.xml -->
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>centos7203:10020</value>
</property>
```
8485: JournalNode RPC connection port
```xml
<!-- 8485: JournalNode RPC port -->
<property>
  <name>dfs.journalnode.rpc-address</name>
  <value>0.0.0.0:8485</value>
  <description>
    The JournalNode RPC server address and port.
  </description>
</property>
```