Article contents

  • 1. Hadoop cluster Overview
  • 2. Hadoop deployment modes
  • 3. Hadoop source code compilation
  • 4. Hadoop cluster installation
    • Step1: Plan cluster roles
    • Step2: prepare the basic server environment
    • Step3: upload and decompress the installation package
    • Step4: Hadoop installation package directory structure
    • Step5: edit Hadoop configuration file (1)
    • Step5: edit Hadoop configuration file (2)
    • Step5: edit Hadoop configuration file (3)
    • Step5: edit Hadoop configuration file (4)
    • Step5: edit Hadoop configuration file (5)
    • Step5: edit Hadoop configuration file (6)
    • Step6: distribute and synchronize the installation package
    • Step7: configure Hadoop environment variables
  • 5. Summary

1. Hadoop cluster Overview

  • A Hadoop cluster consists of two clusters: an HDFS cluster and a YARN cluster
  • The two clusters are logically separate but are usually deployed together physically
  • Both clusters use the standard master-slave architecture



  • Logically separate: the two clusters do not depend on each other and do not affect each other
  • Physically together: role processes from the two clusters are often deployed on the same physical server
  • What about a MapReduce cluster?

    MapReduce is a computing framework; at the code level it has no cluster components of its own

2. Hadoop deployment modes

3. Hadoop source code compilation

4. Hadoop cluster installation

Step1: Plan cluster roles

  • Guidelines for role planning

    Allocate roles according to each component's workload characteristics and the hardware resources of the servers

    For example, NameNode is memory-intensive, so deploy it on a machine with plenty of memory
  • Precautions for role planning

    Roles whose resource demands conflict should not be deployed together

    Roles that need to cooperate closely should be deployed together whenever possible (one possible layout is sketched below)
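
For reference only, here is one role layout that matches the configuration files edited later in this article (NameNode and ResourceManager on node1, SecondaryNameNode on node2, all three hosts listed in the workers file):

node1: NameNode  DataNode  ResourceManager  NodeManager
node2: SecondaryNameNode  DataNode  NodeManager
node3: DataNode  NodeManager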

Step2: prepare the basic server environment

  • Host name (3 machines)
vim /etc/hostname

  • Hosts mapping (3 machines)
vim /etc/hosts
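As an illustration only (the IP addresses below are placeholders; replace them with the real addresses of your machines), the /etc/hosts entries might look like:

192.168.88.151 node1 node1.xdr630.com
192.168.88.152 node2 node2.xdr630.com
192.168.88.153 node3 node3.xdr630.com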

  • Turn off the firewall (3 machines)
systemctl stop firewalld.service     # stop the firewall service
systemctl disable firewalld.service  # prevent the firewall from starting on boot
  • SSH password-free login (executed on node1: node1 -> node1 | node2 | node3)
ssh-keygen          # press Enter 4 times to generate the public/private key pair
ssh-copy-id node1   # copy the public key to node1
ssh-copy-id node2   # copy the public key to node2
ssh-copy-id node3   # copy the public key to node3
  • Verification:
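A minimal check, assuming the public key was copied successfully: logging in from node1 to the other nodes should no longer prompt for a password.

ssh node2   # should log in without asking for a password
exit        # return to node1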

  • Cluster time synchronization (3 machines)
yum -y install ntpdate      # install the ntpdate tool
ntpdate ntp4.aliyun.com     # synchronize time against the Aliyun NTP server

  • JDK 1.8 Installation (3 machines)
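The JDK commands are not listed here; a minimal sketch, assuming a JDK 8u65 tarball (the file name is an assumption) and the /export/server/jdk1.8.0_65 path referenced later in hadoop-env.sh:

# the tarball name is an assumption; adjust it to the package you actually use
tar -zxvf jdk-8u65-linux-x64.tar.gz -C /export/server/

# append to /etc/profile on all three machines, then reload it
export JAVA_HOME=/export/server/jdk1.8.0_65
export PATH=$PATH:$JAVA_HOME/bin
source /etc/profile
java -version   # verify the installation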

Step3: upload and decompress the installation package

  • Create a unified working directory (on all 3 machines); the layout below is a convention you can adjust
mkdir -p /export/server/    # software installation path
mkdir -p /export/data/      # data storage path
mkdir -p /export/software/  # installation package storage path

  • Uploading and decompressing the installation package (node1)
tar -zxvf hadoop-3.1.4.tar.gz -C /export/server/

-C: specifies the extraction (installation) path

Step4: Hadoop installation package directory structure
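
For reference, the top-level directories of a typical Hadoop 3.x installation package are roughly as follows:

bin/      # basic client commands such as hadoop, hdfs and yarn
etc/      # configuration files (hadoop-env.sh, core-site.xml, workers, ...)
sbin/     # cluster management scripts such as start-dfs.sh and start-yarn.sh
lib/      # native libraries
libexec/  # shell configuration and helper scripts
share/    # jars, documentation and example programs
include/  # header files for the native (C) interfaces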

Step5: edit Hadoop configuration file (1)

  • Open the hadoop-env.sh file under etc/hadoop/ in the Hadoop root directory
cd /export/server/hadoop-3.1.4/etc/hadoop/
vim hadoop-env.sh
  • For configuring the JDK environment, see my earlier blog post on the problems encountered when configuring a Java environment on Linux servers
  • Specify the root directory where the JDK is installed by configuring JAVA_HOME
export JAVA_HOME=/export/server/jdk1.8.0_65

  • Add the following at the end of the file to set the users that run the shell commands for each role
# set the user to execute the corresponding role shell command
export HDFS_NAMENODE_USER=root
export HDFS_DATANODE_USER=root
export HDFS_SECONDARYNAMENODE_USER=root
export YARN_RESOURCEMANAGER_USER=root
export YARN_NODEMANAGER_USER=root 

Step5: edit Hadoop configuration file (2)

  • Still under etc/hadoop in the Hadoop root directory, edit core-site.xml and add the following inside the <configuration> element
  • Note: replace the domain names below with the ones that match your own hosts
<!-- Name of the default file system. The file system type is determined by the scheme in the URI. -->
<!-- e.g. file:///    hdfs:// (Hadoop distributed file system)    gfs:// -->
<!-- HDFS file system access address: hdfs://nn_host:8020 -->
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://node1.xdr630.com:8020</value>
</property>
<!-- Hadoop local data storage directory; generated automatically when the file system is formatted -->
<property>
    <name>hadoop.tmp.dir</name>
    <value>/export/data/hadoop-3.1.4</value>
</property>
<!-- User name used when accessing HDFS from the Web UI -->
<property>
    <name>hadoop.http.staticuser.user</name>
    <value>root</value>
</property>

Step5: edit Hadoop configuration file (3)

  • Edit hdfs-site.xml and add the following inside <configuration>
<!-- Host and port where the SecondaryNameNode (SNN) runs -->
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>node2.xdr630.com:9868</value>
</property>

Step5: edit Hadoop configuration file (4)

  • Edit mapred-site.xml and add the following inside <configuration>
<!-- Default execution mode for MR programs: yarn = YARN cluster mode, local = local mode -->
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<!-- MR App Master environment variables -->
<property>
  <name>yarn.app.mapreduce.am.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<!-- MR MapTask environment variables -->
<property>
  <name>mapreduce.map.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>
<!-- MR ReduceTask environment variables -->
<property>
  <name>mapreduce.reduce.env</name>
  <value>HADOOP_MAPRED_HOME=${HADOOP_HOME}</value>
</property>

Step5: edit Hadoop configuration file (5)

  • Edit yarn-site.xml and add the following inside <configuration>
<!-- Host where the master role (ResourceManager) of the YARN cluster runs -->
<property>
    <name>yarn.resourcemanager.hostname</name>
    <value>node1.xdr630.com</value>
</property>
<!-- Auxiliary services running on the NodeManager; mapreduce_shuffle must be configured in order to run MR programs -->
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
<!-- Minimum memory (in MB) that can be requested per container -->
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>512</value>
</property>
<!-- Maximum memory (in MB) that can be requested per container -->
<property>
  <name>yarn.scheduler.maximum-allocation-mb</name>
  <value>2048</value>
</property>
<!-- Ratio of a container's virtual memory to its physical memory -->
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
</property>

Step5: edit Hadoop configuration file (6)

  • Edit the workers file (the hosts that run DataNode and NodeManager) and add
node1.xdr630.com
node2.xdr630.com
node3.xdr630.com

Step6: distribute and synchronize the installation package

  • On node1, use scp to synchronize the Hadoop installation package to the other machines
cd /export/server/
scp -r hadoop-3.1.4 root@node2:/export/server/
scp -r hadoop-3.1.4 root@node3:/export/server/

Step7: configure Hadoop environment variables

  • Configure the Hadoop environment variables on Node1
vim /etc/profile
export HADOOP_HOME=/export/server/hadoop-3.1.4
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
  • Synchronize the modified environment variables to other machines
scp /etc/profile root@node2:/etc/
scp /etc/profile root@node3:/etc/
  • Reload the environment variables and verify that they take effect (3 machines)
source /etc/profile
hadoop   # verify that the environment variable has taken effect

5. Summary

  1. Server base environment
  2. Hadoop source code compilation
  3. Hadoop configuration file modification
  4. 1 shell file (hadoop-env.sh), 4 XML files, and the workers file
  5. Synchronize the configuration files across the cluster