In the previous article [First Hadoop instance: juejin.cn/post/684490…], the JDK and Hadoop were installed and environment variables were configured. This article covers deploying distributed Hadoop on top of that stand-alone setup.

VM Preparations

Clone three VMs from the machine configured in the previous part, change their IP addresses, disable the firewall, and set the host names. These basic operations are not the focus of this article and will not be covered here. Note: after changing a host name, update the /etc/hosts file on every node (see the sketch after the list below).

The virtual machine configuration in this example:

hadoop001: 192.168.48.111

hadoop002: 192.168.48.112

hadoop003: 192.168.48.113
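
On CentOS 7, the host name and firewall changes plus the hosts file might look like the following (hostnamectl and firewalld are assumptions; adapt to your distribution):

    # Run on each node with its own name (shown for hadoop001)
    hostnamectl set-hostname hadoop001
    systemctl stop firewalld && systemctl disable firewalld

    # Append to /etc/hosts on all three nodes
    192.168.48.111 hadoop001
    192.168.48.112 hadoop002
    192.168.48.113 hadoop003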

Passwordless SSH Login

  1. Go to the .ssh directory in your home directory

    cd ~/.ssh
  2. Generate the public and private keys: type the following command and press Enter three times. Two files are created: id_rsa (the private key) and id_rsa.pub (the public key).

    ssh-keygen -t rsa
  3. Copy the public key to the target machines that should accept passwordless login

    ssh-copy-id 192.168.48.112
    ssh-copy-id 192.168.48.113
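
Because the slaves file configured below also lists hadoop001, start-dfs.sh will SSH to hadoop001 itself, so it is worth copying the key to the local machine too. A quick sanity check:

    ssh-copy-id 192.168.48.111
    # Logging in to another node should now not prompt for a password
    ssh 192.168.48.112
    exit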

Configuring the Cluster

Cluster Deployment Planning

The plan, which the configuration files below implement: hadoop001 hosts the NameNode, hadoop002 the ResourceManager, and hadoop003 the SecondaryNameNode, while all three nodes run a DataNode and a NodeManager.

hadoop001: NameNode, DataNode, NodeManager

hadoop002: ResourceManager, DataNode, NodeManager

hadoop003: SecondaryNameNode, DataNode, NodeManager

Modifying the Configuration Files

The configuration files reside in: /opt/module/hadoop-2.7.2/etc/hadoop/

  • core-site.xml

    <configuration>
        <!-- Specify the HDFS NameNode address -->
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://hadoop001:9000</value>
        </property>
        <!-- Specify the storage directory for files generated at Hadoop runtime -->
        <property>
                <name>hadoop.tmp.dir</name>
                <value>/opt/module/hadoop-2.7.2/data/tmp</value>
        </property>
    </configuration>
  • hadoop-env.sh

    export JAVA_HOME=/opt/module/jdk1.8.0_131
  • hdfs-site.xml

    <configuration>
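        <!-- Number of replicas for each HDFS block -->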
        <property>
                <name>dfs.replication</name>
                <value>3</value>
        </property>
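        <!-- HTTP address of the SecondaryNameNode -->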
        <property>
                <name>dfs.namenode.secondary.http-address</name>
                <value>hadoop003:50090</value>
        </property>
    </configuration>
  • slaves

    hadoop001
    hadoop002
    hadoop003
  • yarn-env.sh

    export JAVA_HOME=/opt/module/jdk1.8.0_131
  • yarn-site.xml

    <configuration>
        <!-- Site-specific YARN configuration properties -->
        <!-- How the reducer obtains data -->
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
        <!-- Specify the address of the YARN ResourceManager -->
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>hadoop002</value>
        </property>
    </configuration>
    Copy the code
  • mapred-env.sh

    export JAVA_HOME=/opt/module/jdk1.8.0_131
  • mapred-site.xml

    <configuration>
        <!-- Set MapReduce to run on YARN -->
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
    </configuration>
Copy the configuration files to the other VMs in the cluster:

    [root@hadoop001 etc]# scp -r hadoop/ root@hadoop002:/opt/module/hadoop-2.7.2/etc
    [root@hadoop001 etc]# scp -r hadoop/ root@hadoop003:/opt/module/hadoop-2.7.2/etc

Starting the Cluster

  1. If the cluster is being started for the first time, format the NameNode

    [root@hadoop001 hadoop-2.7.2]# bin/hdfs namenode -format
  2. Start HDFS

    [root@hadoop001 hadoop-2.7.2]# sbin/start-dfs.sh
  3. Start YARN. Note: start YARN on the machine where the ResourceManager runs (hadoop002 in this example)

    [root@hadoop002 hadoop-2.7.2]# sbin/start-yarn.sh
  4. Check whether the startup is successful

Browser visit (NameNode web UI): http://192.168.48.111:50070

Browser visit (ResourceManager web UI): http://192.168.48.112:8088/cluster
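
Alternatively, run jps on each node and compare the daemons with the deployment plan; the expected processes (PIDs omitted) are roughly:

    [root@hadoop001 ~]# jps
    NameNode
    DataNode
    NodeManager
    Jps

    [root@hadoop002 ~]# jps
    ResourceManager
    DataNode
    NodeManager
    Jps

    [root@hadoop003 ~]# jps
    SecondaryNameNode
    DataNode
    NodeManager
    Jps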

Hadoop Start and Stop Modes

  • Start or stop each service component individually (see the example after this list)

    (1) Start or stop the HDFS components

    hadoop-daemon.sh start|stop namenode|datanode|secondarynamenode

    (2) Start or stop the YARN components

    yarn-daemon.sh start|stop resourcemanager|nodemanager

  • Start or stop each module as a whole (passwordless SSH is a prerequisite)

    (1) Start or stop HDFS

    start-dfs.sh

    stop-dfs.sh

    (2) Start or stop YARN

    start-yarn.sh

    stop-yarn.sh

  • Start all (not recommended)

    start-all.sh

    stop-all.sh
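
For example, restarting only the DataNode on hadoop002 with the component-level commands looks like this (an illustrative sketch using the paths from this article):

    [root@hadoop002 hadoop-2.7.2]# sbin/hadoop-daemon.sh stop datanode
    [root@hadoop002 hadoop-2.7.2]# sbin/hadoop-daemon.sh start datanode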

Operating the Cluster

  • Create an input folder on the HDFS file system

    [root@hadoop001 hadoop-2.7.2]# bin/hdfs dfs -mkdir -p /user/sixj/graphs/wordcount/input
  • Upload the test content to the file system

    [root@hadoop001 hadoop-2.7.2]# bin/hdfs dfs -put wcinput/wc.input /user/sixj/graphs/wordcount/input
  • View the file list

    [root@hadoop001 hadoop-2.7.2]# bin/hdfs dfs -ls /user/sixj/graphs/wordcount/input
  • View the file contents

    [root@hadoop001 hadoop-2.7.2]# bin/hdfs dfs -cat /user/sixj/graphs/wordcount/input/wc.input
    hadoop yarn hadoop mapreduce sixj JAVA sixj
  • Download the file to the local PC

    [root@hadoop001 hadoop-2.7.2]# hadoop fs -get /user/sixj/graphs/wordcount/input/wc.input
  • Delete the file

    [root@hadoop001 hadoop-2.7.2]# hadoop fs -rm /user/sixj/graphs/wordcount/input/wc.input
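
Before deleting the input, you can also exercise the cluster end to end with the wordcount example that ships with Hadoop 2.7.2 (the jar path is the standard one for this version; the output directory below is an assumption and must not exist beforehand):

    [root@hadoop001 hadoop-2.7.2]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /user/sixj/graphs/wordcount/input /user/sixj/graphs/wordcount/output
    [root@hadoop001 hadoop-2.7.2]# bin/hdfs dfs -cat /user/sixj/graphs/wordcount/output/part-r-00000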