In the previous article [First Hadoop instance: juejin.cn/post/684490…], the JDK and Hadoop were installed and the environment variables were configured. This article covers deploying a distributed Hadoop cluster on top of that original stand-alone setup.
VM Preparations
Clone three VMs from the machine configured in the previous article, change their IP addresses, disable the firewall, and set the host names. These basic operations are not the focus of this article and are not covered here. Note: after changing a host name, update the /etc/hosts file on every node (an example follows the list below).
The virtual machine configuration in this example:

hadoop001: 192.168.48.111
hadoop002: 192.168.48.112
hadoop003: 192.168.48.113
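For reference, a minimal /etc/hosts entry set on each node might look like the following, using the host names and addresses listed above:

```
192.168.48.111 hadoop001
192.168.48.112 hadoop002
192.168.48.113 hadoop003
```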
Passwordless SSH login
- Go to the .ssh directory under the home directory.

```
cd ~/.ssh
```
- Generate the public and private keys. Type the following command and press Enter three times; two files are generated: id_rsa (private key) and id_rsa.pub (public key).

```
ssh-keygen -t rsa
```
- Copy the public key to the target machines that should allow passwordless login.

```
ssh-copy-id 192.168.48.112
ssh-copy-id 192.168.48.113
```
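To verify, an SSH connection to one of the target machines should now succeed without a password prompt (a quick check, assuming the key copy succeeded):

```
ssh 192.168.48.112
exit
```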
Configure the cluster
Cluster Deployment Planning
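Based on the configuration files below (NameNode address, SecondaryNameNode address, ResourceManager host, and the slaves file), the daemons are distributed as follows:

|      | hadoop001          | hadoop002                    | hadoop003                   |
| ---- | ------------------ | ---------------------------- | --------------------------- |
| HDFS | NameNode, DataNode | DataNode                     | SecondaryNameNode, DataNode |
| YARN | NodeManager        | ResourceManager, NodeManager | NodeManager                 |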
Modifying the Configuration Files
Directory where the configuration files reside: /opt/module/hadoop-2.7.2/etc/hadoop/
- core-site.xml

```xml
<configuration>
    <!-- Specify the HDFS NameNode address -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop001:9000</value>
    </property>
    <!-- Specify the storage directory for files generated at Hadoop runtime -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop-2.7.2/data/tmp</value>
    </property>
</configuration>
```
- hadoop-env.sh

```
export JAVA_HOME=/opt/module/jdk1.8.0_131
```
- hdfs-site.xml

```xml
<configuration>
    <!-- Number of HDFS replicas -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <!-- SecondaryNameNode HTTP address -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop003:50090</value>
    </property>
</configuration>
```
- slaves (one host name per line; do not add extra spaces or blank lines)

```
hadoop001
hadoop002
hadoop003
```
- yarn-env.sh

```
export JAVA_HOME=/opt/module/jdk1.8.0_131
```
- yarn-site.xml

```xml
<configuration>
    <!-- Site specific YARN configuration properties -->
    <!-- How the Reducer obtains data -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- Specify the address of the YARN ResourceManager -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop002</value>
    </property>
</configuration>
```
- mapred-env.sh

```
export JAVA_HOME=/opt/module/jdk1.8.0_131
```
- mapred-site.xml

```xml
<configuration>
    <!-- Run MapReduce on YARN -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
```
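Note: Hadoop 2.7.2 ships only mapred-site.xml.template by default. If mapred-site.xml does not exist yet, it can be created from the template first (a minimal sketch):

```
cd /opt/module/hadoop-2.7.2/etc/hadoop
cp mapred-site.xml.template mapred-site.xml
```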
Copy the configuration files to the other VMs in the cluster

```
[root@hadoop001 etc]# scp -r hadoop/ root@hadoop002:/opt/module/hadoop-2.7.2/etc/
[root@hadoop001 etc]# scp -r hadoop/ root@hadoop003:/opt/module/hadoop-2.7.2/etc/
```
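Optionally, a quick check that the files actually arrived on a worker node (a sketch; any of the copied files would do):

```
ssh hadoop002 "grep -A 1 fs.defaultFS /opt/module/hadoop-2.7.2/etc/hadoop/core-site.xml"
```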
Starting the cluster
- If the cluster is being started for the first time, format the NameNode.

```
[root@hadoop001 hadoop-2.7.2]# bin/hdfs namenode -format
```
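If the NameNode ever needs to be reformatted later, stop all daemons and delete the data and log directories on every node first, otherwise the new cluster ID will not match the existing DataNodes (a sketch, assuming the directories used in this setup):

```
rm -rf /opt/module/hadoop-2.7.2/data /opt/module/hadoop-2.7.2/logs
```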
- Start HDFS.

```
[root@hadoop001 hadoop-2.7.2]# sbin/start-dfs.sh
```
- Start YARN. Note: YARN must be started on the machine where the ResourceManager runs (hadoop002 in this example).

```
[root@hadoop002 hadoop-2.7.2]# sbin/start-yarn.sh
```
- Check whether the startup succeeded by visiting the web UIs in a browser:

NameNode UI: http://192.168.48.111:50070
ResourceManager UI: http://192.168.48.112:8088/cluster
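Alternatively, running jps on each node should show daemons consistent with the deployment plan above (a sketch of the expected process lists; PIDs omitted):

```
[root@hadoop001 ~]# jps   # NameNode, DataNode, NodeManager
[root@hadoop002 ~]# jps   # ResourceManager, DataNode, NodeManager
[root@hadoop003 ~]# jps   # SecondaryNameNode, DataNode, NodeManager
```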
Hadoop start and stop methods
- Start or stop each service component individually:

(1) HDFS components

```
hadoop-daemon.sh start|stop namenode|datanode|secondarynamenode
```

(2) YARN components

```
yarn-daemon.sh start|stop resourcemanager|nodemanager
```
- Start or stop each module separately (passwordless SSH is a prerequisite):

(1) HDFS

```
start-dfs.sh
stop-dfs.sh
```

(2) YARN

```
start-yarn.sh
stop-yarn.sh
```
- Start or stop everything at once (not recommended):

```
start-all.sh
stop-all.sh
```
Operating the cluster
- Create an input folder on the HDFS file system.

```
[root@hadoop001 hadoop-2.7.2]# bin/hdfs dfs -mkdir -p /user/sixj/graphs/wordcount/input
```
- Upload the test content to the file system.

```
[root@hadoop001 hadoop-2.7.2]# bin/hdfs dfs -put wcinput/wc.input /user/sixj/graphs/wordcount/input
```
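This step assumes a local file wcinput/wc.input already exists (for example, the one created in the stand-alone article). If it does not, a small test file can be created first (a sketch; the exact contents are arbitrary):

```
mkdir -p wcinput
echo "hadoop yarn hadoop mapreduce sixj JAVA sixj" > wcinput/wc.input
```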
- View the file list.

```
[root@hadoop001 hadoop-2.7.2]# bin/hdfs dfs -ls /user/sixj/graphs/wordcount/input
```
- View the file contents.

```
[root@hadoop001 hadoop-2.7.2]# bin/hdfs dfs -cat /user/sixj/graphs/wordcount/input/wc.input
hadoop yarn hadoop mapreduce sixj JAVA sixj
```
- Download the file to the local machine.

```
[root@hadoop001 hadoop-2.7.2]# hadoop fs -get /user/sixj/graphs/wordcount/input/wc.input
```
- Delete the file.

```
[root@hadoop001 hadoop-2.7.2]# hadoop fs -rm /user/sixj/graphs/wordcount/input/wc.input
```
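To exercise YARN as well, the bundled wordcount example can be run against the uploaded input (a sketch, run before deleting the input file; the output path is an arbitrary choice):

```
[root@hadoop001 hadoop-2.7.2]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /user/sixj/graphs/wordcount/input /user/sixj/graphs/wordcount/output
[root@hadoop001 hadoop-2.7.2]# bin/hdfs dfs -cat /user/sixj/graphs/wordcount/output/part-r-00000
```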