This is the 11th day of my participation in the Gwen Challenge.

Previously: a hands-on case with Hadoop's pseudo-distributed mode. Today I will walk you through Hadoop's fully distributed operation mode with a hands-on example. Fully distributed mode (cluster mode) is the focus of real development work and an essential skill on the road to CTO; it is the mode actually used on the job. I will describe the setup process in detail in this article. If you find it helpful, please like, share, and bookmark.

Pseudo-distributed mode runs Hadoop as a "single-node cluster", with all daemons running on the same machine. Compared with stand-alone mode, it adds debugging capabilities, letting you check memory usage, HDFS input and output, and other daemon interactions. Fully distributed mode is what production environments commonly use: N hosts form a Hadoop cluster, with Hadoop daemons running on every host; some hosts run the NameNode, some run DataNodes, and some run TaskTrackers. In a distributed environment, the master and slave nodes are separated. The analysis of the fully distributed setup process is shown in the following figure.

In the last article, we covered the pseudo-distributed setup process. In this article, I will explain how to build the fully distributed operation mode. Based on hands-on practice, I will walk through the whole fully distributed setup in detail, which can be roughly divided into the following 8 steps:

On hadoop102, hadoop103, and hadoop104, change the ownership of all files under /opt/module: sudo chown xuefa:xuefa -R /opt/module

Note: the NameNode and the SecondaryNameNode consume memory at roughly a 1:1 ratio, so they should be placed on different servers rather than on the same one.

(1) Core configuration file: vim core-site.xml
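A typical core-site.xml for this cluster might look like the following. The NameNode address (hadoop102:9000) and the temporary-data directory are assumptions based on this article's host layout and installation path; adjust them to your own environment.

```xml
<configuration>
    <!-- HDFS entry point: the NameNode address; hadoop102 is assumed here -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop102:9000</value>
    </property>
    <!-- Base directory for Hadoop runtime files; this path is an assumption -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/opt/module/hadoop-2.7.2/data/tmp</value>
    </property>
</configuration>
```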

Configure hadoop-env.sh: vim hadoop-env.sh

export JAVA_HOME=/opt/module/jdk1.8.0_144

Configure hdfs-site.xml: vim hdfs-site.xml
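A sketch of hdfs-site.xml for this setup: a replication factor of 3 matches the three-node cluster, and the SecondaryNameNode address matches the hadoop104:50090 URL used later in this article. Treat both values as assumptions to adapt to your own cluster.

```xml
<configuration>
    <!-- Number of block replicas; 3 suits a three-node cluster -->
    <property>
        <name>dfs.replication</name>
        <value>3</value>
    </property>
    <!-- SecondaryNameNode web address (matches http://hadoop104:50090 below) -->
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>hadoop104:50090</value>
    </property>
</configuration>
```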

Configure yarn-env.sh: vim yarn-env.sh

export JAVA_HOME=/opt/module/jdk1.8.0_144

Configure yarn-site.xml: vim yarn-site.xml
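A sketch of yarn-site.xml: the shuffle service is required for MapReduce on YARN, and placing the ResourceManager on hadoop103 is an assumption consistent with the note later in this article that the ResourceManager should not share a machine with the NameNode.

```xml
<configuration>
    <!-- Auxiliary shuffle service required by MapReduce on YARN -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <!-- ResourceManager host; hadoop103 is an assumed choice -->
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>hadoop103</value>
    </property>
</configuration>
```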

Configure mapred-env.sh: vim mapred-env.sh

export JAVA_HOME=/opt/module/jdk1.8.0_144

cp mapred-site.xml.template mapred-site.xml
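After copying the template as above, a minimal mapred-site.xml only needs to tell MapReduce to run on YARN instead of the default local runner:

```xml
<configuration>
    <!-- Run MapReduce jobs on YARN rather than locally -->
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
```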

(1) If the cluster is being started for the first time, format the NameNode: bin/hdfs namenode -format

(3) Start a DataNode on hadoop102, hadoop103, and hadoop104 respectively: sbin/hadoop-daemon.sh start datanode

http://192.168.220.132:50070/explorer.html#/

(4) Think about it: here we start one node at a time. What if the cluster grows to 1,000 nodes?

You would arrive in the morning and start nodes one by one, and by quitting time you would just be finishing. Time to go home?

To solve problem (4) above, we need to configure SSH password-free login.

(2) Generate public and private keys: ssh-keygen -t rsa
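The key-generation step can be sketched as follows. The scratch directory is for illustration only; in practice you run ssh-keygen -t rsa in your home directory and press Enter at each prompt, so the keys land in ~/.ssh, and ssh-copy-id pushes the public key to each target host (the hadoop102/103/104 hostnames come from this article's cluster).

```shell
# Illustration only: generate a password-less RSA key pair in a scratch
# directory instead of ~/.ssh, so nothing on this machine is touched.
DIR=$(mktemp -d)
ssh-keygen -t rsa -N "" -f "$DIR/id_rsa" -q    # empty passphrase, quiet

# Password-free login works by appending the public key to the target
# host's ~/.ssh/authorized_keys; `ssh-copy-id hadoop103` automates this.
cat "$DIR/id_rsa.pub" >> "$DIR/authorized_keys"
ls "$DIR"
```

Run ssh-copy-id once per target host (including the local host itself), and repeat the whole procedure on each machine that needs to log in to the others.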

5.3. Function description of the files in the .ssh folder (~/.ssh)

/opt/module/hadoop-2.7.2/etc/hadoop/slaves

Note: entries added to the file must not end with trailing spaces, and the file must not contain empty lines.
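With the three hosts used throughout this article, the slaves file would contain exactly these lines (no trailing spaces, no blank lines):

```
hadoop102
hadoop103
hadoop104
```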

(1) If the cluster is being started for the first time, you need to format the NameNode (before formatting, be sure to stop all NameNode and DataNode processes started last time, and then delete the data and log directories).
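The cleanup before a re-format can be sketched like this, assuming the installation path used earlier in this article. The stop and format commands are left commented out because they require a live Hadoop installation; the order of the three steps is what matters.

```shell
# Assumes the installation path used in this article; adjust as needed.
HADOOP_HOME=/opt/module/hadoop-2.7.2

# sbin/stop-dfs.sh            # 1. stop all NameNode/DataNode processes first
rm -rf "$HADOOP_HOME/data" "$HADOOP_HOME/logs"   # 2. delete old data and logs
# bin/hdfs namenode -format   # 3. only now is it safe to re-format
```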

hadoop103 and hadoop104 are stopped and started in the same way.

(2) Start HDFS: sbin/start-dfs.sh

(3) Start YARN: sbin/start-yarn.sh

Note: if the NameNode and the ResourceManager are not on the same machine, YARN cannot be started from the NameNode host. Start YARN on the machine where the ResourceManager runs.

(4) View SecondaryNameNode on the Web

(a) In the browser, open: http://hadoop104:50090/status.html

(b) View SecondaryNameNode information, as shown in the figure.

That wraps up Hadoop fully distributed mode. The next section will supplement it with cluster time synchronization, so stay tuned. I will keep updating big data and related content. For more, follow the public account: Xiao Han senior takes you to learn.