Environment: Hadoop 2.10.1, JDK 8, Hive 2.3.9, CentOS 7. The Hadoop cluster contains three nodes, each running in its own virtual machine: one master (hostname service01) and two slaves (hostnames service03 and service04). The IP addresses of the master and the two slaves are 192.168.248.158, 192.168.248.157, and 192.168.248.159 respectively. All operations in this article are performed as the root user (this is not strictly necessary). Note: this article combines the steps from several online articles with my own hands-on setup; the reference articles are listed at the end.
1. Environment configuration
(1) Set the hostname (do this on all three machines; alternatively, configure one machine and clone it twice). If the current hostname is not what you want, change it. Run:
hostnamectl set-hostname service01
(2) Configure the hosts file (mandatory). The /etc/hosts file records the hostname-to-IP mapping for each host on the LAN. When connecting to another host by name, the system first looks up the corresponding IP address in this file. If pinging a host by its hostname fails, the fix is to add that host's IP address and hostname to /etc/hosts on every machine in the cluster. Run:
vim /etc/hosts
As shown below:
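The entries, assuming the IPs and hostnames from the environment description above, would be roughly:

192.168.248.158 service01
192.168.248.157 service03
192.168.248.159 service04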
You can then ping each host by its hostname to check; if the ping succeeds, the configuration is correct.
2. Passwordless SSH configuration
While Hadoop is running, it needs to manage its remote daemons. After the cluster starts, the NameNode uses Secure Shell (SSH) to start and stop the daemons on the DataNodes. We therefore configure SSH with passwordless public key authentication, so the NameNode can log in to the DataNodes and start the DataNode processes without a password, and the DataNodes can likewise log in to the NameNode without a password. Note: if SSH is not installed on your Linux system, install it first.
2.1 SSH Basic Principles and Usage
(1) Basic principle of SSH. SSH is secure because it uses public key encryption. The process is as follows: 1) The remote host receives the login request from the user and sends its public key to the user. 2) The user encrypts the login password with that public key and sends it back. 3) The remote host decrypts the login password with its own private key and, if the password is correct, allows the user to log in. If the remote login user name is java and the remote host name is linux, run: ssh java@linux. The default SSH port is 22, that is, your login request is sent to port 22 of the remote host. You can change the port with -p, for example to 88: ssh -p 88 java@linux
2.2 Configuring the Master to Log in to All Slaves Without a Password
A. On service01, generate a passwordless key pair with ssh-keygen: run ssh-keygen -t rsa -P "" to generate the key pair id_rsa (private key) and id_rsa.pub (public key). Since I am the root user, the key pair is stored in /root/.ssh/. B. On service01, append id_rsa.pub to the authorized keys: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys. Check the permissions of authorized_keys; if they are incorrect, set them with chmod 600 ~/.ssh/authorized_keys. C. In the SSH configuration file /etc/ssh/sshd_config, enable the following settings: RSAAuthentication yes (enable RSA authentication), PubkeyAuthentication yes (enable public/private key authentication), and AuthorizedKeysFile (the path of the public key file). The whole sequence is sketched below.
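A sketch of the commands on service01, assuming the root user and default paths (adjust the sshd_config lines to match your system):

ssh-keygen -t rsa -P ""                        # generates id_rsa and id_rsa.pub under /root/.ssh
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
# In /etc/ssh/sshd_config, make sure these lines are present and uncommented:
#   RSAAuthentication yes
#   PubkeyAuthentication yes
#   AuthorizedKeysFile .ssh/authorized_keys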
Remember to restart the SSH service for the settings to take effect:
systemctl restart sshd
Then verify that passwordless login works locally:
ssh root@localhost
The next step is to copy the public key to all the slave machines.
D. Run the ssh-copy-id command to transfer the public key to the remote host.
ssh-copy-id root@service03
Test whether passwordless login to the remote machine works:
ssh root@service03
Repeat the ssh-copy-id and login test for service04 as well.
Next, configure all slaves to log in to Master without a password. The procedure is the same as that for Master to log in to all slaves without a password.
3. JDK environment installation
3.1 Install the JDK
Log in to service01 as root, create a java folder under /usr/local, copy jdk-8u301-linux-x64.tar.gz into /usr/local/java, and decompress it there. After extraction the JDK lives in /usr/local/java/jdk1.8.0_301. A command sketch follows.
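A minimal sketch of the steps, assuming the archive has already been downloaded to the current directory (the file name and target directory follow the text above):

mkdir -p /usr/local/java
cp jdk-8u301-linux-x64.tar.gz /usr/local/java/
cd /usr/local/java
tar -zxvf jdk-8u301-linux-x64.tar.gz    # produces /usr/local/java/jdk1.8.0_301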
3.2 Configuring Environment Variables
Edit the “/etc/profile” file and add the following lines at the end:
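The exact lines were shown as a screenshot in the original; a typical set, assuming the JDK path from section 3.1, is:

export JAVA_HOME=/usr/local/java/jdk1.8.0_301
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
export PATH=$PATH:$JAVA_HOME/bin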
Then run source /etc/profile to make the configuration take effect immediately.
Check whether the configuration is successful by running java -version; if the Java version information is displayed, the configuration succeeded.
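The check and output look roughly like the following (the exact build numbers depend on the JDK package and are only illustrative here):

java -version
# java version "1.8.0_301"
# Java(TM) SE Runtime Environment (build 1.8.0_301-b09)
# Java HotSpot(TM) 64-Bit Server VM (build 25.301-b09, mixed mode)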
4. Hadoop cluster installation
4.1 Install Hadoop
Download the Hadoop installation package hadoop-2.10.1.tar.gz and place it in /usr (you can use a different directory), then decompress it with tar -zxvf hadoop-2.10.1.tar.gz and rename the resulting directory with mv hadoop-2.10.1 hadoop. Create a tmp directory under /usr/hadoop and add the Hadoop installation path to /etc/profile, as sketched below.
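A sketch of the commands; the exact /etc/profile lines were shown as a screenshot in the original, so the exports below are a typical assumption:

cd /usr
tar -zxvf hadoop-2.10.1.tar.gz
mv hadoop-2.10.1 hadoop
mkdir /usr/hadoop/tmp
# Append the following to /etc/profile, then run: source /etc/profile
#   export HADOOP_HOME=/usr/hadoop
#   export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin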
4.2 Configure Hadoop
After the master node is configured, copy the entire Hadoop directory to all slave nodes and start the cluster from the master node. The configuration files live in /usr/hadoop/etc/hadoop/. (1) Edit hadoop-env.sh and set JAVA_HOME to the actual JDK path, as shown below.
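The line to change in hadoop-env.sh, assuming the JDK path from section 3.1:

export JAVA_HOME=/usr/local/java/jdk1.8.0_301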
(2) Configure core-site.xml
<configuration>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/hadoop/tmp</value>
<description>A base for other temporary directories</description>
</property>
<property>
<name>fs.default.name</name>
<value>hdfs://service01:9000</value>
</property>
<property>
<name>hadoop.proxyuser.root.groups</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.root.hosts</name>
<value>*</value>
</property>
</configuration>
(3) Configure hdfs-site.xml
<configuration>
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
<property>
<name>dfs.permissions</name>
<value>false</value>
</property>
</configuration>
(4) Rename the template with mv mapred-site.xml.template mapred-site.xml, and then configure mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
(5) Configure yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>192.168.248.158</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>604800</value>
</property>
</configuration>
(6) Configure the slaves file (/usr/hadoop/etc/hadoop/slaves) with the hostnames of the slave (DataNode) nodes, as shown below.
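Assuming the two slave hostnames from the environment description, the slaves file simply lists the DataNode hosts, one per line:

service03
service04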
(7) Format the HDFS
cd /usr/hadoop/bin
./hdfs namenode -format
Start the HDFS cluster before starting the YARN cluster: run the HDFS start script on the node where the NameNode resides and the YARN start script on the node where the ResourceManager resides (both are service01 in this setup).
cd /usr/hadoop/sbin
./start-dfs.sh
./start-yarn.sh
./mr-jobhistory-daemon.sh start historyserver
After that, the jps tool that ships with the JDK (mentioned earlier) comes in handy. Run the jps command on each node to check which daemons are running.
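Roughly what to expect with this configuration (process IDs are omitted and will differ; the SecondaryNameNode normally starts on the node where start-dfs.sh is run):

jps          # on service01 (master)
# NameNode
# SecondaryNameNode
# ResourceManager
# JobHistoryServer
# Jps

jps          # on service03 / service04 (slaves)
# DataNode
# NodeManager
# Jps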
At this point, our Hadoop configuration is complete. Hadoop provides a web UI for intuitively viewing the HDFS file system at the following URL:
http://localhost:50070/dfshealth.html#tab-overview
This is what it looks like when you open it
YARN also provides a web UI for viewing the cluster: http://localhost:8088/cluster
Main reference articles: blog.csdn.net/u012421093/… juejin.cn/post/684490…