1. Prerequisites
Hadoop depends on the JDK, so the JDK must be installed in advance. For installation steps, see:
- JDK installation under Linux
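If you are not sure whether the JDK is already in place, a quick check (assuming java is on the PATH) is:
java -version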
2. Configure Passwordless Login
Hadoop components communicate with each other over SSH, so passwordless login must be configured first.
2.1 Mapping Configuration
Configure mapping between IP addresses and host names:
vim /etc/hosts
# Add the following at the end of the file
192.168.43.202 hadoop001
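To confirm the mapping took effect, you can ping the hostname; it should resolve to the address configured above:
ping -c 1 hadoop001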
2.2 Generating Public and Private Keys
Run the following command to generate the public/private key pair (press Enter at the prompts to accept the defaults):
ssh-keygen -t rsa
2.3 Authorization
Go to the ~/.ssh directory, check the generated public and private keys, and append the public key to the authorization file:
[root@hadoop001 .ssh]# ll
-rw-------. 1 root root 1675 Mar 15 09:48 id_rsa
-rw-r--r--. 1 root root  388 Mar 15 09:48 id_rsa.pub
# Append the public key to the authorization file
[root@hadoop001 .ssh]# cat id_rsa.pub >> authorized_keys
[root@hadoop001 .ssh]# chmod 600 authorized_keys
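To verify that passwordless login works, SSH to the configured hostname; after the first-connection fingerprint prompt, it should log in without asking for a password:
[root@hadoop001 .ssh]# ssh hadoop001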
3. Hadoop (HDFS) Environment Setup
3.1 Downloading and Decompressing
Download the Hadoop installation package. The CDH version is used here; the download address is: archive.cloudera.com/cdh5/cdh/5/
# Unpack the archive
tar -zvxf hadoop-2.6.0-cdh5.15.2.tar.gz
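The environment variable configured below assumes the installation lives under /usr/app, so move the unpacked directory there first (the path comes from this guide's configuration; adjust both together if you prefer another location):
mkdir -p /usr/app
mv hadoop-2.6.0-cdh5.15.2 /usr/app/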
3.2 Configuring Environment Variables
# vi /etc/profile
Configure environment variables:
export HADOOP_HOME=/usr/app/hadoop-2.6.0-cdh5.15.2
export PATH=${HADOOP_HOME}/bin:$PATH
Run the source command to make the configured environment variables take effect immediately:
# source /etc/profile
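To confirm the variables took effect, print the Hadoop version; if the PATH is set correctly, the command resolves without a full path:
# hadoop version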
3.3 Modifying Hadoop Configurations
Go to ${HADOOP_HOME}/etc/hadoop/ and modify the following configuration:
1. hadoop-env.sh
# JDK installation path
export JAVA_HOME=/usr/java/jdk1.8.0_201/
2. core-site.xml
<configuration>
<property>
<!-- Set the HDFS address of the NameNode -->
<name>fs.defaultFS</name>
<value>hdfs://hadoop001:8020</value>
</property>
<property>
<!-- Specify the directory where Hadoop stores temporary files -->
<name>hadoop.tmp.dir</name>
<value>/home/hadoop/tmp</value>
</property>
</configuration>
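hadoop.tmp.dir does not have to exist beforehand (Hadoop creates it on demand), but creating it up front makes permission problems visible early; the path matches the value configured above:
mkdir -p /home/hadoop/tmp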
3. hdfs-site.xml
Specify the replication factor:
<configuration>
<property>
<!-- Since this is a single-node setup, set the dfs replication factor to 1 -->
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
4. slaves
Configure the host names or IP addresses of all slave nodes. Since this is a single-node setup, specify the local machine:
hadoop001
3.4 Disabling the Firewall
If the firewall is not disabled, you may be unable to access Hadoop's web UIs:
# Check the firewall status
sudo firewall-cmd --state
# Disable the firewall
sudo systemctl stop firewalld.service
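Stopping the service only lasts until the next reboot; to keep the firewall from starting again on boot, you can also disable it:
sudo systemctl disable firewalld.service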
3.5 Initialization
To initialize Hadoop for the first time, go to ${HADOOP_HOME}/bin/ and run the following command:
[root@hadoop001 bin]# ./hdfs namenode -format
3.6 Starting HDFS
Go to ${HADOOP_HOME}/sbin/ and start HDFS:
[root@hadoop001 sbin]# ./start-dfs.sh
3.7 Verifying the Startup
Method 1: Run the jps command to check whether the NameNode and DataNode processes have started:
[root@hadoop001 hadoop-2.6.0-cdh5.15.2]# jps
9137 DataNode
9026 NameNode
9390 SecondaryNameNode
Method 2: Check the web UI in a browser. The default port is 50070.
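As an optional extra check, you can exercise HDFS with the standard shell commands, for example creating a directory and listing the root (/test is just an arbitrary example path):
hdfs dfs -mkdir /test
hdfs dfs -ls /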
4. Hadoop (YARN) Environment Setup
4.1 Modifying the Configuration
Go to ${HADOOP_HOME}/etc/hadoop/ and modify the following configuration:
1. mapred-site.xml
# If mapred-site.xml does not exist, copy the template and modify it
cp mapred-site.xml.template mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
2. yarn-site.xml
<configuration>
<property>
<!-- Configure the auxiliary service that runs on the NodeManager. mapreduce_shuffle must be configured to run MapReduce programs on YARN. -->
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
4.2 Starting the Service
Go to ${HADOOP_HOME}/sbin/ and start YARN:
./start-yarn.sh
4.3 Verifying the Startup
Method 1: Run the jps command to check whether the NodeManager and ResourceManager processes have started:
[root@hadoop001 hadoop-2.6.0-cdh5.15.2]# jps
9137 DataNode
9026 NameNode
12294 NodeManager
12185 ResourceManager
9390 SecondaryNameNode
Method 2: Check the web UI in a browser. The default port is 8088.
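To confirm that YARN can actually run jobs, submit one of the bundled examples, such as the pi estimator; the exact jar name below assumes the CDH 5.15.2 build, so adjust it to match the file in your share/hadoop/mapreduce directory:
hadoop jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0-cdh5.15.2.jar pi 2 10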
For more articles in the big data series, see the GitHub open-source project: Getting Started with Big Data.