1. Prerequisites

Hadoop requires the JDK to run, so the JDK must be installed in advance. For installation steps, see:

  • JDK installation under Linux

2. Configure Passwordless SSH Login

Hadoop components communicate with one another over SSH, so passwordless SSH login must be configured first.

2.1 Configure the Host Name Mapping

Configure mapping between IP addresses and host names:

vim /etc/hosts
# Add at the end of the file
192.168.43.202  hadoop001
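
To confirm that the mapping takes effect, you can ping the host name (optional check):

ping -c 3 hadoop001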

2.2 Generating Public and Private Keys

Execute the following command to generate the public and private key pair:

ssh-keygen -t rsa

2.3 Authorization

Go to the ~/.ssh directory, check the generated public and private keys, and write the public key to the authorization file:

[root@hadoop001 .ssh]# ll
-rw-------. 1 root root 1675 Mar 15 09:48 id_rsa
-rw-r--r--. 1 root root  388 Mar 15 09:48 id_rsa.pub
# Write the public key to the authorization file
[root@hadoop001 .ssh]# cat id_rsa.pub >> authorized_keys
[root@hadoop001 .ssh]# chmod 600 authorized_keys
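
To confirm that passwordless login works, SSH to the local host name. The first connection may ask you to confirm the host fingerprint; after that, no password should be required:

ssh hadoop001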

3. Hadoop (HDFS) Environment Setup

3.1 Download and Decompress the file

Download the Hadoop installation package. The CDH version is used here; the download address is: archive.cloudera.com/cdh5/cdh/5/

# Decompress
tar -zvxf hadoop-2.6.0-cdh5.15.2.tar.gz

3.2 Configuring Environment Variables

# vi /etc/profile

Configure environment variables:

export HADOOP_HOME=/usr/app/hadoop-2.6.0-cdh5.15.2
export PATH=${HADOOP_HOME}/bin:$PATH

Run the source command to make the configured environment variables take effect immediately:

# source /etc/profile
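
To confirm that the environment variables are in effect, you can print the Hadoop version (the exact version string depends on the package you downloaded):

hadoop version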

3.3 Modifying Hadoop Configurations

Go to ${HADOOP_HOME}/etc/hadoop/ and modify the following configuration:

1. hadoop-env.sh

# JDK installation path
export JAVA_HOME=/usr/java/jdk1.8.0_201/
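
If you are not sure where the JDK is installed, one common way to find out (assuming java is already on the PATH) is to resolve the java binary and strip the trailing /bin/java from the output:

readlink -f $(which java)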

2. core-site.xml

<configuration>
    <property>
        <!-- Set the HDFS address of the NameNode -->
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop001:8020</value>
    </property>
    <property>
        <!-- Specify the directory where Hadoop stores temporary files -->
        <name>hadoop.tmp.dir</name>
        <value>/home/hadoop/tmp</value>
    </property>
</configuration>
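
The temporary directory configured above may not exist yet. As an optional precaution, you can create it in advance (the /home/hadoop/tmp path is simply the one used in this example configuration):

mkdir -p /home/hadoop/tmp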

3. hdfs-site.xml

Specify the replication factor; since this is a single-node setup, it is set to 1:

<configuration>
    <property>
        <!-- Since this is a single-node setup, the dfs replication factor is set to 1 -->
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

4. slaves

Configure the host name or IP address of all slave nodes. Since it is a single-node version, you can specify the local machine:

hadoop001

3.4 Disabling the Firewall

If the firewall is not disabled, you may fail to access the Hadoop Web UI:

# Check the firewall status
sudo firewall-cmd --state
# Disable the firewall
sudo systemctl stop firewalld.service
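
To keep the firewall from starting again after a reboot, you can also disable it permanently (optional; applies to systems that use firewalld, such as CentOS 7):

sudo systemctl disable firewalld.service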

3.5 Initialization

When starting Hadoop for the first time, you need to format the NameNode. Go to ${HADOOP_HOME}/bin/ and run the following command:

[root@hadoop001 bin]# ./hdfs namenode -format
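
If the format succeeds, the NameNode metadata directory is created under hadoop.tmp.dir (by default ${hadoop.tmp.dir}/dfs/name). A quick way to confirm, assuming the /home/hadoop/tmp path configured above:

ls /home/hadoop/tmp/dfs/name/current
# Should list files such as fsimage_* and VERSION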

3.6 Start HDFS

Go to ${HADOOP_HOME}/sbin/ and start HDFS:

[root@hadoop001 sbin]# ./start-dfs.sh

3.7 Verifying the startup

Method 1: Run the jps command to check whether the NameNode and DataNode services are started:

[root@hadoop001 hadoop-2.6.0-cdh5.15.2]# jps
9137 DataNode
9026 NameNode
9390 SecondaryNameNode

Method 2: Open the Web UI in a browser; the port number is 50070.
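
As an additional sanity check, you can read and write HDFS with the standard HDFS shell commands (optional; the /test path below is only an example):

# Create a directory and upload a local file to HDFS
hdfs dfs -mkdir -p /test
hdfs dfs -put /etc/hosts /test/
hdfs dfs -ls /test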

4. Hadoop (YARN) Environment Setup

4.1 Modifying the Configuration

Go to ${HADOOP_HOME}/etc/hadoop/ and modify the following configuration:

1. mapred-site.xml

# If mapred-site.xml does not exist, copy the sample file and then modify it
cp mapred-site.xml.template mapred-site.xml
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

2. yarn-site.xml

<configuration>
    <property>
        <!-- Configure the auxiliary service that runs on the NodeManager. mapreduce_shuffle must be configured so that MapReduce programs can run on YARN -->
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

4.2 Starting the Service

Go to ${HADOOP_HOME}/sbin/ and start YARN:

./start-yarn.sh

4.3 Verifying the startup

Method 1: Run the jps command to check whether the NodeManager and ResourceManager services are started:

[root@hadoop001 hadoop-2.6.0-cdh5.15.2]# jps
9137 DataNode
9026 NameNode
12294 NodeManager
12185 ResourceManager
9390 SecondaryNameNode

Method 2: Open the Web UI in a browser; the port number is 8088.
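
To verify that jobs can actually be scheduled on YARN, you can submit one of the example programs shipped with Hadoop (a minimal sketch; the exact jar path and name depend on your Hadoop distribution and version):

hadoop jar ${HADOOP_HOME}/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar pi 2 10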

For more articles in the big data series, see the GitHub open-source project: Getting Started with Big Data.