1. Preparatory work

1.1) Software preparation

  • Download Hadoop; this article uses 3.3.0. Download address: mirrors.hust.edu.cn/apache/hado…
  • Prepare the JDK; the version used here is 1.8.0_262. JDK installation is not covered here, since there are plenty of guides online.

1.2) Configure passwordless SSH login with a public/private key pair

First, run:

ssh localhost

If you are prompted for a password, passwordless login is not yet enabled; enter the password and then perform the steps below. If no password is requested, skip the rest of this section.

ssh-keygen -t rsa

Run the command above, accepting the prompts (press Enter, or answer y/yes), and then execute the following commands in sequence.

cd ~/.ssh
touch authorized_keys
chmod 600 authorized_keys

Append the public key to the authorized_keys file

cat id_rsa.pub >> authorized_keys

Run the following command again to see the effect

ssh localhost
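For reference, the whole passwordless setup can also be done non-interactively. This is only a sketch: it assumes the default key path ~/.ssh/id_rsa and appends to any existing authorized_keys file.

mkdir -p ~/.ssh && chmod 700 ~/.ssh
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa       # generate a key pair with an empty passphrase
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost                                  # should now log in without a password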

2. Set up Hadoop

2.1) Decompress the Hadoop package

It is best to extract the archive into a dedicated directory, which keeps the later steps tidy.

tar -zxvf hadoop-3.3.0.tar.gz

After extraction, the result sits alongside the archive. In the directory where the archive is stored, create the following directories for later use:

data_tmp
    |-data_1
    |-data_2
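For example, assuming your shell is currently in the directory where the archive is stored, all three directories can be created with a single command:

mkdir -p data_tmp/data_1 data_tmp/data_2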

The location of these directories is purely a matter of personal preference; put them wherever you like, just remember to adjust the example paths below to match.

2.2) Modify environment variables

vim /etc/profile

Add the following two lines at the end of the file:

export HADOOP_HOME=[your Hadoop extraction directory]
export PATH=.:$HADOOP_HOME/bin:$PATH
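For example, if the package was extracted under /tools/hadoop (the path used in the rest of this article), the two lines would be:

export HADOOP_HOME=/tools/hadoop/hadoop-3.3.0
export PATH=.:$HADOOP_HOME/bin:$PATH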

Save and exit, then make the configuration take effect:

source /etc/profile

Use the following command to see if it works

hadoop version

If version information is printed, the environment variables are configured correctly.
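Optionally, the variables themselves can be checked directly, for example:

echo $HADOOP_HOME      # should print the Hadoop directory you configured
which hadoop           # should resolve to $HADOOP_HOME/bin/hadoop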

2.3) Modify the core-site.xml file

File path: /tools/hadoop/hadoop-3.3.0/etc/hadoop. Modify the file content as follows:


      

      
<!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. -->

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
                <name>hadoop.tmp.dir</name>
                <!-- specify the directory -->
                <value>file:[directory where the archive is stored]/data_tmp</value>
                <description>Abase for other temporary directories.</description>
        </property>
        <property>
                <name>fs.defaultFS</name>
                <!-- Address and port of the primary node in the distributed cluster; this is a pseudo-distributed setup, so the primary node is simply this machine -->
                <value>hdfs://localhost:[port]</value>
        </property>
</configuration>
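For reference only, a fully filled-in version might look like the following. The /tools/hadoop base directory matches the file path used in this article, while port 9000 is merely a commonly used choice, not something the original setup mandates; substitute your own values.

<configuration>
        <property>
                <name>hadoop.tmp.dir</name>
                <value>file:/tools/hadoop/data_tmp</value>
                <description>Abase for other temporary directories.</description>
        </property>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://localhost:9000</value>
        </property>
</configuration>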

2.4) Modify the hdfs-site.xml file

File path: /tools/hadoop/hadoop-3.3.0/etc/hadoop. Modify the file content as follows:


      

      
<!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. -->

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
                <name>dfs.replication</name>
                <!-- In a real cluster this would be 3; a pseudo-distributed setup has only one machine, so it can only be 1 -->
                <value>1</value>
        </property>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>[directory where the archive is stored]/data_tmp/data_1</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>[directory where the archive is stored]/data_tmp/data_2</value>
        </property>
</configuration>
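Again purely for reference, with the same assumed base directory of /tools/hadoop the values could be filled in like this (adjust to your own layout):

<configuration>
        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>file:/tools/hadoop/data_tmp/data_1</value>
        </property>
        <property>
                <name>dfs.datanode.data.dir</name>
                <value>file:/tools/hadoop/data_tmp/data_2</value>
        </property>
</configuration>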

2.5) Modify the mapred-site.xml file

File path: /tools/hadoop/hadoop-3.3.0/etc/hadoop. Modify the file content as follows:


      

      
<!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. -->

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
</configuration>

2.6) Modify yarn-site.xml

File path: /tools/hadoop/hadoop-3.3.0/etc/hadoop. Modify the file content as follows:


      
<!-- Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at http://www.apache.org/licenses/LICENSE-2.0 Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License. See accompanying LICENSE file. -->
<configuration>

<!-- Site specific YARN configuration properties -->
        <!-- Specify the master (ResourceManager) node -->
        <property>
                <name>yarn.resourcemanager.hostname</name>
                <value>localhost</value>
        </property>
        <!-- Configure the NodeManager auxiliary service so that intermediate results produced by Map are passed to Reduce via shuffle -->
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
</configuration>
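Before moving on, it can save debugging time to confirm that the four edited files are still well-formed XML. If xmllint happens to be installed, one way to check (run from the etc/hadoop directory) is:

xmllint --noout core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml

Any parse error is reported; no output means the files are syntactically fine.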

3. Start Hadoop

Since this instance is only for personal experimentation and development/debugging, you can simply disable the firewall.

3.1) Initialization

Execute the following command:

hadoop namenode -format
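Note that hadoop namenode -format is the older form of the command; it still works in Hadoop 3.x but prints a deprecation warning, and the equivalent current form is:

hdfs namenode -format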

Note: If the Hadoop command does not exist, check whether the configured environment variables are valid.

3.2) Start the services

Go to the Hadoop extraction directory and run the startup scripts in sbin:

./sbin/start-dfs.sh
./sbin/start-yarn.sh

After the scripts finish, check with the jps command; if all the expected Hadoop processes are listed, the startup succeeded.
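For a pseudo-distributed setup with both HDFS and YARN started, jps normally lists something like the following (process IDs will differ):

jps
# typically expected:
#   NameNode
#   DataNode
#   SecondaryNameNode
#   ResourceManager
#   NodeManager
#   Jps

To shut everything down later, run the matching stop scripts from the same sbin directory: ./sbin/stop-yarn.sh and ./sbin/stop-dfs.sh.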

3.3) Browse in a browser

For Hadoop 3.0.0 and later, the default port of the NameNode web UI changed from 50070 to 9870. In the browser, enter [VM IP address]:9870.

If the page loads as shown in the figure above, the setup succeeded; the success indicator is shown in green. (PS: I really don't know why they like green so much.)
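If you are working on a machine without a browser, the same check can be done from the command line; a minimal sketch, assuming the web UI is reachable locally on the default port:

curl -sf http://localhost:9870/ > /dev/null && echo "NameNode web UI is up"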

4. Common problems

4.1) Abnormal startup

ERROR: Attempting to operate on hdfs namenode as root
ERROR: but there is no HDFS_NAMENODE_USER defined. Aborting operation.
Starting datanodes
ERROR: Attempting to operate on hdfs datanode as root
ERROR: but there is no HDFS_DATANODE_USER defined. Aborting operation.
Starting secondary namenodes [localhost.localdomain]
ERROR: Attempting to operate on hdfs secondarynamenode as root
ERROR: but there is no HDFS_SECONDARYNAMENODE_USER defined. Aborting operation.

If you encounter this problem, edit the following files in the sbin folder of the Hadoop installation directory. Add the following lines at the top of start-dfs.sh and stop-dfs.sh:

#!/usr/bin/env bash
HDFS_DATANODE_USER=root
HADOOP_SECURE_DN_USER=hdfs
HDFS_NAMENODE_USER=root
HDFS_SECONDARYNAMENODE_USER=root

Similarly, add the following at the top of start-yarn.sh and stop-yarn.sh:

#!/usr/bin/env bash
YARN_RESOURCEMANAGER_USER=root
HADOOP_SECURE_DN_USER=yarn
YARN_NODEMANAGER_USER=root

Save the changes and run the startup scripts again:

./sbin/start-dfs.sh
./sbin/start-yarn.sh

4.2) jps shows only one process after Hadoop is started

The solution to this problem comes from the Internet and is recorded here for reference.

After finishing the configuration I started Hadoop directly and checked with jps, only to find that jps itself was the only process listed. At first I did not pay attention to the hints printed during startup and kept looking for errors in the configuration; I even removed the configuration above and performed the operation below. (In fact the steps below are not necessary, but they are listed here so the record matches what was actually done.)

In the etc/hadoop directory under hadoop_home (the same directory as core-site.xml), add the following to mapred-site.xml:
<property>
    <name>mapreduce.admin.user.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_COMMON_HOME</value>
</property>
<property>
    <name>yarn.app.mapreduce.am.env</name>
    <value>HADOOP_MAPRED_HOME=$HADOOP_COMMON_HOME</value>
</property>

In the end the real cause was that passwordless SSH had not been configured. Once it is set up, you are no longer prompted for a password during startup and all the processes come up, so make sure to configure SSH passwordless login first.