1. Pseudo-distributed mode
Pseudo-distributed mode runs all Hadoop daemons as separate Java processes on a single node. Compared with local (standalone) mode, it requires additional setup: the Hadoop configuration files, passwordless SSH, and (optionally) YARN settings.
2. Hadoop configuration files
Modify three configuration files in the Hadoop installation directory:
etc/hadoop/core-site.xml
etc/hadoop/hdfs-site.xml
etc/hadoop/hadoop-env.sh
2.1 core-site.xml
First modify core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
</property>
</configuration>
- fs.defaultFS: the HDFS address; here it runs locally on port 9000
- hadoop.tmp.dir: the base temporary directory. If unset, it defaults to /tmp/hadoop-${user.name}, whose data is lost after a system restart, so change it to a persistent path
Then create the temporary directory:
mkdir -p /usr/local/hadoop/tmp
2.2 hdfs-site.xml
Then modify hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
dfs.replication sets the number of block replicas HDFS stores. Since pseudo-distributed mode has only one node, set it to 1.
2.3 hadoop-env.sh
Modify this file to add the JAVA_HOME environment variable. Even if JAVA_HOME is already set in ~/.bashrc, ~/.bash_profile, or /etc/profile, Hadoop may still fail to find it when its daemons are started over SSH. Therefore, set JAVA_HOME explicitly in hadoop-env.sh:
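A minimal sketch of the line to add; the JDK path below is only an example, so substitute the actual path of your installation:

```shell
# In HADOOP/etc/hadoop/hadoop-env.sh -- the path is illustrative, adjust to your JDK.
# One way to discover it on Linux: dirname "$(dirname "$(readlink -f "$(which java)")")"
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk
```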
3. Passwordless SSH to localhost
The next step is to set up the local password-free SSH connection. First, ensure that the SSHD service is enabled:
systemctl status sshd
Try connecting to localhost:
ssh localhost
Enter your own user password to access. However, a password-free connection is required. Therefore, configure the key authentication mode for the connection:
ssh-keygen -t ed25519 -a 100
cat ~/.ssh/id_ed25519.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
After the key pair is generated, append the public key to authorized_keys and restrict its permissions; note that only the owning user may have write permission. After that, ssh localhost connects without a password.
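The permission step matters because sshd refuses key-based login when authorized_keys is writable by anyone other than the owner. A minimal illustration of the expected mode, run on a scratch file rather than your real key material (stat -c is GNU coreutils):

```shell
# Create a scratch file, apply the same mode used for authorized_keys,
# and print the resulting octal permissions.
f=$(mktemp)
chmod 0600 "$f"
stat -c '%a' "$f"   # prints: 600
rm -f "$f"
```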
4. Run
4.1 Formatting HDFS
To run in single-node mode, first format HDFS:
# HADOOP indicates the HADOOP installation directory
HADOOP/bin/hdfs namenode -format
Formatting initializes a new HDFS filesystem: it creates the NameNode's metadata storage and writes the initial metadata files.
After formatting succeeds, a dfs directory is generated under the temporary directory configured above; tmp/dfs/name/current contains the following files:
- fsimage: a persistent checkpoint of the metadata the NameNode holds in memory
- fsimage*.md5: checksum file used to verify the integrity of the corresponding fsimage
- seen_txid: stores a transaction ID, 0 right after formatting; it records the tail transaction ID of the NameNode's edits_* files
- VERSION: stores the creation time, namespaceID, blockpoolID, storageType, cTime, clusterID, and layoutVersion
Notes about the fields in VERSION:
- namespaceID: the unique identifier of this HDFS filesystem, generated when HDFS is formatted for the first time
- blockpoolID: identifies a block pool; globally unique across clusters
- storageType: which process's data structures are stored here
- cTime: creation time
- clusterID: a cluster ID generated by the system, or specified with -clusterid
- layoutVersion: the version of HDFS's persistent data structures
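As an illustration, here is roughly what a VERSION file looks like and how to read a single field from it. Every field value below is fabricated for the example, not taken from a real cluster:

```shell
# Write a sample VERSION file (all values are made up for illustration)
cat > /tmp/VERSION.sample <<'EOF'
namespaceID=1242395935
clusterID=CID-8bf63244-0510-4db6-a949-8f74b50f2be9
cTime=1604887200000
storageType=NAME_NODE
blockpoolID=BP-1098237584-127.0.0.1-1604887200000
layoutVersion=-66
EOF
# Extract one field, e.g. the namespaceID
grep '^namespaceID=' /tmp/VERSION.sample | cut -d= -f2   # prints: 1242395935
```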
4.2 Start the NameNode
HADOOP/sbin/start-dfs.sh
After it starts, you can visit the NameNode web UI at localhost:9870.
4.3 Test
Generate an input directory and use a configuration file as input:
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/USER_NAME # USER_NAME is your user name
bin/hdfs dfs -mkdir input
bin/hdfs dfs -put etc/hadoop/*.xml input
Testing:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar grep input output 'dfs[a-z.]+'
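The example job scans the input XML files for strings matching the regular expression dfs[a-z.]+ and counts each match. The matching itself can be previewed locally with plain grep; the input line below is just an illustrative fragment of hdfs-site.xml:

```shell
# -E enables extended regex, -o prints each match on its own line;
# the match stops at '<' because '<' is outside the [a-z.] class.
echo '<name>dfs.replication</name>' | grep -oE 'dfs[a-z.]+'   # prints: dfs.replication
```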
Get the output:
bin/hdfs dfs -get output output   # copy the output from HDFS to the local output directory
cat output/*
Stop:
sbin/stop-dfs.sh
5. Using YARN
Besides running jobs as a single local process, pseudo-distributed mode can also run MapReduce on YARN for unified scheduling. Only two configuration files need to be modified.
5.1 Configuration File
Modify the following files:
HADOOP/etc/hadoop/mapred-site.xml
HADOOP/etc/hadoop/yarn-site.xml
5.1.1 mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
</configuration>
- mapreduce.framework.name: specifies that MapReduce jobs run on YARN
- mapreduce.application.classpath: specifies the classpath for MapReduce applications
5.1.2 yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
- yarn.nodemanager.aux-services: auxiliary services run by the NodeManager; mapreduce_shuffle enables the MapReduce shuffle service
- yarn.nodemanager.env-whitelist: environment variables that containers may inherit from the NodeManagers
5.2 Run
sbin/start-yarn.sh
Once it is running, you can access the ResourceManager web UI at localhost:8088.
Stop:
sbin/stop-yarn.sh