1. Pseudo-distributed mode
Pseudo-distributed mode runs all Hadoop daemons as separate Java processes on a single node. Compared with local (standalone) mode, it requires additional setup: the Hadoop configuration files, passwordless SSH, and (optionally) YARN settings.
2. Hadoop configuration files
Modify three configuration files in the Hadoop installation directory:
etc/hadoop/core-site.xml
etc/hadoop/hdfs-site.xml
etc/hadoop/hadoop-env.sh
2.1 core-site.xml
First modify core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/hadoop/tmp</value>
</property>
</configuration>
- fs.defaultFS: the HDFS address; here it runs locally on port 9000
- hadoop.tmp.dir: the base temporary directory. If unset, it defaults to /tmp/hadoop-${user.name}, whose data is lost after a system restart, so change it to a persistent path
Then create the temporary directory:
mkdir -p /usr/local/hadoop/tmp
2.2 hdfs-site.xml
Then modify hdfs-site.xml:
<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
dfs.replication sets the number of block replicas HDFS stores. Since pseudo-distributed mode has only one node, set it to 1.
2.3 hadoop-env.sh
Modify this file to add the JAVA_HOME environment variable. Even if JAVA_HOME is already set in ~/.bashrc, ~/.bash_profile, or /etc/profile, Hadoop may still fail to find it when its daemons are started over SSH. Therefore, set JAVA_HOME explicitly in hadoop-env.sh:
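A minimal sketch of the line to add; the JDK path below is only an example, so substitute the actual path of your installation:

```shell
# In HADOOP/etc/hadoop/hadoop-env.sh -- the path is illustrative, adjust to your JDK.
# One way to discover it on Linux: dirname "$(dirname "$(readlink -f "$(which java)")")"
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk
```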
3. Passwordless SSH to localhost
The next step is to set up the local password-free SSH connection. First, ensure that the SSHD service is enabled:
systemctl status sshd
Try connecting to localhost:
ssh localhost
Enter your own user password to access. However, a password-free connection is required. Therefore, configure the key authentication mode for the connection:
ssh-keygen -t ed25519 -a 100
cat ~/.ssh/id_ed25519.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys
After the key pair is generated, append the public key to authorized_keys and restrict its permissions; note that only the owning user may have write permission. After that, ssh localhost connects without a password.
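The permission step matters because sshd refuses key-based login when authorized_keys is writable by anyone other than the owner. A minimal illustration of the expected mode, run on a scratch file rather than your real key material (stat -c is GNU coreutils):

```shell
# Create a scratch file, apply the same mode used for authorized_keys,
# and print the resulting octal permissions.
f=$(mktemp)
chmod 0600 "$f"
stat -c '%a' "$f"   # prints: 600
rm -f "$f"
```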
4. Run
4.1 Formatting HDFS
To run in single-node mode, first format HDFS:
# HADOOP indicates the HADOOP installation directory
HADOOP/bin/hdfs namenode -format
Formatting initializes a new HDFS filesystem: it creates the NameNode's metadata storage and writes the initial metadata files.
After formatting succeeds, a dfs directory is generated under the temporary directory configured above; tmp/dfs/name/current contains the following files:
- fsimage: a persistent checkpoint of the metadata the NameNode holds in memory
- fsimage*.md5: checksum file used to verify the integrity of the corresponding fsimage
- seen_txid: stores a transaction ID, 0 right after formatting; it records the tail transaction ID of the NameNode's edits_* files
- VERSION: stores the creation time, namespaceID, blockpoolID, storageType, cTime, clusterID, and layoutVersion
Notes about the fields in VERSION:
- namespaceID: the unique identifier of this HDFS filesystem, generated when HDFS is formatted for the first time
- blockpoolID: identifies a block pool; globally unique across clusters
- storageType: which process's data structures are stored here
- cTime: creation time
- clusterID: a cluster ID generated by the system, or specified with -clusterid
- layoutVersion: the version of HDFS's persistent data structures
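As an illustration, here is roughly what a VERSION file looks like and how to read a single field from it. Every field value below is fabricated for the example, not taken from a real cluster:

```shell
# Write a sample VERSION file (all values are made up for illustration)
cat > /tmp/VERSION.sample <<'EOF'
namespaceID=1242395935
clusterID=CID-8bf63244-0510-4db6-a949-8f74b50f2be9
cTime=1604887200000
storageType=NAME_NODE
blockpoolID=BP-1098237584-127.0.0.1-1604887200000
layoutVersion=-66
EOF
# Extract one field, e.g. the namespaceID
grep '^namespaceID=' /tmp/VERSION.sample | cut -d= -f2   # prints: 1242395935
```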
4.2 Start the NameNode
HADOOP/sbin/start-dfs.sh
After it starts, you can visit the NameNode web UI at localhost:9870.
4.3 Test
Generate an input directory and use a configuration file as input:
bin/hdfs dfs -mkdir /user
bin/hdfs dfs -mkdir /user/USER_NAME # USER_NAME is your user name
bin/hdfs dfs -mkdir input
bin/hdfs dfs -put etc/hadoop/*.xml input
Testing:
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.0.jar grep input output 'dfs[a-z.]+'
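The example job scans the input XML files for strings matching the regular expression dfs[a-z.]+ and counts each match. The matching itself can be previewed locally with plain grep; the input line below is just an illustrative fragment of hdfs-site.xml:

```shell
# -E enables extended regex, -o prints each match on its own line;
# the match stops at '<' because '<' is outside the [a-z.] class.
echo '<name>dfs.replication</name>' | grep -oE 'dfs[a-z.]+'   # prints: dfs.replication
```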
Get the output:
bin/hdfs dfs -get output output   # copy the output from HDFS to the local output directory
cat output/*
Stop:
sbin/stop-dfs.sh
5. Using YARN
Besides running jobs as a single local process, pseudo-distributed mode can also run MapReduce on YARN for unified scheduling. Only two configuration files need to be modified.
5.1 Configuration File
Modify the following files:
HADOOP/etc/hadoop/mapred-site.xml
HADOOP/etc/hadoop/yarn-site.xml
5.1.1 mapred-site.xml
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
</configuration>
- mapreduce.framework.name: specifies that MapReduce jobs run on YARN
- mapreduce.application.classpath: specifies the classpath for MapReduce applications
5.1.2 yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_MAPRED_HOME</value>
</property>
</configuration>
- yarn.nodemanager.aux-services: auxiliary services run by the NodeManager; mapreduce_shuffle enables the MapReduce shuffle service
- yarn.nodemanager.env-whitelist: environment variables that containers may inherit from the NodeManagers
5.2 Run
sbin/start-yarn.sh
Once it is running, you can access the ResourceManager web UI at localhost:8088.
Stop:
sbin/stop-yarn.sh