- Reference: the official Hadoop documentation
I. Prerequisites
- Install Java
  - You can use the `jps` command to check whether Java was installed successfully
- Configure SSH passwordless login
```bash
# 1. Generate a key pair in the /root/.ssh directory
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

# 2. Append the public key to the authentication file authorized_keys
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
```
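As a quick sanity check (a minimal sketch, assuming the key was generated for root on this same host), you should now be able to SSH in without being asked for a password:

```bash
# The first connection may still ask you to confirm the host key, but no password prompt should appear
ssh root@localhost "hostname && date"
```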
II. Setup
- Unzip the Hadoop installation package; I put it in the `/opt/hadoop` directory
- Configure the Hadoop environment variables
  - After the configuration is complete, you can press TAB to check whether the commands auto-complete
```bash
export JAVA_HOME=/usr/local/java/jdk1.8.0_60
export HADOOP_HOME=/opt/hadoop/hadoop-2.6.5
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```
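A minimal sketch of applying these variables, assuming they were appended to /etc/profile (the JDK and Hadoop paths above are the ones used in this article; substitute your own):

```bash
# Reload the profile so the current shell picks up the new variables
source /etc/profile

# Quick check: both commands should resolve from PATH and print a version
java -version
hadoop version
```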
- Modify the Java environment variable that the Hadoop configuration files point to
  - If left unchanged, it defaults to the local `${JAVA_HOME}` environment variable, which other nodes in the cluster may not be able to resolve

```bash
# The Java implementation to use.
export JAVA_HOME=${JAVA_HOME}
```

- Locate the `JAVA_HOME` line in each file and change its value to your Java installation path
- Three files under `/opt/hadoop/hadoop-2.6.5/etc/hadoop` need to be changed (a sketch of the edit follows this list):
- hadoop-env.sh
- mapred-env.sh
- yarn-env.sh
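A minimal sketch of making that change in all three files with GNU sed; the JDK path is the one used earlier in this article, so substitute your own, and double-check the result since the line may be commented out by default:

```bash
cd /opt/hadoop/hadoop-2.6.5/etc/hadoop

# Replace the (possibly commented-out) JAVA_HOME line with an absolute JDK path
for f in hadoop-env.sh mapred-env.sh yarn-env.sh; do
  sed -i -E 's|^#?[[:space:]]*export JAVA_HOME=.*|export JAVA_HOME=/usr/local/java/jdk1.8.0_60|' "$f"
done

# Confirm the change took effect
grep '^export JAVA_HOME=' hadoop-env.sh mapred-env.sh yarn-env.sh
```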
- Modify the Hadoop core configuration files

```
# 1. Configure the namenode            // file: /etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://node1:9000</value>   <!-- node1 is where the current namenode is located -->
  </property>
</configuration>

# 2. Configure the datanode            // file: /etc/hadoop/hdfs-site.xml
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>                   <!-- pseudo-distributed, only one server, so a single replica -->
  </property>
</configuration>

# 3. Modify /etc/hadoop/slaves         // write the slave node IP or alias; my host alias here is node1
node1

# 4. Configure the secondarynamenode   // file: /etc/hadoop/hdfs-site.xml
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>node1:50090</value>           <!-- note that the port number is 50090 -->
</property>

# 5. Change the default temporary directory /tmp/hadoop-${user.name}   // file: /etc/hadoop/core-site.xml
<property>
  <name>hadoop.tmp.dir</name>
  <value>/var/hadoop/pseudo</value>
</property>
```
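Note that steps 1 and 5 both go into core-site.xml, and steps 2 and 4 both go into hdfs-site.xml. A small sketch for double-checking that the values were picked up, assuming the environment variables from earlier are in effect (`hdfs getconf` reads the same configuration files the daemons will use):

```bash
hdfs getconf -confKey fs.defaultFS                            # expect hdfs://node1:9000
hdfs getconf -confKey dfs.replication                         # expect 1
hdfs getconf -confKey dfs.namenode.secondary.http-address     # expect node1:50090
hdfs getconf -confKey hadoop.tmp.dir                          # expect /var/hadoop/pseudo
```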
- Begin testing
- Format HDFS
  - Use the command `hdfs namenode -format`
  - This creates a file named `VERSION` under the `/var/hadoop/pseudo` directory we configured above; the file contains the current clusterID
  - Every format creates a brand-new HDFS, i.e. every format generates a new clusterID. So what happens if you format more than once? (See the note and sketch below.)
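A minimal sketch of the format step and a look at the resulting clusterID; the `dfs/name/current` subdirectory is the default layout under `hadoop.tmp.dir`, so treat the exact path as an assumption to verify on your machine:

```bash
# Format the namenode (only on a fresh cluster, or after clearing old data)
hdfs namenode -format

# Inspect the VERSION file that was just created; note the clusterID line
cat /var/hadoop/pseudo/dfs/name/current/VERSION
```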
- Start the cluster
  - Use the command `start-dfs.sh`
- When the cluster starts, a data directory representing the datanode is generated under `/var/hadoop/pseudo`; it contains the datanode's clusterID
  - This answers the question above: after a repeated format, the datanode's old clusterID no longer matches the namenode's new one and the datanode fails to start, so clear the old data under `/var/hadoop/pseudo` before formatting again (see the sketch below)
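A sketch of starting the cluster and comparing the two clusterIDs, assuming the default `dfs/name` and `dfs/data` layout under `/var/hadoop/pseudo`:

```bash
start-dfs.sh

# The pseudo-distributed daemons should all show up
jps    # expect NameNode, DataNode and SecondaryNameNode (plus Jps)

# The two clusterIDs must match for the datanode to register with the namenode
grep clusterID /var/hadoop/pseudo/dfs/name/current/VERSION
grep clusterID /var/hadoop/pseudo/dfs/data/current/VERSION
```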
- Use the web UI to connect to Hadoop
  - Note that the firewall must be turned off, otherwise the page cannot be opened (see the sketch after the port list)
  - Connect to the site using hostname + port
```
# Hadoop ports
1.  HDFS web UI:                   50070
2.  YARN management interface:     8088
3.  HistoryServer management UI:   19888
4.  ZooKeeper service port:        2181
5.  MySQL server port:             3306
6.  Hive server port:              10000
7.  Kafka service port:            9092
8.  Azkaban web UI:                8443
9.  HBase web UI:                  16010, 60010
10. Spark web UI:                  8080
11. Spark master URL port:         7077
```
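For the firewall note above, a quick sketch assuming a CentOS 7 style system with firewalld (use the equivalent commands for your distribution):

```bash
# Stop the firewall so the web UIs are reachable from outside the server
systemctl stop firewalld
systemctl disable firewalld

# Sanity check from any machine that can resolve node1: the HDFS web UI answers on port 50070
curl -I http://node1:50070
```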
- Modify the hosts file in Windows
  - Path: `C:\Windows\System32\drivers\etc`
  - Add the following content:

```
192.168.85.151 node1
192.168.85.152 node2
192.168.85.153 node3
192.168.85.154 node4
```
- Use HDFS to store files
  - 1048576 bytes is 1 MB, so this command uploads the Tomcat package to HDFS with a block size of 1 MB per block

```bash
hdfs dfs -D dfs.blocksize=1048576 -put apache-tomcat-8.0.30.tar.gz /user/root
```
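As a follow-up check (a sketch; it assumes the upload above succeeded and the file name is unchanged), you can list the file and see how it was split into 1 MB blocks:

```bash
# List the uploaded file
hdfs dfs -ls /user/root

# Show the blocks the file was split into and where the replicas live
hdfs fsck /user/root/apache-tomcat-8.0.30.tar.gz -files -blocks -locations
```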