This is the 10th day of my participation in the Gwen Challenge.

  • Reference: the official Hadoop documentation

I. Prerequisites

  1. Install Java
    • You can use the jps command to check whether Java was installed successfully
  2. Configure SSH passwordless login
# 1. Generate a key pair (empty passphrase) in the /root/.ssh directory
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
# 2. Append the public key to the authentication file authorized_keys
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
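To verify, SSH to the machine itself; after the first fingerprint confirmation it should log in without asking for a password (node1 is the host alias used later in this article):

ssh node1
exit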

II. Setup

  1. Unzip the Hadoop installation package; I put it in the /opt/hadoop directory
  2. Configure the Hadoop environment variables
    • After the configuration is complete, you can press TAB to check whether Hadoop commands auto-complete
export JAVA_HOME=/usr/local/java/jdk1.8.0_60
export HADOOP_HOME=/opt/hadoop/hadoop-2.6.5
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
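After saving, reload the profile and confirm the variables took effect; a minimal check, assuming the paths above:

source /etc/profile   # or whichever file you added the exports to
echo $HADOOP_HOME     # should print /opt/hadoop/hadoop-2.6.5
hadoop version        # should report Hadoop 2.6.5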
  3. Modify the Java environment variable that the Hadoop configuration files point to
    • If left unchanged, they default to the local JAVA_HOME environment variable, which nodes in a cluster may fail to resolve
    # The Java implementation to use.
    export JAVA_HOME=${JAVA_HOME}
    • Locate the JAVA_HOME line in each file and change its value to your actual Java installation path (a quick way to do this is sketched after the list)
    • Three files under /opt/hadoop/hadoop-2.6.5/etc/hadoop need this change:
      • hadoop-env.sh
      • mapred-env.sh
      • yarn-env.sh
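    One way to apply the change to all three files at once is simply to append an explicit export at the end of each script (the last assignment wins when the script is sourced); a sketch, assuming the JDK path used earlier:

cd /opt/hadoop/hadoop-2.6.5/etc/hadoop
# append an absolute JAVA_HOME to each env script
for f in hadoop-env.sh mapred-env.sh yarn-env.sh; do
    echo 'export JAVA_HOME=/usr/local/java/jdk1.8.0_60' >> "$f"
done
# confirm the new line is present in all three files
grep 'export JAVA_HOME' hadoop-env.sh mapred-env.sh yarn-env.sh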
  4. Modify the Hadoop core configuration files
# 1. Configure the namenode // file: /etc/hadoop/core-site.xml
<configuration>
    <property>
        <!-- node1 is the host where the current namenode runs -->
        <name>fs.defaultFS</name>
        <value>hdfs://node1:9000</value>
    </property>
</configuration>

# 2. Configure the datanode // file: /etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <!-- pseudo-distributed, only one server, so set one replica -->
        <name>dfs.replication</name>
        <value>1</value>
    </property>
</configuration>

# 3. Modify /etc/hadoop/slaves
# Set the slave node information (write the slave node IP or alias)
# For example, my host alias here is node1:
node1

# 4. Configure the secondarynamenode // file: /etc/hadoop/hdfs-site.xml
    <property>
        <!-- note the port number is 50090 -->
        <name>dfs.namenode.secondary.http-address</name>
        <value>node1:50090</value>
    </property>

# 5. Change the Hadoop temporary directory (default: /tmp/hadoop-${user.name}) // file: /etc/hadoop/core-site.xml
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/var/hadoop/pseudo</value>
    </property>
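Note that steps 1 and 5 both edit core-site.xml, so the finished file holds both properties in a single <configuration> block; a sketch of the end result with the values above:

<!-- /etc/hadoop/core-site.xml after steps 1 and 5 -->
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node1:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/var/hadoop/pseudo</value>
    </property>
</configuration>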
  5. Start testing
    1. Format HDFS
      • Use the command hdfs namenode -format
      • It will create a file named VERSION under the /var/hadoop/pseudo directory we just configured; VERSION records the current clusterID
      • Each format creates a new HDFS, i.e., each format generates a brand-new clusterID. So what should you do if you format repeatedly? The datanode still holds the old clusterID and will fail to start against the new namenode, so stop HDFS and clear the old data directory first (see the sketch after the port list)
    2. Start the cluster
      • start-dfs.sh
      • On startup, a data directory for the datanode is generated under /var/hadoop/pseudo; it contains the datanode's clusterID
    3. Use a web browser to connect to Hadoop
      • Note that the firewall must be turned off (or the ports opened), otherwise the pages will not load
      • Use hostname + port to reach the web UI, e.g., http://node1:50070
    # Common ports in the Hadoop ecosystem
    1. HDFS web UI: 50070
    2. YARN management interface: 8088
    3. HistoryServer management interface: 19888
    4. Zookeeper service port: 2181
    5. MySQL server port: 3306
    6. HiveServer2 port: 10000
    7. Kafka service port: 9092
    8. Azkaban web interface: 8443
    9. HBase web interface: 16010, 60010
    10. Spark web interface: 8080
    11. Spark master URL port: 7077
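    Putting the test steps together; a minimal walkthrough, assuming the paths and hostname configured above (clearing /var/hadoop/pseudo erases all HDFS data, so only do it on a fresh or disposable setup):

# on a re-format, stop HDFS and clear the old directory first, otherwise
# the datanode keeps the stale clusterID and refuses to start
stop-dfs.sh
rm -rf /var/hadoop/pseudo

hdfs namenode -format                              # writes a new clusterID
cat /var/hadoop/pseudo/dfs/name/current/VERSION    # check the clusterID

start-dfs.sh
jps   # should list NameNode, DataNode, and SecondaryNameNode
# then open http://node1:50070 in a browser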
  6. Modify the hosts file in Windows
    • Path: C:\Windows\System32\drivers\etc
    • Add the following content:
192.168.85.151 node1
192.168.85.152 node2
192.168.85.153 node3
192.168.85.154 node4
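    To confirm the mapping took effect, flush the DNS cache and ping an alias from a Windows command prompt:

ipconfig /flushdns
ping node1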
  7. Use HDFS to store a file
# 1048576 bytes is 1 MB; this command uploads the Tomcat package to HDFS with a block size of 1 MB
hdfs dfs -D dfs.blocksize=1048576 -put apache-tomcat-8.0.30.tar.gz /user/root
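To confirm the upload and inspect how the file was split into blocks, fsck can list the block layout; a quick check, assuming the path above:

hdfs dfs -ls /user/root
# list the file's blocks; each should be at most 1 MB
hdfs fsck /user/root/apache-tomcat-8.0.30.tar.gz -files -blocks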