This is the 10th day of my participation in Gwen Challenge

  • The official documentation

I. Prerequisites

  1. Install Java
    • You can run the jps command to check whether Java was installed successfully
  2. Configure passwordless SSH login
# 1. Generate a key in the /root/.ssh directory
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
# 2. Append the key to the authentication file authorized_keys
cat ~/.ssh/ >> ~/.ssh/authorized_keys
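The append step above can be made safe to repeat. The following is a minimal sketch, not the author's script: the function name authorize_key is hypothetical, and the public-key file name id_dsa.pub is an assumption (the path in the command above is cut off, but ssh-keygen writes the public half to the key file name plus ".pub"). It also tightens permissions, since sshd with its default StrictModes setting tends to reject keys kept in group- or world-writable locations.

```shell
#!/bin/sh
# Append a public key to an authorized_keys file only if it is not already there,
# then restrict permissions on the directory and the file.
# Usage: authorize_key <public-key-file> <authorized_keys-file>
authorize_key() {
    pub_file=$1
    auth_file=$2
    auth_dir=$(dirname "$auth_file")

    mkdir -p "$auth_dir"
    touch "$auth_file"

    # -F fixed strings, -x whole-line match, -q quiet, -f read patterns from file
    if ! grep -qxF -f "$pub_file" "$auth_file"; then
        cat "$pub_file" >> "$auth_file"
    fi

    chmod 700 "$auth_dir"
    chmod 600 "$auth_file"
}

# Example call (ASSUMED key file name: id_dsa.pub):
# authorize_key ~/.ssh/id_dsa.pub ~/.ssh/authorized_keys
# ssh localhost    # should log in without asking for a password
```

Running the function twice leaves a single copy of the key in the file, which keeps authorized_keys tidy if the setup steps are repeated.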

II. Build

  1. Unzip the Hadoop installation package; I put it in the /opt/hadoop directory
  2. Configure the Hadoop environment variables
    • After the configuration is complete, you can press TAB to check whether the commands auto-complete
export JAVA_HOME=/usr/local/java/jdk1.8.0_60
export HADOOP_HOME=/opt/hadoop/hadoop-2.6.5
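TAB completion only finds the Hadoop commands if their directories are on PATH. The sketch below is one way to check that, under the assumption that bin/ and sbin/ are appended to PATH (the exports above do not show a PATH line). The helper name check_cmd is hypothetical; the paths follow the exports above and should be adjusted to your own layout.

```shell
#!/bin/sh
export JAVA_HOME=/usr/local/java/jdk1.8.0_60
export HADOOP_HOME=/opt/hadoop/hadoop-2.6.5
# ASSUMPTION: bin/ and sbin/ are added to PATH so the shell can find the commands.
export PATH="$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin"

# Report whether a command resolves; "not found" means PATH is still wrong.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found on PATH"
    fi
}

for cmd in java hdfs start-dfs.sh; do
    check_cmd "$cmd"
done
```

If the exports live in a profile file such as /etc/profile, remember to source that file (or open a new shell) before checking.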
  3. Change the JAVA_HOME that the Hadoop configuration files point to
    • If left unchanged, the files refer to the local JAVA_HOME environment variable, which other nodes in the cluster may not be able to resolve
    # The Java implementation to use.
    export JAVA_HOME=${JAVA_HOME}
    • Locate the JAVA_HOME line in the file and change its value to your Java installation path
    • Three files in this directory need to be changed: /opt/hadoop/hadoop-2.6.5/etc/hadoop
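Editing the same line in several files by hand is easy to get wrong, so the change can be scripted. This is a rough sketch rather than the author's procedure: set_java_home is a hypothetical helper, and the three file names in the usage comment (hadoop-env.sh, mapred-env.sh, yarn-env.sh) are an assumption, since the text above does not name them.

```shell
#!/bin/sh
# Rewrite the "export JAVA_HOME=..." line (commented out or not) to a fixed path.
# Usage: set_java_home <env-file> <java-install-path>
set_java_home() {
    env_file=$1
    java_path=$2
    tmp_file="$env_file.tmp.$$"

    # '|' is the sed delimiter because the path contains '/'.
    # A temp file is used instead of "sed -i", whose syntax differs between GNU and BSD sed.
    sed "s|^[[:space:]]*#*[[:space:]]*export JAVA_HOME=.*|export JAVA_HOME=$java_path|" \
        "$env_file" > "$tmp_file" &&
        cat "$tmp_file" > "$env_file" &&
        rm -f "$tmp_file"
}

# Example usage (ASSUMED file names):
# conf_dir=/opt/hadoop/hadoop-2.6.5/etc/hadoop
# for f in hadoop-env.sh mapred-env.sh yarn-env.sh; do
#     set_java_home "$conf_dir/$f" /usr/local/java/jdk1.8.0_60
# done
```

The helper assumes the Java path contains no '|' or '&' characters, which would need escaping in the sed replacement.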
  4. Modify the Hadoop core configuration files
# 1. Configure the namenode
# File: /etc/hadoop/core-site.xml
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <!-- node1 is the host where the namenode runs -->
        <value>hdfs://node1:9000</value>
    </property>
</configuration>

# 2. Configure the datanode
# File: /etc/hadoop/hdfs-site.xml
<configuration>
    <property>
        <name>dfs.replication</name>
        <!-- Pseudo-distributed mode has only one server, so keep a single replica -->
        <value>1</value>
    </property>
</configuration>

# 3. Modify /etc/hadoop/slaves
# This file lists the slave nodes (write each slave's IP or alias); here my host alias is node1
node1

# 4. Configure the secondary namenode
# File: /etc/hadoop/hdfs-site.xml
<property>
    <name>dfs.namenode.secondary.http-address</name>
    <!-- Note that the port number is 50090 -->
    <value>node1:50090</value>
</property>

# 5. /tmp/hadoop-${}
# File: /etc/hadoop/core-site.xml
<property>
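For reference, here is a hedged sketch of what a complete core-site.xml for this setup might look like, written out by a small shell function. The function name write_core_site is hypothetical. The second property is an assumption: the last configuration entry above is cut off, so the name hadoop.tmp.dir (whose default is /tmp/hadoop-${user.name}) and the value /var/hadoop/pseudo (the directory the later steps refer to) are my reading, not something the text confirms.

```shell
#!/bin/sh
# Write a minimal core-site.xml into the given configuration directory.
# Usage: write_core_site <conf-dir>
write_core_site() {
    conf_dir=$1
    cat > "$conf_dir/core-site.xml" <<'EOF'
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <!-- node1 is the host where the namenode runs -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://node1:9000</value>
    </property>
    <!-- ASSUMPTION: storage root moved out of /tmp via hadoop.tmp.dir -->
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/var/hadoop/pseudo</value>
    </property>
</configuration>
EOF
}

# Example usage:
# write_core_site /opt/hadoop/hadoop-2.6.5/etc/hadoop
# hdfs getconf -confKey fs.defaultFS    # prints the value Hadoop actually reads
```

Keeping the storage root outside /tmp matters because many systems clean /tmp on reboot, which would wipe the namenode metadata.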
  5. Start testing
    1. Format HDFS
      • Use the command hdfs namenode -format
      • Formatting creates a name directory under the /var/hadoop/pseudo directory mentioned earlier; its VERSION file contains the current clusterID
      • Each format creates a new HDFS, so every format generates a brand-new clusterID. What should you do if you format more than once?
    2. Start the cluster
      • On startup, a data directory representing the datanode is generated under /var/hadoop/pseudo; it contains the datanode's clusterID
    3. Use the web UI to connect to Hadoop
      • Note that the firewall must be turned off, otherwise the page cannot be opened
      • Connect using host name + port
    # Common Hadoop ports
    1. HDFS page: 50070
    2. YARN management interface: 8088
    3. HistoryServer management interface: 19888
    4. ZooKeeper service port: 2181
    5. MySQL service port: 3306
    6. Hive server1: 10000
    7. Kafka service port: 9092
    8. Azkaban interface: 8443
    9. HBase interface: 16010, 60010
    10. Spark interface: 8080
    11. Spark URL: 7077
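On the repeated-format question raised in the formatting step: a datanode whose stored clusterID no longer matches the namenode's will typically refuse to start. The sketch below compares the two IDs. The helper names are hypothetical, and the dfs/name and dfs/data subdirectories are an assumption based on Hadoop's default layout under the storage root when the name and data directories are not configured separately.

```shell
#!/bin/sh
# Print the clusterID stored in a Hadoop VERSION file (key=value lines).
cluster_id() {
    sed -n 's/^clusterID=//p' "$1"
}

# Compare the namenode and datanode clusterIDs under a storage root.
# Usage: check_cluster_ids <storage-root>
check_cluster_ids() {
    root=$1
    # ASSUMED layout: <root>/dfs/name and <root>/dfs/data
    nn_id=$(cluster_id "$root/dfs/name/current/VERSION")
    dn_id=$(cluster_id "$root/dfs/data/current/VERSION")
    if [ -n "$nn_id" ] && [ "$nn_id" = "$dn_id" ]; then
        echo "match: $nn_id"
    else
        echo "mismatch: namenode=$nn_id datanode=$dn_id"
        return 1
    fi
}

# Example usage:
# check_cluster_ids /var/hadoop/pseudo
```

If the IDs differ, two common remedies are to stop the cluster and either copy the namenode's clusterID into the datanode's VERSION file, or delete the datanode's data directory so it is recreated on the next start (which discards any stored blocks).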
  6. Modify the hosts file on Windows
    • Path: C:\Windows\System32\drivers\etc
    Add the following content:
    192.168.85.151 node1
    192.168.85.152 node2
    192.168.85.153 node3
    192.168.85.154 node4
  7. Use HDFS to store files
# 1048576 bytes = 1 MB. This command uploads Tomcat to HDFS with a block size of 1 MB
hdfs dfs -D dfs.blocksize=1048576 -put apache-tomcat-8.0.30.tar.gz /user/root
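To see what the 1 MB block size means for the upload, the number of blocks can be estimated with a ceiling division of the file size by the block size. This is a small sketch; the function name expected_blocks is hypothetical, and the commented usage assumes the file name and target directory from the command above.

```shell
#!/bin/sh
# Number of HDFS blocks a file of <size-bytes> occupies at <block-size-bytes>.
# Uses ceiling division: a partly filled last block still counts as one block.
expected_blocks() {
    size=$1
    block=$2
    echo $(( (size + block - 1) / block ))
}

# Example usage:
# f=apache-tomcat-8.0.30.tar.gz
# expected_blocks "$(wc -c < "$f")" 1048576
# hdfs fsck "/user/root/$f" -files -blocks    # lists the blocks actually stored
```

Comparing the estimate with the fsck output is a quick way to confirm that the -D dfs.blocksize option took effect for this upload.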