This is the 23rd day of my participation in the August More Text Challenge. More challenges in August.
WangScaler: A writer with heart.
Disclaimer: my knowledge is limited; if there are any mistakes, please kindly point them out.
In the last article, we built part of the Hadoop environment. In this article, we continue the setup; together, the two articles cover the basic Hadoop environment.
Install Hadoop
Install the JDK in advance and download the Hadoop installation package, then upload the package to the three VMs. You can use MobaXterm, which integrates the functions of Xshell and Xftp, is more user-friendly and better looking than either, and is highly recommended.
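If you prefer the command line to MobaXterm, a minimal sketch for uploading the package with scp (assuming the package is in the current directory and the three VMs accept root logins at the IP addresses used later in this article) looks like this:
# copy the installation package to each VM; adjust the user, IPs and target directory to your setup
for host in 192.168.253.130 192.168.253.131 192.168.253.132; do
  scp hadoop-2.6.4.tar.gz root@$host:/root/
done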
Decompress the Hadoop package:
tar -zxf hadoop-2.6.4.tar.gz -C /usr/local
Configure the VM environment
- 1. Modify the hosts file on all three machines
vim /etc/hosts
192.168.253.130 master
192.168.253.131 slave1
192.168.253.132 slave2
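A quick way to confirm the hosts entries work is to ping each node by name (run from master; -c 3 just limits the number of packets):
ping -c 3 slave1
ping -c 3 slave2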
- 2. Modify the hostname
On the master machine:
vim /etc/sysconfig/network
HOSTNAME=master
On the slave1 machine, change it to:
HOSTNAME=slave1
And so on: modify the file on every node in your cluster.
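Editing /etc/sysconfig/network only takes effect after a reboot; to change the hostname of the running system immediately, you can also run the hostname command on each node (with that node's own name):
hostname master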
Modify the Hadoop configuration files
cd /usr/local/hadoop-2.6.4/etc/hadoop/
- 1. core-site.xml
Modify it as follows:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/var/log/hadoop/tmp</value>
</property>
</configuration>
This configuration specifies the NameNode address, hdfs://master:8020, and the directory where the temporary files Hadoop generates at runtime are stored.
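If the hadoop.tmp.dir directory does not exist yet, it does no harm to create it in advance (Hadoop will usually create it on first use anyway):
mkdir -p /var/log/hadoop/tmp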
- 2. hadoop-env.sh
This file points Hadoop at the Java environment, so the JDK must be installed in advance; how to install the JDK is easy to look up and not difficult.
export JAVA_HOME=/usr/java/jdk1.7.0_80
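Make sure the path matches the JDK you actually installed; a quick check (assuming the JDK was installed under /usr/java) is:
ls /usr/java/
java -version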
- 3. hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///data/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///data/hadoop/hdfs/data</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
This configuration specifies where the NameNode and DataNode store their data on disk and sets the HDFS replication factor to 3.
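The name and data directories are normally created automatically when the NameNode is formatted and the DataNodes start for the first time, but you can also create them up front if you prefer (optional):
mkdir -p /data/hadoop/hdfs/name /data/hadoop/hdfs/data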
- 4. mapred-site.xml
Generate mapred-site.xml from the template with cp mapred-site.xml.template mapred-site.xml, then modify it as follows:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- jobhistory properties -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>
This specifies that MapReduce runs on YARN and sets the JobHistory server addresses.
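The two JobHistory addresses only matter once the history server is actually running; in Hadoop 2.x it is normally started later, after the cluster is up, with something like the following (shown here only for reference):
sbin/mr-jobhistory-daemon.sh start historyserver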
- 5. yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address</name>
<value>${yarn.resourcemanager.hostname}:8090</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>${yarn.resourcemanager.hostname}:8033</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/data/hadoop/yarn/local</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/data/tmp/logs</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://master:19888/jobhistory/logs/</value>
<description>URL for job history server</description>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>512</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>4096</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>2048</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>1</value>
</property>
</configuration>
This specifies that the NodeManager serves intermediate data through the mapreduce_shuffle service and sets the ResourceManager addresses, along with basic memory and CPU limits for containers.
Note: if you do not have slave2 and only have a single node, the configuration file is different. The single-node configuration has to be scaled down, otherwise jobs take too long to run. You can download a single-node yarn-site.xml configuration for Hadoop from CSDN, or leave a comment and contact me.
Modify the slaves file
Edit the file with vim slaves; it should list the worker nodes, one hostname per line (here slave1 and slave2).
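Once all the configuration files on master are in place, one way to copy them unchanged to the other nodes (a sketch assuming Hadoop was extracted to the same path on every machine) is:
scp /usr/local/hadoop-2.6.4/etc/hadoop/* slave1:/usr/local/hadoop-2.6.4/etc/hadoop/
scp /usr/local/hadoop-2.6.4/etc/hadoop/* slave2:/usr/local/hadoop-2.6.4/etc/hadoop/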
SSH Password-free login
ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub master
ssh-copy-id -i /root/.ssh/id_rsa.pub slave1
ssh-copy-id -i /root/.ssh/id_rsa.pub slave2
Run ssh slave1 to verify that you can log in to slave1 without a password.
Configure time synchronization
Note: when I installed NTP I hit a "media is not mounted" error; just mount the installation media again and retry.
On the master host, edit /etc/ntp.conf with vim, comment out the lines that begin with server, and add:
restrict 192.168.0.0 mask 255.255.255.0 nomodify notrap
server 127.127.1.0
fudge 127.127.1.0 stratum 10
Because we are using virtual machines here, we simply turn the firewall off to avoid firewall problems. If you are on a real server, remember not to do this.
service iptables stop && chkconfig iptables off
service ntpd start && chkconfig ntpd on
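To confirm that ntpd on master is up and serving time, you can query it with ntpq (shipped with the ntp package):
ntpq -p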
Run ntpdate master on the slave nodes.
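To keep the slaves synchronized automatically, you can also add a cron entry on each slave with crontab -e (the 10-minute interval below is just an example):
*/10 * * * * /usr/sbin/ntpdate master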
With that, the basic Hadoop environment is set up.
Since you're here, give it a like before you go!
Follow WangScaler, and may you get a promotion and a raise without ever having to take the blame!