This is the 23rd day of my participation in the August More Text Challenge. More challenges in August.
WangScaler: A writer with heart.
Disclaimer: my knowledge is limited; if there are any mistakes, please kindly point them out.
In the last article, we built part of the Hadoop environment. In this article, we continue the setup; together, the two articles cover the basic Hadoop environment.
Install Hadoop
Install the JDK in advance and download the Hadoop installation package, then upload the package to the three VMs. You can use MobaXterm, which integrates the functions of Xshell and Xftp, is more user-friendly and better looking than either, and is highly recommended.
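If you prefer the command line to MobaXterm, a minimal sketch for uploading the package with scp (assuming the package is in the current directory and the three VMs accept root logins at the IP addresses used later in this article) looks like this:
# copy the installation package to each VM; adjust the user, IPs and target directory to your setup
for host in 192.168.253.130 192.168.253.131 192.168.253.132; do
  scp hadoop-2.6.4.tar.gz root@$host:/root/
done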
Decompress the Hadoop package:
tar -zxf hadoop-2.6.4.tar.gz -C /usr/local
Configure the VM environment
- 1. Modify the hosts file on all three machines
vim /etc/hosts
192.168.253.130 master
192.168.253.131 slave1
192.168.253.132 slave2
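A quick way to confirm the hosts entries work is to ping each node by name (run from master; -c 3 just limits the number of packets):
ping -c 3 slave1
ping -c 3 slave2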
- 2. Modify the hostname
On the master machine:
vim /etc/sysconfig/network
HOSTNAME=master
On the slave1 machine, change it to:
HOSTNAME=slave1
And so on: modify the file on every node in your cluster.
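Editing /etc/sysconfig/network only takes effect after a reboot; to change the hostname of the running system immediately, you can also run the hostname command on each node (with that node's own name):
hostname master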
Modify the Hadoop configuration files
cd /usr/local/hadoop-2.6.4/etc/hadoop/
- 1. core-site.xml
Modify it as follows:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:8020</value>
</property>
<property>
<name>hadoop.tmp.dir</name>
<value>/var/log/hadoop/tmp</value>
</property>
</configuration>
This configuration specifies the NameNode address, hdfs://master:8020, and the directory where the temporary files Hadoop generates at runtime are stored.
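If the hadoop.tmp.dir directory does not exist yet, it does no harm to create it in advance (Hadoop will usually create it on first use anyway):
mkdir -p /var/log/hadoop/tmp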
- 2. hadoop-env.sh
This file points Hadoop at the Java environment, so the JDK must be installed in advance; how to install the JDK is easy to look up and not difficult.
export JAVA_HOME=/usr/java/jdk1.7.0_80
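Make sure the path matches the JDK you actually installed; a quick check (assuming the JDK was installed under /usr/java) is:
ls /usr/java/
java -version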
- 3. hdfs-site.xml
<configuration>
<property>
<name>dfs.namenode.name.dir</name>
<value>file:///data/hadoop/hdfs/name</value>
</property>
<property>
<name>dfs.datanode.data.dir</name>
<value>file:///data/hadoop/hdfs/data</value>
</property>
<property>
<name>dfs.namenode.secondary.http-address</name>
<value>master:50090</value>
</property>
<property>
<name>dfs.replication</name>
<value>3</value>
</property>
</configuration>
This configuration specifies where the NameNode and DataNode store their data on disk and sets the HDFS replication factor to 3.
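The name and data directories are normally created automatically when the NameNode is formatted and the DataNodes start for the first time, but you can also create them up front if you prefer (optional):
mkdir -p /data/hadoop/hdfs/name /data/hadoop/hdfs/data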
- 4. mapred-site.xml
Generate mapred-site.xml from the template with cp mapred-site.xml.template mapred-site.xml, then modify it as follows:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<!-- jobhistory properties -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>
This specifies that MapReduce runs on YARN and sets the JobHistory server addresses.
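The two JobHistory addresses only matter once the history server is actually running; in Hadoop 2.x it is normally started later, after the cluster is up, with something like the following (shown here only for reference):
sbin/mr-jobhistory-daemon.sh start historyserver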
- 5. yarn-site.xml
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>${yarn.resourcemanager.hostname}:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>${yarn.resourcemanager.hostname}:8030</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.address</name>
<value>${yarn.resourcemanager.hostname}:8088</value>
</property>
<property>
<name>yarn.resourcemanager.webapp.https.address</name>
<value>${yarn.resourcemanager.hostname}:8090</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>${yarn.resourcemanager.hostname}:8031</value>
</property>
<property>
<name>yarn.resourcemanager.admin.address</name>
<value>${yarn.resourcemanager.hostname}:8033</value>
</property>
<property>
<name>yarn.nodemanager.local-dirs</name>
<value>/data/hadoop/yarn/local</value>
</property>
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<property>
<name>yarn.nodemanager.remote-app-log-dir</name>
<value>/data/tmp/logs</value>
</property>
<property>
<name>yarn.log.server.url</name>
<value>http://master:19888/jobhistory/logs/</value>
<description>URL for job history server</description>
</property>
<property>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
<value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>512</value>
</property>
<property>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>4096</value>
</property>
<property>
<name>mapreduce.map.memory.mb</name>
<value>2048</value>
</property>
<property>
<name>mapreduce.reduce.memory.mb</name>
<value>2048</value>
</property>
<property>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>1</value>
</property>
</configuration>
This specifies that the NodeManager serves intermediate data through the mapreduce_shuffle service and sets the ResourceManager addresses, along with basic memory and CPU limits for containers.
Note: if you do not have slave2 and only have a single node, the configuration file is different. The single-node configuration has to be scaled down, otherwise jobs take too long to run. You can download a single-node yarn-site.xml configuration for Hadoop from CSDN, or leave a comment and contact me.
Modify the slaves file
Edit the file with vim slaves; it should list the worker nodes, one hostname per line (here slave1 and slave2).
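Once all the configuration files on master are in place, one way to copy them unchanged to the other nodes (a sketch assuming Hadoop was extracted to the same path on every machine) is:
scp /usr/local/hadoop-2.6.4/etc/hadoop/* slave1:/usr/local/hadoop-2.6.4/etc/hadoop/
scp /usr/local/hadoop-2.6.4/etc/hadoop/* slave2:/usr/local/hadoop-2.6.4/etc/hadoop/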
SSH Password-free login
ssh-keygen -t rsa
ssh-copy-id -i /root/.ssh/id_rsa.pub master
ssh-copy-id -i /root/.ssh/id_rsa.pub slave1
ssh-copy-id -i /root/.ssh/id_rsa.pub slave2
Run ssh slave1 to verify that you can log in to slave1 without a password.
Configure time synchronization
Note: when I installed NTP I hit a "media is not mounted" error; just mount the installation media again and retry.
On the master host, edit /etc/ntp.conf with vim, comment out the lines that begin with server, and add:
restrict 192.168.0.0 mask 255.255.255.0 nomodify notrap
server 127.127.1.0
fudge 127.127.1.0 stratum 10
Because we are using virtual machines here, we simply turn the firewall off to avoid firewall problems. If you are on a real server, remember not to do this.
service iptables stop && chkconfig iptables off
service ntpd start && chkconfig ntpd on
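To confirm that ntpd on master is up and serving time, you can query it with ntpq (shipped with the ntp package):
ntpq -p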
Run ntpdate master on the slave nodes.
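To keep the slaves synchronized automatically, you can also add a cron entry on each slave with crontab -e (the 10-minute interval below is just an example):
*/10 * * * * /usr/sbin/ntpdate master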
With that, the basic Hadoop environment is set up.
Since you're here, give it a like before you go!
Follow WangScaler, and may you get a promotion and a raise without ever having to take the blame!