Objectives
- Master how to install CentOS7 in VMware
- Master how to build a Hadoop cluster
- Master the installation of related software
- Know how to handle common problems
1 Install CentOS7 on VMware
1.1 Install VMware 15
Follow the public account EZ Big Data and reply "VM" to obtain the VMware 15 installation package and activation key.
- Extra: the differences between Bridged, NAT, and Host-only modes
  - Bridged mode: suited to office and LAN environments, and commonly used in production. The VM is an independent host on the LAN and can reach any machine on the network, but you have to configure its IP address yourself.
  - NAT (Network Address Translation) mode: suited to home environments without a router. The virtual system accesses the Internet through the host's address translation and sits on the VM's VMnet8 subnet.
  - Host-only mode: the VM cannot reach the Internet and can only talk to the host; it sits on the VM's VMnet1 subnet.
1.2 Install CentOS7
- Install the system
  Reference: blog.51cto.com/13880683/21…
- Set a static IP address
  Edit the ifcfg-ens33 file under /etc/sysconfig/network-scripts/, then run the service network restart command to restart the network.
  Key settings:
```
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.xxx.200
NETMASK=255.255.255.0
GATEWAY=192.168.xxx.2
DNS1=114.114.114.114
DNS2=8.8.8.8
```
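To confirm the change took effect, a quick check like the sketch below can help. The addresses mirror the placeholders above and should be replaced with whatever you actually configured:
```
# Check that ens33 picked up the static address
ip addr show ens33
# Confirm the gateway is reachable and that DNS resolution works
ping -c 3 192.168.xxx.2
ping -c 3 www.baidu.com
```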
- Disable the firewall
```
# View the firewall status ("not running" when disabled, "running" when enabled)
firewall-cmd --state
# Stop the firewall
systemctl stop firewalld.service
# Keep the firewall from starting on boot
systemctl disable firewalld.service
```
- Set up shared folders (FTP upload is the preferred option, and the one EZ recommends)
  Reference: www.cnblogs.com/skyheaving/…
- Network exception
  Symptom: Failed to start LSB: Bring up/down networking.
  Solution: disable NetworkManager
```
systemctl stop NetworkManager
systemctl disable NetworkManager
```
2 Hadoop Cluster Construction (Fully Distributed)
Hadoop operating modes: local, pseudo-distributed, and fully distributed.
Note: this article covers the fully distributed installation. For the configuration below, prepare three machines in VMware (master, slave1, and slave2), each with the firewall turned off, a static IP set, and the host name changed.
Set the hardware (memory, hard disk capacity) for each machine according to what your host can spare.
2.1 Install Hadoop 2.7.7
Official documentation: hadoop.apache.org/docs/r2.7.7…
Download: archive.apache.org/dist/hadoop…
Installation reference: www.cnblogs.com/thousfeet/p…
- Uninstall the Java that ships with the system
```
# Check the current Java version
java -version
# List the bundled JDK packages
rpm -qa | grep jdk
# Remove every package listed (including the .noarch ones)
rpm -e --nodeps xxx
```
- Change the host name
```
# Change the host name
hostnamectl set-hostname xxx
# Map master, slave1, and slave2 to one another:
# add the IP addresses and host names of slave1 and slave2 to /etc/hosts
vim /etc/hosts
```
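For reference, the /etc/hosts entries might look like the sketch below; the addresses are hypothetical, so use the static IPs you actually assigned to each machine:
```
# Hypothetical addresses - replace with your own static IPs
192.168.xxx.200  master
192.168.xxx.201  slave1
192.168.xxx.202  slave2
```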
- Add environment variables
```
# Takes effect for the current user only
vim ~/.bash_profile
# Takes effect for all users
vim /etc/profile
# Apply the changes
source ~/.bash_profile    # or: source /etc/profile
```
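As a sketch of what goes into the profile, the entries below use the install paths that appear later in this article; adjust them to your own layout:
```
# Assumed install paths - change them to match your machine
export JAVA_HOME=/home/Amos/SoftWare/jdk1.8.0_251
export HADOOP_HOME=/home/Amos/SoftWare/hadoop-2.7.7
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
```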
- Passwordless SSH login
  Set it up as follows:
  - Generate a public/private key pair
```
ssh-keygen -t rsa
```
  - Create the authorized_keys file and change its permissions to 600
```
cd ~/.ssh
touch authorized_keys
chmod 600 authorized_keys
```
  - Append the public keys to authorized_keys
```
# Append the public keys of master, slave1, and slave2 to authorized_keys
cat id_rsa.pub >> authorized_keys
# Verify passwordless login
ssh master    # likewise: ssh slave1, ssh slave2
```
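The keys generated on slave1 and slave2 also have to end up in the merged authorized_keys on every machine. One way to do it, as a sketch run on master and assuming the same user and home directory on all three hosts (the .pub file names are hypothetical):
```
# Pull each slave's public key over to master
scp slave1:~/.ssh/id_rsa.pub ~/.ssh/slave1.pub
scp slave2:~/.ssh/id_rsa.pub ~/.ssh/slave2.pub
# Merge them into authorized_keys alongside master's own key
cat ~/.ssh/slave1.pub ~/.ssh/slave2.pub >> ~/.ssh/authorized_keys
# Push the merged file back out so every host trusts every other host
scp ~/.ssh/authorized_keys slave1:~/.ssh/
scp ~/.ssh/authorized_keys slave2:~/.ssh/
```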
- Important directories
  - bin: scripts for operating the Hadoop services (HDFS, YARN)
  - etc: the Hadoop configuration file directory
  - lib: Hadoop's native libraries (used for data compression and decompression)
  - sbin: scripts for starting and stopping the Hadoop services
  - share: Hadoop's dependency JARs, documentation, and official examples
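For orientation, listing the unpacked directory (install path as used in this article) shows these folders:
```
ls /home/Amos/SoftWare/hadoop-2.7.7
# bin  etc  include  lib  libexec  sbin  share  ...
```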
2.2 Configure Hadoop
- Modify the configuration files
- core-site.xml: configure the default file system and the Hadoop tmp directory
```
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
    <description>HDFS URI, of the form filesystem://namenode-host:port</description>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/home/Amos/SoftWare/hadoop-2.7.7/hdfs/tmp</value>
    <description>Local temporary folder used by the namenode</description>
  </property>
  <property>
    <name>hadoop.proxyuser.root.hosts</name>
    <value>*</value>
  </property>
  <property>
    <name>hadoop.proxyuser.root.groups</name>
    <value>*</value>
  </property>
</configuration>
```
- hadoop-env.sh: configure JAVA_HOME
```
export JAVA_HOME=/home/Amos/SoftWare/jdk1.8.0_251
```
- hdfs-site.xml: set the name and data directories
```
<configuration>
  <property>
    <name>dfs.namenode.secondary.http-address</name>
    <value>master:9001</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/Amos/SoftWare/hadoop-2.7.7/hdfs/name</value>
    <description>Where the namenode stores the HDFS namespace metadata</description>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/Amos/SoftWare/hadoop-2.7.7/hdfs/data</value>
    <description>Physical storage location of data blocks on the datanode</description>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
    <description>Number of replicas</description>
  </property>
</configuration>
```
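The name, data, and tmp paths referenced above have to exist (or be creatable) on each node; creating them up front avoids permission surprises. A minimal sketch, assuming the same paths on every machine:
```
# Create the HDFS working directories used in the configs above
mkdir -p /home/Amos/SoftWare/hadoop-2.7.7/hdfs/{name,data,tmp}
```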
- mapred-site.xml: run MapReduce on YARN and configure the history server
```
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
    <description>Run MapReduce on YARN (Hadoop 1 ran MapReduce without YARN)</description>
  </property>
  <!-- Hadoop history server -->
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
    <description>Address of the MR JobHistory Server, which manages job logs</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
    <description>Web address for viewing finished MapReduce jobs on the history server; the service has to be started separately</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.done-dir</name>
    <value>/mr-history/done</value>
    <description>Where the MR JobHistory Server keeps logs of finished jobs. Default: /mr-history/done</description>
  </property>
  <property>
    <name>mapreduce.jobhistory.intermediate-done-dir</name>
    <value>/mr-history/tmp</value>
    <description>Where MapReduce job logs are kept while jobs run. Default: /mr-history/tmp</description>
  </property>
  <property>
    <name>yarn.app.mapreduce.am.staging-dir</name>
    <value>/mr-history/hadoop-yarn/</value>
    <description>Staging directory for the application ID, required JAR files, and so on</description>
  </property>
  <property>
    <name>mapreduce.map.memory.mb</name>
    <value>2048</value>
    <description>Physical memory limit for each map task</description>
  </property>
  <property>
    <name>mapreduce.reduce.memory.mb</name>
    <value>2048</value>
    <description>Physical memory limit for each reduce task</description>
  </property>
</configuration>
```
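One point worth noting: the Hadoop 2.7.7 distribution ships this file only as a template, so create it before editing:
```
# Create mapred-site.xml from the bundled template
cd /home/Amos/SoftWare/hadoop-2.7.7/etc/hadoop
cp mapred-site.xml.template mapred-site.xml
```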
- slaves: list the slave nodes
```
slave1
slave2
```
- yarn-site.xml: configure the ResourceManager ports and log aggregation
```
<configuration>
  <!-- Enable log aggregation -->
  <property>
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>master:8032</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
  <property>
    <name>yarn.resourcemanager.admin.address</name>
    <value>master:8033</value>
  </property>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>master:8088</value>
  </property>
  <property>
    <name>yarn.nodemanager.pmem-check-enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.nodemanager.vmem-check-enabled</name>
    <value>false</value>
  </property>
  <property>
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>86400</value>
  </property>
</configuration>
```
- Remote copy
  - scp: secure copy
    Note: after copying to slave1 and slave2, adjust the related files on each machine and re-apply the environment, for example with source ~/.bash_profile.
```
# Remote copy to slave1 and slave2 (this assumes the host names slave1 and slave2 are already set)
scp -r hadoop-2.7.7/ [email protected]:/home/Amos/SoftWare
```
  - rsync: remote synchronization tool
    Compared with scp, rsync is faster and only transfers the files that differ, whereas scp copies everything.
```
rsync -rvl ./yarn-site.xml root@slave1:/home/amos/test
```
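When the configuration files change later, pushing the whole config directory to both slaves saves repeating the command per file. A minimal sketch, assuming the install path used above and passwordless SSH already in place:
```
# Sync the Hadoop config directory from master to both slaves
for host in slave1 slave2; do
  rsync -rvl /home/Amos/SoftWare/hadoop-2.7.7/etc/hadoop/ ${host}:/home/Amos/SoftWare/hadoop-2.7.7/etc/hadoop/
done
```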
- Formatting & launching
  Important: format the NameNode only before the very first startup. Formatting generates a new cluster ID, so reformatting later leaves the NameNode and DataNodes with mismatched cluster IDs and the cluster can no longer find its old data. If you must reformat, delete the data directories and the logs first.
- Format on the master node
```
bin/hadoop namenode -format
```
- Start the cluster
  Note: if the NameNode and the ResourceManager are not on the same machine, YARN cannot be started on the NameNode host; start YARN on the machine where the ResourceManager runs.
```
sbin/start-all.sh
# Check the running processes (jps is a JDK command)
jps
4464 ResourceManager
4305 SecondaryNameNode
4972 Jps
4094 NameNode
```
- Check the processes with jps
```
# master node:
16747 Jps
16124 NameNode
16493 ResourceManager
16334 SecondaryNameNode
# slave1 node:
10485 DataNode
10729 Jps
10605 NodeManager
# slave2 node:
10521 NodeManager
10653 Jps
10399 DataNode
```
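With the daemons up, the history server configured in mapred-site.xml still has to be started by hand, and the web UIs give a quick sanity check. A sketch, assuming the ports configured above and the Hadoop 2.x default NameNode web port:
```
# Start the MR JobHistory Server on master (job logs then visible at master:19888)
sbin/mr-jobhistory-daemon.sh start historyserver
# Web UIs: HDFS NameNode at http://master:50070, YARN ResourceManager at http://master:8088
```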
2.3 Troubleshooting
- process information unavailable
  This appears when a process is started under one account and killed from another, for example a Java process started by an ordinary user and then killed as root. Clean up the leftover entries:
```
ll /tmp/ | grep hsperfdata
rm -rf /tmp/hsperfdata*
```
- Turn off safe mode
```
hdfs dfsadmin -safemode leave
```
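To see whether HDFS is actually in safe mode before or after running this, the status can be queried with the standard dfsadmin subcommand:
```
# Prints "Safe mode is ON" or "Safe mode is OFF"
hdfs dfsadmin -safemode get
```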
3 Summary
Everything is hard at the beginning, and I have always believed that the initial installation and configuration of any project is the hardest part.
This article summarized installing and configuring CentOS7 on VMware and building a Hadoop cluster. When I first fumbled through this myself I hit all kinds of pitfalls and spent nearly two days (and more hair than I care to count) getting it done. The pitfalls covered here are only part of the story; some are easy enough to look up and are not worth elaborating on.
Identify the problem, think it through, and then try to solve it yourself; the progress will be obvious.
Well, that’s all for today, bye bye ~