Search the Internet for "big data" and you will mostly get walls of theory that leave a beginner completely confused. So I won't walk you through the theoretical knowledge here; instead I'll take you straight to building a distributed development environment.
Hadoop distributed architecture (one master, two slaves)
Host name | IP address | namenode | datanode
---|---|---|---
master | 192.168.6.133 | yes | no
slave1 | 192.168.6.131 | no | yes
slave2 | 192.168.6.132 | no | yes
Step 1
Prepare the VM and Java environment
You need a CentOS 7 VM with the JDK environment already set up. If you have any questions, see Hadoop Tour 1 - CentOS 7: Set up the Java environment.
Step 2
Prepare the software
Get your Hadoop distribution ready:
- Download from the Apache official website
- Download from the Apache archive of historical releases
- Hadoop 2.7.3 is the version I use in this series
- I used FileZilla to upload the archive to the Linux system; you can also use the wget command to download it directly on Linux (see the sketch below)
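If you go the wget route, a minimal sketch follows. The URL assumes the standard layout of the Apache archive for the 2.7.3 release; verify it in a browser before relying on it.

```
# Download Hadoop 2.7.3 from the Apache archive
# (URL assumed from the archive's usual layout; double-check it first)
wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
```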
Step 3
Unzip Hadoop and rename the directory
- Decompress the Hadoop package in the download directory

```
[root@localhost mmcc]# tar -zxvf hadoop-2.7.3.tar.gz
# Rename the directory (optional)
[root@localhost mmcc]# mv hadoop-2.7.3/ hadoop2.7.3
```
- View the Hadoop root path

```
[root@localhost mmcc]# cd hadoop2.7.3/
[root@localhost hadoop2.7.3]# pwd
/home/mmcc/hadoop2.7.3      # this path is used when configuring environment variables
```
Step 4
Configure environment variables
- At the bottom of /etc/profile, next to the PATH and CLASSPATH configuration from Hadoop Tour 1 - CentOS 7: Set up the Java environment, add:

```
HADOOP_HOME=/home/mmcc/hadoop2.7.3
PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$JAVA_HOME/bin:$PATH:.
```
- Reload the environment variables

```
[root@localhost jdk1.8]# source /etc/profile
```
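As a quick sanity check (assuming the paths above match your layout), the hadoop command should now resolve from anywhere:

```
[root@localhost jdk1.8]# hadoop version
# expected to report Hadoop 2.7.3 along with build details
```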
- Configure Hadoop's Java environment: edit the hadoop-env.sh script under etc/hadoop/ in the Hadoop root directory

```
[root@localhost mmcc]# vi hadoop2.7.3/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/home/mmcc/jdk1.8      # point Hadoop at the JDK directory
```
- Configure the Hadoop startup environment: edit the core-site.xml file under etc/hadoop/ in the Hadoop root directory

```
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
</property>
```
The hostname master used here will be explained in step 5 below.
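For reference, here is a minimal sketch of the whole file after the edit. The property has to live inside the <configuration> element that the file already contains, and hdfs://master:9000 depends on the hostname mapping configured in step 5.

```
<?xml version="1.0" encoding="UTF-8"?>
<!-- core-site.xml: minimal sketch for this tutorial's setup -->
<configuration>
    <!-- Default filesystem: the NameNode RPC address on the master host -->
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://master:9000</value>
    </property>
</configuration>
```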
Step 5
Build the distributed environment
- For convenience, use VM cloning to create the other nodes; that way every node inherits all of the environment configured before this step
- Use this command to set a hostname on each node (master, slave1, and slave2 respectively)

```
[root@localhost mmcc]# hostnamectl set-hostname master    # use slave1 / slave2 on the other nodes
```
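To confirm the change took effect, hostnamectl can print the current settings:

```
[root@localhost mmcc]# hostnamectl status
# the "Static hostname" line should now show master (or slave1/slave2)
```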
- Check the network

```
[root@localhost mmcc]# ifconfig
ens33: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.6.133  netmask 255.255.255.0  broadcast 192.168.6.255
        inet6 fe80::3d1d:5127:6666:c62d  prefixlen 64  scopeid 0x20<link>
        ether 00:0c:29:f4:ef:5d  txqueuelen 1000  (Ethernet)
        RX packets 317168  bytes 315273916 (300.6 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 149675  bytes 14400069 (13.7 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0

lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 1  (Local Loopback)
        RX packets 12826  bytes 3163428 (3.0 MiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 12826  bytes 3163428 (3.0 MiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
```
If no IP address shows up, configure the network:

```
cd /etc/sysconfig/network-scripts/
vi ifcfg-ens33      # interface name on my VM version; other versions may differ
ONBOOT="yes"        # means the interface comes up when the network starts
```
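If you also want each node to keep the fixed address from the table at the top, here is a minimal static-IP sketch for ifcfg-ens33. The GATEWAY and DNS1 values are assumptions for a typical 192.168.6.0/24 VM network; adjust them to your own setup.

```
# /etc/sysconfig/network-scripts/ifcfg-ens33 (sketch; some values assumed)
TYPE="Ethernet"
NAME="ens33"
DEVICE="ens33"
ONBOOT="yes"              # bring the interface up at boot
BOOTPROTO="static"        # fixed address instead of DHCP
IPADDR="192.168.6.133"    # use 192.168.6.131 / .132 on the slaves
NETMASK="255.255.255.0"
GATEWAY="192.168.6.2"     # assumed; check your VM network settings
DNS1="192.168.6.2"        # assumed
```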
- Set the hostname mapping, i.e. which name corresponds to each IP address, so that addresses such as

hdfs://master:9000

resolve. Edit /etc/hosts:

```
[root@localhost network-scripts]# vi /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.6.133 master
192.168.6.131 slave1
192.168.6.132 slave2
```
Restart the network

```
service network restart    # apply the new configuration
```
Then try ping master/slave1/slave2 from each node; if the pings succeed, the configuration works.
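For example (the -c flag caps the packet count so the command exits on its own):

```
[root@localhost mmcc]# ping -c 3 slave1
# replies from 192.168.6.131 mean the /etc/hosts mapping works
```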
- Format HDFS by running the command below. (Strictly speaking only the master needs it, since formatting applies to the NameNode; running it on the slaves does nothing useful.)

```
hdfs namenode -format
```

Format before the first startup. If the output contains no error or exception, the format succeeded.
Step 6
Configure the cluster's slave nodes on the master host

```
[root@localhost mmcc]# cd /home/mmcc/hadoop2.7.3/etc/hadoop
[root@localhost hadoop]# vi slaves
# add the following content
slave1
slave2
```
Step 7
Disable the firewall on each node, then start the HDFS services.

```
[root@localhost mmcc]# systemctl stop firewalld
[root@localhost mmcc]# hadoop-daemon.sh start namenode    # on the master node
[root@localhost mmcc]# hadoop-daemon.sh start datanode    # on the slave nodes slave1 and slave2
```
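To verify that the daemons actually came up, a quick check (jps ships with the JDK; output will vary):

```
[root@localhost mmcc]# jps
# master should list a NameNode process; slave1/slave2 should list a DataNode
[root@localhost mmcc]# hdfs dfsadmin -report
# run on the master: reports the live datanodes once they have registered
```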
Then you can open master:50070 (or the IP address followed by :50070) in a browser to view the cluster overview and the status of each node. At this point, a distributed Hadoop environment has been successfully started. In the next section you will learn how to set up passwordless SSH login, start the whole cluster with one command, and use some simple HDFS file storage commands. If anything goes wrong during configuration, check the logs to troubleshoot. Welcome to add me on WeChat so we can learn and make progress together.