Preface:

In the last article, we briefly introduced the three core components of Hadoop: MapReduce, YARN, and HDFS. Since we want to learn Hadoop, the first problem to solve is the running environment. After all, I personally think the best way to learn programming is to type code while reading. If you only read, the theoretical parts are fine, but if you never actually type out the code-heavy material, you run into a familiar problem: you feel like you understand it, yet the program errors out the moment you write it yourself, and overall learning efficiency suffers.

Enough preamble; let's get to it.

The virtual machine:

First, before you start learning Hadoop, you need a virtual machine. Of course, if you happen to have deep pockets, you can buy your own server, or later buy a group of servers as a cluster, which makes learning much easier.

Given that we will be building a cluster and may need to run multiple Linux hosts at the same time, virtual machines are a good choice, especially for those of us whose primary purpose is learning. This does require a reasonably capable computer, though: in my case, a desktop with an i5 and 8 GB of RAM still struggles a bit when running three virtual machines.

Create a new user, in this case hanshu, and configure the hanshu user with root privileges.
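For reference, a minimal sketch of those two steps on CentOS 7 (run as root; the username hanshu matches the article, and the initial password here is a placeholder you should change):

```shell
# Minimal sketch, assuming CentOS 7; run as root.
if [ "$(id -u)" -eq 0 ]; then
    useradd hanshu                          # create the user
    echo 'hanshu:changeme' | chpasswd       # placeholder password, change it
    # Grant sudo rights via a sudoers drop-in file
    # (safer than editing /etc/sudoers directly; visudo also works).
    echo 'hanshu ALL=(ALL) ALL' > /etc/sudoers.d/hanshu
    chmod 440 /etc/sudoers.d/hanshu
else
    echo "run this as root"
fi
```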

Create two folders, module and software, under /opt:

cd /opt
sudo mkdir module
sudo mkdir software

Change the owner of the module and software folders to hanshu:

sudo chown hanshu:hanshu module/ software/

At this point, the basic preparation of our virtual machine is already complete.

Setting up the Java environment:

The Linux distribution we use this time is CentOS 7. CentOS 7 ships with a Java environment by default, but its OpenJDK package does not include the Java monitoring command jps. There are two ways to solve this: the first is to uninstall the original OpenJDK and reinstall a full JDK; the second is to install the JDK development package through yum.
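For completeness, the first approach (removing the bundled OpenJDK packages) looks roughly like this. The sketch only prints the removal commands so you can review them before running them as root; package names vary by image:

```shell
# Sketch of solution 1: find the bundled OpenJDK packages and print
# the removal commands (review them, then run by hand as root).
rpm -qa 2>/dev/null | grep -i openjdk | while read -r pkg; do
    echo "rpm -e --nodeps $pkg"
done
```

In this article we take the second approach instead, since a single yum command is enough to get jps.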

First, let's check the OpenJDK version that comes with the system:

rpm -qa | grep openjdk

# Install the JDK 1.8 development package via yum

sudo yum install -y java-1.8.0-openjdk-devel

The third step is to add the Java environment variables to /etc/profile. I will not list the specific steps here; at the end of this section I post my /etc/profile content for your reference.
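To make concrete what those export lines end up doing, here is a small illustration of how they compose PATH. The JDK path is the typical CentOS 7 OpenJDK location and may differ on your machine:

```shell
# Illustration only: how the /etc/profile export lines extend PATH.
JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk     # typical CentOS 7 location
HADOOP_HOME=/opt/module/hadoop-2.7.2
PATH=$JAVA_HOME/bin:$PATH                     # Java tools first
PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin # Hadoop tools appended
# Show only the entries we just added:
echo "$PATH" | tr ':' '\n' | grep -E 'java|hadoop'
```

Once these lines are in /etc/profile, every login shell can find java, jps, and the Hadoop scripts without typing full paths.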

Hadoop installation:

The first step is to download Hadoop. I chose Hadoop version 2.7.2 here. I know many friends may ask:

Hadoop3.x is already out, so why not use 3.x?

What I want to say here is that the cost of updating your knowledge is low: once you have learned one version, moving to the next goes quickly, just as someone who knows Java 1.6 can pick up Java 1.8 in no time. As far as I know, Hadoop 2.x is still the most widely used version in enterprises; after all, what enterprises pursue is stability. That said, Hadoop 3.x will certainly be the trend in the future.

Hadoop download address:

archive.apache.org/dist/hadoop…

Use Xshell or another Linux terminal management tool to upload the downloaded Hadoop installation package to the /opt/software directory we created above.

Decompress the package to the /opt/module directory:

tar -zxvf hadoop-2.7.2.tar.gz -C /opt/module/

Add Hadoop to the environment variable:

I'm not going to walk through editing /etc/profile step by step either. My /etc/profile looks like this:

##JAVA_HOME
export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk
export JRE_HOME=$JAVA_HOME/jre
export CLASSPATH=$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH

##HADOOP_HOME
export HADOOP_HOME=/opt/module/hadoop-2.7.2
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin

Run the following command to make the configuration take effect:

source /etc/profile

Run the hadoop version command on the terminal to check whether Hadoop is installed successfully:

[hanshu@hadoop100 ~]$ hadoop version
Hadoop 2.7.2

If the Hadoop version information is displayed, the Hadoop running environment has been configured successfully.
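If you want an extra smoke test beyond `hadoop version`, the official single-node setup guide runs one of the bundled MapReduce examples in local (standalone) mode. A sketch, guarded so it only runs where Hadoop was actually unpacked to the path used above:

```shell
# Optional smoke test in local (standalone) mode, adapted from the
# official single-node setup guide; paths assume the install location
# used in this article.
if [ -d /opt/module/hadoop-2.7.2 ]; then
    cd /opt/module/hadoop-2.7.2
    mkdir input
    cp etc/hadoop/*.xml input
    # Run the bundled "grep" example over the config files.
    bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar \
        grep input output 'dfs[a-z.]+'
    cat output/*
else
    echo "hadoop-2.7.2 not found under /opt/module"
fi
```

If the job runs and prints matched lines into output/, MapReduce itself works on this machine, independent of any cluster configuration.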

Hadoop directory structure:

Like Java, Hadoop has a clear directory structure for the content it ships. Here are a few important directories and what they do:

  • bin directory: stores scripts for operating Hadoop-related services (HDFS, YARN).
  • etc directory: Hadoop's configuration directory, storing its configuration files.
  • lib directory: Hadoop's local (native) libraries, used for data compression and decompression.
  • sbin directory: stores scripts for starting and stopping Hadoop-related services.
  • share directory: stores Hadoop's dependency jars, documentation, and official examples, such as wordcount.

Finally, the technical summary:

In today's article, we briefly walked through the configuration of the basic Hadoop running environment. Many of the operations really are fundamentals, such as checking file directories, configuring environment variables, and using the vim editor. These should be basic skills for any Java programmer, so I did not describe them in great detail; of course, if anything is unclear, you can look up the relevant material on Google or Baidu. The whole configuration is not complicated. In the next section, we will set up a Hadoop pseudo-distributed environment by modifying Hadoop's configuration files. After I finish my exam on Saturday, the update frequency should settle into a pace of one post every two days. The winter holiday is coming soon, and I'm afraid my years-old laptop won't be able to run a cluster.

Thank you very much for reading this, your support and attention is my motivation to continue high-quality sharing.

The relevant code has been uploaded to my GitHub. Make sure you hit "Star"!

The road ahead is long; would you spare a star for the journey?

Hanshu development notes

Feel free to like and follow me; good things will come of it (kidding).