This is the 9th day of my participation in the Gengwen Challenge.

Today I officially begin my Hadoop learning journey. The Linux and Shell posts I wrote earlier are the foundation stones for learning big data, but they still need to be mastered as a whole, so that when problems come up you can take targeted action and work more efficiently. Okay, enough rambling, let's get to the main topic: setting up the Hadoop operating environment. If anything in this article is off, I hope you big data practitioners and enthusiasts will correct me or offer better suggestions, so we can grow together on the long road of big data.

Setting up the Hadoop operating environment is a necessary skill for big data developers, and even if you only work alongside a big data team, understanding it will help your growth and development. This article uses CentOS 6.8 as the Linux operating system and covers three topics: (1) preparing the VM environment; (2) installing the JDK and Hadoop; (3) the Hadoop directory structure.

If you already have a virtual machine and CentOS installed on your computer, this step will be So Easy for you. If not, don't worry: see my earlier zero-to-basics guide, Big Data For Linux (Part 1). Cloning a virtual machine is easy.

Then start cloning; it's mostly just clicking Next all the way through. See the screenshot below:


Start the VM, log in as user root, and change the IP address. For details, see the following figure:

Open the terminal and run the vim /etc/udev/rules.d/70-persistent-net.rules command

After opening the file, do three things: (1) delete the original eth0 line (the stale "SUBSYSTEM" entry) by pressing dd; (2) change NAME="eth1" to NAME="eth0"; (3) copy the MAC address (the ATTR{address} value) for later use. The specific results are shown below:
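After the edit, the single remaining line should look roughly like this on CentOS 6 (the MAC address here is a placeholder; yours will be whatever you copied):

    SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="00:0c:29:xx:xx:xx", ATTR{type}=="1", KERNEL=="eth*", NAME="eth0"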

Next, run vim /etc/sysconfig/network-scripts/ifcfg-eth0 to configure the network interface, modifying the content as shown in the figure:

As shown in the following figure, change the circled values to yes and static, and set the IP address and gateway according to your own network settings:
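For reference, a minimal sketch of what ifcfg-eth0 might contain after the edit; HWADDR takes the MAC address you copied earlier, and the IP, gateway, and DNS values are only examples:

    DEVICE=eth0
    TYPE=Ethernet
    HWADDR=00:0c:29:xx:xx:xx
    ONBOOT=yes
    BOOTPROTO=static
    IPADDR=192.168.1.101
    GATEWAY=192.168.1.2
    DNS1=192.168.1.2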

Then run vim /etc/sysconfig/network to set the hostname:
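A minimal example of that file; the hostname hadoop101 is only an illustration:

    NETWORKING=yes
    HOSTNAME=hadoop101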

Run the vim /etc/hosts command to configure the hosts file. The following figure shows the modifications:
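For example, the added entries might look like this (the IP addresses and hostnames are illustrative):

    192.168.1.101 hadoop101
    192.168.1.102 hadoop102
    192.168.1.103 hadoop103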

Finally, restart the network service, reboot the VM, check the IP address, and ping an external host to verify connectivity.
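On CentOS 6 those steps look like this (the ping target is just an example host):

    service network restart
    reboot
    ifconfig                  # confirm the new IP address took effect
    ping -c 4 www.baidu.com   # confirm outbound connectivity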

Creating a regular user is easily done: simply run useradd XXXX followed by passwd XXXX.
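For example, with hadoop as a stand-in username:

    useradd hadoop
    passwd hadoop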

All of the following operations are done over an Xshell connection; that's what I'm used to. You can also work directly in the virtual machine, or use another remote connection tool.

Run vim /etc/sudoers to give the new user sudo privileges.
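Find the root entry and add a matching line for your user below it; hadoop is again the stand-in username:

    ## Allow root to run any commands anywhere
    root    ALL=(ALL)       ALL
    hadoop  ALL=(ALL)       ALL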

When you hear "JDK" you might think of how hard Java is; Java is indeed deep as the sea. Don't worry: here we are only setting up a Java environment. The Hadoop framework is built on Java, so nothing runs without it. Preparation for the JDK installation: (1) create module and software folders under the /opt directory; (2) change the owner of both folders to your regular user. The two directories split responsibilities: software holds the installation packages (the tarballs), and module is where the software actually gets installed, as the sketch below shows.
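A sketch of those two preparation steps, again assuming the hadoop user:

    sudo mkdir /opt/module /opt/software
    sudo chown hadoop:hadoop /opt/module /opt/software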

There are other ways to upload the installation packages, but I personally recommend Xshell with Xftp, which are intuitive and easy to use.

Then, in the /opt directory on Linux, check whether the upload succeeded:
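For example:

    ls /opt/software
    # expect the JDK and Hadoop tarballs, e.g. jdk-8u144-linux-x64.tar.gz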

Run tar -zxvf jdk-8u144-linux-x64.tar.gz -C /opt/module/ to extract the JDK (note the capital -C, which sets the extraction directory).
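After extraction, the JDK still has to be put on the PATH. A minimal sketch, appended to /etc/profile (jdk1.8.0_144 is the directory this tarball unpacks to):

    export JAVA_HOME=/opt/module/jdk1.8.0_144
    export PATH=$PATH:$JAVA_HOME/bin

Then run source /etc/profile and verify with java -version.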


After the JDK installation, the next step is installing Hadoop. The method and process are the same, so I won't say more; refer to the JDK installation above. A short sketch follows the download link below.

Hadoop download address: archive.apache.org/dist/hadoop…
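A sketch of the Hadoop steps, assuming version 2.7.2 (substitute whichever version you downloaded):

    tar -zxvf hadoop-2.7.2.tar.gz -C /opt/module/
    # appended to /etc/profile, same as JAVA_HOME above
    export HADOOP_HOME=/opt/module/hadoop-2.7.2
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin

Run source /etc/profile and verify with hadoop version.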

To learn something, you must first get acquainted with it, so now we need to get to know Hadoop. It's the same as ordinary friendship: you start by recognizing someone, and understanding grows slowly. All right, let's cut to the chase: Hadoop's directory structure.

(1) bin directory: scripts for operating Hadoop-related services (HDFS, YARN)

(2) etc directory: Hadoop configuration files

(3) lib directory: Hadoop's native libraries (data compression and decompression)

(4) sbin directory: scripts for starting and stopping Hadoop-related services

(5) share directory: Hadoop's dependency JAR packages, documentation, and official examples
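For instance, listing the install directory (assuming the 2.7.2 sketch above) shows these entries:

    ls /opt/module/hadoop-2.7.2
    bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share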

I'll keep updating big data and related content, so stay tuned. For more, follow my WeChat public account: Xiao Han senior take you to learn.