This is the 15th day of my participation in the Gwen Challenge.

The previous installment of this Hadoop series gave an overview of HDFS and explained its common shell commands. Today's content is HDFS client operations (a development focus), covering: (1) HDFS client environment preparation; (2) HDFS API operations; (3) HDFS I/O stream operations.

1. Copy the Hadoop package compiled for your PC's operating system to a path without Chinese characters (for example, D:\software\hadoop-2.7.2), as shown in the following figure.

2. Configure the HADOOP_HOME environment variable, as shown in the figure.

3. Configure the Path environment variable, as shown in the figure.
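As a rough sketch of steps 2 and 3 (assuming the unzip path from step 1), the Windows environment variables would look like this:

```text
HADOOP_HOME = D:\software\hadoop-2.7.2
Path        = <existing entries>;%HADOOP_HOME%\bin
```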

The references for steps 1 to 3 are summarized as follows:

===================

Reference 1: Configuring the JDK with IntelliJ IDEA:

blog.csdn.net/nobb111/art…

Reference 2: Maven environment configuration and Maven deployment in IntelliJ IDEA

www.cnblogs.com/sigm/p/6035…

blog.csdn.net/tigaobanson…

blog.csdn.net/weixin_392…

4. Create a Maven project HdfsClientDemo

Reference 1: Create a Maven project HdfsClientDemo

blog.csdn.net/weixin_398…

www.cnblogs.com/clicklin/p/…
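A sketch of the key pom.xml dependencies for such a project, matching the Hadoop 2.7.2 version used above (the exact dependency list may differ from the original article; the versions here are assumptions):

```xml
<dependencies>
    <!-- Hadoop client APIs used by the examples in this article -->
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-client</artifactId>
        <version>2.7.2</version>
    </dependency>
    <!-- Unit testing -->
    <dependency>
        <groupId>junit</groupId>
        <artifactId>junit</artifactId>
        <version>4.12</version>
        <scope>test</scope>
    </dependency>
    <!-- Logging backend for the log4j.properties described below -->
    <dependency>
        <groupId>log4j</groupId>
        <artifactId>log4j</artifactId>
        <version>1.2.17</version>
    </dependency>
</dependencies>
```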

Note: If Eclipse/IDEA cannot print logs and only warnings are shown on the console, configure log4j as follows.

Create a new file named “log4j.properties” in the src/main/resources directory of your project and fill it with the following contents:
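A commonly used log4j.properties for Hadoop client projects looks like the sketch below (the log file path is an assumption and can be adjusted):

```properties
# Root logger: INFO level, output to console and to a file
log4j.rootLogger=INFO, stdout, logfile

# Console appender
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n

# File appender (path is an assumption; adjust as needed)
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/hdfs-client.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n
```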

The user name needs to be configured at runtime. When a client operates HDFS, it acts under a user identity. By default, the HDFS client API reads a JVM parameter as that identity: -DHADOOP_USER_NAME=xuefa, where xuefa is the user name.
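Alternatively, the user can be passed directly when obtaining the FileSystem object. A minimal sketch, assuming the NameNode RPC address is hdfs://192.168.220.132:9000 (adjust to your own fs.defaultFS) and the user is xuefa:

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsClientDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Pass the user name explicitly instead of relying on -DHADOOP_USER_NAME.
        // The NameNode address is an assumption; adjust to your cluster.
        FileSystem fs = FileSystem.get(
                new URI("hdfs://192.168.220.132:9000"), conf, "xuefa");

        // A simple call to verify that the connection and identity work
        fs.mkdirs(new Path("/user/xuefa/test"));
        fs.close();
    }
}
```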

You can browse the HDFS file system in the NameNode web UI: http://192.168.220.132:50070/explorer.html#/

2. Copy hdfs-site.xml to the project's resources directory (the classpath root).

Parameter priority: (1) values set in client code > (2) user-defined configuration files on the classpath (such as hdfs-site.xml under resources) > (3) the server's default configuration.
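A minimal sketch of how to observe this priority, reusing the cluster address and user assumed above: set dfs.replication in code and upload a file; the in-code value should win over the one in the resources hdfs-site.xml and over the server default.

```java
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ParamPriorityDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // (1) Highest priority: value set in client code
        conf.set("dfs.replication", "2");

        // Cluster address and user are assumptions; adjust to your environment
        FileSystem fs = FileSystem.get(
                new URI("hdfs://192.168.220.132:9000"), conf, "xuefa");

        // Upload a local file, then check its replication factor in the web UI.
        // It should be 2, overriding (2) hdfs-site.xml on the classpath
        // and (3) the server-side default.
        fs.copyFromLocalFile(new Path("e:/banhua.txt"), new Path("/banhua.txt"));
        fs.close();
    }
}
```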

1. If IDEA shows source code without comments, refer to the following article for the settings:

www.cnblogs.com/tdyang/p/11…

2. IntelliJ IDEA error “Java release 5 is not supported”:

blog.csdn.net/qq_4258320…

3. For details about the Maven settings.xml configuration, see the following link:

www.cnblogs.com/dalianpai/p…

No.3 HDFS I/O Stream Operations

The APIs we used above to operate HDFS are packaged by the framework. What if we want to implement those operations ourselves with raw I/O streams?

3.1 Uploading HDFS Files


1. Requirement: upload the banhua.txt file on drive E to the HDFS root directory.
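A minimal stream-based upload sketch, reusing the cluster address and user assumed earlier:

```java
import java.io.File;
import java.io.FileInputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class PutFileToHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Cluster address and user are assumptions; adjust to your environment
        FileSystem fs = FileSystem.get(
                new URI("hdfs://192.168.220.132:9000"), conf, "xuefa");

        // 1. Input stream on the local file
        FileInputStream fis = new FileInputStream(new File("e:/banhua.txt"));
        // 2. Output stream to the HDFS target path
        FSDataOutputStream fos = fs.create(new Path("/banhua.txt"));
        // 3. Copy bytes from the local stream to the HDFS stream
        IOUtils.copyBytes(fis, fos, conf);

        // 4. Release resources
        IOUtils.closeStream(fos);
        IOUtils.closeStream(fis);
        fs.close();
    }
}
```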

3.2 Downloading HDFS Files

1. Requirement: download the banhua.txt file from HDFS to the local drive E.
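The reverse of the upload sketch above, under the same assumptions:

```java
import java.io.File;
import java.io.FileOutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class GetFileFromHdfs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Cluster address and user are assumptions; adjust to your environment
        FileSystem fs = FileSystem.get(
                new URI("hdfs://192.168.220.132:9000"), conf, "xuefa");

        // 1. Input stream on the HDFS file
        FSDataInputStream fis = fs.open(new Path("/banhua.txt"));
        // 2. Output stream to the local target file
        FileOutputStream fos = new FileOutputStream(new File("e:/banhua.txt"));
        // 3. Copy bytes from HDFS to the local file
        IOUtils.copyBytes(fis, fos, conf);

        // 4. Release resources
        IOUtils.closeStream(fos);
        IOUtils.closeStream(fis);
        fs.close();
    }
}
```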

3.3 Positioned (Seek) File Reading

1. Requirement: read a large file in HDFS in chunks, for example /hadoop-2.7.2.tar.gz in the root directory.
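A minimal two-step sketch under the same cluster and user assumptions: part 1 copies roughly the first block (128 MB, the default block size), part 2 seeks past it and copies the remainder. This is tutorial-style code, not production code (it assumes each read fills the buffer):

```java
import java.io.File;
import java.io.FileOutputStream;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;

public class ReadFileInChunks {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Cluster address and user are assumptions; adjust to your environment
        FileSystem fs = FileSystem.get(
                new URI("hdfs://192.168.220.132:9000"), conf, "xuefa");
        Path src = new Path("/hadoop-2.7.2.tar.gz");

        // Part 1: copy roughly the first 128 MB into part1
        FSDataInputStream fis = fs.open(src);
        FileOutputStream fos1 = new FileOutputStream(
                new File("e:/hadoop-2.7.2.tar.gz.part1"));
        byte[] buf = new byte[1024];
        for (int i = 0; i < 1024 * 128; i++) { // 128 * 1024 reads of 1 KB = 128 MB
            fis.read(buf);
            fos1.write(buf);
        }
        IOUtils.closeStream(fos1);
        IOUtils.closeStream(fis);

        // Part 2: reopen, seek past the first 128 MB, copy the rest into part2
        fis = fs.open(src);
        fis.seek(1024L * 1024 * 128);
        FileOutputStream fos2 = new FileOutputStream(
                new File("e:/hadoop-2.7.2.tar.gz.part2"));
        IOUtils.copyBytes(fis, fos2, conf);

        IOUtils.closeStream(fos2);
        IOUtils.closeStream(fis);
        fs.close();
    }
}
```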

In a Windows command window, go to the E:\ directory and run the following command to merge the two parts:

type hadoop-2.7.2.tar.gz.part2 >> hadoop-2.7.2.tar.gz.part1

After the merge, rename hadoop-2.7.2.tar.gz.part1 to hadoop-2.7.2.tar.gz. The renamed tar package decompresses without errors, which confirms that the chunked reads together contain the complete file.

That's the end of today's content. Keep learning, and stay tuned! For more content, follow the official account: Xiao Han senior takes you to learn.