Preface
Hadoop's command line is used frequently by administrators and developers, so familiarity with these commands is a must.
If you are not sure what a command will do, run it in the development environment rather than in production, because the cost of a fault there cannot be estimated.
HDFS commands fall into user commands, administrator commands, and daemon commands.
The hadoop fs and hdfs dfs commands are equivalent, so this article mostly uses hdfs commands to manage and operate Hadoop. A few subcommands are only available through hadoop rather than hdfs, so those use hadoop instead, for example distcp for migrating data across clusters and archive for packing small files.
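For example, the two commands below are interchangeable and list the HDFS root directory (a minimal illustration, assuming a running cluster):

# hadoop fs and hdfs dfs are equivalent for file system operations
hadoop fs -ls /
hdfs dfs -ls /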
🚀🌅😅 User commands
User commands can be further divided into the following categories
Viewing Cluster Information
# Print the classpath required by the Hadoop jars and libraries
hdfs classpath
# Print the computed Hadoop environment variables
hdfs envvars
# List the namenodes of the cluster
hdfs getconf -namenodes
# Show which groups a user (here root) belongs to
hdfs groups root
HDFS File Operations
Common commands for adding, deleting, modifying, and querying HDFS files
# List the files in a directory
hadoop fs -ls /data
# View file content page by page (drop "| more" to print everything at once)
hadoop fs -cat /wordcount/output/part-r-00000 | more
# Create nested directories in one go
hadoop fs -mkdir -p /aaa/bbb/cc/dd
# Append a local file to the end of an existing HDFS file
hadoop fs -appendToFile ./hello.txt /hello.txt
# Show the end of a file
hadoop fs -tail /weblog/access_log.1
# -chgrp, -chmod and -chown work as in Linux, for example:
hadoop fs -chmod 666 /hello.txt
hadoop fs -chown someuser:somegrp /hello.txt
# Copy a file from one HDFS path to another
hadoop fs -cp /aaa/jdk.tar.gz /bbb/jdk.tar.gz.2
# Move a file within HDFS
hadoop fs -mv /aaa/jdk.tar.gz /
# Upload a local file to HDFS
hadoop fs -put ./jdk.tar.gz /bbb/jdk.tar.gz.2
# Delete a file or directory
hadoop fs -rm -r -f /aaa/bbb/
# Show the free space of the file system
hadoop fs -df -h /
# Show the size of a directory
hadoop fs -du -s -h /aaa
# Set the replication factor of a file (the value is only recorded in namenode metadata;
# whether that many replicas really exist depends on the number of datanodes)
hadoop fs -setrep 3 /aaa/jdk.tar.gz
# Count the directories, files and bytes under a path
hadoop fs -count -v /test2
# Empty the trash
hadoop fs -expunge
# Merge several HDFS files and download them as a single local file
hadoop fs -getmerge /data/log.* ./log.sum
- When a file larger than the block size is uploaded, HDFS stores it as multiple blocks. Externally it appears as a single file, but internally it consists of several blocks that may sit on different nodes.
- HDFS places no strict restrictions on users and groups: you can set any user or group, even one that does not exist.
- When the amount of data written exceeds the configured block size (128 MB by default), a new socket and output stream are opened and the data is written to another block file. As a result, different blocks of the same file may end up on different cluster nodes, and each block is replicated according to the configured replication factor (3 by default).
- If the replication factor is larger than the number of datanodes, only as many replicas as there are datanodes are stored; a single datanode never holds more than one replica of the same block.
- The replica count shown in the namenode web console is the value recorded in the metadata; as explained above, it may deviate from the actual number of replicas. You can query the JMX interface directly to check: curl -s http://namenode:9870/jmx | grep BlocksTotal (see also the sketch after this list).
- The HDFS access port of the namenode is 9000 by default; clusters commonly use 8020.
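A quick way to check the defaults and metadata values mentioned in the bullets above is sketched below; the namenode hostname and port are placeholders for your own cluster:

# Show the configured block size and replication factor
hdfs getconf -confKey dfs.blocksize
hdfs getconf -confKey dfs.replication
# Query the namenode JMX interface for the block count recorded in metadata
curl -s http://namenode:9870/jmx | grep BlocksTotal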
Check the HDFS file system
# Check the health of the entire file system
hdfs fsck /
# Show the block locations for a given path
hdfs fsck /user/hue -files -blocks -locations
# Show detailed replica information
hdfs fsck /user/hue -files -blocks -replicaDetails
# Show the rack location of each block
hdfs fsck /user/hue -files -blocks -locations -racks
# Show files that are currently open for write
hdfs fsck /user -openforwrite
# List corrupt file blocks
hdfs fsck /test2 -list-corruptFileBlocks
# Delete corrupted files
hdfs fsck /user/hue -delete
Block statistics
Count the blocks under each directory; the second column of the output is taken as the block count.
# Per-directory counts
hdfs dfs -count /user/*
# Check whether the total block count of the file system matches the numbers above
hdfs fsck / -blocks
Block recovery
# If a block is lost, recover the lease on the file so the data can be recovered
hdfs debug recoverLease -path /user/hue -retries 5
Data backup
Snapshot backup
# Allow snapshots on a directory
hdfs dfsadmin -allowSnapshot /data
# Disallow snapshots on a directory
hdfs dfsadmin -disallowSnapshot /data
# Create a snapshot
hdfs dfs -createSnapshot /data data-snapshot-${Date}
# Delete a snapshot
hdfs dfs -deleteSnapshot /data data-snapshot-${Date}
# Rename a snapshot
hdfs dfs -renameSnapshot /data <oldName> <newName>
# Show the differences between two snapshots
hdfs snapshotDiff /data <fromSnapshot> <toSnapshot>
Migration across clusters
# Migrate data from cluster nn1 to cluster nn2
hadoop distcp hdfs://nn1:9820/foo/a hdfs://nn2:9820/foo/
# Overwrite files that already exist at the target
hadoop distcp -overwrite hdfs://nn1:9820/source/first hdfs://nn1:9820/source/second hdfs://nn2:9820/target
# Copy only files that are missing or changed at the target
hadoop distcp -update hdfs://nn1:9820/source/first hdfs://nn1:9820/source/second hdfs://nn2:9820/target
# Same as -update, and also delete target files that no longer exist at the source
hadoop distcp -update -delete hdfs://nn1:9820/source/first hdfs://nn1:9820/source/second hdfs://nn2:9820/target
Edit log
When file system clients perform write operations (changing files, changing permissions, and so on), these transactions are first recorded in the edits log.
In extreme cases, you can edit the edits log to change the offending operation (for example, turning a delete into a permission change) and avoid putting the service at risk.
# Convert a binary edits file to XML so it can be inspected or modified
hdfs oev -i edits_xxxxxx -o edits_xxxxxx.xml
# Convert the modified XML back to the binary format
hdfs oev -i edits_xxxxxx.xml -o edits_xxxxxx -p binary
edits_inprogress
edits_inprogress records the operations currently being processed; when it is rolled, it produces edits files, which are numbered consecutively.
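As a rough illustration, the rolled edits files and the current edits_inprogress file can be seen directly in the namenode data directory; the PATH/dfs/nn path below follows the convention used elsewhere in this article and will differ per installation:

# List the edit log segments in the namenode's current directory
ls /PATH/dfs/nn/current/ | grep edits
# File names look like edits_<start-txid>-<end-txid> and edits_inprogress_<start-txid>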
fsimage
A permanent checkpoint of HDFS metadata that contains the serialization information (ID, type, directory, owning user, permissions, timestamps, ...) of every HDFS directory and file inode. It is all stored in fsimage, loaded into memory when the namenode starts, and updated periodically by checkpointing.
# Dump an fsimage file into readable form with the offline image viewer (the XML processor is one common choice)
hdfs oiv -p XML -i fsimage_0000000000000004588 -o fsimage_0000000000000004588.xml
🚀🌅😅 Administrator commands
Data balancing
# Keep the disk usage difference between datanodes within 5 percent.
# The default threshold is 10; with many nodes, 10 is also acceptable.
hdfs balancer -threshold 5
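Not part of the original command above, but often run together with the balancer: the bandwidth each datanode may spend on balancing can be capped so the job does not saturate the network (100 MB/s below is only an example value):

# Limit balancer traffic per datanode to 100 MB/s (the value is in bytes per second)
hdfs dfsadmin -setBalancerBandwidth 104857600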
Cluster management
# Report the basic state of the file system and cluster
hdfs dfsadmin -report
# Enter safe mode (HDFS becomes read-only and cannot be written)
hdfs dfsadmin -safemode enter
# Leave safe mode
hdfs dfsadmin -safemode leave
# Save the namenode's main data structures to a file in the log directory
hdfs dfsadmin -metasave 20210914
cat /var/log/hadoop-hdfs/20210914
# Fetch the latest fsimage from the namenode to a local directory
hdfs dfsadmin -fetchImage <local directory>
# Roll the edit log (normally there is no need to run this)
hdfs dfsadmin -rollEdits
High availability cluster management
serviceId is the namenode ID shown in the HDFS UI.
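If you are not sure what the serviceIds are, one way to look them up is to read the HA configuration; the nameservice name is a placeholder for your own:

# List the namenode IDs configured for a nameservice
hdfs getconf -confKey dfs.ha.namenodes.<nameservice>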
# Check the health of a namenode
hdfs haadmin -checkHealth <serviceId>
# Manually switch a namenode to Active or Standby (no fencing is performed, so this should rarely be used)
hdfs haadmin [-transitionToActive | -transitionToStandby] <serviceId>
# Initiate a failover between two namenodes
hdfs haadmin -failover <serviceId> <serviceId>
- A failover is initiated from the first NameNode to the second. If the first NameNode is in Standby state, this command simply transitions the second to Active and does not report an error. If the first NameNode is Active, an attempt is made to gracefully transition it to Standby. If that fails, the fencing methods (configured by dfs.ha.fencing.methods) are tried in order until one succeeds; only after this does the second NameNode transition to Active. If no fencing method succeeds, the second NameNode is not transitioned to Active and an error is returned.
# Get the current state (Active or Standby) of a namenode
hdfs haadmin -getServiceState <serviceId>
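Putting the HA commands together, a typical manual failover could look like the sketch below; nn1 and nn2 are hypothetical serviceIds, so substitute the IDs from your own configuration:

# Check which namenode is currently active
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
# Fail over from nn1 to nn2
hdfs haadmin -failover nn1 nn2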
Initializing the namenode
# Initialize the shared edits directory for HA; if the PATH/dfs/nn/current/ directory is not empty, an error is reported
hdfs namenode -initializeSharedEdits
Verify datanode metadata and meta-files
This command is rarely needed in everyday use.
hdfs debug verifyMeta -meta /PATH/dfs/dn/current/BP-1796118229-10.101.26.-1600740369277/current/finalized/subdir0/subdir0/blk_1073742076_1252.meta -block /PATH/dfs/dn/current/BP-1796118229-10.101.26.-1600740369277/current/finalized/subdir0/subdir0/blk_1073742076
Small file archiving
One way to manage blocks is to archive a large number of small files under a directory into a har file, which reduces the number of blocks and eases the pressure on the namenode.
# Archive the files under /user/xxx/old into data.har under /user/xxx/new
hadoop archive -archiveName data.har -p /user/xxx/old /user/xxx/new
# The archive is visible as an ordinary HDFS file
hadoop fs -ls /user/xxx/new/data.har
# List the contents of the archive through the har:// scheme
hdfs dfs -ls har:///user/xxx/new/data.har
# View a file inside the archive
hdfs dfs -cat har:///user/xxx/new/data.har/99.txt
# Extract the archive by copying its contents out
hadoop fs -cp har:///user/xxx/new/data.har/* /test10
🚀🌅😅 Daemon commands
Daemon commands start the individual cluster component services.
Once the cluster has been set up and started, these commands are rarely needed; it is enough just to know them.
When starting a service, these commands read the corresponding configuration from environment variables.
# Start a journalnode (JN)
hdfs journalnode
# Start a namenode (NN)
hdfs namenode
# Start the NFS3 gateway (exposes the HDFS file system so it can be mounted locally)
hdfs nfs3
# Start the HttpFS server
hdfs httpfs
# Start the KMS (key management service, not started by default)
hadoop kms
Reference
This blogger's command descriptions are more detailed, the layout is very good, and the table-of-contents links are very convenient: www.cnblogs.com/shudazhaofe…