Preface
Hadoop's command line is used frequently by administrators and developers, so familiarity with these commands is a must.
If you are not sure what a command will do, run it in the development environment rather than in production, because the cost of a fault there cannot be estimated.
HDFS commands fall into user commands, administrator commands, and daemon commands.
The hadoop fs and hdfs dfs commands are equivalent, so this article mostly uses hdfs commands to manage and operate Hadoop. A few subcommands are only available through hadoop rather than hdfs, so those use hadoop instead, for example distcp for migrating data across clusters and archive for packing small files.
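For example, the two commands below are interchangeable and list the HDFS root directory (a minimal illustration, assuming a running cluster):

# hadoop fs and hdfs dfs are equivalent for file system operations
hadoop fs -ls /
hdfs dfs -ls /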
🚀🌅😅 User commands
User commands can be further divided into the following categories
Viewing Cluster Information
# Print the classpath required by the Hadoop jars and libraries
hdfs classpath
# Print the computed Hadoop environment variables
hdfs envvars
# List the namenodes of the cluster
hdfs getconf -namenodes
# Show which groups a user (here root) belongs to
hdfs groups root
HDFS File Operations
Common commands for adding, deleting, modifying, and querying HDFS files
# List the files in a directory
hadoop fs -ls /data
# View file content page by page (drop "| more" to print everything at once)
hadoop fs -cat /wordcount/output/part-r-00000 | more
# Create nested directories in one go
hadoop fs -mkdir -p /aaa/bbb/cc/dd
# Append a local file to the end of an existing HDFS file
hadoop fs -appendToFile ./hello.txt /hello.txt
# Show the end of a file
hadoop fs -tail /weblog/access_log.1
# -chgrp, -chmod and -chown work as in Linux, for example:
hadoop fs -chmod 666 /hello.txt
hadoop fs -chown someuser:somegrp /hello.txt
# Copy a file from one HDFS path to another
hadoop fs -cp /aaa/jdk.tar.gz /bbb/jdk.tar.gz.2
# Move a file within HDFS
hadoop fs -mv /aaa/jdk.tar.gz /
# Upload a local file to HDFS
hadoop fs -put ./jdk.tar.gz /bbb/jdk.tar.gz.2
# Delete a file or directory
hadoop fs -rm -r -f /aaa/bbb/
# Show the free space of the file system
hadoop fs -df -h /
# Show the size of a directory
hadoop fs -du -s -h /aaa
# Set the replication factor of a file (the value is only recorded in namenode metadata;
# whether that many replicas really exist depends on the number of datanodes)
hadoop fs -setrep 3 /aaa/jdk.tar.gz
# Count the directories, files and bytes under a path
hadoop fs -count -v /test2
# Empty the trash
hadoop fs -expunge
# Merge several HDFS files and download them as a single local file
hadoop fs -getmerge /data/log.* ./log.sum
- When a file larger than the block size is uploaded, HDFS stores it as multiple blocks. Externally it appears as a single file, but internally it consists of several blocks that may sit on different nodes.
- HDFS places no strict restrictions on users and groups: you can set any user or group, even one that does not exist.
- When the amount of data written exceeds the configured block size (128 MB by default), a new socket and output stream are opened and the data is written to another block file. As a result, different blocks of the same file may end up on different cluster nodes, and each block is replicated according to the configured replication factor (3 by default).
- If the replication factor is larger than the number of datanodes, only as many replicas as there are datanodes are stored; a single datanode never holds more than one replica of the same block.
- The replica count shown in the namenode web console is the value recorded in the metadata; as explained above, it may deviate from the actual number of replicas. You can query the JMX interface directly to check: curl -s http://namenode:9870/jmx | grep BlocksTotal (see also the sketch after this list).
- The HDFS access port of the namenode is 9000 by default; clusters commonly use 8020.
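A quick way to check the defaults and metadata values mentioned in the bullets above is sketched below; the namenode hostname and port are placeholders for your own cluster:

# Show the configured block size and replication factor
hdfs getconf -confKey dfs.blocksize
hdfs getconf -confKey dfs.replication
# Query the namenode JMX interface for the block count recorded in metadata
curl -s http://namenode:9870/jmx | grep BlocksTotal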
Check the HDFS file system
# Check the health of the entire file system
hdfs fsck /
# Show the block locations for a given path
hdfs fsck /user/hue -files -blocks -locations
# Show detailed replica information
hdfs fsck /user/hue -files -blocks -replicaDetails
# Show the rack location of each block
hdfs fsck /user/hue -files -blocks -locations -racks
# Show files that are currently open for write
hdfs fsck /user -openforwrite
# List corrupt file blocks
hdfs fsck /test2 -list-corruptFileBlocks
# Delete corrupted files
hdfs fsck /user/hue -delete
Block statistics
Count the blocks under each directory; the second column of the output is taken as the block count.
# Per-directory counts
hdfs dfs -count /user/*
# Check whether the total block count of the file system matches the numbers above
hdfs fsck / -blocks
Block recovery
# If a block is lost, recover the lease on the file so the data can be recovered
hdfs debug recoverLease -path /user/hue -retries 5
Data backup
Snapshot backup
# Allow snapshots on a directory
hdfs dfsadmin -allowSnapshot /data
# Disallow snapshots on a directory
hdfs dfsadmin -disallowSnapshot /data
# Create a snapshot
hdfs dfs -createSnapshot /data data-snapshot-${Date}
# Delete a snapshot
hdfs dfs -deleteSnapshot /data data-snapshot-${Date}
# Rename a snapshot
hdfs dfs -renameSnapshot /data <oldName> <newName>
# Show the differences between two snapshots
hdfs snapshotDiff /data <fromSnapshot> <toSnapshot>
Migration across clusters
# Migrate data from cluster nn1 to cluster nn2
hadoop distcp hdfs://nn1:9820/foo/a hdfs://nn2:9820/foo/
# Overwrite files that already exist at the target
hadoop distcp -overwrite hdfs://nn1:9820/source/first hdfs://nn1:9820/source/second hdfs://nn2:9820/target
# Copy only files that are missing or changed at the target
hadoop distcp -update hdfs://nn1:9820/source/first hdfs://nn1:9820/source/second hdfs://nn2:9820/target
# Same as -update, and also delete target files that no longer exist at the source
hadoop distcp -update -delete hdfs://nn1:9820/source/first hdfs://nn1:9820/source/second hdfs://nn2:9820/target
Edit log
When file system clients perform write operations (changing files, changing permissions, and so on), these transactions are first recorded in the edits log.
In extreme cases, you can edit the edits log to change the offending operation (for example, turning a delete into a permission change) and avoid putting the service at risk.
# Convert a binary edits file to XML so it can be inspected or modified
hdfs oev -i edits_xxxxxx -o edits_xxxxxx.xml
# Convert the modified XML back to the binary format
hdfs oev -i edits_xxxxxx.xml -o edits_xxxxxx -p binary
edits_inprogress
edits_inprogress records the operations currently being processed; when it is rolled, it produces edits files, which are numbered consecutively.
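As a rough illustration, the rolled edits files and the current edits_inprogress file can be seen directly in the namenode data directory; the PATH/dfs/nn path below follows the convention used elsewhere in this article and will differ per installation:

# List the edit log segments in the namenode's current directory
ls /PATH/dfs/nn/current/ | grep edits
# File names look like edits_<start-txid>-<end-txid> and edits_inprogress_<start-txid>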
fsimage
A permanent checkpoint of HDFS metadata that contains the serialization information (ID, type, directory, owning user, permissions, timestamps, ...) of every HDFS directory and file inode. It is all stored in fsimage, loaded into memory when the namenode starts, and updated periodically by checkpointing.
# Dump an fsimage file into readable form with the offline image viewer (the XML processor is one common choice)
hdfs oiv -p XML -i fsimage_0000000000000004588 -o fsimage_0000000000000004588.xml
🚀🌅😅 Administrator commands
Data balancing
# Keep the disk usage difference between datanodes within 5 percent.
# The default threshold is 10; with many nodes, 10 is also acceptable.
hdfs balancer -threshold 5
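Not part of the original command above, but often run together with the balancer: the bandwidth each datanode may spend on balancing can be capped so the job does not saturate the network (100 MB/s below is only an example value):

# Limit balancer traffic per datanode to 100 MB/s (the value is in bytes per second)
hdfs dfsadmin -setBalancerBandwidth 104857600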
Cluster management
# Report the basic state of the file system and cluster
hdfs dfsadmin -report
# Enter safe mode (HDFS becomes read-only and cannot be written)
hdfs dfsadmin -safemode enter
# Leave safe mode
hdfs dfsadmin -safemode leave
# Save the namenode's main data structures to a file in the log directory
hdfs dfsadmin -metasave 20210914
cat /var/log/hadoop-hdfs/20210914
# Fetch the latest fsimage from the namenode to a local directory
hdfs dfsadmin -fetchImage <local directory>
# Roll the edit log (normally there is no need to run this)
hdfs dfsadmin -rollEdits
High availability cluster management
serviceId is the namenode ID shown in the HDFS UI.
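If you are not sure what the serviceIds are, one way to look them up is to read the HA configuration; the nameservice name is a placeholder for your own:

# List the namenode IDs configured for a nameservice
hdfs getconf -confKey dfs.ha.namenodes.<nameservice>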
# Check the health of a namenode
hdfs haadmin -checkHealth <serviceId>
# Manually switch a namenode to Active or Standby (no fencing is performed, so this should rarely be used)
hdfs haadmin [-transitionToActive | -transitionToStandby] <serviceId>
# Initiate a failover between two namenodes
hdfs haadmin -failover <serviceId> <serviceId>
- A failover is initiated from the first NameNode to the second. If the first NameNode is in Standby state, this command simply transitions the second to Active and does not report an error. If the first NameNode is Active, an attempt is made to gracefully transition it to Standby. If that fails, the fencing methods (configured by dfs.ha.fencing.methods) are tried in order until one succeeds; only after this does the second NameNode transition to Active. If no fencing method succeeds, the second NameNode is not transitioned to Active and an error is returned.
# Get the current state (Active or Standby) of a namenode
hdfs haadmin -getServiceState <serviceId>
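Putting the HA commands together, a typical manual failover could look like the sketch below; nn1 and nn2 are hypothetical serviceIds, so substitute the IDs from your own configuration:

# Check which namenode is currently active
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2
# Fail over from nn1 to nn2
hdfs haadmin -failover nn1 nn2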
Initializing the namenode
# Initialize the shared edits directory for HA; if the PATH/dfs/nn/current/ directory is not empty, an error is reported
hdfs namenode -initializeSharedEdits
Verify datanode metadata and meta-files
This command is rarely needed in everyday use.
hdfs debug verifyMeta -meta /PATH/dfs/dn/current/BP-1796118229-10.101.26.-1600740369277/current/finalized/subdir0/subdir0/blk_1073742076_1252.meta -block /PATH/dfs/dn/current/BP-1796118229-10.101.26.-1600740369277/current/finalized/subdir0/subdir0/blk_1073742076
Small file archiving
One way to manage blocks is to archive a large number of small files under a directory into a har file, which reduces the number of blocks and eases the pressure on the namenode.
# Archive the files under /user/xxx/old into data.har under /user/xxx/new
hadoop archive -archiveName data.har -p /user/xxx/old /user/xxx/new
# The archive is visible as an ordinary HDFS file
hadoop fs -ls /user/xxx/new/data.har
# List the contents of the archive through the har:// scheme
hdfs dfs -ls har:///user/xxx/new/data.har
# View a file inside the archive
hdfs dfs -cat har:///user/xxx/new/data.har/99.txt
# Extract the archive by copying its contents out
hadoop fs -cp har:///user/xxx/new/data.har/* /test10
🚀🌅😅 Daemon commands
Daemon commands start the individual cluster component services.
Once the cluster has been set up and started, these commands are rarely needed; it is enough just to know them.
When starting a service, these commands read the corresponding configuration from environment variables.
# Start a journalnode (JN)
hdfs journalnode
# Start a namenode (NN)
hdfs namenode
# Start the NFS3 gateway (exposes the HDFS file system so it can be mounted locally)
hdfs nfs3
# Start the HttpFS server
hdfs httpfs
# Start the KMS (key management service, not started by default)
hadoop kms
Reference
This blogger's command descriptions are more detailed, the layout is very good, and the table-of-contents links are very convenient: www.cnblogs.com/shudazhaofe…