Overview
All HDFS commands are invoked by the bin/hdfs script. Running the hdfs script without any arguments prints the description for all commands.
hdfs [SHELL_OPTIONS] COMMAND [GENERIC_OPTIONS] [COMMAND_OPTIONS]
Hadoop has an option parsing framework that handles parsing generic options as well as running classes.
FIELD | Description |
---|---|
SHELL_OPTIONS | The common set of shell options (such as --config and --loglevel). These are documented on the Commands Manual page. |
GENERIC_OPTIONS | The common set of options supported by multiple commands. See the Hadoop Commands Manual for more information. |
COMMAND COMMAND_OPTIONS | Various commands with their options are described in the following sections. The commands have been grouped into User Commands and Administration Commands. |
User commands
A number of commands useful to Hadoop cluster users.
classpath
Usage: hdfs classpath [--glob |--jar <path> |-h |--help]
COMMAND_OPTION | Description |
---|---|
--glob | expand wildcards |
--jar path | write classpath as manifest in jar named path |
-h, --help | print help |
Prints the classpath needed to get the Hadoop jar and the required libraries. If called without arguments, it prints the classpath set up by the command scripts, which is likely to contain wildcards in the classpath entries. Additional options print the classpath after wildcard expansion or write the classpath into the manifest of a jar file. The latter is useful in environments where wildcards cannot be used and the expanded classpath exceeds the maximum supported command line length.
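For example, either invocation below prints the expanded classpath or writes it into a jar manifest (the jar path is illustrative):

hdfs classpath --glob
hdfs classpath --jar /tmp/hdfs-classpath.jar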
dfs
Usage: hdfs dfs [COMMAND [COMMAND_OPTIONS]]. Runs a filesystem command on the file systems supported in Hadoop. The various COMMAND_OPTIONS can be found in the File System Shell Guide.
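A few typical invocations, with illustrative paths:

hdfs dfs -mkdir -p /user/alice
hdfs dfs -put localfile.txt /user/alice/
hdfs dfs -ls /user/alice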
fetchdt
Usage: hdfs fetchdt <opts> <token_file_path>
COMMAND_OPTION | Description |
---|---|
--webservice NN_Url | Url to contact NN on (starts with http or https) |
--renewer name | Name of the delegation token renewer |
--cancel | Cancel the delegation token |
--renew | Renew the delegation token. The delegation token must have been fetched using the --renewer name option. |
--print | Print the delegation token |
token_file_path | File path to store the token into. |
Gets the delegation token from the NameNode. See fetchdt for more info.
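A sketch of a typical token lifecycle; the NameNode URL, renewer name, and token file path are all illustrative:

hdfs fetchdt --webservice https://nn.example.com:9871 --renewer alice /tmp/alice.token
hdfs fetchdt --print /tmp/alice.token
hdfs fetchdt --renew /tmp/alice.token
hdfs fetchdt --cancel /tmp/alice.token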
fsck
Usage:
hdfs fsck <path>
[-list-corruptfileblocks |
[-move | -delete | -openforwrite]
[-files [-blocks [-locations | -racks | -replicaDetails | -upgradedomains]]]
[-includeSnapshots]
[-storagepolicies] [-maintenance] [-blockId <blk_Id>]
COMMAND_OPTION | Description |
---|---|
path | Start checking from this path. |
-delete | Delete corrupted files. |
-files | Print out files being checked. |
-files -blocks | Print out the block report |
-files -blocks -locations | Print out locations for every block. |
-files -blocks -racks | Print out network topology for data-node locations. |
-files -blocks -replicaDetails | Print out each replica details. |
-files -blocks -upgradedomains | Print out upgrade domains for every block. |
-includeSnapshots | Include snapshot data if the given path indicates a snapshottable directory or there are snapshottable directories under it. |
-list-corruptfileblocks | Print out list of missing blocks and files they belong to. |
-move | Move corrupted files to /lost+found. |
-openforwrite | Print out files opened for write. |
-storagepolicies | Print out storage policy summary for the blocks. |
-maintenance | Print out maintenance state node details. |
-blockId | Print out information about the block. |
Runs the HDFS filesystem checking utility. See fsck for more info.
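For example, to check the whole namespace and show block locations, or to list corrupt blocks under one directory (paths illustrative):

hdfs fsck / -files -blocks -locations
hdfs fsck /user/alice -list-corruptfileblocks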
getconf
Usage:
hdfs getconf -namenodes
hdfs getconf -secondaryNameNodes
hdfs getconf -backupNodes
hdfs getconf -includeFile
hdfs getconf -excludeFile
hdfs getconf -nnRpcAddresses
hdfs getconf -confKey [key]
COMMAND_OPTION | Description |
---|---|
-namenodes | gets list of namenodes in the cluster. |
-secondaryNameNodes | gets list of secondary namenodes in the cluster. |
-backupNodes | gets list of backup nodes in the cluster. |
-includeFile | gets the include file path that defines the datanodes that can join the cluster. |
-excludeFile | gets the exclude file path that defines the datanodes that need to be decommissioned. |
-nnRpcAddresses | gets the namenode rpc addresses |
-confKey [key] | gets a specific key from the configuration |
Gets configuration information from the configuration directory, for post-processing.
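For example, listing the namenodes or reading the standard dfs.replication key:

hdfs getconf -namenodes
hdfs getconf -confKey dfs.replication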
groups
Usage: hdfs groups [username ...]. Returns the group information for the given one or more usernames.
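For example, with illustrative usernames:

hdfs groups alice bob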
lsSnapshottableDir
Usage: hdfs lsSnapshottableDir [-help]
COMMAND_OPTION | Description |
---|---|
-help | print help |
Gets the list of snapshottable directories. When this is run as a super user, all snapshottable directories are returned; otherwise, only those directories owned by the current user are returned.
jmxget
Usage: hdfs jmxget [-localVM ConnectorURL | -port port | -server mbeanserver | -service service]
COMMAND_OPTION | Description |
---|---|
-help | print help |
-localVM ConnectorURL | connect to the VM on the same machine |
-port mbean server port | specify mbean server port, if missing it will try to connect to MBean Server in the same VM |
-server | specify mbean server (localhost by default) |
-service NameNode|DataNode | specify jmx service. NameNode by default. |
Dump JMX information from a service.
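For example, relying on the defaults described above (NameNode service, MBean server in the same VM):

hdfs jmxget -service NameNode
hdfs jmxget -service DataNode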
oev
Usage: hdfs oev [OPTIONS] -i INPUT_FILE -o OUTPUT_FILE
Required command line arguments:
COMMAND_OPTION | Description |
---|---|
-i,--inputFile arg | edits file to process; an xml (case insensitive) extension means XML format, any other filename means binary format |
-o,--outputFile arg | Name of output file. If the specified file exists, it will be overwritten; the format of the file is determined by the -p option |
Optional command line arguments:
COMMAND_OPTION | Description |
---|---|
-f,--fix-txids | Renumber the transaction IDs in the input, so that there are no gaps or invalid transaction IDs. |
-h,--help | Display usage information and exit |
-r,--recover | When reading binary edit logs, use recovery mode. This will give you the chance to skip corrupt parts of the edit log. |
-p,--processor arg | Select which type of processor to apply against the image file. Currently supported processors are: binary (native binary format that Hadoop uses), xml (default, XML format), stats (prints statistics about the edits file) |
-v,--verbose | More verbose output; prints the input and output filenames, and for processors that write to a file, also outputs to screen. On large image files this will dramatically increase processing time (default is false). |
The Hadoop offline edits viewer. See the Offline Edits Viewer Guide for more info.
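For example, a round trip through XML; the edits file name is illustrative:

hdfs oev -i edits_0000000000000000001-0000000000000000100 -o edits.xml
hdfs oev -p binary -i edits.xml -o edits_rebuilt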
oiv
Usage: hdfs oiv [OPTIONS] -i INPUT_FILE
Required command line arguments:
COMMAND_OPTION | Description |
---|---|
-i,--inputFile input file | Specify the input fsimage file to process. |
Optional command line arguments:
COMMAND_OPTION | Description |
---|---|
-o,--outputFile output file | Specify the output filename, if the specified output processor generates one. If the specified file already exists, it is silently overwritten. (output to stdout by default) If the input file is an XML file, it also creates an <outputFile>.md5. |
-p,–processor processor | Specify the image processor to apply against the image file. Currently valid options are Web (default), XML, Delimited, FileDistribution and ReverseXML. |
-addr address | Specify the address(host:port) to listen. (localhost:5978 by default). This option is used with Web processor. |
-maxSize size | Specify the range [0, maxSize] of file sizes to be analyzed in bytes (128GB by default). This option is used with FileDistribution processor. |
-step size | Specify the granularity of the distribution in bytes (2MB by default). This option is used with FileDistribution processor. |
-format | Format the output result in a human-readable fashion rather than a number of bytes. (false by default). This option is used with FileDistribution processor. |
-delimiter arg | Delimiting string to use with Delimited processor. |
-t,--temp temporary dir | Use temporary dir to cache intermediate result to generate Delimited outputs. If not set, the Delimited processor constructs the namespace in memory before outputting text. |
-h,--help | Display the tool usage and help information and exit. |
The Hadoop offline image viewer for image files in Hadoop 2.4 or up. See the Offline Image Viewer Guide for more info.
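For example, to dump an fsimage to XML, or to serve it through the default Web processor on localhost:5978 (the fsimage file name is illustrative):

hdfs oiv -p XML -i fsimage_0000000000000000100 -o fsimage.xml
hdfs oiv -i fsimage_0000000000000000100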
oiv_legacy
Usage: hdfs oiv_legacy [OPTIONS] -i INPUT_FILE -o OUTPUT_FILE
Required command line arguments:
COMMAND_OPTION | Description |
---|---|
-i,--inputFile input file | Specify the input fsimage file to process. |
-o,--outputFile output file | Specify the output filename, if the specified output processor generates one. If the specified file already exists, it is silently overwritten. |
Optional command line arguments:
COMMAND_OPTION | Description |
---|---|
-p|--processor processor | Specify the image processor to apply against the image file. Valid options are Ls (default), XML, Delimited, Indented, FileDistribution and NameDistribution. |
-maxSize size | Specify the range [0, maxSize] of file sizes to be analyzed in bytes (128GB by default). This option is used with FileDistribution processor. |
-step size | Specify the granularity of the distribution in bytes (2MB by default). This option is used with FileDistribution processor. |
-format | Format the output result in a human-readable fashion rather than a number of bytes. (false by default). This option is used with FileDistribution processor. |
-skipBlocks | Do not enumerate individual blocks within files. This may save processing time and output file space on namespaces with very large files. The Ls processor reads the blocks to correctly determine file sizes and ignores this option. |
-printToScreen | Pipe output of processor to console as well as specified file. On extremely large namespaces, this may increase processing time by an order of magnitude. |
-delimiter arg | When used in conjunction with the Delimited processor, replaces the default tab delimiter with the string specified by arg. |
-h|--help | Display the tool usage and help information and exit. |
The Hadoop offline image viewer for older versions of image files. See the oiv_legacy Command documentation for more info.
version
Usage: hdfs version. Prints the version.
Administration Commands
A number of commands useful to Hadoop cluster administrators.
balancer
Usage:
hdfs balancer
[-policy <policy>]
[-threshold <threshold>]
[-exclude [-f <hosts-file> | <comma-separated list of hosts>]]
[-include [-f <hosts-file> | <comma-separated list of hosts>]]
[-source [-f <hosts-file> | <comma-separated list of hosts>]]
[-blockpools <comma-separated list of blockpool ids>]
[-idleiterations <idleiterations>]
[-runDuringUpgrade]
COMMAND_OPTION | Description |
---|---|
-policy <policy> | datanode (default): Cluster is balanced if each datanode is balanced. blockpool: Cluster is balanced if each block pool in each datanode is balanced. |
-threshold | Percentage of disk capacity. This overwrites the default threshold. |
-exclude -f <hosts-file> | <comma-separated list of hosts> | Excludes the specified datanodes from being balanced by the balancer. |
-include -f <hosts-file> | <comma-separated list of hosts> | Includes only the specified datanodes to be balanced by the balancer. |
-source -f <hosts-file> | <comma-separated list of hosts> | Pick only the specified datanodes as source nodes. |
-blockpools <comma-separated list of blockpool ids> | The balancer will only run on blockpools included in this list. |
-idleiterations <iterations> | Maximum number of idle iterations before exit. This overwrites the default idleiterations(5). |
-runDuringUpgrade | Whether to run the balancer during an ongoing HDFS upgrade. This is usually not desired since it will not affect used space on over-utilized machines. |
-h|--help | Display the tool usage and help information and exit. |
Runs a cluster balancing utility. An administrator can simply press Ctrl-C to stop the rebalancing process. See Balancer for more details.
Note that the blockpool policy is stricter than the datanode policy.
Besides the above command options, a pinning feature was introduced in 2.7.0 to prevent certain replicas from being moved by the balancer/mover. This feature is disabled by default, and can be enabled by the configuration property 'dfs.datanode.block-pinning.enabled'. When enabled, this feature only affects blocks that are written to favored nodes specified in the create() call. This feature is useful for applications, such as the HBase regionserver, that want to maintain data locality.
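For example, to balance to within 5% of average utilization, or to balance per block pool while excluding some hosts (the hosts file path is illustrative):

hdfs balancer -threshold 5
hdfs balancer -policy blockpool -exclude -f /tmp/excluded-hosts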
cacheadmin
Usage:
hdfs cacheadmin [-addDirective -path <path> -pool <pool-name> [-force] [-replication <replication>] [-ttl <time-to-live>]]
hdfs cacheadmin [-modifyDirective -id <id> [-path <path>] [-force] [-replication <replication>] [-pool <pool-name>] [-ttl <time-to-live>]]
hdfs cacheadmin [-listDirectives [-stats] [-path <path>] [-pool <pool>] [-id <id>]]
hdfs cacheadmin [-removeDirective <id>]
hdfs cacheadmin [-removeDirectives -path <path>]
hdfs cacheadmin [-addPool <name> [-owner <owner>] [-group <group>] [-mode <mode>] [-limit <limit>] [-maxTtl <maxTtl>]]
hdfs cacheadmin [-modifyPool <name> [-owner <owner>] [-group <group>] [-mode <mode>] [-limit <limit>] [-maxTtl <maxTtl>]]
hdfs cacheadmin [-removePool <name>]
hdfs cacheadmin [-listPools [-stats] [<name>]]
hdfs cacheadmin [-help <command-name>]
See HDFS Cache Administration Documentation for more information
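A short sketch of pool-then-directive usage; the pool name, path, and limit are illustrative:

hdfs cacheadmin -addPool testPool -mode 0777 -limit 100000000
hdfs cacheadmin -addDirective -path /user/alice/hot -pool testPool -replication 3
hdfs cacheadmin -listDirectives -stats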
crypto
Usage:
hdfs crypto -createZone -keyName <keyName> -path <path>
hdfs crypto -listZones
hdfs crypto -provisionTrash -path <path>
hdfs crypto -help <command-name>
See HDFS Transparent Encryption Documentation for more information
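A minimal sketch, assuming a key named mykey has already been created in the configured KMS (for example via hadoop key create mykey) and /secure is an existing empty directory:

hdfs crypto -createZone -keyName mykey -path /secure
hdfs crypto -listZones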
datanode
Usage: hdfs datanode [-regular | -rollback | -rollingupgrade rollback]
COMMAND_OPTION | Description |
---|---|
-regular | Normal datanode startup (default). |
-rollback | Rollback the datanode to the previous version. This should be used after stopping the datanode and distributing the old hadoop version. |
-rollingupgrade rollback | Rollback a rolling upgrade operation. |
Runs an HDFS datanode.
dfsadmin
Usage:
hdfs dfsadmin [-report [-live] [-dead] [-decommissioning] [-enteringmaintenance] [-inmaintenance]]
hdfs dfsadmin [-safemode enter | leave | get | wait | forceExit]
hdfs dfsadmin [-saveNamespace]
hdfs dfsadmin [-rollEdits]
hdfs dfsadmin [-restoreFailedStorage true |false |check]
hdfs dfsadmin [-refreshNodes]
hdfs dfsadmin [-setQuota <quota> <dirname>...<dirname>]
hdfs dfsadmin [-clrQuota <dirname>...<dirname>]
hdfs dfsadmin [-setSpaceQuota <quota> [-storageType <storagetype>] <dirname>...<dirname>]
hdfs dfsadmin [-clrSpaceQuota [-storageType <storagetype>] <dirname>...<dirname>]
hdfs dfsadmin [-finalizeUpgrade]
hdfs dfsadmin [-rollingUpgrade [<query> |<prepare> |<finalize>]]
hdfs dfsadmin [-refreshServiceAcl]
hdfs dfsadmin [-refreshUserToGroupsMappings]
hdfs dfsadmin [-refreshSuperUserGroupsConfiguration]
hdfs dfsadmin [-refreshCallQueue]
hdfs dfsadmin [-refresh <host:ipc_port> <key> [arg1..argn]]
hdfs dfsadmin [-reconfig <namenode|datanode> <host:ipc_port> <start |status |properties>]
hdfs dfsadmin [-printTopology]
hdfs dfsadmin [-refreshNamenodes datanodehost:port]
hdfs dfsadmin [-getVolumeReport datanodehost:port]
hdfs dfsadmin [-deleteBlockPool datanode-host:port blockpoolId [force]]
hdfs dfsadmin [-setBalancerBandwidth <bandwidth in bytes per second>]
hdfs dfsadmin [-getBalancerBandwidth <datanode_host:ipc_port>]
hdfs dfsadmin [-fetchImage <local directory>]
hdfs dfsadmin [-allowSnapshot <snapshotDir>]
hdfs dfsadmin [-disallowSnapshot <snapshotDir>]
hdfs dfsadmin [-shutdownDatanode <datanode_host:ipc_port> [upgrade]]
hdfs dfsadmin [-evictWriters <datanode_host:ipc_port>]
hdfs dfsadmin [-getDatanodeInfo <datanode_host:ipc_port>]
hdfs dfsadmin [-metasave filename]
hdfs dfsadmin [-triggerBlockReport [-incremental] <datanode_host:ipc_port>]
hdfs dfsadmin [-listOpenFiles]
hdfs dfsadmin [-help [cmd]]
Runs an HDFS dfsadmin client.
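A few common invocations (the quota size and directory are illustrative):

hdfs dfsadmin -report -live
hdfs dfsadmin -safemode get
hdfs dfsadmin -setSpaceQuota 10g /user/alice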
dfsrouter
Usage: hdfs dfsrouter
Runs the DFS router. See Router for more info.
dfsrouteradmin
Usage:
hdfs dfsrouteradmin
[-add <source> <nameservice> <destination> [-readonly] -owner <owner> -group <group> -mode <mode>]
[-rm <source>]
[-ls <path>]
[-safemode enter | leave | get]
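For example, adding and inspecting a mount table entry; the source path, nameservice id, owner, and group are illustrative:

hdfs dfsrouteradmin -add /data ns1 /data -owner hdfs -group hadoop -mode 755
hdfs dfsrouteradmin -ls /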
journalnode
Usage: hdfs journalnode. This command starts a journalnode for use with HDFS HA with QJM. For details, see the HDFS HA with QJM guide.
namenode
Usage:
hdfs namenode [-backup] |
[-checkpoint] |
[-format [-clusterid cid ] [-force] [-nonInteractive] ] |
[-upgrade [-clusterid cid] [-renameReserved<k-v pairs>] ] |
[-upgradeOnly [-clusterid cid] [-renameReserved<k-v pairs>] ] |
[-rollback] |
[-rollingUpgrade <rollback|downgrade |started> ] |
[-finalize] |
[-importCheckpoint] |
[-initializeSharedEdits] |
[-bootstrapStandby [-force] [-nonInteractive] [-skipSharedEditsCheck] ] |
[-recover [-force] ] |
[-metadataVersion ]
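For example, formatting a new namenode or bootstrapping a standby in an HA pair (the cluster id is illustrative):

hdfs namenode -format -clusterid CID-demo
hdfs namenode -bootstrapStandby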