HDFS Tutorial
NOTE:
Use hadoop fs, hadoop dfs (deprecated), or hdfs dfs to run any of the HDFS
Shell commands listed below.
Usage:
help:
o HDFS Command that displays help for a given command, or for all commands
if none is specified.
o Syntax:
hdfs dfs -help [command]
mkdir:
o HDFS Command to create the directory in HDFS.
o Syntax:
hdfs dfs -mkdir <directory_name>
o Example:
hdfs dfs -mkdir /new_univo
o NOTE:
Here we are trying to create a directory named "new_univo" in
HDFS.
To create a subdirectory, its parent directory must already exist;
otherwise a 'No such file or directory' error is reported (see the
-p example below).
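o If the Hadoop release in use supports the -p option (most recent ones do),
missing parent directories can be created in one step; the nested path
below is only an illustration:
# create nested directories, making parent directories as needed
hdfs dfs -mkdir -p /new_univo/2024/logs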
ls:
o HDFS Command to display the list of Files and Directories in HDFS.
o Syntax:
hdfs dfs -ls <dir_path>
o Example:
hdfs dfs -ls /
NOTE:
This command lists all the files and directories in the HDFS
root directory.
o The information returned is very similar to that returned by the
Unix command ls -l, with a few minor differences. The first column
shows the file mode.
o The second column is the replication factor of the file (something a
traditional Unix filesystem does not have). Remember we set the
default replication factor in the site-wide configuration to be 1,
which is why we see the same value here. The entry in this column is
empty for directories because the concept of replication does not
apply to them—directories are treated as metadata and stored by the
namenode, not the datanodes.
o The third and fourth columns show the file owner and group.
o The fifth column is the size of the file in bytes, or zero for
directories.
o The sixth and seventh columns are the last modified date and time.
o Finally, the eighth column is the name of the file or directory.
o To list the content of the directories recursively:
o Example:
hdfs dfs -lsr /
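o A hypothetical listing (names, sizes, and dates are made up for
illustration) shows how the columns described above line up:
hdfs dfs -ls /new_univo
Found 2 items
drwxr-xr-x   - univo supergroup          0 2019-05-04 10:20 /new_univo/logs
-rw-r--r--   1 univo supergroup       1366 2019-05-04 10:15 /new_univo/test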
chmod:
o Changes the permission of a file
o This works similarly to the Linux shell's chmod command, with a few
exceptions.
o -R: Recursively changes the permissions of all files and directories
under the path you specify.
o HDFS has a permissions model for files and directories that is much
like the POSIX model.
o There are three types of permission:
o The read permission (r), the write permission (w), and the execute
permission (x).
o The read permission is required to read files or list the contents
of a directory.
o The write permission is required to write a file or, for a
directory, to create or delete files or directories in it.
o The execute permission is ignored for a file because you can’t
execute a file on HDFS (unlike POSIX), and for a directory this
permission is required to access its children.
o Each file and directory has an owner, a group, and a mode.
o The mode is made up of the permissions for the user who is the
owner, the permissions for the users who are members of the group,
and the permissions for users who are neither the owners nor members
of the group.
o Syntax:
hadoop dfs -chmod [-R] <MODE> <PATH>
o Example:
hadoop dfs -chmod -R 754 /user/Hadoop
o Here the octal mode 754 breaks down into three permission triplets:
7 = r w x for the owner (user)
5 = r - x for the group
4 = r - - for all other users (neither the owner nor group members)
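o As a quick check (using the same path as the example above), the new mode
can be verified with ls after changing it:
# owner: full access, group: read/execute, others: read only
hdfs dfs -chmod -R 754 /user/Hadoop
# the first column of the listing should now show rwxr-xr-- for files
hdfs dfs -ls /user/Hadoop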
copyFromLocal or put:
o HDFS Command to copy the file from a Local file system to HDFS.
o Syntax:
hdfs dfs -copyFromLocal <localsrc> <hdfs destination>
o Example:
hdfs dfs -copyFromLocal /home/univo/test /new_univo
o NOTE:
Here test is a file in the local directory /home/univo; after the
command is executed, the test file is copied into the /new_univo
directory of HDFS.
If no destination is given, the file is copied to the current working
directory in HDFS, which by default is the user's home directory
/user/<user_name>:
hdfs dfs -put /home/univo/test
As in Unix, the file name does not need an extension.
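o put also accepts several local source files at once when the destination
is an HDFS directory (the file names below are only illustrative):
# copy two local files into the same HDFS directory in one command
hdfs dfs -put /home/univo/test1 /home/univo/test2 /new_univo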
copyToLocal or get:
o HDFS Command to copy the file from HDFS to Local File System.
o Syntax:
hdfs dfs -copyToLocal <hdfs_source> <localdst>
or
hdfs dfs -get <hdfs source> <localdst>
o Example:
hdfs dfs -copyToLocal /new_univo/test /home/univo
NOTE:
If no local destination is given, the file is copied to the current local
working directory (for example /home/<user_name> when the command is run
from the home directory):
hdfs dfs -get /new_univo/test
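o A '.' can also be given explicitly as the local destination to copy into
the current working directory:
# copy the HDFS file into the directory the shell is currently in
hdfs dfs -get /new_univo/test .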
cat:
o HDFS Command that reads a file on HDFS and prints the content of
that file to the standard output.
o Syntax:
hdfs dfs -cat /path/to/file_in_hdfs
o Example:
hdfs dfs -cat /new_univo/test
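o Because the content goes to standard output, it can be piped into
ordinary Unix tools, for example to look at only the first few lines:
# show the first 5 lines of the HDFS file (assumes a Unix shell)
hdfs dfs -cat /new_univo/test | head -n 5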
du:
o HDFS Command to check the size of a file or directory; with -s it
prints a single aggregate summary.
o Syntax:
hdfs dfs -du -s /directory/filename
o Example:
hdfs dfs -du -s /new_univo/test
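o Most Hadoop releases also accept a -h option to print the size in a
human-readable unit (K, M, G) instead of raw bytes:
# aggregate size of the file, printed in a human-readable unit
hdfs dfs -du -s -h /new_univo/test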
touchz:
o HDFS Command to create a file in HDFS with file size 0 bytes.
o Syntax:
hdfs dfs -touchz /directory/filename
o Example:
hdfs dfs -touchz /new_univo/sample
count:
o HDFS Command to count the number of directories, files, and bytes
under the paths that match the specified file pattern.
o Syntax:
hdfs dfs -count <path>
o Example:
hdfs dfs -count /user
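o The output is printed as four columns; the numbers below are made up for
illustration:
hdfs dfs -count /user
           5           12              10240 /user
# columns: directory count, file count, content size in bytes, path name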
rm:
o HDFS Command to remove the file from HDFS.
o Syntax:
hdfs dfs -rm <path>
o Example:
hdfs dfs -rm /new_univo/test
rm -r:
o HDFS Command to remove the entire directory and all of its
content from HDFS.
o Syntax:
hdfs dfs -rm -r <path>
o Example:
hdfs dfs -rm -r /new_univo
cp:
o HDFS Command to copy files from source to destination. This
command allows multiple sources as well, in which case the
destination must be a directory.
o Syntax:
hdfs dfs -cp <src> <dest>
o Example:
hdfs dfs -cp /user/hadoop/file1 /user/hadoop/file2
hdfs dfs -cp /user/hadoop/file1 /user/hadoop/file2 /user/hadoop/dir
mv:
o HDFS Command to move files from source to destination. This
command allows multiple sources as well, in which case the
destination needs to be a directory.
o Syntax:
hdfs dfs -mv <src> <dest>
o Example:
hdfs dfs -mv /user/hadoop/file1 /user/hadoop/file2
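o When several sources are given (file and directory names below are
illustrative), the last argument must be an existing HDFS directory:
# move two files into the same HDFS directory
hdfs dfs -mv /user/hadoop/file1 /user/hadoop/file2 /user/hadoop/dir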
rmdir:
o HDFS Command to remove a directory; the directory must be empty.
o Syntax:
hdfs dfs -rmdir <path>
o Example:
hdfs dfs -rmdir /user/Hadoop
setrep:
o This command is used to change the replication factor of a file
to a specific instead of the default of replication factor for
the remaining in HDFS. Remaining in HDFS.
o If <path> is a directory then the command recursively changes the
replication factor of all files under the directory tree rooted
at path <path>
o Syntax:
hadoop dfs -setrep <replication factor number>
<file/path_name>
o Example:
hadoop dfs -setrep 2 /user/Hadoop
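o Most releases also support a -w option that makes the command wait until
the new replication factor has actually been reached, which can take some
time on large files:
# change the replication factor and wait for re-replication to finish
hadoop dfs -setrep -w 2 /user/Hadoop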