Filter Commands


SIMPLE FILTERS

CONTENTS
Filters – definition
To format text – pr
Pick lines from the beginning – head
Pick lines from the end – tail
Extract characters – cut
Join two lines / files – paste
Sort, merge and remove – sort
Find unique and nonunique lines – uniq
Change, delete or squeeze characters - tr
Delimiter
 A delimiter is one or more characters that separate text
strings. Common delimiters are commas (,), semicolons
(;), quotes (", '), braces ({}), pipes (|), and slashes (/ \).
When a program stores many data values, it may use a
delimiter to separate them.
 On the command line, the pipe (|) is also used to combine
two or more commands: the output of one command acts as
the input to the next command, whose output may in turn
feed another, and so on. A pipe can be visualized as a
temporary connection between two or more commands,
programs or processes. The command-line programs that do
the further processing are referred to as filters.
 For example, in "john|doe" the pipe is the delimiter, so a
program or script can distinguish between the first and
last name in the string.
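As a minimal sketch of the "john|doe" example, the fields of a pipe-delimited string can be separated with cut. The variable names are illustrative only.

```shell
# Split a pipe-delimited record into its two fields.
record='john|doe'

# -d sets the field delimiter, -f selects the field number.
first=$(printf '%s\n' "$record" | cut -d '|' -f 1)
last=$(printf '%s\n' "$record" | cut -d '|' -f 2)

echo "first name: $first"
echo "last name:  $last"
```

The delimiter is quoted so that the shell passes it to cut instead of treating it as a pipeline.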
SIMPLE FILTERS
 Filters are commands that accept data from standard input,
manipulate it and write the results to standard output
 Each filter performs a simple function
 Some filters use a delimiter, such as a pipe (|) or a colon (:)
 Many filters work well with delimited fields, and some
simply won't work without them
 The piping mechanism allows the standard output of
one filter to serve as the standard input of another
 A filter reads data from standard input when used
without a filename as argument, and from the file
otherwise.
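The two ways of feeding a filter can be sketched as follows. This assumes mktemp is available for a scratch directory; the three sample records and the variable names are illustrative only.

```shell
workdir=$(mktemp -d)        # scratch directory (assumes mktemp exists)
cd "$workdir" || exit 1
printf '%s\n' '03|marketing|6521' '01|accounts|6213' '02|progs|5423' > dept.lst

# 1. Filename as argument: the filter opens the file itself.
from_file=$(sort dept.lst)

# 2. No filename: the filter reads standard input, fed here by a pipe.
from_stdin=$(cat dept.lst | sort)

# 3. Filters chained: the output of sort becomes the input of head.
first_line=$(sort dept.lst | head -n 1)

echo "$first_line"
```

Both forms of sort produce the same result; only the source of the data differs.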
THE SIMPLE DATABASE
 UNIX provides several commands for text editing and
shell programming; the examples here use a sample file, emp.lst
 Each line of this file has six fields separated by five
delimiters
 The details of an employee are stored on a single
line, for example:
2233 | a.k.shukla | g.m | sales | 12/12/52 | 6000
pr : paginating files
Recall the contents of dept.lst:
cat dept.lst
01|accounts|6213
02|progs|5423
03|marketing|6521
04|personnel|2365
05|production|9876
06|sales|1006
 The pr command prepares a file for printing by adding
suitable headers, footers and formatted text
 pr adds five lines of margin at the top and bottom of each page
pr dept.lst
May 06 10:38 1997 dept.lst page 1
01|accounts|6213
02|progs|5423
03|marketing|6521
04|personnel|2365
05|production|9876
06|sales|1006
Output: The header shows the date and time of last
modification of the file along with the filename and page
number
pr options
-k prints k (integer) columns
-t suppresses the header and footer
-h uses a header of the user's choice
-d double-spaces the output
-n numbers each line, which helps in debugging
-o n offsets the lines by n spaces, increasing the left
margin of the page
+k starts printing from page k, for example
pr +10 chap01
-l n sets the page length to n lines, for example
pr -l 54 chap01
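A short sketch of two of these options on the dept.lst data shown earlier. It assumes mktemp is available; the exact spacing of the numbered output depends on the pr implementation, so none is shown here.

```shell
workdir=$(mktemp -d)        # scratch directory (assumes mktemp exists)
cd "$workdir" || exit 1
printf '%s\n' '01|accounts|6213' '02|progs|5423' '03|marketing|6521' \
    '04|personnel|2365' '05|production|9876' '06|sales|1006' > dept.lst

# -t drops the header and footer; -n numbers each line.
pr -t -n dept.lst

# -t together with -d double-spaces the lines.
pr -t -d dept.lst

# With -t there is no header, footer or page padding,
# so the numbered output has one line per input line.
numbered_lines=$(pr -t -n dept.lst | wc -l | tr -d ' ')
echo "$numbered_lines"
```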
head
 Displays the top of the file
 It displays the first 10 lines of the file when used
without an option
head dept.lst
01|accounts|6213
02|progs|5423
03|marketing|6521
04|personnel|2365
05|production|9876
06|sales|1006
 The -n option specifies a line count
 head -n 3 dept.lst
01|accounts|6213
02|progs|5423
03|marketing|6521
tail
 Displays the end of the file
 It displays the last 10 lines of the file when used
without an option
 tail emp.lst
 Options
 -n specifies a line count
 tail -n 3 emp.lst
 -f monitors the growth of a file
 -c extracts bytes rather than lines
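The head and tail options can be tried on a small generated file. This is a sketch that assumes mktemp is available; sample.txt and its contents are made up for illustration. tail -f is left out because it keeps running until interrupted.

```shell
workdir=$(mktemp -d)        # scratch directory (assumes mktemp exists)
cd "$workdir" || exit 1

# Build a 12-line sample file: "line 1" ... "line 12".
i=1
while [ "$i" -le 12 ]; do
    echo "line $i"
    i=$((i + 1))
done > sample.txt

head -n 3 sample.txt        # first three lines
tail -n 3 sample.txt        # last three lines

# Without an option, head stops after 10 lines.
default_count=$(head sample.txt | wc -l | tr -d ' ')

# Combining both filters picks lines 5 to 7.
middle=$(head -n 7 sample.txt | tail -n 3)

# -c counts bytes: the last 8 bytes are "line 12" plus the newline.
last_bytes=$(tail -c 8 sample.txt)

echo "$middle"
```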
cut
 It is used for slicing a file vertically
 head -n 5 emp.lst | tee shortlist
selects the first five lines of emp.lst and saves them to
shortlist
 We can cut columns by using the -c option with a list of
column numbers, delimited by commas (cutting columns)
cut -c 6-22,24-32 shortlist
cut -c -3,6-22,28-34,55- shortlist
 Most files don't contain fixed-length lines, so we have
to cut fields rather than columns (cutting fields)
-d for the field delimiter
-f for the field list
cut -d \| -f 2,3 shortlist | tee cutlist1
displays the second and third fields of shortlist and
saves the output in cutlist1. Here the | is escaped to
prevent the shell from treating it as the pipeline character
 To save the remaining fields, we use
cut -d \| -f 1,4- shortlist > cutlist2
paste
What we cut with cut can be pasted back, vertically,
with the paste command
paste cutlist1 cutlist2
This lets us view the two files side by side
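The cut and paste steps can be sketched end to end as below. This assumes mktemp is available and that the fields are separated by a bare | with no surrounding spaces; the second and third records are hypothetical. The -d option of paste, which sets the joining character, is not mentioned above; without it paste joins with a tab.

```shell
workdir=$(mktemp -d)        # scratch directory (assumes mktemp exists)
cd "$workdir" || exit 1

# First record follows the sample line; the other two are made up.
printf '%s\n' '2233|a.k.shukla|g.m|sales|12/12/52|6000' \
    '1006|r.verma|manager|accounts|03/04/60|5000' \
    '4290|n.iyer|clerk|progs|21/09/71|3000' > shortlist

# Cutting columns: characters 1 to 4 of every line.
first_chars=$(cut -c 1-4 shortlist | head -n 1)

# Cutting fields: fields 2 and 3 go to one file, the rest to another.
cut -d '|' -f 2,3 shortlist > cutlist1
cut -d '|' -f 1,4- shortlist > cutlist2

# Pasting the two files back side by side, joined with |.
paste -d '|' cutlist1 cutlist2
first_joined=$(paste -d '|' cutlist1 cutlist2 | head -n 1)
```

Quoting the delimiter as '|' has the same effect as escaping it with a backslash.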
sort
 This command arranges data in ascending or descending
order
 By default, sorting follows the ASCII collating sequence:
whitespace first, then numerals, uppercase letters and
finally lowercase letters
 Options
 -r : reverses the sort order
 -u : removes repeated lines
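A small sketch of the default order and the two options. It assumes mktemp is available and forces the C locale with LC_ALL=C so that the ASCII order applies regardless of the user's locale settings; names.txt is a made-up file.

```shell
workdir=$(mktemp -d)        # scratch directory (assumes mktemp exists)
cd "$workdir" || exit 1
printf '%s\n' 'sales' 'Accounts' '42' 'sales' 'marketing' > names.txt

# Default order: numerals, then uppercase, then lowercase.
ascending=$(LC_ALL=C sort names.txt | tr '\n' ' ')

# -r reverses the order, so a lowercase entry comes first.
first_reversed=$(LC_ALL=C sort -r names.txt | head -n 1)

# -u keeps one copy of each repeated line.
unique_count=$(LC_ALL=C sort -u names.txt | wc -l | tr -d ' ')

echo "$ascending"
```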
uniq
 It requires sorted input, because it compares only adjacent lines, and prints one copy of each line.
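A sketch of uniq fed by sort, assuming mktemp is available and using a made-up depts.txt. The -d and -u options, which are not described above, select the repeated and the non-repeated lines respectively.

```shell
workdir=$(mktemp -d)        # scratch directory (assumes mktemp exists)
cd "$workdir" || exit 1
printf '%s\n' 'sales' 'accounts' 'sales' 'progs' 'accounts' 'sales' > depts.txt

# Sorting first puts identical lines next to each other.
distinct=$(sort depts.txt | uniq | tr '\n' ' ')

# -d prints only lines that occur more than once.
repeated=$(sort depts.txt | uniq -d | tr '\n' ' ')

# -u prints only lines that occur exactly once.
single=$(sort depts.txt | uniq -u)

echo "distinct: $distinct"
echo "repeated: $repeated"
echo "single:   $single"
```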
tr (Translating Characters)
 Syntax: tr options expression1 expression2, with the data supplied on standard input (tr does not accept a filename argument)
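The three uses of tr (change, delete and squeeze characters) can be sketched on a single record. The variable names are illustrative; -d deletes characters and -s squeezes repeated ones.

```shell
record='2233|a.k.shukla|g.m|sales'

# Change: every | becomes a :
changed=$(printf '%s\n' "$record" | tr '|' ':')

# Change a whole class: lowercase to uppercase.
upper=$(printf '%s\n' "$record" | tr '[:lower:]' '[:upper:]')

# Delete: -d removes every dot.
deleted=$(printf '%s\n' "$record" | tr -d '.')

# Squeeze: -s collapses runs of spaces into one space.
squeezed=$(printf '%s\n' 'too    many   spaces' | tr -s ' ')

echo "$changed"
echo "$upper"
echo "$deleted"
echo "$squeezed"
```

Because tr reads only standard input, a file has to be supplied through a pipe or with the shell's < redirection.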
Regular Expression
 A regular expression provides the ability to match a “string of text” in a very
flexible and concise manner. A “string of text” can be a
single character, word, sentence or particular pattern of characters.
 Like the shell’s wildcards, which match similar filenames with a single
expression, grep uses an expression of a different sort to match a group of
similar patterns.
 [ ]: Matches any one of a set of characters
 [ ] with hyphen: Matches any one of a range of characters
 ^: The pattern following it must occur at the beginning of a line
 ^ inside [ ]: Matches any one character that is not in the set specified
 $: The pattern preceding it must occur at the end of a line
 . (dot): Matches any one character
 \ (backslash): Removes the special meaning of the character following it
 *: Zero or more occurrences of the previous character
 .* (dot followed by *): Nothing or any number of characters
grep command
 The grep filter searches a file for a particular pattern of
characters and displays all lines that contain that pattern.
The pattern searched for is referred to as the regular
expression (the name grep comes from the ed command
g/re/p: globally search for a regular expression and print).
 Syntax:
 grep [options] pattern [files]
Options Description
 -c : Prints only a count of the lines that match the pattern
 -h : Displays the matched lines, but not the filenames
 -i : Ignores case when matching
 -l : Displays a list of filenames only
 -n : Displays the matched lines and their line numbers
 -v : Prints all the lines that do not match the pattern
 -e exp : Specifies an expression; can be used multiple
times
 -f file : Takes patterns from file, one per line
 -E : Treats the pattern as an extended regular expression
(ERE)
 -w : Matches whole words only
 -o : Prints only the matched parts of a matching line, with
each such part on a separate output line
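A few of the less common options can be sketched together. This assumes mktemp is available; notes.txt and patterns.txt are made-up files. Note that -o is not part of the POSIX grep specification, although GNU and BSD grep both support it.

```shell
workdir=$(mktemp -d)        # scratch directory (assumes mktemp exists)
cd "$workdir" || exit 1
printf '%s\n' 'unix is great os.' 'learn operating system.' \
    'Unix linux which one you choose.' > notes.txt
printf '%s\n' 'linux' 'system' > patterns.txt

# -e given twice: a line matches if it contains either expression.
multi=$(grep -c -e 'great' -e 'choose' notes.txt)

# -f: the patterns are read from patterns.txt, one per line.
from_file=$(grep -c -f patterns.txt notes.txt)

# -o with -i: print only the matched text, ignoring case.
only=$(grep -o -i 'unix' notes.txt | tr '\n' ' ')

# -E: the | means "or" in an extended regular expression.
ere=$(grep -c -E 'great|learn' notes.txt)

echo "$multi $from_file $ere"
echo "$only"
```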
Example
 $ cat > geekfile.txt
unix is great os. unix is opensource. unix is free os.
learn operating system.
Unix linux which one you choose.
uNix is easy to learn.unix is a multiuser os.Learn unix .unix is a powerful.
 Case-insensitive search: The -i option searches for a
string case-insensitively in the given file. It
matches words like "UNIX", "Unix" and "unix".
 $ grep -i "UNix" geekfile.txt
 Output:
unix is great os. unix is opensource. unix is free os.
Unix linux which one you choose.
uNix is easy to learn.unix is a multiuser os.Learn unix .unix is a powerful.
 Displaying the count of matches: We
can find the number of lines that match the given
string/pattern.
 $ grep -c "unix" geekfile.txt
 Output:
2
 Displaying the filenames that match the pattern:
We can display just the files that contain the given
string/pattern.
 $ grep -l "unix" *
or
 $ grep -l "unix" f1.txt f2.txt f3.txt f4.txt
 Output:
geekfile.txt
 Checking for whole words in a file: By default,
grep matches the given string/pattern even if it is found as a
substring in a file. The -w option makes grep match
only whole words.
 $ grep -w "unix" geekfile.txt
 Output:
unix is great os. unix is opensource. unix is free os.
uNix is easy to learn.unix is a multiuser os.Learn unix .unix is a powerful.
 Showing line numbers with grep -n: The -n option
prefixes each matched line with its line number in the
file.
 $ grep -n "unix" geekfile.txt
 Output:
1:unix is great os. unix is opensource. unix is free os.
4:uNix is easy to learn.unix is a multiuser os.Learn unix .unix is a powerful.
 Inverting the pattern match: You can display the
lines that do not match the specified search string or
pattern using the -v option.
 $ grep -v "unix" geekfile.txt
 Output:
learn operating system.
Unix linux which one you choose.
 Matching the lines that start with a string: The ^
regular expression pattern specifies the start of a line.
This can be used in grep to match the lines which start
with the given string or pattern.
 $ grep "^unix" geekfile.txt
 Output:
unix is great os. unix is opensource. unix is free os.
 (a) [ ]: Matches any one of a set of characters
 $ grep "New[abc]" filename
specifies the search pattern as Newa, Newb or Newc
 $ grep "[aA]g[ar][ar]wal" filename
specifies the search pattern as Agarwal, Agaawal, Agrawal, Agrrwal,
agarwal, agaawal, agrawal or agrrwal
 (b) [ ] with hyphen: Matches any one of a range of characters
 $ grep "New[a-e]" filename
specifies the search pattern as Newa, Newb, Newc, Newd or Newe
 $ grep "New[0-9][a-z]" filename
specifies the search pattern as New followed by a digit and then a
lowercase letter: New0d, New4f etc.
 (c) ^: The pattern following it must occur at the beginning of a
line
 $ grep "^san" filename
searches for lines beginning with san, such as lines starting with
sanjeev, sanjay, sanrit, sanchit, sandeep etc.
 $ ls -l | grep "^d" Display list of directories only
 $ ls -l | grep "^-" Display list of regular files only
 (d) ^ inside [ ]: Matches any one character that is not in the set
specified
 $ grep "New[^a-c]" filename
specifies the pattern "New" followed by any character other than
'a', 'b' or 'c'
 $ grep "^[^a-zA-Z]" filename
searches for lines beginning with a non-alphabetic character
 (e) $: The pattern preceding it must occur at the end of a line
 $ grep "vedik$" file.txt
searches for lines ending with vedik
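The patterns in (a) to (e) can be checked against a small test file. This is a sketch that assumes mktemp is available; names.txt and its entries are made up so that each pattern has lines that match and lines that do not. The C locale is forced so that ranges such as [a-e] follow ASCII order.

```shell
LC_ALL=C; export LC_ALL     # ranges like [a-e] then follow ASCII order
workdir=$(mktemp -d)        # scratch directory (assumes mktemp exists)
cd "$workdir" || exit 1
printf '%s\n' 'Newa' 'Newd' 'New7x' 'Newz' 'sanjay' 'a sanjay' \
    '9 lives' 'shree vedik' > names.txt

set_count=$(grep -c 'New[abc]' names.txt)          # set: only Newa
range_count=$(grep -c 'New[a-e]' names.txt)        # range: Newa and Newd
digit_alpha=$(grep 'New[0-9][a-z]' names.txt)      # digit, then letter
line_start=$(grep '^san' names.txt)                # san at line start only
negated=$(grep 'New[^a-c]' names.txt | tr '\n' ' ')  # New + not a, b or c
non_alpha=$(grep '^[^a-zA-Z]' names.txt)           # first char not a letter
line_end=$(grep 'vedik$' names.txt)                # vedik at line end

echo "$negated"
```

The line "a sanjay" contains san but does not begin with it, so ^san skips it.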
egrep
 egrep, equivalent to grep -E, is the extended version of
grep. It uses ERE, the Extended Regular Expression set,
in which meta-characters such as +, ?, |, ( and ) keep
their special meaning without a preceding backslash, so
you are freed from the burden of escaping them as in grep.
 With egrep, even if you do not escape these
meta-characters, they are treated as special characters
rather than as part of the string.
 $ egrep -C 0 '(f|g)ile' check_file
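The difference can be seen by running the same pattern with and without -E. This sketch assumes mktemp is available; the contents of check_file are made up, with one line containing the pattern text literally.

```shell
workdir=$(mktemp -d)        # scratch directory (assumes mktemp exists)
cd "$workdir" || exit 1
printf '%s\n' 'file' 'gile' 'mile' '(f|g)ile' > check_file

# Plain grep (basic regular expressions): ( | ) are ordinary
# characters, so only the literal text "(f|g)ile" matches.
bre=$(grep '(f|g)ile' check_file)

# grep -E (extended regular expressions): ( | ) group and
# alternate, so the pattern means "file or gile".
ere=$(grep -E '(f|g)ile' check_file | tr '\n' ' ')

echo "basic:    $bre"
echo "extended: $ere"
```

grep -E is used here in place of egrep; the two forms accept the same patterns.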
 (f) . (dot): Matches any one character
 $ grep "..vik" file.txt
 $ grep "7..9$" file.txt
 (g) \ (backslash): Removes the special meaning of the character
following it
 $ grep "New\.\[abc\]" file.txt
specifies the search pattern as New.[abc]
 $ grep "S\.K\.Kumar" file.txt
specifies the search pattern as S.K.Kumar
 (h) *: Zero or more occurrences of the previous
character
 $ grep "[aA]gg*[ar][ar]wal" file.txt
 (i) .* (dot followed by *): Nothing or any number of characters
 $ grep "S.*Kumar" file.txt
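The dot, backslash and star can be compared on one made-up file. This is a sketch that assumes mktemp is available; people.txt and its entries are hypothetical.

```shell
workdir=$(mktemp -d)        # scratch directory (assumes mktemp exists)
cd "$workdir" || exit 1
printf '%s\n' 'S.K.Kumar' 'SxKyKumar' 'Sunil Kumar' 'Agarwal' \
    'aggarwal' 'Agrawal' 'Kumar S' > people.txt

# Unescaped dot matches any character: S.K.Kumar and SxKyKumar.
dot_count=$(grep -c 'S.K.Kumar' people.txt)

# Escaped dot matches only a real dot.
literal=$(grep 'S\.K\.Kumar' people.txt)

# g* allows zero or more extra g's after the first one.
star=$(grep '[aA]gg*[ar][ar]wal' people.txt | tr '\n' ' ')

# .* allows anything, or nothing, between S and Kumar.
any_count=$(grep -c 'S.*Kumar' people.txt)

echo "$literal"
echo "$star"
```

The line "Kumar S" is not matched by S.*Kumar because the S comes after Kumar.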
