How does grep handle DOS end of line?

Question

I have a Windows text file which contains a single line, terminated by CRLF:

aline

Here is the output of several commands:

[root@panel ~]# grep aline file.txt
aline
[root@panel ~]# grep aline$'\r' file.txt

[root@panel ~]# grep aline$'\r'$'\n' file.txt

[root@panel ~]# grep aline$'\n' file.txt
aline

The first command's output is as expected. I'm curious about the second and third outputs: why is each one an empty line? And for the last command, I expected it not to find the string, but it actually does. Why? The commands are run in bash on CentOS.

Since grep is line-oriented, I believe it strips the $'\n' from the pattern, since no line to match against it will contain a line feed. After that, it appears to be implementation-dependent, as the BSD grep that ships with macOS outputs aline for all four. — chepner, Commented Nov 19, 2020 at 15:48
Are you certain the file has CRLF line endings? od -c file.txt — glenn jackman, Commented Nov 19, 2020 at 16:52
yes @glennjackman [root@panel ~]# od -c file.txt 0000000 a l i n e \r \n 0000007 — William, Commented Nov 19, 2020 at 16:56
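To reproduce the setup, a file with exactly the bytes in that od -c dump can be created with printf. This is a minimal sketch assuming bash; file.txt is just the name used in the question, and cat -v is simply another way to make the CR visible (it renders it as ^M).

```shell
#!/usr/bin/env bash
# Recreate the question's file: one line, "aline", terminated by CR LF.
printf 'aline\r\n' > file.txt

# Expect the bytes a l i n e \r \n, 7 in total.
od -c file.txt

# cat -v shows the carriage return as ^M: prints aline^M
cat -v file.txt
```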

phuclv · Accepted Answer · 2021-04-21 04:48:43Z

In this case grep really does match the string "aline\r"; you just don't see it because it is erased by the ANSI escape sequences grep uses to print color. Pipe the output to od -c and you'll see:

$ grep aline file.txt
aline
$ grep aline$'\r' file.txt

$ grep aline$'\r' --color=never file.txt
aline
$ grep aline$'\r' --color=never file.txt | od -c
0000000   a   l   i   n   e  \r  \n
0000007
$ grep aline$'\r' --color=always file.txt | od -c
0000000 033   [   0   1   ;   3   1   m 033   [   K   a   l   i   n   e
0000020  \r 033   [   m 033   [   K  \n
0000030

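With --color=never you can see the output string because grep doesn't print any color codes: \r simply moves the cursor back to the start of the line, then the newline is printed, and nothing is overwritten. With --color=auto (which many distributions enable through a shell alias), grep checks whether its output goes to a terminal or to a pipe and, for a terminal, prints the matched string in color. As the od -c dump shows, the match, including its \r, is followed by ESC[m, which resets the color, and ESC[K, which erases from the cursor to the end of the line. Since the \r has already moved the cursor back to column 0, ESC[K wipes out the "aline" that was just printed, so you see an empty line.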
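A quick way to confirm that the match really happens, without depending on what the terminal draws, is to count matches or to make the CR visible. This is a sketch assuming bash and GNU grep; the last printf just replays the bytes from the --color=always dump above, so on a terminal it should show an empty line.

```shell
#!/usr/bin/env bash
printf 'aline\r\n' > file.txt

# -c prints only the count of matching lines, so no escape codes are involved: prints 1
grep -c $'aline\r' file.txt

# With color disabled, cat -v makes the CR visible: prints aline^M
grep --color=never $'aline\r' file.txt | cat -v

# Replay the bytes from the --color=always dump. On a terminal, \r returns
# to column 0 and ESC[K then erases to the end of the line.
printf '\033[01;31m\033[Kaline\r\033[m\033[K\n'
```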

To match \n, you can use the -z option, which makes the null byte the line separator:

$ grep -z aline$'\r'$'\n' --color=never file.txt
aline
$ grep -z aline$'\r'$'\n' --color=never file.txt  | od -c
0000000   a   l   i   n   e  \r  \n  \0
0000010
$ grep -z aline$'\r'$'\n' --color=always file.txt | od -c
0000000 033   [   0   1   ;   3   1   m 033   [   K   a   l   i   n   e
0000020  \r 033   [   m 033   [   K  \n  \0
0000031

Your last command, grep aline$'\n' file.txt, works because \n is simply a word separator in bash, so the command is just the same as grep aline file.txt. Exactly the same thing happened with the 3rd command, grep aline$'\r'$'\n' file.txt. To pass a newline, you must quote the argument to prevent word splitting:

$ echo "aline" | grep -z "aline$(echo $'\n')"
aline

To demonstrate the effect of quoting on the 3rd command, I added another line to the file:

$ cat file.txt
aline
another line
$ grep -z "aline$(echo $'\n')" file.txt | od -c
0000000   a   l   i   n   e  \r  \n   a   n   o   t   h   e   r       l
0000020   i   n   e  \n  \0
0000025
$ grep -z "aline$(echo $'\n')" file.txt
aline
another line
$

Great! I've never been aware of such a subtle detail about grep. — William, Commented Nov 21, 2020 at 2:55

tripleee · Accepted Answer · 2020-11-19 16:47:22Z

If the input is not well-formed, the behavior is undefined.

In practice, some versions of GNU grep use CR for internal purposes, so attempting to match it does not work at all, or produces really bizarre results.

For not entirely different reasons, passing in a literal newline as part of the regular expression could have some odd interpretations, including, but not limited to, interpreting the argument as two separate patterns. (Look at how grep -F reads from a file, and imagine that at least some implementations use the same logic to parse the command line.)
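The pattern-splitting behaviour is easy to see with a small input. This is a sketch assuming bash (for $'...') and GNU grep, whose manual documents that newlines in the pattern separate multiple patterns; other implementations may behave differently.

```shell
#!/usr/bin/env bash
# A single argument containing a newline: "foo<LF>bar".
# GNU grep splits it into two patterns, so this behaves like
# grep -e foo -e bar and prints foo and bar (but not baz).
printf 'foo\nbar\nbaz\n' | grep $'foo\nbar'
```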

In the grand scheme of things, the sane solution is to fix the input so it's a valid text file before attempting to run Unix line-oriented tools on it.
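One portable way to fix the input is to delete the carriage returns with tr (dos2unix does the same job where it is installed). This sketch assumes that CR only ever appears as part of a CRLF line ending, because tr -d '\r' removes every CR wherever it is; dos.txt and unix.txt are made-up names.

```shell
#!/usr/bin/env bash
# Hypothetical DOS-style input: both lines end in CR LF.
printf 'aline\r\nanother line\r\n' > dos.txt

# Remove every CR byte (assumes CR only appears in CRLF line endings).
tr -d '\r' < dos.txt > unix.txt

# Whole-line matching now works: prints aline
grep -x aline unix.txt

# The same search on the original file finds nothing: prints 0
grep -c -x aline dos.txt || true
```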

For quick and dirty solutions, some tools have well-defined semantics for random binary input. Perl is a model citizen in this respect.

bash$ perl -ne 'print if /aline\r$/' <<<$'aline\r'
aline

Awk also tends to work amicably, though there are several implementations, so the risk that somebody somewhere has a version which doesn't behave identically to AT&T Awk is higher.
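For comparison with the Perl one-liner, here is roughly the same test in awk, plus a common idiom for stripping the CR before further processing. This assumes a POSIX awk (\r is one of the escape sequences POSIX awk recognizes); the first command counts matches instead of printing the line, so the CR cannot mangle what you see.

```shell
#!/usr/bin/env bash
printf 'aline\r\n' > file.txt

# Count records that end in "aline" followed by CR: prints 1
awk '/aline\r$/ { n++ } END { print n + 0 }' file.txt

# Strip a trailing CR from each record, then print it: prints aline
awk '{ sub(/\r$/, ""); print }' file.txt
```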

Notice also that \r is the last character before the end of the line (the DOS line ending is the sequence CR LF, where LF is the standard Unix line terminator for text files).
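Because the CR sits right before the end of the line, anchoring \r with $ is a handy way to count DOS-style lines in a file. This is a sketch assuming bash and GNU grep; mixed.txt is a made-up file with two CRLF lines and one LF line. Given the caveat above about CR handling in some grep versions, treat it as a quick check rather than a guarantee.

```shell
#!/usr/bin/env bash
# Hypothetical test file: two CRLF-terminated lines and one LF-terminated line.
printf 'one\r\ntwo\r\nthree\n' > mixed.txt

# Count lines whose last character before the LF is a CR: prints 2
grep -c $'\r$' mixed.txt
```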

grep '$al\$\nin' <<<$'in\nal' returns nothing on Ubuntu, and in on MacOS. — tripleee, Commented Nov 19, 2020 at 16:48

LN2 · Accepted Answer · 2022-11-16 13:59:42Z

At least for me, phuclv's answer doesn't completely cover the last case, i.e. grep aline$'\n' file.txt. Your mileage may vary depending on which shell and which version and implementation of grep you are using, but for me grep -z "aline$(echo $'\n')" and grep -z aline$'\n' both just match the same pattern as grep -z aline.

This becomes more apparent if the -o switch is used, so that grep outputs only the matched string and not the entire line (which is the entire file for most text files when the -z option is used).

If you use the same file.txt as in phuclv's second example:

$ cat file.txt
aline
another line
$ grep -z "aline$(echo $'\n')" file.txt | od -c
0000000   a   l   i   n   e  \r  \n   a   n   o   t   h   e   r       l
0000020   i   n   e  \n  \0
0000025
$ grep -z -o "aline$(echo $'\n')" file.txt | od -c
0000000   a   l   i   n   e  \0
0000006
$ grep -z -o aline$'\n' file.txt | od -c
0000000   a   l   i   n   e  \0
0000006
$ grep -z -o aline file.txt | od -c
0000000   a   l   i   n   e  \0
0000006
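Part of the explanation is shell expansion rather than grep. In bash, command substitution strips all trailing newlines, so "aline$(echo $'\n')" is just "aline". By contrast, $'...' is a form of quoting, so aline$'\n' is not word-split and reaches grep as a single argument ending in a newline; what grep does with that newline is up to the implementation (GNU grep treats newlines in a pattern as pattern separators). A sketch assuming bash, using its printf %q builtin feature:

```shell
#!/usr/bin/env bash
# Command substitution strips ALL trailing newlines: echo $'\n' prints two
# newlines (the one from $'\n' plus echo's own), and both are removed.
pat="aline$(echo $'\n')"
printf '%q\n' "$pat"             # prints aline

# $'...' is quoting, so its result is not word-split: the unquoted word
# aline$'\n' still becomes exactly one argument, ending in a newline.
args=( aline$'\n' )
echo "${#args[@]}"               # prints 1
printf '%q\n' "${args[0]}"       # makes the trailing newline visible
```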

To actually match a \n as part of the pattern, I had to use the -P switch to turn on Perl-compatible regular expressions (PCRE):

$ grep -z -o -P 'aline\r\n' file.txt | od -c
0000000   a   l   i   n   e  \r  \n  \0
0000010
$ grep -z -o -P 'aline\r\nanother' file.txt | od -c
0000000   a   l   i   n   e  \r  \n   a   n   o   t   h   e   r  \0
0000017

For reference:

grep --version|head -n1
grep (GNU grep) 3.1

bash --version|head -n1
GNU bash, version 4.4.20(1)-release (x86_64-pc-linux-gnu)

Not the answer you're looking for? Browse other questions tagged bash grep carriage-return or ask your own question.
