I have a very big text file (~105 GB).
It includes a lot of <XXXX>
(angle brackets with text in between).
I want to remove those brackets and the text in between.
"sed" is your friend. I am assuming there are no nested brackets and that no bracketed text spans more than one line (sed works line by line).
Careful! With -i, this will overwrite your file.
sed -i 's/<[^>]*>//g' big_file
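If overwriting a 105 GB file in place feels risky, one option is to run the same substitution as a filter and write the result to a separate file. The sketch below is only an illustration: the function name strip_tags and the output name big_file.stripped are made up for this example, and it makes the same assumption as above (no nested brackets, nothing spanning lines).

```shell
# Hypothetical helper: read text on stdin, delete every <...> span,
# and write the result to stdout. The input file is never modified.
strip_tags() {
    sed 's/<[^>]*>//g'
}

# Try it on a small sample before touching the real file.
printf '%s\n' 'foo <bar> baz <qux>' 'no tags here' | strip_tags
```

Once the sample looks right, something like `strip_tags < big_file > big_file.stripped` would process the real file, provided there is enough free disk space for the output.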
If the file is XML and XMLStarlet is available:
$ cat file.xml
<root>
<tag attrib="hello">Hello world</tag>
<tag attrib="nice">Nice to see you</tag>
</root>
$ xmlstarlet sel -t -v / file.xml
Hello world
Nice to see you
This uses XMLStarlet to extract the string value of the root node, that is, the text content of all the nodes below it with the tags removed.