I have a very large text file (~105 GB) that contains many <XXXX> tags (angle brackets with text in between).

I want to remove those brackets along with the text between them.

  • Is the file in fact an XML file?
    – Kusalananda
    Commented Aug 16, 2018 at 10:13
  • Actually yes. @Kusalananda
    – LamaMo
    Commented Aug 16, 2018 at 10:14
  • Oops, actually not a dupe of that one.
    – Sparhawk
    Commented Aug 16, 2018 at 10:15
  • I tried 'Removing text between two specific strings', but I also want the angle brackets themselves removed, not just the text in between. @Sparhawk
    – LamaMo
    Commented Aug 16, 2018 at 10:18
  • The top answer there actually does remove the outer strings too, but the problem is that it will remove text between multiple sets of brackets too. (Hence my "oops".)
    – Sparhawk
    Commented Aug 16, 2018 at 12:21

2 Answers

1

"sed" is your friend. I am assuming there are no nested brackets.

Careful! This will overwrite your file.

sed -i 's/<[^>]*>//g' big_file
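
Since -i replaces the original, it may be safer to try the same expression on a small sample first and write the result to a separate file. The sketch below assumes GNU or POSIX sed; the file names sample.txt and sample_stripped.txt and the sample contents are made up for illustration. Note that sed works line by line, so a tag that opens on one line and closes on another would not be matched by this expression.

```shell
# Create a small sample file (hypothetical contents, for illustration only).
printf '%s\n' \
  '<root>' \
  '<tag attrib="hello">Hello world</tag>' \
  'plain <b>bold</b> text' \
  '</root>' > sample.txt

# Same substitution as above, but without -i: the input file is left
# untouched and the stripped text goes to a new file.
sed 's/<[^>]*>//g' sample.txt > sample_stripped.txt

# Inspect the result before running anything on the real file.
cat sample_stripped.txt
```

Lines that held nothing but a tag come out empty rather than being deleted. For the real file, the same redirect form (big_file in, a new file out) needs enough free disk space for a second copy; sed -i also writes a temporary copy before replacing the original, so it does not avoid that requirement.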
  • Doesn't work for me :(
    – LamaMo
    Commented Aug 16, 2018 at 13:01
  • What is not working? Wrong output?
    Commented Aug 16, 2018 at 14:15
1

Given an XML file and availability of XMLStarlet:

$ cat file.xml
<root>
<tag attrib="hello">Hello world</tag>
<tag attrib="nice">Nice to see you</tag>
</root>
$ xmlstarlet sel -t -v / file.xml

Hello world
Nice to see you

This uses XMLStarlet to output the string value of the root node, that is, the text content of the root and all of its descendant nodes, with the tags themselves left out.
