With awk, printing the first and last field, and including the second field only if it consists entirely of alphabetic characters:
awk '$2~/^[[:alpha:]]+$/ {print $1, $2, $NF; next} {print $1, $NF}' file.txt
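The same logic can be spelled out as an if/else, which may be easier to read and adapt. This is only a sketch: the function names sample and first_last are made up for illustration, and the sample lines are copied from the example file.

```shell
# Hypothetical helper: emits the three sample lines from file.txt.
sample() {
  printf '%s\n' \
    'comp.os.linux announce 0000002587 02190 m' \
    'comp.arch 00000 28874 y' \
    'utsa.cs.3423 00000000004 000000000001 y'
}

# Hypothetical helper: same behavior as the one-liner, written as if/else.
first_last() {
  awk '{
    if ($2 ~ /^[[:alpha:]]+$/) {
      # second field is purely alphabetic: keep it
      print $1, $2, $NF
    } else {
      # otherwise print only the first and last field
      print $1, $NF
    }
  }'
}

sample | first_last
```

Because print separates its arguments with the output field separator (a single space by default), the awk version never produces a double space when the second field is skipped.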
If you insist on using sed:
sed -E 's/^([^[:blank:]]+)[[:blank:]]+([[:alpha:]]+)?.*[[:blank:]]([^[:blank:]]+)$/\1 \2 \3/'
For lines whose second field is not purely alphabetic, this leaves two spaces between the first and last fields. You could tack on a second substitution to fix that:
sed -E 's/^([^[:blank:]]+)[[:blank:]]+([[:alpha:]]+)?.*[[:blank:]]([^[:blank:]]+)$/\1 \2 \3/; s/  / /'
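To see the double space and the effect of the cleanup, one option is to make spaces visible by piping the result through tr. This is a minimal sketch: the variable names line and re are made up for illustration, and the cleanup pattern is written with two spaces so that it collapses a double space into one.

```shell
# One sample line whose second field is numeric, so the optional group \2 matches nothing.
line='comp.arch 00000 28874 y'
re='s/^([^[:blank:]]+)[[:blank:]]+([[:alpha:]]+)?.*[[:blank:]]([^[:blank:]]+)$/\1 \2 \3/'

# Without the cleanup, the empty \2 leaves two adjacent spaces.
printf '%s\n' "$line" | sed -E "$re" | tr ' ' '_'             # prints comp.arch__y

# With the cleanup, the first double space on the line becomes a single space.
printf '%s\n' "$line" | sed -E "$re; s/  / /" | tr ' ' '_'    # prints comp.arch_y
```

The tr step is there only to make the spacing visible; drop it in real use.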
Example:
% cat file.txt
comp.os.linux announce 0000002587 02190 m
comp.arch 00000 28874 y
utsa.cs.3423 00000000004 000000000001 y
% awk '$2~/^[[:alpha:]]+$/ {print $1, $2, $NF; next} {print $1, $NF}' file.txt
comp.os.linux announce m
comp.arch y
utsa.cs.3423 y
% sed -E 's/^([^[:blank:]]+)[[:blank:]]+([[:alpha:]]+)?.*[[:blank:]]([^[:blank:]]+)$/\1 \2 \3/' file.txt
comp.os.linux announce m
comp.arch  y
utsa.cs.3423  y
% sed -E 's/^([^[:blank:]]+)[[:blank:]]+([[:alpha:]]+)?.*[[:blank:]]([^[:blank:]]+)$/\1 \2 \3/; s/  / /' file.txt
comp.os.linux announce m
comp.arch y
utsa.cs.3423 y