[techtalk] help with sorting text in a file
Chris J/#6
sixie at nccnet.co.uk
Sat May 13 20:48:02 EST 2000
Gene wrote:
>
> Try this ...
>
> cat file | grep 'QAA' | sed s/^.*QAA/QAA/ | sed s/:.*$//
>
> That's paraphrased, and I know there is a shorter way of doing it,
> but that should get the job done. Basically, it gets all the lines
> with QAA in them, then removes the text either side of the
> QAAnnnnn code. The "cat file" can be replaced with the actual
> output command, if the output is being filtered direct.
>
A shorter way would be:
grep 'QAA' file | sed -e 's/^.*QAA/QAA/' -e 's/:.*$//'
But to cut it even more,
grep 'QAA' file | sed 's/^.*\(QAA[0-9]*\):.*$/\1/'
As mentioned below by Gene though, cut and awk will do the job just as
admirably :) What the sed is doing is looking for the pattern QAA followed
by a run of digits. As the pattern for this, 'QAA[0-9]*', has been
surrounded by parentheses (which have been escaped), sed assigns any match a
place number. As it's the first place holder (actually, the only one in this
RE), it's given the number 1. This is referenced in the replace string as \1.
The '^.*' and ':.*$' portions of the RE are there to make sure the entire
line is matched, so everything outside the parentheses is discarded by the
substitution.
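To see the back-reference at work, here is a minimal sketch. The sample log line is made up for illustration (the real file format isn't shown in this thread); all that matters is that the QAA code is followed by a colon.

```shell
# Hypothetical sample line; the real log format is an assumption.
line='May 13 20:48:02:QAA12345:delivered'

# \( ... \) captures the QAA code, and \1 puts only that capture back.
id=$(printf '%s\n' "$line" | sed 's/^.*\(QAA[0-9]*\):.*$/\1/')
echo "$id"
```

Because '^.*' and ':.*$' swallow the rest of the line, only the captured code survives the substitution.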
But cut (i.e. "grep 'QAA' file | cut -d: -f4"), as mentioned by Gene below, is
probably a nicer way (it's certainly less expensive in terms of CPU load).
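A quick sketch of the cut version, again on a made-up sample line in which the code happens to be the fourth colon-separated field (that layout is an assumption, not something confirmed in the thread):

```shell
# Hypothetical sample line; the field layout is assumed for illustration.
line='May 13 20:48:02:QAA12345:delivered'

# -d: sets the delimiter to a colon, -f4 selects the fourth field.
id=$(printf '%s\n' "$line" | cut -d: -f4)
echo "$id"
```

Note that cut keeps any whitespace inside the field, so if the real lines have a space after the preceding colon it will show up in the output.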
One final way - potentially overkill, but I'll show it here anyhow - is to use
the shell's own functions to split the string and do the job of 'cut',
as in this shell script:
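#!/bin/sh
# Split each matching line on ':' and print the fourth field.
# "$1" is quoted so a file name containing ':' or spaces is not split.
IFS=':'
grep 'QAA' "$1" | while read -r date min sec id junk
do
    echo "$id"
done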
This technique is very useful if you want to parse a file into separate
components (e.g. a config file) without the need for huge chunks of 'cut',
'sed', or 'awk', as the line is split into variables at the start of each loop
*by the shell*.
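As a rough sketch of the config file case, assuming a simple "name=value" layout (the keys and values below are invented for the example):

```shell
#!/bin/sh
# Hypothetical name=value config, fed in through a here-document.
summary=""
while IFS='=' read -r key value
do
    # The shell has already split the line at the first '=' for us.
    echo "key: $key, value: $value"
    summary="$summary$key:$value;"
done <<'EOF'
host=example.org
port=25
EOF
echo "$summary"
```

Setting IFS only on the read command keeps the change local to that command, and feeding the loop by redirection rather than a pipe means variables set inside it are still there afterwards in most modern shells.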
Note that in all these cases I've avoided using 'cat', as it's a wasted
resource if you're piping it straight into another command. Unless you are
concatenating files, cat can be replaced by either:
command args file
or
command args < file
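For instance, the three forms below should give the same answer. This is a small sketch using a throwaway file with invented contents:

```shell
# Throwaway demo file; path and contents are made up for the example.
f="${TMPDIR:-/tmp}/qaa_demo.$$"
printf '%s\n' 'May 13 20:48:02:QAA12345:delivered' \
              'May 13 20:49:10:no id here' > "$f"

with_cat=$(cat "$f" | grep -c 'QAA')   # extra process just to feed grep
as_arg=$(grep -c 'QAA' "$f")           # file named as an argument
as_stdin=$(grep -c 'QAA' < "$f")       # file redirected onto stdin

echo "$with_cat $as_arg $as_stdin"
rm -f "$f"
```

The last two forms save starting a cat process and copying the data through a pipe.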
Just my 2p :)
Chris...
Gene continued to write:
>
> You can do just about any filtering with grep, sed, and regular
> expressions.
>
> Cut will work also, and probably produces a shorter line. You can
> also replace the 2 sed's with an 'awk', if you know awk syntax ...
>
> grep 'QAA' file | awk '{FS=":"; print $NF-1}'
>
> Also paraphrased, since I'm not near Linux/UNIX at the moment.
>
> Gene Dolgner
>
>
--
@}-,'-------------------------------------------------- Chris Johnson --'-{@
/ "(it is) crucial that we learn the difference / sixie at nccnet.co.uk \
/ between Sex and Gender. Therein lies the key / \
/ to our freedom" -- LB / www.nccnet.co.uk/~sixie \