[techtalk] help with sorting text in a file

Chris J/#6 sixie at nccnet.co.uk
Sat May 13 20:48:02 EST 2000


Gene wrote:
>
> Try this ...
>
> cat file | grep 'QAA' | sed s/^.*QAA/QAA/ | sed s/:.*$//
>
> That's paraphrased, and I know there is a shorter way of doing it,
> but that should get the job done.  Basically, it gets all the lines
> with QAA in them, then removes the text either side of the
> QAAnnnnn code.  The "cat file" can be replaced with the actual
> output command, if the output is being filtered direct.
>

A shorter way would be:
        grep 'QAA' file | sed -e 's/^.*QAA/QAA/' -e 's/:.*$//'

But to cut it even more,
        grep 'QAA' file | sed 's/^.*\(QAA[0-9]*\):.*$/\1/'

As Gene mentions below, though, cut and awk will do the job just as
admirably :) What the sed is doing is searching each line for QAA followed by
a run of digits. Because the pattern for this, 'QAA[0-9]*', is surrounded by
parentheses (which have been escaped), sed saves whatever it matches in a
numbered group. As it's the first group (actually, the only one in this RE),
it's given the number 1, and is referenced in the replacement string as \1.
The '^.*' and ':.*$' portions of the RE are there so that the match covers
the entire line, which means everything except the code gets replaced.
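
To see the back-reference at work, here is a minimal sketch that runs the
sed command above on one made-up line. The sample line and the names
'sample' and 'id' are my own assumptions, since the real file isn't shown in
this thread.

```shell
#!/bin/sh
# Hypothetical sample line; the real file format is not shown in this thread.
sample='May 13 20:48:02 mail sendmail[1234]: QAA01234: from=<user@example.com>'

# \( ... \) captures the QAA code, and \1 in the replacement puts it back,
# so the whole line is replaced by just the captured code.
id=$(printf '%s\n' "$sample" | grep 'QAA' | sed 's/^.*\(QAA[0-9]*\):.*$/\1/')

echo "$id"
```

On that sample line it prints QAA01234.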

But cut (i.e. "grep 'QAA' file | cut -d: -f4"), as mentioned by Gene below, is
probably a nicer way (it's certainly less expensive in terms of CPU load).
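
As a rough side-by-side, here is the cut version next to an awk one, again on
a made-up line (the sample data and variable names are assumptions, not the
real file). Note that awk reads $NF-1 as "last field minus one", so this
sketch uses $(NF-1) to pick the second-to-last field, and sets the separator
with -F: so that it is already in effect for the first line.

```shell
#!/bin/sh
# Hypothetical sample line; the real file format is not shown in this thread.
sample='May 13 20:48:02 mail sendmail[1234]: QAA01234: from=<user@example.com>'

# Fourth colon-separated field, as in the cut command above.
with_cut=$(printf '%s\n' "$sample" | grep 'QAA' | cut -d: -f4)

# Second-to-last colon-separated field; -F: sets the separator before
# the first line is read, and $(NF-1) selects the field itself.
with_awk=$(printf '%s\n' "$sample" | grep 'QAA' | awk -F: '{ print $(NF-1) }')

echo "cut: [$with_cut]"
echo "awk: [$with_awk]"
```

On this sample both give the same field, including the space that follows
the colon, which the sed version strips.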

One final way, potentially overkill but I'll show it here anyhow, is to use
the shell's own functions to split the string and do the job of 'cut',
as in this shell script:

        #!/bin/sh
        # IFS=: applies only to 'read', which splits each line on colons
        grep 'QAA' "$1" | while IFS=: read -r date min sec id junk
        do
                echo "$id"
        done

This technique is very useful if you want to parse a file into separate
components (e.g. a config file) without the need for huge chunks of 'cut',
'sed', or 'awk', as the line is split into variables at the start of each loop
*by the shell*.
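
For the config file case, a minimal sketch might look like this. The
name:host:port format, the sample entries and the parse_config function are
all invented for illustration.

```shell
#!/bin/sh
# Hypothetical config format: name:host:port, one entry per line.
config='web:example.com:80
mail:example.org:25'

parse_config() {
    # IFS=: applies only to read, so the rest of the script keeps the
    # default field separator.
    while IFS=: read -r name host port
    do
        echo "$name is at $host port $port"
    done
}

parsed=$(printf '%s\n' "$config" | parse_config)
echo "$parsed"
```

Each pass through the loop has the three fields ready in their own
variables, with no external command run per line.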

Note that in all these cases I've avoided using 'cat', as it's a wasted
resource if you're piping it straight into another command. Unless you are
concatenating files, cat can be replaced by either:
        command args file
or
        command args < file
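
As a quick check that the three forms give the same result, here is a small
sketch using grep. The file contents are made up, and it assumes mktemp is
available on your system to create the scratch file.

```shell
#!/bin/sh
# Hypothetical file contents, written to a temporary file for the demo.
tmp=$(mktemp)
printf '%s\n' 'one: QAA00001: x' 'no match here' > "$tmp"

with_cat=$(cat "$tmp" | grep 'QAA')   # extra cat process and pipe
as_arg=$(grep 'QAA' "$tmp")           # file passed as an argument
as_stdin=$(grep 'QAA' < "$tmp")       # file redirected to standard input

rm -f "$tmp"
echo "$as_arg"
```

All three variables end up holding the same matching line; only the first
form starts an extra process to get there.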

Just my 2p :)

Chris...


Gene continued to write:
>
> You can do just about any filtering with grep, sed, and regular
> expressions.
>
> Cut will work also, and probably produces a shorter line.  You can
> also replace the 2 sed's with an 'awk', if you know awk syntax ...
>
> grep 'QAA' file | awk '{FS=":"; print $NF-1}'
>
> Also paraphrased, since I'm not near Linux/UNIX at the moment.
>
> Gene Dolgner
>
>


-- 
@}-,'--------------------------------------------------  Chris Johnson --'-{@
    / "(it is) crucial that we learn the difference / sixie at nccnet.co.uk  \
   / between Sex and Gender. Therein lies the key  /                       \ 
  / to our freedom" -- LB                         / www.nccnet.co.uk/~sixie \ 






