[Techtalk] A (Difficult?) Regular Expression Construction Question

Julie txjulie at austin.rr.com
Sun Sep 7 18:46:36 EST 2003


Elizabeth Barham wrote:
> Dear Ladies and Gentlemen,
> 
>    I am trying to fathom a proper method within the constraints of
> regular-expressions of dealing with a rather complicated
> text-structure. For example,
> 
> TECHNOLOGY
> Internet, p. 20
> Routers, p. 35
> Techgear
> Apple's New Ipod, p. 21
> Compaq's new Ipaq, p. 12
> Some new PDA, p. 22
> Weatherquest's New PDA, p. 25
> UNIX, p. 30
> 
> Basically, this is a title (TECHNOLOGY), a series of entries
> (e.g. Internet, p. 20), and a sub-topic (Techgear) with its own
> entries. Note where the sub-topic's entries end, however: at UNIX,
> which breaks the ascending alphabetical order. E.g.:
> 
> TECHNOLOGY (title)
>   Internet, p. 20 (entry)
>   Routers, p. 35
>    Techgear (subtopic)
>      Apple's New Ipod, p. 21 (subtopic entry)
>      Compaq's new Ipaq, p. 12
>      Some new PDA, p. 22
>      Weatherquest's New PDA, p. 25
>   UNIX, p. 30 (entry)
> 
> I'm trying to come up with a regular expression that would correctly
> identify the "Techgear" and its subsequent entries by noting that the
> next-entry (UNIX) is *behind* the last of its own entries
> (Weatherquest), but I cannot think of a regular-expression that can do
> such a thing, which is a bit surprising because it seems like
> regular-expressions can do so much.
> 
> Any ideas on how to deal with it with regular-expressions?

I could suggest regular expressions, but I don't see what distinguishes
the "subtopic entry" Weatherquest from the "entry" UNIX: both lines have
the same "name, p. number" form.  If you can write out the
distinguishing characteristics of each entry type, then perhaps you
could derive the regular expressions more easily.
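
To make the point concrete, here is a small sketch in Python. The
pattern and the name line_shape are my own assumptions, inferred only
from the sample above: an "entry" is any line ending in ", p. <number>",
and anything else is a heading.  It shows that a regular expression can
tell headings from entries, but sees Weatherquest and UNIX as the same
kind of line.

```python
import re

# Assumed shape of an entry line, inferred from the sample: "<name>, p. <number>"
ENTRY_RE = re.compile(r"^.+, p\. \d+$")


def line_shape(line):
    """Classify a single line by its shape alone (hypothetical helper)."""
    return "entry" if ENTRY_RE.match(line.strip()) else "heading"


if __name__ == "__main__":
    for text in ("Techgear", "Weatherquest's New PDA, p. 25", "UNIX, p. 30"):
        print(text, "->", line_shape(text))
```

Looking at one line at a time, nothing separates the last subtopic
entry from the first entry after it.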

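The other thing is that you can't check something like "alphabetical
order" with a regular expression.  A pattern can describe what a single
line looks like, but it cannot compare one line with the line before it.
For that you need something which can keep state information between
lines.  You might want to look at Perl or Awk.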
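
As a rough illustration of "keeping state", here is a sketch in Python
(rather than Perl or Awk).  Everything in it is an assumption drawn from
the sample: the first heading is the title, any later heading opens a
subtopic, and a subtopic ends when an entry sorts alphabetically before
the previous one.  The names parse_index and SAMPLE are made up for the
example.

```python
import re

# Assumed entry shape, inferred from the sample: "<name>, p. <number>"
ENTRY_RE = re.compile(r"^(?P<name>.+), p\. (?P<page>\d+)$")

SAMPLE = """\
TECHNOLOGY
Internet, p. 20
Routers, p. 35
Techgear
Apple's New Ipod, p. 21
Compaq's new Ipaq, p. 12
Some new PDA, p. 22
Weatherquest's New PDA, p. 25
UNIX, p. 30
"""


def parse_index(lines):
    """Return (kind, text) pairs, tracking state from one line to the next."""
    result = []
    seen_title = False
    in_subtopic = False
    prev_key = None  # sort key of the previous entry inside the subtopic

    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        match = ENTRY_RE.match(line)
        if match is None:
            # A line without a page number: the title first, then subtopics.
            if not seen_title:
                seen_title = True
                result.append(("title", line))
            else:
                in_subtopic = True
                prev_key = None
                result.append(("subtopic", line))
            continue

        key = match.group("name").casefold()  # case-insensitive comparison
        # The stateful step: an entry that sorts *before* the previous
        # subtopic entry is taken to belong to the enclosing topic again.
        if in_subtopic and prev_key is not None and key < prev_key:
            in_subtopic = False
        if in_subtopic:
            prev_key = key
            result.append(("subtopic entry", line))
        else:
            result.append(("entry", line))
    return result


if __name__ == "__main__":
    for kind, text in parse_index(SAMPLE.splitlines()):
        print(f"{kind:15} {text}")
```

Note the limit of this heuristic: if the entry following a subtopic
happens to sort after the subtopic's last entry, nothing in the text
marks the boundary and the script will miss it.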
-- 
Julianne Frances Haugh             Life is either a daring adventure
txjulie at austin.rr.com                  or nothing at all.
					    -- Helen Keller


