[Techtalk] A (Difficult?) Regular Expression Construction Question

Elizabeth Barham lizzy at soggytrousers.net
Mon Sep 8 16:13:54 EST 2003


Kai writes:

> This is pretty much my feeling, too. Whether it's a homework
> question or a real-world situation, the delineation between "indent
> levels" should be made by something other than alphabetical order.
> 
> Indeed, why not just use leading tabs or spaces? (Personally, I'd
> prefer tabs, because the indents are then easier to see.) Then, you
> just count the number of leading tabs, and that's your indent level.

There *is* a pattern, and it's a matter of figuring it out and then
going from there. As for leading tabs/spaces, unfortunately that isn't
a luxury I have in this case. And while, yes, on the one hand I could
say, "Hey, indent those lines for me," on the other hand my job is to
make theirs as painless and easy as possible.

> I think the parsing as it's currently defined (and vaguely defined,
> at that) is basically dependent on judgement calls by humans -- and
> judgement calls that are easily mistaken, due to the vagueness in
> the data definition.

Yes, there needs to be an intermediate step where the user can go
through the data in an easy-to-understand way and make corrections.
But, again, the point of the regex stuff is to make that intermediate
step as easy as possible; ideally, it would correctly parse and
interpret the data from the get-go.

An even better method would be to write a GUI application in which the
user creates the index and well-rendered output is generated for
whatever media they would like. The problem here is that the primary
client uses MS Word, and I've found that the general computer user has
no desire to learn a new application if they can avoid it: "I've been
doing it in MS Word and that's what I'm comfortable with."

From what I understand, though, MS Word and the other Office
components are storing or will store their data in XML, so this
shouldn't be much of an issue in the future.

Elizabeth

