[Courses] Automata.

Mary linuxchix at puzzling.org
Wed May 22 22:10:16 EST 2002


This was originally a post to techtalk, Sonja recommended a repost here.

This is under the Open Publication Licence as per
http://www.opencontent.org/openpub/ so that it can go on the webpage.

Introduction:

Regular expressions ("regexps") are patterns that you
specify that mean things like "find any occurance of three or more 'a'
characters in a row in this text" or "replace any occurance of a
capitalised word with the word 'fruitbat'"

They're much more powerful than standard search and replace utilites
that you find in word processors and so on, because you don't have to
match contants, instead you can match things like "any sequence of
numbers".

A lot of the more powerful UNIX editors (vi and emacs at least) allow
you to use regular expressions, and there are also command line
utilities that use them (sed, grep and procmail, the mail sorting
program).

You should be able to learn regexps without understanding automata, but
if you're interested in how they work beneath the hood, as it were,
here's a basic intro to automata.

Automata and regular expressions:

Telsa said on IRC that she's stuck in a part of the O'Reilly regexps
book on DFAs and NFAs. I explained it thus:

DFA and NFA (automatas) are "inside the bonnet" level concepts [with
regard to using regular expressions] though.

The automata is just a series of states, something like: "start here. If
the next letter is an X, move into state 2, if it is a Y move into state
3".  And then states 2 and 3 have their own list of "if the next letter
is a P, go to state 5" and so on.

So you have your input text and you follow the instructions letter by
letter.  There are special states called "acceptance states".  If you're
at the end of the input, and your current state is an acceptance state,
then your text has matched the regexp.  Otherwise it has not matched the
regexp.

Now in a DFA (D for deterministic), there is only one possible
instruction to follow.  Eg, you're in state 2, you see a P, there's only
one instruction for what to do when you see a P.

In an NFA (N for non-deterministic), there might be more than one choice
- "if you see a P, you can go to state 5 or state 6" So then you need to
make a choice.  You go to state 5 and keep working.  But if you fail,
you go back at some point and try state 6 instead.

So, when you're programming a regular expression interpreter, you
"translate" the regexp into a automata.  But you won't be programming
the interpreter, any more than you have to write your own kernels to use
Linux.

Telsa commented re NFAs:

Oh, this is like me with crosswords :)
The answer here might be owl or bat, and that gives me a letter
in another clue... if it was owl, then...

-Mary.



More information about the Courses mailing list