[techtalk] switch function in C (or how to read commandline args?)

Penguina penguina at cosyn.co.nz
Thu Jul 5 06:40:46 EST 2001


On Tue, 3 Jul 2001, Conor Daly wrote:
> On Mon, Jul 02, 2001 at 08:17:58PM -0500 or so it is rumoured hereabouts,
> Jeff Dike thought:
> > conor.daly at oceanfree.net said:
> > > Now that's interesting.  So how do I tokenise the strings?

There are a few good references on doing this.  strtok() is a
reasonable place to start, but if you're planning on writing more
than one parser in your life, you might want to read the O'Reilly
(ORA) book on lex and yacc.

> > OK, let's say you have an interpreted language, and a program being
> > interpreted has this in an inner loop:
> > 	foo;
> > 	bar;
> > 	baz;
> >
> > If you represent pieces of this program with strings, you'll end up with an
> > array like [ "foo", "bar", "baz" ] for that loop body.
> >
> > The interpreter will need to do something like this for each iteration of that
> > loop:
> > 	if(!strcmp(instruction, "foo")) do_foo();
> > 	else if(!strcmp(instruction, "bar")) do_bar();
> > 	else if(!strcmp(instruction, "baz")) do_baz();
> > 	else if(!strcmp(instruction, "hoo")) do_hoo();
> > 	else if(!strcmp(instruction, "ha")) do_ha();
> >
> > which will be slow.
> >
> > So, what's normally done instead is that when the program is read in, a
> > conversion like this happens:
> > 	"foo;" -> FOO_INSTR
> > 	"bar;" -> BAR_INSTR
> > 	"baz;" -> BAZ_INSTR

This is where strtok() comes in handy: it splits the input into
separate words, each of which can then be converted to a constant.
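A minimal sketch of that read-in step, assuming the instruction words are separated by semicolons and whitespace. The enum, lookup() and tokenize() are hypothetical names for illustration. The point is that strcmp() is paid once per word at load time, not on every pass through the loop.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical instruction tokens, named in the FOO_INSTR style
   used in the quoted message. */
enum instr { BAD_INSTR = -1, FOO_INSTR, BAR_INSTR, BAZ_INSTR };

/* Map one word to its numerical token.  This is the only place
   a string comparison happens. */
static enum instr lookup(const char *word)
{
    if (strcmp(word, "foo") == 0) return FOO_INSTR;
    if (strcmp(word, "bar") == 0) return BAR_INSTR;
    if (strcmp(word, "baz") == 0) return BAZ_INSTR;
    return BAD_INSTR;
}

/* Split src on ';' and whitespace with strtok() and store one token
   per word in out[].  strtok() writes into its argument, so src must
   be a writable buffer, not a string literal.  Returns the number of
   tokens stored (at most max), or -1 on an unknown word. */
static int tokenize(char *src, enum instr *out, int max)
{
    const char *delims = "; \t\n";
    int n = 0;

    assert(src != NULL && out != NULL);
    for (char *word = strtok(src, delims); word != NULL;
         word = strtok(NULL, delims)) {
        enum instr t;

        if (n == max)
            break;              /* never write past the end of out[] */
        t = lookup(word);
        if (t == BAD_INSTR)
            return -1;
        out[n++] = t;
    }
    return n;
}
```

Called on a buffer holding "foo; bar; baz;", tokenize() fills out[] with FOO_INSTR, BAR_INSTR, BAZ_INSTR and returns 3.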

> So, I can do something like:
>
> #define FORCE_SWITCH "--force"
> #define CONFIG_SWITCH "-C"
>
> and so on?

No.
That only makes the constants strings, which still
require the slow strcmp processing shown above.
When you tokenize something, you map each string
to a *numerical* constant (a token).  Several
different strings can also map to the same token,
so I could say

#define HOLA   23
#define HELLO  23
#define AMIGO  24
#define FRIEND 24

HOLA AMIGO would parse to 23 24 as would HELLO FRIEND --
which makes sense, since they mean the same thing.
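One way to get that many-to-one mapping is a small keyword table searched once per word while reading the input. This is only a sketch: the table layout and token_for() are illustrative names, and the token values are the ones from the #defines above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Token values as in the message: HOLA and HELLO share 23,
   AMIGO and FRIEND share 24. */
#define HOLA   23
#define HELLO  23
#define AMIGO  24
#define FRIEND 24

/* Keyword table: several spellings may map to one token. */
static const struct {
    const char *word;
    int token;
} keywords[] = {
    { "HOLA",   HOLA   },
    { "HELLO",  HELLO  },
    { "AMIGO",  AMIGO  },
    { "FRIEND", FRIEND },
};

/* Return the token for word, or -1 if it is not in the table. */
static int token_for(const char *word)
{
    assert(word != NULL);
    for (size_t i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strcmp(word, keywords[i].word) == 0)
            return keywords[i].token;
    return -1;
}
```

With this table, "HOLA AMIGO" and "HELLO FRIEND" both come out as 23 24, and everything after this point only ever sees the numbers.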

> > where *_INSTR are #defined constants or enum constants.
> >
> > So, now the inner loop is represented as this array:
> > 	[ FOO_INSTR, BAR_INSTR, BAZ_INSTR]
> >

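To make the payoff concrete, here is a sketch of an interpreter running such an array, assuming a hypothetical three-instruction set. The counters stand in for do_foo(), do_bar() and do_baz(), which the quoted message leaves undefined. Each step is a switch on an integer, with no string comparison at run time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical instruction set named after the quoted example. */
enum instr { FOO_INSTR, BAR_INSTR, BAZ_INSTR };

/* Counters stand in for real work so the effect can be checked. */
static int foo_count, bar_count, baz_count;

static void do_foo(void) { foo_count++; }
static void do_bar(void) { bar_count++; }
static void do_baz(void) { baz_count++; }

/* Execute the tokenized loop body `iterations` times.  Dispatch is
   an integer switch, which the compiler can turn into a jump table. */
static void run(const enum instr *body, size_t len, int iterations)
{
    for (int i = 0; i < iterations; i++) {
        for (size_t pc = 0; pc < len; pc++) {
            switch (body[pc]) {
            case FOO_INSTR: do_foo(); break;
            case BAR_INSTR: do_bar(); break;
            case BAZ_INSTR: do_baz(); break;
            default: assert(!"unknown instruction");
            }
        }
    }
}
```

Running the body { FOO_INSTR, BAR_INSTR, BAZ_INSTR } 1000 times calls each handler 1000 times.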
You couldn't do the following if you had written
#define HOLA "hola", because a case label must be an
integer constant, not a string:

> > And the interpreter can be implemented using a switch:
> >
> > 	switch(instruction){
> > 	case FOO_INSTR:
> > 		do_foo();
> > 		break;
> > 	case BAR_INSTR:
> > 		do_bar();
> > 		break;
>
> and
>
> 	case FORCE_SWITCH:
> 		force=1;
> 		break;
> 	case CONFIG_SWITCH:
> 		strcpy(config_file_name, argv[2]);
> 		break;
>

won't work if you've defined

> #define FORCE_SWITCH "--force"
> #define CONFIG_SWITCH "-C"

as above, because a switch only works on integers:
each case label must be an integer constant, and C
won't compare strings there.  "Tokenizing" means
turning each string into a numerical constant first.
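Applied to the original question, a sketch might look like the following. The enum, opt_token(), struct settings and parse_args() are hypothetical names; treating "-f" as a synonym for "--force" is an assumption, there only to show two strings sharing one token. The file name is taken from the argument after -C rather than from a fixed argv[2], and snprintf() bounds the copy where strcpy() would not. Note that for argv the shell has already split the words, so no strtok() is needed here.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical option tokens: numbers, so they can be case labels. */
enum opt { OPT_UNKNOWN, OPT_FORCE, OPT_CONFIG };

/* Convert one argument string to its token.  "-f" is an assumed
   second spelling of "--force". */
static enum opt opt_token(const char *arg)
{
    if (strcmp(arg, "--force") == 0 || strcmp(arg, "-f") == 0)
        return OPT_FORCE;
    if (strcmp(arg, "-C") == 0)
        return OPT_CONFIG;
    return OPT_UNKNOWN;
}

struct settings {
    int force;
    char config_file_name[256];
};

/* Fill *s from the command line.  Returns 0 on success, or -1 on an
   unknown option or a -C with no file name after it. */
static int parse_args(int argc, char *argv[], struct settings *s)
{
    assert(s != NULL);
    for (int i = 1; i < argc; i++) {
        switch (opt_token(argv[i])) {
        case OPT_FORCE:
            s->force = 1;
            break;
        case OPT_CONFIG:
            if (i + 1 >= argc)
                return -1;      /* -C given with no file name */
            /* snprintf() truncates long names instead of overflowing */
            snprintf(s->config_file_name, sizeof s->config_file_name,
                     "%s", argv[++i]);
            break;
        default:
            return -1;
        }
    }
    return 0;
}
```

The caller should zero-initialise the struct settings first.  For real programs, the POSIX getopt() function does this kind of short-option parsing (such as -C) for you.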

You can do it by hand with strtok, or use tools like
lex and yacc to do it systematically.  One advantage
of lex and yacc for parsing text input to programmes
is that you can also avoid, more systematically, the
buffer overflows (real security risks) that can arise
when you process lots of strings in an ad hoc fashion.
