[Courses] [C] Beginner's Lesson: How do I...?

Andrew Edgecombe andrew.edgecombe at spheresystems.com.au
Mon Dec 16 13:28:23 EST 2002


Morning :-)
strtok() will certainly work for what you're trying to do, but it's
probably overkill. I'd suggest (as Akkana has) writing the token parsing
function yourself. (At the very least, it'll be another good exercise.)

The disadvantage of using strtok() on its own is that it doesn't tell you
which delimiter it found, so you have no way of telling the difference
between the end of a field and the end of a record (assuming that you'd
like to display them differently).
Of course, if you assume that you've always got 4 fields in each record,
it's OK, but... :-)

In the example you've given, altering the input string is not a problem,
so the part of the strtok() warning about it modifying its input can be
ignored. That's good, because it's a useful technique to learn about.

(background stuff - ignore if you wish)
In C, a string is just a bunch of characters with a 0x00 on the end. So,
any bunch of characters can be treated as a string by tacking a 0 byte
on the end.
Or, to look at it another way, if you took a string and overwrote a few
of its characters with 0 bytes, you would actually have a bunch of
shorter strings.
strtok() uses this as the basis for its tokenising. It takes a string,
searches it for any one of a set of delimiter characters, and substitutes
a 0 for the character it finds. So you're left with a shorter string,
terminated at the position of the delimiter character.
(end of background stuff)

So, an example would be...
#include <stdio.h>
#include <string.h>

#ifndef FALSE
#define FALSE (0)
#endif
#ifndef TRUE
#define TRUE (1)
#endif

int main( void )
	{
	char records[] = "rec1 fld1\t rec1 fld2\t rec1 fld3\t rec1 fld4\n"
		"rec2 fld1\t rec2 fld2\t rec2 fld3\t rec2 fld4\n"
		"rec3 fld1\t rec3 fld2\t rec3 fld3\t rec3 fld4\n";
	char *stringPtr;
	int stringIdx;
	unsigned char foundToken;
	unsigned char complete;
	unsigned char foundEndOfRecord;
	
	/*
		We wish to search through a field until we find either a tab or a newline.
		Substitute a NUL (a 0 byte) for that character, and print out that field as a string.
		If the terminator was a newline, then we print out a newline as well.
	*/
	stringPtr = records;
	complete = FALSE;
	do
		{
		stringIdx = 0;
		foundToken = FALSE;
		foundEndOfRecord = FALSE;
		do
			{
			/*
				Check out what kind of character this was.
			*/
			switch( stringPtr[ stringIdx ] )
				{
				case '\n':
					foundEndOfRecord = TRUE;
					// no break statement here - intend to fall through to '\t' case
				case '\t':
					foundToken = TRUE;
					stringPtr[ stringIdx ] = 0x00;
					break;
				case 0x00:
					complete = TRUE;
					foundToken = TRUE;
					break;
				default:
					stringIdx++;
					break;
				}
			}
		while( !foundToken );

		/*
			If we found a field, print it out. At the end of the input string
			there is only a field to print if some characters came before the
			terminating 0 (i.e. the input did not end with a newline).
		*/
		if( !complete || stringIdx > 0 )
			{
			/*
				Was the field that we found the last of a record? Or was it just an ordinary field?
				(A field that runs into the end of the input also ends its record.)
			*/
			if( foundEndOfRecord || complete )
				{
				// handle the end of a record
				printf("%s\n", stringPtr);
				}
			else
				{
				// handle an ordinary field
				printf("%s\t", stringPtr);
				}
			}

		if( !complete )
			{
			/*
				Since there is more stuff to process, update our string pointer to point to the next string segment
			*/
			stringPtr = &(stringPtr[ stringIdx + 1 ] );
			}
		}
	while( !complete );
	return 0;
	}

-- 
Andrew Edgecombe <andrew.edgecombe at spheresystems.com.au>



