[prog] learning c

Almut Behrens almut-behrens at gmx.net
Sun Mar 28 17:36:59 EST 2004


On Sat, Mar 27, 2004 at 06:59:50AM -0800, Alice Moomlyn wrote:
> Hello Everybody.
> 
> I am teaching myself c and would like it if anybody
> who has time would be able to look over what I've
> written and give me some feedback.
> 
> At the moment I'm trying to get my head around strings
> and pointers so I've written a little program that
> reads in a string and reverses it (using two different
> methods) and then prints it out again.
> 
> I don't really know what I'm doing so any and all
> comments/suggestions/criticisms would be very helpful!
>

Hi Alice,

I must admit, I was rather impressed while looking through your code.
For someone claiming that she doesn't know what she's doing, this is
done pretty well.  Feel flattered :)  Honestly, I mean it!

However, as with almost any C program, there are, of course, some
subtle bugs ;)  Don't worry, even experienced programmers are making
the same errors from time to time. That's why at least every second
security issue brought up has in one way or another to do with buffer
overflows...

Most of them are the typical "off-by-one" errors:

(a) when you have a string of length 'size', and you want to
zero-terminate it with

   result[size] = '\0';

you always need a buffersize 'size + 1' when allocating the buffer

   if ((result = malloc((size+1)*sizeof(char))) == NULL)

because, when size==20, result[size] in fact refers to the 21st element
in the array, as indices are counted from zero.
You were, of course, well aware of this - just as any programmer - still,
these things do happen again and again...

This needs to be fixed in alice_getstring() (2x), alice_reverse1() and
alice_reverse2().

The function alice_reverse1() should, for example, then read:

  char *alice_reverse1(char* string, int size)
  {
     char* result;
     int i, j;
  
     if ((result = malloc((size+1)*sizeof(char))) == NULL)
     {
         printf("Out of memory!");
         return NULL;
     }
  
     for (i=0, j=size-1; i < size; i++, j--)
     {
         result[i] = string[j];
     }
  
     result[size] = '\0';
     return result;
  }


(b) another common thing: using the same variable/value for different
purposes, e.g. an integer as a boolean value in an if-condition, and at
the same time as an array index. Done so in alice_endline(): a return
value of 0 is supposed to mean FALSE (not found). Unfortunately, 0 also
is a valid array index, in which case it's supposed to be treated as
TRUE. This conflict brings the program to stop its service when you
enter a string of exactly length==bufsize-1 (i.e. 19). The newline char
\n then happens to be placed in the first array element [0]...

If you want to use the same variable for both purposes (which is
convenient, no doubt), I'd suggest you use -1 as the return value of
alice_endline() to indicate FALSE - this will never collide with a
legal array index.


A purely syntactic issue:

Only statements need to be terminated by semicolons, not the control
structure elements like the {}-blocks used for grouping statements
(as usually used in combination with if/else, while, for, function
definitions, etc.).  I.e. it suffices to write

  while (cond) {
    statement;
    statement;
  }
  nextstatement;

A semicolon directly after the } doesn't do much harm, but doesn't help
much either (it's just an empty statement).

Another minor thing I saw (probably a typo): there was a semicolon
directly after the 'else' near the end of alice_length(). That's not
supposed to be there ('cos the else-branch then just consists of an
empty statement). Currently, it doesn't make much of a difference
whether you have it there or not (the net-effect accidentally is the
same in the overall context). But in case you should start modifying
the code, it could make quite a difference...


Okay, how did I find those bugs?  Ubergeek or what? - Bullshit!  I was
just using the right tool: valgrind.

This is a very neat program for locating all sorts of memory-related
errors, like accessing undefined variables, memory outside of allocated
buffers, and much more.  To be able to do that, it runs the program in
a CPU emulated by software. Sounds complex? It is, but it works rather
well.  More info at http://valgrind.kde.org/
If you want to keep programming in C(++), do yourself a real favor - go
and get it now!

The deeper meaning of its output is not always immediately evident, but
in case of memory-related bugs, _any_ information is better than none.
Anyway, compile your program with debugging information (option -g) and
then run it in valgrind:

$ valgrind [valgrind-options] <your-program-with-options>

It then spits out any violations, together with the corresponding line
numbers in your sourcecode - plus various statistics. Really useful.

Note that it _doesn't_ do an abstract semantic analysis of your
program, though. It just runs it, and reports any errors occuring (many
of which would remain unseen during regular usage of the program).
But you still need to supply the appropriate test cases (data) that
make the errors show up in the first place...

This is where creativity comes into play (and experience to some
degree).  The "input universe" is usually too large to test
exhaustively, so you must draw some samples intelligently. Always come
up with test cases which are maximally "nasty" to the program, i.e. use
sizes exactly matching internal buffersizes (or multiples thereof), or
buffersize-/+1, or any other magic value which is brought into
existence by way of the various constants you put in your code -- you
get the idea...


BTW, upon first running your orignal version, I got, for example

  Enter a string of any length: 
  Alice Moomlyn
  
  This is the string you entered:
  Alice Moomlyn
  
  Reversed using array notation:
  ylmooM ecilA
  
  Reversed using pointer notation:
  ylmooM ecilA

(Note that in the reversed string, the 'n' from Moomlyn is missing...)

This was already a strong indication of at least one off-by-one error.
The details were then located via valgrind.


> 
> P.S If anybody knows how to instruct yahoo NOT to add
> its own newline characters to cut-and-pasted text,
> please let me know, or would I be better off sending
> example code as an attatchment?

There is a way, IIRC -- accidentally I still have an old yahoo
account... okay, just checked it:  via

  Mail Options -> General Preferences -> Screen Width

you can change the max. width for outgoing (and separately for
incoming) messages. The maximum value is 99, but that's in any case
more useful than the default 55.


Happy coding :)

Almut


P.S. a final meta comment:  in order to get more feedback here (and
elsewhere), try to pose _specific_ questions, or point at certain
errors you encountered. For example, you could have asked "why does
the first character of the reversed string get lost?". I bet you'd
have gotten at least 5 responses on that one. Very general requests
for feedback tend to offer a low incentive for the time-stressed geek.
At least _I_ feel more inclined to respond, when there's something
catchy, and it looks like I can find out and answer it in a reasonable
amount of time. Could be that I'm not the only chick feeling that way.


More information about the Programming mailing list