[Courses] [C] Beginner's Lessons: preprocessor details

KWMelvin kwmelvin at intrex.net
Wed Nov 13 17:37:05 EST 2002

C preprocessor details attached.

admin at surgo.net requested more info about the C preprocessor.

The following is from K. N. King's _C Programming: A Modern Approach_,
Chapter 14, "The Preprocessor" (W. W. Norton, 1996).

The preprocessor is a piece of software that edits C programs just
prior to compilation.  Its reliance on a preprocessor makes C (along
with C++) unique among major programming languages.

The preprocessor is a powerful tool, but it also can be a source of
hard-to-find bugs.  Moreover, the preprocessor can easily be misused
to create programs that are almost impossible to understand. Modern C
programming style calls for decreased reliance on the preprocessor.

How the preprocessor works
The behaviour of the preprocessor is controlled by DIRECTIVES: commands
that begin with a hash character (#). We've encountered two of these
already, the #include and the #define.

The #define directive defines a MACRO -- a name that represents
something else, typically a constant of some kind. The preprocessor
responds to a #define directive by storing the name of the macro
together with its definition.  When the macro is used later in the
program, the preprocessor "expands" the macro, replacing it by its
defined value.

The #include directive tells the preprocessor to open a particular
file and "include" its contents as part of the file being compiled.
For example, the line
    #include <stdio.h>
instructs the preprocessor to open the file named stdio.h and bring
its contents into the program.  It works like this:

C Program -> Preprocessor -> Modified C Program -> Compiler -> Program

The input to the preprocessor is a C program, possibly containing
directives.  The preprocessor executes these directives, removing them
in the process.  The output of the preprocessor is another C program:
an edited version of the original program, containing no directives.
The preprocessor's output goes directly into the compiler, which checks
the program for errors and translates it to object code (machine
instructions).

To see what the preprocessor does, let's apply it to a sample program:

/* Converts a Fahrenheit temperature to Celsius */

#include <stdio.h>

#define FREEZING_PT 32.0
#define SCALE_FACTOR (5.0 / 9.0)

int main(void)
{
  float fahrenheit, celsius;
  printf("Enter Fahrenheit temperature: ");
  scanf("%f", &fahrenheit);
  celsius = (fahrenheit - FREEZING_PT) * SCALE_FACTOR;
  printf("Celsius equivalent is: %.1f\n", celsius);
  return 0;
}

After preprocessing, the program may have the following appearance:

Blank line
Blank line
Lines brought in from stdio.h
Blank line
Blank line
Blank line
Blank line
int main(void)
{
  float fahrenheit, celsius;
  printf("Enter Fahrenheit temperature: ");
  scanf("%f", &fahrenheit);
  celsius = (fahrenheit - 32.0) * (5.0 / 9.0);
  printf("Celsius equivalent is: %.1f\n", celsius);
  return 0;
}

The preprocessor responded to the #include directive by bringing in the
contents of stdio.h, which is not shown here because of its length. The
preprocessor also removed the #define directives and replaced FREEZING_PT
and SCALE_FACTOR wherever they appeared later in the file.  Notice that
the preprocessor doesn't remove lines containing directives; instead, it
simply makes them empty.

As this example shows, the preprocessor does a bit more than just execute
directives. In particular, it replaces each comment with a single space
character. Some preprocessors go further and remove unnecessary
white-space characters, including spaces and tabs at the beginning of
indented lines.

On MY Debian system, the preprocessor is called `cpp' and it can be run
on a source code file (or any other file) by simply supplying it with 
an input file and an output file: 

    $ cpp input_file output_file

(The actual source code with the changes was at the very bottom of the
 output_file, which was over 3500 lines long!  Using vi, I `dd'd all
 the extra lines, so just the source code was left.)

You can use this to experiment with what the preprocessor does, as well
as to look at the preprocessor output before compiling a program.

Caution: The C preprocessor is quite capable of creating illegal programs
as it executes directives. Often the original program looks fine, making
errors harder to find. In complicated programs, examining the output of
the preprocessor may prove useful for locating this kind of error.

Most preprocessor directives fall into one of three categories:

1] Macro definition.  The #define directive defines a macro; the #undef
                      directive removes a macro definition.

2] File inclusion.    The #include directive causes the contents of a
                      specified file to be included in a program.

3] Conditional compilation.  The #if, #ifdef, #ifndef, #elif, #else,
                      and #endif directives allow blocks of text to be
                      either included in or excluded from a program,
                      depending on conditions that can be tested by the
                      preprocessor.

The remaining directives -- #error, #line, and #pragma -- are more
specialized and therefore used less often.

Let's look at a few rules that apply to ALL directives:

* Directives always begin with the hash (#) symbol.  The # symbol does
  not need to be at the beginning of a line, as long as only white
  space precedes it. After the # comes the name of the directive, followed
  by any other information the directive requires.

* Any number of spaces and horizontal tab characters may separate the
  tokens in a directive.  For example, the following directive is legal:
  #    define    N    100

* Directives always end at the first new-line character, unless explicitly
  continued.  To continue a directive to the next line, we must end the
  current line with a backslash (\) character.  For example, the following
  directive defines a macro that represents the capacity of a hard disk,
  measured in bytes:

  #define DISK_CAPACITY  (SIDES *             \
                          TRACKS_PER_SIDE *   \
                          SECTORS_PER_TRACK * \
                          BYTES_PER_SECTOR)

* Directives can appear anywhere in a program.  Although we usually put
  #define and #include directives at the beginning of a file, other
  directives are more likely to show up later, even in the middle of
  function definitions.

* Comments may appear on the same line as a directive. In fact, it's good
  practice to put a comment at the end of a macro definition to explain
  the macro's significance:

  #define FREEZING_PT  32.0          /* Freezing point of water */
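
The rules above can be seen together in one small sketch.  The drive
geometry values, the BYTES_PER_KB macro, and the function name
disk_capacity_kb are made up for illustration; they are assumptions,
not figures for any real disk.

```c
/* Hypothetical drive geometry, chosen only for illustration. */
  #  define   SIDES          2     /* '#' may be indented and spaced out */
#define TRACKS_PER_SIDE    80
#define SECTORS_PER_TRACK  18
#define BYTES_PER_SECTOR   512

/* A backslash at the end of a line continues the directive. */
#define DISK_CAPACITY  (SIDES *             \
                        TRACKS_PER_SIDE *   \
                        SECTORS_PER_TRACK * \
                        BYTES_PER_SECTOR)

/* Returns the capacity in kilobytes. */
long disk_capacity_kb(void)
{
/* Directives may appear in the middle of a function definition. */
#define BYTES_PER_KB 1024L
    return DISK_CAPACITY / BYTES_PER_KB;
}
```

With these made-up values the capacity works out to 1474560 bytes, so
disk_capacity_kb() should return 1440.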

The definition of a simple macro has the form:

      #define   identifier   replacement-list

replacement-list is any sequence of C tokens; it may include identifiers,
keywords, numbers, character constants, string literals, operators, and
punctuation.  When it encounters a macro definition, the preprocessor
makes a note that `identifier' represents replacement-list; wherever
identifier appears later in the file, the preprocessor substitutes
replacement-list.

Don't put any extra symbols in a macro definition -- they'll become
part of the replacement list.  Putting the = symbol in a macro definition
is a common error:

  #define N = 100   /*** WRONG ***/

  int a[N];         /* becomes  int a[= 100]; */

Ending a macro definition with a semicolon is another popular mistake:

  #define N 100;    /*** WRONG ***/

  int a[N];         /* becomes  int a[100;]; */
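
Here is a minimal sketch of the correct form.  The two mistakes appear
only inside comments so that the block still compiles; the array name
table and the helper table_length are hypothetical names chosen for
this example.

```c
#include <stddef.h>

#define N 100       /* correct: no '=' and no trailing ';' */

/* #define N = 100     would turn the declaration into  int table[= 100];  */
/* #define N 100;      would turn the declaration into  int table[100;];   */
static int table[N];

/* Number of elements in table, computed from its size. */
size_t table_length(void)
{
    return sizeof table / sizeof table[0];
}
```

Because the replacement list is just the token 100, the declaration
becomes  static int table[100];  and table_length() should report 100.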

Using #define to create names for constants has several significant
advantages:

* It makes programs easier to read.  The name of the macro--if well
  chosen--helps the reader understand the meaning of the constant.
  The alternative is a program full of "magic numbers" that can easily
  mystify the reader.

* It makes programs easier to modify.  We can change the value of a
  constant throughout a program by modifying a single macro definition.
  "Hard-coded" constants are much harder to change, especially since
  they sometimes appear in a slightly altered form.

* It helps avoid inconsistencies and typographical errors. If a numerical
  constant like 3.14159 appears many times in a program, chances are it
  will occasionally be written 3.1416 or 3.14195 by accident.

* It helps control conditional compilation.  Macros play an important
  role in controlling conditional compilation, as we'll see later.  For
  example, the following line in a program might indicate that it's to be
  compiled in "debugging mode", with extra statements included to
  produce debugging output:

      #define DEBUG

  It is legal for a macro's replacement list to be empty.
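
As a rough illustration of the readability and maintenance points
above, compare a named constant with a "magic number".  The macro names
and the function to_seconds are invented for this sketch.

```c
#define SECONDS_PER_MINUTE 60
#define SECONDS_PER_HOUR   3600

/* Without the macros this would read  hours * 3600 + minutes * 60,
   leaving the reader to guess what 3600 and 60 stand for. */
long to_seconds(long hours, long minutes)
{
    return hours * SECONDS_PER_HOUR + minutes * SECONDS_PER_MINUTE;
}
```

If either value ever had to change, only the #define line would need
editing, not every expression that uses it.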

The definition of a parameterized macro has the form:

  #define  identifier(x1, x2, ..., xn)  replacement-list

where x1, x2, ..., xn are identifiers (the macro's parameters). The
parameters may appear as many times as desired in the replacement list.

There must be NO SPACE between the macro name and the left parenthesis.
If space is left, the preprocessor will assume that we're defining a
simple macro, with (x1, x2, ..., xn) part of the replacement list.


For example, suppose that we've defined the following macros:

#define    MAX(x,y)    ((x)>(y)?(x):(y))
#define    IS_EVEN(n)  ((n)%2==0)

Now suppose that the following statements appear later in the program:

 i = MAX(j+k, m-n);
 if (IS_EVEN(i)) i++;

The preprocessor will replace these lines with:

 i = ((j+k)>(m-n)?(j+k):(m-n));
 if (((i)%2==0)) i++;
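
To try this out, the two statements can be wrapped in a small function.
This is only a sketch; the function name larger_made_odd is
hypothetical and not part of the original text.

```c
#define MAX(x,y)    ((x)>(y)?(x):(y))
#define IS_EVEN(n)  ((n)%2==0)

/* Mirrors the two statements in the text: take the larger of j+k and
   m-n, then add one to the result if it is even. */
int larger_made_odd(int j, int k, int m, int n)
{
    int i;

    i = MAX(j+k, m-n);
    if (IS_EVEN(i)) i++;
    return i;
}
```

For instance, larger_made_odd(1, 2, 10, 3) compares 3 with 7 and
returns 7, while larger_made_odd(2, 2, 1, 0) compares 4 with 1 and
returns 5.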

As this example shows, parameterized macros often serve as simple functions.
MAX behaves like a function that computes the larger of two values.
IS_EVEN behaves like a function that returns 1 if its argument is an
even number and 0 otherwise.  Here's a more complicated macro that
behaves like a function:

  #define TOUPPER(c) ('a'<=(c)&&(c)<='z'?(c)-'a'+'A':(c))

This macro tests whether the character c is between 'a' and 'z'.  If so,
it produces the upper-case version of c by subtracting 'a' and adding
'A'.  If not, it leaves c unchanged.
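
A possible use of TOUPPER, sketched under the assumption that the
lower-case letters are contiguous in the character set (as they are in
ASCII).  The function name upcase is made up for this example.

```c
#define TOUPPER(c) ('a'<=(c)&&(c)<='z'?(c)-'a'+'A':(c))

/* Converts s to upper case in place and returns s.  The pointer is
   advanced in the for statement, not in the macro argument, because
   the macro evaluates its argument more than once. */
char *upcase(char *s)
{
    char *p;

    for (p = s; *p != '\0'; p++)
        *p = TOUPPER(*p);
    return s;
}
```

Characters outside 'a' through 'z', such as digits and punctuation, are
left as they were.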

Here are some more rules for macros:

* A macro's replacement list may contain invocations of other macros.
  For example, we could define the macro TWO_PI in terms of the macro PI:
      #define     PI  3.14159
      #define TWO_PI  (2*PI)

  When the preprocessor encounters TWO_PI later in the program, it
  replaces it with (2*PI).  The preprocessor then RESCANS the replacement
  list to see if it contains invocations of other macros (PI in this case).
  The preprocessor will scan the replacement list as many times as needed
  to eliminate all macro names.

* The preprocessor replaces only entire tokens, not portions of tokens.
  As a result, the preprocessor ignores macro names that are embedded
  in identifiers, character constants, and string literals. Example:

  #define  SIZE  256

  int BUFFER_SIZE;

  if (BUFFER_SIZE > SIZE)
     puts("Error: SIZE exceeded");

  After preprocessing, these lines look like this:

  int BUFFER_SIZE;

  if (BUFFER_SIZE > 256)
     puts("Error: SIZE exceeded");

* A macro definition normally remains in effect until the end of the
  file in which it appears.  The preprocessor doesn't obey normal
  scope rules.  A macro defined inside a function definition isn't local
  to that function; it remains defined until the end of the file.

* Macros may be "undefined" by the #undef directive. The #undef directive
  has the form:  #undef identifier
  where identifier is a macro name. Example:  #undef N
  removes the current definition of the macro N. (If N hasn't been
  defined as a macro, the #undef directive has no effect.)  One use
  of the #undef is to remove the existing definition of a macro so
  that it can be given a new definition.
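
The four rules above can be exercised in a single file.  What follows
is a sketch only: the SCALE macro and all of the function names are
invented for the example.

```c
#define     PI  3.14159
#define TWO_PI  (2*PI)   /* replaced by (2*PI), then rescanned to expand PI */

#define SIZE 256

/* SIZE is replaced only where it stands alone as a token: not inside
   the identifier BUFFER_SIZE and not inside the string literal. */
const char *check_buffer(int BUFFER_SIZE)
{
    if (BUFFER_SIZE > SIZE)
        return "Error: SIZE exceeded";
    return "ok";
}

double circumference(double radius)
{
#define SCALE 10         /* defined inside a function ... */
    return TWO_PI * radius;
}

int scaled(int n)
{
    return n * SCALE;    /* ... yet still defined here */
}

#undef SCALE             /* remove the old definition before redefining */
#define SCALE 100

int rescaled(int n)
{
    return n * SCALE;    /* uses the new definition */
}
```

Here scaled(3) should give 30 and rescaled(3) should give 300, since
each function sees whichever definition of SCALE was in effect at that
point in the file.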

The C preprocessor recognizes a number of directives that support
conditional compilation--the inclusion or exclusion of a section
of program text depending on the outcome of a test performed by the
preprocessor.

Suppose we're in the process of debugging a program. We'd like the
program to print the values of certain variables, so we put calls
of printf() in critical parts of the program. Once we've located the
bugs, it's often a good idea to let the printf() calls remain, just
in case we need them later.  Conditional compilation allows us to
leave the calls in place, but have the compiler ignore them when we
make the production version.

Here's how we'll proceed.  We'll first define a macro and give it
a nonzero value:

  #define DEBUG 1

The name of the macro doesn't matter. Next, we'll surround each group
of printf() calls by an #if-#endif pair:

  #if DEBUG
  printf("Value of i: %d\n", i);
  printf("Value of j: %d\n", j);
  #endif

During preprocessing, the #if directive will test the value of DEBUG.
Since its value isn't zero, the preprocessor will leave the two calls
of printf() in the program (the #if-#endif lines will disappear, though).
If we change the value of DEBUG to zero and recompile the program, the
preprocessor will remove all four lines from the program.  The compiler
won't see the calls of printf(), so they won't occupy any space in the
object code and won't cost any time when the program is run. We can
leave the #if-#endif blocks in the final program, allowing diagnostic
information to be produced later (by recompiling with DEBUG set to 1).
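
Put together in a complete function, the technique might look like the
following sketch.  The function sum_to and the wording of the message
are assumptions made for this example.

```c
#include <stdio.h>

#define DEBUG 1          /* change to 0 to compile the tracing out */

/* Sums the integers 1..n and returns the total.  The printf call is
   part of the compiled program only when DEBUG is nonzero. */
int sum_to(int n)
{
    int i, total = 0;

    for (i = 1; i <= n; i++) {
        total += i;
#if DEBUG
        printf("Value of i: %d, total so far: %d\n", i, total);
#endif
    }
    return total;
}
```

The return value is the same either way; only the diagnostic output
depends on the value of DEBUG.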

The #ifdef directive tests whether an identifier is currently defined
as a macro:
  #ifdef identifier

Using #ifdef is similar to using #if:

  #ifdef identifier
  lines to be included if identifier is defined as a macro
  #endif

The #ifndef directive is similar to #ifdef, but tests whether an
identifier is NOT defined as a macro:

  #ifndef identifier

#if, #ifdef, and #ifndef blocks can be nested just like ordinary `if'
statements. When nesting occurs, it's a good idea to use an increasing
amount of indentation as the level of nesting grows.  Some programmers
put a comment on each closing #endif to indicate what condition the
matching #if tests:

  #if DEBUG
  . . . .
  . . . .
  #endif  /* DEBUG */

#elif and #else can be used in conjunction with #if, #ifdef, or #ifndef
to test a series of conditions:

  #if expr-1
  lines to be included if expr-1 is nonzero
  #elif expr-2
  lines to be included if expr-1 is zero but expr-2 is nonzero
  #else
  lines to be included otherwise
  #endif

Although the #if directive is shown above, the #ifdef or #ifndef
directive can be used instead.  Any number of #elif directives--
but at most one #else-- may appear between #if and #endif.
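
A small sketch of such a chain follows.  The LOG_LEVEL macro, its three
values, and the function name are made up for illustration.

```c
#define LOG_LEVEL 2      /* hypothetical setting: 0, 1, or 2 */

/* Exactly one of the three return statements survives preprocessing. */
const char *log_level_name(void)
{
#if LOG_LEVEL == 0
    return "quiet";
#elif LOG_LEVEL == 1
    return "normal";
#else
    return "verbose";
#endif
}
```

With LOG_LEVEL set to 2, neither the #if nor the #elif condition is
nonzero, so the #else section is the one that gets compiled.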

I hope this helps!

Happy Programming!
