[prog] Passing data between modules

Mon Jul 28 12:54:20 EST 2003

>I'm planning a suite of data quality control programs in C to run under
>*NIX.  I want to keep everything as modular as possible in order to make the
>suite flexible and futureproof.

Hold it: you just stated two conflicting goals here. You want to write 
the program in C, and you want to make it modular, flexible and futureproof.

Sorry to be so negative, but if you want to be futureproof, at least use 
C++. (Of course, you may have meant C generally, including C++.) Why is it 
so important to use C?

I recommend a higher-level interpreted language: the program will be done 
five times faster and will be at least twice as readable, not to mention 
easier to change and less buggy. If you plan to do lots and lots of text 
processing, I recommend Perl. Otherwise, I recommend Python.

>I'm concerned with passing data between the
>modules and how to do this in a portable, accessible manner.

This is something that C is really bad at. At least use C++.

>To effectively use modules, each module should be a stand-alone program (I
>may be wrong here).

If this system has to be available asynchronously in realtime (i.e., it's 
not just a couple of people in a room entering data), it might make sense 
to make the IO module a daemon program. Otherwise, I don't see a good 
reason to do this. And even if you want the data in realtime, most database 
programs (e.g., MySQL or Postgres) handle asynchronous access 
automatically, so the IO module doesn't have to worry about it.

Other modules might be standalone programs, but they can be linked to each 
other statically. For example, you might have a TEST engine with a thin 
wrapper that executes on the command line, while still compiling the TEST 
module into the FIX module so that the FIX module can figure out what it 
needs to fix.
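
As a rough sketch of that layout in C: the TEST engine is an ordinary
function, the command-line wrapper only parses arguments and calls it, and
FIX calls the same function directly. Every name here (record_t,
test_record, fix_record, test_cli) is made up for illustration, and the
0..100 validity rule is an arbitrary assumption, not anything from your
suite.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical record type; the real suite would define its own. */
typedef struct {
    int    id;
    double value;
} record_t;

/* TEST engine: returns 0 if the record passes, nonzero otherwise.
   The 0..100 range is an assumed rule, purely for illustration. */
int test_record(const record_t *r)
{
    if (r->value < 0.0 || r->value > 100.0)
        return 1;
    return 0;
}

/* FIX module: compiled together with the TEST engine, so it can ask
   directly whether a record needs fixing. Returns 1 if it changed the
   record, 0 if it left the record alone. */
int fix_record(record_t *r)
{
    if (test_record(r) == 0)
        return 0;
    if (r->value < 0.0)
        r->value = 0.0;      /* clamp to the assumed lower bound */
    else
        r->value = 100.0;    /* clamp to the assumed upper bound */
    return 1;
}

/* Thin command-line wrapper around the TEST engine.
   Expected arguments: <id> <value>. Returns the TEST result,
   or 2 on a usage error. */
int test_cli(int argc, char **argv)
{
    record_t r;

    if (argc != 3) {
        fprintf(stderr, "usage: %s <id> <value>\n", argv[0]);
        return 2;
    }
    r.id = atoi(argv[1]);
    r.value = strtod(argv[2], NULL);
    return test_record(&r);
}
```

The standalone TEST program's main() would then do nothing but
"return test_cli(argc, argv);", while the FIX program links in the same
test_record() and never has to spawn a second process.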

If you really want to make all the modules separate programs (which I don't 
necessarily recommend), you might try using Unix domain sockets for 
communication. This adds a lot of overhead (because all communications have 
to be serialized and deserialized), and in C it may make it more difficult 
to change the data structures later (because you have to change the 
serialization algorithm each time you change a data structure), but it's 
a flexible means of message passing. And with minimal effort, you could later 
change the program to use TCP sockets, allowing the different modules to 
run on different machines.
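
To give a feel for the serialization work involved, here is a minimal
sketch, assuming a made-up two-field message (msg_t) and a fixed 8-byte
wire format. The message layout and all function names are hypothetical.
The demo sends through socketpair() inside one process only to keep the
example short; in a real suite the two ends would belong to separate
programs.

```c
#include <arpa/inet.h>    /* htonl, ntohl */
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical message passed between modules. */
typedef struct {
    uint32_t record_id;
    uint32_t status;
} msg_t;

#define MSG_WIRE_SIZE 8

/* Serialize into a fixed layout in network byte order, so the wire
   format does not depend on struct padding or host endianness. */
void msg_pack(const msg_t *m, unsigned char buf[MSG_WIRE_SIZE])
{
    uint32_t id = htonl(m->record_id);
    uint32_t st = htonl(m->status);
    memcpy(buf, &id, 4);
    memcpy(buf + 4, &st, 4);
}

/* Inverse of msg_pack. */
void msg_unpack(const unsigned char buf[MSG_WIRE_SIZE], msg_t *m)
{
    uint32_t id, st;
    memcpy(&id, buf, 4);
    memcpy(&st, buf + 4, 4);
    m->record_id = ntohl(id);
    m->status = ntohl(st);
}

/* Write exactly len bytes, looping over short writes.
   Returns 0 on success, -1 on error. */
static int write_all(int fd, const unsigned char *p, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes, looping over short reads.
   Returns 0 on success, -1 on error or early end of stream. */
static int read_all(int fd, unsigned char *p, size_t len)
{
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n <= 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

int msg_send(int fd, const msg_t *m)
{
    unsigned char buf[MSG_WIRE_SIZE];
    msg_pack(m, buf);
    return write_all(fd, buf, sizeof buf);
}

int msg_recv(int fd, msg_t *m)
{
    unsigned char buf[MSG_WIRE_SIZE];
    if (read_all(fd, buf, sizeof buf) != 0)
        return -1;
    msg_unpack(buf, m);
    return 0;
}

/* Demo: round-trip one message through a connected pair of
   Unix domain sockets. Returns 0 on success, -1 on failure. */
int roundtrip_demo(const msg_t *in, msg_t *out)
{
    int sv[2];
    int rc;

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;
    rc = msg_send(sv[0], in);
    if (rc == 0)
        rc = msg_recv(sv[1], out);
    close(sv[0]);
    close(sv[1]);
    return rc;
}
```

Note that msg_send() and msg_recv() only ever see a file descriptor, which
is why a later move to TCP would mostly touch the connection setup. The
flip side is that adding a field to msg_t means touching msg_pack(),
msg_unpack() and MSG_WIRE_SIZE together.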

>Am I mad?

No more than any other programmer.   :-)

>In over my head?

Maybe. I often have a tendency to jump into a project, only to drop it a 
week later because I see something that seems more important. If you're 
going to do this, go ahead and do it. But ask yourself seriously if 
you're in it for the long haul. If this is something you intend to do in 
your spare time, it's probably not a very high-priority item, so chances 
are something higher priority is going to come along pretty soon. A single 
finished project is better than ten unfinished projects.

Oh, by the way, unit testing is your friend. Test even the smallest 
functions. In addition to catching mistakes, the tests allow you to be 
reasonably certain of where the bugs are NOT, so that you can concentrate 
on looking for bugs in the right places. And they give you peace of mind.   :-)
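
Even without a test framework, plain assert() gets you a long way. A
minimal sketch, assuming a hypothetical helper in_range() of the kind a
quality-control suite might have:

```c
#include <assert.h>

/* Hypothetical helper: is v within the closed interval [lo, hi]? */
int in_range(double v, double lo, double hi)
{
    return v >= lo && v <= hi;
}

/* Tests for in_range(). assert() aborts the program on the first
   failure, so reaching the return means every check passed.
   Caveat: compiling with -DNDEBUG turns assert() into a no-op. */
int run_in_range_tests(void)
{
    assert(in_range(5.0, 0.0, 10.0));    /* interior point */
    assert(in_range(0.0, 0.0, 10.0));    /* lower boundary is inclusive */
    assert(in_range(10.0, 0.0, 10.0));   /* upper boundary is inclusive */
    assert(!in_range(-0.1, 0.0, 10.0));  /* just below the range */
    assert(!in_range(10.1, 0.0, 10.0));  /* just above the range */
    return 0;
}
```

Call run_in_range_tests() from a small test program that you build and
run every time you compile.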

-- 
    I will follow the dictum of the greatest book-marketing genius
    who has ever lived, Mao Tse-tung: If you don't own two copies of
    my little red book, you die.
          - novelist T. Coraghessan Boyle, joking about what he
            would do if he were dictator of the United States