[prog] Sample implementations of UNIX utilities.

Robert J. Hansen cortana at earthlink.net
Sun Dec 29 16:43:06 EST 2002


Warning: strong opinions herein.  All of these opinions have several
lines of reasoning to back them up, and many of them come with hilarious
stories of failed software projects.  :)  But don't mistake a strong
opinion, even one with great reasoning to back it up, for truth.

> OK, ante upping somewhat here - once again, the opinions expressed within are 
> my own, and not necessarily those of my employ-oops, wrong disclaimer. But 
> you get the idea.

I do so love a vigorous debate.  :)

> Oh, a theoretician are we? You know, that explains a lot - your examples of 

No, I'm a cryptographer who specializes in computer security.  That
means I'm either a mathematician who has to deal with the Real World, or
a computer scientist who has to constantly whap his friends upside the
head and remind them that computers literally cannot do what they want
them to do.  :)

In crypto, math is _important_.  If you don't have the math on your
side, you're nothing but meat for the people who do have math on their
side.  And likewise, programming skills are important; the best math
background out there doesn't help you out one bit if you can't translate
it into working _and reliable_ code.

As a general rule, most cryppies live and breathe for functional
languages.  LISP, particularly.  LISP is reducible to set theory, and
Bertrand Russell did wonderful work at the turn of the century building
all known mathematics from set theory.  The upshot of this is that
turning math equations into LISP code is a very painless process.

I'm certainly not suggesting LISP is the be-all end-all, by the by.  I'm
bringing this up to show the problem domain I spend most of my time in. 
I'll let you figure out how badly my exposure to the problem domain has
addled my brain.  :)

> "real-world" applications seem to be mostly mathematical problems, and you're 
> completely right when you say that C sucks at pure maths, expecially next to 

Not necessarily.  Number-crunching, C is good at; number _theory_, it's
time to shift to another language.

> languages like ML (I know that much). But your original post wasn't talking 
> about mathematical problem-solving, it was talking about systems programming, 
> which is far closer to C's strong spots. And about the lamda calculus - that 

Define "systems programming", please.  I've had this discussion at
length with other people, and they all seem to have different ideas of
what systems programming means.

My number one objection to C is its reliability.  Namely, it has very
little--there are too many places a programmer can make a subtle error. 
I once spent four days chasing a bad pointer over four compilation units
and three separate libraries (in telephone-switching software for a
major U.S. phone carrier).  That experience was ten kinds of not fun.

In many industrial C programs, there are literally _tens of thousands_
of pointers.  Now, I think I'm a pretty good C programmer, but there's
no way I could honestly give an assurance that I'd checked every one of
those tens of thousands of pointers to make sure none were dereferenced
after being NULLed, that every one was free()d, etc.  Automated tools
like Purify and valgrind help out a lot, but they can only get you 90%
of the way.  The remaining 10% only come out with great pain and
suffering.
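
To make that concrete, here's a minimal sketch (hypothetical names,
nothing to do with the actual switching code) of the kind of subtle
pointer error I mean: the memory is freed in one place and quietly
dereferenced in another.  The compiler accepts it without a murmur; the
damage shows up later, often in a different compilation unit.

    #include <stdlib.h>
    #include <string.h>

    struct session {
        char *peer_name;
    };

    static void teardown(struct session *s)
    {
        free(s->peer_name);        /* memory released here...              */
        /* s->peer_name = NULL;       ...but forgetting this line leaves a
                                      dangling pointer behind               */
    }

    static size_t log_peer(const struct session *s)
    {
        /* ...and it's dereferenced here, long after teardown() ran.
         * Whether this crashes, returns garbage, or "works" depends on
         * the allocator's mood that day.                                 */
        return strlen(s->peer_name);
    }

    int main(void)
    {
        struct session s;
        s.peer_name = malloc(32);
        if (s.peer_name == NULL)
            return 1;
        strcpy(s.peer_name, "trunk-07");

        teardown(&s);
        return (int)log_peer(&s);  /* use-after-free: undefined behavior */
    }

Multiply that by tens of thousands of pointers spread across dozens of
compilation units and you get the four-day debugging session.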

I don't know if you remember this or not, but back in 1990 the entire
North American AT&T network went down for nine hours due to a programmer
who forgot a "break;" at the end of a case statement.  

60,000 people were left without any phone service whatsoever; all of
North America was without AT&T long-distance.  70 million phone calls
went uncompleted--some of them phone calls to ambulance crews, fire
departments, police stations.  $60 million in revenue was lost; total
second-order damages (to businesses affected by the phone outage) are
estimated at something on the order of $1 billion.  And even
worse than that--people died.

See: http://www.dmine.com/phworld/history/attcrash.htm
See also: http://www.cs.berkeley.edu/~nikitab/courses/cs294-8/hw1.html
See also: Kuhn, D. Richard.  "Sources of Failure in the Public Switched
Telephone Network".  IEEE Computer, Volume 30, Number 4 (April 1997).
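
To show the shape of that failure class (a made-up sketch, not the
actual AT&T code): a single forgotten "break;" silently changes what a
switch statement does.

    #include <stdio.h>

    enum msg_type { MSG_STATUS, MSG_RESET, MSG_SHUTDOWN };

    static void handle(enum msg_type t)
    {
        switch (t) {
        case MSG_STATUS:
            printf("reporting status\n");
            /* "break;" forgotten here: control falls through */
        case MSG_RESET:
            printf("resetting switch element\n");
            break;
        case MSG_SHUTDOWN:
            printf("shutting down\n");
            break;
        }
    }

    int main(void)
    {
        handle(MSG_STATUS);    /* prints both lines: status AND reset */
        return 0;
    }

C itself gives you no indication that the fall-through wasn't
intentional; the compiler is perfectly happy with it.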

If you're writing in a part of your code where you need that low-level
control, by all means, go for it.  Break out the Assembler, even, if
it'll help.  But if you _don't_ need that degree of control... then I
think it's only sane engineering to move to a different language, one
which doesn't suffer as badly from reliability woes, and use FFI to
interface with the C-or-Assembler-written bottleneck modules.
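
For what it's worth, here's a sketch of what I mean by the C side of
that boundary (the names are made up): keep the bottleneck routine
small, dumb, and auditable, with a plain C interface the higher-level
language can bind through its FFI, and leave all the parsing, error
reporting, and orchestration on the safer side.

    /* hotpath.c -- the small, well-audited bottleneck module */
    #include <stddef.h>
    #include <stdint.h>

    /* XOR a keystream into a buffer in place.  Deliberately dumb: no
     * allocation, no global state, nothing to leak -- the kind of
     * routine that's easy to audit and easy for another language to
     * wrap. */
    void xor_keystream(uint8_t *buf, const uint8_t *keystream, size_t len)
    {
        size_t i;
        for (i = 0; i < len; i++)
            buf[i] ^= keystream[i];
    }

The higher-level language does the argument checking and error
reporting before it ever calls down into this.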

> describes, as you say, how a Universal Turing Machine works. Not what your 
> average person will call "a computer". What most people mean by "a computer" 

The average person doesn't understand what relativity is, either; does
that mean astronomers should go back to Newtonian Mechanics?  We have
better tools and better theories than the average person knows about or
understands; why not use these better tools and theories in order to
give the average person a better, more reliable, _safer_ computing
experience?

> is the hunk of hardware sitting on your desk, and you learn far more about 
> how *it* works by learning a low-level language than by studying lamda 
> calculus. Not that the mathematical stuff is irrelevant or anything - the 

Nobody's suggesting (at least, I'm not suggesting) that people shouldn't
learn how a particular processor operates.  Like I said, an
understanding of the math involved is absolutely essential--but sooner
or later, you have to make it run on real hardware.

But it doesn't make much sense to me to come at it from the other
direction: to say
"well, I know how this processor works, therefore, I know what I need to
know."  In that case, great--what happens when your employer says "We're
making circuit boards, and we want to use the absolute minimum amount of
solder in our connections.  Write a program that takes the layout of the
board and tells us the minimum solder needed."

With a math background, you can tell your boss "that can't be done
efficiently; that's an NP-complete problem."

With just a purely practical background, you'll wind up telling your
boss "sure, it won't be too hard..."

> theory is great fun, but you can't say that C is no good for most systems 
> programming just because it's not a highly theoretical language.

I never said C was no good for most systems programming.  It's a
Turing-complete language; it can do anything LISP can do.  Probably
more, since it's (technically) supra-Turing.  I said C wasn't a good
choice for most systems programming, since the areas in which you need
C's fine-grained control are typically very small.  That leaves you with
very large areas of the problem where you gain nothing from C, and only
expose yourself to risk.

Check this link: 

http://www.counterpane.com/crypto-gram-0212.html#4

... and the link inside that one, to:

http://m.bacarella.com/papers/secsoft/html

The Bacarella link is _wonderful_.  He Gets It(tm).  :)

I have yet to find someone in my field (crypto and computer security)
who has disagreed in any substantial way with Bacarella's essay.  Most
of us have some minor quibble, but by and large the consensus is that
it's great.  :)

> I rather liked that quotation Jenn (I think it was) gave about how concepts 
> of the One True Language would be called a childhood disease of programmers 
> if so many adults didn't do it too <g>

I was the one who gave the quote.  It's from Bjarne Stroustrup.

> therefore sticking to it. I suppose it's also easier to do it with C than 
> many other languages because you *can* do *anything* in C without being too 
> obviously unreasonable. Ugh, that last sentence *felt* clunky - did you get 

Read this paper before you say "you can do anything in C without being
too obviously unreasonable": 

http://www.research.att.com/~bs/new_learning.pdf

It's a great example of how doing even very reasonable tasks with sane
error handling can be unreasonable in C.
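
If you want a taste of it without reading the paper, here's a rough
sketch of mine (not Stroustrup's example verbatim) of what "read one
integer from the user, sanely" costs you in C once you actually check
for overlong input, non-numeric garbage, trailing junk, and overflow:

    #include <errno.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* Read one integer from stdin, with all the checks done by hand. */
    static int read_long(long *out)
    {
        char line[64];
        char *end;

        if (fgets(line, sizeof line, stdin) == NULL)
            return -1;                      /* EOF or read error     */
        if (strchr(line, '\n') == NULL && !feof(stdin))
            return -1;                      /* input line too long   */

        errno = 0;
        *out = strtol(line, &end, 10);
        if (end == line)
            return -1;                      /* no digits at all      */
        if (errno == ERANGE)
            return -1;                      /* out of range          */
        while (*end == ' ' || *end == '\t' || *end == '\n')
            end++;
        if (*end != '\0')
            return -1;                      /* trailing garbage      */
        return 0;
    }

    int main(void)
    {
        long n;
        if (read_long(&n) != 0) {
            fprintf(stderr, "bad input\n");
            return 1;
        }
        printf("read %ld\n", n);
        return 0;
    }

In a language with exceptions and real strings, that's a handful of
lines; in C, skipping any one of those checks is exactly the sort of
thing that works in testing and falls over in the field.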

> About that better-than-manual automatic GC, by the way - how does that work? 

I'm going purely from memory here, but it had something to do with the
GC system being smart enough to put memory reclamation in
non-bottlenecked parts of the code.  Most programmers don't run their
code through profilers and then re-code to avoid bottlenecks, which
means manual deallocation often winds up happening in the middle of a
code bottleneck.

I may be completely barking wrong.
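
If that's roughly right, the flavor of it looks something like this
(a made-up illustration, not a benchmark): manual deallocation
naturally lands inside the hot loop, while deferring reclamation until
after the loop--which is more or less what such a collector can arrange
for you automatically--keeps the allocator out of the bottleneck.

    #include <stdlib.h>

    struct request { struct request *next; /* payload omitted */ };

    static void process(struct request *r) { (void)r; /* hot-path work elided */ }

    /* A: reclamation interleaved with the bottleneck */
    static void drain_inline(struct request *head)
    {
        while (head) {
            struct request *next = head->next;
            process(head);
            free(head);                /* allocator work inside the hot loop */
            head = next;
        }
    }

    /* B: do the hot work first, reclaim in one sweep afterwards */
    static void drain_deferred(struct request *head)
    {
        struct request *r, *next;

        for (r = head; r != NULL; r = r->next)
            process(r);                /* tight loop, no allocator calls */

        for (r = head; r != NULL; r = next) {  /* reclaim after the hot work */
            next = r->next;
            free(r);
        }
    }

    static struct request *make_list(int n)
    {
        struct request *head = NULL;
        while (n-- > 0) {
            struct request *r = malloc(sizeof *r);
            if (r == NULL)
                abort();
            r->next = head;
            head = r;
        }
        return head;
    }

    int main(void)
    {
        drain_inline(make_list(1000));
        drain_deferred(make_list(1000));
        return 0;
    }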

> Agreed. But you started off suggesting that the students on this computing 
> course should learn something more high-level _instead of_ C, which shrinks 
> the toolbox as much as doing C at the expense of all higher-level languages.

Not at all.  Once you learn the Turing Machine, you can take those
skills _anywhere_.  Once you learn how to analyze algorithm complexity,
you get to see why C strings are so hideously awful, and what weird
little tricks you can use to make them orders of magnitude more
efficient.
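
The canonical example (my sketch, not from any particular textbook):
strcat() has to re-scan the destination from the start every time,
because a C string doesn't know its own length, so building a string
out of n pieces that way is O(n^2) in the total length.  Keep a pointer
to the end yourself and it drops to O(n).

    #include <string.h>

    /* Caller supplies a dst buffer large enough for all the pieces. */

    /* O(n^2): each strcat() walks dst from the beginning to find the end */
    void join_slow(char *dst, const char **pieces, size_t n)
    {
        size_t i;
        dst[0] = '\0';
        for (i = 0; i < n; i++)
            strcat(dst, pieces[i]);
    }

    /* O(n): remember where the end is, copy each piece exactly once */
    void join_fast(char *dst, const char **pieces, size_t n)
    {
        char *end = dst;
        size_t i;
        for (i = 0; i < n; i++) {
            size_t len = strlen(pieces[i]);
            memcpy(end, pieces[i], len);
            end += len;
        }
        *end = '\0';
    }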

Once you learn how an alpha-beta tree search works, you can apply that
to any language.  Once you learn how red-black trees work... once you
learn how balanced binary trees work... etc.

I think students should learn a very high-level language first, and then
as they progress in their studies, move down further and further until
they're working on bare metal.  This is a view shared by many
institutions, MIT in particular--many of the world's best CompSci
universities teach Scheme as a first language.

> But you still don't acknowledge that it's ever a reasonable language for 

I don't think C is a reasonable language for ordinary systems
programming.  It is so demanding of the programmer that pervasive
problems are de rigueur.  There is a reason why the DOE says nuclear
reactor control software can't be written in C, why airplane avionics
software is written in Ada, why the Space Shuttle flies on HAL/S.
Wherever you see software that's as reliable as death and taxes, you're
usually not seeing C.

Linux systems are more reliable than Windows, yes, but Linux still isn't
reliable--not when I have an X lockup once a week.  OS/2 is a pretty
reliable piece of software, but I still see bank ATM terminals in a
reboot cycle.

> performance is important", and then go on, in the face of quite a bit of 
> evidence, to say that realtime computing, low-level drivers, virtual machines 

I haven't seen any compelling evidence to the contrary.  I see a lot of
evidence to support my position--see the fuzz tests, for instance.  GNU
tools were hailed as the victors because they only catastrophically
failed five percent of the time on random inputs.  Realistically, this
number should be _zero_.  GNU has nothing to be proud of--saying "we
only catastrophically failed five percent of the time on random inputs"
is sort of like saying you're the best-behaved inmate in the
penitentiary.
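
For anyone who hasn't seen them: the fuzz tests are exactly as crude as
they sound.  Something like this (a made-up sketch of the idea, not the
actual test harness), piped into a utility a few thousand times, is
enough to make a depressing number of tools dump core:

    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Emit N bytes of random garbage on stdout, to be piped into the
     * utility under test, e.g.:
     *
     *     ./fuzzgen 65536 | some_utility; echo "exit status: $?"
     *
     * A robust tool rejects the garbage gracefully; a core dump or a
     * hang counts as a catastrophic failure. */
    int main(int argc, char **argv)
    {
        long i, n = (argc > 1) ? strtol(argv[1], NULL, 10) : 4096;

        srand((unsigned)time(NULL));
        for (i = 0; i < n; i++)
            putchar(rand() & 0xff);
        return 0;
    }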

> I still call unfair test - you went and used the most advanced, time-critical 
> tools in your toolbox to do the C++ version, and then pointed at it and said, 

No, I used the most advanced, time-critical tools in _all_ the
toolboxes.  :)  At least, the most advanced, time-critical tools that I
knew of; I'm not going to claim there weren't some weird things I
could've done with Java to make the Collections work faster, etc.  Just
that I used the best tools I knew of in each language.

It's just that for that one particular example, C++ had the best tools.
