Security holes in scripts (Re: [Techtalk] Theory vs. practice)

Tue Jan 15 11:01:23 EST 2002

On Mon, Jan 14, 2002 at 03:00:56AM -0500, Raven, corporate courtesan wrote:
> Quoth Jenn Vesperman (Mon, Jan 14, 2002 at 06:28:30PM +1100):
> > Yes, but programmers aren't being taught how to avoid these coding
> > errors, or what errors to avoid.
>  
> 	No kidding.  I know vaguely that not checking your input for CGI
> scripts is bad (things like ' in input can cause large problems if not
> properly dealt with by the script -- a username 
> 
> ' or 0=0 
> 
> will always return true if the ' is not escaped or disallowed by the
> script, because 0 always equals 0) and that buffer overflows can be
> caused by not saying, essentially, "and if there's more data than buffer
> space, return this error and stop writing to memory", but that's it.
> Mostly I end up applying patches to fix things like this rather than
> correcting the code.

Another problem is if, like a lot of CGI scripts, you are taking user
input (say their credit card number, or their address, or their order)
and putting it in a database.

This is fine if you put '4056' in, but what if you put in '4075" DROP
TABLE; DROP DATABASE;' ?

There are too many sites where this would indeed drop (delete) the table
and the database because the input gets fed right into the SQL query.
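
A minimal sketch of the usual defence, in Python with the standard
sqlite3 module: hand the user input to the database as a bound parameter
instead of pasting it into the query string. The orders table, the card
column and the store_card function are names I made up for illustration.

```python
import sqlite3

def store_card(conn, card_number):
    # The ? placeholder makes the driver treat card_number purely as data;
    # it is never parsed as part of the SQL statement.
    conn.execute("INSERT INTO orders (card) VALUES (?)", (card_number,))
    conn.commit()

# Hypothetical in-memory database, just for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (card TEXT)")

hostile = '4075" DROP TABLE; DROP DATABASE;'
store_card(conn, "4056")
store_card(conn, hostile)

# Both values come back as plain strings and the table is still there.
rows = [r[0] for r in conn.execute("SELECT card FROM orders ORDER BY rowid")]
```

Most database libraries have some equivalent of this placeholder
mechanism, though the placeholder syntax varies between them.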

If you are using user input to generate *anything at all* (filenames,
database queries, URLs, parts of the next page for output - e.g. "Please
confirm that your name is Mary Gardiner by clicking here") you should
perform appropriate escapes on it. Look for functions in your scripting
language that perform these escapes for you, as they may check for
things you didn't think of, but read the documentation for them :)
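
As a rough illustration of "appropriate escapes" in Python, the standard
library has html.escape for text going into a page and
urllib.parse.quote for text going into a URL. The confirm_line and
profile_url helpers and the /profile/ path are hypothetical; which escape
is appropriate depends entirely on where the text ends up.

```python
import html
from urllib.parse import quote

def confirm_line(name):
    # Escape <, >, & and quotes so the name is displayed, not interpreted
    # as markup by the browser.
    return "Please confirm that your name is " + html.escape(name)

def profile_url(name):
    # Percent-encode everything unsafe, including "/" (safe=""), so the
    # name cannot add extra path segments.
    return "/profile/" + quote(name, safe="")

page_text = confirm_line("<script>alert(1)</script>")
url = profile_url("Mary Gardiner")
```

Note that the two escapes are not interchangeable: HTML escaping does
nothing useful for a URL or a filename, and vice versa.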

This gets more difficult in languages like C, where you also have to
allow for the size of the string.

It's all about the assumptions that you make while you're programming.
Assumptions can make things a lot easier, and are very hard to
completely rid yourself of. In some cases it may make sense to assume
that input is ASCII or ANSI, in other cases you're going to have to
check that someone hasn't given you executable binary data where you
expected their first name. Much C code I've seen assumes that a call to
malloc worked - there is, of course, no guarantee that malloc will be
able to return a memory location to you.
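
The first-name case above might be checked, rather than assumed, along
these lines. This is only a sketch in Python: the check_first_name
function, the UTF-8 encoding and the 100-character limit are my
assumptions, and a real application would need its own rules.

```python
def check_first_name(raw: bytes, max_len: int = 100) -> str:
    # Refuse anything that is not valid text in the assumed encoding.
    try:
        name = raw.decode("utf-8")
    except UnicodeDecodeError:
        raise ValueError("name is not valid UTF-8 text")
    # Refuse empty or over-long input instead of truncating silently.
    if not name or len(name) > max_len:
        raise ValueError("name is empty or too long")
    # Refuse control characters, which a real name should never contain.
    if not name.isprintable():
        raise ValueError("name contains non-printable characters")
    return name

ok = check_first_name(b"Mary")
```

Input that starts like a binary file header (control bytes and so on) is
rejected with ValueError rather than being stored or echoed back.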

The reason assumptions are made is that it can make coding quicker, and
quicker is a virtue when you have a deadline breathing down your neck
(especially if your boss has his/her house mortgaged on you getting a
product out the door in the next month), or when you have a university
assignment due in two hours.

I'd be interested in hearing people's experiences with formal software
development procedures, actually, as I've never worked anywhere that has
them. Which methodologies were helpful in producing robust code in
commercial time periods? Which were helpful in developing correct code?
Which were helpful in maintaining code correctness during, for example,
refactoring?

-Mary.

-- 
Mary Gardiner
<mary at puzzling.org>