[Techtalk] Server Deployment

Malcolm Tredinnick malcolm at commsecure.com.au
Sat Jun 1 15:09:16 EST 2002


OK, I have a bit of experience in this area, since part of my day job
involves working with systems that serve lots of HTTP requests out of
databases which have "absolutely must not fail, ever" requirements (so
redundancy all over the place).

Please excuse the very long "here's what I do at work" post, but it may
provide some help...

To put my examples in context, we don't run "full fledged" web servers
that serve arbitrary HTML pages and run CGI scripts, but we do do a fair
bit of post-processing of the database query results on the "web serving"
machines before sending them back, and everything is done via HTTP or
HTTPS (depending on circumstances). In everything that follows, I am
talking about commodity level hardware (easily available, easily
replaceable, build things in many tiny boxes rather than one enormous
box) -- everything is different if you go with things like the Sun
Enterprise level boxes (E10000 and up).

Also, we are using PostgreSQL across the board, rather than MySQL, so my
numbers may be a little different from what you get. We also cache the
database connections, rather than opening up new ones on demand, since
the overhead involved in the latter would just cripple our systems.
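
In case it's useful, the connection caching is nothing clever -- we just
open a fixed set of connections at startup and hand them out to the
request handlers. A rough Python sketch of the idea (not our actual
code; "connect_fn" stands in for whatever connect call your database
adapter provides):

    import queue

    class ConnectionPool:
        """Keep a fixed set of database connections open and reuse them."""

        def __init__(self, connect_fn, size=8):
            self._pool = queue.Queue()
            for _ in range(size):
                # Pay the connection setup cost once, up front.
                self._pool.put(connect_fn())

        def get(self):
            # Blocks when every connection is busy, which gives you
            # back-pressure instead of a pile-up of new connections.
            return self._pool.get()

        def put(self, conn):
            self._pool.put(conn)

    # Typical use in a request handler:
    #   conn = pool.get()
    #   try:
    #       ... run the queries ...
    #   finally:
    #       pool.put(conn)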

On Sat, Jun 01, 2002 at 12:39:43PM +1000, Jenn Vesperman wrote:
> On Sat, 2002-06-01 at 11:22, James wrote:
> > Any suggestions for hardware specs on these machines?  It would be two
> > web servers and two SQL servers (master, slave).  Primarily going for
> > "bang for the buck" as we're facing the economic ripple effect (aka ****
> > flows down hill).
> 
> I'm not up on current hardware, so I can't say. But I know that the
> people who are will want to know things like:
> 
> 
> Web server:
> * how many requests/second
> * how large the average response is
> * minimum and maximum response size
> * other stuff they'll ask for

You certainly need to know those and have some way of testing for it.
Since every combination of hardware and software is going to perform
slightly differently (and exhibit strange behaviour under load) you
absolutely *must* have some way of evaluating what you have, or be in a
position to rapidly throw more hardware at the problem (if the solution
is one that scales nicely). Unfortunately, for very high loads, it's
really hard to test (i.e. it's basically impossible without extensive
hardware labs), so you need to have active monitoring in place on the
production boxes to try and understand what is going on after you make
everything live, too.
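
Even a very crude load generator will tell you a lot before you get
anywhere near a proper test lab. Something along these lines is enough
to see where latency starts to fall apart (the URL and the numbers here
are made up, obviously):

    import threading, time, urllib.request

    URL = "http://test-box.example.com/query"   # made-up target
    THREADS = 20
    REQUESTS_PER_THREAD = 100

    latencies = []
    lock = threading.Lock()

    def worker():
        # Hit the URL repeatedly and record how long each request takes.
        for _ in range(REQUESTS_PER_THREAD):
            start = time.time()
            urllib.request.urlopen(URL).read()
            elapsed = time.time() - start
            with lock:
                latencies.append(elapsed)

    threads = [threading.Thread(target=worker) for _ in range(THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    latencies.sort()
    print("requests: %d" % len(latencies))
    print("median:   %.1f ms" % (latencies[len(latencies) // 2] * 1000))
    print("95th pct: %.1f ms" % (latencies[int(len(latencies) * 0.95)] * 1000))
    print("worst:    %.1f ms" % (latencies[-1] * 1000))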

The only thing I would add here is knowing what the traffic patterns are
like. For example, if your peak load is ten times your average load, but
only lasts for half an hour a day, then you can tolerate a bit of
thrashing during that period, since the system can probably recover in
the subsequent minutes. If, however, your peak load lasts for large
numbers of hours, you don't have time to "recover", so everything needs
to run extremely smoothly at the peak load (with room to spare, since
the peak can increase and you still need to be able to log in to perform
diagnostics in realtime if necessary).
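
If you are not sure what your pattern actually looks like, bucketing an
existing access log into requests per minute is a quick way to find out.
A rough sketch, assuming Apache-style timestamps (adjust the slicing for
whatever your logs look like):

    import sys
    from collections import Counter

    # Count requests per minute from an access log on stdin, assuming
    # timestamps like [01/Jun/2002:12:39:43 +1000].
    buckets = Counter()
    for line in sys.stdin:
        start = line.find("[")
        if start == -1:
            continue
        # Keep everything up to the minute: "01/Jun/2002:12:39".
        buckets[line[start + 1:start + 18]] += 1

    # Lexical sort is fine within a single day's log.
    for minute, count in sorted(buckets.items()):
        print("%s  %5d req/min  (%4.1f req/s)" % (minute, count, count / 60.0))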

> SQL server:
> * requests/second
> * amount of data
> * sizes of tables
> * other stuff they'll ask for

Number of simultaneous connections and the behaviour of those
connections (mostly reading, reading and writing, periodically lots of
writing, but mostly quiet, whatever is typical) -- I'll give examples of
what I found in a minute.

So typically we are using boxes that are 
	- dual PIII or dual Athlon boxes (as fast as possible -- in the
	  1-1.6 GHz range at the moment);
	- lots of memory -- about 1GB;
	- multiple disks (to spread the data access contention)
		- some boxes have IDE drives on different channels
		- some boxes are using SCSI disks
	- EEPro100 network cards, all talking 100Mb via switches.
(what significant components have I left out here?)

One configuration we have in use is three of these boxes serving HTTP
requests, three database boxes and one extra box (much lower powered)
using iptables to balance the connections between the three HTTP boxes.
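
The iptables rules themselves aren't very interesting, but the idea is
just round-robin spreading of incoming connections across the HTTP
boxes. Purely to illustrate that idea, here is a toy userspace version
in Python (not what we actually run, and the addresses are invented):

    import itertools, socket, threading

    # Hand each incoming connection to the next back-end HTTP box in turn.
    BACKENDS = [("10.0.0.1", 80), ("10.0.0.2", 80), ("10.0.0.3", 80)]
    next_backend = itertools.cycle(BACKENDS)

    def pipe(src, dst):
        # Copy bytes one way until that direction of the stream closes.
        try:
            while True:
                data = src.recv(4096)
                if not data:
                    break
                dst.sendall(data)
            dst.shutdown(socket.SHUT_WR)
        except OSError:
            pass

    def handle(client):
        backend = socket.create_connection(next(next_backend))
        threading.Thread(target=pipe, args=(client, backend)).start()
        threading.Thread(target=pipe, args=(backend, client)).start()

    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("", 8080))
    listener.listen(64)
    while True:
        conn, _ = listener.accept()
        handle(conn)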

We have not found any real problems using IDE drives versus their SCSI
counterparts and they are much cheaper and easier to replace when
something goes wrong (and hard drives *do* fail). You just have to
remember that two hard drives on the same IDE cable are not going to be
accessible at the same time -- they have to serialise the data transfer
on the cable which is a real performance hit. So we use the Promise IDE
multi-channel cards (two port ones or the first two ports on the four
port cards) to get lots of hard drives into the boxes. Again, this turns
out cheaper than the multiple SCSI approach.

Athlon CPUs are significantly cheaper and perform better than their
equivalent Intel counterparts, in our experience, under Linux.

The huge memory means that we never really start using swap space on
disk and a bunch of commonly accessed files end up living in the buffer
cache, so that a request to read them from disk really just sucks them
out of RAM which is waaaaay faster.

On these sorts of platforms (running databases on separate machines from
the HTTP-talking boxes and having multiple HTTP boxes talking to
multiple database boxes for redundancy), we serve sustained loads of
200 requests per second out of the front end, which translates to about
four times as many queries to the databases (via persistent connections,
though, so it's all data transfer -- no teardown and setup times), and
our typical response times are in the 5-20 millisecond range (although
in absolute peak periods some things can take up to 2 seconds). Our testing seems to
indicate that we can probably use the same hardware and handle sustained
rates of around 600 transactions per second on the HTTP side, but it
would be close.

I should also note that at the volume of transactions we are currently
doing, the half-duplex (aargh!) network we are feeding the data to on
the client's side is starting to get close to 40% saturated (which is
extremely bad for half-duplex, since the collision count starts to
become significant). So, while it may seem obvious, you may not need to
be as fast as you think if the upstream side can't handle the volume.

Hopefully that gives you some sort of idea about what is being used
by at least one group of lunatics in production. We are also running web
hosting for some clients on much lower powered machines, but they don't
really do serious volumes like the thing I talked about above.

Cheers,
Malcolm
