[Techtalk] Server monitoring and trends

Magni Onsoien magnio+lc-techtalk at pvv.ntnu.no
Wed Sep 15 09:12:46 UTC 2010


Hi all,

I have fallen off the bandwagon of useful tools over the past few years,
so I must admit I have no clue whatsoever which tools I should choose
for monitoring my servers and for seeing trends in their resource usage
(statistics). Eight years ago I would have chosen Bigbrother for the
monitoring part and something else, whose name I can't remember, for
trends. But what about in 2010?

Short overview of our needs for monitoring software: 
* Around 30 servers at 4 sites. The state of each server, as well as
of each site, should be available. The state of a site is the combined
state of all its services, not necessarily of all its servers. Each
server may run several services, one service may need several servers,
and a service may even depend on several other services.

* The previous requirement may be complicated to set up, so as a minimum
we need to monitor each physical server and see the state of a site as
the combined state of all its servers.

* All servers are running Ubuntu Server 8.04 LTS, and while we build
everything ourselves, we try to base our packages on standard Ubuntu
src-debs.

* People here love Perl, but I am pretty agnostic when it comes to
finding something that will be useful to us.

* The applications are in-house maintained business software plus
Apache and Postgres. Some are reachable over the network, some are not.
The monitoring software should be able to check the state both over the
network and locally on a server (including the health of the server
itself). If necessary we want to write our own plugins to monitor
certain services, so a plugin architecture is a plus.

* It's confidential data, so cloud-based services are specifically NOT
an option.
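
To make the site/service/server requirement more concrete, here is a
rough Python sketch of the roll-up I have in mind. It is only an
illustration, not a feature of any particular tool: the three-level
OK/WARNING/CRITICAL scale, the rule that a service is as bad as its
worst dependency, and all server and service names are assumptions
made up for the example.

```python
from enum import IntEnum


class State(IntEnum):
    # Hypothetical three-level scale; a higher value means a worse state,
    # so max() picks the worst one.
    OK = 0
    WARNING = 1
    CRITICAL = 2


def service_state(name, services, server_states, _path=frozenset()):
    """Worst state among everything service `name` depends on.

    `services` maps a service name to the servers and/or other services
    it needs; `server_states` maps a server name to its latest State.
    Assumption: a server with no reported state counts as CRITICAL.
    """
    if name in _path:
        raise ValueError(f"dependency cycle involving {name!r}")
    path = _path | {name}
    worst = State.OK
    for dep in services[name]:
        if dep in services:  # the dependency is another service
            state = service_state(dep, services, server_states, path)
        else:  # the dependency is a physical server
            state = server_states.get(dep, State.CRITICAL)
        worst = max(worst, state)
    return worst


def site_state(site_services, services, server_states):
    """Full requirement: a site is as bad as its worst service."""
    return max((service_state(s, services, server_states)
                for s in site_services), default=State.OK)


def site_state_from_servers(site_servers, server_states):
    """Minimum requirement: a site is as bad as its worst server."""
    return max((server_states.get(s, State.CRITICAL) for s in site_servers),
               default=State.OK)


if __name__ == "__main__":
    # Made-up example site: an in-house "billing" service that needs
    # Apache, Postgres and its own application server.
    services = {
        "apache": ["web1"],
        "postgres": ["db1", "db2"],
        "billing": ["apache", "postgres", "app1"],
    }
    server_states = {"web1": State.OK, "db1": State.OK,
                     "db2": State.WARNING, "app1": State.OK}
    print(site_state(["billing"], services, server_states).name)
    print(site_state_from_servers(["web1", "db1", "db2", "app1"],
                                  server_states).name)
```

Treating a service as its worst dependency is deliberately strict; a
redundant pair such as db1/db2 might instead count as OK as long as one
of them is up, which would mean min() instead of max() for that service.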

The other part is statistics and trends. I need something to parse log
files, e.g. from Apache and Postgres, as well as all the standard syslog
files. Our own software logs via syslog, but we need to be able to
define the contents of our own log entries. What we need here is
typically number crunching, as well as nice graphs; I guess we can
predict a bit from the graphs.
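
For the number-crunching part, this is roughly the kind of thing I
mean, sketched in Python: count Apache requests per hour and per HTTP
status code, then fit a straight line through the hourly counts as a
crude trend. It assumes Apache's standard common/combined access log
format (Postgres and syslog files would each need their own pattern),
and the function names are made up for the example; a real tool would
do far more.

```python
import re
from collections import Counter
from datetime import datetime, timedelta, timezone

# Apache "common" log format prefix; the "combined" format only appends
# referer and user agent, so this pattern matches the start of both.
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-)'
)


def count_requests(lines):
    """Count requests per UTC hour and per HTTP status; skip unparsable lines."""
    per_hour, per_status = Counter(), Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue
        ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
        hour = ts.astimezone(timezone.utc).replace(minute=0, second=0)
        per_hour[hour] += 1
        per_status[m["status"]] += 1
    return per_hour, per_status


def hourly_series(per_hour):
    """Counts from the first to the last hour seen, with 0 for quiet hours."""
    if not per_hour:
        return []
    hour, last = min(per_hour), max(per_hour)
    series = []
    while hour <= last:
        series.append(per_hour[hour])  # Counter returns 0 for missing keys
        hour += timedelta(hours=1)
    return series


def linear_trend(values):
    """Least-squares fit of y = a + b*x for x = 0..n-1; returns (a, b)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two points to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


if __name__ == "__main__":
    # Made-up sample lines; in practice, pass an open access log instead.
    sample = [
        '10.0.0.1 - - [15/Sep/2010:09:05:00 +0000] "GET / HTTP/1.1" 200 512',
        '10.0.0.2 - - [15/Sep/2010:10:40:00 +0000] "GET /x HTTP/1.1" 404 -',
        '10.0.0.3 - - [15/Sep/2010:12:20:00 +0000] "POST /y HTTP/1.1" 200 64',
    ]
    per_hour, per_status = count_requests(sample)
    series = hourly_series(per_hour)
    print("requests per HTTP status:", dict(per_status))
    print("requests per hour:", series)
    a, b = linear_trend(series)
    # Extrapolate the fitted line one hour past the last bucket.
    print(f"trend: {b:+.2f} per hour, next hour ~ {a + b * len(series):.1f}")
```

A straight-line fit is the simplest possible "prediction from the
graphs"; it says nothing about daily or weekly cycles, which is where a
proper trending tool would earn its keep.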


Since we're running out of time, we first need something that is quick
(and hopefully not dirty) to implement. If there is a better solution
that is more time-consuming to implement, we can do that in a few
months. Tips on both quick and long-term solutions are appreciated :-)


I will send a summary of private replies (anonymized and generalized, of
course) to the list later if they seem generally useful. Please say so
if you absolutely don't want me to do that. Replies to the list are of
course easiest :-)



Magni :)
-- 
sash is very good for you.

