[Techtalk] Diagnosing server problems

R. Daneel Olivaw linuxchix at r-daneel.com
Tue Sep 27 16:40:06 EST 2005


> * access web applications running on it
> * ssh into it (times out)

The second issue is quite interesting.
Try logging in locally, on the console. This will tell you whether the
problem is network related or system related.
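
If nobody can get to the console, a rough substitute is to time the TCP
connect and the greeting of the ssh daemon, once from the box itself and
once from outside. This is only a sketch: probe_tcp is a name I made up,
and port 22 and the 5 second timeout are assumptions.

```python
import socket
import time


def probe_tcp(host, port, timeout=5.0):
    """Return (connect_seconds, banner), or (None, error_text) on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            connected = time.monotonic() - start
            sock.settimeout(timeout)
            try:
                # An ssh daemon normally sends a version line right away.
                banner = sock.recv(256).decode("ascii", "replace").strip()
            except socket.timeout:
                banner = ""  # connected, but no greeting arrived in time
            return connected, banner
    except OSError as exc:
        return None, str(exc)
```

Call it as probe_tcp("localhost", 22) on the server and with the public
address from another machine. A fast local answer with a slow or failing
remote one points at the network; slow in both cases points at the box.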

> For example, does the box have enough memory?  The following is from
> the headers of top, shortly after one of these "slow" times:
>
>  16:20:24  up 110 days, 12:13,  1 user,  load average: 0.00, 0.00, 0.00
> 238 processes: 237 sleeping, 1 running, 0 zombie, 0 stopped
> CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
>            total    0.0%    0.0%    0.0%   0.0%     0.0%    0.0%   99.8%
>            cpu00    0.0%    0.0%    0.3%   0.0%     0.0%    0.1%   99.4%
>            cpu01    0.0%    0.0%    0.0%   0.0%     0.0%    0.0%  100.0%
>            cpu02    0.0%    0.0%    0.0%   0.0%     0.0%    0.0%  100.0%
>            cpu03    0.0%    0.0%    0.0%   0.0%     0.0%    0.0%  100.0%
> Mem:  3094228k av, 3002716k used,   91512k free,       0k shrd,  234328k buff
>                    1937528k actv,    5820k in_d,   46812k in_c
> Swap: 4192956k av,  130932k used, 4062024k free                 1343584k cached

Try 'atop'; it's a more advanced program that also shows network
throughput per interface and quite a few more details.
I keep such monitors running continuously in an open ssh session on
servers when I am hunting down performance problems.

Just keep in mind that any monitoring software may worsen the situation,
because it uses CPU time itself. Your issue, though, clearly doesn't
seem to be related to CPU load.

> Obviously the CPUs aren't being strained at all.  But do the memory
> data  indicate heavy usage or is 91512k free actually perfectly
> adequate?  Am  I even reading this correctly?

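That figure is only the memory that is not used for anything at all. The
system uses otherwise idle memory for caching and buffering, so that as
much physical RAM as possible goes toward performance; most of your
buffers and cache (234328k + 1343584k) can be reclaimed when applications
need the space. A memory shortage usually shows up as heavy use of swap,
and constant swapping in and out also reduces system performance. You
have only 130932k of 4192956k swap in use.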
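
To put numbers on that, here is the arithmetic on the figures from your
top header, as a small Python sketch. Counting all of buffers and cache
as reclaimable is a rough approximation, and reclaimable_summary is just
a name I picked.

```python
def reclaimable_summary(free_k, buffers_k, cached_k, swap_used_k, swap_total_k):
    """Summarise memory pressure from the kB figures in top's header."""
    # Approximation: treat every buffer and cache page as reclaimable.
    available_k = free_k + buffers_k + cached_k
    return {
        "available_k": available_k,
        "available_mb": available_k // 1024,
        "swap_used_pct": round(100.0 * swap_used_k / swap_total_k, 1),
    }


summary = reclaimable_summary(
    free_k=91512, buffers_k=234328, cached_k=1343584,
    swap_used_k=130932, swap_total_k=4192956,
)
print(summary)
# {'available_k': 1669424, 'available_mb': 1630, 'swap_used_pct': 3.1}
```

So roughly 1.6 GB is effectively available and about 3% of swap is in
use, which does not look like a box that is short of memory.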

However, swapping alone should not make the system stop responding
entirely.

> Some of the other possible things I can think of are
> * insufficiently frequent garbage collection by the Java web apps 
> running on it

Java ... we run several instances of Java webapps on a similar system at
work, and there are close to no performance issues (just a strange
random system hang and reboot with the built-in ASR functionality; it's
a ProLiant server from HP).

> * heavy usage on other machines at the colo that share bandwidth

Is the colo using hubs or switches?

> * misconfigured DNS somewhere that might be causing delay for some
> users

Hmm, this would need an in-depth diagnosis ... from within the server :(
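
A first step from within the server could be to time a few name lookups
and see whether any of them hang. A minimal Python sketch, assuming the
system resolver is the one you want to test; time_lookup and its
resolver parameter are hypothetical helpers, not part of any tool.

```python
import socket
import time


def time_lookup(name, resolver=socket.getaddrinfo):
    """Return (seconds, result_or_error) for a single name lookup."""
    start = time.monotonic()
    try:
        result = resolver(name, None)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        result = exc
    return time.monotonic() - start, result
```

Run it for the server's own hostname and for a couple of the names the
affected users go through. Lookups that take seconds rather than
milliseconds, or that end in an error, are worth chasing in
/etc/resolv.conf and on the name servers listed there.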

> Surely there are other possibilities as well.  Any thoughts of any
> kind  are appreciated!

Also try looking into /var/log/messages ...
Otherwise, use webmin's "system status" module to monitor local services
and network connectivity (ping/http/...) from inside the server and send
mail alerts automatically (the server will queue the e-mails if it is
not connected). Also, make sure the server hasn't just rebooted (use the
'uptime' command).
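
For the /var/log/messages part, it helps to pull out only the lines
written around one of the "slow" times. A rough Python sketch, assuming
the traditional syslog stamp ("Sep 27 16:20:24") at the start of each
line and an English/C locale for the month names; lines_near is a name I
made up.

```python
from datetime import datetime, timedelta


def lines_near(lines, incident, minutes=5):
    """Yield syslog lines stamped within `minutes` of `incident`."""
    window = timedelta(minutes=minutes)
    for line in lines:
        try:
            # Traditional syslog stamps carry no year, so borrow the
            # incident's year before parsing.
            stamp = datetime.strptime(
                f"{incident.year} {line[:15]}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue  # continuation line or a different log format
        if abs(stamp - incident) <= window:
            yield line.rstrip("\n")
```

Open the log file, pass the file object as lines and the time of the
slowdown as incident, for example datetime(2005, 9, 27, 16, 20), and
print whatever comes back.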

Here you go, and good luck,

R. Daneel Olivaw,
The Human Robot Inside. 
