[Techtalk] tracking down resource leaks

N Hospodarsky hospodarsky at gmail.com
Fri Jan 5 19:32:08 UTC 2007


Hi Almut,

Thanks so much for your thoughtful reply! I found your
answers/suggestions helpful: a) I now know I'm on the right path and
not overlooking some obvious system monitoring or debugging tool, b) I
appreciated the valgrind pointer, and c) your concise explanation of
what a futex is and does was just what I needed. I'd searched Google
for a simple explanation, but came up mostly with developers talking
to one another about specific calls. Yours was a nice, accessible
once-over.

I suppose I will open a ticket with Red Hat. The situation is
certainly not pressing, and of course I can schedule a monthly restart
of the process...but it is definitely something that has cropped up
since their last update to that software.

I saw it as possibly a good opportunity to improve my monitoring and
resource tracking skills.

Anyway, thanks so much for your input,

Best,
N

On 1/5/07, Almut Behrens <almut-behrens at gmx.net> wrote:
> On Wed, Jan 03, 2007 at 10:06:48AM -0600, N Hospodarsky wrote:
> > Hi All,
> >
> > I have a question about resource monitoring. I have a RH server that
> > is running proprietary RH software on it. It seems that one of the
> > processes in that software is slowly sucking up CPU resources. I've
> > been following it using Cricket (http://cricket.sourceforge.net/), and
> > over the last month there's been a steady upward trend of CPU
> > resources being used by %User.
> >
> > I have been trying to get as much information as possible before
> > opening a ticket with the vendor; I'm curious what you all generally
> > do when attempting to track down resource leaks...so far I've
> > narrowed it down to a python process by the usual means (looking
> > through logs, getting information from ps) and have made minimal
> > use of strace...strace wasn't all that illuminating to me because
> > the output was just a huge stream of:
> >
> > futex(0x9fbc1c0, FUTEX_WAKE, 1)         = 0
> >
> > which means nothing to me.
> >
> > What else can I use to get information about a leaky process? Or is
> > this information the best I can hope for with my non-python-programmer
> > skillset?
>
> Hi,
>
> not sure I can contribute anything useful to this problem, but as no
> one else has said anything so far, I'll just say something :)
>
> If you had reported memory leakage problems, I would have recommended
> tools like valgrind [1] -- but you haven't, so I won't ;)  And problems
> with CPU usage are quite a different beast. I'm not aware of any general
> purpose debugging tools for this, except maybe some profiler. It could
> tell you how much time the program is spending in individual parts, like
> subroutines, etc. But this typically only makes sense if you have the
> sources (as you mention it's proprietary software, I presume you don't).
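
For the case where the sources are available, here is a minimal sketch
of the kind of report a profiler gives, using Python's standard cProfile
and pstats modules. The functions slow_part, fast_part and main are
hypothetical stand-ins, not anything from the software discussed here.

```python
import cProfile
import io
import pstats


def slow_part(n):
    # Hypothetical hot spot: does far more work than fast_part().
    return sum(i * i for i in range(n))


def fast_part(n):
    # Hypothetical cheap helper, included for contrast.
    return n * 2


def main():
    # Stand-in for the real program's main loop.
    return slow_part(200_000) + fast_part(10)


def profile_report(func, limit=5):
    """Run func under cProfile; return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    func()
    profiler.disable()
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(limit)
    return out.getvalue()
```

Calling print(profile_report(main)) lists the functions that account for
most of the run time. An existing script can also be profiled without
editing it via "python -m cProfile -s cumulative script.py".
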
>
> In case strace and ltrace aren't providing any useful info, there's
> essentially only a general purpose debugger like gdb to resort to.
> However, in order to debug a program that's essentially working, with
> only some slow gradual degradation over time (caused by some yet
> unknown part of the program), a lot of patience and expertise in the
> proper handling of the debugger would be required. (Also, you can never
> be sure that the debugging itself won't have a significant impact on
> what you're observing, i.e. how the application is behaving.)
>
> The fact that you're observing a gazillion futex calls [2] probably
> doesn't mean much. They're most likely just reflecting the regular
> synchronization activity of some threaded program, or some such.
> And just in case there should really be a problem at this level, it's
> not something you'd want to debug yourself, most certainly not without
> access to the source code of the application that's causing the
> problems...
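
To check whether anything other than futex shows up in a long strace
capture, the calls can be tallied by name. The following is only a
sketch, assuming the strace output was saved to a text file and that
each call starts its own line, optionally prefixed by "[pid N]" or a
bare PID; count_syscalls is a name made up for this example.

```python
import re
from collections import Counter

# Syscall name at the start of a line of strace output, optionally
# preceded by a "[pid 1234]" tag or a bare PID.
SYSCALL_RE = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?(\w+)\(")


def count_syscalls(lines):
    """Return a Counter mapping syscall name -> number of occurrences."""
    counts = Counter()
    for line in lines:
        match = SYSCALL_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

For example, count_syscalls(open("strace.out")).most_common(10) gives the
ten most frequent calls (the file name is an assumption). strace can
also produce a per-syscall summary itself when run with the -c option.
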
>
> What you could try is the following: based on your knowledge of what
> the software needs to accomplish, and how it might go about doing it at
> the implementation level, come up with ideas about which aspects external to
> the program might be involved, and whether those aspects might either
> help shed some light on what the program is doing wrong, or whether it
> simply is something outside of the program that's forcing it to do more
> and more work the longer it is running. It might not be the program's
> fault after all. Then develop test scenarios to verify those hypotheses.
> Sounds a little abstract, I know, but as I haven't got the foggiest
> idea what the software is for, you'll understand I can't help you
> generate hypotheses...
>
> Anyway, maybe it's best to just go right ahead and delegate the issue
> to RH. They should be the experts for their software.  So, what I would
> do is
>
> * make sure I'm not overlooking something silly (I think you're past
>   this step already)
>
> * collect evidence that there is a problem, and how it manifests
>
> * open a call with the vendor, and pass on the collected info
>
> As to step two, you could periodically run top from a cronjob to gather
> general resource usage info (if you haven't done so already). Something
> like "(date; top -b -n 1) >>top.stats". (The -b makes top run in batch
> mode, i.e. non-interactively, and -n 1 stops it after one iteration.)
> It's probably helpful to extend this to also log
> other info you _suspect_ might have to do with the problem (for example
> size of files being used, number/status of open sockets, etc.).
> Then, at the end of the observation period, maybe run some script over
> the results to filter out irrelevant stuff.
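
The filtering step might look roughly like the sketch below. It assumes
top.stats was written by the "(date; top -b -n 1)" command quoted above,
so that each snapshot is a date line followed by top's batch output, and
that top's column header starts with PID and ends with COMMAND. The
function name parse_top_log is made up for this example.

```python
def parse_top_log(text, command):
    """Return (date_line, pid, cpu_percent) for each row running `command`."""
    results = []
    stamp = None      # output of `date` for the current snapshot
    columns = None    # column names from top's header row
    prev = ""         # last non-empty line seen
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("top - "):
            # A new snapshot begins; the line before it is the date.
            stamp = prev
            columns = None
        elif stripped.startswith("PID "):
            columns = stripped.split()
        elif columns and stripped:
            # COMMAND is last and may contain spaces, so limit the split.
            fields = stripped.split(None, len(columns) - 1)
            if len(fields) == len(columns):
                row = dict(zip(columns, fields))
                if row["COMMAND"].split()[0] == command:
                    results.append(
                        (stamp, int(row["PID"]), float(row["%CPU"]))
                    )
        if stripped:
            prev = stripped
    return results
```

Something like parse_top_log(open("top.stats").read(), "python") then
yields one entry per snapshot and matching process, which can be plotted
or simply eyeballed for the upward trend.
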
>
> And, if all else fails to resolve the issue, there's still the good ol'
> "restart once a day" strategy.  (Under Linux you usually only need to
> restart the problematic process, not the entire OS -- but you never
> know...).  Not really satisfactory for those with a keen sense of
> beauty-in-IT, but at least pragmatic.
>
> Well, as I warned you up-front, nothing really enlightening :)
> But good luck anyway,
>
> Almut
>
>
> [1] http://valgrind.org/
>
> [2] futexes are typically used to synchronize usage of memory and other
> resources that are shared between threads or processes.
> Just in case you want to learn more about this tricky kind of stuff,
> there's a good article by Ulrich Drepper (not an easy read, though):
> http://people.redhat.com/drepper/futex.pdf
>

