[Techtalk] system call to /bin/mail from a program sometimes fails

Marilyn Wulfekuhler marilyn at mackinacsoftware.com
Fri Dec 7 20:43:29 UTC 2007


Hi folks,

We have a multi-threaded application that runs 24/7 and, when it 
detects an error condition out in the field, sends mail to us and to 
the customer it's running on behalf of.  Most of the time.  Sometimes, 
for no reason we can detect, the mail just fails to show up.  
The thing runs for weeks and dutifully reports the error conditions via 
email, so that we can address them, until one day it doesn't -- and we 
find out via other means :-( .  Killing the app process and restarting 
it makes the email start flowing again (but not any that were missing; 
i.e., they were not queued up).  The app otherwise still runs perfectly 
(i.e., the error conditions that it's supposed to send mail on are 
still detected and logged).

I'm calling /bin/mail via system() from a C++ program, on CentOS 
4.3.  /etc/issue doesn't give a kernel version.  The system() call from 
the program is given below:

string mailcommand;
mailcommand = "mail -s\'Unable to get info from " + id + ". reason: "
       + err->enumToString(err->getReason())
       + "\' "
       + recipients
       + " < /dev/null";

system (mailcommand.c_str());
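
The call above discards the return value of system(), so a failure to 
start or run mail leaves no trace.  As a minimal sketch (not what the 
application currently does), the status could be decoded and logged 
along these lines.  The names describeSystemStatus and runAndDescribe 
are hypothetical, made up for this illustration:

```cpp
#include <cerrno>
#include <cstdlib>
#include <cstring>
#include <sstream>
#include <string>
#include <sys/wait.h>

// Turn the value returned by system() into a loggable description.
// savedErrno is errno as captured immediately after the call.
std::string describeSystemStatus(int status, int savedErrno) {
    std::ostringstream out;
    if (status == -1) {
        // The child could not be created or waited for (e.g. fork() failed).
        out << "system() failed: " << std::strerror(savedErrno);
    } else if (WIFEXITED(status)) {
        int code = WEXITSTATUS(status);
        if (code == 0) {
            out << "ok";
        } else {
            // 127 conventionally means the shell could not run the command.
            out << "command exited with status " << code;
        }
    } else if (WIFSIGNALED(status)) {
        out << "command killed by signal " << WTERMSIG(status);
    } else {
        out << "unexpected wait status " << status;
    }
    return out.str();
}

// Hypothetical wrapper: run the command and report what happened.
std::string runAndDescribe(const std::string& command) {
    errno = 0;
    int status = std::system(command.c_str());
    int savedErrno = errno;  // capture before any other call can change it
    return describeSystemStatus(status, savedErrno);
}
```

Writing the result of runAndDescribe(mailcommand) to the same log as the 
error condition would show whether system() itself fails (status -1) or 
mail exits with an error once the messages stop arriving.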

Like I said, it works for a while (weeks), then doesn't.  It sort of 
sounds like a memory leak, or a buffer getting full somewhere.  I've 
tried creating a small test program with the above code in a loop, 
sleeping 10 seconds between iterations, and it's sent me over 30,000 
emails without failing.  The most "real" errors we've ever had since 
the beginning of time is around 2500.  Anyone have any ideas on what's 
going on?
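
For reference, the shape of that test loop is roughly the following.  
This is a sketch rather than the actual test program: the function name 
countFailures and its parameters are made up for illustration, and 
counting non-zero statuses is an addition that the loop described above 
does not necessarily do:

```cpp
#include <cstdlib>
#include <string>
#include <unistd.h>

// Run the same shell command repeatedly, pausing between iterations,
// and count how many times system() reported anything other than success.
int countFailures(const std::string& command, int iterations,
                  unsigned delaySeconds) {
    int failures = 0;
    for (int i = 0; i < iterations; ++i) {
        // A non-zero result covers both system() errors (-1) and a
        // non-zero exit status from the command itself.
        if (std::system(command.c_str()) != 0) {
            ++failures;
        }
        if (delaySeconds > 0) {
            sleep(delaySeconds);
        }
    }
    return failures;
}
```

Calling countFailures(mailcommand, 30000, 10) would correspond to the 
run described above.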

Thanks,
Marilyn


