At 02:24 AM 10/27/2006, Matthew Kolb wrote:
>On Oct 27, 2006, at 12:02 AM, John Gorentz wrote:
>
>>We've had several persons lately who've experienced long delays  
>>in mail that goes to the mail.msu.edu system -- up to 24  
>>hours for mail to go from person to person within our department.    
>>Has this been a general problem for others lately?    It isn't  
>>happening to all mail, and sometimes mails between the same two  
>>persons go through right quick.   Some of these mails originate or  
>>end up on our own mail server, but the delays all seem to be  
>>happening at mail.msu.edu.
>>
>>Is there perhaps a retry queue that mails can get tossed into when  
>>there are hiccups in the system, with a retry interval that's  
>>pretty long?
>
>Unfortunately this has been an ongoing problem.  We have tried our  
>best to document it during periods of extremely high load via http://servicestatus.msu.edu/.  If you look at our mail traffic graph:  
>http://project.mail.msu.edu/~rrdtool/ you can see that we have gone  
>from about 1.5M messages per day last November to almost 4M messages  
>per day today.  This is all running on the same hardware, and it was  
>difficult to foresee this much growth -- almost all of which is SPAM  
>(I know other Higher Ed institutions are also experiencing this mail  
>corpus growth, and I would assume that other major ISPs are as  
>well).  This increase in fun-new-SPAM requires more processing power,  
>and consequently our SPAM filters have to constantly become more  
>complex, compounding processing time.  As you can imagine, these  
>factors increase load dramatically on our incoming MX machines.  This  
>increased load means more time processing the queue, which directly  
>corresponds to the occasional mail delivery delays our users are seeing.
>
>We are *currently* working feverishly to port some of our software to  
>our new platform (we received 10 Sun X4100s (dual-dual cores) with  
>8GB of RAM each).  Once this step is complete, we will put this  
>hardware into production, and subsequently alleviate these delivery  
>delays; for how long, I'm not sure!
>
>./matt

Thanks for the information, Matt.  I had been vaguely aware that keeping up with the volume of mail has been an issue that can sometimes cause performance problems.  But all of a sudden I had a rash of incidents like this, with specific enough information to enable me to trace what was happening to the messages on our end.  

I'll let people know about the measures you're taking.

John