Tue, 14 Aug 2007 @ 12:22
[/tech]
Ashes to, er, Flurry…

Time for a new mailserver, then.

In an ideal world, I’d be putting the following on my shopping list:

HP DL380 G5, dual 2.67 GHz Xeon 5150 CPUs, 8 GB RAM, 8x 36 GB 2.5” 15k rpm SAS disks

- for the main mail server, anti-spam and anti-virus, the mail queue (a large number of low-capacity 2.5” disks is the best route to achieving ultra-low seek times, which is important for randomly-accessed data like email), and sending our weekly mailshots.

HP DL320s, single 2.67 GHz Xeon 3070 CPU, 4 GB RAM, 12x 300 GB 3.5” 15k rpm SAS disks

- nfs server for users’ Maildirs, and the customer care mail database.

2x HP DL140 G3, each with dual 2.67 GHz Xeon 5150 CPUs, 8 GB RAM, 2x 100 GB 7.2k rpm SATA disks (for booting from only)

- user-facing servers - the first hosting a number of Xen instances for people to read mail using mutt or adjust their procmail setups, as well as pop3 and imap servers, and the second running webmail and the web frontend for the customer care mail system.

mm, tasty.

Unfortunately, I don’t have a budget of fifteen grand to spend on the above, so I’m going to make do with just the one box. In fact, it’s worse than that, as I’m also going to have to use this machine as a replacement for our ftp server, our “friends’n’family” webserver, and as a backup server (connected to a nice lto2 tape array).

As a result, I plumped for the following:

HP DL320s, single 2.67 GHz Xeon 3070 CPU, 4 GB RAM

- with an upgrade to a 512 MB battery-backed write cache

2x 72 GB 15k rpm 3.5” SAS disks

- a RAID 1 array for the mail queue and system partitions

10x 250 GB 7.2k rpm 3.5” SATA disks

- for users’ Maildirs

Total price was about £3,000 - a fifth of the cost of doing it right.


Meet flurry, our new mailserver

Things won’t be so awful for our technical staff - I’ll export their Maildirs to their own Xen instances on our big development server, laganside - so they can read their mail nice and quickly there. And I’ll probably inject the half-million message mailshots from infuse, a Xen instance elsewhere on our network. Even so - the new setup will merely provide a noticeable improvement to our users, rather than being “zomg ultra-turbo-plus-plus!”. bah!

My task for the rest of the week is to thoroughly benchmark the new machine, dubbed flurry. In particular, I’m interested to see the difference in speed between the various disk array setups that are open to me - JBOD (2.5 TB available), RAID 1+0 (1.25 TB available), RAID 5 (2.25 TB available), and RAID 6 (2.0 TB available).
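The capacity figures for the ten 250 GB SATA disks can be checked with a few lines of Python. This is only a rough sketch: it ignores filesystem and controller overhead, and the function name `usable_gb` is made up for illustration.

```python
# Usable capacity for 10 x 250 GB disks under each layout being considered.
DISKS = 10
DISK_GB = 250


def usable_gb(layout: str, disks: int = DISKS, disk_gb: int = DISK_GB) -> int:
    """Return usable capacity in GB, ignoring filesystem/controller overhead."""
    data_disks = {
        "JBOD": disks,           # every disk holds data, no redundancy
        "RAID 1+0": disks // 2,  # half the disks are mirror copies
        "RAID 5": disks - 1,     # one disk's worth of parity
        "RAID 6": disks - 2,     # two disks' worth of parity
    }[layout]
    return data_disks * disk_gb


for layout in ("JBOD", "RAID 1+0", "RAID 5", "RAID 6"):
    print(f"{layout:9s} {usable_gb(layout) / 1000:.2f} TB")
```

RAID 6 costs one more disk of capacity than RAID 5, which is the price of surviving a second failure.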

My instinct is to go for either JBOD or RAID 6 - two disk failures will kill a RAID 5 array, and have roughly a one-in-nine chance of killing a ten-disk RAID 1+0 (the second failure has to hit the mirror partner of the first). With that number of disks, from the same manufacturer (a number of different batches, though), and subject to the same physical environment, the chance of experiencing multiple disk failures is higher than I’d like. I’m willing to be persuaded otherwise if the performance penalty for RAID 6 turns out to be huge, though.
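How a two-disk failure plays out depends on the layout, and for RAID 1+0 it depends on which two disks die. The sketch below enumerates every possible pair of failed disks out of ten. It assumes the RAID 1+0 array is built from five mirrored pairs (disks 0+1, 2+3, and so on), and it treats both failures as simultaneous and independent. That is optimistic given the shared manufacturer and environment, and it ignores the rebuild window. All function names are illustrative.

```python
from itertools import combinations

DISKS = 10  # the ten SATA disks in the array


def fatal_fraction(is_fatal, disks=DISKS):
    """Fraction of two-disk failure combinations that lose the whole array."""
    pairs = list(combinations(range(disks), 2))
    return sum(1 for pair in pairs if is_fatal(pair)) / len(pairs)


def raid10_fatal(pair):
    # Assumption: mirrors are (0,1), (2,3), ... with a stripe across them.
    a, b = pair
    return a // 2 == b // 2  # both failed disks belong to the same mirror


def raid5_fatal(pair):
    return True  # single parity: any second failure loses the array


def raid6_fatal(pair):
    return False  # dual parity: any two failures are survivable


for name, rule in [("RAID 1+0", raid10_fatal),
                   ("RAID 5", raid5_fatal),
                   ("RAID 6", raid6_fatal)]:
    print(f"{name:9s} {fatal_fraction(rule):.0%} of two-disk failures are fatal")
```

Under those assumptions only 5 of the 45 possible pairs are fatal for RAID 1+0. RAID 6 is still the only layout here that survives every two-disk failure.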

Anyway, flurry has now been running memtest86+ for just over 24 hours, so it’s time for me to go start the benchmarking. hurrah!


Tue, 14 Aug 2007 @ 10:43
[/tech]
Dust to dust, etc.

I arrived at Sendit two-and-a-half years ago, and it quickly became obvious that every server needed to be replaced, and every system needed to be overhauled. That job is pretty much complete now - and, thanks to the magic of Xen, we’ve gone from having 60 or so servers to 11 (in fact, the savings in electricity alone will pay for the cost of the machines within the first half of their expected service life).

One of the last tasks on my list is to replace our mail server - something I’ve been looking forward to, as email is probably the closest thing I have to a specialist subject. Our current server is ashes - a Dell PowerEdge 2450 that entered service on 5th September, 2000. I’m aiming to do the switchover on its seventh birthday :)

Ashes has two 666 MHz Pentium III Xeon processors, a gigabyte of RAM, and 54 GB of disk (4x 18 GB 10k rpm SCSI disks in a hardware RAID 5 array). It runs a fairly vanilla install of qmail, qmail-popup for pop3, and courier-imap for imap4 (both of which are wrapped with stunnel for the ssl-ised variants). Some semblance of an anti-spam measure is provided by rblsmtpd (pointed at the sbl-xbl.spamhaus.org blacklist), and most users also run spamassassin from their procmailrc. Anti-virus is provided by McAfee’s uvscan.
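For anyone unfamiliar with how a DNS blacklist check works: the client's IPv4 address is reversed octet by octet, the blacklist zone is appended, and the resulting name is looked up in DNS. An answer means the address is listed; "no such name" means it isn't. The sketch below only builds the query name and performs no lookup. The function name `dnsbl_query_name` is made up, and the address is a documentation placeholder.

```python
import ipaddress


def dnsbl_query_name(client_ip: str, zone: str = "sbl-xbl.spamhaus.org") -> str:
    """Build the hostname a DNSBL client would look up for an IPv4 address."""
    addr = ipaddress.IPv4Address(client_ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


# 192.0.2.1 is reserved for documentation (RFC 5737); it stands in for a
# connecting client's address.
print(dnsbl_query_name("192.0.2.1"))
```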

Our mail system has a couple of quirks - first, mail coming in to our customer care team is forwarded on to a separate server, angel (another PE2450, though slightly older) for entry into a mysql-backed perl behemoth. Secondly, we send weekly special-offer mails to half a million or so customers who have opted in to that service - this is done by a third server, y02, which is a shoddy *8* year old Dell Dimension XPS desktop box. eep!

Mail volume is pretty substantial - we have 91 “real” users, and 568 aliases (not counting all the various username-blah@ “dot-qmail”-style aliases). On a typical day, we would see around 350,000 delivery attempts, of which maybe 140,000 will be accepted into the system. Both of those figures can rise by 100,000 in the 24 hours after we send a mailshot, thanks to the staggering number of inventively-broken “out of office” autoresponders that our customers use.

This brings us to one of qmail’s major weaknesses. Since it doesn’t check whether a user actually exists before accepting mail into the system, we can’t reject backscatter / spam / improperly-addressed mail at SMTP time. Instead, we create around 40,000 new bounce messages every day, which might have been acceptable a decade ago, but is terribly anti-social these days.
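To make the idea of an SMTP-time check concrete, here is a minimal sketch of the decision a server could make when it sees a RCPT TO command. It is not how qmail or any particular MTA implements it. The user and alias names, the domain, and the function `rcpt_reply` are all hypothetical, and it simplifies dot-qmail handling by assuming every real user accepts any "username-blah" extension.

```python
# Hypothetical data, for illustration only.
USERS = {"alice", "bob"}
ALIASES = {"sales", "postmaster"}


def rcpt_reply(address: str, domain: str = "example.com") -> str:
    """Return the SMTP reply a server could give to RCPT TO:<address>."""
    local, _, addr_domain = address.lower().partition("@")
    if addr_domain != domain:
        return "550 5.7.1 relaying denied"
    # Simplification: treat "user-anything" as deliverable to "user".
    base = local.split("-", 1)[0]
    if local in ALIASES or base in USERS:
        return "250 2.1.5 recipient ok"
    # Rejecting here means the sending server owns the failure, so we
    # never have to generate a bounce message ourselves.
    return "550 5.1.1 no such user"


for rcpt in ("alice@example.com", "alice-lists@example.com", "nobody@example.com"):
    print(rcpt, "->", rcpt_reply(rcpt))
```

The point of the design is that a 550 during the SMTP conversation leaves responsibility with the sender, whereas accepting first and bouncing later sends a new message to a return address that is often forged.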

Of the 100,000 emails that are delivered to users every day, perhaps three quarters are spam or viruses that either get sent to /dev/null or (all too often) end up in people’s inboxes. In short, more modern SMTP-time checks would save our server from doing an awful lot of work.
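Putting the daily figures together shows how little of the traffic is wanted mail. The sketch below uses only the numbers quoted above. It assumes the 40,000 bounces account for the gap between mail accepted and mail delivered, which the figures imply but which isn't stated outright. The function name is illustrative.

```python
def daily_breakdown(attempts=350_000, accepted=140_000, bounces=40_000):
    """Split a typical day's delivery attempts into rough categories."""
    rejected = attempts - accepted   # turned away before entering the system
    delivered = accepted - bounces   # assumption: the remainder reaches users
    spam = delivered * 3 // 4        # "perhaps three quarters" spam or viruses
    wanted = delivered - spam
    return {
        "rejected": rejected,
        "bounced": bounces,
        "delivered": delivered,
        "spam_or_virus": spam,
        "wanted": wanted,
        "wanted_share_of_attempts": wanted / attempts,
    }


for key, value in daily_breakdown().items():
    if isinstance(value, float):
        print(f"{key:26s} {value:.1%}")
    else:
        print(f"{key:26s} {value:,}")
```

On those numbers, about 25,000 messages a day are wanted, which is roughly 7% of all delivery attempts.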

Finally, we have the issue of disk space. Ashes has 36 GB devoted to the /home partition. It currently has less than a gigabyte free, and has never been less than 90% full since I’ve been here. Users have become adept at downloading mail, and storing it in whatever nasty, fragile binary format Outlook uses. Even so, I have to harass them every month or two to clean their inboxes out - a huge waste of everyone’s time. Since we use the dreaded RicerFS (in notail mode, too!) most Maildir/ files are unimaginably fragmented - to the point that opening a maildir containing 1,000 messages can take over two minutes.

Something needs to be done…