I don’t usually take benchmarks very seriously. It’s worthwhile running them on new hardware as a quick check that everything’s working as expected - but if the results are within an order of magnitude of the hoped-for numbers after a single run, then I’m usually happy to move on to more productive tasks. Leave the endless tweaking and measurebation to the inhabitants of gentoo-land.
With flurry, though, I thought I should take a little more care. It has a 10-disk array, so the standard “ach, sure, raid 5 will do” instinct can be very dangerous. A single disk failure will leave the machine vulnerable for up to 72 hours - a couple of days to replace the disk, and another to rebuild the array. That’s a bit too long for comfort, especially if environmental factors have been the root cause of the initial failure.
So; I really, really wanted to go for RAID 6 - but I was unsure as to how much of a performance penalty that would incur. My vague, handwave-y guess was that it’d be about a third slower in use, when compared to RAID 5. I’d consider 50% slower to be unacceptable, and anything less than 25% slower to be surprisingly good.
It turns out that bonnie++ was the best tool for the job. I was able to mimic the sort of operations that our current mail server does most often by using it with the following command line:
bonnie++ -b -d /home/rory/ -u rory -n 128:25000:500:16
i.e. to write 128*1024 files to each of 16 directories, with a random variation of sizes between 500 and 25000 bytes (the average filesize on our current mailserver is 12.14 KB - so that’s about right) - 25 Gigabytes of data in total. The -b option causes it to issue fsync() calls after each file has been written - again, this is the same setup that we’ll be using when the server goes live.
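As a rough sanity check on that 25 Gigabyte figure, here is a minimal sketch of the arithmetic. It takes this post's reading of the -n argument at face value (128*1024 files in each of the 16 directories) and assumes the file sizes are spread uniformly between the two limits; bonnie++'s own accounting may differ.

```python
# Figures taken from the bonnie++ command line in the post.
FILES_PER_UNIT = 1024        # bonnie++ counts files in multiples of 1024
UNITS = 128
DIRECTORIES = 16
MIN_BYTES, MAX_BYTES = 500, 25000

# Assumption: 128*1024 files in each directory, as the post describes it.
total_files = UNITS * FILES_PER_UNIT * DIRECTORIES
# Assumption: sizes are uniformly distributed, so the mean is the midpoint.
mean_bytes = (MIN_BYTES + MAX_BYTES) / 2
total_gib = total_files * mean_bytes / 2**30

print(f"{total_files} files, mean {mean_bytes:.0f} bytes, {total_gib:.1f} GiB in total")
```

Under those assumptions the mean file size comes out a little above the 12.14 KB measured on the live server, and the total lands just under 25 GiB.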
I ran that five times on a 1 TB “vanilla” ext3 partition (mounted noatime, like every other disk partition I’ve touched in the past five years!) sitting on top of an LVM volume, which in turn sat on the various types of RAID array (5, 6, 1+0, 0 [the latter for comedy value only, of course]) supported by our HP P400 card. I didn’t bother trying any form of software RAID.
For comparison purposes, I also ran bonnie++ on a machine that is identical to ashes - which had served as a webserver from September 2000 to August 2005, and hasn’t been touched since then. It has a 30GB partition mounted at the start of the array (ashes has a 40GB one), which is formatted as reiserfs (as it is on ashes). It’s therefore going to give us a nice indication of how much (if at all) faster the new system is compared to what it’s replacing.
The results are as follows:

Well, unsurprisingly, all of the SATA RAID levels are faster than the old SCSI RAID 5 array for each of the four operations - between 13 and 14 times as fast for random reads (due mostly to having 10 spindles rather than just 4, I’m sure).
However, RAID 6 is “only” 3.01 times faster for random creates, compared to 5.87x for RAID 5. That’s very close to being unacceptable to me, especially since this sort of operation accounts for almost a third of all those performed on our current mailserver.
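To put a number on “very close to being unacceptable”, the two speed-up factors can be turned into a relative penalty and compared with the 25% and 50% thresholds set out earlier. This is only a sketch of that arithmetic; the function and variable names are made up for illustration.

```python
# Speed-up over the old SCSI array for random creates, as quoted in the post.
RAID5_SPEEDUP = 5.87
RAID6_SPEEDUP = 3.01

# Thresholds from earlier in the post (percent slower than RAID 5).
SURPRISINGLY_GOOD = 25
UNACCEPTABLE = 50


def slowdown_percent(baseline, candidate):
    """Return how much slower `candidate` is than `baseline`, in percent."""
    return (1 - candidate / baseline) * 100


penalty = slowdown_percent(RAID5_SPEEDUP, RAID6_SPEEDUP)

if penalty >= UNACCEPTABLE:
    verdict = "unacceptable"
elif penalty < SURPRISINGLY_GOOD:
    verdict = "surprisingly good"
else:
    verdict = "tolerable, but only just"

print(f"RAID 6 is {penalty:.1f}% slower than RAID 5: {verdict}")
```

The penalty works out at a little under 49%, which is why it sits so uncomfortably close to the 50% cut-off.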
Another option may be to go for RAID 5 + a hot spare. I’d end up with almost the same speed as the 10-disk RAID 5 array, whilst being able to automatically begin the array rebuild after the first failure - reducing the “danger time” by two thirds.
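The “two thirds” comes straight from the 72-hour estimate at the top of this post: a hot spare removes the wait for a replacement disk, leaving only the rebuild. A quick sketch, assuming the rough split of two days to replace and one day to rebuild:

```python
# Rough estimates from the post, in hours.
REPLACE_HOURS = 48   # "a couple of days to replace the disk"
REBUILD_HOURS = 24   # "and another to rebuild the array"

# Without a hot spare, the array is degraded for both phases.
without_spare = REPLACE_HOURS + REBUILD_HOURS
# With a hot spare, the rebuild starts straight away.
with_spare = REBUILD_HOURS

reduction = 1 - with_spare / without_spare
print(f"{without_spare}h -> {with_spare}h, a reduction of {reduction:.0%}")
```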
On the other hand, a minimum of three times faster than the current system is still perfectly decent. I think I’m going to need to do another round of measurebation, aren’t I? Oh god, it’ll be -funroll-loops and buying a bass tube for my Vauxhall Nova next…
Time for a new mailserver, then.
In an ideal world, I’d be putting the following on my shopping list:
HP DL380 G5, dual 2.67 GHz Xeon 5150 CPUs, 8 GB RAM, 8x 36 GB 2.5” 15k rpm SAS disks - for the main mail server, anti-spam & virus, the mail queue (a large number of low-capacity 2.5” disks is the best route to achieving ultra-low seek times, which is important for randomly-accessed data like email), and sending our weekly mailshots.
HP DL320s, single 2.67 GHz Xeon 3070 CPU, 4 GB RAM, 12x 300 GB 3.5” 15k rpm SAS disks - nfs server for users’ Maildirs, and the customer care mail database.
DL140 G3, dual 2.67 GHz Xeon 5150 CPUs, 8 GB RAM, 2x 100 GB 7.2k rpm SATA disks (for booting from only) x2 - user-facing servers - the first hosting a number of Xen instances for people to read mail using mutt or adjust their procmail setups, as well as pop3 and imap servers, and the second running webmail and the web frontend for the customer care mail system.
mm, tasty.
Unfortunately, I don’t have a budget of fifteen grand to spend on the above, so I’m going to make do with just the one box. In fact, it’s worse than that, as I’m also going to have to use this machine as a replacement for our ftp server, our “friends’n’family” webserver, and as a backup server (connected to a nice lto2 tape array).
As a result, I plumped for the following:
HP DL320s, single 2.67 GHz Xeon 3070 CPU, 4 GB RAM - with an upgrade to a 512 MB battery-backed write cache
2x 72 GB 15k rpm 3.5” SAS disks
- a RAID 1 array for the mail queue and system partitions
10x 250 GB 7.2k rpm 3.5” SATA disks
- for users’ Maildirs
Total price was about £3,000 - a fifth of the cost of doing it right.

Things won’t be so awful for our technical staff - I’ll export their Maildirs to their own Xen instances on our big development server, laganside - so they can read their mail nice and quickly there. And I’ll probably inject the half-million message mailshots from infuse, a Xen instance elsewhere on our network. Even so - the new setup will merely provide a noticeable improvement to our users, rather than being “zomg ultra-turbo-plus-plus!”. bah!
My task for the rest of the week is to thoroughly benchmark the new machine, dubbed flurry. In particular, I’m interested to see the difference in speed between the various disk array setups that are open to me - JBOD (2.5 TB available), RAID 1+0 (1.25 TB available), RAID 5 (2.25 TB available), and RAID 6 (2.0 TB available).
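Those capacities follow from how many of the ten 250 GB disks each layout gives up to redundancy. A minimal sketch (the function name is made up for illustration, and it ignores filesystem and controller overhead):

```python
def usable_tb(level, disks=10, disk_tb=0.25):
    """Usable capacity in TB for a given layout of identical disks."""
    data_disks = {
        "JBOD": disks,           # no redundancy at all
        "RAID 1+0": disks // 2,  # every disk is mirrored
        "RAID 5": disks - 1,     # one disk's worth of parity
        "RAID 6": disks - 2,     # two disks' worth of parity
    }[level]
    return data_disks * disk_tb


for level in ("JBOD", "RAID 1+0", "RAID 5", "RAID 6"):
    print(f"{level}: {usable_tb(level)} TB")
```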
My instinct is to go for either JBOD or RAID 6 - two disk failures will kill a RAID 5 array, and have roughly a 1-in-9 chance of killing a 10-disk RAID 1+0 (the second failure is only fatal if it hits the mirror partner of the first). With that number of disks, from the same manufacturer (a number of different batches, though), and subject to the same physical environment, the chances of experiencing multiple disk failures are higher than I’d like. I’m willing to be persuaded otherwise if the performance penalty for RAID 6 turns out to be huge, though.
Anyway, flurry has now been running memtest86+ for just over 24 hours, so it’s time for me to go start the benchmarking. hurrah!
I’ve bought myself a Canon 5D digital SLR. I’ve only been out to use it once so far, but photos are at my new gallery-thing, at http://gallery.nothovel.net/ (and an RSS feed of my favourite images is available at http://gallery.nothovel.net/rss/favourites ).
I’m incredibly impressed with the camera. It’s a bit smaller than my film SLR, but a bit heavier (well, of course it is - a film camera is an empty box, whereas a digital camera is stuffed full of electronics). Other than that, they operate in an almost identical manner - other than that the DSLR can change ISO on the fly! hurrah!
Now, I’ve been scanning (mostly self-developed b&w) film for a couple of years; my Nikon film scanner turns out images that are almost 9 megapixels in size, while the DSLR has almost 13 megapixels. The difference in quality is /much/ huger than a mere 20% jump in horizontal resolution would suggest.
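The 20% figure comes from the fact that pixel count grows with the square of the linear resolution. A quick sketch, using the rounded 9 and 13 megapixel figures and assuming both images have the same aspect ratio:

```python
import math

SCANNER_MPIX = 9   # "almost 9 megapixels"
DSLR_MPIX = 13     # "almost 13 megapixels"

# Pixels = width * height, so each edge scales with the square root
# of the ratio of pixel counts (assuming the same aspect ratio).
linear_gain = math.sqrt(DSLR_MPIX / SCANNER_MPIX)

print(f"{(linear_gain - 1) * 100:.0f}% more pixels along each edge")
```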
9 mpix is high-enough resolution to see individual grain clusters at iso 400 or above, and even at iso 100, non-grainy, well-focused bits didn’t look /sharp/ when viewed at 1:1. This weekend, with the DSLR, and using my two crappiest lenses, I was able to achieve sharpness at iso 400 that almost cut your eyeballs to ribbons when compared with the output from the film scanner.
Not only that, but there’s /colour/. Lots and lots and /lots/ of it. In fact, perhaps a bit too much - I’d read that Canon’s higher-end DSLRs tended to produce rather flat colour, because that provides a better base for later post-processing work. I wasn’t really wanting to spend hours on each image in the Gimp, so I set contrast and saturation to one level above their default values (and therefore matched the setup in the 350D and 30D cameras). On a bright sunny morning, this made everything look like it had come from a crappy holiday brochure, which I’m not really sure I like.
Ah well, at least I’ve got plenty of new things to learn about :)
Sarah’s been busy in the studio, taking photos of various gorgeous people for her project on body modification. Yesterday’s session produced around a gigabyte of images, which were uploaded to wintermute for safekeeping.
Unfortunately, some of them didn’t turn out properly. In this one, for example, the studio flash didn’t fire - so the image was taken with the modelling lights only. Therefore, it’s underexposed and the colour balance is badly off.

What could I do to help?
Well, the images are on a remote server, and they’re almost 4 MB each. I didn’t fancy having to download, edit, then re-upload them - so that limited me to the command-line tools available in the Imagemagick suite. In particular, I’ve used the “mogrify” and “convert” commands - “mogrify” changes the original image, whereas “convert” saves the changes to a new file.
Step one: auto-rotation
Sarah’s Canon 350D camera has a sensor that detects if a photo is taken in portrait orientation, rather than landscape. This information is encoded in the EXIF data within the jpeg headers, and can be used to rotate any images that need it without losing any quality:
jhead -autorot *.jpg
Okay, the image is now the right way up, but it’s still very dark. Imagemagick’s “normalize” option will spread out the colour values over the full range, increasing contrast and (usually) helping to correct colour balances (note that this is called “Auto Levels” in the GIMP and Photoshop):
mogrify -normalize *.jpg
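For the curious, the idea behind a contrast stretch like this can be sketched in a few lines. This is a simplified min-max stretch on a handful of 8-bit values, not Imagemagick's actual implementation (which works from the image histogram and may clip a small fraction of the extremes); the function name is made up for illustration.

```python
def stretch(values, out_max=255):
    """Linearly rescale values so the darkest maps to 0 and the brightest to out_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A flat image has no contrast to stretch.
        return list(values)
    return [round((v - lo) * out_max / (hi - lo)) for v in values]


# An underexposed row of pixels, all crammed into the dark end of the range.
dark = [10, 20, 40, 60]
print(stretch(dark))  # [0, 51, 153, 255]
```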
Hm, those colours are awful - he looks jaundiced and spotty and hungover, nothing like his appearance in the properly-lit pictures - and the damage is probably beyond repair without lots of tedious pixel-level editing. The range of tones is decent enough, though, and there’s plenty of contrast - so let’s try converting that one to black and white:
mogrify -monochrome 1213.jpg
Ooops. That’s a dithered two-colour image - we want a grayscale. The “grayscale” or “desaturate” option in GUI graphics programs will almost always average the red, green, and blue values of each pixel - thus creating a b&w image that contains information from all three colour channels. To replicate this in Imagemagick, we use the “modulate” option, which lets us specify a percentage change for brightness, saturation, and hue:
mogrify -modulate 100,0,100 1213.jpg
Not much better, is it? Unfortunately, the colour image was rather noisy - but that noise was largely confined to the colour of each pixel, rather than the brightness of it. So, rather than just extracting the r, g, and b values from each pixel, let’s try extracting the “intensity” and “luminosity” values from each pixel.
convert 1213.jpg -fx 'luminosity' 1213.luminosity.jpg
convert 1213.jpg -fx 'intensity' 1213.intensity.jpg
For a portrait, the luminosity version is probably the better - but either is a huge improvement over a simple desaturation. Hurrah!
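To give a feel for why different grayscale conversions give different results, here is a sketch of three common ways of reducing an RGB pixel to a single brightness value. Which formulas Imagemagick actually uses for its “intensity” and “luminosity” keywords isn't stated here, so the weights below (the widely used Rec. 601 luma weights, and HSL lightness) are assumptions for illustration only.

```python
def average(r, g, b):
    """Plain desaturation: every channel counts equally."""
    return (r + g + b) / 3


def weighted_luma(r, g, b):
    """Perceptual weighting (Rec. 601 weights, assumed): green dominates, blue barely counts."""
    return 0.299 * r + 0.587 * g + 0.114 * b


def hsl_lightness(r, g, b):
    """HSL lightness: midpoint of the strongest and weakest channels."""
    return (max(r, g, b) + min(r, g, b)) / 2


# A warm skin-tone-ish pixel, chosen arbitrarily for the example.
pixel = (200, 120, 80)
for convert_fn in (average, weighted_luma, hsl_lightness):
    print(f"{convert_fn.__name__}: {convert_fn(*pixel):.1f}")
```

A neutral grey pixel gives the same answer under all three, so the methods only disagree where there is colour, which is exactly where the noise in this image lives.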
Step four: noise reduction
There’s still a little bit of noise in that image - so, let’s try to smooth that out using imagemagick’s “enhance” option:
convert 1213.luminosity.jpg -enhance 1213.enhanced.jpg
That’s an improvement - but at the cost of a little loss of detail in his hair, and a general softening of the image. Personally, I think the original was fine - but Imagemagick’s noise reduction can work wonders on images that would otherwise have been ruined by grain.
Step five: sharpening
All the cool kids apply an unsharp mask to their images, so let’s do that, too. Imagemagick can auto-select appropriate values, which is really handy for doing a reasonable-enough job on huge batches of images:
convert 1213.enhanced.jpg -unsharp 0 1213.unsharp.jpg
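If you've ever wondered what an unsharp mask actually does: it blurs a copy of the image, then pushes each pixel away from its blurred value, which exaggerates edges. Here's a minimal one-dimensional sketch. It uses a simple box blur as a stand-in for the Gaussian blur a real implementation would use, and the function names and parameters are made up for illustration.

```python
def box_blur(signal, radius=1):
    """Average each sample with its neighbours (a stand-in for a Gaussian blur)."""
    blurred = []
    for i in range(len(signal)):
        window = signal[max(0, i - radius):min(len(signal), i + radius + 1)]
        blurred.append(sum(window) / len(window))
    return blurred


def unsharp_mask(signal, radius=1, amount=1.0):
    """Sharpen by adding back the difference between the signal and its blur."""
    blurred = box_blur(signal, radius)
    sharpened = [s + amount * (s - b) for s, b in zip(signal, blurred)]
    # Clamp to the valid 8-bit range.
    return [min(255, max(0, round(v))) for v in sharpened]


# A soft edge between a dark region and a bright one.
edge = [50, 50, 50, 200, 200, 200]
print(unsharp_mask(edge))  # [50, 50, 0, 250, 200, 200]
```

The flat areas are untouched, while the pixels either side of the edge are pushed darker and brighter respectively, which is what makes the edge look crisper.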
Because jpegs are lossy, the manipulations we’ve made in steps two to five should really be done at the same time, in order to preserve maximum image quality (and also to keep the file size down):
convert 1213.jpg -normalize -fx 'luminosity' -enhance -unsharp 0 1213.processed.jpg
easy, eh?
…but even when it was working, it had never done a very good job on my living room carpet, which is exceptionally cheap and nasty. It’s woven from artificial fibres, which moult and then get stuck in the remaining pile, forming hairballs. The hoover wasn’t powerful enough to do more than push those around; it functioned more as an old-fashioned carpet sweeper, and left the place looking almost as filthy as if I didn’t bother to hoover at all.
So; I needed a new hoover. Immediately discounting the overpriced dyson-type tat, I went to look for one of those “henry” devices that office cleaners use. I reasoned that they cost about £25, so were essentially disposable. Better still - though they may not have nilfisk-grade filtration, or any other “features” to speak of, they are powerful and reliable enough for industrial use, and are therefore easily good enough for use in the not.nothovel.
Unfortunately, that idea seems to have occurred to more than a few other people, and Henry prices have quadrupled in the last five years. bah.
So, thought I, if I’m going to spend that much, why not spring for a Roomba? Well, they turned out to be £200-ish (or £150 on ebay) - but Paddy mentioned that B&Q had had some cheaper ones in last time he was there…
One trip to B&Q later, and I was now in possession of their own brand Roomba-clone for £30 (it’s not listed on their website, unfortunately - the closest match is some £900[!] Karcher thing).
I’m really rather impressed with it; it doesn’t do the return-to-base-after-a-set-time thing like the roomba - it just keeps on wandering about until the battery runs out (generally in about an hour). That would probably be a problem in an office environment, but is fine in my living room. The dust compartment is small - but after a month without hoovering, it was only just about full after the first hour-long run, so it’ll be more than enough for normal use.
Best of all, it’s really amusing to watch. I sort-of wish I had cats, to see how they’d react to it…
Anyway; pictures of it are boring - it just looks like a roomba. What you really want is this video (6.8 MB)