Via Slashdot, I found an excellent summary (thanks, StorageMojo) of a paper presented at the USENIX conference. The paper, which won the Best Paper Award, is titled "Disk Failures in the Real World: What Does an MTTF of 1,000,000 Hours Mean to You?"
A summary of the summary:
- Vendor MTBF figures can be off by as much as a factor of 15 (a factor of 3 on average)
- Consumer drives and enterprise drives have similar failure rates (despite very different advertised failure rates)
- Drive failures don’t follow the expected infant-mortality (burn-in) pattern; instead, failure rates increase steadily as drives age
- Complex RAID systems are not as reliable as you might think, because the failure of one drive is a strong hint that another drive will fail soon
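To make the title's question a bit more concrete, here is a back-of-the-envelope sketch that converts a datasheet MTTF into an annualized failure rate (AFR), the chance that a given drive fails within a year. It is only a sketch under stated assumptions: it assumes 24x7 operation and a constant (exponential) failure rate, which is exactly the simplification the paper calls into question given that failure rates rise with age. The helper name `afr_from_mttf` is hypothetical, and reading "off by N times" as "the real MTTF is the datasheet MTTF divided by N" is my own interpretation, not the paper's method.

```python
import math

# Assumption: drives are powered on 24x7 (leap years ignored).
HOURS_PER_YEAR = 24 * 365  # 8,760 hours

def afr_from_mttf(mttf_hours: float) -> float:
    """Hypothetical helper: probability that a drive fails within one year,
    assuming a constant (exponential) failure rate with the given MTTF."""
    return 1.0 - math.exp(-HOURS_PER_YEAR / mttf_hours)

if __name__ == "__main__":
    datasheet_mttf = 1_000_000  # hours, the figure from the paper's title
    # Assumed reading of "off by N times": real MTTF = datasheet MTTF / N.
    for factor in (1, 3, 15):
        afr = afr_from_mttf(datasheet_mttf / factor)
        print(f"off by {factor:>2}x -> AFR {afr:.2%}")
```

Under these assumptions, a 1,000,000-hour MTTF works out to an AFR of a bit under 1% per drive, and dividing the MTTF by 3 or 15 pushes that roughly threefold or fifteenfold higher. The exponential model also treats drive failures as independent, so it cannot capture the correlated RAID failures described above.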