Back when I did hardware design in the ’80s, DRAM errors, ECC, and alpha particles in DRAM packaging were a big deal, one we all happily thought was behind us. Maybe not; a study by Google shows:
The headline conclusion in the study is that DRAM errors are vastly more common than is typically assumed. Nearly one-third of the individual machines in the study saw at least one error per year, a rate that’s orders of magnitude higher than previous research had indicated. To give some hard numbers, previous studies report 200 to 5,000 failures in time per billion hours of operation (FIT) per Mbit; Google found that their numbers were between 25,000 and 75,000 FIT per Mbit. On the bright side, most of these errors are the result of a few bad apples. About 8 percent of DIMMs were responsible for over 90 percent of the errors, as DIMMs that produced one error were hundreds of times more likely to produce another error in the same month. Another lesson that Google learned is that older hardware is much more likely to fail; at about 20 months the error rate shoots up drastically.
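To get a feel for what those FIT figures mean, here is a rough back-of-the-envelope sketch that converts a FIT-per-Mbit rate into expected errors per year for a single DIMM. The function name, the 1 GiB DIMM size, and the choice of 1 Mbit = 2^20 bits are my own assumptions for illustration, not numbers from the study. Keep in mind that this is a fleet-wide average: as the excerpt says, a small fraction of DIMMs accounts for most of the errors, so a typical healthy DIMM would see far fewer.

```python
# Rough conversion from FIT per Mbit to expected errors per DIMM per year.
# FIT = failures per billion (1e9) device-hours of operation.

HOURS_PER_YEAR = 24 * 365          # 8760, ignoring leap years
FIT_HOURS = 1e9                    # hours in the FIT definition

# Assumption: a 1 GiB DIMM, with 1 Mbit taken as 2**20 bits.
ONE_GIB_IN_MBIT = 8 * 1024         # 8192 Mbit


def errors_per_year(fit_per_mbit, capacity_mbit=ONE_GIB_IN_MBIT):
    """Expected errors per year for one DIMM at a given FIT/Mbit rate."""
    errors_per_hour = fit_per_mbit * capacity_mbit / FIT_HOURS
    return errors_per_hour * HOURS_PER_YEAR


# Rates quoted in the excerpt: earlier studies vs. Google's range.
for label, fit in [("earlier, low", 200), ("earlier, high", 5_000),
                   ("Google, low", 25_000), ("Google, high", 75_000)]:
    print(f"{label:14s} {fit:>6} FIT/Mbit -> "
          f"{errors_per_year(fit):8.1f} errors/year per 1 GiB DIMM")
```

Under these assumptions, Google's range works out to roughly 1,800 to 5,400 errors per year for an average 1 GiB DIMM, versus roughly 14 to 360 at the rates from the earlier studies.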