There’s been a rash of database-corruption posts recently. On the two forums that I read, there have so far been around 12 corruption-related problems so far this month. That’s frightening. Many of the problems were only solved either with some data loss or with a lot of work, or both. So what are the two things that most of these cases had in common.
- No backups
- No corruption checks
Backups
Having no backups is pretty inexcusable, whether its because no backups were scheduled or whether it’s because the backups were failing. Paul’s written about the importance of the right backups, and I’m not going to repeat him here.
Just having backups is not sufficient. The backups have to be restorable. A backup that’s not restorable is, in my opinion, worse than no backup at all. That means that the jobs have to be checked to ensure that they did run without error and the backups have to be restored somewhere to test them. Ultimately that’s the only way to be 100%, absolutely sure that a backup is restorable.
In SQL 2005 and higher, if the database has checksum page protection and the backup is done with the checksum option then damaged pages will be detected at backup time and damage to the backup set will be detected by RESTORE VERIFYONLY. This may not be the case if the backup was not made with the checksum option.
The other thing that has to be considered is the integrity of the backup media. I’ve seen a case where an IO problem not only damaged the database but also damaged the backup file to the point where it would not restore, despite the backup being on a different LUN of the SAN with no drives shared between it and the data LUN
So, to conclude this section, do you have backups and are you sure your database backups can be restored? Have you tried recently?
Corruption checks
If corruption checks aren’t done regularly, there’s a good chance that any corruption that occurs will remain hidden for quite some time. If this does happen, and the database backups are not made with the CHECKSUM option, then the corrupt pages will get backed up. This causes obvious problems when trying to fix corruption by restoring from a backup.
I’ve seen people say that if they have the Page Verifications et to checksum, there’s no need to run CheckDB. That is not true. Other than backups (when taken with the CHECKSUM option) there is no other operation that reads and checks all of the page checksums in the database. Just having the page checksums in place doesn’t help if there’s never a check made to see that all of them are still correct.
A good database maintenance routine must include running CheckDB regularly or, if there isn’t time to run checkDB, a mixture of CheckAlloc, CheckCatalog and CheckTable on all tables.
How often? Often enough that, should corruption be detected, restoring from a clean backup is an option.
In Conclusion
I would consider the protection of data to be a DBA’s top priority. So backups and integrity checks should be top of the list of things to check and double check, because slipping in this area could have very nasty consequences for the business.
Do you have backups of all your critical production systems?
Are they restorable?
Are you sure?
Do you have corruption in any of your databases?
Are you sure?