5 Most Common Critical SQL Integrity Issues
Presented by Robert L Davis, this is a review of, and recommendations for handling, the 5 most common SQL Server integrity issues. You may also download the full white paper, 5 Most Common Critical SQL Integrity Issues.
While we do not explain the purpose of the different allocation pages here, it is important to know how often they repeat throughout the data files. If the corruption is in a very high-numbered page, some calculations are needed to determine whether an allocation page is affected by the corruption. This calculation is only necessary when no object ID is reported in the error messages; if the page is associated with an object, it is not an allocation page.
The formulas for determining whether a page ID belongs to an allocation page are:
- File Header: Page ID = 0 (does not repeat)
- PFS: Page ID = 1 or Page ID % 8088 = 0
- GAM: Page ID = 2 or Page ID % 511232 = 0
- SGAM: Page ID = 3 or (Page ID - 1) % 511232 = 0
- DCM: Page ID = 6 or (Page ID - 6) % 511232 = 0
- BCM: Page ID = 7 or (Page ID - 7) % 511232 = 0
The following query will calculate the page ID for you and tell you if it is an allocation page:
Declare @Page int;
Set @Page = <page ID>;
Select PageType = CASE
WHEN @Page = 0 Then 'File Header Page'
WHEN @Page = 1 Or @Page % 8088 = 0
THEN 'PFS Page'
WHEN @Page = 2 Or @Page % 511232 = 0
THEN 'GAM Page'
WHEN @Page = 3 Or (@Page - 1) % 511232 = 0
THEN 'SGAM Page'
WHEN @Page = 6 Or (@Page - 6) % 511232 = 0
THEN 'DCM Page'
WHEN @Page = 7 Or (@Page - 7) % 511232 = 0
THEN 'BCM Page'
ELSE 'Not an allocation page'
END;
Once you know that you are dealing with an allocation page, deciding on a course of action is simple. Allocation pages cannot be single-page restored (more on this process later), and they cannot be repaired by the CHECKDB repair process. You are facing a restore of the whole database. The good news is that you can back up the tail of the log file (if in the full or bulk-logged recovery model) so that you can restore with no data loss. This assumes that you have good backups to restore from. If you are backing up with the CHECKSUM option, the odds of having good backups to restore from go way up.
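As a sketch of the CHECKSUM option mentioned above (the database name and backup path are placeholders, not part of the demo):

```sql
-- Back up with CHECKSUM so existing page checksums are verified as the
-- backup is written; a damaged page fails the backup instead of hiding in it.
-- Database name and path are hypothetical examples.
Backup Database CorruptDB
To Disk = 'C:\Backups\CorruptDB.bak'
With Checksum, Init;
```

A backup taken this way can also be re-verified later with RESTORE VERIFYONLY ... WITH CHECKSUM.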
If you do not have backups to restore from, or you are in the simple recovery model and would lose all data since the last full or differential backup, your last remaining option is to BCP as much data as possible out of the database, manually recreate the database and all objects, and then reimport the data.
Corruption in Clustered Indexes and Heaps
If the corruption occurs in index ID of 0 or 1, then you are dealing with a heap or a clustered index. One of the key things to keep in mind here is that a heap or clustered index is the base underlying data. You cannot fix corruption here by rebuilding or recreating anything. You will likely hear recommendations to drop and recreate the clustered index or rebuild the clustered index or table. These actions cannot make missing data magically reappear.
If you run DBCC CHECKDB on the sample corrupt database CorruptDB (Demo_ClusteredIndexCorruption.zip (5.03 MB)), you will receive a message similar to the below message:
Object ID 245575913, index ID 1 … Page (1:298) could not be processed. See other errors for details.
The key pieces of information here are the object ID (which resolves to dbo.FactInternetSales), the index ID of 1 (a clustered index), and page 1:298. It is a single page of a clustered index.
If the corruption is in a heap or clustered index, the proper way to fix it is through restore. If the corruption is not too widespread, single-page restores are the quickest way to get a large database fully back online. If the corruption is widespread, it may be quicker to restore the full database than to perform a series of single-page restores. At that point, it’s a judgment call.
If you decide to try a single-page restore, the first thing you should do is verify your backups. You will need a full or differential backup containing a good version of the page, and you will need all transaction log backups taken since that full or differential backup, including the current tail of the log, to bring the page current with the rest of the database. The transaction log backups are absolutely required. Do not even attempt a single-page restore without all of the required backups.
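Verifying the backups can be sketched as follows; the paths are placeholders:

```sql
-- Confirm the backup files are readable and intact (paths are hypothetical).
Restore VerifyOnly From Disk = 'C:\Backups\CorruptDB.bak';
Restore VerifyOnly From Disk = 'C:\Backups\CorruptDB.trn';

-- Inspect each file's header to confirm what it actually contains:
-- backup type, first/last LSN, and finish date.
Restore HeaderOnly From Disk = 'C:\Backups\CorruptDB.bak';
```

Checking the LSN ranges in the headers is the quickest way to confirm the log backup chain is unbroken before you start.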
We identified above that the corruption in the sample database is on a single page of a clustered index. After verifying that I have the proper backups to support a single-page restore, I start the restore process. The steps I follow are:
- Put the database in RESTRICTED_USER mode
- Restore the specific page from a full or differential backup
- Restore all transaction log files since the full or differential backup using NORECOVERY
- Back up the tail of the log file
- Restore the tail of the log with NORECOVERY
- Recover the database
- Run DBCC CHECKDB to ensure corruption is gone
- Set the database back to MULTI_USER
If you are new to performing restores, this may seem daunting the first few times because the restore wizard does not support the options you need. You must perform the restores using T-SQL. The commands are actually very simple. The initial restore of the page from the full or differential backup requires specifying which page to restore. The transaction log restores are very simple because you don’t have to specify any special settings other than NORECOVERY.
Let’s walk through the restore process for the sample corrupt database:
-- Only the one page is corrupt, so let's do a page restore
-- Switch to master to restore the damaged page
Use master;
-- Set the database in restricted user mode to keep average users out
Alter Database CorruptDB Set Restricted_User With Rollback Immediate;
-- Restore the corrupt page from the good full backup
Restore Database CorruptDB
Page = '1:298'
From Disk = '<Path to full backup>\CorruptDB.bak';
-- Restore the 1st pre-existing log backup to bring the page current
-- SQL knows which transactions to apply, no need to specify any special commands
Restore Log CorruptDB
From Disk = '<Path to log backups>\CorruptDB.trn'
With NoRecovery;
-- If there were more pre-existing log backups, we would restore them in order
-- Now back up the tail of the log...
Backup Log CorruptDB
To Disk = '<Path to tail log backup>\CorruptDB_LOG_TAIL.trn';
-- Restore the tail of the log, bringing the page current
Restore Log CorruptDB
From Disk = '<Path to tail log backup>\CorruptDB_LOG_TAIL.trn'
With NoRecovery;
-- Finally, recover the database to bring it online
Restore Database CorruptDB With Recovery;
-- Recheck the database for corruption
DBCC CheckDB(CorruptDB) With All_ErrorMsgs, No_InfoMsgs, TableResults;
-- Allow users back in
Alter Database CorruptDB Set Multi_User;
If CHECKDB reports that there is no more corruption, then you know you have been successful. You quickly recovered corrupt data with no data loss and minimal downtime. If it still reports that the corruption exists or if new corruption is found, it’s time to consider doing a full restore instead of a single-page restore.
Corruption in Nonclustered Indexes
After you have dealt with several incidents of corruption, you will find yourself hoping for an index ID greater than 1 when you encounter corruption. Corruption in nonclustered indexes is the easiest form of corruption to fix, as long as the underlying heap or clustered index is not corrupted.
When the error message from DBCC CHECKDB tells us an index ID, we can tell what kind of index it is by the ID value. A heap is index ID = 0, a clustered index is index ID = 1, and a nonclustered index is index ID > 1. To fix this corruption scenario, we will need to know the names of the object and index involved.
If we run DBCC CHECKDB on the sample corrupt database (CorruptionDemo_AdventureWorksDW2012.zip (12.22 MB)), it will report an object ID = 341576255 and an index ID = 2. We can query sys.indexes to resolve both of these values to names:
Select Object_Name(object_id) As TableName,
name As IndexName
From sys.indexes
Where object_id = 341576255;
Fixing this form of corruption is simple. You need to recreate the index. Unfortunately, you cannot rebuild the index because the rebuild process uses the existing index to build the new one. The same is true for creating the index with the DROP EXISTING option. You have two options for this. You can drop the existing index and then recreate it new or you can disable the existing index and rebuild it. When you rebuild a disabled index, it uses the underlying data rather than the existing index to rebuild it.
Fixing the sample corrupt database is very simple:
-- Drop and create
Drop Index dbo.FactResellerSales.IX_FactResellerSales_CurrencyKey;
Create Index IX_FactResellerSales_CurrencyKey On dbo.FactResellerSales(CurrencyKey);
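The disable-and-rebuild alternative described above can be sketched like this, using the same demo table and index names:

```sql
-- Disable, then rebuild. Rebuilding a disabled index reads the underlying
-- table data rather than the (corrupt) index pages.
Alter Index IX_FactResellerSales_CurrencyKey On dbo.FactResellerSales Disable;
Alter Index IX_FactResellerSales_CurrencyKey On dbo.FactResellerSales Rebuild;
```

Disabling first keeps the index definition in the metadata, so you do not need to re-script key columns, included columns, or index options.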
Chain Linkage Problems
The next integrity issue is a metadata corruption issue. It is possible in certain situations for the links from one page to another to get corrupted. All pages in an object or index have pointers to the previous and next pages in the object or index. If these links point to the wrong page, it can result in pages getting skipped and can even cause an infinite loop in a scan. One way this can occur is if the server crashes in the middle of pages getting flushed to disk. If some pages are updated and others are not, some of the pages may point to the wrong page.
It is tempting to attempt to perform a repair of the table with the invalid links. The repair option here fixes linkage problems by deleting the bad pages and making sure everything that remains has valid links. In other words, data loss. Do not try this option unless it is your last resort or if you don’t care about data loss.
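For completeness, the repair path warned about above looks like the following sketch; treat it strictly as a last resort, because it deletes the damaged pages:

```sql
-- LAST RESORT ONLY: repair removes the pages with bad linkage, i.e. data loss.
-- Repair requires single-user mode.
Alter Database CorruptDB Set Single_User With Rollback Immediate;
DBCC CheckDB(CorruptDB, REPAIR_ALLOW_DATA_LOSS);
Alter Database CorruptDB Set Multi_User;
```

Always re-run a plain DBCC CHECKDB afterward and reconcile the data that was removed.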
There are a couple of tactics you can take with this issue. If the issue is with the base table (heap or clustered index), you can try to bcp the data out of the table, truncate or drop and recreate the table, and then reimport the data. This should get rid of all of the pages with linkage problems and leave you with a clean table, if that was your only problem. If the issue is with a nonclustered index, just drop and recreate the index, or disable and rebuild it.
If the linkage issues are coupled with other forms of corruption, I will often opt to go straight to a full restore of the database. Trying to fix multiple forms of corruption can be problematic as one form of corruption can sometimes prevent the preferred fix for another form of corruption. For example, if the linkage problem is with a nonclustered index and the clustered index also has corruption, then the clustered index corruption would prevent us from recreating the nonclustered index.
In short, the best options are to reload or recreate the table or index.
Data Purity Issues
Data purity issues are when the value stored in a column is out of range for the data type. This is a rare error for databases created in SQL Server 2005 or newer. The most common occurrence of this integrity issue is in databases that were originally created in SQL Server 2000 or older, where it was a lot easier for invalid values to end up in columns. Datetime, decimal, and approximate data types like float are affected by this issue.
When you upgrade a database to SQL Server 2005 or SQL Server 2008 from an earlier version of SQL Server (or one that was upgraded in the past), you should run DBCC CHECKDB with the DATA_PURITY option. This option validates that the stored data is correct for its data type. If it finds any data purity errors, you must fix the errors manually and re-execute DBCC CHECKDB. Once you have discovered data purity issues, you must clean the data until you get a successful CHECKDB result.
Once you get a clean result with data purity, the data purity checks will be performed automatically every time DBCC CHECKDB runs. Until you get the clean manual result, you must add the DATA_PURITY option to the command to perform the checks.
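Running the check on an upgraded database can be sketched as follows; the database name is a placeholder:

```sql
-- Full integrity checks plus column-value validation on an upgraded database.
-- The database name is a hypothetical example.
DBCC CheckDB(AdventureWorksDW) With Data_Purity, All_ErrorMsgs, No_InfoMsgs;
```

After this completes cleanly once, the DATA_PURITY option is no longer needed; the purity checks run by default from then on.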
Let’s assume that you have a SQL Server 2000 database that you just restored on a SQL Server 2008 instance. The database has been upgraded, and now you need to run the data purity checks to make sure the data is correct. If it finds out-of-range data, it will report error 2570:
Msg 2570, Level 16, State 3, Line 1
Page (1:279), slot 1 in object ID 341576255, index ID 1 … (type "In-row data"). Column "TaxRate" value is out of range for data type "float". Update column to a legal value.
How do you fix these errors then? You have to identify the records with the invalid data and update the out-of-range data with valid values. You may have to inspect the data closely, but once you identify the data and update it, you should be able to complete the CHECKDB with no problems. It can be a tedious and frustrating process to find and fix all of the bad data, but it really is your only option other than deleting the data. Better to fix the invalid values now than to have user queries fail because an implicit conversion fails.
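Locating and fixing the offending rows can be sketched as follows. The table and column names come from the error above, but the predicate and replacement value are hypothetical examples; truly invalid bit patterns may force you to narrow down the bad rows by range, key by key:

```sql
-- Find rows whose TaxRate falls outside the values the application considers
-- legal (the 0-100 range here is a hypothetical business rule).
Select SalesOrderNumber, TaxRate
From dbo.FactResellerSales
Where TaxRate < 0 Or TaxRate > 100;

-- Update the identified rows to a legal value (0 is just an example).
Update dbo.FactResellerSales
Set TaxRate = 0
Where TaxRate < 0 Or TaxRate > 100;
```

Once the updates are done, re-run DBCC CHECKDB with DATA_PURITY to confirm the errors are gone.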