On the exorcism of Ghost Records

There was a question earlier on one of the SQL forums as to whether or not Ghost Cleanup overwrote the deleted rows when it ran so as to ensure no one could read that data again.

Now I could just reference Paul Randal‘s blog post where he described Ghost Cleanup in depth and leave it there, but where would the fun be in that? Smile

So, now that everyone’s read Paul’s blog post, time for some fun with undocumented commands to prove (again) what we already know (and what Paul’s already shown).

Setup code:

01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
CREATE TABLE TestingCleanup (
ID INT IDENTITY PRIMARY KEY,
Description VARCHAR(20),
Filler CHAR(50) DEFAULT ''
);
GO
 
INSERT INTO TestingCleanup (Description)
VALUES ('One'), ('Two'), ('Three'), ('Four'), ('Five'), ('Six'), ('Seven'), ('Eight'), ('Nine'), ('Ten')
 
-- Find the page number that the table is on
 
SELECT OBJECT_ID('TestingCleanup') -- 1399012065
DBCC IND(11,1399012065,1)
-- The IAM is 1:309 and the data page is 1:308

Now I’m going to delete half the table and then go and look at the page. The ghost cleanup can’t run until the transaction commits, so running this within a transaction gives a chance to go and look at how things work.

1
2
3
4
5
6
7
BEGIN TRANSACTION
DELETE FROM TestingCleanup WHERE ID%2 = 0 -- delete the even rows
SELECT ID, Description FROM TestingCleanup -- 5 rows
DBCC TRACEON (3604)
DBCC PAGE(11,1,310,1)
DBCC TRACEOFF (3604)
COMMIT TRANSACTION

The delete removes all the rows with even identity values and the select returns only 5 rows, as expected

Dump type 1 for DBCC Page gives the header and then each row separately in binary. I’m editing out uninteresting parts of the output to keep things manageable.

1
2
3
4
5
6
Page header:
 
m_pageId = (1:308)                   m_headerVersion = 1                  m_type = 1
Metadata: ObjectId = 1399012065      m_prevPage = (0:0)                   m_nextPage = (0:0)
m_ghostRecCnt = 5
m_tornBits = 0

Five ghosted records (m_ghostRecCnt = 5), corresponding to the 5 deleted rows. Now for the rows (I’m only going to show the first two)

01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
19
20
21
22
23
Slot 0, Offset 0x60, Length 68, DumpStyle BYTE
 
Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS
Record Size = 68
Memory Dump @0x000000000FD4A060
 
0000000000000000:   30003a00 01000000 20202020 20202020 †0.:.....
0000000000000010:   20202020 20202020 20202020 20202020 †
0000000000000020:   20202020 20202020 20202020 20202020 †
0000000000000030:   20202020 20202020 20200300 00010044 †          .....D
0000000000000040:   004f6e65 ††††††††††††††††††††††††††††.One
 
Slot 1, Offset 0xa4, Length 68, DumpStyle BYTE
 
Record Type = GHOST_DATA_RECORD      Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS
Record Size = 68
Memory Dump @0x000000000FD4A0A4
 
0000000000000000:   3c003a00 02000000 20202020 20202020 †<.:.....
0000000000000010:   20202020 20202020 20202020 20202020 †
0000000000000020:   20202020 20202020 20202020 20202020 †
0000000000000030:   20202020 20202020 20200300 00010044 †          .....D
0000000000000040:   0054776f ††††††††††††††††††††††††††††.Two

The first row, with an ID of 1 and Description of ‘One’, is a Primary Record. The second row, with an ID of 2 and Description of ‘Two’ is a Ghost Data Record. Deleted but not removed from the rows on the page. The same goes for the rest of the rows, the ones that were deleted (even values) are Ghost Data Records, the other are Primary Records. (aside, that begs the question, if there are Primary Data Records, are there Secondary or Tertiary? If so, where?)

Then if we look at the bottom of the DBCC Page output there’s the slot array (the offset of the rows on the page) there’s only 5 entries in there, despite there still being 10 displayed in the record section of DBCC Page. That’s because 5 of the aren’t real rows any longer.

Now let’s exorcise those ghosts.

1
2
3
4
5
DBCC ForceGhostCleanup -- Undocumented. Do not use in production
 
DBCC TRACEON (3604)
DBCC PAGE(11,1,310,1)
DBCC TRACEOFF (3604)

Now when we look at the header, the ghostRecCnt is 0, the ghosts are gone and DBCC Page only returns the 5 remaining rows thusly

01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
18
19
20
21
22
23
</code>Slot 0, Offset 0x60, Length 68, DumpStyle BYTE
 
Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS
Record Size = 68
Memory Dump @0x000000000B8DA060
 
0000000000000000:   30003a00 01000000 20202020 20202020 †0.:.....
0000000000000010:   20202020 20202020 20202020 20202020 †
0000000000000020:   20202020 20202020 20202020 20202020 †
0000000000000030:   20202020 20202020 20200300 00010044 †          .....D
0000000000000040:   004f6e65 ††††††††††††††††††††††††††††.One
 
Slot 1, Offset 0xe8, Length 70, DumpStyle BYTE
 
Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS
Record Size = 70
Memory Dump @0x000000000B8DA0E8
 
0000000000000000:   30003a00 03000000 20202020 20202020 †0.:.....
0000000000000010:   20202020 20202020 20202020 20202020 †
0000000000000020:   20202020 20202020 20202020 20202020 †
0000000000000030:   20202020 20202020 20200300 00010046 †          .....F
0000000000000040:   00546872 6565††††††††††††††††††††††††.Three         

So Two’s gone. Or has is?

DBCC Page with dump type 2 dumps the entire page out without any interpretation of rows (the header is still interpreted, but there’s nothing new there hence I’m omitting it). So…

1
2
3
DBCC TRACEON (3604)
DBCC PAGE(11,1,310,2)
DBCC TRACEOFF (3604)
01
02
03
04
05
06
07
08
09
10
11
12
13
14
15
16
17
</code>DATA:
Memory Dump @0x000000000D6EA000
 
<snip>
000000000D6EA070:   20202020 20202020 20202020 20202020 †
000000000D6EA080:   20202020 20202020 20202020 20202020 †
000000000D6EA090:   20202020 20202020 20200300 00010044 †          .....D
000000000D6EA0A0:   004f6e65 3c003a00 02000000 20202020 †.One<.:.....
000000000D6EA0B0:   20202020 20202020 20202020 20202020 †
000000000D6EA0C0:   20202020 20202020 20202020 20202020 †
000000000D6EA0D0:   20202020 20202020 20202020 20200300 †              ..
000000000D6EA0E0:   00010044 0054776f 30003a00 03000000 †...D.Two0.:.....
000000000D6EA0F0:   20202020 20202020 20202020 20202020 †
000000000D6EA100:   20202020 20202020 20202020 20202020 †
000000000D6EA110:   20202020 20202020 20202020 20202020 †
000000000D6EA120:   20200300 00010046 00546872 65653c00 †  .....F.Three<.
<snip>

The even-numbered data is still there. The rows are deleted, the page has only 5 rows on it and it requires either a raw binary dump of the page or some work with a hex editor to see them, but the data that was stored in the now deleted rows is still there and will be there until its overwritten by new rows from somewhere.

Does Ghost Cleanup overwrite old data to prevent people reading it later? No, most certainly not.  Now it’s not something any user can do. Reading the data file requires permissions to either stop SQL, detach the DB or take the DB offline (or a hex editor that ignores file locks) and, unless a hex editor is standard on servers, permission to install software (or a hex editor that doesn’t require installation) and DBCC Page requires sysadmin rights. It’s still something to bear in mind if you’re ever working with confidential or classified data that has to be irretrievable after being deleted. It’s far from trivial to ensure that in SQL, especially if working with variable-length columns (where an update might not overwrite the old data if a page split/forwarded record results).

5 Comments

  1. Michael Villegas

    Greate post Gail, it’s really informative. I wonder if it is possible to restore deleted data by mistake from the ghosted records. Have you ever tried that?

    Reply
  2. Gail (Post author)

    I suppose you could, but you’d have to check each page of the table and manually create the insert statements to put the data back. Past a few pages it becomes fairly infeasible.

    Reply
  3. Meher

    Great Post Gail. Going indepth clarifies lot of things.

    Thanks
    Meher

    Reply
  4. Mark

    FoxPro was much more forgiving. You could simply issue a command to undelete all of the table’s records!

    Reply
  5. Pingback: SFTW - SQL Server Links, News and Community Stuff This Week

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.