On the exorcism of Ghost Records

There was a question earlier on one of the SQL forums as to whether or not Ghost Cleanup overwrote the deleted rows when it ran so as to ensure no one could read that data again.

Now I could just reference Paul Randal‘s blog post where he described Ghost Cleanup in depth and leave it there, but where would the fun be in that? Smile

So, now that everyone’s read Paul’s blog post, time for some fun with undocumented commands to prove (again) what we already know (and what Paul’s already shown).

Setup code:

CREATE TABLE TestingCleanup (
ID INT IDENTITY PRIMARY KEY,
Description VARCHAR(20),
Filler CHAR(50) DEFAULT ''
);
GO

INSERT INTO TestingCleanup (Description)
VALUES ('One'), ('Two'), ('Three'), ('Four'), ('Five'), ('Six'), ('Seven'), ('Eight'), ('Nine'), ('Ten')

-- Find the page number that the table is on

SELECT OBJECT_ID('TestingCleanup') -- 1399012065
DBCC IND(11,1399012065,1)
-- The IAM is 1:309 and the data page is 1:308

Now I’m going to delete half the table and then go and look at the page. The ghost cleanup can’t run until the transaction commits, so running this within a transaction gives a chance to go and look at how things work.

BEGIN TRANSACTION
DELETE FROM TestingCleanup WHERE ID%2 = 0 -- delete the even rows
SELECT ID, Description FROM TestingCleanup -- 5 rows
DBCC TRACEON (3604)
DBCC PAGE(11,1,310,1)
DBCC TRACEOFF (3604)
COMMIT TRANSACTION

The delete removes all the rows with even identity values and the select returns only 5 rows, as expected

Dump type 1 for DBCC Page gives the header and then each row separately in binary. I’m editing out uninteresting parts of the output to keep things manageable.

Page header:

m_pageId = (1:308)                   m_headerVersion = 1                  m_type = 1
Metadata: ObjectId = 1399012065      m_prevPage = (0:0)                   m_nextPage = (0:0)
m_ghostRecCnt = 5
m_tornBits = 0

Five ghosted records (m_ghostRecCnt = 5), corresponding to the 5 deleted rows. Now for the rows (I’m only going to show the first two)

Slot 0, Offset 0x60, Length 68, DumpStyle BYTE

Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS
Record Size = 68
Memory Dump @0x000000000FD4A060

0000000000000000:   30003a00 01000000 20202020 20202020 †0.:.....
0000000000000010:   20202020 20202020 20202020 20202020 †
0000000000000020:   20202020 20202020 20202020 20202020 †
0000000000000030:   20202020 20202020 20200300 00010044 †          .....D
0000000000000040:   004f6e65 ††††††††††††††††††††††††††††.One

Slot 1, Offset 0xa4, Length 68, DumpStyle BYTE

Record Type = GHOST_DATA_RECORD      Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS
Record Size = 68
Memory Dump @0x000000000FD4A0A4

0000000000000000:   3c003a00 02000000 20202020 20202020 †<.:.....
0000000000000010:   20202020 20202020 20202020 20202020 †
0000000000000020:   20202020 20202020 20202020 20202020 †
0000000000000030:   20202020 20202020 20200300 00010044 †          .....D
0000000000000040:   0054776f ††††††††††††††††††††††††††††.Two

The first row, with an ID of 1 and Description of ‘One’, is a Primary Record. The second row, with an ID of 2 and Description of ‘Two’ is a Ghost Data Record. Deleted but not removed from the rows on the page. The same goes for the rest of the rows, the ones that were deleted (even values) are Ghost Data Records, the other are Primary Records. (aside, that begs the question, if there are Primary Data Records, are there Secondary or Tertiary? If so, where?)

Then if we look at the bottom of the DBCC Page output there’s the slot array (the offset of the rows on the page) there’s only 5 entries in there, despite there still being 10 displayed in the record section of DBCC Page. That’s because 5 of the aren’t real rows any longer.

Now let’s exorcise those ghosts.

DBCC ForceGhostCleanup -- Undocumented. Do not use in production

DBCC TRACEON (3604)
DBCC PAGE(11,1,310,1)
DBCC TRACEOFF (3604)

Now when we look at the header, the ghostRecCnt is 0, the ghosts are gone and DBCC Page only returns the 5 remaining rows thusly

</code>Slot 0, Offset 0x60, Length 68, DumpStyle BYTE

Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS
Record Size = 68
Memory Dump @0x000000000B8DA060

0000000000000000:   30003a00 01000000 20202020 20202020 †0.:.....
0000000000000010:   20202020 20202020 20202020 20202020 †
0000000000000020:   20202020 20202020 20202020 20202020 †
0000000000000030:   20202020 20202020 20200300 00010044 †          .....D
0000000000000040:   004f6e65 ††††††††††††††††††††††††††††.One

Slot 1, Offset 0xe8, Length 70, DumpStyle BYTE

Record Type = PRIMARY_RECORD         Record Attributes =  NULL_BITMAP VARIABLE_COLUMNS
Record Size = 70
Memory Dump @0x000000000B8DA0E8

0000000000000000:   30003a00 03000000 20202020 20202020 †0.:.....
0000000000000010:   20202020 20202020 20202020 20202020 †
0000000000000020:   20202020 20202020 20202020 20202020 †
0000000000000030:   20202020 20202020 20200300 00010046 †          .....F
0000000000000040:   00546872 6565††††††††††††††††††††††††.Three          

So Two’s gone. Or has is?

DBCC Page with dump type 2 dumps the entire page out without any interpretation of rows (the header is still interpreted, but there’s nothing new there hence I’m omitting it). So…

DBCC TRACEON (3604)
DBCC PAGE(11,1,310,2)
DBCC TRACEOFF (3604)
</code>DATA:
Memory Dump @0x000000000D6EA000

<snip>
000000000D6EA070:   20202020 20202020 20202020 20202020 †
000000000D6EA080:   20202020 20202020 20202020 20202020 †
000000000D6EA090:   20202020 20202020 20200300 00010044 †          .....D
000000000D6EA0A0:   004f6e65 3c003a00 02000000 20202020 †.One<.:.....
000000000D6EA0B0:   20202020 20202020 20202020 20202020 †
000000000D6EA0C0:   20202020 20202020 20202020 20202020 †
000000000D6EA0D0:   20202020 20202020 20202020 20200300 †              ..
000000000D6EA0E0:   00010044 0054776f 30003a00 03000000 †...D.Two0.:.....
000000000D6EA0F0:   20202020 20202020 20202020 20202020 †
000000000D6EA100:   20202020 20202020 20202020 20202020 †
000000000D6EA110:   20202020 20202020 20202020 20202020 †
000000000D6EA120:   20200300 00010046 00546872 65653c00 †  .....F.Three<.
<snip>

The even-numbered data is still there. The rows are deleted, the page has only 5 rows on it and it requires either a raw binary dump of the page or some work with a hex editor to see them, but the data that was stored in the now deleted rows is still there and will be there until its overwritten by new rows from somewhere.

Does Ghost Cleanup overwrite old data to prevent people reading it later? No, most certainly not.  Now it’s not something any user can do. Reading the data file requires permissions to either stop SQL, detach the DB or take the DB offline (or a hex editor that ignores file locks) and, unless a hex editor is standard on servers, permission to install software (or a hex editor that doesn’t require installation) and DBCC Page requires sysadmin rights. It’s still something to bear in mind if you’re ever working with confidential or classified data that has to be irretrievable after being deleted. It’s far from trivial to ensure that in SQL, especially if working with variable-length columns (where an update might not overwrite the old data if a page split/forwarded record results).

5 Comments

  1. Michael Villegas

    Greate post Gail, it’s really informative. I wonder if it is possible to restore deleted data by mistake from the ghosted records. Have you ever tried that?

    Reply
  2. Gail (Post author)

    I suppose you could, but you’d have to check each page of the table and manually create the insert statements to put the data back. Past a few pages it becomes fairly infeasible.

    Reply
  3. Meher

    Great Post Gail. Going indepth clarifies lot of things.

    Thanks
    Meher

    Reply
  4. Mark

    FoxPro was much more forgiving. You could simply issue a command to undelete all of the table’s records!

    Reply
  5. Pingback: SFTW - SQL Server Links, News and Community Stuff This Week

Leave a Comment

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.