Channel: Aspose.Email Product Family

Corrupt PST files

PST files delivered to a customer appear to be corrupt. Attached to this problem report are two screenshots: the Microsoft Inbox Repair Tool showing a discrepancy between its item count and the number of items the PST should contain, and the PST open in Outlook with the correct item count but no visible items. Also attached is the tool's log file. The log file offers some insight into the issue, which is likely related to invalid pointers in the PST's B-tree structure. Two findings support this:

The Item Count and Missing Items

The two screenshots indicate that there are 4712 items in the PST, but when it is opened in Outlook no items are shown, although the item count is correct. We assume the PST is not zero bytes; otherwise the customer would have reported that. The fact that the repair tool recovered some items supports this assumption: no data could be recovered if it were not present.

The Repair Log

The Microsoft Inbox Repair Tool reports several issues while validating the PST's internal B-tree structure:

**Beginning NDB recovery
 
  **Attempting to open database
 
  **Attempting to validate header
 
  **Attempting to validate AMap
 
  **Attempting to validate BBT
 
    !!Invalid BBT page (bid=36D46, ib=86526976):
      btkeyMin mismatch (read 146E8, parent 146DC)
 
    !!Invalid BBT page (bid=3E87C, ib=95311360):
      btkeyMin mismatch (read 17058, parent 17050)
 
    !!Invalid BBT page (bid=6D78C, ib=146480128):
      btkeyMin mismatch (read 24A38, parent 24A30)
 
    !!Invalid BBT page (bid=CE430, ib=277851648):
      btkeyMin mismatch (read 41DD0, parent 41DC4)
 
    !!Invalid BBT page (bid=D5B9A, ib=287756288):
      btkeyMin mismatch (read 43F94, parent 43F64)
 
    !!Invalid BBT page (bid=105284, ib=337044992):
      btkeyMin mismatch (read 50300, parent 502D0)
 
    !!Invalid BBT page (bid=15DA28, ib=433580544):
      btkeyMin mismatch (read 6AC54, parent 6AC4C)
 
    !!Invalid BBT page (bid=1AE90C, ib=531571200):
      btkeyMin mismatch (read 82208, parent 82200)
 
    !!Invalid BBT page (bid=1C7290, ib=553130496):
      btkeyMin mismatch (read 88C2C, parent 88C24)
 
    !!Invalid BBT page (bid=20F316, ib=636187648):
      btkeyMin mismatch (read 9D5BC, parent 9D5B4)
 
    !!Invalid BBT page (bid=23741C, ib=678841856):
      btkeyMin mismatch (read A7BF8, parent A7BC8)
 
    !!Invalid BBT page (bid=26B8BA, ib=738278400):
      btkeyMin mismatch (read B7DE8, parent B7DE0)
 
    !!Invalid BBT page (bid=292B2C, ib=794512384):
      btkeyMin mismatch (read C3950, parent C3948)
 
  **Attempting to rebuild BBT
 
    **Attempting to scavenge for blocks
 
  **Attempting to validate NBT
 
  **Attempting to validate BBT refcounts
 
    ??Couldn't find BBT entry in the RBT (146DC)
    ??Couldn't find BBT entry in the RBT (146E6)
    ??Couldn't find BBT entry in the RBT (17050)
    ??Couldn't find BBT entry in the RBT (24A30)
    ??Couldn't find BBT entry in the RBT (24A36)
    ??Couldn't find BBT entry in the RBT (41DCE)
    ??Couldn't find BBT entry in the RBT (43F64)
    ??Couldn't find BBT entry in the RBT (43F7C)
    ??Couldn't find BBT entry in the RBT (44060)
    ??Couldn't find BBT entry in the RBT (4406C)
    ??Couldn't find BBT entry in the RBT (440C4)
    ??Couldn't find BBT entry in the RBT (440CC)
    ??Couldn't find BBT entry in the RBT (502D0)
    ??Couldn't find BBT entry in the RBT (50348)
    ??Couldn't find BBT entry in the RBT (50350)
    ??Couldn't find BBT entry in the RBT (50358)
    ??Couldn't find BBT entry in the RBT (503F8)
    ??Couldn't find BBT entry in the RBT (503FC)
    ??Couldn't find BBT entry in the RBT (5040E)
    ??Couldn't find BBT entry in the RBT (6AC52)
    ??Couldn't find BBT entry in the RBT (6ADE6)
    ??Couldn't find BBT entry in the RBT (6AE04)
    ??Couldn't find BBT entry in the RBT (6AE10)
    ??Couldn't find BBT entry in the RBT (88C24)
    ??Couldn't find BBT entry in the RBT (88C2A)
    ??Couldn't find BBT entry in the RBT (88D90)
    ??Couldn't find BBT entry in the RBT (88D96)
    ??Couldn't find BBT entry in the RBT (88DB4)
    ??Couldn't find BBT entry in the RBT (9D5B4)
    ??Couldn't find BBT entry in the RBT (A7BC8)
    ??Couldn't find BBT entry in the RBT (A7BDE)
    ??Couldn't find BBT entry in the RBT (A7BE4)
    ??Couldn't find BBT entry in the RBT (A7BF6)
    ??Couldn't find BBT entry in the RBT (A7C4A)
    ??Couldn't find BBT entry in the RBT (A7C60)
    ??Couldn't find BBT entry in the RBT (A7C64)
    ??Couldn't find BBT entry in the RBT (A7C68)
    ??Couldn't find BBT entry in the RBT (A7C6C)
    ??Couldn't find BBT entry in the RBT (A7C72)
    ??Couldn't find BBT entry in the RBT (A7DB0)
    ??Couldn't find BBT entry in the RBT (A7DDC)
    ??Couldn't find BBT entry in the RBT (A7DEE)
    ??Couldn't find BBT entry in the RBT (B7DE6)
    ??Couldn't find BBT entry in the RBT (C394E)
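
To compare the mismatched keys in a log like the one above without reading it by eye, the "btkeyMin mismatch" lines can be parsed mechanically. The following is a minimal sketch, not part of the repair tool or any SDK: `KeyMismatch` and `parse_btkey_mismatch` are hypothetical names, and the only assumption about the log is the line shape shown above, with both values in hexadecimal.

```cpp
#include <cstdint>
#include <optional>
#include <regex>
#include <string>

// The two values printed on one "btkeyMin mismatch" line of the repair log.
// Field names are illustrative; they simply follow the words used in the log.
struct KeyMismatch {
    std::uint64_t read_key;    // hex value printed after "read"
    std::uint64_t parent_key;  // hex value printed after "parent"
};

// Returns both keys if the line contains a btkeyMin mismatch, otherwise nothing.
inline std::optional<KeyMismatch> parse_btkey_mismatch(const std::string& line) {
    static const std::regex pattern(
        R"re(btkeyMin mismatch \(read ([0-9A-Fa-f]+), parent ([0-9A-Fa-f]+)\))re");
    std::smatch m;
    if (!std::regex_search(line, m, pattern)) {
        return std::nullopt;
    }
    return KeyMismatch{std::stoull(m[1].str(), nullptr, 16),
                       std::stoull(m[2].str(), nullptr, 16)};
}
```

Feeding each log line through `parse_btkey_mismatch` and subtracting `parent_key` from `read_key` gives the size of each gap, which may help show whether the mismatches follow a pattern.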

The PST file structure has been well-documented in the open source community for a long time, and Microsoft also published a PST File Format SDK several years ago. Essentially there is a header data structure followed by a B-Tree. The following documentation on the PST file format's node and leaf node structures is revealing:

Note: PST File Format and ANSI, Unicode, 32-bit, 64-bit

The PST file format differs based on whether or not the file is ANSI/Unicode and 32-bit/64-bit. The examples below are from the 32-bit Unicode PST.

32-bit Index 1 Node

01f0  itemCount       [1 byte]  0x02       in this case
01f1  maxItemCount    [1 byte]  0x29       constant
01f2  itemSize        [1 byte]  0x0c       constant
01f3  nodeLevel       [1 byte]  0x02       in this case
01f8  backPointer     [4 bytes] 0x021eb4   in this case

32-bit Index 1 Leaf Node

01f0  itemCount       [1 byte]  0x1f       in this case
01f1  maxItemCount    [1 byte]  0x29       constant
01f2  itemSize        [1 byte]  0x0c       constant
01f3  nodeLevel       [1 byte]  0x00       defines a leaf node
01f8  backPointer     [4 bytes] 0x01675a   in this case

In the PST file format the nodes actually have backpointers that reference the parent node to which they belong. This is atypical of a B-tree, which is generally unidirectional (parent to child).
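
The fields listed above can be read out of a page buffer roughly as follows. This is a minimal sketch under stated assumptions: pages are 512 bytes, multi-byte values are little-endian, and the offsets are exactly those in the listings above. `IndexPageTrailer`, `read_trailer` and `kPageSize` are hypothetical names, not taken from the PST File Format SDK.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Assumption: an index page is 512 bytes, so the fields at 0x1f0..0x1fb fit.
constexpr std::size_t kPageSize = 512;

// The five fields from the listings above, at the offsets shown there.
struct IndexPageTrailer {
    std::uint8_t item_count;      // offset 0x1f0
    std::uint8_t max_item_count;  // offset 0x1f1
    std::uint8_t item_size;       // offset 0x1f2
    std::uint8_t node_level;      // offset 0x1f3, 0x00 marks a leaf node
    std::uint32_t back_pointer;   // offset 0x1f8, 4 bytes

    bool is_leaf() const { return node_level == 0; }
};

// Copies the fields out of a raw page buffer.
inline IndexPageTrailer read_trailer(const std::array<std::uint8_t, kPageSize>& page) {
    IndexPageTrailer t{};
    t.item_count = page[0x1f0];
    t.max_item_count = page[0x1f1];
    t.item_size = page[0x1f2];
    t.node_level = page[0x1f3];
    // Assemble the 4-byte backPointer, least significant byte first.
    for (std::size_t i = 0; i < 4; ++i) {
        t.back_pointer |= static_cast<std::uint32_t>(page[0x1f8 + i]) << (8 * i);
    }
    return t;
}
```

Reading the trailer of each page this way would make it possible to list the `back_pointer` values a generated PST actually contains and compare them with those of a file written by Outlook.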

However, from this information it would be possible to construct a B-tree structure when reading the nodes from disk. If the itemCount of a node is correct, but the backPointers of the node's child nodes (serialized to disk as the documented leaf nodes) were incorrect, the B-tree structure may not be constructed correctly when the PST file is deserialized.

This raises some questions:

  • How would that happen?
  • Why are the item counts correct?
  • Why does the repair tool work?

The educated guesses to these questions are as follows:

  1. Why would the backpointers be invalid?
    • Aspose isn't serializing the B-tree structure to disk correctly. As Microsoft documents in the PST File Format SDK, there are four layers to the PST file format:

      Layer        Description
      PST Layer    The friendly SDK
      LTP Layer    The abstracted PST data structures
      NDB Layer    The lower-level PST data structures
      Disk Layer   Serializing the PST to disk

      If Aspose isn't correctly serializing the data structures to disk, the backpointers may never be set and written to the output stream.

  2. Why are the item counts correct?
    • It's probable that Aspose is correctly emitting valid item counts for nodes.
  3. Why does the repair tool work?
    • If the backpointers are not correct, then when the B-tree is deserialized as the PST is opened, its nodes will not have their keys set correctly, and lookups will fail. This goes back to the log above:

    **Beginning NDB recovery
     
      **Attempting to open database
     
      **Attempting to validate header
     
      **Attempting to validate AMap
     
      **Attempting to validate BBT
     
        !!Invalid BBT page (bid=36D46, ib=86526976):
          btkeyMin mismatch (read 146E8, parent 146DC)
     
        !!Invalid BBT page (bid=3E87C, ib=95311360):
          btkeyMin mismatch (read 17058, parent 17050)

    Looking at the PST File Format SDK headers (which the Microsoft Inbox Repair Tool likely uses) for the files below, we can see the following:

    • ndb/database.h
    • ndb/database_iface.h
    • util/primitives.h
    • util/btree.h

      ndb/database.h

      template<typename T>
      inline std::tr1::shared_ptr<pstsdk::bbt_page> pstsdk::database_impl<T>::read_bbt_page(const page_info& pi)

      ndb/database_iface.h

      typedef bt_page<block_id, block_info> bbt_page;

      util/primitives.h

      typedef ulonglong block_id;
      typedef block_id page_id;

      util/btree.h

      //! \brief Returns the key at the specified position
      //!
      //! This is specific to this btree_node, not the entire tree
      //! \param[in] pos The position to retrieve the key for
      //! \returns The key at the requested position
      virtual const K& get_key(uint pos) const = 0;

      Thus, if the backpointers are not correct, the keys that form the basis of the B-tree cannot be constructed. However, the B-tree itself could still be laid out in contiguous memory, which is why I think the item counts are accurate and the repair tool works. The item count is taken from the node that represents the Inbox. Beneath that node, however, the backpointers, and thus the tree, are invalid. Because the data still exists in the correct layout on disk, the repair tool can walk the data, recreate the backpointers, and so repair the tree and the PST.
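
The condition the log prints as "btkeyMin mismatch (read X, parent Y)" can be pictured with a small check. This is only a sketch of my reading of the log, assuming the tool compares the first key read from a child page against the key the parent entry holds for that page. It does not use the PST File Format SDK, and `ParentEntry` and `find_key_min_mismatches` are hypothetical names.

```cpp
#include <cstdint>
#include <vector>

// One parent entry together with the keys read from the child page it points to.
// Both fields are hypothetical names used for illustration only.
struct ParentEntry {
    std::uint64_t key_min;                  // key the parent holds for the child page
    std::vector<std::uint64_t> child_keys;  // keys read from the child page, in order
};

// Returns the parent keys whose child page starts with a different key.
// Empty child pages are skipped because they have no first key to compare.
inline std::vector<std::uint64_t> find_key_min_mismatches(
        const std::vector<ParentEntry>& entries) {
    std::vector<std::uint64_t> bad;
    for (const ParentEntry& e : entries) {
        if (!e.child_keys.empty() && e.child_keys.front() != e.key_min) {
            bad.push_back(e.key_min);
        }
    }
    return bad;
}
```

Under this reading, a parent entry holding 146DC whose child page begins at 146E8 would be reported, matching the first mismatch in the log, while the child page's contents remain intact and available for the repair tool to re-index.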


Any assistance Aspose can provide in helping us understand why the PST files are corrupt would be very helpful. Thank you!
