- The attached Microsoft Inbox Repair Tool log file.
- The discrepancy in item count in the attached screenshots.
The Item Count and Missing Items
The two screenshots indicate that there are 4712 items in the PST. When the file is opened in Outlook, the item count is correct, but no items are shown. We assume the PST is not zero bytes, since the customer would have reported that. The fact that the repair tool recovered some items also supports this assumption: no data could be recovered if none were present.
The Repair Log
The Microsoft Inbox Repair Tool reports several issues while validating the PST's internal B-tree structure:
**Beginning NDB recovery
**Attempting to open database
**Attempting to validate header
**Attempting to validate AMap
**Attempting to validate BBT
!!Invalid BBT page (bid=36D46, ib=86526976):
btkeyMin mismatch (read 146E8, parent 146DC)
!!Invalid BBT page (bid=3E87C, ib=95311360):
btkeyMin mismatch (read 17058, parent 17050)
!!Invalid BBT page (bid=6D78C, ib=146480128):
btkeyMin mismatch (read 24A38, parent 24A30)
!!Invalid BBT page (bid=CE430, ib=277851648):
btkeyMin mismatch (read 41DD0, parent 41DC4)
!!Invalid BBT page (bid=D5B9A, ib=287756288):
btkeyMin mismatch (read 43F94, parent 43F64)
!!Invalid BBT page (bid=105284, ib=337044992):
btkeyMin mismatch (read 50300, parent 502D0)
!!Invalid BBT page (bid=15DA28, ib=433580544):
btkeyMin mismatch (read 6AC54, parent 6AC4C)
!!Invalid BBT page (bid=1AE90C, ib=531571200):
btkeyMin mismatch (read 82208, parent 82200)
!!Invalid BBT page (bid=1C7290, ib=553130496):
btkeyMin mismatch (read 88C2C, parent 88C24)
!!Invalid BBT page (bid=20F316, ib=636187648):
btkeyMin mismatch (read 9D5BC, parent 9D5B4)
!!Invalid BBT page (bid=23741C, ib=678841856):
btkeyMin mismatch (read A7BF8, parent A7BC8)
!!Invalid BBT page (bid=26B8BA, ib=738278400):
btkeyMin mismatch (read B7DE8, parent B7DE0)
!!Invalid BBT page (bid=292B2C, ib=794512384):
btkeyMin mismatch (read C3950, parent C3948)
**Attempting to rebuild BBT
**Attempting to scavenge for blocks
**Attempting to validate NBT
**Attempting to validate BBT refcounts
??Couldn't find BBT entry in the RBT (146DC)
??Couldn't find BBT entry in the RBT (146E6)
??Couldn't find BBT entry in the RBT (17050)
??Couldn't find BBT entry in the RBT (24A30)
??Couldn't find BBT entry in the RBT (24A36)
??Couldn't find BBT entry in the RBT (41DCE)
??Couldn't find BBT entry in the RBT (43F64)
??Couldn't find BBT entry in the RBT (43F7C)
??Couldn't find BBT entry in the RBT (44060)
??Couldn't find BBT entry in the RBT (4406C)
??Couldn't find BBT entry in the RBT (440C4)
??Couldn't find BBT entry in the RBT (440CC)
??Couldn't find BBT entry in the RBT (502D0)
??Couldn't find BBT entry in the RBT (50348)
??Couldn't find BBT entry in the RBT (50350)
??Couldn't find BBT entry in the RBT (50358)
??Couldn't find BBT entry in the RBT (503F8)
??Couldn't find BBT entry in the RBT (503FC)
??Couldn't find BBT entry in the RBT (5040E)
??Couldn't find BBT entry in the RBT (6AC52)
??Couldn't find BBT entry in the RBT (6ADE6)
??Couldn't find BBT entry in the RBT (6AE04)
??Couldn't find BBT entry in the RBT (6AE10)
??Couldn't find BBT entry in the RBT (88C24)
??Couldn't find BBT entry in the RBT (88C2A)
??Couldn't find BBT entry in the RBT (88D90)
??Couldn't find BBT entry in the RBT (88D96)
??Couldn't find BBT entry in the RBT (88DB4)
??Couldn't find BBT entry in the RBT (9D5B4)
??Couldn't find BBT entry in the RBT (A7BC8)
??Couldn't find BBT entry in the RBT (A7BDE)
??Couldn't find BBT entry in the RBT (A7BE4)
??Couldn't find BBT entry in the RBT (A7BF6)
??Couldn't find BBT entry in the RBT (A7C4A)
??Couldn't find BBT entry in the RBT (A7C60)
??Couldn't find BBT entry in the RBT (A7C64)
??Couldn't find BBT entry in the RBT (A7C68)
??Couldn't find BBT entry in the RBT (A7C6C)
??Couldn't find BBT entry in the RBT (A7C72)
??Couldn't find BBT entry in the RBT (A7DB0)
??Couldn't find BBT entry in the RBT (A7DDC)
??Couldn't find BBT entry in the RBT (A7DEE)
??Couldn't find BBT entry in the RBT (B7DE6)
??Couldn't find BBT entry in the RBT (C394E)
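To work with a log like this programmatically, each "Invalid BBT page" line can be paired with the "btkeyMin mismatch" line that follows it. The sketch below is a hypothetical helper (`BbtMismatch` and `parse_bbt_mismatches` are names invented for illustration). It assumes the exact line wording shown above and treats `bid` and the keys as hexadecimal and `ib` as decimal, which is how the values appear in this log.

```cpp
#include <cassert>
#include <cstdint>
#include <regex>
#include <string>
#include <vector>

// One "!!Invalid BBT page" record paired with the "btkeyMin mismatch" line
// that follows it in the repair log.
struct BbtMismatch {
    std::uint64_t bid = 0;         // block ID of the page that failed validation (hex in the log)
    std::uint64_t ib = 0;          // byte offset of the page in the file (decimal in the log)
    std::uint64_t read_key = 0;    // first key actually read from the page
    std::uint64_t parent_key = 0;  // key the parent entry holds for that page
};

// Hypothetical helper: scans log lines and returns one record per invalid page.
// Assumes the exact wording shown in the log; other tool versions may differ.
std::vector<BbtMismatch> parse_bbt_mismatches(const std::vector<std::string>& lines) {
    static const std::regex page_re(
        R"re(!!Invalid BBT page \(bid=([0-9A-Fa-f]+), ib=([0-9]+)\))re");
    static const std::regex key_re(
        R"re(btkeyMin mismatch \(read ([0-9A-Fa-f]+), parent ([0-9A-Fa-f]+)\))re");

    std::vector<BbtMismatch> out;
    BbtMismatch pending;
    bool have_page = false;
    for (const std::string& line : lines) {
        std::smatch m;
        if (std::regex_search(line, m, page_re)) {
            pending = BbtMismatch{};
            pending.bid = std::stoull(m[1].str(), nullptr, 16);
            pending.ib = std::stoull(m[2].str(), nullptr, 10);
            have_page = true;
        } else if (have_page && std::regex_search(line, m, key_re)) {
            pending.read_key = std::stoull(m[1].str(), nullptr, 16);
            pending.parent_key = std::stoull(m[2].str(), nullptr, 16);
            out.push_back(pending);
            have_page = false;  // wait for the next "Invalid BBT page" line
        }
    }
    return out;
}
```

Applied to the thirteen mismatches above, it yields one record per invalid page. In every one of them, the key read from the page is larger than the key its parent holds.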
The PST file structure has been well documented in the open-source community for a long time, and Microsoft also published a PST File Format SDK several years ago. Essentially, the file is a header followed by B-trees: at the lowest level, a Node B-tree (NBT) and a Block B-tree (BBT), the same two trees the repair log validates. The following documentation on the PST file format's node and leaf node structures is revealing:
The PST file format differs depending on whether the file is ANSI or Unicode: ANSI files use 32-bit block IDs and file offsets, while Unicode files use 64-bit ones. The examples below are from the 32-bit (ANSI) layout.
32-bit Index 1 Node

| Offset | Field | Size | Value |
|---|---|---|---|
| 01f0 | itemCount | 1 byte | 0x02 in this case |
| 01f1 | maxItemCount | 1 byte | 0x29 (constant) |
| 01f2 | itemSize | 1 byte | 0x0c (constant) |
| 01f3 | nodeLevel | 1 byte | 0x02 in this case |
| 01f8 | backPointer | 4 bytes | 0x021eb4 in this case |

32-bit Index 1 Leaf Node

| Offset | Field | Size | Value |
|---|---|---|---|
| 01f0 | itemCount | 1 byte | 0x1f in this case |
| 01f1 | maxItemCount | 1 byte | 0x29 (constant) |
| 01f2 | itemSize | 1 byte | 0x0c (constant) |
| 01f3 | nodeLevel | 1 byte | 0x00 (defines a leaf node) |
| 01f8 | backPointer | 4 bytes | 0x01675a in this case |
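The footer fields in these tables can be decoded directly from a 512-byte page. This is a minimal sketch that assumes the 32-bit layout and offsets exactly as tabulated, with multi-byte values stored little-endian (as PST files store them). The names `NodeFooter`, `decode_footer` and `footer_is_plausible` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kPageSize = 512;

// Footer fields of a 32-bit index node, at the offsets listed in the tables.
struct NodeFooter {
    std::uint8_t item_count;      // 0x1f0
    std::uint8_t max_item_count;  // 0x1f1, 0x29 in the tables
    std::uint8_t item_size;       // 0x1f2, 0x0c in the tables
    std::uint8_t node_level;      // 0x1f3, 0 means leaf
    std::uint32_t back_pointer;   // 0x1f8, 4 bytes
    bool is_leaf() const { return node_level == 0; }
};

// Reads a 32-bit little-endian value independently of host byte order.
std::uint32_t read_le32(const std::uint8_t* p) {
    return static_cast<std::uint32_t>(p[0]) |
           (static_cast<std::uint32_t>(p[1]) << 8) |
           (static_cast<std::uint32_t>(p[2]) << 16) |
           (static_cast<std::uint32_t>(p[3]) << 24);
}

NodeFooter decode_footer(const std::array<std::uint8_t, kPageSize>& page) {
    NodeFooter f{};
    f.item_count = page[0x1f0];
    f.max_item_count = page[0x1f1];
    f.item_size = page[0x1f2];
    f.node_level = page[0x1f3];
    f.back_pointer = read_le32(&page[0x1f8]);
    return f;
}

// Checks a reader could apply before trusting the node's entries.
bool footer_is_plausible(const NodeFooter& f) {
    return f.item_size == 0x0c && f.max_item_count == 0x29 &&
           f.item_count <= f.max_item_count;
}
```

The plausibility check only uses the constants from the tables. It cannot detect a wrong backPointer by itself, because that requires comparing the page against its parent.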
In the PST file format, each node also carries a backPointer, a value that must match the reference its parent holds for that node. This is atypical of a B-tree, where links generally run in one direction only (parent to child).
With this information, the B-tree can be reconstructed when the nodes are read from disk. If a node's itemCount is correct but the backPointers of its child nodes (serialized to disk as the documented leaf nodes) were incorrect, the B-tree may not be reconstructed correctly when the PST file is deserialized.
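This failure mode can be modeled in memory. The sketch assumes each parent entry records the backPointer value its child is expected to carry, and that child pages are located by file offset. `ChildRef`, `ChildPage` and `find_bad_links` are hypothetical names, not part of any PST library. In the example, the parent's item count (two entries) is correct, yet one child's backPointer was never written.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <vector>

// Hypothetical parent entry: the child's lowest key, the backPointer the
// child is expected to carry (assumption), and the child's file offset.
struct ChildRef {
    std::uint32_t key;
    std::uint32_t expected_back_ptr;
    std::uint32_t offset;
};

// Hypothetical child page as read from disk; only the footer value matters here.
struct ChildPage {
    std::uint32_t back_pointer;
};

// Returns the offsets of children that are missing or whose backPointer
// disagrees with the parent's reference, i.e. links a reader cannot trust.
std::vector<std::uint32_t> find_bad_links(
        const std::vector<ChildRef>& parent,
        const std::map<std::uint32_t, ChildPage>& pages_by_offset) {
    std::vector<std::uint32_t> bad;
    for (const ChildRef& ref : parent) {
        auto it = pages_by_offset.find(ref.offset);
        if (it == pages_by_offset.end() ||
            it->second.back_pointer != ref.expected_back_ptr) {
            bad.push_back(ref.offset);
        }
    }
    return bad;
}
```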
This raises some questions:
- Why would the backpointers be invalid?
- Why are the item counts correct?
- Why does the repair tool work?
My educated guesses at the answers are as follows:
- Why would the backpointers be invalid?
  - Aspose isn't serializing the B-tree structure to disk correctly. As Microsoft documents in the PST File Format SDK, the PST file format has four layers:

    | Layer | Description |
    |---|---|
    | PST Layer | The friendly SDK |
    | LTP Layer | The abstracted PST data structures |
    | NDB Layer | The lower-level PST data structures |
    | Disk Layer | Serializing the PST to disk |

    If Aspose isn't correctly serializing the data structures to disk, the backpointers may never be set and written to the output stream.
- Why are the item counts correct?
  - Aspose is probably emitting correct item counts for the nodes.
- Why does the repair tool work?
  - If the backpointers are not correct, then when the PST is opened and the B-tree is deserialized, the B-tree's nodes will not have their keys set correctly, and lookups will fail. This goes back to the log above:
**Beginning NDB recovery
**Attempting to open database
**Attempting to validate header
**Attempting to validate AMap
**Attempting to validate BBT
!!Invalid BBT page (bid=36D46, ib=86526976):
btkeyMin mismatch (read 146E8, parent 146DC)
!!Invalid BBT page (bid=3E87C, ib=95311360):
btkeyMin mismatch (read 17058, parent 17050)
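To see why incorrect keys make lookups fail, consider a two-level tree in which the root routes a search to the last child whose key is less than or equal to the target, the usual descent rule for B-trees keyed on each child's minimum key. This is a simplified, hypothetical model (`RootEntry`, `find_leaf` and `lookup` are invented names), not the pstsdk or Outlook implementation. The corrupt root here holds a key higher than its child's real first key. The log shows the opposite direction (the key read from the page is higher than the parent's), so treat this as an illustration of the mechanism rather than a reproduction of the exact failure.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical root entry: the minimum key expected in a child, and which child.
struct RootEntry {
    std::uint32_t btkey;
    std::size_t child;
};

// Picks the last root entry whose btkey <= key. Returns nothing if the key
// sorts before every entry.
std::optional<std::size_t> find_leaf(const std::vector<RootEntry>& root, std::uint32_t key) {
    std::optional<std::size_t> chosen;
    for (const RootEntry& e : root) {
        if (e.btkey > key) break;
        chosen = e.child;
    }
    return chosen;
}

// Descends once, then searches the chosen (sorted) leaf for the key.
bool lookup(const std::vector<RootEntry>& root,
            const std::vector<std::vector<std::uint32_t>>& leaves,
            std::uint32_t key) {
    const std::optional<std::size_t> leaf = find_leaf(root, key);
    if (!leaf) return false;
    const std::vector<std::uint32_t>& keys = leaves.at(*leaf);
    return std::binary_search(keys.begin(), keys.end(), key);
}
```

With a correct root, key 0x30 routes to the second leaf and is found. If the root instead records 0x38 for that leaf, the same search is routed to the first leaf and fails, even though the entry is still on disk.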
Looking at the following PST File Format SDK headers (which the Microsoft Inbox Repair Tool likely uses), we can see the following:
- ndb/database.h
- ndb/database_iface.h
- util/primitives.h
- util/btree.h
ndb/database.h
template<typename T>
inline std::tr1::shared_ptr<pstsdk::bbt_page> pstsdk::database_impl<T>::read_bbt_page(const page_info& pi)
ndb/database_iface.h
typedef bt_page<block_id, block_info> bbt_page;
util/primitives.h
typedef ulonglong block_id;
typedef block_id page_id;
util/btree.h
//! \brief Returns the key at the specified position
//!
//! This is specific to this btree_node, not the entire tree
//! \param[in] pos The position to retrieve the key for
//! \returns The key at the requested position
virtual const K& get_key(uint pos) const = 0;
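Given `get_key`, the check the log calls btkeyMin can be expressed as comparing the parent's key for a child with `get_key(0)` on that child. The sketch below is a simplified stand-in for pstsdk's `btree_node`, not the real class: `simple_node`, `simple_leaf` and `btkey_min_matches` are hypothetical, `std::size_t` replaces pstsdk's `uint`, and only the `block_id` typedef mirrors the header quoted above. That the repair tool performs exactly this comparison is an assumption based on the log wording.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

using block_id = std::uint64_t;  // mirrors pstsdk's typedef ulonglong block_id

// Hypothetical, stripped-down node interface modeled on btree_node.
template <typename K>
class simple_node {
public:
    virtual ~simple_node() = default;
    virtual const K& get_key(std::size_t pos) const = 0;
    virtual std::size_t num_values() const = 0;
};

// Hypothetical leaf holding the keys read from one page.
template <typename K>
class simple_leaf : public simple_node<K> {
public:
    explicit simple_leaf(std::vector<K> keys) : keys_(std::move(keys)) {}
    const K& get_key(std::size_t pos) const override { return keys_.at(pos); }
    std::size_t num_values() const override { return keys_.size(); }
private:
    std::vector<K> keys_;
};

// The key a parent stores for a child must equal the child's first key.
template <typename K>
bool btkey_min_matches(const K& parent_key, const simple_node<K>& child) {
    return child.num_values() > 0 && child.get_key(0) == parent_key;
}
```

The usage below borrows the first mismatch from the log (read 146E8, parent 146DC); the page's other keys are made up for illustration.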
Thus, if the backpointers are not correct, the keys that form the basis of the B-tree cannot be constructed, even though the B-tree itself could still be built in contiguous memory. I think this is why the item counts are accurate and why the repair tool works. The item count is taken from the node that represents the Inbox, but beneath that node the backpointers, and therefore the tree, are invalid. Because the data still exists in the correct layout on disk, the repair tool can walk it, recreate the backpointers, and rebuild the tree, thereby repairing the PST.
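The log's "Attempting to rebuild BBT" and "Attempting to scavenge for blocks" steps fit this explanation. A bottom-up rebuild of one parent level can be sketched like this: take the leaf pages found by scanning the file, order them by their first key, and derive each parent entry from the leaf itself rather than from the stale parent. This is a hypothetical simplification (`ScavengedLeaf`, `ParentEntry` and `rebuild_parent` are invented names). The actual algorithm the Microsoft Inbox Repair Tool uses is not known.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// A leaf page found by scanning the file, in whatever order it was found.
struct ScavengedLeaf {
    std::uint32_t id;                  // identifier stored in the page footer
    std::vector<std::uint32_t> keys;   // sorted entries read from the page
};

// A rebuilt parent entry: the child's first key plus the link to the child.
struct ParentEntry {
    std::uint32_t btkey;
    std::uint32_t child_id;
};

// Rebuilds one parent level purely from the scavenged leaves, so stale
// parent keys and links are never consulted.
std::vector<ParentEntry> rebuild_parent(std::vector<ScavengedLeaf> leaves) {
    // Empty pages carry no key range and are dropped.
    leaves.erase(std::remove_if(leaves.begin(), leaves.end(),
                                [](const ScavengedLeaf& l) { return l.keys.empty(); }),
                 leaves.end());
    // Order leaves by their minimum key, as a B-tree parent requires.
    std::sort(leaves.begin(), leaves.end(),
              [](const ScavengedLeaf& a, const ScavengedLeaf& b) {
                  return a.keys.front() < b.keys.front();
              });
    std::vector<ParentEntry> parent;
    parent.reserve(leaves.size());
    for (const ScavengedLeaf& l : leaves) {
        parent.push_back({l.keys.front(), l.id});  // btkey = true first key
    }
    return parent;
}
```

Because each btkey is taken from the leaf's own first entry, a rebuilt parent can no longer disagree with its children in the way the btkeyMin mismatches report.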