Task 579: .PST File Format

File Format Specifications for the .PST File Format

The .PST file format is a proprietary binary format used by Microsoft Outlook to store messages, calendar events, contacts, and other items in a single file. Microsoft publishes the official specification as [MS-PST], which describes the format in three layers: the Node Database (NDB) layer, the Lists, Tables, and Properties (LTP) layer, and the Messaging layer. The format comes in two main variants: ANSI (older, limited to about 2GB) and Unicode (newer, supporting larger files, up to 50GB or more depending on the Outlook version). The file starts with a header at offset 0, followed by data blocks, pages, and trees that organize the content. Block data is either stored unencoded or encoded with the permutation or cyclic algorithm; newer files may also be protected by Windows Information Protection (WIP). All multi-byte values are little-endian, and fixed magic bytes identify the file.
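As a rough illustration of the identification step, the sketch below classifies a file from its first 12 bytes. The function name `sniff_pst` and the sample bytes are hypothetical. The checks assume the "!BDN" magic at offset 0, the "SM" client magic at offset 8, and a little-endian 16-bit wVer at offset 10 (14 or 15 for ANSI, 23 or higher for Unicode). Treat it as a quick filter, not a validator.

```python
import struct

PST_MAGIC = b"!BDN"        # bytes 21 42 44 4E at offset 0
PST_CLIENT_MAGIC = b"SM"   # bytes 53 4D at offset 8


def sniff_pst(data: bytes) -> str:
    """Classify the first bytes of a file as 'ansi', 'unicode', or 'not-pst'."""
    if len(data) < 12 or data[0:4] != PST_MAGIC or data[8:10] != PST_CLIENT_MAGIC:
        return "not-pst"
    # wVer is a little-endian 16-bit value at offset 10.
    (w_ver,) = struct.unpack_from("<H", data, 10)
    if w_ver in (14, 15):
        return "ansi"
    if w_ver >= 23:
        return "unicode"
    return "not-pst"


# Synthetic 12-byte prefix: magic, a dummy CRC, client magic, wVer = 23.
sample = b"!BDN" + b"\x00" * 4 + b"SM" + struct.pack("<H", 23)
print(sniff_pst(sample))  # unicode
```

Comparing raw bytes avoids any confusion about how the magic values look once they are read as little-endian integers.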

  1. List of Properties Intrinsic to the .PST File Format

The header is the core structure defining the file's layout and metadata. The fields below are drawn from it and cover both the Unicode and ANSI variants: identification, versioning, allocation counters, encryption, and reserved space. Offsets are measured from the start of the file.

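  • dwMagic: Magic bytes 0x21 0x42 0x44 0x4E ("!BDN"); read as a little-endian DWORD, the value is 0x4E444221.
  • dwCRCPartial: 32-bit CRC of the 471 bytes of data starting at wMagicClient.
  • wMagicClient: Client magic bytes 0x53 0x4D ("SM"); read as a little-endian WORD, the value is 0x4D53.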
  • wVer: File version (14/15 for ANSI, >=23 for Unicode, 37 for WIP support).
  • wVerClient: Client version (typically 19).
  • bPlatformCreate: Platform create flag (must be 0x01).
  • bPlatformAccess: Platform access flag (must be 0x01).
  • dwReserved1: Reserved, ignore and set to 0.
  • dwReserved2: Reserved, ignore and set to 0.
  • bidUnused (Unicode only): Unused padding, set to 0.
  • bidNextP: Next page BID counter.
  • dwUnique: Monotonically increasing unique value for header modifications.
  • rgnid[]: Array of 32 NIDs for NID_TYPE allocation starting values (e.g., 0x400 for normal folders, 0x10000 for normal messages).
  • qwUnused (Unicode only): Unused space, set to 0.
  • root: ROOT structure containing file size, allocation info, etc.
  • dwAlign (Unicode only): Alignment bytes, set to 0.
  • rgbFM: Deprecated FMap, filled with 0xFF.
  • rgbFP: Deprecated FPMap, filled with 0xFF.
  • bSentinel: Sentinel byte, must be 0x80.
  • bCryptMethod: Encryption method (0x00 none, 0x01 permute, 0x02 cyclic, 0x10 WIP).
  • rgbReserved: Reserved, set to 0.
  • bidNextB: Next block BID counter (Unicode: 8 bytes at offset 516; ANSI: 4 bytes at offset 24).
  • dwCRCFull (Unicode only): 32-bit CRC of the 516 bytes from wMagicClient through bidNextB, inclusive.
  • rgbReserved2: Reserved, ignore and set to 0.
  • bReserved: Reserved, ignore and set to 0.
  • rgbReserved3: Reserved, ignore and set to 0.
  • ullReserved (ANSI only): Reserved, set to 0.
  • dwReserved (ANSI only): Reserved, set to 0.
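To make the field order concrete, here is a minimal sketch that derives each field's offset from an ordered list of field sizes. The sizes of the opaque fields (rgnid as 32 four-byte NIDs, a 72-byte Unicode ROOT and 40-byte ANSI ROOT, 128 bytes each for rgbFM and rgbFP, and the reserved runs) reflect my reading of [MS-PST] and should be checked against the specification. The names `COMMON_FIELDS`, `UNICODE_FIELDS`, `ANSI_FIELDS`, and `field_offsets` are hypothetical.

```python
import struct

# (name, struct format) in assumed on-disk order; "<" means little-endian, no padding.
COMMON_FIELDS = [
    ("dwMagic", "4s"), ("dwCRCPartial", "I"), ("wMagicClient", "2s"),
    ("wVer", "H"), ("wVerClient", "H"),
    ("bPlatformCreate", "B"), ("bPlatformAccess", "B"),
    ("dwReserved1", "I"), ("dwReserved2", "I"),
]
UNICODE_FIELDS = COMMON_FIELDS + [
    ("bidUnused", "Q"), ("bidNextP", "Q"), ("dwUnique", "I"),
    ("rgnid", "128s"), ("qwUnused", "Q"), ("root", "72s"), ("dwAlign", "I"),
    ("rgbFM", "128s"), ("rgbFP", "128s"),
    ("bSentinel", "B"), ("bCryptMethod", "B"), ("rgbReserved", "2s"),
    ("bidNextB", "Q"), ("dwCRCFull", "I"),
    ("rgbReserved2", "3s"), ("bReserved", "B"), ("rgbReserved3", "32s"),
]
ANSI_FIELDS = COMMON_FIELDS + [
    ("bidNextB", "I"), ("bidNextP", "I"), ("dwUnique", "I"),
    ("rgnid", "128s"), ("root", "40s"),
    ("rgbFM", "128s"), ("rgbFP", "128s"),
    ("bSentinel", "B"), ("bCryptMethod", "B"), ("rgbReserved", "2s"),
    ("ullReserved", "Q"), ("dwReserved", "I"),
    ("rgbReserved2", "3s"), ("bReserved", "B"), ("rgbReserved3", "32s"),
]


def field_offsets(fields):
    """Return ({field name: byte offset}, total size) for an ordered field list."""
    offsets, position = {}, 0
    for name, fmt in fields:
        offsets[name] = position
        position += struct.calcsize("<" + fmt)
    return offsets, position


unicode_off, unicode_size = field_offsets(UNICODE_FIELDS)
ansi_off, ansi_size = field_offsets(ANSI_FIELDS)
print(unicode_size, unicode_off["bCryptMethod"])  # 564 513
print(ansi_size, ansi_off["bCryptMethod"])        # 512 461
```

Deriving offsets from sizes, rather than hard-coding them, makes it easier to spot a wrong offset when the two variants diverge after dwReserved2.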

Additional intrinsic properties from the format:

  • File signature offset: Magic at 0x00.
  • Max file size: ANSI ~2GB, Unicode 20-50GB+.
  • Byte order: Little-endian.
  • Block size: Multiples of 64 bytes; each block is identified by a BID.
  • Page size: 512 bytes for data pages.
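The 64-byte block granularity can be sketched as a simple round-up. The helper name `padded_block_size` is hypothetical, and the trailer sizes used in the calls (16 bytes for Unicode, 12 bytes for ANSI) are an assumption based on my reading of [MS-PST], not something stated above.

```python
BLOCK_ALIGNMENT = 64  # blocks occupy multiples of 64 bytes


def padded_block_size(data_size: int, trailer_size: int) -> int:
    """Return data plus trailer, rounded up to the next multiple of 64 bytes."""
    if data_size < 0 or trailer_size < 0:
        raise ValueError("sizes must be non-negative")
    total = data_size + trailer_size
    # Ceiling division, then scale back up to a byte count.
    return -(-total // BLOCK_ALIGNMENT) * BLOCK_ALIGNMENT


print(padded_block_size(100, 16))  # 116 bytes round up to 128
print(padded_block_size(48, 16))   # exactly 64, no padding needed
print(padded_block_size(100, 12))  # 112 bytes round up to 128
```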

  2. Two Direct Download Links for .PST Files

Note: These are zipped PST files from the public Enron dataset for testing; unzip to access the .PST.

  3. Ghost Blog Embedded HTML/JavaScript for Drag and Drop .PST File Dump

Here's the embedded HTML with JavaScript for a Ghost blog post. It lets the user drag and drop a .PST file and dumps the header properties to the screen. For simplicity it assumes the Unicode format and reads the first 544 bytes, which covers every header field through dwCRCFull (the full Unicode header is 564 bytes).

Drag and drop .PST file here

  4. Python Class for .PST File

import struct
import os

class PstFile:
    def __init__(self, filepath):
        self.filepath = filepath
        self.header = None
        self.is_unicode = False

    def open(self):
        with open(self.filepath, 'rb') as f:
            header_bytes = f.read(564)  # Unicode header size; the ANSI header is the first 512 bytes
            if len(header_bytes) < 564:
                raise ValueError("File too small for PST header")
            self.header = header_bytes
            wVer = struct.unpack_from('<H', header_bytes, 10)[0]
            self.is_unicode = wVer >= 23

    def decode_properties(self):
        if not self.header:
            raise ValueError("File not opened")
        properties = {}
        properties['dwMagic'] = hex(struct.unpack_from('<I', self.header, 0)[0])
        properties['dwCRCPartial'] = hex(struct.unpack_from('<I', self.header, 4)[0])
        properties['wMagicClient'] = hex(struct.unpack_from('<H', self.header, 8)[0])
        properties['wVer'] = struct.unpack_from('<H', self.header, 10)[0]
        properties['wVerClient'] = struct.unpack_from('<H', self.header, 12)[0]
        properties['bPlatformCreate'] = hex(struct.unpack_from('<B', self.header, 14)[0])
        properties['bPlatformAccess'] = hex(struct.unpack_from('<B', self.header, 15)[0])
        properties['dwReserved1'] = hex(struct.unpack_from('<I', self.header, 16)[0])
        properties['dwReserved2'] = hex(struct.unpack_from('<I', self.header, 20)[0])
        if self.is_unicode:
            properties['bidUnused'] = hex(struct.unpack_from('<Q', self.header, 24)[0])
            properties['bidNextP'] = hex(struct.unpack_from('<Q', self.header, 32)[0])
            properties['dwUnique'] = hex(struct.unpack_from('<I', self.header, 40)[0])
            # rgnid (offsets 44..171) skipped
            properties['qwUnused'] = hex(struct.unpack_from('<Q', self.header, 172)[0])
            # root, dwAlign, rgbFM and rgbFP occupy offsets 180..511
            properties['bSentinel'] = hex(struct.unpack_from('<B', self.header, 512)[0])
            properties['bCryptMethod'] = hex(struct.unpack_from('<B', self.header, 513)[0])
            properties['bidNextB'] = hex(struct.unpack_from('<Q', self.header, 516)[0])
            properties['dwCRCFull'] = hex(struct.unpack_from('<I', self.header, 524)[0])
        else:
            # ANSI adjustments
            properties['bidNextB'] = hex(struct.unpack_from('<I', self.header, 24)[0])
            properties['bidNextP'] = hex(struct.unpack_from('<I', self.header, 28)[0])
            properties['dwUnique'] = hex(struct.unpack_from('<I', self.header, 32)[0])
            # Adjust offsets for ANSI
            properties['bSentinel'] = hex(struct.unpack_from('<B', self.header, 460)[0])
            properties['bCryptMethod'] = hex(struct.unpack_from('<B', self.header, 461)[0])
        return properties

    def print_properties(self):
        props = self.decode_properties()
        for k, v in props.items():
            print(f"{k}: {v}")

    def write(self, new_filepath=None):
        if not self.header:
            raise ValueError("File not opened")
        filepath = new_filepath or self.filepath
        with open(self.filepath, 'rb') as f_in:
            data = f_in.read()
        with open(filepath, 'wb') as f_out:
            f_out.write(self.header + data[len(self.header):])  # Write header + rest

# Example usage
# pst = PstFile('path/to/file.pst')
# pst.open()
# pst.print_properties()
# pst.write('path/to/modified.pst')
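
The "must be" values from the property list can also be turned into a quick sanity check on the raw header bytes. This is a minimal sketch, not a validator from the specification: the function name `check_header` is hypothetical, and it assumes bSentinel sits at offset 512 in the Unicode header and 460 in the ANSI header, with bCryptMethod in the next byte.

```python
import struct

VALID_CRYPT_METHODS = {0x00, 0x01, 0x02, 0x10}


def check_header(header: bytes) -> list:
    """Return a list of problems found; an empty list means none of the checks failed."""
    if len(header) < 564:
        return ["fewer than 564 header bytes supplied"]
    problems = []
    if header[0:4] != b"!BDN":
        problems.append("dwMagic is not '!BDN'")
    if header[8:10] != b"SM":
        problems.append("wMagicClient is not 'SM'")
    (w_ver,) = struct.unpack_from("<H", header, 10)
    if header[14] != 0x01 or header[15] != 0x01:
        problems.append("platform flags are not 0x01")
    # Assumed offsets: bSentinel at 512 (Unicode) or 460 (ANSI), bCryptMethod right after.
    sentinel_at = 512 if w_ver >= 23 else 460
    if header[sentinel_at] != 0x80:
        problems.append("bSentinel is not 0x80")
    if header[sentinel_at + 1] not in VALID_CRYPT_METHODS:
        problems.append("unknown bCryptMethod")
    return problems


# Synthetic Unicode header with only the checked fields filled in.
fake = bytearray(564)
fake[0:4] = b"!BDN"
fake[8:10] = b"SM"
struct.pack_into("<H", fake, 10, 23)
fake[14] = fake[15] = 0x01
fake[512] = 0x80
print(check_header(bytes(fake)))  # []
fake[512] = 0x00
print(check_header(bytes(fake)))  # ['bSentinel is not 0x80']
```

The check deliberately ignores the CRC fields, so a header that passes may still be corrupt.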

  5. Java Class for .PST File

import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Files;
import java.nio.file.Paths;

public class PstFile {
    private String filepath;
    private byte[] header;
    private boolean isUnicode;

    public PstFile(String filepath) {
        this.filepath = filepath;
    }

    public void open() throws Exception {
        // Read only the header: a PST can exceed the 2 GB limit of a Java byte array.
        byte[] buf = new byte[564]; // Unicode header size; the ANSI header is the first 512 bytes
        try (RandomAccessFile raf = new RandomAccessFile(filepath, "r")) {
            if (raf.length() < buf.length) {
                throw new Exception("File too small for PST header");
            }
            raf.readFully(buf);
        }
        header = buf;
        ByteBuffer bb = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        int wVer = bb.getShort(10) & 0xFFFF;
        isUnicode = wVer >= 23;
    }

    public void decodeAndPrintProperties() {
        if (header == null) {
            throw new RuntimeException("File not opened");
        }
        ByteBuffer bb = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        System.out.println("dwMagic: 0x" + Integer.toHexString(bb.getInt(0)));
        System.out.println("dwCRCPartial: 0x" + Integer.toHexString(bb.getInt(4)));
        System.out.println("wMagicClient: 0x" + Integer.toHexString(bb.getShort(8) & 0xFFFF));
        System.out.println("wVer: " + (bb.getShort(10) & 0xFFFF));
        System.out.println("wVerClient: " + (bb.getShort(12) & 0xFFFF));
        System.out.println("bPlatformCreate: 0x" + Integer.toHexString(bb.get(14) & 0xFF));
        System.out.println("bPlatformAccess: 0x" + Integer.toHexString(bb.get(15) & 0xFF));
        System.out.println("dwReserved1: 0x" + Integer.toHexString(bb.getInt(16)));
        System.out.println("dwReserved2: 0x" + Integer.toHexString(bb.getInt(20)));
        if (isUnicode) {
            System.out.println("bidUnused: 0x" + Long.toHexString(bb.getLong(24)));
            System.out.println("bidNextP: 0x" + Long.toHexString(bb.getLong(32)));
            System.out.println("dwUnique: 0x" + Integer.toHexString(bb.getInt(40)));
            System.out.println("qwUnused: 0x" + Long.toHexString(bb.getLong(172)));
            System.out.println("bSentinel: 0x" + Integer.toHexString(bb.get(512) & 0xFF));
            System.out.println("bCryptMethod: 0x" + Integer.toHexString(bb.get(513) & 0xFF));
            System.out.println("bidNextB: 0x" + Long.toHexString(bb.getLong(516)));
            System.out.println("dwCRCFull: 0x" + Integer.toHexString(bb.getInt(524)));
        } else {
            // ANSI
            System.out.println("bidNextB: 0x" + Integer.toHexString(bb.getInt(24)));
            System.out.println("bidNextP: 0x" + Integer.toHexString(bb.getInt(28)));
            System.out.println("dwUnique: 0x" + Integer.toHexString(bb.getInt(32)));
            System.out.println("bSentinel: 0x" + Integer.toHexString(bb.get(460) & 0xFF));
            System.out.println("bCryptMethod: 0x" + Integer.toHexString(bb.get(461) & 0xFF));
        }
    }

    public void write(String newFilepath) throws Exception {
        if (header == null) {
            throw new Exception("File not opened");
        }
        java.nio.file.Path source = Paths.get(filepath);
        java.nio.file.Path target = Paths.get(newFilepath != null ? newFilepath : filepath);
        // Copy on disk rather than through a byte[]: a PST can exceed Java's 2 GB array limit.
        if (!Files.exists(target) || !Files.isSameFile(source, target)) {
            Files.copy(source, target, java.nio.file.StandardCopyOption.REPLACE_EXISTING);
        }
        // Overwrite the start of the output with the in-memory header.
        try (RandomAccessFile raf = new RandomAccessFile(target.toFile(), "rw")) {
            raf.write(header);
        }
    }

    // Example usage
    // PstFile pst = new PstFile("path/to/file.pst");
    // pst.open();
    // pst.decodeAndPrintProperties();
    // pst.write("path/to/modified.pst");
}

  6. JavaScript Class for .PST File

class PstFile {
  constructor(filepath) {
    this.filepath = filepath;
    this.header = null;
    this.isUnicode = false;
  }

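  async open() {
    const fs = require('fs'); // Node.js
    // Read only the header rather than the whole (possibly multi-gigabyte) file.
    const buf = Buffer.alloc(564); // Unicode header size; the ANSI header is the first 512 bytes
    const fd = fs.openSync(this.filepath, 'r');
    try {
      const bytesRead = fs.readSync(fd, buf, 0, buf.length, 0);
      if (bytesRead < buf.length) {
        throw new Error("File too small for PST header");
      }
    } finally {
      fs.closeSync(fd);
    }
    this.header = buf;
    // A Buffer may be a view into a larger ArrayBuffer, so pass its offset and length.
    const dv = new DataView(this.header.buffer, this.header.byteOffset, this.header.byteLength);
    const wVer = dv.getUint16(10, true);
    this.isUnicode = wVer >= 23;
  }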

  decodeProperties() {
    if (!this.header) {
      throw new Error("File not opened");
    }
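    // A Buffer may be a view into a larger ArrayBuffer, so pass its offset and length.
    const dv = new DataView(this.header.buffer, this.header.byteOffset, this.header.byteLength);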
    const props = {};
    props.dwMagic = `0x${dv.getUint32(0, true).toString(16)}`;
    props.dwCRCPartial = `0x${dv.getUint32(4, true).toString(16)}`;
    props.wMagicClient = `0x${dv.getUint16(8, true).toString(16)}`;
    props.wVer = dv.getUint16(10, true);
    props.wVerClient = dv.getUint16(12, true);
    props.bPlatformCreate = `0x${dv.getUint8(14).toString(16)}`;
    props.bPlatformAccess = `0x${dv.getUint8(15).toString(16)}`;
    props.dwReserved1 = `0x${dv.getUint32(16, true).toString(16)}`;
    props.dwReserved2 = `0x${dv.getUint32(20, true).toString(16)}`;
    if (this.isUnicode) {
      props.bidUnused = `0x${dv.getBigUint64(24, true).toString(16)}`;
      props.bidNextP = `0x${dv.getBigUint64(32, true).toString(16)}`;
      props.dwUnique = `0x${dv.getUint32(40, true).toString(16)}`;
      props.qwUnused = `0x${dv.getBigUint64(172, true).toString(16)}`;
      props.bSentinel = `0x${dv.getUint8(512).toString(16)}`;
      props.bCryptMethod = `0x${dv.getUint8(513).toString(16)}`;
      props.bidNextB = `0x${dv.getBigUint64(516, true).toString(16)}`;
      props.dwCRCFull = `0x${dv.getUint32(524, true).toString(16)}`;
    } else {
      // ANSI
      props.bidNextB = `0x${dv.getUint32(24, true).toString(16)}`;
      props.bidNextP = `0x${dv.getUint32(28, true).toString(16)}`;
      props.dwUnique = `0x${dv.getUint32(32, true).toString(16)}`;
      props.bSentinel = `0x${dv.getUint8(460).toString(16)}`;
      props.bCryptMethod = `0x${dv.getUint8(461).toString(16)}`;
    }
    return props;
  }

  printProperties() {
    const props = this.decodeProperties();
    for (const [key, value] of Object.entries(props)) {
      console.log(`${key}: ${value}`);
    }
  }

  async write(newFilepath = this.filepath) {
    const fs = require('fs');
    const data = fs.readFileSync(this.filepath);
    const newData = Buffer.alloc(data.length);
    this.header.copy(newData, 0);
    data.copy(newData, this.header.length, this.header.length);
    fs.writeFileSync(newFilepath, newData);
  }
}

// Example usage
// const pst = new PstFile('path/to/file.pst');
// await pst.open();
// pst.printProperties();
// await pst.write('path/to/modified.pst');

  7. C Class for .PST File

(C has no classes, so this implementation is written in C++.)

#include <fstream>
#include <iostream>
#include <vector>
#include <iomanip>
#include <stdexcept>  // std::runtime_error
#include <string>     // std::string

class PstFile {
private:
    std::string filepath;
    std::vector<unsigned char> header;
    bool isUnicode;

public:
    PstFile(const std::string& fp) : filepath(fp), isUnicode(false) {}

    void open() {
        std::ifstream f(filepath, std::ios::binary);
        if (!f) {
            throw std::runtime_error("Cannot open file");
        }
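        header.resize(564);  // Unicode header size; the ANSI header is the first 512 bytes
        f.read(reinterpret_cast<char*>(header.data()), 564);
        if (f.gcount() < 564) {
            header.clear();  // leave the object in the "not opened" state
            throw std::runtime_error("File too small for PST header");
        }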
        unsigned short wVer = *reinterpret_cast<unsigned short*>(header.data() + 10);
        isUnicode = wVer >= 23;
    }

    void decodeAndPrintProperties() {
        if (header.empty()) {
            throw std::runtime_error("File not opened");
        }
        unsigned int dwMagic = *reinterpret_cast<unsigned int*>(header.data() + 0);
        std::cout << "dwMagic: 0x" << std::hex << dwMagic << std::endl;
        unsigned int dwCRCPartial = *reinterpret_cast<unsigned int*>(header.data() + 4);
        std::cout << "dwCRCPartial: 0x" << std::hex << dwCRCPartial << std::endl;
        unsigned short wMagicClient = *reinterpret_cast<unsigned short*>(header.data() + 8);
        std::cout << "wMagicClient: 0x" << std::hex << wMagicClient << std::endl;
        unsigned short wVer = *reinterpret_cast<unsigned short*>(header.data() + 10);
        std::cout << "wVer: " << std::dec << wVer << std::endl;
        unsigned short wVerClient = *reinterpret_cast<unsigned short*>(header.data() + 12);
        std::cout << "wVerClient: " << std::dec << wVerClient << std::endl;
        unsigned char bPlatformCreate = header[14];
        std::cout << "bPlatformCreate: 0x" << std::hex << static_cast<int>(bPlatformCreate) << std::endl;
        unsigned char bPlatformAccess = header[15];
        std::cout << "bPlatformAccess: 0x" << std::hex << static_cast<int>(bPlatformAccess) << std::endl;
        unsigned int dwReserved1 = *reinterpret_cast<unsigned int*>(header.data() + 16);
        std::cout << "dwReserved1: 0x" << std::hex << dwReserved1 << std::endl;
        unsigned int dwReserved2 = *reinterpret_cast<unsigned int*>(header.data() + 20);
        std::cout << "dwReserved2: 0x" << std::hex << dwReserved2 << std::endl;
        if (isUnicode) {
            unsigned long long bidUnused = *reinterpret_cast<unsigned long long*>(header.data() + 24);
            std::cout << "bidUnused: 0x" << std::hex << bidUnused << std::endl;
            unsigned long long bidNextP = *reinterpret_cast<unsigned long long*>(header.data() + 32);
            std::cout << "bidNextP: 0x" << std::hex << bidNextP << std::endl;
            unsigned int dwUnique = *reinterpret_cast<unsigned int*>(header.data() + 40);
            std::cout << "dwUnique: 0x" << std::hex << dwUnique << std::endl;
            unsigned long long qwUnused = *reinterpret_cast<unsigned long long*>(header.data() + 172);
            std::cout << "qwUnused: 0x" << std::hex << qwUnused << std::endl;
            unsigned char bSentinel = header[512];
            std::cout << "bSentinel: 0x" << std::hex << static_cast<int>(bSentinel) << std::endl;
            unsigned char bCryptMethod = header[513];
            std::cout << "bCryptMethod: 0x" << std::hex << static_cast<int>(bCryptMethod) << std::endl;
            unsigned long long bidNextB = *reinterpret_cast<unsigned long long*>(header.data() + 516);
            std::cout << "bidNextB: 0x" << std::hex << bidNextB << std::endl;
            unsigned int dwCRCFull = *reinterpret_cast<unsigned int*>(header.data() + 524);
            std::cout << "dwCRCFull: 0x" << std::hex << dwCRCFull << std::endl;
        } else {
            // ANSI
            unsigned int bidNextB = *reinterpret_cast<unsigned int*>(header.data() + 24);
            std::cout << "bidNextB: 0x" << std::hex << bidNextB << std::endl;
            unsigned int bidNextP = *reinterpret_cast<unsigned int*>(header.data() + 28);
            std::cout << "bidNextP: 0x" << std::hex << bidNextP << std::endl;
            unsigned int dwUnique = *reinterpret_cast<unsigned int*>(header.data() + 32);
            std::cout << "dwUnique: 0x" << std::hex << dwUnique << std::endl;
            unsigned char bSentinel = header[460];
            std::cout << "bSentinel: 0x" << std::hex << static_cast<int>(bSentinel) << std::endl;
            unsigned char bCryptMethod = header[461];
            std::cout << "bCryptMethod: 0x" << std::hex << static_cast<int>(bCryptMethod) << std::endl;
        }
    }

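    void write(const std::string& newFilepath = "") {
        if (header.empty()) {
            throw std::runtime_error("File not opened");
        }
        if (newFilepath.empty() || newFilepath == filepath) {
            // In place: patch only the header. Opening the source with a plain
            // std::ofstream would truncate it before it could be read.
            std::fstream f(filepath, std::ios::in | std::ios::out | std::ios::binary);
            if (!f) {
                throw std::runtime_error("Cannot open file for write");
            }
            f.write(reinterpret_cast<const char*>(header.data()), header.size());
            return;
        }
        std::ifstream fIn(filepath, std::ios::binary);
        std::ofstream fOut(newFilepath, std::ios::binary);
        if (!fIn || !fOut) {
            throw std::runtime_error("Cannot open files for write");
        }
        fOut.write(reinterpret_cast<const char*>(header.data()), header.size());
        // Skip the header bytes already written, then copy the rest of the file.
        fIn.seekg(static_cast<std::streamoff>(header.size()));
        char buf[4096];
        while (fIn.read(buf, sizeof(buf))) {
            fOut.write(buf, fIn.gcount());
        }
        fOut.write(buf, fIn.gcount());
    }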
};

// Example usage
// PstFile pst("path/to/file.pst");
// pst.open();
// pst.decodeAndPrintProperties();
// pst.write("path/to/modified.pst");