Task 583: .PUB File Format
- The .PUB file format is a proprietary format used by Microsoft Publisher for desktop publishing documents. It is built on the Microsoft Compound File Binary (CFB) format, which is a file-system-like structure within a file for storing streams and storages. The specifications are not fully public, but the container format is documented in [MS-CFB]: Compound File Binary File Format. Based on this, the properties intrinsic to its file system (i.e., the compound file header fields that define the structure, sector allocation, and metadata) are as follows:
- Header Signature: Offset 0, Size 8 bytes - Fixed value 0xD0CF11E0A1B11AE1, identifies the file as a compound file.
- Header CLSID: Offset 8, Size 16 bytes - Reserved, MUST be set to zero.
- Minor Version: Offset 24, Size 2 bytes - The version of the file format (typically 0x003E for standard files).
- Major Version: Offset 26, Size 2 bytes - The version of the file format (3 or 4, indicating sector size capabilities).
- Byte Order: Offset 28, Size 2 bytes - Indicates endianness (0xFFFE for little-endian).
- Sector Shift: Offset 30, Size 2 bytes - Power of 2 for sector size (9 for 512 bytes in major version 3, 12 for 4096 bytes in major version 4).
- Mini Sector Shift: Offset 32, Size 2 bytes - Power of 2 for mini sector size (must be 6 for 64 bytes).
- Reserved: Offset 34, Size 6 bytes - Reserved, MUST be zero.
- Number of Directory Sectors: Offset 40, Size 4 bytes - Number of directory sectors in the compound file (0 for major version 3).
- Number of FAT Sectors: Offset 44, Size 4 bytes - Number of sectors in the FAT chain.
- First Directory Sector Location: Offset 48, Size 4 bytes - Sector ID of the first directory sector.
- Transaction Signature Number: Offset 52, Size 4 bytes - For transactioning, incremented with each update.
- Mini Stream Cutoff Size: Offset 56, Size 4 bytes - Size threshold for the mini stream: user streams smaller than this value are stored in the mini stream (must be 0x00001000, i.e. 4096 bytes).
- First Mini FAT Sector Location: Offset 60, Size 4 bytes - Sector ID of the first mini FAT sector.
- Number of Mini FAT Sectors: Offset 64, Size 4 bytes - Number of mini FAT sectors.
- First DIFAT Sector Location: Offset 68, Size 4 bytes - Sector ID of the first DIFAT sector.
- Number of DIFAT Sectors: Offset 72, Size 4 bytes - Number of DIFAT sectors.
- DIFAT: Offset 76, Size 436 bytes - Array of 109 32-bit sector IDs for the first part of the DIFAT.
These properties define the "file system" aspects, such as sector allocation, directory structure, and stream management. Additional Publisher-specific details (e.g., a version marker in the magic header of the Contents stream, such as 0xE8AC2C00) have been reverse-engineered but are not officially documented.
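Several of the fields listed above carry fixed or version-dependent values, so a header can be sanity-checked before anything else is parsed. The following is a minimal sketch under that reading of the list; the names `check_cfb_header` and `sector_offset` are hypothetical helpers introduced for illustration, not part of any Publisher or CFB library. It also shows how a sector ID maps to a byte offset, since sector 0 begins immediately after the header sector.

```python
import struct

CFB_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")
EXPECTED_SECTOR_SHIFT = {3: 9, 4: 12}  # major version -> required sector shift


def sector_offset(sector_id, sector_shift):
    """Byte offset of a regular sector; sector 0 begins right after the header sector."""
    return (sector_id + 1) << sector_shift


def check_cfb_header(header):
    """Return a list of problems found in a 512-byte header (empty list if none)."""
    if len(header) < 512:
        return ["header is shorter than 512 bytes"]
    problems = []
    if header[0:8] != CFB_SIGNATURE:
        problems.append("signature mismatch")
    # Major version, byte order, sector shift, mini sector shift sit back to back at offset 26.
    major, byte_order, sector_shift, mini_shift = struct.unpack_from("<4H", header, 26)
    dir_sectors = struct.unpack_from("<I", header, 40)[0]
    cutoff = struct.unpack_from("<I", header, 56)[0]
    if byte_order != 0xFFFE:
        problems.append("byte order is not 0xFFFE")
    if major not in EXPECTED_SECTOR_SHIFT:
        problems.append("major version is neither 3 nor 4")
    elif sector_shift != EXPECTED_SECTOR_SHIFT[major]:
        problems.append("sector shift does not match major version")
    if mini_shift != 6:
        problems.append("mini sector shift is not 6")
    if cutoff != 0x1000:
        problems.append("mini stream cutoff is not 4096")
    if major == 3 and dir_sectors != 0:
        problems.append("directory sector count must be 0 for version 3")
    return problems
```

An empty result only means the fixed fields look plausible; it does not prove that the file is a Publisher document, because many other formats share the same container.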
- Two direct download links for .PUB files (sample templates for Microsoft Publisher):
- https://www.researchgate.net/publication/267145722/download (Ramble-strip MS Publisher template)
- https://extension.iastate.edu/4h/files/V-10901.pub (4-H Newsletter Template, direct .PUB file)
- Here is an embedded HTML/JavaScript snippet for a Ghost blog (or any HTML page) that allows drag-and-drop of a .PUB file and dumps the properties to the screen. It reads the file as binary, parses the compound file header, and displays the properties.
Drag and drop a .PUB file here
- Python class for opening, decoding, reading, writing, and printing .PUB file properties:
import struct


class PubFileHandler:
    """Reads, prints, and rewrites the 512-byte compound file header of a .PUB file."""

    def __init__(self, filepath):
        self.filepath = filepath
        self.properties = {}
        self._read_properties()

    def _read_properties(self):
        with open(self.filepath, 'rb') as f:
            data = f.read(512)  # Header is 512 bytes
        if len(data) < 512:
            raise ValueError("File too small for .PUB header")
        # Byte ranges are kept as hex strings; integers are decoded as little-endian.
        self.properties = {
            'Header Signature': data[0:8].hex(),
            'Header CLSID': data[8:24].hex(),
            'Minor Version': struct.unpack_from('<H', data, 24)[0],
            'Major Version': struct.unpack_from('<H', data, 26)[0],
            'Byte Order': struct.unpack_from('<H', data, 28)[0],
            'Sector Shift': struct.unpack_from('<H', data, 30)[0],
            'Mini Sector Shift': struct.unpack_from('<H', data, 32)[0],
            'Reserved': data[34:40].hex(),
            'Number of Directory Sectors': struct.unpack_from('<I', data, 40)[0],
            'Number of FAT Sectors': struct.unpack_from('<I', data, 44)[0],
            'First Directory Sector Location': struct.unpack_from('<I', data, 48)[0],
            'Transaction Signature Number': struct.unpack_from('<I', data, 52)[0],
            'Mini Stream Cutoff Size': struct.unpack_from('<I', data, 56)[0],
            'First Mini FAT Sector Location': struct.unpack_from('<I', data, 60)[0],
            'Number of Mini FAT Sectors': struct.unpack_from('<I', data, 64)[0],
            'First DIFAT Sector Location': struct.unpack_from('<I', data, 68)[0],
            'Number of DIFAT Sectors': struct.unpack_from('<I', data, 72)[0],
            'DIFAT': data[76:512].hex(),
        }

    def print_properties(self):
        for key, value in self.properties.items():
            print(f"{key}: {value}")

    def write_properties(self, new_properties):
        # Update known properties, then overwrite the header in place.
        # No validation is performed, so inconsistent values can make the file unreadable.
        for key, value in new_properties.items():
            if key in self.properties:
                self.properties[key] = value
        # Pack the header back (simplified, assuming field sizes are unchanged)
        header = bytearray(512)
        header[0:8] = bytes.fromhex(self.properties['Header Signature'])
        header[8:24] = bytes.fromhex(self.properties['Header CLSID'])
        struct.pack_into('<H', header, 24, self.properties['Minor Version'])
        struct.pack_into('<H', header, 26, self.properties['Major Version'])
        struct.pack_into('<H', header, 28, self.properties['Byte Order'])
        struct.pack_into('<H', header, 30, self.properties['Sector Shift'])
        struct.pack_into('<H', header, 32, self.properties['Mini Sector Shift'])
        header[34:40] = bytes.fromhex(self.properties['Reserved'])
        struct.pack_into('<I', header, 40, self.properties['Number of Directory Sectors'])
        struct.pack_into('<I', header, 44, self.properties['Number of FAT Sectors'])
        struct.pack_into('<I', header, 48, self.properties['First Directory Sector Location'])
        struct.pack_into('<I', header, 52, self.properties['Transaction Signature Number'])
        struct.pack_into('<I', header, 56, self.properties['Mini Stream Cutoff Size'])
        struct.pack_into('<I', header, 60, self.properties['First Mini FAT Sector Location'])
        struct.pack_into('<I', header, 64, self.properties['Number of Mini FAT Sectors'])
        struct.pack_into('<I', header, 68, self.properties['First DIFAT Sector Location'])
        struct.pack_into('<I', header, 72, self.properties['Number of DIFAT Sectors'])
        header[76:512] = bytes.fromhex(self.properties['DIFAT'])
        with open(self.filepath, 'r+b') as f:
            f.seek(0)
            f.write(header)


# Example usage
# handler = PubFileHandler('example.pub')
# handler.print_properties()
# handler.write_properties({'Minor Version': 0x003F})
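The class above stops at the header. As a possible next step, the First Directory Sector Location field can be followed to list the entries stored in the first directory sector, which is where stream names such as Contents would be expected to appear. The sketch below is an illustration only: `read_first_directory_sector` is a hypothetical helper, it relies on the 128-byte directory entry layout described in [MS-CFB], and it deliberately does not follow the FAT chain, so directories that span more than one sector are only partially listed.

```python
import struct


def read_first_directory_sector(path):
    """Return (name, object_type, start_sector, size) for entries in the first directory sector."""
    with open(path, "rb") as f:
        header = f.read(512)
        major = struct.unpack_from("<H", header, 26)[0]
        sector_shift = struct.unpack_from("<H", header, 30)[0]
        first_dir = struct.unpack_from("<I", header, 48)[0]
        # Sector 0 starts right after the header sector.
        f.seek((first_dir + 1) << sector_shift)
        sector = f.read(1 << sector_shift)

    entries = []
    for off in range(0, len(sector) - 127, 128):  # each directory entry is 128 bytes
        name_len = struct.unpack_from("<H", sector, off + 64)[0]  # bytes, incl. terminating null
        obj_type = sector[off + 66]  # 0 = unused, 1 = storage, 2 = stream, 5 = root storage
        if obj_type == 0 or not 2 <= name_len <= 64:
            continue
        name = sector[off:off + name_len - 2].decode("utf-16-le", errors="replace")
        start, size = struct.unpack_from("<IQ", sector, off + 116)
        if major == 3:
            size &= 0xFFFFFFFF  # only the low 32 bits are meaningful in version 3 files
        entries.append((name, obj_type, start, size))
    return entries
```

Called on a real file, for example `read_first_directory_sector('example.pub')`, the first tuple would normally describe the root storage; which other names appear depends on the Publisher version that wrote the file.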
- Java class for the same:
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.LinkedHashMap;
import java.util.Map;

public class PubFileHandler {
    // The 16-bit fields run contiguously from offset 24, the 32-bit fields from offset 40.
    private static final String[] U16_FIELDS = {
        "Minor Version", "Major Version", "Byte Order", "Sector Shift", "Mini Sector Shift"
    };
    private static final String[] U32_FIELDS = {
        "Number of Directory Sectors", "Number of FAT Sectors", "First Directory Sector Location",
        "Transaction Signature Number", "Mini Stream Cutoff Size", "First Mini FAT Sector Location",
        "Number of Mini FAT Sectors", "First DIFAT Sector Location", "Number of DIFAT Sectors"
    };

    private final String filepath;
    // LinkedHashMap keeps the fields in header order when printing.
    private final Map<String, Object> properties = new LinkedHashMap<>();

    public PubFileHandler(String filepath) {
        this.filepath = filepath;
        readProperties();
    }

    private void readProperties() {
        try (RandomAccessFile file = new RandomAccessFile(filepath, "r")) {
            byte[] header = new byte[512];
            file.readFully(header);
            ByteBuffer buffer = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
            properties.put("Header Signature", toHex(header, 0, 8));
            properties.put("Header CLSID", toHex(header, 8, 24));
            for (int i = 0; i < U16_FIELDS.length; i++) {
                properties.put(U16_FIELDS[i], buffer.getShort(24 + 2 * i) & 0xFFFF);
            }
            properties.put("Reserved", toHex(header, 34, 40));
            for (int i = 0; i < U32_FIELDS.length; i++) {
                // Widen to long so values such as 0xFFFFFFFE print as unsigned.
                properties.put(U32_FIELDS[i], buffer.getInt(40 + 4 * i) & 0xFFFFFFFFL);
            }
            properties.put("DIFAT", toHex(header, 76, 512));
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public void printProperties() {
        properties.forEach((key, value) -> System.out.println(key + ": " + value));
    }

    public void writeProperties(Map<String, Object> newProperties) {
        properties.putAll(newProperties);
        try (RandomAccessFile file = new RandomAccessFile(filepath, "rw")) {
            // Rebuild the header (simplified: no validation of the new values)
            ByteBuffer buffer = ByteBuffer.allocate(512).order(ByteOrder.LITTLE_ENDIAN);
            buffer.put(hexStringToByteArray((String) properties.get("Header Signature")));
            buffer.put(hexStringToByteArray((String) properties.get("Header CLSID")));
            for (int i = 0; i < U16_FIELDS.length; i++) {
                buffer.putShort(24 + 2 * i, ((Number) properties.get(U16_FIELDS[i])).shortValue());
            }
            buffer.position(34);
            buffer.put(hexStringToByteArray((String) properties.get("Reserved")));
            for (int i = 0; i < U32_FIELDS.length; i++) {
                buffer.putInt(40 + 4 * i, ((Number) properties.get(U32_FIELDS[i])).intValue());
            }
            buffer.position(76);
            buffer.put(hexStringToByteArray((String) properties.get("DIFAT")));
            file.seek(0);
            file.write(buffer.array());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    // Hex-encode header bytes in the range [from, to).
    private static String toHex(byte[] bytes, int from, int to) {
        StringBuilder sb = new StringBuilder();
        for (int i = from; i < to; i++) sb.append(String.format("%02X", bytes[i] & 0xFF));
        return sb.toString();
    }

    private static byte[] hexStringToByteArray(String s) {
        int len = s.length();
        byte[] data = new byte[len / 2];
        for (int i = 0; i + 1 < len; i += 2) {
            data[i / 2] = (byte) ((Character.digit(s.charAt(i), 16) << 4) + Character.digit(s.charAt(i + 1), 16));
        }
        return data;
    }

    // Example usage
    // public static void main(String[] args) {
    //     PubFileHandler handler = new PubFileHandler("example.pub");
    //     handler.printProperties();
    //     Map<String, Object> updates = new LinkedHashMap<>();
    //     updates.put("Minor Version", 0x003F);
    //     handler.writeProperties(updates);
    // }
}
- JavaScript class for the same (node.js, using fs for file I/O):
const fs = require('fs');

class PubFileHandler {
  constructor(filepath) {
    this.filepath = filepath;
    this.properties = {};
    this.readProperties();
  }

  readProperties() {
    // readFileSync returns a Buffer, which has its own little-endian accessors.
    const data = fs.readFileSync(this.filepath);
    if (data.length < 512) {
      throw new Error('File too small for .PUB header');
    }
    const hex = (start, end) => data.subarray(start, end).toString('hex');
    this.properties = {
      'Header Signature': hex(0, 8),
      'Header CLSID': hex(8, 24),
      'Minor Version': data.readUInt16LE(24),
      'Major Version': data.readUInt16LE(26),
      'Byte Order': data.readUInt16LE(28),
      'Sector Shift': data.readUInt16LE(30),
      'Mini Sector Shift': data.readUInt16LE(32),
      'Reserved': hex(34, 40),
      'Number of Directory Sectors': data.readUInt32LE(40),
      'Number of FAT Sectors': data.readUInt32LE(44),
      'First Directory Sector Location': data.readUInt32LE(48),
      'Transaction Signature Number': data.readUInt32LE(52),
      'Mini Stream Cutoff Size': data.readUInt32LE(56),
      'First Mini FAT Sector Location': data.readUInt32LE(60),
      'Number of Mini FAT Sectors': data.readUInt32LE(64),
      'First DIFAT Sector Location': data.readUInt32LE(68),
      'Number of DIFAT Sectors': data.readUInt32LE(72),
      'DIFAT': hex(76, 512),
    };
  }

  printProperties() {
    for (const [key, value] of Object.entries(this.properties)) {
      console.log(`${key}: ${value}`);
    }
  }

  writeProperties(newProperties) {
    Object.assign(this.properties, newProperties);
    const p = this.properties;
    // Rebuild the header (simplified: no validation of the new values)
    const header = Buffer.alloc(512);
    Buffer.from(p['Header Signature'], 'hex').copy(header, 0);
    Buffer.from(p['Header CLSID'], 'hex').copy(header, 8);
    header.writeUInt16LE(p['Minor Version'], 24);
    header.writeUInt16LE(p['Major Version'], 26);
    header.writeUInt16LE(p['Byte Order'], 28);
    header.writeUInt16LE(p['Sector Shift'], 30);
    header.writeUInt16LE(p['Mini Sector Shift'], 32);
    Buffer.from(p['Reserved'], 'hex').copy(header, 34);
    header.writeUInt32LE(p['Number of Directory Sectors'], 40);
    header.writeUInt32LE(p['Number of FAT Sectors'], 44);
    header.writeUInt32LE(p['First Directory Sector Location'], 48);
    header.writeUInt32LE(p['Transaction Signature Number'], 52);
    header.writeUInt32LE(p['Mini Stream Cutoff Size'], 56);
    header.writeUInt32LE(p['First Mini FAT Sector Location'], 60);
    header.writeUInt32LE(p['Number of Mini FAT Sectors'], 64);
    header.writeUInt32LE(p['First DIFAT Sector Location'], 68);
    header.writeUInt32LE(p['Number of DIFAT Sectors'], 72);
    Buffer.from(p['DIFAT'], 'hex').copy(header, 76);
    // Write to file (overwrite header only)
    const fd = fs.openSync(this.filepath, 'r+');
    fs.writeSync(fd, header, 0, 512, 0);
    fs.closeSync(fd);
  }
}

// Example usage
// const handler = new PubFileHandler('example.pub');
// handler.printProperties();
// handler.writeProperties({ 'Minor Version': 0x003F });
- C++ class for the same (using std::fstream for I/O):
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

class PubFileHandler {
private:
    struct IntField { const char* name; std::size_t offset; std::size_t size; };

    std::string filepath;
    // Byte ranges are stored as hex strings, integers as decimal strings.
    std::map<std::string, std::string> properties;

    static const std::vector<IntField>& intFields() {
        static const std::vector<IntField> fields = {
            {"Minor Version", 24, 2}, {"Major Version", 26, 2}, {"Byte Order", 28, 2},
            {"Sector Shift", 30, 2}, {"Mini Sector Shift", 32, 2},
            {"Number of Directory Sectors", 40, 4}, {"Number of FAT Sectors", 44, 4},
            {"First Directory Sector Location", 48, 4}, {"Transaction Signature Number", 52, 4},
            {"Mini Stream Cutoff Size", 56, 4}, {"First Mini FAT Sector Location", 60, 4},
            {"Number of Mini FAT Sectors", 64, 4}, {"First DIFAT Sector Location", 68, 4},
            {"Number of DIFAT Sectors", 72, 4}};
        return fields;
    }

    // Hex-encode header bytes in [begin, end). The cast to int makes the stream print a number, not a character.
    static std::string toHex(const std::vector<unsigned char>& h, std::size_t begin, std::size_t end) {
        std::ostringstream oss;
        oss << std::hex << std::setfill('0');
        for (std::size_t i = begin; i < end; ++i) oss << std::setw(2) << static_cast<int>(h[i]);
        return oss.str();
    }

    // Decode a little-endian unsigned integer byte by byte, independent of host byte order.
    static std::uint32_t readLE(const std::vector<unsigned char>& h, std::size_t offset, std::size_t size) {
        std::uint32_t value = 0;
        for (std::size_t i = 0; i < size; ++i) value |= static_cast<std::uint32_t>(h[offset + i]) << (8 * i);
        return value;
    }

    static void writeLE(std::vector<unsigned char>& h, std::size_t offset, std::size_t size, unsigned long value) {
        for (std::size_t i = 0; i < size; ++i) h[offset + i] = static_cast<unsigned char>((value >> (8 * i)) & 0xFF);
    }

    static void writeHex(std::vector<unsigned char>& h, std::size_t offset, const std::string& hex) {
        for (std::size_t i = 0; i + 1 < hex.length() && offset + i / 2 < h.size(); i += 2) {
            h[offset + i / 2] = static_cast<unsigned char>(std::stoi(hex.substr(i, 2), nullptr, 16));
        }
    }

public:
    PubFileHandler(const std::string& fp) : filepath(fp) {
        readProperties();
    }

    void readProperties() {
        std::ifstream file(filepath, std::ios::binary);
        std::vector<unsigned char> header(512);
        if (!file || !file.read(reinterpret_cast<char*>(header.data()), 512)) {
            std::cerr << "Error reading 512-byte header" << std::endl;
            return;
        }
        properties["Header Signature"] = toHex(header, 0, 8);
        properties["Header CLSID"] = toHex(header, 8, 24);
        properties["Reserved"] = toHex(header, 34, 40);
        properties["DIFAT"] = toHex(header, 76, 512);
        for (const auto& f : intFields()) {
            properties[f.name] = std::to_string(readLE(header, f.offset, f.size));
        }
    }

    void printProperties() {
        for (const auto& pair : properties) {
            std::cout << pair.first << ": " << pair.second << std::endl;
        }
    }

    void writeProperties(const std::map<std::string, std::string>& newProperties) {
        if (properties.empty()) {
            std::cerr << "No header loaded; nothing written" << std::endl;
            return;
        }
        for (const auto& pair : newProperties) {
            if (properties.count(pair.first)) {
                properties[pair.first] = pair.second;
            }
        }
        std::fstream file(filepath, std::ios::binary | std::ios::in | std::ios::out);
        if (!file) {
            std::cerr << "Error opening file for write" << std::endl;
            return;
        }
        // Rebuild the header (simplified: no validation of the new values)
        std::vector<unsigned char> header(512, 0);
        writeHex(header, 0, properties["Header Signature"]);
        writeHex(header, 8, properties["Header CLSID"]);
        writeHex(header, 34, properties["Reserved"]);
        writeHex(header, 76, properties["DIFAT"]);
        for (const auto& f : intFields()) {
            writeLE(header, f.offset, f.size, std::stoul(properties[f.name]));
        }
        file.seekp(0);
        file.write(reinterpret_cast<const char*>(header.data()), 512);
    }
};

// Example usage
// int main() {
//     PubFileHandler handler("example.pub");
//     handler.printProperties();
//     std::map<std::string, std::string> updates;
//     updates["Minor Version"] = "63"; // 0x003F
//     handler.writeProperties(updates);
//     return 0;
// }
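All of the handlers above keep the DIFAT field as one opaque hex string. As a rough illustration of what that string contains, the Python sketch below decodes it into the list of FAT sector IDs it names. The function name `decode_header_difat` is hypothetical; the sketch assumes the hex string comes from the DIFAT property (436 bytes, i.e. 109 little-endian 32-bit entries) and that unused slots hold 0xFFFFFFFF (FREESECT), as [MS-CFB] describes. It covers only the header portion; when Number of DIFAT Sectors is non-zero, further entries live in separate DIFAT sectors that this sketch does not read.

```python
import struct

FREESECT = 0xFFFFFFFF  # marks an unused DIFAT slot


def decode_header_difat(difat_hex):
    """Turn the 436-byte header DIFAT (as a hex string) into a list of FAT sector IDs."""
    raw = bytes.fromhex(difat_hex)
    if len(raw) != 436:
        raise ValueError("header DIFAT must be exactly 436 bytes")
    entries = struct.unpack("<109I", raw)  # 109 little-endian 32-bit sector IDs
    return [sector_id for sector_id in entries if sector_id != FREESECT]
```

With the Python class from earlier, a call might look like `decode_header_difat(handler.properties['DIFAT'])`; the length of the result would be expected to match Number of FAT Sectors for files small enough to need no extra DIFAT sectors.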