Task 507: .PAGES File Format
.PAGES File Format Specifications
The .PAGES file format is a proprietary format used by Apple's Pages application, part of the iWork suite. Pages was introduced in 2005, and the format changed significantly in 2013, when the earlier XML-based structure was replaced by the more compact IWA (iWork Archive) binary format. A document is either a single ZIP container (for single-file distribution) or a bundle directory; in both cases the content is stored as Protocol Buffers (Protobuf) messages compressed with Snappy. Apple publishes no official specification, but reverse-engineering projects on GitHub (e.g., obriensp/iWorkFileFormat and matchaxnb/pyiwa) have documented the structure.
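As a minimal sketch of the two container forms described above, the following Python function classifies a path as a single-file ZIP document or a bundle directory. The function name `detect_pages_container` is hypothetical, and the check relies only on the standard library; it does not verify that the contents are a valid Pages document.

```python
import os
import zipfile


def detect_pages_container(path):
    """Classify a .pages path as 'zip', 'bundle', or 'unknown'."""
    if os.path.isdir(path):
        # Bundle form: a directory that macOS presents as a single package.
        return 'bundle'
    if os.path.isfile(path) and zipfile.is_zipfile(path):
        # Single-file form: an ordinary ZIP archive.
        return 'zip'
    return 'unknown'
```

A caller could use the result to decide whether to open the document with `zipfile.ZipFile` or to walk the directory tree directly.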
1. List of Properties Intrinsic to the .PAGES File Format
Based on reverse-engineered details, the following properties are intrinsic to the format's structure and its file system integration: how a document is organized, how it is compressed, and where its metadata lives. They are core attributes of the format itself, not user-editable document properties:
- File Extension: .pages
- MIME Type: application/x-iwork-pages-sffpages
- Container Type: ZIP archive (PK\003\004 magic bytes) for single-file .pages documents; alternatively, a macOS bundle directory for unpacked versions.
- Internal Structure: Bundle-like hierarchy with directories such as Index/, Metadata/, Data/, and QuickLook/. The core content is a set of .iwa (iWork Archive) files, stored either in Index.zip (a ZIP file inside the package) or directly under Index/, depending on the Pages version that wrote the document.
- .iwa Files: Binary files holding Snappy-compressed Protobuf streams. Each .iwa is a sequence of chunks, and each chunk starts with a 4-byte header: a 0x00 byte followed by a 3-byte little-endian length of the compressed chunk. The decompressed content includes TSP.ArchiveInfo messages with identifiers, message infos (type and length), and a should_merge flag for incremental updates.
- Protobuf Usage: Serialized messages with type IDs mapped via TSPRegistry (e.g., 1 for TSP.ArchiveInfo, 10000 for TSDDrawableArchive). Supports inheritance via a "super" reference field and extensions for dynamic fields.
- Compression Method: Snappy for .iwa chunks (custom header: 1-byte tag 0 + 3-byte LE length per chunk; no CRC32 checksum). ZIP compression for the overall package and Index.zip.
- Metadata Directory: Contains Properties.plist (document metadata like author, title, custom properties), DocumentIdentifier (UUID string), and BuildVersionHistory.plist (array of version strings for compatibility and history tracking).
- Data Directory: Stores referenced media files (e.g., images as .jpeg, videos as .mp4), with paths resolved via TSP.PackageMetadata (Protobuf ID 2).
- Preview and Thumbnail Files: preview.jpg (main thumbnail), preview-micro.jpg, preview-web.jpg for different resolutions; QuickLook/Thumbnail.jpg or .png for macOS Quick Look previews.
- Document Type Differentiation: Identified via Document record (Protobuf ID 1); supports word-processing (body storage with flowing text) or page layout modes.
- Spatial and Layout Properties: Infinite canvas model with frame (position/size), geometry (offsets/scales/rotations), z-order for drawables, and transforms (e.g., for masks or 3D objects).
- Incremental Merging: should_merge flag in ArchiveInfo for delta updates in collaborative editing.
- Application-Specific Elements: For Pages, includes TSWP.StorageArchive for text, TSD.ImageArchive for images, TST.TableModelArchive for tables (tiled 256x256 cells with 12-byte cell headers), and more.
- Versioning and Compatibility: Tied to iWork versions (e.g., post-2013 uses IWA); backward compatibility via plist history.
- File System Integration: On macOS, treated as a package (opaque file in Finder); supports resource forks but rarely used. No encryption by default.
These properties ensure efficient storage, partial loading, and cross-app compatibility within iWork.
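The chunk framing listed above (a 0x00 tag byte plus a 3-byte little-endian length per chunk) can be sketched in Python as follows. The names `iter_iwa_chunks` and `decode_iwa_bytes` are hypothetical, and the decompressor is passed in by the caller so that the sketch does not depend on a particular Snappy binding; it assumes every chunk uses the 0x00 tag.

```python
def iter_iwa_chunks(raw):
    """Yield the compressed payload of each chunk in an .iwa byte string."""
    pos = 0
    while pos < len(raw):
        header = raw[pos:pos + 4]
        # Assumed header layout: 0x00 tag byte, then 3-byte little-endian length.
        if len(header) < 4 or header[0] != 0:
            raise ValueError(f'bad chunk header at offset {pos}')
        length = int.from_bytes(header[1:4], 'little')
        payload = raw[pos + 4:pos + 4 + length]
        if len(payload) != length:
            raise ValueError(f'truncated chunk at offset {pos}')
        yield payload
        pos += 4 + length


def decode_iwa_bytes(raw, decompress):
    """Decompress every chunk with the caller's function and join the results."""
    return b''.join(decompress(chunk) for chunk in iter_iwa_chunks(raw))
```

With the python-snappy package installed, a call such as `decode_iwa_bytes(raw, snappy.uncompress)` would be one way to obtain the Protobuf stream. Passing the whole file to a Snappy decompressor in one call would not work, because the chunk headers are not part of the compressed data.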
2. Two Direct Download Links for .PAGES Files
- https://soundtrackstv.com/wp-content/uploads/2023/02/sample.pages
- https://soundtrackstv.com/wp-content/uploads/2023/02/sample2.pages
These are sample .pages documents available for direct download.
3. Ghost Blog Embedded HTML JavaScript for Drag-and-Drop .PAGES File Dump
A self-contained HTML snippet with embedded JavaScript can be placed in a Ghost blog post (or any HTML page) to inspect a .PAGES file by drag-and-drop. It treats the dropped file as a ZIP, extracts the entries with JSZip (loaded from a CDN for simplicity), and prints the properties listed above to the screen. Note: the browser must support the File API, and the snippet reports basic structural properties only; it does not parse the Protobuf payloads, which would require the message schemas and a Protobuf library.
4. Python Class for .PAGES Handling
This Python class uses zipfile, plistlib, and the python-snappy package (imported as snappy) to open the package, decompress .iwa files, read plist metadata, write a copy (optionally with a modified plist as an example), and print the properties.
import plistlib
import zipfile

import snappy  # python-snappy package


class PagesFile:
    """Structural view of a single-file (ZIP) .pages document."""

    def __init__(self, filepath):
        self.filepath = filepath
        self.zip = zipfile.ZipFile(filepath, 'r')
        self.properties = self._read_properties()

    def close(self):
        """Release the underlying ZIP handle."""
        self.zip.close()

    def _read_properties(self):
        properties = {
            'File Extension': '.pages',
            'MIME Type': 'application/x-iwork-pages-sffpages',
            'Container Type': 'ZIP archive',
            'Internal Directories': set(),
            '.iwa Files': [],
            'Metadata Files': [],
            'Preview Files': [],
            'Data Files': [],
            'Compression Method': 'Snappy for .iwa, ZIP for package',
            # Add more as needed
        }
        for name in self.zip.namelist():
            # Only entries with a '/' live inside a directory.
            if '/' in name:
                properties['Internal Directories'].add(name.split('/')[0])
            if name.endswith('.iwa'):
                properties['.iwa Files'].append(name)
            elif name.startswith('Metadata/'):
                properties['Metadata Files'].append(name)
            elif 'preview' in name or 'Thumbnail' in name:
                properties['Preview Files'].append(name)
            elif name.startswith('Data/'):
                properties['Data Files'].append(name)
        properties['Internal Directories'] = sorted(properties['Internal Directories'])
        return properties

    def decode_iwa(self, iwa_path):
        """Decompress a single .iwa file, chunk by chunk."""
        raw = self.zip.read(iwa_path)
        out = bytearray()
        pos = 0
        while pos + 4 <= len(raw):
            # Chunk header: 0x00 tag byte + 3-byte little-endian length.
            if raw[pos] != 0:
                raise ValueError(f'Unexpected chunk tag in {iwa_path} at offset {pos}')
            length = int.from_bytes(raw[pos + 1:pos + 4], 'little')
            out += snappy.uncompress(raw[pos + 4:pos + 4 + length])
            pos += 4 + length
        return bytes(out)

    def read_metadata(self, plist_path):
        """Read a plist metadata file (XML or binary)."""
        return plistlib.loads(self.zip.read(plist_path))

    def print_properties(self):
        for key, value in self.properties.items():
            if isinstance(value, list):
                print(f"{key}: {', '.join(value)}")
            else:
                print(f"{key}: {value}")
        # Example: print the decompressed size of the first .iwa
        if self.properties['.iwa Files']:
            first_iwa = self.properties['.iwa Files'][0]
            decompressed = self.decode_iwa(first_iwa)
            print(f"Decompressed size of {first_iwa}: {len(decompressed)} bytes")

    def write(self, output_path, modify_example=False):
        """Write a copy of the package, optionally modifying a plist."""
        with zipfile.ZipFile(output_path, 'w') as new_zip:
            for info in self.zip.infolist():
                data = self.zip.read(info.filename)
                if modify_example and info.filename == 'Metadata/Properties.plist':
                    # Keep the original plist flavour (binary or XML).
                    fmt = (plistlib.FMT_BINARY if data.startswith(b'bplist')
                           else plistlib.FMT_XML)
                    plist = plistlib.loads(data)
                    plist['Modified'] = True  # Example modification
                    data = plistlib.dumps(plist, fmt=fmt)
                # Passing the ZipInfo preserves each entry's compression type.
                new_zip.writestr(info, data)


# Usage example:
# pages = PagesFile('sample.pages')
# pages.print_properties()
# pages.write('modified.pages', modify_example=True)
# pages.close()
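The class above stops at the size of the decompressed stream. As a hedged sketch of the next step, the following Python code walks the Protobuf wire format without any schema. The varint and wire-type handling follow the standard Protobuf encoding; the assumption that the decompressed stream begins with a varint length followed by a TSP.ArchiveInfo message comes from the reverse-engineering projects and is not confirmed by Apple. The function names are hypothetical.

```python
def read_varint(buf, pos):
    """Decode a Protobuf base-128 varint; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def iter_fields(msg):
    """Yield (field_number, wire_type, value) for one serialized message."""
    pos = 0
    while pos < len(msg):
        key, pos = read_varint(msg, pos)
        field, wire = key >> 3, key & 7
        if wire == 0:      # varint
            value, pos = read_varint(msg, pos)
        elif wire == 1:    # fixed 64-bit, returned as raw bytes
            value, pos = msg[pos:pos + 8], pos + 8
        elif wire == 2:    # length-delimited (bytes, string, nested message)
            size, pos = read_varint(msg, pos)
            value, pos = msg[pos:pos + size], pos + size
        elif wire == 5:    # fixed 32-bit, returned as raw bytes
            value, pos = msg[pos:pos + 4], pos + 4
        else:
            raise ValueError(f'unsupported wire type {wire}')
        yield field, wire, value


def first_length_prefixed_message(decompressed):
    """Return the bytes of the first varint-length-prefixed message.

    Assumption: in an .iwa stream this is the first TSP.ArchiveInfo.
    """
    size, pos = read_varint(decompressed, 0)
    return decompressed[pos:pos + size]
```

For example, `list(iter_fields(first_length_prefixed_message(pages.decode_iwa(name))))` would list the raw top-level fields of that first message. Mapping field numbers to names still requires the reverse-engineered .proto definitions.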
5. Java Class for .PAGES Handling
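This Java class uses java.util.zip for the ZIP container and a Snappy library for decompression (org.iq80.snappy.Snappy is assumed). It does not use a plist parser: the example modification edits the plist with a simple string replacement, which works for XML plists only.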
import java.io.ByteArrayOutputStream;
import java.io.Closeable;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import java.util.zip.ZipOutputStream;
import org.iq80.snappy.Snappy; // External dependency (iq80 snappy), assumed on the classpath

public class PagesFile implements Closeable {
    private final String filepath;
    private final ZipFile zip;
    private final List<String> iwaFiles = new ArrayList<>();
    private final Map<String, Object> properties;

    public PagesFile(String filepath) throws IOException {
        this.filepath = filepath;
        this.zip = new ZipFile(filepath);
        this.properties = readProperties();
    }

    @Override
    public void close() throws IOException {
        zip.close();
    }

    private Map<String, Object> readProperties() {
        Map<String, Object> props = new LinkedHashMap<>();
        props.put("File Extension", ".pages");
        props.put("MIME Type", "application/x-iwork-pages-sffpages");
        props.put("Container Type", "ZIP archive");
        props.put("Compression Method", "Snappy for .iwa, ZIP for package");
        Set<String> dirs = new TreeSet<>();
        List<String> metadataFiles = new ArrayList<>();
        List<String> previewFiles = new ArrayList<>();
        List<String> dataFiles = new ArrayList<>();
        Enumeration<? extends ZipEntry> entries = zip.entries();
        while (entries.hasMoreElements()) {
            String name = entries.nextElement().getName();
            // Only entries with a '/' live inside a directory.
            int slash = name.indexOf('/');
            if (slash > 0) {
                dirs.add(name.substring(0, slash));
            }
            if (name.endsWith(".iwa")) {
                iwaFiles.add(name);
            } else if (name.startsWith("Metadata/")) {
                metadataFiles.add(name);
            } else if (name.contains("preview") || name.contains("Thumbnail")) {
                previewFiles.add(name);
            } else if (name.startsWith("Data/")) {
                dataFiles.add(name);
            }
        }
        props.put("Internal Directories", new ArrayList<>(dirs));
        props.put(".iwa Files", iwaFiles);
        props.put("Metadata Files", metadataFiles);
        props.put("Preview Files", previewFiles);
        props.put("Data Files", dataFiles);
        return props;
    }

    // Read one ZIP entry fully into memory (readAllBytes needs Java 9+).
    private byte[] readAll(ZipEntry entry) throws IOException {
        try (InputStream is = zip.getInputStream(entry)) {
            return is.readAllBytes();
        }
    }

    public byte[] decodeIwa(String iwaPath) throws IOException {
        ZipEntry entry = zip.getEntry(iwaPath);
        if (entry == null) {
            throw new FileNotFoundException(iwaPath);
        }
        byte[] raw = readAll(entry);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int pos = 0;
        while (pos + 4 <= raw.length) {
            // Chunk header: 0x00 tag byte + 3-byte little-endian length.
            if (raw[pos] != 0) {
                throw new IOException("Unexpected chunk tag in " + iwaPath + " at offset " + pos);
            }
            int length = (raw[pos + 1] & 0xFF)
                    | ((raw[pos + 2] & 0xFF) << 8)
                    | ((raw[pos + 3] & 0xFF) << 16);
            out.write(Snappy.uncompress(raw, pos + 4, length));
            pos += 4 + length;
        }
        return out.toByteArray();
    }

    public void printProperties() {
        for (Map.Entry<String, Object> entry : properties.entrySet()) {
            Object value = entry.getValue();
            if (value instanceof List<?>) {
                StringJoiner joined = new StringJoiner(", ");
                for (Object item : (List<?>) value) {
                    joined.add(String.valueOf(item));
                }
                System.out.println(entry.getKey() + ": " + joined);
            } else {
                System.out.println(entry.getKey() + ": " + value);
            }
        }
        // Example: print the decompressed size of the first .iwa
        if (!iwaFiles.isEmpty()) {
            String firstIwa = iwaFiles.get(0);
            try {
                byte[] decompressed = decodeIwa(firstIwa);
                System.out.println("Decompressed size of " + firstIwa + ": " + decompressed.length + " bytes");
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

    public void write(String outputPath, boolean modifyExample) throws IOException {
        try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(outputPath))) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                byte[] data = readAll(entry);
                if (modifyExample && entry.getName().equals("Metadata/Properties.plist")) {
                    // Simple plist edit (XML plists only): insert a key before the
                    // last </dict>, which closes the root dictionary.
                    String xml = new String(data, StandardCharsets.UTF_8);
                    int idx = xml.lastIndexOf("</dict>");
                    if (idx >= 0) {
                        xml = xml.substring(0, idx) + "<key>Modified</key><true/>" + xml.substring(idx);
                        data = xml.getBytes(StandardCharsets.UTF_8);
                    }
                }
                zos.putNextEntry(new ZipEntry(entry.getName()));
                zos.write(data, 0, data.length);
                zos.closeEntry();
            }
        }
    }

    // Usage example:
    // public static void main(String[] args) throws IOException {
    //     try (PagesFile pages = new PagesFile("sample.pages")) {
    //         pages.printProperties();
    //         pages.write("modified.pages", true);
    //     }
    // }
}
6. JavaScript Class for .PAGES Handling
This Node.js class uses adm-zip for ZIP (assume installed), snappyjs for decompression, and plist for plists. Run with node script.js.
const AdmZip = require('adm-zip'); // npm install adm-zip
const snappy = require('snappyjs'); // npm install snappyjs
const plist = require('plist'); // npm install plist

class PagesFile {
    constructor(filepath) {
        this.filepath = filepath;
        this.zip = new AdmZip(filepath);
        this.properties = this._readProperties();
    }

    _readProperties() {
        const properties = {
            'File Extension': '.pages',
            'MIME Type': 'application/x-iwork-pages-sffpages',
            'Container Type': 'ZIP archive',
            'Internal Directories': new Set(),
            '.iwa Files': [],
            'Metadata Files': [],
            'Preview Files': [],
            'Data Files': [],
            'Compression Method': 'Snappy for .iwa, ZIP for package',
        };
        this.zip.getEntries().forEach(entry => {
            const name = entry.entryName;
            // Only entries with a '/' live inside a directory.
            if (name.includes('/')) {
                properties['Internal Directories'].add(name.split('/')[0]);
            }
            if (name.endsWith('.iwa')) {
                properties['.iwa Files'].push(name);
            } else if (name.startsWith('Metadata/')) {
                properties['Metadata Files'].push(name);
            } else if (name.includes('preview') || name.includes('Thumbnail')) {
                properties['Preview Files'].push(name);
            } else if (name.startsWith('Data/')) {
                properties['Data Files'].push(name);
            }
        });
        properties['Internal Directories'] = Array.from(properties['Internal Directories']).sort();
        return properties;
    }

    decodeIwa(iwaPath) {
        const raw = this.zip.readFile(iwaPath);
        if (!raw) {
            throw new Error(`Entry not found: ${iwaPath}`);
        }
        const parts = [];
        let pos = 0;
        while (pos + 4 <= raw.length) {
            // Chunk header: 0x00 tag byte + 3-byte little-endian length.
            if (raw[pos] !== 0) {
                throw new Error(`Unexpected chunk tag in ${iwaPath} at offset ${pos}`);
            }
            const length = raw.readUIntLE(pos + 1, 3);
            const chunk = raw.subarray(pos + 4, pos + 4 + length);
            parts.push(Buffer.from(snappy.uncompress(chunk)));
            pos += 4 + length;
        }
        return Buffer.concat(parts);
    }

    printProperties() {
        for (const [key, value] of Object.entries(this.properties)) {
            if (Array.isArray(value)) {
                console.log(`${key}: ${value.join(', ')}`);
            } else {
                console.log(`${key}: ${value}`);
            }
        }
        // Example: print the decompressed size of the first .iwa
        if (this.properties['.iwa Files'].length > 0) {
            const firstIwa = this.properties['.iwa Files'][0];
            const decompressed = this.decodeIwa(firstIwa);
            console.log(`Decompressed size of ${firstIwa}: ${decompressed.length} bytes`);
        }
    }

    write(outputPath, modifyExample = false) {
        if (modifyExample) {
            // Example: modify Properties.plist (XML plists only; the plist
            // package does not read binary plists, so those are left as-is)
            const plistPath = 'Metadata/Properties.plist';
            const raw = this.zip.getEntry(plistPath) ? this.zip.readFile(plistPath) : null;
            if (raw && raw.subarray(0, 6).toString('latin1') !== 'bplist') {
                const plistData = plist.parse(raw.toString('utf8'));
                plistData.Modified = true;
                this.zip.updateFile(plistPath, Buffer.from(plist.build(plistData)));
            }
        }
        this.zip.writeZip(outputPath);
    }
}

// Usage example:
// const pages = new PagesFile('sample.pages');
// pages.printProperties();
// pages.write('modified.pages', true);
7. C++ Class for .PAGES Handling
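This C++ class uses libzip for the ZIP container, libplist for plists, and the snappy library for decompression (all assumed to be installed and linked). It relies on std::any and std::string::ends_with, so compile it as C++20 (e.g., with -std=c++20) and link against the three libraries.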
#include <any>
#include <cstdint>
#include <cstdlib>
#include <deque>
#include <iostream>
#include <map>
#include <set>
#include <stdexcept>
#include <string>
#include <vector>
#include <zip.h>
#include <snappy.h>      // Assume snappy lib
#include <plist/plist.h> // Assume libplist

class PagesFile {
private:
    std::string filepath;
    zip_t* zip;
    std::map<std::string, std::any> properties; // std::any holds the mixed value types

    // Read one entry fully into memory.
    std::vector<char> readEntry(const std::string& name) {
        zip_stat_t st;
        if (zip_stat(zip, name.c_str(), 0, &st) != 0) {
            throw std::runtime_error("Entry not found: " + name);
        }
        zip_file_t* file = zip_fopen(zip, name.c_str(), 0);
        if (!file) throw std::runtime_error("Failed to open entry: " + name);
        std::vector<char> data(st.size);
        zip_int64_t n = zip_fread(file, data.data(), data.size());
        zip_fclose(file);
        if (n < 0 || static_cast<zip_uint64_t>(n) != st.size) {
            throw std::runtime_error("Short read on entry: " + name);
        }
        return data;
    }

public:
    PagesFile(const std::string& filepath) : filepath(filepath) {
        int err = 0;
        zip = zip_open(filepath.c_str(), ZIP_RDONLY, &err);
        if (!zip) {
            throw std::runtime_error("Failed to open ZIP");
        }
        properties = readProperties();
    }

    // The class owns the zip handle, so copying is disabled.
    PagesFile(const PagesFile&) = delete;
    PagesFile& operator=(const PagesFile&) = delete;

    ~PagesFile() {
        zip_close(zip);
    }

    std::map<std::string, std::any> readProperties() {
        std::map<std::string, std::any> props;
        props["File Extension"] = std::string(".pages");
        props["MIME Type"] = std::string("application/x-iwork-pages-sffpages");
        props["Container Type"] = std::string("ZIP archive");
        props["Compression Method"] = std::string("Snappy for .iwa, ZIP for package");
        std::set<std::string> dirs;
        std::vector<std::string> iwaFiles, metadataFiles, previewFiles, dataFiles;
        zip_int64_t num_entries = zip_get_num_entries(zip, 0);
        for (zip_int64_t i = 0; i < num_entries; ++i) {
            const char* name = zip_get_name(zip, i, 0);
            if (!name) continue;
            std::string sname(name);
            // Only entries with a '/' live inside a directory.
            size_t pos = sname.find('/');
            if (pos != std::string::npos && pos > 0) dirs.insert(sname.substr(0, pos));
            if (sname.ends_with(".iwa")) {
                iwaFiles.push_back(sname);
            } else if (sname.find("Metadata/") == 0) {
                metadataFiles.push_back(sname);
            } else if (sname.find("preview") != std::string::npos || sname.find("Thumbnail") != std::string::npos) {
                previewFiles.push_back(sname);
            } else if (sname.find("Data/") == 0) {
                dataFiles.push_back(sname);
            }
        }
        props["Internal Directories"] = std::vector<std::string>(dirs.begin(), dirs.end());
        props[".iwa Files"] = iwaFiles;
        props["Metadata Files"] = metadataFiles;
        props["Preview Files"] = previewFiles;
        props["Data Files"] = dataFiles;
        return props;
    }

    std::vector<char> decodeIwa(const std::string& iwaPath) {
        std::vector<char> raw = readEntry(iwaPath);
        std::vector<char> out;
        size_t pos = 0;
        while (pos + 4 <= raw.size()) {
            // Chunk header: 0x00 tag byte + 3-byte little-endian length.
            const auto* h = reinterpret_cast<const unsigned char*>(raw.data() + pos);
            if (h[0] != 0) throw std::runtime_error("Unexpected chunk tag in " + iwaPath);
            size_t length = h[1] | (h[2] << 8) | (h[3] << 16);
            if (pos + 4 + length > raw.size()) throw std::runtime_error("Truncated chunk in " + iwaPath);
            std::string chunk;
            if (!snappy::Uncompress(raw.data() + pos + 4, length, &chunk)) {
                throw std::runtime_error("Snappy decompression failed for " + iwaPath);
            }
            out.insert(out.end(), chunk.begin(), chunk.end());
            pos += 4 + length;
        }
        return out;
    }

    void printProperties() {
        for (const auto& [key, value] : properties) {
            if (value.type() == typeid(std::string)) {
                std::cout << key << ": " << std::any_cast<std::string>(value) << std::endl;
            } else if (value.type() == typeid(std::vector<std::string>)) {
                const auto& vec = std::any_cast<const std::vector<std::string>&>(value);
                std::cout << key << ": ";
                for (size_t i = 0; i < vec.size(); ++i) {
                    std::cout << vec[i] << (i + 1 < vec.size() ? ", " : "");
                }
                std::cout << std::endl;
            }
        }
        // Example: print the decompressed size of the first .iwa
        const auto& iwaFiles = std::any_cast<const std::vector<std::string>&>(properties.at(".iwa Files"));
        if (!iwaFiles.empty()) {
            auto decompressed = decodeIwa(iwaFiles[0]);
            std::cout << "Decompressed size of " << iwaFiles[0] << ": " << decompressed.size() << " bytes" << std::endl;
        }
    }

    void write(const std::string& outputPath, bool modifyExample) {
        int err = 0;
        zip_t* newZip = zip_open(outputPath.c_str(), ZIP_CREATE | ZIP_TRUNCATE, &err);
        if (!newZip) throw std::runtime_error("Failed to create ZIP");
        // libzip reads source buffers during zip_close(), so every buffer
        // must stay alive (and in place) until then.
        std::deque<std::vector<char>> buffers;
        zip_int64_t num_entries = zip_get_num_entries(zip, 0);
        for (zip_int64_t i = 0; i < num_entries; ++i) {
            const char* name = zip_get_name(zip, i, 0);
            if (!name) continue;
            std::string sname(name);
            if (!sname.empty() && sname.back() == '/') {
                zip_dir_add(newZip, name, 0); // directory entry, no data
                continue;
            }
            buffers.push_back(readEntry(sname));
            std::vector<char>& data = buffers.back();
            if (modifyExample && sname == "Metadata/Properties.plist") {
                // Simple plist modify using libplist (XML plists only)
                plist_t root = nullptr;
                plist_from_xml(data.data(), static_cast<uint32_t>(data.size()), &root);
                if (root && plist_get_node_type(root) == PLIST_DICT) {
                    plist_dict_set_item(root, "Modified", plist_new_bool(1));
                    char* newPlist = nullptr;
                    uint32_t len = 0;
                    plist_to_xml(root, &newPlist, &len);
                    if (newPlist) {
                        data.assign(newPlist, newPlist + len);
                        free(newPlist);
                    }
                }
                if (root) plist_free(root);
            }
            zip_source_t* src = zip_source_buffer(newZip, data.data(), data.size(), 0);
            if (!src || zip_file_add(newZip, name, src, ZIP_FL_OVERWRITE) < 0) {
                if (src) zip_source_free(src);
                zip_discard(newZip);
                throw std::runtime_error("Failed to add entry: " + sname);
            }
        }
        if (zip_close(newZip) != 0) {
            zip_discard(newZip);
            throw std::runtime_error("Failed to write ZIP");
        }
    }
};

// Usage example:
// int main() {
//     try {
//         PagesFile pages("sample.pages");
//         pages.printProperties();
//         pages.write("modified.pages", true);
//     } catch (const std::exception& e) {
//         std::cerr << e.what() << std::endl;
//     }
//     return 0;
// }