Task 147: .DOCM File Format

File Format Specifications for the .DOCM File Format

The .DOCM file format is a macro-enabled document format used by Microsoft Word, based on the Office Open XML (OOXML) standard as defined in ISO/IEC 29500 and ECMA-376. It is essentially a ZIP archive containing XML files and binary components, conforming to the Open Packaging Conventions (OPC). The format supports structured document content, metadata, and embedded Visual Basic for Applications (VBA) macros, which distinguish it from the non-macro-enabled .DOCX format. Key structural elements include:

  • A ZIP container with a file signature of PK\003\004.
  • Core files such as [Content_Types].xml for MIME types, _rels/.rels for relationships, docProps/core.xml for core metadata, docProps/app.xml for extended properties, word/document.xml for main content, and word/vbaProject.bin for macros.
  • MIME type: application/vnd.ms-word.document.macroEnabled.12 (the type application/vnd.openxmlformats-officedocument.wordprocessingml.document belongs to the macro-free .DOCX format).
  • The format allows for extensibility, digital signatures, and encryption, with macros stored in binary form within the vbaProject.bin part.
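The structural points above can be checked with a few lines of Python. This is a minimal sketch, not a validator: the function name inspect_docm and the demo package are hypothetical, and it assumes the macro project sits at the conventional word/vbaProject.bin path listed above (a package could place it elsewhere through its relationships).

```python
import io
import zipfile

ZIP_LOCAL_HEADER = b"PK\x03\x04"  # the PK\003\004 signature, written in hex escapes

def inspect_docm(data: bytes) -> dict:
    """Return basic structural facts about a .docm package held in memory."""
    info = {"zip_signature": data[:4] == ZIP_LOCAL_HEADER, "parts": [], "has_macros": False}
    if not info["zip_signature"]:
        return info
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        info["parts"] = zf.namelist()
    # Assumption: macros live at the conventional part name.
    info["has_macros"] = "word/vbaProject.bin" in info["parts"]
    return info

def build_demo_package() -> bytes:
    # Hypothetical stand-in for a real file: real part names, placeholder bodies.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in ("[Content_Types].xml", "_rels/.rels",
                     "word/document.xml", "word/vbaProject.bin"):
            zf.writestr(name, b"placeholder")
    return buf.getvalue()

if __name__ == "__main__":
    print(inspect_docm(build_demo_package()))
```

For a file on disk, passing open(path, "rb").read() to inspect_docm gives the same report.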

  1. List of All Properties of This File Format Intrinsic to Its File System

The properties intrinsic to the .DOCM format are primarily the metadata stored within the ZIP archive's docProps/core.xml (core properties) and docProps/app.xml (extended properties). These are standardized in the OOXML specification and include:

Core Properties (from docProps/core.xml):

  • Title
  • Subject
  • Creator
  • Keywords
  • Description
  • LastModifiedBy
  • Revision
  • Created
  • Modified
  • Category
  • ContentStatus
  • ContentType
  • Identifier
  • Language
  • LastPrinted
  • Version

Extended Properties (from docProps/app.xml, specific to Word documents):

  • Template
  • TotalTime
  • Pages
  • Words
  • Characters
  • Application
  • DocSecurity
  • Lines
  • Paragraphs
  • ScaleCrop
  • Company
  • LinksUpToDate
  • CharactersWithSpaces
  • SharedDoc
  • HyperlinksChanged
  • AppVersion

These properties represent document metadata and are accessible without opening the file in Word, by parsing the XML files within the ZIP structure.
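The names listed above are display names; inside docProps/core.xml each one is a namespace-qualified element such as dc:title or dcterms:created. The sketch below shows one way to look a few of them up with ElementTree. The mapping covers only a subset, and the function name read_core and the SAMPLE document are illustrative assumptions rather than part of any specification.

```python
import xml.etree.ElementTree as ET

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}

# Display name -> qualified element name in docProps/core.xml (subset only).
CORE_ELEMENTS = {
    "Title": "dc:title",
    "Creator": "dc:creator",
    "LastModifiedBy": "cp:lastModifiedBy",
    "Created": "dcterms:created",
    "Modified": "dcterms:modified",
}

def read_core(xml_bytes: bytes) -> dict:
    """Map display names to element text; absent elements map to None."""
    root = ET.fromstring(xml_bytes)
    out = {}
    for label, qname in CORE_ELEMENTS.items():
        elem = root.find(qname, NS)  # direct child of cp:coreProperties
        out[label] = elem.text if elem is not None else None
    return out

# Hypothetical core.xml content used only for demonstration.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<cp:coreProperties
    xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <dc:title>Demo</dc:title>
  <dc:creator>Example Author</dc:creator>
  <dcterms:created xsi:type="dcterms:W3CDTF">2024-01-01T00:00:00Z</dcterms:created>
</cp:coreProperties>"""

if __name__ == "__main__":
    print(read_core(SAMPLE))
```

Extended properties in docProps/app.xml can be read the same way using the extended-properties namespace as the prefix mapping.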

  2. Two Direct Download Links for Files of Format .DOCM

  3. Ghost Blog Embedded HTML JavaScript for Drag-and-Drop .DOCM File Property Dump

The following is a self-contained HTML snippet with embedded JavaScript that can be inserted into a Ghost blog post. It creates a drag-and-drop area where a user can drop a .DOCM file. The script uses the browser's File API and a simple ZIP parser (without external libraries) to extract and display the properties from docProps/core.xml and docProps/app.xml. Note that full ZIP parsing in pure JavaScript is limited; this implementation assumes a standard structure and handles basic extraction.

Drag and drop a .DOCM file here

This code assumes uncompressed XML parts for simplicity; in practice, compression may require a full ZIP library like JSZip.
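Whether that assumption holds for a given file can be checked before parsing, since every ZIP entry records its compression method. The following Python sketch reports the method for a named part; the helper names and the demo package are hypothetical, and only the stored and deflated methods are given readable names.

```python
import io
import zipfile

METHOD_NAMES = {zipfile.ZIP_STORED: "stored", zipfile.ZIP_DEFLATED: "deflated"}

def compression_of(package: bytes, part: str) -> str:
    """Name the compression method recorded for one part of a ZIP package."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        method = zf.getinfo(part).compress_type
    return METHOD_NAMES.get(method, f"other ({method})")

def build_demo_package() -> bytes:
    # Hypothetical package: one deflated part and one stored part.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("docProps/core.xml", b"<a/>" * 50, compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("docProps/app.xml", b"<b/>", compress_type=zipfile.ZIP_STORED)
    return buf.getvalue()

if __name__ == "__main__":
    pkg = build_demo_package()
    for name in ("docProps/core.xml", "docProps/app.xml"):
        print(name, compression_of(pkg, name))
```

A part reported as deflated has to be inflated before its XML can be read, which is the case a hand-written parser that expects stored entries would miss.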

  4. Python Class for .DOCM File Handling

The following Python class uses the zipfile and xml.etree.ElementTree modules to open, read, modify, and print the properties. The write method allows updating properties and saving to a new file.

import zipfile
import xml.etree.ElementTree as ET
from io import BytesIO

class DocmHandler:
    def __init__(self, filepath):
        self.filepath = filepath
        self.core_props = {}
        self.ext_props = {}
        self.zip_file = None

    def open(self):
        self.zip_file = zipfile.ZipFile(self.filepath, 'r')

    def read_properties(self):
        if not self.zip_file:
            self.open()
        core_xml = self.zip_file.read('docProps/core.xml')
        ext_xml = self.zip_file.read('docProps/app.xml')
        core_tree = ET.parse(BytesIO(core_xml))
        ext_tree = ET.parse(BytesIO(ext_xml))
        # Collect only the root's direct children, keyed by local name
        # (the part of the tag after the '{namespace}' prefix).
        for elem in core_tree.getroot():
            self.core_props[elem.tag.split('}')[-1]] = elem.text
        for elem in ext_tree.getroot():
            self.ext_props[elem.tag.split('}')[-1]] = elem.text

    def print_properties(self):
        print("Core Properties:")
        for key, value in self.core_props.items():
            print(f"{key}: {value}")
        print("\nExtended Properties:")
        for key, value in self.ext_props.items():
            print(f"{key}: {value}")

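    def write_properties(self, new_core=None, new_ext=None, output_path=None):
        if not output_path:
            output_path = self.filepath.replace('.docm', '_modified.docm')
        # Keys are element local names (e.g. 'title', 'Pages'); None means no changes.
        updates = {'docProps/core.xml': new_core or {}, 'docProps/app.xml': new_ext or {}}
        # Register the usual prefixes so the output keeps cp:, dc:, ... instead of ns0:, ns1:.
        for prefix, uri in (
            ('cp', 'http://schemas.openxmlformats.org/package/2006/metadata/core-properties'),
            ('dc', 'http://purl.org/dc/elements/1.1/'),
            ('dcterms', 'http://purl.org/dc/terms/'),
            ('xsi', 'http://www.w3.org/2001/XMLSchema-instance'),
            ('vt', 'http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes'),
            ('', 'http://schemas.openxmlformats.org/officeDocument/2006/extended-properties'),
        ):
            ET.register_namespace(prefix, uri)
        with zipfile.ZipFile(self.filepath, 'r') as zin:
            with zipfile.ZipFile(output_path, 'w') as zout:
                for item in zin.infolist():
                    data = zin.read(item.filename)
                    if item.filename in updates:
                        root = ET.fromstring(data)
                        for key, value in updates[item.filename].items():
                            # ElementTree's XPath subset has no local-name();
                            # '{*}' (Python 3.8+) matches the tag in any namespace.
                            elem = root.find(f".//{{*}}{key}")
                            if elem is not None:
                                elem.text = value
                        data = ET.tostring(root, encoding='UTF-8', xml_declaration=True)
                    # Passing the ZipInfo keeps each part's original compression method.
                    zout.writestr(item, data)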

    def close(self):
        if self.zip_file:
            self.zip_file.close()

# Example usage:
# handler = DocmHandler('example.docm')
# handler.open()
# handler.read_properties()
# handler.print_properties()
# handler.write_properties(new_core={'title': 'New Title'}, output_path='modified.docm')
# handler.close()

  5. Java Class for .DOCM File Handling

The following Java class uses java.util.zip and javax.xml.parsers to handle the file. It supports reading, printing, and writing properties.

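import java.io.*;
import java.util.zip.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.*;
import org.xml.sax.InputSource;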

public class DocmHandler {
    private String filepath;
    private ZipFile zipFile;
    private Document coreDoc;
    private Document extDoc;

    public DocmHandler(String filepath) {
        this.filepath = filepath;
    }

    public void open() throws IOException {
        zipFile = new ZipFile(filepath);
    }

    public void readProperties() throws Exception {
        if (zipFile == null) open();
        InputStream coreIs = zipFile.getInputStream(zipFile.getEntry("docProps/core.xml"));
        InputStream extIs = zipFile.getInputStream(zipFile.getEntry("docProps/app.xml"));
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // without this, getLocalName() returns null
        DocumentBuilder db = dbf.newDocumentBuilder();
        coreDoc = db.parse(new InputSource(coreIs));
        extDoc = db.parse(new InputSource(extIs));
        coreIs.close();
        extIs.close();
    }

    public void printProperties() {
        System.out.println("Core Properties:");
        NodeList coreNodes = coreDoc.getElementsByTagName("*");
        for (int i = 0; i < coreNodes.getLength(); i++) {
            Node node = coreNodes.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE && node.getTextContent() != null) {
                System.out.println(node.getLocalName() + ": " + node.getTextContent());
            }
        }
        System.out.println("\nExtended Properties:");
        NodeList extNodes = extDoc.getElementsByTagName("*");
        for (int i = 0; i < extNodes.getLength(); i++) {
            Node node = extNodes.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE && node.getTextContent() != null) {
                System.out.println(node.getLocalName() + ": " + node.getTextContent());
            }
        }
    }

    public void writeProperties(String newCoreKey, String newCoreValue, String newExtKey, String newExtValue, String outputPath) throws Exception {
        if (outputPath == null) outputPath = filepath.replace(".docm", "_modified.docm");
        try (ZipFile zin = new ZipFile(filepath);
             ZipOutputStream zout = new ZipOutputStream(new FileOutputStream(outputPath))) {
            for (ZipEntry entry : java.util.Collections.list(zin.entries())) {
                // Use a fresh entry: reusing the source entry carries over its compressed
                // size, which can make putNextEntry fail once the data is recompressed.
                zout.putNextEntry(new ZipEntry(entry.getName()));
                try (InputStream is = zin.getInputStream(entry)) {
                    if (entry.getName().equals("docProps/core.xml")) {
                        writeUpdatedXml(is, newCoreKey, newCoreValue, zout);
                    } else if (entry.getName().equals("docProps/app.xml")) {
                        writeUpdatedXml(is, newExtKey, newExtValue, zout);
                    } else {
                        byte[] buf = new byte[8192];
                        int len;
                        while ((len = is.read(buf)) != -1) {
                            zout.write(buf, 0, len);
                        }
                    }
                }
                zout.closeEntry();
            }
        }
    }

    // Sets the first element whose local name is `key` (in any namespace), then serializes the document.
    private void writeUpdatedXml(InputStream is, String key, String value, OutputStream out) throws Exception {
        Document doc = parseXml(is);
        Node node = (key == null) ? null : doc.getElementsByTagNameNS("*", key).item(0);
        if (node != null) node.setTextContent(value);
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.transform(new DOMSource(doc), new StreamResult(out));
    }

    private Document parseXml(InputStream is) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true); // required for getLocalName() and getElementsByTagNameNS()
        DocumentBuilder db = dbf.newDocumentBuilder();
        return db.parse(new InputSource(is));
    }

    public void close() throws IOException {
        if (zipFile != null) zipFile.close();
    }

    // Example usage:
    // public static void main(String[] args) throws Exception {
    //     DocmHandler handler = new DocmHandler("example.docm");
    //     handler.open();
    //     handler.readProperties();
    //     handler.printProperties();
    //     handler.writeProperties("title", "New Title", "Pages", "10", "modified.docm");
    //     handler.close();
    // }
}

  6. JavaScript Class for .DOCM File Handling

The following JavaScript class is for Node.js (using adm-zip for ZIP handling and xml2js for XML parsing; assume these are installed via npm). It supports reading, printing, and writing properties.

const AdmZip = require('adm-zip');
const xml2js = require('xml2js');

class DocmHandler {
    constructor(filepath) {
        this.filepath = filepath;
        this.coreProps = {};
        this.extProps = {};
    }

    open() {
        this.zip = new AdmZip(this.filepath);
    }

    readProperties() {
        const coreXml = this.zip.readAsText('docProps/core.xml');
        const extXml = this.zip.readAsText('docProps/app.xml');
        const parser = new xml2js.Parser();
        parser.parseString(coreXml, (err, result) => {
            if (!err) this.coreProps = this.flattenProps(result);
        });
        parser.parseString(extXml, (err, result) => {
            if (!err) this.extProps = this.flattenProps(result);
        });
    }

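    flattenProps(xmlObj) {
        const props = {};
        for (const key in xmlObj) {
            const inner = xmlObj[key];
            for (const subKey in inner) {
                if (subKey === '$') continue; // '$' holds the root's attributes (xmlns declarations)
                const value = inner[subKey][0];
                // Elements with attributes (e.g. dcterms:created) parse to { _: text, $: attrs }.
                props[subKey] = (value !== null && typeof value === 'object' && '_' in value) ? value._ : value;
            }
        }
        return props;
    }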

    printProperties() {
        console.log('Core Properties:');
        console.log(this.coreProps);
        console.log('\nExtended Properties:');
        console.log(this.extProps);
    }

    writeProperties(newCore = {}, newExt = {}, outputPath = this.filepath.replace('.docm', '_modified.docm')) {
        const builder = new xml2js.Builder();
        // Overwrite existing child elements, matching keys by local name (e.g. 'title' -> 'dc:title').
        const update = (entryName, updates) => {
            new xml2js.Parser().parseString(this.zip.readAsText(entryName), (err, result) => {
                if (err) return;
                // The root key is the qualified root tag, e.g. 'cp:coreProperties' or 'Properties'.
                const root = result[Object.keys(result)[0]];
                for (const [key, value] of Object.entries(updates)) {
                    const tag = Object.keys(root).find(k => k !== '$' && k.split(':').pop() === key);
                    if (!tag) continue;
                    const node = root[tag][0];
                    // Elements with attributes are parsed as objects with the text under '_'.
                    if (node !== null && typeof node === 'object') node._ = value;
                    else root[tag][0] = value;
                }
                this.zip.updateFile(entryName, Buffer.from(builder.buildObject(result)));
            });
        };
        update('docProps/core.xml', newCore);
        update('docProps/app.xml', newExt);
        this.zip.writeZip(outputPath);
    }
}

// Example usage:
// const handler = new DocmHandler('example.docm');
// handler.open();
// handler.readProperties();
// handler.printProperties();
// handler.writeProperties({ title: 'New Title' }, { Pages: '10' });

  7. C Class for .DOCM File Handling

The following is a C++ class (as "C class" likely implies C++ for object-oriented features) using the miniz library for ZIP (assume included) and TinyXML2 for XML parsing. It supports reading, printing, and writing properties.

#include <cstdlib>   // free
#include <cstring>   // memset, strdup
#include <iostream>
#include <string>
#include <map>
#include "miniz.h"  // Assume miniz for ZIP
#include "tinyxml2.h"  // Assume tinyxml2 for XML

class DocmHandler {
private:
    std::string filepath;
    mz_zip_archive zip;
    std::map<std::string, std::string> coreProps;
    std::map<std::string, std::string> extProps;

public:
    DocmHandler(const std::string& fp) : filepath(fp) {
        memset(&zip, 0, sizeof(zip));
    }

    ~DocmHandler() {
        close();
    }

    bool open() {
        return mz_zip_reader_init_file(&zip, filepath.c_str(), 0);
    }

    void readProperties() {
        size_t coreSize = 0;
        void* coreData = mz_zip_reader_extract_file_to_heap(&zip, "docProps/core.xml", &coreSize, 0);
        size_t extSize = 0;
        void* extData = mz_zip_reader_extract_file_to_heap(&zip, "docProps/app.xml", &extSize, 0);

        // Either part may be missing, in which case extraction returns a null pointer.
        // Keys are qualified names (e.g. "dc:title") because tinyxml2 is not namespace-aware.
        tinyxml2::XMLDocument coreDoc;
        if (coreData && coreDoc.Parse(static_cast<char*>(coreData), coreSize) == tinyxml2::XML_SUCCESS && coreDoc.RootElement()) {
            for (tinyxml2::XMLElement* elem = coreDoc.RootElement()->FirstChildElement(); elem; elem = elem->NextSiblingElement()) {
                coreProps[elem->Name()] = elem->GetText() ? elem->GetText() : "";
            }
        }
        tinyxml2::XMLDocument extDoc;
        if (extData && extDoc.Parse(static_cast<char*>(extData), extSize) == tinyxml2::XML_SUCCESS && extDoc.RootElement()) {
            for (tinyxml2::XMLElement* elem = extDoc.RootElement()->FirstChildElement(); elem; elem = elem->NextSiblingElement()) {
                extProps[elem->Name()] = elem->GetText() ? elem->GetText() : "";
            }
        }

        mz_free(coreData);  // buffers come from miniz's allocator
        mz_free(extData);
    }

    void printProperties() {
        std::cout << "Core Properties:" << std::endl;
        for (const auto& prop : coreProps) {
            std::cout << prop.first << ": " << prop.second << std::endl;
        }
        std::cout << "\nExtended Properties:" << std::endl;
        for (const auto& prop : extProps) {
            std::cout << prop.first << ": " << prop.second << std::endl;
        }
    }

    void writeProperties(const std::map<std::string, std::string>& newCore, const std::map<std::string, std::string>& newExt, const std::string& outputPath) {
        mz_zip_archive outZip;
        memset(&outZip, 0, sizeof(outZip));
        if (!mz_zip_writer_init_file(&outZip, outputPath.c_str(), 0)) return;

        for (mz_uint i = 0; i < mz_zip_reader_get_num_files(&zip); ++i) {
            mz_zip_archive_file_stat stat;
            if (!mz_zip_reader_file_stat(&zip, i, &stat)) continue;
            size_t size = 0;
            void* data = mz_zip_reader_extract_to_heap(&zip, i, &size, 0);
            if (!data && stat.m_uncomp_size > 0) continue;  // extraction failed
            std::string bytes;
            if (data) {
                bytes.assign(static_cast<char*>(data), size);
                mz_free(data);  // buffer comes from miniz's allocator
            }

            const std::string name = stat.m_filename;
            const std::map<std::string, std::string>* updates =
                name == "docProps/core.xml" ? &newCore :
                name == "docProps/app.xml" ? &newExt : nullptr;
            if (updates) {
                tinyxml2::XMLDocument doc;
                if (doc.Parse(bytes.data(), bytes.size()) == tinyxml2::XML_SUCCESS && doc.RootElement()) {
                    for (const auto& kv : *updates) {
                        // tinyxml2 is not namespace-aware: keys must be qualified names such as "dc:title".
                        tinyxml2::XMLElement* elem = doc.RootElement()->FirstChildElement(kv.first.c_str());
                        if (elem) elem->SetText(kv.second.c_str());
                    }
                    tinyxml2::XMLPrinter printer;
                    doc.Print(&printer);
                    bytes.assign(printer.CStr(), printer.CStrSize() - 1);  // CStrSize() counts the terminator
                }
            }

            mz_zip_writer_add_mem(&outZip, stat.m_filename, bytes.data(), bytes.size(), MZ_DEFAULT_COMPRESSION);
        }

        mz_zip_writer_finalize_archive(&outZip);
        mz_zip_writer_end(&outZip);
    }

    void close() {
        mz_zip_reader_end(&zip);
    }
};

// Example usage:
// int main() {
//     DocmHandler handler("example.docm");
//     if (handler.open()) {
//         handler.readProperties();
//         handler.printProperties();
//         // Core keys carry their prefix because tinyxml2 compares qualified element names.
//         std::map<std::string, std::string> newCore = {{"dc:title", "New Title"}};
//         std::map<std::string, std::string> newExt = {{"Pages", "10"}};
//         handler.writeProperties(newCore, newExt, "modified.docm");
//     }
//     return 0;
// }