Task 147: .DOCM File Format
File Format Specifications for the .DOCM File Format
The .DOCM file format is a macro-enabled document format used by Microsoft Word, based on the Office Open XML (OOXML) standard as defined in ISO/IEC 29500 and ECMA-376. It is essentially a ZIP archive containing XML files and binary components, conforming to the Open Packaging Conventions (OPC). The format supports structured document content, metadata, and embedded Visual Basic for Applications (VBA) macros, which distinguish it from the non-macro-enabled .DOCX format. Key structural elements include:
- A ZIP container with a file signature of PK\003\004 (bytes 50 4B 03 04).
- Core files such as [Content_Types].xml for the content types of the parts, _rels/.rels for relationships, docProps/core.xml for core metadata, docProps/app.xml for extended properties, word/document.xml for main content, and word/vbaProject.bin for macros.
- MIME type: application/vnd.ms-word.document.macroEnabled.12 (the .DOCX format uses application/vnd.openxmlformats-officedocument.wordprocessingml.document).
- The format allows for extensibility, digital signatures, and encryption, with macros stored in binary form within the vbaProject.bin part.
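The structural elements above can be checked with a few lines of Python. This is a minimal sketch, not a validator: the function name inspect_package and the keys of the returned dictionary are made up for illustration, and the check for the macro-enabled main content type is a plain substring search on [Content_Types].xml rather than a real XML parse.

```python
import io
import zipfile

ZIP_LOCAL_HEADER = b"PK\x03\x04"  # the PK\003\004 signature as raw bytes
VBA_PART = "word/vbaProject.bin"
# Content type that [Content_Types].xml assigns to the main part of a macro-enabled document.
MACRO_MAIN_TYPE = "application/vnd.ms-word.document.macroEnabled.main+xml"


def inspect_package(source):
    """Return basic structural facts about a package.

    `source` is a path or a seekable binary file object (hypothetical helper).
    """
    if hasattr(source, "read"):
        source.seek(0)
        magic = source.read(4)
        source.seek(0)
    else:
        with open(source, "rb") as fh:
            magic = fh.read(4)

    with zipfile.ZipFile(source) as zf:
        names = set(zf.namelist())
        content_types = ""
        if "[Content_Types].xml" in names:
            content_types = zf.read("[Content_Types].xml").decode("utf-8", "replace")

    return {
        "zip_signature": magic == ZIP_LOCAL_HEADER,
        "has_vba_project": VBA_PART in names,
        "declares_macro_main_part": MACRO_MAIN_TYPE in content_types,
        "parts": sorted(names),
    }
```

Calling inspect_package('example.docm') on a typical macro-enabled file should report all three flags as true; a .docx renamed to .docm would normally fail the last two.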
- List of All Properties of This File Format Intrinsic to Its File System
The properties intrinsic to the .DOCM format are primarily the metadata stored within the ZIP archive's docProps/core.xml (core properties) and docProps/app.xml (extended properties). These are standardized in the OOXML specification and include:
Core Properties (from docProps/core.xml):
- Title
- Subject
- Creator
- Keywords
- Description
- LastModifiedBy
- Revision
- Created
- Modified
- Category
- ContentStatus
- ContentType
- Identifier
- Language
- LastPrinted
- Version
Extended Properties (from docProps/app.xml, specific to Word documents):
- Template
- TotalTime
- Pages
- Words
- Characters
- Application
- DocSecurity
- Lines
- Paragraphs
- ScaleCrop
- Company
- LinksUpToDate
- CharactersWithSpaces
- SharedDoc
- HyperlinksChanged
- AppVersion
These properties represent document metadata and are accessible without opening the file in Word, by parsing the XML files within the ZIP structure.
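As a rough illustration of reading this metadata without Word, the sketch below opens the package with Python's standard zipfile module and maps each property element to its text. The function name read_doc_properties is hypothetical. The sketch assumes either part may be absent and returns an empty dictionary in that case, and it skips container elements (those with child elements) rather than flattening them.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CORE_PART = "docProps/core.xml"
EXT_PART = "docProps/app.xml"


def _leaf_values(xml_bytes):
    """Map the local name of each leaf child of the root element to its text."""
    root = ET.fromstring(xml_bytes)
    props = {}
    for child in root:
        if len(child) == 0:  # skip containers that hold nested elements
            local_name = child.tag.rsplit("}", 1)[-1]  # '{ns}title' -> 'title'
            props[local_name] = child.text or ""
    return props


def read_doc_properties(source):
    """Return (core, extended) property dicts for a path or binary file object."""
    with zipfile.ZipFile(source) as zf:
        names = set(zf.namelist())
        core = _leaf_values(zf.read(CORE_PART)) if CORE_PART in names else {}
        ext = _leaf_values(zf.read(EXT_PART)) if EXT_PART in names else {}
    return core, ext
```

The keys are the element names as they appear in the XML, so the Title property listed above is returned under the key title and LastModifiedBy under lastModifiedBy.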
- Two Direct Download Links for Files of Format .DOCM
- https://example-files.online-convert.com/document/docm/example.docm
- https://bazaar.abuse.ch/download/d2809e3e60e5d9671be8644750ad1b385aaa6b4ff01fef8fc594d81c69275a33
- Ghost Blog Embedded HTML JavaScript for Drag-and-Drop .DOCM File Property Dump
The following is a self-contained HTML snippet with embedded JavaScript that can be inserted into a Ghost blog post. It creates a drag-and-drop area where a user can drop a .DOCM file. The script uses the browser's File API and a simple ZIP parser (without external libraries) to extract and display the properties from docProps/core.xml and docProps/app.xml. Note that full ZIP parsing in pure JavaScript is limited; this implementation assumes a standard structure and handles basic extraction.
This code assumes uncompressed XML parts for simplicity; in practice, compression may require a full ZIP library like JSZip.
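Whether the uncompressed-parts assumption holds for a given file can be checked before relying on it. The following Python sketch (the function name property_part_storage is made up for illustration) reports the ZIP compression method recorded for each of the two property parts; many producers deflate these parts, in which case a parser that only handles stored entries would not be enough.

```python
import io
import zipfile

PROPERTY_PARTS = ("docProps/core.xml", "docProps/app.xml")


def property_part_storage(source):
    """Return {part name: 'stored' | 'deflated' | 'method N'} for the property parts."""
    labels = {zipfile.ZIP_STORED: "stored", zipfile.ZIP_DEFLATED: "deflated"}
    report = {}
    with zipfile.ZipFile(source) as zf:
        for info in zf.infolist():
            if info.filename in PROPERTY_PARTS:
                # compress_type is the method number from the ZIP entry header
                report[info.filename] = labels.get(
                    info.compress_type, f"method {info.compress_type}"
                )
    return report
```

A result such as {'docProps/core.xml': 'deflated', ...} means the part must be inflated first, which is the case where a library like JSZip becomes necessary on the browser side.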
- Python Class for .DOCM File Handling
The following Python class uses the zipfile and xml.etree.ElementTree modules to open, read, modify, and print the properties. The write_properties method updates properties and saves the result to a new file.
import zipfile
import xml.etree.ElementTree as ET

# Register the usual prefixes so rewritten XML keeps cp:/dc:/dcterms: instead of ns0:, ns1:, ...
NAMESPACES = {
    'cp': 'http://schemas.openxmlformats.org/package/2006/metadata/core-properties',
    'dc': 'http://purl.org/dc/elements/1.1/',
    'dcterms': 'http://purl.org/dc/terms/',
    'dcmitype': 'http://purl.org/dc/dcmitype/',
    'xsi': 'http://www.w3.org/2001/XMLSchema-instance',
    'vt': 'http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes',
    # app.xml uses the extended-properties namespace as its default (unprefixed) namespace.
    '': 'http://schemas.openxmlformats.org/officeDocument/2006/extended-properties',
}
for prefix, uri in NAMESPACES.items():
    ET.register_namespace(prefix, uri)


class DocmHandler:
    """Read, print, and update the docProps metadata of a .docm package."""

    CORE_PART = 'docProps/core.xml'
    EXT_PART = 'docProps/app.xml'

    def __init__(self, filepath):
        self.filepath = filepath
        self.core_props = {}
        self.ext_props = {}
        self.zip_file = None

    def open(self):
        self.zip_file = zipfile.ZipFile(self.filepath, 'r')

    @staticmethod
    def _local_name(tag):
        # '{namespace}title' -> 'title'
        return tag.rsplit('}', 1)[-1]

    def _collect(self, part_name):
        # Map each direct child of the root element to its text.
        root = ET.fromstring(self.zip_file.read(part_name))
        return {self._local_name(child.tag): child.text for child in root}

    def read_properties(self):
        if not self.zip_file:
            self.open()
        self.core_props = self._collect(self.CORE_PART)
        self.ext_props = self._collect(self.EXT_PART)

    def print_properties(self):
        print("Core Properties:")
        for key, value in self.core_props.items():
            print(f"{key}: {value}")
        print("\nExtended Properties:")
        for key, value in self.ext_props.items():
            print(f"{key}: {value}")

    def _updated_xml(self, xml_bytes, new_values):
        # ElementTree's XPath subset has no local-name(), so match on the stripped tag instead.
        root = ET.fromstring(xml_bytes)
        for child in root:
            key = self._local_name(child.tag)
            if key in new_values:
                child.text = str(new_values[key])
        # xml_declaration requires Python 3.8+.
        return ET.tostring(root, encoding='UTF-8', xml_declaration=True)

    def write_properties(self, new_core=None, new_ext=None, output_path=None):
        # None defaults avoid sharing one mutable dict between calls.
        updates = {self.CORE_PART: new_core or {}, self.EXT_PART: new_ext or {}}
        if not output_path:
            output_path = self.filepath.replace('.docm', '_modified.docm')
        with zipfile.ZipFile(self.filepath, 'r') as zin, \
                zipfile.ZipFile(output_path, 'w') as zout:
            for item in zin.infolist():
                data = zin.read(item.filename)
                if item.filename in updates:
                    data = self._updated_xml(data, updates[item.filename])
                # Passing the ZipInfo keeps each part's name and compression method.
                zout.writestr(item, data)

    def close(self):
        if self.zip_file:
            self.zip_file.close()
            self.zip_file = None


# Example usage (keys are element local names, e.g. 'title' for dc:title):
# handler = DocmHandler('example.docm')
# handler.open()
# handler.read_properties()
# handler.print_properties()
# handler.write_properties(new_core={'title': 'New Title'}, output_path='modified.docm')
# handler.close()
- Java Class for .DOCM File Handling
The following Java class uses java.util.zip and javax.xml.parsers to handle the file. It supports reading, printing, and writing properties.
import java.io.*;
import java.util.Collections;
import java.util.zip.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.*;

public class DocmHandler {
    private final String filepath;
    private ZipFile zipFile;
    private Document coreDoc;
    private Document extDoc;

    public DocmHandler(String filepath) {
        this.filepath = filepath;
    }

    public void open() throws IOException {
        zipFile = new ZipFile(filepath);
    }

    public void readProperties() throws Exception {
        if (zipFile == null) open();
        try (InputStream coreIs = zipFile.getInputStream(zipFile.getEntry("docProps/core.xml"));
             InputStream extIs = zipFile.getInputStream(zipFile.getEntry("docProps/app.xml"))) {
            coreDoc = parseXml(coreIs);
            extDoc = parseXml(extIs);
        }
    }

    public void printProperties() {
        System.out.println("Core Properties:");
        printChildren(coreDoc);
        System.out.println("\nExtended Properties:");
        printChildren(extDoc);
    }

    // Prints each direct child of the root element as "localName: text".
    private void printChildren(Document doc) {
        NodeList nodes = doc.getDocumentElement().getChildNodes();
        for (int i = 0; i < nodes.getLength(); i++) {
            Node node = nodes.item(i);
            if (node.getNodeType() == Node.ELEMENT_NODE) {
                System.out.println(node.getLocalName() + ": " + node.getTextContent());
            }
        }
    }

    public void writeProperties(String newCoreKey, String newCoreValue, String newExtKey, String newExtValue, String outputPath) throws Exception {
        if (outputPath == null) outputPath = filepath.replace(".docm", "_modified.docm");
        try (ZipFile zin = new ZipFile(filepath);
             ZipOutputStream zout = new ZipOutputStream(new FileOutputStream(outputPath))) {
            for (ZipEntry entry : Collections.list(zin.entries())) {
                byte[] data;
                try (InputStream is = zin.getInputStream(entry)) {
                    data = is.readAllBytes(); // Java 9+
                }
                if (entry.getName().equals("docProps/core.xml")) {
                    data = updateXml(data, newCoreKey, newCoreValue);
                } else if (entry.getName().equals("docProps/app.xml")) {
                    data = updateXml(data, newExtKey, newExtValue);
                }
                // A fresh ZipEntry avoids reusing the source entry's compressed size,
                // which can make putNextEntry reject the recompressed data.
                zout.putNextEntry(new ZipEntry(entry.getName()));
                zout.write(data);
                zout.closeEntry();
            }
        }
    }

    // Sets the text of the first element whose local name equals key, then serializes the document.
    private byte[] updateXml(byte[] xml, String key, String value) throws Exception {
        Document doc = parseXml(new ByteArrayInputStream(xml));
        if (key != null) {
            // "*" matches any namespace, so "title" finds dc:title.
            Node node = doc.getElementsByTagNameNS("*", key).item(0);
            if (node != null) node.setTextContent(value);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    private Document parseXml(InputStream is) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // Without namespace awareness, getLocalName() returns null.
        dbf.setNamespaceAware(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        return db.parse(is);
    }

    public void close() throws IOException {
        if (zipFile != null) {
            zipFile.close();
            zipFile = null;
        }
    }

    // Example usage:
    // public static void main(String[] args) throws Exception {
    //     DocmHandler handler = new DocmHandler("example.docm");
    //     handler.open();
    //     handler.readProperties();
    //     handler.printProperties();
    //     handler.writeProperties("title", "New Title", "Pages", "10", "modified.docm");
    //     handler.close();
    // }
}
- JavaScript Class for .DOCM File Handling
The following JavaScript class is for Node.js (using adm-zip for ZIP handling and xml2js for XML parsing; assume these are installed via npm). It supports reading, printing, and writing properties.
const AdmZip = require('adm-zip');
const xml2js = require('xml2js');

const CORE_PART = 'docProps/core.xml';
const EXT_PART = 'docProps/app.xml';

class DocmHandler {
  constructor(filepath) {
    this.filepath = filepath;
    this.zip = null;
    this.coreProps = {};
    this.extProps = {};
  }

  open() {
    this.zip = new AdmZip(this.filepath);
  }

  async readProperties() {
    if (!this.zip) this.open();
    const core = await xml2js.parseStringPromise(this.zip.readAsText(CORE_PART));
    const ext = await xml2js.parseStringPromise(this.zip.readAsText(EXT_PART));
    this.coreProps = this.flattenProps(core);
    this.extProps = this.flattenProps(ext);
  }

  // Flatten { root: { 'dc:title': ['x'], ... } } into { 'dc:title': 'x', ... }.
  flattenProps(xmlObj) {
    const props = {};
    for (const root of Object.values(xmlObj)) {
      for (const [key, values] of Object.entries(root)) {
        if (key === '$') continue; // '$' holds the root element's attributes (namespace declarations)
        const first = values[0];
        // Elements with attributes (e.g. dcterms:created) are parsed as { _: text, $: attrs }.
        props[key] = first && typeof first === 'object' && '_' in first ? first._ : first;
      }
    }
    return props;
  }

  printProperties() {
    console.log('Core Properties:');
    console.log(this.coreProps);
    console.log('\nExtended Properties:');
    console.log(this.extProps);
  }

  // Keys must be element names as they appear in the XML, e.g. 'dc:title' or 'Pages'.
  async updatePart(partName, changes) {
    const result = await xml2js.parseStringPromise(this.zip.readAsText(partName));
    const rootName = Object.keys(result)[0]; // 'cp:coreProperties' or 'Properties'
    for (const [key, value] of Object.entries(changes)) {
      const current = result[rootName][key];
      if (current && current[0] && typeof current[0] === 'object') {
        current[0]._ = String(value); // keep attributes such as xsi:type
      } else {
        result[rootName][key] = [String(value)];
      }
    }
    const xml = new xml2js.Builder().buildObject(result);
    this.zip.updateFile(partName, Buffer.from(xml, 'utf8'));
  }

  async writeProperties(newCore = {}, newExt = {}, outputPath = this.filepath.replace('.docm', '_modified.docm')) {
    if (!this.zip) this.open();
    await this.updatePart(CORE_PART, newCore);
    await this.updatePart(EXT_PART, newExt);
    this.zip.writeZip(outputPath);
  }
}

// Example usage:
// (async () => {
//   const handler = new DocmHandler('example.docm');
//   handler.open();
//   await handler.readProperties();
//   handler.printProperties();
//   await handler.writeProperties({ 'dc:title': 'New Title' }, { Pages: '10' });
// })();
- C Class for .DOCM File Handling
The following is a C++ class (as "C class" likely implies C++ for object-oriented features) using the miniz library for ZIP handling (assumed to be included) and TinyXML2 for XML parsing. It supports reading, printing, and writing properties.
#include <cstring>
#include <iostream>
#include <map>
#include <string>
#include "miniz.h"     // Assume miniz for ZIP
#include "tinyxml2.h"  // Assume tinyxml2 for XML

class DocmHandler {
private:
    using PropMap = std::map<std::string, std::string>;

    std::string filepath;
    mz_zip_archive zip;
    bool opened = false;
    PropMap coreProps;
    PropMap extProps;

    // Extracts one part and copies the root element's children into props.
    // TinyXML2 is not namespace-aware, so keys keep their prefix (e.g. "dc:title").
    void readPart(const char* partName, PropMap& props) {
        size_t size = 0;
        void* data = mz_zip_reader_extract_file_to_heap(&zip, partName, &size, 0);
        if (!data) return;  // part missing or unreadable
        tinyxml2::XMLDocument doc;
        if (doc.Parse(static_cast<const char*>(data), size) == tinyxml2::XML_SUCCESS && doc.RootElement()) {
            for (tinyxml2::XMLElement* elem = doc.RootElement()->FirstChildElement(); elem; elem = elem->NextSiblingElement()) {
                props[elem->Name()] = elem->GetText() ? elem->GetText() : "";
            }
        }
        mz_free(data);  // buffers returned by miniz are released with mz_free
    }

    // Applies changes to the XML in data and returns the serialized result.
    // Falls back to the original bytes if the XML cannot be parsed.
    static std::string updateXml(const void* data, size_t size, const PropMap& changes) {
        tinyxml2::XMLDocument doc;
        if (doc.Parse(static_cast<const char*>(data), size) != tinyxml2::XML_SUCCESS || !doc.RootElement()) {
            return std::string(static_cast<const char*>(data), size);
        }
        for (const auto& kv : changes) {
            tinyxml2::XMLElement* elem = doc.RootElement()->FirstChildElement(kv.first.c_str());
            if (elem) elem->SetText(kv.second.c_str());
        }
        tinyxml2::XMLPrinter printer;
        doc.Print(&printer);
        return std::string(printer.CStr(), printer.CStrSize() - 1);  // drop the trailing NUL
    }

public:
    explicit DocmHandler(const std::string& fp) : filepath(fp) {
        memset(&zip, 0, sizeof(zip));
    }

    ~DocmHandler() {
        close();
    }

    bool open() {
        opened = mz_zip_reader_init_file(&zip, filepath.c_str(), 0) != 0;
        return opened;
    }

    void readProperties() {
        readPart("docProps/core.xml", coreProps);
        readPart("docProps/app.xml", extProps);
    }

    void printProperties() {
        std::cout << "Core Properties:" << std::endl;
        for (const auto& prop : coreProps) {
            std::cout << prop.first << ": " << prop.second << std::endl;
        }
        std::cout << "\nExtended Properties:" << std::endl;
        for (const auto& prop : extProps) {
            std::cout << prop.first << ": " << prop.second << std::endl;
        }
    }

    void writeProperties(const PropMap& newCore, const PropMap& newExt, const std::string& outputPath) {
        mz_zip_archive outZip;
        memset(&outZip, 0, sizeof(outZip));
        if (!mz_zip_writer_init_file(&outZip, outputPath.c_str(), 0)) return;
        for (mz_uint i = 0; i < mz_zip_reader_get_num_files(&zip); ++i) {
            mz_zip_archive_file_stat stat;
            if (!mz_zip_reader_file_stat(&zip, i, &stat)) continue;
            if (mz_zip_reader_is_file_a_directory(&zip, i)) continue;
            size_t size = 0;
            void* data = mz_zip_reader_extract_to_heap(&zip, i, &size, 0);
            if (!data) continue;
            const std::string name(stat.m_filename);
            if (name == "docProps/core.xml" || name == "docProps/app.xml") {
                const std::string xml = updateXml(data, size, name == "docProps/core.xml" ? newCore : newExt);
                mz_zip_writer_add_mem(&outZip, stat.m_filename, xml.data(), xml.size(), MZ_DEFAULT_COMPRESSION);
            } else {
                mz_zip_writer_add_mem(&outZip, stat.m_filename, data, size, MZ_DEFAULT_COMPRESSION);
            }
            mz_free(data);
        }
        mz_zip_writer_finalize_archive(&outZip);
        mz_zip_writer_end(&outZip);
    }

    void close() {
        if (opened) {
            mz_zip_reader_end(&zip);
            opened = false;
        }
    }
};

// Example usage (keys are prefixed element names, since TinyXML2 does not resolve namespaces):
// int main() {
//     DocmHandler handler("example.docm");
//     if (handler.open()) {
//         handler.readProperties();
//         handler.printProperties();
//         std::map<std::string, std::string> newCore = {{"dc:title", "New Title"}};
//         std::map<std::string, std::string> newExt = {{"Pages", "10"}};
//         handler.writeProperties(newCore, newExt, "modified.docm");
//     }
//     return 0;
// }