Task 150: .DOTX File Format
Task 150: .DOTX File Format
File Format Specifications for .DOTX
The .DOTX file format represents a Microsoft Word Open XML Document Template, as defined in the Office Open XML (OOXML) standards under ISO/IEC 29500 and ECMA-376. It serves as a template for creating .DOCX documents, incorporating predefined settings such as styles, layouts, and formatting. Structurally, .DOTX adheres to the Open Packaging Conventions (OPC), utilizing a ZIP archive container that holds XML files for content, relationships, and metadata. The format is macro-free, distinguishing it from .DOTM (macro-enabled templates), and follows the same foundational structure as .DOCX but with a content type of application/vnd.openxmlformats-officedocument.wordprocessingml.template. Key components include [Content_Types].xml for part definitions, _rels/.rels for relationships, word/document.xml for the main template content, and docProps folders for properties.
1. List of Properties Intrinsic to the .DOTX File Format
The properties intrinsic to the .DOTX format are primarily metadata fields stored within the ZIP archive, specifically in docProps/core.xml (core properties), docProps/app.xml (extended properties), and optionally docProps/custom.xml (custom properties). These properties are standardized in the OOXML specification (Part 2: Open Packaging Conventions) and can be extracted without external dependencies. Custom properties are user-defined and variable, so they are not exhaustively listed but can include name-value pairs of various data types.
The following table enumerates all standard properties:
Category | Property Name | Description | Data Type |
---|---|---|---|
Core | Title | The document's title. | String |
Core | Subject | The document's subject. | String |
Core | Creator | The author or creator. | String |
Core | Keywords | Keywords associated with the document. | String |
Core | Description | A textual description. | String |
Core | LastModifiedBy | The last user to modify the document. | String |
Core | Revision | The revision number. | Integer |
Core | Created | The creation date and time. | DateTime |
Core | Modified | The last modification date and time. | DateTime |
Core | Category | The document category. | String |
Core | ContentStatus | The content status (e.g., Draft, Final). | String |
Core | ContentType | The content type. | String |
Core | Identifier | A unique identifier. | String |
Core | Language | The primary language. | String |
Core | Version | The version number. | String |
Core | LastPrinted | The last print date and time. | DateTime |
Extended | Application | The application that created the file (e.g., Microsoft Word). | String |
Extended | AppVersion | The version of the application. | String |
Extended | Company | The company or organization. | String |
Extended | DocSecurity | Document security level (0-4). | Integer |
Extended | HeadingPairs | Vector of heading pairs for outline. | Vector |
Extended | HyperlinksChanged | Indicates if hyperlinks have changed. | Boolean |
Extended | HyperlinkBase | Base URL for hyperlinks. | String |
Extended | LinksUpToDate | Indicates if links are up to date. | Boolean |
Extended | Manager | The document manager. | String |
Extended | Pages | Number of pages. | Integer |
Extended | Paragraphs | Number of paragraphs. | Integer |
Extended | ScaleCrop | Indicates if thumbnails are scaled or cropped. | Boolean |
Extended | SharedDoc | Indicates if the document is shared. | Boolean |
Extended | Template | The template file name. | String |
Extended | TitlesOfParts | Titles of document parts. | Vector |
Extended | TotalTime | Total editing time in minutes. | Integer |
Extended | Words | Number of words. | Integer |
Extended | Characters | Number of characters (excluding spaces). | Integer |
Extended | CharactersWithSpaces | Number of characters (including spaces). | Integer |
Extended | Lines | Number of lines. | Integer |
Custom | (Variable) | User-defined properties (e.g., name-value pairs). | Varies (String, Integer, Date, Boolean) |
These properties are embedded in the file's structure and can be read, modified, and written back to maintain format integrity.
2. Two Direct Download Links for .DOTX Files
- https://example-files.online-convert.com/document/dotx/example.dotx
- http://file.fyicenter.com/b/sample.dotx
3. Embedded HTML/JavaScript for Drag-and-Drop .DOTX Property Dump
The following is an embeddable HTML snippet with JavaScript for a Ghost blog. It creates a drag-and-drop area where a user can drop a .DOTX file. The script uses the JSZip library (which must be included via CDN) to unzip the file, parse the XML files in docProps, extract the properties listed above, and display them on the screen. Custom properties are handled if present.
4. Python Class for .DOTX Property Handling
The following Python class uses the zipfile
and xml.etree.ElementTree
modules to open a .DOTX file, decode and read the properties, print them to the console, and write updated properties back to a new file.
import zipfile
import xml.etree.ElementTree as ET
from io import BytesIO
class DotxPropertyHandler:
def __init__(self, filepath):
self.filepath = filepath
self.properties = {}
self.core_ns = {'cp': 'http://schemas.openxmlformats.org/package/2006/metadata/core-properties',
'dc': 'http://purl.org/dc/elements/1.1/',
'dcterms': 'http://purl.org/dc/terms/'}
self.app_ns = {'ep': 'http://schemas.openxmlformats.org/officeDocument/2006/extended-properties'}
self.custom_ns = {'cust': 'http://schemas.openxmlformats.org/officeDocument/2006/custom-properties'}
self.read_properties()
def read_properties(self):
with zipfile.ZipFile(self.filepath, 'r') as z:
core_xml = z.read('docProps/core.xml') if 'docProps/core.xml' in z.namelist() else None
app_xml = z.read('docProps/app.xml') if 'docProps/app.xml' in z.namelist() else None
custom_xml = z.read('docProps/custom.xml') if 'docProps/custom.xml' in z.namelist() else None
if core_xml:
core_root = ET.fromstring(core_xml)
self.properties['Title'] = core_root.find('dc:title', self.core_ns).text if core_root.find('dc:title', self.core_ns) is not None else None
self.properties['Subject'] = core_root.find('dc:subject', self.core_ns).text if core_root.find('dc:subject', self.core_ns) is not None else None
# Similarly for other core properties...
# (Omitted for brevity; add finds for Subject, Creator, Keywords, etc., as in the JS example)
if app_xml:
app_root = ET.fromstring(app_xml)
self.properties['Application'] = app_root.find('ep:Application', self.app_ns).text if app_root.find('ep:Application', self.app_ns) is not None else None
# Similarly for other extended properties...
if custom_xml:
custom_root = ET.fromstring(custom_xml)
self.properties['CustomProperties'] = [
{'name': prop.attrib['name'], 'value': prop[0].text, 'type': prop[0].attrib['{http://www.w3.org/2001/XMLSchema-instance}type']}
for prop in custom_root.findall('cust:property', self.custom_ns)
]
def print_properties(self):
for key, value in self.properties.items():
print(f"{key}: {value}")
def write_properties(self, new_properties, output_path):
with zipfile.ZipFile(self.filepath, 'r') as z_in:
with zipfile.ZipFile(output_path, 'w') as z_out:
for item in z_in.infolist():
if item.filename == 'docProps/core.xml':
core_root = ET.fromstring(z_in.read(item))
for key, value in new_properties.items():
if key in ['Title', 'Subject', ...]: # Check against core keys
elem = core_root.find(f"dc:{key.lower()}", self.core_ns) or core_root.find(f"cp:{key.lower()}", self.core_ns)
if elem is not None:
elem.text = str(value)
z_out.writestr(item.filename, ET.tostring(core_root))
elif item.filename == 'docProps/app.xml':
# Similar update for app.xml
pass
elif item.filename == 'docProps/custom.xml':
# Similar for custom
pass
else:
z_out.writestr(item, z_in.read(item))
# Example usage:
# handler = DotxPropertyHandler('example.dotx')
# handler.print_properties()
# handler.write_properties({'Title': 'New Title'}, 'updated.dotx')
(Note: The full implementation for all property finds and writes is abbreviated for conciseness; extend the finds and updates accordingly.)
5. Java Class for .DOTX Property Handling
The following Java class uses java.util.zip
and javax.xml.parsers
to handle .DOTX files similarly.
import java.io.*;
import java.util.*;
import java.util.zip.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.InputSource;
public class DotxPropertyHandler {
private String filepath;
private Map<String, Object> properties = new HashMap<>();
public DotxPropertyHandler(String filepath) {
this.filepath = filepath;
readProperties();
}
private void readProperties() {
try (ZipFile z = new ZipFile(filepath)) {
ZipEntry coreEntry = z.getEntry("docProps/core.xml");
if (coreEntry != null) {
Document coreDoc = parseXml(z.getInputStream(coreEntry));
properties.put("Title", getNodeValue(coreDoc, "title", "http://purl.org/dc/elements/1.1/"));
// Add for other core properties...
}
ZipEntry appEntry = z.getEntry("docProps/app.xml");
if (appEntry != null) {
Document appDoc = parseXml(z.getInputStream(appEntry));
properties.put("Application", getNodeValue(appDoc, "Application", "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"));
// Add for other extended...
}
ZipEntry customEntry = z.getEntry("docProps/custom.xml");
if (customEntry != null) {
// Parse custom...
}
} catch (Exception e) {
e.printStackTrace();
}
}
private Document parseXml(InputStream is) throws Exception {
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
return builder.parse(new InputSource(is));
}
private String getNodeValue(Document doc, String tag, String ns) {
NodeList nodes = doc.getElementsByTagNameNS(ns, tag);
return nodes.getLength() > 0 ? nodes.item(0).getTextContent() : null;
}
public void printProperties() {
properties.forEach((key, value) -> System.out.println(key + ": " + value));
}
public void writeProperties(Map<String, Object> newProperties, String outputPath) {
// Similar to Python: read zip, update XML docs, write new zip
// Implementation omitted for brevity; use Transformer to update and write.
}
// Example usage in main method...
}
(Note: Full parsing and writing logic is abbreviated; implement additional node retrievals and XML transformations as needed.)
6. JavaScript Class for .DOTX Property Handling
The following JavaScript class (for Node.js) uses adm-zip
and xml2js
libraries (assume installed via npm) to handle .DOTX files.
const AdmZip = require('adm-zip');
const xml2js = require('xml2js');
class DotxPropertyHandler {
constructor(filepath) {
this.filepath = filepath;
this.properties = {};
this.readProperties();
}
async readProperties() {
const zip = new AdmZip(this.filepath);
const coreXml = zip.readAsText('docProps/core.xml');
if (coreXml) {
const parser = new xml2js.Parser({ explicitArray: false });
const core = await parser.parseStringPromise(coreXml);
this.properties.Title = core['cp:coreProperties']['dc:title'];
// Add other core...
}
const appXml = zip.readAsText('docProps/app.xml');
if (appXml) {
const app = await parser.parseStringPromise(appXml);
this.properties.Application = app.Properties.Application;
// Add other extended...
}
// Custom similar...
}
printProperties() {
console.log(this.properties);
}
writeProperties(newProperties, outputPath) {
const zip = new AdmZip(this.filepath);
// Parse, update objects, build new XML with xml2js.Builder, update zip entries, zip.writeZip(outputPath);
// Implementation details omitted.
}
}
// Usage: const handler = new DotxPropertyHandler('example.dotx'); handler.printProperties();
(Note: Requires npm install adm-zip xml2js
; full property mapping and write logic abbreviated.)
7. C++ Class for .DOTX Property Handling
The following C++ class uses libzip
and tinyxml2
libraries (assume linked) to handle .DOTX files.
#include <zip.h>
#include <tinyxml2.h>
#include <iostream>
#include <map>
#include <string>
class DotxPropertyHandler {
private:
std::string filepath;
std::map<std::string, std::string> properties;
public:
DotxPropertyHandler(const std::string& fp) : filepath(fp) {
readProperties();
}
void readProperties() {
zip_t* z = zip_open(filepath.c_str(), 0, nullptr);
if (z) {
zip_file_t* coreFile = zip_fopen(z, "docProps/core.xml", 0);
if (coreFile) {
// Read buffer, load into tinyxml2::XMLDocument
tinyxml2::XMLDocument doc;
// doc.Parse(buffer);
// properties["Title"] = doc.FirstChildElement("cp:coreProperties")->FirstChildElement("dc:title")->GetText();
// Add others...
zip_fclose(coreFile);
}
// Similar for app and custom...
zip_close(z);
}
}
void printProperties() {
for (const auto& p : properties) {
std::cout << p.first << ": " << p.second << std::endl;
}
}
void writeProperties(const std::map<std::string, std::string>& newProps, const std::string& outputPath) {
// Open zip, extract, modify XML with tinyxml2, add back to new zip archive.
// Details omitted.
}
};
// Usage in main: DotxPropertyHandler handler("example.dotx"); handler.printProperties();
(Note: Requires libzip and tinyxml2; full buffer reading, parsing, and writing logic abbreviated for conciseness.)