Task 472: .ODF File Format
Specifications for the .ODF File Format
ODF is the OpenDocument Format for Office Applications, an XML-based open standard for office documents developed by the OASIS consortium and also published as ISO/IEC 26300. The standard covers several document types; the .odf extension specifically denotes OpenDocument Formula files. Like every OpenDocument file, a .odf file is a ZIP archive containing XML files for content, metadata, styles, and other components, alongside a mimetype entry and a META-INF/manifest.xml file listing the package contents. The complete specifications are detailed in the OASIS OpenDocument Version 1.3 documentation, available at https://docs.oasis-open.org/office/OpenDocument/v1.3/OpenDocument-v1.3.html.
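The package layout described above can be checked with a few lines of Python. The following is a minimal sketch, not a validator: `inspect_package` is a hypothetical helper name, and the demo archive is built in memory with placeholder XML bodies rather than real ODF content.

```python
import io
import zipfile

FORMULA_MIMETYPE = "application/vnd.oasis.opendocument.formula"


def inspect_package(source):
    """Return the declared MIME type and the entry names of an ODF package."""
    with zipfile.ZipFile(source) as package:
        names = package.namelist()
        mimetype = None
        if "mimetype" in names:
            mimetype = package.read("mimetype").decode("ascii").strip()
        return {"mimetype": mimetype, "entries": names}


# Demo archive built in memory; the XML bodies are placeholders, not valid ODF.
demo = io.BytesIO()
with zipfile.ZipFile(demo, "w", zipfile.ZIP_DEFLATED) as package:
    # The mimetype entry is written first and left uncompressed.
    package.writestr("mimetype", FORMULA_MIMETYPE, compress_type=zipfile.ZIP_STORED)
    package.writestr("content.xml", "<placeholder/>")
    package.writestr("meta.xml", "<placeholder/>")
    package.writestr("META-INF/manifest.xml", "<placeholder/>")

info = inspect_package(demo)
print(info["mimetype"])
print(info["entries"])
```

Pointing `inspect_package` at a real file path instead of the in-memory buffer lists the entries of an actual document.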
- List of Properties Intrinsic to the File Format
The properties intrinsic to the .ODF file format, particularly those related to its internal structure and metadata, are stored within the ZIP package, primarily in the meta.xml file under the <office:meta> element. These metadata properties are common across OpenDocument types, including formula documents, and include predefined elements from namespaces such as meta: (OpenDocument metadata) and dc: (Dublin Core). The following table enumerates all such properties, including their namespace-qualified names, attributes (if applicable), and descriptions:
| Property Name | Attributes | Description |
|---|---|---|
| meta:generator | None | Identifies the software and version that generated the document. |
| dc:title | None | Specifies the title of the document. |
| dc:description | None | Provides a textual summary or abstract of the document's content. |
| dc:subject | None | Indicates the main topic or subject area of the document. |
| meta:keyword | None | Lists individual keywords associated with the document (multiple instances permitted). |
| meta:initial-creator | None | Names the person or entity who initially created the document. |
| dc:creator | None | Names the person or entity who last modified the document. |
| meta:printed-by | None | Records the person or entity who last printed the document. |
| meta:creation-date | None | Records the date and time of document creation in ISO 8601 format. |
| dc:date | None | Records the date and time of the last modification in ISO 8601 format. |
| meta:print-date | None | Records the date and time of the last print action in ISO 8601 format. |
| meta:template | xlink:href, xlink:title, xlink:type, xlink:actuate, meta:date | Identifies the template used to create the document: its location, its title, and the date it was last modified. |
| meta:auto-reload | xlink:href, xlink:type, xlink:show, xlink:actuate, meta:delay | Specifies whether the document is reloaded, or replaced by another document, after the delay given in meta:delay. |
| meta:hyperlink-behaviour | office:target-frame-name, xlink:show | Defines the default handling of hyperlinks, such as the target frame and how the target is shown. |
| dc:language | None | Specifies the primary language code of the document (e.g., "en-US" per RFC 4646). |
| meta:editing-cycles | None | Counts the number of editing or save cycles as an integer. |
| meta:editing-duration | None | Tracks the total editing time as a duration in ISO 8601 format (e.g., "PT1H30M"). |
| meta:document-statistic | meta:page-count, meta:paragraph-count, meta:word-count, meta:character-count, meta:table-count, meta:image-count, meta:object-count, and similar counters | Provides counts for document elements; the values are stored in attributes, not in element text. |
| meta:user-defined | meta:name (required for identification), meta:value-type (optional) | Allows custom metadata properties as name-value pairs, supporting types such as string, date, boolean, time, or float. |
These properties are extracted from the OpenDocument schema specification.
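Several of the properties above are typed values rather than free text. As a rough illustration, the sketch below converts meta:editing-cycles, meta:editing-duration, and the meta:document-statistic attributes into Python values. The sample XML, the function names, and the simplified duration pattern (days, hours, minutes, and seconds only, no years or months) are assumptions made for this example.

```python
import re
import xml.etree.ElementTree as ET
from datetime import timedelta

META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"

# Hypothetical meta.xml fragment used only for this demonstration.
SAMPLE = """<office:document-meta
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0">
  <office:meta>
    <meta:editing-cycles>3</meta:editing-cycles>
    <meta:editing-duration>PT1H30M</meta:editing-duration>
    <meta:document-statistic meta:page-count="2" meta:word-count="415"/>
  </office:meta>
</office:document-meta>"""

# Simplified pattern: supports only the D, H, M and S components.
_DURATION = re.compile(
    r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?$"
)


def parse_duration(text):
    """Convert a day/time ISO 8601 duration such as 'PT1H30M' to a timedelta."""
    match = _DURATION.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    days, hours, minutes, seconds = match.groups(default="0")
    return timedelta(days=int(days), hours=int(hours),
                     minutes=int(minutes), seconds=float(seconds))


def read_typed_metadata(xml_text):
    """Return editing cycles, editing duration and statistics as Python values."""
    root = ET.fromstring(xml_text)
    cycles = root.findtext(f".//{{{META_NS}}}editing-cycles")
    duration = root.findtext(f".//{{{META_NS}}}editing-duration")
    statistic = root.find(f".//{{{META_NS}}}document-statistic")
    counts = {}
    if statistic is not None:
        # Attribute keys look like "{namespace}page-count"; keep the local name.
        counts = {k.split("}")[-1]: int(v) for k, v in statistic.attrib.items()}
    return {
        "editing-cycles": int(cycles) if cycles else None,
        "editing-duration": parse_duration(duration) if duration else None,
        "statistics": counts,
    }


result = read_typed_metadata(SAMPLE)
print(result)
```

Keeping the statistics as a dictionary of integers reflects that meta:document-statistic carries its data in attributes, which a plain text lookup would miss.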
- Two Direct Download Links for .ODF Files
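Based on available resources, the following direct download links point to sample OpenDocument files. Note that these are text documents (.odt extension) rather than formula files (.odf extension); they follow the same ODF package layout and carry the same meta.xml metadata: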
- https://raw.githubusercontent.com/sebkur/odftoolkit-samples/master/samples/src/main/resources/main.odt
- https://raw.githubusercontent.com/sebkur/odftoolkit-samples/master/samples/src/main/resources/letter.odt
These files serve as examples for testing and can be used with the code provided below.
- HTML JavaScript for Drag-and-Drop .ODF File Property Dump
This section describes a self-contained HTML page with embedded JavaScript suitable for embedding in a Ghost blog (or any HTML environment). It enables users to drag and drop a .ODF file, unzips it using JSZip (loaded from a CDN), parses the meta.xml file, extracts the metadata properties listed above, and displays them on the screen. The FileReader API itself does not require HTTPS, but if the page is served over HTTPS, the JSZip script must also be loaded over HTTPS to avoid mixed-content blocking.
- Python Class for .ODF File Handling
The following Python class opens a .ODF file, decodes its ZIP structure, reads and writes metadata properties from meta.xml, and prints them to the console. It uses standard libraries zipfile and xml.etree.ElementTree.
import os
import tempfile
import zipfile
import xml.etree.ElementTree as ET
from io import BytesIO

# Namespace URIs used in meta.xml, keyed by their conventional prefixes.
NS = {
    'office': 'urn:oasis:names:tc:opendocument:xmlns:office:1.0',
    'meta': 'urn:oasis:names:tc:opendocument:xmlns:meta:1.0',
    'dc': 'http://purl.org/dc/elements/1.1/',
    'xlink': 'http://www.w3.org/1999/xlink',
}
# Register the prefixes so ElementTree does not rename them to ns0, ns1, ... on save.
# Namespaces present in the file but not listed here still receive generated prefixes.
for _prefix, _uri in NS.items():
    ET.register_namespace(_prefix, _uri)

META_KEYS = ['generator', 'keyword', 'initial-creator', 'printed-by', 'creation-date',
             'print-date', 'template', 'auto-reload', 'hyperlink-behaviour',
             'editing-cycles', 'editing-duration', 'document-statistic']
DC_KEYS = ['title', 'description', 'subject', 'creator', 'date', 'language']


class ODFMetadataHandler:
    def __init__(self, filename):
        self.filename = filename
        # Read meta.xml once; the archive is reopened in save().
        with zipfile.ZipFile(filename, 'r') as package:
            with package.open('meta.xml') as f:
                self.tree = ET.parse(f)
        self.properties = self._extract_properties()

    def _meta(self):
        meta = self.tree.getroot().find('office:meta', NS)
        if meta is None:
            raise ValueError('No office:meta element found')
        return meta

    def _extract_properties(self):
        properties = {}
        meta = self._meta()
        for prefix, keys in (('meta', META_KEYS), ('dc', DC_KEYS)):
            for key in keys:
                for el in meta.findall(f'{prefix}:{key}', NS):
                    # Elements such as meta:document-statistic keep their data in attributes.
                    value = el.text or dict(el.attrib)
                    if key == 'keyword':
                        # meta:keyword may repeat, so collect every value.
                        properties.setdefault('meta:keyword', []).append(value)
                    else:
                        properties[f'{prefix}:{key}'] = value
        for ud in meta.findall('meta:user-defined', NS):
            name = ud.get(f'{{{NS["meta"]}}}name')
            properties[f'user-defined:{name}'] = ud.text
        return properties

    def print_properties(self):
        for key, value in self.properties.items():
            print(f"{key}: {value}")

    def write_property(self, key, value):
        # Keys without a prefix are treated as meta: elements.
        prefix, subkey = key.split(':', 1) if ':' in key else ('meta', key)
        meta = self._meta()
        element = meta.find(f'{prefix}:{subkey}', NS)
        if element is None:
            element = ET.SubElement(meta, f'{{{NS[prefix]}}}{subkey}')
        element.text = value
        self.properties[f'{prefix}:{subkey}'] = value

    def save(self, new_filename=None):
        target = new_filename or self.filename
        # Write to a temporary file first: the target may be the archive being read.
        fd, tmp_path = tempfile.mkstemp(suffix='.tmp', dir=os.path.dirname(os.path.abspath(target)))
        os.close(fd)
        try:
            with zipfile.ZipFile(self.filename, 'r') as src, \
                    zipfile.ZipFile(tmp_path, 'w', zipfile.ZIP_DEFLATED) as dst:
                for item in src.infolist():
                    if item.filename == 'meta.xml':
                        meta_bytes = BytesIO()
                        self.tree.write(meta_bytes, encoding='utf-8', xml_declaration=True)
                        dst.writestr('meta.xml', meta_bytes.getvalue())
                    else:
                        # Passing the ZipInfo keeps each entry's name, order, and compression,
                        # so the mimetype entry stays first and uncompressed.
                        dst.writestr(item, src.read(item.filename))
            os.replace(tmp_path, target)
        except Exception:
            if os.path.exists(tmp_path):
                os.remove(tmp_path)
            raise
To use: handler = ODFMetadataHandler('sample.odf'); handler.print_properties(); handler.write_property('dc:title', 'New Title'); handler.save()
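To try the usage line without downloading anything, a small fixture file can be generated locally. The sketch below writes a minimal package containing only mimetype, meta.xml, and META-INF/manifest.xml. It is meant as test input for metadata code, not as a complete formula document (there is no content.xml, so office applications are not expected to open it). The function name `write_sample`, the generator string, and the title are made up for this example.

```python
import os
import tempfile
import zipfile

MIMETYPE = "application/vnd.oasis.opendocument.formula"

# Hypothetical metadata used only as test input.
META_XML = """<?xml version="1.0" encoding="UTF-8"?>
<office:document-meta
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    office:version="1.3">
  <office:meta>
    <meta:generator>ExampleGenerator/0.1</meta:generator>
    <dc:title>Sample formula</dc:title>
  </office:meta>
</office:document-meta>
"""

MANIFEST_XML = """<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest
    xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
    manifest:version="1.3">
  <manifest:file-entry manifest:full-path="/"
      manifest:media-type="application/vnd.oasis.opendocument.formula"/>
  <manifest:file-entry manifest:full-path="meta.xml" manifest:media-type="text/xml"/>
</manifest:manifest>
"""


def write_sample(path):
    """Write a minimal metadata-only test package to the given path."""
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as package:
        # The mimetype entry goes first and is stored without compression.
        package.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        package.writestr("meta.xml", META_XML)
        package.writestr("META-INF/manifest.xml", MANIFEST_XML)


sample_path = os.path.join(tempfile.mkdtemp(), "sample.odf")
write_sample(sample_path)

with zipfile.ZipFile(sample_path) as package:
    first_entry = package.infolist()[0]
    meta_bytes = package.read("meta.xml")

print(first_entry.filename, first_entry.compress_type == zipfile.ZIP_STORED)
```

The resulting `sample_path` can then be passed to the handler class in place of 'sample.odf'.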
- Java Class for .ODF File Handling
The following Java class performs similar operations using java.util.zip for ZIP handling and javax.xml.parsers for XML parsing.
import java.io.*;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.Enumeration;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.*;
import javax.xml.parsers.*;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.*;

public class ODFMetadataHandler {
    private static final String OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0";
    private static final String META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0";
    private static final String DC_NS = "http://purl.org/dc/elements/1.1/";

    private final String filename;
    private final Document document;
    private final Map<String, String> properties;

    public ODFMetadataHandler(String filename) throws Exception {
        this.filename = filename;
        this.document = parseMeta();
        this.properties = extractProperties();
    }

    // Reads meta.xml once; the archive is reopened in save().
    private Document parseMeta() throws Exception {
        try (ZipFile zip = new ZipFile(filename)) {
            ZipEntry entry = zip.getEntry("meta.xml");
            if (entry == null) throw new IOException("No meta.xml found");
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            try (InputStream is = zip.getInputStream(entry)) {
                return dbf.newDocumentBuilder().parse(is);
            }
        }
    }

    private Element metaElement() {
        Node meta = document.getElementsByTagNameNS(OFFICE_NS, "meta").item(0);
        if (meta == null) throw new RuntimeException("No meta element found");
        return (Element) meta;
    }

    // Returns the element text, or its attributes when the element has no text
    // (meta:document-statistic keeps its data in attributes).
    private static String valueOf(Element el) {
        String text = el.getTextContent();
        if (!text.isEmpty()) return text;
        StringBuilder sb = new StringBuilder();
        NamedNodeMap attrs = el.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            if (i > 0) sb.append(", ");
            sb.append(attrs.item(i).getNodeName()).append('=').append(attrs.item(i).getNodeValue());
        }
        return sb.toString();
    }

    private void collect(Map<String, String> props, Element meta, String ns, String prefix, String[] keys) {
        for (String key : keys) {
            NodeList nodes = meta.getElementsByTagNameNS(ns, key);
            for (int i = 0; i < nodes.getLength(); i++) {
                // Repeated elements such as meta:keyword are joined instead of overwritten.
                props.merge(prefix + ":" + key, valueOf((Element) nodes.item(i)), (a, b) -> a + ", " + b);
            }
        }
    }

    private Map<String, String> extractProperties() {
        Map<String, String> props = new HashMap<>();
        Element meta = metaElement();
        String[] metaKeys = {"generator", "keyword", "initial-creator", "printed-by", "creation-date", "print-date", "template", "auto-reload", "hyperlink-behaviour", "editing-cycles", "editing-duration", "document-statistic"};
        String[] dcKeys = {"title", "description", "subject", "creator", "date", "language"};
        collect(props, meta, META_NS, "meta", metaKeys);
        collect(props, meta, DC_NS, "dc", dcKeys);
        NodeList userDefined = meta.getElementsByTagNameNS(META_NS, "user-defined");
        for (int i = 0; i < userDefined.getLength(); i++) {
            Element el = (Element) userDefined.item(i);
            props.put("user-defined:" + el.getAttributeNS(META_NS, "name"), el.getTextContent());
        }
        return props;
    }

    public void printProperties() {
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }

    public void writeProperty(String key, String value) {
        String[] parts = key.split(":", 2);
        if (parts.length != 2) throw new IllegalArgumentException("Expected prefix:name, got " + key);
        String namespace = parts[0].equals("dc") ? DC_NS : META_NS;
        Element meta = metaElement();
        NodeList nodes = meta.getElementsByTagNameNS(namespace, parts[1]);
        Element element;
        if (nodes.getLength() == 0) {
            element = document.createElementNS(namespace, key);
            meta.appendChild(element);
        } else {
            element = (Element) nodes.item(0);
        }
        element.setTextContent(value);
        properties.put(key, value);
    }

    public void save(String newFilename) throws Exception {
        Path target = Paths.get(newFilename == null ? filename : newFilename);
        // Write to a temporary file first, because the target may be the archive being read.
        Path tmp = Files.createTempFile(target.toAbsolutePath().getParent(), "odf", ".tmp");
        try (ZipFile zip = new ZipFile(filename);
             ZipOutputStream zos = new ZipOutputStream(Files.newOutputStream(tmp))) {
            // ZipFile.entries() returns an Enumeration, which cannot be used in a for-each loop.
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                ZipEntry out = new ZipEntry(entry.getName());
                if (entry.getName().equals("meta.xml")) {
                    ByteArrayOutputStream baos = new ByteArrayOutputStream();
                    TransformerFactory.newInstance().newTransformer()
                        .transform(new DOMSource(document), new StreamResult(baos));
                    zos.putNextEntry(out);
                    zos.write(baos.toByteArray());
                } else {
                    if (entry.getMethod() == ZipEntry.STORED) {
                        // Stored entries (such as mimetype) need size and CRC set up front.
                        out.setMethod(ZipEntry.STORED);
                        out.setSize(entry.getSize());
                        out.setCompressedSize(entry.getSize());
                        out.setCrc(entry.getCrc());
                    }
                    zos.putNextEntry(out);
                    try (InputStream is = zip.getInputStream(entry)) {
                        byte[] buffer = new byte[8192];
                        int len;
                        while ((len = is.read(buffer)) > 0) {
                            zos.write(buffer, 0, len);
                        }
                    }
                }
                zos.closeEntry();
            }
        } catch (Exception e) {
            Files.deleteIfExists(tmp);
            throw e;
        }
        Files.move(tmp, target, StandardCopyOption.REPLACE_EXISTING);
    }
}
To use: ODFMetadataHandler handler = new ODFMetadataHandler("sample.odf"); handler.printProperties(); handler.writeProperty("dc:title", "New Title"); handler.save(null);
- JavaScript Class for .ODF File Handling
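The following JavaScript class is designed for Node.js. It uses the jszip library for ZIP handling, the xmldom library (or its maintained fork @xmldom/xmldom) for XML parsing and serialization, and fs for file operations (install via npm install jszip xmldom). It opens, reads, writes, and prints properties.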
const fs = require('fs');
const JSZip = require('jszip');
// 'xmldom' supplies DOMParser and XMLSerializer for Node.js.
const { DOMParser, XMLSerializer } = require('xmldom');

// Namespace URIs used in meta.xml.
const NS = {
    office: 'urn:oasis:names:tc:opendocument:xmlns:office:1.0',
    meta: 'urn:oasis:names:tc:opendocument:xmlns:meta:1.0',
    dc: 'http://purl.org/dc/elements/1.1/'
};
const META_KEYS = ['generator', 'keyword', 'initial-creator', 'printed-by', 'creation-date', 'print-date', 'template', 'auto-reload', 'hyperlink-behaviour', 'editing-cycles', 'editing-duration', 'document-statistic'];
const DC_KEYS = ['title', 'description', 'subject', 'creator', 'date', 'language'];

class ODFMetadataHandler {
    constructor(filename) {
        this.filename = filename;
        // The whole archive is kept in memory; writeProperty() replaces this buffer.
        this.zipData = fs.readFileSync(filename);
        this.properties = {};
    }

    // Loads the archive and returns the zip object, the parsed meta.xml, and <office:meta>.
    async _loadMeta() {
        const zip = await JSZip.loadAsync(this.zipData);
        const entry = zip.file('meta.xml');
        if (!entry) throw new Error('No meta.xml found');
        const xmlDoc = new DOMParser().parseFromString(await entry.async('string'), 'application/xml');
        const meta = xmlDoc.getElementsByTagNameNS(NS.office, 'meta')[0];
        if (!meta) throw new Error('No meta element found');
        return { zip, xmlDoc, meta };
    }

    async readProperties() {
        const { meta } = await this._loadMeta();
        this.properties = {};
        for (const [prefix, keys] of [['meta', META_KEYS], ['dc', DC_KEYS]]) {
            for (const key of keys) {
                const elements = meta.getElementsByTagNameNS(NS[prefix], key);
                for (let i = 0; i < elements.length; i++) {
                    const name = `${prefix}:${key}`;
                    const text = elements[i].textContent;
                    // Repeated elements such as meta:keyword are joined instead of overwritten.
                    this.properties[name] = name in this.properties ? `${this.properties[name]}, ${text}` : text;
                }
            }
        }
        const userDefined = meta.getElementsByTagNameNS(NS.meta, 'user-defined');
        for (let i = 0; i < userDefined.length; i++) {
            const name = userDefined[i].getAttributeNS(NS.meta, 'name');
            this.properties[`user-defined:${name}`] = userDefined[i].textContent;
        }
        return this.properties;
    }

    printProperties() {
        for (const [key, value] of Object.entries(this.properties)) {
            console.log(`${key}: ${value}`);
        }
    }

    async writeProperty(key, value) {
        const { zip, xmlDoc, meta } = await this._loadMeta();
        const [prefix, subkey] = key.split(':');
        const ns = prefix === 'dc' ? NS.dc : NS.meta;
        let element = meta.getElementsByTagNameNS(ns, subkey)[0];
        if (!element) {
            element = xmlDoc.createElementNS(ns, `${prefix}:${subkey}`);
            meta.appendChild(element);
        }
        element.textContent = value;
        // Replace meta.xml in the archive and regenerate the in-memory buffer.
        zip.file('meta.xml', new XMLSerializer().serializeToString(xmlDoc));
        this.zipData = await zip.generateAsync({ type: 'nodebuffer' });
        this.properties[key] = value;
    }

    save(newFilename = null) {
        if (!newFilename) newFilename = this.filename;
        fs.writeFileSync(newFilename, this.zipData);
    }
}
To use (inside an async function, because the CommonJS require calls above rule out top-level await): const handler = new ODFMetadataHandler('sample.odf'); await handler.readProperties(); handler.printProperties(); await handler.writeProperty('dc:title', 'New Title'); handler.save();
- C++ Class for .ODF File Handling
The following C++ class requires external libraries libzip (for ZIP handling) and tinyxml2 (for XML parsing). Compile with -lzip -ltinyxml2. It opens, reads, writes, and prints properties.
#include <iostream>
#include <map>
#include <stdexcept>
#include <string>
#include <zip.h>
#include <tinyxml2.h>

// tinyxml2 is not namespace-aware, so this class matches the conventional
// prefixes (office:, meta:, dc:) literally.
class ODFMetadataHandler {
private:
    std::string filename;
    zip_t* zip;
    tinyxml2::XMLDocument doc;
    std::map<std::string, std::string> properties;

    tinyxml2::XMLElement* metaElement() {
        tinyxml2::XMLElement* root = doc.FirstChildElement("office:document-meta");
        tinyxml2::XMLElement* meta = root ? root->FirstChildElement("office:meta") : nullptr;
        if (!meta) throw std::runtime_error("No meta element found");
        return meta;
    }

    void parseMeta() {
        zip_file_t* file = zip_fopen(zip, "meta.xml", 0);
        if (!file) throw std::runtime_error("No meta.xml found");
        std::string xmlContent;
        char buffer[1024];
        zip_int64_t len;
        while ((len = zip_fread(file, buffer, sizeof(buffer))) > 0) {
            xmlContent.append(buffer, static_cast<size_t>(len));
        }
        zip_fclose(file);
        if (doc.Parse(xmlContent.c_str(), xmlContent.size()) != tinyxml2::XML_SUCCESS) {
            throw std::runtime_error("Failed to parse meta.xml");
        }
    }

    void extractProperties() {
        tinyxml2::XMLElement* meta = metaElement();
        const char* keys[] = {"meta:generator", "meta:keyword", "meta:initial-creator", "meta:printed-by", "meta:creation-date", "meta:print-date", "meta:template", "meta:auto-reload", "meta:hyperlink-behaviour", "meta:editing-cycles", "meta:editing-duration", "meta:document-statistic", "dc:title", "dc:description", "dc:subject", "dc:creator", "dc:date", "dc:language"};
        for (const char* key : keys) {
            // For repeated elements such as meta:keyword, the last value wins.
            for (tinyxml2::XMLElement* el = meta->FirstChildElement(key); el; el = el->NextSiblingElement(key)) {
                properties[key] = el->GetText() ? el->GetText() : "";
            }
        }
        for (tinyxml2::XMLElement* ud = meta->FirstChildElement("meta:user-defined"); ud; ud = ud->NextSiblingElement("meta:user-defined")) {
            // Attribute() returns nullptr when meta:name is absent.
            const char* name = ud->Attribute("meta:name");
            properties[std::string("user-defined:") + (name ? name : "")] = ud->GetText() ? ud->GetText() : "";
        }
    }

public:
    ODFMetadataHandler(const std::string& fn) : filename(fn), zip(nullptr) {
        int err = 0;
        zip = zip_open(filename.c_str(), ZIP_RDONLY, &err);
        if (!zip) throw std::runtime_error("Failed to open ZIP");
        try {
            parseMeta();
            extractProperties();
        } catch (...) {
            zip_discard(zip);  // the destructor does not run if the constructor throws
            throw;
        }
    }

    // The class owns a raw zip_t*, so copying is disabled.
    ODFMetadataHandler(const ODFMetadataHandler&) = delete;
    ODFMetadataHandler& operator=(const ODFMetadataHandler&) = delete;

    ~ODFMetadataHandler() {
        zip_discard(zip);
    }

    void printProperties() {
        for (const auto& pair : properties) {
            std::cout << pair.first << ": " << pair.second << std::endl;
        }
    }

    void writeProperty(const std::string& key, const std::string& value) {
        tinyxml2::XMLElement* meta = metaElement();
        tinyxml2::XMLElement* el = meta->FirstChildElement(key.c_str());
        if (!el) {
            el = doc.NewElement(key.c_str());
            meta->InsertEndChild(el);
        }
        el->SetText(value.c_str());
        properties[key] = value;
    }

    void save(const std::string& newFilename = "") {
        std::string outFn = newFilename.empty() ? filename : newFilename;
        int err = 0;
        zip_t* newZip = zip_open(outFn.c_str(), ZIP_CREATE | ZIP_TRUNCATE, &err);
        if (!newZip) throw std::runtime_error("Failed to create new ZIP");
        // The buffer passed to zip_source_buffer() is read when zip_close() runs,
        // so the printer must stay alive until then.
        tinyxml2::XMLPrinter printer;
        doc.Print(&printer);
        zip_int64_t numEntries = zip_get_num_entries(zip, 0);
        for (zip_int64_t i = 0; i < numEntries; ++i) {
            const char* name = zip_get_name(zip, static_cast<zip_uint64_t>(i), 0);
            zip_source_t* src = nullptr;
            if (name && std::string(name) == "meta.xml") {
                src = zip_source_buffer(newZip, printer.CStr(), static_cast<zip_uint64_t>(printer.CStrSize() - 1), 0);
            } else if (name) {
                src = zip_source_zip(newZip, zip, static_cast<zip_uint64_t>(i), 0, 0, -1);
            }
            if (!src || zip_file_add(newZip, name, src, ZIP_FL_ENC_UTF_8) < 0) {
                if (src) zip_source_free(src);
                zip_discard(newZip);
                throw std::runtime_error("Failed to add file to new ZIP");
            }
        }
        if (zip_close(newZip) < 0) {
            zip_discard(newZip);
            throw std::runtime_error("Failed to write new ZIP");
        }
    }
};
To use: ODFMetadataHandler handler("sample.odf"); handler.printProperties(); handler.writeProperty("dc:title", "New Title"); handler.save();