Task 472: .ODF File Format

Specifications for the .ODF File Format

The .ODF file format refers to the OpenDocument Format for Office Applications, an XML-based open standard for office documents developed by the OASIS consortium and published as ISO/IEC 26300. It covers several document types, including formulas (the .odf extension denotes OpenDocument Formula files), and is packaged as a ZIP archive containing XML files for content, metadata, styles, and other components. The complete specification is the OASIS OpenDocument Version 1.3 documentation, available at https://docs.oasis-open.org/office/OpenDocument/v1.3/OpenDocument-v1.3.html.
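The ZIP-plus-XML packaging described above can be inspected with a few lines of Python. The sketch below is illustrative only: the helper names describe_package and build_demo_package are hypothetical, and the in-memory archive is a stand-in rather than a valid formula document.

```python
import io
import zipfile


def describe_package(data: bytes) -> dict:
    """Return the declared media type and the member names of an ODF package."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = zf.namelist()
        # The "mimetype" member carries the media type as plain ASCII text.
        mimetype = None
        if "mimetype" in names:
            mimetype = zf.read("mimetype").decode("ascii").strip()
    return {"mimetype": mimetype, "members": names}


def build_demo_package() -> bytes:
    """Build a tiny stand-in package in memory (not a valid formula document)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("mimetype", "application/vnd.oasis.opendocument.formula")
        zf.writestr("content.xml", "<math/>")
        zf.writestr("meta.xml", "<meta/>")
    return buf.getvalue()


if __name__ == "__main__":
    print(describe_package(build_demo_package()))
```

For a file on disk, the same function could be called with the bytes read from that file.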

  1. List of Properties Intrinsic to the File Format

The properties intrinsic to the .ODF file format, particularly those related to its internal structure and metadata, are stored within the ZIP package, primarily in the meta.xml file under the <office:meta> element. These metadata properties are common across OpenDocument types, including formula documents, and include predefined elements from namespaces such as meta: (OpenDocument metadata) and dc: (Dublin Core). The following table enumerates all such properties, including their namespace-qualified names, attributes (if applicable), and descriptions:

Property Name | Attributes | Description
meta:generator | None | Identifies the software and version that generated the document.
dc:title | None | Specifies the title of the document.
dc:description | None | Provides a textual summary or abstract of the document's content.
dc:subject | None | Indicates the main topic or subject area of the document.
meta:keyword | None | Holds one keyword associated with the document (multiple instances permitted).
meta:initial-creator | None | Names the person or entity who initially created the document.
dc:creator | None | Names the person or entity who last modified the document.
meta:printed-by | None | Records the person or entity who last printed the document.
meta:creation-date | None | Records the date and time of document creation in ISO 8601 format.
dc:date | None | Records the date and time of the last modification in ISO 8601 format.
meta:print-date | None | Records the date and time of the last print action in ISO 8601 format.
meta:template | xlink:href, xlink:title, meta:date | Identifies the template used to create the document: its location, its name, and when it was last modified.
meta:auto-reload | xlink:href, meta:delay | Specifies whether the document is reloaded, or replaced by another document, after the given delay.
meta:hyperlink-behaviour | office:target-frame-name, xlink:show | Defines the default handling of hyperlinks, such as the target frame and display mode.
dc:language | None | Specifies the primary language of the document as a language tag (e.g., "en-US"; RFC 5646 obsoletes RFC 4646).
meta:editing-cycles | None | Counts the number of editing or save cycles as an integer.
meta:editing-duration | None | Tracks the total editing time as a duration in ISO 8601 format (e.g., "PT1H30M").
meta:document-statistic | meta:page-count, meta:word-count, meta:character-count, and similar counters | Provides counts for document elements; the values are carried in attributes, not in element text.
meta:user-defined | meta:name (required for identification), meta:value-type | Allows custom metadata properties as name-value pairs, supporting types such as string, date, or float.

These properties are extracted from the OpenDocument schema specification.
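Two of the properties above need a little decoding before they are useful: meta:editing-duration is an ISO 8601 duration, and meta:document-statistic carries its counts in attributes. A minimal sketch, assuming a hand-written sample meta.xml; the names parse_duration, read_statistics, and SAMPLE are hypothetical, and only the day/hour/minute/second subset of ISO 8601 durations is handled:

```python
import re
import xml.etree.ElementTree as ET
from datetime import timedelta

META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0"
NS = {"meta": META_NS}

# Day/hour/minute/second subset only; years and months are not supported here.
_DURATION = re.compile(
    r"^P(?:(\d+)D)?(?:T(?:(\d+)H)?(?:(\d+)M)?(?:(\d+(?:\.\d+)?)S)?)?$"
)


def parse_duration(text: str) -> timedelta:
    """Convert a duration such as 'PT1H30M' into a timedelta."""
    match = _DURATION.match(text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    days, hours, minutes, seconds = (float(g) if g else 0.0 for g in match.groups())
    return timedelta(days=days, hours=hours, minutes=minutes, seconds=seconds)


def read_statistics(meta_xml: str) -> dict:
    """Return the meta:* counters of meta:document-statistic as integers."""
    root = ET.fromstring(meta_xml)
    stat = root.find(".//meta:document-statistic", NS)
    if stat is None:
        return {}
    prefix = "{" + META_NS + "}"
    # ElementTree exposes namespaced attributes as '{namespace}local-name'.
    return {
        name[len(prefix):]: int(value)
        for name, value in stat.attrib.items()
        if name.startswith(prefix)
    }


# Hand-written sample for illustration, not taken from a real document.
SAMPLE = """<office:document-meta
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0">
  <office:meta>
    <meta:editing-duration>PT1H30M</meta:editing-duration>
    <meta:document-statistic meta:page-count="2" meta:word-count="350"/>
  </office:meta>
</office:document-meta>"""

if __name__ == "__main__":
    print(parse_duration("PT1H30M"))
    print(read_statistics(SAMPLE))
```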

  2. Two Direct Download Links for .ODF Files

Based on available resources, the following direct download links point to sample OpenDocument files (.odt extension, which adheres to the .ODF format standard):

These files serve as examples for testing and can be used with the code provided below.

  3. HTML JavaScript for Drag-and-Drop .ODF File Property Dump

The following is a self-contained HTML page with embedded JavaScript, suitable for embedding in a Ghost blog (or any HTML environment). It lets users drag and drop a .ODF file, unzips it with JSZip (loaded from a CDN), parses meta.xml, extracts the metadata properties listed above, and displays them on screen. The FileReader API itself does not require HTTPS, but serving the page over HTTPS avoids mixed-content blocking of the CDN script.

ODF Property Dumper
Drag and drop a .ODF file here

    

  4. Python Class for .ODF File Handling

The following Python class opens a .ODF file, decodes its ZIP structure, reads and writes metadata properties from meta.xml, and prints them to the console. It uses standard libraries zipfile and xml.etree.ElementTree.

import zipfile
import xml.etree.ElementTree as ET
from io import BytesIO

class ODFMetadataHandler:
    def __init__(self, filename):
        self.filename = filename
        self.zip = zipfile.ZipFile(filename, 'r')
        self.tree = self._parse_meta()
        self.properties = self._extract_properties()

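    def _parse_meta(self):
        # Register the common prefixes so ElementTree keeps them on save
        # instead of rewriting them as ns0, ns1, ...
        ET.register_namespace('office', 'urn:oasis:names:tc:opendocument:xmlns:office:1.0')
        ET.register_namespace('meta', 'urn:oasis:names:tc:opendocument:xmlns:meta:1.0')
        ET.register_namespace('dc', 'http://purl.org/dc/elements/1.1/')
        with self.zip.open('meta.xml') as f:
            return ET.parse(f)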

    def _extract_properties(self):
        properties = {}
        meta = self.tree.getroot().find('{urn:oasis:names:tc:opendocument:xmlns:office:1.0}meta')
        if meta is None:
            raise ValueError('No meta element found')
        
        ns = {
            'meta': 'urn:oasis:names:tc:opendocument:xmlns:meta:1.0',
            'dc': 'http://purl.org/dc/elements/1.1/'
        }
        
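        for key in ['generator', 'keyword', 'initial-creator', 'printed-by', 'creation-date', 'print-date', 'template', 'auto-reload', 'hyperlink-behaviour', 'editing-cycles', 'editing-duration', 'document-statistic']:
            # Attribute-only elements (e.g. meta:document-statistic) have no text, so fall back to their attributes
            values = [el.text or dict(el.attrib) for el in meta.findall(f'meta:{key}', ns)]
            if values:
                # meta:keyword may repeat; keep every value instead of overwriting earlier ones
                properties[f'meta:{key}'] = values if len(values) > 1 else values[0]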
        
        for key in ['title', 'description', 'subject', 'creator', 'date', 'language']:
            elements = meta.findall(f'dc:{key}', ns)
            for el in elements:
                properties[f'dc:{key}'] = el.text
        
        user_defined = meta.findall('meta:user-defined', ns)
        for ud in user_defined:
            name = ud.get(f'{{{ns["meta"]}}}name')
            properties[f'user-defined:{name}'] = ud.text
        
        return properties

    def print_properties(self):
        for key, value in self.properties.items():
            print(f"{key}: {value}")

    def write_property(self, key, value):
        ns = {
            'meta': 'urn:oasis:names:tc:opendocument:xmlns:meta:1.0',
            'dc': 'http://purl.org/dc/elements/1.1/'
        }
        meta = self.tree.getroot().find('{urn:oasis:names:tc:opendocument:xmlns:office:1.0}meta')
        
        prefix, subkey = key.split(':') if ':' in key else ('meta', key)
        namespace = ns[prefix]
        
        element = meta.find(f'{prefix}:{subkey}', ns)
        if element is None:
            ET.SubElement(meta, f'{{{namespace}}}{subkey}').text = value
        else:
            element.text = value

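    def save(self, new_filename=None):
        if new_filename is None:
            new_filename = self.filename
        # Read every entry before writing: opening the same path in 'w' mode
        # truncates the archive that self.zip is still reading from.
        entries = [(item, self.zip.read(item.filename)) for item in self.zip.infolist()]
        self.zip.close()
        meta_bytes = BytesIO()
        self.tree.write(meta_bytes, encoding='utf-8', xml_declaration=True)
        with zipfile.ZipFile(new_filename, 'w') as new_zip:
            for item, data in entries:
                if item.filename == 'meta.xml':
                    data = meta_bytes.getvalue()
                # Passing the original ZipInfo keeps each entry's order and compression method
                new_zip.writestr(item, data)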

To use: handler = ODFMetadataHandler('sample.odf'); handler.print_properties(); handler.write_property('dc:title', 'New Title'); handler.save()
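When a package is rebuilt from scratch rather than copied entry by entry, the OpenDocument package rules expect the mimetype member to come first and to be stored without compression. A minimal sketch of that ordering, assuming the member payloads are already in memory; write_package is a hypothetical helper and the payloads are placeholders:

```python
import io
import zipfile


def write_package(members: dict, mimetype: str) -> bytes:
    """Write a ZIP with 'mimetype' first and uncompressed, other members deflated."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        # Stored (uncompressed) and written before anything else.
        zf.writestr("mimetype", mimetype, compress_type=zipfile.ZIP_STORED)
        for name, payload in members.items():
            if name != "mimetype":
                zf.writestr(name, payload)
    return buf.getvalue()


if __name__ == "__main__":
    data = write_package(
        {"content.xml": "<math/>", "meta.xml": "<meta/>"},  # placeholder payloads
        "application/vnd.oasis.opendocument.formula",
    )
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            print(info.filename, info.compress_type)
```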

  5. Java Class for .ODF File Handling

The following Java class performs similar operations using java.util.zip for ZIP handling and javax.xml.parsers for XML parsing.

import java.io.*;
import java.util.zip.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.InputSource;
import java.util.HashMap;
import java.util.Map;

public class ODFMetadataHandler {
    private String filename;
    private ZipFile zip;
    private Document document;
    private Map<String, String> properties;

    public ODFMetadataHandler(String filename) throws Exception {
        this.filename = filename;
        this.zip = new ZipFile(filename);
        this.document = parseMeta();
        this.properties = extractProperties();
    }

    private Document parseMeta() throws Exception {
        ZipEntry entry = zip.getEntry("meta.xml");
        InputStream is = zip.getInputStream(entry);
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        DocumentBuilder db = dbf.newDocumentBuilder();
        return db.parse(is);
    }

    private Map<String, String> extractProperties() {
        Map<String, String> props = new HashMap<>();
        Node meta = document.getElementsByTagNameNS("urn:oasis:names:tc:opendocument:xmlns:office:1.0", "meta").item(0);
        if (meta == null) throw new RuntimeException("No meta element found");

        String[] metaKeys = {"generator", "keyword", "initial-creator", "printed-by", "creation-date", "print-date", "template", "auto-reload", "hyperlink-behaviour", "editing-cycles", "editing-duration", "document-statistic"};
        for (String key : metaKeys) {
            NodeList nodes = ((Element) meta).getElementsByTagNameNS("urn:oasis:names:tc:opendocument:xmlns:meta:1.0", key);
            for (int i = 0; i < nodes.getLength(); i++) {
                props.put("meta:" + key, nodes.item(i).getTextContent());
            }
        }

        String[] dcKeys = {"title", "description", "subject", "creator", "date", "language"};
        for (String key : dcKeys) {
            NodeList nodes = ((Element) meta).getElementsByTagNameNS("http://purl.org/dc/elements/1.1/", key);
            for (int i = 0; i < nodes.getLength(); i++) {
                props.put("dc:" + key, nodes.item(i).getTextContent());
            }
        }

        NodeList userDefined = ((Element) meta).getElementsByTagNameNS("urn:oasis:names:tc:opendocument:xmlns:meta:1.0", "user-defined");
        for (int i = 0; i < userDefined.getLength(); i++) {
            Element el = (Element) userDefined.item(i);
            String name = el.getAttributeNS("urn:oasis:names:tc:opendocument:xmlns:meta:1.0", "name");
            props.put("user-defined:" + name, el.getTextContent());
        }

        return props;
    }

    public void printProperties() {
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }

    public void writeProperty(String key, String value) {
        String[] parts = key.split(":");
        String prefix = parts[0];
        String subkey = parts[1];
        String namespace = prefix.equals("dc") ? "http://purl.org/dc/elements/1.1/" : "urn:oasis:names:tc:opendocument:xmlns:meta:1.0";

        Node meta = document.getElementsByTagNameNS("urn:oasis:names:tc:opendocument:xmlns:office:1.0", "meta").item(0);
        NodeList nodes = ((Element) meta).getElementsByTagNameNS(namespace, subkey);
        Element element;
        if (nodes.getLength() == 0) {
            element = document.createElementNS(namespace, prefix + ":" + subkey);
            meta.appendChild(element);
        } else {
            element = (Element) nodes.item(0);
        }
        element.setTextContent(value);
    }

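    public void save(String newFilename) throws Exception {
        if (newFilename == null) newFilename = filename;
        // Write to a temporary file first: the source archive is still open for reading,
        // so writing straight to the same path would corrupt it.
        File tmp = File.createTempFile("odf", ".tmp");
        try (ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(tmp))) {
            // ZipFile.entries() returns an Enumeration, which cannot drive a for-each loop.
            java.util.Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                boolean isMeta = entry.getName().equals("meta.xml");
                // A fresh ZipEntry avoids compressed-size mismatches when data is recompressed.
                ZipEntry newEntry = new ZipEntry(entry.getName());
                if (!isMeta && entry.getMethod() == ZipEntry.STORED) {
                    // Keep stored entries (such as mimetype) uncompressed.
                    newEntry.setMethod(ZipEntry.STORED);
                    newEntry.setSize(entry.getSize());
                    newEntry.setCompressedSize(entry.getSize());
                    newEntry.setCrc(entry.getCrc());
                }
                zos.putNextEntry(newEntry);
                if (isMeta) {
                    ByteArrayOutputStream baos = new ByteArrayOutputStream();
                    // Fully qualified because javax.xml.transform is not imported above.
                    javax.xml.transform.TransformerFactory.newInstance().newTransformer().transform(
                        new javax.xml.transform.dom.DOMSource(document),
                        new javax.xml.transform.stream.StreamResult(baos));
                    zos.write(baos.toByteArray());
                } else {
                    try (InputStream is = zip.getInputStream(entry)) {
                        byte[] buffer = new byte[8192];
                        int len;
                        while ((len = is.read(buffer)) > 0) {
                            zos.write(buffer, 0, len);
                        }
                    }
                }
                zos.closeEntry();
            }
        }
        zip.close();
        java.nio.file.Files.move(tmp.toPath(), java.nio.file.Paths.get(newFilename),
            java.nio.file.StandardCopyOption.REPLACE_EXISTING);
    }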
}

To use: ODFMetadataHandler handler = new ODFMetadataHandler("sample.odf"); handler.printProperties(); handler.writeProperty("dc:title", "New Title"); handler.save(null);

  6. JavaScript Class for .ODF File Handling

The following JavaScript class is designed for Node.js. It uses the jszip and xmldom libraries (install via npm install jszip xmldom) and the built-in fs module for file operations. It opens the file, then reads, writes, and prints properties.

const fs = require('fs');
const JSZip = require('jszip');
const { DOMParser, XMLSerializer } = require('xmldom');

class ODFMetadataHandler {
    constructor(filename) {
        this.filename = filename;
        this.zipData = fs.readFileSync(filename);
        this.properties = {};
    }

    async readProperties() {
        const zip = await JSZip.loadAsync(this.zipData);
        const metaXml = await zip.file('meta.xml').async('string');
        const parser = new DOMParser();
        const xmlDoc = parser.parseFromString(metaXml, 'application/xml');
        const meta = xmlDoc.getElementsByTagNameNS('urn:oasis:names:tc:opendocument:xmlns:office:1.0', 'meta')[0];
        if (!meta) throw new Error('No meta element found');

        const ns = {
            meta: 'urn:oasis:names:tc:opendocument:xmlns:meta:1.0',
            dc: 'http://purl.org/dc/elements/1.1/'
        };

        ['generator', 'keyword', 'initial-creator', 'printed-by', 'creation-date', 'print-date', 'template', 'auto-reload', 'hyperlink-behaviour', 'editing-cycles', 'editing-duration', 'document-statistic'].forEach(key => {
            const elements = meta.getElementsByTagNameNS(ns.meta, key);
            for (let i = 0; i < elements.length; i++) {
                this.properties[`meta:${key}`] = elements[i].textContent;
            }
        });

        ['title', 'description', 'subject', 'creator', 'date', 'language'].forEach(key => {
            const elements = meta.getElementsByTagNameNS(ns.dc, key);
            for (let i = 0; i < elements.length; i++) {
                this.properties[`dc:${key}`] = elements[i].textContent;
            }
        });

        const userDefined = meta.getElementsByTagNameNS(ns.meta, 'user-defined');
        for (let i = 0; i < userDefined.length; i++) {
            const name = userDefined[i].getAttributeNS(ns.meta, 'name');
            this.properties[`user-defined:${name}`] = userDefined[i].textContent;
        }

        return this.properties;
    }

    printProperties() {
        for (const [key, value] of Object.entries(this.properties)) {
            console.log(`${key}: ${value}`);
        }
    }

    async writeProperty(key, value) {
        const zip = await JSZip.loadAsync(this.zipData);
        let metaXml = await zip.file('meta.xml').async('string');
        const parser = new DOMParser();
        const xmlDoc = parser.parseFromString(metaXml, 'application/xml');
        const meta = xmlDoc.getElementsByTagNameNS('urn:oasis:names:tc:opendocument:xmlns:office:1.0', 'meta')[0];

        const [prefix, subkey] = key.split(':');
        const ns = prefix === 'dc' ? 'http://purl.org/dc/elements/1.1/' : 'urn:oasis:names:tc:opendocument:xmlns:meta:1.0';

        let element = meta.getElementsByTagNameNS(ns, subkey)[0];
        if (!element) {
            element = xmlDoc.createElementNS(ns, `${prefix}:${subkey}`);
            meta.appendChild(element);
        }
        element.textContent = value;

        const serializer = new XMLSerializer();
        metaXml = serializer.serializeToString(xmlDoc);
        zip.file('meta.xml', metaXml);
        this.zipData = await zip.generateAsync({type: 'nodebuffer'});
    }

    save(newFilename = null) {
        if (!newFilename) newFilename = this.filename;
        fs.writeFileSync(newFilename, this.zipData);
    }
}

To use (await needs an async context in a CommonJS script): (async () => { const handler = new ODFMetadataHandler('sample.odf'); await handler.readProperties(); handler.printProperties(); await handler.writeProperty('dc:title', 'New Title'); handler.save(); })();

  7. C++ Class for .ODF File Handling

The following C++ class requires external libraries libzip (for ZIP handling) and tinyxml2 (for XML parsing). Compile with -lzip -ltinyxml2. It opens, reads, writes, and prints properties.

#include <iostream>
#include <map>
#include <stdexcept>  // std::runtime_error
#include <string>
#include <zip.h>
#include <tinyxml2.h>

class ODFMetadataHandler {
private:
    std::string filename;
    zip_t* zip;
    tinyxml2::XMLDocument doc;
    std::map<std::string, std::string> properties;

    void parseMeta() {
        zip_file_t* file = zip_fopen(zip, "meta.xml", 0);
        if (!file) throw std::runtime_error("No meta.xml found");

        std::string xmlContent;
        char buffer[1024];
        zip_int64_t len;
        while ((len = zip_fread(file, buffer, sizeof(buffer))) > 0) {
            xmlContent.append(buffer, len);
        }
        zip_fclose(file);

        doc.Parse(xmlContent.c_str());
    }

    void extractProperties() {
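        // Check the root first: chaining FirstChildElement on a null root would crash.
        tinyxml2::XMLElement* root = doc.FirstChildElement("office:document-meta");
        tinyxml2::XMLElement* meta = root ? root->FirstChildElement("office:meta") : nullptr;
        if (!meta) throw std::runtime_error("No meta element found");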

        const char* metaNS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0";
        const char* dcNS = "http://purl.org/dc/elements/1.1/";

        const char* metaKeys[] = {"meta:generator", "meta:keyword", "meta:initial-creator", "meta:printed-by", "meta:creation-date", "meta:print-date", "meta:template", "meta:auto-reload", "meta:hyperlink-behaviour", "meta:editing-cycles", "meta:editing-duration", "meta:document-statistic"};
        for (const char* key : metaKeys) {
            tinyxml2::XMLElement* el = meta->FirstChildElement(key);
            while (el) {
                properties[key] = el->GetText() ? el->GetText() : "";
                el = el->NextSiblingElement(key);
            }
        }

        const char* dcKeys[] = {"dc:title", "dc:description", "dc:subject", "dc:creator", "dc:date", "dc:language"};
        for (const char* key : dcKeys) {
            tinyxml2::XMLElement* el = meta->FirstChildElement(key);
            while (el) {
                properties[key] = el->GetText() ? el->GetText() : "";
                el = el->NextSiblingElement(key);
            }
        }

        tinyxml2::XMLElement* ud = meta->FirstChildElement("meta:user-defined");
        while (ud) {
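            // Attribute() returns nullptr when meta:name is absent; never build a std::string from nullptr.
            const char* attr = ud->Attribute("meta:name");
            std::string name = attr ? attr : "";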
            properties["user-defined:" + name] = ud->GetText() ? ud->GetText() : "";
            ud = ud->NextSiblingElement("meta:user-defined");
        }
    }

public:
    ODFMetadataHandler(const std::string& fn) : filename(fn) {
        int err = 0;
        zip = zip_open(filename.c_str(), ZIP_RDONLY, &err);
        if (!zip) throw std::runtime_error("Failed to open ZIP");
        parseMeta();
        extractProperties();
    }

    ~ODFMetadataHandler() {
        zip_close(zip);
    }

    void printProperties() {
        for (const auto& pair : properties) {
            std::cout << pair.first << ": " << pair.second << std::endl;
        }
    }

    void writeProperty(const std::string& key, const std::string& value) {
        tinyxml2::XMLElement* meta = doc.FirstChildElement("office:document-meta")->FirstChildElement("office:meta");
        tinyxml2::XMLElement* el = meta->FirstChildElement(key.c_str());
        if (!el) {
            el = doc.NewElement(key.c_str());
            meta->InsertEndChild(el);
        }
        el->SetText(value.c_str());
    }

    void save(const std::string& newFilename = "") {
        std::string outFn = newFilename.empty() ? filename : newFilename;
        zip_t* newZip = zip_open(outFn.c_str(), ZIP_CREATE | ZIP_TRUNCATE, nullptr);
        if (!newZip) throw std::runtime_error("Failed to create new ZIP");

        // libzip reads buffer sources lazily in zip_close(), so the serialized XML
        // must outlive the loop; a printer declared inside it would leave a dangling pointer.
        tinyxml2::XMLPrinter printer;
        doc.Print(&printer);

        zip_int64_t numEntries = zip_get_num_entries(zip, 0);
        for (zip_int64_t i = 0; i < numEntries; ++i) {
            const char* name = zip_get_name(zip, i, 0);
            zip_source_t* src;
            if (std::string(name) == "meta.xml") {
                src = zip_source_buffer(newZip, printer.CStr(), printer.CStrSize() - 1, 0);
            } else {
                src = zip_source_zip(newZip, zip, i, 0, 0, -1);
            }
            if (!src || zip_file_add(newZip, name, src, ZIP_FL_ENC_UTF_8) < 0) {
                if (src) zip_source_free(src);
                zip_discard(newZip);
                throw std::runtime_error("Failed to add file to new ZIP");
            }
        }
        if (zip_close(newZip) < 0) {
            zip_discard(newZip);
            throw std::runtime_error("Failed to write new ZIP");
        }
    }
};

To use: ODFMetadataHandler handler("sample.odf"); handler.printProperties(); handler.writeProperty("dc:title", "New Title"); handler.save();