Task 476: .ODT File Format
1. List of Properties Intrinsic to the .ODT File Format
The .ODT file format, part of the OpenDocument Format (ODF), is a ZIP archive containing XML files and other resources. The document-level metadata properties intrinsic to the format are stored in the meta.xml file within the archive. They use standardized elements from the Dublin Core namespace (dc:) and the OpenDocument meta namespace (meta:). The properties, with their qualified names and purposes, are:
- Title (dc:title): The title of the document, displayed in the title bar of applications.
- Subject (dc:subject): Keywords or phrases describing the document's topic.
- Description (dc:description): A textual summary or comments about the document.
- Creator (dc:creator): The name of the last person to edit the document.
- Date (dc:date): The date and time of the last modification, in ISO-8601 format.
- Language (dc:language): The language code of the document (e.g., en-US).
- Generator (meta:generator): The software or application that generated the document.
- Initial Creator (meta:initial-creator): The name of the person who created the document.
- Creation Date (meta:creation-date): The date and time of document creation, in ISO-8601 format.
- Keywords (meta:keyword): A keyword associated with the document; the element is repeated once per keyword.
- Editing Cycles (meta:editing-cycles): The number of times the document has been edited.
- Editing Duration (meta:editing-duration): The total time spent editing the document, in ISO-8601 duration format (e.g., PT1H28M55S).
- User-Defined Fields (meta:user-defined): Custom metadata fields, each with a meta:name attribute specifying the field title and text content for the value (multiple instances possible).
- Document Statistics (meta:document-statistic): A set of attributes providing counts, including:
  - meta:page-count: Number of pages.
  - meta:paragraph-count: Number of paragraphs.
  - meta:word-count: Number of words.
  - meta:character-count: Number of characters.
  - meta:table-count: Number of tables.
  - meta:image-count: Number of images.
  - meta:object-count: Number of embedded objects.
  - meta:frame-count: Number of frames.
  - meta:sentence-count: Number of sentences.
  - meta:syllable-count: Number of syllables.
  - meta:non-whitespace-character-count: Number of non-whitespace characters.
  - meta:draw-count: Number of drawing objects.
  - meta:ole-object-count: Number of OLE objects.
  - meta:row-count: Number of rows (applicable in table contexts).
  - meta:cell-count: Number of cells (applicable in table contexts).
These properties are stored in the meta.xml file and represent the core metadata intrinsic to the .ODT format's structure.
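To make the list above concrete, the following Python sketch parses a small, hand-written meta.xml fragment and pulls out a few of the listed properties. The sample content, the `read_sample` and `parse_duration` helper names, and the restriction to day/time durations are assumptions made for illustration; real files produced by an office suite carry more elements.

```python
import re
import xml.etree.ElementTree as ET
from datetime import timedelta

# Namespace prefixes mapped to the URIs used inside meta.xml.
NS = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
    "meta": "urn:oasis:names:tc:opendocument:xmlns:meta:1.0",
}

# Hypothetical sample document, written by hand for this example.
SAMPLE_META = """<office:document-meta
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0">
  <office:meta>
    <dc:title>Quarterly Report</dc:title>
    <meta:keyword>finance</meta:keyword>
    <meta:keyword>q3</meta:keyword>
    <meta:editing-duration>PT1H28M55S</meta:editing-duration>
    <meta:user-defined meta:name="Reviewer">A. Example</meta:user-defined>
    <meta:document-statistic meta:page-count="3" meta:word-count="1250"/>
  </office:meta>
</office:document-meta>"""

# Only the day/time subset of ISO-8601 durations is handled here (an assumption).
_DURATION = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)


def parse_duration(value):
    """Convert a duration such as 'PT1H28M55S' to a timedelta."""
    match = _DURATION.match(value)
    if match is None:
        raise ValueError(f"unsupported duration: {value!r}")
    parts = {name: float(num) for name, num in match.groupdict().items() if num}
    return timedelta(**parts)


def read_sample(xml_text):
    """Extract a handful of the listed properties from a meta.xml string."""
    meta = ET.fromstring(xml_text).find("office:meta", NS)
    stats = meta.find("meta:document-statistic", NS)
    name_attr = "{%s}name" % NS["meta"]
    return {
        "title": meta.findtext("dc:title", namespaces=NS),
        "keywords": [k.text for k in meta.findall("meta:keyword", NS)],
        "editing_duration": parse_duration(
            meta.findtext("meta:editing-duration", namespaces=NS)
        ),
        "user_defined": {
            u.get(name_attr): u.text for u in meta.findall("meta:user-defined", NS)
        },
        # Attribute keys arrive as '{namespace}local-name'; keep the local name.
        "statistics": {k.split("}")[-1]: int(v) for k, v in stats.attrib.items()},
    }


props = read_sample(SAMPLE_META)
for key, value in props.items():
    print(f"{key}: {value}")
```

Note that the statistics are attributes of a single element, while keywords and user-defined fields are repeated elements, so they need different access patterns.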
2. Two Direct Download Links for .ODT Files
- https://filesamples.com/samples/document/odt/sample3.odt
- https://filesamples.com/samples/document/odt/sample2.odt
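Before feeding a downloaded sample to any of the handlers in this document, it can help to confirm that the file really is an ODT package. The sketch below assumes the usual ODF packaging convention that the archive contains a `mimetype` entry holding `application/vnd.oasis.opendocument.text`. The function names `looks_like_odt` and `make_sample_package` are hypothetical, and the in-memory package exists only so the example runs without a download.

```python
import io
import zipfile

ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"


def looks_like_odt(source):
    """Return True if `source` (a path or seekable file object) is a ZIP
    archive whose 'mimetype' entry identifies an ODT text document."""
    if not zipfile.is_zipfile(source):
        return False
    with zipfile.ZipFile(source) as zf:
        if "mimetype" not in zf.namelist():
            return False
        declared = zf.read("mimetype").decode("ascii", "replace").strip()
        return declared == ODT_MIMETYPE


def make_sample_package():
    """Build a tiny stand-in package in memory (not a complete ODT document)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry goes first and is stored without compression.
        zf.writestr("mimetype", ODT_MIMETYPE, compress_type=zipfile.ZIP_STORED)
        zf.writestr("meta.xml", "<placeholder/>")
    buf.seek(0)
    return buf


print(looks_like_odt(make_sample_package()))
print(looks_like_odt(io.BytesIO(b"plain text, definitely not a ZIP archive")))
```

For a real file, pass its path instead, for example `looks_like_odt("sample3.odt")` after downloading one of the links above.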
3. Ghost Blog Embedded HTML/JavaScript for Drag-and-Drop .ODT Property Dump
The following is a self-contained HTML snippet with embedded JavaScript that can be embedded in a Ghost blog post. It allows users to drag and drop an .ODT file, unzips it using JSZip (included via CDN for simplicity), parses the meta.xml, and displays the properties listed in section 1 on the screen. Ensure the Ghost blog allows script embedding.
4. Python Class for .ODT Property Handling
The following Python class uses the zipfile and xml.etree.ElementTree modules to open an .ODT file, read and decode meta.xml, and print the properties. It also includes a method that writes modified properties to a new .ODT file.
import zipfile
import xml.etree.ElementTree as ET

# Namespace prefixes used when querying meta.xml.
NS = {
    'office': 'urn:oasis:names:tc:opendocument:xmlns:office:1.0',
    'dc': 'http://purl.org/dc/elements/1.1/',
    'meta': 'urn:oasis:names:tc:opendocument:xmlns:meta:1.0',
}

# Property key -> qualified element name for single-valued text properties.
SIMPLE_PROPS = {
    'title': 'dc:title',
    'subject': 'dc:subject',
    'description': 'dc:description',
    'creator': 'dc:creator',
    'date': 'dc:date',
    'language': 'dc:language',
    'generator': 'meta:generator',
    'initial_creator': 'meta:initial-creator',
    'creation_date': 'meta:creation-date',
    'editing_cycles': 'meta:editing-cycles',
    'editing_duration': 'meta:editing-duration',
}


class ODTHandler:
    def __init__(self, file_path):
        self.file_path = file_path
        self.meta_xml = None
        self.properties = {}
        self._read_meta()

    def _read_meta(self):
        with zipfile.ZipFile(self.file_path, 'r') as zf:
            if 'meta.xml' not in zf.namelist():
                return
            root = ET.fromstring(zf.read('meta.xml'))
        self.meta_xml = ET.tostring(root, encoding='unicode')
        meta = root.find('office:meta', NS)
        if meta is None:
            return
        # findtext() returns the default only when the element is absent.
        props = {key: meta.findtext(tag, 'N/A', NS) for key, tag in SIMPLE_PROPS.items()}
        props['keywords'] = [k.text for k in meta.findall('meta:keyword', NS)] or ['N/A']
        name_attr = '{%s}name' % NS['meta']
        props['user_defined'] = {
            u.get(name_attr, 'N/A'): u.text for u in meta.findall('meta:user-defined', NS)
        } or {'N/A': 'N/A'}
        stats = meta.find('meta:document-statistic', NS)
        if stats is not None:
            # Attribute keys look like '{namespace}page-count'; keep the local name.
            props['statistics'] = {attr.split('}')[-1]: value for attr, value in stats.attrib.items()}
        else:
            props['statistics'] = {'N/A': 'N/A'}
        self.properties = props

    def print_properties(self):
        for key, value in self.properties.items():
            print(f"{key.replace('_', ' ').capitalize()}: {value}")

    def write_modified(self, output_path, modifications=None):
        """Copy the archive to output_path with updated single-valued properties.

        Keys in `modifications` are property keys such as 'title' or 'initial_creator'.
        """
        if self.meta_xml is None:
            raise ValueError('meta.xml was not found in the source file')
        root = ET.fromstring(self.meta_xml)
        meta = root.find('office:meta', NS)
        for key, value in (modifications or {}).items():
            tag = SIMPLE_PROPS.get(key)
            elem = meta.find(tag, NS) if tag and meta is not None else None
            if elem is not None:
                elem.text = value
        modified_meta = ET.tostring(root, encoding='unicode').encode('utf-8')
        with zipfile.ZipFile(self.file_path, 'r') as zf_in:
            with zipfile.ZipFile(output_path, 'w') as zf_out:
                for item in zf_in.infolist():
                    # Passing the ZipInfo keeps each entry's name, order, and compression
                    # method, so the 'mimetype' entry stays first and uncompressed.
                    data = modified_meta if item.filename == 'meta.xml' else zf_in.read(item.filename)
                    zf_out.writestr(item, data)
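One detail worth knowing when meta.xml is re-serialized with ET.tostring, as the class above does: unless the namespaces are registered first, ElementTree invents prefixes such as ns0 and ns1. The result is still namespace-correct XML, but it no longer uses the familiar office:, dc:, and meta: prefixes. The sketch below shows one way to keep them. The `set_title` helper and the sample string are hypothetical, and note that `ET.register_namespace` changes process-wide state.

```python
import xml.etree.ElementTree as ET

ODF_NAMESPACES = {
    "office": "urn:oasis:names:tc:opendocument:xmlns:office:1.0",
    "dc": "http://purl.org/dc/elements/1.1/",
    "meta": "urn:oasis:names:tc:opendocument:xmlns:meta:1.0",
}


def register_odf_namespaces():
    """Ask ElementTree to emit the conventional ODF prefixes when serializing."""
    for prefix, uri in ODF_NAMESPACES.items():
        ET.register_namespace(prefix, uri)


def set_title(meta_xml, new_title):
    """Return meta_xml with dc:title set to new_title, creating it if missing."""
    register_odf_namespaces()
    root = ET.fromstring(meta_xml)
    meta = root.find("office:meta", ODF_NAMESPACES)
    if meta is None:
        raise ValueError("no office:meta element found")
    title = meta.find("dc:title", ODF_NAMESPACES)
    if title is None:
        title = ET.SubElement(meta, "{%s}title" % ODF_NAMESPACES["dc"])
    title.text = new_title
    return ET.tostring(root, encoding="unicode")


# Hypothetical minimal input, written by hand for this example.
SAMPLE = (
    '<office:document-meta '
    'xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" '
    'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    "<office:meta><dc:title>Old</dc:title></office:meta>"
    "</office:document-meta>"
)

updated = set_title(SAMPLE, "New")
print(updated)
```

Only namespaces that actually occur in the tree are declared in the output, so the meta: declaration is omitted for this sample.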
5. Java Class for .ODT Property Handling
The following Java class uses java.util.zip, javax.xml.parsers, and javax.xml.transform to read an .ODT file, print its properties, and write modifications to a new file.
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.*;
import java.util.zip.*;
import javax.xml.parsers.*;
import javax.xml.transform.*;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.*;

public class ODTHandler {
    private static final String NS_OFFICE = "urn:oasis:names:tc:opendocument:xmlns:office:1.0";
    private static final String NS_DC = "http://purl.org/dc/elements/1.1/";
    private static final String NS_META = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0";
    private static final String[] DC_TAGS = {"title", "subject", "description", "creator", "date", "language"};
    private static final String[] META_TAGS = {"generator", "initial-creator", "creation-date", "editing-cycles", "editing-duration"};

    private final String filePath;
    private final Map<String, Object> properties = new LinkedHashMap<>();
    private String metaXml;

    public ODTHandler(String filePath) {
        this.filePath = filePath;
        readMeta();
    }

    private static Document parse(InputStream in) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        return dbf.newDocumentBuilder().parse(in);
    }

    private static byte[] serialize(Document doc) throws Exception {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        TransformerFactory.newInstance().newTransformer().transform(new DOMSource(doc), new StreamResult(out));
        return out.toByteArray();
    }

    private static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int len;
        while ((len = in.read(buffer)) != -1) {
            out.write(buffer, 0, len);
        }
        return out.toByteArray();
    }

    private void readMeta() {
        try (ZipFile zf = new ZipFile(filePath)) {
            ZipEntry entry = zf.getEntry("meta.xml");
            if (entry == null) {
                return;
            }
            Document doc;
            try (InputStream is = zf.getInputStream(entry)) {
                doc = parse(is);
            }
            // Keep the serialized XML so writeModified() can start from it.
            metaXml = new String(serialize(doc), StandardCharsets.UTF_8);
            Element meta = (Element) doc.getElementsByTagNameNS(NS_OFFICE, "meta").item(0);
            if (meta == null) {
                return;
            }
            for (String tag : DC_TAGS) {
                properties.put(tag, getText(meta, NS_DC, tag));
            }
            for (String tag : META_TAGS) {
                properties.put(tag.replace('-', '_'), getText(meta, NS_META, tag));
            }
            List<String> keywords = new ArrayList<>();
            NodeList kwList = meta.getElementsByTagNameNS(NS_META, "keyword");
            for (int i = 0; i < kwList.getLength(); i++) {
                keywords.add(kwList.item(i).getTextContent());
            }
            properties.put("keywords", keywords.isEmpty() ? "N/A" : keywords);
            Map<String, String> userDefined = new LinkedHashMap<>();
            NodeList udList = meta.getElementsByTagNameNS(NS_META, "user-defined");
            for (int i = 0; i < udList.getLength(); i++) {
                Element ud = (Element) udList.item(i);
                userDefined.put(ud.getAttributeNS(NS_META, "name"), ud.getTextContent());
            }
            properties.put("user_defined", userDefined.isEmpty() ? "N/A" : userDefined);
            Element stats = (Element) meta.getElementsByTagNameNS(NS_META, "document-statistic").item(0);
            Map<String, String> statMap = new LinkedHashMap<>();
            if (stats != null) {
                NamedNodeMap attrs = stats.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Attr attr = (Attr) attrs.item(i);
                    statMap.put(attr.getLocalName(), attr.getValue());
                }
            }
            properties.put("statistics", statMap.isEmpty() ? "N/A" : statMap);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    private String getText(Element parent, String ns, String tag) {
        Node node = parent.getElementsByTagNameNS(ns, tag).item(0);
        return node != null ? node.getTextContent() : "N/A";
    }

    public void printProperties() {
        for (Map.Entry<String, Object> entry : properties.entrySet()) {
            String key = entry.getKey();
            System.out.println(key.substring(0, 1).toUpperCase() + key.substring(1).replace("_", " ") + ": " + entry.getValue());
        }
    }

    // Keys in modifications use the printed property names, e.g. "title" or "initial_creator".
    public void writeModified(String outputPath, Map<String, String> modifications) {
        if (metaXml == null) {
            throw new IllegalStateException("meta.xml was not read from " + filePath);
        }
        try (ZipFile zfIn = new ZipFile(filePath);
             ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(outputPath))) {
            Enumeration<? extends ZipEntry> entries = zfIn.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                byte[] data;
                if (entry.getName().equals("meta.xml")) {
                    Document doc = parse(new ByteArrayInputStream(metaXml.getBytes(StandardCharsets.UTF_8)));
                    Element meta = (Element) doc.getElementsByTagNameNS(NS_OFFICE, "meta").item(0);
                    for (Map.Entry<String, String> mod : modifications.entrySet()) {
                        String tag = mod.getKey().replace('_', '-');
                        String ns = Arrays.asList(DC_TAGS).contains(tag) ? NS_DC : NS_META;
                        Element elem = meta == null ? null : (Element) meta.getElementsByTagNameNS(ns, tag).item(0);
                        if (elem != null) {
                            elem.setTextContent(mod.getValue());
                        }
                    }
                    data = serialize(doc);
                } else {
                    try (InputStream is = zfIn.getInputStream(entry)) {
                        data = readAll(is);
                    }
                }
                ZipEntry outEntry = new ZipEntry(entry.getName());
                if (entry.getMethod() == ZipEntry.STORED) {
                    // STORED entries (such as "mimetype") need size and CRC set before writing.
                    CRC32 crc = new CRC32();
                    crc.update(data);
                    outEntry.setMethod(ZipEntry.STORED);
                    outEntry.setSize(data.length);
                    outEntry.setCompressedSize(data.length);
                    outEntry.setCrc(crc.getValue());
                }
                zos.putNextEntry(outEntry);
                zos.write(data);
                zos.closeEntry();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
6. JavaScript Class for .ODT Property Handling
The following JavaScript class is designed for a Node.js environment, using fs for file I/O and jszip for ZIP handling (install via npm install jszip), along with xmldom for XML parsing (install via npm install xmldom). It reads, prints, and writes properties.
const fs = require('fs');
const JSZip = require('jszip');
const { DOMParser, XMLSerializer } = require('xmldom');

const NS_OFFICE = 'urn:oasis:names:tc:opendocument:xmlns:office:1.0';
const NS_DC = 'http://purl.org/dc/elements/1.1/';
const NS_META = 'urn:oasis:names:tc:opendocument:xmlns:meta:1.0';
const DC_TAGS = ['title', 'subject', 'description', 'creator', 'date', 'language'];
const META_TAGS = ['generator', 'initial-creator', 'creation-date', 'editing-cycles', 'editing-duration'];

class ODTHandler {
  constructor(filePath) {
    this.filePath = filePath;
    this.properties = {};
    this.metaXml = null;
    // JSZip is asynchronous: keep the promise so other methods can wait for it.
    this.ready = this.readMeta();
  }

  async readMeta() {
    try {
      const zip = await JSZip.loadAsync(fs.readFileSync(this.filePath));
      const entry = zip.file('meta.xml');
      if (!entry) return;
      const doc = new DOMParser().parseFromString(await entry.async('string'), 'text/xml');
      this.metaXml = new XMLSerializer().serializeToString(doc);
      const meta = doc.getElementsByTagNameNS(NS_OFFICE, 'meta')[0];
      if (!meta) return;
      const getText = (ns, tag) => {
        const elem = meta.getElementsByTagNameNS(ns, tag)[0];
        return elem ? elem.textContent : 'N/A';
      };
      const all = (tag) => Array.from(meta.getElementsByTagNameNS(NS_META, tag));
      const props = {};
      for (const tag of DC_TAGS) props[tag] = getText(NS_DC, tag);
      for (const tag of META_TAGS) props[tag.replace(/-/g, '_')] = getText(NS_META, tag);
      // An empty array or object is truthy, so test for emptiness explicitly.
      const keywords = all('keyword').map((k) => k.textContent);
      props.keywords = keywords.length ? keywords : ['N/A'];
      const userDefined = {};
      for (const u of all('user-defined')) {
        userDefined[u.getAttributeNS(NS_META, 'name')] = u.textContent;
      }
      props.user_defined = Object.keys(userDefined).length ? userDefined : { 'N/A': 'N/A' };
      const stats = all('document-statistic')[0];
      const statObj = {};
      if (stats) {
        for (let i = 0; i < stats.attributes.length; i++) {
          const attr = stats.attributes[i];
          statObj[attr.localName] = attr.value;
        }
      }
      props.statistics = stats ? statObj : { 'N/A': 'N/A' };
      this.properties = props;
    } catch (err) {
      console.error('Error reading meta.xml:', err);
    }
  }

  async printProperties() {
    await this.ready;
    for (const [key, value] of Object.entries(this.properties)) {
      console.log(`${key.charAt(0).toUpperCase() + key.slice(1).replace(/_/g, ' ')}: ${JSON.stringify(value)}`);
    }
  }

  // Keys in modifications use the printed property names, e.g. 'title' or 'initial_creator'.
  async writeModified(outputPath, modifications = {}) {
    await this.ready;
    try {
      if (!this.metaXml) throw new Error('meta.xml was not read');
      const zip = await JSZip.loadAsync(fs.readFileSync(this.filePath));
      const doc = new DOMParser().parseFromString(this.metaXml, 'text/xml');
      const meta = doc.getElementsByTagNameNS(NS_OFFICE, 'meta')[0];
      for (const [key, value] of Object.entries(modifications)) {
        const tag = key.replace(/_/g, '-');
        const ns = DC_TAGS.includes(tag) ? NS_DC : NS_META;
        const elem = meta ? meta.getElementsByTagNameNS(ns, tag)[0] : null;
        if (elem) {
          elem.textContent = value;
        }
      }
      zip.file('meta.xml', new XMLSerializer().serializeToString(doc));
      const content = await zip.generateAsync({ type: 'nodebuffer' });
      fs.writeFileSync(outputPath, content);
    } catch (err) {
      console.error('Error writing modified file:', err);
    }
  }
}
7. C++ Class for .ODT Property Handling
The following C++ class uses libzip for ZIP handling and tinyxml2 for XML parsing (assume libraries are linked; install via package manager if needed). It provides methods to read, print, and write properties.
#include <iostream>
#include <map>
#include <string>
#include <utility>
#include <vector>
#include <zip.h>
#include <tinyxml2.h>

class ODTHandler {
private:
    std::string filePath;
    std::map<std::string, std::string> simpleProps;
    std::vector<std::string> keywords;
    std::map<std::string, std::string> userDefined;
    std::map<std::string, std::string> statistics;
    std::string metaXml;

    // tinyxml2 is not namespace-aware, so elements are matched by qualified name.
    // This assumes the conventional "office", "dc", and "meta" prefixes.
    static std::string qualify(const std::string& key) {
        static const char* const dcTags[] = {"title", "subject", "description", "creator", "date", "language"};
        for (const char* tag : dcTags) {
            if (key == tag) return "dc:" + key;
        }
        return "meta:" + key;
    }

    static tinyxml2::XMLElement* findMeta(tinyxml2::XMLDocument& doc) {
        tinyxml2::XMLElement* root = doc.FirstChildElement("office:document-meta");
        return root ? root->FirstChildElement("office:meta") : nullptr;
    }

    // Read one archive entry fully into out; returns false on any libzip error.
    static bool readEntry(zip_t* za, zip_uint64_t index, std::string& out) {
        zip_stat_t zs;
        if (zip_stat_index(za, index, 0, &zs) != 0) return false;
        zip_file_t* zf = zip_fopen_index(za, index, 0);
        if (!zf) return false;
        out.resize(zs.size);
        zip_uint64_t done = 0;
        while (done < zs.size) {
            zip_int64_t n = zip_fread(zf, &out[done], zs.size - done);
            if (n <= 0) break;
            done += static_cast<zip_uint64_t>(n);
        }
        zip_fclose(zf);
        return done == zs.size;
    }

    void readMeta() {
        zip_t* za = zip_open(filePath.c_str(), ZIP_RDONLY, nullptr);
        if (!za) return;
        zip_int64_t idx = zip_name_locate(za, "meta.xml", 0);
        bool ok = idx >= 0 && readEntry(za, static_cast<zip_uint64_t>(idx), metaXml);
        zip_close(za);
        if (!ok) {
            metaXml.clear();
            return;
        }
        tinyxml2::XMLDocument doc;
        if (doc.Parse(metaXml.c_str(), metaXml.size()) != tinyxml2::XML_SUCCESS) return;
        tinyxml2::XMLElement* meta = findMeta(doc);
        if (!meta) return;
        static const char* const simpleTags[] = {
            "title", "subject", "description", "creator", "date", "language", "generator",
            "initial-creator", "creation-date", "editing-cycles", "editing-duration"};
        for (const char* tag : simpleTags) {
            tinyxml2::XMLElement* e = meta->FirstChildElement(qualify(tag).c_str());
            // GetText() returns nullptr for empty elements, so guard before assigning.
            simpleProps[tag] = (e && e->GetText()) ? e->GetText() : "N/A";
        }
        for (tinyxml2::XMLElement* kw = meta->FirstChildElement("meta:keyword"); kw; kw = kw->NextSiblingElement("meta:keyword")) {
            keywords.push_back(kw->GetText() ? kw->GetText() : "N/A");
        }
        for (tinyxml2::XMLElement* ud = meta->FirstChildElement("meta:user-defined"); ud; ud = ud->NextSiblingElement("meta:user-defined")) {
            const char* name = ud->Attribute("meta:name");
            userDefined[name ? name : "N/A"] = ud->GetText() ? ud->GetText() : "N/A";
        }
        tinyxml2::XMLElement* stats = meta->FirstChildElement("meta:document-statistic");
        if (stats) {
            for (const tinyxml2::XMLAttribute* attr = stats->FirstAttribute(); attr; attr = attr->Next()) {
                statistics[attr->Name()] = attr->Value();
            }
        }
    }

public:
    ODTHandler(const std::string& path) : filePath(path) {
        readMeta();
    }

    void printProperties() {
        for (const auto& p : simpleProps) {
            std::cout << p.first << ": " << p.second << std::endl;
        }
        std::cout << "keywords: ";
        for (const auto& kw : keywords) {
            std::cout << kw << " ";
        }
        std::cout << std::endl;
        std::cout << "user_defined: ";
        for (const auto& ud : userDefined) {
            std::cout << ud.first << ": " << ud.second << "; ";
        }
        std::cout << std::endl;
        std::cout << "statistics: ";
        for (const auto& stat : statistics) {
            std::cout << stat.first << ": " << stat.second << "; ";
        }
        std::cout << std::endl;
    }

    // Keys in modifications use the printed property names, e.g. "title" or "initial-creator".
    void writeModified(const std::string& outputPath, const std::map<std::string, std::string>& modifications) {
        zip_t* za_in = zip_open(filePath.c_str(), ZIP_RDONLY, nullptr);
        zip_t* za_out = zip_open(outputPath.c_str(), ZIP_CREATE | ZIP_TRUNCATE, nullptr);
        if (!za_in || !za_out) {
            if (za_in) zip_close(za_in);
            if (za_out) zip_discard(za_out);
            return;
        }
        zip_int64_t num_entries = zip_get_num_entries(za_in, 0);
        // libzip reads source buffers during zip_close(), so they must outlive the loop.
        // reserve() prevents reallocation, which would invalidate the data pointers.
        std::vector<std::string> buffers;
        buffers.reserve(static_cast<size_t>(num_entries));
        for (zip_int64_t i = 0; i < num_entries; ++i) {
            const char* raw = zip_get_name(za_in, static_cast<zip_uint64_t>(i), 0);
            if (!raw) continue;
            std::string name(raw);
            if (!name.empty() && name.back() == '/') {
                zip_dir_add(za_out, raw, ZIP_FL_ENC_UTF_8);
                continue;
            }
            std::string data;
            if (!readEntry(za_in, static_cast<zip_uint64_t>(i), data)) continue;
            if (name == "meta.xml" && !metaXml.empty()) {
                tinyxml2::XMLDocument doc;
                doc.Parse(metaXml.c_str(), metaXml.size());
                tinyxml2::XMLElement* meta = findMeta(doc);
                for (const auto& mod : modifications) {
                    tinyxml2::XMLElement* elem = meta ? meta->FirstChildElement(qualify(mod.first).c_str()) : nullptr;
                    if (elem) {
                        elem->SetText(mod.second.c_str());
                    }
                }
                tinyxml2::XMLPrinter printer;
                doc.Print(&printer);
                data = printer.CStr();
            }
            buffers.push_back(std::move(data));
            const std::string& buf = buffers.back();
            zip_source_t* src = zip_source_buffer(za_out, buf.data(), buf.size(), 0);
            if (!src) continue;
            zip_int64_t out_idx = zip_file_add(za_out, raw, src, ZIP_FL_ENC_UTF_8);
            if (out_idx < 0) {
                zip_source_free(src);
                continue;
            }
            // ODF packages expect the "mimetype" entry to be stored uncompressed.
            if (name == "mimetype") {
                zip_set_file_compression(za_out, static_cast<zip_uint64_t>(out_idx), ZIP_CM_STORE, 0);
            }
        }
        zip_close(za_out);  // Writes the archive; buffers are still alive here.
        zip_close(za_in);
    }
};