Task 714: .SXW File Format
File Format Specifications for .SXW
The .SXW file format is an XML-based document format used by StarOffice Writer and OpenOffice.org 1.0 for text documents. It consists of a ZIP archive containing multiple XML files that separate content, styles, metadata, and settings. The format is defined in the OpenOffice.org XML File Format Specification, available at https://www.openoffice.org/xml/xml_specification.pdf.
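As a rough illustration of that layout, the sketch below lists the members of an SXW-style archive with Python's zipfile module. The archive is built in memory with placeholder contents, and the helper names are hypothetical; a real file typically holds further entries, such as a manifest.

```python
import io
import zipfile


def list_members(sxw_bytes):
    """Return the member names of a ZIP-based document, in archive order."""
    with zipfile.ZipFile(io.BytesIO(sxw_bytes)) as zf:
        return zf.namelist()


def build_sample():
    """Build a tiny stand-in archive (placeholder contents, not a valid document)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name in ("content.xml", "styles.xml", "meta.xml", "settings.xml"):
            zf.writestr(name, "<placeholder/>")
    return buf.getvalue()


if __name__ == "__main__":
    print(list_members(build_sample()))
```

For a file on disk, passing the path straight to zipfile.ZipFile works the same way.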
List of Properties Intrinsic to the .SXW File Format
Based on the specification, the intrinsic properties refer to the metadata elements stored primarily in the meta.xml file within the ZIP archive. These properties describe the document's attributes, creation details, and statistics. The complete list includes:
- Title: The document's title (dc:title).
- Creator: The initial author of the document (dc:creator).
- Contributor: Additional contributors to the document (dc:contributor).
- Date: The last modification date in ISO 8601 format (dc:date).
- Creation Date: The date the document was created (meta:creation-date).
- Modification Date: The date of the last modification (meta:modification-date).
- Subject: The document's subject (dc:subject).
- Description: A brief description of the document (dc:description).
- Language: The primary language of the document, such as "en-US" (dc:language).
- Keywords: A list of keywords associated with the document (meta:keyword, multiple entries possible).
- Printed By: The user who last printed the document (meta:printed-by).
- Print Date: The date the document was last printed (meta:print-date).
- User-Defined Fields: Custom metadata fields, each with a name and value (meta:user-defined, multiple entries possible).
- Document Statistics: Stored as attributes of a single meta:document-statistic element: page count (meta:page-count), table count (meta:table-count), image count (meta:image-count), object count (meta:object-count), paragraph count (meta:paragraph-count), word count (meta:word-count), and character count (meta:character-count).
- Template: Reference to the template used, including URL, title, and date (meta:template with attributes xlink:href, xlink:title, meta:date).
- Auto-Reload: Settings for automatic reloading, including URL and delay (meta:auto-reload with attributes xlink:href, meta:delay).
These properties are encoded in XML and can be extracted by unzipping the file and parsing meta.xml.
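A minimal sketch of the parsing step, run against a hand-written meta.xml fragment rather than a real file. The sample values and the summarize helper are invented for illustration, and the fragment is assumed to contain an office:meta element. ElementTree addresses namespaced attributes as {uri}name, which is why the statistics are looked up in that form.

```python
import xml.etree.ElementTree as ET

NS = {
    "office": "http://openoffice.org/2000/office",
    "meta": "http://openoffice.org/2000/meta",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Hypothetical sample; real files carry more elements.
SAMPLE = """<office:document-meta
    xmlns:office="http://openoffice.org/2000/office"
    xmlns:meta="http://openoffice.org/2000/meta"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <office:meta>
    <dc:title>Sample</dc:title>
    <meta:document-statistic meta:page-count="2" meta:word-count="310"/>
  </office:meta>
</office:document-meta>"""


def summarize(meta_xml):
    """Return the title and two statistics from a meta.xml string."""
    meta = ET.fromstring(meta_xml).find("office:meta", NS)
    title = meta.findtext("dc:title", default="", namespaces=NS)
    stats = meta.find("meta:document-statistic", NS)

    def count(name):
        # Attribute keys use the '{namespace-uri}local-name' form.
        value = stats.get("{%s}%s" % (NS["meta"], name)) if stats is not None else None
        return int(value) if value is not None else None

    return {"title": title, "pages": count("page-count"), "words": count("word-count")}


if __name__ == "__main__":
    print(summarize(SAMPLE))
```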
Two Direct Download Links for .SXW Files
- http://www.linuxplusvalue.be/download/tutoriel_dicooo-en.sxw
- http://www.linuxplusvalue.be/download/tutoriel_dicooo-fr.sxw
Ghost Blog Embedded HTML JavaScript for Drag-and-Drop .SXW Property Dump
The following is a self-contained HTML page with embedded JavaScript that lets users drag and drop a .SXW file. It reads the file with the browser's File API, locates meta.xml with a simple ZIP parser (for demonstration only; in production, consider a library like JSZip), parses it with DOMParser, and displays the properties on the screen.
Drag and Drop .SXW File
Note: This implementation assumes no compression for meta.xml (common in .SXW files). For full production use, integrate a complete ZIP library like JSZip to handle deflation.
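Whether meta.xml is stored or deflated in a given file can be checked before relying on that assumption. The sketch below reports the compression method recorded for one archive member using Python's zipfile module; the helper names and the in-memory sample archives are hypothetical.

```python
import io
import zipfile


def compression_of(archive_bytes, member="meta.xml"):
    """Return 'stored', 'deflated', or 'other' for one archive member."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        method = zf.getinfo(member).compress_type  # raises KeyError if the member is absent
    if method == zipfile.ZIP_STORED:
        return "stored"
    if method == zipfile.ZIP_DEFLATED:
        return "deflated"
    return "other"


def make_archive(method):
    """Build a one-member archive written with the given compression method."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", method) as zf:
        zf.writestr("meta.xml", "<placeholder/>" * 50)
    return buf.getvalue()


if __name__ == "__main__":
    for method in (zipfile.ZIP_STORED, zipfile.ZIP_DEFLATED):
        print(compression_of(make_archive(method)))
```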
Python Class for .SXW File Handling
The following Python class uses the zipfile and xml.etree.ElementTree modules to open, read, decode, print, and write .SXW files. The read_properties method extracts and prints the properties, while write_properties allows updating them and saving a new file.
import zipfile
import xml.etree.ElementTree as ET

# Namespace URIs used in OpenOffice.org 1.0 meta.xml files.
NAMESPACES = {
    'dc': 'http://purl.org/dc/elements/1.1/',
    'meta': 'http://openoffice.org/2000/meta',
    'office': 'http://openoffice.org/2000/office',
    'xlink': 'http://www.w3.org/1999/xlink'
}
# Register the prefixes so a rewritten meta.xml keeps them instead of ns0, ns1, ...
for _prefix, _uri in NAMESPACES.items():
    ET.register_namespace(_prefix, _uri)


def qname(prefix, local):
    """Build the '{uri}local' form ElementTree uses for namespaced attributes."""
    return '{%s}%s' % (NAMESPACES[prefix], local)


class SXWHandler:
    def __init__(self, filename):
        self.filename = filename
        self.properties = {}
        self.namespaces = NAMESPACES

    def _find_meta(self, root):
        # Test against None explicitly: an Element with no children is falsy.
        meta = root.find('office:meta', self.namespaces)
        return meta if meta is not None else root

    def _text(self, meta, tag):
        # Text of the first matching child, or '' if it is absent or empty.
        elem = meta.find(tag, self.namespaces)
        return (elem.text or '') if elem is not None else ''

    def read_properties(self):
        with zipfile.ZipFile(self.filename, 'r') as zf:
            if 'meta.xml' not in zf.namelist():
                print("meta.xml not found.")
                return
            with zf.open('meta.xml') as meta_file:
                root = ET.parse(meta_file).getroot()
        meta = self._find_meta(root)
        props = self.properties
        for key, tag in [
            ('title', 'dc:title'), ('creator', 'dc:creator'),
            ('contributor', 'dc:contributor'), ('date', 'dc:date'),
            ('creation_date', 'meta:creation-date'),
            ('modification_date', 'meta:modification-date'),
            ('subject', 'dc:subject'), ('description', 'dc:description'),
            ('language', 'dc:language'), ('printed_by', 'meta:printed-by'),
            ('print_date', 'meta:print-date'),
        ]:
            props[key] = self._text(meta, tag)
        # './/' also matches keywords nested inside a wrapper element.
        props['keywords'] = [k.text for k in meta.findall('.//meta:keyword', self.namespaces)]
        # Attributes must be looked up by '{uri}name', not by 'meta:name'.
        props['user_defined'] = [(ud.get(qname('meta', 'name')), ud.text)
                                 for ud in meta.findall('meta:user-defined', self.namespaces)]
        stats = meta.find('meta:document-statistic', self.namespaces)
        if stats is not None:
            props['statistics'] = {
                name.replace('-', '_'): stats.get(qname('meta', name))
                for name in ('page-count', 'table-count', 'image-count', 'object-count',
                             'paragraph-count', 'word-count', 'character-count')
            }
        template = meta.find('meta:template', self.namespaces)
        if template is not None:
            props['template'] = {
                'href': template.get(qname('xlink', 'href')),
                'title': template.get(qname('xlink', 'title')),
                'date': template.get(qname('meta', 'date'))
            }
        auto_reload = meta.find('meta:auto-reload', self.namespaces)
        if auto_reload is not None:
            props['auto_reload'] = {
                'href': auto_reload.get(qname('xlink', 'href')),
                'delay': auto_reload.get(qname('meta', 'delay'))
            }
        print(props)

    def write_properties(self, new_filename, updated_properties):
        with zipfile.ZipFile(self.filename, 'r') as zf_in, \
                zipfile.ZipFile(new_filename, 'w') as zf_out:
            for item in zf_in.infolist():
                data = zf_in.read(item.filename)
                if item.filename == 'meta.xml':
                    root = ET.fromstring(data)
                    meta = self._find_meta(root)
                    # Update fields (example for title; extend for others)
                    if 'title' in updated_properties:
                        elem = meta.find('dc:title', self.namespaces)
                        if elem is not None:
                            elem.text = updated_properties['title']
                    # ... Add updates for other properties similarly
                    data = ET.tostring(root, encoding='utf-8', method='xml')
                # Passing the ZipInfo keeps each entry's name and compression method.
                zf_out.writestr(item, data)

# Example usage:
# handler = SXWHandler('example.sxw')
# handler.read_properties()
# handler.write_properties('updated.sxw', {'title': 'New Title'})
Java Class for .SXW File Handling
The following Java class uses java.util.zip and javax.xml.parsers to handle .SXW files. It includes methods to read and print properties, and to write updated properties.
import java.io.*;
import java.util.Enumeration;
import java.util.zip.*;
import javax.xml.parsers.*;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.*;
import org.xml.sax.SAXException;

public class SXWHandler {
    private static final String META_NS = "http://openoffice.org/2000/meta";
    private static final String XLINK_NS = "http://www.w3.org/1999/xlink";

    private String filename;
    private Document document;

    public SXWHandler(String filename) {
        this.filename = filename;
    }

    // Namespace-aware builder that does not try to fetch a DTD named in a DOCTYPE.
    private static DocumentBuilder newBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setNamespaceAware(true);
        // Xerces feature; assumed to be supported by the parser in use.
        dbf.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        return dbf.newDocumentBuilder();
    }

    public void readProperties() throws IOException, ParserConfigurationException, SAXException {
        try (ZipFile zf = new ZipFile(filename)) {
            ZipEntry entry = zf.getEntry("meta.xml");
            if (entry == null) {
                System.out.println("meta.xml not found.");
                return;
            }
            try (InputStream is = zf.getInputStream(entry)) {
                document = newBuilder().parse(is);
            }
        }
        // Extract and print
        System.out.println("Title: " + getText("dc:title"));
        System.out.println("Creator: " + getText("dc:creator"));
        System.out.println("Contributor: " + getText("dc:contributor"));
        System.out.println("Date: " + getText("dc:date"));
        System.out.println("Creation Date: " + getText("meta:creation-date"));
        System.out.println("Modification Date: " + getText("meta:modification-date"));
        System.out.println("Subject: " + getText("dc:subject"));
        System.out.println("Description: " + getText("dc:description"));
        System.out.println("Language: " + getText("dc:language"));
        // Keywords: loop over meta:keyword
        NodeList keywords = document.getElementsByTagNameNS(META_NS, "keyword");
        System.out.print("Keywords: ");
        for (int i = 0; i < keywords.getLength(); i++) {
            System.out.print(keywords.item(i).getTextContent() + " ");
        }
        System.out.println();
        System.out.println("Printed By: " + getText("meta:printed-by"));
        System.out.println("Print Date: " + getText("meta:print-date"));
        // User-defined
        NodeList userDefined = document.getElementsByTagNameNS(META_NS, "user-defined");
        for (int i = 0; i < userDefined.getLength(); i++) {
            Element ud = (Element) userDefined.item(i);
            System.out.println("User Defined - " + ud.getAttributeNS(META_NS, "name") + ": " + ud.getTextContent());
        }
        // Statistics
        Element stats = (Element) document.getElementsByTagNameNS(META_NS, "document-statistic").item(0);
        if (stats != null) {
            System.out.println("Page Count: " + stats.getAttributeNS(META_NS, "page-count"));
            System.out.println("Table Count: " + stats.getAttributeNS(META_NS, "table-count"));
            System.out.println("Image Count: " + stats.getAttributeNS(META_NS, "image-count"));
            System.out.println("Object Count: " + stats.getAttributeNS(META_NS, "object-count"));
            System.out.println("Paragraph Count: " + stats.getAttributeNS(META_NS, "paragraph-count"));
            System.out.println("Word Count: " + stats.getAttributeNS(META_NS, "word-count"));
            System.out.println("Character Count: " + stats.getAttributeNS(META_NS, "character-count"));
        }
        // Template
        Element template = (Element) document.getElementsByTagNameNS(META_NS, "template").item(0);
        if (template != null) {
            System.out.println("Template HREF: " + template.getAttributeNS(XLINK_NS, "href"));
            System.out.println("Template Title: " + template.getAttributeNS(XLINK_NS, "title"));
            System.out.println("Template Date: " + template.getAttributeNS(META_NS, "date"));
        }
        // Auto-reload
        Element autoReload = (Element) document.getElementsByTagNameNS(META_NS, "auto-reload").item(0);
        if (autoReload != null) {
            System.out.println("Auto Reload HREF: " + autoReload.getAttributeNS(XLINK_NS, "href"));
            System.out.println("Auto Reload Delay: " + autoReload.getAttributeNS(META_NS, "delay"));
        }
    }

    // Looks up the first element by qualified name, so it relies on the usual dc:/meta: prefixes.
    private String getText(String tag) {
        Node node = document.getElementsByTagName(tag).item(0);
        return node != null ? node.getTextContent() : "";
    }

    public void writeProperties(String newFilename, String newTitle) throws IOException {
        // Example for updating title; extend for others
        try (ZipFile zfIn = new ZipFile(filename);
             ZipOutputStream zos = new ZipOutputStream(new FileOutputStream(newFilename))) {
            Enumeration<? extends ZipEntry> entries = zfIn.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                zos.putNextEntry(new ZipEntry(entry.getName()));
                try (InputStream is = zfIn.getInputStream(entry)) {
                    if (entry.getName().equals("meta.xml")) {
                        // Update document
                        Document doc = newBuilder().parse(is);
                        Node titleNode = doc.getElementsByTagName("dc:title").item(0);
                        if (titleNode != null) {
                            titleNode.setTextContent(newTitle);
                        }
                        // Write updated XML
                        Transformer transformer = TransformerFactory.newInstance().newTransformer();
                        transformer.transform(new DOMSource(doc), new StreamResult(zos));
                    } else {
                        // Copy every other entry unchanged
                        byte[] buffer = new byte[8192];
                        int len;
                        while ((len = is.read(buffer)) > 0) {
                            zos.write(buffer, 0, len);
                        }
                    }
                }
                zos.closeEntry();
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

    // Example usage:
    // public static void main(String[] args) throws Exception {
    //     SXWHandler handler = new SXWHandler("example.sxw");
    //     handler.readProperties();
    //     handler.writeProperties("updated.sxw", "New Title");
    // }
}
JavaScript Class for .SXW File Handling
The following JavaScript class targets Node.js. It depends on the adm-zip package for ZIP handling and the xmldom package for XML parsing and serialization (install both via npm), and also imports Node's built-in fs module. It reads, prints, and writes properties.
const fs = require('fs');
const AdmZip = require('adm-zip');
const { DOMParser, XMLSerializer } = require('xmldom');

class SXWHandler {
  constructor(filename) {
    this.filename = filename;
    this.properties = {};
  }

  readProperties() {
    const zip = new AdmZip(this.filename);
    const metaEntry = zip.getEntry('meta.xml');
    if (metaEntry) {
      const metaXml = zip.readAsText(metaEntry);
      const parser = new DOMParser();
      const doc = parser.parseFromString(metaXml, 'text/xml');
      const meta = doc.getElementsByTagNameNS('http://openoffice.org/2000/office', 'meta')[0] || doc.documentElement;
      this.properties.title = this.getText(meta, 'http://purl.org/dc/elements/1.1/', 'title');
      this.properties.creator = this.getText(meta, 'http://purl.org/dc/elements/1.1/', 'creator');
      this.properties.contributor = this.getText(meta, 'http://purl.org/dc/elements/1.1/', 'contributor');
      this.properties.date = this.getText(meta, 'http://purl.org/dc/elements/1.1/', 'date');
      this.properties.creationDate = this.getText(meta, 'http://openoffice.org/2000/meta', 'creation-date');
      this.properties.modificationDate = this.getText(meta, 'http://openoffice.org/2000/meta', 'modification-date');
      this.properties.subject = this.getText(meta, 'http://purl.org/dc/elements/1.1/', 'subject');
      this.properties.description = this.getText(meta, 'http://purl.org/dc/elements/1.1/', 'description');
      this.properties.language = this.getText(meta, 'http://purl.org/dc/elements/1.1/', 'language');
      const keywords = meta.getElementsByTagNameNS('http://openoffice.org/2000/meta', 'keyword');
      this.properties.keywords = Array.from(keywords).map(k => k.textContent);
      this.properties.printedBy = this.getText(meta, 'http://openoffice.org/2000/meta', 'printed-by');
      this.properties.printDate = this.getText(meta, 'http://openoffice.org/2000/meta', 'print-date');
      const userDefined = meta.getElementsByTagNameNS('http://openoffice.org/2000/meta', 'user-defined');
      this.properties.userDefined = Array.from(userDefined).map(ud => ({
        name: ud.getAttributeNS('http://openoffice.org/2000/meta', 'name'),
        value: ud.textContent
      }));
      const stats = meta.getElementsByTagNameNS('http://openoffice.org/2000/meta', 'document-statistic')[0];
      if (stats) {
        this.properties.statistics = {
          pageCount: stats.getAttribute('meta:page-count'),
          tableCount: stats.getAttribute('meta:table-count'),
          imageCount: stats.getAttribute('meta:image-count'),
          objectCount: stats.getAttribute('meta:object-count'),
          paragraphCount: stats.getAttribute('meta:paragraph-count'),
          wordCount: stats.getAttribute('meta:word-count'),
          characterCount: stats.getAttribute('meta:character-count')
        };
      }
      const template = meta.getElementsByTagNameNS('http://openoffice.org/2000/meta', 'template')[0];
      if (template) {
        this.properties.template = {
          href: template.getAttributeNS('http://www.w3.org/1999/xlink', 'href'),
          title: template.getAttributeNS('http://www.w3.org/1999/xlink', 'title'),
          date: template.getAttributeNS('http://openoffice.org/2000/meta', 'date')
        };
      }
      const autoReload = meta.getElementsByTagNameNS('http://openoffice.org/2000/meta', 'auto-reload')[0];
      if (autoReload) {
        this.properties.autoReload = {
          href: autoReload.getAttributeNS('http://www.w3.org/1999/xlink', 'href'),
          delay: autoReload.getAttributeNS('http://openoffice.org/2000/meta', 'delay')
        };
      }
      console.log(this.properties);
    } else {
      console.log('meta.xml not found.');
    }
  }

  getText(parent, ns, tag) {
    const elem = parent.getElementsByTagNameNS(ns, tag)[0];
    return elem ? elem.textContent : '';
  }

  writeProperties(newFilename, updatedProperties) {
    const zip = new AdmZip(this.filename);
    const metaEntry = zip.getEntry('meta.xml');
    if (metaEntry) {
      const metaXml = zip.readAsText(metaEntry);
      const parser = new DOMParser();
      const doc = parser.parseFromString(metaXml, 'text/xml');
      const meta = doc.getElementsByTagNameNS('http://openoffice.org/2000/office', 'meta')[0] || doc.documentElement;
      // Update example: title
      if (updatedProperties.title) {
        const titleElem = meta.getElementsByTagNameNS('http://purl.org/dc/elements/1.1/', 'title')[0];
        if (titleElem) titleElem.textContent = updatedProperties.title;
      }
      // Extend for other properties
      const serializer = new XMLSerializer();
      const newMeta = serializer.serializeToString(doc);
      zip.updateFile('meta.xml', Buffer.from(newMeta));
      zip.writeZip(newFilename);
    }
  }
}

// Example usage:
// const handler = new SXWHandler('example.sxw');
// handler.readProperties();
// handler.writeProperties('updated.sxw', { title: 'New Title' });
C++ Class for .SXW File Handling
The following C++ class uses libzip (external library; link with -lzip) and TinyXML-2 (for XML parsing; include tinyxml2.h) to handle .SXW files. It provides methods to read and print properties, and to write updated properties.
#include <zip.h>
#include <tinyxml2.h>
#include <cstdlib>
#include <cstring>
#include <iostream>
#include <string>
#include <vector>

class SXWHandler {
private:
    std::string filename;
    tinyxml2::XMLDocument doc;

    // Streaming a null const char* is undefined behavior, so map it to "".
    static const char* safe(const char* s) { return s ? s : ""; }

    // Text of the first child element named `tag`, or "" if absent or empty.
    static const char* childText(const tinyxml2::XMLElement* parent, const char* tag) {
        const tinyxml2::XMLElement* e = parent->FirstChildElement(tag);
        return e ? safe(e->GetText()) : "";
    }

    // Locate office:meta; returns nullptr if the expected structure is missing.
    static tinyxml2::XMLElement* findMeta(tinyxml2::XMLDocument& d) {
        tinyxml2::XMLElement* root = d.FirstChildElement("office:document-meta");
        return root ? root->FirstChildElement("office:meta") : nullptr;
    }

public:
    SXWHandler(const std::string& fn) : filename(fn) {}

    void readProperties() {
        zip_t* z = zip_open(filename.c_str(), ZIP_RDONLY, nullptr);
        if (!z) {
            std::cout << "Could not open " << filename << std::endl;
            return;
        }
        zip_stat_t st;
        zip_file_t* f = nullptr;
        if (zip_stat(z, "meta.xml", 0, &st) == 0) f = zip_fopen(z, "meta.xml", 0);
        if (!f) {
            std::cout << "meta.xml not found." << std::endl;
            zip_close(z);
            return;
        }
        std::vector<char> buffer(st.size);
        zip_fread(f, buffer.data(), st.size);
        zip_fclose(f);
        zip_close(z);

        doc.Parse(buffer.data(), buffer.size());
        auto meta = findMeta(doc);
        if (!meta) return;

        std::cout << "Title: " << childText(meta, "dc:title") << std::endl;
        std::cout << "Creator: " << childText(meta, "dc:creator") << std::endl;
        std::cout << "Contributor: " << childText(meta, "dc:contributor") << std::endl;
        std::cout << "Date: " << childText(meta, "dc:date") << std::endl;
        std::cout << "Creation Date: " << childText(meta, "meta:creation-date") << std::endl;
        std::cout << "Modification Date: " << childText(meta, "meta:modification-date") << std::endl;
        std::cout << "Subject: " << childText(meta, "dc:subject") << std::endl;
        std::cout << "Description: " << childText(meta, "dc:description") << std::endl;
        std::cout << "Language: " << childText(meta, "dc:language") << std::endl;
        // Keywords may sit directly under office:meta or inside a meta:keywords wrapper.
        const tinyxml2::XMLElement* kwParent = meta->FirstChildElement("meta:keywords");
        if (!kwParent) kwParent = meta;
        std::cout << "Keywords: ";
        for (auto kw = kwParent->FirstChildElement("meta:keyword"); kw; kw = kw->NextSiblingElement("meta:keyword")) {
            std::cout << safe(kw->GetText()) << " ";
        }
        std::cout << std::endl;
        std::cout << "Printed By: " << childText(meta, "meta:printed-by") << std::endl;
        std::cout << "Print Date: " << childText(meta, "meta:print-date") << std::endl;
        for (auto ud = meta->FirstChildElement("meta:user-defined"); ud; ud = ud->NextSiblingElement("meta:user-defined")) {
            std::cout << "User Defined - " << safe(ud->Attribute("meta:name")) << ": " << safe(ud->GetText()) << std::endl;
        }
        auto stats = meta->FirstChildElement("meta:document-statistic");
        if (stats) {
            std::cout << "Page Count: " << safe(stats->Attribute("meta:page-count")) << std::endl;
            std::cout << "Table Count: " << safe(stats->Attribute("meta:table-count")) << std::endl;
            std::cout << "Image Count: " << safe(stats->Attribute("meta:image-count")) << std::endl;
            std::cout << "Object Count: " << safe(stats->Attribute("meta:object-count")) << std::endl;
            std::cout << "Paragraph Count: " << safe(stats->Attribute("meta:paragraph-count")) << std::endl;
            std::cout << "Word Count: " << safe(stats->Attribute("meta:word-count")) << std::endl;
            std::cout << "Character Count: " << safe(stats->Attribute("meta:character-count")) << std::endl;
        }
        // `template` is a C++ keyword, so the variable is named tmpl.
        auto tmpl = meta->FirstChildElement("meta:template");
        if (tmpl) {
            std::cout << "Template HREF: " << safe(tmpl->Attribute("xlink:href")) << std::endl;
            std::cout << "Template Title: " << safe(tmpl->Attribute("xlink:title")) << std::endl;
            std::cout << "Template Date: " << safe(tmpl->Attribute("meta:date")) << std::endl;
        }
        auto autoReload = meta->FirstChildElement("meta:auto-reload");
        if (autoReload) {
            std::cout << "Auto Reload HREF: " << safe(autoReload->Attribute("xlink:href")) << std::endl;
            std::cout << "Auto Reload Delay: " << safe(autoReload->Attribute("meta:delay")) << std::endl;
        }
    }

    void writeProperties(const std::string& newFilename, const std::string& newTitle) {
        zip_t* zIn = zip_open(filename.c_str(), ZIP_RDONLY, nullptr);
        zip_t* zOut = zip_open(newFilename.c_str(), ZIP_CREATE | ZIP_TRUNCATE, nullptr);
        if (zIn && zOut) {
            zip_int64_t num = zip_get_num_entries(zIn, 0);
            for (zip_int64_t i = 0; i < num; ++i) {
                const char* name = zip_get_name(zIn, i, 0);
                zip_stat_t st;
                if (!name || zip_stat_index(zIn, i, 0, &st) != 0) continue;
                zip_file_t* f = zip_fopen_index(zIn, i, 0);
                if (!f) continue;
                std::string data(st.size, '\0');
                zip_fread(f, &data[0], st.size);
                zip_fclose(f);
                if (std::string(name) == "meta.xml") {
                    // Update the title and re-serialize the document
                    tinyxml2::XMLDocument tempDoc;
                    tempDoc.Parse(data.data(), data.size());
                    auto meta = findMeta(tempDoc);
                    auto titleElem = meta ? meta->FirstChildElement("dc:title") : nullptr;
                    if (titleElem) {
                        titleElem->SetText(newTitle.c_str());
                        tinyxml2::XMLPrinter printer;
                        tempDoc.Print(&printer);
                        data.assign(printer.CStr(), printer.CStrSize() - 1);
                    }
                }
                // libzip reads the source only in zip_close(), so give it a heap copy it owns (freep = 1).
                void* copy = std::malloc(data.size() ? data.size() : 1);
                if (!copy) continue;
                std::memcpy(copy, data.data(), data.size());
                zip_source_t* src = zip_source_buffer(zOut, copy, data.size(), 1);
                if (!src) {
                    std::free(copy);
                    continue;
                }
                if (zip_file_add(zOut, name, src, ZIP_FL_OVERWRITE) < 0) zip_source_free(src);
            }
        }
        if (zOut) zip_close(zOut);
        if (zIn) zip_close(zIn);
    }
};

// Example usage:
// int main() {
//     SXWHandler handler("example.sxw");
//     handler.readProperties();
//     handler.writeProperties("updated.sxw", "New Title");
//     return 0;
// }