Task 708: .SXC File Format


1. List of Properties of the .SXC File Format Intrinsic to Its File System

The .SXC file format is a ZIP-compressed archive containing XML files, based on the OpenOffice.org XML specification (a predecessor to the OpenDocument format). It is used for spreadsheets and includes structural elements like a mimetype file, manifest, and XML files for content, metadata, styles, and settings. The "properties intrinsic to its file system" refer to the format-defined metadata and document properties extractable from the archive, primarily from meta.xml (which uses Dublin Core and OpenOffice namespaces), as well as key attributes from META-INF/manifest.xml and the mimetype file. These are not general filesystem attributes (e.g., file size or OS timestamps) but format-specific ones stored within the file's structure.
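
As a rough illustration of the layout just described, the sketch below opens an archive with Python's standard zipfile module and reports the mimetype string along with which of the usual parts are present. The function name describe_sxc and the EXPECTED_PARTS tuple are assumptions made for this example, not part of the format definition.

```python
import zipfile

# Parts named in the description above; a given file may contain more or fewer.
EXPECTED_PARTS = ("mimetype", "META-INF/manifest.xml", "content.xml",
                  "meta.xml", "styles.xml", "settings.xml")

def describe_sxc(source):
    """Return (mimetype string or None, {part name: present?}).

    `source` may be a filesystem path or a binary file-like object.
    """
    with zipfile.ZipFile(source) as archive:
        names = set(archive.namelist())
        mimetype = None
        if "mimetype" in names:
            mimetype = archive.read("mimetype").decode("ascii", "replace").strip()
        present = {part: part in names for part in EXPECTED_PARTS}
    return mimetype, present
```

Calling describe_sxc("sample.sxc") on a real spreadsheet would be expected to report the mimetype shown in the list below, though that depends on the file.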

Here is a comprehensive list of extractable properties:

  • MIME Type: The content of the mimetype file (e.g., "application/vnd.sun.xml.calc").
  • Version: The office:version attribute on the root element of meta.xml, content.xml, or styles.xml (e.g., "1.0" in OpenOffice.org 1.x documents), or manifest:version from entries in META-INF/manifest.xml where present.
  • Title: From <dc:title> in meta.xml (document title).
  • Description: From <dc:description> in meta.xml (brief abstract or comments).
  • Subject: From <dc:subject> in meta.xml (topic or category).
  • Keywords: From the <meta:keyword> elements in meta.xml, normally wrapped in <meta:keywords> (some descriptions show a <dc:keywords> wrapper instead); report them as a list or a comma-separated string.
  • Initial Creator: From <meta:initial-creator> in meta.xml (original author).
  • Creator: From <dc:creator> in meta.xml (last modifier or current author).
  • Publisher: From <dc:publisher> in meta.xml (issuing entity).
  • Contributor: From <dc:contributor> in meta.xml (additional contributors).
  • Creation Date: From <meta:creation-date> in meta.xml (ISO 8601 timestamp of creation).
  • Modification Date: From <dc:date> or <meta:modification-date> in meta.xml (ISO 8601 timestamp of last modification).
  • Print Date: From <meta:print-date> in meta.xml (ISO 8601 timestamp of last print).
  • Printed By: From <meta:printed-by> in meta.xml (user who last printed).
  • Generator: From <meta:generator> in meta.xml (software that created/modified the file, e.g., "OpenOffice.org/1.0").
  • Editing Cycles: From <meta:editing-cycles> in meta.xml (integer count of editing sessions/saves).
  • Editing Duration: From <meta:editing-duration> in meta.xml (ISO 8601 duration, e.g., "PT1H2M3S").
  • Language: From <dc:language> in meta.xml (e.g., "en-US").
  • Type: From <dc:type> in meta.xml (document type, e.g., "Spreadsheet").
  • Format: From <dc:format> in meta.xml (MIME type or format description).
  • Identifier: From <dc:identifier> in meta.xml (unique ID like UUID or URL).
  • Source: From <dc:source> in meta.xml (reference to source document).
  • Coverage: From <dc:coverage> in meta.xml (spatial/temporal scope).
  • Rights: From <dc:rights> in meta.xml (copyright or rights info).
  • Relation: From <dc:relation> in meta.xml (links to related documents).
  • Template: From <meta:template> in meta.xml (attributes: xlink:href for URL, xlink:title for name, meta:date for timestamp).
  • User-Defined Fields: From multiple <meta:user-defined meta:name="FieldName"> in meta.xml (custom key-value pairs; list all names and values).
  • Hyperlink Behavior: From <meta:hyperlink-behaviour> in meta.xml (attributes: meta:target-frame-name, xlink:show).
  • Auto-Reload: From <meta:auto-reload> in meta.xml (attributes: xlink:href for URL, meta:delay for duration).
  • Document Statistics: From attributes of <meta:document-statistic> in meta.xml (integers):
      • table-count (number of sheets/tables)
      • cell-count (number of cells)
      • object-count (number of embedded objects)
      • image-count (number of images)
      • page-count (number of pages)
      • paragraph-count (number of paragraphs)
      • word-count (number of words)
      • character-count (number of characters, including spaces)
      • non-whitespace-character-count (number of non-whitespace characters)
      • Other format-specific counts such as ole-object-count, frame-count, sentence-count, and syllable-count, where present.

These properties are XML-based and may not all be present in every .SXC file. Encryption details (e.g., algorithm, salt) from manifest.xml could be considered properties but are not typically "dumped" as metadata.
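
Most of these values are plain strings, but the editing duration is an ISO 8601 duration and usually needs converting before it can be compared or summed. A minimal sketch, assuming only day and time components occur (year and month components are rejected because their length is ambiguous); the name parse_editing_duration is made up for this example:

```python
import re
from datetime import timedelta

# Accepts forms such as "PT1H2M3S" or "P1DT30M".
_DURATION_RE = re.compile(
    r"^P(?:(?P<days>\d+)D)?"
    r"(?:T(?:(?P<hours>\d+)H)?(?:(?P<minutes>\d+)M)?"
    r"(?:(?P<seconds>\d+(?:\.\d+)?)S)?)?$"
)

def parse_editing_duration(text):
    """Convert an ISO 8601 duration string to a timedelta, or None if unsupported."""
    text = text.strip()
    match = _DURATION_RE.match(text)
    # "P" and "PT" satisfy the pattern but carry no components at all.
    if match is None or text in ("P", "PT"):
        return None
    parts = {name: float(value)
             for name, value in match.groupdict().items() if value}
    return timedelta(**parts)

print(parse_editing_duration("PT1H2M3S"))  # 1:02:03
```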

3. Ghost Blog Embedded HTML JavaScript for Drag-and-Drop .SXC Property Dump

This is a self-contained HTML page with embedded JavaScript that allows dragging and dropping a .SXC file. It uses the browser's File API to read the file as an ArrayBuffer, parses the ZIP structure manually (simple implementation for central directory and local headers), extracts meta.xml and mimetype, parses the XML using DOMParser, and dumps the properties to the screen. Note: This is pure JS without external libraries like JSZip for embedding simplicity; it assumes no encryption and standard ZIP structure.

Drag-and-Drop .SXC Property Dumper
Drag and drop a .SXC file here

Note: The ZIP extraction is simplified and assumes no compression for mimetype and meta.xml (common in .SXC). For full deflate support, integrate a library like pako.js.
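
The manual ZIP walk described in this section (find the end-of-central-directory record, iterate the central directory, then jump to each local header) can be sketched as follows. This is written in Python to match the other examples rather than as the page's JavaScript, and it is a simplified illustration: it assumes a single-disk archive without ZIP64 extensions or encryption, and the function names list_entries and read_entry are invented here. Unlike the page described above, it handles DEFLATE via the standard zlib module.

```python
import struct
import zlib

EOCD_SIG = b"PK\x05\x06"   # end of central directory record
CDIR_SIG = b"PK\x01\x02"   # central directory file header
LOCAL_SIG = b"PK\x03\x04"  # local file header

def list_entries(data):
    """Map entry name -> (compression method, compressed size, local header offset)."""
    eocd_pos = data.rfind(EOCD_SIG)
    if eocd_pos < 0:
        raise ValueError("end-of-central-directory record not found")
    # sig, disk no., cd disk, entries on disk, total entries, cd size, cd offset, comment len
    _, _, _, _, total, _, cd_offset, _ = struct.unpack_from("<4s4H2IH", data, eocd_pos)
    entries = {}
    pos = cd_offset
    for _ in range(total):
        fields = struct.unpack_from("<4s6H3I5H2I", data, pos)  # 46 fixed bytes
        if fields[0] != CDIR_SIG:
            raise ValueError("bad central directory header")
        method, csize = fields[4], fields[8]
        name_len, extra_len, comment_len = fields[10], fields[11], fields[12]
        local_offset = fields[16]
        name = data[pos + 46:pos + 46 + name_len].decode("utf-8", "replace")
        entries[name] = (method, csize, local_offset)
        pos += 46 + name_len + extra_len + comment_len
    return entries

def read_entry(data, entry):
    """Return the uncompressed bytes for one value produced by list_entries()."""
    method, csize, local_offset = entry
    header = struct.unpack_from("<4s5H3I2H", data, local_offset)  # 30 fixed bytes
    if header[0] != LOCAL_SIG:
        raise ValueError("bad local file header")
    name_len, extra_len = header[9], header[10]
    start = local_offset + 30 + name_len + extra_len
    raw = data[start:start + csize]
    if method == 0:   # stored
        return raw
    if method == 8:   # DEFLATE; negative wbits means a raw stream without zlib header
        return zlib.decompress(raw, -15)
    raise ValueError(f"unsupported compression method {method}")
```

Taking the compressed size from the central directory rather than the local header keeps the sketch working for archives that defer sizes to a trailing data descriptor.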

4. Python Class for .SXC File Handling

This Python class uses zipfile and xml.etree.ElementTree to open, read, and print properties. It also supports writing (modifying meta.xml and saving a new file).

import zipfile
import xml.etree.ElementTree as ET
from io import BytesIO

class SXCFile:
    def __init__(self, filepath):
        self.filepath = filepath
        self.props = {}
        self.zip = None
        self.meta_xml = None
        # .sxc files declare the OpenOffice.org 1.x namespace URIs, not the
        # later OpenDocument (urn:oasis:...) ones.
        self.namespaces = {
            'dc': 'http://purl.org/dc/elements/1.1/',
            'meta': 'http://openoffice.org/2000/meta',
            'office': 'http://openoffice.org/2000/office'
        }

    def open(self):
        self.zip = zipfile.ZipFile(self.filepath, 'r')
        self.decode()

    def decode(self):
        if 'mimetype' in self.zip.namelist():
            self.props['MIME Type'] = self.zip.read('mimetype').decode('utf-8').strip()
        if 'meta.xml' in self.zip.namelist():
            meta_content = self.zip.read('meta.xml')
            self.meta_xml = ET.parse(BytesIO(meta_content))
            meta = self.meta_xml.find('office:meta', self.namespaces)
            if meta is not None:
                for tag in ['dc:title', 'dc:description', 'dc:subject', 'dc:creator', 'dc:publisher', 'dc:contributor',
                            'dc:date', 'dc:type', 'dc:format', 'dc:identifier', 'dc:source', 'dc:language', 'dc:coverage',
                            'dc:rights', 'dc:relation', 'meta:initial-creator', 'meta:creation-date', 'meta:modification-date',
                            'meta:print-date', 'meta:printed-by', 'meta:generator', 'meta:editing-cycles', 'meta:editing-duration']:
                    elem = meta.find(tag, self.namespaces)
                    if elem is not None:
                        key = tag.split(':')[-1].replace('-', ' ').title()
                        self.props[key] = elem.text
                # Match meta:keyword at any depth so the wrapper element's name does not matter.
                keywords = [k.text for k in meta.findall('.//meta:keyword', self.namespaces) if k.text]
                if keywords:
                    self.props['Keywords'] = ', '.join(keywords)
                # ElementTree stores qualified attribute names as '{namespace-uri}local-name'.
                meta_ns = '{' + self.namespaces['meta'] + '}'
                user_defined = {ud.attrib.get(meta_ns + 'name', ''): ud.text
                                for ud in meta.findall('meta:user-defined', self.namespaces)}
                if user_defined:
                    self.props['User-Defined Fields'] = user_defined
                stats = meta.find('meta:document-statistic', self.namespaces)
                if stats is not None:
                    stat_props = {attr[len(meta_ns):].replace('-', ' ').title(): value
                                  for attr, value in stats.attrib.items() if attr.startswith(meta_ns)}
                    self.props['Document Statistics'] = stat_props

    def print_properties(self):
        for key, value in self.props.items():
            print(f"{key}: {value}")

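    def write(self, new_filepath, updates=None):
        # Dublin Core fields use the dc: prefix; everything else is assumed to be meta:.
        dc_fields = {'title', 'description', 'subject', 'creator', 'publisher', 'contributor', 'date', 'type',
                     'format', 'identifier', 'source', 'language', 'coverage', 'rights', 'relation'}
        if updates and self.meta_xml is not None:
            for key, value in updates.items():
                tag = key.lower().replace(' ', '-')
                if ':' not in tag:
                    tag = ('dc:' if tag in dc_fields else 'meta:') + tag
                elem = self.meta_xml.find(f'office:meta/{tag}', self.namespaces)
                if elem is not None:  # only existing elements are updated
                    elem.text = value
        # Keep the familiar prefixes on output instead of ElementTree's ns0:, ns1:, ...
        for prefix, uri in self.namespaces.items():
            ET.register_namespace(prefix, uri)
        with zipfile.ZipFile(new_filepath, 'w') as new_zip:
            # Passing the original ZipInfo preserves entry order and compression method.
            for item in self.zip.infolist():
                data = self.zip.read(item.filename)
                if item.filename == 'meta.xml' and self.meta_xml is not None:
                    # Note: ElementTree does not preserve the original DOCTYPE declaration.
                    data = ET.tostring(self.meta_xml.getroot(), encoding='utf-8', method='xml')
                new_zip.writestr(item, data)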

# Example usage:
# sxc = SXCFile('sample.sxc')
# sxc.open()
# sxc.print_properties()
# sxc.write('modified.sxc', {'Title': 'New Title'})
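
For experimenting with metadata extraction when no real spreadsheet is at hand, a tiny test archive can be generated. This is a sketch under stated assumptions: the helper name make_sample_sxc and all sample values are invented, the namespace URIs are the ones OpenOffice.org 1.x documents declare, and the result holds only mimetype and meta.xml, so it is a metadata fixture rather than a spreadsheet an office suite would open.

```python
import zipfile

META_XML = """<?xml version="1.0" encoding="UTF-8"?>
<office:document-meta xmlns:office="http://openoffice.org/2000/office"
    xmlns:meta="http://openoffice.org/2000/meta"
    xmlns:dc="http://purl.org/dc/elements/1.1/" office:version="1.0">
  <office:meta>
    <dc:title>Sample Sheet</dc:title>
    <meta:generator>hand-written test fixture</meta:generator>
    <meta:editing-cycles>1</meta:editing-cycles>
    <meta:document-statistic meta:table-count="1" meta:cell-count="4"/>
  </office:meta>
</office:document-meta>
"""

def make_sample_sxc(target):
    """Write a metadata-only test archive to a path or binary file object."""
    with zipfile.ZipFile(target, "w") as archive:
        # The mimetype entry is written first and left uncompressed.
        archive.writestr("mimetype", "application/vnd.sun.xml.calc",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("meta.xml", META_XML,
                         compress_type=zipfile.ZIP_DEFLATED)
```

Running make_sample_sxc('sample.sxc') produces a file with a title, a generator string, and two statistics to read back.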

5. Java Class for .SXC File Handling

This Java class uses java.util.zip and javax.xml.parsers to open, read, and print properties. It supports writing by modifying and re-zipping.

import java.io.*;
import java.util.*;
import java.util.zip.*;
import javax.xml.parsers.*;
import org.w3c.dom.*;
import org.xml.sax.SAXException;

public class SXCFile {
    private String filepath;
    private Map<String, Object> props = new HashMap<>();
    private ZipFile zip;
    private Document metaXml;
    // .sxc files declare the OpenOffice.org 1.x namespace URIs, not the later OpenDocument ones.
    private static final Map<String, String> namespaces = Map.of(
        "dc", "http://purl.org/dc/elements/1.1/",
        "meta", "http://openoffice.org/2000/meta",
        "office", "http://openoffice.org/2000/office"
    );

    public SXCFile(String filepath) {
        this.filepath = filepath;
    }

    public void open() throws IOException, ParserConfigurationException, SAXException {
        zip = new ZipFile(filepath);
        decode();
    }

    private void decode() throws IOException, ParserConfigurationException, SAXException {
        Enumeration<? extends ZipEntry> entries = zip.entries();
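        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // meta.xml in .sxc files usually carries a DOCTYPE that points at office.dtd, which is
        // not shipped inside the archive; skip loading it so parsing does not fail on the lookup.
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        DocumentBuilder builder = factory.newDocumentBuilder();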

        while (entries.hasMoreElements()) {
            ZipEntry entry = entries.nextElement();
            String name = entry.getName();
            if (name.equals("mimetype")) {
                try (InputStream is = zip.getInputStream(entry)) {
                    props.put("MIME Type", new String(is.readAllBytes()).trim());
                }
            } else if (name.equals("meta.xml")) {
                try (InputStream is = zip.getInputStream(entry)) {
                    metaXml = builder.parse(is);
                    Element meta = (Element) metaXml.getElementsByTagNameNS(namespaces.get("office"), "meta").item(0);
                    if (meta != null) {
                        String[] tags = {"dc:title", "dc:description", "dc:subject", "dc:creator", "dc:publisher", "dc:contributor",
                                         "dc:date", "dc:type", "dc:format", "dc:identifier", "dc:source", "dc:language", "dc:coverage",
                                         "dc:rights", "dc:relation", "meta:initial-creator", "meta:creation-date", "meta:modification-date",
                                         "meta:print-date", "meta:printed-by", "meta:generator", "meta:editing-cycles", "meta:editing-duration"};
                        for (String tag : tags) {
                            NodeList nl = meta.getElementsByTagNameNS(namespaces.get(tag.split(":")[0]), tag.split(":")[1]);
                            if (nl.getLength() > 0) {
                                String key = tag.split(":")[1].replace("-", " ").toUpperCase();
                                props.put(key, nl.item(0).getTextContent());
                            }
                        }
                        // Keywords
                        NodeList keywords = meta.getElementsByTagNameNS(namespaces.get("meta"), "keyword");
                        if (keywords.getLength() > 0) {
                            StringBuilder kw = new StringBuilder();
                            for (int i = 0; i < keywords.getLength(); i++) {
                                kw.append(keywords.item(i).getTextContent()).append(", ");
                            }
                            props.put("Keywords", kw.toString().trim().replaceAll(", $", ""));
                        }
                        // User-defined
                        Map<String, String> userDefined = new HashMap<>();
                        NodeList uds = meta.getElementsByTagNameNS(namespaces.get("meta"), "user-defined");
                        for (int i = 0; i < uds.getLength(); i++) {
                            Element ud = (Element) uds.item(i);
                            userDefined.put(ud.getAttributeNS(namespaces.get("meta"), "name"), ud.getTextContent());
                        }
                        if (!userDefined.isEmpty()) props.put("User-Defined Fields", userDefined);
                        // Statistics
                        Element stats = (Element) meta.getElementsByTagNameNS(namespaces.get("meta"), "document-statistic").item(0);
                        if (stats != null) {
                            Map<String, String> statProps = new HashMap<>();
                            NamedNodeMap attrs = stats.getAttributes();
                            for (int i = 0; i < attrs.getLength(); i++) {
                                Node attr = attrs.item(i);
                                if (attr.getNodeName().startsWith("meta:")) {
                                    String key = attr.getNodeName().substring(5).replace("-", " ").toUpperCase();
                                    statProps.put(key, attr.getNodeValue());
                                }
                            }
                            props.put("Document Statistics", statProps);
                        }
                    }
                }
            }
        }
    }

    public void printProperties() {
        for (Map.Entry<String, Object> entry : props.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }

    // The javax.xml.transform types are fully qualified because Java does not
    // allow import statements inside a method body.
    public void write(String newFilepath, Map<String, String> updates)
            throws IOException, javax.xml.transform.TransformerException {
        if (updates != null && metaXml != null) {
            Element meta = (Element) metaXml.getElementsByTagNameNS(namespaces.get("office"), "meta").item(0);
            if (meta != null) {
                for (Map.Entry<String, String> update : updates.entrySet()) {
                    String tag = update.getKey().toLowerCase().replace(" ", "-");
                    String prefix = tag.startsWith("title") ? "dc" : "meta"; // Simplify
                    NodeList nl = meta.getElementsByTagNameNS(namespaces.get(prefix), tag);
                    if (nl.getLength() > 0) nl.item(0).setTextContent(update.getValue());
                }
            }
        }

        try (ZipOutputStream newZip = new ZipOutputStream(new FileOutputStream(newFilepath))) {
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                newZip.putNextEntry(new ZipEntry(entry.getName()));
                if (entry.getName().equals("meta.xml") && metaXml != null) {
                    // Serialize the (possibly modified) DOM straight into the new archive entry.
                    javax.xml.transform.Transformer t =
                        javax.xml.transform.TransformerFactory.newInstance().newTransformer();
                    t.transform(new javax.xml.transform.dom.DOMSource(metaXml),
                                new javax.xml.transform.stream.StreamResult(newZip));
                } else {
                    try (InputStream is = zip.getInputStream(entry)) {
                        is.transferTo(newZip);
                    }
                }
                newZip.closeEntry();
            }
        }
    }

    // Example usage:
    // public static void main(String[] args) throws Exception {
    //     SXCFile sxc = new SXCFile("sample.sxc");
    //     sxc.open();
    //     sxc.printProperties();
    //     Map<String, String> updates = Map.of("Title", "New Title");
    //     sxc.write("modified.sxc", updates);
    // }
}

6. JavaScript Class for .SXC File Handling

This JS class (Node.js compatible) uses adm-zip (assume installed via npm) for ZIP and xml2js for XML parsing. It reads, prints, and writes properties.

const AdmZip = require('adm-zip'); // npm install adm-zip
const xml2js = require('xml2js'); // npm install xml2js
const fs = require('fs');

class SXCFile {
    constructor(filepath) {
        this.filepath = filepath;
        this.props = {};
        this.metaXml = null;
    }

    open() {
        this.zip = new AdmZip(this.filepath);
        this.decode();
    }

    decode() {
        const mimetype = this.zip.readAsText('mimetype');
        if (mimetype) this.props['MIME Type'] = mimetype.trim();

        const metaContent = this.zip.readAsText('meta.xml');
        if (metaContent) {
            xml2js.parseString(metaContent, (err, result) => {
                if (err) return;
                this.metaXml = result;
                const meta = result['office:document-meta']['office:meta'][0];
                if (meta) {
                    ['dc:title', 'dc:description', 'dc:subject', 'dc:creator', 'dc:publisher', 'dc:contributor',
                     'dc:date', 'dc:type', 'dc:format', 'dc:identifier', 'dc:source', 'dc:language', 'dc:coverage',
                     'dc:rights', 'dc:relation', 'meta:initial-creator', 'meta:creation-date', 'meta:modification-date',
                     'meta:print-date', 'meta:printed-by', 'meta:generator', 'meta:editing-cycles', 'meta:editing-duration'].forEach(tag => {
                        const parts = tag.split(':');
                        const val = meta[parts[0] + ':' + parts[1]];
                        if (val) this.props[parts[1].replace(/-/g, ' ')] = val[0];
                    });
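                    // Keywords are normally wrapped in <meta:keywords>; dc:keywords is checked as a fallback.
                    const kwParent = meta['meta:keywords'] || meta['dc:keywords'];
                    if (kwParent) {
                        const keywords = kwParent[0]['meta:keyword'] || [];
                        this.props['Keywords'] = keywords.join(', ');
                    }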
                    if (meta['meta:user-defined']) {
                        const userDefined = {};
                        meta['meta:user-defined'].forEach(ud => {
                            userDefined[ud.$['meta:name']] = ud._;
                        });
                        this.props['User-Defined Fields'] = userDefined;
                    }
                    if (meta['meta:document-statistic']) {
                        const stats = meta['meta:document-statistic'][0].$;
                        const statProps = {};
                        for (let key in stats) {
                            if (key.startsWith('meta:')) statProps[key.substring(5).replace(/-/g, ' ')] = stats[key];
                        }
                        this.props['Document Statistics'] = statProps;
                    }
                }
            });
        }
    }

    printProperties() {
        console.log(JSON.stringify(this.props, null, 2));
    }

    write(newFilepath, updates) {
        if (updates && this.metaXml) {
            const meta = this.metaXml['office:document-meta']['office:meta'][0];
            for (let key in updates) {
                const tag = key.toLowerCase().replace(/ /g, '-');
                const parts = tag.startsWith('title') ? ['dc', tag] : ['meta', tag]; // Simplify
                if (meta[parts[0] + ':' + parts[1]]) meta[parts[0] + ':' + parts[1]][0] = updates[key];
            }
            const builder = new xml2js.Builder();
            const newMeta = builder.buildObject(this.metaXml);
            this.zip.updateFile('meta.xml', Buffer.from(newMeta));
        }
        this.zip.writeZip(newFilepath);
    }
}

// Example usage:
// const sxc = new SXCFile('sample.sxc');
// sxc.open();
// sxc.printProperties();
// sxc.write('modified.sxc', { Title: 'New Title' });

7. C++ Class for .SXC File Handling

This C++ class uses libzip (external library) for ZIP and tinyxml2 for XML (assume included). It opens, reads, prints, and writes properties.

#include <algorithm> // std::replace
#include <cstdio>    // std::rename
#include <iostream>
#include <map>
#include <string>
#include <zip.h>
#include <tinyxml2.h> // Include tinyxml2.h

class SXCFile {
private:
    std::string filepath;
    std::map<std::string, std::string> props; // Simplified to string for demo
    zip_t* zip;
    tinyxml2::XMLDocument metaXml;

public:
    SXCFile(const std::string& fp) : filepath(fp), zip(nullptr) {}

    ~SXCFile() {
        if (zip) zip_close(zip);
    }

    void open() {
        int err = 0;
        zip = zip_open(filepath.c_str(), 0, &err);
        if (!zip) return;
        decode();
    }

    void decode() {
        zip_file_t* file = zip_fopen(zip, "mimetype", 0);
        if (file) {
            char buf[1024];
            zip_int64_t len = zip_fread(file, buf, sizeof(buf) - 1);
            if (len > 0) {
                buf[len] = '\0';
                props["MIME Type"] = std::string(buf).substr(0, len);
            }
            zip_fclose(file);
        }

        file = zip_fopen(zip, "meta.xml", 0);
        if (file) {
            std::string content;
            char buf[1024];
            zip_int64_t len;
            while ((len = zip_fread(file, buf, sizeof(buf))) > 0) {
                content.append(buf, len);
            }
            zip_fclose(file);
            if (metaXml.Parse(content.c_str()) == tinyxml2::XML_SUCCESS) {
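                // Guard against a missing root element before descending into it.
                tinyxml2::XMLElement* root = metaXml.FirstChildElement("office:document-meta");
                tinyxml2::XMLElement* meta = root ? root->FirstChildElement("office:meta") : nullptr;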
                if (meta) {
                    const char* tags[] = {"dc:title", "dc:description", "dc:subject", "dc:creator", "dc:publisher", "dc:contributor",
                                          "dc:date", "dc:type", "dc:format", "dc:identifier", "dc:source", "dc:language", "dc:coverage",
                                          "dc:rights", "dc:relation", "meta:initial-creator", "meta:creation-date", "meta:modification-date",
                                          "meta:print-date", "meta:printed-by", "meta:generator", "meta:editing-cycles", "meta:editing-duration"};
                    for (const char* tag : tags) {
                        tinyxml2::XMLElement* elem = meta->FirstChildElement(tag);
                        if (elem && elem->GetText()) {
                            std::string key = std::string(tag).substr(std::string(tag).find(':') + 1);
                            std::replace(key.begin(), key.end(), '-', ' ');
                            props[key] = elem->GetText();
                        }
                    }
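                    // Keywords (simplified). They are normally wrapped in <meta:keywords>;
                    // dc:keywords is checked as a fallback, and either may be absent.
                    std::string keywords;
                    tinyxml2::XMLElement* kwParent = meta->FirstChildElement("meta:keywords");
                    if (!kwParent) kwParent = meta->FirstChildElement("dc:keywords");
                    tinyxml2::XMLElement* kw = kwParent ? kwParent->FirstChildElement("meta:keyword") : nullptr;
                    while (kw) {
                        if (kw->GetText()) keywords += std::string(kw->GetText()) + ", ";
                        kw = kw->NextSiblingElement("meta:keyword");
                    }
                    if (!keywords.empty()) props["Keywords"] = keywords.substr(0, keywords.size() - 2);
                    // User-defined (simplified to string)
                    std::string userDefined;
                    tinyxml2::XMLElement* ud = meta->FirstChildElement("meta:user-defined");
                    while (ud) {
                        // Attribute() and GetText() return nullptr when absent, so check before use.
                        const char* name = ud->Attribute("meta:name");
                        const char* text = ud->GetText();
                        userDefined += std::string(name ? name : "") + ": " + (text ? text : "") + "; ";
                        ud = ud->NextSiblingElement("meta:user-defined");
                    }
                    if (!userDefined.empty()) props["User-Defined Fields"] = userDefined.substr(0, userDefined.size() - 2);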
                    // Statistics
                    tinyxml2::XMLElement* stats = meta->FirstChildElement("meta:document-statistic");
                    if (stats) {
                        std::string statStr;
                        const tinyxml2::XMLAttribute* attr = stats->FirstAttribute();
                        while (attr) {
                            std::string name = attr->Name();
                            if (name.find("meta:") == 0) {
                                name = name.substr(5);
                                std::replace(name.begin(), name.end(), '-', ' ');
                                statStr += name + ": " + attr->Value() + "; ";
                            }
                            attr = attr->Next();
                        }
                        if (!statStr.empty()) props["Document Statistics"] = statStr.substr(0, statStr.size() - 2);
                    }
                }
            }
        }
    }

    void printProperties() {
        for (const auto& p : props) {
            std::cout << p.first << ": " << p.second << std::endl;
        }
    }

    void write(const std::string& newFilepath, const std::map<std::string, std::string>& updates) {
        // Simplified: re-open the original archive, replace meta.xml in place,
        // then move the updated archive to newFilepath.
        if (zip) zip_close(zip);
        int err = 0;
        zip = zip_open(filepath.c_str(), 0, &err); // Re-open for write
        if (!zip) return;

        // libzip reads the source buffer during zip_close(), so the printer that
        // owns the serialized XML must outlive that call.
        tinyxml2::XMLPrinter printer;
        tinyxml2::XMLElement* root = metaXml.RootElement();
        tinyxml2::XMLElement* meta = root ? root->FirstChildElement("office:meta") : nullptr;
        if (!updates.empty() && meta) {
            for (const auto& update : updates) {
                std::string tag = update.first;
                std::replace(tag.begin(), tag.end(), ' ', '-');
                tinyxml2::XMLElement* elem = meta->FirstChildElement(("dc:" + tag).c_str()); // Try dc first
                if (!elem) elem = meta->FirstChildElement(("meta:" + tag).c_str());
                if (elem) elem->SetText(update.second.c_str());
            }
            metaXml.Print(&printer);
            // CStrSize() counts the terminating NUL, which must not be written to the archive.
            zip_source_t* src = zip_source_buffer(zip, printer.CStr(), printer.CStrSize() - 1, 0);
            if (src && zip_file_add(zip, "meta.xml", src, ZIP_FL_OVERWRITE) < 0) {
                zip_source_free(src); // libzip only takes ownership on success
            }
        }

        zip_close(zip);
        zip = nullptr; // avoid a second zip_close() in the destructor
        // Simplified: the archive is moved, so the original path no longer exists afterwards.
        std::rename(filepath.c_str(), newFilepath.c_str());
    }
};

// Example usage:
// int main() {
//     SXCFile sxc("sample.sxc");
//     sxc.open();
//     sxc.printProperties();
//     std::map<std::string, std::string> updates = {{"title", "New Title"}};
//     sxc.write("modified.sxc", updates);
//     return 0;
// }