Task 820: .WS File Format

1. Intrinsic Properties of the .WS File Format

The .WS file format is the WordStar document format, a proprietary format used by the WordStar word processor. It is based on ASCII text, with formatting carried by embedded control bytes, high-bit flags, and dot commands. The following list summarizes the structural, encoding, and formatting properties that define the format:

  • Encoding: Utilizes 7-bit ASCII for printable characters. In versions prior to 5.0, the high bit (8th bit) is repurposed for formatting information, such as marking the last character of words for microjustification, rather than extending the character set.
  • Line Termination: Normal lines (hard returns) end with the sequence 0Dh 0Ah (carriage return followed by line feed). Soft returns for word-wrapped lines use 8Dh 0Ah. Trailing spaces at line ends are preserved to maintain formatting fidelity.
  • Soft Space: The byte A0h is employed for tabbing, text justification, and left-margin indentation, distinguishing it from standard spaces (20h).
  • Extended Sequences: From version 3.4 onward, 3-byte sequences are used for advanced formatting, starting with lead-in 1Bh, followed by a middle byte (00h to FFh), and ending with trailer 1Ch.
  • Magic Numbers (Header Signatures): Byte sequences at the start of the file identify the version: 1D 7D 00 00 50 for version 5, 1D 7D 00 00 60 for version 6, and 1D 7D 00 00 70 for version 7. WordStar for Windows version 2 files instead begin with 57 53 32 30 30 30 (ASCII "WS2000").
  • Dot Commands: Formatting instructions stored as lines that begin with a period (.) in the first column, followed by a two-character command code and, for some commands, an argument (e.g., .PO for page offset, .PA for new page, .PN for page number). These lines are visible in the editor but do not appear in printed output.
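  • Control Characters for Formatting: Specific low-ASCII bytes toggle inline formatting, including 02h (^B) for bold and 13h (^S) for underline; each occurrence switches the attribute on or off. Because each attribute is toggled independently, styles can be nested or overlapped (e.g., underlined text inside a bold run) without first turning the outer style off.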
  • File Padding: Files are padded to multiples of 128 bytes with repeated 1Ah (EOF marker) bytes in unused space at the end.
  • Non-Document Mode: An optional mode for plain text files (e.g., for programming), adhering strictly to 7-bit ASCII without high-bit formatting or control characters, ensuring compatibility with other applications.
  • Compatibility with Other Tools: Because pre-5.0 files set the high bit on some characters (for example, for printer features such as microjustification), they can look garbled in standard text viewers unless the high bit is stripped first.
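Several of these properties interact when a document has to be turned into ordinary text. The Python function below is a minimal sketch of that conversion, assuming a document body without the version 5+ header (the sketch does not try to skip it). The name `ws_to_plain_text` and the `keep_dot_commands` flag are hypothetical. It strips the 1Ah padding, skips 1Bh xx 1Ch sequences entirely (so whatever they encode is lost), rejoins soft-wrapped lines, turns soft spaces into ordinary spaces, clears the high bit, and drops the remaining control bytes, including the bold and underline toggles.

```python
def ws_to_plain_text(data: bytes, keep_dot_commands: bool = False) -> str:
    """Convert WordStar document bytes to plain text (simplified sketch)."""
    # Drop the trailing 1Ah (EOF) padding bytes.
    data = data.rstrip(b"\x1a")
    out = []
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        # Extended 3-byte sequence 1Bh xx 1Ch: skipped entirely (assumption).
        if b == 0x1B and i + 2 < n and data[i + 2] == 0x1C:
            i += 3
            continue
        # Soft return 8Dh 0Ah: rejoin the word-wrapped line, adding a space if needed.
        if b == 0x8D and i + 1 < n and data[i + 1] == 0x0A:
            if out and out[-1] not in (" ", "\n"):
                out.append(" ")
            i += 2
            continue
        # Soft space A0h becomes an ordinary space.
        if b == 0xA0:
            out.append(" ")
            i += 1
            continue
        # Clear the high bit (pre-5.0 word-end markers), then keep only printable
        # ASCII plus tab/CR/LF; toggles such as 02h and 13h are dropped here.
        c = b & 0x7F
        if c in (0x09, 0x0A, 0x0D) or 0x20 <= c < 0x7F:
            out.append(chr(c))
        i += 1
    lines = "".join(out).replace("\r\n", "\n").split("\n")
    if not keep_dot_commands:
        # Dot commands are lines with '.' in column 1; they never print.
        lines = [line for line in lines if not line.startswith(".")]
    return "\n".join(lines)


if __name__ == "__main__":
    # A .PA line, a bold run, a high-bit word-end marker, and a soft return.
    sample = b".PA\r\nHello \x02bold\x02 wor" + bytes([ord("d") | 0x80]) + b" \x8d\x0anext\r\n"
    print(ws_to_plain_text(sample))  # prints: Hello bold word next
```

Converting soft spaces to ordinary spaces (rather than deleting them) is a deliberate choice here: it can leave extra spaces in justified text, but it never runs two words together.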

Software that reads or writes .WS files needs to account for each of these properties to parse, render, and edit documents consistently.
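Checking the header signature is a natural first step when deciding how to parse a file. The sketch below matches the signatures listed in section 1; the table and function names are hypothetical, and a `None` result only means that none of the listed signatures was found (the file may be an older document, a non-document file, or not a WordStar file at all).

```python
from typing import Optional

# Header signatures from the property list above, checked in order.
WS_SIGNATURES = (
    (bytes([0x1D, 0x7D, 0x00, 0x00, 0x50]), "WordStar 5"),
    (bytes([0x1D, 0x7D, 0x00, 0x00, 0x60]), "WordStar 6"),
    (bytes([0x1D, 0x7D, 0x00, 0x00, 0x70]), "WordStar 7"),
    (b"WS2000", "WordStar for Windows 2"),
)


def detect_ws_version(data: bytes) -> Optional[str]:
    """Return a version label if data starts with a known signature, else None."""
    for signature, label in WS_SIGNATURES:
        if data.startswith(signature):
            return label
    # No listed signature: possibly an older document or a non-document file.
    return None
```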

3. Embedded HTML/JavaScript for Drag-and-Drop .WS File Dumping in a Ghost Blog

This section describes a self-contained HTML page with embedded JavaScript, intended for embedding in a Ghost blog post. Users drag and drop a .WS file onto the page, and the script parses the file and displays the properties listed in section 1.

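[Embedded page: title "WordStar .WS File Analyzer"; heading "Drag and Drop .WS File Analyzer"; drop zone labeled "Drag and drop a .WS file here". The page's HTML and JavaScript source is not reproduced here.]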

4. Python Class for .WS File Handling

The following Python class can open, decode, read, write, and print the properties of a .WS file.

class WSFile:
    def __init__(self, filepath):
        self.filepath = filepath
        self.bytes = None
        self.properties = {}

    def read(self):
        with open(self.filepath, 'rb') as f:
            self.bytes = f.read()
        self.decode_properties()

    def decode_properties(self):
        if self.bytes is None:
            raise ValueError("No data loaded.")
        # Magic Number: first 5 bytes (a version signature only in files that carry a header)
        magic = ' '.join(f'{b:02x}' for b in self.bytes[:5])
        self.properties['Magic Number'] = magic

        # High Bit Usage
        self.properties['High Bit Usage'] = 'Detected' if any(b > 127 for b in self.bytes) else 'Not detected'

        # Line Endings
        hard, soft = 0, 0
        for i in range(len(self.bytes) - 1):
            if self.bytes[i] == 0x0D and self.bytes[i+1] == 0x0A:
                hard += 1
            if self.bytes[i] == 0x8D and self.bytes[i+1] == 0x0A:
                soft += 1
        self.properties['Line Endings'] = f'Hard Returns: {hard}, Soft Returns: {soft}'

        # Soft Spaces Count
        self.properties['Soft Spaces Count'] = sum(1 for b in self.bytes if b == 0xA0)

        # Extended Sequences
        seqs = []
        for i in range(len(self.bytes) - 2):
            if self.bytes[i] == 0x1B and self.bytes[i+2] == 0x1C:
                seqs.append(f'{self.bytes[i+1]:02x}')
        self.properties['Extended Sequences'] = seqs

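        # Dot Commands: clear the high bit (pre-5.0 word markers) and split on LF only;
        # str.splitlines() would also break lines at control bytes such as 1Ch
        text = bytes(b & 0x7F for b in self.bytes).decode('ascii')
        lines = text.split('\n')
        dot_cmds = [line[:3] for line in lines if line.startswith('.')]
        self.properties['Dot Commands'] = dot_cmds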

        # Formatting Controls
        bold = sum(1 for b in self.bytes if b == 0x02)
        underline = sum(1 for b in self.bytes if b == 0x13)
        self.properties['Formatting Controls'] = {'Bold': bold, 'Underline': underline}

        # Padding EOF Count
        eof_count = 0
        for b in reversed(self.bytes):
            if b == 0x1A:
                eof_count += 1
            else:
                break
        self.properties['Padding EOF Count'] = eof_count

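        # Non-Document Mode: printable 7-bit ASCII plus tab, CR, LF, and 1Ah padding
        allowed = set(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D, 0x1A}
        self.properties['Non-Document Mode'] = 'Likely' if all(b in allowed for b in self.bytes) else 'Unlikely'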

    def print_properties(self):
        for key, value in self.properties.items():
            print(f'{key}: {value}')

    def write(self, new_filepath, content=b''):
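        # Minimal writer: the 5-byte v7 signature, the raw content, then 1Ah padding to a
        # 128-byte boundary. This is not a complete WordStar header, so WordStar itself
        # may not open the result.
        header = b'\x1D\x7D\x00\x00\x70'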
        padding_len = (128 - (len(header + content) % 128)) % 128
        padding = b'\x1A' * padding_len
        with open(new_filepath, 'wb') as f:
            f.write(header + content + padding)

# Example usage:
# ws = WSFile('sample1.ws')
# ws.read()
# ws.print_properties()
# ws.write('new.ws', b'Test content\x0D\x0A')
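The class above keeps only the first three characters of each dot command line, so arguments such as the number in `.PN 5` are lost. A small sketch that also captures the argument and the line number might look like this. The function name `parse_dot_commands` and the `DOT_COMMAND` pattern are hypothetical; the pattern assumes a dot command is a period in column 1, a two-character code, and optional argument text, which matches the examples in section 1 but is not taken from a formal specification.

```python
import re
from typing import List, Tuple

# Assumed shape: '.' in column 1, a two-character code, then an optional argument.
DOT_COMMAND = re.compile(r"^\.([A-Za-z][A-Za-z0-9])\s*(.*)$")


def parse_dot_commands(data: bytes) -> List[Tuple[int, str, str]]:
    """Return (line_number, command, argument) for each dot command line."""
    # Clear the high bit first; soft returns (8Dh) then become ordinary CR bytes,
    # so both hard and soft line breaks count as lines.
    text = bytes(b & 0x7F for b in data).decode("ascii")
    commands = []
    for line_number, line in enumerate(text.split("\n"), start=1):
        match = DOT_COMMAND.match(line.rstrip("\r"))
        if match:
            commands.append((line_number, match.group(1).upper(), match.group(2).strip()))
    return commands
```

Command codes are upper-cased so that `.pn` and `.PN` compare equal; drop the `.upper()` call if the original case matters.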

5. Java Class for .WS File Handling

The following Java class can open, decode, read, write, and print the properties of a .WS file.

import java.io.*;
import java.util.*;

public class WSFile {
    private String filepath;
    private byte[] bytes;
    private Map<String, Object> properties = new LinkedHashMap<>();  // keeps properties in decode order for printing

    public WSFile(String filepath) {
        this.filepath = filepath;
    }

    public void read() throws IOException {
        try (FileInputStream fis = new FileInputStream(filepath)) {
            bytes = fis.readAllBytes();
        }
        decodeProperties();
    }

    private void decodeProperties() {
        if (bytes == null) throw new IllegalStateException("No data loaded.");

        // Magic Number
        StringBuilder magic = new StringBuilder();
        for (int i = 0; i < 5 && i < bytes.length; i++) {
            magic.append(String.format("%02x ", bytes[i] & 0xFF));
        }
        properties.put("Magic Number", magic.toString().trim());

        // High Bit Usage
        boolean highBit = false;
        for (byte b : bytes) {
            if ((b & 0xFF) > 127) {
                highBit = true;
                break;
            }
        }
        properties.put("High Bit Usage", highBit ? "Detected" : "Not detected");

        // Line Endings
        int hard = 0, soft = 0;
        for (int i = 0; i < bytes.length - 1; i++) {
            if ((bytes[i] & 0xFF) == 0x0D && (bytes[i+1] & 0xFF) == 0x0A) hard++;
            if ((bytes[i] & 0xFF) == 0x8D && (bytes[i+1] & 0xFF) == 0x0A) soft++;
        }
        properties.put("Line Endings", "Hard Returns: " + hard + ", Soft Returns: " + soft);

        // Soft Spaces Count
        int softSpaces = 0;
        for (byte b : bytes) {
            if ((b & 0xFF) == 0xA0) softSpaces++;
        }
        properties.put("Soft Spaces Count", softSpaces);

        // Extended Sequences
        List<String> seqs = new ArrayList<>();
        for (int i = 0; i < bytes.length - 2; i++) {
            if ((bytes[i] & 0xFF) == 0x1B && (bytes[i+2] & 0xFF) == 0x1C) {
                seqs.add(String.format("%02x", bytes[i+1] & 0xFF));
            }
        }
        properties.put("Extended Sequences", seqs);

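        // Dot Commands: clear the high bit first so pre-5.0 word markers don't corrupt the text
        byte[] masked = new byte[bytes.length];
        for (int i = 0; i < bytes.length; i++) masked[i] = (byte) (bytes[i] & 0x7F);
        String text = new String(masked, java.nio.charset.StandardCharsets.US_ASCII);
        String[] lines = text.split("\\r?\\n");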
        List<String> dotCmds = new ArrayList<>();
        for (String line : lines) {
            if (line.startsWith(".")) dotCmds.add(line.substring(0, Math.min(3, line.length())));
        }
        properties.put("Dot Commands", dotCmds);

        // Formatting Controls
        int bold = 0, underline = 0;
        for (byte b : bytes) {
            if ((b & 0xFF) == 0x02) bold++;
            if ((b & 0xFF) == 0x13) underline++;
        }
        Map<String, Integer> controls = new HashMap<>();
        controls.put("Bold", bold);
        controls.put("Underline", underline);
        properties.put("Formatting Controls", controls);

        // Padding EOF Count
        int eofCount = 0;
        for (int i = bytes.length - 1; i >= 0; i--) {
            if ((bytes[i] & 0xFF) == 0x1A) eofCount++;
            else break;
        }
        properties.put("Padding EOF Count", eofCount);

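        // Non-Document Mode: printable 7-bit ASCII plus tab, CR, LF, and 1Ah padding
        boolean nonDoc = true;
        for (byte b : bytes) {
            int val = b & 0xFF;
            boolean allowed = (val >= 0x20 && val < 0x7F) || val == 0x09 || val == 0x0A || val == 0x0D || val == 0x1A;
            if (!allowed) {
                nonDoc = false;
                break;
            }
        }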
        properties.put("Non-Document Mode", nonDoc ? "Likely" : "Unlikely");
    }

    public void printProperties() {
        for (Map.Entry<String, Object> entry : properties.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }

    public void write(String newFilepath, byte[] content) throws IOException {
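        // Minimal writer: 5-byte v7 signature only, not a complete WordStar header,
        // so WordStar itself may not open the result
        byte[] header = new byte[]{0x1D, 0x7D, 0x00, 0x00, 0x70};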
        byte[] combined = new byte[header.length + content.length];
        System.arraycopy(header, 0, combined, 0, header.length);
        System.arraycopy(content, 0, combined, header.length, content.length);
        int paddingLen = (128 - (combined.length % 128)) % 128;
        byte[] padding = new byte[paddingLen];
        Arrays.fill(padding, (byte) 0x1A);
        try (FileOutputStream fos = new FileOutputStream(newFilepath)) {
            fos.write(combined);
            fos.write(padding);
        }
    }

    // Example usage:
    // public static void main(String[] args) throws IOException {
    //     WSFile ws = new WSFile("sample1.ws");
    //     ws.read();
    //     ws.printProperties();
    //     ws.write("new.ws", "Test content\r\n".getBytes());
    // }
}

6. JavaScript Class for .WS File Handling

The following JavaScript class can open (via FileReader), decode, read, write (using Blob), and print the properties of a .WS file to the console.

class WSFile {
    constructor() {
        this.bytes = null;
        this.properties = {};
    }

    async read(file) {
        return new Promise((resolve, reject) => {
            const reader = new FileReader();
            reader.onload = (e) => {
                this.bytes = new Uint8Array(e.target.result);
                this.decodeProperties();
                resolve();
            };
            reader.onerror = reject;
            reader.readAsArrayBuffer(file);
        });
    }

    decodeProperties() {
        if (!this.bytes) throw new Error("No data loaded.");

        // Magic Number
        let magic = Array.from(this.bytes.slice(0, 5)).map(b => b.toString(16).padStart(2, '0')).join(' ');
        this.properties['Magic Number'] = magic;

        // High Bit Usage
        this.properties['High Bit Usage'] = this.bytes.some(b => b > 127) ? 'Detected' : 'Not detected';

        // Line Endings
        let hard = 0, soft = 0;
        for (let i = 0; i < this.bytes.length - 1; i++) {
            if (this.bytes[i] === 0x0D && this.bytes[i+1] === 0x0A) hard++;
            if (this.bytes[i] === 0x8D && this.bytes[i+1] === 0x0A) soft++;
        }
        this.properties['Line Endings'] = `Hard Returns: ${hard}, Soft Returns: ${soft}`;

        // Soft Spaces Count
        this.properties['Soft Spaces Count'] = this.bytes.filter(b => b === 0xA0).length;

        // Extended Sequences
        let seqs = [];
        for (let i = 0; i < this.bytes.length - 2; i++) {
            if (this.bytes[i] === 0x1B && this.bytes[i+2] === 0x1C) {
                seqs.push(this.bytes[i+1].toString(16).padStart(2, '0'));
            }
        }
        this.properties['Extended Sequences'] = seqs;

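        // Dot Commands: clear the high bit first; the 'ascii' label actually selects
        // windows-1252 in TextDecoder, so bytes above 7Fh would otherwise decode as other characters
        let text = new TextDecoder('ascii').decode(this.bytes.map(b => b & 0x7F));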
        let lines = text.split(/\r?\n/);
        let dotCmds = lines.filter(line => line.startsWith('.')).map(line => line.substring(0, 3));
        this.properties['Dot Commands'] = dotCmds;

        // Formatting Controls
        let bold = this.bytes.filter(b => b === 0x02).length;
        let underline = this.bytes.filter(b => b === 0x13).length;
        this.properties['Formatting Controls'] = { Bold: bold, Underline: underline };

        // Padding EOF Count
        let eofCount = 0;
        for (let i = this.bytes.length - 1; i >= 0; i--) {
            if (this.bytes[i] === 0x1A) eofCount++;
            else break;
        }
        this.properties['Padding EOF Count'] = eofCount;

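        // Non-Document Mode: printable 7-bit ASCII plus tab, CR, LF, and 1Ah padding
        const allowed = b => (b >= 0x20 && b < 0x7F) || b === 0x09 || b === 0x0A || b === 0x0D || b === 0x1A;
        this.properties['Non-Document Mode'] = this.bytes.every(allowed) ? 'Likely' : 'Unlikely';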
    }

    printProperties() {
        console.log(this.properties);
    }

    write(filename, content) {
        // Minimal writer: 5-byte v7 signature only, not a complete WordStar header
        const header = new Uint8Array([0x1D, 0x7D, 0x00, 0x00, 0x70]);
        const contentBytes = new TextEncoder().encode(content);
        const combined = new Uint8Array(header.length + contentBytes.length);
        combined.set(header);
        combined.set(contentBytes, header.length);
        const paddingLen = (128 - (combined.length % 128)) % 128;
        const padding = new Uint8Array(paddingLen).fill(0x1A);
        const full = new Uint8Array(combined.length + paddingLen);
        full.set(combined);
        full.set(padding, combined.length);
        const blob = new Blob([full], { type: 'application/octet-stream' });
        const url = URL.createObjectURL(blob);
        const a = document.createElement('a');
        a.href = url;
        a.download = filename;
        a.click();
        setTimeout(() => URL.revokeObjectURL(url), 0);  // defer so the download can start first
    }
}

// Example usage:
// const ws = new WSFile();
// const input = document.createElement('input');
// input.type = 'file';
// input.onchange = async (e) => {
//     await ws.read(e.target.files[0]);
//     ws.printProperties();
//     ws.write('new.ws', 'Test content\r\n');
// };
// input.click();

7. C++ Class for .WS File Handling

The following C++ class can open, decode, read, write, and print the properties of a .WS file to the console.

#include <iostream>
#include <fstream>
#include <vector>
#include <string>
#include <map>
#include <iomanip>
#include <sstream>
#include <stdexcept>

class WSFile {
private:
    std::string filepath;
    std::vector<unsigned char> bytes;
    std::map<std::string, std::string> properties;

public:
    WSFile(const std::string& fp) : filepath(fp) {}

    void read() {
        std::ifstream file(filepath, std::ios::binary);
        if (file) {
            file.seekg(0, std::ios::end);
            size_t size = file.tellg();
            file.seekg(0, std::ios::beg);
            bytes.resize(size);
            file.read(reinterpret_cast<char*>(bytes.data()), size);
            decodeProperties();
        } else {
            throw std::runtime_error("Failed to open file.");
        }
    }

    void decodeProperties() {
        if (bytes.empty()) throw std::runtime_error("No data loaded.");

        // Magic Number
        std::stringstream magicSs;
        for (size_t i = 0; i < 5 && i < bytes.size(); ++i) {
            magicSs << std::hex << std::setw(2) << std::setfill('0') << static_cast<int>(bytes[i]) << " ";
        }
        properties["Magic Number"] = magicSs.str();

        // High Bit Usage
        bool highBit = false;
        for (auto b : bytes) {
            if (b > 127) highBit = true;
        }
        properties["High Bit Usage"] = highBit ? "Detected" : "Not detected";

        // Line Endings
        int hard = 0, soft = 0;
        for (size_t i = 0; i < bytes.size() - 1; ++i) {
            if (bytes[i] == 0x0D && bytes[i+1] == 0x0A) ++hard;
            if (bytes[i] == 0x8D && bytes[i+1] == 0x0A) ++soft;
        }
        std::stringstream leSs;
        leSs << "Hard Returns: " << hard << ", Soft Returns: " << soft;
        properties["Line Endings"] = leSs.str();

        // Soft Spaces Count
        int softSpaces = 0;
        for (auto b : bytes) if (b == 0xA0) ++softSpaces;
        properties["Soft Spaces Count"] = std::to_string(softSpaces);

        // Extended Sequences
        std::stringstream seqSs;
        // i + 2 < size avoids unsigned underflow (and out-of-bounds reads) on 1-byte files
        for (size_t i = 0; i + 2 < bytes.size(); ++i) {
            if (bytes[i] == 0x1B && bytes[i+2] == 0x1C) {
                seqSs << std::hex << std::setw(2) << std::setfill('0') << static_cast<int>(bytes[i+1]) << " ";
            }
        }
        properties["Extended Sequences"] = seqSs.str();

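        // Dot Commands: clear the high bit first so pre-5.0 word markers don't hide commands
        std::string text;
        text.reserve(bytes.size());
        for (auto b : bytes) text.push_back(static_cast<char>(b & 0x7F));
        std::stringstream textSs(text);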
        std::string line, dotCmds;
        while (std::getline(textSs, line)) {
            if (line.rfind('.', 0) == 0 && line.size() >= 3) {
                dotCmds += line.substr(0, 3) + " ";
            }
        }
        properties["Dot Commands"] = dotCmds;

        // Formatting Controls
        int bold = 0, underline = 0;
        for (auto b : bytes) {
            if (b == 0x02) ++bold;
            if (b == 0x13) ++underline;
        }
        std::stringstream fcSs;
        fcSs << "Bold: " << bold << ", Underline: " << underline;
        properties["Formatting Controls"] = fcSs.str();

        // Padding EOF Count
        int eofCount = 0;
        for (auto it = bytes.rbegin(); it != bytes.rend(); ++it) {
            if (*it == 0x1A) ++eofCount;
            else break;
        }
        properties["Padding EOF Count"] = std::to_string(eofCount);

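        // Non-Document Mode: printable 7-bit ASCII plus tab, CR, LF, and 1Ah padding
        bool nonDoc = true;
        for (auto b : bytes) {
            bool allowed = (b >= 0x20 && b < 0x7F) || b == 0x09 || b == 0x0A || b == 0x0D || b == 0x1A;
            if (!allowed) { nonDoc = false; break; }
        }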
        properties["Non-Document Mode"] = nonDoc ? "Likely" : "Unlikely";
    }

    void printProperties() const {
        for (const auto& prop : properties) {
            std::cout << prop.first << ": " << prop.second << std::endl;
        }
    }

    void write(const std::string& newFilepath, const std::string& content) {
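        // Minimal writer: 5-byte v7 signature only, not a complete WordStar header,
        // so WordStar itself may not open the result
        std::vector<unsigned char> header = {0x1D, 0x7D, 0x00, 0x00, 0x70};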
        std::vector<unsigned char> contentBytes(content.begin(), content.end());
        std::vector<unsigned char> combined;
        combined.reserve(header.size() + contentBytes.size());
        combined.insert(combined.end(), header.begin(), header.end());
        combined.insert(combined.end(), contentBytes.begin(), contentBytes.end());
        size_t paddingLen = (128 - (combined.size() % 128)) % 128;
        std::vector<unsigned char> padding(paddingLen, 0x1A);
        combined.insert(combined.end(), padding.begin(), padding.end());
        std::ofstream file(newFilepath, std::ios::binary);
        if (file) {
            file.write(reinterpret_cast<const char*>(combined.data()), combined.size());
        } else {
            throw std::runtime_error("Failed to write file.");
        }
    }
};

// Example usage:
// int main() {
//     try {
//         WSFile ws("sample1.ws");
//         ws.read();
//         ws.printProperties();
//         ws.write("new.ws", "Test content\r\n");
//     } catch (const std::exception& e) {
//         std::cerr << e.what() << std::endl;
//     }
//     return 0;
// }