Task 698: .STW File Format

.STW File Format Specifications

The .STW file format is used for Statistica Workbooks, which are containers for data analysis projects in TIBCO Statistica software. It is based on the Microsoft Compound File Binary (CFB) format, a file-system-like structure that organizes data into storages (analogous to directories) and streams (analogous to files). This layout lets a workbook hold spreadsheets, graphs, reports, and other objects in a single file. The properties intrinsic to the format are therefore those of CFB itself: the header and the directory entries that define the file's internal "file system".

1. List of All Properties Intrinsic to the File Format

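The CFB format begins with a 512-byte header (in version 4 files the rest of the first 4096-byte sector is zero-filled), followed by sectors holding the FAT, DIFAT, MiniFAT, directory entries, and user data. All multi-byte integers are little-endian. The key properties are as follows:

Header Properties (bytes 0-511):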

  • Signature: 8 bytes (fixed: 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1)
  • CLSID: 16 bytes (usually all zeros)
  • Minor Version: 2 bytes (UINT16, typically 0x003E for both version 3 and version 4 files)
  • Major Version: 2 bytes (UINT16, 0x0003 for version 3, 0x0004 for version 4)
  • Byte Order: 2 bytes (UINT16, fixed 0xFFFE for little-endian)
  • Sector Shift: 2 bytes (UINT16, 0x0009 for 512-byte sectors in v3, 0x000C for 4096-byte sectors in v4)
  • Mini Sector Shift: 2 bytes (UINT16, fixed 0x0006 for 64-byte mini sectors)
  • Reserved: 6 bytes (must be zeros)
  • Number of Directory Sectors: 4 bytes (UINT32, must be 0 in v3; in v4, the number of directory sectors in the file)
  • Number of FAT Sectors: 4 bytes (UINT32, number of FAT sectors in the file)
  • First Directory Sector Location: 4 bytes (UINT32, sector ID of the first directory sector)
  • Transaction Signature Number: 4 bytes (UINT32, for transactioning; often 0)
  • Mini Stream Cutoff Size: 4 bytes (UINT32, fixed 0x00001000 or 4096 bytes)
  • First Mini FAT Sector Location: 4 bytes (UINT32, sector ID of the first MiniFAT sector)
  • Number of Mini FAT Sectors: 4 bytes (UINT32, number of MiniFAT sectors)
  • First DIFAT Sector Location: 4 bytes (UINT32, sector ID of the first DIFAT sector)
  • Number of DIFAT Sectors: 4 bytes (UINT32, number of DIFAT sectors)
  • DIFAT Array: 436 bytes (109 UINT32 entries, sector IDs for the first 109 FAT sectors; unused set to 0xFFFFFFFF)
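
As an illustration of the header layout listed above, the following minimal Python sketch unpacks the 512 header bytes with `struct`. The names `HEADER_FORMAT`, `parse_cfb_header`, and `build_sample_header` are hypothetical, and the sample header is synthetic rather than taken from a real .STW file.

```python
import struct

# Field order follows the header list above; '<' means little-endian, no padding.
# 8s signature, 16s CLSID, 5 x UINT16, 6s reserved, 9 x UINT32, 109 x UINT32 DIFAT.
HEADER_FORMAT = "<8s16sHHHHH6s9I109I"
CFB_SIGNATURE = bytes([0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1])
FREESECT = 0xFFFFFFFF
ENDOFCHAIN = 0xFFFFFFFE


def parse_cfb_header(raw: bytes) -> dict:
    """Unpack the first 512 bytes of a CFB file into a few named fields."""
    if len(raw) < struct.calcsize(HEADER_FORMAT):
        raise ValueError("need at least 512 bytes")
    f = struct.unpack_from(HEADER_FORMAT, raw)
    if f[0] != CFB_SIGNATURE:
        raise ValueError("not a CFB file: bad signature")
    return {
        "minor_version": f[2],
        "major_version": f[3],
        "byte_order": f[4],
        "sector_size": 1 << f[5],
        "mini_sector_size": 1 << f[6],
        "num_fat_sectors": f[9],
        "first_dir_sector": f[10],
        "mini_stream_cutoff": f[12],
        # Keep only the DIFAT slots that actually point at a FAT sector.
        "difat": [s for s in f[17:] if s != FREESECT],
    }


def build_sample_header() -> bytes:
    """Build a synthetic version 3 header (assumption: one FAT sector at sector 0)."""
    difat = [0] + [FREESECT] * 108
    return struct.pack(
        HEADER_FORMAT, CFB_SIGNATURE, bytes(16), 0x003E, 3, 0xFFFE, 9, 6, bytes(6),
        0, 1, 1, 0, 4096, ENDOFCHAIN, 0, ENDOFCHAIN, 0, *difat,
    )


header = parse_cfb_header(build_sample_header())
print(header)
```

For a real file, the first 512 bytes read from disk would be passed to `parse_cfb_header` instead of the synthetic sample.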

Directory Entry Properties (each entry is 128 bytes, stored in directory sectors, forming a red-black tree):

  • Name: 64 bytes (UTF-16 little-endian string, null-terminated)
  • Name Length: 2 bytes (UINT16, length in bytes including null terminator; must be even)
  • Object Type: 1 byte (UINT8, 0x00 = unknown, 0x01 = storage, 0x02 = stream, 0x05 = root storage)
  • Color Flag: 1 byte (UINT8, 0x00 = red, 0x01 = black for tree balancing)
  • Left Sibling ID: 4 bytes (UINT32, directory entry ID of left sibling or 0xFFFFFFFF if none)
  • Right Sibling ID: 4 bytes (UINT32, directory entry ID of right sibling or 0xFFFFFFFF if none)
  • Child ID: 4 bytes (UINT32, directory entry ID of child or 0xFFFFFFFF if none)
  • CLSID: 16 bytes (GUID for the storage object)
  • State Bits: 4 bytes (UINT32, user-defined flags)
  • Creation Time: 8 bytes (FILETIME structure, UTC timestamp)
  • Modified Time: 8 bytes (FILETIME structure, UTC timestamp)
  • Starting Sector Location: 4 bytes (UINT32, starting sector ID for the stream)
  • Stream Size: 8 bytes (UINT64, size of the stream in bytes; high 4 bytes must be zero in v3)
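
The directory entry fields above can be decoded in the same way. The sketch below is illustrative only: `parse_directory_entry`, `filetime_to_datetime`, and `build_sample_entry` are hypothetical names, and the sample entry is synthetic. It also shows how a FILETIME value (100-nanosecond ticks since 1601-01-01 UTC) maps to a Python `datetime`.

```python
import struct
from datetime import datetime, timedelta, timezone

ENTRY_FORMAT = "<64sHBBIII16sIQQIQ"  # 128 bytes, fields in the order listed above
OBJECT_TYPES = {0: "unknown", 1: "storage", 2: "stream", 5: "root storage"}
FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
NOSTREAM = 0xFFFFFFFF
ENDOFCHAIN = 0xFFFFFFFE


def filetime_to_datetime(ticks: int):
    """Convert FILETIME ticks to an aware datetime; 0 is treated as 'not set'."""
    if ticks == 0:
        return None
    return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)


def parse_directory_entry(raw: bytes) -> dict:
    """Decode one 128-byte directory entry."""
    (name, name_len, obj_type, color, left, right, child,
     clsid, state, created, modified, start, size) = struct.unpack(ENTRY_FORMAT, raw[:128])
    if not 2 <= name_len <= 64 or name_len % 2:
        raise ValueError("invalid name length")
    return {
        # Name Length counts the 2-byte terminator, so drop it before decoding.
        "name": name[:name_len - 2].decode("utf-16-le"),
        "type": OBJECT_TYPES.get(obj_type, "invalid"),
        "color": "red" if color == 0 else "black",
        "left": left, "right": right, "child": child,
        "created": filetime_to_datetime(created),
        "modified": filetime_to_datetime(modified),
        "start_sector": start,
        "size": size,
    }


def build_sample_entry() -> bytes:
    """Build a synthetic root entry whose modified time is the Unix epoch."""
    name = "Root Entry".encode("utf-16-le") + b"\x00\x00"
    return struct.pack(ENTRY_FORMAT, name, len(name), 5, 1, NOSTREAM, NOSTREAM, 1,
                       bytes(16), 0, 0, 116444736000000000, ENDOFCHAIN, 0)


print(parse_directory_entry(build_sample_entry()))
```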

Other intrinsic structures:

  • FAT (File Allocation Table): Array of UINT32 entries in FAT sectors, one per sector, giving the next sector in that sector's chain (values: 0x00000000-0xFFFFFFFA = next sector ID, 0xFFFFFFFE = end of chain, 0xFFFFFFFF = free, 0xFFFFFFFD = FAT sector, 0xFFFFFFFC = DIFAT sector)
  • DIFAT (Double-Indirect FAT): Chain of sectors holding the UINT32 sector IDs of additional FAT sectors beyond the 109 listed in the header DIFAT array
  • MiniFAT: Similar to FAT, but for mini streams (small data < cutoff size, stored in mini sectors within the mini stream)
  • Sectors: Data blocks of 2^SectorShift bytes, starting from sector 0 (after header)
  • Mini Sectors: 2^MiniSectorShift bytes for small streams
  • Root Directory Entry: Always the first directory entry, its stream is the mini stream containing all mini sectors
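
To make the FAT chain idea concrete, here is a small Python sketch that follows a chain and reassembles a stream from its sectors. It is a sketch under stated assumptions: the function names are hypothetical, the FAT is already loaded as a list of integers, and the four-sector "file" is synthetic.

```python
ENDOFCHAIN = 0xFFFFFFFE
FATSECT = 0xFFFFFFFD
MAXREGSECT = 0xFFFFFFFA  # Largest value that is a real sector ID


def sector_offset(sector_id: int, sector_size: int) -> int:
    """Byte offset of a sector; the header occupies the first sector-sized block."""
    return (sector_id + 1) * sector_size


def follow_chain(fat: list, start: int) -> list:
    """Return the sector IDs of a chain, stopping at ENDOFCHAIN."""
    chain, seen = [], set()
    sector = start
    while sector != ENDOFCHAIN:
        # Reject special values, out-of-range IDs, and loops in a corrupt FAT.
        if sector > MAXREGSECT or sector >= len(fat) or sector in seen:
            raise ValueError(f"corrupt chain at sector {sector:#x}")
        seen.add(sector)
        chain.append(sector)
        sector = fat[sector]
    return chain


def read_stream(data: bytes, fat: list, start: int, size: int, sector_size: int) -> bytes:
    """Concatenate the sectors of a chain and trim to the declared stream size."""
    parts = []
    for s in follow_chain(fat, start):
        begin = sector_offset(s, sector_size)
        parts.append(data[begin:begin + sector_size])
    return b"".join(parts)[:size]


# Synthetic example: sector 0 is the FAT itself, and one stream occupies
# sectors 1 -> 3 -> 2 (chains need not be contiguous or ascending).
SECTOR_SIZE = 512
fat = [FATSECT, 3, ENDOFCHAIN, 2]
# 512 header bytes, then four sectors filled with their own sector number.
data = bytes(SECTOR_SIZE) + b"".join(bytes([n]) * SECTOR_SIZE for n in range(4))

print(follow_chain(fat, 1))  # [1, 3, 2]
stream = read_stream(data, fat, 1, 1000, SECTOR_SIZE)
print(len(stream))  # 1000
```

The same loop applies to the MiniFAT, with mini sector IDs indexing 64-byte blocks inside the mini stream instead of sectors in the file.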

These properties define the file's structure, allowing hierarchical storage and streaming of data.

2. Sample .STW Files

  1. https://figshare.com/ndownloader/files/14558810 (Statistica workbook sample from a figshare dataset)
  2. https://bazaar.abuse.ch/download/77dec68adc9d69b54bb2121cdb1d0a188e4da2f750958062311f1be8133fa3b0/ (Sample detected as Statistica workbook format; note: this comes from a malware database, so handle it with caution)

3. Ghost Blog Embedded HTML JavaScript for Drag and Drop .STW File Dump

Here is an embedded HTML/JavaScript snippet for a Ghost blog post. It allows dragging and dropping a .STW file to parse and dump the header and directory entry properties to the screen.

Drag and Drop .STW File Here

4. Python Class for .STW File

Here is a Python class to open, decode, read, write, and print the properties.

import struct
import os

class STWParser:
    def __init__(self, filename):
        self.filename = filename
        self.data = None
        self.header = {}
        self.directory_entries = []
        self.sector_size = None

    def read(self):
        with open(self.filename, 'rb') as f:
            self.data = f.read()
        self.parse_header()
        self.parse_directory_entries()
        self.print_properties()

    def parse_header(self):
        # 9 UINT32 fields sit between the reserved bytes and the DIFAT array (512 bytes total)
        header_format = '<8s16sH H H H H 6s I I I I I I I I I 436s'
        header_size = struct.calcsize(header_format)
        unpacked = struct.unpack(header_format, self.data[:header_size])
        self.header = {
            'Signature': unpacked[0].hex(' '),
            'CLSID': unpacked[1].hex(' '),
            'Minor Version': unpacked[2],
            'Major Version': unpacked[3],
            'Byte Order': unpacked[4],
            'Sector Shift': unpacked[5],
            'Mini Sector Shift': unpacked[6],
            'Reserved': unpacked[7].hex(' '),
            'Number of Directory Sectors': unpacked[8],
            'Number of FAT Sectors': unpacked[9],
            'First Directory Sector Location': unpacked[10],
            'Transaction Signature Number': unpacked[11],
            'Mini Stream Cutoff Size': unpacked[12],
            'First Mini FAT Sector Location': unpacked[13],
            'Number of Mini FAT Sectors': unpacked[14],
            'First DIFAT Sector Location': unpacked[15],
            'Number of DIFAT Sectors': unpacked[16],
            # The 436-byte DIFAT array holds 109 little-endian UINT32 sector IDs
            'DIFAT': list(struct.unpack('<109I', unpacked[17])),
        }
        self.sector_size = 1 << self.header['Sector Shift']

    def parse_directory_entries(self):
        first_dir_sector = self.header['First Directory Sector Location']
        offset = (first_dir_sector + 1) * self.sector_size  # Sector n starts after the header block
        entry_format = '<64s H B B I I I 16s I Q Q I Q'
        entry_size = struct.calcsize(entry_format)  # 128 bytes
        # Simplification: entries are read contiguously; the directory's FAT chain is not followed.
        while offset + entry_size <= len(self.data):
            unpacked = struct.unpack(entry_format, self.data[offset:offset+entry_size])
            name_bytes = unpacked[0]
            name_length = unpacked[1]
            if name_length < 2 or name_length > 64:
                break  # Unused or invalid entry: stop
            name = name_bytes[:name_length - 2].decode('utf-16le', errors='replace')
            entry = {
                'Name': name,
                'Name Length': name_length,
                'Object Type': unpacked[2],
                'Color Flag': unpacked[3],
                'Left Sibling ID': unpacked[4],
                'Right Sibling ID': unpacked[5],
                'Child ID': unpacked[6],
                'CLSID': unpacked[7].hex(' '),
                'State Bits': unpacked[8],
                'Creation Time': unpacked[9],
                'Modified Time': unpacked[10],
                'Starting Sector Location': unpacked[11],
                'Stream Size': unpacked[12],
            }
            self.directory_entries.append(entry)
            offset += entry_size

    def print_properties(self):
        print("Header Properties:")
        for key, value in self.header.items():
            print(f"{key}: {value}")
        print("\nDirectory Entries:")
        for entry in self.directory_entries:
            print(entry)

    def write(self, new_filename=None):
        if not new_filename:
            new_filename = self.filename + '.new'
        with open(new_filename, 'wb') as f:
            f.write(self.data)
        print(f"File written to {new_filename}")

# Example usage
# parser = STWParser('example.stw')
# parser.read()
# parser.write()
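
The directory entries form a tree: each storage points to one child, and that child's left and right sibling IDs link the remaining members of the storage. As a rough sketch, assuming a flat list of entry dictionaries keyed like `directory_entries` in the class above, the hypothetical helper `list_paths` below walks that tree and returns full paths. The sample entries and their names are invented for illustration.

```python
NOSTREAM = 0xFFFFFFFF


def list_paths(entries: list) -> list:
    """Return '/'-joined paths for every storage and stream under the root (entry 0)."""
    paths = []

    def visit(entry_id: int, parent: str, seen: set) -> None:
        # Stop at missing links, out-of-range IDs, and cycles in a corrupt directory.
        if entry_id == NOSTREAM or entry_id >= len(entries) or entry_id in seen:
            return
        seen.add(entry_id)
        entry = entries[entry_id]
        visit(entry["Left Sibling ID"], parent, seen)   # Siblings that sort earlier
        path = f"{parent}/{entry['Name']}"
        paths.append(path)
        visit(entry["Child ID"], path, seen)            # Contents of a storage
        visit(entry["Right Sibling ID"], parent, seen)  # Siblings that sort later

    if entries:
        visit(entries[0]["Child ID"], "", set())
    return paths


def make_entry(name, left=NOSTREAM, right=NOSTREAM, child=NOSTREAM):
    return {"Name": name, "Left Sibling ID": left, "Right Sibling ID": right, "Child ID": child}


# Hypothetical directory: the root holds Data, Graphs (a storage), and Report.
sample_entries = [
    make_entry("Root Entry", child=2),
    make_entry("Data"),
    make_entry("Graphs", left=1, right=3, child=4),
    make_entry("Report"),
    make_entry("Plot1"),
]

for path in list_paths(sample_entries):
    print(path)
```

With a parsed file, `list_paths(parser.directory_entries)` would be the equivalent call, provided the directory fits in the contiguous range that the class reads.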

5. Java Class for .STW File

Here is a Java class to open, decode, read, write, and print the properties.

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.channels.FileChannel;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

public class STWParser {
    private String filename;
    private ByteBuffer buffer;
    private final int HEADER_SIZE = 512;
    private int sectorSize;
    private final List<String> headerProperties = new ArrayList<>();
    private final List<String> directoryEntries = new ArrayList<>();

    public STWParser(String filename) {
        this.filename = filename;
    }

    public void read() throws IOException {
        try (FileInputStream fis = new FileInputStream(filename); FileChannel channel = fis.getChannel()) {
            buffer = ByteBuffer.allocate((int) channel.size());
            channel.read(buffer);
            buffer.flip();
            buffer.order(ByteOrder.LITTLE_ENDIAN);
            parseHeader();
            parseDirectoryEntries();
            printProperties();
        }
    }

    private void parseHeader() {
        byte[] signature = new byte[8];
        buffer.get(signature);
        headerProperties.add("Signature: " + bytesToHex(signature));
        byte[] clsid = new byte[16];
        buffer.get(clsid);
        headerProperties.add("CLSID: " + bytesToHex(clsid));
        headerProperties.add("Minor Version: " + buffer.getShort());
        headerProperties.add("Major Version: " + buffer.getShort());
        headerProperties.add("Byte Order: 0x" + Integer.toHexString(buffer.getShort() & 0xFFFF));
        short sectorShift = buffer.getShort();
        headerProperties.add("Sector Shift: " + sectorShift);
        sectorSize = 1 << sectorShift;
        headerProperties.add("Mini Sector Shift: " + buffer.getShort());
        byte[] reserved = new byte[6];
        buffer.get(reserved);
        headerProperties.add("Reserved: " + bytesToHex(reserved));
        headerProperties.add("Number of Directory Sectors: " + buffer.getInt());
        headerProperties.add("Number of FAT Sectors: " + buffer.getInt());
        int firstDirSector = buffer.getInt();
        headerProperties.add("First Directory Sector Location: " + firstDirSector);
        headerProperties.add("Transaction Signature Number: " + buffer.getInt());
        headerProperties.add("Mini Stream Cutoff Size: " + buffer.getInt());
        headerProperties.add("First Mini FAT Sector Location: " + buffer.getInt());
        headerProperties.add("Number of Mini FAT Sectors: " + buffer.getInt());
        headerProperties.add("First DIFAT Sector Location: " + buffer.getInt());
        headerProperties.add("Number of DIFAT Sectors: " + buffer.getInt());
        StringBuilder difat = new StringBuilder("DIFAT Array: ");
        for (int i = 0; i < 109; i++) {
            difat.append(buffer.getInt()).append(" ");
        }
        headerProperties.add(difat.toString().trim());
    }

    private void parseDirectoryEntries() {
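        // Sector n starts at (n + 1) * sectorSize because the header occupies the first sector-sized block
        int firstDirSector = Integer.parseInt(headerProperties.get(10).split(": ")[1]);
        int offset = (firstDirSector + 1) * sectorSize;
        buffer.position(offset);
        // Simplification: entries are read contiguously; the directory's FAT chain is not followed
        while (buffer.remaining() >= 128) {
            byte[] nameBytes = new byte[64];
            buffer.get(nameBytes);
            short nameLength = buffer.getShort();
            if (nameLength < 2 || nameLength > 64) break; // Unused or invalid entry
            // Using a Charset constant avoids the checked UnsupportedEncodingException
            String name = new String(nameBytes, 0, nameLength - 2, java.nio.charset.StandardCharsets.UTF_16LE);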
            String entry = "Name: " + name + "\nName Length: " + nameLength + "\nObject Type: " + buffer.get() + "\nColor Flag: " + buffer.get() + "\nLeft Sibling ID: " + buffer.getInt() + "\nRight Sibling ID: " + buffer.getInt() + "\nChild ID: " + buffer.getInt();
            byte[] clsid = new byte[16];
            buffer.get(clsid);
            entry += "\nCLSID: " + bytesToHex(clsid) + "\nState Bits: " + buffer.getInt() + "\nCreation Time: " + buffer.getLong() + "\nModified Time: " + buffer.getLong() + "\nStarting Sector Location: " + buffer.getInt() + "\nStream Size: " + buffer.getLong();
            directoryEntries.add(entry);
        }
    }

    private void printProperties() {
        System.out.println("Header Properties:");
        for (String prop : headerProperties) {
            System.out.println(prop);
        }
        System.out.println("\nDirectory Entries:");
        for (String entry : directoryEntries) {
            System.out.println(entry);
        }
    }

    public void write(String newFilename) throws IOException {
        try (FileOutputStream fos = new FileOutputStream(newFilename); FileChannel channel = fos.getChannel()) {
            buffer.position(0);
            channel.write(buffer);
        }
        System.out.println("File written to " + newFilename);
    }

    private String bytesToHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) throws IOException {
        STWParser parser = new STWParser("example.stw");
        parser.read();
        parser.write("example.new.stw");
    }
}

6. JavaScript Class for .STW File

Here is a JavaScript class to open, decode, read, write, and print the properties (using Node.js for file I/O).

const fs = require('fs');

class STWParser {
  constructor(filename) {
    this.filename = filename;
    this.data = null;
    this.header = {};
    this.directoryEntries = [];
    this.sectorSize = null;
  }

  read() {
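    this.data = fs.readFileSync(this.filename);
    // A Node.js Buffer may be a view into a larger ArrayBuffer, so pass its offset and length
    const dataView = new DataView(this.data.buffer, this.data.byteOffset, this.data.byteLength);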
    this.parseHeader(dataView);
    this.parseDirectoryEntries(dataView);
    this.printProperties();
  }

  parseHeader(dataView) {
    // subarray() returns a view of just the requested bytes; new Uint8Array(buffer, start, len) would copy the whole Buffer
    this.header.signature = Array.from(this.data.subarray(0, 8)).map(b => b.toString(16).padStart(2, '0')).join(' ');
    this.header.clsid = Array.from(this.data.subarray(8, 24)).map(b => b.toString(16).padStart(2, '0')).join(' ');
    this.header.minorVersion = dataView.getUint16(24, true);
    this.header.majorVersion = dataView.getUint16(26, true);
    this.header.byteOrder = dataView.getUint16(28, true);
    this.header.sectorShift = dataView.getUint16(30, true);
    this.sectorSize = 1 << this.header.sectorShift;
    this.header.miniSectorShift = dataView.getUint16(32, true);
    this.header.reserved = Array.from(this.data.subarray(34, 40)).map(b => b.toString(16).padStart(2, '0')).join(' ');
    this.header.numDirSectors = dataView.getUint32(40, true);
    this.header.numFatSectors = dataView.getUint32(44, true);
    this.header.firstDirSector = dataView.getUint32(48, true);
    this.header.transactionSig = dataView.getUint32(52, true);
    this.header.miniStreamCutoff = dataView.getUint32(56, true);
    this.header.firstMiniFatSector = dataView.getUint32(60, true);
    this.header.numMiniFatSectors = dataView.getUint32(64, true);
    this.header.firstDifatSector = dataView.getUint32(68, true);
    this.header.numDifatSectors = dataView.getUint32(72, true);
    this.header.difat = [];
    for (let i = 0; i < 109; i++) {
      this.header.difat.push(dataView.getUint32(76 + i * 4, true));
    }
  }

  parseDirectoryEntries(dataView) {
    let offset = (this.header.firstDirSector + 1) * this.sectorSize;
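    // Simplification: entries are read contiguously; the directory's FAT chain is not followed.
    while (offset + 128 <= this.data.length) {
      const nameBytes = this.data.subarray(offset, offset + 64);
      const nameLength = dataView.getUint16(offset + 64, true);
      if (nameLength < 2 || nameLength > 64) break; // Unused or invalid entry
      const name = new TextDecoder('utf-16le').decode(nameBytes.subarray(0, nameLength - 2));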
      const entry = {
        name: name,
        nameLength: nameLength,
        objectType: dataView.getUint8(offset + 66),
        colorFlag: dataView.getUint8(offset + 67),
        leftSibling: dataView.getUint32(offset + 68, true),
        rightSibling: dataView.getUint32(offset + 72, true),
        child: dataView.getUint32(offset + 76, true),
        clsid: Array.from(this.data.subarray(offset + 80, offset + 96)).map(b => b.toString(16).padStart(2, '0')).join(' '),
        stateBits: dataView.getUint32(offset + 96, true),
        creationTime: dataView.getBigUint64(offset + 100, true),
        modifiedTime: dataView.getBigUint64(offset + 108, true),
        startingSector: dataView.getUint32(offset + 116, true),
        streamSize: dataView.getBigUint64(offset + 120, true),
      };
      this.directoryEntries.push(entry);
      offset += 128;
    }
  }

  printProperties() {
    console.log('Header Properties:');
    console.log(this.header);
    console.log('Directory Entries:');
    console.log(this.directoryEntries);
  }

  write(newFilename) {
    fs.writeFileSync(newFilename || this.filename + '.new', this.data);
    console.log(`File written to ${newFilename || this.filename + '.new'}`);
  }
}

// Example usage
// const parser = new STWParser('example.stw');
// parser.read();
// parser.write();

7. C Class for .STW File

Here is a C class (struct with functions) to open, decode, read, write, and print the properties.

#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    char filename[256];
    uint8_t *data;
    size_t size;
    struct {
        uint8_t signature[8];
        uint8_t clsid[16];
        uint16_t minorVersion;
        uint16_t majorVersion;
        uint16_t byteOrder;
        uint16_t sectorShift;
        uint16_t miniSectorShift;
        uint8_t reserved[6];
        uint32_t numDirSectors;
        uint32_t numFatSectors;
        uint32_t firstDirSector;
        uint32_t transactionSig;
        uint32_t miniStreamCutoff;
        uint32_t firstMiniFatSector;
        uint32_t numMiniFatSectors;
        uint32_t firstDifatSector;
        uint32_t numDifatSectors;
        uint32_t difat[109];
    } header;
    struct DirectoryEntry {
        char name[32]; // Simplified, assuming ASCII for print
        uint16_t nameLength;
        uint8_t objectType;
        uint8_t colorFlag;
        uint32_t leftSibling;
        uint32_t rightSibling;
        uint32_t child;
        uint8_t clsid[16];
        uint32_t stateBits;
        uint64_t creationTime;
        uint64_t modifiedTime;
        uint32_t startingSector;
        uint64_t streamSize;
        struct DirectoryEntry *next;
    } *directoryEntries;
    uint32_t sectorSize;
} STWParser;

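void readSTW(STWParser *parser) {
    // Start from a known state so freeSTW() is safe even if reading fails
    parser->data = NULL;
    parser->size = 0;
    parser->directoryEntries = NULL;
    FILE *f = fopen(parser->filename, "rb");
    if (!f) return;
    fseek(f, 0, SEEK_END);
    long fileSize = ftell(f);
    fseek(f, 0, SEEK_SET);
    if (fileSize < 512) { fclose(f); return; }  // Too small to hold a CFB header
    parser->size = (size_t)fileSize;
    parser->data = malloc(parser->size);
    if (!parser->data || fread(parser->data, 1, parser->size, f) != parser->size) {
        fclose(f);
        free(parser->data);
        parser->data = NULL;
        parser->size = 0;
        return;
    }
    fclose(f);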

    // Parse header
    memcpy(parser->header.signature, parser->data, 8);
    memcpy(parser->header.clsid, parser->data + 8, 16);
    memcpy(&parser->header.minorVersion, parser->data + 24, 2);
    memcpy(&parser->header.majorVersion, parser->data + 26, 2);
    memcpy(&parser->header.byteOrder, parser->data + 28, 2);
    memcpy(&parser->header.sectorShift, parser->data + 30, 2);
    parser->sectorSize = 1 << parser->header.sectorShift;
    memcpy(&parser->header.miniSectorShift, parser->data + 32, 2);
    memcpy(parser->header.reserved, parser->data + 34, 6);
    memcpy(&parser->header.numDirSectors, parser->data + 40, 4);
    memcpy(&parser->header.numFatSectors, parser->data + 44, 4);
    memcpy(&parser->header.firstDirSector, parser->data + 48, 4);
    memcpy(&parser->header.transactionSig, parser->data + 52, 4);
    memcpy(&parser->header.miniStreamCutoff, parser->data + 56, 4);
    memcpy(&parser->header.firstMiniFatSector, parser->data + 60, 4);
    memcpy(&parser->header.numMiniFatSectors, parser->data + 64, 4);
    memcpy(&parser->header.firstDifatSector, parser->data + 68, 4);
    memcpy(&parser->header.numDifatSectors, parser->data + 72, 4);
    for (int i = 0; i < 109; i++) {
        memcpy(&parser->header.difat[i], parser->data + 76 + i * 4, 4);
    }

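    // Parse directory entries (memcpy into integer fields assumes a little-endian host).
    // Simplification: entries are read contiguously; the directory's FAT chain is not followed.
    size_t offset = ((size_t)parser->header.firstDirSector + 1) * parser->sectorSize;
    struct DirectoryEntry *current = NULL;
    while (offset + 128 <= parser->size) {
        struct DirectoryEntry *entry = malloc(sizeof(struct DirectoryEntry));
        if (!entry) break;
        uint8_t nameBytes[64];
        memcpy(nameBytes, parser->data + offset, 64);
        memcpy(&entry->nameLength, parser->data + offset + 64, 2);
        // Valid lengths are 2..64 bytes; anything else would overflow name[32] below
        if (entry->nameLength < 2 || entry->nameLength > 64) {
            free(entry);
            break;
        }
        // Convert UTF-16 to ASCII for simplicity (keeps the low byte of each code unit)
        for (int j = 0; j < (entry->nameLength - 2)/2; j++) {
            entry->name[j] = nameBytes[j*2];
        }
        entry->name[(entry->nameLength - 2)/2] = '\0';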
        memcpy(&entry->objectType, parser->data + offset + 66, 1);
        memcpy(&entry->colorFlag, parser->data + offset + 67, 1);
        memcpy(&entry->leftSibling, parser->data + offset + 68, 4);
        memcpy(&entry->rightSibling, parser->data + offset + 72, 4);
        memcpy(&entry->child, parser->data + offset + 76, 4);
        memcpy(entry->clsid, parser->data + offset + 80, 16);
        memcpy(&entry->stateBits, parser->data + offset + 96, 4);
        memcpy(&entry->creationTime, parser->data + offset + 100, 8);
        memcpy(&entry->modifiedTime, parser->data + offset + 108, 8);
        memcpy(&entry->startingSector, parser->data + offset + 116, 4);
        memcpy(&entry->streamSize, parser->data + offset + 120, 8);
        entry->next = NULL;
        if (current) {
            current->next = entry;
        } else {
            parser->directoryEntries = entry;
        }
        current = entry;
        offset += 128;
    }
}

void printProperties(STWParser *parser) {
    printf("Header Properties:\n");
    printf("Signature: ");
    for (int i = 0; i < 8; i++) printf("%02x ", parser->header.signature[i]);
    printf("\n");
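    printf("Minor Version: 0x%04x\n", (unsigned)parser->header.minorVersion);
    printf("Major Version: 0x%04x\n", (unsigned)parser->header.majorVersion);
    printf("Byte Order: 0x%04x\n", (unsigned)parser->header.byteOrder);
    printf("Sector Shift: %u\n", (unsigned)parser->header.sectorShift);
    printf("Mini Sector Shift: %u\n", (unsigned)parser->header.miniSectorShift);
    printf("Number of FAT Sectors: %u\n", (unsigned)parser->header.numFatSectors);
    printf("First Directory Sector Location: %u\n", (unsigned)parser->header.firstDirSector);
    // The remaining header fields can be printed the same way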
    printf("\nDirectory Entries:\n");
    struct DirectoryEntry *entry = parser->directoryEntries;
    while (entry) {
        printf("Name: %s\n", entry->name);
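        printf("  Object Type: %u\n", (unsigned)entry->objectType);
        printf("  Starting Sector Location: %u\n", (unsigned)entry->startingSector);
        printf("  Stream Size: %llu\n", (unsigned long long)entry->streamSize);
        // The remaining entry fields can be printed the same way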
        entry = entry->next;
    }
}

void writeSTW(STWParser *parser, const char *newFilename) {
    FILE *f = fopen(newFilename, "wb");
    if (!f) return;
    fwrite(parser->data, 1, parser->size, f);
    fclose(f);
    printf("File written to %s\n", newFilename);
}

void freeSTW(STWParser *parser) {
    free(parser->data);
    struct DirectoryEntry *entry = parser->directoryEntries;
    while (entry) {
        struct DirectoryEntry *next = entry->next;
        free(entry);
        entry = next;
    }
}

// Example usage
// int main() {
//     STWParser parser = {0};  // Zero-initialize so pointers start as NULL
//     strcpy(parser.filename, "example.stw");
//     readSTW(&parser);
//     printProperties(&parser);
//     writeSTW(&parser, "example.new.stw");
//     freeSTW(&parser);
//     return 0;
// }