Task 659: .SFX File Format

File Format Specifications for .SFX

The .SFX file format refers to self-extracting archives: executable files (typically with a .exe extension) that combine a stub executable module, responsible for extraction when the file is run, with an embedded compressed archive, most commonly in the RAR format. These files are generated by tools such as WinRAR (RAR payload) or 7-Zip (7z payload). The format has no universal standard because the structure depends on the underlying archiver, but RAR-based SFX modules are the most prevalent and are the focus of this document.

The overall structure consists of:

  • SFX Stub: An executable module (variable size, up to approximately 1 MB in modern implementations) containing code to handle extraction, user prompts, and optional custom behaviors (e.g., post-extraction commands). This stub is platform-specific (e.g., Windows PE format). A plain RAR archive is simply the same data without the stub.
  • Embedded Archive: The RAR archive data, beginning with a fixed signature. The RAR format defines the intrinsic file system, representing a hierarchical structure of files and directories with metadata such as names, sizes, timestamps, and attributes.

Specifications are derived from official RAR documentation (RAR 5.0 technical note from rarlab.com) and detailed format analyses (e.g., acritum.com for RAR 3.x compatibility). RAR archives support versions 3.x (legacy, fixed-size headers) and 5.0 (modern, variable-length integers via "vint" encoding). Parsers must search for the RAR signature to locate the archive offset after the stub. The file system is sequential: a main archive header followed by file/service headers, with no explicit root directory but paths encoded in file names using forward slashes.
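The signature search described above can be sketched as follows. This is a minimal illustration, not a reference implementation: the function name `locate_rar` and the version labels are assumptions made for this example. It relies only on the fact that both signatures share the six-byte prefix `Rar!\x1a\x07`. Note that a stub may itself contain these bytes as a constant, so a robust parser would validate the header that follows each candidate match.

```python
RAR_PREFIX = b"Rar!\x1a\x07"  # shared by the 3.x and 5.0 signatures


def locate_rar(data: bytes):
    """Return (offset, version) for the first RAR signature, or None.

    'version' is "3.x" or "5.0" (labels chosen for this sketch).
    """
    start = 0
    while True:
        idx = data.find(RAR_PREFIX, start)
        if idx == -1:
            return None
        tail = data[idx + len(RAR_PREFIX): idx + len(RAR_PREFIX) + 2]
        if tail == b"\x01\x00":      # 8-byte RAR 5.0 signature
            return idx, "5.0"
        if tail[:1] == b"\x00":      # 7-byte RAR 3.x signature
            return idx, "3.x"
        start = idx + 1              # false match inside the stub; keep looking


if __name__ == "__main__":
    fake_sfx = b"MZ" + b"\x90" * 30 + b"Rar!\x1a\x07\x00" + b"payload"
    print(locate_rar(fake_sfx))
```

The returned offset is the "SFX Stub Offset" property listed in the next section.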

1. List of Properties Intrinsic to the .SFX File System

The intrinsic properties pertain to the embedded RAR archive's structure and contents, forming a virtual file system with files, directories, and metadata. Properties are categorized below for clarity. Detection of RAR version (3.x or 5.0) influences field encoding.

| Category | Property | Description | RAR Version Notes |
| --- | --- | --- | --- |
| Archive-Level | SFX Stub Offset | Byte offset where the RAR signature begins (end of stub). | Common to both; searched sequentially. |
| Archive-Level | RAR Signature | Fixed bytes confirming archive start: Rar!\x1A\x07\x00 (3.x) or Rar!\x1A\x07\x01\x00 (5.0). Note the mixed case: "Rar!", not "RAR!". | 7 bytes (3.x) or 8 bytes (5.0). |
| Archive-Level | Archive Header Size | Total size of the main archive header block. | Fixed 2 bytes (3.x); vint (5.0). |
| Archive-Level | Archive Flags | Bitmask including: volume (multi-part), solid (shared compression dictionary), locked (read-only), recovery record present, encrypted headers, first volume. | 2 bytes (3.x); vint (5.0). |
| Archive-Level | Volume Number | Sequential number for multi-volume archives (0 for first). | Absent in 3.x first volume; vint in 5.0 if flagged. |
| Archive-Level | Host OS | Operating system used for archiving (3.x: 0=MS-DOS, 2=Windows, 3=Unix; 5.0: 0=Windows, 1=Unix). | 1 byte (3.x); vint (5.0). Stored in each file header. |
| Archive-Level | Archive Comment | Optional text comment on the archive. | Separate block (3.x); service header (5.0). |
| File/Directory-Level | File Name | Encoded path (no trailing null). | Variable length; UTF-8 with forward slashes in 5.0; supports Unicode extensions in 3.x. |
| File/Directory-Level | Packed Size | Compressed size in bytes (64-bit support via high/low parts). | 4 bytes + optional 4-byte high (3.x); vint (5.0). |
| File/Directory-Level | Unpacked Size | Original uncompressed size in bytes (64-bit). | 4 bytes + optional 4-byte high (3.x); vint (5.0). |
| File/Directory-Level | File CRC32 | 32-bit checksum of unpacked data. | 4 bytes. |
| File/Directory-Level | Modification Time (mtime) | Timestamp in MS-DOS format (3.x) or Unix/Windows (5.0); optional nanoseconds. | 4 bytes (3.x); uint32/uint64 + flags (5.0). |
| File/Directory-Level | Creation/Access Time (ctime/atime) | Optional high-precision timestamps. | Absent in 3.x base; extra area (5.0). |
| File/Directory-Level | File Attributes | OS-specific flags (e.g., read-only, directory, hidden; Unix modes). | 4 bytes (3.x); vint (5.0). |
| File/Directory-Level | Directory Flag | Indicates if entry is a directory (no data). | Bits 5-7 of the header flags, mask 0x00E0 (3.x: 111); vint flag (5.0). |
| File/Directory-Level | Compression Method | Algorithm (e.g., 0x30=store, 0x33=normal; dictionary size encoded). | 1 byte (3.x); vint with bits (5.0). |
| File/Directory-Level | Dictionary Size | Compression window size (e.g., 64 KB to 4 GB). | Encoded in flags bits (3.x); vint bits (5.0). |
| File/Directory-Level | Unpack Version | RAR version required for extraction. | 1 byte (3.x); derived from compression info (5.0). |
| File/Directory-Level | Encryption Flag | Indicates password protection. | Bit in flags (both versions). |
| File/Directory-Level | Salt | Random value for encryption key derivation. | Optional 8 bytes (3.x); 16-byte salt + IV (5.0). |
| File/Directory-Level | Solid Flag | Uses dictionary from prior files. | Bit in flags (both). |
| File/Directory-Level | Extra Fields | Optional: NTFS ACLs, streams, hashes (e.g., BLAKE2), symlinks, owner/group. | Comments/salt in 3.x; extra area records (5.0). |

These properties enable reconstruction of the file system hierarchy, with directories implied by paths and flags.
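As an illustration of that reconstruction, the sketch below builds a nested dictionary from a list of (path, is_dir) pairs such as a parser might collect from the file headers. The function name `build_tree` and the tuple layout are assumptions made for this example; backslashes are normalized to forward slashes in case an archive stores Windows-style separators.

```python
def build_tree(entries):
    """Build a nested dict from (path, is_dir) pairs.

    Directories map to dicts; files map to None. Parent directories
    that never appear as their own entry are created implicitly.
    """
    root = {}
    for path, is_dir in entries:
        parts = [p for p in path.replace("\\", "/").split("/") if p]
        if not parts:
            continue
        node = root
        for part in parts[:-1]:
            child = node.get(part)
            if not isinstance(child, dict):
                child = node[part] = {}   # implied parent directory
            node = child
        leaf = parts[-1]
        if is_dir:
            if not isinstance(node.get(leaf), dict):
                node[leaf] = {}
        else:
            node.setdefault(leaf, None)
    return root


if __name__ == "__main__":
    tree = build_tree([("docs/readme.txt", False), ("docs/img", True), ("a.bin", False)])
    print(tree)
```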

These are minimal test files from a public repository, suitable for verification.

3. Ghost Blog Embedded HTML JavaScript for Drag-and-Drop .SFX Parsing

The following is a self-contained HTML snippet embeddable in a Ghost blog post (e.g., via HTML card). It creates a drag-and-drop zone for .SFX files, parses the embedded RAR archive (supports RAR 3.x for simplicity; extend for 5.0 as needed), and dumps properties to a scrollable output area. Parsing uses FileReader and DataView for binary access.

Drag and drop a .SFX file here to parse its properties.

This code handles basic RAR 3.x parsing; for RAR 5.0, extend parseRAR3 with vint decoding.
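For reference, vint decoding can be sketched as below, written in Python for brevity. It assumes the encoding used by RAR 5.0 headers: each byte carries 7 data bits, least significant group first, and the high bit signals that another byte follows. The helper name `read_vint` is hypothetical.

```python
def read_vint(data: bytes, pos: int):
    """Decode one variable-length integer starting at pos.

    Returns (value, next_pos). Raises ValueError on truncated or
    over-long input (more than 10 bytes for a 64-bit value).
    """
    value = 0
    shift = 0
    while True:
        if pos >= len(data):
            raise ValueError("truncated vint")
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift   # low 7 bits are payload
        if not byte & 0x80:               # high bit clear: last byte
            return value, pos
        shift += 7
        if shift > 63:
            raise ValueError("vint too long")


if __name__ == "__main__":
    print(read_vint(b"\xac\x02", 0))
```

A RAR 5.0 parser would call such a helper repeatedly to read the header size, header type, and flags in sequence, advancing `pos` each time.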

4. Python Class for .SFX Parsing

The following Python class uses the built-in struct module for binary parsing. It supports reading and decoding; the write method is a placeholder that copies the input unchanged, because modifying the archive would require full RAR writing and compression logic. Run with python SfxParser.py input.sfx to print properties to the console. It assumes RAR 3.x fixed-size headers.

import struct
import sys
import os

class SfxParser:
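    # RAR 3.x marker block: "Rar!" 0x1A 0x07 0x00 (mixed case, not "RAR!")
    RAR_SIGNATURE_3X = b'Rar!\x1a\x07\x00'
    
    @staticmethod
    def find_signature(data):
        return data.find(SfxParser.RAR_SIGNATURE_3X)
    
    @staticmethod
    def parse_rar3(data, offset):
        pos = offset + len(SfxParser.RAR_SIGNATURE_3X)
        props = []
        
        # Every block starts with HEAD_CRC(2), HEAD_TYPE(1), HEAD_FLAGS(2), HEAD_SIZE(2)
        if pos + 7 > len(data):
            props.append('Truncated archive header.')
            return props
        head_crc, head_type, head_flags, head_size = struct.unpack_from('<HBHH', data, pos)
        if head_type != 0x73 or head_size < 7:
            props.append('Invalid archive header.')
            return props
        props.append(f'Archive Flags: 0x{head_flags:04x}')
        props.append(f'Archive Header Size: {head_size} bytes')
        pos += head_size  # Skips the reserved fields and anything else in the header
        
        # Files
        props.append('\nFile Properties:')
        while pos + 7 <= len(data):
            head_crc, head_type, head_flags, head_size = struct.unpack_from('<HBHH', data, pos)
            # Stop on a corrupt size, the end-of-archive block, or a truncated header
            if head_size < 7 or head_type == 0x7B or pos + head_size > len(data):
                break
            data_size = 0
            if head_type == 0x74:  # File
                large = bool(head_flags & 0x0100)  # 64-bit size fields present
                if head_size < (40 if large else 32):
                    break
                # Fixed part: sizes, host OS, CRC, time, version, method, name size, attributes
                (pack_size, unp_size, host_os, file_crc, ftime,
                 unp_ver, method, name_size, attr) = struct.unpack_from('<IIBIIBBHI', data, pos + 7)
                name_pos = pos + 32
                if large:
                    high_pack, high_unp = struct.unpack_from('<II', data, name_pos)
                    pack_size |= high_pack << 32
                    unp_size |= high_unp << 32
                    name_pos += 8
                if name_pos + name_size > pos + head_size:
                    break
                # The name follows the attributes field
                name = data[name_pos:name_pos + name_size].decode('utf-8', errors='replace')
                # Directory entries have bits 5-7 of the flags all set
                is_dir = (head_flags & 0x00E0) == 0x00E0
                props.extend([
                    f'- Name: {name}',
                    f'  Packed Size: {pack_size}',
                    f'  Unpacked Size: {unp_size}',
                    f'  CRC32: 0x{file_crc:08x}',
                    f'  Host OS: {host_os}',
                    f'  mtime (DOS): {ftime}',
                    f'  Attributes: 0x{attr:08x}',
                    f'  Directory: {is_dir}',
                    f'  Method: {method}',
                    f'  Unpack Ver: {unp_ver}'
                ])
                data_size = pack_size  # Packed data follows the header
            elif head_flags & 0x8000 and head_size >= 11:
                # Other blocks with this flag carry a 4-byte ADD_SIZE after the common header
                data_size = struct.unpack_from('<I', data, pos + 7)[0]
            # Jump from the block start so salt and extended time fields are skipped too
            pos += head_size + data_size
        return props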
    
    @classmethod
    def parse(cls, filename):
        with open(filename, 'rb') as f:
            data = f.read()
        sig_offset = cls.find_signature(data)
        if sig_offset == -1:
            print('No RAR signature found.')
            return
        print(f'SFX Stub Offset: {sig_offset}')
        props = cls.parse_rar3(data, sig_offset)
        for p in props:
            print(p)
    
    @classmethod
    def write(cls, input_sfx, output_sfx, new_comment=None):
        # Basic write: copy input, modify comment if provided (simplified; requires full RAR writer for changes)
        with open(input_sfx, 'rb') as f:
            data = bytearray(f.read())
        if new_comment:
            # Locate and update comment block (placeholder; implement full search/update)
            pass
        with open(output_sfx, 'wb') as f:
            f.write(data)
        print(f'Wrote modified SFX to {output_sfx}')

if __name__ == '__main__':
    if len(sys.argv) < 2:
        print('Usage: python SfxParser.py <sfx_file>')
    else:
        SfxParser.parse(sys.argv[1])
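
The parser above prints the RAR 3.x modification time as a raw 32-bit integer. A minimal sketch of decoding it, assuming the standard MS-DOS packed date/time layout (date in the high 16 bits, time in the low 16 bits, seconds stored with 2-second resolution), is shown below. The function name `dos_time_to_datetime` is hypothetical.

```python
from datetime import datetime


def dos_time_to_datetime(ftime: int):
    """Convert a packed MS-DOS date/time to a naive datetime.

    Returns None if the fields do not form a valid date (for example
    a zeroed timestamp, which has month 0).
    """
    date = (ftime >> 16) & 0xFFFF
    time = ftime & 0xFFFF
    year = 1980 + (date >> 9)          # 7 bits, years since 1980
    month = (date >> 5) & 0x0F         # 4 bits
    day = date & 0x1F                  # 5 bits
    hour = time >> 11                  # 5 bits
    minute = (time >> 5) & 0x3F        # 6 bits
    second = (time & 0x1F) * 2         # 5 bits, 2-second units
    try:
        return datetime(year, month, day, hour, minute, second)
    except ValueError:
        return None


if __name__ == "__main__":
    packed = ((2021 - 1980) << 25) | (3 << 21) | (15 << 16) | (12 << 11) | (30 << 5) | (10 // 2)
    print(dos_time_to_datetime(packed))
```

The result carries no time zone; DOS timestamps are conventionally local time on the machine that created the archive.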

5. Java Class for .SFX Parsing

This Java class uses java.nio for binary I/O. Compile with javac SfxParser.java and run java SfxParser input.sfx to print properties. Supports read/decode; write reconstructs by copying (extend for modifications). RAR 3.x focused.

import java.io.*;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

public class SfxParser {
    private static final byte[] RAR_SIGNATURE_3X = {0x52, 0x61, 0x72, 0x21, 0x1A, (byte)0x07, 0x00};
    
    private static int findSignature(ByteBuffer buffer) {
        byte[] data = new byte[buffer.remaining()];
        buffer.get(data);
        for (int i = 0; i <= data.length - 7; i++) {
            boolean match = true;
            for (int j = 0; j < 7; j++) {
                if (data[i + j] != RAR_SIGNATURE_3X[j]) {
                    match = false;
                    break;
                }
            }
            if (match) return i;
        }
        return -1;
    }
    
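    private static void parseRar3(ByteBuffer buffer, int offset, PrintStream out) {
        // RAR stores multi-byte fields little-endian; ByteBuffer defaults to big-endian
        buffer.order(java.nio.ByteOrder.LITTLE_ENDIAN);
        int limit = buffer.limit();
        int pos = offset + 7; // First block after the 7-byte marker
        if (pos + 7 > limit) {
            out.println("Truncated archive header.");
            return;
        }
        // Every block starts with HEAD_CRC(2), HEAD_TYPE(1), HEAD_FLAGS(2), HEAD_SIZE(2)
        int headType = buffer.get(pos + 2) & 0xFF;
        int headFlags = buffer.getShort(pos + 3) & 0xFFFF;
        int headSize = buffer.getShort(pos + 5) & 0xFFFF;
        if (headType != 0x73 || headSize < 7) {
            out.println("Invalid archive header.");
            return;
        }
        out.println("Archive Flags: 0x" + Integer.toHexString(headFlags).toUpperCase());
        out.println("Archive Header Size: " + headSize + " bytes");
        pos += headSize; // Skips the reserved fields as well
        
        out.println("\nFile Properties:");
        while (pos + 7 <= limit) {
            int type = buffer.get(pos + 2) & 0xFF;
            int flags = buffer.getShort(pos + 3) & 0xFFFF;
            int size = buffer.getShort(pos + 5) & 0xFFFF;
            // Stop on a corrupt size, the end-of-archive block, or a truncated header
            if (size < 7 || type == 0x7B || pos + size > limit) break;
            long dataSize = 0;
            if (type == 0x74) { // File
                boolean large = (flags & 0x0100) != 0; // 64-bit size fields present
                if (size < (large ? 40 : 32)) break;
                long packSize = buffer.getInt(pos + 7) & 0xFFFFFFFFL;
                long unpSize = buffer.getInt(pos + 11) & 0xFFFFFFFFL;
                int hostOS = buffer.get(pos + 15) & 0xFF;
                int fileCrc = buffer.getInt(pos + 16);
                int ftime = buffer.getInt(pos + 20);
                int unpVer = buffer.get(pos + 24) & 0xFF;
                int method = buffer.get(pos + 25) & 0xFF;
                int nameSize = buffer.getShort(pos + 26) & 0xFFFF;
                int attr = buffer.getInt(pos + 28);
                int namePos = pos + 32;
                if (large) {
                    packSize |= (buffer.getInt(namePos) & 0xFFFFFFFFL) << 32;
                    unpSize |= (buffer.getInt(namePos + 4) & 0xFFFFFFFFL) << 32;
                    namePos += 8;
                }
                if (namePos + nameSize > pos + size) break;
                // The name follows the attributes field
                byte[] nameBytes = new byte[nameSize];
                buffer.position(namePos);
                buffer.get(nameBytes);
                String name = new String(nameBytes, java.nio.charset.StandardCharsets.UTF_8);
                // Directory entries have bits 5-7 of the flags all set
                boolean isDir = (flags & 0x00E0) == 0x00E0;
                out.println("- Name: " + name);
                out.println("  Packed Size: " + packSize);
                out.println("  Unpacked Size: " + unpSize);
                out.println("  CRC32: 0x" + Integer.toHexString(fileCrc).toUpperCase());
                out.println("  Host OS: " + hostOS);
                out.println("  mtime (DOS): " + (ftime & 0xFFFFFFFFL));
                out.println("  Attributes: 0x" + Integer.toHexString(attr).toUpperCase());
                out.println("  Directory: " + isDir);
                out.println("  Method: " + method);
                out.println("  Unpack Ver: " + unpVer);
                dataSize = packSize; // Packed data follows the header
            } else if ((flags & 0x8000) != 0 && size >= 11) {
                // Other blocks with this flag carry a 4-byte ADD_SIZE after the common header
                dataSize = buffer.getInt(pos + 7) & 0xFFFFFFFFL;
            }
            // Jump from the block start so salt and extended time fields are skipped too
            long next = (long) pos + size + dataSize;
            if (next > limit) break;
            pos = (int) next;
        }
    }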
    
    public static void parse(String filename) throws IOException {
        FileChannel channel = FileChannel.open(Paths.get(filename), StandardOpenOption.READ);
        ByteBuffer buffer = ByteBuffer.allocate((int) channel.size());
        channel.read(buffer);
        buffer.flip();
        int sigOffset = findSignature(buffer);
        if (sigOffset == -1) {
            System.out.println("No RAR signature found.");
            return;
        }
        System.out.println("SFX Stub Offset: " + sigOffset);
        parseRar3(buffer, sigOffset, System.out);
    }
    
    public static void write(String inputSfx, String outputSfx) throws IOException {
        // Basic copy for write (extend for modifications)
        Files.copy(Paths.get(inputSfx), Paths.get(outputSfx), StandardCopyOption.REPLACE_EXISTING);
        System.out.println("Wrote SFX to " + outputSfx);
    }
    
    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.out.println("Usage: java SfxParser <sfx_file>");
            return;
        }
        parse(args[0]);
    }
}

6. JavaScript Class for .SFX Parsing

This Node.js class uses fs for file I/O. Run with node sfxParser.js input.sfx to print properties. Supports read/decode; write copies file. RAR 3.x focused. For browser, adapt to File API.

const fs = require('fs');

class SfxParser {
  static RAR_SIGNATURE_3X = Buffer.from([0x52, 0x61, 0x72, 0x21, 0x1A, 0x07, 0x00]);
  
  static findSignature(data) {
    return data.indexOf(this.RAR_SIGNATURE_3X);
  }
  
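  static parseRar3(data, offset) {
    let pos = offset + 7; // First block after the 7-byte marker
    const props = [];
    
    // Every block starts with HEAD_CRC(2), HEAD_TYPE(1), HEAD_FLAGS(2), HEAD_SIZE(2)
    if (pos + 7 > data.length) {
      props.push('Truncated archive header.');
      return props;
    }
    const headType = data[pos + 2];
    const headFlags = data.readUInt16LE(pos + 3);
    const headSize = data.readUInt16LE(pos + 5);
    if (headType !== 0x73 || headSize < 7) {
      props.push('Invalid archive header.');
      return props;
    }
    props.push(`Archive Flags: 0x${headFlags.toString(16).padStart(4, '0')}`);
    props.push(`Archive Header Size: ${headSize} bytes`);
    pos += headSize; // Skips the reserved fields as well
    
    props.push('\nFile Properties:');
    while (pos + 7 <= data.length) {
      const type = data[pos + 2];
      const flags = data.readUInt16LE(pos + 3);
      const size = data.readUInt16LE(pos + 5);
      // Stop on a corrupt size, the end-of-archive block, or a truncated header
      if (size < 7 || type === 0x7b || pos + size > data.length) break;
      let dataSize = 0;
      if (type === 0x74) {
        const large = (flags & 0x0100) !== 0; // 64-bit size fields present
        if (size < (large ? 40 : 32)) break;
        let packSize = data.readUInt32LE(pos + 7);
        let unpSize = data.readUInt32LE(pos + 11);
        const hostOS = data[pos + 15];
        const fileCrc = data.readUInt32LE(pos + 16);
        const ftime = data.readUInt32LE(pos + 20);
        const unpVer = data[pos + 24];
        const method = data[pos + 25];
        const nameSize = data.readUInt16LE(pos + 26);
        const attr = data.readUInt32LE(pos + 28);
        let namePos = pos + 32;
        if (large) {
          packSize += data.readUInt32LE(namePos) * 2 ** 32;
          unpSize += data.readUInt32LE(namePos + 4) * 2 ** 32;
          namePos += 8;
        }
        if (namePos + nameSize > pos + size) break;
        // The name follows the attributes field
        const name = data.toString('utf8', namePos, namePos + nameSize);
        // Directory entries have bits 5-7 of the flags all set
        const isDir = (flags & 0x00e0) === 0x00e0;
        props.push(`- Name: ${name}`);
        props.push(`  Packed Size: ${packSize}`);
        props.push(`  Unpacked Size: ${unpSize}`);
        props.push(`  CRC32: 0x${fileCrc.toString(16).padStart(8, '0')}`);
        props.push(`  Host OS: ${hostOS}`);
        props.push(`  mtime (DOS): ${ftime}`);
        props.push(`  Attributes: 0x${attr.toString(16).padStart(8, '0')}`);
        props.push(`  Directory: ${isDir}`);
        props.push(`  Method: ${method}`);
        props.push(`  Unpack Ver: ${unpVer}`);
        dataSize = packSize; // Packed data follows the header
      } else if ((flags & 0x8000) !== 0 && size >= 11) {
        // Other blocks with this flag carry a 4-byte ADD_SIZE after the common header
        dataSize = data.readUInt32LE(pos + 7);
      }
      // Jump from the block start so salt and extended time fields are skipped too
      pos += size + dataSize;
    }
    return props;
  }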
  
  static parse(filename) {
    const data = fs.readFileSync(filename);
    const sigOffset = this.findSignature(data);
    if (sigOffset === -1) {
      console.log('No RAR signature found.');
      return;
    }
    console.log(`SFX Stub Offset: ${sigOffset}`);
    const props = this.parseRar3(data, sigOffset);
    props.forEach(p => console.log(p));
  }
  
  static write(inputSfx, outputSfx) {
    fs.copyFileSync(inputSfx, outputSfx);
    console.log(`Wrote SFX to ${outputSfx}`);
  }
}

if (require.main === module) {
  if (process.argv.length < 3) {
    console.log('Usage: node sfxParser.js <sfx_file>');
  } else {
    SfxParser.parse(process.argv[2]);
  }
}

module.exports = SfxParser;

7. C Implementation for .SFX Parsing

This C implementation uses standard I/O and manual binary reading. Compile with gcc -o sfx_parser sfx_parser.c and run ./sfx_parser input.sfx. Supports read/decode; write copies via cp-like logic (uses fread/fwrite). RAR 3.x focused.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>

#define RAR_SIGNATURE_3X_LEN 7
static const uint8_t RAR_SIGNATURE_3X[RAR_SIGNATURE_3X_LEN] = {0x52, 0x61, 0x72, 0x21, 0x1A, 0x07, 0x00};

typedef struct {
    uint8_t* data;
    size_t size;
    size_t pos;
} Buffer;

/* Read little-endian integers byte by byte (portable; avoids unaligned pointer casts). */
static uint16_t rd16(const uint8_t* p) {
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const uint8_t* p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

static int find_signature(Buffer* buf) {
    if (buf->size < RAR_SIGNATURE_3X_LEN) return -1; // Avoid size_t underflow below
    for (size_t i = 0; i <= buf->size - RAR_SIGNATURE_3X_LEN; i++) {
        if (memcmp(buf->data + i, RAR_SIGNATURE_3X, RAR_SIGNATURE_3X_LEN) == 0) return (int)i;
    }
    return -1;
}

static void parse_rar3(Buffer* buf, size_t offset, FILE* out) {
    const uint8_t* d = buf->data;
    size_t pos = offset + RAR_SIGNATURE_3X_LEN; // First block after the marker
    if (pos + 7 > buf->size) {
        fprintf(out, "Truncated archive header.\n");
        return;
    }
    // Every block starts with HEAD_CRC(2), HEAD_TYPE(1), HEAD_FLAGS(2), HEAD_SIZE(2)
    uint8_t head_type = d[pos + 2];
    uint16_t head_flags = rd16(d + pos + 3);
    uint16_t head_size = rd16(d + pos + 5);
    if (head_type != 0x73 || head_size < 7) {
        fprintf(out, "Invalid archive header.\n");
        return;
    }
    fprintf(out, "Archive Flags: 0x%04x\n", (unsigned)head_flags);
    fprintf(out, "Archive Header Size: %u bytes\n", (unsigned)head_size);
    pos += head_size; // Skips the reserved fields as well
    
    fprintf(out, "\nFile Properties:\n");
    while (pos + 7 <= buf->size) {
        uint8_t type = d[pos + 2];
        uint16_t flags = rd16(d + pos + 3);
        uint16_t size = rd16(d + pos + 5);
        // Stop on a corrupt size, the end-of-archive block, or a truncated header
        if (size < 7 || type == 0x7B || pos + size > buf->size) break;
        uint64_t data_size = 0;
        if (type == 0x74) {
            int large = (flags & 0x0100) != 0; // 64-bit size fields present
            if (size < (large ? 40 : 32)) break;
            uint64_t pack_size = rd32(d + pos + 7);
            uint64_t unp_size = rd32(d + pos + 11);
            uint8_t host_os = d[pos + 15];
            uint32_t file_crc = rd32(d + pos + 16);
            uint32_t ftime = rd32(d + pos + 20);
            uint8_t unp_ver = d[pos + 24];
            uint8_t method = d[pos + 25];
            uint16_t name_size = rd16(d + pos + 26);
            uint32_t attr = rd32(d + pos + 28);
            size_t name_pos = pos + 32;
            if (large) {
                pack_size |= (uint64_t)rd32(d + name_pos) << 32;
                unp_size |= (uint64_t)rd32(d + name_pos + 4) << 32;
                name_pos += 8;
            }
            if (name_pos + name_size > pos + size) break;
            // The name follows the attributes field
            char name[1024] = {0};
            memcpy(name, d + name_pos, name_size > 1023 ? 1023 : name_size);
            // Directory entries have bits 5-7 of the flags all set
            int is_dir = (flags & 0x00E0) == 0x00E0;
            fprintf(out, "- Name: %s\n", name);
            fprintf(out, "  Packed Size: %llu\n", (unsigned long long)pack_size);
            fprintf(out, "  Unpacked Size: %llu\n", (unsigned long long)unp_size);
            fprintf(out, "  CRC32: 0x%08x\n", (unsigned)file_crc);
            fprintf(out, "  Host OS: %u\n", (unsigned)host_os);
            fprintf(out, "  mtime (DOS): %u\n", (unsigned)ftime);
            fprintf(out, "  Attributes: 0x%08x\n", (unsigned)attr);
            fprintf(out, "  Directory: %s\n", is_dir ? "true" : "false");
            fprintf(out, "  Method: %u\n", (unsigned)method);
            fprintf(out, "  Unpack Ver: %u\n", (unsigned)unp_ver);
            data_size = pack_size; // Packed data follows the header
        } else if ((flags & 0x8000) && size >= 11) {
            // Other blocks with this flag carry a 4-byte ADD_SIZE after the common header
            data_size = rd32(d + pos + 7);
        }
        // Jump from the block start so salt and extended time fields are skipped too
        if (data_size > buf->size - pos - size) break;
        pos += size + (size_t)data_size;
    }
    buf->pos = pos;
}

static void parse_file(const char* filename) {
    FILE* fp = fopen(filename, "rb");
    if (!fp) {
        perror("Error opening file");
        return;
    }
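    fseek(fp, 0, SEEK_END);
    long fsize = ftell(fp);
    fseek(fp, 0, SEEK_SET);
    if (fsize <= 0) {
        fprintf(stderr, "Empty or unreadable file.\n");
        fclose(fp);
        return;
    }
    size_t size = (size_t)fsize;
    uint8_t* data = malloc(size);
    if (!data || fread(data, 1, size, fp) != size) {
        fprintf(stderr, "Error reading file.\n");
        free(data);
        fclose(fp);
        return;
    }
    fclose(fp);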
    
    Buffer buf = {data, size, 0};
    int sig_offset = find_signature(&buf);
    if (sig_offset == -1) {
        printf("No RAR signature found.\n");
        free(data);
        return;
    }
    printf("SFX Stub Offset: %d\n", sig_offset);
    parse_rar3(&buf, sig_offset, stdout);
    free(data);
}

static void write_file(const char* input, const char* output) {
    FILE* in = fopen(input, "rb");
    FILE* out = fopen(output, "wb");
    if (!in || !out) {
        perror("Error in write");
        if (in) fclose(in);
        if (out) fclose(out);
        return;
    }
    uint8_t buf[4096];
    size_t n;
    while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
        fwrite(buf, 1, n, out);
    }
    fclose(in);
    fclose(out);
    printf("Wrote SFX to %s\n", output);
}

int main(int argc, char** argv) {
    if (argc < 2) {
        printf("Usage: %s <sfx_file>\n", argv[0]);
        return 1;
    }
    parse_file(argv[1]);
    // For write: write_file(argv[1], "output.sfx");
    return 0;
}

These implementations provide core functionality for decoding and basic writing. For production use, consider edge cases like multi-volume archives or RAR 5.0 vint parsing.
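
As one step toward handling multi-volume archives, the raw "Archive Flags" value printed by the parsers can be decoded into names. The sketch below assumes the RAR 3.x main-header bit assignments as found in the publicly available unrar source (MHD_VOLUME, MHD_SOLID, and so on); the dictionary, the labels, and the function name `decode_archive_flags` are illustrative choices, and the values should be verified against the format notes before being relied upon.

```python
# Assumed RAR 3.x main archive header flag bits (verify against the format notes)
RAR3_ARCHIVE_FLAGS = {
    0x0001: "volume",
    0x0002: "comment present",
    0x0004: "locked",
    0x0008: "solid",
    0x0010: "new volume naming",
    0x0020: "authenticity info",
    0x0040: "recovery record",
    0x0080: "encrypted headers",
    0x0100: "first volume",
}


def decode_archive_flags(flags: int):
    """Return the names of all known flag bits set in 'flags'."""
    return [name for bit, name in RAR3_ARCHIVE_FLAGS.items() if flags & bit]


if __name__ == "__main__":
    # 0x0109 = volume + solid + first volume
    print(decode_archive_flags(0x0109))
```

A parser that sees "volume" without "first volume" would know that earlier parts of the archive are needed before extraction can start.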