Task 662: .SHTM File Format

  1. List of Properties Intrinsic to the .SHTM File Format

The .SHTM file format is a variant of HTML designed to support Server-Side Includes (SSI), a server-side scripting mechanism for generating dynamic content. It is fundamentally a plain-text file with specific structural and syntactic characteristics. Based on the SSI implementations in NCSA HTTPd and Apache HTTP Server (mod_include), its intrinsic properties are as follows:

  • File Extension: .shtm (alternative to .shtml or .stm for SSI-enabled files).
  • MIME Type: text/html.
  • Encoding: Plain text, typically UTF-8 or ISO-8859-1; the file contains no binary data.
  • Structure: Hierarchical HTML document with SSI directives embedded in HTML comments; follows standard HTML syntax (e.g., DOCTYPE declaration, <html>, <head>, and <body> elements) augmented by SSI elements.
  • Special Feature: Embeddable SSI directives for server-side processing, enabling dynamic inclusion of content, execution of commands, variable echoing, and conditional logic.
  • SSI Directive Syntax: Directives are enclosed in HTML comments as <!--#directive parameter="value" -->. Key rules include: no space between <!-- and # (<!--# is a single token), whitespace before the closing --> (recommended in Apache so that it is not parsed as part of an SSI token), case-insensitive directive names, quoted parameter values, and expression support in the if and elif directives.
  • Supported Directives:
      • include: Transcludes content from another file or virtual path (parameters: file or virtual).
      • exec: Executes a CGI script or shell command (parameters: cgi or cmd).
      • echo: Outputs the value of a variable, such as an environment variable or one assigned with set (parameter: var).
      • config: Sets the format of dates, file sizes, and error messages (parameters: timefmt, sizefmt, errmsg).
      • flastmod: Displays the last modification date of a file (parameters: file or virtual).
      • fsize: Displays the size of a file (parameters: file or virtual).
      • if/elif/else/endif: Conditional control flow (parameter: expr for if and elif).
      • set: Assigns a value to an SSI variable (parameters: var, value).
      • printenv: Lists all SSI variables and their values (no parameters).
  • Control Flow: Supports conditional branching (if-elif-else-endif) but has no native loops; repetition can only be emulated through recursive includes, which is the usual basis for describing SSI as Turing-complete.
  • Server Dependency: Requires SSI-enabled web servers (e.g., Apache with mod_include) for processing; files are parsed sequentially from top to bottom.
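  • Security Constraints: include file accepts only relative paths (no absolute paths and no ../ traversal; include virtual takes a URL path instead); exec runs with the web server's permissions and can be disabled entirely (e.g., Apache's Options IncludesNOEXEC).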
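
The quoting rules in the directive syntax above can be illustrated with a small parser for the parameter part of a directive (the text between the directive name and -->). This is a sketch, not code from any server: parse_attributes and ATTR_RE are hypothetical names, and accepting single quotes and backticks as well as double quotes follows Apache's mod_include documentation, so treat it as an assumption for other SSI servers.

```python
import re

# One attr=value pair. Assumption: the value may be wrapped in double quotes,
# single quotes, or backticks (Apache mod_include style).
ATTR_RE = re.compile(r"""(\w+)\s*=\s*(?:"([^"]*)"|'([^']*)'|`([^`]*)`)""")


def parse_attributes(params):
    """Split the parameter text of one SSI directive into (name, value) pairs."""
    pairs = []
    for m in ATTR_RE.finditer(params):
        # Exactly one of the three quoted-value groups matched; take that one.
        value = next(v for v in m.group(2, 3, 4) if v is not None)
        # Attribute names are normalized to lowercase (a design choice here).
        pairs.append((m.group(1).lower(), value))
    return pairs


if __name__ == "__main__":
    print(parse_attributes(' timefmt="%Y-%m-%d" '))
    print(parse_attributes(" var='greeting' value=`hello world` "))
```

A list of pairs is returned instead of a dict so that repeated attributes in a single directive keep their order.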

These properties define the format's core behavior and distinguish it from plain HTML.
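
The include path restriction listed under Security Constraints can be checked before a document is processed. The sketch below is a hypothetical pre-flight check (is_allowed_include_file is not part of any server API): it rejects empty and absolute paths and any ".." component, and, as an extra assumption, also rejects backslashes so Windows-style paths are not accepted. The server still enforces its own rules; this only flags problems early.

```python
from pathlib import PurePosixPath


def is_allowed_include_file(path):
    """Hypothetical check for the value of <!--#include file="..." -->.

    Accepts only relative paths without '..' components, matching the
    restriction described for include file.
    """
    # Assumption: backslashes are rejected to avoid Windows-style paths.
    if not path or path.startswith("/") or "\\" in path:
        return False
    # PurePosixPath splits on '/', so 'a/../b' yields the part '..'.
    return ".." not in PurePosixPath(path).parts


if __name__ == "__main__":
    for candidate in ["footer.html", "inc/nav.html", "../secret.html", "/etc/passwd"]:
        print(candidate, is_allowed_include_file(candidate))
```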

  2. Two Direct Download Links for .SHTM Files

The following are direct URLs to publicly accessible .SHTM files. To download, right-click the link and select "Save Link As" (or equivalent), ensuring the file is saved with the .shtm extension:

  3. Ghost Blog Embedded HTML JavaScript for Drag-and-Drop .SHTM File Analysis

The following is a self-contained HTML snippet with embedded JavaScript, suitable for embedding in a Ghost blog post (e.g., via the HTML card). It creates a drag-and-drop zone for .SHTM files, reads the file content, extracts the intrinsic properties (file metadata, a heuristic encoding guess, and the SSI directives), and displays them in a formatted output area. FileReader decodes the text as UTF-8; a simple heuristic then checks for a byte-order mark (BOM) or invalid characters.
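
The encoding heuristic described above (a BOM check followed by a test for invalid characters) can be sketched in Python as follows. The function name, the returned labels, and the ISO-8859-1 fallback are assumptions chosen to match the encodings listed among the intrinsic properties; ISO-8859-1 decodes any byte sequence, so that result is only a guess.

```python
import codecs

# BOMs checked in order. UTF-32LE must precede UTF-16LE because its BOM
# (FF FE 00 00) starts with the UTF-16LE BOM (FF FE).
_BOMS = [
    (codecs.BOM_UTF8, "UTF-8 (BOM)"),
    (codecs.BOM_UTF32_LE, "UTF-32LE"),
    (codecs.BOM_UTF32_BE, "UTF-32BE"),
    (codecs.BOM_UTF16_LE, "UTF-16LE"),
    (codecs.BOM_UTF16_BE, "UTF-16BE"),
]


def guess_encoding(data: bytes) -> str:
    """Heuristic encoding label for raw file bytes (hypothetical helper)."""
    for bom, label in _BOMS:
        if data.startswith(bom):
            return label
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Invalid UTF-8: fall back to ISO-8859-1 (assumption, not detection).
        return "ISO-8859-1 (guessed)"
    # Valid UTF-8; pure 7-bit content is also valid ASCII.
    return "UTF-8" if any(b >= 0x80 for b in data) else "ASCII (UTF-8 compatible)"


if __name__ == "__main__":
    print(guess_encoding("café".encode("utf-8")))
    print(guess_encoding("café".encode("latin-1")))
```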

  4. Python Class for .SHTM File Handling

The following Python class uses only the standard library to open, read, parse, and write .SHTM files. It extracts the intrinsic properties and prints them to the console. It decodes the file as UTF-8 text, extracts SSI directives with a regular expression, and its write method saves the content back to disk.

import re
import datetime
from pathlib import Path

class SHTMHandler:
    def __init__(self, file_path):
        self.file_path = Path(file_path)
        if self.file_path.suffix.lower() != '.shtm':
            raise ValueError("File must have .shtm extension.")
        self.content = None

    def read(self):
        """Read and decode the file content."""
        with open(self.file_path, 'r', encoding='utf-8') as f:
            self.content = f.read()
        return self.content

    def extract_properties(self):
        """Extract intrinsic properties."""
        if self.content is None:
            self.read()
        # Lazy match up to "-->" so hyphens inside values (dates, file names) don't end the match
        ssi_regex = re.compile(r'<!--#(\w+)(.*?)-->', re.DOTALL)
        directives = [(m.group(1), m.group(2).strip()) for m in ssi_regex.finditer(self.content)]
        stat = self.file_path.stat()
        has_html = bool(re.search(r'<!DOCTYPE|<html', self.content, re.IGNORECASE))
        properties = {
            'File Extension': '.shtm',
            'MIME Type': 'text/html',
            'File Size (bytes)': stat.st_size,
            'Last Modified': datetime.datetime.fromtimestamp(stat.st_mtime).isoformat(),
            'Encoding': 'UTF-8',
            'SSI Directives Found': len(directives),
            'Directives': directives,
            'Contains HTML Structure': has_html,
            'Server Dependency': 'Requires SSI-enabled server'
        }
        return properties

    def print_properties(self):
        """Print all properties to console."""
        props = self.extract_properties()
        for key, value in props.items():
            print(f"{key}: {value}")

    def write(self, output_path=None):
        """Write the content back to file, reading it first if needed."""
        if self.content is None:
            self.read()
        if output_path is None:
            output_path = self.file_path
        with open(output_path, 'w', encoding='utf-8') as f:
            f.write(self.content)

# Example usage:
# handler = SHTMHandler('example.shtm')
# handler.read()
# handler.print_properties()
# handler.write('output.shtm')
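
A parameter pattern built on [^-]* stops at the first hyphen, so directives whose values contain dates (timefmt="%Y-%m-%d") or hyphenated file names are silently skipped. The comparison below is a sketch with hypothetical names (SSI_RE, NAIVE_RE, extract_directives); it shows that a lazy match up to the first --> keeps those directives.

```python
import re

# Lazy match up to the first "-->"; DOTALL lets a directive span lines.
SSI_RE = re.compile(r'<!--#(\w+)(.*?)-->', re.DOTALL)
# Naive variant for comparison: the parameter part may not contain '-'.
NAIVE_RE = re.compile(r'<!--#(\w+)(\s+[^-]*?)-->')


def extract_directives(pattern, text):
    """Return (name, params) pairs matched by pattern in text."""
    return [(m.group(1), m.group(2).strip()) for m in pattern.finditer(text)]


SAMPLE = (
    '<!--#config timefmt="%Y-%m-%d" -->\n'
    '<!--#include virtual="/inc/site-header.html" -->\n'
    '<!--#echo var="LAST_MODIFIED" -->\n'
)

if __name__ == "__main__":
    print(extract_directives(NAIVE_RE, SAMPLE))  # only the echo directive survives
    print(extract_directives(SSI_RE, SAMPLE))    # all three directives
```

The lazy pattern still ends a directive at the first -->, so a value that itself contains --> would be cut short; the SSI syntax rules make that case unusual.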

  5. Java Class for .SHTM File Handling

This Java class uses standard I/O and regex to handle .SHTM files. It reads the file, extracts properties, prints them to the console, and supports writing the content. It requires Java 11 or later (for Files.readString and Files.writeString). Compile and run with javac SHTMHandler.java and java SHTMHandler <file_path>.

import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.nio.file.attribute.BasicFileAttributes;
import java.time.Instant;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SHTMHandler {
    private Path filePath;
    private String content;

    public SHTMHandler(String filePathStr) {
        this.filePath = Paths.get(filePathStr);
        if (!this.filePath.toString().toLowerCase().endsWith(".shtm")) {
            throw new IllegalArgumentException("File must have .shtm extension.");
        }
    }

    public void read() throws IOException {
        this.content = Files.readString(this.filePath, StandardCharsets.UTF_8);
    }

    public void extractAndPrintProperties() throws IOException {
        if (this.content == null) {
            read();
        }
        // Lazy match up to "-->" so hyphens inside values (dates, file names) don't end the match
        Pattern ssiPattern = Pattern.compile("<!--#(\\w+)(.*?)-->", Pattern.DOTALL);
        Matcher matcher = ssiPattern.matcher(this.content);
        List<String[]> directives = new ArrayList<>();
        while (matcher.find()) {
            directives.add(new String[]{matcher.group(1), matcher.group(2) != null ? matcher.group(2).trim() : ""});
        }
        BasicFileAttributes attrs = Files.readAttributes(this.filePath, BasicFileAttributes.class);
        boolean hasHtml = this.content.toLowerCase().contains("<!doctype") || this.content.toLowerCase().contains("<html");
        System.out.println("File Extension: .shtm");
        System.out.println("MIME Type: text/html");
        System.out.println("File Size (bytes): " + attrs.size());
        System.out.println("Last Modified: " + DateTimeFormatter.ISO_INSTANT.format(Instant.ofEpochMilli(attrs.lastModifiedTime().toMillis())));
        System.out.println("Encoding: UTF-8");
        System.out.println("SSI Directives Found: " + directives.size());
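        // Print each directive explicitly; printing a List<String[]> would show array hash codes
        System.out.println("Directives:");
        for (String[] d : directives) {
            System.out.println("  " + d[0] + " (params: " + d[1] + ")");
        }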
        System.out.println("Contains HTML Structure: " + hasHtml);
        System.out.println("Server Dependency: Requires SSI-enabled server");
    }

    public void write(String outputPathStr) throws IOException {
        if (this.content == null) {
            read();
        }
        Path outputPath = outputPathStr != null ? Paths.get(outputPathStr) : this.filePath;
        Files.writeString(outputPath, this.content, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("Usage: java SHTMHandler <file.shtm>");
            return;
        }
        SHTMHandler handler = new SHTMHandler(args[0]);
        handler.extractAndPrintProperties();
        // handler.write("output.shtm");
    }
}

  6. JavaScript Class for .SHTM File Handling (Node.js)

This Node.js class uses the fs module to read/write files, parses content with regex, and prints properties to the console. Run with node shtmHandler.js <file_path>.

const fs = require('fs');
const path = require('path');

class SHTMHandler {
  constructor(filePath) {
    this.filePath = path.resolve(filePath);
    if (path.extname(this.filePath).toLowerCase() !== '.shtm') {
      throw new Error('File must have .shtm extension.');
    }
    this.content = null;
  }

  read() {
    this.content = fs.readFileSync(this.filePath, 'utf8');
    return this.content;
  }

  extractProperties() {
    if (this.content === null) {
      this.read();
    }
    // Lazy match up to "-->" so hyphens inside values (dates, file names) don't end the match
    const ssiRegex = /<!--#(\w+)([\s\S]*?)-->/g;
    const directives = [];
    let match;
    while ((match = ssiRegex.exec(this.content)) !== null) {
      directives.push({ directive: match[1], params: match[2] ? match[2].trim() : '' });
    }
    const stats = fs.statSync(this.filePath);
    const hasHtml = this.content.toLowerCase().includes('<!doctype') || this.content.toLowerCase().includes('<html');
    return {
      'File Extension': '.shtm',
      'MIME Type': 'text/html',
      'File Size (bytes)': stats.size,
      'Last Modified': stats.mtime.toISOString(),
      'Encoding': 'UTF-8',
      'SSI Directives Found': directives.length,
      'Directives': directives,
      'Contains HTML Structure': hasHtml,
      'Server Dependency': 'Requires SSI-enabled server'
    };
  }

  printProperties() {
    const props = this.extractProperties();
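    Object.entries(props).forEach(([key, value]) => {
      // Print strings as-is; serialize numbers, booleans, and arrays as JSON
      console.log(`${key}: ${typeof value === 'string' ? value : JSON.stringify(value)}`);
    });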
  }

  write(outputPath = null) {
    if (this.content === null) {
      this.read();
    }
    const targetPath = outputPath || this.filePath;
    fs.writeFileSync(targetPath, this.content, 'utf8');
  }
}

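// Command-line entry point: node shtmHandler.js <file.shtm>
if (require.main === module) {
  if (process.argv.length !== 3) {
    console.error('Usage: node shtmHandler.js <file.shtm>');
    process.exit(1);
  }
  const handler = new SHTMHandler(process.argv[2]);
  handler.printProperties();
  // handler.write('output.shtm');
}

module.exports = SHTMHandler;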

  7. C Implementation for .SHTM File Handling

This C implementation uses the standard library plus the POSIX stat() call (for Unix-like systems) to read and write files, scans for directives with plain substring matching (strstr) rather than a regex library, and prints properties to stdout. Compile with gcc -o shtm_handler shtm_handler.c and run ./shtm_handler <file.shtm>.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <sys/stat.h>
#include <unistd.h>

typedef struct {
    char directive[32];
    char params[256];
} Directive;

typedef struct {
    char file_ext[8];
    char mime_type[16];
    long file_size;
    char last_mod[64];
    char encoding[16];
    int ssi_count;
    Directive* directives;
    int has_html;
    char server_dep[64];
} Properties;

Properties* extract_properties(const char* file_path, char* content, size_t content_len) {
    Properties* props = malloc(sizeof(Properties));
    strcpy(props->file_ext, ".shtm");
    strcpy(props->mime_type, "text/html");
    strcpy(props->encoding, "UTF-8");
    strcpy(props->server_dep, "Requires SSI-enabled server");

    struct stat st;
    stat(file_path, &st);
    props->file_size = st.st_size;
    struct tm* tm = localtime(&st.st_mtime);
    strftime(props->last_mod, sizeof(props->last_mod), "%Y-%m-%dT%H:%M:%S", tm);

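    // Simple scan for SSI directives of the form <!--#name params -->
    props->ssi_count = 0;
    props->directives = malloc(100 * sizeof(Directive)); // Fixed cap: at most 100 directives
    char* pos = content;
    while (props->directives != NULL && props->ssi_count < 100 &&
           (pos = strstr(pos, "<!--#")) != NULL) {
        char* start = pos + 5;                       // Skip "<!--#"
        char* end = strstr(start, "-->");
        if (end == NULL) break;                      // Unterminated directive
        Directive* d = &props->directives[props->ssi_count];

        // Name: up to whitespace or '-' (the '-' of "-->" guarantees it stops by `end`)
        size_t name_len = strcspn(start, " \t\r\n-");
        size_t copy_len = name_len < sizeof(d->directive) ? name_len : sizeof(d->directive) - 1;
        memcpy(d->directive, start, copy_len);
        d->directive[copy_len] = '\0';

        // Params: text between the name and "-->", trimmed and truncated to fit the buffer
        const char* p = start + name_len;
        p += strspn(p, " \t\r\n");
        const char* q = end;
        while (q > p && strchr(" \t\r\n", q[-1]) != NULL) q--;
        size_t param_len = (size_t)(q - p);
        if (param_len >= sizeof(d->params)) param_len = sizeof(d->params) - 1;
        memcpy(d->params, p, param_len);
        d->params[param_len] = '\0';

        props->ssi_count++;
        pos = end + 3;                               // Continue after "-->"
    }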

    props->has_html = (strstr(content, "<!DOCTYPE") != NULL || strstr(content, "<html") != NULL);

    return props;
}

void print_properties(Properties* props) {
    printf("File Extension: %s\n", props->file_ext);
    printf("MIME Type: %s\n", props->mime_type);
    printf("File Size (bytes): %ld\n", props->file_size);
    printf("Last Modified: %s\n", props->last_mod);
    printf("Encoding: %s\n", props->encoding);
    printf("SSI Directives Found: %d\n", props->ssi_count);
    for (int i = 0; i < props->ssi_count; i++) {
        printf("  Directive %d: %s (params: %s)\n", i+1, props->directives[i].directive, props->directives[i].params);
    }
    printf("Contains HTML Structure: %s\n", props->has_html ? "Yes" : "No");
    printf("Server Dependency: %s\n", props->server_dep);
}

void write_file(const char* file_path, const char* content, size_t len) {
    FILE* f = fopen(file_path, "wb");  // Binary mode writes the bytes unchanged
    if (f) {
        fwrite(content, 1, len, f);
        fclose(f);
    }
}

int main(int argc, char** argv) {
    if (argc != 2) {
        fprintf(stderr, "Usage: %s <file.shtm>\n", argv[0]);
        return 1;
    }
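    FILE* f = fopen(argv[1], "rb");  // Binary mode so ftell() reports the byte count
    if (!f) {
        perror("Error opening file");
        return 1;
    }
    fseek(f, 0, SEEK_END);
    long size = ftell(f);
    fseek(f, 0, SEEK_SET);
    if (size < 0) {
        perror("Error reading file size");
        fclose(f);
        return 1;
    }
    char* content = malloc((size_t)size + 1);
    if (!content) {
        fclose(f);
        return 1;
    }
    size_t len = fread(content, 1, (size_t)size, f);  // Bytes actually read
    content[len] = '\0';
    fclose(f);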

    Properties* props = extract_properties(argv[1], content, len);
    print_properties(props);

    // Example write (must run before `content` is freed):
    // write_file("output.shtm", content, len);

    // Free resources
    free(props->directives);
    free(props);
    free(content);

    return 0;
}