Task 641: .SBV File Format

1. Properties of the .SBV File Format Intrinsic to Its Structure

The .SBV (SubViewer) file format is a simple, text-based format used primarily for video subtitles and closed captions, notably by YouTube. It has no binary headers or metadata blocks; its "intrinsic properties" are the structural rules that define how a file is organized and parsed. Based on the format as it is commonly documented, the key properties are as follows:

| Property | Description |
| --- | --- |
| Encoding | Plain text, typically UTF-8 to support international characters in subtitles. No byte-order mark (BOM) is required. |
| Header | None. The file begins directly with the first subtitle entry. |
| Footer | None. The file ends with the last subtitle entry, optionally followed by a blank line. |
| Entry Separator | Blank lines (empty lines containing nothing but the newline) separate individual subtitle entries. |
| Timestamp Line | Each entry starts with a single line containing the start and end times in the format H:MM:SS.mmm,H:MM:SS.mmm, where H is hours (one or more digits, not necessarily zero-padded), MM is minutes (00-59), SS is seconds (00-59), and mmm is milliseconds (000-999). The two times are separated by a comma (,). Example: 0:00:00.000,0:00:07.000. |
| Subtitle Text Block | One or more lines of subtitle text following the timestamp line. Text is plain, with no formatting tags. Line breaks within the block are preserved. An optional speaker prefix may appear on the first text line: >> SPEAKER_NAME: (e.g., >> TIM: Hello world), where the prefix names the speaker and is separated from the text by a space. |
| Entry Count | Variable; the number of entries is determined by parsing until the end of the file. |
| Line Endings | Unix-style newline (\n); Windows-style (\r\n) is tolerated in practice. |
| Limitations | No support for styling, positioning, or other advanced features (e.g., italics, colors). Entries are expected to be in chronological order. Maximum line length is not strictly defined but should be reasonable for display (typically under 80 characters per line). |

These properties ensure the format is lightweight and human-readable, facilitating easy editing in text editors.
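
The timestamp rule and the chronological-order expectation can be checked mechanically. The following is a minimal Python sketch, not part of the original listings: the names TIMESTAMP_LINE, parse_timestamp_line, and is_chronological are hypothetical, and the pattern assumes one or more hour digits with minutes and seconds limited to 00-59.

```python
import re

# Assumed shape: H:MM:SS.mmm,H:MM:SS.mmm with one or more hour digits
TIMESTAMP_LINE = re.compile(
    r'^(\d+):([0-5]\d):([0-5]\d)\.(\d{3}),(\d+):([0-5]\d):([0-5]\d)\.(\d{3})$'
)

def parse_timestamp_line(line):
    """Return (start_ms, end_ms), or None if the line is not a timestamp line."""
    match = TIMESTAMP_LINE.match(line.strip())
    if match is None:
        return None
    h1, m1, s1, ms1, h2, m2, s2, ms2 = (int(g) for g in match.groups())
    start = ((h1 * 60 + m1) * 60 + s1) * 1000 + ms1
    end = ((h2 * 60 + m2) * 60 + s2) * 1000 + ms2
    return start, end

def is_chronological(spans):
    """True if no entry ends before it starts or starts before the previous entry."""
    previous_start = 0
    for start, end in spans:
        if end < start or start < previous_start:
            return False
        previous_start = start
    return True

print(parse_timestamp_line("0:00:00.000,0:00:07.000"))  # (0, 7000)
```

Converting to integer milliseconds makes ordering checks a plain numeric comparison; whether overlapping entries should be rejected is left open here, since the properties above do not say.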

2. Sample .SBV Files

Sample .SBV files for testing the code below can be downloaded from specialized subtitle resource sites. If no download is available, similar samples can be written by hand from the properties above or obtained from YouTube caption exports.
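
For testing without a download, a small sample can be generated locally. The sketch below is an illustration only: the caption text, the SAMPLE_SBV constant, the write_sample function, and the sample.sbv file name are all hypothetical. It writes two entries, the second with a speaker prefix, into the system temporary directory.

```python
import os
import tempfile

# Hypothetical sample content: two entries separated by blank lines
SAMPLE_SBV = (
    "0:00:00.000,0:00:07.000\n"
    "Welcome to the video.\n"
    "This caption has two lines.\n"
    "\n"
    "0:00:07.500,0:00:10.250\n"
    ">> TIM: Hello world\n"
    "\n"
)

def write_sample(path):
    """Write the sample captions as UTF-8 with Unix line endings."""
    with open(path, "w", encoding="utf-8", newline="\n") as f:
        f.write(SAMPLE_SBV)

sample_path = os.path.join(tempfile.gettempdir(), "sample.sbv")
write_sample(sample_path)
```

The resulting file can be passed to any of the handlers below as the input path.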

3. Ghost Blog Embedded HTML JavaScript for Drag-and-Drop .SBV Parsing

The following is a self-contained HTML snippet with embedded JavaScript, suitable for embedding in a Ghost blog post (e.g., via the HTML card). It allows users to drag and drop an .SBV file, parses it, and dumps the properties (number of entries, timestamps, speakers, and text) to the screen in a formatted <div>. No external libraries are required.

Drag and drop an .SBV file here to parse its properties.

This code parses the file according to the .SBV specification, handles optional speaker prefixes, and displays the properties in a readable format. It validates the timestamp pattern using a regular expression.

4. Python Class for .SBV Handling

The following Python class, SBVHandler, opens an .SBV file, parses its properties, prints them to the console, and supports writing to a new file. It uses standard library modules only.

import re
from typing import Any, Dict, List

class SBVHandler:
    def __init__(self, file_path: str):
        self.file_path = file_path
        # Each entry holds 'start', 'end', 'speaker' (str or None), and 'text' (list of lines)
        self.entries: List[Dict[str, Any]] = []
        self.num_entries = 0

    def read(self) -> None:
        """Read and parse the .SBV file."""
        # 'utf-8-sig' also accepts files that begin with a byte-order mark
        with open(self.file_path, 'r', encoding='utf-8-sig') as f:
            lines = f.readlines()

        current_entry = None
        i = 0
        while i < len(lines):
            line = lines[i].rstrip('\r\n')
            if re.match(r'^\d+:\d{2}:\d{2}\.\d{3},\d+:\d{2}:\d{2}\.\d{3}$', line):
                if current_entry:
                    self.entries.append(current_entry)
                start, end = line.split(',')
                current_entry = {'start': start, 'end': end, 'text': [], 'speaker': None}
            elif current_entry and line.strip() == '':
                if current_entry:
                    self.entries.append(current_entry)
                current_entry = None
            elif current_entry and line:
                if not current_entry['text'] and line.startswith('>> '):
                    match = re.match(r'^>> ([^:]+): (.*)$', line)
                    if match:
                        current_entry['speaker'] = match.group(1)
                        current_entry['text'].append(match.group(2))
                    else:
                        current_entry['text'].append(line)
                else:
                    current_entry['text'].append(line)
            i += 1
        if current_entry:
            self.entries.append(current_entry)
        self.num_entries = len(self.entries)

    def print_properties(self) -> None:
        """Print all parsed properties to console."""
        print(f"Number of subtitle entries: {self.num_entries}")
        print()
        for idx, entry in enumerate(self.entries, 1):
            print(f"Entry {idx}:")
            print(f"  Timestamp: {entry['start']} -- {entry['end']}")
            if entry['speaker']:
                print(f"  Speaker: {entry['speaker']}")
            print("  Text:")
            for text_line in entry['text']:
                print(f"    {text_line}")
            print()

    def write(self, output_path: str) -> None:
        """Write parsed properties back to a new .SBV file."""
        with open(output_path, 'w', encoding='utf-8') as f:
            for entry in self.entries:
                f.write(f"{entry['start']},{entry['end']}\n")
                if entry['speaker']:
                    f.write(f">> {entry['speaker']}: {entry['text'][0]}\n")
                    for t in entry['text'][1:]:
                        f.write(f"{t}\n")
                else:
                    for t in entry['text']:
                        f.write(f"{t}\n")
                f.write("\n")

# Example usage:
# handler = SBVHandler('sample.sbv')
# handler.read()
# handler.print_properties()
# handler.write('output.sbv')

To use, instantiate the class with a file path, call read(), then print_properties(), and optionally write() to save.

5. Java Class for .SBV Handling

The following Java class, SBVHandler, uses the standard java.io, java.util, and java.util.regex packages to open, parse, print, and write .SBV files. Compile with javac SBVHandler.java and run with java SBVHandler <input_file> [output_file]; the output file is optional.

import java.io.*;
import java.util.*;
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class SBVHandler {
    private String filePath;
    private List<Map<String, String>> entries = new ArrayList<>();
    private int numEntries = 0;

    public SBVHandler(String filePath) {
        this.filePath = filePath;
    }

    public void read() throws IOException {
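        // Read as UTF-8 explicitly rather than relying on the platform default charset
        BufferedReader reader = new BufferedReader(new InputStreamReader(
                new FileInputStream(filePath), java.nio.charset.StandardCharsets.UTF_8));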
        String line;
        Map<String, String> currentEntry = null;
        boolean inTextBlock = false;

        while ((line = reader.readLine()) != null) {
            // readLine() already strips the line terminator (\n, \r\n, or \r);
            // trim surrounding whitespace so the blank-line and timestamp checks are reliable.
            line = line.trim();

            if (Pattern.matches("^\\d+:\\d{2}:\\d{2}\\.\\d{3},\\d+:\\d{2}:\\d{2}\\.\\d{3}$", line)) {
                if (currentEntry != null) {
                    entries.add(currentEntry);
                }
                String[] times = line.split(",");
                currentEntry = new HashMap<>();
                currentEntry.put("start", times[0]);
                currentEntry.put("end", times[1]);
                currentEntry.put("text", "");
                currentEntry.put("speaker", null);
                inTextBlock = true;
            } else if (inTextBlock && line.isEmpty()) {
                if (currentEntry != null) {
                    entries.add(currentEntry);
                }
                currentEntry = null;
                inTextBlock = false;
            } else if (inTextBlock && currentEntry != null && !line.isEmpty()) {
                String text = currentEntry.get("text");
                if (text.isEmpty() && line.startsWith(">> ")) {
                    Pattern speakerPattern = Pattern.compile("^>> ([^:]+): (.*)$");
                    Matcher matcher = speakerPattern.matcher(line);
                    if (matcher.matches()) {
                        currentEntry.put("speaker", matcher.group(1));
                        currentEntry.put("text", matcher.group(2) + "\n" + text);
                    } else {
                        currentEntry.put("text", line + "\n" + text);
                    }
                } else {
                    // Append (not prepend) so text lines keep their original order
                    currentEntry.put("text", text + line + "\n");
                }
            }
        }
        if (currentEntry != null) {
            entries.add(currentEntry);
        }
        numEntries = entries.size();
        reader.close();
    }

    public void printProperties() {
        System.out.println("Number of subtitle entries: " + numEntries);
        System.out.println();
        for (int i = 0; i < entries.size(); i++) {
            Map<String, String> entry = entries.get(i);
            System.out.println("Entry " + (i + 1) + ":");
            System.out.println("  Timestamp: " + entry.get("start") + " -- " + entry.get("end"));
            if (entry.get("speaker") != null) {
                System.out.println("  Speaker: " + entry.get("speaker"));
            }
            System.out.println("  Text:");
            String[] textLines = entry.get("text").split("\n");
            for (String t : textLines) {
                if (!t.isEmpty()) {
                    System.out.println("    " + t);
                }
            }
            System.out.println();
        }
    }

    public void write(String outputPath) throws IOException {
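        // Write as UTF-8 explicitly rather than relying on the platform default charset
        PrintWriter writer = new PrintWriter(new OutputStreamWriter(
                new FileOutputStream(outputPath), java.nio.charset.StandardCharsets.UTF_8));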
        for (Map<String, String> entry : entries) {
            writer.println(entry.get("start") + "," + entry.get("end"));
            String speaker = entry.get("speaker");
            // Split once; String.split drops trailing empty strings
            String[] textLines = entry.get("text").split("\n");
            if (speaker != null) {
                writer.println(">> " + speaker + ": " + textLines[0]);
                for (int j = 1; j < textLines.length; j++) {
                    writer.println(textLines[j]);
                }
            } else {
                for (String t : textLines) {
                    writer.println(t);
                }
            }
            writer.println();
        }
        writer.close();
    }

    public static void main(String[] args) throws IOException {
        if (args.length < 1) {
            System.err.println("Usage: java SBVHandler <input.sbv> [output.sbv]");
            System.exit(1);
        }
        SBVHandler handler = new SBVHandler(args[0]);
        handler.read();
        handler.printProperties();
        if (args.length > 1) {
            handler.write(args[1]);
            System.out.println("Written to " + args[1]);
        }
    }
}

Note: Each entry's text is stored as a single string with \n separators for simplicity and split again when printing or writing. The regex checks only the shape of the timestamp line, not the value ranges of its fields.

6. JavaScript Class for .SBV Handling

The following Node.js class, SBVHandler, uses the built-in fs module to open, parse, print, and write .SBV files. Run with node sbv_handler.js <input_file> [output_file].

const fs = require('fs');

class SBVHandler {
  constructor(filePath) {
    this.filePath = filePath;
    this.entries = [];
    this.numEntries = 0;
  }

  read() {
    const content = fs.readFileSync(this.filePath, 'utf8');
    const lines = content.split('\n');
    let currentEntry = null;
    let inTextBlock = false;

    for (let line of lines) {
      line = line.replace(/\r$/, ''); // Normalize line endings
      const trimmed = line.trim();

      if (/^\d+:\d{2}:\d{2}\.\d{3},\d+:\d{2}:\d{2}\.\d{3}$/.test(trimmed)) {
        if (currentEntry) {
          this.entries.push(currentEntry);
        }
        const [start, end] = trimmed.split(',');
        currentEntry = { start, end, text: [], speaker: null };
        inTextBlock = true;
      } else if (inTextBlock && trimmed === '') {
        if (currentEntry) {
          this.entries.push(currentEntry);
        }
        currentEntry = null;
        inTextBlock = false;
      } else if (inTextBlock && currentEntry && line.length > 0) {
        if (currentEntry.text.length === 0 && line.startsWith('>> ')) {
          const match = line.match(/^>> ([^:]+): (.*)$/);
          if (match) {
            currentEntry.speaker = match[1];
            currentEntry.text.push(match[2]);
          } else {
            currentEntry.text.push(line);
          }
        } else {
          currentEntry.text.push(line);
        }
      }
    }
    if (currentEntry) {
      this.entries.push(currentEntry);
    }
    this.numEntries = this.entries.length;
  }

  printProperties() {
    console.log(`Number of subtitle entries: ${this.numEntries}`);
    console.log('');
    this.entries.forEach((entry, index) => {
      console.log(`Entry ${index + 1}:`);
      console.log(`  Timestamp: ${entry.start} -- ${entry.end}`);
      if (entry.speaker) {
        console.log(`  Speaker: ${entry.speaker}`);
      }
      console.log('  Text:');
      entry.text.forEach(textLine => {
        if (textLine.trim()) {
          console.log(`    ${textLine}`);
        }
      });
      console.log('');
    });
  }

  write(outputPath) {
    let content = '';
    this.entries.forEach(entry => {
      content += `${entry.start},${entry.end}\n`;
      if (entry.speaker) {
        content += `>> ${entry.speaker}: ${entry.text[0]}\n`;
        for (let i = 1; i < entry.text.length; i++) {
          content += `${entry.text[i]}\n`;
        }
      } else {
        entry.text.forEach(t => content += `${t}\n`);
      }
      content += '\n';
    });
    fs.writeFileSync(outputPath, content, 'utf8');
  }
}

// Example usage (CLI):
const args = process.argv.slice(2);
if (args.length < 1) {
  console.error('Usage: node sbv_handler.js <input.sbv> [output.sbv]');
  process.exit(1);
}
const handler = new SBVHandler(args[0]);
handler.read();
handler.printProperties();
if (args[1]) {
  handler.write(args[1]);
  console.log(`Written to ${args[1]}`);
}

This class mirrors the Python version, using regex for validation and synchronous file operations for simplicity.

7. C Class for .SBV Handling

The following C implementation uses standard stdio.h and string.h for file I/O and parsing. It is a struct-based "class" with functions. Compile with gcc sbv_handler.c -o sbv_handler and run ./sbv_handler <input.sbv> [output.sbv].

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 256
#define MAX_ENTRIES 1000

typedef struct {
    char start[20];
    char end[20];
    char **text;
    int text_count;
    char *speaker;
} Entry;

typedef struct {
    char *file_path;
    Entry *entries;
    int num_entries;
    int max_entries;
} SBVHandler;

SBVHandler *sbv_create(const char *file_path) {
    SBVHandler *handler = malloc(sizeof(SBVHandler));
    handler->file_path = strdup(file_path);
    handler->entries = malloc(MAX_ENTRIES * sizeof(Entry));
    handler->num_entries = 0;
    handler->max_entries = MAX_ENTRIES;
    return handler;
}

void sbv_free(SBVHandler *handler) {
    for (int i = 0; i < handler->num_entries; i++) {
        Entry *e = &handler->entries[i];
        for (int j = 0; j < e->text_count; j++) {
            free(e->text[j]);
        }
        free(e->text);
        free(e->speaker);
    }
    free(handler->entries);
    free(handler->file_path);
    free(handler);
}

int is_timestamp(const char *line) {
    // Simple check for HH:MM:SS.mmm,HH:MM:SS.mmm
    int h1, m1, s1, ms1, h2, m2, s2, ms2;
    if (sscanf(line, "%d:%d:%d.%d,%d:%d:%d.%d", &h1, &m1, &s1, &ms1, &h2, &m2, &s2, &ms2) == 8) {
        return 1;
    }
    return 0;
}

void sbv_read(SBVHandler *handler) {
    FILE *f = fopen(handler->file_path, "r");
    if (!f) return;

    char line[MAX_LINE];
    Entry *current = NULL;
    int in_text = 0;

    while (fgets(line, MAX_LINE, f)) {
        // Remove \n
        line[strcspn(line, "\r\n")] = 0;
        char *trimmed = line;
        while (*trimmed == ' ') trimmed++; // Simple trim

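        if (is_timestamp(line)) {
            if (current) {
                // Previous entry was not terminated by a blank line: store it
                handler->entries[handler->num_entries++] = *current;
                free(current); // struct was copied; its pointers now belong to the array
                current = NULL;
            }
            if (handler->num_entries >= handler->max_entries) {
                break; // capacity reached: ignore the remaining entries
            }
            current = malloc(sizeof(Entry));
            // Width limits keep both fields within their 20-byte buffers
            sscanf(line, "%19[^,],%19s", current->start, current->end);
            current->text = malloc(10 * sizeof(char*)); // Assume max 10 lines
            current->text_count = 0;
            current->speaker = NULL;
            in_text = 1;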
        } else if (in_text && strlen(trimmed) == 0) {
            if (current) {
                handler->entries[handler->num_entries++] = *current;
                free(current);
                current = NULL;
            }
            in_text = 0;
        } else if (in_text && current && strlen(line) > 0) {
            if (current->text_count >= 10) {
                continue; // text array holds at most 10 lines; drop the rest
            }
            char *text_line = strdup(line);
            if (current->text_count == 0 && strncmp(text_line, ">> ", 3) == 0) {
                // Simple speaker parse: >> NAME: text
                char *colon = strchr(text_line + 3, ':');
                if (colon) {
                    *colon = 0;
                    current->speaker = strdup(text_line + 3);
                    // Skip the colon and one optional space; copy before freeing the source
                    char *text_start = colon + 1;
                    if (*text_start == ' ') text_start++;
                    char *rest = strdup(text_start);
                    free(text_line);
                    text_line = rest;
                }
            }
            current->text[current->text_count++] = text_line;
        }
    }
    if (current) {
        handler->entries[handler->num_entries++] = *current;
        free(current);
    }
    fclose(f);
}

void sbv_print_properties(SBVHandler *handler) {
    printf("Number of subtitle entries: %d\n\n", handler->num_entries);
    for (int i = 0; i < handler->num_entries; i++) {
        Entry *e = &handler->entries[i];
        printf("Entry %d:\n", i + 1);
        printf("  Timestamp: %s -- %s\n", e->start, e->end);
        if (e->speaker) {
            printf("  Speaker: %s\n", e->speaker);
        }
        printf("  Text:\n");
        for (int j = 0; j < e->text_count; j++) {
            printf("    %s\n", e->text[j]);
        }
        printf("\n");
    }
}

void sbv_write(SBVHandler *handler, const char *output_path) {
    FILE *f = fopen(output_path, "w");
    if (!f) return;

    for (int i = 0; i < handler->num_entries; i++) {
        Entry *e = &handler->entries[i];
        fprintf(f, "%s,%s\n", e->start, e->end);
        if (e->speaker) {
            fprintf(f, ">> %s: %s\n", e->speaker, e->text[0]);
            for (int j = 1; j < e->text_count; j++) {
                fprintf(f, "%s\n", e->text[j]);
            }
        } else {
            for (int j = 0; j < e->text_count; j++) {
                fprintf(f, "%s\n", e->text[j]);
            }
        }
        fprintf(f, "\n");
    }
    fclose(f);
}

int main(int argc, char **argv) {
    if (argc < 2) {
        fprintf(stderr, "Usage: %s <input.sbv> [output.sbv]\n", argv[0]);
        return 1;
    }
    SBVHandler *handler = sbv_create(argv[1]);
    sbv_read(handler);
    sbv_print_properties(handler);
    if (argc > 2) {
        sbv_write(handler, argv[2]);
        printf("Written to %s\n", argv[2]);
    }
    sbv_free(handler);
    return 0;
}

This C implementation uses sscanf for timestamp matching (a simple alternative to regex) and dynamic allocation for entries, which sbv_free releases. It handles basic cases but assumes at most 1000 entries, at most 10 text lines per entry, and lines under 255 characters; extend as needed.