Task 261: .GMI File Format

File Format Specifications for .GMI (Gemtext)

The .GMI file format refers to Gemtext, the native markup language for the Gemini protocol, a lightweight alternative to HTML for hypertext documents. It is a simple, line-oriented plain text format designed for minimalism and ease of parsing, with no support for complex styling, scripting, or media embedding. Gemtext files typically use the .gmi extension and are served with the media type text/gemini; charset=utf-8. The format emphasizes semantic structure over presentation, allowing clients (browsers) to render content flexibly.
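As a rough illustration of the media type mentioned above, the sketch below splits a string such as text/gemini; charset=utf-8 into its parts using Python's standard email.message module. The function name parse_gemini_media_type is hypothetical, and falling back to UTF-8 when no charset parameter is present is an assumption based on the default described in this document.

```python
from email.message import Message


def parse_gemini_media_type(meta: str) -> dict:
    """Split a media type string into its type, charset, and lang parts."""
    msg = Message()
    msg["Content-Type"] = meta
    return {
        "type": msg.get_content_type(),
        # Assumption: fall back to UTF-8 when no charset parameter is given
        "charset": msg.get_param("charset", "utf-8"),
        # lang is optional; None means the server did not declare a language
        "lang": msg.get_param("lang"),
    }


info = parse_gemini_media_type("text/gemini; charset=utf-8; lang=en")
print(info)
```

A client could use the returned lang value to pick hyphenation or text direction, but nothing in the file content itself depends on it.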

Key aspects of the specification include:

  • Overall Structure: Documents consist of one or more lines, parsed sequentially in a single top-to-bottom pass. No hierarchical nesting or block-level elements beyond basic toggles.
  • Character Encoding: UTF-8 by default.
  • Line Endings: Canonical CRLF (carriage return + line feed), though LF alone is permitted in transmission.
  • Parser Behavior: Maintains a single bit of state (normal mode or preformatted mode), starting in normal mode. Preformatted mode toggles on/off and preserves exact line content (no wrapping).
  • Rendering Philosophy: Authors cannot dictate exact visuals; clients handle wrapping, styling, and layout for accessibility and device adaptation.
  • Extensibility: Intentionally not extensible to prevent bloat; only six defined line types.
  • Media Type Parameters: Optional lang parameter (BCP 47 language tags, e.g., lang=en) for document language, which may influence client rendering but is not part of the file content itself.
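
The single bit of parser state described above can be sketched as follows. This is a minimal illustration, not a reference parser: the function name classify_lines and the type labels are hypothetical, and the sketch tests the raw start of each line without stripping leading whitespace, which is an assumption of this example.

```python
def classify_lines(text: str) -> list[tuple[str, str]]:
    """Return (line_type, line) pairs using one bit of parser state."""
    result = []
    preformatted = False  # the parser starts in normal mode
    if text.endswith("\n"):
        text = text[:-1]  # avoid a phantom empty line after the final newline
    for raw in text.split("\n"):
        line = raw.rstrip("\r")  # accept both CRLF and LF endings
        if line.startswith("```"):
            preformatted = not preformatted  # toggle lines flip the bit
            result.append(("toggle", line))
        elif preformatted:
            result.append(("preformatted", line))  # kept verbatim
        elif line.startswith("=>"):
            result.append(("link", line))
        elif line.startswith("#"):
            result.append(("heading", line))
        elif line.startswith("* "):
            result.append(("list", line))
        elif line.startswith(">"):
            result.append(("quote", line))
        else:
            result.append(("text", line))  # fallback, including blank lines
    return result


sample = (
    "# Title\n"
    "Hello\n"
    "```python\n"
    "# not a heading\n"
    "```\n"
    "=> gemini://example.org Example\n"
)
for line_type, line in classify_lines(sample):
    print(f"{line_type:12} {line}")
```

Note that the line "# not a heading" is classified as preformatted because the toggle check and the state bit are consulted before any other prefix.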

The format draws inspiration from Markdown and Gophermap but is stricter and simpler.

List of All Properties Intrinsic to the .GMI (Gemtext) File Format

These are the core, defining characteristics of the format itself (independent of specific file content), focusing on its structural and systemic rules:

  • Line-Oriented Parsing: All content is divided into discrete lines; each line is classified independently based on its first 1-3 characters, with no cross-line dependencies except for preformatted mode toggling.
  • Six Exclusive Line Types: Every line matches exactly one type (text, link, heading, list item, quote, or preformatted toggle). Lines that match no other type, including blank lines, are text lines.
  • Heading Levels: Three semantic levels (# for level 1, ## for level 2, ### for level 3), used for document outlining (e.g., table of contents generation).
  • Link Syntax: => URL [link text], supporting absolute/relative Gemini URLs (or other schemes); URLs must be percent-encoded per RFC 3986; no automatic fetching.
  • List Items: Flat, unordered bullets starting with * ; purely stylistic, with no nesting or numbering.
  • Quote Blocks: Lines starting with > for cited external text; clients may indent or style distinctly.
  • Preformatted Blocks: Delimited by toggle lines starting with ``` (optionally followed by alt text for description, e.g., language or purpose); content between toggles is verbatim (no wrapping, preserves whitespace).
  • Text Lines: Default fallback; supports paragraphs separated by blank lines; clients apply word-wrapping.
  • Blank Lines: Valid text lines that render as vertical space; multiple consecutive blanks are preserved (no collapsing).
  • Preformatted Mode State: A boolean toggle (on/off) that affects subsequent lines until the next toggle; starts off; ending state is irrelevant.
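  • Whitespace Handling: Line type is determined by the first characters of the line; the Gemtext specification does not call for stripping leading whitespace first, although the lenient parsers in this document do. In link lines, spaces and tabs are equivalent separators.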
  • Encoding and Safety: UTF-8 only; no support for binary data, scripts, or unsafe characters in links.
  • No Nesting or Hierarchy: Flat structure only; no sections, tables, images, or inline elements.
  • Client Discretion: No author control over fonts, colors, or layout; focuses on semantics for accessibility.
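
To make the link syntax in the list above concrete, here is a small sketch that splits a link line into its URL and optional label. The names LINK_RE and parse_link_line are hypothetical, and the regular expression is one possible reading of the "=> URL [link text]" form; it does not validate or percent-encode the URL.

```python
import re
from typing import Optional

# "=>", optional whitespace, a URL without whitespace, then an optional label
LINK_RE = re.compile(r"^=>[ \t]*(\S+)(?:[ \t]+(.*))?$")


def parse_link_line(line: str) -> Optional[tuple[str, Optional[str]]]:
    """Return (url, label) for a link line, or None if it is not one."""
    match = LINK_RE.match(line)
    if match is None:
        return None
    url, label = match.group(1), match.group(2)
    label = label.strip() if label else None
    return url, label or None  # an empty label is reported as None


print(parse_link_line("=> gemini://example.org/ Example capsule"))
print(parse_link_line("=>docs/spec.gmi"))
print(parse_link_line("Just a text line"))
```

When the label is None, a client would typically display the URL itself as the link text.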

Two Direct Download Links for .GMI Files

Ghost Blog Embedded HTML JavaScript for Drag-and-Drop .GMI Parsing

Embed this full <div> block in a Ghost blog post (via HTML card). It creates a drag-and-drop zone that reads a dropped .gmi file, parses its lines, extracts properties, and dumps them to the screen below. Uses vanilla JS with FileReader for browser compatibility.

Python Class for .GMI Handling

This class reads a .gmi file, parses its properties, prints them to console, and supports writing (e.g., to save the original or modified content).

import re

class GMIFile:
    def __init__(self, filepath=None):
        self.filepath = filepath
        self.lines = []
        self.properties = {}

    def read(self):
        with open(self.filepath, 'r', encoding='utf-8') as f:
            self.lines = f.readlines()
        self._parse_properties()
        self._print_properties()

    def _parse_properties(self):
        # Separate counters; chaining them to the dict would alias all three names
        text_lines = link_lines = 0
        heading_lines = {'1': 0, '2': 0, '3': 0}
        list_items = quote_lines = pre_blocks = 0
        links = []
        in_pre = False
        pre_alt_texts = []
        for line in self.lines:
            line = line.rstrip('\r\n')
            stripped = line.lstrip()
            if stripped.startswith('```'):
                in_pre = not in_pre
                if in_pre:
                    # Count each block once, on its opening toggle, where alt text lives
                    pre_blocks += 1
                    alt_text = stripped[3:].strip()
                    if alt_text:
                        pre_alt_texts.append(alt_text)
                continue
            if in_pre or not stripped:
                continue  # preformatted content and blank lines are not counted
            if stripped.startswith('=>'):
                link_lines += 1
                url_match = re.match(r'^=>\s*(\S+)', stripped)
                if url_match:
                    links.append(url_match.group(1))
            elif stripped.startswith('#'):
                hashes = len(stripped) - len(stripped.lstrip('#'))
                # Runs longer than three '#' are counted as level 3 here
                heading_lines[str(min(hashes, 3))] += 1
            elif stripped.startswith('* '):
                list_items += 1
            elif stripped.startswith('>'):
                quote_lines += 1
            else:
                text_lines += 1
        self.properties = {
            'total_lines': len(self.lines),
            'text_lines': text_lines,
            'link_lines': link_lines,
            'heading_lines': heading_lines,
            'list_items': list_items,
            'quote_lines': quote_lines,
            'pre_blocks': pre_blocks,
            'links': links,
            'pre_alt_texts': pre_alt_texts,
            'final_pre_mode': in_pre,
            'encoding': 'UTF-8'
        }

    def _print_properties(self):
        for key, value in self.properties.items():
            print(f"{key}: {value}")

    def write(self, output_path):
        with open(output_path, 'w', encoding='utf-8', newline='\r\n') as f:  # Canonical CRLF
            f.writelines(self.lines)

# Usage
# gmi = GMIFile('example.gmi')
# gmi.read()
# gmi.write('output.gmi')

Java Class for .GMI Handling

This class uses java.nio for file I/O, parses lines, prints properties to console (System.out), and supports writing back.

import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GMIFile {
    private String filepath;
    private List<String> lines;
    private Map<String, Object> properties;

    public GMIFile(String filepath) {
        this.filepath = filepath;
        this.lines = new ArrayList<>();
        this.properties = new LinkedHashMap<>();  // keeps insertion order so output is stable
    }

    public void read() throws IOException {
        lines = Files.readAllLines(Paths.get(filepath), StandardCharsets.UTF_8);
        parseProperties();
        printProperties();
    }

    private void parseProperties() {
        int textLines = 0, linkLines = 0;
        Map<String, Integer> headingLines = new HashMap<>() {{ put("1", 0); put("2", 0); put("3", 0); }};
        int listItems = 0, quoteLines = 0, preBlocks = 0;
        List<String> links = new ArrayList<>();
        List<String> preAltTexts = new ArrayList<>();
        boolean inPre = false;
        Pattern urlPattern = Pattern.compile("^=>\\s*(\\S+)");
        Pattern hashPattern = Pattern.compile("^#+");
        for (String line : lines) {
            String stripped = line.replaceFirst("^\\s*", "");
            if (stripped.startsWith("```")) {
                inPre = !inPre;
                if (inPre) {
                    // Count each block once, on its opening toggle, where alt text lives
                    preBlocks++;
                    String altText = stripped.substring(3).trim();
                    if (!altText.isEmpty()) {
                        preAltTexts.add(altText);
                    }
                }
                continue;
            }
            // Preformatted content and blank lines are not counted
            if (inPre || stripped.isEmpty()) continue;
            if (stripped.startsWith("=>")) {
                linkLines++;
                Matcher urlMatcher = urlPattern.matcher(stripped);
                if (urlMatcher.find()) {
                    links.add(urlMatcher.group(1));
                }
            } else if (stripped.startsWith("#")) {
                Matcher hashMatcher = hashPattern.matcher(stripped);
                if (hashMatcher.find()) {
                    // Runs longer than three '#' are counted as level 3 here
                    String level = String.valueOf(Math.min(hashMatcher.group().length(), 3));
                    headingLines.put(level, headingLines.get(level) + 1);
                }
            } else if (stripped.startsWith("* ")) {
                listItems++;
            } else if (stripped.startsWith(">")) {
                quoteLines++;
            } else {
                textLines++;
            }
        }
        properties.put("total_lines", lines.size());
        properties.put("text_lines", textLines);
        properties.put("link_lines", linkLines);
        properties.put("heading_lines", headingLines);
        properties.put("list_items", listItems);
        properties.put("quote_lines", quoteLines);
        properties.put("pre_blocks", preBlocks);
        properties.put("links", links);
        properties.put("pre_alt_texts", preAltTexts);
        properties.put("final_pre_mode", inPre);
        properties.put("encoding", "UTF-8");
    }

    private void printProperties() {
        for (Map.Entry<String, Object> entry : properties.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }

    public void write(String outputPath) throws IOException {
        Files.write(Paths.get(outputPath), lines, StandardCharsets.UTF_8);
    }

    // Usage (read and write on the same instance, otherwise write() has no lines to save):
    // GMIFile gmi = new GMIFile("example.gmi"); gmi.read(); gmi.write("output.gmi");
}

JavaScript Class for .GMI Handling (Node.js)

This Node.js class uses the fs module, parses properties, prints them to the console (console.log), and supports writing.

const fs = require('fs');

class GMIFile {
    constructor(filepath) {
        this.filepath = filepath;
        this.lines = [];
        this.properties = {};
    }

    read() {
        this.lines = fs.readFileSync(this.filepath, 'utf-8').split(/\r?\n/);
        this.parseProperties();
        this.printProperties();
    }

    parseProperties() {
        let textLines = 0, linkLines = 0;
        let headingLines = {1: 0, 2: 0, 3: 0};
        let listItems = 0, quoteLines = 0, preBlocks = 0;
        let links = [];
        let preAltTexts = [];
        let inPre = false;
        this.lines.forEach(line => {
            let stripped = line.replace(/^\s*/, '');
            if (stripped.startsWith('```')) {
                inPre = !inPre;
                if (inPre) {
                    // Count each block once, on its opening toggle, where alt text lives
                    preBlocks++;
                    let altText = stripped.slice(3).trim();
                    if (altText) preAltTexts.push(altText);
                }
                return;
            }
            // Preformatted content and blank lines are not counted
            if (inPre || !stripped) return;
            if (stripped.startsWith('=>')) {
                linkLines++;
                let urlMatch = stripped.match(/^=>\s*(\S+)/);
                if (urlMatch) links.push(urlMatch[1]);
            } else if (stripped.startsWith('#')) {
                let hashes = stripped.match(/^#+/)[0].length;
                // Runs longer than three '#' are counted as level 3 here
                headingLines[Math.min(hashes, 3)]++;
            } else if (stripped.startsWith('* ')) {
                listItems++;
            } else if (stripped.startsWith('>')) {
                quoteLines++;
            } else {
                textLines++;
            }
        });
        this.properties = {
            totalLines: this.lines.length,
            textLines, linkLines, headingLines, listItems, quoteLines, preBlocks,
            links, preAltTexts, finalPreMode: inPre, encoding: 'UTF-8'
        };
    }

    printProperties() {
        Object.entries(this.properties).forEach(([key, value]) => {
            console.log(`${key}: ${JSON.stringify(value)}`);
        });
    }

    write(outputPath) {
        fs.writeFileSync(outputPath, this.lines.join('\r\n'), 'utf-8');
    }
}

// Usage: const gmi = new GMIFile('example.gmi'); gmi.read(); gmi.write('output.gmi');

C Class (Struct with Functions) for .GMI Handling

This uses standard C I/O (stdio.h) plus POSIX getline, stores lines in a dynamically allocated array, prints properties to stdout, and supports writing. Compile with gcc -o gmi gmi.c.

#define _POSIX_C_SOURCE 200809L  // exposes getline() and strdup()
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

// Simple struct for properties (simplified, no full lists for brevity)
typedef struct {
    int total_lines;
    int text_lines;
    int link_lines;
    int heading_lines[3];  // Index 0: #, 1: ##, 2: ###
    int list_items;
    int quote_lines;
    int pre_blocks;
    int links_count;  // Count only, no list
    int pre_alt_count;
    int final_pre_mode;
    char encoding[8];
} GMIProperties;

typedef struct {
    char *filepath;
    char **lines;
    int line_count;
    GMIProperties props;
} GMIFile;

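// Forward declarations: read_gmi() calls these before they are defined
void parse_properties(GMIFile *gmi);
void print_properties(GMIProperties *props);

void read_gmi(GMIFile *gmi) {
    FILE *f = fopen(gmi->filepath, "r");
    if (!f) return;
    char *line = NULL;
    size_t len = 0;
    gmi->line_count = 0;
    while (getline(&line, &len, f) != -1) {
        // Drop the trailing LF or CRLF so write_gmi() appends exactly one CRLF
        size_t n = strlen(line);
        while (n > 0 && (line[n - 1] == '\n' || line[n - 1] == '\r')) line[--n] = '\0';
        char **grown = realloc(gmi->lines, (gmi->line_count + 1) * sizeof(char*));
        if (!grown) break;  // out of memory: keep the lines read so far
        gmi->lines = grown;
        gmi->lines[gmi->line_count] = strdup(line);
        gmi->line_count++;
    }
    free(line);
    fclose(f);
    parse_properties(gmi);
    print_properties(&gmi->props);
}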

void parse_properties(GMIFile *gmi) {
    memset(&gmi->props, 0, sizeof(GMIProperties));
    gmi->props.total_lines = gmi->line_count;
    strcpy(gmi->props.encoding, "UTF-8");
    int in_pre = 0;
    for (int i = 0; i < gmi->line_count; i++) {
        char *stripped = gmi->lines[i];
        while (*stripped == ' ' || *stripped == '\t') stripped++;
        if (strncmp(stripped, "```", 3) == 0) {
            in_pre = !in_pre;
            if (in_pre) {
                // Count each block once, on its opening toggle
                gmi->props.pre_blocks++;
                // Alt text is any non-whitespace after the opening toggle (counted, not stored)
                char *alt = stripped + 3;
                while (*alt == ' ' || *alt == '\t') alt++;
                if (*alt && *alt != '\r' && *alt != '\n') gmi->props.pre_alt_count++;
            }
            continue;
        }
        if (in_pre) continue;  // preformatted content is not counted
        if (!*stripped || *stripped == '\r' || *stripped == '\n') continue;  // blank line
        if (strncmp(stripped, "=>", 2) == 0) {
            gmi->props.link_lines++;
            gmi->props.links_count++;  // Simplified
        } else if (stripped[0] == '#') {
            int hashes = 0;
            while (stripped[hashes] == '#') hashes++;
            // Runs longer than three '#' are counted as level 3 here
            int level = hashes > 3 ? 3 : hashes;
            gmi->props.heading_lines[level - 1]++;
        } else if (strncmp(stripped, "* ", 2) == 0) {
            gmi->props.list_items++;
        } else if (stripped[0] == '>') {
            gmi->props.quote_lines++;
        } else {
            gmi->props.text_lines++;
        }
    }
    gmi->props.final_pre_mode = in_pre;
}

void print_properties(GMIProperties *props) {
    printf("total_lines: %d\n", props->total_lines);
    printf("text_lines: %d\n", props->text_lines);
    printf("link_lines: %d\n", props->link_lines);
    printf("heading_lines: [%d, %d, %d]\n", props->heading_lines[0], props->heading_lines[1], props->heading_lines[2]);
    printf("list_items: %d\n", props->list_items);
    printf("quote_lines: %d\n", props->quote_lines);
    printf("pre_blocks: %d\n", props->pre_blocks);
    printf("links_count: %d\n", props->links_count);
    printf("pre_alt_count: %d\n", props->pre_alt_count);
    printf("final_pre_mode: %d\n", props->final_pre_mode);
    printf("encoding: %s\n", props->encoding);
}

void write_gmi(GMIFile *gmi, char *output_path) {
    FILE *f = fopen(output_path, "w");
    if (!f) return;
    for (int i = 0; i < gmi->line_count; i++) {
        fprintf(f, "%s\r\n", gmi->lines[i]);  // Canonical CRLF
    }
    fclose(f);
}

void free_gmi(GMIFile *gmi) {
    for (int i = 0; i < gmi->line_count; i++) free(gmi->lines[i]);
    free(gmi->lines);
}

// Usage example in main:
// int main() {
//     GMIFile gmi = { .filepath = "example.gmi", .lines = NULL, .line_count = 0 };
//     read_gmi(&gmi);
//     write_gmi(&gmi, "output.gmi");
//     free_gmi(&gmi);
//     return 0;
// }