Task 045: .ATG File Format

An .ATG file is an attributed grammar consumed by Coco/R, a compiler generator that turns such a file into a scanner and a recursive-descent parser. The file is plain text written in Cocol/R, an EBNF-derived notation with semantic actions embedded in the target programming language (e.g., C#, Java, or C++). It is typically UTF-8 encoded and has no binary header or fixed-offset structure; instead, keyword-delimited sections hold the lexical and syntactic definitions.

1. List of Properties Intrinsic to the .ATG File Format

The properties below are the core structural components of the format; together they define the grammar's scanner and parser specifications. They are taken from the official Coco/R documentation:

  • Imports: Optional list of namespace or package import statements in the target language, placed before the COMPILER keyword (e.g., using System;).
  • Compiler Name: The identifier following the COMPILER keyword, which names the grammar and serves as the start symbol.
  • Global Fields and Methods: Optional declarations of fields and methods in the target language, placed after the COMPILER declaration but before the scanner specification.
  • IgnoreCase: A boolean indicating whether scanning is case-insensitive (present if the IGNORECASE keyword appears).
  • Character Sets: A collection of named character set declarations under CHARACTERS, each consisting of an identifier and a set expression (e.g., digit = "0123456789".).
  • Tokens: A collection of token declarations under TOKENS, each defining terminal symbols via EBNF expressions (e.g., ident = letter {letter | digit}.).
  • Pragmas: Optional collection of pragma declarations under PRAGMAS, which are special tokens processed semantically without entering the parser.
  • Comments: A list of comment delimiters defined with COMMENTS FROM ... TO ... (optionally NESTED).
  • Ignore (White Space): The set of characters to ignore during scanning, defined with IGNORE (e.g., spaces, tabs, newlines).
  • Productions: A collection of syntactic productions under PRODUCTIONS, each consisting of a nonterminal identifier, optional attributes, and an EBNF expression with semantic actions (e.g., Expr = Term {("+" | "-") Term}.).
  • End Name: The identifier following the END keyword, which must match the Compiler Name for validation.

These properties are text-based and delimited by keywords, with no fixed byte offsets or magic numbers, as the format is human-readable and parsed sequentially.
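
Because the sections are delimited by keywords rather than offsets, a first pass over a file can simply record where each keyword occurs. The following is a minimal sketch of that idea, not part of Coco/R itself: the sample grammar, the SECTION_KEYWORDS tuple, and the find_sections helper are all hypothetical names introduced for illustration, and the sketch assumes each keyword starts its own line and does not appear at the start of a line inside target-language code.

```python
import re

# Hypothetical minimal grammar, used only to illustrate the section layout.
SAMPLE_ATG = '''\
COMPILER Calc
CHARACTERS
  digit = "0123456789".
TOKENS
  number = digit {digit}.
IGNORE '\\t' + '\\r' + '\\n'
PRODUCTIONS
  Calc = number {"+" number}.
END Calc.
'''

SECTION_KEYWORDS = ("COMPILER", "IGNORECASE", "CHARACTERS", "TOKENS",
                    "PRAGMAS", "COMMENTS", "IGNORE", "PRODUCTIONS", "END")

# The trailing \b stops IGNORE from matching the prefix of IGNORECASE.
_KEYWORD_RE = re.compile(r"^\s*(" + "|".join(SECTION_KEYWORDS) + r")\b")


def find_sections(text):
    """Return (keyword, line_number) pairs in the order they appear."""
    found = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = _KEYWORD_RE.match(line)
        if match:
            found.append((match.group(1), number))
    return found


if __name__ == "__main__":
    for keyword, number in find_sections(SAMPLE_ATG):
        print(f"{keyword} at line {number}")
```

A scan like this only locates section boundaries; the declarations inside each section still need to be parsed separately.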

3. Ghost Blog Embedded HTML JavaScript for Drag-and-Drop .ATG File Dump

The following is an HTML snippet with embedded JavaScript that can be inserted into a Ghost blog post. It creates a drag-and-drop area where a user can upload a .ATG file. The script reads the file as text, parses it using a simple state machine and regular expressions to extract the properties listed above, and displays them on the screen in a structured format.

Drag and drop a .ATG file here

This script uses a basic parser that assumes standard formatting and may not handle complex nested expressions or errors robustly, but it extracts and displays the properties as JSON.

4. Python Class for .ATG File Handling

import re

class ATGFile:
    def __init__(self, filepath=None):
        self.properties = {
            'imports': [],
            'compilerName': '',
            'globalFieldsAndMethods': '',
            'ignoreCase': False,
            'characterSets': {},
            'tokens': {},
            'pragmas': [],
            'comments': [],
            'ignore': '',
            'productions': {},
            'endName': ''
        }
        if filepath:
            self.read(filepath)

    def read(self, filepath):
        with open(filepath, 'r', encoding='utf-8') as f:
            content = f.read()
        lines = content.splitlines()
        state = 'imports'
        for line in lines:
            line = line.strip()
            if not line:
                continue
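            # Import lines are only recognized before the COMPILER keyword.
            if state == 'imports' and re.match(r'^(using|import)\s', line):
                self.properties['imports'].append(line)
            # \b ensures the keyword is a whole word (e.g. not "COMPILERX").
            elif re.match(r'^COMPILER\b', line):
                self.properties['compilerName'] = re.sub(r'^COMPILER\s*', '', line)
                state = 'global'
            # Everything up to the first scanner/parser keyword is target-language code.
            elif state == 'global' and not re.match(r'^(IGNORECASE|CHARACTERS|TOKENS|PRAGMAS|COMMENTS|IGNORE|PRODUCTIONS|END)\b', line):
                self.properties['globalFieldsAndMethods'] += line + '\n'
            # Match the keyword rather than the whole line, so trailing text does not
            # cause the line to be misread as an IGNORE declaration further down.
            elif re.match(r'^IGNORECASE\b', line):
                self.properties['ignoreCase'] = True
                state = 'scanner'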
            elif re.match(r'^CHARACTERS', line):
                state = 'characters'
            elif state == 'characters' and '=' in line:
                name, defn = [s.strip() for s in line.split('=', 1)]
                self.properties['characterSets'][name] = defn.rstrip('.')
            elif re.match(r'^TOKENS', line):
                state = 'tokens'
            elif state == 'tokens' and '=' in line:
                name, defn = [s.strip() for s in line.split('=', 1)]
                self.properties['tokens'][name] = defn.rstrip('.')
            elif re.match(r'^PRAGMAS', line):
                state = 'pragmas'
            elif state == 'pragmas' and '=' in line:
                self.properties['pragmas'].append(line.rstrip('.'))
            elif re.match(r'^COMMENTS FROM', line):
                self.properties['comments'].append(line)
            elif re.match(r'^IGNORE', line):
                self.properties['ignore'] = re.sub(r'^IGNORE\s*', '', line)
            elif re.match(r'^PRODUCTIONS', line):
                state = 'productions'
            elif state == 'productions' and '=' in line:
                name, defn = [s.strip() for s in line.split('=', 1)]
                self.properties['productions'][name] = defn.rstrip('.')
            elif re.match(r'^END', line):
                self.properties['endName'] = re.sub(r'^END\s*', '', line).rstrip('.')
                state = 'end'

    def print_properties(self):
        import json
        print(json.dumps(self.properties, indent=4))

    def write(self, filepath):
        with open(filepath, 'w', encoding='utf-8') as f:
            for imp in self.properties['imports']:
                f.write(imp + '\n')
            f.write(f"COMPILER {self.properties['compilerName']}\n")
            f.write(self.properties['globalFieldsAndMethods'])
            if self.properties['ignoreCase']:
                f.write('IGNORECASE\n')
            f.write('CHARACTERS\n')
            for name, defn in self.properties['characterSets'].items():
                f.write(f"  {name} = {defn}.\n")
            f.write('TOKENS\n')
            for name, defn in self.properties['tokens'].items():
                f.write(f"  {name} = {defn}.\n")
            if self.properties['pragmas']:
                f.write('PRAGMAS\n')
                for pragma in self.properties['pragmas']:
                    f.write(f"  {pragma}.\n")
            for comment in self.properties['comments']:
                f.write(comment + '\n')
            if self.properties['ignore']:
                f.write(f"IGNORE {self.properties['ignore']}\n")
            f.write('PRODUCTIONS\n')
            for name, defn in self.properties['productions'].items():
                f.write(f"  {name} = {defn}.\n")
            f.write(f"END {self.properties['endName']}.\n")

# Example usage:
# atg = ATGFile('example.ATG')
# atg.print_properties()
# atg.write('output.ATG')

This class reads and parses the file using regular expressions and a line-based state machine, prints the properties as JSON, and writes a new .ATG file from the parsed properties. Because it treats every line containing '=' as a complete declaration, it handles only simple cases: declarations that span several lines, and semantic actions that themselves contain '=', are not parsed correctly.
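
One way to lift the one-declaration-per-line restriction is to split a section body on its terminating periods instead of on line breaks. The sketch below is an illustration under stated assumptions rather than a full Cocol/R parser: split_declarations is a hypothetical helper, it assumes semantic actions are written as (. ... .), that ".." is the character-range operator, and that a ".)" never appears inside a string within a semantic action.

```python
def split_declarations(body):
    """Split a section body into declarations, each ended by a '.'.

    Periods inside quoted strings, inside "(. ... .)" semantic actions,
    or forming the ".." range operator do not end a declaration.
    """
    declarations, current = [], []
    quote = None       # the quote character we are inside, if any
    in_action = False  # True while inside "(. ... .)"
    i = 0
    while i < len(body):
        ch = body[i]
        pair = body[i:i + 2]
        if quote:
            current.append(ch)
            if ch == "\\" and i + 1 < len(body):
                i += 1
                current.append(body[i])  # keep the escaped character
            elif ch == quote:
                quote = None
        elif in_action:
            if pair == ".)":
                current.append(pair)
                i += 1
                in_action = False
            else:
                current.append(ch)
        elif pair == "(.":
            current.append(pair)
            i += 1
            in_action = True
        elif ch in ("'", '"'):
            quote = ch
            current.append(ch)
        elif pair == "..":
            current.append(pair)
            i += 1
        elif ch == ".":
            text = "".join(current).strip()
            if text:
                declarations.append(text)
            current = []
        else:
            current.append(ch)
        i += 1
    leftover = "".join(current).strip()
    if leftover:
        declarations.append(leftover)  # unterminated trailing text
    return declarations


if __name__ == "__main__":
    sample = 'Expr = Term (. x = 1.5; .)\n  {"+" Term}.\nletter = \'a\'..\'z\'.\n'
    for decl in split_declarations(sample):
        print(repr(decl))
```

Each returned string can then be split on its first '=' to recover the name and the definition, as the class above already does per line.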

5. Java Class for .ATG File Handling

import java.io.*;
import java.util.*;
import java.util.regex.*;

public class ATGFile {
    private Map<String, Object> properties = new HashMap<>();

    public ATGFile(String filepath) throws IOException {
        properties.put("imports", new ArrayList<String>());
        properties.put("compilerName", "");
        properties.put("globalFieldsAndMethods", "");
        properties.put("ignoreCase", false);
        properties.put("characterSets", new HashMap<String, String>());
        properties.put("tokens", new HashMap<String, String>());
        properties.put("pragmas", new ArrayList<String>());
        properties.put("comments", new ArrayList<String>());
        properties.put("ignore", "");
        properties.put("productions", new HashMap<String, String>());
        properties.put("endName", "");
        if (filepath != null) {
            read(filepath);
        }
    }

    public void read(String filepath) throws IOException {
        StringBuilder content = new StringBuilder();
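        // Read explicitly as UTF-8 instead of relying on the platform default charset.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(filepath), "UTF-8"))) {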
            String line;
            while ((line = reader.readLine()) != null) {
                content.append(line).append("\n");
            }
        }
        String[] lines = content.toString().split("\n");
        String state = "imports";
        for (String line : lines) {
            line = line.trim();
            if (line.isEmpty()) continue;
            Pattern p = Pattern.compile("^COMPILER\\s*(.*)");
            Matcher m = p.matcher(line);
            if (state.equals("imports") && (line.startsWith("using ") || line.startsWith("import "))) {
                ((List<String>) properties.get("imports")).add(line);
            } else if (m.matches()) {
                properties.put("compilerName", m.group(1));
                state = "global";
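            // String.matches must cover the whole line, so allow arbitrary text after the keyword;
            // otherwise lines such as "COMMENTS FROM ..." would be swallowed as global code.
            } else if (state.equals("global") && !line.matches("(IGNORECASE|CHARACTERS|TOKENS|PRAGMAS|COMMENTS|IGNORE|PRODUCTIONS|END)\\b.*")) {
                properties.put("globalFieldsAndMethods", (String) properties.get("globalFieldsAndMethods") + line + "\n");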
            } else if (line.equals("IGNORECASE")) {
                properties.put("ignoreCase", true);
                state = "scanner";
            } else if (line.startsWith("CHARACTERS")) {
                state = "characters";
            } else if (state.equals("characters") && line.contains("=")) {
                String[] parts = line.split("=", 2);
                // Strip only the terminating period; dots inside the definition (e.g. 'a'..'z') must survive.
                ((Map<String, String>) properties.get("characterSets")).put(parts[0].trim(), parts[1].trim().replaceFirst("\\.$", ""));
            } else if (line.startsWith("TOKENS")) {
                state = "tokens";
            } else if (state.equals("tokens") && line.contains("=")) {
                String[] parts = line.split("=", 2);
                ((Map<String, String>) properties.get("tokens")).put(parts[0].trim(), parts[1].trim().replaceFirst("\\.$", ""));
            } else if (line.startsWith("PRAGMAS")) {
                state = "pragmas";
            } else if (state.equals("pragmas") && line.contains("=")) {
                ((List<String>) properties.get("pragmas")).add(line.replaceFirst("\\.$", ""));
            } else if (line.startsWith("COMMENTS FROM")) {
                ((List<String>) properties.get("comments")).add(line);
            } else if (line.startsWith("IGNORE")) {
                properties.put("ignore", line.replaceFirst("^IGNORE\\s*", ""));
            } else if (line.startsWith("PRODUCTIONS")) {
                state = "productions";
            } else if (state.equals("productions") && line.contains("=")) {
                String[] parts = line.split("=", 2);
                ((Map<String, String>) properties.get("productions")).put(parts[0].trim(), parts[1].trim().replaceFirst("\\.$", ""));
            } else if (line.startsWith("END")) {
                properties.put("endName", line.replaceFirst("^END\\s*", "").replaceFirst("\\.$", ""));
                state = "end";
            }
        }
    }

    public void printProperties() {
        System.out.println(properties);
    }

    public void write(String filepath) throws IOException {
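        // Write explicitly as UTF-8 instead of relying on the platform default charset.
        try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(filepath), "UTF-8"))) {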
            for (String imp : (List<String>) properties.get("imports")) {
                writer.write(imp + "\n");
            }
            writer.write("COMPILER " + properties.get("compilerName") + "\n");
            writer.write((String) properties.get("globalFieldsAndMethods"));
            if ((boolean) properties.get("ignoreCase")) {
                writer.write("IGNORECASE\n");
            }
            writer.write("CHARACTERS\n");
            for (Map.Entry<String, String> entry : ((Map<String, String>) properties.get("characterSets")).entrySet()) {
                writer.write("  " + entry.getKey() + " = " + entry.getValue() + ".\n");
            }
            writer.write("TOKENS\n");
            for (Map.Entry<String, String> entry : ((Map<String, String>) properties.get("tokens")).entrySet()) {
                writer.write("  " + entry.getKey() + " = " + entry.getValue() + ".\n");
            }
            if (!((List<String>) properties.get("pragmas")).isEmpty()) {
                writer.write("PRAGMAS\n");
                for (String pragma : (List<String>) properties.get("pragmas")) {
                    writer.write("  " + pragma + ".\n");
                }
            }
            for (String comment : (List<String>) properties.get("comments")) {
                writer.write(comment + "\n");
            }
            if (!((String) properties.get("ignore")).isEmpty()) {
                writer.write("IGNORE " + properties.get("ignore") + "\n");
            }
            writer.write("PRODUCTIONS\n");
            for (Map.Entry<String, String> entry : ((Map<String, String>) properties.get("productions")).entrySet()) {
                writer.write("  " + entry.getKey() + " = " + entry.getValue() + ".\n");
            }
            writer.write("END " + properties.get("endName") + ".\n");
        }
    }

    // Example usage:
    // public static void main(String[] args) throws IOException {
    //     ATGFile atg = new ATGFile("example.ATG");
    //     atg.printProperties();
    //     atg.write("output.ATG");
    // }
}

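This class uses the same line-based parsing approach, prints the properties using the map's default string form, and writes the file. It performs no validation (for example, it does not check that the END name matches the COMPILER name) and assumes well-formed input.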
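
The END-name rule from the property list (the identifier after END must match the one after COMPILER) is not checked by the parsing code in this document. A minimal sketch of such a check follows, written in Python for consistency with the first class. The helper name check_compiler_end_names is hypothetical, and the identifier pattern [A-Za-z_]\w* is an assumption that is somewhat more permissive than a letter-then-letters-or-digits identifier.

```python
import re


def check_compiler_end_names(text):
    """Return (compiler_name, end_name, names_match) for ATG source text.

    Either name is None when the corresponding keyword is not found.
    """
    compiler = re.search(r"^\s*COMPILER\s+([A-Za-z_]\w*)", text, re.MULTILINE)
    compiler_name = compiler.group(1) if compiler else None

    # Use the last "END name." in the file, since END closes the grammar.
    ends = re.findall(r"^\s*END\s+([A-Za-z_]\w*)\s*\.", text, re.MULTILINE)
    end_name = ends[-1] if ends else None

    names_match = compiler_name is not None and compiler_name == end_name
    return compiler_name, end_name, names_match


if __name__ == "__main__":
    source = 'COMPILER Calc\nPRODUCTIONS\n  Calc = "x".\nEND Calc.\n'
    print(check_compiler_end_names(source))
```

A reader class could call a check like this after parsing and report a mismatch instead of silently writing it back out.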

6. JavaScript Class for .ATG File Handling

const fs = require('fs');

class ATGFile {
  constructor(filepath = null) {
    this.properties = {
      imports: [],
      compilerName: '',
      globalFieldsAndMethods: '',
      ignoreCase: false,
      characterSets: {},
      tokens: {},
      pragmas: [],
      comments: [],
      ignore: '',
      productions: {},
      endName: ''
    };
    if (filepath) {
      this.read(filepath);
    }
  }

  read(filepath) {
    const content = fs.readFileSync(filepath, 'utf-8');
    const lines = content.split('\n');
    let state = 'imports';
    for (let line of lines) {
      line = line.trim();
      if (!line) continue;
      if (state === 'imports' && (line.startsWith('using ') || line.startsWith('import '))) {
        this.properties.imports.push(line);
      } else if (line.startsWith('COMPILER')) {
        this.properties.compilerName = line.replace(/^COMPILER\s*/, '');
        state = 'global';
      } else if (state === 'global' && !/^(IGNORECASE|CHARACTERS|TOKENS|PRAGMAS|COMMENTS|IGNORE|PRODUCTIONS|END)/.test(line)) {
        this.properties.globalFieldsAndMethods += line + '\n';
      } else if (line === 'IGNORECASE') {
        this.properties.ignoreCase = true;
        state = 'scanner';
      } else if (line.startsWith('CHARACTERS')) {
        state = 'characters';
      } else if (state === 'characters' && line.includes('=')) {
        // Split on the first '=' only, so definitions that contain '=' stay intact.
        const eq = line.indexOf('=');
        const name = line.slice(0, eq).trim();
        const defn = line.slice(eq + 1).trim();
        this.properties.characterSets[name] = defn.replace(/\.$/, '');
      } else if (line.startsWith('TOKENS')) {
        state = 'tokens';
      } else if (state === 'tokens' && line.includes('=')) {
        const eq = line.indexOf('=');
        const name = line.slice(0, eq).trim();
        const defn = line.slice(eq + 1).trim();
        this.properties.tokens[name] = defn.replace(/\.$/, '');
      } else if (line.startsWith('PRAGMAS')) {
        state = 'pragmas';
      } else if (state === 'pragmas' && line.includes('=')) {
        this.properties.pragmas.push(line.replace(/\.$/, ''));
      } else if (line.startsWith('COMMENTS FROM')) {
        this.properties.comments.push(line);
      } else if (line.startsWith('IGNORE')) {
        this.properties.ignore = line.replace(/^IGNORE\s*/, '');
      } else if (line.startsWith('PRODUCTIONS')) {
        state = 'productions';
      } else if (state === 'productions' && line.includes('=')) {
        const eq = line.indexOf('=');
        const name = line.slice(0, eq).trim();
        const defn = line.slice(eq + 1).trim();
        this.properties.productions[name] = defn.replace(/\.$/, '');
      } else if (line.startsWith('END')) {
        this.properties.endName = line.replace(/^END\s*/, '').replace(/\.$/, '');
        state = 'end';
      }
    }
  }

  printProperties() {
    console.log(JSON.stringify(this.properties, null, 4));
  }

  write(filepath) {
    let output = '';
    this.properties.imports.forEach(imp => output += imp + '\n');
    output += `COMPILER ${this.properties.compilerName}\n`;
    output += this.properties.globalFieldsAndMethods;
    if (this.properties.ignoreCase) output += 'IGNORECASE\n';
    output += 'CHARACTERS\n';
    for (let [name, defn] of Object.entries(this.properties.characterSets)) {
      output += `  ${name} = ${defn}.\n`;
    }
    output += 'TOKENS\n';
    for (let [name, defn] of Object.entries(this.properties.tokens)) {
      output += `  ${name} = ${defn}.\n`;
    }
    if (this.properties.pragmas.length > 0) {
      output += 'PRAGMAS\n';
      this.properties.pragmas.forEach(pragma => output += `  ${pragma}.\n`);
    }
    this.properties.comments.forEach(comment => output += comment + '\n');
    if (this.properties.ignore) output += `IGNORE ${this.properties.ignore}\n`;
    output += 'PRODUCTIONS\n';
    for (let [name, defn] of Object.entries(this.properties.productions)) {
      output += `  ${name} = ${defn}.\n`;
    }
    output += `END ${this.properties.endName}.\n`;
    fs.writeFileSync(filepath, output, 'utf-8');
  }
}

// Example usage:
// const atg = new ATGFile('example.ATG');
// atg.printProperties();
// atg.write('output.ATG');

This class is designed for Node.js, reads and parses the file, prints properties as JSON, and writes a new file.

7. C++ Class for .ATG File Handling

#include <iostream>
#include <fstream>
#include <iterator>
#include <sstream>
#include <string>
#include <vector>
#include <map>
#include <regex>

class ATGFile {
private:
    std::vector<std::string> imports;
    std::string compilerName;
    std::string globalFieldsAndMethods;
    bool ignoreCase;
    std::map<std::string, std::string> characterSets;
    std::map<std::string, std::string> tokens;
    std::vector<std::string> pragmas;
    std::vector<std::string> comments;
    std::string ignore;
    std::map<std::string, std::string> productions;
    std::string endName;

public:
    ATGFile(const std::string& filepath = "") {
        ignoreCase = false;
        if (!filepath.empty()) {
            read(filepath);
        }
    }

    void read(const std::string& filepath) {
        std::ifstream file(filepath);
        if (!file.is_open()) {
            std::cerr << "Failed to open file." << std::endl;
            return;
        }
        std::string content((std::istreambuf_iterator<char>(file)), std::istreambuf_iterator<char>());
        file.close();

        std::istringstream iss(content);
        std::string line;
        std::string state = "imports";
        while (std::getline(iss, line)) {
            line = std::regex_replace(line, std::regex("^\\s+|\\s+$"), "");
            if (line.empty()) continue;

            std::smatch match;
            if (state == "imports" && (std::regex_search(line, std::regex("^using ")) || std::regex_search(line, std::regex("^import ")))) {
                imports.push_back(line);
            } else if (std::regex_match(line, match, std::regex("^COMPILER\\s*(.*)"))) {
                compilerName = match[1];
                state = "global";
            } else if (state == "global" && !std::regex_search(line, std::regex("^(IGNORECASE|CHARACTERS|TOKENS|PRAGMAS|COMMENTS|IGNORE|PRODUCTIONS|END)"))) {
                globalFieldsAndMethods += line + "\n";
            } else if (line == "IGNORECASE") {
                ignoreCase = true;
                state = "scanner";
            } else if (std::regex_search(line, std::regex("^CHARACTERS"))) {
                state = "characters";
            } else if (state == "characters" && line.find('=') != std::string::npos) {
                size_t pos = line.find('=');
                std::string name = line.substr(0, pos);
                std::string defn = line.substr(pos + 1);
                name = std::regex_replace(name, std::regex("^\\s+|\\s+$"), "");
                defn = std::regex_replace(defn, std::regex("^\\s+|\\s+$|\\.$"), "");
                characterSets[name] = defn;
            } else if (std::regex_search(line, std::regex("^TOKENS"))) {
                state = "tokens";
            } else if (state == "tokens" && line.find('=') != std::string::npos) {
                size_t pos = line.find('=');
                std::string name = line.substr(0, pos);
                std::string defn = line.substr(pos + 1);
                name = std::regex_replace(name, std::regex("^\\s+|\\s+$"), "");
                defn = std::regex_replace(defn, std::regex("^\\s+|\\s+$|\\.$"), "");
                tokens[name] = defn;
            } else if (std::regex_search(line, std::regex("^PRAGMAS"))) {
                state = "pragmas";
            } else if (state == "pragmas" && line.find('=') != std::string::npos) {
                pragmas.push_back(std::regex_replace(line, std::regex("\\.$"), ""));
            } else if (std::regex_search(line, std::regex("^COMMENTS FROM"))) {
                comments.push_back(line);
            } else if (std::regex_match(line, match, std::regex("^IGNORE\\s*(.*)"))) {
                ignore = match[1];
            } else if (std::regex_search(line, std::regex("^PRODUCTIONS"))) {
                state = "productions";
            } else if (state == "productions" && line.find('=') != std::string::npos) {
                size_t pos = line.find('=');
                std::string name = line.substr(0, pos);
                std::string defn = line.substr(pos + 1);
                name = std::regex_replace(name, std::regex("^\\s+|\\s+$"), "");
                defn = std::regex_replace(defn, std::regex("^\\s+|\\s+$|\\.$"), "");
                productions[name] = defn;
            } else if (std::regex_match(line, match, std::regex("^END\\s*(.*)\\.$"))) {
                endName = match[1];
                state = "end";
            }
        }
    }

    void printProperties() {
        std::cout << "{\n";
        std::cout << "  \"imports\": [";
        for (size_t i = 0; i < imports.size(); ++i) {
            std::cout << "\"" << imports[i] << "\"" << (i < imports.size() - 1 ? ", " : "");
        }
        std::cout << "],\n";
        std::cout << "  \"compilerName\": \"" << compilerName << "\",\n";
        std::cout << "  \"globalFieldsAndMethods\": \"" << globalFieldsAndMethods << "\",\n";
        std::cout << "  \"ignoreCase\": " << (ignoreCase ? "true" : "false") << ",\n";
        std::cout << "  \"characterSets\": {";
        for (auto it = characterSets.begin(); it != characterSets.end(); ) {
            std::cout << "\"" << it->first << "\": \"" << it->second << "\"";
            if (++it != characterSets.end()) std::cout << ", ";
        }
        std::cout << "},\n";
        std::cout << "  \"tokens\": {";
        for (auto it = tokens.begin(); it != tokens.end(); ) {
            std::cout << "\"" << it->first << "\": \"" << it->second << "\"";
            if (++it != tokens.end()) std::cout << ", ";
        }
        std::cout << "},\n";
        std::cout << "  \"pragmas\": [";
        for (size_t i = 0; i < pragmas.size(); ++i) {
            std::cout << "\"" << pragmas[i] << "\"" << (i < pragmas.size() - 1 ? ", " : "");
        }
        std::cout << "],\n";
        std::cout << "  \"comments\": [";
        for (size_t i = 0; i < comments.size(); ++i) {
            std::cout << "\"" << comments[i] << "\"" << (i < comments.size() - 1 ? ", " : "");
        }
        std::cout << "],\n";
        std::cout << "  \"ignore\": \"" << ignore << "\",\n";
        std::cout << "  \"productions\": {";
        for (auto it = productions.begin(); it != productions.end(); ) {
            std::cout << "\"" << it->first << "\": \"" << it->second << "\"";
            if (++it != productions.end()) std::cout << ", ";
        }
        std::cout << "},\n";
        std::cout << "  \"endName\": \"" << endName << "\"\n";
        std::cout << "}\n";
    }

    void write(const std::string& filepath) {
        std::ofstream file(filepath);
        if (!file.is_open()) {
            std::cerr << "Failed to open file for writing." << std::endl;
            return;
        }
        for (const auto& imp : imports) {
            file << imp << "\n";
        }
        file << "COMPILER " << compilerName << "\n";
        file << globalFieldsAndMethods;
        if (ignoreCase) file << "IGNORECASE\n";
        file << "CHARACTERS\n";
        for (const auto& entry : characterSets) {
            file << "  " << entry.first << " = " << entry.second << ".\n";
        }
        file << "TOKENS\n";
        for (const auto& entry : tokens) {
            file << "  " << entry.first << " = " << entry.second << ".\n";
        }
        if (!pragmas.empty()) {
            file << "PRAGMAS\n";
            for (const auto& pragma : pragmas) {
                file << "  " << pragma << ".\n";
            }
        }
        for (const auto& comment : comments) {
            file << comment << "\n";
        }
        if (!ignore.empty()) file << "IGNORE " << ignore << "\n";
        file << "PRODUCTIONS\n";
        for (const auto& entry : productions) {
            file << "  " << entry.first << " = " << entry.second << ".\n";
        }
        file << "END " << endName << ".\n";
        file.close();
    }
};

// Example usage:
// int main() {
//     ATGFile atg("example.ATG");
//     atg.printProperties();
//     atg.write("output.ATG");
//     return 0;
// }

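This class uses the C++ standard library for file I/O and regex-based parsing, prints the properties to the console in a JSON-like format, and writes a new file. The printed output is not strictly valid JSON, because quotes and backslashes inside values (common in character-set and token definitions) are not escaped. Like the other implementations, it is line-based and may need adjustments for multi-line declarations and complex semantic actions.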