Task 045: .ATG File Format
The .ATG (attributed grammar) file format is the input format of Coco/R, a compiler generator that processes these files to produce scanners and recursive-descent parsers for programming languages. It is a text-based format whose syntax is derived from EBNF, with semantic actions embedded in a target programming language (e.g., C#, Java, or C++). The format is Unicode-compatible (UTF-8) and has no binary headers or fixed structures; instead, it relies on keyword-delimited sections for the lexical and syntactic definitions.
1. List of Properties Intrinsic to the .ATG File Format
The properties represent the core structural components of the format, which define the grammar's lexical scanner and parser specifications. These are extracted based on the official Coco/R documentation and include:
- Imports: Optional list of namespace or package import statements (e.g., using System;).
- Compiler Name: The identifier following the COMPILER keyword, which names the grammar and serves as the start symbol.
- Global Fields and Methods: Optional declarations of fields and methods in the target language, placed after the COMPILER declaration but before the scanner specification.
- IgnoreCase: A boolean indicating whether scanning is case-insensitive (present if the IGNORECASE keyword appears).
- Character Sets: A collection of named character set declarations under CHARACTERS, each consisting of an identifier and a set expression (e.g., digit = "0123456789".).
- Tokens: A collection of token declarations under TOKENS, each defining terminal symbols via EBNF expressions (e.g., ident = letter {letter | digit}.).
- Pragmas: Optional collection of pragma declarations under PRAGMAS, which are special tokens processed semantically without entering the parser.
- Comments: A list of comment delimiters defined with COMMENTS FROM ... TO ... (optionally NESTED).
- Ignore (White Space): The set of characters to ignore during scanning, defined with IGNORE (e.g., spaces, tabs, newlines).
- Productions: A collection of syntactic productions under PRODUCTIONS, each consisting of a nonterminal identifier, optional attributes, and an EBNF expression with semantic actions (e.g., Expr = Term {("+" | "-") Term}.).
- End Name: The identifier following the END keyword, which must match the Compiler Name for validation.
These properties are text-based and delimited by keywords, with no fixed byte offsets or magic numbers, as the format is human-readable and parsed sequentially.
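For orientation, a minimal grammar showing how these keyword-delimited sections fit together might look like the following sketch; the grammar name, character sets, tokens, and productions are invented for illustration, and the semantic action (shown in (. ... .) brackets) is optional:

COMPILER Sample

CHARACTERS
  letter = 'A' .. 'Z' + 'a' .. 'z'.
  digit  = "0123456789".
  cr     = '\r'.
  lf     = '\n'.
  tab    = '\t'.

TOKENS
  ident  = letter {letter | digit}.
  number = digit {digit}.

COMMENTS FROM "/*" TO "*/" NESTED
COMMENTS FROM "//" TO lf

IGNORE cr + lf + tab

PRODUCTIONS
  Sample = Expr {";" Expr}.
  Expr   = Term { ("+" | "-") Term }   (. /* semantic action in the target language */ .).
  Term   = ident | number.

END Sample.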
2. Two Direct Download Links for .ATG Files
- https://ssw.jku.at/Coco/CS/CSharp.ATG (Sample grammar for C#)
- https://ssw.jku.at/Coco/Java/Java.ATG (Sample grammar for Java)
3. Ghost Blog Embedded HTML/JavaScript for Drag-and-Drop .ATG File Dump
The following is an HTML snippet with embedded JavaScript that can be inserted into a Ghost blog post. It creates a drag-and-drop area where a user can upload a .ATG file. The script reads the file as text, parses it using a simple state machine and regular expressions to extract the properties listed above, and displays them on the screen in a structured format.
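A minimal sketch of such a snippet is given below (e.g., pasted into a Ghost HTML card); the element IDs, inline styling, and line-oriented parsing rules are illustrative assumptions that mirror the JavaScript class in section 6, not a canonical implementation.

<div id="atg-drop" style="border: 2px dashed #888; padding: 2em; text-align: center;">
  Drop a .ATG file here
</div>
<pre id="atg-output" style="text-align: left; white-space: pre-wrap;"></pre>
<script>
(function () {
  // Extract the properties listed in section 1 from the text of a .ATG file.
  function parseATG(text) {
    var props = { imports: [], compilerName: '', globalFieldsAndMethods: '',
                  ignoreCase: false, characterSets: {}, tokens: {}, pragmas: [],
                  comments: [], ignore: '', productions: {}, endName: '' };
    var state = 'imports';
    text.split(/\r?\n/).forEach(function (raw) {
      var line = raw.trim();
      if (!line) return;
      if (state === 'imports' && /^(using|import)\s/.test(line)) {
        props.imports.push(line);
      } else if (/^COMPILER\b/.test(line)) {
        props.compilerName = line.replace(/^COMPILER\s*/, '');
        state = 'global';
      } else if (state === 'global' && !/^(IGNORECASE|CHARACTERS|TOKENS|PRAGMAS|COMMENTS|IGNORE|PRODUCTIONS|END)\b/.test(line)) {
        props.globalFieldsAndMethods += line + '\n';
      } else if (line === 'IGNORECASE') {
        props.ignoreCase = true;
        state = 'scanner';
      } else if (/^CHARACTERS\b/.test(line)) {
        state = 'characters';
      } else if (/^TOKENS\b/.test(line)) {
        state = 'tokens';
      } else if (/^PRAGMAS\b/.test(line)) {
        state = 'pragmas';
      } else if (/^COMMENTS\s+FROM/.test(line)) {
        props.comments.push(line);
      } else if (/^IGNORE\b/.test(line)) {
        props.ignore = line.replace(/^IGNORE\s*/, '');
      } else if (/^PRODUCTIONS\b/.test(line)) {
        state = 'productions';
      } else if (/^END\b/.test(line)) {
        props.endName = line.replace(/^END\s*/, '').replace(/\.$/, '');
        state = 'end';
      } else if (line.indexOf('=') !== -1) {
        // Section bodies: "name = definition." lines, keyed by the current section.
        var eq = line.indexOf('=');
        var name = line.slice(0, eq).trim();
        var defn = line.slice(eq + 1).trim().replace(/\.$/, '');
        if (state === 'characters') props.characterSets[name] = defn;
        else if (state === 'tokens') props.tokens[name] = defn;
        else if (state === 'pragmas') props.pragmas.push(line.replace(/\.$/, ''));
        else if (state === 'productions') props.productions[name] = defn;
      }
    });
    return props;
  }

  var drop = document.getElementById('atg-drop');
  var output = document.getElementById('atg-output');
  drop.addEventListener('dragover', function (e) { e.preventDefault(); });
  drop.addEventListener('drop', function (e) {
    e.preventDefault();
    var file = e.dataTransfer.files[0];
    if (!file) return;
    var reader = new FileReader();
    reader.onload = function () {
      output.textContent = JSON.stringify(parseATG(reader.result), null, 4);
    };
    reader.readAsText(file); // .ATG files are plain UTF-8 text
  });
})();
</script>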
This script uses a basic parser that assumes standard formatting and may not handle complex nested expressions or errors robustly, but it extracts and displays the properties as JSON.
4. Python Class for .ATG File Handling
import re
class ATGFile:
def __init__(self, filepath=None):
self.properties = {
'imports': [],
'compilerName': '',
'globalFieldsAndMethods': '',
'ignoreCase': False,
'characterSets': {},
'tokens': {},
'pragmas': [],
'comments': [],
'ignore': '',
'productions': {},
'endName': ''
}
if filepath:
self.read(filepath)
def read(self, filepath):
with open(filepath, 'r', encoding='utf-8') as f:
content = f.read()
lines = content.splitlines()
state = 'imports'
for line in lines:
line = line.strip()
if not line:
continue
if state == 'imports' and re.match(r'^(using|import)\s', line):
self.properties['imports'].append(line)
elif re.match(r'^COMPILER', line):
self.properties['compilerName'] = re.sub(r'^COMPILER\s*', '', line)
state = 'global'
elif state == 'global' and not re.match(r'^(IGNORECASE|CHARACTERS|TOKENS|PRAGMAS|COMMENTS|IGNORE|PRODUCTIONS|END)', line):
self.properties['globalFieldsAndMethods'] += line + '\n'
elif line == 'IGNORECASE':
self.properties['ignoreCase'] = True
state = 'scanner'
elif re.match(r'^CHARACTERS', line):
state = 'characters'
elif state == 'characters' and '=' in line:
name, defn = [s.strip() for s in line.split('=', 1)]
self.properties['characterSets'][name] = defn.rstrip('.')
elif re.match(r'^TOKENS', line):
state = 'tokens'
elif state == 'tokens' and '=' in line:
name, defn = [s.strip() for s in line.split('=', 1)]
self.properties['tokens'][name] = defn.rstrip('.')
elif re.match(r'^PRAGMAS', line):
state = 'pragmas'
elif state == 'pragmas' and '=' in line:
self.properties['pragmas'].append(line.rstrip('.'))
elif re.match(r'^COMMENTS FROM', line):
self.properties['comments'].append(line)
elif re.match(r'^IGNORE', line):
self.properties['ignore'] = re.sub(r'^IGNORE\s*', '', line)
elif re.match(r'^PRODUCTIONS', line):
state = 'productions'
elif state == 'productions' and '=' in line:
name, defn = [s.strip() for s in line.split('=', 1)]
self.properties['productions'][name] = defn.rstrip('.')
elif re.match(r'^END', line):
self.properties['endName'] = re.sub(r'^END\s*', '', line).rstrip('.')
state = 'end'
def print_properties(self):
import json
print(json.dumps(self.properties, indent=4))
def write(self, filepath):
with open(filepath, 'w', encoding='utf-8') as f:
for imp in self.properties['imports']:
f.write(imp + '\n')
f.write(f"COMPILER {self.properties['compilerName']}\n")
f.write(self.properties['globalFieldsAndMethods'])
if self.properties['ignoreCase']:
f.write('IGNORECASE\n')
f.write('CHARACTERS\n')
for name, defn in self.properties['characterSets'].items():
f.write(f" {name} = {defn}.\n")
f.write('TOKENS\n')
for name, defn in self.properties['tokens'].items():
f.write(f" {name} = {defn}.\n")
if self.properties['pragmas']:
f.write('PRAGMAS\n')
for pragma in self.properties['pragmas']:
f.write(f" {pragma}.\n")
for comment in self.properties['comments']:
f.write(comment + '\n')
if self.properties['ignore']:
f.write(f"IGNORE {self.properties['ignore']}\n")
f.write('PRODUCTIONS\n')
for name, defn in self.properties['productions'].items():
f.write(f" {name} = {defn}.\n")
f.write(f"END {self.properties['endName']}.\n")
# Example usage:
# atg = ATGFile('example.ATG')
# atg.print_properties()
# atg.write('output.ATG')
This class reads and parses the file using regular expressions and a state machine, prints the properties as JSON, and writes a new .ATG file based on the parsed properties. It handles basic cases but may require enhancements for deeply nested expressions or semantic actions.
5. Java Class for .ATG File Handling
import java.io.*;
import java.util.*;
import java.util.regex.*;
public class ATGFile {
private Map<String, Object> properties = new HashMap<>();
public ATGFile(String filepath) throws IOException {
properties.put("imports", new ArrayList<String>());
properties.put("compilerName", "");
properties.put("globalFieldsAndMethods", "");
properties.put("ignoreCase", false);
properties.put("characterSets", new HashMap<String, String>());
properties.put("tokens", new HashMap<String, String>());
properties.put("pragmas", new ArrayList<String>());
properties.put("comments", new ArrayList<String>());
properties.put("ignore", "");
properties.put("productions", new HashMap<String, String>());
properties.put("endName", "");
if (filepath != null) {
read(filepath);
}
}
public void read(String filepath) throws IOException {
StringBuilder content = new StringBuilder();
try (BufferedReader reader = new BufferedReader(new FileReader(filepath))) {
String line;
while ((line = reader.readLine()) != null) {
content.append(line).append("\n");
}
}
String[] lines = content.toString().split("\n");
String state = "imports";
for (String line : lines) {
line = line.trim();
if (line.isEmpty()) continue;
Pattern p = Pattern.compile("^COMPILER\\s*(.*)");
Matcher m = p.matcher(line);
if (state.equals("imports") && (line.startsWith("using ") || line.startsWith("import "))) {
((List<String>) properties.get("imports")).add(line);
} else if (m.matches()) {
properties.put("compilerName", m.group(1));
state = "global";
} else if (state.equals("global") && !Pattern.matches("^(IGNORECASE|CHARACTERS|TOKENS|PRAGMAS|COMMENTS|IGNORE|PRODUCTIONS|END).*", line)) {
properties.put("globalFieldsAndMethods", (String) properties.get("globalFieldsAndMethods") + line + "\n");
} else if (line.equals("IGNORECASE")) {
properties.put("ignoreCase", true);
state = "scanner";
} else if (line.startsWith("CHARACTERS")) {
state = "characters";
} else if (state.equals("characters") && line.contains("=")) {
String[] parts = line.split("=", 2);
((Map<String, String>) properties.get("characterSets")).put(parts[0].trim(), parts[1].trim().replaceAll("\\.$", ""));
} else if (line.startsWith("TOKENS")) {
state = "tokens";
} else if (state.equals("tokens") && line.contains("=")) {
String[] parts = line.split("=", 2);
((Map<String, String>) properties.get("tokens")).put(parts[0].trim(), parts[1].trim().replaceAll("\\.$", ""));
} else if (line.startsWith("PRAGMAS")) {
state = "pragmas";
} else if (state.equals("pragmas") && line.contains("=")) {
((List<String>) properties.get("pragmas")).add(line.replaceAll("\\.$", ""));
} else if (line.startsWith("COMMENTS FROM")) {
((List<String>) properties.get("comments")).add(line);
} else if (line.startsWith("IGNORE")) {
properties.put("ignore", line.replaceFirst("IGNORE\\s*", ""));
} else if (line.startsWith("PRODUCTIONS")) {
state = "productions";
} else if (state.equals("productions") && line.contains("=")) {
String[] parts = line.split("=", 2);
((Map<String, String>) properties.get("productions")).put(parts[0].trim(), parts[1].trim().replaceAll("\\.$", ""));
} else if (line.startsWith("END")) {
properties.put("endName", line.replaceFirst("END\\s*", "").replaceAll("\\.$", ""));
state = "end";
}
}
}
public void printProperties() {
System.out.println(properties);
}
public void write(String filepath) throws IOException {
try (BufferedWriter writer = new BufferedWriter(new FileWriter(filepath))) {
for (String imp : (List<String>) properties.get("imports")) {
writer.write(imp + "\n");
}
writer.write("COMPILER " + properties.get("compilerName") + "\n");
writer.write((String) properties.get("globalFieldsAndMethods"));
if ((boolean) properties.get("ignoreCase")) {
writer.write("IGNORECASE\n");
}
writer.write("CHARACTERS\n");
for (Map.Entry<String, String> entry : ((Map<String, String>) properties.get("characterSets")).entrySet()) {
writer.write(" " + entry.getKey() + " = " + entry.getValue() + ".\n");
}
writer.write("TOKENS\n");
for (Map.Entry<String, String> entry : ((Map<String, String>) properties.get("tokens")).entrySet()) {
writer.write(" " + entry.getKey() + " = " + entry.getValue() + ".\n");
}
if (!((List<String>) properties.get("pragmas")).isEmpty()) {
writer.write("PRAGMAS\n");
for (String pragma : (List<String>) properties.get("pragmas")) {
writer.write(" " + pragma + ".\n");
}
}
for (String comment : (List<String>) properties.get("comments")) {
writer.write(comment + "\n");
}
if (!((String) properties.get("ignore")).isEmpty()) {
writer.write("IGNORE " + properties.get("ignore") + "\n");
}
writer.write("PRODUCTIONS\n");
for (Map.Entry<String, String> entry : ((Map<String, String>) properties.get("productions")).entrySet()) {
writer.write(" " + entry.getKey() + " = " + entry.getValue() + ".\n");
}
writer.write("END " + properties.get("endName") + ".\n");
}
}
// Example usage:
// public static void main(String[] args) throws IOException {
// ATGFile atg = new ATGFile("example.ATG");
// atg.printProperties();
// atg.write("output.ATG");
// }
}
This class uses a similar parsing approach, prints the properties as a map string, and writes the file. It handles basic cases but assumes well-formed input.
6. JavaScript Class for .ATG File Handling
const fs = require('fs');
class ATGFile {
constructor(filepath = null) {
this.properties = {
imports: [],
compilerName: '',
globalFieldsAndMethods: '',
ignoreCase: false,
characterSets: {},
tokens: {},
pragmas: [],
comments: [],
ignore: '',
productions: {},
endName: ''
};
if (filepath) {
this.read(filepath);
}
}
read(filepath) {
const content = fs.readFileSync(filepath, 'utf-8');
const lines = content.split('\n');
let state = 'imports';
for (let line of lines) {
line = line.trim();
if (!line) continue;
if (state === 'imports' && (line.startsWith('using ') || line.startsWith('import '))) {
this.properties.imports.push(line);
} else if (line.startsWith('COMPILER')) {
this.properties.compilerName = line.replace(/^COMPILER\s*/, '');
state = 'global';
} else if (state === 'global' && !/^(IGNORECASE|CHARACTERS|TOKENS|PRAGMAS|COMMENTS|IGNORE|PRODUCTIONS|END)/.test(line)) {
this.properties.globalFieldsAndMethods += line + '\n';
} else if (line === 'IGNORECASE') {
this.properties.ignoreCase = true;
state = 'scanner';
} else if (line.startsWith('CHARACTERS')) {
state = 'characters';
} else if (state === 'characters' && line.includes('=')) {
const eq = line.indexOf('=');
const [name, defn] = [line.slice(0, eq), line.slice(eq + 1)].map(s => s.trim());
this.properties.characterSets[name] = defn.replace(/\.$/, '');
} else if (line.startsWith('TOKENS')) {
state = 'tokens';
} else if (state === 'tokens' && line.includes('=')) {
const eq = line.indexOf('=');
const [name, defn] = [line.slice(0, eq), line.slice(eq + 1)].map(s => s.trim());
this.properties.tokens[name] = defn.replace(/\.$/, '');
} else if (line.startsWith('PRAGMAS')) {
state = 'pragmas';
} else if (state === 'pragmas' && line.includes('=')) {
this.properties.pragmas.push(line.replace(/\.$/, ''));
} else if (line.startsWith('COMMENTS FROM')) {
this.properties.comments.push(line);
} else if (line.startsWith('IGNORE')) {
this.properties.ignore = line.replace(/^IGNORE\s*/, '');
} else if (line.startsWith('PRODUCTIONS')) {
state = 'productions';
} else if (state === 'productions' && line.includes('=')) {
const eq = line.indexOf('=');
const [name, defn] = [line.slice(0, eq), line.slice(eq + 1)].map(s => s.trim());
this.properties.productions[name] = defn.replace(/\.$/, '');
} else if (line.startsWith('END')) {
this.properties.endName = line.replace(/^END\s*/, '').replace(/\.$/, '');
state = 'end';
}
}
}
printProperties() {
console.log(JSON.stringify(this.properties, null, 4));
}
write(filepath) {
let output = '';
this.properties.imports.forEach(imp => output += imp + '\n');
output += `COMPILER ${this.properties.compilerName}\n`;
output += this.properties.globalFieldsAndMethods;
if (this.properties.ignoreCase) output += 'IGNORECASE\n';
output += 'CHARACTERS\n';
for (let [name, defn] of Object.entries(this.properties.characterSets)) {
output += ` ${name} = ${defn}.\n`;
}
output += 'TOKENS\n';
for (let [name, defn] of Object.entries(this.properties.tokens)) {
output += ` ${name} = ${defn}.\n`;
}
if (this.properties.pragmas.length > 0) {
output += 'PRAGMAS\n';
this.properties.pragmas.forEach(pragma => output += ` ${pragma}.\n`);
}
this.properties.comments.forEach(comment => output += comment + '\n');
if (this.properties.ignore) output += `IGNORE ${this.properties.ignore}\n`;
output += 'PRODUCTIONS\n';
for (let [name, defn] of Object.entries(this.properties.productions)) {
output += ` ${name} = ${defn}.\n`;
}
output += `END ${this.properties.endName}.\n`;
fs.writeFileSync(filepath, output, 'utf-8');
}
}
// Example usage:
// const atg = new ATGFile('example.ATG');
// atg.printProperties();
// atg.write('output.ATG');
This class is designed for Node.js, reads and parses the file, prints properties as JSON, and writes a new file.
7. C++ Class for .ATG File Handling
#include <iostream>
#include <fstream>
#include <sstream>
#include <string>
#include <vector>
#include <map>
#include <regex>
class ATGFile {
private:
std::vector<std::string> imports;
std::string compilerName;
std::string globalFieldsAndMethods;
bool ignoreCase;
std::map<std::string, std::string> characterSets;
std::map<std::string, std::string> tokens;
std::vector<std::string> pragmas;
std::vector<std::string> comments;
std::string ignore;
std::map<std::string, std::string> productions;
std::string endName;
public:
ATGFile(const std::string& filepath = "") {
ignoreCase = false;
if (!filepath.empty()) {
read(filepath);
}
}
void read(const std::string& filepath) {
std::ifstream file(filepath);
if (!file.is_open()) {
std::cerr << "Failed to open file." << std::endl;
return;
}
std::string content((std::istreambuf_iterator<char>(file)), std::istreambuf_iterator<char>());
file.close();
std::istringstream iss(content);
std::string line;
std::string state = "imports";
while (std::getline(iss, line)) {
line = std::regex_replace(line, std::regex("^\\s+|\\s+$"), "");
if (line.empty()) continue;
std::smatch match;
if (state == "imports" && (std::regex_search(line, std::regex("^using ")) || std::regex_search(line, std::regex("^import ")))) {
imports.push_back(line);
} else if (std::regex_match(line, match, std::regex("^COMPILER\\s*(.*)"))) {
compilerName = match[1];
state = "global";
} else if (state == "global" && !std::regex_search(line, std::regex("^(IGNORECASE|CHARACTERS|TOKENS|PRAGMAS|COMMENTS|IGNORE|PRODUCTIONS|END)"))) {
globalFieldsAndMethods += line + "\n";
} else if (line == "IGNORECASE") {
ignoreCase = true;
state = "scanner";
} else if (std::regex_search(line, std::regex("^CHARACTERS"))) {
state = "characters";
} else if (state == "characters" && line.find('=') != std::string::npos) {
size_t pos = line.find('=');
std::string name = line.substr(0, pos);
std::string defn = line.substr(pos + 1);
name = std::regex_replace(name, std::regex("^\\s+|\\s+$"), "");
defn = std::regex_replace(defn, std::regex("^\\s+|\\s+$|\\.$"), "");
characterSets[name] = defn;
} else if (std::regex_search(line, std::regex("^TOKENS"))) {
state = "tokens";
} else if (state == "tokens" && line.find('=') != std::string::npos) {
size_t pos = line.find('=');
std::string name = line.substr(0, pos);
std::string defn = line.substr(pos + 1);
name = std::regex_replace(name, std::regex("^\\s+|\\s+$"), "");
defn = std::regex_replace(defn, std::regex("^\\s+|\\s+$|\\.$"), "");
tokens[name] = defn;
} else if (std::regex_search(line, std::regex("^PRAGMAS"))) {
state = "pragmas";
} else if (state == "pragmas" && line.find('=') != std::string::npos) {
pragmas.push_back(std::regex_replace(line, std::regex("\\.$"), ""));
} else if (std::regex_search(line, std::regex("^COMMENTS FROM"))) {
comments.push_back(line);
} else if (std::regex_match(line, match, std::regex("^IGNORE\\s*(.*)"))) {
ignore = match[1];
} else if (std::regex_search(line, std::regex("^PRODUCTIONS"))) {
state = "productions";
} else if (state == "productions" && line.find('=') != std::string::npos) {
size_t pos = line.find('=');
std::string name = line.substr(0, pos);
std::string defn = line.substr(pos + 1);
name = std::regex_replace(name, std::regex("^\\s+|\\s+$"), "");
defn = std::regex_replace(defn, std::regex("^\\s+|\\s+$|\\.$"), "");
productions[name] = defn;
} else if (std::regex_match(line, match, std::regex("^END\\s*(.*)\\.$"))) {
endName = match[1];
state = "end";
}
}
}
void printProperties() {
std::cout << "{\n";
std::cout << " \"imports\": [";
for (size_t i = 0; i < imports.size(); ++i) {
std::cout << "\"" << imports[i] << "\"" << (i < imports.size() - 1 ? ", " : "");
}
std::cout << "],\n";
std::cout << " \"compilerName\": \"" << compilerName << "\",\n";
std::cout << " \"globalFieldsAndMethods\": \"" << globalFieldsAndMethods << "\",\n";
std::cout << " \"ignoreCase\": " << (ignoreCase ? "true" : "false") << ",\n";
std::cout << " \"characterSets\": {";
for (auto it = characterSets.begin(); it != characterSets.end(); ) {
std::cout << "\"" << it->first << "\": \"" << it->second << "\"";
if (++it != characterSets.end()) std::cout << ", ";
}
std::cout << "},\n";
std::cout << " \"tokens\": {";
for (auto it = tokens.begin(); it != tokens.end(); ) {
std::cout << "\"" << it->first << "\": \"" << it->second << "\"";
if (++it != tokens.end()) std::cout << ", ";
}
std::cout << "},\n";
std::cout << " \"pragmas\": [";
for (size_t i = 0; i < pragmas.size(); ++i) {
std::cout << "\"" << pragmas[i] << "\"" << (i < pragmas.size() - 1 ? ", " : "");
}
std::cout << "],\n";
std::cout << " \"comments\": [";
for (size_t i = 0; i < comments.size(); ++i) {
std::cout << "\"" << comments[i] << "\"" << (i < comments.size() - 1 ? ", " : "");
}
std::cout << "],\n";
std::cout << " \"ignore\": \"" << ignore << "\",\n";
std::cout << " \"productions\": {";
for (auto it = productions.begin(); it != productions.end(); ) {
std::cout << "\"" << it->first << "\": \"" << it->second << "\"";
if (++it != productions.end()) std::cout << ", ";
}
std::cout << "},\n";
std::cout << " \"endName\": \"" << endName << "\"\n";
std::cout << "}\n";
}
void write(const std::string& filepath) {
std::ofstream file(filepath);
if (!file.is_open()) {
std::cerr << "Failed to open file for writing." << std::endl;
return;
}
for (const auto& imp : imports) {
file << imp << "\n";
}
file << "COMPILER " << compilerName << "\n";
file << globalFieldsAndMethods;
if (ignoreCase) file << "IGNORECASE\n";
file << "CHARACTERS\n";
for (const auto& entry : characterSets) {
file << " " << entry.first << " = " << entry.second << ".\n";
}
file << "TOKENS\n";
for (const auto& entry : tokens) {
file << " " << entry.first << " = " << entry.second << ".\n";
}
if (!pragmas.empty()) {
file << "PRAGMAS\n";
for (const auto& pragma : pragmas) {
file << " " << pragma << ".\n";
}
}
for (const auto& comment : comments) {
file << comment << "\n";
}
if (!ignore.empty()) file << "IGNORE " << ignore << "\n";
file << "PRODUCTIONS\n";
for (const auto& entry : productions) {
file << " " << entry.first << " = " << entry.second << ".\n";
}
file << "END " << endName << ".\n";
file.close();
}
};
// Example usage:
// int main() {
// ATGFile atg("example.ATG");
// atg.printProperties();
// atg.write("output.ATG");
// return 0;
// }
This class uses C++ standard libraries for file I/O and regex-based parsing, prints properties in JSON-like format to console, and writes a new file. It provides basic functionality but may need adjustments for edge cases involving complex semantic actions.