Task 032: .ANN File Format
Step 1: .ANN File Format Specifications and Properties
The .ANN file extension is associated with multiple formats. Based on the provided context, the most common uses are:
Lingvo Dictionary Annotation File (developed by ABBYY):
- Purpose: Stores metadata for Lingvo-formatted dictionaries (.DSL files).
- Content: Contains information such as dictionary subject, author, and creation date.
- Structure: Text-based, typically containing key-value pairs or structured text describing the dictionary.
- Properties:
  - File extension: .ann
  - MIME type: Not standardized (often application/octet-stream).
  - Encoding: Typically text-based, likely UTF-8 or similar for compatibility with Lingvo software.
  - Associated file: Must be named after and stored in the same directory as the corresponding .DSL file (e.g., example.dsl and example.ann).
  - Optional: The file is not required for the dictionary to function but provides additional metadata.
  - Content fields: Subject, author, creation date, and potentially other dictionary metadata.
Windows Help Annotation File (used by Microsoft Windows Help):
- Purpose: Stores user annotations for Windows Help documents (.HLP files).
- Content: Includes annotation count and details about each annotation (e.g., text, position).
- Structure: Binary or structured text, specific to the Windows Help system.
- Properties:
  - File extension: .ann
  - MIME type: Not standardized.
  - Encoding: Likely binary or proprietary format tied to the Windows Help system.
  - Associated file: Linked to a specific .HLP file.
  - Content fields: Number of annotations, annotation text, and positional metadata.
BRAT Annotation File (used by the BRAT rapid annotation tool):
- Purpose: Stores annotations for text in a standoff format, used for natural language processing (NLP).
- Content: Annotations like entities, relations, events, and attributes, linked to a text file.
- Structure: Text-based, with a specific syntax (e.g., T1 Claim 78 140 text...).
- Properties:
  - File extension: .ann
  - MIME type: Not standardized (text-based).
  - Encoding: Typically UTF-8.
  - Associated file: Paired with a text file (e.g., example.txt and example.ann).
  - Content fields: Annotation ID, type (e.g., Claim, Premise), character offsets, and annotated text.
Other Uses (less common, mentioned in sources):
- ENVI Annotation Data: Used by ENVI software for GIS and image analysis, storing map annotations.
- Zemax Annotations Data: Used by Zemax for optical design annotations.
- Piklib Animation Data: Used for animation data, but specifics are unclear.
- King-FIX Data: Used for data storage, but details are limited.
Given the ambiguity of the .ANN format, I’ll assume the Lingvo Dictionary Annotation File is the primary focus, as it’s the most commonly referenced in the provided sources and has a clear text-based structure suitable for programming tasks. If you prefer a different .ANN format (e.g., BRAT or Windows Help), please clarify, and I can adapt the solution.
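Because the same extension covers several unrelated formats, a tool may need to guess which variant it has been given before parsing. The following is a minimal sketch of one such heuristic, not an official detection method. The function name guess_ann_flavor and both regular expressions are illustrative assumptions, and the key=value layout for Lingvo files is the same assumption made throughout this write-up.

```python
import re

# Hypothetical patterns: a BRAT line starts with an ID (e.g. T1, #1, *) and a tab;
# the assumed Lingvo layout is "key=value" on each line.
BRAT_LINE = re.compile(r"^(?:[TERAMN]\d+|#\d+|\*)\t")
KEY_VALUE_LINE = re.compile(r"^[^=\t]+=")


def guess_ann_flavor(data: bytes) -> str:
    """Return 'binary', 'brat', 'key-value', or 'unknown' for raw .ann bytes."""
    # A NUL byte or invalid UTF-8 suggests a binary variant (possibly Windows Help).
    if b"\x00" in data:
        return "binary"
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return "binary"

    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return "unknown"
    if all(BRAT_LINE.match(ln) for ln in lines):
        return "brat"
    if all(KEY_VALUE_LINE.match(ln) for ln in lines):
        return "key-value"
    return "unknown"
```

A caller could read the file in binary mode and dispatch on the returned label, treating 'unknown' as a signal to ask the user rather than to guess further.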
Intrinsic File System Properties for Lingvo .ANN Files:
- File Extension: .ann
- MIME Type: application/octet-stream (not officially standardized).
- Encoding: Text-based, typically UTF-8.
- File Association: Must be named after and stored in the same directory as the associated .DSL file.
- Content Structure: Key-value pairs or structured text containing metadata (e.g., subject, author, creation date).
- Optional Nature: The file is optional and not required for the associated .DSL file to function.
- File Location: Must reside in the same directory as the .DSL file for proper recognition by ABBYY Lingvo software.
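The naming and location rules above can be checked with a few lines of code. This is a minimal sketch, assuming only the pairing rule stated in the list (same directory, same base name, .dsl extension). The helper names companion_dsl_path and is_properly_paired are hypothetical.

```python
from pathlib import Path


def companion_dsl_path(ann_path: str) -> Path:
    """Return the .dsl path sharing the .ann file's directory and base name."""
    path = Path(ann_path)
    if path.suffix.lower() != ".ann":
        raise ValueError(f"expected a .ann file, got: {path.name}")
    # with_suffix replaces only the final extension, so directory names
    # that happen to contain ".ann" are left untouched.
    return path.with_suffix(".dsl")


def is_properly_paired(ann_path: str) -> bool:
    """True when the companion .dsl file exists next to the .ann file."""
    return companion_dsl_path(ann_path).is_file()
```

Replacing only the final suffix avoids the pitfall of a plain string replacement, which would also rewrite ".ann" if it appeared earlier in the path.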
Step 2: Python Class for .ANN (Lingvo) Files
import os
class AnnFileHandler:
    def __init__(self, file_path):
        self.file_path = file_path
        self.properties = {
            "extension": ".ann",
            "mime_type": "application/octet-stream",
            "encoding": "utf-8",
            "associated_file": os.path.splitext(file_path)[0] + ".dsl",
            "is_optional": True,
            "metadata": {}
        }
    def read(self):
        """Read and decode the .ANN file."""
        try:
            with open(self.file_path, 'r', encoding=self.properties["encoding"]) as f:
                lines = f.readlines()
            for line in lines:
                # Assuming simple key-value pairs separated by '='
                if '=' in line:
                    key, value = line.strip().split('=', 1)
                    self.properties["metadata"][key.strip()] = value.strip()
            # Verify associated .DSL file exists
            self.properties["has_associated_file"] = os.path.exists(self.properties["associated_file"])
        except FileNotFoundError:
            print(f"Error: File {self.file_path} not found.")
        except Exception as e:
            print(f"Error reading file: {e}")
    def write(self, metadata=None):
        """Write metadata to the .ANN file."""
        if metadata:
            self.properties["metadata"].update(metadata)
        try:
            with open(self.file_path, 'w', encoding=self.properties["encoding"]) as f:
                for key, value in self.properties["metadata"].items():
                    f.write(f"{key}={value}\n")
        except Exception as e:
            print(f"Error writing file: {e}")
    def print_properties(self):
        """Print all properties to console."""
        print("ANN File Properties:")
        for key, value in self.properties.items():
            if key == "metadata":
                print("  Metadata:")
                for k, v in value.items():
                    print(f"    {k}: {v}")
            else:
                print(f"  {key}: {value}")
# Example usage
if __name__ == "__main__":
    ann_file = AnnFileHandler("example.ann")
    # Example metadata for writing
    sample_metadata = {
        "subject": "English Dictionary",
        "author": "John Doe",
        "creation_date": "2025-09-08"
    }
    ann_file.write(sample_metadata)
    ann_file.read()
    ann_file.print_properties()
Step 3: Java Class for .ANN (Lingvo) Files
import java.io.*;
import java.util.HashMap;
import java.util.Map;
public class AnnFileHandler {
private String filePath;
private Map<String, Object> properties;
public AnnFileHandler(String filePath) {
this.filePath = filePath;
this.properties = new HashMap<>();
properties.put("extension", ".ann");
properties.put("mime_type", "application/octet-stream");
properties.put("encoding", "UTF-8");
// Replace only a trailing ".ann" so directory names containing ".ann" are untouched
properties.put("associated_file", filePath.replaceAll("(?i)\\.ann$", ".dsl"));
properties.put("is_optional", true);
properties.put("metadata", new HashMap<String, String>());
}
public void read() {
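// Decode explicitly as UTF-8 instead of relying on the platform default charset
try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(filePath), java.nio.charset.StandardCharsets.UTF_8))) {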
String line;
Map<String, String> metadata = (Map<String, String>) properties.get("metadata");
while ((line = reader.readLine()) != null) {
if (line.contains("=")) {
String[] parts = line.split("=", 2);
metadata.put(parts[0].trim(), parts[1].trim());
}
}
properties.put("has_associated_file", new File((String) properties.get("associated_file")).exists());
} catch (FileNotFoundException e) {
System.out.println("Error: File " + filePath + " not found.");
} catch (IOException e) {
System.out.println("Error reading file: " + e.getMessage());
}
}
public void write(Map<String, String> metadata) {
if (metadata != null) {
((Map<String, String>) properties.get("metadata")).putAll(metadata);
}
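// Encode explicitly as UTF-8 instead of relying on the platform default charset
try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(filePath), java.nio.charset.StandardCharsets.UTF_8))) {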
for (Map.Entry<String, String> entry : ((Map<String, String>) properties.get("metadata")).entrySet()) {
writer.write(entry.getKey() + "=" + entry.getValue());
writer.newLine();
}
} catch (IOException e) {
System.out.println("Error writing file: " + e.getMessage());
}
}
public void printProperties() {
System.out.println("ANN File Properties:");
for (Map.Entry<String, Object> entry : properties.entrySet()) {
if (entry.getKey().equals("metadata")) {
System.out.println(" Metadata:");
for (Map.Entry<String, String> meta : ((Map<String, String>) entry.getValue()).entrySet()) {
System.out.println(" " + meta.getKey() + ": " + meta.getValue());
}
} else {
System.out.println(" " + entry.getKey() + ": " + entry.getValue());
}
}
}
public static void main(String[] args) {
AnnFileHandler annFile = new AnnFileHandler("example.ann");
Map<String, String> sampleMetadata = new HashMap<>();
sampleMetadata.put("subject", "English Dictionary");
sampleMetadata.put("author", "John Doe");
sampleMetadata.put("creation_date", "2025-09-08");
annFile.write(sampleMetadata);
annFile.read();
annFile.printProperties();
}
}
Step 4: JavaScript Class for .ANN (Lingvo) Files
const fs = require('fs').promises;
const path = require('path');
class AnnFileHandler {
constructor(filePath) {
this.filePath = filePath;
this.properties = {
extension: '.ann',
mime_type: 'application/octet-stream',
encoding: 'utf-8',
associated_file: path.join(path.dirname(filePath), path.basename(filePath, '.ann') + '.dsl'),
is_optional: true,
metadata: {}
};
}
async read() {
try {
const data = await fs.readFile(this.filePath, this.properties.encoding);
const lines = data.split('\n');
for (const line of lines) {
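// Split on the first '=' only; String.prototype.split with a limit would drop the rest of the value
const sep = line.indexOf('=');
if (sep !== -1) {
const key = line.slice(0, sep);
const value = line.slice(sep + 1);
this.properties.metadata[key.trim()] = value.trim();
}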
}
this.properties.has_associated_file = await fs.access(this.properties.associated_file).then(() => true).catch(() => false);
} catch (error) {
if (error.code === 'ENOENT') {
console.log(`Error: File ${this.filePath} not found.`);
} else {
console.log(`Error reading file: ${error.message}`);
}
}
}
async write(metadata) {
if (metadata) {
Object.assign(this.properties.metadata, metadata);
}
try {
const content = Object.entries(this.properties.metadata)
.map(([key, value]) => `${key}=${value}`)
.join('\n');
await fs.writeFile(this.filePath, content, this.properties.encoding);
} catch (error) {
console.log(`Error writing file: ${error.message}`);
}
}
printProperties() {
console.log('ANN File Properties:');
for (const [key, value] of Object.entries(this.properties)) {
if (key === 'metadata') {
console.log(' Metadata:');
for (const [k, v] of Object.entries(value)) {
console.log(` ${k}: ${v}`);
}
} else {
console.log(` ${key}: ${value}`);
}
}
}
}
// Example usage
(async () => {
const annFile = new AnnFileHandler('example.ann');
const sampleMetadata = {
subject: 'English Dictionary',
author: 'John Doe',
creation_date: '2025-09-08'
};
await annFile.write(sampleMetadata);
await annFile.read();
annFile.printProperties();
})();
Step 5: C Class for .ANN (Lingvo) Files
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdbool.h>
#define MAX_LINE 256
#define MAX_KEY 128
#define MAX_VALUE 512
typedef struct {
char* key;
char* value;
} MetadataEntry;
typedef struct {
char* extension;
char* mime_type;
char* encoding;
char* associated_file;
bool is_optional;
MetadataEntry* metadata;
int metadata_count;
bool has_associated_file;
} AnnFile;
AnnFile* create_ann_file(const char* file_path) {
AnnFile* ann = (AnnFile*)malloc(sizeof(AnnFile));
ann->extension = strdup(".ann");
ann->mime_type = strdup("application/octet-stream");
ann->encoding = strdup("utf-8");
char* dsl_path = strdup(file_path);
dsl_path[strlen(dsl_path) - 3] = 'd';
dsl_path[strlen(dsl_path) - 2] = 's';
dsl_path[strlen(dsl_path) - 1] = 'l';
ann->associated_file = dsl_path;
ann->is_optional = true;
ann->metadata = NULL;
ann->metadata_count = 0;
FILE* f = fopen(ann->associated_file, "r");
ann->has_associated_file = (f != NULL);
if (f) fclose(f);
return ann;
}
void destroy_ann_file(AnnFile* ann) {
free(ann->extension);
free(ann->mime_type);
free(ann->encoding);
free(ann->associated_file);
for (int i = 0; i < ann->metadata_count; i++) {
free(ann->metadata[i].key);
free(ann->metadata[i].value);
}
free(ann->metadata);
free(ann);
}
void read_ann_file(AnnFile* ann, const char* file_path) {
FILE* file = fopen(file_path, "r");
if (!file) {
printf("Error: File %s not found.\n", file_path);
return;
}
char line[MAX_LINE];
while (fgets(line, MAX_LINE, file)) {
char* pos = strchr(line, '=');
if (pos) {
ann->metadata = (MetadataEntry*)realloc(ann->metadata, (ann->metadata_count + 1) * sizeof(MetadataEntry));
char* key = strndup(line, pos - line);
char* value = strdup(pos + 1);
value[strcspn(value, "\n")] = 0; // Remove newline
ann->metadata[ann->metadata_count].key = key;
ann->metadata[ann->metadata_count].value = value;
ann->metadata_count++;
}
}
fclose(file);
}
void write_ann_file(AnnFile* ann, const char* file_path) {
FILE* file = fopen(file_path, "w");
if (!file) {
printf("Error writing file: %s\n", file_path);
return;
}
for (int i = 0; i < ann->metadata_count; i++) {
fprintf(file, "%s=%s\n", ann->metadata[i].key, ann->metadata[i].value);
}
fclose(file);
}
void print_properties(AnnFile* ann) {
printf("ANN File Properties:\n");
printf(" extension: %s\n", ann->extension);
printf(" mime_type: %s\n", ann->mime_type);
printf(" encoding: %s\n", ann->encoding);
printf(" associated_file: %s\n", ann->associated_file);
printf(" is_optional: %s\n", ann->is_optional ? "true" : "false");
printf(" has_associated_file: %s\n", ann->has_associated_file ? "true" : "false");
printf(" Metadata:\n");
for (int i = 0; i < ann->metadata_count; i++) {
printf(" %s: %s\n", ann->metadata[i].key, ann->metadata[i].value);
}
}
int main() {
AnnFile* ann = create_ann_file("example.ann");
// Sample metadata
read_ann_file(ann, "example.ann"); // Read existing file if any
// Add new metadata
ann->metadata = (MetadataEntry*)realloc(ann->metadata, (ann->metadata_count + 3) * sizeof(MetadataEntry));
ann->metadata[ann->metadata_count++] = (MetadataEntry){strdup("subject"), strdup("English Dictionary")};
ann->metadata[ann->metadata_count++] = (MetadataEntry){strdup("author"), strdup("John Doe")};
ann->metadata[ann->metadata_count++] = (MetadataEntry){strdup("creation_date"), strdup("2025-09-08")};
write_ann_file(ann, "example.ann");
print_properties(ann);
destroy_ann_file(ann);
return 0;
}
Notes and Assumptions
- Format Choice: The code assumes the Lingvo .ANN format due to its prominence in the sources and its text-based structure, which is easier to handle programmatically. If you need code for another .ANN format (e.g., BRAT or Windows Help), please specify.
- Structure: The Lingvo .ANN file is assumed to use a simple key-value pair format (e.g., subject=English Dictionary). If the format is more complex, the parsing logic would need adjustment.
- Error Handling: Each class includes basic error handling for file-not-found and I/O errors.
- File System Properties: The properties listed are derived from the sources and include file system-related attributes (e.g., extension, associated file) and content-related attributes (e.g., metadata).
- Dependencies: The JavaScript code requires Node.js for file system access. The C code needs no third-party libraries, but it relies on strdup and strndup, which are POSIX functions that entered ISO C only with C23.
If you need modifications, a different .ANN format, or additional features (e.g., validation, specific metadata fields), please let me know.
File Format Specifications for .ANN
The .ANN file format refers to the standoff annotation format used by the BRAT (BRAT Rapid Annotation Tool), a widely used tool for text annotation in natural language processing and biomedical domains. It is a text-based format designed to store annotations separately from the source text (typically in a corresponding .txt file). The format supports various annotation types for entities, events, relations, attributes, normalizations, notes, and equivalences. It is not a binary format but a human-readable, line-based text format encoded in UTF-8.
List of all the properties of this file format intrinsic to its file system:
- Encoding: UTF-8 (allows for international characters in annotations).
- Structure: Plain text file with one annotation per line; lines are terminated by newline (\n); fields within a line are separated by tab (\t); empty lines are ignored.
- Character offsets: All text spans reference positions in the corresponding .txt file, starting from 0; start offset is inclusive, end offset is exclusive.
- Annotation ID system: Unique IDs consisting of an uppercase letter (indicating type) followed by a positive integer (e.g., T1, E2); equivalence uses '*' as a special ID; IDs must be unique within the file except for '*'.
- Text-bound annotation properties (type prefix 'T'): ID, label (type), list of offset pairs (each pair as start end, multiple pairs separated by ';' for discontinuous spans, e.g., 0 5;16 23), referenced text (spans joined by space if discontinuous).
- Event annotation properties (type prefix 'E'): ID, label (type), trigger ID (references a 'T' annotation), zero or more argument pairs (role:ID, where role is task-specific and ID references another annotation).
- Relation annotation properties (type prefix 'R'): ID, label (type), Arg1:ID (references an annotation), Arg2:ID (references another annotation); typically binary, but syntax allows extensions.
- Attribute/Modification annotation properties (type prefixes 'A' or 'M'): ID, label (type), target ID (references an annotation), optional value (string or boolean implied by presence).
- Normalization annotation properties (type prefix 'N'): ID, label (type), target ID (references an annotation), reference (in format DB:ID, where DB is a database name), followed by a tab-separated text field holding the human-readable name of the referenced entry.
- Note annotation properties (type prefix '#'): ID, label (type, typically 'AnnotatorNotes'), target ID (references an annotation), followed by a tab-separated comment text (free-form string).
- Equivalence annotation properties (special ID '*'): Type ('Equiv'), space-separated list of IDs (two or more, indicating equivalent annotations).
- General constraints: Annotations can reference each other via IDs; order of lines is not strictly enforced but typically follows logical dependency (e.g., text-bounds before events); no header or footer; file extension is .ann (case-insensitive in some systems).
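The offset and ID rules listed above can be illustrated with a short check against a source text. This is a minimal sketch under the properties as stated in this list. The names ID_PATTERN, parse_text_bound, and spans_match are hypothetical, and the sample sentence is made up for illustration.

```python
import re

# An ID is an uppercase letter plus a number, '#' plus a number for notes,
# or the special equivalence ID '*'.
ID_PATTERN = re.compile(r"^(?:[A-Z]\d+|#\d+|\*)$")


def parse_text_bound(line: str):
    """Parse one 'T' line into (id, label, [(start, end), ...], text)."""
    ann_id, type_and_offsets, text = line.rstrip("\n").split("\t", 2)
    label, offsets_str = type_and_offsets.split(" ", 1)
    spans = []
    # Discontinuous spans are separated by ';', each span being "start end".
    for pair in offsets_str.split(";"):
        start, end = pair.split()
        spans.append((int(start), int(end)))
    return ann_id, label, spans, text


def spans_match(document: str, spans, text: str) -> bool:
    """Check the referenced text against the document.

    Start offsets are inclusive and end offsets exclusive, which maps
    directly onto Python slicing; discontinuous spans are joined by a space.
    """
    return " ".join(document[s:e] for s, e in spans) == text


document = "North and South America are continents."
ann_id, label, spans, text = parse_text_bound("T1\tLocation 0 5;16 23\tNorth America")
print(ann_id, label, spans, spans_match(document, spans, text))
```

Running such a check over every text-bound line is a cheap way to detect an .ann file that has drifted out of sync with its .txt file.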
Python class:
import os
class AnnFile:
    def __init__(self):
        self.annotations = {}  # Key: ID, Value: dict of properties
    def load(self, filename):
        if not os.path.exists(filename):
            raise FileNotFoundError(f"File {filename} not found")
        equiv_count = 0
        with open(filename, 'r', encoding='utf-8') as f:
            for line in f:
                line = line.rstrip('\r\n')
                if not line.strip():
                    continue
                parts = line.split('\t')
                if len(parts) < 2:
                    continue  # Skip malformed lines
                id_ = parts[0]
                fields = parts[1].split()
                ann = {}
                if id_ == '*':
                    # Equivalence: "*\tEquiv T1 T2 ..." (label and IDs share one field)
                    ann['type'] = 'equivalence'
                    ann['label'] = fields[0]
                    ann['ids'] = fields[1:]
                    # '*' may repeat, so store each line under a synthetic key
                    equiv_count += 1
                    id_ = f"*{equiv_count}"
                elif id_.startswith('T'):
                    ann['type'] = 'text-bound'
                    ann['text'] = parts[2] if len(parts) > 2 else ''
                    label, offs_str = parts[1].split(' ', 1)
                    ann['label'] = label
                    ann['offsets'] = []
                    for o in offs_str.split(';'):
                        s, e = o.strip().split()
                        ann['offsets'].append((int(s), int(e)))
                elif id_.startswith('R'):
                    ann['type'] = 'relation'
                    ann['label'] = fields[0]
                    ann['args'] = {}
                    for arg in fields[1:]:
                        role, aid = arg.split(':', 1)
                        ann['args'][role] = aid
                elif id_.startswith('E'):
                    ann['type'] = 'event'
                    ann['label'], ann['trigger'] = fields[0].split(':', 1)
                    ann['args'] = {}
                    for arg in fields[1:]:
                        role, aid = arg.split(':', 1)
                        ann['args'][role] = aid
                elif id_.startswith('A') or id_.startswith('M'):
                    ann['type'] = 'attribute'
                    ann['label'] = fields[0]
                    ann['target'] = fields[1]
                    ann['value'] = fields[2] if len(fields) > 2 else None
                elif id_.startswith('N'):
                    # Normalization: "N1\tReference T1 DB:ID\ttext"
                    ann['type'] = 'normalization'
                    ann['label'], ann['target'], ann['ref'] = fields[:3]
                    ann['text'] = parts[2] if len(parts) > 2 else ''
                elif id_.startswith('#'):
                    # Note: "#1\tAnnotatorNotes T1\tcomment"
                    ann['type'] = 'note'
                    ann['label'], ann['target'] = fields[:2]
                    ann['comment'] = parts[2] if len(parts) > 2 else ''
                else:
                    continue  # Skip invalid
                self.annotations[id_] = ann
    def save(self, filename):
        with open(filename, 'w', encoding='utf-8') as f:
            for id_, ann in sorted(self.annotations.items(), key=lambda x: (x[0][0], int(x[0][1:] or 0))):
                if ann['type'] == 'equivalence':
                    # Equivalences are always written with the bare '*' ID
                    line = '*\t' + ' '.join([ann['label']] + ann['ids'])
                elif ann['type'] == 'text-bound':
                    offs_str = ';'.join(f"{s} {e}" for s, e in ann['offsets'])
                    line = f"{id_}\t{ann['label']} {offs_str}\t{ann['text']}"
                elif ann['type'] == 'relation':
                    args_str = ' '.join(f"{role}:{aid}" for role, aid in ann['args'].items())
                    line = f"{id_}\t{ann['label']} {args_str}"
                elif ann['type'] == 'event':
                    trig_str = f"{ann['label']}:{ann['trigger']}"
                    args_str = ' '.join(f"{role}:{aid}" for role, aid in ann['args'].items())
                    line = f"{id_}\t{trig_str} {args_str}" if args_str else f"{id_}\t{trig_str}"
                elif ann['type'] == 'attribute':
                    line = f"{id_}\t{ann['label']} {ann['target']}"
                    if ann['value'] is not None:
                        line += f" {ann['value']}"
                elif ann['type'] == 'normalization':
                    line = f"{id_}\t{ann['label']} {ann['target']} {ann['ref']}\t{ann['text']}"
                elif ann['type'] == 'note':
                    line = f"{id_}\t{ann['label']} {ann['target']}\t{ann['comment']}"
                f.write(line + '\n')
Java class:
import java.io.*;
import java.util.*;
public class AnnFile {
private Map<String, Map<String, Object>> annotations = new HashMap<>();
public void load(String filename) throws IOException {
File file = new File(filename);
if (!file.exists()) {
throw new FileNotFoundException("File " + filename + " not found");
}
try (BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file), "UTF-8"))) {
String line;
while ((line = reader.readLine()) != null) {
line = line.trim();
if (line.isEmpty()) {
continue;
}
String[] parts = line.split("\t");
String id = parts[0];
Map<String, Object> ann = new HashMap<>();
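if (id.equals("*")) {
// Equivalence: "*\tEquiv T1 T2 ..." (label and IDs share one field)
String[] equivParts = parts[1].split(" ");
ann.put("type", "equivalence");
ann.put("label", equivParts[0]);
ann.put("ids", Arrays.asList(equivParts).subList(1, equivParts.length));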
} else if (id.startsWith("T")) {
ann.put("type", "text-bound");
String typeOff = parts[1];
ann.put("text", parts[2]);
String[] typeOffParts = typeOff.split(" ", 2);
ann.put("label", typeOffParts[0]);
String offsStr = typeOffParts[1];
List<int[]> offsets = new ArrayList<>();
for (String o : offsStr.split(";")) {
String[] se = o.trim().split(" ");
offsets.add(new int[]{Integer.parseInt(se[0]), Integer.parseInt(se[1])});
}
ann.put("offsets", offsets);
} else if (id.startsWith("R")) {
ann.put("type", "relation");
String typeArgs = parts[1];
String[] typeArgsParts = typeArgs.split(" ");
ann.put("label", typeArgsParts[0]);
Map<String, String> args = new HashMap<>();
for (int i = 1; i < typeArgsParts.length; i++) {
String[] roleAid = typeArgsParts[i].split(":");
args.put(roleAid[0], roleAid[1]);
}
ann.put("args", args);
} else if (id.startsWith("E")) {
ann.put("type", "event");
String typeTrigArgs = parts[1];
String[] typeTrigArgsParts = typeTrigArgs.split(" ");
String[] typeTrig = typeTrigArgsParts[0].split(":");
ann.put("label", typeTrig[0]);
ann.put("trigger", typeTrig[1]);
Map<String, String> args = new HashMap<>();
for (int i = 1; i < typeTrigArgsParts.length; i++) {
String[] roleAid = typeTrigArgsParts[i].split(":");
args.put(roleAid[0], roleAid[1]);
}
ann.put("args", args);
} else if (id.startsWith("A") || id.startsWith("M")) {
ann.put("type", "attribute");
String typeTargVal = parts[1];
String[] typeTargValParts = typeTargVal.split(" ");
ann.put("label", typeTargValParts[0]);
ann.put("target", typeTargValParts[1]);
if (typeTargValParts.length > 2) {
ann.put("value", typeTargValParts[2]);
}
} else if (id.startsWith("N")) {
ann.put("type", "normalization");
// Normalization: "N1\tReference T1 DB:ID\ttext" (text is the third tab-separated field)
String[] normParts = parts[1].split(" ");
ann.put("label", normParts[0]);
ann.put("target", normParts[1]);
ann.put("ref", normParts[2]);
ann.put("text", parts.length > 2 ? parts[2] : "");
} else if (id.startsWith("#")) {
ann.put("type", "note");
// Note: "#1\tAnnotatorNotes T1\tcomment" (comment is the third tab-separated field)
String[] noteParts = parts[1].split(" ");
ann.put("label", noteParts[0]);
ann.put("target", noteParts[1]);
ann.put("comment", parts.length > 2 ? parts[2] : "");
} else {
continue; // Skip invalid
}
annotations.put(id, ann);
}
}
}
public void save(String filename) throws IOException {
try (BufferedWriter writer = new BufferedWriter(new OutputStreamWriter(new FileOutputStream(filename), "UTF-8"))) {
List<String> sortedKeys = new ArrayList<>(annotations.keySet());
sortedKeys.sort((a, b) -> {
if (a.equals("*")) return -1;
if (b.equals("*")) return 1;
char ta = a.charAt(0);
char tb = b.charAt(0);
if (ta != tb) return ta - tb;
return Integer.parseInt(a.substring(1)) - Integer.parseInt(b.substring(1));
});
for (String id : sortedKeys) {
Map<String, Object> ann = annotations.get(id);
StringBuilder line = new StringBuilder();
if ("equivalence".equals(ann.get("type"))) {
line.append("*\t").append(ann.get("label")).append(" ");
List<String> ids = (List<String>) ann.get("ids");
line.append(String.join(" ", ids));
} else if ("text-bound".equals(ann.get("type"))) {
line.append(id).append("\t").append(ann.get("label")).append(" ");
List<int[]> offsets = (List<int[]>) ann.get("offsets");
for (int i = 0; i < offsets.size(); i++) {
if (i > 0) line.append(";");
line.append(offsets.get(i)[0]).append(" ").append(offsets.get(i)[1]);
}
line.append("\t").append(ann.get("text"));
} else if ("relation".equals(ann.get("type"))) {
line.append(id).append("\t").append(ann.get("label")).append(" ");
Map<String, String> args = (Map<String, String>) ann.get("args");
for (Map.Entry<String, String> entry : args.entrySet()) {
line.append(entry.getKey()).append(":").append(entry.getValue()).append(" ");
}
line = new StringBuilder(line.toString().trim());
} else if ("event".equals(ann.get("type"))) {
line.append(id).append("\t").append(ann.get("label")).append(":").append(ann.get("trigger"));
Map<String, String> args = (Map<String, String>) ann.get("args");
for (Map.Entry<String, String> entry : args.entrySet()) {
line.append(" ").append(entry.getKey()).append(":").append(entry.getValue());
}
} else if ("attribute".equals(ann.get("type"))) {
line.append(id).append("\t").append(ann.get("label")).append(" ").append(ann.get("target"));
if (ann.get("value") != null) {
line.append(" ").append(ann.get("value"));
}
} else if ("normalization".equals(ann.get("type"))) {
// The text field is separated from the reference by a tab
line.append(id).append("\t").append(ann.get("label")).append(" ").append(ann.get("target"))
.append(" ").append(ann.get("ref")).append("\t").append(ann.get("text"));
} else if ("note".equals(ann.get("type"))) {
// The comment field is separated from the target by a tab
line.append(id).append("\t").append(ann.get("label")).append(" ").append(ann.get("target"))
.append("\t").append(ann.get("comment"));
}
writer.write(line.toString());
writer.newLine();
}
}
}
}
JavaScript class (assuming Node.js environment for file I/O):
const fs = require('fs');
class AnnFile {
constructor() {
this.annotations = {}; // Key: ID, Value: object of properties
}
load(filename) {
if (!fs.existsSync(filename)) {
throw new Error(`File ${filename} not found`);
}
const data = fs.readFileSync(filename, 'utf-8');
const lines = data.split('\n');
for (let line of lines) {
line = line.trim();
if (!line) continue;
const parts = line.split('\t');
const id = parts[0];
let ann = {};
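if (id === '*') {
// Equivalence: "*\tEquiv T1 T2 ..." (label and IDs share one field)
const equivParts = parts[1].split(' ');
ann.type = 'equivalence';
ann.label = equivParts[0];
ann.ids = equivParts.slice(1);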
} else if (id.startsWith('T')) {
ann.type = 'text-bound';
const typeOff = parts[1];
ann.text = parts[2];
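// Split at the first space only; split(' ', 2) would truncate the offset list
const firstSpace = typeOff.indexOf(' ');
ann.label = typeOff.slice(0, firstSpace);
const offsStr = typeOff.slice(firstSpace + 1);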
ann.offsets = [];
for (let o of offsStr.split(';')) {
const [s, e] = o.trim().split(' ').map(Number);
ann.offsets.push([s, e]);
}
} else if (id.startsWith('R')) {
ann.type = 'relation';
const typeArgs = parts[1];
const typeArgsParts = typeArgs.split(' ');
ann.label = typeArgsParts[0];
ann.args = {};
for (let i = 1; i < typeArgsParts.length; i++) {
const [role, aid] = typeArgsParts[i].split(':');
ann.args[role] = aid;
}
} else if (id.startsWith('E')) {
ann.type = 'event';
const typeTrigArgs = parts[1];
const typeTrigArgsParts = typeTrigArgs.split(' ');
const [label, trigger] = typeTrigArgsParts[0].split(':');
ann.label = label;
ann.trigger = trigger;
ann.args = {};
for (let i = 1; i < typeTrigArgsParts.length; i++) {
const [role, aid] = typeTrigArgsParts[i].split(':');
ann.args[role] = aid;
}
} else if (id.startsWith('A') || id.startsWith('M')) {
ann.type = 'attribute';
const typeTargVal = parts[1];
const typeTargValParts = typeTargVal.split(' ');
ann.label = typeTargValParts[0];
ann.target = typeTargValParts[1];
ann.value = typeTargValParts.length > 2 ? typeTargValParts[2] : null;
} else if (id.startsWith('N')) {
ann.type = 'normalization';
// Normalization: "N1\tReference T1 DB:ID\ttext" (text is the third tab-separated field)
const normParts = parts[1].split(' ');
ann.label = normParts[0];
ann.target = normParts[1];
ann.ref = normParts[2];
ann.text = parts.length > 2 ? parts[2] : '';
} else if (id.startsWith('#')) {
ann.type = 'note';
// Note: "#1\tAnnotatorNotes T1\tcomment" (comment is the third tab-separated field)
const noteParts = parts[1].split(' ');
ann.label = noteParts[0];
ann.target = noteParts[1];
ann.comment = parts.length > 2 ? parts[2] : '';
} else {
continue; // Skip invalid
}
this.annotations[id] = ann;
}
}
save(filename) {
let output = '';
const sortedKeys = Object.keys(this.annotations).sort((a, b) => {
if (a === '*') return -1;
if (b === '*') return 1;
const ta = a[0];
const tb = b[0];
if (ta !== tb) return ta.charCodeAt(0) - tb.charCodeAt(0);
return parseInt(a.slice(1)) - parseInt(b.slice(1));
});
for (let id of sortedKeys) {
const ann = this.annotations[id];
let line = '';
if (ann.type === 'equivalence') {
line = '*\t' + ann.label + ' ' + ann.ids.join(' ');
} else if (ann.type === 'text-bound') {
const offsStr = ann.offsets.map(([s, e]) => `${s} ${e}`).join(';');
line = `${id}\t${ann.label} ${offsStr}\t${ann.text}`;
} else if (ann.type === 'relation') {
const argsStr = Object.entries(ann.args).map(([role, aid]) => `${role}:${aid}`).join(' ');
line = `${id}\t${ann.label} ${argsStr}`;
} else if (ann.type === 'event') {
let trigStr = `${ann.label}:${ann.trigger}`;
const argsStr = Object.entries(ann.args).map(([role, aid]) => `${role}:${aid}`).join(' ');
line = `${id}\t${trigStr}` + (argsStr ? ` ${argsStr}` : '');
} else if (ann.type === 'attribute') {
line = `${id}\t${ann.label} ${ann.target}`;
if (ann.value !== null) line += ` ${ann.value}`;
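} else if (ann.type === 'normalization') {
// The text field is separated from the reference by a tab
line = `${id}\t${ann.label} ${ann.target} ${ann.ref}\t${ann.text}`;
} else if (ann.type === 'note') {
// The comment field is separated from the target by a tab
line = `${id}\t${ann.label} ${ann.target}\t${ann.comment}`;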
}
output += line + '\n';
}
fs.writeFileSync(filename, output, 'utf-8');
}
}
C class (implemented as a C++ class for object-oriented support, using the standard library for file I/O and parsing):
#include <iostream>
#include <fstream>
#include <sstream>
#include <map>
#include <vector>
#include <string>
#include <algorithm>
#include <utility>
#include <stdexcept> // std::runtime_error
struct Annotation {
std::string type;
std::string label;
std::string text;
std::vector<std::pair<int, int>> offsets;
std::string trigger;
std::map<std::string, std::string> args;
std::string target;
std::string value;
std::string ref;
std::string comment;
std::vector<std::string> ids;
};
class AnnFile {
private:
std::map<std::string, Annotation> annotations;
public:
void load(const std::string& filename) {
std::ifstream file(filename);
if (!file.is_open()) {
throw std::runtime_error("File " + filename + " not found");
}
std::string line;
while (std::getline(file, line)) {
if (line.empty()) continue;
std::stringstream ss(line);
std::string part;
std::vector<std::string> parts;
while (std::getline(ss, part, '\t')) {
parts.push_back(part);
}
if (parts.size() < 2) continue;
std::string id = parts[0];
Annotation ann;
if (id == "*") {
ann.type = "equivalence";
// Equivalence: "*\tEquiv T1 T2 ..." (label and IDs share one field)
std::stringstream ids_ss(parts[1]);
ids_ss >> ann.label;
std::string id_str;
while (ids_ss >> id_str) {
ann.ids.push_back(id_str);
}
} else if (id[0] == 'T') {
if (parts.size() < 3) continue; // Text-bound lines need ID, "type offsets", and text
ann.type = "text-bound";
std::stringstream type_off_ss(parts[1]);
type_off_ss >> ann.label;
std::string offs_str;
std::getline(type_off_ss, offs_str); // Rest of the field: " start end[;start end...]"
ann.text = parts[2];
std::stringstream offs_ss(offs_str);
std::string o;
while (std::getline(offs_ss, o, ';')) {
std::stringstream se_ss(o);
int s, e;
// operator>> skips leading whitespace; malformed fragments are ignored
if (se_ss >> s >> e) ann.offsets.emplace_back(s, e);
}
} else if (id[0] == 'R') {
ann.type = "relation";
std::stringstream type_args_ss(parts[1]);
type_args_ss >> ann.label;
std::string arg;
while (type_args_ss >> arg) {
size_t colon_pos = arg.find(':');
if (colon_pos != std::string::npos) {
std::string role = arg.substr(0, colon_pos);
std::string aid = arg.substr(colon_pos + 1);
ann.args[role] = aid;
}
}
} else if (id[0] == 'E') {
ann.type = "event";
std::stringstream type_trig_args_ss(parts[1]);
std::string type_trig;
type_trig_args_ss >> type_trig;
size_t colon_pos = type_trig.find(':');
if (colon_pos == std::string::npos) continue; // Events must be written as TYPE:TRIGGER
ann.label = type_trig.substr(0, colon_pos);
ann.trigger = type_trig.substr(colon_pos + 1);
std::string arg;
while (type_trig_args_ss >> arg) {
colon_pos = arg.find(':');
if (colon_pos != std::string::npos) {
std::string role = arg.substr(0, colon_pos);
std::string aid = arg.substr(colon_pos + 1);
ann.args[role] = aid;
}
}
} else if (id[0] == 'A' || id[0] == 'M') {
ann.type = "attribute";
std::stringstream type_targ_val_ss(parts[1]);
type_targ_val_ss >> ann.label >> ann.target;
std::string val;
if (type_targ_val_ss >> val) {
ann.value = val;
}
} else if (id[0] == 'N') {
ann.type = "normalization";
std::stringstream type_targ_ref_text_ss(parts[1]);
type_targ_ref_text_ss >> ann.label >> ann.target >> ann.ref;
if (parts.size() > 2) {
ann.text = parts[2]; // brat stores the reference text in a third, tab-separated field
} else {
// Fallback: accept space-separated text left in the second field
std::string text_part;
while (type_targ_ref_text_ss >> text_part) {
if (!ann.text.empty()) ann.text += " ";
ann.text += text_part;
}
}
} else if (id[0] == '#') {
ann.type = "note";
std::stringstream type_targ_comm_ss(parts[1]);
type_targ_comm_ss >> ann.label >> ann.target;
if (parts.size() > 2) {
ann.comment = parts[2]; // The note text is the third, tab-separated field
} else {
// Fallback: accept a space-separated comment left in the second field
std::string comm_part;
while (type_targ_comm_ss >> comm_part) {
if (!ann.comment.empty()) ann.comment += " ";
ann.comment += comm_part;
}
}
} else {
continue; // Skip invalid
}
annotations[id] = ann;
}
file.close();
}
void save(const std::string& filename) {
std::ofstream file(filename);
if (!file.is_open()) {
throw std::runtime_error("Could not open file " + filename + " for writing");
}
std::vector<std::string> sorted_keys;
for (const auto& pair : annotations) {
sorted_keys.push_back(pair.first);
}
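std::sort(sorted_keys.begin(), sorted_keys.end(), [](const std::string& a, const std::string& b) {
// "*" sorts first but is never less than itself (std::sort needs a strict weak ordering)
if (a == "*" || b == "*") return a == "*" && b != "*";
char ta = a[0];
char tb = b[0];
if (ta != tb) return ta < tb;
// Compare numeric suffixes so that T2 sorts before T10 (assumes IDs end in digits)
return std::stoi(a.substr(1)) < std::stoi(b.substr(1));
});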
for (const std::string& id : sorted_keys) {
const Annotation& ann = annotations[id];
std::string line;
if (ann.type == "equivalence") {
line = "*\t" + ann.label + " ";
for (const std::string& id_str : ann.ids) {
line += id_str + " ";
}
line.pop_back(); // Remove trailing space
} else if (ann.type == "text-bound") {
line = id + "\t" + ann.label + " ";
for (size_t i = 0; i < ann.offsets.size(); ++i) {
if (i > 0) line += ";";
line += std::to_string(ann.offsets[i].first) + " " + std::to_string(ann.offsets[i].second);
}
line += "\t" + ann.text;
} else if (ann.type == "relation") {
line = id + "\t" + ann.label;
for (const auto& arg : ann.args) {
line += " " + arg.first + ":" + arg.second;
}
} else if (ann.type == "event") {
line = id + "\t" + ann.label + ":" + ann.trigger;
for (const auto& arg : ann.args) {
line += " " + arg.first + ":" + arg.second;
}
} else if (ann.type == "attribute") {
line = id + "\t" + ann.label + " " + ann.target;
if (!ann.value.empty()) {
line += " " + ann.value;
}
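} else if (ann.type == "normalization") {
// The reference text goes in its own tab-separated field
line = id + "\t" + ann.label + " " + ann.target + " " + ann.ref + "\t" + ann.text;
} else if (ann.type == "note") {
// The note text goes in its own tab-separated field
line = id + "\t" + ann.label + " " + ann.target + "\t" + ann.comment;
}
file << line << '\n'; // '\n' avoids flushing the stream on every line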
}
file.close();
}
};
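The text-bound branch of load is the part most likely to trip on malformed input, so it may help to see it in isolation. The following is a minimal sketch, not part of the AnnFile class: the names split_tabs, parse_offsets, parse_text_bound and the TextBound struct are hypothetical, and the sample line in the usage note is made up. It assumes the layout used above, namely an ID, then a "type start end[;start end...]" field, then the covered text, all separated by tabs.

```cpp
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: split one .ann line into its tab-separated fields.
std::vector<std::string> split_tabs(const std::string& line) {
    std::vector<std::string> fields;
    std::stringstream ss(line);
    std::string field;
    while (std::getline(ss, field, '\t')) fields.push_back(field);
    return fields;
}

// Hypothetical helper: parse "start end[;start end...]" into offset pairs.
// Fragments that do not contain two integers are skipped.
std::vector<std::pair<int, int>> parse_offsets(const std::string& spec) {
    std::vector<std::pair<int, int>> spans;
    std::stringstream ss(spec);
    std::string fragment;
    while (std::getline(ss, fragment, ';')) {
        std::stringstream pair_ss(fragment);
        int start = 0, end = 0;
        if (pair_ss >> start >> end) spans.emplace_back(start, end);
    }
    return spans;
}

// Illustrative container for one text-bound annotation.
struct TextBound {
    std::string id;
    std::string label;
    std::string text;
    std::vector<std::pair<int, int>> spans;
};

// Returns false instead of reading out of bounds when the line is malformed.
bool parse_text_bound(const std::string& line, TextBound& out) {
    const std::vector<std::string> fields = split_tabs(line);
    if (fields.size() < 3 || fields[0].empty() || fields[0][0] != 'T') return false;
    out.id = fields[0];
    std::stringstream head(fields[1]);
    head >> out.label;             // first token is the annotation type
    std::string rest;
    std::getline(head, rest);      // remainder holds the offsets
    out.spans = parse_offsets(rest);
    out.text = fields[2];
    return !out.spans.empty();
}
```

Calling parse_text_bound on a line such as "T1", tab, "Organization 0 4;10 14", tab, "Sony Corp" fills in the label Organization and two offset pairs, while a line with the text field missing is rejected rather than indexed past the end of the field list.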