Task 255: .GED File Format
1. List of All Properties of the .GED File Format Intrinsic to Its File System
The .GED file format refers to GEDCOM (GEnealogical Data COMmunication), a standard for exchanging genealogical data. Based on the GEDCOM 5.5.1 specification, here is a comprehensive list of its intrinsic properties. These define the format's structure, constraints, and behaviors as a plain-text, hierarchical, line-based file format designed for portability across systems. Properties are categorized for clarity.
General File Properties
- File Extension: .GED (standard for single files); for multi-disk/large file splits, extensions like G00, G01, etc., are used.
- MIME Type: Not formally specified, but treated as text/plain for interchange.
- File Type: Plain text (non-binary), hierarchical record-based, suitable for ASCII-compatible transmission.
- Encoding/Character Sets: ANSEL (preferred, 8-bit extension of ASCII for diacritics); ASCII (7-bit subset); UTF-8 (for Unicode/ISO 10646 support); Unicode (future-oriented, but limited in 5.5.1). Code pages (e.g., IBMPC) are prohibited without explicit conversion to ANSEL.
- Byte Order: Not applicable (text-based); for UTF-16 variants in samples, little-endian (LE) is used.
- Maximum File Size: No hard limit, but logical records should fit in <32KB buffers for compatibility; multi-file splitting recommended for large datasets.
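The encoding rules above can be turned into a rough decoder choice. The sketch below is a heuristic under stated assumptions, not part of the specification: the function name `sniff_gedcom_encoding` is hypothetical, a byte-order mark is trusted when present, and otherwise the `1 CHAR` line is read from the first 4096 bytes. Python has no built-in ANSEL codec, so ANSEL (and anything unrecognized) maps to Latin-1 as a lossy placeholder.

```python
import codecs
import re


def sniff_gedcom_encoding(raw: bytes) -> str:
    """Guess a Python codec name for raw GEDCOM bytes (heuristic sketch)."""
    # A byte-order mark, when present, is the strongest signal.
    if raw.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    if raw.startswith(codecs.BOM_UTF16_LE) or raw.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"  # this codec consumes the BOM and picks the byte order

    # Otherwise read the declared character set from the header.
    # Latin-1 maps every byte to a character, so this decode cannot fail.
    head = raw[:4096].decode("latin-1")
    match = re.search(r"(?:^|[\r\n])1 CHAR (\S+)", head)
    declared = match.group(1).upper() if match else ""

    if declared == "UTF-8":
        return "utf-8"
    if declared == "ASCII":
        return "ascii"
    # ANSEL or unknown: lossy placeholder, diacritics may be mis-rendered.
    return "latin-1"
```

A caller would read the file in binary mode and then call `raw.decode(sniff_gedcom_encoding(raw))`.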
Line and Terminator Properties
- Line Endings/Terminators: Flexible: Carriage Return (CR, 0x0D), Line Feed (LF, 0x0A), CRLF (0x0D0A), or LFCR (0x0A0D). Files must end with a terminator after the trailer.
- Maximum Line Length: 255 characters (including level, delimiters, XREF, tag, value, and terminator). Exceeding this may cause parsing errors in compliant systems.
- Line Format: level [space] [@XREF@] [space] tag [space] value (terminator). Level and tag are required; XREF and value are optional. Empty lines are invalid.
- Delimiters: Space (0x20) separates components; @ (0x40) encloses XREFs; no other delimiters are intrinsic.
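As a minimal sketch of the line and terminator rules above, the following splits file content on any of the four terminators and breaks one line into its components. The names `GedLine`, `parse_line`, and `split_lines` are hypothetical, and the tag pattern (letters, digits, underscore) is an assumption chosen to be permissive rather than a rule taken from the text.

```python
import re
from typing import List, NamedTuple, Optional

# Two-character terminators come first so CRLF and LFCR count as one break.
TERMINATOR = re.compile(r"\r\n|\n\r|\r|\n")

LINE = re.compile(
    r"^(0|[1-9][0-9]?)"       # level 0-99, no leading zeros
    r"(?: @([^@#][^@]*)@)?"   # optional cross-reference id, not starting with #
    r" ([A-Za-z0-9_]+)"       # tag (assumed character set)
    r"(?: (.*))?$"            # optional value
)


class GedLine(NamedTuple):
    level: int
    xref: Optional[str]
    tag: str
    value: Optional[str]


def split_lines(content: str) -> List[str]:
    """Split on CR, LF, CRLF, or LFCR and drop empty pieces."""
    return [part for part in TERMINATOR.split(content) if part]


def parse_line(text: str) -> GedLine:
    """Parse one terminator-free line, raising ValueError if it is malformed."""
    m = LINE.match(text)
    if m is None:
        raise ValueError(f"not a GEDCOM line: {text!r}")
    return GedLine(int(m.group(1)), m.group(2), m.group(3), m.group(4))
```

For example, `parse_line("0 @I1@ INDI")` yields level 0, XREF `I1`, tag `INDI`, and no value, while a pointer such as `@F1@` in `1 FAMC @F1@` stays in the value field.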
Structural and Hierarchical Properties
- Overall Structure:
- Header: Starts with level-0 HEAD record (required).
- Body: Sequence of level-0 records (e.g., INDI, FAM, SOUR) forming a hierarchy.
- Trailer: Ends with level-0 TRLR record (required).
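- Levels: Integer 0-99 (no leading zeros); a subordinate line is exactly one level deeper than its parent, and siblings share the same level. From one line to the next the level may rise by at most 1 (no skipped levels), but it may drop by any amount when a structure closes (e.g., from level 3 back to 0 at the start of the next record).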
- Records: Hierarchical trees starting at level 0; each record is a tagged line with optional subordinates. Records include entities like individuals (INDI), families (FAM), sources (SOUR), etc.
- Cross-References (XREFs): Optional pointers enclosed in @ (e.g., @I1@); unique within file; cannot start with # (reserved for escapes); : reserved for network refs, ! for intra-record pointers.
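- Tags: Uppercase alphanumeric, typically 3-4 characters (e.g., NAME, BIRT), though 5.5.1 also defines longer and digit-bearing tags such as EMAIL and ADR1; user-defined tags begin with an underscore. A tag identifies the meaning of its line and must be followed by a value or subordinates unless pointer-only.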
- Values/Data: Variable-length string after tag; can include pointers (@XREF@); lines without value/subordinates are invalid (no assertion).
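The level and cross-reference rules lend themselves to a single pass over the file. This is a sketch only: `check_structure` is a hypothetical helper, it assumes lines were already reduced to `(level, xref)` pairs, and it assumes the only constraint between consecutive lines is that the level may not rise by more than one.

```python
from typing import Iterable, List, Optional, Tuple


def check_structure(lines: Iterable[Tuple[int, Optional[str]]]) -> List[str]:
    """Return human-readable problems for (level, xref) pairs in file order."""
    problems: List[str] = []
    seen = set()
    previous = 0
    for number, (level, xref) in enumerate(lines, start=1):
        if number == 1:
            if level != 0:
                problems.append("line 1: file must start at level 0")
        elif level > previous + 1:
            # Rising by more than one skips a level; dropping is always allowed.
            problems.append(f"line {number}: level jumps from {previous} to {level}")
        if xref is not None:
            if xref in seen:
                problems.append(f"line {number}: duplicate XREF @{xref}@")
            seen.add(xref)
        previous = level
    return problems
```

An empty result means no problems were found for these two rules; it says nothing about tags, values, or the header.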
Special Handling Properties
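- Continuation Lines: For values that contain line breaks or would push a line past 255 characters: CONT starts a new line within the value (formatting and leading spaces preserved); CONC concatenates with no line break and must split at a non-space character so that no spaces are lost.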
- Escape Sequences: @#escape_text@ (followed by a space); used for character set switches, calendars (e.g., @#DFRENCH R@), or special processing. Receivers discard the trailing space after the closing @.
- Header Substructure: Mandatory: SOUR (source system), SUBM (pointer to the submitter record), GEDC (with VERS, e.g., 5.5.1, and FORM, e.g., LINEAGE-LINKED), CHAR (encoding, e.g., ANSEL). Optional: DEST, DATE (with TIME), FILE, COPR, LANG, PLAC, NOTE.
- Trailer: Single line 0 TRLR (no value or subordinates).
- Multi-Disk Support: Files are split after complete lines; the first disk holds HEAD and the last holds TRLR; continuation volumes keep the base file name and use the extensions G00, G01, etc.
- Validation Constraints: No duplicate XREFs; levels never increase by more than 1 from one line to the next; header and trailer present; no empty lines.
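The CONT and CONC behavior described in this list can be sketched as a fold over parsed lines. The helper name `merge_continuations` is hypothetical, and the sketch assumes each continuation line directly follows the line (or earlier continuation) it extends, with lines given as `(level, tag, value)` tuples.

```python
from typing import Iterable, List, Tuple

Line = Tuple[int, str, str]


def merge_continuations(lines: Iterable[Line]) -> List[Line]:
    """Fold CONT/CONC lines into the value of the line they continue."""
    merged: List[Line] = []
    for level, tag, value in lines:
        if tag in ("CONT", "CONC") and merged:
            parent_level, parent_tag, parent_value = merged[-1]
            # CONT restores a line break; CONC joins with nothing in between.
            joiner = "\n" if tag == "CONT" else ""
            merged[-1] = (parent_level, parent_tag, parent_value + joiner + value)
        else:
            merged.append((level, tag, value))
    return merged
```

Because CONC adds no separator, a writer that splits a long value has to break it in the middle of a word or keep the space at the start of the continuation, which is why the split point must not consume a space.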
These properties ensure .GED files are human-readable, system-agnostic, and focused on lineage-linked data.
2. Two Direct Download Links for Files of Format .GED
- https://www.gedcom.org/samples/555SAMPLE.GED (Official GEDCOM 5.5.5 sample file in UTF-8)
- https://www.gedcom.org/samples/555SAMPLE16LE.GED (Official GEDCOM 5.5.5 sample file in UTF-16LE)
3. Ghost Blog Embedded HTML JavaScript
This is a self-contained HTML snippet with embedded JavaScript for drag-and-drop upload of a .GED file. It reads the file as text (assuming UTF-8 for simplicity; detection is approximate via heuristics), parses lines, analyzes format properties (e.g., line endings, max length, structure checks), and dumps them to a <div> on the page. Embed this in a Ghost blog post via the HTML card.
4. Python Class
This Python class opens a .GED file, decodes it (strict UTF-8 first, Latin-1 as a rough fallback for ANSEL), parses lines, analyzes properties, prints them to the console, and supports writing back a modified version (e.g., with a NOTE line added under HEAD).
import re
import sys

# level, optional @XREF@, tag, optional value
LINE_RE = re.compile(r'^(\d+)(?: +@([^@]+)@)? +([A-Za-z0-9_]+)(?: (.*))?$')
# Two-character terminators first, so CRLF and LFCR count as one break
TERMINATOR_RE = re.compile(r'\r\n|\n\r|\r|\n')
ENDING_NAMES = {'\r\n': 'CRLF', '\n\r': 'LFCR', '\r': 'CR', '\n': 'LF'}


class GEDParser:
    def __init__(self):
        self.properties = {
            'File Extension': '.GED',
            'Supported Encodings': 'ANSEL, ASCII, UTF-8, Unicode',
            'Line Terminators': 'CR, LF, CRLF, LFCR',
            'Max Line Length': '255 chars',
            'Overall Structure': 'Header (HEAD), Records (level 0), Trailer (TRLR)',
            'Line Format': 'level [space] [@XREF@] [space] tag [space] value',
            'Levels': '0-99, no leading zeros, increase by at most 1',
            'XREFs': 'Enclosed in @, unique',
            'Tags': 'Uppercase alphanumeric, typically 3-4 chars',
            'Continuation': 'CONT (with line break), CONC (without)',
            'Escapes': '@#text@ ',
            'Header Req.': 'SOUR, SUBM, GEDC (VERS, FORM), CHAR',
            'Trailer': '0 TRLR'
        }

    def read_and_decode(self, filename):
        # Read bytes so the original line terminators are not translated.
        with open(filename, 'rb') as f:
            data = f.read()
        try:
            content = data.decode('utf-8-sig')  # Strict; also drops a UTF-8 BOM
        except UnicodeDecodeError:
            content = data.decode('latin-1')  # Rough stand-in for ANSEL
        detected_ending = self._detect_line_ending(content)
        # Empty lines are invalid in GEDCOM, so they are dropped here.
        lines = [line for line in TERMINATOR_RE.split(content) if line.strip()]
        max_length = max((len(line) for line in lines), default=0)
        parsed = []
        for i, line in enumerate(lines):
            # Leading whitespace is ignored; trailing whitespace may be data.
            match = LINE_RE.match(line.lstrip())
            if match:
                parsed.append({'index': i, 'level': match.group(1),
                               'xref': match.group(2) or '', 'tag': match.group(3),
                               'value': match.group(4) or ''})
            else:
                parsed.append({'index': i, 'raw': line})
        has_header = any(p.get('level') == '0' and p.get('tag') == 'HEAD' for p in parsed)
        has_trailer = any(p.get('level') == '0' and p.get('tag') == 'TRLR' for p in parsed)
        return parsed, max_length, detected_ending, has_header, has_trailer

    def print_properties(self, parsed, max_length, detected_ending, has_header, has_trailer):
        print('=== GEDCOM Properties Dump ===')
        for key, val in self.properties.items():
            extra = ''
            if key == 'Max Line Length':
                extra = f' (File max: {max_length})'
            if key == 'Line Terminators':
                extra = f' (Detected: {detected_ending})'
            if key == 'Overall Structure':
                extra = f' (Has HEAD: {has_header}, Has TRLR: {has_trailer})'
            print(f'{key}: {val}{extra}')
        print('\n=== Parsed Lines Sample (First 20) ===')
        for p in parsed[:20]:
            if 'raw' in p:
                print(f'Line {p["index"]}: {p["raw"]}')
            else:
                print(f'Line {p["index"]}: Level={p["level"]}, XREF={p["xref"]}, '
                      f'Tag={p["tag"]}, Value={p["value"]}')

    def write_file(self, filename, parsed, append_note=''):
        out_name = filename + '_modified.ged'
        # Output is always UTF-8; the CHAR line of the input is left unchanged.
        with open(out_name, 'w', encoding='utf-8', newline='') as f:
            for p in parsed:
                if 'raw' in p:
                    f.write(p['raw'] + '\n')
                    continue
                parts = [p['level']]
                if p['xref']:
                    parts.append('@' + p['xref'] + '@')  # Restore the @ delimiters
                parts.append(p['tag'])
                if p['value']:
                    parts.append(p['value'])
                f.write(' '.join(parts) + '\n')
                if append_note and p['level'] == '0' and p['tag'] == 'HEAD':
                    # A note is a subordinate line, not extra text on the HEAD line.
                    f.write('1 NOTE ' + append_note + '\n')
            if not any(p.get('tag') == 'TRLR' for p in parsed):
                f.write('0 TRLR\n')  # Add a trailer only if the input lacked one
        print(f'Wrote modified file: {out_name}')

    def _detect_line_ending(self, content):
        # Classify by the first terminator found in the file.
        match = TERMINATOR_RE.search(content)
        return ENDING_NAMES[match.group(0)] if match else 'Unknown'


# Usage
if __name__ == '__main__':
    if len(sys.argv) < 2:
        print('Usage: python ged_parser.py <file.ged>')
        sys.exit(1)
    parser = GEDParser()
    parsed, max_len, ending, header, trailer = parser.read_and_decode(sys.argv[1])
    parser.print_properties(parsed, max_len, ending, header, trailer)
    # Example write
    parser.write_file(sys.argv[1], parsed, 'Modified by Python parser')
5. Java Class
This Java class (Java 11 or later) reads the file with Files.readAllBytes, decodes it as strict UTF-8 with a Latin-1 fallback, parses lines, prints properties to the console, and writes a modified version to a new file.
import java.io.IOException;
import java.io.PrintWriter;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GEDParser {
    // Map.of accepts at most 10 pairs and has no defined iteration order,
    // so the 13 properties go into a LinkedHashMap instead.
    private final Map<String, String> properties = new LinkedHashMap<>();

    public GEDParser() {
        properties.put("File Extension", ".GED");
        properties.put("Supported Encodings", "ANSEL, ASCII, UTF-8, Unicode");
        properties.put("Line Terminators", "CR, LF, CRLF, LFCR");
        properties.put("Max Line Length", "255 chars");
        properties.put("Overall Structure", "Header (HEAD), Records (level 0), Trailer (TRLR)");
        properties.put("Line Format", "level [space] [@XREF@] [space] tag [space] value");
        properties.put("Levels", "0-99, no leading zeros, increase by at most 1");
        properties.put("XREFs", "Enclosed in @, unique");
        properties.put("Tags", "Uppercase alphanumeric, typically 3-4 chars");
        properties.put("Continuation", "CONT (with line break), CONC (without)");
        properties.put("Escapes", "@#text@ ");
        properties.put("Header Req.", "SOUR, SUBM, GEDC (VERS, FORM), CHAR");
        properties.put("Trailer", "0 TRLR");
    }

    public static class ParsedLine {
        int index;
        String level, xref, tag, value;
        String raw; // Set only when the line did not parse

        ParsedLine(int index, String level, String xref, String tag, String value) {
            this.index = index;
            this.level = level;
            this.xref = xref;
            this.tag = tag;
            this.value = value;
        }

        ParsedLine(int index, String raw) {
            this.index = index;
            this.raw = raw;
        }
    }

    public List<ParsedLine> readAndDecode(String filename) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(filename));
        String content;
        try {
            // A fresh decoder reports malformed input instead of replacing it.
            content = StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes)).toString();
        } catch (CharacterCodingException e) {
            content = new String(bytes, StandardCharsets.ISO_8859_1); // Rough stand-in for ANSEL
        }
        if (content.startsWith("\uFEFF")) content = content.substring(1); // Drop a BOM
        String detectedEnding = detectLineEnding(content);
        // Empty lines are invalid in GEDCOM, so they are dropped here.
        List<String> lines = new ArrayList<>();
        for (String l : content.split("\r\n|\n\r|\r|\n")) {
            if (!l.trim().isEmpty()) lines.add(l);
        }
        int maxLength = lines.stream().mapToInt(String::length).max().orElse(0);
        List<ParsedLine> parsed = new ArrayList<>();
        // level, optional @XREF@, tag, optional value
        Pattern pattern = Pattern.compile("^(\\d+)(?: +@([^@]+)@)? +([A-Za-z0-9_]+)(?: (.*))?$");
        for (int i = 0; i < lines.size(); i++) {
            // Leading whitespace is ignored; trailing whitespace may be data.
            String line = lines.get(i).stripLeading();
            Matcher matcher = pattern.matcher(line);
            if (matcher.matches()) {
                parsed.add(new ParsedLine(i, matcher.group(1), matcher.group(2) != null ? matcher.group(2) : "",
                        matcher.group(3), matcher.group(4) != null ? matcher.group(4) : ""));
            } else {
                parsed.add(new ParsedLine(i, line));
            }
        }
        boolean hasHeader = parsed.stream().anyMatch(p -> "0".equals(p.level) && "HEAD".equals(p.tag));
        boolean hasTrailer = parsed.stream().anyMatch(p -> "0".equals(p.level) && "TRLR".equals(p.tag));
        System.out.println("=== GEDCOM Properties Dump ===");
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            String key = entry.getKey(), val = entry.getValue();
            String extra = "";
            if (key.equals("Max Line Length")) extra = " (File max: " + maxLength + ")";
            if (key.equals("Line Terminators")) extra = " (Detected: " + detectedEnding + ")";
            if (key.equals("Overall Structure")) extra = " (Has HEAD: " + hasHeader + ", Has TRLR: " + hasTrailer + ")";
            System.out.println(key + ": " + val + extra);
        }
        System.out.println("\n=== Parsed Lines Sample (First 20) ===");
        for (int i = 0; i < Math.min(20, parsed.size()); i++) {
            ParsedLine p = parsed.get(i);
            if (p.raw != null) {
                System.out.println("Line " + p.index + ": " + p.raw);
            } else {
                System.out.println("Line " + p.index + ": Level=" + p.level + ", XREF=" + p.xref
                        + ", Tag=" + p.tag + ", Value=" + p.value);
            }
        }
        return parsed;
    }

    public void writeFile(String filename, List<ParsedLine> parsed, String appendNote) throws IOException {
        String outName = filename + "_modified.ged";
        // Output is always UTF-8; the CHAR line of the input is left unchanged.
        try (PrintWriter writer = new PrintWriter(
                Files.newBufferedWriter(Paths.get(outName), StandardCharsets.UTF_8))) {
            boolean hasTrailer = false;
            for (ParsedLine p : parsed) {
                if (p.raw != null) {
                    writer.println(p.raw);
                    continue;
                }
                writer.println(p.level + (p.xref.isEmpty() ? "" : " @" + p.xref + "@")
                        + " " + p.tag + (p.value.isEmpty() ? "" : " " + p.value));
                if ("TRLR".equals(p.tag)) hasTrailer = true;
                if (appendNote != null && "0".equals(p.level) && "HEAD".equals(p.tag)) {
                    // A note is a subordinate line, not extra text on the HEAD line.
                    writer.println("1 NOTE " + appendNote);
                }
            }
            if (!hasTrailer) writer.println("0 TRLR"); // Only if the input lacked one
        }
        System.out.println("Wrote modified file: " + outName);
    }

    private String detectLineEnding(String content) {
        // Classify by the first terminator found in the file.
        Matcher m = Pattern.compile("\r\n|\n\r|\r|\n").matcher(content);
        if (!m.find()) return "Unknown";
        switch (m.group()) {
            case "\r\n": return "CRLF";
            case "\n\r": return "LFCR";
            case "\r": return "CR";
            default: return "LF";
        }
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.out.println("Usage: java GEDParser <file.ged>");
            return;
        }
        try {
            GEDParser parser = new GEDParser();
            List<ParsedLine> parsed = parser.readAndDecode(args[0]);
            parser.writeFile(args[0], parsed, "Modified by Java parser");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
6. JavaScript Class (Node.js)
This Node.js class uses fs for reading and writing, decodes the bytes as strict UTF-8 with a Latin-1 fallback, parses lines, and prints properties to the console. Run with node ged_parser.js file.ged.
const fs = require('fs');

// level, optional @XREF@, tag, optional value
const LINE_RE = /^(\d+)(?: +@([^@]+)@)? +([A-Za-z0-9_]+)(?: (.*))?$/;
// Two-character terminators first, so CRLF and LFCR count as one break
const TERMINATOR_RE = /\r\n|\n\r|\r|\n/;

class GEDParser {
  constructor() {
    this.properties = {
      'File Extension': '.GED',
      'Supported Encodings': 'ANSEL, ASCII, UTF-8, Unicode',
      'Line Terminators': 'CR, LF, CRLF, LFCR',
      'Max Line Length': '255 chars',
      'Overall Structure': 'Header (HEAD), Records (level 0), Trailer (TRLR)',
      'Line Format': 'level [space] [@XREF@] [space] tag [space] value',
      'Levels': '0-99, no leading zeros, increase by at most 1',
      'XREFs': 'Enclosed in @, unique',
      'Tags': 'Uppercase alphanumeric, typically 3-4 chars',
      'Continuation': 'CONT (with line break), CONC (without)',
      'Escapes': '@#text@ ',
      'Header Req.': 'SOUR, SUBM, GEDC (VERS, FORM), CHAR',
      'Trailer': '0 TRLR'
    };
  }

  readAndDecode(filename) {
    const bytes = fs.readFileSync(filename);
    let content;
    try {
      // fatal: true makes invalid UTF-8 throw instead of yielding U+FFFD.
      content = new TextDecoder('utf-8', { fatal: true }).decode(bytes);
    } catch (err) {
      content = bytes.toString('latin1'); // Rough stand-in for ANSEL
    }
    const detectedEnding = this.detectLineEnding(content);
    // Empty lines are invalid in GEDCOM, so they are dropped here.
    const lines = content.split(TERMINATOR_RE).filter(l => l.trim() !== '');
    // reduce avoids the argument-count limit that Math.max(...array) can hit.
    const maxLength = lines.reduce((max, l) => Math.max(max, l.length), 0);
    const parsed = lines.map((line, idx) => {
      // Leading whitespace is ignored; trailing whitespace may be data.
      const match = line.trimStart().match(LINE_RE);
      if (match) {
        return { index: idx, level: match[1], xref: match[2] || '', tag: match[3], value: match[4] || '' };
      }
      return { index: idx, raw: line };
    });
    const hasHeader = parsed.some(p => p.level === '0' && p.tag === 'HEAD');
    const hasTrailer = parsed.some(p => p.level === '0' && p.tag === 'TRLR');
    console.log('=== GEDCOM Properties Dump ===');
    for (const [key, val] of Object.entries(this.properties)) {
      let extra = '';
      if (key === 'Max Line Length') extra = ` (File max: ${maxLength})`;
      if (key === 'Line Terminators') extra = ` (Detected: ${detectedEnding})`;
      if (key === 'Overall Structure') extra = ` (Has HEAD: ${hasHeader}, Has TRLR: ${hasTrailer})`;
      console.log(`${key}: ${val}${extra}`);
    }
    console.log('\n=== Parsed Lines Sample (First 20) ===');
    parsed.slice(0, 20).forEach(p => {
      if (p.raw !== undefined) {
        console.log(`Line ${p.index}: ${p.raw}`);
      } else {
        console.log(`Line ${p.index}: Level=${p.level}, XREF=${p.xref}, Tag=${p.tag}, Value=${p.value}`);
      }
    });
    return parsed;
  }

  writeFile(filename, parsed, appendNote = '') {
    const out = [];
    for (const p of parsed) {
      if (p.raw !== undefined) {
        out.push(p.raw);
        continue;
      }
      const parts = [p.level];
      if (p.xref) parts.push(`@${p.xref}@`);
      parts.push(p.tag);
      if (p.value) parts.push(p.value);
      out.push(parts.join(' '));
      // A note is a subordinate line, not extra text on the HEAD line.
      if (appendNote && p.level === '0' && p.tag === 'HEAD') out.push(`1 NOTE ${appendNote}`);
    }
    // Add a trailer only if the input lacked one.
    if (!parsed.some(p => p.tag === 'TRLR')) out.push('0 TRLR');
    const outName = `${filename}_modified.ged`;
    fs.writeFileSync(outName, out.join('\n') + '\n', 'utf8');
    console.log(`Wrote modified file: ${outName}`);
  }

  detectLineEnding(content) {
    // Classify by the first terminator found in the file.
    const match = content.match(TERMINATOR_RE);
    if (!match) return 'Unknown';
    return { '\r\n': 'CRLF', '\n\r': 'LFCR', '\r': 'CR', '\n': 'LF' }[match[0]];
  }
}

// Usage
if (require.main === module) {
  if (process.argv.length < 3) {
    console.log('Usage: node ged_parser.js <file.ged>');
    process.exit(1);
  }
  const parser = new GEDParser();
  const parsed = parser.readAndDecode(process.argv[2]);
  parser.writeFile(process.argv[2], parsed, 'Modified by JS parser');
}

module.exports = GEDParser;
7. C "Class" (Struct with Functions)
C lacks classes, so this uses structs with functions for reading, parsing with POSIX regular expressions, printing to stdout, and writing. Bytes are processed as-is, which suits ASCII and UTF-8; no iconv conversion is performed. Compile with gcc ged_parser.c -o ged_parser and run ./ged_parser file.ged.
#define _POSIX_C_SOURCE 200809L // For strdup, strndup, strtok_r
#include <regex.h> // For parsing (POSIX regex)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NUM_PROPS 13

typedef struct {
    const char *properties[NUM_PROPS][2]; // Key-value pairs
} GEDProperties;

typedef struct {
    int index;
    char *level;
    char *xref;
    char *tag;
    char *value;
    char *raw; // Set only when the line did not parse
} ParsedLine;

static const GEDProperties props = {
    .properties = {{"File Extension", ".GED"},
                   {"Supported Encodings", "ANSEL, ASCII, UTF-8, Unicode"},
                   {"Line Terminators", "CR, LF, CRLF, LFCR"},
                   {"Max Line Length", "255 chars"},
                   {"Overall Structure", "Header (HEAD), Records (level 0), Trailer (TRLR)"},
                   {"Line Format", "level [space] [@XREF@] [space] tag [space] value"},
                   {"Levels", "0-99, no leading zeros, increase by at most 1"},
                   {"XREFs", "Enclosed in @, unique"},
                   {"Tags", "Uppercase alphanumeric, typically 3-4 chars"},
                   {"Continuation", "CONT (with line break), CONC (without)"},
                   {"Escapes", "@#text@ "},
                   {"Header Req.", "SOUR, SUBM, GEDC (VERS, FORM), CHAR"},
                   {"Trailer", "0 TRLR"}}
};

// Copy one regex capture group out of line; an unmatched group yields "".
static char *dup_group(const char *line, const regmatch_t *m) {
    if (m->rm_so == -1) return strdup("");
    return strndup(line + m->rm_so, (size_t)(m->rm_eo - m->rm_so));
}

ParsedLine *read_and_decode(const char *filename, int *num_lines, int *max_len,
                            const char **detected_ending, int *has_header, int *has_trailer) {
    FILE *fp = fopen(filename, "rb"); // Binary mode keeps terminators untouched
    if (!fp) {
        perror("fopen");
        exit(1);
    }
    fseek(fp, 0, SEEK_END);
    long fsize = ftell(fp);
    fseek(fp, 0, SEEK_SET);
    char *content = malloc((size_t)fsize + 1);
    if (!content || fread(content, 1, (size_t)fsize, fp) != (size_t)fsize) {
        fprintf(stderr, "could not read %s\n", filename);
        exit(1);
    }
    content[fsize] = '\0';
    fclose(fp);

    // Skip a UTF-8 byte-order mark so the first line still parses.
    char *text = content;
    if (strncmp(text, "\xEF\xBB\xBF", 3) == 0) text += 3;

    // Classify the line ending by the first terminator found.
    size_t first = strcspn(text, "\r\n");
    if (text[first] == '\r') *detected_ending = (text[first + 1] == '\n') ? "CRLF" : "CR";
    else if (text[first] == '\n') *detected_ending = (text[first + 1] == '\r') ? "LFCR" : "LF";
    else *detected_ending = "Unknown";

    // Every CR or LF ends at most one line, so this bounds the token count.
    size_t capacity = 1;
    for (const char *ptr = text; *ptr; ptr++)
        if (*ptr == '\r' || *ptr == '\n') capacity++;
    ParsedLine *parsed = calloc(capacity, sizeof(ParsedLine));
    if (!parsed) {
        perror("calloc");
        exit(1);
    }

    // POSIX ERE has no \d, \s, or (?:...), so plain groups are used:
    // 1 = level, 3 = xref, 4 = tag, 6 = value.
    regex_t regex;
    if (regcomp(&regex, "^([0-9]+)( +@([^@]+)@)? +([A-Za-z0-9_]+)( (.*))?$", REG_EXTENDED) != 0) {
        fprintf(stderr, "regcomp failed\n");
        exit(1);
    }

    int i = 0, local_max = 0;
    *has_header = *has_trailer = 0;
    char *saveptr;
    // strtok_r skips empty tokens, so blank lines are dropped.
    for (char *line = strtok_r(text, "\r\n", &saveptr); line; line = strtok_r(NULL, "\r\n", &saveptr), i++) {
        int len = (int)strlen(line);
        if (len > local_max) local_max = len;
        parsed[i].index = i;
        regmatch_t m[7];
        if (regexec(&regex, line, 7, m, 0) == 0) {
            parsed[i].level = dup_group(line, &m[1]);
            parsed[i].xref = dup_group(line, &m[3]);
            parsed[i].tag = dup_group(line, &m[4]);
            parsed[i].value = dup_group(line, &m[6]);
            if (strcmp(parsed[i].level, "0") == 0) {
                if (strcmp(parsed[i].tag, "HEAD") == 0) *has_header = 1;
                if (strcmp(parsed[i].tag, "TRLR") == 0) *has_trailer = 1;
            }
        } else {
            parsed[i].raw = strdup(line); // Other fields stay NULL from calloc
        }
    }
    *num_lines = i;
    *max_len = local_max;
    regfree(&regex);
    free(content);
    return parsed;
}

void print_properties(const ParsedLine *parsed, int num_lines, int max_len,
                      const char *detected_ending, int has_header, int has_trailer) {
    printf("=== GEDCOM Properties Dump ===\n");
    for (int j = 0; j < NUM_PROPS; j++) {
        const char *key = props.properties[j][0];
        printf("%s: %s", key, props.properties[j][1]);
        if (strcmp(key, "Max Line Length") == 0) printf(" (File max: %d)", max_len);
        if (strcmp(key, "Line Terminators") == 0) printf(" (Detected: %s)", detected_ending);
        if (strcmp(key, "Overall Structure") == 0)
            printf(" (Has HEAD: %s, Has TRLR: %s)", has_header ? "Yes" : "No", has_trailer ? "Yes" : "No");
        printf("\n");
    }
    printf("\n=== Parsed Lines Sample (First 20) ===\n");
    for (int k = 0; k < num_lines && k < 20; k++) {
        if (parsed[k].raw) {
            printf("Line %d: %s\n", parsed[k].index, parsed[k].raw);
        } else {
            printf("Line %d: Level=%s, XREF=%s, Tag=%s, Value=%s\n", parsed[k].index,
                   parsed[k].level, parsed[k].xref, parsed[k].tag, parsed[k].value);
        }
    }
}

void write_file(const char *filename, const ParsedLine *parsed, int num_lines, const char *append_note) {
    // Build the output name in a fresh buffer; appending to argv[1] in place
    // would write past the end of that string.
    size_t name_len = strlen(filename) + sizeof("_modified.ged");
    char *out_name = malloc(name_len);
    if (!out_name) {
        perror("malloc");
        exit(1);
    }
    snprintf(out_name, name_len, "%s_modified.ged", filename);
    FILE *fp = fopen(out_name, "w");
    if (!fp) {
        perror("fopen write");
        exit(1);
    }
    int wrote_trailer = 0;
    for (int m = 0; m < num_lines; m++) {
        const ParsedLine *p = &parsed[m];
        if (p->raw) {
            fprintf(fp, "%s\n", p->raw);
            continue;
        }
        // Write the pieces directly so long values are never truncated.
        fprintf(fp, "%s", p->level);
        if (p->xref[0]) fprintf(fp, " @%s@", p->xref);
        fprintf(fp, " %s", p->tag);
        if (p->value[0]) fprintf(fp, " %s", p->value);
        fputc('\n', fp);
        if (strcmp(p->tag, "TRLR") == 0) wrote_trailer = 1;
        // A note is a subordinate line, not extra text on the HEAD line.
        if (append_note && strcmp(p->level, "0") == 0 && strcmp(p->tag, "HEAD") == 0)
            fprintf(fp, "1 NOTE %s\n", append_note);
    }
    if (!wrote_trailer) fprintf(fp, "0 TRLR\n"); // Only if the input lacked one
    fclose(fp);
    printf("Wrote modified file: %s\n", out_name);
    free(out_name);
}

int main(int argc, char *argv[]) {
    if (argc < 2) {
        printf("Usage: %s <file.ged>\n", argv[0]);
        return 1;
    }
    int num_lines, max_len, has_header, has_trailer;
    const char *ending;
    ParsedLine *parsed = read_and_decode(argv[1], &num_lines, &max_len, &ending, &has_header, &has_trailer);
    print_properties(parsed, num_lines, max_len, ending, has_header, has_trailer);
    write_file(argv[1], parsed, num_lines, "Modified by C parser");
    // Cleanup (free(NULL) is a no-op, so unparsed lines need no special case)
    for (int n = 0; n < num_lines; n++) {
        free(parsed[n].raw);
        free(parsed[n].level);
        free(parsed[n].xref);
        free(parsed[n].tag);
        free(parsed[n].value);
    }
    free(parsed);
    return 0;
}