Task 648: .SDF File Format

1. List of Properties Intrinsic to the .SDF File Format

The .SDF (Structure Data File) format, developed by MDL (now BIOVIA), is an ASCII-based, line-oriented file format for storing multiple chemical structures (molecules) along with associated data fields. It extends the Molfile format by allowing multiple records and appended data items per structure. The format does not define a "file system" per se but has an intrinsic hierarchical structure for records, structures, and properties. Below is a comprehensive list of all intrinsic properties, derived from the official specification. These include structural elements, field formats, and constraints that define the format's integrity.

File-Level Properties

  • Encoding and Orientation: ASCII text (UTF-8 compatible); strictly line-oriented with line breaks (CR, LF, or CRLF) defining boundaries. No binary data; all content is human-readable.
  • Record Structure: Multiple records, each representing one molecule + optional data. Records are delimited by a line containing exactly "$$$$" (four dollar signs).
  • Line Length Constraints: No strict maximum (legacy 80 characters removed in current spec); data lines limited to 200 characters excluding line terminator.
  • Version Support: Supports V2000 (legacy) and V3000 (extended) Molfile variants within records; mixed versions allowed across records.
  • Dimensionality: Supports 2D or 3D coordinates; indicated in header (2D/3D specifier).
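
The record structure above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the helper name iter_sdf_records and the sample text are assumptions made for this example, and the sketch tolerates a missing final delimiter, which is a leniency choice rather than something the format requires.

```python
from typing import Iterator, List


def iter_sdf_records(text: str) -> Iterator[List[str]]:
    """Yield each record as a list of lines, without its "$$$$" delimiter."""
    current: List[str] = []
    for line in text.splitlines():  # splitlines() handles CR, LF and CRLF
        if line.rstrip() == "$$$$":
            yield current
            current = []
        else:
            current.append(line)  # blank lines are kept: titles may be empty
    if any(l.strip() for l in current):  # tolerate a missing final delimiter
        yield current


# Hypothetical two-record sample mixing LF and CRLF line endings
sample = (
    "mol A\n\n\n  0  0  0  0  0  0  0  0  0  0999 V2000\nM  END\n$$$$\n"
    "mol B\r\n\r\n\r\n  0  0  0  0  0  0  0  0  0  0999 V2000\r\nM  END\r\n$$$$\r\n"
)
records = list(iter_sdf_records(sample))
print(len(records), [rec[0] for rec in records])
```

Splitting line by line, rather than on the bare substring "$$$$", keeps blank title and comment lines in place so that later column-based parsing sees the right line numbers.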

Header Block Properties (Lines 1-4 of Each Molfile Record)

  • Title Line (Line 1): Up to 80 characters; descriptive molecule name or identifier (e.g., "Acetone").
  • Program/Timestamp Line (Line 2): Fixed format – Columns 1-2: user initials; 3-10: originating program name; 11-16: creation date (MMDDYY); 17-20: creation time (HHmm); 21-22: dimensional code ("2D" or "3D"); optional scaling factors, energy, and registry number may follow.
  • Comment Line (Line 3): Up to 80 characters; optional user comment or additional metadata (e.g., source or export note).
  • Counts Line (Line 4): Fixed-width fields – Columns 1-3: number of atoms (integer ≥0); 4-6: number of bonds (integer ≥0); 7-9: number of atom lists (usually 0); 10-12: obsolete; 13-15: chiral flag (0=not chiral, 1=chiral); 16-18: number of STEXT entries; 19-30: obsolete; 31-33: number of additional property lines (no longer used, conventionally 999); 34-39: version stamp ("V2000" or "V3000"). For V3000, the counts line is a placeholder ending in "999 V3000", and the real counts appear in the CTAB block.
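
Because these header fields are positional, they are best read by slicing columns rather than splitting on whitespace. The sketch below assumes the V2000 layout in which line 2 carries initials (columns 1-2), program name (3-10), date (11-16), time (17-20) and dimensional code (21-22), and the counts line carries 3-character fields with the version stamp in columns 34-39. The function names and sample values are hypothetical.

```python
def parse_program_line(line: str) -> dict:
    """Slice header line 2 by column (Python slices are 0-based)."""
    return {
        "initials": line[0:2].strip(),
        "program": line[2:10].strip(),
        "date": line[10:16].strip(),       # MMDDYY
        "time": line[16:20].strip(),       # HHmm
        "dimension": line[20:22].strip(),  # "2D" or "3D"
    }


def parse_counts_line(line: str) -> dict:
    """Read the fixed-width counts line; blank fields count as 0."""
    def field(start: int, end: int) -> int:
        text = line[start:end].strip()
        return int(text) if text else 0

    return {
        "atoms": field(0, 3),
        "bonds": field(3, 6),
        "atom_lists": field(6, 9),
        "chiral": field(12, 15),
        "version": line[33:39].strip(),
    }


# Sample lines are built programmatically so the columns are guaranteed to align
program_line = "JD" + "MyProg".ljust(8) + "010225" + "1530" + "3D"
counts_line = "".join(f"{n:3d}" for n in (12, 11, 0, 0, 1, 0, 0, 0, 0, 0)) + "999 V2000"

header = parse_program_line(program_line)
counts = parse_counts_line(counts_line)
print(header["program"], header["dimension"], counts["atoms"], counts["bonds"], counts["version"])
```

Slicing also copes with counts of 100 or more, where adjacent fields touch and whitespace splitting would merge them.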

Atom Block Properties (Lines 5 to 4 + Number of Atoms)

  • One line per atom, fixed-width fields (69 columns in a fully populated line).
  • Coordinates: Columns 1-10: X (real, angstroms); 11-20: Y; 21-30: Z.
  • Element Symbol: Columns 32-34: Chemical symbol (e.g., "C", "N"); left-justified and padded with spaces.
  • Mass Difference: Columns 35-36: Integer (-3 to +4; 0=natural abundance); superseded by "M  ISO" lines.
  • Charge: Columns 37-39: Code, not a signed value (0=uncharged, 1=+3, 2=+2, 3=+1, 4=doublet radical, 5=-1, 6=-2, 7=-3); superseded by "M  CHG" and "M  RAD" lines.
  • Atom Stereo Parity: Columns 40-42: Integer (0=not stereo, 1=odd, 2=even, 3=either or unmarked).
  • Hydrogen Count: Columns 43-45: Query field stored as count + 1 (0=unspecified, 1=H0, 2=H1, 3=H2, 4=H3, 5=H4).
  • Stereo Care Box: Columns 46-48: 0=ignore stereo, 1=stereo must match.
  • Valence: Columns 49-51: 0=no marking (default), 1-14=explicit valence, 15=zero valence.
  • H0 Designator: Columns 52-54: 1 if no hydrogens are allowed, 0 if not specified.
  • Not Used: Columns 55-60: Reserved.
  • Atom-Atom Mapping Number: Columns 61-63: Integer for reaction mapping.
  • Inversion/Retention Flag: Columns 64-66: For reactions (0=not applied, 1=inversion, 2=retention).
  • Exact Change Flag: Columns 67-69: For reactions (0=not applied, 1=exact change).
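
As an illustration of reading an atom line by column, here is a small sketch. It assumes the V2000 layout with the symbol in columns 32-34, the mass difference in 35-36 and the charge code in 37-39; the names parse_atom_line and CHARGE_CODES are made up for this example, and only a few of the fields are decoded.

```python
# Charge codes as stored in the atom block; code 4 marks a doublet radical, not a charge
CHARGE_CODES = {0: 0, 1: 3, 2: 2, 3: 1, 4: 0, 5: -1, 6: -2, 7: -3}


def parse_atom_line(line: str) -> dict:
    """Decode coordinates, symbol and a few flags from one V2000 atom line."""
    def int_field(start: int, end: int) -> int:
        text = line[start:end].strip()
        return int(text) if text else 0

    charge_code = int_field(36, 39)
    return {
        "x": float(line[0:10]),
        "y": float(line[10:20]),
        "z": float(line[20:30]),
        "symbol": line[31:34].strip(),
        "mass_diff": int_field(34, 36),
        "charge_code": charge_code,
        "formal_charge": CHARGE_CODES.get(charge_code, 0),
        "stereo_parity": int_field(39, 42),
    }


# Build a sample line field by field so the columns line up:
# 3 coordinates, a space, the symbol, mass difference, charge code, then 10 zero fields
atom_line = f"{1.5:10.4f}{-0.25:10.4f}{0.0:10.4f} {'N':<3}{0:2d}{3:3d}" + "  0" * 10
atom = parse_atom_line(atom_line)
print(len(atom_line), atom["symbol"], atom["x"], atom["y"], atom["formal_charge"])
```

The lookup table makes the point that the charge column holds a code: a stored 3 means a formal charge of +1, not +3.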

Bond Block Properties (Lines Following Atom Block, Number of Bonds Lines)

  • One line per bond, fixed-width fields (21 columns).
  • Atom Indices: Columns 1-3: First atom number (1-based index); 4-6: Second atom number.
  • Bond Type: Columns 7-9: Integer (1=single, 2=double, 3=triple, 4=aromatic, 5=single or double, 6=single or aromatic, 7=double or aromatic, 8=any); values 4-8 are intended for queries.
  • Bond Stereo: Columns 10-12: For single bonds, 0=not stereo, 1=up, 4=either, 6=down; for double bonds, 0=derive cis/trans from the coordinates, 3=either cis or trans.
  • Not Used: Columns 13-15: Reserved (0).
  • Bond Topology: Columns 16-18: 0=either, 1=ring, 2=chain.
  • Reacting Center Status: Columns 19-21: Integer (0=unmarked, 1=a center, -1=not a center, 2=no change, 4=bond made or broken, 8=bond order changes; values may combine, e.g., 12=4+8).
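
A bond line can be decoded the same way, as seven consecutive 3-character fields. The sketch below is illustrative only; parse_bond_line and the key names are assumptions made for this example.

```python
BOND_KEYS = ("atom1", "atom2", "type", "stereo", "unused", "topology", "reacting_center")


def parse_bond_line(line: str) -> dict:
    """Split a V2000 bond line into its seven 3-character fields."""
    fields = [line[i:i + 3].strip() for i in range(0, 21, 3)]
    values = [int(f) if f else 0 for f in fields]  # short lines yield zeros
    return dict(zip(BOND_KEYS, values))


# Atoms 100 and 101 joined by a double bond in a ring: the first two fields touch
bond_line = f"{100:3d}{101:3d}{2:3d}{0:3d}{0:3d}{1:3d}{0:3d}"
bond = parse_bond_line(bond_line)
print(repr(bond_line), bond["atom1"], bond["atom2"], bond["type"], bond["topology"])
```

The sample line begins with "100101", which is why a whitespace split or a pattern requiring three digits per field would misread molecules with 100 or more atoms.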

Properties Block Properties (Optional, Following Bond Block)

  • Lines starting with "M  " (the letter M followed by two spaces) for molecule properties.
  • Charge Block (M  CHG): Specifies formal charges on atoms (e.g., "M  CHG  2   3   1   5  -1" – 2 entries: atom 3 has charge +1, atom 5 has charge -1).
  • Isotope Block (M  ISO): Specifies isotopes as absolute masses (e.g., "M  ISO  1   2  18" – 1 entry: atom 2 has mass 18).
  • Radical Block (M  RAD): Specifies radical state (e.g., "M  RAD  1   3   2" – atom 3 is a doublet; 0=none, 1=singlet, 2=doublet, 3=triplet).
  • R-Group Block (M  RGP): Marks R-group label positions on atoms in generic structures.
  • Attachment Points (M  APO): Marks attachment points on R-group members.
  • Other M records: For stereo, enhanced stereo, 3D features (in V3000).
  • Terminator: "M  END" line ends the Molfile.
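
The charge, isotope and radical lines share one shape: a three-letter tag, an entry count, then atom-number/value pairs. The following sketch reads that shape; parse_property_line is a hypothetical helper, and it deliberately ignores every other "M  " record.

```python
from typing import Dict, Optional, Tuple


def parse_property_line(line: str) -> Optional[Tuple[str, Dict[int, int]]]:
    """Return (tag, {atom_number: value}) for M  CHG / M  ISO / M  RAD lines."""
    if not line.startswith("M  "):
        return None
    tag = line[3:6]
    if tag not in ("CHG", "ISO", "RAD"):
        return None  # includes "M  END" and any record not handled here
    tokens = line[6:].split()
    count = int(tokens[0])
    values = [int(t) for t in tokens[1:1 + 2 * count]]
    return tag, dict(zip(values[0::2], values[1::2]))


print(parse_property_line("M  CHG  2   3   1   5  -1"))
print(parse_property_line("M  RAD  1   3   2"))
print(parse_property_line("M  END"))
```

When such lines are present, they are generally taken to override the corresponding charge and mass-difference columns of the atom block, so a reader would apply them after parsing the atoms.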

V3000-Specific Properties (Extended Format)

  • CTAB Block: Enclosed in "M  V30 BEGIN CTAB" and "M  V30 END CTAB".
  • V3000 Counts Line: "M  V30 COUNTS na nb nsg n3d chiral [REGNO=nnn]".
  • na: atoms; nb: bonds; nsg: superatoms/groups; n3d: 3D features; chiral: 0/1.
  • Atom Block: Enclosed in "M  V30 BEGIN ATOM" / "M  V30 END ATOM"; each line gives index, type, x, y, z, and atom-atom mapping number, followed by optional key-value properties (e.g., "M  V30 1 C 0.0 0.0 0.0 0 CHG=1 MASS=13 CFG=2").
  • Additional keys: CHG (charge), MASS (isotope), VAL (valence), HCOUNT (hydrogen count), RAD (radical), etc.
  • Bond Block: Enclosed in "M  V30 BEGIN BOND" / "M  V30 END BOND"; each line gives index, type, and the two atom indices, followed by optional key-value properties (e.g., "M  V30 1 1 1 2 CFG=1").
  • Additional keys: CFG (stereo), TOPO (topology).
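
Unlike V2000, V3000 lines are whitespace-separated with trailing KEY=value pairs, so a tokenizing approach fits. This is a simplified sketch under stated assumptions: the atom line is taken to have the form "M  V30 index type x y z aamap [KEY=value ...]", and continuation lines (ending in "-"), quoted values and atom lists are not handled. The name parse_v30_atom is hypothetical.

```python
def parse_v30_atom(line: str) -> dict:
    """Parse one simple V3000 atom line into positional fields and properties."""
    prefix = "M  V30 "
    if not line.startswith(prefix):
        raise ValueError("not a V3000 line")
    tokens = line[len(prefix):].split()
    if len(tokens) < 6:
        raise ValueError("atom line needs index, type, x, y, z and aamap")
    x, y, z = (float(t) for t in tokens[2:5])
    return {
        "index": int(tokens[0]),
        "type": tokens[1],
        "x": x,
        "y": y,
        "z": z,
        "aamap": int(tokens[5]),
        # Optional KEY=value pairs; values are kept as strings
        "props": dict(t.split("=", 1) for t in tokens[6:] if "=" in t),
    }


v30_atom = parse_v30_atom("M  V30 1 C 0.0 0.0 0.0 0 CHG=1 MASS=13 CFG=2")
print(v30_atom["type"], v30_atom["props"])
```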

Data Items Properties (Post-Molfile, Per Record)

  • Header Line: Starts with ">"; followed by the field name in angle brackets (<field_name>) and/or a database field number ("DTn"); optional extra text, such as a registry number in parentheses, may also appear on the line.
  • Field name: Alphanumeric + underscores, starting with letter; no length limit.
  • Value Block: 0+ lines of data (up to 200 chars/line); terminated by a blank line (or end of record).
  • Valueless Fields: Header followed immediately by blank line.
  • Ordering: Arbitrary; no fixed order or required fields (field names are application-defined).
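
The data-item rules above translate into a small state machine: a ">" line opens a field, a blank line closes it, and everything in between is the value. The sketch below assumes it is handed only the lines that follow "M  END" within one record; parse_data_items and the sample field names are made up for this example.

```python
import re
from typing import Dict, List


def parse_data_items(lines: List[str]) -> Dict[str, str]:
    """Collect data items from the lines that follow "M  END" in one record."""
    items: Dict[str, str] = {}
    name = None
    values: List[str] = []
    for line in lines:
        if name is None:
            if line.startswith(">"):
                # Field name sits inside <...>; fall back to the rest of the line
                match = re.search(r"<([^>]*)>", line)
                name = match.group(1) if match else line[1:].strip()
                values = []
        elif line.strip() == "":
            items[name] = "\n".join(values)  # a blank line ends the value block
            name = None
        else:
            values.append(line)
    if name is not None:  # record ended without a closing blank line
        items[name] = "\n".join(values)
    return items


data_lines = ["> <NAME>", "acetone", "", ">  <MW> (1)", "58.08", "", "> <EMPTY>", ""]
print(parse_data_items(data_lines))
```

Once a field is open, this version treats every non-blank line as part of the value, so a value line that happens to start with ">" is not mistaken for a new header.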

These properties ensure the format's portability and extensibility for chemical informatics applications.

3. Ghost Blog Embedded HTML JavaScript for Drag-and-Drop .SDF Parsing

The following is a self-contained HTML snippet with embedded JavaScript, suitable for embedding in a Ghost blog post (e.g., via the HTML card). It enables drag-and-drop of an .SDF file, parses it using vanilla JS (no libraries), extracts key properties (record count, per-record headers, atom/bond counts, sample atoms/bonds, and all data fields), and dumps them to a scrollable <pre> element on screen. Write functionality reconstructs a basic .SDF file and offers download. Parsing is basic (assumes V2000 for simplicity; extend for V3000).

Drag and drop an .SDF file here to parse properties.




4. Python Class for .SDF Handling

The following Python class (SDFHandler) opens an .SDF file, parses it (basic V2000 support), decodes and prints all listed properties to console, and supports writing a reconstructed file. It uses standard libraries (no external dependencies like RDKit for portability).

import re
from typing import Dict, List, Any

class SDFHandler:
    def __init__(self, filepath: str = None):
        self.filepath = filepath
        self.content = ''
        self.records: List[Dict[str, Any]] = []

    def read(self) -> int:
        """Read and split into records."""
        if self.filepath:
            with open(self.filepath, 'r') as f:
                self.content = f.read()
        lines = self.content.splitlines()
        record_lines = []
        current_record = []
        for line in lines:
            if line.strip() == '$$$$':
                if current_record:
                    record_lines.append('\n'.join(current_record))
                    current_record = []
            else:
                current_record.append(line)
        if current_record:
            record_lines.append('\n'.join(current_record))
        self.records = [self._extract_properties(rec) for rec in record_lines]
        return len(self.records)

    def _extract_properties(self, record_content: str) -> Dict[str, Any]:
        """Extract all intrinsic properties from a record."""
        lines = record_content.splitlines()
        props = {'header': {}, 'atoms': [], 'bonds': [], 'data_fields': {}}
        # Header
        if len(lines) > 0: props['header']['title'] = lines[0].strip()
        if len(lines) > 1: props['header']['timestamp'] = lines[1].strip()
        if len(lines) > 2: props['header']['comment'] = lines[2].strip()
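        # Counts line: fixed-width, space-padded fields (columns 1-3 atoms, 4-6 bonds)
        if len(lines) > 3:
            try:
                props['header']['atoms_count'] = int(lines[3][0:3])
                props['header']['bonds_count'] = int(lines[3][3:6])
            except ValueError:
                # Malformed counts line: drop any partial result so both default to 0
                props['header'].pop('atoms_count', None)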
        # Atoms
        atom_start = 4
        for i in range(props['header'].get('atoms_count', 0)):
            line = lines[atom_start + i] if atom_start + i < len(lines) else ''
            atom_match = re.match(r'(.{10})(.{10})(.{10})\s*(\w{1,3})', line)
            if atom_match:
                props['atoms'].append({
                    'x': float(atom_match.group(1).strip()),
                    'y': float(atom_match.group(2).strip()),
                    'z': float(atom_match.group(3).strip()),
                    'element': atom_match.group(4).strip()
                })
        # Bonds
        bond_start = atom_start + props['header'].get('atoms_count', 0)
        for i in range(props['header'].get('bonds_count', 0)):
            line = lines[bond_start + i] if bond_start + i < len(lines) else ''
            # Bond fields are 3 characters wide and space-padded, so slice by column
            try:
                props['bonds'].append({
                    'atom1': int(line[0:3]),
                    'atom2': int(line[3:6]),
                    'type': int(line[6:9])
                })
            except ValueError:
                continue  # Skip short or malformed bond lines
        # Data fields (simplified): these follow the "M  END" line of the molfile
        data_start = bond_start + props['header'].get('bonds_count', 0)
        for i in range(data_start, len(lines)):
            if lines[i].strip() == 'M  END':
                data_start = i + 1
                break
        data_key = None
        data_value = []
        for line in lines[data_start:]:
            line_trim = line.strip()
            if line_trim.startswith('>'):
                if data_key is not None:
                    props['data_fields'][data_key] = '\n'.join(data_value).strip()
                # Header line, e.g. "> <NAME>" or ">  <NAME> (1)"
                name_match = re.search(r'<([^>]*)>', line_trim)
                data_key = name_match.group(1) if name_match else line_trim[1:].strip()
                data_value = []
            elif line_trim == '' and data_key is not None:
                props['data_fields'][data_key] = '\n'.join(data_value).strip()
                data_key = None
                data_value = []
            elif data_key is not None:
                data_value.append(line_trim)
        if data_key is not None:
            props['data_fields'][data_key] = '\n'.join(data_value).strip()
        return props

    def print_properties(self):
        """Print all properties to console."""
        print(f"Number of records: {len(self.records)}\n")
        for idx, rec in enumerate(self.records, 1):
            print(f"--- Record {idx} ---")
            print(f"Title: {rec['header'].get('title', 'N/A')}")
            print(f"Timestamp: {rec['header'].get('timestamp', 'N/A')}")
            print(f"Comment: {rec['header'].get('comment', 'N/A')}")
            print(f"Atoms count: {rec['header'].get('atoms_count', 0)}, Bonds count: {rec['header'].get('bonds_count', 0)}")
            print("Sample Atoms (first 3):")
            for atom in rec['atoms'][:3]:
                print(f"  {atom['element']}: ({atom['x']:.4f}, {atom['y']:.4f}, {atom['z']:.4f})")
            print("Sample Bonds (first 3):")
            for bond in rec['bonds'][:3]:
                print(f"  {bond['atom1']}-{bond['atom2']} (type {bond['type']})")
            print("Data Fields:")
            for key, val in rec['data_fields'].items():
                print(f"  {key}: {val[:100]}...")
            print("\n")

    def write(self, output_path: str):
        """Write reconstructed SDF to file."""
        with open(output_path, 'w') as f:
            for rec in self.records:
                f.write(f"{rec['header'].get('title', '')}\n")
                f.write(f"{rec['header'].get('timestamp', '')}\n")
                f.write(f"{rec['header'].get('comment', '')}\n")
                atoms_cnt = rec['header'].get('atoms_count', 0)
                bonds_cnt = rec['header'].get('bonds_count', 0)
                # Counts line: eleven 3-character fields, then the version stamp
                f.write(f"{atoms_cnt:3d}{bonds_cnt:3d}  0  0  0  0  0  0  0  0999 V2000\n")
                for atom in rec['atoms']:
                    x_str = f"{atom['x']:.4f}".rjust(10)
                    y_str = f"{atom['y']:.4f}".rjust(10)
                    z_str = f"{atom['z']:.4f}".rjust(10)
                    elem_str = atom['element'].ljust(3)
                    # Symbol fills columns 32-34; the mass-difference field after it is 2 wide
                    f.write(f"{x_str}{y_str}{z_str} {elem_str} 0  0  0  0  0  0  0  0  0  0  0  0\n")
                for bond in rec['bonds']:
                    # Bond fields are right-justified and space-padded, not zero-padded
                    f.write(f"{bond['atom1']:3d}{bond['atom2']:3d}{bond['type']:3d}  0  0  0  0\n")
                f.write("M  END\n")
                for key, val in rec['data_fields'].items():
                    f.write(f"> <{key}>\n{val}\n\n")
                f.write("$$$$\n")

# Example usage:
# handler = SDFHandler('example.sdf')
# handler.read()
# handler.print_properties()
# handler.write('output.sdf')

5. Java Class for .SDF Handling

The following Java class (SDFHandler) uses standard I/O for reading/writing .SDF files, parses basic properties (V2000), and prints to console via System.out. It supports reconstruction for writing.

import java.io.*;
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SDFHandler {
    private String filepath;
    private String content;
    private List<Map<String, Object>> records = new ArrayList<>();

    public SDFHandler(String filepath) {
        this.filepath = filepath;
    }

    public int read() throws IOException {
        try (BufferedReader br = new BufferedReader(new FileReader(filepath))) {
            StringBuilder sb = new StringBuilder();
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append("\n");
            }
            content = sb.toString();
        }
        // Split on whole "$$$$" delimiter lines so each record starts at its title line
        String[] recordStrs = content.split("(?m)^\\$\\$\\$\\$[ \\t]*\\n?");
        for (String rec : recordStrs) {
            if (!rec.trim().isEmpty()) {
                records.add(extractProperties(rec));
            }
        }
        return records.size();
    }

    private Map<String, Object> extractProperties(String recordContent) {
        String[] lines = recordContent.split("\n");
        Map<String, Object> props = new HashMap<>();
        Map<String, String> header = new HashMap<>();
        List<Map<String, Object>> atoms = new ArrayList<>();
        List<Map<String, Object>> bonds = new ArrayList<>();
        Map<String, String> dataFields = new HashMap<>();

        // Header
        if (lines.length > 0) header.put("title", lines[0].trim());
        if (lines.length > 1) header.put("timestamp", lines[1].trim());
        if (lines.length > 2) header.put("comment", lines[2].trim());
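        // Counts line: fixed-width, space-padded fields (columns 1-3 atoms, 4-6 bonds)
        int atomsCount = 0, bondsCount = 0;
        if (lines.length > 3) {
            Pattern countPat = Pattern.compile("^([ \\d]{2}\\d)([ \\d]{2}\\d)");
            Matcher m = countPat.matcher(lines[3]);
            if (m.find()) {
                atomsCount = Integer.parseInt(m.group(1).trim());
                bondsCount = Integer.parseInt(m.group(2).trim());
            }
        }
        // The header map holds strings, so the counts are stored as text
        header.put("atoms_count", String.valueOf(atomsCount));
        header.put("bonds_count", String.valueOf(bondsCount));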

        // Atoms
        int atomStart = 4;
        Pattern atomPat = Pattern.compile("(.{10})(.{10})(.{10})\\s*(\\w{1,3})");
        for (int i = 0; i < atomsCount; i++) {
            if (atomStart + i < lines.length) {
                Matcher atomM = atomPat.matcher(lines[atomStart + i]);
                if (atomM.find()) {
                    Map<String, Object> atom = new HashMap<>();
                    atom.put("x", Double.parseDouble(atomM.group(1).trim()));
                    atom.put("y", Double.parseDouble(atomM.group(2).trim()));
                    atom.put("z", Double.parseDouble(atomM.group(3).trim()));
                    atom.put("element", atomM.group(4).trim());
                    atoms.add(atom);
                }
            }
        }

        // Bonds
        int bondStart = atomStart + atomsCount;
        // Bond fields are 3 characters wide, right-justified and space-padded
        Pattern bondPat = Pattern.compile("^([ \\d]{2}\\d)([ \\d]{2}\\d)([ \\d]{2}\\d)");
        for (int i = 0; i < bondsCount; i++) {
            if (bondStart + i < lines.length) {
                Matcher bondM = bondPat.matcher(lines[bondStart + i]);
                if (bondM.find()) {
                    Map<String, Object> bond = new HashMap<>();
                    bond.put("atom1", Integer.parseInt(bondM.group(1).trim()));
                    bond.put("atom2", Integer.parseInt(bondM.group(2).trim()));
                    bond.put("type", Integer.parseInt(bondM.group(3).trim()));
                    bonds.add(bond);
                }
            }
        }

        // Data fields (basic): these follow the "M  END" line of the molfile
        int dataStart = bondStart + bondsCount;
        for (int i = dataStart; i < lines.length; i++) {
            if (lines[i].trim().equals("M  END")) {
                dataStart = i + 1;
                break;
            }
        }
        String dataKey = null;
        StringBuilder dataVal = new StringBuilder();
        for (int i = dataStart; i < lines.length; i++) {
            String ltrim = lines[i].trim();
            if (ltrim.startsWith(">")) {
                if (dataKey != null) {
                    dataFields.put(dataKey, dataVal.toString().trim());
                }
                // The field name sits between '<' and the next '>' after it
                int startIdx = ltrim.indexOf('<');
                int endIdx = (startIdx > -1) ? ltrim.indexOf('>', startIdx) : -1;
                dataKey = (endIdx > -1) ? ltrim.substring(startIdx + 1, endIdx) : ltrim.substring(1).trim();
                dataVal = new StringBuilder();
            } else if (ltrim.isEmpty() && dataKey != null) {
                dataFields.put(dataKey, dataVal.toString().trim());
                dataKey = null;
            } else if (dataKey != null) {
                dataVal.append(ltrim).append("\n");
            }
        }
        if (dataKey != null) {
            dataFields.put(dataKey, dataVal.toString().trim());
        }

        props.put("header", header);
        props.put("atoms", atoms);
        props.put("bonds", bonds);
        props.put("data_fields", dataFields);
        return props;
    }

    public void printProperties() {
        System.out.println("Number of records: " + records.size() + "\n");
        for (int idx = 0; idx < records.size(); idx++) {
            Map<String, Object> rec = records.get(idx);
            System.out.println("--- Record " + (idx + 1) + " ---");
            @SuppressWarnings("unchecked")
            Map<String, String> h = (Map<String, String>) rec.get("header");
            System.out.println("Title: " + h.getOrDefault("title", "N/A"));
            System.out.println("Atoms count: " + h.getOrDefault("atoms_count", "0") + ", Bonds count: " + h.getOrDefault("bonds_count", "0"));
            System.out.println("Sample Atoms (first 3):");
            @SuppressWarnings("unchecked")
            List<Map<String, Object>> atoms = (List<Map<String, Object>>) rec.get("atoms");
            for (int j = 0; j < Math.min(3, atoms.size()); j++) {
                Map<String, Object> a = atoms.get(j);
                System.out.println("  " + a.get("element") + ": (" + String.format("%.4f", (Double) a.get("x")) + ", " +
                                   String.format("%.4f", (Double) a.get("y")) + ", " + String.format("%.4f", (Double) a.get("z")) + ")");
            }
            System.out.println("Sample Bonds (first 3):");
            @SuppressWarnings("unchecked")
            List<Map<String, Object>> bonds = (List<Map<String, Object>>) rec.get("bonds");
            for (int j = 0; j < Math.min(3, bonds.size()); j++) {
                Map<String, Object> b = bonds.get(j);
                System.out.println("  " + b.get("atom1") + "-" + b.get("atom2") + " (type " + b.get("type") + ")");
            }
            System.out.println("Data Fields:");
            @SuppressWarnings("unchecked")
            Map<String, String> dfs = (Map<String, String>) rec.get("data_fields");
            for (Map.Entry<String, String> entry : dfs.entrySet()) {
                System.out.println("  " + entry.getKey() + ": " + entry.getValue().substring(0, Math.min(100, entry.getValue().length())) + "...");
            }
            System.out.println();
        }
    }

    public void write(String outputPath) throws IOException {
        try (PrintWriter pw = new PrintWriter(new FileWriter(outputPath))) {
            for (Map<String, Object> rec : records) {
                @SuppressWarnings("unchecked")
                Map<String, String> h = (Map<String, String>) rec.get("header");
                pw.println(h.getOrDefault("title", ""));
                pw.println(h.getOrDefault("timestamp", ""));
                pw.println(h.getOrDefault("comment", ""));
                int ac = Integer.parseInt(h.getOrDefault("atoms_count", "0"));
                int bc = Integer.parseInt(h.getOrDefault("bonds_count", "0"));
                // Counts line: eleven 3-character fields, then the version stamp
                pw.printf("%3d%3d  0  0  0  0  0  0  0  0999 V2000%n", ac, bc);
                @SuppressWarnings("unchecked")
                List<Map<String, Object>> atoms = (List<Map<String, Object>>) rec.get("atoms");
                for (Map<String, Object> atom : atoms) {
                    String x = String.format("%10.4f", (Double) atom.get("x"));
                    String y = String.format("%10.4f", (Double) atom.get("y"));
                    String z = String.format("%10.4f", (Double) atom.get("z"));
                    String elem = ((String) atom.get("element")).replaceAll("\\s", "");
                    // Symbol is left-justified in columns 32-34; the next field is 2 wide
                    pw.printf("%s%s%s %-3s 0  0  0  0  0  0  0  0  0  0  0  0%n", x, y, z, elem);
                }
                @SuppressWarnings("unchecked")
                List<Map<String, Object>> bonds = (List<Map<String, Object>>) rec.get("bonds");
                for (Map<String, Object> bond : bonds) {
                    pw.printf("%3s%3s%3s  0  0  0  0%n", bond.get("atom1"), bond.get("atom2"), bond.get("type"));
                }
                pw.println("M  END");
                @SuppressWarnings("unchecked")
                Map<String, String> dfs = (Map<String, String>) rec.get("data_fields");
                for (Map.Entry<String, String> entry : dfs.entrySet()) {
                    pw.println("> <" + entry.getKey() + ">");
                    pw.println(entry.getValue());
                    pw.println();
                }
                pw.println("$$$$");
            }
        }
    }

    // Example usage:
    // SDFHandler handler = new SDFHandler("example.sdf");
    // handler.read();
    // handler.printProperties();
    // handler.write("output.sdf");
}

6. JavaScript Class for .SDF Handling

The following JavaScript class (SDFHandler) is Node.js-compatible (uses fs module) for opening files; for browser use, adapt read() to FileReader. It decodes, prints to console, and supports writing. Basic V2000 parsing.

const fs = require('fs');

class SDFHandler {
  constructor(filepath = null) {
    this.filepath = filepath;
    this.content = '';
    this.records = [];
  }

  async read() {
    if (this.filepath) {
      this.content = fs.readFileSync(this.filepath, 'utf8');
    }
    // Split on whole "$$$$" delimiter lines; do not trim, so blank title lines survive
    const recordStrs = this.content.split(/^\$\$\$\$[ \t]*\r?\n?/m);
    this.records = recordStrs.filter(rec => rec.trim() !== '').map(rec => this.extractProperties(rec));
    return this.records.length;
  }

  extractProperties(recordContent) {
    const lines = recordContent.split(/\r?\n/); // accept LF and CRLF line endings
    const props = { header: {}, atoms: [], bonds: [], dataFields: {} };
    // Header
    if (lines[0]) props.header.title = lines[0].trim();
    if (lines[1]) props.header.timestamp = lines[1].trim();
    if (lines[2]) props.header.comment = lines[2].trim();
    // Counts line: fixed-width, space-padded fields (columns 1-3 atoms, 4-6 bonds)
    if (lines[3]) {
      const countMatch = lines[3].match(/^([ \d]{2}\d)([ \d]{2}\d)/);
      if (countMatch) {
        props.header.atomsCount = parseInt(countMatch[1], 10);
        props.header.bondsCount = parseInt(countMatch[2], 10);
      }
    }
    // Atoms
    const atomStart = 4;
    for (let i = 0; i < (props.header.atomsCount || 0); i++) {
      const line = lines[atomStart + i] || '';
      const atomMatch = line.match(/(.{10})(.{10})(.{10})\s*(\w{1,3})/);
      if (atomMatch) {
        props.atoms.push({
          x: parseFloat(atomMatch[1].trim()),
          y: parseFloat(atomMatch[2].trim()),
          z: parseFloat(atomMatch[3].trim()),
          element: atomMatch[4].trim()
        });
      }
    }
    // Bonds
    const bondStart = atomStart + (props.header.atomsCount || 0);
    for (let i = 0; i < (props.header.bondsCount || 0); i++) {
      const line = lines[bondStart + i] || '';
      // Bond fields are 3 characters wide, right-justified and space-padded
      const bondMatch = line.match(/^([ \d]{2}\d)([ \d]{2}\d)([ \d]{2}\d)/);
      if (bondMatch) {
        props.bonds.push({
          atom1: parseInt(bondMatch[1]),
          atom2: parseInt(bondMatch[2]),
          type: parseInt(bondMatch[3])
        });
      }
    }
    // Data fields: these follow the "M  END" line of the molfile
    let dataStart = bondStart + (props.header.bondsCount || 0);
    const endIdx = lines.findIndex((l, i) => i >= dataStart && l.trim() === 'M  END');
    if (endIdx !== -1) dataStart = endIdx + 1;
    let dataKey = null;
    let dataValue = [];
    for (let i = dataStart; i < lines.length; i++) {
      const lineTrim = lines[i].trim();
      if (lineTrim.startsWith('>')) {
        if (dataKey !== null) {
          props.dataFields[dataKey] = dataValue.join('\n').trim();
        }
        // Header line, e.g. "> <NAME>" or ">  <NAME> (1)"
        const match = lineTrim.match(/<([^>]*)>/);
        dataKey = match ? match[1] : lineTrim.slice(1).trim();
        dataValue = [];
      } else if (lineTrim === '' && dataKey !== null) {
        props.dataFields[dataKey] = dataValue.join('\n').trim();
        dataKey = null;
      } else if (dataKey !== null) {
        dataValue.push(lineTrim);
      }
    }
    if (dataKey !== null) {
      props.dataFields[dataKey] = dataValue.join('\n').trim();
    }
    return props;
  }

  printProperties() {
    console.log(`Number of records: ${this.records.length}\n`);
    this.records.forEach((rec, idx) => {
      console.log(`--- Record ${idx + 1} ---`);
      console.log(`Title: ${rec.header.title || 'N/A'}`);
      console.log(`Atoms count: ${rec.header.atomsCount || 0}, Bonds count: ${rec.header.bondsCount || 0}`);
      console.log('Sample Atoms (first 3):');
      rec.atoms.slice(0, 3).forEach(atom => {
        console.log(`  ${atom.element}: (${atom.x.toFixed(4)}, ${atom.y.toFixed(4)}, ${atom.z.toFixed(4)})`);
      });
      console.log('Sample Bonds (first 3):');
      rec.bonds.slice(0, 3).forEach(bond => {
        console.log(`  ${bond.atom1}-${bond.atom2} (type ${bond.type})`);
      });
      console.log('Data Fields:');
      Object.entries(rec.dataFields).forEach(([key, val]) => {
        console.log(`  ${key}: ${val.substring(0, 100)}...`);
      });
      console.log('');
    });
  }

  write(outputPath) {
    let sdfContent = '';
    this.records.forEach(rec => {
      sdfContent += `${rec.header.title || ''}\n`;
      sdfContent += `${rec.header.timestamp || ''}\n`;
      sdfContent += `${rec.header.comment || ''}\n`;
      const ac = rec.header.atomsCount || 0;
      const bc = rec.header.bondsCount || 0;
      // Counts line: eleven 3-character space-padded fields, then the version stamp
      sdfContent += `${ac.toString().padStart(3)}${bc.toString().padStart(3)}  0  0  0  0  0  0  0  0999 V2000\n`;
      rec.atoms.forEach(atom => {
        const xStr = atom.x.toFixed(4).padStart(10);
        const yStr = atom.y.toFixed(4).padStart(10);
        const zStr = atom.z.toFixed(4).padStart(10);
        const elemStr = atom.element.padEnd(3, ' ');
        // Symbol fills columns 32-34; the mass-difference field after it is 2 wide
        sdfContent += `${xStr}${yStr}${zStr} ${elemStr} 0  0  0  0  0  0  0  0  0  0  0  0\n`;
      });
      rec.bonds.forEach(bond => {
        // Bond fields are space-padded, not zero-padded
        sdfContent += `${bond.atom1.toString().padStart(3)}${bond.atom2.toString().padStart(3)}${bond.type.toString().padStart(3)}  0  0  0  0\n`;
      });
      sdfContent += 'M  END\n';
      Object.entries(rec.dataFields).forEach(([key, val]) => {
        sdfContent += `> <${key}>\n${val}\n\n`;
      });
      sdfContent += '$$$$\n';
    });
    fs.writeFileSync(outputPath, sdfContent);
    return sdfContent;
  }
}

// Example usage (Node.js):
// const handler = new SDFHandler('example.sdf');
// handler.read().then(() => {
//   handler.printProperties();
//   handler.write('output.sdf');
// });

7. C Code for .SDF Handling

C does not have classes, so the following is a modular set of functions structured as a "handler" with a context struct (SDFHandler). It uses standard C libraries (stdio.h, stdlib.h, string.h) for file I/O and basic parsing (V2000). Compile with gcc sdf_handler.c -o sdf_handler. The print_properties function outputs to stdout, and write reconstructs the file.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 256
#define MAX_RECORDS 1000
#define MAX_FIELDS 100

typedef struct {
    char title[81];
    char timestamp[81];
    char comment[81];
    int atoms_count;
    int bonds_count;
    struct {
        char element[4];
        double x, y, z;
    } atoms[1000];  // Assume max 1000 atoms
    int atoms_num;
    struct {
        int atom1, atom2, type;
    } bonds[1000];  // Assume max 1000 bonds
    int bonds_num;
    char data_fields[MAX_FIELDS][81];  // Key-value pairs (simplified)
    int fields_num;
} Record;

typedef struct {
    char *filepath;
    char *content;
    Record records[MAX_RECORDS];
    int records_num;
} SDFHandler;

SDFHandler *sdf_handler_new(const char *filepath) {
    SDFHandler *h = malloc(sizeof(SDFHandler));
    h->filepath = strdup(filepath);
    h->content = NULL;
    h->records_num = 0;
    return h;
}

void sdf_handler_free(SDFHandler *h) {
    free(h->filepath);
    free(h->content);
    free(h);
}

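#define MAX_RECORD_LINES 4096

/* Parse a fixed-width integer field; returns 0 for blank or missing fields. */
static int parse_int_field(const char *line, size_t start, size_t width) {
    char buf[16] = {0};
    if (start >= strlen(line) || width >= sizeof buf) return 0;
    strncpy(buf, line + start, width);
    return atoi(buf);
}

/* Defined before sdf_handler_read so that no forward declaration is needed. */
void extract_properties(Record *rec, char *content) {
    /* Split in place on '\n', keeping empty lines: blank title or comment
       lines are common and must not shift the line indices. */
    char *lines[MAX_RECORD_LINES];
    int line_count = 0;
    char *p = content;
    while (p && *p && line_count < MAX_RECORD_LINES) {
        lines[line_count++] = p;
        char *nl = strchr(p, '\n');
        if (nl) {
            if (nl > p && nl[-1] == '\r') nl[-1] = '\0';  /* CRLF input */
            *nl = '\0';
            p = nl + 1;
        } else {
            p = NULL;
        }
    }

    memset(rec, 0, sizeof *rec);  /* Also guarantees NUL-terminated strings */

    /* Header */
    if (line_count > 0) strncpy(rec->title, lines[0], 80);
    if (line_count > 1) strncpy(rec->timestamp, lines[1], 80);
    if (line_count > 2) strncpy(rec->comment, lines[2], 80);

    /* Counts: columns 1-3 atoms, 4-6 bonds */
    if (line_count > 3) {
        rec->atoms_count = parse_int_field(lines[3], 0, 3);
        rec->bonds_count = parse_int_field(lines[3], 3, 3);
    }
    if (rec->atoms_count < 0) rec->atoms_count = 0;
    if (rec->bonds_count < 0) rec->bonds_count = 0;

    /* Atoms (basic): coordinates and symbol only */
    int atom_start = 4;
    for (int i = 0; i < rec->atoms_count && i < 1000 && atom_start + i < line_count; i++) {
        int n = rec->atoms_num;
        if (sscanf(lines[atom_start + i], "%lf %lf %lf %3s", &rec->atoms[n].x,
                   &rec->atoms[n].y, &rec->atoms[n].z, rec->atoms[n].element) == 4) {
            rec->atoms_num++;
        }
    }

    /* Bonds (basic): three fixed-width fields */
    int bond_start = atom_start + rec->atoms_count;
    for (int i = 0; i < rec->bonds_count && i < 1000 && bond_start + i < line_count; i++) {
        int n = rec->bonds_num++;
        rec->bonds[n].atom1 = parse_int_field(lines[bond_start + i], 0, 3);
        rec->bonds[n].atom2 = parse_int_field(lines[bond_start + i], 3, 3);
        rec->bonds[n].type = parse_int_field(lines[bond_start + i], 6, 3);
    }

    /* Data fields (simplified): keep only the names found after "M  END" */
    int i = bond_start + rec->bonds_count;
    while (i < line_count && strncmp(lines[i], "M  END", 6) != 0) i++;
    for (i++; i < line_count && rec->fields_num < MAX_FIELDS; i++) {
        if (lines[i][0] != '>') continue;
        const char *lt = strchr(lines[i], '<');
        const char *gt = lt ? strchr(lt, '>') : NULL;
        if (!gt) continue;
        size_t n = (size_t)(gt - lt - 1);
        if (n > 80) n = 80;
        memcpy(rec->data_fields[rec->fields_num++], lt + 1, n);
    }
}

int sdf_handler_read(SDFHandler *h) {
    FILE *f = fopen(h->filepath, "r");
    if (!f) return -1;
    fseek(f, 0, SEEK_END);
    long len = ftell(f);
    fseek(f, 0, SEEK_SET);
    if (len < 0) { fclose(f); return -1; }
    free(h->content);
    h->content = malloc((size_t)len + 1);
    if (!h->content) { fclose(f); return -1; }
    size_t got = fread(h->content, 1, (size_t)len, f);
    h->content[got] = '\0';
    fclose(f);

    char *ptr = h->content;
    int rec_idx = 0;
    while (rec_idx < MAX_RECORDS) {
        char *next = strstr(ptr, "$$$$");
        if (!next) break;
        *next = '\0';  /* Null-terminate record */
        if (*ptr != '\0') {
            extract_properties(&h->records[rec_idx], ptr);
            rec_idx++;
        }
        ptr = next + 4;
        /* Skip the rest of the delimiter line so the next record starts at its title */
        if (*ptr == '\r') ptr++;
        if (*ptr == '\n') ptr++;
    }
    h->records_num = rec_idx;
    return rec_idx;
}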

void sdf_handler_print_properties(SDFHandler *h) {
    printf("Number of records: %d\n\n", h->records_num);
    for (int idx = 0; idx < h->records_num; idx++) {
        Record *rec = &h->records[idx];
        printf("--- Record %d ---\n", idx + 1);
        printf("Title: %s\n", rec->title);
        printf("Atoms count: %d, Bonds count: %d\n", rec->atoms_count, rec->bonds_count);
        printf("Sample Atoms (first 3):\n");
        for (int j = 0; j < 3 && j < rec->atoms_num; j++) {
            printf("  %s: (%.4f, %.4f, %.4f)\n", rec->atoms[j].element, rec->atoms[j].x, rec->atoms[j].y, rec->atoms[j].z);
        }
        printf("Sample Bonds (first 3):\n");
        for (int j = 0; j < 3 && j < rec->bonds_num; j++) {
            printf("  %d-%d (type %d)\n", rec->bonds[j].atom1, rec->bonds[j].atom2, rec->bonds[j].type);
        }
        printf("Data Fields:\n");
        for (int j = 0; j < rec->fields_num; j++) {
            printf("  %s\n", rec->data_fields[j]);
        }
        printf("\n");
    }
}

void sdf_handler_write(SDFHandler *h, const char *output_path) {
    FILE *f = fopen(output_path, "w");
    if (!f) return;
    for (int i = 0; i < h->records_num; i++) {
        Record *rec = &h->records[i];
        fprintf(f, "%s\n", rec->title);
        fprintf(f, "%s\n", rec->timestamp);
        fprintf(f, "%s\n", rec->comment);
        /* Counts line: eleven 3-character space-padded fields, then the version stamp */
        fprintf(f, "%3d%3d  0  0  0  0  0  0  0  0999 V2000\n", rec->atoms_count, rec->bonds_count);
        for (int j = 0; j < rec->atoms_num; j++) {
            fprintf(f, "%10.4f%10.4f%10.4f %-3s 0  0  0  0  0  0  0  0  0  0  0  0\n",
                    rec->atoms[j].x, rec->atoms[j].y, rec->atoms[j].z, rec->atoms[j].element);
        }
        for (int j = 0; j < rec->bonds_num; j++) {
            fprintf(f, "%3d%3d%3d  0  0  0  0\n", rec->bonds[j].atom1, rec->bonds[j].atom2, rec->bonds[j].type);
        }
        fprintf(f, "M  END\n");
        for (int j = 0; j < rec->fields_num; j++) {
            fprintf(f, "> <%s>\n\n", rec->data_fields[j]);  // Values omitted for simplicity
        }
        fprintf(f, "$$$$\n");  /* Record delimiter is four dollar signs */
    }
    fclose(f);
}

// Example usage:
// int main() {
//   SDFHandler *h = sdf_handler_new("example.sdf");
//   sdf_handler_read(h);
//   sdf_handler_print_properties(h);
//   sdf_handler_write(h, "output.sdf");
//   sdf_handler_free(h);
//   return 0;
// }