Task 409: .MOL File Format
1. List of all the properties intrinsic to this file format
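The .MOL file format (specifically the MDL Molfile V2000 format) is a text-based format for representing molecular structures. The "properties" here are the data fields and sections defined within the file itself. Based on the official CTfile Formats specification, here is a list of the intrinsic properties (sections, fields, and subfields) of the V2000 layout. The newer V3000 (extended) molfile uses a different, tag-based layout and is not covered. All fields are fixed-width, written in Fortran-style notation: I = integer, F = real, A = text, followed by the column width.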
Header Block:
- Molecule Name (Line 1: free text, up to 80 characters)
- User Initials (Line 2: A2)
- Program Name (Line 2: A8)
- Date/Time (Line 2: A10, format MMDDYYHHmm)
- Dimensional Codes (Line 2: A2, e.g., "2D" or "3D")
- Scaling Factors (Line 2: I2)
- Coordinate Scaling Factor (Line 2: F10.5)
- Energy (Line 2: F12.5, if from modeling program)
- Internal Registry Number (Line 2: I6)
- Comment (Line 3: free text, up to 80 characters)
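To make the column positions of header line 2 concrete, here is a minimal Python sketch. The function `parse_header_line2` and the sample line are illustrative assumptions (the sample only imitates the shape of an OEChem-style line); blank or missing optional fields are treated as 0.

```python
def parse_header_line2(line: str) -> dict:
    """Split header line 2 (IIPPPPPPPPMMDDYYHHmmddSSssssssssssEEEEEEEEEEEERRRRRR)
    into named fields; columns past the end of a short line come back empty."""
    def num(text: str, cast, default):
        # Blank or missing columns fall back to the default value
        text = text.strip()
        return cast(text) if text else default

    return {
        "user_initials": line[0:2].strip(),
        "program_name": line[2:10].strip(),
        "date_time": line[10:20].strip(),          # MMDDYYHHmm
        "dimensional_codes": line[20:22].strip(),  # "2D" or "3D"
        "scaling_major": num(line[22:24], int, 0),
        "scaling_minor": num(line[24:34], float, 0.0),
        "energy": num(line[34:46], float, 0.0),
        "registry_number": num(line[46:52], int, 0),
    }


# Hypothetical 22-column line 2 with none of the optional numeric fields present
fields = parse_header_line2("  -OEChem-03172510382D")
print(fields["program_name"], fields["date_time"], fields["dimensional_codes"])
```

Many writers stop after the dimensional code, so a reader has to tolerate a line 2 that is only 22 columns long.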
Counts Line:
- Number of Atoms (aaa: I3, 000-999)
- Number of Bonds (bbb: I3, 000-999)
- Number of Atom Lists (lll: I3, 000-999, obsolete)
- Obsolete Field (fff: I3, set to 000)
- Chiral Flag (ccc: I3, 0=not chiral, 1=chiral)
- Obsolete Fields (sss, xxx, rrr, ppp, iii: I3, set to 000)
- Number of Additional Properties Lines (mmm: I3; no longer used, always written as 999)
- Version Stamp (vvvvvv: "V2000")
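The counts line is eleven I3 fields followed by the version stamp, so a writer has to emit the obsolete fields too; dropping them shifts `mmm` and `V2000` out of their columns. A sketch, assuming a hypothetical helper `format_counts_line`:

```python
def format_counts_line(num_atoms: int, num_bonds: int, num_atom_lists: int = 0,
                       chiral_flag: int = 0) -> str:
    """Build a V2000 counts line: eleven I3 fields, then the A6 version stamp."""
    fields = [
        num_atoms, num_bonds, num_atom_lists,
        0,               # fff: obsolete
        chiral_flag,     # ccc
        0, 0, 0, 0, 0,   # sss, xxx, rrr, ppp, iii: obsolete
        999,             # mmm: no longer used, always 999
    ]
    return "".join(f"{v:>3}" for v in fields) + " V2000"


line = format_counts_line(6, 6)   # e.g. benzene: 6 atoms, 6 bonds
print(repr(line))                 # 39 columns; mmm at [30:33], "V2000" at [34:39]
```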
Atom Block (one line per atom, up to aaa atoms):
- X Coordinate (F10.4)
- Y Coordinate (F10.4)
- Z Coordinate (F10.4)
- Atom Symbol (A3, e.g., "C", "N", "R#", "*")
- Mass Difference (I2, -3 to +4)
- Charge (I3, code 0-7: 0=uncharged, 1=+3, 2=+2, 3=+1, 4=doublet radical, 5=-1, 6=-2, 7=-3)
- Stereo Parity (I3, 0=not stereo, 1=odd, 2=even, 3=either)
- Hydrogen Count +1 (I3, query only)
- Stereo Care Box (I3, query only)
- Valence (I3, 0=no marking (default), 1-14=explicit valence, 15=zero valence)
- H0 Designator (I3, query only)
- Unused (rrr: I3, 000)
- Unused (iii: I3, 000)
- Atom-Atom Mapping (I3, reaction only)
- Inversion/Retention Flag (I3, reaction only)
- Exact Change Flag (I3, reaction/query only)
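A sketch of decoding one atom line, including the three reaction columns at the end (61-63 mapping, 64-66 inversion/retention, 67-69 exact change) and the translation of the `ccc` charge code. The names `parse_atom_line`, `CHARGE_CODES` and `ATOM_INT_FIELDS` are hypothetical. The spec also says that `M  CHG` and `M  RAD` lines in the properties block supersede these atom-block values; this sketch does not apply that override.

```python
# Atom-block charge codes (ccc) -> formal charge; code 4 is a doublet radical, not a charge
CHARGE_CODES = {0: 0, 1: 3, 2: 2, 3: 1, 4: 0, 5: -1, 6: -2, 7: -3}

# (name, start, end) as 0-based slices for the I3 fields after the mass difference
ATOM_INT_FIELDS = [
    ("charge", 36, 39), ("stereo_parity", 39, 42), ("h_count", 42, 45),
    ("stereo_care", 45, 48), ("valence", 48, 51), ("h0", 51, 54),
    ("mapping", 60, 63), ("inversion", 63, 66), ("exact_change", 66, 69),
]


def parse_atom_line(line: str) -> dict:
    """Decode one V2000 atom line; blank or missing integer columns read as 0."""
    def to_int(text: str) -> int:
        text = text.strip()
        return int(text) if text else 0

    atom = {
        "x": float(line[0:10]), "y": float(line[10:20]), "z": float(line[20:30]),
        "symbol": line[31:34].strip(),
        "mass_diff": to_int(line[34:36]),
    }
    for name, start, end in ATOM_INT_FIELDS:
        atom[name] = to_int(line[start:end])
    atom["formal_charge"] = CHARGE_CODES.get(atom["charge"], 0)
    atom["doublet_radical"] = atom["charge"] == 4
    return atom


# An N atom with charge code 3 (+1): coordinates, " N  ", mass " 0", "  3", ten "  0" fields
line = f"{0.0:>10.4f}" * 3 + " N  " + " 0" + "  3" + "  0" * 10
atom = parse_atom_line(line)
print(atom["symbol"], atom["formal_charge"])
```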
Bond Block (one line per bond, up to bbb bonds):
- First Atom Index (I3, 1 to aaa)
- Second Atom Index (I3, 1 to aaa)
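- Bond Type (I3, 1=single, 2=double, 3=triple, 4=aromatic; query types 5=single or double, 6=single or aromatic, 7=double or aromatic, 8=any)
- Bond Stereo (I3; single bonds: 0=not stereo, 1=up, 4=either, 6=down; double bonds: 0=use x, y, z coordinates to determine cis/trans, 3=cis or trans (either))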
- Unused (xxx: I3, 000)
- Bond Topology (I3, 0=either, 1=ring, 2=chain, query only)
- Reacting Center Status (I3, reaction/query only)
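Because atom indices above 99 fill all three columns of a field, bond lines must be sliced by column rather than split on whitespace. A minimal sketch; `parse_bond_line` and the lookup tables are illustrative names:

```python
BOND_TYPES = {1: "single", 2: "double", 3: "triple", 4: "aromatic",
              5: "single or double", 6: "single or aromatic",
              7: "double or aromatic", 8: "any"}
SINGLE_BOND_STEREO = {0: "none", 1: "up", 4: "either", 6: "down"}


def parse_bond_line(line: str) -> dict:
    """Decode one V2000 bond line (111222tttsssxxxrrrccc)."""
    def field(start: int) -> int:
        # Each field is I3; missing trailing columns read as 0
        text = line[start:start + 3].strip()
        return int(text) if text else 0

    bond = {"atom1": field(0), "atom2": field(3), "type": field(6),
            "stereo": field(9), "topology": field(15), "reacting_center": field(18)}
    bond["type_name"] = BOND_TYPES.get(bond["type"], "unknown")
    if bond["type"] == 1:
        bond["stereo_name"] = SINGLE_BOND_STEREO.get(bond["stereo"], "unknown")
    return bond


# "100101" is atoms 100 and 101; str.split() would see a single token here
bond = parse_bond_line("100101  1  1")
print(bond["atom1"], bond["atom2"], bond["type_name"], bond["stereo_name"])
```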
Atom List Block (obsolete, one line per list, up to lll lists):
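- Attached Atom Number (aaa: I3)
- List Type (k: A1, after one blank; "T"=NOT list, "F"=normal list)
- Spaces (SSSS: four blanks)
- Number of Entries (n: I1, max 5)
- Atomic Numbers (up to 5 entries, each a blank followed by I3; full layout "aaa kSSSSn 111 222 333 444 555")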
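The obsolete atom-list layout is easiest to see in code. This sketch assumes the `aaa kSSSSn 111 222 333 444 555` layout listed above, with entries stored as atomic numbers; `parse_atom_list_line` is a hypothetical name and the sample line is made up.

```python
def parse_atom_list_line(line: str) -> dict:
    """Decode an obsolete atom-list line laid out as 'aaa kSSSSn 111 222 333 444 555'."""
    def to_int(text: str) -> int:
        text = text.strip()
        return int(text) if text else 0

    count = to_int(line[9:10])                 # n sits in column 10
    return {
        "attached_atom": to_int(line[0:3]),
        "exclusive": line[4:5] == "T",         # T = NOT list, F = normal list
        "num_entries": count,
        # Entry j occupies columns 12+4j .. 14+4j (0-based slice 11+4j : 14+4j)
        "atomic_numbers": [to_int(line[11 + 4 * j:14 + 4 * j]) for j in range(min(count, 5))],
    }


# Hypothetical list on atom 3 meaning "neither N nor O" (atomic numbers 7 and 8)
entry = parse_atom_list_line("  3 T    2   7   8")
print(entry)
```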
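Properties Block (a variable number of lines, extensible, terminated by "M  END"; in the file each M-line starts with "M", two spaces, and a three-letter code, e.g. "M  CHG", although the list below writes a single space for readability):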
- Atom Alias (A aaa [text])
- Atom Value (V aaa [text])
- Group Abbreviation (G aaappp [text], obsolete)
- Charge (M CHG nn8 aaa vvv ... , nn8=1-8 entries, vvv=-15 to +15)
- Radical (M RAD nn8 aaa vvv ..., vvv: 0=none, 1=singlet, 2=doublet, 3=triplet)
- Isotope (M ISO nn8 aaa vvv ... , vvv=absolute mass)
- Rgroup Label (M RGP nn8 aaa vvv ... )
- Rgroup Logic (M LOG rrr loo hhh rxx)
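- Rgroup Attachment Point (M APO nn2 aaa vvv ..., Rgroup member structures only)
- Atom Attachment Order (M AAL aaa 2 111 vvv 222 vvv, Rgroup only)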
- Link Node (M LIN aaa ttt nnn xxx yyy)
- Substitution Count (M SUB nn8 aaa vvv ..., query)
- Unsaturation (M UNS nn8 aaa vvv ..., query)
- Ring Bond Count (M RBC nn8 aaa vvv ..., query)
- Sgroup Type (M STY nn8 sss typ ... )
- Sgroup Subtype (M SST nn8 sss typ ... )
- Sgroup Connectivity (M SCN nn8 sss typ ... )
- Sgroup Atom List (M SAL sss n15 aaa ..., up to 15 atoms per line)
- Sgroup Parent Atom List (M SPA sss n15 aaa ...)
- Sgroup Bond List (M SBL sss n15 bbb ...)
- Sgroup Label (M SMT sss [text])
- Sgroup Parent List (M SPL nn8 sss ppp ... )
- Sgroup Expansion (M SDS EXP n15 sss ...)
- Sgroup Field Description (M SDT sss [field name] [field type] [units/format] [query type] [query op])
- Sgroup Display Info (M SDD sss xxxxx.xxxxyyyyy.yyyy eee ccc fff hhh iii jjj [text])
- Sgroup Data (M SCD sss [data])
- Sgroup Data End (M SED sss [data end])
- Sgroup Component Number (M SNC nn8 sss ccc ... )
- Sgroup Correspondence (M CRS nn8 sss ccc ... )
- Sgroup Display Coordinates (M SDI sss nn4 xxxxx.xxxxyyyyy.yyyy ... )
- Sgroup Bracket Style (M SBT nn8 sss ttt ... )
- Sgroup Attachment Point (M SAP sss nn6 aaa ppp ddd ... )
- Marvin SMARTS Properties (M MRV SMA [smarts]; a ChemAxon Marvin extension, not part of the CTfile specification)
- Registry Number Override (M REG [number])
- Skip Lines (S SKP nnn)
- 3D Features (M PNC, M V3D, etc., for 3D queries)
- End of Properties (the literal line "M  END", written with two spaces)
These properties define the complete structure, including atoms, bonds, stereochemistry, queries, reactions, and extensions.
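Most properties-block entries share one shape: `M  XXX`, an entry count, then atom/value pairs. A sketch that collects the charge, radical, and isotope entries and stops at the `M  END` terminator (two spaces); `parse_property_lines` is a hypothetical name and the sample lines are invented.

```python
def parse_property_lines(lines):
    """Collect per-atom values from M  CHG, M  RAD and M  ISO lines, stopping at M  END.

    Each such line is 'M  XXXnn8 aaa vvv aaa vvv ...': a count, then atom/value pairs.
    """
    result = {"CHG": {}, "RAD": {}, "ISO": {}}
    for line in lines:
        if line.startswith("M  END"):
            break
        code = line[3:6]
        if line.startswith("M  ") and code in result:
            tokens = line[6:].split()
            if not tokens:
                continue
            count = int(tokens[0])
            values = [int(t) for t in tokens[1:1 + 2 * count]]
            for atom, value in zip(values[0::2], values[1::2]):
                result[code][atom] = value
    return result


block = [
    "M  CHG  2   1  -1   4   1",
    "M  ISO  1   2  13",
    "M  END",
    "> <PUBCHEM_COMPOUND_CID>",   # SD-file data after M  END is never reached
]
print(parse_property_lines(block))
```

Splitting on whitespace is safe for these lines, unlike the atom and bond blocks, because every `aaa` and `vvv` field is preceded by a blank.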
2. Two direct download links for files of format .MOL
- Benzene (PubChem CID 241): https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/241/SDF (PubChem returns an SD file; the record up to and including its "M  END" line is a V2000 molfile)
- Ethanol (PubChem CID 702): https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/702/SDF (same SD-file format; save the molfile part with a .mol extension)
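Since both links return SD files, a reader that expects a bare molfile only needs the text up to the first `M  END`. A sketch under that assumption; `first_molfile` is a hypothetical helper and the sample record is a trimmed, invented example rather than real PubChem output.

```python
def first_molfile(sdf_text: str) -> str:
    """Return the first record of an SD file up to and including its 'M  END' line."""
    kept = []
    for line in sdf_text.splitlines():
        kept.append(line)
        if line.startswith("M  END"):
            return "\n".join(kept) + "\n"
    raise ValueError("no 'M  END' line found")


sample = "\n".join([
    "241",
    "  -OEChem-03172510382D",
    "",
    "  0" * 10 + "999 V2000",   # empty molecule keeps the sample short
    "M  END",
    "> <PUBCHEM_COMPOUND_CID>",
    "241",
    "",
    "$$$$",
])
mol = first_molfile(sample)
print(mol)
```

With network access, the text could first be fetched with `urllib.request.urlopen(url).read().decode()` and then passed to `first_molfile`.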
3. Ghost blog embedded HTML/JavaScript that lets a user drag and drop a .MOL file and dumps all these properties to the screen
Here is a self-contained HTML page with embedded JavaScript for drag-and-drop .MOL file upload. It parses the file and dumps all properties to the screen in a readable format. You can embed this in a Ghost blog post using raw HTML.
4. Python class that can open any .MOL file, decode, read, and write it, and print all the properties from the above list to the console
import json
import sys


def _int(field, default=0):
    """Parse a fixed-width integer field; blank or malformed text gives the default."""
    try:
        return int(field)
    except ValueError:
        return default


def _float(field, default=0.0):
    """Parse a fixed-width real field; blank or malformed text gives the default."""
    try:
        return float(field)
    except ValueError:
        return default


class MolFileHandler:
    def __init__(self, filepath=None):
        self.properties = {}
        if filepath:
            self.read(filepath)

    def read(self, filepath):
        with open(filepath, 'r') as f:
            # splitlines() also drops the '\r' left by CRLF line endings
            lines = f.read().splitlines()

        def line_at(i):
            # A truncated file yields empty lines instead of an IndexError
            return lines[i] if i < len(lines) else ''

        p = self.properties = {
            'header': {},
            'counts': {'num_atoms': 0, 'num_bonds': 0, 'num_atom_lists': 0,
                       'chiral_flag': 0, 'num_properties': 999, 'version': ''},
            'atoms': [],
            'bonds': [],
            'atom_lists': [],
            'properties_block': []
        }
        # Header block (lines 1-3)
        if len(lines) >= 4:
            line2 = lines[1]
            p['header'] = {
                'molecule_name': lines[0].strip(),
                'user_initials': line2[0:2].strip(),
                'program_name': line2[2:10].strip(),
                'date_time': line2[10:20].strip(),
                'dimensional_codes': line2[20:22].strip(),
                'scaling_factors': _int(line2[22:24]),
                'coordinate_scaling': _float(line2[24:34]),
                'energy': _float(line2[34:46]),
                'registry_number': _int(line2[46:52]),
                'comment': lines[2].strip(),
            }
        # Counts line: aaabbblllfffcccsssxxxrrrpppiiimmmvvvvvv
        c = p['counts']
        counts_line = line_at(3)
        if 'V2000' in counts_line:
            c['num_atoms'] = _int(counts_line[0:3])
            c['num_bonds'] = _int(counts_line[3:6])
            c['num_atom_lists'] = _int(counts_line[6:9])
            c['chiral_flag'] = _int(counts_line[12:15])
            c['num_properties'] = _int(counts_line[30:33], 999)
            c['version'] = counts_line[34:39].strip()
        # Atom block
        atom_start = 4
        for i in range(c['num_atoms']):
            line = line_at(atom_start + i)
            p['atoms'].append({
                'x': _float(line[0:10]),
                'y': _float(line[10:20]),
                'z': _float(line[20:30]),
                'symbol': line[31:34].strip(),
                'mass_diff': _int(line[34:36]),
                'charge': _int(line[36:39]),
                'stereo_parity': _int(line[39:42]),
                'h_count': _int(line[42:45]),
                'stereo_care': _int(line[45:48]),
                'valence': _int(line[48:51]),
                'h0': _int(line[51:54]),
                # Columns 55-60 (rrr, iii) are unused
                'mapping': _int(line[60:63]),
                'inversion': _int(line[63:66]),
                'exact_change': _int(line[66:69]),
            })
        # Bond block
        bond_start = atom_start + c['num_atoms']
        for i in range(c['num_bonds']):
            line = line_at(bond_start + i)
            p['bonds'].append({
                'atom1': _int(line[0:3]),
                'atom2': _int(line[3:6]),
                'type': _int(line[6:9]),
                'stereo': _int(line[9:12]),
                'topology': _int(line[15:18]),
                'reacting_center': _int(line[18:21]),
            })
        # Atom list block (obsolete): aaa kSSSSn 111 222 333 444 555
        list_start = bond_start + c['num_bonds']
        for i in range(c['num_atom_lists']):
            line = line_at(list_start + i)
            n = _int(line[9:10])
            p['atom_lists'].append({
                'attached_atom': _int(line[0:3]),
                'list_type': line[4:5].strip(),
                'num_entries': n,
                'entries': [_int(line[11 + 4 * j:14 + 4 * j]) for j in range(min(n, 5))],
            })
        # Properties block: every non-blank line up to the "M  END" terminator (two spaces)
        prop_start = list_start + c['num_atom_lists']
        for line in lines[prop_start:]:
            if line.startswith('M  END'):
                break
            if line.strip():
                p['properties_block'].append(line.rstrip())

    def print_properties(self):
        print(json.dumps(self.properties, indent=4))

    def write(self, filepath):
        p = self.properties
        h = p.get('header', {})
        c = p.get('counts', {})
        atoms = p.get('atoms', [])
        bonds = p.get('bonds', [])
        atom_lists = p.get('atom_lists', [])
        out = [
            h.get('molecule_name', ''),
            (f"{h.get('user_initials', ''):>2}{h.get('program_name', ''):>8}"
             f"{h.get('date_time', ''):>10}{h.get('dimensional_codes', ''):>2}"
             f"{h.get('scaling_factors', 0):>2}{h.get('coordinate_scaling', 0.0):>10.5f}"
             f"{h.get('energy', 0.0):>12.5f}{h.get('registry_number', 0):>6}"),
            h.get('comment', ''),
            # Counts: aaa bbb lll fff ccc, five obsolete I3 fields, mmm, then " V2000"
            (f"{len(atoms):>3}{len(bonds):>3}{len(atom_lists):>3}  0{c.get('chiral_flag', 0):>3}"
             + "  0" * 5 + f"{c.get('num_properties', 999):>3} V2000"),
        ]
        for a in atoms:
            # rrr and iii (columns 55-60) are written as zeros
            out.append(f"{a['x']:>10.4f}{a['y']:>10.4f}{a['z']:>10.4f} {a['symbol']:<3}"
                       f"{a['mass_diff']:>2}{a['charge']:>3}{a['stereo_parity']:>3}"
                       f"{a['h_count']:>3}{a['stereo_care']:>3}{a['valence']:>3}{a['h0']:>3}"
                       f"  0  0{a.get('mapping', 0):>3}{a['inversion']:>3}{a['exact_change']:>3}")
        for b in bonds:
            out.append(f"{b['atom1']:>3}{b['atom2']:>3}{b['type']:>3}{b['stereo']:>3}"
                       f"  0{b['topology']:>3}{b['reacting_center']:>3}")
        for al in atom_lists:
            entries = al['entries'][:5]
            out.append(f"{al['attached_atom']:>3} {al['list_type']:1}    {len(entries):1}"
                       + ''.join(f" {e:>3}" for e in entries))
        out.extend(p.get('properties_block', []))
        out.append('M  END')
        with open(filepath, 'w') as f:
            f.write('\n'.join(out) + '\n')


# Example usage
if __name__ == "__main__":
    if len(sys.argv) > 1:
        handler = MolFileHandler(sys.argv[1])
        handler.print_properties()
        handler.write('output.mol')  # Writes back to a new file
5. Java class that can open any .MOL file, decode, read, and write it, and print all the properties from the above list to the console
import java.io.*;
import java.util.*;
public class MolFileHandler {
private Map<String, Object> properties = new HashMap<>();
public MolFileHandler(String filepath) throws IOException {
read(filepath);
}
public MolFileHandler() {}
@SuppressWarnings("unchecked")
public void read(String filepath) throws IOException {
properties = new HashMap<>();
List<String> lines = new ArrayList<>();
try (BufferedReader br = new BufferedReader(new FileReader(filepath))) {
String line;
while ((line = br.readLine()) != null) {
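// Pad to 80 columns so the fixed-column substring() calls below cannot run past the end of a short line
lines.add(String.format("%-80s", line));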
}
}
Map<String, Object> header = new HashMap<>();
Map<String, Object> counts = new HashMap<>();
List<Map<String, Object>> atoms = new ArrayList<>();
List<Map<String, Object>> bonds = new ArrayList<>();
List<Map<String, Object>> atomLists = new ArrayList<>();
List<String> propertiesBlock = new ArrayList<>();
// Header
if (lines.size() >= 4) {
header.put("molecule_name", lines.get(0).trim());
String line2 = lines.get(1);
header.put("user_initials", line2.substring(0, Math.min(2, line2.length())).trim());
header.put("program_name", line2.substring(2, Math.min(10, line2.length())).trim());
header.put("date_time", line2.substring(10, Math.min(20, line2.length())).trim());
header.put("dimensional_codes", line2.substring(20, Math.min(22, line2.length())).trim());
header.put("scaling_factors", parseIntSafe(line2.substring(22, Math.min(24, line2.length())), 0));
header.put("coordinate_scaling", parseDoubleSafe(line2.substring(24, Math.min(34, line2.length())), 0.0));
header.put("energy", parseDoubleSafe(line2.substring(34, Math.min(46, line2.length())), 0.0));
header.put("registry_number", parseIntSafe(line2.substring(46, Math.min(52, line2.length())), 0));
header.put("comment", lines.get(2).trim());
}
// Counts
String countsLine = lines.size() > 3 ? lines.get(3) : "";
if (countsLine.contains("V2000")) {
counts.put("num_atoms", parseIntSafe(countsLine.substring(0, 3), 0));
counts.put("num_bonds", parseIntSafe(countsLine.substring(3, 6), 0));
counts.put("num_atom_lists", parseIntSafe(countsLine.substring(6, 9), 0));
counts.put("chiral_flag", parseIntSafe(countsLine.substring(12, 15), 0));
counts.put("num_properties", parseIntSafe(countsLine.substring(30, 33), 999));
counts.put("version", countsLine.substring(34, 39).trim());
}
// Atoms
int numAtoms = (int) counts.getOrDefault("num_atoms", 0);
int atomStart = 4;
for (int i = 0; i < numAtoms; i++) {
String line = i + atomStart < lines.size() ? lines.get(atomStart + i) : String.format("%-80s", ""); // blank padded line if the file is truncated
Map<String, Object> atom = new HashMap<>();
atom.put("x", parseDoubleSafe(line.substring(0, 10), 0.0));
atom.put("y", parseDoubleSafe(line.substring(10, 20), 0.0));
atom.put("z", parseDoubleSafe(line.substring(20, 30), 0.0));
atom.put("symbol", line.substring(31, 34).trim());
atom.put("mass_diff", parseIntSafe(line.substring(34, 36), 0));
atom.put("charge", parseIntSafe(line.substring(36, 39), 0));
atom.put("stereo_parity", parseIntSafe(line.substring(39, 42), 0));
atom.put("h_count", parseIntSafe(line.substring(42, 45), 0));
atom.put("stereo_care", parseIntSafe(line.substring(45, 48), 0));
atom.put("valence", parseIntSafe(line.substring(48, 51), 0));
atom.put("h0", parseIntSafe(line.substring(51, 54), 0));
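// Columns 55-60 are unused; 61-63 atom-atom mapping, 64-66 inversion/retention, 67-69 exact change
atom.put("mapping", parseIntSafe(line.substring(60, 63), 0));
atom.put("inversion", parseIntSafe(line.substring(63, 66), 0));
atom.put("exact_change", parseIntSafe(line.substring(66, 69), 0));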
atoms.add(atom);
}
// Bonds
int numBonds = (int) counts.getOrDefault("num_bonds", 0);
int bondStart = atomStart + numAtoms;
for (int i = 0; i < numBonds; i++) {
String line = i + bondStart < lines.size() ? lines.get(bondStart + i) : String.format("%-80s", ""); // blank padded line if the file is truncated
Map<String, Object> bond = new HashMap<>();
bond.put("atom1", parseIntSafe(line.substring(0, 3), 0));
bond.put("atom2", parseIntSafe(line.substring(3, 6), 0));
bond.put("type", parseIntSafe(line.substring(6, 9), 0));
bond.put("stereo", parseIntSafe(line.substring(9, 12), 0));
bond.put("topology", parseIntSafe(line.substring(15, 18), 0));
bond.put("reacting_center", parseIntSafe(line.substring(18, 21), 0));
bonds.add(bond);
}
// Atom Lists
int numLists = (int) counts.getOrDefault("num_atom_lists", 0);
int listStart = bondStart + numBonds;
for (int i = 0; i < numLists; i++) {
String line = i + listStart < lines.size() ? lines.get(listStart + i) : String.format("%-80s", ""); // blank padded line if the file is truncated
Map<String, Object> atomList = new HashMap<>();
// Layout per the CTfile spec: aaa kSSSSn 111 222 333 444 555 (entries are atomic numbers)
atomList.put("attached_atom", parseIntSafe(line.substring(0, 3), 0));
atomList.put("list_type", line.substring(4, 5).trim());
atomList.put("num_entries", parseIntSafe(line.substring(9, 10), 0));
List<String> entries = new ArrayList<>();
for (int j = 0; j < 5; j++) {
int start = Math.min(11 + j*4, line.length());
String e = line.substring(start, Math.min(14 + j*4, line.length())).trim();
if (!e.isEmpty()) entries.add(e);
}
atomList.put("entries", entries);
atomLists.add(atomList);
}
// Properties Block
int propStart = listStart + numLists;
for (int i = propStart; i < lines.size(); i++) {
String line = lines.get(i).trim();
if (line.startsWith("M  END")) break; // the terminator is "M", two spaces, "END"
if (!line.isEmpty()) propertiesBlock.add(line);
}
properties.put("header", header);
properties.put("counts", counts);
properties.put("atoms", atoms);
properties.put("bonds", bonds);
properties.put("atom_lists", atomLists);
properties.put("properties_block", propertiesBlock);
}
public void printProperties() {
System.out.println(toJson(properties));
}
private String toJson(Object obj) {
// Simple JSON serializer for console
if (obj instanceof Map) {
StringBuilder sb = new StringBuilder("{\n");
((Map<?, ?>) obj).forEach((k, v) -> sb.append(" \"").append(k).append("\": ").append(toJson(v)).append(",\n"));
return sb.append("}").toString().replace(",\n}", "\n}");
} else if (obj instanceof List) {
StringBuilder sb = new StringBuilder("[\n");
((List<?>) obj).forEach(v -> sb.append(" ").append(toJson(v)).append(",\n"));
return sb.append("]").toString().replace(",\n]", "\n]");
} else if (obj instanceof String) {
return "\"" + obj + "\"";
} else {
return obj.toString();
}
}
@SuppressWarnings("unchecked")
public void write(String filepath) throws IOException {
try (BufferedWriter bw = new BufferedWriter(new FileWriter(filepath))) {
Map<String, Object> h = (Map<String, Object>) properties.getOrDefault("header", new HashMap<>());
bw.write(String.format("%-80s\n", h.getOrDefault("molecule_name", "")));
String line2 = String.format("%2s%8s%10s%2s%2d%10.5f%12.5f%6d",
h.getOrDefault("user_initials", ""), h.getOrDefault("program_name", ""), h.getOrDefault("date_time", ""),
h.getOrDefault("dimensional_codes", ""), (int) h.getOrDefault("scaling_factors", 0),
(double) h.getOrDefault("coordinate_scaling", 0.0), (double) h.getOrDefault("energy", 0.0),
(int) h.getOrDefault("registry_number", 0));
bw.write(String.format("%-80s\n", line2));
bw.write(String.format("%-80s\n", h.getOrDefault("comment", "")));
Map<String, Object> c = (Map<String, Object>) properties.getOrDefault("counts", new HashMap<>());
// aaa bbb lll fff ccc, five obsolete I3 fields, mmm, then " V2000"
String countsLine = String.format("%3d%3d%3d  0%3d  0  0  0  0  0%3d V2000",
(int) c.getOrDefault("num_atoms", 0), (int) c.getOrDefault("num_bonds", 0), (int) c.getOrDefault("num_atom_lists", 0),
(int) c.getOrDefault("chiral_flag", 0), (int) c.getOrDefault("num_properties", 999));
bw.write(String.format("%-80s\n", countsLine));
List<Map<String, Object>> atoms = (List<Map<String, Object>>) properties.getOrDefault("atoms", new ArrayList<>());
for (Map<String, Object> atom : atoms) {
// rrr and iii (columns 55-60) are written as "  0  0"; mapping, inversion and exact change follow
String line = String.format("%10.4f%10.4f%10.4f %-3s%2d%3d%3d%3d%3d%3d%3d  0  0%3d%3d%3d",
(double) atom.getOrDefault("x", 0.0), (double) atom.getOrDefault("y", 0.0), (double) atom.getOrDefault("z", 0.0),
(String) atom.getOrDefault("symbol", ""), (int) atom.getOrDefault("mass_diff", 0), (int) atom.getOrDefault("charge", 0),
(int) atom.getOrDefault("stereo_parity", 0), (int) atom.getOrDefault("h_count", 0), (int) atom.getOrDefault("stereo_care", 0),
(int) atom.getOrDefault("valence", 0), (int) atom.getOrDefault("h0", 0), (int) atom.getOrDefault("mapping", 0),
(int) atom.getOrDefault("inversion", 0), (int) atom.getOrDefault("exact_change", 0));
bw.write(String.format("%-80s\n", line));
}
List<Map<String, Object>> bonds = (List<Map<String, Object>>) properties.getOrDefault("bonds", new ArrayList<>());
for (Map<String, Object> bond : bonds) {
String line = String.format("%3d%3d%3d%3d  0%3d%3d", // "  0" fills the unused xxx field
(int) bond.getOrDefault("atom1", 0), (int) bond.getOrDefault("atom2", 0), (int) bond.getOrDefault("type", 0),
(int) bond.getOrDefault("stereo", 0), (int) bond.getOrDefault("topology", 0), (int) bond.getOrDefault("reacting_center", 0));
bw.write(String.format("%-80s\n", line));
}
List<Map<String, Object>> atomLists = (List<Map<String, Object>>) properties.getOrDefault("atom_lists", new ArrayList<>());
for (Map<String, Object> al : atomLists) {
List<String> entries = (List<String>) al.getOrDefault("entries", new ArrayList<>());
StringBuilder entriesStr = new StringBuilder();
for (int j = 0; j < Math.min(5, entries.size()); j++) {
entriesStr.append(String.format(" %3s", entries.get(j)));
}
// aaa, blank, k, four blanks, n, then " 111 222 ..." entries
String line = String.format("%3d %1s    %1d%s",
(int) al.getOrDefault("attached_atom", 0), (String) al.getOrDefault("list_type", ""), (int) al.getOrDefault("num_entries", 0), entriesStr);
bw.write(String.format("%-80s\n", line));
}
List<String> propsBlock = (List<String>) properties.getOrDefault("properties_block", new ArrayList<>());
for (String prop : propsBlock) {
bw.write(String.format("%-80s\n", prop));
}
bw.write("M  END\n");
}
}
private int parseIntSafe(String s, int defaultVal) {
try {
return Integer.parseInt(s.trim());
} catch (NumberFormatException e) {
return defaultVal;
}
}
private double parseDoubleSafe(String s, double defaultVal) {
try {
return Double.parseDouble(s.trim());
} catch (NumberFormatException e) {
return defaultVal;
}
}
public static void main(String[] args) throws IOException {
if (args.length > 0) {
MolFileHandler handler = new MolFileHandler(args[0]);
handler.printProperties();
handler.write("output.mol");
}
}
}
6. JavaScript class that can open any .MOL file, decode, read, and write it, and print all the properties from the above list to the console
Note: browser JavaScript has no direct file-system access, so this class assumes a Node.js environment and uses the built-in fs module.
const fs = require('fs');
class MolFileHandler {
constructor(filepath) {
this.properties = {};
if (filepath) {
this.read(filepath);
}
}
read(filepath) {
const content = fs.readFileSync(filepath, 'utf8');
const lines = content.split('\n').map(l => l.trimEnd());
this.properties = {
header: {},
counts: {},
atoms: [],
bonds: [],
atom_lists: [],
properties_block: []
};
// Header
if (lines.length >= 4) {
this.properties.header.molecule_name = lines[0].trim();
const line2 = lines[1];
this.properties.header.user_initials = line2.substring(0, 2).trim();
this.properties.header.program_name = line2.substring(2, 10).trim();
this.properties.header.date_time = line2.substring(10, 20).trim();
this.properties.header.dimensional_codes = line2.substring(20, 22).trim();
this.properties.header.scaling_factors = parseInt(line2.substring(22, 24)) || 0;
this.properties.header.coordinate_scaling = parseFloat(line2.substring(24, 34)) || 0;
this.properties.header.energy = parseFloat(line2.substring(34, 46)) || 0;
this.properties.header.registry_number = parseInt(line2.substring(46, 52)) || 0;
this.properties.header.comment = lines[2].trim();
}
// Counts
const countsLine = lines[3] || '';
if (countsLine.includes('V2000')) {
this.properties.counts.num_atoms = parseInt(countsLine.substring(0, 3)) || 0;
this.properties.counts.num_bonds = parseInt(countsLine.substring(3, 6)) || 0;
this.properties.counts.num_atom_lists = parseInt(countsLine.substring(6, 9)) || 0;
this.properties.counts.chiral_flag = parseInt(countsLine.substring(12, 15)) || 0;
this.properties.counts.num_properties = parseInt(countsLine.substring(30, 33)) || 999;
this.properties.counts.version = countsLine.substring(34, 39).trim();
}
// Atoms
const atomStart = 4;
for (let i = 0; i < this.properties.counts.num_atoms; i++) {
const line = lines[atomStart + i] || '';
const atom = {
x: parseFloat(line.substring(0, 10)) || 0,
y: parseFloat(line.substring(10, 20)) || 0,
z: parseFloat(line.substring(20, 30)) || 0,
symbol: line.substring(31, 34).trim(),
mass_diff: parseInt(line.substring(34, 36)) || 0,
charge: parseInt(line.substring(36, 39)) || 0,
stereo_parity: parseInt(line.substring(39, 42)) || 0,
h_count: parseInt(line.substring(42, 45)) || 0,
stereo_care: parseInt(line.substring(45, 48)) || 0,
valence: parseInt(line.substring(48, 51)) || 0,
h0: parseInt(line.substring(51, 54)) || 0,
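// Columns 55-60 are unused; 61-63 atom-atom mapping, 64-66 inversion/retention, 67-69 exact change
mapping: parseInt(line.substring(60, 63)) || 0,
inversion: parseInt(line.substring(63, 66)) || 0,
exact_change: parseInt(line.substring(66, 69)) || 0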
};
this.properties.atoms.push(atom);
}
// Bonds
const bondStart = atomStart + this.properties.counts.num_atoms;
for (let i = 0; i < this.properties.counts.num_bonds; i++) {
const line = lines[bondStart + i] || '';
const bond = {
atom1: parseInt(line.substring(0, 3)) || 0,
atom2: parseInt(line.substring(3, 6)) || 0,
type: parseInt(line.substring(6, 9)) || 0,
stereo: parseInt(line.substring(9, 12)) || 0,
topology: parseInt(line.substring(15, 18)) || 0,
reacting_center: parseInt(line.substring(18, 21)) || 0
};
this.properties.bonds.push(bond);
}
// Atom Lists
const listStart = bondStart + this.properties.counts.num_bonds;
for (let i = 0; i < this.properties.counts.num_atom_lists; i++) {
const line = lines[listStart + i] || '';
const atomList = {
// Layout per the CTfile spec: aaa kSSSSn 111 222 333 444 555 (entries are atomic numbers)
attached_atom: parseInt(line.substring(0, 3)) || 0,
list_type: line.substring(4, 5).trim(),
num_entries: parseInt(line.substring(9, 10)) || 0,
entries: []
};
for (let j = 0; j < 5; j++) {
const e = line.substring(11 + j*4, 14 + j*4).trim();
if (e) atomList.entries.push(e);
}
this.properties.atom_lists.push(atomList);
}
// Properties Block
const propStart = listStart + this.properties.counts.num_atom_lists;
for (let i = propStart; i < lines.length; i++) {
const line = lines[i].trim();
if (line.startsWith('M  END')) break; // the terminator is "M", two spaces, "END"
if (line) this.properties.properties_block.push(line);
}
}
printProperties() {
console.log(JSON.stringify(this.properties, null, 4));
}
write(filepath) {
let output = '';
const h = this.properties.header || {};
output += `${(h.molecule_name || '').padEnd(80)}\n`;
const line2 = `${(h.user_initials || '').padStart(2)}${(h.program_name || '').padStart(8)}${(h.date_time || '').padStart(10)}${(h.dimensional_codes || '').padStart(2)}${ (h.scaling_factors || 0).toString().padStart(2) }${ (h.coordinate_scaling || 0).toFixed(5).padStart(10) }${ (h.energy || 0).toFixed(5).padStart(12) }${ (h.registry_number || 0).toString().padStart(6) }`;
output += `${line2.padEnd(80)}\n`;
output += `${(h.comment || '').padEnd(80)}\n`;
const c = this.properties.counts || {};
// aaa bbb lll fff ccc, five obsolete I3 fields, mmm, then " V2000"
const countsLine = `${ (c.num_atoms || 0).toString().padStart(3) }${ (c.num_bonds || 0).toString().padStart(3) }${ (c.num_atom_lists || 0).toString().padStart(3) }  0${ (c.chiral_flag || 0).toString().padStart(3) }${ '  0'.repeat(5) }${ (c.num_properties || 999).toString().padStart(3) } V2000`;
output += `${countsLine.padEnd(80)}\n`;
(this.properties.atoms || []).forEach(atom => {
// rrr and iii (columns 55-60) are written as "  0  0"; mapping, inversion and exact change follow
const line = `${ atom.x.toFixed(4).padStart(10) }${ atom.y.toFixed(4).padStart(10) }${ atom.z.toFixed(4).padStart(10) } ${ (atom.symbol || '').padEnd(3) }${ atom.mass_diff.toString().padStart(2) }${ atom.charge.toString().padStart(3) }${ atom.stereo_parity.toString().padStart(3) }${ atom.h_count.toString().padStart(3) }${ atom.stereo_care.toString().padStart(3) }${ atom.valence.toString().padStart(3) }${ atom.h0.toString().padStart(3) }  0  0${ (atom.mapping || 0).toString().padStart(3) }${ atom.inversion.toString().padStart(3) }${ atom.exact_change.toString().padStart(3) }`;
output += `${line.padEnd(80)}\n`;
});
(this.properties.bonds || []).forEach(bond => {
const line = `${ bond.atom1.toString().padStart(3) }${ bond.atom2.toString().padStart(3) }${ bond.type.toString().padStart(3) }${ bond.stereo.toString().padStart(3) }  0${ bond.topology.toString().padStart(3) }${ bond.reacting_center.toString().padStart(3) }`; // "  0" fills the unused xxx field
output += `${line.padEnd(80)}\n`;
});
(this.properties.atom_lists || []).forEach(al => {
// aaa, blank, k, four blanks, n, then " 111 222 ..." entries
const entriesStr = (al.entries || []).slice(0, 5).map(e => ' ' + String(e).padStart(3)).join('');
const line = `${ al.attached_atom.toString().padStart(3) } ${ (al.list_type || ' ').padEnd(1) }    ${ al.num_entries }${ entriesStr }`;
output += `${line.padEnd(80)}\n`;
});
(this.properties.properties_block || []).forEach(prop => {
output += `${prop.padEnd(80)}\n`;
});
output += 'M  END\n';
fs.writeFileSync(filepath, output);
}
}
// Example usage
if (process.argv.length > 2) {
const handler = new MolFileHandler(process.argv[2]);
handler.printProperties();
handler.write('output.mol');
}
7. C class that can open any .MOL file, decode, read, and write it, and print all the properties from the above list to the console
Since C does not have classes, here is a struct with functions for equivalent functionality.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <ctype.h>
#define MAX_LINES 10000
#define LINE_LEN 81
typedef struct {
char molecule_name[81];
char user_initials[3];
char program_name[9];
char date_time[11];
char dimensional_codes[3];
int scaling_factors;
double coordinate_scaling;
double energy;
int registry_number;
char comment[81];
} Header;
typedef struct {
int num_atoms;
int num_bonds;
int num_atom_lists;
int chiral_flag;
int num_properties;
char version[6];
} Counts;
typedef struct {
double x, y, z;
char symbol[4];
int mass_diff;
int charge;
int stereo_parity;
int h_count;
int stereo_care;
int valence;
int h0;
int inversion;
int exact_change;
} Atom;
typedef struct {
int atom1, atom2;
int type;
int stereo;
int topology;
int reacting_center;
} Bond;
typedef struct {
int attached_atom;
char list_type[2];
int num_entries;
char entries[5][5];
} AtomList;
typedef struct {
Header header;
Counts counts;
Atom *atoms;
Bond *bonds;
AtomList *atom_lists;
char **properties_block;
int properties_block_count;
} MolProperties;
void init_properties(MolProperties *props) {
memset(&props->header, 0, sizeof(Header));
memset(&props->counts, 0, sizeof(Counts));
props->atoms = NULL;
props->bonds = NULL;
props->atom_lists = NULL;
props->properties_block = NULL;
props->properties_block_count = 0;
}
void free_properties(MolProperties *props) {
free(props->atoms);
free(props->bonds);
free(props->atom_lists);
for (int i = 0; i < props->properties_block_count; i++) {
free(props->properties_block[i]);
}
free(props->properties_block);
}
int read_mol(const char *filepath, MolProperties *props) {
FILE *fp = fopen(filepath, "r");
if (!fp) return -1;
char lines[MAX_LINES][LINE_LEN];
int line_count = 0;
while (fgets(lines[line_count], LINE_LEN, fp) && line_count < MAX_LINES) {
line_count++;
}
fclose(fp);
init_properties(props);
// Header
if (line_count >= 4) {
strncpy(props->header.molecule_name, lines[0], 80);
props->header.molecule_name[80] = '\0';
char line2[81];
strncpy(line2, lines[1], 80);
line2[80] = '\0';
strncpy(props->header.user_initials, line2, 2);
props->header.user_initials[2] = '\0';
strncpy(props->header.program_name, line2 + 2, 8);
props->header.program_name[8] = '\0';
strncpy(props->header.date_time, line2 + 10, 10);
props->header.date_time[10] = '\0';
strncpy(props->header.dimensional_codes, line2 + 20, 2);
props->header.dimensional_codes[2] = '\0';
props->header.scaling_factors = atoi(line2 + 22);
props->header.coordinate_scaling = atof(line2 + 24);
props->header.energy = atof(line2 + 34);
props->header.registry_number = atoi(line2 + 46);
strncpy(props->header.comment, lines[2], 80);
props->header.comment[80] = '\0';
}
// Counts
char counts_line[81];
if (line_count > 3) {
strncpy(counts_line, lines[3], 80);
counts_line[80] = '\0';
if (strstr(counts_line, "V2000")) {
char temp[4];
strncpy(temp, counts_line, 3); temp[3] = '\0'; props->counts.num_atoms = atoi(temp);
strncpy(temp, counts_line + 3, 3); temp[3] = '\0'; props->counts.num_bonds = atoi(temp);
strncpy(temp, counts_line + 6, 3); temp[3] = '\0'; props->counts.num_atom_lists = atoi(temp);
strncpy(temp, counts_line + 12, 3); temp[3] = '\0'; props->counts.chiral_flag = atoi(temp);
strncpy(temp, counts_line + 30, 3); temp[3] = '\0'; props->counts.num_properties = atoi(temp);
strncpy(props->counts.version, counts_line + 34, 5); props->counts.version[5] = '\0';
}
}
// Atoms
props->atoms = malloc(props->counts.num_atoms * sizeof(Atom));
int atom_start = 4;
for (int i = 0; i < props->counts.num_atoms; i++) {
char line[81];
if (atom_start + i >= line_count) break;
strncpy(line, lines[atom_start + i], 80); line[80] = '\0';
props->atoms[i].x = atof(line);
props->atoms[i].y = atof(line + 10);
props->atoms[i].z = atof(line + 20);
strncpy(props->atoms[i].symbol, line + 31, 3); props->atoms[i].symbol[3] = '\0';
props->atoms[i].mass_diff = atoi(line + 34);
props->atoms[i].charge = atoi(line + 36);
props->atoms[i].stereo_parity = atoi(line + 39);
props->atoms[i].h_count = atoi(line + 42);
props->atoms[i].stereo_care = atoi(line + 45);
props->atoms[i].valence = atoi(line + 48);
props->atoms[i].h0 = atoi(line + 51);
props->atoms[i].inversion = atoi(line + 60);
props->atoms[i].exact_change = atoi(line + 63);
}
// Bonds
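props->bonds = calloc(props->counts.num_bonds, sizeof(Bond));
if (props->counts.num_bonds > 0 && !props->bonds) return -1;
int bond_start = atom_start + props->counts.num_atoms;
for (int i = 0; i < props->counts.num_bonds; i++) {
char line[81];
if (bond_start + i >= line_count) break;
strncpy(line, lines[bond_start + i], 80); line[80] = '\0';
// Seven I3 fields: 111 222 ttt sss xxx rrr ccc. Slice them first: atoi(line) on "100101..." would return 100101 for atoms 100 and 101
int f[7];
for (int k = 0; k < 7; k++) { char temp[4]; memcpy(temp, line + 3 * k, 3); temp[3] = '\0'; f[k] = atoi(temp); }
props->bonds[i].atom1 = f[0];
props->bonds[i].atom2 = f[1];
props->bonds[i].type = f[2];
props->bonds[i].stereo = f[3];
// f[4] (xxx) is unused
props->bonds[i].topology = f[5];
props->bonds[i].reacting_center = f[6];
}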
// Atom Lists
props->atom_lists = calloc(props->counts.num_atom_lists, sizeof(AtomList));
if (props->counts.num_atom_lists > 0 && !props->atom_lists) return -1;
int list_start = bond_start + props->counts.num_bonds;
for (int i = 0; i < props->counts.num_atom_lists; i++) {
char line[81];
if (list_start + i >= line_count) break;
strncpy(line, lines[list_start + i], 80); line[80] = '\0';
// Layout "aaa kSSSSn 111 222 333 444 555": k (T/F) at offset 4, n at offset 9, entries are I3 atomic numbers from offset 11, four columns apart
char temp[4];
memcpy(temp, line, 3); temp[3] = '\0'; props->atom_lists[i].attached_atom = atoi(temp);
props->atom_lists[i].list_type[0] = line[4]; props->atom_lists[i].list_type[1] = '\0';
temp[0] = line[9]; temp[1] = '\0'; props->atom_lists[i].num_entries = atoi(temp);
for (int j = 0; j < 5; j++) {
strncpy(props->atom_lists[i].entries[j], line + 11 + j * 4, 3); props->atom_lists[i].entries[j][3] = '\0';
}
}
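// Properties Block: keep every non-empty line up to the "M  END" terminator (two spaces, as the format requires)
props->properties_block_count = 0;
props->properties_block = NULL;
int prop_start = list_start + props->counts.num_atom_lists;
if (prop_start < line_count) { // a truncated file must not produce a negative allocation size
props->properties_block = malloc((size_t)(line_count - prop_start) * sizeof(char *));
if (!props->properties_block) return -1;
for (int i = prop_start; i < line_count; i++) {
if (strncmp(lines[i], "M  END", 6) == 0) break;
if (strlen(lines[i]) > 0) {
props->properties_block[props->properties_block_count] = strdup(lines[i]);
props->properties_block_count++;
}
}
}
return 0;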
}
void print_properties(const MolProperties *props) {
printf("{\n");
printf(" \"header\": {\n");
printf(" \"molecule_name\": \"%s\",\n", props->header.molecule_name);
printf(" \"user_initials\": \"%s\",\n", props->header.user_initials);
printf(" \"program_name\": \"%s\",\n", props->header.program_name);
printf(" \"date_time\": \"%s\",\n", props->header.date_time);
printf(" \"dimensional_codes\": \"%s\",\n", props->header.dimensional_codes);
printf(" \"scaling_factors\": %d,\n", props->header.scaling_factors);
printf(" \"coordinate_scaling\": %.5f,\n", props->header.coordinate_scaling);
printf(" \"energy\": %.5f,\n", props->header.energy);
printf(" \"registry_number\": %d,\n", props->header.registry_number);
printf(" \"comment\": \"%s\"\n", props->header.comment);
printf(" },\n");
printf(" \"counts\": {\n");
printf(" \"num_atoms\": %d,\n", props->counts.num_atoms);
printf(" \"num_bonds\": %d,\n", props->counts.num_bonds);
printf(" \"num_atom_lists\": %d,\n", props->counts.num_atom_lists);
printf(" \"chiral_flag\": %d,\n", props->counts.chiral_flag);
printf(" \"num_properties\": %d,\n", props->counts.num_properties);
printf(" \"version\": \"%s\"\n", props->counts.version);
printf(" },\n");
printf(" \"atoms\": [\n");
for (int i = 0; i < props->counts.num_atoms; i++) {
printf(" {\n");
printf(" \"x\": %.4f,\n", props->atoms[i].x);
printf(" \"y\": %.4f,\n", props->atoms[i].y);
printf(" \"z\": %.4f,\n", props->atoms[i].z);
printf(" \"symbol\": \"%s\",\n", props->atoms[i].symbol);
printf(" \"mass_diff\": %d,\n", props->atoms[i].mass_diff);
printf(" \"charge\": %d,\n", props->atoms[i].charge);
printf(" \"stereo_parity\": %d,\n", props->atoms[i].stereo_parity);
printf(" \"h_count\": %d,\n", props->atoms[i].h_count);
printf(" \"stereo_care\": %d,\n", props->atoms[i].stereo_care);
printf(" \"valence\": %d,\n", props->atoms[i].valence);
printf(" \"h0\": %d,\n", props->atoms[i].h0);
printf(" \"inversion\": %d,\n", props->atoms[i].inversion);
printf(" \"exact_change\": %d\n", props->atoms[i].exact_change);
printf(" }%s\n", i < props->counts.num_atoms - 1 ? "," : "");
}
printf(" ],\n");
printf(" \"bonds\": [\n");
for (int i = 0; i < props->counts.num_bonds; i++) {
printf(" {\n");
printf(" \"atom1\": %d,\n", props->bonds[i].atom1);
printf(" \"atom2\": %d,\n", props->bonds[i].atom2);
printf(" \"type\": %d,\n", props->bonds[i].type);
printf(" \"stereo\": %d,\n", props->bonds[i].stereo);
printf(" \"topology\": %d,\n", props->bonds[i].topology);
printf(" \"reacting_center\": %d\n", props->bonds[i].reacting_center);
printf(" }%s\n", i < props->counts.num_bonds - 1 ? "," : "");
}
printf(" ],\n");
printf(" \"atom_lists\": [\n");
for (int i = 0; i < props->counts.num_atom_lists; i++) {
printf(" {\n");
printf(" \"attached_atom\": %d,\n", props->atom_lists[i].attached_atom);
printf(" \"list_type\": \"%s\",\n", props->atom_lists[i].list_type);
printf(" \"num_entries\": %d,\n", props->atom_lists[i].num_entries);
printf(" \"entries\": [");
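// Print only the num_entries used entries (at most 5) so the JSON array never ends with a stray comma
int n = props->atom_lists[i].num_entries;
if (n > 5) n = 5;
for (int j = 0; j < n; j++) {
printf("%s\"%s\"", j > 0 ? ", " : "", props->atom_lists[i].entries[j]);
}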
printf("]\n");
printf(" }%s\n", i < props->counts.num_atom_lists - 1 ? "," : "");
}
printf(" ],\n");
printf(" \"properties_block\": [\n");
for (int i = 0; i < props->properties_block_count; i++) {
printf(" \"%s\"%s\n", props->properties_block[i], i < props->properties_block_count - 1 ? "," : "");
}
printf(" ]\n");
printf("}\n");
}
int write_mol(const char *filepath, const MolProperties *props) {
FILE *fp = fopen(filepath, "w");
if (!fp) return -1;
fprintf(fp, "%-80s\n", props->header.molecule_name);
char line2[81];
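// A2 A8 A10 A2 I2 F10.5 F12.5 I6; the precisions keep an over-long string from shifting the later columns
snprintf(line2, 81, "%2.2s%8.8s%10.10s%2.2s%2d%10.5f%12.5f%6d",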
props->header.user_initials, props->header.program_name, props->header.date_time,
props->header.dimensional_codes, props->header.scaling_factors,
props->header.coordinate_scaling, props->header.energy, props->header.registry_number);
fprintf(fp, "%-80s\n", line2);
fprintf(fp, "%-80s\n", props->header.comment);
char counts_line[81];
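// aaabbblllfffcccsssxxxrrrpppiiimmmvvvvvv: the obsolete fff, sss, xxx, rrr, ppp and iii fields are written as "  0"
snprintf(counts_line, 81, "%3d%3d%3d  0%3d  0  0  0  0  0%3d V2000",
props->counts.num_atoms, props->counts.num_bonds, props->counts.num_atom_lists,
props->counts.chiral_flag, props->counts.num_properties);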
fprintf(fp, "%-80s\n", counts_line);
for (int i = 0; i < props->counts.num_atoms; i++) {
char line[81];
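// rrr and iii are unused and mmm (atom-atom mapping) is not stored, so all three are written as 0; nnn lands at offset 63 and eee at 66
snprintf(line, 81, "%10.4f%10.4f%10.4f %-3s%2d%3d%3d%3d%3d%3d%3d  0  0  0%3d%3d",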
props->atoms[i].x, props->atoms[i].y, props->atoms[i].z, props->atoms[i].symbol,
props->atoms[i].mass_diff, props->atoms[i].charge, props->atoms[i].stereo_parity,
props->atoms[i].h_count, props->atoms[i].stereo_care, props->atoms[i].valence,
props->atoms[i].h0, props->atoms[i].inversion, props->atoms[i].exact_change);
fprintf(fp, "%-80s\n", line);
}
for (int i = 0; i < props->counts.num_bonds; i++) {
char line[81];
snprintf(line, 81, "%3d%3d%3d%3d  0%3d%3d", // 111 222 ttt sss xxx rrr ccc; the unused xxx field is written as 0
props->bonds[i].atom1, props->bonds[i].atom2, props->bonds[i].type,
props->bonds[i].stereo, props->bonds[i].topology, props->bonds[i].reacting_center);
fprintf(fp, "%-80s\n", line);
}
for (int i = 0; i < props->counts.num_atom_lists; i++) {
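char entries_str[21] = "";
int n = props->atom_lists[i].num_entries;
if (n > 5) n = 5;
for (int j = 0; j < n; j++) {
char temp[5];
snprintf(temp, sizeof temp, "%3.3s ", props->atom_lists[i].entries[j]); // I3 entry plus one separating blank
strcat(entries_str, temp);
}
char line[81];
// "aaa kSSSSn 111 222 ...": k at offset 4, four blanks, n at offset 9, entries from offset 11
snprintf(line, 81, "%3d %.1s    %1d %s",
props->atom_lists[i].attached_atom, props->atom_lists[i].list_type,
props->atom_lists[i].num_entries, entries_str);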
fprintf(fp, "%-80s\n", line);
}
for (int i = 0; i < props->properties_block_count; i++) {
fprintf(fp, "%-80s\n", props->properties_block[i]);
}
fprintf(fp, "M  END\n"); // the terminator has two spaces
fclose(fp);
return 0;
}
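int main(int argc, char *argv[]) {
if (argc < 2) {
fprintf(stderr, "usage: %s input.mol\n", argv[0]);
return 1;
}
MolProperties props = {0}; // zero-initialize so every count and pointer starts out defined
if (read_mol(argv[1], &props) != 0) {
fprintf(stderr, "failed to read %s\n", argv[1]);
return 1;
}
print_properties(&props);
int status = 0;
if (write_mol("output.mol", &props) != 0) {
fprintf(stderr, "failed to write output.mol\n");
status = 1;
}
free_properties(&props);
return status;
}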
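V2000 fields are fixed-width and need not be separated by blanks: a bond between atoms 100 and 101 begins with "100101", and atoi applied to the start of that line returns 100101. The sketch below slices each field out of its columns before converting it. The names mol_field_int, BondFields and parse_bond_line are hypothetical and are not part of the program above; the offsets assume the V2000 bond line layout 111222tttsssxxxrrrccc from the CTfile specification, counted from 0.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns the integer stored in columns [off, off + width)
 * of a fixed-width V2000 line. Columns past the end of a short line count as
 * blanks, so a missing field reads as 0. */
static int mol_field_int(const char *line, size_t off, size_t width)
{
    char buf[16];
    size_t len = strlen(line);
    size_t n = 0;

    assert(width < sizeof buf); /* the slice plus its terminator must fit */
    if (off < len) {
        n = (len - off < width) ? len - off : width;
        memcpy(buf, line + off, n);
    }
    buf[n] = '\0';
    return atoi(buf); /* atoi skips the leading blanks of a right-justified field */
}

/* Hypothetical struct for one V2000 bond line "111222tttsssxxxrrrccc". */
typedef struct {
    int atom1, atom2, type, stereo, topology, reacting_center;
} BondFields;

/* Parses a bond line field by field; the unused xxx field (offsets 12-14) is skipped. */
static BondFields parse_bond_line(const char *line)
{
    BondFields b;
    b.atom1 = mol_field_int(line, 0, 3);
    b.atom2 = mol_field_int(line, 3, 3);
    b.type = mol_field_int(line, 6, 3);
    b.stereo = mol_field_int(line, 9, 3);
    b.topology = mol_field_int(line, 15, 3);
    b.reacting_center = mol_field_int(line, 18, 3);
    return b;
}
```

Applied to the bond line "100101  2  0  0  0  0", parse_bond_line yields atom1 = 100, atom2 = 101 and type = 2. Copying into a bounded buffer first trades a few bytes of stack for the guarantee that a full-width value can never bleed into its neighbour.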
TASK 409 STOP.