Task 137: .DICOM File Format

The DICOM (Digital Imaging and Communications in Medicine) file format specifications are defined in Part 10 of the DICOM Standard (PS3.10), which describes the media storage and file format for medical imaging data. This standard is maintained by the National Electrical Manufacturers Association (NEMA) and is accessible through the official DICOM website.

The properties of the DICOM file format intrinsic to its file structure include the following components, as specified in the DICOM Standard PS3.10, Section 7:

File Preamble: A fixed 128-byte field, typically set to zeros or used for application-specific data, but not relied upon for file identification.

DICOM Prefix: A 4-byte string encoded as "DICM" to identify the file as DICOM-compliant.
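
As a minimal sketch of the check described above, the following tests whether a byte string carries the "DICM" prefix at offset 128. The names PREAMBLE_LENGTH, DICOM_PREFIX, and has_dicom_prefix are illustrative and do not come from the standard or any library.

```python
PREAMBLE_LENGTH = 128      # size of the File Preamble in bytes
DICOM_PREFIX = b"DICM"     # 4-byte prefix that follows the preamble


def has_dicom_prefix(data: bytes) -> bool:
    """Return True if bytes 128..131 of `data` are the ASCII string 'DICM'.

    The preamble content itself is ignored, since it is not meant to be
    used for file identification.
    """
    start = PREAMBLE_LENGTH
    return data[start:start + len(DICOM_PREFIX)] == DICOM_PREFIX


if __name__ == "__main__":
    # Synthetic header: 128 zero bytes of preamble followed by the prefix.
    sample = bytes(PREAMBLE_LENGTH) + DICOM_PREFIX
    print(has_dicom_prefix(sample))
    print(has_dicom_prefix(b"not a DICOM file"))
```

Slicing rather than indexing means a file shorter than 132 bytes simply fails the comparison instead of raising an exception.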

File Meta Information: A set of Data Elements in Group 0002, encoded using Explicit Value Representation (VR) Little Endian Transfer Syntax. These elements provide metadata about the file and the enclosed Data Set. The following is a list of the File Meta Information elements, including their tags, types (1 = mandatory, 3 = optional), and descriptions:

Tag | Attribute Name | Type | Description
(0002,0000) | File Meta Information Group Length | 1 | Number of bytes from the end of this element to the end of the last File Meta Element in Group 0002.
(0002,0001) | File Meta Information Version | 1 | Two-byte field identifying the version (e.g., 00H 01H for version 1).
(0002,0002) | Media Storage SOP Class UID | 1 | Unique identifier for the SOP Class of the Data Set.
(0002,0003) | Media Storage SOP Instance UID | 1 | Unique identifier for the SOP Instance of the Data Set.
(0002,0010) | Transfer Syntax UID | 1 | Unique identifier for the Transfer Syntax used to encode the Data Set.
(0002,0012) | Implementation Class UID | 1 | Unique identifier for the implementation that created the file.
(0002,0013) | Implementation Version Name | 3 | Version name for the Implementation Class UID (up to 16 characters).
(0002,0016) | Source Application Entity Title | 3 | DICOM Application Entity Title of the entity that wrote or updated the file.
(0002,0017) | Sending Application Entity Title | 3 | DICOM Application Entity Title of the sending entity (if applicable).

Other elements that may appear are (0002,0018) Receiving Application Entity Title (Type 3), (0002,0100) Private Information Creator UID (Type 3), and (0002,0102) Private Information (Type 1C, required when (0002,0100) is present).
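
To make the element layout and the meaning of (0002,0000) concrete, here is a rough sketch that encodes the mandatory File Meta Information elements in Explicit VR Little Endian and derives the group length from them. The helper names encode_element and build_file_meta are hypothetical, the VR list covers only the original long-form set, and the instance and implementation UIDs in the usage lines are placeholders, not registered values.

```python
import struct

# VRs whose explicit header uses 2 reserved bytes followed by a 4-byte length.
LONG_FORM_VRS = {b"OB", b"OW", b"OF", b"SQ", b"UN", b"UT"}
# VRs padded with 0x00 to even length; text VRs such as SH and AE use a space.
NUL_PADDED_VRS = {b"UI", b"OB", b"UN"}


def encode_element(group: int, element: int, vr: bytes, value: bytes) -> bytes:
    """Encode one Data Element in Explicit VR Little Endian."""
    if len(value) % 2:
        value += b"\x00" if vr in NUL_PADDED_VRS else b" "
    header = struct.pack("<HH", group, element) + vr
    if vr in LONG_FORM_VRS:
        header += struct.pack("<HI", 0, len(value))   # reserved + 32-bit length
    else:
        header += struct.pack("<H", len(value))       # 16-bit length
    return header + value


def build_file_meta(sop_class_uid: str, sop_instance_uid: str,
                    transfer_syntax_uid: str, implementation_class_uid: str) -> bytes:
    """Return the Group 0002 bytes, starting with the Group Length element."""
    body = b"".join([
        encode_element(0x0002, 0x0001, b"OB", b"\x00\x01"),
        encode_element(0x0002, 0x0002, b"UI", sop_class_uid.encode("ascii")),
        encode_element(0x0002, 0x0003, b"UI", sop_instance_uid.encode("ascii")),
        encode_element(0x0002, 0x0010, b"UI", transfer_syntax_uid.encode("ascii")),
        encode_element(0x0002, 0x0012, b"UI", implementation_class_uid.encode("ascii")),
    ])
    # (0002,0000) counts every byte that follows its own value field.
    group_length = encode_element(0x0002, 0x0000, b"UL", struct.pack("<I", len(body)))
    return group_length + body


if __name__ == "__main__":
    meta = build_file_meta(
        "1.2.840.10008.5.1.4.1.1.7",   # Secondary Capture Image Storage
        "1.2.3.4.5",                   # placeholder SOP Instance UID
        "1.2.840.10008.1.2.1",         # Explicit VR Little Endian
        "1.2.3.4",                     # placeholder Implementation Class UID
    )
    print(len(meta), struct.unpack_from("<I", meta, 8)[0])
```

Because the Group Length element itself occupies 12 bytes (4-byte tag, 2-byte VR, 2-byte length, 4-byte value), the stored value is always the total size of the group minus 12.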

Data Set: The main content following the File Meta Information, encoded per the specified Transfer Syntax and representing the SOP Instance (e.g., image data and attributes). Unlike the components above, it is not a fixed part of the file structure, because its layout varies with the content.
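
One way to locate the Data Set, sketched here under the assumption that (0002,0000) is the first element after the prefix, is to add the group length to the fixed header size. The function name data_set_offset and the TRANSFER_SYNTAXES table are illustrative; the table lists only the uncompressed and deflated syntaxes, and the sample bytes are synthetic.

```python
import struct

# A few well-known Transfer Syntax UIDs; compressed syntaxes are omitted.
TRANSFER_SYNTAXES = {
    "1.2.840.10008.1.2": "Implicit VR Little Endian",
    "1.2.840.10008.1.2.1": "Explicit VR Little Endian",
    "1.2.840.10008.1.2.1.99": "Deflated Explicit VR Little Endian",
    "1.2.840.10008.1.2.2": "Explicit VR Big Endian (retired)",
}

HEADER_END = 128 + 4                 # preamble + "DICM"
GROUP_LENGTH_END = HEADER_END + 12   # plus the 12-byte (0002,0000) element


def data_set_offset(data: bytes) -> int:
    """Return the byte offset at which the Data Set begins."""
    if len(data) < GROUP_LENGTH_END:
        raise ValueError("file too short for a File Meta Information header")
    if data[128:132] != b"DICM":
        raise ValueError("missing DICM prefix")
    group, element, vr, length = struct.unpack_from("<HH2sH", data, HEADER_END)
    if (group, element, vr, length) != (0x0002, 0x0000, b"UL", 4):
        raise ValueError("expected (0002,0000) Group Length as first element")
    (group_length,) = struct.unpack_from("<I", data, HEADER_END + 8)
    return GROUP_LENGTH_END + group_length


if __name__ == "__main__":
    # Synthetic file: one Transfer Syntax UID element, then fake Data Set bytes.
    body = b"\x02\x00\x10\x00UI\x14\x00" + b"1.2.840.10008.1.2.1\x00"
    sample = (bytes(128) + b"DICM"
              + struct.pack("<HH2sHI", 0x0002, 0x0000, b"UL", 4, len(body))
              + body + b"DATASET!")
    start = data_set_offset(sample)
    print(start, sample[start:])
    print(TRANSFER_SYNTAXES["1.2.840.10008.1.2.1"])
```

Everything from that offset onward has to be decoded according to the Transfer Syntax UID, which is why a reader that only understands Explicit VR Little Endian can parse the meta header of any file but not necessarily its Data Set.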

The following is an embedded HTML and JavaScript code snippet suitable for integration into a Ghost blog post. It creates a drag-and-drop area where a user can drop a DICOM file, parses the file to extract the properties listed above (preamble size confirmation, prefix validation, and File Meta Information elements), and displays them on screen. The parser assumes Explicit VR Little Endian for the meta information and handles basic VR types for display.

Drag and drop a DICOM file here

The following is a Python class for handling DICOM files. It can open a file, decode and read the properties, print them to the console, and write a new file with the same properties (for demonstration, it copies the original content after printing).

import struct

class DicomHandler:
    def __init__(self):
        self.properties = {}
        self.meta_names = {
            (0x0002, 0x0000): 'File Meta Information Group Length',
            (0x0002, 0x0001): 'File Meta Information Version',
            (0x0002, 0x0002): 'Media Storage SOP Class UID',
            (0x0002, 0x0003): 'Media Storage SOP Instance UID',
            (0x0002, 0x0010): 'Transfer Syntax UID',
            (0x0002, 0x0012): 'Implementation Class UID',
            (0x0002, 0x0013): 'Implementation Version Name',
            (0x0002, 0x0016): 'Source Application Entity Title',
            (0x0002, 0x0017): 'Sending Application Entity Title'
        }
        self.buffer = None

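    def read(self, filepath):
        with open(filepath, 'rb') as f:
            self.buffer = f.read()
        # Preamble: 128 bytes whose content is not interpreted
        if len(self.buffer) < 132:
            raise ValueError('File too short to contain a DICOM preamble and prefix')
        print('File Preamble: 128 bytes')
        # Prefix: compare raw bytes so non-ASCII data cannot raise a decode error
        prefix = self.buffer[128:132]
        if prefix != b'DICM':
            raise ValueError(f'Invalid DICOM Prefix: {prefix!r}')
        print('DICOM Prefix: DICM')
        # File Meta Information: Explicit VR Little Endian elements in group 0002
        offset = 132
        group_length = 0
        # VRs with the long header form: 2 reserved bytes, then a 4-byte length
        long_vrs = ('OB', 'OD', 'OF', 'OL', 'OV', 'OW', 'SQ', 'SV',
                    'UC', 'UN', 'UR', 'UT', 'UV')
        # Every element header needs at least 8 bytes: tag (4), VR (2), length (2)
        while offset + 8 <= len(self.buffer):
            group, element = struct.unpack_from('<HH', self.buffer, offset)
            if group != 0x0002:
                break  # first element of the Data Set reached
            vr = self.buffer[offset + 4:offset + 6].decode('ascii', errors='replace')
            if vr in long_vrs:
                if offset + 12 > len(self.buffer):
                    break
                length = struct.unpack_from('<I', self.buffer, offset + 8)[0]
                value_offset = offset + 12
            else:
                length = struct.unpack_from('<H', self.buffer, offset + 6)[0]
                value_offset = offset + 8
            value = self.buffer[value_offset:value_offset + length]
            if len(value) < length:
                raise ValueError('Truncated File Meta Information element')
            tag = (group, element)
            name = self.meta_names.get(tag, 'Unknown')
            if vr in ['UI', 'SH', 'LO', 'ST', 'PN', 'AE']:
                # Strip the NUL or space used to pad values to even length
                val_str = value.decode('ascii', errors='ignore').rstrip('\x00 ')
            elif vr == 'UL' and length == 4:
                val_str = struct.unpack('<I', value)[0]
            elif vr == 'OB':
                val_str = '[Binary data]'
            else:
                val_str = value
            print(f'{name} ({group:04x},{element:04x}): {val_str}')
            self.properties[tag] = val_str
            if tag == (0x0002, 0x0000) and length == 4:
                group_length = val_str
            offset = value_offset + length
        self.properties['group_length'] = group_length  # For reference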

    def write(self, output_filepath):
        if self.buffer is None:
            raise ValueError('No file read yet')
        with open(output_filepath, 'wb') as f:
            f.write(self.buffer)
        print(f'File written to {output_filepath}')

# Example usage:
# handler = DicomHandler()
# handler.read('path/to/file.dcm')
# handler.write('path/to/output.dcm')

The following is a Java class for handling DICOM files. It can open a file, decode and read the properties, print them to the console, and write a new file with the same properties (copying the original content).

import java.io.*;
import java.nio.*;
import java.util.HashMap;
import java.util.Map;

public class DicomHandler {
    private Map<Integer, String> metaNames = new HashMap<>();
    private byte[] buffer;

    public DicomHandler() {
        int[] tags = {0x00020000, 0x00020001, 0x00020002, 0x00020003, 0x00020010, 0x00020012, 0x00020013, 0x00020016, 0x00020017};
        String[] names = {"File Meta Information Group Length", "File Meta Information Version", "Media Storage SOP Class UID", "Media Storage SOP Instance UID", "Transfer Syntax UID", "Implementation Class UID", "Implementation Version Name", "Source Application Entity Title", "Sending Application Entity Title"};
        for (int i = 0; i < tags.length; i++) {
            metaNames.put(tags[i], names[i]);
        }
    }

    public void read(String filepath) throws IOException {
        // readAllBytes reads the whole file; a single InputStream.read call may return early
        buffer = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(filepath));
        if (buffer.length < 132) {
            throw new IOException("File too short to contain a DICOM preamble and prefix");
        }
        ByteBuffer bb = ByteBuffer.wrap(buffer).order(ByteOrder.LITTLE_ENDIAN);
        System.out.println("File Preamble: 128 bytes");
        String prefixStr = new String(buffer, 128, 4, "US-ASCII");
        if (!prefixStr.equals("DICM")) {
            throw new IOException("Invalid DICOM Prefix: " + prefixStr);
        }
        System.out.println("DICOM Prefix: DICM");
        bb.position(132);
        // VRs with the long header form: 2 reserved bytes, then a 4-byte length
        java.util.List<String> longVrs = java.util.Arrays.asList(
                "OB", "OD", "OF", "OL", "OV", "OW", "SQ", "SV", "UC", "UN", "UR", "UT", "UV");
        // Every element header needs at least 8 bytes: tag (4), VR (2), length (2)
        while (bb.remaining() >= 8) {
            // Mask to treat the 16-bit fields as unsigned
            int group = bb.getShort() & 0xFFFF;
            if (group != 0x0002) break; // first element of the Data Set reached
            int element = bb.getShort() & 0xFFFF;
            String vr = new String(new char[]{(char) bb.get(), (char) bb.get()});
            long length;
            if (longVrs.contains(vr)) {
                if (bb.remaining() < 6) break;
                bb.getShort(); // Reserved
                length = bb.getInt() & 0xFFFFFFFFL;
            } else {
                length = bb.getShort() & 0xFFFF;
            }
            if (length > bb.remaining()) {
                throw new IOException("Truncated File Meta Information element");
            }
            byte[] value = new byte[(int) length];
            bb.get(value);
            int tag = (group << 16) | element;
            String name = metaNames.getOrDefault(tag, "Unknown");
            String valStr;
            if (vr.equals("UI") || vr.equals("SH") || vr.equals("LO") || vr.equals("ST") || vr.equals("PN") || vr.equals("AE")) {
                // trim() removes the NUL or space used to pad values to even length
                valStr = new String(value, "US-ASCII").trim();
            } else if (vr.equals("UL") && value.length == 4) {
                valStr = Integer.toUnsignedString(ByteBuffer.wrap(value).order(ByteOrder.LITTLE_ENDIAN).getInt());
            } else {
                valStr = "[Data, length " + length + "]";
            }
            System.out.printf("%s (%04x,%04x): %s%n", name, group, element, valStr);
        }
    }

    public void write(String outputFilepath) throws IOException {
        if (buffer == null) {
            throw new IOException("No file read yet");
        }
        try (FileOutputStream fos = new FileOutputStream(outputFilepath)) {
            fos.write(buffer);
        }
        System.out.println("File written to " + outputFilepath);
    }

    // Example usage:
    // public static void main(String[] args) throws IOException {
    //     DicomHandler handler = new DicomHandler();
    //     handler.read("path/to/file.dcm");
    //     handler.write("path/to/output.dcm");
    // }
}

The following is a JavaScript class for handling DICOM files. It can open a file (using Node.js fs module), decode and read the properties, print them to the console, and write a new file with the same properties.

const fs = require('fs');

class DicomHandler {
  constructor() {
    this.metaNames = {
      '00020000': 'File Meta Information Group Length',
      '00020001': 'File Meta Information Version',
      '00020002': 'Media Storage SOP Class UID',
      '00020003': 'Media Storage SOP Instance UID',
      '00020010': 'Transfer Syntax UID',
      '00020012': 'Implementation Class UID',
      '00020013': 'Implementation Version Name',
      '00020016': 'Source Application Entity Title',
      '00020017': 'Sending Application Entity Title'
    };
    this.buffer = null;
  }

  read(filepath) {
    this.buffer = fs.readFileSync(filepath);
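    // A Node.js Buffer may be a window into a larger ArrayBuffer, so pass its offset and length
    const view = new DataView(this.buffer.buffer, this.buffer.byteOffset, this.buffer.byteLength);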
    console.log('File Preamble: 128 bytes');
    const prefix = String.fromCharCode(view.getUint8(128), view.getUint8(129), view.getUint8(130), view.getUint8(131));
    if (prefix !== 'DICM') {
      throw new Error(`Invalid DICOM Prefix: ${prefix}`);
    }
    console.log('DICOM Prefix: DICM');
    let offset = 132;
    // VRs with the long header form: 2 reserved bytes, then a 4-byte length
    const longVrs = ['OB', 'OD', 'OF', 'OL', 'OV', 'OW', 'SQ', 'SV', 'UC', 'UN', 'UR', 'UT', 'UV'];
    const textVrs = ['UI', 'SH', 'LO', 'ST', 'PN', 'AE'];
    // Every element header needs at least 8 bytes: tag (4), VR (2), length (2)
    while (offset + 8 <= this.buffer.length) {
      const group = view.getUint16(offset, true);
      if (group !== 0x0002) break; // first element of the Data Set reached
      const element = view.getUint16(offset + 2, true);
      const vr = String.fromCharCode(view.getUint8(offset + 4), view.getUint8(offset + 5));
      let length;
      let valueOffset;
      if (longVrs.includes(vr)) {
        if (offset + 12 > this.buffer.length) break;
        length = view.getUint32(offset + 8, true);
        valueOffset = offset + 12;
      } else {
        length = view.getUint16(offset + 6, true);
        valueOffset = offset + 8;
      }
      if (valueOffset + length > this.buffer.length) {
        throw new Error('Truncated File Meta Information element');
      }
      let value;
      if (textVrs.includes(vr)) {
        // Strip the NUL or space used to pad values to even length
        value = this.buffer.toString('latin1', valueOffset, valueOffset + length).replace(/[\0 ]+$/, '');
      } else if (vr === 'UL' && length === 4) {
        value = view.getUint32(valueOffset, true);
      } else if (vr === 'OB') {
        value = '[Binary data]';
      } else {
        value = `[Data, length ${length}]`;
      }
      const tag = group.toString(16).padStart(4, '0') + element.toString(16).padStart(4, '0');
      const name = this.metaNames[tag] || 'Unknown';
      console.log(`${name} (${tag.slice(0, 4)},${tag.slice(4)}): ${value}`);
      offset = valueOffset + length;
    }
  }

  write(outputFilepath) {
    if (this.buffer === null) {
      throw new Error('No file read yet');
    }
    fs.writeFileSync(outputFilepath, this.buffer);
    console.log(`File written to ${outputFilepath}`);
  }
}

// Example usage:
// const handler = new DicomHandler();
// handler.read('path/to/file.dcm');
// handler.write('path/to/output.dcm');

The following is a C implementation (using a struct in place of a class) for handling DICOM files. It can open a file, decode and read the properties, print them to stdout, and write a new file with the same properties. Compile with a C compiler (e.g., gcc) and link with standard libraries.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
#include <endian.h>  // For byte order (if available; otherwise assume little-endian host)

typedef struct {
    char *meta_names[9];
    uint8_t *buffer;
    size_t buffer_size;
} DicomHandler;

void init_dicom_handler(DicomHandler *handler) {
    handler->meta_names[0] = "File Meta Information Group Length";
    handler->meta_names[1] = "File Meta Information Version";
    handler->meta_names[2] = "Media Storage SOP Class UID";
    handler->meta_names[3] = "Media Storage SOP Instance UID";
    handler->meta_names[4] = "Transfer Syntax UID";
    handler->meta_names[5] = "Implementation Class UID";
    handler->meta_names[6] = "Implementation Version Name";
    handler->meta_names[7] = "Source Application Entity Title";
    handler->meta_names[8] = "Sending Application Entity Title";
    handler->buffer = NULL;
    handler->buffer_size = 0;
}

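/* Read little-endian integers via memcpy: element offsets are not guaranteed
   to be aligned, so casting the buffer to uint16_t* or uint32_t* is unsafe. */
static uint16_t read_le16(const uint8_t *p) {
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return le16toh(v);
}

static uint32_t read_le32(const uint8_t *p) {
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return le32toh(v);
}

/* VRs with the long header form: 2 reserved bytes, then a 4-byte length */
static int is_long_vr(const char *vr) {
    static const char *long_vrs[] = {"OB", "OD", "OF", "OL", "OV", "OW", "SQ",
                                     "SV", "UC", "UN", "UR", "UT", "UV"};
    for (size_t i = 0; i < sizeof long_vrs / sizeof long_vrs[0]; i++) {
        if (strcmp(vr, long_vrs[i]) == 0) return 1;
    }
    return 0;
}

void read_dicom(DicomHandler *handler, const char *filepath) {
    FILE *f = fopen(filepath, "rb");
    if (!f) {
        perror("Failed to open file");
        return;
    }
    fseek(f, 0, SEEK_END);
    long file_size = ftell(f);
    fseek(f, 0, SEEK_SET);
    if (file_size < 132) {
        printf("File too short to contain a DICOM preamble and prefix\n");
        fclose(f);
        return;
    }
    free(handler->buffer);  /* release any previously loaded file */
    handler->buffer_size = 0;
    handler->buffer = malloc((size_t)file_size);
    if (!handler->buffer ||
        fread(handler->buffer, 1, (size_t)file_size, f) != (size_t)file_size) {
        printf("Failed to read file\n");
        free(handler->buffer);
        handler->buffer = NULL;
        fclose(f);
        return;
    }
    handler->buffer_size = (size_t)file_size;
    fclose(f);

    printf("File Preamble: 128 bytes\n");
    if (memcmp(handler->buffer + 128, "DICM", 4) != 0) {
        printf("Invalid DICOM Prefix\n");
        free(handler->buffer);
        handler->buffer = NULL;
        handler->buffer_size = 0;
        return;
    }
    printf("DICOM Prefix: DICM\n");
    size_t offset = 132;
    /* Every element header needs at least 8 bytes: tag (4), VR (2), length (2) */
    while (offset + 8 <= handler->buffer_size) {
        uint16_t group = read_le16(handler->buffer + offset);
        if (group != 0x0002) break;  /* first element of the Data Set reached */
        uint16_t element = read_le16(handler->buffer + offset + 2);
        char vr[3] = {(char)handler->buffer[offset + 4], (char)handler->buffer[offset + 5], '\0'};
        uint32_t length;
        size_t value_offset;
        if (is_long_vr(vr)) {
            if (offset + 12 > handler->buffer_size) break;
            value_offset = offset + 12;
            length = read_le32(handler->buffer + offset + 8);
        } else {
            value_offset = offset + 8;
            length = read_le16(handler->buffer + offset + 6);
        }
        if (length > handler->buffer_size - value_offset) {
            printf("Truncated File Meta Information element\n");
            break;
        }
        const uint8_t *value = handler->buffer + value_offset;
        uint32_t tag = ((uint32_t)group << 16) | element;
        const char *name = "Unknown";
        switch (tag) {
            case 0x00020000: name = handler->meta_names[0]; break;
            case 0x00020001: name = handler->meta_names[1]; break;
            case 0x00020002: name = handler->meta_names[2]; break;
            case 0x00020003: name = handler->meta_names[3]; break;
            case 0x00020010: name = handler->meta_names[4]; break;
            case 0x00020012: name = handler->meta_names[5]; break;
            case 0x00020013: name = handler->meta_names[6]; break;
            case 0x00020016: name = handler->meta_names[7]; break;
            case 0x00020017: name = handler->meta_names[8]; break;
        }
        if (strcmp(vr, "UL") == 0 && length == 4) {
            printf("%s (%04x,%04x): %u\n", name, group, element, (unsigned)read_le32(value));
        } else if (strcmp(vr, "OB") == 0) {
            printf("%s (%04x,%04x): [Binary data]\n", name, group, element);
        } else {
            /* %.*s prints at most `length` bytes; the value is not NUL-terminated */
            printf("%s (%04x,%04x): %.*s\n", name, group, element, (int)length, (const char *)value);
        }
        offset = value_offset + length;
    }
}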

void write_dicom(DicomHandler *handler, const char *output_filepath) {
    if (handler->buffer == NULL) {
        printf("No file read yet\n");
        return;
    }
    FILE *f = fopen(output_filepath, "wb");
    if (!f) {
        perror("Failed to write file");
        return;
    }
    fwrite(handler->buffer, 1, handler->buffer_size, f);
    fclose(f);
    printf("File written to %s\n", output_filepath);
}

void free_dicom_handler(DicomHandler *handler) {
    free(handler->buffer);
    handler->buffer = NULL;
}

// Example usage:
// int main() {
//     DicomHandler handler;
//     init_dicom_handler(&handler);
//     read_dicom(&handler, "path/to/file.dcm");
//     write_dicom(&handler, "path/to/output.dcm");
//     free_dicom_handler(&handler);
//     return 0;
// }