Task 080: .CDF File Format
- List of all properties intrinsic to the .CDF file format (Common Data Format by NASA/GSFC)
The .CDF format is a self-describing, platform-independent structure for multidimensional scientific data, resembling a simple file system with metadata records defining variables (like files) and attributes (like metadata tags). It supports single-file or multi-file architectures, with internal records (IRs) forming the backbone. Below is a comprehensive list of intrinsic properties derived from the official specifications (CDF Internal Format Description v3.7.1 and Concise Guide to CDF v2.0). These include header elements, structural metadata, and format controls. Properties are grouped by category for clarity.
File Identification and Versioning
- Magic Number 1: 4-byte unsigned integer (big-endian) identifying the format version: 0xCDF30001 for all CDF v3.x files, 0xCDF26002 for v2.6 and v2.7.
- Magic Number 2: 4-byte unsigned integer (big-endian) indicating whole-file compression: 0x0000FFFF for uncompressed, 0xCCCC0001 for compressed.
- CDF Version: 4-byte signed integer (major version, e.g., 3).
- CDF Release: 4-byte signed integer (minor release, e.g., 7).
- CDF Increment: 4-byte signed integer (patch level, e.g., 1; 0 for pre-v2.1).
- Format Variant: Identified by Magic Number 1. CDF v3.x files use 8-byte record sizes and offsets; v2.x files use 4-byte fields, which caps them at 2 GB. (The CDF-1/CDF-2 variant names belong to netCDF, not NASA CDF.)
- Creation Identifier: 4-byte signed integer in the CDR identifying the library that last wrote the file.
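A minimal Python sketch of the identification step: it reads the two big-endian magic numbers and reports the version family and whether the whole file is compressed. Only the two Magic Number 1 values quoted above are recognized, and `classify_magic` is a hypothetical helper, not part of any CDF library.

```python
import struct

# Magic Number 1 values this sketch recognizes (v3.x and v2.6/v2.7 only).
MAGIC1_VERSIONS = {
    0xCDF30001: "v3.x (8-byte offsets)",
    0xCDF26002: "v2.6/v2.7 (4-byte offsets)",
}
MAGIC2_UNCOMPRESSED = 0x0000FFFF
MAGIC2_COMPRESSED = 0xCCCC0001


def classify_magic(header: bytes) -> tuple:
    """Return (version family, whole-file compressed?) from the first 8 bytes."""
    if len(header) < 8:
        raise ValueError("need at least 8 bytes")
    magic1, magic2 = struct.unpack(">II", header[:8])  # both big-endian
    if magic1 not in MAGIC1_VERSIONS:
        raise ValueError(f"unrecognized Magic Number 1: {magic1:#010x}")
    if magic2 not in (MAGIC2_UNCOMPRESSED, MAGIC2_COMPRESSED):
        raise ValueError(f"unrecognized Magic Number 2: {magic2:#010x}")
    return MAGIC1_VERSIONS[magic1], magic2 == MAGIC2_COMPRESSED


# Example: the first 8 bytes of an uncompressed v3 file
print(classify_magic(bytes.fromhex("CDF300010000FFFF")))  # ('v3.x (8-byte offsets)', False)
```

A reader would call this before anything else, because a compressed file does not have its CDR at byte 8 and a v2.x file uses 4-byte offsets throughout.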
Encoding and Platform Independence
- Data Encoding: 4-byte signed integer in the CDR naming the representation of data values: 1=NETWORK (XDR, big-endian IEEE), 2=SUN (big-endian IEEE), 3=VAX (little-endian, non-IEEE), 4=DECSTATION (little-endian IEEE), 5=SGi (big-endian IEEE), 6=IBMPC (little-endian IEEE), 7=IBMRS (big-endian IEEE), 9=PPC (big-endian IEEE), 11=HP (big-endian IEEE), 12=NeXT (big-endian IEEE), 13=ALPHAOSF1 (little-endian IEEE), 14-16=Alpha/VMS D-, G- and I-float variants, 17=ARM_LITTLE, 18=ARM_BIG, 19-21=IA64/VMS I-, D- and G-float variants.
- Byte Order: Internal-record fields (sizes, offsets, counts, flags) are always big-endian; the encoding governs only data values (variable values, attribute entry values, pad values).
- Floating-Point Representation: Tied to the encoding; IEEE 754 everywhere except VAX and the D-/G-float VMS variants.
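Because only data values follow the encoding, a reader needs the encoding code before it can decode a variable value. The sketch below assumes the encoding codes defined in the CDF library's `cdf.h` (1 = NETWORK, 6 = IBMPC, and so on), covers only IEEE encodings, and uses a hypothetical `decode_real8` helper for a single CDF_REAL8 value.

```python
import struct

# Byte order of IEEE encodings, keyed by the encoding codes assumed from the
# CDF library's cdf.h. Non-IEEE VAX and VMS D/G-float encodings are absent.
IEEE_BYTE_ORDER = {
    1: ">",   # NETWORK (XDR)
    2: ">",   # SUN
    4: "<",   # DECSTATION
    5: ">",   # SGi
    6: "<",   # IBMPC
    7: ">",   # IBMRS
    9: ">",   # PPC
    11: ">",  # HP
    12: ">",  # NeXT
    13: "<",  # ALPHAOSF1
    17: "<",  # ARM_LITTLE
    18: ">",  # ARM_BIG
}


def decode_real8(raw: bytes, encoding: int) -> float:
    """Decode one 8-byte CDF_REAL8/CDF_DOUBLE value written with `encoding`."""
    if encoding not in IEEE_BYTE_ORDER:
        raise ValueError(f"encoding {encoding} is not handled by this sketch")
    return struct.unpack(IEEE_BYTE_ORDER[encoding] + "d", raw)[0]


# The same value as stored by an IBMPC (x86) writer and by a NETWORK writer
print(decode_real8(struct.pack("<d", 1.5), 6), decode_real8(struct.pack(">d", 1.5), 1))  # 1.5 1.5
```

The header fields themselves never go through this table; they are always read big-endian.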
Format Flags and Options
- Flags: 4-byte bitfield in the CDR (big-endian):
  - Bit 0: 1=row-major order (C style), 0=column-major order (Fortran style).
  - Bit 1: 1=single-file, 0=multi-file.
  - Bit 2: 1=a checksum is stored.
  - Bit 3: 1=the checksum method is MD5.
  - Bit 4: Reserved for another checksum method (always 0).
  - Bits 5-31: Reserved (always 0).
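- Leap Second Table Date: Since CDF 3.6 the GDR field LeapSecondLastUpdated (4-byte integer, YYYYMMDD, e.g., 20170101) records the last leap second known to the library that wrote the file; it matters for CDF_TIME_TT2000 values.
- Compression Support: Whole-file (Magic Number 2 = 0xCCCC0001, with the original content stored in a Compressed CDF Record, CCR) or per-variable (VDR Flags bit 2, with parameters in a Compressed Parameters Record, CPR). Algorithms: RLE, Huffman, Adaptive Huffman, and GZIP.
- Checksum: 16-byte MD5 digest appended at the end of the file when flagged, computed over all preceding bytes.
- File Size Limit: No practical limit in v3.x (8-byte offsets); 2 GB in v2.x (4-byte offsets).
- Pad Values: Returned for values that were never written. Each data type has a library default, which a variable can override with its own pad value. For CDF_TIME_TT2000 the default pad value is -9223372036854775807 (0000-01-01T00:00:00.000000000), while -9223372036854775808 is reserved as the fill value and displayed as 9999-12-31T23:59:59.999999999.
- Reserved Fields: e.g., rfuA and rfuB in the CDR (always 0) and rfuE (always -1).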
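The CDR Flags field can be decoded with plain bit masks. This sketch assumes bit 0 set means row-major, bit 1 set means single-file, and bits 2-3 signal an MD5 checksum; `md5_trailer_matches` further assumes the 16-byte digest is appended after, and computed over, every other byte of the file. Both function names are hypothetical.

```python
import hashlib


def decode_cdr_flags(flags: int) -> dict:
    """Interpret the 4-byte CDR Flags field (assumed bit assignments)."""
    return {
        "row_major": bool(flags & 0x1),     # bit 0 set: row-major (C order)
        "single_file": bool(flags & 0x2),   # bit 1 set: single-file CDF
        "has_checksum": bool(flags & 0x4),  # bit 2 set: checksum stored
        "md5_checksum": bool(flags & 0x8),  # bit 3 set: checksum method is MD5
        # bits 4-31 are reserved and should be zero
        "reserved_bits_set": bool(flags & 0xFFFFFFF0),
    }


def md5_trailer_matches(data: bytes) -> bool:
    """Assumes the MD5 digest occupies the last 16 bytes of the file
    and covers every byte before it."""
    if len(data) < 16:
        return False
    return hashlib.md5(data[:-16]).digest() == data[-16:]


# A single-file, row-major CDF without a checksum would store flags = 3
print(decode_cdr_flags(3))
```

Checking `reserved_bits_set` is a cheap sanity test: a nonzero value usually means the reader is misaligned or the file is not CDF v3.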
Structural Records and Offsets
- Record Structure: Every internal record (IR) starts with:
  - Record Size: 8-byte signed integer (4 bytes in v2.x), big-endian, total size of the record in bytes.
  - Record Type: 4-byte signed integer, big-endian: 1=CDR, 2=GDR, 3=rVDR, 4=ADR, 5=AgrEDR (gEntry/rEntry attribute entry), 6=VXR (variable index), 7=VVR (variable values), 8=zVDR, 9=AzEDR (zEntry attribute entry), 10=CCR (compressed CDF), 11=CPR (compression parameters), 12=SPR (sparseness parameters), 13=CVVR (compressed variable values), -1=UIR (unused internal record).
- CDF Descriptor Record (CDR) Offset: Fixed at byte 8 (0x00000008), immediately after the magic numbers, in uncompressed files.
- Global Descriptor Record (GDR) Offset: 8-byte field (GDRoffset) in the CDR holding the GDR's absolute file offset.
- End-of-File (EOF) Position: 8-byte signed integer in the GDR giving the total number of bytes used in the .cdf file.
- Unused IR Head (UIRhead): 8-byte offset in the GDR to the first Unused Internal Record (UIR), i.e., freed space available for reuse (0 if none).
- rVDR Head: 8-byte offset to the first rVariable Descriptor Record (0 if none); rVDRs form a linked list through their VDRnext fields.
- zVDR Head: 8-byte offset to the first zVariable Descriptor Record (0 if none; not used before v2.2).
- ADR Head: 8-byte offset to first Attribute Descriptor Record (0 if none).
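To see how these GDR fields fit together, the sketch below unpacks the fixed part of a v3 GDR with `struct`. The field order and widths are an assumption taken from the v3 internal-format layout (8-byte offsets, 4-byte counts, `LeapSecondLastUpdated` from CDF 3.6 onward), and `parse_gdr` is a hypothetical helper.

```python
import struct

# Assumed fixed layout of a CDF v3 GDR (big-endian, no padding):
# RecordSize(q) RecordType(i) rVDRhead(q) zVDRhead(q) ADRhead(q) eof(q)
# NrVars(i) NumAttr(i) rMaxRec(i) rNumDims(i) NzVars(i) UIRhead(q)
# rfuC(i) LeapSecondLastUpdated(i) rfuE(i), then rNumDims int32 sizes.
GDR_FORMAT = ">qiqqqqiiiiiqiii"
GDR_FIELDS = ("record_size", "record_type", "rvdr_head", "zvdr_head",
              "adr_head", "eof", "nr_vars", "num_attr", "r_max_rec",
              "r_num_dims", "nz_vars", "uir_head", "rfu_c",
              "leap_second_last_updated", "rfu_e")


def parse_gdr(buf: bytes, offset: int) -> dict:
    """Unpack the GDR found at `offset` (the CDR's GDRoffset)."""
    gdr = dict(zip(GDR_FIELDS, struct.unpack_from(GDR_FORMAT, buf, offset)))
    if gdr["record_type"] != 2:
        raise ValueError(f"expected a GDR (type 2), got type {gdr['record_type']}")
    # rDimSizes follow the fixed part: one 4-byte size per rVariable dimension
    sizes_at = offset + struct.calcsize(GDR_FORMAT)
    n = gdr["r_num_dims"]
    gdr["r_dim_sizes"] = list(struct.unpack_from(f">{n}i", buf, sizes_at))
    return gdr


# NrVars sits after 8 + 4 + 4 * 8 bytes; the whole fixed part is 84 bytes
print(struct.calcsize(">qi4q"), struct.calcsize(GDR_FORMAT))  # 44 84
```

Under this layout NrVars sits 44 bytes into the record, which is the skip distance a hand-written reader needs.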
Dimensions and Variables
- Number of Dimensions (rNumDims): 4-byte signed integer in the GDR, shared by all rVariables (0-10; the library limit CDF_MAX_DIMS is 10). Each zVariable stores its own count (zNumDims) in its zVDR.
- Dimension Names: None. CDF dimensions are unnamed and identified only by position (unlike netCDF).
- Dimension Sizes: Arrays of 4-byte signed integers, each at least 1: rDimSizes in the GDR for rVariables, zDimSizes in each zVDR. The record dimension is not included and has no fixed size.
- Number of rVariables (NrVars): 4-byte signed integer in the GDR. All rVariables share the same dimensionality (rNumDims, rDimSizes).
- rVariable Names: Stored in each rVDR's Name field as a NUL-padded ASCII string (256 bytes in v3.x, 64 in v2.x).
- Number of zVariables (NzVars): 4-byte signed integer in the GDR. Each zVariable has its own number and sizes of dimensions. Both rVariables and zVariables may or may not vary by record; record variance is a per-variable flag.
- zVariable Names: Stored in each zVDR's Name field (256 bytes in v3.x, 64 in v2.x).
- Maximum Record Number (rMaxRec): 4-byte signed integer (-1 if no rVariables; highest record index).
- Per-Variable Properties (in VDRs):
  - Data Type: 4-byte signed integer: 1=CDF_INT1, 2=CDF_INT2, 4=CDF_INT4, 8=CDF_INT8, 11=CDF_UINT1, 12=CDF_UINT2, 14=CDF_UINT4, 21=CDF_REAL4, 22=CDF_REAL8, 31=CDF_EPOCH, 32=CDF_EPOCH16, 33=CDF_TIME_TT2000, 41=CDF_BYTE, 44=CDF_FLOAT, 45=CDF_DOUBLE, 51=CDF_CHAR, 52=CDF_UCHAR.
  - Number of Dimensions (zNumDims): 4-byte signed integer, zVDRs only (0 for a scalar zVariable); rVariables use rNumDims from the GDR.
  - Dimension Variances (DimVarys): One 4-byte signed integer per dimension, -1 (VARY) or 0 (NOVARY), stating whether values change along that dimension.
  - Record Variance: Bit 0 of the VDR Flags field (1=VARY, 0=NOVARY).
  - Number of Elements (NumElems): 4-byte signed integer, the string length for CDF_CHAR/CDF_UCHAR and 1 for all other types.
  - Blocking Factor: 4-byte signed integer, the number of records allocated at a time when the variable grows.
  - Pad Value: Stored inline at the end of the VDR, in the file's encoding, when VDR Flags bit 1 is set; otherwise the data type's default applies.
  - Sparse Records (SRecords): 4-byte signed integer, 0=no sparse records, 1=pad sparse records, 2=previous sparse records.
  - Variable Options (Flags): 4-byte bitfield; bit 0 = record variance, bit 1 = pad value specified, bit 2 = compressed (parameters in a CPR). There is no per-variable checksum.
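Variable metadata lives in linked lists of VDRs. The sketch below walks such a list from rVDRhead or zVDRhead and reads the fixed fields every v3 VDR shares. The field layout and the data-type codes (from the CDF library's `cdf.h`) are assumptions of this sketch; zNumDims, zDimSizes, DimVarys and the pad value, which follow the name, are not parsed. `list_variables` is hypothetical.

```python
import struct

# Data-type codes assumed from the CDF library's cdf.h.
CDF_TYPES = {
    1: "CDF_INT1", 2: "CDF_INT2", 4: "CDF_INT4", 8: "CDF_INT8",
    11: "CDF_UINT1", 12: "CDF_UINT2", 14: "CDF_UINT4",
    21: "CDF_REAL4", 22: "CDF_REAL8", 31: "CDF_EPOCH", 32: "CDF_EPOCH16",
    33: "CDF_TIME_TT2000", 41: "CDF_BYTE", 44: "CDF_FLOAT",
    45: "CDF_DOUBLE", 51: "CDF_CHAR", 52: "CDF_UCHAR",
}

# Assumed fixed part shared by v3 rVDRs and zVDRs (big-endian):
# RecordSize RecordType VDRnext DataType MaxRec VXRhead VXRtail Flags
# SRecords rfuB rfuC rfuF NumElems Num CPRorSPRoffset BlockingFactor Name[256]
VDR_FORMAT = ">qiqiiqqiiiiiiiqi256s"


def list_variables(buf: bytes, head: int) -> list:
    """Walk the VDRnext chain from rVDRhead or zVDRhead (0 ends the list)."""
    out, seen = [], set()
    offset = head
    while offset != 0:
        if offset in seen:  # guard against corrupt, cyclic lists
            raise ValueError(f"VDR list loops back to offset {offset}")
        seen.add(offset)
        (_, rtype, nxt, dtype, max_rec, _, _, flags, srecords,
         _, _, _, num_elems, num, _, blocking, name) = struct.unpack_from(VDR_FORMAT, buf, offset)
        if rtype not in (3, 8):  # 3 = rVDR, 8 = zVDR
            raise ValueError(f"expected a VDR at {offset}, got type {rtype}")
        out.append({
            "name": name.split(b"\x00", 1)[0].decode("ascii", "replace"),
            "num": num,
            "data_type": CDF_TYPES.get(dtype, f"unknown({dtype})"),
            "num_elems": num_elems,
            "max_rec": max_rec,
            "record_varies": bool(flags & 0x1),  # VDR Flags bit 0
            "sparse_records": srecords,
            "blocking_factor": blocking,
        })
        offset = nxt
    return out
```

Called with the zVDRhead value read from the GDR, it returns one dictionary per zVariable in list order.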
Attributes
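- Number of Attributes (NumAttr): 4-byte signed integer in the GDR counting all attributes, both global-scope (describing the file) and variable-scope (describing individual variables).
- Attribute Names: Stored in each ADR's Name field as a NUL-padded ASCII string (256 bytes in v3.x, 64 in v2.x).
- Per-Attribute Properties (in ADRs):
  - Scope: 4-byte signed integer (1=global, 2=variable; 3 and 4 mark assumed global and assumed variable scope).
  - Number of Entries: NgrEntries (gEntries or rEntries) and NzEntries (zEntries), with MAXgrEntry and MAXzEntry giving the highest entry numbers.
  - Data Type: Stored per entry in each Attribute Entry Descriptor Record (AEDR), using the same codes as variables, so entries of one attribute can differ in type.
  - Entry Lists: AgrEDRhead and AzEDRhead point to linked lists of AEDRs (AgrEDRs and AzEDRs) that hold the entry values.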
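Attributes are reached the same way, through the ADRnext chain that starts at the GDR's ADRhead. This sketch assumes the v3 ADR layout and scope codes 1-4; it reports entry counts but does not follow the AEDR lists. `list_attributes` is a hypothetical helper.

```python
import struct

# Assumed fixed layout of a CDF v3 ADR (big-endian):
# RecordSize RecordType ADRnext AgrEDRhead Scope Num NgrEntries MAXgrEntry
# rfuA AzEDRhead NzEntries MAXzEntry rfuE Name[256]
ADR_FORMAT = ">qiqqiiiiiqiii256s"
SCOPES = {1: "global", 2: "variable", 3: "global (assumed)", 4: "variable (assumed)"}


def list_attributes(buf: bytes, head: int) -> list:
    """Follow the ADRnext chain from the GDR's ADRhead (0 ends the list)."""
    out, seen = [], set()
    offset = head
    while offset != 0:
        if offset in seen:  # guard against corrupt, cyclic lists
            raise ValueError(f"ADR list loops back to offset {offset}")
        seen.add(offset)
        (_, rtype, nxt, _, scope, num, ngr, _, _, _, nz, _, _,
         name) = struct.unpack_from(ADR_FORMAT, buf, offset)
        if rtype != 4:  # 4 = ADR
            raise ValueError(f"expected an ADR at {offset}, got type {rtype}")
        out.append({
            "name": name.split(b"\x00", 1)[0].decode("ascii", "replace"),
            "num": num,
            "scope": SCOPES.get(scope, f"unknown({scope})"),
            "gr_entries": ngr,  # gEntries (global) or rEntries (variable)
            "z_entries": nz,
        })
        offset = nxt
    return out
```

Reading the entry values themselves would mean walking AgrEDRhead and AzEDRhead the same way and decoding each AEDR with its own data type.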
Data Storage and File System Aspects
- Single-File vs. Multi-File: Flagged in CDR (single: all in .cdf; multi: metadata in .cdf, data in .v# or .z# files).
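- Variable Data Blocks: Each VDR points (VXRhead, VXRtail) to Variable Index Records (VXRs) that map ranges of record numbers to Variable Values Records (VVRs), or to CVVRs for compressed variables.
- Record Count: MaxRec + 1 for each variable (rMaxRec + 1 across rVariables); incremental writes can split a variable's data across many VVRs (fragmentation).
- Sparse Records: Selected per variable through the VDR's SRecords field; unwritten records are returned as the pad value or as the previous written record.
- Virtual Variables: Not part of the file format; an ISTP/CDAWeb convention in which a variable's values are computed by software named in its attributes rather than stored.
- Deleted Variables and Attributes: Their records are unlinked from the descriptor lists, and the freed space is tracked as Unused Internal Records (UIRs).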
- Copyright Notice: Null-terminated ASCII string in CDR (256 bytes in v2.5+).
- Self-Describing Nature: All metadata embedded; no external schema required.
- Archivability: Backward compatible; exportable to CDFML (XML).
These properties ensure the format's portability: all internal-record fields are big-endian, while data values follow the encoding declared in the CDR. Data types are strictly defined for interoperability.
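Sparse-record behaviour is easiest to see from the reader's side. The sketch below models a variable's written records as a plain dictionary (a hypothetical stand-in for the VXR/VVR index) and returns what a read of record `rec` yields under each SRecords mode. It assumes that a previous-sparse read with no earlier written record falls back to the pad value; `read_record` is not a CDF library function.

```python
# SRecords modes: 0 = no sparse records, 1 = pad sparse, 2 = previous sparse.
NO_SPARSE, PAD_SPARSE, PREV_SPARSE = 0, 1, 2


def read_record(written: dict, rec: int, srecords: int, pad_value):
    """Value seen for record `rec`, given the physically written records."""
    if rec in written:
        return written[rec]
    if srecords == PREV_SPARSE:
        earlier = [r for r in written if r < rec]
        if earlier:
            return written[max(earlier)]  # repeat the last written record
    # Pad-sparse (and, by assumption, everything else) yields the pad value
    return pad_value


written = {0: 1.0, 5: 2.0}  # records 1-4 were never written
print(read_record(written, 3, PAD_SPARSE, -1e31), read_record(written, 3, PREV_SPARSE, -1e31))
```

Previous-sparse mode suits slowly changing housekeeping values, while pad-sparse mode makes gaps explicit to downstream software.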
- Two direct download links for .CDF files
- https://cdaweb.gsfc.nasa.gov/istp_public/data/omni/hro_5min/1999/omni_hro_5min_19990901_v01.cdf (OMNI 5-minute high-resolution data from CDAWeb, NASA).
- https://cdaweb.gsfc.nasa.gov/istp_public/data/omni/hro_5min/2000/omni_hro_5min_20000101_v01.cdf (Similar OMNI sample for year 2000).
- Ghost blog embedded HTML JavaScript for drag-and-drop .CDF file
This is a self-contained HTML snippet with JavaScript, embeddable in a Ghost blog post (e.g., via HTML card). It creates a drag-and-drop zone. Upon dropping a .CDF file, it reads the binary data using FileReader, parses basic intrinsic properties from the header (magic numbers, CDR, partial GDR like versions, flags, counts), and dumps them to a results div. Full parsing (e.g., names, offsets) requires following offsets for strings, which is simplified here to core properties for brevity. Writing is not implemented in JS (browser security limits file writing; assumes read-only dump).
- Python class for .CDF
This class opens a .CDF file, reads intrinsic properties using struct (focusing on header/CDR/GDR basics; full variable/attribute parsing requires following record offsets recursively and is omitted), prints them to the console, and supports writing (for the demo it copies the file unchanged; a full writer would rebuild the file from the properties). It assumes an uncompressed CDF v3 file.
```python
import struct
import sys


class CDFReader:
    """Reads header-level properties from an uncompressed CDF v3 file."""

    def __init__(self, filename):
        with open(filename, 'rb') as f:
            self.data = f.read()
        self.offset = 0

    def read_uint32(self):
        val = struct.unpack_from('>I', self.data, self.offset)[0]
        self.offset += 4
        return val

    def read_int32(self):
        val = struct.unpack_from('>i', self.data, self.offset)[0]
        self.offset += 4
        return val

    def read_uint64(self):
        # CDF v3 offsets are 8-byte big-endian integers
        val = struct.unpack_from('>Q', self.data, self.offset)[0]
        self.offset += 8
        return val

    def read_string(self, length):
        raw = self.data[self.offset:self.offset + length]
        self.offset += length
        # Keep only the text before the first NUL terminator
        return raw.split(b'\x00', 1)[0].decode('ascii', errors='ignore')

    def decode_properties(self):
        props = {}
        # Magic numbers (bytes 0-7)
        magic1 = self.read_uint32()
        magic2 = self.read_uint32()
        props['magic1'] = hex(magic1)
        props['magic2'] = hex(magic2)
        if magic1 != 0xCDF30001 or magic2 == 0xCCCC0001:
            raise ValueError('only uncompressed CDF v3 files are supported')
        # CDR starts at byte 8
        self.offset = 8
        self.offset += 8  # Skip RecordSize
        self.read_int32()  # RecordType (1 = CDR)
        props['gdr_offset'] = self.read_uint64()
        props['version'] = self.read_int32()
        props['release'] = self.read_int32()
        props['encoding'] = self.read_int32()
        props['flags'] = bin(self.read_int32())
        self.offset += 8  # rfuA, rfuB
        props['increment'] = self.read_int32()
        props['identifier'] = self.read_int32()
        self.read_int32()  # rfuE
        props['copyright'] = self.read_string(256)
        # GDR: skip RecordSize(8), RecordType(4), rVDRhead, zVDRhead,
        # ADRhead and eof (8 bytes each) = 44 bytes to reach NrVars
        self.offset = props['gdr_offset'] + 44
        props['nr_vars'] = self.read_int32()
        props['num_attr'] = self.read_int32()
        self.offset += 4  # rMaxRec
        props['r_num_dims'] = self.read_int32()
        props['nz_vars'] = self.read_int32()
        # Print
        for k, v in props.items():
            print(f"{k}: {v}")
        return props

    def write(self, output_filename):
        # Demo only: writes an unmodified copy of the input bytes
        with open(output_filename, 'wb') as f:
            f.write(self.data)
        print(f"Written to {output_filename} (copy)")


# Usage
if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python cdf_reader.py <file.cdf>")
        sys.exit(1)
    reader = CDFReader(sys.argv[1])
    props = reader.decode_properties()
    reader.write("output.cdf")
```
- Java class for .CDF
This Java class uses ByteBuffer for big-endian binary parsing. It reads the same basic header properties, prints them to the console, and writes a copy.
```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class CDFReader {
    private final ByteBuffer buffer;
    private int offset = 0;

    public CDFReader(String filename) throws IOException {
        // Load the whole file; wrap() keeps a backing array for write()
        byte[] bytes = Files.readAllBytes(Paths.get(filename));
        this.buffer = ByteBuffer.wrap(bytes).order(ByteOrder.BIG_ENDIAN);
    }

    private int readInt32() {
        int val = buffer.getInt(offset);
        offset += 4;
        return val;
    }

    private long readInt64() {
        // CDF v3 offsets are 8-byte big-endian integers
        long val = buffer.getLong(offset);
        offset += 8;
        return val;
    }

    private String readString(int length) {
        byte[] bytes = new byte[length];
        System.arraycopy(buffer.array(), offset, bytes, 0, length);
        offset += length;
        String s = new String(bytes, StandardCharsets.US_ASCII);
        // Keep only the text before the first NUL terminator
        int nul = s.indexOf('\0');
        return nul >= 0 ? s.substring(0, nul) : s;
    }

    public void decodeProperties() {
        // Magic numbers (bytes 0-7)
        System.out.println("magic1: 0x" + Integer.toHexString(buffer.getInt(0)));
        System.out.println("magic2: 0x" + Integer.toHexString(buffer.getInt(4)));
        offset = 8;   // CDR starts after the magic numbers
        offset += 8;  // Skip RecordSize
        readInt32();  // RecordType (1 = CDR)
        long gdrOffset = readInt64();
        int version = readInt32();
        int release = readInt32();
        int encoding = readInt32();
        int flags = readInt32();
        offset += 8;  // rfuA, rfuB
        int increment = readInt32();
        int identifier = readInt32();
        readInt32();  // rfuE
        String copyright = readString(256);
        System.out.println("version: " + version);
        System.out.println("release: " + release);
        System.out.println("encoding: " + encoding);
        System.out.println("flags: " + Integer.toBinaryString(flags));
        System.out.println("increment: " + increment);
        System.out.println("identifier: " + identifier);
        System.out.println("copyright: " + copyright);
        // GDR: skip RecordSize, RecordType, rVDRhead, zVDRhead, ADRhead, eof (44 bytes)
        offset = (int) gdrOffset + 44;
        int nrVars = readInt32();
        int numAttr = readInt32();
        offset += 4;  // rMaxRec
        int rNumDims = readInt32();
        int nzVars = readInt32();
        System.out.println("nr_vars: " + nrVars);
        System.out.println("num_attr: " + numAttr);
        System.out.println("r_num_dims: " + rNumDims);
        System.out.println("nz_vars: " + nzVars);
    }

    public void write(String outputFilename) throws IOException {
        Path outPath = Paths.get(outputFilename);
        Files.write(outPath, buffer.array());
        System.out.println("Written to " + outputFilename + " (copy)");
    }

    public static void main(String[] args) {
        if (args.length < 1) {
            System.out.println("Usage: java CDFReader <file.cdf>");
            return;
        }
        try {
            CDFReader reader = new CDFReader(args[0]);
            reader.decodeProperties();
            reader.write("output.cdf");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
```
- JavaScript class for .CDF
This is a Node.js-compatible class (use with fs for file I/O). It parses similarly, prints to console (console.log), and writes a copy using fs.
```javascript
const fs = require('fs');

class CDFReader {
  constructor(filename) {
    this.data = fs.readFileSync(filename);
    this.offset = 0;
  }

  readUint32() {
    const val = this.data.readUInt32BE(this.offset);
    this.offset += 4;
    return val;
  }

  readInt32() {
    const val = this.data.readInt32BE(this.offset);
    this.offset += 4;
    return val;
  }

  readUint64() {
    // CDF v3 offsets are 8-byte big-endian integers (returned as a BigInt)
    const val = this.data.readBigUInt64BE(this.offset);
    this.offset += 8;
    return val;
  }

  readString(length) {
    const raw = this.data.toString('ascii', this.offset, this.offset + length);
    this.offset += length;
    // Keep only the text before the first NUL terminator
    const nul = raw.indexOf('\0');
    return nul >= 0 ? raw.slice(0, nul) : raw;
  }

  decodeProperties() {
    const props = {};
    // Magic numbers (bytes 0-7)
    props.magic1 = '0x' + this.readUint32().toString(16).toUpperCase().padStart(8, '0');
    props.magic2 = '0x' + this.readUint32().toString(16).toUpperCase().padStart(8, '0');
    this.offset = 8; // CDR starts after the magic numbers
    this.offset += 8; // Skip RecordSize
    this.readInt32(); // RecordType (1 = CDR)
    props.gdrOffset = this.readUint64().toString();
    props.version = this.readInt32();
    props.release = this.readInt32();
    props.encoding = this.readInt32();
    // >>> 0 treats the flags as unsigned so toString(2) never yields a '-'
    props.flags = (this.readInt32() >>> 0).toString(2).padStart(32, '0');
    this.offset += 8; // rfuA, rfuB
    props.increment = this.readInt32();
    props.identifier = this.readInt32();
    this.readInt32(); // rfuE
    props.copyright = this.readString(256);
    // GDR: skip RecordSize, RecordType, rVDRhead, zVDRhead, ADRhead, eof (44 bytes)
    this.offset = Number(props.gdrOffset) + 44;
    props.nrVars = this.readInt32();
    props.numAttr = this.readInt32();
    this.offset += 4; // rMaxRec
    props.rNumDims = this.readInt32();
    props.nzVars = this.readInt32();
    // Print
    Object.entries(props).forEach(([k, v]) => console.log(`${k}: ${v}`));
    return props;
  }

  write(outputFilename) {
    fs.writeFileSync(outputFilename, this.data);
    console.log(`Written to ${outputFilename} (copy)`);
  }
}

// Usage
if (require.main === module) {
  if (process.argv.length < 3) {
    console.log('Usage: node cdf_reader.js <file.cdf>');
    process.exit(1);
  }
  const reader = new CDFReader(process.argv[2]);
  reader.decodeProperties();
  reader.write('output.cdf');
}
```
- C class (struct) for .CDF
This C implementation uses fopen/fread for reading and decodes the big-endian fields with explicit byte shifts, so it does not depend on the host's byte order. It parses the basic properties, prints them to stdout, and writes a copy using fwrite. Compile with gcc cdf_reader.c -o cdf_reader.
```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    uint8_t *data;
    size_t size;
    size_t offset;
} CDFReader;

/* Big-endian reads; the casts avoid signed overflow when shifting bytes. */
uint32_t read_uint32(CDFReader *r) {
    const uint8_t *p = &r->data[r->offset];
    uint32_t val = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                   ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    r->offset += 4;
    return val;
}

int32_t read_int32(CDFReader *r) {
    /* Reinterpret the 32-bit pattern as a two's-complement signed value */
    return (int32_t)read_uint32(r);
}

uint64_t read_uint64(CDFReader *r) {
    uint64_t high = read_uint32(r);
    uint64_t low = read_uint32(r);
    return (high << 32) | low;
}

char *read_string(CDFReader *r, size_t length) {
    char *s = malloc(length + 1);
    if (!s) { perror("malloc"); exit(1); }
    memcpy(s, &r->data[r->offset], length);
    s[length] = '\0'; /* printing stops at the first embedded NUL anyway */
    r->offset += length;
    return s;
}

int decode_properties(CDFReader *r) {
    if (r->size < 320) { /* 8 bytes of magic + 312-byte v3 CDR */
        fprintf(stderr, "File too small for a CDF v3 header\n");
        return -1;
    }
    uint32_t magic1 = read_uint32(r);
    uint32_t magic2 = read_uint32(r);
    printf("magic1: 0x%08" PRIX32 "\n", magic1);
    printf("magic2: 0x%08" PRIX32 "\n", magic2);
    r->offset = 8;  /* CDR starts after the magic numbers */
    r->offset += 8; /* Skip RecordSize */
    read_int32(r);  /* RecordType (1 = CDR) */
    uint64_t gdr_offset = read_uint64(r);
    int32_t version = read_int32(r);
    int32_t release = read_int32(r);
    int32_t encoding = read_int32(r);
    int32_t flags = read_int32(r);
    r->offset += 8; /* rfuA, rfuB */
    int32_t increment = read_int32(r);
    int32_t identifier = read_int32(r);
    read_int32(r);  /* rfuE */
    char *copyright = read_string(r, 256);
    printf("version: %" PRId32 "\n", version);
    printf("release: %" PRId32 "\n", release);
    printf("encoding: %" PRId32 "\n", encoding);
    printf("flags: 0x%08" PRIX32 "\n", (uint32_t)flags);
    printf("increment: %" PRId32 "\n", increment);
    printf("identifier: %" PRId32 "\n", identifier);
    printf("copyright: %s\n", copyright);
    free(copyright);
    /* GDR: skip RecordSize, RecordType, rVDRhead, zVDRhead, ADRhead, eof (44 bytes) */
    if (gdr_offset > r->size || r->size - gdr_offset < 64) {
        fprintf(stderr, "GDR offset out of range\n");
        return -1;
    }
    r->offset = (size_t)gdr_offset + 44;
    int32_t nr_vars = read_int32(r);
    int32_t num_attr = read_int32(r);
    r->offset += 4; /* rMaxRec */
    int32_t r_num_dims = read_int32(r);
    int32_t nz_vars = read_int32(r);
    printf("nr_vars: %" PRId32 "\n", nr_vars);
    printf("num_attr: %" PRId32 "\n", num_attr);
    printf("r_num_dims: %" PRId32 "\n", r_num_dims);
    printf("nz_vars: %" PRId32 "\n", nz_vars);
    return 0;
}

/* Demo only: writes an unmodified copy of the bytes already in memory */
int write_file(const uint8_t *data, size_t size, const char *output) {
    FILE *out = fopen(output, "wb");
    if (!out) { perror("Error opening output file"); return -1; }
    size_t written = fwrite(data, 1, size, out);
    fclose(out);
    if (written != size) { fprintf(stderr, "Short write\n"); return -1; }
    printf("Written to %s (copy)\n", output);
    return 0;
}

int main(int argc, char **argv) {
    if (argc < 2) {
        fprintf(stderr, "Usage: ./cdf_reader <file.cdf>\n");
        return 1;
    }
    FILE *f = fopen(argv[1], "rb");
    if (!f) { perror("Error opening file"); return 1; }
    fseek(f, 0, SEEK_END);
    long len = ftell(f);
    fseek(f, 0, SEEK_SET);
    if (len < 0) { perror("ftell"); fclose(f); return 1; }
    size_t size = (size_t)len;
    uint8_t *data = malloc(size ? size : 1);
    if (!data || fread(data, 1, size, f) != size) {
        fprintf(stderr, "Error reading file\n");
        fclose(f);
        free(data);
        return 1;
    }
    fclose(f);
    CDFReader r = {data, size, 0};
    int rc = decode_properties(&r);
    if (rc == 0) rc = write_file(data, size, "output.cdf");
    free(data);
    return rc == 0 ? 0 : 1;
}
```