Task 636: .SAV File Format
1. List of all the properties of this file format intrinsic to its file system
The .SAV file format refers to the SPSS System Data File format, a proprietary binary format used for storing statistical data. Based on reverse-engineered specifications from sources like the GNU PSPP Developers Guide and Library of Congress documentation, the intrinsic properties (metadata fields and structural elements inherent to the file's layout and content) are as follows. These are extracted from the file's header, dictionary, and extension records, excluding the raw data section. I've organized them by section for clarity:
Header Properties (fixed 176-byte record):
- Magic string (4 bytes): "$FL2", or "$FL3" for ZLIB-compressed files.
- Product identification string (60 bytes): Starts with "@(#) SPSS DATA FILE", includes OS and SPSS version info.
- Layout code (4-byte int32): Normally 2 (rarely 3); readers use it to detect the file's byte order.
- Nominal case size (4-byte int32): Number of 8-byte data elements per case (a string wider than 8 bytes takes several), or -1 if unknown.
- Compression code (4-byte int32): 0 (uncompressed), 1 (bytecode compression), 2 (ZLIB compression).
- Weight index (4-byte int32): 1-based index of the weighting variable (0 if none).
- Number of cases (4-byte int32): Number of observations/rows (-1 if unknown).
- Compression bias (8-byte double): Usually 100.0, used in bytecode compression.
- Creation date (9 bytes): String in "dd mmm yy" format.
- Creation time (8 bytes): String in "hh:mm:ss" format.
- File label (64 bytes): User-defined label, padded with spaces.
- Padding (3 bytes): To align to 176 bytes.
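As a rough illustration, the Python sketch below decodes this 176-byte record with the standard `struct` module. It assumes, as described under "Other Intrinsic Properties", that the byte order can be inferred from the layout code; `parse_sav_header` and its dictionary keys are hypothetical names, not part of any library.

```python
import struct


def parse_sav_header(raw):
    """Decode the fixed 176-byte header record of a .SAV file into a dict."""
    if len(raw) < 176:
        raise ValueError("a .SAV header is 176 bytes long")
    magic = raw[0:4].decode("ascii", errors="replace")
    if magic not in ("$FL2", "$FL3"):
        raise ValueError(f"not an SPSS system file (magic {magic!r})")
    # The layout code is 2 or 3; if it does not read that way as
    # little-endian, assume the file is big-endian.
    endian = "<" if struct.unpack_from("<i", raw, 64)[0] in (2, 3) else ">"
    layout, case_size, compression, weight_idx, n_cases = struct.unpack_from(
        endian + "5i", raw, 64)
    (bias,) = struct.unpack_from(endian + "d", raw, 84)

    def text(start, end):
        # Fixed-width text fields are space-padded; Latin-1 never fails to decode.
        return raw[start:end].decode("latin-1").rstrip()

    return {
        "magic": magic,
        "byte_order": "little" if endian == "<" else "big",
        "product": text(4, 64),
        "layout": layout,
        "nominal_case_size": case_size,
        "compression": compression,
        "weight_index": weight_idx,
        "n_cases": n_cases,
        "bias": bias,
        "date": text(92, 101),
        "time": text(101, 109),
        "label": text(109, 173),  # bytes 173-175 are padding
    }
```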
Dictionary Properties (sequence of tagged records, ended by tag 999):
- Variable descriptors (tag 2; one per variable, plus continuation records for long strings):
- Type (4-byte int32): 0 for numeric, 1-255 for string width in bytes, -1 for a continuation record of a string wider than 8 bytes.
- Has label (4-byte int32): 1 if variable has a label, 0 otherwise.
- Number of missing values (4-byte int32): 0-3 for discrete missing values, -2 for a range, -3 for a range plus one discrete value.
- Print format (4-byte int32): Encoded as (type << 16) | (width << 8) | decimals; the most significant byte is zero.
- Write format (4-byte int32): Same encoding as the print format.
- Name (8 bytes): Variable name, padded with spaces.
- Label length (4-byte int32, if has label=1): Length of label.
- Label (variable length, padded to multiple of 4 bytes): Descriptive label.
- Missing values (8 bytes each, if the number is nonzero): Doubles for numeric variables, 8-byte strings for string variables; a range is stored as its low and high values.
- Value labels (optional, tags 3 and 4):
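- Label set (tag 3): Count (int32), followed by that many entries, each an 8-byte value (a double, or a space-padded string) and a 1-byte label length plus the label, with the length byte and label together padded to a multiple of 8 bytes.
- Variable assignment (tag 4, immediately after its tag 3 record): Count (int32), then the 1-based dictionary indices (int32) of the variables that use this label set.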
- Documents (optional, tag 6):
- Line count (int32), followed by list of 80-byte strings (documentation lines).
- Extension records (optional, tag 7):
- Subtype (int32): Identifies extension type.
- Size (int32): Bytes per data element.
- Count (int32): Number of elements.
- Data (variable): Depends on subtype, e.g.:
- Subtype 3 (machine integer info): 8 int32 fields (version major/minor/revision, machine code, floating-point representation code, compression code, endianness, character code).
- Subtype 4 (machine floating-point info): 3 doubles: system-missing value, highest value, lowest value.
- Subtype 11 (variable display parameters): Measurement level, display width, and alignment per variable.
- Subtype 13 (long variable names): Tab-separated SHORT=LongName pairs.
- Subtype 14 (very long string info): Map of variable names to their full widths, for strings longer than 255 bytes.
- Subtype 17 (data file attributes): Custom attributes.
- Subtype 18 (variable attributes): Per-variable custom attributes.
- Subtype 20 (character encoding): Encoding name (e.g., "UTF-8").
- Subtype 21 (long string value labels): Value labels for string variables wider than 8 bytes.
- Other subtypes, e.g., 7 and 19 for multiple response sets.
- Dictionary terminator (tag 999): Followed by one int32 filler (0); marks the end of the dictionary.
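The less obvious encodings in the dictionary can be sketched in a few Python helpers, following the layouts listed above: packing and unpacking a print/write format, walking the entries of a tag 3 value-label record, and splitting the subtype 13 long-name string. The function names are hypothetical, and `parse_long_names` assumes UTF-8 text by default; real files use the encoding given by subtype 20 or subtype 3.

```python
import struct


def encode_format(fmt_type, width, decimals):
    """Pack a print/write format into its int32 form."""
    return (fmt_type << 16) | (width << 8) | decimals


def decode_format(fmt):
    """Unpack a format int32 into (type, width, decimals); type 5 is F, 1 is A."""
    return (fmt >> 16) & 0xFF, (fmt >> 8) & 0xFF, fmt & 0xFF


def read_value_labels(buf, offset, endian="<"):
    """Parse a tag 3 record starting just after its int32 record type.

    Returns (labels, next_offset), where labels holds (raw 8-byte value, text).
    """
    (count,) = struct.unpack_from(endian + "i", buf, offset)
    offset += 4
    labels = []
    for _ in range(count):
        value = buf[offset:offset + 8]  # a double, or space-padded text
        label_len = buf[offset + 8]  # one-byte length
        text = buf[offset + 9:offset + 9 + label_len].decode("latin-1")
        labels.append((value, text))
        # The length byte and the label are padded together to a multiple of 8.
        offset += 8 + (1 + label_len + 7) // 8 * 8
    return labels, offset


def parse_long_names(payload, encoding="utf-8"):
    """Parse a subtype 13 payload of tab-separated SHORT=LongName pairs."""
    entries = payload.decode(encoding).split("\t")
    return dict(entry.split("=", 1) for entry in entries if "=" in entry)
```

Returning the raw 8 value bytes keeps the helper neutral: whether they hold a double or string data depends on the variables named in the following tag 4 record.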
Other Intrinsic Properties:
- Character encoding: From the subtype 20 record if present, otherwise the character code in subtype 3 (e.g., ASCII, UTF-8, EBCDIC).
- Endianness: Detected from layout code or extension (big or little).
- Numeric representation: From extension (IEEE 754, IBM 370, etc.).
- Data compression details: Bias and method from header.
- Encryption (optional, SPSS 21+): Wrapper with password protection (not always present).
These properties define the file's structure and metadata, independent of the actual data values.
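The compression bias only matters for bytecode compression (compression code 1). A minimal Python sketch of how the data section is expanded, assuming the opcode meanings from the PSPP Developers Guide (0 padding, 1-251 the value opcode minus bias, 252 end of data, 253 raw 8-byte value follows, 254 eight spaces, 255 system-missing); `decompress_bytecode` and `SYSMIS` are hypothetical names:

```python
import sys

# The usual system-missing value (-DBL_MAX); subtype 4 records the file's actual one.
SYSMIS = -sys.float_info.max


def decompress_bytecode(stream, bias=100.0):
    """Expand bytecode-compressed case data into a flat list of 8-byte elements.

    Opcodes come in blocks of eight; the raw 8-byte values referenced by
    opcode 253 follow their block, in order.
    """
    out = []
    pos = 0
    while pos + 8 <= len(stream):
        block = stream[pos:pos + 8]
        pos += 8
        for op in block:
            if op == 0:  # no-op padding
                continue
            if op == 252:  # end of data
                return out
            if op == 253:  # uncompressed value stored after the block
                out.append(stream[pos:pos + 8])
                pos += 8
            elif op == 254:  # eight spaces of string data
                out.append(b" " * 8)
            elif op == 255:  # system-missing numeric value
                out.append(SYSMIS)
            else:  # 1..251 encode the number op - bias
                out.append(op - bias)
    return out
```

Opcode 253 values are kept as raw bytes on purpose: whether those 8 bytes are a double or string data depends on the variable at that position, which only the dictionary knows.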
2. Two direct download links for files of format .SAV
- https://rpadgett.butler.edu/ps310/datasets/Teach.sav
- https://rpadgett.butler.edu/ps310/datasets/DarkLord.sav
These are sample SPSS .sav files from an educational resource.
3. Ghost blog embedded HTML JavaScript for drag-and-drop .SAV file dump
Assuming "ghost blog embedded" refers to embeddable HTML/JavaScript code suitable for a Ghost blogging platform post (which supports custom HTML), here's a self-contained HTML snippet with JavaScript. It lets you drag and drop a .SAV file, parses it client-side using DataView for binary reading, extracts the properties listed in part 1, and dumps them to the screen in a <pre> element. It handles uncompressed and bytecode-compressed files, skips the data section, and shows an alert for ZLIB-compressed files.
Drag and Drop .SAV File Parser
Embed this in a Ghost blog post using the HTML card.
4. Python class for .SAV file handling
Here's a Python class using struct for binary parsing. It opens a .SAV file, reads and decodes the properties, prints them to the console, and writes a minimal new .SAV file (one numeric variable and no cases; full write support is out of scope).
import struct
import sys
class SavFile:
    """Reads the dictionary of an SPSS system file (.sav) and writes a minimal one."""
    def __init__(self, filepath=None):
        self.header = {}
        self.variables = []
        self.value_labels = []
        self.documents = []
        self.extensions = []
        self.filepath = filepath
        if filepath:
            self.read()
    def read(self):
        with open(self.filepath, 'rb') as f:
            data = f.read()
        # The layout code at offset 64 is 2 or 3; use it to detect byte order.
        e = '<' if struct.unpack_from('<i', data, 64)[0] in (2, 3) else '>'
        def text(start, end):
            return data[start:end].decode('latin-1').rstrip()
        # Header (fixed 176 bytes)
        self.header['magic'] = text(0, 4)
        self.header['product'] = text(4, 64)
        (self.header['layout'], self.header['nominal_case_size'],
         self.header['compression'], self.header['weight_idx'],
         self.header['num_cases']) = struct.unpack_from(e + '5i', data, 64)
        # Compression 2 (ZLIB) only affects the case data, which is not read here.
        self.header['bias'] = struct.unpack_from(e + 'd', data, 84)[0]
        self.header['date'] = text(92, 101)
        self.header['time'] = text(101, 109)
        self.header['label'] = text(109, 173)
        offset = 176  # the header ends with 3 padding bytes
        def i32():
            nonlocal offset
            value = struct.unpack_from(e + 'i', data, offset)[0]
            offset += 4
            return value
        def fmt():
            f = i32()  # (type << 16) | (width << 8) | decimals
            return ((f >> 16) & 0xFF, (f >> 8) & 0xFF, f & 0xFF)
        # Dictionary: tagged records up to the 999 terminator
        while True:
            tag = i32()
            if tag == 2:  # variable record
                var = {'type': i32(), 'has_label': i32(), 'n_missing': i32(),
                       'print_fmt': fmt(), 'write_fmt': fmt()}
                var['name'] = text(offset, offset + 8)
                offset += 8
                if var['has_label']:
                    label_len = i32()
                    var['label'] = text(offset, offset + label_len)
                    offset += (label_len + 3) // 4 * 4  # padded to 4 bytes
                # -2 = range (low, high); -3 = range plus one discrete value
                var['missing'] = []
                for _ in range(abs(var['n_missing'])):
                    raw = data[offset:offset + 8]
                    offset += 8
                    if var['type'] == 0:
                        var['missing'].append(struct.unpack(e + 'd', raw)[0])
                    else:
                        var['missing'].append(raw.decode('latin-1').rstrip())
                if var['type'] != -1:  # -1 marks a long-string continuation record
                    self.variables.append(var)
            elif tag == 3:  # value labels: 8-byte value, 1-byte length, label
                labels = []
                for _ in range(i32()):
                    # Read as a double; for string variables these bytes are text.
                    value = struct.unpack_from(e + 'd', data, offset)[0]
                    label_len = data[offset + 8]
                    labels.append((value, text(offset + 9, offset + 9 + label_len)))
                    offset += 8 + (label_len + 1 + 7) // 8 * 8  # length byte + label padded to 8
                self.value_labels.append({'labels': labels})
            elif tag == 4:  # 1-based indices of the variables using the preceding labels
                count = i32()
                indices = [i32() for _ in range(count)]
                if self.value_labels:
                    self.value_labels[-1]['vars'] = indices
            elif tag == 6:  # documents: lines of 80 bytes
                for _ in range(i32()):
                    self.documents.append(text(offset, offset + 80))
                    offset += 80
            elif tag == 7:  # extension record; payload skipped
                ext = {'subtype': i32(), 'size': i32(), 'count': i32()}
                offset += ext['size'] * ext['count']
                self.extensions.append(ext)
            elif tag == 999:
                i32()  # filler, normally 0
                break
            else:
                raise ValueError(f"Unexpected record type {tag} at offset {offset - 4}")
    def print_properties(self):
        print("Header:")
        for k, v in self.header.items():
            print(f"- {k}: {v}")
        print("\nVariables:")
        for var in self.variables:
            print(f"- Name: {var['name']}, Type: {var['type']}, Label: {var.get('label', '')}, Missing: {var['missing']}")
            print(f"  Print Format: {var['print_fmt']}, Write Format: {var['write_fmt']}")
        print("\nValue Labels:")
        for lbl in self.value_labels:
            print("Set:")
            for val, txt in lbl.get('labels', []):
                print(f"  {val}: {txt}")
            print(f"  Applied to: {lbl.get('vars', [])}")
        print("\nDocuments:")
        for doc in self.documents:
            print(f"- {doc}")
        print("\nExtensions:")
        for ext in self.extensions:
            print(f"- Subtype: {ext['subtype']}, Size: {ext['size']}, Count: {ext['count']}")
    def write(self, output_path):
        # Minimal little-endian file: header, one numeric variable VAR1, no cases
        with open(output_path, 'wb') as f:
            f.write(b'$FL2')
            f.write(b'@(#) SPSS DATA FILE - Mock - Python'.ljust(60))  # exactly 60 bytes
            f.write(struct.pack('<i', 2))  # layout code
            f.write(struct.pack('<i', 1))  # nominal case size (one 8-byte element)
            f.write(struct.pack('<i', 0))  # compression: none
            f.write(struct.pack('<i', 0))  # no weight variable
            f.write(struct.pack('<i', 0))  # number of cases
            f.write(struct.pack('<d', 100.0))  # compression bias
            f.write(b'11 Nov 25')  # creation date, 9 bytes
            f.write(b'00:00:00')  # creation time, 8 bytes
            f.write(b'Mock File'.ljust(64))  # file label, 64 bytes
            f.write(b'\x00\x00\x00')  # padding to 176 bytes
            # Variable record (tag 2)
            f.write(struct.pack('<i', 2))  # record type
            f.write(struct.pack('<i', 0))  # numeric
            f.write(struct.pack('<i', 0))  # no label
            f.write(struct.pack('<i', 0))  # no missing values
            f.write(struct.pack('<i', (5 << 16) | (8 << 8) | 2))  # print format F8.2
            f.write(struct.pack('<i', (5 << 16) | (8 << 8) | 2))  # write format F8.2
            f.write(b'VAR1'.ljust(8))  # name, space-padded to 8 bytes
            # Dictionary termination record
            f.write(struct.pack('<i', 999))
            f.write(struct.pack('<i', 0))  # filler
# Example usage:
# sav = SavFile('example.sav')
# sav.print_properties()
# sav.write('output.sav')
5. Java class for .SAV file handling
Here's an equivalent Java class with the same functionality: read the file, print its properties to the console, and write a minimal file.
import java.io.*;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class SavFile {
    private Map<String, Object> header = new HashMap<>();
    private List<Map<String, Object>> variables = new ArrayList<>();
    private List<Map<String, Object>> valueLabels = new ArrayList<>();
    private List<String> documents = new ArrayList<>();
    private List<Map<String, Object>> extensions = new ArrayList<>();
    private String filepath;
    public SavFile(String filepath) {
        this.filepath = filepath;
        read();
    }
    public void read() {
        byte[] all;
        try (FileInputStream fis = new FileInputStream(filepath)) {
            all = fis.readAllBytes();  // Java 9+
        } catch (IOException e) {
            e.printStackTrace();
            return;
        }
        try {
            // The layout code at offset 64 is 2 or 3; use it to detect byte order.
            ByteBuffer bb = ByteBuffer.wrap(all).order(ByteOrder.LITTLE_ENDIAN);
            int layout = bb.getInt(64);
            if (layout != 2 && layout != 3) bb.order(ByteOrder.BIG_ENDIAN);
            header.put("magic", text(all, 0, 4));
            header.put("product", text(all, 4, 60));
            header.put("layout", bb.getInt(64));
            header.put("nominal_case_size", bb.getInt(68));
            header.put("compression", bb.getInt(72));  // 2 = ZLIB; only case data is compressed
            header.put("weight_idx", bb.getInt(76));
            header.put("num_cases", bb.getInt(80));
            header.put("bias", bb.getDouble(84));
            header.put("date", text(all, 92, 9));
            header.put("time", text(all, 101, 8));
            header.put("label", text(all, 109, 64));
            bb.position(176);  // the header ends with 3 padding bytes
            while (true) {
                int tag = bb.getInt();
                if (tag == 2) {  // variable record
                    Map<String, Object> var = new HashMap<>();
                    int type = bb.getInt();
                    int hasLabel = bb.getInt();
                    int nMissing = bb.getInt();
                    var.put("type", type);
                    var.put("has_label", hasLabel);
                    var.put("n_missing", nMissing);
                    for (String key : new String[]{"print_fmt", "write_fmt"}) {
                        int f = bb.getInt();  // (type << 16) | (width << 8) | decimals
                        var.put(key, new byte[]{(byte) (f >> 16), (byte) (f >> 8), (byte) f});
                    }
                    byte[] nameB = new byte[8];
                    bb.get(nameB);
                    var.put("name", text(nameB, 0, 8));
                    if (hasLabel == 1) {
                        int labelLen = bb.getInt();
                        byte[] labelB = new byte[labelLen];
                        bb.get(labelB);
                        var.put("label", text(labelB, 0, labelLen));
                        bb.position(bb.position() + (4 - labelLen % 4) % 4);  // pad to 4
                    }
                    List<Object> missing = new ArrayList<>();
                    for (int m = 0; m < Math.abs(nMissing); m++) {  // -2/-3 = range forms
                        if (type == 0) {
                            missing.add(bb.getDouble());
                        } else {
                            byte[] missB = new byte[8];
                            bb.get(missB);
                            missing.add(text(missB, 0, 8));
                        }
                    }
                    var.put("missing", missing);
                    if (type != -1) variables.add(var);  // -1 = long-string continuation
                } else if (tag == 3) {  // value labels: 8-byte value, 1-byte length, label
                    Map<String, Object> lblSet = new HashMap<>();
                    int count = bb.getInt();
                    List<Object[]> labels = new ArrayList<>();
                    for (int i = 0; i < count; i++) {
                        double val = bb.getDouble();  // raw text for string variables
                        int lblLen = bb.get() & 0xFF;
                        byte[] lblB = new byte[lblLen];
                        bb.get(lblB);
                        labels.add(new Object[]{val, text(lblB, 0, lblLen)});
                        bb.position(bb.position() + (8 - (lblLen + 1) % 8) % 8);  // pad to 8
                    }
                    lblSet.put("labels", labels);
                    valueLabels.add(lblSet);
                } else if (tag == 4) {  // 1-based indices of variables using the labels
                    int count = bb.getInt();
                    List<Integer> vars = new ArrayList<>();
                    for (int i = 0; i < count; i++) vars.add(bb.getInt());
                    if (!valueLabels.isEmpty()) valueLabels.get(valueLabels.size() - 1).put("vars", vars);
                } else if (tag == 6) {  // documents: lines of 80 bytes
                    int count = bb.getInt();
                    for (int i = 0; i < count; i++) {
                        byte[] lineB = new byte[80];
                        bb.get(lineB);
                        documents.add(text(lineB, 0, 80));
                    }
                } else if (tag == 7) {  // extension record; payload skipped
                    Map<String, Object> ext = new HashMap<>();
                    int subtype = bb.getInt();
                    int size = bb.getInt();
                    int count = bb.getInt();
                    ext.put("subtype", subtype);
                    ext.put("size", size);
                    ext.put("count", count);
                    bb.position(bb.position() + size * count);
                    extensions.add(ext);
                } else if (tag == 999) {
                    bb.getInt();  // filler, normally 0
                    break;
                } else {
                    throw new IllegalArgumentException("Unexpected record type " + tag);
                }
            }
        } catch (RuntimeException e) {
            System.err.println("Malformed or truncated file: " + e);
        }
    }
    private static String text(byte[] b, int off, int len) {
        return new String(b, off, len, java.nio.charset.StandardCharsets.ISO_8859_1).trim();
    }
    @SuppressWarnings("unchecked")
    public void printProperties() {
        System.out.println("Header:");
        header.forEach((k, v) -> System.out.println("- " + k + ": " + v));
        System.out.println("\nVariables:");
        for (Map<String, Object> var : variables) {
            System.out.println("- Name: " + var.get("name") + ", Type: " + var.get("type") + ", Label: " + var.getOrDefault("label", ""));
            System.out.println("  Missing: " + var.get("missing"));
            System.out.print("  Print Format: ");
            for (byte b : (byte[]) var.get("print_fmt")) System.out.print((b & 0xFF) + " ");  // unsigned
            System.out.println();
            System.out.print("  Write Format: ");
            for (byte b : (byte[]) var.get("write_fmt")) System.out.print((b & 0xFF) + " ");
            System.out.println();
        }
        System.out.println("\nValue Labels:");
        for (Map<String, Object> lbl : valueLabels) {
            System.out.println("Set:");
            for (Object[] pair : (List<Object[]>) lbl.get("labels")) {
                System.out.println("  " + pair[0] + ": " + pair[1]);
            }
            System.out.println("  Applied to: " + lbl.getOrDefault("vars", ""));
        }
        System.out.println("\nDocuments:");
        for (String doc : documents) {
            System.out.println("- " + doc);
        }
        System.out.println("\nExtensions:");
        for (Map<String, Object> ext : extensions) {
            System.out.println("- Subtype: " + ext.get("subtype") + ", Size: " + ext.get("size") + ", Count: " + ext.get("count"));
        }
    }
    public void write(String outputPath) {
        // Minimal little-endian file: header, one numeric variable VAR1, no cases.
        ByteBuffer bb = ByteBuffer.allocate(176 + 32 + 8).order(ByteOrder.LITTLE_ENDIAN);
        bb.put(padded("$FL2", 4));
        bb.put(padded("@(#) SPSS DATA FILE - Mock - Java", 60));
        bb.putInt(2);         // layout code
        bb.putInt(1);         // nominal case size
        bb.putInt(0);         // compression: none
        bb.putInt(0);         // no weight variable
        bb.putInt(0);         // number of cases
        bb.putDouble(100.0);  // compression bias
        bb.put(padded("11 Nov 25", 9));
        bb.put(padded("00:00:00", 8));
        bb.put(padded("Mock File", 64));
        bb.put(new byte[3]);  // padding to 176 bytes
        // Variable record
        bb.putInt(2);  // record type
        bb.putInt(0);  // numeric
        bb.putInt(0);  // no label
        bb.putInt(0);  // no missing values
        bb.putInt((5 << 16) | (8 << 8) | 2);  // print format F8.2
        bb.putInt((5 << 16) | (8 << 8) | 2);  // write format F8.2
        bb.put(padded("VAR1", 8));
        // Dictionary termination record
        bb.putInt(999);
        bb.putInt(0);  // filler
        try (FileOutputStream fos = new FileOutputStream(outputPath)) {
            fos.write(bb.array(), 0, bb.position());
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
    // Space-pads (or truncates) an ASCII string to exactly len bytes.
    private static byte[] padded(String s, int len) {
        byte[] out = new byte[len];
        java.util.Arrays.fill(out, (byte) ' ');
        byte[] src = s.getBytes(java.nio.charset.StandardCharsets.US_ASCII);
        System.arraycopy(src, 0, out, 0, Math.min(src.length, len));
        return out;
    }
    // Example usage:
    // public static void main(String[] args) {
    //     SavFile sav = new SavFile("example.sav");
    //     sav.printProperties();
    //     sav.write("output.sav");
    // }
}
6. JavaScript class for .SAV file handling
Here's a JavaScript class for Node.js (using fs and Buffer) that reads the file, prints its properties to the console, and writes a minimal file.
const fs = require('fs');
class SavFile {
  constructor(filepath = null) {
    this.header = {};
    this.variables = [];
    this.valueLabels = [];
    this.documents = [];
    this.extensions = [];
    this.filepath = filepath;
    if (filepath) this.read();
  }
  read() {
    const data = fs.readFileSync(this.filepath);
    // The layout code at offset 64 is 2 or 3; use it to pick the byte order.
    const le = [2, 3].includes(data.readInt32LE(64));
    const i32 = (at) => (le ? data.readInt32LE(at) : data.readInt32BE(at));
    const f64 = (at) => (le ? data.readDoubleLE(at) : data.readDoubleBE(at));
    const text = (start, len) => data.toString('latin1', start, start + len).trim();
    this.header.magic = text(0, 4);
    this.header.product = text(4, 60);
    this.header.layout = i32(64);
    this.header.nominalCaseSize = i32(68);
    this.header.compression = i32(72); // 2 = ZLIB; only case data is compressed
    this.header.weightIdx = i32(76);
    this.header.numCases = i32(80);
    this.header.bias = f64(84);
    this.header.date = text(92, 9);
    this.header.time = text(101, 8);
    this.header.label = text(109, 64);
    let offset = 176; // the header ends with 3 padding bytes
    const next = () => { const v = i32(offset); offset += 4; return v; };
    const fmt = (f) => [(f >> 16) & 0xff, (f >> 8) & 0xff, f & 0xff]; // type, width, decimals
    for (;;) {
      const tag = next();
      if (tag === 2) { // variable record
        const varObj = { type: next(), hasLabel: next(), nMissing: next() };
        varObj.printFmt = fmt(next());
        varObj.writeFmt = fmt(next());
        varObj.name = text(offset, 8); offset += 8;
        if (varObj.hasLabel) {
          const labelLen = next();
          varObj.label = text(offset, labelLen);
          offset += Math.ceil(labelLen / 4) * 4;
        }
        varObj.missing = [];
        for (let m = 0; m < Math.abs(varObj.nMissing); m++) { // -2/-3 = range forms
          varObj.missing.push(varObj.type === 0 ? f64(offset) : text(offset, 8));
          offset += 8;
        }
        if (varObj.type !== -1) this.variables.push(varObj); // -1 = continuation
      } else if (tag === 3) { // value labels: 8-byte value, 1-byte length, label
        const lblSet = { labels: [] };
        const count = next();
        for (let i = 0; i < count; i++) {
          const val = f64(offset); // raw text for string variables
          const lblLen = data[offset + 8];
          lblSet.labels.push({ val, lbl: text(offset + 9, lblLen) });
          offset += 8 + Math.ceil((lblLen + 1) / 8) * 8; // length byte + label padded to 8
        }
        this.valueLabels.push(lblSet);
      } else if (tag === 4) { // 1-based indices of variables using the labels
        const count = next();
        const vars = [];
        for (let i = 0; i < count; i++) vars.push(next());
        if (this.valueLabels.length) this.valueLabels[this.valueLabels.length - 1].vars = vars;
      } else if (tag === 6) { // documents: lines of 80 bytes
        const count = next();
        for (let i = 0; i < count; i++) {
          this.documents.push(text(offset, 80));
          offset += 80;
        }
      } else if (tag === 7) { // extension record; payload skipped
        const ext = { subtype: next(), size: next(), count: next() };
        offset += ext.size * ext.count;
        this.extensions.push(ext);
      } else if (tag === 999) {
        next(); // filler, normally 0
        break;
      } else {
        throw new Error(`Unexpected record type ${tag} at offset ${offset - 4}`);
      }
    }
  }
  printProperties() {
    console.log('Header:');
    for (const [k, v] of Object.entries(this.header)) {
      console.log(`- ${k}: ${v}`);
    }
    console.log('\nVariables:');
    for (const varObj of this.variables) {
      console.log(`- Name: ${varObj.name}, Type: ${varObj.type}, Label: ${varObj.label || ''}, Missing: ${varObj.missing}`);
      console.log(`  Print Format: ${varObj.printFmt.join(', ')}, Write Format: ${varObj.writeFmt.join(', ')}`);
    }
    console.log('\nValue Labels:');
    for (const lbl of this.valueLabels) {
      console.log('Set:');
      for (const pair of lbl.labels) {
        console.log(`  ${pair.val}: ${pair.lbl}`);
      }
      console.log(`  Applied to: ${lbl.vars || []}`);
    }
    console.log('\nDocuments:');
    for (const doc of this.documents) {
      console.log(`- ${doc}`);
    }
    console.log('\nExtensions:');
    for (const ext of this.extensions) {
      console.log(`- Subtype: ${ext.subtype}, Size: ${ext.size}, Count: ${ext.count}`);
    }
  }
  write(outputPath) {
    // Minimal little-endian file: header, one numeric variable VAR1, no cases.
    const buffer = Buffer.alloc(176 + 32 + 8);
    let offset = 0;
    // Writes text space-padded (or truncated) to exactly len bytes.
    const put = (s, len) => { buffer.write(s.padEnd(len, ' '), offset, len, 'latin1'); offset += len; };
    const i32 = (v) => { buffer.writeInt32LE(v, offset); offset += 4; };
    put('$FL2', 4);
    put('@(#) SPSS DATA FILE - Mock - JS', 60);
    i32(2); // layout code
    i32(1); // nominal case size
    i32(0); // compression: none
    i32(0); // no weight variable
    i32(0); // number of cases
    buffer.writeDoubleLE(100.0, offset); offset += 8; // compression bias
    put('11 Nov 25', 9);
    put('00:00:00', 8);
    put('Mock File', 64);
    offset += 3; // padding bytes, already zero
    // Variable record
    i32(2); // record type
    i32(0); // numeric
    i32(0); // no label
    i32(0); // no missing values
    i32((5 << 16) | (8 << 8) | 2); // print format F8.2
    i32((5 << 16) | (8 << 8) | 2); // write format F8.2
    put('VAR1', 8);
    // Dictionary termination record
    i32(999);
    i32(0); // filler
    fs.writeFileSync(outputPath, buffer.subarray(0, offset));
  }
}
// Example:
// const sav = new SavFile('example.sav');
// sav.printProperties();
// sav.write('output.sav');
7. C "class" for .SAV file handling
In C, we use structs and functions (no classes). Here's a struct with functions for read, print, write.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <stdint.h>
typedef struct {
    char magic[5];
    char product[61];
    int32_t layout;
    int32_t num_vars;       /* nominal case size: 8-byte elements per case */
    int32_t compression;
    int32_t weight_idx;
    int32_t num_cases;
    double bias;
    char date[10];
    char time[9];
    char label[65];
    /* Dynamic arrays for variables, etc. (simplified with fixed size for demo) */
    struct Var {
        char name[9];
        int32_t type;
        int32_t has_label;
        int32_t n_missing;
        uint8_t print_fmt[4];
        uint8_t write_fmt[4];
        char label_str[256];  /* capacity: 255 chars + NUL */
        double missing[3];    /* at most 3 values */
    } variables[100];         /* capacity: 100 variables */
    int var_count;
    /* Value labels, docs, and extensions are not stored, for brevity in C */
} SavFile;
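/* Reads the header and dictionary. Integers and doubles are copied as-is,
   so this assumes the file's byte order matches the host (e.g. a
   little-endian file on an x86 or ARM host). */
void read_sav(SavFile *sav, const char *filepath) {
    memset(sav, 0, sizeof *sav);
    FILE *f = fopen(filepath, "rb");
    if (!f) return;
    uint8_t header[176];
    if (fread(header, 1, 176, f) != 176) {
        fclose(f);
        return;
    }
    memcpy(sav->magic, header, 4); sav->magic[4] = 0;
    memcpy(sav->product, header + 4, 60); sav->product[60] = 0;
    memcpy(&sav->layout, header + 64, 4);
    memcpy(&sav->num_vars, header + 68, 4);
    memcpy(&sav->compression, header + 72, 4);  /* 2 = ZLIB: only case data is compressed */
    memcpy(&sav->weight_idx, header + 76, 4);
    memcpy(&sav->num_cases, header + 80, 4);
    memcpy(&sav->bias, header + 84, 8);
    memcpy(sav->date, header + 92, 9); sav->date[9] = 0;
    memcpy(sav->time, header + 101, 8); sav->time[8] = 0;
    memcpy(sav->label, header + 109, 64); sav->label[64] = 0;
    int32_t tag;
    /* Walk the tagged dictionary records up to the 999 terminator */
    while (fread(&tag, 4, 1, f) == 1 && tag != 999) {
        if (tag == 2) {  /* variable record */
            struct Var v;
            memset(&v, 0, sizeof v);
            if (fread(&v.type, 4, 1, f) != 1 || fread(&v.has_label, 4, 1, f) != 1 ||
                fread(&v.n_missing, 4, 1, f) != 1 || fread(v.print_fmt, 1, 4, f) != 4 ||
                fread(v.write_fmt, 1, 4, f) != 4 || fread(v.name, 1, 8, f) != 8)
                break;
            if (v.has_label) {
                int32_t label_len;
                if (fread(&label_len, 4, 1, f) != 1 || label_len < 0) break;
                int32_t keep = label_len < 255 ? label_len : 255;  /* truncate long labels */
                if (fread(v.label_str, 1, (size_t)keep, f) != (size_t)keep) break;
                /* skip any truncated part plus the padding to a multiple of 4 */
                fseek(f, (long)(label_len - keep) + (4 - label_len % 4) % 4, SEEK_CUR);
            }
            int n = abs(v.n_missing);  /* -2 = range, -3 = range + one value */
            if (n > 3) break;
            for (int m = 0; m < n; m++) {
                if (v.type == 0) {
                    if (fread(&v.missing[m], 8, 1, f) != 1) break;
                } else {
                    fseek(f, 8, SEEK_CUR);  /* string missing values are not stored */
                }
            }
            if (v.type != -1 && sav->var_count < 100)  /* -1 = long-string continuation */
                sav->variables[sav->var_count++] = v;
        } else if (tag == 3) {  /* value labels: 8-byte value, 1-byte length, label */
            int32_t count;
            if (fread(&count, 4, 1, f) != 1) break;
            for (int32_t i = 0; i < count; i++) {
                uint8_t len;
                fseek(f, 8, SEEK_CUR);
                if (fread(&len, 1, 1, f) != 1) break;
                fseek(f, (len + 1 + 7) / 8 * 8 - 1, SEEK_CUR);  /* label + padding to 8 */
            }
        } else if (tag == 4 || tag == 6) {  /* variable indices (4 bytes) or document lines (80) */
            int32_t count;
            if (fread(&count, 4, 1, f) != 1) break;
            fseek(f, (long)count * (tag == 4 ? 4 : 80), SEEK_CUR);
        } else if (tag == 7) {  /* extension: subtype, size, count, then payload */
            int32_t ext[3];
            if (fread(ext, 4, 3, f) != 3) break;
            fseek(f, (long)ext[1] * ext[2], SEEK_CUR);
        } else {
            break;  /* unknown record type */
        }
    }
    fclose(f);
}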
void print_properties(const SavFile *sav) {
    printf("Header:\n");
    printf("- magic: %s\n", sav->magic);
    printf("- product: %s\n", sav->product);
    printf("- layout: %d\n", sav->layout);
    printf("- num_vars: %d\n", sav->num_vars);
    printf("- compression: %d\n", sav->compression);
    printf("- weight_idx: %d\n", sav->weight_idx);
    printf("- num_cases: %d\n", sav->num_cases);
    printf("- bias: %f\n", sav->bias);
    printf("- date: %s\n", sav->date);
    printf("- time: %s\n", sav->time);
    printf("- label: %s\n", sav->label);
    printf("\nVariables:\n");
    for (int i = 0; i < sav->var_count; i++) {
        const struct Var *var = &sav->variables[i];
        printf("- Name: %s, Type: %d, Label: %s, Missing count: %d\n", var->name, var->type, var->label_str, var->n_missing);
        printf("  Print Format: %u %u %u %u, Write Format: %u %u %u %u\n",
               var->print_fmt[0], var->print_fmt[1], var->print_fmt[2], var->print_fmt[3],
               var->write_fmt[0], var->write_fmt[1], var->write_fmt[2], var->write_fmt[3]);
    }
    /* Add prints for other properties if parsed */
}
void write_sav(const SavFile *sav, const char *output_path) {
    (void)sav;  /* the mock output does not depend on the parsed file */
    FILE *f = fopen(output_path, "wb");
    if (!f) return;
    /* Integers and doubles are written in host byte order (little-endian on
       x86/ARM); readers detect the order from the layout code. */
    const char *prod = "@(#) SPSS DATA FILE - Mock - C";
    char product[60];
    memset(product, ' ', sizeof product);
    memcpy(product, prod, strlen(prod));
    fwrite("$FL2", 1, 4, f);
    fwrite(product, 1, 60, f);
    int32_t ints[5] = {2, 1, 0, 0, 0};  /* layout, case size, compression, weight, cases */
    fwrite(ints, 4, 5, f);
    double bias = 100.0;
    fwrite(&bias, 8, 1, f);
    fwrite("11 Nov 25", 1, 9, f);
    fwrite("00:00:00", 1, 8, f);
    char label[64];
    memset(label, ' ', sizeof label);
    memcpy(label, "Mock File", 9);
    fwrite(label, 1, 64, f);
    fwrite("\0\0\0", 1, 3, f);  /* padding to 176 bytes */
    /* Variable record: tag, type (numeric), has_label, n_missing */
    int32_t var_rec[4] = {2, 0, 0, 0};
    fwrite(var_rec, 4, 4, f);
    int32_t fmt = (5 << 16) | (8 << 8) | 2;  /* F8.2 */
    fwrite(&fmt, 4, 1, f);  /* print format */
    fwrite(&fmt, 4, 1, f);  /* write format */
    fwrite("VAR1    ", 1, 8, f);  /* name, space-padded to 8 bytes */
    /* Dictionary termination record: 999 plus a zero filler */
    int32_t term[2] = {999, 0};
    fwrite(term, 4, 2, f);
    fclose(f);
}
// Example:
// int main() {
//     SavFile sav;
//     read_sav(&sav, "example.sav");
//     print_properties(&sav);
//     write_sav(&sav, "output.sav");
//     return 0;
// }