Task 664: .SHX File Format
1. Properties of the .SHX File Format Intrinsic to Its Structure
The .SHX file format, part of the ESRI Shapefile specification, is a binary index file designed for direct access to records in the corresponding .SHP file. It features a fixed 100-byte header followed by fixed-length 8-byte records. Below is a comprehensive list of all intrinsic structural properties, including byte offsets, data types, byte orders, and constraints. These are derived directly from the ESRI Shapefile Technical Description.
Header Properties (Bytes 0–99)
- File Code (Bytes 0–3): 32-bit signed integer in big-endian byte order; fixed value of 9994 (0x270A).
- Unused Fields (Bytes 4–23): Five consecutive 32-bit signed integers in big-endian byte order; each must be 0.
- File Length (Bytes 24–27): 32-bit signed integer in big-endian byte order; represents the total file length in 16-bit words (multiply by 2 for bytes). Includes the 50-word header plus 4 words per record.
- Version (Bytes 28–31): 32-bit signed integer in little-endian byte order; fixed value of 1000 (0x03E8).
- Shape Type (Bytes 32–35): 32-bit signed integer in little-endian byte order; specifies the geometry type (all non-null shapes in the file must match). Valid values: 0 (Null), 1 (Point), 3 (PolyLine), 5 (Polygon), 8 (MultiPoint), 11 (PointZ), 13 (PolyLineZ), 15 (PolygonZ), 18 (MultiPointZ), 21 (PointM), 23 (PolyLineM), 25 (PolygonM), 28 (MultiPointM), 31 (MultiPatch). Other values are reserved.
- X Minimum (Bytes 36–43): 64-bit IEEE double-precision floating-point number in little-endian byte order; minimum X coordinate of the bounding box.
- Y Minimum (Bytes 44–51): 64-bit IEEE double-precision floating-point number in little-endian byte order; minimum Y coordinate of the bounding box.
- X Maximum (Bytes 52–59): 64-bit IEEE double-precision floating-point number in little-endian byte order; maximum X coordinate of the bounding box.
- Y Maximum (Bytes 60–67): 64-bit IEEE double-precision floating-point number in little-endian byte order; maximum Y coordinate of the bounding box.
- Z Minimum (Bytes 68–75): 64-bit IEEE double-precision floating-point number in little-endian byte order; minimum Z coordinate (optional; value 0.0 if not a Z-type shape).
- Z Maximum (Bytes 76–83): 64-bit IEEE double-precision floating-point number in little-endian byte order; maximum Z coordinate (optional; value 0.0 if not a Z-type shape).
- M Minimum (Bytes 84–91): 64-bit IEEE double-precision floating-point number in little-endian byte order; minimum measure value (0.0 if the file holds no measured shapes; any value below -10^38 means "no data").
- M Maximum (Bytes 92–99): 64-bit IEEE double-precision floating-point number in little-endian byte order; maximum measure value (0.0 if the file holds no measured shapes; any value below -10^38 means "no data").
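The header layout listed above can be written down as two struct formats, one per byte order. The following is a minimal sketch, not a reference implementation; the names MANAGEMENT_FMT, DATA_FMT, and parse_header are illustrative and do not come from the specification. It also confirms that the two groups of fields add up to exactly 100 bytes.

```python
import struct

# File-management fields: file code, five unused ints, file length (big-endian).
MANAGEMENT_FMT = '>7i'
# Data fields: version, shape type, then eight bounding-box doubles (little-endian).
DATA_FMT = '<2i8d'

BBOX_NAMES = ('xmin', 'ymin', 'xmax', 'ymax', 'zmin', 'zmax', 'mmin', 'mmax')


def parse_header(data):
    """Decode the first 100 bytes of a .shx file into a dict (hypothetical helper)."""
    if len(data) < 100:
        raise ValueError('need at least 100 bytes')
    mgmt = struct.unpack(MANAGEMENT_FMT, data[:28])
    rest = struct.unpack(DATA_FMT, data[28:100])
    header = {
        'file_code': mgmt[0],
        'file_length_words': mgmt[6],
        'version': rest[0],
        'shape_type': rest[1],
    }
    header.update(zip(BBOX_NAMES, rest[2:]))
    return header


# Synthetic header for an empty Point index: 50 words, no records.
sample = (struct.pack(MANAGEMENT_FMT, 9994, 0, 0, 0, 0, 0, 50)
          + struct.pack(DATA_FMT, 1000, 1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0))
header = parse_header(sample)
header_size = struct.calcsize(MANAGEMENT_FMT) + struct.calcsize(DATA_FMT)
print(header_size, header['file_code'], header['version'], header['shape_type'])
```

Because struct cannot mix byte orders within one format string, the split at byte 28 mirrors the boundary between the big-endian and little-endian fields.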
Record Properties (Starting at Byte 100; Fixed 8 Bytes per Record)
- Number of Records: Derived value; calculated as ((File Length × 2) - 100) / 8. One record per shape in the .SHP file.
- Record Offset (Bytes 0–3 of each record): 32-bit signed integer in big-endian byte order; position of the corresponding .SHP record header in 16-bit words from the start of the .SHP file (multiply by 2 for bytes). First record offset is typically 50 (after 100-byte .SHP header).
- Record Content Length (Bytes 4–7 of each record): 32-bit signed integer in big-endian byte order; length of the .SHP record content (excluding header) in 16-bit words (multiply by 2 for bytes).
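The record fields above are what make direct access possible: an index entry gives the position and size of one .SHP record without scanning the records before it. The sketch below illustrates this on synthetic in-memory data. It assumes the 8-byte .SHP record header (record number and content length) described in the main-file part of the Shapefile specification; the helper names and the zero-filled placeholder headers are illustrative only.

```python
import io
import struct

SHX_HEADER_BYTES = 100
SHX_RECORD_BYTES = 8
SHP_RECORD_HEADER_BYTES = 8  # record number + content length (assumed from the .SHP layout)


def record_count(file_length_words):
    """Number of index records implied by the header's File Length field."""
    return (file_length_words * 2 - SHX_HEADER_BYTES) // SHX_RECORD_BYTES


def read_index_entry(shx, i):
    """Return (byte offset, byte length) for the i-th record, counting from 0."""
    shx.seek(SHX_HEADER_BYTES + SHX_RECORD_BYTES * i)
    offset_words, length_words = struct.unpack('>ii', shx.read(SHX_RECORD_BYTES))
    return offset_words * 2, length_words * 2


def read_shp_content(shp, byte_offset, byte_length):
    """Seek straight to one .shp record and return its content bytes."""
    shp.seek(byte_offset + SHP_RECORD_HEADER_BYTES)
    return shp.read(byte_length)


def point(x, y):
    """Point record content: shape type 1 plus X and Y (20 bytes = 10 words)."""
    return struct.pack('<i2d', 1, x, y)


# Synthetic two-point pair; both 100-byte headers are zero-filled placeholders.
shp = io.BytesIO(bytes(100)
                 + struct.pack('>ii', 1, 10) + point(1.0, 2.0)
                 + struct.pack('>ii', 2, 10) + point(3.0, 4.0))
shx = io.BytesIO(bytes(100) + struct.pack('>ii', 50, 10) + struct.pack('>ii', 64, 10))

offset, length = read_index_entry(shx, 1)
shape_type, x, y = struct.unpack('<i2d', read_shp_content(shp, offset, length))
print(offset, length, shape_type, x, y)  # 128 20 1 3.0 4.0
```

The second entry stores 64 words because the first record occupies 4 words of header plus 10 words of content after the 50-word file header.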
General Structural Properties
- Header Size: Fixed at 100 bytes (50 × 16-bit words).
- Record Size: Fixed at 8 bytes (4 × 16-bit words) per record.
- Byte Order Conventions: File management fields (File Code, Unused, File Length, Offsets, Content Lengths) use big-endian; data fields (Version, Shape Type, Bounding Box) use little-endian.
- Units: All lengths and offsets are in 16-bit words (2 bytes each); convert to bytes by multiplying by 2.
- File Naming: Must share the base name with .SHP and .DBF files (8.3 convention, lowercase on case-sensitive systems).
- Record Order: Matches the order of records in the .SHP and .DBF files for one-to-one correspondence.
- Constraints: File must not contain NaN or infinity in floating-point values; "no data" for measures is any value < -10^38. Empty files (no records) leave bounding box unspecified.
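- Endianness Dependency: The byte order of each field is fixed by the format, not by the host. Data fields follow the little-endian (Intel/PC) convention and management fields follow the big-endian (Motorola/Sun) convention, so readers on any platform must convert explicitly.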
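Several of the properties above can be checked mechanically before an index is trusted. The following is a rough sketch of such a check, assuming the whole file is already in memory; validate_shx is a hypothetical helper and the set of checks is a selection, not an exhaustive conformance test.

```python
import math
import struct

VALID_SHAPE_TYPES = {0, 1, 3, 5, 8, 11, 13, 15, 18, 21, 23, 25, 28, 31}


def validate_shx(data):
    """Return a list of human-readable problems; an empty list means none were found."""
    if len(data) < 100:
        return ['file is shorter than the 100-byte header']
    file_code, *unused, length_words = struct.unpack('>7i', data[:28])
    version, shape_type = struct.unpack('<2i', data[28:36])
    bbox = struct.unpack('<8d', data[36:100])

    problems = []
    if file_code != 9994:
        problems.append(f'file code is {file_code}, expected 9994')
    if any(unused):
        problems.append('unused header fields are not all zero')
    if version != 1000:
        problems.append(f'version is {version}, expected 1000')
    if shape_type not in VALID_SHAPE_TYPES:
        problems.append(f'shape type {shape_type} is reserved')
    if length_words * 2 != len(data):
        problems.append('File Length field does not match the actual size')
    if (len(data) - 100) % 8:
        problems.append('record area is not a multiple of 8 bytes')
    if any(math.isnan(v) or math.isinf(v) for v in bbox):
        problems.append('bounding box contains NaN or infinity')
    return problems


# One-record Polygon index: 50-word header + 4 words = 54 words (108 bytes).
good = (struct.pack('>7i', 9994, 0, 0, 0, 0, 0, 54)
        + struct.pack('<2i8d', 1000, 5, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0)
        + struct.pack('>ii', 50, 10))
print(validate_shx(good))
```

Returning a list rather than raising on the first failure makes it easier to report every inconsistency in a damaged file at once.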
2. Direct Download Links for Sample .SHX Files
Sample .SHX files are typically bundled in ZIP archives with other Shapefile components (.SHP, .DBF, .PRJ). Below are two direct links to such archives containing valid .SHX files (country boundary examples):
- Australia country boundaries: https://www.statsilk.com/files/country/StatPlanet_Australia.zip
- Afghanistan country boundaries: https://www.statsilk.com/files/country/StatPlanet_Afghanistan.zip
3. Ghost Blog Embedded HTML JavaScript for Drag-and-Drop .SHX Parsing
The following is a self-contained HTML snippet with embedded JavaScript, suitable for embedding in a Ghost blog post (e.g., via the HTML card). It enables drag-and-drop of a .SHX file, parses its structure using the File API and DataView, and displays all properties in a formatted <pre> block on the page. No external dependencies are required.
Drag and drop a .SHX file here to parse its properties.
To write a new .SHX file, extend the script with a writeSHX function using Blob and download links, but the above focuses on read/parse/dump as specified.
4. Python Class for .SHX Handling
The following Python class uses the struct module to read, decode, and write .SHX files. It prints all properties to the console on instantiation (after reading) and again after each write() call. It was checked against the specification by inspection, not against real files.
import struct


class SHXHandler:
    def __init__(self, filename=None):
        self.filename = filename
        self.header = {}
        self.records = []
        if filename:
            self.read(filename)
            self.print_properties()

    def read(self, filename):
        with open(filename, 'rb') as f:
            data = f.read()
        if len(data) < 100:
            raise ValueError('not a .SHX file: shorter than the 100-byte header')
        self.records = []
        pos = 0
        # Header: file-management fields are big-endian
        self.header['file_code'] = struct.unpack('>i', data[pos:pos+4])[0]
        pos += 4
        self.header['unused'] = [struct.unpack('>i', data[pos+4*i:pos+4*(i+1)])[0] for i in range(5)]
        pos += 20
        self.header['file_length_words'] = struct.unpack('>i', data[pos:pos+4])[0]
        pos += 4
        # Header: data fields are little-endian signed integers and doubles
        self.header['version'] = struct.unpack('<i', data[pos:pos+4])[0]
        pos += 4
        self.header['shape_type'] = struct.unpack('<i', data[pos:pos+4])[0]
        pos += 4
        bbox_fmt = '<dddddddd'  # Eight little-endian doubles
        bbox = struct.unpack(bbox_fmt, data[pos:pos+64])
        self.header['xmin'], self.header['ymin'], self.header['xmax'], self.header['ymax'] = bbox[:4]
        self.header['zmin'], self.header['zmax'], self.header['mmin'], self.header['mmax'] = bbox[4:]
        pos += 64
        # Records: follow the header's File Length, but never read past the data actually present
        total_bytes = min(self.header['file_length_words'] * 2, len(data))
        num_records = (total_bytes - 100) // 8
        for i in range(num_records):
            offset, length = struct.unpack('>ii', data[pos:pos+8])  # Big-endian
            self.records.append({'offset_words': offset, 'content_length_words': length})
            pos += 8

    def print_properties(self):
        print('=== .SHX File Properties ===')
        print('\n--- Header ---')
        print(f'File Code: {self.header["file_code"]} (expected 9994)')
        print(f'File Length: {self.header["file_length_words"]} words ({self.header["file_length_words"] * 2} bytes)')
        print(f'Version: {self.header["version"]} (expected 1000)')
        print(f'Shape Type: {self.header["shape_type"]}')
        print(f'Bounding Box: Xmin={self.header["xmin"]}, Ymin={self.header["ymin"]}, Xmax={self.header["xmax"]}, Ymax={self.header["ymax"]}')
        print(f'Z Range: Zmin={self.header["zmin"]}, Zmax={self.header["zmax"]}')
        print(f'M Range: Mmin={self.header["mmin"]}, Mmax={self.header["mmax"]}')
        print(f'\nNumber of Records: {len(self.records)}')
        print('--- Records ---')
        for i, rec in enumerate(self.records):
            print(f'Record {i}: Offset={rec["offset_words"]} words ({rec["offset_words"] * 2} bytes), '
                  f'Content Length={rec["content_length_words"]} words ({rec["content_length_words"] * 2} bytes)')

    def write(self, output_filename):
        with open(output_filename, 'wb') as f:
            # Write header (simplified; assumes self.header is already populated and consistent)
            f.write(struct.pack('>i', self.header['file_code']))
            for u in self.header['unused']:
                f.write(struct.pack('>i', u))
            f.write(struct.pack('>i', self.header['file_length_words']))
            f.write(struct.pack('<i', self.header['version']))
            f.write(struct.pack('<i', self.header['shape_type']))
            bbox = [self.header['xmin'], self.header['ymin'], self.header['xmax'], self.header['ymax'],
                    self.header['zmin'], self.header['zmax'], self.header['mmin'], self.header['mmax']]
            f.write(struct.pack('<dddddddd', *bbox))
            # Write records
            for rec in self.records:
                f.write(struct.pack('>ii', rec['offset_words'], rec['content_length_words']))
        print(f'Wrote .SHX to {output_filename}')
        self.print_properties()


# Usage: handler = SHXHandler('sample.shx'); handler.write('output.shx')
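The write() method emits whatever offsets are already held in memory. When an index has to be built from scratch, the offsets can be derived from the content lengths alone. The sketch below assumes that .SHP records are stored back to back with no gaps and that each one starts with an 8-byte (4-word) record header; the function names are illustrative.

```python
import struct

SHP_HEADER_WORDS = 50        # 100-byte .shp file header
SHP_RECORD_HEADER_WORDS = 4  # 8-byte record header (assumed contiguous records)


def build_index_records(content_lengths_words):
    """Derive (offset, content length) pairs, in 16-bit words, from content lengths."""
    records = []
    offset = SHP_HEADER_WORDS
    for length in content_lengths_words:
        records.append((offset, length))
        offset += SHP_RECORD_HEADER_WORDS + length
    return records


def index_file_length_words(num_records):
    """Value for the .shx File Length field: 50-word header plus 4 words per record."""
    return 50 + 4 * num_records


def pack_records(records):
    """Serialize index records as big-endian 32-bit integer pairs."""
    return b''.join(struct.pack('>ii', offset, length) for offset, length in records)


records = build_index_records([10, 10, 24])
print(records)                                # [(50, 10), (64, 10), (78, 24)]
print(index_file_length_words(len(records)))  # 62
```

A file with gaps or reordered records would need its offsets read from the .SHP file itself rather than computed this way.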
5. Java Class for .SHX Handling
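The following Java class uses DataInputStream and DataOutputStream for binary I/O. It reads and decodes the file on construction, prints the properties to the console, and supports writing. Because DataInputStream and DataOutputStream always use big-endian byte order, the little-endian header fields need explicit conversion, done here with ByteBuffer.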
import java.io.*;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class SHXHandler {
    private int fileCode;
    private int[] unused = new int[5];
    private int fileLengthWords;
    private int version;
    private int shapeType;
    private double xmin, ymin, xmax, ymax, zmin, zmax, mmin, mmax;
    private int[][] records; // [offsetWords, contentLengthWords]

    public SHXHandler(String filename) throws IOException {
        DataInputStream dis = new DataInputStream(new FileInputStream(filename));
        try {
            // Header
            fileCode = readBigEndianInt(dis);
            for (int i = 0; i < 5; i++) unused[i] = readBigEndianInt(dis);
            fileLengthWords = readBigEndianInt(dis);
            version = readLittleEndianInt(dis);
            shapeType = readLittleEndianInt(dis);
            // DataInputStream.readDouble() is big-endian, so decode the bounding box explicitly
            xmin = readLittleEndianDouble(dis);
            ymin = readLittleEndianDouble(dis);
            xmax = readLittleEndianDouble(dis);
            ymax = readLittleEndianDouble(dis);
            zmin = readLittleEndianDouble(dis);
            zmax = readLittleEndianDouble(dis);
            mmin = readLittleEndianDouble(dis);
            mmax = readLittleEndianDouble(dis);
            // Records (guard against a corrupt, too-small File Length)
            int numRecords = Math.max(0, ((fileLengthWords * 2) - 100) / 8);
            records = new int[numRecords][2];
            for (int i = 0; i < numRecords; i++) {
                records[i][0] = readBigEndianInt(dis); // Offset
                records[i][1] = readBigEndianInt(dis); // Length
            }
            printProperties();
        } finally {
            dis.close();
        }
    }

    private int readBigEndianInt(DataInputStream dis) throws IOException {
        byte[] buf = new byte[4];
        dis.readFully(buf);
        ByteBuffer bb = ByteBuffer.wrap(buf).order(ByteOrder.BIG_ENDIAN);
        return bb.getInt();
    }

    private int readLittleEndianInt(DataInputStream dis) throws IOException {
        byte[] buf = new byte[4];
        dis.readFully(buf);
        ByteBuffer bb = ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN);
        return bb.getInt();
    }

    private double readLittleEndianDouble(DataInputStream dis) throws IOException {
        byte[] buf = new byte[8];
        dis.readFully(buf);
        ByteBuffer bb = ByteBuffer.wrap(buf).order(ByteOrder.LITTLE_ENDIAN);
        return bb.getDouble();
    }

    public void printProperties() {
        System.out.println("=== .SHX File Properties ===");
        System.out.println("\n--- Header ---");
        System.out.printf("File Code: %d (expected 9994)%n", fileCode);
        System.out.printf("File Length: %d words (%d bytes)%n", fileLengthWords, fileLengthWords * 2);
        System.out.printf("Version: %d (expected 1000)%n", version);
        System.out.printf("Shape Type: %d%n", shapeType);
        System.out.printf("Bounding Box: Xmin=%.6f, Ymin=%.6f, Xmax=%.6f, Ymax=%.6f%n", xmin, ymin, xmax, ymax);
        System.out.printf("Z Range: Zmin=%.6f, Zmax=%.6f%n", zmin, zmax);
        System.out.printf("M Range: Mmin=%.6f, Mmax=%.6f%n", mmin, mmax);
        int numRecords = records.length;
        System.out.printf("%nNumber of Records: %d%n", numRecords);
        System.out.println("--- Records ---");
        for (int i = 0; i < numRecords; i++) {
            System.out.printf("Record %d: Offset=%d words (%d bytes), Content Length=%d words (%d bytes)%n",
                    i, records[i][0], records[i][0] * 2, records[i][1], records[i][1] * 2);
        }
    }

    public void write(String outputFilename) throws IOException {
        DataOutputStream dos = new DataOutputStream(new FileOutputStream(outputFilename));
        try {
            // Write header
            writeBigEndianInt(dos, fileCode);
            for (int u : unused) writeBigEndianInt(dos, u);
            writeBigEndianInt(dos, fileLengthWords);
            // DataOutputStream writes big-endian, so encode the little-endian fields explicitly
            writeLittleEndianInt(dos, version);
            writeLittleEndianInt(dos, shapeType);
            writeLittleEndianDouble(dos, xmin);
            writeLittleEndianDouble(dos, ymin);
            writeLittleEndianDouble(dos, xmax);
            writeLittleEndianDouble(dos, ymax);
            writeLittleEndianDouble(dos, zmin);
            writeLittleEndianDouble(dos, zmax);
            writeLittleEndianDouble(dos, mmin);
            writeLittleEndianDouble(dos, mmax);
            // Write records
            for (int[] rec : records) {
                writeBigEndianInt(dos, rec[0]);
                writeBigEndianInt(dos, rec[1]);
            }
            System.out.println("Wrote .SHX to " + outputFilename);
            printProperties();
        } finally {
            dos.close();
        }
    }

    private void writeBigEndianInt(DataOutputStream dos, int value) throws IOException {
        ByteBuffer bb = ByteBuffer.allocate(4).order(ByteOrder.BIG_ENDIAN).putInt(value);
        dos.write(bb.array());
    }

    private void writeLittleEndianInt(DataOutputStream dos, int value) throws IOException {
        ByteBuffer bb = ByteBuffer.allocate(4).order(ByteOrder.LITTLE_ENDIAN).putInt(value);
        dos.write(bb.array());
    }

    private void writeLittleEndianDouble(DataOutputStream dos, double value) throws IOException {
        ByteBuffer bb = ByteBuffer.allocate(8).order(ByteOrder.LITTLE_ENDIAN).putDouble(value);
        dos.write(bb.array());
    }

    // Usage: new SHXHandler("sample.shx").write("output.shx");
}
6. JavaScript Class for .SHX Handling
The following is a plain JavaScript class (ES6) for Node.js or browser environments, using fs for Node or assuming a buffer input. It reads/decodes from a buffer, prints properties to console, and supports writing to a new buffer/file. For browser use, pass an ArrayBuffer from FileReader.
class SHXHandler {
  constructor(bufferOrFilename) {
    this.header = {};
    this.records = [];
    if (typeof bufferOrFilename === 'string') {
      // Node.js: readFileSync returns a Buffer, which DataView does not accept,
      // so copy the bytes it covers into a standalone ArrayBuffer.
      const fs = require('fs');
      const buf = fs.readFileSync(bufferOrFilename);
      this.read(buf.buffer.slice(buf.byteOffset, buf.byteOffset + buf.byteLength));
    } else {
      this.read(bufferOrFilename);
    }
    this.printProperties();
  }

  read(buffer) {
    const view = new DataView(buffer);
    let pos = 0;
    // Header
    this.header.fileCode = view.getInt32(pos, false); // Big-endian
    pos += 4;
    this.header.unused = [];
    for (let i = 0; i < 5; i++) {
      this.header.unused.push(view.getInt32(pos, false));
      pos += 4;
    }
    this.header.fileLengthWords = view.getInt32(pos, false);
    pos += 4;
    this.header.version = view.getInt32(pos, true); // Little-endian
    pos += 4;
    this.header.shapeType = view.getInt32(pos, true);
    pos += 4;
    // Bounding box: eight little-endian doubles, ending at byte 100
    for (const key of ['xmin', 'ymin', 'xmax', 'ymax', 'zmin', 'zmax', 'mmin', 'mmax']) {
      this.header[key] = view.getFloat64(pos, true);
      pos += 8;
    }
    // Records
    const numRecords = Math.floor(((this.header.fileLengthWords * 2) - 100) / 8);
    for (let i = 0; i < numRecords; i++) {
      const offset = view.getInt32(pos, false);
      const length = view.getInt32(pos + 4, false);
      this.records.push({ offsetWords: offset, contentLengthWords: length });
      pos += 8;
    }
  }

  printProperties() {
    console.log('=== .SHX File Properties ===');
    console.log('\n--- Header ---');
    console.log(`File Code: ${this.header.fileCode} (expected 9994)`);
    console.log(`File Length: ${this.header.fileLengthWords} words (${this.header.fileLengthWords * 2} bytes)`);
    console.log(`Version: ${this.header.version} (expected 1000)`);
    console.log(`Shape Type: ${this.header.shapeType}`);
    console.log(`Bounding Box: Xmin=${this.header.xmin.toFixed(6)}, Ymin=${this.header.ymin.toFixed(6)}, Xmax=${this.header.xmax.toFixed(6)}, Ymax=${this.header.ymax.toFixed(6)}`);
    console.log(`Z Range: Zmin=${this.header.zmin.toFixed(6)}, Zmax=${this.header.zmax.toFixed(6)}`);
    console.log(`M Range: Mmin=${this.header.mmin.toFixed(6)}, Mmax=${this.header.mmax.toFixed(6)}`);
    console.log(`\nNumber of Records: ${this.records.length}`);
    console.log('--- Records ---');
    this.records.forEach((rec, i) => {
      console.log(`Record ${i}: Offset=${rec.offsetWords} words (${rec.offsetWords * 2} bytes), Content Length=${rec.contentLengthWords} words (${rec.contentLengthWords * 2} bytes)`);
    });
  }

  write(filename) {
    const fs = require('fs');
    const buffer = new ArrayBuffer(100 + this.records.length * 8);
    const view = new DataView(buffer);
    let pos = 0;
    // Write header
    view.setInt32(pos, this.header.fileCode, false); // Big-endian
    pos += 4;
    this.header.unused.forEach(u => {
      view.setInt32(pos, u, false);
      pos += 4;
    });
    view.setInt32(pos, this.header.fileLengthWords, false);
    pos += 4;
    view.setInt32(pos, this.header.version, true); // Little-endian
    pos += 4;
    view.setInt32(pos, this.header.shapeType, true);
    pos += 4;
    // Bounding box: eight little-endian doubles, ending at byte 100
    for (const key of ['xmin', 'ymin', 'xmax', 'ymax', 'zmin', 'zmax', 'mmin', 'mmax']) {
      view.setFloat64(pos, this.header[key], true);
      pos += 8;
    }
    // Write records
    this.records.forEach(rec => {
      view.setInt32(pos, rec.offsetWords, false);
      pos += 4;
      view.setInt32(pos, rec.contentLengthWords, false);
      pos += 4;
    });
    fs.writeFileSync(filename, new Uint8Array(buffer));
    console.log(`Wrote .SHX to ${filename}`);
    this.printProperties();
  }
}

// Usage (Node): new SHXHandler('sample.shx').write('output.shx');
7. C Code for .SHX Handling
C lacks classes, so the following provides a struct-based approach with functions for reading, writing, and printing. It uses fread/fwrite with manual byte order handling via shifts. Compile with gcc shx_handler.c -o shx_handler. Properties are printed to stdout.
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    int32_t file_code;
    int32_t unused[5];
    int32_t file_length_words;
    uint32_t version;
    uint32_t shape_type;
    double xmin, ymin, xmax, ymax, zmin, zmax, mmin, mmax;
    int32_t* offsets; // Dynamic array
    int32_t* lengths; // Dynamic array
    int num_records;
} SHXFile;

// Read a big-endian 32-bit integer; a short read yields 0.
int32_t read_big_endian_int(FILE* f) {
    uint8_t b[4] = {0};
    if (fread(b, 1, 4, f) != 4) return 0;
    return (int32_t)(((uint32_t)b[0] << 24) | ((uint32_t)b[1] << 16) | ((uint32_t)b[2] << 8) | b[3]);
}

// Read a little-endian 32-bit unsigned integer; a short read yields 0.
uint32_t read_little_endian_uint(FILE* f) {
    uint8_t b[4] = {0};
    if (fread(b, 1, 4, f) != 4) return 0;
    return ((uint32_t)b[3] << 24) | ((uint32_t)b[2] << 16) | ((uint32_t)b[1] << 8) | b[0];
}

// Read a little-endian double byte by byte, so the result does not depend on
// host byte order (assumes the host double is a 64-bit IEEE 754 value).
double read_little_endian_double(FILE* f) {
    uint8_t b[8] = {0};
    uint64_t bits = 0;
    double value;
    if (fread(b, 1, 8, f) != 8) return 0.0;
    for (int i = 7; i >= 0; i--) bits = (bits << 8) | b[i];
    memcpy(&value, &bits, sizeof value);
    return value;
}

void write_big_endian_int(FILE* f, int32_t val) {
    uint32_t v = (uint32_t)val;
    uint8_t bytes[4] = {(v >> 24) & 0xFF, (v >> 16) & 0xFF, (v >> 8) & 0xFF, v & 0xFF};
    fwrite(bytes, 1, 4, f);
}

void write_little_endian_uint(FILE* f, uint32_t v) {
    uint8_t bytes[4] = {v & 0xFF, (v >> 8) & 0xFF, (v >> 16) & 0xFF, (v >> 24) & 0xFF};
    fwrite(bytes, 1, 4, f);
}

void write_little_endian_double(FILE* f, double value) {
    uint64_t bits;
    uint8_t b[8];
    memcpy(&bits, &value, sizeof bits);
    for (int i = 0; i < 8; i++) b[i] = (uint8_t)(bits >> (8 * i));
    fwrite(b, 1, 8, f);
}

void print_properties(const SHXFile* shx) {
    printf("=== .SHX File Properties ===\n");
    printf("\n--- Header ---\n");
    printf("File Code: %d (expected 9994)\n", shx->file_code);
    printf("File Length: %d words (%d bytes)\n", shx->file_length_words, shx->file_length_words * 2);
    printf("Version: %u (expected 1000)\n", shx->version);
    printf("Shape Type: %u\n", shx->shape_type);
    printf("Bounding Box: Xmin=%.6f, Ymin=%.6f, Xmax=%.6f, Ymax=%.6f\n",
           shx->xmin, shx->ymin, shx->xmax, shx->ymax);
    printf("Z Range: Zmin=%.6f, Zmax=%.6f\n", shx->zmin, shx->zmax);
    printf("M Range: Mmin=%.6f, Mmax=%.6f\n", shx->mmin, shx->mmax);
    printf("\nNumber of Records: %d\n", shx->num_records);
    printf("--- Records ---\n");
    for (int i = 0; i < shx->num_records; i++) {
        printf("Record %d: Offset=%d words (%d bytes), Content Length=%d words (%d bytes)\n",
               i, shx->offsets[i], shx->offsets[i] * 2, shx->lengths[i], shx->lengths[i] * 2);
    }
}

// Returns 0 on success, -1 on failure.
int read_shx(SHXFile* shx, const char* filename) {
    FILE* f = fopen(filename, "rb");
    if (!f) { perror("File open error"); return -1; }
    // Header
    shx->file_code = read_big_endian_int(f);
    for (int i = 0; i < 5; i++) shx->unused[i] = read_big_endian_int(f);
    shx->file_length_words = read_big_endian_int(f);
    shx->version = read_little_endian_uint(f);
    shx->shape_type = read_little_endian_uint(f);
    double* box[8] = {&shx->xmin, &shx->ymin, &shx->xmax, &shx->ymax,
                      &shx->zmin, &shx->zmax, &shx->mmin, &shx->mmax};
    for (int i = 0; i < 8; i++) *box[i] = read_little_endian_double(f);
    // Records: (words - 50) / 4 equals ((words * 2) - 100) / 8 without overflow
    shx->num_records = (shx->file_length_words - 50) / 4;
    if (shx->num_records < 0) shx->num_records = 0;
    shx->offsets = malloc((size_t)shx->num_records * sizeof(int32_t));
    shx->lengths = malloc((size_t)shx->num_records * sizeof(int32_t));
    if (shx->num_records > 0 && (!shx->offsets || !shx->lengths)) {
        fprintf(stderr, "Out of memory\n");
        shx->num_records = 0;
        fclose(f);
        return -1;
    }
    for (int i = 0; i < shx->num_records; i++) {
        shx->offsets[i] = read_big_endian_int(f);
        shx->lengths[i] = read_big_endian_int(f);
    }
    fclose(f);
    print_properties(shx);
    return 0;
}

void write_shx(const SHXFile* shx, const char* output_filename) {
    FILE* f = fopen(output_filename, "wb");
    if (!f) { perror("File write error"); return; }
    // Write header
    write_big_endian_int(f, shx->file_code);
    for (int i = 0; i < 5; i++) write_big_endian_int(f, shx->unused[i]);
    write_big_endian_int(f, shx->file_length_words);
    write_little_endian_uint(f, shx->version);
    write_little_endian_uint(f, shx->shape_type);
    const double box[8] = {shx->xmin, shx->ymin, shx->xmax, shx->ymax,
                           shx->zmin, shx->zmax, shx->mmin, shx->mmax};
    for (int i = 0; i < 8; i++) write_little_endian_double(f, box[i]);
    // Write records
    for (int i = 0; i < shx->num_records; i++) {
        write_big_endian_int(f, shx->offsets[i]);
        write_big_endian_int(f, shx->lengths[i]);
    }
    fclose(f);
    printf("Wrote .SHX to %s\n", output_filename);
    print_properties(shx);
}

void free_shx(SHXFile* shx) {
    free(shx->offsets);
    free(shx->lengths);
}

int main(int argc, char* argv[]) {
    if (argc < 2) { printf("Usage: %s <input.shx> [output.shx]\n", argv[0]); return 1; }
    SHXFile shx = {0};
    if (read_shx(&shx, argv[1]) != 0) { free_shx(&shx); return 1; }
    if (argc > 2) write_shx(&shx, argv[2]);
    free_shx(&shx);
    return 0;
}