Task 853: .XZ File Format
1. List of Properties Intrinsic to the .XZ File Format
The .XZ file format is a container for compressed data without archiving capabilities. It compresses a single file and stores no file names or directory structure. A file consists of one or more concatenated Streams, each comprising a Stream Header, zero or more Blocks, an Index, and a Stream Footer, optionally followed by Stream Padding. The Stream Header, Block Headers, padded Blocks, Index, Stream Footer, and Stream Padding are all multiples of four bytes, so every Stream stays 4-byte aligned; fixed-size multi-byte integers are little-endian. The following list covers the intrinsic properties (structural fields and elements) of the format as defined in its official specification. Together they define the format's layout, integrity checks, and compression metadata.
Stream Header (12 bytes total, always present):
- Header Magic Bytes: 6 bytes with fixed value 0xFD 0x37 0x7A 0x58 0x5A 0x00, used for file type identification.
- Stream Flags: 2 bytes. The first byte is reserved and must be 0x00. In the second byte, bits 0-3 hold the Check ID (0x00 None, 0x01 CRC32, 0x04 CRC64, 0x0A SHA-256; the other values up to 0x0F are reserved) and bits 4-7 are reserved and must be zero.
- Header CRC32: 4 bytes, an unsigned 32-bit little-endian integer representing the CRC32 checksum of the Stream Flags.
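The Stream Header can be built and validated with Python's standard library alone. This is a minimal sketch, not a reference decoder: it assumes `zlib.crc32` computes the same CRC32 as .xz (both use the standard CRC-32 polynomial), and the helper names `build_stream_header` and `parse_stream_header` are illustrative. It cross-checks its output against a header written by the `lzma` module.

```python
import lzma
import struct
import zlib

HEADER_MAGIC = b"\xfd7zXZ\x00"


def build_stream_header(check_id: int) -> bytes:
    """Return the 12-byte Stream Header for a Check ID in 0x00-0x0F."""
    if not 0x00 <= check_id <= 0x0F:
        raise ValueError("Check ID is a 4-bit field")
    flags = bytes([0x00, check_id])  # reserved byte, then the Check ID in bits 0-3
    return HEADER_MAGIC + flags + struct.pack("<I", zlib.crc32(flags))


def parse_stream_header(header: bytes) -> int:
    """Validate a Stream Header and return its Check ID."""
    if len(header) != 12 or header[:6] != HEADER_MAGIC:
        raise ValueError("not an .xz Stream Header")
    flags = header[6:8]
    if flags[0] != 0x00 or flags[1] & 0xF0:
        raise ValueError("reserved Stream Flags bits are set")
    (stored_crc,) = struct.unpack("<I", header[8:12])
    if zlib.crc32(flags) != stored_crc:  # Header CRC32 covers only the Stream Flags
        raise ValueError("Header CRC32 mismatch")
    return flags[1] & 0x0F


# Cross-check against a real Stream written by Python's lzma module (CRC64 check)
sample = lzma.compress(b"hello", check=lzma.CHECK_CRC64)
print(sample[:12] == build_stream_header(lzma.CHECK_CRC64))
print(parse_stream_header(sample[:12]))
```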
Block (variable size, zero or more per Stream; each Block represents a compressed segment):
- Block Header (variable size, 8-1024 bytes, multiple of 4):
  - Block Header Size: 1 byte (0x01-0xFF) giving the real header size as (value + 1) * 4 bytes; a 0x00 byte in this position is the Index Indicator instead.
  - Block Flags: 1 byte. Bits 0-1 store the number of filters minus one (1-4 filters), bits 2-5 are reserved and must be zero, bit 6 indicates that Compressed Size is present, and bit 7 indicates that Uncompressed Size is present.
  - Compressed Size: Variable-length integer (VLI, 1-9 bytes), present if bit 6 of Block Flags is set; gives the size of Compressed Data, which must be non-zero.
  - Uncompressed Size: VLI (1-9 bytes), present if bit 7 of Block Flags is set; gives the size after decompression.
  - List of Filter Flags: one entry per filter (up to 4), each consisting of Filter ID (VLI), Size of Properties (VLI), and Filter Properties (that many bytes). Filter IDs include 0x21 (LZMA2), 0x03 (Delta), and 0x04-0x09 for the BCJ executable filters (x86, PowerPC, IA-64, ARM, ARM-Thumb, SPARC).
  - Header Padding: as many null bytes (0x00) as needed to bring the Block Header to exactly the size given by Block Header Size.
  - Block Header CRC32: 4 bytes, unsigned 32-bit little-endian CRC32 of the Block Header excluding this field.
- Compressed Data: Variable bytes, the output of the filter chain (e.g., LZMA2-compressed data); alignment depends on filters.
- Block Padding: 0-3 null bytes (0x00) to align the Block to a multiple of 4 bytes.
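- Check: a checksum of the uncompressed data whose size is set by the Check ID in Stream Flags: 0 bytes for 0x00 (None), 4 bytes for 0x01-0x03 (CRC32 is 0x01), 8 bytes for 0x04-0x06 (CRC64 is 0x04), 16 bytes for 0x07-0x09, 32 bytes for 0x0A-0x0C (SHA-256 is 0x0A), and 64 bytes for 0x0D-0x0F.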
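The Block size rules reduce to a few lines of arithmetic. This sketch uses hypothetical helper names and only restates the specification's tables: the Block Header Size byte, the Check size per Check ID, and the Block Padding that sits between Compressed Data and the Check.

```python
def block_header_size(encoded: int) -> int:
    """Real Block Header size from its first byte; 0x00 there is the Index Indicator."""
    if not 0x01 <= encoded <= 0xFF:
        raise ValueError("not a Block Header Size byte")
    return (encoded + 1) * 4


def check_size(check_id: int) -> int:
    """Check field size: 0 for ID 0x00, then 4, 8, 16, 32, 64 bytes per group of three IDs."""
    if not 0x00 <= check_id <= 0x0F:
        raise ValueError("Check ID is a 4-bit field")
    return 0 if check_id == 0 else 4 << ((check_id - 1) // 3)


def block_layout(header_size: int, compressed_size: int, check_id: int):
    """Return (unpadded_size, block_padding, total_size) for one Block."""
    unpadded = header_size + compressed_size + check_size(check_id)
    padding = -(header_size + compressed_size) % 4  # Block Padding precedes the Check
    return unpadded, padding, unpadded + padding


print([check_size(i) for i in range(16)])
print(block_layout(block_header_size(0x02), 13, 0x04))
```

Because the header and Check sizes are already multiples of four, the padding computed from Header + Compressed Data equals the padding needed to round the Unpadded Size up to a multiple of four.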
Index (variable size, up to 16 GiB, always present):
- Index Indicator: 1 byte with fixed value 0x00.
- Number of Records: VLI (1-9 bytes), indicating the number of Blocks in the Stream.
- List of Records: One per Block, each consisting of Unpadded Size (VLI, size of Block excluding Block Padding) and Uncompressed Size (VLI, size after decompression).
- Index Padding: 0-3 null bytes (0x00) to align the Index to a multiple of 4 bytes.
- Index CRC32: 4 bytes, unsigned 32-bit little-endian CRC32 of the Index excluding this field.
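As a minimal sketch (helper names hypothetical), the Index can be serialized from its records. Comparing the result with the Index of the empty Stream that `lzma.compress(b"")` writes, which has no Blocks, checks the padding and CRC32 handling.

```python
import lzma
import struct
import zlib


def encode_vli(value: int) -> bytes:
    """Encode 0 <= value < 2**63 as an .xz VLI: 7 bits per byte, low bits first."""
    if not 0 <= value < 2**63:
        raise ValueError("VLI out of range")
    out = bytearray()
    while value >= 0x80:
        out.append((value & 0x7F) | 0x80)  # high bit set: more bytes follow
        value >>= 7
    out.append(value)
    return bytes(out)


def build_index(records) -> bytes:
    """Serialize (unpadded_size, uncompressed_size) pairs into an Index field."""
    body = bytearray(b"\x00")  # Index Indicator
    body += encode_vli(len(records))
    for unpadded, uncompressed in records:
        body += encode_vli(unpadded) + encode_vli(uncompressed)
    body += bytes(-len(body) % 4)  # Index Padding up to a multiple of four
    return bytes(body) + struct.pack("<I", zlib.crc32(body))


# An empty Stream has an 8-byte Index right after its 12-byte Stream Header
print(lzma.compress(b"")[12:20] == build_index([]))
```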
Stream Footer (12 bytes total, always present):
- Footer CRC32: 4 bytes, unsigned 32-bit little-endian CRC32 of Backward Size and Stream Flags.
- Backward Size: 4 bytes, unsigned 32-bit little-endian integer indicating Index size as (value + 1) * 4 bytes.
- Stream Flags: 2 bytes, identical to the Stream Header's Stream Flags.
- Footer Magic Bytes: 2 bytes with fixed value 0x59 0x5A ('Y' 'Z').
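The Footer is what makes random access possible: Backward Size points from the end of a Stream back to the start of its Index. This minimal sketch, with illustrative helper names, validates a Footer, uses it to find the Index of a Stream produced by `lzma.compress`, and rebuilds the Footer from its parts.

```python
import lzma
import struct
import zlib


def parse_stream_footer(footer: bytes):
    """Validate a 12-byte Stream Footer and return (index_size, stream_flags)."""
    if len(footer) != 12 or footer[10:12] != b"YZ":
        raise ValueError("not an .xz Stream Footer")
    stored_crc, backward_size = struct.unpack("<II", footer[:8])
    if zlib.crc32(footer[4:10]) != stored_crc:  # covers Backward Size and Stream Flags
        raise ValueError("Footer CRC32 mismatch")
    return (backward_size + 1) * 4, footer[8:10]


def build_stream_footer(index_size: int, stream_flags: bytes) -> bytes:
    """Inverse of parse_stream_footer; index_size must be a multiple of four, at least 8."""
    if index_size < 8 or index_size % 4:
        raise ValueError("invalid Index size")
    body = struct.pack("<I", index_size // 4 - 1) + stream_flags
    return struct.pack("<I", zlib.crc32(body)) + body + b"YZ"


data = lzma.compress(b"hello world")
index_size, flags = parse_stream_footer(data[-12:])
index_start = len(data) - 12 - index_size  # Backward Size leads from the Footer to the Index
print(index_size, flags == data[6:8], data[index_start] == 0x00)
print(build_stream_footer(index_size, flags) == data[-12:])
```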
Stream Padding (optional, multiple of 4 bytes): Zero or more groups of 4 null bytes (0x00) between or after Streams for alignment.
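Each Footer locates its Index, and the Index lists every Block's Unpadded Size. Concatenated Streams can therefore be separated by walking backwards from the end of the file and skipping Stream Padding along the way. This sketch assumes well-formed input and skips all CRC32 verification; `split_streams` and its `(start, end, padding)` tuples are hypothetical names.

```python
import lzma
import struct


def read_vli(buf: bytes, pos: int):
    """Decode one .xz VLI starting at buf[pos]; return (value, next_pos)."""
    value = 0
    for i in range(9):
        byte = buf[pos + i]
        value |= (byte & 0x7F) << (7 * i)
        if not byte & 0x80:
            return value, pos + i + 1
    raise ValueError("VLI longer than 9 bytes")


def split_streams(data: bytes):
    """Return (start, end, padding) for each Stream, walking backwards from the file end."""
    streams = []
    end = len(data)
    while end > 0:
        padding = 0
        while end >= 4 and data[end - 4:end] == bytes(4):  # Stream Padding
            end -= 4
            padding += 4
        if end < 32 or data[end - 2:end] != b"YZ":
            raise ValueError(f"no Stream Footer ends at offset {end}")
        (backward_size,) = struct.unpack("<I", data[end - 8:end - 4])
        index_start = end - 12 - (backward_size + 1) * 4
        if index_start < 12:
            raise ValueError("Backward Size points outside the file")
        count, pos = read_vli(data, index_start + 1)  # skip the Index Indicator
        blocks_size = 0
        for _ in range(count):
            unpadded, pos = read_vli(data, pos)
            _, pos = read_vli(data, pos)  # Uncompressed Size is not needed here
            blocks_size += (unpadded + 3) & ~3  # each Block is padded to a multiple of 4
        start = index_start - blocks_size - 12
        if start < 0 or data[start:start + 6] != b"\xfd7zXZ\x00":
            raise ValueError("Index does not lead back to a Stream Header")
        streams.append((start, end, padding))
        end = start
    return streams[::-1]


first, second = lzma.compress(b"first"), lzma.compress(b"second")
print(split_streams(first + bytes(8) + second + bytes(4)))
```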
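Additional constraints: a VLI holds at most 63 bits (at most 9 bytes), and the total Stream size must stay below 8 EiB; a Block uses at most 4 filters, where LZMA2 may appear only as the last filter while Delta and the BCJ filters may appear only as non-last filters; custom Filter IDs embed a randomly generated 40-bit Developer ID.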
2. Two Direct Download Links for .XZ Files
- https://sourceforge.net/projects/lzmautils/files/xz-5.4.1.tar.xz/download (xz-5.4.1.tar.xz, source code archive compressed with XZ).
- https://sourceforge.net/projects/gwyddion/files/sample-modules/2.6/threshold-example-2.6.tar.xz/download (threshold-example-2.6.tar.xz, sample module archive compressed with XZ).
3. Ghost Blog Embedded HTML JavaScript for Drag-and-Drop .XZ File Property Dump
The following is an HTML snippet with embedded JavaScript suitable for embedding in a Ghost blog post. It creates a drag-and-drop area where a user can drop a .XZ file. The script parses the file, extracts all properties from the list above (handling multiple Streams if present), and displays them in a structured text format on the screen. It does not decompress data but parses and validates the structure per the specification.
4. Python Class for .XZ File Handling
The following Python class opens a .XZ file, decodes its structure, stores the properties, prints them to the console, and writes the file back out. Because Block Headers may omit Compressed Size, it walks the file backwards: each Stream Footer's Backward Size locates the Index, and the Index records give the size of every Block. The write method produces a byte-for-byte copy of the input; the parsed properties are not re-serialized, and the compressed data is neither decompressed nor verified against its Check.
import io
import struct
import zlib

HEADER_MAGIC = b'\xfd7zXZ\x00'
FOOTER_MAGIC = b'YZ'


def check_size(check_id):
    # Size of the Check field: 0 for ID 0x00, then 4/8/16/32/64 bytes per group of three IDs
    return 0 if check_id == 0 else 4 << ((check_id - 1) // 3)


def hexstr(data):
    return ' '.join(f'{b:02x}' for b in data)


class XZFile:
    def __init__(self, filename):
        with open(filename, 'rb') as f:
            self.data = f.read()
        self.original_data = self.data  # unchanged bytes used by write()
        self.properties = self.parse()

    @staticmethod
    def read_vli(buf):
        # Variable-length integer: 7 bits per byte, least significant first, at most 9 bytes
        val = 0
        for i in range(9):
            chunk = buf.read(1)
            if not chunk:
                raise ValueError('Truncated VLI')
            byte = chunk[0]
            val |= (byte & 0x7F) << (7 * i)
            if byte & 0x80 == 0:
                if byte == 0 and i > 0:
                    raise ValueError('Non-minimal VLI')
                return val
        raise ValueError('Invalid VLI')

    def parse(self):
        streams = []
        end = len(self.data)
        while end > 0:
            # Stream Padding: groups of four null bytes after a Stream Footer
            pad = 0
            while end >= 4 and self.data[end - 4:end] == bytes(4):
                end -= 4
                pad += 4
            stream = self.parse_stream(end)
            stream['Stream Padding'] = pad
            streams.append(stream)
            end = stream.pop('_start')
        streams.reverse()  # report Streams in file order
        return streams

    def parse_stream(self, end):
        # Smallest Stream: 12-byte header + 8-byte Index + 12-byte footer
        if end < 32:
            raise ValueError('Truncated Stream')
        footer = self.data[end - 12:end]
        footer_crc, backward_size = struct.unpack('<II', footer[:8])
        if footer[10:12] != FOOTER_MAGIC:
            raise ValueError('Bad Footer Magic')
        if zlib.crc32(footer[4:10]) != footer_crc:
            raise ValueError('Footer CRC32 mismatch')
        index_size = (backward_size + 1) * 4
        index_start = end - 12 - index_size
        if index_start < 12:
            raise ValueError('Backward Size points outside the file')
        index, records = self.parse_index(index_start, index_size)
        # Each Block occupies its Unpadded Size rounded up to a multiple of four
        start = index_start - sum((u + 3) & ~3 for u, _ in records) - 12
        if start < 0:
            raise ValueError('Index describes more data than the file holds')
        header = self.data[start:start + 12]
        flags = header[6:8]
        header_crc = struct.unpack('<I', header[8:12])[0]
        if header[:6] != HEADER_MAGIC:
            raise ValueError('Bad Header Magic')
        if zlib.crc32(flags) != header_crc:
            raise ValueError('Header CRC32 mismatch')
        if footer[8:10] != flags:
            raise ValueError('Stream Flags differ between Header and Footer')
        check_id = flags[1] & 0x0F
        props = {
            'Header Magic': hexstr(header[:6]),
            'Stream Flags': f'0x{flags[0]:02x} 0x{flags[1]:02x} (Check ID: {check_id})',
            'Header CRC32': f'0x{header_crc:08x}',
            'Blocks': [],
        }
        pos = start + 12
        for unpadded, _ in records:
            props['Blocks'].append(self.parse_block(pos, unpadded, check_id))
            pos += (unpadded + 3) & ~3
        props['Index'] = index
        props['Footer CRC32'] = f'0x{footer_crc:08x}'
        props['Backward Size'] = index_size
        props['Footer Flags'] = f'0x{footer[8]:02x} 0x{footer[9]:02x}'
        props['Footer Magic'] = hexstr(footer[10:12])
        props['_start'] = start  # internal: where the previous Stream (if any) ends
        return props

    def parse_index(self, start, size):
        raw = self.data[start:start + size]
        if raw[0] != 0x00:
            raise ValueError('Bad Index Indicator')
        index_crc = struct.unpack('<I', raw[-4:])[0]
        if zlib.crc32(raw[:-4]) != index_crc:
            raise ValueError('Index CRC32 mismatch')
        buf = io.BytesIO(raw[:-4])
        buf.read(1)  # Index Indicator
        count = self.read_vli(buf)
        records = [(self.read_vli(buf), self.read_vli(buf)) for _ in range(count)]
        padding = buf.read()  # whatever remains before the CRC32 is Index Padding
        if len(padding) > 3 or any(padding):
            raise ValueError('Invalid Index Padding')
        index = {
            'Indicator': '0x00',
            'Number of Records': count,
            'Records': [f'Record {i}: Unpadded {u}, Uncompressed {c}'
                        for i, (u, c) in enumerate(records, 1)],
            'Padding': len(padding),
            'CRC32': f'0x{index_crc:08x}',
        }
        return index, records

    def parse_block(self, pos, unpadded, check_id):
        header_size = (self.data[pos] + 1) * 4
        header = self.data[pos:pos + header_size]
        header_crc = struct.unpack('<I', header[-4:])[0]
        if zlib.crc32(header[:-4]) != header_crc:
            raise ValueError('Block Header CRC32 mismatch')
        buf = io.BytesIO(header[:-4])
        buf.read(1)  # Block Header Size
        flags = buf.read(1)[0]
        num_filters = (flags & 0x03) + 1
        props = {
            'Header Size': header_size,
            'Flags': (f'0x{flags:02x} (Filters: {num_filters}, '
                      f'Comp: {bool(flags & 0x40)}, Uncomp: {bool(flags & 0x80)})'),
        }
        if flags & 0x40:
            props['Compressed Size'] = self.read_vli(buf)
        if flags & 0x80:
            props['Uncompressed Size'] = self.read_vli(buf)
        props['Filters'] = []
        for i in range(num_filters):
            filter_id = self.read_vli(buf)
            prop_size = self.read_vli(buf)
            filter_props = buf.read(prop_size)
            props['Filters'].append(f'Filter {i + 1}: ID 0x{filter_id:02x}, '
                                    f'Prop Size {prop_size}, Props: {hexstr(filter_props)}')
        padding = buf.read()  # Header Padding runs up to the CRC32 field
        if any(padding):
            raise ValueError('Non-zero Block Header Padding')
        props['Header Padding'] = len(padding)
        props['Header CRC32'] = f'0x{header_crc:08x}'
        # Unpadded Size = Block Header + Compressed Data + Check
        csize = check_size(check_id)
        comp_size = unpadded - header_size - csize
        if comp_size <= 0 or props.get('Compressed Size', comp_size) != comp_size:
            raise ValueError('Block sizes disagree with the Index')
        block_pad = -unpadded % 4
        check_start = pos + header_size + comp_size + block_pad
        props['Compressed Data'] = f'{comp_size} bytes (skipped)'
        props['Block Padding'] = block_pad
        props['Check'] = hexstr(self.data[check_start:check_start + csize])
        return props

    def print_properties(self):
        for i, stream in enumerate(self.properties, 1):
            print(f'Stream {i}:')
            self._print(stream, 1)

    def _print(self, value, depth):
        # Recursively print nested dicts/lists, two spaces of indentation per level
        indent = '  ' * depth
        if isinstance(value, dict):
            for key, item in value.items():
                if isinstance(item, (dict, list)):
                    print(f'{indent}{key}:')
                    self._print(item, depth + 1)
                else:
                    print(f'{indent}{key}: {item}')
        else:
            for n, item in enumerate(value, 1):
                if isinstance(item, dict):  # only Blocks are stored as lists of dicts
                    print(f'{indent}Block {n}:')
                    self._print(item, depth + 1)
                else:
                    print(f'{indent}{item}')

    def write(self, output_filename):
        # Byte-for-byte copy of the input; the parsed properties are not re-serialized
        with open(output_filename, 'wb') as f:
            f.write(self.original_data)
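A file with a known Check makes a convenient test input for the class above. This sketch assumes a single Stream without Stream Padding, which is what `lzma.compress` writes. It selects the CRC32 check so that the stored value can be compared with `zlib.crc32` of the uncompressed data. Writing `data` to disk gives a small file for `XZFile`.

```python
import lzma
import struct
import zlib

payload = b"hello, xz\n" * 100
data = lzma.compress(payload, check=lzma.CHECK_CRC32)

# Locate the Index from the Stream Footer's Backward Size
(backward_size,) = struct.unpack("<I", data[-8:-4])
index_start = len(data) - 12 - (backward_size + 1) * 4

# The last Block ends with its Check, so the 4-byte little-endian CRC32 sits right before the Index
(stored_check,) = struct.unpack("<I", data[index_start - 4:index_start])
print(stored_check == zlib.crc32(payload))
```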
5. Java Class for .XZ File Handling
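The following Java class opens a .XZ file, decodes its structure, stores the properties, prints them to the console, and writes a byte-for-byte copy of the input to a new file (the parsed properties are not re-serialized). It walks each Stream forward and relies on the optional Compressed Size field in each Block Header to skip the compressed data, so it cannot step over Blocks whose headers omit that field.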
import java.io.*;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
public class XZFile {
private byte[] data;
private ByteBuffer buffer;
private List<Map<String, Object>> properties;
private byte[] originalData;
public XZFile(String filename) throws IOException {
try (FileInputStream fis = new FileInputStream(filename)) {
data = fis.readAllBytes();
}
buffer = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
properties = parse();
originalData = data.clone(); // For write
}
private long readVLI() {
long val = 0;
int shift = 0;
for (int i = 0; i < 9; i++) {
byte b = buffer.get();
val |= ((long) (b & 0x7F)) << shift;
if ((b & 0x80) == 0) return val;
shift += 7;
}
throw new IllegalArgumentException("Invalid VLI");
}
private Map<String, Object> parseStream() {
Map<String, Object> props = new HashMap<>();
// Stream Header
byte[] magic = new byte[6];
buffer.get(magic);
props.put("Header Magic", bytesToHex(magic));
byte[] flags = new byte[2];
buffer.get(flags);
int checkId = flags[1] & 0x0F;
props.put("Stream Flags", String.format("0x%02x 0x%02x (Check ID: %d)", flags[0], flags[1], checkId));
int headerCrc = buffer.getInt();
props.put("Header CRC32", String.format("0x%08x", headerCrc));
List<Map<String, Object>> blocks = new ArrayList<>();
props.put("Blocks", blocks);
while (true) {
int pos = buffer.position();
byte nextByte = buffer.get();
buffer.position(pos);
if (nextByte == 0) break;
blocks.add(parseBlock(checkId));
}
// Index
Map<String, Object> index = new HashMap<>();
props.put("Index", index);
int indexStart = buffer.position();
byte indexInd = buffer.get();
index.put("Indicator", String.format("0x%02x", indexInd));
long numRecords = readVLI();
index.put("Number of Records", numRecords);
List<String> records = new ArrayList<>();
for (long i = 0; i < numRecords; i++) {
long unpadded = readVLI();
long uncomp = readVLI();
records.add(String.format("Record %d: Unpadded %d, Uncompressed %d", i + 1, unpadded, uncomp));
}
index.put("Records", records);
// Index Padding: null bytes up to the next multiple of four, counted from the Index start
// (counting zero bytes instead would swallow a CRC32 whose first byte happens to be 0x00)
int padBytes = (4 - (buffer.position() - indexStart) % 4) % 4;
buffer.position(buffer.position() + padBytes);
index.put("Padding", padBytes);
int indexCrc = buffer.getInt();
index.put("CRC32", String.format("0x%08x", indexCrc));
// Stream Footer
int footerCrc = buffer.getInt();
props.put("Footer CRC32", String.format("0x%08x", footerCrc));
// Backward Size is unsigned; widen before the arithmetic to avoid int overflow
long backwardSize = Integer.toUnsignedLong(buffer.getInt());
props.put("Backward Size", (backwardSize + 1) * 4);
byte[] footerFlags = new byte[2];
buffer.get(footerFlags);
props.put("Footer Flags", String.format("0x%02x 0x%02x", footerFlags[0], footerFlags[1]));
byte[] footerMagic = new byte[2];
buffer.get(footerMagic);
props.put("Footer Magic", bytesToHex(footerMagic));
// Stream Padding: peek with an absolute get so the next Stream's first byte is not consumed
int padCount = 0;
while (buffer.hasRemaining() && buffer.get(buffer.position()) == 0) {
buffer.get();
padCount++;
}
if (padCount % 4 != 0) throw new IllegalArgumentException("Invalid Stream Padding");
props.put("Stream Padding", padCount);
return props;
}
private Map<String, Object> parseBlock(int checkId) {
Map<String, Object> blockProps = new HashMap<>();
int blockStart = buffer.position();
int headerSize = ((buffer.get() & 0xFF) + 1) * 4; // mask: Java bytes are signed
blockProps.put("Header Size", headerSize);
byte blockFlags = buffer.get();
int numFilters = (blockFlags & 0x03) + 1;
boolean hasCompSize = (blockFlags & 0x40) != 0;
boolean hasUncompSize = (blockFlags & 0x80) != 0;
blockProps.put("Flags", String.format("0x%02x (Filters: %d, Comp: %b, Uncomp: %b)", blockFlags, numFilters, hasCompSize, hasUncompSize));
if (!hasCompSize) {
// A forward walk cannot find the end of Compressed Data without this field
throw new IllegalArgumentException("Block Header lacks Compressed Size; locate Blocks via the Index instead");
}
long compSize = readVLI();
blockProps.put("Compressed Size", compSize);
if (hasUncompSize) {
blockProps.put("Uncompressed Size", readVLI());
}
List<String> filters = new ArrayList<>();
for (int i = 0; i < numFilters; i++) {
long filterId = readVLI();
long propSize = readVLI();
byte[] propsBytes = new byte[(int) propSize];
buffer.get(propsBytes);
filters.add(String.format("Filter %d: ID %d, Prop Size %d, Props: %s", i + 1, filterId, propSize, bytesToHex(propsBytes)));
}
blockProps.put("Filters", filters);
// Header Padding fills the Block Header up to its trailing CRC32 field
int crcPos = blockStart + headerSize - 4;
blockProps.put("Header Padding", crcPos - buffer.position());
buffer.position(crcPos);
int blockCrc = buffer.getInt();
blockProps.put("Header CRC32", String.format("0x%08x", blockCrc));
buffer.position(buffer.position() + (int) compSize);
blockProps.put("Compressed Data", compSize + " bytes (skipped)");
// Block Padding aligns Block Header + Compressed Data to a multiple of four
int blockPad = (4 - (buffer.position() - blockStart) % 4) % 4;
buffer.position(buffer.position() + blockPad);
blockProps.put("Block Padding", blockPad);
// Check size: 0 bytes for ID 0x00, then 4, 8, 16, 32, 64 bytes for each group of three IDs
int checkSize = checkId == 0 ? 0 : 4 << ((checkId - 1) / 3);
byte[] checkBytes = new byte[checkSize];
buffer.get(checkBytes);
blockProps.put("Check", bytesToHex(checkBytes));
return blockProps;
}
private List<Map<String, Object>> parse() {
List<Map<String, Object>> propsList = new ArrayList<>();
while (buffer.hasRemaining()) {
propsList.add(parseStream());
}
return propsList;
}
public void printProperties() {
for (int i = 0; i < properties.size(); i++) {
System.out.println("Stream " + (i + 1) + ":");
Map<String, Object> stream = properties.get(i);
for (Map.Entry<String, Object> entry : stream.entrySet()) {
String key = entry.getKey();
Object value = entry.getValue();
if (value instanceof Map) {
System.out.println(" " + key + ":");
@SuppressWarnings("unchecked")
Map<String, Object> subMap = (Map<String, Object>) value;
for (Map.Entry<String, Object> subEntry : subMap.entrySet()) {
if (subEntry.getValue() instanceof List) {
System.out.println(" " + subEntry.getKey() + ":");
@SuppressWarnings("unchecked")
List<String> list = (List<String>) subEntry.getValue();
for (String item : list) {
System.out.println(" " + item);
}
} else {
System.out.println(" " + subEntry.getKey() + ": " + subEntry.getValue());
}
}
} else if (value instanceof List) {
System.out.println(" " + key + ":");
@SuppressWarnings("unchecked")
List<Map<String, Object>> blocks = (List<Map<String, Object>>) value;
for (Map<String, Object> block : blocks) {
for (Map.Entry<String, Object> bEntry : block.entrySet()) {
if (bEntry.getValue() instanceof List) {
System.out.println(" " + bEntry.getKey() + ":");
@SuppressWarnings("unchecked")
List<String> list = (List<String>) bEntry.getValue();
for (String item : list) {
System.out.println(" " + item);
}
} else {
System.out.println(" " + bEntry.getKey() + ": " + bEntry.getValue());
}
}
}
} else {
System.out.println(" " + key + ": " + value);
}
}
}
}
public void write(String outputFilename) throws IOException {
try (FileOutputStream fos = new FileOutputStream(outputFilename)) {
fos.write(originalData);
}
}
private static String bytesToHex(byte[] bytes) {
StringBuilder sb = new StringBuilder();
for (byte b : bytes) {
sb.append(String.format("%02x ", b));
}
return sb.toString().trim();
}
}
6. JavaScript Class for .XZ File Handling
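The following JavaScript class takes the file contents as an ArrayBuffer, decodes its structure, stores the properties, prints them to the console, and returns the original bytes as a Blob for download (the parsed properties are not re-serialized). It walks each Stream forward and relies on the optional Compressed Size field in each Block Header to skip the compressed data, so it cannot step over Blocks whose headers omit that field.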
class XZFile {
constructor(buffer) {
this.view = new DataView(buffer);
this.offset = 0;
this.properties = this.parse();
this.originalBuffer = buffer; // For write
}
readBytes(n) {
const bytes = new Uint8Array(this.view.buffer.slice(this.offset, this.offset + n));
this.offset += n;
return bytes;
}
readUint32() {
const val = this.view.getUint32(this.offset, true);
this.offset += 4;
return val;
}
readVLI() {
let val = 0;
let shift = 0;
for (let i = 0; i < 9; i++) {
const byte = this.view.getUint8(this.offset++);
val += (byte & 0x7F) * 2 ** shift; // arithmetic instead of 32-bit bitwise ops; exact up to 2^53
if ((byte & 0x80) === 0) return val;
shift += 7;
}
throw new Error('Invalid VLI');
}
parseStream() {
const props = {};
const magic = this.readBytes(6);
props['Header Magic'] = Array.from(magic).map(b => b.toString(16).padStart(2, '0')).join(' ');
const flags = this.readBytes(2);
const checkId = flags[1] & 0x0F;
props['Stream Flags'] = `0x${flags[0].toString(16).padStart(2, '0')} 0x${flags[1].toString(16).padStart(2, '0')} (Check ID: ${checkId})`;
const headerCrc = this.readUint32();
props['Header CRC32'] = `0x${headerCrc.toString(16).padStart(8, '0')}`;
props['Blocks'] = [];
while (true) {
const nextByte = this.view.getUint8(this.offset);
if (nextByte === 0) break;
props['Blocks'].push(this.parseBlock(checkId));
}
props['Index'] = {};
const indexStart = this.offset;
const indexInd = this.readBytes(1)[0];
props['Index']['Indicator'] = `0x${indexInd.toString(16).padStart(2, '0')}`;
const numRecords = this.readVLI();
props['Index']['Number of Records'] = numRecords;
props['Index']['Records'] = [];
for (let i = 0; i < numRecords; i++) {
const unpadded = this.readVLI();
const uncomp = this.readVLI();
props['Index']['Records'].push(`Record ${i + 1}: Unpadded ${unpadded}, Uncompressed ${uncomp}`);
}
// Index Padding: null bytes up to the next multiple of four, counted from the Index start
const padBytes = (4 - ((this.offset - indexStart) % 4)) % 4;
this.offset += padBytes;
props['Index']['Padding'] = padBytes;
const indexCrc = this.readUint32();
props['Index']['CRC32'] = `0x${indexCrc.toString(16).padStart(8, '0')}`;
const footerCrc = this.readUint32();
props['Footer CRC32'] = `0x${footerCrc.toString(16).padStart(8, '0')}`;
const backwardSize = this.readUint32();
props['Backward Size'] = (backwardSize + 1) * 4;
const footerFlags = this.readBytes(2);
props['Footer Flags'] = `0x${footerFlags[0].toString(16).padStart(2, '0')} 0x${footerFlags[1].toString(16).padStart(2, '0')}`;
const footerMagic = this.readBytes(2);
props['Footer Magic'] = Array.from(footerMagic).map(b => b.toString(16).padStart(2, '0')).join(' ');
// Stream Padding: peek before advancing so the next Stream's first byte is not consumed
let padCount = 0;
while (this.offset < this.view.byteLength && this.view.getUint8(this.offset) === 0) {
this.offset++;
padCount++;
}
if (padCount % 4 !== 0) throw new Error('Invalid Stream Padding');
props['Stream Padding'] = padCount;
return props;
}
parseBlock(checkId) {
const blockProps = {};
const blockStart = this.offset;
const headerSizeEnc = this.readBytes(1)[0];
const headerSize = (headerSizeEnc + 1) * 4;
blockProps['Header Size'] = headerSize;
const blockFlags = this.readBytes(1)[0];
const numFilters = (blockFlags & 0x03) + 1;
const hasCompSize = (blockFlags & 0x40) !== 0;
const hasUncompSize = (blockFlags & 0x80) !== 0;
blockProps['Flags'] = `0x${blockFlags.toString(16).padStart(2, '0')} (Filters: ${numFilters}, Comp: ${hasCompSize}, Uncomp: ${hasUncompSize})`;
if (!hasCompSize) {
// A forward walk cannot find the end of Compressed Data without this field
throw new Error('Block Header lacks Compressed Size; locate Blocks via the Index instead');
}
const compSize = this.readVLI();
blockProps['Compressed Size'] = compSize;
if (hasUncompSize) {
blockProps['Uncompressed Size'] = this.readVLI();
}
blockProps['Filters'] = [];
for (let i = 0; i < numFilters; i++) {
const filterId = this.readVLI();
const propSize = this.readVLI();
const propsBytes = this.readBytes(propSize);
blockProps['Filters'].push(`Filter ${i + 1}: ID ${filterId}, Prop Size ${propSize}, Props: ${Array.from(propsBytes).map(b => b.toString(16).padStart(2, '0')).join(' ')}`);
}
// Header Padding fills the Block Header up to its trailing CRC32 field
const crcPos = blockStart + headerSize - 4;
blockProps['Header Padding'] = crcPos - this.offset;
this.offset = crcPos;
const blockCrc = this.readUint32();
blockProps['Header CRC32'] = `0x${blockCrc.toString(16).padStart(8, '0')}`;
this.offset += compSize;
blockProps['Compressed Data'] = `${compSize} bytes (skipped)`;
// Block Padding aligns Block Header + Compressed Data to a multiple of four
const blockPad = (4 - ((this.offset - blockStart) % 4)) % 4;
this.offset += blockPad;
blockProps['Block Padding'] = blockPad;
// Check size: 0 bytes for ID 0x00, then 4, 8, 16, 32, 64 bytes for each group of three IDs
const checkSize = checkId === 0 ? 0 : 4 << Math.floor((checkId - 1) / 3);
const checkBytes = this.readBytes(checkSize);
blockProps['Check'] = Array.from(checkBytes).map(b => b.toString(16).padStart(2, '0')).join(' ');
return blockProps;
}
parse() {
const propsList = [];
while (this.offset < this.view.byteLength) {
propsList.push(this.parseStream());
}
return propsList;
}
printProperties() {
this.properties.forEach((stream, i) => {
console.log(`Stream ${i + 1}:`);
Object.entries(stream).forEach(([key, value]) => {
if (typeof value === 'object' && value !== null) {
console.log(` ${key}:`);
if (Array.isArray(value)) {
value.forEach((item, j) => {
if (typeof item === 'object') {
Object.entries(item).forEach(([bKey, bValue]) => {
if (Array.isArray(bValue)) {
console.log(` ${bKey}:`);
bValue.forEach(f => console.log(` ${f}`));
} else {
console.log(` ${bKey}: ${bValue}`);
}
});
} else {
console.log(` ${item}`);
}
});
} else {
Object.entries(value).forEach(([subKey, subValue]) => {
if (Array.isArray(subValue)) {
console.log(` ${subKey}:`);
subValue.forEach(r => console.log(` ${r}`));
} else {
console.log(` ${subKey}: ${subValue}`);
}
});
}
} else {
console.log(` ${key}: ${value}`);
}
});
});
}
write() {
// Return a Blob for download
return new Blob([this.originalBuffer], { type: 'application/x-xz' });
}
}
7. C "Class" for .XZ File Handling
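Since C does not support classes natively, the following implementation uses a struct with associated functions to mimic a class. It opens a .XZ file, decodes its structure and prints the properties to the console as it goes (rather than storing them), and writes a byte-for-byte copy of the input to a new file. It walks each Stream forward and relies on the optional Compressed Size field in each Block Header to skip the compressed data, so it cannot step over Blocks whose headers omit that field.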
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>
#include <string.h>
#include <stdbool.h>
typedef struct {
uint8_t *data;
size_t size;
size_t offset;
// Properties would be stored in a dynamic structure, but for simplicity, we print directly
uint8_t *original_data;
size_t original_size;
} XZParser;
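XZParser* xzparser_new(const char* filename) {
// Open the file first so a failed open does not leak the parser struct
FILE* f = fopen(filename, "rb");
if (!f) return NULL;
XZParser* parser = malloc(sizeof(XZParser));
if (!parser) { fclose(f); return NULL; }
fseek(f, 0, SEEK_END);
long len = ftell(f);
fseek(f, 0, SEEK_SET);
if (len <= 0) { fclose(f); free(parser); return NULL; }
parser->size = (size_t)len;
parser->data = malloc(parser->size);
parser->original_data = malloc(parser->size);
if (!parser->data || !parser->original_data || fread(parser->data, 1, parser->size, f) != parser->size) {
fclose(f);
free(parser->data);
free(parser->original_data);
free(parser);
return NULL;
}
fclose(f);
parser->offset = 0;
memcpy(parser->original_data, parser->data, parser->size);
parser->original_size = parser->size;
return parser;
}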
void xzparser_free(XZParser* parser) {
free(parser->data);
free(parser->original_data);
free(parser);
}
uint64_t read_vli(XZParser* parser) {
uint64_t val = 0;
int shift = 0;
for (int i = 0; i < 9; i++) {
uint8_t byte = parser->data[parser->offset++];
val |= ((uint64_t)(byte & 0x7F)) << shift;
if ((byte & 0x80) == 0) return val;
shift += 7;
}
fprintf(stderr, "Invalid VLI\n");
exit(1);
}
void read_bytes(XZParser* parser, uint8_t* buf, size_t n) {
memcpy(buf, parser->data + parser->offset, n);
parser->offset += n;
}
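uint32_t read_uint32(XZParser* parser) {
// Assemble the little-endian bytes explicitly so the result is correct on any host
const uint8_t* p = parser->data + parser->offset;
uint32_t val = (uint32_t)p[0] | ((uint32_t)p[1] << 8) | ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
parser->offset += 4;
return val;
}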
void parse_block(XZParser* parser, int check_id) {
printf("Block:\n");
size_t block_start = parser->offset;
uint8_t header_size_enc;
read_bytes(parser, &header_size_enc, 1);
size_t header_size = ((size_t)header_size_enc + 1) * 4;
printf(" Header Size: %zu bytes\n", header_size);
uint8_t block_flags;
read_bytes(parser, &block_flags, 1);
int num_filters = (block_flags & 0x03) + 1;
bool has_comp_size = (block_flags & 0x40) != 0;
bool has_uncomp_size = (block_flags & 0x80) != 0;
printf(" Flags: 0x%02x (Filters: %d, Comp: %d, Uncomp: %d)\n", block_flags, num_filters, has_comp_size, has_uncomp_size);
if (!has_comp_size) {
// A forward walk cannot find the end of Compressed Data without this field
fprintf(stderr, "Block Header lacks Compressed Size; locate Blocks via the Index instead\n");
exit(1);
}
uint64_t comp_size = read_vli(parser);
printf(" Compressed Size: %llu\n", (unsigned long long)comp_size);
if (has_uncomp_size) {
uint64_t uncomp_size = read_vli(parser);
printf(" Uncompressed Size: %llu\n", (unsigned long long)uncomp_size);
}
printf(" Filters:\n");
for (int i = 0; i < num_filters; i++) {
uint64_t filter_id = read_vli(parser);
uint64_t prop_size = read_vli(parser);
// Print the properties in place instead of copying them to a heap buffer
const uint8_t* props = parser->data + parser->offset;
parser->offset += prop_size;
printf(" Filter %d: ID %llu, Prop Size %llu, Props: ", i + 1, (unsigned long long)filter_id, (unsigned long long)prop_size);
for (size_t j = 0; j < prop_size; j++) {
printf("%02x ", props[j]);
}
printf("\n");
}
// Header Padding fills the Block Header up to its trailing CRC32 field
size_t crc_pos = block_start + header_size - 4;
printf(" Header Padding: %zu bytes\n", crc_pos - parser->offset);
parser->offset = crc_pos;
uint32_t block_crc = read_uint32(parser);
printf(" Header CRC32: 0x%08x\n", block_crc);
parser->offset += comp_size;
printf(" Compressed Data: %llu bytes (skipped)\n", (unsigned long long)comp_size);
// Block Padding aligns Block Header + Compressed Data to a multiple of four
size_t block_pad = (4 - (parser->offset - block_start) % 4) % 4;
parser->offset += block_pad;
printf(" Block Padding: %zu bytes\n", block_pad);
// Check size: 0 bytes for ID 0x00, then 4, 8, 16, 32, 64 bytes for each group of three IDs
int check_size = check_id == 0 ? 0 : 4 << ((check_id - 1) / 3);
uint8_t check_bytes[64]; // 64 bytes is the largest Check size
read_bytes(parser, check_bytes, (size_t)check_size);
printf(" Check: ");
for (int j = 0; j < check_size; j++) {
printf("%02x ", check_bytes[j]);
}
printf("\n");
}
void parse_stream(XZParser* parser) {
uint8_t magic[6];
read_bytes(parser, magic, 6);
printf("Stream Header Magic: ");
for (int i = 0; i < 6; i++) printf("%02x ", magic[i]);
printf("\n");
uint8_t flags[2];
read_bytes(parser, flags, 2);
int check_id = flags[1] & 0x0F;
printf("Stream Flags: 0x%02x 0x%02x (Check ID: %d)\n", flags[0], flags[1], check_id);
uint32_t header_crc = read_uint32(parser);
printf("Header CRC32: 0x%08x\n", header_crc);
printf("Blocks:\n");
while (true) {
if (parser->data[parser->offset] == 0) break;
parse_block(parser, check_id);
}
printf("Index:\n");
size_t index_start = parser->offset;
uint8_t index_ind;
read_bytes(parser, &index_ind, 1);
printf(" Indicator: 0x%02x\n", index_ind);
uint64_t num_records = read_vli(parser);
printf(" Number of Records: %llu\n", (unsigned long long)num_records);
for (uint64_t i = 0; i < num_records; i++) {
uint64_t unpadded = read_vli(parser);
uint64_t uncomp = read_vli(parser);
printf(" Record %llu: Unpadded %llu, Uncompressed %llu\n", (unsigned long long)i + 1, (unsigned long long)unpadded, (unsigned long long)uncomp);
}
// Index Padding: null bytes up to the next multiple of four, counted from the Index start
size_t pad_bytes = (4 - (parser->offset - index_start) % 4) % 4;
parser->offset += pad_bytes;
printf(" Index Padding: %zu bytes\n", pad_bytes);
uint32_t index_crc = read_uint32(parser);
printf(" Index CRC32: 0x%08x\n", index_crc);
uint32_t footer_crc = read_uint32(parser);
printf("Footer CRC32: 0x%08x\n", footer_crc);
uint32_t backward_size = read_uint32(parser);
// Widen before the arithmetic so (backward_size + 1) * 4 cannot overflow 32 bits
printf("Backward Size: %llu\n", ((unsigned long long)backward_size + 1) * 4);
uint8_t footer_flags[2];
read_bytes(parser, footer_flags, 2);
printf("Footer Flags: 0x%02x 0x%02x\n", footer_flags[0], footer_flags[1]);
uint8_t footer_magic[2];
read_bytes(parser, footer_magic, 2);
printf("Footer Magic: %02x %02x\n", footer_magic[0], footer_magic[1]);
// Stream Padding: peek before advancing so the next Stream's first byte is not consumed
size_t pad_count = 0;
while (parser->offset < parser->size && parser->data[parser->offset] == 0) {
parser->offset++;
pad_count++;
}
if (pad_count % 4 != 0) {
fprintf(stderr, "Invalid Stream Padding\n");
exit(1);
}
printf("Stream Padding: %zu bytes\n", pad_count);
}
void xzparser_parse_and_print(XZParser* parser) {
int stream_num = 1;
while (parser->offset < parser->size) {
printf("Stream %d:\n", stream_num++);
parse_stream(parser);
}
}
void xzparser_write(XZParser* parser, const char* output_filename) {
FILE* f = fopen(output_filename, "wb");
if (!f) return;
fwrite(parser->original_data, 1, parser->original_size, f);
fclose(f);
}