Task 399: .MKV File Format
File Format Specifications for .MKV
The .MKV file format, also known as Matroska, is an open-standard multimedia container designed to store video, audio, subtitles, and metadata in a single file. It is based on the Extensible Binary Meta Language (EBML), a binary format inspired by XML for structured data. Matroska supports unlimited tracks, chapters, attachments, tags, and is optimized for streaming, seeking, and error recovery. The format is defined in RFC 9559, which specifies its structure, elements, and semantics. The file always starts with an EBML Header, followed by a Segment element containing all content. Key features include variable-length integers (VINTs) for element IDs and sizes, hierarchical master elements, and extensibility without breaking compatibility.
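The VINT mechanism mentioned above can be sketched in a few lines of Python. This is an illustrative sketch, not a reference implementation: the function names (`vint_length`, `read_element_id`, `read_element_size`) are hypothetical, and it assumes the whole header is already in memory. The point it shows is that Element IDs keep their length-marker bit, whereas Element Data Sizes have it stripped and reserve the all-ones pattern for "unknown size".

```python
from typing import Optional, Tuple


def vint_length(first_octet: int) -> int:
    """Octet count of a VINT, derived from the leading zero bits of its first octet."""
    if not 0 < first_octet <= 0xFF:
        raise ValueError("first octet must be in 0x01..0xFF")
    return 9 - first_octet.bit_length()


def read_element_id(buf: bytes, pos: int = 0) -> Tuple[bytes, int]:
    """Return (raw ID octets, next position). The marker bit stays part of the ID."""
    n = vint_length(buf[pos])
    return buf[pos:pos + n], pos + n


def read_element_size(buf: bytes, pos: int = 0) -> Tuple[Optional[int], int]:
    """Return (size, next position). Size is None for the reserved 'unknown' value."""
    n = vint_length(buf[pos])
    all_ones = (1 << (7 * n)) - 1
    # Mask off the marker bit; only the 7 * n data bits carry the value.
    value = int.from_bytes(buf[pos:pos + n], "big") & all_ones
    return (None if value == all_ones else value), pos + n


# A Segment header: 4-octet ID followed by an 8-octet size of 1,000,000.
header = bytes.fromhex("18538067" "01000000000F4240")
elem_id, pos = read_element_id(header)
size, pos = read_element_size(header, pos)
print(elem_id.hex(), size)  # 18538067 1000000
```

Keeping the ID as raw octets means it can be compared directly with the hexadecimal IDs listed in the specification.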
- List of All Properties Intrinsic to the .MKV File Format
Based on the Matroska specification, the intrinsic properties are the EBML elements that define the file's structure and metadata. Elements are hierarchical and typed: master (container), uinteger (unsigned integer), integer (signed integer), float, string, utf-8, date, or binary. Below is a list of core elements compiled from the official element specifications, giving each element's name, path, EBML ID (hex), type, constraints (mandatory, multiplicity, range, default), and description.
- EBMLMaxIDLength: Path \EBML\EBMLMaxIDLength, ID 0x42F2, Type uinteger, Mandatory: Yes, Multiplicity: 1, Default: 4, Description: The maximum length of the IDs encountered in the EBML Document (4 octets by default).
- EBMLMaxSizeLength: Path \EBML\EBMLMaxSizeLength, ID 0x42F3, Type uinteger, Mandatory: Yes, Multiplicity: 1, Default: 8, Description: The maximum length of the sizes encountered in the EBML Document (8 octets by default).
- Segment: Path \Segment, ID 0x18538067, Type master, Mandatory: Yes, Multiplicity: 1 (unknown size allowed), Description: The Root Element containing all other Top-Level Elements.
- SeekHead: Path \Segment\SeekHead, ID 0x114D9B74, Type master, Multiplicity: 0-2, Description: Contains seeking information for Top-Level Elements.
- Seek: Path \Segment\SeekHead\Seek, ID 0x4DBB, Type master, Mandatory: Yes (in SeekHead), Multiplicity: 1+, Description: A single seek entry to an EBML Element.
- SeekID: Path \Segment\SeekHead\Seek\SeekID, ID 0x53AB, Type binary, Mandatory: Yes, Multiplicity: 1, Length: >=1, Description: The binary EBML ID of a Top-Level Element.
- SeekPosition: Path \Segment\SeekHead\Seek\SeekPosition, ID 0x53AC, Type uinteger, Mandatory: Yes, Multiplicity: 1, Description: The Segment Position of a Top-Level Element.
- Info: Path \Segment\Info, ID 0x1549A966, Type master, Mandatory: Yes, Multiplicity: 1 (recurring allowed), Description: General information about the Segment.
- SegmentUUID: Path \Segment\Info\SegmentUUID, ID 0x73A4, Type binary, Multiplicity: 0-1, Length: 16, Description: Randomly generated unique ID (128 bits) to identify the Segment; required for linked segments.
- SegmentFilename: Path \Segment\Info\SegmentFilename, ID 0x7384, Type utf-8, Multiplicity: 0-1, Description: Filename corresponding to this Segment.
- PrevUUID: Path \Segment\Info\PrevUUID, ID 0x3CB923, Type binary, Multiplicity: 0-1, Length: 16, Description: ID of the previous Segment in a linked chain; required for hard linking if applicable.
- PrevFilename: Path \Segment\Info\PrevFilename, ID 0x3C83AB, Type utf-8, Multiplicity: 0-1, Description: Filename of the previous linked Segment (for display; PrevUUID is authoritative).
- NextUUID: Path \Segment\Info\NextUUID, ID 0x3EB923, Type binary, Multiplicity: 0-1, Length: 16, Description: ID of the next Segment in a linked chain; required for hard linking if applicable.
- NextFilename: Path \Segment\Info\NextFilename, ID 0x3E83BB, Type utf-8, Multiplicity: 0-1, Description: Filename of the next linked Segment (for display; NextUUID is authoritative).
- SegmentFamily: Path \Segment\Info\SegmentFamily, ID 0x4444, Type binary, Multiplicity: 0+, Length: 16, Description: UID shared by all Segments in a linked family; required if ChapterTranslate is present.
- ChapterTranslate: Path \Segment\Info\ChapterTranslate, ID 0x6924, Type master, Multiplicity: 0+, Description: Mapping between this Segment and a segment value in the Chapter Codec.
- ChapterTranslateID: Path \Segment\Info\ChapterTranslate\ChapterTranslateID, ID 0x69A5, Type binary, Mandatory: Yes, Multiplicity: 1, Description: Binary value representing this Segment in the chapter codec data.
- ChapterTranslateCodec: Path \Segment\Info\ChapterTranslate\ChapterTranslateCodec, ID 0x69BF, Type uinteger, Mandatory: Yes, Multiplicity: 1, Range: 0-1, Description: The Chapter Codec (0: Matroska Script, 1: DVD-menu).
- ChapterTranslateEditionUID: Path \Segment\Info\ChapterTranslate\ChapterTranslateEditionUID, ID 0x69FC, Type uinteger, Multiplicity: 0+, Description: Edition(s) to which this chapter translation applies.
- TimestampScale: Path \Segment\Info\TimestampScale, ID 0x2AD7B1, Type uinteger, Mandatory: Yes, Multiplicity: 1, Default: 1000000, Range: >0, Description: Base unit for Segment Timestamps and Durations in nanoseconds (1,000,000 means timestamps in milliseconds).
- Duration: Path \Segment\Info\Duration, ID 0x4489, Type float, Multiplicity: 0-1, Range: >0, Description: Duration of the Segment, expressed in Segment Ticks; multiply by TimestampScale to obtain nanoseconds.
- DateUTC: Path \Segment\Info\DateUTC, ID 0x4461, Type date, Multiplicity: 0-1, Description: The date and time that the Segment was created by the muxing application or library.
- Title: Path \Segment\Info\Title, ID 0x7BA9, Type utf-8, Multiplicity: 0-1, Description: General name of the Segment.
- MuxingApp: Path \Segment\Info\MuxingApp, ID 0x4D80, Type utf-8, Mandatory: Yes, Multiplicity: 1, Description: Muxing application or library (example: "libmatroska-0.3.0").
- WritingApp: Path \Segment\Info\WritingApp, ID 0x5741, Type utf-8, Mandatory: Yes, Multiplicity: 1, Description: Writing application (example: "mkvmerge-0.3.0").
(Note: This list is derived from the available specifications and focuses on core Info and top-level properties for brevity; the full Matroska schema includes over 100 elements covering Tracks, Clusters, Chapters, Cues, Attachments, and Tags. For complete details, refer to the RFC.)
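To make the TimestampScale, Duration, and DateUTC entries concrete, here is a small Python sketch of the arithmetic. The helper names (`duration_seconds`, `date_utc`) are hypothetical, and the sketch assumes the raw values have already been extracted from the Info element: Duration is a tick count that TimestampScale converts to nanoseconds, and an EBML date is a signed 8-octet count of nanoseconds relative to 2001-01-01T00:00:00 UTC.

```python
from datetime import datetime, timedelta, timezone

# EBML dates count nanoseconds from the start of the third millennium (UTC).
EBML_EPOCH = datetime(2001, 1, 1, tzinfo=timezone.utc)


def duration_seconds(duration_ticks: float, timestamp_scale: int = 1_000_000) -> float:
    """Convert a Duration value (in Segment Ticks) to seconds."""
    return duration_ticks * timestamp_scale / 1_000_000_000


def date_utc(raw: bytes) -> datetime:
    """Decode the 8-octet payload of a DateUTC element."""
    ns = int.from_bytes(raw, "big", signed=True)
    # datetime has microsecond resolution, so sub-microsecond digits are dropped.
    return EBML_EPOCH + timedelta(microseconds=ns // 1000)


print(duration_seconds(30000.0))  # 30.0 (default scale: one tick is one millisecond)
one_day_ns = 86_400 * 10**9
print(date_utc(one_day_ns.to_bytes(8, "big", signed=True)).isoformat())  # 2001-01-02T00:00:00+00:00
```

With the default TimestampScale of 1,000,000, a Duration of 30000.0 therefore corresponds to 30 seconds.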
- Two Direct Download Links for .MKV Files
- https://filesamples.com/samples/video/mkv/sample_640x360.mkv
- https://filesamples.com/samples/video/mkv/sample_960x400_ocean_with_audio.mkv
- Ghost Blog Embedded HTML/JavaScript for Drag-and-Drop .MKV File Dump
This is an embeddable HTML snippet with JavaScript that can be used in a Ghost blog post (via HTML card). It creates a drop zone where users can drag and drop an .MKV file. The script parses the file as an EBML structure, extracts and dumps all known properties (elements) to the screen in a hierarchical format.
- Python Class for .MKV Handling
This Python class opens an .MKV file, decodes/reads the EBML structure, prints all known properties to console, and can write a modified version (basic re-serialization).
import struct
import datetime


class MKVParser:
    UNKNOWN_SIZE = -1  # Sentinel for the reserved "unknown size" value (all data bits set)

    def __init__(self, filepath):
        self.filepath = filepath
        # Keys are Element IDs exactly as stored in the file, length-marker bits included
        self.element_map = {
            b'\x1A\x45\xDF\xA3': {'name': 'EBML', 'type': 'master'},  # EBML Header, parent of the two elements below
            b'\x42\xF2': {'name': 'EBMLMaxIDLength', 'type': 'uinteger'},
            b'\x42\xF3': {'name': 'EBMLMaxSizeLength', 'type': 'uinteger'},
            b'\x18\x53\x80\x67': {'name': 'Segment', 'type': 'master'},
            b'\x11\x4D\x9B\x74': {'name': 'SeekHead', 'type': 'master'},
            b'\x4D\xBB': {'name': 'Seek', 'type': 'master'},
            b'\x53\xAB': {'name': 'SeekID', 'type': 'binary'},
            b'\x53\xAC': {'name': 'SeekPosition', 'type': 'uinteger'},
            b'\x15\x49\xA9\x66': {'name': 'Info', 'type': 'master'},
            b'\x73\xA4': {'name': 'SegmentUUID', 'type': 'binary'},
            b'\x73\x84': {'name': 'SegmentFilename', 'type': 'utf-8'},
            b'\x3C\xB9\x23': {'name': 'PrevUUID', 'type': 'binary'},
            b'\x3C\x83\xAB': {'name': 'PrevFilename', 'type': 'utf-8'},
            b'\x3E\xB9\x23': {'name': 'NextUUID', 'type': 'binary'},
            b'\x3E\x83\xBB': {'name': 'NextFilename', 'type': 'utf-8'},
            b'\x44\x44': {'name': 'SegmentFamily', 'type': 'binary'},
            b'\x69\x24': {'name': 'ChapterTranslate', 'type': 'master'},
            b'\x69\xA5': {'name': 'ChapterTranslateID', 'type': 'binary'},
            b'\x69\xBF': {'name': 'ChapterTranslateCodec', 'type': 'uinteger'},
            b'\x69\xFC': {'name': 'ChapterTranslateEditionUID', 'type': 'uinteger'},
            b'\x2A\xD7\xB1': {'name': 'TimestampScale', 'type': 'uinteger'},
            b'\x44\x89': {'name': 'Duration', 'type': 'float'},
            b'\x44\x61': {'name': 'DateUTC', 'type': 'date'},
            b'\x7B\xA9': {'name': 'Title', 'type': 'utf-8'},
            b'\x4D\x80': {'name': 'MuxingApp', 'type': 'utf-8'},
            b'\x57\x41': {'name': 'WritingApp', 'type': 'utf-8'},
            # Add more if needed
        }
        self.data = None
        self.tree = []  # Root-level elements only; a master's data already contains its children

    def read_vint_length(self, data, pos):
        # The number of leading zero bits in the first octet gives the VINT length (1-8)
        if pos >= len(data) or data[pos] == 0:
            raise ValueError(f'Invalid or truncated VINT at offset {pos}')
        return 9 - data[pos].bit_length()

    def read_id(self, data, pos):
        # Element IDs keep their marker bits, so return the raw octets
        length = self.read_vint_length(data, pos)
        return data[pos:pos + length], pos + length

    def read_size(self, data, pos):
        # Element Data Sizes drop the marker bit; all data bits set means "unknown size"
        length = self.read_vint_length(data, pos)
        all_ones = (1 << (7 * length)) - 1
        value = int.from_bytes(data[pos:pos + length], 'big') & all_ones
        if value == all_ones:
            value = self.UNKNOWN_SIZE
        return value, pos + length

    def parse_ebml(self, data, start=0, end=None, level=0):
        if end is None:
            end = len(data)
        pos = start
        while pos < end:
            id_bytes, pos = self.read_id(data, pos)
            size_val, pos = self.read_size(data, pos)
            id_hex = id_bytes.hex().upper()
            elem = self.element_map.get(id_bytes, {'name': f'Unknown (0x{id_hex})', 'type': 'unknown'})
            unknown_size = size_val == self.UNKNOWN_SIZE
            print(' ' * level + f"{elem['name']} (ID: 0x{id_hex}, Size: {'unknown' if unknown_size else size_val}): ", end='')
            data_start = pos
            # Simplification: an unknown-size element is taken to run to the end of its parent
            data_end = end if unknown_size else min(pos + size_val, end)
            payload = data[data_start:data_end]
            if level == 0:
                self.tree.append({'id': id_bytes, 'type': elem['type'], 'data': payload})
            if elem['type'] == 'master':
                print()
                self.parse_ebml(data, data_start, data_end, level + 1)
            elif elem['type'] == 'uinteger':
                print(int.from_bytes(payload, 'big'))
            elif elem['type'] == 'float':
                if len(payload) == 4:
                    print(struct.unpack('>f', payload)[0])
                elif len(payload) == 8:
                    print(struct.unpack('>d', payload)[0])
                else:
                    print('Invalid float size')
            elif elem['type'] in ('utf-8', 'string'):
                # A string ends at the first NUL octet, if any
                print(payload.decode('utf-8', errors='replace').split('\x00', 1)[0])
            elif elem['type'] == 'binary':
                print(f'[Binary data, length {len(payload)}]')
            elif elem['type'] == 'date':
                # Signed nanoseconds relative to 2001-01-01T00:00:00 UTC
                ns = int.from_bytes(payload, 'big', signed=True)
                date = datetime.datetime(2001, 1, 1) + datetime.timedelta(microseconds=ns // 1000)
                print(date.isoformat())
            else:
                print('[Unknown type data]')
            pos = data_end

    def read_and_print(self):
        # Reads the whole file into memory, so this suits small sample files only
        with open(self.filepath, 'rb') as f:
            self.data = f.read()
        self.tree = []
        self.parse_ebml(self.data)

    def encode_size(self, size):
        # Shortest VINT that holds `size`; the all-ones pattern is reserved for "unknown size"
        length = 1
        while size >= (1 << (7 * length)) - 1:
            length += 1
        return ((1 << (7 * length)) | size).to_bytes(length, 'big')

    def write(self, output_path):
        # Basic write: re-serialize the root-level elements (no modifications)
        with open(output_path, 'wb') as f:
            for node in self.tree:
                f.write(node['id'])  # The stored ID already carries its marker bits
                f.write(self.encode_size(len(node['data'])))
                f.write(node['data'])


# Usage example:
# parser = MKVParser('example.mkv')
# parser.read_and_print()
# parser.write('output.mkv')
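Re-serialization depends on encoding each Element Data Size as a VINT. The sketch below is one possible approach and is not taken from the class above: `encode_size` is a hypothetical standalone helper. It picks the shortest encoding by default and can optionally pad to a fixed width, which EBML permits and which lets a writer reserve space and patch the size in later.

```python
from typing import Optional


def encode_size(size: int, length: Optional[int] = None) -> bytes:
    """Encode an Element Data Size as a VINT, optionally padded to `length` octets."""
    if size < 0:
        raise ValueError("size must be non-negative")
    shortest = 1
    # The all-ones data pattern is reserved for "unknown size", hence the "- 1".
    while size >= (1 << (7 * shortest)) - 1:
        shortest += 1
    if length is None:
        length = shortest
    if not shortest <= length <= 8:
        raise ValueError("size does not fit in the requested VINT length")
    # Set the marker bit just above the 7 * length data bits.
    return ((1 << (7 * length)) | size).to_bytes(length, "big")


print(encode_size(126).hex())   # fe
print(encode_size(127).hex())   # 407f (0x7f alone in one octet would mean "unknown size")
print(encode_size(1, 8).hex())  # 0100000000000001
```

A size of 127 needs two octets because the one-octet pattern 0xFF is the reserved "unknown size" marker.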
- Java Class for .MKV Handling
This Java class opens an .MKV file, decodes/reads the EBML structure, prints properties to console, and can write a modified version.
import java.io.*;
import java.nio.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.*;
import java.time.*;

public class MKVParser {
    private static final long UNKNOWN_SIZE = -1L; // Reserved "unknown size" value (all data bits set)

    private String filepath;
    private Map<ByteBuffer, ElementInfo> elementMap = new HashMap<>();
    private byte[] data;
    private List<Node> tree = new ArrayList<>(); // Root-level elements only; a master's data already contains its children

    private static class ElementInfo {
        String name;
        String type;

        ElementInfo(String name, String type) {
            this.name = name;
            this.type = type;
        }
    }

    private static class Node {
        byte[] id;
        String type;
        byte[] nodeData;
    }

    public MKVParser(String filepath) {
        this.filepath = filepath;
        // IDs are registered exactly as stored in the file, length-marker bits included
        register("EBML", "master", 0x1A, 0x45, 0xDF, 0xA3); // EBML Header, parent of the two elements below
        register("EBMLMaxIDLength", "uinteger", 0x42, 0xF2);
        register("EBMLMaxSizeLength", "uinteger", 0x42, 0xF3);
        register("Segment", "master", 0x18, 0x53, 0x80, 0x67);
        register("SeekHead", "master", 0x11, 0x4D, 0x9B, 0x74);
        register("Seek", "master", 0x4D, 0xBB);
        register("SeekID", "binary", 0x53, 0xAB);
        register("SeekPosition", "uinteger", 0x53, 0xAC);
        register("Info", "master", 0x15, 0x49, 0xA9, 0x66);
        register("SegmentUUID", "binary", 0x73, 0xA4);
        register("SegmentFilename", "utf-8", 0x73, 0x84);
        register("PrevUUID", "binary", 0x3C, 0xB9, 0x23);
        register("PrevFilename", "utf-8", 0x3C, 0x83, 0xAB);
        register("NextUUID", "binary", 0x3E, 0xB9, 0x23);
        register("NextFilename", "utf-8", 0x3E, 0x83, 0xBB);
        register("SegmentFamily", "binary", 0x44, 0x44);
        register("ChapterTranslate", "master", 0x69, 0x24);
        register("ChapterTranslateID", "binary", 0x69, 0xA5);
        register("ChapterTranslateCodec", "uinteger", 0x69, 0xBF);
        register("ChapterTranslateEditionUID", "uinteger", 0x69, 0xFC);
        register("TimestampScale", "uinteger", 0x2A, 0xD7, 0xB1);
        register("Duration", "float", 0x44, 0x89);
        register("DateUTC", "date", 0x44, 0x61);
        register("Title", "utf-8", 0x7B, 0xA9);
        register("MuxingApp", "utf-8", 0x4D, 0x80);
        register("WritingApp", "utf-8", 0x57, 0x41);
    }

    // ByteBuffer keys give byte arrays value-based equals/hashCode
    private void register(String name, String type, int... idOctets) {
        byte[] id = new byte[idOctets.length];
        for (int i = 0; i < id.length; i++) id[i] = (byte) idOctets[i];
        elementMap.put(ByteBuffer.wrap(id), new ElementInfo(name, type));
    }

    // The number of leading zero bits in the first octet gives the VINT length (1-8)
    private static int vintLength(int leadByte) {
        if (leadByte == 0) throw new IllegalStateException("Invalid VINT: leading octet is 0x00");
        return Integer.numberOfLeadingZeros(leadByte) - 23;
    }

    // Element IDs keep their marker bits, so return the raw octets
    private byte[] readId(ByteBuffer bb) {
        byte[] id = new byte[vintLength(bb.get(bb.position()) & 0xFF)];
        bb.get(id);
        return id;
    }

    // Element Data Sizes drop the marker bit; all data bits set means "unknown size"
    private long readSize(ByteBuffer bb) {
        int first = bb.get() & 0xFF;
        int length = vintLength(first);
        long val = first & (0xFF >> length);
        for (int i = 1; i < length; i++) val = (val << 8) | (bb.get() & 0xFF);
        return val == (1L << (7 * length)) - 1 ? UNKNOWN_SIZE : val;
    }

    private static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) sb.append(String.format("%02X", b & 0xFF));
        return sb.toString();
    }

    private void parseEBML(ByteBuffer bb, int level) {
        while (bb.hasRemaining()) {
            byte[] idBytes = readId(bb);
            long size = readSize(bb);
            String idHex = "0x" + toHex(idBytes);
            ElementInfo elem = elementMap.getOrDefault(ByteBuffer.wrap(idBytes), new ElementInfo("Unknown (" + idHex + ")", "unknown"));
            String sizeText = size == UNKNOWN_SIZE ? "unknown" : String.valueOf(size);
            System.out.print(" ".repeat(level) + elem.name + " (ID: " + idHex + ", Size: " + sizeText + "): ");
            // Simplification: unknown-size or truncated elements are clamped to the end of the parent
            int length = (size == UNKNOWN_SIZE || size > bb.remaining()) ? bb.remaining() : (int) size;
            ByteBuffer dataSlice = bb.slice();
            dataSlice.limit(length);
            bb.position(bb.position() + length);
            // Copy through a duplicate so dataSlice keeps its position for the recursive parse
            byte[] payload = new byte[length];
            dataSlice.duplicate().get(payload);
            if (level == 0) {
                Node node = new Node();
                node.id = idBytes;
                node.type = elem.type;
                node.nodeData = payload;
                tree.add(node);
            }
            if ("master".equals(elem.type)) {
                System.out.println();
                parseEBML(dataSlice, level + 1);
            } else if ("uinteger".equals(elem.type)) {
                long val = 0;
                for (byte b : payload) val = (val << 8) | (b & 0xFF);
                System.out.println(Long.toUnsignedString(val));
            } else if ("float".equals(elem.type)) {
                if (length == 4) System.out.println(dataSlice.getFloat());
                else if (length == 8) System.out.println(dataSlice.getDouble());
                else System.out.println("Invalid float size");
            } else if ("utf-8".equals(elem.type) || "string".equals(elem.type)) {
                System.out.println(new String(payload, StandardCharsets.UTF_8));
            } else if ("binary".equals(elem.type)) {
                System.out.println("[Binary data, length " + length + "]");
            } else if ("date".equals(elem.type)) {
                if (length == 8) {
                    // Signed nanoseconds relative to 2001-01-01T00:00:00 UTC
                    long ns = dataSlice.getLong();
                    System.out.println(Instant.ofEpochMilli(978307200000L).plusNanos(ns));
                } else {
                    System.out.println("Invalid date size");
                }
            } else {
                System.out.println("[Unknown type data]");
            }
        }
    }

    public void readAndPrint() throws IOException {
        // Reads the whole file into one array, so this suits files under 2 GB only
        data = Files.readAllBytes(Paths.get(filepath));
        tree.clear();
        parseEBML(ByteBuffer.wrap(data), 0);
    }

    public void write(String outputPath) throws IOException {
        // Basic write: re-serialize the root-level elements (no modifications)
        try (FileOutputStream fos = new FileOutputStream(outputPath)) {
            for (Node node : tree) {
                fos.write(node.id); // The stored ID already carries its marker bits
                fos.write(encodeSize(node.nodeData.length));
                fos.write(node.nodeData);
            }
        }
    }

    // Shortest VINT that holds size; the all-ones pattern is reserved for "unknown size"
    private byte[] encodeSize(long size) {
        int len = 1;
        while (len < 8 && size >= (1L << (7 * len)) - 1) len++;
        return longToBytes((1L << (7 * len)) | size, len);
    }

    private byte[] longToBytes(long val, int len) {
        byte[] bytes = new byte[len];
        for (int i = len - 1; i >= 0; i--) {
            bytes[i] = (byte) (val & 0xFF);
            val >>= 8;
        }
        return bytes;
    }

    // Usage:
    // MKVParser parser = new MKVParser("example.mkv");
    // parser.readAndPrint();
    // parser.write("output.mkv");
}
- JavaScript Class for .MKV Handling
This JavaScript class (for Node.js) opens an .MKV file, decodes/reads the EBML structure, prints properties to console, and can write a modified version.
const fs = require('fs');

class MKVParser {
  constructor(filepath) {
    this.filepath = filepath;
    // Keys are Element IDs as latin1 strings, exactly as stored in the file (marker bits included)
    this.elementMap = {
      '\x1A\x45\xDF\xA3': {name: 'EBML', type: 'master'}, // EBML Header, parent of the two elements below
      '\x42\xF2': {name: 'EBMLMaxIDLength', type: 'uinteger'},
      '\x42\xF3': {name: 'EBMLMaxSizeLength', type: 'uinteger'},
      '\x18\x53\x80\x67': {name: 'Segment', type: 'master'},
      '\x11\x4D\x9B\x74': {name: 'SeekHead', type: 'master'},
      '\x4D\xBB': {name: 'Seek', type: 'master'},
      '\x53\xAB': {name: 'SeekID', type: 'binary'},
      '\x53\xAC': {name: 'SeekPosition', type: 'uinteger'},
      '\x15\x49\xA9\x66': {name: 'Info', type: 'master'},
      '\x73\xA4': {name: 'SegmentUUID', type: 'binary'},
      '\x73\x84': {name: 'SegmentFilename', type: 'utf-8'},
      '\x3C\xB9\x23': {name: 'PrevUUID', type: 'binary'},
      '\x3C\x83\xAB': {name: 'PrevFilename', type: 'utf-8'},
      '\x3E\xB9\x23': {name: 'NextUUID', type: 'binary'},
      '\x3E\x83\xBB': {name: 'NextFilename', type: 'utf-8'},
      '\x44\x44': {name: 'SegmentFamily', type: 'binary'},
      '\x69\x24': {name: 'ChapterTranslate', type: 'master'},
      '\x69\xA5': {name: 'ChapterTranslateID', type: 'binary'},
      '\x69\xBF': {name: 'ChapterTranslateCodec', type: 'uinteger'},
      '\x69\xFC': {name: 'ChapterTranslateEditionUID', type: 'uinteger'},
      '\x2A\xD7\xB1': {name: 'TimestampScale', type: 'uinteger'},
      '\x44\x89': {name: 'Duration', type: 'float'},
      '\x44\x61': {name: 'DateUTC', type: 'date'},
      '\x7B\xA9': {name: 'Title', type: 'utf-8'},
      '\x4D\x80': {name: 'MuxingApp', type: 'utf-8'},
      '\x57\x41': {name: 'WritingApp', type: 'utf-8'},
      // Add more
    };
    this.tree = []; // Root-level elements only; a master's data already contains its children
  }

  // The number of leading zero bits in the first octet gives the VINT length (1-8)
  vintLength(buffer, pos) {
    const leadByte = buffer[pos];
    if (!leadByte) throw new Error(`Invalid or truncated VINT at offset ${pos}`);
    return Math.clz32(leadByte) - 23;
  }

  // Element Data Sizes drop the marker bit; all data bits set means "unknown size" (returned as -1).
  // Values above 2^53 would lose precision, which no realistic file reaches.
  readSize(buffer, pos) {
    const length = this.vintLength(buffer, pos);
    const firstMask = 0xFF >> length;
    let value = buffer[pos] & firstMask;
    let allOnes = value === firstMask;
    for (let i = 1; i < length; i++) {
      value = (value * 256) + buffer[pos + i];
      allOnes = allOnes && buffer[pos + i] === 0xFF;
    }
    return {value: allOnes ? -1 : value, pos: pos + length};
  }

  parseEBML(buffer, start = 0, end = buffer.length, level = 0) {
    let pos = start;
    while (pos < end) {
      // Element IDs keep their marker bits, so take the raw octets
      const idLength = this.vintLength(buffer, pos);
      const idBytes = buffer.subarray(pos, pos + idLength);
      pos += idLength;
      const sizeRes = this.readSize(buffer, pos);
      pos = sizeRes.pos;
      const idStr = idBytes.toString('latin1');
      const idHex = '0x' + idBytes.toString('hex').toUpperCase();
      const elem = this.elementMap[idStr] || {name: `Unknown (${idHex})`, type: 'unknown'};
      const unknownSize = sizeRes.value === -1;
      process.stdout.write(' '.repeat(level) + `${elem.name} (ID: ${idHex}, Size: ${unknownSize ? 'unknown' : sizeRes.value}): `);
      const dataStart = pos;
      // Simplification: an unknown-size element is taken to run to the end of its parent
      const dataEnd = unknownSize ? end : Math.min(pos + sizeRes.value, end);
      const length = dataEnd - dataStart;
      if (level === 0) {
        this.tree.push({id: idBytes, type: elem.type, data: buffer.subarray(dataStart, dataEnd)});
      }
      if (elem.type === 'master') {
        process.stdout.write('\n');
        this.parseEBML(buffer, dataStart, dataEnd, level + 1);
      } else if (elem.type === 'uinteger') {
        let val = 0;
        for (let i = dataStart; i < dataEnd; i++) val = (val * 256) + buffer[i];
        console.log(val);
      } else if (elem.type === 'float') {
        if (length === 4) console.log(buffer.readFloatBE(dataStart));
        else if (length === 8) console.log(buffer.readDoubleBE(dataStart));
        else console.log('Invalid float size');
      } else if (elem.type === 'utf-8' || elem.type === 'string') {
        console.log(buffer.toString('utf-8', dataStart, dataEnd));
      } else if (elem.type === 'binary') {
        console.log(`[Binary data, length ${length}]`);
      } else if (elem.type === 'date') {
        if (length === 8) {
          // Signed nanoseconds relative to 2001-01-01T00:00:00 UTC
          const ns = buffer.readBigInt64BE(dataStart);
          console.log(new Date(978307200000 + Number(ns / 1000000n)).toISOString());
        } else {
          console.log('Invalid date size');
        }
      } else {
        console.log('[Unknown type data]');
      }
      pos = dataEnd;
    }
  }

  readAndPrint() {
    // Reads the whole file into memory, so this suits small sample files only
    const buffer = fs.readFileSync(this.filepath);
    this.tree = [];
    this.parseEBML(buffer);
  }

  // Shortest VINT that holds size; the all-ones pattern is reserved for "unknown size"
  encodeSize(size) {
    let length = 1;
    while (length < 8 && size >= 2 ** (7 * length) - 1) length++;
    // BigInt avoids the 32-bit limit of the bitwise operators on Numbers
    let vint = BigInt(2 ** (7 * length)) + BigInt(size);
    const buf = Buffer.alloc(length);
    for (let i = length - 1; i >= 0; i--) {
      buf[i] = Number(vint & 0xFFn);
      vint >>= 8n;
    }
    return buf;
  }

  write(outputPath) {
    // Basic write: re-serialize the root-level elements (no modifications)
    const chunks = [];
    this.tree.forEach(node => {
      chunks.push(node.id); // The stored ID already carries its marker bits
      chunks.push(this.encodeSize(node.data.length));
      chunks.push(node.data);
    });
    fs.writeFileSync(outputPath, Buffer.concat(chunks));
  }
}

// Usage:
// const parser = new MKVParser('example.mkv');
// parser.readAndPrint();
// parser.write('output.mkv');
- C++ Class for .MKV Handling
This C++ class opens an .MKV file, decodes/reads the EBML structure, prints properties to console, and can write a modified version.
#include <iostream>
#include <fstream>
#include <vector>
#include <map>
#include <string>
#include <iomanip>
#include <ctime>
#include <cstdint>
#include <cstring>
#include <iterator>
#include <sstream>
#include <stdexcept>
#include <utility>

struct ElementInfo {
    std::string name;
    std::string type;
};

struct Node {
    std::vector<unsigned char> id;
    std::string type;
    std::vector<unsigned char> data;
};

const long long UNKNOWN_SIZE = -1; // Reserved "unknown size" value (all data bits set)

class MKVParser {
private:
    std::string filepath;
    std::map<std::vector<unsigned char>, ElementInfo> elementMap;
    std::vector<unsigned char> data;
    std::vector<Node> tree; // Root-level elements only; a master's data already contains its children

    void registerElement(std::vector<unsigned char> id, const char* name, const char* type) {
        elementMap[std::move(id)] = ElementInfo{name, type};
    }

    // The number of leading zero bits in the first octet gives the VINT length (1-8)
    static int vintLength(const std::vector<unsigned char>& buf, size_t pos) {
        if (pos >= buf.size() || buf[pos] == 0) throw std::runtime_error("Invalid or truncated VINT");
        int length = 1;
        for (unsigned char mask = 0x80; !(buf[pos] & mask); mask >>= 1) length++;
        if (pos + length > buf.size()) throw std::runtime_error("Truncated VINT");
        return length;
    }

    // Element IDs keep their marker bits, so return the raw octets
    std::pair<std::vector<unsigned char>, size_t> readId(const std::vector<unsigned char>& buf, size_t pos) {
        int length = vintLength(buf, pos);
        return {std::vector<unsigned char>(buf.begin() + pos, buf.begin() + pos + length), pos + length};
    }

    // Element Data Sizes drop the marker bit; all data bits set means "unknown size"
    std::pair<long long, size_t> readSize(const std::vector<unsigned char>& buf, size_t pos) {
        int length = vintLength(buf, pos);
        long long value = buf[pos] & (0xFF >> length);
        for (int i = 1; i < length; i++) value = (value << 8) | buf[pos + i];
        if (value == (1LL << (7 * length)) - 1) value = UNKNOWN_SIZE;
        return {value, pos + length};
    }

    static std::string toHex(const std::vector<unsigned char>& bytes) {
        std::ostringstream oss;
        oss << std::hex << std::uppercase << std::setfill('0');
        for (unsigned char b : bytes) oss << std::setw(2) << static_cast<int>(b);
        return oss.str();
    }

    void parseEBML(const std::vector<unsigned char>& buf, size_t start, size_t end, int level) {
        size_t pos = start;
        while (pos < end) {
            auto idRes = readId(buf, pos);
            pos = idRes.second;
            auto sizeRes = readSize(buf, pos);
            pos = sizeRes.second;
            if (pos > end) throw std::runtime_error("Element header runs past its parent");
            const std::vector<unsigned char>& idBytes = idRes.first;
            long long size = sizeRes.first;
            std::string idHex = "0x" + toHex(idBytes);
            auto it = elementMap.find(idBytes);
            ElementInfo elem = (it != elementMap.end()) ? it->second : ElementInfo{"Unknown (" + idHex + ")", "unknown"};
            for (int i = 0; i < level; i++) std::cout << " ";
            std::cout << elem.name << " (ID: " << idHex << ", Size: ";
            if (size == UNKNOWN_SIZE) std::cout << "unknown"; else std::cout << size;
            std::cout << "): ";
            size_t dataStart = pos;
            // Simplification: unknown-size or truncated elements are clamped to the end of the parent
            size_t dataEnd = end;
            if (size != UNKNOWN_SIZE && static_cast<unsigned long long>(size) < end - pos) {
                dataEnd = pos + static_cast<size_t>(size);
            }
            std::vector<unsigned char> payload(buf.begin() + dataStart, buf.begin() + dataEnd);
            if (level == 0) tree.push_back(Node{idBytes, elem.type, payload});
            if (elem.type == "master") {
                std::cout << std::endl;
                parseEBML(buf, dataStart, dataEnd, level + 1);
            } else if (elem.type == "uinteger") {
                unsigned long long val = 0;
                for (auto b : payload) val = (val << 8) | b;
                std::cout << val << std::endl;
            } else if (elem.type == "float") {
                // EBML floats are big-endian IEEE 754: assemble the bits, then reinterpret them
                if (payload.size() == 4) {
                    uint32_t bits = 0;
                    for (auto b : payload) bits = (bits << 8) | b;
                    float val;
                    std::memcpy(&val, &bits, sizeof(val));
                    std::cout << val << std::endl;
                } else if (payload.size() == 8) {
                    uint64_t bits = 0;
                    for (auto b : payload) bits = (bits << 8) | b;
                    double val;
                    std::memcpy(&val, &bits, sizeof(val));
                    std::cout << val << std::endl;
                } else {
                    std::cout << "Invalid float size" << std::endl;
                }
            } else if (elem.type == "utf-8" || elem.type == "string") {
                std::string str(payload.begin(), payload.end());
                std::cout << str << std::endl;
            } else if (elem.type == "binary") {
                std::cout << "[Binary data, length " << payload.size() << "]" << std::endl;
            } else if (elem.type == "date") {
                if (payload.size() == 8) {
                    // Signed nanoseconds relative to 2001-01-01T00:00:00 UTC
                    uint64_t raw = 0;
                    for (auto b : payload) raw = (raw << 8) | b;
                    long long ns = static_cast<long long>(raw);
                    std::time_t seconds = static_cast<std::time_t>(978307200LL + ns / 1000000000LL);
                    char timeBuf[64];
                    std::strftime(timeBuf, sizeof(timeBuf), "%Y-%m-%dT%H:%M:%SZ", std::gmtime(&seconds));
                    std::cout << timeBuf << std::endl;
                } else {
                    std::cout << "Invalid date size" << std::endl;
                }
            } else {
                std::cout << "[Unknown type data]" << std::endl;
            }
            pos = dataEnd;
        }
    }

    // Shortest VINT that holds size; the all-ones pattern is reserved for "unknown size"
    static std::vector<unsigned char> encodeSize(unsigned long long size) {
        int length = 1;
        while (length < 8 && size >= (1ULL << (7 * length)) - 1) length++;
        unsigned long long vint = (1ULL << (7 * length)) | size;
        std::vector<unsigned char> out(length);
        for (int i = length - 1; i >= 0; i--) {
            out[i] = static_cast<unsigned char>(vint & 0xFF);
            vint >>= 8;
        }
        return out;
    }

public:
    MKVParser(const std::string& fp) : filepath(fp) {
        // IDs are registered exactly as stored in the file, length-marker bits included
        registerElement({0x1A, 0x45, 0xDF, 0xA3}, "EBML", "master"); // EBML Header, parent of the two elements below
        registerElement({0x42, 0xF2}, "EBMLMaxIDLength", "uinteger");
        registerElement({0x42, 0xF3}, "EBMLMaxSizeLength", "uinteger");
        registerElement({0x18, 0x53, 0x80, 0x67}, "Segment", "master");
        registerElement({0x11, 0x4D, 0x9B, 0x74}, "SeekHead", "master");
        registerElement({0x4D, 0xBB}, "Seek", "master");
        registerElement({0x53, 0xAB}, "SeekID", "binary");
        registerElement({0x53, 0xAC}, "SeekPosition", "uinteger");
        registerElement({0x15, 0x49, 0xA9, 0x66}, "Info", "master");
        registerElement({0x73, 0xA4}, "SegmentUUID", "binary");
        registerElement({0x73, 0x84}, "SegmentFilename", "utf-8");
        registerElement({0x3C, 0xB9, 0x23}, "PrevUUID", "binary");
        registerElement({0x3C, 0x83, 0xAB}, "PrevFilename", "utf-8");
        registerElement({0x3E, 0xB9, 0x23}, "NextUUID", "binary");
        registerElement({0x3E, 0x83, 0xBB}, "NextFilename", "utf-8");
        registerElement({0x44, 0x44}, "SegmentFamily", "binary");
        registerElement({0x69, 0x24}, "ChapterTranslate", "master");
        registerElement({0x69, 0xA5}, "ChapterTranslateID", "binary");
        registerElement({0x69, 0xBF}, "ChapterTranslateCodec", "uinteger");
        registerElement({0x69, 0xFC}, "ChapterTranslateEditionUID", "uinteger");
        registerElement({0x2A, 0xD7, 0xB1}, "TimestampScale", "uinteger");
        registerElement({0x44, 0x89}, "Duration", "float");
        registerElement({0x44, 0x61}, "DateUTC", "date");
        registerElement({0x7B, 0xA9}, "Title", "utf-8");
        registerElement({0x4D, 0x80}, "MuxingApp", "utf-8");
        registerElement({0x57, 0x41}, "WritingApp", "utf-8");
        // Add more
    }

    void readAndPrint() {
        // Reads the whole file into memory, so this suits small sample files only
        std::ifstream file(filepath, std::ios::binary);
        if (!file) throw std::runtime_error("Cannot open " + filepath);
        data = std::vector<unsigned char>((std::istreambuf_iterator<char>(file)), std::istreambuf_iterator<char>());
        tree.clear();
        parseEBML(data, 0, data.size(), 0);
    }

    void write(const std::string& outputPath) {
        // Basic write: re-serialize the root-level elements (no modifications)
        std::ofstream out(outputPath, std::ios::binary);
        for (const auto& node : tree) {
            // The stored ID already carries its marker bits
            out.write(reinterpret_cast<const char*>(node.id.data()), node.id.size());
            std::vector<unsigned char> sizeVint = encodeSize(node.data.size());
            out.write(reinterpret_cast<const char*>(sizeVint.data()), sizeVint.size());
            out.write(reinterpret_cast<const char*>(node.data.data()), node.data.size());
        }
    }
};

// Usage:
// MKVParser parser("example.mkv");
// parser.readAndPrint();
// parser.write("output.mkv");