Task 780: .VMDK File Format

File Format Specifications for .VMDK

The .VMDK file format is an open virtual disk format developed by VMware for virtual machines. It supports various types of virtual disks, including monolithic, split, sparse, flat, and device-backed disks. The format consists of a text-based descriptor file (which may be standalone or embedded) and one or more extent files containing the actual disk data. The descriptor defines metadata, extents, and disk properties. Sparse extents have a binary header, while flat extents are raw data. There are also variants for ESX/VMFS environments. The format is documented in VMware's Virtual Disk Development Kit and technical notes like Virtual Disk Format 1.1 and 5.0.

  1. Properties intrinsic to the .VMDK file format:

Descriptor Header Properties:

  • version: The version number of the descriptor (typically 1).
  • CID: Content ID, a 32-bit value for disk integrity and updates.
  • parentCID: Parent Content ID for delta links or snapshots (ffffffff for base disks).
  • createType: Type of virtual disk (e.g., monolithicSparse, twoGbMaxExtentFlat, vmfs, streamOptimized).
  • parentFileNameHint: Path to the parent disk (optional for snapshots).
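
The CID and parentCID fields can be checked mechanically. The following is a minimal sketch, assuming each descriptor has already been parsed into a dictionary of strings. The helper names is_base_disk and link_is_consistent, and all sample values, are illustrative and not part of any VMware tooling.

```python
NO_PARENT_CID = "ffffffff"  # parentCID value used by base disks


def is_base_disk(descriptor: dict) -> bool:
    """Return True when the descriptor carries no parent link."""
    return descriptor.get("parentCID", NO_PARENT_CID).lower() == NO_PARENT_CID


def link_is_consistent(child: dict, parent: dict) -> bool:
    """A child's parentCID is expected to equal the CID stored in its parent."""
    return child["parentCID"].lower() == parent["CID"].lower()


# Hypothetical descriptor values, for illustration only.
base = {"CID": "fffffffe", "parentCID": "ffffffff",
        "createType": "monolithicSparse"}
snapshot = {"CID": "1a2b3c4d", "parentCID": "fffffffe",
            "parentFileNameHint": "base.vmdk"}

print(is_base_disk(base))                  # True
print(is_base_disk(snapshot))              # False
print(link_is_consistent(snapshot, base))  # True
```

A mismatch between the two values usually means the parent was modified after the child was created, so a tool would normally refuse to open the chain rather than guess.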

Extent Description Properties (per extent):

  • Access: Access mode (RW, RDONLY, NOACCESS).
  • SizeInSectors: Size of the extent in 512-byte sectors.
  • Type: Extent type (FLAT, SPARSE, ZERO, VMFS, VMFSSPARSE, VMFSRDM, VMFSRAW).
  • Filename: Path to the extent data file.
  • Offset: Starting offset of the guest data within the extent file, in sectors (specified only for FLAT extents).
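
An extent line is a whitespace-separated record whose filename is double-quoted. As a rough sketch, Python's shlex module can tokenize it so that filenames containing spaces stay intact. The function name parse_extent_line and the sample line are assumptions made for illustration.

```python
import shlex

EXTENT_ACCESS = {"RW", "RDONLY", "NOACCESS"}
SECTOR_SIZE = 512


def parse_extent_line(line: str) -> dict:
    """Parse one extent description line into its fields."""
    # shlex honours the double quotes, so a filename with spaces stays whole.
    fields = shlex.split(line)
    if len(fields) < 3 or fields[0] not in EXTENT_ACCESS:
        raise ValueError(f"not an extent line: {line!r}")
    return {
        "access": fields[0],
        "size_in_sectors": int(fields[1]),
        "size_in_bytes": int(fields[1]) * SECTOR_SIZE,
        "type": fields[2],
        # Some extent types may carry no filename at all.
        "filename": fields[3] if len(fields) > 3 else None,
        "offset": int(fields[4]) if len(fields) > 4 else 0,
    }


# Hypothetical extent line describing a 2 GiB flat extent.
extent = parse_extent_line('RW 4194304 FLAT "my disk-f001.vmdk" 0')
print(extent["filename"])       # my disk-f001.vmdk
print(extent["size_in_bytes"])  # 2147483648
```

One caveat: shlex treats backslashes as escape characters in its default mode, so Windows-style paths would need a different tokenizer.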

Disk Database (DDB) Properties:

  • adapterType: Disk adapter type (ide, buslogic, lsilogic, legacyESX).
  • geometry.cylinders: Number of cylinders in disk geometry.
  • geometry.heads: Number of heads in disk geometry.
  • geometry.sectors: Number of sectors per track in disk geometry.
  • virtualHWVersion: Virtual hardware version.
  • toolsVersion: VMware Tools version (optional).
  • hardwareVersion: Hardware compatibility version (optional).
  • uuid: Disk UUID (optional).
  • longContentID: Long content ID for integrity (optional).
  • thinProvisioned: Flag for thin provisioning (0 or 1, optional).
  • deletable: Flag indicating if deletable by tools (true/false, optional).
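
The three geometry values multiply out to a sector count, which gives a quick sanity check against the extent sizes. The sketch below assumes the DDB entries are held in a dictionary keyed without the ddb. prefix; the sample values are hypothetical.

```python
SECTOR_SIZE = 512


def geometry_sectors(ddb: dict) -> int:
    """Sectors addressable through the cylinder/head/sector geometry."""
    return (int(ddb["geometry.cylinders"])
            * int(ddb["geometry.heads"])
            * int(ddb["geometry.sectors"]))


# Hypothetical DDB entries for a disk whose extents total 8 GiB.
ddb = {
    "adapterType": "lsilogic",
    "geometry.cylinders": "1044",
    "geometry.heads": "255",
    "geometry.sectors": "63",
}
extent_sectors = 16777216  # 8 GiB in 512-byte sectors

chs_sectors = geometry_sectors(ddb)
print(chs_sectors)                   # 16771860
print(chs_sectors * SECTOR_SIZE)     # 8587192320
print(extent_sectors - chs_sectors)  # 5356
```

Because the cylinder count is a whole number, the geometry product can fall slightly short of the real capacity, as it does here. The extent sizes, not the geometry, give the size of the disk.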

Sparse Extent Header Properties (for hosted sparse extents):

  • magicNumber: Magic value 0x564d444b when read as a little-endian 32-bit integer; the four bytes on disk spell 'KDMV'.
  • version: Header version (typically 1 or 2).
  • flags: Bit flags (e.g., bit 0 for newline detection, bit 1 for redundant grain table, bit 16 for compression, bit 17 for markers).
  • capacity: Total disk capacity in sectors.
  • grainSize: Grain size in sectors (power of 2, default 128).
  • descriptorOffset: Offset to embedded descriptor in sectors.
  • descriptorSize: Size of embedded descriptor in sectors.
  • numGTEsPerGT: Number of grain table entries per grain table (typically 512).
  • rgdOffset: Offset to redundant grain directory in sectors.
  • gdOffset: Offset to grain directory in sectors.
  • overHead: Number of sectors used by metadata.
  • uncleanShutdown: Flag for abnormal shutdown (TRUE/FALSE).
  • singleEndLineChar: Single end-of-line character ('\n').
  • nonEndLineChar: Non-end-of-line character (' ').
  • doubleEndLineChar1: First double end-of-line character ('\r').
  • doubleEndLineChar2: Second double end-of-line character ('\n').
  • compressAlgorithm: Compression algorithm (0=none, 1=DEFLATE).
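
The fields above form a packed little-endian structure at the start of the extent. The sketch below unpacks them with one struct format, decodes the flag bits listed above, and works out how many grains and grain tables a disk needs. The sample header is synthetic, its offsets are arbitrary placeholders, and the names parse_header, decode_flags and grain_layout are illustrative.

```python
import struct

SECTOR_SIZE = 512
# Packed little-endian layout of the first 79 bytes of the header sector.
# The four end-of-line characters are read together as one 4-byte string.
HEADER_FORMAT = "<IIIQQQQIQQQB4sH"
HEADER_FIELDS = (
    "magicNumber", "version", "flags", "capacity", "grainSize",
    "descriptorOffset", "descriptorSize", "numGTEsPerGT", "rgdOffset",
    "gdOffset", "overHead", "uncleanShutdown", "endLineChars",
    "compressAlgorithm",
)
FLAG_BITS = {
    0: "valid newline detection",
    1: "redundant grain table",
    16: "compressed grains",
    17: "markers",
}


def parse_header(sector: bytes) -> dict:
    """Unpack the header fields from the first sector of a sparse extent."""
    return dict(zip(HEADER_FIELDS, struct.unpack_from(HEADER_FORMAT, sector, 0)))


def decode_flags(flags: int) -> list:
    """Names of the flag bits that are set."""
    return [name for bit, name in FLAG_BITS.items() if flags & (1 << bit)]


def grain_layout(header: dict) -> dict:
    """Grain and grain-table counts implied by the header (ceiling division)."""
    grains = -(-header["capacity"] // header["grainSize"])
    return {
        "grains": grains,
        "grain_tables": -(-grains // header["numGTEsPerGT"]),
        "bytes_per_grain": header["grainSize"] * SECTOR_SIZE,
        "bytes_per_grain_table":
            header["numGTEsPerGT"] * header["grainSize"] * SECTOR_SIZE,
    }


# Synthetic 8 GiB header; the offset and overhead values are placeholders.
sample = struct.pack(HEADER_FORMAT, 0x564D444B, 1, 3, 16777216, 128,
                     1, 20, 512, 21, 300, 600, 0, b"\n \r\n", 0)
sample = sample.ljust(SECTOR_SIZE, b"\x00")

header = parse_header(sample)
print(sample[:4])                     # b'KDMV'
print(struct.calcsize(HEADER_FORMAT)) # 79
print(decode_flags(header["flags"]))  # ['valid newline detection', 'redundant grain table']
print(grain_layout(header))
```

With the default grain size of 128 sectors and 512 entries per table, each grain covers 64 KiB and each grain table covers 32 MiB, so this 8 GiB disk needs 131072 grains and 256 grain tables.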

VMFS Sparse (COW) Extent Header Properties (for ESX sparse extents):

  • magicNumber: Magic value (0x44574f43 or 'COWD').
  • version: Header version (typically 1).
  • flags: Bit flags (typically 3).
  • numSectors: Total sectors on the base disk.
  • grainSize: Grain granularity in sectors (default 1).
  • gdOffset: Offset to grain directory.
  • numGDEntries: Number of grain directory entries.
  • freeSector: Next free data sector.
  • generation: Disk generation number.
  • name: Disk name (up to COWDISK_MAX_NAME_LEN).
  • description: Disk description (up to COWDISK_MAX_DESC_LEN).
  • savedGeneration: Saved generation number.
  • uncleanShutdown: Flag for abnormal shutdown.
  • root.cylinders / child.parentFileName: Union for root (geometry) or child (parent filename and generation).
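
Since both binary headers begin with a magic number, the first four bytes are enough to tell the extent kinds apart. The following sketch does that and falls back to a simple text heuristic for standalone descriptors. The function name detect_vmdk_kind and the heuristic itself are assumptions, not a documented detection procedure.

```python
import struct


def detect_vmdk_kind(head: bytes) -> str:
    """Classify a .vmdk file from its first bytes (ideally the first sector)."""
    magic = head[:4]
    if magic == b"KDMV":
        return "hosted sparse extent"
    if magic == b"COWD":
        return "ESX sparse (COW) extent"
    # Heuristic only: a text descriptor carries these keys near the top.
    if b"createType" in head or b"version=" in head:
        return "text descriptor"
    return "unknown (possibly a flat extent)"


# The documented magic values, packed little-endian, give the on-disk bytes.
print(struct.pack("<I", 0x564D444B))  # b'KDMV'
print(struct.pack("<I", 0x44574F43))  # b'COWD'

print(detect_vmdk_kind(b"KDMV" + bytes(508)))       # hosted sparse extent
print(detect_vmdk_kind(b"COWD" + bytes(508)))       # ESX sparse (COW) extent
print(detect_vmdk_kind(b"version=1\nCID=fffffffe"))  # text descriptor
print(detect_vmdk_kind(bytes(512)))                 # unknown (possibly a flat extent)
```

Comparing raw bytes avoids the byte-order confusion that comes from comparing an unpacked integer against the wrong constant.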
  1. Python class for .VMDK handling:
import struct

class VMDKHandler:
    def __init__(self, filepath):
        self.filepath = filepath
        self.properties = {}
        self.is_sparse = False
        self.file_content = None

    def read(self):
        with open(self.filepath, 'rb') as f:
            self.file_content = f.read()
        data_view = memoryview(self.file_content)
        # The magic is stored as the bytes 'KDMV', which is 0x564d444b when read little-endian.
        if len(self.file_content) >= 512 and struct.unpack_from('<I', data_view, 0)[0] == 0x564d444b:
            self.is_sparse = True
            self.properties = self.parse_sparse_header(data_view)
            descriptor_offset = self.properties['descriptorOffset'] * 512
            descriptor_size = self.properties['descriptorSize'] * 512
            if descriptor_offset > 0 and descriptor_size > 0:
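                raw = self.file_content[descriptor_offset:descriptor_offset + descriptor_size]
                # The embedded descriptor is NUL-padded to a sector boundary.
                descriptor_text = raw.rstrip(b'\x00').decode('utf-8', errors='ignore')
                descriptor = self.parse_descriptor(descriptor_text)
                # Header and descriptor both define 'version'; keep the header's value
                # and expose the descriptor's as 'descriptorVersion'.
                if 'version' in descriptor:
                    descriptor['descriptorVersion'] = descriptor.pop('version')
                self.properties.update(descriptor)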
        else:
            descriptor_text = self.file_content.decode('utf-8', errors='ignore')
            self.properties = self.parse_descriptor(descriptor_text)

    def parse_sparse_header(self, data_view):
        return {
            'magicNumber': hex(struct.unpack_from('<I', data_view, 0)[0]),
            'version': struct.unpack_from('<I', data_view, 4)[0],
            'flags': struct.unpack_from('<I', data_view, 8)[0],
            'capacity': struct.unpack_from('<Q', data_view, 12)[0],
            'grainSize': struct.unpack_from('<Q', data_view, 20)[0],
            'descriptorOffset': struct.unpack_from('<Q', data_view, 28)[0],
            'descriptorSize': struct.unpack_from('<Q', data_view, 36)[0],
            'numGTEsPerGT': struct.unpack_from('<I', data_view, 44)[0],
            'rgdOffset': struct.unpack_from('<Q', data_view, 48)[0],
            'gdOffset': struct.unpack_from('<Q', data_view, 56)[0],
            'overHead': struct.unpack_from('<Q', data_view, 64)[0],
            'uncleanShutdown': bool(struct.unpack_from('<B', data_view, 72)[0]),
            'singleEndLineChar': chr(struct.unpack_from('<B', data_view, 73)[0]),
            'nonEndLineChar': chr(struct.unpack_from('<B', data_view, 74)[0]),
            'doubleEndLineChar1': chr(struct.unpack_from('<B', data_view, 75)[0]),
            'doubleEndLineChar2': chr(struct.unpack_from('<B', data_view, 76)[0]),
            'compressAlgorithm': struct.unpack_from('<H', data_view, 77)[0]
        }

    def parse_descriptor(self, text):
        lines = [line.strip() for line in text.split('\n') if line.strip() and not line.strip().startswith('#')]
        props = {'extents': [], 'ddb': {}}
        for line in lines:
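            if line.startswith(('RW ', 'RDONLY ', 'NOACCESS ')):
                # Split off the three fixed fields; the remainder holds the quoted
                # filename (which may contain spaces) and an optional offset.
                parts = line.split(None, 3)
                rest = parts[3] if len(parts) > 3 else ''  # some extent types may omit the filename
                if rest.startswith('"'):
                    filename, _, tail = rest[1:].partition('"')
                else:
                    filename, _, tail = rest.partition(' ')
                props['extents'].append({
                    'access': parts[0],
                    'sizeInSectors': int(parts[1]),
                    'type': parts[2],
                    'filename': filename,
                    'offset': int(tail) if tail.strip() else 0
                })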
            elif '=' in line:
                key, value = line.split('=', 1)
                key = key.strip()
                value = value.strip().strip('"')
                if key.startswith('ddb.'):
                    props['ddb'][key[4:]] = value
                else:
                    props[key] = value
        return props

    def print_properties(self):
        import json
        print(json.dumps(self.properties, indent=4))

    def write(self, new_filepath=None):
        if new_filepath is None:
            new_filepath = self.filepath
        with open(new_filepath, 'wb') as f:
            f.write(self.file_content)  # Writes back original; extend for modifications

# Usage example:
# handler = VMDKHandler('example.vmdk')
# handler.read()
# handler.print_properties()
# handler.write('modified.vmdk')
  1. Java class for .VMDK handling:
import java.io.*;
import java.nio.*;
import java.nio.channels.FileChannel;
import java.util.*;

public class VMDKHandler {
    private String filepath;
    private Map<String, Object> properties = new HashMap<>();
    private boolean isSparse = false;
    private byte[] fileContent;

    public VMDKHandler(String filepath) {
        this.filepath = filepath;
    }

    public void read() throws IOException {
        // readAllBytes reads the whole file; a single InputStream.read() call may return early.
        fileContent = java.nio.file.Files.readAllBytes(new File(filepath).toPath());
        ByteBuffer buffer = ByteBuffer.wrap(fileContent).order(ByteOrder.LITTLE_ENDIAN);
        if (fileContent.length >= 512 && buffer.getInt(0) == 0x564d444b) { // bytes 'KDMV' on disk, read little-endian
            isSparse = true;
            parseSparseHeader(buffer);
            long descriptorOffset = (long) properties.get("descriptorOffset") * 512;
            long descriptorSize = (long) properties.get("descriptorSize") * 512;
            if (descriptorOffset > 0 && descriptorSize > 0) {
                String descriptorText = new String(Arrays.copyOfRange(fileContent, (int) descriptorOffset, (int) (descriptorOffset + descriptorSize)), "UTF-8");
                parseDescriptor(descriptorText);
            }
        } else {
            String text = new String(fileContent, "UTF-8");
            parseDescriptor(text);
        }
    }

    private void parseSparseHeader(ByteBuffer buffer) {
        properties.put("magicNumber", Integer.toHexString(buffer.getInt(0)));
        properties.put("version", buffer.getInt(4));
        properties.put("flags", buffer.getInt(8));
        properties.put("capacity", buffer.getLong(12));
        properties.put("grainSize", buffer.getLong(20));
        properties.put("descriptorOffset", buffer.getLong(28));
        properties.put("descriptorSize", buffer.getLong(36));
        properties.put("numGTEsPerGT", buffer.getInt(44));
        properties.put("rgdOffset", buffer.getLong(48));
        properties.put("gdOffset", buffer.getLong(56));
        properties.put("overHead", buffer.getLong(64));
        properties.put("uncleanShutdown", buffer.get(72) == 1);
        properties.put("singleEndLineChar", (char) buffer.get(73));
        properties.put("nonEndLineChar", (char) buffer.get(74));
        properties.put("doubleEndLineChar1", (char) buffer.get(75));
        properties.put("doubleEndLineChar2", (char) buffer.get(76));
        properties.put("compressAlgorithm", buffer.getShort(77) & 0xFFFF);
    }

    private void parseDescriptor(String text) {
        String[] lines = text.split("\n");
        List<Map<String, Object>> extents = new ArrayList<>();
        Map<String, String> ddb = new HashMap<>();
        properties.put("extents", extents);
        properties.put("ddb", ddb);
        for (String line : lines) {
            line = line.trim();
            if (line.isEmpty() || line.startsWith("#")) continue;
            if (line.matches("^(RW|RDONLY|NOACCESS)\\s+\\d+\\s+\\w+\\s+.*")) {
                String[] parts = line.split("\\s+");
                Map<String, Object> extent = new HashMap<>();
                extent.put("access", parts[0]);
                // Sector counts exceed the int range for disks of 1 TiB and larger.
                extent.put("sizeInSectors", Long.parseLong(parts[1]));
                extent.put("type", parts[2]);
                extent.put("filename", parts[3].replace("\"", ""));
                extent.put("offset", parts.length > 4 ? Long.parseLong(parts[4]) : 0L);
                extents.add(extent);
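            } else if (line.contains("=")) {
                // Split on the first '=' only; spacing around it varies between files.
                String[] parts = line.split("=", 2);
                String key = parts[0].trim();
                String value = parts[1].trim().replace("\"", "");
                if (key.startsWith("ddb.")) {
                    ddb.put(key.substring(4), value);
                } else {
                    // The sparse header already stored "version"; keep the descriptor's separately.
                    if (isSparse && key.equals("version")) key = "descriptorVersion";
                    properties.put(key, value);
                }
            }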
        }
    }

    public void printProperties() {
        System.out.println(properties);
    }

    public void write(String newFilepath) throws IOException {
        if (newFilepath == null) newFilepath = filepath;
        try (FileOutputStream fos = new FileOutputStream(newFilepath)) {
            fos.write(fileContent);  // Writes back original; extend for modifications
        }
    }

    // Usage example:
    // public static void main(String[] args) throws IOException {
    //     VMDKHandler handler = new VMDKHandler("example.vmdk");
    //     handler.read();
    //     handler.printProperties();
    //     handler.write("modified.vmdk");
    // }
}
  1. JavaScript class for .VMDK handling (Node.js):
const fs = require('fs');

class VMDKHandler {
    constructor(filepath) {
        this.filepath = filepath;
        this.properties = {};
        this.isSparse = false;
        this.fileContent = null;
    }

    read() {
        this.fileContent = fs.readFileSync(this.filepath);
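        // A Node.js Buffer can be a view into a larger pooled ArrayBuffer, so pass its offset and length.
        const dataView = new DataView(this.fileContent.buffer, this.fileContent.byteOffset, this.fileContent.byteLength);
        if (this.fileContent.length >= 512 && dataView.getUint32(0, true) === 0x564d444b) { // bytes 'KDMV' on disk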
            this.isSparse = true;
            this.parseSparseHeader(dataView);
            const descriptorOffset = this.properties.descriptorOffset * 512;
            const descriptorSize = this.properties.descriptorSize * 512;
            if (descriptorOffset > 0 && descriptorSize > 0) {
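                // The embedded descriptor is NUL-padded to a sector boundary.
                const descriptorText = this.fileContent.slice(descriptorOffset, descriptorOffset + descriptorSize).toString('utf-8').replace(/\0+$/, '');
                const descriptor = this.parseDescriptor(descriptorText);
                // Header and descriptor both define "version"; keep the header's value
                // and expose the descriptor's as "descriptorVersion".
                if ('version' in descriptor) {
                    descriptor.descriptorVersion = descriptor.version;
                    delete descriptor.version;
                }
                Object.assign(this.properties, descriptor);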
            }
        } else {
            const text = this.fileContent.toString('utf-8');
            this.properties = this.parseDescriptor(text);
        }
    }

    parseSparseHeader(dataView) {
        this.properties = {
            magicNumber: dataView.getUint32(0, true).toString(16),
            version: dataView.getUint32(4, true),
            flags: dataView.getUint32(8, true),
            capacity: Number(dataView.getBigUint64(12, true)),
            grainSize: Number(dataView.getBigUint64(20, true)),
            descriptorOffset: Number(dataView.getBigUint64(28, true)),
            descriptorSize: Number(dataView.getBigUint64(36, true)),
            numGTEsPerGT: dataView.getUint32(44, true),
            rgdOffset: Number(dataView.getBigUint64(48, true)),
            gdOffset: Number(dataView.getBigUint64(56, true)),
            overHead: Number(dataView.getBigUint64(64, true)),
            uncleanShutdown: dataView.getUint8(72) === 1,
            singleEndLineChar: String.fromCharCode(dataView.getUint8(73)),
            nonEndLineChar: String.fromCharCode(dataView.getUint8(74)),
            doubleEndLineChar1: String.fromCharCode(dataView.getUint8(75)),
            doubleEndLineChar2: String.fromCharCode(dataView.getUint8(76)),
            compressAlgorithm: dataView.getUint16(77, true)
        };
    }

    parseDescriptor(text) {
        const lines = text.split('\n').map(line => line.trim()).filter(line => line && !line.startsWith('#'));
        const props = { extents: [], ddb: {} };
        lines.forEach(line => {
            if (/^(RW|RDONLY|NOACCESS)/.test(line)) {
                const parts = line.split(/\s+/);
                props.extents.push({
                    access: parts[0],
                    sizeInSectors: parseInt(parts[1]),
                    type: parts[2],
                    filename: parts[3].replace(/"/g, ''),
                    offset: parts[4] ? parseInt(parts[4]) : 0
                });
            } else if (line.includes('=')) {
                // Split on the first "=" only, so spacing and "=" inside values are handled.
                const eq = line.indexOf('=');
                const key = line.slice(0, eq).trim();
                const value = line.slice(eq + 1).trim().replace(/"/g, '');
                if (key.startsWith('ddb.')) {
                    props.ddb[key.slice(4)] = value;
                } else {
                    props[key] = value;
                }
            }
        });
        return props;
    }

    printProperties() {
        console.log(JSON.stringify(this.properties, null, 4));
    }

    write(newFilepath = this.filepath) {
        fs.writeFileSync(newFilepath, this.fileContent);  // Writes back original; extend for modifications
    }
}

// Usage example:
// const handler = new VMDKHandler('example.vmdk');
// handler.read();
// handler.printProperties();
// handler.write('modified.vmdk');
  1. C class (using C++ for class support):
#include <algorithm>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <iomanip>
#include <iostream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

class VMDKHandler {
private:
    std::string filepath;
    std::map<std::string, std::string> properties;
    std::vector<std::map<std::string, std::string>> extents;
    std::map<std::string, std::string> ddb;
    bool is_sparse = false;
    std::vector<char> file_content;

    // Copy a fixed-width field out of the buffer. memcpy avoids dereferencing
    // misaligned pointers (most header fields are not naturally aligned).
    // Assumes a little-endian host, matching the on-disk byte order.
    template <typename T>
    static T read_le(const char* data, size_t offset) {
        T value;
        std::memcpy(&value, data + offset, sizeof(T));
        return value;
    }

    void parse_sparse_header(const char* data) {
        uint32_t magic = read_le<uint32_t>(data, 0);
        if (magic != 0x564d444b) return;  // bytes 'KDMV' on disk
        is_sparse = true;
        properties["magicNumber"] = std::to_string(magic);
        properties["version"] = std::to_string(read_le<uint32_t>(data, 4));
        properties["flags"] = std::to_string(read_le<uint32_t>(data, 8));
        properties["capacity"] = std::to_string(read_le<uint64_t>(data, 12));
        properties["grainSize"] = std::to_string(read_le<uint64_t>(data, 20));
        properties["descriptorOffset"] = std::to_string(read_le<uint64_t>(data, 28));
        properties["descriptorSize"] = std::to_string(read_le<uint64_t>(data, 36));
        properties["numGTEsPerGT"] = std::to_string(read_le<uint32_t>(data, 44));
        properties["rgdOffset"] = std::to_string(read_le<uint64_t>(data, 48));
        properties["gdOffset"] = std::to_string(read_le<uint64_t>(data, 56));
        properties["overHead"] = std::to_string(read_le<uint64_t>(data, 64));
        properties["uncleanShutdown"] = std::to_string(read_le<uint8_t>(data, 72) != 0);
        properties["singleEndLineChar"] = std::string(1, data[73]);
        properties["nonEndLineChar"] = std::string(1, data[74]);
        properties["doubleEndLineChar1"] = std::string(1, data[75]);
        properties["doubleEndLineChar2"] = std::string(1, data[76]);
        properties["compressAlgorithm"] = std::to_string(read_le<uint16_t>(data, 77));
    }

    void parse_descriptor(const std::string& text) {
        // Trimmed from both ends: blanks, the CR of CRLF files, and NUL padding.
        const std::string ws(" \t\r\0", 4);
        std::istringstream iss(text);
        std::string line;
        while (std::getline(iss, line)) {
            line.erase(0, line.find_first_not_of(ws));
            line.erase(line.find_last_not_of(ws) + 1);
            if (line.empty() || line[0] == '#') continue;
            if (line.find("RW ") == 0 || line.find("RDONLY ") == 0 || line.find("NOACCESS ") == 0) {
                std::map<std::string, std::string> extent;
                std::istringstream ext_iss(line);
                ext_iss >> extent["access"] >> extent["sizeInSectors"] >> extent["type"];
                std::string filename;
                ext_iss >> filename;
                filename.erase(std::remove(filename.begin(), filename.end(), '"'), filename.end());
                extent["filename"] = filename;
                std::string offset;
                if (ext_iss >> offset) extent["offset"] = offset;
                else extent["offset"] = "0";
                extents.push_back(extent);
            } else if (line.find('=') != std::string::npos) {
                // Split on the first '='; spacing around it varies between files.
                size_t eq_pos = line.find('=');
                std::string key = line.substr(0, eq_pos);
                key.erase(key.find_last_not_of(" \t") + 1);
                std::string value = line.substr(eq_pos + 1);
                value.erase(0, value.find_first_not_of(" \t"));
                value.erase(std::remove(value.begin(), value.end(), '"'), value.end());
                if (key.compare(0, 4, "ddb.") == 0) {
                    ddb[key.substr(4)] = value;
                } else {
                    // The sparse header already stored "version"; keep the descriptor's separately.
                    if (is_sparse && key == "version") key = "descriptorVersion";
                    properties[key] = value;
                }
            }
        }
    }

public:
    VMDKHandler(const std::string& fp) : filepath(fp) {}

    void read() {
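        std::ifstream file(filepath, std::ios::binary | std::ios::ate);
        if (!file) {
            // tellg() would return -1 on a failed open and break the resize below.
            std::cerr << "Cannot open " << filepath << "\n";
            return;
        }
        std::streamsize size = file.tellg();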
        file.seekg(0, std::ios::beg);
        file_content.resize(size);
        file.read(file_content.data(), size);
        const char* data = file_content.data();
        if (size >= 512) {
            parse_sparse_header(data);
            if (is_sparse) {
                uint64_t desc_offset = std::stoull(properties["descriptorOffset"]) * 512;
                uint64_t desc_size = std::stoull(properties["descriptorSize"]) * 512;
                if (desc_offset > 0 && desc_size > 0 && desc_offset + desc_size <= size) {
                    std::string desc_text(data + desc_offset, desc_size);
                    parse_descriptor(desc_text);
                }
                return;
            }
        }
        std::string text(file_content.begin(), file_content.end());
        parse_descriptor(text);
    }

    void print_properties() {
        std::cout << "{\n";
        for (const auto& p : properties) {
            std::cout << "  \"" << p.first << "\": \"" << p.second << "\",\n";
        }
        std::cout << "  \"extents\": [\n";
        for (const auto& ext : extents) {
            std::cout << "    {\n";
            for (const auto& e : ext) {
                std::cout << "      \"" << e.first << "\": \"" << e.second << "\",\n";
            }
            std::cout << "    },\n";
        }
        std::cout << "  ],\n";
        std::cout << "  \"ddb\": {\n";
        for (const auto& d : ddb) {
            std::cout << "    \"" << d.first << "\": \"" << d.second << "\",\n";
        }
        std::cout << "  }\n";
        std::cout << "}\n";
    }

    void write(const std::string& new_filepath = "") {
        std::string out_path = new_filepath.empty() ? filepath : new_filepath;
        std::ofstream out(out_path, std::ios::binary);
        out.write(file_content.data(), file_content.size());
        // Extend for modifications by updating file_content
    }
};

// Usage example:
// int main() {
//     VMDKHandler handler("example.vmdk");
//     handler.read();
//     handler.print_properties();
//     handler.write("modified.vmdk");
//     return 0;
// }