Task 531: .PEM File Format

File Format Specifications for .PEM

The .PEM (Privacy-Enhanced Mail) file format is a textual encoding standard for binary ASN.1 structures, commonly used for certificates, keys, and related cryptographic data in PKIX, PKCS, and CMS contexts. It is defined in RFC 7468, which specifies a base64-encoded payload wrapped in specific encapsulation boundaries. The format keeps the data printable and suitable for text-based transmission. Key aspects include:

  • Binary data (typically DER-encoded ASN.1) is base64-encoded per RFC 4648.
  • Encapsulation uses ASCII headers and footers with exactly five hyphens on each side.
  • No encapsulated headers are permitted in the strict modern format (unlike legacy RFC 1421 PEM, which allowed them for encryption info).
  • Parsers should handle variations in whitespace and line endings flexibly.
  • Files may contain multiple encapsulated objects.
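
To make the points above concrete, here is a minimal sketch of how a generator might wrap binary data in the textual encoding. The function name pem_encode and the sample payload are assumptions made for this illustration; the payload is a placeholder, not a real certificate.

```python
import base64

def pem_encode(der_bytes: bytes, label: str) -> str:
    """Wrap binary data in BEGIN/END boundaries with 64-character base64 lines."""
    b64 = base64.b64encode(der_bytes).decode("ascii")
    # Generators wrap the base64 text at 64 characters; only the last line may be shorter.
    lines = [b64[i:i + 64] for i in range(0, len(b64), 64)]
    return "\n".join([f"-----BEGIN {label}-----", *lines, f"-----END {label}-----"]) + "\n"

if __name__ == "__main__":
    # Placeholder payload: 100 zero bytes, not a valid ASN.1 structure.
    print(pem_encode(bytes(100), "CERTIFICATE"), end="")
```

The sketch always emits LF line endings and performs no validation of the label or the payload; a real generator would likely check both.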

1. List of All Properties of This File Format Intrinsic to Its File System

The .PEM format is a text-based file format, not tied to specific file system attributes beyond standard file metadata (e.g., size, timestamps, permissions). However, the intrinsic properties defining the format itself (structure, encoding, and rules) are as follows. These are the core characteristics that any .PEM file must adhere to:

  • File Extension: Typically .pem (though .crt, .key, or others may be used contextually).
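  • MIME Type: text/plain generally; application/pkix-cert is sometimes used for certificates, although it is registered for the DER-encoded form (RFC 2585), and application/x-pem-file is a common unregistered alternative.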
  • Encoding: US-ASCII text with base64-encoded binary payload (using ALPHA, DIGIT, +, /, and = for padding).
  • Structure: Consists of one or more encapsulated objects, each with:
      • Pre-encapsulation boundary: "-----BEGIN " + label + "-----" (followed by optional whitespace or EOL).
      • Base64-encoded body: The core data, line-wrapped.
      • Post-encapsulation boundary: "-----END " + label + "-----" (preceded by the body and optional whitespace).
  • Label: A case-sensitive string identifying the object type (e.g., "CERTIFICATE", "PRIVATE KEY"). Must match exactly between BEGIN and END. Standard labels include: CERTIFICATE, X509 CRL, CERTIFICATE REQUEST, PKCS7, CMS, PRIVATE KEY, ENCRYPTED PRIVATE KEY, ATTRIBUTE CERTIFICATE, PUBLIC KEY.
  • Base64 Line Length: Generators wrap at exactly 64 characters per line (except the final line, which may be shorter); parsers may accept other line lengths.
  • Whitespace Handling: Flexible; parsers ignore extraneous whitespace (SP, HT, CR, LF, etc.) outside the base64 body. Generators should minimize it.
  • Multiple Objects Support: Files can contain multiple independent encapsulated sections (e.g., certificate chains).
  • ASN.1 Compliance: Decoded base64 must form a valid BER/DER-encoded ASN.1 structure matching the label (DER preferred for canonical forms).
  • No Headers: Strict format prohibits legacy headers (e.g., Proc-Type, DEK-Info); encrypted data uses dedicated labels like "ENCRYPTED PRIVATE KEY".
  • End-of-Line (EOL): Supports CRLF, CR, or LF.
  • Optional Explanatory Text: Non-normative text may appear before/after boundaries (e.g., certificate details).
  • Security and Parsing Rules: Parsers must be robust to malformations; no canonical textual form exists, so decoding to binary is required for comparisons.

These properties ensure interoperability and text-safety. In practice, some implementations tolerate legacy elements (e.g., headers in private keys), but strict adherence follows RFC 7468.
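
Because no canonical textual form exists, two files that differ in line endings, wrapping, or explanatory text can still carry the same object. The following is a minimal sketch of a lax parser that illustrates this, assuming the strict format without legacy headers. The name pem_decode_all and the sample payload are hypothetical and chosen only for this example.

```python
import base64
import re

# Matches one encapsulated object; (?P=label) requires identical BEGIN and END labels.
_PEM_BLOCK = re.compile(
    r"-----BEGIN (?P<label>[A-Z0-9 ]+)-----(?P<body>.*?)-----END (?P=label)-----",
    re.DOTALL,
)

def pem_decode_all(text: str) -> list[tuple[str, bytes]]:
    """Return (label, decoded bytes) for every encapsulated object in text."""
    objects = []
    for match in _PEM_BLOCK.finditer(text):
        # Drop all whitespace (SP, HT, CR, LF) before decoding, as a lax parser would.
        b64 = "".join(match.group("body").split())
        objects.append((match.group("label"), base64.b64decode(b64, validate=True)))
    return objects

if __name__ == "__main__":
    # Placeholder payload, not real DER.
    payload = base64.b64encode(b"example payload, not real DER").decode("ascii")
    unix_style = f"-----BEGIN PUBLIC KEY-----\n{payload}\n-----END PUBLIC KEY-----\n"
    dos_style = (
        "notes\r\n-----BEGIN PUBLIC KEY-----\r\n"
        f"{payload[:10]}\r\n{payload[10:]}\r\n-----END PUBLIC KEY-----\r\n"
    )
    # The texts differ, but the decoded objects compare equal.
    print(unix_style != dos_style, pem_decode_all(unix_style) == pem_decode_all(dos_style))
```

Comparing the decoded bytes rather than the text is the design choice that matters here; the regular expression itself is only one of several reasonable ways to locate the boundaries.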

Here are two direct links to publicly available .PEM files (sample CA certificate bundles and client certificates):

3. Ghost Blog Embedded HTML JavaScript for Drag-and-Drop .PEM File Dump

This is an embeddable HTML snippet with JavaScript for a Ghost blog (or any HTML context). It creates a drag-and-drop area where users can drop a .PEM file. The script parses the file, extracts sections, and dumps the properties from the list above to the screen (e.g., labels, number of objects, line lengths checked, etc.). It uses FileReader for browser-based processing.

Drag and drop a .PEM file here

4. Python Class for .PEM Handling

This Python class can open a .PEM file, parse sections, decode base64 to binary, write a new .PEM from binary data, and print the properties.

import base64
import re
import os

class PemHandler:
    def __init__(self, filepath=None):
        self.filepath = filepath
        self.sections = []
        self.properties = {}
        if filepath:
            self.read()

    def read(self):
        # newline='' keeps the original line endings so eolType can be detected
        with open(self.filepath, 'r', encoding='utf-8', errors='replace', newline='') as f:
            content = f.read()
        self.parse(content)
        self.print_properties()

    def parse(self, content):
        self.sections = []
        # (?P=label) requires the END label to match the BEGIN label exactly
        pattern = re.compile(r'-----BEGIN (?P<label>[A-Z0-9 ]+)-----(?P<data>.*?)-----END (?P=label)-----', re.DOTALL)
        for match in pattern.finditer(content):
            label = match.group('label')
            lines = [line.strip() for line in match.group('data').splitlines() if line.strip()]
            # Legacy RFC 1421 headers look like "Name: value"; base64 never contains ':'
            headers = [line for line in lines if ':' in line]
            body_lines = [line for line in lines if ':' not in line]
            base64_data = ''.join(body_lines)
            try:
                decoded = base64.b64decode(base64_data)
            except ValueError:  # binascii.Error is a subclass of ValueError
                decoded = b''
            self.sections.append({'label': label, 'headers': headers, 'lines': body_lines, 'base64': base64_data, 'decoded': decoded})

        self.properties = {
            'extension': os.path.splitext(self.filepath)[1] if self.filepath else '.pem',
            # Heuristic only: the file content does not declare a media type
            'mimeType': 'application/pkix-cert' if 'CERTIFICATE' in content else 'text/plain',
            'encoding': 'US-ASCII with Base64',
            'labels': ', '.join([s['label'] for s in self.sections]),
            'numObjects': len(self.sections),
            'multipleObjects': len(self.sections) > 1,
            # Check the lines as they appear in the file, not the joined base64 string
            'lineLengthsValid': all(len(line) <= 64 for s in self.sections for line in s['lines']),
            'whitespaceFlexible': True,
            # Any non-whitespace text left outside the encapsulated objects
            'hasExplanatoryText': bool(pattern.sub('', content).strip()),
            'noHeaders': all(len(s['headers']) == 0 for s in self.sections),
            'asn1Compliant': True,  # Assume; add ASN.1 parser if needed
            'eolType': 'CRLF' if '\r\n' in content else ('CR' if '\r' in content else 'LF')
        }

    def decode(self):
        return [s['decoded'] for s in self.sections]

    def write(self, output_path, data_list, labels):
        if len(data_list) != len(labels):
            raise ValueError("Data and labels must match in length")
        with open(output_path, 'w') as f:
            for data, label in zip(data_list, labels):
                base64_data = base64.b64encode(data).decode('ascii')
                wrapped = '\n'.join([base64_data[i:i+64] for i in range(0, len(base64_data), 64)])
                f.write(f"-----BEGIN {label}-----\n{wrapped}\n-----END {label}-----\n")

    def print_properties(self):
        print("PEM File Properties:")
        for key, value in self.properties.items():
            print(f"{key}: {value}")

# Example usage:
# handler = PemHandler('example.pem')
# decoded = handler.decode()
# handler.write('new.pem', decoded, [s['label'] for s in handler.sections])
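
The class above reports asn1Compliant as True without inspecting the decoded bytes. As a rough illustration of what a first, shallow check could look like, the sketch below verifies only the outermost tag and length. It assumes the outer structure is a SEQUENCE with a definite length, which should hold for the standard labels listed earlier; the function name der_outer_length_ok is hypothetical, and this is not a substitute for a real ASN.1 parser.

```python
def der_outer_length_ok(data: bytes) -> bool:
    """Check that data is one SEQUENCE whose definite length spans the whole buffer."""
    if len(data) < 2 or data[0] != 0x30:  # 0x30 = constructed SEQUENCE tag
        return False
    first = data[1]
    if first < 0x80:
        # Short form: the length fits in this single byte.
        header_len, content_len = 2, first
    else:
        num_len_bytes = first & 0x7F
        # 0x80 alone means BER indefinite length, which DER forbids.
        if num_len_bytes == 0 or len(data) < 2 + num_len_bytes:
            return False
        content_len = int.from_bytes(data[2:2 + num_len_bytes], "big")
        header_len = 2 + num_len_bytes
    return header_len + content_len == len(data)

if __name__ == "__main__":
    # SEQUENCE { INTEGER 5 } encoded by hand for the demonstration.
    print(der_outer_length_ok(bytes([0x30, 0x03, 0x02, 0x01, 0x05])))
```

With the PemHandler class above, the check could be applied as all(der_outer_length_ok(d) for d in handler.decode()). It does not validate nested elements or that the length uses the minimal DER encoding.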

5. Java Class for .PEM Handling

This Java class performs similar operations: open, parse, decode, write, and print properties.

import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PemHandler {
    private String filepath;
    private List<PemSection> sections = new ArrayList<>();
    private java.util.Map<String, Object> properties = new java.util.HashMap<>();

    public PemHandler(String filepath) {
        this.filepath = filepath;
        if (filepath != null) {
            read();
        }
    }

    public void read() {
        try {
            String content = new String(Files.readAllBytes(Paths.get(filepath)), StandardCharsets.US_ASCII);
            parse(content);
            printProperties();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private void parse(String content) {
        sections.clear();
        // \k<label> requires the END label to match the BEGIN label.
        // Java group names may contain only letters and digits, so "label_end" would not compile.
        Pattern pattern = Pattern.compile("-----BEGIN (?<label>[A-Z0-9 ]+)-----(?<data>.*?)-----END \\k<label>-----", Pattern.DOTALL);
        Matcher matcher = pattern.matcher(content);
        boolean lineLengthsValid = true;
        while (matcher.find()) {
            String label = matcher.group("label");
            // Split on CRLF, CR, or LF
            String[] lines = matcher.group("data").split("\\r\\n|\\r|\\n");
            List<String> headers = new ArrayList<>();
            StringBuilder base64Builder = new StringBuilder();
            for (String raw : lines) {
                String line = raw.trim();
                if (line.isEmpty()) {
                    continue;
                }
                if (line.contains(":")) {
                    // Legacy RFC 1421 header; base64 never contains ':'
                    headers.add(line);
                } else {
                    // Check the line as it appears in the file, before joining
                    if (line.length() > 64) {
                        lineLengthsValid = false;
                    }
                    base64Builder.append(line);
                }
            }
            byte[] decoded;
            try {
                decoded = Base64.getDecoder().decode(base64Builder.toString());
            } catch (IllegalArgumentException e) {
                decoded = new byte[0]; // Malformed base64
            }
            sections.add(new PemSection(label, headers, base64Builder.toString(), decoded));
        }

        // Guard against a null path or a name without a dot
        int dot = (filepath == null) ? -1 : filepath.lastIndexOf('.');
        properties.put("extension", dot >= 0 ? filepath.substring(dot) : ".pem");
        properties.put("mimeType", content.contains("CERTIFICATE") ? "application/pkix-cert" : "text/plain");
        properties.put("encoding", "US-ASCII with Base64");
        StringBuilder labelsSb = new StringBuilder();
        for (PemSection s : sections) {
            if (labelsSb.length() > 0) labelsSb.append(", ");
            labelsSb.append(s.label);
        }
        properties.put("labels", labelsSb.toString());
        properties.put("numObjects", sections.size());
        properties.put("multipleObjects", sections.size() > 1);
        properties.put("lineLengthsValid", lineLengthsValid);
        properties.put("whitespaceFlexible", true);
        // Any non-whitespace text left outside the encapsulated objects
        properties.put("hasExplanatoryText", !pattern.matcher(content).replaceAll("").trim().isEmpty());
        boolean noHeaders = true;
        for (PemSection s : sections) {
            if (!s.headers.isEmpty()) {
                noHeaders = false;
                break;
            }
        }
        properties.put("noHeaders", noHeaders);
        properties.put("asn1Compliant", true); // Assume
        String eolType = content.contains("\r\n") ? "CRLF" : (content.contains("\r") ? "CR" : "LF");
        properties.put("eolType", eolType);
    }

    public List<byte[]> decode() {
        List<byte[]> decodedList = new ArrayList<>();
        for (PemSection s : sections) {
            decodedList.add(s.decoded);
        }
        return decodedList;
    }

    public void write(String outputPath, List<byte[]> dataList, List<String> labels) throws IOException {
        if (dataList.size() != labels.size()) {
            throw new IllegalArgumentException("Data and labels must match in length");
        }
        try (BufferedWriter writer = new BufferedWriter(new FileWriter(outputPath))) {
            for (int i = 0; i < dataList.size(); i++) {
                byte[] data = dataList.get(i);
                String label = labels.get(i);
                String base64 = Base64.getEncoder().encodeToString(data);
                StringBuilder wrapped = new StringBuilder();
                for (int j = 0; j < base64.length(); j += 64) {
                    wrapped.append(base64.substring(j, Math.min(j + 64, base64.length()))).append("\n");
                }
                writer.write("-----BEGIN " + label + "-----\n");
                writer.write(wrapped.toString());
                writer.write("-----END " + label + "-----\n");
            }
        }
    }

    public void printProperties() {
        System.out.println("PEM File Properties:");
        for (java.util.Map.Entry<String, Object> entry : properties.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }

    private static class PemSection {
        String label;
        List<String> headers;
        String base64;
        byte[] decoded;

        PemSection(String label, List<String> headers, String base64, byte[] decoded) {
            this.label = label;
            this.headers = headers;
            this.base64 = base64;
            this.decoded = decoded;
        }
    }

    // Example usage:
    // public static void main(String[] args) throws IOException {
    //     PemHandler handler = new PemHandler("example.pem");
    //     List<byte[]> decoded = handler.decode();
    //     handler.write("new.pem", decoded, handler.sections.stream().map(s -> s.label).collect(java.util.stream.Collectors.toList()));
    // }
}

6. JavaScript Class for .PEM Handling

This JavaScript class (Node.js compatible) can read a .PEM file synchronously, parse, decode, write, and print properties to console. Requires fs module.

const fs = require('fs');

class PemHandler {
  constructor(filepath = null) {
    this.filepath = filepath;
    this.sections = [];
    this.properties = {};
    if (filepath) {
      this.read();
    }
  }

  read() {
    const content = fs.readFileSync(this.filepath, 'utf8');
    this.parse(content);
    this.printProperties();
  }

  parse(content) {
    this.sections = [];
    // \k<label> requires the END label to match the BEGIN label exactly
    const pattern = /-----BEGIN (?<label>[A-Z0-9 ]+)-----(?<data>.*?)-----END \k<label>-----/gs;
    let lineLengthsValid = true;
    for (const match of content.matchAll(pattern)) {
      const label = match.groups.label;
      // Split on CRLF, CR, or LF and drop blank lines
      const lines = match.groups.data.split(/\r\n|\r|\n/).map(line => line.trim()).filter(Boolean);
      // Legacy RFC 1421 headers look like "Name: value"; base64 never contains ':'
      const headers = lines.filter(line => line.includes(':'));
      const base64Lines = lines.filter(line => !line.includes(':'));
      // Check the lines as they appear in the file, before joining
      if (base64Lines.some(line => line.length > 64)) lineLengthsValid = false;
      const base64Data = base64Lines.join('');
      const decoded = Buffer.from(base64Data, 'base64');
      this.sections.push({ label, headers, base64: base64Data, decoded });
    }

    this.properties = {
      extension: this.filepath ? require('path').extname(this.filepath) : '.pem',
      mimeType: content.includes('CERTIFICATE') ? 'application/pkix-cert' : 'text/plain',
      encoding: 'US-ASCII with Base64',
      labels: this.sections.map(s => s.label).join(', '),
      numObjects: this.sections.length,
      multipleObjects: this.sections.length > 1,
      lineLengthsValid,
      whitespaceFlexible: true,
      // Any non-whitespace text left outside the encapsulated objects
      hasExplanatoryText: content.replace(pattern, '').trim().length > 0,
      noHeaders: this.sections.every(s => s.headers.length === 0),
      asn1Compliant: true, // Assume
      eolType: content.includes('\r\n') ? 'CRLF' : (content.includes('\r') ? 'CR' : 'LF')
    };
  }

  decode() {
    return this.sections.map(s => s.decoded);
  }

  write(outputPath, dataList, labels) {
    if (dataList.length !== labels.length) {
      throw new Error('Data and labels must match in length');
    }
    let output = '';
    for (let i = 0; i < dataList.length; i++) {
      const data = dataList[i];
      const label = labels[i];
      const base64 = data.toString('base64');
      // match() returns null for an empty string, so fall back to an empty array
      const wrapped = (base64.match(/.{1,64}/g) || []).join('\n');
      output += `-----BEGIN ${label}-----\n${wrapped}\n-----END ${label}-----\n`;
    }
    fs.writeFileSync(outputPath, output);
  }

  printProperties() {
    console.log('PEM File Properties:');
    for (const [key, value] of Object.entries(this.properties)) {
      console.log(`${key}: ${value}`);
    }
  }
}

// Example usage:
// const handler = new PemHandler('example.pem');
// const decoded = handler.decode();
// handler.write('new.pem', decoded, handler.sections.map(s => s.label));

7. C Class for .PEM Handling

Since standard C does not have classes, this is implemented as a C++ class for object-oriented structure. It handles opening, parsing, decoding (with a simple base64 decoder), writing, and printing properties. For base64, a basic implementation is included (no external libs).

#include <iostream>
#include <fstream>
#include <string>
#include <vector>
#include <regex>
#include <map>
#include <algorithm>
#include <iterator>

class PemHandler {
private:
    std::string filepath;
    struct PemSection {
        std::string label;
        std::vector<std::string> headers;
        std::string base64;
        std::vector<char> decoded;
    };
    std::vector<PemSection> sections;
    std::map<std::string, std::string> properties;

    // Simple base64 decode function; characters outside the alphabet are skipped
    std::vector<char> base64Decode(const std::string& input) {
        std::vector<char> output;
        const std::string chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
        // Unsigned accumulator: repeated left shifts wrap instead of overflowing a signed int
        unsigned int val = 0;
        int valb = -8;
        for (char c : input) {
            if (c == '=') break;
            size_t pos = chars.find(c);
            if (pos == std::string::npos) continue;
            val = (val << 6) | static_cast<unsigned int>(pos);
            valb += 6;
            if (valb >= 0) {
                output.push_back(char((val >> valb) & 0xFF));
                valb -= 8;
            }
        }
        return output;
    }

public:
    PemHandler(const std::string& fp = "") : filepath(fp) {
        if (!fp.empty()) {
            read();
        }
    }

    void read() {
        // Binary mode keeps CR/LF bytes untouched so the EOL type can be detected
        std::ifstream file(filepath, std::ios::binary);
        if (!file) {
            std::cerr << "Error opening file" << std::endl;
            return;
        }
        std::string content((std::istreambuf_iterator<char>(file)), std::istreambuf_iterator<char>());
        parse(content);
        printProperties();
    }

    void parse(const std::string& content) {
        sections.clear();
        // std::regex has no dotall flag, so [\s\S] stands in for "any character";
        // \1 requires the END label to match the BEGIN label.
        std::regex pattern(R"(-----BEGIN ([A-Z0-9 ]+)-----([\s\S]*?)-----END \1-----)");
        bool lineLengthsValid = true;
        auto begin = std::sregex_iterator(content.begin(), content.end(), pattern);
        auto end = std::sregex_iterator();
        for (std::sregex_iterator i = begin; i != end; ++i) {
            std::smatch match = *i;
            std::string label = match[1].str();
            std::string data = match[2].str();
            data += '\n';  // Sentinel so the final line is flushed
            std::vector<std::string> headers;
            std::string base64Data;
            std::string line;
            for (char c : data) {
                if (c != '\n' && c != '\r') {
                    line += c;
                    continue;
                }
                // Trim spaces and tabs from the completed line
                size_t first = line.find_first_not_of(" \t");
                size_t last = line.find_last_not_of(" \t");
                std::string trimmed = (first == std::string::npos) ? "" : line.substr(first, last - first + 1);
                line.clear();
                if (trimmed.empty()) continue;
                if (trimmed.find(':') != std::string::npos) {
                    // Legacy RFC 1421 header; base64 never contains ':'
                    headers.push_back(trimmed);
                } else {
                    // Check the line as it appears in the file, before joining
                    if (trimmed.size() > 64) lineLengthsValid = false;
                    base64Data += trimmed;
                }
            }
            auto decoded = base64Decode(base64Data);
            sections.push_back({label, headers, base64Data, decoded});
        }

        size_t dotPos = filepath.rfind('.');
        properties["extension"] = (dotPos != std::string::npos) ? filepath.substr(dotPos) : ".pem";
        properties["mimeType"] = (content.find("CERTIFICATE") != std::string::npos) ? "application/pkix-cert" : "text/plain";
        properties["encoding"] = "US-ASCII with Base64";
        std::string labelsStr;
        for (const auto& s : sections) {
            if (!labelsStr.empty()) labelsStr += ", ";
            labelsStr += s.label;
        }
        properties["labels"] = labelsStr;
        properties["numObjects"] = std::to_string(sections.size());
        properties["multipleObjects"] = (sections.size() > 1) ? "true" : "false";
        properties["lineLengthsValid"] = lineLengthsValid ? "true" : "false";
        properties["whitespaceFlexible"] = "true";
        // Explanatory text: any non-whitespace characters before the first BEGIN boundary
        size_t firstBegin = content.find("-----BEGIN ");
        bool hasText = firstBegin != std::string::npos &&
                       content.substr(0, firstBegin).find_first_not_of(" \t\r\n") != std::string::npos;
        properties["hasExplanatoryText"] = hasText ? "true" : "false";
        bool noHeaders = true;
        for (const auto& s : sections) {
            if (!s.headers.empty()) {
                noHeaders = false;
                break;
            }
        }
        properties["noHeaders"] = noHeaders ? "true" : "false";
        properties["asn1Compliant"] = "true"; // Assume
        std::string eolType = (content.find("\r\n") != std::string::npos) ? "CRLF" : ((content.find('\r') != std::string::npos) ? "CR" : "LF");
        properties["eolType"] = eolType;
    }

    std::vector<std::vector<char>> decode() {
        std::vector<std::vector<char>> decodedList;
        for (const auto& s : sections) {
            decodedList.push_back(s.decoded);
        }
        return decodedList;
    }

    void write(const std::string& outputPath, const std::vector<std::vector<char>>& dataList, const std::vector<std::string>& labels) {
        if (dataList.size() != labels.size()) {
            std::cerr << "Data and labels must match in length" << std::endl;
            return;
        }
        std::ofstream out(outputPath);
        if (!out) {
            std::cerr << "Error opening output file" << std::endl;
            return;
        }
        for (size_t i = 0; i < dataList.size(); ++i) {
            const auto& data = dataList[i];
            const std::string& label = labels[i];
            // Simple base64 encode
            std::string base64;
            const std::string chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
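            // Unsigned accumulator: repeated left shifts wrap instead of overflowing a signed int
            unsigned int val = 0;
            int valb = -6;
            for (char c : data) {
                val = (val << 8) | static_cast<unsigned char>(c);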
                valb += 8;
                while (valb >= 0) {
                    base64 += chars[(val >> valb) & 0x3F];
                    valb -= 6;
                }
            }
            if (valb > -6) base64 += chars[((val << 8) >> (valb + 8)) & 0x3F];
            while (base64.size() % 4) base64 += '=';
            // Wrap at 64
            std::string wrapped;
            for (size_t j = 0; j < base64.size(); j += 64) {
                wrapped += base64.substr(j, 64) + "\n";
            }
            out << "-----BEGIN " << label << "-----\n" << wrapped << "-----END " << label << "-----\n";
        }
    }

    void printProperties() {
        std::cout << "PEM File Properties:" << std::endl;
        for (const auto& prop : properties) {
            std::cout << prop.first << ": " << prop.second << std::endl;
        }
    }
};

// Example usage:
// int main() {
//     PemHandler handler("example.pem");
//     auto decoded = handler.decode();
//     std::vector<std::string> labels;
//     // Populate labels from sections...
//     handler.write("new.pem", decoded, labels);
//     return 0;
// }