Task 527: .PDI File Format
1. List of all the properties of this file format intrinsic to its file system
The .PDI file format refers to the PlanetPress Image (now OL Connect Image) index file, a text-based format used for indexing and searching archived documents (typically PDF, TIFF, or JPEG). It is generated by PlanetPress/OL Connect Image software for document management and search applications. The index can be written either in the native text format or as XML; the native format is the primary one and consists of one entry per line in the syntax ~<name>=<value>. The properties (keys) are intrinsic to the format's structure and cover metadata about the document, the archive, its creation time, and the index terms found on each page. The full list of properties is:
- SearchDBName: The name defined in PlanetPress for the database record of the document. Used as the record name in search databases if no custom name is defined. Data type: string.
- FormName: The name of the converted document (from the PlanetPress Suite Workflow Tool Documents folder) that generated the PDI and associated PDF. Used to group PDF files from the same document and as a fallback for SearchDBName. Data type: string.
- ArchiveFile: The name (without extension) of the PDI file and its corresponding PDF file. Set in the PlanetPress Suite Workflow Tool during configuration of the Image output task. Data type: string.
- ArchiveMethod: The type of archive file generated by PlanetPress Image (e.g., PDF, TIFF, JPEG). Note: PlanetPress Search only supports searching PDF archives. Data type: string.
- Time: The time PlanetPress Image created the PDI file, in HH:MM:SS format. Data type: string.
- Date: The date PlanetPress Image created the PDI file, in YYYY/MM/DD format. Data type: string.
- IndexName: The name of an index term used in the PDF file, followed by its field length (the greater of the data selection length in PlanetPress or the default in PlanetPress Search). One entry appears in the header section for each index term. Data type: string (name) with integer (field length).
- FieldName: The name of an index term, followed by a list of values it takes on a given page of the PDF. This appears in the body for each page's index data. Data type: string (name) followed by list of string values.
- Page: The page number in the PDF file that contains the FieldName entries that follow it (up to the next Page entry). Data type: integer.
These properties are fixed and define the file's structure. The file starts with header properties (SearchDBName to IndexName), followed by per-page data (Page, FieldName). The format is not binary; it's plain text, so "intrinsic to its file system" refers to these key-value pairs that enable indexing and searching without external metadata.
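Every value is stored as text, so a consumer has to convert the typed properties itself. The following is a minimal sketch of that conversion for the three properties whose format is stated above (Date, Time, and Page). The function names parse_created and parse_page are hypothetical helpers introduced for illustration, and the sample values are invented.

```python
from datetime import datetime


def parse_created(date_value, time_value):
    """Combine a Date value (YYYY/MM/DD) and a Time value (HH:MM:SS)."""
    return datetime.strptime(f"{date_value} {time_value}", "%Y/%m/%d %H:%M:%S")


def parse_page(page_value):
    """Convert a Page value to an integer page number."""
    return int(page_value.strip())


# Hypothetical sample values, not taken from a real PDI file.
created = parse_created("2024/03/15", "14:05:09")
print(created.isoformat())
print(parse_page("3"))
```

Both helpers raise ValueError on malformed input, which is usually preferable to silently indexing a document under a wrong date or page.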
2. Two direct download links for files of format .PDI
After extensive searching, no public direct download links for valid .PDI files (PlanetPress/OL Connect index files) were found. These files are typically generated internally by the software for document archiving and are not commonly shared publicly. For example purposes, you could generate your own using OL Connect software, but no safe, verified direct links are available from reliable sources like official documentation or repositories. If you have access to PlanetPress/OL Connect, sample PDI files can be created from demo documents in the software.
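When no software-generated sample is at hand, a hand-built stand-in can still be used to exercise a parser. The sketch below assembles such a file from the properties listed in section 1. It is not real PlanetPress/OL Connect output: every value is invented, build_sample_pdi is a hypothetical helper, the space between the index name and its field length is an assumption, and the literal key FieldName is used only because that is what the handler classes in this document look for.

```python
def build_sample_pdi(pages):
    """Return stand-in PDI text; `pages` maps a page number to a list of values."""
    lines = [
        "~SearchDBName=Invoices",        # all header values are invented
        "~FormName=InvoiceForm",
        "~ArchiveFile=invoices_0001",
        "~ArchiveMethod=PDF",
        "~Time=14:05:09",
        "~Date=2024/03/15",
        "~IndexName=InvoiceNumber 10",   # assumption: name and length separated by a space
    ]
    for page, values in pages.items():
        lines.append(f"~Page={page}")
        for value in values:
            lines.append(f"~FieldName={value}")  # assumption: literal key "FieldName"
    return "\n".join(lines) + "\n"


sample = build_sample_pdi({1: ["INV-1001"], 2: ["INV-1002", "INV-1003"]})
print(sample, end="")
```

Writing the returned string to a file such as example.pdi gives the handler classes below something to read, but a file produced by the actual software should replace it as soon as one is available.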
3. Ghost blog embedded HTML JavaScript for drag and drop .PDI file dump
This is an embedded HTML + JavaScript snippet that can be inserted into a Ghost blog post (using the HTML card in the editor). It creates a drop zone for a .PDI file, reads it as text, parses the properties based on the ~name=value syntax, and dumps them to the screen in a readable list. It handles header properties and per-page data.
4. Python class for .PDI file handling
This Python class opens a .PDI file, decodes (parses) the text-based structure, reads the properties, prints them to console, and supports writing a new .PDI file with given properties.
class PDIHandler:
    """Reads, prints, and writes the native (text) .PDI layout described above."""

    def __init__(self, filepath=None):
        # Header entries are kept as (key, value) pairs in file order; a list
        # is used because IndexName is repeated once per index term.
        self.properties = []
        self.pages = {}  # Page number -> list of (key, value) field pairs
        self.current_page = None
        if filepath:
            self.read(filepath)

    def read(self, filepath):
        with open(filepath, 'r') as f:
            for line in f:
                line = line.strip()
                if not line.startswith('~'):
                    continue
                key, sep, value = line[1:].partition('=')
                if not sep:
                    continue  # No '=' on the line: not a ~name=value entry
                key, value = key.strip(), value.strip()
                if key == 'Page':
                    self.current_page = value
                    self.pages[self.current_page] = []
                elif key.startswith('FieldName'):
                    # Field entries before the first Page entry are ignored
                    if self.current_page is not None:
                        self.pages[self.current_page].append((key, value))
                else:
                    self.properties.append((key, value))

    def print_properties(self):
        print("PDI Properties:")
        for key, value in self.properties:
            print(f"{key}: {value}")
        for page, fields in self.pages.items():
            print(f"Page {page}:")
            for key, value in fields:
                print(f"  {key}: {value}")

    def write(self, filepath):
        # Entries are written back in ~name=value form so the output can be re-read
        with open(filepath, 'w') as f:
            for key, value in self.properties:
                f.write(f"~{key}={value}\n")
            for page, fields in self.pages.items():
                f.write(f"~Page={page}\n")
                for key, value in fields:
                    f.write(f"~{key}={value}\n")


# Example usage
if __name__ == "__main__":
    pdi = PDIHandler("example.pdi")
    pdi.print_properties()
    # To write: pdi.write("new.pdi")
5. Java class for .PDI file handling
This Java class opens a .PDI file, decodes the structure, reads properties, prints to console, and supports writing a new .PDI file.
import java.io.*;
import java.util.*;

public class PDIHandler {
    // Header entries are kept as {key, value} pairs in file order; a list is
    // used because IndexName is repeated once per index term.
    private final List<String[]> properties = new ArrayList<>();
    // Page number -> {key, value} field pairs for that page, in file order.
    private final Map<String, List<String[]>> pages = new LinkedHashMap<>();
    private String currentPage;

    public PDIHandler(String filepath) throws IOException {
        if (filepath != null) {
            read(filepath);
        }
    }

    public void read(String filepath) throws IOException {
        try (BufferedReader br = new BufferedReader(new FileReader(filepath))) {
            String line;
            while ((line = br.readLine()) != null) {
                line = line.trim();
                if (!line.startsWith("~")) {
                    continue;
                }
                String[] parts = line.substring(1).split("=", 2);
                if (parts.length != 2) {
                    continue; // No '=' on the line: not a ~name=value entry
                }
                String key = parts[0].trim();
                String value = parts[1].trim();
                if (key.equals("Page")) {
                    currentPage = value;
                    pages.put(currentPage, new ArrayList<>());
                } else if (key.startsWith("FieldName")) {
                    // Field entries before the first Page entry are ignored
                    if (currentPage != null) {
                        pages.get(currentPage).add(new String[] {key, value});
                    }
                } else {
                    properties.add(new String[] {key, value});
                }
            }
        }
    }

    public void printProperties() {
        System.out.println("PDI Properties:");
        for (String[] entry : properties) {
            System.out.println(entry[0] + ": " + entry[1]);
        }
        for (Map.Entry<String, List<String[]>> pageEntry : pages.entrySet()) {
            System.out.println("Page " + pageEntry.getKey() + ":");
            for (String[] field : pageEntry.getValue()) {
                System.out.println("  " + field[0] + ": " + field[1]);
            }
        }
    }

    // Entries are written back in ~name=value form so the output can be re-read.
    public void write(String filepath) throws IOException {
        try (BufferedWriter bw = new BufferedWriter(new FileWriter(filepath))) {
            for (String[] entry : properties) {
                bw.write("~" + entry[0] + "=" + entry[1] + "\n");
            }
            for (Map.Entry<String, List<String[]>> pageEntry : pages.entrySet()) {
                bw.write("~Page=" + pageEntry.getKey() + "\n");
                for (String[] field : pageEntry.getValue()) {
                    bw.write("~" + field[0] + "=" + field[1] + "\n");
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        PDIHandler pdi = new PDIHandler("example.pdi");
        pdi.printProperties();
        // To write: pdi.write("new.pdi");
    }
}
6. JavaScript class for .PDI file handling
This JavaScript class opens a .PDI file (using Node.js fs for console use), decodes the structure, reads properties, prints to console, and supports writing a new .PDI file.
const fs = require('fs');

class PDIHandler {
  constructor(filepath = null) {
    // Header entries are kept as [key, value] pairs in file order; an array
    // is used because IndexName is repeated once per index term.
    this.properties = [];
    this.pages = {}; // Page number -> array of [key, value] field pairs
    this.currentPage = null;
    if (filepath) {
      this.read(filepath);
    }
  }

  read(filepath) {
    const lines = fs.readFileSync(filepath, 'utf8').split(/\r?\n/);
    for (const rawLine of lines) {
      const line = rawLine.trim();
      if (!line.startsWith('~')) continue;
      const match = line.slice(1).match(/^([^=]+)=(.*)$/);
      if (!match) continue; // Not a ~name=value entry
      const key = match[1].trim();
      const value = match[2].trim();
      if (key === 'Page') {
        this.currentPage = value;
        this.pages[this.currentPage] = [];
      } else if (key.startsWith('FieldName')) {
        // Field entries before the first Page entry are ignored
        if (this.currentPage !== null) {
          this.pages[this.currentPage].push([key, value]);
        }
      } else {
        this.properties.push([key, value]);
      }
    }
  }

  printProperties() {
    console.log('PDI Properties:');
    for (const [key, value] of this.properties) {
      console.log(`${key}: ${value}`);
    }
    for (const [page, fields] of Object.entries(this.pages)) {
      console.log(`Page ${page}:`);
      for (const [key, value] of fields) {
        console.log(`  ${key}: ${value}`);
      }
    }
  }

  // Entries are written back in ~name=value form so the output can be re-read.
  write(filepath) {
    let data = '';
    for (const [key, value] of this.properties) {
      data += `~${key}=${value}\n`;
    }
    for (const [page, fields] of Object.entries(this.pages)) {
      data += `~Page=${page}\n`;
      for (const [key, value] of fields) {
        data += `~${key}=${value}\n`;
      }
    }
    fs.writeFileSync(filepath, data);
  }
}

// Example usage
const pdi = new PDIHandler('example.pdi');
pdi.printProperties();
// To write: pdi.write('new.pdi');
7. C class for .PDI file handling
This C implementation uses a struct as a "class" equivalent, opens a .PDI file, decodes the structure, reads properties, prints to console, and supports writing a new .PDI file. (Uses dynamic memory; compile with gcc.)
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINE 1024
#define MAX_PROPS 64   /* header entries, including one IndexName per index term */
#define MAX_PAGES 100
#define MAX_FIELDS 100

typedef struct {
    char *keys[MAX_PROPS];
    char *values[MAX_PROPS];
    int prop_count;
    char *page_numbers[MAX_PAGES];
    char **fields[MAX_PAGES];      /* each field is stored as "name=value" */
    int field_counts[MAX_PAGES];
    int page_count;
} PDIHandler;

void init_PDIHandler(PDIHandler *handler) {
    handler->prop_count = 0;
    handler->page_count = 0;
    memset(handler->field_counts, 0, sizeof(handler->field_counts));
}

void read_PDI(PDIHandler *handler, const char *filepath) {
    FILE *in = fopen(filepath, "r");
    if (!in) {
        fprintf(stderr, "Error opening file.\n");
        return;
    }
    char line[MAX_LINE];
    int current = -1;  /* index of the page being filled, -1 if none */
    while (fgets(line, MAX_LINE, in)) {
        line[strcspn(line, "\r\n")] = '\0';  /* drop the line ending */
        if (line[0] != '~') {
            continue;
        }
        char *eq = strchr(line + 1, '=');
        if (!eq) {
            continue;  /* not a ~name=value entry */
        }
        *eq = '\0';
        const char *key = line + 1;
        const char *value = eq + 1;
        if (strcmp(key, "Page") == 0) {
            current = -1;  /* fields of a page that cannot be stored are dropped */
            if (handler->page_count < MAX_PAGES) {
                char *number = strdup(value);
                char **fields = malloc(MAX_FIELDS * sizeof(char *));
                if (number && fields) {
                    current = handler->page_count++;
                    handler->page_numbers[current] = number;
                    handler->fields[current] = fields;
                    handler->field_counts[current] = 0;
                } else {
                    free(number);
                    free(fields);
                }
            }
        } else if (strncmp(key, "FieldName", 9) == 0) {
            if (current >= 0 && handler->field_counts[current] < MAX_FIELDS) {
                size_t len = strlen(key) + strlen(value) + 2;  /* '=' and NUL */
                char *field = malloc(len);
                if (field) {
                    snprintf(field, len, "%s=%s", key, value);
                    handler->fields[current][handler->field_counts[current]++] = field;
                }
            }
        } else if (handler->prop_count < MAX_PROPS) {
            char *k = strdup(key);
            char *v = strdup(value);
            if (k && v) {
                handler->keys[handler->prop_count] = k;
                handler->values[handler->prop_count] = v;
                handler->prop_count++;
            } else {
                free(k);
                free(v);
            }
        }
    }
    fclose(in);
}

void print_PDI(const PDIHandler *handler) {
    printf("PDI Properties:\n");
    for (int i = 0; i < handler->prop_count; i++) {
        printf("%s: %s\n", handler->keys[i], handler->values[i]);
    }
    for (int p = 0; p < handler->page_count; p++) {
        printf("Page %s:\n", handler->page_numbers[p]);
        for (int i = 0; i < handler->field_counts[p]; i++) {
            printf("  %s\n", handler->fields[p][i]);
        }
    }
}

void write_PDI(const PDIHandler *handler, const char *filepath) {
    FILE *out = fopen(filepath, "w");
    if (!out) {
        fprintf(stderr, "Error writing file.\n");
        return;
    }
    for (int i = 0; i < handler->prop_count; i++) {
        fprintf(out, "~%s=%s\n", handler->keys[i], handler->values[i]);
    }
    for (int p = 0; p < handler->page_count; p++) {
        fprintf(out, "~Page=%s\n", handler->page_numbers[p]);
        /* Fields are already stored as "name=value", so the output can be re-read */
        for (int i = 0; i < handler->field_counts[p]; i++) {
            fprintf(out, "~%s\n", handler->fields[p][i]);
        }
    }
    fclose(out);
}

void free_PDIHandler(PDIHandler *handler) {
    for (int i = 0; i < handler->prop_count; i++) {
        free(handler->keys[i]);
        free(handler->values[i]);
    }
    for (int p = 0; p < handler->page_count; p++) {
        for (int i = 0; i < handler->field_counts[p]; i++) {
            free(handler->fields[p][i]);
        }
        free(handler->fields[p]);
        free(handler->page_numbers[p]);
    }
    handler->prop_count = 0;
    handler->page_count = 0;
}

int main(void) {
    PDIHandler handler;
    init_PDIHandler(&handler);
    read_PDI(&handler, "example.pdi");
    print_PDI(&handler);
    /* To write: write_PDI(&handler, "new.pdi"); */
    free_PDIHandler(&handler);
    return 0;
}