Task 118: .CSV File Format

File Format Specifications for .CSV

The .CSV (Comma-Separated Values) file format is a simple text-based format for representing tabular data. It is described in RFC 4180, an informational document that records common practice for CSV files (structure, delimiters, quoting) and registers the text/csv MIME type. Because CSV is plain text and no parser is obliged to enforce these rules, RFC 4180 serves as a guideline for interoperability rather than a strict specification. Key aspects include fields separated by commas, one record per line terminated by CRLF, optional enclosure of fields in double quotes, and escaping of embedded double quotes by doubling them. There is no magic number or binary header, since the file is flat text.
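The quoting and escaping rules summarized above can be seen in a short round trip. This is a minimal sketch using Python's built-in csv module with its default dialect, which follows these conventions (comma delimiter, double-quote quoting, doubled quotes, CRLF line endings); the sample rows are made up for illustration.

```python
import csv
import io

# Made-up sample data: one field with a comma and quotes, one with a line break.
rows = [
    ["id", "comment"],
    ["1", 'She said "hi", then left'],
    ["2", "line one\nline two"],
]

# Write with the default "excel" dialect; only fields that need it get quoted.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)
encoded = buffer.getvalue()

# Read the text back; quoted commas, quotes, and line breaks survive intact.
parsed = list(csv.reader(io.StringIO(encoded, newline="")))

print(repr(encoded))
print(parsed == rows)
```

Only the two fields that contain a comma, a quote, or a line break are wrapped in double quotes, and the embedded quotes are written as "".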

List of All Properties of This File Format Intrinsic to Its File System
Based on RFC 4180 and common implementations, the intrinsic properties of the .CSV format (focusing on format-specific traits rather than general file system metadata like size or permissions, which apply to any file) are:

  • MIME Type: text/csv
  • File Extension: .csv
  • Magic Number: None (plain text format, no binary signature)
  • Macintosh File Type Code: TEXT
  • Delimiter: Comma (,)
  • Quote Character: Double quote (")
  • Escaping Method: Double quote inside a quoted field is escaped by preceding it with another double quote (e.g., "" for a single ")
  • Line Terminator: CRLF (\r\n)
  • Header: Optional (first line may contain field names; presence can be indicated via MIME parameter "header=present" or "header=absent")
  • Character Set: US-ASCII (common), but other IANA-registered text charsets like UTF-8 are supported via optional "charset" MIME parameter
  • Field Consistency: Each record should have the same number of fields (RFC 4180 states this as a recommendation, and real-world files do not always comply); spaces are part of a field and must not be ignored
  • Record Structure: Each record on a separate line; last record may omit ending line break
  • Quoting Rules: Fields may be quoted; required for fields containing commas, quotes, or line breaks
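
The field-consistency and whitespace properties in the list above can be checked mechanically. The following is a minimal sketch, not something RFC 4180 prescribes; the helper name inconsistent_records is hypothetical, and it assumes the whole file content is already available as a string.

```python
import csv
import io

def inconsistent_records(text, delimiter=","):
    """Return (line_number, field_count) for records whose field count
    differs from that of the first record. Hypothetical helper."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    expected = None
    mismatches = []
    for row in reader:
        if expected is None:
            expected = len(row)  # The first record sets the expected width
        elif len(row) != expected:
            mismatches.append((reader.line_num, len(row)))
    return mismatches

# The third record is one field short.
print(inconsistent_records("a,b,c\r\n1,2,3\r\n4,5\r\n"))

# Spaces around a field are kept as data, not trimmed.
print(list(csv.reader(io.StringIO("a, b\r\n", newline=""))))
```

Note that reader.line_num counts physical lines, so for records with multi-line quoted fields it reports the line on which the record ends.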

These properties are "intrinsic" in that they define how the format interacts with file systems (e.g., as plain text without binary markers), but in practice, variants exist (e.g., semicolon delimiters in some locales), and tools often detect them dynamically.
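
As an illustration of dynamic detection, the sketch below feeds a semicolon-delimited sample (a common variant in locales that use the comma as decimal separator) to csv.Sniffer from Python's standard library. The sample data is invented, and Sniffer is a heuristic that can guess wrong or raise csv.Error on ambiguous input, so treat this as one possible approach rather than a reliable detector.

```python
import csv
import io

# Invented sample: semicolons separate fields, commas are decimal separators.
sample = "name;city;score\r\nAda;London;9,5\r\nLinus;Helsinki;8,0\r\n"

# Restrict the guess to a few plausible delimiters.
dialect = csv.Sniffer().sniff(sample, delimiters=",;\t|")

rows = list(csv.reader(io.StringIO(sample, newline=""), dialect))

print(repr(dialect.delimiter))
print(rows)
```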

Two Direct Download Links for .CSV Files

Ghost Blog Embedded HTML JavaScript for Drag-and-Drop .CSV File Dump
Below is a self-contained HTML snippet with embedded JavaScript that can be embedded in a Ghost blog post (using the HTML card/block). It creates a drop zone where users can drag and drop a .CSV file. The script reads the file, attempts basic detection of properties (e.g., delimiter by frequency count, line terminator by scanning, quote char assumption, etc.), and dumps the properties to the screen. It assumes standard quoting and handles basic parsing without external libraries.

Drag and drop a .CSV file here
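
The detection steps described above (delimiter by frequency count, line terminator by scanning) can be sketched independently of the browser. The version below is written in Python for brevity; the function names, the candidate list, and the five-line sample size are assumptions for illustration. The count is deliberately naive and also counts delimiters that appear inside quoted fields.

```python
def detect_line_terminator(text):
    """Classify the line ending by scanning for the first matching pattern."""
    if "\r\n" in text:
        return "CRLF"
    if "\n" in text:
        return "LF"
    if "\r" in text:
        return "CR"
    return "unknown"

def detect_delimiter(text, candidates=(",", ";", "\t", "|"), max_lines=5):
    """Pick the candidate that occurs most often in the first few lines.
    Falls back to a comma when no candidate occurs at all."""
    lines = text.splitlines()[:max_lines]
    counts = {c: sum(line.count(c) for line in lines) for c in candidates}
    best = max(candidates, key=lambda c: counts[c])
    return best if counts[best] > 0 else ","

sample = "a;b;c\r\n1;2;3\r\n"
print(detect_delimiter(sample), detect_line_terminator(sample))
```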


Python Class for .CSV Handling
Below is a Python class that opens a .CSV file, detects properties using the csv module's Sniffer (for dialect detection), reads/parses the data, prints the properties, and can write the data back (or to a new file). It assumes Python 3 and uses the built-in csv module.

import csv

class CSVHandler:
    def __init__(self, filepath):
        self.filepath = filepath
        self.data = []
        self.dialect = None
        self.properties = {
            'MIME Type': 'text/csv',
            'File Extension': '.csv',
            'Magic Number': 'None',
            'Macintosh File Type Code': 'TEXT',
            'Delimiter': None,
            'Quote Character': None,
            'Escaping Method': 'Double quote (")',
            'Line Terminator': None,
            'Header': None,
            'Character Set': 'UTF-8 (assumed)',
            'Field Consistency': 'Assumed consistent',
            'Number of Rows': None,
            'Number of Columns': None
        }

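    def read(self):
        # newline='' hands the original line endings to the csv module, which
        # it needs to parse quoted fields containing line breaks correctly.
        with open(self.filepath, 'r', encoding='utf-8', newline='') as f:
            sample = f.read(1024 * 10)  # Sample for sniffing
            f.seek(0)
            sniffer = csv.Sniffer()
            try:
                self.dialect = sniffer.sniff(sample)
                has_header = sniffer.has_header(sample)
            except csv.Error:
                # Sniffing can fail (e.g. empty or single-column files);
                # fall back to the default comma-separated dialect.
                self.dialect = csv.excel
                has_header = False
            self.properties['Delimiter'] = self.dialect.delimiter
            self.properties['Quote Character'] = self.dialect.quotechar
            # Sniffer does not detect line endings (its dialects always report
            # '\r\n'), so inspect the sample directly.
            self.properties['Line Terminator'] = 'CRLF' if '\r\n' in sample else 'LF'
            self.properties['Header'] = 'Present' if has_header else 'Absent'
            reader = csv.reader(f, self.dialect)
            self.data = list(reader)
            self.properties['Number of Rows'] = len(self.data)
            self.properties['Number of Columns'] = len(self.data[0]) if self.data else 0
            # Report whether every record has the same number of fields.
            widths = {len(row) for row in self.data}
            self.properties['Field Consistency'] = 'Consistent' if len(widths) <= 1 else 'Inconsistent'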

    def print_properties(self):
        for key, value in self.properties.items():
            print(f"{key}: {value}")

    def write(self, new_filepath=None):
        if new_filepath is None:
            new_filepath = self.filepath
        with open(new_filepath, 'w', encoding='utf-8', newline='') as f:
            writer = csv.writer(f, dialect=self.dialect)
            writer.writerows(self.data)

# Example usage:
# handler = CSVHandler('example.csv')
# handler.read()
# handler.print_properties()
# handler.write('output.csv')

Java Class for .CSV Handling
Below is a Java class that opens a .CSV file, performs basic parsing and detection (manual, without external libs like OpenCSV), reads into a list of lists, prints properties, and writes back. Detection is simple (e.g., delimiter by count in sample lines).

import java.io.*;
import java.util.*;

public class CSVHandler {
    private String filepath;
    private List<List<String>> data = new ArrayList<>();
    // LinkedHashMap keeps insertion order, so properties print in a stable order
    private Map<String, String> properties = new LinkedHashMap<>();

    public CSVHandler(String filepath) {
        this.filepath = filepath;
        properties.put("MIME Type", "text/csv");
        properties.put("File Extension", ".csv");
        properties.put("Magic Number", "None");
        properties.put("Macintosh File Type Code", "TEXT");
        properties.put("Escaping Method", "Double quote (\")");
        properties.put("Character Set", "UTF-8 (assumed)");
        properties.put("Field Consistency", "Assumed consistent");
        properties.put("Header", "Unknown");
    }

    public void read() throws IOException {
        String content = readFileAsString();
        detectProperties(content);
        // Parse data
        String delimiter = properties.get("Delimiter");
        String[] lines = content.split(properties.get("Line Terminator").equals("CRLF") ? "\r\n" : "\n");
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                List<String> row = parseLine(line, delimiter.charAt(0));
                data.add(row);
            }
        }
        properties.put("Number of Rows", String.valueOf(data.size()));
        properties.put("Number of Columns", data.isEmpty() ? "0" : String.valueOf(data.get(0).size()));
    }

    private String readFileAsString() throws IOException {
        // Read the raw bytes so the original line endings survive.
        // BufferedReader.readLine() strips them, which made CRLF undetectable.
        byte[] bytes = java.nio.file.Files.readAllBytes(java.nio.file.Paths.get(filepath));
        return new String(bytes, java.nio.charset.StandardCharsets.UTF_8);
    }

    private void detectProperties(String content) {
        // Detect line terminator
        properties.put("Line Terminator", content.contains("\r\n") ? "CRLF" : "LF");

        // Detect delimiter
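        // Sample only the first 5 lines (split with a limit of 5 would leave
        // the whole remainder of the file in the last element)
        String[] allLines = content.split("\r?\n");
        String[] sampleLines = Arrays.copyOf(allLines, Math.min(5, allLines.length));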
        char[] delims = {',', ';', '\t', '|'};
        int maxCount = 0;
        char detectedDelim = ',';
        for (char delim : delims) {
            int count = 0;
            for (String line : sampleLines) {
                count += countOccurrences(line, delim);
            }
            if (count > maxCount) {
                maxCount = count;
                detectedDelim = delim;
            }
        }
        properties.put("Delimiter", String.valueOf(detectedDelim));

        // Assume quote char
        properties.put("Quote Character", "\"");
    }

    private int countOccurrences(String str, char ch) {
        int count = 0;
        for (char c : str.toCharArray()) {
            if (c == ch) count++;
        }
        return count;
    }

    private List<String> parseLine(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"' && i + 1 < line.length() && line.charAt(i + 1) == '"') {
                    // Doubled quote inside a quoted field is a literal quote
                    field.append('"');
                    i++;
                } else if (c == '"') {
                    inQuotes = false; // Closing quote
                } else {
                    field.append(c);
                }
            } else if (c == '"') {
                inQuotes = true; // Opening quote
            } else if (c == delimiter) {
                fields.add(field.toString());
                field = new StringBuilder();
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());
        return fields;
    }

    public void printProperties() {
        for (Map.Entry<String, String> entry : properties.entrySet()) {
            System.out.println(entry.getKey() + ": " + entry.getValue());
        }
    }

    public void write(String newFilepath) throws IOException {
        if (newFilepath == null) newFilepath = filepath;
        char delimiter = properties.get("Delimiter").charAt(0);
        String lineTerm = properties.get("Line Terminator").equals("CRLF") ? "\r\n" : "\n";
        try (BufferedWriter bw = new BufferedWriter(new FileWriter(newFilepath))) {
            for (List<String> row : data) {
                for (int i = 0; i < row.size(); i++) {
                    String field = row.get(i).replace("\"", "\"\""); // Escape quotes
                    bw.write("\"" + field + "\"");
                    if (i < row.size() - 1) bw.write(delimiter);
                }
                bw.write(lineTerm);
            }
        }
    }

    // Example usage:
    // public static void main(String[] args) throws IOException {
    //     CSVHandler handler = new CSVHandler("example.csv");
    //     handler.read();
    //     handler.printProperties();
    //     handler.write("output.csv");
    // }
}

JavaScript Class for .CSV Handling
Below is a JavaScript class (ES6) that opens a .CSV file using Node.js (requires fs module), detects properties similarly, reads/parses, prints to console, and writes. Run with Node.js.

const fs = require('fs');

class CSVHandler {
    constructor(filepath) {
        this.filepath = filepath;
        this.data = [];
        this.properties = {
            'MIME Type': 'text/csv',
            'File Extension': '.csv',
            'Magic Number': 'None',
            'Macintosh File Type Code': 'TEXT',
            'Escaping Method': 'Double quote (")',
            'Character Set': 'UTF-8 (assumed)',
            'Field Consistency': 'Assumed consistent',
            'Header': 'Unknown'
        };
    }

    read() {
        const content = fs.readFileSync(this.filepath, 'utf-8');
        this.detectProperties(content);
        // Parse data
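        // detectProperties stores a tab as the printable string '\\t';
        // map it back to a real tab so parseLine can match it
        const stored = this.properties['Delimiter'];
        const delimiter = stored === '\\t' ? '\t' : stored;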
        const lineTerm = this.properties['Line Terminator'] === 'CRLF' ? '\r\n' : '\n';
        const lines = content.split(lineTerm).filter(line => line.trim());
        lines.forEach(line => {
            const row = this.parseLine(line, delimiter);
            this.data.push(row);
        });
        this.properties['Number of Rows'] = this.data.length;
        this.properties['Number of Columns'] = this.data[0] ? this.data[0].length : 0;
    }

    detectProperties(content) {
        // Detect line terminator
        this.properties['Line Terminator'] = content.includes('\r\n') ? 'CRLF' : 'LF';

        // Detect delimiter
        const sampleLines = content.split(/\r?\n/).slice(0, 5);
        const delims = [',', ';', '\t', '|'];
        let maxCount = 0;
        let detectedDelim = ',';
        delims.forEach(delim => {
            let count = 0;
            sampleLines.forEach(line => {
                count += line.split(delim).length - 1;
            });
            if (count > maxCount) {
                maxCount = count;
                detectedDelim = delim === '\t' ? '\\t' : delim;
            }
        });
        this.properties['Delimiter'] = detectedDelim;

        // Assume quote
        this.properties['Quote Character'] = '"';
    }

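    parseLine(line, delimiter) {
        const fields = [];
        let field = '';
        let inQuotes = false;
        for (let i = 0; i < line.length; i++) {
            const c = line[i];
            if (inQuotes) {
                if (c === '"' && line[i + 1] === '"') {
                    // Doubled quote inside a quoted field is a literal quote
                    field += '"';
                    i++;
                } else if (c === '"') {
                    inQuotes = false; // Closing quote
                } else {
                    field += c;
                }
            } else if (c === '"') {
                inQuotes = true; // Opening quote
            } else if (c === delimiter) {
                fields.push(field);
                field = '';
            } else {
                field += c;
            }
        }
        fields.push(field);
        return fields;
    }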

    printProperties() {
        for (let [key, value] of Object.entries(this.properties)) {
            console.log(`${key}: ${value}`);
        }
    }

    write(newFilepath = null) {
        if (!newFilepath) newFilepath = this.filepath;
        const delimiter = this.properties['Delimiter'] === '\\t' ? '\t' : this.properties['Delimiter'];
        const lineTerm = this.properties['Line Terminator'] === 'CRLF' ? '\r\n' : '\n';
        let output = '';
        this.data.forEach(row => {
            output += row.map(field => `"${field.replace(/"/g, '""')}"`).join(delimiter) + lineTerm;
        });
        fs.writeFileSync(newFilepath, output, 'utf-8');
    }
}

// Example usage:
// const handler = new CSVHandler('example.csv');
// handler.read();
// handler.printProperties();
// handler.write('output.csv');

C "Class" (Struct and Functions) for .CSV Handling
C doesn't have classes, so below is a struct with associated functions to open a .CSV file, detect/parse, print properties to console, and write. It uses manual detection and parsing. Compile with gcc file.c -o csvhandler.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LINES 1000
#define MAX_COLS 100
#define MAX_FIELD 1024

typedef struct {
    char *filepath;
    char *data[MAX_LINES][MAX_COLS];  // Simple 2D array for data
    int num_rows;
    int num_cols;
    struct {
        char *mime_type;
        char *extension;
        char *magic_number;
        char *mac_type;
        char delimiter;
        char *quote_char;
        char *escaping;
        char *line_term;
        char *header;
        char *charset;
        char *field_consist;
    } properties;
} CSVHandler;

void init_properties(CSVHandler *handler) {
    handler->properties.mime_type = "text/csv";
    handler->properties.extension = ".csv";
    handler->properties.magic_number = "None";
    handler->properties.mac_type = "TEXT";
    handler->properties.escaping = "Double quote (\")";
    handler->properties.charset = "UTF-8 (assumed)";
    handler->properties.field_consist = "Assumed consistent";
    handler->properties.header = "Unknown";
    handler->properties.quote_char = "\"";
    handler->num_rows = 0;
    handler->num_cols = 0;
}

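char *read_file_as_string(const char *filepath) {
    // Binary mode keeps CRLF line endings intact on every platform
    FILE *file = fopen(filepath, "rb");
    if (!file) return NULL;
    fseek(file, 0, SEEK_END);
    long length = ftell(file);
    fseek(file, 0, SEEK_SET);
    if (length < 0) {
        fclose(file);
        return NULL;
    }
    char *buffer = malloc((size_t)length + 1);
    if (!buffer) {
        fclose(file);
        return NULL;
    }
    // Terminate at the number of bytes actually read
    size_t bytes_read = fread(buffer, 1, (size_t)length, file);
    buffer[bytes_read] = '\0';
    fclose(file);
    return buffer;
}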

void detect_properties(CSVHandler *handler, char *content) {
    // Detect line terminator
    handler->properties.line_term = strstr(content, "\r\n") ? "CRLF" : "LF";

    // Detect delimiter: total count of each candidate over the first 5 lines.
    // The buffer is only scanned, never modified (strtok would insert '\0'
    // bytes and hide every line after the first from the parser).
    char delims[] = {',', ';', '\t', '|'};
    int counts[4] = {0};
    int line_count = 0;
    for (char *p = content; *p && line_count < 5; p++) {
        if (*p == '\n') {
            line_count++;
            continue;
        }
        for (int i = 0; i < 4; i++) {
            if (*p == delims[i]) counts[i]++;
        }
    }
    char detected = ',';
    int max_count = 0;
    for (int i = 0; i < 4; i++) {
        if (counts[i] > max_count) {
            max_count = counts[i];
            detected = delims[i];
        }
    }
    handler->properties.delimiter = detected;
}

void read_csv(CSVHandler *handler) {
    char *content = read_file_as_string(handler->filepath);
    if (!content) {
        printf("Error reading file\n");
        return;
    }
    detect_properties(handler, content);
    char delim = handler->properties.delimiter;

    // Parse line by line. strtok treats "\r\n" as a set of characters, so this
    // handles both CRLF and LF and skips empty lines. Quoted fields that span
    // several lines are not supported by this simple approach.
    char *line = strtok(content, "\r\n");
    while (line && handler->num_rows < MAX_LINES) {
        char field[MAX_FIELD];
        int field_idx = 0;
        int col = 0;
        int in_quotes = 0;
        // Stop one short of MAX_COLS so the final field always has a slot
        for (char *p = line; *p && col < MAX_COLS - 1; p++) {
            if (in_quotes && *p == '"' && p[1] == '"') {
                // Doubled quote inside a quoted field is a literal quote
                if (field_idx < MAX_FIELD - 1) field[field_idx++] = '"';
                p++;
            } else if (*p == '"') {
                in_quotes = !in_quotes;
            } else if (*p == delim && !in_quotes) {
                field[field_idx] = '\0';
                handler->data[handler->num_rows][col] = strdup(field);
                field_idx = 0;
                col++;
            } else if (field_idx < MAX_FIELD - 1) {
                // Characters beyond MAX_FIELD - 1 are dropped instead of overflowing
                field[field_idx++] = *p;
            }
        }
        field[field_idx] = '\0';
        handler->data[handler->num_rows][col] = strdup(field);
        // Column count is taken from the first record (assumes consistent rows)
        if (handler->num_rows == 0) handler->num_cols = col + 1;
        handler->num_rows++;
        line = strtok(NULL, "\r\n");
    }
    free(content);
}

void print_properties(CSVHandler *handler) {
    printf("MIME Type: %s\n", handler->properties.mime_type);
    printf("File Extension: %s\n", handler->properties.extension);
    printf("Magic Number: %s\n", handler->properties.magic_number);
    printf("Macintosh File Type Code: %s\n", handler->properties.mac_type);
    printf("Delimiter: %c\n", handler->properties.delimiter);
    printf("Quote Character: %s\n", handler->properties.quote_char);
    printf("Escaping Method: %s\n", handler->properties.escaping);
    printf("Line Terminator: %s\n", handler->properties.line_term);
    printf("Header: %s\n", handler->properties.header);
    printf("Character Set: %s\n", handler->properties.charset);
    printf("Field Consistency: %s\n", handler->properties.field_consist);
    printf("Number of Rows: %d\n", handler->num_rows);
    printf("Number of Columns: %d\n", handler->num_cols);
}

void write_csv(CSVHandler *handler, const char *new_filepath) {
    const char *path = new_filepath ? new_filepath : handler->filepath;
    // Binary mode so "\r\n" is written as-is on every platform
    FILE *file = fopen(path, "wb");
    if (!file) return;
    const char *line_term = strcmp(handler->properties.line_term, "CRLF") == 0 ? "\r\n" : "\n";
    for (int i = 0; i < handler->num_rows; i++) {
        for (int j = 0; j < handler->num_cols; j++) {
            // Rows shorter than num_cols have no stored field; write it as empty
            const char *field = handler->data[i][j] ? handler->data[i][j] : "";
            fputc('"', file);
            for (const char *p = field; *p; p++) {
                if (*p == '"') fputc('"', file);  // Escape quotes by doubling
                fputc(*p, file);
            }
            fputc('"', file);
            if (j < handler->num_cols - 1) fputc(handler->properties.delimiter, file);
        }
        fputs(line_term, file);
    }
    fclose(file);
}

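int main(void) {
    // static: zero-initializes the data pointers and keeps the large
    // MAX_LINES x MAX_COLS array off the stack
    static CSVHandler handler;
    handler.filepath = "example.csv";
    init_properties(&handler);
    read_csv(&handler);
    print_properties(&handler);
    write_csv(&handler, "output.csv");
    // Free memory (omitted for brevity)
    return 0;
}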