Task 718: .TAR File Format
File Format Specifications for the .TAR File Format
The .TAR (Tape ARchive) file format is a standard for concatenating files into a single archive without compression, originally designed for tape storage. Its primary standardized variant is the USTAR format (POSIX.1-1988), which extends the earlier Unix V7 tar header. An archive is a sequence of 512-byte records: each archived item has one header record followed by its data, padded with zero bytes to a multiple of 512 bytes, and the archive ends with at least two zero-filled records. Headers are ASCII-encoded, numeric fields are stored as octal digit strings, and each header carries the item's file metadata. Extensions such as GNU tar and PAX (POSIX.1-2001) add features like longer names and sparse files, but the core USTAR structure remains widely used for compatibility.
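The record arithmetic described above can be sketched in a few lines of Python. This is a minimal illustration, not part of any library: the helper names `padded_size` and `minimal_archive_size` are hypothetical, and real tools often pad the archive further (for example to a larger blocking factor), so the result is only a lower bound.

```python
from typing import Iterable

BLOCK_SIZE = 512  # every header and every data area is a multiple of this


def padded_size(size: int) -> int:
    """Round a member's data size up to the next 512-byte boundary."""
    return (size + BLOCK_SIZE - 1) // BLOCK_SIZE * BLOCK_SIZE


def minimal_archive_size(member_sizes: Iterable[int]) -> int:
    """One header record per member, its padded data, then two zero records."""
    body = sum(BLOCK_SIZE + padded_size(size) for size in member_sizes)
    return body + 2 * BLOCK_SIZE


print(padded_size(0), padded_size(1), padded_size(512), padded_size(513))
# 0 512 512 1024
print(minimal_archive_size([10, 600]))
# 3584
```

A 10-byte file and a 600-byte file therefore need at least 3584 bytes: two headers (1024), one and two data records (512 + 1024), and the two-record trailer (1024).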
1. List of Properties Intrinsic to the .TAR File Format
The properties are the fields of the standard USTAR header (one 512-byte record per entry), which capture the metadata of each archived item. The table below lists every field with its byte offset, size, and meaning, following the POSIX USTAR specification.
| Field Name | Offset | Size (Bytes) | Description |
|---|---|---|---|
| name | 0 | 100 | File name or path (null-terminated ASCII string; combined with prefix for paths exceeding 100 bytes). |
| mode | 100 | 8 | File permissions (ASCII octal, zero-padded). |
| uid | 108 | 8 | Owner's user ID (ASCII octal, zero-padded). |
| gid | 116 | 8 | Group's ID (ASCII octal, zero-padded). |
| size | 124 | 12 | File size in bytes (ASCII octal, zero-padded; for regular files). |
| mtime | 136 | 12 | Modification time (ASCII octal seconds since Unix epoch, zero-padded). |
| checksum | 148 | 8 | Header checksum (ASCII octal; unsigned sum of all 512 header bytes, with these 8 bytes counted as spaces; conventionally stored as six octal digits, a NUL, and a space). |
| typeflag | 156 | 1 | Entry type ('0' or NUL: regular file; '1': hard link; '2': symbolic link; '3': character device; '4': block device; '5': directory; '6': FIFO; others reserved or extended). |
| linkname | 157 | 100 | Target name for links (null-terminated ASCII string). |
| magic | 257 | 6 | Format identifier ("ustar\0"). |
| version | 263 | 2 | Version ("00"). |
| uname | 265 | 32 | Owner's user name (null-terminated ASCII). |
| gname | 297 | 32 | Group's name (null-terminated ASCII). |
| devmajor | 329 | 8 | Major device number for special files (ASCII octal, zero-padded). |
| devminor | 337 | 8 | Minor device number for special files (ASCII octal, zero-padded). |
| prefix | 345 | 155 | Path prefix for long names (null-terminated ASCII; prepended to name with '/'). |
| pad | 500 | 12 | Padding (null bytes). |
These fields ensure portability across systems, preserving file system attributes like permissions and ownership. The format supports multiple entries in sequence, with data blocks following each header.
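To make the table concrete, the following sketch decodes an octal field, recomputes the header checksum, and joins `prefix` and `name` into the full path. The helper names (`parse_octal`, `header_checksum`, `full_path`) are hypothetical; the sample header is generated with the standard library's `tarfile.TarInfo.tobuf` purely as test data. The sketch assumes plain USTAR fields and does not handle the GNU base-256 encoding of large numbers.

```python
import tarfile


def parse_octal(field: bytes) -> int:
    """Decode an ASCII octal field; NUL/space padding is ignored, empty means 0."""
    text = field.split(b"\x00", 1)[0].strip().decode("ascii")
    return int(text, 8) if text else 0


def header_checksum(header: bytes) -> int:
    """Sum all 512 bytes as unsigned values, counting bytes 148-155 as spaces."""
    return sum(header[:148]) + 8 * ord(" ") + sum(header[156:512])


def full_path(header: bytes) -> str:
    """Join the prefix (offset 345) and name (offset 0) fields with '/'."""
    name = header[0:100].split(b"\x00", 1)[0].decode("ascii")
    prefix = header[345:500].split(b"\x00", 1)[0].decode("ascii")
    return f"{prefix}/{name}" if prefix else name


# Sample header built by the standard library, used only as input data.
info = tarfile.TarInfo("docs/readme.txt")
info.size = 42
header = info.tobuf(format=tarfile.USTAR_FORMAT)

print(parse_octal(header[148:156]) == header_checksum(header))  # True
print(full_path(header), parse_octal(header[124:136]))          # docs/readme.txt 42
```

Comparing the stored checksum with the recomputed one is a cheap way to reject blocks that are not valid headers before trusting the other fields.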
2. Two Direct Download Links for .TAR Files
- https://getsamplefiles.com/download/tar/sample-1.tar
- https://getsamplefiles.com/download/tar/sample-2.tar
3. HTML/JavaScript for Drag-and-Drop .TAR File Property Dump
The following is a self-contained HTML page with embedded JavaScript that enables drag-and-drop of a .TAR file. Upon dropping, it reads the file, parses USTAR headers, extracts the properties listed above, and displays them on the screen. It uses the FileReader API for browser-based processing.
4. Python Class for .TAR File Handling
The following Python class can open a .TAR file, decode headers, read and print properties for each entry, and write a simple .TAR archive with added files.
import getpass
import os
import struct


class TarHandler:
    """Minimal USTAR reader and writer for regular files."""

    # Field widths of the 512-byte header, in the order of the table in section 1.
    HEADER_FORMAT = '100s8s8s8s12s12s8sc100s6s2s32s32s8s8s155s12s'
    BLOCK_SIZE = 512

    def __init__(self, filepath):
        self.filepath = filepath

    def read_and_print_properties(self):
        with open(self.filepath, 'rb') as f:
            offset = 0
            while True:
                header_data = f.read(self.BLOCK_SIZE)
                # Stop at a short read or at the first all-zero end-of-archive record.
                if len(header_data) < self.BLOCK_SIZE or not any(header_data):
                    break
                properties = self._decode_header(header_data)
                print(f"Entry at offset {offset}:")
                for key, value in properties.items():
                    print(f"  {key}: {value}")
                print()
                # Skip the data, which is padded to a whole number of blocks.
                data_blocks = (properties['size'] + self.BLOCK_SIZE - 1) // self.BLOCK_SIZE
                f.seek(data_blocks * self.BLOCK_SIZE, os.SEEK_CUR)
                offset += self.BLOCK_SIZE + data_blocks * self.BLOCK_SIZE

    @staticmethod
    def _text(field):
        """Decode a NUL-terminated ASCII text field."""
        return field.split(b'\x00', 1)[0].decode('ascii', errors='replace')

    @classmethod
    def _octal(cls, field):
        """Decode a NUL/space-padded octal field; an empty field reads as 0."""
        text = cls._text(field).strip()
        return int(text, 8) if text else 0

    def _decode_header(self, data):
        u = struct.unpack(self.HEADER_FORMAT, data)
        return {
            'name': self._text(u[0]),
            'mode': oct(self._octal(u[1])),  # shown in octal, e.g. 0o644
            'uid': self._octal(u[2]),
            'gid': self._octal(u[3]),
            'size': self._octal(u[4]),
            'mtime': self._octal(u[5]),
            'checksum': self._octal(u[6]),
            'typeflag': self._text(u[7]),
            'linkname': self._text(u[8]),
            'magic': self._text(u[9]),
            'version': u[10].decode('ascii', errors='replace'),
            'uname': self._text(u[11]),
            'gname': self._text(u[12]),
            'devmajor': self._octal(u[13]),
            'devminor': self._octal(u[14]),
            'prefix': self._text(u[15]),
        }

    def write_tar(self, output_path, files_to_add):
        with open(output_path, 'wb') as tar_file:
            for file_path in files_to_add:
                self._add_file_to_tar(tar_file, file_path)
            # End of archive: two zero-filled records.
            tar_file.write(b'\x00' * (self.BLOCK_SIZE * 2))

    def _add_file_to_tar(self, tar_file, file_path):
        stat = os.stat(file_path)
        with open(file_path, 'rb') as src:
            data = src.read()
        name = os.path.basename(file_path).encode('ascii')
        if len(name) > 100:
            raise ValueError(f'name too long for a USTAR header: {file_path}')
        # Assumption: the login name stands in for both the owner and the group name.
        user = getpass.getuser().encode('ascii', errors='replace')[:31]
        # struct pads short 's' values with NULs. uid/gid above 0o7777777 do not fit.
        header = struct.pack(
            self.HEADER_FORMAT,
            name,
            f'{stat.st_mode & 0o7777:07o}\x00'.encode(),  # permission bits only
            f'{stat.st_uid:07o}\x00'.encode(),
            f'{stat.st_gid:07o}\x00'.encode(),
            f'{len(data):011o}\x00'.encode(),
            f'{int(stat.st_mtime):011o}\x00'.encode(),
            b' ' * 8,        # checksum field counts as eight spaces while summing
            b'0',            # typeflag: regular file
            b'',             # linkname
            b'ustar\x00',
            b'00',
            user,            # uname
            user,            # gname
            b'0000000\x00',  # devmajor
            b'0000000\x00',  # devminor
            b'',             # prefix
            b'',             # pad
        )
        checksum = sum(header)
        # Conventional layout: six octal digits, a NUL, and a space.
        header = header[:148] + f'{checksum:06o}\x00 '.encode() + header[156:]
        tar_file.write(header)
        tar_file.write(data)
        # Pad the data with zero bytes up to the next 512-byte boundary.
        tar_file.write(b'\x00' * (-len(data) % self.BLOCK_SIZE))
Usage example: handler = TarHandler('example.tar'); handler.read_and_print_properties(); handler.write_tar('new.tar', ['file1.txt', 'file2.txt']).
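One way to sanity-check a hand-written reader is to run the same header walk over an archive produced by a trusted writer. The sketch below builds a small USTAR archive in memory with the standard library's `tarfile` module and then walks its 512-byte headers manually; `list_members` is a hypothetical helper written for this illustration, and it assumes plain octal size fields.

```python
import io
import tarfile


def list_members(archive: bytes):
    """Yield (name, size) by walking 512-byte headers until a zero record."""
    offset = 0
    while offset + 512 <= len(archive):
        header = archive[offset:offset + 512]
        if header == bytes(512):  # first zero record ends the archive
            break
        name = header[:100].split(b"\x00", 1)[0].decode("ascii")
        size_text = header[124:136].split(b"\x00", 1)[0].strip()
        size = int(size_text, 8) if size_text else 0
        yield name, size
        # Jump over the header and the data padded to whole records.
        offset += 512 + (size + 511) // 512 * 512


# Build a two-member archive in memory as reference input.
buf = io.BytesIO()
with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
    for member_name, payload in [("a.txt", b"hello"), ("b.bin", bytes(700))]:
        info = tarfile.TarInfo(member_name)
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))

print(list(list_members(buf.getvalue())))
# [('a.txt', 5), ('b.bin', 700)]
```

The same idea works in the other direction: an archive written by a custom writer can be opened with `tarfile.open` to confirm that a standard reader accepts it.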
5. Java Class for .TAR File Handling
The following Java class can open a .TAR file, decode headers, read and print properties for each entry, and write a simple .TAR archive.
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.attribute.BasicFileAttributes;

public class TarHandler {
    private static final int BLOCK_SIZE = 512;
    private final String filepath;

    public TarHandler(String filepath) {
        this.filepath = filepath;
    }

    public void readAndPrintProperties() throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(filepath, "r")) {
            long offset = 0;
            byte[] header = new byte[BLOCK_SIZE];
            while (true) {
                // Stop at a short read or at the first all-zero end-of-archive record.
                if (file.read(header) < BLOCK_SIZE || isAllZero(header)) {
                    break;
                }
                Properties props = decodeHeader(header);
                System.out.println("Entry at offset " + offset + ":");
                props.print();
                System.out.println();
                // Skip the data, which is padded to a whole number of blocks.
                long dataBlocks = (props.size + BLOCK_SIZE - 1) / BLOCK_SIZE;
                file.seek(file.getFilePointer() + dataBlocks * BLOCK_SIZE);
                offset += BLOCK_SIZE + dataBlocks * BLOCK_SIZE;
            }
        }
    }

    private Properties decodeHeader(byte[] data) {
        Properties props = new Properties();
        props.name = text(data, 0, 100);
        props.mode = (int) octal(data, 100, 8);
        props.uid = (int) octal(data, 108, 8);
        props.gid = (int) octal(data, 116, 8);
        props.size = octal(data, 124, 12);
        props.mtime = octal(data, 136, 12);
        props.checksum = (int) octal(data, 148, 8);
        props.typeflag = (char) data[156];
        props.linkname = text(data, 157, 100);
        props.magic = text(data, 257, 6);
        props.version = new String(data, 263, 2, StandardCharsets.US_ASCII);
        props.uname = text(data, 265, 32);
        props.gname = text(data, 297, 32);
        props.devmajor = (int) octal(data, 329, 8);
        props.devminor = (int) octal(data, 337, 8);
        props.prefix = text(data, 345, 155);
        return props;
    }

    // Decodes a NUL-terminated ASCII field, ignoring anything after the first NUL.
    private static String text(byte[] data, int offset, int length) {
        int end = offset;
        while (end < offset + length && data[end] != 0) {
            end++;
        }
        return new String(data, offset, end - offset, StandardCharsets.US_ASCII);
    }

    // Decodes a NUL/space-padded octal field; an empty field reads as 0.
    private static long octal(byte[] data, int offset, int length) {
        String digits = text(data, offset, length).trim();
        return digits.isEmpty() ? 0 : Long.parseLong(digits, 8);
    }

    private boolean isAllZero(byte[] data) {
        for (byte b : data) {
            if (b != 0) return false;
        }
        return true;
    }

    public void writeTar(String outputPath, String[] filesToAdd) throws IOException {
        try (FileOutputStream tarFile = new FileOutputStream(outputPath)) {
            for (String filePath : filesToAdd) {
                addFileToTar(tarFile, filePath);
            }
            // End of archive: two zero-filled records.
            tarFile.write(new byte[BLOCK_SIZE * 2]);
        }
    }

    // Copies an ASCII string into the header, truncating it to maxLength bytes.
    private static void put(byte[] header, int offset, String value, int maxLength) {
        byte[] bytes = value.getBytes(StandardCharsets.US_ASCII);
        System.arraycopy(bytes, 0, header, offset, Math.min(bytes.length, maxLength));
    }

    private void addFileToTar(FileOutputStream tarFile, String filePath) throws IOException {
        File file = new File(filePath);
        BasicFileAttributes attrs = Files.readAttributes(file.toPath(), BasicFileAttributes.class);
        long size = attrs.size();
        long mtime = attrs.lastModifiedTime().toMillis() / 1000;

        byte[] header = new byte[BLOCK_SIZE];
        put(header, 0, file.getName(), 100);
        put(header, 100, String.format("%07o\0", 0644), 8);  // fixed mode rw-r--r--
        put(header, 108, String.format("%07o\0", 0), 8);     // uid (placeholder)
        put(header, 116, String.format("%07o\0", 0), 8);     // gid (placeholder)
        put(header, 124, String.format("%011o\0", size), 12);
        put(header, 136, String.format("%011o\0", mtime), 12);
        header[156] = '0';                                   // typeflag: regular file
        put(header, 257, "ustar\0", 6);
        put(header, 263, "00", 2);
        String user = System.getProperty("user.name");
        put(header, 265, user, 31);                          // uname, leaving room for the NUL
        put(header, 297, user, 31);                          // gname (assumption: same as user)
        put(header, 329, "0000000\0", 8);                    // devmajor
        put(header, 337, "0000000\0", 8);                    // devminor

        // Checksum: unsigned sum of all bytes, with the checksum field counted as 8 spaces.
        int checksum = 0;
        for (byte b : header) checksum += b & 0xFF;
        checksum += 8 * ' ';
        put(header, 148, String.format("%06o\0 ", checksum), 8);

        tarFile.write(header);
        try (FileInputStream src = new FileInputStream(file)) {
            byte[] buffer = new byte[8192];
            int len;
            while ((len = src.read(buffer)) != -1) {
                tarFile.write(buffer, 0, len);
            }
        }
        // Pad the data with zero bytes up to the next 512-byte boundary.
        int padding = (int) ((BLOCK_SIZE - size % BLOCK_SIZE) % BLOCK_SIZE);
        tarFile.write(new byte[padding]);
    }

    static class Properties {
        String name, magic, version, uname, gname, linkname, prefix;
        int mode, uid, gid, checksum, devmajor, devminor;
        long size, mtime;
        char typeflag;

        void print() {
            System.out.println("  name: " + name);
            System.out.println("  mode: " + Integer.toOctalString(mode));
            System.out.println("  uid: " + uid);
            System.out.println("  gid: " + gid);
            System.out.println("  size: " + size);
            System.out.println("  mtime: " + mtime);
            System.out.println("  checksum: " + checksum);
            System.out.println("  typeflag: " + typeflag);
            System.out.println("  linkname: " + linkname);
            System.out.println("  magic: " + magic);
            System.out.println("  version: " + version);
            System.out.println("  uname: " + uname);
            System.out.println("  gname: " + gname);
            System.out.println("  devmajor: " + devmajor);
            System.out.println("  devminor: " + devminor);
            System.out.println("  prefix: " + prefix);
        }
    }
}
Usage example: TarHandler handler = new TarHandler("example.tar"); handler.readAndPrintProperties(); handler.writeTar("new.tar", new String[]{"file1.txt", "file2.txt"});
6. JavaScript Class for .TAR File Handling
The following JavaScript class (for Node.js) can open a .TAR file, decode headers, read and print properties for each entry to console, and write a simple .TAR archive.
const fs = require('fs');
const path = require('path');

class TarHandler {
  constructor(filepath) {
    this.filepath = filepath;
    this.BLOCK_SIZE = 512;
  }

  readAndPrintProperties() {
    const data = fs.readFileSync(this.filepath);
    let offset = 0;
    // Only full 512-byte records can be headers.
    while (offset + this.BLOCK_SIZE <= data.length) {
      const header = data.subarray(offset, offset + this.BLOCK_SIZE);
      // The first all-zero record marks the end of the archive.
      if (this.isAllZero(header)) break;
      const properties = this.decodeHeader(header);
      console.log(`Entry at offset ${offset}:`);
      console.log(properties);
      console.log();
      // Skip the data, which is padded to a whole number of blocks.
      const dataBlocks = Math.ceil(properties.size / this.BLOCK_SIZE);
      offset += this.BLOCK_SIZE + dataBlocks * this.BLOCK_SIZE;
    }
  }

  decodeHeader(data) {
    return {
      name: this.asciiToString(data, 0, 100),
      mode: this.octal(data, 100, 8).toString(8), // shown in octal, e.g. "644"
      uid: this.octal(data, 108, 8),
      gid: this.octal(data, 116, 8),
      size: this.octal(data, 124, 12),
      mtime: this.octal(data, 136, 12),
      checksum: this.octal(data, 148, 8),
      typeflag: this.asciiToString(data, 156, 1),
      linkname: this.asciiToString(data, 157, 100),
      magic: this.asciiToString(data, 257, 6),
      version: this.asciiToString(data, 263, 2),
      uname: this.asciiToString(data, 265, 32),
      gname: this.asciiToString(data, 297, 32),
      devmajor: this.octal(data, 329, 8),
      devminor: this.octal(data, 337, 8),
      prefix: this.asciiToString(data, 345, 155),
    };
  }

  // Reads a NUL-terminated ASCII field.
  asciiToString(data, start, length) {
    let str = '';
    for (let i = start; i < start + length; i++) {
      if (data[i] === 0) break;
      str += String.fromCharCode(data[i]);
    }
    return str;
  }

  // Reads a NUL/space-padded octal field; an empty field yields 0 instead of NaN.
  octal(data, start, length) {
    return parseInt(this.asciiToString(data, start, length).trim(), 8) || 0;
  }

  isAllZero(data) {
    return data.every(b => b === 0);
  }

  writeTar(outputPath, filesToAdd) {
    const tarData = [];
    filesToAdd.forEach(filePath => this.addFileToTar(tarData, filePath));
    // End of archive: two zero-filled records.
    tarData.push(Buffer.alloc(this.BLOCK_SIZE * 2));
    fs.writeFileSync(outputPath, Buffer.concat(tarData));
  }

  addFileToTar(tarData, filePath) {
    const stats = fs.statSync(filePath);
    const fileData = fs.readFileSync(filePath);
    const header = Buffer.alloc(this.BLOCK_SIZE);
    // Zero-padded octal digits followed by a NUL, filling `width` bytes.
    const octalField = (value, width) => value.toString(8).padStart(width - 1, '0') + '\0';

    header.write(path.basename(filePath), 0, 100, 'ascii');
    header.write(octalField(stats.mode & 0o777, 8), 100, 8, 'ascii');
    header.write(octalField(0, 8), 108, 8, 'ascii'); // uid (placeholder)
    header.write(octalField(0, 8), 116, 8, 'ascii'); // gid (placeholder)
    header.write(octalField(fileData.length, 12), 124, 12, 'ascii');
    header.write(octalField(Math.floor(stats.mtime.getTime() / 1000), 12), 136, 12, 'ascii');
    header[156] = '0'.charCodeAt(0); // typeflag: regular file
    header.write('ustar\0', 257, 6, 'ascii');
    header.write('00', 263, 2, 'ascii');
    const user = process.env.USER || 'user';
    header.write(user, 265, 31, 'ascii'); // uname, leaving room for the NUL
    header.write(user, 297, 31, 'ascii'); // gname (assumption: same as user)
    header.write(octalField(0, 8), 329, 8, 'ascii'); // devmajor
    header.write(octalField(0, 8), 337, 8, 'ascii'); // devminor

    // Checksum: sum of all bytes, with the checksum field counted as 8 spaces.
    let checksum = 0;
    for (const byte of header) checksum += byte;
    checksum += 8 * ' '.charCodeAt(0);
    header.write(checksum.toString(8).padStart(6, '0') + '\0 ', 148, 8, 'ascii');

    tarData.push(header);
    tarData.push(fileData);
    // Pad the data with zero bytes up to the next 512-byte boundary.
    const padding = (this.BLOCK_SIZE - (fileData.length % this.BLOCK_SIZE)) % this.BLOCK_SIZE;
    tarData.push(Buffer.alloc(padding));
  }
}

module.exports = TarHandler;
Usage example: const TarHandler = require('./TarHandler'); const handler = new TarHandler('example.tar'); handler.readAndPrintProperties(); handler.writeTar('new.tar', ['file1.txt', 'file2.txt']);
7. C++ Class for .TAR File Handling
The following C++ class can open a .TAR file, decode headers, read and print properties for each entry to console, and write a simple .TAR archive.
#include <algorithm>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <sys/stat.h>
#include <unistd.h>

class TarHandler {
private:
    std::string filepath;
    static const int BLOCK_SIZE = 512;

    struct Properties {
        std::string name;
        int mode = 0;
        int uid = 0;
        int gid = 0;
        long size = 0;
        long mtime = 0;
        int checksum = 0;
        char typeflag = '\0';
        std::string linkname;
        std::string magic;
        std::string version;
        std::string uname;
        std::string gname;
        int devmajor = 0;
        int devminor = 0;
        std::string prefix;

        void print() const {
            std::cout << "  name: " << name << '\n';
            std::cout << "  mode: " << std::oct << mode << std::dec << '\n';
            std::cout << "  uid: " << uid << '\n';
            std::cout << "  gid: " << gid << '\n';
            std::cout << "  size: " << size << '\n';
            std::cout << "  mtime: " << mtime << '\n';
            std::cout << "  checksum: " << checksum << '\n';
            std::cout << "  typeflag: " << typeflag << '\n';
            std::cout << "  linkname: " << linkname << '\n';
            std::cout << "  magic: " << magic << '\n';
            std::cout << "  version: " << version << '\n';
            std::cout << "  uname: " << uname << '\n';
            std::cout << "  gname: " << gname << '\n';
            std::cout << "  devmajor: " << devmajor << '\n';
            std::cout << "  devminor: " << devminor << '\n';
            std::cout << "  prefix: " << prefix << '\n';
        }
    };

    // Reads a NUL-terminated text field of at most `length` bytes.
    static std::string text(const char* data, size_t offset, size_t length) {
        const char* start = data + offset;
        return std::string(start, std::find(start, start + length, '\0'));
    }

    // Reads a NUL/space-padded octal field; an empty or non-numeric field yields 0.
    static long octal(const char* data, size_t offset, size_t length) {
        return std::strtol(text(data, offset, length).c_str(), nullptr, 8);
    }

    Properties decodeHeader(const char* data) {
        Properties props;
        props.name = text(data, 0, 100);
        props.mode = static_cast<int>(octal(data, 100, 8));
        props.uid = static_cast<int>(octal(data, 108, 8));
        props.gid = static_cast<int>(octal(data, 116, 8));
        props.size = octal(data, 124, 12);
        props.mtime = octal(data, 136, 12);
        props.checksum = static_cast<int>(octal(data, 148, 8));
        props.typeflag = data[156];
        props.linkname = text(data, 157, 100);
        props.magic = text(data, 257, 6);
        props.version = std::string(data + 263, 2);
        props.uname = text(data, 265, 32);
        props.gname = text(data, 297, 32);
        props.devmajor = static_cast<int>(octal(data, 329, 8));
        props.devminor = static_cast<int>(octal(data, 337, 8));
        props.prefix = text(data, 345, 155);
        return props;
    }

    bool isAllZero(const char* data) {
        for (int i = 0; i < BLOCK_SIZE; ++i) {
            if (data[i] != 0) return false;
        }
        return true;
    }

    void addFileToTar(std::ofstream& tarFile, const std::string& filePath) {
        struct stat stats;
        if (stat(filePath.c_str(), &stats) != 0) return;
        // Open the source first so no header is written for an unreadable file.
        std::ifstream src(filePath, std::ios::binary);
        if (!src) return;
        std::string name = filePath.substr(filePath.find_last_of('/') + 1);

        char header[BLOCK_SIZE] = {0};
        std::strncpy(header, name.c_str(), 100);
        std::snprintf(header + 100, 8, "%07o", static_cast<unsigned>(stats.st_mode & 0777));
        std::snprintf(header + 108, 8, "%07o", static_cast<unsigned>(stats.st_uid));
        std::snprintf(header + 116, 8, "%07o", static_cast<unsigned>(stats.st_gid));
        std::snprintf(header + 124, 12, "%011lo", static_cast<unsigned long>(stats.st_size));
        std::snprintf(header + 136, 12, "%011lo", static_cast<unsigned long>(stats.st_mtime));
        header[156] = '0';  // typeflag: regular file
        std::memcpy(header + 257, "ustar", 6);  // six bytes including the terminating NUL
        std::memcpy(header + 263, "00", 2);
        const char* login = getlogin();
        std::string user = login ? login : "user";
        std::strncpy(header + 265, user.c_str(), 31);  // uname, leaving room for the NUL
        std::strncpy(header + 297, user.c_str(), 31);  // gname (assumption: same as user)
        std::snprintf(header + 329, 8, "%07o", 0u);    // devmajor
        std::snprintf(header + 337, 8, "%07o", 0u);    // devminor

        // Checksum: unsigned sum of all bytes, with the checksum field set to 8 spaces.
        std::memset(header + 148, ' ', 8);
        unsigned int checksum = 0;
        for (int i = 0; i < BLOCK_SIZE; ++i) checksum += static_cast<unsigned char>(header[i]);
        // Six octal digits plus the NUL written by snprintf; byte 155 stays a space.
        std::snprintf(header + 148, 7, "%06o", checksum);
        tarFile.write(header, BLOCK_SIZE);

        char buffer[4096];
        while (src) {
            src.read(buffer, sizeof(buffer));
            tarFile.write(buffer, src.gcount());  // gcount() is short on the last read
        }
        // Pad the data with zero bytes up to the next 512-byte boundary.
        const std::streamsize padding = (BLOCK_SIZE - stats.st_size % BLOCK_SIZE) % BLOCK_SIZE;
        const char pad[BLOCK_SIZE] = {0};
        tarFile.write(pad, padding);
    }

public:
    TarHandler(const std::string& filepath) : filepath(filepath) {}

    void readAndPrintProperties() {
        std::ifstream file(filepath, std::ios::binary);
        if (!file) return;
        long offset = 0;
        char header[BLOCK_SIZE];
        while (true) {
            file.read(header, BLOCK_SIZE);
            // Stop at a short read or at the first all-zero end-of-archive record.
            if (file.gcount() < BLOCK_SIZE || isAllZero(header)) break;
            Properties props = decodeHeader(header);
            std::cout << "Entry at offset " << offset << ":\n";
            props.print();
            std::cout << '\n';
            // Skip the data, which is padded to a whole number of blocks.
            long dataBlocks = (props.size + BLOCK_SIZE - 1) / BLOCK_SIZE;
            file.seekg(dataBlocks * BLOCK_SIZE, std::ios::cur);
            offset += BLOCK_SIZE + dataBlocks * BLOCK_SIZE;
        }
    }

    void writeTar(const std::string& outputPath, const std::vector<std::string>& filesToAdd) {
        std::ofstream tarFile(outputPath, std::ios::binary);
        if (!tarFile) return;
        for (const auto& filePath : filesToAdd) {
            addFileToTar(tarFile, filePath);
        }
        // End of archive: two zero-filled records.
        const char end[BLOCK_SIZE * 2] = {0};
        tarFile.write(end, BLOCK_SIZE * 2);
    }
};
Usage example: TarHandler handler("example.tar"); handler.readAndPrintProperties(); handler.writeTar("new.tar", {"file1.txt", "file2.txt"});