Task 128: .DATS File Format
.DATS File Format Specifications
The .DATS file format refers to the Data Tag Suite (DATS), a JSON-based schema developed for describing datasets, primarily in biomedical and scientific contexts. It is designed to enhance dataset discoverability. DATS documents are serialized as JSON and usually carry a .json extension rather than a dedicated .dats one; what makes a file a DATS file is that its content conforms to the DATS structure. The specifications are defined in JSON Schema documents hosted in the DATS repository.
1. List of Properties Intrinsic to the File Format
The DATS format is structured as a JSON object, with the root representing a "Dataset" entity. The intrinsic properties are the keys defined in the schema, each with specific types, descriptions, and cardinalities. Below is a comprehensive list derived from the DATS dataset schema:
- identifier: Type: object (reference to identifier_info_schema). Description: Unique identifier for the dataset. Required: No. Cardinality: 0..1.
- alternateIdentifiers: Type: array of objects (reference to alternate_identifier_info_schema). Description: Alternative identifiers for the dataset. Required: No. Cardinality: 0..*.
- relatedIdentifiers: Type: array of objects (reference to related_identifier_info_schema). Description: Identifiers for related resources. Required: No. Cardinality: 0..*.
- title: Type: string. Description: The name or title of the dataset. Required: Yes. Cardinality: 1.
- description: Type: string. Description: A textual summary of the dataset. Required: No. Cardinality: 0..1.
- types: Type: array of objects (reference to data_type_schema). Description: Types or categories of the data. Required: Yes. Cardinality: 1..*.
- keywords: Type: array of objects (reference to annotation_schema). Description: Keywords associated with the dataset. Required: No. Cardinality: 0..*.
- version: Type: string. Description: Version number of the dataset. Required: No. Cardinality: 0..1.
- licenses: Type: array of objects (reference to license_schema). Description: Licenses governing the dataset. Required: No. Cardinality: 0..*.
- creators: Type: array of objects (reference to creator_schema). Description: Entities that created the dataset. Required: No. Cardinality: 0..*.
- acknowledges: Type: array of objects (reference to funding_schema). Description: Acknowledgments or funding sources. Required: No. Cardinality: 0..*.
- dates: Type: array of objects (reference to date_schema). Description: Relevant dates (e.g., creation, update). Required: No. Cardinality: 0..*.
- privacy: Type: string. Description: Privacy level or restrictions. Required: No. Cardinality: 0..1.
- spatialCoverage: Type: array of objects (reference to place_schema). Description: Spatial or geographic coverage. Required: No. Cardinality: 0..*.
- temporalCoverage: Type: object (reference to period_of_time_schema). Description: Time period covered by the dataset. Required: No. Cardinality: 0..1.
- storedIn: Type: object (reference to data_repository_schema). Description: Repository where the dataset is stored. Required: No. Cardinality: 0..1.
- distributions: Type: array of objects (reference to dataset_distribution_schema). Description: Distribution formats or access points. Required: No. Cardinality: 0..*.
- primaryPublications: Type: array of objects (reference to publication_schema). Description: Primary publications related to the dataset. Required: No. Cardinality: 0..*.
- citations: Type: array of objects (reference to publication_schema). Description: Citations referencing the dataset. Required: No. Cardinality: 0..*.
- producedBy: Type: object (reference to study_schema). Description: Study or process that produced the dataset. Required: No. Cardinality: 0..1.
- isAbout: Type: array of objects (reference to biological_entity_schema). Description: Subjects or entities the dataset is about. Required: No. Cardinality: 0..*.
- hasPart: Type: array of objects (reference to dataset_schema). Description: Sub-datasets included. Required: No. Cardinality: 0..*.
- isPartOf: Type: array of objects (reference to dataset_schema). Description: Parent datasets. Required: No. Cardinality: 0..*.
- dimensions: Type: array of objects (reference to dimension_schema). Description: Dimensions or variables in the dataset. Required: No. Cardinality: 0..*.
- extraProperties: Type: array of objects (reference to category_values_pair_schema). Description: Additional custom properties. Required: No. Cardinality: 0..*.
These properties form the core structure, with references to sub-schemas for complex types.
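The required flags and cardinalities in the list above can be checked mechanically. The following is a minimal sketch, not the official DATS validator: the CARDINALITY table is transcribed from the list in this section rather than from the JSON Schema files, check_dataset is a hypothetical helper name, and the contents of nested objects are not inspected at all.

```python
# Cardinalities transcribed from the property list above (illustrative only,
# not read from the official JSON Schema). Forms ending in "*" are arrays.
CARDINALITY = {
    "identifier": "0..1", "alternateIdentifiers": "0..*",
    "relatedIdentifiers": "0..*", "title": "1", "description": "0..1",
    "types": "1..*", "keywords": "0..*", "version": "0..1",
    "licenses": "0..*", "creators": "0..*", "acknowledges": "0..*",
    "dates": "0..*", "privacy": "0..1", "spatialCoverage": "0..*",
    "temporalCoverage": "0..1", "storedIn": "0..1", "distributions": "0..*",
    "primaryPublications": "0..*", "citations": "0..*", "producedBy": "0..1",
    "isAbout": "0..*", "hasPart": "0..*", "isPartOf": "0..*",
    "dimensions": "0..*", "extraProperties": "0..*",
}


def check_dataset(doc):
    """Return a list of problems found; an empty list means none were found."""
    if not isinstance(doc, dict):
        return ["root must be a JSON object"]
    problems = []
    for name, card in CARDINALITY.items():
        required = card.startswith("1")  # "1" and "1..*"
        many = card.endswith("*")        # "0..*" and "1..*"
        if name not in doc:
            if required:
                problems.append(f"missing required property: {name}")
            continue
        value = doc[name]
        if many:
            if not isinstance(value, list):
                problems.append(f"{name} must be an array")
            elif required and not value:
                problems.append(f"{name} must contain at least one item")
        elif isinstance(value, list):
            problems.append(f"{name} must be a single value, not an array")
    return problems


# The entry inside "types" is a placeholder; its shape is not checked here.
example = {"title": "Demo dataset", "types": [{"information": {"value": "genomics"}}]}
print(check_dataset(example))  # → []
```

A check like this only catches structural slips such as a missing title or a scalar where an array is expected. Full conformance would still need validation against the referenced sub-schemas.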
2. Two Direct Download Links for .DATS Files
Note that DATS files are typically saved with a .json extension but conform to the DATS schema. Here are two direct download links to example files:
- https://raw.githubusercontent.com/bento-platform/bento_demo_dataset/main/dats.json
- https://raw.githubusercontent.com/conpdatasets/Multi-model_functionalization_of_disease-associated_PTEN_missense_mutations/master/DATS.json
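As a rough illustration of how one of these files might be retrieved and inspected, the sketch below downloads a URL with the Python standard library and lists the top-level keys of the parsed document. The function names fetch_dats and top_level_keys are hypothetical, and the example assumes the link is still reachable and returns a JSON object.

```python
import json
from urllib.request import urlopen


def fetch_dats(url, timeout=30):
    """Download a URL and parse the response body as a JSON object."""
    with urlopen(url, timeout=timeout) as response:
        doc = json.loads(response.read().decode("utf-8"))
    if not isinstance(doc, dict):
        raise ValueError("expected a JSON object at the root")
    return doc


def top_level_keys(doc):
    """Return the document's top-level property names in sorted order."""
    return sorted(doc)


# Example usage (requires network access):
# url = "https://raw.githubusercontent.com/bento-platform/bento_demo_dataset/main/dats.json"
# doc = fetch_dats(url)
# print(top_level_keys(doc))
```

Comparing the printed keys with the property list in section 1 is a quick way to see which optional properties a given file actually uses.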
3. Ghost Blog Embedded HTML JavaScript for Drag and Drop
The following is an HTML page with embedded JavaScript that can be embedded in a Ghost blog post. It allows users to drag and drop a .DATS (JSON) file, parses it, and dumps all the properties listed above to the screen.
4. Python Class for .DATS Files
The following Python class opens a .DATS file, decodes it as JSON, reads the properties, allows writing updated properties back to a file, and prints them to the console.
import json


class DATSHandler:
    """Load, inspect, and save a DATS (JSON) dataset description."""

    def __init__(self, filepath):
        self.filepath = filepath
        self.data = self.read()

    def read(self):
        # JSON text is expected to be UTF-8, so open the file explicitly as such.
        with open(self.filepath, 'r', encoding='utf-8') as f:
            return json.load(f)

    def print_properties(self):
        # Top-level properties from section 1; absent ones are skipped.
        properties = ['identifier', 'alternateIdentifiers', 'relatedIdentifiers', 'title', 'description', 'types', 'keywords', 'version', 'licenses', 'creators', 'acknowledges', 'dates', 'privacy', 'spatialCoverage', 'temporalCoverage', 'storedIn', 'distributions', 'primaryPublications', 'citations', 'producedBy', 'isAbout', 'hasPart', 'isPartOf', 'dimensions', 'extraProperties']
        for prop in properties:
            if prop in self.data:
                print(f"{prop}: {json.dumps(self.data[prop], indent=2)}")

    def write(self, new_filepath=None):
        # Overwrite the original file unless a new path is given.
        filepath = new_filepath or self.filepath
        with open(filepath, 'w', encoding='utf-8') as f:
            json.dump(self.data, f, indent=2)


# Example usage:
# handler = DATSHandler('example.dats')
# handler.print_properties()
# handler.data['title'] = 'Updated Title'
# handler.write('updated.dats')
5. Java Class for .DATS Files
The following Java class opens a .DATS file, decodes it as JSON using the org.json library (assumed to be on the classpath), reads the properties, allows writing updated properties back to a file, and prints them to the console.
import org.json.JSONArray;
import org.json.JSONException;
import org.json.JSONObject;
import java.io.*;

public class DATSHandler {
    private final String filepath;
    private final JSONObject data;

    public DATSHandler(String filepath) throws IOException, JSONException {
        this.filepath = filepath;
        this.data = read();
    }

    // Read the whole file and parse it as a single JSON object.
    private JSONObject read() throws IOException, JSONException {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader br = new BufferedReader(new FileReader(filepath))) {
            String line;
            while ((line = br.readLine()) != null) {
                sb.append(line).append('\n');
            }
        }
        return new JSONObject(sb.toString());
    }

    public void printProperties() {
        String[] properties = {"identifier", "alternateIdentifiers", "relatedIdentifiers", "title", "description", "types", "keywords", "version", "licenses", "creators", "acknowledges", "dates", "privacy", "spatialCoverage", "temporalCoverage", "storedIn", "distributions", "primaryPublications", "citations", "producedBy", "isAbout", "hasPart", "isPartOf", "dimensions", "extraProperties"};
        for (String prop : properties) {
            if (data.has(prop)) {
                // get() returns Object, which has no toString(int), so nested
                // objects and arrays are pretty-printed explicitly.
                Object value = data.get(prop);
                String text;
                if (value instanceof JSONObject) {
                    text = ((JSONObject) value).toString(2);
                } else if (value instanceof JSONArray) {
                    text = ((JSONArray) value).toString(2);
                } else {
                    text = String.valueOf(value);
                }
                System.out.println(prop + ": " + text);
            }
        }
    }

    // Pass null to overwrite the original file.
    public void write(String newFilepath) throws IOException {
        String path = (newFilepath != null) ? newFilepath : filepath;
        try (FileWriter file = new FileWriter(path)) {
            file.write(data.toString(2));
        }
    }

    public JSONObject getData() {
        return data;
    }

    // Example usage:
    // public static void main(String[] args) throws Exception {
    //     DATSHandler handler = new DATSHandler("example.dats");
    //     handler.printProperties();
    //     handler.getData().put("title", "Updated Title");
    //     handler.write("updated.dats");
    // }
}
6. JavaScript Class for .DATS Files
The following JavaScript class (for Node.js) opens a .DATS file, decodes it as JSON, reads the properties, allows writing updated properties back to a file, and prints them to the console.
const fs = require('fs');

class DATSHandler {
  constructor(filepath) {
    this.filepath = filepath;
    this.data = this.read();
  }

  read() {
    const content = fs.readFileSync(this.filepath, 'utf8');
    return JSON.parse(content);
  }

  printProperties() {
    const properties = ['identifier', 'alternateIdentifiers', 'relatedIdentifiers', 'title', 'description', 'types', 'keywords', 'version', 'licenses', 'creators', 'acknowledges', 'dates', 'privacy', 'spatialCoverage', 'temporalCoverage', 'storedIn', 'distributions', 'primaryPublications', 'citations', 'producedBy', 'isAbout', 'hasPart', 'isPartOf', 'dimensions', 'extraProperties'];
    properties.forEach(prop => {
      if (this.data.hasOwnProperty(prop)) {
        console.log(`${prop}: ${JSON.stringify(this.data[prop], null, 2)}`);
      }
    });
  }

  write(newFilepath = null) {
    const path = newFilepath || this.filepath;
    fs.writeFileSync(path, JSON.stringify(this.data, null, 2));
  }
}

// Example usage:
// const handler = new DATSHandler('example.dats');
// handler.printProperties();
// handler.data.title = 'Updated Title';
// handler.write('updated.dats');
7. C Class for .DATS Files
Since C does not have built-in classes, the following is implemented in C++ instead. It uses the nlohmann/json library (assumed to be available on the include path) to open a .DATS file, decode it as JSON, read the properties, allow writing updated properties back to a file, and print them to the console.
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <nlohmann/json.hpp>

using json = nlohmann::json;

class DATSHandler {
private:
    std::string filepath;
    json data;

public:
    DATSHandler(const std::string& fp) : filepath(fp) {
        data = read();
    }

    // Parse the file as JSON; nlohmann/json throws on malformed input.
    json read() {
        std::ifstream file(filepath);
        json j;
        file >> j;
        return j;
    }

    void printProperties() {
        std::vector<std::string> properties = {"identifier", "alternateIdentifiers", "relatedIdentifiers", "title", "description", "types", "keywords", "version", "licenses", "creators", "acknowledges", "dates", "privacy", "spatialCoverage", "temporalCoverage", "storedIn", "distributions", "primaryPublications", "citations", "producedBy", "isAbout", "hasPart", "isPartOf", "dimensions", "extraProperties"};
        for (const auto& prop : properties) {
            if (data.contains(prop)) {
                std::cout << prop << ": " << data[prop].dump(2) << std::endl;
            }
        }
    }

    // An empty path means "overwrite the original file".
    void write(const std::string& newFilepath = "") {
        std::string path = newFilepath.empty() ? filepath : newFilepath;
        std::ofstream file(path);
        file << data.dump(2);
    }

    json& getData() {
        return data;
    }
};

// Example usage:
// int main() {
//     DATSHandler handler("example.dats");
//     handler.printProperties();
//     handler.getData()["title"] = "Updated Title";
//     handler.write("updated.dats");
//     return 0;
// }