Dataset information

In ttl2html, you can describe information about an entire dataset (e.g., title, author, release date, data size) in RDF format and input it into the tool. The tool then displays this information on the main page and the “About” page, giving users a clear overview before they use the dataset.

The dataset metadata expected by this tool consists of three main parts:

  • Metadata for the entire dataset

  • Contact information

  • Version history information

The basic data structure is illustrated in the following diagram:

[Figure: Data model for the dataset information (_images/dataset-model.png)]

The table below lists the metadata vocabularies and their namespaces used in the model diagram and in the descriptions that follow:

Metadata vocabulary used (prefix and namespace)

Metadata vocabulary                                 Prefix  Namespace URI
--------------------------------------------------  ------  -------------------------------------
DCMI Metadata Terms                                 dct:    http://purl.org/dc/terms/
RDF Schema                                          rdfs:   http://www.w3.org/2000/01/rdf-schema#
FOAF (Friend of a Friend) Vocabulary                foaf:   http://xmlns.com/foaf/0.1/
VoID (Vocabulary of Interlinked Datasets)           void:   http://rdfs.org/ns/void#
PAV (Provenance Authoring and Versioning ontology)  pav:    http://purl.org/pav/
DCAT (Data Catalog Vocabulary)                      dcat:   http://www.w3.org/ns/dcat#
PROV-O (PROV ontology)                              prov:   http://www.w3.org/ns/prov#

Metadata for the Entire Dataset

The resource labeled “Entire Dataset” in the model represents the dataset as a whole. Metadata about the dataset is attached as properties of this resource.

  • A resource linked with pav:hasCurrentVersion represents the “latest release version” of the dataset and describes the Linked Data currently available. The tool treats the presence of a pav:hasCurrentVersion property as the signal that dataset information is included, and it then outputs that information automatically.

  • A resource linked with dct:publisher represents the person or organization providing the dataset. This information is displayed as “contact information” in the Linked Data output.

  • Resources linked with pav:hasVersion represent “previous versions” and serve as historical information.

The following properties are available for describing the overall dataset:

Properties for Entire Dataset Metadata

Property         Description
---------------  ------------------------------------------------------------------------
rdf:type         void:Dataset
dct:title        Title of the dataset
dct:description  Description of the dataset
dct:license      Dataset license; use a URI such as a Creative Commons license if possible
foaf:homepage    URI of the published dataset site
foaf:page        Additional page for the dataset (if its URI differs from foaf:homepage)
dct:publisher    Dataset publisher (see “Contact Information” below)

Example (Turtle):

@prefix dct:  <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix pav:  <http://purl.org/pav/> .
@prefix ex:   <http://example.org/dataset/> .

ex:dataset1 a void:Dataset ;
    dct:title "Sample RDF Dataset"@en ;
    dct:description "An example dataset for demonstrating ttl2html metadata"@en ;
    dct:license <https://creativecommons.org/licenses/by/4.0/> ;
    foaf:homepage <http://example.org/dataset/> ;
    dct:publisher ex:project1 ;
    pav:hasCurrentVersion ex:dataset1-v2 ;
    pav:hasVersion ex:dataset1-v1 .

Contact Information

The following properties can be used to describe the provider of the dataset. If the provider consists of multiple people, represent the contact resource as an instance of the foaf:Project class and link each member with the foaf:member property.

Properties for contact information (project/multiple people)

Property     Description
-----------  ------------------------------------------------------------------------
rdf:type     foaf:Project
foaf:name    Name of the project
foaf:member  Member of the project (repeatable); links to a resource representing an individual, as described below

Properties for individual contact information

Property                 Description
-----------------------  --------------------------------------------------------
rdf:type                 foaf:Person
foaf:name                Name of the individual
foaf:mbox                Email address
vcard:organization-name  Name of the organization to which the individual belongs

Example (Turtle):

ex:project1 a foaf:Project ;
    foaf:name "Example Project" ;
    foaf:member ex:alice ;
    foaf:member ex:bob .

ex:alice a foaf:Person ;
    foaf:name "Alice Example" ;
    foaf:mbox <mailto:alice@example.org> ;
    <http://www.w3.org/2006/vcard/ns#organization-name> "Example University" .

ex:bob a foaf:Person ;
    foaf:name "Bob Example" ;
    foaf:mbox <mailto:bob@example.org> .
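
The text above only calls for foaf:Project when the provider consists of multiple people. For a single maintainer, one plausible form is to point dct:publisher directly at a foaf:Person, as in the minimal sketch below. This is an assumption drawn from that wording rather than documented behavior, and ex:carol and her details are hypothetical.

```turtle
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix ex:   <http://example.org/dataset/> .

# Assumption: the publisher is a single individual, so no
# intermediate foaf:Project resource is used.
ex:dataset1 a void:Dataset ;
    dct:publisher ex:carol .

# Hypothetical contact person described with the individual properties above.
ex:carol a foaf:Person ;
    foaf:name "Carol Example" ;
    foaf:mbox <mailto:carol@example.org> .
```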

Version History Information

Version history information provides details about dataset revisions over time. This information is represented using the PAV (Provenance Authoring and Versioning ontology).

  • The latest version is linked from the “Entire Dataset” resource with the pav:hasCurrentVersion property.

  • Past versions are linked with the pav:hasVersion property.

The following properties can be used for each version resource:

Properties for version history information

Property                Description
----------------------  ----------------------------------------------------------------------
rdf:type                prov:Dataset
dct:title               Version title
dct:issued              Release date of the version
pav:version             Version number
dcat:byteSize           File size of the dataset, in bytes
void:triples            Number of triples in the dataset
void:dataDump           URI of the dataset file
prov:qualifiedRevision  Resource describing revision details (can be a blank node)
prov:wasDerivedFrom     Source resource from which the data was obtained (can be a blank node)

Example (Turtle):

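# Prefixes not declared in the earlier examples
# (dct:, pav:, void:, and ex: are declared in the first example).
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
@prefix prov: <http://www.w3.org/ns/prov#> .

ex:dataset1-v2 a prov:Dataset ;
    dct:title "Dataset Version 2.0" ;
    pav:version "2.0" ;
    dct:issued "2025-12-25" ;
    dcat:byteSize 123456 ;
    void:triples 50000 ;
    void:dataDump <http://example.org/dataset/v2/dump.nt.gz> ;
    prov:qualifiedRevision ex:revnote-v2 ;
    prov:wasDerivedFrom [
      rdf:value <https://example.go.jp/sample-project/> ;
      rdfs:label "Project Report 2022-2024"
    ] .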

ex:revnote-v2 a prov:Revision ;
    rdfs:comment "Second release: added new data and fixed errors in metadata"@en ;
    rdfs:seeAlso <http://example.org/dataset/v2/changelog> .

ex:dataset1-v1 a prov:Dataset ;
    dct:title "Dataset Version 1.0" ;
    pav:version "1.0" ;
    void:triples 30000 ;
    void:dataDump <http://example.org/dataset/v1/dump.nt.gz> .

Revision Details

The value of prov:qualifiedRevision may contain the following properties:

Properties for revision details

Property      Description
------------  ----------------------------------------------------
rdf:type      prov:Revision
rdfs:comment  Description of the revision
rdfs:seeAlso  URI with more details on the revision (if available)
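
The version history table notes that the value of prov:qualifiedRevision can be a blank node, whereas the earlier example uses a named resource (ex:revnote-v2). The sketch below shows the same revision details written inline as a blank node. It reuses the sample names and URLs from that example and only changes the notation.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix ex:   <http://example.org/dataset/> .

# Revision details attached inline instead of through a named resource.
ex:dataset1-v2 prov:qualifiedRevision [
    a prov:Revision ;
    rdfs:comment "Second release: added new data and fixed errors in metadata"@en ;
    rdfs:seeAlso <http://example.org/dataset/v2/changelog>
] .
```

The inline form keeps the revision note next to the version it describes. A named resource is the better fit when the note needs its own URI.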

Source Information

The prov:wasDerivedFrom property, assigned to a dataset version resource, can be used to represent source information. By describing the value of this property as a blank node with the structure shown below, the source of a published dataset can be explicitly indicated. This method can also be used to provide attribution required by licenses such as CC-BY.

The resource used as the value of prov:wasDerivedFrom (typically assumed to be a blank node) should be assigned at least two properties: rdf:value and rdfs:label. The rdf:value property should contain the URI of the source, while rdfs:label should provide the name of the source (a human-readable label).
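
As a minimal sketch of the structure just described, the fragment below attaches one source to a version resource using only rdf:value and rdfs:label. The source URL and label are hypothetical placeholders, chosen to illustrate a CC-BY style attribution.

```turtle
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix ex:   <http://example.org/dataset/> .

ex:dataset1-v2 prov:wasDerivedFrom [
    # URI of the source (hypothetical)
    rdf:value <https://example.org/source-data/> ;
    # Human-readable name of the source (hypothetical)
    rdfs:label "Example Source Data (CC BY 4.0)"@en
] .
```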

License Information

License information can be expressed not only as a single URI, but also as an extended representation that adds explanatory text to the URI.

License Information Properties

Property        Description
--------------  ---------------------------------------------------
rdf:value       URI representing the license itself
rdfs:label      Human-readable text describing the license
foaf:thumbnail  URI of a thumbnail image of the license information

ex:dataset1 a void:Dataset ;
   ...
   dct:license ex:license ;
   ... .

ex:license
   rdf:value <https://creativecommons.org/licenses/by/4.0/> ;
   rdfs:label "Creative Commons Attribution 4.0 International (CC BY 4.0)" ;
   foaf:thumbnail ex:license.png .

License information can also be expressed using blank nodes as follows:

ex:dataset1 a void:Dataset ;
   ...
   dct:license [
      rdf:value <https://creativecommons.org/licenses/by/4.0/> ;
      rdfs:label "Creative Commons Attribution 4.0 International (CC BY 4.0)" ;
      foaf:thumbnail ex:license.png
   ] .

SPARQL Endpoint Information

ttl2html can display the location of a SPARQL endpoint on the main page and the “About” page when the endpoint is described in the input RDF (Turtle).

To express the endpoint location in RDF, use one of the following methods:

  • Method A (recommended): Express as a DataService using DCAT’s dcat:accessService.

    You can specify not only the endpoint URL but also the landing page.

  • Method B (simplified): Write only VoID’s void:sparqlEndpoint.

    This provides a minimal endpoint URI, but no landing page.

In the following examples, _:toplevel denotes the resource representing the entire dataset.

Method A: Express using dcat:accessService (DataService)

The following triples can be added as part of the metadata for the entire dataset.

@prefix void: <http://rdfs.org/ns/void#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .

_:toplevel a void:Dataset, dcat:Dataset ;
   dcat:accessService [
      a dcat:DataService;
      dcat:endpointURL <https://dydra.com/masao/jp-naaa/sparql>;
      dcat:landingPage <https://dydra.com/masao/jp-naaa/@query>
   ] .

The properties in the example above have the following meanings:

  • dcat:accessService: Service for accessing this dataset

  • a dcat:DataService: Type of service (DataService)

  • dcat:endpointURL: SPARQL endpoint URL

  • dcat:landingPage: Landing page for humans (e.g., query UI, description page)

Method B: Express using void:sparqlEndpoint

The following triples can be added as part of the metadata for the entire dataset.

@prefix void: <http://rdfs.org/ns/void#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .

_:toplevel a void:Dataset, dcat:Dataset ;
   void:sparqlEndpoint <https://dydra.com/masao/jp-naaa/sparql> .

Note that when using void:sparqlEndpoint, you can only add endpoint URIs for machine access; you cannot describe search or description pages for humans.