Dataset information
===================
In **ttl2html**, you can describe dataset-level information (e.g., title, author, release date, data size) in RDF and supply it to the tool as input.
The tool then displays this information on the main page and the "About" page, giving users a clear overview before they use the dataset.
The dataset metadata expected by this tool consists of three main parts:
- Metadata for the entire dataset
- Contact information
- Version history information
The basic data structure is illustrated in the following diagram:
.. figure:: dataset-model.png
Data model for the dataset information
The table below lists the metadata vocabularies and their namespaces used in the model diagram and in the descriptions that follow:
.. csv-table:: Metadata vocabulary used (prefix and namespace)
:header: "Metadata vocabulary", "Prefix", "Namespace URI"
:widths: auto
DCMI Metadata Terms, dct:, http://purl.org/dc/terms/
RDF Schema, rdfs:, http://www.w3.org/2000/01/rdf-schema#
FOAF (Friend of a Friend) Vocabulary, foaf:, http://xmlns.com/foaf/0.1/
VoID (Vocabulary of Interlinked Datasets), void:, http://rdfs.org/ns/void#
PAV (Provenance Authoring and Versioning ontology), pav:, http://purl.org/pav/
DCAT (Data Catalog Vocabulary), dcat:, http://www.w3.org/ns/dcat#
PROV-O (PROV ontology), prov:, http://www.w3.org/ns/prov#
Metadata for the Entire Dataset
-------------------------------
The resource labeled "Entire Dataset" in the model represents the dataset as a whole.
Metadata about the dataset is attached as properties of this resource.
- A resource linked with ``pav:hasCurrentVersion`` represents the latest release version of the dataset and carries the details of the available Linked Data. This property also acts as the trigger: when the tool finds a resource with ``pav:hasCurrentVersion``, it treats the input as containing dataset information and writes that information out automatically.
- A resource linked with ``dct:publisher`` represents the person or organization providing the dataset. This information is displayed as "contact information" in the Linked Data output.
- Resources linked with ``pav:hasVersion`` represent "previous versions" and serve as historical information.
The following properties are available for describing the overall dataset:
.. csv-table:: Properties for Entire Dataset Metadata
:header: Property, Description
:widths: auto
``rdf:type``, ``void:Dataset``
``dct:title``, Title of the dataset
``dct:description``, Description of the dataset
``dct:license``, Dataset license; use a URI such as a Creative Commons license if possible
``foaf:homepage``, URI of the published dataset site
``foaf:page``, Additional page for the dataset (if it has a different URI from above)
``dct:publisher``, Dataset publisher (see contact information below)
**Example (Turtle):**
.. code-block:: turtle
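@prefix dct: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix pav: <http://purl.org/pav/> .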
@prefix ex: .
ex:dataset1 a void:Dataset ;
dct:title "Sample RDF Dataset"@en ;
dct:description "An example dataset for demonstrating ttl2html metadata"@en ;
dct:license ;
foaf:homepage ;
dct:publisher ex:project1 ;
pav:hasCurrentVersion ex:dataset1-v2 ;
pav:hasVersion ex:dataset1-v1 .
Contact Information
-------------------
The following properties can be used to describe the provider of the dataset.
If the provider consists of multiple people, the contact resource should be represented as an instance of the ``foaf:Project`` class, and each member is linked with the ``foaf:member`` property.
.. csv-table:: Properties for contact information (project/multiple people)
:header: Property, Description
:widths: auto
``rdf:type``, ``foaf:Project``
``foaf:name``, Name of the project
``foaf:member``, Member of the project (may be repeated). Links to the resources representing individuals described below.
.. csv-table:: Properties for individual contact information
:header: Property, Description
:widths: auto
``rdf:type``, ``foaf:Person``
``foaf:name``, Name of the individual
``foaf:mbox``, Email address
``vcard:organization-name``, Name of the organization to which the individual belongs
**Example (Turtle):**
.. code-block:: turtle
ex:project1 a foaf:Project ;
foaf:name "Example Project" ;
foaf:member ex:alice ;
foaf:member ex:bob .
ex:alice a foaf:Person ;
foaf:name "Alice Example" ;
foaf:mbox ;
vcard:organization-name "Example University" .
ex:bob a foaf:Person ;
foaf:name "Bob Example" ;
foaf:mbox .
Version History Information
---------------------------
Version history information provides details about dataset revisions over time.
This information is represented using PAV (Provenance Authoring and Versioning ontology).
- The latest version is linked from the "Entire Dataset" resource with the ``pav:hasCurrentVersion`` property.
- Past versions are linked with the ``pav:hasVersion`` property.
The following properties can be used for each version resource:
.. csv-table:: Properties for version history information
:header: Property, Description
:widths: auto
``rdf:type``, ``prov:Dataset``
``dct:title``, Version title
``dct:issued``, Release date of the version
``pav:version``, Version number
``dcat:byteSize``, File size of the dataset in bytes
``void:triples``, Number of triples in the dataset
``void:dataDump``, URI of the dataset file
``prov:qualifiedRevision``, Resource describing revision details (can be a blank node)
``prov:wasDerivedFrom``, Source resource from which the data was obtained (can be a blank node)
**Example (Turtle):**
.. code-block:: turtle
ex:dataset1-v2 a prov:Dataset ;
dct:title "Dataset Version 2.0" ;
pav:version "2.0" ;
dct:issued "2025-12-25" ;
dcat:byteSize 123456 ;
void:triples 50000 ;
void:dataDump ;
prov:qualifiedRevision ex:revnote-v2 ;
prov:wasDerivedFrom [
rdf:value ;
rdfs:label "Project Report 2022-2024"
] .
ex:revnote-v2 a prov:Revision ;
rdfs:comment "Second release: added new data and fixed errors in metadata"@en ;
rdfs:seeAlso .
ex:dataset1-v1 a prov:Dataset ;
dct:title "Dataset Version 1.0" ;
pav:version "1.0" ;
void:triples 30000 ;
void:dataDump .
Revision Details
^^^^^^^^^^^^^^^^^^
The value of ``prov:qualifiedRevision`` may contain the following properties:
.. csv-table:: Properties for revision details
:header: Property, Description
:widths: auto
``rdf:type``, ``prov:Revision``
``rdfs:comment``, Description of the revision
``rdfs:seeAlso``, URI with more details on the revision (if available)
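
The revision details can also be written inline. The following is a minimal sketch, assuming a blank node is used as the value of ``prov:qualifiedRevision``; the ``ex:`` namespace and the ``rdfs:seeAlso`` URI are hypothetical placeholders chosen for illustration:

.. code-block:: turtle

   @prefix prov: <http://www.w3.org/ns/prov#> .
   @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
   # Hypothetical namespace, used for illustration only.
   @prefix ex: <https://example.org/> .

   ex:dataset1-v2 prov:qualifiedRevision [
       a prov:Revision ;
       rdfs:comment "Second release: added new data and fixed errors in metadata"@en ;
       # Hypothetical URI of a change log entry.
       rdfs:seeAlso <https://example.org/changelog/2.0>
   ] .

A named resource and a blank node carry the same properties; the blank node form simply keeps the revision note next to the version it describes.
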
Source Information
^^^^^^^^^^^^^^^^^^
The ``prov:wasDerivedFrom`` property, assigned to a dataset version resource, can be used to represent source information.
By describing the value of this property as a blank node with the structure shown below, the source of a published dataset can be explicitly indicated.
This method can also be used to provide attribution required by licenses such as CC-BY.
The resource used as the value of ``prov:wasDerivedFrom`` (typically assumed to be a blank node) should be assigned at least two properties: ``rdf:value`` and ``rdfs:label``.
The ``rdf:value`` property should contain the URI of the source, while ``rdfs:label`` should provide the name of the source (a human-readable label).
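
The structure described above can be sketched as follows. This is an illustrative fragment only: the ``ex:`` namespace and the source URI given as ``rdf:value`` are hypothetical, and the label is reused from the version example:

.. code-block:: turtle

   @prefix prov: <http://www.w3.org/ns/prov#> .
   @prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
   @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
   # Hypothetical namespace, used for illustration only.
   @prefix ex: <https://example.org/> .

   ex:dataset1-v2 prov:wasDerivedFrom [
       # Hypothetical URI of the source.
       rdf:value <https://example.org/reports/2022-2024> ;
       # Human-readable name of the source.
       rdfs:label "Project Report 2022-2024"
   ] .
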
License Information
-------------------
License information can be expressed not only as a single URI, but also as an extended representation that adds explanatory text to the URI.
.. csv-table:: License Information Properties
:header: Property, Description
:widths: auto
``rdf:value``, URI representing the license itself
``rdfs:label``, Human-readable text describing the license
``foaf:thumbnail``, URI of a thumbnail image of the license information
.. code-block:: turtle
ex:dataset1 a void:Dataset ;
...
dct:license ex:license ;
... .
ex:license
rdf:value ;
rdfs:label "Creative Commons Attribution-ShareAlike (CC BY-SA)";
foaf:thumbnail ex:license.png .
License information can also be expressed using blank nodes as follows:
.. code-block:: turtle
ex:dataset1 a void:Dataset ;
...
dct:license [
rdf:value ;
rdfs:label "Creative Commons Attribution-ShareAlike (CC BY-SA)";
foaf:thumbnail ex:license.png
] .
SPARQL Endpoint Information
---------------------------
**ttl2html** can display the location of a SPARQL endpoint on the main page and the "About" page when the endpoint is described in the input RDF (Turtle).
To express the endpoint location in RDF, use one of the following methods:
- Method A (recommended): Express as a DataService using DCAT's ``dcat:accessService``.
You can specify not only the endpoint URL but also the landing page.
- Method B (simplified): Write only VoID's ``void:sparqlEndpoint``.
This provides a minimal endpoint URI, but no landing page.
In the following examples, ``_:toplevel`` stands for the resource representing the entire dataset.
Method A: Express using dcat:accessService (DataService)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The following RDF triple expression can be added as part of the metadata for the entire dataset.
.. code-block:: turtle
@prefix void: <http://rdfs.org/ns/void#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
_:toplevel a void:Dataset, dcat:Dataset ;
dcat:accessService [
a dcat:DataService;
dcat:endpointURL ;
dcat:landingPage
] .
The properties in the example above have the following meanings:
- ``dcat:accessService``: Service for accessing this dataset
- ``a dcat:DataService``: Type of service (DataService)
- ``dcat:endpointURL``: SPARQL endpoint URL
- ``dcat:landingPage``: Landing page for humans (e.g., query UI, description page)
Method B: Express using void:sparqlEndpoint
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
The following RDF triple expression can be added as part of the metadata for the entire dataset.
.. code-block:: turtle
@prefix void: <http://rdfs.org/ns/void#> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .
_:toplevel a void:Dataset, dcat:Dataset ;
void:sparqlEndpoint .
Note that ``void:sparqlEndpoint`` only records the endpoint URI for machine access; it cannot describe a search or description page for humans.