Another RDF Encoding Form (aREF)

Jakob Voß (voss@gbv.de)

2014-10-16 (version 0.32)

1 Introduction

This document defines an encoding of RDF graphs called another RDF encoding form (aREF). The encoding combines and simplifies parts of the existing RDF serializations Turtle, JSON-LD, and RDF/JSON. In contrast to these formats, RDF data in aREF is not serialized as a Unicode string but encoded as a list-map-structure, as known from the type system of most programming languages and from data structuring languages such as JSON and YAML.

This specification of aREF is hosted in a public git repository at https://github.com/gbv/aREF/, written in Pandoc’s Markdown and managed with makespec. Please add and comment on issues to this specification at https://github.com/gbv/aREF/issues. The most recent version of this document is made available at http://gbv.github.io/aREF/.

2 Background

2.1 Terminology

Terms written in “bold” refer to terms at the place of their definition in this document. Terms written in “italics” refer to terms defined elsewhere in this document. Uppercase keywords (MUST, MAY, RECOMMENDED, SHOULD…) are used as defined in RFC 2119. Syntax rules in this document are expressed in ABNF notation as specified by RFC 5234.

Examples and notes in this document are informative only. YAML syntax is used to express sample aREF documents, unless noted otherwise.

The following syntax rules are referenced later in this document:

string = *( %x0-%x10FFFF )

LOWERCASE = %x61-%x7A ; a-z

The term string in this document always refers to Unicode strings as defined by Unicode. A string can also be defined with syntax rule string.

Strings SHOULD always be normalized to Normal Form C (NFC). Applications MAY restrict strings by disallowing selected Unicode codepoints, such as the 66 Unicode noncharacters or the set of Unicode characters not expressible in XML.
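The normalization and the optional restriction can be sketched in Python as follows. This is only an illustration: the function names `is_noncharacter` and `clean_string` are made up for this example, and rejecting noncharacters is one possible restriction, not a requirement of aREF.

```python
import unicodedata

def is_noncharacter(cp: int) -> bool:
    """True for the 66 Unicode noncharacters (U+FDD0..U+FDEF and U+xxFFFE/U+xxFFFF)."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def clean_string(s: str) -> str:
    """Normalize to NFC, then apply the optional noncharacter restriction."""
    s = unicodedata.normalize("NFC", s)
    if any(is_noncharacter(ord(ch)) for ch in s):
        raise ValueError("string contains a Unicode noncharacter")
    return s
```

For instance, the decomposed sequence "e" followed by U+0301 is returned as the single codepoint U+00E9.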

2.2 RDF data

RDF is a graph-based data structuring language defined as abstract syntax by Klyne and Carroll (2004). Several RDF variants exist (in particular see Wood, 2013 for a comparison between RDF 1.0 and RDF 1.1). RDF extensions with named graphs, blank nodes as predicates, and literal nodes as subjects are neither covered by this specification nor expressible in aREF.

RDF data as encoded by aREF is defined as following:

An RDF graph encoded in aREF can also include blank node identifiers to refer to particular blank nodes within the scope of the same RDF graph.

Ask a Semantic Web or Linked Data evangelist for examples of RDF!

2.3 List-map-structures

A list-map-structure is an abstract data structure built of

Every aREF document MUST be given as map. Applications MAY restrict aREF documents to non-circular list-map-structures. All non-circular list-map-structures can be serialized in JSON and YAML.

Applications MAY support special null values, disjoint from strings, as element in a list and/or mapped to in a map. These null values MUST be ignored on decoding aREF.
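As a rough sketch of the rule above, a decoder could drop null values before further processing. The example assumes Python, where `None` plays the role of the null value, and a non-circular structure; the function name `strip_nulls` is hypothetical.

```python
def strip_nulls(node):
    """Return a copy of a non-circular list-map-structure without null values."""
    if isinstance(node, dict):
        return {key: strip_nulls(value)
                for key, value in node.items() if value is not None}
    if isinstance(node, list):
        return [strip_nulls(item) for item in node if item is not None]
    return node  # strings are kept unchanged
```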

See section aREF document types and appendix aREF serializations for examples.

3 Encoding

3.1 IRIs

An IRI in aREF is encoded as string, either as plain IRI, or as explicit IRI, or as qName. The special string “a” can further be used to encode the predicate “http://www.w3.org/1999/02/22-rdf-syntax-ns#type”.

3.1.1 Plain IRIs

A plain IRI is an IRI, as defined in RFC 3987. If used as object, a plain IRI MUST conform to the syntax rule IRIlike to distinguish it from a literal node.

  IRIlike = LOWERCASE *( LOWERCASE / DIGIT / "+" / "." / "-" ) ":" [ string ]
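The rule IRIlike translates almost directly into a regular expression. The following Python sketch shows one way to test an object string against it; the names `IRI_LIKE` and `is_iri_like` are chosen for this example only.

```python
import re

# IRIlike = LOWERCASE *( LOWERCASE / DIGIT / "+" / "." / "-" ) ":" [ string ]
IRI_LIKE = re.compile(r"[a-z][a-z0-9+.\-]*:.*", re.DOTALL)

def is_iri_like(s: str) -> bool:
    """True if the whole string matches the syntax rule IRIlike."""
    return IRI_LIKE.fullmatch(s) is not None
```

Note that the rule only inspects the part up to the first colon, so an object string such as "note: see below" is also IRIlike and would not be read as a literal node.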

3.1.2 Explicit IRIs

An explicit IRI is an IRI enclosed in angle brackets (“<” and “>”).

  explicitIRI   = "<" IRI ">"   ; IRI syntax rule from RFC 3987

Applications MAY use the syntax rule IRIlike instead of IRI to facilitate decoding aREF.

3.1.3 qNames

A qName consists of a prefix and a localName separated by an underscore (“_”):

  qName  = prefix "_" localName

The prefix is a string starting with a lowercase letter (a-z) optionally followed by a sequence of lowercase letters and digits (0-9).

  prefix = LOWERCASE *( LOWERCASE / DIGIT )    ; a-z *( a-z / 0-9 )

The localName is a string that conforms to the following syntax.

  localName     = nameStartChar *(nameChar)

  nameStartChar = ALPHA / "_" / %x00C0-%x00D6 / %x00D8-%x00F6 /
                  %x00F8-%x02FF / %x0370-%x037D / %x037F-%x1FFF / 
                  %x200C-%x200D / %x2070-%x218F / %x2C00-%x2FEF / 
                  %x3001-%xD7FF / %xF900-%xFDCF / %xFDF0-%xFFFD /
                  %x10000-%xEFFFF

  nameChar      = nameStartChar / "-" / DIGIT / %xB7 / %x0300-%x036F / %x203F-%x2040

The syntax rule localName is more restrictive than corresponding definitions in Turtle and JSON-LD.

A qName is mapped to an IRI by appending its localName to the namespace URI that corresponds to its prefix. Applications SHOULD warn about unknown prefixes and/or ignore all triples that include a node with an unknown prefix.
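The mapping from qNames to IRIs can be sketched in Python as follows. The character classes mirror the rules prefix and localName above; the function name `expand_qname` and the choice to return `None` for unknown prefixes (leaving the warning to the caller) are assumptions of this example.

```python
import re

# Character ranges of nameStartChar and nameChar, written as regex class bodies.
NAME_START = (
    "A-Za-z_\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    "\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    "\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
NAME_CHAR = NAME_START + "\\-0-9\u00B7\u0300-\u036F\u203F-\u2040"

# qName = prefix "_" localName
QNAME = re.compile(f"([a-z][a-z0-9]*)_([{NAME_START}][{NAME_CHAR}]*)")

def expand_qname(qname, namespaces):
    """Return the IRI for a qName, or None if it is no qName or its prefix is unknown."""
    match = QNAME.fullmatch(qname)
    if match is None:
        return None
    prefix, local_name = match.groups()
    if prefix not in namespaces:
        return None  # the caller should warn and/or ignore the triple
    return namespaces[prefix] + local_name
```

Because a prefix cannot contain an underscore, the first underscore always separates prefix and localName, so "a_b_c" has prefix "a" and localName "b_c".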

3.2 Literal nodes

A literal node is encoded as string in one of three forms:

  literalNode   = languageString / datatypeString / plainLiteral

3.2.1 Literal nodes with language tag

A literal node with language tag is encoded by appending an at sign (“@”) followed by the language tag to the node’s string:

  languageString = string "@" languageTag

  languageTag    = 2*8(ALPHA) *( "-" 1*8( ALPHA / DIGIT ) )

{
  "_id": "http://example.com/MyResource",
  "skos_prefLabel": [
    "east@en",
    "Osten@de",
    "東@ja",
    "東@ja-Hani",
    "ヒガシ@ja-Kana",
    "higashi@ja-Latn"
  ]
}

The syntax rule languageTag is slightly more restrictive than the syntax of a language tag in Turtle but less restrictive than the syntax of a language tag in JSON-LD, which refers to well-formed language tags as defined in BCP 47.

3.2.2 Literal nodes with datatype

A literal node with datatype is encoded by appending a caret (“^”) followed by the datatype’s IRI, encoded either as explicit IRI or as qName:

  datatypeString = string "^" ( qName / explicitIRI )

{
  "_id": "http://example.org/",
  "dct_modified": [
    "2010-05-29T14:17:39+02:00^xsd_dateTime",
    "2010-05-29^<http://www.w3.org/2001/XMLSchema#date>"
  ]
}

Turtle uses the character sequence “^^” instead of a single “^”.

3.2.3 Simple literals

A simple literal is encoded either as literal node with datatype “http://www.w3.org/2001/XMLSchema#string” or as string that conforms to the plainLiteral syntax rule. The syntax MUST be disjoint from the syntax rules languageString and datatypeString and from the syntax rules of IRIs (explicitIRI, IRIlike, qName) and blank nodes (blankNode).

  plainLiteral = string / string "@" ; MUST NOT match any of the rules
                                     ; languageString, datatypeString,
                                     ; explicitIRI, IRIlike, qName,
                                     ; blankNode

An at sign (“@”) can always be appended to the node’s string to distinguish from other syntax rules. The at sign MUST be appended if the simple literal ends with an at sign.

aREF string          RDF literal (Turtle syntax)
-------------------  ---------------------------
@                    ""
*empty string*       ""
^xsd_string          ""
@@                   "@"
@^xsd_string         "@"
alice@en             "alice"@en
alice@example.com    "alice@example.com"
123                  "123"
忍者@ja              "忍者"@ja
Ninja@en             "Ninja"@en
Ninja@en@            "Ninja@en"
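The rows of this table can be reproduced with a small decoder. The following Python sketch assumes that the string is already known to be neither an IRI nor a blank node, tries the rules in the order languageString, datatypeString, plainLiteral (an ordering chosen for this example), and accepts only ASCII local names in the datatype qName for brevity. The function name `decode_literal` is hypothetical.

```python
import re

# languageString = string "@" languageTag
LANGUAGE_STRING = re.compile(
    r"(.*)@([A-Za-z]{2,8}(?:-[A-Za-z0-9]{1,8})*)", re.DOTALL)

# datatypeString = string "^" ( qName / explicitIRI ), qName simplified to ASCII
DATATYPE_STRING = re.compile(
    r"(.*)\^([a-z][a-z0-9]*_[A-Za-z_][A-Za-z0-9_\-]*|<[^<>]*>)", re.DOTALL)

def decode_literal(s):
    """Return (value, language tag, datatype token) for an encoded literal node."""
    match = LANGUAGE_STRING.fullmatch(s)
    if match:
        return (match.group(1), match.group(2), None)
    match = DATATYPE_STRING.fullmatch(s)
    if match:
        return (match.group(1), None, match.group(2))  # datatype left unexpanded
    if s.endswith("@"):
        s = s[:-1]  # strip the at sign appended to a simple literal
    return (s, None, None)
```

The datatype is returned as written (qName or explicit IRI); expanding it to an IRI is a separate step.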

3.3 Blank nodes

A blank node is encoded

blankNode      = "_:" 1*( ALPHA / DIGIT )

Within the scope of the same RDF graph, equal blank node identifiers MUST refer to the same blank node. Blank node identifiers SHOULD NOT be shared among different RDF graphs.

In the simplest case, a blank node in aREF can be encoded as an empty map.

_ns:
    foaf: http://xmlns.com/foaf/0.1/
_:alice:
    foaf_knows: _:bob
_:bob:
    foaf_knows:
        _id: _:alice

_ns:
    foaf: http://xmlns.com/foaf/0.1/
_:someone:
    foaf_knows:
        foaf_name: "Bob"

The syntax rule blankNode is more restrictive than the rule of blank node identifiers in Turtle and in JSON-LD.

3.4 Graphs

An RDF graph in aREF is encoded as a list-map-structure that is either a subject map or a predicate map.

3.4.1 Subject maps

A subject map is a map with the following constraints:

  1. The subject map MUST NOT contain the key “_id”.

  2. The subject map MAY contain the key “_ns”, mapped to a namespace map.

  3. Additional keys, starting with _ and not with _: SHOULD be ignored.

  4. Every other key is either a plain IRI or a qName or a blank node. These keys encode the subjects of RDF triples.

  5. Every value of a key that encodes a subject MUST be a predicate map that either does not contain the key “_id” or maps the key “_id” to an encoding of the same subject.

"http://example.org/alice":
    foaf_knows: http://example.org/bob
    _id: http://example.org/alice  # redundant
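Some of these constraints can be checked mechanically. The following Python sketch covers constraints 1, 3 and 5 only; it compares the key “_id” with the subject key by plain string equality, which is a simplification because the same subject may be encoded in different ways. The function name `check_subject_map` is made up for this example.

```python
def check_subject_map(subject_map):
    """Return a list of constraint violations (empty if none were found)."""
    problems = []
    if "_id" in subject_map:
        problems.append('subject map contains the key "_id"')
    for key, value in subject_map.items():
        if key.startswith("_") and not key.startswith("_:"):
            continue  # "_ns" and other special keys do not encode subjects
        if not isinstance(value, dict):
            problems.append(f"value of {key!r} is not a predicate map")
        elif value.get("_id", key) != key:
            problems.append(f'"_id" of {key!r} names a different subject')
    return problems
```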

3.4.2 Predicate maps

A predicate map encodes a set of RDF triples with the same subject. The subject is given by context, if the predicate map is part of a subject map, or explicitly with the key “_id”, or the subject is a blank node.

A predicate map is a map with the following constraints:

  1. The optional key “_id”, if given, MUST be mapped to a plain IRI, a qName, or a blank node.

  2. The optional key “_ns”, if given, MUST be mapped to a namespace map.

  3. Additional keys, starting with _ SHOULD be ignored.

  4. Every key, unless it starts with “_”, MUST be either a plain IRI or a qName, or the value “a” that stands for the IRI “http://www.w3.org/1999/02/22-rdf-syntax-ns#type”. These keys encode predicates of triples.

  5. Every value of a key that encodes a predicate MUST be an encoded object.

{
  "_id": "http://example.org/places#BrewEats",
  "a": [ "http://schema.org/Restaurant", "http://schema.org/Brewery" ]
}

3.4.3 Encoded objects

An encoded object encodes zero or more RDF objects with the same subject and the same predicate. An encoded object MUST be one of, or a list of any of the following:

A list as encoded object represents a set of objects, so the order of elements is irrelevant and duplicates SHOULD NOT be included, independent from different encoding forms.

The following encoded objects, expressed in JSON, refer to the same IRI:

  • "http://example.org/"
  • "<http://example.org/>"
  • { "_id": "http://example.org/" }
  • [ "http://example.org/" ]
  • [ "<http://example.org/>" ]
  • [ { "_id": "http://example.org/" } ]
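That these six forms are equivalent can be illustrated with a small normalization function. The Python sketch below is limited to plain and explicit IRIs (qNames, literal nodes and blank nodes are out of scope), tolerates null values, and is slightly more lenient than the specification in that it also accepts nested lists. The function name `iri_set` is hypothetical.

```python
def iri_set(encoded):
    """Reduce an encoded object to the set of IRIs it refers to (IRIs only)."""
    if encoded is None:
        return set()  # null values are ignored
    if isinstance(encoded, list):
        result = set()
        for item in encoded:
            result |= iri_set(item)  # order and duplicates are irrelevant
        return result
    if isinstance(encoded, dict):
        return iri_set(encoded.get("_id"))
    if encoded.startswith("<") and encoded.endswith(">"):
        return {encoded[1:-1]}  # explicit IRI
    return {encoded}  # plain IRI
```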

3.5 Namespace maps

A namespace map can be specified explicitly with the special key “_ns” in a subject map or in a predicate map. An aREF document MUST NOT contain more than one explicit namespace map.

A namespace map is

Applications MAY further assume an implicit namespace map. Mappings from an implicit namespace map can be overridden by explicit namespace maps. The following implicit namespace map or a superset of it SHOULD be assumed by default:

{
  "rdf":  "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
  "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
  "owl":  "http://www.w3.org/2002/07/owl#",
  "xsd":  "http://www.w3.org/2001/XMLSchema#"
}

TODO: should the default namespace map always precede namespace maps given by namespace map identifier, so applications can always assume they are right?

The following namespace maps are equivalent:

  • "example"
  • { "_": "example" }

A commonly used namespace map is listed at http://www.w3.org/2011/rdfa-context/rdfa-1.1. If the namespace map identifier http://www.w3.org/2013/json-ld-context/rdfa11 refers to this map, it can be used in aREF as follows (examples in YAML):

_ns: http://www.w3.org/2013/json-ld-context/rdfa11

Custom prefixes can be added and existing prefixes redefined like this:

_ns: 
  _: http://www.w3.org/2013/json-ld-context/rdfa11
  dc: http://purl.org/dc/elements/1.1/ # instead of http://purl.org/dc/terms/
  dct: http://purl.org/dc/terms/       # additional prefix
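How the different sources of prefixes could be combined is sketched below in Python. The merge order (implicit map first, then the map referenced by a namespace map identifier, then explicit prefixes) is an assumption of this example, as is the callback `resolve`, which stands in for whatever mechanism an application uses to look up namespace map identifiers. The function name `effective_namespaces` is hypothetical.

```python
DEFAULT_NS = {
    "rdf":  "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl":  "http://www.w3.org/2002/07/owl#",
    "xsd":  "http://www.w3.org/2001/XMLSchema#",
}

def effective_namespaces(ns_value=None, resolve=None):
    """Merge the implicit map, an optionally referenced map, and explicit prefixes."""
    if isinstance(ns_value, str):
        ns_value = {"_": ns_value}  # "example" is equivalent to { "_": "example" }
    explicit = dict(ns_value or {})
    identifier = explicit.pop("_", None)
    merged = dict(DEFAULT_NS)
    if identifier is not None:
        if resolve is None:
            raise ValueError("no resolver for namespace map identifiers given")
        merged.update(resolve(identifier))  # hypothetical lookup callback
    merged.update(explicit)  # explicit prefixes override everything else
    return merged
```

With a resolver that maps “dc” to http://purl.org/dc/terms/, the explicit mapping of “dc” in the example above wins and “dct” is added as an additional prefix.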

This specification does not include rules on how to resolve namespace map identifiers. The following guidelines are non-normative:

  • A URL is expected to refer to a JSON-LD document with a @context element. For instance the default aREF namespace map could be expressed like this:

    {
      "@context": {
        "rdf":  "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
        "owl":  "http://www.w3.org/2002/07/owl#",
        "xsd":  "http://www.w3.org/2001/XMLSchema#"
      }
    }

    Note that JSON-LD context documents for particular ontologies usually define abbreviations for full URIs and/or default vocabularies (@vocab) that cannot be used in aREF documents because a qName MUST consist of a prefix and a local name.

  • A string of the form YYYYMMDD is expected to refer to the namespace map defined at this date at http://prefix.cc (see rdfns, available as package librdf-ns-perl in Debian for a related command line tool). For instance the identifier “20140901” maps prefix “fabio” to http://purl.org/spar/fabio/ and the identifier “20120521” maps it to http://purl.org/spar/fabio#.

4 aREF document types

THIS PART OF THE SPEC IS NOT FINISHED YET

Depending on their structure, aREF documents can be classified as circular or non-circular, as flat, as consistent, and as normalized.

An aREF document is circular iff there is at least one path from a subject map to itself by stepping to a next subject map that is part of an encoded object of the previous subject map.

A minimal circular aREF document can be created in JavaScript as following:

var alice = { _id: "http://example.org/alice" };
alice.foaf_knows = alice; // alice knows herself

Circular aREF documents can be serialized in YAML but not in JSON, for instance this normalized circular aREF document:

http://example.org/alice: &alice
    _id: http://example.org/alice
    foaf_knows: &bob            # alice knows bob
        _id: http://example.org/bob
        foaf_knows: *alice      # bob knows alice
http://example.org/bob: *bob
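The same effect can be observed in Python, where the standard `json` module refuses to serialize a structure that contains itself. The sketch below also includes a generic check for such structures; note that it tests the list-map-structure as a whole rather than following the definition via subject maps given above, and that the name `is_circular` is chosen for this example only.

```python
import json

alice = {"_id": "http://example.org/alice"}
alice["foaf_knows"] = alice  # alice knows herself

def is_circular(node, _path=()):
    """True if a list or map is reachable from itself."""
    if not isinstance(node, (dict, list)):
        return False
    if any(node is seen for seen in _path):
        return True
    children = node.values() if isinstance(node, dict) else node
    return any(is_circular(child, _path + (node,)) for child in children)

try:
    json.dumps(alice)
    serializable = True
except ValueError:  # raised by json for circular references
    serializable = False
```

A structure that merely shares a map in two places without containing itself is not reported as circular, because only the current path is tracked.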

An aREF document is flat iff all of its encoded objects are encoded as strings. All flat aREF documents are non-circular.

The list-map-structure of a flat aREF document can at most be nested in two levels, if it is a subject map and at most one level, if it is a predicate map:

{
  "http://example.org/": {    # first level: predicate map
    "dct_title": [            # second level: list of encoded objects
      "example@en",
      "Beispiel@de"
    ]
  }
}
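Whether a document is flat can be tested with a few lines of code. The Python sketch below assumes that a document containing the key “_id” is a predicate map and any other document is a subject map; this way of telling the two apart is an assumption of the example, as are the function names.

```python
def _flat_predicate_map(predicate_map):
    """True if every encoded object is a string or a list of strings."""
    for key, value in predicate_map.items():
        if key.startswith("_"):
            continue  # "_id", "_ns" and other special keys
        items = value if isinstance(value, list) else [value]
        if not all(isinstance(item, str) for item in items if item is not None):
            return False
    return True

def is_flat(document):
    """True if all encoded objects of the aREF document are encoded as strings."""
    if "_id" in document:  # assumption: "_id" marks a predicate map
        return _flat_predicate_map(document)
    return all(
        isinstance(value, dict) and _flat_predicate_map(value)
        for key, value in document.items()
        if (not key.startswith("_") or key.startswith("_:")) and value is not None
    )
```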

An aREF document (or its IRIs) is/are consistent iff … same IRI should be encoded the same way (but with subtle differences if used as subject, predicate, or object)

An aREF document is normalized according to a given namespace map if

  1. The document must be a subject map

  2. The document contains no null values or ignored keys

  3. Its IRIs are encoded consistently

  4. All lists have at least two members

  5. what about _ns?

  6. The document is

    • either flat and no predicate map contains the key _id (“normalized form 1”)

    • or normalized form 2:

      • all predicate maps must contain the key _id and at least one more predicate key

      • all predicate maps must directly be mapped from a key in the subject map.

…better names for the two forms…

5 References

5.1 Normative references

5.2 Other references

6 Appendix

6.1 aREF query

This section is non-normative

aREF query is a query language to query strings, IRIs, and/or blank nodes from a given IRI or a blank node in an RDF graph. The query language can be used as path language for RDF, similar to XPath for XML.

An aREF query consists of a list of qNames, separated by dot (“.”) and optionally followed by:

query  = qName *( "." qName ) [ filter ]
filter = "." / "@" [ languageTag ] / "^" [ qName ]

aREF query expression   informal description
----------------------  -------------------------------------
foaf_knows.foaf_knows   friends of friends
dct_creator.            creators unless only given as string
dct_creator@            literal node creators
dct_creator.foaf_name   author names
dct_date^xsd_gYear      date values of datatype xsd_gYear
skos_prefLabel@en       preferred labels in English
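The query syntax can be split into its parts with a regular expression. The following Python sketch only parses a query and does not evaluate it against a graph; it accepts ASCII local names only, and the function name `parse_query` as well as the shape of the returned map are assumptions of this example.

```python
import re

# Simplified qName (ASCII local names only) and the rules query and filter.
QNAME = r"[a-z][a-z0-9]*_[A-Za-z_][A-Za-z0-9_\-]*"
LANGUAGE_TAG = r"[A-Za-z]{2,8}(?:-[A-Za-z0-9]{1,8})*"
QUERY = re.compile(
    rf"(?P<path>{QNAME}(?:\.{QNAME})*)"
    rf"(?P<filter>\.|@(?P<lang>{LANGUAGE_TAG})?|\^(?P<datatype>{QNAME})?)?"
)

def parse_query(query):
    """Split an aREF query into its path, filter character and filter argument."""
    match = QUERY.fullmatch(query)
    if match is None:
        raise ValueError(f"not an aREF query: {query!r}")
    filter_part = match.group("filter")
    return {
        "path": match.group("path").split("."),
        "filter": filter_part[0] if filter_part else None,
        "argument": match.group("lang") or match.group("datatype"),
    }
```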

6.2 aREF serializations

An aREF document can be expressed both in data structuring languages (JSON, YAML…) and in type systems of programming languages (Python, Ruby, Perl…).

The following examples express the same aREF document in different languages. The RDF graph encoded in aREF can be expressed in Turtle syntax as following:

@prefix dct: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<http://example.com/people#alice> a foaf:Person ;
    foaf:name "Alice Smith" ;
    foaf:age 42 ;
    foaf:homepage 
        <http://personal.example.org/~alice/>, 
        <http://work.example.com/asmith/> ;
    foaf:knows [
        foaf:name "John" ;
        dct:description "a nice guy"@en 
    ]
.

Please add your favorite data or programming language at https://github.com/gbv/aREF/issues to be included here!

YAML

The most condensed readable serialization of aREF is probably possible in YAML:

---
_ns: 
    dct: http://purl.org/dc/terms/
    foaf: http://xmlns.com/foaf/0.1/
_id: http://example.com/people#alice
a: foaf_Person
foaf_name: Alice Smith
foaf_age: 42^xsd_integer 
foaf_homepage: 
    - http://personal.example.org/~alice/ 
    - http://work.example.com/asmith/ 
foaf_knows:
    _id: _:1
    foaf_name: John
    dct_description: a nice guy@en

JSON

The same in JSON requires more brackets and delimiters:

{ 
    "_ns": { 
        "dct": "http://purl.org/dc/terms/",
        "foaf": "http://xmlns.com/foaf/0.1/"
    },
    "_id": "http://example.com/people#alice",
    "a": "foaf_Person",
    "foaf_name": "Alice Smith",
    "foaf_age": "42^xsd_integer",
    "foaf_homepage": [
       "http://personal.example.org/~alice/",
       "http://work.example.com/asmith/" 
    ],
    "foaf_knows": { 
        "_id": "_:1",
        "foaf_name": "John",
        "dct_description": "a nice guy@en" 
    }
}

JavaScript

In JavaScript one can omit quotes around map keys by using underscores for prefixed names:

{ 
    _ns: { 
        dct: 'http://purl.org/dc/terms/',
        foaf: 'http://xmlns.com/foaf/0.1/'
    },
    _id: 'http://example.com/people#alice',
    a: 'foaf_Person',
    foaf_name: 'Alice Smith',
    foaf_age: '42^xsd_integer',
    foaf_homepage: [
       'http://personal.example.org/~alice/',
       'http://work.example.com/asmith/' 
    ],
    foaf_knows: { 
        _id: '_:1',
        foaf_name: 'John',
        dct_description: 'a nice guy@en' 
    }
}

Perl

Similar rules apply to aREF in Perl:

{
    _ns => {
       dct => 'http://purl.org/dc/terms/',
       foaf => 'http://xmlns.com/foaf/0.1/',
    },
    _id => 'http://example.com/people#alice',
    a   => 'foaf_Person',
    foaf_name => 'Alice Smith',
    foaf_age  => '42^xsd_integer', 
    foaf_homepage => [
        'http://personal.example.org/~alice/',
        'http://work.example.com/asmith/' 
    ],
    foaf_knows => {
        _id => '_:1',
        foaf_name => 'John',
        dct_description => 'a nice guy@en',
    }
}

PHP

Although PHP does not fully differentiate arrays and maps, one can express both. A PHP array is a map unless all PHP array keys are numeric:

[
    "_ns" => [ 
        "dct" => "http://purl.org/dc/terms/",
        "foaf" => "http://xmlns.com/foaf/0.1/"
    ],
    "_id" => "http://example.com/people#alice",
    "a" => "foaf_Person",
    "foaf_name" => "Alice Smith",
    "foaf_age"  => "42^xsd_integer",
    "foaf_homepage" => [
        "http://personal.example.org/~alice/",  /* key "0" */
        "http://work.example.com/asmith/"       /* key "1" */
    ],
    "foaf_knows" => [
        "_id" => "_:1",
        "foaf_name" => "John",
        "dct_description" => "a nice guy@en"
    ]
];
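
Python

As a further illustration that is not part of the original set of examples, the same document could be written as nested Python dictionaries and lists. The variable name `aref` is arbitrary; because the structure is non-circular and consists of maps, lists and strings only, it converts to the JSON form shown above with the standard `json` module.

```python
import json

aref = {
    "_ns": {
        "dct": "http://purl.org/dc/terms/",
        "foaf": "http://xmlns.com/foaf/0.1/",
    },
    "_id": "http://example.com/people#alice",
    "a": "foaf_Person",
    "foaf_name": "Alice Smith",
    "foaf_age": "42^xsd_integer",
    "foaf_homepage": [
        "http://personal.example.org/~alice/",
        "http://work.example.com/asmith/",
    ],
    "foaf_knows": {
        "_id": "_:1",
        "foaf_name": "John",
        "dct_description": "a nice guy@en",
    },
}

# Serialize to JSON; keys and strings are kept as they are.
as_json = json.dumps(aref, indent=4, ensure_ascii=False)
```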