1 Introduction

This document defines an encoding of RDF graphs called another RDF encoding form (aREF). The encoding combines and simplifies parts of the existing RDF serializations Turtle, JSON-LD, and RDF/JSON. In contrast to these formats, RDF data in aREF is not serialized as a Unicode string but encoded as a list-map-structure, as known from the type systems of most programming languages and from data structuring languages such as JSON and YAML.

This specification of aREF is hosted in a public git repository at https://github.com/gbv/aREF/, written in Pandoc’s Markdown and managed with makespec. Please add and comment on issues to this specification at https://github.com/gbv/aREF/issues. The most recent version of this document is made available at http://gbv.github.io/aREF/.

2 Background

2.1 Terminology

Terms written in “bold” refer to terms at the place of their definition in this document. Terms written in “italics” refer to terms defined elsewhere in this document. Uppercase keywords (MUST, MAY, RECOMMENDED, SHOULD…) are used as defined in RFC 2119. Syntax rules in this document are expressed in ABNF notation as specified by RFC 5234.

Examples and notes in this document are informative only. YAML syntax is used to express sample aREF documents, unless noted otherwise.

The following syntax rules are referenced later in this document:

string = *( %x0-10FFFF )

LOWERCASE = %x61-7A ; a-z

The term string in this document always refers to Unicode strings as defined by Unicode. A string can also be defined with syntax rule string.

Strings SHOULD always be normalized to Unicode Normalization Form C (NFC). Applications MAY restrict strings by disallowing selected Unicode codepoints, such as the 66 Unicode noncharacters or the set of Unicode characters not expressible in XML.
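
The normalization recommendation and an optional restriction on noncharacters might be implemented along the following lines. This is a minimal sketch in Python; the function names `is_noncharacter` and `normalize_string` are illustrative and not part of aREF.

```python
import unicodedata


def is_noncharacter(codepoint: int) -> bool:
    """Return True for the 66 Unicode noncharacters."""
    # U+FDD0..U+FDEF plus the last two code points of each of the 17 planes.
    return 0xFDD0 <= codepoint <= 0xFDEF or (codepoint & 0xFFFE) == 0xFFFE


def normalize_string(s: str, reject_noncharacters: bool = False) -> str:
    """Normalize a string to NFC, optionally rejecting noncharacters."""
    normalized = unicodedata.normalize("NFC", s)
    if reject_noncharacters and any(is_noncharacter(ord(c)) for c in normalized):
        raise ValueError("string contains a Unicode noncharacter")
    return normalized
```

Rejecting noncharacters is off by default in this sketch because the restriction is only a MAY.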

2.2 RDF data

RDF is a graph-based data structuring language defined as abstract syntax by Klyne and Carroll (2004). Several RDF variants exist (in particular see Wood, 2013 for a comparison between RDF 1.0 and RDF 1.1). RDF extensions with named graphs, blank nodes as predicates, and literal nodes as subjects are neither covered by this specification nor expressible in aREF.

RDF data as encoded by aREF is defined as follows:

An RDF graph is a set of triples.
A triple (also known as “statement”) consists of a subject, a predicate, and an object.
A subject is either an IRI or a blank node.
A predicate (also known as “property”) is an IRI.
An object is either an IRI or a blank node or a literal node.
An IRI (Internationalized Resource Identifier) is a string that conforms to the IRI syntax defined in RFC 3987.
A blank node is neither an IRI nor a literal node.
A literal node is a string tagged by either a language tag or by a datatype.
A simple literal is a literal node with datatype http://www.w3.org/2001/XMLSchema#string.
A datatype is an IRI.
A language tag is a well-formed language tag as defined in BCP 47.

An RDF graph encoded in aREF can also include blank node identifiers to refer to particular blank nodes within the scope of the same RDF graph.

Ask a Semantic Web or Linked Data evangelist for examples of RDF!

2.3 List-map-structures

A list-map-structure is an abstract data structure built of

strings, which are Unicode strings,
lists, which are sequences of zero or more list-map-structures,
and maps, which are sets of strings (the maps’ keys) and a mapping from these keys to list-map-structures.

Every aREF document MUST be given as a map. Applications MAY restrict aREF documents to non-circular list-map-structures. All non-circular list-map-structures can be serialized in JSON and YAML.

Applications MAY support special null values, disjoint from strings, as element in a list and/or mapped to in a map. These null values MUST be ignored on decoding aREF.
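
Ignoring null values on decoding can be sketched as a recursive filter. In this Python sketch `None` stands in for the null value; the function name `strip_nulls` is hypothetical, and the structure is assumed to be non-circular.

```python
def strip_nulls(node):
    """Return a copy of a list-map-structure without null list elements
    and without keys that are mapped to null."""
    if isinstance(node, dict):
        return {key: strip_nulls(value)
                for key, value in node.items() if value is not None}
    if isinstance(node, list):
        return [strip_nulls(item) for item in node if item is not None]
    return node  # strings are returned unchanged
```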

See section aREF document types and appendix aREF serializations for examples.

3 Encoding

3.1 IRIs

An IRI in aREF is encoded as string, either as plain IRI, or as explicit IRI, or as qName. The special string “a” can further be used to encode the predicate “http://www.w3.org/1999/02/22-rdf-syntax-ns#type”.

3.1.1 Plain IRIs

A plain IRI is an IRI, as defined in RFC 3987. If used as object, a plain IRI MUST conform to the syntax rule IRIlike to distinguish it from a literal node.

  IRIlike = LOWERCASE *( LOWERCASE / DIGIT / "+" / "." / "-" ) ":" [ string ]
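
A decoder could test the IRIlike rule with a regular expression. The following Python sketch transcribes the rule directly; the names `IRI_LIKE` and `is_iri_like` are illustrative only.

```python
import re

# LOWERCASE *( LOWERCASE / DIGIT / "+" / "." / "-" ) ":" [ string ]
IRI_LIKE = re.compile(r"[a-z][a-z0-9+.\-]*:.*", re.DOTALL)


def is_iri_like(s: str) -> bool:
    """Return True if the whole string matches the IRIlike rule."""
    return IRI_LIKE.fullmatch(s) is not None
```

Note that the rule only inspects the scheme-like start of the string, so a text such as `note: see below` is IRIlike as well and would be read as IRI when used as object.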

3.1.2 Explicit IRIs

An explicit IRI is an IRI enclosed in angle brackets (“<” and “>”).

  explicitIRI   = "<" IRI ">"   ; IRI syntax rule from RFC 3987

Applications MAY use the syntax rule IRIlike instead of IRI to facilitate decoding aREF.

3.1.3 qNames

A qName consists of a prefix and a localName separated by an underscore (“_”):

  qName  = prefix "_" localName

The prefix is a string starting with a lowercase letter (a-z) optionally followed by a sequence of lowercase letters and digits (0-9).

  prefix = LOWERCASE *( LOWERCASE / DIGIT )    ; a-z *( a-z / 0-9 )

The localName is a string that conforms to the following syntax.

  localName     = nameStartChar *(nameChar)

  nameStartChar = ALPHA / "_" / %x00C0-00D6 / %x00D8-00F6 /
                  %x00F8-02FF / %x0370-037D / %x037F-1FFF /
                  %x200C-200D / %x2070-218F / %x2C00-2FEF /
                  %x3001-D7FF / %xF900-FDCF / %xFDF0-FFFD /
                  %x10000-EFFFF

  nameChar      = nameStartChar / "-" / DIGIT / %xB7 / %x0300-036F / %x203F-2040

The syntax rule localName is more restrictive than corresponding definitions in Turtle and JSON-LD.

A qName is mapped to an IRI by appending its localName to the namespace URI that corresponds to its prefix. Applications SHOULD warn about unknown prefixes and/or ignore all triples that include a node with an unknown prefix.
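
The mapping from qName to IRI can be sketched as follows in Python. The character classes are transcribed from the nameStartChar and nameChar rules; the function name `expand_qname` and the choice to raise `KeyError` for an unknown prefix (so that the caller can warn or skip the triple) are assumptions of this sketch.

```python
import re

# Character classes transcribed from the nameStartChar and nameChar rules.
NAME_START = (
    "A-Za-z_\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u02FF\u0370-\u037D"
    "\u037F-\u1FFF\u200C-\u200D\u2070-\u218F\u2C00-\u2FEF"
    "\u3001-\uD7FF\uF900-\uFDCF\uFDF0-\uFFFD\U00010000-\U000EFFFF"
)
NAME_CHAR = NAME_START + "\\-0-9\u00B7\u0300-\u036F\u203F-\u2040"

# qName = prefix "_" localName
QNAME = re.compile(
    "([a-z][a-z0-9]*)_([" + NAME_START + "][" + NAME_CHAR + "]*)"
)


def expand_qname(qname, namespaces):
    """Map a qName to an IRI using a namespace map (prefix -> namespace URI).

    Returns None if the string is not a qName and raises KeyError if the
    prefix is unknown.
    """
    match = QNAME.fullmatch(qname)
    if match is None:
        return None
    prefix, local_name = match.groups()
    if prefix not in namespaces:
        raise KeyError("unknown prefix: " + prefix)
    return namespaces[prefix] + local_name
```

Because a prefix cannot contain an underscore, the first underscore always separates prefix and localName, so `foaf_first_name` has the localName `first_name`.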

3.2 Literal nodes

A literal node is encoded as string in one of three forms:

  literalNode   = languageString / datatypeString / plainLiteral

3.2.1 Literal nodes with language tag

A literal node with language tag is encoded by appending an at sign (“@”) followed by the language tag to the node’s string:

  languageString = string "@" languageTag

  languageTag    = 2*8(ALPHA) *( "-" 1*8( ALPHA / DIGIT ) )

{
  "_id": "http://example.com/MyResource",
  "skos_prefLabel": [
    "east@en",
    "Osten@de",
    "東@ja",
    "東@ja-Hani",
    "ヒガシ@ja-Kana",
    "higashi@ja-Latn"
  ]
}

The syntax rule languageTag is slightly more restrictive than the syntax of a language tag in Turtle but less restrictive than the syntax of a language tag in JSON-LD, which refers to well-formed language tags as defined in BCP 47.

3.2.2 Literal nodes with datatype

A literal node with datatype is encoded by appending a caret (“^”) followed by the datatype’s IRI, encoded either as explicit IRI or as qName:

  datatypeString = string "^" ( qName / explicitIRI )

{
  "_id": "http://example.org/",
  "dct_modified": [
    "2010-05-29T14:17:39+02:00^xsd_dateTime",
    "2010-05-29^<http://www.w3.org/2001/XMLSchema#date>"
  ]
}

Turtle uses the character sequence “^^” instead of a single “^”.

3.2.3 Simple literals

A simple literal is encoded either as literal node with datatype “http://www.w3.org/2001/XMLSchema#string” or as string that conforms to the plainLiteral syntax rule. The syntax MUST be disjoint from the syntax rules languageString and datatypeString and from the syntax rules of IRIs (explicitIRI, IRIlike, qName) and blank nodes (blankNode).

  plainLiteral = string / string "@" ; MUST NOT match any of the rules
                                     ; languageString, datatypeString,
                                     ; explicitIRI, IRIlike, qName,
                                     ; blankNode

An at sign (“@”) can always be appended to the node’s string to distinguish from other syntax rules. The at sign MUST be appended if the simple literal ends with an at sign.

aREF string	RDF literal (Turtle syntax)
`@`	`""`
(empty string)	`""`
`^xsd_string`	`""`
`@@`	`"@"`
`@^xsd_string`	`"@"`
`alice@en`	`"alice"@en`
`alice@example.com`	`"alice@example.com"`
`123`	`"123"`
`忍者@ja`	`"忍者"@ja`
`Ninja@en`	`"Ninja"@en`
`Ninja@en@`	`"Ninja@en"`
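
The rows of the table above can be reproduced with a small decoder. The following Python sketch assumes that the string has already been ruled out as IRI or blank node; the function name `decode_literal` is hypothetical, the qName part of a datatype is only approximated with `\w`, and the content of an explicit IRI is checked loosely. The result is a tuple of value, language tag, and datatype as written in aREF, where a missing datatype and language tag denote a simple literal.

```python
import re

# languageString = string "@" languageTag
LANGUAGE_STRING = re.compile(
    r"(.*)@([A-Za-z]{2,8}(?:-[A-Za-z0-9]{1,8})*)", re.DOTALL
)

# datatypeString = string "^" ( qName / explicitIRI )
# The qName part is approximated with \w instead of the full localName rule.
DATATYPE_STRING = re.compile(
    r"(.*)\^([a-z][a-z0-9]*_[^\W\d][\w\-]*|<[^<>\s]+>)", re.DOTALL
)


def decode_literal(s):
    """Decode an aREF literal string into (value, language, datatype)."""
    if s.endswith("@"):
        # A trailing at sign marks a simple literal and is stripped.
        return (s[:-1], None, None)
    match = LANGUAGE_STRING.fullmatch(s)
    if match:
        return (match.group(1), match.group(2), None)
    match = DATATYPE_STRING.fullmatch(s)
    if match:
        return (match.group(1), None, match.group(2))
    return (s, None, None)
```

The trailing at sign is tested first, which is why `Ninja@en@` yields the simple literal `Ninja@en` rather than a language-tagged string.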

3.3 Blank nodes

A blank node is encoded

either as predicate map without the key “_id”,
or as blank node identifier, that is a string which starts with “_:”, followed by an alphanumeric label (syntax rule blankNode),
or as predicate map with the key “_id” mapped to a blank node identifier.

blankNode      = "_:" 1*( ALPHA / DIGIT )

Within the scope of the same RDF graph, equal blank node identifiers MUST refer to the same blank node. Blank node identifiers SHOULD NOT be shared among different RDF graphs.

In the simplest case, a blank node in aREF can be encoded as an empty map.

_ns:
    foaf: http://xmlns.com/foaf/0.1/
_:alice:
    foaf_knows: _:bob
_:bob:
    foaf_knows:
        _id: _:alice

_ns:
    foaf: http://xmlns.com/foaf/0.1/
_:someone:
    foaf_knows:
        foaf_name: "Bob"

The syntax rule blankNode is more restrictive than the rule of blank node identifiers in Turtle and in JSON-LD.
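
A decoder that turns predicate maps without the key “_id” into blank nodes needs fresh blank node identifiers that do not collide with identifiers already used in the same graph. A minimal Python sketch, in which the name `blank_node_minter` and the label scheme `b1`, `b2`, … are assumptions:

```python
import itertools
import re

# blankNode = "_:" 1*( ALPHA / DIGIT )
BLANK_NODE = re.compile(r"_:[A-Za-z0-9]+")


def blank_node_minter(used=()):
    """Return a function that yields blank node identifiers not in `used`."""
    taken = set(used)
    counter = itertools.count(1)

    def mint():
        for number in counter:
            identifier = "_:b" + str(number)
            if identifier not in taken:
                taken.add(identifier)
                return identifier

    return mint
```

One minter should be created per RDF graph, since blank node identifiers are only meaningful within the scope of the same graph.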

3.4 Graphs

An RDF graph in aREF is encoded as a list-map-structure that is

either a subject map,
or a predicate map that MUST contain the special key “_id”.

3.4.1 Subject maps

A subject map is a map with the following constraints:

The subject map MUST NOT contain the key “_id”.
The subject map MAY contain the key “_ns”, mapped to a namespace map.
Additional keys starting with “_” but not with “_:” SHOULD be ignored.
Every other key is either a plain IRI or a qName or a blank node. These keys encode the subjects of RDF triples.
Every value of a key that encodes a subject MUST be a predicate map that either does not contain the key “_id” or maps the key “_id” to an encoding of the same subject.

"http://example.org/alice":
    foaf_knows: http://example.org/bob
    _id: http://example.org/alice  # redundant

3.4.2 Predicate maps

A predicate map encodes a set of RDF triples with the same subject. The subject is given by context, if the predicate map is part of a subject map, or explicitly with the key “_id”, or the subject is a blank node.

A predicate map is a map with the following constraints:

The optional key “_id”, if given, MUST be mapped to a plain IRI, a qName, or a blank node.
The optional key “_ns”, if given, MUST be mapped to a namespace map.
Additional keys starting with “_” SHOULD be ignored.
Every key, unless it starts with “_”, MUST be either a plain IRI or a qName, or the value “a” that stands for the IRI “http://www.w3.org/1999/02/22-rdf-syntax-ns#type”. These keys encode predicates of triples.
Every value of a key that encodes a predicate MUST be an encoded object.

{
  "_id": "http://example.org/places#BrewEats",
  "a": [ "http://schema.org/Restaurant", "http://schema.org/Brewery" ]
}
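
Walking a predicate map can be sketched as a generator over its keys. In this Python sketch the function name `predicate_map_triples` is hypothetical; predicates other than “a” and all objects are returned as encoded in aREF (no qName expansion, no literal decoding), and a subject of `None` stands for a blank node without identifier.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def predicate_map_triples(predicate_map, subject=None):
    """Yield (subject, predicate, encoded object) for a predicate map.

    `subject` is the subject given by context, if any; the key "_id"
    takes its place if present.
    """
    subject = predicate_map.get("_id", subject)
    for key, value in predicate_map.items():
        if key.startswith("_"):
            continue  # "_id", "_ns", and keys to be ignored
        predicate = RDF_TYPE if key == "a" else key
        objects = value if isinstance(value, list) else [value]
        for obj in objects:
            if obj is not None:  # null values are ignored
                yield (subject, predicate, obj)
```

Applied to the example above, the sketch yields two triples with the predicate rdf:type, one for each of the two classes.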

3.4.3 Encoded objects

An encoded object encodes zero or more RDF objects with the same subject and the same predicate. An encoded object MUST be one of the following, or a list of any of the following:

a plain IRI, explicit IRI, or qName to encode an IRI
a string that conforms to syntax rule literalNode to encode a literal node (see literal nodes)
a string that conforms to syntax rule blankNode to encode a blank node (see blank nodes)
a predicate map

A list as encoded object represents a set of objects, so the order of elements is irrelevant and duplicates SHOULD NOT be included, regardless of the encoding form used for each element.

The following encoded objects, expressed in JSON, refer to the same IRI:

"http://example.org/"
"<http://example.org/>"
{ "_id": "http://example.org/" }
[ "http://example.org/" ]
[ "<http://example.org/>" ]
[ { "_id": "http://example.org/" } ]
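
That these forms are equivalent can be checked by reducing each encoded object to a list of items and resolving each item to its IRI. The following Python sketch covers only plain IRIs, explicit IRIs, and predicate maps with the key “_id”; qNames, blank nodes, and literal nodes are left out, and the function names are illustrative.

```python
def as_list(encoded_object):
    """Return the items of an encoded object as a list without nulls."""
    if isinstance(encoded_object, list):
        items = encoded_object
    else:
        items = [encoded_object]
    return [item for item in items if item is not None]


def iri_of(item):
    """Resolve a plain IRI, an explicit IRI, or a predicate map to an IRI.

    Returns None for a predicate map without the key "_id".
    """
    if isinstance(item, dict):
        item = item.get("_id")
    if isinstance(item, str) and item.startswith("<") and item.endswith(">"):
        return item[1:-1]  # strip the angle brackets of an explicit IRI
    return item
```

For each of the six forms, `{iri_of(item) for item in as_list(form)}` is the set that contains only `http://example.org/`.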

3.5 Namespace maps

A namespace map can be specified explicitly with the special key “_ns” in a subject map or in a predicate map. An aREF document MUST NOT contain more than one explicit namespace map.

A namespace map is

either a map in which every key conforms to the prefix syntax rule (see qName) and is mapped to an IRI (syntax rule IRI from RFC 3987). The IRIs in a namespace map are also called namespace URIs. The special key underscore (_) can further be used to refer to another predefined namespace map given by a string. This string is also called namespace map identifier. Mappings explicitly given with a namespace URI take precedence over mappings referred to by a namespace map identifier.
or a namespace map identifier that refers to a predefined namespace map.

Applications MAY further assume an implicit namespace map. Mappings from an implicit namespace map can be overridden by explicit namespace maps. The following implicit namespace map or a superset of it SHOULD be assumed by default:

{
  "rdf":  "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
  "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
  "owl":  "http://www.w3.org/2002/07/owl#",
  "xsd":  "http://www.w3.org/2001/XMLSchema#"
}

TODO: should the default namespace map always precede namespace maps given by namespace map identifier, so applications can always assume they are right?

The following namespace maps are equivalent:

"example"
{ "_": "example" }

A commonly used namespace map is listed at http://www.w3.org/2011/rdfa-context/rdfa-1.1. If the namespace map identifier http://www.w3.org/2013/json-ld-context/rdfa11 refers to this map, it can be used in aREF as follows (examples in YAML):

_ns: http://www.w3.org/2013/json-ld-context/rdfa11

Custom prefixes can be added and existing prefixes redefined like this:

_ns: 
  _: http://www.w3.org/2013/json-ld-context/rdfa11
  dc: http://purl.org/dc/elements/1.1/ # instead of http://purl.org/dc/terms/
  dct: http://purl.org/dc/terms/       # additional prefix
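
Combining the implicit namespace map, a predefined namespace map, and explicit mappings can be sketched as three successive updates. In this Python sketch the function name `resolve_namespace_map` and the `predefined` registry (a dictionary from namespace map identifier to namespace map, supplied by the caller) are hypothetical. Applying the implicit map first, so that a predefined map can override it, is an assumption; only the precedence of explicit mappings is taken from the rules above.

```python
DEFAULT_NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def resolve_namespace_map(ns, predefined):
    """Return a plain prefix -> namespace URI map for a value of "_ns"."""
    result = dict(DEFAULT_NS)
    if ns is None:
        return result
    if isinstance(ns, str):
        ns = {"_": ns}  # a bare identifier equals a map with only "_"
    identifier = ns.get("_")
    if identifier is not None:
        result.update(predefined[identifier])
    # Explicit mappings take precedence over everything else.
    result.update({prefix: uri for prefix, uri in ns.items() if prefix != "_"})
    return result
```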

This specification does not include rules for resolving namespace map identifiers. The following guidelines are non-normative:

A URL is expected to refer to a JSON-LD document with a @context element. For instance, the default aREF namespace map could be expressed like this:
```
{
  "@context": {
    "rdf":  "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl":  "http://www.w3.org/2002/07/owl#",
    "xsd":  "http://www.w3.org/2001/XMLSchema#"
  }
}
```
Note that JSON-LD context documents for particular ontologies usually define abbreviations for full URIs and/or default vocabularies (@vocab) that cannot be used in aREF documents because a qName MUST consist of a prefix and a local name.
A string of the form YYYYMMDD is expected to refer to the namespace map defined at this date at http://prefix.cc (see rdfns, available as package librdf-ns-perl in Debian for a related command line tool). For instance the identifier “20140901” maps prefix “fabio” to http://purl.org/spar/fabio/ and the identifier “20120521” maps it to http://purl.org/spar/fabio#.

4 aREF document types

THIS PART OF THE SPEC IS NOT FINISHED YET

Depending on their structure, aREF documents can be classified as circular or non-circular, as flat, as consistent, and as normalized.

An aREF document is circular iff there is at least one path from a subject map to itself by stepping to a next subject map that is part of an encoded object of the previous subject map.

A minimal circular aREF document can be created in JavaScript as follows:

var aref = { _id: "http://example.org/alice" };
aref.foaf_knows = aref; // alice knows herself
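
An application that restricts aREF documents to non-circular list-map-structures needs to detect circles. The following Python sketch follows the current path through maps and lists by object identity; it detects any circle in the list-map-structure, which includes circular documents as defined above. The function name `is_circular` is hypothetical.

```python
def is_circular(node, _path=None):
    """Return True if a map or list is reachable from itself."""
    if not isinstance(node, (dict, list)):
        return False
    path = _path if _path is not None else set()
    if id(node) in path:
        return True  # the node is its own ancestor
    path.add(id(node))
    children = node.values() if isinstance(node, dict) else node
    found = any(is_circular(child, path) for child in children)
    path.discard(id(node))
    return found
```

A map that is merely referenced twice without being its own ancestor is not reported, because each node is removed from the path again once its children have been visited.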

Circular aREF documents cannot be serialized in JSON, but they can be serialized in YAML, for instance this normalized circular aREF document:

http://example.org/alice: &alice
    _id: http://example.org/alice
    foaf_knows: &bob        # alice knows bob
        _id: http://example.org/bob
        foaf_knows: *alice  # bob knows alice
http://example.org/bob: *bob

An aREF document is flat iff all of its encoded objects are encoded as strings. All flat aREF documents are non-circular.

The list-map-structure of a flat aREF document is nested at most two levels deep if it is a subject map, and at most one level deep if it is a predicate map:

{
  "http://example.org/": {    # first level: predicate map
    "dct_title": [            # second level: list of encoded objects
      "example@en",
      "Beispiel@de"
    ]
  }
}
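
Whether a document is flat can be tested by checking that every encoded object is a string or a list of strings. The following Python sketch treats a document with the key “_id” as predicate map and any other document as subject map; the function names are hypothetical and null values are tolerated.

```python
def _objects_are_strings(predicate_map):
    """Return True if all encoded objects of a predicate map are strings."""
    for key, value in predicate_map.items():
        if key.startswith("_"):
            continue  # "_id", "_ns", and ignored keys are not predicates
        items = value if isinstance(value, list) else [value]
        if not all(item is None or isinstance(item, str) for item in items):
            return False
    return True


def is_flat(document):
    """Return True if an aREF document is flat."""
    if "_id" in document:
        return _objects_are_strings(document)
    return all(
        _objects_are_strings(predicate_map)
        for key, predicate_map in document.items()
        # subject keys: blank node identifiers or keys not starting with "_"
        if (key.startswith("_:") or not key.startswith("_"))
        and isinstance(predicate_map, dict)
    )
```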

An aREF document (or its IRIs) is consistent iff … the same IRI should be encoded the same way (but with subtle differences depending on whether it is used as subject, predicate, or object)

…

An aREF document is normalized according to a given namespace map if

The document must be a subject map
The document contains no null values or ignored keys
Its IRIs are encoded consistently
All lists have at least two members
what about _ns?
The document is
- either flat and no predicate map contains the key _id (“normalized form 1”)
- or normalized form 2:
  - all predicate maps must contain the key _id and at least one more predicate key
  - all predicate maps must directly be mapped from a key in the subject map.

…better names for the two forms…

5 References

5.1 Normative references

The Unicode Standard, Version 6.3.0 The Unicode Consortium, 2013. ISBN 978-1-936213-08-5. http://www.unicode.org/versions/Unicode6.3.0/

Martin Dürst; Michel Suignard: Internationalized Resource Identifiers (IRIs). RFC 3987, January 2005. http://tools.ietf.org/html/rfc3987

A. Phillips; M. Davis: Tags for Identifying Languages. BCP 47, September 2009. http://tools.ietf.org/html/bcp47

Mark Davis; Martin Dürst: Unicode Normalization Forms. Unicode Standard Annex #15. http://www.unicode.org/unicode/reports/tr15/

D. Crocker; P. Overell: Augmented BNF for Syntax Specifications: ABNF. RFC 5234, January 2008. http://tools.ietf.org/html/rfc5234

5.2 Other references

Graham Klyne; Jeremy J. Carroll (editors): Resource Description Framework (RDF): Concepts and Abstract Syntax. W3C Recommendation, 10 February 2004 http://www.w3.org/TR/rdf-concepts/
Eric Prud’hommeaux; Gavin Carothers (editors): Turtle. Terse RDF Triple Language. W3C Candidate Recommendation, 19 February 2013. http://www.w3.org/TR/turtle/
Manu Sporny; Gregg Kellogg; Markus Lanthaler (editors): JSON-LD 1.0. A JSON-based Serialization for Linked Data. W3C Candidate Recommendation, 10 September 2013. http://www.w3.org/TR/json-ld/
Ian Davis; Thomas Steiner; Arnaud J Le Hors (editors): RDF 1.1 JSON Alternate Serialization (RDF/JSON). W3C Working Group Note, 07 November 2013. http://www.w3.org/TR/rdf-json/
David Wood (editor): What’s New in RDF 1.1. W3C Working Draft, 17 December 2013. http://www.w3.org/TR/rdf11-new/
Graham Klyne: Uniform Resource Identifier (URI) Schemes. IANA, 21 October 2013. http://www.iana.org/assignments/uri-schemes/
RDFa Core Initial Context. Vocabulary Prefixes. http://www.w3.org/2011/rdfa-context/rdfa-1.1
Jakob Voß: RDF-aREF. CPAN Perl Module. https://metacpan.org/release/RDF-aREF
Jakob Voß: RDF-NS. CPAN Perl Module. https://metacpan.org/release/RDF-NS
Douglas Crockford: The application/json Media Type for JavaScript Object Notation (JSON). RFC 4627, July 2006. https://tools.ietf.org/html/rfc4627
Oren Ben-Kiki; Clark Evans; Ingy döt Net: YAML Ain’t Markup Language (YAML™) Version 1.2. 1 October 2010. http://yaml.org/spec/1.2/spec.html

6 Appendix

6.1 aREF query

This section is non-normative.

aREF query is a query language to query strings, IRIs, and/or blank nodes from a given IRI or blank node in an RDF graph. The query language can be used as a path language for RDF, similar to XPath for XML.

An aREF query consists of a list of qNames, separated by dots (“.”) and optionally followed by:

either a dot to only query IRIs and blank nodes,
or an at character (“@”) to only query literal nodes,
or an at character followed by a language tag (syntax rule languageTag) to only query literal nodes with this specific language tag,
or a caret (“^”) to only query literal nodes with datatype,
or a caret followed by a qName to only query literal nodes with a specific datatype

query  = qName *( "." qName ) [ filter ]
filter = "." / "@" [ languageTag ] / "^" [ qName ]

aREF query expression	informal description
`foaf_knows.foaf_knows`	friends of friends
`dct_creator.`	creators unless only given as string
`dct_creator@`	literal node creators
`dct_creator.foaf_name`	author names
`dct_date^xsd_gYear`	date values of datatype `xsd_gYear`
`skos_prefLabel@en`	preferred labels in English
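
Splitting an aREF query into its path and its filter can be sketched with one regular expression. In this Python sketch the function name `parse_query` and the shape of the result (list of qNames, filter character, filter value) are assumptions, and the localName part of a qName is only approximated with `\w`.

```python
import re

# Approximation of the qName rule (localName approximated with \w).
QNAME = r"[a-z][a-z0-9]*_[^\W\d][\w\-]*"
LANGUAGE_TAG = r"[A-Za-z]{2,8}(?:-[A-Za-z0-9]{1,8})*"

# query = qName *( "." qName ) [ filter ]
QUERY = re.compile(
    "(?P<path>" + QNAME + r"(?:\." + QNAME + ")*)"
    + r"(?P<filter>\.|@(?P<language>" + LANGUAGE_TAG + ")?"
    + r"|\^(?P<datatype>" + QNAME + ")?)?"
)


def parse_query(query):
    """Split an aREF query into (path, filter character, filter value).

    The filter character is ".", "@", "^", or None; the filter value is
    a language tag, a datatype qName, or None.
    """
    match = QUERY.fullmatch(query)
    if match is None:
        raise ValueError("not an aREF query: " + query)
    path = match.group("path").split(".")
    filter_part = match.group("filter")
    filter_char = filter_part[0] if filter_part else None
    value = match.group("language") or match.group("datatype")
    return (path, filter_char, value)
```

Evaluating the parsed query against an RDF graph is a separate step that is not covered by this sketch.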

6.2 aREF serializations

An aREF document can be expressed both in data structuring languages (JSON, YAML…) and in type systems of programming languages (Python, Ruby, Perl…).

The following examples express the same aREF document in different languages. The RDF graph encoded in aREF can be expressed in Turtle syntax as follows:

@prefix dct: <http://purl.org/dc/terms/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<http://example.com/people#alice> a foaf:Person ;
    foaf:name "Alice Smith" ;
    foaf:age 42 ;
    foaf:homepage 
        <http://personal.example.org/~alice/>, 
        <http://work.example.com/asmith/> ;
    foaf:knows [
        foaf:name "John" ;
        dct:description "a nice guy"@en 
    ]
.

Please add your favorite data or programming language at https://github.com/gbv/aREF/issues to be included here!

YAML

The most condensed readable serialization of aREF is probably possible in YAML:

---
_ns: 
    dct: http://purl.org/dc/terms/
    foaf: http://xmlns.com/foaf/0.1/
_id: http://example.com/people#alice
a: foaf_Person
foaf_name: Alice Smith
foaf_age: 42^xsd_integer 
foaf_homepage: 
    - http://personal.example.org/~alice/ 
    - http://work.example.com/asmith/ 
foaf_knows:
    _id: _:1
    foaf_name: John
    dct_description: a nice guy@en

JSON

The same in JSON requires more brackets and delimiters:

{
    "_ns": {
        "dct": "http://purl.org/dc/terms/",
        "foaf": "http://xmlns.com/foaf/0.1/"
    },
    "_id": "http://example.com/people#alice",
    "a": "foaf_Person",
    "foaf_name": "Alice Smith",
    "foaf_age": "42^xsd_integer",
    "foaf_homepage": [
       "http://personal.example.org/~alice/",
       "http://work.example.com/asmith/"
    ],
    "foaf_knows": {
        "_id": "_:1",
        "foaf_name": "John",
        "dct_description": "a nice guy@en"
    }
}

JavaScript

In JavaScript one can omit quotes around map keys by using underscores for prefixed names:

{ 
    _ns: { 
        dct: 'http://purl.org/dc/terms/',
        foaf: 'http://xmlns.com/foaf/0.1/'
    },
    _id: 'http://example.com/people#alice',
    a: 'foaf_Person',
    foaf_name: 'Alice Smith',
    foaf_age: '42^xsd_integer',
    foaf_homepage: [
       'http://personal.example.org/~alice/',
       'http://work.example.com/asmith/' 
    ],
    foaf_knows: { 
        _id: '_:1',
        foaf_name: 'John',
        dct_description: 'a nice guy@en' 
    }
}

Perl

Similar rules apply to aREF in Perl:

{
    _ns => {
       dct => 'http://purl.org/dc/terms/',
       foaf => 'http://xmlns.com/foaf/0.1/',
    },
    _id => 'http://example.com/people#alice',
    a   => 'foaf_Person',
    foaf_name => 'Alice Smith',
    foaf_age  => '42^xsd_integer', 
    foaf_homepage => [
        'http://personal.example.org/~alice/',
        'http://work.example.com/asmith/' 
    ],
    foaf_knows => {
        _id => '_:1',
        foaf_name => 'John',
        dct_description => 'a nice guy@en',
    }
}

PHP

Although PHP does not fully differentiate arrays and maps, one can express both. A PHP array is a map unless all PHP array keys are numeric:

[
    "_ns" => [ 
        "dct" => "http://purl.org/dc/terms/",
       "foaf" => "http://xmlns.com/foaf/0.1/"
    ],
    "_id" => "http://example.com/people#alice",
    "a" => "foaf_Person",
    "foaf_name" => "Alice Smith",
    "foaf_age"  => "42^xsd_integer",
    "foaf_homepage" => [
        "http://personal.example.org/~alice/",  /* key "0" */
        "http://work.example.com/asmith/"       /* key "1" */
    ],
    "foaf_knows" => [
        "_id" => "_:1",
        "foaf_name" => "John",
        "dct_description" => "a nice guy@en"
    ]
];

Another RDF Encoding Form (aREF)

Jakob Voß (voss@gbv.de)

2014-10-16 (version 0.32)
