Network Working Group J. Voß
Internet-Draft Verbundzentrale des GBV
Intended status: Informational M. Schindler
Expires: May 5, 2018 November 2017

BEACON link dump format
draft-voss-beacon-003

Abstract

This document specifies BEACON, a data interchange format for large numbers of uniform links.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on May 5, 2018.

Copyright Notice

Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

1. Introduction

BEACON is a data interchange format for large numbers of uniform links. A BEACON link dump consists of a set of links (Section 2.1) and a set of meta fields (Section 4).

Link dumps can be serialized in BEACON format (Section 3). BEACON format is a condensed, line-oriented text format that uses common patterns in the links of a link dump for abbreviation. A link dump serialized in BEACON format is also referred to as a BEACON file.

Link dumps can further be mapped to RDF graphs with minor limitations (Section 5).

The non-normative appendices contain a mapping of BEACON links to HTML (Appendix B) and a serialization of link dumps based on XML (Appendix C).

The current specification is managed at https://github.com/gbv/beaconspec.

1.1. Notational conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

The formal grammar rules in this document are to be interpreted as described in [RFC5234], including the ABNF core rules HTAB, LF, CR, and SP. In addition, the minus operator (-) is used to exclude line breaks and vertical bars from the rules LINE and TOKEN in Section 3.1.

Samples of RDF graphs in this document are expressed in Turtle syntax [TURTLE].

1.2. Examples

The simplest form of a BEACON file contains full URL links separated by one or two vertical bars:

http://example.com/people/alice|http://example.com/documents/23.about
http://example.com/people/bob||http://example.com/documents/42.about

The first element of a link is called source identifier and the second is called target identifier. In most cases these identifiers are URLs or URIs. If a target identifier does not start with http: or https:, two vertical bars MUST be used:

http://example.com/people/alice||urn:isbn:0123456789

Source identifier and target identifier can be abbreviated with the meta fields PREFIX and TARGET, respectively. A simple BEACON file with such abbreviations can look like this:

#FORMAT: BEACON
#PREFIX: http://example.org/id/
#TARGET: http://example.com/about/

12345
6789||abc

In this example the following two links are encoded:

http://example.org/id/12345|http://example.com/about/12345
http://example.org/id/6789|http://example.com/about/abc

Links can further be extended by link annotation and relation type.

2. Basic concepts

2.1. Links

A link in a link dump is a directed, typed connection between two resources, optionally enriched by an annotation. A link is composed of four elements: a source identifier, a target identifier, a relation type, and a link annotation.

Each element MUST be a whitespace-normalized Unicode string (Section 2.3) that conforms to the TOKEN grammar rule given in Section 3.1. All elements except the link annotation MUST NOT be empty strings.

Source identifier and target identifier define where a link is pointing from and to respectively. Relation type is an identifier that indicates the meaning of a link. All these identifiers SHOULD be URIs [RFC3986]. A link annotation can be used to further describe a link or parts of it.

All links in a link dump share either a common relation type or a common link annotation, or both. This uniformity is used to abbreviate links in BEACON format (Section 3).

The set that all source identifiers in a link dump originate from is called the source dataset and the set that all target identifiers originate from is called the target dataset.

2.2. Allowed characters

The set of allowed Unicode characters in BEACON dumps is the set of valid Unicode characters from UCS which can also be expressed in XML 1.0, excluding some discouraged control characters:

 CHAR        =  WHITESPACE / %x21-7E / %xA0-D7FF / %xE000-FFFD
             /  %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
             /  %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
             /  %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
             /  %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
             /  %xD0000-DFFFD / %xE0000-EFFFD / %xF0000-FFFFD
             /  %x100000-10FFFD

Applications SHOULD exclude disallowed characters by stripping them, by replacing them with the replacement character U+FFFD, or by refusing to process. Applications SHOULD also apply Unicode Normalization Form Compatibility Composition (NFKC) to all strings.
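The replacement strategy can be illustrated with a minimal Python sketch. It is not part of the specification: the function names is_allowed and sanitize are hypothetical, and the sketch assumes that the supplementary-plane ranges of the CHAR rule are meant to exclude the last two code points of every plane.

```python
import unicodedata

def is_allowed(ch: str) -> bool:
    """Check one character against the CHAR rule (assumed reading, see lead-in)."""
    cp = ord(ch)
    if ch in "\t\n\r ":  # characters of the WHITESPACE rule
        return True
    if 0x21 <= cp <= 0x7E or 0xA0 <= cp <= 0xD7FF or 0xE000 <= cp <= 0xFFFD:
        return True
    # Planes 1 to 16 without the last two code points of each plane.
    return 0x10000 <= cp <= 0x10FFFF and (cp & 0xFFFF) <= 0xFFFD

def sanitize(value: str) -> str:
    """Replace disallowed characters with U+FFFD, then apply NFKC."""
    replaced = "".join(ch if is_allowed(ch) else "\ufffd" for ch in value)
    return unicodedata.normalize("NFKC", replaced)
```

Stripping disallowed characters or refusing to process the input are equally valid choices; replacement is used here only because it keeps the position of the problem visible.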

2.3. Whitespace normalization

A Unicode string is whitespace-normalized according to this specification by stripping leading and trailing whitespace and by replacing each WHITESPACE character sequence with a single space (SP).

 WHITESPACE  =  1*( CR / LF / SPACE )

 SPACE       =  HTAB / SP
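Whitespace normalization could be sketched in Python as follows. The function name normalize_whitespace is hypothetical. Only CR, LF, HTAB, and SP are treated as whitespace, as in the rules above, so other Unicode space characters such as U+00A0 are left untouched.

```python
import re

# WHITESPACE: one or more of CR, LF, HTAB, SP (Section 2.3)
_WHITESPACE = re.compile(r"[\r\n\t ]+")

def normalize_whitespace(value: str) -> str:
    """Collapse whitespace runs to one SP and strip both ends."""
    return _WHITESPACE.sub(" ", value).strip(" ")
```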

2.4. URI patterns

A URI pattern in this specification is a URI Template, as defined in [RFC6570], with all template expressions being either {ID} for simple string expansion or {+ID} for reserved expansion. URI patterns are used in link construction (Section 3.2) to expand link tokens to full identifiers (usually URIs).

A URI pattern is allowed to contain the broader set of characters allowed in Internationalized Resource Identifiers (IRI) [RFC3987]. The URI constructed from a URI pattern by template processing can be transformed to an IRI by following the process defined in Section 3.2 of [RFC3987].

For instance the URI pattern http://example.org/?id={ID} is expanded to:

 ID variable      Expanded

  Hello World!     http://example.org/?id=Hello%20World%21
  x/?a=1&b=2       http://example.org/?id=x%2F%3Fa%3D1%26b%3D2
  M%C3%BCller      http://example.org/?id=M%25C3%25BCller

And the URI pattern http://example.org/{+ID} is expanded to:

 ID variable      Expanded

  Hello World!     http://example.org/Hello%20World!
  x/?a=1&b=2       http://example.org/x/?a=1&b=2
  M%C3%BCller      http://example.org/M%C3%BCller
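The two kinds of expansion can be sketched in Python with the standard library. This is an illustration under assumptions rather than a complete [RFC6570] processor: the function name expand is hypothetical, and only the expressions {ID} and {+ID} are handled. Following [RFC6570], reserved expansion keeps reserved characters and passes existing percent-encoded triplets through unchanged, while simple expansion encodes everything outside the unreserved set.

```python
import re
from urllib.parse import quote

_UNRESERVED = "-._~"              # letters and digits are always kept by quote()
_RESERVED = ":/?#[]@!$&'()*+,;="

def _expand_reserved(token: str) -> str:
    # Split out %XX triplets (odd indexes) so that they are not encoded again.
    pieces = re.split(r"(%[0-9A-Fa-f]{2})", token)
    return "".join(
        piece if index % 2 else quote(piece, safe=_UNRESERVED + _RESERVED)
        for index, piece in enumerate(pieces)
    )

def expand(pattern: str, token: str) -> str:
    """Fill {+ID} by reserved expansion and {ID} by simple expansion."""
    simple = quote(token, safe=_UNRESERVED)
    return pattern.replace("{+ID}", _expand_reserved(token)).replace("{ID}", simple)
```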

3. BEACON format

3.1. BEACON files

A BEACON file is a UTF-8 encoded Unicode file [RFC3629]. The file MAY begin with a Unicode Byte Order Mark and it SHOULD end with a line break. The rest of the file consists of four parts in this order:

  1. a format indicator (#FORMAT: BEACON)
  2. a list of meta lines
  3. a list of empty lines
  4. a list of link lines

All four parts are optional but RECOMMENDED. The order of meta lines and of link lines, respectively, is irrelevant.

 LINE       =  *CHAR - ( *CHAR LINEBREAK *CHAR )

 TOKEN      =  *CHAR - ( *CHAR ( LINEBREAK / VBAR ) *CHAR )

 LINEBREAK  =  LF / CR LF / CR   ; "\n", "\r\n", or "\r"

 VBAR       =  %x7C              ; vertical bar ("|")

 SEPARATOR   =  ":" *SPACE / 1*SPACE

 BEACONFILE  =  [ %xEF.BB.BF ]        ; Unicode UTF-8 Byte Order Mark
                [ "#FORMAT" SEPARATOR "BEACON" *SPACE LINEBREAK ]
                *( METALINE LINEBREAK )
                *( *SPACE LINEBREAK ) ; empty lines
                 LINKLINE *( LINEBREAK LINKLINE )
                [ LINEBREAK ]

A meta line specifies a meta field (Section 4) and its value, separated by colon and/or tabulator or space:

 METALINE    =  "#" METAFIELD SEPARATOR METAVALUE

 METAFIELD   =  1*( %x41-5A )  ;  "A" to "Z"

 METAVALUE   =  LINE

If a BEACON file contains multiple meta lines with the same field name, all but one of these lines MUST be ignored. Applications SHOULD emit a warning for multiple meta lines with the same field name.
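A reader for meta lines might look like the following Python sketch. The function name parse_meta_lines is hypothetical. The specification does not say which of several meta lines with the same field name is kept; keeping the first one is an assumption of this sketch.

```python
import re
import warnings

# "#" METAFIELD SEPARATOR METAVALUE; the separator is a colon followed by
# optional spaces or tabs, or at least one space or tab.
_META_LINE = re.compile(r"#([A-Z]+)(?::[\t ]*|[\t ]+)(.*)")

def parse_meta_lines(lines):
    """Collect meta fields; the first value of a repeated field wins (assumption)."""
    fields = {}
    for line in lines:
        match = _META_LINE.fullmatch(line.rstrip("\r\n"))
        if match is None:
            continue  # not a meta line
        name = match.group(1)
        value = re.sub(r"[\t\r\n ]+", " ", match.group(2)).strip(" ")
        if name in fields:
            warnings.warn(f"meta field {name} given more than once")
            continue
        fields[name] = value
    return fields
```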

Each link is given on a link line with its source token, optionally followed by annotation token and target token. These link elements are used for link construction (Section 3.2) unless the source token consists of whitespace only. If no empty line is given, the first link line MUST NOT begin with #.

 LINKLINE    =  SOURCE /
                SOURCE VBAR TARGET /
                SOURCE VBAR ANNOTATION /
                SOURCE VBAR ANNOTATION VBAR TARGET

 SOURCE      =  TOKEN

 TARGET      =  TOKEN

 ANNOTATION  =  TOKEN

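The ambiguity of rule LINKLINE with one VBAR is resolved as follows: if the TARGET and MESSAGE meta fields both have their default values and the token after the VBAR begins with http: or https:, this token is used as target token. Otherwise it is used as annotation token.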

This way one can use two forms to encode links to HTTP URIs (given target meta field and message meta field with their default values):

foo|http://example.org/bar
foo||http://example.org/bar

Applications MAY accept link lines with more than two vertical bars but they MUST ignore additional content between a third vertical bar and the end of the line.
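Splitting a link line into its tokens could be sketched in Python as shown below. The function name split_link_line and its two flag parameters are hypothetical. The handling of a single vertical bar is an assumption drawn from the examples in this document: the second token is read as target token only if it begins with http: or https: and the TARGET and MESSAGE meta fields have their default values.

```python
import re

def _normalize(token):
    """Whitespace-normalize a token (Section 2.3)."""
    return re.sub(r"[\t\r\n ]+", " ", token).strip(" ")

def split_link_line(line, target_default=True, message_default=True):
    """Return (source, annotation, target) tokens; missing tokens are ''."""
    # Content after a third vertical bar is ignored.
    parts = [_normalize(part) for part in line.split("|")[:3]]
    source = parts[0]
    if len(parts) == 3:
        return source, parts[1], parts[2]
    if len(parts) == 2:
        second = parts[1]
        if (target_default and message_default
                and second.startswith(("http:", "https:"))):
            return source, "", second
        return source, second, ""
    return source, "", ""
```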

3.2. Link construction

Link elements in BEACON format are given in abbreviated form with link tokens. Each link is constructed based on meta fields for link construction (Section 4.1) and from the source token, annotation token, and target token of its link line (Section 3.1).

All tokens MUST be whitespace-normalized before further processing. The link elements are then constructed from meta field values and link tokens (see Section 2.4 for how to construct values from URI patterns).

The following table illustrates construction of a link:

 meta field  +  link token  -->  link element
---------------------------------------------------
 PREFIX      |  source       |   source identifier
 TARGET      |  target       |   target identifier
 MESSAGE     |  annotation   |   link annotation
 RELATION    |  annotation   |   relation type

Constructed source identifier, target identifier, and relation types SHOULD be syntactically valid URIs. Applications MAY ignore links with invalid URIs and SHOULD emit a warning.

Applications MUST NOT differentiate between equal links constructed from different abbreviations. For instance the following BEACON file contains a single link:

 #PREFIX: http://example.org/
 #TARGET: http://example.com/
 #MESSAGE: Hello World!

 foo

The same link could also be serialized without any meta fields:

 http://example.org/foo|Hello World!|http://example.com/foo

The default meta field values can be specified as:

 #PREFIX: {+ID}
 #TARGET: {+ID}
 #RELATION: http://www.w3.org/2000/01/rdf-schema#seeAlso

Multiple occurrences of equal links in one BEACON file SHOULD be ignored. It is RECOMMENDED to indicate duplicated links with a warning.
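Link construction can be illustrated with the following Python sketch. It rests on several assumptions that are not stated as rules in this section: a missing target token falls back to the source token, a missing annotation token falls back to the MESSAGE value, and RELATION is treated as a fixed URI (a RELATION value that is itself a URI pattern is not handled). The names construct_link, expand, and DEFAULTS are hypothetical.

```python
import re
from urllib.parse import quote

DEFAULTS = {
    "PREFIX": "{+ID}",
    "TARGET": "{+ID}",
    "MESSAGE": "",
    "RELATION": "http://www.w3.org/2000/01/rdf-schema#seeAlso",
}
_UNRESERVED = "-._~"
_RESERVED = ":/?#[]@!$&'()*+,;="

def expand(pattern, token):
    """Fill {ID} (simple) and {+ID} (reserved) with the token."""
    if "{ID}" not in pattern and "{+ID}" not in pattern:
        pattern += "{ID}"  # field value without expression: {ID} is appended
    pieces = re.split(r"(%[0-9A-Fa-f]{2})", token)  # keep %XX triplets as-is
    reserved = "".join(
        piece if index % 2 else quote(piece, safe=_UNRESERVED + _RESERVED)
        for index, piece in enumerate(pieces)
    )
    simple = quote(token, safe=_UNRESERVED)
    return pattern.replace("{+ID}", reserved).replace("{ID}", simple)

def construct_link(source, annotation="", target="", meta=None):
    """Return (source identifier, relation type, target identifier, annotation)."""
    fields = dict(DEFAULTS)
    # Missing or empty meta field values keep their defaults.
    fields.update({k: v for k, v in (meta or {}).items() if v})
    return (
        expand(fields["PREFIX"], source),
        fields["RELATION"],
        expand(fields["TARGET"], target or source),
        annotation or fields["MESSAGE"],
    )
```

Because both serializations of the "Hello World!" example yield the same tuple, an application built along these lines treats them as one link, and duplicates can be detected by collecting the tuples in a set.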

3.3. MIME type

The RECOMMENDED MIME type of BEACON files is "text/plain". The file extension .txt SHOULD be used when storing BEACON files.

4. Meta fields

A link dump SHOULD contain a set of meta fields, each identified by its name built from uppercase letters A-Z. Relevant meta fields for link construction (Section 4.1), for description of the link dump (Section 4.2), and for description of source dataset and target dataset (Section 4.3) are defined in the following.

A link dump can only contain one value for each meta field. Additional meta fields, not defined in this specification, SHOULD be ignored.

All meta field values MUST be whitespace-normalized. Missing meta field values and empty strings MUST be set to the field’s default value, which is the empty string unless noted otherwise. The following diagram shows which meta fields belong to which dataset.

                    +-----------------------+
                    | link dump             |
                    |                       |
                    |  * DESCRIPTION        |
                    |  * CREATOR            |
                    |  * CONTACT            |
                    |  * HOMEPAGE           |
                    |  * FEED               |
                    |  * TIMESTAMP          |
                    |  * UPDATE             |
                    |                       |
                    | +-------------------+ |
                    | | link construction | |
                    | |                   | |   +-----------------+
+----------------+  | |  * PREFIX         | |   | target dataset  |
| source dataset | ---|  * TARGET         |---> |                 |
|                |  | |  * RELATION       | |   |  * TARGETSET    |
|                | ---|  * MESSAGE        |---> |  * NAME         |
|  * SOURCESET   |  | |  * ANNOTATION     | |   |  * INSTITUTION  |
|                | ---|                   |---> |                 |
+----------------+  | +-------------------+ |   +-----------------+
                    +-----------------------+

Examples of meta fields are included in Section 5.

4.1. Meta fields for link construction

The following meta fields define how to construct links from link tokens (Section 3.2). See Section 5.5 for mapping of these fields to RDF.

4.1.1. PREFIX

The PREFIX meta field specifies a URI pattern (Section 2.4) to construct source identifiers. If the non-empty field value contains no URI pattern, the expression {ID} is appended.

The default value is {+ID}.

The name PREFIX was chosen to keep backwards compatibility with existing BEACON files.

4.1.2. TARGET

The TARGET meta field specifies a URI pattern (Section 2.4) to construct target identifiers. If the non-empty field value contains no URI pattern, the expression {ID} is appended.

The default value is {+ID}.

4.1.3. MESSAGE

The MESSAGE meta field specifies a default value for link annotations.

4.1.4. RELATION

The RELATION meta field specifies relation types of links. The field value MUST be either a URI as defined in [RFC3986] or a URI pattern as described in Section 2.4.

The default value is http://www.w3.org/2000/01/rdf-schema#seeAlso.

4.1.5. ANNOTATION

The ANNOTATION field can be used to specify the meaning of link annotations in a link dump. The field value MUST be a URI.

4.2. Meta fields for link dumps

Meta fields for link dumps describe the link dump as a whole. See Section 5.6 for mapping of these fields to RDF.

4.2.1. DESCRIPTION

The DESCRIPTION meta field contains a human readable description of the link dump.

4.2.2. CREATOR

The CREATOR meta field contains the URI or the name of the person, organization, or a service primarily responsible for making the link dump. The field SHOULD NOT contain a simple URL, unless this URL is also used as URI.

4.2.3. CONTACT

The CONTACT meta field contains an email address or similar contact information to reach the creator of the link dump. The field value SHOULD be an individual mailbox address as specified in section 3.4 of [RFC5322].

4.2.4. HOMEPAGE

The HOMEPAGE meta field contains a URL of a website with additional information about this link dump. Note that this field does not specify the homepage of the target dataset.

4.2.5. FEED

The FEED meta field contains a URL from which the link dump can be downloaded.

4.2.6. TIMESTAMP

The TIMESTAMP field contains the date of last modification of the link dump. Note that this value MAY be different from the last modification time of a BEACON file that serializes the link dump. The timestamp value MUST conform to the full-date or to the date-time production rule in [RFC3339]. In addition, an uppercase T character MUST be used to separate date and time, and an uppercase Z character MUST be present in the absence of a numeric time zone offset.
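A check of TIMESTAMP values could be sketched in Python as follows. The function name is_valid_timestamp is hypothetical, and the sketch is deliberately simplified: the range of the numeric time zone offset is not verified, and years below 0001 are rejected because datetime.date does not support them.

```python
import re
from datetime import date

# full-date, optionally followed by "T", partial-time, and "Z" or an offset
_TIMESTAMP = re.compile(
    r"([0-9]{4})-([0-9]{2})-([0-9]{2})"
    r"(?:T([0-9]{2}):([0-9]{2}):([0-9]{2})(?:\.[0-9]+)?"
    r"(?:Z|[+-][0-9]{2}:[0-9]{2}))?"
)

def is_valid_timestamp(value):
    """Accept full-date or date-time with uppercase T and Z only."""
    match = _TIMESTAMP.fullmatch(value)
    if match is None:
        return False
    year, month, day = (int(match.group(i)) for i in (1, 2, 3))
    try:
        date(year, month, day)  # rejects dates such as 2012-02-30
    except ValueError:
        return False
    if match.group(4) is None:
        return True  # full-date without time
    hour, minute, second = (int(match.group(i)) for i in (4, 5, 6))
    return hour <= 23 and minute <= 59 and second <= 60  # 60: leap second
```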

4.2.7. UPDATE

The UPDATE field specifies how frequently the link dump is likely to change. The field corresponds to the <changefreq> element in Sitemaps XML format [Sitemaps]. Valid values are always, hourly, daily, weekly, monthly, yearly, and never.

The value always SHOULD be used to describe link dumps that change each time they are accessed. The value never SHOULD be used to describe archived link dumps.

4.3. Meta fields for datasets

Dataset meta fields contain properties of the source dataset or target dataset, respectively. See Section 5.7 for mapping of these fields to RDF.

4.3.1. SOURCESET

The SOURCESET meta field contains the URI of the source dataset.

4.3.2. TARGETSET

The TARGETSET meta field contains the URI of the target dataset.

4.3.3. NAME

The NAME meta field contains a name or title of the target dataset.

4.3.4. INSTITUTION

The INSTITUTION meta field contains the name or HTTP URI of the organization or of an individual responsible for making available the target dataset.

5. Mapping to RDF

A link dump can be mapped to an RDF graph as described in this section. The mapping excludes all links whose source identifier, target identifier, or relation type is not a valid URI.

All URIs MUST be transformed to IRIs as defined in Section 3.2 of [RFC3987].

Examples of link dumps mapped to RDF are given in Appendix D.

5.1. Naming conventions

The following namespace prefixes are used to refer to RDF properties and classes from the RDFS vocabulary [RDF], the DCMI Metadata Terms [DCTERMS], the FOAF vocabulary [FOAF], the VoID vocabulary [VOID], the Hydra Core Vocabulary [Hydra], and the RSS 1.0 Syndication Module [RSSSYND]:

 rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
 dcterms: <http://purl.org/dc/terms/>
 foaf:    <http://xmlns.com/foaf/0.1/>
 void:    <http://rdfs.org/ns/void#>
 hydra:   <http://www.w3.org/ns/hydra/core#>
 rssynd:  <http://web.resource.org/rss/1.0/modules/syndication/>

The blank node :dump denotes the link dump, the blank node :sourceset denotes the source dataset, and the blank node :targetset denotes the target dataset. Source dataset and target dataset can also be given an absolute IRI with meta fields SOURCESET and TARGETSET, respectively (Section 4.3).

5.2. Default triples

The following RDF triples can always be assumed when mapping link dumps to RDF:

 :dump a void:Linkset, hydra:Collection ;
     void:subjectsTarget :sourceset ;
     void:objectsTarget :targetset .

 :sourceset a void:Dataset .
 :targetset a void:Dataset .

All publicly available BEACON data dumps SHOULD be Open Data, so the following triple MAY be assumed as well:

:dump <http://creativecommons.org/ns#license>
    <http://creativecommons.org/publicdomain/zero/1.0/> .

5.3. Links in RDF

Links (Section 2.1) with source identifier, target identifier, and relation type being valid URIs can be mapped to at least one RDF triple with the source identifier as subject, the relation type as predicate, and the target identifier as object.

The total number of mappable links in a link dump SHOULD result in two additional RDF triples with COUNT being the number of links:

 :dump hydra:totalItems COUNT .
 :dump void:entities COUNT .

5.4. Link annotations in RDF

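Each non-empty link annotation SHOULD result in an additional RDF triple with the link annotation as literal object and, if given, the value of the ANNOTATION meta field (Section 4.1.5) as predicate.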

Applications MAY use a predefined IRI as link annotation or process the link annotation by other means, for instance for provenance and versioning of links. Applications MAY assign a default language tag or datatype to all literal objects derived from link annotations.

Typical use cases of link annotations include specification of labels and a "number of hits" at the target dataset. For instance the following file in BEACON format (Section 3):

 #PREFIX: http://example.org/
 #TARGET: http://example.com/
 #RELATION: http://xmlns.com/foaf/0.1/primaryTopic
 #ANNOTATION: http://purl.org/dc/terms/extent

 abc|12|xy

can be mapped to

 <http://example.org/abc> foaf:primaryTopic <http://example.com/xy> .
 <http://example.com/xy> dcterms:extent "12" .

The total number of mappable links and link annotations in a link dump SHOULD result in an additional RDF triple with TRIPLES being the sum of both numbers:

 :dump void:triples TRIPLES .
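The mapping of links and annotations to triples, together with the two counts, can be sketched in Python. The function name links_to_ntriples is hypothetical, the output is written as N-Triples lines for simplicity, and links are assumed to have passed URI validation already. As in the example above, the annotation triple uses the target identifier as its subject.

```python
def links_to_ntriples(links, annotation_predicate):
    """Map (source, relation, target, annotation) tuples to N-Triples lines.

    Returns the lines, the number of links (COUNT), and the number of
    link and annotation triples (TRIPLES).
    """
    lines = []
    count = 0
    for source, relation, target, annotation in links:
        lines.append(f"<{source}> <{relation}> <{target}> .")
        count += 1
        if annotation:
            # Escape backslash and double quote for the literal.
            literal = annotation.replace("\\", "\\\\").replace('"', '\\"')
            lines.append(f'<{target}> <{annotation_predicate}> "{literal}" .')
    return lines, count, len(lines)
```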

5.5. Meta fields for link construction in RDF

All meta fields for link construction (Section 4.1) except for MESSAGE can be mapped to RDF triples.

The PREFIX meta field (Section 4.1.1) SHOULD be mapped to the RDF property void:uriSpace or void:uriRegexPattern with :sourceset as RDF subject.

The TARGET meta field (Section 4.1.2) SHOULD be mapped to the RDF property void:uriSpace or void:uriRegexPattern with :targetset as RDF subject.

The RELATION meta field (Section 4.1.4), if its value contains a URI, SHOULD be mapped to the RDF property void:linkPredicate with :dump as RDF subject.

The ANNOTATION meta field (Section 4.1.5) is used to map link annotations to RDF (Section 5.4) unless the RELATION meta field contains a URI template.

5.6. Meta fields for link dumps in RDF

Meta fields for link dumps (Section 4.2) describe properties of the link dump.

The DESCRIPTION meta field (Section 4.2.1) corresponds to the dcterms:description RDF property. For instance

#DESCRIPTION: Mapping from ids to documents

can be mapped to

:dump dcterms:description "Mapping from ids to documents" .

The CREATOR meta field (Section 4.2.2) corresponds to the dcterms:creator RDF property. The RDF object SHOULD NOT be a literal node. For instance

#CREATOR: Bea Beacon

can be mapped to

:dump dcterms:creator [ foaf:name "Bea Beacon" ] .

A field value starting with http:// or https:// is interpreted as URI instead of string. For instance

#CREATOR: http://example.org/people/bea

can be mapped to

:dump dcterms:creator <http://example.org/people/bea> .

The CONTACT meta field (Section 4.2.3) corresponds to the foaf:mbox RDF property. The RDF object SHOULD NOT be a literal node. For instance

 #CONTACT: admin@example.com

can be mapped to

 :dump dcterms:creator [ foaf:mbox <mailto:admin@example.com> ] .

and

 #CONTACT: Bea Beacon <bea@example.org>

can be mapped to

 :dump dcterms:creator [
     foaf:name "Bea Beacon" ;
     foaf:mbox <mailto:bea@example.org>
 ] .
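The two CONTACT examples can be reproduced with a short Python sketch based on the standard library function email.utils.parseaddr. The function name contact_to_turtle is hypothetical. Note that parseaddr is lenient and does not validate the mailbox address, so a real application would need additional checks.

```python
from email.utils import parseaddr

def contact_to_turtle(value):
    """Map a CONTACT value to a Turtle statement, or None if no address is found."""
    name, address = parseaddr(value)
    if not address:
        return None
    parts = []
    if name:
        escaped = name.replace("\\", "\\\\").replace('"', '\\"')
        parts.append(f'foaf:name "{escaped}"')
    parts.append(f"foaf:mbox <mailto:{address}>")
    return ":dump dcterms:creator [ " + " ; ".join(parts) + " ] ."
```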

The HOMEPAGE meta field (Section 4.2.4) corresponds to the foaf:homepage RDF property. For instance

#HOMEPAGE: http://example.org/about.html

can be mapped to

:dump foaf:homepage <http://example.org/about.html> .

The FEED meta field (Section 4.2.5) corresponds to the void:dataDump RDF property. For instance

#FEED: http://example.com/beacon.txt

can be mapped to

:dump void:dataDump <http://example.com/beacon.txt> .

The TIMESTAMP meta field (Section 4.2.6) corresponds to the dcterms:modified RDF property. For instance the following valid timestamps

 #TIMESTAMP: 2012-05-30
 #TIMESTAMP: 2012-05-30T15:17:36+02:00
 #TIMESTAMP: 2012-05-30T13:17:36Z

can be mapped to the following RDF triples, respectively:

 :dump dcterms:modified "2012-05-30"^^xsd:date .
 :dump dcterms:modified "2012-05-30T15:17:36+02:00"^^xsd:dateTime .
 :dump dcterms:modified "2012-05-30T13:17:36Z"^^xsd:dateTime .

The UPDATE meta field (Section 4.2.7) corresponds to the rssynd:updatePeriod RDF property. For instance a daily update

#UPDATE: daily

can be mapped to

:dump rssynd:updatePeriod "daily" .

5.7. Meta fields for datasets in RDF

Meta fields for the datasets (Section 4.3) are mapped to subjects and objects of RDF triples to describe the source dataset and target dataset, respectively.

The SOURCESET meta field (Section 4.3.1) replaces the blank node :sourceset.

The TARGETSET meta field (Section 4.3.2) replaces the blank node :targetset.

The NAME meta field (Section 4.3.3) is mapped to the RDF property dcterms:title with :targetset as RDF subject. For instance the field value "Wikipedia", expressible in BEACON format as

#NAME: Wikipedia

can be mapped to

:targetset dcterms:title "Wikipedia" .

The INSTITUTION meta field (Section 4.3.4) corresponds to the RDF property dcterms:publisher. The RDF object SHOULD NOT be a literal node. For instance

#INSTITUTION: Wikimedia Foundation

can be mapped to

:targetset dcterms:publisher [ foaf:name "Wikimedia Foundation" ] .

A field value starting with http:// or https:// is interpreted as URI instead of string. For instance

#INSTITUTION: http://viaf.org/viaf/137022054/

can be mapped to

:targetset dcterms:publisher <http://viaf.org/viaf/137022054/> .

5.8. Limitations and applications

BEACON format (Section 3) can be used as serialization format for RDF graphs where all parts of RDF triples are IRIs and IRIs do not contain the character sequences %7C, %0A, %0D, or any other percent-encoded character not included in the list of allowed characters (Section 2.2). This limitation applies because the disallowed character sequences would need to result from characters not allowed in link tokens of BEACON format.

BEACON link dumps can be served for instance as Triple Pattern Fragments [TPF] which also consist of a set of links sharing a common pattern, and additional metadata.

6. Security Considerations

Programs should be prepared for malformed and malicious content when parsing BEACON files, when constructing links from link tokens, and when mapping links to RDF or HTML. Possible attacks on parsers include broken UTF-8 and buffer overflows. Link construction can result in unexpectedly long strings and in character sequences whose parts appear harmless when analyzed separately. Most notably, BEACON data may contain strings with HTML and JavaScript code intended for cross-site scripting attacks on a site that displays BEACON links. Applications should therefore escape or filter all content accordingly, using established libraries such as Apache Escape Utils.

7. References

7.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3339] Klyne, G. and C. Newman, "Date and Time on the Internet: Timestamps", RFC 3339, July 2002.
[RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO 10646", STD 63, RFC 3629, November 2003.
[RFC3986] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, January 2005.
[RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource Identifiers (IRIs)", RFC 3987, January 2005.
[RFC5234] Crocker, D. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", STD 68, RFC 5234, January 2008.
[RFC5322] Resnick, P., "Internet Message Format", RFC 5322, October 2008.
[RFC6570] Gregorio, J., Fielding, R., Hadley, M., Nottingham, M. and D. Orchard, "URI Template", RFC 6570, March 2012.
[Unicode] The Unicode Consortium, "The Unicode Standard".
[DCTERMS] DCMI Usage Board, "DCMI Metadata Terms", Oct 2010.

7.2. Informative References

[FOAF] Brickley, D. and L. Miller, "FOAF Vocabulary Specification", Aug 2010.
[RFC5013] Kunze, J. and T. Baker, "The Dublin Core Metadata Element Set", RFC 5013, August 2007.
[Sitemaps] Google Inc., "Sitemaps XML format", February 2008.
[SAX] Megginson, D., "SAX 1.0: The Simple API for XML", May 1998.
[TURTLE] Beckett, D. and T. Berners-Lee, "Turtle - Terse RDF Triple Language", Mar 2011.
[RELAX-NGC] Clark, J., "RELAX NG Compact Syntax", Nov 2002.
[RSSSYND] RSS-DEV Working Group, "RDF Site Summary 1.0 Modules: Syndication", Dec 2000.
[RDF] Hayes, P., "RDF Semantics", Feb 2004.
[VOID] Cyganiak, R., Zhao, J., Alexander, K. and M. Hausenblas, "Vocabulary of Interlinked Datasets (VoID)", Mar 2011.
[TPF] Verborgh, R., "Triple Pattern Fragments", Jan 2017.
[Hydra] Lanthaler, M., "Hydra Core Vocabulary", Oct 2017.

Appendix A. Glossary

BEACON
a data interchange format as specified in this document.
BEACON file
a link dump serialized in BEACON format.
BEACON format
a condensed format to serialize link dumps as specified in this document.
link
a source identifier, target identifier, relation type, and (optional) link annotation. Given in form of link tokens in BEACON format to construct links from.
link annotation
an additional description of a link given as non-empty Unicode string.
link dump
a set of links and meta fields.
link token
a Unicode string in BEACON format used to construct a link.
meta field
a property to describe a link dump, a source dataset, a target dataset, or how to construct links from BEACON format.
source identifier
identifier where a link points from.
target identifier
identifier where a link points to.
source dataset
the set (or superset) of all source URIs in a link dump.
target dataset
the set (or superset) of all target URIs in a link dump.
relation type
the type of connection between source identifier and target identifier.

Appendix B. Mapping BEACON to HTML

An important use-case of BEACON is the creation of HTML links to related documents. A link in a BEACON dump can be mapped to an HTML link (<a> element) by using the target identifier as value of the href attribute and the link annotation as link text.

For instance the following link, given in a BEACON file:

 http://example.com|example|http://example.org

can be mapped to the following HTML link:

 <a href="http://example.org">example</a>

Note that the link annotation is optional. Additional meta fields can be used to construct appropriate HTML links. For instance the meta fields

 #RELATION: http://xmlns.com/foaf/0.1/isPrimaryTopicOf
 #SOURCETYPE: http://xmlns.com/foaf/0.1/Person
 #NAME: ACME documents

can be used to create a link such as

 <span>
   More information about this person
   <a href="http://example.com/foo">at ACME documents</a>.
 </span>

because http://xmlns.com/foaf/0.1/isPrimaryTopicOf translates to "more information about", http://xmlns.com/foaf/0.1/Person translates to "this person", and the target dataset’s name "ACME documents" can be used as link label.
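In line with the Security Considerations (Section 6), every value taken from a link dump needs to be escaped before it is placed in HTML. The following Python sketch shows this for the simple mapping at the beginning of this appendix, using html.escape from the standard library. The function name link_to_html is hypothetical, and falling back to the target identifier as link text when no annotation is given is an assumption of this sketch.

```python
from html import escape

def link_to_html(target, annotation=""):
    """Build an <a> element with escaped href and escaped link text."""
    label = annotation or target  # assumed fallback for missing annotations
    return f'<a href="{escape(target, quote=True)}">{escape(label)}</a>'
```

Escaping alone does not make a link safe to follow; an application may additionally want to restrict targets to expected URI schemes such as http and https.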

Appendix C. BEACON XML format

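A BEACON XML file is a valid XML file conforming to the following schema. The file SHOULD be encoded in UTF-8 [RFC3629]. The file MUST contain a <beacon> root element in the namespace http://purl.org/net/beacon with one empty <link> element per link, each giving the source token in its source attribute.

The file MAY further specify meta fields as attributes of the <beacon> element, and target token and annotation token as attributes target and annotation of a <link> element.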

All attributes MUST be given in lowercase.

To process BEACON XML files, a complete and stream-processing XML parser, for instance the Simple API for XML [SAX], is RECOMMENDED, in favor of parsing with regular expressions or similar methods prone to errors. Additional XML attributes of <link> elements and <link> elements without source attribute SHOULD be ignored.

Note that in contrast to BEACON text files, link tokens MAY include line breaks, which MUST be removed by whitespace normalization. Furthermore source token, annotation token, and target token MAY include a vertical bar, which MUST be replaced by the character sequence %7C before further processing.
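Stream processing of a BEACON XML file could be sketched with the SAX interface of the Python standard library. The names BeaconHandler and parse_beacon_xml are hypothetical, and mapping attribute names of the <beacon> element to uppercase meta field names is an assumption of this sketch. Link tokens are whitespace-normalized and vertical bars are replaced by %7C, and <link> elements without source attribute are ignored.

```python
import re
import xml.sax

def _clean(value):
    """Whitespace-normalize a token and encode vertical bars as %7C."""
    return re.sub(r"[\t\r\n ]+", " ", value).strip(" ").replace("|", "%7C")

class BeaconHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.meta = {}
        self.links = []  # (source, annotation, target) tokens

    def startElement(self, name, attrs):
        if name == "beacon":
            # Namespace declarations are not meta fields.
            self.meta = {key.upper(): value for key, value in attrs.items()
                         if not key.startswith("xmlns")}
        elif name == "link" and "source" in attrs:
            self.links.append(tuple(
                _clean(attrs.get(key, ""))
                for key in ("source", "annotation", "target")))

def parse_beacon_xml(data):
    """Parse BEACON XML given as bytes; return (meta fields, link tokens)."""
    handler = BeaconHandler()
    xml.sax.parseString(data, handler)
    return handler.meta, handler.links
```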

A schema of BEACON XML format in RELAX NG Compact syntax [RELAX-NGC] can be given as follows:

default namespace = "http://purl.org/net/beacon"

element beacon {
  attribute prefix      { text },
  attribute target      { text },
  attribute message     { text },
  attribute source      { text },
  attribute name        { text },
  attribute institution { text },
  attribute description { text },
  attribute creator     { text },
  attribute contact     { text },
  attribute homepage    { xsd:anyURI },
  attribute feed        { xsd:anyURI },
  attribute timestamp   { text },
  attribute update { "always" | "hourly" | "daily"
    | "weekly" | "monthly" | "yearly" | "never" },
  attribute relation    { xsd:anyURI },
  attribute annotation  { xsd:anyURI },
  element link {
    attribute source     { text },
    attribute target     { text }?,
    attribute annotation { text }?,
    empty
  }*
}

Appendix D. Mapping examples

A short example of a link dump serialized in BEACON text format:

#FORMAT: BEACON
#PREFIX: http://example.org/
#TARGET: http://example.com/
#NAME:   ACME document

alice||foo
bob
ada|bar

The link dump can be mapped to RDF as follows:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix hydra: <http://www.w3.org/ns/hydra/core#> .
@prefix void: <http://rdfs.org/ns/void#> .

:sourceset a void:Dataset ;
    void:uriSpace "http://example.org/" .

:targetset a void:Dataset ;
    void:uriSpace "http://example.com/" .

:dump a void:Linkset, hydra:Collection ;
    void:subjectsTarget :sourceset ;
    void:objectsTarget :targetset ;
    void:linkPredicate rdfs:seeAlso ;
    hydra:totalItems 3 ;
    void:entities 3 ;
    void:triples 4 .

<http://example.org/alice>
  rdfs:seeAlso <http://example.com/foo> .
<http://example.org/bob>
  rdfs:seeAlso <http://example.com/bob> .
<http://example.org/ada>
  rdfs:seeAlso <http://example.com/ada> .
<http://example.com/ada>
  rdfs:value "bar" .

The same link dump serialized in BEACON XML format (Appendix C):

<?xml version="1.0" encoding="UTF-8"?>
<beacon xmlns="http://purl.org/net/beacon"
        prefix="http://example.org/"
        target="http://example.com/"
        name="ACME document">
   <link source="alice" target="foo" />
   <link source="bob" />
   <link source="ada" annotation="bar" />
</beacon>

To give an extended example, the "ACME" company wants to provide links from documents to people that contributed to each document. A list of documents is available from http://example.com/documents/ and a list of people, titled "ACME staff", is available from http://example.com/people/.

This information can be expressed in a serialized link dump with BEACON meta fields as follows:

#FORMAT: BEACON
#INSTITUTION: ACME
#RELATION: http://purl.org/dc/elements/1.1/contributor
#SOURCESET: http://example.com/documents/
#TARGETSET: http://example.com/people/
#NAME: ACME staff

Both source identifiers for documents and target identifiers for people follow a pattern, so links can be abbreviated as follows:

#PREFIX: http://example.com/documents/
#TARGET: http://example.com/people/{+ID}.about

23||alice
42||bob

From this form the following links can be constructed:

http://example.com/documents/23|http://example.com/people/alice.about
http://example.com/documents/42|http://example.com/people/bob.about

The example can be extended by addition of a third element for each link. For instance the annotation could be used to specify the date of each document:

#ANNOTATION: http://purl.org/dc/elements/1.1/date

23|2017-11-28|alice
42|2017-01-31|bob

This link dump can be mapped to RDF as follows:

@prefix void:    <http://rdfs.org/ns/void#> .
@prefix hydra:   <http://www.w3.org/ns/hydra/core#> .
@prefix dcterms: <http://purl.org/dc/terms/> .
@prefix dc:      <http://purl.org/dc/elements/1.1/> .

:dump a void:Linkset, hydra:Collection ;
    void:subjectsTarget <http://example.com/documents/> ;
    void:objectsTarget <http://example.com/people/> ;
    void:linkPredicate dc:contributor ;
    hydra:totalItems 2 ;
    void:entities 2 ;
    void:triples 4 .

<http://example.com/documents/> a void:Dataset ;
    void:uriSpace "http://example.com/documents/" .

<http://example.com/people/> a void:Dataset ;
    dcterms:publisher "ACME" ;
    dcterms:title "ACME staff" ;
    void:uriSpace "http://example.com/people/" ;
    void:uriRegexPattern
      "^http://example\\.com/people/(.+)\\.about$" .

<http://example.com/documents/23>
    dc:contributor <http://example.com/people/alice.about> ;
    dc:date "2017-11-28" .

<http://example.com/documents/42>
    dc:contributor <http://example.com/people/bob.about> ;
    dc:date "2017-01-31" .

Authors' Addresses

Jakob Voß
Verbundzentrale des GBV
Platz der Göttinger Sieben 1
Göttingen, 37073
Germany
Phone: +49(551)39-10242
EMail: voss@gbv.de

Mathias Schindler
Bundestagsbüro Julia Reda, MdEP
Platz der Republik 1
Berlin, 11011
Germany
EMail: info@mathias-schindler.de
