Network Working Group J. Voß
Internet-Draft Verbundzentrale des GBV
Intended status: Informational M. Schindler
Expires: January 7, 2015 Wikimedia Deutschland e.V.
July 6, 2014

BEACON link dump format
draft-voss-beacon-001

Abstract

This document specifies BEACON, a data interchange format for large numbers of uniform links.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on January 7, 2015.

Copyright Notice

Copyright (c) 2014 IETF Trust and the persons identified as the document authors. All rights reserved.

This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.

1. Introduction

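BEACON is a data interchange format for large numbers of uniform links. A BEACON link dump consists of a set of links (Section 2.1) and a set of meta fields (Section 4).

Each link consists of a source identifier, a target identifier, and an optional annotation. Common patterns in these elements can be used to abbreviate serializations of link dumps. This specification defines a serialization format for link dumps (Section 3), meta fields to abbreviate and describe link dumps (Section 4), mappings of link dumps to RDF and HTML (Section 5), and an alternative XML serialization (Appendix B).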

The current specification is managed at https://github.com/gbv/beaconspec.

1.1. Examples

The simplest form of a BEACON file contains one link per line, given as full URLs of source and target separated by a vertical bar:

http://example.com/people/alice|http://example.com/documents/23.about
http://example.com/people/bob|http://example.com/documents/42.about

The first element of a link is called the source identifier and the second is called the target identifier. In most cases these identifiers are URLs or URIs. If a target identifier does not start with "http:" or "https:", two vertical bars MUST be used:

http://example.com/people/alice||urn:isbn:0123456789

Source and target identifier can be abbreviated with the meta fields PREFIX and TARGET, respectively. A simple BEACON file with such abbreviations can look like this:

#FORMAT: BEACON
#PREFIX: http://example.org/id/
#TARGET: http://example.com/about/

12345
6789||abc

In this example the following two links are encoded:

http://example.org/id/12345|http://example.com/about/12345
http://example.org/id/6789|http://example.com/about/abc

1.2. Notational conventions

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

The formal grammar rules in this document are to be interpreted as described in [RFC5234], including the ABNF core rules HTAB, LF, CR, and SP. In addition, the minus operator (-) is used to exclude line breaks and vertical bars from the rules LINE and TOKEN:

 LINE       =  *CHAR - ( *CHAR LINEBREAK *CHAR )

 TOKEN      =  *CHAR - ( *CHAR ( LINEBREAK / VBAR ) *CHAR )

 LINEBREAK  =  LF / CR LF / CR   ; "\n", "\r\n", or "\r"

 VBAR       =  %x7C              ; vertical bar ("|")

Samples of RDF in this document are expressed in Turtle syntax [TURTLE].

2. Basic concepts

2.1. Links

A link in a link dump is a directed connection between two resources, optionally enriched by an annotation. A link is composed of three elements: a source identifier, a target identifier, and an annotation.

All elements MUST be whitespace-normalized (Section 2.3) Unicode strings that MUST NOT contain a VBAR character. Source identifier and target identifier define where a link is pointing from and to respectively. The identifiers MUST NOT be empty strings and they SHOULD be URIs [RFC3986]. The annotation can optionally be used to further describe the link or parts of it. A missing annotation is equal to the empty string. The meaning of a link can be indicated by the RELATION meta field (Section 4.1.4).

2.2. Allowed characters

The set of allowed Unicode characters in BEACON dumps is the set of valid Unicode characters from UCS which can also be expressed in XML 1.0, excluding some discouraged control characters:

 CHAR        =  WHITESPACE / %x21-7E / %xA0-D7FF / %xE000-FFFD
             /  %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
             /  %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
             /  %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
             /  %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
             /  %xD0000-DFFFD / %xE0000-EFFFD / %xF0000-FFFFD
             /  %x100000-10FFFD

Applications SHOULD exclude disallowed characters by stripping them, by replacing them with the replacement character U+FFFD, or by refusing to process them. Applications SHOULD also apply Unicode Normalization Form C (NFC, Canonical Composition) [Unicode] to all strings.
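A minimal sketch of the character check and cleanup, assuming the "replace with U+FFFD" strategy out of the three allowed options. The names ALLOWED_RANGES, is_allowed, and clean are hypothetical. The sketch applies NFC by default; pass form="NFKC" if compatibility composition is intended.

```python
import unicodedata

# CHAR ranges from Section 2.2, minus the WHITESPACE alternatives
# (HTAB, LF, CR, SP), which are allowed separately below.
ALLOWED_RANGES = [(0x21, 0x7E), (0xA0, 0xD7FF), (0xE000, 0xFFFD)] + [
    (plane << 16, (plane << 16) | 0xFFFD) for plane in range(1, 17)
]
WHITESPACE_CHARS = {"\t", "\n", "\r", " "}


def is_allowed(ch: str) -> bool:
    """Return True if the single character ch matches the CHAR rule."""
    if ch in WHITESPACE_CHARS:
        return True
    code = ord(ch)
    return any(low <= code <= high for low, high in ALLOWED_RANGES)


def clean(text: str, form: str = "NFC") -> str:
    """Replace disallowed characters with U+FFFD, then Unicode-normalize."""
    replaced = "".join(ch if is_allowed(ch) else "\uFFFD" for ch in text)
    return unicodedata.normalize(form, replaced)
```

Control characters such as U+007F to U+009F and noncharacters such as U+FFFE fall outside every range and are therefore replaced.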

2.3. Whitespace normalization

A Unicode string is whitespace-normalized according to this specification by stripping leading and trailing whitespace and by replacing each sequence of WHITESPACE characters with a single space (SP).

 WHITESPACE  =  1*( CR / LF / SPACE )

 SPACE       =  HTAB / SP
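Whitespace normalization can be sketched with a regular expression that matches exactly the WHITESPACE rule; the function name normalize_whitespace is hypothetical.

```python
import re

# WHITESPACE = 1*( CR / LF / SPACE ) with SPACE = HTAB / SP (Section 2.3)
WHITESPACE = re.compile(r"[\r\n\t ]+")


def normalize_whitespace(value: str) -> str:
    """Collapse WHITESPACE runs to one SP and strip both ends."""
    return WHITESPACE.sub(" ", value).strip(" ")
```

Python's str.split() is deliberately not used here: it also splits on other Unicode whitespace such as U+00A0, which the WHITESPACE rule does not include.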

2.4. URI patterns

A URI pattern in this specification is a URI Template, as defined in [RFC6570], with all template expressions being either {ID} for simple string expansion or {+ID} for reserved expansion.

A URI pattern is used to construct a URI by replacing all template expressions with an identifier value. Identifier characters in the unreserved set of [RFC3986] are always copied literally; for {+ID} expressions, characters in the reserved set and character sequences matching the pct-encoded rule are copied literally as well. All other characters are copied to the URI as the sequence of pct-encoded triplets corresponding to that character’s encoding in UTF-8 [RFC3629]. The referenced character ranges are imported here from [RFC3986] for convenience:

 pct-encoded    =  "%" HEXDIG HEXDIG
 unreserved     =  ALPHA / DIGIT / "-" / "." / "_" / "~"
 reserved       =  gen-delims / sub-delims
 gen-delims     =  ":" / "/" / "?" / "#" / "[" / "]" / "@"
 sub-delims     =  "!" / "$" / "&" / "'" / "(" / ")"
                /  "*" / "+" / "," / ";" / "="

A URI pattern is allowed to contain the broader set of characters allowed in Internationalized Resource Identifiers (IRI) [RFC3987]. The URI constructed from a URI pattern by template processing can be transformed to an IRI by following the process defined in Section 3.2 of [RFC3987].

 Example value    Expression   Copied as

  path/dir          {ID}        path%2Fdir
  path/dir          {+ID}       path/dir
  Hello World!      {ID}        Hello%20World%21
  Hello World!      {+ID}       Hello%20World!
  Hello%20World     {ID}        Hello%2520World
  Hello%20World     {+ID}       Hello%20World
  M%C3%BCller       {ID}        M%25C3%25BCller
  M%C3%BCller       {+ID}       M%C3%BCller
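The expansion rules can be sketched with Python's urllib.parse.quote, which percent-encodes UTF-8 bytes and never encodes ALPHA, DIGIT, "-", ".", "_", or "~". The helper names encode_id and expand are hypothetical; the sketch reproduces every row of the table above.

```python
import re
from urllib.parse import quote

# reserved = gen-delims / sub-delims (RFC 3986); kept literally by {+ID}
RESERVED = ":/?#[]@!$&'()*+,;="
PCT_ENCODED = re.compile(r"%[0-9A-Fa-f]{2}")
EXPRESSION = re.compile(r"\{(\+?)ID\}")


def encode_id(value: str, reserved_expansion: bool) -> str:
    """Percent-encode an identifier for {ID} or {+ID} expansion."""
    if not reserved_expansion:
        # With safe="", only unreserved characters stay unencoded.
        return quote(value, safe="")
    # {+ID}: keep reserved characters and existing pct-encoded triplets;
    # a lone "%" is still encoded as "%25".
    parts, pos = [], 0
    for match in PCT_ENCODED.finditer(value):
        parts.append(quote(value[pos:match.start()], safe=RESERVED))
        parts.append(match.group(0))
        pos = match.end()
    parts.append(quote(value[pos:], safe=RESERVED))
    return "".join(parts)


def expand(pattern: str, value: str) -> str:
    """Replace every {ID} and {+ID} expression in pattern with value."""
    return EXPRESSION.sub(lambda m: encode_id(value, m.group(1) == "+"), pattern)
```

For instance, expand("http://example.com/documents/{+ID}.about", "23") yields "http://example.com/documents/23.about".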

3. BEACON format

A BEACON file is a UTF-8 encoded Unicode file [RFC3629]. The file MAY begin with a Unicode Byte Order Mark and it SHOULD end with a line break. The first line of a BEACON file SHOULD include the meta field FORMAT set to BEACON ("#FORMAT: BEACON"). The rest of the file consists of a (possibly empty) set of lines that express meta fields (Section 4), followed by a set of lines with link tokens from which links are constructed (Section 3.1). At least one empty line SHOULD be used to separate meta lines and link lines. If no empty line is given, the first link line MUST NOT begin with "#".

 BEACONFILE  =  [ %xEF.BB.BF ]        ; Unicode UTF-8 Byte Order Mark
                [ "#FORMAT" SEPARATOR "BEACON" *SPACE LINEBREAK ]
                *( METALINE LINEBREAK )
                *( *SPACE LINEBREAK ) ; empty lines
                 LINKLINE *( LINEBREAK LINKLINE )
                [ LINEBREAK ]

The order of meta lines and of link lines, respectively, is irrelevant.

A meta line specifies a meta field (Section 4) and its value, separated by a colon and/or tab or space characters:

 METALINE    =  "#" METAFIELD SEPARATOR METAVALUE

 SEPARATOR   =  ":" *SPACE / 1*SPACE

 METAFIELD   =  1*( %x41-5A )   ;  "A" to "Z"

 METAVALUE   =  LINE
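A minimal sketch of splitting decoded BEACON text into meta fields and link lines, following the BEACONFILE and METALINE rules: meta lines come first, and an empty line or a line not starting with "#" ends them. The function name read_beacon is hypothetical; repeatable fields are collected as lists, lines starting with "#" that do not match METALINE are skipped, and whitespace normalization (Section 2.3) is left to the caller.

```python
import re

# METALINE = "#" METAFIELD SEPARATOR METAVALUE (Section 3), with
# SEPARATOR = ":" *SPACE / 1*SPACE and METAFIELD = 1*( "A"-"Z" )
METALINE = re.compile(r"#([A-Z]+)(?::[\t ]*|[\t ]+)(.*)")
LINEBREAK = re.compile(r"\r\n|\r|\n")


def read_beacon(text: str):
    """Split decoded BEACON text into meta fields and link lines."""
    if text.startswith("\ufeff"):  # byte order mark after UTF-8 decoding
        text = text[1:]
    meta, link_lines = {}, []
    in_meta = True
    for line in LINEBREAK.split(text):
        if in_meta and line.startswith("#"):
            match = METALINE.fullmatch(line)
            if match:  # other "#" lines in the meta section are skipped
                field, value = match.groups()
                meta.setdefault(field, []).append(value)
            continue
        in_meta = False  # an empty line or a non-"#" line ends the meta lines
        if line.strip(" \t"):
            link_lines.append(line)
    return meta, link_lines
```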

Each link is given on a link line with its source token, optionally followed by an annotation token and/or a target token:

 LINKLINE    =  SOURCE /
                SOURCE VBAR TARGET /   ; if TARGET is http: or https:
                SOURCE VBAR ANNOTATION /
                SOURCE VBAR ANNOTATION VBAR TARGET

 SOURCE      =  TOKEN

 TARGET      =  TOKEN

 ANNOTATION  =  TOKEN

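The ambiguity of rule LINKLINE with one occurrence of VBAR is resolved as follows: if the token after the VBAR begins with "http:" or "https:", it is a TARGET token; otherwise it is an ANNOTATION token.

This way, links to HTTP URIs can be encoded in two forms (given that the TARGET and MESSAGE meta fields have their default values):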

foo|http://example.org/foobar
foo||http://example.org/foobar
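Tokenizing a link line can be sketched as follows, resolving the single-VBAR case by checking whether the second token starts with "http:" or "https:", as the comment in rule LINKLINE indicates. The function name split_link_line and the convention of returning "" for a missing token are assumptions of this sketch.

```python
import re


def _normalize(token: str) -> str:
    # Link lines contain no line breaks, so only HTAB and SP are collapsed.
    return re.sub(r"[\t ]+", " ", token).strip(" ")


def split_link_line(line: str):
    """Return (source, annotation, target) tokens; "" marks a missing token."""
    tokens = [_normalize(token) for token in line.split("|")]
    if len(tokens) == 1:
        return tokens[0], "", ""
    if len(tokens) == 2:
        # One VBAR: the second token is a target only if it starts with
        # "http:" or "https:"; otherwise it is an annotation.
        if tokens[1].startswith(("http:", "https:")):
            return tokens[0], "", tokens[1]
        return tokens[0], tokens[1], ""
    if len(tokens) == 3:
        return tokens[0], tokens[1], tokens[2]
    raise ValueError(f"more than two vertical bars: {line!r}")
```

Both example lines above yield ("foo", "", "http://example.org/foobar").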

3.1. Link construction

Link elements in BEACON format are given in the abbreviated form of link tokens. Each link is constructed from a source token, an optional annotation token, and an optional target token.

All tokens MUST be whitespace-normalized before further processing.

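Construction rules are based on the values of the link construction meta fields (Section 4.1). A link is constructed as follows: the source identifier is the PREFIX URI pattern expanded with the source token; the target identifier is the TARGET URI pattern expanded with the target token or, if no target token is given, with the source token; the annotation is the annotation token or, if no annotation token is given, the value of the MESSAGE meta field.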

The following table illustrates construction of a link:

 meta field  +  link token  -->  link element
---------------------------------------------------
 prefix      |  source       |   source identifier
 target      |  target       |   target identifier
 message     |  annotation   |   annotation
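The construction table can be sketched as a single function. It includes a compact URI pattern expansion (Section 2.4) and the PREFIX/TARGET default rules of Sections 4.1.1 and 4.1.2 so that it runs on its own. The names construct_link, pattern_or_default, and target_pattern are hypothetical, and an empty annotation token is assumed to count as missing.

```python
import re
from urllib.parse import quote

RESERVED = ":/?#[]@!$&'()*+,;="
PCT_ENCODED = re.compile(r"%[0-9A-Fa-f]{2}")
EXPRESSION = re.compile(r"\{(\+?)ID\}")


def encode_id(value: str, plus: bool) -> str:
    # {ID}: encode all but unreserved; {+ID}: also keep reserved and %XX
    if not plus:
        return quote(value, safe="")
    pieces = PCT_ENCODED.split(value)
    triplets = PCT_ENCODED.findall(value)
    out = [quote(pieces[0], safe=RESERVED)]
    for triplet, piece in zip(triplets, pieces[1:]):
        out += [triplet, quote(piece, safe=RESERVED)]
    return "".join(out)


def pattern_or_default(pattern: str) -> str:
    # Empty means {+ID}; a pattern without expression gets {ID} appended.
    if pattern == "":
        return "{+ID}"
    return pattern if EXPRESSION.search(pattern) else pattern + "{ID}"


def expand(pattern: str, value: str) -> str:
    return EXPRESSION.sub(lambda m: encode_id(value, m.group(1) == "+"),
                          pattern_or_default(pattern))


def construct_link(source, annotation="", target="",
                   prefix="", target_pattern="", message=""):
    """Build (source identifier, target identifier, annotation) from tokens."""
    source_id = expand(prefix, source)
    target_id = expand(target_pattern, target or source)  # default: source token
    return source_id, target_id, annotation or message   # default: MESSAGE
```

With PREFIX "http://example.org/", TARGET "http://example.com/", and MESSAGE "Hello World!", the token "foo" and the fully written link "http://example.org/foo|Hello World!|http://example.com/foo" both produce the same link.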

Constructed source identifier and target identifier SHOULD be syntactically valid URIs. Applications MAY ignore links with invalid URIs and SHOULD give a warning.

Applications MUST NOT differentiate between equal links constructed from different abbreviations. For instance the following BEACON file contains a single link:

 #PREFIX: http://example.org/
 #TARGET: http://example.com/
 #MESSAGE: Hello World!

 foo

The same link could also be serialized without any meta fields:

 http://example.org/foo|Hello World!|http://example.com/foo

The default values of the PREFIX and TARGET meta fields could also be given explicitly:

 #PREFIX: {+ID}
 #TARGET: {+ID}

Multiple occurrences of equal links in one BEACON file SHOULD be ignored. It is RECOMMENDED to indicate duplicated links with a warning.

3.2. MIME type

The RECOMMENDED MIME type of BEACON files is "text/plain". The file extension .txt SHOULD be used when storing BEACON files.

4. Meta fields

A link dump SHOULD contain a set of meta fields, each identified by a name built from uppercase letters A-Z. Meta fields relevant for link construction (Section 4.1), for description of the link dump (Section 4.2), and for description of the source dataset and target dataset (Section 4.3) are defined in the following. Additional meta fields, not defined in this specification, SHOULD be ignored. All meta field values MUST be whitespace-normalized. Missing meta field values and empty strings MUST be set to the field’s default value, which is the empty string unless noted otherwise. The following diagram shows which meta fields belong to which dataset. Repeatable fields are marked with a plus character (+):

                    +-----------------------+
                    | link dump             |
                    |                       |
                    |  * DESCRIPTION+       |
                    |  * CREATOR+           |
                    |  * CONTACT+           |
                    |  * HOMEPAGE+          |
                    |  * FEED+              |
                    |  * TIMESTAMP+         |
                    |  * UPDATE             |
                    |                       |
                    | +-------------------+ |
                    | | link construction | |
                    | |                   | |   +-----------------+
+----------------+  | |  * PREFIX         | |   | target dataset  |
| source dataset | ---|  * TARGET         |---> |                 |
|                |  | |  * RELATION       | |   |  * TARGETSET    |
|                | ---|  * MESSAGE        |---> |  * NAME+        |
|  * SOURCESET   |  | |  * ANNOTATION     | |   |  * INSTITUTION+ |
|                | ---|                   |---> |                 |
+----------------+  | +-------------------+ |   +-----------------+
                    +-----------------------+

Examples of meta fields are included in Section 5.1.

4.1. Link construction meta fields

Link construction meta fields define how to construct links from link tokens (Section 3.1). See Section 5.1.3 for examples.

4.1.1. PREFIX

The PREFIX meta field specifies a URI pattern to construct source identifiers. If this field is not specified or set to the empty string, the default value {+ID} is used. If the field value contains no template expression, the expression {ID} is appended. The name PREFIX was chosen to keep backwards compatibility with existing BEACON files.

4.1.2. TARGET

The TARGET meta field specifies a URI pattern to construct target identifiers. If this field is not specified or set to the empty string, the default value {+ID} is used. If the field value contains no template expression, the expression {ID} is appended.

4.1.3. MESSAGE

The MESSAGE meta field is used to specify a default value for link annotations.

4.1.4. RELATION

All links in a link dump share a common relation type, specified by the RELATION meta field. The default relation type is rdfs:seeAlso, but applications not interested in mapping to RDF can ignore this meta field. A relation type MUST be either a URI or a registered link type from the IANA link relations registry [RFC5988].

4.1.5. ANNOTATION

The ANNOTATION meta field can be used to specify the meaning of link annotations in a link dump. The field value MUST be a URI.

4.2. Link dump meta fields

Link dump meta fields describe the link dump as a whole. See Section 5.1.4 for examples.

4.2.1. DESCRIPTION

The DESCRIPTION meta field contains a human readable description of the link dump.

4.2.2. CREATOR

The CREATOR meta field contains the URI or the name of the person, organization, or a service primarily responsible for making the link dump. The field SHOULD NOT contain a simple URL, unless this URL is also used as URI.

4.2.3. CONTACT

The CONTACT meta field contains an email address or similar contact information to reach the creator of the link dump. The field value SHOULD be a mailbox address as specified in Section 3.4 of [RFC5322].

4.2.4. HOMEPAGE

The HOMEPAGE meta field contains a URL of a website with additional information about this link dump. Note that this field does not specify the homepage of the target dataset.

4.2.5. FEED

The FEED meta field contains a URL from which the link dump can be downloaded.

4.2.6. TIMESTAMP

The TIMESTAMP meta field contains the date of last modification of the link dump. Note that this value MAY differ from the last modification time of a BEACON file that serializes the link dump. The timestamp value MUST conform to the full-date or to the date-time production rule in [RFC3339]. In addition, an uppercase T character MUST be used to separate date and time, and an uppercase Z character MUST be present in the absence of a numeric time zone offset.
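A validation sketch, assuming a regular expression transcription of the RFC 3339 full-date and date-time rules restricted to uppercase "T" and "Z", plus a calendar check. The name is_valid_timestamp is hypothetical; note that Python's date type cannot represent year 0000, which RFC 3339 permits.

```python
import re
from datetime import date

# full-date / date-time from RFC 3339, restricted to uppercase "T" and "Z"
FULL_DATE = r"([0-9]{4})-([0-9]{2})-([0-9]{2})"
FULL_TIME = (r"([01][0-9]|2[0-3]):[0-5][0-9]:([0-5][0-9]|60)(\.[0-9]+)?"
             r"(Z|[+-]([01][0-9]|2[0-3]):[0-5][0-9])")
TIMESTAMP = re.compile(FULL_DATE + r"(T" + FULL_TIME + r")?")


def is_valid_timestamp(value: str) -> bool:
    """Check a TIMESTAMP value against the rules of Section 4.2.6."""
    match = TIMESTAMP.fullmatch(value)
    if match is None:
        return False
    try:
        # Reject impossible calendar dates such as 2012-02-30.
        date(int(match.group(1)), int(match.group(2)), int(match.group(3)))
    except ValueError:
        return False
    return True
```

All three timestamps listed in Section 5.1.4 pass; "2012-05-30t13:17:36z" and "2012-05-30T13:17:36" (no offset and no Z) are rejected.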

4.2.7. UPDATE

The UPDATE meta field specifies how frequently the link dump is likely to change. The field corresponds to the <changefreq> element in Sitemaps XML format [Sitemaps]. Valid values are "always", "hourly", "daily", "weekly", "monthly", "yearly", and "never".

The value "always" SHOULD be used to describe link dumps that change each time they are accessed. The value "never" SHOULD be used to describe archived link dumps. Please note that the value of this field is considered a hint and not a command.

4.3. Dataset meta fields

The set that all source identifiers in a link dump originate from is called the source dataset and the set that all target identifiers originate from is called the target dataset. Dataset meta fields contain properties of the source dataset or target dataset, respectively. See Section 5.1.5 for examples of these meta fields.

4.3.1. SOURCESET

The source dataset can be identified by the SOURCESET meta field, which MUST be a URI if given.

4.3.2. TARGETSET

The target dataset can be identified by the TARGETSET meta field, which MUST be a URI if given.

4.3.3. NAME

The NAME meta field contains a name or title of the target dataset.

4.3.4. INSTITUTION

The INSTITUTION meta field contains the name or HTTP URI of the organization or of an individual responsible for making available the target dataset.

5. Mappings

An important use case of BEACON is the creation of HTML links as described in Section 5.2. A link dump can also be mapped to an RDF graph (Section 5.1), so BEACON provides an RDF serialization format for a subset of RDF graphs with uniform links.

5.1. Mapping to RDF

The following namespace prefixes are used to refer to RDF properties and classes from the RDF and RDFS vocabularies [RDF], the DCMI Metadata Terms [DCTERMS], the Dublin Core Metadata Element Set [RFC5013], the FOAF vocabulary [FOAF], the VoID vocabulary [VOID], and the RSS 1.0 Syndication Module [RSSSYND]:

 rdf:     <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
 rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
 dc:      <http://purl.org/dc/elements/1.1/>
 dcterms: <http://purl.org/dc/terms/>
 foaf:    <http://xmlns.com/foaf/0.1/>
 void:    <http://rdfs.org/ns/void#>
 rssynd:  <http://web.resource.org/rss/1.0/modules/syndication/>

The blank node :dump denotes the URI of the link dump, the blank node :sourceset denotes the URI of the source dataset, and the blank node :targetset denotes the URI of the target dataset.

Note that literal values with language tags or datatypes are not supported when mapping BEACON to RDF.

5.1.1. Mapping links to RDF

Links with syntactically valid URIs as source and target identifiers can be mapped to at least one RDF triple with the source identifier as subject, the relation type (Section 4.1.4) as predicate, and the target identifier as object.

As RDF is not defined on URIs but on URI references or IRIs, all URIs MUST be transformed to IRIs by following the process defined in Section 3.2 of [RFC3987]. Applications MAY refuse to map link dumps whose relation type is a registered link type from the IANA link relations registry, as such types lack official URIs. Another valid solution is to extend the RDF model by using blank nodes as predicates.

Links with non-URI source and/or target identifiers are allowed but NOT RECOMMENDED. Such links cannot be mapped to RDF.

5.1.2. Mapping link annotations to RDF

Each link annotation SHOULD result in an additional RDF triple, unless its value is the empty string. The additional triple has the target identifier as subject, the value of the ANNOTATION meta field (Section 4.1.5) as predicate, and the annotation as literal object.

Applications MAY use a predefined URI as ANNOTATION or process the link annotation by other means. For instance, annotations could contain additional information about a link, such as its provenance, date, or probability (reification).

Typical use cases of annotations include specification of labels and a "number of hits" at the target dataset. For instance the following file in BEACON format (Section 3):

 #PREFIX: http://example.org/
 #TARGET: http://example.com/ 
 #RELATION: http://xmlns.com/foaf/0.1/primaryTopic
 #ANNOTATION: http://purl.org/dc/terms/extent

 abc|12|xy

is mapped to the following RDF triples:

 <http://example.org/abc> foaf:primaryTopic <http://example.com/xy> .
 <http://example.com/xy> dcterms:extent "12" .
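A sketch of this mapping emitting N-Triples lines, assuming the relation type is already a full URI (registered link types such as "describedby" would need separate handling) and that no annotation triple is produced when no annotation property is known. The names link_to_ntriples and literal are hypothetical.

```python
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"


def literal(text: str) -> str:
    """Quote text as an N-Triples string literal."""
    escaped = (text.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def link_to_ntriples(source_id, target_id, annotation="",
                     relation=RDFS_SEE_ALSO, annotation_property=None):
    """Map one link to N-Triples lines (Sections 5.1.1 and 5.1.2)."""
    lines = [f"<{source_id}> <{relation}> <{target_id}> ."]
    # Hypothetical choice: without an annotation property, no extra triple.
    if annotation and annotation_property:
        lines.append(f"<{target_id}> <{annotation_property}> {literal(annotation)} .")
    return lines
```

Applied to the link "abc|12|xy" above, this produces the same two triples with full URIs in place of the foaf: and dcterms: prefixes.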

5.1.3. Mapping link construction meta fields to RDF

Link construction meta fields (Section 4.1) are primarily required for link construction (Section 3.1). Some of these fields can further be mapped to RDF triples.

The PREFIX meta field (Section 4.1.1) MAY be mapped to the RDF property void:uriSpace or void:uriRegexPattern with :sourceset as RDF subject.

The TARGET meta field (Section 4.1.2) MAY be mapped to the RDF property void:uriSpace or void:uriRegexPattern with :targetset as RDF subject.

The RELATION meta field (Section 4.1.4) is mapped to the RDF property void:linkPredicate with :dump as RDF subject, if its value is a URI. Registered link types without a URI, such as "describedby" and "replies" below, are not mapped. Some examples of relation types and their mapping to RDF triples:

 #RELATION: http://www.w3.org/2002/07/owl#sameAs
 #RELATION: http://xmlns.com/foaf/0.1/isPrimaryTopicOf
 #RELATION: http://purl.org/spar/cito/cites
 #RELATION: describedby
 #RELATION: replies

 :dump void:linkPredicate <http://www.w3.org/2002/07/owl#sameAs> .
 :dump void:linkPredicate foaf:isPrimaryTopicOf .
 :dump void:linkPredicate <http://purl.org/spar/cito/cites> .

The ANNOTATION meta field (Section 4.1.5), if given, contains an RDF property for RDF triples between link target and link annotation. To give an example, the following BEACON file

#ANNOTATION: http://purl.org/dc/elements/1.1/format

http://example.org/apples|sphere|http://example.org/oranges

implies the following RDF triple

<http://example.org/oranges> dc:format "sphere" .

5.1.4. Mapping link dump meta fields to RDF

Link dump meta fields (Section 4.2) describe properties of the link dump, referred to as blank node :dump in the following. The following RDF triples are always assumed when mapping link dumps to RDF:

 :dump a void:Linkset ; 
     void:subjectsTarget :sourceset ;
     void:objectsTarget :targetset .

The DESCRIPTION meta field is mapped to the dcterms:description RDF property. For instance

#DESCRIPTION: Mapping from ids to documents

can be mapped to

:dump dcterms:description "Mapping from ids to documents" .

The CREATOR meta field is mapped to the dcterms:creator RDF property. The creator is an instance of the class foaf:Agent. For instance

#CREATOR: Bea Beacon

and

#CREATOR: http://example.org/people/bea

can be mapped to the following RDF triples, respectively (a plain name can be mapped either to a literal or to an agent with foaf:name):

:dump dcterms:creator "Bea Beacon" .
:dump dcterms:creator [ a foaf:Agent ; foaf:name "Bea Beacon" ] .

:dump dcterms:creator <http://example.org/people/bea> .
<http://example.org/people/bea> a foaf:Agent .

The CONTACT meta field (Section 4.2.3) is mapped to the foaf:mbox and to the foaf:name RDF properties. For instance

 #CONTACT: admin@example.com

can be mapped to

 :dump dcterms:creator [
     foaf:mbox <mailto:admin@example.com>
 ] .

and

 #CONTACT: Bea Beacon <bea@example.org>

can be mapped to

 :dump dcterms:creator [
     foaf:name "Bea Beacon" ;
     foaf:mbox <mailto:bea@example.org>
 ] .
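A sketch of the CONTACT mapping that uses Python's email.utils.parseaddr to split a mailbox into display name and address. The function name contact_to_turtle is hypothetical, as is the fallback that treats a value without "@" as a plain name.

```python
from email.utils import parseaddr


def contact_to_turtle(contact: str) -> str:
    """Map a CONTACT value to the Turtle shape used in Section 5.1.4."""
    name, address = parseaddr(contact)
    if "@" not in address:
        # Hypothetical fallback: treat a value without mailbox as a name.
        name, address = contact, ""
    props = []
    if name:
        props.append('foaf:name "%s"' % name.replace('"', '\\"'))
    if address:
        props.append(f"foaf:mbox <mailto:{address}>")
    return ":dump dcterms:creator [ " + " ; ".join(props) + " ] ."
```

Both examples above are reproduced on a single line each, e.g. ':dump dcterms:creator [ foaf:name "Bea Beacon" ; foaf:mbox <mailto:bea@example.org> ] .'.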

The HOMEPAGE meta field (Section 4.2.4) is mapped to the foaf:homepage RDF property. For instance

#HOMEPAGE: http://example.org/about.html

can be mapped to

:dump foaf:homepage <http://example.org/about.html> .

The FEED meta field (Section 4.2.5) corresponds to the void:dataDump RDF property. For instance

#FEED: http://example.com/beacon.txt

can be mapped to

:dump void:dataDump <http://example.com/beacon.txt> .

The TIMESTAMP meta field (Section 4.2.6) corresponds to the dcterms:modified RDF property. For instance, the following valid timestamps

 #TIMESTAMP: 2012-05-30
 #TIMESTAMP: 2012-05-30T15:17:36+02:00
 #TIMESTAMP: 2012-05-30T13:17:36Z

can be mapped to the following RDF triples, respectively:

 :dump dcterms:modified "2012-05-30" .
 :dump dcterms:modified "2012-05-30T15:17:36+02:00" .
 :dump dcterms:modified "2012-05-30T13:17:36Z" .

The UPDATE meta field (Section 4.2.7) corresponds to the rssynd:updatePeriod RDF property. For instance this field

#UPDATE: daily

specifies a daily update, expressible in RDF as

:dump rssynd:updatePeriod "daily" .

5.1.5. Mapping dataset meta fields to RDF

Dataset meta fields (Section 4.3) are mapped to subjects and objects of RDF triples to describe the source dataset and target dataset, respectively.

The following triples are always assumed in mappings of link dumps to RDF:

 :sourceset a void:Dataset .
 :targetset a void:Dataset .

The SOURCESET meta field (Section 4.3.1) replaces the blank node :sourceset, if given.

The TARGETSET meta field (Section 4.3.2) replaces the blank node :targetset, if given.

The NAME meta field (Section 4.3.3) is mapped to the RDF property dcterms:title with :targetset as RDF subject. For instance the field value "ACME documents", expressible in BEACON format as

#NAME: ACME documents

can be mapped to this RDF triple:

:targetset dcterms:title "ACME documents" .

The INSTITUTION meta field (Section 4.3.4) is mapped to the RDF property dcterms:publisher with :targetset as RDF subject. For instance the field value "ACME", expressible in BEACON format as

#INSTITUTION: ACME

can be mapped to this RDF triple:

:targetset dcterms:publisher "ACME" .

A field value starting with http:// or https:// is interpreted as a URI instead of a string. For instance

#INSTITUTION: http://example.org/acme/

can be mapped to this RDF triple:

:targetset dcterms:publisher <http://example.org/acme/> .

5.2. Mapping to HTML

This document does not specify a single mapping of links in a BEACON link dump to links in an HTML document, so the following description is non-normative.

A link in a BEACON dump can be mapped to an HTML link (<a> element) with the target identifier as value of the href attribute and the annotation as link text.

For instance the following link, given in a BEACON file:

 http://example.com|example|http://example.org

can be mapped to the following HTML link:

 <a href="http://example.org">example</a>
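A minimal sketch of this non-normative mapping that escapes all BEACON content with Python's html.escape, in line with Section 6. The function name to_html_link is hypothetical, as are two policy choices: an empty annotation falls back to the target identifier as link text, and targets that are not HTTP(S) URIs are rendered as plain text so that values such as "javascript:" URIs never reach an href.

```python
from html import escape


def to_html_link(target_id: str, annotation: str) -> str:
    """Render one link as an <a> element, escaping all BEACON content."""
    label = escape(annotation or target_id)  # hypothetical: fall back to URI
    if not target_id.startswith(("http://", "https://")):
        return label  # assumption: never emit non-HTTP(S) hrefs
    return f'<a href="{escape(target_id, quote=True)}">{label}</a>'
```

to_html_link("http://example.org", "example") reproduces the HTML link above.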

Note that the annotation field value may be the empty string. In practice, additional meta fields SHOULD be used to construct appropriate HTML links. For instance the meta fields

 #RELATION: http://xmlns.com/foaf/0.1/isPrimaryTopicOf
 #SOURCETYPE: http://xmlns.com/foaf/0.1/Person 
 #NAME: ACME documents

can be used to create a link such as

 <span>
   More information about this person
   <a href="http://example.com/foo">at ACME documents</a>.
 </span>  

because foaf:isPrimaryTopicOf translates to "more information about", foaf:Person translates to "this person", and the target dataset’s name can be used as link label.

6. Security Considerations

Programs should be prepared for malformed and malicious content when parsing BEACON files, when constructing links from link tokens, and when mapping links to RDF or HTML. Possible attacks on parsers include broken UTF-8 and buffer overflows. Link construction can result in unexpectedly long strings and in character sequences that may appear harmless when their parts are analyzed separately. Most notably, BEACON data may contain HTML and JavaScript code intended for cross-site scripting attacks on sites that display BEACON links. Applications should therefore escape or filter all content with established libraries, such as Apache Escape Utils.

7. References

7.1. Normative References

[RFC2119] Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, March 1997.
[RFC3339] Klyne, G. and C. Newman, "Date and Time on the Internet: Timestamps", RFC 3339, July 2002.
[RFC3629] Yergeau, F., "UTF-8, a transformation format of ISO 10646", STD 63, RFC 3629, November 2003.
[RFC3986] Berners-Lee, T., Fielding, R. and L. Masinter, "Uniform Resource Identifier (URI): Generic Syntax", STD 66, RFC 3986, January 2005.
[RFC3987] Duerst, M. and M. Suignard, "Internationalized Resource Identifiers (IRIs)", RFC 3987, January 2005.
[RFC5234] Crocker, D. and P. Overell, "Augmented BNF for Syntax Specifications: ABNF", STD 68, RFC 5234, January 2008.
[RFC5322] Resnick, P., "Internet Message Format", RFC 5322, October 2008.
[RFC5988] Nottingham, M., "Web Linking", RFC 5988, October 2010.
[RFC6570] Gregorio, J., Fielding, R., Hadley, M., Nottingham, M. and D. Orchard, "URI Template", RFC 6570, March 2012.
[Unicode] The Unicode Consortium, "The Unicode Standard, Version 6.1", April 2012.
[DCTERMS] DCMI Usage Board, "DCMI Metadata Terms", Oct 2010.

7.2. Informative References

[FOAF] Brickley, D. and L. Miller, "FOAF Vocabulary Specification", Aug 2010.
[RFC5013] Kunze, J. and T. Baker, "The Dublin Core Metadata Element Set", RFC 5013, August 2007.
[Sitemaps] Google Inc., "Sitemaps XML format", February 2008.
[SAX] Megginson, D., "SAX 1.0: The Simple API for XML", May 1998.
[TURTLE] Beckett, D. and T. Berners-Lee, "Turtle - Terse RDF Triple Language", Mar 2011.
[RELAX-NGC] Clark, J., "RELAX NG Compact Syntax", Nov 2002.
[RSSSYND] RSS-DEV Working Group, "RDF Site Summary 1.0 Modules: Syndication", Dec 2000.
[RDF] Hayes, P., "RDF Semantics", Feb 2004.
[VOID] Cyganiak, R., Zhao, J., Alexander, K. and M. Hausenblas, "Vocabulary of Interlinked Datasets (VoID)", Mar 2011.

Appendix A. Glossary

annotation
an additional description of a link given as Unicode string (the empty string, if missing).
BEACON
a data interchange format as specified in this document.
BEACON file
a link dump serialized in BEACON format.
BEACON format
a condensed format to serialize link dumps as specified in this document.
link
a triple of source identifier, target identifier, and (optional) annotation. In BEACON format, links are constructed from link tokens.
link dump
a set of links and meta fields with common relation type for all links.
link token
a Unicode string in BEACON format used to construct a link.
meta field
a property to describe a link dump, a source dataset, a target dataset, or how to construct links from BEACON format.
source identifier
identifier where a link points from.
target identifier
identifier where a link points to.
source dataset
the set (or superset) of all source URIs in a link dump.
target dataset
the set (or superset) of all target URIs in a link dump.
relation type
a common type of connection from source identifiers to target identifiers in a link dump.

Appendix B. BEACON XML format

A BEACON XML file is a valid XML file conforming to the following schema. The file SHOULD be encoded in UTF-8 [RFC3629]. The file MUST:

The file MAY further:

All attributes MUST be given in lowercase.

To process BEACON XML files, a complete, stream-processing XML parser, for instance one implementing the Simple API for XML [SAX], is RECOMMENDED instead of parsing with regular expressions or similar error-prone methods. Additional XML attributes of <link> elements, and <link> elements without a source attribute, SHOULD be ignored.

Note that, in contrast to BEACON text files, link tokens MAY include line breaks, which MUST be removed by whitespace normalization. Furthermore, the source, annotation, and target attribute values MAY include a vertical bar, which MUST be replaced by the character sequence %7C before further processing.
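A streaming reader can be sketched with Python's xml.etree.ElementTree.iterparse, applying the rules above: <link> elements without a source attribute and unknown attributes are ignored, values are whitespace-normalized, and vertical bars become %7C. The function name read_beacon_xml and the (source, annotation, target) tuple layout are assumptions of this sketch, not part of the format.

```python
import io
import re
import xml.etree.ElementTree as ET

NS = "{http://purl.org/net/beacon}"


def _normalize(value: str) -> str:
    return re.sub(r"[\r\n\t ]+", " ", value).strip(" ")


def read_beacon_xml(stream):
    """Stream a BEACON XML file; return (meta attributes, link tokens)."""
    meta, links = {}, []
    for event, elem in ET.iterparse(stream, events=("start", "end")):
        if event == "start" and elem.tag == NS + "beacon":
            meta = {key: _normalize(val) for key, val in elem.attrib.items()}
        elif event == "end" and elem.tag == NS + "link":
            if elem.get("source") is not None:  # links without source are ignored
                links.append(tuple(
                    _normalize(elem.get(name, "")).replace("|", "%7C")
                    for name in ("source", "annotation", "target")))
            elem.clear()  # free memory while streaming
    return meta, links


if __name__ == "__main__":
    sample = (b'<beacon xmlns="http://purl.org/net/beacon" '
              b'prefix="http://example.org/"><link source="a|b" /></beacon>')
    print(read_beacon_xml(io.BytesIO(sample)))
```

Meta attributes are returned in lowercase, as they appear in the XML; the resulting link tokens can then go through the same link construction as BEACON text files.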

A schema of BEACON XML format in RELAX NG Compact syntax [RELAX-NGC] can be given as following:

default namespace = "http://purl.org/net/beacon"

element beacon {
  attribute prefix      { text }?,
  attribute target      { text }?,
  attribute message     { text }?,
  attribute source      { text }?,
  attribute name        { text }?,
  attribute institution { text }?,
  attribute description { text }?,
  attribute creator     { text }?,
  attribute contact     { text }?,
  attribute homepage    { xsd:anyURI }?,
  attribute feed        { xsd:anyURI }?,
  attribute timestamp   { text }?,
  attribute update { "always" | "hourly" | "daily"
    | "weekly" | "monthly" | "yearly" | "never" }?,
  attribute relation    { xsd:anyURI }?,
  attribute annotation  { xsd:anyURI }?,
  element link {
    attribute source     { text },
    attribute target     { text }?,
    attribute annotation { text }?,
    empty
  }*
}

Appendix C. Mapping example

A short example of a link dump serialized in BEACON text format:

#FORMAT: BEACON
#PREFIX: http://example.org/
#TARGET: http://example.com/
#NAME:   ACME document

alice||foo
bob
ada|bar

The link dump can be mapped to RDF as follows:

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix void: <http://rdfs.org/ns/void#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

:sourceset a void:Dataset ;
    void:uriSpace "http://example.org/" .

:targetset a void:Dataset ;
    void:uriSpace "http://example.com/" ;
    dcterms:title "ACME document" .

:dump a void:Linkset ;
    void:subjectsTarget :sourceset ;
    void:objectsTarget :targetset ;
    void:linkPredicate rdfs:seeAlso .

<http://example.org/alice> 
  rdfs:seeAlso <http://example.com/foo> . 
<http://example.org/bob> 
  rdfs:seeAlso <http://example.com/bob> . 
<http://example.org/ada> 
  rdfs:seeAlso <http://example.com/ada> . 
<http://example.com/ada> 
  rdfs:value "bar" .

The same link dump serialized in BEACON XML format:

<?xml version="1.0" encoding="UTF-8"?>
<beacon xmlns="http://purl.org/net/beacon" 
        prefix="http://example.org/"
        target="http://example.com/"
        name="ACME document">
   <link source="alice" target="foo" />
   <link source="bob" />
   <link source="ada" annotation="bar" />
</beacon>

Appendix D. Extended example

To give an extended example, the "ACME" company wants to provide links from people to documents that each person contributed to (a "contributor" relationship in terms of Dublin Core). A list of all people is available from http://example.com/people/ and a list of all documents, titled "ACME documents", is available from http://example.com/documents/. This information can be expressed in a serialized link dump with BEACON meta fields as follows:

#FORMAT: BEACON
#INSTITUTION: ACME
#RELATION: http://purl.org/dc/elements/1.1/contributor
#SOURCESET: http://example.com/people/
#TARGETSET: http://example.com/documents/
#NAME: ACME documents

Both source identifiers for people and target identifiers for documents follow a pattern, so links can be abbreviated as follows:

#PREFIX: http://example.com/people/
#TARGET: http://example.com/documents/{+ID}.about

alice||23
bob||42

From this form the following links can be constructed:

http://example.com/people/alice|http://example.com/documents/23.about
http://example.com/people/bob|http://example.com/documents/42.about

The example can be extended by adding a third element to each link. For instance, the annotation could be used to specify the date of each document:

#ANNOTATION: http://purl.org/dc/elements/1.1/date

alice|2014-03-12|23
bob|2013-10-21|42

Authors' Addresses

Jakob Voß Verbundzentrale des GBV Platz der Göttinger Sieben 1 Göttingen, 37073 Germany Phone: +49(551)39-10242 EMail: voss@gbv.de
Mathias Schindler Wikimedia Deutschland e.V. Tempelhofer Ufer 23-24 Berlin, 10963 Germany Phone: +49(30)21915826-0 EMail: mathias.schindler@wikimedia.de

Table of Contents