Network Working Group | J. Voß |
Internet-Draft | Verbundzentrale des GBV |
Intended status: Informational | M. Schindler |
Expires: May 5, 2018 | November 2017 |
BEACON link dump format
draft-voss-beacon-003
This document specifies BEACON, a data interchange format for large numbers of uniform links.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at http://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on May 5, 2018.
Copyright (c) 2017 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
BEACON is a data interchange format for large numbers of uniform links. A BEACON link dump consists of a set of links (Section 2.1) and a set of meta fields (Section 4).
Link dumps can be serialized in BEACON format (Section 3). BEACON format is a condensed, line-oriented text format that utilizes common patterns in the links of a link dump for abbreviation. A link dump serialized in BEACON format is also referred to as a BEACON file.
Link dumps can further be mapped to RDF graphs with minor limitations (Section 5).
The non-normative appendices contain a mapping of BEACON links to HTML (Appendix B) and a serialization of link dumps based on XML (Appendix C).
The current specification is managed at https://github.com/gbv/beaconspec.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
The formal grammar rules in this document are to be interpreted as described in [RFC5234], including the ABNF core rules HTAB, LF, CR, and SP. In addition, the minus operator (-) is used to exclude line breaks and vertical bars from the rules LINE and TOKEN in Section 3.1.
Samples of RDF graphs in this document are expressed in Turtle syntax [TURTLE].
The simplest form of a BEACON file contains full URL links separated by one or two vertical bars:
   http://example.com/people/alice|http://example.com/documents/23.about
   http://example.com/people/bob||http://example.com/documents/42.about
The first element of a link is called source identifier and the second is called target identifier. In most cases these identifiers are URLs or URIs. If a target identifier does not start with http: or https:, two vertical bars MUST be used:
http://example.com/people/alice||urn:isbn:0123456789
Source identifier and target identifier can be abbreviated with the meta fields PREFIX and TARGET, respectively. A simple BEACON file with such abbreviations can look like this:
   #FORMAT: BEACON
   #PREFIX: http://example.org/id/
   #TARGET: http://example.com/about/

   12345
   6789||abc
In this example the following two links are encoded:
   http://example.org/id/12345|http://example.com/about/12345
   http://example.org/id/6789|http://example.com/about/abc
Links can further be extended by link annotation and relation type.
A link in a link dump is a directed, typed connection between two resources, optionally enriched by an annotation. A link is composed of four elements: a source identifier, a target identifier, a relation type, and a link annotation.
Each element MUST be a whitespace-normalized Unicode string (Section 2.3) that conforms to the TOKEN grammar rule given in Section 3.1. All elements except the link annotation MUST NOT be empty strings.
Source identifier and target identifier define where a link is pointing from and to respectively. Relation type is an identifier that indicates the meaning of a link. All these identifiers SHOULD be URIs [RFC3986]. A link annotation can be used to further describe a link or parts of it.
All links in a link dump share either a common relation type or a common link annotation, or both. This uniformity is used to abbreviate links in BEACON format (Section 3).
The set that all source identifiers in a link dump originate from is called the source dataset and the set that all target identifiers originate from is called the target dataset.
The set of allowed Unicode characters in BEACON dumps is the set of valid Unicode characters from UCS which can also be expressed in XML 1.0, excluding some discouraged control characters:
   CHAR = WHITESPACE / %x21-7E / %xA0-D7FF / %xE000-FFFD
        / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
        / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
        / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
        / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
        / %xD0000-DFFFD / %xE0000-EFFFD / %xF0000-FFFFD
        / %x100000-10FFFD
Applications SHOULD exclude disallowed characters by stripping them, by replacing them with the replacement character U+FFFD, or by refusing to process. Applications SHOULD also apply Unicode Normalization Form Canonical Composition (NFKC) to all strings.
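The replacement strategy can be illustrated with a short Python sketch. It is only one possible reading of the CHAR rule: it assumes that the intent is to allow every code point outside the ASCII and C1 control ranges and the surrogates, except the last two code points of each plane. The function names is_allowed and replace_disallowed are illustrative and not part of this specification.

```python
def is_allowed(ch: str) -> bool:
    """Return True if a single character matches the CHAR rule (assumed reading)."""
    cp = ord(ch)
    if ch in "\t\n\r ":                      # WHITESPACE characters
        return True
    if 0x21 <= cp <= 0x7E or 0xA0 <= cp <= 0xD7FF or 0xE000 <= cp <= 0xFFFD:
        return True
    # Planes 1 to 16: everything except the two noncharacters xFFFE and xFFFF
    return cp >= 0x10000 and (cp & 0xFFFF) <= 0xFFFD


def replace_disallowed(text: str) -> str:
    """Replace every disallowed character with U+FFFD."""
    return "".join(ch if is_allowed(ch) else "\ufffd" for ch in text)
```

Stripping the characters or refusing to process the input are equally valid choices; replacement was picked here only because it keeps string lengths predictable.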
A Unicode string is whitespace-normalized according to this specification, by stripping leading and trailing whitespace and by replacing all WHITESPACE character sequences by a single space (SP).
   WHITESPACE = 1*( CR / LF / SPACE )
   SPACE      = HTAB / SP
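Whitespace normalization as described above can be sketched in Python as follows. The function name normalize_whitespace is illustrative; only the four characters CR, LF, HTAB, and SP are treated as whitespace, as in the grammar rule.

```python
import re

# One or more of CR, LF, HTAB, SP, as in the WHITESPACE rule
_WHITESPACE = re.compile(r"[\r\n\t ]+")


def normalize_whitespace(value: str) -> str:
    """Collapse whitespace runs to a single space and strip both ends."""
    return _WHITESPACE.sub(" ", value).strip(" ")
```

Note that other Unicode space characters, such as U+00A0, are deliberately left untouched in this sketch because they are not part of the WHITESPACE rule.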
A URI pattern in this specification is a URI Template, as defined in [RFC6570], with all template expressions being either {ID} for simple string expansion or {+ID} for reserved expansion. URI patterns are used in link construction (Section 3.2) to expand link tokens to full identifiers (usually URIs).
A URI pattern is allowed to contain the broader set of characters allowed in Internationalized Resource Identifiers (IRI) [RFC3987]. The URI constructed from a URI pattern by template processing can be transformed to an IRI by following the process defined in Section 3.2 of [RFC3987].
For instance the URI pattern http://example.org/?id={ID} is expanded to:
   ID variable     Expanded
   ------------------------------------------------------------
   Hello World!    http://example.org/?id=Hello%20World%21
   x/?a=1&b=2      http://example.org/?id=x%2F%3Fa%3D1%26b%3D2
   M%C3%BCller     http://example.org/?id=M%25C3%25BCller
And the URI pattern http://example.org/{+ID} is expanded to:
   ID variable     Expanded
   ------------------------------------------------------------
   Hello World!    http://example.org/Hello%20World!
   x/?a=1&b=2      http://example.org/x/?a=1&b=2
   M%C3%BCller     http://example.org/M%C3%BCller
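The two kinds of expansion can be sketched in Python with the standard library. This is a minimal illustration that only handles the expressions {ID} and {+ID}; the function name expand_uri_pattern is an assumption of this sketch. Following [RFC6570], reserved expansion keeps reserved characters and existing percent-encoded triplets, while simple expansion percent-encodes everything except unreserved characters.

```python
import re
from urllib.parse import quote

UNRESERVED = "-._~"                 # letters and digits are always kept by quote()
RESERVED = ":/?#[]@!$&'()*+,;="


def expand_uri_pattern(pattern: str, value: str) -> str:
    """Expand {ID} (simple) and {+ID} (reserved) in a URI pattern."""
    def replace(match):
        if match.group(1) == "+":
            # Split so that existing %XX triplets land at odd indexes and
            # are copied unchanged; all other parts are encoded.
            parts = re.split(r"(%[0-9A-Fa-f]{2})", value)
            return "".join(
                part if index % 2 else quote(part, safe=UNRESERVED + RESERVED)
                for index, part in enumerate(parts)
            )
        return quote(value, safe=UNRESERVED)

    return re.sub(r"\{(\+?)ID\}", replace, pattern)
```

For example, expand_uri_pattern("http://example.org/?id={ID}", "Hello World!") yields http://example.org/?id=Hello%20World%21.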
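A BEACON file is a UTF-8 encoded Unicode file [RFC3629]. The file MAY begin with a Unicode Byte Order Mark and it SHOULD end with a line break. The rest of the file consists of four parts in this order: a format indicator line (#FORMAT: BEACON), a set of meta lines, a set of empty lines, and a set of link lines.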
All four parts are optional but RECOMMENDED. The order of meta lines and of link lines, respectively, is irrelevant.
   LINE        = *CHAR - ( *CHAR LINEBREAK *CHAR )
   TOKEN       = *CHAR - ( *CHAR ( LINEBREAK / VBAR ) *CHAR )
   LINEBREAK   = LF / CR LF / CR   ; "\n", "\r\n", or "\r"
   VBAR        = %x7C              ; vertical bar ("|")
   SEPARATOR   = ":" *SPACE / 1*SPACE

   BEACONFILE  = [ %xEF.BB.BF ]    ; Unicode UTF-8 Byte Order Mark
                 [ "#FORMAT" SEPARATOR "BEACON" *SPACE LINEBREAK ]
                 *( METALINE LINEBREAK )
                 *( *SPACE LINEBREAK )   ; empty lines
                 LINKLINE *( LINEBREAK LINKLINE )
                 [ LINEBREAK ]
A meta line specifies a meta field (Section 4) and its value, separated by colon and/or tabulator or space:
   METALINE    = "#" METAFIELD SEPARATOR METAVALUE
   METAFIELD   = 1*( %x41-5A )     ; "A" to "Z"
   METAVALUE   = LINE
If a BEACON file contains multiple meta lines with the same field name, all but one of these lines MUST be ignored. Applications SHOULD emit a warning for multiple meta lines with the same field name.
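A meta line parser along these lines can be sketched in Python. The names META_LINE and parse_meta_lines are illustrative. The sketch keeps the first occurrence of a repeated field; this is an assumption, since the text above only requires that all but one occurrence be ignored.

```python
import re

# "#" METAFIELD SEPARATOR METAVALUE, where SEPARATOR is ":" *SPACE / 1*SPACE
META_LINE = re.compile(r"^#([A-Z]+)(?::[ \t]*|[ \t]+)(.*)$")


def parse_meta_lines(lines):
    """Return (fields, warnings) for an iterable of lines without line breaks."""
    fields, warnings = {}, []
    for line in lines:
        match = META_LINE.match(line)
        if not match:
            continue                         # not a meta line
        name = match.group(1)
        # Whitespace-normalize the value (CR, LF, HTAB, SP only)
        value = re.sub(r"[\r\n\t ]+", " ", match.group(2)).strip(" ")
        if name in fields:
            warnings.append("duplicate meta field " + name + " ignored")
            continue                         # assumption: first occurrence wins
        fields[name] = value
    return fields, warnings
```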
Each link is given on a link line with its source token, optionally followed by annotation token and target token. These link elements are used for link construction (Section 3.2) unless the source token consists of whitespace only. If no empty line is given, the first link line MUST NOT begin with #.
   LINKLINE    = SOURCE /
                 SOURCE VBAR TARGET /
                 SOURCE VBAR ANNOTATION /
                 SOURCE VBAR ANNOTATION VBAR TARGET

   SOURCE      = TOKEN
   TARGET      = TOKEN
   ANNOTATION  = TOKEN
The ambiguity of rule LINKLINE with one VBAR is resolved as follows:
This way one can use two forms to encode links to HTTP URIs (given target meta field and message meta field with their default values):
   foo|http://example.org/bar
   foo||http://example.org/bar
Applications MAY accept link lines with more than two vertical bars but they MUST ignore additional content between a third vertical bar and the end of the line.
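Splitting a link line into its tokens can be sketched in Python as follows. The handling of a single vertical bar is an assumption inferred from the two example forms above and from Section 1: the second token is taken as target token only if it starts with http: or https: and the TARGET and MESSAGE meta fields have their default values; otherwise it is taken as annotation token. The name split_link_line and the parameter defaults_in_effect are illustrative.

```python
import re


def _normalize(token: str) -> str:
    """Whitespace-normalize a token (CR, LF, HTAB, SP only)."""
    return re.sub(r"[\r\n\t ]+", " ", token).strip(" ")


def split_link_line(line: str, defaults_in_effect: bool = True):
    """Return (source, annotation, target) tokens; missing tokens are ''."""
    # Content after a third vertical bar is ignored
    tokens = [_normalize(token) for token in line.split("|")[:3]]
    source = tokens[0]
    if len(tokens) == 1:
        return (source, "", "")
    if len(tokens) == 2:
        second = tokens[1]
        if defaults_in_effect and second.startswith(("http:", "https:")):
            return (source, "", second)      # SOURCE VBAR TARGET
        return (source, second, "")          # SOURCE VBAR ANNOTATION
    return (source, tokens[1], tokens[2])
```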
Link elements in BEACON format are given in abbreviated form with link tokens. Each link is constructed based on meta fields for link construction (Section 4.1) and from
All tokens MUST be whitespace-normalized before further processing. The link elements are then constructed as follows (see Section 2.4 for how to construct values from URI patterns):
The following table illustrates construction of a link:
   meta field  +  link token  -->  link element
   ---------------------------------------------------
   PREFIX      |  source      |  source identifier
   TARGET      |  target      |  target identifier
   MESSAGE     |  annotation  |  link annotation
   RELATION    |  annotation  |  relation type
Constructed source identifier, target identifier, and relation types SHOULD be syntactically valid URIs. Applications MAY ignore links with invalid URIs and SHOULD emit a warning.
Applications MUST NOT differentiate between equal links constructed from different abbreviations. For instance the following BEACON file contains a single link:
   #PREFIX: http://example.org/
   #TARGET: http://example.com/
   #MESSAGE: Hello World!

   foo
The same link could also be serialized without any meta fields:
http://example.org/foo|Hello World!|http://example.com/foo
The default meta field values can be specified as:
   #PREFIX: {+ID}
   #TARGET: {+ID}
   #RELATION: http://www.w3.org/2000/01/rdf-schema#seeAlso
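Link construction with these defaults can be sketched in Python. The sketch rests on assumptions drawn from the examples in this document: a missing target token falls back to the source token, a missing annotation token falls back to the MESSAGE value, and {ID} is appended to a PREFIX or TARGET value that contains no expression. RELATION values that are URI patterns are not handled. The names DEFAULTS, expand, and construct_link are illustrative.

```python
import re
from urllib.parse import quote

DEFAULTS = {
    "PREFIX": "{+ID}",
    "TARGET": "{+ID}",
    "MESSAGE": "",
    "RELATION": "http://www.w3.org/2000/01/rdf-schema#seeAlso",
}


def expand(pattern: str, token: str) -> str:
    """Expand {ID} or {+ID}; append {ID} if the pattern has no expression."""
    if not re.search(r"\{\+?ID\}", pattern):
        pattern += "{ID}"

    def replace(match):
        if match.group(1):                   # "+" means reserved expansion
            parts = re.split(r"(%[0-9A-Fa-f]{2})", token)
            return "".join(
                p if i % 2 else quote(p, safe="-._~:/?#[]@!$&'()*+,;=")
                for i, p in enumerate(parts)
            )
        return quote(token, safe="-._~")     # simple expansion

    return re.sub(r"\{(\+?)ID\}", replace, pattern)


def construct_link(meta, source, annotation="", target=""):
    """Build the four link elements from meta fields and link tokens."""
    # Empty or missing meta field values fall back to their defaults
    fields = {**DEFAULTS, **{k: v for k, v in meta.items() if v}}
    return {
        "source": expand(fields["PREFIX"], source),
        "target": expand(fields["TARGET"], target or source),
        "annotation": annotation or fields["MESSAGE"],
        "relation": fields["RELATION"],
    }
```

With this sketch, the abbreviated link line foo under the meta fields PREFIX, TARGET, and MESSAGE shown above and the fully spelled-out link line without meta fields produce the same four link elements.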
Multiple occurrences of equal links in one BEACON file SHOULD be ignored. It is RECOMMENDED to indicate duplicated links with a warning.
The RECOMMENDED MIME type of BEACON files is "text/plain". The file extension .txt SHOULD be used when storing BEACON files.
A link dump SHOULD contain a set of meta fields, each identified by a name built of uppercase letters A-Z. Relevant meta fields for link construction (Section 4.1), for description of the link dump (Section 4.2), and for description of source dataset and target dataset (Section 4.3) are defined in the following sections.
A link dump can only contain one value for each meta field. Additional meta fields, not defined in this specification, SHOULD be ignored.
All meta field values MUST be whitespace-normalized. Missing meta field values and empty strings MUST be set to the field’s default value, which is the empty string unless noted otherwise. The following diagram shows which meta fields belong to which dataset.
                        +-----------------------+
                        | link dump             |
                        |                       |
                        |  * DESCRIPTION        |
                        |  * CREATOR            |
                        |  * CONTACT            |
                        |  * HOMEPAGE           |
                        |  * FEED               |
                        |  * TIMESTAMP          |
                        |  * UPDATE             |
                        |                       |
                        | +-------------------+ |
                        | | link construction | |
                        | |                   | |   +-----------------+
   +----------------+   | |  * PREFIX         | |   | target dataset  |
   | source dataset |  ---|  * TARGET         |---> |                 |
   |                |   | |  * RELATION       | |   |  * TARGETSET    |
   |                |  ---|  * MESSAGE        |---> |  * NAME         |
   |  * SOURCESET   |   | |  * ANNOTATION     | |   |  * INSTITUTION  |
   |                |  ---|                   |---> |                 |
   +----------------+   | +-------------------+ |   +-----------------+
                        +-----------------------+
Examples of meta fields are included in Section 5.
The following meta fields define how to construct links from link tokens (Section 3.2). See Section 5.5 for mapping of these fields to RDF.
The PREFIX meta field specifies a URI pattern (Section 2.4) to construct source identifiers. If the non-empty field value contains no URI pattern, the expression {ID} is appended.
The default value is {+ID}.
The name PREFIX was chosen to keep backwards compatibility with existing BEACON files.
The TARGET meta field specifies a URI pattern (Section 2.4) to construct target identifiers. If the non-empty field value contains no URI pattern, the expression {ID} is appended.
The default value is {+ID}.
The MESSAGE meta field specifies a default value for link annotations.
The RELATION meta field specifies relation types of links. The field value MUST be either a URI as defined in [RFC3986] or a URI pattern as described in Section 2.4.
The default value is http://www.w3.org/2000/01/rdf-schema#seeAlso.
The ANNOTATION field can be used to specify the meaning of link annotations in a link dump. The field value MUST be a URI.
Meta fields for link dumps describe the link dump as whole. See Section 5.6 for mapping of these fields to RDF.
The DESCRIPTION meta field contains a human readable description of the link dump.
The CREATOR meta field contains the URI or the name of the person, organization, or a service primarily responsible for making the link dump. The field SHOULD NOT contain a simple URL, unless this URL is also used as URI.
The CONTACT meta field contains an email address or similar contact information to reach the creator of the link dump. The field value SHOULD be an individual mailbox address as specified in section 3.4 of [RFC5322].
The HOMEPAGE meta field contains a URL of a website with additional information about this link dump. Note that this field does not specify the homepage of the target dataset.
The FEED meta field contains a URL from which the link dump can be downloaded.
The TIMESTAMP field contains the date of last modification of the link dump. Note that this value MAY be different to the last modification time of a BEACON file that serializes the link dump. The timestamp value MUST conform to the full-date or to the date-time production rule in [RFC3339]. In addition, an uppercase T character MUST be used to separate date and time, and an uppercase Z character MUST be present in the absence of a numeric time zone offset.
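A syntactic check of TIMESTAMP values can be sketched in Python with a regular expression. The sketch only checks the shape of full-date and date-time as described above, including the uppercase T and Z; it does not verify that month, day, or time values are in range. The name timestamp_kind and its return values are illustrative.

```python
import re

# full-date, optionally followed by "T" partial-time and "Z" or a numeric offset
_TIMESTAMP = re.compile(
    r"\d{4}-\d{2}-\d{2}"
    r"(T\d{2}:\d{2}:\d{2}(\.\d+)?(Z|[+-]\d{2}:\d{2}))?",
    re.ASCII,
)


def timestamp_kind(value: str):
    """Return 'date', 'dateTime', or None if the value is not well-formed."""
    match = _TIMESTAMP.fullmatch(value)
    if not match:
        return None
    return "dateTime" if match.group(1) else "date"
```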
The UPDATE field specifies how frequently the link dump is likely to change. The field corresponds to the <changefreq> element in Sitemaps XML format [Sitemaps]. Valid values are always, hourly, daily, weekly, monthly, yearly, and never.
The value always SHOULD be used to describe link dumps that change each time they are accessed. The value never SHOULD be used to describe archived link dumps.
Dataset meta fields contain properties of the source dataset or target dataset, respectively. See Section 5.7 for mapping of these fields to RDF.
The SOURCESET meta field contains the URI of the source dataset.
The TARGETSET meta field contains the URI of the target dataset.
The NAME meta field contains a name or title of the target dataset.
The INSTITUTION meta field contains the name or HTTP URI of the organization or of an individual responsible for making available the target dataset.
A link dump can be mapped to an RDF graph as described in this section. The mapping excludes all links whose source identifier, target identifier, or relation type is not a valid URI.
All URIs MUST be transformed to IRIs as defined in Section 3.2 of [RFC3987].
Examples of link dumps mapped to RDF are given in Appendix D.
The following namespace prefixes are used to refer to RDF properties and classes from the RDFS vocabulary [RDF], the DCMI Metadata Terms [DCTERMS], the FOAF vocabulary [FOAF], the VoID vocabulary [VOID], the Hydra Core Vocabulary [Hydra], and the RSS 1.0 Syndication Module [RSSSYND]:
   rdfs:    <http://www.w3.org/2000/01/rdf-schema#>
   dcterms: <http://purl.org/dc/terms/>
   foaf:    <http://xmlns.com/foaf/0.1/>
   void:    <http://rdfs.org/ns/void#>
   hydra:   <http://www.w3.org/ns/hydra/core#>
   rssynd:  <http://web.resource.org/rss/1.0/modules/syndication/>
The blank node :dump denotes the link dump, the blank node :sourceset denotes the source dataset, and the blank node :targetset denotes the target dataset. Source dataset and target dataset can also be given an absolute IRI with meta fields SOURCESET and TARGETSET, respectively (Section 4.3).
The following RDF triples can always be assumed when mapping link dumps to RDF:
   :dump a void:Linkset, hydra:Collection ;
       void:subjectsTarget :sourceset ;
       void:objectsTarget :targetset .
   :sourceset a void:Dataset .
   :targetset a void:Dataset .
All publicly available BEACON data dumps SHOULD be Open Data, so the following triple MAY be assumed as well:
:dump <http://creativecommons.org/ns#license> <http://creativecommons.org/publicdomain/zero/1.0/> .
Links (Section 2.1) with source identifier, target identifier, and relation type being valid URIs can be mapped to at least one RDF triple with:
The total number of mappable links in a link dump SHOULD result in two additional RDF triples with COUNT being the number of links:
   :dump hydra:totalItems COUNT .
   :dump void:entities COUNT .
Each non-empty link annotation SHOULD result in an additional RDF triple with:
Applications MAY use a predefined IRI as link annotation or process the link annotation by other means, for instance for provenance and versioning of links. Applications MAY assign a default language tag or datatype to all literal objects derived from link annotations.
Typical use cases of link annotations include specification of labels and a "number of hits" at the target dataset. For instance the following file in BEACON format (Section 3):
   #PREFIX: http://example.org/
   #TARGET: http://example.com/
   #RELATION: http://xmlns.com/foaf/0.1/primaryTopic
   #ANNOTATION: http://purl.org/dc/terms/extent

   abc|12|xy
can be mapped to
   <http://example.org/abc> foaf:primaryTopic <http://example.com/xy> .
   <http://example.com/xy> dcterms:extent "12" .
The total number of mappable links and link annotations in a link dump SHOULD result in an additional RDF triple with TRIPLES being the sum of both numbers:
:dump void:triples TRIPLES .
All meta fields for link construction (Section 4.1) except for MESSAGE can be mapped to RDF triples.
The PREFIX meta field (Section 4.1.1) SHOULD be mapped to the RDF property void:uriSpace or void:uriRegexPattern with :sourceset as RDF subject.
The TARGET meta field (Section 4.1.2) SHOULD be mapped to the RDF property void:uriSpace or void:uriRegexPattern with :targetset as RDF subject.
The RELATION meta field (Section 4.1.4), if its value contains a URI, SHOULD be mapped to the RDF property void:linkPredicate with :dump as RDF subject.
The ANNOTATION meta field (Section 4.1.5) is used to map link annotations to RDF (Section 5.4) unless the RELATION meta field contains a URI template.
Meta fields for link dumps (Section 4.2) describe properties of the link dump.
The DESCRIPTION meta field (Section 4.2.1) corresponds to the dcterms:description RDF property. For instance
#DESCRIPTION: Mapping from ids to documents
can be mapped to
:dump dcterms:description "Mapping from ids to documents" .
The CREATOR meta field (Section 4.2.2) corresponds to the dcterms:creator RDF property. The RDF object SHOULD NOT be a literal node. For instance
#CREATOR: Bea Beacon
can be mapped to
:dump dcterms:creator [ foaf:name "Bea Beacon" ] .
A field value starting with http:// or https:// is interpreted as URI instead of string. For instance
#CREATOR: http://example.org/people/bea
can be mapped to
:dump dcterms:creator <http://example.org/people/bea> .
The CONTACT meta field (Section 4.2.3) corresponds to the foaf:mbox RDF property. The RDF object SHOULD NOT be a literal node. For instance
#CONTACT: admin@example.com
can be mapped to
:dump dcterms:creator [ foaf:mbox <mailto:admin@example.com> ] .
and
#CONTACT: Bea Beacon <bea@example.org>
can be mapped to
:dump dcterms:creator [ foaf:name "Bea Beacon" ; foaf:mbox <mailto:bea@example.org> ] .
The HOMEPAGE meta field (Section 4.2.4) corresponds to the foaf:homepage RDF property. For instance
#HOMEPAGE: http://example.org/about.html
can be mapped to
:dump foaf:homepage <http://example.org/about.html> .
The FEED meta field (Section 4.2.5) corresponds to the void:dataDump RDF property. For instance
#FEED: http://example.com/beacon.txt
can be mapped to
:dump void:dataDump <http://example.com/beacon.txt> .
The TIMESTAMP meta field (Section 4.2.6) corresponds to the dcterms:modified RDF property. For instance the following valid timestamps
   #TIMESTAMP: 2012-05-30
   #TIMESTAMP: 2012-05-30T15:17:36+02:00
   #TIMESTAMP: 2012-05-30T13:17:36Z
can be mapped to the following RDF triples, respectively:
   :dump dcterms:modified "2012-05-30"^^xsd:date .
   :dump dcterms:modified "2012-05-30T15:17:36+02:00"^^xsd:dateTime .
   :dump dcterms:modified "2012-05-30T13:17:36Z"^^xsd:dateTime .
The UPDATE meta field (Section 4.2.7) corresponds to the rssynd:updatePeriod RDF property. For instance a daily update
#UPDATE: daily
can be mapped to
:dump rssynd:updatePeriod "daily" .
Meta fields for the datasets (Section 4.3) are mapped to subjects and objects of RDF triples to describe the source dataset and target dataset, respectively.
The SOURCESET meta field (Section 4.3.1) replaces the blank node :sourceset.
The TARGETSET meta field (Section 4.3.2) replaces the blank node :targetset.
The NAME meta field (Section 4.3.3) is mapped to the RDF property dcterms:title with :targetset as RDF subject. For instance the field value "Wikipedia", expressible in BEACON format as
#NAME: Wikipedia
can be mapped to
:targetset dcterms:title "Wikipedia" .
The INSTITUTION meta field (Section 4.3.4) corresponds to the RDF property dcterms:publisher. The RDF object SHOULD NOT be a literal node. For instance
#INSTITUTION: Wikimedia Foundation
can be mapped to
   :targetset dcterms:publisher [ foaf:name "Wikimedia Foundation" ] .
A field value starting with http:// or https:// is interpreted as URI instead of string. For instance
#INSTITUTION: http://viaf.org/viaf/137022054/
can be mapped to
   :targetset dcterms:publisher <http://viaf.org/viaf/137022054/> .
BEACON format (Section 3) can be used as serialization format for RDF graphs where all parts of RDF triples are IRIs and IRIs do not contain the character sequences %7C, %0A, %0D, or any other percent-encoded character not included in the list of allowed characters (Section 2.2). This limitation applies because the disallowed character sequences would need to result from characters not allowed in link tokens of BEACON format.
BEACON link dumps can be served for instance as Triple Pattern Fragments [TPF] which also consist of a set of links sharing a common pattern, and additional metadata.
Programs should be prepared for malformed and malicious content when parsing BEACON files, when constructing links from link tokens, and when mapping links to RDF or HTML. Possible attacks on parsers include broken UTF-8 and buffer overflows. Link construction can result in unexpectedly long strings and in character sequences that are harmful as a whole even though their parts appear harmless when analyzed separately. Most notably, BEACON data may store strings containing HTML and JavaScript code to be used for cross-site scripting attacks on the site displaying BEACON links. Applications should therefore escape or filter all content accordingly with established libraries, such as Apache Escape Utils.
[FOAF] | Brickley, D. and L. Miller, "FOAF Vocabulary Specification", Aug 2010. |
[RFC5013] | Kunze, J. and T. Baker, "The Dublin Core Metadata Element Set", RFC 5013, August 2007. |
[Sitemaps] | Google Inc., "Sitemaps XML format", February 2008. |
[SAX] | Megginson, D., "SAX 1.0: The Simple API for XML", May 1998. |
[TURTLE] | Beckett, D. and T. Berners-Lee, "Turtle - Terse RDF Triple Language", Mar 2011. |
[RELAX-NGC] | Clark, J., "RELAX NG Compact Syntax", Nov 2002. |
[RSSSYND] | RSS-DEV Working Group, "RDF Site Summary 1.0 Modules: Syndication", Dec 2000. |
[RDF] | Hayes, P., "RDF Semantics", Feb 2004. |
[VOID] | Cyganiak, R., Zhao, J., Alexander, K. and M. Hausenblas, "Vocabulary of Interlinked Datasets (VoID)", Mar 2011. |
[TPF] | Verborgh, R., "Triple Pattern Fragments", Jan 2017. |
[Hydra] | Lanthaler, M., "Hydra Core Vocabulary", Oct 2017. |
An important use case of BEACON is the creation of HTML links to related documents. A link in a BEACON dump can be mapped to an HTML link (<a> element) as follows:
For instance the following link, given in a BEACON file:
http://example.com|example|http://example.org
can be mapped to the following HTML link:
<a href="http://example.org">example</a>
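Because link elements can carry arbitrary strings, including markup, an implementation of this mapping needs to escape them. The following Python sketch uses html.escape from the standard library. Using the target identifier as link text when the annotation is empty is an assumption of this sketch, as is the function name html_link.

```python
from html import escape


def html_link(target: str, annotation: str = "") -> str:
    """Build an <a> element with the target as href and the annotation as text."""
    label = annotation or target             # assumption: fall back to the target
    return '<a href="{}">{}</a>'.format(escape(target, quote=True), escape(label))
```

Escaping alone does not make a link safe to follow. An application may additionally want to restrict target identifiers to an allow-list of URI schemes such as http and https.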
Note that the link annotation is optional. Additional meta fields can be used to construct appropriate HTML links. For instance the meta fields
   #RELATION: http://xmlns.com/foaf/0.1/isPrimaryTopicOf
   #SOURCETYPE: http://xmlns.com/foaf/0.1/Person
   #NAME: ACME documents
can be used to create a link such as
<span> More information about this person <a href="http://example.com/foo">at ACME documents</a>. </span>
because http://xmlns.com/foaf/0.1/isPrimaryTopicOf translates to "more information about", http://xmlns.com/foaf/0.1/Person translates to "this person", and the target dataset’s name "ACME documents" can be used as link label.
A BEACON XML file is a valid XML file conforming to the following schema. The file SHOULD be encoded in UTF-8 [RFC3629]. The file MUST:
The file MAY further:
All attributes MUST be given in lowercase.
To process BEACON XML files, a complete and stream-processing XML parser, for instance the Simple API for XML [SAX], is RECOMMENDED, in favor of parsing with regular expressions or similar methods prone to errors. Additional XML attributes of <link> elements and <link> elements without source attribute SHOULD be ignored.
Note that in contrast to BEACON text files, link tokens MAY include line breaks, which MUST be removed by whitespace normalization. Furthermore id field, annotation field and target token MAY include a vertical bar, which MUST be replaced by the character sequence %7C before further processing.
A schema of BEACON XML format in RELAX NG Compact syntax [RELAX-NGC] can be given as follows:
   default namespace = "http://purl.org/net/beacon"

   element beacon {
     attribute prefix { text },
     attribute target { text },
     attribute message { text },
     attribute source { text },
     attribute name { text },
     attribute institution { text },
     attribute description { text },
     attribute creator { text },
     attribute contact { text },
     attribute homepage { xsd:anyURI },
     attribute feed { xsd:anyURI },
     attribute timestamp { text },
     attribute update { "always" | "hourly" | "daily"
       | "weekly" | "monthly" | "yearly" | "never" },
     attribute relation { xsd:anyURI },
     attribute annotation { xsd:anyURI },
     element link {
       attribute source { text },
       attribute target { text }?,
       attribute annotation { text }?,
       empty
     }*
   }
A short example of a link dump serialized in BEACON text format:
   #FORMAT: BEACON
   #PREFIX: http://example.org/
   #TARGET: http://example.com/
   #NAME: ACME document

   alice||foo
   bob
   ada|bar
The link dump can be mapped to RDF as follows:
   @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
   @prefix hydra: <http://www.w3.org/ns/hydra/core#> .
   @prefix void: <http://rdfs.org/ns/void#> .

   :sourceset a void:Dataset ;
       void:uriSpace "http://example.org/" .

   :targetset a void:Dataset ;
       void:uriSpace "http://example.com/" .

   :dump a void:Linkset, hydra:Collection ;
       void:subjectsTarget :sourceset ;
       void:objectsTarget :targetset ;
       void:linkPredicate rdfs:seeAlso ;
       hydra:totalItems 3 ;
       void:entities 3 ;
       void:triples 4 .

   <http://example.org/alice> rdfs:seeAlso <http://example.com/foo> .
   <http://example.org/bob> rdfs:seeAlso <http://example.com/bob> .
   <http://example.org/ada> rdfs:seeAlso <http://example.com/ada> .
   <http://example.com/ada> rdfs:value "bar" .
The same link dump serialized in BEACON XML format (Appendix C):
   <?xml version="1.0" encoding="UTF-8"?>
   <beacon xmlns="http://purl.org/net/beacon"
           prefix="http://example.org/"
           target="http://example.com/"
           name="ACME document">
     <link source="alice" target="foo" />
     <link source="bob" />
     <link source="ada" annotation="bar" />
   </beacon>
To give an extended example, the "ACME" company wants to provide links from documents to people that contributed to each document. A list of documents is available from http://example.com/documents/ and a list of people, titled "ACME staff", is available from http://example.com/people/.
This information can be expressed in a serialized link dump with BEACON meta fields as follows:
   #FORMAT: BEACON
   #INSTITUTION: ACME
   #RELATION: http://purl.org/dc/elements/1.1/contributor
   #SOURCESET: http://example.com/documents/
   #TARGETSET: http://example.com/people/
   #NAME: ACME staff
Both source identifiers for documents and target identifiers for people follow a pattern, so links can be abbreviated as follows:
   #PREFIX: http://example.com/documents/
   #TARGET: http://example.com/people/{+ID}.about

   23||alice
   42||bob
From this form the following links can be constructed:
   http://example.com/documents/23|http://example.com/people/alice.about
   http://example.com/documents/42|http://example.com/people/bob.about
The example can be extended by addition of a third element for each link. For instance the annotation could be used to specify the date of each document:
   #ANNOTATION: http://purl.org/dc/elements/1.1/date

   23|2017-11-28|alice
   42|2017-01-31|bob
This link dump can be mapped to RDF as follows:
   @prefix void: <http://rdfs.org/ns/void#> .
   @prefix hydra: <http://www.w3.org/ns/hydra/core#> .
   @prefix dcterms: <http://purl.org/dc/terms/> .
   @prefix dc: <http://purl.org/dc/elements/1.1/> .

   :dump a void:Linkset, hydra:Collection ;
       void:subjectsTarget <http://example.com/documents/> ;
       void:objectsTarget <http://example.com/people/> ;
       void:linkPredicate dc:contributor ;
       hydra:totalItems 2 ;
       void:entities 2 ;
       void:triples 4 .

   <http://example.com/documents/> a void:Dataset ;
       void:uriSpace "http://example.com/documents/" .

   <http://example.com/people/> a void:Dataset ;
       dcterms:publisher "ACME" ;
       dcterms:title "ACME staff" ;
       void:uriSpace "http://example.com/people/" ;
       void:uriRegexPattern
           "^http://example\\.com/people/(.+)\\.about$" .

   <http://example.com/documents/23>
       dc:contributor <http://example.com/people/alice.about> ;
       dc:date "2017-11-28" .

   <http://example.com/documents/42>
       dc:contributor <http://example.com/people/bob.about> ;
       dc:date "2017-01-31" .