lipstick

Syntax highlighting of MARC and related record formats

Syntax elements

Basic structure

The basic structure of all record formats can be simplified as this:

Format variants differ mainly in their separators. For instance, binary ISO MARC uses the byte codes 0x1D, 0x1E, and 0x1F to terminate records, terminate fields, and introduce subfields, respectively.
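As a rough illustration of how separators define the structure, the following Python sketch splits binary data into records, fields, and subfield chunks. It assumes the ISO 2709 control characters 0x1D (record terminator), 0x1E (field terminator), and 0x1F (subfield delimiter). The function name `split_records` is hypothetical, and the sketch ignores the leader and directory of a real ISO MARC record.

```python
RECORD_END = b"\x1d"      # record terminator (assumed per ISO 2709)
FIELD_END = b"\x1e"       # field terminator
SUBFIELD_START = b"\x1f"  # subfield delimiter


def split_records(data: bytes) -> list:
    """Split raw bytes into records -> fields -> chunks.

    The first chunk of each field is whatever precedes the first
    subfield delimiter (for example the indicators).
    """
    records = []
    for raw_record in data.split(RECORD_END):
        if not raw_record:
            continue  # skip the empty piece after the last terminator
        fields = []
        for raw_field in raw_record.split(FIELD_END):
            if not raw_field:
                continue
            fields.append(raw_field.split(SUBFIELD_START))
        records.append(fields)
    return records
```

Readable format variants follow the same nesting and only replace these bytes with printable separators such as line breaks and $.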

Data elements

Syntax highlighting is limited to human-readable format variants. For this reason we assume that each field occupies one line.

A MARC tag is either the leader tag LDR or three digits. A PICA tag is three digits, the first being 0, 1, or 2, followed by an uppercase ASCII letter or @.

MARC tags can be prefixed with = (MARCMaker format).
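The MARC tag rule, including the optional = prefix, could be expressed as a regular expression. This is only a sketch: the names `MARC_TAG` and `match_marc_tag` are hypothetical and not part of any existing implementation.

```python
import re

# Optional "=" (MARCMaker), then either the leader tag or three digits.
MARC_TAG = re.compile(r"^(=)?(LDR|[0-9]{3})")


def match_marc_tag(line):
    """Return tag information for a field line, or None if no tag is found."""
    m = MARC_TAG.match(line)
    if m is None:
        return None
    return {"marcmaker": m.group(1) is not None, "tag": m.group(2)}
```

For example, `match_marc_tag("=245 10$aTitle")` reports the tag 245 with the MARCMaker flag set, while a line starting with letters other than LDR yields `None`.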

MARC tags of fixed fields can be followed by a length consisting of / and two positions separated by -. A position consists of two digits (this extension is found in MARC examples).
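A possible way to recognize this notation is sketched below. The pattern accepts any MARC tag, since the sketch does not know which tags denote fixed fields, and the name `parse_fixed_positions` is hypothetical.

```python
import re

# Tag, then "/", then two two-digit positions separated by "-".
FIXED_POSITIONS = re.compile(r"^(LDR|[0-9]{3})/([0-9]{2})-([0-9]{2})")


def parse_fixed_positions(text):
    """Return (tag, start, end) for input such as '008/07-10', else None."""
    m = FIXED_POSITIONS.match(text)
    if m is None:
        return None
    return m.group(1), int(m.group(2)), int(m.group(3))
```

Here `parse_fixed_positions("008/07-10")` yields the tag together with the integer positions 7 and 10. A single-digit position such as `008/7-10` is rejected.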

An indicator is an ASCII alphabetic or numeric character, or blank. Blank indicators can be shown as #, _, \, or a single space.
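Because a blank indicator has several spellings, a highlighter or parser may want to normalize them. The following sketch maps every blank marker to a single space. The helper name `normalize_indicator` is an assumption made for illustration.

```python
import string

BLANK_MARKERS = {"#", "_", "\\", " "}
VALID_CHARS = set(string.ascii_letters + string.digits)


def normalize_indicator(ch):
    """Return the indicator with blanks mapped to ' ', or None if invalid."""
    if len(ch) != 1:
        return None
    if ch in BLANK_MARKERS:
        return " "
    if ch in VALID_CHARS:
        return ch
    return None
```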

An occurrence is two or three digits, prefixed with /.
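A minimal sketch of the occurrence rule follows. It matches the whole token, so a fixed-field position such as /07-10 is not mistaken for an occurrence. The name `parse_occurrence` is hypothetical.

```python
import re

# "/" followed by exactly two or three digits.
OCCURRENCE = re.compile(r"/([0-9]{2,3})")


def parse_occurrence(token):
    """Return the occurrence digits as a string, or None if the token is invalid."""
    m = OCCURRENCE.fullmatch(token)
    return m.group(1) if m else None
```

The digits are returned as a string so that leading zeros are preserved.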

A subfield separator is $ or ‡, but these characters should not be mixed within one field. The character can be escaped in subfield values by doubling it ($$ or ‡‡). The double dagger symbol (‡ = U+2021) may also appear as the palatal click symbol (ǂ = U+01C2).
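Escaping by duplication means a field cannot simply be split on the separator. One possible approach, sketched here with the hypothetical helper `split_subfields`, is to match each subfield as separator, code, and a value in which the separator may only occur doubled. The sketch assumes that the subfield code is never the separator character itself.

```python
import re


def split_subfields(content, sep="$"):
    """Split field content into (code, value) pairs.

    A doubled separator inside a value is unescaped to a single one.
    """
    s = re.escape(sep)
    # separator, one-character code, then value characters or doubled separators
    pattern = re.compile(f"{s}([^{s}])((?:[^{s}]|{s}{s})*)")
    return [
        (code, value.replace(sep + sep, sep))
        for code, value in pattern.findall(content)
    ]
```

Calling `split_subfields("$aPrice: $$10$bmore")` gives the subfields `a` with value `Price: $10` and `b` with value `more`. Passing `sep="‡"` handles the double dagger variant in the same way.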

A subfield code is a single character.

Additional whitespace should be allowed for readability.