Unity
Parser for unit strings
Author
Norman Gray https://nxg.me.uk

The Unity package provides a parser for unit strings.

This is the unity parser (version 1.1-snap.888aa0195ca4, released 2022 August 2), which is a C library to help parse unit specification strings such as W.mm**-2. There is also an associated Java class library which uses the same grammars. For more details, see the library's home page and repository.

As well as parsing various unit strings, the library can also serialise a parsed expression in various formats. These include the four formats that it can parse, a LaTeX format with the name latex (which uses the siunitx package), and a debug format which lists the parsed unit in an unambiguous, but not otherwise useful, form.

Parsing Units

You can parse units using several different syntaxes since, unfortunately, there is no general consensus on which syntax the world should agree on. The supported syntaxes (and their names within this library) are as follows:

fits
See FITS v3.0 Sect. 4.3 (W.D. Pence et al., A&A 524, A42, 2010), v4.0 Sect. 4.3 (FITS standards page), and further comments in the FITS WCS IV paper.
ogip
OGIP memo OGIP/93-001, 1993
cds
Standards for Astronomical Catalogues, Version 2.0, section 3.2, 2000
vounits
IVOA VOUnits Recommendation

See also:

  • The IAU style manual, section 5.1 (1989), is by now rather old, but appears to be one of the few existing standards for units specific to astronomy.
  • ISO/IEC 80000 (parts 1–13) describes a variety of units, including the specification of the 'binary' prefixes kibi, mebi, and so on (see ISO/IEC 80000-13 Sect.4, and IEEE standard 1541-2002).
  • The VOUnits Recommendation discusses various tradeoffs and conflicting specifications, at some length.

Each of these syntaxes has an associated writer, which allows you to write a parsed UnitExpression to a string in a format that should conform to that syntax's standard. See unity_write_formatted.

In addition, there is a latex writer, which formats the expression for inclusion in a LaTeX document, using the siunitx package. To use its output in a LaTeX document, include the following in the preamble of the file:

    \usepackage{siunitx}
    \DeclareSIQualifier\solar{$\odot$}

You may add any siunitx options that seem convenient, and you may omit the declaration of \solar if the units in the document do not include the various solar ones.

The parsing is permissive, in that it accepts unrecognised and deprecated units. The result of the parse may be checked for conformance with one or other standard using the functions unity_check_unit and unity_check_expression. Note that SI prefixes are still detected on unrecognised units: thus furlongs/fortnight will be parsed as femto-urlongs per femto-ortnight. The same is not true of recognised units: a pixel/s is a pixel per second, and does not involve a pico-ixel.
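
To make the prefix rule concrete, here is a minimal C sketch of the splitting behaviour just described. It is not the library's code: the tiny table known_units and the function split_prefix are hypothetical, and the library's real tables are far larger. It assumes only the rule stated above: a recognised symbol is never split, while an unrecognised one loses a leading SI prefix whenever something remains after it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, heavily cut-down list of recognised symbols
   (the library's real tables are much larger). */
static const char *const known_units[] = {
    "m", "s", "g", "erg", "pixel", "pc", "Pa", NULL
};

/* SI prefix symbols, with "da" first so that it is tried before "d". */
static const char *const si_prefixes[] = {
    "da", "y", "z", "a", "f", "p", "n", "u", "m", "c", "d",
    "h", "k", "M", "G", "T", "P", "E", "Z", "Y", NULL
};

static int is_known(const char *sym)
{
    for (size_t i = 0; known_units[i] != NULL; i++)
        if (strcmp(known_units[i], sym) == 0)
            return 1;
    return 0;
}

/* Split SYM into an SI prefix (written to PREFIX, possibly "") and a
   base unit (returned).  A recognised symbol is never split; an
   unrecognised one loses a leading prefix if anything remains. */
static const char *split_prefix(const char *sym, char prefix[3])
{
    prefix[0] = '\0';
    if (is_known(sym))
        return sym;                      /* "pixel" is not pico-"ixel" */
    for (size_t i = 0; si_prefixes[i] != NULL; i++) {
        size_t n = strlen(si_prefixes[i]);
        if (strncmp(sym, si_prefixes[i], n) == 0 && sym[n] != '\0') {
            memcpy(prefix, si_prefixes[i], n);
            prefix[n] = '\0';
            return sym + n;              /* "furlongs" -> "f" + "urlongs" */
        }
    }
    return sym;                          /* no prefix: leave unchanged */
}
```

With this table, split_prefix("furlongs", p) returns "urlongs" with p set to "f", split_prefix("merg", p) returns "erg" with p set to "m", and split_prefix("pixel", p) returns "pixel" unchanged.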

Known Units

The various unit syntaxes have different sets of ‘known units’, namely units, and unit abbreviations, which the syntax blesses as recommended, or at least acknowledged. The list of known units in the various syntaxes is given below.

Much of the unit information, such as unit names and dimensions, is derived from the QUDT unit ontology which (from its self-description) was ‘developed for the NASA Exploration Initiatives Ontology Models (NExIOM) project, a Constellation Program initiative at the AMES Research Center (ARC).’

Demo

If you want to experiment with the library, build the program src/c/unity (in the distribution):

    % ./unity -icds -oogip 'mm2/s'
    mm**2 /s
    % ./unity -icds -ofits -v mm/s
    mm s-1
    check: all units recognised?           yes
    check: all units recommended?          yes
    check: all units satisfy constraints?  yes
    % ./unity -ifits -ocds -v 'merg/s'
    merg/s
    check: all units recognised?           yes
    check: all units recommended?          no
    check: all units satisfy constraints?  no
    % ./unity -icds -ofits -v 'merg/s'
    merg s-1
    check: all units recognised?           no
    check: all units recommended?          no
    check: all units satisfy constraints?  yes

In the last three examples, the -v option validates the input string against various constraints. The expression mm/s is completely valid in all the syntaxes. In the FITS syntax, the erg is a recognised but deprecated unit, and it is not permitted to take SI prefixes, so merg fails the constraints check. In the CDS syntax, the erg is neither recognised nor (a fortiori) recommended; since this syntax places no constraints on it, it satisfies all of them (this latter behaviour is admittedly slightly counterintuitive).
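
The three verdicts in the merg/s runs can be reproduced by a very small model. The following C sketch is an assumption, not the library's implementation (the real checks are done by unity_check_unit and unity_check_expression): it treats ‘recommended’ as ‘recognised and not deprecated’, and lets an unrecognised unit satisfy the constraints vacuously, which is a reading consistent with the demo output. The struct and function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical description of one unit symbol in one syntax. */
struct unit_info {
    bool recognised;   /* listed for this syntax   */
    bool si_allowed;   /* may take SI prefixes     */
    bool deprecated;   /* listed, but deprecated   */
};

struct check_result {
    bool recognised;
    bool recommended;
    bool satisfies_constraints;
};

/* Check one unit occurrence.  Assumptions: "recommended" means
   recognised and not deprecated; an unrecognised unit carries no
   constraints, so it satisfies them vacuously. */
static struct check_result check_unit(struct unit_info u, bool has_prefix)
{
    struct check_result r;
    r.recognised = u.recognised;
    r.recommended = u.recognised && !u.deprecated;
    r.satisfies_constraints = !u.recognised || !has_prefix || u.si_allowed;
    return r;
}
```

Feeding it the FITS entry for erg (recognised, deprecated, no SI prefixes) with a prefix present gives yes/no/no, and an absent CDS entry gives no/no/yes. Since s passes every check, taking the conjunction over all units in merg/s gives the same answers as the two demo runs.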

Grammars supported

The four supported grammars have a fair amount in common, but the differences are nonetheless significant enough that they require separate grammars. Important differences are in the number of solidi they allow in the units specifications, and the symbols they use for products and powers.

Note that in each of the grammars here, the empty string is not a valid units string – in particular, it is not taken to indicate a dimensionless quantity. If a particular context wishes to interpret such a string as indicating a dimensionless quantity, or perhaps instead indicating ‘units unknown’, then it should handle that case separately. To obtain what would be the result of such a parse, use the function unity_get_dimensionless(void). The VOUnits syntax, though it also deems the empty string to be invalid, recognises the string "1" as indicating a dimensionless quantity.

Current limitations:

  • The library currently ignores some of the odder unit restrictions (such as the OGIP requirement that 'Crab' may take a 'milli' prefix, but no other SI prefix)

In the grammars below, the common terminals are as follows:

  • WHITESPACE: one or more whitespace characters (in the grammars, whitespace is not permitted unless it matches a WHITESPACE terminal)
  • STAR, DOT: a star or a dot, generally used to indicate multiplication
  • DIVISION: a slash
  • STARSTAR, CARET: the former is "**", the latter "^"; both are used to indicate exponentiation
  • OPEN_P, CLOSE_P: open and close parentheses
  • SIGNED_INTEGER, UNSIGNED_INTEGER, FLOAT: numbers; SIGNED_INTEGER has a mandatory leading sign and UNSIGNED_INTEGER has none; the syntax of FLOAT is [+-]?[1-9][0-9]*\.[0-9]+, so no exponents are allowed
  • STRING: a sequence of one or more upper- and lower-case ASCII letters, [a-zA-Z]+
  • LIT10, LIT1: the literal strings "10" and "1"

There are some other terminals used in some grammars. See the VOUnits specification for further details.
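
The number terminals are simple enough to check by hand. Here is a hedged C sketch of whole-string matchers for them. FLOAT follows the pattern given above, while the digit patterns for SIGNED_INTEGER ([+-][0-9]+) and UNSIGNED_INTEGER ([0-9]+) are assumptions, since the text only specifies their signs. The function names are invented for illustration.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Consume one or more ASCII digits; return the position after them,
   or NULL if there was not at least one digit. */
static const char *digits(const char *p)
{
    if (!isdigit((unsigned char)*p))
        return NULL;
    while (isdigit((unsigned char)*p))
        p++;
    return p;
}

/* UNSIGNED_INTEGER, assumed here to be [0-9]+ */
static bool is_unsigned_integer(const char *s)
{
    const char *p = digits(s);
    return p != NULL && *p == '\0';
}

/* SIGNED_INTEGER, assumed here to be [+-][0-9]+ (sign mandatory) */
static bool is_signed_integer(const char *s)
{
    return (*s == '+' || *s == '-') && is_unsigned_integer(s + 1);
}

/* FLOAT: [+-]?[1-9][0-9]*\.[0-9]+, as documented (no exponent) */
static bool is_float(const char *s)
{
    const char *p = s;
    if (*p == '+' || *p == '-')
        p++;
    if (*p < '1' || *p > '9')
        return false;
    p = digits(p);              /* [1-9][0-9]*, cannot be NULL here */
    if (*p != '.')
        return false;
    p = digits(p + 1);          /* [0-9]+ after the point */
    return p != NULL && *p == '\0';
}
```

Taken literally, the FLOAT pattern requires a non-zero leading digit, so is_float("0.5") is false while is_float("1.5") and is_float("-12.25") are true; an exponent form such as 1.5e3 is rejected.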

The FITS grammar

input: complete_expression 
        | scalefactor complete_expression 
        | scalefactor WHITESPACE complete_expression 
        | division unit_expression 
        ;
complete_expression: product_of_units 
        | product_of_units division unit_expression 
        ;
product_of_units: unit_expression 
        | product_of_units product unit_expression 
        ;
unit_expression: term 
        // m(2) is m^2, not function application
        | STRING parenthesized_number 
        | function_application 
        | OPEN_P complete_expression CLOSE_P 
        ;
function_application: STRING OPEN_P complete_expression CLOSE_P ;
scalefactor: LIT10 power numeric_power 
        | LIT10 SIGNED_INTEGER 
        ;
division: DIVISION;
term: unit 
        | unit numeric_power 
        | unit power numeric_power 
        ;
unit: STRING 
        ;
power: CARET
        | STARSTAR
        ;
numeric_power: integer 
        | parenthesized_number 
        ;
parenthesized_number: OPEN_P integer CLOSE_P 
        | OPEN_P FLOAT CLOSE_P 
        | OPEN_P integer division UNSIGNED_INTEGER CLOSE_P 
        ;
integer: SIGNED_INTEGER | UNSIGNED_INTEGER;
product: WHITESPACE | STAR | DOT;
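
As an illustration of the FITS term rule, the following C sketch recognises a single unit with an optional exponent, covering forms such as m2, m-2, m^2, m**-2, m(2) and s^(1/2). It is a hypothetical hand-written recogniser, not the library's parser. It handles a whole string containing exactly one term and no whitespace, and it deliberately omits the OPEN_P FLOAT CLOSE_P form of parenthesized_number. The struct and function names are invented.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* A parsed FITS term: a unit name and a rational exponent num/den. */
struct term {
    char unit[32];
    long num, den;
};

/* integer: an optional sign followed by digits; advances *pp. */
static bool parse_int(const char **pp, long *out)
{
    const char *p = *pp;
    char *end;
    if (!(isdigit((unsigned char)p[0]) ||
          ((p[0] == '+' || p[0] == '-') && isdigit((unsigned char)p[1]))))
        return false;
    *out = strtol(p, &end, 10);
    *pp = end;
    return true;
}

/* numeric_power: integer | ( integer ) | ( integer / UNSIGNED_INTEGER ) */
static bool parse_numeric_power(const char **pp, long *num, long *den)
{
    const char *p = *pp;
    *den = 1;
    if (*p != '(') {
        if (!parse_int(&p, num))
            return false;
    } else {
        p++;
        if (!parse_int(&p, num))
            return false;
        if (*p == '/') {
            p++;
            if (!isdigit((unsigned char)*p) || !parse_int(&p, den) || *den == 0)
                return false;
        }
        if (*p != ')')
            return false;
        p++;
    }
    *pp = p;
    return true;
}

/* term: unit | unit numeric_power | unit power numeric_power */
static bool parse_fits_term(const char *s, struct term *t)
{
    size_t n = 0;
    while (isalpha((unsigned char)s[n]))      /* STRING: letters only */
        n++;
    if (n == 0 || n >= sizeof t->unit)
        return false;
    memcpy(t->unit, s, n);
    t->unit[n] = '\0';
    t->num = 1;
    t->den = 1;
    const char *p = s + n;
    if (*p == '\0')
        return true;                          /* bare unit */
    if (*p == '^')
        p += 1;                               /* CARET */
    else if (p[0] == '*' && p[1] == '*')
        p += 2;                               /* STARSTAR */
    if (!parse_numeric_power(&p, &t->num, &t->den))
        return false;
    return *p == '\0';
}
```

For example, "s^(1/2)" yields unit s with exponent 1/2, and "mm**-2" yields unit mm with exponent -2; deciding whether mm means milli-metre is a separate step.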

The OGIP grammar

input: complete_expression 
        | scalefactor complete_expression 
        | scalefactor WHITESPACE complete_expression 
        ;
complete_expression: product_of_units 
        ;
product_of_units: unit_expression
        | division unit_expression 
        | product_of_units product unit_expression 
        | product_of_units division unit_expression 
        ;
unit_expression: term 
        | function_application 
        | OPEN_P complete_expression CLOSE_P 
        ;
function_application: STRING OPEN_P complete_expression CLOSE_P ;
scalefactor: LIT10 power numeric_power 
        | LIT10 
        | FLOAT 
        ;
division: DIVISION | WHITESPACE DIVISION
        | WHITESPACE DIVISION WHITESPACE | DIVISION WHITESPACE;
term: unit 
        | unit power numeric_power 
        ;
unit: STRING 
        ;
power: STARSTAR;
numeric_power: UNSIGNED_INTEGER 
        | FLOAT 
        | parenthesized_number 
        ;
parenthesized_number: OPEN_P integer CLOSE_P 
        | OPEN_P FLOAT CLOSE_P 
        | OPEN_P integer division UNSIGNED_INTEGER CLOSE_P 
        ;
integer: SIGNED_INTEGER | UNSIGNED_INTEGER;
product: WHITESPACE | STAR | WHITESPACE STAR
       | WHITESPACE STAR WHITESPACE | STAR WHITESPACE;
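
The left-recursive rule product_of_units division unit_expression means that, in OGIP, each solidus applies only to the unit expression immediately after it, so erg/cm**2/s denotes ergs per square centimetre per second. The following C sketch flattens a parenthesis-free OGIP product into (unit, exponent) pairs to show this. It is hypothetical and deliberately limited: it supports only unsigned integer powers written with STARSTAR, so forms such as cm**(-2), FLOAT powers, scale factors and function applications are out of scope.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

struct factor {
    char unit[32];
    long power;
};

/* Skip whitespace; *SEEN records whether any was skipped. */
static const char *skip_ws(const char *p, int *seen)
{
    *seen = 0;
    while (isspace((unsigned char)*p)) {
        p++;
        *seen = 1;
    }
    return p;
}

/* Flatten e.g. "erg /cm**2 /s" into factors.  Each solidus negates only
   the next unit.  Returns the factor count, or -1 on a syntax error. */
static int ogip_flatten(const char *s, struct factor *out, int max)
{
    int n = 0, seen, sign = 1;
    const char *p = s;
    const char *q = skip_ws(p, &seen);

    if (*q == '/') {                      /* leading 'division unit_expression' */
        sign = -1;
        p = skip_ws(q + 1, &seen);
    }
    for (;;) {
        size_t len = 0;                   /* unit: STRING = [a-zA-Z]+ */
        while (isalpha((unsigned char)p[len]))
            len++;
        if (len == 0 || len >= sizeof out[0].unit || n == max)
            return -1;
        memcpy(out[n].unit, p, len);
        out[n].unit[len] = '\0';
        p += len;

        long power = 1;                   /* optional STARSTAR UNSIGNED_INTEGER */
        if (p[0] == '*' && p[1] == '*') {
            char *end;
            if (!isdigit((unsigned char)p[2]))
                return -1;
            power = strtol(p + 2, &end, 10);
            p = end;
        }
        out[n++].power = sign * power;
        if (*p == '\0')
            return n;

        p = skip_ws(p, &seen);            /* separator before the next unit */
        if (*p == '*') {
            sign = 1;
            p = skip_ws(p + 1, &seen);
        } else if (*p == '/') {
            sign = -1;                    /* applies to the next unit only */
            p = skip_ws(p + 1, &seen);
        } else if (seen) {
            sign = 1;                     /* bare WHITESPACE is a product */
        } else {
            return -1;
        }
    }
}
```

Whitespace on either side of an operator is accepted, as the OGIP product and division rules allow, but trailing whitespace or a trailing solidus is rejected.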

The CDS grammar

This is quite similar to the OGIP grammar, but with more restrictions.

The CDSFLOAT terminal is a string matching the regular expression [0-9]+\.[0-9]+x10[-+][0-9]+ (that is, something resembling 1.5x10+11). The terminals OPEN_SQ and CLOSE_SQ are opening and closing square brackets, [...].

input: complete_expression 
        | scalefactor complete_expression 
        ;
complete_expression: product_of_units 
        ;
product_of_units: unit_expression
        | division unit_expression 
        | product_of_units product unit_expression 
        | product_of_units division unit_expression 
        ;
unit_expression: term 
        | function_application 
        | OPEN_P complete_expression CLOSE_P 
        ;
function_application: OPEN_SQ complete_expression CLOSE_SQ 
        ;
scalefactor: LIT10 power numeric_power 
        | LIT10 SIGNED_INTEGER 
        | UNSIGNED_INTEGER 
        | LIT10 
        | CDSFLOAT 
        | FLOAT 
        ;
division: DIVISION;
term: unit 
        | unit numeric_power 
        ;
unit: STRING 
        | PERCENT 
        ;
power: STARSTAR;
numeric_power: integer 
        ;
integer: SIGNED_INTEGER | UNSIGNED_INTEGER;
product: DOT;
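
Because CDSFLOAT writes the power of ten with an explicit x10, it is not a C floating-point literal. A hedged C sketch of a matcher that also converts it to a double, by rewriting it in the 1.5e+11 form and handing that to strtod, might look like this. The function names are hypothetical.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Skip one or more digits; NULL if there are none. */
static const char *skip_digits(const char *p)
{
    const char *start = p;
    while (isdigit((unsigned char)*p))
        p++;
    return p == start ? NULL : p;
}

/* If S is exactly a CDSFLOAT, [0-9]+\.[0-9]+x10[-+][0-9]+, store its
   value in *VALUE and return true. */
static bool cdsfloat_value(const char *s, double *value)
{
    const char *p = skip_digits(s);              /* [0-9]+    */
    if (p == NULL || *p != '.')
        return false;
    p = skip_digits(p + 1);                      /* \.[0-9]+  */
    if (p == NULL || strncmp(p, "x10", 3) != 0)
        return false;
    const char *mant_end = p;
    p += 3;
    if (*p != '+' && *p != '-')                  /* [-+]      */
        return false;
    const char *exp_start = p;
    p = skip_digits(p + 1);                      /* [0-9]+    */
    if (p == NULL || *p != '\0')
        return false;

    /* Rewrite "1.5x10+11" as "1.5e+11" and let strtod convert it. */
    char buf[128];
    int n = snprintf(buf, sizeof buf, "%.*se%s",
                     (int)(mant_end - s), s, exp_start);
    if (n < 0 || (size_t)n >= sizeof buf)
        return false;
    *value = strtod(buf, NULL);
    return true;
}
```

Here cdsfloat_value("1.5x10+11", &v) stores 1.5e11 in v, while 1.5e+11 and 15x10+2 are both rejected, the latter because the pattern requires a decimal point.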

The VOUnits grammar

The VOUFLOAT and QUOTED_STRING features are extensions beyond the other grammars. These aside, this syntax is a strict subset of the FITS and CDS grammars (in the sense that any VOUnits string that avoids these extensions is a valid FITS and CDS string, too), and it is almost a subset of the OGIP grammar, except that it uses the dot for multiplication rather than the star.

The VOUFLOAT terminal is a string matching either of the regular expressions 0\.[0-9]+([eE][+-]?[0-9]+)? or [1-9][0-9]*(\.[0-9]+)?([eE][+-]?[0-9]+)? (that is, something resembling, for example, 0.123 or 1.5e+11). Also QUOTED_STRING is a STRING enclosed in single quotes '...'.

input: complete_expression 
        | scalefactor complete_expression 
        | LIT1 
        ;
complete_expression: product_of_units 
        | product_of_units division unit_expression 
        ;
product_of_units: unit_expression 
        | product_of_units product unit_expression 
        ;
unit_expression: term 
        | function_application 
        | OPEN_P complete_expression CLOSE_P 
        ;
function_application: STRING OPEN_P  function_operand CLOSE_P ;
function_operand: complete_expression 
        | scalefactor complete_expression ;
scalefactor: LIT10 power numeric_power 
        | LIT10 
        | LIT1 
        | VOUFLOAT 
        ;
division: DIVISION;
term: unit 
        | unit power numeric_power 
        ;
unit: STRING 
        | QUOTED_STRING 
        | STRING QUOTED_STRING 
        ;
power: STARSTAR;
numeric_power: integer 
        | parenthesized_number 
        ;
parenthesized_number: OPEN_P integer CLOSE_P 
        | OPEN_P FLOAT CLOSE_P 
        | OPEN_P integer division UNSIGNED_INTEGER CLOSE_P 
        ;
integer: SIGNED_INTEGER | UNSIGNED_INTEGER;
product: DOT;
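
The two alternatives in the VOUFLOAT pattern can be checked with a short hand-written matcher. The following C sketch, with a hypothetical function name, is intended to accept exactly the strings matched by 0\.[0-9]+([eE][+-]?[0-9]+)? or [1-9][0-9]*(\.[0-9]+)?([eE][+-]?[0-9]+)?, and nothing else.

```c
#include <ctype.h>
#include <stdbool.h>

static bool is_voufloat(const char *s)
{
    const char *p = s;
    if (p[0] == '0') {
        /* first alternative: 0\.[0-9]+ */
        if (p[1] != '.' || !isdigit((unsigned char)p[2]))
            return false;
        p += 2;
        while (isdigit((unsigned char)*p))
            p++;
    } else if (*p >= '1' && *p <= '9') {
        /* second alternative: [1-9][0-9]*(\.[0-9]+)? */
        p++;
        while (isdigit((unsigned char)*p))
            p++;
        if (*p == '.') {
            if (!isdigit((unsigned char)p[1]))
                return false;
            p++;
            while (isdigit((unsigned char)*p))
                p++;
        }
    } else {
        return false;
    }
    /* optional exponent, common to both: ([eE][+-]?[0-9]+)? */
    if (*p == 'e' || *p == 'E') {
        p++;
        if (*p == '+' || *p == '-')
            p++;
        if (!isdigit((unsigned char)*p))
            return false;
        while (isdigit((unsigned char)*p))
            p++;
    }
    return *p == '\0';
}
```

Under this pattern, 0.123, 42 and 1.5e+11 are VOUFLOATs, while 0, .5, 00.1 and -1.5 are not. Note that the string 1 matches this pattern as well as LIT1.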

The lists of known units

Below are the lists of known units, taken from Pence et al., the OGIP specification, and the CDS specification.

In the columns below, a 1 indicates that the unit is permitted, s indicates that SI prefixes are allowed, b that IEC binary prefixes are allowed, d that the unit is deprecated in some way, and p that the symbol is the preferred one in this syntax (where there is more than one symbol that maps to this unit). The CDS standard doesn’t indicate which units may or may not take SI prefixes: in the table below, we generally follow the FITS prescription, except where the CDS specification positively suggests otherwise. Where a syntax allows two abbreviations for a unit (e.g., FITS allows ‘pixel’ and ‘pix’), we prefer the one marked with a ‘p’.
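
Each cell in the table combines the flag letters just described. For anyone processing the table programmatically, a cell can be decoded as in this C sketch. The struct and function names are hypothetical, and an empty cell is taken to mean that the syntax does not list the unit.

```c
#include <stdbool.h>

struct unit_flags {
    bool permitted;   /* '1' */
    bool si;          /* 's': SI prefixes allowed         */
    bool binary;      /* 'b': IEC binary prefixes allowed */
    bool deprecated;  /* 'd' */
    bool preferred;   /* 'p' */
};

/* Decode one cell such as "1sbp" or "1d"; returns false on an
   unexpected character. */
static bool decode_flags(const char *cell, struct unit_flags *f)
{
    *f = (struct unit_flags){ false, false, false, false, false };
    for (const char *p = cell; *p != '\0'; p++) {
        switch (*p) {
        case '1': f->permitted = true;  break;
        case 's': f->si = true;         break;
        case 'b': f->binary = true;     break;
        case 'd': f->deprecated = true; break;
        case 'p': f->preferred = true;  break;
        default:  return false;
        }
    }
    return true;
}
```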

The lists of known units
unitmeaningFITSOGIPCDSVOUnits
%qudt:Percent1
Aqudt:Ampere1s1s1s1s
aunity:JulianYear1ps1s1s
aduunity:ADU11s
Angstromqudt:Angstrom1d11dp
angstromqudt:Angstrom11d
arcminqudt:ArcMinute1111s
arcsecqudt:ArcSecond111s1s
AUqudt:AstronomicalUnit1111p
auqudt:AstronomicalUnit1
Baunity:BesselianYear1d
barnqudt:Barn1sd11s1sd
beamunity:Beam11s
binunity:DistributionBin111s
bitqudt:Bit1s1s1sb
bytequdt:Byte1s11s1sbp
Bqudt:Byte1sb
Cqudt:Coulomb1s1s1s1s
cdqudt:Candela1s1s1s1s
chanunity:DetectorChannel111s
countqudt:Number111sp
Crabunity:Crab1s
ctqudt:Number111s
cyunity:JulianCentury1
dqudt:Day1111s
dBqudt:Decibel1
Dqudt:Debye111s
degqudt:DegreeAngle1111s
ergqudt:Erg1d11sd
eVqudt:ElectronVolt1s1s1s1s
Fqudt:Farad1s1s1s1s
gqudt:Gram1s1s1s1s
Gqudt:Gauss1sd11sd
Hqudt:Henry1s1s1s1s
hqudt:Hour1111s
Hzqudt:Hertz1s1s1s1s
Jqudt:Joule1s1s1s1s
Jyunity:Jansky1s1s1s1s
Kqudt:Kelvin1s1s1s1s
lmqudt:Lumen1s1s1s1s
lxqudt:Lux1s1s1s1s
lyrqudt:LightYear111s
mqudt:Meter1s1s1s1s
magunity:StellarMagnitude1s11s1s
masunity:MilliArcSecond111
minqudt:MinuteTime1111s
molqudt:Mole1s1s1s1s
Nqudt:Newton1s1s1s1s
Ohmqudt:Ohm1s1s1s
ohmqudt:Ohm1s
Paqudt:Pascal1s1s1s1s
pcqudt:Parsec1s1s1s1s
phunity:Photon11s
photonunity:Photon1p11sp
pixunity:Pixel111s
pixelunity:Pixel1p11sp
Runity:Rayleigh1s1s
radqudt:Radian1s1s1s1s
Ryunity:Rydberg11s1s
squdt:SecondTime1s1s1s1s
Squdt:Siemens1s1s1s1s
solLumunity:SolarLuminosity111s
solMassunity:SolarMass111s
solRadunity:SolarRadius111s
srqudt:Steradian1s1s1s1s
Tqudt:Tesla1s1s1s1s
taqudt:YearTropical1d
uqudt:UnifiedAtomicMassUnit11s
Vqudt:Volt1s1s1s1s
voxelunity:Voxel111s
Wqudt:Watt1s1s1s1s
Wbqudt:Weber1s1s1s1s
yrunity:JulianYear1s11sp1sp