Unity
Parser for unit strings

The Unity package provides a parser for unit strings.
This is the unity parser (version 1.1, released 2023 October 22), a C library to help parse unit specification strings such as W.mm**2. There is also an associated Java class library which uses the same grammars. For more details, see the library's home page and repository.
As well as parsing various unit strings, the library can also serialise a parsed expression in various formats: the four formats that it can parse, a LaTeX version named latex (which uses the siunitx package), and a debug format which lists the parsed unit in an unambiguous, but not otherwise useful, form.
You can parse units using several different syntaxes since, unfortunately, there is no general consensus on which syntax the world should agree on. The four supported syntaxes are FITS, OGIP, CDS and VOUnits.
Each of these has an associated writer, which allows you to write a parsed UnitExpression to a string, in a format which should conform to the particular syntax's standard. See unity_write_formatted.
In addition, there is a latex writer, which produces a formatted form of the expression, suitable for inclusion in a LaTeX document, using the siunitx package. To use the resulting output in a LaTeX document, include the following in the preamble of the file:
    \usepackage{siunitx}
    \DeclareSIQualifier\solar{$\odot$}
You may add any siunitx options that seem convenient, and you may omit the declaration of \solar if the units in the document do not include the various solar ones.
The parsing is permissive, to the extent that it permits unrecognised and deprecated units. The result of the parse may be checked for conformance with one or other standard using the functions unity_check_unit and unity_check_expression. Note that SI prefixes are still noticed for unrecognised units: thus furlongs/fortnight will be parsed as femtourlongs per femtoortnight. The same is not true of recognised units: a pixel/s is a pixel per second, and does not involve a picoixel.
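The prefix rule described above can be illustrated with a minimal sketch. It is not the library's implementation: the function name split_prefix, the short list of known units and the list of prefix letters are all assumptions made for this example, and only single-letter prefixes are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative subset of recognised unit symbols (not the library's table). */
static const char *const known_units[] = { "m", "s", "g", "pixel", "erg", NULL };

/* Illustrative subset of single-letter SI prefixes. */
static const char si_prefixes[] = "fpnumkMG";

static int is_known(const char *sym)
{
    for (size_t i = 0; known_units[i] != NULL; i++) {
        if (strcmp(sym, known_units[i]) == 0) return 1;
    }
    return 0;
}

/*
 * Split `sym` into an optional SI prefix and a base unit.
 * Returns the prefix character ('\0' if none) and points *base at the
 * start of the base unit within `sym`.
 */
char split_prefix(const char *sym, const char **base)
{
    assert(sym != NULL && base != NULL);
    *base = sym;
    if (is_known(sym)) return '\0';      /* recognised as a whole: "pixel" */
    if (sym[0] != '\0' && sym[1] != '\0'
        && strchr(si_prefixes, sym[0]) != NULL) {
        *base = sym + 1;                 /* "furlongs" -> 'f' + "urlongs" */
        return sym[0];
    }
    return '\0';
}
```

With these assumed tables, "pixel" comes back whole, while "furlongs" is split into the prefix 'f' and the base "urlongs".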
The various unit syntaxes have different sets of ‘known units’, namely units, and unit abbreviations, which the syntax blesses as recommended, or at least acknowledged. The list of known units in the various syntaxes is below.
Much of the unit information, such as unit names and dimensions, is derived from the QUDT unit ontology which (from its self-description) was ‘developed for the NASA Exploration Initiatives Ontology Models (NExIOM) project, a Constellation Program initiative at the AMES Research Center (ARC).’
If you want to experiment with the library, build the program src/c/unity (in the distribution):
    % ./unity -icds -oogip 'mm2/s'
    mm**2 /s
    % ./unity -icds -ofits -v mm/s
    mm s-1
    check: all units recognised? yes
    check: all units recommended? yes
    check: all units satisfy constraints? yes
    % ./unity -ifits -ocds -v 'merg/s'
    merg/s
    check: all units recognised? yes
    check: all units recommended? no
    check: all units satisfy constraints? no
    % ./unity -icds -ofits -v 'merg/s'
    merg s-1
    check: all units recognised? no
    check: all units recommended? no
    check: all units satisfy constraints? yes
In the latter cases, the -v option validates the input string against various constraints. The expression mm/s is completely valid in all the syntaxes. In the FITS syntax, the erg is a recognised unit, but it is deprecated; although it is recognised, it is not permitted to have SI prefixes. In the CDS syntax, the erg is neither recognised nor (a fortiori) recommended; since there are no constraints on it in this syntax, it satisfies all of them (this latter behaviour is admittedly slightly counter-intuitive).
The four supported grammars have a fair amount in common, but the differences are nonetheless significant enough that they require separate grammars. Important differences are in the number of solidi they allow in the units specifications, and the symbols they use for products and powers.
Note that in each of the grammars here, the empty string is not a valid units string – in particular, it is not taken to indicate a dimensionless quantity. If a particular context wishes to interpret such a string as indicating a dimensionless quantity, or perhaps instead indicating ‘units unknown’, then it should handle that case separately. To obtain what would be the result of such a parse, use the function unity_get_dimensionless(void). The VOUnits syntax, though it also deems the empty string to be invalid, recognises the string "1" as indicating a dimensionless quantity.
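A caller that wants to treat the empty string specially might screen the input before handing it to the parser, along the following lines. This is only a sketch: the enum and the function classify_unit_string are hypothetical names, and no library functions are called here because their signatures are not given in this section.

```c
#include <assert.h>
#include <string.h>

typedef enum {
    UNITS_EMPTY,          /* "": invalid in every grammar; the caller decides */
    UNITS_DIMENSIONLESS,  /* "1" under the VOUnits syntax */
    UNITS_NEEDS_PARSE     /* anything else: hand the string to the parser */
} UnitStringKind;

/* Classify a unit string before parsing; is_vounits is non-zero for VOUnits. */
UnitStringKind classify_unit_string(const char *s, int is_vounits)
{
    assert(s != NULL);
    if (s[0] == '\0') return UNITS_EMPTY;
    if (is_vounits && strcmp(s, "1") == 0) return UNITS_DIMENSIONLESS;
    return UNITS_NEEDS_PARSE;
}
```

For UNITS_EMPTY, the application would then choose between 'dimensionless' (for example via unity_get_dimensionless) and 'units unknown', according to its own context.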
Current limitations:
In the grammars below, the common terminals are as follows:
FLOAT: [+-]?[1-9][0-9]*\.[0-9]+, so that there are no exponents allowed
SIGNED_INTEGER, UNSIGNED_INTEGER: the signed integers have a non-optional leading sign, the unsigned don't
STRING: [a-zA-Z]+
There are some other terminals used in some grammars. See the VOUnits specification for further details.
input: complete_expression
    | scalefactor complete_expression
    | scalefactor WHITESPACE complete_expression
    | division unit_expression
    ;
complete_expression: product_of_units
    | product_of_units division unit_expression
    ;
product_of_units: unit_expression
    | product_of_units product unit_expression
    ;
unit_expression: term
    // m(2) is m^2, not function application
    | STRING parenthesized_number
    | function_application
    | OPEN_P complete_expression CLOSE_P
    ;
function_application: STRING OPEN_P complete_expression CLOSE_P ;
scalefactor: LIT10 power numeric_power
    | LIT10 SIGNED_INTEGER
    ;
division: DIVISION;
term: unit
    | unit numeric_power
    | unit power numeric_power
    ;
unit: STRING ;
power: CARET
    | STARSTAR
    ;
numeric_power: integer
    | parenthesized_number
    ;
parenthesized_number: OPEN_P integer CLOSE_P
    | OPEN_P FLOAT CLOSE_P
    | OPEN_P integer division UNSIGNED_INTEGER CLOSE_P
    ;
integer: SIGNED_INTEGER | UNSIGNED_INTEGER;
product: WHITESPACE | STAR | DOT;
input: complete_expression
    | scalefactor complete_expression
    | scalefactor WHITESPACE complete_expression
    ;
complete_expression: product_of_units ;
product_of_units: unit_expression
    | division unit_expression
    | product_of_units product unit_expression
    | product_of_units division unit_expression
    ;
unit_expression: term
    | function_application
    | OPEN_P complete_expression CLOSE_P
    ;
function_application: STRING OPEN_P complete_expression CLOSE_P ;
scalefactor: LIT10 power numeric_power
    | LIT10
    | FLOAT
    ;
division: DIVISION
    | WHITESPACE DIVISION
    | WHITESPACE DIVISION WHITESPACE
    | DIVISION WHITESPACE
    ;
term: unit
    | unit power numeric_power
    ;
unit: STRING ;
power: STARSTAR;
numeric_power: UNSIGNED_INTEGER
    | FLOAT
    | parenthesized_number
    ;
parenthesized_number: OPEN_P integer CLOSE_P
    | OPEN_P FLOAT CLOSE_P
    | OPEN_P integer division UNSIGNED_INTEGER CLOSE_P
    ;
integer: SIGNED_INTEGER | UNSIGNED_INTEGER;
product: WHITESPACE
    | STAR
    | WHITESPACE STAR
    | WHITESPACE STAR WHITESPACE
    | STAR WHITESPACE
    ;
The CDS grammar is quite similar to the OGIP grammar, but with more restrictions.
The CDSFLOAT terminal is a string matching the regular expression [0-9]+\.[0-9]+x10[+-][0-9]+ (that is, something resembling 1.5x10+11). The terminals OPEN_SQ and CLOSE_SQ are opening and closing square brackets [...].
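As an illustration of the CDSFLOAT form, the sketch below checks a string against that pattern by hand and, if it matches, converts it to a double. The function name cdsfloat_to_double is an assumption made for this example and is not part of the library; the sign before the exponent is taken to be mandatory, as in the pattern above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Advance past one or more ASCII digits; return NULL if there are none. */
static const char *skip_digits(const char *p)
{
    if (!isdigit((unsigned char)*p)) return NULL;
    while (isdigit((unsigned char)*p)) p++;
    return p;
}

/*
 * Convert a CDSFLOAT-like string such as "1.5x10+11" to a double.
 * Returns 1 and stores the value in *out on success, 0 otherwise.
 */
int cdsfloat_to_double(const char *s, double *out)
{
    char buf[64];
    const char *p, *x;
    int n;
    assert(s != NULL && out != NULL);

    if ((p = skip_digits(s)) == NULL || *p != '.') return 0;  /* digits, dot */
    if ((x = skip_digits(p + 1)) == NULL) return 0;           /* digits */
    if (strncmp(x, "x10", 3) != 0) return 0;                  /* literal x10 */
    p = x + 3;
    if (*p != '+' && *p != '-') return 0;                     /* mandatory sign */
    if ((p = skip_digits(p + 1)) == NULL || *p != '\0') return 0;

    /* Rewrite "<mantissa>x10<exp>" as "<mantissa>e<exp>" for strtod. */
    n = snprintf(buf, sizeof buf, "%.*se%s", (int)(x - s), s, x + 3);
    if (n < 0 || n >= (int)sizeof buf) return 0;
    *out = strtod(buf, NULL);
    return 1;
}
```

The conversion relies on strtod in the default "C" locale; strings such as 1.5e+11 or 1.5x1011 are rejected because they do not have the x10 marker followed by a sign.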
input: complete_expression
    | scalefactor complete_expression
    ;
complete_expression: product_of_units ;
product_of_units: unit_expression
    | division unit_expression
    | product_of_units product unit_expression
    | product_of_units division unit_expression
    ;
unit_expression: term
    | function_application
    | OPEN_P complete_expression CLOSE_P
    ;
function_application: OPEN_SQ complete_expression CLOSE_SQ ;
scalefactor: LIT10 power numeric_power
    | LIT10 SIGNED_INTEGER
    | UNSIGNED_INTEGER
    | LIT10
    | CDSFLOAT
    | FLOAT
    ;
division: DIVISION;
term: unit
    | unit numeric_power
    ;
unit: STRING
    | PERCENT
    ;
power: STARSTAR;
numeric_power: integer ;
integer: SIGNED_INTEGER | UNSIGNED_INTEGER;
product: DOT;
The VOUFLOAT and QUOTED_STRING features are extensions beyond the other grammars. These aside, this syntax is a strict subset of the FITS and CDS grammars (in the sense that any VOUnits unit string, without these extensions, is a valid FITS and CDS string too), and it is almost a subset of the OGIP grammar, except that it uses the dot for multiplication rather than the star.
The VOUFLOAT terminal is a string matching either of the regular expressions 0\.[0-9]+([eE][+-]?[0-9]+)? or [1-9][0-9]*(\.[0-9]+)?([eE][+-]?[0-9]+)? (that is, something resembling, for example, 0.123 or 1.5e+11). Also QUOTED_STRING is a STRING enclosed in single quotes '...'.
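The two VOUFLOAT alternatives can be tried out with the POSIX regex interface, as in the sketch below. It assumes a POSIX system providing regex.h, and the function name is_voufloat is hypothetical; the library's own lexer need not work this way.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/* The two VOUFLOAT alternatives, anchored so that the whole string must match. */
static const char VOUFLOAT_PATTERN[] =
    "^(0\\.[0-9]+([eE][+-]?[0-9]+)?"
    "|[1-9][0-9]*(\\.[0-9]+)?([eE][+-]?[0-9]+)?)$";

/* Return 1 if `s` matches VOUFLOAT, 0 if not, -1 if the pattern fails to compile. */
int is_voufloat(const char *s)
{
    regex_t re;
    int rc;
    assert(s != NULL);
    if (regcomp(&re, VOUFLOAT_PATTERN, REG_EXTENDED | REG_NOSUB) != 0) {
        return -1;
    }
    rc = regexec(&re, s, 0, NULL, 0);
    regfree(&re);
    return rc == 0 ? 1 : 0;
}
```

Both 0.123 and 1.5e+11 are accepted, while forms with a missing or redundant leading zero, such as .5 or 01.5, are rejected.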
input: complete_expression
    | scalefactor complete_expression
    | LIT1
    ;
complete_expression: product_of_units
    | product_of_units division unit_expression
    ;
product_of_units: unit_expression
    | product_of_units product unit_expression
    ;
unit_expression: term
    | function_application
    | OPEN_P complete_expression CLOSE_P
    ;
function_application: STRING OPEN_P function_operand CLOSE_P ;
function_operand: complete_expression
    | scalefactor complete_expression
    ;
scalefactor: LIT10 power numeric_power
    | LIT10
    | LIT1
    | VOUFLOAT
    ;
division: DIVISION;
term: unit
    | unit power numeric_power
    ;
unit: STRING
    | QUOTED_STRING
    | STRING QUOTED_STRING
    | PERCENT
    ;
power: STARSTAR;
numeric_power: integer
    | parenthesized_number
    ;
parenthesized_number: OPEN_P integer CLOSE_P
    | OPEN_P FLOAT CLOSE_P
    | OPEN_P integer division UNSIGNED_INTEGER CLOSE_P
    ;
integer: SIGNED_INTEGER | UNSIGNED_INTEGER;
product: DOT;
Below are the lists of known units, taken from Pence et al, the OGIP specification, and the CDS specification.
In the columns below, a 1 indicates that the unit is permitted, s indicates that SI prefixes are allowed, b that IEC binary prefixes are allowed, d that the unit is deprecated in some way, and p that the symbol is the preferred one in this syntax (where more than one symbol maps to this unit). The CDS standard doesn’t indicate which units may or may not take SI prefixes: in the table below, we generally follow the FITS prescription, except where the CDS specification positively suggests otherwise. Where there are two possible abbreviations for a unit in a syntax (e.g. FITS allows ‘pixel’ and ‘pix’), we prefer the one marked with a ‘p’.
unit  meaning  FITS  OGIP  CDS  VOUnits 
%  qudt:Percent  1  1  
A  qudt:Ampere  1s  1s  1s  1s 
a  unity:JulianYear  1ps  1s  1s  
adu  unity:ADU  1  1s  
Angstrom  qudt:Angstrom  1d  1  1dp  
angstrom  qudt:Angstrom  1  1d  
arcmin  qudt:ArcMinute  1  1  1  1s 
arcsec  qudt:ArcSecond  1  1  1s  1s 
AU  qudt:AstronomicalUnit  1  1  1  1p 
au  qudt:AstronomicalUnit  1  
Ba  unity:BesselianYear  1d  1d  
barn  qudt:Barn  1sd  1  1s  1sd 
beam  unity:Beam  1  1s  
bin  unity:DistributionBin  1  1  1s  
bit  qudt:Bit  1s  1s  1sb  
byte  qudt:Byte  1s  1  1s  1sbp 
B  qudt:Byte  1sb  
C  qudt:Coulomb  1s  1s  1s  1s 
cd  qudt:Candela  1s  1s  1s  1s 
chan  unity:DetectorChannel  1  1  1s  
count  qudt:Number  1  1  1sp  
Crab  unity:Crab  1s  
ct  qudt:Number  1  1  1s  
cy  unity:JulianCentury  1  
d  qudt:Day  1  1  1  1s 
dB  qudt:Decibel  1  
D  qudt:Debye  1  1  1s  
deg  qudt:DegreeAngle  1  1  1  1s 
erg  qudt:Erg  1d  1  1sd  
eV  qudt:ElectronVolt  1s  1s  1s  1s 
F  qudt:Farad  1s  1s  1s  1s 
g  qudt:Gram  1s  1s  1s  1s 
G  qudt:Gauss  1sd  1  1sd  
H  qudt:Henry  1s  1s  1s  1s 
h  qudt:Hour  1  1  1  1s 
Hz  qudt:Hertz  1s  1s  1s  1s 
J  qudt:Joule  1s  1s  1s  1s 
Jy  unity:Jansky  1s  1s  1s  1s 
K  qudt:Kelvin  1s  1s  1s  1s 
lm  qudt:Lumen  1s  1s  1s  1s 
lx  qudt:Lux  1s  1s  1s  1s 
lyr  qudt:LightYear  1  1  1s  
m  qudt:Meter  1s  1s  1s  1s 
mag  unity:StellarMagnitude  1s  1  1s  1s 
mas  unity:MilliArcSecond  1  1  1  
min  qudt:MinuteTime  1  1  1  1s 
mol  qudt:Mole  1s  1s  1s  1s 
N  qudt:Newton  1s  1s  1s  1s 
Ohm  qudt:Ohm  1s  1s  1s  
ohm  qudt:Ohm  1s  
Pa  qudt:Pascal  1s  1s  1s  1s 
pc  qudt:Parsec  1s  1s  1s  1s 
ph  unity:Photon  1  1s  
photon  unity:Photon  1p  1  1sp  
pix  unity:Pixel  1  1  1s  
pixel  unity:Pixel  1p  1  1sp  
R  unity:Rayleigh  1s  1s  
rad  qudt:Radian  1s  1s  1s  1s 
Ry  unity:Rydberg  1  1s  1s  
s  qudt:SecondTime  1s  1s  1s  1s 
S  qudt:Siemens  1s  1s  1s  1s 
solLum  unity:SolarLuminosity  1  1  1s  
solMass  unity:SolarMass  1  1  1s  
solRad  unity:SolarRadius  1  1  1s  
sr  qudt:Steradian  1s  1s  1s  1s 
T  qudt:Tesla  1s  1s  1s  1s 
ta  qudt:YearTropical  1d  1d  
u  qudt:UnifiedAtomicMassUnit  1  1s  
V  qudt:Volt  1s  1s  1s  1s 
voxel  unity:Voxel  1  1  1s  
W  qudt:Watt  1s  1s  1s  1s 
Wb  qudt:Weber  1s  1s  1s  1s 
yr  unity:JulianYear  1s  1  1sp  1sp 
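The flag strings in the table cells (such as 1s or 1sbp) can be decoded mechanically. The following sketch shows one way to do so; the UnitFlags type and the decode_flags function are hypothetical names for this illustration and do not describe how the library stores this information.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* One table cell, decoded according to the legend above the table. */
typedef struct {
    bool permitted;        /* '1': the unit is permitted in this syntax */
    bool si_prefixes;      /* 's': SI prefixes are allowed */
    bool binary_prefixes;  /* 'b': IEC binary prefixes are allowed */
    bool deprecated;       /* 'd': the unit is deprecated in some way */
    bool preferred;        /* 'p': the preferred symbol for this unit */
} UnitFlags;

/* Decode a cell such as "1sd"; an empty cell means the unit is not listed. */
UnitFlags decode_flags(const char *cell)
{
    UnitFlags f;
    assert(cell != NULL);
    f.permitted       = strchr(cell, '1') != NULL;
    f.si_prefixes     = strchr(cell, 's') != NULL;
    f.binary_prefixes = strchr(cell, 'b') != NULL;
    f.deprecated      = strchr(cell, 'd') != NULL;
    f.preferred       = strchr(cell, 'p') != NULL;
    return f;
}
```

For example, the cell 1sbp decodes to a permitted, preferred symbol that accepts both SI and binary prefixes, and an empty cell decodes to all flags false.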