Data Formats

Small-angle scattering (SAS) has been used as a diagnostic and a metric in so many fields of study for so long that a veritable plethora of data formats now exists.

This may not be an issue for the occasional user with their own lab-based light or X-ray scattering apparatus, but for the global SAS community, moving from facility to facility and switching between X-rays and neutrons, it is a big problem. Precious time can be wasted tediously converting files from one format into another, and in the process data may be corrupted or detail lost. Indeed, it has not been unknown for the dimensions of the data to become confused, resulting in erroneous scientific conclusions.

Small-angle measurements can be loosely divided into four categories: fibre diffraction, studies of defects in single crystals, systems under an applied stress/strain, and studies of solutions or colloidal systems. For the first three it is useful to analyse the whole SAS pattern; for the last a 'radial average' suffices. Data from the first three categories are referred to as 2D data, those from the last as 1D data.
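The reduction of a 2D pattern to 1D data by radial averaging can be sketched as follows. This is only an illustration, assuming a square pixel grid, a known beam-centre position and equal-width rings; the function name and its parameters are hypothetical, and real reduction software would also apply masks, detector corrections and a conversion from pixel radius to scattering vector.

```python
import math


def radial_average(image, centre, n_bins, max_radius):
    """Average a 2D intensity grid over concentric rings around `centre`.

    image      : list of rows, each a list of floats, indexed image[y][x]
    centre     : (cx, cy) beam-centre position in pixel units (assumed known)
    n_bins     : number of equal-width radial bins
    max_radius : outer radius in pixels; pixels at or beyond it are ignored
    Returns a list of (bin-centre radius, mean intensity), with None as the
    mean for any ring that contains no pixels.
    """
    cx, cy = centre
    width = max_radius / n_bins
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            r = math.hypot(x - cx, y - cy)
            b = int(r / width)  # index of the ring this pixel falls into
            if b < n_bins:
                sums[b] += value
                counts[b] += 1
    return [
        ((b + 0.5) * width, sums[b] / counts[b] if counts[b] else None)
        for b in range(n_bins)
    ]


# Synthetic 5x5 'detector image' whose intensity depends only on the
# distance from the central pixel (an isotropic pattern).
image = [[float((x - 2) ** 2 + (y - 2) ** 2) for x in range(5)] for y in range(5)]
profile = radial_average(image, centre=(2, 2), n_bins=3, max_radius=3.0)
for radius, mean in profile:
    print(radius, mean)
```

The 25 numbers of the 2D grid collapse to three (radius, intensity) pairs, which is why 1D data files are so much smaller than their 2D counterparts.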

A distinction is also made between 'raw' (or 'untreated') data - that recorded by the detector without any processing or proper correction - and 'reduced' (or 'treated') data.

Data may be stored in binary form, or as ASCII text (what you are reading now), with or without formatting (e.g. neat columns of numbers or some other inherent structure).
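The difference between the two storage forms can be illustrated with a few made-up data points. The column widths and the binary layout used here (interleaved little-endian 32-bit floats) are assumptions chosen for the sketch and do not correspond to any of the formats discussed on this page.

```python
import struct

# Three invented (Q, I) points, purely for illustration.
q_values = [0.01, 0.02, 0.03]
intensities = [1250.0, 830.5, 412.25]
pairs = list(zip(q_values, intensities))

# ASCII, formatted: fixed-width columns that a person can read directly.
ascii_text = "".join(f"{q:10.4f}{i:12.3f}\n" for q, i in pairs)

# Binary, unformatted: the same six numbers packed as little-endian
# 32-bit floats (Q, I, Q, I, ...). Compact, but meaningless to a reader
# who does not already know this layout.
binary_blob = struct.pack("<6f", *[v for pair in pairs for v in pair])

print(ascii_text)
print(len(ascii_text), "bytes as text;", len(binary_blob), "bytes as binary")

# Reading either form back needs knowledge of how it was written.
from_text = [tuple(float(field) for field in line.split())
             for line in ascii_text.splitlines()]
from_binary = struct.unpack("<6f", binary_blob)
```

The text version can be inspected in any editor, whereas the binary version is smaller but can only be decoded by a program that knows the number type, byte order and ordering of the values.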

Four data formats deserve particular mention:

  • BSL - 1D or 2D, raw or reduced data; binary, unformatted
  • sasCIF - intended as a 1D archive format for reduced data; ASCII, formatted
  • NeXus - currently aimed at 1D or 2D raw data; binary, HDF formatted
  • SASXML - 1D reduced data; ASCII, formatted; 2D version in development

The origins of BSL lie in the early days of synchrotron X-ray scattering when computers were rare, memory was limited and disk space more so. It is thus a very compact format, but one that is unintelligible except to programs specifically written to use it. It also does not carry any metadata. However, it is the primary data interchange format used by the ex-CCP13 analysis programs.

A major contribution by Marc Malfois and Dmitri Svergun in the late 1990s was to assemble a CIF dictionary for SAS (sasCIF) closely following macromolecular mmCIF practice. CIF - the Crystallographic Information File - is an archive format developed under the auspices of the International Union of Crystallography (IUCr).

NeXus is an international initiative to provide a modern common data format for all X-ray and neutron scattering experiments (and not just SAS). It is based on the principles of HDF (Hierarchical Data Format), developed by the US National Center for Supercomputing Applications, and allows the capture and storage of vast quantities of associated metadata in addition to the data. By 2010 raw data from most of the leading synchrotron, neutron and muon facilities will likely be output in NeXus format. NeXus/HDF-compliant applications are required to read NeXus files.

SASXML is a format devised by the CanSAS network of SAS scientists. It uses XML to 'wrap up' reduced SAS data in a manner that is editable (in the simplest of text editors), human-readable (when printed or typed to a screen), and machine-treatable (by a wide range of third-party applications). In effect SASXML 'standardises' the information that most facilities were already providing to their users anyway, but with the added advantage that structured metadata can also be incorporated. The Microsoft Corporation also endorse XML.
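The idea of 'wrapping up' reduced data in XML can be sketched with Python's standard library. The element and attribute names below (SASroot, SASentry, Title, SASdata, Idata, Q, I, Idev, unit) are illustrative assumptions only and have not been checked against the published SASXML schema; the two helper functions are likewise hypothetical.

```python
import xml.etree.ElementTree as ET


def wrap_reduced_data(title, points, q_unit="1/A", i_unit="1/cm"):
    """Wrap (Q, I, uncertainty) triples and a title in an XML document.

    All element names here are assumed for illustration, not taken from
    the real schema.
    """
    root = ET.Element("SASroot")
    entry = ET.SubElement(root, "SASentry")
    ET.SubElement(entry, "Title").text = title  # a piece of metadata
    data = ET.SubElement(entry, "SASdata")
    for q, i, di in points:
        row = ET.SubElement(data, "Idata")
        ET.SubElement(row, "Q", unit=q_unit).text = repr(q)
        ET.SubElement(row, "I", unit=i_unit).text = repr(i)
        ET.SubElement(row, "Idev", unit=i_unit).text = repr(di)
    return ET.tostring(root, encoding="unicode")


def read_reduced_data(xml_text):
    """Recover the title and the data triples from the XML text."""
    root = ET.fromstring(xml_text)
    title = root.findtext("SASentry/Title")
    points = [
        (float(row.findtext("Q")),
         float(row.findtext("I")),
         float(row.findtext("Idev")))
        for row in root.iter("Idata")
    ]
    return title, points


# Invented data points, purely for illustration.
example_points = [(0.01, 1250.0, 12.0), (0.02, 830.5, 9.5)]
xml_text = wrap_reduced_data("Example sample", example_points)
print(xml_text)
print(read_reduced_data(xml_text))
```

Because every value is carried inside a named tag, the same text can be read by a person, edited by hand, and parsed by any general-purpose XML library without format-specific code.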


Documents