canSAS-III
Collective Action for Nomadic Small-Angle Scatterers
Grenoble 17th - 19th May 2001
This short note proposes file formats to generalise access to treated SAS data:
- for storing and treating simple 1D distributions
- for storing and browsing 2D and complex multi-parameter data
One aim of the forthcoming canSAS-III meeting will be to establish template routines/macros demonstrating reading these data into various software packages.
Simplification: will this help most people most of the time?
The information to be gained from SAS experiments is complementary to a wide range of studies from Metallurgy, through Condensed Matter Physics and Soft Matter Chemistry, to Biology; the software tools used within the SAS community are therefore unlikely ever to converge to a specific subset.
At canSAS-I, 1998, we identified two classes of experiments: those essentially producing reduced one dimensional radial scattering distributions, and a more complex case leading to two dimensional patterns, or multiple scans of 1D patterns as a function of temperature and/or other constraints. In both cases data reduction requires merging components such as detector calibration, backgrounds etc.
In some fields of measurement the analysis procedures have been highly developed, notably solution scattering, and we welcome the contribution from Dmitri Svergun and Marc Malfois, following the first canSAS meeting, in proposing the sasCIF archive format, which allows all instrument parameters and component data to be placed systematically in a form closely following the CIF standards.
For the more general case we present here a pragmatic simplification for interchanging data. If data values of Intensity and Q, and their error deviations, are made available, preferably calculated using facility programs specific to each instrument, then the data format can be greatly simplified. There is, in principle, nothing more to be added! We note that the BSL format has been used extensively, and that this primarily contains blocks of intensity values only. We do, however, wish to encourage annotation, especially quoting units, and inclusion of the history of treatment: in practice we now accept that these fields should be included as comments, rather than trying to establish stylised formal standards.
Simple one dimensional distributions, I(Q) versus Q
We recognise there is a large variety of tools used to add, subtract and divide the measured distributions. To offer ease of access for the widest use it was decided that data files should be in text form, with one pattern per file, typically in a multi-column form. One possible example of such a work-file is shown below:
#canSAS: headerlength = 42 lines
TIT:teflon instrument tests
INS:ILL SANS D11
KEY: 22100 102 19 1 38 34
LAY: 10 0 32 0 3 3
HIS:i2c 15-Jan-2001 17:00:59
PAR:theta0 = 0.0000 ! Theta-0 Detector offset angle
PAR:x0 = 32.0000 ! X0 cms Beam centre
PAR:y0 = 31.5000 ! Y0 cms Beam centre
PAR:dr = 2.0000 ! Delta-R cms regrouping step
PAR:sd = 10.0000 ! SD m Sample-detector distance
PAR:wave = 12.0000 ! Angstroms incident wavelength
PAR:collim = 8.0000 ! m collimation distance
PAR:conc = 1.0000 ! concentration
PAR:isum = 66. ! ISUM central window sum
PAR:mon = 100000. ! flux monitor counts
PAR:secwid = 180.0000 ! degrees detector sector width
PAR:secori = 0.0000 ! degrees sector orientation
PAR:dwave = 10.0000 ! % wavelength spread
PAR:sourx = 15.0000 ! mm source slit width x
PAR:soury = 30.0000 ! mm source slit height y
PAR:sampx = 13.0000 ! mm sample width x
PAR:sampy = 0.0000 ! mm sample height y
PAR:pixelx = 10.0000 ! mm detector x pixel size
PAR:pixely = 10.0000 ! mm detector y pixel size
PAR:samang = 0.0000 ! degrees sample normal/beam
PAR:tempk = 298.0000 ! K sample temperature
PAR:trans = 0.9000 ! sample transmission
PAR:thick = 1.0000 ! mm sample thickness
PAR:time = 34.9000 ! counting time secs
PAR:res1 = 0.0000 ! reserved
PAR:res2 = 0.0000 ! reserved
PAR:res3 = 0.0000 ! reserved
PAR:res4 = 0.0000 ! reserved
PAR:res5 = 0.0000 ! reserved
PAR:res6 = 0.0000 ! reserved
PAR:res7 = 0.0000 ! reserved
PAR:res8 = 0.0000 ! reserved
PDF: 19 0 0 0 0 0 0 6
PDF: 0.100000E+01 0.100000E+04 0.000000E+00 0.100000E+01 0.120000E+01
PDF: 0.000000E+00 0.000000E+00 0.000000E+00 0.000000E+00 0.000000E+00
#canSAS: Q A-1 I(Q) errI(Q) errQ ....end of header
2.617993E-04 3.700000E+01 4.301163E+00 2.243613E-06
1.062462E-03 6.412500E+01 1.634587E+00 9.105271E-06
2.107973E-03 1.410135E+03 5.207492E+00 1.806527E-05
3.167636E-03 1.752197E+03 4.801586E+00 2.714656E-05
4.189463E-03 7.581771E+02 2.810281E+00 3.590359E-05
5.226389E-03 3.350000E+02 1.617772E+00 4.479002E-05
6.271116E-03 1.678649E+02 1.064999E+00 5.374330E-05
7.300317E-03 9.413372E+01 7.397899E-01 6.256352E-05
8.360283E-03 5.821635E+01 5.290428E-01 7.164741E-05
9.419649E-03 3.818750E+01 4.128921E-01 8.072614E-05
1.045564E-02 2.666532E+01 3.279046E-01 8.960456E-05
1.149618E-02 1.941304E+01 2.652115E-01 9.852196E-05
1.255376E-02 1.486688E+01 2.197023E-01 1.075854E-04
1.360724E-02 1.204012E+01 1.927716E-01 1.166137E-04
1.466810E-02 1.026648E+01 1.679423E-01 1.257052E-04
1.572930E-02 8.808511E+00 1.530585E-01 1.347997E-04
1.672592E-02 7.862857E+00 1.498843E-01 1.433407E-04
1.775654E-02 7.079167E+00 1.717455E-01 1.521731E-04
1.857273E-02 6.913043E+00 2.741200E-01 1.591678E-04
One sees here data derived from a standard ILL file, where the original header data has been encapsulated by the canSAS lines. Most of the parameters were included for use in modelling the resolution in Q, and are redundant if this is given in tabular form. The minimal form of this is presented below:
#canSAS: headerlength = 2 lines
#canSAS: Q A-1 I(Q) errI(Q) errQ ....end of header
2.617993E-04 3.700000E+01 4.301163E+00 2.243613E-06
1.062462E-03 6.412500E+01 1.634587E+00 9.105271E-06
2.107973E-03 1.410135E+03 5.207492E+00 1.806527E-05
3.167636E-03 1.752197E+03 4.801586E+00 2.714656E-05
4.189463E-03 7.581771E+02 2.810281E+00 3.590359E-05
5.226389E-03 3.350000E+02 1.617772E+00 4.479002E-05
6.271116E-03 1.678649E+02 1.064999E+00 5.374330E-05
7.300317E-03 9.413372E+01 7.397899E-01 6.256352E-05
8.360283E-03 5.821635E+01 5.290428E-01 7.164741E-05
9.419649E-03 3.818750E+01 4.128921E-01 8.072614E-05
1.045564E-02 2.666532E+01 3.279046E-01 8.960456E-05
1.149618E-02 1.941304E+01 2.652115E-01 9.852196E-05
1.255376E-02 1.486688E+01 2.197023E-01 1.075854E-04
1.360724E-02 1.204012E+01 1.927716E-01 1.166137E-04
1.466810E-02 1.026648E+01 1.679423E-01 1.257052E-04
1.572930E-02 8.808511E+00 1.530585E-01 1.347997E-04
1.672592E-02 7.862857E+00 1.498843E-01 1.433407E-04
1.775654E-02 7.079167E+00 1.717455E-01 1.521731E-04
1.857273E-02 6.913043E+00 2.741200E-01 1.591678E-04
Feedback to the ILL shows that stripping off the header to introduce column data into spreadsheets or other calculation tools poses few problems. The primary element missing in the most simplistic form is the title line, though the ILL uses styled filenames with simple measurement sequence numbers and version numbers to identify data. This has the advantage of requiring little imagination, and is easily adapted for treating swathes of data in a semi-automatic fashion. Few instrument parameters really need to be extracted from these comment fields for further simple treatment. A more complete history, comments and specified units, as shown in the first example, are clearly of use to sporadic SAS experimenters.
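Where a parameter does need to be recovered from the comment fields, a few lines of script suffice. The following is a minimal sketch only, assuming the "PAR:name = value ! comment" layout seen in the ILL-derived example; the function name read_header_parameters and the returned dictionary layout are illustrative choices, not part of any canSAS definition, and other facilities may annotate their headers differently.

```python
import re

# Assumed layout, taken from the ILL-derived example header:
#   PAR:name = value ! free-text comment
PAR_LINE = re.compile(r"^PAR:\s*(\w+)\s*=\s*(\S+)\s*(?:!\s*(.*))?$")


def read_header_parameters(lines):
    """Collect PAR: entries as {name: (value, comment)}.

    Lines that are not PAR: entries, or whose value is not numeric,
    are silently skipped, since the header is only a set of comments.
    """
    params = {}
    for line in lines:
        match = PAR_LINE.match(line.strip())
        if match is None:
            continue
        name, raw_value, comment = match.groups()
        try:
            value = float(raw_value)
        except ValueError:
            continue
        params[name] = (value, (comment or "").strip())
    return params


# A few header lines copied from the first example.
header = [
    "TIT:teflon instrument tests",
    "PAR:sd = 10.0000 ! SD m Sample-detector distance",
    "PAR:wave = 12.0000 ! Angstroms incident wavelength",
    "PAR:isum = 66. ! ISUM central window sum",
]
params = read_header_parameters(header)
print(params["wave"][0], params["sd"][1])
```

Keeping the comment text alongside each value preserves the units, which in this layout are only stated in the free-text part of the line.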
Inclusion of the deviations must be encouraged, though inevitably allowance will have to be made for those who only include I(Q) and Q columns. It seems futile in these cases to insist that two columns of zero are added representing the unknown errors.
Starting the line with # allows the canSAS markers to be included in a simple sasCIF file which contains only a single table at the end. Such lines are traditionally ignored as comments.
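As an illustration of how little is needed to read such a file, here is a minimal sketch of a reader. It assumes that the headerlength value counts every header line including the first #canSAS marker line (as it does in both examples), that the columns appear in the order Q, I(Q), errI(Q), errQ, and that missing deviation columns are simply absent. The function name read_cansas_text and the column keys are hypothetical; missing deviations are returned as None rather than as columns of zeros.

```python
import re

# First line of the file, e.g. "#canSAS: headerlength = 2 lines"
HEADER_LENGTH = re.compile(r"#canSAS:\s*headerlength\s*=\s*(\d+)")

# Assumed column order, following the "#canSAS: Q A-1 I(Q) errI(Q) errQ" line.
COLUMN_NAMES = ("q", "i", "err_i", "err_q")


def read_cansas_text(text):
    """Return {column name: list of values} for one 1D pattern."""
    lines = text.splitlines()
    match = HEADER_LENGTH.match(lines[0]) if lines else None
    if match is None:
        raise ValueError("first line does not declare a canSAS headerlength")
    n_header = int(match.group(1))

    columns = {name: [] for name in COLUMN_NAMES}
    for line in lines[n_header:]:
        fields = line.split()
        if not fields or line.startswith("#"):
            continue  # blank lines and stray comment lines carry no data
        values = [float(field) for field in fields]
        for k, name in enumerate(COLUMN_NAMES):
            # Files with only Q and I(Q) leave the deviations unknown.
            columns[name].append(values[k] if k < len(values) else None)
    return columns


sample = """#canSAS: headerlength = 2 lines
#canSAS: Q A-1 I(Q) errI(Q) errQ ....end of header
2.617993E-04 3.700000E+01 4.301163E+00 2.243613E-06
1.062462E-03 6.412500E+01 1.634587E+00 9.105271E-06
"""
two_column = """#canSAS: headerlength = 2 lines
#canSAS: Q A-1 I(Q) ....end of header
2.617993E-04 3.700000E+01
1.062462E-03 6.412500E+01
"""
data = read_cansas_text(sample)
sparse = read_cansas_text(two_column)
print(len(data["q"]), sparse["err_i"])
```

Because the header is skipped by line count rather than by inspecting its contents, the same routine reads the 42-line annotated form and the 2-line minimal form alike.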
Complex Results I(Qx,Qy), I(Q,...)
To cope with larger volumes of data, which may extend to gigabytes in the case of kinetic experiments using a two dimensional detector, it becomes attractive to consider using existing file handling software, especially where this is now available with free graphical browsers. The Hierarchical Data Format (HDF) developed by NCSA offers features including computer system independence, free browsers based on Java, and automatic data compression. Access to HDF reading and writing functions is now included in many treatment packages, e.g. IDL and IGOR, and the software is in the public domain.
Our proposal is to use this type of file as a container for the data, without the constraints of pre-defining variable names or internal structure. The availability today of efficient browsers renders this unnecessary. It is possible to include in such a file not only the intensity values, but also sections containing text describing the measurements, and blocks of other parameters; some browsers also facilitate the work of adding additional data components directly. While recommending this simplistic use of HDF does not resolve the problems of actually treating the data, it does provide a commonly accepted mechanism for storing and handling data, with a first level graphical representation.
While the lack of formally defined component names may seem counter-productive, the HDF container file can satisfy almost all basic requirements, rather than presumptuously imposing constraints based on past experience. Nothing prevents groups working on similar measurements from convening to define useful common data layouts and component names. There is some direct stimulus to learn about using the HDF tools, either within treatment packages, or with the library and standard compilers. There remains finally the general possibility of using browsers to extract data as text files to introduce into non-HDF capable programs.
The use of HDF has long been proposed by the NeXus group; the high aims of this activity are to stylise data to an extent that data treatment from scattering spectrometers at different sites may be unified and shared. Establishing acceptable definitions of parameter names and their relationships with measurement parameters in a fashion commonly acceptable to the different instruments and facilities remains an incompletely solved problem. We expect feedback from NeXus will aid convergence on HDF file layout and internal names for the SAS community.
We note again the simplifications which can be made if only treated data are exchanged; it is no longer necessary to try to map together all names and functions of instrument raw data parameters. In both NeXus and sasCIF it is difficult to obtain consensus on nomenclature, especially since the user community spans such a wide range of instruments, experimental procedures and subject jargon.
The recently completed DUBBLE beamline at ESRF offers raw data in BSL/OTOKO format; however, it is assumed that the majority of users will take away results in partially treated form, in one of several commonly used formats which the control and reduction suite can provide for direct use subsequently in the home laboratory. This takes full advantage of dropping instrument details of little relevance in the final analysis.
Last modified: 10th March 2001 (R.E. Ghosh)
