Proposal for a simple XML data format

Simple regrouped data file for SAS data

R. Ghosh (ILL), S. King (ISIS), A. Rennie (Uppsala University), November 2006


Meetings held earlier in the year, with members from the ILL, ISIS and DIAMOND, led to an attempt to produce a simplified data format that would be easy to use with spreadsheet programs. The starting point was the recently added ability of Excel 2003 to read XML files automatically, placing the data directly in columns.

The outcome of these meetings included a number of recommendations:

  • Four-column data should be offered, giving Q, intensity, intensity error and an estimate of the Q resolution, point by point.
  • Instrument parameters could optionally be included, using nomenclature and style derived from the NeXus dictionary.

Component names

Given that items have clear tag names, it seems reasonable to allow some flexibility in the choice of, for example, units. This leads to a short list of recommended names, with alternatives. The following is a working list rather than a definitive specification:

Idatum
  I_counts        I_cm-1
  Idev_counts     Idev_cm-1
  Q_A-1           Q_nm-1
  Qdev_A-1        Qdev_nm-1
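Because either name may appear for the same quantity, a reading program has to accept both. As an illustration only, here is a short Python sketch that pulls Q or Qdev out of one Idatum element and returns it in Å⁻¹, converting from nm⁻¹ when needed (1 nm⁻¹ = 0.1 Å⁻¹, since 1 nm = 10 Å). The function and table names are hypothetical, and the sketch assumes the left-hand column above holds the Å-based names.

```python
import xml.etree.ElementTree as ET

# Hypothetical lookup: for each Q-like quantity, the Angstrom-based tag and
# its nm-based alternative, as given in the working list of names.
Q_TAGS = {
    "Q": ("Q_A-1", "Q_nm-1"),
    "Qdev": ("Qdev_A-1", "Qdev_nm-1"),
}
NM_INV_TO_A_INV = 0.1  # 1 nm^-1 = 0.1 A^-1, because 1 nm = 10 A


def q_in_inverse_angstrom(idatum: ET.Element, name: str) -> float:
    """Return Q or Qdev from one Idatum element, expressed in A^-1."""
    # Index the children by tag name; each tag is expected at most once.
    values = {child.tag: child.text for child in idatum}
    tag_a, tag_nm = Q_TAGS[name]
    if tag_a in values:
        return float(values[tag_a])
    if tag_nm in values:
        return float(values[tag_nm]) * NM_INV_TO_A_INV
    raise KeyError(f"Idatum has neither {tag_a} nor {tag_nm}")


# Usage: one data point written with the nm-based alternatives.
point = ET.fromstring(
    "<Idatum><Q_nm-1>0.5</Q_nm-1><I_cm-1>12.3</I_cm-1>"
    "<Idev_cm-1>0.4</Idev_cm-1><Qdev_nm-1>0.02</Qdev_nm-1></Idatum>"
)
print(q_in_inverse_angstrom(point, "Q"))  # 0.05
```

Intensities are deliberately left alone: I_counts and I_cm-1 differ by an absolute-intensity calibration, not by a fixed unit factor, so no conversion between them is attempted.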

Title
Source_file
Flux_monitor
Count_time_secs

SASsample
  sample_temperature
  sample_offset_angle_deg
  sample_x_mm
  sample_y_mm
  sample_thickness_mm
  sample_transmission

SASinstrument

  SASsource
    radiation
    beam_x_mm
    beam_y_mm
    wavelength_A         wavelength_nm
    wavelength_spread

  SAScollimator
    distance_coll_m
  SASdetector
    offset_angle_deg
    x0_cm
    y0_cm
    distance_SD_m
    pixel_x_mm
    pixel_y_mm
 
SASprocess
  date
  radial_step_cm
  sector_width_deg
  sector_orient_deg
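To show how the hierarchy above might be assembled, here is a Python sketch using the standard xml.etree.ElementTree module. The root element name SASdata, the placement of the Idatum records directly under the root, and the choice of which optional items to fill in are all assumptions for illustration; the layout of the actual files may differ.

```python
import xml.etree.ElementTree as ET


def build_sas_file(title, points, thickness_mm, wavelength_A):
    """Assemble a minimal file; points is a list of (Q, I, Idev, Qdev) tuples."""
    root = ET.Element("SASdata")  # hypothetical root element name
    ET.SubElement(root, "Title").text = title

    # Optional metadata, nested as in the component list.
    sample = ET.SubElement(root, "SASsample")
    ET.SubElement(sample, "sample_thickness_mm").text = f"{thickness_mm:g}"
    instrument = ET.SubElement(root, "SASinstrument")
    source = ET.SubElement(instrument, "SASsource")
    ET.SubElement(source, "wavelength_A").text = f"{wavelength_A:g}"

    # One Idatum per point, here using A^-1 for Q and cm^-1 for intensity.
    tags = ("Q_A-1", "I_cm-1", "Idev_cm-1", "Qdev_A-1")
    for point in points:
        datum = ET.SubElement(root, "Idatum")
        for tag, value in zip(tags, point):
            ET.SubElement(datum, tag).text = f"{value:g}"
    return root


root = build_sas_file("test run", [(0.01, 12.3, 0.4, 0.001)], 1.0, 6.0)
print(ET.tostring(root, encoding="unicode"))
```

Keeping the metadata in named containers (SASsample, SASinstrument) means a minimal file can simply omit them without changing how the Idatum records are found.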

Practical implementation

Rather than build a complete data-processing system, a pragmatic decision was taken to keep producing the standard ILL data files in parallel with the new XML files (an environment variable set by the user controls whether the XML files are also produced). The XML files were then tested with Excel and KaleidaGraph, and both read them successfully; reading a data file into Excel proved quite straightforward. To make take-up easier for other display programs, the table values are separated by white space and the XML tags are packed closely together. A program that splits each line on white space therefore reads 9 columns, with Q, I, Idev and Qdev in columns 2, 4, 6 and 8 respectively.
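The packed layout can be illustrated with a short Python sketch. It writes one Idatum record with each value set off by single spaces and the tags run together, then reads the line back two ways: as nine whitespace-separated columns, as a spreadsheet would, and with an XML parser. The exact spacing, the tag choice and the function names are assumptions, not taken from the ILL files.

```python
import xml.etree.ElementTree as ET

# Assumed tag choice for Q, I, Idev and Qdev, in that order.
TAGS = ("Q_A-1", "I_cm-1", "Idev_cm-1", "Qdev_A-1")


def idatum_line(q, i, idev, qdev):
    """Write one point: tags packed together, each value padded by spaces."""
    line = "<Idatum>"
    for tag, value in zip(TAGS, (q, i, idev, qdev)):
        line += f"<{tag}> {value:g} </{tag}>"
    return line + "</Idatum>"


def values_from_columns(line):
    """Read Q, I, Idev, Qdev from columns 2, 4, 6 and 8 of a whitespace split."""
    cols = line.split()
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, got {len(cols)}")
    return tuple(float(cols[k - 1]) for k in (2, 4, 6, 8))


line = idatum_line(0.01, 12.3, 0.4, 0.001)
print(line.split())               # the 9 "columns" a spreadsheet would see
print(values_from_columns(line))  # (0.01, 12.3, 0.4, 0.001)

# An XML parser reads the same line; float() ignores the padding spaces.
print(tuple(float(child.text) for child in ET.fromstring(line)))
```

Padding the values rather than the tags is what keeps each number in its own column: odd columns hold runs of tags, even columns hold numbers, and to an XML reader the extra spaces are ordinary character data that disappear when the text is converted to a number.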

Two example files are available: a minimal one, xg009436_001.xml, and a more comprehensive one, xg009436_001f.xml.

A small set of Fortran-callable routines has been developed to navigate and extract data using a robust xmlparser. Examples of a simple test extractor and a data converter are provided. A generic plotting program, xmlplo, reads XML files from a variety of sources and superimposes the results. A zip file may be downloaded.
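As a rough analogue of the data converter, written in Python rather than Fortran and not the authors' code, the sketch below collects every Idatum element in a file and writes a plain four-column table, with a header line recording which unit alternative each column uses. The function name and the SASdata root in the usage example are hypothetical. Keying each tag on the text before its first underscore lets it accept either set of names, and it refuses files whose points mix units.

```python
import xml.etree.ElementTree as ET

ORDER = ("Q", "I", "Idev", "Qdev")  # column order of the output table


def xml_to_columns(xml_text: str) -> str:
    """Convert the Idatum records of one file to a plain four-column table."""
    root = ET.fromstring(xml_text)
    header = None
    rows = []
    for datum in root.iter("Idatum"):
        # Key each child by the quantity name before the first "_", so that
        # "Q_A-1" and "Q_nm-1" both map to "Q"; keep the full tag for the header.
        by_name = {c.tag.split("_", 1)[0]: (c.tag, float(c.text)) for c in datum}
        missing = [n for n in ORDER if n not in by_name]
        if missing:
            raise ValueError(f"Idatum is missing {missing}")
        tags = tuple(by_name[n][0] for n in ORDER)
        if header is None:
            header = tags
        elif tags != header:
            raise ValueError(f"units change within the file: {tags} vs {header}")
        rows.append(" ".join(f"{by_name[n][1]:.6g}" for n in ORDER))
    if header is None:
        raise ValueError("no Idatum elements found")
    return "\n".join(["# " + " ".join(header)] + rows)


sample = """<SASdata>
  <Title>example</Title>
  <Idatum><Q_A-1> 0.01 </Q_A-1><I_cm-1> 12.3 </I_cm-1><Idev_cm-1> 0.4 </Idev_cm-1><Qdev_A-1> 0.001 </Qdev_A-1></Idatum>
  <Idatum><Q_A-1> 0.02 </Q_A-1><I_cm-1> 8.1 </I_cm-1><Idev_cm-1> 0.3 </Idev_cm-1><Qdev_A-1> 0.001 </Qdev_A-1></Idatum>
</SASdata>"""
print(xml_to_columns(sample))
```

Recording the tags in the header rather than converting the values keeps the converter simple and leaves any unit conversion to the program that later superimposes data from different sources.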