Fortran-callable navigator and extractor for XML data files

R. Ghosh, ILL, July 2007


These routines, written in C++, use a GPL XML parser written by Frank Vanden Berghen and wrap it for Fortran to perform basic navigation and extraction of selected data fields in XML files. This allows, for example, all instances of a data item to be extracted from an XML file without knowing details of the internal structure, merely the node name.

Further development will include the ability to start the selection from a chosen branch in the file, making it easy to navigate through similarly named segments (nodes) and to return attributes.

The basic routines are:

rxmlopen File opening routine

      numnod=rxmlopen(filename)
numnod integer return is the number of nodes found
filename character (in) filename (no trailing blanks)

mgetf Floating point number extractor from named node(s); cf. igetf, the Fortran version

      nfound=mgetf(ff,lim,nodname)
nfound integer number of reals found
ff reals returned
lim (in) integer array limit
nodname (in) character node name (no trailing blanks)

rgetc text field extractor from named node(s)

      nfound = rgetc(buf,iblen,nfield,nth,name) 
nfound number of characters returned
buf character buffer
iblen integer buffer length
nfield integer number of text field in node (starts at 0)
nth integer (in) nth occurrence of this node name (starts at 1)
name character (in) node name (no trailing blanks)

rnodeinfo node information retrieval (becomes "current_node")

      ifo=rnodeinfo(ntot,nsnod,natt,ntext,nth,nname) 
ifo integer <0 no info/node absent
ntot integer total number of such named nodes
nsnod integer number of sub-nodes
natt integer number of attributes
ntext integer number of text fields
nth integer (in) results for nth node (start at 1)
nname character (in) node name

curatts attributes (name,value) of "current_node"

ifo integer <0 not found or no "current_node"
attnam character name
lenn integer length of string returned for name
attval character string value of attribute
lenv integer length of string returned for value

The open routine rxmlopen calls Frank Vanden Berghen's xmlParser.cpp, which parses the whole file and finds all nodes. The Fortran-callable routine then traverses the parsed names and builds a local list of node names, which can then be scanned efficiently by the extractor routines. This allows all instances of a variable, for example, to be obtained in one pass. The igetf Fortran routine is less efficient since the search is restarted for each item. The present version of rxml.cpp will accommodate up to 10000 node names.

The simple test example below shows how to get values from a file:

      program frxml
c***** tests reading xml files...
      character ccc*100,fname*50,nname*20,attnam*60,attval*60
      integer rxmlopen,rgetf,rnodeinfo
      real xx(500)
      write(6,1)
 1    format(' Give filename ',$)
      read(5,*) fname
      ll=index(fname,' ')-1
      numnam=0
c***** C program MUST reconstruct a zero terminated name
c      the string length passed does not include any termination
c      in memory - and strings are concatenated in Fortran storage.
      numnam = rxmlopen(fname(1:ll))
      write(6,2) numnam
 2    format(i5,' names found')
      lim=500
      write(6,6) 'to find reals (library) : '
 6    format(' Give NAME ',a,$)
      read(5,*) nname
      ll=index(nname,' ')-1
      nfound=mgetf(xx,lim,nname(1:ll))
      write(6,*) 'Found reals ',nfound
      write(6,*) (xx(i),i=1,nfound)
      write(6,6) 'to find reals (fortran) : '
      read(5,*) nname
      ll=index(nname,' ')-1
      lim=500
      nfound=igetf(xx,lim,nname(1:ll))
      write(6,*) 'Found reals ',nfound
      write(6,*) (xx(i),i=1,nfound)
      lim=500
      write(6,6) 'info for : '
      read(5,*) nname
      nth=1
c***** nth instance
      ll=index(nname,' ')-1
      ifo=rnodeinfo(ntot,nsnod,natt,ntext,nth,nname(1:ll))
      if(ifo.lt.0) stop 'no info'
      write(6,10)ifo,ntot,nsnod,natt,ntext
 10   format(i10,' ifo'/
     1 i10,' ntot'/
     1 i10,' nsnod'/
     1 i10,' natt'/
     1 i10,' ntext')
      if(natt.gt.0) then
         do i=1,natt
            ii=i-1
            ifo=myatts(ii,attnam,lenn,attval,lenv)
            if(ifo.ge.0) write(6,*) ' Attribute "',attnam(1:lenn),
     1'" Value :',attval(1:lenv)
         end do
      end if
      end

      integer function igetf(ff,lim,name)
      character *(*) name
      character*1024 buf
      real ff(lim)
      integer iblen,ierr
      integer rnodeinfo,rgetc
      igetf=-1111
      iblen=1024
      limm=lim
c***** only read first field (starts 0)
      nfield=0
      nth=1

      ifo=rnodeinfo(ntot,nsnod,natt,ntext,nth,name)
      if(ifo.lt.0) return
      if(ntot.gt.lim) ntot=lim
      ierr=0
      do i=1,ntot
         ff(i)=0.
         nth=i
         nfound = rgetc(buf,iblen,nfield,nth,name) +1
c        c-call has two extra string length arguments..
c        note differences always with MSVC / gcc
c        MSVC has length immediately after string;
c        g77 has lengths added at end
         if(nfound.le.0) then
            ierr=ierr+1
         else
c           write(6,*) nfound,buf(1:nfound)
            read(buf(1:nfound),*) ff(i)
         end if
      end do
      igetf=ntot
      if(ierr.gt.0) igetf=-ierr
      return
      end
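The comments in the listings note that the C side must rebuild a zero-terminated name, because Fortran passes only the address of the characters plus a separate length. A minimal sketch of that step is given here, assuming the g77-style convention in which the hidden length is appended after the declared arguments. The names from_fortran and demo_open_ are hypothetical and are not the entry points of rxml.cpp, and the int type of the hidden length is also an assumption (it differs between compilers and versions).

```cpp
#include <cstddef>
#include <string>

// Copy a Fortran character argument into a zero-terminated std::string.
// Fortran stores no '\0'; only the first `len` characters belong to the
// argument. Trailing blanks are stripped as a convenience (an assumption:
// the documented routines expect names without trailing blanks).
std::string from_fortran(const char* s, std::size_t len) {
    while (len > 0 && s[len - 1] == ' ') --len;
    return std::string(s, len);
}

// Hypothetical g77-style entry point: lower-case name with a trailing
// underscore, hidden string length passed by value after the other arguments.
extern "C" int demo_open_(const char* fname, int fname_len) {
    std::string name = from_fortran(fname, static_cast<std::size_t>(fname_len));
    // A real wrapper would pass name.c_str() to the parser here.
    return static_cast<int>(name.size());
}
```

With the MSVC-style convention mentioned in the comments, the length would instead follow its string immediately in the argument list, so the C prototype has to match the Fortran compiler in use.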

Either the internal floating point routine (mgetf) or the text routine (rgetc, as used by igetf above) can be used to select specific variables. The first is more efficient since the search through the nodes is made in a single pass.
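The difference in cost between the two approaches can be illustrated with a small sketch, assuming the node names are held in a simple list. The function names here are hypothetical and do not correspond to routines in rxml.cpp.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Index of the nth (1-based) entry equal to `name`, or -1 if absent.
// Every call starts again from the beginning of the list.
int find_nth(const std::vector<std::string>& names,
             const std::string& name, int nth) {
    int seen = 0;
    for (std::size_t i = 0; i < names.size(); ++i)
        if (names[i] == name && ++seen == nth) return static_cast<int>(i);
    return -1;
}

// Text-routine style: one restarted search per occurrence.
std::vector<int> collect_by_restart(const std::vector<std::string>& names,
                                    const std::string& name) {
    std::vector<int> hits;
    for (int nth = 1;; ++nth) {
        int i = find_nth(names, name, nth);
        if (i < 0) break;
        hits.push_back(i);
    }
    return hits;
}

// Library style: all occurrences gathered in a single scan.
std::vector<int> collect_single_pass(const std::vector<std::string>& names,
                                     const std::string& name) {
    std::vector<int> hits;
    for (std::size_t i = 0; i < names.size(); ++i)
        if (names[i] == name) hits.push_back(static_cast<int>(i));
    return hits;
}
```

Both functions return the same indices; the restarted version rescans the front of the list for every occurrence, so its cost grows with the number of matches times the list length.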

RESULTS

> rxml
Give filename sk1.xml
736 names found
Give NAME to find reals (library) : I_cm-1
Found reals 140
34.3499985 18.9500008 10.8900003 6.78000021 4.52099991 3.19700003
2.35700011 1.83099997 1.46399999 1.227 1.03499997 0.910099983
0.796899974 0.721300006 0.66170001 0.605400026 0.57980001 0.555700004
0.534600019 0.513199985 0.49970001 0.477100015 0.469199985 0.465600014
:
:
0.435099989 0.454400003 0.450599998 0.444400012 0.485199988 0.508000016
0.518999994 0.524500012
Give NAME to find reals (fortran) : Q_A-1
Found reals 140
0.00700000022 0.00899999961 0.0109999999 0.0130000003 0.0149999997
0.0170000009 0.0189999994 0.0209999997 0.023 0.0250000004 0.0270000007
0.0289999992 0.0309999995 0.0329999998 0.0350000001 0.0370000005
0.0390000008 0.0410000011 0.0430000015 0.0450000018 0.0469999984
:
:
0.268999994 0.270999998 0.273000002 0.275000006 0.27700001 0.279000014
0.280999988 0.282999992 0.284999996
Give NAME info for : SASprocess
0 ifo
1 ntot
10 nsnod
1 natt
0 ntext
Attribute "name" Value :COLETTE


A more complete example shows the conversion of an xml file to standard ILL SANS data.

The routines, updates and examples will be posted shortly on the ILL ftp server.