06/13/2009
File Format documentation for CDAAC.
CDAAC data files are documented in a specialized XML format. A Perl script
converts files in this format to HTML for display on the web. Much of the
information stored in the XML files is already available elsewhere in the
CDAAC system, so a second Perl script adds or updates that information in
the XML files.
The resulting HTML pages are reached from the web through an image map (called
pub.png) that shows how file types are organized in the /pub directory
hierarchy.
The /ops/tools/www/cdaac/fileFormats/ directory contains several file types:
XML files: filetype.xml (e.g. atmPrf.xml). Each file contains the XML
description of one CDAAC data file type. These XML files can have the following tags:
- data_format
The root element for all files. Contains the 'name' attribute, plus
other attributes that describe the file naming convention.
- description
Contains a text/html description of the data file.
- global_data
For netCDF files, introduces the global or attribute section.
- profile_data
For netCDF files, introduces the vector or profile data section.
- ncfield
For netCDF files. Contained within the global_data or profile_data sections.
Has several attributes: desc, name, valid_range, type, missing_value, unit.
- part
For binary (such as binex) files. Introduces one section of the binary file.
Has attributes name and desc.
- binfield
For binary files. Analogous to the ncfield tag. Contains several
attributes: desc, name, type, vals, size.
- multiple
For binary files. Introduces a repeated section of the file. Contains
attributes num and name.
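
As an illustration of how the netCDF tags above nest, here is a minimal Python
sketch that reads a documentation file and lists its ncfield entries by section.
The sample XML, including its field names, units, and values, is hypothetical
and only follows the tag and attribute names listed above; it is not taken from
a real CDAAC file, and the CDAAC tools themselves are Perl programs.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; field names, units, and values are illustrative only.
SAMPLE = """\
<data_format name="foobar">
  <description>The 'foobar' file contains foo data in the bar format.</description>
  <global_data>
    <ncfield name="year" type="int" desc="Year of the measurement"/>
  </global_data>
  <profile_data>
    <ncfield name="foo" type="double" unit="K" desc="Foo profile"
             valid_range="0,400" missing_value="-999"/>
  </profile_data>
</data_format>
"""


def list_ncfields(xml_text):
    """Return (section, attributes) pairs for every ncfield in the document."""
    root = ET.fromstring(xml_text)
    fields = []
    for section in ("global_data", "profile_data"):
        for parent in root.iter(section):
            for field in parent.findall("ncfield"):
                fields.append((section, dict(field.attrib)))
    return fields


fields = list_ncfields(SAMPLE)
for section, attrs in fields:
    # Attributes other than name may be absent, so fall back to "-".
    print(section, attrs["name"], attrs.get("unit", "-"))
```

The same approach would apply to the binary-file tags (part, binfield,
multiple), with part and multiple playing the role of the section elements.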
These XML files are processed by Perl programs that are also contained in this
directory, along with a supporting tar file:
- extractNetcdfDoc.pl
This program is an XML-to-XML filter that takes a basic XML file and adds
information taken from the netCDF file it documents (for example, atmPrf.xml
documents files like atmPrf_CHAM.2001.139.00.10.G05_0031.0002_nc) and from the
PubFile database. The added information includes naming convention info for the
data_format tag and netCDF field info for the ncfield tags.
- updateMap.pl
This program generates the image and image map (pub.png and pub.map) that show
the sample CDAAC /pub hierarchy.
- pub.tar
This file contains a sample /pub hierarchy which shows how the various file
types are organized in the /pub directory. This file is used by the program updateMap.pl to
generate the image and image map.
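
To make the idea of an XML-to-XML filter concrete, the following Python sketch
fills in missing ncfield attributes from a metadata table. It is not the code of
extractNetcdfDoc.pl. The NETCDF_INFO dictionary is a hypothetical stand-in for
what would be read from a netCDF file, the function names are invented for this
example, and the rule that the file type is the text before the first underscore
is an assumption inferred from the atmPrf example above. Leaving existing
attributes untouched is a choice made for this sketch; the real script may
overwrite them.

```python
import xml.etree.ElementTree as ET


def doc_name_for(data_file):
    """Map a data file name to the XML file documenting its type.

    Assumes the file type is the text before the first underscore.
    """
    return data_file.split("_", 1)[0] + ".xml"


# Stand-in for metadata read from a netCDF file; values are hypothetical.
NETCDF_INFO = {
    "foo": {"type": "double", "unit": "K", "desc": "Foo profile"},
}


def fill_ncfields(xml_text, info):
    """Return xml_text with missing ncfield attributes filled in from info."""
    root = ET.fromstring(xml_text)
    for field in root.iter("ncfield"):
        for key, value in info.get(field.get("name"), {}).items():
            # Sketch-only policy: keep attributes that are already present.
            if field.get(key) is None:
                field.set(key, value)
    return ET.tostring(root, encoding="unicode")


basic = '<data_format><ncfield name="foo" desc="Hand-written"/></data_format>'
updated = fill_ncfields(basic, NETCDF_INFO)

print(doc_name_for("atmPrf_CHAM.2001.139.00.10.G05_0031.0002_nc"))
print(updated)
```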
How to add and update data file documentation for CDAAC.
To add documentation for a new file type (say data type foobar), do the following (assuming that the file type already exists in PubFile and there is an example in the /pub area):
- First create a foobar.xml file like this:
<data_format>
<description>
The 'foobar' file contains foo data in the bar format.
More text...
</description>
<ncfield name="foo"/>
</data_format>
For netCDF file types, each ncfield tag you specify causes documentation
for that field to be extracted from the netCDF file and added to
the XML.
- Then run extractNetcdfDoc.pl:
cd /ops/tools/www/cdaac/fileFormats
./extractNetcdfDoc.pl foobar_2003.290.00.34_nc fmission
This will add naming convention info and (if it's a netCDF file) documentation
info about netCDF fields.
- Now make any final edits to foobar.xml. The fileFormats.cgi script, already located in the main web area, will take care of dynamically rendering HTML content from the XML file.
- If you want to add the foobar file type to the pub.png image map, first
untar the pub.tar file:
tar -xvf pub.tar
This will extract a directory ./pub in the fileFormats directory.
- Now add a foobar directory in this hierarchy:
cd ./pub/champ/level1b # (say)
mkdir foobar
cd /ops/tools/www/cdaac/fileFormats
- Now, create a new pub.tar file:
mv pub.tar pub.tar.bak
tar -cvf - pub > pub.tar
- Finally, update the image and image map:
./updateMap.pl
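
The untar, mkdir, and re-tar steps above can also be scripted. The Python sketch
below mirrors those shell commands with the standard tarfile module and runs
against a stand-in pub.tar in a scratch directory, so it does not touch the real
fileFormats directory. The helper name add_type_to_archive is hypothetical, and
the sketch stops before the final step of running updateMap.pl.

```python
import os
import shutil
import tarfile
import tempfile


def add_type_to_archive(tar_path, rel_dir):
    """Unpack tar_path, create rel_dir inside it, and rebuild the archive."""
    work = os.path.dirname(os.path.abspath(tar_path))
    with tarfile.open(tar_path) as tar:
        tar.extractall(work)                      # like: tar -xvf pub.tar
    os.makedirs(os.path.join(work, rel_dir), exist_ok=True)  # like: mkdir foobar
    os.replace(tar_path, tar_path + ".bak")       # like: mv pub.tar pub.tar.bak
    with tarfile.open(tar_path, "w") as tar:
        # like: tar -cvf - pub > pub.tar
        tar.add(os.path.join(work, "pub"), arcname="pub")


# Demonstration: build a stand-in pub.tar, then leave only the archive behind.
scratch = tempfile.mkdtemp()
os.makedirs(os.path.join(scratch, "pub", "champ", "level1b"))
tar_path = os.path.join(scratch, "pub.tar")
with tarfile.open(tar_path, "w") as tar:
    tar.add(os.path.join(scratch, "pub"), arcname="pub")
shutil.rmtree(os.path.join(scratch, "pub"))

add_type_to_archive(tar_path, "pub/champ/level1b/foobar")
with tarfile.open(tar_path) as tar:
    names = tar.getnames()
print("pub/champ/level1b/foobar" in names)
```

Keeping the previous archive as pub.tar.bak, as the manual steps do, leaves a
simple way back if the new hierarchy turns out to be wrong.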