Monday, April 20, 2009

Interferometry File Formats

When astronomers talk about software done right, they often hold up FITS as the gold standard. I'll admit, FITS has done more to live up to its name (Flexible Image Transport System) than many believed possible. Unfortunately, there can be too much of a good thing. Sometimes too much emphasis is put on defining a be-all and end-all file format, when all we really need are good tools for converting between file formats.

Data formats are a problem in radio astronomy software. Currently, there are at least three major formats (MIRIAD, UVFITS, and MeasurementSets), each linked to a major software package (MIRIAD, AIPS, and CASA), with rudimentary or non-existent tools for converting between them. Many have taken this state of affairs as a sign that multiple file formats are bad and that the community should decide on a single format. Since each format is intimately tied to a major software package, this battle over file formats has escalated to a war between software packages.

The mistake made here was blaming the file formats. File formats are not the problem. The problem is that the software for reading them has not been circulated in easily accessible modules. I am encountering this problem as I'm trying to get AIPY to be agnostic about file formats by wrapping them all into Python. Here's where I am:

The MIRIAD file format was actually easily wrapped up, owing to MIRIAD having a developed programmer's API.

MeasurementSets (with CASA) are giving me a lot more trouble. It seems that CASA, with all of its C++ objects that are passed between functions, is something of an "all or nothing" deal. If I want to read a MeasurementSet, I apparently need to wrap up the entirety of CASA. The failing here is a lack of code modularity.

UVFITS is giving me the opposite problem. It was devised as a FITS-conforming file format to handle raw interferometric data. Unfortunately, interferometric data isn't in picture form yet, so an extension of the FITS format (the binary table) was cooked up to accommodate that (Cotton et al. 1995). The result is a file format so general that it does not tell the programmer what the data actually means.

File formats exist to support the needs of different applications. They've been created out of need, and should not be dismissed as unnecessary. I recommend to the radio astronomy software community that we embrace these file formats and work on modular code so that they are accessible from any software package.
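
To make "modular code" a bit more concrete, here is a minimal sketch of what a format-agnostic reading layer could look like in Python. Everything in it is an assumption made for illustration: the `Visibility` record, the `register_reader` and `read_visibilities` names, and the "toy" text format are all invented here and are not part of AIPY, MIRIAD, AIPS, or CASA. The idea is only that each format contributes one small, independent reader, and calling code never needs to know which one it got.

```python
from typing import Callable, Dict, Iterator, NamedTuple, Tuple


class Visibility(NamedTuple):
    """One visibility sample in a format-neutral form (hypothetical layout)."""
    time: float                # e.g. a Julian date
    baseline: Tuple[int, int]  # antenna indices (i, j)
    data: complex              # complex visibility


Reader = Callable[[str], Iterator[Visibility]]

# Registry mapping a format name to the function that reads it.
_READERS: Dict[str, Reader] = {}


def register_reader(fmt: str) -> Callable[[Reader], Reader]:
    """Decorator that files a reader function under a format name."""
    def decorator(func: Reader) -> Reader:
        _READERS[fmt.lower()] = func
        return func
    return decorator


def read_visibilities(path: str, fmt: str) -> Iterator[Visibility]:
    """Dispatch to whichever reader is registered for `fmt`."""
    try:
        reader = _READERS[fmt.lower()]
    except KeyError:
        raise ValueError(f"no reader registered for format {fmt!r}") from None
    return reader(path)


@register_reader("toy")
def _read_toy(path: str) -> Iterator[Visibility]:
    """Stand-in reader for an invented text format:
    one 'time ant_i ant_j real imag' record per line."""
    with open(path) as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            t, i, j, real, imag = line.split()
            yield Visibility(float(t), (int(i), int(j)),
                             complex(float(real), float(imag)))
```

A reader for MIRIAD, UVFITS, or MeasurementSets would register itself the same way, each wrapping whatever library actually understands that format. Analysis code would then call only `read_visibilities(path, fmt)`, and a converter between two formats reduces to one reader paired with one writer.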
