Monday, November 9, 2009

The Need for SPEAD

I've been absent for a good while now as a result of participating in a (successful) deployment of our PAPER experiment in South Africa. The Karoo desert in SA, where we were stationed, was very reminiscent of Rangely, CO where I grew up, except for the occasional baboon or kudu in the road. Though it came at a price of a lot of work piled up for me when I got back, and an awfully long time away from J, the isolation from all but our experiment helped ferment some ideas I'd been having about migrating the AIPY toolkit I've been developing to use a streaming data format that would avoid unnecessary disk accesses, would allow AIPY to be integrated directly with the correlators developed by our CASPER project, and would help our experiment develop a real-time analysis pipeline for compensating for ionospheric distortion in our data.

After chatting with a lot of guys working on the Karoo Array Telescope in Cape Town, we came up with a concrete protocol build on something already being used for CASPER correlator output. I just got done writing my first grant proposal to the NSF, funding a graduate student to work on this protocol--the Streaming Protocol for Exchanging Astronomical Data (SPEAD, pronounced "speed"). The process of writing a grant myself was a learning process, and helped me understand where a lot of the questions I got asked by my previous advisors were coming from.

A lesson I got to take away from SA was this: the reason we were in SA (as opposed to Australia) for PAPER was because we had been working with the KAT team, sharing correlator development. The reason we were working with the KAT team was because CASPER and KAT started up a collaboration a few years before. And that collaboration was started up because Dan Werthimer went down to visit SA some years ago to help advise them in a review of the design of their telescope electronics. Dan was invited there because he struck up a fast friendship with Alan Langman (the KAT director) at an earlier conference. The moral of this chain of causes and effects being that sometimes large projects go in new directions because of personal friendships, and sometimes those friendships end up making the difference in the success of a project.

Sunday, September 6, 2009

"Guess What I Was Thinking" Logic Puzzles (A Rant)

Probably more than most, I like logic puzzles. They're fun. But there's a variety of logic puzzles, especially prevalent in IQ/Mensa tests, that I really dislike. They are the "what's the next symbol in the pattern"-type of puzzles, and I HATE them. They are written like there is one (and only one) answer, and you must be dense not to see it. But can't I put anything in that blank space and call it a "pattern"? And what if the pattern that I see when looking at the provided sequence isn't the one you were thinking of? Is anyone with me in recognizing that these aren't logic puzzles at all? They're "guess what I was thinking of" puzzles! They aren't adequately constrained. They don't specify the parameter space from which the sequence is drawn. Anything can be a pattern! Anything!

What they actually want you to do is find the most likely symbol given several measurements and a set of priors about the likelihood of the author picking a particular sequence, but they neglect to provide you with any information about those priors. Maybe they assume that you can guess the priors based on estimates of your own sequence-picking priors, but that only works if your brain works the same way as the authors'. And quite frankly, if the authors can't appreciate that answers to these puzzles they are writing are indeterminate, I'm pretty sure their brain isn't working the same way as mine. Quit calling these logic puzzles! Put them on a Berkeley Psychic Institute entrance exam, not a college entrance exam.

Ok. Done ranting.

Wednesday, August 26, 2009


After a trying, but ultimately successful month spent extracting the family from Puerto Rico and re-embedding us in Berkeley, I'm just starting to get back on top of things enough to think about posting...

I had lunch yesterday with a good friend of mine, Pierre, who co-founded a company that specializes in sensory and mapping systems such as those that are used to create Google's "Street View". I was impressed to learn about their system for combining data from GPS, LIDAR, car odometers, and IMUs to create a consistent picture of how a vehicle is located and oriented in space as a function of time. They've spent a lot of time calibrating their systems, and use some sophisticated MCMC post-processing methods for deriving the actual trajectory of a vehicle.

Although the antennas in the PAPER array (that's the low-frequency interferometer I'm working on), are much less mobile than a car, there was considerable overlap between the problem Pierre has been working to solve and the calibration problem I am facing the requires positioning antennas and celesital sources as a function of time in the face of ionospheric distortion, variable gains, etc. Pierre pointed me to Graph-SLAM as a formal description of the problem that we are trying to solve, and suggested that Kalman Filtering with RTS Smoothing was a powerful technique for converging to the optimal solution (with covariance information) in linear time.

Tuesday, July 14, 2009

The Voynich Manuscript: Bootstrapping Language

The internet is a powerful and dangerous thing. It all started when I read the latest xkcd, which warned me that visiting was dangerous. Then I read about the "6 phenomena that science can't explain" (which was a very dramatic title for some underwhelming mysteries), and next thing I knew, I was reading about the Voynich Manuscript. I learned about cryptography, glossolalia, the Manchu language, among other things. Then I took a look at the manuscript and before I knew it, I had a transcribed version of the manuscript in electronic form using the European Voynich Alphabet. And it just went downhill from there.

To summarize, the Voynich Manuscript (hereafter VMS) is a handwritten text with some illustrations some 500 years old. It uses glyphs no one knows how to read, it is not clear if it corresponds to a known language, it may or may not be encrypted, and little progress has been made in deciphering any of it, despite the fact that some bright people have tried. So I decided to have a crack at it.

The reason I got interested is because of the similaries to SETI. Arecibo, back in 1974, transmitted a message off into space that had been designed to be decrypted. We might receive a message like that some day. Or we might intercept something much like the VMS--a bunch of data in a language that we have no prior knowledge of--and we may be finding ourself trying to figure out how to bootstrap a language. That is to say, to learn the grammar and semantics of a language from a static example, without outside help.

Is this possible? For grammar, I'm pretty sure of it. I can imagine an algorithm (maybe Maximum Entropy Modeling and Bayesian learning applied to grouping and parsing) that uses correlations in the appearances of language elements (starting with letters and building up) and correlations in the behaviors of these elements relative to one another to build a model for parsing a language. For the VMS, I used something similar to this (not the MEM and Bayesian part) to show that spaces, line breaks, and paragraph breaks show similar grouping correlations relative to other VMS letters, and so can probably be considered one grammatical element of whitespace. That's a pretty simple thing to deduce, but it was actually something I was worried about in getting started with the VMS.

Sematics is another issue. Once upon a time, I would have had an optomistic answer to bootstrapping sematics from text that was not written for that purpose. However, after watching my child mysteriously acquire language, illustrating how hard-wired the human brain is for learning language from another human, and how much it relies on shared experience and feedback, I'm less sure.

I would be interested to know if there is a field of mathematics that studies sematics and the properties that a self-contained system needs to have to be able to generate sematical relationships. The Arecibo message relied on a shared physical environment to try to bootstrap sematics. I wonder if it would be enough to describe the rules of the grammar of a language in the language itself. That one, once the reader had deduced the relationships between elements, you would have a shared knowledge of that subject that might enable a reader to correlate the structure of the descriptions with the grammatical structure and thereby establish the first sematical relationships.

Anyway, after preliminary analysis of the VMS, I'm pretty sure that it's not random gibberish (there are correlations between elements on levels ranging from letters to words), and if it's encrypted, it's a weak form of encryption that preserves these correlations. My pet theory, extended from the glossalalia idea, is that this is actually plaintext in a natural language with an invented set of symbols, but that the natural language might be the accidental or intentional creation of a savant or scholar.

Thursday, July 9, 2009

Countability and Strong Positive Anymore

Seven or eight years ago, I discovered that I have a linguistic condition called "Positive Anymore." I was in college, chatting with my roommates, and said something like "Anymore, I just lift weights in the Leverett gym." One of my roommates just couldn't take it anymore. "I've heard you say 'anymore' like that for years now. What the hell does it mean?" A quick poll of those present revealed that I was the only one for whom that construction made grammatical sense. My aunt (a linguistics professor) gave me the prognosis: I had Positive Anymore.

I used to think that PA was a linguistic shortcoming of mine, but anymore I'm convinced it's more like a superpower. Whereas most English speakers can only use the word in negative constructions like "I don't drive anymore" I have the uncanny ability to use it positively as per the first sentence in this paragraph. Moreover, I don't just have PA, I have strong PA, which means I can, at will, detach "anymore" and put it anywhere in a sentence. "Anymore, I just take the bus." Astounded yet?
If you're still having trouble parsing that, replace "anymore" with "nowadays"--to me, they mean about the same thing.

I receive no end of flak from friends, relatives, and spouses about my PA, although it's not that uncommon a condition. In fact, I have caught several of my relatives (mostly on my dad's side) using PA even after making fun of me for my PA. They have it and don't even know it.

Anyway, S and I were trying to figure out if my use of PA was inconsistent with my use of "any", which I use according to the standard rules. But it turns out the standard rules are weirder than you might think. For example, "I don't want any spam" is a grammatical negative construction; "I want any spam" is ungrammatical and positive. "Do you want any spam?" is grammatical and seems positive, but it turns out that there is an implied negative in English owing to the uncertainty inherent in questions and subjunctives. Hmph. And then what about "I like any spam I can get?" That seems positive again, but the clause "I can get" is required to make it grammatical. And then the plot thickens. "I feed spam to any dog" is grammatical and did not require a clause to modify "any dog."

Our theory was that using "any" positively without a clause requires the noun modified by "any" to come in quantized units--to be countable. Dogs are countable; spam is a continuum, much like water or space-time. The best example we could think of that illustrated this was "fish". Fish can be countable noun (number of live fishes) or continuum noun (amount of dead fish to eat). If I say "I'll take any fish," the ambiguity in the countability of fish is broken--it's clear I'm talking about a live fish (or, perhaps, a type of fish, which is also countable). But if I say "I'll take any fish that you give me," the ambiguity is preserved.

The implication was that for my use of PA to be consistent with the standard use of "any" (which it may not need to be, since "anymore" is one word, not "any more"), I must be thinking of the time interval referred to by PA (i.e. now and continuing indefinitely into the future) as something countable rather than continuous. Maybe. I don't know. Anymore, I'm just really confused.

Thursday, July 2, 2009

The Need for Speed

On the drive from San Juan to Arecibo this morning, I got to wondering about where my average driving speed fell in the distribution of drivers here in Puerto Rico. In the states, I felt like I was a pretty average driver, but here en la isla, the distribution of driving speeds is different. There are a lot of fast drivers, too be sure, but there is also a subpopulation of drivers whose speed is significantly (~10 mph) below the speed limit. This may be because relative to the US, PR is economically depressed and so more old cars are on the road, or as a reaction to the more erratic driving habits there seem to be here, but anyway, I definitely pass more people than pass me now.

So in an effort to discover where my driving speed fell relative to others (and in and effort to alleviate the boredom of driving 1.5 hrs alone), I started counting how many cars I passed and how many passed me as I was going 65 mph (the speed limit). Out of 55 pass events, only 9 involved me getting passed. To make this a tractable problem in my head, I decided to assume that driving speeds were normally (gaussian) distributed about a mean--even though this contradicts my anecdotal evidence above. Using this approximation, my first instinct was to say that 1/6 of the cars on the road were faster than me, and since ~2/3 of samples are within +/- 1 sigma of the mean in a gaussian distribution, 1/6 of the samples would be above +1 sigma. So I approximated that I was a 1 sigma driver.

But then it occurred to me that I needed to control for a significant sample bias. This is because the test I was doing wasn't randomly selecting cars and comparing my speed to them. Cars were far more likely to get selected if the difference between their speed and my speed was large. A car going the same speed as me would never pass me, and I would never pass it. But I would assuredly pass almost every car on the road that was going 10 mph as I went 65. The "road distance" that I sampled for different velocities is proportional to abs(v-v0), where v0 is my velocity. The effect this had on my samples was to underweight speeds close to my own and overweight the wings of the gaussian distribution I had assumed as my model. If I drove exactly the mean velocity, this effect would not be terribly important--if the model were correct, it would still be the case that as many cars passed me as I passed. But as my velocity moves away from the mean velocity, the "normal drivers" who are only going a little faster than me get undersampled, so I only see the drivers who are tearing around like a bat out of hell. At the lower end, I still see the real slow-pokes on the road, but I start seeing people who are going a bit faster than that, of which there are a lot more. The effect of this sample bias, it seems, would be to make it seem that I'm a farther outlier in my driving speed than I actually am.

So now, to figure out where I fall in the (normal) distribution of driving speeds, I need to know exactly what the mean driving speed and what sigma is, so that I can compensate for the abs(v-v0)
sampling factor. That means I need to figure out 2 numbers, but unfortunately, I only measured 1 number (that 1/6 of the passes while driving were me being passed) so I won't be able to properly constrain this problem. However, I should be able to figure out the mean on my drive home by finding the speed at which as many people pass me as I pass. For now, let's say this is 55 mph. Then all I need to do is find the sigma for which a gaussian distribution around 55 mph downweighted by abs(v-65 mph) has 1/6 of the area lying above 65 mph. I just solved that numerically on my computer, and it's saying that the best-fit sigma is ~22 mph. So that puts me at about +1/2 sigma. That seems reasonable.

An interesting next step (which I'm not going to do right now since I need to get to work) would be to translate the sample error in my pass measurements into an error in the determination of sigma, and then the error in my driving speed percentile.

Tuesday, June 2, 2009

What are the chances?

I cringe every time that I hear this phrase. I heard it most recently when my friend had her car stolen from San Juan. It was recovered in a semi-drivable state in Bayamon. She invested a couple thousand dollars to get rid of the "semi", only to have the car re-stolen a couple of months later. This time, when the car was recovered in Dorado, there was no "semi" to be had, so she's currently trying to sell it for pieces. While the police were fairly understanding (though less than helpful) the first time her car got stolen, the second time was occasion for all sorts of raised eyebrows and skepticism. And in exasperation my friend uttered the phrase in question.

"What are the chances" is a Pandora's box of bad statistics. Statistics is about hedging your bets given incomplete information, but this phrase is always uttered after the fact, when we have (relatively) complete information: it happened. So unless you plan on repeating the experiment, the chances are one. It happened.

As an example, let's take the famous Goat/Car Puzzle. There are 3 doors; one has a car behind it and the other two have goats. After you pick a door, the game-show host opens one of the other doors and reveals a goat. You are then offered the option of switching your choice to the other door. If you play this game repeatedly, you'll win more often if you switch your choice. But the instance just played out, the car was behind one of the doors, and if that was the door you picked, your chances of getting the car were 1. If you didn't, your chances were 0.

You might object: "What were the chances beforehand, when I didn't know where the goat and car were?". But to do that, you need to make some assumptions. You need to assume that at each playing of the game, the cars and goats are randomly assigned and/or you randomly pick doors. Otherwise, it might be the case that the car is always behind door #1, and you always pick door #2. Your chances of success wouldn't be so good in this case. You might have decent prior knowledge of how cars, goats, and doors are picked in this example, but for everyday occurrences, we usually have much more limited prior knowledge. Are cars randomly stolen, or are certain brands targeted? Are certain areas targeted? People often assume that these processes are random, but they rarely are. With limited priors on these events, the question "what are the chances" can't be answered with any certainty and any answers given should be taken with a great big shaker of salt.

Furthermore, people have selective attention. We ignore whole heaps of ordinary outcomes and only pay attention to ones that strike us as interesting. As a friend of mine once said: "Low probability stuff happens pretty regularly because stuff is happening all the time." Even if the processes involved are random, unlikely outcomes are to be expected if the processes are repeated often enough. People tend to ignore the ordinary outcomes, exclaim at the extraordinary ones, and then assume that something deeper is afoot. In my friend's case, the police started wondering if she was being personally targeted or if she was really bad at locking her car. But even if we assume a random model of car thefts, some number unlikely outcomes doesn't automatically imply that our random model is wrong.

Finally, we also need to keep in mind that in complex systems like real life, there may be a huge number of possible outcomes. But something has to happen. When you roll a die, each number only has a 1/6 chance of coming up. Would you roll a die once and then exclaim: "Wow, it came up six! What are the chances?" In real life, there might be millions of outcomes, each with one-in-a-million chance of coming true, but the fact that one of them happens shouldn't be surprising.

Fighting against all of the pitfalls inherent in asking "What are the chances?", I've developed a reflexive response: "What are the chances?" One.

Thursday, May 21, 2009

Ida y Vuelta: a Tail?

It was all over the news and the net yesterday: the missing link has been found! Darwin has finally been proven right!

Hold on a minute. There's no "missing link"--we have example after example of the evolutionary forebearers of Homo sapiens. Evolution was not in question--we already knew Darwin was right. Ida (the name of the fossil found) is not a direct ancestor of humans--the fossil represents a transitional form between lemurs and other apes.

I do not mean to denigrate what is obviously a very important and exceptionally well-preserved fossil that may provide important insights into primate evolution. My objection is to the media frenzy that gives the impression that this is the final resolution to a scientific debate about evolution that a) never existed and b) would not have been resolved by the fossil in question if it did exist.

We have Neanderthals, Homo erectus, Homo habilis, Australopithecus afarensis, and more. Why is Ida, who isn't even our evolutionary forebearer, the "missing link"? I have a theory: it's about the tail. I think that somehow, through all the monkey-human-common-ancestor debate, through the discovery of early hominids, through all of the discussion about increasing cranial capacity and tool usage, everyone has really been wondering "what about the tail? What happened to the tail?" The media was holding out for something it could tout as a human ancestor (or something close enough) with a tail to finally declare the "missing link" found and the evolution debate settled.

You might say that this is all for the best--that the false debate, however belatedly, is finally being put to rest with one last hurrah. But here's a scenario that makes be cringe just to think about it: suppose it turns out that this fossil, which was recovered from a private collection, turns out to be something other than the specimen that the scientist in question thinks it is. Scientists makes mistakes--that's what peer review is for. Then suddenly we've breathed new life into a misconception about evolution that has been hanging around for way too long already.

Tuesday, May 12, 2009

Golomb Rulers/Squares/Rectangles

Yesterday I came across an interesting example of the isolation of academic fields from one another.

A common design parameter for antenna arrays is to try to obtain uniform coverage of the aperture plane to get as many independent measurements as possible. Interferometers sample the aperture plane at locations that correspond to the difference vectors between antenna elements. For example, in one dimension, if I put four antennas at locations (0,1,4,6), then that array would sample the difference set of those positions: (1,2,3,4,5,6). However, if I put antennas at (0,1,2,3), the difference set would only include (1,2,3) with 1 occuring 3 times and 2 occuring twice. For the purpose of uniformly sampling an aperture, redundant spacings are lost measurements. We're looking for a minimum-redundancy array.
There have been a few papers in radio astronomy on minimum-redundancy arrays for the more useful 2-dimensional case, including Golay (1971), Klemperer (1974), and Cornwell (1988).

Thinking that this might be a mathematical problem of interest, I ran it by a good friend of mine: Phil Matchett Wood--a mathematician at Princeton. He quickly uncovered the equivalent problem as formulated in math literature: Golomb rulers in 1-D and Golomb rectangles in 2-D. Some relevant papers on the subject are Shearer (1995), Meyer & Jaumard (2005), Robinson (1985), and Robinson (1997). These papers are on the exact problem and describe applications to "radar and sonar signal detection", obviously referring to the need for independent aperture samples in radar and sonar interferometers. Somehow, the differing nomenclature between these fields was never quite bridged, and so there has not been and cross-referencing between these two formulations of the same underlying problem. People like to talk about the possibility that relevant research in one field goes unnoticed by other fields. This is the first time I've across it myself, though.

Tuesday, May 5, 2009

Compressed Sensing and Wiener Filtering

Today I'm trying to expand my understanding of how we can best remove contaminant signals from the data we take with the Precision Array for Probing the Epoch of Reionization (PAPER). There is a specific problem I want to make sure we can solve for PAPER. Foregrounds to our signal, particularly synchrotron radiation, are expected to be very smooth with frequency. The idea put forth by the MWA and LOFAR groups is that by observing the same spatial harmonics at multiple frequencies, we should be able to remove such smooth components to suppress them relative to the cosmic reionization signal we are looking for. However, generating overlapping coverage of spatial harmonics as a function of frequency is expensive. My intuition is that since foregrounds do not have a spatial structure that changes dramatically with frequency, we shouldn't need to sample a given spatial harmonic very finely in frequency to get the suppression we want. This would allow us to spread our antennas out a little more and get measurements of the sky at a variety of spatial modes.

In many ways, our problem is analogous to what was done with the Cosmic Microwave Background (CMB). For foreground removal in CMB work, Tegmark and Efstathiou (1996) begin with an assumption that foregrounds can be described as the product of a spatial term and a spectral frequency term. This allows them to construct Wiener filters that use the internal degrees of freedom of their data, together with a model of their foreground and a weighting factor based on the noisiness of their data, to construct a filter for removing that foreground. For the most part, this is standard Wiener filtering, except they have to be careful about what they do to their power spectrum, so they apply a normalization factor to correct for a deficiency in Wiener filters. Tegmark (1998) goes on to generalize this technique for foregrounds that vary slowly with frequency. I'm in the process of wading through these papers, but they seem to be directly applicable to what we are doing, and seem to confirm my suspicions that synchrotron emission should be well-enough behaved to require only sparse frequency coverage of a wavemode in order to be suppressed.

Another tactic that I am investigating is that of compressed sensing which I was alerted to in talks by Scaife and Schwardt at the SKA Imaging Workshop in Socorro this last April. The landmark paper on this principle seems to be Donoho (2006), where it is shown that the compressibility of a signal (being sparse for some choice of coordinates) is a sufficient regularization criterion to faithfully reconstruct signals using a small number of samples. In a way, this technique has an element of Occam's Razor in it--it tries to find a solution, in some optimal basis, that needs the fewest non-zero numbers to agree with the measured data. At least, that's my take on it without having finished the paper.

The relevance of compressed sensing to image deconvolution is explored in Wiaux et al (2009), and it seems to be powerful. I'm excited by this deconvolution approach because it meshes well with the intuitive approach I've been taking to deconvolution, which was to use wavelets and a Markov Chain Monte Carlo optimizer to find the model with the fewest number of components that reproduces our data to within the noise. Compressed sensing seems to be exactly this idea, but is agnostic about the basis chosen, instead of mandating one like wavelets. Anyway, this technique may also be relevant to our foreground removal problem because we might be able to use it to construct the minimal foreground model implied by our data. For synchrotron emission, which should have smoothly varying spatial structure with frequency, I envision that this could construct a maximally smooth model that would allow us to use sparse frequency coverage to remove the foreground emission to the extent that it is possible to do so.

Monday, May 4, 2009

Is Tenure a Problem in Science Departments?

Somewhat belatedly, I wanted to comment on the New York Times Op-Ed by Mark Taylor that addresses some of the flaws of the current academic system and proposes some solutions. The central problems that Taylor highlights in his article are that academic departments are too isolated from one another and from the world, and that there aren't enough academic positions for all of the people who are getting doctoral degrees these days. Taylor makes some very good points, but his article is strongly influenced by his experiences in a humanities department, and I am not sure how relevant his suggestions are for a science department.

Astronomy suffers from many of the same problems Taylor describes. There are far more graduate students and post-doctoral researchers than there are tenured professorial positions, and yet students are trained as if they were all to be professors. However, science students are often not paying their own tuition (it is paid by the grant of a supporting professor) and there are more options for science students outside of tenure-track positions because of the many sources of external funding that support scientific research. Unlike the humanities, there are a variety of scientific programming and research positions for graduates who are not seeking professorial positions. These positions are aligned with the education students receive through their doctoral research.

This isn't to say that there is not a major problem in scientific disciplines concerning the ratio of student positions to professional positions. Rather, it is that the problem may not be as closely tied to tenure and the longevity of tenured professors as in the humanities. The problem may be that professional positions available to graduating students are being occupied by the students themselves. Scientific research in the United States relies on a large pool of skilled labor. Currently, this labor is being bought at well under market price in the form of cheap graduate student researchers. If more of these positions were filled by full-time research professionals, we might have a healthier employment system for scientific academia.

The problem is that this raises the price of research in the United States and may result in a reduction in the total number of projects (and therefore, researchers) that can be supported. From the perspective of researchers, this may be a healthier state of affairs--to not be misled into spending 5-7 years underpaid as a graduate student only to find that the only way to continue to do what you've been trained to do is to continue to be underpaid. But unlike in the humanities, science graduate students usually have not accumulated debt beyond their undergraduate education and they have been supported (however cheaply) through this process. The solution, then, may simply be to ensure that prospective graduate students in science are well-informed about what employment prospects they should expect after they file their dissertation.

Monday, April 20, 2009

Interferometry File Formats

When astronomers talk about software done right, they often hold up FITS as the gold standard. I'll admit, FITS has done more to live up to its namesake (Flexible Image Transport System) than many believed possible. But unfortunately, there can be too much of a good thing. Sometimes too much emphasis is put on defining an end-all-be-all file format, when all we really need are good tools for converting between file formats.

Data formats are a problem in radio astronomy software. Currently, there are at least three major formats (MIRIAD, UVFITS, and MeasurementSets), each linked to a major software package (MIRIAD, AIPS, and CASA), with rudimentary/non-existant tools for converting between them. Many have taken this current state of affairs as a sign that multiple file formats are bad and that the community should decide on a single format. Since each format is intimately tied to major software package, this battle over file formats has escalated to a war between software packages.

The mistake made here was blaming the file formats. File formats are not the problem. The problem is that the software for reading them has not been circulated in easily accessible modules. I am encountering this problem as I'm trying to get AIPY to be agnostic about file formats by wrapping them all into Python. Here's where I am:

The MIRIAD file format was actually easily wrapped up, owing to MIRIAD having a developed programmer's API.

MeasurementSets (with CASA) are giving me a lot more trouble. It seems that CASA, with all of it's C++ objects that are passed between functions, is something of an "all or nothing" deal. If I want to read a MeasurementSet, I apparently need to wrap up the entirety of CASA. The failing here is code modularity.

UVFITS is giving me the opposite problem.
UVFITS was cooked up as a FITS-conforming file format to handle raw interferometric data. Unfortunately, interferometric data isn't in picture form yet, so an extension of the FITS format (the binary table) was cooked up to accommodate that (Cotton et al. 1995). The result was a file format that is so general that it does not tell the programmer what the data actually means.

File formats exist to support the needs of different applications. They've been created out of need, and should not be dismissed as unnecessary. I recommend to the radio astronomy software community that we embrace these file formats and work on modular code so that they are accessible from any software package.

Thursday, April 16, 2009

PAPER: 8 Station Results

Yesterday I submitted our paper on the 4- and 8-antenna deployments of PAPER to the Astronomical Journal & astro-ph. Although it does not place any meaningful constraints on cosmic reionization (when light from the first stars broke up the bulk of the hydrogen gas that had been sitting around since it originally formed), it nonetheless illustrates a first level of calibration and analysis towards that goal. This paper should have some impact in the community, as it shows that we are fairly well along in our experiment and it provides a first look at some of the astrophysical foregrounds that will interfere with detecting reionization. This paper also will double as the last chapter of my dissertation, so publishing it puts me in the endgame of my doctoral studies... This fall I'm going to take an NSF postdoc back at Berkeley. Basically, that just means I'll be doing the same research (along with some teaching), but with better pay.

Tuesday, April 14, 2009

More on MCMC in Python

Markov-Chain Monte Carlo (MCMC) seems to be a promising technique for the calibration/imaging problem that we are facing with our experiment the Precision Array for Probing the Epoch of Reionization (PAPER). Yesterday, in addition to taking a crash-course in MCMC, I also started playing with PyMC, which implements, among other things, MCMC using Metropolis-Hastings chains. A first shot at a simple fitter using PyMC went something like this:

import pymc, numpy as n, pylab as p
from pymc import deterministic

x = n.arange(-10., 10, .01)

def actual_func(a, b, c): return a*x**2 + b*x + c

sig = .1
tau = 1/sig**2
noise = n.random.normal(0, scale=sig, size=x.size)
mdata = actual_func(1., 2., 3.)
mdata += noise

a_fit = pymc.Uniform('a', 0., 2.)
b_fit = pymc.Uniform('b', 1., 3.)
c_fit = pymc.Uniform('c', 2., 4.)
tau_fit = pymc.Uniform('tau', tau/3, 3*tau)

def func(a=a_fit, b=b_fit, c=c_fit): return actual_func(a, b, c)

mdata_var = pymc.Normal('mdata', mu=func, tau=tau_fit,
value=mdata, observed=True)

mdl = pymc.Model([a_fit, b_fit, c_fit, d_fit, mdata_var, tau_fit])
mc = pymc.MCMC(mdl)

a_hist, a_edges = n.histogram(a_fit.trace(), 40, (0,2))
a_bins = (a_edges[:-1] + a_edges[1:])/2
p.plot(a_bins, a_hist)

I chose this example over the ones that came with PyMC because it was much closer to the kind of calibration problems we will be trying to solve with PAPER. An interesting point that came up early on was the reliance of MCMC on an estimate of noise levels in the data. I remember this from the Maximum Entropy Method (MEM) for deconvolution that I coded up in AIPY. An interesting technique, used here, is to actually include the noise level (tau_fit) as a variable that is fit for. This way you can specify a starting guess for noise, along with a confidence interval, and not have to pull a magic number out of a hat. In this example, the fitter does a good job of accurately determining noise levels. I think what happens in this code is that the current guess for the noise level is used as part of the calculation that determines the next state to jump to in the Markov chain, and that new state make include a revision of the noise level. This clearly might be instable for more complex systems, so I imagine some amount of care must be exercised in leaving noise as a free parameter.

Monday, April 13, 2009

Image Deconvolution with Markov-Chain Monte Carlo

I've decided to morph this blog to be more about short updates pertaining to current ideas I'm thinking about, rather than the long-winded philosophical rants I've posted so far. Hopefully this might keep me more engaged as a blogger and maybe even help me keep better track of the things I'm working on. Starting up this blog again, by the way, is a shameless procrastination technique, since my dissertation is due in about 1 month, and all my writing energy should really be focused on that...

After attending an SKA Imaging Workshop in Socorro, NM a couple of weeks ago, I've developed an interest in Bayesian statistics and Markov-Chain Monte Carlo (MCMC) techniques as they pertain to interferometric imaging. Having never taken a stats course, I'm scrambling a little to absorb the vocabulary I need to understand papers written on the subject. Fortunately, in this era of wikipedia, getting up to speed isn't that hard. After reading wiki articles on MCMC, Markov Chains, and the Metropolis-Hastings algorithm, I dived into EVLA Memo 102, which talks about a first shot at using MCMC for image deconvolution.

The Maximum Entropy Method (MEM) is a classic deconvolution technique (one I've already reimplemented for AIPY), but I'd like to go a bit further down this road. According to the standard implementation (which I gleaned from reading Cornwell & Evans (1984) and Sault (1990)) this algorithm uses a steepest descent minimization technique based on the assumption of a nearly diagonal pixel covariance matrix (i.e. the convolution kernel is approximately a single pixel). While this is an effective computation-saving assumption, I found that for the data I was working with, this assumption lead to the fit diverging when I started imaging at finer resolutions.

I think MCMC, by not taking the steepest decent, might be able to employ the diagonality assumption more robustly. I also think it's high time that deconvolution algorithms make better use of priors. The spatial uniformity prior in MEM makes it powerful for deconvolving extended emission, while the brightest-pixel selection technique in CLEAN makes it effective for deconvolving point sources. There's no reason we can't build a prior explicitly for a deconvolution algorithm that tells it to prefer single strong point sources over many weaker point sources, but also tells it that when all else is equal, entropy should be maximized.