Tuesday, December 28, 2010

Distribution of Actual Births Around Estimated Due Date

Given that we're now in overtime here for our second child, I have become very interested in the question of what our "due date" meant in the first place.

Plotted on the left are data from a study of Canadian births, 1972-1986 (Arbuckle & Sherman 1989). I then started trying to aggregate some data for my own plots.

The first thing I noticed that required a little attention was the "fence post" problem in recording gestational age. If study A records births in the 39th week, where exactly is that on the x axis? Well, if we assume that they started counting with 1, then the 39th week is actually 38.5 +/- 0.5 weeks (they are counting fence sections). But if study B records births from week 39-40, the implication is that they started at 0 (they are counting fence posts), so 39-40 is 39.5 +/- 0.5. That was a little tricky.
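
To keep the two conventions straight while aggregating, I handle the bookkeeping with something like the sketch below (the two label formats are just the cases above):

    def bin_center(label):
        # Convert a reported gestational-age label to (center, half-width)
        # in weeks, under the two counting conventions discussed above.
        if label.endswith('th week'):
            # ordinal convention, counted from 1: the 39th week spans 38-39
            n = int(label.split('th')[0])
            return (n - 0.5, 0.5)
        # range convention, counted from 0: "39-40 weeks" spans 39-40
        lo, hi = [float(w) for w in label.replace(' weeks', '').split('-')]
        return ((lo + hi) / 2.0, (hi - lo) / 2.0)

    print(bin_center('39th week'))    # (38.5, 0.5)
    print(bin_center('39-40 weeks'))  # (39.5, 0.5)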

Then we have studies that bin over different time intervals. If another study C records births at 37-41 weeks, how do we relate that to A and B? The intelligent way to plot this is to use the probability density of going into labor, which divides each bin's fraction of births by the length of the interval over which it was observed. So we should be careful to do that.
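
In the same vein, the normalization is a one-liner (the 30% figure here is made up purely for illustration):

    def labor_density(fraction, half_width):
        # fraction of births in a bin, divided by the bin's width in weeks,
        # gives a probability density that is comparable across studies
        return fraction / (2.0 * half_width)

    # study C: say 30% of births at 37-41 weeks -> center 39.0, half-width 2.0
    print(labor_density(0.30, 2.0))  # 0.075 per week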

And then there's just the general problem that a lot of studies list percentages for births in each time bin, but don't list total populations or error bars, so we don't know what the errors in their measurements were. Ugh. So I crossed my fingers and hoped they followed good practices with their significant figures: I assigned an error equal to one unit in the last significant digit they list.
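
That assignment is mechanical enough to script (a sketch; it punts on trailing zeros, which are genuinely ambiguous):

    def sigfig_error(value_str):
        # one unit in the last digit listed, e.g. '12.3' -> 0.1, '0.05' -> 0.01
        if '.' in value_str:
            decimals = len(value_str.split('.')[1])
            return 10.0 ** -decimals
        return 1.0  # '1200' could mean +/-1 or +/-100; optimistically take 1

    print(sigfig_error('12.3'))  # 0.1
    print(sigfig_error('0.05'))  # 0.01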


I got most of my data from this semi-thorough compilation of census data, and by going to some of the original sources. The data aren't great, but they are adequate (see plot to left). I fit the data with a rising exponential tail on the early side plus a normal distribution. The fit isn't great (despite some claims in the literature that the distribution is normal), but it captures enough of the overall distribution.
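
For the record, the fit was nothing fancier than a least-squares fit of the two-component model. The sketch below uses synthetic stand-in data; the real inputs were the binned densities and errors assembled above:

    import numpy as np
    from scipy.optimize import curve_fit

    def model(t, A, tau, B, mu, sigma):
        # rising exponential (dominates the early side) + normal distribution
        return A * np.exp(t / tau) + B * np.exp(-0.5 * ((t - mu) / sigma)**2)

    # synthetic stand-in for the aggregated (bin center, density, error) data
    t = np.arange(30., 44.)
    d = model(t, 1e-4, 3.0, 0.25, 39.5, 1.5)
    e = 0.01 * np.ones_like(d)

    p0 = [1e-4, 3.0, 0.25, 39.5, 1.5]  # rough initial guesses
    prms, cov = curve_fit(model, t, d, p0=p0, sigma=e)
    mu, sigma = prms[3], prms[4]       # center and width of the normal part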


The width of the normal distribution was 1.5 weeks, centered at 39.5 weeks. This seems consistent with several sources that suggest ~10% of pregnancies would go into the 42nd week if they were allowed to. It also suggests that the "due date" means the mean of the normal portion of the distribution. Yet fully 1/2 of pregnancies will go beyond the expected due date, and 1/6th will go past 41 weeks, according to this coarse fit.
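
For anyone checking my arithmetic, those last two fractions come straight out of the normal portion of the fit; 41 weeks sits exactly one sigma above the mean:

    from scipy.stats import norm

    mu, sigma = 39.5, 1.5              # normal portion of the fit
    print(norm.sf(mu, mu, sigma))      # 0.5: half go past the due date
    print(norm.sf(41.0, mu, sigma))    # ~0.159: about 1/6 go past 41 weeks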

Of course, none of this accounts for biases that we know exist. The growing prevalence of inductions and C-sections artificially moves births earlier. Although the statistical significance may be questionable owing to systematic biases (self-selection for uncomplicated pregnancies, etc.), it appears that the recorded midwife births go later than the aggregate (presumably hospital-dominated) births at the 2-sigma level. This may indicate that without intervention biases, the distribution of birth dates around the expected due date would be broader and weighted toward later dates.

Friday, August 13, 2010

Listening to the Astro2010 Decadal Survey

Here are some streaming comments as I listen to the results of the 2010 Decadal Survey. A shortlink to the report is here.

11:10 am EDT: Excellent placement of "what were the first luminous objects, and when did they form" second from the top of "Major questions to address this decade".

11:20 am EDT: Ooh, even better. "Cosmic Dawn" is the first of the 3 over-arching fields. Good fielding for EoR as a top science priority in the next decade!

11:23 am EDT: Now we're starting into descriptions of the panels. I was involved in several of the RMS (Radio, Millimeter, and Submillimeter) submissions, and am looking for HERA (the Hydrogen Epoch of Reionization Array). I'm also rooting for the Allen Telescope Array's Radio Sky Surveys Project. Finally, I'm hoping for some mention in the TEC (Technology development) program of prioritizing the development of shared solutions to digital signal processing hardware and libraries, which was the recommendation of a white paper I drafted with the help of the CASPER community.

11:45 am EDT: Mention of SKA as a priority for Radio Astronomy, unsurprisingly.

11:47 am EDT: Looks like Roger's wrapping up here. Turning to Q&A.

Meanwhile, I'm reading through the report. I see radio instrumentation is listed as a funding priority on ES-4, linking to 7-39. A good sign.

A painful line on 1-18: "U.S. participation in projects such as the Square Kilometer Array is possible only if there is either a significant increase in NSF-AST funding or continuing closure of additional unique and highly productive facilities." Ouch.

But on 2-12, my own sky map (well, with "permissions pending" for now). Now that's something!

12:01 pm EDT: An interesting question: "Why such a priority on habitable planets in the decadal review?" Sounds like the answer is that the field is particularly primed to make big breakthroughs. I think I agree with that. I wasn't surprised that it was on there. There are rumors going around that Kepler has found earth-like planets in earth-like orbits.

12:02 pm EDT: What about the surplus of post-docs relative to faculty positions? Answer: there are a wide range of careers and positions available to astronomers, so it's not unreasonable to have a larger number of post-doc positions where budding astronomers get training. I'm not sure that answer fully appreciates the scale of the problem.

Continuing with reading, on page 3-13, "The HERA program, a project that was highly ranked by the RMS-PPP and included by the committee in its list of compelling cases for a competed mid-scale program at NSF, provides a development pathway for the SKA-low facility. Progress on development of the SKA-mid pathfinder instruments, the Allen Telescope Array in the U.S., the MeerKAT in South Africa and the ASKAP in Australia, and in new instruments and new observing modes on the existing facilities ... will provide crucial insight into the optimal path towards a full SKA-mid." That's good to see mentioned. Sounds like it lays the groundwork for a strong future proposal to get funded. It's not a promise of funding, though. Not that such a promise was expected.

12:11 pm EDT: Second use of "tripwires" for projects. A very colorful phrase.

Interesting plot on 4-15: the number of papers in all astronomy fields is increasing. Instrumentation papers are few in number, but holding their own against other fields (the same percentage contribution to the total paper count).

On 5-14 for Data Reduction and Analysis Software: "Flexibility, openness, and platform independence, modularity, and public dissemination are essential to this effort. Focused investment in a series of small-scale initiatives for common tool development ... may be the most cost-effective approach, although there are undoubtedly synergies with the pipeline development needed for the large-scale projects." Sounds helpful for some of my projects like AIPY, SPEAD, and CASPER.

12:18 pm EDT: Comments on the SKA? Answer: SKA is the future, but the US can't pay for the construction on the proposed time scale (but slower might be ok). Technology development should be prioritized though. Low-frequency SKA, though, targets EoR, and we're interested in projects targeting that. Yes!

On 5-21 for Technology Development: "The committee received community input in the form of white papers on the funding needs for technology development in areas such as ... high speed, large N correlators. In these areas and others, researchers ... had come together to plan a coherent strategy for the decade. The OIR and RMS panels made a convincing case that the current level of ATI funding needs to be augmented in order to successfully pursue these highly-ranked technology development programs and roadmaps." Looks like my white paper fell on receptive ears.

12:28 pm EDT: Neil Tyson is closing down Q&A. He is one cool dude. I'm glad he was on the panel.

On 7-7 for Science Objectives for the Decade: "Find and explore the epoch of reionization using hydrogen line observations starting with the HERA telescopes that are already under construction." Wham. And in Table 7.1 on 7-32, Priority 2, Projects thought compelling: HERA.

On B-2 for Program Prioritization, in Table B.1, I see both ATA and HERA. I'd say ATA didn't necessarily win big in the review, but at least they're there.

And finally, in appendix D-1: "Hydrogen Epoch of Reionization Array ... is a multi-stage project in radio astronomy to understand how hydrogen is ionized after the first stars start to shine. The first phase (HERA I) is under way and will demonstrate the feasibility of the technical approach. The second phase (HERA II) would serve as a pathfinder for an eventual world-wide effort in the following decade to construct a facility with a total collecting area of a square kilometer and the power to make detailed maps of this critical epoch in the history of the universe. Proceeding with HERA II should be subject to HERA I meeting stringent performance requirements in its ability to achieve system calibration and the removal of cosmic foreground emission." We've got our work cut out for us!

Tuesday, July 27, 2010

Don Backer


Sadly, the world and I lost Don Backer on July 25, 2010. In addition to being an inspiring scientist, instrumentalist, and educator, Don was my graduate and post-doctoral advisor, and my close friend. He and I worked closely together from 2004 until last Sunday, when he died suddenly of an apparent heart attack.



Don was well known for discovering the first millisecond pulsar early in his career. More recently, he and I had been working on the Precision Array for Probing the Epoch of Reionization (PAPER)--an experiment for detecting the first stars and galaxies that formed in the universe. We had recently made a lot of exciting progress with this experiment, and it is especially tragic to lose him at such a pivotal time.

Don was a very warm yet reserved man. He was always extremely busy, and I envied his ability to juggle a huge number of tasks at once. Yet every time I walked into his office, he gave me a big welcoming smile, saying "Hi Aaron! Come on in." In that instant between when he looked up and when he recognized me, I would sometimes see a hint of displeasure at being interrupted (a lot of people in the department walked into Don's office to hassle him about any of the many projects that he was involved in), but it was always gone the instant he recognized me, and I took pride in being someone from whom Don welcomed interruptions.

Don was always a model to me of how to be an instrumentalist and a scientist. I've long been interested in both building and using scientific instruments, and Don was a shining example of how to do both. I learned a lot from Don that helped guide me professionally, and I owe him a lot for his advice and generosity. It is sobering to consider that my next steps will have to be without Don's quiet support and encouragement. In many respects, though, Don's generosity has already helped pave the next steps for me. I'm sure I will continue to incur debt to him for years to come.

One of my favorite qualities of Don was his grand sense of adventure. Our foray into the Karoo desert to deploy PAPER in South Africa could not have happened without Don's enthusiasm for traveling, roughing it, and flying by the seat of his pants. I loved going on deployment expeditions with Don. He was always bright-eyed and smiling, summoning such energy at 66 years that I, at 29, struggled to keep up. It was not hard to see, in the man in front of me, the Don of the black-and-white photographs, with the same wiry energy and wry grin.

Don was never very forthcoming with advice--he advised me more by example. I'm pretty sure this was a result of a very ingrained sense of humility. Don never said "you're wrong", or "you should". I think he didn't feel it was his place to pass judgment on people. Despite this humility, or probably because of it, Don was an effective leader. Without badgering people or using heavy-handed methods, Don brought people into consensus and helped move projects forward. Unfortunately, his effectiveness, coupled with his self-described "responsibility gene", meant that he was often called upon to bail out troubled projects, and he had a hard time refusing them. I often wished Don spent more time on PAPER. I think he did, too.

Don and I were a great team. I'm not a good multi-tasker. Don insulated me from a lot of project management, logistics, and distractions, carving out a space for me to work effectively toward our goal. Soon, some of the important products of our partnership will bear fruit, and I'm sad that Don won't be there to see it. But he knew it was in the works before he left, and for that I am thankful.

I'm sad to have lost a good friend and mentor. Things are hard now as we try to pick up the pieces of all the many things Don was managing. I'm sad that he's not here to help. He was always good at bailing us out.

Monday, March 29, 2010

Who Dominates Health Care Costs?

There's nothing like a little bout with MRSA to make one pay a little more attention to the state of health care legislation. Two nights in the ER are definitely making me thankful for health insurance. Knowing that it wasn't going to cost me an arm and a leg to get antibiotics through an IV (in fact, it probably saved me the leg) definitely helped me to seek care early, rather than waiting for the infection to get truly life-threatening. And that probably saved on health care costs in the long run.

I've heard it argued many times by the other side that universal health care will drive up the cost of health care for everyone, because so-called "healthy people" will be paying, through their premiums, for the bills of the "unhealthy". Ignoring that:
  1. the above is a tautological statement about what insurance is
  2. people routinely go from the "healthy" group to the "unhealthy" group and back again
  3. we should maybe feel a moral obligation to care for the unhealthy
Yeah, ignoring that, I wanted to know if the underlying assumption was, in fact, true. Who dominates health care costs? Is it the small number of extremely sick people? Or is it the larger number of moderately sick people?

To answer that question, I went searching for the population distribution of health care costs. I found the following publication: Variations in Lifetime Healthcare Costs across a Population (Forget et al. 2008). To the left are reproduced Figs. 4 and 5.

Given all the hype, I was somewhat underwhelmed to see that these curves depict (with the exception of an excess at the lowest cost bin) a gamma distribution. This isn't surprising, because a gamma distribution with an integer shape parameter is exactly the sum of that many independent, exponentially-distributed random variables.
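
A quick Monte Carlo sanity check of that statement (a sketch, checking just the first two moments):

    import numpy as np

    # the sum of k independent exponentials (scale theta) is gamma-distributed
    # with shape k and scale theta
    k, theta = 4, 1.0
    s = np.random.exponential(theta, size=(100000, k)).sum(axis=1)
    print(s.mean(), k * theta)      # both ~4.0 (gamma mean = k*theta)
    print(s.var(), k * theta**2)    # both ~4.0 (gamma variance = k*theta^2)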




To find the contribution of people in each cost bin to the total health care cost of the population, we simply need to multiply the population of that bin (drawn from a gamma distribution) by the mean health care cost of that bin (a linearly increasing function). Setting the mode of the gamma distribution to $90k for females, and tweaking the k and theta parameters (I'll chi-by-eye it at k=4.5, theta=1.0), we get the following distributions of fractional population (black) and fractional total health care cost (red), as a function of lifetime healthcare cost:
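
In code, that chi-by-eye model looks something like the sketch below (the dollar rescaling is my own bookkeeping, using the fact that a gamma distribution's mode is (k-1)*theta):

    import numpy as np
    from scipy.stats import gamma

    k, theta = 4.5, 1.0
    x = np.linspace(0.01, 15, 500)       # cost, in units of theta
    pop = gamma.pdf(x, k, scale=theta)   # fractional population (black curve)
    cost = x * pop                       # population times mean cost per bin
    cost /= np.trapz(cost, x)            # normalize: fractional total cost (red)

    # rescale the x axis so the population mode, (k - 1)*theta, sits at $90k
    dollars = x * 90e3 / ((k - 1) * theta)

Incidentally, x times a gamma pdf with shape k is itself proportional to a gamma pdf with shape k+1, so the cost curve peaks at k*theta, one theta above the population mode at (k-1)*theta. That's exactly the conclusion below.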

So who dominates health care costs? Those just slightly above the mode, which is to say, the large number of people who are just a little sicker than most. And that really could be any of us, folks.

Wednesday, February 10, 2010

AstroBaki on MediaWiki

I just started up a new wiki called AstroBaki. The main reason I did this was that my MoinMoin AIPY wiki was clunky to use and was getting spammed a lot. I switched to the MediaWiki engine, which has better automated control over these kinds of things. As an added bonus, MediaWiki has support for LaTeX math. This got me thinking...

When I started grad school, I had a hard time transitioning from feeling like I was producing and contributing (I was working as a development engineer for SETI) to just absorbing knowledge. To make myself feel better and more invested in learning, I started doing something for which I became moderately famous around the department: latexing lecture notes on-the-fly. For full disclosure, I should mention that I copycatted the idea of latexing on-the-fly from my friend Phil.

The key to success is to use lots of "defs", and to recognize when you need to def a sequence of commands. When the same sequence of symbols started popping up, I would pretend that I had already def'd the command and start using it, and when there was a pause in the derivation, I would remember to scribble down what that command should mean. In my later years, I also started drawing figures in paint for inclusion in latex.
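
For example (the macro names here are invented for illustration, not pulled from my actual notes):

    \def\hnu{h\nu}                % single symbols get short names
    \def\mfp{\lambda_{\rm mfp}}   % the mean free path keeps reappearing
    % and when a whole sequence starts popping up, it gets a def of its own:
    \def\planck{B_\nu = \frac{2h\nu^3/c^2}{e^{h\nu/kT} - 1}}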

Anyway, I now have about 4 or 5 latex'd class notes that I have put on my website. From what I hear, they are still regularly used in UCB classes, and I occasionally get happy emails from grad students thanking me for the effort. Meanwhile, I've been reading a book about Nicolas Bourbaki, a famous pseudonym for a group of (mostly French) mathematicians who collaboratively re-wrote mathematics from 1935 to the 70s. Nicolas Bourbaki was a wiki, ahead of its time.

"Now wouldn't it be cool," I thought to myself, "if students using these lecture notes could fix them when they are wrong (after all, they were written on-the-fly), and re-organize them to make more sense?" Could these notes become a sort of open-source textbook for astronomy? So AstroBaki was born.

The difficulty, I am finding, is in translating latex (especially latex heavy in defs) into mediawiki. The best tool I've found so far has been pandoc, which didn't do the defs, but did everything else pretty well. I'm loath to do things by hand, so I'll see what can be automated, and I'll keep you posted.
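
One route I'm considering is pre-expanding the simple (argument-free) defs with a script before handing the file to pandoc. A rough sketch, with no attempt to handle arguments or nested braces:

    import re

    def expand_defs(tex):
        # inline all argument-free \def macros so pandoc sees plain latex
        defs = dict(re.findall(r'\\def(\\\w+)\{([^}]*)\}', tex))
        tex = re.sub(r'\\def\\\w+\{[^}]*\}\s*', '', tex)  # drop the definitions
        for name, body in defs.items():
            tex = tex.replace(name, body)                 # substitute the uses
        return tex

    print(expand_defs(r'\def\hnu{h\nu} Energy is \hnu.'))  # -> Energy is h\nu.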

Friday, January 22, 2010

Where is GCC for FPGAs?

A lot of the digital signal processing that gets done in radio astronomy these days is done on Field Programmable Gate Arrays (FPGAs), and one of the projects I've been working on since the beginning of my research career is developing open-source libraries for programming these chips. My part in this has generally been on the algorithmic/mathematical side: writing FFTs, filters, cross-correlation engines, etc. Another key aspect of this work, though, is a toolflow that allows people to design systems at a high level with parameterized algorithmic cores, and to turn that design into the wiring instructions that tell the FPGA how to implement the system.

We currently use a design entry system based on Simulink running on Matlab, and while it is an extremely powerful environment, we've also found it to be limiting, frustrating, and hard to maintain designs in. In October, I volunteered at an international workshop on astronomy signal processing to explore alternatives to this environment. My current favorite is MyHDL, which uses Python to generate lower-level code in Verilog or VHDL, and I may start looking more deeply into porting a design to use MyHDL.
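
To give a flavor of what MyHDL looks like, here's a toy sketch (not one of our actual cores) using its toVerilog conversion:

    from myhdl import Signal, intbv, always, toVerilog

    def counter(clk, count):
        # an 8-bit wrap-around counter, the "hello world" of HDL
        @always(clk.posedge)
        def logic():
            count.next = (count + 1) % 256
        return logic

    clk = Signal(bool(0))
    count = Signal(intbv(0)[8:])
    toVerilog(counter, clk, count)  # emits counter.v in plain Verilog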

Something that is bothering me, though, is that however much we work on porting our toolflow to open-source equivalents, there is currently no open-source compiler for FPGAs. The state of affairs in FPGA-land is something like PCs in the '70s, when every personal computer had its own specialized compiler. For PCs, the problem was solved by GCC (the Gnu Compiler Collection), which became the default open-source solution for compiling most languages to target the many CPU architectures that exist in the world today.

I'm keeping my eye on gEDA, and notably Icarus, which seems to be a free synthesis tool (synthesis, mapping, and routing are the 3 main stages of compiling for an FPGA). Perhaps mapping and routing can never be open-source, since they tend to be very chip-specific. But here's hoping...

Friday, January 8, 2010

Hands-On Cosmology Education

Yesterday I spent the morning giving a gosh-wow talk about cosmology to a physics class at Athenian High School taught by my housemate Dave Otten. It was a lot of fun, and the students were all very enthusiastic. It was almost entirely driven by their questions, and they loved being pitched curveballs (time is reference-frame dependent, the universe is expanding, spiral arms are standing waves, etc). The hour-and-a-half lecture was over before we knew it.

Afterward, Dave mentioned that it would be really cool if there were a way to talk about galactic-scale astronomy and cosmology that was in keeping with the philosophy of their school, which emphasizes lab-based, hands-on learning. He mentioned that PhET is a free resource he uses for providing interactive simulations that make hands-on labs out of subjects that otherwise would be too slow, small, big, fast, or dangerous to perform live in a classroom. He also lamented that there aren't any galactic- or cosmological-scale simulators there that could help to understand how systems on this scale behave, and that could perhaps illustrate exactly where the problems of dark matter and dark energy are encountered. Has anyone seen something like this?