Given that we're now in overtime here for our second child, I have become very interested in the question of what our "due date" meant in the first place.
Plotted on the left are data from a study of Canadian births, 1972-1986 (Arbuckle & Sherman 1989). I then started trying to aggregate some data for my own plots.
The first thing that required a little attention was the "fence post" problem in how gestational age is recorded. If study A records births in the 39th week, where exactly is that on the x axis? Well, if we assume that they started counting with 1, then the 39th week is actually 38.5 +/- 0.5 weeks (they are counting fence sections). But if study B records births from week 39-40, the implication is that they started at 0 (they are counting fence posts), so 39-40 is 39.5 +/- 0.5 weeks. That was a little tricky.
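To keep the two conventions straight, here is a minimal sketch of the bookkeeping. The function names are my own inventions for illustration, and the counting conventions are the assumptions described above, not something the studies state explicitly.

```python
def ordinal_week(n):
    """'The n-th week', counted from 1, spans [n - 1, n] elapsed weeks."""
    lo, hi = n - 1, n
    return (lo + hi) / 2, (hi - lo) / 2  # (center, half-width)


def week_range(lo, hi):
    """'Weeks lo-hi', counted from 0, spans [lo, hi] elapsed weeks."""
    return (lo + hi) / 2, (hi - lo) / 2  # (center, half-width)


print(ordinal_week(39))    # study A: (38.5, 0.5)
print(week_range(39, 40))  # study B: (39.5, 0.5)
```

The same label "39" lands a full week apart depending on which convention the study used, which is why this is worth pinning down before combining data sets.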
Then we have studies that bin over different time intervals. If another study C records births at 37-41 weeks, how do we relate that to A and B? The intelligent way to plot this is to use the probability density of going into labor: the fraction of births in each bin divided by the length of time the bin covers. That puts a one-week bin and a four-week bin on the same footing, so we should be careful to do that.
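As a rough illustration of that normalization, here is a small sketch. The function name and the percentages are hypothetical, made up only to show the arithmetic; they are not taken from any of the studies.

```python
def bin_density(fraction, lo, hi):
    """Probability density (per week) for a bin holding `fraction` of births."""
    width = hi - lo
    if width <= 0:
        raise ValueError("bin must have positive width")
    return fraction / width


# Hypothetical numbers, for illustration only (not from any study).
narrow = bin_density(0.25, 39, 40)  # 25% of births in a 1-week bin
wide = bin_density(0.80, 37, 41)    # 80% of births in a 4-week bin

print(narrow, wide)
```

Even though the wide bin holds far more births in total, its density per week comes out lower than the narrow bin's, which is what you would want a plot to show.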
And then there's just the general problem that a lot of studies list percentages for births in each time bin, but don't list total populations or error bars, so we don't know what the errors in their measurements were. Ugh. So I crossed my fingers and hoped they followed good practices with their significant figures: I assigned an error equal to one unit in the last significant digit they list.
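For what it's worth, that rule of thumb is easy to automate if the percentages are kept as the strings that appear in the paper. This is only a sketch of the idea under that assumption, with a helper name I made up; it is not a substitute for real error bars.

```python
from decimal import Decimal


def last_digit_error(text):
    """One unit in the last listed digit of a number given as a string."""
    exponent = Decimal(text).as_tuple().exponent
    return float(Decimal(1).scaleb(exponent))


print(last_digit_error("12.3"))  # 0.1
print(last_digit_error("7"))     # 1.0
print(last_digit_error("4.50"))  # 0.01
```

Note that whole numbers with trailing zeros (say, "120") are ambiguous about how many digits are significant; this sketch simply treats every written digit as significant.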
I got most of my data from this semi-thorough compilation of census data, and from going to some of the original sources. The data aren't great, but they are adequate (see plot to left). I fit the data with a normal distribution plus an increasing exponential tail to the left. The fit isn't great (despite some claims in the literature that the distribution is normal), but it captures enough of the overall distribution.
The width (sigma) of the normal distribution was 1.5 weeks, centered at 39.5 weeks. This seems consistent with several sources that suggest ~10% of pregnancies would go into the 42nd week if they were allowed to. It also suggests that the "due date" means the mean of the normal portion of the distribution. Yet fully 1/2 of pregnancies will go beyond the expected due date, and 1/6th will go past 41 weeks, according to this coarse fit.
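Those fractions can be checked directly from the normal portion of the fit. The sketch below uses only the mean and width quoted above and ignores the exponential tail on the early side, so treat it as a back-of-the-envelope check rather than the full fit; the function name is my own.

```python
import math


def normal_tail(x, mean=39.5, sigma=1.5):
    """P(X > x) for a normal distribution with the given mean and sigma."""
    return 0.5 * math.erfc((x - mean) / (sigma * math.sqrt(2)))


for weeks in (39.5, 41.0, 42.0):
    print(f"P(beyond {weeks} weeks) = {normal_tail(weeks):.3f}")
```

Since 41 weeks is exactly one sigma above the mean, the tail there is about 16%, or roughly 1/6. The tail beyond 42 weeks comes out closer to 5%, so the ~10% figure from the literature sits between the two, depending on where exactly "the 42nd week" begins.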
Of course, none of this accounts for biases that we know exist. The growing prevalence of inductions and C-sections moves births earlier artificially. Although the statistical significance may be questionable owing to systematic biases (self-selection for uncomplicated pregnancies, etc.), it appears that the recorded midwife births go later than the aggregate (presumably hospital-dominated) births at the 2-sigma level. This may indicate that, without intervention biases, the distribution of birth dates around the expected due date would be broader and weighted toward later dates.