A Big Answer

Published on December 22, 2010. 280 Comments

I was going to wait a few days before posting the answer to yesterday’s puzzler, but we’re up well over 100 comments already, the holidays are almost upon us, and I think it’s time to settle this so you can all give your full attention to whatever festivities you’ve got coming up.

Here’s the puzzle again:

More precisely: What fraction of the population should we expect to be female? That is, in a large number of similar countries, what would be the average proportion of females?

Stop reading here if you don’t want spoilers:

Here’s the wrong answer: Every birth has a 50% chance of producing a girl. This remains the case no matter what stopping rule the parents are using. Therefore the expected number of girls is equal to the expected number of boys. So in expectation, half of all children are girls.
Pretty convincing, eh? So why is it wrong? Well, actually, most of it is right. Every birth has a 50% chance of producing a girl — check. This remains the case no matter what stopping rule the parents are using — check. Therefore the expected number of girls is equal to the expected number of boys — check! But it does not follow that in expectation, half of all children are girls!
To see why not, let me tell you about the families who live on my block. There are 3 families with four girls each (and no boys), and one family with 12 boys (and no girls). Altogether, that makes 12 girls and 12 boys — equal numbers! On average, each family has three girls and three boys. Nevertheless, the fraction of girls in the average family is not 50%. It’s 75% (the average of 100%, 100%, 100%, and 0%).
In other words, if you were to choose a random family off my block, the expected number of girls would equal the expected number of boys — 3 in either case. But the expected fraction of girls in the family would be 75%. Moral: Just because two variables have an expected difference of zero, you can’t conclude they have an expected ratio of one. That needs to be computed separately.
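The block example can be checked in a few lines. The sketch below is only an illustration: the list `families` and the variable names are made up here, with each family written as a (girls, boys) pair taken from the numbers above.

```python
from fractions import Fraction

# Each tuple is (girls, boys) for one family on the block described above.
families = [(4, 0), (4, 0), (4, 0), (0, 12)]

total_girls = sum(g for g, _ in families)
total_boys = sum(b for _, b in families)

# Fraction of girls among all children on the block (ratio of the totals).
pooled_fraction = Fraction(total_girls, total_girls + total_boys)

# Expected fraction of girls in a family picked uniformly at random
# (mean of the per-family ratios).
mean_family_fraction = sum(Fraction(g, g + b) for g, b in families) / len(families)

print(total_girls, total_boys)   # 12 12
print(pooled_fraction)           # 1/2
print(mean_family_fraction)      # 3/4
```

The two numbers differ because the second one gives every family the same weight, whatever its size.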

Edit: This came up in comments, and it might be worth adding here: This example in no way relies on there actually being four families. Suppose there’s just one family, that randomly decides whether to adopt four girls (with probability 75%) or twelve boys (with probability 25%). In that family, the expected number of girls equals the expected number of boys, and the expected fraction of girls is still 75%. The country in the original problem is drawn randomly from a universe of possible countries, just as this family is drawn randomly from a universe of possible families.

So that explains why the “obvious” argument is wrong, and that, to me, is really the interesting part. But of course we’re not done until we find the right argument. That’s a bit trickier, and it depends on the country’s population. I’ll start with the case where there’s just one couple. Here are some possible family configurations, with their probabilities:

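1/2 B
1/4 GB
1/8 GGB
1/16 GGGB

and so on. From this we see that the expected number of boys is

(1/2)(1) + (1/4)(1) + (1/8)(1) + (1/16)(1) + …

which adds to 1. And the expected number of girls is

(1/2)(0) + (1/4)(1) + (1/8)(2) + (1/16)(3) + …

which also adds to 1. Sure enough, the expected number of girls is equal to the expected number of boys.

But the expected fraction of girls is

(1/2)(0) + (1/4)(1/2) + (1/8)(2/3) + (1/16)(3/4) + …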

which adds to 1-log(2), or about 30.6%.
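For readers who want to check the three sums numerically, here is a minimal sketch. It assumes the one-couple setup in which a family with n girls followed by one boy occurs with probability (1/2)^(n+1); the function name `one_family_sums` and the cutoff `max_girls` are illustrative choices, not part of the original argument.

```python
import math

def one_family_sums(max_girls=200):
    """Truncated sums over families with n girls and one boy.

    Assumption: a family with n girls has probability (1/2)**(n + 1).
    """
    exp_boys = exp_girls = exp_fraction = 0.0
    for n in range(max_girls + 1):
        p = 0.5 ** (n + 1)
        exp_boys += p * 1                # every completed family has one boy
        exp_girls += p * n               # n girls in this configuration
        exp_fraction += p * n / (n + 1)  # fraction of girls in this family
    return exp_boys, exp_girls, exp_fraction

boys, girls, fraction = one_family_sums()
print(boys, girls, fraction, 1 - math.log(2))
```

The closed form for the last sum comes from writing n/(n+1) as 1 - 1/(n+1) and using the series x + x²/2 + x³/3 + … = -log(1-x) at x = 1/2.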

For a population of k families, a similar calculation gives an answer of approximately (but not exactly) (1/2) – (1/4k), which, when k is large, is approximately (but not exactly) 1/2.
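The k-family claim can be explored with a short numerical sketch. It rests on assumptions that should be read as such: every one of the k families is complete (exactly one boy each), so the total number of girls g has probability C(g+k-1, g)·(1/2)^(g+k). The function name and the truncation point are hypothetical.

```python
def expected_girl_fraction(k, max_girls=5000):
    """Expected fraction of girls in a country of k completed families.

    Assumes each family stops at its first boy, so there are exactly k boys
    and the total number of girls follows a negative binomial distribution.
    The infinite sum is truncated at max_girls.
    """
    p = 0.5 ** k                 # P(G = 0): every family's first child is a boy
    total = 0.0
    for g in range(max_girls + 1):
        total += p * g / (g + k)
        p *= 0.5 * (g + k) / (g + 1)   # turns P(G = g) into P(G = g + 1)
    return total

for k in (1, 2, 10, 100):
    print(k, expected_girl_fraction(k), 0.5 - 1 / (4 * k))
```

Under these assumptions the exact value stays below 1/2 for every k, and the gap to (1/2) – (1/4k) shrinks as k grows.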

Several of our commenters were on to various aspects of this, and some were on to pretty much all of it. In no particular order let me acknowledge Vic, DaveB, KenB, ThomasBayes, loveactuary, JonathanCampbell, Brett, wellplacedadjective, JonathanKariv, and mobile — and let me apologize for anyone I’ve inadvertently omitted (with so many comments to digest, I’m sure there are a few). But above all, a humble tip of the hat to the mathematician and backgammon expert Douglas Zare who inspired this post with his brilliant exposition over at MathOverflow (his is the first of the several answers). (Warning: Depending on your technical background, you will find his explanation either perfectly illuminating, perfectly indecipherable, or somewhere in between.)

Now go enjoy your holiday. If your relatives like this kind of thing, you can share it with them. If not, you can use it to get them to leave you alone.

280 Responses to “A Big Answer”


1 David Sloan
December 22, 2010 at 3:30 am

I respectfully disagree with your analysis; I believe your error is one of units.

Consider a car that travels at 30mph for one mile, then at 60mph for another mile. Each of these speeds is a ratio (miles per hour), and if you were to average them by computing a weighted average over each mile traveled, you would get 45mph. However, the car’s actual average speed is (2 miles / 3 minutes) = 40mph.

If you had instead computed a weighted average over each hour traveled, you would have gotten the correct answer: (2min * 30mph + 1min * 60mph) / 3min = 40mph

In short, (I think) it matters that the weights be the same units as the denominators of your ratios. It worked when we used weights expressed in time, but not when we used weights expressed in distance.
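The car arithmetic above can be reproduced as follows. This is just a sketch of the two weightings described in the comment; the list `legs` and the variable names are illustrative.

```python
# Two one-mile legs, as in the example: (distance in miles, speed in mph).
legs = [(1.0, 30.0), (1.0, 60.0)]

total_distance = sum(d for d, _ in legs)
total_time = sum(d / v for d, v in legs)   # hours spent driving

# Weighting each leg's speed by distance gives the misleading figure.
distance_weighted = sum(d * v for d, v in legs) / total_distance

# Weighting each leg's speed by the time spent on it gives distance / time.
time_weighted = sum((d / v) * v for d, v in legs) / total_time

print(distance_weighted)   # 45.0
print(time_weighted)       # close to 40, up to floating-point rounding
```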

Your gender ratios are expressed in units of girls/children, but your weights are expressed in units of families (or boys; since every family contains exactly one boy in this model, the two are identical).

When you use the correct weights (number of children in the family), you get:
(1/2 * 1child * 0girl/1child) + (1/4 * 2child * 1girl/2child) + (1/8 * 3child * 2girl/3child) + …
= 1/2 * 0girl + 1/4 * 1girl + 1/8 * 2girl + …
= 0.5girl

Thus, the expected ratio of girls in the population is still 0.5.
2 Dick Darlington
December 22, 2010 at 4:44 am

There is a hidden assumption in your solution, which is: at the moment we are looking at this population, each family contains exactly one boy and a certain number of girls (distributed as a geometric variable with parameter 1/2). In other words, every family ever created has completed its “reproductive cycle”. In yet other words: in this population, every woman has a brother.

For such a population, formed of independent “complete” families, your computation is certainly correct.

Problem is, if one looks at the population at a given time, this situation is very unlikely, as a certain fraction of the couples would be in the process of having girls while waiting for a boy. By the time they all have a boy each, new families will be created, etc. In other words: at any point in time, there will be a certain number of women who don’t have a brother.

In _that_ setup, the “official” argument giving 50/50 is correct.

I didn’t do the computation, but I would bet that the proportion of women with no brothers, or the proportion of “incomplete” families, at a time when the population is of size k, is of order 1/k, which explains the discrepancy between the outcomes in the two setups.

Now, which of these models corresponds to your initial question is something you will have to tell us, as it is not clear from the wording (as is almost always the case in such riddles).
3 David Sloan
December 22, 2010 at 4:49 am

Another flaw: when you give the formula (1/2) – (1/4k), you say “For a population of size k…”, but the linked explanation says “With k families…”.

With k families, this does work out, assuming you accept the validity of computing a weighted average of ratios without using the same units for weights and denominators (as I object to in my previous post).

With k *children*, the math is much simpler. No matter how you select the k children (chronological order of birth, taking as many children as needed from each of a list of parents, etc) each of the k children was equally likely to be born a boy or a girl, and thus total fraction of girls is 0.5. This is exactly the argument you claim is wrong, but you didn’t actually refute it; you just provided a different argument that gets a different answer.

How can the ratio differ depending on whether k refers to the number of families or the number of children? As Douglas Zare correctly claims in his opening paragraph over on MathOverflow: “The proportion of girls in one family is a biased estimator of the proportion of girls in a population consisting of many families because you are underweighting the families with a large number of children.” Those underweighted families are those with a large number of girls, and so you undercount girls by sampling families instead of children.

Why was it a good idea in the first place to use a biased estimator of the actual quantity we care about (“fraction of the population”), when we can compute the quantity directly?
4 David Sloan
December 22, 2010 at 5:10 am

Whoops. In my first post, I flubbed the addition on the very last step. That infinite sum is = 1 girl (not 0.5 girls), the expected number of girls per family. Since every family has a boy, this is what I should have expected.
5 Henry
December 22, 2010 at 6:19 am

I am not convinced. Your answer looks like mathematical sleight-of-hand to me. Now, the girl named Florida problem was similarly counter-intuitive at first, so I’m open to the possibility I’m missing something subtle. But to convince me, you’ll need to address Doug’s point from the previous thread:

Also, this is just a modified version of gambler’s ruin.

http://en.wikipedia.org/wiki/Gambler’s_ruin

If the answer was anything other than 50% one could easily outperform the stock market (or any martingale for that matter). Just invest some money on Day 1 and keep it in the market until you’ve had one up day. If your expected number of up days was greater than 1/2, you have now just beat the market and are on the road to riches. (If it’s less just short).

What’s the difference between the two?
7 Dave B
December 22, 2010 at 6:50 am

Henry: With regard to Doug’s point, I think that whether you beat the market depends on the expected difference between the number of up days and down days, which, as was shown, is zero. So the fact that the expected ratio of ups to downs is greater than 50% does not mean you have beaten the market.

(This ignores the issue of the amount going up being the same as the amount going down etc. which would make the comparison with the stock market meaningless anyway. I am assuming Doug’s point is referring to a fair coin tossing game of some sort.)
8 Steve Landsburg
December 22, 2010 at 7:08 am

David Sloan:

you say “For a population of size k…”, but the linked explanation says “With k families…”.

Right. I meant to say “For a population with k families….”. I’ve corrected this in the post. Thanks for catching this.

each of the k children was equally likely to be born a boy or a girl, and thus total fraction of girls is 0.5.

This is exactly wrong.

It is true that each of the k children was equally likely to be born a boy or a girl, and thus the expected difference between boys and girls is zero. It does not follow that the expected fraction of girls is .5. You’ve made no attempt to provide an argument for this, and in fact no valid argument exists.

Edited to add: You want to assume a fixed number of children, but this is quite contrary to the spirit of the problem. Presumably you want to observe this country at some fixed moment in time, and at any given moment, the number of children is a random variable.
9 Steve Landsburg
December 22, 2010 at 7:14 am

Dick Darlington:

In _that_ setup, the “official” argument giving 50/50 is correct.

The “official” argument cannot give 50/50 because the official argument never even attempts to address the question that was asked. It addresses the expected difference, not the expected ratio. It *does not even offer an argument* for a 50/50 ratio (or any other). And in fact, *whether or not* some families are still reproducing, the correct ratio is *not* 50/50.

So — to compute the correct ratio, you do need to make some assumptions. I made certain assumptions; you’re arguing for others. That’s fine. But there are *no* assumptions under which 50/50 is the correct answer.
10 Steve Landsburg
December 22, 2010 at 7:15 am

Henry:

If the answer was anything other than 50% one could easily outperform the stock market (or any martingale for that matter).

This argument fails precisely because the expected fraction of girls is not a martingale.
11 Harold
December 22, 2010 at 7:16 am

I am a little confused. I don’t get the distinction between number and fraction. What is the fraction of girls in your apartment block?

There are 12 boys and 12 girls. The simplest way to calculate the fraction of girls is to divide number of girls by total children = 0.5. The fraction of girls is therefore 0.5.

A different way is to calculate the average fraction of girls in a family, = (1+1+1+0)/4 = 0.75.

If you pick a random family, the fraction of girls expected is 0.75, but that was not the question asked, which was what is the fraction of girls in the block? To my mind, if you use the random family method, you get the “wrong” answer (because there is no account taken of family size.)

Are you saying the fraction of girls in your block is 0.75?

In the countries question, you seem to be saying that there will be equal numbers of boys and girls, but the fraction of girls will not be 0.5, which I don’t quite understand.
12 Steve Landsburg
December 22, 2010 at 7:22 am

Harold: The fraction of girls in my apartment block happens to be 50%. But if you choose a family at random, the expected fraction in that family is not 50%.

Now suppose there was just one family in my apartment block, which drew a ball from an urn to tell it whether to adopt four girls or twelve boys. They adopt four girls with probability 3/4 or twelve boys with probability 1/4.

That would be the only existing family. Based on the information you’ve got, the expected ratio in that family is 75%, although the actual ratio may turn out to be either 100% or 0%.

Likewise with countries. A given country is a draw from a distribution of possible countries. The actual ratio, if you lump all those hypothetical possible countries together, is 50%. But the ratio in any *particular* country is not 50%, either in actuality or in expectation.
13 Onus Probandy
December 22, 2010 at 7:29 am

I call shenanigans.

The question was “what fraction of the population is female?”

You have answered “what fraction of a randomly chosen family is female”
14 Steve Landsburg
December 22, 2010 at 7:40 am

Onus:

The question was “what fraction of the population is female?”

You have answered “what fraction of a randomly chosen family is female”

No!!! I have answered “In expectation, what fraction of the country is female?”, or in other words, what fraction of a country, randomly chosen from a universe of such countries, should we expect to be female?

The families in the example I gave are analogous to the countries in the original question. The explicit calculation that I gave deals with the countries, not the families.
15 Onus Probandy
December 22, 2010 at 7:52 am

Steve:

Your universe of countries is one in which each country is made up of the same family type. So Country A is all “B”, country B is all “GB”, country C is all “GGB”, etc.

If I randomly chose from those countries, then fair enough, your answer is correct. That wasn’t the question though.
16 Steve Landsburg
December 22, 2010 at 7:53 am

Onus Probandy:

Your universe of countries is one in which each country is made up of the same family type. So Country A is all “B”, country B is all “GB”, country C is all “GGB”, etc.

That is absolutely not true. Did you bother to look at the calculation before you claimed this?

(To elaborate: I started with an example in which each “country” consists of exactly one family, to illustrate what’s going on. Obviously, with one family, there’s one family type. Then I observed that if there were two or three or k families, you’d need a subtler calculation, which would fail to yield 50% for exactly the same reasons. That calculation does not assume one family type per country. You can see more technical details in the Douglas Zare post that I linked to.)
17 Thomas Bayes
December 22, 2010 at 8:35 am

One nitpick with Douglas Zare’s answer:
—
“It is not enough to argue that the expected number of boys equals the expected number of girls, since we want E[G/(G+B)]≠E[G]/E[G+B]. Expectation is linear, but not multiplicative for dependent variables, and G and G+B are not independent even though G and B are.”
—
Independence (or, more precisely, correlation) isn’t the only issue. Even for independent variables, the expected value of a ratio is not equal to the ratio of the expected values. (The expected value of a product of uncorrelated variables is the product of the expected values, though.) This is one of the most important keys to understanding this problem, I believe. And this is why I suggested the Taylor series to expand the ratio about its mean. I also think it is a little easier to find the expected proportion of boys because the random part (G) only appears in the denominator. Also, B is equal to the number of mothers, so I don’t believe B and G are independent because I don’t believe the number of girls is independent of the number of mothers.

Overall, though, Douglas Zare’s approach is a good one.
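The point about independent variables can be illustrated with a tiny exact example. The values 1 and 3 are an arbitrary choice made here for illustration: X and Y are independent and each takes either value with probability 1/2.

```python
from fractions import Fraction
from itertools import product

# X and Y are independent, each equal to 1 or 3 with probability 1/2.
values = [Fraction(1), Fraction(3)]
pairs = list(product(values, repeat=2))   # four equally likely (x, y) pairs

mean_x = sum(x for x, _ in pairs) / len(pairs)
mean_y = sum(y for _, y in pairs) / len(pairs)
mean_ratio = sum(x / y for x, y in pairs) / len(pairs)
mean_product = sum(x * y for x, y in pairs) / len(pairs)

print(mean_ratio, mean_x / mean_y)     # 4/3 1
print(mean_product, mean_x * mean_y)   # 4 4
```

The product factors as expected, but the ratio does not, even though the two variables are independent.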
18 Steve Landsburg
December 22, 2010 at 8:45 am

Thomas Bayes: Excellent points. Thanks for this.
19 Onus Probandy
December 22, 2010 at 8:51 am

I did read your calculation. I’m not convinced it’s justified (I’m happy to be wrong, I’m only an Internet lurker, not a mathematics professor, but at present, I don’t follow the logic).

If you say my analysis of your calculation is wrong, then here is a secondary question: what calculation would you do to answer the question you say you haven’t answered?

What fraction of a randomly chosen family is female?

One half of the families are 0%
One quarter are 50%
One eighth are 66%
..etc…

I can pick any of these families at random. Half of my selections will return a 0% family, one quarter of my selections will return a 50% family, etc.

But hold on… that’s your calculation. The one you say you haven’t done.

(Maybe I’ve made two mistakes, and they are cancelling out)

My guess at the fault in your calculation is that your selection is from a pool of variable sized families. The “family” is your unit. Your question is phrased so that you should be randomly selecting individuals. You should really be scaling each term by “proportion of population represented by this group of families” as well, since e.g. 1/8 of the families are three times bigger than 1/2 of the families.

Except that you’ve got another fault… you’ve completely ignored the parents. There is at least “GB” in every configuration. The largest part of your population is the one most affected by this, since it turns “B” into “GBB”. Perhaps you’re allowing for that by assuming that every parent is also a child… I might have missed that bit of reasoning, if so, forget this paragraph.
20 Onus Probandy
December 22, 2010 at 8:58 am

Looking above, I think I’m making the same argument as David Sloan is in the first comment. Essentially: your units mismatch.

Half of families doesn’t equal half of the population.
21 Steve Landsburg
December 22, 2010 at 9:13 am

Onus:

You write: The “family” is your unit.

That’s where you’re confused. The *country* is my unit. The analysis depends on the number of families in that country. I’ve done the detailed calculation in the special case where the country has one family. But I’ve also pointed out that the same principles underlie the calculation you’d do in a country with two or three or k families.

So my snippy “Did you read the calculation?” was too harsh, because of course the calculation does apply only to a single family. But that’s not because I’m choosing the family as a unit; it’s because I’m choosing a country-with-one-family as a unit. And I’m pointing out that you’d get a similar result if you used a country-with-two-families or a country-with-100-families as a unit. You can find details in the Zare post that I linked to, or in the excellent analysis by Thomas Bayes in yesterday’s thread.
22 Steve Landsburg
December 22, 2010 at 9:15 am

Onus:

PS: Yes, I relied on each parent also being a child. If you want to drop this and start with a generation of Adams and Eves, the calculation will be slightly different. Tweaking the assumptions changes the outcome. But there are no reasonable assumptions under which the outcome is 50%.
23 Ken B
December 22, 2010 at 9:17 am

@Thomas Bayes:
This is off topic but I infer you might be a Bayesian. Do you (or Steve) know any good prose explanations why Bayesianism is the correct approach? I don’t want a textbook, I want an apologia pro vita bayesian. I had a good frequentist upbringing but have had heretical doubts for a while now.
Thanks in advance.
24 Steve Landsburg
December 22, 2010 at 9:23 am

Ken B: If you can find it, Howard Raiffa’s hard-to-find white paperback book on Decision Analysis is likely to be exactly what you’re looking for.
25 Harold
December 22, 2010 at 9:35 am

Your question is: “That is, in a large number of similar countries, what would be the average proportion of females?” To which the answer is not 50%. And also: “The actual ratio, if you lump all those hypothetical possible countries together, is 50%.”

I am struggling to understand the difference between “lump all possible countries together” and “average”.

There are two possible questions.
1) What is the fraction of girls in all the countries added together?
2) What is the expected fraction of girls in a particular country?

“What fraction of the population should we expect to be female?” seems to be Q1, and “in a large number of similar countries, what would be the average proportion of females?” I read as Q2.

In the single family example, I can see that the expected fraction of girls is 0.75, and the expected number of girls equals the expected number of boys. If I were to ask “What is the expected fraction of girls for a single family” then 0.75 seems right. But “in a large number of such families, what is the average proportion of females?” 0.5 seems right, as I would feel it necessary to include the information that each family of boys has 12 members, and girls only 4.

The key is that there are two questions, and one must answer the right one.
26 Henry
December 22, 2010 at 9:37 am

I agree with this answer now. My attempt to resolve the martingale paradox (and possibly help others with understanding it) runs as follows:

Suppose every family in this country also bets $1 every time they conceive a child that it will turn out to be a boy. We decide to look at the proportion of bets each family loses. Half of the families, with a single boy, lose 0% of their bets. A quarter of the families, with a girl and boy, lose 50% of their bets. One eighth lose 75% of theirs bets, and so on.

Using the same summation as above, we can calculate the average number of bets a family loses (i.e. the number of girls it has): ~30.6%. This means the country came out ahead in its betting, right? Wrong. The families that lost the highest proportion of bets also made the most bets in the first place. A family that lost 75% of its bets lost $2, a family that lost 90% lost $8 and a family that lost 95% lost $18.

We can see that the relationship between the number of bets a family loses and the proportion of bets it loses is non-linear. An increase in the former results in smaller and smaller increases in the latter. Thus, the heavily losing families do add as much to the population’s losing proportion as they do to its absolute level of losses.

These families can all represent different probabilistic states of the world for a single family. In the states with large families, they do not the same relative amount of “credit” for a high proportion as they do for a high absolute number of girls. Hence, the proportion is lower than 50%.
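The betting story lends itself to a quick simulation. The sketch below is hedged accordingly: it assumes fair, independent births and families that stop at their first boy, and the helper `simulate_family`, the seed, and the sample size are arbitrary illustrative choices.

```python
import random

def simulate_family(rng):
    """Return (bets_lost, bets_made) for one family that stops at its first boy."""
    girls = 0
    while rng.random() < 0.5:   # a girl: the $1 bet on "boy" is lost
        girls += 1
    return girls, girls + 1     # the final birth is the boy, a winning bet

rng = random.Random(0)
families = [simulate_family(rng) for _ in range(100_000)]

# Average over families of the proportion of bets each one lost.
mean_proportion_lost = sum(lost / made for lost, made in families) / len(families)

# Proportion of all bets in the country that were lost.
pooled_proportion_lost = sum(l for l, _ in families) / sum(m for _, m in families)

# Average net winnings per family in dollars (bets won minus bets lost).
mean_net_winnings = sum((made - lost) - lost for lost, made in families) / len(families)

print(mean_proportion_lost, pooled_proportion_lost, mean_net_winnings)
```

In runs of this kind the per-family proportion should come out well below one half while the pooled proportion sits near one half and the net winnings near zero, which is the distinction the comment draws.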
27 Onus Probandy
December 22, 2010 at 9:41 am

That’s where you’re confused. The *country* is my unit.

Gulp. I don’t dare say that you’re confused.

The country is manifestly not your unit, you haven’t given probability distributions for countries. The table you give is in terms of families:

Here are some possible family configurations, with their probabilities:

You go on to sum the probabilities in this table (which you have stated are family probabilities) with weightings by how many girls are in each family (still no countries). You are sampling from families, not countries. I accept that you go on to modify for a finite country with k families, but that’s merely a parameter telling you how far to sum along that infinite series; but I am talking about your 1-log(2) answer, and ignoring the k version.

Your population is a set of balls. Half of those balls have “B” written on them, a quarter has “GB” written on them, etc. In your analysis you choose from those balls at random, and have calculated the expected ratio of “G” to “B” on that randomly chosen ball. The question is nothing to do with the balls though. The question asks how many “G”s there are.

You keep saying that that isn’t what you’ve calculated, in which case tell me how you would calculate the ratio of G to B on a randomly chosen ball. You expect that calculation to be different from your current answer, I do not.

Slight aside:

But there are *no* assumptions under which 50/50 is the correct answer.

What about the assumption that the method of choosing to have another child is whatever we have right now on this planet? Doesn’t that come out at 50/50?
28 Henry
December 22, 2010 at 9:44 am

I should proofread more.

“One eighth lose 75% of their bets, and so on.”
“Thus, the heavily losing families do not add as much to the population’s losing proportion as they do to its absolute level of losses.”
“In the states with large families, they do not get the same…”
29 Ken B
December 22, 2010 at 9:58 am

Moral: read puzzle questions carefully! The puzzle is not about expected number of female births but about the expectation of a fraction.

Lots of puzzles are like this. The archetype is “As I was going to St Ives ….”
30 Steve Landsburg
December 22, 2010 at 10:01 am

Harold:

1) What is the fraction of girls in all the countries added together?
2) What is the expected fraction of girls in a particular country?

Let me try an analogy.

There are two ways a coin can land. Heads or tails. Now I ask you to imagine two coin-flippers. There are two ways you could interpret that:

1) Imagine two coin flippers, each of whom independently flips a coin. With probability 25%, they both flip heads, with probability 50% they flip a head and a tail, etc.

2) Imagine two coin flippers who span the set of all possible outcomes. One of them flips a head and one flips a tail.

Those are different scenarios.

When you talk about “adding up the girls in all the countries”, you must decide what you mean by “all the countries”. Is each country performing the same random experiment separately, or are these a set of representative countries that span all the possible outcomes? In the first case, we get a probability distribution of answers; in the second case, we get a single answer.

The problem asks for the expected fraction-of-girls in a single country with (say) 10 families that has gone through the random process once. If you add together 10 such countries, you’ve got the equivalent of a single country with 100 families and you have not substantially changed the problem.
31 Steve Landsburg
December 22, 2010 at 10:02 am

Henry: Bingo. This is exactly right.
32 Steve Landsburg
December 22, 2010 at 10:09 am

Onus Probandy:

Your population is a set of balls. Half of those balls have “B” written on them, a quarter has “GB” written on them, etc. In your analysis you choose from those balls at random, and have calculated the expected ratio of “G” to “B” on that randomly chosen ball. The question is nothing to do with the balls though. The question asks how many “G”s there are.

Let’s do this for a two-family country. The two families have the following configurations with the following probabilities:

1/4 B/B
1/8 B/GB
1/8 GB/B
1/16 GB/GB
1/16 GGB/B

Etc. That gives me a 1/4 chance of 0% girls, a 1/8 chance of 1/3 girls, a 1/8 chance of 1/3 girls, a 1/16 chance of 1/2 girls, a 1/16 chance of 1/2 girls, etc. I can add all this up and get an answer that is not 1/2. (In fact it is log(4)-1, which is less than .4).

Here you can see that my unit is a two-family country. I can do the same thing for a ten-family country and get a different answer, which still won’t be 1/2.
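The two-family sum can be checked by brute-force enumeration. This is only a sketch: it assumes the two families are independent, each with n girls and one boy with probability (1/2)^(n+1), and it truncates the enumeration at a hypothetical cutoff `max_girls`.

```python
import math

def two_family_expected_fraction(max_girls=60):
    """Expected fraction of girls in a country made of two completed families."""
    total = 0.0
    for g1 in range(max_girls + 1):        # girls in the first family
        for g2 in range(max_girls + 1):    # girls in the second family
            p = 0.5 ** (g1 + 1) * 0.5 ** (g2 + 1)
            # Two boys in total, one per family.
            total += p * (g1 + g2) / (g1 + g2 + 2)
    return total

print(two_family_expected_fraction(), math.log(4) - 1)
```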

What about the assumption that the method of choosing to have another child is whatever we have right now on this planet? Doesn’t that come out at 50/50?

That assumption is not consistent with any interpretation of the stopping rule given in the problem. There are no assumptions consistent with that stopping rule that will give you an answer of 50%.
33 Onus Probandy
December 22, 2010 at 10:13 am

Henry: Bingo. This is exactly right.

Henry is right.

Henry: Using the same summation as above, we can calculate the average number of bets a family loses (i.e. the number of girls it has): ~30.6%.

The number of bets the family loses. Not “the number of bets lost” which is the original question rephrased.
34 Steve Landsburg
December 22, 2010 at 10:24 am

Onus:

The number of bets the family loses.

You continue to ignore the twin facts that a) we can use families as *analogues* for countries. If an argument that seems airtight on the country level makes wrong predictions at the family level, then the argument is not airtight. And b) it is perfectly legitimate to ask what happens in a one-family country.

(Also c) the same phenomena that both Henry and I are talking about *do* happen in multi-family countries. It’s just easier to illustrate them in the one-family case.)
35 Tom
December 22, 2010 at 10:34 am

David Sloan and Onus have it right. Each term in the final displayed equation needs to be weighted by the number of children in the family. Otherwise the ratio in large families gets underweighted in the population.

Please look carefully at David Sloan’s units argument.

(I’m assuming the question we’re really asking is what’s the m/f ratio among the kids; that’s what we’re solving for in the equation.)
36 Henry
December 22, 2010 at 10:36 am

Henry: Using the same summation as above, we can calculate the average number of bets a family loses (i.e. the number of girls it has): ~30.6%.

The number of bets the family loses. Not “the number of bets lost” which is the original question rephrased.

Oops, poor wording on my part – this should read average proportion of bets the family loses (i.e. the proportion of girls it has) is ~30.6%.
37 Tom
December 22, 2010 at 10:38 am

But the expected fraction of girls in the family would be 75%. Moral: Just because two variables have an expected difference of zero, you can’t conclude they have an expected ratio of one. That needs to be computed separately.

Sure, but the expected fraction of girls in a randomly-selected family isn’t what the problem statement asks for.
38 38 Steve Landsburg
December 22, 2010 at 10:40 am

Tom:

David Sloan and Onus have it right. Each term in the final displayed equation needs to be weighted by the number of children in the family.

Only if you’re asking a completely different question than the puzzle is asking.
39 39 Steve Landsburg
December 22, 2010 at 10:42 am

Tom:

Sure, but the expected fraction of girls in a randomly-selected family isn’t what the problem statement asks for.

The problem asks for the expected fraction of girls in a randomly-selected country. The calculation shows that the answer is not 50% in a randomly-selected one-family country. Similar calculations show that the answer is not 50% in a two-family, three-family or k-family country. No legitimate argument has been offered by anyone to suggest that the answer is 50% in any reasonable scenario, and in fact no such argument is possible, because there is no reasonable scenario in which the answer is 50%.
40 40 Tom
December 22, 2010 at 11:01 am

Steve Landsburg,

Steve,

No, I’m sorry, the puzzle is asking for the m/f ratio in the population. Not the m/f ratio of a family selected at random from a list of families. The latter is what you calculated.

Here: With your problem statement in hand, I walk into your neighborhood, from your example. I ask all the girls to stand on my left and the boys on my right. By counting them I can determine what fraction of the population of your neighborhood is female. I count them and I get 50%.
41 41 Ken B
December 22, 2010 at 11:05 am

@Tom:
I think you misread what Steve’s example shows. Think of each family as a possible example of a country, or if you like as a possible future state of a one-couple country. Or look back at my example where I analyzed 4 cases (four futures) for a two-couple country. In each case the fraction g/(b+g), or if you prefer g/b, is different in equally likely possible futures. So you need to average them (more generally, find their expectation). But the average of fractions is not the fraction of averages.
42 42 Tom
December 22, 2010 at 11:14 am

Ken B,

What is the ratio of males to females in Steve’s neighborhood?
43 43 Steve Landsburg
December 22, 2010 at 11:21 am

Tom:

What is the ratio of males to females in Steve’s neighborhood?

My neighborhood happens to have exactly 3 families of Type A and 1 family of Type B. The ratio of males to females in that neighborhood is not the same as the expected ratio in a family drawn randomly from that neighborhood.

The neighborhood is the analogue of the universe of possible scenarios that could play out in countries where people follow a certain stopping rule. The individual families are the analogues of individual countries in that universe.

The problem asks for the expected ratio in *one country*, not in the sum of *all possible countries*. By analogy, you need to look at the expected ratio of males to females in *one of the families on my block*, not in the sum of all families put together. So the question you are asking is utterly irrelevant to the original puzzle.
44 44 Steve Landsburg
December 22, 2010 at 11:24 am

Tom:

PS: If we were summing over all *possible* countries, as you seem to want to do, then there would be no “expectation” about it — there would be a ratio, pure and simple. But what’s being asked for here is an *expected* ratio in one country.

And PPS: Ken B’s most recent comment nails exactly the point you’re confused about. The average of fractions is not the fraction of averages.
45 45 Onus Probandy
December 22, 2010 at 11:24 am

I’m beginning to see. Thanks for sticking with me.

Etc. That gives me a 1/4 chance of 0% girls, a 1/8 chance of 1/3 girls, a 1/8 chance of 1/3 girls, a 1/16 chance of 2/3 girls, a 1/16 chance of 1/2 girls, etc. I can add all this up and get an answer that is not 1/2. (In fact it is log(4)-1, which is less than .4).

I follow this.

But it doesn’t match up with your equation at the very top.

x = 1/2 * 0 + 1/4 * 1/2 + 1/8 * 2/3 + …

Surely when there are k copies of the probability distribution given in the table, the result when multiplied together is more complex than just the probability distribution from the first table?
(perhaps my maths is letting me down, that it’s a well known result I don’t know).
46 46 Steve Landsburg
December 22, 2010 at 11:27 am

Onus: The equation you are quoting is the right equation for a one-family country. The new computation is correct for a two-family country. You have to decide in advance what size country you’re talking about before you can start doing computations. Different sized countries will give different computations, and hence different answers — but will never give the answer 1/2.
47 47 Thomas Bayes
December 22, 2010 at 11:27 am

For those of you having trouble with this, here is a simple, but exaggerated, example to emphasize one of the key issues.

Let X be a uniform random variable from the interval 1 to 2.
Let Y be a uniform random variable from the interval 1 to 2.
Let X and Y be statistically independent.

What is the expected value for X?
What is the expected value for Y?
What is the expected value for the ratio X/Y?
What is the expected value for the ratio Y/X?

Even though the expected value for X is equal to the expected value for Y, the expected value for their ratio is 3*log(2)/2, which is bigger than 1. However, the probability that their ratio is larger than 1 is 1/2. This means that even though the expected value of the ratio is about 1.04, the ratio is equally likely to be larger or smaller than 1. Mean, median, and mode do not have to be the same.

So, if you are asked “is the expected value for X larger than the expected value for Y?”, your answer should be “no”. But if you are asked “is the expected value for the ratio of X to Y larger than 1?”, your answer should be “yes”. And if you are asked “is the expected value for the ratio of Y to X larger than 1?”, your answer should also be “yes”. But if you are asked to bet on whether or not the ratio of X to Y would be larger or smaller than 1, it wouldn’t matter which way you bet. The ratio is equally likely to be larger or smaller than 1.

In a similar way, expecting an equal number of boys and girls is not the same as expecting the proportion of boys (or girls) to be 1/2.
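A quick numerical check of the X/Y example, offered as a minimal sketch rather than as part of the original comment: it draws independent uniform samples on [1, 2] with Python's standard `random` module and compares the sample mean of X/Y with 3*log(2)/2 (natural log). The function name `ratio_stats`, the sample size, and the seed are illustrative choices.

```python
import math
import random


def ratio_stats(n_samples, seed=0):
    """Return (sample mean of X/Y, share of samples with X/Y > 1)."""
    rng = random.Random(seed)
    total = 0.0
    above_one = 0
    for _ in range(n_samples):
        x = rng.uniform(1.0, 2.0)
        y = rng.uniform(1.0, 2.0)
        total += x / y
        if x > y:
            above_one += 1
    return total / n_samples, above_one / n_samples


# E[X/Y] = E[X] * E[1/Y] = (3/2) * ln(2) for independent X and Y.
exact = 1.5 * math.log(2)
estimate, share_above_one = ratio_stats(200_000)
print("exact mean of X/Y:    ", exact)
print("simulated mean of X/Y:", estimate)
print("share with X/Y > 1:   ", share_above_one)
```

The simulated mean should land near 1.04 while the share of draws with X/Y > 1 stays near one half, which is the mean-versus-median distinction drawn above.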

For this boy-girl problem with M mothers:

1) The expected number of boys is equal to the expected number of girls.

2) The expected proportion of boys is equal to 1/2 + 0.25/M

3) The probability of having an equal number of boys and girls is roughly equal to .2821/sqrt(M).

4) The probability of having more boys than girls is roughly equal to 0.5

5) The probability of having more girls than boys is roughly equal to 0.5 – .2821/sqrt(M)
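The figures in this list can be checked exactly, as a hedged sketch under one stated assumption: every one of the M mothers has finished, so there are exactly M boys and the total number of girls G follows a negative binomial distribution, P(G = g) = C(g+M-1, g) / 2^(g+M). The names `girls_pmf` and `summary` and the truncation point `max_girls` are illustrative, not from the original comment.

```python
from fractions import Fraction
from math import comb, sqrt


def girls_pmf(m, g):
    """P(exactly g girls in total) when m mothers each stop at their first boy."""
    return Fraction(comb(g + m - 1, g), 2 ** (g + m))


def summary(m, max_girls=400):
    # Expected proportion of boys, truncating the infinite sum (assumption:
    # the tail beyond max_girls is negligible for the m used here).
    boy_share = sum(
        girls_pmf(m, g) * Fraction(m, m + g) for g in range(max_girls + 1)
    )
    p_equal = girls_pmf(m, m)                             # as many girls as boys
    p_more_boys = sum(girls_pmf(m, g) for g in range(m))  # fewer than m girls
    return float(boy_share), float(p_equal), float(p_more_boys)


m = 10
boy_share, p_equal, p_more_boys = summary(m)
print("E[proportion of boys]:", boy_share, "vs 1/2 + 0.25/M =", 0.5 + 0.25 / m)
print("P(equal numbers):     ", p_equal, "vs .2821/sqrt(M) =", 0.2821 / sqrt(m))
print("P(more boys):         ", p_more_boys)
```

Under this model the "roughly" in items 2 and 3 matters for small M (for M = 1 the expected proportion of boys is log(2), about 0.693, not 0.75), while the probability of more boys than girls comes out at exactly one half.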
48 48 Tom
December 22, 2010 at 11:37 am

Steve,

Sorry, I didn’t quite catch the number you got for the numerical m/f ratio in your neighborhood. But that’s ok, I see you gave a number in your post, 30.6% girls. So the m/f ratio you get from your analysis is more than 2:1 male:female.

This is an excellent test case for your analysis, since we can see that the actual value is 1:1 for your example. Equal numbers of girls and boys, as you posted.

If your formula gave 1:1, great. But you say in your post that your formula gives more than 2:1. So the analysis isn’t correct.

That’s really all we need to do for now. There’s a serious error in your formula. It doesn’t handle your own example. You need to correct that.

This is very clear.
49 49 Tom
December 22, 2010 at 11:39 am

Oh, sorry, your analysis gives 75%, a 3:1 ratio, for your own example.

Unfortunately since the actual m/f ratio in your neighborhood is 1:1 the analysis is not accurate.
50 50 Steve Landsburg
December 22, 2010 at 11:39 am

Tom:

This is an excellent test case for your analysis, since we can see that the actual value is 1:1 for your example. Equal numbers of girls and boys, as you posted.

For what example? I have given no example in which the actual value is 1:1.

Also, of course, it’s not surprising that the number comes out differently in different examples. 30.6% in the example you cite; 75% in the families-on-the-block example. They’re different examples. Of course they have different properties.
51 51 Jonathan Campbell
December 22, 2010 at 11:41 am

The original question at MathOverflow says “What is the proportion of boys to girls in the country?” The correct answer to this question, if I understand the phrase “proportion of boys to girls” correctly (this seems to be nonstandard usage, as usually proportions represent the portion of one set belonging to some subset, rather than mutually exclusive sets), is “undefined”, since there is nonzero probability, with a finite # of families, that there will be 0 girls.
52 52 Ken B
December 22, 2010 at 11:42 am

Here is another old chestnut which is related in a way. An example from days of yore: In the first half of the season Babe Ruth had a higher batting average than Lou Gehrig. In the second half of the season Babe Ruth had a higher batting average than Lou Gehrig. Over the whole season Lou Gehrig had a higher batting average than Babe Ruth. Explain.

An example explains
first half: BR at bat 1 time, one hit; LG 99 at bats with 98 hits.
second half: BR at bat 99 times with 50 hits, LG at bat twice, one hit.

Season: BR at bat 100 times with 51 hits, LG at bat 101 times with 99 hits.
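The batting example can be verified with exact fractions. This is a small sketch using the hits and at bats quoted for each half; the dictionary layout and the function names are illustrative. Season figures are obtained by pooling hits and at bats across the two halves.

```python
from fractions import Fraction

# (hits, at bats) for each half of the season, as given in the example.
halves = {
    "Ruth": [(1, 1), (50, 99)],
    "Gehrig": [(98, 99), (1, 2)],
}


def half_average(player, half):
    hits, at_bats = halves[player][half]
    return Fraction(hits, at_bats)


def season_average(player):
    # Pool the raw counts; do not average the two half-season averages.
    hits = sum(h for h, _ in halves[player])
    at_bats = sum(ab for _, ab in halves[player])
    return Fraction(hits, at_bats)


for player in halves:
    print(player, half_average(player, 0), half_average(player, 1),
          season_average(player))
```

Ruth leads in each half, yet Gehrig leads over the season, because each player's season figure is dominated by the half in which he had most of his at bats.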
53 53 Steve Landsburg
December 22, 2010 at 11:45 am

Tom:

Unfortunately since the actual m/f ratio in your neighborhood is 1:1 the analysis is not accurate.

Okay, now I suspect you’re just being obstinate for the fun of it.

The actual expected ratio for a family in my neighborhood is 75%. Adding up over a perfectly representative universe of all families takes all the probability out of the problem, which clearly converts it to a completely different and quite irrelevant question.
54 54 Steve Landsburg
December 22, 2010 at 11:46 am

Jonathan Campbell: Yes, I agree with you, which is why I changed the statement of the problem.
55 55 Ken B
December 22, 2010 at 11:46 am

As an aside, aren’t all the answers not quite right? It looks to me like no-one is counting the parents in any of their sums. The question is after all about females not female births.
56 56 Steve Landsburg
December 22, 2010 at 11:51 am

Tom: Perhaps this will help:

Here we have a country where people follow the stopping rule.

There are many ways the history of that country could play out.

Question 1: Without knowing how the history *actually* played out, choose a child at random. What is the probability that child is a girl? Answer: 50%. The analogue, in the families-on-the-block scenario is: Without knowing which family you’re choosing from, choose a child at random. What is the probability the child is a girl? Answer: 50%.

Question 2: Without knowing how the history actually played out, what is the expected fraction-of-girls? Answer: Not 50%. The analogue, in the families-on-the-block scenario is: Choosing a family at random, what is the expected fraction-of-girls? Answer: 75%; i.e. not 50%.

You persist in addressing Question 1 whereas the puzzle asks for an answer to Question 2.
57 57 Steve Landsburg
December 22, 2010 at 11:52 am

Ken B:

As an aside, aren’t all the answers not quite right?

The “correct” answer is model-dependent. It depends on what you assume about the initial conditions, about whether all the families have finished reproducing, etc. But I can think of no reasonable model in which the answer is 50%.
58 58 Tom
December 22, 2010 at 11:58 am

Steve,

;) That’s an amusing inversion of what’s happening here, and I do appreciate the humor.

Again: What happens if I go into your neighborhood and ask all the males to stand on one side and all the females on the other? The two groups will be equal. So the fraction of females in your neighborhood is 50%.

Your analysis gives 75%. Your analysis isn’t handling that example properly. It happens to all of us.
59 59 Onus Probandy
December 22, 2010 at 11:58 am

Steve:

The equation you are quoting is the right equation for a one-family country.

Well no wonder I’ve had trouble. I thought that was the equation for an infinite population, and then “k” told you how many terms of this series to summate (in a non 1 to 1 manner).

For a population of k families, a similar calculation gives an answer

Since that “similar calculation” is actually the answer. Can we have that equation?
60 60 Ken B
December 22, 2010 at 12:00 pm

@Steve:
I just mean we — the under 50 percenters at least — have been discussing g/g+b. But the actual fraction according to the strict interpretation should be (g+N)/(g+N + b+N) since you want females in the whole population, not female births in the child population. That change will not push the expectation up to 50%.
61 61 Onus Probandy
December 22, 2010 at 12:02 pm

… by “have it”, I mean “have it explained”. Douglas Zare’s explanation is way above my head. The leap from 2 to k goes a bit fast for me.
62 62 Will A
December 22, 2010 at 12:04 pm

Prof Landsburg:

I’m only pointing this out because you “made” me spend so much time on this yesterday.

I believe you have an error in your 3rd bullet point:
”
Nevertheless, the fraction of girls in the average family is not 50%. It’s 75% (the average of 100%, 100%, 100%, and 0%).
”

The question was number of females, not number of girls. The puzzle assumed 2 parents so on your block you have:
3 families with four girls each (1 dad, 1 mom, 4 girls)
1 family with 12 boys (1 dad, 1 mom, 12 boys).

The fraction of females in the average family is not 50%. It’s about 64% (the average of 83%, 83%, 83%, and 7%).
63 63 Steve Landsburg
December 22, 2010 at 12:06 pm

Onus:

Since that “similar calculation” is actually the answer. Can we have that equation?

It’s in the Douglas Zare post that I linked to in my last paragraph.

Edit: Oops. Just saw your followup comment in which you say you already looked at Zare but would appreciate more explanation. I’ll try to get to this later today.
64 64 Steve Landsburg
December 22, 2010 at 12:07 pm

Tom:

I suspect you might just be too stupid to think about this problem.
65 65 mobile
December 22, 2010 at 12:08 pm

Imagine 100 identical countries with 100 couples. Due to chance, some countries will have a lot of couples who have their sons early and the couples in other countries will tend to have their sons late. The proportion of boys in the “early” countries will be higher than the proportion of boys in the “late” countries. The “early” countries will also have smaller populations. That is, there will be large countries with lots of girls and small countries with not so many girls (the number of boys in all countries will be the same).

If you aggregate all the countries, the number of boys and girls is expected to be the same. If you choose a country “at random”, however, you are biased toward a large fraction of boys.

If you’re still not convinced, consider 2 countries with 1 couple.
In country 1, the couple has a son: fraction of boys is 100%. In country 2, the couple has two daughters and a son: fraction of boys is 33%. Aggregate population is two sons and two daughters, but the “expected” fraction of boys is (100+33)/2 = 67%.
66 66 Will A
December 22, 2010 at 12:33 pm

Being that I’m trying to work for Google, I’d like to run this on a computer simulation.

I can create 100 “countries” with let’s say 1,000 couples. For each couple I could generate random girls and boys following the rules in the puzzle.

To determine the fraction of girls do I:
– Count the number of boys and girls for each country. E.g. 1400 boys and 600 girls equals 30%
or
– Count the fraction of each family. E.g. family1 = 0%, family2 = 50%, …, family3 = 75% and then average these fractions.

Thanks
67 67 Ken B
December 22, 2010 at 12:42 pm

@Will A:
Each country in your simulation is a possible future state. So you count the entire population of the country and get a ratio for that country. Then if you have done the probabilities right you have a sample set of size 100 from the possible futures of one country. That will give you a sample mean which will approximate the answer. So sum over the country, not over the families. The families thing is another analogy for possible futures is all.
68 68 Dave B
December 22, 2010 at 12:57 pm

Will A: You won’t show anything with your simulation though since with 1000 couples the expected ratio will be quite close to 50% in any case so your sample of 100 won’t be large enough to show a difference. If you want to show something run it with only a handful of families (say 5) in each country which should consistently show a ratio with a value other than 50%.
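A minimal simulation sketch along the lines suggested here, using 5 families per country. It assumes every family keeps having children until its first boy and that all families have finished; the function names, the number of simulated countries, and the seed are illustrative. It reports two numbers: the average of the per-country girl fractions, and the girl fraction when all children from all simulated countries are pooled.

```python
import random


def girls_before_first_boy(rng):
    """Number of girls one family has before its first (and last) boy."""
    girls = 0
    while rng.random() < 0.5:  # each birth is a girl with probability 1/2
        girls += 1
    return girls


def simulate(n_countries, families_per_country, seed=0):
    rng = random.Random(seed)
    fraction_sum = 0.0
    total_girls = 0
    total_children = 0
    for _ in range(n_countries):
        girls = sum(girls_before_first_boy(rng)
                    for _ in range(families_per_country))
        children = girls + families_per_country  # one boy per finished family
        fraction_sum += girls / children
        total_girls += girls
        total_children += children
    mean_country_fraction = fraction_sum / n_countries
    pooled_fraction = total_girls / total_children
    return mean_country_fraction, pooled_fraction


mean_country_fraction, pooled_fraction = simulate(20_000, 5)
print("average of per-country fractions:", mean_country_fraction)
print("fraction with all children pooled:", pooled_fraction)
```

With only 5 families per country the first number should sit visibly below 50%, while the pooled number stays close to 50%. Increasing `families_per_country` to 1000 shrinks the gap to a size that a sample of 100 countries would not reveal.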
69 69 Black Foliage
December 22, 2010 at 1:23 pm

Tom:

I would recommend you read Thomas Bayes’s post at 11:27 AM. It is a great explanation of what is causing you to be confused.

Also, it’s a bit harsh to call you too stupid to understand the problem. I just think you lack the mathematical background to think about the problem in a different way than you currently do.

Furthermore your intuition is somewhat correct in that in a country with a large number of families the proportion of girls approaches 50% so Landsburg’s point, while interesting and enlightening, is a minor deviation from the slightly incorrect answer of 50%.
70 70 Tom
December 22, 2010 at 1:24 pm

Steve,

Well, that is a different analysis, but not quite what I had in mind! To save time let’s assume that I am stupid, but that for some reason you need to address my argument instead of my personal measurements.

So. Have you walked around your neighborhood and counted noses?

Your problem statement asks what fraction are girls. That is Ngirls divided by Ngirls+Nboys. You gave numbers in your post:

Ngirls=12
Nboys=12.

It’s true that you didn’t carry out the division 12/24 in your post. But I’m betting on 50%.
71 71 Bryan
December 22, 2010 at 1:45 pm

This seems stupid. Even Steve admits that as K goes to infinity, the ratio of men and women approaches 1/2 from below. I think most people conceptualized Steve’s original problem with an implicit assumption that K goes to infinity (i.e. we are looking at a single country, with K families playing Steve’s game, and K can be assumed to be large).

Steve is now answering the problem when K is finite, and acting like everyone else got the problem wrong. This seems silly… We are debating an assumption about K being finite or not. Debating assumptions (we should all agree) is for the birds….

Disappointing.
72 72 Doug
December 22, 2010 at 1:56 pm

From the original post

“But in expectation, what fraction of the POPULATION is female? In other words, if there were many such countries, what fraction would you expect to observe on average?” (emphasis on population mine).

Steve, your answer is correct and very clever if only you worded it correctly. But I think you screwed the pooch in what was probably a last minute oversight trying to write the problem out.

Let Bi be the number of boys in family i, and Fi be the number of girls in family i. What you asked for was the expectation of the fraction of the population which is:

E(Sum(Fi over all i)/(Sum(Fi over all i) + Sum(Bi over all i)))

The correct answer to this is no doubt 1/2, I certainly hope that you don’t dispute that.

Your answer is for the expected proportion of each family which is an entirely different random variable:

E(Sum( Fi / (Fi + Bi)))

So as you can see the random variable that you asked about and the random variable you gave an answer for are NOT the same random variables.
73 73 Ken B
December 22, 2010 at 2:16 pm

@Bryan:
No, I think the real point is that everyone solves the problem by finding E(g)/E(g+b) without realizing that that just isn’t right. I know I fell into the trap, as did, it seems, pretty much everyone. The puzzle highlights an interesting error. That the infinite population case converges to 1/2 doesn’t really matter. The thinking most of us used was subtly flawed when we first arrived at 1/2. I would never have thought twice until Steve noted that the obvious answer was wrong; I’d have stopped at the “aha, it’s 1/2” moment.
74 74 Thomas Bayes
December 22, 2010 at 2:17 pm

Bryan: Modern digital communication that enables all of the things we do with our cell phones wouldn’t be possible if someone didn’t understand the ‘silly’ things like the slight expectation bias that occurs in this problem. Is the expected proportion of girls ‘close’ to 1/2 for large K? Of course. Is it useful to understand the way it deviates for finite K? I think so.

If I try to estimate the mean from several (K) independent samples of a random variable that has a mean and variance, then the sample mean will have a variance that is equal to the variance for the variable divided by K. Would it be silly to note this, or should I just say that the variance is equal to 0?

Doug: “The correct answer to this is no doubt 1/2, I certainly hope that you don’t dispute that.”

If I interpret your notation correctly, then you are referring to the expected value for the ratio of total number of girls in the population to total number of boys and girls in the population. That value is:

0.5 – 0.25/(number of families)

Your second example:

E(Sum( Fi / (Fi + Bi)))

would evaluate to the number of families multiplied by the expected value for the ratio for a single family. Neither would be equal to 1/2.
75 75 Bryan
December 22, 2010 at 2:18 pm

Doug,

Thanks for providing hard evidence that the original problem was meant to assume that K (the number of families or “countries”) was large.

Can we all now agree that what we are debating today is whether K is finite or not? And can we further agree that differences of opinion on this matter tell us nothing about anyone’s “cleverness”?
76 76 Ken B
December 22, 2010 at 2:30 pm

@Doug:
No you are getting the equation wrong. Forget families; that was Steve’s analog to alternate possibilities. SL is not summing families. If SL’s post isn’t clear go back and read posts by Thomas Bayes and myself where I think your problem is addressed.
77 77 David Sloan
December 22, 2010 at 2:37 pm

It is critically important whether we are fixing the number of completed families (i.e. ones who have had their boy and stopped reproducing) or fixing the number of children.

If you consider only completed families, you will miss all the ones who are still working on their boy, and have some (possibly large) number of girls already. If this is the question, then yes, the expected fraction of girls will be <50% – you implicitly discarded all the families who consist (so far) only of girls!

If you consider all children, regardless of whether their parents have stopped having more, you get the expected 50%.

Simulations bear this out:
1) repeatedly generate a single family, and "average" together the gender ratio in each family, weighted by number of families. Result: average converges towards ~0.306 (the figure quoted many times above for a single family's expected ratio)
2) repeatedly generate a single family, and "average" together the gender ratio in each family, weighted by number of children in that family. Result: average converges towards 0.5
3) repeatedly generate 10 families in a batch, and weight by # of families. Result: converge towards 0.475 (the figure quoted above for 10 families' expected ratio)
4) generate 10-family batches, and weight by number of children. Result: 0.5 again
5) generate k-child batches, from as many families are needed to produce this many. Average together the gender ratios (implicitly weighted by # of children in the batch, since they're all identical). Result: 0.5 again

Note: in cases 1 and 3, my "units" argument still applies. The fact that I have written code to simulate these scenarios does not mean I believe the math used to compute the "average" after the fact is meaningful. Python doesn't understand units; it's just doing arithmetic. In all the cases where I weight averages by number of children, I get 0.5. In all cases where I weight by number of families, I get the <0.5 (i.e. Douglas Zare's formula) result.
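The simulation code itself is not shown in the thread, so the following is only a stand-in sketch for cases 1 and 2. Instead of simulating, it sums the series directly, assuming a single completed family has n children (n-1 girls, then one boy) with probability (1/2)^n. The function name and the truncation point `max_children` are illustrative.

```python
import math


def single_family_averages(max_children=200):
    """Return (family-weighted, child-weighted) average girl fraction."""
    family_weighted = 0.0   # each family counts once
    girls_weighted = 0.0    # each family counts once per child
    children_weight = 0.0
    for n in range(1, max_children + 1):
        p = 0.5 ** n                 # P(family has exactly n children)
        girl_fraction = (n - 1) / n  # n-1 girls followed by one boy
        family_weighted += p * girl_fraction
        girls_weighted += p * n * girl_fraction
        children_weight += p * n
    return family_weighted, girls_weighted / children_weight


by_family, by_children = single_family_averages()
print("weighted by families:", by_family, "(1 - ln 2 =", 1 - math.log(2), ")")
print("weighted by children:", by_children)
```

Weighting by families gives 1 - ln 2, about 0.307, which is E(Girls/Kids) for one family. Weighting by children gives 0.5, which is E(Girls)/E(Kids). Which of the two the puzzle asks for is exactly what the thread is arguing about.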

Steven: if I have missed a simulation scenario that you consider relevant, please let me know. I'd be happy to add it to my suite of test cases. I'd also be interested to know what you expect the results to be. ;)

If you intended to ask one of the questions answered by simulation 1 or 3, I believe your original wording was flawed: you asked for "fraction of the population", not "ratio of the average family".
78 78 Ken B
December 22, 2010 at 2:38 pm

@Tom:
Steve has perhaps confused things with his example about his fecund neighbourhood. (Is there something in the water where you live Steve?) He should have stated more clearly that he is mapping the country to a neighbour, and that the various neighbours correspond to various possible final states of the (one couple) country. For each of these final states you (more or less) find g/g+b, and average those averages. You don’t get 1/2. You don’t ever get one half for any finite number of families.

Seriously, read what some of the others have said, like Thomas Bayes, or my example early on.
79 79 Bryan
December 22, 2010 at 2:39 pm

Ken B and Thomas,

I agree with you that the finite case of K is important. My point is that most people were not answering that specific question. Thus, saying they had the “wrong” answer is “silly” – most people had the right answer to a different question.

At this point, I think everyone understands the question Steve was trying to ask, and agrees with his answer. The only debate is what question did Steve actually ask? This debate strikes me as futile, as its answer lies in the perceptions inside each person’s head….
80 80 Ken B
December 22, 2010 at 2:44 pm

Time for a thread hijacking! Here is another problem in probability from the site Steve got this from

An ordinary deck of cards, face down, is placed in front of you in a stack. A dealer turns the top card of the stack face up and puts it on a separate pile, and does this repeatedly until you say “now”. At that point he turns over the next card and stops. You can say “now” at any time from the very beginning (before the first card is turned over) until almost the very end (just before the last card is turned over). You win if the last card turned over — the one turned over just after you say “now” — is red.

What is your strategy, what is its rate of return, and can anyone do better?
81 81 Guy
December 22, 2010 at 3:03 pm

In your explanation at the top of the page you ignore families with only girls – i.e. all the families who haven’t had a boy yet. The question you answer might be more interesting, but it isn’t the question in the blue box.
82 82 Thomas Bayes
December 22, 2010 at 3:05 pm

Ken B:
—
This is off topic but I infer you might be a Bayesian. Do you (or Steve) know any good prose explanations why Bayesianism is the correct approach? I don’t want a textbook, I want an apologia pro vita bayesian. I had a good frequentist upbringing but have had heretical doubts for a while now.
Thanks in advance.
—

Another suggestion in addition to the one Steve gave would be a paper by E.T. Jaynes:

http://bayes.wustl.edu/etj/articles/general.background.pdf

There are others here:

http://bayes.wustl.edu/etj/node1.html

Some quotes that might prime your interest:
—
“Starting with the debates of the 1930’s between Jeffreys and Fisher in the British Statistical Journals, there has been a puzzling communication block that has prevented orthodoxians from comprehending Bayesian methods, and Bayesians from comprehending orthodox criticisms of our methods. On the topic of how probability theory should be used in inference, L. J. Savage (1954) remarked that ‘there has seldom been such complete disagreement and breakdown of communication since the Tower of Babel’ “.
—
“In Bayesian parameter estimation, both the prior and posterior distributions represent, not any measurable property of the parameter, but only our own state of knowledge about it. The width of the distribution is not intended to indicate the range of variability of the true values of the parameter, as Barnard’s terminology led him to suppose. It indicates the range of values that are consistent with our prior information and data, and which honesty therefore compels us to admit as possible values. What is ‘distributed’ is not the parameter, but the probability.”
—
83 83 Steve Landsburg
December 22, 2010 at 3:05 pm

mobile: Excellent comment; thanks.
84 84 Steve Landsburg
December 22, 2010 at 3:06 pm

Will A:

– Count the number of boys and girls for each country. E.g. 1400 boys and 600 girls equals 30%
or
– Count the fraction of each family. E.g. family1 = 0%, family2 = 50%, …, family3 = 75% and then average these fractions.

Count the fraction for each country. Then average those fractions.

You should get approximately 1/2 – 1/4000 (if you’ve got 1000 couples). It might be hard to distinguish this from 1/2 just by eyeballing, but the difference is there.
85 85 Steve Landsburg
December 22, 2010 at 3:14 pm

Tom:

Your problem statement asks what fraction are girls. That is Ngirls divided by Ngirls+Nboys. You gave numbers in your post:

Ngirls=12
Nboys=12.

We both agree on these numbers. What you don’t understand is that they are irrelevant.

I think, but am not certain, that much of your confusion stems from failing to recognize that each *family* in the neighborhood example corresponds to an entire *country* in the original setup. So let’s start over with a modified example designed to erase that confusion:

There is ONE family in my neighborhood. They are about to flip a fair four-sided coin to decide how many children to adopt. If the coin comes up on any of the first three sides, they will adopt four girls. Otherwise, they will adopt twelve boys.

I make the following claims. If you disagree, please tell me which specific claim you disagree with:

a) In expectation, this family will adopt three girls and three boys.

b) In actuality, they will adopt neither three girls nor three boys. They will instead adopt either four girls (having 100% daughters) with probability 3/4 or twelve boys (having 0% daughters) with probability 1/4.

c) In view of b), the expected value of the girl-fraction is 75%.

d) If we add up all the hypothetical children in all four hypothetical outcomes, we get four girls plus four girls plus four girls versus twelve boys. This is a 50% fraction of girls.

e) However, that 50% ratio is quite irrelevant to the problem, which asks not about *hypothetical* children but about *actual* children.

f) To summarize: 50% is the right answer to the wrong question. It is the answer to a question about what happens if you sum up over a bunch of hypotheticals. 75% is the right answer to the right question.
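Claims a), c) and d) can be checked with exact arithmetic. This is a small sketch in which the four equally likely sides of the "coin" are listed explicitly; the variable names are illustrative.

```python
from fractions import Fraction

# The four equally likely sides: (girls adopted, boys adopted).
outcomes = [(4, 0), (4, 0), (4, 0), (0, 12)]
prob = Fraction(1, len(outcomes))

# Claim a): expected counts.
expected_girls = sum(prob * g for g, b in outcomes)
expected_boys = sum(prob * b for g, b in outcomes)

# Claim c): expected value of the girl-fraction, E(Girls/Kids).
expected_girl_fraction = sum(prob * Fraction(g, g + b) for g, b in outcomes)

# Claim d): fraction of girls after adding up all four hypothetical outcomes.
pooled_girl_fraction = Fraction(
    sum(g for g, b in outcomes),
    sum(g + b for g, b in outcomes),
)

print("E(girls), E(boys):", expected_girls, expected_boys)
print("E(girls/kids):    ", expected_girl_fraction)
print("pooled fraction:  ", pooled_girl_fraction)
```

The first two numbers are equal, the expected girl-fraction is 3/4, and the pooled fraction is 1/2, so the same four outcomes produce both figures depending on which question is asked.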
86 86 Steve Landsburg
December 22, 2010 at 3:16 pm

Bryan:

Even Steve admits that as K goes to infinity, the ratio of men and women approaches 1/2 from below. I think most people conceptualized Steve’s original problem with an implicit assumption that K goes to infinity (i.e. we are looking at a single country, with K families playing Steve’s game, and K can be assumed to be large).

But even when K is large, the “standard” argument doesn’t work. You need a *different* argument (and a much more technical one) to get the result that when K is large the result is near 1/2. Even if the result were *exactly* 1/2 (which it isn’t), it wouldn’t change the fact that the standard argument is wrong.
87 87 Steve Landsburg
December 22, 2010 at 3:17 pm

Doug:

E(Sum(Fi over all i)/(Sum(Fi over all i) + Sum(Bi over all i)))

The correct answer to this is no doubt 1/2, I certainly hope that you don’t dispute that.

I absolutely dispute that. See the calculation in the post.

PS: Note that in the case of a one-family country, the two expressions you’re trying to distinguish are in fact identical.
88 88 Steve Landsburg
December 22, 2010 at 3:20 pm

Ken B:

@Tom:
Steve has perhaps confused things with his example about his fecund neighbourhood. (Is there something in the water where you live Steve?) He should have stated more clearly that he is mapping the country to a neighbour, and that the various neighbours correspond to various possible final states of the (one couple) country. For each of these final states you (more or less) find g/(g+b), and average those averages. You don’t get 1/2. You don’t ever get one half for any finite number of families.

I’ve explained this to him about six times. I don’t think he’s interested.
89 89 Steve Landsburg
December 22, 2010 at 3:22 pm

Guy:

In your explanation at the top of the page you ignore families with only girls – i.e. all the families who haven’t had a boy yet. The question you answer might be more interesting, but it isn’t the question in the blue box.

As I’ve explained multiple times in this thread (though I realize you might not have read all the comments!), the precise answer depends on your modeling assumptions, including whether you assume that all the families have stopped reproducing. I answered the question on that assumption by way of illustrating an approach. With different assumptions, you’d get a different answer. But it still wouldn’t be 1/2.
90 90 Steve Landsburg
December 22, 2010 at 3:24 pm

David Sloan:

If you consider all children, regardless of whether their parents have stopped having more, you get the expected 50%.

I do not believe this. Do you have an argument for it, or are you just making it up?
91 91 Steve Landsburg
December 22, 2010 at 3:27 pm

David Sloan:

Your simulation 1) gives the right answer. Your simulation 2) is irrelevant to any reasonable interpretation of the problem. Your simulation 3) gives the right answer. Your simulation 4) is irrelevant to any reasonable interpretation of the problem. I don’t get exactly what you’re doing in 5).

All your extraneous simulations are revealing is that E(Girls)/E(Kids) = .5. We know this. But the problem asks for E(Girls/Kids), which is not at all the same thing.
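For a one-family country the distinction can be written out explicitly. This is a sketch under the assumption used in the post (the family stops at its first boy and has finished having children), so a family with g girls has exactly one boy and occurs with probability 2^{-(g+1)}:

```latex
\[
\frac{E[G]}{E[G+B]} = \frac{1}{1+1} = \frac{1}{2},
\qquad
E\!\left[\frac{G}{G+B}\right]
  = \sum_{g=0}^{\infty} \frac{1}{2^{g+1}} \cdot \frac{g}{g+1}
  = 1 - \sum_{n=1}^{\infty} \frac{1}{n\,2^{n}}
  = 1 - \ln 2 \approx 0.307 .
\]
```

The second equality uses g/(g+1) = 1 - 1/(g+1) with n = g+1, and the last uses the series for -ln(1 - x) at x = 1/2.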
92 92 Ken B
December 22, 2010 at 3:38 pm

@Steve:
You might be right. I find it kind of amusing in general that in a thread with a clear cut correct answer — which you give — we get so much argument and tsuris, but in a more subjective and woolly thread — where I think you get it quite wrong (The law is an ass) — we don’t see a fraction of that! Maybe this isn’t so surprising. The real dichotomy in life is between those willing to be wrong, and those not. The former do quantitative studies and the latter do lit-crit.
93 93 Ken B
December 22, 2010 at 3:40 pm

@TB & Steve:
Thanks. The book is ordered, the paper is printed. Merry Christmas.
94 94 Steve Landsburg
December 22, 2010 at 3:52 pm

Ken:

I find it kind of amusing in general that in a thread with a clear cut correct answer — which you give — we get so much argument and tsuris

Thanks for doing your part to help others understand that answer.

I do see why the answer might be hard to grasp at first. What puzzles me is the folks who care so much about it that they’re willing to keep making the same false arguments over and over and over, but don’t care enough about it to take the trouble to understand what you, Thomas Bayes and others have explained so carefully and clearly.
95 95 Neil
December 22, 2010 at 4:25 pm

Thanks for posting today. I would have hated to spend the holiday weekend thinking about this only to find I was going down the wrong path. I see now what you meant when you said it was strictly a math reasoning problem. I first thought there was some interesting and counterintuitive demographic puzzle here.
96 96 Will A
December 22, 2010 at 5:18 pm

@ Dave B:

”
If you want to show something run it with only a handful of families (say 5) in each country.
”

@ Steve:
You told me to count the fraction in each country. I did this. Here are the results from my pass at it. Notice that the Avg % girls in each family is consistent with your solution. Especially when run with 100,000 couples per country. Here are some sample runs:

Couples/Country | Avg. % of Girls             | Avg. % Girls Each Fam
1               | 29.01 (85 max / 0 min)      | 29.01 (85 max / 0 min)
4               | 46.94 (80 max / 0 min)      | 33.84 (80 max / 0 min)
100,000         | 49.99 (50.2 max / 49.8 min) | 30.68 (30.9 max / 30.5 min)

So based on my simulation, I would expect that given a sample of 4 couples in a neighborhood, the neighborhood would have ~41% girls. And that the average fraction of girls in each family would be ~30%.

Now of course my simulation could be invalid. If my simulation is incorrect, it would be helpful to know how to “correctly” simulate this.

If my simulation is valid, then there is a difference between the Average % of girls in each country and the Average % of girls in each family for countries with more than 1 couple.
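One way such a simulation might be set up is sketched below. It is not Will A's program: the function names, the seed, and the assumption that every family has finished (stopped at its first boy) are choices made here for illustration. It reports both quantities in the table, the average girl-fraction per country and the average girl-fraction per family.

```python
import random

def family_girls(rng):
    """Number of girls born before the first boy (each birth is 50/50)."""
    girls = 0
    while rng.random() < 0.5:  # treat this draw as "girl"; otherwise stop at a boy
        girls += 1
    return girls

def simulate(couples, countries, seed=0):
    """Return (avg girl-fraction per country, avg girl-fraction per family)."""
    rng = random.Random(seed)
    country_total = 0.0
    family_total = 0.0
    for _ in range(countries):
        girls = [family_girls(rng) for _ in range(couples)]
        total_girls = sum(girls)
        # Every completed family has exactly one boy, so boys == couples.
        country_total += total_girls / (total_girls + couples)
        family_total += sum(g / (g + 1) for g in girls) / couples
    return country_total / countries, family_total / countries

for couples in (1, 4, 100):
    print(couples, simulate(couples, countries=2_000))
```

With one couple per country the two numbers coincide by construction; with more couples the per-country figure moves toward one half while the per-family figure does not.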
97 97 Steve Landsburg
December 22, 2010 at 5:24 pm

Will A: Your results sound exactly right.

Your second simulation essentially treats each family as a separate country. So you’ve got a result for four-family countries and a result for one-family countries. Those results are exactly as expected.
98 98 Will A
December 22, 2010 at 5:28 pm

Sorry about the above post, I should have put the table in an html table. Here is another run:

Couples/Country | Avg. % of Girls             | Avg. % of Girl/Fam.
1               | 30.1 (90 max / 0 min)       | 30.1 (90 max / 0 min)
4               | 45.2 (80 max / 0 min)       | 32.8 (76 max / 0 min)
100,000         | 49.99 (50.2 max / 49.8 min) | 30.68 (30.8 max / 30.5 min)
99 99 Will A
December 22, 2010 at 5:52 pm

Steve:

It’s hard to believe that I’m exactly correct. However, if I were trying to get a programming job at Google, I believe my solution would be “better” than yours:

Write an algorithm that generates different answers based on what is meant by “fraction of population” then ask the user what they meant. This is basically the job of a programmer.
100 100 Harold
December 22, 2010 at 6:09 pm

So let’s see if I get it. There will be lots of different countries with different populations. The ones with small populations will have more boys than girls. The ones with large populations will have more girls than boys. Therefore selecting a country at random gives an expected fraction that is not 50%, because there are more countries with low population and more boys than those with high population and more girls. This allows me to square the ideas that there are equal numbers of boys and girls overall, but the expected fraction of girls in any country is not 50%.
101 101 Will A
December 22, 2010 at 6:21 pm

If my simulation is correct, the smaller the population is, the more likely it is for the percentage of girls to be less than 50%.
102 102 Thomas Bayes
December 22, 2010 at 6:29 pm

I just looked at this article:

http://www.businessinsider.com/15-google-interview-questions-that-will-make-you-feel-stupid-2010-11

which includes this question and 14 others that have supposedly been asked at Google interviews. This one caught my eye:

“Why are manhole covers round?”

I have a colleague who told me that he was asked a similar question by a professor during his PhD qualifying exams. His question was “Why are manholes round?”, which is better, I believe, because a very good answer to the supposed Google question is “Because manholes are round.”

Cheers.
103 103 JamesL
December 22, 2010 at 6:55 pm

Joining the conversation this late, I can’t claim to have digested every comment in the thread, but I do think I follow the mathematics in Steven’s and Douglas Zare’s posts. (As an aside, I worked for many years at a company – not Google – that asks questions like these to interview candidates, and the math PhDs, including myself, who ask these questions do debate points like this internally amongst themselves at great length.)

Steven’s point (or really, Douglas Zare’s) that E[G/(G+B)] != E[G] / E[G+B] is a fair one. It seems to me, however, that the real quibble with Google’s interview question isn’t the answer but the phrasing of the question. It seems clear to me that the intended limit for any real country is k -> infinity (where k is number of families), and that the “wrong” answer is acceptable in an interview as a hand-wavy, “in the time available” argument that the limit of the digamma expression for large k is indeed one half. Certainly I wouldn’t be terribly impressed if Steven trotted out his 30.6% answer in an interview, since the case “k=1 family” wasn’t asked in my question and isn’t reasonably implied in any way I can see.
104 104 Tom
December 22, 2010 at 7:10 pm

JamesL

Good luck. The attempt here was to produce a real-world example of a case in which the expectation value of the ratio differs from the ratio of the expectation values.

Unfortunately, as David has shown exhaustively, the two numbers happen to coincide in the problem statement given here. So we’re going to be subjected to ad hominems, weird constructions like ‘countries’ that consist of a single family each, etc., until we give up and go home.

If we can’t handle the ensemble of Steve’s neighborhood–which consists of the single point 12G,12B & has expectation value of the fraction of girls 100%*(12/24)–then we certainly aren’t going to get anywhere with whole countries.

Just my 2 cents.
105 105 Tom
December 22, 2010 at 7:14 pm

This example from Steve’s post is accurate, and gets his point about ratio of expectations across very well:

Suppose there’s just one family, that randomly decides whether to adopt four girls (with probability 75%) or twelve boys (with probability 25%). In that family, the expected number of girls equals the expected number of boys, and the expected fraction of girls is still 75%.

Anybody who wants to try and shoot that down, go at it. Steve will win, the discussion will die down, and Steve will be able to make his point about expectation of ratios.
106 106 Will A
December 22, 2010 at 7:28 pm

@ JamesL:

It is probably reasonable to assume that in any “real” country, there is one person who doesn’t want to have a boy.

Therefore, since we are considering countries where everyone wants to have a boy, we must be considering imaginary countries.

By considering the case where k=1 family, Prof. Landsburg is taking into consideration the infinite number of the imaginary countries with 1 family.
107 107 Thomas Bayes
December 22, 2010 at 7:51 pm

Most people seem to accept the fact that the expectation of a ratio is not the same as the ratio of the expectations. That’s good. Now the issue seems to be whether or not the difference between the correct and incorrect answers — the 0.25/K term — is important.

Suppose the Google question was this:
—
X_1, X_2, . . ., X_K are independent, identically distributed random variables with mean 0 and variance V. S is the sample mean, which is equal to the sum of these variables divided by K. What is the expected value of S*S?
—

One approach would be to determine that the expected value of S is equal to 0, and then to declare that the expected value of S*S must be 0.

Another approach would be to determine that the expected value of S*S is V/K.

We might say that V/K is very close to zero for large values of K, but that wouldn’t make the first answer correct. Would you accept an answer of zero for this question? If not, why be comfortable with an answer of 1/2 for the boy-girl question?
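The V/K value follows in a few lines from independence and the zero mean. A short derivation, using only the definitions in the question above:

```latex
\[
E[S^2]
  = \frac{1}{K^2} \sum_{i=1}^{K} \sum_{j=1}^{K} E[X_i X_j]
  = \frac{1}{K^2} \sum_{i=1}^{K} E[X_i^2]
  = \frac{K V}{K^2}
  = \frac{V}{K},
\]
```

where the cross terms drop out because E[X_i X_j] = E[X_i] E[X_j] = 0 for i ≠ j, and E[X_i^2] = V since each mean is 0. So E[S]^2 = 0 while E[S^2] = V/K.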
108 108 Will A
December 22, 2010 at 8:27 pm

@ Tom:

”
Suppose there’s just one family, that randomly decides whether to adopt four girls (with probability 75%) or twelve boys (with probability 25%). In that family, the expected number of girls equals the expected number of boys, and the expected fraction of girls is still 75%.
”

This is incorrect as it relates to the puzzle. The puzzle asked about females. If you say that the expected fraction of girls is 75% you are not taking into account the mother in the family who is a female.

The argument as I see it is whether Prof. Landsburg’s answer applies only to countries that have 1 family or “the fraction of girls in the average family”.

I’m arguing that the phrase “fraction of the population” doesn’t have the same meaning and therefore value as “the fraction of girls in the average family” when dealing with countries that have more than 1 family.
109 109 ErikR
December 22, 2010 at 8:49 pm

We might say that V/K is very close to zero for large values of K, but that wouldn’t make the first answer correct. Would you accept an answer of zero for this question? If not, why be comfortable with an answer of 1/2 for the boy-girl question?

Two reasons. One is that the original question was not phrased in a way that would make it obvious that Steve was asking for the expectation of a ratio. “What fraction of the population is female” does not convey to me that he is asking for a certain mathematical expectation. That is why mathematics has its own syntax and definitions — English is not precise enough to express many concepts clearly.

Second, the example of sex ratio in a country is a terrible one. I cannot think of any realistic situation where someone would be interested in the number of girls born in a country where they would care that the mathematical expectation of the ratio is slightly different than 1/2.

It may have been better to ask the question as some sort of esoteric gambling decision as someone else alluded to. Or maybe something about signal processing, as TB alluded to. Something where the difference might actually matter in a realistic situation.
110 110 Tom
December 22, 2010 at 8:49 pm

Will A.,

The quotation in my comment is from Steve’s post, in the section labeled Edit. In that excellent example of the difference between expectation-of-the-ratio and ratio-of-the-expectations, Steve really is talking about kids only.

W/r/t the problem statement, I think you’re right and my own initial calculations of the expectation of ratios included parents, but personally can I plead that we not try to steer this out-of-control discussion through that extra curve at this late stage? Everybody’s been talking about kids only, that algebra is a little simpler, and it’s really just a convention. No baby is thrown out with that particular bathwater.
111 111 Will A
December 22, 2010 at 10:33 pm

@ Tom:

I am fine dealing with kids only, as long as we discuss the impact that random births have.

Let’s say I asked the following question:
Consider a country where on average each couple pairs for life and has on average 2 kids each. On average what percentage of the population never creates a couple and therefore never has a child?

I would submit that there will be times when there is never exactly the same number of males and females. Therefore, there will be people who never have kids.

Since the birthrate is 2 children per couple and some people never have kids, the population will eventually go to 0. It may take a while, but it will go to zero.

If this is correct then any theoretical country (e.g. countries with only 1 family) has a population > 0 for a finite time and a population of 0 for an infinite amount of time.

If this is correct, then 0 seems like a valid answer to the question.
112 112 Steve Landsburg
December 22, 2010 at 10:59 pm

Harold:

So let’s see if I get it.

You get it! Not only that, but based on your history around here, I was certain you would.
113 113 Steve Landsburg
December 22, 2010 at 11:01 pm

JamesL:

It seems clear to me that the intended limit for any real country is k -> infinity (where k is number of families), and that the “wrong” answer is acceptable in an interview as a hand-wavy, “in the time available” argument that the limit of the digamma expression for large k is indeed one half.

The reason I would find this an unsatisfactory answer is that, without a supporting calculation, it is nothing more than an unsupported claim. Even if 1/2 is the right answer, I am unimpressed with a candidate who simply guesses it correctly but can’t defend it.
114 114 Steve Landsburg
December 22, 2010 at 11:03 pm

Tom:

Anybody who wants to try and shoot that down, go at it. Steve will win, the discussion will die down, and Steve will be able to make his point about expectation of ratios.

Unless I’m failing to detect sarcasm, you seem to have finally gotten this point. The point you’re still missing is that this example illustrates *exactly the same thing* as the four-family example.
115 115 Steve Landsburg
December 22, 2010 at 11:07 pm

Will A:

This is incorrect as it relates to the puzzle. The puzzle asked about females. If you say that the expected fraction of girls is 75% you are not taking into account the mother in the family who is a female.

In fact, it relates directly to the puzzle. It gives an explicit example in which the expected numbers of boys and girls are equal but the expected ratio is not 50%. It follows, then, that proving that the expected numbers of boys and girls are equal cannot address the question asked in the puzzle. The fact that this example might differ from the puzzle example in other ways is not relevant to that key point.
116 116 Thomas Bayes
December 22, 2010 at 11:33 pm

ErikR:
Here is Steve’s original statement of the question:

“But in expectation, what fraction of the population is female? In other words, if there were many such countries, what fraction would you expect to observe on average?”

He even used bold font for the word ‘expectation’. It was clear to me that he wanted us to think about the expectation of the fraction of the population that is female. Sure it’s a quirky question, but it contains an important lesson about the expected value of ratios. And based on the early responses to the question, it is clear that many people are prone to making the mistake that this question helps identify.
117 117 Tom
December 22, 2010 at 11:41 pm

Steve,

Please don’t waste both of our time with more ad hominem remarks.

There’s no sarcasm at all in my post. Of course the expectation of a ratio and the ratio of expectations are always calculated differently. Your adoption example is a great illustration of a case where that leads to a numerical difference between the two quantities.

The only reason the remainder of the discussion has gone off the rails is that unfortunately in the original puzzle, boxed in your post, there happens to be no difference numerically between the expectation of the ratio and the ratio of the expectations. It’s bad luck. They both happen to come out to 0.5. Dave has provided you with detailed calculations showing why this is true, but unfortunately you’re not listening to him.

Seriously, the problem here is not that you have no commenters who took a freshman probability class. Some of us have PhDs too, we know this elementary stuff too (and we make mistakes too, and we don’t think that makes us or you stupid).

In the case of your puzzle the problem is that you left out, in the final display equation in your post, a weighting factor in each term proportional to the number of children in the corresponding family. You need that factor because when you select a member of the overall population, your chance of getting a member of a given type of family is not only reduced by that family type’s unlikeliness (the 2^n factor in your final formula) but somewhat increased by the size of the family (a linear factor missing from your formula).

If you add the missing factor, you’ll (unfortunately, and I mean that) get 0.5 for the expectation value of the fraction of girls. Unfortunately this problem is a weirdie that doesn’t differentiate between expectation of ratios and ratio of expectations. Your general point is FINE, everybody acknowledges that, this is just a piddling technical issue.
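The size-weighting described here can be written out as a sketch. It assumes a finished family with n children has n-1 girls and one boy, occurs with probability 2^{-n}, and is weighted in proportion to n:

```latex
\[
\frac{\sum_{n=1}^{\infty} 2^{-n} \, n \cdot \frac{n-1}{n}}
     {\sum_{n=1}^{\infty} 2^{-n} \, n}
  = \frac{\sum_{n=1}^{\infty} 2^{-n} (n-1)}{\sum_{n=1}^{\infty} 2^{-n} \, n}
  = \frac{2 - 1}{2}
  = \frac{1}{2}.
\]
```

The weight n cancels the n in the denominator of the girl-fraction, so this weighted average is the same quantity as E[G]/E[G+B] for a single family.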

Best of luck. Sincerely!
118 118 ErikR
December 23, 2010 at 12:00 am

Thomas Bayes:

That is not the original question. You are quoting Steve’s rephrasing of the original question. This is a simple case of pounding square pegs into round holes, and then acting shocked (shocked!) that it is difficult to get the pegs to go in smoothly.

If the goal is to act superior to others who just don’t get it, then taking a common question and rephrasing it in an absurd way is a great way to accomplish the goal.

If the goal is to help inform people about the difference between the mathematical expectation of a ratio and the ratio of expected values, then this was a terrible question to start with, since it is difficult to imagine a realistic situation where people interested in the number of girls born in a country would care about the slight difference.
119 119 Steve Landsburg
December 23, 2010 at 12:19 am

Tom: My reference to sarcasm was not intended as a personal attack; I was genuinely unsure (not knowing you, really) of whether you intended your statements sincerely or sarcastically. I was pretty sure of the former but didn’t want to be presumptuous.

You continue to be dead wrong about the specifics of the puzzle, as is shown by a) the calculation in my original post, b) the calculation in the Douglas Zare post I linked to, c) the (rather brilliant, in my opinion) Taylor Series calculation posted by Thomas Bayes in comments to the original post and d) several other arguments by commenters who have succeeded in understanding the issue.
120 120 Paul G
December 23, 2010 at 3:58 am

Hello, everyone. Like everyone else, I had a really hard time accepting the idea that the expected fraction could be anything other than the biologically-mandated 50%. The following extreme example helped it become intuitive for me:

Suppose a couple decides that if their first child is a boy, they will stop having children. If it is a girl, however, they will go on to have 100 more children.

They have a 50% chance of having a boy, and a 50% chance of having 101 children that will probably be about half boys and half girls. So the expected fraction of boys is approximately 75%. There is nothing really subtle about it – that is just how expected values work.
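The "approximately 75%" can be pinned down exactly. The sketch below enumerates the outcomes of this stopping rule with exact fractions; the variable names are illustrative and the 100 extra births are each taken to be independent and 50/50.

```python
from fractions import Fraction
from math import comb

half = Fraction(1, 2)
extra = 100  # additional children if the first child is a girl

# Case 1 (prob 1/2): first child is a boy, family stops -> 1 boy, 0 girls.
exp_boys = half * 1
exp_girls = Fraction(0)
exp_boy_fraction = half * 1

# Case 2 (prob 1/2): first child is a girl, then `extra` more 50/50 births.
for boys in range(extra + 1):
    p = half * Fraction(comb(extra, boys), 2 ** extra)
    girls = 1 + (extra - boys)  # the firstborn girl plus the later girls
    exp_boys += p * boys
    exp_girls += p * girls
    exp_boy_fraction += p * Fraction(boys, boys + girls)

print(exp_boys, exp_girls)      # 51/2 51/2
print(float(exp_boy_fraction))  # about 0.7475
```

The expected numbers of boys and girls are equal, yet the expected fraction of boys is 151/202, just under three quarters.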

Someone mentioned gambling – you could easily use this strategy to have an expected rate of return of about 50% on a trip to Vegas, but it would not be a profitable strategy. The same obviously applies to stocks and could generate some attractive-looking but misleading claims for, say, advertising a mutual fund.
121 121 Henry
December 23, 2010 at 6:22 am

I wonder if it might be better to simplify the question with a different stopping rule, e.g. “Parents have two children unless their first is a boy.”

For any given family, there’s a 50% chance they have 0% girls, a 25% chance they have 50% girls and a 25% chance they have 100% girls, for an expected fraction of 37.5% girls. This differs from the expected value because 100% boy families only have +1 boy, whereas 100% girl families have +2 girls.
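Written out, with the three outcomes B (probability 1/2), GB (1/4) and GG (1/4), the arithmetic is:

```latex
\[
E[G] = \tfrac12 \cdot 0 + \tfrac14 \cdot 1 + \tfrac14 \cdot 2 = \tfrac34,
\qquad
E[B] = \tfrac12 \cdot 1 + \tfrac14 \cdot 1 + \tfrac14 \cdot 0 = \tfrac34,
\]
\[
E\!\left[\frac{G}{G+B}\right]
  = \tfrac12 \cdot 0 + \tfrac14 \cdot \tfrac12 + \tfrac14 \cdot 1
  = \tfrac38 = 37.5\%.
\]
```

The expected counts match, but the expected fraction does not equal E[G]/(E[G]+E[B]) = 1/2.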
122 122 Steve Landsburg
December 23, 2010 at 9:12 am

Paul G and Henry: These are good examples. Thanks.
123 123 Ken B
December 23, 2010 at 9:49 am

@Thomas Bayes:
Why are manhole covers round? This is a well known interview question with quite a lore. My answer is it’s easy to manipulate going back into the hole: won’t fall in, unlike most shapes, and doesn’t require rotational corrections. But the very BEST answer I have heard is this: “It’s the medieval principle of ‘as above as below’. The circle is the most divine shape and the manhole cover serves to remind those passing by and looking down that we live in a divinely ordered universe.”
124 124 Tom
December 23, 2010 at 11:30 am

PaulG,

The confusion has nothing to do with biology. The confusion arises because Steve is not calculating the expected fraction of females in a country. He is calculating the expected fraction of females in a family. A number of commenters here understand perfectly well what Steve’s doing, and it’s ok in its own right. It’s just not what the problem asks for.

The error and terror comes in when, in an attempt to gloss over Steve’s deviation from the problem statement, people start trying to make each family represent a country. That’s just silly.

The answer for the expected fraction of females in the country mentioned in the problem statement must converge to this: walk up to a member of the country and ascertain whether they are male or female. Repeat and average.

Steve’s answer fails that test. That is the source of the debate.
125 125 Ken B
December 23, 2010 at 11:39 am

@Tom:
Have you actually read any of the comments where I or Thomas Bayes or others have clarified the family thing? I know you don’t accept Steve’s explanations, but he ain’t the only one in there pitching. You are just simply misunderstanding. Maybe Steve’s example was pedagogically ill-chosen as being easy to misunderstand, but the issue really has been clarified enough. Family or country the issue is *from a base group of couples….* The families are example base groups.

@JamesL:
Then you’d make a bad interviewer. Anyone on the under 50% side here BOTH gets the expected answer (50%) AND sees why it isn’t really an answer to the question actually asked. Not only does that show greater math competency it shows a better attention to the small details of definition that can be so important in programming.
126 126 Steve Landsburg
December 23, 2010 at 12:16 pm

Tom: You are definitely too stupid to think about this problem.
127 127 Tom
December 23, 2010 at 12:29 pm

Ken B,

I read them, but they just repeat Steve’s calculation of the female fraction in the average family. That’s a cooler calculation than what the problem statement actually asks for, Steve does it correctly, and it shows off the expectation(ratio) vs expected(g)/expected(g+b) distinction numerically. It’s wonderful.

There’s only one problem with it, it doesn’t answer the question posed in the problem statement. (I’m not talking about something vague about ‘pedagogy’ or ‘clarity.’ Everything’s perfectly clear.)

From the problem statement:

What fraction of the population is female?

Steve (et al., ok?) aren’t calculating that. Instead he’s calculating the fraction of girls in the average family. You folks have even been acknowledging that. (Sometimes. Sometimes some people try to call families ‘countries’ and construct an ensemble of those ‘countries.’ Cute but still not what the problem statement asks for.)

It’s not the same thing. The fraction of girls in the population, which is what the problem statement requires, is a slightly different calculation from the fraction of girls in the average family. This is because in a country large families constitute more of the population. You can confirm that the word population appears in the problem statement, while “*base couples*” does not.

I can provide my calculation for the expectation value of the g/(g+b) ratio in such a country, but Dave has already done so and apparently nobody in the “average over families” group can understand what he’s saying. So, forgive me if I’m blunt, but I don’t have high hopes.

Again, I hope you’ll forgive my bluntness, but you guys are getting the whole context wrong here. You’re not explaining the correct answer to recalcitrant undergrads, you’re refusing to check your work and persisting in an elementary error.

Best of luck.
128 128 Thomas Bayes
December 23, 2010 at 12:47 pm

Tom:

I’m not sure this will help, but here is what I believe is important to learn from this problem:

1. The expected number of boys and girls born to a generation of families in a country will be equal. So, if the number of boys born is B and the number of girls is G, then you can expect B to be equal to G in the sense that the expected value of B-G is zero. This will not depend on the number of families in the country. It will be true for 1 family and it will be true for a trillion families. If this was the question, then Steve and others would not need to illustrate what happens with 1, 2, or some other finite number of families.

2. If, however, you are asked for the expected proportion of boys (or girls) in the country, then the answer will depend on the number of families in the country. To demonstrate this, Steve and others have shown that the expected proportion of boys deviates far from 50% when there is only one family in the country. It gets closer to 50% for two families in the country, and closer yet for three. The expected proportion of boys is different from 50% by an amount that is roughly equal to 25% divided by the total number of families.

3. Providing the expected proportion of boys (or girls) in a single family is not addressing a different problem. It is a way to show that the expected value for B-G is zero for any number of families, but the expected value of B/(B+G) depends on the number of families.

4. There are two reasons that it is important to recognize and not ignore the fact that the expected proportion of boys is not equal to 50%: i) nearly all of the ways that people arrive at an answer of 50% are technically wrong; ii) arriving at an answer of 50% by ignoring the 1/K term in the correct answer is somewhat ‘okay’, but, if you do this, you shouldn’t have any reason to be critical of an answer that retains the 1/K term.

Where in this do you think there is an elementary error?
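Point 2 can be checked numerically without simulation. The sketch below assumes K finished families that each stop at the first boy, so the total number of girls G is negative binomial and the country's girl-fraction is G/(G+K). The function name, tolerance, and truncation rule are choices made here, not part of the original argument.

```python
import math

def expected_girl_fraction(families, tol=1e-12):
    """E[G/(G+K)] when each of K=families couples stops at its first boy.

    G (total girls) is negative binomial: P(G=g) = C(g+K-1, g) / 2**(g+K).
    The infinite sum is truncated once the remaining tail is negligible.
    """
    k = families
    total = 0.0
    mass = 0.0
    g = 0
    # The cap on g guards against rounding keeping `mass` just below 1 - tol.
    while mass < 1.0 - tol and g < 50 * k + 1000:
        # Work in logs so large K does not overflow the binomial coefficient.
        log_p = (math.lgamma(g + k) - math.lgamma(g + 1) - math.lgamma(k)
                 - (g + k) * math.log(2.0))
        p = math.exp(log_p)
        total += p * g / (g + k)
        mass += p
        g += 1
    return total

for k in (1, 2, 10, 100):
    # Compare the exact value with the rough 1/2 - 1/(4K) approximation.
    print(k, expected_girl_fraction(k), 0.5 - 0.25 / k)
```

For K = 1 the exact value is 1 - ln 2; as K grows, the computed value approaches the 1/2 - 1/(4K) approximation and stays below one half.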
129 129 Bennett Haselton
December 23, 2010 at 1:02 pm

I know I’m late to the party, but Dick Darlington made this point:

“Problem is, if one looks at the population at a given time, this situation is very unlikely, as a certain fraction of the couples would be in the process of having girls while waiting for a boy. By the time they all have a boy each, new families will be created, etc. In other words: at any point in time, there will be a certain number of women who don’t have a brother.”

True, however, I think this is cancelled out by the fact that assuming all residents of the country have the same lifespan, then just as there is an initial period where some girls in a family have been born but no boys have been born yet, there is a period at the end of their lives where the firstborn girls have died but the later-born girls (and the final male child) are still alive.
130 130 Tom
December 23, 2010 at 1:19 pm

Thomas,

Your answer is fine. Countries have lots and lots of families, and so obviously the answer is much closer to 50% than to 30.7%. The limit of a single-family country is very far from what the problem statement requires. If Steve, in his post, had gotten the answer “very close to 50%,” then I would never have commented. It was the incorrect answer, 30.7%, and the absurd attempts to defend it as the correct numerical answer, that generated the controversy. I think you realize that pretty well.

I have no problem with calling 30.7% the first estimate in an infinite sequence that converges to 50% but that is very slightly less than 50% for any country of finite size. It’s fine! I suspect Dave will breathe a sigh of relief as well. If the original post had done that, then I would never have commented.

Correctly-calculated answers that come out to 50% in the limit of a large country have been presented again and again in this thread. The people who made those calculations understood these elementary issues just fine. (This is undergrad stuff, expectation of ratios, ratio of expectations. Come on!) We received dismissive, deprecatory, and absurd responses. (“Irrelevant,” “stupid,” “don’t have the math background,” “single-family country,” etc.)

The correctly-calculated answers have been repeatedly dismissed in favor of an extremely crude calculation that gives a big error. That’s the problem.
131 131 Neil
December 23, 2010 at 2:37 pm

Let me add some confusion.

To ask what is the expected sex ratio in a country, one must assume a population of countries. Then, if you chose one country at random from the population and counted the boys and counted the girls and divide one count by the other, what quotient should you expect to see before you do the count? Thomas Bayes says it depends on the total population of the country, but you won’t know what that is until you do the count (unless all countries are of the same size, but we have no reason to assume that.)

Now if you first count all children before you count them by sex, you can make a conditional expectation based on the total. I haven’t done it, but Thomas Bayes is very smart, so I assume his answer is correct for this conditional expectation.

OTOH, I think Tom is worried about the unconditional expectation. To answer that, I assume that you need to know something about the distribution of country size in your population. It is beyond my skill set, but I am guessing that if country size is a random variable, the unconditional expectation is 50%
132 132 Neil
December 23, 2010 at 2:39 pm

That is, proportion of 50%.
133 133 ErikR
December 23, 2010 at 2:46 pm

“@JamesL:
Then you’d make a bad interviewer. Anyone on the under 50% side here BOTH gets the expected answer (50%) AND sees why it isn’t really an answer to the question actually asked. Not only does that show greater math competency it shows a better attention to the small details of definition that can be so important in programming.”

I think JamesL would make a good interviewer. More important than pointing out minor mathematical distinctions — in a problem where such distinctions are not useful to anyone — is to understand what is useful to most people based on a somewhat ambiguous statement. When considering girls born in a country, computing

E(G) / ( E(G) + E(B) )

is almost always going to be a quantity that is at least as useful to people as

E( G / B )

No need to waste time on the latter when the former will do. Then you have more time to work on the really useful problems, those problems that you can solve that will be useful to many people, rather than wasting time on minor details of a problem that will be useless to almost everyone.

In the real world, choosing the right problem to work on (and partially solving it) is frequently vastly more important than comprehensively solving every minor detail of an obscure problem.

So, as an interviewer, I’d be most impressed by a candidate who hand-wavingly responded that the answer was close to 50%, and then, perhaps, talked in more depth about the solution to another problem that is only vaguely related to the one asked, but is much more relevant to most people.
134 134 Thomas Bayes
December 23, 2010 at 3:21 pm

Neil:

The unconditional expectation is the expected value of the conditional expectation. That is, if we know the expected proportion of boys for a particular number of people in the population, then we can assign probabilities to every possible population size and compute an average. I don’t know what would be a good assignment for these probabilities, but I do know that if every conditional expectation is less than 50% (and they are), then there is no way for the unconditional expectation to be 50%.
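A minimal Python sketch of the averaging described here, under stated assumptions: it conditions on the number of families k rather than on the number of people, it tracks the fraction of girls among children, and the distribution of country sizes in `size_probs` is purely hypothetical. The function names are illustrative only.

```python
from math import log

def expected_girl_fraction(k, tail=200):
    """E[G / (G + k)] for k families that each stop at their first boy.

    G (total girls) is negative binomial: P(G = g) = C(g+k-1, g) / 2**(g+k).
    The sum is truncated far out in the tail, where the terms are negligible.
    """
    total = 0.0
    ways = 1  # C(g + k - 1, g) at g = 0
    for g in range(4 * k + tail):
        prob = ways / (1 << (g + k))
        total += prob * g / (g + k)
        ways = ways * (g + k) // (g + 1)  # exact update to C(g + k, g + 1)
    return total

def unconditional_fraction(size_probs):
    """Average the conditional expectations over a distribution of sizes."""
    return sum(p * expected_girl_fraction(k) for k, p in size_probs.items())

if __name__ == "__main__":
    # Hypothetical mix of country sizes (number of families -> probability).
    size_probs = {1: 0.2, 10: 0.3, 100: 0.5}
    for k in size_probs:
        print(k, expected_girl_fraction(k))
    print("unconditional:", unconditional_fraction(size_probs))
```

Because every conditional value is below 1/2, any weighted average of them is also below 1/2, whatever weights are chosen.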
135 135 Bryan
December 23, 2010 at 4:23 pm

Steve, I know you will view the following problem as a different problem than your proposed problem, but I am curious how you would answer the following:

Suppose we have a single country of K families, each of which randomly decides whether to adopt four girls (with probability 75%) or twelve boys (with probability 25%). Suppose K is large.

Now suppose you had to sample 100 children from this country, and you had to make your best guess about the fraction of those children who were male. What would your guess be? Additionally, please define, mathematically, what expectation it is you’re taking when you form your best guess of the fraction of males?

(I now appreciate that this was not the spirit of your original question, but I still think many people thought it was. This doesn’t make them “wrong.” They just answered the wrong question.)

If nothing else, this blog post proves your unique talent for stirring the pot (which is an important one).
136 136 Ken B
December 23, 2010 at 4:45 pm

@Steve:
This’ll learn ya. Had you said instead of 4 neighbours that you had 4 neighbouring countries ….
137 137 Steve Landsburg
December 23, 2010 at 5:00 pm

Bryan: The expected ratio of girls to boys in your problem is slightly greater than 1/2. For a subsample of 100, the expected ratio is also slightly greater than 1/2, by some amount that would not be too difficult to calculate.

My actual guess — whether I shaded it above or below the expected value — would depend on the consequences of being wrong and the consequences of being right.
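For readers who want to check the whole-country figure in Bryan's adoption problem, here is a rough Python sketch. It assumes the quantity of interest is the expected fraction of girls among all adopted children in a country of K families, and it ignores the 100-child subsample. The function name and parameters are made up for illustration.

```python
from fractions import Fraction
from math import comb

def adoption_girl_fraction(num_families, p_girls=Fraction(3, 4),
                           girls_per_family=4, boys_per_family=12):
    """Exact E[girls / (girls + boys)] when each family independently adopts
    either `girls_per_family` girls (probability p_girls) or
    `boys_per_family` boys (probability 1 - p_girls)."""
    total = Fraction(0)
    for x in range(num_families + 1):  # x = number of girl-adopting families
        prob = (comb(num_families, x) * p_girls ** x
                * (1 - p_girls) ** (num_families - x))
        girls = girls_per_family * x
        boys = boys_per_family * (num_families - x)
        total += prob * Fraction(girls, girls + boys)
    return total

if __name__ == "__main__":
    for k in (1, 4, 20, 200):
        print(k, float(adoption_girl_fraction(k)))
```

With one family the value is 3/4, and as K grows it falls toward 1/2 while staying above it, which is consistent with "slightly greater than 1/2" for large K.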
138 138 Neil
December 23, 2010 at 5:18 pm

Thomas Bayes,

Yes, that is obvious now. Unless you have a population of very large (infinite) sized countries, the expectation of the proportion of boys must exceed (I assume you meant to say) 50%
139 139 Will A
December 23, 2010 at 7:59 pm

A less controversial question would have involved a forest next to a city where everyone in the city is deaf during half the day.

If a tree falls and makes a sound people plant a male tree in their backyard and stop planting trees.

If a tree falls and doesn’t make a sound, then people plant a female tree in their backyard, but don’t stop planting trees. What is the expected …

*** feel free to not post if there are already too many posts ***
140 140 Michael
December 24, 2010 at 1:57 am

Okay, you ask “what fraction of the population is female.” I have yet to see a convincing argument that that isn’t G/(G+B)

Your “tricky solution” is really the answer to the question, what is the average fraction of girls across all families? Your a-ha moment is really just that you’ve changed the question without telling us.
141 141 Steve Landsburg
December 24, 2010 at 9:07 am

Michael: Of course it’s G/(G+B). Who said it wasn’t?
142 142 Will A
December 24, 2010 at 12:38 pm

@ Michael:

I believe that there are 2 possible ways to read Prof. Landsburg’s answer.

The first is to read it as the answer is 30.86 for any country of any size.

The other way to read it would be the 30.86 would be answer for the case of countries with 1 family and as the number of families in a country approaches infinity, the answer for those countries approaches 50%.

What might be troubling to some is that the first reading of the answer doesn’t seem to be correct.

The second reading of the answer leads to different people wanting a more specific answer. People understand the point, but what is the answer? Is the answer a function that depends on the number of people in a country? Is the answer not 50 and not more than 50?

Is there a single numerical answer to this question and to life, the universe and everything that is somewhere near the middle of 31 and 50 (42 maybe)?

I tend to choose the 2nd reading of Prof. Landsburg’s answer. However, since this blog is about tackling problems of philosophy, I feel justified in arguing about what different terms like “expected” mean.

E.g. if a country with 1 family has a boy, then I expect that country has no means to increase its population and I expect the population to become 0. I expect this because my expectation is not that people live forever.
143 143 Steve Landsburg
December 24, 2010 at 2:18 pm

Will A:

I believe that there are 2 possible ways to read Prof. Landsburg’s answer.

The first is to read it as the answer is 30.86 for any country of any size.

I don’t see how anyone could have read the answer this way. What led you to think this was a possible reading?
144 144 Will A
December 24, 2010 at 4:22 pm

I believe we are stuck in semantics. I ran a simulation based on your instructions. When the country size is 1 family, sum(G)/(sum(B)+sum(G)) is the same as the average number of girls per family in the country.

No matter the size of a country, I come up with the average number of girls per family to be 305. However as the size of the country increases G/(B+G).

E.g. with a country of 4 families, I expect sum(G)/(sum(B)+sum(G)) to be ~42% and the average number of girls per family to be ~30%.

The fact that I come up with 2 different answers leads me to the conclusion that the answer depends on how one defines “fraction of a population”.

I would be willing to accept the definition of a demographer as to what “fraction of a population” means if the supposition of the problem was that Google was interviewing me for a demographer position.
145 145 Steve Landsburg
December 24, 2010 at 4:47 pm

Will A:

E.g. with a country of 4 families, I expect sum(G)/(sum(B)+sum(G)) to be ~42% and the average number of girls per family to be ~30%.

This is right.

The fact that I come up with 2 different answers leads me to the conclusion that the answer depends on how one defines “fraction of a population”.

“Fraction of population” is quite unambiguous. For a country with four families, the correct answer is about 42%.

What else could “fraction of population” mean?
146 146 ErikR
December 24, 2010 at 6:06 pm

What else could “fraction of population” mean?

It could mean

E(G) / ( E(G) + E(B) )
147 147 Josh
December 24, 2010 at 9:21 pm

I’ll be the first to admit I didn’t follow the more technical aspects of all of this (even though the comments were nonetheless highly entertaining to me…weird sense of humor I guess…), but what helped more than anything to drive at least some of the intuition home was Paul G’s exaggerated example using the scenario where a family starting out either has a boy or (if their first child is not a boy) has 100 more children. It’s spoon feeding for most on here I realize, but it was helpful for me. Thanks.
148 148 Will A
December 25, 2010 at 1:54 am

@ Steve:

I have to apologize, I swore that I read your answer numerous times and only now do I see:
”
For a population of k families, a similar calculation gives an answer of approximately (but not exactly) (1/2) – (1/4k), which, when k is large, is approximately (but not exactly) 1/2.
”

So now I have to assume that anyone arguing with you must have missed this as well.

However, I still submit that given the randomness of births, any such population that follows this rule is bound to go to zero and therefore I expect the value of the population to be zero eventually.

To me this is like asking:
In Japan, the birthrate is approximately 1 child per couple. In the year 5281 what percentage of the population will be female. Well the answer is of course 0/(0+0). Which doesn’t seem to match (1/2) – (1/4k).

Of course, my math skills are pretty lacking. Is 0/(0+0) 50%?
149 149 Gunter
December 25, 2010 at 5:04 am

A strange country :
The country can be as small as one family but every woman in that country can bear an infinite number of children.
So the number of children in a family can be bigger than the number of families in the country.
150 150 Steve Landsburg
December 25, 2010 at 7:32 am

Will A:

Well the answer is of course 0/(0+0). Which doesn’t seem to match (1/2) – (1/4k).

And of course, there’s no reason why it should match, since your example violates the “reproduce till you have a boy” assumption.
151 151 Mariano M. Chouza
December 25, 2010 at 12:25 pm

@Gunter: the probability of a family reaching size k in this problem is 2^{-k}.
152 152 Bob Ayers
December 25, 2010 at 5:16 pm

In a certain country, we measured all the citizens for a certain trait T. The measurements of T are independent and identically distributed random variables. The trait T has known mean mu and variance sigma^2 (standard deviation sigma).
Question: What is the expected fraction of our measurements that are within one observed sigma of our observed mean?
Some might answer thus: The central limit theorem tells us that we will get a normal distribution. A tabulation of the normal distribution tells us that 0.682689492… of the normal distribution is within one sigma of the mean. So the answer is 0.68+
But that answer is WRONG. It is wrong because the central limit theorem only applies in the limit, while all countries have a finite population. For example, if the country has a population of two, and our measured values are P and Q, then the observed mean is (P+Q)/2 and the observed sigma is sqrt (2 * ((P-Q)/2)^2). 100% of the values are within one sigma. This is also true for a population of three.
The answer may APPROACH 0.682689… as the population increases, but it is NOT 0.682689…
++
I suggest that the above has some of the flavor of the stated problem. In both cases, the answerer has made a leap, and he is wrong if the population of the country is two or three. But, for this problem, the answer IS 0.68+ when the population is country-sized, e.g. several million. And similarly, the answer for the girl-boy problem is 0.50 (to many digits) when the population is country-sized
153 153 Steve Landsburg
December 25, 2010 at 6:00 pm

Bob Ayers:

You’ve missed the main point.

You write:

And similarly, the answer for the girl-boy problem is 0.50 (to many digits) when the population is country-sized

And yes, that is true. But it does not follow from the fact that the expected number of girls is equal to the expected number of boys.

You have, in fact, merely asserted this conclusion without proving it. And the main point here is that a) assertions need to be justified and b) the “usual” justification for this assertion is not valid.

(Moreover, of course, the answer is not exactly .5.)
154 154 Bob Ayers
December 25, 2010 at 9:09 pm

Steve Landsburg notes:

You’ve missed the main point.
You write:
And similarly, the answer for the girl-boy problem is 0.50 (to many digits) when the population is country-sized
And yes, that is true. But it does not follow from the fact that the expected number of girls is equal to the expected number of boys.
You have, in fact, merely asserted this conclusion without proving it. And the main point here is that a) assertions need to be justified and b) the “usual” justification for this assertion is not valid.

I’m sorry for having mis-phrased. I meant to parallel the question, which is not exactly what is quoted above but rather “What fraction of the population should we expect to be female”. Thus I should have written:
And similarly, the answer for the fraction of the population we should expect to be female is 0.50 (to many digits) when the population is country-sized.

And indeed I did not supply a proof, of that or of the central limit theorem.

I claim that I do understand the point, which is well-illustrated by Steve Landsburg’s earlier remarks and especially by his “4G 4G 4G 12B” example. I myself used a similar example: “Family flips a fair coin to decide on one girl or two boys but that does not mean that for small population we should expect to see E(g) = E(b)/2” And I calculated the latter for a variety of small family-counts. This example seems more amenable to calculation than the 4-choice one, tho it lacks the 50% result — and I expect that the fact that its calculation uses the binomial theorem straightforwardly may make a proof that E(g) approaches E(b)/2 for large N simpler.

Indeed the person/interviewee who reasons from “half boys” to expectations has made a leap that he has not justified; albeit one that gets the right answer. My little tale was meant to show solely that we make many such leaps based on the law of large numbers, and often they are a shortcut to the right answer, rigor or no. I was not attempting to knock Landsburg’s derivation of a different moral from his tale.
155 155 Will A
December 26, 2010 at 2:02 am

@ Steve:

I agree that the example I give for Japan doesn’t match. However, if we consider the countries with 1 family that follow the rules, a 1-family country whose family has a boy first and stops has no way to reproduce. The future of the country is to have a population of 0.

So I expect the future fraction of girls in countries with 1 family who have boys first to be 0/(0+0).

Now if I assume that in these imaginary countries people live forever (in a way that I can consider imaginary blue and red balls drawn from an imaginary bag can exist forever), then I would say that the countries with 1 family who have a boy first would have 3 citizens who live forever.

This would be the mother, the father, and son. And in this country, the fraction of females would be 1/(1+2).
156 156 Will A
December 26, 2010 at 2:21 am

I forgot the other and most important case. If we assume that couples in these countries live forever until they have a boy and stop, we come up with your answer.

So if this was the assumption of the puzzle, I apologize. However a less ambiguous way of phrasing the question would have been.

Imagine a country where people hate living and whenever a couple has a boy, the couple dies. Therefore in this country every couple tries to have a boy. What is the expected …
157 157 Neil
December 26, 2010 at 8:46 pm

OT. I am interested in the population dynamics of this reproduction rule. At first, I thought that the population would go extinct with certainty because half the women (wombs) do not reproduce wombs. (Only wombs can reproduce.) Then I realized, the expected number of women in the subsequent generation is equal to the initial number. This suggests that the population follows a Martingale (am I right?) and goes extinct with positive probability.
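One way to explore this question is to treat the women as a branching process. The sketch below rests on assumptions that are not in the puzzle: generations do not overlap, every daughter finds a partner, and each woman's number of daughters is the number of girls born before her first boy (so P(d daughters) = 1/2^(d+1), mean 1). Under those assumptions the probability generating function of the daughter count is f(s) = 1/(2 - s), and iterating it from 0 gives the probability of extinction by a given generation. The function name is illustrative.

```python
from fractions import Fraction

def extinct_by_generation(generations, initial_women=1):
    """Probability that no women remain after `generations` generations.

    Uses the standard branching-process recursion q_{n+1} = f(q_n), q_0 = 0,
    with f(s) = 1 / (2 - s), the generating function of the number of girls
    born before the first boy. Lines started by different women are
    independent, so the result for m women is the m-th power.
    """
    q = Fraction(0)
    for _ in range(generations):
        q = 1 / (2 - q)
    return q ** initial_women

if __name__ == "__main__":
    for n in (1, 2, 10, 100, 1000):
        print(n, float(extinct_by_generation(n)),
              float(extinct_by_generation(n, initial_women=1000)))
```

In this simplified model the extinction probability for a single founding woman after n generations works out to n/(n+1), so it creeps toward 1, though slowly when the starting population is large.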
158 158 Neil
December 26, 2010 at 8:51 pm

And, oh yes. Is there any relevance to the fact that the two countries most likely to approximate this decision rule (China and India, of course) are the most populous on this planet?
159 159 Will A
December 27, 2010 at 1:06 am

@ Neil:

Are China and India most likely to approximate this rule or are they most likely to use a decision rule where couples try to have as many boys as possible?

As it relates to probability theory, you probably know more than me. Are the odds that every generation has the exact number of boys and girls?

If the odds say that there will be times when the exact number of boys and girls are different, then there will be either boys or girls who are not able to form couples and therefore won’t be able to reproduce. This would be a factor (if correct) that would lead the population to decrease.
160 160 Will A
December 27, 2010 at 6:59 am

@ Neil:

Also, from the point of view of setting a maximum, no generation can have more couples than the previous generation.

Consider the case of countries with 1 family. If the couple has:
B – no couples can be formed population dies.
GB – one couple can be formed
GGB – one couple can be formed (one boy joins with one of the girls)
GGGB – one couple can be formed
….

In general a country of k couples will have exactly k boys. And therefore the maximum number of possible couples in the next generation will be k.

The only way for such a country to not decrease is for the country to produce more girls in every generation. However Prof. Landsburg’s proof shows that less girls are produced than boys on average.

Therefore, any country following this rule will eventually have zero children.
161 161 Thomas Bayes
December 27, 2010 at 10:04 am

Will A:
—
“However Prof. Landsburg’s proof shows that less girls are produced than boys on average.”
—
To be precise, this is not what Prof. Landsburg’s proof showed. Less girls than boys on average implies that the expected value of the difference B-G be positive. The thing he proved was that, for K families in the country, the expected value of the ratio G/(B+G) would be less than 1/2 by an amount approximately equal to 1/(4K), even though the expected value of B-G would be zero. The apparent ‘contradiction’ of these two facts is the main point of this question.

In Professor Landsburg’s own words: “Moral: Just because two variables have an expected difference of zero, you can’t conclude they have an expected ratio of one. That needs to be computed separately.”
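To see how close the 1/(4K) shortfall is, the following Python sketch compares an exact computation of the expected fraction of girls with the approximation 1/2 - 1/(4K). It assumes the fraction is taken over children only, with K families each stopping at their first boy. The function names are illustrative.

```python
def expected_girl_fraction(k, tail=200):
    """E[G / (G + k)], where G is the total number of girls in k families.

    P(G = g) = C(g+k-1, g) / 2**(g+k); the sum is truncated deep in the tail.
    """
    total = 0.0
    ways = 1  # C(g + k - 1, g) at g = 0
    for g in range(4 * k + tail):
        total += (ways / (1 << (g + k))) * g / (g + k)
        ways = ways * (g + k) // (g + 1)  # exact update to C(g + k, g + 1)
    return total

def approx_girl_fraction(k):
    """The approximation quoted in the thread: 1/2 - 1/(4k)."""
    return 0.5 - 1.0 / (4 * k)

if __name__ == "__main__":
    for k in (1, 4, 10, 100, 1000):
        exact = expected_girl_fraction(k)
        approx = approx_girl_fraction(k)
        print(f"k={k:5d}  exact={exact:.6f}  approx={approx:.6f}  "
              f"diff={exact - approx:+.6f}")
```

The approximation is poor for a single family (0.25 against roughly 0.307) and tightens quickly as K grows; in every case the exact value stays below 1/2 even though E[B - G] = 0.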
162 162 Will A
December 27, 2010 at 1:08 pm

@ Thomas:

Thanks for the correction. I think though that these populations eventually go to zero. You are obviously (and I mean this) much more adept at mathematical concepts than I am.

Based on what you are saying does this mean that the population of these countries will increase, decrease, or stay the same?

I could be wrong, but I think this is what Neil was asking and I made a poor attempt at answering.

Different people can find different things interesting. What I would find interesting is if the population decreases over time, because this puzzle would fall into the set S where S is the set of puzzles that can have multiple possible morals. E.g.
– Just because two variables have an expected difference of zero, you can’t conclude they have an expected ratio of one. That needs to be computed separately.
– A country like China whose citizens want to have a son could implement this policy and still reduce its population.

This puzzle seems to be part of this set. And therefore the “correct” answer is based on the moral that the interviewer is trying to get across.

If the interviewer doesn’t give the moral of the puzzle when asking it, then a “correct” answer is for the interviewee to pick a moral and answer the question in the way that matches the moral.

E.g. if an 8 year old asks me “What is 1 and 1?” Either 2 or 11 should be correct and equally valid answers.
163 163 Will A
December 27, 2010 at 3:19 pm

Perhaps a grand unified solution is in order something like:

Let K be the set of countries having k couples at a given point in time.

The fraction of girls is approximately (but not exactly) (1/2) – (1/4k), which, when k is large, is approximately (but not exactly) 1/2.

Let g be the number of generations in the future of the given point of time. As g increases, the fraction of girls starts to approach 1-log(2).

For a sufficiently large g, the fraction of girls is 0/0.
164 164 Tom Grey
December 28, 2010 at 8:57 am

Thanks for really nice math puzzle, dressed up in girl/boy fractions.
China and India both really suffer from too many boys (selective abortion and/or infanticide of girl babies).

In reality, there is some limit to the number of children in any one family. Perhaps 20, 30, or 40? This means some families stop with all girls.

I don’t know what having a limit does to the math, but at the limit the simple expected number of boys adds to less than 1; let me try for #boys in 16 family country (with limit of kids at 4):
Families  Births  Boys  Girls
8/16      B       8     0
4/16      GB      4     4
2/16      GGB     2     4
1/16      GGGB    1     3
1/16      GGGG    0     4

Looks like 16 fathers, 15 sons=31, 16 mothers, 15 daughters=31.

So with a real-world constraint on family size, the 50-50 looks better. If there was a stronger max child constraint (like 3, or 2) it looks like at the limit there would be more daughter only families to balance.
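The capped-family arithmetic can be checked exactly with a short Python sketch. It assumes each birth is independently a girl or boy with probability 1/2 and that a family stops at its first boy or at `cap` children, whichever comes first. The function name is illustrative.

```python
from fractions import Fraction

def capped_family_expectations(cap):
    """Return (E[boys], E[girls], E[fraction of girls]) for one family that
    stops at its first boy or after `cap` children."""
    boys = girls = fraction = Fraction(0)
    for j in range(cap):  # j girls followed by a boy
        p = Fraction(1, 2 ** (j + 1))
        boys += p
        girls += p * j
        fraction += p * Fraction(j, j + 1)
    p_all_girls = Fraction(1, 2 ** cap)  # cap girls and no boy
    girls += p_all_girls * cap
    fraction += p_all_girls  # an all-girl family is 100% girls
    return boys, girls, fraction

if __name__ == "__main__":
    for cap in (2, 3, 4, 10):
        b, g, f = capped_family_expectations(cap)
        print(cap, b, g, float(f))
```

For a cap of 4 this gives 15/16 expected boys and 15/16 expected girls per family, matching the 15 sons and 15 daughters across 16 families in the table. The expected counts stay equal for any cap, while the expected within-family fraction of girls remains well below 1/2.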

(you also have to ignore non-identical twins)
Fun math practice.
165 165 Tim
December 28, 2010 at 9:33 am

Here is such a program, written in Python (I’m not a professional computer programmer, and I’m not trying to make a bet).

It simulates 1e6 families and obtains an answer of 0.5 (within errors). The number of families can be changed.

import random

numboys=0
numgirls=0
numfamilies=0
totalfamilies=1000000

while(numfamilies 0.5): #boys are 0.5
numgirls += 1
kid = random.random()
if(kid <= 0.5):
numboys += 1
numfamilies += 1
print numgirls*1.0/numboys*1.0
166 166 Tim
December 28, 2010 at 9:34 am

Sorry, that last code copied incorrectly for some reason (it left out a while loop). This is correct:

import random

numboys = 0
numgirls = 0
numfamilies = 0
totalfamilies = 1000000

while numfamilies < totalfamilies:
    kid = random.random()
    while kid > 0.5:  # boys are <= 0.5, so this child is a girl
        numgirls += 1
        kid = random.random()
    if kid <= 0.5:  # the family stops at its first boy
        numboys += 1
    numfamilies += 1

# ratio of total girls to total boys over all families
print(numgirls * 1.0 / numboys)
167 167 TF
December 28, 2010 at 9:52 am

Landsburg asks one question, then answers another.

If the question is, “What fraction of the *population* is female?”, the correct answer is 1/2. Really, truly, honestly. As Landsburg himself proves.

If the question is, “What is the mean percentage of girls in a family?”, then the answer is different.

Reconciling the two is the fact that (under this system) the families with more girls are uniformly larger, and thus contribute greater weight to the population.

You run into similar issues when looking at economic statistics aggregated by household vs. looking at the population as a whole. Households of different sizes have different characteristics, and a large household carries greater weight in the overall population than a smaller household.
168 168 Steve Landsburg
December 28, 2010 at 9:57 am

Ted Fischer:

Landsburg asks one question, then answers another.

Take my bet then. We’ll let a random panel of statisticians decide what question was asked.
169 169 ptuomov
December 28, 2010 at 10:36 am

You need to clarify the questions. It’s not clear to me from the question whether you are looking for the expectation of ratio or ratio of expectations. That is the expectation of (females / population) or (expectation of females / expectation of population).
170 170 Mark
December 28, 2010 at 10:54 am

Let’s line up all the families in the world. Each family will flip a coin until they flip a head. Then they pass the coin to the next family who does the same.

Isn’t this the same as if it is just me flipping a coin until I get a head? Then I do it again, and again, and again.

In the end all I have done is flip a coin lots of times and the expected fraction of heads is 50%.
171 171 TF
December 28, 2010 at 11:18 am

Steve, ask that statistician yourself (though strictly speaking this is probability not statistics). There is no need to frame it as a bet, and their opinion on the wording does not influence my opinion of the correct answer.

When you ask, “What fraction of the population is female?”, the calculation is simply the number of girls divided by the total population. Inserting a behavioral pattern ahead of that question does not change the calculation specified by the question, and the question makes no reference to family structure.

Note that we don’t disagree on the calculations. We agree that half of the births in the country specified will be female (and thus half of the children in the schools). We agree that half of the families will have a single boy, a quarter will have a girl and a boy, and so forth. There is no mathematical disagreement involved. We simply disagree on whether “What fraction of the population is female?” references family structure or not.
172 172 Mark
December 28, 2010 at 11:18 am

Ok, correcting myself, I was assuming an infinite number of times that I stop and start. But the problem as stated has a finite number of families in the country. So the answer would depend on the number of families and is not 50%.
173 173 TF
December 28, 2010 at 11:21 am

Is this even an honest question? Or an attempt by a behavioral economist to determine whether or not people are willing to place silly bets with people they’ve never heard of?

Yes, I’m certain I’m correct. No, I’m not interested in betting under any circumstances. Call that a philosophical objection, if you like.
174 174 Steve Landsburg
December 28, 2010 at 11:23 am

TF:

When you ask, “What fraction of the population is female?”, the calculation is simply the number of girls divided by the total population.

Absolutely.

Note that we don’t disagree on the calculations. We agree that half of the births in the country specified will be female (and thus half of the children in the schools).

We absolutely do not agree on that. If you believe that half the births in the country specified will be female, then I hope you will take my bet.
175 175 TF
December 28, 2010 at 11:35 am

Okay, I’ve slogged through some of the additional comments…

If you take a fixed number of couples, and they all follow this procedure until their families are complete, and then you stop and analyze the outcome, your expected fraction will be very slightly less than 1/2.

Yet half of all births on an ongoing basis are female.

I guess I can see how you are interpreting the question, though I still disagree that it is a natural interpretation.
176 176 Steve Landsburg
December 28, 2010 at 11:37 am

TF:

your expected fraction will be very slightly less than 1/2.

Yet half of all births on an ongoing basis are female.

Nope. It doesn’t matter whether or not you wait till the families are complete; the answer will be less than 1/2 either way. And it is not true that half the births on an ongoing basis are female.
177 177 Erick Fejta
December 28, 2010 at 11:41 am

Can we increase the number of families to one thousand?
178 178 Steve Landsburg
December 28, 2010 at 11:44 am

Erick Fejta:

Can we increase the number of families to one thousand?

Sure, as long as I get to update my prediction for the expected ratio (to .49975, in fact), and as long as we increase the number of runs well past 3000.

If you’re prepared to put money on this I’ll be glad to get more specific.
179 179 TF
December 28, 2010 at 12:17 pm

Maybe another twist on it from a different angle?

* If an actual country were to operate this way, half of its births would be girls. But then, at any time there would be families that have not yet had a boy.

* The families with more girls than boys are larger than those that are evenly split (or those with just a boy). Similarly, if you were to repeat this experiment with k families each time, those trials in which the girls outnumbered the boys would have more children than those trials in which the boys outnumbered the girls. (There would always be exactly k boys, so this should be obvious.)

* Thus for a fixed number of families in a “one off” trial, the expected fraction of boys will be very slightly greater than the expected fraction of girls. And this is the question which Landsburg intended. But in the long run, aggregating the populations of the different samples, the proportion approaches a limit of 1/2. Thus the essential expectation is not violated.

I still don’t see why it is necessary to frame this as a bet. Mathematics is inherently provable. If there is a disagreement, then either the interpretations differ or one side is provably wrong. Neither is an appropriate basis for betting.
180 180 EF
December 28, 2010 at 12:29 pm

No thanks, I would rather be on your side of the bet.

Here’s python code that runs the simulation: http://commondatastorage.googleapis.com/elf/familygirls.py

Simulation 2990: 1 girls, 4 boys, 20.000000% girls
Sum of 2991 simulations: 12114 girls, 11964 boys = 50.311488% girls
Average of 2991 simulations: 44.314615% girls

Simulation 2991: 3 girls, 4 boys, 42.857143% girls
Sum of 2992 simulations: 12117 girls, 11968 boys = 50.309321% girls
Average of 2992 simulations: 44.314128% girls

Simulation 2997: 8 girls, 4 boys, 66.666667% girls
Sum of 2998 simulations: 12140 girls, 11992 boys = 50.306647% girls
Average of 2998 simulations: 44.317036% girls

Simulation 2999: 0 girls, 4 boys, 0.000000% girls
Sum of 3000 simulations: 12148 girls, 12000 boys = 50.306444% girls
Average of 3000 simulations: 44.309713% girls
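The linked script is not reproduced here. As a stand-in, the following Python sketch simulates many 4-family countries and reports the same two kinds of summary that appear in the output above: the pooled fraction (all girls over all children across runs) and the average of the per-country fractions. The function names, the seed, and the number of runs are assumptions made for illustration.

```python
import random

def simulate_country(num_families, rng):
    """Each family has children until its first boy. Returns (girls, boys)."""
    girls = boys = 0
    for _ in range(num_families):
        while rng.random() < 0.5:  # this birth is a girl, so keep going
            girls += 1
        boys += 1  # the first boy ends the family
    return girls, boys

def summarize(num_families, num_countries, seed=0):
    """Return (pooled fraction of girls, average per-country fraction)."""
    rng = random.Random(seed)
    total_girls = total_boys = 0
    fraction_sum = 0.0
    for _ in range(num_countries):
        g, b = simulate_country(num_families, rng)
        total_girls += g
        total_boys += b
        fraction_sum += g / (g + b)
    pooled = total_girls / (total_girls + total_boys)
    average = fraction_sum / num_countries
    return pooled, average

if __name__ == "__main__":
    pooled, average = summarize(num_families=4, num_countries=3000)
    print(f"pooled: {pooled:.4%} girls, average of countries: {average:.4%} girls")
```

The pooled figure hovers near 50% while the average of per-country fractions sits noticeably lower, the same pattern as the "Sum" and "Average" lines above.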
181 181 Jim Robinson
December 28, 2010 at 12:30 pm

Steve,

Your example about 3 families with four girls each and one family with 12 boys does not follow the original problem. The family with 12 boys cannot exist. The family could only have one boy and be done. Therefore, the example you created to illustrate your bogus math is false.
182 182 Steve Landsburg
December 28, 2010 at 12:39 pm

TF:

If an actual country were to operate this way, half of its births would be girls

You keep saying this. But it is not true.
183 183 Steve Landsburg
December 28, 2010 at 12:42 pm

Jim Robinson:

You missed the entire point of the example.

The “official” solution to the brain teaser confuses the expected value of a ratio with the ratio of the expected values.

The families-on-the-block problem is a completely separate problem in which you get a clearly wrong answer when you confuse the expected value of a ratio with the ratio of the expected values.

It’s not supposed to be the same problem. It’s supposed to be yet another illustration of how this invalid argument can lead to incorrect results.
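The families-on-the-block numbers from the post (three families with four girls each, one family with 12 boys) can be worked through in a few lines of Python. The variable and function names below are illustrative.

```python
from fractions import Fraction

# (girls, boys) for each family on the block, as described in the post.
families = [(4, 0), (4, 0), (4, 0), (0, 12)]

def ratio_of_expectations(fams):
    """E[girls] / (E[girls] + E[boys]) for a randomly chosen family."""
    n = len(fams)
    mean_girls = Fraction(sum(g for g, _ in fams), n)
    mean_boys = Fraction(sum(b for _, b in fams), n)
    return mean_girls / (mean_girls + mean_boys)

def expectation_of_ratio(fams):
    """E[girls / (girls + boys)] for a randomly chosen family."""
    return sum(Fraction(g, g + b) for g, b in fams) / len(fams)

if __name__ == "__main__":
    print("ratio of expectations:", ratio_of_expectations(families))
    print("expectation of ratio: ", expectation_of_ratio(families))
```

The first quantity is 1/2 and the second is 3/4, so the two really are different objects even though the expected numbers of girls and boys are equal.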
184 184 Will A
December 28, 2010 at 12:50 pm

@ TF:

For the most part if someone says that something is a riddle, they are looking to make some point or as Prof. Landsburg put it, a moral.

As a behavioral economist, I would think that the following would be the correct answer for your field, and possibly just as defensible, given the moral, as Prof. Landsburg’s answer.

Answer: Google interviewers shouldn’t ask ambiguous questions and hire someone based on there being 1 answer. This is unfair and unfair hiring practices are evil.

Moral: Just because a corporation has a goal to do no evil, doesn’t mean that it never does evil.
185 185 TF
December 28, 2010 at 12:50 pm

“It doesn’t matter whether or not you wait till the families are complete; the answer will be less than 1/2 either way. ”

Hm, depending on how you define it, you might be right…

If you simply count the first million children, then the expectation is precisely 1/2.

If, on the other hand, you cycle it in rounds? With each family having one child per round (if it is not yet complete)? The expectation is that k/2 families in the first generation will have girls, but (as before) those random trials in which there are more boys in the first generation correspond to lower populations than those in which there are more girls in the first generation. So your point holds.

It *is* an interesting problem.
186 186 Jim Robinson
December 28, 2010 at 12:56 pm

Steve,
I also forgot to mention: when you set k to a finite number, of course your equation is not going to equal .5. However, when dealing with statistics and expectations, we are trying to determine an expected number for a population, which (because of the Central Limit Theorem) allows us to use a normalized population. Thus we can let k go to infinity, and as k approaches infinity your equation approaches .5, which is the expected value.

There is also confusion about what the question is asking for and what you are trying to solve. Is the question asking for E[G]/(E[G] + E[B]), or is it asking for E[G/(G+B)]? The former is the expected number of girls divided by the expected population, which most people would believe is what the question is asking for. It asks for the fraction, NOT the expected fraction. The latter is the expected fraction, which is how you viewed the problem.

Even so, your solution isn’t entirely correct. After calculating out to a family with 12 children (11 girls), the solution fails to change from 30.7% (after rounding). The percentage gets more accurate, but does not change from 30.7%.
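
The two quantities named in this comment can be compared numerically. With k families each stopping at its first boy, E[G] = E[B] = k, so E[G]/(E[G] + E[B]) is exactly 1/2 for every k. The sketch below computes the other quantity, E[G/(G+B)], by summing over the distribution of G, assuming fair and independent births. The function name and the truncation tolerance are illustrative assumptions.

```python
import math

def expected_girl_fraction(k, tol=1e-15):
    """E[G/(G+k)] for k families, each stopping at its first boy.

    G, the total number of girls, is negative binomial:
    P(G = g) = C(g + k - 1, k - 1) / 2**(g + k).
    """
    total = 0.0
    g = 0
    prob = 0.5 ** k  # P(G = 0)
    while True:
        total += prob * g / (g + k)
        # P(G = g + 1) / P(G = g) = (g + k) / (2 * (g + 1))
        prob *= (g + k) / (2 * (g + 1))
        g += 1
        # Stop only once past the bulk of the distribution and the tail is tiny
        if g > 2 * k and prob < tol:
            return total

if __name__ == "__main__":
    for k in (1, 4, 10, 100, 1000):
        print(k, expected_girl_fraction(k))
    # For a single family the series has the closed form 1 - ln 2
    print("1 - ln 2 =", 1 - math.log(2))
```

For k = 1 the sum equals 1 - ln 2, which rounds to the 30.7% mentioned above; for larger k the value rises toward 1/2 but stays below it.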
187 187 General Apathy
December 28, 2010 at 12:59 pm

It is clear that this economist can’t read his own question. In no way can “what fraction of the population is female?” be twisted around into what is the average fraction of females in a given family. I would be happy to put up a large amount of money against your argument; however, based on the mental gymnastics you are conducting to ignore your mistake, I doubt you would ever pay. You should be embarrassed.
188 188 TF
December 28, 2010 at 1:02 pm

“If an actual country were to operate this way, half of its births would be girls”

As another commenter wrote, getting pregnant is a lot like flipping a coin. My country only has one bedroom, so pregnancies must necessarily happen sequentially. But because this is a healthy, thriving country, there is always somebody waiting to use the bedroom.

Coin flip after coin flip, the expectation after any fixed number of births is always 50%. The only way to budge it off that number is to use a fixed and finite number of families. Your exception relies on a changing denominator.
189 189 Ian
December 28, 2010 at 1:04 pm

It’s easy if you use a number of reproducing couples that is a power of 2 so that you can “split” the population.

N is the number of reproducing couples.

let N=2^1=2
Results
1:b
2:gb
Summary
# of boys = 2, number of girls = 1; 66.7% boys

Let N=2^2=4
Results
1:b
2:b
3:gb
4:ggb
Summary
# of boys = 4, number of girls = 3; 57.1% boys

let N=2^3=8
Results
1:b
2:b
3:b
4:b
5:gb
6:gb
7:ggb
8:gggb
Summary
# of boys = 8, number of girls = 7; 53.3% boys

…

Let N=2^m=n
Summary
# of boys = n, number of girls = n-1; n/(2n-1)x100% boys

Conclusion
As n->inf, proportion of boys -> 50%, but is always greater than 50%
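
The enumeration above can be generated mechanically. This sketch reproduces the listed families for N = 2^m couples under the same deterministic "split in half each round" idealization (it is not a random draw, and the last remaining family is closed off with a boy, as in the lists above). The function name is an illustrative assumption.

```python
def split_population(m):
    """Birth sequences for N = 2**m couples under the deterministic split."""
    families = []
    for girls in range(m):
        # 2**(m - 1 - girls) families get their boy after `girls` girls
        families += ["g" * girls + "b"] * 2 ** (m - 1 - girls)
    # The single family left over is closed off with a boy after m girls
    families.append("g" * m + "b")
    return families

if __name__ == "__main__":
    for m in (1, 2, 3):
        families = split_population(m)
        boys = sum(f.count("b") for f in families)
        girls = sum(f.count("g") for f in families)
        print(f"N={len(families)}: {boys} boys, {girls} girls, "
              f"{boys / (boys + girls):.1%} boys")
    # N=2: 2 boys, 1 girls, 66.7% boys
    # N=4: 4 boys, 3 girls, 57.1% boys
    # N=8: 8 boys, 7 girls, 53.3% boys
```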
190 190 Steve Landsburg
December 28, 2010 at 1:05 pm

TF:

Coin flip after coin flip, the expectation after any fixed number of births is always 50%.

You can say this another 3000 times and it still won’t be true.
191 191 Steve Landsburg
December 28, 2010 at 1:09 pm

General Apathy:

In no way can “what fraction of the population is female?” be twisted around into what is the average fraction of females in a given family.

Right. And in no way can my analysis of the average fraction in a country be twisted around into an analysis of the average fraction in a family.

But we can leave that to a panel of statisticians to decide. I’m happy to have us each put up $5000 in advance, to be held by a neutral party, so you don’t have to worry about reneging. Do we have a bet?
192 192 Scoot AO
December 28, 2010 at 1:09 pm

How can we have a family of 12 boys in this scenario? Regardless, the average of 3/3, 3/3, 3/3, and 0/12 isn’t 75%.
193 193 Steve Landsburg
December 28, 2010 at 1:13 pm

Scoot AO:

Regardless, the average of 3/3, 3/3, 3/3, and 0/12 isn’t 75%.

Wanna bet?
194 194 Pietro Poggi-Corradini
December 28, 2010 at 1:13 pm

Steve,

Do macroeconomists worry about these sorts of issues when they try to aggregate ratios like W_i/P_i, wages over prices for a single firm, to get a notion of average real wage?

More generally could the puzzle be rephrased in an economics context?
195 195 Jim Robinson
December 28, 2010 at 1:20 pm

You missed the entire point of the example.

The “official” solution to the brain teaser confuses the expected value of a ratio with the ratio of the expected values.

The families-on-the-block problem is a completely separate problem in which you get a clearly wrong answer when you confuse the expected value of a ratio with the ratio of the expected values.

It’s not supposed to be the same problem. It’s supposed to be yet another illustration of how this invalid argument can lead to incorrect results.

(I apologize for not knowing how to italicize or quote on this site.)
Actually, if you read my second post, you would hopefully see that the question is not asking for the expected value of ratios. It is asking for the ratio of expected values. You are the one making this confusion, and I can at least understand why you’re thinking that way. However, to assume that it is asking for the expected value of ratios, and then to treat everyone else as wrong without proving beyond a doubt that this is what it asks, while also treating people who make the original assumption as stupid, is both egotistical and a terrible way to try to sell a book.
196 196 TF
December 28, 2010 at 1:21 pm

“Coin flip after coin flip, the expectation after any fixed number of births is always 50%.

You can say this another 3000 times and it still won’t be true.”

Let me clarify, please. You are saying that if families operate under your system, then the expected number of girls out of the first thousand births is NOT precisely 500.

I would echo your words, except that would be an incredibly foolish statement. We seem to disagree on this point. I’ve made my case, do you care to offer yours?
197 197 Scoot AO
December 28, 2010 at 1:21 pm

Sure. You can’t average percents unless they have the same denominator.

And how about explaining how you can have a family of 12 boys instead of ignoring that part?

Anyway, you are making it much too hard.

Every year some families go to the hospital to have babies. Assuming for convenience that the probability of a boy is 50% (close enough for this kind of problem), half the babies added to the population will be girls, and half will be boys.

This will be the case every year. Thus no matter how long the population does this kind of thing, the children will all be arriving in 50/50 ratio. And thus the population, even if it started with some other ratio, will tend over time to be 50%.
198 198 Scoot AO
December 28, 2010 at 1:24 pm

I’ll take the bet. Have a team of statisticians review your ‘solution’ (although the commenters here should be enough to satisfy any impartial observer that you are wrong). Make sure they read the question, though, because you don’t even answer the question asked.
