I was going to wait a few days before posting the answer to yesterday’s puzzler, but we’re up well over 100 comments already, the holidays are almost upon us, and I think it’s time to settle this so you can all give your full attention to whatever festivities you’ve got coming up.

Here’s the puzzle again:

More precisely: What fraction of the population should we *expect* to be female? That is, in a large number of similar countries, what would be the average proportion of females?

Stop reading here if you don’t want spoilers:

- Here’s the *wrong* answer: Every birth has a 50% chance of producing a girl. This remains the case no matter what stopping rule the parents are using. Therefore the expected number of girls is equal to the expected number of boys. So in expectation, half of all children are girls.
- Pretty convincing, eh? So why is it wrong? Well, actually, most of it is right. Every birth has a 50% chance of producing a girl: check. This remains the case no matter what stopping rule the parents are using: check. Therefore the expected number of girls is equal to the expected number of boys: check! But it does not follow that in expectation, half of all children are girls!
- To see why not, let me tell you about the families who live on my block. There are 3 families with four girls each (and no boys), and one family with 12 boys (and no girls). Altogether, that makes 12 girls and 12 boys: equal numbers! On average, each family has three girls and three boys. Nevertheless, the fraction of girls in the average family is not 50%. It’s 75% (the average of 100%, 100%, 100%, and 0%).
- In other words, if you were to choose a random family off my block, the expected number of girls would equal the expected number of boys, 3 in either case. But the expected **fraction** of girls in the family would be 75%. Moral: Just because two variables have an expected **difference** of zero, you can’t conclude they have an expected **ratio** of one. That needs to be computed separately.
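For readers who like to check the arithmetic, here is a minimal Python sketch of the block example. The family sizes come straight from the example; the names `families`, `mean`, and the result variables are made up for this illustration.

```python
from fractions import Fraction

# Each family on the block as (girls, boys):
# three all-girl families of four, one all-boy family of twelve.
families = [(4, 0), (4, 0), (4, 0), (0, 12)]

def mean(values):
    """Exact arithmetic mean of a sequence of Fractions."""
    values = list(values)
    return sum(values, Fraction(0)) / len(values)

# Expected number of girls and boys in a randomly chosen family.
avg_girls = mean(Fraction(g) for g, b in families)
avg_boys = mean(Fraction(b) for g, b in families)

# Expected FRACTION of girls in a randomly chosen family.
avg_fraction_girls = mean(Fraction(g, g + b) for g, b in families)

# Fraction of girls when all the children on the block are pooled.
pooled_fraction_girls = Fraction(
    sum(g for g, _ in families),
    sum(g + b for g, b in families),
)

print(avg_girls, avg_boys)       # 3 3
print(avg_fraction_girls)        # 3/4
print(pooled_fraction_girls)     # 1/2
```

The expected counts match (3 and 3), while the expected fraction is 3/4. The pooled figure of 1/2 answers a different question, namely what you get by lumping all four families together.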

**Edit**: This came up in comments, and it might be worth adding here: This example in no way relies on there actually being four families. Suppose there’s just one family, that randomly decides whether to adopt four girls (with probability 75%) or twelve boys (with probability 25%). In that family, the expected number of girls equals the expected number of boys, and the expected fraction of girls is still 75%. The country in the original problem is drawn randomly from a universe of possible countries, just as this family is drawn randomly from a universe of possible families.

So that explains why the “obvious” argument is wrong, and that, to me, is really the interesting part. But of course we’re not done until we find the **right** argument. That’s a bit trickier, and it depends on the country’s population. I’ll start with the case where there’s just one couple. Here are some possible family configurations, with their probabilities:

1/2 B

1/4 GB

1/8 GGB

1/16 GGGB

Etc.

From this we see that the expected number of boys is

(1/2 * 1) + (1/4 * 1) + (1/8 * 1) + …

which adds to 1. And the expected number of girls is

(1/2 * 0) + (1/4 * 1) + (1/8 * 2) + …

which also adds to 1. Sure enough, the expected number of girls is equal to the expected number of boys.

But the expected **fraction** of girls is

(1/2 * 0) + (1/4 * 1/2) + (1/8 * 2/3) + …

which adds to 1-log(2), or about 30.6%.
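The three expectations for the one-couple case can be checked numerically. This is only a sketch: the function name `one_family_expectations` and the cutoff `max_girls` are assumptions for illustration, and the infinite series is truncated where the remaining terms are negligible. A family with g girls followed by one boy is taken to have probability (1/2)^(g+1).

```python
import math

def one_family_expectations(max_girls=200):
    """Truncated sums over families made of g girls followed by one boy."""
    exp_boys = 0.0
    exp_girls = 0.0
    exp_fraction = 0.0
    for g in range(max_girls + 1):
        p = 0.5 ** (g + 1)               # probability of g girls, then a boy
        exp_boys += p * 1                # every completed family has one boy
        exp_girls += p * g
        exp_fraction += p * g / (g + 1)  # girls as a share of the g + 1 children
    return exp_boys, exp_girls, exp_fraction

boys, girls, fraction = one_family_expectations()
print(boys, girls)                   # both very close to 1
print(fraction, 1 - math.log(2))     # both about 0.307
```

Here `math.log` is the natural logarithm, which is the log that appears in 1-log(2).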

For a population of k families, a similar calculation gives an answer of approximately (but not exactly) (1/2) – (1/4k), which, when k is large, is approximately (but not exactly) 1/2.
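Here is one way to check the k-family claim numerically, assuming (as in the one-couple case) that every family is complete and every birth is an independent 50/50 draw. Under those assumptions the total number of girls across k families follows a negative binomial distribution, and the sketch sums over it directly. The function name and the cutoff `max_girls` are illustrative choices, not part of the original calculation.

```python
import math

def expected_fraction_girls(k, max_girls=5000):
    """Expected girls / (girls + boys) when k families each stop at their first boy.

    There are always exactly k boys. The total number of girls G satisfies
    P(G = g) = C(g + k - 1, g) * (1/2) ** (g + k).
    """
    p = 0.5 ** k                        # P(G = 0)
    total = 0.0
    for g in range(max_girls + 1):
        total += p * g / (g + k)
        p *= 0.5 * (g + k) / (g + 1)    # step from P(G = g) to P(G = g + 1)
    return total

for k in (1, 2, 10, 100):
    print(k, expected_fraction_girls(k), 0.5 - 1 / (4 * k))
```

For k = 1 this reproduces 1-log(2), and as k grows the exact value and the approximation (1/2) – (1/4k) move closer together while both stay below 1/2.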

Several of our commenters were on to various aspects of this, and some were on to pretty much all of it. In no particular order let me acknowledge Vic, DaveB, KenB, ThomasBayes, loveactuary, JonathanCampbell, Brett, wellplacedadjective, JonathanKariv, and mobile — and let me apologize for anyone I’ve inadvertently omitted (with so many comments to digest, I’m sure there are a few). But above all, a humble tip of the hat to the mathematician and backgammon expert Douglas Zare who inspired this post with his brilliant exposition over at MathOverflow (his is the first of the several answers). (Warning: Depending on your technical background, you will find his explanation either perfectly illuminating, perfectly indecipherable, or somewhere in between.)

Now go enjoy your holiday. If your relatives like this kind of thing, you can share it with them. If not, you can use it to get them to leave you alone.

I respectfully disagree with your analysis; I believe your error is one of units.

Consider a car that travels at 30mph for one mile, then at 60mph for another mile. Each of these speeds is a ratio (miles per hour), and if you were to average them by computing a weighted average over each mile traveled, you would get 45mph. However, the car’s actual average speed is (2 miles / 3 minutes) = 40mph.

If you had instead computed a weighted average over each hour traveled, you would have gotten the correct answer: (2min * 30mph + 1min * 60mph) / 3min = 40mph

In short, (I think) it matters that the weights be the same units as the denominators of your ratios. It worked when we used weights expressed in time, but not when we used weights expressed in distance.
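The car example is easy to replay in a few lines of Python. The variable names are invented for illustration; the two legs and their speeds are the ones given above.

```python
# Two one-mile legs: (distance in miles, speed in mph).
legs = [(1.0, 30.0), (1.0, 60.0)]

total_distance = sum(d for d, _ in legs)
total_time = sum(d / v for d, v in legs)    # hours spent driving

# Average of the speeds, weighted by distance covered on each leg.
distance_weighted = sum(d * v for d, v in legs) / total_distance

# Average of the speeds, weighted by time spent on each leg.
time_weighted = sum((d / v) * v for d, v in legs) / total_time

# What the odometer and the clock say.
true_average = total_distance / total_time

print(distance_weighted, time_weighted, true_average)
```

The distance-weighted figure comes out at 45 mph, while the time-weighted figure agrees with the true average of about 40 mph.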

Your gender ratios are expressed in units of girls/children, but your weights are expressed in units of families (or boys; since every family contains exactly one boy in this model, the two are identical).

When you use the correct weights (number of children in the family), you get:

(1/2 * 1child * 0girl/1child) + (1/4 * 2child * 1girl/2child) + (1/8 * 3child * 2girl/3child) + …

= 1/2 * 0girl + 1/4 * 1girl + 1/8 * 2girl + …

= 0.5girl

Thus, the expected ratio of girls in the population is still 0.5.

There is a hidden assumption in your solution, which is: at the moment we are looking at this population, each family contains exactly one boy and a certain number of girls (distributed as a geometric variable with parameter 1/2). In other words, every family ever created has completed its “reproductive cycle”. In yet other words: in this population, every woman has a brother.

For such a population, formed of independent “complete” families, your computation is certainly correct.

Problem is, if one looks at the population at a given time, this situation is very unlikely, as a certain fraction of the couples would be in the process of having girls while waiting for a boy. By the time they all have a boy each, new families will be created, etc. In other words: at any point in time, there will be a certain number of women who don’t have a brother.

In _that_ setup, the “official” argument giving 50/50 is correct.

I didn’t do the computation, but I would bet that the proportion of women with no brothers, or the proportion of “incomplete” families, at a time when the population is of size k, is of order 1/k, which explains the discrepancy between the outcomes in the two setups.

Now, which of these models corresponds to your initial question is something you will have to tell us, as it is not clear from the wording (as is almost always the case in such riddles).

Another flaw: when you give the formula (1/2) – (1/4k), you say “For a population of size k…”, but the linked explanation says “With k families…”.

With k families, this does work out, assuming you accept the validity of computing a weighted average of ratios without using the same units for weights and denominators (as I object to in my previous post).

With k *children*, the math is much simpler. No matter how you select the k children (chronological order of birth, taking as many children as needed from each of a list of parents, etc) each of the k children was equally likely to be born a boy or a girl, and thus total fraction of girls is 0.5. This is exactly the argument you claim is wrong, but you didn’t actually refute it; you just provided a different argument that gets a different answer.

How can the ratio differ depending on whether k refers to the number of families or the number of children? As Douglas Zare correctly claims in his opening paragraph over on MathOverflow: “The proportion of girls in one family is a biased estimator of the proportion of girls in a population consisting of many families because you are underweighting the families with a large number of children.” Those underweighted families are those with a large number of girls, and so you undercount girls by sampling families instead of children.

Why was it a good idea in the first place to use a biased estimator of the actual quantity we care about (“fraction of the population”), when we can compute the quantity directly?

Whoops. In my first post, I flubbed the addition on the very last step. That infinite sum is = 1 girl (not 0.5 girls), the expected number of girls per family. Since every family has a boy, this is what I should have expected.
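With the corrected sum in hand, the two weightings under discussion can be put side by side. This is a sketch with made-up names and a truncated series, assuming complete families of g girls and one boy with probability (1/2)^(g+1). It does not settle which quantity the puzzle asks for; it only shows that the two are different numbers.

```python
def family_vs_child_weighting(max_girls=200):
    """Compare per-family averaging with per-child weighting of girl fractions."""
    family_weighted = 0.0   # sum of p * (girls / children)
    exp_girls = 0.0         # sum of p * children * (girls / children)
    exp_children = 0.0      # sum of p * children, used to normalise the weights
    for g in range(max_girls + 1):
        p = 0.5 ** (g + 1)
        children = g + 1
        family_weighted += p * g / children
        exp_girls += p * children * (g / children)
        exp_children += p * children
    child_weighted = exp_girls / exp_children
    return family_weighted, exp_girls, child_weighted

family_weighted, exp_girls, child_weighted = family_vs_child_weighting()
print(family_weighted)   # about 0.307
print(exp_girls)         # about 1 (girls per family)
print(child_weighted)    # about 0.5
```

The child-weighted sum is 1 girl per family, and it only becomes a fraction once it is divided by the expected number of children per family, which is 2.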

I am not convinced. Your answer looks like mathematical sleight-of-hand to me. Now, the girl named Florida problem was similarly counter-intuitive at first, so I’m open to the possibility I’m missing something subtle. But to convince me, you’ll need to address Doug’s point from the previous thread:

What’s the difference between the two?

Henry: With regard to Doug’s point I think that whether you beat the market or not is dependent on the expected difference between the number of up days and down days which as was shown is zero. So the fact that the expected ratio of ups to downs is greater than 50% does not mean you have beaten the market.

(This ignores the issue of the amount going up being the same as the amount going down etc. which would make the comparison with the stock market meaningless anyway. I am assuming Doug’s point is referring to a fair coin tossing game of some sort.)

David Sloan:

> you say “For a population of size k…”, but the linked explanation says “With k families…”.

Right. I meant to say “For a population with k families….”. I’ve corrected this in the post. Thanks for catching this.

> each of the k children was equally likely to be born a boy or a girl, and thus total fraction of girls is 0.5.

This is exactly wrong.

It is true that each of the k children was equally likely to be born a boy or a girl, and thus the expected *difference* between boys and girls is zero. It *does not follow* that the expected fraction of girls is .5. You’ve made no attempt to provide an argument for this, and in fact no valid argument exists.

**Edited to add:** You want to assume a fixed number of children, but this is quite contrary to the spirit of the problem. Presumably you want to observe this country at some fixed moment in time, and at any given moment, the number of children is a random variable.

Dick Darlington:

> In _that_ setup, the “official” argument giving 50/50 is correct.

The “official” argument cannot give 50/50 because the official argument never even attempts to address the question that was asked. It addresses the expected difference, not the expected ratio. It *does not even offer an argument* for a 50/50 ratio (or any other). And in fact, *whether or not* some families are still reproducing, the correct ratio is *not* 50/50.

So — to compute the correct ratio, you do need to make some assumptions. I made certain assumptions; you’re arguing for others. That’s fine. But there are *no* assumptions under which 50/50 is the correct answer.

Henry:

> If the answer was anything other than 50% one could easily outperform the stock market (or any martingale for that matter).

This argument fails precisely because the expected fraction of girls is not a martingale.

I am a little confused. I don’t get the distinction between number and fraction. What is the fraction of girls in your apartment block?

There are 12 boys and 12 girls. The simplest way to calculate the fraction of girls is to divide number of girls by total children = 0.5. The fraction of girls is therefore 0.5.

A different way is to calculate the average fraction of girls per family, = (1+1+1+0)/4 = 0.75.

If you pick a random family, the fraction of girls expected is 0.75, but that was not the question asked, which was what is the fraction of girls in the block? To my mind, if you use the random family method, you get the “wrong” answer (because there is no account taken of family size.)

Are you saying the fraction of girls in your block is 0.75?

In the countries question, you seem to be saying that there will be equal numbers of boys and girls, but the fraction of girls will not be 0.5, which I don’t quite understand.

Harold: The fraction of girls in my apartment block happens to be 50%. But if you choose a family at random, the expected fraction in that family is not 50%.

Now suppose there was just one family in my apartment block, which drew a ball from an urn to tell it whether to adopt four girls or twelve boys. They adopt four girls with probability 3/4 or twelve boys with probability 1/4.

That would be the only existing family. Based on the information you’ve got, the expected ratio in that family is 75%, although the actual ratio may turn out to be either 100% or 0%.

Likewise with countries. A given country is a draw from a distribution of possible countries. The actual ratio, if you lump all those hypothetical possible countries together, is 50%. But the ratio in any *particular* country is not 50%, either in actuality or in expectation.

I call shenanigans.

The question was “what fraction of the *population* is female?”

You have answered “what fraction of a randomly chosen family is female”

Onus:

> The question was “what fraction of the population is female?” You have answered “what fraction of a randomly chosen family is female”

No!!! I have answered “In expectation, what fraction of the country is female?”, or in other words, what fraction of a country, randomly chosen from a universe of such countries, should we expect to be female?

The *families* in the example I gave are analogous to the *countries* in the original question. The explicit calculation that I gave deals with the countries, not the families.

Steve:

Your universe of countries is one in which each country is made up of the same family type. So Country A is all “B”, country B is all “GB”, country C is all “GGB”, etc.

If I randomly chose from those countries, then fair enough, your answer is correct. That wasn’t the question though.

Onus Probandy:

> Your universe of countries is one in which each country is made up of the same family type. So Country A is all “B”, country B is all “GB”, country C is all “GGB”, etc.

That is absolutely not true. Did you bother to look at the calculation before you claimed this?

(To elaborate: I started with an example in which each “country” consists of exactly one family, to illustrate what’s going on. Obviously, with one family, there’s one family type. Then I observed that if there were two or three or k families, you’d need a subtler calculation, which would fail to yield 50% for exactly the same reasons. That calculation does not assume one family type per country. You can see more technical details in the Douglas Zare post that I linked to.)

One nitpick with Douglas Zare’s answer:

–

“It is not enough to argue that the expected number of boys equals the expected number of girls, since we want E[G/(G+B)]≠E[G]/E[G+B]. Expectation is linear, but not multiplicative for dependent variables, and G and G+B are not independent even though G and B are.”

–

Independence (or, more precisely, correlation) isn’t the only issue. Even for independent variables, the expected value of a ratio is not equal to the ratio of the expected values. (The expected value of a product of uncorrelated variables is the product of the expected values, though.) This is one of the most important keys to understanding this problem, I believe. And this is why I suggested the Taylor series to expand the ratio about its mean. I also think it is a little easier to find the expected proportion of boys because the random part (G) only appears in the denominator. Also, B is equal to the number of mothers, so I don’t believe B and G are independent because I don’t believe the number of girls is independent of the number of mothers.

Overall, though, Douglas Zare’s approach is a good one.

Thomas Bayes: Excellent points. Thanks for this.

I did read your calculation. I’m not convinced it’s justified (I’m happy to be wrong, I’m only an Internet lurker, not a mathematics professor, but at present, I don’t follow the logic).

If you say my analysis of your calculation is wrong, here is a secondary question: what calculation *would* you do to answer the question you say you haven’t answered?

What fraction of a randomly chosen family is female?

One half of the families are 0%

One quarter are 50%

One eighth are 66%

..etc…

I can pick any of these families at random. Half of my selections will return a 0% family, one quarter of my selections will return a 50% family, etc.

But hold on… that’s *your* calculation. The one you say you haven’t done.

(Maybe I’ve made two mistakes, and they are cancelling out)

My guess at the fault in your calculation is that your selection is from a pool of variable sized families. The “family” is your unit. Your question is phrased so that you should be randomly selecting individuals. You should really be scaling each term by “proportion of population represented by this group of families” as well, since e.g. 1/8 of the families are three times bigger than 1/2 of the families.

Except that you’ve got another fault… you’ve completely ignored the parents. There is at least “GB” in every configuration. The largest part of your population is the one most affected by this, since it turns “B” into “GBB”. Perhaps you’re allowing for that by assuming that every parent is also a child… I might have missed that bit of reasoning, if so, forget this paragraph.

Looking above, I think I’m making the same argument as David Sloan is in the first comment. Essentially: your units mismatch.

Half of families doesn’t equal half of the population.

Onus:

You write:

> The “family” is your unit.

That’s where you’re confused. The *country* is my unit. The analysis depends on the number of families in that country. I’ve done the detailed calculation in the special case where the country has one family. But I’ve also pointed out that the same principles underlie the calculation you’d do in a country with two or three or k families.

So my snippy “Did you read the calculation?” was too harsh, because of course the calculation does apply only to a single family. But that’s *not* because I’m choosing the family as a unit; it’s because I’m choosing a country-with-one-family as a unit. And I’m pointing out that you’d get a similar result if you used a country-with-two-families or a country-with-100-families as a unit. You can find details in the Zare post that I linked to, or in the excellent analysis by Thomas Bayes in yesterday’s thread.

Onus:

PS: Yes, I relied on each parent also being a child. If you want to drop this and start with a generation of Adams and Eves, the calculation will be slightly different. Tweaking the assumptions changes the outcome. But there are *no* reasonable assumptions under which the outcome is 50%.

@Thomas Bayes:

This is off topic but I infer you might be a Bayesian. Do you (or Steve) know any good prose explanations why Bayesianism is the correct approach? I don’t want a textbook, I want an apologia pro vita bayesian. I had a good frequentist upbringing but have had heretical doubts for a while now.

Thanks in advance.

Ken B: If you can find it, Howard Raiffa’s hard-to-find white paperback book on Decision Analysis is likely to be exactly what you’re looking for.

Your question is: “That is, in a large number of similar countries, what would be the average proportion of females?” To which the answer is not 50%. And also: “The actual ratio, if you lump all those hypothetical possible countries together, is 50%.”

I am struggling to understand the difference between “lump all possible countries together” and “average”.

There are two possible questions.

1) What is the fraction of girls in all the countries added together?

2) What is the expected fraction of girls in a particular country?

“What fraction of the population should we expect to be female?” seems to be Q1, and “in a large number of similar countries, what would be the average proportion of females?” I read as Q2.

In the single family example, I can see that the expected fraction of girls is 0.75, and the expected number of girls equals the expected number of boys. If I were to ask “What is the expected fraction of girls for a single family” then 0.75 seems right. But “in a large number of such families, what is the average proportion of females?” 0.5 seems right, as I would feel it necessary to include the information that each family of boys has 12 members, and girls only 4.

The key is seeing that there are 2 questions, and then answering the right one.

I agree with this answer now. My attempt to resolve the martingale paradox (and possibly help others with understanding it) runs as follows:

Suppose every family in this country also bets $1 every time they conceive a child that it will turn out to be a boy. We decide to look at the proportion of bets each family loses. Half of the families, with a single boy, lose 0% of their bets. A quarter of the families, with a girl and boy, lose 50% of their bets. One eighth lose 75% of theirs bets, and so on.

Using the same summation as above, we can calculate the average number of bets a family loses (i.e. the number of girls it has): ~30.6%. This means the country came out ahead in its betting, right? Wrong. The families that lost the highest proportion of bets also *made the most bets* in the first place. A family that lost 75% of its bets lost $2, a family that lost 90% lost $8 and a family that lost 95% lost $18.

We can see that the relationship between the *number* of bets a family loses and the *proportion* of bets it loses is non-linear. An increase in the former results in smaller and smaller increases in the latter. Thus, the heavily losing families do add as much to the population’s losing proportion as they do to its absolute level of losses.

These families can all represent different probabilistic states of the world for a single family. In the states with large families, they do not the same relative amount of “credit” for a high proportion as they do for a high absolute number of girls. Hence, the proportion is lower than 50%.

> That’s where you’re confused. The *country* is my unit.

Gulp. I don’t dare say that you’re confused.

The country is manifestly not your unit, you haven’t given probability distributions for countries. The table you give is in terms of families:

> Here are some possible family configurations, with their probabilities:

You go on to sum the probabilities in this table (which you have stated are family probabilities) with weightings by how many girls are in each family (still no countries). You are sampling from families, not countries. I accept that you go on to modify for a finite country with k families, but that’s merely a parameter telling you how far to sum along that infinite series; but I am talking about your 1-log(2) answer, and ignoring the k version.

Your population is a set of balls. Half of those balls have “B” written on them, a quarter has “GB” written on them, etc. In your analysis you choose from those balls at random, and have calculated the expected ratio of “G” to “B” on that randomly chosen ball. The question is nothing to do with the balls though. The question asks how many “G”s there are.

You keep saying that that isn’t what you’ve calculated, in which case tell me how you would calculate the ratio of G to B on a randomly chosen ball. You expect that calculation to be different from your current answer, I do not.

Slight aside:

> But there are *no* assumptions under which 50/50 is the correct answer.

What about the assumption that the method of choosing to have another child is whatever we have right now on this planet? Doesn’t that come out at 50/50?

I should proofread more.

“One eighth lose 75% of *their* bets, and so on.”

“Thus, the heavily losing families do *not* add as much to the population’s losing proportion as they do to its absolute level of losses.”

“In the states with large families, they do not *get* the same…”

Moral: read puzzle questions carefully! The puzzle is not about expected number of female births but about the expectation of a fraction.

Lots of puzzles are like this. The archetype is “When I was going to St Ives ….”

Harold:

> 1) What is the fraction of girls in all the countries added together?
>
> 2) What is the expected fraction of girls in a particular country?

Let me try an analogy.

There are two ways a coin can land. Heads or tails. Now I ask you to imagine two coin-flippers. There are two ways you could interpret that:

1) Imagine two coin flippers, each of whom independently flips a coin. With probability 25%, they both flip heads, with probability 50% they flip a head and a tail, etc.

2) Imagine two coin flippers who span the set of all possible outcomes. One of them flips a head and one flips a tail.

Those are different scenarios.

When you talk about “adding up the girls in all the countries”, you must decide what you mean by “all the countries”. Is each country performing the same random experiment separately, or are these a set of representative countries that span all the possible outcomes? In the first case, we get a probability distribution of answers; in the second case, we get a single answer.

The problem asks for the expected fraction-of-girls in a single country with (say) 10 families that has gone through the random process once. If you add together 10 such countries, you’ve got the equivalent of a single country with 100 families and you have not substantially changed the problem.

Henry: Bingo. This is exactly right.

Onus Probandy:

> Your population is a set of balls. Half of those balls have “B” written on them, a quarter has “GB” written on them, etc. In your analysis you choose from those balls at random, and have calculated the expected ratio of “G” to “B” on that randomly chosen ball. The question is nothing to do with the balls though. The question asks how many “G”s there are.

Let’s do this for a two-family country. The two families have the following configurations with the following probabilities:

1/4 B/B

1/8 B/GB

1/8 GB/B

1/16 GB/GB

1/16 GGB/B

Etc. That gives me a 1/4 chance of 0% girls, a 1/8 chance of 1/3 girls, a 1/8 chance of 1/3 girls, a 1/16 chance of 1/2 girls, a 1/16 chance of 1/2 girls, etc. I can add all this up and get an answer that is not 1/2. (In fact it is log(4)-1, which is less than .4).

Here you can see that my unit is a two-family country. I can do the same thing for a ten-family country and get a different answer, which still won’t be 1/2.
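The two-family sum can be checked by brute force. This is a sketch under the same assumptions as the list above: each family independently has g girls followed by one boy with probability (1/2)^(g+1). The function name and the cutoff `max_girls` are made up for illustration.

```python
import math

def two_family_expected_fraction(max_girls=60):
    """Expected fraction of girls in a country made of two complete families."""
    total = 0.0
    for g1 in range(max_girls + 1):          # girls in the first family
        for g2 in range(max_girls + 1):      # girls in the second family
            p = 0.5 ** (g1 + 1) * 0.5 ** (g2 + 1)
            girls = g1 + g2
            children = girls + 2             # plus one boy per family
            total += p * girls / children
    return total

print(two_family_expected_fraction(), math.log(4) - 1)   # both about 0.386
```

Configurations with more than 60 girls in one family are dropped, and their combined probability is far too small to affect the printed digits.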

> What about the assumption that the method of choosing to have another child is whatever we have right now on this planet? Doesn’t that come out at 50/50?

That assumption is not consistent with any interpretation of the stopping rule given in the problem. There are no assumptions consistent with that stopping rule that will give you an answer of 50%.

> Henry: Bingo. This is exactly right.

Henry is right.

> Henry: Using the same summation as above, we can calculate the average number of bets a family loses (i.e. the number of girls it has): ~30.6%.

The number of bets the *family* loses. Not “the number of bets lost” which is the original question rephrased.

Onus:

> The number of bets the family loses.

You continue to ignore the twin facts that a) we can use families as *analogues* for countries. If an argument that seems airtight on the country level makes wrong predictions at the family level, then the argument is not airtight. And b) it is perfectly legitimate to ask what happens in a one-family country.

(Also c) the same phenomena that both Henry and I are talking about *do* happen in multi-family countries. It’s just easier to illustrate them in the one-family case.)

David Sloan and Onus have it right. Each term in the final displayed equation needs to be weighted by the number of children in the family. Otherwise the ratio in large families gets underweighted in the population.

Please look carefully at David Sloan’s units argument.

(I’m assuming the question we’re really asking is what’s the m/f ratio among the kids; that’s what we’re solving for in the equation.)

Oops, poor wording on my part – this should read average *proportion* of bets the family loses (i.e. the proportion of girls it has) is ~30.6%.

Sure, but the expected fraction of girls in a randomly-selected family isn’t what the problem statement asks for.

Tom:

> David Sloan and Onus have it right. Each term in the final displayed equation needs to be weighted by the number of children in the family.

Only if you’re asking a completely different question than the puzzle is asking.

Tom:

> Sure, but the expected fraction of girls in a randomly-selected family isn’t what the problem statement asks for.

The problem asks for the expected fraction of girls in a randomly-selected country. The calculation shows that the answer is not 50% in a randomly-selected one-family country. Similar calculations show that the answer is not 50% in a two-family, three-family or k-family country. No legitimate argument has been offered by anyone to suggest that the answer is 50% in any reasonable scenario, and in fact no such argument is possible, because there is no reasonable scenario in which the answer is 50%.

Steve Landsburg,

Steve,

No, I’m sorry, the puzzle is asking for the m/f ratio in the population. Not the m/f ratio of a family selected at random from a list of families. The latter is what you calculated.

Here: With your problem statement in hand, I walk into your neighborhood, from your example. I ask all the girls to stand on my left and the boys on my right. By counting them I can determine what fraction of the population of your neighborhood is female. I count them and I get 50%.

@Tom:

I think you misread what Steve’s example shows. Think of each family as a possible example of a country, or if you like as a possible future state of a 1 couple country. Or look back at my example where I analyzed 4 cases (four futures) for a two couple country. In each case the fraction g/(b+g), or if you prefer g/b, is different in equally likely possible futures. So you need to average them (more generally, find their expectation). But the average of fractions is not the fraction of averages.

Ken B,

What is the ratio of males to females in Steve’s neighborhood?

Tom:

> What is the ratio of males to females in Steve’s neighborhood?

My neighborhood happens to have exactly 3 families of Type A and 1 family of Type B. The ratio of males to females in that neighborhood is not the same as the expected ratio in a family drawn randomly from that neighborhood.

The neighborhood is the analogue of the universe of possible scenarios that could play out in countries where people follow a certain stopping rule. The individual families are the analogues of individual countries in that universe.

The problem asks for the expected ratio in *one country*, not in the sum of *all possible countries*. By analogy, you need to look at the expected ratio of males to females in *one of the families on my block*, not in the sum of all families put together. So the question you are asking is utterly irrelevant to the original puzzle.

Tom:

PS: If we were summing over all *possible* countries, as you seem to want to do, then there would be no “expectation” about it — there would be a ratio, pure and simple. But what’s being asked for here is an *expected* ratio in one country.

And PPS: Ken B’s most recent comment nails exactly the point you’re confused about. The average of fractions is not the fraction of averages.

I’m beginning to see. Thanks for sticking with me.

Etc. That gives me a 1/4 chance of 0% girls, a 1/8 chance of 1/3 girls, a 1/8 chance of 1/3 girls, a 1/16 chance of 2/3 girls, a 1/16 chance of 1/2 girls, etc. I can add all this up and get an answer that is not 1/2. (In fact it is log(4)-1, which is less than .4).

I follow this.

But it doesn’t match up with your equation at the very top.

x = 1/2 * 0 + 1/4 * 1/2 + 1/8 * 2/3 + …

Surely when there are k copies of the probability distribution given in the table, the result when multiplied together is more complex than just the probability distribution from the first table?

(perhaps my maths is letting me down, that it’s a well known result I don’t know).

Onus: The equation you are quoting is the right equation for a one-family country. The new computation is correct for a two-family country. You have to decide in advance what size country you’re talking about before you can start doing computations. Different sized countries will give different computations, and hence different answers — but will never give the answer 1/2.
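To make the dependence on country size concrete, here is a minimal sketch. It assumes every family stops at its first boy and that all k families have finished, so the number of girls in the country follows a negative binomial distribution. The function name `expected_girl_fraction` and the cutoff `max_girls` are illustrative choices, not anything taken from the post or from Douglas Zare's answer.

```python
import math


def expected_girl_fraction(k, max_girls=2000):
    """Approximate E[G / (G + k)] for a country of k completed families.

    Assumes each birth is a girl with probability 1/2 and each family
    stops at its first boy, so the country ends with exactly k boys.
    The infinite series is truncated at max_girls (an arbitrary cutoff).
    """
    total = 0.0
    prob = 0.5 ** k  # P(G = 0): every family has a boy first
    for g in range(max_girls + 1):
        total += prob * g / (g + k)
        # Negative binomial recurrence: P(g + 1) = P(g) * (g + k) / (2 * (g + 1))
        prob *= (g + k) / (2 * (g + 1))
    return total


if __name__ == "__main__":
    print("k=1 :", expected_girl_fraction(1), "vs 1 - log(2) =", 1 - math.log(2))
    print("k=2 :", expected_girl_fraction(2), "vs log(4) - 1 =", math.log(4) - 1)
    print("k=10:", expected_girl_fraction(10))
```

Under these assumptions the one-family and two-family values match the closed forms quoted in the thread, and larger k moves the value toward 1/2 from below.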

For those of you having trouble with this, here is a simple, but exaggerated, example to emphasize one of the key issues.

Let X be a uniform random variable from the interval 1 to 2.

Let Y be a uniform random variable from the interval 1 to 2.

Let X and Y be statistically independent.

What is the expected value for X?

What is the expected value for Y?

What is the expected value for the ratio X/Y?

What is the expected value for the ratio Y/X?

Even though the expected value for X is equal to the expected value for Y, the expected value for their ratio is 3*log(2)/2, which is bigger than 1. However, the probability that their ratio is larger than 1 is 1/2. This means that even though the expected value of the ratio is about 1.04, the ratio is equally likely to be larger or smaller than 1. Mean, median, and mode do not have to be the same.

So, if you are asked “is the expected value for X larger than the expected value for Y?”, your answer should be “no”. But if you are asked “is the expected value for the ratio of X to Y larger than 1?”, your answer should be “yes”. And if you are asked “is the expected value for the ratio of Y to X larger than 1?”, your answer should also be “yes”. But if you are asked to bet on whether or not the ratio of X to Y would be larger or smaller than 1, it wouldn’t matter which way you bet. The ratio is equally likely to be larger or smaller than 1.

In a similar way, expecting an equal number of boys and girls is not the same as expecting the proportion of boys (or girls) to be 1/2.
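The X/Y example can be checked numerically. The sketch below is only an illustration: the function name `simulate_ratio`, the trial count, and the seed are arbitrary choices. It estimates both the mean of X/Y and how often X/Y exceeds 1, and compares the mean with 3*log(2)/2, which follows from E[X/Y] = E[X] * E[1/Y] for independent X and Y.

```python
import math
import random


def simulate_ratio(trials=200_000, seed=0):
    """Estimate E[X/Y] and P(X/Y > 1) for independent X, Y ~ Uniform(1, 2)."""
    rng = random.Random(seed)
    ratio_sum = 0.0
    above_one = 0
    for _ in range(trials):
        x = rng.uniform(1.0, 2.0)
        y = rng.uniform(1.0, 2.0)
        ratio = x / y
        ratio_sum += ratio
        if ratio > 1.0:
            above_one += 1
    return ratio_sum / trials, above_one / trials


# E[X] = 3/2 and E[1/Y] = log(2), so independence gives E[X/Y] = 3*log(2)/2.
exact_mean_ratio = 1.5 * math.log(2)

if __name__ == "__main__":
    mean_ratio, share_above_one = simulate_ratio()
    print("simulated E[X/Y]:", mean_ratio, "exact:", exact_mean_ratio)
    print("simulated P(X/Y > 1):", share_above_one)
```

By symmetry the same estimate applies to Y/X, which is why both ratios can have an expectation above 1 at the same time.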

For this boy-girl problem with M mothers:

1) The expected number of boys is equal to the expected number of girls.

2) The expected proportion of boys is roughly equal to 1/2 + 0.25/M (the approximation improves as M grows)

3) The probability of having an equal number of boys and girls is roughly equal to .2821/sqrt(M).

4) The probability of having more boys than girls is roughly equal to 0.5

5) The probability of having more girls than boys is roughly equal to 0.5 – .2821/sqrt(M)
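Items 3 to 5 can be checked exactly rather than by simulation. This is a sketch under the same assumptions as the list (each of the M mothers stops at her first boy, so there are exactly M boys and the number of girls is negative binomial). The function name `boy_girl_probabilities` is made up for illustration.

```python
from fractions import Fraction
from math import comb, pi, sqrt


def boy_girl_probabilities(mothers):
    """Return (P(more boys), P(equal), P(more girls)) as exact fractions.

    With M mothers there are exactly M boys, and the number of girls G
    satisfies P(G = g) = C(g + M - 1, g) / 2**(g + M).
    """
    m = mothers

    def prob_girls(g):
        return Fraction(comb(g + m - 1, g), 2 ** (g + m))

    p_equal = prob_girls(m)
    p_more_boys = sum(prob_girls(g) for g in range(m))  # fewer than M girls
    p_more_girls = 1 - p_equal - p_more_boys
    return p_more_boys, p_equal, p_more_girls


if __name__ == "__main__":
    for m in (1, 10, 100):
        more_boys, equal, more_girls = boy_girl_probabilities(m)
        approx_equal = 1 / (2 * sqrt(pi * m))  # the 0.2821/sqrt(M) rule of thumb
        print(m, float(more_boys), float(equal), approx_equal, float(more_girls))
```

In this model the probability of more boys than girls comes out as exactly 1/2 for every M, since it is the chance that at least M of the first 2M - 1 births are boys.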

Steve,

Sorry, I didn’t quite catch the number you got for the numerical m/f ratio in your neighborhood. But that’s ok, I see you gave a number in your post, 30.6% girls. So the m/f ratio you get from your analysis is more than 2:1 male:female.

This is an excellent test case for your analysis, since we can see that the actual value is 1:1 for your example. Equal numbers of girls and boys, as you posted.

If your formula gave 1:1, great. But you say in your post that your formula gives more than 2:1. So the analysis isn’t correct.

That’s really all we need to do for now. There’s a serious error in your formula. It doesn’t handle your own example. You need to correct that.

This is very clear.

Oh, sorry, your analysis gives 75%, a 3:1 ratio, for your own example.

Unfortunately since the actual m/f ratio in your neighborhood is 1:1 the analysis is not accurate.

Tom:

This is an excellent test case for your analysis, since we can see that the actual value is 1:1 for your example. Equal numbers of girls and boys, as you posted.

For what example? I have given no example in which the actual value is 1:1.

Also, of course, it’s not surprising that the number comes out differently in different examples. 30.6% in the example you cite; 75% in the families-on-the-block example. They’re different examples. Of course they have different properties.

The original question at MathOverflow says “What is the proportion of boys to girls in the country?” The correct answer to this question, if I understand the phrase “proportion of boys to girls” correctly (this seems to be nonstandard usage, as usually proportions represent the portion of one set belonging to some subset, rather than mutually exclusive sets), is “undefined”, since there is nonzero probability, with a finite # of families, that there will be 0 girls.

Here is another old chestnut which is related in a way. An example from days of yore: In the first half of the season Babe Ruth had a higher batting average than Lou Gehrig. In the second half of the season Babe Ruth had a higher batting average than Lou Gehrig. Over the whole season Lou Gehrig had a higher batting average than Babe Ruth. Explain.

An example explains

first half: BR at bat 1 time, one hit; LG 99 at bats with 98 hits.

second half: BR at bat 99 times with 50 hits, LG at bat twice, one hit.

Season: BR at bat 100 times with 51 hits; LG at bat 101 times with 99 hits.
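The batting-average reversal can be verified with exact fractions. The sketch below takes the half-season figures from the example and computes the season totals directly from them; the variable and function names are illustrative only.

```python
from fractions import Fraction

# (at_bats, hits) for each half of the season, as given in the example
ruth = {"first": (1, 1), "second": (99, 50)}
gehrig = {"first": (99, 98), "second": (2, 1)}


def average(at_bats, hits):
    """Batting average as an exact fraction."""
    return Fraction(hits, at_bats)


def season(player):
    """Total (at_bats, hits) over both halves."""
    at_bats = sum(ab for ab, _ in player.values())
    hits = sum(h for _, h in player.values())
    return at_bats, hits


if __name__ == "__main__":
    for half in ("first", "second"):
        print(half, float(average(*ruth[half])), float(average(*gehrig[half])))
    print("season", float(average(*season(ruth))), float(average(*season(gehrig))))
```

Ruth leads in each half, yet Gehrig leads over the season, because the two halves carry very different weights for each player.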

Tom:

Unfortunately since the actual m/f ratio in your neighborhood is 1:1 the analysis is not accurate.

Okay, now I suspect you’re just being obstinate for the fun of it.

The actual expected ratio for a family in my neighborhood is 75%. Adding up over a perfectly representative universe of all families takes all the probability out of the problem, which clearly converts it to a completely different and quite irrelevant question.

Jonathan Campbell: Yes, I agree with you, which is why I changed the statement of the problem.

As an aside, aren’t all the answers not quite right? It looks to me like no-one is counting the parents in any of their sums. The question is after all about females not female births.

Tom: Perhaps this will help:

Here we have a country where people follow the stopping rule.

There are many ways the history of that country could play out.

Question 1: Without knowing how the history *actually* played out, choose a child at random. What is the probability that child is a girl? Answer: 50%. The analogue, in the families-on-the-block scenario is: Without knowing which family you’re choosing from, choose a child at random. What is the probability the child is a girl? Answer: 50%.

Question 2: Without knowing how the history actually played out, what is the expected fraction-of-girls? Answer: Not 50%. The analogue, in the families-on-the-block scenario is: Choosing a family at random, what is the expected fraction-of-girls? Answer: 75%; i.e. not 50%.

You persist in addressing Question 1 whereas the puzzle asks for an answer to Question 2.
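The two questions can be put side by side in a few lines. This is only a sketch of the families-on-the-block numbers; the list `block` and both function names are illustrative.

```python
from fractions import Fraction

# (girls, boys) for each family on the block:
# three families with four girls each, one family with twelve boys
block = [(4, 0), (4, 0), (4, 0), (0, 12)]


def pooled_girl_fraction(families):
    """Question 1 analogue: girls as a share of all children on the block."""
    girls = sum(g for g, _ in families)
    children = sum(g + b for g, b in families)
    return Fraction(girls, children)


def mean_family_girl_fraction(families):
    """Question 2 analogue: average of the per-family girl fractions."""
    fractions = [Fraction(g, g + b) for g, b in families]
    return sum(fractions) / len(fractions)


if __name__ == "__main__":
    print("pooled:", pooled_girl_fraction(block))
    print("average over families:", mean_family_girl_fraction(block))
```

The pooled figure weights each family by its number of children, while the family average weights each family equally, which is where the 50% and 75% answers part ways.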

Ken B:

As an aside, aren’t all the answers not quite right?

The “correct” answer is model-dependent. It depends on what you assume about the initial conditions, about whether all the families have finished reproducing, etc. But I can think of no reasonable model in which the answer is 50%.

Steve,

;) That’s an amusing inversion of what’s happening here, and I do appreciate the humor.

Again: What happens if I go into your neighborhood and ask all the males to stand on one side and all the females on the other? The two groups will be equal. So the fraction of females in your neighborhood is 50%.

Your analysis gives 75%. Your analysis isn’t handling that example properly. It happens to all of us.

Steve:

The equation you are quoting is the right equation for a one-family country.

Well, no wonder I’ve had trouble. I thought that was the equation for an infinite population, and that “k” told you how many terms of this series to sum (in a non 1 to 1 manner).

For a population of k families, a similar calculation gives an answer

Since that “similar calculation” is actually the answer, can we have that equation?

@Steve:

I just mean we — the under 50 percenters at least — have been discussing g/g+b. But the actual fraction according to the strict interpretation should be (g+N)/(g+N + b+N) since you want females in the whole population, not female births in the child population. That change will not push the expectation up to 50%.

… by “have it”, I mean “have it explained”.. Douglas Zare’s explanation is way above my head. The leap from 2 to k goes a bit fast for me.

Prof Landsburg:

I’m only pointing this out because you “made” me spend so much time on this yesterday.

I believe you have an error in your 3rd bullet point:

”

Nevertheless, the fraction of girls in the average family is not 50%. It’s 75% (the average of 100%, 100%, 100%, and 0%).

”

The question was number of females, not number of girls. The puzzle assumed 2 parents so on your block you have:

3 families with four girls each (1 dad, 1 mom, 4 girls)

1 family with 12 boys (1 dad, 1 mom, 12 boys).

The fraction of females in the average family is not 50%. It’s about 64% (the average of 83%, 83%, 83%, and 7%).

Onus:

Since that “similar calculation” is actually the answer. Can we have that equation?

It’s in the Douglas Zare post that I linked to in my last paragraph.

Edit: Oops. Just saw your followup comment in which you say you already looked at Zare but would appreciate more explanation. I’ll try to get to this later today.

Tom:

I suspect you might just be too stupid to think about this problem.

Imagine 100 identical countries with 100 couples. Due to chance, some countries will have a lot of couples who have their sons early and the couples in other countries will tend to have their sons late. The proportion of boys in the “early” countries will be higher than the proportion of boys in the “late” countries. The “early” countries will also have smaller populations. That is, there will be large countries with lots of girls and small countries with not so many girls (the number of boys in all countries will be the same).

If you aggregate all the countries, the number of boys and girls is expected to be the same. If you choose a country “at random”, however, you are biased toward a large fraction of boys.

If you’re still not convinced, consider 2 countries with 1 couple.

In country 1, the couple has a son: fraction of boys is 100%. In country 2, the couple has two daughters and a son: fraction of boys is 33%. Aggregate population is two sons and two daughters, but the “expected” fraction of boys is (100+33)/2 = 67%.

Being that I’m trying to work for Google, I’d like to run this on a computer simulation.

I can create 100 “countries” with let’s say 1,000 couples. For each couple I could generate random girls and boys following the rules in the puzzle.

To determine the fraction of girls do I:

- Count the number of boys and girls for each country. E.g. 1400 boys and 600 girls equals 30%

or

- Count the fraction of each family. E.g. family1 = 0%, family2 = 50%, …, family3 = 75% and then average these fractions.

Thanks

@Will A:

Each country in your simulation is a possible future state. So you count the entire population of the country and get a ratio for that country. Then, if you have done the probabilities right, you have a sample set of size 100 from the possible futures of one country. That will give you a sample mean which will approximate the answer. So sum over the country, not over the families. The families thing is just another analogy for possible futures.

Will A: You won’t show anything with your simulation, though, since with 1000 couples the expected ratio will be quite close to 50% in any case, so your sample of 100 won’t be large enough to show a difference. If you want to show something, run it with only a handful of families (say 5) in each country, which should consistently show a ratio with a value other than 50%.

Tom:

I would recommend you read Thomas Bayes’s post at 11:27 AM; it is a great explanation of what is causing you to be confused.

Also, it’s a bit harsh to call you too stupid to understand the problem. I just think you lack the mathematical background to think about the problem in a different way than you currently do.

Furthermore your intuition is somewhat correct in that in a country with a large number of families the proportion of girls approaches 50% so Landsburg’s point, while interesting and enlightening, is a minor deviation from the slightly incorrect answer of 50%.

Steve,

Well, that is a different analysis, but not quite what I had in mind! To save time let’s assume that I am stupid, but that for some reason you need to address my argument instead of my personal measurements.

So. Have you walked around your neighborhood and counted noses?

Your problem statement asks what fraction are girls. That is Ngirls divided by Ngirls+Nboys. You gave numbers in your post:

Ngirls=12

Nboys=12.

It’s true that you didn’t carry out the division 12/24 in your post. But I’m betting on 50%.

This seems stupid. Even Steve admits that as K goes to infinity, the ratio of men and women approaches 1/2 from below. I think most people conceptualized Steve’s original problem with an implicit assumption that K goes to infinity (i.e. we are looking at a single country, with K families playing Steve’s game, and K can be assumed to be large).

Steve is now answering the problem when K is finite, and acting like everyone else got the problem wrong. This seems silly… We are debating an assumption about K being finite or not. Debating assumptions (we should all agree) is for the birds….

Disappointing.

From the original post

“But in expectation, what fraction of the POPULATION is female? In other words, if there were many such countries, what fraction would you expect to observe on average?” (emphasis on population mine).

Steve, your answer is correct and very clever if only you worded it correctly. But I think you screwed the pooch in what was probably a last minute oversight trying to write the problem out.

Let Bi be the number of boys in family i, and Fi be the number of girls in family i. What you asked for was the expectation of the girl fraction of the whole population, which is:

E(Sum(Fi over all i)/(Sum(Fi over all i) + Sum(Bi over all i)))

The correct answer to this is no doubt 1/2, I certainly hope that you don’t dispute that.

Your answer is for the expected proportion of each family which is an entirely different random variable:

E(Sum( Fi / (Fi + Bi)))

So as you can see the random variable that you asked about and the random variable you gave an answer for are NOT the same random variables.

@Bryan:

No, I think the real point is that everyone solves the problem by finding E(g)/E(g+b) without realizing that that just isn’t right. I know I fell into the trap, as did, it seems, pretty much everyone. The puzzle highlights an interesting error. That the infinite population case converges to 1/2 doesn’t really matter. The thinking most of us used was subtly flawed when we first arrived at 1/2. I would never have thought twice until Steve noted that the obvious answer was wrong; I’d have stopped at the “aha, it’s 1/2” moment.

Bryan: Modern digital communication that enables all of the things we do with our cell phones wouldn’t be possible if someone didn’t understand the ‘silly’ things like the slight expectation bias that occurs in this problem. Is the expected proportion of girls ‘close’ to 1/2 for large K? Of course. Is it useful to understand the way it deviates for finite K? I think so.

If I try to estimate the mean from several (K) independent samples of a random variable that has a mean and variance, then the sample mean will have a variance that is equal to the variance for the variable divided by K. Would it be silly to note this, or should I just say that the variance is equal to 0?

Doug: “The correct answer to this is no doubt 1/2, I certainly hope that you don’t dispute that.”

If I interpret your notation correctly, then you are referring to the expected value for the ratio of total number of girls in the population to total number of boys and girls in the population. That value is:

0.5 – 0.25/(number of families)

Your second example:

E(Sum( Fi / (Fi + Bi)))

would evaluate to the number of families multiplied by the expected value for the ratio for a single family. Neither would be equal to 1/2.

Doug,

Thanks for providing hard evidence that the original problem was meant to assume that K (the number of families or “countries”) was large.

Can we all now agree that what we are debating today is whether K is finite or not? And can we further agree that differences of opinion on this matter tell us nothing about anyone’s “cleverness”?

@Doug:

No you are getting the equation wrong. Forget families; that was Steve’s analog to alternate possibilities. SL is not summing families. If SL’s post isn’t clear go back and read posts by Thomas Bayes and myself where I think your problem is addressed.

It is critically important whether we are fixing the number of completed families (i.e. ones who have had their boy and stopped reproducing) or fixing the number of children.

If you consider only completed families, you will miss all the ones who are still working on their boy, and have some (possibly large) number of girls already. If this is the question, then yes, the expected fraction of girls will be <50% – you implicitly discarded all the families who consist (so far) only of girls!

If you consider all children, regardless of whether their parents have stopped having more, you get the expected 50%.

Simulations bear this out:

1) repeatedly generate a single family, and "average" together the gender ratio in each family, weighted by number of families. Result: average converges towards ~0.306 (the figure quoted many times above for a single family's expected ratio)

2) repeatedly generate a single family, and "average" together the gender ratio in each family, weighted by number of children in that family. Result: average converges towards 0.5

3) repeatedly generate 10 families in a batch, and weight by # of families. Result: converge towards 0.475 (the figure quoted above for 10 families' expected ratio)

4) generate 10-family batches, and weight by number of children. Result: 0.5 again

5) generate k-child batches, from as many families as are needed to produce this many. Average together the gender ratios (implicitly weighted by # of children in the batch, since they're all identical). Result: 0.5 again

Note: in cases 1 and 3, my "units" argument still applies. The fact that I have written code to simulate these scenarios does not mean I believe the math used to compute the "average" after the fact is meaningful. Python doesn't understand units; it's just doing arithmetic. In all the cases where I weight averages by number of children, I get 0.5. In all cases where I weight by number of families, I get the <0.5 (i.e. Douglas Zare's formula) result.
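For readers who want to reproduce scenarios 1 to 4, here is a minimal sketch. It is not the commenter's actual code, which was not posted; the function names, batch counts, and seed are assumptions made for illustration. Each batch is a set of completed families, and the two return values are the plain average of the batch girl fractions and the same average weighted by the number of children in each batch.

```python
import random


def girls_in_family(rng):
    """Number of girls born before the first boy (each birth is 50/50)."""
    girls = 0
    while rng.random() < 0.5:
        girls += 1
    return girls


def simulate(families_per_batch, batches=100_000, seed=0):
    """Return (mean of per-batch girl fractions, child-weighted girl fraction)."""
    rng = random.Random(seed)
    fraction_sum = 0.0
    total_girls = 0
    total_children = 0
    for _ in range(batches):
        girls = sum(girls_in_family(rng) for _ in range(families_per_batch))
        children = girls + families_per_batch  # one boy per completed family
        fraction_sum += girls / children
        total_girls += girls
        total_children += children
    per_batch_mean = fraction_sum / batches
    # Weighting each batch fraction by its child count is the same as pooling.
    child_weighted = total_girls / total_children
    return per_batch_mean, child_weighted


if __name__ == "__main__":
    print("1 family per batch:", simulate(1))
    print("10 families per batch:", simulate(10, batches=20_000))
```

The first number in each pair estimates E(Girls/Kids) for a country of that size, and the second estimates E(Girls)/E(Kids).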

Steven: if I have missed a simulation scenario that you consider relevant, please let me know. I'd be happy to add it to my suite of test cases. I'd also be interested to know what you expect the results to be. ;)

If you intended to ask one of the questions answered by simulation 1 or 3, I believe your original wording was flawed: you asked for "fraction of the population", not "ratio of the average family".

@Tom:

Steve has perhaps confused things with his example about his fecund neighbourhood. (Is there something in the water where you live Steve?) He should have stated more clearly that he is mapping the country to a neighbour, and that the various neighbours correspond to various possible final states of the (one couple) country. For each of these final states you (more or less) find g/g+b, and average those averages. You don’t get 1/2. You don’t ever get one half for any finite number of families.

Seriously, read what some of the others have said, like Thomas Bayes, or my example early on.

Ken B and Thomas,

I agree with you that the finite case of K is important. My point is that most people were not answering that specific question. Thus, saying they had the “wrong” answer is “silly” – most people had the right answer to a different question.

At this point, I think everyone understands the question Steve was trying to ask, and agrees with his answer. The only debate is what question did Steve actually ask? This debate strikes me as futile, as its answer lies in the perceptions inside each person’s head….

Time for a thread hijacking! Here is another problem in probability from the site Steve got this from

An ordinary deck of cards, face down, is placed in front of you in a stack. A dealer turns the top card of the stack face up and puts it on a separate pile, and does this repeatedly until you say “now”. At that point he turns over the next card and stops. You can say “now” at any time from the very beginning (before the first card is turned over) until almost the very end (just before the last card is turned over). You win if the last card turned over — the one turned over just after you say “now” — is red.

What is your strategy, what is its rate of return, and can anyone do better?

In your explanation at the top of the page you ignore families with only girls – i.e. all the families who haven’t had a boy yet. The question you answer might be more interesting, but it isn’t the question in the blue box.

Ken B:

–

This is off topic but I infer you might be a Bayesian. Do you (or Steve) know any good prose explanations why Bayesianism is the correct approach? I don’t want a textbook, I want an apologia pro vita bayesian. I had a good frequentist upbringing but have had heretical doubts for a while now.

Thanks in advance.

–

Another suggestion in addition to the one Steve gave would be a paper by E.T. Jaynes:

http://bayes.wustl.edu/etj/articles/general.background.pdf

There are others here:

http://bayes.wustl.edu/etj/node1.html

Some quotes that might prime your interest:

–

“Starting with the debates of the 1930′s between Jeffreys and Fisher in the British Statistical Journals, there has been a puzzling communication block that has prevented orthodoxians from comprehending Bayesian methods, and Bayesians from comprehending orthodox criticisms of our methods. On the topic of how probability theory should be used in inference, L. J. Savage (1954) remarked that ‘there has seldom been such complete disagreement and breakdown of communication since the Tower of Babel’ “.

–

“In Bayesian parameter estimation, both the prior and posterior distributions represent, not any measurable property of the parameter, but only our own state of knowledge about it. The width of the distribution is not intended to indicate the range of variability of the true values of the parameter, as Barnard’s terminology led him to suppose. It indicates the range of values that are consistent with our prior information and data, and which honesty therefore compels us to admit as possible values. What is ‘distributed’ is not the parameter, but the probability.”

–

mobile: Excellent comment; thanks.

Will A:

- Count the number of boys and girls for each country. E.g. 1400 boys and 600 girls equals 30%

or

- Count the fraction of each family. E.g. family1 = 0%, family2 = 50%, …, family3 = 75% and then average these fractions.

Count the fraction for each country. Then average those fractions.

You should get approximately 1/2 – 1/4000 (if you’ve got 1000 couples). It might be hard to distinguish this from 1/2 just by eyeballing, but the difference is there.

Tom:

Your problem statement asks what fraction are girls. That is Ngirls divided by Ngirls+Nboys. You gave numbers in your post:

Ngirls=12

Nboys=12.

We both agree on these numbers. What you don’t understand is that they are irrelevant.

I think, but am not certain, that much of your confusion stems from failing to recognize that each *family* in the neighborhood example corresponds to an entire *country* in the original setup. So let’s start over with a modified example designed to erase that confusion:

There is ONE family in my neighborhood. They are about to flip a fair four-sided coin to decide how many children to adopt. If the coin comes up on any of the first three sides, they will adopt four girls. Otherwise, they will adopt twelve boys.

I make the following claims. If you disagree, please tell me which specific claim you disagree with:

a) In expectation, this family will adopt three girls and three boys.

b) In actuality, they will adopt neither three girls nor three boys. They will instead adopt either four girls (having 100% daughters) with probability 3/4 or twelve boys (having 0% daughters) with probability 1/4.

c) In view of b), the expected value of the girl-fraction is 75%.

d) If we add up all the hypothetical children in all four hypothetical outcomes, we get four girls plus four girls plus four girls versus twelve boys. This is a 50% fraction of girls.

e) However, that 50% ratio is quite irrelevant to the problem, which asks not about *hypothetical* children but about *actual* children.

f) To summarize: 50% is the right answer to the wrong question. It is the answer to a question about what happens if you sum up over a bunch of hypotheticals. 75% is the right answer to the *right* question.

Bryan:

Even Steve admits that as K goes to infinity, the ratio of men and women approaches 1/2 from below. I think most people conceptualized Steve’s original problem with an implicit assumption that K goes to infinity (i.e. we are looking at a single country, with K families playing Steve’s game, and K can be assumed to be large).

But even when K is large, the “standard” argument doesn’t work. You need a *different* argument (and a much more technical one) to get the result that when K is large the result is near 1/2. Even if the result were *exactly* 1/2 (which it isn’t), it wouldn’t change the fact that the standard argument is *wrong*.

Doug:

E(Sum(Fi over all i)/(Sum(Fi over all i) + Sum(Bi over all i)))

The correct answer to this is no doubt 1/2, I certainly hope that you don’t dispute that.

I absolutely dispute that. See the calculation in the post.

PS: Note that in the case of a one-family country, the two expressions you’re trying to distinguish are in fact identical.

Ken B:

@Tom:

Steve has perhaps confused things with his example about his fecund neighbourhood. (Is there something in the water where you live Steve?) He should have stated more clearly that he is mapping the country to a neighbour, and that the various neighbours correspond to various possible final states of the (one couple) country. For each of these final states you (more or less) find g/g+b, and average those averages. You don’t get 1/2. You don’t ever get one half for any finite number of families.

I’ve explained this to him about six times. I don’t think he’s interested.

Guy:

In your explanation at the top of the page you ignore families with only girls – i.e. all the families who haven’t had a boy yet. The question you answer might be more interesting, but it isn’t the question in the blue box.

As I’ve explained multiple times in this thread (though I realize you might not have read all the comments!), the precise answer depends on your modeling assumptions, including whether you assume that all the families have stopped reproducing. I answered the question on that assumption by way of illustrating an approach. With different assumptions, you’d get a different answer. But it still wouldn’t be 1/2.

David Sloan:

If you consider all children, regardless of whether their parents have stopped having more, you get the expected 50%.

I do not believe this. Do you have an argument for it, or are you just making it up?

David Sloan:

Your simulation 1) gives the right answer. Your simulation 2) is irrelevant to any reasonable interpretation of the problem. Your simulation 3) gives the right answer. Your simulation 4) is irrelevant to any reasonable interpretation of the problem. I don’t get exactly what you’re doing in 5).

All your extraneous simulations are revealing is that E(Girls)/E(Kids) = .5. We know this. But the problem asks for E(Girls/Kids), which is not at all the same thing.

@Steve:

You might be right. I find it kind of amusing in general that in a thread with a clear cut correct answer — which you give — we get so much argument and tsuris, but in a more subjective and woolly thread — where I think you get it quite wrong (The law is an ass) — we don’t see a fraction of that! Maybe this isn’t so surprising. The real dichotomy in life is between those willing to be wrong, and those not. The former do quantitative studies and the latter do lit-crit.

@TB & Steve:

Thanks. The book is ordered, the paper is printed. Merry Christmas.

Ken:

I find it kind of amusing in general that in a thread with a clear cut correct answer — which you give — we get so much argument and tsuris

Thanks for doing your part to help others understand that answer.

I do see why the answer might be hard to grasp at first. What puzzles me is the folks who care so much about it that they’re willing to keep making the same false arguments over and over and over, but don’t care enough about it to take the trouble to understand what you, Thomas Bayes and others have explained so carefully and clearly.

Thanks for posting today. I would have hated to spend the holiday weekend thinking about this only to find I was going down the wrong path. I see now what you meant when you said it was strictly a math reasoning problem. I first thought there was some interesting and counterintuitive demographic puzzle here.

@ Dave B:

”

If you want to show something run it with only a handful of families (say 5) in each country.

”

@ Steve:

You told me to count the fraction in each country. I did this. Here are the results from my pass at it. Notice that the Avg % girls in each family is consistent with your solution. Especially when run with 100,000 couples per country. Here are some sample runs:

| Couples/Country | Avg. % of Girls | Avg. % Girls Each Fam |
|---|---|---|
| 1 | 29.01 (85 max/0 min) | 29.01 (85 max/0 min) |
| 4 | 46.94 (80 max/0 min) | 33.84 (80 max/0 min) |
| 100,000 | 49.99 (50.2 max/49.8 min) | 30.68 (30.9 max/30.5 min) |

So based on my simulation, I would expect that given a sample of 4 couples in a neighborhood, the neighborhood would have ~41% girls. And that the average fraction of girls in each family would be ~30%.

Now of course my simulation could be invalid. If my simulation is incorrect, it would be helpful to know how to “correctly” simulate this.

If my simulation is valid, then there is a difference between the Average % of girls in each country and the Average % of girls in each family for countries with more than 1 couple.

Will A: Your results sound exactly right.

Your second simulation essentially treats each family as a separate country. So you’ve got a result for four-family countries and a result for one-family countries. Those results are exactly as expected.

Sorry about the above post, I should have put the table in an html table. Here is another run:

| Couples/Country | Avg. % of Girls | Avg. % of Girl/Fam. |
|---|---|---|
| 1 | 30.1 (90 max/0 min) | 30.1 (90 max/0 min) |
| 4 | 45.2 (80 max/0 min) | 32.8 (76 max/0 min) |
| 100,000 | 49.99 (50.2 max/49.8 min) | 30.68 (30.8 max/30.5 min) |

Steve:

It’s hard to believe that I’m exactly correct. However, if I were trying to get a programming job at Google, I believe that my solution would be “better” than yours:

Write an algorithm that generates different answers based on what is meant by “fraction of population”, then ask the user what they meant. This is basically the job of a programmer.

So let’s see if I get it. There will be lots of different countries with different populations. The ones with small populations will have more boys than girls. The ones with large populations will have more girls than boys. Therefore selecting a country at random gives an expected fraction that is not 50%, because there are more countries with low population and more boys than those with high population and more girls. This allows me to square the ideas that there are equal numbers of boys and girls overall, but the expected fraction of girls in any country is not 50%.

If my simulation is correct, the smaller the population is, the more likely it is for the percentage of girls to be less than 50%.

I just looked at this article:

http://www.businessinsider.com/15-google-interview-questions-that-will-make-you-feel-stupid-2010-11

which includes this question and 14 others that have supposedly been asked at Google interviews. This one caught my eye:

“Why are manhole covers round?”

I have a colleague who told me that he was asked a similar question by a professor during his PhD qualifying exams. His question was “Why are manholes round?”, which is better, I believe, because a very good answer to the supposed Google question is “Because manholes are round.”

Cheers.

Joining the conversation this late, I can’t claim to have digested every comment in the thread, but I do think I follow the mathematics in Steven’s and Douglas Zare’s posts. (As an aside, I worked for many years at a company – not Google – that asks questions like these to interview candidates, and the math PhDs, including myself, who ask these questions do debate points like this internally amongst themselves at great length.)

Steven’s point (or really, Douglas Zare’s) that E[G/(G+B)] != E[G] / E[G+B] is a fair one. It seems to me, however, that the real quibble with Google’s interview question isn’t the answer but the phrasing of the question. It seems clear to me that the intended limit for any real country is k -> infinity (where k is number of families), and that the “wrong” answer is acceptable in an interview as a hand-wavy, “in the time available” argument that the limit of the digamma expression for large k is indeed one half. Certainly I wouldn’t be terribly impressed if Steven trotted out his 30.6% answer in an interview, since the case “k=1 family” wasn’t asked in my question and isn’t reasonably implied in any way I can see.

JamesL

Good luck. The attempt here was to produce a real-world example of a case in which the expectation value of the ratio differs from the ratio of the expectation values.

Unfortunately, as David has shown exhaustively, the two numbers happen to coincide in the problem statement given here. So we’re going to be subjected to ad hominems, weird constructions like ‘countries’ that consist of a single family each, etc., until we give up and go home.

If we can’t handle the ensemble of Steve’s neighborhood–which consists of the single point 12G,12B & has expectation value of the fraction of girls 100%*(12/24)–then we certainly aren’t going to get anywhere with whole countries.

Just my 2 cents.

This example from Steve’s post is accurate, and gets his point about ratio of expectations across very well:

Anybody who wants to try and shoot that down, go at it. Steve will win, the discussion will die down, and Steve will be able to make his point about expectation of ratios.

@ JamesL:

It is probably reasonable to assume that in any “real” country, there is one person who doesn’t want to have a boy.

Therefore, since we are considering countries where everyone wants to have a boy, we must be considering imaginary countries.

By considering the case where k=1 family, Prof. Landsburg is taking into consideration the infinite number of the imaginary countries with 1 family.

Most people seem to accept the fact that the expectation of a ratio is not the same as the ratio of the expectations. That’s good. Now the issue seems to be whether or not the difference between the correct and incorrect answers — the 0.25/K term — is important.

Suppose the Google question was this:

–

X_1, X_2, . . ., X_K are independent, identically distributed random variables with mean 0 and variance V. S is the sample mean, which is equal to the sum of these variables divided by K. What is the expected value of S*S?

–

One approach would be to determine that the expected value of S is equal to 0, and then to declare that the expected value of S*S must be 0.

Another approach would be to determine that the expected value of S*S is V/K.

We might say that V/K is very close to zero for large values of K, but that wouldn’t make the first answer correct. Would you accept an answer of zero for this question? If not, why be comfortable with an answer of 1/2 for the boy-girl question?
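The V/K claim can be checked numerically. The sketch below assumes, purely for illustration, that the X_i are normally distributed; the question itself only fixes the mean at 0 and the variance at V, and the function name and parameters are hypothetical.

```python
import random

def mean_of_squared_sample_mean(k, variance, trials, seed=0):
    """Estimate E[S*S], where S is the mean of k i.i.d. draws with mean 0.

    Assumption: draws are normal with the given variance (the original
    question does not specify a distribution).
    """
    rng = random.Random(seed)
    sigma = variance ** 0.5
    total = 0.0
    for _ in range(trials):
        s = sum(rng.gauss(0.0, sigma) for _ in range(k)) / k
        total += s * s
    return total / trials

if __name__ == "__main__":
    k, v = 10, 4.0
    print("estimate:", mean_of_squared_sample_mean(k, v, 50000))
    print("V/K     :", v / k)
```

The estimate should sit near V/K rather than near 0, even though the sample mean itself averages out to 0.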

@ Tom:

”

Suppose there’s just one family, that randomly decides whether to adopt four girls (with probability 75%) or twelve boys (with probability 25%). In that family, the expected number of girls equals the expected number of boys, and the expected fraction of girls is still 75%.

”

This is incorrect as it relates to the puzzle. The puzzle asked about females. If you say that the expected fraction of girls is 75% you are not taking into account the mother in the family who is a female.

The argument as I see it is whether Prof. Landsburg’s answer applies only to countries that have 1 family or “the fraction of girls in the average family”.

I’m arguing that the phrase “fraction of the population” doesn’t have the same meaning and therefore value as “the fraction of girls in the average family” when dealing with countries that have more than 1 family.

We might say that V/K is very close to zero for large values of K, but that wouldn’t make the first answer correct. Would you accept an answer of zero for this question? If not, why be comfortable with an answer of 1/2 for the boy-girl question?

Two reasons. One is that the original question was not phrased in a way that would make it obvious that Steve was asking for the expectation of a ratio. “What fraction of the population is female” does not convey to me that he is asking for a certain mathematical expectation. That is why mathematics has its own syntax and definitions — English is not precise enough to express many concepts clearly.

Second, the example of sex ratio in a country is a terrible one. I cannot think of any realistic situation where someone would be interested in the number of girls born in a country where they would care that the mathematical expectation of the ratio is slightly different than 1/2.

It may have been better to ask the question as some sort of esoteric gambling decision as someone else alluded to. Or maybe something about signal processing, as TB alluded to. Something where the difference might actually matter in a realistic situation.

Will A.,

The quotation in my comment is from Steve’s post, in the section labeled Edit. In that excellent example of the difference between expectation-of-the-ratio and ratio-of-the-expectations, Steve really is talking about kids only.

W/r/t the problem statement, I think you’re right and my own initial calculations of the expectation of ratios included parents, but personally can I plead that we not try to steer this out-of-control discussion through that extra curve at this late stage? Everybody’s been talking about kids only, that algebra is a little simpler, and it’s really just a convention. No baby is thrown out with that particular bathwater.

@ Tom:

I am fine dealing with kids only, as long as we discuss the impact that random births have.

Let’s say I asked the following question:

Consider a country where on average each couple pairs for life and has on average 2 kids each. On average what percentage of the population never creates a couple and therefore never has a child?

I would submit that there will be times when there is never exactly the same number of males and females. Therefore, there will be people who never have kids.

Since the birthrate is 2 children per couple and some people never have kids, the population will eventually go to 0. It may take a while, but it will go to zero.

If this is correct, then any theoretical country (e.g. countries with only 1 family) has a population > 0 for a finite time and a population of 0 for an infinite amount of time.

If this is correct, then 0 seems like a valid answer to the question.

Harold:

“So lets see if I get it.”

You get it! Not only that, but based on your history around here, I was certain you would.

JamesL:

“It seems clear to me that the intended limit for any real country is k -> infinity (where k is number of families), and that the ‘wrong’ answer is acceptable in an interview as a hand-wavy, ‘in the time available’ argument that the limit of the digamma expression for large k is indeed one half.”

The reason I would find this an unsatisfactory answer is that, without a supporting calculation, it is nothing more than an unsupported claim. Even if 1/2 is the right answer, I am unimpressed with a candidate who simply guesses it correctly but can’t defend it.

Tom:

“Anybody who wants to try and shoot that down, go at it. Steve will win, the discussion will die down, and Steve will be able to make his point about expectation of ratios.”

Unless I’m failing to detect sarcasm, you seem to have finally gotten this point. The point you’re still missing is that this example illustrates *exactly the same thing* as the four-family example.

Will A:

“This is incorrect as it relates to the puzzle. The puzzle asked about females. If you say that the expected fraction of girls is 75% you are not taking into account the mother in the family who is a female.”

In fact, it relates directly to the puzzle. It gives an explicit example in which the expected numbers of boys and girls are equal but the expected ratio is not 50%. It follows, then, that proving that the expected numbers of boys and girls are equal cannot address the question asked in the puzzle. The fact that this example might differ from the puzzle example in other ways is not relevant to that key point.

ErikR:

Here is Steve’s original statement of the question:

“But in expectation, what fraction of the population is female? In other words, if there were many such countries, what fraction would you expect to observe on average?”

He even used bold font for the word ‘expectation’. It was clear to me that he wanted us to think about the expectation of the fraction of the population that is female. Sure it’s a quirky question, but it contains an important lesson about the expected value of ratios. And based on the early responses to the question, it is clear that many people are prone to making the mistake that this question helps identify.

Steve,

Please don’t waste both of our time with more ad hominem remarks.

There’s no sarcasm at all in my post. Of course the expectation of a ratio and the ratio of expectations are always calculated differently. Your adoption example is a great illustration of a case where that leads to a numerical difference between the two quantities.

The only reason the remainder of the discussion has gone off the rails is that unfortunately in the original puzzle, boxed in your post, there happens to be no difference numerically between the expectation of the ratio and the ratio of the expectations. It’s bad luck. They both happen to come out to 0.5. Dave has provided you with detailed calculations showing why this is true, but unfortunately you’re not listening to him.

Seriously, the problem here is not that you have no commenters who took a freshman probability class. Some of us have PhDs too, we know this elementary stuff too (and we make mistakes too, and we don’t think that makes us or you stupid).

In the case of your puzzle the problem is that you left out, in the final display equation in your post, a weighting factor in each term proportional to the number of children in the corresponding family. You need that factor because when you select a member of the overall population, your chance of getting a member of a given type of family is not only reduced by that family type’s unlikeliness (the 2^n factor in your final formula) but somewhat increased by the size of the family (a linear factor missing from your formula).

If you add the missing factor, you’ll (unfortunately, and I mean that) get 0.5 for the expectation value of the fraction of girls. Unfortunately this problem is a weirdie that doesn’t differentiate between expectation of ratios and ratio of expectations. Your general point is FINE, everybody acknowledges that, this is just a piddling technical issue.

Best of luck. Sincerely!

Thomas Bayes:

That is not the original question. You are quoting Steve’s rephrasing of the original question. This is a simple case of pounding square pegs into round holes, and then acting shocked (shocked!) that it is difficult to get the pegs to go in smoothly.

If the goal is to act superior to others who just don’t get it, then taking a common question and rephrasing it in an absurd way is a great way to accomplish the goal.

If the goal is to help inform people about the difference between the mathematical expectation of a ratio and the ratio of expected values, then this was a terrible question to start with, since it is difficult to imagine a realistic situation where people interested in the number of girls born in a country would care about the slight difference.

Tom: My reference to sarcasm was not intended as a personal attack; I was genuinely unsure (not knowing you, really) of whether you intended your statements sincerely or sarcastically. I was pretty sure of the former but didn’t want to be presumptuous.

You continue to be dead wrong about the specifics of the puzzle, as is shown by a) the calculation in my original post, b) the calculation in the Douglas Zare post I linked to, c) the (rather brilliant, in my opinion) Taylor Series calculation posted by Thomas Bayes in comments to the original post and d) several other arguments by commenters who have succeeded in understanding the issue.

Hello, everyone. Like everyone else, I had a really hard time accepting the idea that the expected fraction could be anything other than the biologically-mandated 50%. The following extreme example helped it become intuitive for me:

Suppose a couple decides that if their first child is a boy, they will stop having children. If it is a girl, however, they will go on to have 100 more children.

They have a 50% chance of having a boy, and a 50% chance of having 101 children that will probably be about half boys and half girls. So the expected fraction of boys is approximately 75%. There is nothing really subtle about it – that is just how expected values work.

Someone mentioned gambling – you could easily use this strategy to have an expected rate of return of about 50% on a trip to Vegas, but it would not be a profitable strategy. The same obviously applies to stocks and could generate some attractive-looking but misleading claims for, say, advertising a mutual fund.

I wonder if might be better to simplify the question with a different stopping rule, e.g. “Parents have two children unless their first is a boy.”

For any given family, there’s a 50% chance they have 0% girls, a 25% chance they have 50% girls and a 25% chance they have 100% girls, for an expected fraction of 37.5% girls. This differs from the expected value because 100% boy families only have +1 boy, whereas 100% girl families have +2 girls.

Paul G and Henry: These are good examples. Thanks.
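The three small examples in this thread (the adoption family, Paul G's 101-children family, and Henry's two-child rule) can all be worked exactly with rational arithmetic. This is only a sketch: the helper name and the way outcomes are encoded as (probability, girls, boys) triples are choices made here for illustration.

```python
from fractions import Fraction
from math import comb

def expected_counts_and_fraction(outcomes):
    """outcomes: list of (probability, girls, boys), probabilities summing to 1.

    Returns (E[girls], E[boys], E[girls / (girls + boys)]).
    """
    exp_girls = sum(p * g for p, g, b in outcomes)
    exp_boys = sum(p * b for p, g, b in outcomes)
    exp_fraction = sum(p * Fraction(g, g + b) for p, g, b in outcomes)
    return exp_girls, exp_boys, exp_fraction

# Adoption example: four girls with probability 3/4, twelve boys with probability 1/4.
adoption = [(Fraction(3, 4), 4, 0), (Fraction(1, 4), 0, 12)]

# Henry's rule: two children unless the first is a boy (outcomes B, GB, GG).
henry = [(Fraction(1, 2), 0, 1), (Fraction(1, 4), 1, 1), (Fraction(1, 4), 2, 0)]

# Paul G's rule: stop after a first-born boy; after a first-born girl,
# have 100 more children, j of whom are boys.
paul = [(Fraction(1, 2), 0, 1)] + [
    (Fraction(comb(100, j), 2 ** 101), 101 - j, j) for j in range(101)
]

if __name__ == "__main__":
    for name, outcomes in (("adoption", adoption), ("henry", henry), ("paul", paul)):
        print(name, expected_counts_and_fraction(outcomes))
```

In each case the expected number of girls equals the expected number of boys, while the expected fraction of girls comes out to 3/4, 3/8, and 51/202 (about 25%, so about 75% boys) respectively.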

@Thomas Bayes:

Why are manhole covers round? This is a well known interview question with quite a lore. My answer is it’s easy to manipulate going back into the hole: won’t fall in, unlike most shapes, and doesn’t require rotational corrections. But the very BEST answer I have heard is this: “It’s the medieval principle of ‘as above as below’. The circle is the most divine shape and the manhole cover serves to remind those passing by and looking down that we live in a divinely ordered universe.”

PaulG,

The confusion has nothing to do with biology. The confusion arises because Steve is not calculating the expected fraction of females in a country. He is calculating the expected fraction of females in a family. A number of commenters here understand perfectly well what Steve’s doing, and it’s ok in its own right. It’s just not what the problem asks for.

The error and terror comes in when, in an attempt to gloss over Steve’s deviation from the problem statement, people start trying to make each family represent a country. That’s just silly.

The answer for the expected fraction of females in the country mentioned in the problem statement must converge to this: walk up to a member of the country and ascertain whether they are male or female. Repeat and average.

Steve’s answer fails that test. That is the source of the debate.

@Tom:

Have you actually read any of the comments where I or Thomas Bayes or others have clarified the family thing? I know you don’t accept Steve’s explanations, but he ain’t the only one in there pitching. You are just simply misunderstanding. Maybe Steve’s example was pedagogically ill-chosen as being easy to misunderstand, but the issue really has been clarified enough. Family or country the issue is *from a base group of couples….* The families are example base groups.

@JamesL:

Then you’d make a bad interviewer. Anyone on the under 50% side here BOTH gets the expected answer (50%) AND sees why it isn’t really an answer to the question actually asked. Not only does that show greater math competency it shows a better attention to the small details of definition that can be so important in programming.

Tom: You are definitely too stupid to think about this problem.

Ken B,

I read them, but they just repeat Steve’s calculation of the female fraction in the average family. That’s a cooler calculation than what the problem statement actually asks for, Steve does it correctly, and it shows off the expectation(ratio) vs expected(g)/expected(g) distinction numerically. It’s wonderful.

There’s only one problem with it, it doesn’t answer the question posed in the problem statement. (I’m not talking about something vague about ‘pedagogy’ or ‘clarity.’ Everything’s perfectly clear.)

From the problem statement:

Steve (et al., ok?) aren’t calculating that. Instead he’s calculating the fraction of girls in the average family. You folks have even been acknowledging that. (Sometimes. Sometimes some people try to call families ‘countries’ and construct an ensemble of those ‘countries.’ Cute but still not what the problem statement asks for.)

It’s not the same thing. The fraction of girls in the population, which is what the problem statement requires, is a slightly different calculation from the fraction of girls in the average family. This is because in a country large families constitute more of the population. You can confirm that the word population appears in the problem statement, while “*base couples*” does not.

I can provide my calculation for the expectation value of the g/(g+b) ratio in such a country, but Dave has already done so and apparently nobody in the “average over families” group can understand what he’s saying. So, forgive me if I’m blunt, but I don’t have high hopes.

Again, I hope you’ll forgive my bluntness, but you guys are getting the whole context wrong here. You’re not explaining the correct answer to recalcitrant undergrads, you’re refusing to check your work and persisting in an elementary error.

Best of luck.

Tom:

I’m not sure this will help, but here is what I believe is important to learn from this problem:

1. The expected number of boys and girls born to a generation of families in a country will be equal. So, if the number of boys born is B and the number of girls is G, then you can expect B to be equal to G in the sense that the expected value of B-G is zero. This will not depend on the number of families in the country. It will be true for 1 family and it will be true for a trillion families. If this was the question, then Steve and others would not need to illustrate what happens with 1, 2, or some other finite number of families.

2. If, however, you are asked for the expected proportion of boys (or girls) in the country, then the answer will depend on the number of families in the country. To demonstrate this, Steve and others have shown that the expected proportion of boys deviates far from 50% when there is only one family in the country. It gets closer to 50% for two families in the country, and closer yet for three. The expected proportion of boys is different from 50% by an amount that is roughly equal to 25% divided by the total number of families.

3. Providing the expected proportion of boys (or girls) in a single family is not addressing a different problem. It is a way to show that the expected value for B-G is zero for any number of families, but the expected value of B/(B+G) depends on the number of families.

4. There are two reasons that it is important to recognize and not ignore the fact that the expected proportion of boys is not equal to 50%: i) nearly all of the ways that people arrive at an answer of 50% are technically wrong; ii) arriving at an answer of 50% by ignoring the 1/K term in the correct answer is somewhat ‘okay’, but, if you do this, you shouldn’t have any reason to be critical of an answer that retains the 1/K term.

Where in this do you think there is an elementary error?
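Points 1 and 2 above can be checked without simulation. The sketch below assumes the puzzle's rule (each of K families stops at its first boy, each birth is a girl with probability 1/2), so that the total number of girls G has P(G = g) = C(g + K - 1, g) / 2^(g + K) and the number of boys is exactly K. The function name and the truncation point of the sum are illustrative choices.

```python
def expected_girl_fraction(k, tail=200):
    """E[G / (G + k)] for a country of k families that each stop at the first boy.

    The infinite sum over g is truncated at 10*k + tail terms; the omitted
    tail is negligible for moderate k. For k above roughly 1000 the starting
    probability 0.5**k underflows, so this sketch is meant for smaller k.
    """
    prob = 0.5 ** k  # P(G = 0): every family has a boy first
    total = 0.0
    for g in range(10 * k + tail):
        total += prob * g / (g + k)
        # Step from P(G = g) to P(G = g + 1).
        prob *= (g + k) / (2 * (g + 1))
    return total

if __name__ == "__main__":
    for k in (1, 2, 4, 10, 100):
        exact = expected_girl_fraction(k)
        approx = 0.5 - 1 / (4 * k)
        print(k, round(exact, 5), round(approx, 5))
```

The printed columns compare the computed expectation with the rough 1/2 - 1/(4K) figure for girls (equivalently 1/2 + 1/(4K) for boys); the two agree closely once K is more than a handful, and the girls' value stays below 1/2 for every K tried.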

I know I’m late to the party, but Dick Darlington made this point:

“Problem is, if one looks at the population at a given time, this situation is very unlikely, as a certain fraction of the couples would be in the process of having girls while waiting for a boy. By the time they all have a boy each, new families will be created, etc. In other words: at any point in time, there will be a certain number of women who don’t have a brother.”

True, however, I think this is cancelled out by the fact that assuming all residents of the country have the same lifespan, then just as there is an initial period where some girls in a family have been born but no boys have been born yet, there is a period at the end of their lives where the firstborn girls have died but the later-born girls (and the final male child) are still alive.

Thomas,

Your answer is fine. Countries have lots and lots of families, and so obviously the answer is much closer to 50% than to 30.7%. The limit of a single-family country is very far from what the problem statement requires. If Steve, in his post, had gotten the answer “very close to 50%,” then I would never have commented. It was the incorrect answer, 30.7%, and the absurd attempts to defend it as the correct numerical answer, that generated the controversy. I think you realize that pretty well.

I have no problem with calling 30.7% the first estimate in an infinite sequence that converges to 50% but that is very slightly less than 50% for any country of finite size. It’s fine! I suspect Dave will breathe a sigh of relief as well. If the original post had done that, then I would never have commented.

Correctly-calculated answers that come out to 50% in the limit of a large country have been presented again and again in this thread. The people who made those calculations understood these elementary issues just fine. (This is undergrad stuff, expectation of ratios, ratio of expectations. Come on!) We received dismissive, deprecatory, and absurd responses. (“Irrelevant,” “stupid,” “don’t have the math background,” “single-family country,” etc. )

The correctly-calculated answers have been repeatedly dismissed in favor of an extremely crude calculation that gives a big error. That’s the problem.

Let me add some confusion.

To ask what is the expected sex ratio in a country, one must assume a population of countries. Then, if you chose one country at random from the population and counted the boys and counted the girls and divide one count by the other, what quotient should you expect to see before you do the count? Thomas Bayes says it depends on the total population of the country, but you won’t know what that is until you do the count (unless all countries are of the same size, but we have no reason to assume that.)

Now if you first count all children before you count them by sex, you can make a conditional expectation based on the total. I haven’t done it, but Thomas Bayes is very smart, so I assume his answer is correct for this conditional expectation.

OTOH, I think Tom is worried about the unconditional expectation. To answer that, I assume that you need to know something about the distribution of country size in your population. It is beyond my skill set, but I am guessing that if country size is a random variable, the unconditional expectation is 50%

That is, proportion of 50%.

I think JamesL would make a good interviewer. More important than pointing out minor mathematical distinctions — in a problem where such distinctions are not useful to anyone — is to understand what is useful to most people based on a somewhat ambiguous statement. When considering girls born in a country, computing

E(G) / ( E(G) + E(B) )

is almost always going to be a quantity that is at least as useful to people as

E( G / B )

No need to waste time on the latter when the former will do. Then you have more time to work on the really useful problems, those problems that you can solve that will be useful to many people, rather than wasting time on minor details of a problem that will be useless to almost everyone.

In the real world, choosing the right problem to work on (and partially solving it) is frequently vastly more important than comprehensively solving every minor detail of an obscure problem.

So, as an interviewer, I’d be most impressed by a candidate who hand-wavingly responded that the answer was close to 50%, and then, perhaps, talked in more depth about the solution to another problem that is only vaguely related to the one asked, but is much more relevant to most people.

Neil:

The unconditional expectation is the expected value of the conditional expectation. That is, if we know the expected proportion of boys for a particular number of people in the population, then we can assign probabilities to every possible population size and compute an average. I don’t know what would be a good assignment for these probabilities, but I do know that if every conditional expectation is less than 50% (and they are), then there is no way for the unconditional expectation to be 50%.

Steve, I know you will view the following problem as a different problem than your proposed problem, but I am curious how you would answer the following:

Suppose we have a single country of K families, each of which randomly decides whether to adopt four girls (with probability 75%) or twelve boys (with probability 25%). Suppose K is large.

Now suppose you had to sample 100 children from this country, and you had to make your best guess about the fraction of those children who were male. What would your guess be? Additionally, please define, mathematically, what expectation it is you’re taking when you form your best guess of the fraction of males?

(I now appreciate that this was not the spirit of your original question, but I still think many people thought it was. This doesn’t make them “wrong.” They just answered the wrong question.)

If nothing else, this blog post proves your unique talent for stirring the pot (which is an important one).

@Steve:

This’ll learn ya. Had you said instead of 4 neighbours that you had 4 neighbouring countries ….

Bryan: The expected ratio of girls to boys in your problem is slightly greater than 1/2. For a subsample of 100, the expected ratio is also slightly greater than 1/2, by some amount that would not be too difficult to calculate.

My actual guess — whether I shaded it above or below the expected value — would depend on the consequences of being wrong and the consequences of being right.
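For the whole-country version of Bryan's adoption problem, the expected fraction of girls can be computed directly. This sketch assumes K independent families, each adopting four girls with probability 3/4 or twelve boys with probability 1/4, and it covers only the full population, not the 100-child subsample; the function name is hypothetical.

```python
from math import comb

def expected_girl_fraction_adoption(k):
    """E[girls / (girls + boys)] over a country of k adopting families."""
    total = 0.0
    for x in range(k + 1):  # x = number of families that adopt four girls
        prob = comb(k, x) * 0.75 ** x * 0.25 ** (k - x)
        girls, boys = 4 * x, 12 * (k - x)
        total += prob * girls / (girls + boys)
    return total

if __name__ == "__main__":
    for k in (1, 2, 10, 200):
        print(k, round(expected_girl_fraction_adoption(k), 5))
```

With one family the value is 0.75, and it falls toward 1/2 from above as K grows, which is the sense in which the expected fraction is "slightly greater than 1/2" for large K.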

Thomas Bayes,

Yes, that is obvious now. Unless you have a population of very large (infinite) sized countries, the expectation of the proportion of boys must exceed (I assume you meant to say) 50%

A less controversial question would have involved a forest next to a city where everyone in the city is deaf during half the day.

If a tree falls and makes a sound, people plant a male tree in their backyard and stop planting trees.

If a tree falls and doesn’t make a sound, then people plant a female tree in their backyard, but don’t stop planting trees. What is the expected …

*** feel free to not post if there are already too many posts ***

Okay, you ask “what fraction of the population is female.” I have yet to see a convincing argument that that isn’t G/(G+B)

Your “tricky solution” is really the answer to the question, what is the average fraction of girls across all families? Your a-ha moment is really just that you’ve changed the question without telling us.

Michael: Of course it’s G/(G+B). Who said it wasn’t?

@ Michael:

I believe that there are 2 possible ways to read Prof. Landsburg’s answer.

The first is to read it as the answer is 30.86 for any country of any size.

The other way to read it is that 30.86 is the answer for countries with 1 family, and that as the number of families in a country approaches infinity, the answer for those countries approaches 50%.

What might be troubling to some is that the first reading of the answer doesn’t seem to be correct.

The second reading of the answer leads to different people wanting a more specific answer. People understand the point, but want the answer. Is the answer a function that depends on the number of people in a country? Is the answer not 50 and not more than 50?

Is there a single numerical answer to this question and to life, the universe and everything that is somewhere near the middle of 31 and 50 (42 maybe)?

I tend to choose the 2nd reading of Prof. Landsburg’s answer. However, since this blog is about tackling problems of philosophy, I feel justified in arguing about what terms like “expected” mean.

E.g. if a country with 1 family has a boy, then I expect that country has no means to increase its population and I expect the population to become 0. I expect this because my expectation is not that people live forever.

Will A:

“I believe that there are 2 possible ways to read Prof. Landsburg’s answer. The first is to read it as the answer is 30.86 for any country of any size.”

I don’t see how anyone could have read the answer this way. What led you to think this was a possible reading?

I believe we are stuck in semantics. I ran a simulation based on your instructions. When the country size is 1 family, sum(G)/(sum(B)+sum(G)) is the same as the average number of girls per family in the country.

No matter the size of a country, I come up with the average number of girls per family to be ~30%. However, as the size of the country increases, sum(G)/(sum(B)+sum(G)) increases.

E.g. with a country of 4 families, I expect sum(G)/(sum(B)+sum(G)) to be ~42% and the average number of girls per family to be ~30%.

The fact that I come up with 2 different answers leads me to the conclusion that the answer depends on how one defines “fraction of a population”.

I would be willing to accept the definition of a demographer as to what “fraction of a population” means if the supposition of the problem was that Google was interviewing me for a demographer position.

Will A:

“E.g. with a country of 4 families, I expect sum(G)/(sum(B)+sum(G)) to be ~42% and the average number of girls per family to be ~30%.”

This is right.

“The fact that I come up with 2 different answers leads me to the conclusion that the answer depends on how one defines ‘fraction of a population’.”

“Fraction of population” is quite unambiguous. For a country with four families, the correct answer is about 42%.

What else could “fraction of population” mean?

It could mean

E(G) / ( E(G) + E(B) )

I’ll be the first to admit I didn’t follow the more technical aspects of all of this (even though the comments were nonetheless highly entertaining to me…weird sense of humor I guess…), but what helped more than anything to drive at least some of the intuition home was Paul G’s exaggerated example using the scenario where a family starting out either has a boy or (if their first child is not a boy) has 100 more children. It’s spoon feeding for most on here I realize, but it was helpful for me. Thanks.

@ Steve:

I have to apologize, I swore that I read your answer numerous times and only now do I see:

”

For a population of k families, a similar calculation gives an answer of approximately (but not exactly) (1/2) – (1/4k), which, when k is large, is approximately (but not exactly) 1/2.

”

So now I have to assume that anyone arguing with you must have missed this as well.

However, I still submit that given the randomness of births, any such population that follows this rule is bound to go to zero and therefore I expect the value of the population to be zero eventually.

To me this is like asking:

In Japan, the birthrate is approximately 1 child per couple. In the year 5281 what percentage of the population will be female. Well the answer is of course 0/(0+0). Which doesn’t seem to match (1/2) – (1/4k).

Of course, my math skills are pretty lacking. Is 0/(0+0) 50%?

A strange country :

The country can be as small as one family but every woman in that country can bear an infinite number of children.

So the number of children in a family can be bigger than the number of families in the country.

Will A:

Well the answer is of course 0/(0+0). Which doesn’t seem to match (1/2) – (1/4k).

And of course, there’s no reason why it should match, since your example violates the “reproduce till you have a boy” assumption.

@Gunter: the probability of a family reaching size k in this problem is 2^{-k}.

In a certain country, we measured all the citizens for a certain trait T. The measurements of T are independent and identically distributed random variables. The trait T has known mean mu and variance sigma^2 (standard deviation sigma).

Question: What is the expected fraction of our measurements that are within one observed sigma of our observed mean?

Some might answer thus: The central limit theorem tells us that we will get a normal distribution. A tabulation of the normal distribution tells us that 0.682689492… of the normal distribution is within one sigma of the mean. So the answer is 0.68+

But that answer is WRONG. It is wrong because the central limit theorem only applies in the limit, while all countries have a finite population. For example, if the country has a population of two, and our measured values are P and Q, then the observed mean is (P+Q)/2 and the observed sigma is sqrt (2 * ((P-Q)/2)^2). 100% of the values are within one sigma. This is also true for a population of three.

The answer may APPROACH 0.682689… as the population increases, but it is NOT 0.682689…

++

I suggest that the above has some of the flavor of the stated problem. In both cases, the answerer has made a leap, and he is wrong if the population of the country is two or three. But, for this problem, the answer IS 0.68+ when the population is country-sized, e.g. several million. And similarly, the answer for the girl-boy problem is 0.50 (to many digits) when the population is country-sized

Bob Ayers:

You’ve missed the main point.

You write:

And similarly, the answer for the girl-boy problem is 0.50 (to many digits) when the population is country-sized

And yes, that is true. But it does not follow from the fact that the expected number of girls is equal to the expected number of boys.

You have, in fact, merely asserted this conclusion without proving it. And the main point here is that a) assertions need to be justified and b) the “usual” justification for this assertion is not valid.

(Moreover, of course, the answer is not exactly .5.)

Steve Landsburg notes:

You’ve missed the main point.

You write:

And similarly, the answer for the girl-boy problem is 0.50 (to many digits) when the population is country-sized

And yes, that is true. But it does not follow from the fact that the expected number of girls is equal to the expected number of boys.

You have, in fact, merely asserted this conclusion without proving it. And the main point here is that a) assertions need to be justified and b) the “usual” justification for this assertion is not valid.

I’m sorry for having mis-phrased. I meant to parallel the question, which is not exactly what is quoted above but rather “What fraction of the population should we expect to be female”. Thus I should have written:

And similarly, the answer for the fraction of the population we should expect to be female is 0.50 (to many digits) when the population is country-sized.

And indeed I did not supply a proof, of that or of the central limit theorem.

I claim that I do understand the point, which is well-illustrated by Steve Landsburg’s earlier remarks and especially by his “4G 4G 4G 12B” example. I myself used a similar example: “Family flips a fair coin to decide on one girl or two boys but that does not mean that for small population we should expect to see E(g) = E(b)/2″ And I calculated the latter for a variety of small family-counts. This example seems more amenable to calculation than the 4-choice one, tho it lacks the 50% result — and I expect that the fact that its calculation uses the binomial theorem straightforwardly may make a proof that E(g) approaches E(b)/2 for large N simpler.

Indeed the person/interviewee who reasons from “half boys” to expectations has made a leap that he has not justified; albeit one that gets the right answer. My little tale was meant to show solely that we make many such leaps based on the law of large numbers, and often they are a shortcut to the right answer, rigor or no. I was not attempting to knock Landsburg’s derivation of a different moral from his tale.

@ Steve:

I agree that the example I give for Japan doesn’t match. However, if we consider the countries with 1 family that follows the rules, the 1-family countries that have 1 boy first and stop have no way to reproduce. The future of the country is to have a population of 0.

So I expect the future fraction of girls in countries with 1 family who have boys first to be 0/(0+0).

Now if I assume that in these imaginary countries people live forever (in a way that I can consider imaginary blue and red balls drawn from an imaginary bag can exist forever), then I would say that the countries with 1 family who have a boy first would have 3 citizens who live forever.

This would be the mother, the father, and son. And in this country, the fraction of females would be 1/(1+2).

I forgot the other and most important case. If we assume that couples in these countries live forever until they have a boy and stop, we come up with your answer.

So if this was the assumption of the puzzle, I apologize. However a less ambiguous way of phrasing the question would have been.

Imagine a country where people hate living and whenever a couple has a boy, the couple dies. Therefore in this country every couple tries to have a boy. What is the expected …

OT. I am interested in the population dynamics of this reproduction rule. At first, I thought that the population would go extinct with certainty because half the women (wombs) do not reproduce wombs. (Only wombs can reproduce.) Then I realized, the expected number of women in the subsequent generation is equal to the initial number. This suggests that the population follows a Martingale (am I right?) and goes extinct with positive probability.

And, oh yes. Is there any relevance to the fact that the two countries most likely to approximate this decision rule (China and India, of course) are the most populous on this planet?

@ Neil:

Are China and India most likely to approximate this rule or are they most likely to use a decision rule where couples try to have as many boys as possible?

As it relates to probability theory, you probably know more than me. Are the odds that every generation has the exact number of boys and girls?

If the odds say that there will be times when the exact number of boys and girls are different, then there will be either boys or girls who are not able to form couples and therefore won’t be able to reproduce. This would be a factor (if correct) that would lead the population to decrease.

@ Neil:

Also, from a setting a max point of view no generation can have more couples than the previous generation.

Consider the case of countries with 1 family. If the couple has:

B – no couples can be formed population dies.

GB – one couple can be formed

GGB – one couple can be formed (one boy joins with one of the girls)

GGGB – one couple can be formed

….

In general a country of k couples will have exactly k boys. And therefore the maximum number of possible couples in the next generation will be k.

The only way for such a country to not decrease is for the country to produce more girls in every generation. However Prof. Landsburg’s proof shows that less girls are produced than boys on average.

Therefore, any country following this rule will eventually have zero children.

Will A:

–

“However Prof. Landsburg’s proof shows that less girls are produced than boys on average.”

–

To be precise, this is not what Prof. Landsburg’s proof showed. Fewer girls than boys on average implies that the expected value of the difference B-G is positive. The thing he proved was that, for K families in the country, the expected value of the ratio G/(B+G) would be less than 1/2 by approximately 1/(4K), even though the expected value of B-G would be zero. The apparent ‘contradiction’ of these two facts is the main point of this question.

In Professor Landsburg’s own words: “Moral: Just because two variables have an expected difference of zero, you can’t conclude they have an expected ratio of one. That needs to be computed separately.”
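The gap between 1/2 and the expected fraction of girls can be checked numerically. The sketch below is not from the original post: it assumes k completed families, independent 50/50 births, and therefore a negative binomial distribution for the total number of girls G. The function name `expected_girl_fraction` and the tail cutoff are illustrative choices.

```python
import math


def expected_girl_fraction(k, max_girls=None):
    """Approximate E[G / (G + k)] for k completed families.

    Assumes each family stops at its first boy, so there are exactly
    k boys and G (the total number of girls) is negative binomial.
    """
    if max_girls is None:
        # Hypothetical cutoff: far enough into the tail that the
        # neglected probability mass is negligible.
        max_girls = 20 * k + 200
    prob = 0.5 ** k  # P(G = 0): every family has a boy first
    total = 0.0
    for g in range(max_girls + 1):
        total += prob * g / (g + k)
        # Recurrence P(G = g + 1) = P(G = g) * (g + k) / (g + 1) / 2
        prob *= (g + k) / (g + 1) * 0.5
    return total


for k in (1, 4, 1000):
    print(k, expected_girl_fraction(k), 0.5 - 1 / (4 * k))
```

Under these assumptions the one-family case comes out at 1 - ln 2, and the value moves toward (1/2) - 1/(4k) as k grows, which is consistent with the "approximately (but not exactly)" wording quoted earlier in the thread.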

@ Thomas:

Thanks for the correction. I think though that these populations eventually go to zero. You are obviously (and I mean this) much more adept at mathematical concepts than I am.

Based on what you are saying does this mean that the population of these countries will increase, decrease, or stay the same?

I could be wrong, but I think this is what Neil was asking and I made a poor attempt at answering.

Different people can find different things interesting. What I would find interesting is if the population decreases over time because this puzzle would fall into the set S where S is the set of puzzles that can have multiple possible morals. E.g.

- Just because two variables have an expected difference of zero, you can’t conclude they have an expected ratio of one. That needs to be computed separately.

- A country like China whose citizens want to have a son could implement this policy and still reduce its population.

This puzzle seems to be part of this set. And therefore the “correct” answer is based on the moral that the interviewer is trying to get across.

If the interviewer doesn’t give the moral of the puzzle when asking it, then a “correct” answer is for the interviewee to pick a moral and answer the question in the way that matches the moral.

E.g. if an 8 year old asks me “What is 1 and 1?” Either 2 or 11 should be correct and equally valid answers.

Perhaps a grand unified solution is in order something like:

Let K be the set of countries having k couples at a given point in time.

The fraction of girls is approximately (but not exactly) (1/2) – (1/4k), which, when k is large, is approximately (but not exactly) 1/2.

Let g be the number of generations in the future of the given point of time. As g increases, the fraction of girls starts to approach 1-log(2).

For a sufficiently large g, the fraction of girls is 0/0.

Thanks for really nice math puzzle, dressed up in girl/boy fractions.

China and India both really suffer from too many boys (selective abortion and/or infanticide of girl babies).

In reality, there is some limit to the number of children in any one family. Perhaps 20, 30, or 40? This means some families stop with all girls.

I don’t know what having a limit does to the math, but at the limit the simple expected number of boys adds to less than 1; let me try for #boys in 16 family country (with limit of kids at 4):

Share   Family   Boys   Girls
8/16    B        8      0
4/16    GB       4      4
2/16    GGB      2      4
1/16    GGGB     1      3
1/16    GGGG     0      4

Looks like 16 fathers, 15 sons=31, 16 mothers, 15 daughters=31.

So with a real-world constraint on family size, the 50-50 looks better. If there was a stronger max child constraint (like 3, or 2) it looks like at the limit there would be more daughter only families to balance.
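The capped-family tally above can be reproduced for any cap. This is a minimal sketch, assuming independent 50/50 births and a hard limit on children per family; the names `capped_family_outcomes` and `expected_counts` are made up for illustration.

```python
from fractions import Fraction


def capped_family_outcomes(max_children):
    """Yield (probability, girls, boys) for stop-at-first-boy with a cap."""
    for girls in range(max_children):
        # `girls` girls followed by one boy, then the family stops.
        yield Fraction(1, 2 ** (girls + 1)), girls, 1
    # All girls: the family reaches the cap without ever having a boy.
    yield Fraction(1, 2 ** max_children), max_children, 0


def expected_counts(max_children):
    """Return (expected girls, expected boys) per family as exact fractions."""
    outcomes = list(capped_family_outcomes(max_children))
    exp_girls = sum(p * g for p, g, b in outcomes)
    exp_boys = sum(p * b for p, g, b in outcomes)
    return exp_girls, exp_boys


girls, boys = expected_counts(4)
print(girls * 16, boys * 16)  # expected daughters and sons in a 16-family country
```

With a cap of 4 this gives 15 expected daughters and 15 expected sons across 16 families, matching the hand count. It only compares expected counts; it says nothing about the expected fraction, which is the separate question the post is about.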

(you also have to ignore non-identical twins)

Fun math practice.

Here is such a program, written in Python (I’m not a professional computer programmer, and I’m not trying to make a bet).

It simulates 1e6 families and obtains an answer of 0.5 (within errors). The number of families can be changed.

import random

numboys=0

numgirls=0

numfamilies=0

totalfamilies=1000000

while(numfamilies 0.5): #boys are 0.5

numgirls += 1

kid = random.random()

if(kid <= 0.5):

numboys += 1

numfamilies += 1

print numgirls*1.0/numboys*1.0

Sorry, that last code copied incorrectly for some reason (it left out a while loop). This is correct:

    import random

    numboys = 0
    numgirls = 0
    numfamilies = 0
    totalfamilies = 1000000

    while numfamilies < totalfamilies:
        kid = random.random()
        while kid > 0.5:  # boys are <= 0.5, so this child is a girl
            numgirls += 1
            kid = random.random()
        if kid <= 0.5:  # the family stops at its first boy
            numboys += 1
        numfamilies += 1

    # Ratio of girls to boys; a value near 1.0 means about half of all children are girls.
    print(numgirls * 1.0 / numboys)

Landsburg asks one question, then answers another.

If the question is, “What fraction of the *population* is female?”, the correct answer is 1/2. Really, truly, honestly. As Landsburg himself proves.

If the question is, “What is the mean percentage of girls in a family?”, then the answer is different.

Reconciling the two is the fact that (under this system) the families with more girls are uniformly larger, and thus contribute greater weight to the population.

You run into similar issues when looking at economic statistics aggregated by household vs. looking at the population as a whole. Households of different sizes have different characteristics, and a large household carries greater weight in the overall population than a smaller household.

Ted Fischer:

Landsburg asks one question, then answers another.

Take my bet then. We’ll let a random panel of statisticians decide what question was asked.

You need to clarify the questions. It’s not clear to me from the question whether you are looking for the expectation of ratio or ratio of expectations. That is the expectation of (females / population) or (expectation of females / expectation of population).

Let’s line up all the families in the world. Each family will flip a coin until they flip a head. Then they pass the coin to the next family who does the same.

Isn’t this the same as if it is just me flipping a coin until I get a head. Then I do it again, and again, and again.

In the end all I have done is flip a coin lots of times and the expected number of heads is 50%.

Steve, ask that statistician yourself (though strictly speaking this is probability not statistics). There is no need to frame it as a bet, and their opinion on the wording does not influence my opinion of the correct answer.

When you ask, “What fraction of the population is female?”, the calculation is simply the number of girls divided by the total population. Inserting a behavioral pattern ahead of that question does not change the calculation specified by the question, and the question makes no reference to family structure.

Note that we don’t disagree on the calculations. We agree that half of the births in the country specified will be female (and thus half of the children in the schools). We agree that half of the families will have a single boy, a quarter will have a girl and a boy, and so forth. There is no mathematical disagreement involved. We simply disagree on whether “What fraction of the population is female?” references family structure or not.

Ok, correcting myself, I was assuming an infinite number of times that I stop and start. But the problem as stated has a finite number of families in the country. So the answer would depend on the number of families and is not 50%.

Is this even an honest question? Or an attempt by a behavioral economist to determine whether or not people are willing to place silly bets with people they’ve never heard of?

Yes, I’m certain I’m correct. No, I’m not interested in betting under any circumstances. Call that a philosophical objection, if you like.

TF:

When you ask, “What fraction of the population is female?”, the calculation is simply the number of girls divided by the total population.

Absolutely.

Note that we don’t disagree on the calculations. We agree that half of the births in the country specified will be female (and thus half of the children in the schools).

We absolutely do not agree on that. If you believe that half the births in the country specified will be female, then I hope you will take my bet.

Okay, I’ve slogged through some of the additional comments…

If you take a fixed number of couples, and they all follow this procedure until their families are complete, and then you stop and analyze the outcome, your expected fraction will be very slightly less than 1/2.

Yet half of all births on an ongoing basis are female.

I guess I can see how you are interpreting the question, though I still disagree that it is a natural interpretation.

TF:

your expected fraction will be very slightly less than 1/2.

Yet half of all births on an ongoing basis are female.

Nope. It doesn’t matter whether or not you wait till the families are complete; the answer will be less than 1/2 either way. And it is not true that half the births on an ongoing basis are female.

Can we increase the number of families to one thousand?

Erick Fejta:

Can we increase the number of families to one thousand?

Sure, as long as I get to update my prediction for the expected ratio (to .49975, in fact), and as long as we increase the number of runs well past 3000.

If you’re prepared to put money on this I’ll be glad to get more specific.

Maybe another twist on it from a different angle?

* If an actual country were to operate this way, half of its births would be girls. But then, at any time there would be families that have not yet had a boy.

* The families with more girls than boys are larger than those that are evenly split (or those with just a boy). Similarly, if you were to repeat this experiment with k families each time, those trials in which the girls outnumbered the boys would have more children than those trials in which the boys outnumbered the girls. (There would always be exactly k boys, so this should be obvious.)

* Thus for a fixed number of families in a “one off” trial, the expected fraction of boys will be very slightly greater than the expected fraction of girls. And this is the question which Landsburg intended. But in the long run, aggregating the populations of the different samples, the proportion approaches a limit of 1/2. Thus the essential expectation is not violated.

I still don’t see why it is necessary to frame this as a bet. Mathematics is inherently provable. If there is a disagreement, then either the interpretations differ or one side is provably wrong. Neither is an appropriate basis for betting.

No thanks, I would rather be on your side of the bet.

Here’s python code that runs the simulation: http://commondatastorage.googleapis.com/elf/familygirls.py

Simulation 2990: 1 girls, 4 boys, 20.000000% girls

Sum of 2991 simulations: 12114 girls, 11964 boys = 50.311488% girls

Average of 2991 simulations: 44.314615% girls

Simulation 2991: 3 girls, 4 boys, 42.857143% girls

Sum of 2992 simulations: 12117 girls, 11968 boys = 50.309321% girls

Average of 2992 simulations: 44.314128% girls

Simulation 2997: 8 girls, 4 boys, 66.666667% girls

Sum of 2998 simulations: 12140 girls, 11992 boys = 50.306647% girls

Average of 2998 simulations: 44.317036% girls

Simulation 2999: 0 girls, 4 boys, 0.000000% girls

Sum of 3000 simulations: 12148 girls, 12000 boys = 50.306444% girls

Average of 3000 simulations: 44.309713% girls
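The linked script is not reproduced here, so the following is only a sketch of the kind of simulation that would produce output like the lines above: many 4-family countries, with both the pooled fraction ("Sum of ... simulations") and the mean of per-country fractions ("Average of ... simulations"). The function names and the seed are assumptions.

```python
import random


def simulate_country(num_families, rng):
    """Return (girls, boys) for one country of completed families."""
    girls = 0
    for _ in range(num_families):
        # Each child is a girl with probability 1/2; stop at the first boy.
        while rng.random() < 0.5:
            girls += 1
    return girls, num_families  # exactly one boy per family


def run_simulations(num_countries=3000, num_families=4, seed=0):
    """Return (pooled girl fraction, mean of per-country girl fractions)."""
    rng = random.Random(seed)
    total_girls = 0
    total_boys = 0
    fraction_sum = 0.0
    for _ in range(num_countries):
        girls, boys = simulate_country(num_families, rng)
        total_girls += girls
        total_boys += boys
        fraction_sum += girls / (girls + boys)
    pooled = total_girls / (total_girls + total_boys)
    averaged = fraction_sum / num_countries
    return pooled, averaged


pooled, averaged = run_simulations()
print("pooled: %.4f, averaged over countries: %.4f" % (pooled, averaged))
```

The pooled figure weights each country by its number of children, while the averaged figure gives every country equal weight, which is why the two lines in the output above can differ by several percentage points.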

Steve,

Your example about 3 families with 3 girls and one family with 12 boys does not follow the original problem. The family with 12 boys can not exist. The family could only have one boy and be done. Therefore, the example you created to illustrate your bogus math is false.

TF:

If an actual country were to operate this way, half of its births would be girls

You keep saying this. But it is not true.

Jim Robinson:

You missed the entire point of the example.

The “official” solution to the brain teaser confuses the expected value of a ratio with the ratio of the expected values.

The families-on-the-block problem is a completely separate problem in which you get a clearly wrong answer when you confuse the expected value of a ratio with the ratio of the expected values.

It’s not supposed to be the same problem. It’s supposed to be yet another illustration of how this invalid argument can lead to incorrect results.

@ TF:

For the most part if someone says that something is a riddle, they are looking to make some point or as Prof. Landsburg put it, a moral.

As a behavioral economist, I would think that the following would be the correct answer for your field, and possibly just as defensible, given the moral, as Prof. Landsburg’s answer.

Answer: Google interviewers shouldn’t ask ambiguous questions and hire someone based on there being 1 answer. This is unfair and unfair hiring practices are evil.

Moral: Just because a corporation has a goal to do no evil, doesn’t mean that it never does evil.

“It doesn’t matter whether or not you wait till the families are complete; the answer will be less than 1/2 either way. ”

Hm, depending on how you define it, you might be right…

If you simply count the first million children, then the expectation is precisely 1/2.

If, on the other hand, you cycle it in rounds? With each family having one child per round (if it is not yet complete)? The expectation is that k/2 families in the first generation will have girls, but (as before) those random trials in which there are more boys in the first generation correspond to lower populations than those in which there are more girls in the first generation. So your point holds.

It *is* an interesting problem.

Steve,

I also forgot to mention, when you set k to a finite number, of course it’s not going to equal .5 in your equation. However, when dealing with statistics and expectations, we are trying to determine an expected number for a population, which (because of the Central Limit Theorem) allows us to use a normalized population. Thus, we can use infinity for k, so as k approaches infinity, your equation approaches .5, which is the expected value. Now, there is also the confusion of what is the question asking for and what are you trying to solve? Is the question asking for E[G]/(E[G] + E[B]) or is it asking for E[G/(G+B)]? The former is the expected number of girls divided by the expected population, which most people would believe is what the question is asking for. It asks for the fraction, NOT the expected fraction. The latter is asking for the expected fraction, which is how you viewed the problem. Even so, your solution isn’t entirely correct. After calculating out to a family with 12 children (11 girls), the solution fails to change from 30.7% (after rounding). The percentage gets more accurate, but does not change from 30.7%.

It is clear that this economist can’t read his own question. In no way can “what fraction of the population is female?” be twisted around into what is the average fraction of females in a given family. I would be happy to put up a large amount of money against your argument; however, based on the mental gymnastics you are conducting to ignore your mistake, I doubt you would ever pay. You should be embarrassed.

“If an actual country were to operate this way, half of its births would be girls”

As another commenter wrote, getting pregnant is a lot like flipping a coin. My country only has one bedroom, so pregnancies must necessarily happen sequentially. But because this is a healthy, thriving country, there is always somebody waiting to use the bedroom.

Coin flip after coin flip, the expectation after any fixed number of births is always 50%. The only way to budge it off that number is to use a fixed and finite number of families. Your exception relies on a changing denominator.

It’s easy if you use a number of reproducing couples that is a power of 2 so that you can “split” the population.

N is the number of reproducing couples.

let N=2^1=2

Results

1:b

2:gb

Summary

# of boys = 2, number of girls = 1; 66.7% boys

Let N=2^2=4

Results

1:b

2:b

3:gb

4:ggb

Summary

# of boys = 4, number of girls = 3; 57.1% boys

let N=2^3=8

Results

1:b

2:b

3:b

4:b

5:gb

6:gb

7:ggb

8:gggb

Summary

# of boys = 8, number of girls = 7; 53.3% boys

…

Let N=2^m=n

Summary

# of boys = n, number of girls = n-1; n/(2n-1)x100% boys

Conclusion

As n->inf, proportion of boys -> 50%, but is always greater than 50%
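The "splitting" construction above can be written out mechanically. This sketch assumes the idealized halving the comment describes (exactly half of the remaining couples get a boy at each step, with the last couple left over getting one more girl and then a boy); it is a deterministic illustration, not a simulation, and `split_population` is a hypothetical name.

```python
def split_population(m):
    """Build the idealized family list for N = 2**m couples."""
    n = 2 ** m
    families = []
    for girls in range(m):
        # Half of the couples still trying get a boy at this step.
        families += ["g" * girls + "b"] * (n >> (girls + 1))
    # One couple is left over; it has m girls before its boy.
    families.append("g" * m + "b")
    return families


for m in (1, 2, 3):
    families = split_population(m)
    boys = sum(f.count("b") for f in families)
    girls = sum(f.count("g") for f in families)
    print(families, boys, girls, "%.1f%% boys" % (100 * boys / (boys + girls)))
```

For m = 1, 2, 3 this reproduces the listed results (2 boys and 1 girl, 4 and 3, 8 and 7), giving n boys and n - 1 girls in each case.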

TF:

Coin flip after coin flip, the expectation after any fixed number of births is always 50%.

You can say this another 3000 times and it still won’t be true.

General Apathy:

In no way can “what fraction of the population is female?” be twisted around into what is the average fraction of females in a given family.

Right. And in no way can my analysis of the average fraction in a country be twisted around into an analysis of the average fraction in a family.

But we can leave that to a panel of statisticians to decide. I’m happy to have us each put up $5000 in advance, to be held by a neutral party, so you don’t have to worry about reneging. Do we have a bet?

How can we have a family of 12 boys in this scenario? Regardless, the average of 3/3, 3/3, 3/3, and 0/12 isn’t 75%.

Scoot AO:

Regardless, the average of 3/3, 3/3, 3/3, and 0/12 isn’t 75%.

Wanna bet?

Steve,

do macroeconomists worry about these sort of issues when they try to aggregate ratios like W_i/P_i, wages over prices for a single firm, to get a notion of average real wage?

More generally could the puzzle be rephrased in an economics context?

You missed the entire point of the example.

The “official” solution to the brain teaser confuses the expected value of a ratio with the ratio of the expected values.

The families-on-the-block problem is a completely separate problem in which you get a clearly wrong answer when you confuse the expected value of a ratio with the ratio of the expected values.

It’s not supposed to be the same problem. It’s supposed to be yet another illustration of how this invalid argument can lead to incorrect results.

(I apologize for not knowing how to italicize or quote on this site.)

Actually, if you read my second post, you would hopefully see that the question is not asking for the expected value of ratios. It is asking for the ratio of expected values. You are the one making this confusion, and I can at least understand why you’re thinking that way. However, to make the assumption that it is asking for the expected value of ratios and then treating everyone else like they are wrong without proving without a doubt that it’s asking for the expected value of ratios in addition to treating people making the original assumption like they are stupid is both egotistical and a terrible way to try to sell a book.

“Coin flip after coin flip, the expectation after any fixed number of births is always 50%.

You can say this another 3000 times and it still won’t be true.”

Let me clarify, please. You are saying that if families operate under your system, then the expected number of girls out of the first thousand births is NOT precisely 500.

I would echo your words, except that would be an incredibly foolish statement. We seem to disagree on this point. I’ve made my case, do you care to offer yours?

Sure. You can’t average percents unless they have the same denominator.

And how about explaining how you can have a family of 12 boys instead of ignoring that part?

Anyway, you are making it much too hard.

Every year some families go to the hospital to have babies. Assuming for convenience that the probability of a boy is 50% (close enough for this kind of problem), half the babies added to the population will be girls, and half will be boys.

This will be the case every year. Thus no matter how long the population does this kind of thing, the children will all be arriving in 50/50 ratio. And thus the population, even if it started with some other ratio, will tend over time to be 50%.

I’ll take the bet. Have a team of statisticians (although the commenters here should be enough to satisfy any impartial observer that you are wrong) review your ‘solution.’ Make sure they read the question, though, because you don’t even answer the question asked.

Steve is pretty clearly forgetting to weight the families which have more children by the number of children in those larger families. This can be seen in his formula:

(1/2)*0 + (1/4)*(1/2) + (1/8)*(2/3) + …..

This is exactly correct if the question is “In a randomly chosen family, what fraction of the children do I expect to be girls?” Then 1/2 of the families have 0% girls (they had a boy first) – 1/4 of the families have 50% Girls (they had a girl then a boy), 1/8 of the families have 2/3 girls (they had GGB) etc.

However, If we want to calculate the average percentage of girls in the population – we want to instead calculate the number of girls in each family (assuming they have completed their reproduction and have as many boys as possible). Another way to think about this – is that if we are randomly drawing a person from the country, rather than a family from the country, we are much more likely to draw a girl from a family that has many girls (because it also has many more children). We obtain the number of girls in each family as:

0*(1/2) + (1)*(1/4) + (2)*(1/8) + (3)*(1/16) + (4)*(1/32)

That is to say, 1/2 of the families have 0 girls, 1/4 of the families have 1 girl, 1/8 of the families have 2 girls, 1/16 of the families have 3 girls, 1/32 of families have 4 girls. So the expected number of girls in each family unit is:

0 + 1/4 + 1/4 + 3/16 + 4/32 + 5/64 + 6/128 + 7/256 + 8/512 + ….. = 1

And the expectation for “1 boy in each family” is trivial.
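Both series in this comment are easy to sum numerically. The sketch below assumes independent 50/50 births and simply truncates each series after a fixed number of terms; the function names are illustrative.

```python
import math


def expected_girls_per_family(terms=200):
    """Partial sum of 0*(1/2) + 1*(1/4) + 2*(1/8) + ..."""
    return sum(g / 2 ** (g + 1) for g in range(terms))


def expected_girl_fraction_per_family(terms=200):
    """Partial sum of (1/2)*0 + (1/4)*(1/2) + (1/8)*(2/3) + ..."""
    return sum((g / (g + 1)) / 2 ** (g + 1) for g in range(terms))


print(expected_girls_per_family())          # expected girls in one family
print(expected_girl_fraction_per_family())  # expected fraction of girls in one family
print(1 - math.log(2))                      # closed form for the second series
```

The first sum converges to 1 (one expected girl per family, matching the one boy), while the second converges to 1 - ln 2. The two series answer different questions, which is the distinction being argued over throughout the thread.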

A final way to view this problem is to say – most families will have more boys than girls – but no family will have “more boys than girls + 1″, there will never be many more boys than girls. However, there will be a few families which have many more girls than boys – and this counteracts the many families which have only one boy. The ratio is maintained at 50/50

~Tim

This guy has a book? You mean someone publishes this idiocy? Oh my.

Maybe I should phrase it more formally? Let X be the number of girls out of the first k births. I am asserting that E(X) = k/2 no matter how families decide when to stop having children.

The expectation for any given birth is assumed to be independent and identically distributed with a probability of 50%. Thus the expected number of girls after one birth is 0.5, the expected number of girls after two births is 1.0, etc. Because each pregnancy is independent, the expected values add.

You can sequence your couples, if you like. Have one couple finish their family before the next begins. Now the births look an awful lot like an endless string of coin flips.

If you truncate the string after 1000 births, the expectation is precisely 50%. If you truncate the string after 500 families are complete (on average the same as 1000 births) then the expectation is slightly less than 50%.
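The two truncation rules can be compared with a small simulation. This is a sketch under the comment's own assumption that couples are sequenced, so the births form one long string of fair coin flips; the sizes (10 births versus 5 families), the trial count and the seed are arbitrary choices made so the difference is visible.

```python
import random


def girls_fraction_first_births(num_births, rng):
    """Fraction of girls among a fixed number of births."""
    girls = sum(rng.random() < 0.5 for _ in range(num_births))
    return girls / num_births


def girls_fraction_completed_families(num_families, rng):
    """Fraction of girls once a fixed number of families are complete."""
    girls = 0
    for _ in range(num_families):
        while rng.random() < 0.5:  # girl: the family keeps going
            girls += 1
    return girls / (girls + num_families)  # one boy per completed family


def compare(num_births=10, num_families=5, trials=20000, seed=0):
    """Return the average fraction of girls under each stopping rule."""
    rng = random.Random(seed)
    by_births = sum(
        girls_fraction_first_births(num_births, rng) for _ in range(trials)
    ) / trials
    by_families = sum(
        girls_fraction_completed_families(num_families, rng) for _ in range(trials)
    ) / trials
    return by_births, by_families


print(compare())
```

Stopping after a fixed number of births keeps the denominator constant, so the average stays near 1/2. Stopping after a fixed number of families lets the denominator vary with the number of girls, and the average fraction lands below 1/2.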

Regardless, the average of 3/3, 3/3, 3/3, and 0/12 isn’t 75%.

this is the crux of the issue.

He must be trolling us. This can’t be a serious attempt at a solution.

Regardless, the average of 3/3, 3/3, 3/3, and 0/12 isn’t 75%.

this is the crux of the issue.

It is if you throw out basic mathematics knowledge and just take the numbers 1, 1, 1, 0 and average them. I have not, nor do I intend to (in large part due to the author showing his lack of mathematical knowledge on this page and these comments) read the book listed at the top. However, I would guess that this problem is in the book with this solution. The book is already receiving poor reviews, and I would be surprised if the mathematics professors at the author’s university aren’t laughing at the author behind his back.

Prof Landsburg,

Even after twisting the question you still got YOUR solution wrong. In the “spoilers” above you do not include the parents.

How can a family consist of 1 boy? Isn’t the minimum 2 boys and 1 girl?

What is the bet again that your solution (above and posted) is wrong?

- which adds to 1-log(2), or about 30.6%. WRONG!!!!!!!

Will you at least admit this part is incorrect?

Scoot AO:

Regardless, the average of 3/3, 3/3, 3/3, and 0/12 isn’t 75%.

Wanna bet?

Let’s do the math.

(3+3+3+0)/(3+3+3+12) = 9/21 = 3/7 = .4285714.

Now to your made up example, (4+4+4+0)/(4+4+4+12) = 12/24 = .5

Let’s try it another way:

1 * (4/24) + 1 * (4/24) + 1 * (4/24) + 0 * (12/24) = .5

Don’t forget that if you’re going to average (or calculate the expected values of) ratios, you need to weight them correctly and not just give equal weighting. There is no way to perform a mathematically sound calculation to get the average of the sample {3/3, 3/3, 3/3, 0/12} to equal 75% (same goes for {4/4, 4/4, 4/4, 0/12} in case you were wondering). I would suggest next time that this Landsburg work with a real mathematician when he attempts to publish something. This page has been good for some laughs. Then again, as long as we all know that this book is obviously just a book of fiction, and not someone trying to soundly prove something, it doesn’t really matter.
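The two calculations being argued over here can be put side by side. The following is only a sketch in Python, using the four-family block from the post (three families with 4 girls each, one family with 12 boys); the function names are made up for illustration.

```python
from fractions import Fraction

# (girls, children) per family, taken from the block example in the post.
families = [(4, 4), (4, 4), (4, 4), (0, 12)]


def pooled_fraction(samples):
    """Total girls divided by total children (weights each child equally)."""
    girls = sum(g for g, _ in samples)
    children = sum(n for _, n in samples)
    return Fraction(girls, children)


def mean_of_fractions(samples):
    """Average of the per-family fractions (weights each family equally)."""
    return sum(Fraction(g, n) for g, n in samples) / len(samples)


print(pooled_fraction(families))    # 1/2
print(mean_of_fractions(families))  # 3/4
```

Both numbers are arithmetically sound; they answer different questions. The pooled figure is the share of girls among all children on the block, while the second is the share of girls in a family picked at random.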

Instead of betting the haterz on this blog $15K, here’s an idea. Take that $15K to Las Vegas and hit the roulette tables in Vegas. Place repeated bets on red (=G) and stop each betting session when black (=B) hits. According to your “math” that should provide a 69.4% win rate (slightly less once we account for 0 and 00, but still well above 50%). A sure-fire win to beat the house!

TF:

“Coin flip after coin flip, the expectation after any fixed number of births is always 50%. You can say this another 3000 times and it still won’t be true.”

Let me clarify, please. You are saying that if families operate under your system, then the expected number of girls out of the first thousand births is NOT precisely 500.

You are right; I misread you. I thought you’d made a claim about the expected ratio after a fixed number of *years*. But you didn’t; you said a fixed number of *births*. So you are correct and I apologize.

Scott AO:

I’ll take the bet.

You are on. The terms require a team of statisticians to read and interpret the original question.

Are you in for $5000?

Jim Robinson: If you want to place a bet on “What is the average of 3/3, 3/3, 3/3 and 0/12?”, with a panel of mathematicians serving as judges, you can name your stakes.

A team of statisticians have already responded in this thread.

Cool. I do like the problem, even if I think it could have been phrased more precisely. :)

Enjoy!

Jim Robinson: If you want to place a bet on “What is the average of 3/3, 3/3, 3/3 and 0/12?”, with a panel of mathematicians serving as judges, you can name your stakes.

Ok. If I win, you have to pull your book from the market and never publish another, AND make a public statement apologizing to all mathematicians and statisticians stating that you are inept and promise to never publish a book again.

If you win, I will buy and read a copy of your book. In addition, I will gift 5 copies of the book.

In addition, I am the one who gets to phrase the question, and I will phrase it exactly like this, “You have 4 samples. The first sample has 3 successes in 3 trials. The second sample has 3 successes in 3 trials. The third sample has 3 successes in 3 trials. The fourth sample has 0 successes in 12 trials. What is the average number of successes per trial for the aggregate?”

Since you said that I can name the stakes, and I have, I will count this as an accepted bet.

Wow, we’re still going strong on this. For those of you still disagreeing that the expected fraction of girls is less than 1/2, can you tell me which of the following things you think are true, and which you think are false?

For all statements, B is the number of boys born to a generation of mothers, and G is the number of girls born to the generation. The fraction of boys among all children is R = B/(B+G).

1. The number of boys B is equal to the number of mothers in the previous generation.

2. The probability that G is greater than or equal to B is 0.5.

2a. The probability that B is greater than or equal to G is greater than 0.5.

2b. The probability that R is less than or equal to 1/2 is 0.5.

2c. The probability that R is greater than or equal to 1/2 is greater than 0.5.

3. The most likely number of girls G is equal to B-1. That is, the probability that G is equal to B-1 is larger than the probability that G takes on any other particular value.

4. The expected value of R is approximately 1/2 + 1/(4B).
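Statement 4 can be checked numerically. The sketch below assumes each birth is an independent 50/50 event and that every mother stops at her first boy, so that G is the number of girls born before the B-th boy (a negative binomial count). The function name and the `tail` cutoff are illustrative choices, not anything from the thread.

```python
import math


def expected_boy_fraction(mothers, tail=50):
    """Exact E[B/(B+G)] when each of `mothers` families stops at its first boy.

    B equals `mothers`; G is the number of girls born before the B-th boy,
    so P(G = g) = C(g + B - 1, g) / 2**(g + B). The sum is cut off far out
    in the upper tail, where the remaining probability is negligible.
    """
    b = mothers
    prob = 0.5 ** b  # P(G = 0)
    total = 0.0
    for g in range(tail * b + 1000):
        total += prob * b / (b + g)
        prob *= (g + b) / (2.0 * (g + 1))  # P(G = g + 1) from P(G = g)
    return total


for b in (1, 10, 100):
    print(b, expected_boy_fraction(b), 0.5 + 1 / (4 * b))
```

For B = 1 the exact value is ln 2, which is the complement of the 1-log(2) figure quoted from the post. The approximation 1/2 + 1/(4B) is only meant for larger B, where the two columns should agree closely.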

Is there really a bet taking place about the average of these four numbers: (1,1,1,0)?

Jim Robinson:

Ok. If I win, you have to pull your book from the market and never publish another, AND make a public statement apologizing to all mathematicians and statisticians stating that you are inept and promise to never publish a book again.

Let’s bet $5000. If you win, I will pay you $5000, pull my book from the market, never publish another, AND make the public statement you refer to. If I win, you will pay me $5000.

I am willing to let you pose *either* the original question *or* the alternate question “What is the average of 3/3, 3/3, 3/3, and 0/12 ?”

But as for *this* question:

What is the average number of successes per trial for the aggregate?”

No, of course you don’t get to make up a completely new question to ask.

Thomas Bayes:

Is there really a bet taking place about the average of these four numbers: (1,1,1,0)?

Jim Robinson has declared his willingness to bet on the average of those four numbers. I have agreed that if I lose that bet, I will not only pay him $5000 but pull my book off the market, never write another, and make a humiliating public statement.

As far as I’m aware, this bet is on.

The average of 1,1,1,0.

Wow. Just wow.

There’s a group of actuaries right now on an actuarial discussion board laughing at you and this thread.

Don’t you publish an actuarial text book? Some kind of simple micro-economics? I hope someone clues in the SOA so that gravy train ends for you soon.

No, the bet is on the aggregate average of 3 successes in 3 attempts for 3 trials, plus a fourth trial of 0 successes in 12 trials.

Scoot AO:

The average of 1,1,1,0.

Wow. Just wow.

Yeah, that was my reaction too. But that’s what Jim Robinson says he’s willing to bet on. You want me to see if I can get you some of this action?

Hey Jim, of course YOU don’t get to make up a different question – only Steve gets to do that.

The best part is, the question you posed is exactly what he made up to answer the original question. It’s like a double whammy.

It’s like something a 9 year old would do. Average percentages by adding them up and dividing.

No, of course you don’t get to make up a completely new question to ask.

It’s not a new question. We had 3 samples with 3 successes in 3 trials and a fourth sample with 0 successes in 12 trials. In total we had 12 successes in 24 trials. That was the original example that Landsburg made up!

Here’s another fun one:

Steve Landsburg placed $15,000 behind one of three doors. The other two doors contain nothing. You must select one door to open. You keep whatever is inside.

However, you must first indicate to Steve which door you will open. When you do this, Steve promises to open one of the other two doors and show that this door contains nothing.

Two doors remain: the one you selected originally and the other unopened one. You must now make your final selection.

Which door should you open and why?

Jim Robinson has declared his willingness to bet on the average of those four numbers. I have agreed that if I lose that bet, I will not only pay him $5000 but pull my book off the market, never write another, and make a humiliating public statement.

As far as I’m aware, this bet is on.

Ummm, I did no such thing. I was betting on the aggregate average of 3 out of 3, 3 out of 3, 3 out of 3, and 0 out of 12. Basic mathematics says to weight each one accordingly. I never said 1, 1, 1 and 0.

I like the way steve selects part of a quote to post and only responds to that.

I wonder if he really doesn’t see his error with the averaging of the 3/3 etc or he is just trolling us.

I just ran a simplified simulation with the following assumptions:

A population of 100 women and 100 men at time = 0.

A ‘period’ is defined as a generation.

In each ‘period’ the women each birth until they have the first boy, then they stop.

The new “birthing” population for the subsequent period is equal to the previous number of failed boy attempts, ie girls. They do the same as their mothers and birth until a boy pops out.

.

I simulated 40 generations a total of 10 times. I also assume that no one ever dies so that the final ratio is taken from the sum of these 40 generations.

I see no statistically significant variation from the 50/50 expectation.

Is anything wrong with this model, Steve?

Jim Robinson:

Ummm, I did no such thing. I was betting on the aggregate average of 3 out of 3, 3 out of 3, 3 out of 3, and 0 out of 12. Basic mathematics says to weight each one accordingly. I never said 1, 1, 1 and 0.

Actually, what you wrote was:

There is no way to perform a mathematically sound calculation to get the average of the sample {3/3, 3/3, 3/3, 0/12} to equal 75%

So perhaps you’d like to bet on whether 3/3 = 1?

EF, after Steve shows you an empty door you should always switch. Your original choice has a 1/3 chance of being the prize, and switching only costs you if your original guess was RIGHT. The other 2/3 of the time, you switch from a wrong door to the right door. This is known as the “Let’s Make a Deal” paradox.
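The 1/3 versus 2/3 split can be confirmed by listing every case. This is a small sketch that assumes the prize and the first pick are each uniform over the three doors and independent of one another; `win_probability` is a name invented here.

```python
from fractions import Fraction
from itertools import product

DOORS = (0, 1, 2)


def win_probability(switch):
    """Exact chance of winning over all (prize, first pick) combinations."""
    wins = 0
    for prize, pick in product(DOORS, repeat=2):
        # The host always opens an empty door other than the pick, so
        # switching wins exactly when the first pick was wrong.
        won = (pick != prize) if switch else (pick == prize)
        wins += won
    return Fraction(wins, len(DOORS) ** 2)


print(win_probability(switch=True), win_probability(switch=False))
```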

Scoot, don’t rely on simulations. Steve himself will tell you that the theoretical percentage is statistically impossible to differentiate from 50% on any reasonable number of trials. The number of trials necessary to prove your point increases dramatically with the size of the population involved. If you want to model it statistically, try a population with *eight* couples and run at least 1000 iterations. Maybe more.

Scoot AO:

A population of 100 women and 100 men at time = 0.

A ‘period’ is defined as a generation.

In each ‘period’ the women each birth until they have the first boy, then they stop.

The new “birthing” population for the subsequent period is equal to the previous number of failed boy attempts, ie girls. They do the same as their mothers and birth until a boy pops out.

.

I simulated 40 generations a total of 10 times. I also assume that no one ever dies so that the final ratio is taken from the sum of these 40 generations.

If I understand this, it sounds like you did it right. The average over many runs will be slightly less than 1/2, but 10 runs isn’t nearly enough to be sure you’ll see this.

So perhaps you’d like to bet on whether 3/3 = 1?

Yes, 3/3 is 1, but that is not what your sample was. Your sample was 3 success in 3 trials. So, in your world, if in game 1 of the season Derek Jeter goes 5 for 5, and in game two of the season he goes 0 for 1, his batting average is .50000?

Scoot, don’t rely on simulations. Steve himself will tell you that the theoretical percentage is statistically impossible to differentiate from 50% on any reasonable number of trials.

To TF: That is exactly what Steve is trying to negate with this example. He’s saying that what you just said is wrong.

I wasn’t RELYING on simulations, it was just additional backup. The answer to this problem is clear. Unless you average percentages incorrectly and use that result to try to answer a question about the average percent girls per family or whatever the hell Steve is trying to do that doesn’t answer the question.

The Monty Hall question is nothing new and not that interesting. Though I wonder if Steve can grasp the solution. Or perhaps he has a better solution that has escaped us for all these years. Probably something to do with picking the correct door 1/3 of the time and the incorrect door 2/3 of the time so the average number of times a door is picked is 75% so therefore if you choose not to pick a door you will win the money 25% of the time which is better than…Steve, can you help us out here?

Steve, are you a professor? If a student gets 10/10 on nine quizzes and 0/100 on the final is his average grade for the semester 90%?

Or perhaps he has a better solution that has escaped us for all these years.

To Scoot: Of course he does. See, you’re left with two doors. You either win or you don’t. He saw two people on Let’s Make A Deal, the first one lost, the second one won. Boom! 50/50 chance.

Jim Robinson: In your Derek Jeter example, there are two possible questions you can ask:

1) What is his batting average for the two seasons?

2) If you observed his batting average at the end of each season, what, on average, would you observe?

I think you understand that these are different. The answer to the first question is (of course) not .500. The answer to the second question is (of course) .500.

But this was not a problem about *two* countries; it was a problem about *one* country. So the right analogy is not that Jeter goes 5 for 5 in one season and 0 for 1 in the other. The right analogy is that Jeter plays for *one* season. In this season, there is a 50/50 chance he will go 5 for 5 and a 50/50 chance he will go 0 for 1. What is his expected batting average?

The right answer to that question (the question that is analogous to the original puzzle) is .500.

I think (but am not sure) that you’ve gotten badly confused by the “four families on the block” example, which is *quite separate* from the original puzzle. It was meant to illustrate, in a simpler context, that an expected ratio does not necessarily equal the ratio of the expectations, and hence that any argument based on assuming that equality must be an incorrect argument.

So I suggest concentrating on the original puzzle: *one* country (like one baseball season) with an *uncertain* outcome (like Jeter’s possible good season/possible bad season, except here there are many many possible outcomes). We take an expectation across those uncertain outcomes (like averaging Jeter’s possible batting averages).

You can of course do other calculations that yield other numbers that might or might not be very interesting for one purpose or another. But *this* is the number that the problem asks for.

Scoot AO:

Steve, are you a professor? If a student gets 10/10 on nine quizzes and 0/100 on the final is his average grade for the semester 90%?

If a student takes one exam, where there’s a 75% chance he’ll score 10/10 and a 25% chance he’ll score 0/100, then his expected grade for the semester is 75%. (In a class of 100 such students, I’d expect 75% to be the average score.) If he takes both quizzes, I will of course weight them differently.

In the original problem, there are not two countries. There is *one* country with a probability distribution of outcomes, analogous to the student taking one test with a probability distribution of outcomes. The problem asks for the *expected* outcome, which is (by definition!) an average of the possible outcomes, weighted by the probability they’ll occur. It is, in other words, analogous to the student taking *one* exam whose nature is uncertain, not to the student taking *both* exams.

I hope this helps to clarify.

Hey Steve, can you prove that infinity+1 > infinity next? Thanks!

You were the one who brought up multiple families.

Babies will be born 50/50 male/female regardless of how many boys or girls the mothers have previously birthed. There’s one country but multiple families.

What’s wrong with my simulation?

Scoot AO: As far as I can tell, there’s nothing wrong with your simulation.

Scoot, try gridding it out as a probability tree?

I put together the simplest possible tree — four couples and up to two children each. Would share my spreadsheet, if I could, but you can easily prepare it yourself.

With four couples, after each has a child there is a 4/16 chance that there will be one boy and three girls. After the three who had girls try again, there will be a 4/128 chance of four boys and three girls, a 12/128 chance of three boys and four girls, a 12/128 chance of two boys and five girls, and a 4/128 chance of one boy and six girls. (Of course that is just one segment of the larger table.) Now compute the proportion of boys/girls for each outcome. Now compute the expectation by summing the product of the probabilities and the proportions.

E(G/(B+G)) = 0.470

It certainly generalizes to larger populations, though the difference from 1/2 becomes vanishingly small. Note that after just ONE year, we have E(G/(B+G)) = 0.500. Of course.
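The tree described above is small enough to enumerate exactly. The following is a sketch under the same assumptions (four couples, at most two children each, a couple stops after a boy); the names `FAMILY_OUTCOMES` and `expected_girl_fraction` are introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

# Each couple has at most two children and stops early after a boy.
# (girls, boys, probability) for the three possible family histories.
FAMILY_OUTCOMES = [
    (0, 1, Fraction(1, 2)),  # boy first, then stop
    (1, 1, Fraction(1, 4)),  # girl, then boy
    (2, 0, Fraction(1, 4)),  # girl, then girl
]


def expected_girl_fraction(couples):
    """Exact E[G/(B+G)] over every combination of family histories."""
    expectation = Fraction(0)
    for combo in product(FAMILY_OUTCOMES, repeat=couples):
        girls = sum(g for g, _, _ in combo)
        boys = sum(b for _, b, _ in combo)
        prob = Fraction(1)
        for _, _, p in combo:
            prob *= p
        expectation += prob * Fraction(girls, girls + boys)
    return expectation


print(float(expected_girl_fraction(4)))  # about 0.470
```

Raising `couples` shows the expectation creeping toward 1/2, though the number of combinations grows as 3 to the power of `couples`, so this brute-force approach only suits small populations.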

It is a little like a Markov-chain chart with a sink. In this case the “sink” is the relative tendency to stop having children when the proportion of boys is high.

@ Steve:

Like others in this post, I am livid with your obvious lack of grasp with these topics. The puzzle you post states explicitly:

”

Assuming a 50% birthrate of boys and girl, on average what percentage of a all countries do you expect to be female.

”

The answer is obviously 50%.

I’m so upset, I’m willing to bet you $5 that if you asked this question to a panel of Women’s Study teachers from California’s community colleges that their analysis will be different than yours.

Scoot, did you complete your simulation runs? Here is a python program that runs what you describe. I’ve seen the resulting population (that begins with 100 women, 100 men) end up with more than 6 million people and is always less than 50% female.

    #!/usr/bin/env python3
    import random

    random.seed()


    def Breed():
        global boys, girls_without_a_son, girls_with_a_son
        # Repeat while there is at least one woman who has no son.
        while girls_without_a_son:
            if random.randint(0, 1):
                # It's a girl!
                # The mom still needs a son, and so does this new woman.
                girls_without_a_son += 1
            else:
                # It's a boy!
                # The mom will no longer produce children.
                boys += 1
                girls_without_a_son -= 1
                girls_with_a_son += 1
        girls = girls_without_a_son + girls_with_a_son
        total = boys + girls
        print('%f%% (%d of %d) of the population are women.' % (
            girls * 100.0 / total, girls, total))


    def main():
        global boys, girls_without_a_son, girls_with_a_son
        line = 'yes'
        while line:
            line = input('Number of families? ')
            if line:
                # Start out with equal men and women.
                boys = girls_without_a_son = int(line)
                # No women have a son.
                girls_with_a_son = 0
                Breed()


    if __name__ == '__main__':
        main()

Some example runs (with starting sizes of either 200 or 20):

Number of families? 100

49.844836% (16062 of 32224) of the population are women.

Number of families? 10

39.130435% (18 of 46) of the population are women.

Number of families? 10

49.056604% (260 of 530) of the population are women.

Number of families? 10

49.185668% (302 of 614) of the population are women.

Number of families? 10

48.511905% (163 of 336) of the population are women.

Number of families? 100

49.986529% (185535 of 371170) of the population are women.

Number of families? 100

49.936619% (39394 of 78888) of the population are women.

Number of families? 100

49.982861% (145820 of 291740) of the population are women.

The key observation is that E(X/Y) doesn’t necessarily equal EX/EY, where EX is the expectation of X. I think this is the mistake most people who disagree with Steven are making.

Just because EG/E(G+B) = 1/2, this DOES NOT mean that E(G/(G+B)) = 1/2. I was surprised at the solution, until I remembered that the integral of ratios, does not necessarily equal the ratio of the integrals.

This is a nice problem showing the subtlety of math and probability. Nice example, Steven.

Amazing! A controversy over what is G/(G+B) when it is given that G=B. Way to go! I insist that anything other than 1/2 is pure unadulterated obfuscation.

EF,

See my above post. You should be computing the following:

Let R[i] = the ratio of girls to boys and girls for family i.

Set PG to zero

Compute PG = R[1] + R[2] + … + R[k]

Compute PG = PG/k

PG is now the expectation of the ratio. You are computing the ratio of the expectations. These two computations aren’t necessarily equal as this example shows.

Regards,

Ken

(I apologize for posting this on two threads. I believe this is important enough to make sure both discussions have a chance to see it. I hope you agree.)

Okay people, let’s construct a betting game that is based on the underlying point of this problem. Here’s how it works . . .

You toss a fair coin using one of two strategies:

A. You toss until you see heads, then stop.

B. You toss until you see tails, then stop.

Under either strategy, you record the number of heads and the number of tails. Using the same strategy for each trial, you do this 10,000 times (or, better yet, a larger number of times). Of course I’ll let you use a trusted random number generator to do this. After the 10,000 trials, you tell me the ratio of heads to tails. I’ll then guess your strategy. If I’m correct, you owe me $1. If I’m wrong, I owe you $1.

After playing this game once, you can pick a different strategy and we’ll play again. We’ll do this several million times with you selecting a possibly different strategy each time, and then we’ll square our debts when we are finished.

We can play this game millions of times because you could describe a method for changing your strategy with each new game, and I could describe a method for guessing. A trusted random number generator could ‘toss’ the coins, and a trusted third party could run the program.

What is the point? The point is I will win money in this game. I will win because the expected ratio is not 1/2, and because it is more likely to see more heads than tails with strategy A, and more tails than heads with strategy B. Sure, the expected number of heads is equal to the expected number of tails for both strategies. But if you think that is all that matters, then you will lose money on this game. I will win because there is a statistical difference in the data that are produced by the two strategies.

To bring this back to the original puzzle. There will be a statistical difference in the populations for a country that uses the ‘everyone has a boy’ policy vs one that uses the ‘everyone has a girl’ policy vs one that uses the ‘everyone has two children’ policy. On the surface the populations will look the same: the expected number of boys will equal the expected number of girls. But if you don’t understand or believe they are different, then I would take all of your money in the game I described here. And for those of you who think “it doesn’t matter because the population size will be large”, then let’s have each play of the game use 1 million or even 10 million tosses. I would take your money even faster.

How would you feel about making this a charitable contribution in the other’s name? I propose we submit the original question and a one page description of the proposed computation to a random panel that I will trust you to determine. They can choose which description of the problem is more ‘appropriate’ and the comp sci grad students probably won’t be necessary.

Maybe this isn’t in the spirit of the thread… My argument isn’t conceptual, just that statisticians would likely agree that there are more appropriate models (which result in 50/50) than the one you’re using. Those seem to be the terms of the bet.

Jim Davis: I am happy to make this a charitable contribution. I think the right way to determine a random panel is to compile a list of the top ten statistics departments in the U.S. (as ranked by either US News and World Report or the National Research Council), to compile a joint list of all their faculty, and to draw names randomly from that list until we’ve got (say) five who are willing to participate. With your permission, I’ll compose an email quoting the original problem (plus the proviso that the problem is to be interpreted in expectation), but with no further attempt to bias the results — and then I’ll present the email for you to edit before sending it out. Does this sound basically right to you?

Edited to add: We might also want to offer the stats profs $100 each, to be paid out of the loser’s share of the bet. Let me know where you stand on this.

Steve Landsburg: Thanks for offering to do the work on this! Your proposal sounds fine. I take it that after five have agreed we will send our proposed solutions and some brief (<= 1 page) argument for each, subject to the other's review.

A small tribute to the panel sounds fine, although I'm not sure it is necessary… This would constitute 50% of the bet, so perhaps you could offer an option to decline payment (for charity). Either way, I'm happy to place funds in your (or any other) account in anticipation of the judgment.

“Using the same strategy for each trial, you do this 10,000 times (or, better yet, a larger number of times). ”

Thomas Bayes, why do you say “better yet”? I would think that your power of discernment is REDUCED by a greater number of trials, as the differential you are trying to spot converges faster than the standard deviation. Am I missing something?

TF:

I think you are correct. I believe that my probability of winning a particular game will be roughly 0.5 + .14/sqrt(K), so I’ll win more often with a smaller number of flips. My mistake. I’m not sure what I had in mind when I wrote that, but I must have been thinking about something because I wrote it twice. Thanks for the correction.

This probability also depends on the frequency with which the other person selects each strategy, and could go up if I start to detect a pattern or trend. If they can sway their pattern or trend without me detecting it, though, this probability could go down.

I wrote a C program to simulate the interesting coin flip game proposed by Thomas Bayes.

http://pastebin.com/iTU8YUtT

The results are as expected. Note that for 4 runs, the average win percentage for the strategy guesser comes out to 50.16%, as compared to TB’s estimate of 0.5 + 0.14 / sqrt(1e4) = 50.14%.

Here are the results of 4 runs:

1e+06 number of games to play

1e+04 number of trials per game

4.999717119760e-01 average fraction T/(T+H)

1409934609 total number of tails flipped

1410065408 total number of heads flipped

500165 number won by strategy guesser

2797 number undecided by strategy guesser

497038 number lost by strategy guesser

4.999749570052e-01 average fraction T/(T+H)

1410065051 total number of tails flipped

1410065408 total number of heads flipped

500104 number won by strategy guesser

2843 number undecided by strategy guesser

497053 number lost by strategy guesser

4.999807637601e-01 average fraction T/(T+H)

1410297206 total number of tails flipped

1410065408 total number of heads flipped

499488 number won by strategy guesser

2848 number undecided by strategy guesser

497664 number lost by strategy guesser

4.999670934217e-01 average fraction T/(T+H)

1409750663 total number of tails flipped

1410065408 total number of heads flipped

501027 number won by strategy guesser

2789 number undecided by strategy guesser

496184 number lost by strategy guesser

Oops, should have double-checked my output. I had an overflow!

Here is the corrected code:

http://pastebin.com/5kKdT5V1

And output for four runs. Note that the total number of heads is now 1e10, as expected. The average win percentage for the strategy guesser is now 50.098%.

1e+06 number of games to play

1e+04 number of trials per game

4.999856753489e-01 average fraction T/(T+H)

10000426071 total number of tails flipped

10000000000 total number of heads flipped

499163 number won by strategy guesser

2776 number undecided by strategy guesser

498061 number lost by strategy guesser

4.999716573786e-01 average fraction T/(T+H)

9999864884 total number of tails flipped

10000000000 total number of heads flipped

500146 number won by strategy guesser

2848 number undecided by strategy guesser

497006 number lost by strategy guesser

4.999754369357e-01 average fraction T/(T+H)

10000017864 total number of tails flipped

10000000000 total number of heads flipped

499682 number won by strategy guesser

2657 number undecided by strategy guesser

497661 number lost by strategy guesser

4.999821166584e-01 average fraction T/(T+H)

10000284208 total number of tails flipped

10000000000 total number of heads flipped

499362 number won by strategy guesser

2847 number undecided by strategy guesser

497791 number lost by strategy guesser

This has probably already been said, but regardless, the explanation of why the obvious answer is wrong is, simply, wrong. That is the explanation changes the question. The original question wasn’t “What is the ratio of girls to boys in the average household,” thus the “average of an average” critique is inapt. The question is what fraction of the population is female, and in the block described that’s still 50% (leaving aside the parents, as the example does).

I am not, however, expressing an opinion on the correct answer.

“But there are *no* assumptions under which 50/50 is the correct answer.”

What if I assume k initial families producing children, but whenever one family stops (after getting their boy) another family steps in and starts producing children.

That, of course, would do it, as would the intercession of a benevolent God who always keeps the population balanced. Obviously, then, my “*no*” is an overstatement. Your assumption strikes me as less unreasonable than the benevolent-God assumption, though not by much.

I realize it’s a contrived assumption, but so is the assumption of k families stopping after one generation. My assumption seems more in keeping with the spirit of the original riddle, which heavily implies a large population in a constant state of child production.

I’m honestly having trouble understanding why the original assumption is a better model.

OK, I admit I was skeptical, so rather than do the math, I did a simulation. Plus I upped the ante a bit. My rule says the parents keep trying until they have 25 boys (a little extreme, admittedly), then they stop. I simulated 10000 families. Here are typical results:

1. number of girls: 249779 number of boys: 250000 p = .755

2. number of girls: 251448 number of boys: 250000 p = .041

3. number of girls: 249636 number of boys: 250000 p = .607

The p-value is from a z-statistic comparing the observed ratio to the null hypothesis, 0.5.
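The p-values listed above can be reproduced from the counts. The comment does not spell out the exact formula, so this sketch assumes the usual two-sided test based on the normal approximation to a binomial proportion; `two_sided_p` is a hypothetical helper name.

```python
import math


def two_sided_p(girls, boys):
    """Two-sided p-value for H0: P(girl) = 0.5, using the normal approximation."""
    n = girls + boys
    z = (girls - n / 2) / math.sqrt(n * 0.25)
    # 2 * (1 - Phi(|z|)) written with the complementary error function.
    return math.erfc(abs(z) / math.sqrt(2))


for girls in (249779, 251448, 249636):
    print(girls, round(two_sided_p(girls, 250000), 3))
```

With 250000 boys in each run, the three girl counts give values that match the .755, .041 and .607 reported above to within rounding, which suggests this is the test that was used.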

This I believe agrees with what you would predict.

John C.

Thomas Bayes suggests:

You toss a fair coin using one of two strategies:

A. You toss until you see heads, then stop.

B. You toss until you see tails, then stop.

Under either strategy, you record the number of heads and the number of tails. Using the same strategy for each trial, you do this 10,000 times … After the 10,000 trials, you tell me the ratio of heads to tails. I’ll then guess your strategy. If I’m correct, you owe me $1. If I’m wrong, I owe you $1.

My thought is twofold:

1. If I offer the ratio truncated or rounded to a few digits, then he will score at chance, because every trial will be 0.50.

2. If I give him the mathematically precise ratio, then he can guess the strategy with close to 100% accuracy, because either the numerator or the denominator will be exactly 10000, and it is unlikely that both will be.

Robert,

I believe we could avoid both of your issues by simply having you tell me if the number of heads is greater than, less than, or equal to the number of tails.

Thomas Bayes suggests:

I believe we could avoid both of your issues by simply having you tell me if the number of heads is greater than, less than, or equal to the number of tails.and I agree.

I just chose either stop-on-head or stop-on-tail and ran a thousand 10,000-trial cases. Then I did that again, and again, ten times in all. Here are my numbers … for each row I made a choice about H vs T and ran a thousand 10,000-trial cases, recording “heads greater” and “tails greater” each time. (“Equal” accounts for the missing cases if the row doesn’t sum to 1000.)

485 514

502 494

487 509

496 500

522 475

469 530

517 481

495 503

493 505

491 509

Care to make ten guesses?

Bob Ayers

Bob,

If you stopped on the 10,000th head, then the probability that you would report the number of heads as greater than or equal to the number of tails is about 0.5028. If you stopped on the 10,000th tail, then the probability that you would report the number of heads as greater than or equal to the number of tails is exactly 0.5.

One strategy I could use would be to guess you used strategy 1 anytime you reported the number of heads being greater than or equal to the number of tails. Assuming that you were equally likely to select either strategy, then the probability that I’d be correct is about 0.5014. So, if I can pick the number of times we play, then I’ll eventually be able to win as much money as I want.
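The 0.5028 and 0.5014 figures can be checked exactly rather than by simulation. The sketch below rests on one observation: stopping on the n-th head with at most n tails is the same event as seeing at least n heads in the first 2n tosses, and for a fair coin that probability is 1/2 plus half the chance of an exact n-n tie. Stopping on the n-th tail with at least n heads is the same as seeing at least n heads in the first 2n-1 tosses, which is exactly 1/2 by symmetry. The function names are hypothetical.

```python
import math


def p_heads_ge_tails_stop_on_heads(n):
    """P(heads >= tails) when a fair coin is tossed until the n-th head."""
    tie = math.comb(2 * n, n) / 2 ** (2 * n)  # P(exactly n heads in 2n tosses)
    return 0.5 + 0.5 * tie


def guess_accuracy(n):
    """Chance of guessing right with the rule 'heads >= tails means strategy 1'.

    Assumes each strategy is chosen with probability 1/2.
    """
    p_stop_on_heads = p_heads_ge_tails_stop_on_heads(n)
    p_stop_on_tails = 0.5  # P(heads >= tails) when stopping on the n-th tail
    return 0.5 * p_stop_on_heads + 0.5 * (1 - p_stop_on_tails)


if __name__ == "__main__":
    print(round(p_heads_ge_tails_stop_on_heads(10_000), 4))  # about 0.5028
    print(round(guess_accuracy(10_000), 4))                  # about 0.5014
```

The edge shrinks roughly like 1/sqrt(n), which is why it takes many rounds of the game before the advantage shows up.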

If you figured out that I was guessing strategy 1 every time the number of heads was greater than or equal to the number of tails, then you should always use strategy 2 because I’ll only guess strategy 2 when the number of tails is greater than the number of heads, and that will happen half the time. In this case, we’ll break even. However, I’d probably change my strategy when I detected this was happening, and you would change yours when you detected I had changed mine. I’m not sure what the expected win rate would be if we both picked our strategies in some optimal way. That would be an interesting thing to consider.

The point of this game was to demonstrate that there is a statistical difference between data that are created by the two strategies. It is slight, but it is there. I hope your simulation work and this discussion will help people see the reason that the expected ratio of heads to coin tosses doesn’t have to be 1/2 if you always stop the sequence of tosses on a head.

Happy New Year! I can’t wait to see how popular I’ll be at the party tonight when I ask people this question, AND then ask them a question about all of the families with two children and one boy born on a Tuesday. I’ll either be a big hit, or I’ll get hit.

If you stopped on the 10,000th head, then the probability that you would report the number of heads as greater than or equal to the number of tails is about 0.5028. If you stopped on the 10,000th tail, then the probability that you would report the number of heads as greater than or equal to the number of tails is exactly 0.5.

I assumed that your strategy would be like that.

[Is 0.5028 the correct value? I would guess closer to 0.5000 based on the “1/2 extra boy” theorem saying the expectation (B/B+G) is around 0.5001, but no matter.]

I think we well-understand each other — and we may be the only people reading this thread, as others have moved on to a later thread :-)

Yes, you will profit in the long run. But it may be a long run; you aren’t gonna get rich guessing on the ten trials I posted, which, by the way, were all of the form “stop on first head”.

Happy New Year!

As I said on the Book blog, it has nothing to do with the number of families. The question, regardless of the number of countries (or the number of families) depends on if there is a limit to the number of girls a family can have. If there is (which there will be in reality), it is exactly 50/50. If there isn’t, then there is no answer, as in an infinite sample size, there will be at least one family with an infinite number of girls, and in order to figure the percentage of girls or boys, you would have to divide by infinity.

If you assume that every family has a boy, then the answer is that boys have a slightly higher percentage than girls (because we have a population that was selectively sampled – no families with all girls).

If the question is, “What is the percentage of girls in the average family?”, then the answer is around 30.7, if you assume that that means, “the simple average (not weighted by the number of children in each family) of all the families.” And again, the exact answer depends on if there is a limit to the number of girls a family has before they give up, or if you assume that all of the countries already have a boy (you know that after the fact).

Again, if you just make a rule for a hypothetical country or countries, there is no answer if you do not put a limit on the number of girls in a family since one of the possible permutations of G/B in a family contains an infinite number of girls (and no boys), and you cannot really multiply or divide by infinity and get a reasonable (to this question) result.
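The figure of roughly 30.7% for the unweighted average family can be reproduced directly. A family with k children (k-1 girls, then a boy) occurs with probability (1/2)^k and has a girl fraction of (k-1)/k, so the expectation is the sum of (1/2)^k (k-1)/k over k, which works out to 1 - ln 2. The Python sketch below assumes no cap on family size and truncates the sum at an arbitrary `max_children`, a parameter introduced here only to make the sum finite.

```python
import math


def expected_girl_fraction_one_family(max_children=200):
    """Unweighted expected fraction of girls in one family that stops at its first boy.

    Truncating at max_children drops a tail smaller than (1/2)**max_children.
    """
    return sum(0.5 ** k * (k - 1) / k for k in range(1, max_children + 1))


if __name__ == "__main__":
    print(expected_girl_fraction_one_family())  # numerical sum
    print(1 - math.log(2))                      # closed form, about 0.3069
```

The all-girl family of unbounded size contributes nothing to this sum because its probability shrinks to zero while its girl fraction stays at or below 1.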

MGL:

The question, regardless of the number of countries (or the number of families) depends on if there is a limit to the number of girls a family can have. If there is (which there will be in reality), it is exactly 50/50.

This is wrong and has been proved wrong a dozen different ways in the blog posts and the comments. It really would be better if you read before you posted.

if you assume that that means, “the simple average (not weighted by the number of children in each family) of all the families.”

There is no assumption involved. That *is* what it means.

there is no answer if you do not put a limit on the number of girls in a family since one of the possible permutations of G/B in a family contains an infinite number of girls (and no boys), and you cannot really multiply or divide by infinity and get a reasonable (to this question) result.

This happens with probability zero, so it is not an issue. In any event, the problem asked for the expectation of G/G+B, not of G/B.

WOW! Commentary from Nassim Taleb, in regards to how higher education can make us less robust, is the first thing that comes to mind when reading through these comments.

I happened to stumble across this blog in a somewhat random manner… On the way home to Rochester, NY for the holidays, a friend (who happens to be a former student of Prof. Landsburg) was reading “more sex is safer sex.” We took a train from Boston to Rochester, and in that time period, he (my friend) managed to finish the book. He lent it to me, and I have now also read it (5 stars/two thumbs up!). I figured I would see what else I could find via Google about Landsburg, as I, unfortunately, did not attend college at the U of R, and stumbled upon this blog.

I find it incredible that, given the original question, anybody came to the conclusion of a 50/50 ratio, as this completely neglects the original question. If each family stops having children after their first child (assuming 50% of the offspring will possess a Y chromosome), and the remaining families continue to have children only until a male is conceived, then it would be absolutely impossible to arrive at a 50/50 ratio, at least if one harbored any common sense!

For all intents and purposes the following example should be at least ‘directionally accurate’…

If the following families gave birth in accordance to the rules of the question, and there were a 50/50 chance of conceiving a male, then:

Family 1

1st child – Boy

Family 2

1st child – Boy

Family 3

1st child – Girl

2nd child – Boy

Family 4

1st child – Girl

2nd child – Girl

3rd child – Boy

Total:

Girls=3

Boys=4

The aforementioned doesn’t even weigh each sample based on probability either. Before anyone mentions, I am 100% aware that this is not even close to perfect data. It’s almost 3 AM, and as previously mentioned, directional accuracy was all I was going for.

I will likely play around with this tomorrow though, as 30% is, to me, just short of what I would have expected.

Either way, I am glad to have found a new venue for thinking of this type.

Great stuff!

To the extent that some couples are determined to have a son, this, rather than perniciously evil hatred of girls, could explain the imbalance between males and females in certain societies.

Especially given the exemptions given to rural families in, say, China, and the need for sons to help work in the fields.

It would seem that if, say, 50% of families adopt this approach we might see approximately 55% males to 45% females in certain age ranges.

I was going to say that you were wrong because you forgot to include families with BGB or BG or GGBG, etc., but then I realized that that wouldn’t be a possibility given the scenario.

I have found that the ratio of girls to the total population after child bearing is complete is

pg/ptot = (g0 + pcouples) / (p0 + 2*pcouples), where

pg => the total number of girls after child bearing process has been completed

ptot => total population “ “

g0 => initial number of girls in the population

pcouples => number of eligible child bearing couples in the population

p0 => initial population

Proof:

Consider an initial population p0. There will be boys and girls, g0 and b0 such that

p0 = g0 + b0

(for the sake of argument we’ll assume there are no transsexuals).

In the initial population, there will be a certain number of couples able to bear children which we’ll refer to as pcouples such that

p0 = pcouples + pother (0 < pcouples < p0 and 0 < pother < pcouples)

We have a rule that says each couple should produce children until they have a boy. We would like to know the ratio of girls in the population to the entire population at the end of all child bearing, Rg such that

Rg = pg / ptot

where pg is the number of girls after all child bearing and ptot is the total population at the end of all child bearing.

I am going to think of the child bearing process in the following way: round up all of the eligible child bearing couples (the number of which is pcouples).

Round one: Have all of the couples produce a child. We expect half of the couples to produce a boy and the other half to produce a girl. The couples that produced a boy are out of the running.

Round two: Take all of the couples that produced a girl in round one and have them produce another child. We expect half of these couples to produce a boy and the other half to produce a girl.

Continue this process ad infinitum (I am of course using the approximation that the population is continuous).

We need to calculate the total number of girls and the total population after this process.

Total Population:

The total population is the initial population + the children that were produced,

ptot = p0 + pchildren

To count up the children we look at the children the couples produced in the child bearing process. Half of the couples produced 1 child (a boy), a quarter of the couples produced 2 children (a girl then a boy), an eighth of the couples produced 3 children (two girls and a boy), ….

pchildren = (1/2^1)*pcouples*1 + (1/2^2)*pcouples*2 + (1/2^3)*pcouples*3 + …

= pcouples * sum from 1 to infty of n*(1/2)^n

= 2*pcouples

therefore ptot = p0 + 2pcouples

Total Number of Girls:

The total number of girls is the initial number of girls + children that were girls,

pg = g0 + pgirl/child

By a similar process we count the number of children that were girls. Half of the couples produced 0 girls (1 boy), a quarter of the couples produced 1 girl (a boy and a girl), an eighth produced 2 girls (a boy and 2 girls), ….

pgirl/child = (1/2^(1+0))*pcouples*0 + (1/2^(1+1))*pcouples*1 + (1/2^(1+2))*pcouples*2 + …

= (1/2) * pcouples * sum from 0 to infty of n*(1/2)^n

= (1/2) * pcouples * sum from 1 to infty of n*(1/2)^n

= (1/2) * pcouples * 2

= pcouples

therefore pg = g0 + pcouples

Finally,

Rg = pg / ptot = (g0 + pcouples) / (p0 + 2*pcouples)

Consider an initial population of half men, half women partnered into couples,

g0 = (1/2)*p0

pcouples = (1/2)*p0

Rg = [ (1/2)*p0 + (1/2)*p0 ] / [ p0 + 2*(1/2)*p0 ] = ½

Jackson Walters
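The two series in the derivation above, and the final ratio, can be checked numerically. This is a minimal Python sketch using exact fractions; the function names and the truncation parameter `terms` are introduced here for illustration, and the arguments of `girl_ratio` are assumed to be whole numbers.

```python
from fractions import Fraction


def expected_children_per_couple(terms=200):
    """Partial sum of n * (1/2)**n for n = 1..terms; tends to 2 as terms grows."""
    return sum(n * Fraction(1, 2) ** n for n in range(1, terms + 1))


def expected_girls_per_couple(terms=200):
    """Partial sum of n * (1/2)**(n + 1) for n = 0..terms; tends to 1 as terms grows."""
    return sum(n * Fraction(1, 2) ** (n + 1) for n in range(terms + 1))


def girl_ratio(p0, g0, pcouples):
    """Rg = (g0 + pcouples) / (p0 + 2 * pcouples), using the limiting sums 1 and 2."""
    return Fraction(g0 + pcouples, p0 + 2 * pcouples)


if __name__ == "__main__":
    print(float(expected_children_per_couple()))
    print(float(expected_girls_per_couple()))
    print(girl_ratio(p0=1000, g0=500, pcouples=500))
```

Note that this computes a ratio of expected totals, which is a different quantity from the expected value of the ratio G/(G+B) that the post is about.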

I’m sure this point was made in one of the 276 comments, so forgive me in advance.

Landsburg’s model assumes that there is no upper bound on the number of children in a family. But that is not possible. If we want to deal with infinite sets, we can’t use the set of all children of k families.

Let’s instead assume that there is an upper bound on the number of children in a family, say N.

Then the expected number of boys in a randomly chosen family is 1/2*1 + 1/4*1 + . . . + (1/2)^N*1 = (2^N – 1)/2^N < 1. This is a strictly increasing function of N bounded above by one; hence for any N, the expected number of boys in a randomly chosen family is less than 1. The reason the expectation is less than one is that there is a (small) chance that all N children are girls.

For example, when N = 4, the expected number of boys is (1/2)*1 + (1/4)*1 + (1/8)*1 + (1/16)*1 = 15/16. That fraction is less than one because all 4 children could be girls.
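The capped model can be worked out exactly for any N. The Python sketch below assumes a family stops at its first boy or after `n_max` children, whichever comes first (`n_max` and the function name are hypothetical). Besides the expected number of boys, it also returns the expected number of girls, which turns out to be the same value.

```python
from fractions import Fraction


def expected_counts_with_cap(n_max):
    """Expected (boys, girls) per family that stops at its first boy or after n_max children."""
    half = Fraction(1, 2)
    boys = girls = Fraction(0)
    for k in range(1, n_max + 1):
        p = half ** k              # probability of k-1 girls followed by a boy
        boys += p
        girls += p * (k - 1)
    girls += half ** n_max * n_max  # the all-girl family with n_max girls
    return boys, girls


if __name__ == "__main__":
    print(expected_counts_with_cap(4))
```

So with a cap the expected counts of boys and girls stay equal; this says nothing yet about the expected fraction, which has to be computed separately.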

“So lets see if I get it. There will be lots of different countries with different populations. The ones with small populations will have more boys than girls. The ones with large populations will have more girls than boys. Therefore selecting a country at random gives an expected fraction that is not 50%, because there are more countries with low population and more boys than those with high population and more girls. This allows me to square the ideas that there are equal numbers of boys and girls overall, but the expected fraction of girls in any country is not 50%.”

This makes sense, except if you were to behave in a manner which I would call “normal” and sample proportionally more frequently from the larger countries. Doing this would cause the answer to converge to the .5 ratio we expect.

The analogy for the families on the block would be to randomly select a child, then average the percentage of girls in that child’s family. As a result we would be selecting the 12-boy family as often as the three all-girl families combined, and the result would be 50%.
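The difference between the two ways of averaging is easy to see on the block from the post (three families with four girls each, one family with 12 boys). This is a small Python sketch with made-up function names, not anything from the post itself.

```python
from fractions import Fraction

# (girls, boys) for each family on the block described in the post
block = [(4, 0), (4, 0), (4, 0), (0, 12)]


def simple_average(families):
    """Average of the per-family girl fractions, each family counted once."""
    return sum(Fraction(g, g + b) for g, b in families) / len(families)


def child_weighted_average(families):
    """Pick a child uniformly at random, then take the girl fraction of that child's family."""
    total = sum(g + b for g, b in families)
    return sum(Fraction(g + b, total) * Fraction(g, g + b) for g, b in families)


if __name__ == "__main__":
    print(simple_average(block))          # 3/4
    print(child_weighted_average(block))  # 1/2
```

Weighting each family by its size always gives the pooled fraction of girls among all children, which is why it lands on 1/2 here.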

I think it’s worth noting that the term “country” DOES have a generally accepted definition (which certainly doesn’t involve 1 family, 4 families or 10 families). If we look at a smaller country such as Canada (with reliable census data) we see that there are slightly under 9 million families. Running a simulation with 9 million families until each family has succeeded in producing a boy is certainly large enough to produce a result close to .5. Even applying a reasonable “30 children maximum” per couple would yield a result very close to .5, though granted, with any finite limits we are asymptotically approaching .5 from below.

Another way of looking at it is that you propose using “expectation” as an excuse for taking a simple average instead of weighted average for each family/country.

A reasonable interpretation, however, is that a “country” has many families and as a result you’re forced by the wording of the problem to use a weighted average. Those countries/families with 30 girls and 1 boy unfortunately have to be weighted 31 times more heavily than the (admittedly much) more common 1-boy, 0-girl families.

Aron:

you propose using “expectation” as an excuse for taking a simple average

The word “expectation” has a well defined meaning, namely what you are calling a “simple average”. So yes, I am using “expectation” as an excuse for taking an expectation. There’s nothing else it could be a good excuse for.

A country is not a family. We don’t even have to limit its population to k families, for the same reason we don’t limit the number of children a woman can have, when computing your riddle.

So David and Tom are right. The fraction of the population that is female is 1/2.

That’s what the riddle is asking.

Saying it asks for the fraction expected in a randomly chosen country with k families is not fair.

Your neighborhood would be such a country, not just one family in your neighborhood.

Why should we assume the country is a limited example with a population of k?

Of course no country has infinite population, but also women can’t have infinite girls, children die before they get to marry, and if the method goes on for unlimited time with a limited population the chance of all k families having a first born boy could occur.

You even say in one of your comments: “The problem asks for the expected fraction-of-girls in a single country with (say) 10 families that has gone through the random process once”.

Maybe I forgot my English, because I don’t see those assumptions in the original problem.

I keep looking at the blue block you have at the beginning of the post, where it asks “What fraction of the population is female?” and that fraction, in the most common sense interpretation of the problem’s words, is 1/2.

OK, Bryan got it.

The problem should have advised: “you can assume infinite lives, infinite reproduction capacity, zero twins, and zero children deaths, but please don’t assume infinite population for the country”.

Sacha: You are saying that you want to interpret the problem so that a country has infinitely many families. Fine. In that case E(G/G+B) is still not 1/2.

No, you are wrong. The question asked, “What fraction OF THE POPULATION is female?” You answered the question, “What is the expected value for the fraction of children in a family who are female?” That’s a different question.

Oops, sorry, you are right.