Everything Has A Limit

Poker, economics, and personal crises, a three-for-one deal

Hours You Won't Get Back
peterbirks
I spent several hours on and off yesterday and today pondering the issues raised in this post:

http://alexbellos.com/?p=725

which asks:

"I have two children. One is a boy born on a Tuesday. What is the probability I have two boys?"

and comes up with the answer 13/27.

As is usually the case with "probability" questions, it isn't a question of probability at all, but one of logic and set theory (or what I call set theory). The numbers part is trivial.

Bellos also throws in a few red herrings, which can lead you down the wrong paths. Most of the comments were fairly wrong-headed, I felt.

Like most people, my immediate response was, "Well, since, without the added information, the answer is 1/3, and since the added information has nothing to do with anything, the answer must still be 1/3."

However, I did a "frequentist" analysis, and the 13/27 answer seemed to hold good, so I looked for a flaw in my analysis.
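
For anyone who wants to repeat that kind of check, here is a minimal sketch of a brute-force count in Python. It assumes boys and girls are equally likely, all seven weekdays are equally likely, and the two children are independent; the function name and the choice of 2 as the index for Tuesday are purely illustrative.

```python
from fractions import Fraction
from itertools import product

def two_boys_given_tuesday_boy():
    """Enumerate every (sex, day) combination for two children and keep
    the families with at least one boy born on a Tuesday."""
    tuesday = 2                                  # assumption: days numbered 0..6
    child = list(product("BG", range(7)))        # 14 equally likely kinds of child
    families = list(product(child, child))       # 196 equally likely families
    matching = [f for f in families if ("B", tuesday) in f]
    both_boys = [f for f in matching if f[0][0] == "B" and f[1][0] == "B"]
    return Fraction(len(both_boys), len(matching)), len(both_boys), len(matching)

print(two_boys_given_tuesday_boy())
```

The count comes to 27 families with at least one Tuesday boy, of which 13 have two boys.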

Eventually the question came down to: can a modifier ("born on a Tuesday") change the overall probability, even though the modifier has nothing to do with the initial probability?

And the answer is, yes.

Here are some more interesting examples of this.

I throw two dice. One is a six thrown on a Tuesday. What is the probability that I throw two sixes?

Answer 13/83 (13 double-six cases against 70 others).

The next one assumes that children like one of two colours, red or yellow, but that they only develop this preference when they reach the age of five.

"I have two children. At least one is a boy who prefers the colour Red. What is the probability that I have two boys?"

Answer, 3/7.

I throw two dice. One is a six that prefers the colour Red. What is the probability that I throw two sixes?

Answer, 3/23 (3 double-six cases against 20 others).
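
The dice versions can be counted by brute force. The sketch below is only an illustration: it assumes each die independently carries an equally likely "tag" (one of seven weekdays, or one of two colour preferences), and the function name is made up for the example.

```python
from fractions import Fraction
from itertools import product

def double_six_given_tagged_six(n_tags):
    """Two dice, each with an independent, equally likely tag.
    Keep the throws where at least one die is a six carrying tag 0,
    then count how many of those are double sixes."""
    die = list(product(range(1, 7), range(n_tags)))
    throws = list(product(die, die))
    matching = [t for t in throws if (6, 0) in t]
    double_six = [t for t in matching if t[0][0] == 6 and t[1][0] == 6]
    return len(double_six), len(matching)

for label, n_tags in [("no tag", 1), ("colour", 2), ("weekday", 7)]:
    hits, total = double_six_given_tagged_six(n_tags)
    print(label, hits, total - hits, Fraction(hits, total))
```

For the weekday tag this counts 13 double sixes against 70 other throws, which is 13 in 83; for the colour tag it counts 3 against 20, which is 3 in 23. With no tag at all it is 1 in 11.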

The last just seems ridiculous, until you realize that what is happening is that the "universal set" is changed by the modifier. So it doesn't matter whether the modifier has anything to do with the initial statement, or indeed whether the modifier is physically possible (as in the last case). All that matters is how it alters the constituency of the universal set.

The easiest case is the example:

"I have two children, one is a boy who prefers the colour Red. What is the probability I have two boys?"


As we know, without the modifier, the probability is 1/3,* from three equally likely cases:

BB
BG
GB

If we add in the colour preferences, we get a sample size of 12 fathers (could be mothers, of course).

They have the following pairs of children.

BR BR
BR BY
BY BR
BY BY
_________

GR BR
GR BY
GY BR
GY BY

BR GR
BR GY
BY GR
BY GY

Of this sample, 8 are GB or BG, while 4 are BB, retaining the probability of 1/3.

But now, apply the "modifier" and ask all of the fathers whose children DO NOT fit the statement to "step aside":


BR BR
BR BY
BY BR
..................BY BY
________________

GR BR
..................GR BY
GY BR
..................GY BY

BR GR
BR GY
..................BY GR
..................BY GY


Notice that only one of the BB group steps aside, while four of the GB/BG group step aside.

This reduces the sample size from 12 to 7, of whom three were originally in the BB group and four were in the BG/GB group.

So the answer to the question:

"I have two children, at least one of whom is a boy who prefers the colour Red. What is the probability I have two boys?" is 3/7 (up from the pre-modifier 1/3).

Similarly:

"I have two children, at least one of whom is a boy who prefers the colour Yellow. What is the probability I have two boys?" is 3/7.

"I have two children, at least one of whom is a boy who prefers the colour Yellow. What is the probability I have 1 girl and 1 boy?" is 4/7 (down from the pre-modifier 2/3).

Note how the third statement is less counterintuitive. "Hell", you say, "that must reduce the chances of there being GB/BG."
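
The step-aside exercise can be sketched in a few lines of Python. This is only an illustration under the post's assumptions (each sex and each colour equally likely); the names `fathers`, `stay_standing` and `count_boys` are made up for the example.

```python
from fractions import Fraction
from itertools import product

# Each child is a (sex, colour) pair: B/G and R/Y, all equally likely.
children = list(product("BG", "RY"))
# The sample of 12 fathers: every pair of children except the girl-girl ones.
fathers = [pair for pair in product(children, children)
           if any(sex == "B" for sex, _ in pair)]

def count_boys(pair):
    return sum(1 for sex, _ in pair if sex == "B")

def stay_standing(colour):
    """Fathers with at least one boy who prefers the given colour."""
    return [pair for pair in fathers if ("B", colour) in pair]

red = stay_standing("R")
two_boys = sum(1 for pair in red if count_boys(pair) == 2)
print(len(fathers), len(red), Fraction(two_boys, len(red)))
```

Swapping "R" for "Y" gives the same 7 fathers left standing, 3 with two boys and 4 with one of each.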


_________________

* For those who disagree with this premise, I refer you to:



http://www.jesperjuul.net/ludologist/?p=1048

“a) We keep flipping two coins simultaneously.
b) If both coins are tails, we flip the coins again.
c) Otherwise, you give me $15 if there is one head, and I give you $20 if there are two heads.
If the probability is 1/2, you will be making money. If it’s 1/3, I will.
Any takers?”
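
The arithmetic behind that bet can be sketched in Python. This is only an illustration: `proposer_gain` is a made-up name, and the coins are assumed fair and independent.

```python
from fractions import Fraction
from itertools import product

# Rounds that settle: every two-coin result except tails-tails (re-flipped).
settled = [flip for flip in product("HT", repeat=2) if flip != ("T", "T")]
p_two_heads = Fraction(sum(flip == ("H", "H") for flip in settled), len(settled))

def proposer_gain(p):
    """Expected gain per settled round for the person offering the bet,
    if two heads turn up with probability p: +$15 on one head, -$20 on two."""
    return 15 * (1 - p) - 20 * p

print(p_two_heads)                     # share of settled rounds with two heads
print(proposer_gain(Fraction(1, 2)))   # what the 1/2 camp would expect
print(proposer_gain(p_two_heads))      # what the enumeration gives
```

On these assumptions the proposer would lose $2.50 a round if the answer were 1/2, and wins about $3.33 a round with the enumerated 1/3.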

This also gives another way of explaining why the 13/27 (about 48%) answer is correct. The "cocktail party" explanation cleverly moves the thinker away from "moving down from 50%" towards "moving up from one-third".

Sample 1, cocktail party: All dads with a boy are asked to raise a hand, and 750 of them do so. We know that of these, 250, or 1/3, have two boys. Next, the 750 dads whose hands are up are told to tell a neighbor which day of the week their son was born. Key part: If they have two sons, they should use the birthday of only one of their sons (randomly chosen). 1/7 of the 500 dads with one son (and one daughter) say Tuesday, and 1/7 of the 250 dads with two sons say Tuesday. So of those who say Tuesday, just 1/3 have two sons. And that’s true of any other day of the week.

Sample 2, raising hands by day of week: Dads with a son born on a Sunday are asked to raise their hands. Next dads with a son born on a Monday are asked to raise their hands. Then dads with a son born on a Tuesday are asked to raise their hands, etc. Note that most dads with two sons will raise their hands twice, except for those whose sons were both born on the same day of the week. This means on any given day of the week, dads with two sons are over-represented relative to dads with one son, since they have two opportunities (one from each son) to have a son born on that day. The diagram demonstrates this over-representation: 13/27, or 48%, of the dads claiming a son born on a Tuesday have two sons. And that’s true of any other day of the week.

The key difference in cocktail party sampling is that each dad says the birthday of one son, even if they actually have two. In the second sampling, the dads with two sons get to raise their hands on two days of the week, one for each son (unless the sons were born on the same day of the week).


Thus about 48% of the fathers who claim a son born on a Tuesday have two sons, rather than one-third, because the fathers with two sons have two votes. The figure falls short of 50% because a small proportion of the two-son fathers (1/49, just over 2%) have both sons born on a Tuesday, and so raise their hand only once.
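
The two sampling schemes can be compared with an exact count. The sketch below assumes equally likely sexes and weekdays; the function names and the index used for Tuesday are illustrative only.

```python
from fractions import Fraction
from itertools import product

TUESDAY = 2  # assumption: days numbered 0..6
children = list(product("BG", range(7)))
families = list(product(children, children))  # 196 equally likely families

def cocktail_party():
    """Each dad with a son names the birth day of ONE son, chosen at random."""
    says_tuesday = two_sons = Fraction(0)
    for family in families:
        son_days = [day for sex, day in family if sex == "B"]
        if not son_days:
            continue
        # chance that the randomly chosen son was born on a Tuesday
        p = Fraction(son_days.count(TUESDAY), len(son_days))
        says_tuesday += p
        if len(son_days) == 2:
            two_sons += p
    return two_sons / says_tuesday

def raise_hands():
    """Every dad with ANY son born on a Tuesday raises his hand."""
    hands = [f for f in families if ("B", TUESDAY) in f]
    two_sons = [f for f in hands if f[0][0] == "B" and f[1][0] == "B"]
    return Fraction(len(two_sons), len(hands))

print(cocktail_party(), raise_hands())
```

The first scheme stays at 1/3; the second, where two-son dads get two chances to qualify, gives 13/27.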

______________________

Nope, sorry, I don't buy the Ludologist's explanation.

The starting position (before modifiers) is that you have two children. Note that even giving the gender of one of them is a modifier. There are four equal possibilities, the obvious BB, BG, GB and GG.

Now modify this by stating "one of them is a boy." This means that the chance of the other one being a boy is fifty-fifty. One way of looking at this is that you've removed one degree of freedom from the set, leaving only the possibility of either G or B (and the order of birth doesn't matter). The other way of looking at it is to divide the combinations into two sets, selected by age and denoted o for older and y for younger:

Set 1 (Bo): Bo-Gy Bo-By
Set 2 (By): By-Go By-Bo

Clearly there's a fifty-fifty chance that the boy you're thinking of is Bo or By. This selects between set 1 and set 2, thus giving you a 50/50 for two boys.

Now add the modifier for Tuesday. There's no obvious reason why this should be classified as a different type of modifier to "one is a boy," which is why the above result obtains. Using the same logic as you present, and given my lemma above, the odds still obtain as 13/27 ... but ...

... but this relies on the English language, not on statistics alone. If I say "one is a boy, born on a Tuesday," I may mean this exclusively or inclusively. The reduction in odds to 13/27 only comes about because one is using the statistics to combine the two modifiers (boyness and Tuesdayness), which presupposes that there is somehow a difference between "other boy and Tuesdayness" and "girl and Tuesdayness." There isn't. In effect, this argument depends upon assuming exclusivity rather than inclusivity, but only in the case that the other child is a boy.

In other words, it's a manipulation of the language.

I've never been convinced by the "switch the door" argument in the Let's Make a Deal (Monty Hall) problem, either. Monty is always going to pick a goat (unless he's stupid), and he has advance knowledge that guarantees goatiness. This advance knowledge is key. It reduces the remaining set to {car-goat goat-car}, and since you, the contestant, have no clue which way round it is, the odds are still 50-50.

I suppose I should write a computer simulation of this. Perhaps I will.
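
For what it's worth, a simulation of the Monty Hall game along these lines might look like the sketch below. It assumes the car is placed at random, the first pick is random, and Monty always opens a goat door other than the contestant's; the function names, trial count and seed are arbitrary choices for the illustration.

```python
import random

def play(rng, switch):
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # Monty knows where the car is and always opens a goat door
    # that is not the contestant's pick.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # move to the one door that is neither the pick nor the opened door
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch, trials=100_000, seed=0):
    rng = random.Random(seed)
    return sum(play(rng, switch) for _ in range(trials)) / trials

print(win_rate(switch=False), win_rate(switch=True))
```

Under those assumptions sticking wins about one game in three and switching about two in three, because the first pick is right only a third of the time and Monty's reveal does nothing to change that.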

I should perhaps expand on that set 1/set 2 thing.

If I am asked, as the parent of two children, "is one of them a boy?" and I affirm the proposition, then the chances of two boys are 33%. The question excludes the possibility of GG for a "yes" answer.

If I volunteer the information myself, then you have no clue whatever concerning the basis for my pick. I might favour my older child over my younger child, in which case the gender is irrelevant and the odds are 50/50. I might reverse that preference, in which case the odds are the same. I might have tossed a coin to pick between younger and older.

But let's say I prefer little boys or little girls (no sniggering at the back, there). If I prefer little boys, then 25% of the time I have no choice and say "girl," whereas the other 75% of the time I say "boy," with a 1/3 chance that the other child is a boy. The same obtains in reverse. This means that in 25% of the cases (where I specify the gender of one child) the gender of the other child matches automatically. Of the other 75% of cases, one-third (25% of the total) have a gender match. And 25% + 25% = 50%.

I could, of course, toss a coin to choose whether I prefer little boys or little girls. (Do stop sniggering, Prendergast!) The same result obtains.

The point is, I have somehow chosen one axis (gender) and thus reduced the degrees of freedom in the argument. Since you have no idea how I made this decision, the odds are still 50/50.
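
The difference between being asked and volunteering can be made concrete with a small enumeration. This is a sketch only: it assumes equally likely sexes and models "volunteering" as tossing a coin to pick which child to mention, which is just one of the possibilities described above; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

families = list(product("BG", repeat=2))  # (older, younger), 4 equally likely

def asked_about_boys():
    """The parent is asked 'is at least one a boy?' and answers yes."""
    yes = [f for f in families if "B" in f]
    return Fraction(sum(f == ("B", "B") for f in yes), len(yes))

def volunteers_random_child():
    """The parent picks one child by coin toss and states its gender;
    only the cases where the statement is 'boy' are kept."""
    said_boy = both_boys = Fraction(0)
    for f in families:
        p_says_boy = Fraction(f.count("B"), 2)  # chance the picked child is a boy
        said_boy += p_says_boy
        if f == ("B", "B"):
            both_boys += p_says_boy
    return both_boys / said_boy

print(asked_about_boys(), volunteers_random_child())
```

The asked version gives 1/3 and the coin-toss volunteer version gives 1/2, so the answer depends on which procedure produced the statement.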

Ah right. The answer is not quite as presented for a whole range of reasons.

"I have two children. One is a boy born on a Tuesday. What is the probability I have two boys?"

What sex is the child who is not known to be a boy? Ignoring some issues I will list later, there is an equal chance that the other child is a boy or a girl. Why?

The minute you DEMONSTRATE that one of the birth events resulted in a boy then the probability moves to 1/2.

“if you take all the boys in the world with only one sibling, the chance of any of them having a brother is 1/2.”


Of course there are not equal numbers of boys and girls born (not least due to gender-related miscarriage).

There is some evidence that, although it is the man who determines the sex of the child, conditions inside some women (from memory, it is acidity) favour one type of sperm over the other. In other words, some women are more likely to have boys than the average woman, so their chance of BB is greater than average (and the average is greater than 50%).

There is also a gender bias based on the day of the week the child is born (e.g. more births are induced on Friday than Monday, larger babies are more likely to be induced, and boy and girl babies do not have the same average birth weight).

The age of the parent makes a difference, as does when in history the question was posed. Trace components in water have made a difference to gender balance, and these chemicals were not in use a generation ago.

The "reality" of gender imbalances etc. is irrelevant to the logical basis of the question, which is why I prefer the question about throwing two dice. It might be true (there might be imbalances in the days born, because of induced births) but it is outside the scope of the question. I could have put in the various "assume" riders, but my post was long enough as it was.

The "if you take all the boys in the world" statement has no relevance (in logical terms) to the statement as posited. You might as well say "if you take all the tails tossed in the world where two coins have been tossed".

Yeah, well, so what? It's nothing to do with the statement "I toss two coins". The mathematics of the coin tossing (a logical equivalent) demonstrates the 1/3 point.

Look at the cocktail party demonstration. This points it out quite clearly.

PJ

Got called away - so just to finish

Is Claire/Martin a different option to Martin/Claire? I think not, because it is the number of each gender that is in question rather than the order. But if you were to say that it is, you still have 4 options, not the 3 you list.

So you have BB BB BG GB. What you cannot do is say that B/G is not the same as G/B but that Martin/Stephen is the same as Stephen/Martin. To be consistent there must be 2 versions of BB, not one.

Hence if you have 1 son you have a 50% chance of two sons (2/4).

Re: Got called away - so just to finish

I'm with you on the DEMONSTRATE thing. I also disagree with the cocktail party demonstration, since it deliberately excludes putting your hand up twice for TT for no good reason. These candidate parents are therefore not included in the distribution.

Re: Got called away - so just to finish

Try to get away from the son/daughter analogy for a second. It's confusing you. Let's think in terms of black and green snooker balls.

I have 1000 black snooker balls and 1000 green snooker balls

1000 people come to a cocktail party. I give each of them two snooker balls at random.

This should result in roughly 250 people getting two green balls, 250 people getting two black balls, and 500 people getting one green ball and one black ball. If you disagree with that analysis, there's not much hope I fear.

Half way through the party I ask all of those with at least one black ball in their pocket to raise their hands.

About 750 people will raise their hands.

I now ask all of those with two black balls in their pocket to raise their hands.

About 250 people will raise their hands.

Therefore if someone has "at least" one black ball in their pocket, the chance of them having two black balls in their pocket is 250/750, or 1/3.

The situation with two snooker balls (equally likely to be black or green), where a black ball ("B") stands for a boy and a green ball ("G") for a girl, is the precise logical equivalent of the parent with two children.

You can give the snooker balls names such as Stephen or Martin if you like, but it won't change the odds.
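
The snooker-ball party is easy to simulate. The sketch below deals two balls to each guest from a shuffled bag holding equal numbers of black and green balls; the function name, the seed and the way the bag is dealt are assumptions made for the illustration.

```python
import random

def snooker_party(n_guests=1000, seed=0):
    """Deal two balls to each guest from a bag of n_guests black and
    n_guests green balls, then count the two shows of hands."""
    rng = random.Random(seed)
    bag = ["black"] * n_guests + ["green"] * n_guests
    rng.shuffle(bag)
    pockets = [(bag[2 * i], bag[2 * i + 1]) for i in range(n_guests)]
    at_least_one_black = sum("black" in p for p in pockets)
    two_black = sum(p == ("black", "black") for p in pockets)
    return at_least_one_black, two_black

one, two = snooker_party()
print(one, two, two / one)
```

With 1000 guests the counts wobble around 750 and 250; raising `n_guests` tightens the ratio towards 1/3.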

I wouldn't worry about this -- many many people fail to get it (as indicated by the responses to Bellos)!

PJ

Re: Got called away - so just to finish

Language again, I'm afraid (and sorry about the mess up there). Four or a thousand, it doesn't matter, except to smooth the curve.

If you're already going to ask about black balls, then you're eliminating the 25% of candidates who have GGs. One out of three BBs sounds quite reasonable in the circumstances.

However, if you don't eliminate the GGs (because you let the respondent have an element of choice), then you're left with one out of four, as you'd expect.

The question is, how do you pose the question? In this case, you have already decided, ahead of time, to eliminate GGs.

If, on the other hand, the rubric was to pick a ball at random and stick it in your pocket, and the question was "is the other ball of the same colour?" then the answer is going to be 50/50 yes/no. Which ball is the younger, and which ball is the older, is entirely irrelevant. (As a matter of fact, the question might as well be "is the other ball of the opposite colour?")

Obviously. My point is that the random first choice is obfuscated by the terms of the "puzzle," which implicitly and incorrectly denies the respondent the element of choice.

Re: Got called away - so just to finish

On a slightly sideways slide ... I notice that one of the comments on the cocktail party link brings in what he calls "Bayes' rule," or what I think of as "Bayes' theorem."

Now, simply stated (and I hope I've got this right), Bayesian statistics requires a starting point with some given probability distribution, and uses some sort of combinatorial statistics combined with inference from that starting point to predict the probability of a given outcome.

Actually, that's neither simple nor, probably, a decent description of Bayesian statistics. I may be the first person to stand corrected even before he's been corrected. I search for superlatives in all areas.

Anyway ... I can't help feeling that this sort of statistical puzzle is intimately connected with Bayes. What is interesting to me in a "professional" (as in some git paid me) sense is that I see the point, but misapprehend the (ridiculously simple) mathematics.

For example, Paul Graham has developed a Bayesian spam filter which seems to be remarkably accurate. I've got a commercial idea to invert this mechanism for my own purposes (in fact, not so much invert as to use it for sorting into arbitrary buckets), but for the life of me it doesn't click with my back-brain.

Equally, I worked with a web-crawling company some few years back that used Bayesian algorithms to detect possible IPR infringement. Take the case of Disney, for example (they did, in their documentation). Mickey Mouse was assigned a rating of 66%, and Donald Duck 45%. For some reason, when you did the calculation, this meant that a site that mentioned both of them was less likely to infringe on IPR than a site that mentioned either of them.

I'm beginning to suspect that I read the documentation wrong.
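
For comparison, here is a sketch of the combining rule from Paul Graham's essay "A Plan for Spam". Whether the web-crawling company used this exact rule is an assumption; the function name `combine` is illustrative.

```python
def combine(*probs):
    """Naive Bayes combination of independent ratings:
    prod(p) / (prod(p) + prod(1 - p))."""
    yes = no = 1.0
    for p in probs:
        yes *= p          # evidence for
        no *= 1.0 - p     # evidence against
    return yes / (yes + no)

print(round(combine(0.66, 0.45), 3))
```

Under that rule a rating below 50% counts as evidence against, so Donald's 45% pulls Mickey's 66% down to roughly 61%: lower than Mickey alone, but still higher than Donald alone.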

FWIW, here's the Wikipedia summation of all this.

I like the rephrasing as "'I have two children and it is not the case that they are both girls.' What is the probability that both children are boys?" (note that, logically, this question is identical to the question as first phrased).

Of interest, though, is that if you add the Tuesday modifier, I think that, even if you start off with the 1/2 hypothesis, you still end up with 13/27 as the answer. I haven't checked the maths though.

PJ

I concur with the 13/27 calculation, just as I agree with the 1/3 probability of both boys in the usual simple question. However I have always had a deep unease about this problem as it seems to be more about semantics and artificial constructs than mathematics.

The problem for me is that the property of having "at least one boy" is a very artificial one. The sub-populations of Boy-Girl, Girl-Boy, and Boy-Boy are all equally rich with this property. They have an equal weighting so we get a probability of 1/3 because we are simply not getting any double credit for the fact that Boy-Boy has actually got two boys in it.

Once we introduce some modifier (like born on a Tuesday) which only 1 in m of the boy population possesses, we decimate the boy-girl and girl-boy populations to 1/m of their size, but the boy-boy one only to roughly 2/m [actually 2/m - 1/m^2], as we've got two shots at it. It's not quite double, as they could BOTH have the property and we wouldn't get quite the proper credit for that scenario. Anyway, now the boy-boy group has roughly double the weighting and so rises to account for roughly half the cases again.
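
This counting can be written out for a general modifier. A sketch, assuming each child independently has the property with probability 1/m and the two sexes are equally likely:

```latex
\begin{align*}
P(\text{BB, at least one boy has the property})
  &= \tfrac14\left(\tfrac{2}{m} - \tfrac{1}{m^2}\right) \\
P(\text{BG or GB, the boy has the property})
  &= \tfrac12 \cdot \tfrac{1}{m} \\
P(\text{BB} \mid \text{at least one boy has the property})
  &= \frac{\tfrac14\left(\tfrac{2}{m} - \tfrac{1}{m^2}\right)}
          {\tfrac14\left(\tfrac{2}{m} - \tfrac{1}{m^2}\right) + \tfrac{1}{2m}}
   = \frac{2m - 1}{4m - 1}
\end{align*}
```

With m = 7 this gives 13/27, with m = 2 (the colour preference) it gives 3/7, and with m = 1 (no modifier at all) it falls back to 1/3.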

Why do I object to this problem? Because equivalence under the "has at least one X" property is going to give rise to extremely distorted equalities when applied to non-mathematical objects. The groups 1 man + 1 woman, 1 man + 1000 women, 1000 men + 1 woman are all equal under the "has at least one man" rule but not equal in any meaningful or natural way.

--- matt

Beautifully summarized, Matt. The modifier decimates the GB/BG population at nearly twice the speed as it does the BB. Thus the initial 1/3-2/3 proportion "returns" to near the 50:50 "norm".

And, yes, I agree that it's more about semantics and artificial constructs. Another good point. Note how the sentence carefully picks two "closed" sets, both of which have equal distributions of probability.

"I have two bits of food. One is bottle-green. What is the probability that both my bits of food are bottle-green?"

Semantically, an identical question. Theoretically, one which has a "real" probability answer. But it won't be a question asked at any of these conferences. This in a sense "reveals" that these questions aren't about boys and/or girls, or days of the week, at all. They are about numbers. The introduction of a "fake" real-world analogy only serves to cloud the issue (often with a great deal of success).

PJ

I'll tell you the insight that made this make sense in a flash for me.

"I have two children. One is a boy called Alan born at 5:36 and 14 seconds in the morning of the 17th of June. What is the probability I have two boys?"

If he just said "one is a boy", then the piece of information given could refer to either of them and we need to investigate the cases Boy Not-Boy / Not-Boy Boy / Boy Boy, hence 1/3.

If he is incredibly specific about one of them, then it's clear that he has given (technically almost) all the information about one boy and (technically almost) no information whatsoever about the other, and thus the chance of the second being a boy is (technically almost) indistinguishable from the chance of the second being a boy given that we have no information about them, or ½. We need to investigate the cases IncrediblySpecificBoy Not-IncrediblySpecificBoy / Not-IncrediblySpecificBoy IncrediblySpecificBoy / IncrediblySpecificBoy IncrediblySpecificBoy, and the chance of the last of these is (technically almost) zero.

I mean, who has twin boys and names them both Alan?

Chris; this is a superb way of explaining why an unrelated event changes the likelihood of the parent having two boys. Love it!

It also illustrates how, as the number of options in the modifier increases, so the likelihood approaches the 'limit' of 50% (or, if you started off from the 50% assumption, only shifts it downwards a very small amount).
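
The approach to the limit can be tabulated directly. The sketch below assumes the extra detail takes m equally likely values, independently for each child; the function name is invented for the illustration.

```python
from fractions import Fraction

def two_boys_given_detailed_boy(m):
    """Chance of two boys given 'at least one is a boy with this exact detail',
    where the detail has m equally likely values per child."""
    boy_boy = m * m - (m - 1) * (m - 1)  # detail pairs where at least one boy matches
    mixed = 2 * m                        # boy matches, sister's detail is free (BG and GB)
    return Fraction(boy_boy, boy_boy + mixed)

for m in (1, 2, 7, 365, 1_000_000):
    p = two_boys_given_detailed_boy(m)
    print(m, p, float(p))
```

The values start at 1/3 for m = 1, pass through 3/7 and 13/27, and creep up towards 1/2 without reaching it as the detail becomes more specific.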

PJ
