Boy or Girl Explained
The “Boy or Girl Paradox” (also called the “Two Child Problem,” among other names) is generally phrased as follows:
You know a couple who has two children. At least one of the children is a girl. What is the probability that they have two girls?
This is an ambiguous problem, which leads to different answers depending on the assumptions that are used. Not enough information has been provided to produce a definite answer, and the unstated assumptions fill in the space needed to complete the logic.
Here I investigate this problem and explain the ambiguity.
Martin Gardner first posed this problem in 1959, phrasing it as the following two questions:
1. Mr. Jones has two children. The older child is a girl. What is the probability that both children are girls?
2. Mr. Smith has two children. At least one of them is a boy. What is the probability that both children are boys?
The original purpose of these two questions was to demonstrate that they have different answers, because of a subtle difference in the information provided. The difficulty is that whether the answers really differ depends on how the second question is interpreted.
Nobody doubts the answer to Question 1. We are provided the sex of the older child. The sex of the other child is equally likely one way as the other (ignoring that, in the real world, the frequency of having sons and daughters is not exactly 50/50). Therefore, the probability is one half.
Question 2 has been worded to be very similar to Question 1 in an attempt to lead the reader to assume that the reasoning works the same way and that the answer is the same. A different answer is presented, however, usually by the following reasoning.
Each child could be a boy or a girl. Therefore, denoting the sex of the two children by two letters (“B” for boy and “G” for girl), with the first letter indicating the sex of child one and the second letter indicating the sex of child two, there are four possibilities:
 BB | GB
----+----
 BG | GG
Each of these possibilities is equally likely, occurring with a probability of one fourth. The information that “at least one of them is a boy” allows us to discard “GG” as impossible. Therefore, three possibilities remain, each equally likely: BB, BG, and GB. Only one of these possibilities is that there are two boys, and thus, the probability of Mr. Smith having two boys is one third—not one half as in Question 1.
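The counting argument above can be checked by brute-force enumeration. The following is only a small sketch in Python, assuming boys and girls are equally likely and independent; the helper name `conditional_probability` is my own invention for illustration.

```python
from fractions import Fraction
from itertools import product

# All four equally likely (first child, second child) combinations.
families = list(product("BG", repeat=2))

def conditional_probability(event, given):
    """P(event | given), counted over the equally likely families."""
    kept = [f for f in families if given(f)]
    hits = [f for f in kept if event(f)]
    return Fraction(len(hits), len(kept))

# Question 1 (Mr. Jones): the older (first) child is a girl.
p_jones = conditional_probability(
    lambda f: f == ("G", "G"), lambda f: f[0] == "G"
)

# Question 2 (Mr. Smith): at least one child is a boy,
# read as "discard every family with no boys".
p_smith = conditional_probability(
    lambda f: f == ("B", "B"), lambda f: "B" in f
)

print(p_jones, p_smith)  # 1/2 1/3
```

Note that the second calculation already bakes in one particular reading of “at least one of them is a boy”: it treats the statement as a filter applied to all two-child families.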
There is nothing wrong with this reasoning, and indeed, as Gardner later pointed out, if one were to collect a list of all of the families with two children, say from the phone book, and eliminate all of the families with no boys, then roughly one third of the remaining families will have two boys. Nevertheless, this reasoning relies on an unstated assumption, as I shall demonstrate below.
If the odds of having a boy and having a girl are equal, then the same situation can be examined by the toss of two fair coins. Consider a game in which I use a cup to toss two coins, which are hidden from view under the cup. Then I peek under the cup and announce the value, heads or tails, of one of the coins, which is determined at random—say the first coin I see or the closest coin to me or whatever. Is there a way for someone who has not seen the coins to guess the result for both coins with this information and be correct more than half of the time?
Well, if I announce that one of the coins is heads, then one knows that there are two coins, each with a 50/50 chance of heads or tails, and that at least one of them is heads. Since this seems similar to the situation of two children with at least one boy, it is tempting to conclude that one should guess there is one coin heads up and one coin tails up, since the reasoning above indicated that the probability of two boys (or two heads) is half of the probability of only one boy (or a heads and a tails).
This conclusion is incorrect, however, and if you don’t believe me, feel free to try this experiment yourself. You will find that you are correct only about half of the time. Obviously, this situation requires a different analysis than the one provided above.
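For readers who would rather not toss coins by hand, the experiment can be approximated with a short simulation. This is a rough sketch, assuming fair coins and an announcer who reports one of the two coins chosen at random; the function name and the fixed seed are my own choices.

```python
import random

def differ_rate_random_reveal(rounds, seed=0):
    """Among rounds where 'heads' is announced, return the fraction
    in which the two coins actually differ."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    heads_called = 0
    differ = 0
    for _ in range(rounds):
        coins = (rng.choice("HT"), rng.choice("HT"))
        announced = rng.choice(coins)  # report one coin picked at random
        if announced == "H":
            heads_called += 1
            differ += coins[0] != coins[1]
    return differ / heads_called

rate = differ_rate_random_reveal(100_000)
print(f"coins differ in {rate:.1%} of the rounds where heads was called")
```

The printed rate should land close to one half, not two thirds, which matches what the hands-on experiment shows.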
Since the person trying to guess the result of the coin toss does not know what value (heads or tails) will be called out when the coins have been tossed and are examined, he will need to reason without that information. He could choose to decide that the other coin is always heads or that it is always tails, but that’s just the standard way of betting on a coin toss. The resulting probability will always be 50%, and nothing can change that.
The other possibility is to use the information given about one of the coins to guess the value of the other. In this case, he can choose to guess that the two coins are alike or that they are different. This is the strategy mentioned above, when it is reasoned (incorrectly) that being different is twice as likely as being the same. (To be comprehensive, I should mention that the person can also employ a “mixed strategy,” in which he assigns a probability to the two choices and picks at random, but this generalization, commonly used in game theory, doesn’t affect the results discussed here.)
The problem is that knowing the result of one of the coins tells us nothing about whether the two coins are alike or different. Therefore, there is no way to use this information to formulate a strategy to improve the odds of success. Nevertheless, it is natural to ask whether there is a way somehow to rig the system and set up a game, like the card version of Bertrand’s Box Paradox, in which the odds end up in one’s favor.
One way to do this is to have a third person handle the coins and report the result of the toss, someone who is honest and supposedly “neutral,” but who behaves in a predictable way. Suppose that I have arranged with this person to always call out that one of the coins is heads if either of the coins shows a head. This additional information is quite useful, for if he calls tails, then I know for certain that the result of the coin toss is two tails.
Even if he calls out heads, I have information that can increase my odds of guessing correctly. Of the four equally likely possibilities (two with the coins alike and two with the coins different), the fact that tails wasn’t called eliminates one: the case in which both coins are tails. Of the three possibilities that remain, two have the coins different. Therefore, it is advantageous to guess that the coins are different, since that result is twice as likely as the result that both coins are heads.
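The rigged version of the game can be simulated the same way. Again, this is only a sketch under the stated assumptions: fair coins, and an accomplice who calls heads whenever at least one coin shows heads. The function name and seed are hypothetical.

```python
import random

def differ_rate_heads_biased(rounds, seed=0):
    """Among rounds where 'heads' is announced, return the fraction
    in which the two coins actually differ."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    heads_called = 0
    differ = 0
    for _ in range(rounds):
        coins = (rng.choice("HT"), rng.choice("HT"))
        # The accomplice calls heads if either coin shows heads.
        announced = "H" if "H" in coins else "T"
        if announced == "H":
            heads_called += 1
            differ += coins[0] != coins[1]
    return differ / heads_called

biased_rate = differ_rate_heads_biased(100_000)
print(f"coins differ in {biased_rate:.1%} of the rounds where heads was called")
```

Here the rate should come out near two thirds. The coins and the tosses are identical to the honest game; the only thing that has changed is the rule the announcer follows.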
So what is going on here? We have two situations that, on the surface, seem like the same problem (there are two coins, and at least one of them is heads) but yield different probabilities for the result of the coin toss. To understand what is going on, it’s important to note that the key difference that changed the probabilities is the knowledge that the information to be provided was biased towards heads.
It is this knowledge that is the implicit assumption that leads to the usual answer to the Two Child Problem. This is similar to the reasoning that is often given in the Monty Hall Problem, where the unstated assumption is that Monty will always reveal a door with a goat. Without that assumption (for example, if Monty opens a door at random and merely happens to reveal a goat), there is no advantage in switching. Most people, however, apply this assumption without even thinking about it, because having the game show host reveal the prize during the middle of the game for no reason goes against the spirit of the game.
Returning to the Two Child Problem, the implicit assumption that leads one to determine that two girls is less likely is that the information provided was restricted to girls. That is, there never was a possibility to learn about the couple’s potential sons. If, on the other hand, the information had been provided about one of the couple’s children, regardless of sex, then the information could have been “at least one child is a boy” or “at least one child is a girl,” depending on the sex of the children, but either statement would have been equally likely to have been provided. In that case, the odds of the sibling being a boy or a girl are even, just as in the experiment with the two coins.
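Both readings can be captured in one formula by making the hidden assumption an explicit parameter. The sketch below is my own framing, not part of the original problem: `q` is the assumed probability that a family with one boy and one girl gets described as “at least one child is a girl,” while a two-girl family is always described that way.

```python
from fractions import Fraction

def p_two_girls_given_girl_reported(q):
    """P(two girls | told 'at least one child is a girl').

    q is the assumed probability that a mixed (one boy, one girl)
    family is reported as 'at least one girl'.
    """
    q = Fraction(q)
    p_gg = Fraction(1, 4)     # prior probability of two girls
    p_mixed = Fraction(1, 2)  # prior probability of one of each (BG or GB)
    # Bayes' rule: two-girl families always produce the 'girl' report,
    # mixed families produce it with probability q, two-boy families never.
    return p_gg / (p_gg + p_mixed * q)

# Information restricted to girls: q = 1.
print(p_two_girls_given_girl_reported(1))               # 1/3
# A randomly chosen child is described: q = 1/2.
print(p_two_girls_given_girl_reported(Fraction(1, 2)))  # 1/2
```

Seen this way, one third and one half are simply the answers at two different values of `q`, and the wording of the problem does not say which value applies.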
Conclusion
As deeper analysis of these “paradoxes” demonstrates, implicit assumptions can have a substantial impact on how additional information affects estimates of probability. Therefore, extra care should be taken to ensure that all of the assumptions have been identified and their applicability has been assessed. If an assumption does not fit within the information that has been provided, then it should be discarded, at least until such time as it can be justified by additional information.
So if you know a couple who has two children, whose sexes you do not know, and you meet them on the street with their daughter, then you can assume that the odds are even that their other child is a girl. If, on the other hand, you see them taking their daughter to a Girl Scout meeting, then you can safely reason that the odds are two to one that their other child is a boy. Why? Because they would not be taking a boy to a Girl Scout meeting, and that is enough information to change the odds.