Probability


A probability is a number from 0 to 1, inclusive.

We use the notation P(eventLabel) = probability to report a probability. 

There are three ways to assign probabilities.

1. Intuition

Intuition. An educated best guess.

2. Equally Likely Events or Outcomes

Equally Likely Events: Probabilities from mathematical formulas

In what follows, the words "event" and "outcome" are taken to have the same meaning.

Probabilities versus Statistics

The study of problems with equally likely outcomes is termed the study of probability. This is the realm of the mathematics of probability. In probability the sample space can be determined ahead of time using mathematical formulas. All measures are population parameters. The mathematics of probability determines the probabilities for coin tosses, dice, cards, lotteries, bingo, and other games of chance.

This course focuses not on probability but rather on statistics. In statistics, measurements are made on a sample taken from the population and are used to estimate the population's parameters. The sample space is usually not known and might not be knowable. Relative frequencies will be used to estimate population parameters.

Calculating Probabilities

Where each and every event is equally likely, the probability of an event occurring can be determined from

probability = ways to get the desired event/total possible events
or
probability = ways to get the particular outcome/total possible outcomes

Dice and Coins

A six-sided die. Six sides. Each side equally likely to appear. Six total possible outcomes. Only one way to roll a one: the side with a single pip must face up. 1 way to get a one/6 possible outcomes = 0.1667 or 17%

P(1) = 0.17

A penny.

P(head on a penny) = one way to get a head/two sides = 1/2 = 0.5 or 50%
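The formula above amounts to counting. As a minimal Python sketch, assuming the sample space is small enough to list outcome by outcome, the function below counts the desired outcomes and divides by the total. The name `classical_probability` is hypothetical, chosen for illustration; `fractions.Fraction` keeps the exact ratio (such as 1/6) until you choose to round.

```python
from fractions import Fraction

def classical_probability(sample_space, is_desired):
    """Ways to get the desired outcome / total possible outcomes.

    Assumes every outcome in sample_space is equally likely.
    """
    outcomes = list(sample_space)
    ways = sum(1 for outcome in outcomes if is_desired(outcome))
    return Fraction(ways, len(outcomes))

die = range(1, 7)          # a six-sided die: faces 1 through 6
penny = ["head", "tail"]   # a penny: two sides

p_one = classical_probability(die, lambda face: face == 1)
p_head = classical_probability(penny, lambda side: side == "head")

print(p_one, round(float(p_one), 2))   # 1/6 0.17
print(p_head, float(p_head))           # 1/2 0.5
```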

Two dice

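Ways to get a five on two dice: 1 + 4 = 5, 2 + 3 = 5, 3 + 2 = 5, 4 + 1 = 5 (each die is unique, so 2 + 3 and 3 + 2 count as different outcomes). Two dice have 6 × 6 = 36 total possible outcomes. Four ways to get a five/36 total possibilities = 4/36 = 0.11 or 11%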
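Listing every outcome is a reliable way to check a count like the one above. This sketch uses `itertools.product` to generate all 36 ordered (first die, second die) pairs; treating the dice as distinguishable is the same assumption as "each die is unique" in the text.

```python
from fractions import Fraction
from itertools import product

# All ordered pairs (first die, second die): 6 x 6 = 36 outcomes.
outcomes = list(product(range(1, 7), repeat=2))
ways_to_five = [pair for pair in outcomes if sum(pair) == 5]

print(len(outcomes))    # 36
print(ways_to_five)     # [(1, 4), (2, 3), (3, 2), (4, 1)]

p_five = Fraction(len(ways_to_five), len(outcomes))
print(p_five, round(float(p_five), 2))   # 1/9 0.11
```

`Fraction` reduces 4/36 to 1/9; both describe the same probability.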

Homework:

  1. What is the probability of rolling a three on...
    1. A four sided die?
    2. A six sided die?
    3. An eight sided die?
    4. A twelve sided die?
    5. A twenty sided die labeled 0-9 twice?
  2. What is the probability of throwing two pennies and having both come up heads?
Definition: Sample space

The set of all possible outcomes in an experiment.

Bear in mind that the following is an oversimplification of the complex biogenetics of achromatopsia for the sake of a statistics example. Achromatopsia is controlled by a pair of genes, one from the mother and one from the father. A child is born an achromat when the child inherits a recessive gene from both the mother and father.

A is the dominant gene
a is the recessive gene

A person with the combination AA is "double dominant" and has "normal" vision.
A person with the combination Aa is termed a carrier and has "normal" vision.
A person with the combination aa has achromatopsia.

Suppose two carriers, Aa, marry and have children. The sample space for this situation is as follows:

father \ mother    A     a
A                  AA    Aa
a                  Aa    aa

The above diagram of all four possible outcomes represents the sample space for this exercise. Note that each child has one and only one of these outcomes: the outcomes are mutually exclusive. The outcome for one child also does not affect the outcome for any other child: successive children are independent. Each outcome is as likely as any other individual outcome. All possible outcomes can be calculated; the sample space is completely known. Therefore the above involves probability and not statistics.

The probability of these two parents bearing a child with achromatopsia is:

P(achromat) = one way for the child to inherit aa/four possible combinations = 1/4 = 0.25 or 25%

This does NOT mean one in every four children will necessarily be an achromat. Suppose they have eight children. While it could turn out that exactly two children (25%) would have achromatopsia, other likely results are a single child with achromatopsia or three children with achromatopsia. Less likely, but possible, would be results of no achromat children or four achromat children. If we decide to work from actual results and build a frequency table, then we would be dealing with statistics.
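To see how much the number of achromat children can vary, the count for eight children can be modeled with the binomial distribution. The text does not name this model; it is an added assumption that each child independently has the same 1/4 chance of inheriting aa. A minimal sketch using only the standard library:

```python
from math import comb

n_children = 8
p_achromat = 0.25  # P(aa) for one child of two Aa carriers

# distribution[k] = probability of exactly k achromat children, assuming each
# child's genes are inherited independently with the same P(aa) = 0.25
distribution = {
    k: comb(n_children, k) * p_achromat**k * (1 - p_achromat) ** (n_children - k)
    for k in range(n_children + 1)
}

for k, prob in distribution.items():
    print(f"{k} achromat children: {prob:.3f}")
```

Under these assumptions exactly two achromat children (about 0.31) is the most likely single result, but one or three (about 0.27 and 0.21) are also common, and none (about 0.10) or four (about 0.09) are less likely but possible, matching the paragraph above.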

The probability of bearing a carrier is:

P(carrier) = two ways for the child to inherit Aa/four possible combinations = 2/4 = 0.50

Note that while each outcome is equally likely, there are TWO ways to get a carrier (A from the father and a from the mother, or a from the father and A from the mother), which results in a 50% probability of a child being a carrier.
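The Punnett square can also be generated by pairing each of the father's genes with each of the mother's. A minimal sketch; the function names `child_genotypes` and `p_genotype` are hypothetical, and the simplified one-gene model is the same one described above.

```python
from fractions import Fraction
from itertools import product

def child_genotypes(father, mother):
    """List the four equally likely gene combinations (one gene from each parent).

    Sorting puts the dominant "A" before the recessive "a", so a child who gets
    a from the father and A from the mother is reported as "Aa".
    """
    return ["".join(sorted(pair)) for pair in product(father, mother)]

def p_genotype(father, mother, genotype):
    """Probability that a child has the given genotype, e.g. "aa"."""
    combos = child_genotypes(father, mother)
    return Fraction(combos.count(genotype), len(combos))

# Two carriers (Aa x Aa), as in the Punnett square above.
print(child_genotypes("Aa", "Aa"))    # ['AA', 'Aa', 'Aa', 'aa']
print(p_genotype("Aa", "Aa", "aa"))   # 1/4  (achromat)
print(p_genotype("Aa", "Aa", "Aa"))   # 1/2  (carrier)
```

Changing the two parent arguments gives a quick way to check the desk exercise and homework matings.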

At your desk: mate an achromat aa father and carrier mother Aa.

  1. What is the probability a child will be born an achromat? P(achromat) = ________
  2. What is the probability a child will be born with "normal" vision? P("normal") = ______

Homework: Mate an AA father and an achromat aa mother.

  1. What is the probability a child will be born an achromat? P(achromat) = ________
  2. What is the probability a child will be born with "normal" vision? P("normal") = ______

See: http://www.achromat.org/ for more information on achromatopsia.

3. Relative Frequency

The third way to assign probabilities is from relative frequencies. Each relative frequency represents the probability of that event occurring for that sample space. Body fat percentage data was gathered from 59 females here at the College since summer 2001. The data had the following characteristics:

count: 59
mean: 28.7
sx (sample standard deviation): 7.1
min: 15.6
max: 50.1

A five-bin (class) frequency and relative frequency table has the following results:
BFI = Body Fat Index (percentage*100)
CLL = Class (bin) Lower Limit
CUL = Class (bin) Upper Limit (the bin value Excel uses)
Note that the bins are not of equal width.

Medical category             BFI fem CLL   BFI fem CUL, x   Frequency, f   Relative frequency, f/n or P(x)
Athletically fit*            12            20               3              0.05
Physically fit               20.1          24               15             0.25
Acceptable                   24.1          31               24             0.41
Borderline obese (overfat)   31.1          39               12             0.20
Medically obese              39.1          51               5              0.08
Sample size n                                               59             1.00

* body fat percentage category

This means there is a...

The most probable (most likely) result is a body fat measurement between 24.1 and 31, with a 41% probability that a female student falls in this interval.
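Each relative frequency in the table is f/n. A minimal Python sketch that recomputes the female relative-frequency column from the frequencies and picks out the most probable category; storing the table as a dictionary is an illustrative choice, not part of the original.

```python
# Female body fat frequencies from the table: category -> frequency f
female_counts = {
    "Athletically fit": 3,
    "Physically fit": 15,
    "Acceptable": 24,
    "Borderline obese (overfat)": 12,
    "Medically obese": 5,
}

n = sum(female_counts.values())  # sample size n = 59
relative_freq = {cat: f / n for cat, f in female_counts.items()}  # f/n = P(x)

for cat, p in relative_freq.items():
    print(f"{cat:28s} {p:.2f}")

# The category with the largest relative frequency is the most probable.
most_probable = max(relative_freq, key=relative_freq.get)
print("most probable:", most_probable)   # Acceptable
```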

The same table, but for male students:

Medical category             BFI male CUL, x   Frequency, f   Relative frequency, f/n or P(x)
Athletically fit*            13                9              0.18
Physically fit               17                11             0.22
Acceptable                   20                10             0.20
Borderline obese (overfat)   25                9              0.18
Medically obese              50                12             0.24
Sample size n                                  51             1.00

The male students have a higher probability of being medically obese (0.24) than the female students (0.08)!

Or

Probabilities can add. The probability that a female student is athletically fit, physically fit, acceptable, or borderline obese can be calculated by adding the probabilities:

P(female student is athletically fit OR physically fit OR acceptable OR borderline obese) = 0.05 + 0.25 + 0.41 + 0.20 = 0.91

Note that each student has one and only one body fat measurement, so the categories are mutually exclusive: no student can be in two categories at once. When outcomes are mutually exclusive, the probabilities add when the word OR is used.

P(A or B) = P(A) + P(B)
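The OR calculation above can be reproduced in a few lines. This sketch adds the published relative frequencies, which are rounded to two decimals, and for comparison adds the underlying frequencies from the table before dividing by n = 59.

```python
# Published (rounded) relative frequencies for female students
p = {
    "athletically fit": 0.05,
    "physically fit": 0.25,
    "acceptable": 0.41,
    "borderline obese": 0.20,
    "medically obese": 0.08,
}

# The categories are mutually exclusive, so P(A or B or ...) is a plain sum.
wanted = ["athletically fit", "physically fit", "acceptable", "borderline obese"]
p_or = sum(p[cat] for cat in wanted)
print(round(p_or, 2))         # 0.91

# Adding the raw frequencies first avoids rounding error: 54/59
p_or_exact = (3 + 15 + 24 + 12) / 59
print(round(p_or_exact, 3))   # 0.915
```

The small gap between 0.91 and 0.915 comes only from rounding each relative frequency before adding.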

And

For independent events, the probability that event A and event B will both occur is calculated by multiplying the individual probabilities: P(A and B) = P(A) × P(B). This rule does not apply in the above context, because the body fat categories are mutually exclusive rather than independent. A student cannot be athletically fit and medically obese at the same time, so P(athletically fit AND medically obese) = 0.
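The multiplication rule can be checked by brute force on a small example. The die-and-penny pairing below is an illustrative example not taken from the text; the sketch assumes the die and the penny are fair and do not influence each other.

```python
from fractions import Fraction
from itertools import product

# Assumed independent events: rolling a one on a die, tossing a head on a penny.
p_one = Fraction(1, 6)
p_head = Fraction(1, 2)
p_both_rule = p_one * p_head     # multiplication rule: P(A and B) = P(A) x P(B)

# Check by listing all 12 equally likely (die, penny) outcomes.
outcomes = list(product(range(1, 7), ["head", "tail"]))
p_both_count = Fraction(outcomes.count((1, "head")), len(outcomes))

print(p_both_rule, p_both_count)  # 1/12 1/12
```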

Complement of an Event (not compliment!)

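The complement of an event is the event that it does not occur. Since the probabilities of all possible outcomes add to one, the probability of the complement can be calculated from 1 - P(event). The complement is sometimes written P(NOT event). In the foregoing example we calculated P(NOT medically obese) = 0.91 by adding the other four categories. Calculating 1 - P(medically obese) = 1 - 0.08 = 0.92 gives the same answer apart from rounding, since both equal 54/59 = 0.915.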
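Working from the frequencies rather than the rounded relative frequencies, the complement and the OR sum agree exactly. A minimal sketch with exact fractions, using the female table's counts:

```python
from fractions import Fraction

n = 59
medically_obese = 5                       # female frequency from the table
p_not_obese = 1 - Fraction(medically_obese, n)

# The complement equals the sum of all the other categories (3 + 15 + 24 + 12).
assert p_not_obese == Fraction(3 + 15 + 24 + 12, n)
print(p_not_obese, round(float(p_not_obese), 3))   # 54/59 0.915
```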

Law of Large Numbers

For relative frequency probability calculations, as the sample size increases the relative frequencies tend to get closer and closer to the true population parameter (the actual probability for the population). Bigger samples tend to give more accurate estimates.
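The law of large numbers can be watched in a simulation. This sketch assumes a fair six-sided die and uses Python's `random` module with a fixed, arbitrary seed so the run is repeatable; the helper name `relative_frequency_of_one` is hypothetical. Exact printed values depend on the seed, so none are shown.

```python
import random

rng = random.Random(2001)  # fixed seed so the run is repeatable; the value is arbitrary

def relative_frequency_of_one(rolls):
    """Roll a simulated fair six-sided die `rolls` times; return the fraction of ones."""
    ones = sum(1 for _ in range(rolls) if rng.randint(1, 6) == 1)
    return ones / rolls

for rolls in (10, 100, 1_000, 10_000, 100_000):
    print(rolls, round(relative_frequency_of_one(rolls), 4))
# The true probability is 1/6 = 0.1667; the larger runs usually land closer to it.
```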

Non-mutually exclusive outcomes/dependent outcomes

Consider the following table of unofficial results from the summer 2000 senatorial election in Kitti and Madolehnihmw. Candidates from both Kitti and Madolehnihmw ran for office. One Kitti candidate was advised that he was spending too much time in Madolehnihmw, where he would not draw many votes. To what extent, if any, is this true? Can we determine the "loyalty" of the voters and decide whether campaigning outside one's home municipality matters?

Candidate residency   K       M       M       K        M        K        K       M       M
Candidate             DEdwa   BEtse   BHelg   OILawr   DGNeth   STSalv   HSeme   JThom   BWeit   Sums
Kitti voters          243     85      167     1003     185      173      902     14      59      2831
Mad voters            13      702     582     129      711      48       176     25      158     2544
Sums                  256     787     749     1132     896      221      1078    39      217     5375

From the above raw data we can construct a two-way table of results. This type of table is referred to as a pivot table or cross-tabulation.

                          Voter residency
Candidate residency       K (Kitti)   M (Mad)   Sums
W (Kitti candidates)      2321        366       2687
E (Mad candidates)        510         2178      2688
Sums                      2831        2544      5375
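Building the two-way table is a group-and-sum operation: each candidate's votes are added into the row for that candidate's home municipality. A minimal sketch with plain dictionaries (a spreadsheet pivot table does the same grouping); the data layout and names such as `raw` and `table` are illustrative choices.

```python
# Raw results: candidate -> (candidate residency, votes from Kitti residents,
# votes from Madolehnihmw residents)
raw = {
    "DEdwa": ("K", 243, 13),
    "BEtse": ("M", 85, 702),
    "BHelg": ("M", 167, 582),
    "OILawr": ("K", 1003, 129),
    "DGNeth": ("M", 185, 711),
    "STSalv": ("K", 173, 48),
    "HSeme": ("K", 902, 176),
    "JThom": ("M", 14, 25),
    "BWeit": ("M", 59, 158),
}

# Two-way table keyed by (candidate residency, voter residency).
# "W" = Kitti candidate, "E" = Madolehnihmw candidate, as in the pivot table.
label = {"K": "W", "M": "E"}
table = {(c, v): 0 for c in ("W", "E") for v in ("K", "M")}
for home, kitti_votes, mad_votes in raw.values():
    table[(label[home], "K")] += kitti_votes
    table[(label[home], "M")] += mad_votes

for cand in ("W", "E"):
    print(cand, table[(cand, "K")], table[(cand, "M")])
# W 2321 366
# E 510 2178
```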
Basic statistical probabilities from the above table

What percentage of voters reside in Kitti?
P(Residency of voter is Kitti K) = P(K) = 2831/5375 = 0.53 = 53%

What percentage of voters reside in Madolehnihmw?
P(Residency of voter is Madolehnihmw M) = P(M) = 2544/5375 = 0.47 = 47%

What percentage of all votes did Kitti candidates receive?
P(W) = 2687/5375 = 0.4999 = 49.99%

Try the following at your desk:

What percentage of all votes did Madolehnihmw candidates receive?
P(E) = 2688/5375 = 0.5001 = 50.01%

And

What percentage of the total vote is represented by Kitti residents voting for Kitti candidates?
For AND look at the INTERSECTION and use the number in the intersection.
P(K and W) = 2321/5375 = 0.43 = 43%

Find P(K and E), the percentage of the total vote represented by Kitti residents voting for Madolehnihmw candidates.
P(K and E) = 510/5375 = 0.09 = 9%

Try the following at your desk:

Find P(M and W), the percentage of the total vote represented by Madolehnihmw residents voting for Kitti candidates.
P(M and W) = 366/5375 = 0.07 = 7%
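All of the probabilities above come from dividing a count in the two-way table by the grand total 5375: a column sum for voter residency, a row sum for candidate residency, or a single intersection cell for AND. A minimal sketch; the helper names `p_voter`, `p_candidate`, and `p_and` are hypothetical.

```python
# Cell counts from the two-way table: (candidate residency, voter residency)
cells = {("W", "K"): 2321, ("W", "M"): 366, ("E", "K"): 510, ("E", "M"): 2178}
total = sum(cells.values())   # grand total, 5375

def p_voter(v):
    """Marginal probability of voter residency v: column sum / grand total."""
    return sum(n for (c, vv), n in cells.items() if vv == v) / total

def p_candidate(c):
    """Marginal probability of candidate residency c: row sum / grand total."""
    return sum(n for (cc, v), n in cells.items() if cc == c) / total

def p_and(c, v):
    """Joint probability: the single intersection cell / grand total."""
    return cells[(c, v)] / total

print(round(p_voter("K"), 2), round(p_voter("M"), 2))            # 0.53 0.47
print(round(p_candidate("W"), 4), round(p_candidate("E"), 4))    # 0.4999 0.5001
print(round(p_and("W", "K"), 2), round(p_and("E", "K"), 2))      # 0.43 0.09
```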

Or

Find P(K or W), the percentage of the total vote represented by all Kitti residents together with all voters who voted for a Kitti candidate, counting each vote only once. This one is easiest if done by looking at the table. The three cells that have to be added are 2321 + 510 + 366. This sum is then divided by the grand total, 5375.
(2321 + 510 + 366)/5375 = 0.59 = 59%

This can also be calculated from the following formula:
P(A or B) = P(A) + P(B) - P(A and B)

P(K or W) = P(K) + P(W) - P(K and W)
2831/5375 + 2687/5375 - 2321/5375 = 0.5267 + 0.4999 - 0.4318 = 0.59 = 59%
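The general addition rule and the cell-counting method can be checked against each other with exact fractions. A minimal sketch:

```python
from fractions import Fraction

total = 5375
p_k = Fraction(2831, total)        # P(K): voter lives in Kitti
p_w = Fraction(2687, total)        # P(W): vote went to a Kitti candidate
p_k_and_w = Fraction(2321, total)  # P(K and W): the intersection cell

# General addition rule: subtract the overlap so it is not counted twice.
p_k_or_w = p_k + p_w - p_k_and_w

# Same answer as adding the three table cells directly.
assert p_k_or_w == Fraction(2321 + 510 + 366, total)
print(round(float(p_k_or_w), 2))   # 0.59
```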

Try the following at your desk:

Find P(K or E), the percentage of the total vote represented by all Kitti residents and all voters who voted for a Madolehnihmw candidate.
(2321 + 510 + 2178)/5375 = 0.93

Conditional Probability

In conditional probability, a specified event is known to have already occurred, and that knowledge restricts the remaining probability calculations. Suppose I want to look only at how the Kitti residents voted, excluding consideration of the Madolehnihmw voters. I might be asking, "What percentage of Kitti residents (not of the whole vote) voted for Kitti candidates?" We write this in the following way:
P(W, given K) = 2321/2831 = 0.82 = 82%

Think of the above this way: put your hand over all the Madolehnihmw data and then run your calculations. "K" has occurred, so we can forget about the "M" column and the sums.
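Conditional probability can be computed either by restricting attention to the K column, as described above, or from the formula P(W given K) = P(K and W)/P(K), often written P(W | K) in textbooks. This sketch, using exact fractions, confirms the two agree:

```python
from fractions import Fraction

total = 5375
kitti_voters = 2831      # the K column sum
kitti_for_kitti = 2321   # the K and W cell

# "Hand over the Madolehnihmw data": divide by the K column total only.
p_w_given_k = Fraction(kitti_for_kitti, kitti_voters)

# Equivalent formula: P(W given K) = P(K and W) / P(K)
p_formula = Fraction(kitti_for_kitti, total) / Fraction(kitti_voters, total)

assert p_w_given_k == p_formula
print(round(float(p_w_given_k), 2))   # 0.82
```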

The 82 percent represents, for lack of a better term, a "Kitti loyalty factor." In Kitti, 82 out of 100 residents will vote for the home municipality candidate, or about 4 out of 5 people.

Try this at your desk:

Find the "Madolehnihmw loyalty factor" P(E, given M):
2178/2544 = 0.86

That is, 86 out of 100 residents of Madolehnihmw will vote for the home municipality candidate.

"Cross-over" voting

Find the percentage of Kitti voters who voted "Madolehnihmw" as a percentage of all Kitti voters:
P(E, given K) = 510/2831 = 0.18 = 18%

Call this the "Kitti cross-over factor." 18% of Kitti residents will tend to cross over and vote outside their municipality.

Find the percentage of Madolehnihmw voters who voted "Kitti" as a percentage of all Madolehnihmw voters:
P(W, given M) = 366/2544 = 0.14 = 14%
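The loyalty and cross-over factors are the four conditional probabilities in the two-way table, each computed within one voter-residency column. A minimal sketch; within each column the loyalty and cross-over factors add to 1, since every vote went to either a Kitti or a Madolehnihmw candidate.

```python
# Cell counts from the two-way table: (candidate residency, voter residency)
cells = {("W", "K"): 2321, ("W", "M"): 366, ("E", "K"): 510, ("E", "M"): 2178}

# conditional[(cand, voter)] = P(cand, given voter): divide each cell by its
# voter-residency column total, ignoring the other column entirely.
conditional = {}
for voter in ("K", "M"):
    column_total = cells[("W", voter)] + cells[("E", voter)]
    for cand in ("W", "E"):
        conditional[(cand, voter)] = cells[(cand, voter)] / column_total

for (cand, voter), p in conditional.items():
    print(f"P({cand}, given {voter}) = {p:.2f}")
```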

A campaign statistician for a Kitti candidate might follow this line of reasoning: only one in seven (~14%) Madolehnihmw residents is likely to vote Kitti. In some sense, an argument could be made for a Kitti candidate not spending more than one in seven days campaigning in Madolehnihmw.

On the other hand, one in every five Kitti residents is likely to vote Madolehnihmw. A campaign statistician for a Madolehnihmw candidate might reasonably recommend spending one in every five days over in Kitti to capitalize on the cross-over effect.

Favorite Meat/Favorite Sport Fish Chicken Dog Sums
Volleyball FFF F 4
Basketball MM M MM 5
Baseball MM M 3
Hockey M 1
American Football F 1
Pool M 1
Swimming M 1
Sums: 12 2 4 18

Homework: Section 4.2 Page 198 #23

Statistics home Lee Ling home COM-FSM home page