
Hypothesis Testing and Probability
9/25/01
I will quickly go over the lab from Thursday, because it will allow us to get at at least two of these topics.
In that lab we created sets of data that were randomly sampled from a single
population with a mean of 34.93 and a standard deviation of 4.55. In other
words, the null hypothesis (equal means for both conditions) was true, and we
are looking at what to expect under those conditions.
- Review the Liddle study
- The distribution below is the distribution of 100 randomly drawn samples
from the Disclose group.
- These are from this year

- Then comes the distribution for the NonDisclose group.

- Notice that the means differ only very slightly between the two
conditions--34.82 - 34.94 = -0.12,
where the standard deviation of the means is approximately .85.
- Finally, the one we care most about is the distribution of differences
between means. That follows:
- These data are combined across three years.

- The normal distribution has been superimposed just for clarity, although it does distort how we see figures.
- If the effect of disclosure is null (i.e. if the null hypothesis that µ1
- µ2 = 0 is true), we would expect that the Disclose group would differ from the
NonDisclose group only as a result of random error.
- You can see that if we repeated this experiment 290 times, about half of the time the Disclose
group would have a higher mean than the NonDisclose group, and about half the time it would have a lower mean.
The average difference between the means is 0.10, which is very small. There certainly does not appear to be a case of
Disclosure leading to lower means than the Non-Disclosure condition. The mean of this distribution is not 0.00, but it is very close to it.
- In the text I give (but not until Chapter 7) the standard deviation
of the sum of two (independent) variables as the square root of the sum of
their separate variances.
- Going to this year's data
- Remember that we only have 100 samples, not an infinite number of
them.
- Sqrt (.82^2 + .91^2) = 1.22, which isn't all that far from the 1.19
that you see if you look at the distribution of differences over all
three years.
- The mean of the differences (-.12) is equal to the difference of the
means (-.10) when you ignore the fact that I gave differences on 3
years' data instead of 1.
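The square-root-of-summed-variances rule above can be checked numerically. The sketch below plugs in the two standard deviations quoted for this year's data (.82 and .91) and then, purely as an illustration, simulates independent sample means with those standard deviations. The seed, the number of simulated pairs, and the normal shape are assumptions made for the example, not part of the lab.

```python
import math
import random
import statistics

# Standard deviations of the sample means, taken from the notes.
SD_DISCLOSE = 0.82
SD_NONDISCLOSE = 0.91

# For independent variables: sd of the difference = sqrt(sd1^2 + sd2^2).
sd_diff_theory = math.sqrt(SD_DISCLOSE**2 + SD_NONDISCLOSE**2)

# Illustrative simulation (assumed: normally distributed means around 34.93).
rng = random.Random(1)
diffs = [
    rng.gauss(34.93, SD_DISCLOSE) - rng.gauss(34.93, SD_NONDISCLOSE)
    for _ in range(100_000)
]
sd_diff_simulated = statistics.stdev(diffs)

print(f"theoretical sd of differences: {sd_diff_theory:.2f}")
print(f"simulated sd of differences:   {sd_diff_simulated:.2f}")
```

The same rule applies to sums and to differences of independent variables, which is why it can be used for the distribution of mean differences here.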
- Liddle found a mean of 35.08 for the Disclose condition and 34.78 for the
Not Disclose condition, for a difference of 0.30. Notice that this is a
difference that will occur very frequently when the null hypothesis is true.
(It is very near the center of the distribution of expected mean differences
above.)
Plotting the t distribution
- I had everyone calculate the t test for their 10
replications.
- t is a test on the difference between two means, and t
will depart far from 0 if the null hypothesis is false, and stay around 0
if the null is true.
- I plotted the frequency distribution of the obtained values of t.
Because I believe that the null is true, I expect to see these t's
clustered symmetrically about 0.
- In fact, with this many cases the t distribution should
appear to our eyes very much like a normal distribution.
- This is plotted below for data combined over two years (one year
I did not record t values).
- The distribution should not be negatively skewed--I was just
unlucky.
- Liddle's t would have been 0.26. You can see that this is a
common value of t to find when the null is true.

I happen to know that when the null is true, only 5% of the time would we
get a t that exceeds about 2.005. 5% of 190 = 9.5. You can see that in fact, we got
7 values at least that large. Pretty close.
The fact that I expect about 9.5 out of 190 means that I will make a Type I
error 5% of the time. We actually only made 7 Type I errors.
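The count of Type I errors can be imitated with a small simulation. This is only a sketch under stated assumptions: both groups are drawn from one normal population with the mean and standard deviation given at the top of these notes, the group size of 28 is hypothetical (chosen so that df = 54, where the two-tailed .05 critical value of t is roughly 2.005), and the cutoff is applied to the absolute value of t.

```python
import math
import random
import statistics


def pooled_t(a, b):
    """Independent-samples t using a pooled variance estimate."""
    n1, n2 = len(a), len(b)
    pooled_var = (
        (n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)
    ) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (statistics.mean(a) - statistics.mean(b)) / se


def count_type_i_errors(n_reps=190, n_per_group=28, critical=2.005, seed=2001):
    """Sample both groups from the same population; count |t| > critical."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_reps):
        g1 = [rng.gauss(34.93, 4.55) for _ in range(n_per_group)]
        g2 = [rng.gauss(34.93, 4.55) for _ in range(n_per_group)]
        if abs(pooled_t(g1, g2)) > critical:
            errors += 1
    return errors


expected = 0.05 * 190          # 9.5 rejections expected when the null is true
observed = count_type_i_errors()
print(f"expected Type I errors:  {expected}")
print(f"simulated Type I errors: {observed}")
```

Changing the seed gives a different count each time, scattered around 9.5, which is the same kind of variation that produced 7 rather than 9 or 10 in the class data.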
Students might reasonably expect that the values of t would be
linearly related to the values of the difference in the means.
The relationship will not be perfect because each sample has a different
standard deviation.
I plotted this below just to give a sense of what happens.

Probability Theory
Go quickly over most of this.
Basics
Frequentistic (also known as Relative Frequency)
This is the one we will make most of.
Analytic
Subjective
Events
The basic unit or outcome
Flipping a coin: event = head or tail
Counting homeless: event = found 58 people without homes
Example from Achenbach
Events: Score > 63. (I use 63 for an example because this is often used to
define a behavior problem kid.)
If 5% of population are problem kids, what is probability that out of the next 100
people we count, 10% would turn out to be problem kids?
The event would be the 10% problem kid result.
Point out that random sampling would be crucial here.
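The Achenbach question can be worked out with the binomial formula. The sketch below assumes that the 100 children are an independent random sample (the point about random sampling) and reads "10% would turn out to be problem kids" as exactly 10 of the 100. The function name is just for illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k 'successes' in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Values from the notes: 5% base rate, next 100 people, 10 problem kids.
p_exactly_10 = binomial_pmf(10, 100, 0.05)
print(f"p(exactly 10 of 100 | p = .05) = {p_exactly_10:.4f}")
```

If the sample were not random (say, all referrals from one clinic), the .05 base rate would not apply and this calculation would not mean much.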
Independent events
Two events are said to be independent if p(event #2) does not
vary as a function of event #1
Ask them for an example from research they know.
We often assume that observations (and hence errors) are independent, and that is basic to much of what we will do.
Mutually exclusive
Two events are mutually exclusive if the occurrence of one
precludes the occurrence of the other
Coin can't land both Head and Tail
Ask them for an example for Psychology.
Exhaustive
A set of events is exhaustive if no other outcome is possible.
On a Likert scale, the values Strongly Disagree, Disagree, Neutral, Agree, and Strongly
Agree are mutually exclusive and exhaustive.
Laws of probability
Additive law
The prob. of the occurrence of one of several mutually exclusive events = sum of
their individual probabilities.
Prob. of 8 or more heads out of 10 = p(8) + p(9) + p(10)
For HIV example, we have four possible conditions:
Healthy and test neg, healthy and test pos., sick and test neg, sick and test pos.
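The coin example of the additive law can be computed directly. The sketch assumes a fair coin (p = .5), which the notes do not state explicitly. Because 8, 9, and 10 heads are mutually exclusive outcomes, their probabilities are simply added.

```python
from math import comb


def p_heads(k, n=10, p=0.5):
    """Probability of exactly k heads in n flips (fair coin assumed)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Additive law: mutually exclusive outcomes, so add the probabilities.
p_8_or_more = p_heads(8) + p_heads(9) + p_heads(10)
print(f"p(8 or more heads out of 10) = {p_8_or_more:.4f}")  # 0.0547
```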
Multiplicative law
The probability of the joint occurrence of two independent events =
product of their separate probabilities
p(X,Y) = p(X)*p(Y)
Use the AIDS study to show the calculation of joint probabilities. But
we can't, because testing positive and having AIDS are not independent
events.
This is an example of a hypothesis test. We say, "If such and such is true, then
we expect ..." We then look to see if that is what we actually got.
We will see a more formal example of this test later using Chi-square.
Joint and Conditional probability
Joint probability. What is the probability that Event A and event B
will occur?
What is the probability that you will engage in unsafe sex and that your partner will
have AIDS?
Conditional probability. What is the probability of A occurring given that event
B has occurred?
What is the probability that you will engage in unsafe sex given that your
partner has AIDS?
We hope that this probability is very small.
Examples
Two years ago in another class I talked about AIDS and AIDS Screening.
The following is cut from that class's notes. It lays out part
of the problem in a slightly different way than I did in class.
Assume that 0.2 percent of the population of college-age students is HIV positive.
(This is apparently a reasonable estimate based on knowledgeable sources.)
If we were to sample 100,000 college-age students, how many would we expect truly
are HIV+ and how many are HIV-?
100,000 * .002 = 200 who are HIV+.
100,000 - 200 = 99,800 who are HIV-
The Elisa test (which is a common test for HIV) is reputed to have a sensitivity
of .998, which means that 99.8% of people who really are HIV positive will be identified as HIV+. How many would we expect
to correctly identify as HIV+ in our sample?
There are 200 who are HIV+. We would identify 99.8% = 200 * .998 = 199.6 of them.
(Ignore the fact that we have fractional people--this is just a statistical expectation.)
The Elisa test is reputed to also have a specificity of .998, meaning
that 99.8% of well people will be identified as well. And if 99.8% are classed as well,
100% - 99.8% = .2% are identified (incorrectly) as sick.
From our sample, how many people will we correctly and incorrectly identify as HIV- and HIV+?
There are 99,800 who are HIV-. We will correctly identify 99,800 * .998 = 99,600.4 of
these as HIV-, and 99,800 - 99,600.4 = 199.6 as (incorrectly) HIV+.
So, we will correctly identify 199.6 as HIV+ and incorrectly identify 199.6 as HIV+. So
half of the people we identify as HIV+ are really not sick.
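The arithmetic above can be collected into one small function. This is a sketch using the figures from the notes (100,000 students, prevalence .002, sensitivity and specificity .998). The function and key names are made up for the example.

```python
def screening_counts(n, prevalence, sensitivity, specificity):
    """Expected frequencies for a screening test applied to n people."""
    infected = n * prevalence
    healthy = n - infected
    true_pos = infected * sensitivity    # infected, test positive
    false_neg = infected - true_pos      # infected, test negative
    true_neg = healthy * specificity     # healthy, test negative
    false_pos = healthy - true_neg       # healthy, test positive
    return {
        "true_pos": true_pos,
        "false_neg": false_neg,
        "false_pos": false_pos,
        "true_neg": true_neg,
    }


counts = screening_counts(100_000, 0.002, 0.998, 0.998)
for name, value in counts.items():
    print(f"{name:>9}: {value:,.1f}")
```

The fractional people are statistical expectations, not head counts.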
I can enter these data into a table--first in terms of raw frequencies, and then in terms of proportions, which translate to probabilities.
|         | Positive | Negative | Total   |
| AIDS    | 199.6    | .4       | 200     |
| No AIDS | 199.6    | 99,600.4 | 99,800  |
| Total   | 399.2    | 99,600.8 | 100,000 |
Now I can convert these to probabilities. Remember, these are joint
probabilities. The probability given in cell 2,1 is the probability of not having AIDS AND scoring positive. So row 1 total is the
probability of AIDS.
|         | Positive | Negative | Total |
| AIDS    | .002     | .00001*  | .002  |
| No AIDS | .002     | .996     | .998  |
| Total   | .004     | .996     | 1.0   |
* I just stuck this 1 on the end to show that it is greater than 0.00. I have obviously
rounded.
The additive law is illustrated by calculating the probability that someone has AIDS,
regardless of their test results. This is .002 + .00001 = .002+, which is where we
started.
The probability that someone tests positive is found the same way, by adding joint probabilities. It is the
probability of positive and AIDS plus the probability of positive and no AIDS = .002 + .002 = .004.
But what is the conditional probability of having
AIDS given that you test positive for AIDS? This is the number of people who test positive
and have AIDS divided by the number of people who test positive,
p(AIDS | positive) = 199.6/399.2 = .50
p(NoAIDS | positive) = 199.6/399.2 = .50.
So, if you test positive for AIDS the probability is still only .50 that you have it.
And yet, the Elisa test has a sensitivity and specificity of .998, which is remarkably
high.
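The joint, marginal, and conditional probabilities in this example can all be recovered from the four cell frequencies. The sketch below uses the expected frequencies from the notes. The dictionary layout is only one convenient way to hold a 2 x 2 table.

```python
# Expected frequencies from the notes (rows: true status, columns: test result).
counts = {
    ("AIDS", "Positive"): 199.6,
    ("AIDS", "Negative"): 0.4,
    ("No AIDS", "Positive"): 199.6,
    ("No AIDS", "Negative"): 99_600.4,
}
total = sum(counts.values())

# Joint probabilities: each cell divided by the grand total.
joint = {cell: n / total for cell, n in counts.items()}

# Additive law: marginal probabilities are sums of joint probabilities.
p_aids = joint[("AIDS", "Positive")] + joint[("AIDS", "Negative")]
p_positive = joint[("AIDS", "Positive")] + joint[("No AIDS", "Positive")]

# Conditional probability: joint probability divided by the marginal
# probability of the condition we are given (a positive test).
p_aids_given_positive = joint[("AIDS", "Positive")] / p_positive

print(f"p(AIDS)            = {p_aids:.4f}")
print(f"p(positive)        = {p_positive:.4f}")
print(f"p(AIDS | positive) = {p_aids_given_positive:.2f}")
```

Note that p(AIDS and positive) is about .002, while p(AIDS) * p(positive) is far smaller, which is another way of seeing that the two events are not independent.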
I do not wish to leave the impression that the Elisa test is useless because half the
people it calls HIV+ really aren't. We still do hugely better than we would without the
test. The point I want to make is that if we are thinking of introducing some blanket
testing procedure for a whole group of people, we need to think about the fact that the test
will misidentify a lot of people. And if it does, we need to take into account the harm
that can fall to those who are misidentified, as well as the good that will fall to those
who are correctly identified.
This example works out so cleanly, in part, because of the very high probability that a
person chosen at random will not have AIDS. There are so many people who do not have AIDS
that even a very specific test is still going to make a lot of mistakes.
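How strongly the result depends on the base rate can be seen by recomputing p(HIV+ | positive test) for different prevalences. In the sketch below only the .002 prevalence comes from the notes. The other two values (.02 and .2) are hypothetical, chosen just to show the trend.

```python
def p_infected_given_positive(prevalence, sensitivity=0.998, specificity=0.998):
    """Bayes' theorem: p(infected | positive test)."""
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


# .002 is the prevalence used in the notes; .02 and .2 are made-up comparisons.
for prevalence in (0.002, 0.02, 0.2):
    posterior = p_infected_given_positive(prevalence)
    print(f"prevalence {prevalence:<5}: p(infected | positive) = {posterior:.3f}")
```

With the same test, the proportion of positives that are correct rises as the condition becomes more common in the group being tested.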
The Binomial Distribution
I don't like to talk much about the binomial because it puts people to sleep. It
represents all they ever thought of about probability--that it is boring. BUT,
I'll take a chance.
Ooops, I may run out of time this year, so skip to the mean and
variance, if there is even time for that.
Assume that you have an event that can come out one of two ways. (e.g. a person can
be classified as having AIDS or not having AIDS, being depressed or not being depressed,
etc.), and assume that there is a probability associated with each event. (These two
probabilities would sum to 1.00) Further assume that these individual probabilities are not very
close to 0.00 or 1.00.
Let's assume that p(depressed) = .10 and p(not depressed) = .90 in the
general population. (I don't know what the actual probabilities are, but those seem
reasonable.)
Let's assume that we sample 20 women who have been the victims of spousal abuse,
and find that 8 of them are classed as clinically depressed. Does this number depart far
enough from what we would expect by chance from a random sample of women that we can
conclude that abused women are at increased risk for depression?
We need to calculate the probability of 8 out of 20 women being
depressed if the probability of depression is .10.
p(8) = [20!/(8! 12!)] (.10)^8 (.90)^12 = .00036

To answer our question, we need not only this probability, but also the probability of
9/20, 10/20, etc. (Mention the additive law.) This would be a real pain to calculate, but
it can be done. I know that the answer to this question is only about .0004 (because I
looked it up in a table). Thus we can conclude that the probability of getting at least 8 depressed women out
of a sample of 20 randomly selected women by chance is only about .0004. We would not expect
this to happen if these women were a random sample from a population of normal women.
Therefore we can conclude that abused women are at greater than average risk for
depression.
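The tail probability does not have to be looked up. It can be summed directly with the additive law. The sketch below assumes the 20 women are independent observations with p = .10 each, which is what the null hypothesis says. The function name is made up for the example.

```python
from math import comb


def binomial_tail(k_min, n, p):
    """p(X >= k_min) for a binomial with n trials and success probability p."""
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1)
    )


# p(8), p(9), ..., p(20) added together; roughly 0.0004.
p_8_or_more = binomial_tail(8, 20, 0.10)
print(f"p(8 or more depressed out of 20 | p = .10) = {p_8_or_more:.5f}")
```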
Alternative approach
I may not go over this in class, but I'll include it here for completeness.
The following approach technically is not appropriate for this problem, but we'll
use it anyway as an illustration.
We know for a fact the mean and standard deviation of a binomial distribution.
mean = Np
st. dev. = sqrt(Npq)
and, when Np and Nq are both greater than 5, the distribution is nearly
normal.
In our case Np = 20(.1) = 2, which is less than 5, so the distribution is not
particularly normal. But we'll plunge ahead anyway.
If we know that a distribution (of outcomes) is normal, and we know its mean and
standard deviation, we can calculate z and then the area under the curve beyond z.
Here we have
z = (X - Np) / sqrt(Npq) = (8 - 2) / sqrt(20(.1)(.9)) = 6 / 1.34 = 4.47

To find the probability of an observation at least as extreme as z = 4.47 we could use the table of
the normal distribution. My table peters out before it gets that high, but we can see that
the probability is certainly less than .0001. (I'm using a one-tailed test here
to keep in line with the actual calculations for the binomial.)
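The normal approximation can be carried out without a printed table. The sketch below computes the mean and standard deviation from Np and sqrt(Npq), converts 8 to z, and gets the one-tailed area from the complementary error function. It makes no continuity correction, on the assumption that the hand calculation does not use one either.

```python
import math

N = 20       # sample size
p = 0.10     # probability of depression under the null
q = 1 - p
X = 8        # observed number of depressed women

mean = N * p                  # Np
sd = math.sqrt(N * p * q)     # sqrt(Npq)
z = (X - mean) / sd

# One-tailed area beyond z under the standard normal curve.
upper_tail = 0.5 * math.erfc(z / math.sqrt(2))

print(f"mean = {mean:.2f}, sd = {sd:.2f}, z = {z:.2f}")
print(f"area beyond z = {upper_tail:.7f}")
```

Because Np is only 2 here, the approximation should not be expected to agree closely with the exact binomial answer. It is included only to illustrate the method.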
Last Final Revision: 09/25/01