Hypothesis Testing and Probability

9/25/01

I will quickly go over the lab from Thursday, because it will allow us to cover at least two of these topics. In that lab we created sets of data that were randomly sampled from a single population with a mean of 34.93 and a standard deviation of 4.55. In other words, the null hypothesis (equal means for both conditions) was true, and we are looking at what to expect under those conditions.

  • Review the Liddle study
    • The distribution below is the distribution of 100 randomly drawn samples from the Disclose group.
    • These are from this year

 


  • Then comes the distribution for the NonDisclose group.

 

  • Notice that the means differ only very slightly between the two conditions: 34.82 - 34.94 = -0.12, whereas the standard deviation of the means is approximately .85.
  • Finally, the one we care most about is the distribution of differences between means. That follows:
    • These data are combined across three years.

  • The normal distribution has been superimposed just for clarity, although it does distort how we see figures.
  • If the effect of disclosure is null (i.e., if the null hypothesis that µ1 = µ2, or equivalently µ1 - µ2 = 0, is true), we would expect the Disclose group to differ from the NonDisclose group only as a result of random error.
  • You can see that if we repeated this experiment 290 times, about half of the time the Disclose group would have a higher mean than the NonDisclose group, and about half the time it would have a lower one. The average difference between the means is 0.10, which is very small. There certainly does not appear to be a case of Disclosure leading to lower means than the Non-Disclosure condition. The mean of this distribution is not 0.00, but it is very close to it.
  • In the text I give (but not until Chapter 7) the standard deviation of the sum (or difference) of two independent variables as the square root of the sum of their separate variances.
    • Using this year's data:
    • Remember that we only have 100 samples, not an infinite number of them.
    • Sqrt (.82^2 + .91^2) = 1.22, which isn't all that far from the 1.19 that you see if you look at the distribution of differences over all three years.
    • The difference between this year's means (-.12) is essentially equal to the mean of the differences (-.10); the small discrepancy arises because I gave the differences for 3 years' data instead of 1.
  • Liddle found a mean of 35.08 for the Disclose condition and 34.78 for the Not Disclose condition, for a difference of 0.30. Notice that this is a difference that will occur very frequently when the null hypothesis is true. (It is very near the center of the distribution of expected mean differences above.)
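To make the two claims above concrete (the SD of a difference is the square root of the summed variances, and the differences center on 0 when the null is true), here is a minimal simulation sketch. The population mean and SD come from the lab; the per-group sample size of 28 is a hypothetical assumption (the notes do not give it), picked because 4.55/sqrt(28) is about .86, close to the observed SDs of the means (.82 and .91).

```python
import math
import random
import statistics

# Observed SDs of the 100 sample means in each group (this year's lab data).
SD_DISCLOSE = 0.82
SD_NONDISCLOSE = 0.91

# For independent variables, the SD of their difference is the square root
# of the sum of their variances.
sd_diff = math.sqrt(SD_DISCLOSE**2 + SD_NONDISCLOSE**2)

# Population values from the lab. ASSUMPTION: n = 28 per group is a
# hypothetical sample size; the notes do not state it.
POP_MEAN, POP_SD, N_PER_GROUP = 34.93, 4.55, 28


def simulate_mean_differences(reps, seed=1):
    """Draw both groups from the same population (null true) and return
    the difference between the two group means for each replication."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(reps):
        disclose = [rng.gauss(POP_MEAN, POP_SD) for _ in range(N_PER_GROUP)]
        nondisclose = [rng.gauss(POP_MEAN, POP_SD) for _ in range(N_PER_GROUP)]
        diffs.append(statistics.fmean(disclose) - statistics.fmean(nondisclose))
    return diffs


diffs = simulate_mean_differences(10_000)
# Theoretical SD of the difference under these assumed values: POP_SD * sqrt(2 / n).
theoretical_sd = POP_SD * math.sqrt(2 / N_PER_GROUP)

print(f"sqrt(.82^2 + .91^2) = {sd_diff:.2f}")
print(f"simulated mean of differences = {statistics.fmean(diffs):.3f}")
print(f"simulated SD = {statistics.stdev(diffs):.3f}, theoretical = {theoretical_sd:.3f}")
```

With many replications the simulated mean of the differences sits near 0 and its SD near the theoretical value, which is the pattern the class data approximate with only 100 samples per year.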

Plotting the t distribution

  • I had everyone calculate the t test for their 10 replications. 
  • t is a test on the difference between two means, and t will depart far from 0 if the null hypothesis is false, and stay around 0 if the null is true. 
  • I plotted the frequency distribution of the obtained values of t. Because I believe that the null is true, I expect to see these t's clustered symmetrically about 0. 
    • In fact, with this many cases the t distribution should appear to our eyes very much like a normal distribution.
  • This is plotted below for data combined over two years (one year I did not record t values).
    • The distribution should not be negatively skewed--I was just unlucky.
  • Liddle's t would have been 0.26. You can see that this is a common value of t to find when the null is true.

I happen to know that when the null is true, only 5% of the time would we get a t whose absolute value exceeds about 2.005. 5% of 190 = 9.5. You can see that, in fact, we got 7 values at least that large. Pretty close.

Because we reject the null whenever t exceeds that cutoff, we will make a Type I error 5% of the time when the null is true, so we expected about 9.5 Type I errors out of 190 tests. We actually made only 7.
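The claim that a cutoff of 2.005 produces Type I errors about 5% of the time can be checked by simulation. This is a sketch under an assumption: 28 cases per group is hypothetical, chosen only because it gives df = 54, the df for which 2.005 is the two-tailed .05 critical value. The t is the ordinary pooled-variance t for two independent samples.

```python
import math
import random
import statistics

POP_MEAN, POP_SD = 34.93, 4.55
N = 28          # ASSUMPTION: hypothetical per-group size (df = 2*28 - 2 = 54)
T_CRIT = 2.005  # two-tailed .05 cutoff quoted in the notes


def pooled_t(x, y):
    """Independent-samples t using the pooled variance estimate."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(
        pooled_var * (1 / nx + 1 / ny))


def type_i_error_rate(reps, seed=2):
    """Proportion of replications with |t| > T_CRIT when the null is true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(POP_MEAN, POP_SD) for _ in range(N)]
        y = [rng.gauss(POP_MEAN, POP_SD) for _ in range(N)]
        if abs(pooled_t(x, y)) > T_CRIT:
            rejections += 1
    return rejections / reps


rate = type_i_error_rate(10_000)
expected_errors = 0.05 * 190  # the class total of 190 t values
print(f"simulated Type I error rate = {rate:.3f}")
print(f"expected Type I errors among 190 tests = {expected_errors}")
```

The simulated rate should land close to .05; with only 190 class replications, getting 7 instead of 9.5 is well within ordinary sampling variation.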

 

Students might reasonably expect that the values of t would be linearly related to the values of the difference in the means.

The relationship will not be perfect because each sample has a different standard deviation.

I plotted this below just to give a sense of what happens.
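As a rough numerical stand-in for that plot, the sketch below simulates null-true replications and correlates the mean difference with t. It again assumes a hypothetical 28 cases per group. The correlation comes out very high but below 1, because each t divides by that replication's own pooled standard deviation.

```python
import math
import random
import statistics

POP_MEAN, POP_SD, N = 34.93, 4.55, 28  # ASSUMPTION: hypothetical n per group


def pooled_t(x, y):
    """Independent-samples t using the pooled variance estimate."""
    nx, ny = len(x), len(y)
    pooled_var = ((nx - 1) * statistics.variance(x)
                  + (ny - 1) * statistics.variance(y)) / (nx + ny - 2)
    return (statistics.fmean(x) - statistics.fmean(y)) / math.sqrt(
        pooled_var * (1 / nx + 1 / ny))


def pearson_r(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)


rng = random.Random(3)
mean_diffs, t_values = [], []
for _ in range(2_000):
    x = [rng.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    y = [rng.gauss(POP_MEAN, POP_SD) for _ in range(N)]
    mean_diffs.append(statistics.fmean(x) - statistics.fmean(y))
    t_values.append(pooled_t(x, y))

r = pearson_r(mean_diffs, t_values)
print(f"r(mean difference, t) = {r:.3f}")
```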

 

 

Probability Theory

Go quickly over most of this.

Basics

Frequentist (also known as Relative Frequency)

This is the one we will make the most use of.

Analytic

Subjective

Events
The basic unit or outcome
Flipping a coin — event = “head” or “tail” 
Counting homeless — event = “found 58 people without homes” Example from Achenbach
Event: Score > 63. (I use 63 as an example because this cutoff is often used to define a behavior-problem kid.)
If 5% of the population are problem kids, what is the probability that, out of the next 100 people we count, 10% would turn out to be problem kids? The event would be the 10% problem-kid result.
Point out that random sampling would be crucial here.
Independent events
Two events are said to be independent if p(event #2) does not vary as a function of whether event #1 occurs. Ask them for an example from research they know. We often assume that observations (and hence errors) are independent, and that assumption is basic to much of what we will do.
Mutually exclusive
Two events are mutually exclusive if the occurrence of one precludes the occurrence of the other.
Coin can’t land both Head and Tail
Ask them for an example from psychology.
Exhaustive
A set of events is exhaustive if no other outcome is possible.
On a Likert scale, the values Strongly Disagree, Disagree, Neutral, Agree, and Strongly Agree are mutually exclusive and exhaustive.
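The problem-kid question in the Events list is a binomial calculation. Here is a minimal sketch, assuming random sampling and independent observations (the point about random sampling is exactly why that note is there). Because the notes do not say whether "the 10% result" means exactly 10 kids or 10 or more, both readings are computed.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of exactly k 'successes' in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 100, 0.05  # 100 randomly sampled kids, 5% problem kids in the population
p_exactly_10 = binom_pmf(10, n, p)
# Outcomes 10, 11, ..., 100 are mutually exclusive, so their probabilities add.
p_10_or_more = sum(binom_pmf(k, n, p) for k in range(10, n + 1))

print(f"p(exactly 10) = {p_exactly_10:.4f}")
print(f"p(10 or more) = {p_10_or_more:.4f}")
```

Both probabilities are small (roughly .02 to .03), so finding 10 problem kids among 100 would be surprising if the 5% figure applied to the group sampled.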

Laws of probability

Additive law

The prob. of the occurrence of one of several mutually exclusive events = sum of their individual probabilities.

Prob. of 8 or more heads out of 10 = p(8) + p(9) + p(10)
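A short sketch of the additive law for this coin example, assuming a fair coin (the notes do not state p = .5). Exact fractions keep the arithmetic transparent.

```python
from fractions import Fraction
from math import comb

# Each possible number of heads is a mutually exclusive event, so
# p(8 or more) = p(8) + p(9) + p(10). With a fair coin each of the
# 2**10 sequences has probability 1/1024.
p_8_or_more = sum(Fraction(comb(10, k), 2**10) for k in (8, 9, 10))
print(p_8_or_more, float(p_8_or_more))  # 7/128 0.0546875
```

That is (45 + 10 + 1)/1024 = 56/1024.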

For the HIV example, we have four possible conditions:

Healthy and test neg, healthy and test pos., sick and test neg, sick and test pos.

Multiplicative law

The probability of the joint occurrence of two independent events = product of their separate probabilities

p(X,Y) = p(X)*p(Y)

Use the AIDS study to show the calculation of joint probabilities. But we can't simply multiply p(AIDS) by p(positive), because testing positive and having AIDS are not independent events.

This is an example of an hypothesis test. We say “If such and such is true, then we expect ...” We then look to see if that is what we actually got.

We will see a more formal example of this test later using Chi-square
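An informal version of that "if such and such is true, then we expect..." logic, using the expected counts from the 100,000-student HIV screening example in these notes: compute the joint probability the multiplicative law would predict if test result and disease status were independent, and compare it with what the table actually shows. This is a sketch, not the formal chi-square test.

```python
# Cell counts from the 100,000-student HIV screening example in these notes.
N = 100_000
aids_and_positive = 199.6
aids_total = 200
positive_total = 399.2

p_joint_observed = aids_and_positive / N
# What the multiplicative law would predict if the two events were independent.
p_joint_if_independent = (aids_total / N) * (positive_total / N)

print(f"observed p(AIDS, positive) = {p_joint_observed:.6f}")
print(f"predicted under independence = {p_joint_if_independent:.6f}")
```

The observed joint probability is hundreds of times larger than the independence prediction, which is why the multiplicative law cannot be used here.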

 

Joint and Conditional probability

Joint probability: What is the probability that event A and event B will both occur?
What is the probability that you will engage in unsafe sex and that your partner will have AIDS?
Conditional probability: What is the probability of A occurring given that event B has occurred?
What is the probability that you will engage in unsafe sex given that your partner has AIDS?

We hope that this probability is very small.

Examples
Two years ago in another class I talked about AIDS and AIDS Screening.
The following is cut from that class's notes. It lays out part of the problem in a slightly different way than I did in class.

Assume that 0.2 percent of the population of college-age students is HIV positive. (This is apparently a reasonable estimate based on knowledgeable sources.)

If we were to sample 100,000 college-age students, how many would we expect truly are HIV+ and how many are HIV-?

100,000 * .002 = 200 who are HIV+.

100,000 - 200 = 99,800 who are HIV-

The Elisa test (a common test for HIV) is reputed to have a sensitivity of .998, which means that 99.8% of people who really are HIV positive will be identified as HIV+. How many would we expect to correctly identify as HIV+ in our sample?

    There are 200 who are HIV+. We would identify 99.8% = 200 * .998 = 199.6 of them.
    (Ignore the fact that we have fractional people--this is just a statistical expectation.)

The Elisa test is also reputed to have a specificity of .998, meaning that 99.8% of well people will be identified as well. And if 99.8% are classed as well, 100% - 99.8% = 0.2% are identified (incorrectly) as sick.

From our sample, how many people will we correctly and incorrectly identify as HIV- and HIV+?

    There are 99,800 who are HIV-. We will correctly identify 99,800 * .998 = 99,600.4 of
    these as HIV-, and 99,800 - 99,600.4 = 199.6 as (incorrectly) HIV+.

So, we will correctly identify 199.6 as HIV+ and incorrectly identify 199.6 as HIV+. So half of the people we identify as HIV+ are really not sick.
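The expected-frequency reasoning above can be written out step by step. This sketch just repeats the notes' arithmetic with the stated prevalence (.002), sensitivity (.998), and specificity (.998); the variable names are illustrative.

```python
N = 100_000
PREVALENCE = 0.002    # 0.2% of college-age students are HIV+
SENSITIVITY = 0.998   # p(test + | HIV+)
SPECIFICITY = 0.998   # p(test - | HIV-)

hiv_pos = N * PREVALENCE
hiv_neg = N - hiv_pos

true_pos = hiv_pos * SENSITIVITY     # HIV+ and correctly flagged
false_neg = hiv_pos - true_pos       # HIV+ but missed
true_neg = hiv_neg * SPECIFICITY     # HIV- and correctly cleared
false_pos = hiv_neg - true_neg       # HIV- but flagged as HIV+

share_false = false_pos / (true_pos + false_pos)
print(f"true positives  = {true_pos:.1f}")
print(f"false positives = {false_pos:.1f}")
print(f"share of positives who are not sick = {share_false:.2f}")
```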

I can enter these data into a table--first in terms of raw frequencies, and then in terms of proportions, which translate to probabilities.

         Positive   Negative     Total
AIDS        199.6        0.4       200
No AIDS     199.6   99,600.4    99,800
Total       399.2   99,600.8   100,000

 

Now I can convert these to probabilities. Remember, these are joint probabilities. The probability given in cell 2,1 (row 2, column 1) is the probability of not having AIDS AND testing positive. The row 1 total, by contrast, is the (marginal) probability of having AIDS.

 

         Positive   Negative     Total
AIDS         .002    .00001*      .002
No AIDS      .002       .996      .998
Total        .004       .996       1.0

* The exact value is .4/100,000 = .000004. I just stuck a 1 on the end to show that it is greater than 0.00; all entries are rounded.

The additive law is illustrated by calculating the probability that someone has AIDS, regardless of their test results. This is .002 + .00001 = .002+, which is where we started.

The probability that someone tests positive is a marginal probability, again obtained with the additive law. It is the sum of two joint probabilities, p(positive and AIDS) + p(positive and no AIDS) = .002 + .002 = .004.
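A small sketch of going from the count table to joint probabilities and then to marginals with the additive law. The dictionary layout is just an illustrative choice; the counts are the expected counts from the 100,000-student table.

```python
# Expected cell counts from the 100,000-student table.
counts = {
    ("AIDS", "positive"): 199.6,
    ("AIDS", "negative"): 0.4,
    ("No AIDS", "positive"): 199.6,
    ("No AIDS", "negative"): 99_600.4,
}
total = sum(counts.values())
joint = {cell: n / total for cell, n in counts.items()}

# Additive law: a marginal probability is the sum of the mutually exclusive
# joint probabilities that make it up.
p_aids = joint[("AIDS", "positive")] + joint[("AIDS", "negative")]
p_positive = joint[("AIDS", "positive")] + joint[("No AIDS", "positive")]

for (status, result), p in joint.items():
    print(f"p({status} and {result}) = {p:.6f}")
print(f"p(AIDS) = {p_aids:.6f}, p(positive) = {p_positive:.6f}")
```

Printing six decimals shows the unrounded values (.001996, .000004, .996004), which is where the rounded entries in the probability table come from.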

But what is the conditional probability of having AIDS given that you test positive for AIDS? This is the number of people who test positive and have AIDS divided by the number of people who test positive:

p(AIDS | positive) = 199.6/399.2 = .50

p(NoAIDS | positive) = 199.6/399.2 = .50.

So, if you test positive for AIDS, the probability that you have it is still only .50. And yet the Elisa test has a sensitivity and specificity of .998, which is remarkably high.

I do not wish to leave the impression that the Elisa test is useless because half the people it calls HIV+ really aren't. We still do hugely better than we would without the test. The point I want to make is that if we are thinking of introducing some blanket testing procedure for a whole group of people, we need to think about the fact that the test will misidentify a lot of people. And if it does, we need to take into account the harm that can fall to those who are misidentified, as well as the good that will fall to those who are correctly identified.

This example works out so cleanly, in part, because of the very high probability that a person chosen at random will not have AIDS. There are so many people who do not have AIDS that even a very specific test is still going to make a lot of mistakes.
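The base-rate point can be made concrete with Bayes' theorem, which gives the same .50 as the count table: p(sick | positive) = p(positive and sick) / p(positive). In this sketch only the .002 prevalence comes from the notes; the other base rates are hypothetical values chosen to show the trend, with sensitivity and specificity held at .998.

```python
def p_sick_given_positive(prevalence, sensitivity, specificity):
    """Bayes' theorem: p(sick | positive) = p(positive and sick) / p(positive)."""
    p_pos_and_sick = prevalence * sensitivity
    p_pos_and_well = (1 - prevalence) * (1 - specificity)
    return p_pos_and_sick / (p_pos_and_sick + p_pos_and_well)


# .002 is the rate used in the notes; the other base rates are hypothetical.
prevalences = [0.002, 0.01, 0.05, 0.20]
ppvs = [p_sick_given_positive(prev, 0.998, 0.998) for prev in prevalences]
for prev, ppv in zip(prevalences, ppvs):
    print(f"prevalence {prev:.3f}: p(HIV+ | test positive) = {ppv:.3f}")
```

As the base rate rises, the same test's positive results become far more trustworthy, which is the reason blanket screening of a low-prevalence group misidentifies so many people.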

The Binomial Distribution

I don’t like to talk much about the binomial because it puts people to sleep. It represents all they ever thought of about probability—that it is boring. BUT, I’ll take a chance.

Ooops, I may run out of time this year, so skip to the mean and variance, if there is even time for that.

Assume that you have an event that can come out one of two ways. (e.g. a person can be classified as having AIDS or not having AIDS, being depressed or not being depressed, etc.), and assume that there is a probability associated with each event. (These two probabilities would sum to 1.00) Further assume that these individual probabilities are not very close to 0.00 or 1.00.

Let’s assume that p(dep) = .10 and p (not depressed) = .90 in the general population. (I don’t know what the actual probabilities are, but those seem reasonable.)

Let’s assume that we sample 20 women who have been the victims of spousal abuse, and find that 8 of them are classed as clinically depressed. Does this number depart far enough from what we would expect by chance in a random sample of women that we can conclude that abused women are at increased risk for depression?

We need to calculate the probability of 8 out of 20 women being depressed if the probability of depression is .10.

To answer our question, we need not only this probability, but also the probabilities of 9/20, 10/20, etc. (Mention the additive law.) This would be a real pain to calculate by hand, but it can be done. The answer is only about .0004. Thus the probability of getting at least 8 depressed women out of a sample of 20 randomly selected women by chance is only about .0004. We would not expect this to happen if these women were a random sample from a population of normal women. Therefore we can conclude that abused women are at greater than average risk for depression.
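The "real pain to calculate" sum is short in code. This sketch applies the additive law to the binomial with N = 20 and p(depressed) = .10 (the notes' assumed population rate), adding p(8) + p(9) + ... + p(20).

```python
from math import comb


def binom_upper_tail(k_min, n, p):
    """p(X >= k_min) for X ~ Binomial(n, p), summing mutually exclusive outcomes."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


p_at_least_8 = binom_upper_tail(8, 20, 0.10)
print(f"p(8 or more depressed out of 20) = {p_at_least_8:.5f}")
```

Almost all of the total comes from p(8) itself (about .00036); the higher counts add very little.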

Alternative approach

I may not go over this in class, but I'll include it here for completeness.

The following approach technically is not appropriate for this problem, but we’ll use it anyway as an illustration.

We know for a fact the mean and standard deviation of a binomial distribution.

mean = Np

st. dev. = sqrt(Npq)

and, when Np and Nq are both greater than 5, the distribution is nearly normal.

In our case Np = 20(.1) = 2, which is less than 5, so the distribution is not particularly normal. But we’ll plunge ahead anyway.

If we know that a distribution (of outcomes) is normal, and we know its mean and standard deviation, we can calculate z and then the area under the curve beyond z.

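Here we have

mean = Np = 20(.1) = 2

st. dev. = sqrt(Npq) = sqrt(20(.1)(.9)) = sqrt(1.8) = 1.34

z = (X - mean)/st. dev. = (8 - 2)/1.34 = 4.47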

To find the probability of an observation at least as extreme as z = 4.47, we could use the table of the normal distribution. My table peters out before it gets that high, but we can see that the probability is certainly well below .0001 (it is roughly .000004). (I’m using a one-tailed test here to keep in line with the actual calculations for the binomial.)
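To see how much the normal approximation is off when Np is only 2, this sketch computes z and the one-tailed area beyond it with Python's standard-library `NormalDist`, then sets that next to the exact binomial tail for the same question. No continuity correction is applied, matching the calculation in the notes.

```python
import math
from statistics import NormalDist

n, p = 20, 0.10
q = 1 - p
observed = 8

mean = n * p                  # Np = 2
sd = math.sqrt(n * p * q)     # sqrt(Npq) = sqrt(1.8)
z = (observed - mean) / sd
p_normal = 1 - NormalDist().cdf(z)  # one-tailed area beyond z

# Exact binomial tail for comparison.
p_exact = sum(math.comb(n, k) * p**k * q ** (n - k) for k in range(observed, n + 1))

print(f"z = {z:.2f}")
print(f"normal approximation = {p_normal:.6f}, exact binomial = {p_exact:.6f}")
```

Both probabilities lead to the same conclusion here, but the normal value is roughly a hundred times too small, which is what "technically not appropriate" means in practice when Np < 5.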

 

  Last Final Revision: 09/25/01