Lab Exercise—Sampling Distributions

9/20/01

 

This lab will look at the broad question of hypothesis testing and sampling distributions. We will use an interesting study by Becky Liddle, at Auburn, on the effect of "coming out" in class on student course evaluations. We will use this study for several reasons:

  • It will give us the opportunity to generate data that reflect the data in someone else’s study.
  • It gives us the opportunity to look at the variability of sample means over repeated experiments.
  • The study comes as close as any I know to "proving the null hypothesis," which, as I say in the book, is a philosophical impossibility, but is sometimes a practical desirability.

Liddle (1997, Teaching of Psychology) taught four nearly identical sections of the same course (in the same semester) on human relations. Near the end of the semester, in a lecture on sexuality, she "came out" to two of the sections, but not to the other two sections. Soon thereafter she administered a standard course evaluation as part of the routine procedure. Her initial question concerned whether students to whom she came out would rate the course lower than students to whom she had not come out. She was also interested in the differences between ratings of male and female students. (Liddle’s paper is an excellent example of good experimental methodology, and I recommend that you look at it. I am not doing her work justice in what little I say here.)

We can think of this study as having four groups. (Never mind that they form a factorial design.) The group means are given below, as is the average group standard deviation.

                    Disclosed           No Disclosure
                 Male     Female       Male     Female
Mean            33.00      37.15      33.00      36.56
St. Dev.         4.55       4.55       4.55       4.55

 The first problem is to generate data that resemble the data that Liddle would expect to find if the null hypothesis is true. Her combined mean for all four conditions is 34.93, and her average standard deviation is 4.55. Because we are creating a situation where the null hypothesis is true, we will draw four samples randomly from a population with those characteristics. To make our life simpler, we will assume 15 subjects in each of the four groups (though she had slightly unequal sample sizes).
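As a quick check on that arithmetic, the Python sketch below averages the four group means from the table with equal weights, which matches our simplifying assumption of 15 students per group. Python is used here only for illustration; the lab itself is done in SPSS. The unweighted average is 34.9275, which rounds to the 34.93 used in the text.

```python
# Group means from the table: Disclosed M, Disclosed F, No Disclosure M, No Disclosure F
group_means = [33.00, 37.15, 33.00, 36.56]
common_sd = 4.55  # average group standard deviation reported above

# With equal group sizes, the combined mean is the simple average of the group means
grand_mean = sum(group_means) / len(group_means)
print(f"{grand_mean:.4f}")  # 34.9275, i.e. 34.93 to two decimals
```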

  • Start out by creating columns for Disclose and Sex, with 1 = disclosed or male, and 2 = not disclosed or female. Enter the Disclose and Sex codes for all 60 subjects (15 in each Disclose × Sex combination).
  • Next create a column for a dummy variable called z.
  • Next make your solution unique by selecting Transform/Random Number Seed from the menu and entering your own seven-digit number.
  • Use the Transform/Compute menu to set z equal to a random number from a N(0,1) population.
    • z = RV.Normal(0,1)
  • Now each group should have a set of scores with a mean of approximately 0 and a standard deviation of approximately 1.00, depending on the random numbers you happened to get.
    • Check this using the Analyze/Compare Means/Independent-Samples T Test procedure, with Disclose as the grouping variable and z as the test (dependent) variable. (We will ignore gender for reasons given below.) The t value is of less interest to us than the means and the difference between the means.
  • If the null hypothesis is true, each of the four groups will come from populations with μ = 34.93 and σ = 4.55, which is the situation we want to simulate. Your z scores, however, come from a population with μ = 0 and σ = 1. So use Transform/Compute to create a new variable called Rating to hold the rescaled data: Rating = z*4.55 + 34.93.
    • I'll demonstrate this.
  • This has created data for all four groups at the same time, but obviously their actual means will differ by chance from 34.93.
  • Rating should now contain the data you want. Run the Analyze/Compare Means/Independent-Samples T Test procedure again, this time with Rating as the test variable, to check on this.
    • Your means and standard deviations will not be exactly what you wanted. Why is this the case? 
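The steps above can be mimicked outside SPSS. The following is a minimal Python sketch, not SPSS syntax: `random.seed` stands in for Transform/Random Number Seed, `random.gauss(0, 1)` stands in for `RV.Normal(0,1)`, and the seed value 1234567 is hypothetical. It builds the 60 Disclose × Sex codes, draws z, rescales it to Rating, and prints each disclosure group's n, mean, and standard deviation. Because the draws are random, the printed means and SDs land near, but not exactly on, 34.93 and 4.55, which is the point of the question in the last step.

```python
import random
import statistics

# Hypothetical seed; plays the role of Transform/Random Number Seed in SPSS
random.seed(1234567)

POP_MEAN, POP_SD = 34.93, 4.55  # population values under the null hypothesis
N_PER_CELL = 15                  # simplifying assumption used in this lab

# 60 cases: disclose (1 = disclosed, 2 = not disclosed) by sex (1 = male, 2 = female)
cases = [(disclose, sex)
         for disclose in (1, 2)
         for sex in (1, 2)
         for _ in range(N_PER_CELL)]

# z ~ N(0, 1) for every case, then Rating = z*4.55 + 34.93
z = [random.gauss(0, 1) for _ in cases]
rating = [zi * POP_SD + POP_MEAN for zi in z]

# n, mean, and SD by disclosure group (sex ignored), as the t-test output would show
for group in (1, 2):
    scores = [r for (d, _sex), r in zip(cases, rating) if d == group]
    print(group, len(scores),
          round(statistics.mean(scores), 2),
          round(statistics.stdev(scores), 2))
```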

Liddle did not find any differences due to gender, but that was because she ran an analysis of covariance that controlled for gender. (That doesn't mean there were no gender differences, only that her statistical methodology, quite properly, removed them.) That makes it easy for us simply to look at differences between the disclosure groups.

If you had saved the previous commands to a syntax file (using the Paste command), you would have (with some editing) the following:

COMPUTE z = rv.normal(0,1) .
COMPUTE Rating = z*4.55 + 34.93.

T-TEST
GROUPS=disclose(1 2)
/MISSING=ANALYSIS
/VARIABLES=rating
/CRITERIA=CIN(.95) .

Enter these commands into a syntax window (create a new one from the File menu). You can cut and paste them from this page if you're using Netscape. Then highlight the commands in the syntax window and run them (click the triangle or press Command-R).
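For readers who want to see what the T-TEST command computes, here is a minimal Python sketch of the pooled-variance (Student's) independent-samples t, which corresponds to the "Equal variances assumed" row of the SPSS output. The function name `pooled_t` is hypothetical. As with `GROUPS=disclose(1 2)`, the difference is group 1 minus group 2, so the difference and t always carry the same sign.

```python
import math
import statistics

def pooled_t(group1, group2):
    """Independent-samples t with pooled variance (equal variances assumed).

    Returns (mean1, mean2, mean1 - mean2, t); the difference is group 1 minus group 2.
    """
    n1, n2 = len(group1), len(group2)
    m1, m2 = statistics.mean(group1), statistics.mean(group2)
    v1, v2 = statistics.variance(group1), statistics.variance(group2)
    # Pool the two sample variances, weighting each by its degrees of freedom
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return m1, m2, m1 - m2, (m1 - m2) / se_diff

# Tiny illustration: means 2 and 5, difference -3, t about -3.67
print(pooled_t([1, 2, 3], [4, 5, 6]))
```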

Repeat the above for a total of 10 trials, each time recording the two means, their difference, and the t value. You will get different values each time, because z is filled with new random numbers on each run. Enter these on the following sheet.

Record these results to 2 decimal places. (Do not use more!!!)
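The ten trials can be sketched in Python as a loop. This assumes, as the lab does, that under the null hypothesis all 30 students in each disclosure condition (15 men and 15 women, with sex ignored) come from a normal population with mean 34.93 and standard deviation 4.55; the helper names and the seed are hypothetical. It prints a table laid out like the recording sheet, rounded to 2 decimal places.

```python
import math
import random
import statistics

def pooled_t(a, b):
    # Student's independent-samples t (pooled variance), group a minus group b
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

def one_trial(rng, n_per_group=30, mu=34.93, sigma=4.55):
    # Under H0 both groups are drawn from the same N(mu, sigma) population
    disclose = [rng.gauss(0, 1) * sigma + mu for _ in range(n_per_group)]
    no_disclose = [rng.gauss(0, 1) * sigma + mu for _ in range(n_per_group)]
    m1, m2 = statistics.mean(disclose), statistics.mean(no_disclose)
    return m1, m2, m1 - m2, pooled_t(disclose, no_disclose)

rng = random.Random(7654321)  # hypothetical seed
rows = [one_trial(rng) for _ in range(10)]

print(f"{'Trial':>5} {'Disclose':>9} {'NoDisclose':>11} {'Diff':>6} {'t':>6}")
for i, (m1, m2, diff, t) in enumerate(rows, start=1):
    print(f"{i:>5} {m1:>9.2f} {m2:>11.2f} {diff:>6.2f} {t:>6.2f}")
```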

I will collect the results, put them in a class-wide table, and pass them back.

NAME ________________________________

Trial   Disclose Mean   No Disclose Mean   Difference   t value
  1
  2
  3
  4
  5
  6
  7
  8
  9
 10

What would you conclude about the effect of disclosure based solely on what you found here? We have not yet covered any specific hypothesis test (even though we did record the t values), and I do not want you to use one. I just want you to write up what you can see from the numbers themselves and from any graph that you might make.
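If you want a reference point while describing your ten differences: when the null hypothesis is true, the differences should center near 0 and, over many repetitions, have a standard deviation near σ√(1/30 + 1/30) = 4.55 × 0.258 ≈ 1.17. The Python sketch below (function names hypothetical) computes that value, summarizes a list of your differences, and draws a crude text histogram. It is a descriptive aid, not a hypothesis test.

```python
import math
import statistics

SIGMA = 4.55  # population SD under H0
N = 30        # students per disclosure condition (15 men + 15 women)

# Standard error of a difference between two independent means
se_diff = SIGMA * math.sqrt(1 / N + 1 / N)
print(f"{se_diff:.2f}")  # 1.17

def summarize(differences):
    """Mean and SD of your recorded differences (Disclose minus No Disclose)."""
    return statistics.mean(differences), statistics.stdev(differences)

def dot_plot(differences, width=0.5):
    """Crude text histogram: one '*' per difference, in bins `width` wide."""
    counts = {}
    for d in differences:
        lower = math.floor(d / width) * width
        counts[lower] = counts.get(lower, 0) + 1
    for lower in sorted(counts):
        print(f"{lower:6.2f} | {'*' * counts[lower]}")
```

Paste your ten recorded differences into a list and pass it to `summarize` and `dot_plot`.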

 

The results of this lab, across 100 replications of the experiment, can be found at meanresults.sav. You will find a couple of errors. First, I gave differences only for the first 10 replications, so use Transform/Compute to create a new variable that subtracts NonDisclosed from Disclosed. Second, I made a few errors with signs. The sign of a difference must agree with the sign of its t; if they disagree, the t is wrong. Correct those problems. After you make the corrections, look at the variables in a variety of ways and see whether they make sense. (I have found that this file downloads properly if you use Microsoft's Internet Explorer, but not if you use Netscape. Netscape does not recognize the file type from its extension, and therefore doesn't know what to do with it. Sorry.)
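The sign rule above can be expressed as a small Python function. This is only a sketch: it does not read meanresults.sav (that is assumed to be done in SPSS or with a separate reader), the names Disclosed and NonDisclosed follow the text, and the two example rows are made-up values for illustration, not data from the file.

```python
def correct_replication(disclosed_mean, nondisclosed_mean, t):
    """Return (difference, t) for one replication, with t's sign repaired.

    The difference is Disclosed minus NonDisclosed. Because t is that difference
    divided by a positive standard error, the two must share a sign; when they
    disagree, t is taken to be the value in error and its sign is flipped.
    """
    difference = disclosed_mean - nondisclosed_mean
    if difference * t < 0:  # signs disagree
        t = -t
    return difference, t

# Made-up (Disclosed, NonDisclosed, t) rows, NOT taken from meanresults.sav
print(correct_replication(35.5, 34.25, 0.77))  # (1.25, 0.77): signs already agree
print(correct_replication(34.0, 35.5, 1.28))   # (-1.5, -1.28): t's sign corrected
```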

From tables, I know that when the null hypothesis is true, the probability of a t greater than +2.009 is less than .05 (it is about .025), and the probability of a t beyond ±2.009 in either direction is about .05. How does that agree with what you got?
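To see what the table value implies, the Python sketch below simulates 100 replications under the null hypothesis (30 per group, μ = 34.93, σ = 4.55; the seed is hypothetical) and counts how many t values exceed +2.009 and how many fall beyond ±2.009. With 58 degrees of freedom you should expect a little under 2.5% in the upper tail and a little under 5% in both tails combined, though any single set of 100 replications will vary around those figures.

```python
import math
import random
import statistics

def pooled_t(a, b):
    # Student's independent-samples t (pooled variance), group a minus group b
    n1, n2 = len(a), len(b)
    sp2 = ((n1 - 1) * statistics.variance(a) + (n2 - 1) * statistics.variance(b)) / (n1 + n2 - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

rng = random.Random(2001)  # hypothetical seed
REPLICATIONS, N, MU, SIGMA, CUTOFF = 100, 30, 34.93, 4.55, 2.009

# Each replication draws both groups from the same population, so H0 is true
t_values = []
for _ in range(REPLICATIONS):
    a = [rng.gauss(MU, SIGMA) for _ in range(N)]
    b = [rng.gauss(MU, SIGMA) for _ in range(N)]
    t_values.append(pooled_t(a, b))

upper = sum(t > CUTOFF for t in t_values)
either = sum(abs(t) > CUTOFF for t in t_values)
print(f"t > +{CUTOFF}: {upper} of {REPLICATIONS}")
print(f"|t| > {CUTOFF}: {either} of {REPLICATIONS}")
```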

 

 

 Last revised: 09/21/01