Sampling Distribution of t

David C. Howell

This program is designed to demonstrate the fact that t has a sampling distribution. In other words, sampling from the same population(s) produces different t values each time you do it. The following SPSS and SAS programs each draw samples from two populations with means of 0.00 and 0.50, and run the appropriate t test. They do this 10 different times, producing 10 sets of results with their associated t and p values. (Those who have looked at the programs for the sampling distribution of F will recognize that I have simply modified those programs to use only two groups.)
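For readers without SPSS or SAS at hand, the same procedure can be sketched in Python. This is an illustration only, not one of the original programs: the names `pooled_t` and `one_replication` are made up for this sketch, and it assumes normal populations with a standard deviation of 1 and 20 cases per group, matching the programs given later on this page. It prints only the t values and does not compute the p values that SPSS and SAS report.

```python
import random
from math import sqrt
from statistics import mean, variance

def pooled_t(group1, group2):
    """Independent-groups t using a pooled variance estimate."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1)
                  + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    std_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    # Group 2 minus group 1, so t tends to be positive when mean2 > mean1.
    # A package that subtracts in the other order gives the opposite sign.
    return (mean(group2) - mean(group1)) / std_error

def one_replication(rng, mean1=0.0, mean2=0.50, n_per_group=20):
    """Draw one sample per population and return the resulting t."""
    g1 = [rng.gauss(mean1, 1.0) for _ in range(n_per_group)]
    g2 = [rng.gauss(mean2, 1.0) for _ in range(n_per_group)]
    return pooled_t(g1, g2)

rng = random.Random()  # unseeded, so every run gives different t values
t_values = [one_replication(rng) for _ in range(10)]
for k, t in enumerate(t_values, start=1):
    print(f"replication {k:2d}: t = {t:6.3f}")
```

Running it several times shows the point of the demonstration: the populations never change, yet the 10 t values do.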

I have set this program up to sample from populations with different means. (You can change it to equal means if you like.) Therefore, you already know that the null hypothesis is false, because you know the true population means. That doesn't mean that all of your resulting t values will be significant, but any nonsignificant t is definitely a Type II error.

I repeated this whole process 10 times on my own, and collected the resulting 100 t values. The distribution of these t values can be seen in the accompanying figure if you are interested. (The distribution looks skewed. It shouldn't be, and that skewness will go away if you base it on more trials.) The critical value for 38 df is ±2.024, so the area of this distribution between -2.024 and +2.024 represents beta. Using the material on power on p. 213 of the 3rd edition of the "Methods" book, I calculate power to be 0.35. When I look at the 100 values of t that I generated, I find that I would reject the null hypothesis 37 times out of 100, which agrees remarkably well with the estimate.

This demonstration can be used in several ways. The best way would be for each student to enter and run the accompanying program, generating 10 t's. By pooling across the entire class and plotting the results, you will get an idea of the kind of variation we routinely find. You'll actually see what we mean by power when you calculate the percentage of times you (correctly) reject the null hypothesis. I would then suggest that you modify the program slightly and repeat the procedure. You could:

Decrease the sample size
Increase the sample size
Change the SPSS random number generator from normal(1) to rv.binom(5,.80), or the SAS random number generator from rannor(seed) to ranbin(seed, 5, .80). This would model scores on a 5-item true/false test where the probability of a correct answer on any item is .80.
Change the population means (mean1 and mean2) to be equal and repeat the exercise. What does this tell you about the probability of a Type I error?
Change the population means to 0.00 and 0.25 and see what this does to the power of the test.
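
The changes to the population means can also be tried out quickly in a simulation. The Python sketch below is an illustration only, not one of the original programs; the function names and the number of trials are assumptions made for this sketch. It estimates the proportion of rejections over many replications, using the two-tailed critical value of 2.024 for 38 df (so it is valid only for 20 cases per group), and sets that beside a normal approximation to power based on delta = d * sqrt(n/2). Whether that approximation is exactly the calculation used in the "Methods" book is an assumption, though for means of 0.00 and 0.50 it does give a value close to 0.35.

```python
import random
from math import sqrt
from statistics import NormalDist

N_PER_GROUP = 20     # the critical value below is only valid for this n
CRIT_T_38DF = 2.024  # two-tailed .05 critical value for 38 df

def mean_and_var(x):
    """Sample mean and unbiased sample variance."""
    m = sum(x) / len(x)
    return m, sum((xi - m) ** 2 for xi in x) / (len(x) - 1)

def pooled_t(g1, g2):
    n1, n2 = len(g1), len(g2)
    m1, v1 = mean_and_var(g1)
    m2, v2 = mean_and_var(g2)
    pooled_var = ((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2)
    return (m2 - m1) / sqrt(pooled_var * (1 / n1 + 1 / n2))

def rejection_rate(mean1, mean2, trials=2000, seed=None):
    """Proportion of trials in which |t| exceeds the critical value."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        g1 = [rng.gauss(mean1, 1.0) for _ in range(N_PER_GROUP)]
        g2 = [rng.gauss(mean2, 1.0) for _ in range(N_PER_GROUP)]
        if abs(pooled_t(g1, g2)) > CRIT_T_38DF:
            rejections += 1
    return rejections / trials

def approx_power(mean1, mean2, sigma=1.0, z_crit=1.96):
    """Two-tailed normal approximation to power (an assumed shortcut)."""
    d = abs(mean2 - mean1) / sigma
    delta = d * sqrt(N_PER_GROUP / 2)
    z = NormalDist()
    return z.cdf(delta - z_crit) + z.cdf(-delta - z_crit)

for m2 in (0.50, 0.25, 0.00):
    print(f"means 0.00 and {m2:.2f}: "
          f"simulated {rejection_rate(0.0, m2):.3f}, "
          f"approximate {approx_power(0.0, m2):.3f}")
```

With equal means every rejection is a Type I error, so the simulated proportion in the last line should sit near .05. Changing the sample size would also require a different critical value, which this sketch does not look up.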

SPSS Modeling of the Sampling Distribution of t Demonstration

Copy the following program exactly as it is written, leaving out the first comment if you wish. Be careful about putting in all the periods and don't have unquoted periods in comments. The "input program" line is required.


Comment
This program draws 2 samples from populations with means of 0.0 and
0.50, and then computes a t test on the means of those groups;
This is repeated 10 times, to produce 10 t statistics and their
 associated probabilities. There are 20 cases per group;
You can alter the constants for mean1 and mean2 to alter the size of the
true difference between population means--setting them both to "0.0"
would make the null hypothesis true;
You can change the sample size to whatever you want by changing N;
Created by David Howell
Last modified 3/29/96.

Input program.

*Create the data file.

Compute N = 40.
Compute mean1 = 0.0.
Compute mean2 = 0.50.
vector x(10).
loop #i = 1 to N.
compute group = trunc(#i/(N/2) + .99).
loop #j = 1 to 10.
IF (group = 1) x(#j) = normal(1) + mean1.
IF (group = 2) x(#j) = normal(1) + mean2.
end loop.
leave N mean1 to mean2.
end case.
end loop.
end file.
end input program.

*Run the analyses.

T-Test Groups = Group(1,2)
/ variables = x1 to x10.
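
The expression trunc(#i/(N/2) + .99) in the program above is what splits the N cases into two groups. A small Python check (an illustration only; `group_for` is a made-up name, and Python's `math.trunc` stands in for the SPSS trunc function) shows that with N = 40 it assigns cases 1 to 20 to group 1 and cases 21 to 40 to group 2. It also suggests that the .99 offset is no longer enough once N/2 exceeds 100, so a much larger sample size would need a different rule.

```python
from math import trunc

def group_for(i, n_total):
    """Mirror of trunc(#i/(N/2) + .99) from the SPSS program."""
    return trunc(i / (n_total / 2) + 0.99)

# Group codes for the 40 cases generated by the program as written.
groups = [group_for(i, 40) for i in range(1, 41)]
print(groups)

# First case when N = 400: 1/200 + .99 = .995, which truncates to 0,
# so this case would fall outside groups 1 and 2.
print(group_for(1, 400))
```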

SAS Modeling of the Sampling Distribution of t Demonstration

Copy the following program exactly as it is written.




*       SampDistt.sas;
*       This program generates 10 data sets of random data for 2 groups and then
*       runs a t test between groups for each data set;
*       Last revised 3/29/96 -- David C. Howell;

Options ls = 78;
Options FormDlim = '-';

*  The following Data Step generates data on 20 subjects for each of 2 groups for;
*  10 variables. The 10 variables are just 10 replications of the experiment;
*  Those data are stored in a SAS dataset named RandData for later use;
*  This dataset will consist of 11 Columns (one for Group and 10 for the dep. vars);
*  and 40 rows (one for each subject);

Data RandData;
        seed = -1;
                * Using a seed with a negative value gives different data
                  each time the program is run. The seed is for the random
                  number generator;
        array x(i) X1 - X10;
        * The following variables set the population means;
        Mean1 = 0; Mean2 = 0.50;

        Do Group = 1 to 2;
                Do Subject = 1 to 20;
                        Do over X;
                                If Group = 1 then X = rannor(seed) + mean1;
                                Else If group = 2 then X = rannor(seed) + mean2;
                        End;
                        Output;
                End;
        End;
Drop Seed Subject Mean1 - Mean2;
                * This command just deletes incidental variables I don't need;
run;

*The following procedure runs a t test on each variable;
*I wrote the following code at home from memory and have not had a chance
 to test it;
Proc TTest Data = RandData;
        Class Group;
        Var X1 - X10;
Run;


Send mail to: David.Howell@uvm.edu

Last revised: 7/11/98