Sampling Distribution of F

David C. Howell


This program is designed to demonstrate the fact that F has a sampling distribution. In other words, sampling from the same populations produces different F values each time you do it. The following SPSS and SAS programs each draw samples from five populations with means of 0.00, 0.25, 0.50, 0.75, and 1.00 and run the appropriate one-way analysis of variance. They do this 10 different times, producing 10 summary tables with their associated F and p values.
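For readers who have neither SPSS nor SAS at hand, the same idea can be sketched in Python using only the standard library. This is an illustrative analogue, not a translation of the programs below: the names `one_way_f` and `draw_f` and the seed value are assumptions introduced here. It draws 20 cases from each of the five populations, computes the one-way F, and repeats this 10 times.

```python
import random

POP_MEANS = [0.00, 0.25, 0.50, 0.75, 1.00]  # population means given in the text
N_PER_GROUP = 20                             # 20 cases per group, as in the programs


def one_way_f(groups):
    """Return the one-way ANOVA F statistic for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


def draw_f(rng):
    """Draw one fresh data set (sigma = 1 in every population) and return its F."""
    groups = [[rng.gauss(mu, 1.0) for _ in range(N_PER_GROUP)]
              for mu in POP_MEANS]
    return one_way_f(groups)


rng = random.Random(1998)  # arbitrary seed, chosen only so the run is repeatable
f_values = [draw_f(rng) for _ in range(10)]
print([round(f, 2) for f in f_values])
```

Each run with a different seed gives a different set of 10 F values, which is exactly the point of the demonstration.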

I have set this program up to sample from populations with different means. (You can change it to equal means if you like.) Therefore, you already know that the null hypothesis is false, because you know the true population means. That doesn't mean that all of your resulting F values will be significant, but any nonsignificant F is definitely a Type II error.

I repeated this whole process 10 times on my own and collected the resulting 100 F values. The distribution of these F values can be seen in the accompanying figure if you are interested. The critical value for 4 and 95 df is 2.49, so the proportion of F values to the left of that point represents beta. Using the material on power on p. 324 of the 3rd edition of the "Methods" book, I calculate beta to be 0.17 and power to be 0.83. How do those values compare to the results of my 100 replications?

This demonstration can be used in several ways. The best way would be for each student to enter and run the accompanying program, generating 10 F's. By pooling across the entire class and plotting the results, you will get an idea of the kind of variation we routinely find. You'll actually see what we mean by power when you calculate the percentage of times you (correctly) reject the null hypothesis. I would then suggest that you modify the program slightly, for example by changing the population means or the sample size as described in the program comments, and repeat the procedure.
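The link between power and the percentage of correct rejections can also be checked by brute force. The following Python sketch is an assumption-laden illustration, not part of the original programs: `simulate_f`, the seed, and the 2000 replications are choices made here. Rather than rely on a tabled critical value, it estimates the .05 critical value as the 95th percentile of F when all five population means are equal, and then counts how often F exceeds that value when the means are 0.00 through 1.00.

```python
import random


def one_way_f(groups):
    """Return the one-way ANOVA F statistic for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def simulate_f(pop_means, n_per_group, reps, rng):
    """Return `reps` F values, each from a fresh sample (sigma = 1)."""
    return [
        one_way_f([[rng.gauss(mu, 1.0) for _ in range(n_per_group)]
                   for mu in pop_means])
        for _ in range(reps)
    ]


REPS = 2000                 # assumption: enough replications for a stable estimate
rng = random.Random(42)     # arbitrary seed so the run is repeatable

# Null hypothesis true: the 95th percentile of F estimates the .05 critical value.
null_f = sorted(simulate_f([0.0] * 5, 20, REPS, rng))
critical_f = null_f[int(0.95 * REPS)]

# Null hypothesis false: the proportion of F values above the cutoff estimates power.
alt_f = simulate_f([0.00, 0.25, 0.50, 0.75, 1.00], 20, REPS, rng)
power = sum(f > critical_f for f in alt_f) / REPS
beta = 1.0 - power

print(f"empirical critical F = {critical_f:.2f}")
print(f"estimated power = {power:.2f}, estimated beta = {beta:.2f}")
```

The estimated power and beta can then be set beside the calculated values quoted above. Because both the cutoff and the rejection rate are simulated, some disagreement in the second decimal place is to be expected.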

 

SPSS Modeling of the Sampling Distribution of F Demonstration


Copy the following program exactly as it is written, leaving out the first comment if you wish. Be careful about putting in all the periods and don't have unquoted periods in comments. The "input program" line is required.

Comment This program draws 5 samples from populations with means of "0.0, 0.25, 0.50, 0.75, and 1.0," and then
computes an analysis of variance on those groups; This is repeated 10 times, to produce 10 F statistics and their
associated probabilities. 

*There are 20 cases per group;

Comment You can alter the constants for mean2 -- mean5 to alter the size of the true difference between population
means--setting them all to "0.0" would make the null hypothesis true;

Comment You can change the sample size to whatever you want by changing N;

*Create the data file.
New File.
Input program.

* Set the total sample size across all 5 groups;
compute N = 100.

*Set the means;
Compute mean2 = 0.25.
Compute mean3 = 0.50.
Compute mean4 = 0.75.
compute mean5 = 1.00.
vector x(10).
loop #i = 1 to N.
compute group = trunc((#i - 1)/(N/5)) + 1.
loop #j = 1 to 10.
Do IF (Group = 1).
compute x(#j) = normal(1).
Else IF (group = 2).
compute x(#j) = normal(1) + mean2.
Else IF (group = 3).
compute x(#j) = normal(1) + mean3.
Else IF (group = 4).
compute x(#j) = normal(1) + mean4.
Else IF (group = 5).
compute x(#j) = normal(1) + mean5.
END IF.
end loop.
leave N mean2 to mean5.
end case.
end loop.
end file.
end input program.
*Run the analyses.

oneway variables = x1 to x5 by group
/Statistics = descriptives.
oneway variables = x6 to x10 by group
/Statistics = descriptives.

Execute.
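The input program assigns consecutive blocks of N/5 cases to groups 1 through 5, so that N = 100 gives 20 cases per group. As a quick check of that bookkeeping, here is a small Python analogue. The function `assign_group` is hypothetical and is not a translation of the SPSS syntax; it only mirrors the intended mapping from case number to group.

```python
from collections import Counter


def assign_group(case_index, n_total, n_groups=5):
    """Map a 1-based case number to a 1-based group code in equal blocks."""
    block_size = n_total / n_groups
    return int((case_index - 1) / block_size) + 1


groups = [assign_group(i, 100) for i in range(1, 101)]
counts = Counter(groups)
print(sorted(counts.items()))  # prints [(1, 20), (2, 20), (3, 20), (4, 20), (5, 20)]
```

Cases 1 through 20 fall in group 1, cases 21 through 40 in group 2, and so on, which matches the "20 cases per group" comment in the program.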



SAS Modeling of the Sampling Distribution of F Demonstration


Copy the following program exactly as it is written.




*       SampDistF.sas;
*       This program generates 10 data sets of random data for 5 groups and then runs
        an analysis of variance for each data set;
*       Last revised 3/20/96 -- David C. Howell;

Options ls = 78;
Options FormDlim = '-';

*  The following Data Step generates data on 20 subjects for each of 5 groups;
*  Those data are stored in a SAS dataset named RandData for later use;
*  This dataset will consist of 11 Columns (one for Group and 10 for the dep. vars);
*  and 100 rows (one for each subject);

Data RandData;
        seed = -1;
                * Using a seed with a negative value gives different data
                  each time the program is run. The seed is for the random
                  number generator;
        array x(i) X1 - X10;
        * The following variables set the population means;
        Mean1 = 0; Mean2 = 0.25; Mean3 = 0.50; Mean4 = 0.75; Mean5 = 1.00;

        Do Group = 1 to 5;
                Do Subject = 1 to 20;
                        Do over X;
                                If Group = 1 then X = rannor(seed) + mean1;
                                Else If group = 2 then X = rannor(seed) + mean2;
                                Else if group = 3 then X = rannor(seed) + mean3;
                                Else if group = 4 then X = rannor(seed) + mean4;
                                Else if group = 5 then X = rannor(seed) + mean5;
                        End;
                        Output;
                End;
        End;
        Drop Seed Subject Mean1 - Mean5;
                * This command just deletes incidental variables I don't need;
run;

*The following procedure runs a One-way Analysis of Variance;
Proc Anova Data = RandData;
        Class Group;
        Model X1 - X10 = Group;
        Means Group;
Run;
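The comment on the seed in the Data step is worth a second look: a seed that changes from run to run gives new data each time, while a fixed seed reproduces the same data. The Python sketch below illustrates that distinction along with the 100-row by 11-column layout described in the comments. It is an analogue under stated assumptions, not SAS: `make_dataset` and the seed 12345 are invented here, and Python's `random.Random(None)` (seeded from the operating system) stands in for the varying seed.

```python
import random


def make_dataset(seed=None):
    """Build 100 rows of [group, x1, ..., x10]; seed=None varies between runs."""
    rng = random.Random(seed)
    means = [0.00, 0.25, 0.50, 0.75, 1.00]  # population means from the program
    rows = []
    for group, mu in enumerate(means, start=1):
        for _subject in range(20):           # 20 subjects per group
            rows.append([group] + [rng.gauss(mu, 1.0) for _ in range(10)])
    return rows


fixed_a = make_dataset(seed=12345)
fixed_b = make_dataset(seed=12345)
varying = make_dataset()                     # different data on every run

print(len(fixed_a), len(fixed_a[0]))         # prints 100 11
print(fixed_a == fixed_b)                    # prints True
```

For a classroom exercise in which every student should get different F values, the varying seed is the one you want. A fixed seed is useful only when you need to reproduce a particular run.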



Send mail to: David.Howell@uvm.edu

Last revised 7/11/98