Objects in the R Programming Environment

Your calculation may produce more than you think



Most of you will be at least slightly familiar with printout from a statistical package such as SPSS, SAS, or Stata. In those programs you read in a set of data, which is then fully available, so you don't have to attach it. You might then run, for example, a multiple regression predicting Y from several variables. While doing so you could ask the program to print out means, standard deviations, and all sorts of other statistics. The results appear on your screen or printer, filling one or more pages with printout. But suppose that you did that in R. You might get the following. (You don't have to know anything about multiple regression to follow this.)


setwd("~/Dropbox/methods9/DataFiles")
guber <- read.table("Tab15-1.dat", header = T)
model1 <- lm(SATcombined ~ Expend + PctSAT + PTratio, data = guber)
print(model1)
____________________________________

Call:
lm(formula = SATcombined ~ Expend + PctSAT + PTratio, data = guber)

Coefficients:
(Intercept)       Expend       PctSAT      PTratio  
   1035.474       11.014       -2.849       -2.028  

That's it! How boring! You get the regression coefficients, but nothing else. No test of significance, no multiple correlation, nothing! Doesn't R know any more?

Yes, it knows a lot more, but it won't give it to you until you ask for it. When R runs a procedure like this, it calculates all sorts of things and stores them away in something called an "object." Here model1 is an object, and hidden away within it are the regression coefficients, the predicted values (Ŷ), the residuals (Y - Ŷ), and other stuff.
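To see this for yourself without the SAT data file, here is a minimal sketch that fits a similar three-predictor regression to mtcars, a data set that ships with R. The data set, the variables, and the name "fit" are my stand-ins for illustration, not the example above.

```r
# Stand-in model: mtcars ships with R, so no data file is needed.
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

# The kind of object that lm() hands back
class(fit)

# The labels of the pieces stored inside the object
names(fit)
```

The names that come back are the same labels you would find inside model1.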

If instead of typing "print(model1)" you had typed "summary(model1)", you would see:


Call:
lm(formula = SATcombined ~ Expend + PctSAT + PTratio, data = guber)

Residuals:
    Min      1Q  Median      3Q     Max 
-92.284 -21.130   1.414  16.709  66.073 

Coefficients:
             Estimate Std. Error t value Pr(>|t|)    
(Intercept) 1035.4739    50.3155  20.580   <2e-16 ***
Expend        11.0140     4.4521   2.474   0.0171 *  
PctSAT        -2.8491     0.2155 -13.222   <2e-16 ***
PTratio       -2.0282     2.2071  -0.919   0.3629    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 32.51 on 46 degrees of freedom
Multiple R-squared:  0.8227,	Adjusted R-squared:  0.8112 
F-statistic: 71.16 on 3 and 46 DF,  p-value: < 2.2e-16

Well, that's a little better, but R knows more than that. To go one step further, had you typed "str(model1)", where "str" stands for "structure," you would get



> str(model1)
List of 12
 $ coefficients : Named num [1:4] 1035.47 11.01 -2.85 -2.03
  ..- attr(*, "names")= chr [1:4] "(Intercept)" "Expend" "PctSAT" "PTratio"
 $ residuals    : Named num [1:50] 2.69 -30.59 -28.03 -27.81 -11.57 ...
  ..- attr(*, "names")= chr [1:50] "1" "2" "3" "4" ...
 $ effects      : Named num [1:50] -6830.1 199.3 430.2 -29.9 -24.5 ...
  ..- attr(*, "names")= chr [1:50] "(Intercept)" "Expend" "PctSAT" "PTratio" ...
 $ rank         : int 4
 $ fitted.values: Named num [1:50] 1026 965 972 1033 914 ...
  ..- attr(*, "names")= chr [1:50] "1" "2" "3" "4" ...
 $ assign       : int [1:4] 0 1 2 3
 
 AND A WHOLE LOT MORE!

That may look like an awful mess, but you probably don't care about a lot of it. What you care about is that it tells you that it has stored 1) the coefficients, 2) the residuals, 3) the effects, 4) the rank, 5) the fitted values, and a lot of other stuff. I'm not going to tell you what all of that means. But the entries labeled "$ coefficients" and "$ residuals" tell you that if you type "model1$coefficients" you will get the regression coefficients, and if you type "model1$residuals" you will get the residuals, one for each case.
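Pulling pieces out with "$" can be sketched as follows. Again I am assuming the built-in mtcars data as a stand-in for the SAT file, and "fit", "b", and "e" are names I made up for the illustration. The functions coef() and residuals() are R's standard accessors, which return the same pieces.

```r
# Stand-in model on a data set that ships with R
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

b <- fit$coefficients   # named numeric vector, one entry per term
e <- fit$residuals      # one residual per row of the data

# The accessor functions return the same pieces as "$"
all.equal(b, coef(fit))
all.equal(e, residuals(fit))

# Each residual is the observed value minus the fitted value (Y - Y-hat)
all.equal(unname(e), unname(mtcars$mpg - fit$fitted.values))
```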

But what about the tests of significance and other stuff? Well, it hasn't calculated those yet. But it has all the information that it needs, and if you were to type "summary(model1)", it would take what it has, calculate from that what it doesn't have, and give you what you want.
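One way to check that summary() really does compute new things is to compare what is stored in the two objects. This is only a sketch on the built-in mtcars data, with "fit" and "s" as hypothetical names.

```r
# Stand-in model on a data set that ships with R
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)
s <- summary(fit)

# Pieces that exist in the summary but not in the fitted model
setdiff(names(s), names(fit))

# The coefficient table in the summary has columns for the
# standard errors, t values, and p values
colnames(s$coefficients)
```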

So why do you need to know this? Well, it is useful background information that will help you understand what is going on. In addition, later in the book I will use some of this information. For example, having run "result <- summary(model1)", I can type "str(result)", see that it has something called "r.squared", and then type "R2 <- result$r.squared" to pull out R-squared and use it in some future calculation or summary printout. You may or may not want to do something like that yourself, but you will see me do it, and I wanted to let you know what that was all about. Until you need to do such a thing, tuck this web page somewhere and pull it out when you need it.
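Here is a minimal sketch of that kind of reuse, again on the built-in mtcars data rather than the SAT file. The follow-on calculation (adjusted R-squared computed by hand from R2) is just an example I picked, and the names "fit", "n", "p", and "adjR2" are made up for the illustration.

```r
# Stand-in model on a data set that ships with R
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

result <- summary(fit)
R2 <- result$r.squared        # pull R-squared out of the summary object

# Use it in a later calculation: adjusted R-squared by hand
n <- nrow(mtcars)             # number of cases
p <- 3                        # number of predictors in the model
adjR2 <- 1 - (1 - R2) * (n - 1) / (n - p - 1)

# Check the hand calculation against the value summary() stored
all.equal(adjR2, result$adj.r.squared)

# Or drop the values into a one-line summary printout
cat(sprintf("R-squared = %.3f, adjusted R-squared = %.3f\n", R2, adjR2))
```

With the SAT example, the same formula applied to R-squared = 0.8227 with 50 cases and 3 predictors gives about 0.811, in line with the adjusted R-squared in the summary printout.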

