Some Simple Examples with R

Introduction

The purpose of this section is just to show you something about how R operates. At one level it is just a great electronic calculator. At another level it creates really nice graphics, and at a third level it has thousands of built-in functions that you can use to do all sorts of things.

Just a Calculator

> 5/3
[1] 1.666667
and
> b <- 5/3
> b
[1] 1.666667
>

The "greater than" sign in the left margin is simply the prompt showing that this is a command line. The "<-" can be thought of as R's equivalent of an equal sign meaning "assign." The "[1]" tells you that 1.666667 is the first of the outputs. In this case there is only one piece of output.

Now that we have b, we can use it in a calculation. For example, we could create a variable called x and multiply b and x together:

> x <- 7
> b <- 5/3
> product <- b*x
> product      # or print(product) or cat(product)
[1] 11.66667
>
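
The comments above mention print() and cat() as alternatives to typing the variable's name. Here is a minimal sketch of how the three differ, reusing the b and x values from this section. It is meant only as an illustration.

```r
b <- 5/3
x <- 7
product <- b * x

product             # typing the name at the prompt prints it, with the [1] index
print(product)      # same display as typing the name
cat(product, "\n")  # bare value without the [1]; cat() needs an explicit newline
```

print() is the one to reach for inside functions and loops, where typing a bare name does not print anything.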

Those results are not particularly interesting, so let's move on to something better. R works with vectors. That means that x can be a single number or it can be a whole series of numbers. Suppose that we want to find the mean of a set of numbers. We can create the set using "c", the concatenation (or combination) function. That just means that it strings things together. For example:

> x <- c(12, 14, 16, 14, 19, 20, 23)
> x
[1] 12 14 16 14 19 20 23
> sum(x)
[1] 118
> length(x)
[1] 7
> mean(x)
[1] 16.85714
>

The first line creates the set of x values. The second prints them out. (The [1] on the left says that 12 is the first value.) The next line says that we want to calculate the sum of the values of x. In the old days you would have to say "start with total = 0, add x(1) to it, add x(2) to that, add x(3) to that, and so on." Here someone has written that kind of code, though far far more sophisticated than ours, and named it the "sum" function. When you type sum(...), it goes and finds that function and does the work. You don't have to think about it. The next two commands compute the length (the number of observations in x) and the mean. We could also get the mean by typing

> xbar <- sum(x)/length(x)
> xbar
[1] 16.85714
> # Or, even more simply
> xbar <- mean(x)
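
The "old days" approach described above can be written out as an explicit loop. This is only a sketch for comparison; the variable names total and value are my own, and in practice you would simply call sum().

```r
x <- c(12, 14, 16, 14, 19, 20, 23)

# Start with total = 0 and add each element of x to it in turn
total <- 0
for (value in x) {
  total <- total + value
}

total               # 118, the same as sum(x)
total / length(x)   # the mean, the same as mean(x)
```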

You probably aren't quite satisfied by my explanation of the [1] in the margin. Well, let's create a larger vector of random numbers and print them out. We have a random number generator that someone wrote, made into a function, and stuck into R, so creating those numbers is a breeze. I want 50 random numbers drawn from a normal distribution with mean = 35 and standard deviation = 7.

> y <- rnorm(50, 35, 7)
> y
  [1] 47.84491 34.26865 39.36875 25.22486 40.84229 37.31904 38.91220 36.36298 39.66853 27.86605 22.59084
 [12] 37.93739 29.37225 29.49127 32.81972 31.38824 26.29619 29.90322 37.35145 34.41677 22.94971 30.35482
 [23] 27.04887 37.99185 32.04899 38.93400 34.26171 29.55976 33.97462 28.77199 30.68097 43.87556 36.25555
 [34] 38.21872 36.91282 28.74664 36.05580 32.58514 19.76187 27.23863 40.32444 28.34713 33.25422 37.10347
 [45] 32.88337 29.75646 39.65179 30.31897 35.72330 29.27479

Here 47.84491 is the first number, the first entry in the second line (37.93739) is the 12th, the first entry in the third line (27.04887) is the 23rd, and so on. The trick is that the notation y[1] is a subscript to be read as "y sub 1." y[12] is "y sub 12," and so on.
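
The bracket notation is also something you can type yourself to pull out individual values. Here is a small sketch. The call to set.seed() is my own addition so that the draws repeat from run to run; the values will not match the ones printed above.

```r
set.seed(1)              # assumption: fixes the random draws so reruns match
y <- rnorm(50, 35, 7)    # 50 values, mean 35, standard deviation 7

y[1]             # the first value
y[12]            # the 12th value
y[c(1, 12, 23)]  # several subscripts at once
length(y)        # 50
```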

Vectors are Powerful

Above I said that R operates with vectors that can be of any length. "27" is a vector of length 1, "34, 56, 67, 46" is a vector of length 4, etc. A very powerful thing about R is that it doesn't care how long a vector is. It treats a multi-element vector the same way it treats a simple number. For example, we all know that 5/3 is about 1.6667. But what if I have a long vector?

> x <- c(3,6,8,12,15)
> x/3
[1] 1.000000 2.000000 2.666667 4.000000 5.000000
>

Notice that when x is a vector with 5 elements, x/3 is also a vector with 5 elements, each of which is the corresponding value of x divided by 3. Now let's get a bit fancier. I want a vector with 10 elements and a vector with 3 elements. Then I want to multiply them. How do you suppose that works?

> x <- c(1,2,3,4,5,6,7,8,9,10)
> y <- c(1,2,3)
> print(x*y)
 [1]  1  4  9  4 10 18  7 16 27 10
Warning message:
In x * y : longer object length is not a multiple of shorter object length

To put it technically, y gets recycled.

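We multiply 1*1. Then 2*2, then 3*3. 
But now we have run out of y values, so we start over. 
Multiply 4*1, then 5*2, then 6*3, then 7*1, 8*2, etc.
Because 10 is not a multiple of 3, R also prints a warning that the longer length is not a multiple of the shorter one, but it still returns the result.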
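
Recycling can be made visible by stretching y out by hand before multiplying. This is only a sketch of the idea, not how R does it internally; the names y_recycled and x6 are my own.

```r
x <- 1:10
y <- c(1, 2, 3)

# Repeat y until it is as long as x, which is what recycling amounts to
y_recycled <- rep(y, length.out = length(x))
y_recycled       # 1 2 3 1 2 3 1 2 3 1
x * y_recycled   # 1 4 9 4 10 18 7 16 27 10

# When the longer length is a multiple of the shorter one, there is no warning
x6 <- 1:6
x6 * y           # 1 4 9 4 10 18
```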

Working with vectors may not sound like such a great thing, but it can save an enormous amount of work. Suppose we have 100 random numbers in a vector named "y." I want to sum them, which is simple, and I want to get the sum of squares, i.e., the sum of each squared element of y. Below I show two ways of getting the latter.

> ybar <- mean(y)
> ybar
[1] 34.9916
> y2 <- y*y
> y2
  [1] 2289.1351 1174.3405 1549.8981  636.2936 1668.0928
  [6] 1392.7109 1514.1595 1322.2663 1573.5924  776.5168
 [11]  etc
> sum(y2)
[1] 126599.4
> # OR
> sumy2 <- sum(y*y)
> sumy2
[1] 126599.4
> 
> stdev <- sqrt((sum(y*y)-(sum(y)^2)/length(y))/(length(y)-1))
> stdev
[1] 6.480909
>

Some people like to write code like that. I think that it is too hard to read and to get all of the parentheses in the right places, so I would prefer to do it in individual steps just so I'm sure what I am really doing. Either way works.
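
Here is one way the one-line standard deviation might be broken into individual steps. It is a sketch only: the intermediate names (n, sumy, ss, variance) are my own, and y is freshly generated stand-in data, so the result will not match the 6.480909 shown above.

```r
set.seed(1)                # assumption: only so the example is repeatable
y <- rnorm(100, 35, 7)     # stand-in data, not the values printed above

n        <- length(y)             # number of observations
sumy     <- sum(y)                # sum of the values
sumy2    <- sum(y * y)            # sum of the squared values
ss       <- sumy2 - sumy^2 / n    # sum of squared deviations from the mean
variance <- ss / (n - 1)
stdev    <- sqrt(variance)

stdev
sd(y)    # R's built-in function should give the same answer
```

Comparing against sd() is a quick check that the parentheses ended up in the right places.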

Creating Vectors with rep and seq

There are two important functions that we use frequently. I am only going to touch on what they do, but type "?rep" or "?seq" to learn more. Suppose that you have data from 25 subjects, but you forgot to enter subject numbers. Then simply use "subjNum <- seq(1, 25)" (or the shorthand "subjNum <- 1:25"). subjNum is now a vector of 25 id numbers. Or suppose that you have data from 10 subjects in each of 3 groups, listed group by group. Then try "grp <- rep(1:3, each = 10)". Or suppose that the data were entered by trial, so you have subject 1 trial 1, subject 1 trial 2, subject 1 trial 3, subject 2 trial 1, subject 2 trial 2, subject 2 trial 3, and so on. Then use "trial <- rep(1:3, times = 10)". The results of these two rep calls are

  > grp <- rep(1:3, each = 10)
  > grp
   [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3
  > trial <- rep(1:3, times = 10)
  > trial
   [1] 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3 1 2 3
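
Here are a few more variations on seq and rep, as a sketch. The variable names evens, grid, and subject are hypothetical and chosen just for illustration.

```r
subjNum <- seq(1, 25)                 # 1, 2, ..., 25 (the same values as 1:25)
evens   <- seq(2, 20, by = 2)         # 2, 4, ..., 20
grid    <- seq(0, 1, length.out = 5)  # 0.00 0.25 0.50 0.75 1.00

# Subject and trial labels for 10 subjects with 3 trials each
subject <- rep(1:10, each = 3)        # 1 1 1 2 2 2 ... 10 10 10
trial   <- rep(1:3, times = 10)       # 1 2 3 1 2 3 ... 1 2 3

subject
trial
```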

Kinds of Variables

There are several other kinds of variables besides numeric and ordinal, but the only two that I will mention here are the "logical" class and the "character" class. Logical variables are variables that only take on the value TRUE or FALSE (always capitalized). Sometimes you will have a line of code that says something like "if z is true, do something. Otherwise do something else." This is where you are using logical variables. On the other hand, if you have a variable that is a vector of names, those names are of class "character" because they are made up of characters.
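
Here is a brief sketch of both classes in use. The names in subjNames are hypothetical, and z is set by hand just to show the if/else pattern.

```r
z <- TRUE
class(z)          # "logical"

if (z) {
  message("z is TRUE, so do something")
} else {
  message("z is FALSE, so do something else")
}

# Comparisons produce logical vectors, one element per value
x <- c(12, 14, 16, 14, 19, 20, 23)
x > 15            # FALSE FALSE TRUE FALSE TRUE TRUE TRUE
sum(x > 15)       # 4, because TRUE counts as 1 and FALSE as 0

subjNames <- c("Ann", "Bob", "Carla")   # hypothetical names
class(subjNames)  # "character"
```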

Where does the data come from?

In the next section we will look at how to enter data into R. You have seen one way [x <- c(3,5,6,9)], but that is only useful if you have a small number of observations.
