More Examples Using R

Introduction

The following discussion assumes that you have downloaded and installed the editor RStudio. If you have not, you can still use what follows by using the "new script" command from the R screen. RStudio just makes things a great deal easier, and is what I used to create the images below.

Now that you have downloaded and installed R and RStudio, and looked at some simple examples and a page on how to read or enter data, we'll move on. Start up RStudio, and R will open automatically. Your screen should look something like the following. I suggest that you drag the borders to make them a bit narrower. It just saves aggravation.

We will start with something simple, and it won't be "Hello World." In Chapter Two of my book, Statistical Methods for Psychology, 9th ed., I refer to a study by Langlois and Roggman on attractiveness ratings assigned to photographs. The data for 20 participants follow.

   1.20, 1.82, 1.93, 2.04, 2.30, 2.33, 2.34, 2.47, 2.51, 2.55,
   2.64, 2.76, 2.77, 2.90, 2.91, 3.20, 3.22, 3.39, 3.59, 4.02

We need to read these data into R so that we can work with them. There are only 20 pieces of data, so we can enter them directly rather than creating and reading a data file. On the RStudio screen in the upper left enter


 data4 <- c(1.20, 1.82, 1.93, 2.04, 2.30, 2.33, 2.34, 2.47, 2.51, 
 2.55, 2.64, 2.76, 2.77, 2.90, 2.91, 3.20, 3.22, 3.39, 3.59, 4.02)
 print(mean(data4))
   

You don't have to lay the data out as neatly as I have. Type until you come near the end of a line, insert a comma as needed, hit return (enter), and keep typing. Don't forget the closing parenthesis followed by a carriage return. The "c" command means "concatenate", so the variable data4 will be a set of 20 numbers. (Other people say it stands for "combine," but I like "concatenate"--it is a much more impressive word.) If you see a plus sign on the left margin, that is just R's way of indicating a continuation line.

Once you have done this, put your cursor on the first line and click on the "Run" button, or press Command-Enter on a Mac (Ctrl-Enter on Windows), once per line. The results will appear in the lower window.
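If you want to confirm that all 20 values made it into the variable, a quick check along the following lines should do it. This is only a sketch: the names n, first_half, second_half, and rejoined are mine, chosen for illustration.

```r
# The 20 attractiveness ratings entered above
data4 <- c(1.20, 1.82, 1.93, 2.04, 2.30, 2.33, 2.34, 2.47, 2.51, 2.55,
           2.64, 2.76, 2.77, 2.90, 2.91, 3.20, 3.22, 3.39, 3.59, 4.02)

n <- length(data4)      # how many values c() put into the vector
print(n)                # compare this with the 20 you meant to type

# c() will also join vectors that already exist
first_half  <- data4[1:10]
second_half <- data4[11:20]
rejoined <- c(first_half, second_half)
print(identical(rejoined, data4))   # TRUE if nothing was lost or reordered
```

If the count is not 20, the most likely culprit is a missing comma or a dropped value at the end of a line.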

That probably doesn't leave you all excited about your programming skills, so let's go a step further. Suppose that x holds a set of numbers that you have already entered. If you type

   
    xbar <- mean(x)
    print(xbar)

you will see the following in the bottom half of the screen.

   
    > xbar <- mean(x)
    > print(xbar)
    [1] 8.571429

Notice that as it runs R prints out your commands (preceded by the ">" prompt) as well as the result.

What is the difference between "print()" and "cat()"? With "print" we just got the numerical mean, and we can tell what it is only because the "print(xbar)" command was echoed above it. With "cat" you can label the result yourself. For example

    > cat("The mean of this sample is = ", xbar)
    The mean of this sample is =  8.571429

And while I'm on that, if I include \n within the quotation marks, it will go to a new line, which is often a neater way of printing things out.

    > cat("The mean of this sample is = \n", xbar)
    The mean of this sample is = 
     8.571429
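A few variations on "cat" are worth trying. The sketch below assumes that xbar holds the mean shown above (I simply assign the value so that the example runs by itself), and the name msg is my own.

```r
# xbar is assumed to hold the mean from the output above
xbar <- 8.571429

# Round to two decimals and finish the line with "\n"
cat("The mean of this sample is =", round(xbar, 2), "\n")

# sep = "" turns off the space that cat() normally puts between pieces
cat("Mean: ", round(xbar, 2), "\n", sep = "")

# paste() builds the same kind of text but returns it instead of printing,
# so it can be stored and reused
msg <- paste("The mean of this sample is =", round(xbar, 2))
print(msg)
```

Because cat() inserts a space between its arguments by default, you do not need a trailing space inside the quotation marks unless you use sep = "".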

Let's back up a bit--Entering data

There are several different ways of entering data, other than reading them from a file, but we are only going to touch on two of them. One you have just seen, which is to use the "x <- c(4,7,8,9)" command. You can do that for all of your variables if you want to, but that becomes a nuisance. I already wrote about the "read.table" command on the ReadingData.html page. You can create any data file that you want with any text editor by putting the variable names on the first line and then entering the data in columns below them.

Your data might look like

   ID	Score	Group
   1	4	1
   2	9	1
   3	12	1
   5	9	2
   ...	...	...
   23	8	2
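To see this layout work from start to finish, here is a minimal sketch that writes the few rows shown above to a temporary tab-delimited file and reads them back with read.table(). The temporary file and the name tmp are only there to make the example self-contained; with your own data you would give the path to your own file.

```r
# Write the rows shown above to a temporary tab-delimited file.
# The temporary file is purely illustrative; use your own file path.
tmp <- tempfile(fileext = ".dat")
writeLines(c("ID\tScore\tGroup",
             "1\t4\t1",
             "2\t9\t1",
             "3\t12\t1",
             "5\t9\t2"), tmp)

# header = TRUE tells R that the first line holds the variable names
data1 <- read.table(tmp, header = TRUE)
print(data1)
str(data1)    # one column per variable, one row per line of data
```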
   

You then read the data file as I did in the ReadingData.html page. Now you have your data read in, but perhaps not quite in the way you expect. One of your variables is named Score, but if you ask R to type out its values you will get

  
  > Score
   Error: object 'Score' not found

The problem (if there is one) is that data1 is what is called a data frame. As I said on a previous page, a data frame is basically a file with a bunch of columns, and Score is part of that file. You could use that awful attach() command, which would make those variables available--and all set to cause trouble-- but I strongly recommend against it. (For more about "attach()," click on attaching.html.) If you want to use the Score variable, for example if you want its mean, then just use the mean(data1$Score) command. Notice the $ in that command. So add the name of the data frame and a $ to the variable name by typing


   > data1$Score
    [1]  4  9 12  8  9 13 12 13 13  7  6  7  8  7  2  6  9  7 10  5  0 10  8
   > 

then data1$Score will be a legitimate variable by itself and you can print it out as I did here. This is true whenever you have a data frame. When I am going to refer to a variable frequently, and don't want to keep adding "data1$" to the beginning, I often use something like Score <- data1$Score. That makes Score a regular variable, a copy of the one in the data frame, and you can use it without putting data1$ in front of it. But, again, be sure that you keep your variables clean and not confused with other variables by that same name.
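The sketch below pulls these ideas together. The small data frame is a stand-in that I made up for illustration, using the first five scores from the output above. It also shows with(), a base R function that lets you use the column names directly without resorting to attach().

```r
# A small illustrative data frame (first five scores from the output above)
data1 <- data.frame(ID = 1:5,
                    Score = c(4, 9, 12, 8, 9))

mean(data1$Score)        # refer to the column through the data frame

Score <- data1$Score     # a separate copy of the column
Score[1] <- 99           # changing the copy ...
data1$Score[1]           # ... leaves the data frame untouched

# with() evaluates an expression inside the data frame without attach();
# here Score means the column in data1, not the copy made above
with(data1, mean(Score))
```

The middle lines are the reason for the warning about keeping variables clean: once you make a copy, the copy and the data frame can drift apart without R telling you.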

Other methods of data entry

There are many other ways to enter data. One is by way of an Excel spreadsheet. Another is by way of the edit command. Try the following alternative commands, one at a time, on the command line to see what happens.

   
   edit(data1)
   Newfile <- edit(data.frame())
   AnotherFile <- edit(data1)
   write.table(AnotherFile, file = "Davesfile.dat", row.names = FALSE)

The first command opens an existing data frame in an edit window; because the result is not assigned to anything, any changes you make are displayed but not saved. The second brings up an empty edit window and stores what you enter in a data frame named Newfile. The third opens data1 to be edited and saves the result as a data frame named AnotherFile. The last writes the data frame that I have just created to a file named "Davesfile.dat." Be sure to give it a more complete address or you won't know where it will end up. If you have set a default directory, the file will appear there.
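Here is one way to be sure where the file ends up. It is only a sketch: the little data frame stands in for whatever you created with edit(), and I use tempdir() simply so that the example runs anywhere. Substitute your own folder. The names outfile and check are mine.

```r
# Illustrative data frame standing in for the edited AnotherFile
AnotherFile <- data.frame(ID = 1:3, Score = c(4, 9, 12))

# Build a full path so there is no doubt where the file goes.
# tempdir() is used only for the example; substitute your own folder.
outfile <- file.path(tempdir(), "Davesfile.dat")
write.table(AnotherFile, file = outfile, row.names = FALSE)

# Read the file back to confirm that it was saved as expected
check <- read.table(outfile, header = TRUE)
print(check)

getwd()    # the folder R uses when you give only a file name
```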

Other Simple Commands

We are not limited to just printing out means. There are lots of other descriptive statistics that have their own functions. Assume that we have a variable named Fred. Some of these statistics are shown below along with the results that R prints out.


   > Fred <- c(23, 26, 45, 23, 54, 34, 67, 8, 35, 41, 42)
   > mean(Fred)
   [1] 36.18182

   > length(Fred)  # reports the number of observations in Fred.
   [1] 11

   > var(Fred)
   [1] 265.3636

   > sd(Fred)
   [1] 16.28999

   > hist(Fred)

   


I cheated here a little bit. The graphic may not come out on your R console. It may come out in its own window. (I don't know why this sometimes happens, but on a Mac it is often related to "XQuartz".)
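If you are curious about what var() and sd() are doing, the following sketch computes the same quantities from the usual formula, the sum of squared deviations from the mean divided by n - 1. The names var_by_hand and sd_by_hand are mine.

```r
Fred <- c(23, 26, 45, 23, 54, 34, 67, 8, 35, 41, 42)

n    <- length(Fred)
xbar <- mean(Fred)

# Sum of squared deviations from the mean, divided by n - 1
var_by_hand <- sum((Fred - xbar)^2) / (n - 1)
sd_by_hand  <- sqrt(var_by_hand)

print(var_by_hand)    # compare with var(Fred)
print(sd_by_hand)     # compare with sd(Fred)

# all.equal() compares numbers while allowing for tiny rounding differences
all.equal(var_by_hand, var(Fred))
```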

You Actually Know More than You Think

Those few commands that you just saw will take you a long way. You could, for example, do many of the exercises in Chapter Two without learning any more. You could probably guess at a few other functions such as median(Fred). Try typing sqrt(Fred). I bet that isn't quite what you thought that you would get. It gives the square root of each of the values in the variable Fred.
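A short sketch makes the distinction concrete. Functions such as mean() and median() boil the whole variable down to one number, while sqrt() works on each value separately. The name roots is my own.

```r
Fred <- c(23, 26, 45, 23, 54, 34, 67, 8, 35, 41, 42)

roots <- sqrt(Fred)     # one square root for each value in Fred
length(roots)           # same length as Fred
round(roots, 2)         # easier to read with two decimals

median(Fred)            # a single number, like mean()
sqrt(mean(Fred))        # square root of the mean: also a single number
```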



dch
