Entering Data into R

There are several different ways of entering data, but we are only going to touch on a couple of them. One you have already seen, which is to use a command such as "x <- c(4,7,8,9)". You can do that for all of your variables if you want to, but that quickly becomes a nuisance.
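As a minimal sketch of that direct approach, the lines below type in two short variables and bind them into one data frame. The values and the names group and small are made up purely for illustration.

```r
# Enter each variable by hand with c() (illustrative values only)
x <- c(4, 7, 8, 9)
group <- c(1, 1, 2, 2)

# Combine the vectors into one data frame, one column per variable
small <- data.frame(x = x, group = group)
print(small)

# Columns can then be pulled back out with the $ convention
mean(small$x)
```

This works for a handful of values, but every number has to be typed inside the program itself, which is why reading from a file is usually the better choice.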

Another way is to take any old text editor (Notepad will even do) and create a file with the data in different columns. You can put a tab or a couple of spaces between columns, but try to make them look neat. I strongly suggest that the first row of data be the variable names. For example, your file might look like

   ID    Score    Group
   01      4        1
   02      9        1
   03     12        1
   04      8        1
   05      9        1
   06     13        1
   07     12        1
   08     13        1
   09     13        1
   10      7        1
   11      6        1
   12      7        2
   13      8        2
   ...    ...      ...
   23      8        2

which contains the data from a study by Aronson et al. (1998) on stereotype threat. Once you have entered your data and saved it to a file somewhere, you can enter either of the commands

data1 <- read.table(file.choose(), header = TRUE) 
data2 <- read.table("Tab15-3.dat", header = TRUE)

The second command assumes that Tab15-3.dat is located in your default (working) directory. I show below how to change the default directory, or you could point to a different directory by using something like "data3 <- read.table("../DataFiles/Tab15-3.dat", header = TRUE)".
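If you want to try read.table() without hunting for a file, here is a small self-contained sketch. It writes three rows in the layout described above to a temporary file and reads them back. The names tmp and demo are hypothetical stand-ins for your own file and data frame.

```r
# Write a small whitespace-separated file to a temporary location
# (tmp stands in for a file you would normally type in a text editor)
tmp <- tempfile(fileext = ".dat")
writeLines(c("ID    Score    Group",
             "01      4        1",
             "02      9        1",
             "03     12        1"), tmp)

# header = TRUE tells read.table() that row one holds the variable names
demo <- read.table(tmp, header = TRUE)
names(demo)
nrow(demo)

# Remove the temporary file when finished
unlink(tmp)
```

Note that the number of spaces between columns does not matter to read.table(); any run of spaces or tabs is treated as one separator by default.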

The file.choose() command will cause R to open a dialog box so that you can hunt around for the file you want. When you find it, just click on it and it will be read in and be known within R as data1. The header = TRUE argument tells R that the first line contains variable names. Now rather than print out the whole file to see what we have, we can print just the first few rows with head(data1) and get

> data1 <- read.table(file.choose(), header = TRUE)
> head(data1)
    ID  Score  Group
  1  1      4      1
  2  2      9      1
  3  3     12      1
  4  4      8      1
  5  5      9      1
  6  6     13      1
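A few other commands are handy for checking what was read in. The sketch below builds a stand-in data frame from the six rows shown above (the name demo is hypothetical) so that it runs without a file.

```r
# Stand-in for the first six rows of the data read from the file
demo <- data.frame(ID = 1:6,
                   Score = c(4, 9, 12, 8, 9, 13),
                   Group = rep(1, 6))

head(demo, 3)   # first three rows only
str(demo)       # name and type of each column
dim(demo)       # number of rows, then number of columns
```

Checking str() right after reading a file is a cheap way to catch a column that came in as text when you expected numbers.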

Let me digress a second to point out that if you are in RStudio you can go to "Tools/Global Options/General" and set the default working directory. Then when you name a file, or need to "choose" a file, RStudio begins by looking in that directory. That can save you a lot of hunting. (The R application itself does something similar under the "Misc" menu.) The default directory approach only works, however, if the data are located on your computer. What if they are on the web?
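Staying with the working directory for a moment, it can also be set from the command line with setwd(). The sketch below is only an illustration: tempdir() stands in for whatever folder holds your data, and example.dat and wd_data are made-up names.

```r
old <- getwd()              # remember the current working directory
setwd(tempdir())            # tempdir() stands in for your own data folder

# Create a tiny file in that folder so there is something to read
write.table(data.frame(ID = 1:2, Score = c(4, 9)), "example.dat",
            row.names = FALSE, quote = FALSE)

# A bare file name is now looked up in the working directory
wd_data <- read.table("example.dat", header = TRUE)

unlink("example.dat")       # tidy up the illustrative file
setwd(old)                  # put the working directory back
```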

Downloading Data from the Internet

I have saved virtually all of the data files in the book to files on the web, and I give code that reads them. All of those data files have the names of the variables in row one, so we need to take that into account. You use the address of that file (its URL) to specify where to find the data.

data4 <- read.table("http://www.uvm.edu/~dhowell/methods9/DataFiles/Tab15-3.dat", 
header = TRUE)

But I need to make a couple of comments about that. First of all, notice that the command uses forward slashes. If you were to highlight a file address in Windows and copy and paste that into your command, it would not work. Windows wants its slashes to be backward "\", whereas R wants them forward "/". So always use forward slashes. Furthermore, as I say elsewhere, it is good practice on the Internet to use "https://" instead of "http://". BUT, for some reason R on a Mac or Unix machine, but not on Windows, doesn't know what "https" means. (You can do some convoluted things to make it work, I think, but just drop the "s" instead.) That was the case with the older versions of R I was using; recent versions generally read "https" addresses without trouble. In any case this is not a Mac or Unix thing, it is an R thing. If you just want to use your web browser on a Mac, I suggest that you do try to use "https" to gain security if possible, although not all web sites like that. The problem only comes when you run R on a Mac. I don't know what Unix does for web sites because I haven't used Unix in years.
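To illustrate the point about slashes, here is a small sketch that converts a pasted Windows-style address into one R will accept. The path itself is hypothetical. Inside an R string a backslash has to be doubled, which is one more reason to prefer forward slashes.

```r
# A hypothetical Windows path; "\\" in the source is a single backslash
win_path <- "C:\\Users\\me\\DataFiles\\Tab15-3.dat"

# Swap every backslash for a forward slash
r_path <- gsub("\\", "/", win_path, fixed = TRUE)
r_path

# file.path() builds a path with forward slashes from its pieces
rel_path <- file.path("DataFiles", "Tab15-3.dat")
rel_path
```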

Now you have your data read in, but perhaps not quite in the way you expect. One of your variables is named Score, but if you ask R to type out its values you will get

   > Score
   Error: object 'Score' not found

As I have said elsewhere, the problem (if there is one) is that data1 is what is called a data frame. A data frame is basically a table with a bunch of columns, and Score is one column of that table. You could type "data1$Score" (if you had named the data "data1") and everything would be fine, but putting "data1$" in front of each variable name is a pain. But if you type

   > attach(data1)
   > Score
    [1]  4  9 12  8  9 13 12 13 13  7  6  7  8  7  2  6  9
   [18]  7 10  5  0 10  8

then Score will be a legitimate variable by itself and you can now print it out as I did here. This is true whenever you have a data frame. You need to either use the "data1$Score" convention or attach the data frame. For these pages I will usually use the "attach()" command, but it is not without its problems. See my discussion of "attach()".
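For comparison, here is a sketch of reaching a variable without attaching anything. The data frame is a four-row stand-in with made-up Group codes, and with() is one common alternative rather than the approach used on these pages.

```r
# Four-row stand-in for the data frame (Group codes are illustrative)
data1 <- data.frame(ID = 1:4,
                    Score = c(4, 9, 12, 8),
                    Group = c(1, 1, 2, 2))

# The $ convention names the data frame every time
mean(data1$Score)

# with() looks the names up inside data1 for this one expression only
with(data1, mean(Score))

# Group means, again without attaching
group_means <- with(data1, tapply(Score, Group, mean))
group_means
```

Because with() only applies to the expression inside it, nothing is left attached afterward.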

A Word of Warning

Suppose that you execute a program like this, perhaps make some changes, and then execute it again. You will get what looks like an error message (but isn't really). If I have a variable named CaseNum sitting there, and then attach a data frame that also has a variable named "CaseNum" in it, that copy of CaseNum will mask (meaning hide) the prior variable named CaseNum. If they are different, you may have a problem because you can't easily get to the old CaseNum (unless you detach(data1)). Every time you rerun a program that attaches something, you will get a message like

The following object(s) are masked from 'data1 (position 3)':

ADDSC, CaseNum, Dropout, EngG, EngL, Gender, GPA, IQ, Repeat, SocProb

Most of the time that is not a problem because you are just masking the one variable with an exact copy. But the message will often lead you to think that you have made an error.
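The situation is easy to reproduce. In the sketch below, grades is a hypothetical three-row data frame. Attaching it twice produces the "masked" message, and one detach() per attach() cleans things up.

```r
# Hypothetical data frame used only to show the masking message
grades <- data.frame(CaseNum = 1:3, GPA = c(3.1, 2.8, 3.6))

attach(grades)
attach(grades)   # second attach: R reports CaseNum and GPA as masked

# Two copies of grades now sit on the search path
n_attached <- sum(search() == "grades")
n_attached

# Detach once for each attach to remove both copies
detach(grades)
detach(grades)
"grades" %in% search()
```

Putting a matching detach() at the end of a program is one way to keep the message from appearing the next time the program is run.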

Other Methods of Data Entry

There are a number of different ways of entering data, and I have only focused on the simplest. I'll discuss one more.

If you are familiar with SPSS or some other packages, you are used to seeing a spreadsheet where you can enter data. Well, you can do that in R if you want to. First create an empty data frame with a command such as "newdata <- data.frame(age = numeric(0), group = character(0), dv = numeric(0))". Then edit newdata with "newdata <- edit(newdata)". But be careful about this. Don't just type "edit(newdata)". You have to assign the edited result to an object name. In this case I just used the old name, but I didn't have to. And suppose that you wanted to add another variable? Just keep moving to the right on the edit screen and add the new stuff.
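Because edit() opens an interactive window, it cannot be shown running on a page. As a rough non-interactive sketch of the same idea, the lines below start from the empty data frame and fill it in by command. The rows and the extra variable dv2 are made up for illustration.

```r
# Start with an empty data frame that has the three columns named above
newdata <- data.frame(age = numeric(0), group = character(0), dv = numeric(0))
str(newdata)

# Add rows one at a time (what typing into the edit screen would do)
newdata <- rbind(newdata, data.frame(age = 21, group = "a", dv = 4.5))
newdata <- rbind(newdata, data.frame(age = 34, group = "b", dv = 6.0))

# Add another variable, like moving right on the edit screen
newdata$dv2 <- c(10, 12)
newdata
```

As with edit(), each step assigns its result back to newdata; calling rbind() without the assignment would leave the data frame unchanged.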
