Using the R Programming Environment

Introduction

In recent years a programming environment named R has become more and more popular. It grew out of a language named "S," developed at Bell Labs by John Chambers and colleagues; the version described by Becker, Chambers, and Wilks appeared in the late 1980s. Along the way a commercial implementation named S-PLUS was introduced, along with R, a freely available implementation of the S language. R appears to have made S obsolete. We will concentrate on R because it is freely available and comes with a huge collection of functions that other people have added to it and are continuing to add. I don't know the formal distinction between a language and a programming environment; people would call Fortran a language, but we call R a programming environment. Similarly, given my (old) background, I would call what we write a "program," whereas others would call it a code snippet or a command file. I'll stick with "program" even if that is out-of-date.

It is important to recognize that R came out of the Unix operating system environment, which explains some of the features you will come across. For example, Unix (and Linux, and R) are case-sensitive, so "Print" and "print" are two different commands; "Print" doesn't exist and will give an error message.
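A minimal illustration of case sensitivity, using only base R. The object names x and X are made up for the example; tryCatch() is used only so that the error message can be captured and printed instead of stopping the program.

```r
# R is case-sensitive: print() exists, but Print() does not.
x <- c(3, 1, 4)
print(x)                        # works: prints [1] 3 1 4

# Calling the misspelled name raises an error; tryCatch() captures the
# error message so the rest of the program still runs.
msg <- tryCatch(Print(x), error = function(e) conditionMessage(e))
print(msg)                      # the message says R could not find "Print"

# exists() confirms which spelling R actually knows about.
print(exists("print"))          # TRUE
print(exists("Print"))          # FALSE

# Object names are case-sensitive too: X and x are two different objects.
X <- 10
print(x)
print(X)
```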

Whereas most people generally write a program and then execute it, Unix types frequently like to work with what is called "the Command Line." This means that you type a command and it is executed, then you type the next command and it is executed, and so on. We will do some of that, but it takes some getting used to. We will generally combine commands into a "program," or "code snippet," and execute it all at once. Finally, Unix folks have a command called "man" that prints out help ("manual") pages. So if you don't know how to change your working directory, you type "man cd" and it will tell you. (Of course that assumes that you know that the name of the command is "cd," but doesn't everyone know that?) R uses the same kind of help system, although the command is "help(setwd)" or, equivalently, "?setwd". That's great because help is always available, but it's bad because the help pages are not always as clear as you would like; in fact, some of them make no sense to me.
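The same ideas in R, as a small sketch: each line below can be typed one at a time at the > prompt (the command-line style), or all of them can be saved in a file and run together as a program with source(). The file name first.R in the comment is hypothetical. The sketch uses R's getwd() and setwd() to look at and change the working directory, and then looks up the help page for setwd() without opening it.

```r
# Run line by line at the > prompt, or save as a file (e.g. the
# hypothetical first.R) and run the whole thing with source("first.R").

old_dir <- getwd()              # the directory R currently reads from and writes to
print(old_dir)

setwd(tempdir())                # change to R's temporary directory
print(getwd())

setwd(old_dir)                  # change back to where we started
print(getwd())

# Two equivalent ways to open the help page for setwd():
#   help(setwd)
#   ?setwd
# help() also returns an object, so we can check that the page exists
# without displaying it.
h <- help("setwd")
print(length(h) > 0)
```

At the prompt you would normally just type ?setwd and read the page that appears.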

These pages will not make you an accomplished R programmer, nor are they intended to. My intent is to show you how to read in data, how to transform them if necessary, and how to use them to perform statistical calculations. My primary goal is to explain some of the language that you will encounter when you read through code snippets that I give in the text and on Web pages. Although R is not a statistical language, its greatest development has been in the fields of statistics, about which I know a reasonable amount, and bioinformatics, about which I know almost nothing. There is almost nothing in statistics that you can't do in R, and if you want to do something even slightly complicated, such as computing confidence limits on an effect size measure, someone (Ken Kelley at Notre Dame) has already been there ahead of you and has written functions to do just that. You just have to install his package ("MBESS"), call the appropriate function, and give it the right information.
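Using a contributed package such as MBESS involves the same two steps every time: install it once, then load it in each session where you need it. The sketch below assumes an internet connection and uses the CRAN mirror https://cloud.r-project.org. The ci.smd() call, and the effect size of 0.5 with 30 cases per group, are illustrative assumptions rather than values from the text; check help(ci.smd) after loading MBESS for the function's actual arguments and output.

```r
# Step 1: install MBESS from CRAN, but only if it is not already installed
# (this needs an internet connection the first time).
if (!requireNamespace("MBESS", quietly = TRUE)) {
  install.packages("MBESS", repos = "https://cloud.r-project.org")
}

# Step 2: load the package at the start of every session that uses it.
library(MBESS)

# Assumed usage: confidence limits on a standardized mean difference
# (Cohen's d). The values d = 0.5 and n = 30 per group are made up.
ci <- ci.smd(smd = 0.5, n.1 = 30, n.2 = 30, conf.level = 0.95)
print(ci)
```

Typing help(package = "MBESS") lists every function the package provides.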

Good Sources to Use

I want to stress that this is not a course in R, and I don't expect you to become expert in its use. I want you to be able to take simple code that I give you, modify it as necessary to fit your particular problem, and examine the results. You can go in and change individual lines of the code and see if anything interesting happens. You can obtain a better understanding of the statistical material in the text by examining the results of individual programs. But if you want to go further and would like a decent text for R (and I hope you do), there are a couple that I can recommend. Perhaps my favorite is a book I recently found by Andy Field, at the University of Sussex in England. It is entitled Discovering Statistics Using R and is excellent. It runs to 957 pages and weighs a ton, but I really like it, and you have probably never read a statistics book written in such an engaging style. The second edition of Robert Kabacoff's R in Action is also very good. It is very well written and more than sufficiently comprehensive for your needs. Kabacoff also maintains a set of Web pages at http://www.statmethods.net/ that can be very helpful; I go there often. Many books and Web pages about R have titles such as "Learning Statistics by way of R." Kabacoff's work is more on the order of "Learning R by way of Statistics." I think that is a better way of learning R. Everitt and Hothorn (2006), A Handbook of Statistical Analyses Using R, is a good, gentle introduction. Lastly, Fox and Weisberg's second edition of An R Companion to Applied Regression begins with several chapters that are an excellent introduction to R, even if you may not be particularly interested in applied regression, though you will be drawn into it later in my book.

But before you run out and buy another book, note that there are some excellent resources on the Web. Below is a list of a few of them, but a quick Web search will find many, many more.

  • http://cran.r-project.org/doc/contrib/Owen-TheRGuide.pdf
  • http://www.cyclismo.org/tutorial/R/
  • http://ww2.coastal.edu/kingw/statistics/R-tutorials/

An Outline of these Pages

I am going to split these pages into several different units just so that no unit becomes too long. You can always click your way there from here. I will begin with a page on downloading R and related files. As I said elsewhere, if you can install iTunes you can install R. Along with R it is helpful, but not required, to have a good editor. I will discuss one of those (RStudio) in that section.

Next I will examine a simple example in which you enter some commands, set up some data, and run an analysis. Because this is the beginning, and many people will be using these pages alongside an ongoing statistics course, the first few examples will involve fairly elementary statistics. In this section I am not going to say much about the specific commands we will use. I just want you to see what can be done and what the commands look like.

In the following section I will lay out the basic information about reading in data, creating new variables, doing some simple calculations, and printing out results. I cannot possibly burden you with everything that R will do, but I will cover the basics so that you will be able to understand code when I present it.

Once you have R, have seen some very basic examples, and have a set of data to draw from, I will present some more complex examples. Here again, I am much more interested in having you understand what the code does and what it produces than I am in teaching a lot about specific commands. I will do that as we go through the book, and you can see that kind of material by downloading the R code that I include with each chapter.

I have chosen not to use a graphical user interface with pull-down menus, etc., in this book. But one exists, and it is called RCommander. If you would like to try it, I have a web page that provides an introduction to it. You can find that at http://www.uvm.edu/~dhowell/fundamentals9/Supplements/Rcmdr.html. There I also give you the links to two excellent manuals.

One of the things that R does best is graphics. We will have a whole section devoted to creating meaningful graphs. My goal is to give you annotated code so that you can later steal that code, change the variable names and the text, and produce the same kinds of graphs. Personally, I find it easiest to learn by looking at what someone else did and then adapting it to my needs. That is what this section will attempt to do.

Finally, aside from the pages listed below, I have a complete set of pages that deal with the material in each chapter. For the most part I have used (or at least started with) the code that you find displayed in each chapter as you go along. But I tend to get carried away, so I go on from there. Please don't be overwhelmed by the fact that I don't know when to stop. If you feel that you have had enough for that chapter, then just stop, at least for a while. The quickest way to get to those pages is to go to Introducing R (this page).
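To give a flavor of the basics described in the outline (reading in data, creating a new variable, doing simple calculations, printing results, and drawing a graph), here is a minimal sketch. The data set and its variable names (id, group, pre, post, gain) are made up for illustration, and the data are typed in with text= so that the example runs without a data file; with real data you would give read.table() a file name instead. The graph is written to a PDF file in R's temporary directory so the program runs even without a screen.

```r
# Read a small, made-up data set; header = TRUE takes variable names
# from the first line.
scores <- read.table(header = TRUE, text = "
  id group pre post
   1     A  12   15
   2     A  14   18
   3     A  11   13
   4     B  13   13
   5     B  10   12
   6     B  15   14
")

# Create a new variable from existing ones.
scores$gain <- scores$post - scores$pre

# Simple calculations: overall mean and SD, then the mean gain per group.
print(mean(scores$gain))
print(sd(scores$gain))
print(tapply(scores$gain, scores$group, mean))

# Print the whole data frame, including the new column.
print(scores)

# A basic graph: boxplots of gain for each group, saved as a PDF file.
pdf(file.path(tempdir(), "gain_boxplot.pdf"))
boxplot(gain ~ group, data = scores,
        xlab = "Group", ylab = "Gain (post - pre)",
        main = "Gain scores by group")
invisible(dev.off())            # close the file so the graph is written
```

To see the graph on screen instead, leave out the pdf() and dev.off() lines. Changing the variable names and axis labels is all it takes to reuse the same code with your own data.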

  • Downloading R
  • Downloading Files
  • Simple Examples to Get You Going
  • Entering and Reading Data
  • RCommander--if you want a GUI
  • Basic Graphics


  • dch:
