Graphical Demonstrations for Class Use


David C. Howell


This page introduces the reader to a variety of simple demonstrations that are programmed in R. Many of them derive from a paper by Bowman, Crawford, Alexander, & Bowman (2007). Although they are written in R, you do not need to know much about R in order to run them. In fact, the first part of this document covers how to get R (it is free), how to install it, and how to run it. I no longer have a Macintosh, so I can't speak authoritatively about running R on a Mac, but there is plenty of help on the R site (http://cran.r-project.org/) to handle any problems that are likely to arise.

What is R?

R can be called a "computing environment," a "computing language," a "scripting language," or anything else you care to dream up. I would guess that I could start an argument in the R community just by asking what to call it. Basically it is a language like Fortran or Basic (that will really get people upset!) in which you can write programs. It is far more powerful, however, because of its huge collection of built-in functions. It is also an object-oriented language, which makes it extremely powerful. But it is also a language with a very flat learning curve, which makes it more difficult to learn than stuff like Visual Basic or Fortran. At least I think so. But you are not going to have to learn very much in order to run these demonstrations.

Where can I get R?

That is the easy part. Simply go to http://cran.r-project.org/ and follow the obvious links. Right at the top is one that says "Download and Install R," and it is a pretty good bet that is what you want. Just choose your operating system and click on the appropriate link. That will take you to a page that will let you download either the base system or the contributed packages. For now just stick with the base system. Right at the top of the next page is the download link that you will want. At the moment the version is R 2.8.1, but that will change. If you have Vista (fortunately I don't) you may want to click on the link below the download button which talks about Vista. It seems that the only problems are related to administrator privileges. Just read what they have to say if this problem arises.

What else do I need?

What you will have downloaded is the base system with a bunch of extra packages thrown in. A package is a set of functions, programs, scripts, whatever, that go beyond the base system. For example, my daughter is an actuary, and there is a package called "actuar" that has all sorts of cool stuff that actuaries need but that the rest of us can ignore. So she would likely download that specific package. There are other packages for graphing, statistical analyses, etc., and some of these are loaded automatically for you when you install R. For much of what you might do with R this is sufficient. But we are going to need some more. So after your installation is complete, start it up. I am skipping a bunch of stuff on setting defaults. You can read about that at the R web site.

Once you have started R, either using the icon that magically appears on your desktop or by hunting through the various folders and clicking on Rgui.exe, you will have a very basic Graphical User Interface. Those are the things that let you select a file to open, let you copy and paste, and that sort of stuff. This one is pretty bare bones. One of the drop-down menus will be labeled Packages. Go to that and click "Install package(s)." Select a site near you, not one halfway around the world, and wait until the (huge) list of available packages comes up. Select "rpanel" and download that. While you are at it, download "rgl" as well. These will download to your machine and install themselves--you might have to click on something to start the installation. Now you're done.
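If you would rather type than click, the same installation can usually be done from the R console. What follows is only a minimal sketch using the standard install.packages() function. The names "needed" and "missing_pkgs" are my own, and the install step is written so that it only runs in an interactive session.

```r
# Packages that the demonstrations rely on
needed <- c("rpanel", "rgl")

# Which of them are not yet installed on this machine?
missing_pkgs <- needed[!(needed %in% rownames(installed.packages()))]
print(missing_pkgs)

# Install anything that is missing (R will ask you to pick a mirror)
if (length(missing_pkgs) > 0 && interactive()) {
  install.packages(missing_pkgs)
}
```

If nothing is printed as missing, both packages are already in place and you can skip the menu altogether.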

Now what?

I assume that you still have R running. If not, restart it. I hate texts that think you are so dim that you will be all excited when you can make your computer type "Hello, World." (Unix people love to start that way, and then they talk about foo and foobar--that's how you know they are Unix people. Similarly if someone says "Dude" over and over again they are probably a snowboarder.) We are going to start with something a great deal more interesting--at least I think so. We are going to plot a regression line and then fiddle with it. Then we are going to move to three dimensional surfaces. The data that I am using can be found at the Web site for my "Methods" book and is named "Tab15-1.dat." You can either download that or make up your own. If you download that file, stick it someplace where you will remember it, such as in a folder called "R Stuff." The folder can be called anything you like, but it will be best if you keep all the files that I will talk about here in that same folder. That data file has six variables, but I will only play with three because that is all I need. But first we have to read in the data.

Our First R programming

Because you have R running, you can go to the R console and type in the following commands. (We'll see later how to enter them into a file and submit that, but for now each command (well, most) will execute as you enter it.)

library(rpanel)   # rgl package also required but will be loaded automatically
data <- read.table(file.choose(), header = TRUE)   # Use Tab15-1.dat
attach(data)   # make the variables available by name

"This" does not equal "this"!

Something that is going to drive you crazy is the fact that R is case sensitive. That means that the word "This" starting with a capital letter is completely and utterly different from "this" starting with a lower case letter. That goes way back to the origins of R, which began at Bell Labs and was called S. (I guess they liked short names!) S transmogrified into R and into S-Plus, which are almost the same except that S-Plus costs real money while R is free. There are a few commands that work differently in the two languages, but very few.
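A quick way to convince yourself is to create two objects whose names differ only in case. This is just an illustrative sketch; the object names are arbitrary.

```r
# Two different objects whose names differ only in case
this <- 1
This <- 2

print(this)                     # the lower-case object
print(This)                     # the capitalized object
print(identical(this, This))    # are they the same thing?

# Function names are case sensitive too
print(exists("mean"))           # the built-in mean() function
print(exists("Mean"))           # a capitalized version is a different name
```

The same applies to variable names in a data file, so "Teach" and "teach" would be two different variables.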

Back to our Commands

The first command loads the package that you just downloaded, called rpanel. When you installed it, you only made it available if called. The "library(rpanel)" command actually tells it to wake up and get ready to do something. The second command will open up a dialog window so that you can go hunting for the data file that you so carefully saved. After you get the data you can see the data file by entering

head(data)

and you will see the beginning of the data set. Adding the word "head" simply tells it to only write out a few lines, rather than scrolling the interesting stuff off the page.

What About "Attach?"

Contrary to what you might think, you really don't have a variable named "Teach" readily available to you. It is part of a data file (called a data.frame) named "data" and you would have to call it as data$Teach, which quickly becomes a pain in the neck. So the "attach" command yanks a copy of Teach out of the data set and makes it available for you to call by its very own name.
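To see the difference for yourself, here is a small sketch using a made-up data frame. The name "ratings" and its values are hypothetical, not the contents of Tab15-1.dat. It also shows with(), a base R alternative that some people prefer because it does not leave anything attached.

```r
# A tiny made-up data frame (hypothetical values, not Tab15-1.dat)
ratings <- data.frame(Overall = c(3.4, 2.9, 3.8, 4.1),
                      Teach   = c(3.8, 2.8, 4.0, 4.4))

# Without attach() you have to name the data frame every time
m1 <- mean(ratings$Teach)

# with() looks the variable up inside the data frame for one expression
m2 <- with(ratings, mean(Teach))

# attach() makes the columns visible by name until you detach()
attach(ratings)
m3 <- mean(Teach)
detach(ratings)

print(c(m1, m2, m3))
```

All three approaches give the same mean; they differ only in how much typing you do and in how long the variable names stay visible.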

Now some Simple Stuff

Just to show you what R can do, we will make use of some of its built-in functions. For example, if you type "mean(Teach)" (without the quotation marks) it will spit out the mean value for Teach. You can probably guess what will happen if you type "sd(Teach)" or "var(Teach)." You can go further by typing "cor(cbind(Teach, Knowledge, Overall))." (The cor function wants the variables bound together into a single matrix, which is what cbind does.) This will give the set of correlations between each of those variables taken two at a time. There is a good bit that you can do like this without knowing a lot about R. We are about to do something like that but a lot more fancy. But our commands won't be noticeably longer.
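If you do not have the data file handy, the following sketch runs those same built-in functions on a small made-up data frame. The values are invented for illustration and are not the real ratings.

```r
# Made-up ratings for illustration only (not the real Tab15-1.dat values)
ratings <- data.frame(
  Overall   = c(3.4, 2.9, 3.8, 4.1, 3.0, 3.6),
  Teach     = c(3.8, 2.8, 4.0, 4.4, 3.1, 3.5),
  Knowledge = c(4.5, 3.8, 4.2, 4.7, 4.0, 4.4)
)

# Descriptive statistics for one variable
print(mean(ratings$Teach))
print(sd(ratings$Teach))
print(var(ratings$Teach))   # the variance is the square of sd()

# cor() on several columns at once returns the full correlation matrix
r <- cor(ratings[, c("Teach", "Knowledge", "Overall")])
print(round(r, 2))
```

The result is a 3 by 3 matrix with 1s down the diagonal and each pairwise correlation appearing twice, once above and once below the diagonal.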

Looking at the Regression of Overall on Teach

Let's start by just plotting Overall against Teach. Since Overall is the overall student rating of an instructor's performance, and Teach is a rating of his or her teaching skills, it makes sense to use Overall as the dependent variable. Because you have loaded the rpanel library you have the command

rp.regression(Overall, Teach)

available to you. Just type that at the prompt. You will get the following result, or something like it.

[Figure: Regression of Overall on Teach]

The most obvious thing that you see there is a scatterplot, but the interesting stuff is in the upper left. There you can click on the + and - to increase or decrease the intercept, and you can do the same to vary the slope. As these change, so does the equation above the scatterplot. You can play with these controls to get a line that looks like it goes through the data just as the true regression line will. Then you can move down a bit and click on the box that displays the optimal regression line and see how close you came. Finally, you can move down a bit lower and click on boxes that will allow you to move individual data points left to right or up and down. You will see how the regression line changes as you move the points. (The point springs back to its original location when you release the mouse.)
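If you want numbers to compare against your eyeball fit, base R's lm() function will compute a least-squares line. I am assuming here that the "optimal" line in the panel is the ordinary least-squares line; the data below are made up, so substitute your own variables.

```r
# Made-up ratings for illustration only
ratings <- data.frame(
  Overall = c(3.4, 2.9, 3.8, 4.1, 3.0, 3.6),
  Teach   = c(3.8, 2.8, 4.0, 4.4, 3.1, 3.5)
)

# Least-squares regression of Overall on Teach
fit <- lm(Overall ~ Teach, data = ratings)
print(coef(fit))     # intercept first, then slope

# The least-squares line minimizes the sum of squared residuals;
# nudging the slope away from the fitted value makes that sum larger
ss_best  <- sum(resid(fit)^2)
b        <- coef(fit)
ss_other <- sum((ratings$Overall - (b[1] + (b[2] + 0.1) * ratings$Teach))^2)
print(c(best = ss_best, nudged = ss_other))
```

That last comparison is what you are doing by hand with the + and - buttons: hunting for the intercept and slope that make the misses as small as possible.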

What About Three Variables?

We can do one better than the previous demonstration by using another function called rp.regression2. That function will take three variables as input and plot them in three-dimensional space. You can then move the space around and see what it looks like. This is the same figure, more-or-less, that you see in Figure 15.1 in my book. The command is remarkably simple, especially considering how much it is doing. You simply enter

rp.regression2(Overall, Teach, Knowledge)

(Don't try to be cute by adding all three variables to the previous (rp.regression()) command. It just makes an awful mess.) The figure below is what you will get, although you may have to use your mouse to move the various bits around so that you can see them.

[Figure: Spin plot]

As you can see, there are a bunch of things that you can do with this plot. Theta and Phi allow you to rotate it in three-dimensional space. In addition, you can ask it to plot the regression of Overall on either Teach or Knowledge, or on both. Notice how the plane through the points changes as you change the predictors.
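The numbers behind those changing planes can be sketched with lm() as well. I am assuming that each plane is the least-squares fit for whichever predictors are selected. The data are again made up.

```r
# Made-up ratings for illustration only
ratings <- data.frame(
  Overall   = c(3.4, 2.9, 3.8, 4.1, 3.0, 3.6),
  Teach     = c(3.8, 2.8, 4.0, 4.4, 3.1, 3.5),
  Knowledge = c(4.5, 3.8, 4.2, 4.7, 4.0, 4.4)
)

# One predictor at a time, then both together
fit_teach <- lm(Overall ~ Teach, data = ratings)
fit_know  <- lm(Overall ~ Knowledge, data = ratings)
fit_both  <- lm(Overall ~ Teach + Knowledge, data = ratings)

print(coef(fit_teach))
print(coef(fit_know))
print(coef(fit_both))

# Adding a predictor can never raise the residual sum of squares
print(c(teach = deviance(fit_teach), both = deviance(fit_both)))
```

Notice that the coefficient for Teach is generally not the same in the one-predictor and two-predictor models, which is the algebraic version of watching the plane tilt.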

What Else?

Well, first of all you could substitute variable names and use other variables in this data set. That will get old soon. Alternatively, you can use this same function and play with your own data. Simply make up a data file in three (or more) columns with the variable names in the first row. Then load it just as you did the one above.
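In case the file layout is not obvious, here is a minimal sketch that writes a small made-up data file with the variable names in the first row and then reads it back. The file name and the variable names are hypothetical, and I write to a temporary folder so that nothing on your machine gets overwritten.

```r
# Made-up values; the column names in the first row are up to you
mydata <- data.frame(
  Score = c(12, 15, 9, 20, 17),
  Hours = c(2, 3, 1, 5, 4),
  Age   = c(19, 22, 20, 21, 23)
)

# Write a plain-text file with the variable names in the first row
path <- file.path(tempdir(), "mydata.dat")   # hypothetical file name
write.table(mydata, file = path, row.names = FALSE, quote = FALSE)

# Read it back; header = TRUE says the first row holds the names
mydata2 <- read.table(path, header = TRUE)
print(head(mydata2))
```

Of course you could just as easily type the file in a text editor. Once it is read in and attached, a call such as rp.regression2(Score, Hours, Age) would follow the same pattern as the one above.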

But there is more that you can do even with these variables. Entering the following commands will allow you to look at the individual variables and plot them in different ways.

var.plot <- Teach

# Redraw the density estimate using the bandwidth h from the slider
density.draw <- function(panel) {
  plot(density(panel$x, bw = panel$h), main = "Density")
  panel
}

# Draw whichever plot type is selected in the list box
data.plotfunction <- function(panel) {
  if (panel$plot.type == "histogram")
    hist(panel$x, main = "Histogram")
  if (panel$plot.type == "boxplot")
    boxplot(panel$x, main = "Boxplot")
  panel
}

panel <- rp.control(x = var.plot)
rp.listbox(panel, plot.type, c("histogram","boxplot","density estimate"),
  action = data.plotfunction, title = "Plot type")
rp.slider(panel, h, 0.1, 5, log = TRUE, action = density.draw)

The simplest way to do this is to cut and paste these commands into a text file, and then paste them into R. (There are better ways, but this will do.) You will notice that the first line of the script sets var.plot <- Teach. By changing that to a different variable name you can work with each of your variables. (I tried to create a dialog box to do that for you, but I gave up. It isn't that it couldn't be done, it's that I'm not smart enough (yet) to do it.) If you submit this code you will get something like the following.

[Figure: Descriptive plots]

I moved things around before cutting and pasting the figure, and you may want to also. The first plot that you see is a density plot, which is a way of fitting a curve to data so as to be faithful to the true wiggles in the plot and to play down noise. (The 7th edition of my "Methods" book discusses density plots.) Think of it as taking a weighted average of the heights of nearby points as you move across the figure. In the little box you will see a slider labeled "h." If you move that left and right you will change the bandwidth that is used to draw a particular section of the curve, and as you make the bandwidth larger the curve will smooth out. After you get bored looking at that you can click on "histogram" or "boxplot" and get different views of the variable.
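You can see the effect of the bandwidth without the slider by calling base R's density() function directly with two different values of bw. This is only a sketch: the data are random numbers standing in for a rating variable, and the two bandwidths are simply values I picked from within the slider's range.

```r
set.seed(1)                               # reproducible made-up data
x <- rnorm(50, mean = 3.5, sd = 0.5)      # stand-in for a rating variable

d_narrow <- density(x, bw = 0.1)          # small bandwidth: wiggly curve
d_wide   <- density(x, bw = 2)            # large bandwidth: very smooth curve

# Draw both curves on the same axes
plot(d_narrow, main = "Density with two bandwidths")
lines(d_wide, lty = 2)

# The smoother curve is also flatter, so its peak is lower
print(c(narrow = max(d_narrow$y), wide = max(d_wide$y)))
```

The solid curve chases every bump in the data while the dashed one irons nearly everything out. A sensible bandwidth lies somewhere in between, which is what the slider lets you hunt for.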

Yes, There is More!

So far we have only used two of the functions that Bowman et al. provided. But there are a lot more. My suggestion is to type


at the R prompt--don't include parentheses. This will give you the help files that go with this package. Help files in R are not noted for the warm fuzzy hands-on help that you might like, but they are generally helpful. The file that you will start out with looks like

[Figure: Help file]

Over at the left you will see a list of all of the parts of the package. By clicking on each of those functions you will bring up a new help page. (The first bunch are data files, so you can skip those.) Part way down you will see something named rp.logistic, and it's a good bet that it deals with logistic regression. I'm choosing this because I have not played with it before, so both of us get to learn how to use it. I am certainly likely to need a data file, so I have made one available at survdata.dat. This relates to whether or not a person survived a diagnosis of cancer as a function of the survival rating his/her doctor gave at the time of diagnosis. You can forget all variables except outcome and survrate. Load the data by typing

survdata <- read.table(file.choose(), header = TRUE)

I assume that you have the rpanel library loaded. If not, load it now.

Now go to the help files and click on rp.logistic (or type ?rp.logistic at the prompt.) Near the top of the help file you will see

rp.logistic(x, y, xlab = NA, ylab = NA, panel.plot = TRUE, line.showing = TRUE, hscale = NA, vscale = hscale)

I bet that the x and y refer to the data, which in our terms would be survrate and outcome. "xlab" and "ylab" are the axis labels, and NA means that the default is "not available." We don't know what panel.plot is, though there is a brief description a bit lower, so we will take the default. "hscale" and "vscale" are going to refer to the size of the plot, so we might as well take the defaults again. So our command seems to come down to

rp.logistic(survrate, outcome)

At least we will start from there.

rp.logistic(survrate, outcome)
Error in rp.logistic(survrate, outcome) : object "outcome" not found

Oops! I forgot to attach(survdata) so it couldn't find my variables.

rp.logistic(survrate, outcome)

Wow! It worked! At least I got a figure. And if I play with alpha and beta I can get a plot that looks pretty much like the one in my book (in the 6th edition it was Figure 15.9). If I want to spice it up I could replace the xlab = NA with something like xlab = Predicted Survival Rate. Oops, that didn't work, so put the label itself in quotation marks: xlab = "Predicted Survival Rate". We could change hscale and vscale, but the help file doesn't give us a lot of advice on what kind of numbers to use. So I tried 1 and 2, respectively, and got a tall skinny thing. Well, at least I get the general idea, and could play further.
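If you would like to know what values of alpha and beta you should be aiming for, base R's glm() function will fit the logistic curve for you. I am assuming that alpha and beta in the panel are the intercept and slope of the logistic curve. The data below are simulated, not the real survdata.dat, and the "true" values of -3 and 0.06 are numbers I made up.

```r
set.seed(42)                                   # reproducible made-up data
survrate <- runif(60, min = 0, max = 100)      # hypothetical survival ratings
p_true   <- plogis(-3 + 0.06 * survrate)       # hypothetical true curve
outcome  <- rbinom(60, size = 1, prob = p_true)

# Logistic regression of outcome on survrate
fit <- glm(outcome ~ survrate, family = binomial)
print(coef(fit))       # intercept and slope on the log-odds scale

# Predicted probability of survival at a rating of 50
p50 <- predict(fit, newdata = data.frame(survrate = 50), type = "response")
print(p50)
```

With your own data attached you would drop the simulation lines and keep only the glm() and predict() calls.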

Can We Have Another Example--Please??

Playing with the left side of the help screen I came to something called rp.power, and I suspect that it refers to power analysis. When I ask for the help screen I get rp.power(). That's odd, it doesn't have anything between the (), which means that I don't need to pass it any variables. So just try

rp.power()

That gave me a graph and a little dialog box asking me to supply (increase or decrease) n, mu1, mu2, and sigma. When I do I see how the power varies as a function of these values. I also have a little box that I can check to see a plot of the two populations superimposed. Not bad.
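For a rough check on what the panel shows, base R has a function called power.t.test() that computes power for a t test. This is a sketch of a base R analogue, not necessarily the same calculation that rp.power performs, and the particular values of n, mu1, mu2, and sigma are ones I made up.

```r
# Hypothetical values for the quantities on the rp.power sliders
n     <- 20     # observations per group
mu1   <- 0
mu2   <- 1
sigma <- 2

# Power of a two-sample t test with these settings
pw20 <- power.t.test(n = n, delta = mu2 - mu1, sd = sigma)$power

# Same populations, larger samples
pw80 <- power.t.test(n = 80, delta = mu2 - mu1, sd = sigma)$power

print(c(n20 = pw20, n80 = pw80))
```

As with the sliders, increasing n or the difference between the means raises power, and increasing sigma lowers it.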

Now Where?

First of all, there are a lot of things here to play with and keep yourself busy. But if you want to learn a bit about R, you can then create your own functions. If you want to learn R, I personally think that Crawley's book, with the very creative title of "The R Book," is perhaps the best. Moreover, if you type the name of a function without any (), you will see the program statements themselves. For example, if you type rp.cartoons() you will find a very good set of functions that you can run just by choosing from a dropdown menu. (Don't be put off by the silly name.) And if you type rp.cartoons without the parentheses, you will see all of the code. If you see something that you almost like, steal the code (giving credit, please), modify it, and come up with something closer to what you want. This code is all open-source software, so you really are allowed to take someone else's code and modify it. But PLEASE give them credit.
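The trick of leaving off the parentheses works for any function written in R, not just the ones in rpanel. Here is a small sketch using a function of my own invention (the name half_range is hypothetical) and the built-in sd function.

```r
# A small function of my own (hypothetical, for illustration)
half_range <- function(x) {
  (max(x) - min(x)) / 2
}

# With parentheses the function runs
print(half_range(c(2, 6, 10)))

# Without parentheses R prints the function's code instead
print(half_range)

# The same works for built-in functions written in R, such as sd
print(sd)
```

At the console you do not even need print(); just typing the name and pressing Enter does the same thing.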

Oh, and I almost forgot. Go to Google and type rpanel. You will find a lot of the stuff referred to above and some other goodies. Never underestimate Google!

rpanel is not the only graphical interface available in R. There is also "playwith," and you can download those functions just like you downloaded rpanel. It is also worth a look.


Bowman, A., Crawford, E., Alexander, G., & Bowman, R. W. (2007). rpanel: Simple interactive controls for R functions using the tcltk package. Journal of Statistical Software, 17, 1-18. (This paper is available on the web via a simple Google search.)

Crawley, M. J. (2007). The R Book. Chichester, England: Wiley.