/*
IF...THEN...ELSE statements have many uses. They conditionally execute SAS
statements based on the values of other variables. They can be used to subset
existing data sets, recode existing variables (unconditionally or
conditionally), create multiple data sets from one data set, create new
variables, etc.

In this first example, the data are physical fitness measurements and the
variables are Age (years), Weight (kg), Oxygen intake rate (ml per kg body
weight per minute), time to run 1.5 miles (minutes), heart rate while resting,
heart rate while running (measured at the same time as the oxygen rate), and
maximum heart rate recorded while running.
*/

options ls=100;

filename foobar url "http://www.uvm.edu/~abh/stat295/datasets/fitness.dat";

data fitness;
  infile foobar;
  input Sex $1. Age Weight Oxygen RunTime RestPulse RunPulse MaxPulse;
run;

proc print;
  title "Fitness Data";
run;

/*
The next example uses the SET statement and a subsetting IF statement. A
subsetting IF needs no THEN clause because it carries an implied "then
output", so writing 'if sex = "M" then output;' is also correct. You can
instead delete the unwanted observations: 'if sex ne "M" then delete;'. The
choice usually depends on whether it is simpler to specify the observations
you want to keep or those you want to delete.

The SET statement specifies which SAS data set to process, just as an INFILE
statement specifies which external data file to read.
*/

data males;
  set fitness;
  if sex = "M";
run;

proc print;
  title "Fitness Data - Males only";
run;

/*
The next example creates three data sets from the original, based on the value
of sex. It is an alternative to running multiple data steps, each with its own
subsetting IF statement. It is also a way to find or remove miscoded or
missing data.
*/

data males females miscoded;
  set fitness;
  if sex = "M" then output males;
  else if sex = "F" then output females;
  else output miscoded;
run;

proc print data=males;
  title "Fitness Data - Males only";
run;

proc print data=females;
  title "Fitness Data - Females only";
run;

proc print data=miscoded;
  title "Fitness Data - Miscoded sex";
run;

/*
What's wrong with the miscoded data and how can we fix it? There are several
ways to do this; here is one. The next example fixes the miscoded sex values
and creates a new variable based on the value of an existing variable.
*/

data fitness;
  set fitness;
  sex = upcase(sex);  /* convert the sex codes to uppercase */
  if RunTime > 10 then RunGroup = "Over 10 minutes";
  else RunGroup = "10 minutes or less";
run;

proc print;
  title "Fitness data with new grouping variable";
run;

/*
There is one very common flaw in the above code. The length of a new character
variable is set when a value is first specified, so RunGroup has a length of
15 because "Over 10 minutes" is 15 characters long. The 18-character value
"10 minutes or less" is therefore truncated to "10 minutes or l".

There are two ways to fix this: make sure the longest value is specified
first, or use a LENGTH statement to declare the length of the new variable.
*/

data fitness;
  length RunGroup $ 18;
  set fitness;
  if RunTime > 10 then RunGroup = "Over 10 minutes";
  else RunGroup = "10 minutes or less";
run;

proc print;
  title "Fitness data with new grouping variable";
run;

/*
Note that the above fixes the length problem and also rearranges the order of
the variables. This is because of the LENGTH statement: RunGroup is now the
first variable in the data set since it was declared first.
*/
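
/*
A minimal sketch of the other fix mentioned above: specify the longest value
first, so that RunGroup gets a length of 18 without a LENGTH statement. The
data set name fitness2 is hypothetical. The existing RunGroup variable is
dropped as fitness is read so that its length is set afresh by the first
assignment below. PROC CONTENTS is added only to check the resulting variable
lengths and order.
*/

data fitness2;
  set fitness(drop=RunGroup);
  /* A missing RunTime compares as less than 10, so it falls into the same
     group as it does in the steps above. */
  if RunTime <= 10 then RunGroup = "10 minutes or less";
  else RunGroup = "Over 10 minutes";
run;

proc contents data=fitness2;
  title "Fitness data - variable attributes";
run;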