/* In addition to the usual arithmetic, comparison, and logical operators
   (like +, -, <, >, AND, OR), there are a few other useful operators that
   can be used in SAS comparison expressions. One of these is the IN
   operator, which compares the value on its left to a list of values
   given on its right.

   The following DATA step reads in the names of the months, then creates
   a subset of the summer months by using the IN operator. */

options ls=90 ps=60;

data months;
   input month_num month_name $9. ;
cards;
1 January
2 February
3 March
4 April
5 May
6 June
7 July
8 August
9 September
10 October
11 November
12 December
run;

data summer_names;
   set months;
   if month_name in ("June","July","August");
   /* This is the same as
         if month_name = "June" or month_name = "July" or month_name = "August";
      but is much easier to write. */
run;

proc print data=summer_names;
   title "Use IN operator to select summer months by name";
run;

/* The IN operator can also be used in a numeric comparison expression.
   A range of consecutive integers can be abbreviated with the
   (first:last) notation. */

data summer_nums;
   set months;
   if month_num in (6:8);
   /* This is the same as
         if month_num = 6 or month_num = 7 or month_num = 8;
      or
         if month_num > 5 and month_num < 9;
      or
         if 6 <= month_num <= 8;
      but is much easier to think about logically and to write. */
run;

proc print data=summer_nums;
   title "Use IN operator to select summer months by number";
run;

/* Another handy tool is the colon modifier. Placed after a character
   comparison operator, it tells SAS to truncate the longer operand to
   the length of the shorter one before comparing, so only the leading
   characters are compared.

   The following example illustrates this. Let's create a subset of the
   months that begin with the letter "J". */

data j_months;
   set months;
   if month_name =: "J";
run;

proc print data=j_months;
   title "Use colon operator modifier to select months beginning with J";
run;

/* You can also use more than one character in the comparison. This
   example creates a subset of months that begin with "Ma".
*/

data ma_months;
   set months;
   if month_name =: "Ma";
run;

proc print data=ma_months;
   title "Use colon operator modifier to select months beginning with Ma";
run;

/* Using character functions, we can also get really fancy and select
   months ending with a "y". */

data y_ending;
   set months;
   if left(reverse(month_name)) =: "y";
run;

/* The REVERSE function turns the value around. Any trailing blanks
   become leading blanks, so we also need the LEFT function to
   left-justify the value. For example, the value "May" is actually
   9 characters long, with 6 trailing blanks. Reversing it gives
   "      yaM" with 6 leading blanks. Applying the LEFT function gives
   "yaM      ". */

proc print data=y_ending;
   title "Use colon operator modifier and character functions to select months ending with y";
run;

/* When making comparisons, you can refer to operators by either their
   symbol or their two-character mnemonic. For example, when testing
   whether a variable is less than or equal to a specific value, you can
   use either <= or LE. These symbols and mnemonics can be used
   interchangeably within the same statement.

   The following is a list of symbolic comparison operators with their
   mnemonic equivalents:

      =    EQ
      ~=   NE
      <    LT
      >    GT
      <=   LE
      >=   GE

   The symbol for NE is dependent on the operating system. This is the
   one for Windows.

   The IN comparison operator does not have a symbolic equivalent. */

data summer;
   set months;
   if 6 LE month_num <= 8;
run;

proc print data=summer;
   title "Use both symbolic and mnemonic comparison operators";
run;

/* NOTE: The EQ mnemonic can only be used for comparisons. It cannot be
   used in a SAS assignment statement. The following DATA step will not
   work. Even the enhanced editor knows there is an error here. */

data months;
   set months;
   thisyear = 2007;
   lastyear EQ 2006;
run;

/* Logical operators also have both symbolic and mnemonic forms. For
   example, you can use an & instead of the word AND in a logical
   expression.
   And just as you can use comparison operators interchangeably, so too
   can you use logical operators interchangeably. */

data j_months;
   set months;
   if month_name = "January" OR month_name EQ "June" | month_name = "July";
run;

proc print data=j_months;
   title "Use both symbolic and mnemonic logical operators";
run;

/* The following is a list of symbolic logical operators with their
   mnemonic equivalents:

      &    AND
      |    OR
      ~    NOT

   The symbols for OR and NOT are dependent on the operating system.
   These are the ones for Windows. */

/* When using dates in comparisons, write a date constant by enclosing
   the date in quotes, followed by a D.

   When using times in comparisons, write a time constant by enclosing
   the time in quotes, followed by a T.

   When using datetimes in comparisons, write a datetime constant by
   enclosing the datetime in quotes, followed by DT. */

data date_example;
   input date mmddyy10. ;
cards;
4/28/2006
12/25/2001
1/1/1960
12/31/1959
6/19/1981
8/26/1988
run;

data subset;
   set date_example;
   if date LT "01JAN2000"D;
run;

proc print data=subset;
   format date date9. ;
   title "Date comparison example";
run;

/* In comparisons, SAS treats a missing numeric value as smaller than
   any number. Because of this "feature", a condition such as
   age lt 65 is true for missing values, so they can end up in a subset
   where they are not wanted. The following example shows what can
   happen when subsetting data that contain missing values. */

data ages_incomes;
   input id sex $ age income comma7. ;
cards;
57004 F 22 32,500
24027 M . 102,200
48027 F 37 58,350
36733 F 66 132,900
46343 F . 29,875
34636 M 61 75,250
34344 M 40 125,945
28457 M 66 208,750
56757 F 58 50,455
73557 F 57 23,345
72557 F . 27,500
43769 F 31 38,495
64756 M 45 40,200
46571 F 57 48,390
68621 F 42 31,835
43771 M 61 28,855
86322 M 74 19,335
run;

data under65;
   set ages_incomes;
   if age lt 65;
run;

proc print data=under65;
   title "Missing values included in subset";
run;

/* If you do not want missing values included in the subset, then you
   must explicitly test for non-missing values in the subsetting IF
   statement. */

data under65;
   set ages_incomes;
   if age lt 65 & age ne . ;
run;

proc print data=under65;
   title "Missing values excluded in subset";
run;
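
/* A possible alternative to the explicit "age ne ." test, offered as a
   sketch and not as part of the original examples: the MISSING function
   returns 1 when its argument is missing. Unlike "age ne .", it also
   catches the special missing values .A through .Z and ._, which
   likewise compare as smaller than any number. The data set name
   under65_alt is an assumption made for this illustration. */

data under65_alt;
   set ages_incomes;
   /* keep rows with a non-missing age below 65 */
   if age lt 65 and not missing(age);
run;

proc print data=under65_alt;
   title "Missing values excluded with the MISSING function";
run;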