Research, Udacity Class

SAS Programming: Beginner Notes

Below are my notes from when I originally studied SAS programming beginner techniques through 3 HUGE volumes of books a friend gave me. At that time, I only took notes in Microsoft Word, so there was a real lack of gifs. I’d go through and add some, but this is going to be a HUGE post and WordPress is really struggling to function. Honestly, I’m only adding this post for my own reference and to save my notes, so if I were you I’d go watch some Schitts Creek instead or take the class yourself! 🙂

SAS Helpful Hints

Notes from Lessons 1-34

Lesson One – Basic Concepts

  1. Two-Level Name: reflib.filename (reference library name.filename of SAS data set)
  2. Two steps: Data and Proc (each have different statements)
  3. Statements: data, set, rub, proc
  4. SAS Statement: usually beings with SAS keyword and always ends with ;
  5. Create New File With Variables From Other File
    1. The following SAS program creates a new table (located in “reflib1” and “titled tablename”) and references a library (“reflib2”) and specific file (“filename”) within named library (“reflib2”) and shows data that variable (name of column) is equal to in “filename” (all bolded terms will be substituted with relevant information):

data reflib1.tablename;

set reflib2.filename;

if variable=”whatever”;

run;

  1. Format (format number/text) / Informat (removes formats out of text to make it readable, ex. taking decimals out of numbers):
    1. DOLLAR(w).(d) (shows dollar sign, w = maximum width of value to be written, d = number of decimal places)
    2. COMMA(w).(d) (comma puts commas in appropriate places)

Lesson Two – Using the Programming Workspace

  1. Turns on command line: Tools Options Preferences, View Tab, Command Line checkbox
  2. Turns on menus: Command Line pmenu on
  3. View Enhanced Editor (if editor/log window are hidden)
  4. if product=”SOFA”; (if statement)
  5. Print data into new file:
    1. proc print data=reflib.filename;

run;

  1. Creating/Defining Libraries and Engines (different engines allow you to perform different tasks, including reference non-SAS files):
    1. Active Libraries File New
    2. “Enable at startup” so that library is created each time SAS is started
    3. Type/Browse for path

Lesson Three – Referencing Files and Setting Options

  1. Assign libref (libref=name/library, SAS-data-library=location of SAS data):
    1. LIBNAME libref ‘SAS-data-library’
    2. ex. LIBNAME libraryname ‘c:\filelocation\wherever
  2. Specify Engines:
    1. libref engine ‘SAS-data-library’
    2. Unsure what this means, but probably important for referencing .xls:

Relational Database

Nonrelational Files

PC Files

ORACLE

ADABAS

Excel (.xls)

  1. Viewing contents of SAS:
    1. PROC CONTENTS DATA=SAS-file-specification NODS;

RUN;

    1. SAS-file-specification can take on following forms:

      1. <libref.>SAS-data-setnames (one SAS data set to process)
      2. <libref.>_ALL_ (requests all files in library, use period (.) to append _ALL_ to the libref could not read clearly, may be mistaken
      3. NODS suppresses the printing of detailed information about each file when you specify _ALL_ (can only specify NODS when you specify _ALL_)
      4. ex. proc contents data=libref.filename._all_ nods;
  1. Datasets

    1. proc datasets;

contents data=libref.filename._all_ nods;

quit;

proc contents data=libref.filename._all_ nods;

run;

    1. datasets lists variables alphabetically, to list variable names in the order of their logical position, specify the VARNUM option in proc contents or in contents statement in proc datasets

      1. proc datasets;

contents data=libref.filename varnum;

quit;

proc contents data=libref.filename varnum;

run;

  1. System Options Include:
    1. Date, Details, Page Number, Missing Number Value, Pagesize, Linesize
    2. Could not read into detail, can be manually set through Options but will not be part of SAS program unless written in
  2. Calling variables into new filesets
    1. data libref.filename;

set libref.newfilename;

run;

proc print data data=libref.filename;

var var1 var2 var3 var4

run;

    1. var(n) = names of variables (a.k.a. Columns)

Lesson Four – Editing and Debugging SAS Programs

  1. Open File Command:
    1. INCLUDE ‘file-specification’
    2. file-specification = physical name by which host system recognizes file

      1. ex. c:\programs\sas\filename.sas
    3. No semicolon at end
  2. Editing SAS Programs
    1. Tools Add Abbreviations (set to make life easier)
    2. NUMS Type in command line to toggle on/off line numbers in Program Editor
    3. Text Editor Line Commands:

Command

Action

Cn

Copies n lines (where n= a number up to 9999)

Dn

Deletes n lines

In

Inserts n blank lines

Mn

Moves n lines

Rn

Repeats current line n times

A

After (use with C, I, and M)

B

Before (use with C, I and M)

DD

Deletes a block of lines (mark first and last step)

CC

Copies a block of lines (mark first and last step)

MM

Moves a block of lines (mark first and last step)

RR

Repeats multiple lines (mark first and last step)

EXAMPLES:

Command

Action

00001

i3002

00003

Inserts 3 lines after line 00002

000c2

00002

0a003

Copies 2 lines (00001 and 00002) after line 00003

00b01

00002

00m03

Moves 1 line (00003) before line 00001

00DD1

00002

00DD3

Deletes lines 00001 to 00003

  1. Saving FILE Command:
    1. FILE ‘file-specification’
    2. ex. FILE ‘c:\programs\sas\filename.sas’
  2. Errors:
    1. Syntax errors: occur when program statements do not conform to the rules of the SAS language
    2. Data errors: occur when some data values are not appropriate for the SAS statements that are specified in a program
  3. Syntax errors:
    1. Check log after running for “ERROR”, will list probable cause/location of error

Problem

Symptom

Missing RUN statement

“PROC (or DATA) step running” at top of active window

Missing semicolon

Log message indicating an error in a statement that seems to be valid

Unbalanced quotation marks

Log message indicating that a quote string has become too long or that a statement is ambiguous

Invalid option

Log message indicating that an option is invalid or not recognized

  1. Missing RUN statement
    1. SAS program is run on separate steps, SAS recognizes end of current step when it encounters DATA or PROC statement (indicates beginning of new step), or RUN or QUIT statement (indicates end of current step)
  2. Canceling Submitted Statements
    1. Must cancel statements to correct syntax errors
    2. Press Ctrl and Break keys, Select #1 Cancel Submitted Statements
    3. Select Y to cancel submitted statements and resubmit the program
  3. Comments in SAS Programs:
    1. *text; OR /*text*/ (SAS ignores text within these fields when processing)
    2. /* Insert Comment Here */

Data perm.vansales;

Infile vandata;

Input @1 Region $9

@13 Quarter 1. /* Values are 1, 2, 3, or 4 */

@16 TotalSales comma11.;

Proc print data=perm.vansales;

Run;

  1. Debugging for Logical Errors
    1. Add / debug; after DATA statement

Lesson Five – Creating List Reports

  1. Basic PROC Print:
    1. libname libraryname ‘c:\library\location’; Sets libref as libraryname

proc print data=libraryname.filename; Generates report

run;

  1. Setting Variables:
    1. proc print data=libref.filename;

var var1 var2 var3 var4;

run;

    1. var(n) = columns headers in order to be viewed
    2. Add “NOOBS” in proc print statement to remove the OBS column

proc print data=libref.filename noobs;

  1. ID Statement:
    1. proc print data=libref.filename;

id var1 var2;

run;

    1. Replaces Obs column with variables specified
    2. Great when a ID needs to “spill over” onto variables that take up multiple lines/pages (used for Animal Number in OWT tables)
  1. WHERE Statement (specifies conditions)
    1. proc print data=libref.filename;

id var1 var2;

where var1>100;

run;

    1. WHERE statement specifies to select observations made where variable 1 is greater than 100
    2. ne / ^= mean “not equal to”
  1. Compound WHERE Statements:
    1. Link together with and (& if both expressions are true, then the compound expression is true) / or (| if either expression is true, then the compound expression is true)
    2. Use IN operator to simplify some statements:

Where var1=’LOW’ or var1=’HIGH’;

Where var2=0 or var2=50;

Where var1 in (‘LOW’, ‘HIGH’);
where var2 in (0, 50);

    1. Expressions in Parenthesis are evaluated first

(Where var1=’LOW’ or var1=’HIGH’) and var2=0;

Where var1=’LOW’ or (var1=’HIGH’ and var2=0;)

  1. Sorting Data
    1. proc sort data=libref.filename out=newlibref.newfilename;

by descending var1 var2 var3 ‘descending’ only applies to var1

run;

  1. Generating Column Totals
    1. SUM Statement

proc print data=reflib.filename;

var var1 var2 var3;

where var4<100; Note how var4 is not included but can still be used

sum var3;

run;

Obs

var1

var2

var3

1

value

value

value

2

value

value

value

Sum value

  1. Subtotals
    1. Combine SUM and BY statement together, add in proc print step
    2. SORT step sorts data by var4 and creates new file to output data with name/location of newreflib.newfilename, SUM requests column totals for var3, BY produces subtotal for each value of var4

proc sort data=reflib.filename out=newreflib.newfilename;

by var4;

run;

proc print data=newreflib.newfilename

var var1 var2 var3

where var1>100

sum var3

by var4

run;

      1. BY variable appears before table with value
      2. In table, BY variable and SUM subtotal appear at bottom

Var4=LOW

Obs

Var1

Var2

Var3

1

value

value

value

2

value

value

value

Var4

subtotal

Var4=HIGH

Obs

Var1

Var2

Var3

1

value

value

value

2

value

value

value

Var4

subtotal

total

  1. Combining ID, SUM and BY
    1. Obs line is suppressed, ID/BY variable is printed in its place
    2. ID/BY value is printed at start of each BY group and on line that contains group’s subtotal

proc sort data=reflib.filename out=newreflib.newfilename;

by var4;

run;

proc print data=newreflib.newfilename

var var1 var2 var3

where var1>100

sum var3

by var4

id var4

run;

Var4

Var1

Var2

Var3

LOW

value

value

value

value

value

value

LOW

subtotal

HIGH

value

value

value

value

value

value

HIGH

subtotal

total

    1. Add “PAGEBY” after ID statement to have groups printed on separate pages

proc print data=newreflib.newfilename

var var1 var2 var3

where var1>100

sum var3

by var4

id var4

pageby var4

run;

Var4

Var1

Var2

Var3

LOW

value

value

value

value

value

value

LOW

subtotal

—–PAGE BREAK—–

Var4

Var1

Var2

Var3

HIGH

value

value

value

value

value

value

HIGH

subtotal

total

  1. Double Spacing
    1. Add double onto the end of the proc print statement to get double spaced listing output

proc print data=reflib.filename double;

var var1 var2 var3

where var4=’low’

run;

  1. Titles and Footnotes
    1. Put before print statements
    2. title(n)’text’;
    3. footnote(n)’text’;
    4. Can specify up to 10 titles and footnotes
    5. Can specify in Titles/Footnotes window by typing in TITLES or FOOTNOTES command (these only stay set for active SAS session)
    6. Titles/Footnotes are put in numerical order, if a number is missing, HTML Output will show no space, SAS listing output will put a space for missing title number
    7. Redefining a title or footnote line cancels any higher-numbered title or footnote lines
    8. To cancel all previous titles or footnotes, specify a null TITLE or FOOTNOTE statement (a statement with no text), this will also cancel default SAS Title
    9. EXAMPLE:

Title1 ‘INSERT TITLE HERE’;

Footnote1 ‘INSERT FOOTNOTE HERE’;

proc print data data=libref.filename;

var var1 var2 var3 var4

run;

title1 ; cancels all titles

footnote ; cancels all footnotes

proc print data data=newlibref.newfilename;

var var1 var2 var3 var4

where var5=0

run;

  1. TEMPORARILY Assigning Labels to Variables (used in PROC step)
    1. LABEL variable1=’label1’

proc print data data=libref.filename label;

var var1 var2 var3

label var1=’variable 1’

var2=’variable 2’

var3=’variable 3’

run;

  1. Formatting Variables
    1. FORMAT var format-name
    2. format var1 mmddyy8 (var1 is a date)
    3. LABEL variable1=’label1’

proc print data data=libref.filename label;

var var1 var2 var3

label var1=’variable 1’

var2=’variable 2’

var3=’variable 3’

format var1 dollar8.2 gives $, commas, and 2 decimals

run;

Format

Specifies values…

Example

COMMAw.d

Contain decimals and commas

Comma8.2

DOLLARw.d

Contain $, commas and decimals

Dollar6.2

MMDDYYw

MMDDYY8 = 01/01/01

MMDDYY10 = 01/01/2001

mmddyy10

w.

Rounded to the nearest integer in w spaces

7.

w.d

Rounded to d decimal places in w spaces

8.2

$w.

As character values in w spaces

$12.

DATEw.

Date values of the form 16OCT99 (DATE7.) or 16OCT1999 (DATE9)

Date9.

    1. Field widths: (w) total width that is used for displaying the values in output, this includes commas and decimal points
    2. Decimal places: (d) 2030 with commas and 2 decimals requires 8 spaces after decimals, comma and decimal point is add (2,030.00)
  1. PERMANENTLY Assigning Labels (used in DATA step)
    1. data reflib.filename; location/name of new file being created

set reflib2.filename2; location/name of file getting information from

label var1=’variable 1’;

format var1 date9

run;

proc print data=reflib.filename label; still requires label in proc step

run;

  1. Split Option and Customizing Format
    1. proc print data=filename split=’*’;

var var1 var2;

label var1=”Variable*1”;

run;

    1. proc format

value $repfmt

‘TFB’=’Bynum’

‘MDC’=’Crowley’

Proc print data=filename;

Var var1 var2 var3;

Format var1 $repfmt.;

run;

  1. QUIZ Info:
    1. NONUMBER system option causes to appear without page numbers
    2. You do not need to specify variable IF specified in SUM
    3. SUM value contains same formatting as that of variables

Lesson Six – Creating SAS Data Sets from Raw Data

CREATING A SAS DATA FILE FROM RAW DATA:

  1. Steps to Create SAS Data Set:

To do this…

Use this SAS Statement…

Example…

Description

Reference SAS data library

LIBNAME

libname libref ‘c:\location’

Assigns libref to SAS library c:\location

Reference external file

FILENAME

filename fileref ‘c:\file.dat’

Assigns fileref to external file (not SAS library)

Name SAS data set

DATA

data libref.filename

Name of data set to be made (libref = library to be stored in, filename = name of file)

Identify external file

INFILE

infile file-specification <options>

ex.

infile filename obs=10

File-specification: can be filename (no libref), fileref , or instead of naming file using fileref could insert address

Describe data

INPUT

input variable <$> startcol-endcol…;

Used to break up fixed raw data into columns with variable names

Execute DATA step

RUN

Run;

Runs previous steps

List the data

PROC PRINT

Proc print data=reflib.filename

Puts data into viewable list

Execute final program step

RUN

Run;

Runs previous steps

  1. More about FILENAME
    1. Can associate it to storage area or single file (c:\file or c:\file.dat)
    2. If referencing single file, put in following:

      1. data libref.filename;

infile FILENAME;

input input1 $ input2 $;

run;

    1. If referencing storage file, put in following:

      1. data libref.filename;

infile FILENAME(subfile.dat); put single file in parenthesis

input input1 $ input2 $;

run;

      1. In Windows, can take out file extension in place of ‘ (ex. ‘subfile’)
  1. More About INFILE
    1. Options: describes the input file’s characteristics and specify how it is to be read with the INFILE statement
    2. Column Input: Can only read standard numeric values, no special characters and in fixed format (not free)
    3. Obs = (n) : specifies how many lines of obs will process to check and see if correct fields are being referenced/read
    4. Remove Obs=(n) to view entire raw data file
  2. More About INPUT
    1. Variable: SAS name assigned to field
    2. <$>: Type of data (blank = numeric, $ = character)
    3. Startcol/Endcol: Start column number – End column number

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

2

8

1

0

6

1

M

O

D

F

2

8

0

4

3

8

H

I

G

H

F

2

8

0

7

4

2

L

O

W

M

2

8

1

6

2

6

H

I

G

H

M

filename potato1‘c:\filelocation.dat’;

data potato;

infile potato1;

input ID $ 1-4 Age 6-7 ActLevel $ 9-12 Sex $ 14;

run;

Red Text=”Column” #’s to reference from raw data

Blue Text=Name of variables/columns in printed output

Result:

Obs

ID

Age

ActLevel

Sex

1

2810

61

MOD

F

2

2804

38

HIGH

F

3

2807

42

LOW

M

4

2816

26

HIGH

M

  1. Creating and Modifying Variables
    1. variable=expression
    2. ex. Name=’Christi Henderson’
    3. Operands: variable names or constants, can be numeric, character or both
    4. Operators: special-character operators, grouping parentheses, or functions
  2. Using Operators in SAS Expression

Operator

Action

Example

Priority

Negative prefix

Negative=-x;

I

**

Exponentiation

Raise=x**y;

I

*

Multiplication

Mult-x*y;

II

/

Division

Divide=x/y;

II

+

Addition

Sum=x+y;

III

Subtraction

Diff=x-y;

III

    1. TotalTime=(timemin*60)+timesec;
    2. timemin and timesec are both variables defined in input
    3. Redefine variable by setting it equal to an equation

var1=var1+(var1*.10); this sets var1 10% higher than original value

Result is on left side of equal sign, original value is on right

  1. Date Constants:
    1. ‘ddmmm<yy>yy’d
    2. dd is a one or two digit value for day
    3. mmm is a three letter abbreviation for month (JAN, FEB, etc)
    4. yy or yyyy is a two or four digit value for year, respectively
    5. TestDate=’01jan2000’d;
    6. Time=’9:25’t;
    7. DateTime=’18jane2005:9:27:05’dt;
  2. Subsetting IF Statement
    1. IF expression
    2. ex. IF var1=’value’ (if number then IF var1=number)
    3. if expression is TRUE then data continues to process that record or ob
    4. if expression is FALSE, no further statements are processed for that record or ob and control returns to top of the DATA step
  3. DATALINES Statement (reads instream data that is entered directly in SAS program, not external file—just copy and paste info instead of using “infile” step)
    1. data reflib.filename;

input ID 1-4 Name $ 6-25

RecHR 35-37

Tolerance $ 45;

datalines ;

. Data

. Data

. Data

Data lines go here

. Data

. Data

. Data

;

    1. Last step in DATA step, can only use once per DATA step
    2. Do not need a RUN; statement after because of null ; statement at end of datalines

CREATING A RAW DATA FILE:

  1. _NULL_ Keyword
    1. Enables you to use the DATA step without creating a SAS data set
    2. data _null_;

set libref.filename;

    1. a SET statement specifies the SAS data set that you want to read from
  1. Use FILE (specifies output file) and PUT (specifies how/what to write) Statements to write the observations from a SAS data set to a raw data file (same as INFILE and INPUT, just opposite way)

    1. FILE file-specification <options> <operating-environment-options>
    2. file-specification: previously defined fileref or ‘filename’ or actual name/location of file (‘c:\location.dat’)
    3. filename fileref (‘c:\location.dat’) now previously defined fileref

data _null_;

set libref.filename;

file fileref

    1. PUT variable startcol-endcol;
    2. Do not need to specify $ since it’s raw data

Lesson Seven – Understanding DATA Step Processing

  1. Could Not Read Lesson Very Well (Bad Copies)
  2. Information On How Data Is Compiled Through SAS Program
  3. Would be helpful for debugging, but not essential parts of programming in SAS

Lesson Eight – Creating And Applying User-Defined Formats

  1. Invoking PROC Format
    1. proc format <options>
    2. <options>:

Library=libref specifies the libref for a SAS data library that contains a permanent catalog in which user-defined formats are stored

FMTLIB prints the contents of a format catalog

    1. Formats are automatically stored in a temporary work.formats if not specified elsewhere with library=libref
    2. At the end of current session, catalog is erased
  1. Permanently Storing Formats
    1. proc format library=libref;
    2. FIRST need libname that associates with libref, recommended:

Libname library ‘c:\sas\formats\library’;

Now libref = library (so proc format library=library)

Any format created in this proc format step is stored in a permanent format catalog called library.formats

    1. libname library=library.catalog you can specify a catalog name in the library=option, and can store any formats in a catalog
  1. VALUE Statement
    1. VALUE ($)format-name must begin with $ if applying to characters not #s)

Range1=’label1’

Range2=’label2’

Range3=’label3’;

ex. proc format lib=library;

value potato potato being the name of the new customized format

103=’idaho’

141=’red skin’

163=’mashed’;

run;

    1. RANGE can be single # (10) or letter (‘a’), range of numbers (0-100) and letters (‘a’ – ‘m’), or list of unique values separated by commas (10, 20, 30)
    2. LOW, HIGH, OTHER

value potato potato being the name of the new customized format

low-<13=’child’

13-<20=’teenager’

20-high=’person’

Other=’person’’s identity unknown’; use 2 ‘’ to put in apostraphe

run;

    1. Able to put more than 1 formatting in same PROC step (just insert before run;)
  1. Referencing Custom Formats
    1. Reassign libref Library (libname library ‘c:\location’)
    2. Use same as other formats, but remember to put a period (.) at end of format name
  2. Referencing Cataloged Formats
    1. Add keyword to end of proc format step when referencing catalog:

Libname library ‘c:\sas\formats\lib’;

Proc format library=library fmtlib;

Run;

  1. QUIZ Info
    1. Value statement only needs semicolon after proc statement and after last variable has been entered
    2. Can only specify numbers or characters when labeling a unique list separated by commas (10, 20, 30) or (‘a’, ‘q’, ‘r’)

Lesson Nine – Creating Enhanced List And Summary Reports

  1. Default List Report:
    1. proc report data=<sas-data-set> <options>; data used for report

run;

    1. options: Windows or wd (invokes procedure in windowing mode, default), NoWindows nowd (displays listing of report, no windows)
  1. Selecting Variables
    1. column var1 var2 var3;
    2. can use where statements to show select rows
  2. Ways To Enhance Reports:
    1. Defining How Each Variable Is Used:
      1. define variable / <usage> <attribute(s)> <option(s)>

<justification> <’column-heading’>;

      1. variable = name of variable you want to define
      2. usage specifies how to use the variable. Valid options are ACROSS, ANALYSIS, COMPUTED, DISPLAY, GROUP and ORDER
        1. Display: character values
        2. Analysis: numeric values (default used to calculate the sum statistic)

More information available in “Specifying Statistics” section

        1. Order: orders the detail rows in a report according to their formatted values (default is ascending, add “descending” after order to change)

define var1 / order ‘Column Name’ will be in order by var1

define var2 / ‘Column Name 2’ width=4

        1. Group: used to group together (summarize) detail rows according to their formatted values

define var1 / group ‘Column Name’ width=6 center;

If any display variables are not identified as GROUP, ANALYSIS, ACROSS or COMPUTER then the outcome is same as “order”:

Column Name

Column 2

Column Name 1

23

26

Column Name 2

21

29

Add “group” to DISPLAY varibles if they should be collapsed and combined (otherwise will turn out like order as a list, not a summary):

define var1 / group ‘variable 1’ width=10

define var2 / group ‘variable 2’ width=10

define var3 / group ‘variable 3’ width=10

        1. Across: similar to group, but combines horizontally instead of vertically

Creates a frequency table, where unique observed data is under their respective variable/header, the data reflects the # of times that variable was observed:

Original Table:

Variable 1

Variable 2

Unique Record1

Unique Record4

Unique Record2

Unique Record4

Unique Record3

Unique Record5

Across Table:

Variable1

Variable2

Unique Record1

Unique Record2

Unique Record3

Unique Record4

Unique Record5

1

1

1

2

1

        1. Computed: A character or numeric variable that is defined for a report, they are added within the program and are not a part of the input data
  1. Specify new variable in column statement (to the right of the ones used in its calculation)
  2. Define new variable as a computer variable in a DEFINE statement
  3. Being a computer block by specifying new variable in a COMPUTE statement
  4. Use DATA step statements in the compute block to calculate new variable’s value

new variable = var1.statistic(statistic)var2.statistc)

newvariable = var1.sum-var2.sum

This is saying, newvariable equals the sum of variable 1 minus the sum of variable 2

proc report data=reflib.filename;

where var1 in (10, 20)

column var1 var2 var3 Step 1

define var1 / format=comma6.2

define var3 / computed ‘new variable’; Step 2

compute var3; Step 3

var3=var1.sum-var2.sum; Step 4

endcomp;

run;

        1. Specifying Statistics: can specify in summaries what kind of stats you want run on the numerical values that are combining (default sum)

define var1 / group ‘variable 1’

define var2 / mean ‘variable 2’ variable 2 grouped amount is now a mean of the values instead of the sum of the values

Statistic

Definition

CSS

Corrected sum of squares

USS

Uncorrected sum of squares

CV

Coefficient of variation

MAX

Maximum value

MEAN

Average

MIN

Minimum value

N

Number of observations with nonmissing values

NMISS

Number of observations with missing values

RANGE

Range

STD

Standard deviation

STDERR

Standard error of the mean

SUM

Sum

SUMWGT

Sume of the Weight variable values

PCTN

Percentage of a cell or row frequency to a total frequency

PCTSUM

Percentage of a cell or row sum to a total sum

VAR

Variance

T

Student’s t for testing the hypothesis that the population mean is a 0

PRT

Probability of a greater value of Student’s t

      1. attribute(s) specifies attributes for the variable, including FORMAT=, WIDTH=, and SPACING=

Attribute

Action

FORMAT=format

Assigns a SAS format or a user-defined format

Define var1 / format=dollar15.2;

SPACING=horizontal-positions

Specifies how many blank characters to leave between the selected column and the column immediately to its left. The default is 2.

Define var1 / spacing=5;

WIDTH=column width

Specifies the width of the column. The default column width is just large enough to handle the specified format (9 for numeric variables).

Define var1 / width=4;

COMBINED

Define var1 / format=15.2 width=4 spacing=5

      1. option(s) specifies formatting options, including DESCENDING, NOPRINT, NOZERO, and PAGE
      2. justification specifies column justification (CENTER, LEFT or RIGHT)
        1. Default left justify character values and right justify numeric values
        2. Example below centers column var1 and names is ‘Col Heading’

define var1 / format=comma9 width=3 ‘Col Heading’ center;

      1. ‘column-heading’ specifies label for column heading
        1. Specify new column-heading/variable name in ‘’

define var1 / width=13 spacing=5 ‘Column Heading’;

        1. Use a Split character to automatically spill on next line (default is /)

proc report data=reflib.filename split=’*’; sets * as Split

        1. Put HEADLINE (underlines headings and spaces between them) and HEADSKIP (writes blank line beneath heading or blank line) in proc report step to further enhance headings

proc report data=reflib.filename headline headskip;

Other Ways To Enhance Report (not covered in chapter):

    1. Assigning Formats To Variables
    2. Specifying Column Headings And Widths
    3. Justifying The Variable Values And Column Headings within the Report Columns
    4. Changing the Order of the Rows in the Report

Lesson Ten – Producing Descriptive Statistics

  1. Using Proc MEANS
    1. proc means data=reflib.filename <other stats>;
    2. Adding other stats at the end of the statement will override the default and only display those specified (listed below)
    3. Proc Means: prints out the n-count (number of nonmissing values), the mean, the standard deviation, and the minimum and maximum values of every numeric variable in a data set

Descriptive Statistics

Statistic

Definition

CSS

Corrected sum of squares

CLM

Two-sided confidence limit for the mean

CV

Coefficient of variation

MAX

Maximum value

MEAN

Average

MIN

Minimum value

N

Number of observations with nonmissing values

NMISS

Number of observations with missing values

RANGE

Range

STD/STDDEV

Standard deviation

STDERR

Standard error of the mean

SUM

Sum

SUMWGT

Sume of the Weight variable values

LCLM

One-sided confidence limit below the mean

KURTOSIS

Kurtosis

UCLM

One-sided confidence limit above the mean

USS

Uncorrected sum of squares

VAR

Variance

Quantile Statistics

Statistic

Definition

MEDIAN / P50

Median or 50th Percentile

P1

1st percentile

P5

5th percentile

P10

10th percentile

Q1 / P25

Lower quartile or 25th percentile

Q3 / P75

Upper quartile or 75th percentile

P90

90th percentile

P95

95th percentile

P99

99th percentile

QRANGE

Difference between upper and lower quartiles: Q3-Q1

Hypothesis Testing

Statistic

Definition

PROBT

Probability of a greater absolute value for the t value

T

Student’s t for testing the hypothesis that the population mean is 0

    1. Proc MEANS output automatically uses BESTw. Format to display numeric values in the report

proc means data=reflib.filename stats maxdec=#; limits decimals to #

    1. Proc MEANS also outputs every variable by default, to limit use var statement

proc means data=reflib.filename min max maxdec=2;

var var1 var2 var3;

If there is a numbered list of variables (such as item1, item2, item3) can put in the range: var item1-item3;

    1. CLASS statement: groups variables when in proc means to separate data for statistics

proc means data=reflib.filename min max maxdec=2;

var heart cardiac;

class survive sex; different variables than what are listed above

run;

Survive

Sex

Variable

Min

Max

Died

1

Heart

#

#

Cardiac

#

#

2

Heart

#

#

Cardiac

#

#

Survived

1

Heart

#

#

Cardiac

#

#

2

Heart

#

#

Cardiac

#

#

    1. BY Statement also specifies variables to use for categorizing observations
      1. Unlike CLASS processing, BY processing requires that data is already sorted or indexed in the order of the BY variables (if not, will need to run the SORT procedure, MAKE SURE TO SPECIFIY OUT= OPTION!)
      2. BY produces several tables sorted by each variable listed, whereas CLASS produces one large table

proc sort data=reflib.filename out=reflib.newfile

by survive sex

proc means data=reflib.newfile min max maxdec=2;

var heart cardiac;

by survive sex; same as sort variables

run;

This produces the following output:

Survive=DIED Sex=1

Variable

Min

Max

Heart

#

#

Cardiac

#

#

Survive=DIED Sex=2

Variable

Min

Max

Heart

#

#

Cardiac

#

#

Survive=SURV Sex=1

Variable

Min

Max

Heart

#

#

Cardiac

#

#

Survive=SURV Sex=2

Variable

Min

Max

Heart

#

#

Cardiac

#

#

    1. OUTPUT Statement: Creates a new document for proc means output that contains just the summarized variable

output out=filename.statistic_var1 var2 var3

      1. Can also specify stats for each variable:

proc means data=reflib.filename;

var var1 var2; same order as this and must include ALL listed

class var3 var4;

output out=filename.sum_var5;

mean=AvgVar1 AvgVar2; must be in same order as var

min=MinVar1 MinVar2; must be in same order as var

run;

  1. Proc SUMMARY: (can use same code as proc means)
    1. Proc MEANS produces a report by default (can be suppressed with option noprint—put in at end of proc SUMMARY statement), whereas proc SUMMARY does not (will need a print option)

proc SUMMARY data=reflib.filename print; print, I say!

  1. Proc FREQ
    1. Creates one-way and n-way frequency tables and reports the distribution of variable values
    2. By default, proc FREQ creats a one-way table with the frequency, percent, cumulative frequency, and cumulative percent of every value of all variables in a data set

proc FREQ data=reflib.filename;

run;

Variable

Frequency

Percent

Cumulative Frequency

Cumulative Percent

Value

Number of observations with the value

Frequency of the value divided by the total number of observations

Sum of the frequency counts of the value and all other values listed above it in the table

Cumulative frequency of the value divided by the total number of observations

    1. To specify variables use TABLES statement (default selects all variables)

proc FREQ data=reflib.filename;

TABLES var1 var2 var3

run;

    1. Two-Way Tables (used to cross tabulate variables)
      1. TABLES var1*var2 <*variable-n> optional
      2. Crosstabulations causes proc freq to create a table with cells that contain:

        1. cell frequency
        2. cell percentage of total frequency
        3. cell percentage of row frequency
        4. cell percentage of column frequency

Table of Var1 by Var2

Var2

Var1

Var2 Data

Var2 Data

Var2 Data

Var1 Data

Cell frequency

Cell frequency

Cell frequency

Cell % of total frequency

Cell % of total frequency

Cell % of total frequency

Cell % of row frequency

Cell % of row frequency

Cell % of row frequency

Cell % of column frequency

Cell % of column frequency

Cell % of column frequency

Var1 Data

Cell frequency

Cell frequency

Cell frequency

Cell % of total frequency

Cell % of total frequency

Cell % of total frequency

Cell % of row frequency

Cell % of row frequency

Cell % of row frequency

Cell % of column frequency

Cell % of column frequency

Cell % of column frequency

Total

      1. N-Way Tables (for a frequency analysis of more than 2 variables, divides the variables up into separate tables sorted by 1st variable listed)

TABLES var3*var1*var2; var3=levels, var1=rows, var2=columns

    1. List Format (makes crosstabulations easier to read)
      1. TABLES var3*var1*var2 / list;

Instead of putting var1 in rows, it adds it to the columns list, list is sorted by var3:

Var3

Var1

Var2

Frequency

Percent

Cumulative Frequency

Cumulative Percent

A

#

#

#

#

#

#

B

#

#

#

#

#

#

C

#

#

#

#

#

#

    1. Suppressing Table Information (added options to tables statement to suppress)
      1. NOFREQ (suppresses cell frequencies)
      2. NOPERCENT (suppresses cell percentages)
      3. NOROW (suppresses row percentages)
      4. NOCOL (suppresses column percentages)

TABLES var1*var2 / nofreq norow nocol; Suppresses all but cell %

  1. QUIZ Info:
    1. BY group processing requires data to already be sorted or indexed
    2. CLASS statement produces a single table, BY group processing creates a series of small tables
    3. Frequency distributions work best with categorical values

Lesson Eleven – Producing HTML Output

  1. Use Output Delivery System (ODS) statements to generalize output in HTML
  2. ODS Open/Close Statement
    1. ODS open-destination; keyword and any required options
    2. ODS close-destination CLOSE;
    3. Listing destination is open by default, as it uses system resources it would be wise to close it at the beginning of the program if it is not required

ods listing close;

Blah

Blah

Blah

ods listing; It is good programming practice to reopen Listing at end

    1. Close multiple ODS destinations by using the keyword _ALL_ in CLOSE statement:

ods _ALL_ close;

  1. Creating Simple HTML Output
    1. ods HTML body=file-specification;

ods HTML close;

file-specification identifies the file that contains the HTML output, can be entire file name or fileref (file shortcut)

ods listing close; closes listing output

ods html body=’c:\location.html’; specifies html output

proc print data=reflib.filename label; specifies data to print and starts label

var sex age height actlevel; specifies variables to include

label actlevel=’Activity Level’; labels actlevel as Activity Level

run; runs steps above

ods html close; closes output delivery system “html”

ods listing; opens output delivery system “list”

    1. Multiple outputs can be put in the same HTML ods output, just add in more proc steps before run and it will save to same html file
  1. HTML Table of Contents
    1. ods html

body=body-file-specification

contents=contents-file-specification

frame=frame-file-specification

ods html close;

    1. body-file-specification is the name of an HTML file that contains the procedure output
    2. contents-file-specification is the name of an HTML file that contains a table of contents with links to the procedure output
    3. frame-file-specification is the name of an HTML file that integrates the table of contents and the body file. If you specify FRAME= you must also specify CONTENTS=

ods listing close;

ods html body=’c:\records\data.html’ creates new file called data

contents=’c:\records\toc.html’ creates new file called toc

frame=’c:\records\frame.html’; new file named frame, put ; on last)

proc print data=clinic.admit label;

var id sex age height weight actlevel;

label actlevel=’Activity Level’;

run;

proc print data=clinic.stress2;

var id resthr maxhr rechr;

run;

ods html close;

ods listing;

    1. URL Option (can only be used with Body and Contents)

ods html body=’c:\records\data.html’

(url=’http://mysite.com/location/file.html’) can save anywhere

contents=’c:\records\toc.html’

(url=”filename.htm”) can only reference if in same location

frame=’c:\records\frame.html’;

    1. PATH = Option (Add onto ods html statement)
      1. path=’filelocation’ (url=option)

ods html path=’c:\location’ (url=none) specifying “non” means no information about path= option appears in links or references

      1. Purpose of PATH= is to specify the location, so body, contents and frame only need to specify file

ods listing close;

ods html path=’c:\location’

body=’data.html’

contents=’contents.html’

style=brick;

proc print data=clinic.stress2;

var id resthr maxhr rechr;

run;

ods html close;

ods listing;

  1. Changing Appearance of HTML Output
    1. Style=style-name
    2. Available Styles:

      1. Beige
      2. Brick
      3. Brown
      4. D3d
      5. Default
      6. Minimal

ods listing close;

ods html body=’c:\records\data.html’

style=brick;

proc print data=clinic.stress2;

var id resthr maxhr rechr;

run;

ods html close;

ods listing;

      1. Find more styles by:

        1. Click Results Tab on the Explorer Window
        2. Right-click the Results Icon and Select Templates
        3. Open Sashelp.tmplmst
        4. Open Styles folder
      2. Can Create own styles by using TEMPLATE procedure

Lesson Twelve – Creating Tabular Reports

  1. Tabulate Procedure
    1. creates customized one-,two- and three-dimensional tables that display any of a large number of descriptive statistics
    2. can generate more than one table in one step
    3. produce subreports without sorting data
  2. Setting Up A Table

Step 1:

Sketch Table

Step 2:

Compute Numbers

Step 3:

Evaluate Table

Step 4:

Enhance Report

    1. Step 1: Sketch Table
      1. Identify the data you are analyzing and then determine:

        1. which variables (if any) you need for classifying the data
        2. which variables (if any) you need for analyzing the data
        3. the type of table you need for representing your data
    2. Writing PROC Tabulate step
      1. PROC TABULATE: invokes procedure and identifies data set

Proc tabulate data=reflib.filename;

      1. CLASS: specifies variables that are used to classify data
      2. VAR: specifies variables that are used to analyze data

***MUST BE NUMERIC***

      1. TABLE: defines the table to display your data

TABLE statement specifies the # of dimensions (page, row, column), the variables in the table, and the statistics to be calculated

TABLE page-expression, row-expression, column-expression/options

The number of commas in expression dictates the number of dimensions the table will have (0 = 1 dimension, 1 = 2 dimensions, 2 = 3 dimensions)

Table type;

If you specify only one expression (in this one, the expression is “type”), then it requests a one dimensional table that contains only one row, and the specified variable are the columns

Type

I

II

N

N

3

17

Table type premium;

Similar to previous, but now there are two columns specified (“type” and “premium”)

Type

I

II

Premium

N

N

Sum

3

17

3359.15

Table type,premium;

If two variables are separated by a comma, the right is still the column statement whiel the left becomes a row expression (“premium” is the column, and “type” is the row)

Premium

Sum

Type

I

312.65

II

3046.50

Table type,premium,sum;

Makes a three dimensional table, the first is the page expression (“type”, appears in upper-left corner), the second is the row expression (“premium”), and the third is the column heading (“sum”)

Type 1

Sum

Premium

3046.50

  1. Specifying Statistics
    1. Use the asterisk (*) to attach a variable to a statistic

Table fee*mean produces a heading that reads “Fee Mean”

Or

Table mean*fee produces a heading that reads “Mean Fee”

Proc tabulate data=reflib.filename;

var fee;

table fee*mean;

run;

    1. If any variables are specified in table statement, the default statistic is SUM and you can request any statistic for the analysis of variables
    2. CLASS variables can only have statistic of n and pctn run on them
    3. In a single table statement, you can specify statistics in any dimension, but they all must be in the same dimension

Proc tabulate data=reflib.filename;

Class var3 var4

var var1 var2;

table var1*mean var2*max, var3; in same dimension (row/before comma)

table var4*pctn var3*n both in column dimension

run;

    1. Use a “WHERE” statement to further limit/specify what is shown in the report

var fee;

table fee*mean;

where fee<50;

    1. Running stats on all variables is simple, add ALL to end of row or column expression in table statement (default is SUM)

table fee all;

or

table fee, type, premium;

    1. Should know how to:

      1. Add titles/footnotes (title(n)’text’;) (footnote(n)’text’;)
      2. Replace variable names with descriptive text (label)
      3. Format data values (format)

Proc tabulate data=sasuser.therapy;

Var walkjogrun swim;

Table walkjogrun swim;

Title1 ‘Attendance in Exercise Therapies’;

Footnote1 ‘6-16-08);

Label walkrunjog=’Walk/Jog/Run’;

Run;

    1. Label Statistics and ALL by using KEYLABEL statement

Keylabel keyword=’text’

Where keyword is ALL or one of the following keywords for statistics: N, SUM, MEAN, MAX, MIN, STD, PCTN, PCTSUM

Proc tabulate data=sasuser.therapy;

Var walkjogrun swim;

Table (walkjogrun swim)*mean, attendants all;

Title1 ‘Attendance in Exercise Therapies’;

Footnote1 ‘6-16-08);

Label walkrunjog=’Walk/Jog/Run’;

Keylabel mean=’Average’

All=’All Participants’;

Run;

  1. Formatting the Table
    1. Specifies width of table cells and format for summary values (default is 12.2)

proc tabulate data=reflib.filename format=6.; #.# width.decimal

    1. The format statement only affects the values of class variables, it has no effect on summary values, unless you put format in the proc tabulate step
    2. Use asterisk (*) to create hierarchical variable tables:

Table sex,actlevel*age*max;

ACTLEVEL

HIGH

LOW

MOD

AGE

AGE

AGE

MAX

MAX

MAX

SEX

44

49

43

M

F

29

60

54

    1. Use parenthesis to control how expressions are evaluated

Table type*(sex all);

    1. Condense multiple pages onto one page:

Table,type,fee,sum / condense;

    1. Specify how percentages are calculated (In table below, the percentages are calculated within each column):

Table type,fee*sex*pctsum<type>;

    1. Can create formats to change headings for the values of class variables:

Proc format;

Value $format ‘var1’=’variable 1

var2’=’variable 2

    1. Specify the order of class variable values

Proc tabulate data=reflib.filename order=var1;

    1. Control the spacing of the row titles

Table mean min max,height weight / rts=6.;

    1. Format selected table cells and their contents

Table type*(sex all*f=5.);

    1. Can replace specific occurrences of a variable name or statistic name with a label

Table type*(sex all=’Both Sexes’*f=5.)

all=’Both Types’;

Lesson Thirteen – Enhancing HTML Tabular Reports

  1. Apply the STYLE= Option
    1. Use in the ods statement to set premade style (see lesson 11), use style= within proc tabulate to change the appearance of individual rows, columns, and cells
    2. STYLE=<style-name> <<parent>>

{<attribute-1=value><…attribute-n=value>}

  1. style-name refers to a predefined style
  2. <parent> applies the style attributes of the nesting cell
  3. attribute specifies a style attribute to change, such as
    1. foreground
    2. background
    3. backgroundimage
    4. flyover
    5. font_face
    6. font_size
    7. font_style
    8. font_weight

You can shorten the keyword style to s in your programs:

s={background=yellow}

You can also use brackets instead of {}

=[background=yellow]

    1. Changing CLASS Headings

Add style option by adding slash (/) after class statement

Class var1 / style=[background=lib];

Lib = light blue, can also use cxRRGGBB value

    1. Keyword statement makes it possible to add style= option

keyword min max mean sum / style=[background=lib];

    1. Use box statement to apply styles to cell in upper-left corner (knows as the box cell), this is added to the end of the table statement with a slash /:

Ods listing close; close the ods listing type

Ods html path=’c:/data’ body=’laguard.html’; name/location of output is html

Title1 ’LaGuardia Flights by Destination’; title 1

Title2 ‘and Revenue, March 4-10’; title 2

Proc format; invoke format statement (to create own format)

Value $desname ‘CPH’;=’Copenhagen’ sets name as desname character

‘FRA’=Frankfurt’ sets values CPH equal to given

Run;

Proc tabulate data=flights.laguardia format=comma9.; name/location of input

Class dest / style=[background=cxCCEEDD]; sets class as dest w/ style

Var boarded transferred deplaned revenue; sets variables to include

Table dest*(boarded transferred deplaned revenue), * forms hierarchy

Min max mean sum / box=[style=[background=lib]]; stats

keyword min max mean sum / style=[background-cxCCEEDD]; styles

label dest=’Destination’; label dest as Destination

format dest $desname.; formats dest with format made in previous step

run;

ods html close; closes ods html

ods listing; opens ods listing

    1. Use CLASSLEV to add styles to of class level cells (class levels=var under class)

Class dest / style=[background=cxCCEEDD];

Classlev dest / style=[background=cxCCEEDD];

    1. Using proc format VALUE statement to create a color format:

Proc format;

Value $colors ‘var1’=’cxBBDDCC’

var2’=’cxAACCBB’

var3’=’cx99BBAA’

……….later on in classlev statement………….

Classlev varbles / style=[background=$colors.];

    1. For Analysis Variables, Add style to the var statement:

var boarded transferred

var deplaned / style=[background=cxCCEEDD];

Write a separate var statement or else other variables will be affected by style

    1. <parent> will allow variables underneath parent cell to be affected by same format (including the conditions)

var deplaned / style=<parent>;

    1. To change the background colors of data cells, add the style= option to the end of the proc tabulate statement

proc tabulate data=lifreb.filename format=comma9. style=[background=ywh];

    1. Applying Styles to Specific Cell Values

Apply same style as heading by adding an asterisk (*) after a variable in the table statement:

Table var1*(var2 var3 var4*(style=<parent>);

    1. Changing Appearance of Text:

Attribute

Sample Values

Foreground

White, cx336699

Font_face

Times, helv, courier, serif

Font_size

1-7, 10pt, .2in

Font_style

Roman, italic, slant

Font_weight

Light, medium, bold, black

Font_width

Compressed, narrow, normal, wide

Put inside of style=, after background (if included) attribute=samplevalue

Classlev varbles / style=[background=$colors. Font_size=10pt];

    1. If under the effects of <parent>, put {} and changed attribute after

Table var1*(var2 var3 var4*(style=<parent>{font_style=italic});

    1. Trafficlighting (aka Conditonal Formatting)

Proc format;

Value condit low-105=’lipk’ light pink = lipk (condit is name of format)

106-high=’vpapb’; vpapb = very pale purplish blue

Table var1*(var2 var3 var4*{style=[background=condit.]};

    1. Flyover text (alt text), attaches text cells to view when mouse in on them:

Make proc format value (like above) only set as normal text, (ex. name of format is condit, low-105=’insert flyover text to be viewed’)

Then add flyover attribute to the style option in the table statement

Table var1*(var2 var3 var4*{style=[flyover=condit.]};

    1. Including HTML Tags Directly in SAS Statements

By default, titles in HTML will be blue sans serif font, change this by adding in attributes and values using HTML tags

Title1 ‘<FONT face=”fontname”>Title</FONT>’;

Title1 “<FONT face=’fontname’ color=’brown’>Title</FONT>”;

If title has single quotations ‘’, then use double on HTML tags “” and vice versa

    1. Embedding Images

Destination

Copenhagen

Boarded

Transferred

Deplaned

Revenue

Value $desname ‘CPH’=’Copenhagen’

To add in image:

  1. Start with <P> tag to force page break
  2. Add in IMG tag and specify the source image file

Value $desname ‘CPH’=’Copenhagen<P><IMG src=”smiley.gif”>’

  1. Must enclose in <DIV> image strings to display mixed text and IMG strings

Value $desname ‘CPH’=’<DIV>Copenhagen<P><IMG src=”smiley.gif”></DIV>’;

Destination

Copenhagen

Boarded

Transferred

Deplaned

Revenue

Lesson Fourteen – Creating Plots (Graphs)

  1. GPlot procedure:

Proc Gplot invokes procedure and specifies what data to use

Plot Statement specifies which variables to plot

PROC GPLOT DATA=reflib.filename;

PLOT vertical-variable*horizontal variable;

RUN;

Vertical-variable y-axis

Horizontal-variable x-axis

  1. Specifying Tick-Mark Values for Axes

PLOT Statement with VAXIS= and HAXIS= options

PLOT vertical-variable*horizontal-variable /

VAXIS=<value-list | range>

HAXIS=<value-list | range>;

Value-list and range determines the placement of tick marks along the axis

    1. Specifying Values:

Value list:

haxis=’MON’ ‘TUES’ ‘WED’ ‘THURS’ ‘FRI’ must be identical to data set

Proc gplot data=clinic.therapy1999;

Plot aerclass*month /

Haxis=’01’ ‘06’ ‘12’; character values need ‘’ around them

Run;

A range of values:

VAXIS=10 to 100 by 10

Proc gplot data=clinic.therapy1999;

Plot aerclass*month / Haxis=’01’ ‘06’ ‘12’

vaxis=0 to 100 by 50;

Run;

    1. Multiple Plot Requests Using OVERLAY

Proc gplot data=reflib.filename;

Plot vertical-variable1*horizontal-variable

vertical-variable2*horizontal-variable /

OVERLAY;

Run;

Horizontal variable must be the same for both plots, first y-axis variable named will label the axis (note: if OVERLAY was not specified, it would have produced 2 graphs)

    1. Enhance Plots with SYMBOL Statement

SYMBOL <n> <option(s)>;

n is the symbol statement number (1 to 99), if no value is assigned symbol1 is assumed, these statements remain in effect until you change/cancel them or end session

SYMBOL Options:

This option…

Specifies…

VALUE=

The plotting symbol

HEIGHT=

The height of a plotting symbol

INTERPOL=

The interpolation technique

WIDTH=

The thickness of the line in pixels

COLOR=

The color of plotting symbols or lines

Symbol1 color=red value=star Interpol=spline sets symbol1’s color, value, interpol

Height=1 cm width=4; sets symbol1’s height and width

Proc gplot data=clinic.totals2000; invokes plot graph of data “clinic.totals2000”

Plot therapy*month; sets variables to y-axis “therapy” and x-axis “month”

Run;

To set OVERLAY plots differently put two consecutive Symbol statements (symbol1 and symbol2), the first applies to the first plot, the second applies to the second, etc.

The VALUE= option specifies the plotting symbol that represents each data point, possible values for VALUE= include:

  • Letters A through W
  • Numbers 0 through 9
  • A number of special symbols, including PLUS, STAR, SQUARE, DIAMOND, TRIANGLE and many others
  • NONE, which produces a plot with no symbols for data points

The HEIGHT= option specifies the height of the plotting symbol, specify a numerical value and unit of measurement, units of measurement include:

  • Percentage of the display area (PCT)
  • Inches (IN)
  • Centimeters (CM)
  • Points (PT)
  • Character cells (CELL), which is the default unit

INTERPOL= option decides whether or not the plot lines will connect, and if they do what the lines will look like. Some INTERPOL options include:

  • NONE
  • JOIN (straight lines)
  • NEEDLE (vertical lines from x-axis)
  • SPLINE (smooth line, can combine with NONE to produce just smooth line)
  • HILO
  • STD

SPLINE WITH NONE

(if NONE was not activated, it would have had plots on highs/lows)

COLOR= option specifies the color of the plotting symbols and interpolated lines

TIP: Specify a different color for each plot point or else the same symbol statement will be used for each OVERLAYING or multiple plot lines.

MUST SPECIFY DIFFERENT COLOR FOR EACH PLOT PIONT FOR NEW SYMBOL TO TAKE EFFECT!!!

Symbol Statements:

  1. Symbol statements remain in effect until end of session or cancelled

    1. Submitting a statement that modifies a previously created symbol (ex. Symbol1) will modify only that aspect of that symbol and leave the other specifications alone
    2. Canceling a symbol statement:

      1. Can cancel an individual symbol option, all options in one symbol statement or all symbol statements currently in effect
      2. Individual Symbol Statement Cancel:
        1. Submit: symbol1 color=; with nothing specified, the previously defined color will be cancel
      3. Entire Symbol Statement Cancel:
        1. Submit: symbol1;
      4. All Symbol Statements Cancel:
        1. Submit: goptions reset=symbol;
  2. RUN-Group Processing
    1. Instead of resubmitting an entire PROC step, just submit a new symbol statement and run, this will automatically resubmit the last step before it along with the new statement
    2. To end the procedure you must submit another PROC, DATA or QUIT step
    3. Not all procedures support RUN-group process (PROC GPLOT steps do, which is beneficial when wanting to change the symbol options)
  3. WHERE Statement in PROC GPLOT steps
    1. EXAMPLES:

      1. Proc gplot data=clinic.therapy1999;

Plot swim*month aerclass*month/overlay;

Where swim>35 and aerclass>35;

Run;

  1. Specifying Storage Location For Graphs
    1. Use GOUT=option in Proc Statement

      1. GOUT=<libref.>output-catalog
  2. Additional Features
    1. y*x=z generates a graph for a third variable only

      1. symbol statements….;

proc gplot data=air.airqual;

avgtsp*month=state;

where state in (“AL” , “GA”);

run;

    1. Control the tickmarks in between graphs:

      1. Plot gplot avgtsp*month=state / vminor=3 hminor=0;
      2. Vminor = determines how many tickmarks on vertical axis
      3. Hminor = determines how many tickmarks on horizontal axis
    2. AREAS=option fills in the areas below the plotted lines
      1. PATTERN statements define the fill colors
      2. NOTE statements identify the data
      3. NOLEGEND option in proc statement suppresses the legend

proc gplot data-air.airqual;

plot avgtsp*month=state /

vminor=3 hminor=0 areas=2 nolegend;

pattern1 color=red;

pattern2 color=blue;

note move=(10, 19) color=red ‘Alabama’;

note move=(10, 20) color=blue ‘California’;

symbol1 c=red i=spline v=none;

symbol2 c=blue i=spline v=none;

where state in (“CA” , “AL”);

run;

  1. Active X Plot Control
    1. Allows for rotating, pointing and clicking to further investigate graph
    2. Ods html body=’plot.htm’;

Goptions reset=global gunit=pct border cback=white

Colors=(blue red) ctext=black

Ftitle=swissb ftext=swiss htitle=6 htext=4

Device=activex;

proc gplot data=stocks;

plot dowjoneslow*year

dowjoneshigh*year / overlay

haxis=axis1

hminor=4

vaxis=axis2

vminor=1

caxis=black

areas=2;

symbol1 interplot=join;

axis1 order=(1955 to 1995 by 5) offset=(2,2)

label=none

major=(height=2)

minor=(height=1);

axis2 order=(0 to 6000 by 1000) offset=(0,0)

label=none

major=(height=2)

minor=(height=1)

run;

quit;

ods html close;

  1. Quiz Info
    1. The plot statement specifies the variables to be plotted (with the variable on the vertical axis listed first), followed by the VAXIS= and HAXIS= options

Lesson Fifteen – Creating Bar and Pie Charts

  1. GCHART Procedure:

PROC GCHART <data=SAS-data-set>;

Chart-form chart-variable </ options>;

RUN; invokes procedure and specifies what data to use

    1. sas-data-set is name of the SAS data set to be used
    2. chart-form is HBAR, HBAR3D, VBAR, VBAR3D, PIE, or PIE3D
    3. chart-variable is the variable that determines the number of bars or pie slices
    4. / (slash) indications that options follow
    5. options are any valid options for the specified chart form
    6. the default stat for GCHART is FREQ

proc gchart data=sasuser.admit;

hbar sex;

vbar age;

pie actlevel;

run;

Output gives three graphs: horizontal bar with sex as the variable, vertical bar with age as the variable and pie chart with actlevel as the variable

  1. Specifying Stats
    1. Use the Type=stat option
    2. proc gchart data=sasuser.admit;

vbar company / type=cfreq;

run;

The bars will reflect the cumulative frequency instead of the frequency, other options include: percent, freq, cpercent

  1. Summarizing a variable in the chart
    1. Use the option SUMVAR=summary-variable
    2. The default statistic is SUM, can use TYPE= but it can only be SUM or MEAN stat
    3. proc gchart data=clinic.insure;

vbar company / sumvar=balancedue;

run;

This output shows each company on the x-axis, with the balancedue on the y-axis (the bars represent the sum of the values of BALANCEDUE for each value of COMPANY (total balance due by company).

    1. proc gchart data=clinic.insure;

vbar company;

run;

This output (sans sumvar= option) shows each company on the x-axis, with the frequency on the y-axis (the bars represent the number of people who have a policy with each company).

    1. Use the Type=MEAN option along with sumvar= to use mean instead of sum for the sumvariable:

proc gchart data=clinic.insure;

vbar company / sumvar=balancedue type=mean;

run;

  1. WHERE statement can be used in gchart:

proc gchart data=clinic.insure;

vbar company / sumvar=balancedue type=mean;

where company in (“A&R”, “ACME”);

run;

  1. Enhancing Charts:
    1. Colors: Use patternid= option in same statement that specifies the chart, patternid=<by | midpoint | group | subgroup> (bar colors and/or patterns vary according to the option specified):

proc gchart data=clinic.insure;

vbar company / sumvar=balancedue type=mean

patternid=midpoint;

run;

    1. To change the pattern to either all solid or all cross-hatch, you can use the FILL=option in the PIE or PIE3D statement

      1. FILL=X changes the fill pattern for all slices to cross-hatch
      2. FILL=S changes the fill pattern for all slices to solid
  1. RUN-group processing: Instead of resubmitting a proc, just add on additions and it will automatically resubmit (example, adding a label for x-axis variable “month”: label month=’Month of March’;)
  2. GROUP= option will categorize data

proc gchart data=clinic.admit;

vbar sex / sumvar=weight type=mean

group=actlevel patternid=group;

run;

This option will separate the bars into different categories, making sex a subcategory divided by actlevel

  1. SUBGROUP= option will divide the bars by category, using patternid=subgroup will change the patterns of the bars, using pattern statements will specify these colors (lib=light blue, lig=light green):

pattern1 color=lib;

pattern2 color=lig;

proc gchart data=clinic.admit;

hbar age / sumvar=weight type=mean

subgroup=sex patternid=subgroup mean;

run;

  1. Specifying Stats in Horizontal Bar Charts (specify stat(s) wanted in chart statement):

proc gchart data=clinic.admit;

vbar sex / group=actlevel

patternid=group freq;

run;

  1. Additional Features for Pie Charts
    1. CTEXT= option specifies the text color
    2. EXPLODE= option separates one or more pie slices from the rest of the pie (for emphasis)
    3. SLICE= option can control where labels appear

proc gchart data=clinic.insure;

pie company / sumvar=balance due type=mean

ctext=blue slice=arrow explode=”ACME”;

where Company in (“ACME”, “RURITAN”, “USA INC.”);

run;

  1. SAS/GRAPH control for ActiveX

ods html body=’chart.htm’;

goptions reset=global gunit=pct border cback=white

colors=(blue red) ctext=black

ftitle=swissb ftext=swiss htitle=6 htext=4

device=activex;

pattern1 color=lib;

pattern2 color=lig;

proc gchart data=clinic.admit;

hbar3d age / sumvar=weight type=mean

subgroup=sex patternid=subgroup mean;

run;

quit;

ods html close;

Lesson Sixteen – Enhancing and Exporting Charts and Plots:

  1. Labeling Variables:
    1. Use the label statement as normal:

Proc gchart data=clinic.insure;

Vbar company / type=sum sumvar=balancedue

patternid=midpoint;

where company in

(‘A&R’,’ACME’,’RURITAN’);

Label balancedue=’Balance Due’

Company=’Insurance Co. Billed’;

Run;

  1. Formatting Variables:
    1. Format statements have the same effect in SAS/GRAPH ouput

Proc gchart data=clinic.insure;

Vbar company / type=sum sumvar=balancedue

patternid=midpoint;

where company in

(‘A&R’,’ACME’,’RURITAN’);

Label balancedue=’Balance Due’

Company=’Insurance Co. Billed’;

Format balancedue dollar5.;

Run;

  1. Titles and Footnotes:
    1. These statements also can be used in SAS/GRAPH output:

Title1 ‘Total Revenue for Males’;

Footnote1 ‘For March 1-15’;

Proc gchart data=clinic.insure;

Vbar company / type=sum sumvar=balancedue

patternid=midpoint;

where company in

(‘A&R’,’ACME’,’RURITAN’);

Label balancedue=’Balance Due’

Company=’Insurance Co. Billed’;

Format balancedue dollar5.;

Run;

    1. It is also possible to enhance titles and footnotes using options font= color= height=, put them in front of the text they modify:

Title1 color=red font=centb h=0.7cm ‘Total Revenue for Males’;

Footnote1 color=blue h=0.5cm font=swiss ‘For March 1-15’;

    1. Default Values for COLOR, FONT and HEIGHT:

Option

TITLE1 Defaults

Other Title and Footnote Defaults

COLOR=

The first default color for the graphics device

The first default color for the graphics device

FONT=

SWISS

NONE. The NON font specifies the default hardware font for the output device

HEIGHT=

2 CELLS

1 CELL

  1. Controlling HBAR Statistics
    1. HBAR | HBAR3D chart-variable | <options>

Statistic

Statistic-label=’label-text’;

Statistic: specifies which statistic or statistics are displayed on the right side of the bars

Statistic-label: is a keyword that includes the name of the statistics whose default label you want to replace. Valid keywords are FREQLABEL=, CFREQLABEL=, PERCENTLABEL=, CPERCENTLABEL=, SUMLABEL=, and MEANLABEL=

label-text: specifies the text that will be used for the label. You can use either single or double quotation marks. Limited to 32 characters.

  1. Controlling VBAR Statistics
    1. VBAR | VBAR3D chart-variable | <options>

OUTSIDE=statistic

INSIDE=statistic

Outside and Inside may be used together, just puts statistic inside the bars or outside of the bars (by default, frequency statistic is run for vbars)

    1. Label Axis:
      1. axis1 label=(‘No. of People’);
      2. Put on line above proc gchart statement
  1. Exporting Graphs
    1. Can be done interactively (File Export as Image) or with program statement
    2. Program statement to export graphs (Specify a destination for the output and the device driver that creates the type of graphics output you want):

Filename mygif ‘c:\mysasfiles\hbar1.gif’;

Goptions gsfname=mygif device=gif;

SAS\GRAPH statements

  1. Sizing Graphs
    1. If the graph is created in the GRAPH window, use the hsize= and vsize= options
    2. If the graph is exported directly to external files by using program statements, use the xpixels= and ypixels= options
    3. Use them in the goptions statement:

Goptions hsize=# <IN | CM | PT> vsize=# <IN | CM | PT> border border adds a border (duh)

  1. Creating Interactive Graphs
    1. SAS/Graph Control for ActiveX enables interactive graphs that can have changeable features such as: chart type, variables and statistics displayed and properties such as color, axes, labels and legends
    2. To view graphs that use SAS/GRAPH ActiveX on the Web, you need an ActiveX-enabled browser such as IE 4.0+ Or Windows 95 or later

Submit…

To…

ODS HTML Statement

Open the HTML destination and create HTML output

GOPTIONS Statement

Specify the ACTIVEX device driver for generating the output

SAS\GRAPH Procedure

Create the graph

ODS HTML Statement

Close the HTML destination

Ods listing close;

Ods html body=’barchart.htm’; Body specifies the file for HTML output

Goptions device=activex xpixels=400 ypixels=400

Proc gchart data=clinic.heart;

Hbar3d shock / sumvar=arterial type=mean;

Run;

Quit;

Ods html close;

  1. Additional Features
    1. Other options with GCHART and GPLOT procedures can further enhance the graphs. Underline, rotate, angel and layer titles and footnotes and even enclose them in boxes.
    2. Display data or geographic maps using GMAP
    3. Manage catalogs of graphics output using PROC GREPLAY
    4. Use PROC GREPLAY to display two or more graphs in one output file
    5. Customize graphics output using the ANNOTATE facility
    6. create three-dimensional response surface images using PROC 3GD

Lesson Seventeen – Creating Drill-Down Graphs in HTML:

  1. Drill-Down Charts are charts that are hyperlinked together to show more intricate statistics and information (Main Chart has three bars that each are linked to separate charts that each have three bars of data that are linked to another chart, etc)
  2. To achieve this type of SAS program, you must do the following:

    1. Create the initial chart that you want to use, using SAS\GRAPH software. This chart will become the primary chart in your linked series of graphs
    2. Create a variable to hold HREF addresses for the linked HTML files
    3. Define values for the new variable
    4. Set the output format to HTML by using the ODS HTML statement
    5. Attach links to the primary chart by identifying the HREF variable in the SAS\GRAPH procedure
    6. Create the secondary, linked charts and tables
  3. Step One (Original Code/Primary Graph):

Data work.saletrnd;

Set finance.prdsal2;

Format actual dollar10.0;

Run;

Goptions hsize=5in vsize=4in;

Title ‘Total Sales by Year’;

Proc gchart data=work.saletrnd;

Vbar year / subgroup=prodtype discrete

Sumvar=actual;

Run;

  1. Step Two:
    1. Use a LENGTH statement to define the new character variable since HREF links are long text strings

Data work.saletrnd;

Set finance.prdsal2;

Length PriDrill $ 40; “$” is used to specify characters, 40 is integer length

Run;

Now PriDrill will store the names of the pages that will be loaded when users click the bars of the chart

  1. Step Three:
    1. Specify values for the variable by using HREF= statements (note: these charts have not been made yet, just assigning values and documenting names of files for right now):

PriDrill:

HREF=sales95.html

HREF=sales96.html

HREF=sales97.html

HREF=sales98.html

    1. Assigning Values Conditionally
      1. Use an IF-Then Statement
      2. IF expression THEN statement;

Data work.saletrnd;

Set finance.prdsal2;

Format actual dollar10.0;

Length PriDrill $ 40;

If year=1995 then pridrill=’HREF=”sales95.html”’;

If year=1996 then pridrill=’HREF=”sales96.html”’;

If year=1997 then pridrill=’HREF=”sales97.html”’;

If year=1998 then pridrill=’HREF=”sales98.html”’;

Run;

      1. ANCHOR: ’HREF=”sales98.html#office”’;

This specifies the anchor named “office” within sales98.html.

Anchors will automatically take you to that “spot” on the webpage.

      1. To lead to anchors that are on the same page as your chart, omit the

HTML page name and use only the pound sign and the anchor name.

  1. Step Four:
    1. For Drill-Down graphs add two options for ODS statements:

      1. PATH=option to specify a location for storing GIF files and HTML files, this directory must already exist in the system, ods will not create it for you
      2. BODY=option to name the HTML file that will hold the drill-down chart. This file will be created in the location specified in the path=option

Ods html path=’c:\data\sales\reports’

Body=’totsales.html’;

      1. In this example, SAS\GRAPH GIF files and HTML output will be stored in C:\Data\Sales\Report, the file totsales.html will be found there and display your program’s SAS/GRAPH output when that output is viewed using a web browser:

Data work.saletrnd;

Set finance.prdsal2;

Format actual dollar10.0;

Length PriDrill $ 40;

If year=1995 then pridrill=’HREF=”sales95.html”’;

If year=1996 then pridrill=’HREF=”sales96.html”’;

If year=1997 then pridrill=’HREF=”sales97.html”’;

If year=1998 then pridrill=’HREF=”sales98.html”’;

Run;

Ods listing close;

Ods html path=’c:\data\sales\reports’

Body=’totsales.html’;

Run;

  1. Step Five: Modifying the original gchart
    1. Change the GOPTIONS to specify GIF output as well as a size for the SAS/GRAPH chart:
    2. Also add the HTML=option to the VBAR statement. The HTML=option assigns elements of the chart to the variable that you created for holding the HREF address strings (PriDrill).

Goptions hsize=5in vsize=4in;

Goptions device=gif xpixels=480 ypixels=360;

Title ‘Total Sales by Year’;

Proc gchart data=work.saletrnd;

Vbar year / subgroup=prodtype discrete

Sumvar=actual html=pridrill;

Run;

PROGRESS CHECK

(Here is what should be made so far):

Data work.saletrnd;

Set finance.prdsal2;

Length PriDrill $ 40;

If year=1995 then pridrill=’HREF=”sales95.html”’;

If year=1996 then pridrill=’HREF=”sales96.html”’;

If year=1997 then pridrill=’HREF=”sales97.html”’;

If year=1998 then pridrill=’HREF=”sales98.html”’;

Format actual dollar10.0;

Run;

Ods listing close;

Ods html path=’c:\data\sales\reports’

Body=’totsales.html’;

Goptions device=gif xpixels=480 ypixels=360;

Title ‘Total Sales by Year’;

Proc gchart data=work.saletrnd;

Vbar year / subgroup=prodtype discrete

Sumvar=actual html=pridrill;

Run;

Quit;

Ods html close;

Ods listing;

  1. Step Six: Creating the Secondary Charts
    1. Now create each of the HREF’s:

Ods html body=’sales95.html’;

Title ‘Sales for 1995 by Quarter’

Proce gchart data=work.saletrnd;

Vbar quarter / sumvar=actual

Subgroup=product discrete;

Where year=1995;

Run;

Ods html body=’sales96.html’;

Title ‘Sales for 1996 by Quarter’

Proce gchart data=work.saletrnd;

Vbar quarter / sumvar=actual

Subgroup=product discrete;

Where year=1996;

Run;

Ods html body=’sales97.html’;

Title ‘Sales for 1997 by Quarter’

Proce gchart data=work.saletrnd;

Vbar quarter / sumvar=actual

Subgroup=product discrete;

Where year=1997;

Run;

Ods html body=’sales98.html’;

Title ‘Sales for 1998 by Quarter’

Proce gchart data=work.saletrnd;

Vbar quarter / sumvar=actual

Subgroup=product discrete;

Where year=1998;

Run;

  1. Creating Secondary Charts and Tables

This will be using an anchor, where the subcharts are found on the same page as their primary charts (will not have separate HTML outputs)

    1. In the DATA step use the length statement to define a variable name SecDrill
    2. Assign values to SecDrill according to the value of Quarter. Assign the HTML anchor names salesq1, salesq2, salesq3, salesq4

Data work.saletrnd;

Set finance.prdsal2;

Length PriDrill $ 40 SecDrill $ 40;

If year=1995 then pridrill=’HREF=”sales95.html”’;

If year=1996 then pridrill=’HREF=”sales96.html”’;

If year=1997 then pridrill=’HREF=”sales97.html”’;

If year=1998 then pridrill=’HREF=”sales98.html”’;

If quarter=1 then secdrill=’HREF=”#salesq1”’;

If quarter=2 then secdrill=’HREF=”#salesq2”’;

If quarter=3 then secdrill=’HREF=”#salesq3”’;

If quarter=4 then secdrill=’HREF=”#salesq4”’;

Run;

    1. To associate these new HREF strings with bars of the secondary charts, add the HTML option to each gchart procedure that produces the quarterly graphs:

Example:

Ods html body=’sales95.html’;

Title ‘Sales for 1995 by Quarter’

Proc gchart data=work.saletrnd;

Vbar quarter / sumvar=actual

Subgroup=product discrete;

html=secdrill;

Where year=1995;

Run;

Do this for the other “secondary” graphs.

    1. Now the tables need to be created for each chart in each year for each quarter
    2. Also add in anchor=anchor-name to specify each table’s anchor

Put after each chart made for each year so they are in same file as respective year:

Ods html body=’sales95.html’;

Title ‘Sales for 1995 by Quarter’

Proc gchart data=work.saletrnd;

Vbar quarter / sumvar=actual

Subgroup=product discrete;

Html=secdrill;

Where year=1995;

Run;

Ods html anchor=’salesq1’;

Title ‘1st Quarter 1995 Sales by Country’;

Proc tabulate data=work.saletrnd format=dollar10.0;

Class country product;

Var actual;

Table (country all)*(product all), sum*actual;

Where year=1995 and quarter=1;

Run;

Ods html anchor=’salesq2’;

Title ‘2nd Quarter 1995 Sales by Country’;

Proc tabulate data=work.saletrnd format=dollar10.0;

Class country product;

Var actual;

Table (country all)*(product all), sum*actual;

Where year=1995 and quarter=2;

Run;

Ods html anchor=’salesq3’;

Title ‘3rd Quarter 1995 Sales by Country’;

Proc tabulate data=work.saletrnd format=dollar10.0;

Class country product;

Var actual;

Table (country all)*(product all), sum*actual;

Where year=1995 and quarter=3;

Run;

Ods html anchor=’salesq4’;

Title ‘4th Quarter 1995 Sales by Country’;

Proc tabulate data=work.saletrnd format=dollar10.0;

Class country product;

Var actual;

Table (country all)*(product all), sum*actual;

Where year=1995 and quarter=4;

Run;

FINISHED CODE PLUS WALKTHROUGH:

Data work.saletrnd;

Set finance.prdsal2;

/* create variables to hold HREF address strings */

Length PriDrill $ 40 SecDrill $ 40;

/* define the HTML links for the primary charts */

If year=1995 then pridrill=’HREF=”sales95.html”’;

If year=1996 then pridrill=’HREF=”sales96.html”’;

If year=1997 then pridrill=’HREF=”sales97.html”’;

If year=1998 then pridrill=’HREF=”sales98.html”’;

/* define the HTML links for the secondary charts*/

If quarter=1 then secdrill=’HREF=”#salesq1”’;

If quarter=2 then secdrill=’HREF=”#salesq2”’;

If quarter=3 then secdrill=’HREF=”#salesq3”’;

If quarter=4 then secdrill=’HREF=”#salesq4”’;

/* fix the format of the chart variable ACTUAL */

Format actual dollar10.0;

/*switch to HTML output…set the directory for GIF & */

/* HTML output and specify the name of the HTML file */

Run;

Ods listing close;

Ods html path=’c:\data\sales\reports’

Body=’totsales.html’;

/* switch to GIF output for images…set image size */

Goptions device=gif xpixels=480 ypixels=360;

/* set title of primary chart */

Title ‘Total Sales by Year’;

/* create a bar chart of sales by year… */

/* assign the HTML links held by PriDrill */

Proc gchart data=work.saletrnd;

Vbar year / subgroup=prodtype discrete

Sumvar=actual html=pridrill;

Run;

/* specify the HTML file name of the secondary chart… */

/* set the title and create a bar chart where YEAR=1995 */

Ods html body=’sales95.html’;

Title ‘Sales for 1995 by Quarter’

Proc gchart data=work.saletrnd;

Vbar quarter / sumvar=actual

Subgroup=product discrete;

Html=secdrill;

Where year=1995;

Run;

/* create a named anchor in the current HTML page…set a */

/* title and create a table where YEAR=1995 & Quarter=1=4 */

Ods html anchor=’salesq1’;

Title ‘1st Quarter 1995 Sales by Country’;

Proc tabulate data=work.saletrnd format=dollar10.0;

Class country product;

Var actual;

Table (country all)*(product all), sum*actual;

Where year=1995 and quarter=1;

Run;

Ods html anchor=’salesq2’;

Title ‘2nd Quarter 1995 Sales by Country’;

Proc tabulate data=work.saletrnd format=dollar10.0;

Class country product;

Var actual;

Table (country all)*(product all), sum*actual;

Where year=1995 and quarter=2;

Run;

Ods html anchor=’salesq3’;

Title ‘3rd Quarter 1995 Sales by Country’;

Proc tabulate data=work.saletrnd format=dollar10.0;

Class country product;

Var actual;

Table (country all)*(product all), sum*actual;

Where year=1995 and quarter=3;

Run;

Ods html anchor=’salesq4’;

Title ‘4th Quarter 1995 Sales by Country’;

Proc tabulate data=work.saletrnd format=dollar10.0;

Class country product;

Var actual;

Table (country all)*(product all), sum*actual;

Where year=1995 and quarter=4;

Run;

* specify the HTML file name of the secondary chart… */

/* set the title and create a bar chart where YEAR=1995 */

<<<<<Repeat these steps for Year 1996, 1997 and 1998>>>>>

/* create a named anchor in the current HTML page…set a */

/* title and create a table where YEAR=1995 & Quarter=1=4 */

<<<<<Repeat these steps for Year 1996, 1997 and 1998>>>>>

/* switch back to listing output */

Quit;

Ods html close;

Ods listing;

Lesson Eighteen – Creating and Managing Variables

  1. Sum Statement (creates running total column, in example named “sumsec”)

    1. variable+expression
      1. variable is the accumulator variable
      2. expression is any valid SAS expression
    2. data clinic.stress

infile tests;

input ID $1-4 Name $6-25 timemin 27-29 timesec 31-33

TotalTime=(timemin*60)+timesec;

Sumsec+totaltime;

Run;

  1. Retain Statement (can be used to initialize Sum Statement with value other than default 0)
    1. Retain variable <initial-value>
      1. Has no effect on variables that are read with set, merge or updated statements
      2. Can only be used as a new variable (not one that already exists)
    2. data clinic.stress

infile tests;

input ID $1-4 Name $6-25 timemin 27-29 timesec 31-33

TotalTime=(timemin*60)+timesec;

Retain sumsec 100;

Sumsec+totaltime;

Run;

  1. If-Then Statements
    1. Example:
      1. if var1>800 then var2=’Long’;
      2. No ‘’ necessary for numeric operators (only for characters)

Operator

Comparison Operation

= or eq

Equal to

^= or ne

Not equal to

> or gt

Greater than

< or lt

Less than

>= or ge

Greater than or equal to

<= or le

Less than or equal to

in

Equal to one of a list

Operator

Logical Operation

&

And

|

Or

^ or ~

Not

    1. if (age^=agecheck | time^=3)

& error=1 then Test=1;

    1. if not in(‘SE’,’NE’)

then direction=SW;

    1. Write “or” statements as:

      1. if x=1 or 2;
      2. NOT if x=1 or x=2;
        1. (this asks to look for any 0 or nonmissing numbers)
    2. ELSE Statement (And “put” statement)
      1. Has to go with an “if” statement:
      2. if var1=2 then var2=4;

else if var1=3 then var2=5;

else put ‘Note: Check this variable’ var1=’

run;

  1. LENGTH Statements
    1. Length variable <$> length;
    2. data clinic.stress

infile tests;

input ID $1-4 Name $6-25 timemin 27-29 timesec 31-33;

length Name $ 25;

TotalTime=(timemin*60)+timesec;

Retain sumsec 100;

Sumsec+totaltime;

Run;

    1. Length statements on the same line examples:

      1. length var1 var2 var3 $ 200;
      2. length var1 $ 12 var2 bestfit12.;
        1. bestfit12. = fits variable value to 12 spaces (including decimals)
  1. Delete Statement
      1. If expression then delete;
  2. Drop and Keep Statements
    1. (Drop=variable(s))
    2. (Keep=variable(s))
    3. Examples:

      1. data clinic.stress (drop=var1 var2); “drop” data set option
      2. data clinic.stress;

drop var1 var2; “drop” statement

      1. DROP and KEEP are interchangeable in examples
    1. You cannot use the DROP/KEEP statements in SAS proc steps

      1. only as a proc set option

To do this…

Use this type of statement…

Reference a SAS data library

Reference an external file

libname clinic ‘c:\users\my\data’;

filename tests ‘c:\users\my.dat’;

Name a SAS data set

Identify an external file

Describe raw data

data clinic.stress;

infile tests obs=10;

input ID $1-4 Name $6-25;

Subset data

if resthr<70 then delete;

if tolerance=’D’;

Drop unwanted variables

drop timemin timesec;

Create or modify a variable

Totaltime=(timemin*60)+timesec

Initialize a sum variable

Sum accumulated values

retain SumSec 5400;

sumsec+totaltime;

Specify a variable’s length

length TestLength $ 6;

Execute statements conditionally

if totaltime>800 then TestLength=’Long’;

else if 750<=totaltime<=800

then TestLength=’Normal’;

else if totaltime<750

then TestLength=’Short’;

Label a variable

Format a variable

LABEL Statement

FORMAT Statement

Execute the DATA step

run;

List the data

Execute the final program step

proc print data=clinic.stress label;

run;

  1. Label and Format Statements
    1. label var1=’Variable 1’;
    2. format var1 comma6.;
    3. Using label and format in the DATA step sets them permanently

      1. When specified in the PROC step it is temporary, but overrides defaults
      2. Important: to use the labels specified in a DATA step, use the label option in the PROC statement
        1. proc print data=clinic.stress label; considered an option

run;

  1. SELECT Group
    1. Select groups are used similarly to the if-then statements
    2. When you have a long series of mutually exclusive conditions and the comparison is numeric, using a SELECT group is slightly more efficient than using a series of IF-THEN or IF-THEN/ELSE statements because CPU time is reduced. SELECT groups also make the program easier to read and debug.
    3. For programs with few conditions, use IF-THEN/ELSE statements
    4. SELECT <(select-expression)>;

WHEN-1 (when-expression-1 <…, when-expression-n>) statement;

WHEN-n (when-expression-1 <…, when-expression-n>) statement;

<OTHERWISE statement;>

End;

This statement…

Performs this action…

SELECT

Begins a SELECT group

WHEN

Identifies SAS statements that are executed when a particular condition is true

OTHERWISE (optional)

Specifies a statement to be executed if no WHEN condition is met

END

Ends a SELECT group

    1. Examples:
      1. In the following example, variable a is specified in the SELECT statement, and various values to compare to a are specified in the WHEN statements.
      2. When the value of a is:

        1. 1 x multiplied by 10
        2. 3, 4, or 5 x multiplied by 100
        3. 2 or any other value, nothing happens

Select (a);

When (1) x=x*10;

When (3,4,5) x=x*100;

Otherwise;

End;

      1. WHEN/OTHERWISE statements are case-sensitive
      2. The following example would not work on values such as “fa1” or “fa3”

Data emps (keep=salary group);

Set sasuser.payrollmaster;

Length Group $ 20;

Select (jobcode);

When (“FA1”) group=”Flight Attendant I”;

When (“FA2”) group=”Flight Attendant II”;

When (“FA3”) group=”Flight Attendant III”;

Otherwise group=”other”;

End;

Run;

    1. Specifying SELECT Statements
      1. If you do specify a select-expression, SAS compares the value of the select-expression with the value of each when-expression (SAS evaluates the select-expression and when-expression, compares the two for equality, and returns a value of true or false)

        1. Select (var1); now all WHEN statements only apply to values in the condition listed (in this case, any “var1” variables)
      2. If you do not specify a selection-expression, SAS evaluates each when-expression to produce a result of true or false

        1. Select; WHEN statements are evaluated on all values
  1. DO Groups
    1. DO;

SAS statements;

END;

    1. if var1=20 then

do;

var2=’Normal’;

message=’Type Message Here!’;

end;

    1. Example of multiple conditions:

data payroll;

set salaries;

select (payclass);

when (‘monthly’) amt=salary;

when (‘hourly’)

do;

amt=hrlywage*min(hrs,40);

if hrs>40 then put ‘CHECK TIMECARD’;

end;

otherwise put ‘PROBLEM OBSERVATION’;

end;

run;

    1. Indenting and Nesting DO Groups

do;

statements;

do;

statements;

do;

statements;

end;

end;

end;

    1. Three other forms of DO Statement

      1. The iterative DO statement executes statements between DO and END statements repetitively based on the value of an index variable. The iterative DO statement can contain a WHILE or UNTIL clause
      2. The DO UNTIL statement executes statements in a DO loop repetitively until a condition is true, checking the condition after each iteration of the DO loop
      3. The DO WHILE statement executes statements in a DO loop repetitively while a condition is true, checking the condition before each iteration of the DO loop
    2. QUIZ Reminders:
      1. Length Statements MUST go before any reference to the specified variable in the DATA step
      2. You cannot use the DROP/KEEP statements in SAS proc steps

        1. only as a proc set option

Lesson Nineteen – Reading SAS Data Sets

  1. DROP and KEEP Data Set Options (when reading in a SAS data set)
    1. Use drop as an option in the DATA statement if you want to be able to process the variable in the DATA statement

      1. data lab23.drug1h (drop=placebo);
    2. Use drop in the set statement if you do not want to use the variable (and you do not want to read the variable)

      1. data lab23.drug1h;

set research.cltrials (drop=placebo);

  1. BY-Group Processing
    1. Can be used in proc sort to sort observations, in proc print to group observations for subtotals
    2. When you use the BY statement with the SET statement

      1. the data sets that are listed in the set statement must be sorted by the vaues of the BY variable(s), or they must have an appropriate index
      2. the DATA step creates two temporary variables for each BY variable. One is name first.variable, where variable is the name of the BY variable, and the other is name last.variable.Their values are either 0 or 1. They identify the first and last observation in each BY group (1=first/last observation, 0=any other observation)
    3. proc sort data=company.usa out=work.temp;

by dept;

run;

data company.budget (keep=dept payroll);

set work.temp;

by dept;

if wagecat=’S’ then Yearly=wagerate*12;

else if wagecat=’H’ then Yearly=wagerate*2000;

if first.dept then Payroll=0;

payroll+yearly;

if last.dept;

run;

    1. Using Multiple BY variables

      1. by var1 var2;
      2. first.variable for each variable is set to 1 at the first occurrence of a new value for the variable
      3. a change in the value of a primary BY variable forces last.variable to equal 1 for the secondary BY variables
  1. Reading Observations Using Direct Access
    1. To access observations directly by their observation number, you use the POINT=option in the SET statement
    2. Point=variable
      1. Variable specifies a temporary numeric variable that contains the observation number of the observation to read
      2. Must be given a value before the SET statement is executed
    3. Need to use a Stop; statement to prevent continuous looping

      1. Must put in an “output” statement before “stop” statement or DATA step will not process any output

data work.getobs5 (drop=obsnum);

obsnum=5;

set company.usa (keep=manager payroll) point=obsnum;

output;

stop;

run;

    1. Detecting the End of a SAS Data Set

      1. End= option is used to create a temporary numeric variable whose value is used to detect the last observation
        1. Used in the SET Statement
        2. End=variable
          1. variable creates and names a temporary variable that contains an end-of-file marker. The variable, which is initialized to 0, is set to 1 when the SET statement reads the last observation of the data set

This variable is not added to the data set

        1. Do not specific END= with POINT=. POINT= reads only a specific observation, so the last observation in the data set is not encountered
        2. Use END=last to specify the end point of the last observation

          1. Use an IF statement to select only the last observation:

data work.addtoend (drop=timemin timesec);

set clinic.stress2 (keep=timemin timesec) end=last;

TotalMin+timemin;

TotalSec+timesec;

TotalTime=totalmin*60+timesec;

if last;

run;

proc print data=work.addtoend noobs;

run;

  1. QUIZ Reminders:
    1. If a variable needs to be processed to create a new variable and then drop’d in the data set option, it needs to be keep’d in the proc set option

      1. data new.data (drop=var1);

set old.data (keep=var1);

var2=var1+1;

run;

Lesson Twenty – Combining SAS Data Sets

  1. Different Ways to Combine SAS Data Sets

***These are the two original SAS Data Sets for the following examples***

Num

VarA

1

A1

2

A2

3

A3

4

A4

Num

VarB

2

B1

3

B2

4

B3

    1. One-to-one reading
      1. Creates observations that contain all of the variables from each contributing data set
      2. Combines observations based on their relative position in each data set
      3. Statement: SET

Num

VarA

VarB

2

A1

B1

3

A2

B2

4

A3

B3

    1. Concatenating
      1. Appends the observations from one data set to another
      2. Statement: SET

Num

VarA

VarB

1

A1

2

A2

3

A3

4

A4

2

B1

3

B2

4

B3

    1. Interleaving
      1. Intersperses observations from two or more data sets, based on one or more common variables
      2. Statements: SET, BY

Num

Var

1

A1

2

A2

2

B2

3

A3

3

B3

4

A4

4

B4

    1. Match-merging
      1. Matches observations from two or more data sets into a single observation in a new data set according to the values of a common variable
      2. Statements: MERGE, BY

Num

VarA

VarB

1

A1

2

A2

B2

3

A3

B3

4

A4

B4

  1. One-to-one Reading
    1. General form:

DATA output-SAS-data-set;

SET SAS-data-set1;

SET SAS-data-set2;

RUN;

    1. The new data set contains all the variables from all the input data sets. If the data sets contain variables that have the same names, the values that are read in from the last data set overwrite the values that were read in from earlier data sets
    2. The number of observations in the new data set is the number of observations in the smallest original data set. Observations are combined based on their relative position in each data set; that is, the first observation in one data set is joined with the first observation in the other, and so on. The DATA step stops after it has read the last observation from the smallest data set.

Data one2one;

Set c;

Set d;

Run;

    1. Subsetting observations from data sets:

      1. Put subset options under data sets SET statement

data one2one;

SET data1;

where var=0; applies to data set data1

SET data2;

where var=1; applies to data set data2

run;

  1. Concatenating
    1. General form:

DATA output-sas-data-set;

SET SAS-data-set1 SAS-data-set2 <etc>;

RUN;

    1. If the data sets share the same variable:

      1. they must have the same type attribute, or SAS stops processing the DATA step and issues an error message stating that the variables are incompatible
      2. they will have the length attribute from the first data set that contains the variable

        1. the same is true for label, format and informat attributes

Data one2one;

Set c d;

Run;

  1. Interleaving
    1. General form:

DATA output-SAS-data-set;

SET SAS-data-set1 SAS-data-set2 <etc>;***

BY variable;

RUN;

***You may specify any # of data sets to include; however, they must be sorted or indexed in ascending order based on the BY variable!

Data one2one;

Set c d;

BY num;

Run;

  1. Match-Merging
    1. General form:

DATA output-SAS-data-set;

MERGE SAS-data-set1 SAS-data-set2 <etc>;***

BY <DESCENDING> variable(s);

RUN;

***Each input data set in the MERGE statement must be sorted or indexed in ascending order based on the BY variable! Each BY variable must have the same type in all data sets to be merged

***You cannot use DESCENDING option with indexed data sets because indexes are always stored in ascending order

Data one2one;

MERGE c d;

BY num;

Run;

    1. To sort the data sets in descending order and then merge them, submit the following:

Data one2one;

merge c d;

by DESCENDING num;

Run;

    1. To prevent overwriting variables use RENAME= data set option

      1. merge clinic.demog (rename=(date=Birthdate))

clinic.visit (rename=(date=VisitDate));

    1. Excluding Unmatched Observations
      1. To exclude unmatched observations from your output data set, you can use the IN= data set option and the subsetting IF statement in your DATA step

        1. use the IN= data set option to create and name a variable that indicates whether the data set contributed data to the current observation
        2. use the subsetting IF statement to check the IN= values and write to the merged data set only those observations that appear in the data sets for which N= is specified
      2. IN= data set option
        1. (IN=variable)
        2. This creates a temporary variable that can be used in the DATA step, but is not in the SAS data set
        3. Within the DATA step, the value of the variable is 1 if the data set contributed to the current observation. Otherwise, its value is 0.

data clinic.merged;

merge clinic.demog (in=indemog

rename=(date=Birthdate))

clinic.visit (in=invisit

rename=(date=(VisitDate));

by id;

if indemog=1 and invisit=1;

run;

        1. The program above creates a temporary variable “indemog” in the demog data set and “invisit” in the visit dataset. This variable is a “1” if the observation (id) contributed data when merging or a “0” if it did not
        2. IF statements produce a result that is either nonzero, zero, or missing. A nonzero or nonmissing results in true; a zero or missing results in a false.

          1. Thus the IF statement used above can be written two ways:

if indemog=1 and invisit=1; IF statement checks specifically for a value of “1”

OR

if indemog and invisit; IF statement checks for a value that is neither missing nor 0 (which for IN= variables is always 1)

        1. Drop/Keep statements example

merge clinic.demog (drop=weight in=indemog

rename=(date=BirthDate))

clinic.visit (keep=Date in=invist);

    1. Additional Features
      1. Other Options to Use
        • Use IF/THEN/ELSE logic to control processing based on one or more conditions
        • Specify additional data set options
        • Perform calculations
        • Create new variables
        • Process variables in arrays
        • Use SAS functions
        • Use special variables such as first. and last. to control processing
      2. Combining SAS data sets in other ways:
        1. One-to-one merging is the same as one-to-one reading with two exceptions:
          1. You can use the MERGE statement instead of multiple SET statements
          2. The DATA step reads all observations from all data sets

data work.onemerge;

merge clinic.demog clinic.visit;

run;

        1. Conditional merge using DO loops or other conditional statements

data work.combine;

set sales.pounds;

do while (not(begin le date le last));

set sales.rate;

end;

Dollars=(sales*1000)*rate;

run;

        1. You can read the same data set in more than one SET statement

data work.combine (drop=totpay);

if _n_=1 then do until(last);

set sales.budget (keep=payroll) end=last;

end;

set sales.budget;

Percent=payroll/totpay;

run;

Lesson Twenty-One – Performing Queries Using PROC SQL

  1. PROC SQL is SAS’ implementation Structured Query Language (SQL), which is a standard language that is widely used to retrieve and update data in tables and in views that are based on those tables

  2. Terms used in data processing, SAS and SQL that are synonymous:

Data Processing

SAS

SQL

File

SAS data set

Table

Record

Observation

Row

Field

Variable

Column

  1. PROC SQL can often be used as an alternative to other SAS procedures or the DATA step. You can use PROC SQL to

    1. retrieve data from and manipulate SAS tables
    2. add or modify data values in a table
    3. add, modify, or drop columns in a table
    4. create tables and views
    5. join multiple tables (whether or not they contain columns with the same name)
    6. generate reports
  2. Differences
    1. Unlike other PROC statements, many statements in PROC SQL are composed of clauses. The SELECT statement contains several clauses: SELECT, FROM, WHERE and ORDER

proc sql;

select empid,jobcode,salary,salary*.06 as bonus

from sasuser.payrollmaster

where salary<32000

order by jobcode;

quit; does not use a “run” statement, but needs a “quit” statement or it will run continuously!

    1. PROC SQL does not require a run statement, it executes each query automatically and continues to run until another PROC step, DATA step or QUIT statement is submitted
  1. General form of PROG SQL with SELECT statement:

PROC SQL;

SELECT column-1<,…column-n>

FROM table-1 | view-1<,…table-n | view-n>

<WHERE expression>

<GROUP BY column-1<,…column-n>>

<ORDER BY column-1<,…column-n>>;

    1. Where:

      1. PROC SQL invokes the SQL procedure
      2. SELECT specifies the column(s) to be selected
      3. WHERE subsets the data based on a condition
      4. GROUP BY classifies the data into groups based on the specified column(s)
      5. ORDER BY sorts the rows that the query returns by the value(s) of the specified column(s)
      6. Note: Clauses MUST be in the order shown above!
        1. Select and From are required; Where, Group By and Order By are optional
        2. The semicolon (;) goes at the end of the entire statement

Type of Output

PROC SQL Statements

Report

SELECT

Table

CREATE TABLE and SELECT

PROC SQL view

CREATE VIEW and SELECT

    1. Selecting and Creating Columns

      1. Write a SELECT clause

        1. After the keyword SELECT, list one or more column names, separated by commas
        2. Use the word “as” to create a new column, or assign an existing column an alias

proc sql;

select emid,jobcode,salary, Selecting existing columns

salary*.06 as bonus, creating a new column

firstname as Name creates an alias shows up as “Name”

    1. Specifying the Table

      1. After writing the SELECT clause, you specify the table to queried in the FROM clause (type in the keyword FROM followed by the name of the table)

proc sql;

select emid,jobcode,salary,

salary*.06 as bonus

from sasuser.payrollmaster

    1. Subsetting and Ordering

      1. Use a WHERE clause to subset information

        1. The columns specified in the WHERE clause do not need to be specified in the SELECT clause
      2. Use ORDER BY to sort rows by values in the column(s) specified

        1. ORDER BY defaults to sort acending, to sort in descending order put DESC after the column name

          1. order by jobcode desc;
        2. ORDER BY can be specified also by the column’s position (1=first column, 2=second column,etc)

          1. order by 2;
        3. Separate multiple variables with a comma (,), and it is possible to mix and match column names with column positions

          1. order by jobcode,salary desc,2;

proc sql;

select emid,jobcode,salary,

salary*.06 as bonus

from sasuser.payrollmaster

where salary<32000

order by jobcode;

Practice Example:

Using PROC SQL, select the columns ActLevel, Age, KgWgt, and MeterHgt from sasuser.newadmit. Create a new column named BodyMass as kgwgt/meterhgt**2. (The symbol ** indicates exponentation.) Select only females (F), and order rows by values of ActLevel. Add a QUIT statement to the query to end the procedure.

proc sql;

select actlevel,age,kgwgt,meterhgt

kgwgt/meterhgt**2 as BodyMass

from sasusers.newadmit

where sex=F this column was not specified in the example, used arbitrary name for column (sex)

order by actlevel;

quit;

  1. Querying Multiple Tables
    1. In SQL terminology, combining tables horizontally is called joining tables. Joins do not alter the original tables.

Example Problem:

Suppose you want to create a report that displays the following information for employees of a company: employ identification number, last name, original salary, and new salary. There is no single table that contains all of these columns, so you will have to join the two tables sasuser.Salcomps and sasuser.Newsals. In your query, you want to select four columns, two from the first table and two from the second table. You also need to be sure that the rows you join belong to the same employee. To check this, you want to match employee identification numbers for rows that you merge and to select only the rows that match.

Note: This type of join is known as an inner join. An inner join returns a result set for all of the rows in a table that have one or more matching rows in another table.

proc sql;

select salcomps.empid,lastname,

newsals.salary,newsalary

from sasuser.salcomps,sasuser.newsals

where salcomps.empid=newsals.empid

order by lastname;

    1. Specifying Columns That Appear in Multiple Tables
      1. If the tables included in the query contain same-named columns, then list one (or both—one with an alias) of these columns in the SELECT clause and include the specified-table’s name as the columns prefix

        1. Example: tablename.columnname
        2. Prefixing a table name to a column name is called qualifying the column name

In the following example, the tables Salcomps and Newsals both contain the columns “empid” and “salary”. The column “empid” is being read from the table Salcomps, and the column “salary” is being read from the Newsals table.

proc sql;

select salcomps.empid,lastname,

newsals.salary,newsalary

from sasuser.salcomps,sasuser.newsals

where salcomps.empid=newsals.empid

order by lastname;

    1. Specifying Multiple Table Names
      1. Specify each table name in the FROM clause, separated by commas

proc sql;

select salcomps.empid,lastname,

newsals.salary,newsalary

from sasuser.salcomps,sasuser.newsals

where salcomps.empid=newsals.empid

order by lastname;

    1. Subsetting Rows
      1. When you join multiple tables, be sure that the WHERE clause specifies columns with data whose values match, to avoid unwanted combinations

In the following example, the WHERE clause selects only rows in which the value for empid in the table Salcomps matches the value for empid in the table Newsals. Qualified column names must be used in the WHERE clause to specify each of the two empid columns. (Remember, “qualified” is having the specified-column’s table name as a prefix)

proc sql;

select salcomps.empid,lastname,

newsals.salary,newsalary

from sasuser.salcomps,sasuser.newsals

where salcomps.empid=newsals.empid

order by lastname;

    1. Ordering Rows
      1. Specified in the ORDER BY clause

        1. Sorts table by column(s) in ascending order

          1. add desc after column for descending order

proc sql;

select salcomps.empid,lastname,

newsals.salary,newsalary

from sasuser.salcomps,sasuser.newsals

where salcomps.empid=newsals.empid

order by lastname;

Practice Example:

Write a PROC SQL step to select month, WalkJogRun, and Swim from sasuser.therapy1999 and to select treadmill and newadmit from sasuser.totals2000 (Month is in both tables). Create the new column Exercise by adding the values of columns WalkJogRun and Swim. Select rows for which the values of Month match. End the step with a quit statement.

proc sql;

select therapy1999.Month,WalkJogRun,Swim

treadmill,newadmit

from sasuser.therapy1999,sasuser.totals2000

where therapy1999.month=totals2000.month;

quit;

  1. Summaring Groups of Data
    1. Use the GROUP BY clause

      1. The GROUP BY clause is used in queries that include one or more summary functions. Summary functions produce statistical summary for each group that is defined in the GROUP BY clause.

Example Problem:

Suppose you want to determine the total number of miles traveled by frequent-flyer program members in ceach of three membership classes (Gold, Silver, and Bronze). Frequent-flyer program information is stored in the table sasuser.frequentflyers. To summarize your data, you can submit the following PROC SQL step:

proc sql;

select membertype,

sum(milestraveled) as TotalMiles

from sasuser.frequentflyers

group by membertype;

quit;

OUTPUT:

MemberType

TotalMiles

BRONZE

3229225

GOLD

2903569

SILVER

4345169

Note: If you specify a GROUP BY clause in a query that does not contain a summary function, your clause is changed to an ORDER BY clause, and a message to that effect is written to the SAS log

    1. Summary Functions
      1. To summarize data, you can use the following summary functions with PROC SQL. Notice that some functions have more than one name to accomdoate both SAS and SQL conventions. Where multiple names are listed, the first name is the SQL name.

SQL/SAS Summary Function

Description of Function

AVG, MEAN

Mean or average of values

COUNT, FREQ, N

Number of nonmissing values

CSS

Corrected sum of squares

CV

Coefficient of variation (percent)

MAX

Largest value

MIN

Small value

NMISS

Number of missing values

PRT

Probability of a greater absolute value of Student’s t

RANGE

Range of values

STD

Standard deviation

STDERR

Standard error of the mean

SUM

Sum of values

T

Student’s t value for testing hypothesis that the population mean is zero

USS

Uncorrected sum of squares

VAR

Variance

Practice Example:

Write a PROC SQL step to select sex from sasuser.diabetes. Create averageage by calculating the average value of age. Create averageweight by calculating the average value of weight. Group the results by the values of sex. End the step with a quit statement.

proc sql;

select sex,

avg(age) as averageage,

avg(weight) as averageweight

from sasuser.diabetes

group by sex;

quit;

  1. Creating Output Tables
    1. To create a new table from the results of a query, use a CREATE TABLE statement that includes the keyword AS and the caluses that are used in a PROC SQL query: SELECT, FROM, and any optional clauses, such as ORDER BY. The CREATE TABLE statement stores your query results in a table instead of displaying the results as a report.
    2. General Form of CREATE TABLE statement:

PROC SQL;

CREATE TABLE table-name AS

SELECT column-1<,…column-n>

FROM table-1 | view-1<,…table-n | view-n>

<WHERE expression>

<GROUP BY column-1<,…column-n>>

<ORDER BY column-1<,…column-n>>;

    1. Where:

      1. Table-name specifies the name of the table to be created
      2. The rest are the same as the SELECT statement (discussed above)
      3. A query can also include a HAVING clause that works with the GROUP BY clause to further refine a proc sql query

Example Problem:

Suppose that after determining the total miles traveled for each frequent-flyer membership class is in the sasuser.frequentflyers table, you want to store this information in a temporary table work.miles. To do so, you can submit the following PROC SQL step:

proc sql;

create table work.miles as “work” libref creates a temp SAS file

select membertype,

sum(milestraveled) as TotalMiles

from sasuser.frequentflyers

group by membertype;

      1. Because the CREATE TABLE statement is used, this query does not create a report. The SAS log verifies that the table was created and indicated how many rows and columns the table contains.
      2. In this example, the data is saved as a temporary table using the work. libref.

        1. If this program is ran in the SAS viewer, this table will be displayed, but will be deleted at the end of the SAS session.
        2. If this was used in Batch Submit, the table will not be displayed and will instead have been deleted (the same as if a libref was never specified).
  1. Additional Features
    1. To further refine PROC SQL query that contains a GROUP BY clause, you can use a HAVING clause. A HAVING clause works with the GROUP BY clause to restrict groups that are displayed in the output, based on one or more specified conditions.

Example Problem:

The following PROC SQL query grups the output rows by jobcode. The HAVING clause uses the summary function AVG to specify that only the groups that have an average salary that is greater than 40,000 will be displayed in the output.

proc sql;

select jobcode,avg(salary) as Avg

from sasuser.payrollmaster

group by jobcode

having avg(salary)>40000

order by jobcode;

Lesson Twenty-Two – Transforming Data with SAS Functions

  1. Examples of Date Functions

Function

Description

Form

Sample Value

YEAR

Extracts the year value from a SAS date value

YEAR(date)

2002

QTR

Extracts the quarter value from a SAS date value

QTR(date)

1

MONTH

Extracts the month value from a SAS date value

MONTH(date)

12

DAY

Extracts the day value from a SAS date value

DAY(date)

5

  1. Categories of SAS Functions

Array

Probability

Bitwise Logical Operations

Quantile

Character

Random Number

Character String Matching

Sample Statistics

Date and Time

SAS File I/O

Dynamic Link Library

Special

External Files

State and ZIP Code

Financial

Trigonometric

Hyperbolic

Truncation

Macro

Variable Control

Mathematical

Variable Information

MultiByte Character Set

  1. SAS Functions that Computer Sample Statistics
    1. Note: Despite the similarity of certain SAS functions, don’t assume they can be used interchangeably. For example, missing values might be handled differently for a similar function and procedure.

Function

Syntax

Calculates…

SUM

sum(argument, argument,…)

sum of values

MEAN

mean(argument, argument,…)

average of nonmissing values

MIN

min(argument, argument,…)

minimum value

MAX

max(argument, argument,…)

maximum value

VAR

var(argument, argument,…)

variance of the values

STD

std(argument, argument,…)

standard deviation of the values

  1. SAS Functions
    1. Can be used in DATA step programming and in some statistical procedures

      1. Can be specified anywhere that you would use a SAS expression, as long as the function is part of a SAS statement
    2. Example:
      1. AvgScore=mean(exam1,exam2,exam3);
    3. General Form:
      1. function-name(argument-1<,argument-n>);
      2. Where:

        1. variables mean(x,y,z)
        2. constants mean(456,502,612,498)
        3. expressions mean(37*2,192/5,mean(22,34,56))
      3. Note: even if the function does not require arguments, the function name must still be followed by a parenthesis—for example, function-name()
      4. Other Examples for Multiple Arguments:
        1. Normal

mean(x1,x2,x3)

        1. Variable List

mean(of x1-x3)

        1. Array

mean(of newarray {*})

        1. Note: If you do not list “of” with a variable list or array, then it may not calculate the way you expect. For example, mean(x1-x3) calculates the average of x1 minus x3; whereas, mean(of x1-x3) calculates the average of the sum of x1, x2 and x3.
  1. Target Variables
    1. A target variable is the variable to which the result of a function is assigned. In the example below, AvgScore is the target variable.

      1. Example: AvgScore=mean(exam1,exam2,exam3);
      2. Unless the length of a target variable is assigned, it is assigned a default length (which could be as long as 200).

        1. Add a length statement to prevent using more (or less) space than required
  2. Converting Data with Functions
    1. Converting Character to Numeric and Numeric to Character
      1. Use the INPUT function to convert character data values to numeric values
      2. Use the PUT function to convert numeric data values to character values
    2. Automatic Character-to-Numeric Conversion
      1. Occurs when a character value is

        1. assigned to a previously defined numeric value, such as the numeric value Rate

Rate=payrate;

        1. used in an arithmetic operation

Salary=payrate*hours;

        1. compared to a numeric value, using a comparison operator

If payrate>=rate;

        1. specified in a function that requires numeric arguments

NewRate=sum(payrate,raise);

      1. The automatic conversion

        1. uses the w.d informat, where w is the width of the character value that is being converted
        2. produces a numeric missing value from any character value that does not conform to standard numeric notation (digits with an optional decimal point or leading sign)

Character Value

Automatic Conversion

Numeric Value

12.47

12.47

-8.96

-8.96

1.243E1

12.43

1,742.64

.

      1. Restriction for WHERE Expressions
        1. The WHERE statement does not perform automatic conversions in comparisons
    1. Explicit Character-to-Numeric Conversion
      1. Use the INPUT function to convert character data values to numeric values
      2. General Form:
      3. INPUT(source,informat)
      4. Where

        1. Source indicates the character variable, constant or expression to be converted to a numeric value
        2. a numeric informat must also be specified, as in this example:

          1. input(payrate,2.)
          2. When choosing an informat, be sure to select a numeric informat that can read the form of the values

Character Value

Informat

2115233

7.

2,115,233

COMMA9.

      1. Example:
        1. Test=input(saletest,comma9.);
        2. The function uses the numeric format COMMA9. to read the values of character variable SaleTest. Then the resulting numeric values are stored in variable Test.
      1. Example Without INPUT (character value=payrate)

data hrd.newtemp;

set hrd.temp;

Salary=payrate*hours;

run;

      1. Example With INPUT (character value=payrate, informat=2.)

data hrd.newtemp;

set hrd.temp;

Salary=input(payrate,2.)*hours;

run;

      1. The form of the INPUT function is very similar to the form of the PUT function (which performs numeric-to-character conversions)

        1. PUT(source,format)
        2. Note: the PUT function requires a format, whereas the INPUT function requires an informat
    1. Automatic Numeric-to-Character Conversion
      1. Numeric values of the variable Site are converted to character values if you

        1. assign the numeric value to a previously defined character value, such as the character variable Sitecode

SiteCode=site;

        1. use the numeric value with an operator that requires a character value, such as the concatenation operator:

SiteCode=site||dept;

        1. specify the numeric value in a function that requires character arguments, such as the SUBSTR function

Region=substr(site,1,4);

        1. SAS writes the numeric value with the BEST12. format, and the resulting character value is right-aligned. This conversion occurs before the value is assigned or used with any operator or function.

          1. Note: It is best to use a PUT function instead of relying on automatic conversions (just like the input function)
    1. Explicit Numeric-to-Character Conversion
      1. Use the PUT function to explicitly convert numeric data values to character data values

        1. You can write an “assignment” statement (which specifies the assignment-variable) that contains the concatenation operator (||) to combine strings of information and variables

          1. Assignment=site||’/’||dept;
          2. The example above will work as intended if both site and dept are characters
      2. General Form
        1. PUT(source,format)
        2. Where

          1. Source indicates the character variable, constant or expression to be converted to a numeric value
          2. a numeric format must also be specified, as in this example:

put(site,2.)

          1. The PUT function always returns a character string
          2. The PUT function returns the source written with a format
          3. The format must agree with the source in type
          4. Numeric formats right-align the result; character formats left-align the result
          5. If you use the PUT function to create a variable that has not been previously identified, it creates a character variable whose length is equal to the format width
      1. Example Without PUT (numeric value=site)

data hrd.newtemp;

set hrd.temp;

Assignment=site||’/’||dept;

run;

      1. Example With PUT (numeric value=site, format=2.)

data hrd.newtemp;

set hrd.temp;

Assignment=put(site,2.)||’/’||dept;

run;

  • Side Note:
    • Concatenate General Form:
      • argument1||argument-2<||argument-3||etc,…>
      • Arguments can be variables, constants or expressions
  1. Manipulating SAS Date Values with Functions
    1. SAS Stores Date/Time Values as Numbers
      1. SAS stores a date value as the number of days from January 1, 1960, to a given date

Jan 1, 1959

Jan 1, 1960

Jan 1, 1961

-365

0

366

      1. SAS stores a time value as the number of seconds since midnight

(12:00 am)

Midnight

12:15 pm

17:00

(or 5:00 pm)

0

44100

61200

      1. SAS stores a datetime value as the number of seconds between midnight on January 1, 1960, and a given date and time

July 4, 1776

11:30:23

Jan 1, 1960

Midnight

July 4, 1994

16:10:45

-5790400177

0

1088957445

      1. SAS stores date values as numbers so that you can easily sort the values or perform arithmetic computations

data test (keep=name totday);

set hrd.temp;

Totday=enddate-startdate;

run;

      1. Formats only affect the display of dates, not the values
      2. SAS date values are valid for dates that are based on the Gregorian calendar from A.D. 1582 through A.D. 20,000
    1. SAS Date Functions
      1. Functions that create SAS date values

Function

Typical Use

Result

MDY

date=mdy(mon,day,yr);

SAS date

TODAY DATE

now=today();

now=date();

Today’s date as a SAS date

TIME

curtime=time();

Current time as a SAS time

      1. Functions that extract values from SAS date values

Function

Typical Use

Result

DAY

day=day(date);

Day of month (1-31)

QTR

quarter=qtr(date);

Quarter (1-4)

WEEKDAY

wkday=weekday(date);

Day of week (1-7)

MONTH

month=month(date);

Month (1-12)

YEAR

yr=year(date);

Year (4 digits)

INTCK

x=intck(‘day’,d1,d2);

x=intck(‘week’,d1,d2);

x=intck(‘month,d1,d2);

x=intck(‘quarter’,d1,d2);

x=intck(‘year’,d1,d2);

Days from d1 to d2

Weeks from d1 to d2

Months from d1 to d2

Quarters from d1 to d2

Years from d1 to d2

INTNX

x=intnx(‘interval’,start-from,increment);

Date, time, or datetime value

      1. To use DAY, QTR, YEAR, MONTH, WKDAY, etc, put the variable name in the date spot (examples below use the variable “StartDate”)

        1. if year(startdate)=1999 then delete;
        2. where month(startdate)=11;
    1. Year and Month Functions

Example Problem #1

Create a data set named sasuser.talent10 from sasuser.talent. Include in the data set only those actors who were hired in October by extracting the month value from the variable LastHired. Format the SAS date value for LastHired. Print the dataset.

data sasuser.talent10;

set sasuser.talent;

if month(LastHired)=10;

format LastHire date9.;

run;

proc print data=sasuser.talent10;

run;

Example Problem #2

Create a data set named sasuser.taloct99 from sasuser.talent. Include in the data set only those actors who were hired in October 1999 by extracting the month and year values from the variable LastHired. Format the SAS date value for LastHired. Print the dataset.

data sasuser. taloct99;

set sasuser.talent;

if year(LastHired)=1999 and month(LastHired)=10;

format LastHired date9.;

run;

proc print data=sasuser. taloct99;

run;

Example Problem #3

Create a data set named sasuser.talent99 from sasuser.talent. Include in the data set only those actors who were hired in 1999 by extracting the year value from the variable LastHired. Format the SAS date value for LastHired. Print the dataset.

data sasuser. talent99;

set sasuser.talent;

if year(LastHired)=1999;

format LastHire date9.;

run;

proc print data=sasuser. talent99;

run;

    1. Weekday Function
      1. SAS Value Equivalents for Days of the Week

Value

Equals

Day of the Week

1

=

Sunday

2

=

Monday

3

=

Tuesday

4

=

Wednesday

5

=

Thursday

6

=

Friday

7

=

Saturday

    1. MDY Function
      1. General Form
        1. MDY(month,day,year)
        2. Month, day, and year can be numeric values or variables

Practice Example

In the data set Hrd.Temp, the values for month, day, and year are stored in the numeric variables month, day, and year.

data hrd.newtemp (drop=month day year);

set hrd.temp;

Date=mdy(month,day,year);

run;

      1. To set the same SAS date to every observation, you may use numbers in the place of variables

        1. Date=mdy(5,10,20);

          1. This would output as: May 10, 1920 (see note below)
        2. Date=mdy(5,10,2020);

          1. This would output as: May 10, 2020

Note: Be careful when only putting in a 2-digit year. SAS interprets two-digit values according to the 100-year span that is set by the YEARCUTOFF=system option. They default value of YEARCUTOFF=1920. Use four-digit year values whenever possible! To format years clearly, format SAS dates with DATE9. format.

      1. It is also possible to mix and match variables and numbers

        1. ReviewDate=mdy(12,date,1998);

          1. Where date is a variable in the existing dataset

Example Problem

Create a data set named sasuser.master from sasuser.talent. Add a variable named RepHired to the data set to contain the date on which actors joined the agency. The month and day values are stored in the month and day variables. The year is 1998. Print the data set.

data sasuser.master;

set sasuser.talent;

RepHired=mdy(month,day,1998);

run;

proc print data=sasuser.master;

format RepHired date9.;

run;

    1. Date and Today Functions
      1. Return the current date from the system clock as a SAS date value
      2. General forms:
        1. DATE()
        2. TODAY()

          1. These functions require no arguments, but they must still be followed by parenthesis

data hrd.newtemp;

set hrd.temp;

EditDate=date();

run;

Applies the date program is ran on: Month Day, Year (January 1, 2000)

    1. INTCK Function
      1. General Form
      2. INTCK(‘interval’,from,to)
      3. Where
        1. ‘interval’ specifies a character constant or variable. The value must be one of the following:


DAY

WEEKDAY

WEEK

TENDAY

SEMIMONTH

MONTH

QTR

SEMIYEAR

YEAR

        1. from specifies a SAS date, time, or datetime value that identifies the beginning of the time span
        2. to specifies a SAS date, time, or datetime value that identifies the end of the time span
        3. Note: the type of interval (date, time, or datetime) must match the type of value in from
      1. The INTCK function counts intervals from fixed interval beginnings, not in multiples of an interval unit from the from value.

        1. Partial intervals are not counted
        2. WEEK intervals are counted by Sundays rather than seven-day multiples from the from argument
        3. MONTH intervals are counted by day 1 of each month
        4. YEAR intervals are counted from 01JAN, not in 365-day multiples

SAS Statement

Value

Weeks=intck(‘week’,31dec2000’d,’01jan2001’d)

0

Months=intck(‘month’,31dec2000’d,’01jan2001’d)

1

Years=intck(‘year’,31dec2000’d,’01jan2001’d)

1

Because December 31, 2000 is a Sunday, no WEEK interval is crossed between that day and January 1, 2001. HOWEVER, both MONTH and YEAR intervals are crossed.

Practice Example

A common use of the INTCK function is to identify periodic events such as due dates and anniversaries.

The following program identifies the mechanics whose 20th year of employment occurs in the current month. It uses the INTCK function to compare the value of the variable Hired to the date on which the program is run.

data work.anniv20;

set flights.mechanics

(keep=id lastname firstname hired);

Years=intck(‘year’,hired,today());

if years=20 and month(hired)=month(today());

proc print data=work.anniv20;

title ’20-Year Anniversaries This Month’;

run;

Example Problem

Create a data set named sasuser.master from sasuser.talent. Add a variable named RepHired to the data set to contain the date on which actors joined the agency. The month and day values are stored in the month and day variables. The year is 1998.

Create the data set work.master, based on the data set sasuser.master (created above). Add the variable QtrsElapsed to work.master to contain the number of quarters that have elapsed since each actor joined the agency. Base the values of QtrsElapsed on the values in RepHired and LastHired. Print the data set.

Note: Not sure on whether this example wants RepHired to be the start and LastHired to be the finish, OR if it wants both of these to be separate starts to be incorporated (if both are to be incorporated, I’m not sure how they want it separated so values are not overwritten, but the finish would be “today()”)

data sasuser.master;

set sasuser.talent;

RepHired=mdy(month,day,1998);

run;

data work.master;

set sasuser.master;

QtrsElapsed=intck(‘qrt’,RepHired,LastHired);

run;

proc print data=work.master;

run;

    1. INTNX Function
      1. General Form
        1. INTX(‘interval’,start-from,increment<,’alignment’>)
          1. Where

‘interval’ specifies a character constant or variable

start-from specifies a starting SAS date, time, or datetime value

increment specifies a negative or positive integer that represents time intervals toward the past or future

‘alignment’ (optional) forces the alignment of the returned date to the beginning, middle, or end of the interval

Note: The type of interval (date, time or datetime) must match the type of value in start-from and increment

      1. When you specify date intervals, the value of the character constant or variable that is used in interval must be one of the following:

DAY

WEEKDAY

WEEK

TENDAY

SEMIMONTH

QTR

SEMIYEAR

YEAR

        1. TargetYear=intnx(‘year’,’05feb94’d,3);

          1. Returns the value of Jan 1, 1997
        2. TargetMonth=intnx(‘semiyear’,’01jan2001’d,1);

          1. Returns the value of July 1, 2001
      1. Beginning, Middle, End
        1. The INTNX function can return the date in the beginning, middle or end of the interval specified
        2. The optional alignment value uses the following values:

BEGINNING B

MIDDLE M

END E

SAS Statement

Date Value

MonthX=intnx(‘month’,’01jan95’d,5,’b’);

12935 (June 1, 1995)

MonthX=intnx(‘month’,’01jan95’d,5,’m’);

12949 (June 15, 1995)

MonthX=intnx(‘month’,’01jan95’d,5,’e’);

12964 (June 30, 1995)

These INTX statements count five months from January, but the returned value depends on whether alignment specifies the beginning, middle or end of the resulting month. If alignment is not specified, the beginning day is returned by default.

  1. Modifying Character Values with Functions
    1. Examples of ways you can manipulate values of character variables

      1. replace the contents of a character value
      2. trim trailing blanks from a character value
      3. search a character value and extract a portion of the value
      4. convert a character value to uppercase or lowercase

Function

Purpose

SCAN

Returns a specified word from a character value

SUBSTR

Extracts a substring or replaces character values

TRIM

Trims trailing blanks from character values

INDEX

Searches a character value for a specific string

UPCASE

Converts all letters in a value to uppercase

LOWCASE

Converts all letters in a value to lowercase

    1. SCAN Function
      1. Enables you to separate a character value into words and to return a specified word
      2. The SCAN Function uses delimiters—which are characters that are specified as word separators, to separate a character string into words

        1. The following examples use comma delimiters
        2. Low,Moderate,High

          1. Low is 1, Moderate is 2, High is 3
        3. 209 Radcliffe Road, Center City, NY, 92716

          1. If you specify the SCAN function to return the 3rd word, you would get “NY”
      3. Specifying Multiple Delimiters
        1. You can specify as many delimiters as needed to correctly separate the character expression

          1. The SCAN Function treats two or more contiguous delimiters as one delimiter, and leading delimiters have no effect
          2. The following examples have a slash, parenthesis and dash as delimiters / ( ) –

(345)/5672-Trailer The first word is “345”, second is “5672” and third is “Trailer”

      1. Default Delimiters
        1. If no delimiters are specified when using the SCAN Function, default delimiters are used
  1. blank
  2. .
  3. <
  4. (
  5. +
  6. |
  7. &
  8. !
  9. $
  10. *
  11. )
  12. ;
  13. ^
  14. /
  15. ,
  16. %
      1. General Form
        1. SCAN(argument,n,<delimiters>)
          1. Where

argument specifies the character variable or expression to scan

n specifies which word to be read

delimiters are special characters that must be enclosed in single quotation marks (‘’). If you do not specify delimiters, default delimiters are used

Practice Example

Use the SCAN function to create first, middle and last name variables for hrd.temp.

Agency

ID

Name

Administrative Support, Inc.

F274

CICHOCK, ELIZABETH MARIE

Administrative Support, Inc.

F101

BENINCASA, HANNAH LEE

OD Consulting, Inc.

F054

SHERE, BRIAN THOMAS

New Time Temps Agency

F077

HODNOFF, RICHARD LEE

data hrd.newtemp (drop=name);

set hrd.temp;

LastName=scan(name,1,’ ,’); Specifies blank and comma as delimiters

MiddleName=scan(name,3,’ ,’);

FirstName=scan(name,2,’ ,’);

run;

      1. Specifying a Length
        1. The SCAN Function assigns a length of 200 to each target variable (a target variable is the variable that receives the result of the function)
        2. To save storage space, add a length statement to your DATA step and specify an appropriate length for all three variables

          1. Be sure to place this before the assignment statements that contain the SCAN function

data hrd.newtemp (drop=name);

set hrd.temp;

length LastName MiddleName FirstName $ 10; specifies character 10

LastName=scan(name,1,’ ,’);

MiddleName=scan(name,3,’ ,’);

FirstName=scan(name,2,’ ,’);

run;

Example Problem

Create a new data set named sasuser.Agency99 from sasuser.talent. Add a new variable named State to the data set by extracting the state abbreviation from the variable Address2. Print the data set.

Here is a sample Address2 value:

Glenview, NJ

data sasuser.agency99;

set sasuser.talent;

State=scan(address2,2,’ ,’);

proc print data=sasuser.agency99;

run;

    1. SCAN versus SUBSTR
      1. SCAN extracts words within a value that is marked by delimiters
      2. SUBSTR extracts a portion of a value by starting at a specified location

        1. The SUBSTR function is best used when you know the exact position of the string that you want to extract from the character value (This does not need to be marked by delimiters)

Name

ID

Trentonson, Matthew Robert

SO45467

Truell, Marcia Elizabeth

SR32881

      1. The SUBSTR function is the best choice to extract class level information from ID. By contrast, the SCAN function is best used when:

        1. you know the order of the words in the character value
        2. the starting position of the words varies
        3. the words are marked by some delimiter
    1. SUBSTR Function
      1. Can be used to:

        1. extract a position of a character value
        2. replace the contents of a character value
      2. General Form
        1. SUBSTR(argument,position,<n>)
          1. Where

argument specifies the character variable or expression to scan

position is the character position to start from

n specifies the number of characters extracted. If n is omitted, all remaining characters are included in the substring

Practice Example

Using the SUBSTR function, you can extract the first letter of the MiddleName value to create the new value MiddleInitial (from the data set hrd.newtemp, created in the previous practice example)

data work.newtemp;

set hrd.newtemp;

MiddleInitial=substr(middlename,1,1); Specifies the 1st letter and only the 1st character to be included in the target variable result

run;

Example Problem

Create a data set named sasuser.newtal from sasuser.talent. Add a new variable to sasuser.newtal named Sex. Create the variable Sex by extracting the abbreviation for sex from the variable ID. Print the data set.

Here are some sample ID values:

452F

107M

data sasuser.newtal;

set sasuser.talent;

Sex=substr(id,4,1);

proc print data=sasuser.newtal;

run;

      1. Replacing Text Using SUBSTR
        1. The second use for SUBSTR function is to replace the contents of a character variable
        2. When the function is placed on the right side of an assignment statement, it returns the requested string (extracts)

MiddleInitial=substr(middlename,1,1);

        1. When the function is placed on the left side of an assignment statement, it modifies the variable values

substr(region,1,3)=’NNW’;

        1. When the SUBSTR function modifies variable values, the right side of the assignment statement must specify the value to place into the variable.

Example: To replace the fourth and fifth characters of a variable named Test with 92, you write the following assignment statement:

substr(test,4,2)=’92’;

Old Test Value

SUBSTR Function

New Test Value

S7381K2

substr(test,4,2)=’92’;

S7392K2

S7381K7

substr(test,4,2)=’92’;

S7392K7

Practice Example

Use the SUBSTR function to replace the 622 exchange in the variable Phone to 433. Not all phone numbers start with 622, so use a SUBSTR function to assign each phone number’s exchange in a separate variable. Then use the exchange-variable to create an if statement.

data hrd.temp2 (drop=exchange);

set hrd.temp;

exchange=substr(phone,1,3);

if exchange=’622’ then substr(phone,1,3)=’433’;

run;

Example Problem

Create a data set named sasuser.detalent from sasuser.njtalent. In the new data set, replace a portion of the variable NewPhone. Change the area code from 201 to 302 in each value of the variable. Print the data set.

Here is a sample value of NewPhone:

(201)5554200

data set sasuser.detalent;

set sasuser.njtalent;

substr(newphone,2,3)=’302’;

proc print data=sasuser.detalent;

run;

    1. TRIM Function
      1. Enables you to remove trailing blanks from character values
      2. Whenever the value of a character variable does not match the length of the variable, SAS pads the value with trailing blanks

Practice Example

The data set hrd.temp contains four address variables: address, city, state, and zip. Create one address variable that contains the values of three variables address, city, and zip. State is not required because it is a local business data set.

data hrd.newtemp (drop=address city state zip);

set hrd.temp;

NewAddress=address||’, ‘||city||’, ‘||zip;

run;

Note: Because of the trailing blanks in these three variables, the output will not turn out as expected. Instead, the TRIM function must be used to get the desired output.

      1. General Form
        1. TRIM(argument)
          1. Where

argument can by any character expression such as

a character variable: trim(address)

another character function: trim(left(id))

To remove blanks from the practice example’s output, incorporate the TRIM function as follows (note: the zip variable does not include trailing blanks):

data hrd.newtemp (drop=address city state zip);

set hrd.temp;

NewAddress=TRIM(address)||’, ‘||TRIM(city)||’, ‘||zip;

run;

      1. Points to Remember
        1. The TRIM function does not affect how a variable is stored. If you TRIM a variable and assign it to a new variable with another length statement that is longer than some of the values, they will once again be padded with trailing blanks.

data temp;

set hrd.temp;

length Street $ 20;

Street=TRIM(address);

run;

This program will assign the trimmed value of address to the new variable Street. When the trimmed value is assigned to Street, trailing blanks are added to the value to match the length of 20.

    1. INDEX Function
      1. The INDEX function enables you to search a character value for a specified string.

        1. It searches from left to right, looking for the first occurrence of the string
        2. It returns the position of the string’s first character
        3. If the string is not found, it returns a value of 0
      2. General Form
        1. INDEX(source,excerpt)
          1. Where

source specifies the character variable or expression to search

excerpt specifies the character string that is enclosed in quotation marks (‘’)

Practice Example

The data set hrd.temp has a character variable Job that lists skills of temporary employees. Create a data set that contains the names of all temporary employees who have word processing experience.

data hrd.datapool;

set hrd.temp;

if index(job,’word processing’) > 0; any index values of “0” do not contain that character string and will not be included in the data set

run;

NOTE: INDEX is case sensitive, so the character string that you are searching for must be specified exactly as it is recorded in the data set. Use the upcase and lowcase functions to convert variable values to uppercase or lowercase letters to ensure all occurrences of a character string are found.

data hrd.datapool;

set hrd.temp;

if index(upcase(job),’WORD PROCESSING’) > 0;

run;

data hrd.datapool;

set hrd.temp;

if index(lowcase(job),’word processing’) > 0;

run;

Example Problem

Create a new data set named sasuser.stage from sasuser.talent. Include in the new data set only those actors who have stage experience by searching the variable Comment. Print the data set.

Here are some sample Comment values:

directing, stage

stage,commercials

data sasuser.stage;

set sasuser.talent;

if index(lowcase(comment),’stage’) > 0;

proc print data=sasuser.stage;

run;

    1. UPCASE and LOWCASE Functions
      1. Converts all letters in a character expression to uppercase
      2. General Forms:
        1. UPCASE(argument)
          1. Where argument can be any SAS expression, such as a character variable or constant
        2. LOWCASE(argument)
          1. Where argument can be any SAS expression, such as a character variable or constant
    2. TRANWRD Function
      1. Replaces or removes all occurrences of a pattern of characters within a character string.

        1. The translated characters can be located anywhere in the string
      2. General Form:
        1. TRANWRD(source,target,replacement)
          1. Where

source specifies the source string that you want to translate

target specifies the string that SAS searches for in source

replacement specifies the string that replaces target

Note: target and replacement can be specified as variables or as character strings. If you specify character strings, be sure to enclose the strings in quotation marks (‘’ or “”).

Practice Example

You can use TRANWRD function to update variables in place. In this example, the function updates the values of Name by changing every occurrence of the string Monroe to Manson

Name=tranwrd(name,’Monroe’,’Manson’)

  1. Introduction to Modifying Numeric Values
    1. SAS provides functions to create or modify numeric values. These include arithmetic, financial, and probability functions.
    2. INT Function
      1. To return the integer portion of a numeric value, use the INT Function

        1. Decimals are discarded
      2. General Form
        1. INT(argument)
          1. Where argument is a numeric variable, constant, or expression

data work.after;

set work.before;

Examples=int(examples);

run;

Examples Before

INT Function

Examples After

326.54

Examples=int(examples);

326

98.20

Examples=int(examples);

98

-32.66

Examples=int(examples);

-32

1401.75

Examples=int(examples);

1401

    1. ROUND Function
      1. Rounds values to the nearest specified unit
      2. General Form
        1. ROUND(argument,round-off-unit)
          1. Where

argument is a numeric variable, constant, or expression

round-off-unit is numeric and nonnegative

data work.after;

set work.before;

Examples=round(examples,.2);

run;

Examples Before

INT Function

Examples After

326.54

Examples=round(examples,.2);

326.60

98.20

Examples=round(examples,.2);

98.20

-32.66

Examples=round(examples,.2);

-32.60

1401.75

Examples=round(examples,.2);

1401.80

  1. Nesting SAS Function
    1. Nest functions together to write more efficient programs (instead of breaking them down step-by-step)

      1. You can nest any functions as long as the function that is used as the argument meets the requirements for the argument

Practice Examples

You can nest SCAN function within the SUBSTR function in an assignment statement to computer the value for MiddleInitial:

MiddleInitial=substr(scan(name,3),1,1);

  1. SUBSTR takes the character value in the 1st location of the SCAN-variable and extracts only 1 character
  2. SCAN locates the third delimited word in the variable Name

This example of nested numeric functions determines the number of years between June 15, 1999, and today:

Years=intck(‘year’,’15jun1999’d,today());

  1. today() automatically returns the date the SAS program is ran

Lesson Twenty-Three – Accessing DBMS Data

  1. SAS/ACCESS LIBNAME Statement
    1. General Form
      1. LIBNAME libref SAS/ACCESS-engine-name

<SAS/ACCESS-engine-connection-options>

<SAS/ACCESS-LIBNAME-options>;

      1. Where
        1. SAS/ACCESS-engine-name is the engine name for your DBMS, such as Oracle or SYBASE
        2. SAS/ACCESS-engine-connection-options are connection options that are specific to the database, such as USER= and PASSWORD=
        3. SAS/ACCESS-LIBNAME-options are processing options that are specific to the database, such as PATH= and DBINDEX=

Practice Example

The SAS/ACCESS LIBNAME statement below associates the libref dblib with an Oracle database management system. If the Oracle DBMS contains a table named Employees, you can reference Dblib.Employees as if it were a SAS dataset.

libname dblib oracle

user=tester1 password=testpw

path=ora7db dbindex=y;

  1. Clearing Librefs
    1. To clear a libref from a DBMS, specify the libref in a libname statement followed by the CLEAR option

      1. libname dblib clear;
  2. Using Engine Connection Options
    1. Options that you specify to connect to a particular database, typical connection options include:

      1. a user name
      2. a password
      3. a database name

Practice Example

In this example, the libref Mydblib connects to an Oracle database. The SAS/ACCES engine connection options are USER=, PASSWORD=, and PATH=specifies an alias for the database specifications.

libname mydblib oracle user=tester

password=testpass path=myorapath;

In this example, the libref myodbc uses the ODBC engine to connect to a Microsoft SQL Server database. The SAS/ACCESS engine connection options are UID=, PWD=, and DSN=.

libname myodbc odbc uid=testuser pwd=testpass

dsn=sqlservr;

Note: If connection options include characters that are not allowed in SAS names, enclose the values of the options in quotation marks.

  1. Using LIBNAME Options
    1. Options that apply to the processing of objects and data in a DBMS, such as tables or indexes
    2. Common SAS/ACCESS LIBNAME options:

ACCESS=

CONNECTION=

CONNECTION_GROUP=

DBCONINIT=

DBCONTERM=

DBINDEX=

DBLIBINIT=

DBLIBTERM=

DBPROMPT=

DEFER=

PRESERVE_COL_NAMES=

PRESERVE_TAB_NAMES=

READ_LOCK_TYPE=

REREAD_EXPOSURE=

SPOOL=

UPDATE_LOCK_TYPE=

Practice Example

In the statement below, the SAS/ACCESS LIBNAME options ensure that multiple librefs share a single connection, that the ODBC database indexes are used, and that the user-defined initialization command EXEC UPDATE_PROC is submitted immediately after connection:

libname dbdeux odbc

user=cestmoi using=sesame db=receipts

connection=globalread dbindex=yes

dbconinit=’exec update_proc’;

  1. Preserving Table and Column Names
    1. Preserves spaces, special characters, and case sensitivity in column or table names when data is being read or sent to DBMS

      1. In order to use column names that are not valid SAS names in your SAS program, you must use either the DQUOTE=ANSI option or the VALIDVARNAME=ANY option
    2. General Form
      1. PRESERVE_COL_NAMES=NO|YES
      2. PRESERVE_TAB_NAMES=NO|YES

Practice Example

In the program below, specifying the value YES in the PRESERVE_COL_NAMES= option retains the case of the column names in the Oracle data that is accessed through Oraclib.Staff.

libname oraclib oracle

user=kilroy password=washere

path=oracpath preserve_col_names=yes;

proc sql;

create table oraclib.emps as

(select IDnum, LName, FName

from oraclib.staff

where City=”STAMFORD”);

quit;

Value

Result

Example

NO

Names of columns or tables in the DMBS are specified using SAS naming conventions

In your SAS program, you can reference the Oracle column Total$Cost as the variable Total_Cost

Mixed-case alphanumerics and the underscore (_) character are supported

If a column or table name contains something else, it is converted to an underscore when reference by SAS

YES

Names are read from and passed to the DBMS with special characters and exact spelling preserved

In your SAS program, you can reference the Oracle column Total$Cost as the variable Total$Cost

  1. Referencing DBMS Objects in SAS Programs
    1. After a DBMS libref is specified, use the same naming as if it is a SAS data set

      1. DBMS: libref.object-name
      2. SAS Dataset: libref.filename

Practice Example #1: Querying a DBMS Table

In this program, the SQL procedure reads flight information from the Oracle table Delay. The WHERE clause selects only flights to London and Frankfurt.

libname oraclib oracle user=cestmoi

password=sesame path=’oracpath’;

proc sql;

select dates format=date9., dest

from oraclib.delay

where dest in (“FRA”,”LON”)

order by dest;

quit;

Practice Example #2: Combining DBMS Data and SAS Data

This example shows how to read DBMS data and how to create an additional variable for calculations or subsets. The program below creates the SAS data set work.highwage by reading the DB2 table Payroll (reference as Db2lib.Payroll). It also creates a variable, Category, based on the value of the Salary column in the DB2 table Payroll. The Payroll table is not modified.

libname db2lib db2 ssid=db2;

data highwage;

set db2lib.payroll (drop=sex birth hired);

if salary>6200 then

Category=”High”;

else if salary<30000 then

Category=”Low”;

else Category=”Avg”;

run;

Practice Example #3: Updating a SAS Data Set with DBMS Data

You can use the DATA step to update an existing SAS data set with new DBMS data. In this case, the SAS data set work.payroll is updated with data from the DB2 tables Payroll and Payroll2. The DB2 tables are not modified.

libname db2lib db2 ssid=db2;

data payroll;

update db2lib.payroll db2lib.payroll2;

by idnum;

run;

SAS assumes that the variables in SAS data sets correspond to the columns in DBMS tables. If the variables and columns do not match, you can use the DROP= or KEEP= data set options to specify only the SAS variables and DBMS columns that do correspond.

Practice Example #4: Updating DBMS Data

You can write directly to a DBMS, updating rows, columns, and tables with the SQL procedure. The following example adds a new row to the DB2 table Superv.

libname db2lib db2 ssid=db2;

proc sql;

insert into db2lib.superv

values(‘1588’,’NY’,’FA’);

quit;

Practice Example #5: Creating a DBMS Table

You can also use the SQL procedure to create new tables in DBMS. This example creatures the Oral table Gtforty by retriveing data from the Staff and Payroll tables.

libname oraclib oracle user=cestmoi

password=sesame path=oracpath

preserv_col_names=yes;

proc sql dquote=ansi;

create table oraclib

select lname as LastName,

fname as FirstName,

salary as Salary

format=dollar10.2

from oraclib.staff a,

oraclib.payroll b

where (a.idnum eq b.idnum) and

(salary gt 40000);

quit;

DQUOTE=ANSI is a PROC SQL option that specifies whether PROC SQL treats values that are enclosed in double quotation marks as a character string or as a column name or table name.

  1. Processing SQL Statements
    1. Enables yout o run a proc sql directly to a dbms

Statement or Component

Function

CONNECT Statement

Establishes a connection to the DBMS

CONNECTION TO component (in the FROM clause of the PROC SQL SELECT statement)

Retrevies data directly from the DBMS

EXECUTE Statement

Sends dynamic, non-query, SQL statements to the DBMS

DISCONNECT Statement

Terminates the connection to the DBMS

    1. Establishing a DBMS Connection
      1. General form:

CONNECT TO dbms-name <AS alias> <(<connect-statement-arguments>

<database-connection-arguments>)>;

      1. Where:
        1. dbms-name identifies the database management system
        2. alias names an optional, 1 to 8 character alias for the connection
        3. connect-statement-arguments specifies arguments that indicate whether you can make multiple connections, share or unique connections, and so on, to the database. These arguments are optional.
        4. datbase-connection-arguments specifies the DBMS-specific arugments that are need by the PROC SQL to connect to the DBMS. These arguments are optional for most databases.

Practice Example

The CONNECT statement below connects to SYBASE server and assigns the alias Sybocn1 to it. SYBASE is a case-sensitive database, so the names of the database objects are in uppercase, as they werwe specified when the objects were created. The CONNECTION=GLOBAL argument specifies that multiple CONNECT statements that use identical values for all options can share the same connection to the DBMS. Notice that the arguments are enclosed in parenthesis.

proc sql;

connect to sybase as sybcon1

(server=SERVER1 database=PERSONNEL

User=testuser password=testpass

Connection=global);

quit;

    1. Specifying the DBMS Connection to Use
      1. General form:
        1. CONNECTION TO dbms-name | alias (DBMS-query)
        2. Where:
          1. dbms-name | alias identifies the database management system or its alias
          2. (DBMS-query) specifies an SQL query in parentheses
      2. You specify the CONNECTION TO component in the FROM clause of a PROC SQL SELECT statement

PROC SQL;

SELECT column-list

FROM CONNECTION TO dbms-name (DBMS-query)

other optional PROC SQL clauses

QUIT;

    1. Sending SQL Statements to the DBMS
      1. General Form
        1. EXECUTE (DBMS-specific-SQL-statement)

BY dbms-name | alias;

      1. Where:
        1. DBMS-specific-SQL-statement is a dynamic, non-query SQL statement enclosed in parenthesis. The statement may be case-sensitive and is passed to the DBMS exactly as you type it
        2. dbms-name | alias identifies the DBMS or its alias. You cannot use an alias if you omitted the CONNECT statement
      2. Typically, the DBMS-specific SQL statements in an EXECUTE statement include one or more of the following:

        1. CREATE a DBMS table, view, index, or other object
        2. INSERT rows in a DBMS table
        3. DELETE rows from a DBMS table
        4. UPDATE the data in a DBMS table
        5. DROP or delete a DBMS table, view, or other object
        6. GRANT authority to access or modify objects
        7. REVOKE the access or modification privileges that were given by the GRANT statement

Practice Example

In the following program, the EXECUTE statement grants UPDATE and INSERT authority to user CESTMOI for the DBMS table Orders.

proc sql;

connect to oracle as dblink

(user=cestmoi pw=trustme

path=oracpath connection=global);

execute (grant update, inerst on orders

to cestmoi) by dblink;

disconnect from dblink;

quit;

    1. Disconnecting from the DBMS
      1. General form
        1. DISCONNECT FROM dbms-name | alias
        2. Where dbms-name | alias identifies the database management system or its alias
          1. If you omite the DISCONNECT statement, am implicit DISCONNECT is performed when PROC SQL terminates

Practice Example

In the PROC SQL step below, the DISCONNECT statement ends the connection to the DB2 database that has the alias Dbshare, and the QUIT statement terminates the SQL procedure.

proc sql;

connect to db2 as dbshare (ssid=db2a);

disconnect from dbshare;

quit;

Lesson Twenty-Four – Generating Data with DO Loops

  1. Constructing DO Loops
    1. DO loops process a group of statements repeatedly rather than once

      1. The following example calculates “Earned”, which is compounded monthly

data finance.earnings;

set finance.master;

Earned=0;

earned+(amount+earned)*(rate/12);

earned+(amount+earned)*(rate/12);

earned+(amount+earned)*(rate/12);

earned+(amount+earned)*(rate/12);

earned+(amount+earned)*(rate/12);

earned+(amount+earned)*(rate/12);

earned+(amount+earned)*(rate/12);

earned+(amount+earned)*(rate/12);

earned+(amount+earned)*(rate/12);

earned+(amount+earned)*(rate/12);

earned+(amount+earned)*(rate/12);

earned+(amount+earned)*(rate/12);

run;

      1. This can be simplified using a DO Loop

data finance.earnings;

set finance.master;

Earned=0;

do count=1 to 12;

earned+(amount+earned)*(rate/12);

end;

run;

    1. General Form
      1. DO index-variable=start TO stop BY increment;

SAS Statements;

END;

      1. Where
        1. index-variable stores the value of the current iteration of the DO loop. You may use any valid SAS name
        2. start value specifies the initial value of the index variable
        3. TO clause specifies the stop value. The stop value is the last index value that executes the DO loop
        4. BY clause (optional) specifies an increment value for the index variable. The default is 1.

Example Problem

Complete the DATA step by constructing a DO loop to computer the total value of Value, earning 7.5% interest for 20 years.

data work.earn;

Value=2000;

do year=1 to 20;

Interest=value*.075;

value+interest;

end;

run;

proc print data=work.earn;

run;

    1. Counting Iterations of DO Loops
      1. The following example creates an index variable Counter and drops it from the dataset, but also assigns a variable Year to count the number of iterations of the DO Loop

data work.earn (drop=counter);

Value=2000;

do counter=1 to 20;

Interest=value*.075;

value+interest;

Year+1;

end;

run;

    1. Explicit OUTPUT Statements
      1. To created an observation for each iteration of the DO loop, place an OUTPUT Statement inside the loop

        1. By default, every DATA step contains an implicit OUPUT statement at the end of the Step
        2. Placing an explicit output statement overrides automatic output, causing SAS to add an observation to the data set only when the explicit OUTPUT statement is executed

data work.earn;

Value=2000;

do Year=1 to 20;

Interest=value*.075;

value+interest;

output;

end;

run;

    1. Decrementing DO Loops
      1. Specify a negative value for the BY clause

DO index-variable=5 to 1 by -1;

SAS statements;

END;

    1. Specifying a Series of Items
      1. DO index-variable=value1,value2,value3…;

SAS Statements;

END;

        1. Values can be character or numeric
      1. To list items in a series, you must specify either

        1. all numeric values

DO index-varaiable=2,5,9,13,27;

SAS Statements

END;

        1. all character values, with each value enclose in quotation marks

DO index-varaiable=“MON”;”TUE”,”WED”,”THR”,”FRI”;

SAS Statements

END;

        1. all variable names—the index variable takes on the values of the specified variables

          1. variable names must represent either all numeric or all character values. Do not enclose variable names in quotation marks.

DO index-variable=win,place,show;

SAS Statements;

END;

  1. Nesting DO Loops
    1. Iterative DO statements can be executed within a DO loop.
    2. In order for nested DO loops to execute correctly, you must:

      1. assign a unique index-variable name in each iterative DO statement
      2. end each DO loop with an END statement

data work.earn;

do year=1 to 20;

Capital+2000;

do month=1 to 12;

Interest=capital*(.075/12);

Capital+Interest;

end;

end;

run;

Example Problem

The variable Rate is an annual interest Rate. Revise the program so that the interest earned on Amount is calculated four times annually.

data work.save;

Rate=.0625;

do year=1 to 20;

Amount+2000;

do j=1 to 4;

amount+amount*rate;

end;

end;

run;

  1. Iteratively Processing Data The Is Read from a Data Set
    1. The following example sets the variable Years as the stop value. As a result, the DO loop executes the number of times specified by the current value of Years

data work.compare (drop=i);

set finance.cdrates;

Investment=5000;

do i=1 to years;

investment+rate*investment;

end;

run;

Example Problem

Complete the DATA step to compute the interest on several loans. Create the variable Balance equal to the variable Amount. Then construct a DO loop to execute the variable i to the value of the variable Months. Print the data set.

data work.totals (drop=i balance interest);

set sasuser.loans;

Balance=Amount;

TotalInterest=0;

do i to Months;

Interest=balance*(rate/12);

balance+interest-payment;

totalinterest+interest;

end;

run;

proc print data=work.totals;

run;

  1. Conditionally Executing DO Loops
    1. Used when exact number of iterations required is not known, but needs to be looped until a set condition is met (or is no longer met)
    2. Use DO WHILE and DO UNTIL
    3. General Form of DO UNTIL
      1. DO UNTIL (expression);

more SAS statements;

END;

      1. Where expression is a valid SAS expression enclosed in parenthesis
      2. Note: The expression is not evaluated until the bottom of the loop, so a DO UNTIL is always executed at least once.

Practice Example

Assume you want to know how many years it will take to earn $50,000 if you deposit $2,000 each year into an account that earns 10% interest. The DATA step below uses a DO UNTIL statement to perform the calculation until the value is reached. Each iteration of the DO loop represents one year of earning.

data work.invest;

do until (Capital>=50000);

capital+2000; adds 2000 to “Capital”

capital+capital*.10; adds “Capital*.10” to Capital”

Year+1; adds “1” to Year

end;

run;

Example Problem

Complete the DATA step by including a DO UNTIL statement to determine the number of years until Savings reaches at least $1,000,000. Supply your own initial values for Income and Savings. Print the data set.

data work.retire;

Savings=8000;

Income=42000;

do until (savings>=1000000);

Year+1;

income+income*.04;

savings+income*.10;

end;

run;

proc print data=work.retire;

run;

    1. General Form of DO WHILE
      1. DO WHILE (expression);

more SAS statements;

END;

      1. Where expression is a valid SAS expression enclosed in parenthesis
      2. Note: The expression is evaluated at the top of the loop, so a DO UNTIL is only executed when the expression is true the first time it is evaluated

Example Problem

Change the DO UNTIL statement to a DO WHILE statement to calculate the number of years until Savings reaches $1,000,000.

data work.retire;

Savings=8000;

Income=42000;

do while (savings>=1000000);

Year+1;

income+income*.04;

savings+income*.10;

end;

run;

proc print data=work.retire;

run;

    1. Using Conditional Clauses with the Iterative DO Statement
      1. You can combine conditional and unconditional execution of the DO Loop

do i=1 to 10 until (Capital>=50000);

        1. This example will execute until i=10 or Capital is greater than or equal to $50,000.
      1. General Form:
      2. do index-variable=start to stop by increment until/while (expression);

Example Problem

Modify the DATA step to compute the savings after 30 years or until Savings reaches $1,000,000 or more. Print the data set.

OLD PROGRAM:

data work.retire;

Savings=6000;

Income=38000;

do i=1 to 30 while (savings<1000000);

Year+1;

income+(income*.06);

savings+(income*.20);

end;

run;

    1. Creating Samples
      1. Because it performs iterative processing, a DO loop provides an easy way to draw sample observations from a data set.

Practice Example

Suppose you would like every tenth observation of the 5,000 observations in Factory.Widgets.

data work.subset;

do sample=10 to 5000 by 10;

set factory.widgets point=sample;

output;

end;

stop;

run;

      1. The program evaluates the following:

        1. Sets the new data set as Subset stored in Work (temporary)
        2. Runs a “do loop” where sample starts at 10 and goes until 5,000 by 10
        3. The “point” option is used to set the observation point equal to the sample
        4. Output is selected to output the current observation in the loop
        5. After the loop is done processing, the “end” statement ends it
        6. After the loop is ended, the POINT option is stopped using a “stop” statement
        7. The program is submitted using a “run” statement
  1. Quiz Notes
    1. DO Loops can only be used in DATA steps, they cannot be used in PROC steps
    2. For DO Loops using a fixed condition (do i=1 to 10), the value of “i” at the end of the tenth iteration will be incremented to 11. At the beginning of the next iteration it will have exceeded the stop value and the current values will be written to the observation

      1. If the output statement is included in the DO Loop, then this is over-ridden and the observation written will be 10, not 11 (since the output will be written before the end of the data step—which also means the final value of i=11 will also not be written since the automatic output will be turned off).

Lesson Twenty-Five – Processing Variables with Arrays

  1. You can use arrays to simplify code needed to

    1. perform repetitive calculations
    2. create many variables that have the same attributes
    3. read data
    4. rotate SAS data sets by changing variables to observations or observations to variables
    5. compare variables
    6. perform a table lookup
  2. Creating One-Dimensional Arrays
    1. A SAS array is a temporary grouping of SAS variables under a single name
    2. One reason for using an array is to reduce the number of statements that are required for processing variables

      1. For example, in the DATA step below, the values of seven data set variables are converted from Fahrenheit to Celsius temperatures

data work.report;

set master.temps;

mon=5*(mon-32)/9;

tue=5*(tue-32)/9;

wed=5*(wed-32)/9;

thr=5*(thr-32)/9;

fir=5*(fri-32)/9;

sat=5*(sat-32)/9;

sun=5*(sun-32)/9;

run;

      1. As you can see, the assignment statements perform the same calculation on each variable in this series of statements. Only the name of the variable changes in each statement.
      2. By grouping the variables into a one-dimensional array, you can process the variables in a DO Loop. You use fewer statements, and the DATA step program is more easily modified or corrected.

data work.report;

set master.temps;

array wkday{7} mon tue wed thr fri sat sun;

do i=1 to 7;

wkday{i}=5*(wkday{i}-32)/9;

end;

run;

    1. General Form
      1. ARRAY array-name{dimension} <elements>;
        1. Where
          1. array-name specifies the name of the array
          2. dimension describes the number and arrangement of array elements. The default dimension is one.
          3. elements lists the variables to include in the array. Array elements must either be all numeric or all character. If no elements are listed, new variables will be created with default names.

Note: Do not give an array the same name as a variable in the same DATA step.

Note: Avoid using the name of a SAS function. The array will be correct, but you won’t be able to use the function in the same DATA step, and this will also cause a warning message to appear in the SAS log

Note: You cannot use array names in LABEL, FORMAT, DROP, KEEP or LENGTH statements. Arrays exist only for the duration of the DATA step, they do not become part of the output data set.

    1. Specifying the Array Name
      1. To group the variables in the array, first give the array a name. In this example, the array’s name is “sales”

array sales{4} qtr1 qtr2 qtr3 qtr4

    1. Specifying the Dimension
      1. The dimension describes the number and arrangement of elements in the array. There are several ways to specify the dimension.

        1. In a one-dimensional array, you can simply specify the number of array elements. The array elements are the existing variables that you want to reference and process elsewhere in the DATA step.

array sales{4} qtr1 qtr2 qtr3 qtr4;

        1. The dimension of an array doesn’t have to be the number of array elements. You can specify a range of values for the dimension when you define that array. For example, you can define the array Sales as follows:

array sales{96:99} totals96 totals97 totals98 totals99;

        1. You can also indicate the dimension of a one-dimensional array by using an asterisk (*). This way, SAS determines the dimension of the array by counting the number of elements.

array sales{*} qtr1 qtr2 qtr3 qtr4;

        1. Enclose the dimension in either parenthesis, braces, or brackets:

          • [ ]
          • { }
          • ( )
    1. Specifying Array Elements
      1. You can list each variable name or a variable list in a one-dimensional array

        1. array sales{4} qtr1 qtr2 qtr3 qtr4;
        2. array sales{4} qtr1- qtr4;
    2. Variable Lists as Array Elements
      1. You can specify variables lists in the forms shown below

To specify these variables…

Use this form…

A numbered range of variables

Var1-Varn

All numeric variables

_NUMERIC_

All character variables

_CHARACTER_

All variables

_ALL_

      1. A Numbered Range of Variables
        1. The variables must have the same name except for the last character or characters
        2. The last character of each variable must be numeric
        3. The variables must be numbered consecutively

array sales{4} qtr1-qtr4;

        1. In the preceding example, you would use Sales{4} to reference Qtr4. However, the index of an array doesn’t have to range from one to the number of array elements. You can specify a range of values for the index when you define the array. For example, you can define the array Sales as follows:

array sales{96:99} qtr1-qtr4;

      1. All Numeric Variables
        1. Amount Rate Term _NUMERIC_
        2. _NUMERIC_ specifies all numeric variables that have already been defined in the current DATA step

array sales(*) _numeric_;

      1. All Character Variables
        1. FirstName LastName Address _CHARACTER_
        2. _CHARACTER_ specifies all character variables that have already been defined in the current DATA step

array sales(*) _character_;

      1. All Variables
        1. FirstName LastName Address Amount Rate Term _ALL_
        2. _ALL_ specifies all variables that have already been defined in the current DATA step. The variables must all be of the same type: all character or all numeric.

array sales(*) _all_;

    1. Referencing Elements of an Array
      1. The ability to reference the elements of an array by an index value is what gives arrays their power. Typically, arrays are used with DO loops to process multiple variables and to perform repetitive calculations.

data quarter{4} jan apr jul oct;

do i=1 to 4;

YearGoal=quarter(i)*1.2;

end;

      1. When you define an array in a DATA step, an index value is assigned to each array element. The index values are assigned in the order of the array elements.
      2. data quarter{4} jan apr jul oct;

        1. sets the index jan=1, apr=2, jul=3, oct=4
      3. data quarter{10:13} jan apr jul oct;

        1. sets the index jan=10, apr=11, jul=12, oct=13
      4. You use an array reference to perform an action on an array element during execution. To reference an array element in the DATA step, specify the name of the array, followed by an index value enclosed in parenthesis.
      5. General Form:
        1. array-name(index-value)
          1. where index-value

is enclosed in parentheses, braces, or brackets

specifies a variable, a SAS expression or an integer

is within the lower and upper bounds of the dimension of the array

      1. When used in a DO loop, the index variable of the iterative DO statement can reference each element of the array:

array qtr{4} jan apr jul oct;

do i=1 to 4;

YearGoal=qtr{i}*1.2;

end;

Practice Example

The Health Center of a company conducts a fitness class for its employees. Each week, participants are weighed so that they can monitor their progress. The weight data, currently stored in kilograms, needs to be converted to pounds.

You can use a DO loop to update the variables Weight1 through Weight6 for each observation in hrd.fitclass.

data hrd.convert;

set hrd.fitclass;

array wt{6} weight1-weight6;

do i=1 to 6;

wt{i}=wt{i}*2.2046;

end;

run;

    1. Using the DIM Function in an Iterative DO Statement
      1. You can also use the DIM function to specify the TO clause of the iterative DO statement. For a one-dimensional array, specify the array name as the argument for the DIM function. The function returns the number of elements in the array.
      2. General Form
        1. DIM(array-name)
          1. Where array-name specifies the array
      3. When you use the DIM function, you do not have to re-specify the stop value of an iterative DO statement if you change the dimension of the array.

data hrd.convert;

set hrd.fitclass;

array wt{*} weight1-weight6;

do i=1 to dim(wt);

wt{i}=wt{i}*2.2046;

end;

run;

Example Problem

Create the sasuser.added data set from sasuser.funddrive, which contains employees’ contributions to a charity. Write an ARRAY statement to group the variables qtr1 through qtr4 for processing. Name the array contrib. Write a DO loop to add the company’s supplement of 25 percent to each employee’s quarterly contribution. Print the data set.

data sasuser.added;

set sasuser.funddrive;

array contrib{*} qtr1-qtr4;

do i=1 to dim(contrib);

contrib{i}=contrib{i}+(contrib{i}*.25);

end;

run;

proc print data=sasuser.added;

run;

  1. Expanding Your Use of Arrays
    1. Creating Variables in an ARRAY Statement
      1. You can create variables in an ARRAY statement by omitting the array elements form the statement.
      2. General Form
        1. ARRAY array-name{dimension};
        2. Where
          1. array-name specifies the name of the array
          2. dimension describes the number and arrangement of array elements. The default dimension is one.

Practice Example

Suppose you need to calculate the weight gain or loss from week to week for each member of a fitness class. You’d like to create variables that contain this weekly difference. To perform the calculation, you first group the variables Weight1 through Weight6 into an array.

data hrd.diff;

set hrd.convert;

array wt{6} weight1-weight6;

Next, you want to create the new variables to store the differences between the six recorded weights. You can use an additional ARRAY statement without elements to create the new variables.

data hrd.diff;

set hrd.convert;

array wt{6} weight1-weight6;

array WgtDiff{5};

The default variable names are created by concatenating the array name and the numbers 1,2,3 and so on, up to the array dimension. You can specify individual variable names by listing each name as an element of the array.

To create an array of character variables add a dollar sign ($) after the array dimension.

array firstname{5} $;

By default, all character variables that are created in an ARRAY statement are assigned a length of 8. You can assign your own length by specifying the length after the dollar sign.

array firstname{5} $ 24;

Note: The length that you specify is automatically assigned to all variables that are created by the ARRAY statement.

Now you can use a DO loop to calculate the differences between each of the recorded weights. Notice that each value of WgtDiff{i} is calculated by subtracting wt{i} from wt{i+1}. By manipulating the index variable, you can easily reference any array element.

data hrd.diff;

set hrd.convert;

array wt{6} weight1-weight6;

array WgtDiff{5};

do i=1 to 5;

wgtdiff{i}=wt{i+1}-wt{i}

end;

run;

    1. Assigning Initial Values to Arrays
      1. To assign initial values in an ARRAY statement:

        1. place the values after the array elements
        2. specify one initial value for each corresponding array element
        3. separate each value with a comma or blank
        4. enclose the initial values in parenthesis
        5. enclose each character value in quotation marks

array goal{4} g1 g2 g3 g4 (9000 9100 9200 9300);

array col{3} $ color1-color3 (‘red’,’green’,’blue’);

          1. Note: It is also possible to assign initial values to an array without specifying each array element

array Var{4} (1 2 3 4);

this specifies initial values for Var1 Var2 Var3 Var4

Practice Example

Assume that you have the task of comparing the actual sales figures in the Finance.Qsales data set to the sales goals for each sales representative at the beginning of the year. The sales goals are not recorded in Finance.Qsales.

The first array created processes sales data for each quarter.

data finance.report;

set finance.qsales;

array sale{4} sales1-sales4;

To compare actual sales to the sales goals, you must create the variables for the sales goals and assign values to them.

data finance.report;

set finance.qsales;

array sale{4} sales1-sales4;

array Goal{4} (9000 9300 9600 9900);

A third ARRAY statement creates the variables Achieved1 through Achieved4 to store the comparison of actual sales versus sales goals.

data finance.report;

set finance.qsales;

array sale{4} sales1-sales4;

array Goal{4} (9000 9300 9600 9900);

array Achieved{4};

A DO loop executes four times to calculate the value of each element of the Achieved array (expressed as a percentage). You can drop the index variable from the new data set by adding a DROP= option to the DATA statement.

data finance.report (drop=i);

set finance.qsales;

array sale{4} sales1-sales4;

array Goal{4} (9000 9300 9600 9900);

array Achieved{4};

do i=1 to 4;

achieved{i}-100*sale{i}/goal{i};

end;

run;

  • This practice is an example of a simple table-lookup program.
  • The resulting data set contains the variables that were read from finance.qsales, plus the eight variables created with ARRAY statements
  • Variables to which initial values are assigned in an ARRAY statement are automatically retained
  • Goal1 through Goal4 should have been created as temporary array elements, since they were only used to create Achieve1 through Achieve4

    1. Creating Temporary Array Elements
      1. To create temporary array elements for DATA step processing without creating new variables, specify _TEMPORARY_ after the array name and dimension

data finance.report (drop=i);

set finance.qsales;

array sale{4} sales1-sales4;

array Goal{4} _temporary_ (9000 9300 9600 9900);

array Achieved{4};

do i=1 to 4;

achieved{i}-100*sale{i}/goal{i};

end;

run;

  1. Creating Multidimensional Arrays
    1. Understanding Multidimensional Arrays
      1. You can group variables into table-like structures called multidimensional arrays

Practice Example

Suppose you want to write a DATA step to compare responses on a quiz to the correct answers. As long as there is only one correct answer per question, this is a simple one-to-one comparison.

Resp1 Answer1

Resp2 Answer2

However, if there is more than one correct answer per question, you must compare each response to each possible correct answer in order to determine whether there is a match.

Resp1 Answer1 Answer2 Answer3

Reps2 Answer4 Answer5 Answer6

You can process the above data more easily by grouping the Answer variables into a two-dimensional array. Just as you can think of a one-dimensional array as a single row of variables, as in this example…

Answer1 Answer2 Answer3 Answer4 … Answer 9 Answer 10

…you can think of a two-dimensional array as multiple rows of variables.

Answer1 Answer2 Answer3

Answer4 Answer5 Answer6

Answer7 Answer8 Answer9

    1. Defining a Multidimensional Array
      1. You specify the number of elements in each dimension, separated by a comma.
      2. This ARRAY statement defines a two-dimensional array:

        1. array new{3,4} x1-x12;
      3. In a two-dimensional array, the two dimensions can be thought of as a table of rows and columns:

Columns

Rows

x1

x2

x3

x4

x5

x6

x7

x8

x9

x10

x11

x12

array new(r,c) x1-x12;

        1. The first dimension in the ARRAY statement specifies the number of rows
        2. The second dimension specifies the number of columns
        3. You can perform an action on the variable x7 by specifying the array reference new(2,3)

array new{3,4} x1-x12;

new(2,3)=0;

        1. The array elements are grouped in the order in which they are listed in the ARRAY statement. For example, the array elements x1 through x4 can be though of as the first row of the table. The elements x5 through x8 become the second row, and so on.
    1. Referencing Elements of a Two-Dimensional Array
      1. Multidimensional arrays are typically used with nested DO loops. The next example uses a one-dimensional array, a two-dimensional array, and a nested DO loop to re-structure a set of variables.

Practice Example

Your company’s sales figures are stored by month in the SAS data set finance.monthly. Your task is to generate a new data set of quarterly sales rather than monthly sales.

Variable

Type

Length

Year

num

8

Month1

num

8

Month2

num

8

Month3

num

8

Month4

num

8

Month5

num

8

Month6

num

8

Month7

num

8

Month8

num

8

Month9

num

8

Month10

num

8

Month11

num

8

Month12

num

8

Defining the array m{4,3} puts the variables Month1 through Month12 into four groups of three months (yearly quarters)

data finance.quarters;

set finance.monthly;

array m{4,3} month1-month12;

Defining the array Qtr{4} creates the numeric variables Qtr1 Qtr2 Qtr3 Qtr4, which will be used to sum the sales for each quarter.

data finance.quarters;

set finance.monthly;

array m{4,3} month1-month12;

array Qtr{4};

A nested DO loop is used to reference the values of the variables Month1 through Month12 and to calculate the values of Qtr1 through Qtr4. Because the variables i and j are used only for loop processing, the DROP= option is used to exclude them from the finance.quarters data set.

data finance.quarters (drop=i j);

set finance.monthly;

array m{4,3} month1-month12;

array Qtr{4};

do i=1 to 4;

qtr{i}=0;

do j=1 to 3;

qtr{i}+m{i,j};

end;

end;

The assignment statement qtr{i}=0 sets qtr to zero after each iteration of the first DO loop. Without the assignment statement, the values of Qtr1, Qtr2, Qtr3 and Qtr4 would accumulate across iterations of the data step due to the qtr{i}+m{i,j} sum statement within the DO loop.

Example Problem

Complete the program by replacing the question marks (?). Begin by grouping the variables Item1 through Item18 into a two-dimensional array of three rows of six elements. Next, complete the DO loop. Make it perform its calculation on each group of survey questions. Print the data set.

data sasuser.summary (drop=i j total);

set sasuser.suvery;

array section{3} eating exercise stress;

array resp{3,6} Item1-Item18;

do i=1 to 3; one for each selection/row

Total=0;

do j=1 to 6; one for each Item/column in the row

total+(resp({i,j});

end;

section{i}=total/6;

end;

run;

Eating:

Item1

Item2

Item3

Item4

Item5

Item6

Exercise

Item7

Item8

Item9

Item10

Item11

Item12

Stress

Item13

Item14

Item15

Item16

Item17

Item18

  1. Additional Features
    1. Transposing (rotating)
      1. Where you change variables into observations and observations into variables
      2. The following program is an example of rotating a sas data set:

LastName

Qtr1

Qtr2

Qtr3

Qtr4

ADAMS

18

19

22

18

ALEXANDE

17

15

10

12

APPLE

25

25

25

25

ARTHUR

10

25

20

30

data work.rotate (drop=qtr1-qtr4);

set finance.funddrive;

array contrib{4} qtr1-qtr4; creates a one dimensional array “contrib” and specifies variables qtr1, qtr2, qtr3, qtr4

do Qtr=1 to 4;

Amount=contrib{qtr}; sets amount equal to contrib’s corresponding index-value’s Qtr-value (qtr=1, then Amount=Qtr1-value)

output; tells each observation to be output

end;

run;

proc print data=rotate (obs=16) noobs; prints rotate, 16 obs, no obs column

run;

Lesson Twenty-Six – Improving Program Efficiency w/ Macro Variables

  1. Understanding Macro Variables
    1. Macro variables substitute text in programs so one change can appear throughout the entire program
    2. You can assign any text string to a macro variable
    3. Macro variables can be defined and referenced anywhere in a SAS program except within data lines (will not work when reading instream data)
    4. Two Types of Macro Variables
      1. Automatic macro variables
      2. User-defined macro variables
  2. Automatic Macro Variables
    1. Whenever SAS runs, automatic macro variables are created, such as:

      1. the date or time a SAS job or session began executing
      2. which release of SAS you are running
      3. the name of the most recently created SAS data set
      4. the abbreviation for your host operating system
    2. SYSDATE Macro Variable
      1. Provides the date on which the program is executed
      2. Reference it by including an ampersand (&) followed by the macro variable name: &SYSDATE
      3. NOTE: Macro variable references that appear in quoted strings MUST be enclosed in double quotation marks (if single is used, the macro will not be resolved)
        1. footnote ‘Report Run on &sysdate’ macro will not resolve
        2. footnote “Report Run on &sysdate” macro will resolve
    3. Commonly Used Automatic Macro Variables

Name

Information Supplied

Example

SYSDATE9

Date the job or session began executing

21APR2000

SYSDATE

Date the job or session began executing

16FEB98

SYSDAY

Weekday the job or session began executing

Tuesday

SYSTIME

Time the job or session began executing

15:32

SYSSCP

Operating system abbreviation

WIN

SYSVER

SAS version and/or release number

8.0

SYSLAST

Name of the most recently created data set

HRD.TEMP99

    1. Note: Do not begin user-defined macro variables with “SYS”, as SAS reserves the right to use this prefix for their automatic macro variables

Example Problem

Rewrite the FOOTNOTE statement to display the day of the week on which the SAS session began. Cancel the footnote statement.

title;

footnote “Date: &sysday, &sysdate9”;

data sasuser.talent99;

set sasuser.talent;

if year(lasthired)=1999;

format birthdate lastdate date.;

run;

proc print data=sasuser.talent99;

run;

footnote ;

Example Problem

Add or modify statements to reference other macro variables. Cancel the title statements.

title1 ‘Temporary Employees for 1999’;

title2 “as of &sysday, &sysdate9”;

data work.talent99;

set sasuser.talent;

opsystem=”&sysscp”;

if year(lasthired)=1999;

format birthdate lasthired date.;

run;

proc print data=work.talent99;

run;

title1 ;

title2 ;

  1. Creating Your Own Macro Variables
    1. The %LET Statement
      1. General Form
        1. &LET name=value;
        2. Where
          1. name specifies the name of your macro variable
          2. value is the value of the macro variable
      2. If a value is stored with quotation marks, then that is stored with the value

        1. &let region=northwest value is northwest
        2. &let region=’northwest’ value is ‘northwest’

Example Problem

Revise the program to create the two macro variables Number and Name. Assign the value 11 to Number, and assign the value November to Name. Then reference the macro variables in the code. Cancel the Title and Footnote statements.

%let name=November;

%let number=11;

title1 Actors Hired in &name”;

footnote1 Report Number &number”;

data sasuser.newhire;

set sasuser.talent99;

if month(lasthired)=&number;

format lasthired date9.;

run;

proc print data=sasuser.newhire;

run;

title1 ;

footnote1 ;

    1. SYMBOLGEN System Option
      1. This option specifies whether messages about the resolution of macro variable references are written to the SAS log

        1. can help verify the values of macro variables that are referenced in a program
      2. General Form
        1. OPTIONS NOSYMBOLGEN | SYMBOLGEN;
          1. Where

NOSYMBOLGEN specifies that the values of macro variables will not be displayed in the log (this is default)

SYMBOLGEN specifies that the values of macro variables will be displayed in the log

options symbolgen;

%let name=November;

%let number=11;

title1 “Actors Hired in &name”;

footnote1 “Report Number &number”;

data sasuser.newhire;

set sasuser.talent99;

if month(lasthired)=&number;

format lasthired date9.;

run;

proc print data=sasuser.newhire;

run;

title1 ;

footnote1 ;

Example Problem

Modify the program below by adding in a statement that will put a message in the log about the macro variable resolutions. Cancel the title and footnote statements.

options symbolgen;

%let name=November;

%let number=11;

title1 “Actors Hired in &name”;

footnote1 “Report Number &number”;

data sasuser.newhire;

set sasuser.talent99;

if month(lasthired)=&number;

format birthdate lasthired date.;

run;

proc print data=sasuser.newhire;

run;

title1 ;

footnote1 ;

    1. Combining Macro Variable References with Prefixes
      1. When you combine macro variable references with prefixes, SAS forms the words for you

%let yr=1999;

data hrd.temp&yr; hrd.temp1999;

      1. When combining a macro variable prefix and a suffix, you must separate them with a period

%let endtime=end;

if year(&endtime.date)=&yr; resolves to: enddate (remember the period!)

      1. If your macro variable reference must appear before a suffix that begins with a period, you must specify two periods

%let libref=hrd;

data &libref..temp; resolves to hrd.temp (remember two periods!)

  1. Creating Macro Variables During DATA Step Execution
    1. CALL SYMPUT
      1. enables you to assign a value that is produced in the DATA step to a macro variable
      2. CALL SYMPUT is a CALL routine, which is a program that you can run from the DATA step

        1. invoked by means of a CALL statement
      3. General Form
        1. CALL SYMPUT(name,value);
          1. Where

Name is the name of the macro variable to be defined. The variable can be a character string enclosed in quotation marks, a character variable, or a character expression

Value is the value to be assigned to the macro variable. The value can be a text string enclosed in quotation marks, a data set variable, or a DATA step expression

Practice Example:

if overtime ne .;

TotalOvertime+overtime;

call symput(‘total’,totalovertime);

title “Temporary Employees Worked &total Overtime Hours”; remember to put macro references after creating the macro variable (not before)

If totalovertime is a numeric value, it will automatically convert to character (because ‘title’ is specified as a text-string) and might create a value that contains leading blanks. To avoid this (potential) problem, use a PUT/INPUT statement to keep macro variable names and values the same type manually instead of automatically (with the appropriate informat/format set).

call symput(‘total’,put(totalovertime,2.));

Example Problem

Modify the program to create a macro variable Total that has the value of the variable TotFee. Add the PUT function to the CALL SYMPUT routine and use the DOLLAR6. format.

%let number=11;

%let name=November;

%let abbrev=nov;

%let year=99;

footnote1 “Report Number &number”;

data &abbrev.hire;

set sasuser.talent&year;

if month(lasthired)=&number then

do;

Fee=rate*.10;

TotFee+fee;

output;

end;

format lasthired date9.;

call symput(‘total’,put(TotFee,DOLLAR6.));

run;

title1 “Actors Hired in &name”;

title2 “Agency Commision &total”;

proc print data=&abbrev.hire;

run;

    1. Specifying When CALL SYMPUT Executes
      1. You can control the execution of the CALL SYMPUT routine by using the END= option in the SET statement and placing the CALL SYMPUT routine in an IF-THEN statement
      2. END= is specified in the SET statement to specify a temporary variable that contains an end-of-file indicator

        1. it initializes at 0 and is set to 1 at the end of the last observation read
        2. setting END=last will set the variable “last” as the end-of-file indicator
        3. The following example tells CALL SYMPUT to execute only after last=1 (after the final observation has been read)

data hrd.overtime;

set hrd.temp (keep=name overtime) end=last;

if overtime ne .;

TotalOvertime+overtime;

if last=1 then call

symput(‘total’,put(totalovertime,2.));

run;

title “Temporary Employees Worked &total Overtime Hours”;

proc print data=hrd.overtime;

run;

      1. The if-then statement can also be written as follows (since the default is 1)

if last then call

symput(‘total’,put(totalovertime,2.));

Example Problem

Modify the program to create the temporary variable Final to store an end-of-file marker. Then edit the program to execute the CALL SYMPUT routine when the last observation is read.

%let number=11;

%let name=November;

%let abbrev=nov;

%let year=99;

footnote1 “Report Number &number”;

data &abbrev.hire;

set sasuser.talent&year end=final;

if month(lasthired)=&number then

do;

Fee=rate*.10;

TotFee+fee;

output;

end;

if final=1 then call symput(‘total’,put(TotFee,DOLLAR6.));

run;

title1 “Actors Hired in &name”;

title2 “Agency Commision &total”;

proc print data=&abbrev.hire;

run;

    1. Precautions/Things to Remember
      1. You cannot reference a macro variable that is created by the CALL SYMPUT routine within the same DATA step in which the macro variable is created

        1. Macro variable references are resolved before the compilation of a DATA step and CALL SYMPUT creates a macro variable during the execution of the DATA step.
      2. You cannot reference a macro variable that is created by the CALL SYMPUT routine in a global statement (such as a TITLE statement) that precedes the DATA step in which the macro variable is created (duh)
      3. When referencing text-string macro variables, be sure to encase them in double quotation marks

        1. Day=“&sysday”;

Lesson Twenty-Seven – Reading Raw Data in Fixed Fields

  1. Review of Column Input
    1. Use the INPUT statement to specify the variables to add

      1. Put a “$” after character variables

1—+—-10—+—-20—+–

BIRD FEEDER LGO88 3 20

GLASS MUGS SB082 6 12

GLASS TRAY BQ049 12 6

PADDED HANGRS MN256 15 20

JEWELRY BOX AJ498 23 0

RED APRON AQ072 9 12

CRYSTAL VASE AQ672 27 0

PICNIC BASKET LS930 21 0

data info;

infile information;


input Item $ 1-13 IDnum $ 15-19 InStock 21-22 BackOrd 24-25;

run;

    1. Whatever order the variables are listed in input is the order they will be stored in

input InStock 21-22 BackOrd 24-25 Item $ 1-13 IDnum $ 15-19;

    1. Column Input Features
      1. It can be used to read character variable values that contain embedded blanks

input Name $ 1-25;

1—+—-10—+—-20—+–

JOSEPH PAUL THACKERY JR.

      1. No placeholder is required for missing data. A blank field is read as missing and does not cause other fields to be read incorrectly.

input Item $ 1-13 IDnum $15-19 InStock 21-22 BackOrd 24-25;

1—+—-10—+—-20—+–

BIRD FEEDER LGO88 3 20

GLASS MUGS SB082 12

GLASS TRAY BQ049 12 6

      1. Fields or parts of fields can be re-read.

input Item $ 1-13 IDnum $15-19 Supplier $ 15-16 InStock 21-22 BackOrd 24-25;

1—+—-10—+—-20—+–

BIRD FEEDER LGO88 3 20

GLASS MUGS SB082 6 12

GLASS TRAY BQ049 12 6

      1. Fields do not have to be separated by blanks or other delimiters.

input Item $ 1-13 IDnum $ 14-18 InStock 19-20 BackOrd 21-22;

1—+—-10—+—-20—+–

PADDED HANGRSMN2561520

GLASS MUGS SB082 612

GLASS TRAY BQ04912 6

Example Problem

Create sasuser.choltest from the file reference by the fileref Choldata. Read the fields in this order: IDnum, Department, LastName, and Cholesterol. Print the data set.

data sasuser.choltest;

infile Choldata;

input IDnum $10-13 Department $15-16 LastName $ 1-8 Cholesterol 18-22;

proc print data=sasuser.choltest;

run;

  1. Identifying Nonstandard Numeric Data
    1. Standard Numeric Data
      1. Values contain only

        1. Numbers
        2. Decimals
        3. Numbers in scientific, or E, notation (23E4)
        4. minus signs and plus signs

Examples: 15, -15, 15.4, +.05, 1.54E3, -1.54E-3.

    1. Nonstandard Numeric Data
      1. Includes

        1. values that contain special characters, such as percent signs (%), dollar signs ($), and commas (,)
        2. data and time values
        3. data in faction, integer binary, real binary, and hexadecimal forms
  1. Choosing an Input Style
    1. Nonstandard data values require an input style that has more flexibility than column input

      1. You can use formatted input, which combines the features of column input with the ability to read both standard and nonstandard data
    2. Whenever you encounter raw data that is organized into fixed fields, you can use:

      1. column input to read standard data only
      2. formatted input to read both standard and nonstandard data
  2. Using Formatted Input
    1. General Form
      1. INPUT <pointer-control> variable informat.;
        1. Where
          1. pointer control positions the input pointer on a specified column

@n moves the input pointer to a specific column number

+n moves the input pointer forward to a column number that is relative to the current position

          1. variable is the name of the variable that is being created
          2. informat is the special instruction that specifies how SAS reads raw data
    1. Using the @n Column Pointer Control
      1. Moves the input pointer to a specific column number
      2. General Form
        1. INPUT @n variable informat.;
        2. To point to a column, use an @ sign and the column number in the INPUT statement

input LastName $7. @9 FirstName $5.;

    1. Reading Columns in Any Order
      1. You can use the @n to move a pointer forward or backward when reading a record

V—+—V10—VV20—+–

EVANS DONNY 112 29,996.63

HELMS LISA 105 18,567.23

HIGGINS JOHN 111 25,309.00

input @9 FirstName $5. @1 LastName $7. @15 JobTitle 3. @19 Salary comma9.;

    1. The +n Pointer Control
      1. Moves the input pointer forward to a column number that is relative to the current position (moves it forward n columns)
      2. General Form
        1. INPUT +n variable informat.;
      3. Use +(-n) to move the pointer backwards

V—+–VV10–VVV20—+–

EVANS DONNY 112 29,996.63

HELMS LISA 105 18,567.23

HIGGINS JOHN 111 25,309.00

input LastName $7. +1 FirstName $5 +5 Salary comma9. @15 JobTitle 3.;

  1. Using Informats
    1. An informat is an instruction that tells SAS how to read raw data.
  • PERCENTw.d
  • $BINARYw.
  • $VARYINGw.
  • $w.
  • COMMAw.d
  • DATEw.
  • DATETIMEw.
  • HEXw.
  • JULIANw.
  • MMDDYYw.
  • NENGOw.
  • PDw.d
  • PERCENTw.
  • TIMEw.
  • w.d

    1. Note that
      1. each informat contains a w value to indicate the width of the raw data field
      2. each informat also contains a period, which is a required delimiter
      3. for some informats, the optional d value specifies the number of implied decimal places
      4. informats for reading character data always begin with a dollar sign ($)
    2. Reading Character Values
      1. $w. informat enables you to read character data
        1. In the following example, $ indicates FirstName is a character variable, the 5 indicates a field width of five columns, and a period ends the informat

input @9 FirstName $5.;

    1. Reading Standard Numeric Data
      1. w.d informat enables you to read standard numeric data
      2. w specifies the field width of the raw data, and d specifies the number of implied decimal places
      3. the w.d informat ignores any specified d value if the data already contains a decimal point

Raw Data Value

w. Informat

Variable Value

34.0008

7.

34.0008

      1. Be certain to specify the period in the informat name. For example, if you omit the period in the following INPUT statement, SAS assigns the value 3 to JobTitle instead of reading JobTitle with the 3. informat

input @9 FirstName $5. +7 JobTitle 3;

    1. Reading Nonstandard Numeric Data
      1. The COMMAw.d informat is used to read numeric values and to remove embedded

        1. blanks
        2. commas
        3. dashes
        4. dollar signs
        5. percent signs
        6. right parenthesis
        7. left parenthesis, which are converted to minus signs
      2. The COMMAw.d informat has three parts:

        1. the informat name: COMMA
        2. a value that specifies the width of the field to be read (including dollar signs, decimal places, or other special characters), followed by a period: w.
        3. an optional value that specifies the number of implied decimal places for a value (not necessary if value already contains decimal places): d
      3. The COMMAw.d informat does more than simploy read the raw data values. It removes special characters such as commas from numeric data and stores only numeric values in a SAS data set
    2. DATA Step Processing of Informats
      1. Numeric values will be automatically stored with a length of 8 (no matter how many digits the value contains, or what is put in the INPUT statement)
  1. Record Formats
    1. Fixed-Length Records
      1. External files that have a fixed-length record format have an end-of-record marker after a predetermined number of columns

        1. A typical record length is 80 columns
    2. Variable-Length Records
      1. Files that have a variable-length record format have an end-of-record marker after the last field in each record

        1. The length of each record varies
    3. Reading Variable-Length Records
      1. Values that are shorter than others or missing might cause problems with their end-of-file markers (* = end-of-file marker)

Practice Example:

1—+—-10—+—-20—+–

EVANS DONNY 112 29,996.63*

HELMS LISA 105 8,567.23*

HIGGINS JOHN 111 309.00*

input FirstName $ 9-13 LastName $ 1-7 @19 Receipts comma8.;

Note: In the second record, the pointer hits the end-of-record marker before reading 8. It goes on to the third record and attempts to read the character value HIGGINS, and since Receipts is a numeric variable an invalid data error occurs and Receipts is set to missing.

    1. The PAD Option
      1. The PAD Option pads each record with blanks so that all data lines have the same length

infile receipts pad;

      1. The PAD option is only useful when missing data occurs at the end of a record and when SAS encounters an end-of-record marker before the last field is completely read
      2. The default value of the maximum record length is determined by your operating system

        1. if you get unexpected results when reading many variables, you might need to change the maximum record length by specifying the LRECL=option in the INFILE statement

Lesson Twenty-Eight – Reading Free-Format Data

  1. Free Format Data
    1. Data that is not arranged in fixed fields is considered Free Format
    2. General Form
      1. INPUT variable <$>;
        1. Where
          1. variable specifies the variable whose value the INPUT statement is to read
          2. $ specifies that the variable is a character variable

Practice Example:

1—+—-10—+—-20

MALE 27 1 8 0 0

FEMALE 29 3 14 5 10

FEMALE 34 2 10 3 3

MALE 35 2 12 4 8

FEMALE 36 416 3 7

MALE 21 1 5 0 0

MALE 25 2 9 2 1

Suppose you have an external data file like the one shown above. The file, which is reference by the fileref Credit, contains the results of a survey on the use of credit cards by males and females in the 18-39 age range.

You need to read the data values for

  • Gender
  • Age
  • Number of bank credit cards
  • Bank card use per month
  • Number of department store credit cards
  • Department store card use per month

List input maybe be the easiest input style to use because, as shown in the INPUT statement below, you simply list the variable names in the same order as the corresponding raw data fields. Remember to distinguish character variables from numeric variables.

input Gender $ Age Bankcard FreqBank Deptcard FreqDept;

Because list input, by default, does not specify column locations:

  • All fields must be separated by at least one blank or other delimiter
  • Fields must be read in order from left to right
  • You cannot skip or re-read fields

    1. Processing List Input
      1. List input is processed where SAS reads each value until it encounters a blank or other delimiter
    2. Working with Delimiters
      1. Use the DLM= option in the INFILE statement to specify a delimiter other than a blank
      2. General Form:
        1. DLM=delimiter(s)
          1. Where delimiters specifies a delimiter for list input in either of the following forms:

‘list-of-delimiting-characters’ specifies one or more characters (up to 200) to read as delimiters. The list of characters must be enclosed in quotation marks.

character-variable specifies a character variable whose value becomes the delimiter

      1. The field delimiter must not be a character that occurs in a data value

data perm.survey;

infile credit dlm=’,’;

input Gender $ Age Bankcard FreqBank Deptcard FreqDept;

run;

    1. Reading a Range of Variables
      1. When the variable values in the raw data file are sequential and are separated by a blank (or by another delimiter), you can specify a range of variables in the INPUT statement
      2. The following example creates fie new numeric variables and assigns them the names Ques1, Ques2, Ques3, Ques4 and Ques5:

input IDnum $ Ques1-Ques5;

      1. You can also specify a range in the VAR statement in the PROC PRINT step to list a range of variables

var ques1-ques3;

      1. If you are specifying a range of character variables, both the variable list and the $ sign must be enclosed in parentheses

input IDnum $ (Ques1-Ques5) ($);

      1. You can also specify a range of variables using formatted input. If you specify a range of variables using formatted input, both the variable list and the format must be enclosed in parenthesis, regardless of the variable’s type

input IDnum $ (Ques1-Ques5) (6.);

    1. Limitations of List Input
      1. In its default form, list input places several restrictions on the types of data that can be read (note: there are ways to work around these limitations using modified list input)

        1. Although the width of a field can be greater than eight columns, both character and numeric variables have a default length of 8. Character values that are longer than eight characters will be truncated.
        2. Data must be in standard numeric or character format
        3. Character values cannot contain embedded delimiters
        4. Missing numeric and character values must be represented by a period or some other character
  1. Reading Missing Values
    1. Reading Missing Values at the End of a Record
      1. Use MISSOVER option in the INFILE statement

        1. The MISSOVER option prevents SAS from going to another record if, when using list input, it does not find values in the current line for all the INPUT statement variables
        2. At the end of the current record, values that are expected but not found are set to missing

data perm.survey;

infile credit missover;

input Gender $ Age Bankcard FreqBank Deptcard FreqDept;

run;

    1. Reading Missing Values at the Beginning or Middle of a Record
      1. When specifying delimiters, it is important to remember that any number of delimiters in a row are all interpreted as 1 delimiter

Practice Example:

When the program below executes, each field in the raw data file is read one by one. The INPUT statement tells SAS to read six data values from each record. However, the first record contains only five values.

data permsurvey;

infile credit dlm=’,’;

input Gender $ Age Bankcard FreqBank Deptcard FreqDept;

run;

The two commas in the first record are interpreted as one delimiter. The incorrect value (1) is read for Age. The program continues to read subsequent incorrect values for Bankcard (8), FreqBank(0), and Deptcard (0). The program then attempts to read the character field FEMALE, at the beginning of the second record, as the value for the numeric variable FreqDept. This causes the value of FreqDept in the first observation to be interpreted as missing. The input pointer then moves down to the third record to begin reading values for the second observation. Therefore, the first observation in the data set contains incorrect values and values from the second record in the raw data file are not included.

      1. Use the DSD option in the INFILE statement to correctly read raw data with missing observations at the beginning or middle of the file
      2. The DSD option:

        1. sets the default delimiter to a comma
        2. treats two consecutive delimiters as a missing value
        3. removes quotation marks from values
      3. If the data uses multiple delimiters or a single delimiter other than a comma, simply specify the delimiter value(s) with the DLM= option.

data permsurvey;

infile credit dsd dlm=’*‘;

input Gender $ Age Bankcard FreqBank Deptcard FreqDept;

run;

      1. The DSD option can be used to read raw data when there is a missing value at the beginning of a record as long as a delimiter precedes the first value in the record
  1. Specifying the Length of Character Values
    1. The LENGTH Statement
      1. Variable attributes are defined when the variable is first encountered in the DATA step

        1. Even if another statement specifies a variable as numeric (when it is introduced as character), it will still be character (but might error)

data perm.growth;

infile citydate;

length City $ 12;

input city $ Pop70 Pop80; Doesn’t need to be specified again, but won’t error

run;

  1. Modifying List Input
    1. The Two Modifiers
      1. The ampersand (&) modifier is used to read character values that contain embedded blanks
      2. The colon (:) modifier is used to read nonstandard data values and character values that are longer than eight characters, but which contain no embedded blanks

1—+—-10—+—-20—+–

1 NEW YORK 7,262,700

2 LOS ANGELES 3,259,340

3 CHICAGO 3,009,530

4 HOUSTAN 1,728,910

    1. Reading Values That Contain Embedded Blanks
      1. The ampersand (&) enables you to read character values that contain single embedded blanks

        1. The value is read until two or more consecutive blanks are encounter
        2. The & modifier precedes a specified informat if one is used
        3. If an informat is used, the w will replace the variable’s default length (default length = 8)

input Rank City & $11.; $11. overrides the default

or

length City $ 11; A length statement can be used

input Rank city &;

      1. Note: You must use two consecutive blanks as delimiters when you use the & modifier. You cannot use any other delimiter to indicate the end of each field
    1. Reading Nonstandard Values
      1. The colon (:) modifier enables you to read nonstandard data values and character values that are longer than eight characters, but which contain no embedded blanks

        1. If an informat is used, the w will replace the variable’s default length (default length = 8)
        2. List input reads each value until the next blank (or delimiter specified) is detected. The default length of numeric variables is 8.

          1. By default, if a numeric value is >8 digits it will be truncated to 8. Using a length statement or an informat with the list input modifiers will override this default of 8.

1—+—-10—+—-20—+–

1 NEW YORK 7,262,700

2 LOS ANGELES 3,259,340

3 CHICAGO 3,009,530

4 HOUSTAN 1,728,910

data perm.cityrank;

infile topten;

input Rank City & $11. Pop86 : comma.;

Note: COMMAw.d informat does not specify a w value because the default is 8. If, after all commas and special characters are removed, the value is less than or equal to 8 digits then the default does not need to be overridden. In this example, the longest value is “7,262,700”. After COMMA removes the special characters, it is stored as “7262700” and is only 7 digits in length.

This is different from using a numeric informat with formatted input. In that case, you must specify a w value in order to indicate the number of columns to be read.

    1. Comparing Formatted Input and Modified List Input
      1. Formatted Input
        1. Informats determine both the length of character variables and the number of columns that are read
      2. List Input
        1. Informats determine only the length of the variable, not the number of columns that are read (values are read until a delimiter or two consecutive blanks are encountered)
  1. Creating Free-Format Data
    1. The PUT statement can be used with list output to create free-format raw data files
    2. General Form of PUT
      1. PUT variable <: format>;
        1. Where
          1. Variable specifies the variable whose value you want to write
          2. : precedes a format
          3. format specifies a format to use for writing the data values

data _null_;

set perm.finance;

file ‘c:\data\findat’;

put ssn name salary date : date9.;

run;

    1. Specifying a Delimiter
      1. You can use the DLM= option with a FILE statement to create a character-delimited raw data file

data _null_;

set perm.finance;

file ‘c:\data\findat’ dlm=’,’;

put ssn name salary date : date9.;

run;

    1. Note: For creating a simple raw data file, an alternative to the DATA step is the EXPORT procedure
    2. General Form of EXPORT
      1. PROC EXPORT DATA=SAS-dataset;

OUTFILE=filename <DELIMITER=’delimiter’>;

RUN;

        1. Where
          1. SAS-dataset names the input SAS data set
          2. filename specifies the complete path and filename of the output
          3. delimiter specifies the delimiter to separate columns of data in the output file
    1. Using the DSD Option
      1. Use the DSD Option to enclose values that contain commas in double quotation marks “” if a comma delimiter is desired

WRONG:

data _null_;

set perm.finance;

file ‘c:\data\findat2’ dlm=’,’;

put ssn name salary : comma6. date : date9.;

run;

The example above will recognize the commas in the salary values as delimiters and separate them into different columns

RIGHT:

data _null_;

set perm.finance;

file ‘c:\data\findat2’ dsd; specifies comma as delimiters by default

put ssn name salary : comma6. date : date9.;

run;

    1. Reading Values That Contain Delimiters Within a Quoted String
      1. Use the DSD option in an INFILE statement to read values that contain delimiters within a quoted string
      2. The DSD option enables the INFILE statement to treat commas within the values as valid characters and removes the quotation marks from the character strings before storing the value
  1. Mixing Input Styles
    1. With some file layouts, you might need to mix input styles in the same INPUT statement in order to read the data correctly

1—+—-10—+—-20—+—-30—+—-40-

209-20-3721 07JAN78 41,983 SALES 2896

321-18-3261 02MAY86 27,332 EDUCATION 2344

456-26-9987 01JUN82 15,453 MARKETING 2674

777-31-7643 17DEC79 23,567 RESEARCH 2956

    1. Column input is an appropriate choice for the first field because the values can be read as standard character values and are located in fixed columns
    2. The next two fields are also located in fixed columns, but the values require an informat. So, formatted input is a good choice here.
    3. Values in the fourth field begin in column 28 but do not end in the same column. List input is appropriate here, but notice that some values are longer than eight characters. You need to use the : format modifier with an informat to read these values
    4. The last field does not always begin or end in the same column, so list input is the best input style for those values

data perm.mixed;

infile rawdata;

input ssn $ 1-11 @13 HireDate date7.

@21 Salary comma6. Department : $9. Phone;

run;

  1. Additional Features
    1. Writing Character Strings and Variable Values
      1. You can use a PUT statement to write both character strings and variable values to a raw data file.
      2. To write out a character string, simply add a character string, enclosed in quotation marks, to the PUT statement

        1. It is a good idea to include a blank space as the last character in the string to avoid spacing problems

filename totaldat ‘c:\records\junsales’;

data _null_;

set work.totals;

file totaldat;

put ‘Sales for salesrep ’ salesrep;

‘totaled ‘ sales : dollar9.;

run;

1—+—-10—+—-20—+—-30—+—-40—+

Sales for salesrep Friedman totaled $14,893

Sales for salesrep Keane totaled $14,324

Sales for salesrep Schuster totaled $13,914

Note: The blue values represent the variables salesrep and sales in the programLesson Twenty-Nine – Reading Date and Time Values

  1. Things to Remember When Using SAS Date and Time Values
    1. SAS date values are based on the Gregorian calendar, and they are valid for dates from A.D. 1582 through A.D. 20,000.

      1. Most of Europe started to use the Gregorian calendar in 1582
      2. Great Britain and the American colonies adopted it in 1752
      3. Check the adoption date for other parts of the world before making important calculations
    2. SAS makes adjustments for leap years but ignores leap seconds
    3. SAS does not make adjustments for daylight savings time
  2. Reading Dates and Times with Informats
    1. Common date and time informats

      1. DATEw.
      2. DATETIMEw.
      3. MMDDYYw.
      4. TIMEw.

SAS Date Informat

Date Expression Example

MMDDYYw.

10/15/99

DATEw.

15Oct99

MMDDYYw.

10-15-99

YYMMDDw.

99/10/15

    1. Specifying Informats
      1. Using the INPUT statement with the informat after the variable is the easiest way to read date and time values
      2. General Form
        1. INPUT <pointer-control> variable informat;

input @15 Style $3. @21 Price 5.2;

This example reads the variable “Style” with the character informat ($3.) starting at column 15 and going through column 17. It then starts at column 21 and goes 5 columns to read the numeric variable “Price” that contains 2 decimals.

    1. MMDDYYw. Informat
      1. Reads values in the form of: 10/15/99
      2. Can also read values that have a 4-digit year
      3. The month, day and year fields can be separated by blanks or delimiters such as – or /.

        1. Remember to include delimiters when specifying w
    2. DATEw. Informat
      1. Reads date values in the form of: 30MAY2000
      2. Can be in the format of “ddmmmyy” or “ddmmmyyyy”
      3. Can use delimiters, as long as they are included in the w
    3. TIMEw. Informat
      1. Reads time values in the form hh:mm:ss:ss
      2. ss:ss is an optional field (seconds and hundredths of seconds)

        1. if there is no value for ss:ss it will be assumed the value is zero
      3. Five is the minimum acceptable field width for the TIMEw. informat
    4. DATETIMEw. Informat
      1. Reads expressions that are composed of two parts, a date value and a time value in the form: ddmmmyy hh:mm:ss:ss
      2. The date value and the time value are separated by a blank or other delimiter
      3. Can have 2- (yy) or 4-digit (yyyy) year
    5. YEARCUTOFF= SAS System Option
      1. Affects only two-digit year values
      2. The default is 1920 (100 year span)

        1. 0-19 will return a date in the 2000’s by default
        2. 21+ will return a date in the 1900’s by default
    6. When You Work With Date and Time Values:
      1. check the value of YEARCUTOFF= and change it if necessary
      2. specify the proper informat for reading a date value
      3. specify the correct field width so that the entire date value is read
  1. Using Dates and Times in Calculations

Practice Example:

1—+—-10—+—-20—+—-30—+—-40

Akron 04/05/99 04/09/99 175.00 298.45

Brown 04/12/99 05/01/99 125.00 326.78

Jamison 04/27/99 04/29/99 125.00 174.24

Denison 04/11/99 04/12/99 175.00 87.41

Suppose you work in the billing department of a small community hospital. It’s your job to create a SAS data set from the raw data file that is reference by the fileref Aprdata. A portion of the raw data file above shows data values that represent each patients:

  • Last name
  • Date checked in
  • Date checked out
  • Daily room rate
  • Equipment cost

The data set that you create must also include variable values that represent how many days each person stayed in the hospital, the total room charges, and the total of all expenses that each patient incurred. When building the SAS program, you must first name the data set, identify the raw data file Aprdata, and use formatted input to read the data.

options yearcutoff-1920;

data perm.aprbills;

infile aprdata;

input LastName $8. +1 DateIn mmddyy8. +1 DateOut mmddyy8.

RoomRate 6. @35 EquipCost 6.;

Days=dateout-datein+1; hospitals charge for first and last day, +1

RoomCharge=days*roomrate;

Total=RoomCharge +Equipcost;

run;

  1. Using Date and Time Formats
    1. WEEKDATEw. Format
      1. Writes date values in the form of day-of-week, month-name dd, yy (or yyyy)

proc print data=perm.aprbills;

format datein dateout weekdate17.;

run;

      1. The results vary depending on the w value in the format

FORMAT Statement

Result

format datein weekdate3.

Mon

format datein weekdate6.

Monday

format datein weekdate17.

Monday, Apr 5, 99

format datein weekdate21.

Monday, April 5, 1999

    1. WORDDATEw. Format
      1. Writes date values in the form of month-name dd, yyyy

proc print data=perm.aprbills;

format datein dateout worddate12.;

run;

FORMAT Statement

Result

format datein worddate3.

Apr

format datein worddate5.

April

format datein worddate14.

April 15, 1999

options yearcutoff-1920;

data perm.aprbills;

infile aprdata;

input LastName $8. +1 DateIn mmddyy8. +1 DateOut mmddyy8.

RoomRate 6. @35 EquipCost 6.;

Days=dateout-datein+1;

RoomCharge=days*roomrate;

Total=RoomCharge +Equipcost;

format datein dateout worddate12.;

run;

Lesson Thirty – Creating a Single Observation from Multiple Records

  1. Using Line Pointer Controls
    1. Types of Pointer Controls

      1. Column Specifications
        1. Positions input pointer to read specific columns
        2. input Name $ 1-12
      2. Column Pointer Controls
        1. Positions input pointer on specific column
        2. input Name $12. @15 Age 2.
      3. Line Pointer Controls
        1. Positions input pointer on specific record
        2. input #2 Name $ 1-12
    2. Two Types of Line Pointer Controls
      1. forward slash (/) specifies a line location that is relative to the current one
      2. #n specifies the absolute number of the line to which you want to move the pointer
  2. Reading Multiple Records Sequentially
    1. The Forward Slash (/) Line Pointer Control
      1. Skips to the next line in the raw data file

1—+—-10—+—-20

Akron

04/05/99 04/09/99

175.00 298.45

Brown

04/12/99 05/01/99

125.00 326.78

Jamison

04/27/99 04/29/99

125.00 174.24

Denison

04/11/99 04/12/99

175.00 87.41

input LastName $ 1-7. / DateIn mmddyy8. DateOut mmddyy8. / Room 6. Equip 6.;

    1. Number of Records Per Observation
      1. The file must contain the same number of records for each observation

        1. In the previous example, if one of the DateIn values is missing, the program would skip to the next line searching for a value for “datein” and error because RoomRate is numeric and not compatible with the date informat used
  1. Reading Multiple Records Non-Sequentially
    1. The #n Line Pointer Control
      1. The #n specifies the absolute number of the line to which you want to move the input pointer

        1. it must be specified before the instructions for reading values in a specific record

1—+—-10—+—-20

Akron

04/05/99 04/09/99

175.00 298.45

Brown

04/12/99 05/01/99

125.00 326.78

Jamison

04/27/99 04/29/99

125.00 174.24

Denison

04/11/99 04/12/99

175.00 87.41

input #2 DateIn mmddyy8. DateOut mmddyy8. #1 LastName $ 1-7 #3 Room 6. Equip 6.;

Practice Example :

1—+—-10—+—-20—

ALEX BEDWAN

609 WILTON MEADOW DRIVE

GARNER NC 27529

XMO34 FLOYD

ALISON BEYER

8521 HOLLY SPRINGS ROAD

APEX NC 27502

XF124 LAWSON

data perm.patients;

infile patdata;

input #4 ID $5.

#1 Fname $ Lname $

#2 Address $23.

#3 City $ State $ Zip $

#4 @7 Doctor $6.;

run;

  1. Combining Line Pointer Controls
    1. The forward slash (/) and #n pointer controls can be combined together to read data both sequentially and non-sequentially

      1. The practice example above can use #n pointer controls as seen below:

data perm.patients;

infile patdata;

input #4 ID $5.

#1 Fname $ Lname $ /

Address $23. /

City $ State $ Zip $ /

@7 Doctor $6.;

run;

Lesson Thirty-One – Creating Multiple Observations from a Single Record

  1. Reading Repeating Blocks of Data
    1. Holding the Current Record with a Line-Holder Specifier
      1. SAS provides two line-hold specifiers

        1. The trailing at sign (@) holds the input record for the execution of the next INPUT statement
        2. The double trailing at sign (@@) holds the imputer record for the execution of the next INPUT statement, even across iterations of the DATA step
      2. The term trailing indicates that the @ or @@ must be the last item specified in the INPUT statement

input Name $20. @; or input Name $20. @@;

    1. Using the Double Trailing At Sign (@@) to Hold the Current Record
      1. Normally each time a DATA step executes, the INPUT statement reads a new record. But when you use the trailing @@, the INPUT statement holds the current record and reads the next value.

1—+—-10—+

102 92 78 103

84 23 36 75

Input Statement

N

Score

input Score;

2

84

input Score @@;

2

92

      1. The double trailing at sign (@@):

        1. works like the trailing @ except it also holds the data line in the input buffer across multiple executions of the DATA step
        2. typically is used to read multiple SAS observations from a single line of data
        3. should not be used with the @ pointer control, with column input, nor with the MISSOVER option
      2. A record that is being held by the double trailing at (@@) signis not released until either of the following events occurs:

        1. the input pointer moves past the end of the record

          1. then the input pointer moves down to the next record

1—+—-10–V+

102 92 78 103

84 23 36 75

        1. an INPUT statement that has no line-hold specifier executes

input ID $ @@;

input Department 5.;

Practice Example :

The following example reads two variables over and over throughout the record:

1—+—-1V—+—-20—+—-30–

01APR90 68 02APR90 67 03APR90 70

01APR90 68 02APR90 67 03APR90 70

01APR90 68 02APR90 67 03APR90 70

01APR90 68 02APR90 67 03APR90 70

data perm.april90;

infile tempdata;

input Date : date. HighTemp @@;

    1. Completing the DATA Step
      1. You can add a FORMAT statement to the DATA step to display date or time values with a specified format in the data set.

data perm.april90;

infile tempdata;

input Date : date. HighTemp @@;

format date date9.;

run;

Practice Example :

You want to pair each employee ID number with one quarterly sales total to produce a single observation. That way, four observations can be derived from one record.

1—+—-10—+—-20—+—-30—+—-40

0734 1,323.34 2,472.85 3,276.65 5,345.52

0734 1,323.34 2,472.85 3,276.65 5,345.52

0734 1,323.34 2,472.85 3,276.65 5,345.52

0734 1,323.34 2,472.85 3,276.65 5,345.52

This means the DATA step must:

  • Read the value for ID and hold the current record
  • Create a new variable named Quarter to identify the fiscal quarter for each sales figure
  • Read a new value for Sales and write the values to the data set as an observation
  • Continue reading a new value for Sales and writing values to the data set three more times

    1. Using the Single Trailing At Sign (@) To Hold the Current Record
      1. The single trailing (@):

        1. enables the next INPUT statement to be read from the same record
        2. releases the current record when a subsequent INPUT statement executes without a line-hold specifier
      2. Differences between @ and @@:
        1. the double trailing at sign (@@) holds a record across multiple iterations of the DATA step until the end of the record is reached
        2. the single trailing at sign (@) releases a record when control returns to the top of the DATA step

data perm.sales97;

infile data97;

input ID $ @; used to hold the current record for the next input statement

do Quarter=1 to 4; writes variable Quarter and loops 1 – 4

input Sales : comma. @; reads sales and then puts in a placeholder

output; outputs each observation (with same ID used before loop)

end;

run;

  1. Reading a Varying Number of Repeating Fields
    1. Use the MISSOVER Option
      1. Use the MISSOVER option in the INFILE statement to prevent SAS from reading the next record when missing values are encountered at the end of a record

        1. Can be used on records that have a varying number of repeating fields
    2. Other Related Options
      1. TRUNCOVER Option
        1. Reads the column or formatted input when the last variable that is read by the INPUT statement contains varying-length data
        2. The TRUNCOVER option assigns the contents of the input buffer to a variable when the field is shorter than expected
      2. FLOWOVER (the default)
      3. STOPOVER
      4. SCANOVER
    3. Executing SAS Statements While a Condition is TRUE
      1. Use a DO WHILE statement instead of the iterative DO statement, enclosing the expression in parenthesis (combine this with MISSOVER)

data perm.sales97;

infile data97 missover;

input ID $ @;

Quarter=0;

input Sales : comma. @;

do while (sales ne .); executes the loop only while sales does not equal missing

Quarter+1; adds “1” to Quarter

output; outputs each looped observation

input Sales : comma. @; reads next ob for “do while” to evaluate

end;

run;

Lesson Thirty-Two – Reading Hierarchical Files

  1. Creating One Observation per Detail Record
    1. In order to create one observation per detail record, it is necessary to distinguish between header and detail records

      1. It is easier to have a field that does this for you (H for header and D for detail, for instance)

1—+—-10—+—-

H 321 S. Main ST

D MARY E 21 F

D WILLIAM M 23 M

D SUSAN K 3 F

    1. Retaining the Values of Variables
      1. As you write the DATA step, you want to keep the header record as a part of each observation until the next header is encountered

        1. use a RETAIN statement

1—+—-10—+—-

H 321 S. Main ST

D MARY E 21 F

D WILLIAM M 23 M

D SUSAN K 3 F

data perm.people;

infile census;

retain Address;

      1. Next, you need to read the first field in each record, which identifies the record’s type

        1. Use the @ line-hold specifier to hold the current record so that the other values in the record can be read

1—+—-10—+—-

H 321 S. Main ST

D MARY E 21 F

D WILLIAM M 23 M

D SUSAN K 3 F

data perm.people;

infile census;

retain Address;

input type $1. @;

    1. Conditionally Executing SAS Statements
      1. Use “type” to identify each record

        1. If type=H then execute an INPUT statement to read the values for Address
        2. If type=D then execute an INPUT statement to read the values for Name, Age and Gender
        3. Drop type since it is not needed anymore

data perm.people (drop=type);

infile census;

retain Address;

input type $1. @;

if type=’H’ then

input @3 address $15.;

if type=’D’ then

input @3 Name $10. @13 Age 3. @16 Gender $1.;

  1. Creating One Observation per Header Record
    1. Want to make one summary variable Total to record the total number of people living at each address (header record)
    2. This example starts normal but the if/then statements change into if/then/do
    3. You do not want the first header record to be written as an observation until the related detail records have been read and summarized

      1. _n_ is an automatic variable whose value is the number of times the DATA step has begun to execute
      2. The expression _n_>1 defines a condition where the DATA step has executed more than once

        1. Use this to find out if the current record is the header
        2. The DATA step has executed more than once
      3. Set the Total variable to 0 for this observation (which will be added to by the detail records)
      4. Put the input information for reading Address


1—+—-10—+—-

H 321 S. Main ST

D JON D 21 F

D ALICE B 3 F

H 324 S. Main ST

D MARY E 21 F

D WILLIAM M 23 M

D SUSAN K 3 F

D MARYANN A 20 F

D JOHN S 54 M

H 325A S. Main ST

D JAMES L 34 M

D LIZA A 31 F

D MARGO K 27 F

data perm.people (drop=type);

infile census;

retain Address;

input type $1. @;

if type=’H’ then do;

if _n_>1 then output;

Total=0;

input address $3-17;

end;

      1. Put in an else if/then statement for type=’D’
      2. Since Total has already been initialized at 0 each time a header record is read, Total can be incremented by 1 each time the type=’D’ using a Sum statement
      3. Things to Remember
        1. A sum statement enables you to add any valid SAS expression to an accumulator variable. You may add two variables together if required
        2. The value generated by a sum statement is automatically retained throughout the DATA step, which is why it is very important to set the value of Total to 0 each time a header record is read

data perm.people (drop=type);

infile census;

retain Address;

input type $1. @;

if type=’H’ then do;

if _n_>1 then output;

Total=0;

input address $3-17;

end;

else if type=’D’ then total+1;

      1. The program currently only writes an observation to the data set only when another header record is read and the DATA step is executed more than once

        1. After the last detail record is read, there are no more header records to cause the last observation to be written to the data set
        2. Use the END= option in the INFILE statement to specify the end of the file

          1. END= specifies a temporary numeric variable whose value is 0 until the last line is read and 1 after the last line is read
          2. it is not necessary to specify =1 since the default is 1

data perm.people (drop=type);

infile census end=last;

retain Address;

input type $1. @;

if type=’H’ then do;

if _n_>1 then output;

Total=0;

input address $3-17;

end;

else if type=’D’ then total+1;

if last then output;

run;Lesson Thirty-Three – Reading Variable-Length Records

  1. Reading Characters That Have Varying Lengths
    1. The LENGTH= Option
      1. Used to define a numeric variable whose value is set to the length of the current record
      2. Used in the INFILE statement
      3. INFILE file-specification LENGTH=variable;

1—+—-10—+—-

1802JOHNSON2123

1803BARKER2142

1804EDMUNDSON2325

1805RIVERS2543


data perm.phones;

infile phonedat LENGTH=reclen; length of each record is assigned to the variable “reclen”

    1. Holding the First Record


data perm.phones;

infile phonedat LENGTH=reclen;

input ID 4. @;

    1. Storing the Current Length of a Variable-Length Field
      1. A variable needs to be made that contains the length of the Name
      2. ID and PhoneExt each take up 4, so take reclen and subtract 8 to get the value for NameLen


data perm.phones;

infile phonedat LENGTH=reclen;

input ID 4. @;

namelen=reclen-8;

    1. Reading Variable-Length Values
      1. $VARYINGw. informat is a special SAS informat that enables you to read a character value whose length differs from record to record
    2. INPUT variable $VARYINGw. length-variable;
      1. w specifies the maximum length of the variable
      2. length-variable specifies a numeric variable that contains the actual length of the variable in the current record

data perm.phones;

infile phonedat LENGTH=reclen;

input ID 4. @;

namelen=reclen-8;

input Name $varying10. namelen PhoneExt;

    1. Reading Characters That Have Varying Lengths
      1. A variable that is defined by LENGTH= option is not stored as part of the observation

        1. No DROP option is required
        2. The minimum and maximum values of the variable are written to the SAS log
  1. Reading Records That Have a Varying Number of Fields
    1. You can read the following example below as one observation for each block of data (even though there are varying lengths of repeating information)

1—+—-10—+—-20—+—-30—+—-40—+—-50

1234 13MAR89 120/70

1443 12FEB89 120/80 11JAN90 120/80 13MAR89 120/70

1681 11JAN90 120/80 19NOV88 130/70

2034 19NOV88 130/70 13MAR89 120/70 12FEB89 120/80

    1. LENGTH= Option and first Input statement

data perm.phones;

infile phonedat LENGTH=reclen;

input ID 4. @;

    1. Create a loop to read in the date and blood pressure

data perm.phones;

infile phonedat LENGTH=reclen;

input ID 4. @;

do index=6 to reclen by 15;

input Date : date. BP $ @;

output;

end;

run;

    1. At the end of the first loop, the index variable is increased to 21

      1. if this is larger than the record’s “reclen” value then the loop will not execute again
      2. The iterative DO statement defines the index variable Index, whose value ranges from 6 to reclent, incremented by 15 (the length of the values for date and bp, plus the blank that delimits each block)

Leave a Reply