Statistical Analysis Software: October 2013

Thursday, October 31, 2013

MULTIDIMENSIONAL ARRAY

What are Multidimensional Arrays?

Definition: A multidimensional array is defined by specifying the number of elements in each dimension, separated by commas. A two-dimensional array uses the {rows, columns} format:
array book{100, 5} $ p1-p500;
This array represents 100 pages, each containing 5 paragraphs.
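SAS fills a multidimensional array in row-major order: the rightmost index varies fastest, so p1-p5 form page 1, p6-p10 form page 2, and so on. The minimal sketch below checks this for page 3, paragraph 2. The helper variable word and the value 'target' are only for illustration.

```sas
data _null_;
   /* 100 pages x 5 paragraphs; the rightmost index varies fastest */
   array book{100, 5} $ p1-p500;
   /* Element {3, 2} is number (3-1)*5 + 2 = 12, i.e. variable p12 */
   p12='target';
   word=book{3, 2};   /* hypothetical helper variable */
   put word=;
run;
```

The log should show word=target, confirming that book{3, 2} and p12 are the same variable.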

How do I access elements in a multidimensional array?

Specify one index per dimension, separated by commas: book{3, 2}, for instance, refers to paragraph 2 of page 3. Nested iterative DO loops, one per dimension, are the usual way to visit every element.

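data publish.collection(drop=i j k);
   /* 3-D array: 100 pages x 50 lines x 25 words, each word up to 25 characters */
   array book{100, 50, 25} $ 25 p1-p125000 ('This' 'is' 'the' 'story' 'about' 'how' 'SAS' 'was' 'born' 'and' 'how' 'it' 'has' 'grown' 'into' 'a' 'stable' 'statistical' 'analysis' 'software' 'that' 'is' 'robust' 'and' 'scalable' 'in' 'nature');
   /* 2-D array: 5000 lines (100 pages x 50 lines) x 25 words, stored in new variables */
   array page{5000, 25} $ 25 w1-w125000;
   do k=1 to 100;
      do i=1 to 50;
         do j=1 to 25;
            /* line i of page k becomes row i + 50*(k-1) of PAGE */
            page{i+50*(k-1), j}=book{k, i, j};
         end;
      end;
   end;
run;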

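The snippet above reads a three-dimensional array (100 pages x 50 lines x 25 words) and copies it, element by element, into a two-dimensional array (5000 lines x 25 words), stacking the 50 lines of each page one after another.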
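The loop limits 100, 50 and 25 can also be taken from the array itself. A small sketch with a hypothetical 2 x 3 x 4 array (variables b1-b24) using the DIMn functions; DIM(array, n) is an equivalent form:

```sas
data _null_;
   /* Hypothetical array: 2 pages x 3 lines x 4 words = 24 elements */
   array book{2, 3, 4} $ 8 b1-b24;
   pages=dim1(book);     /* size of the 1st dimension */
   lines=dim2(book);     /* size of the 2nd dimension */
   words=dim(book, 3);   /* same as dim3(book) */
   put pages= lines= words=;
run;
```

The log should show pages=2 lines=3 words=4. Writing do k=1 to dim1(book); in nested loops lets them adapt if the array is resized.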

Wednesday, October 30, 2013

ARRAY

What is an Array?

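Definition: An array is a temporary grouping of variables under a single name that exists only for the duration of the DATA step; the array name itself is never written to the output data set. Its dimension (number of elements) is specified by a number in curly braces. Arrays get their power from the ability to reference elements by an index value.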

How do I assign initial values to array elements?

Initial values can be assigned by listing them in parentheses after the element names.
array firstq{3} m1 m2 m3 (80 85 82);
Here m1 is set to 80, m2 to 85 and m3 to 82: values are matched to elements by position.
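Writing the elements to the log is a quick way to confirm the positional matching; a minimal sketch:

```sas
data _null_;
   array firstq{3} m1 m2 m3 (80 85 82);
   /* Values are assigned left to right: m1=80, m2=85, m3=82 */
   put m1= m2= m3=;
run;
```

Variables that receive initial values in an ARRAY statement are also retained automatically across DATA step iterations.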

What are the types of arrays?

There are two types: numeric and character.
All elements of an array must be of the same type, either numeric or character.
Numeric:
array combination{3} color1-color3 (255 100 202);
Character:
array combination{3} $ 16 color1 color2 color3 ('red' 'green' 'blue');

What does the number 16 signify?
It sets the length of each character array element to 16, overriding the default length of 8.
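To see the effect of the length specification, the VLENGTH function (declared length) can be compared with the LENGTH function (length of the value without trailing blanks); a minimal sketch:

```sas
data _null_;
   array combination{3} $ 16 color1 color2 color3 ('red' 'green' 'blue');
   declared=vlength(color1);   /* length set in the ARRAY statement: 16 */
   used=length(color1);        /* length of 'red' without trailing blanks: 3 */
   put declared= used=;
run;
```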

How do I name array elements?

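Array elements can be named with a variable list, and the subscripts can be given as a range with a lower and an upper bound.
Example of a variable list for an academic calendar:
array acadcal{12} m1-m12;
Example of a subscript range (lastq{10} refers to m10):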
array lastq{10:12} m10 m11 m12;
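With explicit bounds the subscripts run from 10 to 12, so DIM(lastq) returns 3 and would be the wrong upper limit for a loop. The LBOUND and HBOUND functions return the actual bounds. A minimal sketch, with made-up initial values for m10-m12:

```sas
data _null_;
   /* Made-up values for illustration */
   array lastq{10:12} m10 m11 m12 (70 75 90);
   total=0;
   /* Loop from the lower bound (10) to the upper bound (12) */
   do month=lbound(lastq) to hbound(lastq);
      total=total+lastq{month};
   end;
   put total=;
run;
```

The log should show total=235.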

Is there a function to determine the size of an array?

Of course! You can create an array without specifying how many elements it will contain by using an asterisk, {*}. SAS then counts the elements, and the DIM function returns that number (the dimension) for use later in the step.

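data work.college;
   set department.class;
   array classsize{*} c1-c100;    /* {*}: SAS counts the 100 elements */
   totalenroll=0;                 /* reset for each observation */
   do i=1 to dim(classsize);
      /* SUM ignores missing values, unlike the + operator */
      totalenroll=sum(totalenroll, classsize{i});
   end;
run;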

The snippet above calculates, for each observation, the total enrollment across all classes offered, where each array element stores the number of students in one class. Well, what happens to the variable i? Will it be written to the output data set? That's an interesting question, and it is answered at the end of this post***.
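As an alternative to the DO loop, SAS 9 can pass every element of an array to a function with the OF operator. A sketch assuming the same department.class data set with variables c1-c100:

```sas
data work.college;
   set department.class;
   array classsize{*} c1-c100;
   /* OF classsize{*} expands to c1, c2, ..., c100; SUM skips missing values */
   totalenroll=sum(of classsize{*});
run;
```

No index variable is created here, so there is nothing to drop.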

Can temporary array elements be created?

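Yes! Use the _TEMPORARY_ keyword. Temporary array elements exist only in memory while the DATA step runs, have no variable names (they are referenced only through the array name and a subscript), are automatically retained, and are not written to the output data set.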

array secondq{3} _temporary_;
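Temporary arrays suit lookup tables. A hedged sketch; the data set work.sales, its variables Quarter and Amount, the array name target and the threshold values are all hypothetical:

```sas
data work.salescheck;
   set work.sales;   /* hypothetical: Quarter (1, 2 or 3) and Amount */
   /* Hypothetical lookup values; they stay in memory and are not written out */
   array target{3} _temporary_ (1000 1200 1500);
   /* Guard the subscript to avoid an "array subscript out of range" error */
   if 1 <= Quarter <= 3 then MetTarget=(Amount >= target{Quarter});
run;
```

MetTarget is 1 when the amount reaches the quarter's target and 0 otherwise.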

How are arrays used?

Perform repetitive calculations
Create many variables that have the same attributes
Rotate data sets by changing variables to observations and vice versa
Compare variables
Perform table lookups
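As an illustration of rotating a data set, a minimal sketch that turns one observation per student into one observation per student and month; the data set work.scores and its variables Student and m1-m3 are hypothetical:

```sas
data work.scoreslong(keep=Student Month Score);
   set work.scores;   /* hypothetical: Student, m1-m3 */
   array firstq{3} m1-m3;
   do Month=1 to dim(firstq);
      Score=firstq{Month};
      output;         /* one observation per array element */
   end;
run;
```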

Tips:

Do not give an array the same name as a variable in the same DATA step.
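Initial values can be separated by commas as well as by blanks.
Avoid using the name of a SAS function as an array name; calls to that function in the same DATA step are then treated as array references.
Do not use array names in LABEL, FORMAT, DROP, KEEP or LENGTH statements.
An asterisk, {*}, lets SAS determine the dimension by counting the elements (not allowed for _TEMPORARY_ arrays).
Parentheses, braces or brackets can all be used to specify the dimension: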

array courses(3) art science commerce;
array courses{3} art science commerce;
array courses[3] art science commerce;

*** Yes, i is written to the output data set. It can be removed with the DROP= data set option in the DATA statement, for example data work.college(drop=i);

Read Raw Data File

How do I read (access) a .dat file?

A FILENAME statement creates a fileref that points to the full path of the raw data file. In the DATA step, an INFILE statement then names the raw data to read, much as a SET statement names a SAS data set to read. Finally, an INPUT statement describes the structure of each record: how many variables there are, their names, their type (character, marked with $, or numeric) and, with column input, the columns they occupy.

The fileref in the INFILE statement must match the one defined in the FILENAME statement.

Scenario: The raw data file enroll.dat contains each student's ID, Name and CourseID. CourseID takes one of two values: 1000 denotes an Associate degree and 2000 denotes a Bachelor's degree.

Sample Code:

filename enroll 'C:\sas\enroll.dat';
data work.enrollinfo;
   infile enroll;
   input ID $ 1-4 Name $ 6-25 CourseID $ 27-30;
   length Degree $ 9;   /* long enough for 'Associate' */

   if CourseID='1000' then Degree='Associate';
   else if CourseID='2000' then Degree='Bachelor';
run;
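To try the program without an external file, the same column layout can be read from in-stream data. A sketch with made-up records; the data lines must start in column 1 so that ID falls in columns 1-4, Name in 6-25 and CourseID in 27-30:

```sas
data work.enrollinfo;
   infile datalines;   /* in-stream data instead of the enroll fileref */
   input ID $ 1-4 Name $ 6-25 CourseID $ 27-30;
   length Degree $ 9;
   if CourseID='1000' then Degree='Associate';
   else if CourseID='2000' then Degree='Bachelor';
   /* Made-up records; spacing puts CourseID in columns 27-30 */
   datalines;
1001 Ann Lee              1000
1002 Raj Patel            2000
;
run;
```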

Constraints on variable names:

must be 1 to 32 characters in length
must begin with a letter (A-Z) or an underscore (_)
can continue with any combination of numerals, letters or underscores
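The NVALID function can check a candidate name against these rules ('V7' selects them). A minimal sketch with made-up names; Salary and _temp1 should be reported as valid (1), and 2ndQuarter and Course ID as not valid (0):

```sas
data _null_;
   length name $ 40;
   /* Made-up candidate names */
   do name='Salary', '_temp1', '2ndQuarter', 'Course ID';
      valid=nvalid(strip(name), 'V7');   /* 1 = valid, 0 = not valid */
      put name= valid=;
   end;
run;
```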

Tuesday, October 29, 2013

Program Data Vector

What is the Input Buffer?

Definition: An area of memory created to hold a record from an external file. It is created during the compilation phase, and only when raw data is read, not when a SAS data set is read.

What is the Program Data Vector?

Definition: A logical area of memory in which SAS builds a data set, one observation at a time. It is usually abbreviated PDV.

It is created during the compilation phase, after the input buffer (if any). It contains two automatic variables, _N_ and _ERROR_, which can be used in processing but are not written to the data set as part of an observation.

_N_ counts the number of times the DATA step has begun to execute.

_ERROR_ signals that an error caused by the data occurred during execution. The default value, 0, means no error; a value of 1 means one or more errors.

The compilation phase ends with the creation of the descriptor portion of the data set.
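The automatic variables can be watched with a PUT statement. A minimal sketch with made-up input, where the value abc is deliberately not numeric: the log should show _ERROR_=1 only for the second iteration, and work.pdvdemo should contain only the variable Score.

```sas
data work.pdvdemo;
   input Score;
   /* _N_ and _ERROR_ live in the PDV but are not written to work.pdvdemo */
   put _n_= _error_= Score=;
   datalines;
85
abc
90
;
run;
```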

Monday, October 28, 2013

DATA step

What is a DATA step?

Definition: A DATA step typically reads data, processes it, and creates a SAS data set. It begins with a DATA statement and usually ends with a RUN statement.

Applications: It is used to create new variables within a data set, as well as to create and copy data sets.

Scenario: Suppose your existing data set (employee) contains three variables: Employee, Department and Salary, and you want to make a copy of it (employeearchive) to keep as an archive. The following DATA step does this:

Sample Code:

data work.employeearchive ;
set work.employee;
run;
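Creating a new variable, as mentioned under Applications, only needs an assignment statement. A sketch that assumes Salary is numeric; the data set employeeraise, the variable NewSalary and the 5% raise are hypothetical:

```sas
data work.employeeraise;
   set work.employee;
   /* Hypothetical new variable: salary after an assumed 5% raise */
   NewSalary=Salary*1.05;
run;
```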
