Monday, November 4, 2013

LINE POINTER CONTROL

What is Line Pointer Control?
Definition: It is a technique by which SAS lets you control which record the INPUT statement reads next while reading raw data files. It lets you read multiple records, either sequentially or non-sequentially, and combine them into a single observation, and it lets you rearrange the order of variables in the resulting data set. It relies on the special characters / (forward slash) and #n, which are the sequential and non-sequential line pointer controls, respectively.

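What is the advantage of Forward Slash Line Pointer Control?

When you want to create one observation from multiple records without using multiple INPUT statements, the forward slash line pointer control comes in handy. Each / moves the pointer to the beginning of the next record. Consider the following raw data file, in which every three records describe one vegetable: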

ROMA TOMATO
FRESHPOINT CA
1.80 DAILY
GREEN BEANS
ORGANICS NC
1.20 WEEKLY

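Code to read the data file:

data produce.vegetables;
     infile vegdata;
     /* each / advances to the next record; :$10. keeps all 10 characters of Supplier (the list-input default is 8) */
     input VName $ 1-11 /
           Supplier :$10. State $ /
           Rate 4. Frequency $;
run;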

This will create the data set vegetables:

VName        Supplier    State  Rate  Frequency
ROMA TOMATO  FRESHPOINT  CA     1.8   DAILY
GREEN BEANS  ORGANICS    NC     1.2   WEEKLY

What is the advantage of #n Line Pointer Control?

When variables must be read non-sequentially, the #n line pointer control comes in very handy. Suppose that, in the above example, you were required to create the data set with the variables in this specific order:

  1. Supplier
  2. State
  3. Rate
  4. Frequency
  5. VName

Then the best way to proceed is to use #n, as it lets you jump to record n and move back and forth between the records that make up one observation.

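data produce.vegetables;
     infile vegdata;
     /* the order of variables in the INPUT statement sets their order in the data set */
     input #2 Supplier :$10. State $
           #3 Rate 4. Frequency $
           #1 VName $ 1-11;   /* VName is character and spans columns 1-11 */
run;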

Data set created:

Supplier    State  Rate  Frequency  VName
FRESHPOINT  CA     1.8   DAILY      ROMA TOMATO
ORGANICS    NC     1.2   WEEKLY     GREEN BEANS

It is possible to combine the / (forward slash) and #n line pointer controls in one INPUT statement for more complex data records.
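
As a minimal sketch of using both controls in one INPUT statement, the step below reads the same vegetable records from in-stream data. The data set name work.vegmixed and the use of DATALINES and the N= option are assumptions made so the example is self-contained; they are not part of the original program.

```sas
/* Sketch only: work.vegmixed and the in-stream data are illustrative assumptions */
data work.vegmixed;
     infile datalines n=3;            /* three records are available per observation */
     input #1 VName $ 1-11 /          /* read record 1, then / advances to record 2 */
              Supplier :$10. State $
           #3 Rate 4. Frequency $;    /* jump directly to record 3 */
     datalines;
ROMA TOMATO
FRESHPOINT CA
1.80 DAILY
GREEN BEANS
ORGANICS NC
1.20 WEEKLY
;
run;
```

Here / is convenient for stepping to the very next record, while #n states the target record explicitly.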


*** For questions and discussion, use the COMMENTS section or email sasavant9.4@gmail.com

Thursday, October 31, 2013

MULTIDIMENSIONAL ARRAY

What are Multidimensional Arrays?

Definition: A multidimensional array is defined by specifying the number of elements in each dimension, separated by commas. For a two-dimensional array the order is {rows, columns}.
         array book{100, 5} $ p1-p500;
This array represents 100 pages, each containing 5 paragraphs.


How do I access elements in a multidimensional array?

An element is referenced by giving one index per dimension. Generally, nested iterative DO loops, one loop per dimension, will do the job.

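data publish.collection(drop=i j k);
     /* elements beyond the initial-value list start out blank */
     array book{100, 50, 25} $ 25 p1-p125000
           ('This' 'is' 'story' 'about' 'how' 'SAS' 'was' 'born' 'and'
            'how' 'it' 'is' 'grown' 'into' 'a' 'stable' 'statistical'
            'analysis' 'software' 'that' 'is' 'robust' 'and' 'scalable'
            'in' 'nature');
     /* the target array gets its own variables; reusing p1-p125000 would make both arrays refer to the same elements */
     array page{5000, 25} $ 25 q1-q125000;
     do k=1 to 100;
        do i=1 to 50;
           do j=1 to 25;
              /* row i of block k becomes row i+50*(k-1) of page */
              page{i+(50*(k-1)),j}=book{k,i,j};
           end;
        end;
     end;
run;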

The snippet above reads a three-dimensional array and saves its contents in a two-dimensional array by mapping the first two indices (k, i) to a single row number.
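
The index mapping is easier to check on a small case. The following sketch, with made-up dimensions and the illustrative array names cube and flat, fills a 2 x 3 x 4 array with running numbers and copies it into a 6 x 4 array using the row formula i + 3*(k-1). Everything in it is an assumption for illustration, not part of the original program.

```sas
/* Sketch only: cube, flat, and the dimensions are illustrative */
data _null_;
     array cube{2, 3, 4} _temporary_;
     array flat{6, 4} _temporary_;
     n=0;
     do k=1 to 2;
        do i=1 to 3;
           do j=1 to 4;
              n+1;                            /* running counter 1, 2, 3, ... */
              cube{k,i,j}=n;
              flat{i+3*(k-1), j}=cube{k,i,j}; /* block k, row i goes to row i+3*(k-1) */
           end;
        end;
     end;
     first=flat{1,1};   /* copied from cube{1,1,1} */
     last=flat{6,4};    /* copied from cube{2,3,4} */
     put first= last=;
run;
```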

Wednesday, October 30, 2013

ARRAY

What is an Array?


Definition: It is a temporary grouping of variables that exists only for the duration of the DATA step. Its dimension is specified by a number in curly braces. Arrays get their power from the ability to reference elements by an index value.

How to assign initial values to array elements?

Initial values can be assigned by enclosing the values in parentheses after the element list.
        array firstq{3} m1 m2 m3 (80 85 82);
Here element m1 is set to 80, m2 to 85, and m3 to 82.

What are the types of arrays?

There are two types: numeric and character.
Once an array is created, it can contain only one type of data, either numeric or character.
Numeric:
      array combination{3} color1-color3 (255 100 202);
Character:
      array combination{3} $ 16 color1 color2 color3 ('red' 'green' 'blue');

What does the number 16 signify?
It sets the length of each character array element. It overrides the default length of 8 for character array elements and sets it to 16.

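How to name array elements?

Array elements can be specified as a numbered range list or named one by one. The index itself can also be given explicit lower and upper bounds.
Example of a numbered range list for an academic calendar:
        array acadcal{12} m1-m12;
Example of explicit bounds, where the index runs from 10 to 12:
        array lastq{10:12} m10 m11 m12;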
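
When an array has bounds such as {10:12}, the LBOUND and HBOUND functions return the lower and upper index, so a loop does not need to hard-code them. A minimal sketch, in which the initial values and the variables month and days are made up for illustration:

```sas
/* Sketch only: the initial values are illustrative */
data _null_;
     array lastq{10:12} m10 m11 m12 (31 30 31);
     do month=lbound(lastq) to hbound(lastq);   /* month runs from 10 to 12 */
        days=lastq{month};
        put month= days=;
     end;
run;
```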

Is there a function to determine the number of elements in an array?

Of course! It is possible to create an array without specifying how many elements it contains, by using * as the dimension, and to determine the number of elements later by means of the DIM function.

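   data work.college;
     set department.class;
     /* {*} lets SAS count the elements from the variable list */
     array classsize{*} c1-c100;
     totalenroll=0;   /* reset for every observation, so RETAIN is not needed */
     do i=1 to dim(classsize);
        totalenroll=totalenroll+classsize{i};
     end;
   run;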

The above snippet calculates, for each observation, the total enrollment across all classes offered, where each array element stores the number of students in one class. Well, what happens to the variable i? Will it be written to the output data set? That's an interesting question, and it is answered at the end of this post***.

Can temporary array elements be created?


Yes, with the _temporary_ keyword. These array elements are available during the DATA step but are not written to the output data set. They have no variable names, so they are referenced only by array name and index.
        array secondq{3} _temporary_;
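
A common use of a temporary array is to hold constants for comparison. Note that a temporary array is declared without variable names. In the sketch below, the array names target and actual, the variables m4-m6, and all values are assumptions chosen for illustration:

```sas
/* Sketch only: names and values are illustrative */
data _null_;
     array target{3} _temporary_ (80 85 82);   /* no variable names, not written to output */
     array actual{3} m4-m6 (78 90 82);
     do i=1 to dim(target);
        diff=actual{i}-target{i};
        put i= diff=;
     end;
run;
```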

How are arrays used?

  • Perform repetitive calculations
  • Create many variables that have the same attributes
  • Rotate data sets by changing variables to observations and vice versa
  • Compare variables
  • Perform table lookup
Tips:
  • Do not give an array the same name as a variable in the same DATA step.
  • Initial values can be separated by commas as well as blanks.
  • Avoid using the name of a SAS function as an array name.
  • Do not use array names in LABEL, FORMAT, DROP, KEEP, or LENGTH statements.
  • Use * as the dimension to let SAS count the elements.
  • Parentheses, braces, or brackets can all be used to specify the dimension:
           array courses(3) art science commerce;
           array courses{3} art science commerce;
           array courses[3] art science commerce;

*** YES, i is written to the output data set. It can be eliminated with the drop= data set option in the DATA statement.
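
As a sketch of that fix, the enrollment step can be written with drop=i on the output data set. It assumes, as the earlier snippet does, that department.class exists and contains c1-c100. The SUM function is used here so that a missing class size does not make the total missing, which is a small change from the original arithmetic.

```sas
/* Sketch only: assumes department.class with variables c1-c100 */
data work.college(drop=i);
     set department.class;
     array classsize{*} c1-c100;
     totalenroll=0;
     do i=1 to dim(classsize);
        totalenroll=sum(totalenroll, classsize{i});   /* SUM ignores missing values */
     end;
run;
```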

Read Raw Data File

How to read / access a .dat file?

The FILENAME statement can be used to create a fileref and specify the full path of the raw data file. The INFILE statement then plays the role that the SET statement plays for SAS data sets: it names the raw data source for the DATA step. Finally, the INPUT statement defines the structure of the data set: the variables, their names, their data types (character or numeric), and the columns they occupy.


It is important that the fileref used in FILENAME is the same as the one used in INFILE.

Scenario: The Enroll raw data file contains each student's ID, Name, and CourseID. CourseID can be of two types: 1000 denotes an Associate degree and 2000 denotes a Bachelor degree.

Sample Code:

filename enroll 'C:\sas\enroll.dat';
data work.enrollinfo;
  infile enroll;
  input ID $ 1-4 Name $ 6-25 CourseID $ 27-30;

  /* IF-THEN/ELSE: the assignment belongs to the same statement as the condition */
  if CourseID='1000' then
     Degree='Associate';
  else if CourseID='2000' then
     Degree='Bachelor';
run;

Constraints on variable names:
  1. must be 1 to 32 characters in length
  2. must begin with a letter (A-Z) or an underscore (_)
  3. can continue with any combination of digits, letters, or underscores




Tuesday, October 29, 2013

Program Data Vector


What is the Input Buffer?

Definition: An area of memory created to hold a record from an external file. It is created only when raw data is read, not when a SAS data set is read. It is created in the Compilation phase.

What is the Program Data Vector?

Definition: A temporary area of computer memory in which SAS builds a data set, one observation at a time. It is a logical concept and is abbreviated as PDV.

It is created in the Compilation phase, after the input buffer is created. It contains two automatic variables, _N_ and _ERROR_, that can be used for processing but are not written to the data set as part of the observation.

_N_ counts the number of times that the DATA step begins to execute.

_ERROR_ signals the occurrence of an error that is caused by the data during execution. The default value of 0 means no error. A value of 1 means one or more errors.

The Compilation phase ends with the creation of the descriptor portion of the data set.
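
To watch _N_ and _ERROR_ change, their values can be written to the log on every iteration. In the sketch below, the data set work.scores, its variables, and the in-stream data are all made up for illustration. The second record has a non-numeric Score on purpose, so that iteration should report _ERROR_=1 and a missing Score.

```sas
/* Sketch only: work.scores and its data are illustrative */
data work.scores;
     input Name $ Score;
     put _n_= _error_= Name= Score=;   /* automatic variables are usable but not stored */
     datalines;
Ann 90
Bob abc
Cy 75
;
run;
```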

Monday, October 28, 2013

DATA step

What is the DATA Step?

Definition: It typically takes data, processes it, and creates a SAS data set. It starts with the DATA statement and ends with the RUN statement.


Applications: Among other things, it is used to create new variables within a data set.

Scenario: Assume your existing data set (employee) contains three variables: Employee, Department, and Salary. If you intend to make a copy of this data set (employeearchive) and store it as an archive, the following DATA step can be used:

Sample Code:
data work.employeearchive;
     set work.employee;
run;
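
Creating a new variable only takes an assignment statement inside the same kind of step. In the sketch below, the output data set work.employeebonus, the variable Bonus, and the 10 percent rate are hypothetical; only work.employee and Salary come from the scenario above.

```sas
/* Sketch only: Bonus and the 10% rate are hypothetical */
data work.employeebonus;
     set work.employee;
     Bonus=Salary*0.10;   /* new variable computed for every observation */
run;
```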