Monday, May 23, 2016

Visual Analytics



SAS Visual Analytics

This is one of my luckiest days, since I got a sneak peek into how SAS Visual Analytics uses dashboards to transform data into various graphical visualizations that have a very appealing look and feel and allow analysts to drill through multiple levels of hierarchy. The data can easily be viewed across multiple dimensions, with impressive tooltips guiding the analyst toward insightful conclusions.

Figure 1 Vertical Categorized Graph
This example illustrates how to use parameters to retrieve relevant data. The sleek animation effect that accompanies zooming in and out is addictive.

Figure 2 Big Data with cascaded hierarchy

Figure 3 Exposure by Industry


Figure 4 Drilled Down Graph

Multiple options for color coding and for displaying numerical data with associated widgets enhance the result. The red, yellow, and green indicators below delineate the low, medium, and high ranges for the return percent and RAROC dimensions.

Figure 5 Tabular Data
If you ever want to know more about in-memory computations and how to achieve faster load times for content-rich data and graphics, SAS Visual Analytics would be a good solution!

Friday, April 1, 2016

AUTOEXEC

How many of us have wondered how to make the server connection more of a one-button-click process in SAS Enterprise Guide? Generally, there are multiple servers available, and depending on your needs, you may want to switch between them. However, once you decide to stick to a particular development server, it is worth knowing about the AUTOEXEC option, which lets you create and save a simple one-line or multi-line program node that executes automatically. This program node must be named AUTOEXEC (case-insensitive) and saved in a process flow named Autoexec.




This program node contains LIBNAME statements to connect to the specific libraries on the desired server. It runs automatically as long as you check the setting "Automatically run "Autoexec" process flow when project opens" under Tools > Options > General.
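
Here is a minimal sketch of what such a program node might contain; the server paths and library names are hypothetical:

libname dev "/sasdata/dev/marketing";                  /* main development library */
libname lkup "/sasdata/dev/lookups" access=readonly;   /* read-only lookup tables  */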


Wednesday, May 6, 2015

How to achieve DATA MODELING - Session 4

INTEGRATION OF DATA:

Often, the data needed to create a meaningful model resides across multiple sources such as flat text files (.txt, .csv, .tab), hypertext markup language (.htm, .html), Microsoft Excel worksheets (.xls, .xlsm, .xlsb, .xlsx), Microsoft Access databases (.mdb, .accdb), and SAS datasets (.sas7bdat), and hence the key is to ensure a successful and reliable data import process. Data modeling software such as SAS Enterprise Guide provides advanced features to import data from a myriad of sources very conveniently, without diving into coding details.

Data Import example of SAS Datasets and Microsoft Excel worksheet
In the above process flow snapshot, the first three icons, denoted by red balls, represent SAS datasets named Orders, Products, and Salaries. The fourth icon, a green X, represents a Microsoft Excel worksheet named Suppliers.
 
During the import wizard process, close attention must be paid to the nature of the data being imported, such as each field's name, length, format, informat, data type, and label. By all means, take advantage of the features provided to ignore fields that are irrelevant to the data model, in order to optimize the design in terms of memory requirements.
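
The same import can also be expressed in code. Here is a minimal PROC IMPORT sketch; the file path is hypothetical, and the XLSX engine assumes SAS/ACCESS to PC Files is available:

proc import datafile="C:\data\suppliers.xlsx"   /* hypothetical path */
            out=work.suppliers
            dbms=xlsx
            replace;
     sheet="Suppliers";
     getnames=yes;    /* read variable names from the first row */
run;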
 
It is a good practice to open each of the imported datasets and perform a quick examination of sample observations, so that there is confidence in proceeding with the next steps involved in data modeling.
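
A quick programmatic examination might look like this sketch, reusing the dataset from the import above:

proc contents data=work.suppliers;         /* variable names, types, lengths, formats */
run;

proc print data=work.suppliers (obs=10);   /* first 10 sample observations */
run;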

Tuesday, April 28, 2015

How to achieve DATA MODELING - Session 3

DATA MODEL'S INPUTS:

One of the most important considerations is the complexity level of the design of the Data Model. It is normal to break the tasks down, following basic guidelines, into multiple process flows or multiple dataset sources.
  • Process Flow Concept: The basic notion is to use two or more process flows so that there is better readability and optimal resource utilization. It is crucial to group the tasks such that there is minimal repetition of data sources across the process flows.
  • Dataset Concept: In the scenario where only one process flow suffices, the controlling factor is where the datasets reside, generally referred to as libraries. The joins between the datasets need to be accurately devised to achieve the desired outcome, as in the sketch following this list. A thorough understanding of key relationships and heavy usage of data examination techniques will govern the stability and robustness of the data model.
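As a sketch of such a join, here is a PROC SQL step modeled on the Orders and Products datasets from the import example; the column names and shared key are hypothetical:

proc sql;
     create table work.order_details as
     select o.Order_ID,                       /* hypothetical columns */
            o.Quantity,
            p.Product_Name,
            p.Unit_Price
     from   work.orders as o
            inner join
            work.products as p
            on o.Product_ID = p.Product_ID;   /* assumed shared key */
quit;
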
DATA MODEL'S OUTPUTS:

In an iterative design, it is important to consider whether the outcome of a basic data model will feed other, more advanced data models.
  • Dataset Concept: The general rule of thumb is to organize the output into multiple categorized datasets, in one or more libraries, so that they can be reused. It is convenient to follow standard naming conventions, since new datasets become potential training documentation that educates analysts and other power users, and since doing so demonstrates a proactive attitude toward planning efforts.
  • Report Concept: It may be essential to publish the output of the data model, and in that case there is a myriad of solutions for customizing the look and feel of the content as reports. Relying on the standard templates provided helps give the reports a professional look. The basic structure can be broken down to include visual representations such as line graphs and bar charts, in conjunction with tabular data in matrix format, for powerful delivery of information; a code sketch follows the example figures below.
Here is an example of an automated academic report that used to be generated manually by feeding data into MS Excel worksheets:

Figure 1. Complete PDF Report

Figure 2. Image and Highlights Content

Figure 3. Tabular Content

Figure 4. Graphical Content

DOWNLOADABLE REPORT
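
As a rough illustration of how such a report could be produced in code, here is a minimal ODS PDF sketch; the dataset, variables, and output path are all hypothetical:

ods pdf file="C:\reports\academic_report.pdf" style=journal;

/* graphical content: bar chart of the average score per department */
proc sgplot data=work.scores;
     vbar department / response=score stat=mean;
run;

/* tabular content in matrix format */
proc report data=work.scores;
     columns department score;
     define department / group;
     define score / analysis mean "Average Score";
run;

ods pdf close;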

Thursday, April 23, 2015

How to achieve DATA MODELING - Session 2

BASIC BUILDING BLOCKS

There are important considerations to be made while determining the basic building blocks of a DATA MODEL. The design needs to be outcome-driven so that it supports the intermediate results effectively. Simply put, relying on a set of stable inputs, such as large datasets that get updated automatically on a daily basis, is the key to successful data mining. Here is a classic example explaining how some analytical tools can address business queries very efficiently:





DATA ACCESS

One of the most daunting tasks is to access data that may be piped in from multiple sources. Being able to connect to those records through appropriate engines and library references is the key to beginning intricate data modeling. Many data scientists like to keep this organized by following standard naming conventions and heavy usage of ordering sequences. That helps in browsing through data marts efficiently when there is a need to join multiple datasets to create complex views for deriving extensive quantitative measures.
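
As a small sketch of such library references, here are two LIBNAME statements; the paths are hypothetical, and the XLSX engine assumes SAS/ACCESS to PC Files is licensed:

libname finance "/data/marts/finance";            /* Base engine: SAS datasets   */
libname feeds xlsx "/data/feeds/exposure.xlsx";   /* Excel workbook as a library */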

Tuesday, April 21, 2015

How to achieve DATA MODELING - Session 1

DATA MODELING

In today's era of smart data management, most data scientists are interested in drawing business insights with the help of various tools. For those of you who may still wonder what "Data Modeling" is, or what the "need" for data modeling is, here is an interesting view:




Advantages


  • Understand potential data sources
  • Gauge the depth and breadth of hidden information
  • Develop confidence in overall business processes


Disadvantages


  • Involves heavy financial investment upfront
  • Cumbersome techniques that require professionals with niche experience

Monday, November 4, 2013

LINE POINTER CONTROL

What is Line Pointer Control?
Definition: It is a technique by which SAS allows you to rearrange variables in a data set while reading data from raw data files. It lets you read multiple records, either sequentially or non-sequentially, and create a single observation. It involves the correct use of special characters like #n or /, which are called the non-sequential and sequential (forward slash) line pointer controls, respectively.

What is the advantage of the Forward Slash Line Pointer Control?

When you are interested in creating one observation from multiple records without using multiple INPUT statements, the forward slash line pointer control comes in handy. Consider a raw data file, referenced by the fileref vegdata, with the following records:

ROMA TOMATO
FRESHPOINT CA
1.80 DAILY
GREEN BEANS
ORGANICS NC
1.20 WEEKLY

Code to read the data file:

data produce.vegetables;
     infile vegdata;
     input VName $ 1-11 /        /* record 1: vegetable name     */
           Supplier $ State $ /  /* record 2: supplier and state */
           Rate 4. Frequency $;  /* record 3: rate and frequency */
run;

This will create the data set vegetables:

VName         Supplier     State   Rate   Frequency
ROMA TOMATO   FRESHPOINT   CA      1.8    DAILY
GREEN BEANS   ORGANICS     NC      1.2    WEEKLY

What is the advantage of the #n Line Pointer Control?

When variables must be read non-sequentially, the #n line pointer control comes in very handy. In the above example, suppose you were required to create the data set with a specific order of variables, such as:

  1. Supplier
  2. State
  3. Rate
  4. Frequency
  5. VName

Then the best way to proceed is with the #n pointer control, as it allows you to jump to the nth record and go back and forth between multiple records to create one observation.

data produce.vegetables;
     infile vegdata;
     input #2 Supplier $ State $   /* record 2: supplier and state   */
           #3 Rate 4. Frequency $  /* record 3: rate and frequency   */
           #1 VName $ 1-11;        /* back to record 1 for the name  */
run;

Data set created:


Supplier     State   Rate   Frequency   VName
FRESHPOINT   CA      1.8    DAILY       ROMA TOMATO
ORGANICS     NC      1.2    WEEKLY      GREEN BEANS

It is possible to combine the / (forward slash) and #n line pointer controls to read more complex data records.


*** For questions and discussion, use COMMENTS section or email sasavant9.4@gmail.com