Wednesday, May 6, 2015

How to achieve DATA MODELING - Session 4

INTEGRATION OF DATA:

Often, the data needed to build a meaningful model resides across multiple sources: flat text files (.txt, .csv, .tab), hypertext markup language (.htm, .html), Microsoft Excel worksheets (.xls, .xlsm, .xlsb, .xlsx), Microsoft Access databases (.mdb, .accdb), and SAS datasets (.sas7bdat). The key, therefore, is a successful and reliable data import process. Data modeling software such as SAS Enterprise Guide provides advanced features to import data from this myriad of sources very conveniently, without diving into coding details.
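SAS Enterprise Guide handles this through its import wizard, but the underlying idea can be sketched in plain Python using only the standard library. This is a minimal illustration, not the tool's actual mechanism; the file content and field names are hypothetical:

```python
import csv
import io

# Hypothetical CSV content standing in for a flat-file source such as Orders.csv.
orders_csv = """order_id,product_id,quantity
1001,P-01,3
1002,P-02,1
"""

# csv.DictReader infers field names from the header row, much like an
# import wizard's first pass over a delimited file.
orders = list(csv.DictReader(io.StringIO(orders_csv)))

print(orders[0]["order_id"])  # fields arrive as strings until typed explicitly
```

Note that every field comes in as text; assigning proper data types is a separate, deliberate step, which is exactly why the wizard asks about types and formats.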

Data Import example of SAS Datasets and Microsoft Excel worksheet
In the above process flow snapshot, the first three icons, denoted by a red ball, represent SAS datasets named Orders, Products, and Salaries. The fourth icon, a green X, represents a Microsoft Excel worksheet named Suppliers.
 
During the import wizard process, pay close attention to the nature of the data being imported: each field's name, length, format, informat, data type, and label. By all means, take advantage of the features provided to ignore fields that are irrelevant to the data model, which optimizes the design in terms of memory requirements.
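Dropping irrelevant fields at import time can be sketched as follows. This is an assumed example, with made-up field names, showing the principle rather than the wizard's implementation:

```python
import csv
import io

# Hypothetical supplier extract with fields the model does not need.
raw = """supplier_id,name,region,fax,internal_notes
S-10,Acme,East,555-0100,legacy field
S-11,Brio,West,555-0101,legacy field
"""

KEEP = {"supplier_id", "name", "region"}  # only fields relevant to the model

# Discard the irrelevant fields during import rather than after,
# so they never occupy memory in the working dataset.
suppliers = [
    {k: v for k, v in row.items() if k in KEEP}
    for row in csv.DictReader(io.StringIO(raw))
]

print(suppliers[0])  # only the retained fields survive the import
```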
 
It is good practice to open each imported dataset and perform a quick examination of sample observations, so that you can proceed to the next data modeling steps with confidence.
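In Enterprise Guide this is a matter of opening the dataset in the data grid; as a rough scripted equivalent, a small helper can print the first few observations of an in-memory dataset. The helper name and sample data are invented for illustration:

```python
def peek(dataset, n=5):
    """Return (and print) the first n observations for a quick sanity check."""
    rows = dataset[:n]
    for row in rows:
        print(row)
    print(f"{len(dataset)} observations total")
    return rows

# Hypothetical dataset standing in for a freshly imported table.
sample = [{"id": i, "value": i * 10} for i in range(12)]
head = peek(sample, n=3)
```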

Tuesday, April 28, 2015

How to achieve DATA MODELING - Session 3

DATA MODEL'S INPUTS:

One of the most important considerations is the complexity of the data model's design. It is normal to break the work down, following a few basic guidelines, into multiple process flows or multiple dataset sources.
  • Process Flow Concept: The basic notion is to use two or more process flows for better readability and optimal resource utilization. It is crucial to group the tasks so that data sources are repeated as little as possible across the process flows.
  • Dataset Concept: When a single process flow suffices, the controlling factor is where the datasets reside, generally referred to as libraries. The joins between the datasets must be devised accurately to achieve the desired outcome. A thorough understanding of key relationships, combined with heavy use of data examination techniques, governs the stability and robustness of the data model.
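The join logic described above can be sketched with two small in-memory datasets. The keys mirror the Orders/Products example from the import session; the data itself is hypothetical:

```python
# Hypothetical datasets; product_id is the key relationship between them.
orders = [
    {"order_id": 1001, "product_id": "P-01", "quantity": 3},
    {"order_id": 1002, "product_id": "P-02", "quantity": 1},
]
products = {"P-01": "Widget", "P-02": "Gadget"}  # product_id -> product name

# Inner join on product_id: each order row is enriched with the product name,
# and orders without a matching product are dropped.
joined = [
    {**o, "product_name": products[o["product_id"]]}
    for o in orders
    if o["product_id"] in products
]

print(joined[0]["product_name"])
```

Examining which rows fail to match (the `if` filter here) is exactly the kind of data examination that reveals broken key relationships early.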
DATA MODEL'S OUTPUTS:

In an iterative design, it is important to consider whether the outcome of a basic data model will feed other, more advanced data models.
  • Dataset Concept: The general rule of thumb is to organize the output into categorized datasets, in one or more libraries, so that they can be reused. Following standard naming conventions is convenient, since each new dataset becomes potential training documentation for analysts and other power users, and it demonstrates a proactive attitude towards planning.
  • Report Concept: It may be essential to publish the output of the data model, and in that case there are myriad solutions for customizing the look and feel of the content as reports. Relying on the standard templates provided helps give the reports a professional look. The basic structure can include visual representations such as line graphs and bar charts, in conjunction with tabular data in matrix format, for powerful delivery of information.
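The dataset-output idea, with a standard naming convention, can be sketched as follows. The `out_` prefix, the dataset names, and the temporary directory standing in for an output library are all assumptions for illustration:

```python
import csv
import os
import tempfile

# Hypothetical categorized output datasets intended for reuse downstream.
outputs = {
    "out_orders_summary": [{"order_id": 1001, "total": 30}],
    "out_supplier_summary": [{"supplier_id": "S-10", "orders": 5}],
}

outdir = tempfile.mkdtemp()  # stands in for an output library

for name, rows in outputs.items():
    # The "out_" prefix is an assumed naming convention marking derived
    # datasets, so analysts can tell sources from model outputs at a glance.
    path = os.path.join(outdir, f"{name}.csv")
    with open(path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0]))
        writer.writeheader()
        writer.writerows(rows)

written = sorted(os.listdir(outdir))
print(written)
```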
Here is an example of an automated academic report that used to be generated manually by feeding data into MS Excel worksheets:

Figure 1. Complete PDF Report

Figure 2. Image and Highlights Content

Figure 3. Tabular Content

Figure 4. Graphical Content

DOWNLOADABLE REPORT

Thursday, April 23, 2015

How to achieve DATA MODELING - Session 2

BASIC BUILDING BLOCKS

There are important considerations to be made while determining the basic building blocks of a DATA MODEL. The design needs to be outcome-driven so that it supports the intermediate results effectively. Simply put, relying on a set of stable inputs, such as large datasets that are updated automatically on a daily basis, is the key to successful data mining. Here is a classic example of how some analytical tools can address business queries very efficiently:

DATA ACCESS

One of the most daunting tasks is accessing data that may be piped in from multiple sources. Being able to connect to those records through the appropriate engines and library references is the key to beginning intricate data modeling. Many data scientists keep this organized by following standard naming conventions and making heavy use of ordering sequences. This makes it efficient to browse through data marts when multiple datasets need to be joined into complex views for deriving extensive quantitative measures.
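The idea of joining datasets into a reusable view can be sketched with Python's built-in sqlite3 module. The table names, the `src_`/`v_` prefixes, and the data are assumptions; an in-memory database stands in for a data mart:

```python
import sqlite3

# In-memory database standing in for a data mart; names follow an assumed
# convention: "src_" for source tables, "v_" for derived views.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE src_orders (order_id INTEGER, product_id TEXT, quantity INTEGER)"
)
con.execute("CREATE TABLE src_products (product_id TEXT, unit_price REAL)")
con.executemany(
    "INSERT INTO src_orders VALUES (?, ?, ?)",
    [(1001, "P-01", 3), (1002, "P-02", 1)],
)
con.executemany(
    "INSERT INTO src_products VALUES (?, ?)",
    [("P-01", 2.50), ("P-02", 10.00)],
)

# A view joining the two sources yields a reusable quantitative measure
# without duplicating the underlying data.
con.execute("""
    CREATE VIEW v_order_value AS
    SELECT o.order_id, o.quantity * p.unit_price AS order_value
    FROM src_orders o
    JOIN src_products p ON o.product_id = p.product_id
""")

rows = con.execute(
    "SELECT order_id, order_value FROM v_order_value ORDER BY order_id"
).fetchall()
print(rows)  # each order paired with its computed value
```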

Tuesday, April 21, 2015

How to achieve DATA MODELING - Session 1

DATA MODELING

In today's era of smart data management, most data scientists are interested in drawing business insights with the help of various tools. For those of you who still wonder what "Data Modeling" is, or what the "need" for data modeling is, here is an interesting view:




Advantages


  • Understand potential data sources
  • Gauge the depth and width of hidden information
  • Develop faith in overall business processes


Disadvantages


  • Involves heavy financial investment upfront
  • Cumbersome techniques that require professionals with niche experience