Friday, 9 May 2014

Power BI - Yet Another Experiment in Redmond

With the growth of e-commerce platforms, businesses are becoming more data-centric in their decision making. Today we have a wide span of platforms to choose from, stretching from traditional data storage [ relational models ] to Big Data [ Hadoop ].
With new platforms coming into the business, the urgency for new data visualization has also increased. This domain too covers a wide span, from established visualization and analytics tools like MicroStrategy, OBIEE, R and SAS to the much talked about Power BI, which has turned out to be an eye-catcher for everyone in this space.

There is no doubt Excel has been enormously productive, especially in the finance domain of any enterprise, and to be very honest I am a die-hard fan of Excel. Even though we have state-of-the-art technology for data manipulation, Excel remains the first choice in many teams. So why did the folks in Redmond think of building something new when they already had Excel? Let's analyze some of Excel's basic shortcomings.

Most of the time when we visualize data and try to find a trend, there are standard time frames over which we want to see it: WoW [ Week over Week ], MTD, QTD and YTD, and to add to the complexity we further break these down into CY and PY, i.e. current year and previous year (a small sketch of these calculations follows the list below). Consider any e-commerce business: the amount of data generated in a single day ranges from roughly 100,000 to 500,000 rows [ across all regions of operation ]. Here is the challenge: processing such a huge amount of data in Excel has major drawbacks
1) Slow processing
2) Not scalable when it comes to slicing and dicing
3) Difficult to do a deep dive
4) Hard to share the output, and many more.
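
To make those time frames concrete, here is a minimal Python/pandas sketch; the toy data, the column names order_date and revenue, and the figures are assumptions for illustration only, not any real data model:

import pandas as pd

# Assumed toy data set: one row per order with a date and a revenue figure.
df = pd.DataFrame({
    "order_date": pd.to_datetime(["2014-04-28", "2014-05-05", "2014-05-06", "2013-05-05"]),
    "revenue": [120.0, 150.0, 90.0, 100.0],
})

# Week over Week (WoW): total revenue per week, then the change from the previous week.
weekly = df.set_index("order_date")["revenue"].resample("W").sum()
wow_change = weekly.diff()

# Year to Date, current year (CY) versus previous year (PY).
cy, py = 2014, 2013
ytd_cy = df.loc[df["order_date"].dt.year == cy, "revenue"].sum()
ytd_py = df.loc[df["order_date"].dt.year == py, "revenue"].sum()

print(wow_change)
print("YTD CY:", ytd_cy, "YTD PY:", ytd_py)

On a few thousand rows this is trivial; at 100,000-500,000 rows a day, doing the same thing inside a workbook is where Excel starts to hurt.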

And that is how Power BI came into the picture. There is no doubt it can process huge amounts of data, and its ability to connect to almost any data source makes it the first choice of many Excel users facing one of the problems outlined above. Sounds perfect! But here is where the hidden pain lies.

Technically speaking, you design metadata in Power BI to extract data and perform joins at the back end, which is nothing but the traditional way of creating an ETL job or writing SQL to obtain your desired result. Comparing the two scenarios, there are some added benefits with the second option: you have a lot more flexibility to tune performance when you create the ETL job or write the query yourself, whereas with Power BI you are restricted. And if you need to auto-refresh your final output, Power BI is still somewhat manual, while the second option is more flexible.
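
To illustrate what the do-it-yourself route looks like, here is a rough Python/pandas sketch; the table and column names are invented for illustration and are not tied to any real model:

import pandas as pd

# Assumed extracts from two source tables; in practice these would be pulled
# from the warehouse by the ETL job or the SQL query mentioned above.
orders = pd.DataFrame({"order_id": [1, 2, 3],
                       "region_id": [10, 10, 20],
                       "revenue": [50.0, 75.0, 40.0]})
regions = pd.DataFrame({"region_id": [10, 20],
                        "region_name": ["NA", "EU"]})

# The back-end join that Power BI would model as metadata, written out by hand.
report = orders.merge(regions, on="region_id", how="left")

# Refreshing the output is just re-running this script on a schedule,
# which is where the hand-written route gives you more control.
summary = report.groupby("region_name", as_index=False)["revenue"].sum()
print(summary)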
The most important feature I want as a reporting analyst is to drill up and down to see behavior at different granularities. Power BI makes this complicated, whereas with SQL you can do it easily. The deep-dive feature in Power BI takes you to a separate Excel file when you try to go deeper; in SQL it is pretty straightforward.
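
For the drill up/down point, here is a small sketch of re-aggregating the same assumed fact data at two different grains; again the column names are illustrative assumptions:

import pandas as pd

# Assumed fact data at order grain.
fact = pd.DataFrame({
    "order_date": pd.to_datetime(["2014-05-01", "2014-05-01", "2014-05-02", "2014-04-15"]),
    "revenue": [20.0, 35.0, 10.0, 55.0],
})

# Drilling up or down is simply re-aggregating the same fact at another grain.
daily = fact.groupby(fact["order_date"].dt.date)["revenue"].sum()
monthly = fact.groupby(fact["order_date"].dt.to_period("M"))["revenue"].sum()

print(daily)    # day-level view
print(monthly)  # month-level view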

Bottom line: if I have to spend that much time and effort collecting a huge amount of data to create a single metric, I would rather spend the time on a quality design and build a scalable reporting solution than use Power BI for a one-time use case.

Technology always invites different perspectives; what makes sense to me may look absolutely wrong to you. I would love to see feedback and hear the opinion from the other side of the fence.

best
Siddharth Gupta

Friday, 6 December 2013

Robust Business Intelligence Solutions

Having worked in this domain for nearly half a decade, I have been trying to figure out the key challenges in designing a robust Business Intelligence solution. From textbooks to real business case studies, there is no single thread you can pick up and say, here it goes. The question becomes even tougher to answer when it requires cross-geographical teams and cross-platform integration.

The answer to this question is subjective and depends on the domain you are working in, be it retail, banking, supply chain and so on; however, there is a checklist you need to work through before you actually commit.

1) Scalable Data Integration
    I would call this one of the major challenges teams face. Every integration design is bound to be outdated at some point; the question is what threshold we set. For industries like e-commerce, banking and online retail this will be a continuing area of work, because competition is cut-throat and business requirements change in the blink of an eye. If a design sustains a period of 3-4 years, I would call that an excellent design. One key thing when designing a system is to capture all business scenarios and integrate data at the most granular level, so that scalability is taken care of. Another dimension of scalability is how flexible the design is: if a new business case needs to be developed, rather than starting from square one it is better to complement the current system with some additional work. But this should not become a habit, or it will create a web of data that is hard to maintain.

2) Long Term Solutions 
Most of the time folks are trying to solve a business problem that is only 2-3 years in their line of sight, while a newly launched business typically needs 5-6 years to break even. Providing a solution for such a short term can break down the whole structure and incur extra cost. The key is for the participants to be prudent enough to design solutions with a long-lasting life.

3) Cross Geographical Team Integration
With the advent of the internet the world has become flat. To keep business operations running 24x7 it is always a good idea to make sure teams across all geographies are tightly coupled, knowledge flows properly and the division of labor is well defined. Providing a good business solution is less about technology and more about understanding the business process, so if the knowledge flow is not right, all the hard work above is of no use.

4) Remove Social Barriers
Since we are working as a global team encompassing folks from different regions, it is a key requirement that we remove all social barriers. The team should not work toward a local goal but strive on a global scale. Driving a business is like connecting all the parts of a machine and making sure each part is functional; with social barriers the drive may not be smooth.

5) Regular Audit
Above all, the most critical element that needs to be plugged into each stage is an audit. No design or solution will be perfect in a single shot, but by auditing it we can make sure we stick to quality standards, so business users have confidence in the solutions we provide.

The above checklist is certainly not exhaustive; it is the outcome of my past experience. However, if we take these things into consideration, my guess is we will be able to build a good and robust Business Intelligence solution.
I would appreciate feedback and comments on this article, in case I have missed any key aspect of the discussion.

Best
Siddharth

Thursday, 29 August 2013

XML To Be The New Revolution in Business Intelligence Reporting

Hi, it has been a long break since my last post, but I don't like the idea of just writing anything on the web for the sake of it. After finishing school I joined Amazon and life became hectic. The good thing about working at Amazon is that you really work at the grass-roots level of the technology; for me that was data, data and more data.

The very idea that no business can run without informed decisions makes Business Intelligence a very interesting domain. I know a lot of new terms have been coined lately, especially DATA SCIENTIST; well, I am not here to comment on what the title should be :-). With the internet being the new mode of communication, capturing every customer movement and arriving at the correct decision is a critical activity.

Since the internet is such a versatile channel that it can capture customer information in almost any form, from database storage to flat files, the question is how we process all this information and collectively arrive at a consolidated decision. Since the information may sit on several pieces of storage, the individual sources technically need to roll up into a common source ( commonly known as a POC, Proof of Concept ).

There are many vendors that provide state-of-the-art Extract, Transform and Load ( ETL ) solutions, so capturing the raw data and storing it is not a problem; storage these days is also cheap, which eliminates the problem of holding massive amounts of data. The real problem is how we integrate all the individual blocks and build the wall, so that we can ultimately obtain our PnL statement.

To date, in my experience of handling data quality issues, only one vendor has seriously tried to solve this problem: Oracle, which coined yet another term, real-time heterogeneous database integration, using Oracle GoldenGate. To be fair, with two database servers 10,000 miles apart it takes only 5-10 seconds to synchronize them. But the question remains: maintaining expensive servers and then buying the license to integrate them, is the expense worth the information you extract? OGG is primarily used for disaster recovery ( DRS ) and, with few exceptions, less for data integration.

The solution to the above problem, where we arrive at a consolidated view by combining data points stored in different systems without physically synchronizing them, is XML. For many readers it might be a surprise, but yes: XML is a strong data exchange format, primarily used as the communication channel between the web and a back-end data server, and this very fact means it can also serve as the communication channel between different data points on a common platform, e.g. the web. We can use the power of XML data transfer to build a robust reporting system that can interact with almost any data source and display the result, and all this without spending a penny.
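
To make the idea concrete, here is a minimal Python sketch using the standard xml.etree.ElementTree module; the feeds, element names and figures are invented purely for illustration of the exchange idea, not taken from any real system:

import xml.etree.ElementTree as ET

# Two XML feeds exposed by two different back-end systems (contents are hypothetical).
feed_a = "<sales><row region='NA' revenue='120'/><row region='EU' revenue='80'/></sales>"
feed_b = "<sales><row region='NA' revenue='45'/><row region='APAC' revenue='60'/></sales>"

# Parse each feed and combine the figures without physically synchronizing the databases.
totals = {}
for feed in (feed_a, feed_b):
    for row in ET.fromstring(feed).findall("row"):
        region = row.get("region")
        totals[region] = totals.get(region, 0.0) + float(row.get("revenue"))

# A consolidated view built purely from the exchanged XML documents.
print(totals)  # {'NA': 165.0, 'EU': 80.0, 'APAC': 60.0}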

The concept is really simple to understand but equally tough to implement. I would appreciate any suggestions and comments on this.

I hope you enjoyed the idea, although it is just a summary of what I have been reading and thinking.

Best
Sid

Tuesday, 15 May 2012

AN ALTERNATIVE TO DBLINK

A DBlink is most often used to synchronize two different databases. Before going into the details, I should say I am restricting this post to Oracle only. Since many technologies are now available for data synchronization, DBlink is rarely used for such operations.

To briefly describe it, a database link is a mechanism that lets us connect to database objects, mainly tables, that live under a different SID on the same server or on a physically different server. In my view a DBlink should be the last option, although some will argue there are cases with no alternative. Recently at work I had to compare two data sets held in different tables and run a MINUS to make sure the tables were identical. The reason was that we were migrating some legacy jobs to a new environment for performance reasons and wanted to make sure the new jobs populate the data in the same way as the legacy jobs.

The only options we had for the testing were to use a DBlink or to write fairly involved code in Java, which is not everyone's cup of tea. Creating a DBlink is itself a DBA activity, and being a developer I did not have the admin rights to create one; Java is more cumbersome as it requires extra jar files to be added to the environment ( Eclipse ). So what is an alternative way to accomplish this task? The answer is EXCEL.

Widely available on most machines, Excel is very handy for this kind of ad-hoc data validation; using it is simple and does not require a great deal of programming. The architecture of the tool is very simple: collect the data from the different databases, whether on different servers or on the same server, bring each result into a separate Excel sheet ( in my case two sheets ), and then loop through both sheets to see if there are any anomalies between the data sets.

The only constraint is that the SQL running on both servers should use the same ORDER BY clause, so that the two extracts line up row by row for comparison.
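
For readers who prefer a scripted version of the same compare-locally idea, here is a rough Python sketch; I actually did this in Excel, and the file names and layout below are placeholders for illustration only:

import csv

# Assume the same query, with the same ORDER BY, has been run on each server and
# the results exported to CSV ( file names are placeholders ). Comparing the two
# extracts locally avoids the DB link entirely; using set differences even relaxes
# the strict ordering requirement of a row-by-row loop.
def load_rows(path):
    with open(path, newline="") as f:
        return [tuple(row) for row in csv.reader(f)]

legacy_rows = load_rows("legacy_extract.csv")  # output of the legacy jobs
new_rows = load_rows("new_extract.csv")        # output of the migrated jobs

only_in_legacy = set(legacy_rows) - set(new_rows)
only_in_new = set(new_rows) - set(legacy_rows)

print("Rows missing from the new jobs:", sorted(only_in_legacy))
print("Extra rows produced by the new jobs:", sorted(only_in_new))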

Using a DBlink here would be a costly operation, keeping in mind the full table scans and any joins involved; reading the data from each table separately and then comparing it on the local machine reduces the I/O enormously, considering the huge number of rows that would otherwise have to be compared with a MINUS over the DBlink.

The same approach can also be applied when inserting data from one database server into a different database server.

Please let me know if anyone is interested in seeing my Excel file for this operation.

I hope you enjoyed this alternative to a DBlink.

Regards,
Siddharth Gupta

Sunday, 29 April 2012

OBIEE Assignment - Designing the Dashboard


The BI dashboard has always been one of the most admired features of Business Intelligence. During my industry experience I had a chance to work on BI dashboards using MicroStrategy, but the work was limited to analyzing the data, which was more SQL than designing reports and building dashboards. With this assignment of building reports and a dashboard, I got a chance to actually analyze the problem and create effective reports to answer some of the business questions.

The methodology our team followed for the reports and the dashboard was to give the user full flexibility to run the report with whatever parameters he or she wanted; we achieved this by using prompts in the OBIEE environment.
I proposed this approach because of my previous work experience. We had a common dashboard and common reports for all business users; the only thing that differed was the specific detail, such as the revenue figures for a given property. The fact table contained the data for all properties, but individual users were interested only in their own property, so when they ran the report they simply entered their property id. We also deliberately allowed multiple selection of time values from the time dimension, so a user could analyze the data for any specific time period or for just one year.

Drill-down functionality was the last thing I wanted to implement, but I could not do it because there was not much help available on the internet or in the Oracle OBIEE documentation.

The assignment was good practice in the sense that it gave me an opportunity to build reports according to the problem at hand and to create a dashboard by visualizing the data. I was able to draw on my MicroStrategy experience while creating the reports and dashboard, and still use the classroom teaching to analyze the problem and decide on the KPIs.

Regards,
Siddharth Gupta

SSIS Tutorial

The SSIS tutorial was one of the more mind-squeezing assignments, although it was not graded. Coming from an ETL background I knew the assignment would be challenging in terms of learning a new environment. I was also interested in it because I wanted to test my learning curve on a new ETL tool: I had previously worked on Informatica and currently work on IBM DataStage at UITS - Mosaic. SSIS was easy to pick up, although there were some cool transformations that I had never come across or never used during my industry experience.

Working across different ETL tools, I figured out the most important thing, which was also pointed out by our professor Dr. Ram: apply your knowledge to the problem at hand. Solving the problem becomes almost trivial if you know how to use what you know ( in this case the SSIS transformations ). A wide variety of tools will keep emerging in the market and you cannot be a pro at all of them; the key is to know the underlying concepts and then use the technology to accomplish the task.

Although I had to follow the tutorial to set up the environment, since I was not familiar with SSIS, once I had the ETL flow diagram set in my mind it was pretty much dragging and dropping transformations and changing parameter values.

Nevertheless, the SSIS assignment was yet another big learning experience for me in the BI class this semester, after the GOMC project and the OBIEE dashboard. I am looking forward to following the MSDN links and learning more about it.

Regards,
Siddharth Gupta

Sunday, 22 April 2012

INFOGRAPHIC RESUME

The infographic resume was one of the toughest assignments in the BI class this spring. Although in the class lecture it seemed a good idea to present yourself in pictures rather than text, it is very tough to select a theme that shows your passion and still conveys your skills and achievements.

Finally I decided to build my infographic resume around the theme of the famous Microsoft game Age of Empires, which I used to play during my undergrad; playing games is one of my passions, be it on the computer or on the iPad.

Infographic Resume Theme:
Feudal Age
I have tried to portray my resume as the story of a soldier from the Persian civilization who, over time, gains skills by passing through different stages.
The first is the Feudal Age, the lowest rank of a soldier, i.e. when I was an undergrad at Panjab University.


Castle Age


As the civilization grows, the soldier improves his rank and becomes much more skillful, as shown in the adjoining image.

Imperial Age

Finally, when the civilization reaches the Imperial Age, the soldier attains the highest level of skill, as shown below.




With the gaming theme I have tried to present my resume in a way that suits my passion; I hope you appreciate it.