Tuesday, 17 April 2012

What is the difference between relational and multidimensional database implementations?

This was my second question on LinkedIn, posted to get a more industry-oriented perspective. It drew a lot of discussion on relational versus multidimensional database implementation. Some of the feedback from experts in this field is outlined below:

Michel Voogd :
“The difference in implementation is that a multidimensional database includes pre-packaging subsets of data into small objects that are usable for fast online browsing, usually in a BI portal environment such as Cognos or Business Objects.
A relational database in itself doesn't include those packages but it would allow querying larger datasets.”

Bala Seetharaman :
“Relational DB - ER modeling; it has to comply with Codd's 12 rules. Here you can store data only in the way supported by the DB engine (you can partition or use multiple file groups).
Multidimensional DB - Dimensional model - stores the pre-aggregated data in multidimensional form, with the data still sourced from a relational DB or flat files. (Here you can store in the form of MOLAP, ROLAP and HOLAP and DOLAP too.)

SQL - Query language used to search and manipulate the data in a relational DB
MDX - Multidimensional Expressions - used to search and retrieve the data from a cube or MDB (multidimensional) store.

Siddharth: To answer your question "is there any different tool or language to query the multidimensional database (cube)",
MDX is the query language used to query the cube, much like SQL. Again, it's not like ANSI-standard SQL; we need to write queries in the form of 3D axes.

Calculations are easier in an RDB than in an MDB; if we don't understand the dimensions and hierarchy members, we can't easily get the result from a cube.”

John McKenna :
“….In relational databases data is organized by tables and columns (tuples) and records are grouped into blocks for storage and access. Querying is performed based upon relational algebra (SQL). In multi-dimensional database implementations (most no longer exist), data is organized into multi-dimensional cubes (think multi-dimensional arrays), and queried based on a language suitable to navigating cubes (I am not aware of a standard although one may exist). To further muddy the waters you have columnar databases that group column data into blocks (efficient for ROLAP applications where few columns are in the result set, therefore fewer blocks traversed).

In addition to the database implementations, many reporting tools have (OLAP/cube) functionality built in, but many of these are not full-blown multi-dimensional databases but scaled-down persistence engines that store all cube values together. Most full-blown multi-dimensional databases have faded away due to performance issues (due to sparsity issues, etc.), learning new query languages, supporting multiple database platforms and people finding that it was relatively easy to implement cubes in relational databases (ROLAP) by using dimensional database design (Ralph Kimball). …….”
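
To make the contrast in the comments above a little more concrete, here is a minimal Java sketch. It is not taken from any of the products mentioned; the Sale class, the region and year dimensions and the sample figures are all made up for illustration. It answers the same question twice: once by scanning rows at query time (relational style) and once by looking up a cell that was aggregated in advance (multidimensional style).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CubeSketch {
    // One fact row, as it might sit in a relational table (hypothetical schema).
    static final class Sale {
        final String region;
        final String year;
        final double amount;

        Sale(String region, String year, double amount) {
            this.region = region;
            this.year = year;
            this.amount = amount;
        }
    }

    // Relational style: scan the rows and aggregate at query time.
    static double totalByScan(List<Sale> rows, String region, String year) {
        double total = 0;
        for (Sale s : rows) {
            if (s.region.equals(region) && s.year.equals(year)) {
                total += s.amount;
            }
        }
        return total;
    }

    // Multidimensional style: aggregate once, keyed by the dimension members.
    static Map<String, Double> buildCube(List<Sale> rows) {
        Map<String, Double> cube = new HashMap<>();
        for (Sale s : rows) {
            cube.merge(s.region + "|" + s.year, s.amount, Double::sum);
        }
        return cube;
    }

    // Made-up sample data.
    static List<Sale> sampleRows() {
        List<Sale> rows = new ArrayList<>();
        rows.add(new Sale("EU", "2011", 100.0));
        rows.add(new Sale("EU", "2011", 50.0));
        rows.add(new Sale("US", "2011", 75.0));
        rows.add(new Sale("EU", "2012", 20.0));
        return rows;
    }

    public static void main(String[] args) {
        List<Sale> rows = sampleRows();
        Map<String, Double> cube = buildCube(rows);
        System.out.println("scan: " + totalByScan(rows, "EU", "2011"));
        System.out.println("cube: " + cube.get("EU|2011"));
    }
}
```

Both calls return the same total; the difference is when the aggregation work is done. A real cube also has to deal with hierarchies and sparsity, which this sketch ignores.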


The next question was about how to query each type of database implementation. Luckily, I came to know that Oracle also offers a multidimensional database, Essbase, and that for SQL Server the equivalent is SSAS (Analysis Services); SSRS (Reporting Services) is the reporting tool that goes with it.

There is still a lot of information on my profile if you want to have a look. Compiling all the notes is a tedious job, so I have tried to aggregate some of the most valuable comments here.

For detailed discussion please follow the link

Sunday, 26 February 2012

BI Second Project - Twitter API programming

Hi All,
I hope everyone is almost done with their first project execution plan. Dr. Ram has already asked us to start planning the second project. I have high expectations for this project because it will put many skills to the test; one of the more challenging parts will be API programming against Twitter.

This tutorial shows how to integrate the Twitter API into your second BI project. I spent almost two weeks on this, and I don't want you folks to lose that much time, so here is how to get started right away.

Here are the steps to get started with the Twitter API.

1) Download the Twitter4J library, twitter4j-core-2.2.5, from twitter4j.org.

2) Find the twitter4j-core jar inside the downloaded zip file and save it on your local machine.

3) Launch Eclipse, right-click the project into which you want to import Twitter4J, and select Build Path --> Configure Build Path. Below is the screenshot.

4) Under the Libraries tab, click "Add External JARs" and select the jar file you downloaded.

5) Under the project folder you will now see the subfolder "Referenced Libraries".

Just check that all the packages and classes are there. If they are, you are good to go.

Now you need to access Twitter data through this API, and for that you need a key and a secret key.

1) Go to twitter.com.

2) Scroll down to the developer link, click it, and register yourself.

3) Fill out the form to get your key and secret key.

4) Use the code below to test the connection.

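import twitter4j.RateLimitStatus;
import twitter4j.Twitter;
import twitter4j.TwitterException;
import twitter4j.TwitterFactory;
import twitter4j.auth.AccessToken;

// Replace the bracketed placeholders with your own values.
Twitter twitter = new TwitterFactory().getInstance();
twitter.setOAuthConsumer("[consumer key]", "[consumer secret]");
AccessToken at = new AccessToken("[token]", "[token secret]");
twitter.setOAuthAccessToken(at);
try {
    // If this call succeeds, the keys and tokens were accepted.
    RateLimitStatus rls = twitter.getRateLimitStatus();
    System.out.println(rls);
} catch (TwitterException e) {
    // Wrong credentials or network problems end up here.
    e.printStackTrace();
}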

5) Use any of the API functions and start experimenting.

I hope this helps you in the second BI project.

Please don't share the code anywhere else.

Sunday, 29 January 2012

Chronological Definition of Business Intelligence

Is BI only about dashboards, metrics and reports, or does it also include designing a robust data warehouse?

It took me 10 days to research this post and 30 minutes to write it. I am not sure how folks manage to jot everything down so quickly; in my experience, writing a blog post means collecting information from different sources and presenting a unified view, rather than just reporting what the lecture was about. Anyway, I have been posting a lot of grad-level doubts on LinkedIn, and this was the most commented one. The very first question I had when I started working as a BI analyst was: what is Business Intelligence?

To summarize, I would title this post "Chronological Definition of Business Intelligence". My journey with the term started around September 2009, when I began working as a BI developer for my client. Initially I worked on database integration of OLAP and OLTP systems for a real-time BI solution. After four months in that profile, I had started to think that BI is about providing real-time data to business users. That is one of the domains that characterizes Business Intelligence: giving the business the advantage of real-time business data (adding intelligence to the business).
I then moved to the ETL (Extract, Transform, Load) team, which takes raw data from the business OLTP systems and processes it into meaningful information for the business, again adding value to the business objective (adding intelligence to the business).
My third profile, where I spent some quality time, was reporting, which many of course think of as the end point of BI. At that point I thought of defining BI as a set of practices for visualizing data on fancy dashboards; doing that well requires complex skills, and the people who do it are highly paid. Another definition I formulated was that BI is the use of reporting tools to help the business make correct decisions (adding intelligence to the business).
All of these definitions were correct, but none was complete. I posted the same question on my LinkedIn profile, and trust me, at present there are 46 replies, each with a different definition and scope for BI. That brings me back to the same question: "What is Business Intelligence?" Going by the thread, some of the domains in which we can define Business Intelligence are:

Data warehouse Architecture viewpoint
Almost everyone talked about the design of the data warehouse. Obviously, if you cannot store it you cannot view it, be it with the most expensive reporting tool or the highest-paid data analyst. So defining Business Intelligence in terms of database design is not wrong, although it is not complete either.

ETL Developer viewpoint
This definition of BI holds good for someone who works in this domain. To be very honest, I personally feel this is where BI starts: no matter what the architecture of the warehouse is, if the business logic is not implemented, the dashboard shows nothing. This includes real-time data (much talked about these days).

Business Reporting
This is the most elusive viewpoint and the one most easily taken to be the core of BI, since it is the face of Business Intelligence. With fancy dashboards and expensive tools, we tend to feel that the tools and technology for data visualization are Business Intelligence: correct, but not fully qualified.

Web Mining ( BI 2.0 )
Effective decision making nowadays depends not only on internal data but also on external data. Using APIs to mine data and make decisions is one of the hot topics in Business Intelligence, but, going back to my earlier viewpoints, until you process that data it is useless. So, a good definition, but not fully qualified.

Data Analysis
Now you have all the data in the world, all processed and visualized. How do you make sure that the business policies and business decisions are correct? Were the KPIs (another piece of jargon used in my class) correct, and if not, how do we redefine them? What will the future trend be, and how do we forecast demand? I call all of this predictive analysis; doing all of the above and missing this step does not make a complete BI solution.

Well, to date I am still not sure how to define the term Business Intelligence. That is what our professor said too: BI is a huge umbrella, and the definition depends on which aspect you are considering.

Still, I will not conclude the definition of BI yet. I will probably wait for the semester to end, though I am sure I will still be on the same page.

I hope this was a good attempt to define Business Intelligence.

Regards,
Sg.

Thursday, 12 January 2012

Oracle Golden Gate for Data Synchronization

This is my first professional post in 2-3 years. The purpose of this blog is to highlight some of the major opportunities we as students may get from here.
Business Intelligence has mostly been focused on reporting and metric values. Obviously that is one of the pivotal ways a business gains leverage, but let me add a new dimension to the same thought: how do we populate the data for senior business users so that they can take accurate decisions? What if the leadership is taking decisions based on different sources? And, adding one more value, what if we could make this data real-time? I know the story is puzzling, so let's imagine a scenario.
Suppose you are the owner of a firm with branches in three different countries. Because the available technical expertise differs from place to place, each branch has a different environment set up to capture business data, let's say Oracle, Microsoft and MySQL. The first problem for the leadership is to bring all this data onto one platform so that they can take a holistic decision. Since the systems are all physically separated, is there any mechanism by which we can integrate them in real time?

What follows is my research work on real-time database integration using one of the latest technologies in the market.

"Data synchronization is today an important part of data warehousing. In a large firm, different applications have data residing on different platforms, such as Siebel, Oracle Database, etc., and it is important that the organization can access the data on one platform to make correct decisions, which is popularly known as Business Intelligence.

Different databases are synchronized with the help of Change Data Capture, also called the CDC process, which captures any change in the data (called the delta) at the source and replicates it on the target. Earlier, Oracle offered Oracle Streams, triggers and other methods to capture these changes. The problem with these methods is that they depend on the Oracle environment and, biggest of all, they interact directly with the live server, also called the OLTP system. An OLTP system is designed for application access, so constant polling from such a CDC process creates performance issues for the database and the application, such as longer I/O and high CPU utilization, which is certainly not desirable for end users and applications.

Broadly we can define the CDC process in 2 different categories:

1) Non-log-based category

2) Log-based category

Oracle now provides one of the most robust solutions for data synchronization, called Oracle Golden Gate (OGG). Its first advantage over similar products from Oracle is that it is not dependent on the platform: it offers a wide range of cross-platform compatibility for different sources and targets. The other and most important reason for using OGG is its ability to read from the logs of the live (OLTP) server, capturing the same data as other CDC processes but without causing performance issues on the OLTP system. The OGG architecture is such that it reads all changes to the table data from the log files and processes only those records that have been committed.

The above diagram shows how data is pulled from the OLTP system by capturing it from the logs without touching the OLTP DB. The data is read by OGG, which at this point is said to be operating as an Extract, and written to files called “trail files”. The data in these trail files is in a format proprietary to Oracle and can be read by OGG processes. The same trail files are read by OGG at the target end, and the CDC delta captured on the source side is written to the target DB. The data flow above is for real-time BI reporting, but with a different architecture OGG can also be configured for disaster recovery. Real-time integration means that captured changes can be sent across the network with minimal delay (less than 10 seconds).
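
The idea of log-based capture described above can be sketched in a few lines of Java. This is only a toy model under my own assumptions, not how Oracle Golden Gate is implemented: the LogRecord format, the operation names and the in-memory "trail" list are all hypothetical. It shows the two steps from the data flow: an extract step that reads a transaction log and keeps only the changes whose transaction committed, and an apply step that writes those changes to a target.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class LogBasedCdcSketch {
    // One entry in a simplified transaction log (hypothetical format).
    static final class LogRecord {
        final int txId;
        final String op;    // INSERT, UPDATE, DELETE, COMMIT or ROLLBACK
        final String key;
        final String value;

        LogRecord(int txId, String op, String key, String value) {
            this.txId = txId;
            this.op = op;
            this.key = key;
            this.value = value;
        }
    }

    // Extract step: buffer changes per transaction and pass on only those
    // whose transaction committed. The source tables are never queried.
    static List<LogRecord> extract(List<LogRecord> log) {
        Map<Integer, List<LogRecord>> open = new HashMap<>();
        List<LogRecord> trail = new ArrayList<>();
        for (LogRecord r : log) {
            if (r.op.equals("COMMIT")) {
                List<LogRecord> changes = open.remove(r.txId);
                if (changes != null) {
                    trail.addAll(changes);
                }
            } else if (r.op.equals("ROLLBACK")) {
                open.remove(r.txId);
            } else {
                open.computeIfAbsent(r.txId, id -> new ArrayList<>()).add(r);
            }
        }
        return trail; // uncommitted transactions are left out
    }

    // Apply step: replay the captured delta on the target.
    static void replicate(List<LogRecord> trail, Map<String, String> target) {
        for (LogRecord r : trail) {
            if (r.op.equals("DELETE")) {
                target.remove(r.key);
            } else {
                target.put(r.key, r.value); // INSERT or UPDATE
            }
        }
    }

    // Made-up log: tx 1 commits, tx 2 rolls back, tx 3 is still open.
    static List<LogRecord> sampleLog() {
        List<LogRecord> log = new ArrayList<>();
        log.add(new LogRecord(1, "INSERT", "a", "1"));
        log.add(new LogRecord(2, "INSERT", "b", "2"));
        log.add(new LogRecord(1, "COMMIT", null, null));
        log.add(new LogRecord(2, "ROLLBACK", null, null));
        log.add(new LogRecord(3, "UPDATE", "a", "9"));
        return log;
    }

    public static void main(String[] args) {
        Map<String, String> target = new HashMap<>();
        replicate(extract(sampleLog()), target);
        System.out.println(target);
    }
}
```

With the sample log, only the insert from transaction 1 reaches the target, because transaction 2 rolled back and transaction 3 never committed.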

With the data flow outlined above, OGG can be configured to capture data at the schema, table and column level across different platforms and migrate it to the target system. OGG also provides features to filter data on the source side itself; for example, the business might be interested in capturing data only for a particular time stamp, filtering out the unnecessary data.

With this small document on OGG I have tried to show that data synchronization can be done across different platforms with zero downtime. I have not gone into detail on the components of Oracle Golden Gate, such as Extract, Data Pump, trail files, Replicat and definition files. I attended a three-day session on Oracle Golden Gate as part of my training and handled this application for over a year."

This part of my implementation may not fall exactly under the umbrella of BI, but firms looking for smart BI solutions may still want to implement the idea.

Also, I have created a small prototype of the above architecture on my machine. If anyone is interested, please let me know; I will be more than happy to share it.

Regards,
Siddharth Gupta








Monday, 17 November 2008

Online Medical Diagnosis

The use of the internet these days is so wide that we cannot imagine a day without it, be it blogging, shopping, paying bills, or booking tickets for railways, airways or even movies. A huge proportion of internet users go online for exactly these kinds of things. In short, the internet occupies a major part of our lives, but there is still one area where its use is not yet exploited. I did my minor project on a topic called Online Medical Diagnosis. In India, where technology is growing in every sphere, this aspect of our life is still in its dark days; many countries have such facilities, but they don't give good results. Why can't we get help from doctors online? The major advantage of such technology would be for patients: instead of visiting the doctor every day, they could download their prescriptions from the internet. Doctors, too, could go on vacation and still get their emergency patients' details over the internet. Imagine a migraine at midnight in Mumbai: the first problem is whether a doctor is available for you, and even if so, how far the doctor is from your house. With an online medical facility we could get first aid, and that too from a reliable source. All this is not as easy as it sounds; the engine behind such a service requires a great deal of ARTIFICIAL INTELLIGENCE, but keeping in mind the Indian software cadre, this will not be a big issue. If this dream project succeeds, it would be a blessing for us, especially for families who do not have the concept of a FAMILY DOCTOR.

Effective Use of Computer Resources

These days there is a common fashion of using high-end processors for domestic use, particularly in PCs, for example a quad-core processor. Why? So that you can play a graphics-heavy game such as Assassin's Creed or maybe NFS Carbon. Users of this kind really don't understand that this is now a serious problem. Using a high-end processor this way wastes not only its computing capability but also power. IBM, for one, is currently tackling the related problem of reducing server heat by introducing new technology. It is estimated that with such ignorance there will be a huge demand for power to feed these high-end processors, which are being used only for playing games or, at most, for listening to songs and running a few other applications. The steady growth in computing power behind this trend is what MOORE'S LAW describes.