Thursday, 12 January 2012

Oracle Golden Gate for Data Synchronization

This is one of my first professional post after 2-3 years, since the purpose of the blog is to highlight some of the major opportunity we as a student may get from here.
Business Intelligence is most of the time has been focused on reporting and metric value, obviously that is one of the pivoting reasons business takes leverage, but adding a new dimension to the same thought how do we populate the data for the senior business users so that they may take accurate decisions, what if the leadership is taking decision from different sources, adding one more value what if we can make this data a real time data. I know the story is puzzling let's imagine a scenario.
Suppose you being the owner of a firm that has a chain at three different countries, now for obvious reason of availability of technical expertise you have different environment set up to capture business data, lets say Oracle, Microsoft and MySQL. Now the first problem for the leadership is to bring all these data onto one platform, so that leadership take a holistic decision, since they all are physically separated, is there any mechanism we can integrate all of them at real time.

what follows below is my research work on real time database integration using one of the latest technology in the market.

"Data Synchronization is today an important part of data warehousing, for a large firm where different application have data residing on different platform, like SIBEL, Oracle Database, etc. it is important that organization is able to access the data on one platform, to make correct decision, popularly known as Business Intelligence.

The way to synchronize different database is done with the help of Change Data Capture also called CDC process. Which captures any change in data called the delta from the source and makes a replication on the target. Earlier Oracle has Oracle Stream to capture these changes, triggers and other methods. The problem with these methods were, they are dependent on Oracle Environment, and the biggest of all they interact directly with the live server, also called the OLTP system. Since an OLTP system is designed for the application access. A constant ping from such CDC process will create performance issue for the DB and Application like making the I/O process long, high CPU utilization which is certainly not desirable for the end users and application.

Broadly we can define the CDC process in 2 different categories:

1) Non-Log based category

2) Log –Based Category.

Oracle has now provided one of the most robust solutions for the data synchronization called the Oracle Golden Gate. The added advantage for the Oracle Golden Gate over other similar product from Oracle are, first it is not dependent on the platform. Oracle Golden Gate offers wide range of cross platform compatibility for different source and target. The other and most important reason for using OGG is its power to read from the logs of the live server (OLTP) system, causing the same data capture mechanism as done by other CDC process, but still not causing any performance issue with the OLTP system. The architecture for the OGG is such that it reads all the changes for the data in the table from the log files and process only those records which are COMMIT

The above diagram shows the part of pulling the data from the OLTP system while capturing the data from logs and not touching the OLTP DB. The data is read by OGG, at this point of time the OGG is said to be operating as an Extract and writing it on the file called “Trail files”, the data in these trail files are propriety of Oracle and can be read by OGG process. The same trail files can be read by the OGG at the target end and the CDC delta captured at source side is written on the target DB. The above data flow is for and Real TIME BI reporting, but changing the architecture OGG can be configure for Data recovery for Disaster recovery. The real time integration means that change captured can be send across the network with minimum time delay (less than 10 sec).

With the Data flow outlined above, OGG can be configured to capture data at schema level, table level and column level across different platforms and can be migrated to target system. OGG also gives the features to filter out the data from the source side itself, for example business might be interested in capturing the data for a particular time stamp filtering out the unnecessary data.

With this small document on OGG I have tried to show that data synchronization can be done across different platform with zero downtime. I have not gone into details for the component of Oracle Golden Gate, like Extract, Data Pump, trail files, Replicate and definition files. I have attended a three day session on Oracle Golden Gate as a part of my training inand was handling this application for over 1 year."

This part of my implementation may not fall exactly under the umbrella of BI but still for smart BI solutions firm may still want to implement the idea.

Also for the readers I have created a small prototype of the above architecture on my machine, if any one is interested please let me know, I will be more than happy.

Regards,
Siddharth gupta








No comments: