Oracle Change Data Capture – An Architecture for Cloud Adoption

All organizations in the modern business environment depend heavily on data. In the past, this data resided in massive databases, and batch ETL jobs moved huge volumes of it to warehouses and other data stores, where analytics informed critical operational decisions. As newer technologies were introduced, businesses began to modernize their database management systems and to rely on the cloud for analytics. However, they found that real-time data insights were hard to obtain and that existing databases could not be replaced seamlessly, even though the data and transactions in them were essential for analytics.

At the same time, businesses realized that the volume and velocity of their data were increasing rapidly. The critical need was scalable cloud adoption combined with Change Data Capture (CDC) from databases such as MySQL, SQL Server, and Oracle. Oracle CDC in particular became a common part of modern data integration use cases, and more and more companies began moving to event-driven architectures to take advantage of the distributed scalability that CDC from Oracle and other databases makes possible.

The advantage of Oracle CDC is that events can be extracted in real time as they are created, enriched with in-memory, SQL-based denormalization, and then delivered to Azure Cloud. This enables scalable, real-time, low-cost analytics with little impact on the source database.
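
The extract, enrich, and deliver flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ChangeEvent type, the CUSTOMERS lookup, and the deliver function are hypothetical names, a plain dictionary lookup stands in for the in-memory SQL join, and a Python list stands in for the cloud target.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical shape of one captured change."""
    operation: str  # "INSERT", "UPDATE", or "DELETE"
    table: str
    row: dict


# Hypothetical reference data held in memory for denormalization.
CUSTOMERS = {
    101: {"customer_name": "Acme Corp", "region": "EMEA"},
    102: {"customer_name": "Globex", "region": "APAC"},
}


def enrich(event: ChangeEvent, customers: dict) -> dict:
    """Denormalize one event by joining it to the in-memory customer data."""
    customer = customers.get(event.row.get("customer_id"), {})
    return {"op": event.operation, "table": event.table, **event.row, **customer}


def deliver(records, sink: list) -> None:
    """Stand-in for a cloud writer: here it only appends to a list."""
    sink.extend(records)


events = [
    ChangeEvent("INSERT", "ORDERS", {"order_id": 1, "customer_id": 101, "amount": 250.0}),
    ChangeEvent("UPDATE", "ORDERS", {"order_id": 1, "customer_id": 101, "amount": 300.0}),
]

cloud_sink = []
deliver((enrich(event, CUSTOMERS) for event in events), cloud_sink)
```

Each record that reaches the sink already carries the customer name and region, so the analytics side does not need to query the source database to resolve them.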

Before going further into the intricacies of Oracle CDC, it is important to know more about the concept of Change Data Capture.

Data warehousing often requires extracting relational data from one or more source databases and transporting it into the data warehouse for later analytics. The task of Change Data Capture is to quickly identify and process only the data that has changed. CDC therefore does away with the need to process entire tables or carry out full refreshes whenever a change is made at the source database. Without CDC, extraction becomes a tedious activity in which the entire contents of tables have to be moved into flat files, which then have to be loaded into the data warehouse.
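
The difference between a full refresh and applying only the changes can be shown with a small sketch. The function names and the (operation, key, row) tuple format are assumptions made for illustration, and plain dictionaries stand in for the source table and the warehouse table.

```python
def full_refresh(source_rows: dict) -> dict:
    """Copy every row, whether or not it has changed."""
    return dict(source_rows)


def apply_changes(target: dict, changes: list) -> int:
    """Apply only the changed rows to the target; return how many were touched."""
    for operation, key, row in changes:
        if operation == "DELETE":
            target.pop(key, None)
        else:  # INSERT and UPDATE both write the latest row image
            target[key] = row
    return len(changes)


# Initial load: all 1000 rows have to be moved once.
source = {i: {"id": i, "qty": 1} for i in range(1, 1001)}
warehouse = full_refresh(source)

# Three changes happen at the source.
source[1] = {"id": 1, "qty": 5}
del source[2]
source[1001] = {"id": 1001, "qty": 7}

# With CDC, only these three change records are processed.
changes = [
    ("UPDATE", 1, {"id": 1, "qty": 5}),
    ("DELETE", 2, None),
    ("INSERT", 1001, {"id": 1001, "qty": 7}),
]
touched = apply_changes(warehouse, changes)
```

After the three change records are applied, the warehouse copy matches the source again without the other rows being read or moved a second time.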

Oracle CDC does not depend on intermediate files that place the data outside the relational database. It captures the change data that results from INSERT, UPDATE, and DELETE operations made to user tables. The change data is then stored in a change table, which is itself a database object, and applications can consume this change data in a controlled manner.

What Is the Effect of Database Extraction with or Without Oracle CDC?

  • Extraction: With CDC, change data from INSERT, UPDATE, and DELETE operations is extracted immediately, at the same time that the changes take place in the source tables. Without CDC, extraction is only marginally effective for INSERT operations and problematic for UPDATE and DELETE operations, because the earlier data is no longer available in the table.
  • Staging: With CDC, change data is staged directly in relational tables and flat files are not required. Without CDC, the entire contents of tables are moved into flat files.
  • Interface: CDC provides an easy-to-use publish and subscribe interface through the DBMS_LOGMNR_CDC_PUBLISH and DBMS_LOGMNR_CDC_SUBSCRIBE packages. Without CDC, the process is susceptible to errors and requires extensive manual effort to administer.
  • Costs: Oracle CDC was included with the database server starting with Oracle 9i, and it reduces overhead costs by simplifying the extraction of change data. Without CDC, the process is expensive because the capture software must be written and maintained in-house or purchased from a third-party vendor.

These are some of the main benefits of Oracle CDC compared with capturing changes without CDC.
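
To make the publish and subscribe idea from the Interface point more concrete, here is a minimal Python model. It is not the API of the Oracle packages named above: the ChangeTable and Subscription classes and their method names are hypothetical, chosen only to show how several subscribers can each read the same published change data at their own pace.

```python
class ChangeTable:
    """Hypothetical publisher-side store of change rows."""

    def __init__(self):
        self.rows = []

    def publish(self, operation: str, row: dict) -> None:
        # Each change row gets the next sequence number.
        self.rows.append({"seq": len(self.rows) + 1, "op": operation, "row": row})


class Subscription:
    """Hypothetical subscriber with its own window over the change table."""

    def __init__(self, change_table: ChangeTable):
        self.change_table = change_table
        self.low = 0   # rows before this position were already consumed
        self.high = 0  # rows before this position are visible to the subscriber

    def extend_window(self) -> None:
        """Make all change rows published so far visible."""
        self.high = len(self.change_table.rows)

    def read(self) -> list:
        """Return the change rows inside the current window."""
        return self.change_table.rows[self.low:self.high]

    def purge_window(self) -> None:
        """Mark the rows in the current window as consumed."""
        self.low = self.high


change_table = ChangeTable()
analytics = Subscription(change_table)
audit = Subscription(change_table)

change_table.publish("INSERT", {"order_id": 1, "amount": 250.0})
change_table.publish("UPDATE", {"order_id": 1, "amount": 300.0})

analytics.extend_window()
first_batch = analytics.read()
analytics.purge_window()

change_table.publish("DELETE", {"order_id": 1})

analytics.extend_window()
second_batch = analytics.read()  # only the change published after the purge

audit.extend_window()
audit_batch = audit.read()       # the slower subscriber still sees everything
```

In this model the publisher never needs to know who reads the data, and each subscriber decides when it is done with a batch of changes.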

There are two modes of Oracle CDC

Oracle CDC supports two journalizing modes.

  • Synchronous mode: In this mode, triggers are placed on the source database so that any change made to the data is captured immediately. Change data is recorded for each SQL statement that performs a DML (Data Manipulation Language) operation, that is, an INSERT, UPDATE, or DELETE. The change data is captured as part of the same transaction that changes the data at the source. The synchronous mode is provided in both the Standard Edition and the Enterprise Edition of Oracle.
  • Asynchronous mode: In this journalizing mode, change data is captured from the redo log files after the SQL statement that performs the DML operation has been committed. The modified data is not captured as part of the transaction that changed the source table, so capturing it does not affect that transaction. There are three modes of asynchronous Oracle CDC: HotLog, Distributed HotLog, and AutoLog.
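
The trigger-based synchronous mode can be imitated with a small, runnable sketch. It uses SQLite through Python's sqlite3 module as a stand-in for Oracle, so the trigger syntax is SQLite's rather than Oracle's, and the orders and orders_ct tables are hypothetical. The point it illustrates is that the change rows belong to the same transaction as the DML that caused them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL);

-- Hypothetical change table that records every DML operation on orders.
CREATE TABLE orders_ct (operation TEXT, order_id INTEGER, amount REAL);

CREATE TRIGGER orders_ins AFTER INSERT ON orders BEGIN
    INSERT INTO orders_ct VALUES ('INSERT', NEW.order_id, NEW.amount);
END;
CREATE TRIGGER orders_upd AFTER UPDATE ON orders BEGIN
    INSERT INTO orders_ct VALUES ('UPDATE', NEW.order_id, NEW.amount);
END;
CREATE TRIGGER orders_del AFTER DELETE ON orders BEGIN
    INSERT INTO orders_ct VALUES ('DELETE', OLD.order_id, OLD.amount);
END;
""")

# Committed transaction: both change rows are kept.
conn.execute("INSERT INTO orders VALUES (1, 250.0)")
conn.execute("UPDATE orders SET amount = 300.0 WHERE order_id = 1")
conn.commit()

# Rolled-back transaction: the DELETE and its change row both disappear.
conn.execute("DELETE FROM orders WHERE order_id = 1")
conn.rollback()

captured = conn.execute(
    "SELECT operation, order_id, amount FROM orders_ct ORDER BY rowid"
).fetchall()
remaining_orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
conn.close()
```

Because the triggers run inside the source transaction, they add work to every DML statement. The asynchronous mode avoids this by reading the redo log after the commit, which this sketch does not attempt to model.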

Oracle first introduced the CDC feature in its 9i version. It tracked and recorded all changes made to the user tables in a database and stored them in change tables for use in ETL applications. The change data could then be processed and transferred to other databases and data warehouses. Subsequently, with Oracle 10g, Oracle released another form of CDC technology that used the redo logs of the source database together with a built-in tool called Oracle Streams.