So, Finally here I go with an article on CDC (Change Data Capture) implementation through an informatica which had been a long waiting from my side to be posted.

Requirement – To track on the data change in the source system with the earlier data pulled into the warehouse and to track on Updates, new Inserts and Delete on the data that is no more in the source. I would continue on this CDC implementation with an another article on loading of data forward into a table to save the change history for the data user to keep a track on the change history.

Mapping Design:

The mapping I have come up with for the CDC logic would be as below,

CDC_Data_Load_img2

 

(Click on the image for a better view)

The mapping would involve the Staging/source table which would be a daily truncate load from the source system as is with the current status on the transactions from a source. And the data are then to be sorted with respect to the primary key columns, sample table used would be as below. The sample implies on the target CDC table to have the same composite primary key columns.

Source Table:

CDC_Data_Load_img3

Target Table:

CDC_Data_Load_img4

The mapping would involve a simple logic to have a joiner (Full Outer Join) with the source table and the Target CDC table and to check for the matching and non matching records on the Primary key columns as is illustrated below,

CDC_Data_Load_img5

This being a Full Outer Join, enables the loading of all matching and non matching transactions to flow through and then the following three expressions should be defined to check on the Change type,

Flag_DELETE:

ISNULL(TransactionID) AND ISNULL(ProductID) AND ISNULL(ReferenceOrderID) and ISNULL(ReferenceOrderLineID)

 

Flag_INSERT:

ISNULL(TransactionID_old) AND ISNULL(ProductID_old) AND ISNULL(ReferenceOrderID_old) AND ISNULL(ReferenceOrderLineID_old)

 

Flag_UPDATE:

TransactionID=TransactionID_old AND

ProductID=ProductID_old AND

ReferenceOrderID=ReferenceOrderID_old AND

ReferenceOrderLineID=ReferenceOrderLineID_old AND

(TransactionDate != TransactionDate_old OR

TransactionType != TransactionType_old OR

Quantity != Quantity_old OR

ABS(ActualCost – ActualCost_old)>1)

 

Flag => IIF(v_Flag_Update=1,-1,IIF(v_Flag_Insert=1,-2,IIF(v_Flag_Delete=1,-3,0)))

where ‘0’ is for no change records, and then to use a filter transformation to filter out all the no change transactions with Flag=0.

Then a Router to define the grouping as per the result of the above expression on checking if the Transaction is an INSERT, UPDATE or DELETE.

 

CDC_Data_Load_img1

(Click on the image for a better view)

And the Update strategy transformation for each to perform and Update or Insert for each, and to note even for DELETE flags I will be going for a DD_UPDATE, which would update the CDC flag for the transactions not available in the source system to ‘D’ in the target table.

Also to note, that the Delete flow, would involve pulling the ports of CDC source table from the Router to the target and Update on the key column matching records with CDC_Flag=’D’.

And finally to make the session level changes for each target to define ‘Update as Update’ or ‘Update as Insert’ for the targets respectively. Upon running the mapping for the first time all transactions would be tagged as INSERT and then the proceeding runs will then depend on the source system changes to be captured through the CDC logic of our mapping.

Note – The DELETE transactions are the one which are no more in the source table, and hence these once after being tagged as ‘D’ for the CDC_Flag can be deleted from the target CDC table as well after making use of the change on to the History tracking table which will be covered more in detail with the next article shortly.

I will shortly be posting another phase of this topic on CDC and on tracking of history on a transaction change in your warehouse with the CDC. Hope you enjoyed reading this and helps you. Please post your comments and feedback for a better work and clear ideas. Thanks! !

 

Advertisements
%d bloggers like this: