Hello All,

Hope you all had a great Christmas and are looking forward to kicking off the new year soon. Before we do, a happy new year to all of you in advance..!!

Just as I was waking up to welcome a sunny Saturday, a thought kept nagging at me, telling me to do this: yes, to write a post on something I had been badly attacked by, though it was indeed nothing like a physical attack 🙂

So without further dawdling, let me explain how hard I had to fight against the Target Update Override, the feature that lets me customize the update strategy used in Informatica and allows us to update any target table even without the right primary key combination for the update. I have done this before, and every time I have made a blunder that taught me something, yet I kept repeating the same mistake. So I thought I should help fellow developers avoid it, and that is why I am writing this post on Target Update Override usage.

Normally, when you go for an update on your target, you would use an Update Strategy transformation linked to your target ports, with the primary key columns on which Informatica performs the update of the remaining columns you link to the target. But there are also scenarios where you want to update a few fields based on a conditional check over other (non-primary-key) columns.

There are two ways of doing this,

Method 1 –

Change the target structure on the Informatica side so that the ports/columns you will be checking to drive the update are marked as keys. In other words, enable the primary key property on these check fields in the target definition (on the Informatica end only) and then use the Update Strategy transformation in the usual way.

blog_primary_key_infa_2

Method 2 –

Write an UPDATE SQL override on the target. It is simply a matter of writing a usual UPDATE query on your target object in the mapping, but there is a strict syntax to follow, which also means proper spacing between the operators and the SQL literals. By default, the PowerCenter Integration Service updates target tables based on key values, but you can override the default UPDATE statement for each target in the target definition. As discussed, we would like to update on the basis of non-key columns, and this can be done something like below,

Syntax:

UPDATE SALES_TBL SET EMP_NAME = :TU.EMP_NAME, DATE_SHIPPED = :TU.DATE_SHIPPED, TOTAL_SALES = :TU.TOTAL_SALES WHERE EMP_ID = :TU.EMP_ID

Because the target ports must match the target column names, the UPDATE statement uses the keyword :TU to refer to the ports in the target transformation. If you modify the UPDATE portion of the statement, use :TU to specify the ports.
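For instance, to update on the basis of non-key columns as discussed, the override could look something like the sketch below (the choice of columns here is purely illustrative, reusing the sample table above):

UPDATE SALES_TBL SET TOTAL_SALES = :TU.TOTAL_SALES WHERE EMP_NAME = :TU.EMP_NAME AND DATE_SHIPPED = :TU.DATE_SHIPPED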

blog_primary_key_infa

I made a mistake..!!

I am adding this part just to save your time. I had made a very silly mistake while composing my update override query. For a very long time my session kept failing, and I was unable to figure out what went wrong; it only stated that my update query was wrong.

I revisited it close to a hundred times, changing everything I could, and still could not see what was wrong 😦

Finally a big brick hit my head: I copied my update SQL into Notepad and found that there was no space between the WHERE clause and my operand. I had not been able to spot this because the statement wrapped to the next line at exactly that point, just as in the picture below, which I took to be the usual word wrap..!!

blog_primary_key_infa_3
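To make the pitfall concrete, here is roughly what my override amounted to once the wrapped line was joined back together (a simplified reconstruction; the missing space before WHERE is the only difference):

Wrong: UPDATE SALES_TBL SET TOTAL_SALES = :TU.TOTAL_SALESWHERE EMP_ID = :TU.EMP_ID

Right: UPDATE SALES_TBL SET TOTAL_SALES = :TU.TOTAL_SALES WHERE EMP_ID = :TU.EMP_ID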

Hope this helps you and saves your time. Thanks for reading the post, and do share it, as it's fun to learn and share 🙂

So, finally, here I go with an article on CDC (Change Data Capture) implementation in Informatica, something I had long been waiting to post.

Requirement – To track data changes in the source system against the data already pulled into the warehouse, capturing updates, new inserts and deletes for data that is no longer in the source. I will continue this CDC implementation with another article on loading the data forward into a table that saves the change history, so the data user can keep track of it.

Mapping Design:

The mapping I have come up with for the CDC logic would be as below,

CDC_Data_Load_img2

 

(Click on the image for a better view)

The mapping involves the staging/source table, which is a daily truncate-and-load from the source system holding the current status of the transactions. The data is then sorted on the primary key columns; the sample table used is shown below. The sample assumes the target CDC table has the same composite primary key columns.

Source Table:

CDC_Data_Load_img3

Target Table:

CDC_Data_Load_img4

The mapping involves a simple piece of logic: a Joiner (Full Outer Join) between the source table and the target CDC table, checking for matching and non-matching records on the primary key columns, as illustrated below,

CDC_Data_Load_img5
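For reference, the join condition on the composite key would look something like this (assuming the CDC table side ports are suffixed with _old, as used in the expressions further below):

TransactionID = TransactionID_old AND ProductID = ProductID_old AND ReferenceOrderID = ReferenceOrderID_old AND ReferenceOrderLineID = ReferenceOrderLineID_old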

Being a Full Outer Join, it lets all matching and non-matching transactions flow through, and the following three expressions should then be defined to check the change type,

Flag_DELETE:

ISNULL(TransactionID) AND ISNULL(ProductID) AND ISNULL(ReferenceOrderID) AND ISNULL(ReferenceOrderLineID)

 

Flag_INSERT:

ISNULL(TransactionID_old) AND ISNULL(ProductID_old) AND ISNULL(ReferenceOrderID_old) AND ISNULL(ReferenceOrderLineID_old)

 

Flag_UPDATE:

TransactionID=TransactionID_old AND

ProductID=ProductID_old AND

ReferenceOrderID=ReferenceOrderID_old AND

ReferenceOrderLineID=ReferenceOrderLineID_old AND

(TransactionDate != TransactionDate_old OR

TransactionType != TransactionType_old OR

Quantity != Quantity_old OR

ABS(ActualCost - ActualCost_old) > 1)

 

Flag => IIF(v_Flag_Update=1,-1,IIF(v_Flag_Insert=1,-2,IIF(v_Flag_Delete=1,-3,0)))

where ‘0’ stands for no-change records; a Filter transformation is then used to filter out all the no-change transactions with Flag = 0.

Then a Router defines the groups as per the result of the above expression, checking whether the transaction is an INSERT, UPDATE or DELETE.
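Putting that together, the filter and router conditions would be along these lines (group names are illustrative, based on the flag values above):

Filter condition: Flag != 0

Router groups:

INSERT_GROUP: Flag = -2
UPDATE_GROUP: Flag = -1
DELETE_GROUP: Flag = -3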

 

CDC_Data_Load_img1

(Click on the image for a better view)

And an Update Strategy transformation for each group performs the update or insert; note that even for DELETE flags I will be going for DD_UPDATE, which updates the CDC flag to ‘D’ in the target table for the transactions no longer available in the source system.

Also note that the DELETE flow pulls the ports of the CDC table (the side read as a source into the joiner) from the Router to the target and updates the records matching on the key columns with CDC_Flag = ‘D’.
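As a rough sketch (branch names illustrative), the Update Strategy expressions for the three flows would be:

INSERT flow: DD_INSERT

UPDATE flow: DD_UPDATE

DELETE flow: DD_UPDATE, with an upstream expression port supplying the constant ‘D’ for the CDC_Flag column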

Finally, make the session-level changes for each target to define ‘Update as Update’ or ‘Update as Insert’ respectively. On the first run of the mapping all transactions will be tagged as INSERT, and the subsequent runs will then depend on the source system changes being captured through the CDC logic of our mapping.

Note – The DELETE transactions are the ones that are no longer in the source table. Once tagged with ‘D’ in CDC_Flag, they can also be deleted from the target CDC table, after the change has been applied to the history tracking table, which will be covered in more detail in the next article shortly.

I will shortly be posting the next phase of this topic, on CDC and tracking the history of transaction changes in your warehouse. Hope you enjoyed reading this and that it helps you. Please post your comments and feedback so I can do better and keep the ideas clear. Thanks!!

 

This was an unexpected bug we found in our code involving a mapping variable, and I was unable to figure out why the variable value was not being picked up as expected.

The issue was that I was trying to pass a value from a port to the variable and use it in the workflow. When I checked at the session level, the value was different, whereas the same port value flowing to the target looked correct.

With some repeated runs and tests, I was able to catch what had really gone wrong. I had a mapping variable defined, and my only options for the Aggregation property were ‘MAX’ or ‘MIN’, so I went on to select ‘MAX’.

Then came assigning the port value to the variable with SETVARIABLE(). In doing so I assumed the value would simply be assigned to the variable on every run. I was completely wrong with this assumption.
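In the mapping, the assignment itself was just a call in an expression port, something like the sketch below (the input port name is illustrative):

SETVARIABLE($$Var, in_NEW_VALUE)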

As stated in the Informatica Help content,

 “At the end of a successful session, the PowerCenter Integration Service compares the final current value of the variable to the start value of the variable. Based on the aggregate type of the variable, it saves a final current value to the repository. Unless overridden, it uses the saved value as the initial value of the variable for the next session run.”

Breaking this down for easy understanding: the variable $$Var is first assigned its default value. Then, on a successful run, when we try setting a new value (say the default was $$Var = 10, the new value is 15, and the aggregation type is ‘MAX’), the Integration Service compares the new value with the old one, finds the new value greater, and saves $$Var = 15.

And now this value of $$Var will not change or be overwritten until I get a value greater than 15. That was my issue: the new values were not being assigned as expected with SETVARIABLE().

Thus my solution was to reset it to a minimum value before the session starts. It is as simple as making it an override through the parameter file (.prm file), so I set ‘$$Var=0’ there. Now the new value is always the greater one and gets assigned to the variable on every run 🙂
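For reference, the override in the parameter file is just a one-liner under the session's section, something like the sketch below (folder, workflow and session names are placeholders):

[MyFolder.WF:wf_my_workflow.ST:s_my_session]
$$Var=0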

Hope this gives you a good understanding; please put down your questions and comments if any. Thanks!!

Let's shed some light on the basic architecture of PowerCenter, which is ABC-level learning for any Informatica developer. It's good to know how and where our Infa objects get created and how they run in the back end.

We might be well aware of the multi-tier architecture of Informatica: a client-server model with different services working together behind what we see on the PowerCenter client as objects being created, deleted and run. As developers, we must also be well aware of the repository database and its purpose: it saves all the metadata of the objects in Informatica.

Infa_Repo_Architecture

As shown here, we have the two major services, the Repository Service and the Integration Service. The Repository Service handles the communication between the PowerCenter components, responding to the PowerCenter client and reaching the repository database over a native driver connection for the metadata.

The PowerCenter Client uses the native protocol to communicate with the PowerCenter Repository Service and the PowerCenter Integration Service. The PowerCenter Client uses ODBC to connect to source and target databases.

How it works:

When we use the PowerCenter client and create a domain connection with a gateway from the client tool, the Service Manager on the server connects us to the PowerCenter Repository Service over an established TCP/IP connection. The repository database, which is defined during the installation of Informatica, is reached from the Repository Service through a native driver connection.

Every time a job/task runs, it runs through the client connecting to the Integration Service, which in turn interacts with the Repository Service to get the repository objects from the repository database. It uses TCP/IP to connect to the associated PowerCenter Repository Service and retrieve metadata.

The ODBC connections in the Client/Integration Service:
Whenever we create a source/target on our client machine, an independent ODBC connection is created from the client machine. The Integration Service works in a similar way, connecting to the source through an ODBC/native connection that is already established and defined.

The Flow:

The flow runs like this: when we connect from a client (e.g. Workflow Manager), a repository connection is made with the Repository Service, which retrieves metadata from the repository database as per the request from the client. Then, as we run a workflow, the client connects to the Integration Service, and the Integration Service picks up the command and connects to the source/target through an ODBC/native connection to fetch the data or write the output. The Integration Service also interacts with the Repository Service for the Infa object metadata, and the process runs as defined in the mapping within the Integration Service.

Hope this gives a bit of clarity on the backstage working of our Informatica. Do let me know your suggestions or corrections and queries on this article.

Hello friends, time for some learning. This time I have taken up Informatica variables and parameters, which I am very fond of 🙂

This one comes from a development effort my team was involved in, where we needed to pass a value from one task/session to another, because a check on the value we got from one session had to be used further along in the process.

blog_banner_session_Variable_assignment

For instance, my requirement was to count the records from session_1 and then count the records in session_2 with the same counter (incrementing from where the session_1 count ended), ending up with the total count of records. Something like: I had 100 records in session_1, so session_2 continues counting from 100+1 onwards. Looks simple, but it was quite a nail-biting and challenging experience trying to get it done in Informatica!!

And then we came across this approach/feature available in Informatica: ‘Pre-Session/Post-Session Variable Assignment’. It is all about assigning variable values between a session and its parent worklet/workflow, and vice versa. Interesting, right? If it is not clear, let's go through a step-by-step illustration for a better understanding.

1. All we need to do is define a counter variable/parameter $$v1_Count for session_1; the session then runs and populates $$v1_Count just as a usual mapping variable/parameter does.

2. Next, assign this value to a common variable (one that acts as an intermediary between our two sessions), which is a parent worklet/workflow variable defined like any other variable under the Edit Workflow/Worklet options. In my case I have a workflow variable $$wf_v_Count defined.

3. Now in session_1, go to the Components tab and edit the “Post-session variable assignment” option (it can be on failure or on success),

(i) Post-session on success variable assignment
(ii) Post-session on failure variable assignment

and I define it as – “Parent_Workflow/Worklet_Variable = Mapping_Variable/Parameter”

i.e., $$wf_v_Count = $$v1_Count

So now the $$v1_Count value is available for any task, session or worklet under the same parent to pick up and use for processing.

4. In our case, for the next session_2, go to the Components tab -> “Pre-session variable assignment” and define it to assign the value from $$wf_v_Count to that session's mapping variable/parameter $$v2_Count.

So we define it as “Mapping_Variable/Parameter = Parent_Workflow/Worklet_Variable”

i.e., $$v2_Count = $$wf_v_Count

The same value can be used all over the parent workflow/worklet, in any decision task or in link conditions. Thus we have now passed the value from one session to the other, and it can be processed thereafter. Hope this was an interesting and helpful piece for you.
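For example (the threshold here is illustrative), the workflow variable can be used directly in a link condition or in a Decision task expression, something like:

$$wf_v_Count > 100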

Do post your comments on any corrections or queries with respect to this. Share this if it can help someone.

When I first came across the Normalizer, all I thought it could do was take multiple values held in one row across several columns and multiply/normalize the source into several rows. Say I have a source table tracking each customer's instalment on a monthly basis; my requirement now is to write these out as different transactions to the target, and normalizing them through this transformation comes in handy here.

All we might know

So my assumption was that the limit of this transformation is just normalizing denormalized data, but then I got to know that we can make good use of some more of its features. In the Normalizer tab of the transformation we define the ports that are to pass through, and in doing so we also define the ‘Occurs’ (occurrence) value; this is where we specify the column that has to be normalized from multiple columns into rows. In our instance I define the monthly amount column and give it an Occurs value of 12.
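As a rough sketch of how the Normalizer tab could look for this instance (column names and datatypes are illustrative):

Column Name      Level   Occurs   Datatype
Customer_ID      1       0        string
Month_Amount     1       12       number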

Understanding the GCID and GK ports –

Now my Normalizer will have 12 input ports, and the output for these will be 12 rows with different amount values while the other columns carry the same value. I also need to keep track of the months, so I would like one extra column, which I can define as Month_Name, to give the month. How do I do this? Quite simple: as we can see in the Ports tab, along with the 12 ports of Month_Amount there is a new port auto-created as ‘GCID_(the reoccurring column name)’. This holds an integer value (Generated Column ID), which repeats from 1 to 12 for each amount with respect to the month.

Now all we have to do is pull this port as well into the next Expression transformation and write an expression that defines the month name depending upon the GCID value. Also, there is another port created by default in the Normalizer for the multiple-occurring column, GK_{column_name}. This is the Generated Key column, which can be used if a unique key value is required.
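For instance, the Month_Name derivation in the Expression transformation could be something as simple as the sketch below (the GCID port name follows the GCID_{column_name} convention for our Month_Amount column):

DECODE(GCID_Month_Amount, 1, 'January', 2, 'February', 3, 'March', 4, 'April', 5, 'May', 6, 'June', 7, 'July', 8, 'August', 9, 'September', 10, 'October', 11, 'November', 12, 'December', 'Unknown')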

VSAM and the Pipeline Type:

Also, the Normalizer I have used as a transformation so far is only the pipeline Normalizer, as in the example we just discussed. There is also a type known as the VSAM Normalizer, which is a Source Qualifier-like transformation for a COBOL source (COBOL VSAM source). The VSAM COBOL source gives denormalized data, which is then normalized through the VSAM Normalizer. The VSAM Normalizer does not allow editing the ports, and its Normalizer tab is read-only. A VSAM Normalizer transformation has one input port for a multiple-occurring column, unlike the pipeline Normalizer, which has multiple input ports for the multiple-occurring value.

I have not yet had a chance to get my hands wet with the VSAM type; I hope to do so some day soon and shall post on many more such experiences. Until then, take care and happy learning 🙂
