Hello all, today I am going to discuss some key best practices that I have followed or come across in my experience with Informatica. Most of them aim to keep ambiguity low and make it easy for the other developers around you to understand and share the work; a few are about getting better performance without making things complex. Let’s quickly go through them.


Shortcuts (Shared Folder) – Using a shared folder to share sources/targets across multiple folders keeps your repository lean and clean. A shared folder lets you make the best use of re-usability in Informatica, maintains consistency and standards, and keeps the code intact with fewer duplicate objects being created.

Re-usability – It’s always good to take advantage of the re-usability options in Informatica, such as Mapplets, Worklets, reusable transformations and tasks, which greatly reduce design effort and keep things simple.

Simplicity – Keep your mapping/code as simple as you can; it can be better to have multiple mappings doing one thing each than one mapping doing something complex. Your mapping or code should be easy to understand and debug in the future.

Source Qualifier – This is the place where you can tap just the data you need and filter out the columns/rows that are not relevant to the requirement. It’s always better to drop that data at the SQ than to let it be processed and filtered out later in the mapping.

Filter Transformation – Place your Filters as close as possible to the Source, so that less data travels across your mapping, which improves its performance.

Most important of all, avoid complex expressions and logic in your filter condition. It’s better to put all of that logic in an Expression transformation and derive a Flag port that travels on to the Filter, which keeps the Filter condition simple.
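Here is a minimal Python sketch of the flag idea, outside PowerCenter. An "expression" step evaluates the rule once per row and exposes the result as a flag, and the "filter" step tests nothing but that flag. The field names (`status`, `amount`, `country`), the flag name `load_flag`, and the rule itself are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]


def expression_step(rows: Iterable[Row]) -> Iterator[Row]:
    """Like an Expression transformation: all the logic lives here and
    its result leaves as a single flag port."""
    for row in rows:
        # Hypothetical business rule, evaluated once per row.
        load_flag = (
            row["status"] == "ACTIVE"
            and row["amount"] > 0
            and row["country"] in ("AU", "NZ")
        )
        yield {**row, "load_flag": load_flag}


def filter_step(rows: Iterable[Row]) -> Iterator[Row]:
    """Like a Filter transformation whose condition is just the flag."""
    return (row for row in rows if row["load_flag"])


# Hypothetical source rows.
source = [
    {"id": 1, "status": "ACTIVE", "amount": 120, "country": "AU"},
    {"id": 2, "status": "CLOSED", "amount": 50, "country": "AU"},
    {"id": 3, "status": "ACTIVE", "amount": 0, "country": "NZ"},
    {"id": 4, "status": "ACTIVE", "amount": 75, "country": "NZ"},
]

loaded: List[Row] = list(filter_step(expression_step(source)))
print([row["id"] for row in loaded])  # [1, 4]
```

If the rule changes, only the expression step changes; the filter condition stays a one-liner.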

Expression Transformation – Where possible, prefer operators (||, +, /) over functions; also, processing and loading Integer data is generally faster than String data. Make use of local variables, and keep calculations as simple as possible by breaking complex calculations into several variables, which makes them less redundant and easier to understand.

Joiners – If possible, join the data in the SQ. If you must use a Joiner transformation, make the source with fewer records the Master and the source with more records the Detail.
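To see why the smaller source belongs on the Master side, here is a plain hash join in Python that follows the way the Joiner is generally described: the master rows are cached, and the detail rows are streamed past that cache. It illustrates the idea only, not how the Integration Service builds its caches; the table and column names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, object]


def joiner(master: Iterable[Row], detail: Iterable[Row], key: str) -> Iterator[Row]:
    """Normal (inner) join: cache the master rows, stream the detail rows."""
    # The cache holds every master row, so a small master means a small cache.
    cache: Dict[object, List[Row]] = defaultdict(list)
    for m in master:
        cache[m[key]].append(m)

    # Each detail row is read once and probed against the cache.
    for d in detail:
        for m in cache.get(d[key], []):
            yield {**m, **d}


# Hypothetical sources.
countries = [  # fewer rows: the Master
    {"country_id": 1, "country": "AUSTRALIA"},
    {"country_id": 2, "country": "INDIA"},
]
employees = [  # more rows: the Detail
    {"emp_id": 10, "country_id": 1},
    {"emp_id": 11, "country_id": 2},
    {"emp_id": 12, "country_id": 3},  # no master match, dropped by a normal join
]

joined = list(joiner(countries, employees, "country_id"))
print([(r["emp_id"], r["country"]) for r in joined])
# [(10, 'AUSTRALIA'), (11, 'INDIA')]
```

Swapping the arguments gives the same result here, but the cache would then hold the larger source.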

Parameterize your code – Wherever possible, parameterize your code. This makes future changes easy, since many of them can then be made without touching the Informatica code. Avoid hard-coding values; parameterize them to ease any changes going forward.
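A rough Python sketch of what parameterizing buys you: the query text carries `$$` placeholders and the values live in a separate file, so a change means editing the file, not the mapping. The file layout mimics a PowerCenter parameter file (a `[Global]` section and a `[Folder.WF:workflow.ST:session]` section holding `$$Name=value` lines), but the parser, the precedence rule, and all folder, workflow, session, and parameter names are my own assumptions for illustration.

```python
import re
from typing import Dict, Optional


def parse_param_file(text: str) -> Dict[str, Dict[str, str]]:
    """Read '[section]' headers and 'Name=value' lines into nested dicts."""
    sections: Dict[str, Dict[str, str]] = {}
    current: Optional[Dict[str, str]] = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
        elif "=" in line and current is not None:
            name, value = line.split("=", 1)
            current[name.strip()] = value.strip()
    return sections


def resolve(template: str, params: Dict[str, str]) -> str:
    """Replace every $$Name in the template, failing loudly on a missing one."""
    def lookup(match):
        name = match.group(0)
        if name not in params:
            raise KeyError(f"parameter {name} is not defined")
        return params[name]

    return re.sub(r"\$\$\w+", lookup, template)


# Hypothetical folder, workflow, session and parameter names.
PARAM_FILE = """
[Global]
$$Country=AUSTRALIA
$$MinSalary=1000

[MyFolder.WF:wf_salary_load.ST:s_m_salary_load]
$$MinSalary=5000
"""

sections = parse_param_file(PARAM_FILE)
# Assumption for this sketch: the session-level section overrides [Global].
params = {**sections["Global"],
          **sections["MyFolder.WF:wf_salary_load.ST:s_m_salary_load"]}

query = "SELECT * FROM Employee WHERE Country = '$$Country' AND Salary >= $$MinSalary"
print(resolve(query, params))
# SELECT * FROM Employee WHERE Country = 'AUSTRALIA' AND Salary >= 5000
```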

SQL Query – One of the most important things to check is how well your SQ query, or any SQL-related process, performs on the DB side. The relational source/target objects on the DB side should be able to handle UPDATE/INSERT/SELECT operations efficiently. Check the SQL execution plan and add indexes (clustered/non-clustered) on the DB table as and when required.
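A small, runnable way to watch an index change the execution plan, using Python's built-in `sqlite3` as a stand-in for your real database. SQLite's `EXPLAIN QUERY PLAN` is not SQL Server's execution plan and has no clustered/non-clustered choice, and the table, data, and index name are made up, so treat this only as a demonstration of the check-the-plan-then-index habit.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Made-up table and data.
conn.execute(
    "CREATE TABLE Employee (Employee_ID INTEGER PRIMARY KEY, Country TEXT, Salary REAL)"
)
conn.executemany(
    "INSERT INTO Employee (Country, Salary) VALUES (?, ?)",
    [("AUSTRALIA" if i % 3 == 0 else "INDIA", 1000 + i) for i in range(1000)],
)

QUERY = "SELECT Employee_ID, Salary FROM Employee WHERE Country = ?"


def plan(sql: str) -> str:
    """Return SQLite's query-plan text for the statement."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("AUSTRALIA",)).fetchall()
    # The last column of each plan row holds the human-readable step.
    return " | ".join(str(row[-1]) for row in rows)


before = plan(QUERY)  # no index on Country yet: a full table scan
conn.execute("CREATE INDEX ix_employee_country ON Employee (Country)")
after = plan(QUERY)   # now the plan should search via ix_employee_country

print(before)
print(after)
```

The exact wording of the plan text varies between SQLite versions, but the switch from a scan to a search on the index is the part to look for.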

Router transformation – Keep the conditional expressions on your Router groups simple, and avoid per-row calculations/functions in the Router that eat up time and memory while your session runs. Instead, do the conditional checks for grouping in an Expression before the Router and reduce them to a Flag that decides the group, so the Router performs at its best. Never use a Filter after a Router, as it makes the logic pointless: the Router already does the grouping, so any filter conditions can be folded into the group conditions (which, again, can be handled in an Expression that derives the Flags to group by).
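A minimal Python sketch of the same pattern: the expression step derives a single flag per row, and the router only compares that flag, sending each row to every group whose condition it meets and anything left over to a DEFAULT group. The salary bands, the flag name `band_flag`, and the group names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List

Row = Dict[str, object]


def expression_step(row: Row) -> Row:
    """Work out the grouping once per row and keep only a flag (hypothetical bands)."""
    salary = row["salary"]
    if salary >= 100_000:
        band = "HIGH"
    elif salary >= 50_000:
        band = "MID"
    else:
        band = "LOW"
    return {**row, "band_flag": band}


def router(rows: Iterable[Row], groups: List[str]) -> Dict[str, List[Row]]:
    """Each group condition is just band_flag == group; a row goes to every
    group it matches, and rows matching none go to DEFAULT."""
    out: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        matched = False
        for group in groups:
            if row["band_flag"] == group:
                out[group].append(row)
                matched = True
        if not matched:
            out["DEFAULT"].append(row)
    return dict(out)


employees = [
    {"id": 1, "salary": 120_000},
    {"id": 2, "salary": 60_000},
    {"id": 3, "salary": 20_000},
]
routed = router((expression_step(e) for e in employees), ["HIGH", "MID"])
print({group: [r["id"] for r in rows] for group, rows in routed.items()})
# {'HIGH': [1], 'MID': [2], 'DEFAULT': [3]}
```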

Lookups – Try to use unconnected lookups wherever you can, as they let you reuse the same Lookup multiple times and use the returned value for tests/calculations in an expression. Use a filter on your Lookup source to avoid caching unwanted data, which optimizes cache use and improves performance. Try using a Joiner instead of a connected lookup wherever possible.

In case you use an unconnected lookup and wish to return multiple values to the expression, concatenate the columns into one return value and split them after it is returned 🙂
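Here is a minimal Python sketch of that trick. The lookup cache maps one key to a single string packing several columns; the caller gets that one value back (or `None`, like the NULL an unconnected lookup returns when nothing matches) and then splits it. In PowerCenter itself the split would be done in the Expression with functions such as INSTR and SUBSTR. The `|` delimiter and the customer columns are assumptions, and the delimiter must never occur in the data.

```python
from typing import Dict, Iterable, Optional, Tuple

DELIM = "|"  # assumed never to appear inside the packed values


def build_lookup(rows: Iterable[Dict[str, object]]) -> Dict[object, str]:
    """Cache keyed on customer_id; the single return value packs three columns."""
    return {
        r["customer_id"]: DELIM.join([str(r["name"]), str(r["city"]), str(r["credit_limit"])])
        for r in rows
    }


def lkp_customer(cache: Dict[object, str], customer_id: object) -> Optional[str]:
    """Unconnected-lookup style call: one key in, one value (or None) out."""
    return cache.get(customer_id)


def split_return(value: Optional[str]) -> Tuple[Optional[str], Optional[str], Optional[int]]:
    """Unpack the concatenated return value back into separate fields."""
    if value is None:
        return None, None, None
    name, city, limit = value.split(DELIM)
    return name, city, int(limit)


# Hypothetical lookup source.
cache = build_lookup([
    {"customer_id": 7, "name": "Asha", "city": "Perth", "credit_limit": 5000},
])
print(split_return(lkp_customer(cache, 7)))   # ('Asha', 'Perth', 5000)
print(split_return(lkp_customer(cache, 99)))  # (None, None, None)
```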

Union Transformation – The fewer Union transformations, the better your mapping performance. Try to avoid Unions, and if you do use them, keep their number to a minimum by unioning the sources in the SQ itself, as the Union, being an active transformation, consumes a lot of memory and processing time.

I will keep updating this post with more best practices in Informatica as I come across them. Please do share any you have that can help people around here. Learning and sharing is a joy 🙂

Let’s shed some light on the basic architecture of PowerCenter, which is the ABC of learning for any Informatica developer. It’s good to know how and where our Infa objects get created and how they run in the back end.

As we might be well aware, Informatica has a multi-tier architecture: a client-server model with different services running together behind what we see in the PowerCenter client as objects being created, deleted and run. As developers, we must be well aware of the Repository database and its purpose: it saves all the metadata about the objects in Informatica.

[Figure: Informatica repository architecture]

Here we have the two major services, the Repository Service and the Integration Service. The Repository Service handles communication between the PowerCenter components: it responds to the PC Client and connects to the Repository Database over a native driver connection for the metadata.

The PowerCenter Client uses a native protocol to communicate with the PowerCenter Repository Service and the PowerCenter Integration Service. The PowerCenter Client uses ODBC to connect to source and target databases.

How it works:

When we use the PC Client, we create a domain connection through a gateway from the client tool to the Service Manager on the server, which then connects to the PowerCenter Repository Service over a TCP/IP connection. The Repository Database is defined during the installation of Informatica, and the Repository Service connects to it with a native driver connection.

Every time a job/task runs, the Client connects to the Integration Service, which in turn interacts with the Repository Service to get the repository objects from the Repository Database. It uses TCP/IP to connect to the associated PowerCenter Repository Service and retrieve the metadata.

The ODBC Connection in the Client/Integration services:
Whenever we create a source/target on our client machine, an independent ODBC connection is made from the client machine; the Integration Service works in a similar way, connecting to the source through an ODBC/native connection that has already been established and defined.

The Flow:

The flow runs like this: when we connect with a client (e.g. Workflow Manager), a repository connection is made with the Repository Service, which retrieves the metadata from the Repository database as requested by the client. Then, when we run a workflow, the Client connects to the Integration Service, which fetches the command and connects to the source/target through an ODBC connection to fetch the data or write the output. The IS interacts with the Repository Service for the Infa object metadata, and the process runs in the Integration Service as defined in the mapping.

I hope this gives a bit of clarity on the backstage workings of Informatica. Do let me know your suggestions, corrections and queries on this article.

Hello friends..

I have been trying to get to this article for a long time, but have been busy with other commitments. Finally, it is time to share an interesting topic with all the Informatica nerds.

It is not a functional or logical trick that I am here to share with you; it is just about how I made use of the Sorter without ever meaning to sort anything!

Confused?? OK, without much ado, let’s get on with it..

I needed to compare two results from the same source (a SQL table) and then update that same table; it was a kind of data clean-up activity I was involved in. I finished my mapping design and went on to run the session. But to my surprise, all I could see were the queries: the two source reads were competing with each other, and there was an insert/update on the target end as well, putting a lot of pressure on the DB and causing the process to hang, with the data load literally halting.

[Figure: Using Sorter not for sorting]

Then a bulb glowed (a bit late though).. and the thought was: why can't I hold the data at one point and open the tap only once all of it has arrived? But how?

As we all know, ‘The Integration Service passes all incoming data into the Sorter transformation before it performs the sort operation.’ I made use of just this feature: as you might have guessed by now, I placed two Sorters on the two incoming source flows before they are joined. Each source is read with its own SQ SQL override, all the transformations are performed, and finally the data is written to the same SQL table.
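A toy Python model of why this works (it is not how the Integration Service actually schedules its threads): generators stand in for the pipeline, and an event log records when rows are read and written. Without a blocking step each row is written right after it is read, so reads and writes on the same table interleave; with a Sorter-like step that must consume its whole input first, every read finishes before the first write.

```python
from typing import Iterable, Iterator, List, Tuple

events: List[Tuple[str, int]] = []


def source(ids: Iterable[int]) -> Iterator[int]:
    """Stands in for the Source Qualifier reading the table."""
    for i in ids:
        events.append(("read", i))
        yield i


def sorter(rows: Iterable[int]) -> Iterator[int]:
    """Like the Sorter: nothing comes out until every row has come in."""
    buffered = sorted(rows)  # consumes the whole input here
    yield from buffered


def target(rows: Iterable[int]) -> None:
    """Stands in for the writer updating the same table."""
    for i in rows:
        events.append(("write", i))


target(source([3, 1, 2]))
streaming = list(events)

events.clear()
target(sorter(source([3, 1, 2])))
blocked = list(events)

print(streaming)
# [('read', 3), ('write', 3), ('read', 1), ('write', 1), ('read', 2), ('write', 2)]
print(blocked)
# [('read', 3), ('read', 1), ('read', 2), ('write', 1), ('write', 2), ('write', 3)]
```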

I hope this comes to your rescue too when you are at a similar, helpless point!!

Share and help others learn; it's a joy 🙂

When I first came across the Normalizer, all I was thinking about was taking multiple values spread across several columns of one row and normalizing the source into several rows. Say I have a source table that tracks each customer’s instalment on a monthly basis, with one amount column per month, and my requirement is to put these down as separate transactions in the target; normalizing through this transformation comes in handy here.

What we all might know:

So my assumption was that the limit of this transformation is just normalizing denormalized data, but then I got to know that we can make much more use of some of its other features. The Normalizer tab in the transformation is where we define the ports that are to pass through, and in doing so we also define the ‘Occurs’ value (the occurrence count); this is where we define the column that has to be normalized from multiple columns into rows. In our instance I define the monthly amount column, with Occurs set to 12.

Understanding the GCID and GK ports –

Now my Normalizer has 12 inputs, and the output for these is 12 rows with different Amount values, the other columns keeping the same values. I also need to keep track of the months, with one extra column, which I can define as Month_Name, giving the month. How do I do this? Quite simple: in the Ports tab, along with the 12 Month_Amount ports, a new port is auto-created as ‘GCID_(the reoccurring column name)’. This holds an integer value (Generated Column ID) that repeats 1 to 12 for each source row, one value per month’s Amount.

Now all we have to do is pull this port into the next Expression transformation and have the expression define the month name from the GCID value. Also.. we have another port created by default in the Normalizer for the multiple-occurring column: GK_{column_name}. This is the Generated Key column, which can be used as a unique key value if required.
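A minimal Python sketch of what the pipeline Normalizer does in this example, plus the Expression step that turns GCID into a month name (in the mapping that could be a DECODE on the GCID port). The input port names `Month_Amount1` to `Month_Amount12`, and the GK behaviour assumed here (one generated key per source row, starting at 1 and shared by all of that row's output rows), are my assumptions, so check them against your own Normalizer.

```python
from typing import Dict, List

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def normalize(rows: List[Dict[str, object]], occurs: int = 12) -> List[Dict[str, object]]:
    """One source row with Month_Amount1..N becomes N rows, each tagged
    with a GCID (1..N) and the GK of the source row it came from."""
    out: List[Dict[str, object]] = []
    for gk, row in enumerate(rows, start=1):  # assumed: GK starts at 1, one per source row
        for gcid in range(1, occurs + 1):
            out.append({
                "Customer_ID": row["Customer_ID"],
                "GK_Month_Amount": gk,
                "GCID_Month_Amount": gcid,
                "Month_Amount": row[f"Month_Amount{gcid}"],  # assumed port names
            })
    return out


def month_name(gcid: int) -> str:
    """Expression step: map the GCID value 1..12 to a month name."""
    return MONTHS[gcid - 1]


# Hypothetical source row: one customer, twelve monthly instalments.
source = [{"Customer_ID": "C1", **{f"Month_Amount{m}": m * 100 for m in range(1, 13)}}]
rows = normalize(source)
for r in rows[:3]:
    print(r["GK_Month_Amount"], r["GCID_Month_Amount"],
          month_name(r["GCID_Month_Amount"]), r["Month_Amount"])
# 1 1 Jan 100
# 1 2 Feb 200
# 1 3 Mar 300
```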

VSAM and the Pipeline Type:

The Normalizer I have used so far is only the Pipeline Normalizer, as in the example above. There is also a type known as the VSAM Normalizer, which is a Source Qualifier-like transformation for a COBOL source (COBOL VSAM source). The VSAM COBOL source gives denormalized data, which is then normalized through the VSAM Normalizer. The VSAM Normalizer does not allow you to edit the ports, and its Normalizer tab is read-only. A VSAM Normalizer transformation has one input port for a multiple-occurring column, unlike the Pipeline Normalizer, which has multiple input ports for the multiple-occurring value.

I have not had a chance to get my hands wet with the VSAM type yet; I hope to do so some day soon and will share many more such experiences. Until then, take care and happy learning 🙂

I was once in need of a guide to get a bulk import/export done, and I was stuck because I could not find one good article to help me. It was all in bits and pieces, hard to find in one place the way I expected. So let me take this up and get it down here; I hope it helps for a better understanding.


We had some data in a flat file populated by an ETL tool (Informatica), and now I had to load the same data into my SQL database table as a bulk load. For bulk loads we have the BCP utility, which comes in handy for loading from a data file into a SQL table and vice versa.

BCP has both import and export commands to load data into and out of a SQL table. The standard BCP command syntax is as follows:

bcp {db_name.schema_name.Table_Name} in/out {File_Path\file_name.txt} -S {server_name} -c -t "field_terminator" -T

where:
-S -> server name
-c -> character mode, every field is read/written as character data (without -c, -n or a format file, bcp prompts for the storage format of each field)
-t -> field terminator (for example "\t" for tab-delimited, "," for comma-delimited)
in/out -> ‘in’ to import from the data file into the SQL table, ‘out’ to export from the SQL table to the data file
-T -> use a trusted connection (for a non-trusted connection, give ‘-U’ user_name and ‘-P’ password instead)

This simple bcp command can help you load data faster and in an elegant way.
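If you drive bcp from a script, building the argument list in one place keeps the flags consistent. The Python sketch below only assembles the command described above (character mode, `-t` field terminator, trusted or SQL login); the database, table, server, and file names are hypothetical, and actually running it needs the bcp utility on the machine.

```python
from typing import List, Optional


def bcp_args(table: str, direction: str, data_file: str, server: str,
             field_terminator: str = "\\t", user: Optional[str] = None,
             password: Optional[str] = None) -> List[str]:
    """Assemble a character-mode bcp import ('in') or export ('out') command."""
    if direction not in ("in", "out"):
        raise ValueError("direction must be 'in' or 'out'")
    args = ["bcp", table, direction, data_file, "-S", server,
            "-c", "-t", field_terminator]  # "\t" (backslash-t) is bcp's tab escape
    if user is None:
        args.append("-T")  # trusted connection
    else:
        args += ["-U", user, "-P", password or ""]  # SQL Server login
    return args


# Hypothetical database, table, server and file names.
args = bcp_args("SalesDB.dbo.Customer", "in", r"C:\data\customer.txt", "MYSERVER", ",")
print(" ".join(args))
# bcp SalesDB.dbo.Customer in C:\data\customer.txt -S MYSERVER -c -t , -T

# On a machine with bcp installed you could then run:
# import subprocess; subprocess.run(args, check=True)
```

Passing a list (rather than one string) to `subprocess.run` avoids quoting problems with paths and terminators.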

There is one add-on feature, a format file, that you can pass to the bcp command to map the data file to the table more tightly. It provides a flexible system for writing data files that require little or no editing to comply with other data formats.

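We need to create the format file before loading the data with the bcp command; the syntax is as follows:

bcp {db_name.schema_name.Table_Name} format nul -c -f {Format_File_Path\file_name.fmt} -S {server_name} -T

format nul -> create a format file from the table definition instead of copying any data
-f -> the path and name of the format file to be created (.fmt, or .xml when -x is also given)

When loading, the same -f option on the in/out command points bcp at that format file.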

We can have both XML and non-XML format files; here I go with a non-XML format file, whose structure looks as below:

[Figure: sample non-XML format file]
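The non-XML layout is plain text: a version line, a column count, then one line per field giving the host-file field order, host-file data type, prefix length, host-file data length, terminator, server column order, server column name, and collation. A Python sketch that writes such a file for character data (what `-c` assumes) follows; the version `14.0`, the table columns, and their lengths are assumptions, so match the version to your bcp release or simply generate the file with the `format nul` command.

```python
from typing import List, Tuple

Column = Tuple[str, int]  # (server column name, max length in the data file)


def non_xml_format_file(columns: List[Column], version: str = "14.0",
                        field_terminator: str = "\\t",
                        row_terminator: str = "\\r\\n") -> str:
    """Build a non-XML bcp format file for an all-character data file.
    The default version 14.0 is an assumption; match it to your bcp release."""
    lines = [version, str(len(columns))]
    for order, (name, length) in enumerate(columns, start=1):
        # The last field ends the row, so it takes the row terminator.
        terminator = row_terminator if order == len(columns) else field_terminator
        # field order, type, prefix length, data length, terminator,
        # server column order, server column name, collation ("" = none given)
        lines.append(f'{order}\tSQLCHAR\t0\t{length}\t"{terminator}"\t{order}\t{name}\t""')
    return "\n".join(lines) + "\n"


# Hypothetical two-column table.
print(non_xml_format_file([("Customer_ID", 10), ("Customer_Name", 50)]), end="")
```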

I hope this helps you bulk load your data between a data file and a SQL table. Learn and share 🙂

I was just curious to write about this, as not everyone may be aware of it, and it can be really useful in finding the real BUG in your Informatica code. This one good thing in Informatica has a lot to say. When an error is thrown while running your Informatica session and a row is written to the bad file, you might wonder how the bad file can help, or why we need a bad file just to see a rejected row in it!!

The session log combined with the reject (bad) file can be a lifesaver for any Informatica developer. The bad file is saved at the path $PMBadFileDir (by default, a bad file directory in your Infa root folder).

So this is for all those who are not aware of it, and for those who are aware but lack some clarity on it. When data is rejected from being inserted into the target and gets written to the bad file, this is mostly due to a data type mismatch, a NULL value being inserted into a target column that does not allow NULLs, and many more such data issues. How do we track what is what here??

That is where the bad file comes into the picture to help us out. There are two kinds of indicators in a bad file: the row indicator and the column indicator. The row indicator describes the whole row, while each column indicator says whether that column's value is valid or invalid data.

Tip: For clear and easy reading of the bad file, save it as a CSV file and open it.

Row Indicator:

The first column in the reject file is the Row Indicator, which tells whether the row was meant for insert, update, delete or reject. It is basically a flag that reflects the update strategy for the data row. When the Commit Type of the session is configured as “User-defined”, the row indicator also tells whether the transaction was rolled back due to a non-fatal error, or whether the committed transaction failed.

[Figure: row indicator values]

Column Indicator:

Next is the column indicator: each column value in the bad file comes with its own column indicator, which describes the data that was being inserted/updated in the target and may have been rejected. The column indicator values have the meanings shown below.

[Figure: column indicator values]
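A small Python sketch for reading a bad file line. The indicator meanings in the dictionaries, and the layout the parser assumes (the row indicator, its own indicator, then a value/indicator pair for each column, comma-separated), are written from memory of the PowerCenter documentation's reject-file sample, so verify them against your version's documentation and a real bad file from your session. The column names and the sample line are hypothetical, and a plain `split` breaks if a value itself contains a comma.

```python
from typing import Dict, List, Tuple

# Meanings as I recall them from the PowerCenter docs; verify for your version.
ROW_INDICATORS = {
    "0": "Insert", "1": "Update", "2": "Delete", "3": "Reject",
    "4": "Rolled-back insert", "5": "Rolled-back update", "6": "Rolled-back delete",
    "7": "Committed insert", "8": "Committed update", "9": "Committed delete",
}
COLUMN_INDICATORS = {"D": "Valid data", "O": "Overflow", "N": "Null", "T": "Truncated"}


def parse_reject_line(line: str, columns: List[str]) -> Tuple[str, Dict[str, Tuple[str, str]]]:
    """Split one bad-file line into its row meaning and per-column (value, meaning)."""
    fields = line.rstrip("\r\n").split(",")
    row_meaning = ROW_INDICATORS.get(fields[0], f"Unknown ({fields[0]})")
    pairs = fields[2:]  # assumed layout: skip the row indicator and the indicator after it
    if len(pairs) != 2 * len(columns):
        raise ValueError(f"expected {2 * len(columns)} value/indicator fields, got {len(pairs)}")
    result: Dict[str, Tuple[str, str]] = {}
    for i, name in enumerate(columns):
        value, flag = pairs[2 * i], pairs[2 * i + 1]
        result[name] = (value, COLUMN_INDICATORS.get(flag, f"Unknown ({flag})"))
    return row_meaning, result


# Hypothetical bad-file line and target columns.
row, cols = parse_reject_line(
    "0,D,101,D,,N,Ravi,D,12345678901234,T",
    ["Employee_ID", "Last_Name", "First_Name", "Phone"],
)
print(row)  # Insert
print({name: meaning for name, (value, meaning) in cols.items() if meaning != "Valid data"})
# {'Last_Name': 'Null', 'Phone': 'Truncated'}
```

Pairing each flagged column with the error for that row in the session log is usually enough to see why it was rejected.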

 

Hope this helps; keep sharing, as knowledge acquired has its value when it is shared 🙂

A little light on Update Override and its use:

We have an Update Strategy transformation to perform any kind of update on the target table, so what is the Update Override on a target all about?

Before we jump into our topic, for all those who are new to the Update Strategy: the Update Strategy transformation helps you update your target table with new values on the basis of a primary key match.
Making it simple: “UPDATE tableA SET ColumnA = New_Val WHERE Primary_Key matches”

So now let’s take up the use of the Update Override option. In the scenario I was going through, I had to update my target table with new values, but not on the basis of the primary key column; rather, I had to update the table based on a non-primary-key column.

For instance, I had to update the salary of employees in Australia, whereas in my table the Country column is not the primary key (Employee_ID is the primary key here). We had to update this!! But how?

That is where the Update Override comes in: look into the Properties of your target table, where you will find the “Update Override” property. Simply write the update query there, as below:

UPDATE TableA
SET Salary = :TU.Salary
-- Keep the key match so each incoming row touches only its own employee,
-- and add the check on the non-primary-key column here
WHERE Employee_ID = :TU.Employee_ID
AND Country = 'AUSTRALIA'

This updates only the records whose Country = 'AUSTRALIA' with the new values (the changed salary). So this is where the Update Override comes in handy.
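To try the effect of such an override without a PowerCenter session, here is a minimal Python/`sqlite3` sketch. SQLite knows nothing about `:TU.`, so plain named binds (`:Salary`, `:Employee_ID`) stand in for the target ports, and the statement keeps the usual key match while adding the non-key condition on Country. The table contents are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE TableA (Employee_ID INTEGER PRIMARY KEY, Country TEXT, Salary INTEGER)"
)
# Made-up starting data.
conn.executemany(
    "INSERT INTO TableA VALUES (?, ?, ?)",
    [(1, "AUSTRALIA", 5000), (2, "INDIA", 4000), (3, "AUSTRALIA", 6000)],
)

# Named binds play the role of :TU.<port> in the Update Override.
UPDATE_OVERRIDE = """
UPDATE TableA
SET Salary = :Salary
WHERE Employee_ID = :Employee_ID
  AND Country = 'AUSTRALIA'
"""

incoming = [  # rows reaching the target, all flagged for update
    {"Employee_ID": 1, "Salary": 5500},
    {"Employee_ID": 2, "Salary": 4400},  # not in Australia: left unchanged
    {"Employee_ID": 3, "Salary": 6600},
]
conn.executemany(UPDATE_OVERRIDE, incoming)

print(conn.execute("SELECT Employee_ID, Salary FROM TableA ORDER BY Employee_ID").fetchall())
# [(1, 5500), (2, 4000), (3, 6600)]
```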

Thanks for reading; please share, sharing makes us feel good 🙂
