Loops and Lookups – a performance problem

A bit of a performance tuning nugget around loops and lookups which I faced recently.

We have a scenario where we are looping a few hundred times to execute a series of changing SQL statements, then passing the data through lookups before writing to target. We are doing this with a mix of the standard lookup and Cozyroc’s Lookup Plus task. However we were finding that while executing each SQL query took less than a second, the SSIS package was taking a long time to run each loop.

The loop was taking the query and then pausing for a while before moving any data. Eventually we realised this pause was SSIS building up the caches for the lookups. The standard lookups were running off a pre-built cache but the CozyRoc task doesn’t support caching and had to rebuild each lookup cache every time the loop was instantiated – not a big job but repeated a few hundred times those few seconds soon mount up.

The lessons here are:

  1. If you are reusing a lookup multiple times, use the Cache Transform to preload the cache into memory (2008 onwards)
  2. Be aware of the performance implication of caching lookups and the fact that they will be rebuilt each time the Data Flow is started

4 thoughts on “Loops and Lookups – a performance problem

  1. Hi James,

    The CozyRoc Lookup Plus component supports IDictionary objects provided from outside with package variable. You can use this script to load your lookup collection in Hashtable based object:

    http://www.cozyroc.com/script/dictionary-destination

    And then you can configure the Lookup Plus to use this variable. The component can accomplish similar result like the cache transform in 2008. Contact us if you need further details.

    p.s.
    I would recommend you check the sample at the bottom of the CozyRoc Lookup Plus documentation. It shows how you can construct your own custom structures and then use them. This is not possible with the cache transform in 2008.

  2. Hi i am new to this ssis stuff,i would like to know in which scenarios we would be using cache transform translation tool. i had search for it and i was only able to find out the use of it but cannot really figure out in which scenarios can we use it.

  3. Hi Shadab.

    In SQL2012 the Cache transform has become more useful as you can use casches across packages. So one example would be if you have a large date lookup table (a common DW scenario) you could load the cache once in your job, and then reuse the cached data across many fact table loads.

    In practice, unless your lookups are very large, used a lot or reloaded many times (as in the scenario in this post) you probaby can get by without ever using it.

    Cheers, James

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>