A bit of a performance tuning nugget around loops and lookups which I faced recently.
We have a scenario where we are looping a few hundred times to execute a series of changing SQL statements, then passing the data through lookups before writing to target. We are doing this with a mix of the standard lookup and Cozyroc’s Lookup Plus task. However we were finding that while executing each SQL query took less than a second, the SSIS package was taking a long time to run each loop.
The loop was taking the query and then pausing for a while before moving any data. Eventually we realised this pause was SSIS building up the caches for the lookups. The standard lookups were running off a pre-built cache but the CozyRoc task doesn’t support caching and had to rebuild each lookup cache every time the loop was instantiated – not a big job but repeated a few hundred times those few seconds soon mount up.
The lessons here are:
- If you are reusing a lookup multiple times, use the Cache Transform to preload the cache into memory (2008 onwards)
- Be aware of the performance implication of caching lookups and the fact that they will be rebuilt each time the Data Flow is started