Having now researched Columnstore Indexes further, I thought I’d share the key learning I’ve picked up on this feature – which now sounds even more powerful than I’d originally thought.
The most important thing to take away is that a Columnstore Index should actually cover the entire table. Its name is a little misleading – the feature is less of an index, and more of a shadow copy of the table’s data, compressed with the Vertipaq voodoo. I suspect they have used the term index because the Columnstore doesn’t cover all data types – the important ones are there, but some extreme decimals and blobs are excluded – for a full list see the MSDN documentation. So for any big table, whack a Columnstore index across the entire table.
Next up is to understand how to use them and how to detect when they are or are not being used. The key thing is to only use them in isolation (e.g. summary queries) or for Inner Joins. Outer Joins don’t work right now, though there are cunning workarounds that apply if you are Outer Joining to summary data – see Eric Hanson’s video referenced below somewhere around the 50 minute mark.
You can detect when they are being used by the Execution Mode described in the Query Plan. This is new in Denali and is either Row or Batch. Row means traditional SQL Server execution and Batch means the Columnstore is being used.
So, the key takeaways:
- For any large table put a Columnstore index across the entire table
- Only join using Inner Joins
- Spot the use of the Columnstore in Query plans via the Execution Mode of Batch
Useful reference material:
- Eric Hanson’s Tech-Ed video (a bit dull, but informative – if you know your Data Warehousing theory, skip the first 15 minutes)
- The TechNet Columnstore Index FAQ
- Columnstore Indexes for Fast Data Warehouse Query Processing in SQL Server 11.0 – Whitepaper by Eric Hanson
- MSDN documentation