Gimme a table, but hold the indexes there, buddy!

I’ve just wrapped up a quick bit of work with a client who was just getting started on their DW journey, and I found some fun things the rookies had done.

I was also partly inspired to talk about this particular issue by this timely DBA Reactions post: When the developers hear my warning about 27 indexes on a single table

Part of what I was looking at was applying compression to their tables to reduce the disk space consumption. So I ran my handy query to size the databases and found an 8GB table with 12GB of indexes! Yoinks – something was clearly awry… it turned out that one table – with about 30 columns – had 8 indexes on it.
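
For anyone wanting to run a similar check, here’s a minimal sketch of the kind of sizing query I mean, built on the standard catalog views (it splits each table’s pages into “table” and “nonclustered index” buckets):

    -- Table size vs. index size per table, in MB (pages are 8KB)
    -- index_id 0 (heap) or 1 (clustered) is the table's own data
    SELECT
        t.name AS TableName,
        SUM(CASE WHEN i.index_id IN (0, 1) THEN ps.used_page_count ELSE 0 END) * 8 / 1024.0 AS TableMB,
        SUM(CASE WHEN i.index_id > 1 THEN ps.used_page_count ELSE 0 END) * 8 / 1024.0 AS IndexMB
    FROM sys.dm_db_partition_stats ps
    JOIN sys.indexes i ON i.object_id = ps.object_id AND i.index_id = ps.index_id
    JOIN sys.tables t ON t.object_id = ps.object_id
    GROUP BY t.name
    ORDER BY IndexMB DESC;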

Why not have 27 indexes?

So there are a few good reasons:

  • That many indexes take a long time to build / update, slowing load performance (in this case it took about 5 minutes to load the table and a further 29 minutes to build the indexes...)
  • The Query Optimiser is going to struggle to pick the right index if any of them overlap
  • If you are on 2012+ you can just stick the entire table in a columnstore index, massively boosting performance and consuming less disk space (applying a columnstore on the whole table took a whopping 1 minute to build; see the sketch after this list)
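
For anyone who hasn’t built one yet, here’s a minimal sketch of what that whole-table columnstore looks like in SQL Server 2012. The table and column names are hypothetical; a nonclustered columnstore has to list its columns explicitly, so you name every supported column on the table:

    -- Hypothetical fact table: name every supported column on it
    CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_FactSales_All
    ON dbo.FactSales (DateKey, ProductKey, StoreKey, Quantity, SalesAmount);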

As a broad rule I’d consider the following rules of thumb when looking at a table’s indexes:

  • If I have more than 2 indexes, I’m probably not indexing well, my table design is wrong or my queries need re-evaluating
  • If my indexes are more than 20% of the size of the table, I should probably look at replacing them with a whole table columnstore (the sizing query earlier will give you those numbers)
  • Are my indexes making use of included columns to increase their coverage? (see the sketch after this list)
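
To illustrate that last point, here’s a minimal sketch of an index using included columns (names are made up). Rather than widening the index key, the extra columns a query selects get tacked onto the leaf level, so a single narrow index can still cover the whole query:

    -- Hypothetical: queries filter on OrderDate but also select CustomerKey and SalesAmount
    CREATE NONCLUSTERED INDEX ix_FactSales_OrderDate
    ON dbo.FactSales (OrderDate)
    INCLUDE (CustomerKey, SalesAmount);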

If any readers have some good index management tips, please share them in the comments!


Columnstore Indexes revisited

Having now researched Columnstore Indexes further, I thought I’d share the key learning I’ve picked up on this feature – which now sounds even more powerful than I’d originally thought.

The most important thing to take away is that a Columnstore Index should actually cover the entire table. Its name is a little misleading: the feature is less of an index and more of a shadow copy of the table’s data, compressed with the Vertipaq voodoo. I suspect they have used the term index because the Columnstore doesn’t cover all data types. The important ones are there, but some extreme decimals and blobs are excluded; for a full list see the MSDN documentation. So for any big table, whack a Columnstore index across the entire table.
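
If you want to know up front which columns will block a columnstore on a given table, a rough sketch against the catalog views is below. The table name is hypothetical, and the exclusion list is my reading of the 2012 restrictions (the MSDN page is the authority), so treat it as a starting point:

    -- Columns the 2012 columnstore can't include, as I understand the restrictions
    SELECT c.name AS ColumnName, ty.name AS TypeName, c.precision
    FROM sys.columns c
    JOIN sys.types ty ON ty.user_type_id = c.user_type_id
    WHERE c.object_id = OBJECT_ID('dbo.FactSales')
      AND (ty.name IN ('text', 'ntext', 'image', 'xml', 'sql_variant', 'timestamp',
                       'binary', 'varbinary', 'uniqueidentifier',
                       'hierarchyid', 'geography', 'geometry')
           OR (ty.name IN ('decimal', 'numeric') AND c.precision > 18)
           OR (ty.name IN ('varchar', 'nvarchar') AND c.max_length = -1));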

Next up is to understand how to use them and how to detect when they are or are not being used. The key thing is to only use them in isolation (e.g. summary queries) or for Inner Joins. Outer Joins don’t work right now, though there are cunning workarounds if you are Outer Joining to summary data; see Eric Hanson’s video on the topic, somewhere around the 50 minute mark. A sketch of the usual rewrite follows.
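
To make that workaround concrete, here’s a rough sketch of the pattern as I understand it (my own paraphrase with made-up table names, not Hanson’s exact code): pre-aggregate the fact table with a plain GROUP BY so the expensive scan runs in batch mode, then do the Outer Join against the tiny summary, where row mode doesn’t hurt:

    -- The big fact scan and aggregation can run in batch mode here...
    ;WITH SalesByStore AS (
        SELECT f.StoreKey, SUM(f.SalesAmount) AS Sales
        FROM dbo.FactSales f
        GROUP BY f.StoreKey
    )
    -- ...and the Outer Join only touches the small summary, so row mode is cheap
    SELECT d.Region, SUM(ISNULL(s.Sales, 0)) AS Sales
    FROM dbo.DimStore d
    LEFT OUTER JOIN SalesByStore s ON s.StoreKey = d.StoreKey
    GROUP BY d.Region;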

You can detect when they are being used by the Execution Mode described in the Query Plan. This is new in Denali and is either Row or Batch. Row means traditional SQL Server execution and Batch means the Columnstore is being used.
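
If you’d rather not eyeball plans one at a time, here’s a rough sketch that trawls the plan cache for batch-mode plans. String-matching the plan XML is crude, but the DMVs are standard and it’s a quick way to spot where the Columnstore is kicking in:

    -- Find cached plans containing batch-mode operators
    SELECT TOP (20) st.text AS QueryText, qp.query_plan
    FROM sys.dm_exec_cached_plans cp
    CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) qp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) st
    WHERE CAST(qp.query_plan AS nvarchar(max)) LIKE '%ExecutionMode="Batch"%';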

So, the key takeaways:

  • For any large table put a Columnstore index across the entire table
  • Only join using Inner Joins
  • Spot the use of the Columnstore in Query plans via the Execution Mode of Batch


Columnstore indexes in Denali (aka: “Apollo”)

James Serra has a great post on a new feature in SQL Server Denali – Columnstore indexes.

The tl;dr version is this:

Columnstore indexes use the Vertipaq compression engine (that’s the shiny compression engine in PowerPivot) to further compact indexes, making queries against them 10 to 100 times faster.

The most significant limitation is that tables become read-only when they have a columnstore index (no Inserts / Updates / Deletes etc) – though James notes you can work around this by using Partitions if you are dealing with tables that are just additive (sketched below). Otherwise indexes will need to be dropped and recreated as data changes.
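
To make the Partition workaround concrete, here’s a rough sketch of the usual pattern with made-up names: load new rows into a staging table that matches the partitioned fact table, build the columnstore on just that small table, then switch it into the right partition so the main table never has to drop its index:

    -- 1. Load the new data into an empty, structurally identical staging table
    INSERT INTO dbo.FactSales_Staging (DateKey, ProductKey, SalesAmount)
    SELECT DateKey, ProductKey, SalesAmount
    FROM dbo.NewSales;

    -- 2. Build the columnstore on the small staging table only (fast)
    CREATE NONCLUSTERED COLUMNSTORE INDEX ncci_FactSales_Staging
    ON dbo.FactSales_Staging (DateKey, ProductKey, SalesAmount);

    -- 3. Switch the staging table into the matching (empty) partition
    ALTER TABLE dbo.FactSales_Staging
    SWITCH TO dbo.FactSales PARTITION 42;  -- 42 is a hypothetical partition number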

So: a powerful new indexing feature which, with careful management, can have a serious positive impact on the performance of your Data Warehouse.
