This post is to an extent a small rant about some design decisions I have been constrained by on my current project. These decisions were made predominantly for one fairly bad reason: it made the architect’s life easier (apologies to the architects if they are reading – but these were bad choices!)
The design choices in question are around the managing of history data. In one component of the system it relates to Database storage design, the other relates to Cube storage design. In both cases the history data is stored in a separate location to the “current” data.
Databases: Why separate history tables are a bad idea
The first – and most compelling – reason for not storing your history data in separate tables to your current tables is that it increases complexity for users. Instead of having one location to look for data, your users now have to use two.
The second compelling reason is that there is no point to doing this from a storage point of view. SQL 2005 & 2008 (Enterprise editions only, admittedly) provide partitioning. This enables the contents of an individual table to be stored in separate locations on different filegroups. This means that you can store your current days data in one location and your history in a different one. The reason for doing this is the same as splitting it into separate tables – that querying the current section will be faster than the historic section. In theory queries against partitioned tables should in fact be faster as the current data is now no longer in the same filegroup as the history data.
Now, there is an overhead associated with designing and maintaining partitions but I don’t see that it is significantly larger than that required to deal with the process required to archive data into separate tables on a daily basis. Additionally when maintaining separate history tables, you need to separate out every single table, whether it gets 10 rows a day or 10 million. With partitioning you can just target the large tables that need that focus.
There are other downsides to maintaining separate tables. If you make a change to a table design, you need to do it in 2 places. You also need to remember to update your history processes. If your history process fails, you can end up with users getting unexpected query results or ETL process failures when the system loads the next day’s data into the current tables – and untangling it becomes a real mess. If your partition processes fail to run, you just have too much data in one filegroup for a while – unlikely to be fatal.
So if you have large tables you need to split out for performance purposes – do it at the back end, using the power of the database – which is designed to store data efficiently. Keep it away from the users – they neither need to know or care about your need to keep the data separate. If you want to give them a single object to query with the current day’s data, just use views.
Cubes: Why a separate history Cube is a bad idea
Much of the above applies here – SSAS also has partitions – so you can again store your historic and current data in separate physical locations with the users being totally unaware of this. Again there is overhead in maintenance, but this will also balance out with the maintenance and risks associated with maintaining two identical cubes that only differ in terms of data source.
Use your storage options!
So without banging on about the same things any further, please consider the following two points whenever considering managing your history data:
- How does what i’m planning affect my users?
- How does what i’m planning leverage the platforms capabilities?