I’m currently doing some Tabular Cube design, and part of the design has to include several “Y/N” flag fields. My inner geek then wondered what, if any, difference the data type made to the compression rates of the cube.
So, to start with, I created a SQL Server table with three columns, using the three usual types for this kind of data, and then populated it with a mix of Y/N values for the char column and 1/0 values for tinyint and bit. I figured about 40 million rows should do it, as that gave me roughly half a gig of data on disk.
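For reference, the setup looked roughly like this (the table and column names are my placeholders here, not the original schema, and the row-generation trick is just one way to do it):

```sql
-- Hypothetical setup: one flag column per data type under test
CREATE TABLE dbo.FlagTest (
    FlagChar    char(1) NOT NULL,   -- 'Y' / 'N'
    FlagTinyint tinyint NOT NULL,   -- 1 / 0
    FlagBit     bit     NOT NULL    -- 1 / 0
);

-- Generate ~40 million rows of mixed flags by cross-joining a
-- system view with itself, then deriving a pseudo-random 0/1 per row
INSERT INTO dbo.FlagTest (FlagChar, FlagTinyint, FlagBit)
SELECT
    CASE x.r WHEN 1 THEN 'Y' ELSE 'N' END,
    x.r,
    x.r
FROM (
    SELECT TOP (40000000)
        ABS(CHECKSUM(NEWID())) % 2 AS r
    FROM sys.all_columns a
    CROSS JOIN sys.all_columns b
    CROSS JOIN sys.all_columns c
) AS x;
```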
Next I sucked it into a Tabular model, which gave me the following data types:
- char(1) -> Text
- tinyint -> Whole Number
- bit -> TRUE/FALSE
The results were:
- char(1) = 323,754 bytes
- tinyint = 304,256 bytes
- bit = 303,840 bytes
… and yes, that is bytes. Assuming the data is stored evenly on disk, that means approx 180 million bytes were compressed down to 300 thousand – a compression ratio of 600:1 – yoinks!
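If you want to check that ratio yourself, the back-of-the-envelope arithmetic (assuming the half gig splits evenly across the three columns, which is only approximately true given row overhead) is:

```python
# Rough sanity check on the compression ratio quoted above.
# Assumes ~0.5 GiB on disk split evenly across the three columns.
on_disk_bytes = 0.5 * 2**30          # about half a gig
per_column = on_disk_bytes / 3       # ~179 million bytes per column
compressed = 303_840                 # bit column size in the Tabular model

ratio = per_column / compressed
print(f"{ratio:.0f}:1")              # → 589:1, i.e. roughly the 600:1 above
```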
Anyway, the first pass of the results seems to show that char is mildly less efficient than tinyint and bit. However, if you take it a level deeper and just look at the memory assigned to the In-Memory Table.Columns element (the most significant source of difference between the columns), you see this:
- char(1) = 17,944 bytes
- tinyint = 2,344 bytes
- bit = 2,344 bytes
So it appears that char(1) is much less efficient than tinyint or bit for storage, which end up identical at this level of detail. For those of you who care deeply, you can view my results spreadsheet.
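For anyone who wants to reproduce the per-object memory figures, they can be pulled from an SSAS DMV; a sketch along these lines (the DMV is real, but how you filter the parent path down to the Table.Columns entries for your flag columns is up to you):

```sql
-- Run against the Tabular instance, e.g. from an MDX query window in SSMS.
-- Look for the In-Memory Table.Columns entries under each flag column's path.
SELECT
    OBJECT_PARENT_PATH,
    OBJECT_ID,
    OBJECT_MEMORY_NONSHRINKABLE
FROM $SYSTEM.DISCOVER_OBJECT_MEMORY_USAGE
ORDER BY OBJECT_MEMORY_NONSHRINKABLE DESC;
```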
My conclusion? To be honest, the amount of memory used by these columns is going to be pretty trivial unless you have a gigantic dataset, but if you have a flag field on your fact tables, I would choose bit in the database and TRUE/FALSE in the tabular model if you can.