The Row Count Transformation

Fig 1: The Row Count Transformation
In this post I will be covering the Row Count Transformation. The sample package can be found here for 2005 and guidelines on use are here.
What does the Row Count Transformation do?
The Row Count Transformation counts the number of rows that have passed through a Data Flow and puts that count into a variable. Configuration is simple: all you need to do is specify the name of the variable that will hold the row count on the first page of the editor (down at the bottom under “Custom Properties”). There’s a little gotcha here: although the “Input Columns” tab is active, if you try to select any columns it will return an error and not allow you to continue.
It is worth noting that the variable is only updated once all rows of data have passed through the data flow. I’ve demonstrated this in the sample package by adding the variable to a column in a Derived Column: it returns zero all the way through, so you cannot use the Row Count as a row number generator.
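If a row number is what you are really after, one option (not part of the sample package) is to generate it in the source query rather than in the data flow. This is only a minimal sketch: the table dbo.Customer and its columns are hypothetical names made up for illustration, and ROW_NUMBER() needs SQL Server 2005 or later.

```sql
-- Hypothetical source query for an OLE DB Source.
-- dbo.Customer, CustomerId and CustomerName are illustrative names only.
SELECT
    ROW_NUMBER() OVER (ORDER BY c.CustomerId) AS RowNumber,  -- 1, 2, 3, ... in CustomerId order
    c.CustomerId,
    c.CustomerName
FROM dbo.Customer AS c;
```

The numbering is then fixed before the rows enter the data flow, so it does not depend on when the Row Count variable gets written.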
Where would you use the Row Count Transformation?
The most obvious use is in logging processes, for example counting input rows versus output rows, or counting failed rows. It fits anywhere you need to track the number of rows being passed through a given data flow.
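As a rough sketch of the logging case, the count could be written to an audit table by an Execute SQL Task that runs after the Data Flow Task has finished (which is when the variable holds its final value). The table dbo.DataFlowRowCountLog and the variable User::RowCount are assumptions for illustration, not something taken from the sample package. The ? markers assume an OLE DB connection, where parameters are mapped by ordinal.

```sql
-- Hypothetical audit table; the name and columns are assumptions for illustration.
CREATE TABLE dbo.DataFlowRowCountLog
(
    LogId        INT IDENTITY(1,1) NOT NULL PRIMARY KEY,
    PackageName  NVARCHAR(260)     NOT NULL,
    DataFlowName NVARCHAR(260)     NOT NULL,
    RowsCounted  INT               NOT NULL,
    LoggedAt     DATETIME          NOT NULL DEFAULT (GETDATE())
);

-- Statement for an Execute SQL Task placed AFTER the Data Flow Task.
-- With an OLE DB connection each ? is a parameter marker, mapped by ordinal:
--   0 -> System::PackageName
--   1 -> a string variable naming the data flow (hypothetical)
--   2 -> User::RowCount (hypothetical name for the Row Count variable)
INSERT INTO dbo.DataFlowRowCountLog (PackageName, DataFlowName, RowsCounted)
VALUES (?, ?, ?);
```

Logging one row per data flow like this makes it easy to compare input, output and failed counts after a run.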
MSDN Documentation for the Row Count Transformation can be found here for 2008 and here for 2005.
Count the number of rows in every Table in a Database in no time!
Here is a piece of T-SQL code that uses DMVs (Dynamic Management Views) to give an approximate row count of every table in your database in virtually no time at all. Bear in mind it is running off collected statistics, so it won’t always be 100% accurate, but it’s far quicker than doing a COUNT(*) table by table when you just need a rough idea for sizing. In case you don’t know what DMVs there are, go poke around in SSMS under [Database] > Views > System Views and see what’s there. Anyway, the code:
SELECT
    s.[Name]    AS [Schema],
    t.[name]    AS [Table],
    SUM(p.rows) AS [RowCount]
FROM sys.schemas s
LEFT JOIN sys.tables t
    ON s.schema_id = t.schema_id
LEFT JOIN sys.partitions p
    ON t.object_id = p.object_id
LEFT JOIN sys.allocation_units a
    ON p.partition_id = a.container_id
WHERE p.index_id IN (0, 1)   -- 0 heap table, 1 table with clustered index
  AND p.rows IS NOT NULL
  AND a.type = 1             -- row-data only, not LOB
GROUP BY s.[Name], t.[name]
ORDER BY 1, 2
For a very swift explanation: sys.schemas and sys.tables list the schemas and tables in the database, so joining these together on schema_id gives a list of all tables by schema in the database. Adding on sys.partitions then pulls in the partitions associated with each table, and finally sys.allocation_units pulls in the allocation units. I’m not quite sure what those are; the guts of this query were pulled from another blog which, embarrassingly, I can’t trace back to now.
I’m no expert on DMVs, so if you have any views on the quality of this query, please leave a comment with your thoughts.
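For comparison, here is a variant that skips sys.allocation_units and reads the counts from sys.dm_db_partition_stats instead. This is a swapped-in alternative rather than the original query, offered as a sketch: it assumes you have VIEW DATABASE STATE permission, and the counts are still approximate.

```sql
-- Alternative sketch: approximate row counts per table from sys.dm_db_partition_stats.
SELECT
    s.[name]          AS [Schema],
    t.[name]          AS [Table],
    SUM(ps.row_count) AS [RowCount]   -- summed across all partitions of the table
FROM sys.dm_db_partition_stats AS ps
INNER JOIN sys.tables  AS t ON t.object_id = ps.object_id
INNER JOIN sys.schemas AS s ON s.schema_id = t.schema_id
WHERE ps.index_id IN (0, 1)           -- 0 = heap, 1 = clustered index
GROUP BY s.[name], t.[name]
ORDER BY s.[name], t.[name];
```

Restricting to index_id 0 or 1 counts each table once, through its heap or clustered index, so nonclustered indexes do not inflate the total.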