The Raw File Source and Destination
In this post I’m going to cover the Raw File Source and Destination together, as the two are inseparable and neither is particularly complex. The sample package can be found here for 2008; read the guidelines on use here.
What are Raw Files and why use them?
Raw Files are designed to pass data between Data Flows by dumping it into a file in SSIS native format. When I was new to SSIS I tried to pass data between flows using Recordsets, and fell over at the lack of a Recordset Source, which caused me to do the usual cursing of SSIS developers. Then Jamie Thompson enlightened me as to why no such Adapter exists: because the Recordset is dumped into an Object Variable, which has unknown column metadata, you would have to manually configure the Recordset Source for each and every column. Raw Files contain the column metadata, so the Raw File Source Adapter can pick up this information with ease, with the obvious caveat that you need to create the file first.
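To make the metadata point concrete, here is a minimal Python sketch of a self-describing file. It is only an analogy and does not reproduce the actual SSIS Raw File format: the header layout, the function names, and the sample column names and values are all made up for illustration. What it shows is that when the column metadata travels with the data, a reader can recover the schema from the file alone, which is what an untyped Object Variable cannot offer.

```python
import json
import struct
import tempfile
from pathlib import Path


def write_self_describing(path, columns, rows):
    """Write a header describing the columns, then the rows.

    columns: list of (name, struct_format) pairs, e.g. ("OrderQty", "h").
    This layout is hypothetical; it is NOT the SSIS Raw File format.
    """
    header = json.dumps(columns).encode("utf-8")
    row_format = "<" + "".join(fmt for _, fmt in columns)
    with open(path, "wb") as f:
        f.write(struct.pack("<I", len(header)))  # length of the metadata header
        f.write(header)                          # the column metadata itself
        for row in rows:
            f.write(struct.pack(row_format, *row))


def read_self_describing(path):
    """Recover both the column metadata and the rows from the file alone."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<I", f.read(4))
        columns = [tuple(c) for c in json.loads(f.read(header_len))]
        row_format = "<" + "".join(fmt for _, fmt in columns)
        row_size = struct.calcsize(row_format)
        rows = []
        while chunk := f.read(row_size):
            rows.append(struct.unpack(row_format, chunk))
    return columns, rows


# Illustrative columns and made-up values, not taken from the sample package.
columns = [("SalesOrderID", "i"), ("OrderQty", "h"), ("UnitPrice", "d")]
rows = [(43659, 1, 2024.994), (43659, 3, 28.8404)]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "orders.raw"
    write_self_describing(path, columns, rows)
    # The reader is given nothing but the path, yet gets the schema back.
    read_columns, read_rows = read_self_describing(path)
```

The reader never needs to be told the column names or types, which is the same reason the Raw File Source needs no per-column configuration. It also shows the caveat: the file has to exist before anything can read the metadata from it.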
Another good reason to use them is that, because the file is written locally in a format native to SSIS, read / write operations are extremely fast. If you have staging points in your package where you push half-finished data back into a database, or you spend too long accessing live systems, consider dumping the data to a Raw File instead to speed things up.
Using Raw Files
In the simple example I have created, Step 1 sucks all the data from the SalesOrderDetails table and simply dumps it into a Raw File. This process takes about 2 seconds on my machine. If you open the Raw File with a text editor you will recognise some text or details, but it’s mostly human-unreadable.
The only things to note about configuring the Raw File Destination are which write option you use to create the file (Create Always, Create Once, Append, or Truncate and Append) and whether you populate the file name directly or from a variable. To show how fast it performs, in Step 2 I read the data from the Raw File created in Step 1 and dump it out to a Raw File again. This takes about 1 second, so in a simple read / write of data you get a substantial performance gain over reading from OLE DB.
I have to admit Step 2 threw me to start with when I was building the sample package. I first dumped the output from the Raw File into a Recordset Destination, thinking that would be faster, but that process actually took 12 seconds, which is a further argument against using Recordsets for transferring data around within packages.
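Going back to the four write options on the Raw File Destination, the sketch below maps each one onto ordinary Python file modes as a rough analogy. The function name and file name are hypothetical. The behaviour when the file is missing for Append and Truncate and Append reflects my understanding that both expect an existing file, so treat that as an assumption. The sketch also ignores the requirement that appended data match the existing file's column metadata.

```python
import os
import tempfile


def open_for_write(path, option):
    """Open `path` in a way that loosely mirrors the named write option.

    Hypothetical helper for illustration; not part of SSIS.
    """
    if option == "Create Always":
        return open(path, "wb")   # new file every run, replacing any old one
    if option == "Create Once":
        return open(path, "xb")   # raises FileExistsError if already present
    if option in ("Append", "Truncate and Append"):
        # Assumption: both options expect the file to exist already.
        if not os.path.exists(path):
            raise FileNotFoundError(path)
        # Append keeps the existing rows; Truncate and Append discards them.
        return open(path, "ab" if option == "Append" else "wb")
    raise ValueError(f"unknown write option: {option}")


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "staging.raw")

    with open_for_write(path, "Create Always") as f:
        f.write(b"run1")
    with open_for_write(path, "Append") as f:
        f.write(b"run2")
    with open(path, "rb") as f:
        after_append = f.read()      # both runs are present

    with open_for_write(path, "Truncate and Append") as f:
        f.write(b"run3")
    with open(path, "rb") as f:
        after_truncate = f.read()    # only the latest run survives

    try:
        open_for_write(path, "Create Once")
        create_once_failed = False
    except FileExistsError:
        create_once_failed = True    # the file already exists, so this fails
```

For a staging file that is rebuilt on every package run, Create Always is the natural fit. The append options matter when several Data Flows contribute rows to the same file.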
The Raw File Source and Destination are a fast way of moving data between Data Flows, and can also be used to stage data when you want to avoid slow writes to other locations.