The Raw File Source and Destination

In this post i’m going to cover the Raw File Source & Destination together, as the two are inseparable and neither are particularly complex. The sample package can be found here for 2008, read guidelines on use here.

Fig 1: The Raw File Source and Destination Adapters

Fig 1: The Raw File Source and Destination Adapters

What are Raw Files and why use them?

Raw files are designed to pass data between Data Flows, by effectively dumping data in SSIS native format into a file. When I was new to SSIS I tried to pass data between flows this using Recordsets, and fell over at the lack of a Recordset Source, which caused me to do the usual cursing of SSIS developers. Then Jamie Thompson enlightened me as to why no such Adapter exists. Because the Recordset is dumped into an Object Variable which has unknown column metadata, you would have to manually configure the Recordset Source for each and every column. Raw Files contain the column metadata, so the Raw File Source Adapter can pick up this information with ease – with the obvious caveat that you need to create the file first.

Another good reason to use them is that because the file is written locally in an extremely SSIS friendly format, read / write operations are extremely fast. If you have to use staging points in your package, where you push half-finished data back into a database or spend too long accessing live systems, consider dumping it to a Raw File instead to speed it up.

Using Raw Files

In the simple example I have created, in step 1 I suck all the data from the SalesOrderDetails table and simply dump it into a Raw File. This process takes about 2 seconds on my machine. If you open the Raw File with a text editor you will recognise some text or details but it’s mostly human-unreadable, as you can see below.

An SSIS Raw File viewed in Notepad

Fig 2: An SSIS Raw File viewed in Notepad

The only things to note about configuring the Raw File destination is what write option you use to create the file – Create Always, Create Once, Append or Truncate and Append, and whether you populate the file name directly or from a variable. To show how fast it performs, in step 2 I then read that data from the Raw File created in Step 1 and dump it out to a raw File again. This then takes about 1 second – so in a simple read / write of data, you get a substantial performance gain against reading from OLEDB.

I have to admit Step 2 threw me to start with when I was building the sample package as I dumped the output from the Raw File into a Recordset Destination, thinking that would be faster, but that process actually took 12 seconds – which is a further argument against using Recordsets for transferring data about within packages.

Summary

The Raw File Source and Destination is a fast way of moving data between data flows, and can also be used when having to stage data to avoid slow writes to other locations.

MSDN documentation for the raw file source can be found here for 2008 and here for 2005. Similarly, documentation for the raw file destination can be found here for 2008 and here for 2005.

About BI Monkey

Comments

2 Responses to “The Raw File Source and Destination”
  1. Iain says:

    “Because the Recordset is dumped into an Object Variable which has unknown column metadata, you would have to manually configure the Recordset Source for each and every column” – well trying to import from a raw file as a source from a raw file destination that you’ve just created results in no metadata being available for the raw file source at design time. You actually have to create the raw file destination file so that the raw file source can read the appropriate metadata from the existing file before being able to continue.

    On both of these though [raw file destination and recordset destination] the appropriate metadata is on the input side of the task (the task can only put out what was given in to it) so if only SSIS could allow this to carry through the task (as it does with the derived column task for example) then you could use a recordset destination as both a destination and a source within a dataflow. SSIS is heavily flawed in this regards, making what should be a simple dataflow operation very clumsy and much more complicated than it needs to be.

Speak Your Mind

Tell us what you're thinking...
and oh, if you want a pic to show with your comment, go get a gravatar!