Flat Files with too many Delimiters

It is not uncommon for flat file sources to have delimiters where they shouldn’t be – extra commas in address fields is a particularly common problem, for example. These don’t necessarily cause the import to fail – especially if the data is not being imported is not being done so in a strongly typed manner.

To understand what happens, you need to understand how SSIS Flat File sources handle delimited files. Unlike Access or Excel, it doesn’t carve up the rows based on it finding a delimiter and creating a column. In an SSIS source the number of columns is predefined. So imagine I had a Flat File Connection manager configured as below:

Fig 1: Flat File Connection Manager

How it reads in a row in a file is as follows:

  1. Scan until reaches 1st comma – Put found data in “RowID”
  2. Continue scan until reaches 2nd comma – Put found data in “MonkeyFood”
  3. Continue scan until reaches 3rd comma – Put found data in “Motivation Factor”
  4. Continue scan until reaches Carriage Return / Line Feed – Put found data in “Cost”

So the outcome is that if you happen to have more than 3 commas in your row, anything after the 3rd comma but before the Carriage Return / Line Feed ends up in the 4th column. So if you look at my example below, where the 2nd row of the text file has gone a little comma crazy and has 6 commas in it:

RowID,Monkey Food,Motivation Factor,Cost

The end result of pumping it through the connection manager as defined in Fig 1 is this:

Fig 2: Output from corrupted file

Note how everything after the 3rd comma has ended up in the final column, because after reaching the 3rd column it has scanned across for the Carriage return / Line Feed to define the final column ignoring any further commas.

There is a sample package here demonstrating the above example.

2 thoughts on “Flat Files with too many Delimiters

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>