The Flat File Source is one of the simplest sources to configure, but conversely the Connection Manager it uses is the one that requires the most manual configuration. As the two are inseparable I will cover them both in this (somewhat lengthy) post.
Configuring The Flat File Connection Manager
This Connection Manager is the means provided for accessing file based data sources. In terms of configuration it is probably the component that requires the most manual input because so little of the metadata can be obtained from the source itself, unlike a relational source for example.
The General Tab
The first thing you are required to do when creating a Flat File Connection Manager is to select a file. This means if the file does not exist yet you will have to create a placeholder file otherwise you won’t be able to proceed. Once a file has been selected the first page of options on the General tab become available. Note absolutely no information is derived from the source file – as an a example, it will not autodetect a CSV file and prefill anything for you even if you select a CSV file type.
The first set of options are to set the Locale, whether the file is in Unicode format and what Code Page to use. In practice you are unlikely to change these options very often, unless data is coming from a country other than your own.
Below the dividing line come the basic file information settings. The most important of these is the Format. Your options are:
- Delimited – most common, where each column is separated by a character such as a comma (CSV), Tab or Space
- Fixed Width – common from mainframes – a file where each column width is a fixed number of characters
- Ragged Right – the same as fixed width except the last column which will terminate with a carriage return at the end of the data item (read how I learned this the hard way)
The next set of options are
- Text Qualifier – a symbol that encloses text strings – commonly a double quote (“) – needed when your text strings may contain the delimiter character – only for Delimited Format files
- Header row delimiter – The end of line delimiter in the header rows
- Header rows to skip – Number of rows to skip before attempting to read in data. Useful if the text file has rows of header data which is of no relevance to the data flow.
- Column names in the first data row – Checkbox to indicate if the first row has the column names in it. If present the connection manager will map these names in the next tab
The Columns Tab
This tab behaves differently depending on whether you selected Delimited or Fixed Width / Ragged Right in the General Tab.
If you are loading a delimited file, the tab gives you the option to specify the Row Delimiter (i.e. the end of row character) and the Column Delimiter (the character that separates data items). If you have these set correctly, a preview will be displayed showing a few rows of sample data, as below:
If you are loading a Fixed Width or Ragged Right file, you will be presented with a screen where you manually set the columns. For Fixed Width, you first need to set the overall row width – this can be done either by dragging the red line which marks the end of a row in the GUI, or entering the Row width in the text box below the GUI. For Ragged Right the end of line is identified by the Row Delimiter character and does not need to be manually set in the GUI. The data should then start looking like rows of data. To then set each column click and place a black line (which is draggable) to mark where each column starts / ends. For a big file this can be a pain, so you can skip this and set it in the Advanced tab if you have the file definition to hand, which I will cover in the next section.
The Advanced Tab
Again, this tab has different options depending on the file type selected.
If a Delimited file type is selected, you will able to set the following properties:
- Name – the column name
- Column Delimiter – the character that terminates the column. This can be manually entered or selected form a drop down. An interesting feature is that because this can vary by column, it is possible to have different delimiters splitting columns within the same row.
- DataType – by default this will be a 50 wide string [DT_STR] – but if you click the Suggest Types button below the properties pane, you can fire a wizard which will sample the data and try and work out the smallest applicable data type to use for the columns. Beware this will apply the results of the wizard to all columns.
- TextQualified – A boolean to say whether the Text Qualifier specified in the General tab applies to thos column.
If a Fixed Width file type is selected, you get these options instead:
- InputColumnWidth – the width of the Incoming column. If you bypassed the GUI in the Columns tab, here is where you would manually set each column width.
- DataType – by default this will a string [DT_STR]. The Suggest Types button is still clickable but seems to have no effect.
- OutputColumnWidth – this is the width of the column the component will output – by default as wide as the incoming column. You may want to shrink this if for example the column is 6 wide but you know it will only ever have a single character in it.
If you are manually configuring a file definition, the New and Delete buttons will insert and delete new columns as required.
The Preview Tab
At this point, the preview tab will simply display your correctly configured file in all its glory.
Configuring The Flat File Source, and its ‘hidden’ options
In the main component editor, all that needs to be done is to select the Flat File connection manager already constructed, choose how to treat nulls in the source and choose your columns.
However the flat file source has a few of extra properties hidden in the Advanced Editor. First, on the Component Properties tab under Custom Properties is a Property called FileNameColumnName – which will add a column to the component that contains the file name. The second, on the Input and Output properties tab, under the Custom Properties of the Output Columns is the FastParse option. For specific numeric and date types and formats, this offers a different parsing engine (i.e. something that tries to convert the text into the data type you expect) which is allegedly faster, as per MSDN documentation here. However I did some loose testing and found performance actually degraded in some cases, and when it did change, it wasn’t by a significant margin – especially as reading flat files is so blisteringly fast in the first place. I would suggest ignoring this feature unless you have a specific performance problem to address. Finally, in the same place there is the UseBinaryFormat option for when you want to pass through raw binary data.
The Flat File Source is the simplest data source to configure – however the Flat File connection manager it needs requires significant manual input to set up correctly. The Connection manager does offer a very high degree of flexibility in how you handle data coming from flat file sources. Reading Flat Files is very very fast, so if you experience performance issues it’s more likely to be tied to network or disk problems that the package itself.