The XML Source component – The Basics

This post is going to cover the XML source component, which allows you to read data from XML sources into SSIS data flows. It is largely a rehash of a post by Matt Mason on the SSIS Team Blog, though this one has some samples to play with. A zip file containing the samples can be found here, read guidelines on use here.

Referencing the XML file, and its associated XSD

The XML Source is unusual in that it does not require a Connection Manager, but instead references a XML data source either as a direct reference to a file,a variable containing a file reference, or a variable containing XML data. These options can be found in the Data Access Mode of the Connection Manager Tab, shown below.

The XML Source Connection Manager Tab
Fig 1: The XML Source Connection Manager Tab

Apart from the XML file location, the XML Source also requires you to specify the location of the the XML Schema Definition (shortened to XSD) which specifies the locations, names and formats (text, numeric etc.) of the data items found in the XML file. This schema definition can exist as part the source XML file, in which case you can check the “Use inline schema” box, and the “XSD location:” box will grey out. See example 4 in the sample package for a demonstration of this. However beware that the schema must be within the data rows tag – see the connect article here for a full explanation, but the inline schema the XML source works with is not compliant to standards.

Otherwise you need to specify the location of the XSD file. If you don’t have one already available, SSIS can attempt to generate one for you by clicking the “Generate XSD…” button, which will prompt you for a file location to generate the XSD in. If successful the XSD generation process will create a .xsd file for you and reference the location in the Connection manager tab.

How XML data is handled by the XML Source

XML Data items can be formed in two ways, as Elements within XML tags, or Attributes of XML tags, expressed simply below:

Attributes Example

<xmldata>
<datarow dataitemname = “dataitemvalue” />
</xmldata>

Elements Example

<xmldata>
<datarow>
<dataitemname>dataitemvalue<dataitemname>
</datarow>
</xmldata>

In each case, the XML Source will treat each identifiable row, as indicated by the <datarow> tag as a incoming row of data, and each data item as a column. Examples of both scenarios are included in the sample package in case 2 and 3.

A more complex scenario is where there are nested XML elements, as in the sample below:

<xmldata>
<datarow>
<dataitem1>dataitemvalue<dataitem1>
<nestedelement>
<dataitem2>dataitemvalue</dataitem2>
<dataitem3>dataitemvalue</dataitem3>

</nestedelement>
</datarow>
</xmldata>

In this case there will be multiple outputs from the XML source. In the sample above there would be one output for datarow, and one for nestedelement. These can then be tied together by a row id column generated by the component. The Columns tab from the package is shown below, detailing the two outputs due to the presence of the nested tags. You can also see the autogenerated row_id column:

The XML Source component Columns Tab for Multi-Element XML sources
Fig 2: The XML Source component Columns Tab for Multi-Element XML sources

The sample in the package then uses a Merge Join to then pull these back together as a flattened line of data, joining on the row id.

Summary

The XML source is a flexible component used to access XML files. It is capapable of generating XSD schema files if required. If you have nested elements, the component generates multiple outputs that can be recombined on an id generated by the components. It is worth noting that the SSIS team blog says it handles XML “to various degrees of success“, which implies that as the XML gets more complicated, you may experience undesirable results.

The MSDN documentation for this component can be found here for 2008 and here for 2005

9 thoughts on “The XML Source component – The Basics

  1. Hi Leigh – the response in the forum is probably in line with what I would suggest – the Advanced Editor is about your best bet, but if Elements are changing name the GUI for the component may simply be unable to cope.

    However that doesn’t mean you are out of luck – if you look at the raw XML of the package (View > Code in Visual Studio) – you can probably edit the xml describing the inputs that have changed name and bypass the GUI altogether. It’s ugly – backup first – but may well save you having to remap everything.

    Let me know how you get on. Oddly enough i’ve been wrestling with XML quite a bit for the last day or two, it’s an evil thing!

  2. I’m trying to pull in soap messages using the XML source in SSIS 2005. I wrote an XSD that breaks down the elements. When I point the XML source to the XML file and XSD it shows me the columns correctly. I then map these to an OLE DB Destination. At this point everything is setup with no warnings or errors.

    The package then executes sucessfully, however, there is no data in the desintation table. Any thoughts on what might be causing this?

  3. Most likely your XSD isn’t quite right if no data is going through the pipeline. The XSD provides something the XML source can read and translate into columns. If the data doesn’t conform to what is described in the XSD it is quite possible for the process to silently “fail” as it cannot recognise any data to pass on.

    Consider testing your XSD with the SQL Server XML Bulk Loader, i’ve done a post on it here in the past.

  4. I’m currently trying to load an xml file into a temp table I’ve created. When I create the connection I have “three” XML categories (Organization, People, Place). I get External Columns for Organization and Place that I can then select to populate my temp table, but my People shows just one column and I can’t figure out why. When I look at the XML file the People element has sub elements and they are populated with data. I’m not sure if it’s a bug in VS 2005, a bug in SSIS, and incorrect Schema file generated by VS 2005, a poorly generated XML.doc file generated by a SOAP, or some other issue I haven’t thought of. Any ideas? I’m not an XML person and if I had my choice I wouldn’t have picked XML as the file type but alas you work with what you get. Thanks in advance for the help provided.

  5. I would suspect that you have an incorrect XSD file – this is what translates the data from the XML. Nested elements can be a pain to work with. Visual Studio has probably generated something that doesn’t quite work – look at the XSD definition documentation – it’s patchy but should get you on teh right track.

  6. Hi there, How do I use a variable for the XSD location as I don’t know where the file will be installed at the client site and I need to put this location into a configuration table in the database. Cheers.

  7. You can do this from the Control Flow. If you check the properties of the Data Flow containing the XML Source, you will see this property: [XMLSource].[XMLData]. You can set this via an Expression.

    It’s a bit ugly but it works. The same trick applies to any data source, so it can be useful for feeding dynamic queries into OLE DB & ADO.NET sources that don’t support parameterised queries.

  8. Hi,
    For every parent node in xml im getting a paretn node_id column in xml source in ssis that is automatically getting generated. I want to know what component in ssis xml source does this and how?

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>