Bulk loading XML with SQLXMLBulkload

I was recently asked to help out with loading some awkward XML into SQL Server using the SQLXMLBulkLoad feature of SQL Server (it’s been there since 2000 but is one of SQL Server’s lesser known features). It’s a more graceful approach than using the XML Source Component in SSIS but has its own limitations and frustrations.

The specific problem I was trying to resolve was loading a file that had an Element that only had sub-Elements and contained no data, which resulted in me gettingĀ  the error “Schema: the parent/child table of the relationship does not match”

What is SQLXMLBulkload?

Apart from a mouthful with a vowel shortage, it is a standard component of SQL Server that provides a means to bulk load XML into standard relational tables. All you need is the SQL Server target tables, a script to invoke the SQLXMLBulkLoad, the XML file itself and an appropriately formed XSD schema to describe the XML and how to load it into the relational schema. The biggest limitation is that you cannot manipulate the data as it is imported – it is after all a bulk loader. The frustration part comes in forming the XSD schema properly, which can be a bit of a black art.

How to Invoke SQLXMLBulkload

The script to call the bulk loader is pretty simple – my example below was simply saved in notepad with a .vbs extension, which is then executed with a double click (or could be called from SSIS using an Execute Process task).

Set objBL = CreateObject(“SQLXMLBulkLoad.SQLXMLBulkLoad”)
objBL.ConnectionString = “Data Source=SERVER\INSTANCE_NAME;Initial Catalog=TARGET_DATABASE;Provider=SQLNCLI10.1;Integrated Security=SSPI;”
objBL.ErrorLogFile = “C:\error.log”
objBL.Execute “C:\sample.xsd”, “C:\sample.xml”
Set objBL = Nothing

Handling Elements with sub-Elements only with sql:is-constant

Now, to the problem. My XML source contained a nested Element which had no content itself, but had further multiple occurences of a nested Element within it, as below:

<Nuts>
<Nut Id = “1” Name = “Peanut”>
<Photos>
<Photo>nutty1.jpg</Photo>
<Photo>nutty2.jpg</Photo>
</Photos>
</Nut>
<Nut Id = “2” Name = “Cashew”>
<Photos>
<Photo>nutty3.jpg</Photo>
<Photo>nutty4.jpg</Photo>
</Photos>
</Nut>
</Nuts>

For ages I was going in circles trying to get the relationship set between the Nut and Photo elements (using the sql:relationship annotation) but had endless trouble because each nesting demanded a relationship – so it looked like it needed a Nut > Photos > Photo relationship, but because the Element <Photos> had no data items there was nothing on which I could establish a relationship. Eventually I stumbled upon the sql:is-constant annotation which tells the XSD schema that the element doesn’t map to a database table or column – and one of its documented uses is specifically to create a container element, as my situation required. So my final XSD looked like this:

<?xml version=”1.0″?>
<xsd:schema xmlns:xsi=”http://www.w3.org/2001/XMLSchema-instance” xmlns:xs=”http://www.w3.org/2001/XMLSchema” xmlns:xsd=”http://www.w3.org/2001/XMLSchema” xmlns:sql=”urn:schemas-microsoft-com:mapping-schema” attributeFormDefault=”unqualified” elementFormDefault=”qualified”>
<xsd:annotation>
<xsd:appinfo>
<sql:relationship name = “Link” parent = “Nuts” parent-key = “Id” child = “NutPhotos” child-key = “Id” />
</xsd:appinfo>
</xsd:annotation>
<xsd:element name=”Nut” sql:relation=”Nuts”>
<xsd:complexType>
<xsd:sequence>
<xsd:element name=”Photos” sql:is-constant = “1”>
<xsd:complexType>
<xsd:sequence minOccurs=”0″>
<xsd:element name=”Photo” sql:field = “Photo” sql:relation=”NutPhotos” sql:relationship = “Link” />
</xsd:sequence>
</xsd:complexType>
</xsd:element>
</xsd:sequence>
<xsd:attribute name=”Name” type=”xsd:string” sql:field=”Name” />
</xsd:complexType>
</xsd:element>
</xsd:schema>

One tiny bit of annotation solved my headaches – understanding the flow of XML and how XSDs interpret it is definitely tricky, but it can be very powerful. Of course, the fact that this took me four hours to solve using SQLXMLBulkload, but 5 minutes in SSIS took the colour off it a bit, but SSIS can’t always be used, as was the case for the person I helped out.

The SQLXMLBulkLoad documentation on MSDN is actually a good reference and well worth spending a little time going over so you can better understand this feature.

Read More

The XML Source component – The Basics

This post is going to cover the XML source component, which allows you to read data from XML sources into SSIS data flows. It is largely a rehash of a post by Matt Mason on the SSIS Team Blog, though this one has some samples to play with. A zip file containing the samples can be found here, read guidelines on use here.

Referencing the XML file, and its associated XSD

The XML Source is unusual in that it does not require a Connection Manager, but instead references a XML data source either as a direct reference to a file,a variable containing a file reference, or a variable containing XML data. These options can be found in the Data Access Mode of the Connection Manager Tab, shown below.

The XML Source Connection Manager Tab
Fig 1: The XML Source Connection Manager Tab

Apart from the XML file location, the XML Source also requires you to specify the location of the the XML Schema Definition (shortened to XSD) which specifies the locations, names and formats (text, numeric etc.) of the data items found in the XML file. This schema definition can exist as part the source XML file, in which case you can check the “Use inline schema” box, and the “XSD location:” box will grey out. See example 4 in the sample package for a demonstration of this. However beware that the schema must be within the data rows tag – see the connect article here for a full explanation, but the inline schema the XML source works with is not compliant to standards.

Otherwise you need to specify the location of the XSD file. If you don’t have one already available, SSIS can attempt to generate one for you by clicking the “Generate XSD…” button, which will prompt you for a file location to generate the XSD in. If successful the XSD generation process will create a .xsd file for you and reference the location in the Connection manager tab.

How XML data is handled by the XML Source

XML Data items can be formed in two ways, as Elements within XML tags, or Attributes of XML tags, expressed simply below:

Attributes Example

<xmldata>
<datarow dataitemname = “dataitemvalue” />
</xmldata>

Elements Example

<xmldata>
<datarow>
<dataitemname>dataitemvalue<dataitemname>
</datarow>
</xmldata>

In each case, the XML Source will treat each identifiable row, as indicated by the <datarow> tag as a incoming row of data, and each data item as a column. Examples of both scenarios are included in the sample package in case 2 and 3.

A more complex scenario is where there are nested XML elements, as in the sample below:

<xmldata>
<datarow>
<dataitem1>dataitemvalue<dataitem1>
<nestedelement>
<dataitem2>dataitemvalue</dataitem2>
<dataitem3>dataitemvalue</dataitem3>

</nestedelement>
</datarow>
</xmldata>

In this case there will be multiple outputs from the XML source. In the sample above there would be one output for datarow, and one for nestedelement. These can then be tied together by a row id column generated by the component. The Columns tab from the package is shown below, detailing the two outputs due to the presence of the nested tags. You can also see the autogenerated row_id column:

The XML Source component Columns Tab for Multi-Element XML sources
Fig 2: The XML Source component Columns Tab for Multi-Element XML sources

The sample in the package then uses a Merge Join to then pull these back together as a flattened line of data, joining on the row id.

Summary

The XML source is a flexible component used to access XML files. It is capapable of generating XSD schema files if required. If you have nested elements, the component generates multiple outputs that can be recombined on an id generated by the components. It is worth noting that the SSIS team blog says it handles XML “to various degrees of success“, which implies that as the XML gets more complicated, you may experience undesirable results.

The MSDN documentation for this component can be found here for 2008 and here for 2005

Read More