Handling Recursive Hierarchies in SQL Server

I was recently posed a question on handling recursive hierarchies which left me completely stumped, so I had to find a good solution to it. This post will cover all that, but first –

…what is a Recursive Hierarchy?

A recursive hierarchy is a one where children of parent members can be parents themselves, such as an Organisation chart, or chart of accounts. A simple example is shown below, where the node A2 is both a child of A1 and a parent of A4 & A5:

b

Fig 1: A Simple Recursive Hierarchy

Handling these within a database environment can be difficult because the number of parent / child relationships (aka “Depth”) can vary, so it is impossible to create a fixed width table to accomodate them for their future growth. Consequently, most operational systems store these in a simple two level tabel which records the parent / child relationships only, as shown below.

b

Fig 2: The Parent Child Table

From the operation systems point of view this is usually all it needs to function as they usually only need to know the relationship one step in either direction, which this table satisfies. Determining the parent of A2 or the children of A2 can be determined with a simple SELECT query.

How do they cause difficulties for reporting?

The difficulty faced in a SQL based reporting situation is that you cannot easily determine relationships between parents and grandchildren, great-grandchildren, etc. without nesting queries. The specific problem with this is that you cannot know from a parent exactly how many levels of children lie below it. Vice versa, you cannot know how many levels of parents a child record has. Consequently you cannot know in advance how deep to nest your queries. From a practical standpoint as well, these relationships could be hundreds or thousands deep, and writing that query can pose problems in itself, even if you know there are exactly 1,567 levels in it!

The problem I was posed was superficially simple – if every level of such a hierarchy can have values, such as laid out below, how do you determine the aggregate value for a given parent and all its children?

b

Fig 3: The Values Table

For a tiny example such as this, nested queries is an option – your sub selects would only have to go two deep at most. But what if the depth changed, or ran into the tens or hundreds? Also, how do you create the generic query for any node in the tree? And for added complexity, what if there are multiple trees in the hierarchy? The problems are not trivial to resolve, but I have located two good solutions.

Solution 1: The LR Method

The first, and in my view most elegant, comes from Michael J. Kamfonas in his article “Recursive Hierarchies: The Relational Taboo!“, which I strongly recommend reading to fully grasp the solution. His solution is to pre-number each node in the hierarchy with a value that on the Left forms a lower bound and on the Right an upper bound that allows you to select all nodes under it using a BETWEEN clause. A picture explains this concept clearly, with the L Value in Pink and the R Value in Green:

b

Fig 4: The LR Method applied to a Recursive Hierachy

So as you can see for node A2, the range between its L Value of 2 and R Value of 7 encompasses the L Values of all its children (A4 with 3, and A5 with 5). So to select all the nodes below A2 there is no need to resolve any relationships, you can simply select all children nodes based on their L values.

In database terms, the output looks like this:

b

Fig 5: The L R Table

So, if I wanted to know the sum of all the values below any given node, I would use this pseudo SQL:

SELECT SUM(Value)
FROM ValuesTable v

LEFT JOIN LR_Table lr
ON v.Node = lr.Node

WHERE lr.L_Value
BETWEEN (SELECT L_Value FROM LRTable WHERE Node = ‘ParentNode’)
AND (SELECT R_Value FROM LRTable WHERE Node = ‘ParentNode’)

To demonstrate this in practice, I have created some SQL which creates a recursive hierarchy table as in Fig 2, a values table as in Fig 3 and an LR table as in Fig 5. The LR table is then populated with L/R values by a simple cursor which logs its activity to the message window so you can see what it is doing. I don’t think it will win any prizes for efficiency, but it works! The code sample then closes out with a T-SQL sample which calculates the sum of all it and its childrens values. Download the sample code here.

Solution 2: Kimball Helper Table

Unsurprisingly, Ralph Kimball also has an approach. His involves the use of a “Helper Table” which he describes in his article “Helper tables handle dimensions with complex hierarchies“. As for solution 1 I advise reading the article as I will only go into the practical implications of his approach.

The Kimball approach requires creating a table that stores every path from each node in the tree to itself and to every node below it. Looking at the example below, this means creating one row per node, plus one row for each path down the tree for each parent node to all of its children.

b

Fig 6: Paths to capture in a Helper Table

The helper table also includes a few extra flags – the Level Depth from the top parent, and flags for the top level parent nodes (Topmost) and lowest level children nodes (Lowest). The output table ends up looking like this:

b

Fig 7: The Kimball Helper Table

This makes navigating relative positions in the hierarchy simpler, and allows the answering of the original question easy as all that is required is to join on the Values table to the Helper table for the given parent node, as below:

SELECT SUM(Value)
FROM HelperTable r

LEFT JOIN ValuesTable v
ON r.ChildNode = v.Node

WHERE ParentNode = ‘ParentNode’

However – as you will see from my sample code – populating these tables is much more complex. The sample code creates and populates the recursive hierarchy table as in Fig 2, a values table as in Fig 3 and an Helper table as in Fig 7. The process of populating the table is a mix of cursors, inserts and updates and is much trickier to get right than the LR method, because of the higher demands for information, such as Depth from Parent.

Recursive Hierarchies – no problem!

Above are two solid approaches to the Recursive Hierarchy problem. The code samples provided should help you get started on understanding how to handle these scenarios when trying to report against them in Database based reporting scenarios such as in SSRS. SSAS and other OLAP based reporting handles all of this in its stride however, as described here, for example.

If you find issues with my code samples or see improvements, i’d love to know about them, so please keep me posted about your experiences with them.

Finally, I will close out with the old joke on the subject:

To truly understand recursion,

you must first understand recursion

About BI Monkey

Comments

15 Responses to “Handling Recursive Hierarchies in SQL Server”
  1. Lok says:

    How if each node can have more than 2 child nodes?

  2. BI Monkey says:

    It should handle multiple child nodes – the example is simple for demonstration purposes only.

  3. XEN says:

    if you want to list the hirerachy why are you doing the sum ? Can you please explain how to extract the hirerachy tree

  4. RHD110 says:

    Please show how to display hierarchy in following situations:
    A little domain knowledge before starting
    An assembly can have multiple items and multiple subAssemblies.

    e.g
    (1) Assembly A1
    (50) |___________ SubAssembly SA1
    | |______ I1 (100)
    | |______ I2 (101)
    |
    (51) |___________ Item I10 (110)
    |
    (52) |___________ Item I20 (111)
    |
    (53) |___________ SubAssembly SA2 (150)
    |___________ Item I30 (115)

    I have two table 1.]datAssemblies table and 2.]datComponents table

    1. datAssemblies table having all assemblies entry and Components will have all the mapping with subassemblies and items

    2. subAssemblies entries are also listed in datAssemblies also

    1.]datAssemblies (PK : BillSequenceID )

    BillSequenceID AssemblyID
    1 1
    2 50
    3 53
    4 150

    2.]datComponents (PK : InventoryItemID)
    As SA1 is subAssembly it’s entry is in datAssemblies
    InventoryItemID BillSequenceID ComponentItemID
    1 1 50
    2 1 51
    3 1 52
    4 1 53
    5 2 100
    6 2 101
    7 3 150
    8 150 115

    I now want the hierarchy of the main top assembly
    Please advice

  5. BI Monkey says:

    RHD110, i’m not 100% sure I follow but this is my understanding:

    1) Each assembly can have a sub-assembly
    2) Each assembly can have multiple items

    The issue you have is that because each assembly can have multiple items you cannot simply express the items as a property of the assembly.

    I can see a couple of options for the display of this, depending on what suits your business needs:

    1) Have an Assembly hierachy which is separate from your item dimension or
    2) Treat each Assemblys items as part of the Assembly hierarchy, and indicate that an Item is not strictly part of the assembly hierarchy (e.g. with a prefix on the member name)

    Cheers, James

  6. SRB1 says:

    Please, show ho to get the number of children for each parent.

  7. BI Monkey says:

    As in the example, just use Count instead of Sum – too easy :)

  8. la says:

    what happens to the sum when a node has multiple parent ?

  9. BI Monkey says:

    The scripts here aren’t able to cope with that. From a dimensional perspective it then becomes a design decision about how you handle such cases. Can you support multiple hierarchies for example? Or set up a scenario where data is duplicated to handle each parent? Not easy!

  10. Brenda Brink says:

    I don’t understand Fig. 7 in your example. Kimball’s article says that the depth from parent counts how many levels the subsidiary is below the parent. For example, it seems that DFP for A1-A4 should be 2, not 1, and for A6-A6 should be 0, not 2. What am I missing?

  11. Brett says:

    Your solution helped me tremendously! I was trying to avoid some large recursive queries. Thanks!!!

  12. Mike says:

    The problem with the LR Method, otherwise known as Nested Sets, is inserting, deleting and moving nodes to other parents. When you do one of these things, you have to update all the left and right nodes to compensate for the change. If you’ve put indexes on your Left and Right columns (to improve query performance) and you update a node in a tree with hundreds of thousands of records, you may need to write tens of thousands of updates to left and right columns when you change just one node.

    So, Nested Sets may be good for a content publishing system where updates don’t need to happen in real time, but it’s not ideal for an active web application, especially a distributed one, where reaction time is important.

  13. Gary says:

    I am using the Kimball Helper Table Method perfectly against a SQL-Server database. The only problem is I am now having to port to and Oracle 10(something) database.

    The code does not work in this environment, so I wondered if you had Oracle equivalent code available?

    If yes I would greatly appreciate a copy.

  14. BI Monkey says:

    Sorry Gary, I rarely touch Oracle so don’t have a translated version available.

  15. Colin says:

    Slightly off topic. Have you come across SSRS failing to display a many-to-many recursive parent correctly using a tablix group with a recursive parent? The scenario is a school room as the parent and the student as the child. Most students use most of the rooms, hence many to many. Grouping on the student with the recursive parent set to the school room in a tablix displays EVERY instance of a student under the first room and nothing under the remaining rooms. A bug?

Speak Your Mind

Tell us what you're thinking...
and oh, if you want a pic to show with your comment, go get a gravatar!