A caution on using Dimensional DSVs in Data Mining – part 2

As a follow-up to this post, I have found that using a table other than the one being mined to provide a grouping not only fails to group within the model, it also confuses the Mining Legend in the Mining Model Viewer.

The Mining Legend for a node in a Decision Tree looked like this:

Total Cases: 100

Category A: 10 Cases

Category B: 25 Cases

Category C: 0 Cases

Category D: 9 Cases

… so the total cases and the cases displayed didn’t tie up: the four categories shown account for only 44 of the 100 cases. Digging further with the Microsoft Mining Content Viewer and looking at NODE_DISTRIBUTION, I saw that there were multiple rows for the categories, and the Mining Legend was just picking one of those values.
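To make the mismatch concrete, here is a minimal Python sketch. The rows are invented for illustration (they are not taken from a real model), and the rule "the legend shows the first row per category" is only an assumption about which of the duplicate rows gets picked. The function names `legend_view` and `summed_view` are hypothetical.

```python
from collections import defaultdict

# Hypothetical NODE_DISTRIBUTION rows for one node: (attribute value, support).
# Each category appears twice because the model treated two different keys
# as separate states that happen to share the same label.
node_distribution = [
    ("Category A", 10), ("Category A", 20),
    ("Category B", 25), ("Category B", 5),
    ("Category C", 0), ("Category C", 21),
    ("Category D", 9), ("Category D", 10),
]


def legend_view(rows):
    """Keep only the first row seen per category (assumed legend behaviour)."""
    seen = {}
    for value, support in rows:
        seen.setdefault(value, support)
    return seen


def summed_view(rows):
    """Add up every row per category, which is what the grouping intended."""
    totals = defaultdict(int)
    for value, support in rows:
        totals[value] += support
    return dict(totals)


legend = legend_view(node_distribution)
summed = summed_view(node_distribution)

print(legend, sum(legend.values()))  # per-category picks and their sum
print(summed, sum(summed.values()))  # per-category totals and their sum
```

With these made-up rows the "legend" figures sum to 44 while the summed figures reach the node total of 100, which is the same kind of gap as in the legend above.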

So if you find yourself looking at a node and wondering why the numbers don’t add up, it may be because your grouping hasn’t been used by the model.


A caution on using Dimensional DSVs in Data Mining

If you are using a dimensional-style DSV in a Data Mining project, such as the one below:
Fig 1: A Dimensional DSV

Be aware that if you include a column from a Dimension table in your Mining Structure, the model will treat each key entry on the source table as a distinct value, rather than each distinct value in the Dimension table. I found this out because I added a grouping category to one of my dimension tables (a simple high / medium / low group) and there were multiple values in the attribute states for each group, as below:

Fig 2: Mining Legend

To work around this, you will need to either add a Named Calculation that brings the group onto the main table, or convert the main table to a Named Query that includes it.
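As a rough illustration of both options, the sketch below uses Python's built-in sqlite3 as a stand-in for the source database. No SSAS is involved, and the table and column names (`FactSales`, `DimCustomer`, `IncomeGroup`) are hypothetical. The correlated subquery has the shape of expression you might use for a Named Calculation, and the join has the shape of a Named Query. Either way, the group becomes a column on the mined table itself.

```python
import sqlite3

# In-memory stand-in for the source database; all names here are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimCustomer (CustomerKey INTEGER PRIMARY KEY, IncomeGroup TEXT);
CREATE TABLE FactSales (SaleId INTEGER PRIMARY KEY, CustomerKey INTEGER, Amount REAL);
INSERT INTO DimCustomer VALUES (1,'High'),(2,'High'),(3,'Medium'),(4,'Low'),(5,'Low');
INSERT INTO FactSales VALUES
    (1,1,10.0),(2,2,20.0),(3,3,5.0),(4,4,7.5),(5,5,2.5),(6,1,12.0);
""")

# The problem: grouping follows the key on the main table, one state per key.
key_states = conn.execute(
    "SELECT COUNT(DISTINCT CustomerKey) FROM FactSales"
).fetchone()[0]

# Option 1, Named Calculation style: a scalar expression evaluated per row
# of the main table that looks up the group in the dimension table.
calc_states = conn.execute("""
    SELECT COUNT(DISTINCT (
        SELECT d.IncomeGroup FROM DimCustomer AS d
        WHERE d.CustomerKey = FactSales.CustomerKey))
    FROM FactSales
""").fetchone()[0]

# Option 2, Named Query style: replace the main table with a query that
# already joins the group in.
named_query = """
    SELECT f.SaleId, f.Amount, d.IncomeGroup
    FROM FactSales AS f
    JOIN DimCustomer AS d ON d.CustomerKey = f.CustomerKey
"""
query_states = conn.execute(
    f"SELECT COUNT(DISTINCT IncomeGroup) FROM ({named_query})"
).fetchone()[0]

print(key_states, calc_states, query_states)
```

With this sample data there are five distinct keys but only three distinct groups, and both workarounds expose the three groups directly on the main table.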
