A caution on using Dimensional DSVs in Data Mining – part 2
As a follow-up to this post, I have found that using a table external to the one being mined to provide a grouping not only fails to group within the model, it also confuses the Mining Legend in the Mining Model Viewer.
The Mining Legend for a node in a Decision Tree looked something like this:
Total Cases: 100
Category A: 10 Cases
Category B: 25 Cases
Category C: 0 Cases
Category D: 9 Cases
… so the total cases and the cases displayed didn’t tie up: the four categories sum to 44, not 100. Digging further with the Microsoft Generic Content Tree Viewer and looking at the NODE_DISTRIBUTION for the node, I saw that there were multiple rows for each of several categories, and the Mining Legend was showing just one of those rows per category.
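To make the mismatch concrete, here is a minimal Python sketch of the check I did by eye. It assumes you have copied a node's NODE_DISTRIBUTION rows out as (value, support) pairs. The rows below are made up purely to reproduce the legend figures above, and the function names are my own rather than anything from Analysis Services. The legend's choice of row is also an assumption: I only know it shows one row per category, so the sketch arbitrarily takes the first.

```python
from collections import defaultdict


def group_rows(rows):
    """Collect the support counts seen for each attribute value."""
    grouped = defaultdict(list)
    for value, support in rows:
        grouped[value].append(support)
    return dict(grouped)


def duplicated_values(rows):
    """Values that appear in more than one distribution row."""
    return sorted(v for v, s in group_rows(rows).items() if len(s) > 1)


def legend_view(rows):
    """Mimic a legend that shows one row per value (assumption: the first)."""
    return {value: supports[0] for value, supports in group_rows(rows).items()}


def true_totals(rows):
    """Sum every row for each value, which is what the grouping should give."""
    return {value: sum(supports) for value, supports in group_rows(rows).items()}


# Hypothetical rows, invented for illustration only.
example_rows = [
    ("Category A", 10), ("Category A", 20),
    ("Category B", 25), ("Category B", 15),
    ("Category C", 0), ("Category C", 21),
    ("Category D", 9),
]

if __name__ == "__main__":
    print("Duplicated values:", duplicated_values(example_rows))
    print("Legend shows:", sum(legend_view(example_rows).values()), "cases")
    print("Rows add up to:", sum(true_totals(example_rows).values()), "cases")
```

If `duplicated_values` returns anything at all, the model has treated the underlying values separately instead of using the grouping.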
So if you find yourself looking at a node and wondering why the numbers don’t add up, it’s because your grouping hasn’t been used by the model.