Death by reconciliation

In a system migration, one of the most common testing requirements is “the numbers must match the old system”. That sounds reasonable enough, especially where reports go to external parties: if the numbers change and you can’t explain the change in a way the consumer accepts, your image suffers.

However, from an IT project perspective, this testing requirement is a surefire way to ruin the project’s reputation, annoy the customer and drive everyone on the project insane. Let me explain…

The Old System is not the New System

As obvious as it may seem, the problem with matching the old system is that what has been specified as the new system is highly unlikely to match the old one. The reasons for this are manifold, but common examples include forgotten practices and undocumented changes. Even with great business requirements you’ll still find this stuff. The older the system, the more skeletons live in the cupboard.

What happens when you start the reconciliation process is that you discover the business requirements you were given don’t produce the same results as the old system, despite nominally doing the same thing, because of:

  • Implicit behavior not captured (e.g. the requirement says “exclude anything over 5 years old”, but the old system also threw away anything with a negative age)
  • Explicit behavior not captured (e.g. product “B” is overridden to product “A” on Wednesdays)
  • The old system being wrong (e.g. it simply ignored orders with a negative value)

The underlying problem is that nobody has perfect knowledge of the old system. The new system may be perfectly understood, as all its rules are spelled out in black and white, but it is rarely a perfect reflection of the old one.
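To make those three causes concrete, here is a minimal sketch of the mechanical part of a reconciliation, assuming both systems’ outputs can be reduced to per-key totals (for example, totals by product code). The function name `reconcile`, the tolerance and the sample data are hypothetical. Every key it reports is a candidate for one of the causes listed above; the code can only find a difference, and explaining it is the expensive, human part.

```python
from decimal import Decimal


def reconcile(old, new, tolerance=Decimal("0.01")):
    """Compare per-key totals from the old and new systems.

    old, new: dicts mapping a report key (e.g. a product code) to a Decimal
    total. Returns (keys only in old, keys only in new, keys whose totals
    differ by more than `tolerance`), each as a sorted list.
    """
    only_old = sorted(old.keys() - new.keys())
    only_new = sorted(new.keys() - old.keys())
    mismatched = sorted(
        key for key in old.keys() & new.keys()
        if abs(old[key] - new[key]) > tolerance
    )
    return only_old, only_new, mismatched


# Hypothetical example: the old system silently dropped a -10.00 order for
# product "A", so its total is higher than the new system's.
old_totals = {"A": Decimal("100.00"), "B": Decimal("50.00")}
new_totals = {"A": Decimal("90.00"), "B": Decimal("50.00"), "C": Decimal("5.00")}
print(reconcile(old_totals, new_totals))  # ([], ['C'], ['A'])
```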

Managing the reconciliation

Of course, in many projects this reconciliation requirement is inescapable, and by the time you reach this stage the requirements phase is over and done with. Whether the requirements gathering was inadequate, the requirements weren’t fully reviewed, or something else went wrong, the management of the reconciliation process is all that lies within your control, and I believe it is best done by adhering to three simple rules:

  1. Testing to the requirements (not expectations)
  2. Strict change control for any deviation from requirements
  3. An open ended test period for reconciliation

What this means is that, firstly, the written, signed-off requirements are what you develop, deliver and test against to claim project success (importantly, and knowingly, not business success). If the business expected a different outcome, that is immaterial in terms of the project’s accountability. This is often frustrating to the business, but vital to the project, so that those involved can safely say they have done what was asked of them with the resources provided (if they did, that is – it can equally be used by the business to highlight a poorly performing project team).

The second point means that any deviation from the written, signed-off requirements is properly captured and costed. I’ll be the first to admit that neither of these policies will make a project lead terribly popular, but it is for the benefit of the project and the business. The reason behind this is that the cost of insufficient / missed requirements is spelled out as a business and project cost, not simply a poor project delivery cost. It raises the visibility to the business of these changes, and helps prevent the business being able to offload the costs (in monetary and image terms) to the project team.

The third point is that, when estimating for the reconciliation testing, you must allow a large contingency period and must not be held to a fixed cost and time for it. In the reconciliation period it’s likely you will discover new or incomplete requirements, which means cycles of more development and testing. There is no way of knowing in advance exactly what this will be – any guess will be a stab in the dark – and I once saw a 6-month project overrun by 18 months as this phase ran its course, all at the consultancy’s expense.

Justifying wearing the pain

The benefit of these strict policies is not about costs and timelines, which will probably drift by roughly the same amount whether changes are managed on a reactive “fix them as they come” basis or under the strict approach I propose. The benefit lies in ensuring that the reason for the drift is understood and the pain is shared, not allocated 100% to the project team.

If the changes are managed reactively, then in the short term the project delivery team feel they are being helpful and accommodating. However in the long term what happens is the customer starts perceiving a couple of things. One, that all delays in the project are the delivery team’s fault as they are the ones always taking longer to implement the requirement – even though the delay is a non-technical one to do with changed requirements. Second (and more dangerous) is the belief that change comes at zero cost to them – so they have no hesitation in adding extra components or requirements in – further delaying delivery and making the project team look even worse.

Applying strict change control is not about pushing back against the business to prevent them making changes – it is about making visible to them the cost of those changes. It’s a lot easier to face up to a stakeholder who asks why a project is running late if you can quantify the delays in terms of specific problems and shared responsibilities.

Yes, it’s Project Self Defence

One initial comment I had on this approach was that if the project delivers, but it’s wrong, then it’s still wrong. And I agree – this is purely Project Self Defence.

First, it’s about managing cost and budget – you can do what you are asked with the resources and time you estimated. You cannot necessarily do what the business expects with those resources. Any gap between requirements and expectations needs to be managed and the cost understood and shared. Especially that painful “match the old system” testing period which can go on for a very long time.

Second, it’s about managing the perception and image of the project. If you run late or over cost because of accommodating change, the project team suffers all the reputational damage. If you can show that the delays have identifiable causes for which responsibility is shared, then the delivery problems become a joint concern with more buy-in from the business.

Hopefully you’ll now think twice before accepting that testing requirement…

Human Infrastructure & Analyst First

Recently I was watching a vendor vs vendor punch-up on LinkedIn – various salespeople, vested-interest consultants and fanboys all trying to declare that their database was clearly far better than the other. To me it looked suspiciously like a bunch of car salesmen desperately trying to convince someone their vehicle was the superior one because of feature x, y or z.


Having recently attended an early meeting of Analyst First, I was somewhat bemused at the complete sidelining of the Human component. Hands down, I will agree a Porsche 911 is technically faster than a Subaru Impreza. However, stick them both on the same track, put me in the Porsche and the Stig in the Impreza, and I wouldn’t put great odds on me crossing the finishing line first (or, to be honest, at all: I’m no race driver and would probably end up in a ditch with spinning wheels).


In any tooling choice, it is smarter to pick a toolset with which you can comfortably match people’s skills and experience. I will build you a great Microsoft BI solution, because I know the toolset intimately and will squeeze every possible drop of value out of it. I will make a middling Cognos solution because I roughly know what it does and should do in theory (I will also complain vociferously about anything MS BI can do better that it can’t). I will build you a terrible Jaspersoft solution because I don’t even know how to turn it on.

The impact of a few shortfalls here and there in capabilities of a toolset your team are familiar with will be minor compared to the impact of them blindly feeling their way through a new toolset with a set of preconceptions based on how their previous one worked.

Analyst First

Which is kind of where Analyst First comes in. They represent a component of the Analyst community here in Australia with a very focused aim: to equip the man, not man the equipment. What does that mean in practice? It means not spending the big bucks on analytics software and expecting the analytical manna to start falling from heaven, but instead spending it on the people who know the raindance, so to speak. Their proposition is simple and quite reasonable: a good analyst first and foremost needs skills – not tools – to do their jobs well.

Rolling back to the car analogy, there is no point buying a learner driver a Porsche – spend the money on driving lessons first. The learner will benefit more from it, and also not suffer from the false sense of security that a powerful car can give you. I’m fast! I’m safe! I’m wrapped around a lamppost! Oops. Analytics is a tricky occupation – it’s very easy for powerful tools to give you an answer, and for the inexperienced analyst to believe it must be right because the expensive tool produced the answer (and made it look pretty to boot).

I’ve done just enough Data Mining to know that wrong answers can leap off the page and look very convincing until you look under the hood at why you got them. In one case, a strongly predictive indicator came out of my data: it predicted with about 95% accuracy that if factor Y was present, the customer fell into category X. Convincing stuff. Until I got under the hood and discovered that factor Y was only ever entered into the system for customers in category X. It went from being 95% predictive to 0%.
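The trap in that story is a form of what is usually called target leakage: the factor was recorded as a consequence of the category, so it “predicts” the category without telling you anything useful. The sketch below is not necessarily how the problem was found on that project; it is one simple sanity check, assuming records are dicts and that an unrecorded value shows up as `None`. The field names `factor_y` and `category` are hypothetical.

```python
from collections import Counter


def population_rate_by_class(records, feature, label):
    """For each class, return the fraction of records where `feature` is
    populated (i.e. not None)."""
    totals = Counter()
    populated = Counter()
    for record in records:
        cls = record[label]
        totals[cls] += 1
        if record.get(feature) is not None:
            populated[cls] += 1
    return {cls: populated[cls] / totals[cls] for cls in totals}


def looks_like_leakage(rates):
    """Flag a feature that is only ever recorded for a single class: a strong
    hint that it is captured because of the class, not a predictor of it."""
    return sum(1 for rate in rates.values() if rate > 0) == 1


# Hypothetical data mirroring the story: factor_y only exists for category X.
customers = [
    {"category": "X", "factor_y": "yes"},
    {"category": "X", "factor_y": "yes"},
    {"category": "X", "factor_y": None},
    {"category": "Z", "factor_y": None},
    {"category": "Z", "factor_y": None},
]
rates = population_rate_by_class(customers, "factor_y", "category")
print(rates, looks_like_leakage(rates))
```

A flag here is not proof of leakage, only a prompt to go and ask how the field gets populated.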

Human Infrastructure

Tying these two related topics together is the concept of Human Infrastructure, one that is often neglected in project plans and budgets. BI, and its cleverer – if scruffier and more academically inclined – relative, Analytics, is not just another system that needs a mundane user guide stating “To get outcome A, press button B”. To get value out of data you don’t just need to know how to use the tool – you also need to understand how to analyse data. This is a mishmash of competencies around maths, stats and logic, to name a few, none of which can be bypassed by using a tool.

I often hear that users don’t want to know about the details of calculations and aggregations, and that BI should just serve it up on a plate. This worries me: if your end users aren’t motivated enough (or, as few people will dare say out loud, smart enough) to understand how an outcome arose, but are prepared to make decisions based on it, then they will make bad decisions. Witness the subprime crisis, driven by people selling products devised by clever quants regardless of their own ability to understand them.

The bottom line: Worry less about tools, and more about the people that are going to use them.

Exceptional programmers are 100 times better than average ones

… at least, this is a claim made by Mark Zuckerberg of Facebook:

“Someone who is exceptional in their role is not just a little better than someone who is pretty good,” he argued when asked why he was willing to pay $47 million to acquire FriendFeed, a price that translated to about $4 million per employee. “They are 100 times better.”

The quote comes from a blog post, Great People Are Overrated, which actually argues the counterpoint: that a solid team is more important than a few superstars. The comments are more enlightening than the article itself and largely agree with Zuckerberg’s position – I suggest reading through some of them.

As a total aside, the comments also led me to this article about a book I’ve added to my must-read list – The Mythical Man-Month – which discusses various software project management issues, some of which relate to the above. The Wikipedia article linked to has a good summary of the key points (not least that nine women can’t make one baby in a month).

So, Superstars or Average Joes?

The answer probably lies somewhere in the middle. You need the superstars to provide vision, solve problems and create genuinely innovative and effective solutions. You need the Joes to implement, fix and maintain – to do the stuff that can waste a superstar’s energy and focus. This also gives the Joes a chance to develop, learn and potentially become stars themselves (the caveat being that there will always be Joes who stay Joes – and perhaps these should be fired if you want to truly excel as an organisation). Also, an organisation, once it reaches a certain size, needs structure, process and systems to operate without collapsing, and this is a Joe job, not a Superstar one.

Something that perhaps isn’t drawn out by the article is the issue of Domains of stardom. Superstars will invariably be able to grasp the concepts of, and even have the capacity to excel in, other domains they turn their focus to, but they tend to be Superstars in their own Domain. Loath as I am to use sports analogies, a Superstar Football player will probably make a pretty good Rugby player – but not a Superstar – until they decide that’s what they want to do and focus exclusively on it. In my own little world, I’d happily state I’m an SSIS Star – not the best, but outclassing most. However, I’m an average SSAS guy, and wouldn’t want to trade MDX punches with the likes of Boyan Penev.

The reason I draw out the Domain issue is one of Ego. People who are very good at one thing often mistakenly think their Domain expertise qualifies them to speak out on other issues. For example, witness Linus Pauling’s absurd position on Vitamin C – a medical issue – on which, as a great chemist, he was utterly unqualified to comment.

A key message to take away is that you need Superstars to succeed and excel. A team of Joes will never make your organisation great, just functional.

What is the ideal Superstar : Average Joe ratio?

The answer will be weighted by the size of your organisation. A small outfit needs to be made up of near 100% greatness so that it can drive, expand and succeed. In a bigger one, you are compelled by supply to bring on Joes, purely because there aren’t enough Stars around, and no Superstar wants to do donkeywork. Besides, there is donkeywork to be done and you don’t want to waste your best people on it. You can, however, multiply the value of the Superstars by getting them to create solutions and solve problems without being slowed down by the detail of actual implementation.

Superstars grow and drive your business. Joes maintain it. Is growth worth “X” times more than maintenance? To see how business answers that question, simply ask yourself why the salespeople get paid more than a grunt developer…

Too Many Hats

I’m coming to the end of a project and contemplating some of the lessons learned during its nearly nine-month duration. One key lesson is about roles: in the middle of the project we had a bit of a resourcing issue, and my role was expanded to cover more activities – from pure architect to hybrid architect / project manager / senior developer. Not a problem in itself – as a consultant you expect to wear many hats – technical expert, customer relationship manager, architect, sales guy – it’s all part of the fun of consulting. However, I found myself on the hook at one point for a hat too many – and that hat was the easiest for me to wear: the developer hat.

The difference with the developer hat is twofold. First, you have hard deliverables. If you are a bit late with a project plan, or your architecture isn’t quite complete, life can usually move on. If your code is late, you start holding up other developers and raising red markers on the project plan. Of course, the code you get allocated as a senior guy is not the easy stuff but the bits where dragons can be found, so the odds of it being late go up. Which leads to the second aspect: you come under pressure because your code is late, and under pressure you focus on what you are comfortable dealing with – coding – while everything else gets left by the wayside.

Consequently, the other activities on my plate began to lose focus – notably the project management ones (organisation is not my strong point anyway) – and the project began to suffer, because soft deliverables can slip… but only so far. So the pressure rose, and I buried my head deeper in the tasks I could deal with and knew I would get called out on.

The tl;dr summary of this is that when doing multiple roles on a project:

  • Hard deliverables add significant extra pressure to your role
  • It is easy to unintentionally put more effort into the role you are most comfortable with (especially under pressure)

COSH – The Cost of Substandard Hardware

Now, some of you may note that “Substandard” is probably a more polite term than most would use in the acronym “COSH”, but this is a family-friendly blog (just in case there are any 8-year-old ETL developers out there).

I’m sure all of us in our development careers have been given the worst PC in the building, dev servers made of bricks and the wonky chair that needs an engineering degree to sit on without sustaining injury. On a personal level this sucks, as you can’t work as fast as your brain wants, and it rapidly becomes frustrating. The thrill of a dangerous chair wears off pretty quickly, too.

However, looking at it from above, this also carries a cost to the project as a whole. For example, say you have all your development machines hosted virtually. Makes sense – it’s easy to reproduce your dev environments for everybody and avoids having to get lots of stuff installed on desktops – all round a sensible approach. This works in practice, but there are two things you really have to think about:

Availability

If your host goes down, then your developers are offline. Let me spell out how much that costs you in cold, hard maths:

Number of developers * Developer cost per hour * Hours of downtime = Cost of failure.

So, assuming a standard Australian dev resource at $100 per hour and an 8-hour day, if you have a team of 4 devs, a day of downtime costs this:

4 * $100 * 8 = $3200 / day
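The downtime formula is simple enough to drop into a few lines of code, so you can plug in your own team size and rates. The sketch below also works out how many hours of downtime would pay for a backup host; the $6,400 host price is a made-up figure for illustration only.

```python
def downtime_cost(developers, cost_per_hour, hours_down):
    """Cost of failure = number of developers * cost per hour * hours of downtime."""
    return developers * cost_per_hour * hours_down


def break_even_hours(backup_cost, developers, cost_per_hour):
    """Hours of downtime after which a backup host has paid for itself."""
    return backup_cost / (developers * cost_per_hour)


print(downtime_cost(4, 100, 8))        # 3200: one 8-hour day, 4 devs at $100/h
print(break_even_hours(6400, 4, 100))  # 16.0: two working days of outage
```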

Plus of course that doesn’t allow for the cost of delaying the project by one day. Suddenly that backup host doesn’t seem so expensive. Or the cost of some Infrastructure consultants to make sure that the homebrewed Hyper-V setup is actually configured properly, which leads me on to…

Performance

Poorly performing hardware slows developers down. Beautiful though your documentation and planning may be, much development is still only practical through a repeated “run-check-fix-rerun” cycle, and it is the only way to unit test. If your dev hardware runs twice as slow as it could (say, relative to production), then your developers are slowed down too. It’s not quite as clear-cut as above, as developers write code as well as run it – I’d estimate that about 50% of dev time is spent running. Feel free to make your own judgment, but here’s the maths:

Number of developers * Developer cost per day * Fraction of time spent running * (Relative performance - 1) = Cost of poor performance.

So, assuming a standard Australian dev resource at $100 per hour ($800 per 8-hour day), if you have a team of 4 devs running 50% of the time on hardware that is twice as slow, the daily slowdown cost is this:

4 * $800 * 0.5 * (2 - 1) = $1600 / day

Not quite as expensive as downtime, but a subtle, creeping cost nonetheless.
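The slowdown formula can be sketched the same way. The 50% run fraction and the 2x slowdown are the estimates from the text, not measurements; the function simply lets you substitute your own. Multiplying the daily figure by the working days left in the project (a hypothetical 60 here) shows how the creeping cost adds up.

```python
def slowdown_cost_per_day(developers, cost_per_day, run_fraction, relative_performance):
    """Daily cost of slow hardware: developers * cost per day * fraction of time
    spent running * (relative performance - 1).

    relative_performance is how many times slower the dev hardware is; 1.0
    means no slowdown and therefore no cost.
    """
    return developers * cost_per_day * run_fraction * (relative_performance - 1)


daily = slowdown_cost_per_day(4, 800, 0.5, 2)
print(daily)       # 1600.0
print(daily * 60)  # 96000.0 over a hypothetical 60 remaining working days
```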

Summary

The above examples apply to any setup, not just virtual ones – slow desktops, flaky database servers and slow networks can all have an impact. You may disagree with my maths (I’m happy to take your views on my formulas), but what I’m trying to illustrate is that poorly performing development environments cause more than just annoyed developers. They cost the project time and money, and these costs can easily get big enough to warrant spending on expertise or hardware to remediate them.

Waterfall and the Illusion of Control

I recently overheard a PM at my client site say the following:

We’re great at delivering projects on time and on budget. We’ve delivered three such projects! The original project, the second project to address all the scope we cut in the first phase to deliver it on time and on budget, and the third project to address all the scope we cut in the second phase. By phase three we had finally delivered the original project!

This was obviously largely tongue in cheek, but it exposes a significant weakness of overplanning a project, which I think is Waterfall’s biggest problem. There are three variables in any project – Schedule, Scope and Budget. Any project manager’s job is to tame these beasts and bring them in line with The Plan. The problem is that adhering to The Plan becomes paramount in a Waterfall-driven project, because the Project Manager is held accountable to it. At a high level, in my experience, most Project Owners, when whipping their PMs, will measure them against (in order) Budget, Schedule and Scope. See what was last on that list? Scope – the useful bit of the project that the business actually uses. But if the PM has managed to deliver the project (as an abstract concept, anyway) on budget and on time, they have delivered the illusion of control.

To me, any project should put Scope top of the list, as Scope = Business Value. Budget should be mapped to Scope areas so you can get value out of what you pay for. For example, if you have a shiny UI that is pretty much neutral in terms of benefit relative to cost, then as soon as it starts overrunning, can it. If you have a core DW platform with benefits that outweigh its cost tenfold, then allow it to overrun, as long as you are still going to get payback. Schedule is, to me, irrelevant – it should be along the lines of “When do we want it? Now!” If you don’t want it now… well, why are you building it?
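The rule of thumb above (cut an area once its overrun eats its benefit, let a high-value area overrun while it still pays back) can be sketched as a simple review over scope areas. The `ScopeArea` type, its fields and all the figures are hypothetical; in practice, producing credible benefit estimates is the hard part, and the code does nothing to help with that.

```python
from dataclasses import dataclass


@dataclass
class ScopeArea:
    name: str
    expected_benefit: float  # estimated business value of this scope area
    budget: float            # budget originally mapped to this area
    forecast_cost: float     # current forecast of the total cost to deliver it

    @property
    def over_budget(self) -> bool:
        return self.forecast_cost > self.budget

    @property
    def pays_back(self) -> bool:
        return self.forecast_cost < self.expected_benefit


def review(areas):
    """Split scope areas into (continue, cut) by payback, not by budget."""
    keep = [a.name for a in areas if a.pays_back]
    cut = [a.name for a in areas if not a.pays_back]
    return keep, cut


# Hypothetical figures: the UI was roughly cost-neutral, the DW worth ~10x its budget.
areas = [
    ScopeArea("Shiny UI", expected_benefit=100_000, budget=100_000, forecast_cost=130_000),
    ScopeArea("Core DW", expected_benefit=2_000_000, budget=200_000, forecast_cost=300_000),
]
print(review(areas))  # (['Core DW'], ['Shiny UI'])
```

Both areas are over budget here; only payback decides which one survives.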

What’s the solution? I’m not going to jump up and start shouting Agile, but at least Agile puts Scope back at the top of the list. In big projects it may not be the right approach, but it’s a set of processes worth considering. Ultimately I understand the need for control to be in place – after all, the business’s wallet is not infinitely deep and managers rarely have the patience of saints – but I think overly planned approaches result in diminished delivered value.

Infrastructure pains

One of the headaches that has plagued various projects I have been part of has been Infrastructure. From the provisioning of environments, to dodgy release practices, to environments that were deemed “unnecessary” – sometimes the problems have not been the code, but where and how it gets into the wild.

Dev -> Test -> Prod

At the very least, any software release should go through these basic code promotion steps. When you’re doing a BI project, the fact that it’s a bit odd in software terms doesn’t mean you can skip the standard code promotion activities. Development should be done in the Dev environment, where any damage done by bad code is minimal. Once it “works”, it needs to be promoted into a Test environment to ensure that it actually does work and isn’t a fluke of the right test data having been constructed. Once it’s been tested, it can then go into Production.

This means on a project you need these three environments up and running from the outset. It can sometimes be a hard sell to smaller, less experienced IT departments who haven’t experienced the pain of an overly keen developer trashing production IT infrastructure. It does increase cost, but prevention is far better than cure.

As far as how the environments look, Dev can be whatever you like – the databases can be a mess, the code can be spaghetti – who cares? This is the developers playground and they can do what they like in here. However Test and Prod should be exactly the same. This way you can spot the “but it worked in Development” problems that somehow drag down production.

Dev = Test = Prod

Now, the next important thing is to ensure each of these environments is physically the same: all the software is the same, it’s patched to the same version, and it has the same network cards and drive mappings. If you have a separate database and SSAS box in production, don’t just use one machine in Dev and Test because “in theory” it’s the same as production.

This – like the code promotion cycle above – is about prevention being better than cure. I’ve wasted many hours of my development life debugging issues that eventually turned out to be due to inconsistent patching, drive letters not being the same in different environments and so on. One of the issues with inconsistent environments is that after a while you accommodate them without noticing – then a new developer comes on board and blows things up, because you’ve become so accustomed to the workarounds that you’ve almost forgotten they’re there.

The key here is to have a good infrastructure build guide that explains how each environment is constructed, so the Infrastructure team have no excuse for building inconsistent environments. There will of course be some differences – IP addresses, Server Names, etc. – but these will be documented and should be legitimate.
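A build guide is easier to enforce if each environment’s settings can be captured as a simple inventory and compared automatically. The sketch below assumes such an inventory already exists as key/value pairs (how you collect it is out of scope); the setting names, values and the allow-list of documented differences are all hypothetical.

```python
# Differences the build guide documents as legitimate (hypothetical names).
DOCUMENTED_DIFFERENCES = {"server_name", "ip_address"}


def environment_drift(test_env, prod_env, allowed=DOCUMENTED_DIFFERENCES):
    """Return {setting: (test value, prod value)} for every setting that differs
    between Test and Prod and is not a documented, legitimate difference.
    A setting missing from one side shows up with None for that side."""
    settings = sorted(test_env.keys() | prod_env.keys())
    return {
        name: (test_env.get(name), prod_env.get(name))
        for name in settings
        if name not in allowed and test_env.get(name) != prod_env.get(name)
    }


test_env = {"server_name": "BI-TEST01", "ip_address": "10.0.1.10",
            "sql_patch_level": "SP2", "data_drive": "E:"}
prod_env = {"server_name": "BI-PROD01", "ip_address": "10.0.2.10",
            "sql_patch_level": "SP3", "data_drive": "F:", "ssas_box": "BI-SSAS01"}
print(environment_drift(test_env, prod_env))
```

Here the server names and IP addresses are ignored as documented differences, while the patch level, the drive letter and the missing separate SSAS box in Test are all reported.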

Dev -> Test = Dev -> Prod

Finally, code promotion should be the same regardless of environment. If you find yourself making allowances for a quirk in one environment… well, see my comments above about inconsistent environments. Code promotion should be a boring routine that can be carried out by anyone who can follow simple instructions. In theory, your developers should have no access to production environments, so the promotion from Dev to Test should be treated as a dry run for the promotion to Prod.
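One way to keep promotion identical across environments is a single routine that takes the target as a parameter, with the environment-specific values kept in one place. The sketch below only builds the ordered list of steps as text; the step wording, the server names and the `ETLConfig` database are hypothetical placeholders, and a real routine would invoke whatever deployment tooling you actually use.

```python
# Environment-specific values: the only thing allowed to differ between targets.
ENVIRONMENTS = {
    "test": {"server": "BI-TEST01", "config_db": "ETLConfig"},
    "prod": {"server": "BI-PROD01", "config_db": "ETLConfig"},
}


def promotion_steps(release, target):
    """Return the same ordered promotion steps for any target environment."""
    env = ENVIRONMENTS[target]
    return [
        f"back up {env['config_db']} on {env['server']}",
        f"deploy release {release} to {env['server']}",
        f"update {env['config_db']} on {env['server']} for release {release}",
        f"run smoke tests for release {release} on {env['server']}",
    ]


# The Test promotion is a dry run of the Prod one: same steps, different server.
for step in promotion_steps("1.4", "test"):
    print(step)
```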

Surely this is a bit too much?

Yes, yes it is. If we all coded perfectly and never made a mistake when deploying, it would be totally unnecessary :)

Agile BI, or why Big Bang warehouses should (sometimes) be shot

Recently I wrapped up a project that could have been a money pit with uncertain results, but which was steered into something that delivered results quickly and made the business very happy. To give a very high-level overview, the client in question had a collection of mainframe-delivered text files which fed Excel reports via Access via SQL2000. The project was being pushed by IT (usually a Bad Thing) primarily to solve a business problem (which turned it back into a Good Thing). The root business problem was mostly that the Access databases kept falling over for various reasons, so the analysts couldn’t do their work.

As the requirements were scoped out, it became apparent that the resultant Data Warehouse was going to be a very large and complex thing that would take a long time to develop. We did a quick pilot to demonstrate that the ETL could deliver results in a structured and controlled manner into the Warehouse, which delivered about 5% of the files through from end to end. IT were pleased as it all went well. Their first thought was then to deliver the remaining 95% of data.

Do you need to put all the Data in the Warehouse?

I asked the client to pause for thought at this point. In delivering the 5% of data we had proved to IT that the DW process was feasible – but had given nothing to the business. We had solved an IT problem, not a business one. In order to get the business excited we needed to deliver something to them. The IT side of the client saw the wisdom of this – if the business had something tangible and helpful, it would help sell the project. Also, if they hated it, we could find out we were on the wrong track far more cheaply than after delivering a monster DW that nobody wanted!

So what did we do next? We focused on satisfying just one business problem – in this case the data needs of a single department, and set about building the parts of the DW required to meet those needs. At the end of this we could throw them a shiny new cube and see if it stuck. This way we would:

  • Deliver initial results at minimum cost
  • Establish if we were on the right track from an early stage
  • Get business visibility of the DW faster

If all went well, we could focus on delivering the next 5%, then the next… and probably find that there was a big chunk that wasn’t used anyway, giving more savings – or identifying areas for new insight. Much better than delivering a big bang DW that may turn out to be a damp squib!

So was that Agile BI then?

It was a form of it, yes! The idea of Agile Development is to deliver results quickly, allowing the business to see what they are getting sooner and tune the results. Now, this isn’t all good – it can lead to massive scope creep, drift of purpose and even failure to deliver anything at all. So there is a need to be very cautious and apply this methodology only where appropriate. Cubes for analysis and reports can benefit from a rapid development cycle carried out in close conjunction with the business. A good example of where not to apply it is the ETL – it needs rigour and structure to be delivered effectively, and in practice it is something business users neither understand nor care about.

Sometimes with BI it is the right approach at greenfield sites because of one significant issue: clients can have a poor grasp of the capabilities of BI solutions and need leading into the wonderful world of BI. They can often be skeptical about the promises made by consultants, and by using an Agile approach on the first build you can get them into the BI mode of thinking rapidly. The client sees what they are getting and can get excited, whereas with standard waterfall delivery they can wait months to see results after a long analysis / design phase – and, if the consultants have misjudged the business needs, kill the DW in its infancy.

But how can we deliver BI without the Warehouse?

I’m now on another project which has become lost in the design phase and the client is wondering what is going to come out of it all. We shifted approach to an Agile method where instead of focusing on getting the solution letter perfect in Design (Waterfall style) we are going to deliver a functioning SSAS Cube based on a sample of real data and build the design off of that. This way the client gets to see the light at the end of the tunnel and we know we are building the right solution even before the design is finalised.

The key in this case is to establish what the end deliverable is as quickly as possible and demonstrate that we, as consultants, have identified the right target in order to give the business confidence. The DW can follow to support that deliverable.

So when should a project go agile?

There’s no hard and fast rule, but I would suggest the following indicators may guide you towards an initial Agile build:

  • Requirements are sparse / undefined – and the goalposts can move
  • It is the first BI project and BI capabilities are poorly understood
  • Business users are highly available
  • You need to give confidence you can deliver the right thing


Checklists

I am currently going through the mind-bogglingly dull QA process I apply to everything that I, or anyone on my team, develops in SSIS. There’s an established approach to this: I have drawn up a checklist of common things that must be in place for each type of package to run effectively. Some of the items checked are set by default in the templates, but I insist on checking them anyway, because “things happen” in development. Most of the checks are simple review points to make sure certain portions of the template (e.g. variables) have been properly updated. A few of them are there so another developer can quickly review what’s been done and apply a common-sense appraisal of the work.

The process of going through the QA checklist takes about 5 minutes per package at most. However, that process consistently reveals two important facts:

  1. Other people don’t do their work to the standards I set
  2. I don’t do my work to the standards I set

Just because you’re a professional who has been doing a job for years doesn’t mean you won’t forget things. Someone will distract you with a question and you lose track of your progress – and a mistake is made. This great article on a checklist for doctors working in Intensive Care units really emphasises the value of not just having a process, but having a simple means of enforcing it. Like a checklist to make sure you did everything you should have done :)
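The same idea carries over to the SSIS QA checklist: the value is in applying every item, every time, mechanically. The sketch below assumes each package’s relevant properties have already been extracted into a dictionary (reading them out of a real .dtsx file is not shown), and the checklist items themselves are hypothetical examples, not the real list described above.

```python
# Hypothetical checklist: each item maps to a check over a package's properties.
CHECKLIST = {
    "package renamed from template":
        lambda pkg: pkg.get("name") != "TemplatePackage",
    "logging enabled":
        lambda pkg: pkg.get("logging_enabled") is True,
    "no template placeholder variables left":
        lambda pkg: not any(str(v).startswith("TODO")
                            for v in pkg.get("variables", {}).values()),
}


def review_package(pkg, checklist=CHECKLIST):
    """Return the checklist items this package fails, in checklist order."""
    return [item for item, check in checklist.items() if not check(pkg)]


package = {
    "name": "LoadCustomers",
    "logging_enabled": False,
    "variables": {"SourceTable": "dbo.Customer", "TargetTable": "TODO"},
}
print(review_package(package))
```

An automated pass like this only covers the mechanical items; the common-sense appraisal by another developer still has to be done by a person.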

Data Quality – the Business Problem

Steve Bennett over at Oz Analytics has just done a couple of good posts on data quality from the perspective of how to “sell” the issue of poor data quality to the business and make them realise it’s not just a technical problem, but can also cost them money. The relevant posts are here and here.

A flipside to his approach of quantifying the cost to the business is that we data monkeys, when faced with a data quality problem, should stop to consider whether it’s actually worth solving. It may grate on us to have crappy data in our lovely warehouse, but if the cost of solving it exceeds the benefit realised, we may sometimes just have to let it be.*

* Of course this thought makes me feel a little dirty, and I think I need to take a shower in some nice strongly typed data with enforced referential integrity :)
