Infrastructure pains

One of the headaches that has plagued various projects I have been part of is infrastructure. From the provisioning of environments, to dodgy release practices, to environments that were deemed “unnecessary”, sometimes the problem has not been the code, but where it runs and how it gets into the wild.

Dev -> Test -> Prod

At the very least, any software release should go through these basic code promotion steps. When you’re doing a BI project, just because it’s a bit odd in software terms doesn’t mean you can skip the standard code promotion activities. Development should be done in the Dev environment, where any damage done by bad code is minimal. Once it “works”, it needs to be promoted into a Test environment to ensure that it actually does work, and that the result isn’t just a fluke of conveniently constructed test data. Once it’s been tested, it can then go into Production.
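The Dev -> Test -> Prod ordering is simple enough to enforce in tooling rather than by memory. The following is a minimal Python sketch. It assumes a simple in-memory record of which environments each release has reached. `promote`, `PromotionError` and the release names are hypothetical, and a real setup would attach this check to whatever deployment tooling is in place. The sketch only checks deployment order. It does not check that the Test run actually passed.

```python
from __future__ import annotations

# Hypothetical sketch: the environment names, promote() helper and history
# structure are illustrative assumptions, not a real deployment tool's API.
PROMOTION_PATH = ["Dev", "Test", "Prod"]


class PromotionError(Exception):
    """Raised when a release tries to skip a step in the promotion path."""


def promote(release: str, history: dict[str, list[str]], target: str) -> None:
    """Record that `release` reached `target`, enforcing Dev -> Test -> Prod."""
    if target not in PROMOTION_PATH:
        raise PromotionError(f"Unknown environment: {target}")
    deployed = history.get(release, [])
    # Every environment earlier in the path must already have this release.
    earlier = PROMOTION_PATH[: PROMOTION_PATH.index(target)]
    missing = [env for env in earlier if env not in deployed]
    if missing:
        raise PromotionError(f"{release} cannot go to {target}: not yet in {missing}")
    # Only record the promotion once the ordering check has passed.
    if target not in deployed:
        history[release] = deployed + [target]


releases = {}
promote("release-1", releases, "Dev")
promote("release-1", releases, "Test")
promote("release-1", releases, "Prod")  # allowed: Dev and Test came first

try:
    promote("release-2", releases, "Prod")  # skips Dev and Test
except PromotionError as err:
    print(err)  # release-2 cannot go to Prod: not yet in ['Dev', 'Test']
```

A failed promotion leaves the history untouched, so a rejected release has to start again from Dev.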

This means on a project you need these three environments up and running from the outset. It can sometimes be a hard sell to smaller, less experienced IT departments who haven’t yet felt the pain of an overly keen developer trashing production infrastructure. It does increase cost, but prevention is far better than cure.

As far as how the environments look, Dev can be whatever you like – the databases can be a mess, the code can be spaghetti – who cares? This is the developers’ playground and they can do what they like in here. However, Test and Prod should be exactly the same. That way you can catch the “but it worked in Development” problems in Test, before they drag down production.

Dev = Test = Prod

Now, the next important thing is to ensure each of these environments is physically the same. All the software should be the same and patched to the same version, with the same network cards and drive mappings. If you have a separate database and SSAS box in production, don’t just use one machine in Dev and Test because “in theory” it’s the same.

This – like the code promotion cycle above – is about prevention being better than cure. I’ve wasted many hours of my development life debugging issues that eventually turned out to be due to inconsistent patching, drive letters not being the same in different environments and so on. One of the problems with inconsistent environments is that after a while you accommodate them without noticing. Then a new developer comes on board and blows things up, because you’ve become so accustomed to the workarounds that you’ve forgotten they’re there.

The key here is to have a good infrastructure build guide that explains how each environment is constructed, so the Infrastructure team have no excuse for building inconsistent environments. There will of course be some differences, such as IP addresses and server names, but these should be documented in the guide and have a legitimate reason to exist.
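If the build guide lists the legitimate differences, checking for everything else can be automated. The following is a rough Python sketch. It assumes each server's configuration has already been captured into a flat dictionary; how you capture it is out of scope here. The keys, values and function name are all hypothetical.

```python
from __future__ import annotations

# Hypothetical sketch: manifest keys and values are illustrative assumptions;
# real manifests would be captured from each server.
DOCUMENTED_DIFFERENCES = frozenset({"ip_address", "server_name"})


def undocumented_differences(
    test_env: dict[str, str],
    prod_env: dict[str, str],
    allowed: frozenset[str] = DOCUMENTED_DIFFERENCES,
) -> dict[str, tuple]:
    """Return settings that differ between Test and Prod but aren't in the build guide."""
    keys = sorted(set(test_env) | set(prod_env))
    return {
        # A setting missing from one side shows up as None on that side.
        key: (test_env.get(key), prod_env.get(key))
        for key in keys
        if key not in allowed and test_env.get(key) != prod_env.get(key)
    }


test_manifest = {
    "server_name": "BI-TEST01",
    "ip_address": "10.0.1.15",
    "sql_patch_level": "SP2",
    "data_drive": "E:",
}
prod_manifest = {
    "server_name": "BI-PROD01",
    "ip_address": "10.0.2.15",
    "sql_patch_level": "SP1",
    "data_drive": "D:",
}

print(undocumented_differences(test_manifest, prod_manifest))
# {'data_drive': ('E:', 'D:'), 'sql_patch_level': ('SP2', 'SP1')}
```

The server name and IP address differences are ignored because the guide documents them. The patch level and drive mapping differences are exactly the kind of thing that causes the debugging sessions described above.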

Dev -> Test = Dev -> Prod

Finally, code promotion should be the same regardless of environment. If you find yourself making allowances for a quirk in one environment… well, see my comments above about inconsistent environments. Code promotion should be a boring routine that can be done by anyone who can follow simple instructions. It has to be, because in theory your developers should have no access to production, and the promotion from Dev to Test should be treated as a dry run for the promotion to Prod.

Surely this is a bit too much?

Yes, yes it is. If we all coded perfectly and never made a mistake when deploying, it would be totally unnecessary :)

Agile BI, or why Big Bang warehouses should (sometimes) be shot

Recently I wrapped up a project that could have been a money pit with uncertain results, but which was steered into something that delivered results quickly and made the business very happy. To give a very high-level overview, the client in question had a collection of mainframe-delivered text files which fed Excel reports via Access via SQL Server 2000. The project was being pushed by IT (usually a Bad Thing), but primarily to solve a business problem (which turned it back into a Good Thing). The root business problem was mostly that the Access databases kept falling over for various reasons, so the analysts couldn’t do their work.

As the requirements were scoped out, it became apparent that the resulting Data Warehouse was going to be a very large and complex thing that would take a long time to develop. We did a quick pilot, taking about 5% of the files through from end to end, to demonstrate that the ETL could deliver results into the Warehouse in a structured and controlled manner. It all went well and IT were pleased. Their first thought was then to deliver the remaining 95% of the data.

Do you need to put all the Data in the Warehouse?

I asked the client to pause for thought at this point. In delivering the 5% of data we had proved to IT that the DW process was feasible, but had given nothing to the business. We had solved an IT problem, not a business one. In order to get the business excited we needed to deliver something to them. The IT side of the client saw the wisdom of this: if the business had something tangible and helpful, it would help sell the project. Also, if they hated it, we would find out we were on the wrong track far more cheaply than after delivering a monster DW that nobody wanted!

So what did we do next? We focused on solving just one business problem – in this case the data needs of a single department – and set about building the parts of the DW required to meet those needs. At the end of this we could throw them a shiny new cube and see if it stuck. This way we would:

  • Deliver initial results at minimum cost
  • Establish if we were on the right track from an early stage
  • Get business visibility of the DW faster

If all went well, we could focus on delivering the next 5%, then the next… and probably find that there was a big chunk that wasn’t used anyway, giving further savings, or identify areas for new insight. Much better than delivering the big bang DW that may actually be a damp squib!

So was that Agile BI then?

It was a form of it, yes! The idea of Agile Development is to deliver results quickly, allowing the business to see what they are getting sooner and tune the results. Now, this isn’t all good – it can lead to massive scope creep, drift of purpose and even failure to deliver anything at all. So there is a need to be very cautious and apply this methodology only where appropriate. Analysis cubes and reports can benefit from a rapid development cycle carried out in close conjunction with the business. A good example of where not to apply it is the ETL: it needs rigour and structure to be delivered effectively, and in practice it’s something business users neither understand nor care about.

With BI, it is sometimes the right approach at greenfield sites because of one significant issue: clients can have a poor grasp of what BI solutions are capable of and need leading into the wonderful world of BI. They can often be skeptical about the promises made by consultants, and by using an Agile approach on the first build you can get them into the BI mode of thinking rapidly. The client sees what they are getting and can get excited. With standard waterfall delivery, by contrast, they can wait months to see results after a long analysis and design phase. If the consultants have misjudged the business needs, the client may then kill the DW in its infancy.

But how can we deliver BI without the Warehouse?

I’m now on another project which has become lost in the design phase, and the client is wondering what is going to come out of it all. We have shifted to an Agile approach. Instead of focusing on getting the solution letter-perfect in Design (Waterfall style), we are going to deliver a functioning SSAS cube based on a sample of real data and build the design from that. This way the client gets to see the light at the end of the tunnel, and we know we are building the right solution even before the design is finalised.

The key in this case is to establish what the end deliverable is as quickly as possible and demonstrate that we, as consultants, have identified the right target in order to give the business confidence. The DW can follow to support that deliverable.

So when should a project go agile?

There’s no hard and fast rule, but I would suggest the following indicators may guide you towards an initial Agile build:

  • Requirements are sparse / undefined – and the goalposts can move
  • It is the first BI project and BI capabilities are poorly understood
  • Business users are highly available
  • You need to give confidence you can deliver the right thing

In case you want to know more, here are a few additional articles on Agile BI that I have found:

Checklists

I am currently working through the mind-bogglingly dull QA process I apply to everything that I, or anyone on my team, develops in SSIS. There’s an established approach to this: I have drawn up a checklist of common things that must be in place for each type of package to run effectively. Some of the items checked are set by default in the templates, but I insist on checking them anyway, because “things happen” in development. Most of the checks are simple review points to make sure certain portions of the template (e.g. variables) have been properly updated. A few of them are there so another developer can quickly review what’s been done and apply a common sense appraisal of the work.
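The point of a checklist is that every item gets an explicit tick, even the ones the template is supposed to handle. The following small Python sketch illustrates that idea. It assumes review results are recorded per package. The item names and the `PackageReview` class are hypothetical examples, not a real SSIS feature. Anything not explicitly marked as passed is reported as outstanding, which mirrors checking template defaults anyway.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical checklist items for illustration; a real list would come from
# your own SSIS standards and package templates.
CHECKLIST = [
    "Template variables updated for this package",
    "Logging configured",
    "Connection strings come from configuration, not hard-coded",
    "Reviewed by a second developer",
]


@dataclass
class PackageReview:
    """QA record for one package: checklist item -> True if checked and passed."""

    package: str
    results: dict[str, bool] = field(default_factory=dict)

    def outstanding(self, checklist: list[str] = CHECKLIST) -> list[str]:
        """Items not explicitly ticked off; template defaults don't count."""
        return [item for item in checklist if self.results.get(item) is not True]


review = PackageReview(
    "LoadCustomers.dtsx",
    {
        "Template variables updated for this package": True,
        "Logging configured": True,
        "Reviewed by a second developer": False,
    },
)
for item in review.outstanding():
    print("OUTSTANDING:", item)
# OUTSTANDING: Connection strings come from configuration, not hard-coded
# OUTSTANDING: Reviewed by a second developer
```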

Going through the QA checklist takes about five minutes per package at most. However, that process consistently reveals two important facts:

  1. Other people don’t do their work to the standards I set
  2. I don’t do my work to the standards I set

Just because you’re a professional who has been doing a job for years doesn’t mean you won’t forget things. Someone will distract you with a question and you lose track of your progress – and a mistake is made. This great article on a checklist for doctors working in Intensive Care units really emphasises the value of not just having a process, but having a simple means of enforcing it. Like a checklist to make sure you did everything you should have done :)

Data Quality – the Business Problem

Steve Bennett over at Oz Analytics has just written a couple of good posts on data quality. They look at how to “sell” the issue of poor data quality to the business and make them realise it’s not just a technical problem, but one that can also cost them money. The relevant posts are here and here.

The flipside of his approach of quantifying the cost to the business is that we data monkeys, when faced with a data quality problem, should stop and consider whether it’s actually worth solving. It may grate to have crappy data in our lovely warehouse, but if the cost of solving it exceeds the benefit realised, we may sometimes just have to let it be.*

* Of course this thought makes me feel a little dirty, and I think I need to take a shower in some nice strongly typed data with enforced referential integrity :)

How is what we are building helping you make better decisions?

This is a follow-up to a good article by Timo Elliott on making information understandable. It underlines a good point: good information is useless unless it is presented in a way the users can understand. I always try to build reports with one question in mind: “How do I make it so the most number-illiterate recipient can grasp what this report is showing?” Working for a bunch of lawyers in one job really honed this mode of thinking!

Why Bus Timetables are a great example of BI

For a transport analogy (and a general moan), Sydney has terrible bus information. At most stops there is a sheet of A4 (if you’re lucky) listing bus numbers and times in time order. No maps. No explanation of which bus number goes where. Unless you know in advance which bus you need, this data is useless. It’s no surprise I never bother getting the bus. I can’t just hop on one, because I have no idea when a useful one is coming, and if one does come, whether its route is any good for me. In London, by contrast, with its simple route maps and timetables ordered by route and time, I hopped on the bus regularly. If I passed a bus stop in a strange part of town, I could rapidly figure out if it was a good option.

What question makes the difference between a Report and Information?

A common failure point for BI is that too much emphasis has been put on the tools and on getting the data in shape. You end up with a fantastic warehouse and a toolset to read it, but users who aren’t “educated” enough to use it, because developers tend to think of BI systems as just that: systems. Training is usually functional (“How do I access Report B? How do I rerun the data?”) rather than practical (“How does Report B give me the ability to make better decisions?”). There is a gap in expectations: the developers assume management know what they are asking for, and managers assume the reports they get will answer their questions. No-one in the equation is brave enough to pipe up and ask: “Do you know what you really want? Because I don’t think this will help.”

A good consultant should continually be asking this question of their clients: “How is what we are building helping you make better decisions?”

Why you should check your consultants

I’m currently working on a near-real-time reporting system for a big client, and one of the components of the work is to fix the ETL system put in place by the consultancy that was in previously. So what, you might say; they’ve been burned by a tin-pot little outfit that overreached. Well, er, no. The consultancy who built it is a huge global player, and should have the brightest and best working for them. They seem to have made two critical mistakes:

  1. Applied a generic methodology without customising it to the tool or situation
  2. Outsourced development offshore

Why can a general approach fail?

It’s true that ETL has general principles that should be followed in most processes, but each ETL tool has quirks that need to be allowed for, and performance features that should be embraced. Similarly, each project will have its own unique requirements, which means a generic approach won’t be a 100% fit for the project you are working on.

For example, in the case I’m dealing with now, they have a control framework designed to do all sorts of logging and control. About 50% of it simply doesn’t work, though, because of the way the project’s data has to be handled. Similarly, they have control tables that have been stuffed with chunks of queries simply to make the existing framework function, rather than reworking the framework to suit the need.

A general approach will fail when it is applied too rigidly by people who don’t understand the reasons for the approach in the first place. This is why lawyers distinguish between the spirit and the letter of the law. With any methodology you need flexibility in how you handle each situation. There isn’t a universal ETL method used by everyone because no single method could cope with the variety of demands and constraints on such systems.

Offshoring saves us money so is good for the business, surely?

Offshoring works in an IT project in a very narrow range of circumstances, and BI projects very rarely meet those circumstances. When you offshore, you send your specification to a third party. Because of time differences, cultural differences and language issues, that party cannot communicate with you constantly and get an understanding of what you are asking for. As a result your specifications will be met exactly, with no allowance for what you actually wanted. Any experienced IT developer will admit that specifications often overlap with what the business needs, but rarely match it exactly. This will be fatal for a BI project because, as seasoned BI professionals will warn, BI projects are first and foremost Business projects, not IT projects.

Combine that with the risk of insufficiently skilled developers sticking rigidly to a general approach that doesn’t really make any sense here, and you have what we are dealing with: a dysfunctional system that may simply need to be thrown away.

So how do I protect myself against bad consultants?

This is probably a series of articles in itself, but I would give the following quick suggestions around development:

  • Have development work done on site as much as possible. If the words outsource or offshore are mentioned, run away.
  • Ensure that lead developers are experienced – certified if possible – in the technology you are using. Being an SSIS ace doesn’t mean you will be able to do great work in Data Integrator.
  • If possible get a third party to review the work at an early stage

If anyone else has had similar experiences and ideas on how to mitigate the risks of engaging poor consultants, comment away!

Testing Business Rules

Everybody hates testing. I hate testing. You hate testing. Bob down the corridor, well – actually he quite likes it, but then he files his pencils in order of length in a special drawer, so he’s probably not right in the head.

However, get it right and you save yourself a lot of grief. I have in mind one client I worked with who didn’t so much have a test plan as a “Well, if the numbers match the old system when we’ve finished, it’ll be right and we’ll sign off”. Not in itself an invalid test, but a lousy way of testing that the details work as expected. As a result, the testing phase of the project dragged on for months and people on both sides blamed each other for the slow progress. The relationship went to pot, driven heavily by inadequate test plans.

So what was so bad? Well, consider a test case such as “numbers of widgets moved from location A to location B in March”. Whenever anything didn’t match, the only way to find out why was to dig through all the systems, applying all the business rules along the way. Eventually we would find it was down to “ah, we didn’t apply business rule A in the BI layer”. Or the client had forgotten they didn’t apply business rule B in the old system but did in the new one. The amount of effort put into digging a) frustrated the consultants working on the project and b) made the clients lose patience, as every failed test case took ages to resolve.

Compounding all this, we would often resolve one problem on a test case, only to reveal another. This further dented the client’s confidence (because obviously when it didn’t work it was always the consultancy’s fault; some clients have an odd ability to completely gloss over their own contribution to a project’s problems).

What the tests should have been doing was checking that each specific business rule was being applied to the data correctly. Testing rule by rule would have allowed us to isolate each rule’s function. So test cases should have looked like “If a Widget is of type A, but not in class C, then it should be moved to location B”. This way we could have picked up whether rules were being applied correctly far earlier in the process, and with much less effort.
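A rule-level test case like this is small and fast to run when automated. The following is a minimal pytest-style sketch in Python. `target_location` is a hypothetical stand-in for wherever the rule is really implemented; in practice that might be SSIS or SQL logic tested against a small fixture table. The behaviour for widgets that don't match the rule is an assumption.

```python
# Hypothetical rule and function name, based on the widget test case in the text.
# Assumption: widgets that don't match the rule stay where they are.
def target_location(widget_type: str, widget_class: str, current_location: str) -> str:
    """Apply the rule: type A widgets not in class C are moved to location B."""
    if widget_type == "A" and widget_class != "C":
        return "B"
    return current_location


# One small test per branch of the rule (pytest-style), so a failure points
# straight at the rule rather than at a month-end total.
def test_type_a_not_in_class_c_moves_to_b():
    assert target_location("A", "D", "X") == "B"


def test_type_a_in_class_c_stays_put():
    assert target_location("A", "C", "X") == "X"


def test_other_types_stay_put():
    assert target_location("Z", "D", "X") == "X"
```

Running `pytest` on the file collects the three `test_` functions. When one fails, you know which rule and which branch is wrong without digging through every system.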

As an addendum to this, reconciling against old reports should only be done once it’s understood how the old reports work. We had significant problems because the old reports we were trying to align against weren’t applying the same business rules as the new system.

So, what is BI Monkey’s concluding thought? Test the details. Test them in isolation. Make sure that the details work long before you try to test the whole system, and you’ll have a much better chance of success.