October Sydney training roundup – MS BI, Cloud, Analytics

The end of the year is closing in fast, but there are still plenty of chances to learn from specialist providers Agile BI, Presciient and, of course, me!

Topics cover the full spread of DW, BI and Analytics, so there’s something for every role in the data-focused organisation.

Build your Data Warehouse in SQL Server & SSIS with the BI Monkey

Nov 24/25 – Are you about to build your Data Warehouse with Microsoft tools and want to do it right first time?

This course is designed to help a novice understand what is involved in building a Data Warehouse, both from a technical architecture and a project delivery perspective. It also gives you basic skills in the tools the Microsoft Business Intelligence suite offers for the job.

Get more detail here

Agile BI workshops

Power BI specialist Agile BI brings you product updates on this key new self-service BI technology:

Oct 15 – Power BI workshop – Excel new features for reporting and data analysis – more detail here

Oct 30 – What Every Manager Should Know About Microsoft Cloud, Power BI for Office 365 and SQL Server 2014 – more detail here

Presciient Training

Dr Eugene Dubossarsky shares his deep business and technical expertise across a range of advanced and basic analytics. Full details here, but the key list is:

Dec 9/10 – Predictive analytics and data science for big data

Dec 11/12 – Introduction to R and data visualisation

Dec 16/17 – Data analytics for fraud and anomaly detection, security and forensics

Dec 18/19 – Business analytics and data for beginners

 


July Training roundup

Ok, so there are a lot of things happening in July or later in the year which you should be aware of, led either by me or by people I know who deliver great training across Microsoft BI, SQL Server, R, Data Science, Predictive modelling… and more.

Build your Data Warehouse in SQL Server & SSIS with the BI Monkey

There’s a big advert to the right – this workshop, taking place 28/29 July, will help you get to grips with the core concepts of data warehousing, give you hands-on experience implementing those concepts using the Microsoft SQL Server BI toolset, then round it out by making you aware of all the peripheral details that can enhance your solution and success.

Register Now

AgileBI Workshops on SQL Server 2014 & PowerBI

AgileBI – which includes Iman Eftekhari, Viktor Isakov and myself – are delivering the following workshops:

  • What Every Manager Should Know About Microsoft Cloud, Power BI for Office 365 and SQL Server 2014 – half-day session on July 24th
  • SQL Server 2014 & Power BI Jump Start – Full Day Workshop on July 31st
  • Power BI workshop – Excel new features for reporting and data analysis – lunchtime July 10th

Register Now

Presciient Training on R, Data Science & Predictive Modelling

Training led by the ever-interesting Dr Eugene Dubossarsky in September across Canberra, Sydney & Melbourne – Sydney dates below:

  • Introduction to R and data visualisation – 9/10 Sep
  • Predictive modelling, data science and big data – 11/12 Sep
  • Data analytics for fraud and anomaly detection, security and forensics – 16/17 Sep
  • Advanced R – 18/19 Sep

Register Now


Trick of a tiny database: cut down 1GB of your database every day by simply using this 1 weird old tip

Even if you use an adblocker, I bet you’ve all seen this:

Trick of a tiny database

Well, the good news is if you have a chubby Enterprise Edition SQL Server, you too can get it on a fast weight loss program.

One Weird Old Tip

Okay, okay, I’ll pack in the click-baiting. SQL Server packs a great feature called Data Compression. What this does is compress the data on disk using a low-CPU-intensity algorithm. You can apply it to data tables (and their associated clustered indexes) and to non-clustered indexes.

Now, there’s no such thing as a free lunch, but this isn’t far off paying a dollar for a really, really good sandwich. The cost of applying compression is that you burn CPU decompressing the data on read and compressing it on write. However, reported costs of this on a server are about a 2-5% increase in baseline CPU usage. Most Data Warehouse applications barely tickle their CPU (unless someone has written some shocking code), so this cost is to all intents and purposes negligible.

Now for the taste of the really, really good sandwich. Disk usage savings in my experience are around the 50% mark (+/- 5%). As a practical example of that, one of my clients had a problem with a 680GB database – they needed more space for more data and indexes and were scrabbling for funding before they tipped over the 1TB mark. A quick evaluation showed they could cut their disk consumption to 280GB! No more need for funding scrabbles…!
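If you want to check the likely payoff before committing, SQL Server ships an estimator proc. A minimal sketch, assuming a hypothetical dbo.FactSales table – swap in your own schema and table:

-- Estimate what PAGE compression would save (NULLs = all indexes / all partitions)
EXEC sp_estimate_data_compression_savings
    @schema_name      = 'dbo'
,   @object_name      = 'FactSales'   -- hypothetical table name
,   @index_id         = NULL
,   @partition_number = NULL
,   @data_compression = 'PAGE'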

So, how do I lose this weight?

It’s so easy. (Well, as long as you have Enterprise Edition).

-- Rebuild every partition of the table with the chosen compression (ROW | PAGE | NONE)
ALTER TABLE <table_name>
REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = <option>)

Just run a quick ALTER TABLE and compression is applied. It may take a while to compress the data, but after that it’s just … on. You will most likely never notice a difference, and you don’t have to start writing queries any differently.
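Note that non-clustered indexes are compressed separately via ALTER INDEX. A sketch, again with a hypothetical table name:

-- Rebuild all indexes on the table with PAGE compression
ALTER INDEX ALL ON dbo.FactSales
REBUILD WITH (DATA_COMPRESSION = PAGE)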

I’m not going to run through the subtleties of choosing between ROW and PAGE compression (or COLUMNSTORE – though that’s your only option for columnstore tables in 2014+), but when I’ve run my evaluations on various tables I’ve generally seen PAGE showing a slight advantage.

Will my shadow still be fat?

Hmm. I did promise I’d stop. There is a bonus compression feature worth talking about which applies to all editions (i.e. Standard, BI, Enterprise) from SQL 2008 R2 onwards: backup compression. It’s also available in 2008, but Enterprise only. There are some minor limitations, but in most circumstances you can simply compress your backups as you take them. Which is handy, as it will significantly reduce the amount of archival storage you use, again at a negligible performance cost.
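Turning it on is a one-liner. A sketch, with hypothetical database and file names:

-- Compress the backup as it is taken
BACKUP DATABASE MyWarehouse
TO DISK = 'D:\Backups\MyWarehouse.bak'
WITH COMPRESSION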

Hey, BI Monkey, can you help me with a script?

Why, yes I can – and here it is: Size and compress. This will size every table in your database, recommend a compression option and generate output that looks a little like this:

Compression script output

It was written for SQL2012 so doesn’t take into account Columnstore compressed tables in 2014, and isn’t smart enough to check the edition, but otherwise it’s a good starting point.

Now, go get your database being as fit as it should be!

Before and after compression


Gimme a table, but hold the indexes there buddy!

I’ve just wrapped up a quick bit of work with a client who was just getting moving on their DW journey, and I found some fun things that the rookies had done.

I was also partly inspired to talk about this particular issue by this timely DBA Reactions post: When the developers hear my warning about 27 indexes on a single table

Part of what I was looking at was applying compression to their tables to reduce disk space consumption. So I ran my handy query to size the databases and found an 8GB table with 12GB of indexes! Yoinks – something was clearly astray… it turned out that one table – with about 30 columns – had 8 indexes on it.

Why not have 27 indexes?

So there are a few good reasons:

  • That many indexes take a long time to build / update, slowing load performance (in this case it took about 5 minutes to load the table and a further 29 minutes to build the indexes…)
  • The Query Optimiser is going to struggle to pick the right index if any of them overlap
  • If you are on 2012+ you can just stick the entire table in a columnstore index, massively boost performance and consume less disk space (applying a columnstore on the whole table took a whopping 1 minute to build) – see the sketch below
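A minimal sketch of that whole-table approach, assuming SQL 2014 and a hypothetical fact table (on 2012 you are limited to a read-only non-clustered columnstore instead):

-- Drop the existing non-clustered indexes first, then:
CREATE CLUSTERED COLUMNSTORE INDEX CCI_FactSales
ON dbo.FactSales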

As a broad rule I’d consider the following rules of thumb when looking at a table’s indexes:

  • If I have more than 2 indexes, I’m probably not indexing well, my table design is wrong or my queries need re-evaluating
  • If my indexes are more than 20% of the size of the table, I should probably look at replacing them with a whole table columnstore
  • Are my indexes making use of included columns to increase their coverage? (the query below is a quick way to check)
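For that last check, a sketch using the standard catalog views – swap in your own table name:

-- List each index's key and included columns for a given table
SELECT   i.[name]  AS index_name
,        c.[name]  AS column_name
,        ic.is_included_column
FROM     sys.indexes i
JOIN     sys.index_columns ic
         ON i.[object_id] = ic.[object_id] AND i.index_id = ic.index_id
JOIN     sys.columns c
         ON ic.[object_id] = c.[object_id] AND ic.column_id = c.column_id
WHERE    i.[object_id] = OBJECT_ID('dbo.FactSales')   -- hypothetical table
ORDER BY i.[name], ic.key_ordinal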

If any readers have some good index management tips, please share them in the comments!


Size every table in your database in a flash – Part 2

A looooong time ago I had a post on how to “Count the number of rows in every Table in a Database in no time!”, which did an approximate row count and sizing of every table in a database using DMVs.

Well, since then I have found a far better way to quickly get a row count plus table & index size from the handy stored proc sp_spaceused. It’s worth running DBCC UPDATEUSAGE first, just to ensure the stats it works with are current.
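For a single table that’s as simple as this (sketched against AdventureWorksDW – any table will do):

-- Refresh the row/page counts, then size one table
DBCC UPDATEUSAGE(0)   -- 0 = the current database
EXEC sp_spaceused 'dbo.FactInternetSales'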

I’ve wrapped a little cursor around it so you can – extremely quickly – get these stats for a whole database (sorted by descending data size but that can be easily tweaked in that final select):

DECLARE @TableSize TABLE
(
    [TableName]  nvarchar(255)
,   [RowCount]   int
,   [Reserved]   nvarchar(255)
,   [Data]       nvarchar(255)
,   [Index Size] nvarchar(255)
,   [Unused]     nvarchar(255)
)

DECLARE @CurrentTable nvarchar(255)

-- Walk every table in the database (schema-qualified so non-dbo tables work)
DECLARE TableCursor CURSOR FOR
SELECT QUOTENAME(SCHEMA_NAME([schema_id])) + '.' + QUOTENAME([name]) FROM sys.tables

OPEN TableCursor

FETCH NEXT FROM TableCursor INTO @CurrentTable

WHILE @@FETCH_STATUS = 0
BEGIN

    -- sp_spaceused returns name, rows, reserved, data, index_size, unused
    INSERT INTO @TableSize
    EXEC sp_spaceused @CurrentTable

    FETCH NEXT FROM TableCursor INTO @CurrentTable

END

CLOSE TableCursor
DEALLOCATE TableCursor

-- Strip the ' KB' suffix so we can sort numerically
SELECT * FROM @TableSize
ORDER BY CAST(REPLACE([Data], ' KB', '') AS int) DESC

… and this is what the results look like (Against good old AdventureWorks):

Table Sizing Results


How to do complex aggregations through Window functions in SQL Server

The BI Monkey has a confession: his partner is also somewhat of a data geek – though more buried in doing real analytics – but I have persuaded them of the benefits of using SQL instead of SAS sometimes. Right tool for the job, etc.

Now, I was providing a bit of email support and was reminded of a valuable lesson: a written requirement is not always a very good way of expressing what you want. I spent most of the day bouncing emails advising how to write cursors (yes, I know, sorry) – and when we actually sat in front of a laptop with the data, I found out what we were actually trying to achieve and wrote a 5-line query that was literally thousands of times faster.

Are we talking about Window functions yet?

Sure, let’s get to the point. Actually I have blogged about them before, but a refresher and a different look is always helpful. Window functions allow you to define a “window” of data over which you can perform an operation against a set of rows. Note this is distinct from a GROUP BY, which aggregates the data away. Window functions operate at the row level and leave the row data intact. This enables things like adding a row sequence number within a group, cumulative totals, moving averages and other such shenanigans.
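For contrast, this is what a plain GROUP BY does to the same data – the detail rows are collapsed away (a quick sketch against the AdventureWorksDW table used below):

-- One row per date: the individual order rows are gone
SELECT    OrderDateKey
,         SUM(SalesAmount) AS DailySales
FROM [dbo].[FactInternetSales]
GROUP BY OrderDateKey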

Now the key bit of code is the OVER clause. This allows you to specify a “window” of data over which to perform an operation. The window is defined by a PARTITION BY clause, which specifies which column’s values to use to subdivide the data. Then an ORDER BY clause specifies what order – if any – the data in the partition should be processed in. Finally, a ROWS or RANGE clause can influence the start and end points of the window.

However, that’s a lot of fine words and SQL speaks clearer:

SELECT    OrderDateKey
,         SalesOrderNumber
,         SalesAmount
,         SUM(SalesAmount) OVER (PARTITION BY OrderDateKey ORDER BY SalesOrderNumber ROWS UNBOUNDED PRECEDING) AS CumulativeAggregation
,         SUM(SalesAmount) OVER (PARTITION BY OrderDateKey ORDER BY SalesOrderNumber ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS SumOverWindow
,         COUNT(SalesOrderNumber) OVER (PARTITION BY OrderDateKey ORDER BY SalesOrderNumber ROWS UNBOUNDED PRECEDING) AS RowNumberIncrement
,         COUNT(SalesOrderNumber) OVER (PARTITION BY OrderDateKey ORDER BY SalesOrderNumber ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS RowNumberCountdown
FROM [dbo].[FactInternetSales]
ORDER BY SalesOrderNumber

This will run against the AdventureWorksDW2012 sample database.

So there are four columns I’m calculating across a window of “OrderDateKey”:

  1. CumulativeAggregation – which accumulates values across the window
  2. SumOverWindow – which totals values across the window
  3. RowNumberIncrement – which provides an increasing row count across the window
  4. RowNumberCountdown – which provides a decreasing row count across the window

Here’s the results:

Window Function Results

So to break these down a bit:

CumulativeAggregation

Here we PARTITION BY OrderDateKey, so the window is based on values in the “OrderDateKey” column. In the results above, we get a window for “20050701”, “20050702”, etc.

To get the accumulation to happen in the order we want, we apply an ORDER BY on “SalesOrderNumber”.

Now for the clever bit. The ROWS UNBOUNDED PRECEDING tells the SUM function to evaluate from the start of the window. Hence you get the cumulative sum appearing – each row evaluates the total for all values for SalesAmount from the start of the window to the current row.

SumOverWindow

We follow much of what we did in CumulativeAggregation – same PARTITION, and same ORDER BY. The ORDER BY has no impact on the result but is needed for the statement to be syntactically valid.

This time we extend the ROWS UNBOUNDED PRECEDING by adding AND UNBOUNDED FOLLOWING – this now tells the SUM function to evaluate from the start of the window to the end of the window. So this time each row has the total SalesAmount for the window – useful for calculating things such as percentage in period.
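For example, percentage-in-period falls out in one extra line. A quick sketch – note that with no ORDER BY and no frame specified, the SUM simply covers the whole partition:

-- Each order's share of its day's total sales
SELECT    OrderDateKey
,         SalesOrderNumber
,         SalesAmount
,         SalesAmount / SUM(SalesAmount) OVER (PARTITION BY OrderDateKey) AS PctOfDay
FROM [dbo].[FactInternetSales]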

RowNumberIncrement

This time we switch the SUM to a COUNT. Here our ROWS UNBOUNDED PRECEDING counts the rows from the start of the window to the current row – effectively giving us a row number.

RowNumberCountdown

In our last example we change to a stated set of ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING. So now the COUNT evaluates from the current row to the end of the window – which effectively gives us a countdown to the end row of the window.

Summary

Using Window Functions via the OVER clause we can efficiently perform complex aggregations in place of the awkward cursors, sub-selects and other such workarounds we may have had to lean on in the past. I’ve covered a small proportion of the options available, but this functionality opens up some powerful analytic possibilities.

Also, talking to a user – even if it is the person you are supposed to know best in the world – is vastly more effective if you want to understand their problem and solve it efficiently!


SQL Server Data Warehouse Training in Sydney – July 28

The eagle-eyed among you may have noticed I’ve added a big countdown banner to the right of the page announcing a course, “Build your Data Warehouse in SQL Server & SSIS with the BI Monkey”, running for 2 days from July 28.

The course gives an overview of how to build and manage a Data Warehouse using the SQL Server platform as a base, with a solid mix of Data Warehouse theory and hands-on development to get the concepts locked in. For full details see my new “Training” page, but the key outcomes are:

  • Learn how to get the Architecture right and keep your solution flexible
  • Understand the whole lifecycle of a DW project
  • Get hands on with the tools you need to move data

 

The high level agenda is:

  • The Three Main Data Warehouse architectures – Kimball, Inmon & Data Vault
  • SSIS Basics – Projects, Control Flows and Data Flows
  • Moving Data with SSIS
  • ETL Control Frameworks
  • Database considerations
  • How HDInsight (Hadoop) fits in
  • Code Management with Team Foundation Server
  • Testing
  • Build Automation
  • Deployment
  • Management
  • MDS, DQS and other useful Microsoft Toys

 
Register now – I look forward to seeing you!



Troubleshooting Service Broker Stored Procedures

A quick post on troubleshooting the Stored Procedures that can get called to handle the messages in a Service Broker queue.

The stored procedures are pretty vanilla – see step 4 in this link for an example – they take the XML of the message body and process it within the confines of the queue.
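For reference, a minimal sketch of such an activated proc – the proc and queue names match the log example further down, and the WAITFOR/RECEIVE loop is the standard pattern:

-- Skeleton activated proc: drain the queue one message at a time
CREATE PROCEDURE [dbo].[PayloadReceive]
AS
BEGIN
    DECLARE @handle uniqueidentifier
    DECLARE @body   xml
    DECLARE @type   sysname

    WHILE (1 = 1)
    BEGIN
        -- Wait up to a second for the next message on the queue
        WAITFOR (
            RECEIVE TOP (1)
                @handle = conversation_handle
            ,   @body   = CAST(message_body AS xml)
            ,   @type   = message_type_name
            FROM dbo.InboundQueue
        ), TIMEOUT 1000

        IF @@ROWCOUNT = 0 BREAK

        PRINT 'Some useful info'   -- PRINT output surfaces in the SQL Server error log

        -- ... process @body within the confines of the queue here ...

        IF @type = 'http://schemas.microsoft.com/SQL/ServiceBroker/EndDialog'
            END CONVERSATION @handle
    END
END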

However, the procedures can’t be executed in the query editor, so they only log their output to the error log. To debug, you need to look in the main SQL Server log – found somewhere like “C:\Program Files\Microsoft SQL Server\MSSQL11.SQLSERVER\MSSQL\Log\ERRORLOG” – openable with a text editor of your choosing.
Then hunt for lines that look like this:

2013-11-08 10:56:26.10 spid34s     The activated proc '[dbo].[PayloadReceive]' running on queue 'ServiceBrokerDatabase.dbo.InboundQueue' output the following:  'Some useful info'

To break that down, you get a timestamp, a process id, and then a message that effectively says “stored procedure” on “queue” output “some stuff”. The “some stuff” could be an error message output by SQL, or a PRINT statement in your proc as you try to test.


Build your own SQL 2012 Demo Machine – Part 7 – Performance Tweaks

Step 8: Tweak Windows Server 2008R2

In the final (optional) part we’ll apply some performance tweaks from BlackViper, just to lighten the VM and make it run a bit more smoothly.

TAKE A BACKUP NOW! You can break your VM doing this so snapshot, backup – whatever you prefer – but do it. Now.

Disable the following services:

  • Diagnostic Policy Service
  • Diagnostic Service Host
  • Diagnostic System Host
  • IP Helper
  • Power
  • Print Spooler
  • Problem Reports and Solutions Control Panel Support
  • Remote Procedure Call (RPC) Locator
  • Remote Registry
  • Smart Card
  • Smart Card Removal Policy
  • Windows Font Cache Service
  • Windows Remote Management (WS-Management)

Then restart your machine. You are now done and can start adding content to your demo machine and exploring SQL2012 features!


Build your own SQL 2012 Demo Machine – Part 6 – Office Components

Step 7: Install Office Components

The final part is to install Office – specifically Excel – and the key Add-Ins that you will need for demo purposes.

Important:  We want to install 64-bit Office to leverage PowerPivot in Excel properly. So don’t use the default Setup option, but navigate to the x64 folder on the installation media and run that version of setup.

1. Run the 64-bit Office setup to get the Product Key screen:

Fig 7.1: Product Key

2. Enter your key

3. Click Continue for the Licence Terms dialog. Accept the licence terms

4. Click Continue. This will bring you to the Installation Type screen:

Fig 7.2: Installation Type

We only want certain components so choose “Customize”, which gives the Installation Options dialog:

Fig 7.3: Installation Options

5. Follow the settings shown above to remove the following components:

  • Access
  • InfoPath
  • OneNote
  • Outlook
  • Publisher
  • Word

You can be flexible with the above – for example I’ve left in PowerPoint, as it’s often easier to present slide decks from within the VM during demos – but the lighter the install, the better. Enter User Information if you want.

Important Check: If you see an additional tab in the dialog in Fig 7.3 called “Platform”, you are installing 32-bit – cancel and start again, installing the 64-bit version as described at the start.

6. Once configured, click “Install Now”. It’s time for another progress bar and a meal – this one takes a while:

Fig 7.4: Installation Progress

Eventually you will get the completion screen:

Fig 7.5: Completion

7. Click Close.

8. Restart the VM, run Windows Update, install any updates and restart again for good measure.

9. Install the PowerPivot add-in. Run the x64 Installer:

Fig 7.6: PowerPivot Excel AddIn Installer

10. Click Next for the Licence Terms screen. Accept the licence terms.

11. Click Next, then enter your details on the Registration Information screen.

12. Click Next for the confirmation screen.

13. Click Install. Enjoy the progress bar:

Fig 7.7: PowerPivot Excel AddIn Installer

14. Once complete, click Finish.

15. Install the Data Mining Add-In:

Fig 7.8: Data Mining for Excel AddIn

16. Click Next for the Licence Terms screen. Accept the licence terms.

17. Click Next to get to the Feature Selection:

Fig 7.9: Data Mining for Excel AddIn Feature Selection

In a change from the default, select the Data Mining Client for Excel.

18. Click Next for the Confirmation screen.

19. Click Install at the next screen.

20. Click Finish once completed.

And that’s it! You have a complete demo machine. Optionally you can do some further tweaks for performance or just get stuck in. If you haven’t been religiously taking snapshots, now is a very good time to take one and baseline your VM.

Optionally, you can now move on to Part 7 – Performance Tweaks
