Microsoft SQL Server

SQL Server has a number of aggregate functions, including MAX, COUNT and AVG, but if you want to calculate a median you will be rolling your own, probably with a little help from Google. If your data is appropriately indexed and performance is critical then Paul White has your back: the OFFSET-FETCH method for SQL Server 2012 onwards and a dynamic cursor method for earlier versions, each shown both for a single median and for medians per group. It’s hard to imagine a faster solution, as both methods jump almost directly to the median values.
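To give a flavour of the OFFSET-FETCH idea, here is a minimal sketch for a single median. It is not necessarily Paul White's exact code: the temp table #Measurements, its Val column and the index name cx_Val are hypothetical, and it assumes SQL Server 2012 or later.

```sql
-- Hypothetical table, column and index names, for illustration only.
CREATE TABLE #Measurements (Val INT NOT NULL);
CREATE CLUSTERED INDEX cx_Val ON #Measurements (Val);

INSERT INTO #Measurements (Val)
VALUES (3), (1), (4), (1), (5), (9);

-- Row count decides whether one or two middle rows are needed.
DECLARE @c BIGINT = (SELECT COUNT_BIG(*) FROM #Measurements);

SELECT AVG(1.0 * Val) AS Median          -- 1.0 * avoids integer averaging
FROM (
    SELECT Val
    FROM #Measurements
    ORDER BY Val
    OFFSET (@c - 1) / 2 ROWS             -- skip the rows below the middle
    FETCH NEXT 1 + (1 - @c % 2) ROWS ONLY -- 1 row if odd count, 2 if even
) AS MiddleRows;

DROP TABLE #Measurements;
```

With the six sample rows the two middle values are 3 and 4, so the median is their average, 3.5. The index on Val is what lets the ordered skip-and-fetch avoid a full sort.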

For data without a supporting index it gets a lot more interesting, as the established methods are costly in both memory and time: they all involve sorting the data. Dwain Camps seems to have developed the leading approach and applied it per group. This built on Aaron Bertrand’s work documenting the known methods and running performance tests on them against a single dataset.

I have a few ideas to present that will massively strip back the amount of sorting required to calculate medians without an index, resulting in huge performance gains. Unfortunately these solutions are quite “elaborate”, but I’ve managed to avoid unsupported features, query hints and trace flags (for a change).

Updating statistics on large tables can take a while. A page-compressed warehouse fact table with 19 billion rows, 1000 partitions, a sensible clustered index, and 35 columns, occupying in excess of 1 TB of disk space on my server, takes around 22 hours to update its 33 stats objects using the default sample size. Sure, you can do several tables in parallel, but you can’t update the various stats on a single table in parallel using supported functionality.

So the next step is usually to tinker with the sample size, spread the update across multiple maintenance windows, switch older partitions out to an archive table, or perhaps leverage filtered stats (conditions apply) or incremental stats (from SQL Server 2014). This blog post looks at a more radical approach: attempting to update those 33 stats “in parallel” to get that 22-hour binge down to the length of a reasonable coffee break.
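For reference, two of the conventional options above (a smaller sample and incremental stats) look roughly like this. It is only a sketch: every object name (pf_Demo, ps_Demo, dbo.FactDemo, st_Amount) is hypothetical, and it assumes a scratch database on SQL Server 2014 or later, an edition that supports table partitioning, and a client such as SSMS or sqlcmd that understands the GO batch separator.

```sql
-- Hypothetical objects for illustration; run in a scratch database.
CREATE PARTITION FUNCTION pf_Demo (INT) AS RANGE RIGHT FOR VALUES (100, 200);
CREATE PARTITION SCHEME ps_Demo AS PARTITION pf_Demo ALL TO ([PRIMARY]);
GO

-- A tiny three-partition table standing in for the real fact table.
CREATE TABLE dbo.FactDemo
(
    SaleKey INT NOT NULL,
    Amount  INT NOT NULL
) ON ps_Demo (SaleKey);
GO

INSERT INTO dbo.FactDemo (SaleKey, Amount)
VALUES (1, 10), (150, 20), (250, 30);

-- Incremental statistics keep per-partition information (SQL Server 2014+).
CREATE STATISTICS st_Amount ON dbo.FactDemo (Amount) WITH INCREMENTAL = ON;

-- Option 1: refresh every stats object on the table with an explicit sample.
UPDATE STATISTICS dbo.FactDemo WITH SAMPLE 10 PERCENT;

-- Option 2: refresh one incremental stats object for the newest partition only.
UPDATE STATISTICS dbo.FactDemo (st_Amount) WITH RESAMPLE ON PARTITIONS (3);
```

Both options still run one stats object at a time on a given table, which is the limitation the rest of the post tries to work around.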

Simon Birch on SQL Server

Simon is a SQL Server data warehouse architect in Australia with a passion for performance tuning, problem solving and unicycling.

Median Strip

Updating Statistics on Steroids