Non-indexed temporary tables in warehouse stored procs

The definitive work on this by Paul White explains in tremendous detail some surprising facts about the caching of temporary tables and their statistics.

I’m going to summarise some of these points and outline some general recommendations I like my team to follow.  This goes a little beyond Paul’s advice, possibly because he thought this corollary too obvious to mention, or because he wanted to retain applicability to OLTP workloads with a high volume of concurrent transactions.
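To make the caching mechanics concrete, here is a minimal sketch (hypothetical procedure, table and column names) of the distinction Paul describes: a temp table whose structure is declared entirely at creation time is eligible for caching, whereas DDL issued after creation, such as adding an index, disables caching for that procedure.

```sql
CREATE PROCEDURE dbo.CacheFriendlyDemo  -- hypothetical name, for illustration
AS
BEGIN
    -- All structure declared at creation time: this temp table (and the
    -- statistics built on it) can be cached and reused across executions.
    CREATE TABLE #work
    (
        id  INT NOT NULL PRIMARY KEY,   -- inline, unnamed constraint
        val INT NULL
    );

    -- By contrast, DDL issued after creation disables caching of #work
    -- for this procedure, e.g.:
    -- CREATE INDEX ix_val ON #work (val);
END;
```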



XML datatype instance has too many levels of nested nodes. Maximum allowed depth is 128 levels. Gah!

Yeah, that’s pretty annoying! You’re in a rush to troubleshoot the performance of a stored proc that’s gone rogue, and the tool you are using (something like the SQL statement pictured above) is crashing with this XML error when it casts the output of sys.dm_exec_text_query_plan. You don’t have time to fix the tool properly, so you hack it apart, finally get the query plan displayed graphically, address the issue, and hastily move on to the next burning issue.
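For context, the failing tool is typically some variant of the query below (a sketch, not any particular tool’s actual statement). The error is raised by the CAST to XML, not by the DMV itself, so returning the plan as plain NVARCHAR(MAX) and saving it to a .sqlplan file sidesteps the 128-level limit.

```sql
-- Fails with the 128-level error when a plan is deeply nested:
SELECT CAST(tqp.query_plan AS XML) AS plan_xml
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_text_query_plan
    (qs.plan_handle, qs.statement_start_offset, qs.statement_end_offset) AS tqp;

-- Workaround: skip the CAST; save the text to a .sqlplan file
-- and open it graphically in SSMS.
SELECT tqp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_text_query_plan
    (qs.plan_handle, qs.statement_start_offset, qs.statement_end_offset) AS tqp;
```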


Median Strip

SQL Server has a number of aggregate functions, including MAX, COUNT and AVG, but if you want to calculate a median you will be rolling your own, probably with a little help from Google. If your data is appropriately indexed and performance is critical, Paul White has your back: the OFFSET-FETCH method for SQL Server 2012 onwards and a dynamic cursor method for earlier versions, each calculated both singly and per group. It’s hard to imagine a faster solution, as both pretty much jump directly to the median values.
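As a rough sketch of the OFFSET-FETCH idea (hypothetical table dbo.Sales with a numeric Amount column; Paul’s indexed versions are more refined than this):

```sql
DECLARE @c BIGINT = (SELECT COUNT(*) FROM dbo.Sales);

SELECT AVG(1.0 * Amount) AS Median
FROM
(
    SELECT Amount
    FROM dbo.Sales
    ORDER BY Amount
    OFFSET (@c - 1) / 2 ROWS           -- jump straight to the middle
    FETCH NEXT 2 - @c % 2 ROWS ONLY    -- 1 row if @c is odd, 2 if even
) AS middle_rows;
```

With a supporting index on Amount, the ORDER BY is satisfied by the index and the OFFSET seeks close to the midpoint rather than sorting the whole set.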

For data without a supporting index it gets a lot more interesting, as the established methods are quite memory- and time-consuming: they all involve sorting the data. Dwain Camps seems to have developed the leading approach and applied it per group, building on Aaron Bertrand’s work documenting and performance-testing the alternatives across a single dataset.

I have a few ideas to present that will massively strip back the amount of sorting required to calculate medians without an index, resulting in huge performance gains. Unfortunately these solutions are quite “elaborate”, but I’ve managed to avoid unsupported features, query hints and trace flags (for a change).


Updating Statistics on Steroids

Updating statistics on large tables can take a while. A page-compressed warehouse fact table with 19 billion rows, 1000 partitions, a sensible clustered index, and 35 columns occupying in excess of 1 TB of disk space on my server takes around 22 hours to update its 33 stats objects using the default sample size. Sure, you can do several tables in parallel, but you can’t update the various stats on a single table in parallel using supported functionality.

So the next step is usually to tinker with sample size, spread the update across multiple maintenance windows, switch older partitions out to an archive table, or perhaps leverage filtered stats (conditions apply) or incremental stats (from SQL 2014). This blog post looks at a more radical approach – attempting to update those 33 stats “in parallel” to get that 22-hour binge down to the length of a reasonable coffee break.
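One way to picture the gap: supported syntax updates statistics one object at a time on a single session, so getting parallelism means generating a command per stats object and dispatching each to its own session. A sketch, with a hypothetical table name:

```sql
-- Serial, supported form: one statistic per command, each waiting on the last.
UPDATE STATISTICS dbo.FactSales (Stat_SomeColumn) WITH FULLSCAN;  -- hypothetical names

-- Generate one command per stats object on the table, ready to be handed to
-- separate sessions (e.g. Agent jobs) instead of being run in sequence:
SELECT N'UPDATE STATISTICS '
     + QUOTENAME(OBJECT_SCHEMA_NAME(s.object_id)) + N'.'
     + QUOTENAME(OBJECT_NAME(s.object_id))
     + N' (' + QUOTENAME(s.name) + N');' AS update_command
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'dbo.FactSales');
```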
