Non-indexed temporary tables in warehouse stored procs

The definitive work on this by Paul White explains in tremendous detail some surprising facts about the caching of temporary tables and their statistics.

I’m going to summarise some of these points and outline some general recommendations I like my team to follow.  This goes a little beyond Paul’s advice, possibly because he thought this corollary too obvious to mention, or because he wanted to retain applicability to OLTP workloads with a high volume of concurrent transactions.
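To make the caching mechanics concrete, here is a minimal sketch (hypothetical procedure, table and column names) of the distinction Paul describes: a temp table whose structure is declared entirely at creation time is eligible for caching, whereas DDL issued after creation, such as adding an index, disables caching for that procedure.

```sql
CREATE PROCEDURE dbo.CacheFriendlyDemo  -- hypothetical name, for illustration
AS
BEGIN
    -- All structure declared at creation time: this temp table (and the
    -- statistics built on it) can be cached and reused across executions.
    CREATE TABLE #work
    (
        id  INT NOT NULL PRIMARY KEY,   -- inline, unnamed constraint
        val INT NULL
    );

    -- By contrast, DDL issued after creation disables caching of #work
    -- for this procedure, e.g.:
    -- CREATE INDEX ix_val ON #work (val);
END;
```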



XML datatype instance has too many levels of nested nodes. Maximum allowed depth is 128 levels. Gah!

Yeah, that’s pretty annoying! You’re in a rush to troubleshoot the performance of a stored proc that’s gone rogue, and the tool you are using (something like the SQL statement pictured above) is crashing with this XML error when it casts the output of sys.dm_exec_text_query_plan. You don’t have time to fix the tool properly, so you hack it apart, finally get the query plan displayed graphically, address the issue, and hastily move on to the next burning issue.
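For context, the failing tool is typically some variant of the query below (a sketch, not any particular tool’s actual statement). The error is raised by the CAST to XML, not by the DMV itself, so returning the plan as plain NVARCHAR(MAX) and saving it to a .sqlplan file sidesteps the 128-level limit.

```sql
-- Fails with the 128-level error when a plan is deeply nested:
SELECT CAST(tqp.query_plan AS XML) AS plan_xml
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_text_query_plan
    (qs.plan_handle, qs.statement_start_offset, qs.statement_end_offset) AS tqp;

-- Workaround: skip the CAST; save the text to a .sqlplan file
-- and open it graphically in SSMS.
SELECT tqp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_text_query_plan
    (qs.plan_handle, qs.statement_start_offset, qs.statement_end_offset) AS tqp;
```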


Median Strip

SQL Server has a number of aggregate functions, including MAX, COUNT and AVG, but if you want to calculate a median you will be rolling your own, probably with a little help from Google. If your data is appropriately indexed and performance is critical, Paul White has your back: the OFFSET-FETCH method for SQL Server 2012 onwards and a dynamic cursor method for earlier versions, each calculated both singly and per group. It’s hard to imagine a faster solution, as both pretty much jump directly to the median values.
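As a rough sketch of the OFFSET-FETCH idea (hypothetical table dbo.Sales with a numeric Amount column; Paul’s indexed versions are more refined than this):

```sql
DECLARE @c BIGINT = (SELECT COUNT(*) FROM dbo.Sales);

SELECT AVG(1.0 * Amount) AS Median
FROM
(
    SELECT Amount
    FROM dbo.Sales
    ORDER BY Amount
    OFFSET (@c - 1) / 2 ROWS           -- jump straight to the middle
    FETCH NEXT 2 - @c % 2 ROWS ONLY    -- 1 row if @c is odd, 2 if even
) AS middle_rows;
```

With a supporting index on Amount, the ORDER BY is satisfied by the index and the OFFSET seeks close to the midpoint rather than sorting the whole set.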

For data without a supporting index it gets a lot more interesting, as the established methods are quite memory- and time-consuming: they all involve sorting the data. Dwain Camps seems to have developed the leading approach and applied it per group, building on Aaron Bertrand’s work documenting and performance-testing the alternatives across a single dataset.

I have a few ideas to present that will massively strip back the amount of sorting required to calculate medians without an index, resulting in huge performance gains. Unfortunately these solutions are quite “elaborate”, but I’ve managed to avoid unsupported features, query hints and trace flags (for a change).


Updating Statistics on Steroids

Updating statistics on large tables can take a while. A page-compressed warehouse fact table with 19 billion rows, 1000 partitions, a sensible clustered index, and 35 columns occupying in excess of 1 TB of disk space on my server takes around 22 hours to update its 33 stats objects using the default sample size. Sure, you can do several tables in parallel, but you can’t update the various stats on a single table in parallel using supported functionality.

So the next step is usually to tinker with sample size, spread the update across multiple maintenance windows, switch older partitions out to an archive table, or perhaps leverage filtered stats (conditions apply) or incremental stats (from SQL 2014). This blog post looks at a more radical approach – attempting to update those 33 stats “in parallel” to get that 22-hour binge down to the length of a reasonable coffee break.
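One way to picture the gap: supported syntax updates statistics one object at a time on a single session, so getting parallelism means generating a command per stats object and dispatching each to its own session. A sketch, with a hypothetical table name:

```sql
-- Serial, supported form: one statistic per command, each waiting on the last.
UPDATE STATISTICS dbo.FactSales (Stat_SomeColumn) WITH FULLSCAN;  -- hypothetical names

-- Generate one command per stats object on the table, ready to be handed to
-- separate sessions (e.g. Agent jobs) instead of being run in sequence:
SELECT N'UPDATE STATISTICS '
     + QUOTENAME(OBJECT_SCHEMA_NAME(s.object_id)) + N'.'
     + QUOTENAME(OBJECT_NAME(s.object_id))
     + N' (' + QUOTENAME(s.name) + N');' AS update_command
FROM sys.stats AS s
WHERE s.object_id = OBJECT_ID(N'dbo.FactSales');
```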
