A lesson in statistics from a data migration

I recently consulted on the migration of some large tables in SQL Server 2008R2. “Statistically significant” errors were made, and hurriedly corrected. Here are some of the lessons learnt.

The project team had been preparing for months, and the data migration was a relatively small concern. It had passed all tests in lower environments but the final testing was in a full size environment and, BAM! –  post migration performance was abysmal. Without migrating stats, all the previous auto-column stats were gradually triggered for creation (each taking around 30 minutes) and critical processes were hung up pending the new statistics.  Because some of the large tables were wide, and contained many such statistics, there was no time in the initial maintenance window to trigger creation of these statistics (SQL will only generate one statistics per table at a time).  This would be unacceptable in production, so we devised a plan to migrate the statistics (and histograms) using the SSMS GUI to script out the statistics including their histograms.  So lesson #1 was: don’t forget to migrate stats on large tables!

Index-stats and user-created stats were already managed by the migration, so we only had to consider the auto-created single column stats. Unfortunately a side-effect of the GUI scripting approach (and equivalently DBCC SHOWSTATISTICS WITH STATS_STREAM) is that exported auto-stats are imported as user-created stats, even though they retain the original name (“_WA_Sys_” prefix and then the column ordinal and table’s object ID in hex). So they look like auto-stats, but aren’t. Once you know the naming convention for auto-stats it’s a bit troubling to know the names no longer match the object_id of the table, so we ingeniously hacked the exported script to fix this, hoping this might switch their auto-created flag, but of course they are still considered user-created stats.  For most systems this wouldn’t be a problem, but our regular statistics maintenance scripts manage auto- and user-created scripts differently (it’s a long story), and we weren’t keen to open the can of worms storing those scripts. There was no way to get SQL Server to accept these were auto-created, and no fast way to drop them and build new auto-stats… or was there?

Lesson #2 is a fast way to create auto-column stats: delete any user-created stats on the relevant tables/columns, then switch out most of the partitions (I hope you don’t have enormous non-partitioned tables, otherwise you and I will have to have a little chat). Switching out all bar the latest partition containing data may be appropriate.  Then run a simple minimal query with a predicate that will trigger the new stat. Unfortunately the sample size will never be as large as for an auto stat on the full size table.  This may well be crucial, so we chose to try to replicate the original sample size as follows: retain as many partitions as necessary so the rowcount of the table approximates the sample size of the original stats. In our case that required one to five partitions per table. Trigger the new stats as above (the rows sampled will be scaled back), then update statistics with a full scan. All this should be quite quick as you are probably down from billions to millions of rows. Then switch back the bulk of the data, and DONE! Let’s hope the partitions you used were representative of the entire table. If not (either the data is skewed, or perhaps you have a non-indexed identity column), then this may not be appropriate. Of course, switching the bulk of the data back in serves to increase each stat’s modification counter, so your regular stats maintenance is likely to rebuild these pretty soon.

There was still one fly in the ointment – switching some partitions out was slow, and no it was not due to data movement. It looked like the switch was being blocked by a system spid, and we traced it down to the statistics on the primary key index being triggered for update by the partition switching.  Rather than bear the cost of several such updates (yes we set stats updates to be asynchronous, but these were still blocking the partition switch) we chose to set them to not recompute, and to make this speedy:


And at the very end, after partitions were switched back, incur a single update:


And that was it – another use for partition switching, with some fun along the way.





Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s