Query Plan Caching - My SQL Server Nightmare

30 Apr 2012, CPOL, 5 min read
If you have an issue with a slow part of a site after a large dataload, that you can't replicate after a DB restore, it's likely you could be dealing with a similar issue.

So we had an interesting issue this week which was bugging the hell out of me and our technical services team, who deploy our software (developed by a third party).

The issue was a severe performance degradation on a certain search field in the application.

First off, the code for this site is around five or six years old, and was written by "wet behind the ears" junior developers, fresh out of university, with no senior supervision. This was before my time with the company; the code got the job done, but by no means in an elegant manner.

So, the issue as reported was "When we search on this field, we get an error page after a while, and it used to be really fast" (or words to that effect).

My first thought was that there was an issue with the indexes after a rebuild.... nope, rebuilt all indexes, and it was still causing issues.

My next thought was that there was an issue in the WebService layer that the site uses to communicate with the data layer... nope, worked fine on test.

Borked Upgrade? Nope, redid the upgrade (and it worked fine on the test system).

Additional Settings in the web.config causing timeouts? Nope, compared the configs.

DB Connections not being disposed of correctly? Nope, took it into single user mode and back again.

Data issue with the amount of data (mismatched with a smaller dataset in test)... nope, restored the database over the test database...

So in hopes of replicating the issue in a place where I could step through the code, the live database was restored into an isolated place, and the code base copied... nope, search is now working fine on this installation.

An issue with the database engine itself? Nope, restored the database onto the same DB server and App server, search works fine on this installation.

I'd like to mention that by this point, we're on the second day, and I'd dropped everything as I couldn't think about anything else... it was like a drug, and I was completely addicted...

I started to think of completely off-the-wall ideas, like dropping all the data and re-inserting it, or restoring data from before the upgrade. I was getting desperate... So we decided that if we didn't find a fix by 4pm (it was then around 12pm), we would stop the site, back up the database, restore it to another location, and repoint the site to it. This would make the problem go away (it's the same thing we had done to test it), but it was by no means a real solution. The deadline gave me a new burst of enthusiasm; I always feel that if you have a fallback plan and a deadline, the solution presents itself faster.

We started to think about the problem like Dr. Gregory House. I got a couple of people in a room, and we started to write down the symptoms, and what we tried...

  • Ruled out the data itself being the issue
  • Ruled out the indexes (full text included)
  • Ruled out websites
  • Ruled out the code

The only thing left was the original database in its current place.

This led to the revelation that the databases on the database server hadn't been offline EVER (well, not since the new DB server was installed 2 years ago). Therefore everything that was in memory, and had not been forcibly removed by SQL Server, was still in memory. So we had the question, "What is kept in memory, and not recreated exactly when you restore the database?"... The answer is a lot of things, but one of those is the cached query plan.

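I found out that one part of this update was the addition of over 100,000 postcodes to a table that had previously held fewer than 50. A plan compiled and cached while the table was tiny could therefore still be in use against a table thousands of times larger.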

So, we had a look at the query cache, and found that the use count on the queries in question was massive (compared with the site's monthly usage). This gave me the ammunition to try something experimental on a production machine.
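The article doesn't show the query we used to look at the cache, so the following is only a minimal sketch of one way to do it, using the `sys.dm_exec_cached_plans` and `sys.dm_exec_sql_text` dynamic management objects (available from SQL Server 2005). The `%Postcode%` search term is a hypothetical placeholder; substitute some text from the query you are chasing.

```sql
-- Sketch: list the most-reused cached plans together with their SQL text.
-- Requires VIEW SERVER STATE permission.
SELECT TOP (20)
    cp.usecounts,        -- number of times the cached plan has been reused
    cp.objtype,          -- e.g. Proc, Prepared, Adhoc
    cp.size_in_bytes,
    st.text AS sql_text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE N'%Postcode%'   -- assumed search term, not from the article
ORDER BY cp.usecounts DESC;
```

A very high `usecounts` value on the query in question suggests the plan has been sitting in the cache for a long time, possibly since before the data changed shape.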

The plan was to take the database into single user mode, take it offline for a minute, then bring it back online and into multi-user mode again. This would mean a small amount of downtime, which we could chalk up to connectivity issues if anyone noticed it. We ran a test on another database, and worked out that it should take 2 minutes maximum.
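For reference, that offline/online cycle could look something like the sketch below. The database name `YourDatabase` is a placeholder, and `WITH ROLLBACK IMMEDIATE` is my assumption about how you would get exclusive access; it rolls back any open transactions, so think twice before using it on a live system.

```sql
-- Sketch only: cycle a database offline and back online.
-- [YourDatabase] is a hypothetical name.
USE master;

-- Kick everyone else out (assumption: open transactions may be rolled back).
ALTER DATABASE [YourDatabase] SET SINGLE_USER WITH ROLLBACK IMMEDIATE;

-- Take the database offline, then bring it straight back.
ALTER DATABASE [YourDatabase] SET OFFLINE;
ALTER DATABASE [YourDatabase] SET ONLINE;

-- Let everyone back in.
ALTER DATABASE [YourDatabase] SET MULTI_USER;
```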

This didn't sit well with me, though... It felt like a workaround, and surely people have had to do this in the past... surely... So I set about speed reading some articles on cached query plans and how they work. I came across an article with a DBCC command you can use to clear the plan cache on a per-database level (that's for SQL Server 2005; apparently in 2008 you can do this on a per-query basis).
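I didn't try the per-query route myself, so treat this as a sketch rather than what we actually ran. From SQL Server 2008, `DBCC FREEPROCCACHE` accepts a plan handle, which lets you evict a single plan. The `%Postcode%` filter is a hypothetical placeholder for some text from the offending query.

```sql
-- Sketch (SQL Server 2008+): remove one plan from the cache by its handle.
DECLARE @plan_handle varbinary(64);

-- Find the most-reused plan whose text matches the assumed search term.
SELECT TOP (1) @plan_handle = cp.plan_handle
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE N'%Postcode%'                    -- assumed search term
  AND st.text NOT LIKE N'%dm_exec_cached_plans%'    -- skip this lookup query itself
ORDER BY cp.usecounts DESC;

-- Only that plan is evicted; the rest of the cache is left alone.
IF @plan_handle IS NOT NULL
    DBCC FREEPROCCACHE (@plan_handle);
```

Check which plan the lookup picked before freeing it; `TOP (1)` with a loose `LIKE` filter can easily match the wrong query.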

The command is:

SQL
DBCC FLUSHPROCINDB (<dbid>)

And you can get the dbid (the database_id column) from:

SQL
SELECT name, database_id FROM sys.databases
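If you'd rather not copy the ID by hand, the two steps can be combined. This is a minimal sketch, assuming a database called `YourDatabase` (a placeholder); `DB_ID` is a built-in function that returns the ID for a given database name. As far as I can tell, `DBCC FLUSHPROCINDB` is not covered in the official documentation, so use it with care.

```sql
-- Sketch: look up the database ID by name and flush that database's cached plans.
-- N'YourDatabase' is a hypothetical name.
DECLARE @dbid int;
SET @dbid = DB_ID(N'YourDatabase');

-- DB_ID returns NULL if the name doesn't exist, so guard against that.
IF @dbid IS NOT NULL
    DBCC FLUSHPROCINDB (@dbid);
```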

As soon as the command was run, the search in the site returned, albeit quite slowly (~30 seconds). The second time, it took less than a second.

So, if you have an issue with a slow part of a site after a large dataload, that you can't replicate after a DB restore, it's likely you could be dealing with a similar issue.

Hope this helps someone...

Note: I suspect that updating the statistics may also have helped, but I'm not 100% sure. From what I've been reading, it seems that once a query has been optimized, the plan is simply reused rather than optimized again, so I'm not sure whether this would cause the cached plans to update themselves...
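For anyone who wants to try the statistics route, here is a sketch of what it might look like. I haven't confirmed that this would have fixed our issue. The table name `dbo.Postcodes` is hypothetical, and the `sp_recompile` call is my own addition: it marks the stored procedures and triggers that reference the table for recompilation the next time they run.

```sql
-- Sketch: refresh statistics on the table that grew, then force dependent
-- procedures to recompile. dbo.Postcodes is a hypothetical table name.
UPDATE STATISTICS dbo.Postcodes WITH FULLSCAN;

-- Marks procedures and triggers that reference the table for recompilation.
EXEC sp_recompile N'dbo.Postcodes';
```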

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)

