Click here to Skip to main content
15,900,907 members
Articles / Programming Languages / SQL

SQL Query Run Time Estimation

Rate me:
Please Sign up or sign in to vote.
5.00/5 (1 vote)
31 May 2011CPOL2 min read 16.2K   2  
SQL Query run time estimation

This is a follow up to a post about subqueries with an estimation for how long it would take for the query to run.

In a recent post about subqueries, I compared 4 queries and how they handled 4 million rows of data. The queries involved:

  1. Joins with AND conditions
  2. Subqueries with AND conditions
  3. Joins with OR conditions
  4. Subqueries with OR conditions

I compared the speed differential between all of them [on SQL Server], but the last of the queries never finished. As I indicated I might do, I left it running overnight – it still never finished!

Well, my curiosity about how long it would take for the query to finish wasn’t diminished – it was only enhanced. How could I find out and at least be able to estimate how long it would take? I needed to have some sort of understanding of how the performance was degrading as more records were added to the table. So I did that. I ran the query using a smaller number of records in the StudentTestData table. Here are the results:

# recordsRun Time
25,0006 seconds
50,00023 seconds
100,00099 seconds
200,000505 seconds

My anticipation is SQL Server was doing some sort of cross-join type comparison in the background, and the numbers above seem to support this. The efficiency, in Big-O notation, is around O(n^2). Since I want to work with actual numbers to estimate how long the query would run with N = 4 million rows, here’s a rough function:

Number of Seconds = 6*(N/25000)^2, where N is the number of rows.

This function underestimates the time requirement as you add more records, as it is slightly too low for N=100,000 and it is about 2 minutes off for N=200,000. Despite this, it gives me something to work with.

As a spot-check, I tried N = 80,000 students. The function expectation is 61 seconds, and the actual time the query ran was 60 seconds. That’s about as close as I can ask for.

But what about if N is 4 million rows as it was in the previous post? Based on the function, the expectation is over 43 hours. Also, this is an understatement given the results above for N=200,000. It’s a good thing I didn’t wait for it to finish!

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
United States United States
This member has not yet provided a Biography. Assume it's interesting and varied, and probably something to do with programming.

Comments and Discussions

 
-- There are no messages in this forum --