Code Answer: For autoincrement fields: MAX(ID) vs TOP 1 ID ORDER BY ID DESC

I want to find the highest AutoIncremented value from a field. (its not being fetched after an insert where I can use @@SCOPE_IDENTITY etc) Which of these two queries would run faster or gives better performance. Id is the primary key and autoincrement field for Table1. And this is for Sql Server 2005.

SELECT MAX(Id) FROM Table1

SELECT TOP 1 Id FROM Table1 ORDER BY Id DESC

[Edit]
Yes in this case Id is the field on which I have defined the clustered index.
If the index is ID DESC then what..
And yes it would be nice to know how the performance would be affected if
1. Id is a clustered index + primary key.
2. Id is a clustered index and not primary key.
3. Id is a non clustered index ASC + primary key.
4. Id is a non clustered index ASC and not primary key.
5. Id is a non clustered index DESC + primary key.
6. Id is a non clustered index DESC and not primary key.
7. Id is just AutoIncrement

Hope its not a tall order!

From stackoverflow

In theory, they will use same plans and run almost same time.

In practice,

SELECT TOP 1 Id FROM Table1 ORDER BY Id DESC

will more probably use a PRIMARY KEY INDEX.

Also, this one is more extendable if you will decide to select some else column along with id.

An actual plan on MAX() says:

SELECT <- AGGREGATE <- TOP <- CLUSTERED INDEX SCAN

, while plan for TOP 1 says:

SELECT <- TOP <- CLUSTERED INDEX SCAN

, i. e. aggregate is omitted.

Aggregate actually won't do anything here, as there is but one row.

P. S. As @Mehrdad Afshari and @John Sansom noted, on a non-indexed field MAX is slightly faster (of course not 20 times as optimizer says):

-- 18,874,368 rows

SET LANGUAGE ENGLISH
SET STATISTICS TIME ON
SET STATISTICS IO ON
PRINT 'MAX'
SELECT MAX(id) FROM master
PRINT 'TOP 1'
SELECT TOP 1 id FROM master ORDER BY id DESC
PRINT 'MAX'
SELECT MAX(id) FROM master
PRINT 'TOP 1'
SELECT TOP 1 id FROM master ORDER BY id DESC
PRINT 'MAX'
SELECT MAX(id) FROM master
PRINT 'TOP 1'
SELECT TOP 1 id FROM master ORDER BY id DESC
PRINT 'MAX'
SELECT MAX(id) FROM master
PRINT 'TOP 1'
SELECT TOP 1 id FROM master ORDER BY id DESC
PRINT 'MAX'
SELECT MAX(id) FROM master
PRINT 'TOP 1'
SELECT TOP 1 id FROM master ORDER BY id DESC

Changed language setting to us_english.

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1 ms.

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1 ms.

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1 ms.
MAX

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 20 ms.

(строк обработано: 1)
Table 'master'. Scan count 3, logical reads 32655, physical reads 0, read-ahead reads 447, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 5452 ms,  elapsed time = 2766 ms.
TOP 1

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1 ms.

(строк обработано: 1)
Table 'master'. Scan count 3, logical reads 32655, physical reads 0, read-ahead reads 2, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 6813 ms,  elapsed time = 3449 ms.
MAX

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1 ms.

(строк обработано: 1)
Table 'master'. Scan count 3, logical reads 32655, physical reads 0, read-ahead reads 44, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 5359 ms,  elapsed time = 2714 ms.
TOP 1

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1 ms.

(строк обработано: 1)
Table 'master'. Scan count 3, logical reads 32655, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 6766 ms,  elapsed time = 3379 ms.
MAX

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1 ms.

(строк обработано: 1)
Table 'master'. Scan count 3, logical reads 32655, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 5406 ms,  elapsed time = 2726 ms.
TOP 1

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1 ms.

(строк обработано: 1)
Table 'master'. Scan count 3, logical reads 32655, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 6780 ms,  elapsed time = 3415 ms.
MAX

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1 ms.

(строк обработано: 1)
Table 'master'. Scan count 3, logical reads 32655, physical reads 0, read-ahead reads 85, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 5392 ms,  elapsed time = 2709 ms.
TOP 1

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1 ms.

(строк обработано: 1)
Table 'master'. Scan count 3, logical reads 32655, physical reads 0, read-ahead reads 10, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 6766 ms,  elapsed time = 3387 ms.
MAX

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1 ms.

(строк обработано: 1)
Table 'master'. Scan count 3, logical reads 32655, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 5374 ms,  elapsed time = 2708 ms.
TOP 1

SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 1 ms.

(строк обработано: 1)
Table 'master'. Scan count 3, logical reads 32655, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

SQL Server Execution Times:
   CPU time = 6797 ms,  elapsed time = 3494 ms.

Mehrdad Afshari : Query optimizer is deterministic. How come something will "more probably" use an index?

Quassnoi : Optimizer's decisions depend on table statistics which depend on table data which are probabilistic. An optimizer MAY decide to sort the table instead of using and INDEX SCAN along with TOP.

Just compare execution plans and you'll see (press Ctrl+M in Management Studio when editing a query). My wild guess will be that these queries are equally performant provided there is a (clustered) index on Id column.

However, this as a whole is a very bad idea.

Learning : What is a very bad idea?

Anton Gogolev : Using additional queries to retrieve the ID of a newly added row is a very bad idea. Quite a lot of database operations may occur between these two requests.

Learning : Oh indeed. Quite true. +1

Binoj Antony : @Anton please read the question carefully I have clearly stated that "(its not being fetched after an insert where I can use @@SCOPE_IDENTITY etc)"
MAX is generally faster.

Both queries will use the index on the column if exists.

If no index exists on the column, the TOP 1 query will use a Top N Sort operator to sort the table instead of stream aggregation, which makes it slower.

MAX also provides better readability.

Side Note: while MAX will use a stream aggregate operator in the execution plan in the indexed case, it doesn't have any specific cost as it's just processing a single row (Actual Rows = 1). You can compare queries by running them in a single batch and see the relative cost. In the indexed case, both queries will cost 50%. I tested the non-indexed case on a table with about 7000 rows and TOP will cost 65% in comparison to MAX that costs 35%.

Quassnoi : TOP 1 will not sort the query. The question explicitly notes that ID is a PRIMARY KEY.

Mehrdad Afshari : And I think I explicitly mentioned "if no index exists on the column". Read more carefully.

Quassnoi : An index always exists on a primary key.

Mehrdad Afshari : Yes it does, but remember, SO is something more or less like wikipedia. It never hurts to answer more generally. I think I first answered the question with "Both queries will use the index on the column if exists." and after that clarified the general case to avoid pitfalls.

Quassnoi : Maybe it's a communication fault from my side, but as I see your answer, is says that TOP 1 will always sort the table, even if it uses the index. That's not true.

Mehrdad Afshari : Yes, I didn't mean to say that. I'll fix the wording to make it more clear.
If there is a clustered index there is virtually no difference in performance between the two queries.

This is becuase both will perform a Clustered Index Scan that will bear 100% of the query cost.

Performing the two queries on a column that does not have an index results in 3 operators being used in both execution plans.

The Top clause uses the Sort operator and the Max function uses a Stream Aggregate operator.

When there is no index, the MAX() function provides better performance.

Proof of concept can be found and full walkthrough of a test scenario can be found here

Performance Comparison Top 1 Verses MAX() Funciton

Mehrdad Afshari : he indexed case, the stream aggregate operator will have "Actual Rows = 1" which basically costs nothing. If you don't have an index, the execution plan for TOP 1 will have a "Top N" sort which will make it slower than the "Stream Aggregate" that MAX uses.

John Sansom : @Mehrdad: Indeed it does, full details will be added to my blog.

Binoj Antony : @John, use the permanent link to this question on your blog http://stackoverflow.com/questions/590079/

John Sansom : @Binoj: Will do, thanks.
I've just tested the two SQL statements you provided against a typical dataset:
```
SELECT MAX(Id) FROM Table1

SELECT TOP 1 Id FROM Table1 ORDER BY Id DESC
```
And SELECT TOP 1 Id FROM Table1 ORDER BY Id DESC is marginally faster because it has one last step in the execution plan. Here are the execution plans each query carries out:

SELECT MAX(Id) FROM Table1

Clustered Index Scan >> Top >> Stream Aggregate >> Select

SELECT TOP 1 Id FROM Table1 ORDER BY Id DESC

Clustered Index Scan >> Top >> Select
Yes in this case Id is the field on which I have defined the clustered index. If the index is ID DESC then what.. And yes it would be nice to know how the performance would be affected if
1. Id is a clustered index + primary key.
2. Id is a clustered index and not primary key.
3. Id is a non clustered index ASC + primary key.
4. Id is a non clustered index ASC and not primary key.
5. Id is a non clustered index DESC + primary key.
6. Id is a non clustered index DESC and not primary key.
7. Id is just AutoIncrement
For Cases 1 and 2, both will perform a clustered index scan that returns a single record. There is no IO difference between the two queries.

For Cases 3, 4, 5 and 6, both will perform an index scan that returns a single record. There is no IO difference between the two queries.

For Case 7, both will perform a table scan. There is no difference in the IO cost.

Summary: Case 1-6 are made of win! If you're in Case 7, then you've already lost from an IO standpoint.

You can measure IO by using SQL's Query analyzer. Run this before your query.
```
SET STATISTICS IO ON
```
Nobody mentioned IDENT_CURRENT('Table1') - blows them all away - of course it only works on identity columns, but that was the question...

Matthew PK : Voted up, but not *always* applicable because `IDENT_CURRENT` isn't necessarily in scope. To be fastest and safest the transaction should be in a stored procedure and use `SCOPE_IDENTITY()`

Code Answer

Friday, April 8, 2011

For autoincrement fields: MAX(ID) vs TOP 1 ID ORDER BY ID DESC

0 comments:

Post a Comment

Blog Archive