SQL Server Deep Dives: Missing Index Feature

The missing indexes feature uses dynamic management objects and Show plan to provide information about missing indexes that could enhance SQL Server query performance. The Missing Index feature will suggest you the candidates for index based on the execution plans of the queries being run against the database. This feature is based on the data generated from the execution plans stored in the cache.

When the query optimizer generates a query plan, it analyzes what are the best indexes for a particular filter condition. If the best indexes do not exist, the query optimizer generates a suboptimal query plan, but still stores information about these indexes. The missing indexes feature enables you to access information about these indexes so you can decide whether they should be implemented. You can use Sys.dm_db_missing_index_details DMV to view the details of missing indexes or you can examine the execution plan or XML show plans for a query to find the missing index details

Here on this article we will see the Limitations of the Missing Index feature.

The following are the Limitations of Missing Index Feature.

The missing index may recommend a non-clustered index that already exists. Over and over again.

Sometime the Missing Index feature recommends you to create a Non-Clustered Index that is already exist. This is actually a bug with the Missing Index feature (View details)

Setup the work table

CREATE TABLE MissingIndex

(

EmpID INT IDENTITY,

RefID INT,

LastRefID INT,

Remarks CHAR(4000)

)

-- Insert test data – Create a Covered Index

INSERT INTO MissingIndex (RefID,LastRefID,Remarks)

SELECT RAND()*10000,RAND()*10000,'NA'

GO 10000

-– Create a Covered Index

CREATE INDEX IDX_NCL_tMissingIndex_cRefID_cLastRefID

ON MissingIndex(RefID,LastRefID)

INCLUDE(EmpID)

SELECT EmpID

FROM MissingIndex

WHERE RefID BETWEEN 10 AND 1000

AND LastRefID > 6000 ;

The Execution plan will guide you to create those missing indexes.

Let’s create those missing indexes and examine the Execution plan again for the same query.

CREATE INDEX IDX_NCL_tMissingIndex_cRefID_cLastRefID

ON MissingIndex(RefID,LastRefID)

INCLUDE(EmpID)

Estimated Execution Plan after creating the Missing index Suggested by the optimizer.

Good to see an index seek operation, and the index I created was also useful. Now let’s see how this index works for a little complicated query.

Check the estimated execution plan for the following query

DECLARE @EmpID INT

DECLARE curMissingIndex CURSOR

FOR SELECT EmpID

FROM MissingIndex

WHERE RefID BETWEEN 10 AND 1000

AND LastRefID > 6000

OPEN curMissingIndex

FETCH NEXT FROM curMissingIndex INTO @ EmpID;

WHILE @@fetch_status = 0

BEGIN

FETCH NEXT FROM curMissingIndex INTO @ EmpID;

END

CLOSE curMissingIndex ;

DEALLOCATE curMissingIndex ;

This time also optimizer suggests me for a missing index. Let’s see what the missing index is.

CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]

ON [dbo].[MissingIndex] ([RefID],[LastRefID])

INCLUDE ([EmpID])

Ah! Shocking? The missing index is actually exactly the same as the one I created little earlier. . It will continue to recommend you the same again and again even if you follow the missing index suggestion.Even if you examine the Sys.dm_db_missing_index_details DMV you can find an entry there.

It is not intended to fine tune an indexing configuration.

The missing indexes feature is a lightweight tool for finding missing indexes that might significantly improve query performance. It does not provide adequate information to fine tune your indexing configuration. It will not suggest you Partitioning or indexed Views etc. Use Database Engine Tuning Advisor for that purpose it is having a wide range of recommendations.

It cannot gather statistics for more than 500 missing index groups

Once maximum of 500 missing indexes groups are captured, no more missing index group data is gathered. This threshold is not a tunable parameter and cannot be changed.

It does not specify an order for columns to be used in an index.

The missing indexes feature does not consider the optimum sort order when suggesting an index. Consider an example

SELECT EmpID

FROM MissingIndex

WHERE RefID BETWEEN 10 AND 1000

AND LastRefID > 6000

ORDER BY LastRefID, RefID

For the above query the optimizer will suggest you a missing index with columns RefID and LastRefID and an INCLUDE part as EmpID, But the reality is that if you create the index with the column order RefID and LastRefID then you can take advantage of Sorting.
If I created the missing index as

CREATE INDEX IDX_NCL_tMissingIndex_cRefID_cLastRefID

ON MissingIndex(RefID,LastRefID)

INCLUDE(EmpID)

Then cost of the query was 1.34561

Instead of creating index “IDX_NCL_tMissingIndex_cRefID_cLastRefID”
If I created as the following

DROP INDEX IDX_NCL_tMissingIndex_cRefID_cLastRefID

ON MissingIndex

CREATE INDEX IDX_NCL_tMissingIndex_cRefID_cLastRefID

ON MissingIndex(LastRefID, RefID)

INCLUDE(EmpID)

Then cost of the query came down to 0.728691!

For queries involving only inequality predicates, it returns less accurate cost information

The missing index feature uses a simplistic model to generate cost information for queries involving only inequality predicates. Consequently the cost information returned for these queries may be less accurate than the information returned for queries that involve equalities.

It reports only include columns for some queries, so index key columns must be manually selected

Sometimes the feature only reports column usage information for include columns. For example, the feature might report equality columns = { }, inequality columns = { }, and include columns = {b, c, …}. In this case, one of the include columns must be selected for the index key column

It returns only raw information about columns on which indexes might be missing.

The missing indexes feature returns raw information about columns on which indexes might be missing. The information returned might require additional processing before we can use the information to create an index.

For example, if the column involved is a computed column, and the computation is nondeterministic or imprecise, the CREATE INDEX statement on that column will fail. In the case of an imprecise computation, the column should be persisted first, and then you can create the index.

Sometimes missing index groups appears multiple times on the Showplans

Let’s see an example,

Setup the sample tables

-- Create Customer Table

CREATE TABLE Customer

(CustomerID INT IDENTITY,

FirstName VARCHAR(10),

MiddleName VARCHAR(10),

LastName VARCHAR(10),

Remarks CHAR(2000)

)

-- Insert 1000 Customers

INSERT INTO Customer(FirstName, MiddleName, LastName, Remarks)

SELECT CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))

,CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))

,CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0)),'NA'

GO 1000

-- Contact information

CREATE TABLE CustomerContactInfo

(CustomerID INT,

Address1 CHAR(200),

Address2 CHAR(200),

CityID INT,

Phone VARCHAR(10),

)

INSERT INTO CustomerContactInfo

SELECT ROUND((999 * RAND() + 1), 0),

CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))

,CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))

,ROUND(RAND()*100,0)

,CONVERT(VARCHAR,ROUND((9 * RAND() + 1), 0))+CONVERT(VARCHAR,ROUND((9 * RAND() + 1), 0))+CONVERT(VARCHAR,ROUND((9 * RAND() + 1), 0))

+CONVERT(VARCHAR,ROUND((9 * RAND() + 1), 0))+CONVERT(VARCHAR,ROUND((9 * RAND() + 1), 0))+CONVERT(VARCHAR,ROUND((9 * RAND() + 1), 0))

GO 3000

-- Sales Information

CREATE TABLE SalesSummary

(SalesID INT IDENTITY,

CustomerID INT,

UnitPrice MONEY,

Tax MONEY,

Discount MONEY,

TotalPrice AS UnitPrice+Tax+Discount,

Remark CHAR(2000))

INSERT INTO SalesSummary (CustomerID,UnitPrice, Tax, Discount, Remark)

SELECT ROUND((999 * RAND() + 1), 0),ROUND(RAND()*10000,0),ROUND(RAND()*100,0),ROUND(RAND()*1000,0), 'NA'

GO 10000

Execute the following query, since our tables don’t have any useful index to satisfy the query it The optimizer will place an entry on the Missing index details.

-- Execute the query

SELECT C.CustomerID,FirstName,MiddleName,LastName,Address1,Address2,CityID,Phone,SalesID,TotalPrice,S.Remark

FROM Customer C

INNER JOIN dbo.SalesSummary S

ON C.CustomerID = S.CustomerID

INNER JOIN CustomerContactInfo CC

ON CC.CustomerID = C.CustomerID

WHERE S.CustomerID = 223

Now, examine the missing index details by running the following query

-- Check the missing index details

SELECT OBJECT_NAME(Object_id),equality_columns,inequality_columns,included_columns

FROM sys.dm_db_missing_index_groups g

INNER JOIN sys.dm_db_missing_index_details d

ON g.index_handle = d.index_handle

ORDER BY object_id

You can see the optimizer came up with a suggestion to create two indexes on Sales table, but the Index Key Column is the same “CustomerID”, Only difference is the included column.This occurs when different parts of a single query benefit differently from the same missing index group.
Keep these limitations on mind while working on Query Performance Tuning!

SQL Server Deep Dives

Tuesday, September 28, 2010

Missing Index Feature - Limitation

No comments:

Post a Comment