Tuesday, September 28, 2010

Missing Index Feature - Limitation

The missing indexes feature uses dynamic management objects and Show plan to provide information about missing indexes that could enhance SQL Server query performance. The Missing Index feature will suggest you the candidates for index based on the execution plans of the queries being run against the database. This feature is based on the data generated from the execution plans stored in the cache.

When the query optimizer generates a query plan, it analyzes what are the best indexes for a particular filter condition. If the best indexes do not exist, the query optimizer generates a suboptimal query plan, but still stores information about these indexes. The missing indexes feature enables you to access information about these indexes so you can decide whether they should be implemented. You can use Sys.dm_db_missing_index_details DMV to view the details of missing indexes or you can examine the execution plan or XML show plans for a query to find the missing index details

Here on this article we will see the Limitations of the Missing Index feature.

The following are the Limitations of Missing Index Feature.

The missing index may recommend a non-clustered index that already exists. Over and over again.

Sometime the Missing Index feature recommends you to create a Non-Clustered Index that is already exist. This is actually a bug with the Missing Index feature (View details)

Setup the work table


CREATE  TABLE MissingIndex
(
 EmpID INT IDENTITY,
 RefID INT,
 LastRefID INT,
 Remarks CHAR(4000)
)
GO

-- Insert test data – Create a Covered Index
INSERT INTO MissingIndex (RefID,LastRefID,Remarks)
SELECT RAND()*10000,RAND()*10000,'NA'

GO 10000
-– Create a Covered Index
CREATE INDEX IDX_NCL_tMissingIndex_cRefID_cLastRefID
ON MissingIndex(RefID,LastRefID)
INCLUDE(EmpID)

SELECT EmpID
FROM   MissingIndex
WHERE  RefID BETWEEN 10 AND 1000
AND LastRefID > 6000 ;


The Execution plan will guide you to create those missing indexes.

Let’s create those missing indexes and examine the Execution plan again for the same query.


CREATE INDEX IDX_NCL_tMissingIndex_cRefID_cLastRefID
ON MissingIndex(RefID,LastRefID)
INCLUDE(EmpID)



Estimated Execution Plan after creating the Missing index Suggested by the optimizer.

Good to see an index seek operation, and the index I created was also useful. Now let’s see how this index works for a little complicated query.

Check the estimated execution plan for the following query


DECLARE @EmpID INT

DECLARE curMissingIndex CURSOR
FOR SELECT  EmpID
FROM    MissingIndex
WHERE   RefID BETWEEN 10 AND 1000
AND LastRefID > 6000


OPEN curMissingIndex
FETCH NEXT FROM curMissingIndex INTO @ EmpID;

WHILE  @@fetch_status = 0
BEGIN
    FETCH NEXT FROM curMissingIndex INTO @ EmpID;
END

CLOSE curMissingIndex ;
DEALLOCATE curMissingIndex ;




This time also optimizer suggests me for a missing index. Let’s see what the missing index is.

















/*
/*
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[MissingIndex] ([RefID],[LastRefID])
INCLUDE ([EmpID])
GO
*/
*/

*/
Ah! Shocking? The missing index is actually exactly the same as the one I created little earlier. . It will continue to recommend you the same again and again even if you follow the missing index suggestion.Even if you examine the Sys.dm_db_missing_index_details DMV you can find an entry there.

It is not intended to fine tune an indexing configuration.

The missing indexes feature is a lightweight tool for finding missing indexes that might significantly improve query performance. It does not provide adequate information to fine tune your indexing configuration. It will not suggest you Partitioning or indexed Views etc. Use Database Engine Tuning Advisor for that purpose it is having a wide range of recommendations.

It cannot gather statistics for more than 500 missing index groups

Once maximum of 500 missing indexes groups are captured, no more missing index group data is gathered. This threshold is not a tunable parameter and cannot be changed.

It does not specify an order for columns to be used in an index.

The missing indexes feature does not consider the optimum sort order when suggesting an index. Consider an example


SELECT EmpID
FROM   MissingIndex
WHERE  RefID BETWEEN 10 AND 1000
AND LastRefID > 6000
ORDER BY LastRefID, RefID

For the above query the optimizer will suggest you a missing index with columns RefID and LastRefID and an INCLUDE part as EmpID, But the reality is that if you create the index with the column order RefID and LastRefID then you can take advantage of Sorting.
If I created the missing index as



CREATE INDEX IDX_NCL_tMissingIndex_cRefID_cLastRefID
ON MissingIndex(RefID,LastRefID)
INCLUDE(EmpID)
Then cost of the query was 1.34561

Instead of creating index “IDX_NCL_tMissingIndex_cRefID_cLastRefID”
If I created as the following


DROP INDEX IDX_NCL_tMissingIndex_cRefID_cLastRefID
ON MissingIndex

CREATE INDEX IDX_NCL_tMissingIndex_cRefID_cLastRefID
ON MissingIndex(LastRefID, RefID)
INCLUDE(EmpID)

Then cost of the query came down to 0.728691!

For queries involving only inequality predicates, it returns less accurate cost information

The missing index feature uses a simplistic model to generate cost information for queries involving only inequality predicates. Consequently the cost information returned for these queries may be less accurate than the information returned for queries that involve equalities.

It reports only include columns for some queries, so index key columns must be manually selected

Sometimes the feature only reports column usage information for include columns. For example, the feature might report equality columns = { }, inequality columns = { }, and include columns = {b, c, …}. In this case, one of the include columns must be selected for the index key column

It returns only raw information about columns on which indexes might be missing.

The missing indexes feature returns raw information about columns on which indexes might be missing. The information returned might require additional processing before we can use the information to create an index.

For example, if the column involved is a computed column, and the computation is nondeterministic or imprecise, the CREATE INDEX statement on that column will fail. In the case of an imprecise computation, the column should be persisted first, and then you can create the index.

Sometimes missing index groups appears multiple times on the Showplans

Let’s see an example,

Setup the sample tables



-- Create Customer Table
CREATE TABLE Customer
(CustomerID INT IDENTITY,
 FirstName VARCHAR(10),
 MiddleName VARCHAR(10),
 LastName VARCHAR(10),
 Remarks CHAR(2000)
 )

GO
-- Insert 1000 Customers
INSERT INTO Customer(FirstName, MiddleName, LastName, Remarks)
SELECT CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))
,CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))
,CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0)),'NA'
GO 1000

-- Contact information
CREATE TABLE CustomerContactInfo
(CustomerID INT,
 Address1 CHAR(200),
 Address2 CHAR(200),
 CityID INT,
 Phone VARCHAR(10),
 )
GO


INSERT INTO CustomerContactInfo
SELECT ROUND((999 * RAND() + 1), 0),
         CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))
         ,CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))
         ,ROUND(RAND()*100,0)
         ,CONVERT(VARCHAR,ROUND((9 * RAND() + 1), 0))+CONVERT(VARCHAR,ROUND((9 * RAND() + 1), 0))+CONVERT(VARCHAR,ROUND((9 * RAND() + 1), 0))
         +CONVERT(VARCHAR,ROUND((9 * RAND() + 1), 0))+CONVERT(VARCHAR,ROUND((9 * RAND() + 1), 0))+CONVERT(VARCHAR,ROUND((9 * RAND() + 1), 0))
       
        
GO 3000

-- Sales Information
CREATE  TABLE SalesSummary
(SalesID INT IDENTITY,
 CustomerID INT,
 UnitPrice MONEY,
 Tax MONEY,
 Discount MONEY,
 TotalPrice AS UnitPrice+Tax+Discount,
 Remark CHAR(2000))

GO

INSERT INTO SalesSummary (CustomerID,UnitPrice, Tax, Discount, Remark)
SELECT ROUND((999 * RAND() + 1), 0),ROUND(RAND()*10000,0),ROUND(RAND()*100,0),ROUND(RAND()*1000,0), 'NA'
GO 10000

Execute the following query, since our tables don’t have any useful index to satisfy the query it The optimizer will place an entry on the Missing index details.

-- Execute the query
SELECT C.CustomerID,FirstName,MiddleName,LastName,Address1,Address2,CityID,Phone,SalesID,TotalPrice,S.Remark
FROM Customer C
INNER JOIN dbo.SalesSummary S
ON C.CustomerID = S.CustomerID
INNER JOIN CustomerContactInfo CC
ON CC.CustomerID = C.CustomerID
WHERE S.CustomerID = 223

GO

Now, examine the missing index details by running the following query


-- Check the missing index details
SELECT OBJECT_NAME(Object_id),equality_columns,inequality_columns,included_columns
FROM sys.dm_db_missing_index_groups g
INNER JOIN sys.dm_db_missing_index_details d
ON g.index_handle = d.index_handle
ORDER BY object_id






You can see the optimizer came up with a suggestion to create two indexes on Sales table, but the Index Key Column is the same “CustomerID”, Only difference is the included column.This occurs when different parts of a single query benefit differently from the same missing index group.
Keep these limitations on mind while working on Query Performance Tuning!


Monday, September 20, 2010

SQL Server 2008 : Table-Valued Parameter

If you are new to TSQL you should have asked “How can I create arrays on SQL Server?” or ” How can I pass an array to SQL Server Stored Procedure?”.  The answer is, Sorry!  SQL Server does not support arrays because T-SQL is a Set-based language, deals with sets of data rather than Row by Row. But there are times that you may need to pass the parameters as arrays.  Before SQL Server 2008, common ways to meet this need were based on dynamic SQL, a split function, XML, and other techniques. The approach using dynamic SQL involved the risk of SQL Injection and did not provide efficient reuse of execution plans. Using a split function was complicated, and using XML was complicated and non-relational.

But things changed with the release of SQL Server 2008. Table-Valued Parameters (TVP) are one of the key enhancement of SQL Server 2008 T-SQL. In this post I will explain what a Table-Valued Parameter is?When to use Table-Valued Parameter? How to create a Table-Valued Parameter? The benefits and restrictions of Table-Valued Parameter and an Example.


The Scenario.

I have a table SalesSummary to store the information about the Sales Summary Information about the Sales representative of a company.


/*  Create the SalesSummary Sample table */
CREATE TABLE SalesSummary
(SalesID INT PRIMARY KEY IDENTITY,
 SalesRepID INT,
 SaleDate DATE,
 TotalSales  INT,
 )

/* 
Insert some sample data */

INSERT INTO SalesSummary(SalesRepID, SaleDate, TotalSales)
VALUES (55,'02-01-2001',20),
(55,'03-11-2001',58),
(55,'05-06-2001',25),
(55,'06-12-2002',3),
(55,'06-21-2002',10),
(55,'12-10-2003',26),
(55,'04-05-2004',1),
(55,'12-01-2004',20),
(66,'02-01-2001',1),
(66,'03-11-2001',5),
(66,'05-06-2001',5),
(66,'06-12-2002',23),
(66,'06-21-2002',10),
(66,'12-10-2003',16),
(66,'04-05-2004',11),
(66,'12-01-2004',20),
(77,'02-01-2001',5),
(77,'03-11-2001',2),
(77,'05-06-2001',5),
(77,'06-12-2002',16),
(77,'06-21-2002',2),
(77,'12-10-2003',42),
(77,'04-05-2004',16),
(77,'12-01-2004',20)



-- Check the Inserted data
SELECT * FROM SalesSummary

Now the board members of the company wish to drill down the details of the Sales Rep who doesn’t met the target of 20 for each year.
Eg: Given below are the details of SalesRepthat the user wish to see
SalesRepID
Sales Year
55
2002
66
2001
66
2003
77
2001
77
2002

I.e., For the SalesRep 66, the user wish to get the details for the year 2001 and 2003 for 55, 2002 etc.
Lets see, how we done this before the release of SQL Server 2008 (I know you can use dynamic sql, Split function or using XML method to solve this, but I am using Split Method here.)









-- Create a stored procedure using split method separated
-- Sales Rep are separated by comma, SalesRep and year is separated by space and Multiple year for the SalesRep is separated by #

CREATE PROCEDURE dbo.usp_GetSalesRepSummary_SplitMethod 
@inputString VARCHAR(100)
AS
BEGIN

DECLARE
            @SalesRepID VARCHAR(4),
            @SalesYear VARCHAR(4)

DECLARE @temp TABLE
(SalesRepID INT,
 SalesYear INT)


WHILE LEN(@InputString)<>0
BEGIN
           
            -- Split the SalesRepID
            SET @SalesRepID = LEFT(@inputString,CHARINDEX(' ',@inputString))
           
            -- update the @inputString
            SET @InputString = STUFF(@inputString,1,CHARINDEX(' ',@inputString),'')
           
            -- Split the year
            WHILE LEFT(@inputString,1) <> ',' AND LEN(@inputString) > 0
            BEGIN
                 
                  SET @SalesYear = LEFT(@inputString,4)
                  SET @InputString = STUFF(@inputString,1,4,'')
                 
                  IF LEFT(@inputString,1) = '#'
                  BEGIN
                 
SET @InputString = STUFF(@inputString,1,1,'')
                  END
                 
                  INSERT INTO @temp                  
                  SELECT @SalesRepID, @SalesYear
           
            END
           
            -- update the @inputString
            SET @InputString =TUFF(@inputString,1,CHARINDEX(',',@inputString),'')
           
END


SELECT SS.*
FROM @Temp t
INNER JOIN dbo.SalesSummary ss
ON t.SalesRepID = SS.SalesRepID
AND t.SalesYear = YEAR(ss.SaleDate)

END
-- Execute the procedure to see the result.EXEC dbo.usp_GetSalesRepSummary_SplitMethod  '55 2002,66 2001#2002,77 2001#2002'







As you can observe that the logic inside split procedure is looping, and it really looks ugly because of complicated Parameter, split process and, of course not a pure set based approach!
If the user wants to see the details of 100 SalesRep with each one having 10 years, then the looping is 100*10
Let’s see what TVP can do here!



What is a Table-Valued Parameter? 

Table Valued parameters are like parameter arrays in OLE DB and ODBC, but offer more flexibility and closer integration with Transact-SQL. Table Valued Parameters also have the benefit of being able to participate in set-based operations.

Table Valued Parameters are declared by using user-defined table types. You can use table-valued parameters to send multiple rows of data to a Transact-SQL Statement or a Routine, such as a Stored Procedure or a Function, without creating temporary table or many parameters.

How to create Table-Valued Parameter?

Table-Valued Parameter have two components, A SQL Server Type and a Parameter that reference that type. 

Step1 : Creating a table Type and Table Structure. 

Step2 : Declare a routine (Stored Procedure or Function) having Parameter of Table Type.

Step3: Declare a variable of the table type and reference the table type.

Step4 : Using Insert Statement fill the table variable created ion step3.

Step 5: After the table variable is created and filled, you can pass the variable to a routine.

Let's walk through the steps.









Step 1 : Creating a Table Type and Table Structure:

--    Create a table type and define the table structure
CREATE  TYPE dbo.typSalesRepSummary AS TABLE
 (SalesRepID INT,
  SalesFrom DATE,
  SalesTo DATE);

GO
Step 2 : Declare a routine that has a parameter of the table type:

CREATE PROCEDURE usp_GetSalseRep_CustYearSummary
@CustDetails dbo.typSalesRepSummary READONLY
AS
BEGIN

  SET NOCOUNT ON

  SELECT S.SalesRepID, CONVERT(VARCHAR,SalesFrom)+' - '+CONVERT(VARCHAR,SalesTo)Duration
  , TotalSales
  FROM @CustDetails C
  INNER JOIN SalesSummary S
  ON S.SalesRepID = C.SalesRepID
  WHERE SaleDate BETWEEN SalesFrom AND SalesTo

END 
Note: Currently, table-valued parameters are read only, and you must define them as such by using the READONLY keyword.

To execute the TVP Procedure.

Step 3: Declare a variable of the table type, and reference the table type:
Here in this step you need to create Table variable as type That reference the table type that we created  on step 1. Later you will be using this table variable to insert the data and pass to the set to the TVP procedure
--    Declare a variable of the table type, and reference the table type
DECLARE @SalesRepSummary dbo. typSalesRepSummary

Step 4: Insert Statement fill the table variable created on Step 3:

INSERT INTO @SalesRepSummary
VALUES (55, '01-01-2002', '12-31-2002')
,(66, '01-01-2001', '12-31-2001')
,(66,'01-01-2003', '12-31-2003')
,(77,'01-01-2003','12-31-2001')
,(77,'01-01-2002','12-31-2002')
Step 5: Call the routine by passing the table variable that we created on step 4:

EXEC dbo.usp_GetSalseRep_CustYearSummary @SalesRepSummary

Scope

A table-valued parameter is scoped to the stored procedure, function or dynamic Transact-SQL text, exactly like other parameters. Similarly, a variable of table type has scope like any other local variable that is created by using a DECLARE statement

Benefits
Table-valued parameters offer more flexibility and in some cases better performance than temporary tables or other ways to pass a list of parameters. Table-valued parameters offer the following benefits: 


·         Do not acquire locks for the initial population of data from a client.
·         Provide a simple programming model.
·         Enable you to include complex business logic in a single routine.
·         Reduce round trips to the server.
·         Can have a table structure of different cardinality.
·         Are strongly typed.
·         Enable the client to specify sort order and unique keys.




Restrictions



·          SQL Server does not maintain statistics on columns of table-valued parameters.
·          Table-valued parameters must be passed as input READONLY parameters to Transact-SQL routines. You cannot perform DML operations such as UPDATE, DELETE, or INSERT on a table-valued parameter in the body of a routine.
·          You cannot use a table-valued parameter as target of a SELECT INTO or INSERT EXEC statement. A table-valued parameter can be in the FROM clause of SELECT INTO or in the INSERT EXEC string or stored-procedure.