
Friday, April 8, 2011

Top SQL Server performance killers!


Here are some areas that degrade SQL Server performance. They are not listed in any particular order.

Improper Database and SQL Server Settings

tempdb is one of the busiest system databases. It is the workspace for temporary storage: temp tables, table variables, cursors, work tables for joins and sorts, row versioning, and so on. All of these operations can put pressure on I/O and CPU, so it is important to design the storage layout of tempdb properly.
Another common problem is an incorrect storage design for data files, log files, SQL Server program files and OS files; they should be placed on appropriate RAID levels.
You should also configure the SQL Server settings correctly (memory, query timeout, parallelism, etc.) and the database settings (statistics management, auto-growth of database files, etc.).
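
For illustration only (the right values depend entirely on your hardware and workload; the value 4 below is just a placeholder), server-level options such as parallelism and memory can be inspected and changed with sp_configure:

-- 'show advanced options' must be on before the options below are visible
EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;

-- Inspect current values (the run_value column shows what is active)
EXEC sp_configure 'max degree of parallelism';
EXEC sp_configure 'max server memory (MB)';

-- Example only: cap parallelism at 4
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;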

Database Design

Bad logical database design leads to bad physical database design, and that generally results in poor database performance. Both the logical and the physical design must be right before you can expect good performance from your database.
A database should be adequately normalized (and selectively de-normalized) to improve performance and reduce blocking. A normalized database reduces redundant data, which in turn reduces storage requirements. Row size should be planned so that as many rows as possible fit on an 8 KB data page.
Over-normalized tables are just as bad as an under-normalized database: they increase the number and complexity of the joins needed to retrieve the data.

Index and Statistics

One of the most common and biggest performance problems in SQL Server is poor or missing indexes and outdated statistics. When a query executes, the query optimizer searches for useful indexes; the absence of a proper index can result in blocking, deadlocks and a high amount of stress on disk, memory and CPU, because SQL Server has to process far more data while executing the query.
Coming to statistics: statistics are information about the distribution of data in columns or indexes. The query optimizer relies on them to generate an efficient plan to retrieve or update the data, and good, up-to-date statistics help it create a high-quality plan. By default, SQL Server automatically creates and updates statistics. Outdated statistics give the optimizer inaccurate information about the data distribution, such as the number of rows a predicate will return, which leads to inefficient query plans.
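
A quick way to check how fresh the statistics on a table are, and to refresh them, is sketched below (dbo.Sales is just a placeholder table name; FULLSCAN can be expensive on large tables):

-- When were the statistics on this table last updated?
SELECT  s.name AS StatName,
        STATS_DATE(s.object_id, s.stats_id) AS LastUpdated
FROM    sys.stats AS s
WHERE   s.object_id = OBJECT_ID('dbo.Sales');

-- Refresh all statistics on the table with a full scan
UPDATE STATISTICS dbo.Sales WITH FULLSCAN;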

Excessive Fragmentation

SQL Server maintains indexes automatically: every INSERT, UPDATE or DELETE on the underlying data can cause page splits and fragmentation. There are two types of fragmentation. External fragmentation means the leaf level of the index is no longer stored in logical order; internal fragmentation means the index pages are not being used to their full capacity. An index property called fill factor decides how much free space is left on index pages. Severe internal fragmentation increases the size of the index and causes additional reads to be performed to return the needed data, and these extra reads degrade query performance. You can overcome fragmentation by rebuilding or reorganizing the index.
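
A minimal sketch of how to measure and fix fragmentation (dbo.Sales is again a placeholder table name; the 5%/30% reorganize/rebuild thresholds are the commonly quoted rule of thumb, not a hard rule):

-- Check fragmentation for every index on the table
SELECT  i.name AS IndexName,
        ps.avg_fragmentation_in_percent
FROM    sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('dbo.Sales'), NULL, NULL, 'LIMITED') AS ps
        INNER JOIN sys.indexes AS i
            ON i.object_id = ps.object_id AND i.index_id = ps.index_id;

-- Light fragmentation: reorganize; heavy fragmentation: rebuild
ALTER INDEX ALL ON dbo.Sales REORGANIZE;
-- ALTER INDEX ALL ON dbo.Sales REBUILD WITH (FILLFACTOR = 90);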

Blocks and Dead Locks

When data is modified inside a transaction, SQL Server isolates the change from other concurrent transactions. This prevents those transactions from seeing intermediate states of the data, i.e. a transaction sees the data either in its before state or in its after state. This is the default behaviour of SQL Server and is implemented with a mechanism called locks, and it is because of this isolation that blocking occurs in a database.
A deadlock occurs when two sessions attempt to escalate or expand their locked resources and end up conflicting with one another. The database engine determines which process is the least costly to roll back and chooses it as the deadlock victim.
The execution time of a query is therefore adversely affected by the amount of blocking and the number of deadlocks it runs into.
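
To see blocking as it happens, one simple check (a sketch, nothing more) is to look for requests whose blocking_session_id is set:

-- Sessions that are currently blocked and who is blocking them
SELECT  r.session_id,
        r.blocking_session_id,
        r.wait_type,
        r.wait_time,
        t.text AS blocked_statement
FROM    sys.dm_exec_requests AS r
        CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE   r.blocking_session_id <> 0;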

Poor Query Design

The effectiveness of an index depends on how you write your queries. Applying a function to a column in the WHERE clause, e.g. YEAR(SaleDate) = 2010, results in an index scan rather than an index seek. Another common problem is retrieving more data from the table than is actually required. Make sure your queries are tuned so that they can use the indexes effectively.
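
The YEAR(SaleDate) example above can be rewritten so that an index on SaleDate becomes usable for a seek (dbo.Sales and SaleID are assumed names for the sake of the sketch):

-- Non-sargable: the function on the column forces a scan
SELECT SaleID FROM dbo.Sales WHERE YEAR(SaleDate) = 2010;

-- Sargable equivalent: the same filter expressed as a range
SELECT SaleID
FROM   dbo.Sales
WHERE  SaleDate >= '20100101'
AND    SaleDate <  '20110101';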

T-SQL is a set-based language, meaning it operates on sets of data rather than on individual rows, and it performs best when it works with sets. A non-set-based approach relies on cursors, loops and so on instead of the more efficient joins and subqueries, and it kills performance. If you are forced to use a cursor, be sure to use an efficient cursor type such as fast-forward-only.
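
When a cursor really is unavoidable, declaring it as FAST_FORWARD keeps its overhead down; a small sketch with made-up table and column names:

DECLARE @EmpName NVARCHAR(30);

-- Forward-only, read-only cursor
DECLARE curEmployee CURSOR FAST_FORWARD
FOR SELECT EmpName FROM dbo.Employee;

OPEN curEmployee;
FETCH NEXT FROM curEmployee INTO @EmpName;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- per-row work goes here
    FETCH NEXT FROM curEmployee INTO @EmpName;
END

CLOSE curEmployee;
DEALLOCATE curEmployee;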

Poor Execution Plans and Recompilation

When a query comes in for execution, the optimizer checks whether a plan is already available for it; if not, it generates one, and the plan is usually cached in memory for future use. Creating a query plan costs a fair amount of CPU. If a query is written in a form whose plan cannot accept variables, SQL Server creates a new query plan every time the query is resubmitted with a different value. This eats up CPU cycles, so it is important to submit your queries in a form that SQL Server can cache and reuse.

One of the standard ways of ensuring a reusable execution plan, independent of the variable values used in a query, is to use a stored procedure. Executing a set of SQL statements through a stored procedure allows SQL Server to create a parameterized execution plan.
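
A minimal sketch (the table and column names are assumptions for illustration): wrapping the query in a stored procedure lets SQL Server build one parameterized plan and reuse it for every parameter value.

CREATE PROCEDURE dbo.GetEmployeesByDept
    @DeptID TINYINT
AS
BEGIN
    SET NOCOUNT ON;
    SELECT ID, EmpName
    FROM   dbo.Employee
    WHERE  DeptID = @DeptID;
END
GO

-- Both calls reuse the same cached plan
EXEC dbo.GetEmployeesByDept @DeptID = 3;
EXEC dbo.GetEmployeesByDept @DeptID = 7;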




Thursday, March 24, 2011

Number of Reads and Writes against Tables


Is it possible to find the number of reads and writes performed against each table in a database? We can rely on the sys.dm_db_index_usage_stats DMV to achieve this, at least partially. The columns user_lookups, user_scans, user_seeks and user_updates are counters that are incremented for each individual seek, scan or update on a specific index of a table when a query is executed.
Here is the query:

SELECT  S.name,
        SUM(v.user_lookups + v.user_scans + v.user_seeks) AS Reads,
        SUM(v.user_updates) AS Updates
FROM    sys.dm_db_index_usage_stats v
        INNER JOIN sys.objects S ON S.object_id = v.object_id
WHERE   S.type = 'U'
AND     v.database_id = DB_ID()
GROUP BY S.name

The Reads column shows the number of times SQL Server accessed the indexes on that table, and the Updates column shows the number of times SQL Server performed an update/write operation against the indexes of that table.

For example – I have a table, Employee

CREATE TABLE Employee
       (
         ID INT PRIMARY KEY,
         EmpName NVARCHAR(30),
         DeptID TINYINT
       )


CREATE INDEX IX_NCL_Employee_EmpName
ON Employee(EmpName)


CREATE INDEX IX_NCL_Employee_DeptID
ON Employee(DeptID)


Our Employee table has three indexes: one clustered index on the ID column and two non-clustered indexes, on EmpName and DeptID. For every insert against the Employee table, SQL Server also has to update the underlying indexes, in our case all three of them. If we update the ID column, SQL Server again has to update all three indexes, because when you update a clustered index key all the non-clustered indexes are updated as well. What if we update the DeptID column? This time SQL Server has to update only two indexes: the index on DeptID and the clustered index.
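
A quick way to watch this behaviour (a sketch; a counter row only appears once an index has actually been touched) is to modify DeptID and then look at the per-index counters: the clustered index and IX_NCL_Employee_DeptID should show an update, while IX_NCL_Employee_EmpName should not.

INSERT INTO Employee (ID, EmpName, DeptID) VALUES (1, N'John', 2);

UPDATE Employee SET DeptID = 5 WHERE ID = 1;

-- Per-index counters for the Employee table
SELECT  i.name AS IndexName,
        v.user_seeks, v.user_scans, v.user_lookups, v.user_updates
FROM    sys.dm_db_index_usage_stats AS v
        INNER JOIN sys.indexes AS i
            ON i.object_id = v.object_id AND i.index_id = v.index_id
WHERE   v.database_id = DB_ID()
AND     v.object_id = OBJECT_ID('Employee');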

Well, there are some limitations with this query. The major one is that the counters are reset when SQL Server restarts. Another is that when we drop an index on the table, we lose the read/update information recorded against that index. So this method is a temporary solution only!
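
If you need to know how far back the counters go, the last service start time can be checked like this (sys.dm_os_sys_info exposes sqlserver_start_time on SQL Server 2008 and later):

SELECT sqlserver_start_time
FROM   sys.dm_os_sys_info;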

Tuesday, September 28, 2010

Missing Index Feature - Limitation

The missing indexes feature uses dynamic management objects and Showplan output to provide information about missing indexes that could enhance SQL Server query performance. It suggests index candidates based on the execution plans of the queries being run against the database, using data generated from the execution plans stored in the plan cache.

When the query optimizer generates a query plan, it analyzes what the best indexes would be for a particular filter condition. If those indexes do not exist, the optimizer generates a suboptimal query plan but still stores information about them. The missing indexes feature enables you to access this information so you can decide whether the indexes should be implemented. You can use the sys.dm_db_missing_index_details DMV to view the details of missing indexes, or examine the graphical or XML Showplan of a query to find the missing index details.

In this article we will look at the limitations of the Missing Index feature. They are the following.

It may recommend a non-clustered index that already exists, over and over again.

Sometimes the Missing Index feature recommends that you create a non-clustered index that already exists. This is actually a bug in the Missing Index feature (View details).

Set up the work table


CREATE  TABLE MissingIndex
(
 EmpID INT IDENTITY,
 RefID INT,
 LastRefID INT,
 Remarks CHAR(4000)
)
GO

-- Insert test data
INSERT INTO MissingIndex (RefID,LastRefID,Remarks)
SELECT RAND()*10000,RAND()*10000,'NA'

GO 10000

SELECT EmpID
FROM   MissingIndex
WHERE  RefID BETWEEN 10 AND 1000
AND LastRefID > 6000 ;


The execution plan of the query above will guide you to create the missing index.

Let’s create those missing indexes and examine the Execution plan again for the same query.


CREATE INDEX IDX_NCL_tMissingIndex_cRefID_cLastRefID
ON MissingIndex(RefID,LastRefID)
INCLUDE(EmpID)



Estimated execution plan after creating the missing index suggested by the optimizer.

Good to see an index seek operation, and the index I created is being used. Now let's see how this index works for a slightly more complicated query.

Check the estimated execution plan for the following query


DECLARE @EmpID INT

DECLARE curMissingIndex CURSOR
FOR SELECT  EmpID
FROM    MissingIndex
WHERE   RefID BETWEEN 10 AND 1000
AND LastRefID > 6000


OPEN curMissingIndex
FETCH NEXT FROM curMissingIndex INTO @EmpID;

WHILE  @@fetch_status = 0
BEGIN
    FETCH NEXT FROM curMissingIndex INTO @EmpID;
END

CLOSE curMissingIndex ;
DEALLOCATE curMissingIndex ;




This time, too, the optimizer suggests a missing index. Let's see what the missing index is.

/*
CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [dbo].[MissingIndex] ([RefID],[LastRefID])
INCLUDE ([EmpID])
GO
*/
Ah! Shocking? The missing index is actually exactly the same as the one I created a little earlier. The feature will continue to recommend the same index again and again, even if you follow its suggestion. Even if you examine the sys.dm_db_missing_index_details DMV you can find an entry for it there.

It is not intended to fine tune an indexing configuration.

The missing indexes feature is a lightweight tool for finding missing indexes that might significantly improve query performance. It does not provide enough information to fine-tune your indexing configuration, and it will not suggest partitioning, indexed views and so on. Use the Database Engine Tuning Advisor for that purpose; it offers a much wider range of recommendations.

It cannot gather statistics for more than 500 missing index groups

Once a maximum of 500 missing index groups has been captured, no more missing index group data is gathered. This threshold is not a tunable parameter and cannot be changed.
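
If you want to see how close an instance is to that ceiling, a simple count of the captured groups (a rough check, not an official metric) can be taken from the DMVs:

-- Number of missing index groups currently captured (the cap is 500)
SELECT COUNT(*) AS MissingIndexGroups
FROM   sys.dm_db_missing_index_groups;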

It does not specify an order for columns to be used in an index.

The missing indexes feature does not consider the optimum sort order when suggesting an index. Consider an example:


SELECT EmpID
FROM   MissingIndex
WHERE  RefID BETWEEN 10 AND 1000
AND LastRefID > 6000
ORDER BY LastRefID, RefID

For the above query the optimizer will suggest a missing index on the columns RefID and LastRefID with EmpID in the INCLUDE part, but the reality is that if you create the index with the column order LastRefID, RefID you can also take advantage of the sort order in the ORDER BY.
If I create the missing index as suggested,



CREATE INDEX IDX_NCL_tMissingIndex_cRefID_cLastRefID
ON MissingIndex(RefID,LastRefID)
INCLUDE(EmpID)

the cost of the query is 1.34561.

Instead of creating the index "IDX_NCL_tMissingIndex_cRefID_cLastRefID" with that column order, if I create it as follows,


DROP INDEX IDX_NCL_tMissingIndex_cRefID_cLastRefID
ON MissingIndex

CREATE INDEX IDX_NCL_tMissingIndex_cRefID_cLastRefID
ON MissingIndex(LastRefID, RefID)
INCLUDE(EmpID)

then the cost of the query comes down to 0.728691!

For queries involving only inequality predicates, it returns less accurate cost information

The missing index feature uses a simplistic model to generate cost information for queries that involve only inequality predicates. Consequently, the cost information returned for these queries may be less accurate than the information returned for queries that involve equality predicates.

It reports only include columns for some queries, so index key columns must be manually selected

Sometimes the feature only reports column usage information for include columns. For example, it might report equality columns = { }, inequality columns = { }, and include columns = {b, c, …}. In this case, one of the include columns must be selected as the index key column.
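
A minimal, hypothetical sketch of that manual step (the table dbo.T and columns b and c are made-up names): one of the reported include columns is promoted to the key position by hand.

-- Hypothetical: the feature reported include columns = {b, c} for dbo.T
-- and no key columns, so we choose b as the key manually.
CREATE NONCLUSTERED INDEX IX_T_b_incl_c
ON dbo.T (b)
INCLUDE (c);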

It returns only raw information about columns on which indexes might be missing.

The missing indexes feature returns raw information about columns on which indexes might be missing. The information returned might require additional processing before we can use the information to create an index.

For example, if the column involved is a computed column, and the computation is nondeterministic or imprecise, the CREATE INDEX statement on that column will fail. In the case of an imprecise computation, the column should be persisted first, and then you can create the index.
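
As a hedged illustration of that persisted-column case (the table and column names below are invented for the example, not taken from the article's scripts): a FLOAT-based computed column is imprecise, so it must be marked PERSISTED before an index can be created on it.

-- Hypothetical table: Total is deterministic but imprecise (FLOAT arithmetic),
-- so it can only be indexed because it is PERSISTED.
CREATE TABLE dbo.DemoComputed
(
    ID    INT PRIMARY KEY,
    Price FLOAT,
    Qty   FLOAT,
    Total AS (Price * Qty) PERSISTED
);

CREATE INDEX IX_DemoComputed_Total ON dbo.DemoComputed (Total);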

Sometimes missing index groups appear multiple times in the Showplan output.

Let’s see an example,

Set up the sample tables



-- Create Customer Table
CREATE TABLE Customer
(CustomerID INT IDENTITY,
 FirstName VARCHAR(10),
 MiddleName VARCHAR(10),
 LastName VARCHAR(10),
 Remarks CHAR(2000)
 )

GO
-- Insert 1000 Customers
INSERT INTO Customer(FirstName, MiddleName, LastName, Remarks)
SELECT CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))
,CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))
,CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0)),'NA'
GO 1000

-- Contact information
CREATE TABLE CustomerContactInfo
(CustomerID INT,
 Address1 CHAR(200),
 Address2 CHAR(200),
 CityID INT,
 Phone VARCHAR(10)
 )
GO


INSERT INTO CustomerContactInfo
SELECT ROUND((999 * RAND() + 1), 0),
         CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))
         ,CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))+CHAR(ROUND((26 * RAND() + 65), 0))
         ,ROUND(RAND()*100,0)
         ,CONVERT(VARCHAR,ROUND((9 * RAND() + 1), 0))+CONVERT(VARCHAR,ROUND((9 * RAND() + 1), 0))+CONVERT(VARCHAR,ROUND((9 * RAND() + 1), 0))
         +CONVERT(VARCHAR,ROUND((9 * RAND() + 1), 0))+CONVERT(VARCHAR,ROUND((9 * RAND() + 1), 0))+CONVERT(VARCHAR,ROUND((9 * RAND() + 1), 0))
       
        
GO 3000

-- Sales Information
CREATE  TABLE SalesSummary
(SalesID INT IDENTITY,
 CustomerID INT,
 UnitPrice MONEY,
 Tax MONEY,
 Discount MONEY,
 TotalPrice AS UnitPrice+Tax+Discount,
 Remark CHAR(2000))

GO

INSERT INTO SalesSummary (CustomerID,UnitPrice, Tax, Discount, Remark)
SELECT ROUND((999 * RAND() + 1), 0),ROUND(RAND()*10000,0),ROUND(RAND()*100,0),ROUND(RAND()*1000,0), 'NA'
GO 10000

Execute the following query. Since our tables don't have any useful index to satisfy it, the optimizer will place an entry in the missing index details.

-- Execute the query
SELECT C.CustomerID,FirstName,MiddleName,LastName,Address1,Address2,CityID,Phone,SalesID,TotalPrice,S.Remark
FROM Customer C
INNER JOIN dbo.SalesSummary S
ON C.CustomerID = S.CustomerID
INNER JOIN CustomerContactInfo CC
ON CC.CustomerID = C.CustomerID
WHERE S.CustomerID = 223

GO

Now, examine the missing index details by running the following query


-- Check the missing index details
SELECT OBJECT_NAME(Object_id),equality_columns,inequality_columns,included_columns
FROM sys.dm_db_missing_index_groups g
INNER JOIN sys.dm_db_missing_index_details d
ON g.index_handle = d.index_handle
ORDER BY object_id


You can see that the optimizer came up with a suggestion to create two indexes on the SalesSummary table, but the index key column is the same, CustomerID; the only difference is the included columns. This occurs when different parts of a single query benefit differently from the same missing index group.
Keep these limitations in mind while working on query performance tuning!