Tipping point

YOU NEED ADVENTUREWORKS TO RUN THIS EXAMPLE:

When working with execution plans, I have often wondered why a query can end up with two different execution plans depending on your WHERE clause.

Let's look at an example.

— CODE START —
select ProductID, OrderQty
from sales.SalesOrderDetail
where ProductID between 776 and 777
— CODE END —

That returns the following execution plan:

Basically that means: I seek in a nonclustered index to find the rows, and then (one by one) I look up in the clustered index to find the data.

If I write

— CODE START —
select ProductID, OrderQty
from sales.SalesOrderDetail
where ProductID between 776 and 778
— CODE END —

then I get this execution plan:

Basically that means: I scan all records in the clustered index to find the rows that I want.

This is called the tipping point: the point where SQL Server decides that it is more efficient to scan all the data instead of using a nonclustered index. What happens when I use the index is this: at the leaf level of the nonclustered index I have the clustered index key, so when I have found the rows in the nonclustered index that fit my WHERE clause, I must go to the clustered index to fetch the columns I want to show that are not in the nonclustered index (in this case OrderQty).

At some point the query optimizer decides to forget about the nonclustered index, because it estimates that it is more efficient to scan the clustered index when the range in the WHERE clause gets too wide.

That point is called “Tipping Point”.

Does it work?

If we continue with another example

— CODE START —

select ProductID, OrderQty
from sales.SalesOrderDetail
where ProductID between 776 and 800

— CODE END —

In this example the execution of the query took 1.21229 SQL Units (the sum of the estimated I/O cost and CPU cost). If I had forced it to use the nonclustered index:

— CODE START —

select ProductID, OrderQty
from sales.SalesOrderDetail with (Index(IX_SalesOrderDetail_ProductID))
where ProductID between 776 and 800

— CODE END —

it would have taken 7.01266 SQL Units. So yes… SQL Server is in most situations quite clever when it comes to choosing between a scan and using an index. Look at the plans below, and you will see that the one the query optimizer chooses itself is much more efficient than the one I chose by adding the table hint.

That was the tipping point. A nice little follow-up question is: could I have made it even faster?

Yes, I could have used a covering index, meaning that I add OrderQty to the leaf level of the nonclustered index. If I do this, the query only takes 0.04 SQL Units!! YES, THAT IS CORRECT… 0.04!!!! The cost is extra space and extra time to insert/update rows.
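A covering index can be created with INCLUDE, which carries the extra column at the leaf level of the nonclustered index. Here is a minimal sketch; the index name is my own, and remember to drop it again if you want to rerun the examples above:

— CODE START —
-- Hypothetical covering index: ProductID is the key,
-- and OrderQty is stored at the leaf level via INCLUDE
CREATE NONCLUSTERED INDEX IX_SalesOrderDetail_ProductID_OrderQty
ON Sales.SalesOrderDetail (ProductID)
INCLUDE (OrderQty)
GO

-- The query can now be answered from the nonclustered index alone,
-- with no lookups into the clustered index
select ProductID, OrderQty
from Sales.SalesOrderDetail
where ProductID between 776 and 800
GO

-- Clean up again
DROP INDEX IX_SalesOrderDetail_ProductID_OrderQty ON Sales.SalesOrderDetail
— CODE END —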

But that is a completely different story.

Søren

A short introduction to Full Text Search

YOU NEED THE ADVENTUREWORKS DATABASE FOR THIS BLOG!!!!

I think that everybody who works with SQL Server hates it when users want to make advanced searches, where the WHERE clause has to be built in such a way that it is impossible to get good performance or to understand what is happening.

With Full Text Search, Microsoft has a unique tool for optimizing searches, e.g. for websites or internal systems that require a fast response and must be flexible.

Imagine that you want to search a product table for Bike in the name field and the description field. You would then write something like:

— CODE START —
Select *
from producttable
where name like '%Bike%' or description like '%Bike%'
— CODE END —

First of all, that would perform badly, since a wildcard at the beginning of the search term means no index can be used, and the second problem is that you can't do an intelligent search that also returns rows where it does not say Bike but Racer, Biking and so on.

Full Text Search is the answer for that.

To do a full text search you need to set up a catalog and then an index, and since this is a short introduction we will do it with the least amount of effort and without talking about all the properties:

— CODE START —

USE [AdventureWorks]
GO
CREATE FULLTEXT CATALOG [MainCatalog] WITH ACCENT_SENSITIVITY = ON
AS DEFAULT

-- Then we create the full-text index

USE [AdventureWorks]
GO
CREATE FULLTEXT INDEX ON [Production].[Product] KEY INDEX [PK_Product_ProductID] ON ([MainCatalog]) WITH (CHANGE_TRACKING AUTO)
GO
USE [AdventureWorks]
GO
ALTER FULLTEXT INDEX ON [Production].[Product] ADD ([Color])
GO
USE [AdventureWorks]
GO
ALTER FULLTEXT INDEX ON [Production].[Product] ADD ([Name])
GO
USE [AdventureWorks]
GO
ALTER FULLTEXT INDEX ON [Production].[Product] ENABLE
GO
— CODE END —

Now you first have a full text catalog, and within that catalog you have a full text index. The index says what table you want to add a full text index to and what columns should be included.

An easy way to do that is to go to Management Studio, choose your database -> Storage, and right-click "Full Text Catalogs" to add a new catalog. When you are finished, click the catalog and choose Properties.
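If you prefer T-SQL over the GUI, you can also inspect what you just created through the catalog views (a quick sketch):

— CODE START —
-- Which full text catalogs exist in this database?
select name, is_default, is_accent_sensitivity_on
from sys.fulltext_catalogs

-- Which columns are covered by a full-text index?
select object_name(fic.object_id) as table_name, c.name as column_name
from sys.fulltext_index_columns fic
    inner join sys.columns c on c.object_id = fic.object_id
                            and c.column_id = fic.column_id
— CODE END —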

Now we can query the table and get incredible performance and flexibility.

You query by using four different keywords, and we will look at them now:

  1. Freetext
  2. Contains
  3. FreetextTable
  4. ContainsTable

The simplest ones are Freetext and Contains. The syntax is:

— CODE START —
Select *
from Production.Product
where freetext(*, 'Black')

Select *
from Production.Product
where Contains(*, 'Black')
— CODE END —

The star means that it should look in all columns that are in the index, in this case Name and Color. I could also write a single column name or a list of columns instead.

'Black' is of course the search expression.
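A small side note: the column list variant needs an extra set of parentheses. A sketch of both forms (Name and Color are the two columns we added to the index):

— CODE START —
-- Search a single column
Select *
from Production.Product
where freetext(Name, 'Black')

-- Search a list of columns; note the inner parentheses
Select *
from Production.Product
where freetext((Name, Color), 'Black')
— CODE END —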

With Freetext it is simple: you write a word or a short phrase. What Freetext then does is:

  1. Break the text into words according to language-specific rules (word breaking), so that it does not only look for the whole sentence but is also capable of finding the individual words.
  2. Stemming, meaning that it also looks for other forms of the word, e.g. start = started = starting.
  3. Look up in the thesaurus, a file situated in the SQL Server folder FTDATA (there is one for each language). This file you can modify, and it tells Full Text Search that if you search for e.g. NT, then it should also return rows with Windows, operating system and so on.

Contains lets you do more advanced searches, e.g.

— CODE START —
Select *
from Production.Product
where Contains(Name, 'Headset NEAR Bearings')
— CODE END —

Try to do that with a normal SQL Statement.

If I want Contains to find all stemmed forms of a word (start, started, etc.), then I can write:

— CODE START —
Select *
from Production.Product
where Contains(Name, 'FORMSOF(Inflectional, Bike)')
— CODE END —

If I want to test it, I could do the following:

— CODE START —
-- We reset Product 876
update Production.Product
set name = 'Bike'
where productID = 876

-- See product 876
Select *
from Production.Product
where Contains(Name, 'FORMSOF(Inflectional, Bike)')

-- Now we change it to Biking
update Production.Product
set name = 'Biking'
where productID = 876

-- Product 876 is gone
Select *
from Production.Product
where Contains(Name, 'Bike')

-- But with Inflectional it is here again
Select *
from Production.Product
where Contains(Name, 'FORMSOF(Inflectional, Bike)')
— CODE END —

NICE! The difference between Inflectional and writing Thesaurus is that the latter looks up in the thesaurus file, and that one you can control yourself.
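The same FORMSOF syntax works with Thesaurus. A small sketch; what extra rows it returns depends entirely on what you have put into your thesaurus file, so out of the box it may behave just like a plain search:

— CODE START —
-- Thesaurus-based lookup; the expansions come from the
-- thesaurus file for the language of the full-text index
Select *
from Production.Product
where Contains(Name, 'FORMSOF(Thesaurus, Bike)')
— CODE END —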

Remember: if you change the thesaurus file, you must run:

— CODE START —
exec sys.sp_fulltext_load_thesaurus_file 1033;
— CODE END —

1033 is the locale ID (the language code for US English).
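If you do not know the locale ID for a language, you can look it up in a catalog view (a quick sketch):

— CODE START —
-- List all languages known to Full Text Search and their locale IDs
select lcid, name
from sys.fulltext_languages
order by name
— CODE END —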

If you want to test what a search phrase would return, you can use the following dynamic management function:

— CODE START —
Select *
from sys.dm_fts_parser('FORMSOF(inflectional, Yellow)', 1033, 0, 0)

Select *
from sys.dm_fts_parser('FORMSOF(inflectional, blau)', 1031, 0, 0)

Select *
from sys.dm_fts_parser('FORMSOF(thesaurus, NT5)', 1033, 0, 0)

Select *
from sys.dm_fts_parser('FORMSOF(thesaurus, BIKE)', 1033, 0, 0)
— CODE END —
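I have not shown FreetextTable and ContainsTable from the list above. They are the table-valued versions of the two predicates: instead of filtering in the WHERE clause, you get a table back with a KEY column (the full-text key, here ProductID) and a RANK column, so you can join back to the table and sort by relevance. A small sketch (the alias names are my own):

— CODE START —
-- ContainsTable returns KEY and RANK; join KEY back to the
-- base table and sort by RANK to get the best matches first
Select p.ProductID, p.Name, ft.[RANK]
from ContainsTable(Production.Product, Name, 'Bike') ft
    inner join Production.Product p on p.ProductID = ft.[KEY]
order by ft.[RANK] desc
— CODE END —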

One problem with Full Text Search is stopwords. If I live in Denmark, I would love to return all rows where the word THE appears, because in Danish it means tea. But if I am in an English-speaking country and a user searches for The Company, then I would not like to return all rows where THE appears. So I must add a stopword, and a stopword is a word that is not included in the search.

To do that I must:

  1. Create a stoplist
  2. Add stopwords
  3. Attach the stoplist to an index

— CODE START —

select *
from Production.Product
where contains(*, 'Headset')

-- Then let's create a stoplist
USE [AdventureWorks]
GO
CREATE FULLTEXT STOPLIST [SorensStop]
FROM SYSTEM STOPLIST
;

GO
-- Then we add a stopword
USE [AdventureWorks]
GO
ALTER FULLTEXT STOPLIST [SorensStop] ADD 'Headset' LANGUAGE 'English';
GO

-- Attach the stoplist to the full-text index
alter fulltext index on Production.Product
set stoplist=SorensStop -- YOUR FULLTEXT INDEX

-- Let's try again
select *
from Production.Product
where contains(*, 'Headset')
— CODE END —

Nice and easy 🙂

Søren

Why Autoshrink is so bad

One of the interesting things about Microsoft SQL Server 2008 is the possibility to change a lot of properties at server level and database level, so that you can configure the server the way you want it.

One of these properties is the autoshrink property, and it can be found when you right-click a database and choose Options.

As you can see, it is the third option on the property page, and it is set to False. If I want autoshrink, I can set it to True.

Setting it to true can start a cycle where your database shrinks, then grows, then shrinks again, and so on, and we don't want that.
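By the way, you do not need the GUI for this. A minimal sketch of checking and turning the option off with T-SQL (put your own database name in):

— CODE START —
-- See which databases have autoshrink turned on
select name, is_auto_shrink_on
from sys.databases

-- Turn it off for a given database
ALTER DATABASE [AdventureWorks] SET AUTO_SHRINK OFF
— CODE END —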

But what is the biggest problem when it comes to autoshrink? It is that it generates fragmentation, and that can degrade performance dramatically.

Why? Let's look at an example and let me explain. Imagine that a database has two tables, T1 and T2, and they both require 7 data pages to store their data. When we have created them, it looks like this:

You read the data file from left to right, and that means if I want to scan table T2, then I start at T2-1 and end at T2-7. It will be easy for the SQL Server to read it (please forget all about extents, heaps, indexes etc. This is just to get an understanding of what is happening). So as it is now, I first read T2-1, then T2-2, and I end at T2-7.

But now I want to delete table T1, so I drop it, and the data pages for T1 are released. Now my data file looks like this:

I have 7 empty slots, and now we shrink the database. What would be nice was if the shrink ended up like this:

BUT it does not. It ends up like this:

It turns the table around, because the SQL Server is so clever (that was irony) that it starts from the end of the data file with the last page of T2 and puts it in the first free slot, then it takes the second-to-last page (T2-6) and moves that into the second free slot, and so on.

That causes a dramatic increase in the fragmentation of the table.

Let's see that. First we set up the database:

— CODE START —

use master
go

if DATABASEPROPERTYEX('TestAutoShrink', 'Version') > 0
    drop database TestAutoShrink
go

-- REMEMBER TO ADD A FILE LOCATION THAT IS VALID ON YOUR COMPUTER!!!!!

CREATE DATABASE [TestAutoShrink] ON PRIMARY
( NAME = N'TestAutoShrink', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\TestAutoShrink.mdf' , SIZE = 262144KB , FILEGROWTH = 1024KB )
 LOG ON
( NAME = N'TestAutoShrink_log', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL10_50.MSSQLSERVER\MSSQL\DATA\TestAutoShrink_log.ldf' , SIZE = 262144KB , FILEGROWTH = 10%)
go

use TestAutoshrink
go

— CODE END —

Then we add two tables

— CODE START —

use TestAutoshrink
go

Create table Table1 (ID      int identity primary key,
                     Content varchar(200)
                    )

Create table Table2 (ID      int identity primary key,
                     Content varchar(200)
                    )
go

insert into Table1 Values('THIS IS A DEMONSTRATION OF AUTOSHRINK')
go 10000

insert into Table2 Values('THIS IS A DEMONSTRATION OF AUTOSHRINK')
go 10000

— CODE END —

Note the GO 10000. It means that the batch runs 10,000 times. It is a nice and easy trick that Paul Randal showed me at the Immersion class in Seattle.

Let's see some statistics. First we want to see the fragmentation and how many pages the tables use.

— CODE START —

select t1.object_id, t2.name, avg_fragmentation_in_percent as fragmentation,
       page_count
from sys.dm_db_index_physical_stats(DB_ID(), null, null, null, 'DETAILED') t1
    inner join sys.objects t2 on t1.object_id = t2.object_id
where index_level = 0

— CODE END —

So, what I can see is a fragmentation of approx. 5% for each table's leaf level (you may get another figure, but it is at least a low figure), and I can see that each table uses 68 pages.

Now, if you want, you can see more info about these pages by running DBCC IND. If I want to see DBCC IND for Table1, I can do it like this:

— CODE START —

dbcc ind ('testautoshrink', 'Table1', -1)

— CODE END —

The -1 can be replaced with e.g. -2 or the number of the index that you want to see.

What you can see is that it returns 70 rows, and the reason is that there is also an Index Allocation Map page (IAM page) and an index page:

IAM pages have a PageType of 10, data pages a PageType of 1, and index pages a PageType of 2.
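If you want to look inside one of the pages, you can use the undocumented DBCC PAGE command. Just a sketch; 153 is a made-up page ID, so pick a real PagePID from your own DBCC IND output:

— CODE START —
-- Send DBCC PAGE output to the client instead of the error log
dbcc traceon(3604)

-- dbcc page (database, file ID, page ID, print option)
-- print option 3 shows the page header plus row details
dbcc page ('testautoshrink', 1, 153, 3)
— CODE END —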

NOW. Let's delete a table.

— CODE START —

Drop table Table1

— CODE END —

If you try to run DBCC IND again, then you will get an error.

OK. Now let's do the shrink:

— CODE START —

USE [TestAutoShrink]
GO
DBCC SHRINKFILE (N'TestAutoShrink' , 2)
GO

— CODE END —

And now for the interesting part: what happened to our data file and our table?

The data file is easy. It has been reduced in size and is now 2 MB. But if we run the code from before and look at the figures for Table2, we will end up with something very interesting.

— CODE START —
select t1.object_id, t2.name, avg_fragmentation_in_percent as fragmentation,
       page_count
from sys.dm_db_index_physical_stats(DB_ID(), null, null, null, 'DETAILED') t1
    inner join sys.objects t2 on t1.object_id = t2.object_id
where index_level = 0
— CODE END —

Let’s see the output

WOW!!! What happened to the fragmentation? It increased dramatically, and now we need to do a rebuild to reduce it again.
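Such a rebuild could look like the sketch below, which rebuilds all indexes on Table2:

— CODE START —
-- Rebuild all indexes on Table2 to remove the fragmentation
-- that the shrink just created
alter index all on Table2 rebuild
— CODE END —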

To those of you who do not know about fragmentation, here it is in short: IT IS BAD!!! Of course you can find situations where you can argue that it is not bad, but in general… IT IS BAD!!!

CONCLUSION: Unless you have a completely different data flow in your database (e.g. if you drop a lot of data), you should not shrink a database, and of course you should not autoshrink either!
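If you really must hand space back to the operating system, one option that at least does not shuffle pages around is the TRUNCATEONLY variant, which only releases the free space at the end of the file:

— CODE START —
-- TRUNCATEONLY releases unused space at the end of the file
-- without moving any pages, so it creates no new fragmentation
DBCC SHRINKFILE (N'TestAutoShrink', TRUNCATEONLY)
— CODE END —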

Søren

Course at SQL Skills in Seattle – Day 3-4

Two more days of the course have gone by, and this will be a very short post, because it is tough… very tough. Long days and then work/practice after the course.

But anyway, it is worth all the time.

Wednesday was about partitioning, and you would think that partitioning is just something enterprise businesses are using. You would be wrong.

Partitioning can be done in a lot of different ways, and it can improve performance dramatically.

BUT the best part until now has been the statistics. I have never before thought about how statistics work in Microsoft SQL Server 2008, but now I know it in depth.

I will post my last blog entry tomorrow, and then I will head back home to Denmark.

Søren

Course at SQL Skills in Seattle – Day 2

Log files… What the f… They are just something that is there so that the SQL Server will not fail???? Or not.

Once again Paul Randal gave a good lecture about how Microsoft SQL Server 2008 works when it comes to log files. Everything clears up when you get a good and deep understanding, and now at least I have been introduced to more tools to investigate and understand the log file.

Kimberly L. Tripp also started to teach, and damn… Where does she get the energy from? I don't know, but I know that she has PLENTY of energy and drive. It seems like she knows everything about Microsoft SQL Server, and she is really positive when it comes to answering questions.

Søren

Course at SQL Skills in Seattle – Day 1

Finally the day arrived when the course would start, and let me say it like this: WOW… This is simply the best course I have ever attended. Today our primary focus was the data store and physical structures, and to be honest, I thought I knew all there was to know about these topics, but now I know that I don't.

On the other hand, I am also positively surprised that I knew a lot, so at the end of the day I feel enlightened but also confident.

I will post a small entry a little bit later today about one of the funny little details: autoshrink. I have always told my customers that autoshrink is bad, but now I know that it is and I know why. I never thought about that before, but Paul Randal gave us a fantastic insight into what happens when you turn on autoshrink.

But anyway, my taste for SQL Server has increased, and I am really happy that I decided to participate in this Immerse Yourself event from SQLSkills.

Course at SQL Skills in Seattle – Day -1

Sunday… And one day before the start of the course. I am really looking forward to meeting all the other participants and the guys/girls from SQL Skills.

I have been out in the city today, and actually it is not Seattle itself I am staying in, but a suburb called Bellevue. Well… Suburb is perhaps not the right word, since more than 100,000 people live there.

I went to a market today, and it was the annual art market. It was fantastic, but the most exciting thing was that I found a Microsoft Store. Would you believe that?

Well, it was amazing, and 100 times better than all the app stores I have been visiting.

So… Tomorrow we start…