Course at SQLskills in Seattle – Day -2

This is something I have been looking forward to for the last couple of months… Meeting the guys (and actually also girls) who really know how SQL Server works and who can teach me new stuff. I am going to participate in the Performance and Internals Immersion from August 1st 2011 to August 5th, and I will try to post a small blog every day about the topics we have covered (including if we drink a beer or two)…

Day minus 2 started Wednesday morning very early in Denmark, and after a stop in Amsterdam, I left for the US. 10 hours later I arrived in Seattle and went to the hotel where the event is also held. It is the Courtyard Marriott hotel in Bellevue, Seattle.

On my trip I tried to study a little bit more for the course, and I will try to post a blog about one of the interesting topics that I covered.

The course is, as I wrote earlier, a performance and internals course that is held by a company called SQLskills (www.sqlskills.com), and two of the instructors are:

Kimberly L. Tripp and Paul S. Randal, two of the most famous SQL guys in the world.

Paul has been working on the SQL Server team, and he has, amongst other things, been responsible for the DBCC commands.

The next post will be sent tonight, and then I will send a post after each day of the course.

Recursion

Recursion is one of the things that makes it easy to work with data that belongs to different levels. That could e.g. be an organisation where Finance consists of 3 levels (Finance –> Purchase –> International) and IT of only one level.

These different levels make it easy to extract data so that you can create an organisational map, and for that we use recursion.

I will stick to the organisational example, where the organisation looks like this.

What I want is a list showing me:

1) Organisational unit
2) What organisational unit it belongs to.
3) Level (Company = 0 –> Domestic/International = 3)

First we need to set up the database. Just copy the code into your Management Studio.

----- CODE SNIPPET START -----

-- 1. First we do the normal preparations
--    We create the database and jump right into it.

use master
GO

IF DATABASEPROPERTYEX('DemoRecursion','Version') > 0
DROP DATABASE DemoRecursion
GO

CREATE DATABASE DemoRecursion
GO

USE DemoRecursion
GO

-- 2. Then we create a table

CREATE TABLE OrgUnits (
    ID              INT IDENTITY PRIMARY KEY,
    OrgUnitID       INT UNIQUE,
    OrgUnitName     VARCHAR(500),
    BelongstoUnitId INT NULL
)
go

----- CODE SNIPPET END -----
Then we need to put in some test data.

----- CODE SNIPPET START -----

INSERT INTO OrgUnits VALUES(1,'Company',NULL) -- Top of organisation

INSERT INTO OrgUnits VALUES(2,'IT',1)
INSERT INTO OrgUnits VALUES(3,'Infrastructure',2)
INSERT INTO OrgUnits VALUES(4,'Software',2)
INSERT INTO OrgUnits VALUES(5,'Education',2)

INSERT INTO OrgUnits VALUES(6,'Finance',1)
INSERT INTO OrgUnits VALUES(7,'Debtors',6)
INSERT INTO OrgUnits VALUES(8,'Purchase',6)

INSERT INTO OrgUnits VALUES(9,'Domestic',8)
INSERT INTO OrgUnits VALUES(10,'International',8)

----- CODE SNIPPET END -----

NOW WE ARE READY TO DO SOME RECURSION.

To create the desired output, we need to use something called a Common Table Expression (or just CTE). That is a way of creating a temporary dataset and then using it in a SELECT clause (the simple description).

Using a CTE requires two parts:

1) Setting up the temporary dataset
2) Selecting from it

So if we forget recursion and just try to use a CTE, copy and paste the following code into your SSMS and run it.

----- CODE SNIPPET START -----

WITH organisation(OrgUnitID, OrgUnitName, BelongsToID, BelongsToName, LEVEL)
AS
(
SELECT OrgUnitID, OrgUnitName, OrgUnitID, OrgUnitName, 0
FROM OrgUnits
WHERE BelongstoUnitId IS NULL
)
SELECT *
FROM organisation

----- CODE SNIPPET END -----

I have created a temporary dataset called organisation, and it has 5 fields.

Between the parentheses I have created a normal SELECT statement that puts all rows from the table OrgUnits where BelongstoUnitId is NULL into the dataset. In our case that means the top of the org units.

After the closing parenthesis I use a SELECT from the dataset. Very simple, and it can be joined with all other tables in your database.

A CTE ALWAYS has these two parts

1) WITH xxx AS ()
2) SELECT * FROM xxx

Actually, what we have created now is our anchor in the recursion, so that we know where to start and where to end.

So now we just need to create some code saying:

1) I am at the top level. Please return all children (IT / FINANCE).
2) After they are returned, we first move to child number 1 (IT) and say:
3) I am at IT, please return all children.
4) If there are any children, then do 2+3 for all children.

And basically 2+3+4 are repeated until there are no more children anywhere in the structure.

The code looks like this:

----- CODE SNIPPET START -----

WITH organisation(OrgUnitID, OrgUnitName, BelongsToID, BelongsToName, LEVEL)
AS
(
-- First we create the anchor
SELECT OrgUnitID, OrgUnitName, OrgUnitID, OrgUnitName, 0
FROM OrgUnits
WHERE BelongstoUnitId IS NULL

-- Then we create the recursion part
UNION ALL

SELECT T1.OrgUnitID, T1.OrgUnitName, T2.OrgUnitID, T2.OrgUnitName, T2.LEVEL + 1
FROM OrgUnits T1 INNER JOIN organisation T2 ON T1.BelongstoUnitId = T2.OrgUnitID
)
SELECT *
FROM organisation

----- CODE SNIPPET END -----

As you might see, I have just added UNION ALL and two lines. What you can see is that I create a normal SELECT clause that takes all rows from OrgUnits where the BelongstoUnitId is in the temporary dataset.

Let me try to explain:

1) First we add the anchor to the dataset (it has ID 1).
2) Then we select all from OrgUnits where BelongstoUnitId is in organisation (meaning is in 1). That returns IT and FINANCE.
3) AHA… Now we have two new rows in the dataset, and we need to select all from OrgUnits whose BelongstoUnitId points to a row that has not been processed yet (that means 3, 4, 5 and 7, 8).

And we continue with 2+3 until we have gone through all organisational units.
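The recursion above is not SQL Server specific; any engine with recursive CTEs behaves the same way. Here is a small runnable sketch of the exact same organisation and query, using Python's built-in sqlite3 module so you can run it anywhere (note that SQLite spells it WITH RECURSIVE, where SQL Server just uses WITH):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE OrgUnits (
    OrgUnitID INTEGER PRIMARY KEY,
    OrgUnitName TEXT,
    BelongsToUnitId INTEGER NULL)""")
conn.executemany("INSERT INTO OrgUnits VALUES (?,?,?)", [
    (1, 'Company', None),
    (2, 'IT', 1), (3, 'Infrastructure', 2), (4, 'Software', 2), (5, 'Education', 2),
    (6, 'Finance', 1), (7, 'Debtors', 6), (8, 'Purchase', 6),
    (9, 'Domestic', 8), (10, 'International', 8)])

rows = conn.execute("""
WITH RECURSIVE organisation(OrgUnitID, OrgUnitName, BelongsToID, BelongsToName, Level) AS (
    -- Anchor: the top of the organisation
    SELECT OrgUnitID, OrgUnitName, OrgUnitID, OrgUnitName, 0
    FROM OrgUnits WHERE BelongsToUnitId IS NULL
    UNION ALL
    -- Recursive part: children of rows already in the dataset
    SELECT t1.OrgUnitID, t1.OrgUnitName, t2.OrgUnitID, t2.OrgUnitName, t2.Level + 1
    FROM OrgUnits t1 JOIN organisation t2 ON t1.BelongsToUnitId = t2.OrgUnitID
)
SELECT OrgUnitName, BelongsToName, Level FROM organisation ORDER BY Level, OrgUnitID
""").fetchall()

for name, parent, level in rows:
    print(level, name, "belongs to", parent)
```

Each round of the recursive member joins OrgUnits against the rows produced so far, which is exactly the "return all children, then repeat" loop described in steps 2+3 above.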

Nice and easy!!!

Why make it difficult

Sometimes it is not necessary to write long blogs to describe nice features in SQL Server. This time I was watching a video for the Master programme on SQL Server, and I learned a little trick that I will share with you.

You all know the little command

go

What happens if you write

go xxx

where xxx is a number?

Well, it simply runs the batch xxx times.

That can be very useful if you want to create some test data and insert rows into a table many times. Let's try it; first we create the database and a table:

--- CODE SNIPPET START ---

use master
go

if DATABASEPROPERTYEX('testGO','Version') > 0
drop database testGO
GO

create database testGO
go

use testGO
go

create table testTable ( C1 int identity, C2 varchar(20) default 'testdata')
go

--- CODE SNIPPET END ---

Now we have a database called testGO and a table called testTable, and now I want to insert 10000 rows of dummy data. How do I do that?

--- CODE SNIPPET START ---

insert into testTable default values

go 10000

--- CODE SNIPPET END ---

And we can test it:

--- CODE SNIPPET START ---

select COUNT(*)
from testTable

--- CODE SNIPPET END ---

GO xxx is really nice.
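One thing to be aware of: GO (with or without a count) is not T-SQL at all; it is a batch separator understood by tools like SSMS and sqlcmd. From client code you get the same effect by simply executing the batch in a loop. A minimal sketch of that idea in Python, using the built-in sqlite3 module instead of SQL Server so it runs anywhere (the table mirrors testTable above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE testTable ("
    "C1 INTEGER PRIMARY KEY AUTOINCREMENT, "
    "C2 TEXT DEFAULT 'testdata')")

# The client-side equivalent of "insert into testTable default values  GO 10000":
# run the same one-statement batch 10000 times.
for _ in range(10000):
    conn.execute("INSERT INTO testTable DEFAULT VALUES")

count = conn.execute("SELECT COUNT(*) FROM testTable").fetchone()[0]
print(count)  # 10000
```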

Defragmentation

I have been asked if there is a way to do more intelligent defragmentation than just using the normal maintenance plan, and of course there is. But let's start with the basics and forget about all the hard issues.

My personal opinion is that if an index

  • is above 10% and below 30% fragmented, then it should be reorganised.
  • is at or above 30% fragmented, then it should be rebuilt.

That is my personal opinion, and I know that a lot of other consultants have other opinions about that. So, in this article let's stick to my opinion and limit this post to finding a way where we can go through all indexes and rebuild or reorganise according to this rule (I know that there are a lot of other rules, settings etc. to think about, but let's start here).
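To make the rule concrete, here is how I would express it as a tiny helper. A sketch in Python (the function name and the "NONE" action are my own; I use the same >= 10 cut-off that the cursor code later in this post uses, and treat exactly 30% as a rebuild, matching the "at or above" wording):

```python
def maintenance_action(fragmentation_percent: float) -> str:
    """Pick an index maintenance action from the fragmentation percentage:
    10-30% -> reorganise, at or above 30% -> rebuild, below 10% -> leave it alone."""
    if fragmentation_percent >= 30:
        return "REBUILD"
    if fragmentation_percent >= 10:
        return "REORGANIZE"
    return "NONE"

print(maintenance_action(12.5))  # REORGANIZE
print(maintenance_action(45.0))  # REBUILD
print(maintenance_action(5.0))   # NONE
```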

To do this I will use the following techniques:

  1. Cursors.
  2. Dynamic SQL
  3. Dynamic management objects
  4. SQL Server Agent

To run this sample you must have the AdventureWorks database installed.

What I will do is go through all indexes and get their fragmentation, and to do that I need a cursor and the dynamic management object sys.dm_db_index_physical_stats.

First I set up my cursor. A cursor basically consists of two parts: an outer loop and something we do in that loop. Imagine that we traverse a table with 10 countries; then the inner part will be run 10 times, one time for each country:

----- CODE SNIPPET START -----

use AdventureWorks
go

-- We declare some variables that should hold the
-- parameters that are sent from the outer loop to the code that
-- is run for each row in the outer loop.

Declare @objectID varchar(50),
        @IndexID varchar(50),
        @Fragmentation real,
        @pagecount int

-- We declare the cursor

Declare traverse cursor for
select object_id, index_id, avg_fragmentation_in_percent, page_count
from sys.dm_db_index_physical_stats(DB_ID(),null,null,null,null)
where avg_fragmentation_in_percent >= 10

-- We open the cursor
Open traverse

-- We try to fetch the first row of the cursor into the 4 variables

fetch next from traverse into @objectid, @indexid, @fragmentation, @pagecount

-- If ok then we do something
while (@@FETCH_STATUS = 0)
begin

-- We try to fetch the next row of the cursor into the 4 variables
fetch next from traverse into @objectid, @indexid, @fragmentation, @pagecount
end

-- Remember to Close the cursor
close traverse

-- Remember to Deallocate the cursor
deallocate traverse

----- CODE SNIPPET END -----

That's it… We have a cursor that is working. Now we can look at two things:

  1. What is the pagecount?
  2. What is the fragmentation?

If the pagecount is very small then defragmentation is not that important. So let’s stick to the plan. In the cursor we have removed all indexes with a fragmentation below 10, so we only need to consider: Is it higher or lower than 30%?

----- CODE SNIPPET START -----

use AdventureWorks
go

-- We declare some variables that should hold the
-- parameters that are sent from the outer loop to the code that
-- is run for each row in the outer loop.

Declare @objectID varchar(50),
        @IndexID varchar(50),
        @Fragmentation real,
        @pagecount int

-- We declare the cursor
Declare traverse cursor for
select object_id, index_id, avg_fragmentation_in_percent, page_count
from sys.dm_db_index_physical_stats(DB_ID(),null,null,null,null)
where avg_fragmentation_in_percent >= 10

-- We open the cursor

Open traverse

-- We try to fetch the first row of the cursor into the 4 variables

fetch next from traverse into @objectid, @indexid, @fragmentation, @pagecount

-- If ok then we do something

while (@@FETCH_STATUS = 0)
begin
-- We need more variables
Declare @tablename varchar(200),
        @IndexName varchar(200),
        @sql varchar(5000)

-- We do it a stupid way, but we want to use tablename and indexname
-- So we must find it first
select @tablename = '[' + sys.schemas.name + '].[' + sys.tables.name + ']',
       @IndexName = sys.indexes.name
from sys.tables
inner join sys.indexes on sys.tables.object_id = sys.indexes.object_id
inner join sys.schemas on sys.tables.schema_id = sys.schemas.schema_id
where sys.tables.object_id = @objectID and index_id = @IndexID

-- Then we test what the fragmentation is
-- We could also test for pagecount
if @Fragmentation < 30
begin
set @sql = 'Alter index [' + @IndexName + '] on ' + @tablename + ' REORGANIZE'
end
else
begin
set @sql = 'Alter index [' + @IndexName + '] on ' + @tablename + ' REBUILD'
end

-- Then we execute the code
exec (@sql)
print 'The following code has been executed: ' + @sql

-- We try to fetch the next row of the cursor into the 4 variables
fetch next from traverse into @objectid, @indexid, @fragmentation, @pagecount

end

-- Remember to Close the cursor

close traverse

-- Remember to Deallocate the cursor

deallocate traverse

----- CODE SNIPPET END -----

And now the last things to do is to

  1. Remove the print line (or perhaps create a log file so that you can see what was defragmented, how and when).
  2. Add a job to the SQL Server Agent.

Very easy, and if you want to, you can try to reduce my code to a minimum. I just wanted to show you some different technologies in a way where you can just copy it from my blog into your own production environment.

There are a lot of “traps” and performance issues, both when it comes to using a cursor, and when it comes to defragmentation. This is just the beginning!!!!

 

 

Filestream for newbies

Filestream is one of the coolest new functionalities in SQL Server; unfortunately a lot of systems don't use it, because it is too difficult to use or because the applications that users use can't handle filestreams.

This document will try to give newbies a short introduction to how Filestream works. Don't expect a deep technical blog. This is the very simple way of presenting filestreams.

Basically Filestream is used to store large objects like movies, PDF files, music and so on, but you could basically store all types of data in a FILESTREAM column.

BUT, let’s take a look at it.

1. FIRST WE NEED TO PREPARE THE SQL SERVER / INSTANCE FOR FILESTREAM

In windows

Go to Start –> Microsoft SQL Server –> Configuration Tools –> SQL Server Configuration Manager.

Right click the instance that you want to enable filestream on. Choose Properties.

[Screenshot: SQL Server configuration – enabling FILESTREAM on the instance properties]

Depending on the type of access you need, you should enable Filestream for T-SQL access and/or I/O streaming. In this post I will only use T-SQL.

 
Go into SQL Server Management Studio and log on to the instance you want to enable for Filestream.

2. THEN WE CREATE THE DATABASE WE WILL TEST AND WORK WITH:
 

---- CODE SNIPPET START ----

use master
go

if DATABASEPROPERTYEX('Filestream','Version') > 0
drop database Filestream
go

create database Filestream
go

use Filestream
go

EXEC sp_configure 'filestream access level', 2
RECONFIGURE
go

---- CODE SNIPPET END ----

 

3. NOW WE NEED TO ADD A FILESTREAM FILEGROUP

---- CODE SNIPPET START ----

USE [master]
GO

ALTER DATABASE [Filestream] ADD FILEGROUP [Filestream] CONTAINS FILESTREAM
GO

ALTER DATABASE [Filestream] ADD FILE ( NAME = N'FStream', FILENAME = N'C:\FStream' ) TO FILEGROUP [Filestream]
GO

USE [Filestream]
GO

IF NOT EXISTS (SELECT name FROM sys.filegroups WHERE is_default = 1 AND name = N'Filestream')
ALTER DATABASE [Filestream] MODIFY FILEGROUP [Filestream] DEFAULT
GO

---- CODE SNIPPET END ----

4. NOW WE ARE READY TO ADD A TABLE AND SOME DATA
A table with a filestream requires a ROWGUIDCOL, and all filestream data must be varbinary(max), so the table will look like this (including insertion of sample data):
 

---- CODE SNIPPET START ----

CREATE TABLE FilestreamFun (
[Id] [uniqueidentifier] ROWGUIDCOL NOT NULL UNIQUE,
[Number] INTEGER UNIQUE,
[Name] VARBINARY(MAX) FILESTREAM NULL
)
GO

insert into FilestreamFun values (NEWID(), 1, CONVERT(varbinary(max),'I am no. 1'))
insert into FilestreamFun values (NEWID(), 2, CONVERT(varbinary(max),'I am no. 2'))
insert into FilestreamFun values (NEWID(), 3, CONVERT(varbinary(max),'I am no. 3'))

---- CODE SNIPPET END ----

If you now go into C:\ you should have a folder called FStream, and the content in there is:

At this stage, you don’t have to think about the structure of this folder.

5. RETRIEVING DATA
If you want to see your data, you can write an SQL statement like this:

---- CODE SNIPPET START ----

Select *, CONVERT(varchar(50), [Name]) as converted
from FilestreamFun

---- CODE SNIPPET END ----

It should look something like:

What you can see is that if you don’t convert, then you can’t see your data.

 

Remember that you need to convert to the datatype that you want to see the data in.
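If the CONVERT step feels mysterious: a FILESTREAM column just holds raw bytes (varbinary(max)), and converting is nothing more than decoding those bytes back into the type you stored. The same round trip, illustrated in plain Python (no SQL Server involved):

```python
# What goes into the varbinary column: raw bytes.
stored = "I am no. 1".encode("utf-8")
print(stored)      # b'I am no. 1' - without converting, you only see bytes

# The equivalent of CONVERT(varchar(50), [Name]): decode back to text.
converted = stored.decode("utf-8")
print(converted)   # I am no. 1
```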

 

THE BIG QUESTIONS:
OK, that is nice, but why not just use a link to external documents? Basically you could do that, but then you lose a lot of benefits:

  1. The data is part of the normal SQL Server backup, so you don't have to worry about backups if you have set up your SQL Server backup properly.
  2. Performance on large BLOBs is quite good.
  3. Filestream data takes part in transactions like a normal datatype, so you can e.g. roll back.
  4. The data is treated like all other datatypes, so it makes your life much easier (of course there are technical limitations, but don't think about them now).