SQL Programming is a collection of blog posts about programming SQL Server.

Primary Keys and clustered indexes

One of the common misunderstandings in relation to Microsoft SQL Server is that a primary key will always be enforced with a clustered index.

That is not true, and in the following video I demonstrate that.
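If you prefer reading code to watching the video, here is a minimal sketch (the table name is my own) that creates a primary key enforced by a nonclustered index:

create table dbo.DemoPK
(
    ID   int identity not null,
    Name varchar(50),
    -- The primary key constraint is explicitly backed by a NONCLUSTERED index,
    -- so the table itself stays a heap.
    constraint PK_DemoPK primary key nonclustered (ID)
);

-- sys.indexes shows the primary key index as NONCLUSTERED,
-- and a separate row with index_id = 0 (HEAP) for the table itself.
select name, index_id, type_desc
from sys.indexes
where object_id = object_id('dbo.DemoPK');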

MS SQL 2012 – Offset and fetch

(You must have the AdventureWorks2012 database installed. Download it from codeplex.com.)

This is one of the new cool features of Microsoft SQL Server 2012, and if you are a developer, you will love it.

Earlier, when you wanted to create e.g. a website with a grid that supported paging, you had to rely on either the .NET components or complex SQL if you wanted to show e.g. 50 rows per page and your table had e.g. 2000 rows.

With the new OFFSET and FETCH, you can create readable, easy-to-understand SQL (even for all the .NET guys out there).

So let's see a simple example. You want to retrieve 50 rows from the Person table in the Person schema. You want to order by LastName, FirstName and MiddleName, and you want to retrieve rows 51 to 100.

   1: select BusinessEntityID, coalesce(Firstname+' '+MiddleName+' '+LastName, Firstname+' '+LastName, FirstName) as name
   2: from Person.Person
   3: order by LastName, FirstName, MiddleName
   4: offset 50 rows fetch next 50 rows only;

Line 4 does the trick for you, and basically you say:

 

  1. I want you to start at offset 50 (0 is also an offset, so offset 50 will be record 51).
  2. You want to fetch the next 50 rows.

If you want to read the full documentation, go to Books Online and read it there.
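If you want to use the same pattern for a grid with paging, the offset can be calculated from a page number and a page size. A small sketch (the variable names are my own):

declare @PageNumber int = 2;   -- 1-based page number
declare @PageSize   int = 50;  -- rows per page

select BusinessEntityID
     , coalesce(Firstname+' '+MiddleName+' '+LastName, Firstname+' '+LastName, FirstName) as name
from Person.Person
order by LastName, FirstName, MiddleName
offset (@PageNumber - 1) * @PageSize rows
fetch next @PageSize rows only;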

But what about performance? Well, we can look at two important things: the execution plan and the page reads. The execution plan looks like this:

[Image: execution plan for the OFFSET/FETCH query]

 

As you can see, we do an index scan, a Compute Scalar (that is of no interest here, because it just has to do with the COALESCE that I use) and a Top.

From the execution plan I can see that the cost is 0.0037976.

If we look at the pages read, it looks like this:

[Image: page reads for the OFFSET/FETCH query]

 

As you can see, there is 1 scan and 3 logical reads.
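If you want to reproduce the scan and read numbers yourself, here is a small sketch (my own addition, not part of the original measurement):

set statistics io on;

select BusinessEntityID
     , coalesce(Firstname+' '+MiddleName+' '+LastName, Firstname+' '+LastName, FirstName) as name
from Person.Person
order by LastName, FirstName, MiddleName
offset 50 rows fetch next 50 rows only;

set statistics io off;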

What if we did it the old-fashioned way? Then the code would look something like this (an example):

with yourCTE as
(
    select row_number() over (order by LastName, FirstName, MiddleName) as rowNumber
            , BusinessEntityID, coalesce(Firstname+' '+MiddleName+' '+LastName, Firstname+' '+LastName, FirstName) as name
    from Person.Person
)
select *
from yourCTE
where rowNumber between 51 and 100
order by rowNumber;

 

As you can see, we use a common table expression, and the code is much more complex. For a SQL guy it is easy to read, but why make it difficult when we can make it simple? Let's look at the execution plan for this query:

[Image: execution plan for the CTE query]

It’s much more complex, and the cost is a little bit higher: 0.0038956.

Reads are the same.

Conclusion: I would recommend using the new OFFSET and FETCH instead of common table expressions; it's easier, and in many cases performance will be better. If performance is the key issue, make sure that you look into all alternatives AND that you measure instead of guessing. In some situations it might be better to load the complete dataset into the application and do the filtering within the application code. It all depends.

IAM Page Error – Fix it with a page restore

Today I have been playing around with situations where your IAM page gets corrupted, and I have tried to figure out what to do. Let me say it like this: if you have a valid backup then use that, and depending on the situation consider using page restore.

NOTE!! THIS BLOG IS TECHNICAL AND THERE ARE THINGS THAT YOU SHOULD NOT DO UNLESS YOU ARE 100% SURE ABOUT WHAT YOU DO. WE TAKE NO RESPONSIBILITY IF YOU TRY WHAT WE WRITE IN THE BLOG.

So, let's get started. I want to create a simple database with one table, and I will corrupt that table's IAM page. Basically that means: you can't access your data.

So, let's start by setting up a new database and a table and filling in 1,000 rows.

use master
go

if DATABASEPROPERTYEX('PageRestore','Version') > 0
    drop database PageRestore
go

create database PageRestore
go

use PageRestore
go

create table LargeTable (id int identity, name char(4000))
go

insert into LargeTable values ('Soak AS')
go 1000

So, now we have a database called PAGERESTORE and a table called LARGETABLE with 1,000 rows.

What I want to see now is which pages the table LargeTable occupies:

dbcc ind(PageRestore, LargeTable, 1)

go

On my computer the result looks something like this:

[Screenshot: DBCC IND output]

I can see that page 119 in file 1 has NULL in IAMFID and IAMPID (and that PageType = 10), so I know that it is an IAM page.

Before we start, let's do a backup (first create the physical folder C:\SQL):

BACKUP DATABASE [PageRestore] TO DISK = N'c:\sql\pagerestore.bak'
    WITH FORMAT, INIT, NAME = N'PageRestore-Full Database Backup', STATS = 20
go

After you have taken the backup, we corrupt the IAM page. NOTE!!! DON'T DO THIS ON A PRODUCTION SERVER!!!!!! NEVER!!!!

use master
go

alter database PageRestore set single_user with no_wait

dbcc writepage(PageRestore, 1, 119, 200, 1, 0x50, 1)

alter database PageRestore set multi_user with no_wait

DBCC WRITEPAGE is an undocumented command that writes values to the physical page. Imagine what that does to a database. Terrible things, so don't use it unless you really want to get into trouble. I assume that your IAM page is at page 119; if not, change the number 119. 200 is byte 200 on the page; it was just a random byte chosen to destroy the page.

Now... try to run DBCC CHECKDB. It will not look nice.

[Screenshot: DBCC CHECKDB output]

and at the bottom of the output, I can see the following:

[Screenshot: DBCC CHECKDB error summary]

I think the most important thing is the second line from the bottom. It says that the minimum repair level is REPAIR_ALLOW_DATA_LOSS, which means that if I run CHECKDB to fix the error, then I will (basically) lose the data in my table.

If I try to run a query against the table LargeTable, I will get an error:

[Screenshot: error message when querying LargeTable]

So basically… I am in deep trouble.

Lucky for me, I have a backup, so what I can do is restore that backup, and it is very simple with SQL 2012. Right-click the database -> choose Tasks -> choose Restore -> choose Page, and the following dialog will appear:

[Screenshot: the page restore dialog]

Choose Check Database Pages, and if no backup is selected, then select it. PLEASE NOTE!!! There might be some difficulties with selecting backups, but when you have the right backups, just click OK.

But what actually happens, and what do you need to understand, especially if you don't have the SQL 2012 UI?

Well, I could restore a single page by writing:

RESTORE DATABASE [PageRestore] PAGE = '1:119' FROM DISK = N'c:\sql\pagerestore.bak' WITH FILE = 1, NORECOVERY

What happens then is that the page is marked with Recovery Pending. That is because we must restore the necessary log backups, and before we do that we must take a backup of the tail of the log. So, to restore a page, we now know that we need these three steps:

1) Restore the page.

2) Backup the tail of the log.

3) Restore necessary log backups, including the tail of the log.

So the complete steps, if you don't have SQL 2012 and a nice UI, will be:

USE [master]
go

RESTORE DATABASE [PageRestore] PAGE = '1:119' FROM DISK = N'c:\sql\pagerestore.bak' WITH FILE = 1, NORECOVERY

backup log [PageRestore] to disk = N'c:\sql\pagerestore.trn' with format

restore log [PageRestore] from disk = N'c:\sql\pagerestore.trn'

 

If you have done everything correctly, you should now be able to run DBCC CHECKDB and query the table :-)

That's it. Very easy, but of course there are difficulties. Still, now you have the general perspective and overview.

 

 

Waits and queues with a practical approach

Earlier I have written about waits and queues and how to use that approach to find out about performance issues. Waits and queues do not give you the answers, but they guide you to the area(s) where there might be a problem. You can read my post about waits and queues here.

You must have read and understood my blog about waits and queues before you continue with this blog.

With this post I want to show you how to set up an environment where you can see changes in waits and queues and do some analysis. If time permits, I will later post a blog about how to dive into the waits and see what caused them.

What I want to achieve is a database where I capture the accumulated waits and queues every hour (every half hour or every day, depending on what you prefer).

So, what I need is:

  1. Setup a database.
  2. Create a table to hold the data.
  3. Create a stored procedure to collect the data.
  4. Setup a SQL Server Agent job to automate the collection.
  5. Create a view to see the capture times.
  6. Create a stored procedure to report the data

Before we start, note that the script that shows the waits is by Paul Randal, www.sqlskills.com.

1. Setup a database

First we must set up the database, and I suggest that you do the following (remember to change the physical paths):

USE [master]
GO

CREATE DATABASE [Logging] ON  PRIMARY
( NAME = N'Logging', FILENAME = N'd:\SQL\Data\Logging.mdf'
         , SIZE = 524288KB , MAXSIZE = UNLIMITED, FILEGROWTH = 131072KB )
 LOG ON
( NAME = N'Logging_log', FILENAME = N'e:\SQL\LOG\Logging_log.ldf'
      , SIZE = 262144KB , MAXSIZE = 2048GB , FILEGROWTH = 10%)
GO

2. Setup a table to hold the data

Now we must have a table to store the data that we collect every hour, and I would recommend the following table. It can be optimized, so this is a basic table:

USE [Logging]
GO

CREATE TABLE [dbo].[wait_statistics](
    [CaptureTime] [datetime] NOT NULL,
    [WaitType] [nvarchar](60) NOT NULL,
    [Wait_S] [decimal](14, 2) NULL,
    [Resource_S] [decimal](14, 2) NULL,
    [Signal_S] [decimal](14, 2) NULL,
    [WaitCount] [bigint] NOT NULL,
    [Percentage] [decimal](4, 2) NULL,
    [AvgWait_S] [decimal](14, 4) NULL,
    [AvgRes_S] [decimal](14, 4) NULL,
    [AvgSig_S] [decimal](14, 4) NULL
) ON [PRIMARY]

3. Create a stored procedure to collect the data

So, now we are ready to fill in some data, and for that we use a stored procedure. The reason for that is that the wait types we are filtering out (the ones we don't want to see) change from version to version, and therefore it is nice to have a stored procedure where we can easily change what to filter.

USE [Logging]
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE procedure [dbo].[collect]
as
WITH Waits AS
    (SELECT
        wait_type,
        wait_time_ms / 1000.0 AS WaitS,
        (wait_time_ms - signal_wait_time_ms) / 1000.0 AS ResourceS,
        signal_wait_time_ms / 1000.0 AS SignalS,
        waiting_tasks_count AS WaitCount,
        100.0 * wait_time_ms / SUM (wait_time_ms) OVER() AS Percentage,
        ROW_NUMBER() OVER(ORDER BY wait_time_ms DESC) AS RowNum
    FROM sys.dm_os_wait_stats
    WHERE wait_type NOT IN (
        'CLR_SEMAPHORE', 'LAZYWRITER_SLEEP', 'RESOURCE_QUEUE', 'SLEEP_TASK',
        'SLEEP_SYSTEMTASK', 'SQLTRACE_BUFFER_FLUSH', 'WAITFOR', 'LOGMGR_QUEUE',
        'CHECKPOINT_QUEUE', 'REQUEST_FOR_DEADLOCK_SEARCH', 'XE_TIMER_EVENT'
        , 'BROKER_TO_FLUSH',
        'BROKER_TASK_STOP', 'CLR_MANUAL_EVENT', 'CLR_AUTO_EVENT'
        , 'DISPATCHER_QUEUE_SEMAPHORE',
        'FT_IFTS_SCHEDULER_IDLE_WAIT', 'XE_DISPATCHER_WAIT', 'XE_DISPATCHER_JOIN'
        , 'BROKER_EVENTHANDLER',
        'TRACEWRITE', 'FT_IFTSHC_MUTEX', 'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
        'BROKER_RECEIVE_WAITFOR', 'ONDEMAND_TASK_QUEUE', 'DBMIRROR_EVENTS_QUEUE',
        'DBMIRRORING_CMD', 'BROKER_TRANSMITTER', 'SQLTRACE_WAIT_ENTRIES',
        'SLEEP_BPOOL_FLUSH', 'SQLTRACE_LOCK')
    )
insert into wait_statistics
SELECT
    getdate() as CaptureTime
    ,W1.wait_type AS WaitType,
    CAST (W1.WaitS AS DECIMAL(14, 2)) AS Wait_S,
    CAST (W1.ResourceS AS DECIMAL(14, 2)) AS Resource_S,
    CAST (W1.SignalS AS DECIMAL(14, 2)) AS Signal_S,
    W1.WaitCount AS WaitCount,
    CAST (W1.Percentage AS DECIMAL(4, 2)) AS Percentage,
    case when w1.WaitCount<>0 then
        CAST ((W1.WaitS / W1.WaitCount) AS DECIMAL (14, 4))
    else
        0
    end AS AvgWait_S,
    case when w1.WaitCount<>0 then
        CAST ((W1.ResourceS / W1.WaitCount) AS DECIMAL (14, 4))
    else
        0
    end AS AvgRes_S,
    case when w1.WaitCount<>0 then
        CAST ((W1.SignalS / W1.WaitCount) AS DECIMAL (14, 4))
    else
        0
    end AS AvgSig_S
FROM Waits AS W1
    INNER JOIN Waits AS W2 ON W2.RowNum <= W1.RowNum
GROUP BY W1.RowNum, W1.wait_type, W1.WaitS, W1.ResourceS, W1.SignalS
        , W1.WaitCount, W1.Percentage

GO

It is very important to know what version of SQL Server you are using, because then you can decide which wait types you are going to exclude in the line WHERE WAIT_TYPE NOT IN (xxx).
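A quick way to check which version and edition you are running (these are documented server properties):

select serverproperty('ProductVersion') as ProductVersion  -- e.g. 11.0.xxxx for SQL Server 2012, 10.50.xxxx for 2008 R2
     , serverproperty('ProductLevel')   as ProductLevel
     , serverproperty('Edition')        as Edition;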

4. Setup a SQL Server Agent Job to automate collection

Now that we have a stored procedure to collect the data, it is very easy to set up a SQL Server Agent job to automate the collection.

USE [msdb]
GO
DECLARE @jobId BINARY(16)
EXEC  msdb.dbo.sp_add_job @job_name=N'Collection',
        @enabled=1,
        @notify_level_eventlog=0,
        @notify_level_email=2,
        @notify_level_netsend=2,
        @notify_level_page=2,
        @delete_level=0,
        @description=N'This job collects wait statistics.',
        @category_name=N'Database Maintenance',
        @owner_login_name=N'YOURUSERID', @job_id = @jobId OUTPUT
select @jobId
GO
EXEC msdb.dbo.sp_add_jobserver @job_name=N'Collection', @server_name = N'YOURSERVERNAME'
GO
USE [msdb]
GO
EXEC msdb.dbo.sp_add_jobstep @job_name=N'Collection', @step_name=N'Wait_statistics',
        @step_id=1,
        @cmdexec_success_code=0,
        @on_success_action=1,
        @on_fail_action=2,
        @retry_attempts=0,
        @retry_interval=0,
        @os_run_priority=0, @subsystem=N'TSQL',
        @command=N'exec collect',
        @database_name=N'Logging',
        @flags=0
GO
USE [msdb]
GO
EXEC msdb.dbo.sp_update_job @job_name=N'Collection',
        @enabled=1,
        @start_step_id=1,
        @notify_level_eventlog=0,
        @notify_level_email=2,
        @notify_level_netsend=2,
        @notify_level_page=2,
        @delete_level=0,
        @description=N'This job collects wait statistics.',
        @category_name=N'Database Maintenance',
        @owner_login_name=N'YOURUSER',
        @notify_email_operator_name=N'',
        @notify_netsend_operator_name=N'',
        @notify_page_operator_name=N''
GO
USE [msdb]
GO
DECLARE @schedule_id int
EXEC msdb.dbo.sp_add_jobschedule @job_name=N'Collection', @name=N'EveryHour',
        @enabled=1,
        @freq_type=4,           -- daily, so the hourly subday schedule runs every day
        @freq_interval=1,
        @freq_subday_type=8,
        @freq_subday_interval=1,
        @freq_relative_interval=0,
        @freq_recurrence_factor=1,
        @active_start_date=20130312,
        @active_end_date=99991231,
        @active_start_time=0,
        @active_end_time=235959, @schedule_id = @schedule_id OUTPUT
select @schedule_id
GO

5. Create a view to see the capture times

So… let's see if we can create a view that shows us all capture times, because we need them as input parameters for the stored procedure that will return the figures.

use Logging
go

create view CaptureTimes
as
select distinct capturetime
from wait_statistics

6. Create a stored procedure to report the data

So, now we only need to create a stored procedure to report the data. That is very easy, and when that is done, you can start monitoring the waits and queues.

USE [logging]
GO

SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE procedure [dbo].[ReportWaits]

@endtime        datetime,
@starttime        datetime
as

begin try
    drop table #tmptable
end try
begin catch
end catch

select WaitType
        , convert(numeric(38,10), Wait_S - isnull((select Wait_s
from wait_statistics i1 where t1.WaitType=i1.WaitType
and i1.CaptureTime=@starttime),0)) as Wait_s
        , Resource_S - isnull((select Resource_s from wait_statistics i1
where t1.WaitType=i1.WaitType and i1.CaptureTime=@starttime),0) as Resource_s
        , Signal_S - isnull((select Signal_s from wait_statistics i1
where t1.WaitType=i1.WaitType and i1.CaptureTime=@starttime),0) as signal_s
        , WaitCount - isnull((select WaitCount from wait_statistics i1
where t1.WaitType=i1.WaitType and i1.CaptureTime=@starttime),0) as waitcount
        , convert(numeric(38,10), 0.00) as percentage
        , convert(numeric(38,10), 0) as avg_Wait_S
        , convert(numeric(38,10), 0) as avg_Resource_S
        , convert(numeric(38,10), 0) as avg_signal_s
into #tmptable
from wait_statistics t1
where CaptureTime=@endtime
order by CaptureTime desc

-- Lets delete
delete #tmptable
where waitcount=0

-- Let's find the wait
declare @wait_s numeric(38,10)
select @wait_s=SUM(wait_s) from #tmptable

update #tmptable
set percentage=wait_s/@wait_s *100
    , avg_Wait_S=wait_s/waitcount
    , avg_resource_S=resource_s/waitcount
    , avg_signal_S=signal_s/waitcount

select *
        , @starttime as starttime
        , @endtime as endtime
        , datediff(s, @starttime, @endtime) as PeriodInSeconds
        , 100*wait_s/datediff(s, @starttime, @endtime) as ShareOfPeriod
from #tmptable
order by percentage desc

7. Monitoring – An example

Let’s see how it works:

First you try to find two dates you can compare, and for that you simply write:

select *
from capturetimes

The result in my case will be:

[Image: result of selecting from the CaptureTimes view]

So I have two capture times to compare, and therefore I can write:

USE [Logging]
GO

DECLARE @return_value int

EXEC    @return_value = [dbo].[ReportWaits]
        @endtime = N'2013-03-12 09:49:19.373',
        @starttime = N'2013-03-12 09:41:59.793'

In my case (it is a very small time period), the result is something like this:

[Image: output from ReportWaits]

As you can see, on my server it is BackupIO that has caused 97.4877% of the waits.

What is missing in a lot of the other blogs you read is the column ShareOfPeriod. Here you can see that BackupIO has a ShareOfPeriod of 98.865909, and the main point is: with this column you can see how much of the capture period we have been waiting. There could be a lot of theoretical discussion about this figure, whether it is good or bad, and whether we can use this column when we have multiple processors and so on.

BUT!!! It is better than only knowing the totals accumulated since the last restart or since the wait statistics were cleared with DBCC. When we collect every hour, we get a lot of flexibility to narrow down where the problem is.

You can do a lot of optimization and make it easier to run the stored procedure without having to look up the capture times, but… there should also be something left for you to do.
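As one example of such an optimization, here is a small sketch (my own helper, assuming the CaptureTimes view and ReportWaits procedure above) that compares the two most recent captures without looking the times up manually:

use Logging
go

declare @endtime datetime, @starttime datetime;

-- Latest capture.
select top (1) @endtime = capturetime
from CaptureTimes
order by capturetime desc;

-- The capture just before it.
select top (1) @starttime = capturetime
from CaptureTimes
where capturetime < @endtime
order by capturetime desc;

exec dbo.ReportWaits @endtime = @endtime, @starttime = @starttime;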

Unique identifier or INT as clustered index

Today I was asked what is best to use for a clustered index and a primary key in a table created on e.g. SQL Server 2008 R2: a uniqueidentifier or an int?

As usual my answer was… it depends, but if we only look at index maintenance and other performance issues, then I would always prefer an int over a uniqueidentifier.

So what else can we do than test different scenarios?

First we create a database.

 

use master
go

-- First let's see if the database
-- already exists and if we have to delete it.
if DATABASEPROPERTYEX('keyTest','Version') is not null
begin
    drop database KeyTest
end

-- Lets create the database
Create database keyTest
go

use keyTest
go


The first step is to test if the database KeyTest already exists. If it does, we drop it and create it again. Then we switch context to the keyTest database.

Now, let's create two tables that have the same data but a clustered index built on different data types. We implement the clustered index by using the primary key keyword (there is a big difference between a clustered index and a primary key, but if we create a table with the primary key keyword, a clustered index is also created).

-- Lets create the two tables
Create table IdentifierTBL
        (    ID        uniqueidentifier        primary key
             default newID()
            , Postalcode        char(4)
            , Name                char(50)
        )

Create table IntTBL
        (    ID        int    identity        primary key
            , Postalcode        char(4)
            , Name                char(50)
        )

The two tables have the same columns, but in the first table the ID is a uniqueidentifier, which is 16 bytes long, and in the second table it is an int, which is 4 bytes long.

Now let's try to insert 10,000 rows into each of the tables.

 

-- Let's insert some data
insert into IdentifierTBL values (newid(), '1234', 
Replicate('A', 50))
go 10000

insert into IntTBL values ('1234', Replicate('A', 50))
go 10000

That's it. We have two tables with 10,000 rows each. The only test we want to carry out now is to see the fragmentation and how many pages they take up.

select    t1.object_id
        , t2.name
        , t1.avg_fragmentation_in_percent
        , index_level, page_count
        , avg_record_size_in_bytes
from sys.dm_db_index_physical_stats(DB_ID(), 
       null, null, null, 'Detailed') t1
    inner join sys.objects t2 on t1.object_id=t2.object_id
where t2.name in ('IdentifierTBL', 'IntTBL')


The result looks like this:

[Image: index physical stats for the two tables]

We can see that both tables consist of two levels, a leaf level and a root level. We can see that if we use a uniqueidentifier, the fragmentation is approx. 99%, and when we use an integer it's approx. 2%.

We can also see that the table with an int as the clustered key takes up 92 pages and the average record size is 65 bytes. The same figures for the table with the uniqueidentifier are 154 pages and 77 bytes average record size.

But it does not stop here. What if we create a nonclustered index on the postal code? What will happen to our physical structures?

USE [keyTest]
GO
CREATE NONCLUSTERED INDEX
        [Nonclustered] ON [dbo].[IntTBL]
(
    [Postalcode] ASC
)

CREATE NONCLUSTERED INDEX
        [Nonclustered] ON [dbo].[IdentifierTBL]
(
    [Postalcode] ASC
)

 

Let's look at the characteristics:

 

select    t1.object_id
        , t2.name
        , t3.name
        , t1.avg_fragmentation_in_percent
        , index_level, page_count
        , avg_record_size_in_bytes
from sys.dm_db_index_physical_stats(DB_ID(), null, 
null, null, 'Detailed') t1
    inner join sys.objects t2 on t1.object_id=t2.object_id
    inner join sys.indexes t3 on t1.object_id=t3.object_id 
and t1.index_id=t3.index_id
where t2.name in ('IdentifierTBL', 'IntTBL')
and t3.name='NonClustered'


 

The result is:

[Image: index physical stats for the two nonclustered indexes]

What I am interested in showing you is the page count and the record size. As you can see, the index is much bigger with a uniqueidentifier (it uses 36 pages compared to 20 pages). That is because the record size is 24 bytes compared to the 12 bytes used in IntTBL. The difference is 12 bytes, which is exactly the difference between the 16 bytes of a uniqueidentifier and the 4 bytes of an int.

What we can conclude is that the clustered index key is added to the nonclustered index: if it is 16 bytes, then 16 bytes are added; if it is 4 bytes, then 4 bytes are added.
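As a rough back-of-the-envelope check (my own arithmetic, assuming roughly 8,096 usable bytes per page): 10,000 rows times 12 extra bytes per row is about 120,000 bytes, or roughly 15 pages, which is close to the observed difference of 36 - 20 = 16 pages.

-- Rough estimate of the extra leaf pages caused by the wider clustered key.
select (10000 * 12) / 8096.0 as approx_extra_pages;   -- ~14.8 pages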

Why should we then ever use a uniqueidentifier, now that I have shown you how much better an integer performs as a clustered key?

Well… if I, for example, had a database with disconnected users, e.g. a CRM system, then it could be a good idea to use a uniqueidentifier. But it is not within the scope of this blog to write about that.

Søren

Two ways of improving your ETL with SQL 2012 #1

SQL 2012 has improved a lot when it comes to creating a better and more efficient ETL, and in this short intro I would like to show you some of the things I find useful.

In this series of blogs I will show you:

  1. Project deployment instead of package deployment.
  2. Using Change Data Capture to improve performance.

Project deployment

One of the things that has always irritated me was the need to deploy a package instead of a complete project. To be practical, I will show you how to deploy a project to a SQL Server 2012 instead of deploying individual packages. It is very easy and gives you a lot of benefits, and it makes it easier to configure values that must change depending on the installation. On a scale from 1 to 10, where 1 is no new value and 10 is great new value, I give this functionality a 9.

Change data capture

This is also not new, and we have been able to set up change data capture since version 2008, but now even a newbie to ETL can set up change data capture and use it in Integration Services. We have some new components that help us. I would give this 10 out of 10.

This is simply brilliant, since it addresses one of the biggest problems in building a data warehouse and getting good performance: how do we only update and insert changed and new rows?
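For reference, enabling change data capture itself is still done in T-SQL. A minimal sketch (the schema and table names are just examples, and @role_name = null means no gating role):

use AdventureWorks2012;
go

-- Enable CDC on the database (requires sysadmin).
exec sys.sp_cdc_enable_db;
go

-- Enable CDC on a single table; the new SSIS 2012 CDC components can then read its change data.
exec sys.sp_cdc_enable_table
     @source_schema = N'Person',
     @source_name   = N'Person',
     @role_name     = null;
go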

So let's get started with project deployment.

Project deployment ->

Setting up a database snapshot

A database snapshot is a point-in-time view of a given database. Imagine that you have the AdventureWorks database and you want to have a view of the database as of one given date and time. In that situation you want a database snapshot of the AdventureWorks database.

The syntax is:

Create database adventureworks_yyyymmdd on
(Name = adventureworks, Filename = 'd:\sqltests\Snapshots\adventureworksSnapshot.sss')
as snapshot of adventureworks

What you have created now is a database snapshot that you can query like any other "database"; the only difference is that the size of this database starts at 0.

When you create a snapshot, it is empty until a page is changed. Before a page is changed in the source database, the original page is copied to the snapshot, and only then is the page changed.

So when you query a snapshot, the query first checks whether a page has changed since the snapshot was created. If so, the page is read from the snapshot; otherwise it is read from the original database file.
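Two typical things you do with a snapshot, sketched here under the assumption that the snapshot created above exists: query it like a normal (read-only) database, and, if needed, revert the source database to it.

-- Query the snapshot as if it was a normal database.
use adventureworks_yyyymmdd;
select top (10) * from Person.Address;

-- Revert the source database to the snapshot.
-- WARNING: this throws away every change made to adventureworks after the snapshot was taken.
use master;
restore database adventureworks
    from database_snapshot = 'adventureworks_yyyymmdd';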

Your first extended event

Extended Events are used to track down what happens in the SQL Server, and you can very easily set up an extended event to e.g. monitor what SQL statements a user is executing, what waits there are on the server, and so on.

I recommend that you try this demo first, and then read the pages on MSDN that introduce Extended Events (http://msdn.microsoft.com/en-us/library/bb630354%28v=sql.105%29.aspx).

Learning Extended Events is quite easy when you have a SQL Server 2008 (and in 2012 it gets even easier). It is a three-step process:

  1. Setting up your first extended event session.
  2. Understanding the output.
  3. Learning what events and actions you can use.

But let me start with a warning. Extended Events are lightweight and very efficient, but if you set it up wrongly and try to catch too many events and outputs, you can basically drain the performance of the SQL Server.

So let's get to work (you need to have the AdventureWorks database installed, and you can download it from here. You also need SQL Server 2008 or newer).
OK. Let's start by setting up an extended event that catches all SQL statements in the AdventureWorks database.

  1. Open a command prompt (go to Start and type CMD).
  2. Write MD c:\XEtest and hit Enter.

Extended Events requires a folder to store the XML files that are the output (at least we choose to output to a file in this example). The service account that runs the SQL Server must have write access to that folder.

We want to create an extended event session called testSQL, and therefore we must make sure that we drop it if it already exists on the server (so that we can set it up). Open Microsoft SQL Server Management Studio and start a new query. Write:

use master
go
IF EXISTS(SELECT * FROM sys.server_event_sessions WHERE name='testSQL')
DROP EVENT SESSION testSQL ON SERVER

This makes sure that we can setup a new SQL Extended event called testSQL, since we have just dropped the old one (if it existed).

Before we can set up the event session, we must find the ID of the database that we want to track SQL statements from. Execute the following statement and write down the number it returns (for me it is 30):

select DB_ID('adventureworks')

Now, let's try to set up the new extended event session. First of all we must decide what event to track, what action to set up and where to output. Let's write the code:

Create event session testSQL on server
add event sqlserver.sql_statement_completed
(    action (sqlserver.sql_text)
where sqlserver.database_id = 30)
add target package0.asynchronous_file_target
(set filename = N'C:\xetest\SQLTEST.xml'
, metadatafile = N'C:\xetest\SQLTESTmeta.xml'
)
with (max_dispatch_latency = 1 seconds)

 

The words I have written in red are the important ones, so let me try to explain.

First of all we create an event session, and the syntax is CREATE EVENT SESSION xxxxx ON SERVER, where xxxxx is the name.

To create the event session, we must add an EVENT to track, and in this situation it is an event called sqlserver.sql_statement_completed. In a later post I will go through some of the cool events that can be used.

When the event is added, we must add a target, and in this situation we just write to a file (I will also go through this in a later post).

Finally we can set some options with the WITH clause, and in our situation we have set a max_dispatch_latency of one second, which basically means that captured events stay in memory for a maximum of 1 second before being written to the target.

If you have set it up correctly, you can now start the event session:

alter event session testSQL on server
State = Start

Now the event session is started, and you can try to write some SQL statements:

use AdventureWorks
go

select *
from Person.Address

When you have done that, you can stop the event session:

alter event session testSQL on server
State = Stop

 

and we are ready to see what we have captured.

select data as xmldata
, data.value('(/event/action[@name="sql_text"]/value)[1]', 'varchar(max)') as sql_text
from (
select CONVERT(xml, event_data) as data
from sys.fn_xe_file_target_read_file(
'c:\xetest\sqltest*.xml'
, N'c:\xetest\sqltestmeta*.xml'
, null
, null
)
) entries

If you run this script, you will see two columns, one with some xml that you can look at, and one with the statement that has been executed.

And that is basically it. That is how you set up Extended Events. But of course there is more to it:

  1. What events can we capture?
  2. What actions can we do?
  3. Which targets can we send the output to?
  4. What options can we use?
  5. How can we format the output?

Especially number 5 is very interesting and difficult, but basically you now know how to set up an extended event.
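To get started on items 1 and 2 yourself, the available events and actions can be listed from the Extended Events DMVs. A small sketch:

-- All events you can capture.
select p.name as package_name, o.name as event_name, o.description
from sys.dm_xe_objects o
    join sys.dm_xe_packages p on o.package_guid = p.guid
where o.object_type = 'event'
order by p.name, o.name;

-- All actions you can attach to an event.
select p.name as package_name, o.name as action_name, o.description
from sys.dm_xe_objects o
    join sys.dm_xe_packages p on o.package_guid = p.guid
where o.object_type = 'action'
order by p.name, o.name;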

 

 

 

 

TSQL – introduction – Working with data (4 of 4)

At this point you should be able to select data and work with objects, and now we will very quickly look at three important statements. They are called DML statements (Data Manipulation Language):

  1. Insert
  2. Update
  3. Delete

First we create a new database and a table:

use master
go

create database tsqldemo
go

use tsqldemo
go

create table    firstTable    (ID        int        identity    primary key
                             , name    varchar(200)
                             , birthPlace    varchar(200)
                            )

If that works and you get "Command(s) completed successfully", then we can continue with data manipulation.

INSERT

Insert is very complex, and we will only look at the simplest form: inserting simple values.

The syntax is

Insert into firsttable (name, birthPlace) values ('Soren', 'Vejle')

The first (), where it says name and birthPlace, tells the system that you want to insert values into the columns name and birthPlace. We don't want to insert into the ID column, because it is an identity column and SQL Server takes care of that for us.

Then we have the word values and then the second (), where we have added the values (remember: text is put into single quotes ').

It’s so easy and you can try to write: Select * from firsttable. You should have one row.

UPDATE

Update has another syntax, so if I want to update firsttable, I can write:

Update firsttable
set name='Pernille'

It’s quite easy. You can also add a where clause:

Update firsttable
set name='Soren'
Where name='Pernille'

Then only the rows that meet the where clause are updated.

DELETE

Delete has the following syntax:

Delete firsttable
where name='Soren'

ONE IMPORTANT NOTE ABOUT DELETE!!!!!!!! Always write a delete as a select first, and then change Select * from xxxx to Delete xxxx. What I mean is: if I want to delete all entries from my table where name = 'Soren', then I would write:

Select *
from firsttable
where name='Soren'

and then change it to:

Delete firsttable
where name='Soren'

That's it. You now know the basics of TSQL. You have a lot to learn, but you have a good basis to move on. Use Books Online and Google to find more information, or just write to me on this blog, and I will try to help you.

<< Back to step 3

TSQL – introduction – Working with objects (3 of 4)

In this chapter you will learn about DDL queries. DDL (Data Definition Language) is used to create, drop or alter objects. An object could be a database, a table, a user and much more, but no matter what the object is, DDL is very easy: you just need to learn three words:

  1. CREATE
  2. ALTER
  3. DROP

When you know these words, you just need to learn the syntax for all the different types of objects. I will focus on databases and tables.

CREATE

Create is very simple, and if you want to create a new database, you simply write

create database tsqldemo

Please try that and press F5. Now you have a new database, and if you want to test that, clear the query window and write USE tsqldemo. Press F5. If you don't get any errors, the database is created, and you are now in the database, ready to write some queries.

Why is it difficult, then? Well, look at this link and see the full syntax for CREATE DATABASE. You can do quite a lot of things, and if you don't know how to configure things correctly, you can get into deep trouble. But don't worry. Now you at least know how to create a database.
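To give you an idea of what those options look like, here is a small sketch with explicit data and log files (the database name, paths and sizes are placeholders I made up for illustration; the tsqldemo database itself was already created above):

create database tsqldemo_with_options
on primary
    ( name = N'tsqldemo_data'
    , filename = N'C:\SQL\tsqldemo.mdf'        -- placeholder path
    , size = 100MB, filegrowth = 50MB )
log on
    ( name = N'tsqldemo_log'
    , filename = N'C:\SQL\tsqldemo_log.ldf'    -- placeholder path
    , size = 50MB, filegrowth = 25MB );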

Before we start creating new tables, we will make sure that you are in the right database. In the top left corner there is a drop-down menu, and it should show the name of your database. It shows which database you are writing queries against:

As you can see, I am in the wrong database, so I need to:

  1. Use the dropdown to find my database OR
  2. write USE TSQLDEMO

Now I am ready to create the first table, and I write:

use tsqldemo
go

create table firstTable    (ID            int                identity        primary key
                         , name        varchar(200)
                        )    

I don't need to write use tsqldemo and go if I am already in the right database, but I am not :-)

As you might see, we again just write CREATE object type, object name. The object type is table and the object name is firstTable. Click on this link to see the full documentation for CREATE TABLE.

Now I know that a table definition should always include (), and within the brackets you always write the columns that you want.

I have two columns, ID and name. ID has the data type int (integer), and name has the data type varchar and can be up to 200 characters long.

The ID column is an identity, which means that SQL Server fills in the value and adds 1 for each new row. That means: the first row is 1, the next is 2 and so on.

The ID column is also a primary key, which means that you can only have one row with the same ID (one row with ID 1 and so on).

Now you have all the objects you need to start storing data (but we will wait a little while with that).

ALTER

Alter is when you want to change an existing object; that could be adding a column to a table, changing a property on a database or something else.

If I want to add a column to my table, I just write:

alter table firstTable
add  birthPlace    varchar(200);

DROP

Drop removes an object, BUT please note that we are not talking about DELETING an object, because we DELETE DATA but DROP OBJECTS. So if I want to drop my table, I write:

drop table firstTable

If I want to drop my database I write:

use master 
go

drop database tsqldemo

When dropping a database, it is important to know that you must close all connections to it first. When testing, the easiest way is simply to write USE master first, so that you yourself are not sitting in the database you want to drop.
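If other connections are still using the database, the drop will fail. A common pattern (my own addition, use it with care since it kicks everybody else out) is to force the database into single user mode first:

use master
go

-- Roll back any open transactions and disconnect other users, then drop.
alter database tsqldemo set single_user with rollback immediate;
drop database tsqldemo;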

That’s it. You are now ready to learn to fill in data.

<< Go back to blog 2     Go to blog 4 >>