Microsoft Business Intelligence TOOLS(SSIS,SSRS and SSAS): 3 Nov 2010

Wednesday, 3 November 2010

How to Use Merge Statement

We often need to perform operation that include

1. INSERT Record if it does not exits in Target Table

2. UPDATE or DELETE Record if it already exists in Target Table

There are different ways to perform above operation like using IF, CASE construct etc but none method is better than MERGE statement. In this article, we are going to learn how to use MERGE statement.

Scenario Detail

1. Update Record if Found in Target table

2. Insert Record if Not Found in Target Table

3. Mark Record as Deleted if Deleted at Source

Database Objects and Data used in example

Source Table : HumanResource.Department

CREATE TABLE [HumanResources].[Department]

(

[DepartmentID] [smallint] IDENTITY(1,1) NOT NULL,

[Name] [dbo].[Name] NOT NULL,

[GroupName] [dbo].[Name] NOT NULL,

[ModifiedDate] [datetime] NOT NULL,

CONSTRAINT [PK_Department_DepartmentID] PRIMARY KEY CLUSTERED

(

[DepartmentID] ASC

)

) ON [PRIMARY]

Target Table : Department_Copy (without IDENTITY property but with Additional column bit type named Deleted)

Key Column Name: Name

Data in Target Table:

Name GroupName ModifiedDate Deleted
Engineering Research and Development 6/1/1998 0
Tool Design Research and Development 6/1/1998 0
Sales Sales and Marketing 6/1/1998 0
Marketing Sales and Marketing 6/1/1998 0
Purchasing Inventory Management 6/1/1998 0
Research and Development Research and Development 6/1/1998 0
Production Manufacturing 6/1/1998 0
Production Control Manufacturing 6/1/1998 0
Human Resources Executive General and Administration 6/1/1998 0
Finance Executive General and Administration 6/1/1998 0

How to use MERGE

Scenario #1

Lets update GroupName for Finance department from "Executive General and Administration" to "Accounts and Finance" using following query:

update HumanResources.Department

set GroupName = 'Accounts and Finance'

where Name = 'Finance'

Now Lets Sync-up Department_Copy table with HumanResources.Department:

MERGE Department_Copy AS deptCopy

USING (SELECT * FROM HumanResources.Department)AS dept

ON (deptCopy.Name = dept.Name)

WHEN MATCHED then

update

set

deptCopy.GroupName = dept.GroupName

,ModifiedDate = getDate()

; -- A MERGE statement must be terminated by a semi-colon (;)

After you execute above statement, you will notice that Department_Copy table has updated value for Department "Finance" with modification date:

Name GroupName ModifiedDate Deleted
Finance Accounts and Finance 4/26/2010 0

Scenario #2

Lets insert new record in HumanResource.Department

insert into HumanResources.Department (Name, GroupName)

values ('Document Control','Quality Assurance')

Now Lets Sync-up Department_Copy table with HumanResources.Department if any new record entered then add in Department_Copy without removing flexibility of changing data if any updated:

MERGE Department_Copy AS deptCopy

USING (SELECT * FROM HumanResources.Department)AS dept

ON (deptCopy.Name = dept.Name)

WHEN MATCHED then

update

set

deptCopy.GroupName = dept.GroupName

,ModifiedDate = getDate()

--following line will add new record if not found in Department_Copy table

WHEN NOT MATCHED BY TARGET THEN

insert

(

DepartmentID

,Name

,GroupName

,ModifiedDate

)

values

(

DepartmentID

,Name

,GroupName

,ModifiedDate

)

;

Scenario #3

Lets delete record in HumanResource.Department but do not delete record in Department_Copy table while synchronizing but need to identity if record was deleted in HumanResoource.Department table

In order to achieve this, I have added an extra column in Department_Copy table named Deleted using bit data type. Now lets examine by performing followings:

delete HumanResources.Department

where Name = 'Finance'

Now, lets synchronize Department_Copy with HumanResource.Department which can flag deleted record along with providing solution for 1st two scenario tested above:

MERGE Department_Copy AS deptCopy

USING (SELECT * FROM HumanResources.Department)AS dept

ON (deptCopy.Name = dept.Name)

WHEN MATCHED then

update

set

deptCopy.GroupName = dept.GroupName

,ModifiedDate = getDate()

--following line will add new record if not found in Department_Copy table

WHEN NOT MATCHED BY TARGET THEN

insert

(

Name

,GroupName

,ModifiedDate

)

values

(

Name

,GroupName

,ModifiedDate

)

--following line will update Deleted column with value 1 if Record(s) is/are deleted in HumanResource.Department table

WHEN NOT MATCHED BY SOURCE THEN

update

set Deleted = 1

;

Result of Scenario # 3 will changed valued for Deleted column for Finance Department as shown below:

Name GroupName ModifiedDate Deleted
Finance Accounts and Finance 4/26/2010 1

Conclusion
MERGE Statement can be used to perform INSERT, UPDATE and DELETE all in single statement

Table Partitioning

SQL Server Database Table Partitioning technique is fully available in SQL Server 2005 and SQL Server 2008 but logically it was available in SQL Server 2000 as a "Partitioned View". Partitioned View had several limitations. Lets looks at few facts of Partitioned View implementation:

Partitioning a table horizontally by replacing original table with several smaller member tables.
Data separation loaded into table was based on CHECK Constraint created on one of their column
In case of distributed partitioned view, each member table is an individual member server
You can not exceed 256 member tables
One member table can not have more than one range.
If you are implementing distributed partitioned view then add a linked server
Any linked server cannot be a loopback linked server, that is, a linked server that points to the same instance of SQL Server
Data type mapping limitation in Distributed partitioned view for example smallmoney columns in remote tables is mapped as money.
Data placement has to be planned properly as it is not controlled automatically
Partitioned views are transparent to the application
Limitations of data types and column types.
Having views on every table in your database could be a maintenance nightmare
SQL Server 2005 and 2008 extended horizontal partitioning concept that 2000 was using by allowing partition placement on the same table within a single database. Hence, planning, implementation and maintenance is very easy and manageable. It also improves the performance, reduce contention and increase availability of data.

How to implement Partitioning in SQL Server 2005/2008?

Lets assume you have Sales database that has a table named SalesLog. Your goal is to to keep 3 years of data at any given point and purge old data. You also know that your mostly report requires data for 6 month to analyze sales detail. There is no update or delete operation on table. Table structure is:

SaleDate datetime,

.....................
.....................

Below is step by step to implement partitioning on SalesLog table to meet requirement and also to gain performance.

Step 1

As we identified, 3 years is life cycle of data, mostly report requires 6 month of data and table has only insert and select operation. So, it would be ideal to have 7 partitions with separate file groups. Main Reasons:

Performance
Data Backup Strategy. Since, data is static/read only so we can implement File group backup instead of performing full backup. Number of filegroups and files per database could be up to 32767 in both 32 bit and 64 bit SQL Server
Lets Alter database to add filegroups using command below:

Alter Database Sales Add FileGroup [PartitionFG1]
Alter Database Sales Add FileGroup [PartitionFG2]
Alter Database Sales Add FileGroup [PartitionFG3]
Alter Database Sales Add FileGroup [PartitionFG4]
Alter Database Sales Add FileGroup [PartitionFG5]
Alter Database Sales Add FileGroup [PartitionFG6]
Alter Database Sales Add FileGroup [PartitionFG7]

Step 2

Add files to filegroups:

Alter Database Sales Add File
(
Name = 'Partition_Data1',
FileName = 'F:\MSSQL\Data\Partition_Data1.ndf',
Size = 10240 MB
)
To FileGroup [PartitionFG1]

.
.
.
.
.
.
.

Alter Database Sales Add File
(
Name = 'Partition_Data7',
FileName = 'F:\MSSQL\Data\Partition_Data7.ndf',
Size = 10240 MB
)
To FileGroup [PartitionFG7]

Step 3

Create Partition Function to define data range. It will map row of SalesLog table into partitions based on the values of SaleDate column.

use Sales
GO
Create Partition Function SalesLogPFN (datetime)
as
range RIGHT for values
(
'2008-01-01',
'2008-07-01',
'2009-01-01',
'2009-07-01',
'2010-01-01',
'2010-07-01'
)

Data Type specified in Partition Function must be the same as Partition Column in Table that you are going to partition. Partition function as such is not directly related to any table. Possible value for range is RIGHT and LEFT. In our case, 1st Partition will contain rows that has SaleDate less than 2008-01-01, 2nd Partition will have row starting from 2008-01-01 up to 2008-06-30 23:59:59:900 and similarly 3rd and others.

Step 4

Create Partition Scheme to link partition function to proper file groups that we have created as part of Step 2. So, in our case, we created 7 file groups which is correct as we have 6 value for partition function.

use Sales
GO

Create Partition Scheme SalesLogScheme
AS
PARTITION SalesLogPFN
TO ([PartitionFG1], [PartitionFG2], [PartitionFG3], [PartitionFG4], [PartitionFG5], [PartitionFG6], [PartitionFG7])

Step 5

Now, we need to create clustered Index on SaleDate column that will place data to corresponding file groups automatically based on definition of SalesLogScheme

CREATE CLUSTERED INDEX [IDX_SalesLog_SaleDate] ON [dbo].[SalesLog]([SaleDate]) ON SalesLogScheme ([SaleDate])

It is important to create clustered index aligned which is by creating index on column that is partitioned. If a clustered index is not aligned then any merging of partition will require us to drop the clustered index along with the non-clustered indexes and rebuild them.

Now your are all set.

Conclusion

Partitioning a table divides the table and its indexes into smaller chunks so that it can be easily maintained as maintenance operation can be applied partition by partition basis and also improves query performance as optimizer can use proper queries to appropriate partitions instead of querying entire table.

Change Data Capture

It is new feature in SQL Server 2008 that provides feature to track all data changes without writing triggers. CDC process works using transaction log. When table is enabled for CDC immediately after any changes made to table is available for tracking. Meta data change is also written along with the actual change.

Before we start looking into implementation, lets look at pre-requisition:

SQL Server 2008 Enterprise Edition

Sysadmin permission required at database level to enable CDC

dbo permission required to enable CDC at table level

Primary Key required in table only when net change is required

Now, lets start how to implement it:

I have not used any particular database name or table name while writing step by step implementation guide. Hence please follow syntax pattern with actual name for value indicated within <> brackets

Step 1: Enable CDC at database Level

Step 2: Enable CDC at Table Level

You are all set to start using changed data tracked.

You may visit http://technet.microsoft.com/en-us/library/bb510744.aspx for list of functions available for CDC

Implementation Overview

Conclusion

CDC is an excellent feature in SQL Server 2008 but it is very important to use it with proper planning. A lot many number of records will be added to the tracked table so ensure you have enough space to store changed data.

It is new feature in SQL Server 2008 that provides feature to track all data changes without writing triggers. CDC process works using transaction log. When table is enabled for CDC immediately after any changes made to table is available for tracking. Meta data change is also written along with the actual change.

Before we start looking into implementation, lets look at pre-requisition:

SQL Server 2008 Enterprise Edition

Sysadmin permission required at database level to enable CDC

dbo permission required to enable CDC at table level

Primary Key required in table only when net change is required

Now, lets start how to implement it:

I have not used any particular database name or table name while writing step by step implementation guide. Hence please follow syntax pattern with actual name for value indicated within <> brackets

Step 1: Enable CDC at database Level

Step 2: Enable CDC at Table Level

You are all set to start using changed data tracked.

You may visit http://technet.microsoft.com/en-us/library/bb510744.aspx for list of functions available for CDC

Implementation Overview

Conclusion

CDC is an excellent feature in SQL Server 2008 but it is very important to use it with proper planning. A lot many number of records will be added to the tracked table so ensure you have enough space to store changed data.

What is difference between Clustered Index and Non Clustered Index?

- A Clustered Index consists of index as well as data pages. Clustered Index is not just an index but also contains the table data. A clustered index is organized as a B-tree where the non-leaf nodes are index pages and the leaf nodes are data pages.

- A Non-clustered index is organized as a B-tree but it consists of only index pages. The leaf nodes in a non-clustered index are not data pages, but contains pointer for individual rows in a data pages.

Microsoft Business Intelligence TOOLS(SSIS,SSRS and SSAS)

http://reddymsbitools.blogspot.com

Wednesday, 3 November 2010

How to Use Merge Statement

Table Partitioning

Change Data Capture

What is difference between Clustered Index and Non Clustered Index?

Followers

Popular Posts

Blog Archive