http://reddymsbitools.blogspot.com

Sunday, 28 November 2010

Deleting the duplicate records in sql sevrer by using CTE Expression

As I have mentioned in one of my previous tips that using ROW_NUMBER() we can get the serial number which is here.
 
Using the same ROW_NUMBER() function you can also easily delete the Duplicate Records present inside a table.

 
Lets say I have a Users Table which is having duplicate records. I need to delete the duplicate records which are having both the FirstName and LastName same.

 

 
Table: UsersList

 
FirstName       LastName       PhoneNumber       Address
------------    ------------     ----------------    -------------
Rashmita        Devi                987554437           NULL
Rashmita        Devi                446576578           NULL
Adwin            Ratzz              222222222           NULL
Charle            Hardwick         343534545           NULL
Adwin            Ratzz              576767688           NULL
Charle            Hardwick         877778777           NULL

 

 
So, with the help of ROW_NUMBER() all the duplicate records present in the above table can be removed easily.
The query will be composed this way:-

 
WITH TempUsers (FirstName,LastName, duplicateRecordCount)
AS
(
SELECT FirstName,LastName,
ROW_NUMBER()OVER(PARTITIONBY FirstName, LastName ORDERBY FirstName) AS duplicateRecordCount
FROM dbo.Users
)
DELETE
FROM TempUsers
WHERE duplicateRecordCount > 1
GO

 
 
Instead of TempUsers you can give any name. Because this is used only for Temporary purpose.
 
After the execution of the above query, Users table will have the following records.

 
FirstName          LastName        PhoneNumber         Address
------------       ------------      ----------------      -------------
Rashmita           Devi                987554437              NULL
Adwin               Ratzz              222222222             NULL
Charle               Hardwick         877778777             NULL
 
 

 
Note: Row_Number() function is present in SQL Server 2005 and in later version(s).

Thursday, 18 November 2010

Incremental Loading & Batch Processing?

Incremental Loading:Its also called retrospective data load.In this data being loaded on a regular inerval and only new data is moved.
eg in ur data warehouse ur loading "Transactional Data" on daily basis and its happning only for new Transactions because old Data is already in the Data warehouse.
Batch Processing:
Its processing of Jobs/Task together.
eg if u have a staging area u may have valid reason to get data for different Departments there first and then move all that in one shot to production.

Difference between ODS and Staging

---Scenario 1#
An operational data store (ODS) is a type of database often 
used as an interim area for a data warehouse. Unlike a data 
warehouse which contains static data the contents of the 
ODS are updated through the course of business operations. 
An ODS is designed to quickly perform relatively simple 
queries on small amounts of data (such as finding the 
status of a customer order) rather than the complex 
queries on large amounts of data typical of the data 
warehouse. An ODS is similar to your short term memory in 
that it stores only very recent information; in comparison 
the data warehouse is more like long term memory in that it 
stores relatively permanent information.

But in staging we are storing current as well as historic 
data. This data might be a raw and then need cleansing and 
transform before load into datawarehouse.
--Scenario 2#
ODS keeps the transactional data from where direct reporting is possible. The data in ODS can be partly denormalized.
Staging area is the exact replica of the transactional database which acts as a source for datawarehouse processing.

Types Of Facts?

# Additive: Additive facts are facts that can be summed up through all of the dimensions in the fact table.
# Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table but not the others.
# Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.

Fact vs Dimension Table

A fact table captures the data that measures the organization's business operations. A fact table might contain business sales events such as cash register transactions or the contributions and expenditures of a nonprofit organization. Fact tables usually contain large numbers of rows sometimes in the hundreds of millions of records when they contain one or more years of history for a large organization.

Dimension tables contain attributes that describe fact records in the fact table. Some of these attributes provide descriptive information; others are used to specify how fact table data should be summarized to provide useful information to the analyst.

Explain degenerated dimension in detail.

 Degenerate dimension: A column of the key section of the fact table that does not have the associated dimension table but used for reporting and analysis such column is called degenerate dimension or line item dimension.For ex we have a fact table with customer_id product_id branch_id employee_id bill_no date in key section and price quantity amount in measure section. In this fact table bill_no from key section is a single value it has no associated dimension table. Instead of cteating a seperate dimension table for that single value we can include it in fact table to improve performance.SO here the column bill_no is a degenerate dimension or line item dimension
-----Scenario 2
Degenerated Dimension is achieved through a gradual modeling approach following Dimensional Modeling standards. Let's take example of a Star Schema representing Sales Invoices. The FACT would have the "Invoiced Amount" as primary measure. Now when we look at the source of the Invoice, it is the body if the Paper Invoice that gives us the following particulars about each Invoice:
Invoice Date
Customer ID
Products within the Invoice
Reference to Order Number(s)
Invoice Number
Invoice Line Numbers (which are multiple lines in single Invoice)
Invoice Line Amount
Invoice Total Amount
When we model the above following Dimensional Modeling standards, we get following distinct Dimensions:
Calendar Dimension - representing the Invoice Date
Customer Dimension - representing Customer ID
Product Dimension - representing Products within the Invoice
Order Dimension - representing Orders
Invoice Dimension รข€“ representing Invoice Number & Invoice Line Numbers
Question comes - what attributes would be left to be part of the INVOICE DIMENSION, if at all we decide to have one! Only candidate attributes are Invoice Number and Invoice Line Numbers. But, this is at the granularity of the FACT, which stores references to all above said Dimensions as well as the measures i.e. Invoice Line Amount, Invoice Total Amount (Derived by aggregation).
It is at this situation, we may decide to degenerate the attributes Invoice Number & Invoice Line Number into the Fact and avoid having a distinct entity to represent Invoice Number / Line Numbers as a Dimension. What we achieve by this:
1. avoiding a huge join as both Fact and this Dimension would have the same granularity,
2. still able to query with Invoice Number as the entry point
So, when such a scenario appears, we make the left out attributes (i.e. Invoice Number & Invoice Line Number in our case) part of the Fact and part of the Primary Key in the Fact. This is why and how we model Degenerated Dimensions.

Sql server Database Backup's

OverviewThere are only two commands for backup, the primary is BACKUP DATABASE.  This allows you to do a complete backup of your database as well as differential, file, etc. backups depending on the options that you use.
ExplanationThe BACKUP DATABASE command gives you many options for creating backups.  Following are different examples.
Create a full backup to diskThe command is BACKUP DATABASE databaseName.  The "TO DISK" option specifies that the backup should be written to disk and the location and filename to create the backup is specified.
BACKUP DATABASE AdventureWorks 
TO DISK = 'C:\AdventureWorks.BAK'
GO
Create a differential backupThis command adds the "WITH DIFFERENTIAL" option.
BACKUP DATABASE AdventureWorks 
TO DISK = 'C:\AdventureWorks.BAK' 
WITH DIFFERENTIAL 
GO
Create a file level backupThis command uses the "WITH FILE" option to specify a file backup.  You need to specify the logical filename within the database which can be obtained by using the command sp_helpdb 'databaseName', specifying the name of your database.
BACKUP DATABASE TestBackup FILE = 'TestBackup' 
TO DISK = 'C:\TestBackup_TestBackup.FIL'
GO
Create a filegroup backupThis command uses the "WITH FILEGROUP" option to specify a filegroup backup.  You need to specify the filegroup name from the database which can be obtained by using the command sp_helpdb 'databaseName', specifying the name of your database.
BACKUP DATABASE TestBackup FILEGROUP = 'ReadOnly' 
TO DISK = 'C:\TestBackup_ReadOnly.FLG'
GO
Create a full backup to multiple disk filesThis command uses the "DISK" option multiple times to write the backup to three equally sized smaller files instead of one large file.
BACKUP DATABASE AdventureWorks 
TO DISK = 'C:\AdventureWorks_1.BAK',
DISK = 'D:\AdventureWorks_2.BAK',
DISK = 'E:\AdventureWorks_3.BAK'
GO
Create a full backup with a passwordThis command creates a backup with a password that will need to be supplied when restoring the database.
BACKUP DATABASE AdventureWorks 
TO DISK = 'C:\AdventureWorks.BAK'
WITH PASSWORD = 'Q!W@E#R$'
GO
Create a full backup with progress statsThis command creates a full backup and also displays the progress of the backup.  The default is to show progress after every 10%.
BACKUP DATABASE AdventureWorks 
TO DISK = 'C:\AdventureWorks.BAK'
WITH STATS
GO
Here is another option showing stats after every 1%.
BACKUP DATABASE AdventureWorks 
TO DISK = 'C:\AdventureWorks.BAK'
WITH STATS = 1
GO
Create a backup and give it a descriptionThis command uses the description option to give the backup a name.  This can later be used with some of the restore commands to see what is contained with the backup.  The maximum size is 255 characters.
BACKUP DATABASE AdventureWorks 
TO DISK = 'C:\AdventureWorks.BAK'
WITH DESCRIPTION = 'Full backup for AdventureWorks'
GO
Create a mirrored backupThis option allows you to create multiple copies of the backups, preferably to different locations.
BACKUP DATABASE AdventureWorks 
TO DISK = 'C:\AdventureWorks.BAK'
MIRROR TO DISK =  'D:\AdventureWorks_mirror.BAK'
WITH FORMAT
GO
Specifying multiple optionsThis next example shows how you can use multiple options at the same time.
BACKUP DATABASE AdventureWorks 
TO DISK = 'C:\AdventureWorks.BAK'
MIRROR TO DISK =  'D:\AdventureWorks_mirror.BAK'
WITH FORMAT, STATS, PASSWORD = 'Q!W@E#R$'
GO

Tuesday, 16 November 2010

Google News Blog: Add Google News to Your Site

Google News Blog: Add Google News to Your Site

Differences between SQL Server temporary tables and table variables

Temporary Tables

Temporary tables are created in tempdb. The name "temporary" is slightly misleading, for even though the tables are instantiated in tempdb, they are backed by physical disk and are even logged into the transaction log. They act like regular tables in that you can query their data via SELECT queries and modify their data via UPDATE, INSERT, and DELETE statements. If created inside a stored procedure they are destroyed upon completion of the stored procedure. Furthermore, the scope of any particular temporary table is the session in which it is created; meaning it is only visible to the current user. Multiple users could create a temp table named #TableX and any queries run simultaneously would not affect one another - they would remain autonomous transactions and the tables would remain autonomous objects. You may notice that my sample temporary table name started with a "#" sign. This is the identifier for SQL Server that it is dealing with a temporary table.

The syntax for creating a temporary table is identical to creating a physical table in Microsoft SQL Server with the exception of the aforementioned pound sign (#):

CREATE TABLE #TempTable

(

ID INT NOT NULL,

Name VARCHAR(10),

DOB DATETIME

)


Table Variables

The syntax for creating table variables is quite similar to creating either regular or temporary tables. The only differences involve a naming convention unique to variables in general, and the need to declare the table variable as you would any other local variable in Transact SQL:

DECLARE @TableVariable TABLE (

ID INT NOT NULL,

Name VARCHAR(10),

DOB DATETIME

)

As you can see the syntax bridges local variable declaration (DECLARE @variable_name variable_data_type) and table creation (column_name, data_type, nullability). As with any other local variable in T-SQL, the table variable must be prefixed with an "@" sign. Unlike temporary or regular table objects, table variables have certain clear limitations.

* Table variables can not have Non-Clustered Indexes
* You can not create constraints in table variables
* You can not create default values on table variable columns
* Statistics can not be created against table variables



Similarities with temporary tables include:

* Instantiated in tempdb
* Clustered indexes can be created on table variables and temporary tables
* Both are logged in the transaction log
* Just as with temp and regular tables, users can perform all Data Modification Language (DML) queries against a table variable: SELECT, INSERT, UPDATE, and DELETE.


Differences between SQL Server temporary tables and table variables

There are three major theoretical differences between temporary tables And table variables

* The first difference is that transaction logs are not recorded for the table variables. Hence, they are out of scope of the transaction mechanism

After declaring our temporary table #T and our table-variable @T, we assign each one with the same "old value" string. Then, we begin a transaction that updates their contents. At this point, both will now contain the same "new value" string. But when we rollback the transaction, as you can see, the table-variable @T retained its value instead of reverting back to the "old value" string. This happened because, even though the table-variable was updated within the transaction, it is not a part of the transaction itself.

* The second major difference is that any procedure with a temporary table cannot be pre-compiled, while an execution plan of procedures with table variables can be statically compiled in advance. Pre-compiling a script gives a major advantage to its speed of execution. This advantage can be dramatic for long procedures, where recompilation can be too pricy.

* Finally, table variables exist only in the same scope as variables. Contrary to the temporary tables, they are not visible in inner stored procedures and in exec(string) statements. Also, they cannot be used in an insert/exec statement.

Difference between a "where" clause and a "having" clause.

Having clause is used only with group functions whereas Where is not used with.

SQL Server consists of fourdatabases by default.

Master It contains system catalogs that keep information about disk space, file allocations, usage, system wide configuration settings, login accounts, the existence of other database, and the existence of other SQL Servers (for distributed operations).
Model It is a simply a template database. Every time you create a new database, SQL Server makes a copy of model to form the basis of the new database.
Tempdb Temporary database, tempdb, is a workspace. SQL Serverรข€™s tempdb database is unique among all other databases because it is recreated not recovered every time SQL Server is started.
Msdb This database is used by the SQL Server Agent Service, which performs scheduled activities such as backups and replication tasks.

Database Files: A database file is nothing more than an operating system file. SQL Server 2000 allows the following three types of database files:

ร‚·Primary data files (.mdf)
ร‚·Secondary data files (.ndf)
ร‚·Log files (.ldf)

When we create a new user database, SQL Server copies the model database (includes 19 system tables and 2 system views (for backward compatibility)). A new user database must be 1MB or greater in size.

We can create a new database using the following command: CREATE DATABASE newdb

Types of Backups:

* Full Backup
* Differential Backup
* Log Backup

Types of User Defined Functions

User defined Functions in SQL

User Defined Functions are compact pieces of Transact SQL code, which can accept parameters, and return either a value, or a table. Data transformation and reference value retrieval are common uses for functions. User Defined Functions enable the developer or DBA to create functions of their own, and save them inside SQL Server.

Advantages of User Defined Functions

Before SQL 2000, User Defined Functions (UDFs), were not available. Stored Procedures were often used in their place.

One of the advantages of User Defined Functions over Stored Procedures, is the fact that a UDF can be used in a Select, Where, or Case statement. They also can be used to create joins. In addition, User Defined Functions are simpler to invoke than Stored Procedures from inside another SQL statement.

Disadvantages of User Defined Functions

User Defined Functions cannot be used to modify base table information. The DML statements INSERT, UPDATE, and DELETE cannot be used on base tables.

Another disadvantage is that SQL functions that return non-deterministic values are not allowed to be called from inside User Defined Functions.

GETDATE is an example of a non-deterministic function. Every time the function is called, a different value is returned. Therefore, GETDATE cannot be called from inside a UDF you create.

Types of User Defined Functions

There are three different types of User Defined Functions. Each type refers to the data being returned by the function.

1. Scalar functions return a single value.
2. In Line Table functions return a single table variable that was created by a select statement.
3. The final UDF is a Multi-statement Table Function. This function returns a table variable whose structure was created by hand, similar to a Create Table statement. It is useful when complex data manipulation inside the function is required.


Example for Scalar-valued Function

CREATE FUNCTION dbo.DateOnly
(
@DateTime datetime
)
RETURNS varchar(10)
AS
BEGIN
DECLARE @Output varchar(10)
SET @Output = CONVERT(varchar(10),@DateTime,101)
RETURN @Output
END

To call the function, execute : SELECT dbo.DateOnly(GETDATE())


Example for Inline Table-valued Function

CREATE FUNCTION dbo.FindNamesBy
(
@Name varchar(10)
)
RETURNS TABLE
AS
BEGIN
RETURN SELECT * FROM Employee
WHERE EmpName LIKE '%' + @Name
END

To use the above function, execute : SELECT * FROM dbo.FindNamesBy('Cherukuri')


Example for Multi statement Table-valued Function

CREATE FUNCTION dbo.MultiLineFunction
(
@Name varchar(10)
)
RETURNS @Result TABLE
(
Empname varchar(20),
HireDate datetime,
OnProbation char(1)
)
AS
BEGIN
INSERT INTO @Result (EmpName, HireDate)
SELECT Empname, HireDate FROM Employee
WHERE EmpName LIKE '%' + @Name

UPDATE @Result SET OnProbation = 'N'

UPDATE @Result SET OnProbation = 'Y'

WHERE HireDate < '11/16/2002'

RETURN
END

To use the above function, execute :
SELECT EmpName,HireDate,OnProbation FROM dbo.MultiLineFunction('Ch')

Wednesday, 3 November 2010

How to Use Merge Statement

We often need to perform operation that include

1. INSERT Record if it does not exits in Target Table

2. UPDATE or DELETE Record if it already exists in Target Table

There are different ways to perform above operation like using IF, CASE construct etc but none method is better than MERGE statement. In this article, we are going to learn how to use MERGE statement.

Scenario Detail

1. Update Record if Found in Target table

2. Insert Record if Not Found in Target Table

3. Mark Record as Deleted if Deleted at Source



Database Objects and Data used in example

Source Table : HumanResource.Department

CREATE TABLE [HumanResources].[Department]

(

[DepartmentID] [smallint] IDENTITY(1,1) NOT NULL,

[Name] [dbo].[Name] NOT NULL,

[GroupName] [dbo].[Name] NOT NULL,

[ModifiedDate] [datetime] NOT NULL,

CONSTRAINT [PK_Department_DepartmentID] PRIMARY KEY CLUSTERED

(

[DepartmentID] ASC

)

) ON [PRIMARY]

Target Table : Department_Copy (without IDENTITY property but with Additional column bit type named Deleted)

Key Column Name: Name



Data in Target Table:

Name GroupName ModifiedDate Deleted
Engineering Research and Development 6/1/1998 0
Tool Design Research and Development 6/1/1998 0
Sales Sales and Marketing 6/1/1998 0
Marketing Sales and Marketing 6/1/1998 0
Purchasing Inventory Management 6/1/1998 0
Research and Development Research and Development 6/1/1998 0
Production Manufacturing 6/1/1998 0
Production Control Manufacturing 6/1/1998 0
Human Resources Executive General and Administration 6/1/1998 0
Finance Executive General and Administration 6/1/1998 0

How to use MERGE

Scenario #1

Lets update GroupName for Finance department from "Executive General and Administration" to "Accounts and Finance" using following query:

update HumanResources.Department

set GroupName = 'Accounts and Finance'

where Name = 'Finance'

Now Lets Sync-up Department_Copy table with HumanResources.Department:

MERGE Department_Copy AS deptCopy

USING (SELECT * FROM HumanResources.Department)AS dept

ON (deptCopy.Name = dept.Name)

WHEN MATCHED then

update

set

deptCopy.GroupName = dept.GroupName

,ModifiedDate = getDate()

; -- A MERGE statement must be terminated by a semi-colon (;)

After you execute above statement, you will notice that Department_Copy table has updated value for Department "Finance" with modification date:

Name GroupName ModifiedDate Deleted
Finance Accounts and Finance 4/26/2010 0

Scenario #2

Lets insert new record in HumanResource.Department

insert into HumanResources.Department (Name, GroupName)

values ('Document Control','Quality Assurance')

Now Lets Sync-up Department_Copy table with HumanResources.Department if any new record entered then add in Department_Copy without removing flexibility of changing data if any updated:

MERGE Department_Copy AS deptCopy

USING (SELECT * FROM HumanResources.Department)AS dept

ON (deptCopy.Name = dept.Name)

WHEN MATCHED then

update

set

deptCopy.GroupName = dept.GroupName

,ModifiedDate = getDate()

--following line will add new record if not found in Department_Copy table

WHEN NOT MATCHED BY TARGET THEN

insert

(

DepartmentID

,Name

,GroupName

,ModifiedDate

)

values

(

DepartmentID

,Name

,GroupName

,ModifiedDate

)

;

Scenario #3

Lets delete record in HumanResource.Department but do not delete record in Department_Copy table while synchronizing but need to identity if record was deleted in HumanResoource.Department table

In order to achieve this, I have added an extra column in Department_Copy table named Deleted using bit data type. Now lets examine by performing followings:

delete HumanResources.Department

where Name = 'Finance'

Now, lets synchronize Department_Copy with HumanResource.Department which can flag deleted record along with providing solution for 1st two scenario tested above:

MERGE Department_Copy AS deptCopy

USING (SELECT * FROM HumanResources.Department)AS dept

ON (deptCopy.Name = dept.Name)

WHEN MATCHED then

update

set

deptCopy.GroupName = dept.GroupName

,ModifiedDate = getDate()

--following line will add new record if not found in Department_Copy table

WHEN NOT MATCHED BY TARGET THEN

insert

(

Name

,GroupName

,ModifiedDate

)

values

(

Name

,GroupName

,ModifiedDate

)

--following line will update Deleted column with value 1 if Record(s) is/are deleted in HumanResource.Department table

WHEN NOT MATCHED BY SOURCE THEN

update

set Deleted = 1

;

Result of Scenario # 3 will changed valued for Deleted column for Finance Department as shown below:

Name GroupName ModifiedDate Deleted
Finance Accounts and Finance 4/26/2010 1



Conclusion
MERGE Statement can be used to perform INSERT, UPDATE and DELETE all in single statement

Table Partitioning

SQL Server Database Table Partitioning technique is fully available in SQL Server 2005 and SQL Server 2008 but logically it was available in SQL Server 2000 as a "Partitioned View". Partitioned View had several limitations. Lets looks at few facts of Partitioned View implementation:

Partitioning a table horizontally by replacing original table with several smaller member tables.
Data separation loaded into table was based on CHECK Constraint created on one of their column
In case of distributed partitioned view, each member table is an individual member server
You can not exceed 256 member tables
One member table can not have more than one range.
If you are implementing distributed partitioned view then add a linked server
Any linked server cannot be a loopback linked server, that is, a linked server that points to the same instance of SQL Server
Data type mapping limitation in Distributed partitioned view for example smallmoney columns in remote tables is mapped as money.
Data placement has to be planned properly as it is not controlled automatically
Partitioned views are transparent to the application
Limitations of data types and column types.
Having views on every table in your database could be a maintenance nightmare
SQL Server 2005 and 2008 extended horizontal partitioning concept that 2000 was using by allowing partition placement on the same table within a single database. Hence, planning, implementation and maintenance is very easy and manageable. It also improves the performance, reduce contention and increase availability of data.

How to implement Partitioning in SQL Server 2005/2008?

Lets assume you have Sales database that has a table named SalesLog. Your goal is to to keep 3 years of data at any given point and purge old data. You also know that your mostly report requires data for 6 month to analyze sales detail. There is no update or delete operation on table. Table structure is:

SaleDate datetime,

.....................
.....................



Below is step by step to implement partitioning on SalesLog table to meet requirement and also to gain performance.

Step 1

As we identified, 3 years is life cycle of data, mostly report requires 6 month of data and table has only insert and select operation. So, it would be ideal to have 7 partitions with separate file groups. Main Reasons:

Performance
Data Backup Strategy. Since, data is static/read only so we can implement File group backup instead of performing full backup. Number of filegroups and files per database could be up to 32767 in both 32 bit and 64 bit SQL Server
Lets Alter database to add filegroups using command below:

Alter Database Sales Add FileGroup [PartitionFG1]
Alter Database Sales Add FileGroup [PartitionFG2]
Alter Database Sales Add FileGroup [PartitionFG3]
Alter Database Sales Add FileGroup [PartitionFG4]
Alter Database Sales Add FileGroup [PartitionFG5]
Alter Database Sales Add FileGroup [PartitionFG6]
Alter Database Sales Add FileGroup [PartitionFG7]

Step 2

Add files to filegroups:

Alter Database Sales Add File
(
Name = 'Partition_Data1',
FileName = 'F:\MSSQL\Data\Partition_Data1.ndf',
Size = 10240 MB
)
To FileGroup [PartitionFG1]

.
.
.
.
.
.
.

Alter Database Sales Add File
(
Name = 'Partition_Data7',
FileName = 'F:\MSSQL\Data\Partition_Data7.ndf',
Size = 10240 MB
)
To FileGroup [PartitionFG7]



Step 3

Create Partition Function to define data range. It will map row of SalesLog table into partitions based on the values of SaleDate column.

use Sales
GO
Create Partition Function SalesLogPFN (datetime)
as
range RIGHT for values
(
'2008-01-01',
'2008-07-01',
'2009-01-01',
'2009-07-01',
'2010-01-01',
'2010-07-01'
)

Data Type specified in Partition Function must be the same as Partition Column in Table that you are going to partition. Partition function as such is not directly related to any table. Possible value for range is RIGHT and LEFT. In our case, 1st Partition will contain rows that has SaleDate less than 2008-01-01, 2nd Partition will have row starting from 2008-01-01 up to 2008-06-30 23:59:59:900 and similarly 3rd and others.

Step 4

Create Partition Scheme to link partition function to proper file groups that we have created as part of Step 2. So, in our case, we created 7 file groups which is correct as we have 6 value for partition function.

use Sales
GO

Create Partition Scheme SalesLogScheme
AS
PARTITION SalesLogPFN
TO ([PartitionFG1], [PartitionFG2], [PartitionFG3], [PartitionFG4], [PartitionFG5], [PartitionFG6], [PartitionFG7])

Step 5

Now, we need to create clustered Index on SaleDate column that will place data to corresponding file groups automatically based on definition of SalesLogScheme

CREATE CLUSTERED INDEX [IDX_SalesLog_SaleDate] ON [dbo].[SalesLog]([SaleDate]) ON SalesLogScheme ([SaleDate])

It is important to create clustered index aligned which is by creating index on column that is partitioned. If a clustered index is not aligned then any merging of partition will require us to drop the clustered index along with the non-clustered indexes and rebuild them.

Now your are all set.

Conclusion

Partitioning a table divides the table and its indexes into smaller chunks so that it can be easily maintained as maintenance operation can be applied partition by partition basis and also improves query performance as optimizer can use proper queries to appropriate partitions instead of querying entire table.

Change Data Capture

It is new feature in SQL Server 2008 that provides feature to track all data changes without writing triggers. CDC process works using transaction log. When table is enabled for CDC immediately after any changes made to table is available for tracking. Meta data change is also written along with the actual change.



Before we start looking into implementation, lets look at pre-requisition:

SQL Server 2008 Enterprise Edition

Sysadmin permission required at database level to enable CDC

dbo permission required to enable CDC at table level

Primary Key required in table only when net change is required

Now, lets start how to implement it:

I have not used any particular database name or table name while writing step by step implementation guide. Hence please follow syntax pattern with actual name for value indicated within <> brackets

Step 1: Enable CDC at database Level



Step 2: Enable CDC at Table Level







You are all set to start using changed data tracked.



You may visit http://technet.microsoft.com/en-us/library/bb510744.aspx for list of functions available for CDC

Implementation Overview





Conclusion

CDC is an excellent feature in SQL Server 2008 but it is very important to use it with proper planning. A lot many number of records will be added to the tracked table so ensure you have enough space to store changed data.




It is new feature in SQL Server 2008 that provides feature to track all data changes without writing triggers. CDC process works using transaction log. When table is enabled for CDC immediately after any changes made to table is available for tracking. Meta data change is also written along with the actual change.



Before we start looking into implementation, lets look at pre-requisition:

SQL Server 2008 Enterprise Edition

Sysadmin permission required at database level to enable CDC

dbo permission required to enable CDC at table level

Primary Key required in table only when net change is required

Now, lets start how to implement it:

I have not used any particular database name or table name while writing step by step implementation guide. Hence please follow syntax pattern with actual name for value indicated within <> brackets

Step 1: Enable CDC at database Level



Step 2: Enable CDC at Table Level







You are all set to start using changed data tracked.



You may visit http://technet.microsoft.com/en-us/library/bb510744.aspx for list of functions available for CDC

Implementation Overview





Conclusion

CDC is an excellent feature in SQL Server 2008 but it is very important to use it with proper planning. A lot many number of records will be added to the tracked table so ensure you have enough space to store changed data.

What is difference between Clustered Index and Non Clustered Index?

- A Clustered Index consists of index as well as data pages. Clustered Index is not just an index but also contains the table data. A clustered index is organized as a B-tree where the non-leaf nodes are index pages and the leaf nodes are data pages.

- A Non-clustered index is organized as a B-tree but it consists of only index pages. The leaf nodes in a non-clustered index are not data pages, but contains pointer for individual rows in a data pages.