What is normalization? Explain different levels of normalization?
Check out the article Q100139 from Microsoft knowledge base and of course, there's much more information available in the net. It'll be a good idea to get a hold of any RDBMS fundamentals text book, especially the one by C. J. Date. Most of the times, it will be okay if you can explain till third normal form.
What is denormalization and when would you go for it?
As the name indicates, denormalization is the reverse process of normalization. It's the controlled introduction of redundancy in to the database design. It helps improve the query performance as the number of joins could be reduced.
How do you implement one-to-one, one-to-many and many-to-many relationships while designing tables?
One-to-One relationship can be implemented as a single table and rarely as two tables with primary and foreign key relationships.
One-to-Many relationships are implemented by splitting the data into two tables with primary key and foreign key relationships.
Many-to-Many relationships are implemented using a junction table with the keys from both the tables forming the composite primary key of the junction table.
It will be a good idea to read up a database designing fundamentals text book.
What's the difference between a primary key and a unique key?
Both primary key and unique enforce uniqueness of the column on which they are defined. But by default primary key creates a clustered index on the column, where are unique creates a nonclustered index by default. Another major difference is that, primary key doesn't allow NULLs, but unique key allows one NULL only.
What are user defined datatypes and when you should go for them?
User defined datatypes let you extend the base SQL Server datatypes by providing a descriptive name, and format to the database. Take for example, in your database, there is a column called Flight_Num which appears in many tables. In all these tables it should be varchar(8). In this case you could create a user defined datatype called Flight_num_type of varchar(8) and use it across all your tables.
See sp_addtype, sp_droptype in books online.
What is bit datatype and what's the information that can be stored inside a bit column?
Bit datatype is used to store boolean information like 1 or 0 (true or false). Untill SQL Server 6.5 bit datatype could hold either a 1 or 0 and there was no support for NULL. But from SQL Server 7.0 onwards, bit datatype can represent a third state, which is NULL.
Define candidate key, alternate key, composite key.
A candidate key is one that can identify each row of a table uniquely. Generally a candidate key becomes the primary key of the table. If the table has more than one candidate key, one of them will become the primary key, and the rest are called alternate keys.
A key formed by combining at least two or more columns is called composite key.
What are defaults? Is there a column to which a default can't be bound?
A default is a value that will be used by a column, if no value is supplied to that column while inserting data. IDENTITY columns and timestamp columns can't have defaults bound to them. See CREATE DEFUALT in books online.
Back to top
SQL Server architecture (top)
What is a transaction and what are ACID properties?
A transaction is a logical unit of work in which, all the steps must be performed or none. ACID stands for Atomicity, Consistency, Isolation, Durability. These are the properties of a transaction. For more information and explanation of these properties, see SQL Server books online or any RDBMS fundamentals text book.
Explain different isolation levels
An isolation level determines the degree of isolation of data between concurrent transactions. The default SQL Server isolation level is Read Committed. Here are the other isolation levels (in the ascending order of isolation): Read Uncommitted, Read Committed, Repeatable Read, Serializable. See SQL Server books online for an explanation of the isolation levels. Be sure to read about SET TRANSACTION ISOLATION LEVEL, which lets you customize the isolation level at the connection level.
CREATE INDEX myIndex ON myTable(myColumn)
What type of Index will get created after executing the above statement?
Non-clustered index. Important thing to note: By default a clustered index gets created on the primary key, unless specified otherwise.
What's the maximum size of a row?
8060 bytes. Don't be surprised with questions like 'what is the maximum number of columns per table'. Check out SQL Server books online for the page titled: "Maximum Capacity Specifications".
Explain Active/Active and Active/Passive cluster configurations
Hopefully you have experience setting up cluster servers. But if you don't, at least be familiar with the way clustering works and the two clusterning configurations Active/Active and Active/Passive. SQL Server books online has enough information on this topic and there is a good white paper available on Microsoft site.
Explain the architecture of SQL Server
This is a very important question and you better be able to answer it if consider yourself a DBA. SQL Server books online is the best place to read about SQL Server architecture. Read up the chapter dedicated to SQL Server Architecture.
What is lock escalation?
Lock escalation is the process of converting a lot of low level locks (like row locks, page locks) into higher level locks (like table locks). Every lock is a memory structure too many locks would mean, more memory being occupied by locks. To prevent this from happening, SQL Server escalates the many fine-grain locks to fewer coarse-grain locks. Lock escalation threshold was definable in SQL Server 6.5, but from SQL Server 7.0 onwards it's dynamically managed by SQL Server.
What's the difference between DELETE TABLE and TRUNCATE TABLE commands?
DELETE TABLE is a logged operation, so the deletion of each row gets logged in the transaction log, which makes it slow. TRUNCATE TABLE also deletes all the rows in a table, but it won't log the deletion of each row, instead it logs the deallocation of the data pages of the table, which makes it faster. Of course, TRUNCATE TABLE can be rolled back.
Explain the storage models of OLAP
Check out MOLAP, ROLAP and HOLAP in SQL Server books online for more infomation.
What are the new features introduced in SQL Server 2000 (or the latest release of SQL Server at the time of your interview)? What changed between the previous version of SQL Server and the current version?
This question is generally asked to see how current is your knowledge. Generally there is a section in the beginning of the books online titled "What's New", which has all such information. Of course, reading just that is not enough, you should have tried those things to better answer the questions. Also check out the section titled "Backward Compatibility" in books online which talks about the changes that have taken place in the new version.
What are constraints? Explain different types of constraints.
Constraints enable the RDBMS enforce the integrity of the database automatically, without needing you to create triggers, rule or defaults.
Types of constraints: NOT NULL, CHECK, UNIQUE, PRIMARY KEY, FOREIGN KEY
For an explanation of these constraints see books online for the pages titled: "Constraints" and "CREATE TABLE", "ALTER TABLE"
Whar is an index? What are the types of indexes? How many clustered indexes can be created on a table? I create a separate index on each column of a table. what are the advantages and disadvantages of this approach?
Indexes in SQL Server are similar to the indexes in books. They help SQL Server retrieve the data quicker.
Indexes are of two types. Clustered indexes and non-clustered indexes. When you craete a clustered index on a table, all the rows in the table are stored in the order of the clustered index key. So, there can be only one clustered index per table. Non-clustered indexes have their own storage separate from the table data storage. Non-clustered indexes are stored as B-tree structures (so do clustered indexes), with the leaf level nodes having the index key and it's row locater. The row located could be the RID or the Clustered index key, depending up on the absence or presence of clustered index on the table.
If you create an index on each column of a table, it improves the query performance, as the query optimizer can choose from all the existing indexes to come up with an efficient execution plan. At the same t ime, data modification operations (such as INSERT, UPDATE, DELETE) will become slow, as every time data changes in the table, all the indexes need to be updated. Another disadvantage is that, indexes need disk space, the more indexes you have, more disk space is used.
Back to top
Database administration (top)
What is RAID and what are different types of RAID configurations?
RAID stands for Redundant Array of Inexpensive Disks, used to provide fault tolerance to database servers. There are six RAID levels 0 through 5 offering different levels of performance, fault tolerance. MSDN has some information about RAID levels and for detailed information, check out the RAID advisory board's homepage
What are the steps you will take to improve performance of a poor performing query?
This is a very open ended question and there could be a lot of reasons behind the poor performance of a query. But some general issues that you could talk about would be: No indexes, table scans, missing or out of date statistics, blocking, excess recompilations of stored procedures, procedures and triggers without SET NOCOUNT ON, poorly written query with unnecessarily complicated joins, too much normalization, excess usage of cursors and temporary tables.
Some of the tools/ways that help you troubleshooting performance problems are: SET SHOWPLAN_ALL ON, SET SHOWPLAN_TEXT ON, SET STATISTICS IO ON, SQL Server Profiler, Windows NT /2000 Performance monitor, Graphical execution plan in Query Analyzer.
Download the white paper on performance tuning SQL Server from Microsoft web site. Don't forget to check out sql-server-performance.com
What are the steps you will take, if you are tasked with securing an SQL Server?
Again this is another open ended question. Here are some things you could talk about: Preferring NT authentication, using server, databse and application roles to control access to the data, securing the physical database files using NTFS permissions, using an unguessable SA password, restricting physical access to the SQL Server, renaming the Administrator account on the SQL Server computer, disabling the Guest account, enabling auditing, using multiprotocol encryption, setting up SSL, setting up firewalls, isolating SQL Server from the web server etc.
Read the white paper on SQL Server security from Microsoft website. Also check out My SQL Server security best practices
What is a deadlock and what is a live lock? How will you go about resolving deadlocks?
Deadlock is a situation when two processes, each having a lock on one piece of data, attempt to acquire a lock on the other's piece. Each process would wait indefinitely for the other to release the lock, unless one of the user processes is terminated. SQL Server detects deadlocks and terminates one user's process.
A livelock is one, where a request for an exclusive lock is repeatedly denied because a series of overlapping shared locks keeps interfering. SQL Server detects the situation after four denials and refuses further shared locks. A livelock also occurs when read transactions monopolize a table or page, forcing a write transaction to wait indefinitely.
Check out SET DEADLOCK_PRIORITY and "Minimizing Deadlocks" in SQL Server books online. Also check out the article Q169960 from Microsoft knowledge base.
What is blocking and how would you troubleshoot it?
Blocking happens when one connection from an application holds a lock and a second connection requires a conflicting lock type. This forces the second connection to wait, blocked on the first.
Read up the following topics in SQL Server books online: Understanding and avoiding blocking, Coding efficient transactions.
Explain CREATE DATABASE syntax
Many of us are used to craeting databases from the Enterprise Manager or by just issuing the command: CREATE DATABAE MyDB. But what if you have to create a database with two filegroups, one on drive C and the other on drive D with log on drive E with an initial size of 600 MB and with a growth factor of 15%? That's why being a DBA you should be familiar with the CREATE DATABASE syntax. Check out SQL Server books online for more information.
How to restart SQL Server in single user mode? How to start SQL Server in minimal configuration mode?
SQL Server can be started from command line, using the SQLSERVR.EXE. This EXE has some very important parameters with which a DBA should be familiar with. -m is used for starting SQL Server in single user mode and -f is used to start the SQL Server in minimal confuguration mode. Check out SQL Server books online for more parameters and their explanations.
As a part of your job, what are the DBCC commands that you commonly use for database maintenance?
DBCC CHECKDB, DBCC CHECKTABLE, DBCC CHECKCATALOG, DBCC CHECKALLOC, DBCC SHOWCONTIG, DBCC SHRINKDATABASE, DBCC SHRINKFILE etc. But there are a whole load of DBCC commands which are very useful for DBAs. Check out SQL Server books online for more information.
What are statistics, under what circumstances they go out of date, how do you update them?
Statistics determine the selectivity of the indexes. If an indexed column has unique values then the selectivity of that index is more, as opposed to an index with non-unique values. Query optimizer uses these indexes in determining whether to choose an index or not while executing a query.
Some situations under which you should update statistics:
1) If there is significant change in the key values in the index
2) If a large amount of data in an indexed column has been added, changed, or removed (that is, if the distribution of key values has changed), or the table has been truncated using the TRUNCATE TABLE statement and then repopulated
3) Database is upgraded from a previous version
Look up SQL Server books online for the following commands: UPDATE STATISTICS, STATS_DATE, DBCC SHOW_STATISTICS, CREATE STATISTICS, DROP STATISTICS, sp_autostats, sp_createstats, sp_updatestats
What are the different ways of moving data/databases between servers and databases in SQL Server?
There are lots of options available, you have to choose your option depending upon your requirements. Some of the options you have are: BACKUP/RESTORE, dettaching and attaching databases, replication, DTS, BCP, logshipping, INSERT...SELECT, SELECT...INTO, creating INSERT scripts to generate data.
Explian different types of BACKUPs avaialabe in SQL Server? Given a particular scenario, how would you go about choosing a backup plan?
Types of backups you can create in SQL Sever 7.0+ are Full database backup, differential database backup, transaction log backup, filegroup backup. Check out the BACKUP and RESTORE commands in SQL Server books online. Be prepared to write the commands in your interview. Books online also has information on detailed backup/restore architecture and when one should go for a particular kind of backup.
What is database replicaion? What are the different types of replication you can set up in SQL Server?
Replication is the process of copying/moving data between databases on the same or different servers. SQL Server supports the following types of replication scenarios:
Snapshot replication
Transactional replication (with immediate updating subscribers, with queued updating subscribers)
Merge replication
See SQL Server books online for indepth coverage on replication. Be prepared to explain how different replication agents function, what are the main system tables used in replication etc.
How to determine the service pack currently installed on SQL Server?
The global variable @@Version stores the build number of the sqlservr.exe, which is used to determine the service pack installed. To know more about this process visit SQL Server service packs and versions.
Back to top
Database programming (top)
What are cursors? Explain different types of cursors. What are the disadvantages of cursors? How can you avoid cursors?
Cursors allow row-by-row prcessing of the resultsets.
Types of cursors: Static, Dynamic, Forward-only, Keyset-driven. See books online for more information.
Disadvantages of cursors: Each time you fetch a row from the cursor, it results in a network roundtrip, where as a normal SELECT query makes only one rowundtrip, however large the resultset is. Cursors are also costly because they require more resources and temporary storage (results in more IO operations). Furthere, there are restrictions on the SELECT statements that can be used with some types of cursors.
Most of the times, set based operations can be used instead of cursors. Here is an example:
If you have to give a flat hike to your employees using the following criteria:
Salary between 30000 and 40000 -- 5000 hike
Salary between 40000 and 55000 -- 7000 hike
Salary between 55000 and 65000 -- 9000 hike
In this situation many developers tend to use a cursor, determine each employee's salary and update his salary according to the above formula. But the same can be achieved by multiple update statements or can be combined in a single UPDATE statement as shown below:
UPDATE tbl_emp SET salary =
CASE WHEN salary BETWEEN 30000 AND 40000 THEN salary + 5000
WHEN salary BETWEEN 40000 AND 55000 THEN salary + 7000
WHEN salary BETWEEN 55000 AND 65000 THEN salary + 10000
END
Another situation in which developers tend to use cursors: You need to call a stored procedure when a column in a particular row meets certain condition. You don't have to use cursors for this. This can be achieved using WHILE loop, as long as there is a unique key to identify each row. For examples of using WHILE loop for row by row processing, check out the 'My code library' section of my site or search for WHILE.
Write down the general syntax for a SELECT statements covering all the options.
Here's the basic syntax: (Also checkout SELECT in books online for advanced syntax).
SELECT select_list
[INTO new_table_]
FROM table_source
[WHERE search_condition]
[GROUP BY group_by_expression]
[HAVING search_condition]
[ORDER BY order_expression [ASC | DESC] ]
What is a join and explain different types of joins.
Joins are used in queries to explain how different tables are related. Joins also let you select data from a table depending upon data from another table.
Types of joins: INNER JOINs, OUTER JOINs, CROSS JOINs. OUTER JOINs are further classified as LEFT OUTER JOINS, RIGHT OUTER JOINS and FULL OUTER JOINS.
For more information see pages from books online titled: "Join Fundamentals" and "Using Joins".
Can you have a nested transaction?
Yes, very much. Check out BEGIN TRAN, COMMIT, ROLLBACK, SAVE TRAN and @@TRANCOUNT
What is an extended stored procedure? Can you instantiate a COM object by using T-SQL?
An extended stored procedure is a function within a DLL (written in a programming language like C, C++ using Open Data Services (ODS) API) that can be called from T-SQL, just the way we call normal stored procedures using the EXEC statement. See books online to learn how to create extended stored procedures and how to add them to SQL Server.
Yes, you can instantiate a COM (written in languages like VB, VC++) object from T-SQL by using sp_OACreate stored procedure. Also see books online for sp_OAMethod, sp_OAGetProperty, sp_OASetProperty, sp_OADestroy. For an example of creating a COM object in VB and calling it from T-SQL, see 'My code library' section of this site.
What is the system function to get the current user's user id?
USER_ID(). Also check out other system functions like USER_NAME(), SYSTEM_USER, SESSION_USER, CURRENT_USER, USER, SUSER_SID(), HOST_NAME().
What are triggers? How many triggers you can have on a table? How to invoke a trigger on demand?
Triggers are special kind of stored procedures that get executed automatically when an INSERT, UPDATE or DELETE operation takes place on a table.
In SQL Server 6.5 you could define only 3 triggers per table, one for INSERT, one for UPDATE and one for DELETE. From SQL Server 7.0 onwards, this restriction is gone, and you could create multiple triggers per each action. But in 7.0 there's no way to control the order in which the triggers fire. In SQL Server 2000 you could specify which trigger fires first or fires last using sp_settriggerorder
Triggers can't be invoked on demand. They get triggered only when an associated action (INSERT, UPDATE, DELETE) happens on the table on which they are defined.
Triggers are generally used to implement business rules, auditing. Triggers can also be used to extend the referential integrity checks, but wherever possible, use constraints for this purpose, instead of triggers, as constraints are much faster.
Till SQL Server 7.0, triggers fire only after the data modification operation happens. So in a way, they are called post triggers. But in SQL Server 2000 you could create pre triggers also. Search SQL Server 2000 books online for INSTEAD OF triggers.
Also check out books online for 'inserted table', 'deleted table' and COLUMNS_UPDATED()
There is a trigger defined for INSERT operations on a table, in an OLTP system. The trigger is written to instantiate a COM object and pass the newly insterted rows to it for some custom processing. What do you think of this implementation? Can this be implemented better?
Instantiating COM objects is a time consuming process and since you are doing it from within a trigger, it slows down the data insertion process. Same is the case with sending emails from triggers. This scenario can be better implemented by logging all the necessary data into a separate table, and have a job which periodically checks this table and does the needful.
What is a self join? Explain it with an example.
Self join is just like any other join, except that two instances of the same table will be joined in the query. Here is an example: Employees table which contains rows for normal employees as well as managers. So, to find out the managers of all the employees, you need a self join.
CREATE TABLE emp
(
empid int,
mgrid int,
empname char(10)
)
INSERT emp SELECT 1,2,'Vyas'
INSERT emp SELECT 2,3,'Mohan'
INSERT emp SELECT 3,NULL,'Shobha'
INSERT emp SELECT 4,2,'Shridhar'
INSERT emp SELECT 5,2,'Sourabh'
SELECT t1.empname [Employee], t2.empname [Manager]
FROM emp t1, emp t2
WHERE t1.mgrid = t2.empid
Here's an advanced query using a LEFT OUTER JOIN that even returns the employees without managers (super bosses)
SELECT t1.empname [Employee], COALESCE(t2.empname, 'No manager') [Manager]
FROM emp t1
LEFT OUTER JOIN
emp t2
ON
t1.mgrid = t2.empid
Learning about Sql server2005/2008,SSIS,SSRS and SSAS
Wednesday, 20 October 2010
How to Read Raw Files in SSIS
One of the big benefits of working with raw files is the ease with which you can add them to your packages. The raw file destination requires you to specify only the name of file to use and the columns you want to store. Conversely, if you use a text file destination, then you have to build the structure of the file and ensure it makes your data flow. If you use an OLEDB destination, then you have to have a database and also a table created to store the data.
Their ease of configuration makes raw files ideal for storing errors in your data flows. In the past I've been guilty of configuring certain components of my packages to ignore failures, on the assumption that it "can't error". Of course, this isn't ideal and a much better approach is to capture all errors and handle them appropriately. This is where raw files are very useful. All you need to do is direct your error flows to a raw file destination, specify and filename and select the columns you want to store. If errors do occur, then they will be stored in the raw files.
So, you're storing any errors in a raw file, and now you want to review them – but you can't because of the proprietary format used. This is the exactly the srot fo scenario for which my raw file reader is designed.
Using the Raw File Reader
The Raw File Reader is designed to make it very simple to read the contents or raw files. A few of the key features are as follows ( a full list can be seen at the above link):
•Read any SSIS Raw file
•Results displayed in a grid
•Columns automatically resized
•Order of columns can be changed
•Data can be copied from the results grid using CTRL+C
•Column headers can be included or excluded from the copied data
•Notification (and option to reload) when the currently loaded raw file has changed i.e. package is rerun
Opening a raw file
The simplest operation is to open a raw file (File | Open). This process will, by default, read the raw file configuration and then immediately read the data from the raw file. The configuration is displayed in a list at the top of the form and the data is displayed in a grid at the bottom of the form (you can change this default behaviour so that the data is not automatically loaded – see later).
Once the data from your raw file has been loaded you can sort the data and copy it into another application. Once a raw file has successfully been read, the filename will appear in the Recent Files section of the File menu. The last 10 files opened will be displayed here.
Sorting and copying data
Data on the grid can be sorted and copied using standard "excel-style" functionality. To sort data, simply click on the column heading of the column you wish to sort. Clicking on a column heading for the second time will reverse the order of the sorting.
You can select a cell from the grid (or multiple cells, by dragging the cursor and/or using the CTRL key) and copy to the clipboard using CRL+C, from where they can be pasted into other applications. When the data is copied to the clipboard the area copied will be the maximum area required to include all the selected cells, however cells not selected will be blank. In the example below, we copy only the object name and the object type:
When the copied data is pasted into Excel you can see that, by default, column headings are included but the data in the unselected cells are not included (you can change this default behaviour – see the Options section, later).
Opening the raw files associated with a package
Opening every raw file individually may be time consuming. Therefore, the tool provides the option of opening every raw file associated with a package. Simply, open the package configuration, identify the raw file destinations and open the files that are specified for these destinations. At present the filename for the raw file destinations must be configured directly, not via expressions or variables.
When you open the package, all of the raw files for the package will be opened, and can be organized using the options in the Window menu. As with opening raw files, recently opened packages are included in the Recent Projects list on the File menu.
Working with raw files whilst debugging packages
The Raw File Reader will automatically reload a raw file when it detects the file has changed. This is a configurable option and so can be turned off if required. If you turn this option off you can manually reload the file by using the Read File command in the Tools menu.
Reading large files
Raw files are often large. If you are using large files I recommend that you disable the option that enables the raw file data to be loaded as soon as the file is opened (see the Options section). Once the configuration has been read, you choose to load only the required columns, by checking and un-checking the columns in the Raw file columns pane at the top of the window. This can drastically reduce the time it takes to load a file. If you want to select/de-select more than one column at a time, select them using the SHIFT or CTRL keys and then click on one of the check boxes. All the selected items will then be changed to the state of the checkbox selected.
Once you have selected the columns you want, the raw file can be read by using the Read File command in the Tools menu.
Options
The configurable options can be accessed by the Options menu in the Tools menu. The copy options configures when headers are included with copied cells.
Option
Description
Always include headers
The column headers, for the cells copied, are always included, irrespective of the number or distribution of cells selected.
Never include headers
Headers are never included
Auto include headers
Column headers are only included if the cells selected are in more than one column. This enables you to copy data from a single column without column headers being added.
Other options are as follows:
Option
Description
Reload file when it changes
This enables the application to reload the data for a raw file automatically when the application detects the raw file has changed
Load file when opened
If selected the data from a raw file is read immediately that a file is opened. If deselected then only the raw file configuration is read when the file is opened.
Raw file extensions
There is no standard naming for raw file extensions, this allows you to configure the extension that the Open File dialog uses to determine a raw file. Multiple extensions can be specified by separating them by semi colons ";"
Future enhancements
I am looking to develop the application further and would appreciate comments on what features you would like to see in future versions. Some of those currently on my list are
1.Changing the raw file columns pane to display the meta data for each column i.e. data types, sizes etc, this will allow sorting of columns
2.Row numbers
3.Extra column headers details
4.Paging rows
5.Monitoring a folder for raw files
Their ease of configuration makes raw files ideal for storing errors in your data flows. In the past I've been guilty of configuring certain components of my packages to ignore failures, on the assumption that it "can't error". Of course, this isn't ideal and a much better approach is to capture all errors and handle them appropriately. This is where raw files are very useful. All you need to do is direct your error flows to a raw file destination, specify and filename and select the columns you want to store. If errors do occur, then they will be stored in the raw files.
So, you're storing any errors in a raw file, and now you want to review them – but you can't because of the proprietary format used. This is the exactly the srot fo scenario for which my raw file reader is designed.
Using the Raw File Reader
The Raw File Reader is designed to make it very simple to read the contents or raw files. A few of the key features are as follows ( a full list can be seen at the above link):
•Read any SSIS Raw file
•Results displayed in a grid
•Columns automatically resized
•Order of columns can be changed
•Data can be copied from the results grid using CTRL+C
•Column headers can be included or excluded from the copied data
•Notification (and option to reload) when the currently loaded raw file has changed i.e. package is rerun
Opening a raw file
The simplest operation is to open a raw file (File | Open). This process will, by default, read the raw file configuration and then immediately read the data from the raw file. The configuration is displayed in a list at the top of the form and the data is displayed in a grid at the bottom of the form (you can change this default behaviour so that the data is not automatically loaded – see later).
Once the data from your raw file has been loaded you can sort the data and copy it into another application. Once a raw file has successfully been read, the filename will appear in the Recent Files section of the File menu. The last 10 files opened will be displayed here.
Sorting and copying data
Data on the grid can be sorted and copied using standard "excel-style" functionality. To sort data, simply click on the column heading of the column you wish to sort. Clicking on a column heading for the second time will reverse the order of the sorting.
You can select a cell from the grid (or multiple cells, by dragging the cursor and/or using the CTRL key) and copy to the clipboard using CRL+C, from where they can be pasted into other applications. When the data is copied to the clipboard the area copied will be the maximum area required to include all the selected cells, however cells not selected will be blank. In the example below, we copy only the object name and the object type:
When the copied data is pasted into Excel you can see that, by default, column headings are included but the data in the unselected cells are not included (you can change this default behaviour – see the Options section, later).
Opening the raw files associated with a package
Opening every raw file individually may be time consuming. Therefore, the tool provides the option of opening every raw file associated with a package. Simply, open the package configuration, identify the raw file destinations and open the files that are specified for these destinations. At present the filename for the raw file destinations must be configured directly, not via expressions or variables.
When you open the package, all of the raw files for the package will be opened, and can be organized using the options in the Window menu. As with opening raw files, recently opened packages are included in the Recent Projects list on the File menu.
Working with raw files whilst debugging packages
The Raw File Reader will automatically reload a raw file when it detects the file has changed. This is a configurable option and so can be turned off if required. If you turn this option off you can manually reload the file by using the Read File command in the Tools menu.
Reading large files
Raw files are often large. If you are using large files I recommend that you disable the option that enables the raw file data to be loaded as soon as the file is opened (see the Options section). Once the configuration has been read, you choose to load only the required columns, by checking and un-checking the columns in the Raw file columns pane at the top of the window. This can drastically reduce the time it takes to load a file. If you want to select/de-select more than one column at a time, select them using the SHIFT or CTRL keys and then click on one of the check boxes. All the selected items will then be changed to the state of the checkbox selected.
Once you have selected the columns you want, the raw file can be read by using the Read File command in the Tools menu.
Options
The configurable options can be accessed by the Options menu in the Tools menu. The copy options configures when headers are included with copied cells.
Option
Description
Always include headers
The column headers, for the cells copied, are always included, irrespective of the number or distribution of cells selected.
Never include headers
Headers are never included
Auto include headers
Column headers are only included if the cells selected are in more than one column. This enables you to copy data from a single column without column headers being added.
Other options are as follows:
Option
Description
Reload file when it changes
This enables the application to reload the data for a raw file automatically when the application detects the raw file has changed
Load file when opened
If selected the data from a raw file is read immediately that a file is opened. If deselected then only the raw file configuration is read when the file is opened.
Raw file extensions
There is no standard naming for raw file extensions, this allows you to configure the extension that the Open File dialog uses to determine a raw file. Multiple extensions can be specified by separating them by semi colons ";"
Future enhancements
I am looking to develop the application further and would appreciate comments on what features you would like to see in future versions. Some of those currently on my list are
1.Changing the raw file columns pane to display the meta data for each column i.e. data types, sizes etc, this will allow sorting of columns
2.Row numbers
3.Extra column headers details
4.Paging rows
5.Monitoring a folder for raw files
Adding a KPI to an SQL Server Analysis Services Cube
In SQL Server Analysis Services (SSAS), you can add key performance indicators (KPIs) to your database cube in order to evaluate business performance, as reflected in the cube data. A KPI is associated with a measure group and is made up of a set of calculations. Typically, the calculations are a combination of calculated members and Multidimensional Expressions (MDX) statements.
A KPI consists of four main properties that are important to evaluating business performance:
•Value Expression. An MDX expression that returns the KPI’s actual value.
•Goal Expression. An MDX expression that returns the KPI’s target value.
•Status Expression. An MDX expression that returns the KPI’s state at a specific point in time.
•Trend Expression. An MDX expression that returns the KPI’s value over time.
In addition to these components, there are other properties that you can configure, but these four components make up the heart of your KPI.
As you work through the process of creating a KPI, you’ll get a better sense of what each of these properties means and how they relate to one another. In this article, I show you how to add a KPI to a cube in an Analysis Services 2008 database. The example I demonstrate is based on the solution from an earlier article I wrote, Five Basic Steps for Implementing an Analysis Services Database". For that solution, I created the following database components:
1.A data source that points to the AdventureWorksDW2008 database on a local instance of SQL Server 2008.
2.A data source view that includes the tables shown in Figure 1.
3.Database dimensions based on each dimension table in the data source view.
4.A cube based on the database dimensions as well as on the two fact tables in the data source view.
Figure 1: Default diagram from the Sales data source view
Be sure to refer to the article mentioned above for more details about the solution. In addition, if you don’t know how to implement a basic cube in Analysis Services, read that article first and refer to SQL Server Books Online for additional information. Once you know how to implement an Analysis Services database, you’re ready to add KPIs to your solution.
Creating a Calculated Member
When you create a KPI, you base one or more of your expressions on members in a measure group or dimension. However, in some cases, the existing members don’t support the type of KPI you want to create, at least not in their current form. If that’s the case, you can create a calculated member, which is similar to creating a computed column in a SQL Server database.
To create a calculated member, open your Analysis Services project in SQL Server Business Intelligence Development Studio (BIDS), and then open the cube in which you want to create your KPI. (For this article, I’m adding the KPI to the Sales cube.) In Cube Designer, click the Calculations tab, and then click the New Calculated Member button. A new calculation form opens in the right pane, as shown in Figure 2.
Figure 2: Form for a new calculated member
You should first name the calculated member by typing the name in the Name text box. For this example, I use the following name:
[Profit Margin]
Notice that I enclose the name in brackets. If your name includes a space, as mine does, you must use the brackets.
Next, you should verify the setting for the Parent hierarchy property. By default, the property is set to Measures. Because we’re creating a calculated measure, this is the hierarchy we want to use.
After you select the hierarchy, you must define an MDX expression that determines the value for your calculated measure. In this case, I want to create a measure that provides the profit margin for each sale. The following expression calculates the margin by dividing the net profit by the sales amount:
([Measures].[Sales Amount] -
([Measures].[Total Product Cost] + [Measures].[Tax Amt] +
[Measures].[Freight])) /
[Measures].[Sales Amount]
Notice that I calculate the net profit by subtracting the total product cost, tax amount, and freight from the sales amount. (You might decide on a different formula for you net profit.) I then divide that total by the sales amount.
Note: When opening the Calculations tab, you probably noticed the list of measure groups and dimensions in the lower-left pane. You can drag a member from any of these hierarchies to your expression text box to add the fully qualified name of that member to the expression. Also note, the MDX expression shown above is a relatively simple one. You can, of course, create far more complex expressions. However, an in-depth discussion about MDX is beyond the scope of this article. For more information about MDX, see SQL Server Books Online.
After you create your expression for the calculated member, you can set additional properties. For this example, I set the Format string property to "Percent" and then select Fact Internet Sales in the Associated Measure Group property because I want to associate the calculated member with that measure group. Figure 3 shows what the form should look like after you’ve configured all the properties.
Figure 3: Creating the Profit Margin calculated member
That’s all there is to creating a calculated member. Be sure to save the project and then process the cube so the measure is available to your KPI. After you process your cube, you can verify that the measure has been successfully added by browsing the cube data and viewing the Profit Margin measure.
Creating a Key Performance Indicator
Now that your calculated measure is set up, you’re ready to create your KPI. In Cube Designer, click the KPIs tab, and then click the New KPI button. A new KPI form opens in the right pane, as shown in Figure 4.
Figure 4: Form for a new key performance indicator (KPI)
To configure the KPI, first provide a name. (I use Gross Profit for our example KPI.) Then select a value for the Associated measure group property. (I use Fact Internet Sales.) You’re now ready to add the necessary expressions to your KPI.
Adding the Value Expression
Your value expression should reflect the basic measure by which your KPI is gauged. The value returned by the expression serves as the foundation for your KPI. It’s the only one of the four expressions that’s required. For our example KPI, I use the following expression:
[Measures].[Profit Margin]
As you can see, I’m simply calling the Profit Margin calculated measure. Notice that, as you saw when creating your calculated member, I’m using the fully qualified member name. Also, as with the calculated member, you can drag the name from the hierarchies listed in the lower-left pane to the expression text box.
Adding the Goal Expression
As the name suggestions, the goal expression indicates what your organization is trying to achieve. For example, your profit margin might currently be at 25%, but your goal might be to reach 30%. And you can also set your goal to match more specific criteria. For instance, in the example KPI, I set the goal to vary depending on the specific sales territory group, as shown in the follow MDX expression:
Case
When [Territory].[Sales Territory Group]
Is [Territory].[Sales Territory Group].[Europe]
Then .34
When [Territory].[Sales Territory Group]
Is [Territory].[Sales Territory Group].[North America]
Then .36
When [Territory].[Sales Territory Group]
Is [Territory].[Sales Territory Group].[Pacific]
Then .32
Else .30
End
Notice that I use a Case statement to define my criteria. The Case statement includes three When expressions, one for each territory group. Each When expression identifies the member on which to base the expression and the member value, following the Is keyword. This is followed by a Then expression which defines what action to take. For example, the first When expression states that if the sales territory group is Europe, then the profit margin goal is 34%. However, the goal for the North American group is 36%, and the goal for the Pacific group is 32%. The Else clause then specifies that all other groups have a 30% goal.
Adding the Status Expression
Your status expression determines the current status of the KPI by comparing the goal expression to the value expression. For example, if your KPI value returns a 20% profit margin, but your goal is 30%, the status will indicate that you are below your goal. However, to arrive at the status, your status expression must return a value in the range of -1 to +1, where -1 indicates bad performance and +1 indicates good performance. For our example KPI, I use the following MDX expression to determine the status of performance:
Case
When KpiValue("Gross Profit") / KpiGoal("Gross Profit") > .90
Then 1
When KpiValue("Gross Profit") / KpiGoal("Gross Profit") <= .90 And KpiValue("Gross Profit") / KpiGoal("Gross Profit") > .80
Then 0
Else -1
End
Once again, I create a Case statement. In the first When expression, I divide the KPI value by the KPI goal and compare it to .90. If the value is greater than 90% of the goal, I assign the status value a +1. However, in the second When expression, I specify that if the KPI value is less than or equal to 90% and is greater than 80%, the status should be 0. Otherwise, the KPI status should be -1.
Notice that I use the KpiValue function to retrieve the KPI’s value and I use the KpiGoal function to retrieve the KPI’s goal. These functions make it easy to retrieve the value and goal within your MDX expressions.
One other thing I want to point out is the status indicator. For this example, I use the traffic light. That means, when the status value is 1, the traffic light will be green. If the status value is 0, the light will be yellow. Otherwise, the light will be red. For example, if my KPI goal is 30% and my value is 20%, the traffic light will be red. That’s because a 20% profit margin represents only about 67% my goal, which would evaluate to a -1 in the status expression. However, a value of 25% would evaluate to about 83%, which would mean a yellow light, and a value of 28% would evaluate to about 93%, which would result in a green light.
After you add the status expression and set up the status indicator, your KPI form should look similar to the one shown in Figure 5. (You have to scroll down to see the trend expression, which we’ll work on next.)
Figure 5: Creating the Gross Profit KPI
Adding the Trend Expression
The trend expression lets you compare how your KPI is performing over time. Although the status value can tell you how well you’re achieving you goal at a fixed point in time, it doesn’t tell you how those achievements compare to another point in time. For example, your organization might have reached 92% of its goal this year, but reached 95% the year before. Although the performance looks good when just viewing this year’s total, that performance measure doesn’t reveal that this is actually a downward trend. And this is what the trend expression allows you to find out.
For the Gross Profit example KPI, I use a trend expression that compares this year’s totals to the previous year, as shown in the following Case statement:
Case
When IsEmpty(ParallelPeriod([Order Date].[Calendar Year].[Calendar Year],
1, [Order Date].[Calendar Year]))
Then 0
When [Measures].[Profit Margin] >
(ParallelPeriod([Order Date].[Calendar Year].[Calendar Year],
1, [Order Date].[Calendar Year]), [Measures].[Profit Margin])
Then 1
When [Measures].[Profit Margin] =
(ParallelPeriod([Order Date].[Calendar Year].[Calendar Year],
1, [Order Date].[Calendar Year]), [Measures].[Profit Margin])
Then 0
Else -1
End
Notice that the first When expression includes the IsEmpty function and the ParallelPeriod function. The IsEmpty returns a value of true if the evaluated expression an empty cell value. Otherwise the function returns false. I use this function in the first When expression to determine whether the year preceding the current one exists in the cube data. If not, then the trend expression returns a 0, which indicates a neutral trend. (A trend expression, like the status expression, should return a value from -1 to +1.)
The ParallelPeriod function returns a member value from a previous period that’s in the same relative position as the current member. Because I specify Calendar Year, the previous member will also be based on Calendar Year. The function includes three arguments. The first argument returns the level of the hierarchy that you want to target. In this case, it’s the Calendar Year level of the Calendar Year hierarchy, which is part of the Order Date hierarchy. The second argument, 1, indicates the number of units that you want to go back. Because Calendar Year is the current position, the function will go back one year. The final argument specifies that I am basing the calculation on Calendar year. As a result, the first When expression in the Case statement above will determine if the previous calendar year exists, and if it doesn’t the trend will be set to 0.
The second When expression determines whether the profit margin for the current year is greater than the profit margin for the previous year. Notice that the third argument in the ParallelPeriod function now specifies Profit Margin, rather than Calendar Year. If the profit margin is greater in the current year, the trend value is set to +1. However, the third When expression determines whether the profit margin for the current year equals that of the previous year. If so, the trend value is set to 0. Otherwise the trend value is set to -1.
Notice also that you can set the trend indicator. You have three different types of arrows from which to choose. Or you can pick the smiley face!
Completing the Key Performance Indicator
A KPI supports properties in addition to the ones I’ve described so far. Although configuring those properties are beyond the scope of the article, you should have some idea how they work. If you click the Additional Properties down arrow at the bottom of the KPI form, you can view and configure the following properties:
•Display folder. The folder in which the KPI can be found when browsing the cube.
•Parent KPI.> A KPI that acts as the parent of the current KPI so the parent KPI can use the value of the child KPI.
•Current time member. An MDX expression that returns a member that identifies the KPI’s temporal context.
•Weight. An MDX expression that assigns a weight to a child KPI to indicate its relative importance in the parent KPI.
•Description. A description of the KPI.
After you’ve completed your KPI, you can then view its results, based on the current values in the cube data. To view the KPI, click the Browser View button on the KPIs tab. The browser view includes two panes. The top pane lets you define filters that determine what data the KPI uses, and the bottom pane displays the KPI. By default (before any filters are defined), the KPI calculates the KPI value for the entire data set. However, you can create filters that let you define the data for which you want to run the KPI.
For example, Figure 6 shows the filter I created for the Gross Profit KPI. I first selected Pacific as the sales territory group, and then selected 2004 as the calendar year. As you can see in the figure, the KPI value is 29.72%. Because the goal for the Pacific group is only 32%, the Pacific group reached nearly 93% of its goal for 2004, which means the status indicator is green. However, notice that the trend points downward. That’s because they Pacific group did better in 2003 (30.58%).
Figure 6: Browsing the Gross Profit KPI
As this example demonstrates, a KPI can be an effective way to get a quick read on your organization’s performance. Of course, how the KPI is displayed and what sort of status indicators you use will depend on the implementation of the client applications. But Analysis Services provides the features you need to make the KPI data easily available to your cube. And once you have an understanding of the basic components that make up a KPI, you can start implementing KPIs that support the various business needs of your organization.
A KPI consists of four main properties that are important to evaluating business performance:
•Value Expression. An MDX expression that returns the KPI’s actual value.
•Goal Expression. An MDX expression that returns the KPI’s target value.
•Status Expression. An MDX expression that returns the KPI’s state at a specific point in time.
•Trend Expression. An MDX expression that returns the KPI’s value over time.
In addition to these components, there are other properties that you can configure, but these four components make up the heart of your KPI.
As you work through the process of creating a KPI, you’ll get a better sense of what each of these properties means and how they relate to one another. In this article, I show you how to add a KPI to a cube in an Analysis Services 2008 database. The example I demonstrate is based on the solution from an earlier article I wrote, Five Basic Steps for Implementing an Analysis Services Database". For that solution, I created the following database components:
1.A data source that points to the AdventureWorksDW2008 database on a local instance of SQL Server 2008.
2.A data source view that includes the tables shown in Figure 1.
3.Database dimensions based on each dimension table in the data source view.
4.A cube based on the database dimensions as well as on the two fact tables in the data source view.
Figure 1: Default diagram from the Sales data source view
Be sure to refer to the article mentioned above for more details about the solution. In addition, if you don’t know how to implement a basic cube in Analysis Services, read that article first and refer to SQL Server Books Online for additional information. Once you know how to implement an Analysis Services database, you’re ready to add KPIs to your solution.
Creating a Calculated Member
When you create a KPI, you base one or more of your expressions on members in a measure group or dimension. However, in some cases, the existing members don’t support the type of KPI you want to create, at least not in their current form. If that’s the case, you can create a calculated member, which is similar to creating a computed column in a SQL Server database.
To create a calculated member, open your Analysis Services project in SQL Server Business Intelligence Development Studio (BIDS), and then open the cube in which you want to create your KPI. (For this article, I’m adding the KPI to the Sales cube.) In Cube Designer, click the Calculations tab, and then click the New Calculated Member button. A new calculation form opens in the right pane, as shown in Figure 2.
Figure 2: Form for a new calculated member
You should first name the calculated member by typing the name in the Name text box. For this example, I use the following name:
[Profit Margin]
Notice that I enclose the name in brackets. If your name includes a space, as mine does, you must use the brackets.
Next, you should verify the setting for the Parent hierarchy property. By default, the property is set to Measures. Because we’re creating a calculated measure, this is the hierarchy we want to use.
After you select the hierarchy, you must define an MDX expression that determines the value for your calculated measure. In this case, I want to create a measure that provides the profit margin for each sale. The following expression calculates the margin by dividing the net profit by the sales amount:
([Measures].[Sales Amount] -
([Measures].[Total Product Cost] + [Measures].[Tax Amt] +
[Measures].[Freight])) /
[Measures].[Sales Amount]
Notice that I calculate the net profit by subtracting the total product cost, tax amount, and freight from the sales amount. (You might decide on a different formula for you net profit.) I then divide that total by the sales amount.
Note: When opening the Calculations tab, you probably noticed the list of measure groups and dimensions in the lower-left pane. You can drag a member from any of these hierarchies to your expression text box to add the fully qualified name of that member to the expression. Also note, the MDX expression shown above is a relatively simple one. You can, of course, create far more complex expressions. However, an in-depth discussion about MDX is beyond the scope of this article. For more information about MDX, see SQL Server Books Online.
After you create your expression for the calculated member, you can set additional properties. For this example, I set the Format string property to "Percent" and then select Fact Internet Sales in the Associated Measure Group property because I want to associate the calculated member with that measure group. Figure 3 shows what the form should look like after you’ve configured all the properties.
Figure 3: Creating the Profit Margin calculated member
That’s all there is to creating a calculated member. Be sure to save the project and then process the cube so the measure is available to your KPI. After you process your cube, you can verify that the measure has been successfully added by browsing the cube data and viewing the Profit Margin measure.
Creating a Key Performance Indicator
Now that your calculated measure is set up, you’re ready to create your KPI. In Cube Designer, click the KPIs tab, and then click the New KPI button. A new KPI form opens in the right pane, as shown in Figure 4.
Figure 4: Form for a new key performance indicator (KPI)
To configure the KPI, first provide a name. (I use Gross Profit for our example KPI.) Then select a value for the Associated measure group property. (I use Fact Internet Sales.) You’re now ready to add the necessary expressions to your KPI.
Adding the Value Expression
Your value expression should reflect the basic measure by which your KPI is gauged. The value returned by the expression serves as the foundation for your KPI. It’s the only one of the four expressions that’s required. For our example KPI, I use the following expression:
[Measures].[Profit Margin]
As you can see, I’m simply calling the Profit Margin calculated measure. Notice that, as you saw when creating your calculated member, I’m using the fully qualified member name. Also, as with the calculated member, you can drag the name from the hierarchies listed in the lower-left pane to the expression text box.
Adding the Goal Expression
As the name suggestions, the goal expression indicates what your organization is trying to achieve. For example, your profit margin might currently be at 25%, but your goal might be to reach 30%. And you can also set your goal to match more specific criteria. For instance, in the example KPI, I set the goal to vary depending on the specific sales territory group, as shown in the follow MDX expression:
Case
When [Territory].[Sales Territory Group]
Is [Territory].[Sales Territory Group].[Europe]
Then .34
When [Territory].[Sales Territory Group]
Is [Territory].[Sales Territory Group].[North America]
Then .36
When [Territory].[Sales Territory Group]
Is [Territory].[Sales Territory Group].[Pacific]
Then .32
Else .30
End
Notice that I use a Case statement to define my criteria. The Case statement includes three When expressions, one for each territory group. Each When expression identifies the member on which to base the expression and the member value, following the Is keyword. This is followed by a Then expression which defines what action to take. For example, the first When expression states that if the sales territory group is Europe, then the profit margin goal is 34%. However, the goal for the North American group is 36%, and the goal for the Pacific group is 32%. The Else clause then specifies that all other groups have a 30% goal.
Adding the Status Expression
Your status expression determines the current status of the KPI by comparing the goal expression to the value expression. For example, if your KPI value returns a 20% profit margin, but your goal is 30%, the status will indicate that you are below your goal. However, to arrive at the status, your status expression must return a value in the range of -1 to +1, where -1 indicates bad performance and +1 indicates good performance. For our example KPI, I use the following MDX expression to determine the status of performance:
Case
When KpiValue("Gross Profit") / KpiGoal("Gross Profit") > .90
Then 1
When KpiValue("Gross Profit") / KpiGoal("Gross Profit") <= .90 And KpiValue("Gross Profit") / KpiGoal("Gross Profit") > .80
Then 0
Else -1
End
Once again, I create a Case statement. In the first When expression, I divide the KPI value by the KPI goal and compare it to .90. If the value is greater than 90% of the goal, I assign the status value a +1. However, in the second When expression, I specify that if the KPI value is less than or equal to 90% and is greater than 80%, the status should be 0. Otherwise, the KPI status should be -1.
Notice that I use the KpiValue function to retrieve the KPI’s value and I use the KpiGoal function to retrieve the KPI’s goal. These functions make it easy to retrieve the value and goal within your MDX expressions.
One other thing I want to point out is the status indicator. For this example, I use the traffic light. That means, when the status value is 1, the traffic light will be green. If the status value is 0, the light will be yellow. Otherwise, the light will be red. For example, if my KPI goal is 30% and my value is 20%, the traffic light will be red. That’s because a 20% profit margin represents only about 67% my goal, which would evaluate to a -1 in the status expression. However, a value of 25% would evaluate to about 83%, which would mean a yellow light, and a value of 28% would evaluate to about 93%, which would result in a green light.
After you add the status expression and set up the status indicator, your KPI form should look similar to the one shown in Figure 5. (You have to scroll down to see the trend expression, which we’ll work on next.)
Figure 5: Creating the Gross Profit KPI
Adding the Trend Expression
The trend expression lets you compare how your KPI is performing over time. Although the status value can tell you how well you’re achieving you goal at a fixed point in time, it doesn’t tell you how those achievements compare to another point in time. For example, your organization might have reached 92% of its goal this year, but reached 95% the year before. Although the performance looks good when just viewing this year’s total, that performance measure doesn’t reveal that this is actually a downward trend. And this is what the trend expression allows you to find out.
For the Gross Profit example KPI, I use a trend expression that compares this year’s totals to the previous year, as shown in the following Case statement:
Case
When IsEmpty(ParallelPeriod([Order Date].[Calendar Year].[Calendar Year],
1, [Order Date].[Calendar Year]))
Then 0
When [Measures].[Profit Margin] >
(ParallelPeriod([Order Date].[Calendar Year].[Calendar Year],
1, [Order Date].[Calendar Year]), [Measures].[Profit Margin])
Then 1
When [Measures].[Profit Margin] =
(ParallelPeriod([Order Date].[Calendar Year].[Calendar Year],
1, [Order Date].[Calendar Year]), [Measures].[Profit Margin])
Then 0
Else -1
End
Notice that the first When expression includes the IsEmpty function and the ParallelPeriod function. The IsEmpty returns a value of true if the evaluated expression an empty cell value. Otherwise the function returns false. I use this function in the first When expression to determine whether the year preceding the current one exists in the cube data. If not, then the trend expression returns a 0, which indicates a neutral trend. (A trend expression, like the status expression, should return a value from -1 to +1.)
The ParallelPeriod function returns a member value from a previous period that’s in the same relative position as the current member. Because I specify Calendar Year, the previous member will also be based on Calendar Year. The function includes three arguments. The first argument returns the level of the hierarchy that you want to target. In this case, it’s the Calendar Year level of the Calendar Year hierarchy, which is part of the Order Date hierarchy. The second argument, 1, indicates the number of units that you want to go back. Because Calendar Year is the current position, the function will go back one year. The final argument specifies that I am basing the calculation on Calendar year. As a result, the first When expression in the Case statement above will determine if the previous calendar year exists, and if it doesn’t the trend will be set to 0.
The second When expression determines whether the profit margin for the current year is greater than the profit margin for the previous year. Notice that the third argument in the ParallelPeriod function now specifies Profit Margin, rather than Calendar Year. If the profit margin is greater in the current year, the trend value is set to +1. However, the third When expression determines whether the profit margin for the current year equals that of the previous year. If so, the trend value is set to 0. Otherwise the trend value is set to -1.
Notice also that you can set the trend indicator. You have three different types of arrows from which to choose. Or you can pick the smiley face!
Completing the Key Performance Indicator
A KPI supports properties in addition to the ones I’ve described so far. Although configuring those properties are beyond the scope of the article, you should have some idea how they work. If you click the Additional Properties down arrow at the bottom of the KPI form, you can view and configure the following properties:
•Display folder. The folder in which the KPI can be found when browsing the cube.
•Parent KPI.> A KPI that acts as the parent of the current KPI so the parent KPI can use the value of the child KPI.
•Current time member. An MDX expression that returns a member that identifies the KPI’s temporal context.
•Weight. An MDX expression that assigns a weight to a child KPI to indicate its relative importance in the parent KPI.
•Description. A description of the KPI.
After you’ve completed your KPI, you can then view its results, based on the current values in the cube data. To view the KPI, click the Browser View button on the KPIs tab. The browser view includes two panes. The top pane lets you define filters that determine what data the KPI uses, and the bottom pane displays the KPI. By default (before any filters are defined), the KPI calculates the KPI value for the entire data set. However, you can create filters that let you define the data for which you want to run the KPI.
For example, Figure 6 shows the filter I created for the Gross Profit KPI. I first selected Pacific as the sales territory group, and then selected 2004 as the calendar year. As you can see in the figure, the KPI value is 29.72%. Because the goal for the Pacific group is only 32%, the Pacific group reached nearly 93% of its goal for 2004, which means the status indicator is green. However, notice that the trend points downward. That’s because they Pacific group did better in 2003 (30.58%).
Figure 6: Browsing the Gross Profit KPI
As this example demonstrates, a KPI can be an effective way to get a quick read on your organization’s performance. Of course, how the KPI is displayed and what sort of status indicators you use will depend on the implementation of the client applications. But Analysis Services provides the features you need to make the KPI data easily available to your cube. And once you have an understanding of the basic components that make up a KPI, you can start implementing KPIs that support the various business needs of your organization.
New Features in Reporting Services 2008 and 2008 R2
New and Enhanced Features in Reporting Services, SQL Server 2008
The new features in the SQL Server 2008 release of Reporting Services were in four general areas:
•Report Authoring
•Report Processing and Rendering
•Server Architecture and Tools
•Report Programmability
We will take a look at these in more detail.
Report Authoring
The Report Designer gained new data regions, a new data source type (a new data processing extension for Teradata) and an improved design / authoring experience. SQL Server 2008 Reporting Services also introduces two new data regions, the Tablix and Gauge. The Tablix data region combines the previously independent table, list and matrix into a single data region, with more versatile grouping of rows and columns to allow for better customization. You’ll like the impact you can get from the new Gauge data region (see Illustration 1), which you can use as a stand-alone gauge, several gauges inside a gauge panel, or within one of the Tablix options.
Illustration 1: The New Gauge Data Region (Example is a Radial Gauge)
A completely redesigned Chart region has appeared, supporting many new chart types, that’s also easier to use.
SQL Server 2008 Reporting Services also introduces support for richly formatted text, along with other enhancements to the Report Designer that make it more intuitive for report authors to access data and design elements. The Report Builder 2.0 now resembles Microsoft Office’s style more closely. The data layout and visualization is better than Report Builder 1.0. The user interface appears, upon launch, as depicted in Illustration 2.
Illustration 2: The User Interface for Report Builder 2.0
There are new report items which allow more choices and more control over page breaks when exporting reports to different renderers. Report Definition Language (RDL) elements have been added to support the new features we have already mentioned, and others to support the new report processing and report rendering models. Expressions now appear on the design canvas as placeholders (the placeholders are automatically substituted when we upgrade our SQL Server 2005 Reporting Services reports).
The Report Designer has been improved by:
•Replacing of the Data tab with the always-visible Data pane, which displays the entire inventory of report data sources, datasets, parameters, images, and built-in fields, and so forth.
•Adding Positioning devices such as rulers that show the current position of an item, and visible snap lines when we drag / resize items.
•Replacing the “deep dive” (multiple mouse clicks to get there) access to grouping for the Tablix data region (and therefore for the table, list and matrix), with a new, always-visible Grouping pane that provides a convenient means of creating and managing groups
Upgrading reports from previous versions of the data regions is a simple process. When we open, and then save, a report in the more recent version it happens automatically. However, the reports might also benefit from adding features that did not exist at the time the original reports were created. You’ll also need to test any upgraded report to check whether, by introducing new report items and RDL elements, you’ve altered the information they display before you re-deploy it.
Report Processing and Rendering
SQL Server 2008 Reporting Services included a new rendering extension for Microsoft Word, and improvements to the rendering extensions for Microsoft Excel and CSV. The improvements help to ensure that all rendering extensions handle soft and logical page breaks consistently.
As well as this, the report processing engine has been refactored to improve the scalability and performance of reports: you can also improve the processing of reports on large amounts of data by using On-Demand Report Processing: This enables the rendering of each report page as it is viewed ("on-demand").
Server Architecture and Tools
There is a new report server architecture in SQL Server 2008 Reporting Services that replaces the support provided by Internet Information Services (IIS) in previous versions. Reporting Services now has:
•Native support for HTTP.SYS and ASP.NET.
•URL management for site and virtual directory names.
•A new authentication layer.
•Health monitoring through newly designed memory management features.
This new architecture consolidates the Report Server Web service, Report Server Windows service, and Report Manager into a single service. This means that we can now define and manage a single service account for all report server applications; it also reduces the configuration and maintenance tasks we face, while making it easier to add custom functions and capabilities. It makes deployment simpler without removing features, or having to change the way we access the applications.
There is a better toolset for configuring and managing the report server in SQL Server 2008 Reporting Services that is used in:
•The Reporting Services Configuration tool.
•SQL Server Management Studio (SSMS),
•Report Manager (for a native mode report server).
•SharePoint application pages (for SharePoint integrated mode).
Data-driven subscriptions and job management are now supported in SharePoint integrated mode.
The authentication provided by IIS in previous versions is now handled by SQL Server 2008 Reporting Services through a new authentication subsystem. Both Windows-based and custom authentication is supported.
Health monitoring has been enhanced within SQL Server 2008 Reporting Services in a number of ways. New memory management features allow us to set up memory thresholds for report processing. We can now configure a maximum limit on memory, as well as interim thresholds that dictate report server response to changes in memory pressure. (The report server simply used all of available memory in previous releases.) To replace the log file generated by IIS in previous versions, now HTTP logging keeps a record of all HTTP requests handled by the report server. Moreover, a single log file (named ReportServerService .log) consolidates trace log files, and provide trace information for:
•The Report Server Web service.
•Report Manager.
•The background processing application.
Report Server Programmability
SQL Server 2008 Reporting Services introduces a new server extension called the Report Definition Customization extension. This provides report definition preprocessing, allowing us to dynamically customize a report definition before its passage to the processing engine, based upon values dictated within the RDL. For example, we can modify a report's layout based upon the language in which it is to be rendered.
In addition, the "feature gap" that existed between report servers implemented in native mode and SharePoint integrated mode has been narrowed dramatically. The newly added support for data-driven subscriptions and job management eliminates differences that existed between report servers implemented in the two modes, allowing them to offer equivalent functionality.
New and Enhanced Features in Reporting Services, SQL Server 2008 R2: The Details
Having examined the significant changes between SQL Server 2005 and 2008 Reporting Services, we will now focus upon what’s new in SQL Server 2008.
SharePoint Integration
The new features fall into two broad categories: those that apply to a report server configured for SharePoint integration mode ("Server-related enhancements") and those related to the SQL Server 2008 R2 Reporting Services Add-in for SharePoint applications. Some features apply to both categories.
Server-related Enhancements
Support for Multiple SharePoint Zones
In SQL Server 2008 Reporting Services, we have new options, via the alternate access mapping functionality in the SharePoint environment, to access report server items (in previous releases, report server items were available from only the default SharePoint zone). The latest release allows us to access these items from one or more of the following SharePoint zones:
•Default
•Intranet
•Extranet
•Internet
•Custom
This can be an advantage where SharePoint environment can be accessed by users from several zones. We could, for example, employ alternate access mapping to provide access to the same report server items from our primary SharePoint site for information consumers from the intranet and Internet zones.
Support for SharePoint User Token
Using a new server interface (called IRSSetUserToken), which débuts in SQL Server 2008 Reporting Services, we can use the SharePoint user token to connect to a SharePoint site, via the SharePoint Object Model, and then retrieve data from the site by using SharePoint credentials.
Claims Based Authentication
In addition to supporting the existing SharePoint User Tokens, SQL Server 2008 R2 Reporting Services in SharePoint integrated mode supports Claims Based Authentication.
Scripting with the rs Utility
Servers configured in SharePoint integrated mode are now supported using the rs utility, which can be employed to automate deployment and administration tasks.
Add-in for SharePoint - Related Enhancements and Additions
When you use the SQL Server 2008 R2 Reporting Services Add-in for SharePoint applications, you’ll notice some improvements. In order to get the most benefit, Microsoft recommends that we use the SQL Server 2008 R2 version of the add-in with a SQL Server 2008 R2 report server.
Integration Configuration in Fewer Steps
The new SQL Server 2008 R2 Reporting Services Add-in for SharePoint applications makes it easier to integrate a SharePoint server with a Reporting Services report server. In the current version, fewer steps are required to configure report server integration within SharePoint Central Administration.
Support for SharePoint Universal Logging Service
Reporting Services now provides several categories which can be configured to be a part of the SharePoint diagnostic logging. (SharePoint Foundation 2010 and SharePoint Server 2010 offer diagnostic logging to monitor activity on a SharePoint server.) The new categories are part of the 'SQL Server Reporting Services' log file area, and will log several categories of information, including the following:
•General: Logs events that involve access to the following items:
•Reporting Services Web pages
•Report Viewer HTTP handler
•Report access (.rdl files)
•Data sources (.rsds files)
•URLs on the SharePoint site (.smdl files)
•Database: Logs events that involve database access.
•Office Server General: Logs logon failures.
•Topology: Logs current user information.
•Web Parts: Logs events that involve access to the Report Viewer web part.
SQL Server 2008 R2 also makes available new categories for SharePoint diagnostic logging specifically for Reporting Services. The categories that log information with a product area of 'SQL Server Reporting Services' include the following:
•HTTP Redirector: Logs calls made by client applications to the report server.
•Configuration Pages: Logs calls from Reporting Services pages in SharePoint central administration
•UI Pages: Logs calls made by non-configuration-related pages within the SharePoint user interface.
•Soap Client Proxy: Logs calls made by the Reporting Services web pages or Report Viewer web part, to the report server.
•Local Mode Rendering: Logs call made by the Reporting Services rendering engine while in local mode.
•Local Mode Processing: Logs calls made by the Reporting Services processing engine while in local mode.
Getting Data from a SharePoint List
A new Microsoft SharePoint List and query experience allows us to easily bring SharePoint List information into a report when we designate a SharePoint List as a data source for reports.
Local Mode
Reports from Microsoft Access 2010 and the new Reporting Services SharePoint list data extension can run locally from the SharePoint document library (no connection to a SQL Server Reporting Services report server is required). In the new Local Mode we can use the Report Viewer to directly render reports from SharePoint (when the data extension supports local mode reporting).
Local mode supports rendering reports that have an embedded data source or a shared data source from an .rsds file. However, the management of the report or its associated data source is deliberately not supported in local mode, but only in connected mode.
HTTP requests
In the latest version of Reporting Services, an HTTP redirector has been put in place to listen for client requests (such as those from Report Builder) to the SharePoint web front end. The HTTP redirector will re-direct such requests to the Report Server.
Links directly use Document Libraries.
In SQL Server 2008 R2 Reporting Services, on a SharePoint server integrated with Reporting Services, report subscriptions and drill-through links work directly with the linked resources in the document library itself.
Increased language and character support
The SQL Server 2008 R2 Reporting Services related interface within SharePoint now supports 37 languages, complex text layout with Thai, and right-to-left character sets with Hebrew and Arabic, and. In addition to document library menus and user interface controls, support includes the Report Viewer web part.
Collaboration and Reuse
SQL Server 2008 R2 Reporting Services provides new features that support enhanced collaboration and reuse of components within the reporting environment. We can classify these new features within two groups: Report Parts and Shared Datasets.
Report Parts
Report Parts allow us to employ the various strengths and roles of team members. For example, a given team member can reuse, in his or her various reports, report parts that have been created and saved by another member, say, working within a developmental capacity. Previous versions of SQL Server Reporting Services did not offer a ready means of generating report parts (such as tables, charts and parameters) that might be used pervasively throughout our business intelligence solutions. SQL Server 2008 R2 Reporting Services supports our selection of individual report parts from a given report for uploading to a library on the report server, from which we can subsequently browse and select them for inclusion in prospective reports.
The items that we can publish as Report Parts include the following:
•Charts
•Gauges
•Images and embedded images
•Lists (via the Tablix data region)
•Maps
•Matrices (via the Tablix data region)
•Parameters
•Rectangles
•Tables (via the Tablix data region)
Report Parts are stored either on a report server, or within a SharePoint site that is integrated with a report server. Report Parts can be reused in multiple reports, and they can be updated on the server.
We can reuse report parts in many reports, where each report part we add to a given report uses a unique ID to maintain a relationship to the instance of the “parent” report part on the respective server or site. Report parts that display data (for example, a matrix, table, or chart) can now be based upon a shared dataset (see the section that follows); otherwise, when a report part is published, the dataset upon which it depends is saved as an embedded dataset. Moreover, embedded datasets can be based upon embedded data sources, but credentials are not stored in embedded data sources.
Choices for the type of dataset underlying a given Report Part can have security implications: if a Report Part depends upon an embedded dataset that uses an embedded data source, for example, the credentials for the embedded data source will need to be provided by anyone reusing this Report Part. (To avoid this, we would base our embedded and shared datasets upon shared data sources with stored credentials.)
Upgrade and use considerations:
•Because of the “portability” of report parts for uploading into a library from which selections will be made to reuse them, naming of the parts (perhaps we made a habit of leaving these at their defaults, such as “Tablix1” or “Chart3,” when we worked with previous versions) will take on a new importance. Descriptive naming conventions will become important in order to allow for intuitive reuse by other development team members, etc.
•From the perspective of the new “portability” of report parts, the full authoring and sharing can only be accomplished via the new Report Builder 3.0 (although reports can be authored in Business Intelligence Development Studio). In effect, we are limited in what we can do with report parts in Business Intelligence Development Studio; we can publish report parts from the Development Studio, but cannot select or reuse the existing parts.
Shared DatasetsThe second new type of Report Server item that we see in SQL Server 2008 R2 Reporting Services is Shared Datasets. Shared Datasets can retrieve data from shared data sources that connect to external data sources. Shared Datasets use only shared data sources (embedded data sources are not allowed). A Shared Dataset can be based on a report model, or upon any data source for a supported Reporting Services data extension.
Many of us have no doubt experienced the need to use identical datasets within reports we designed that are similar in presentation, or that use similar or identical information. (This is particularly common when the datasets act as data sources for report parameters that are common to multiple reports). Before SQL Server 2008 R2 Reporting Services, we were limited to copying such a dataset’s definition from the Report Definition Language (RDL) of one report to that of another report to allow the latter to “share” the same dataset (the Business Intelligence Development Studio did not afford us the ability to copy and paste datasets between reports). The only other alternative was to create an identical dataset from scratch within the new report.
A Shared Dataset offers numerous benefits; at the top of the list is the fact that it provides a way to share a query among many reports. This practice helps report authors and others to build reports upon consistent sets of data. The dataset query can include dataset parameters. Moreover, we can configure a Shared Dataset to cache query results for specific parameter combinations. We can perform such a cache upon first use or through the specification of a schedule. Shared Dataset caching can be used together with report data feeds and report caching to help manage access to a data source, as well as to optimize our use of system resources.
We can “share” a dataset within the Report Designer simply by selecting Convert to Shared Dataset on the context menu that appears when we right-click a pre-existing dataset within a report. Once we have converted a standard dataset to a shared dataset, we can select that dataset from among a list of any other shared datasets, within the reports of the current Report Server project, simply by ensuring the selection of the radio button labeled Use a shared dataset within the Dataset Properties dialog that appears when we go to add a new dataset in the Report Data pane, Datasets folder.
When added to a report, a Shared Dataset (like a report part) maintains a relationship to the definition of the “parent” dataset on the report server. Unlike Report Parts, when the definition is changed, we do not have to accept updates. All reports that have the relationship always use the Shared Dataset definition on the report server.
Data Sources
SQL Server 2008 R2 Reporting Services offers three new data sources types: Microsoft SQL Azure, Microsoft SQL Server Parallel Data Warehouse, and Microsoft SharePoint List. Each of these data sources types, together with a description of its nature and operation, appears in Table 1.
Data Sources Type
Description
Microsoft SQL Azure
Connects to SQL Server databases in the cloud and enables reports to retrieve and render report data from SQL Azure databases.
Microsoft SQL Server Parallel Data Warehouse
Connects to a Microsoft SQL Server Parallel Data Warehouse and enables reports to retrieve and render report data from SQL Server databases.
Microsoft SharePoint List
Connects to a SharePoint site and enables reports to retrieve and render report data from SharePoint lists.
Table 1: New Data Source Types in SQL Server 2008 R2 Reporting Services
Data Visualization
SQL Server 2008 R2 Reporting Services provides three new ways to visualize data in reports: Maps, Sparklines and Data Bars, and Indicators. Let’s take a look at each of these ways, together with a description of its nature and operation, in the sections below.
Maps:
The new Map Wizard and Map Layer Wizard in Report Designer allow us to add maps and map layers to our reports, an example of which we can see in Illustration 3, to help visualize data against a geographic background.
Illustration 3: Adding a Map Visualization to a Report
Once map elements are related with report data, we can control color, size, width, or marker type on any given layer, as well as add labels, legends, titles, a color scale, and a distance scale to help assist users in interpreting the map display. Moreover, we can add interactive features, such as tooltips and drillthrough links, or provide parameters that enable a user to interactively control the visibility of each layer, and much more.
Sparklines and Data Bars
Sparklines and Data Bars have the basic chart elements of categories, series, and values, but they have no legend, axis lines, labels, or tick marks. Both are simple charts used to convey much information in a little space (and often inline with text), and they can be easily used within matrices and tables.
Sparklines and Data Bars are most effective when many of them are presented together, for rapid visual comparison (making it easy to identify outliers). Each Sparkline often represents multiple data points over time. Because Sparklines display aggregated data, they must go in a cell that is associated with a group (and are not added to a detail group in a table). By contrast, Data Bars can represent multiple data points, but typically illustrate only one (with each Data Bar typically presenting a single series).
An example of the use of Sparklines is depicted in Illustration 4.
Illustration 4: Example of a Use for Sparklines
Indicators
Indicators are available in Report Builder 3.0 and Report Designer, and are icon-like, minimal gauges that convey the state of a single data value at a glance. They are typically used in matrices or tables to present data in rows or columns.
Indicators are often used to reflect:
•Trends – with directional images such as arrows;
•Ratings – using incremental icons such as stars;
•States (of being) – with traffic lights or check marks.
Once we drag the indicator item from the toolbox to a report we are crafting in Report Designer, we are prompted to make a selection among various options as shown in Illustration 5.
Illustration 5: Selecting an Indicator Type in Report Designer
Report Layout and Rendering
SQL Server 2008 R2 Reporting Services continues to add features designed to support more rendering options. Moreover, we can now use our reports as the source of data feeds and as exports to Microsoft Excel.
Naming Excel Worksheet Tabs
We can now generate the names of worksheet tabs when we export reports to Excel, using the properties of reports and page breaks in tandem. We can provide an initial page name of a report that can be exported as the default name of worksheet tabs, or use page breaks and page names to provide different names for each worksheet tab.
Rendering Reports to Data Feeds
Via enhancements in SQL Server 2008 R2 Reporting Services, including the new Atom rendering extension, we can export a production report as a data feed or create a report whose primary purpose is provide data, in the form of data feeds, to applications. The option to use reports as a data feed gives us an additional way to provide data to applications. We might, for example, prefer to hide the complexity of a data source and make it simpler to use the data: We might, alternatively do this when targeted data is not easy to access through client data providers.) Another benefit would be the availability of familiar Reporting Services features, such as Report Manager, security, report snapshots and scheduling, to manage the reports that now provide data feeds. Whatever the business need, the capability to render reports directly to data feeds means reusability of existing reports, as well as a means of making best use of all the effort of creating and securing the underlying datasets.
Report Pagination and Text Rotation
SQL Server 2008 R2 Reporting Services introduces enhancements to page breaks in several key components, including:
•Tablix data regions (table, matrix, and list)
•Groups
•Rectangles
When we delimit the pages of a report using page breaks, we can now assign different reset page numbering and page names to those pages. Furthermore, we can leverage expressions to establish the dynamic update of page names and page numbers at runtime, or the conditional (in addition to full) disablement of page breaks.
An example of the settings involved in creating a page break, based upon a group (Product Category) within a sample report, is depicted in Illustration 6.
Illustration 6: Making a Report Break when the Product Category Changes
We can now rotate standalone text boxes up to 270 degrees in:
•Report headers
•Report footers
•Report body
•Cells of tables
•Cells of tables
This rotation, and the display of text written vertically and bottom to top that it supports, can help us to better use “available real estate” to create more readable reports, fit more data on printed reports, and create reports with more graphical appeal.
Aggregates, Expressions, and Functions
Aggregates, expressions and functions gain power in SQL Server 2008 R2Reporting Services. This includes:
•Enhanced expression support – New globals are introduced, including:
◦OverallPageNumber and OverallTotalPages - Support for page numbering for the entire rendered report.
◦PageName- Support for naming pages.
◦RenderFormat - Support for information that is specified for the renderer.
A new read-only property (set by default) for report variables is introduced. This property provides a way to persist data across report rendering, page changes, and some types of user interaction. The value of the variable is set when the report is reprocessed, but is maintained in the current session. (This is the equivalent of setting the Writable property for a report variable in RDL.)
•Support for calculating aggregates of aggregates – we can now calculate aggregates of aggregates, and use this capability, as an example, to more precisely align horizontal and vertical axes for charts and scales for gauges (when nesting within a table, etc.).
•Lookup Functions – Expressions in data regions can now include references to functions which can retrieve names and values from a dataset that is not bound to the data region.
Report Authoring Tools
In SQL Server 2008 R2 Reporting Services both report authoring tools, Report Builder and Report Designer, been improved to make report design and creation easier and more efficient.
Enhanced Preview Experience in Report Builder 3.0
The introduction of edit sessions enables the reuse of cached datasets when previewing reports in Report Builder 3.0, resulting in quicker rendering. We can now reuse cached datasets when previewing reports, thanks to the introduction of edit sessions in Report Builder 3.0. This means not only quicker rendering, but, because edit sessions are bound to a report, we can use references to subreports, as well as relative references, in reports.
Easier Report Design and Editing
Report Builder 3.0 provides a user-interface for changing credentials when it is unable to connect to the data source. We can use the Enter Data Source Credentials dialog box to change the credentials used by Report Builder 3.0 at design time, to connect to the data source as the current Windows user, or provide a user name and password. (This is to compensate for the fact that run-time credentials, specified in the data source properties of a report, might not work for design time tasks such as creating queries and previewing reports, depending upon the setup of the local environment).
New Capabilities in the Graphical Query Designer
The new query designer provides a graphical user interface to create custom relationships between related and unrelated tables. The query designer also makes it easy to include aggregates and grouping that summarize data in queries. The graphical query designer can now be used to create queries that retrieve report data from Microsoft SQL Server, Microsoft SQL Azure, and Microsoft SQL Server Parallel Data Warehouse databases, as we noted earlier.
Similarly, the new graphical query designer for the Microsoft SharePoint List data source type simplifies the design and creation of queries that retrieve report data from SharePoint lists. The query designer lists the fields in each SharePoint list item on a SharePoint site for easy inclusion into a query; the designer also helps us to intuitively define filters within our queries to limit the amount of data returned.
Cache Refresh Plans
SQL Server 2008 R2 Reporting Services introduces cache refresh plans, enabling us to cache shared dataset query results or reports from a schedule, or upon first use. (Previous versions allowed us to control caching at the report level only, resulting in the simultaneous caching of all datasets.) We can schedule cache refreshes through an item-specific schedule or a shared schedule, managing and scheduling report caching separately from report subscriptions. This means that we can refresh certain report components independently, and with varying frequencies, from others. Cache refresh plans for commonly used parameter combinations, as one illustration, can help improve data retrieval and report viewing response times.
Report Manager Enhancements
Report Manager has been considerably updated in SQL Server 2008 R2 Reporting Services, to make design and creation easier. More noticeable changes in Report Manager include an enhanced layout, which provides easier navigation to manage report properties and report server items, as well as an updated color scheme. A new drop-down menu is accessible for each report or report server item in a folder, whereby we can access the various configuration options for the report or item we choose.
The Report Manager has improvements in navigation and workflow for viewing and managing reports and report server items, provided by the new drop-down menu to access various configuration options for each report or report server item in a folder. It Eliminates of the need to render a report before accessing and configuring report properties when in default view. It has an updated Report Viewer toolbar, which includes some updates to the toolbar controls, as well as the ability (discussed earlier) to export report data to an Atom service document and data feeds. It also provides more “real estate” for Report Viewer when rendering reports.
Business Intelligence Development Studio (BIDS)
The Business Intelligence Development Studio as installed with SQL Server 2008 R2 Reporting Services supports design, creation and maintenance of both SQL Server 2008 and SQL Server 2008 R2 reports, and of Report Server projects in the SQL Server 2008 R2 version of Business Intelligence Development Studio. This means that we can open, preview, save, and deploy either version of reports or Report Server projects.
We can set Report Server project properties to specify the version of the Report Server to which we wish to deploy reports, as well as to dictate how warnings and errors we might experience, when either upgrading a report from SQL Server 2008 to SQL Server 2008 R2, or reverting a report from SQL Server 2008 R2 to SQL Server 2008, are handled.
New Report Definition Language Schema and New Web Service Endpoint
The Report Definition Language (RDL) schema introduced in SQL Server 2008 R2 includes a new element, together with elements that define the map report item.
In order to support management operations of the report server in both native mode and SharePoint integrated mode, SQL Server 2008 R2 includes a new Web Service management endpoint, named ReportingService2010,that merges the functionalities of both the ReportingService2005 (which is used for managing objects on a report server that is configured for native mode) and the ReportingService2006 (used for managing objects on a report server that is configured for SharePoint integrated mode) endpoints. Features that are introduced in SQL Server 2008 R2, such as shared datasets and cache refresh (which we have discussed earlier), are included within the new endpoint.
Summary...
With the release of SQL Server 2008 Reporting Services, Microsoft introduced significant improvements that continued to roll out in the 2008 R2 update. We have reviewed many of these improvements in this article, and I hope that the extensive list makes a compelling case for upgrade, and the opportunity to develop and deploy better reports faster. Through the recent addition of entirely new capabilities, and the advances that have occurred in existing features, SQL Server Reporting Services delivers the maturity and power of a “best of class” enterprise reporting application.
The new features in the SQL Server 2008 release of Reporting Services were in four general areas:
•Report Authoring
•Report Processing and Rendering
•Server Architecture and Tools
•Report Programmability
We will take a look at these in more detail.
Report Authoring
The Report Designer gained new data regions, a new data source type (a new data processing extension for Teradata) and an improved design / authoring experience. SQL Server 2008 Reporting Services also introduces two new data regions, the Tablix and Gauge. The Tablix data region combines the previously independent table, list and matrix into a single data region, with more versatile grouping of rows and columns to allow for better customization. You’ll like the impact you can get from the new Gauge data region (see Illustration 1), which you can use as a stand-alone gauge, several gauges inside a gauge panel, or within one of the Tablix options.
Illustration 1: The New Gauge Data Region (Example is a Radial Gauge)
A completely redesigned Chart region has appeared, supporting many new chart types, that’s also easier to use.
SQL Server 2008 Reporting Services also introduces support for richly formatted text, along with other enhancements to the Report Designer that make it more intuitive for report authors to access data and design elements. The Report Builder 2.0 now resembles Microsoft Office’s style more closely. The data layout and visualization is better than Report Builder 1.0. The user interface appears, upon launch, as depicted in Illustration 2.
Illustration 2: The User Interface for Report Builder 2.0
There are new report items which allow more choices and more control over page breaks when exporting reports to different renderers. Report Definition Language (RDL) elements have been added to support the new features we have already mentioned, and others to support the new report processing and report rendering models. Expressions now appear on the design canvas as placeholders (the placeholders are automatically substituted when we upgrade our SQL Server 2005 Reporting Services reports).
The Report Designer has been improved by:
•Replacing of the Data tab with the always-visible Data pane, which displays the entire inventory of report data sources, datasets, parameters, images, and built-in fields, and so forth.
•Adding Positioning devices such as rulers that show the current position of an item, and visible snap lines when we drag / resize items.
•Replacing the “deep dive” (multiple mouse clicks to get there) access to grouping for the Tablix data region (and therefore for the table, list and matrix), with a new, always-visible Grouping pane that provides a convenient means of creating and managing groups
Upgrading reports from previous versions of the data regions is a simple process. When we open, and then save, a report in the more recent version it happens automatically. However, the reports might also benefit from adding features that did not exist at the time the original reports were created. You’ll also need to test any upgraded report to check whether, by introducing new report items and RDL elements, you’ve altered the information they display before you re-deploy it.
Report Processing and Rendering
SQL Server 2008 Reporting Services included a new rendering extension for Microsoft Word, and improvements to the rendering extensions for Microsoft Excel and CSV. The improvements help to ensure that all rendering extensions handle soft and logical page breaks consistently.
As well as this, the report processing engine has been refactored to improve the scalability and performance of reports: you can also improve the processing of reports on large amounts of data by using On-Demand Report Processing: This enables the rendering of each report page as it is viewed ("on-demand").
Server Architecture and Tools
There is a new report server architecture in SQL Server 2008 Reporting Services that replaces the support provided by Internet Information Services (IIS) in previous versions. Reporting Services now has:
•Native support for HTTP.SYS and ASP.NET.
•URL management for site and virtual directory names.
•A new authentication layer.
•Health monitoring through newly designed memory management features.
This new architecture consolidates the Report Server Web service, Report Server Windows service, and Report Manager into a single service. This means that we can now define and manage a single service account for all report server applications; it also reduces the configuration and maintenance tasks we face, while making it easier to add custom functions and capabilities. It makes deployment simpler without removing features, or having to change the way we access the applications.
There is a better toolset for configuring and managing the report server in SQL Server 2008 Reporting Services that is used in:
•The Reporting Services Configuration tool.
•SQL Server Management Studio (SSMS),
•Report Manager (for a native mode report server).
•SharePoint application pages (for SharePoint integrated mode).
Data-driven subscriptions and job management are now supported in SharePoint integrated mode.
The authentication provided by IIS in previous versions is now handled by SQL Server 2008 Reporting Services through a new authentication subsystem. Both Windows-based and custom authentication is supported.
Health monitoring has been enhanced within SQL Server 2008 Reporting Services in a number of ways. New memory management features allow us to set up memory thresholds for report processing. We can now configure a maximum limit on memory, as well as interim thresholds that dictate report server response to changes in memory pressure. (The report server simply used all of available memory in previous releases.) To replace the log file generated by IIS in previous versions, now HTTP logging keeps a record of all HTTP requests handled by the report server. Moreover, a single log file (named ReportServerService
•The Report Server Web service.
•Report Manager.
•The background processing application.
Report Server Programmability
SQL Server 2008 Reporting Services introduces a new server extension called the Report Definition Customization extension. This provides report definition preprocessing, allowing us to dynamically customize a report definition before its passage to the processing engine, based upon values dictated within the RDL. For example, we can modify a report's layout based upon the language in which it is to be rendered.
In addition, the "feature gap" that existed between report servers implemented in native mode and SharePoint integrated mode has been narrowed dramatically. The newly added support for data-driven subscriptions and job management eliminates differences that existed between report servers implemented in the two modes, allowing them to offer equivalent functionality.
New and Enhanced Features in Reporting Services, SQL Server 2008 R2: The Details
Having examined the significant changes between SQL Server 2005 and 2008 Reporting Services, we will now focus upon what’s new in SQL Server 2008.
SharePoint Integration
The new features fall into two broad categories: those that apply to a report server configured for SharePoint integration mode ("Server-related enhancements") and those related to the SQL Server 2008 R2 Reporting Services Add-in for SharePoint applications. Some features apply to both categories.
Server-related Enhancements
Support for Multiple SharePoint Zones
In SQL Server 2008 Reporting Services, we have new options, via the alternate access mapping functionality in the SharePoint environment, to access report server items (in previous releases, report server items were available from only the default SharePoint zone). The latest release allows us to access these items from one or more of the following SharePoint zones:
•Default
•Intranet
•Extranet
•Internet
•Custom
This can be an advantage where SharePoint environment can be accessed by users from several zones. We could, for example, employ alternate access mapping to provide access to the same report server items from our primary SharePoint site for information consumers from the intranet and Internet zones.
Support for SharePoint User Token
Using a new server interface (called IRSSetUserToken), which débuts in SQL Server 2008 Reporting Services, we can use the SharePoint user token to connect to a SharePoint site, via the SharePoint Object Model, and then retrieve data from the site by using SharePoint credentials.
Claims Based Authentication
In addition to supporting the existing SharePoint User Tokens, SQL Server 2008 R2 Reporting Services in SharePoint integrated mode supports Claims Based Authentication.
Scripting with the rs Utility
Servers configured in SharePoint integrated mode are now supported using the rs utility, which can be employed to automate deployment and administration tasks.
Add-in for SharePoint - Related Enhancements and Additions
When you use the SQL Server 2008 R2 Reporting Services Add-in for SharePoint applications, you’ll notice some improvements. In order to get the most benefit, Microsoft recommends that we use the SQL Server 2008 R2 version of the add-in with a SQL Server 2008 R2 report server.
Integration Configuration in Fewer Steps
The new SQL Server 2008 R2 Reporting Services Add-in for SharePoint applications makes it easier to integrate a SharePoint server with a Reporting Services report server. In the current version, fewer steps are required to configure report server integration within SharePoint Central Administration.
Support for SharePoint Universal Logging Service
Reporting Services now provides several categories which can be configured to be a part of the SharePoint diagnostic logging. (SharePoint Foundation 2010 and SharePoint Server 2010 offer diagnostic logging to monitor activity on a SharePoint server.) The new categories are part of the 'SQL Server Reporting Services' log file area, and will log several categories of information, including the following:
•General: Logs events that involve access to the following items:
•Reporting Services Web pages
•Report Viewer HTTP handler
•Report access (.rdl files)
•Data sources (.rsds files)
•URLs on the SharePoint site (.smdl files)
•Database: Logs events that involve database access.
•Office Server General: Logs logon failures.
•Topology: Logs current user information.
•Web Parts: Logs events that involve access to the Report Viewer web part.
SQL Server 2008 R2 also makes available new categories for SharePoint diagnostic logging specifically for Reporting Services. The categories that log information with a product area of 'SQL Server Reporting Services' include the following:
•HTTP Redirector: Logs calls made by client applications to the report server.
•Configuration Pages: Logs calls from Reporting Services pages in SharePoint central administration
•UI Pages: Logs calls made by non-configuration-related pages within the SharePoint user interface.
•Soap Client Proxy: Logs calls made by the Reporting Services web pages or Report Viewer web part, to the report server.
•Local Mode Rendering: Logs call made by the Reporting Services rendering engine while in local mode.
•Local Mode Processing: Logs calls made by the Reporting Services processing engine while in local mode.
Getting Data from a SharePoint List
A new Microsoft SharePoint List and query experience allows us to easily bring SharePoint List information into a report when we designate a SharePoint List as a data source for reports.
Local Mode
Reports from Microsoft Access 2010 and the new Reporting Services SharePoint list data extension can run locally from the SharePoint document library (no connection to a SQL Server Reporting Services report server is required). In the new Local Mode we can use the Report Viewer to directly render reports from SharePoint (when the data extension supports local mode reporting).
Local mode supports rendering reports that have an embedded data source or a shared data source from an .rsds file. However, the management of the report or its associated data source is deliberately not supported in local mode, but only in connected mode.
HTTP requests
In the latest version of Reporting Services, an HTTP redirector has been put in place to listen for client requests (such as those from Report Builder) to the SharePoint web front end. The HTTP redirector will re-direct such requests to the Report Server.
Links directly use Document Libraries.
In SQL Server 2008 R2 Reporting Services, on a SharePoint server integrated with Reporting Services, report subscriptions and drill-through links work directly with the linked resources in the document library itself.
Increased language and character support
The SQL Server 2008 R2 Reporting Services related interface within SharePoint now supports 37 languages, complex text layout with Thai, and right-to-left character sets with Hebrew and Arabic, and. In addition to document library menus and user interface controls, support includes the Report Viewer web part.
Collaboration and Reuse
SQL Server 2008 R2 Reporting Services provides new features that support enhanced collaboration and reuse of components within the reporting environment. We can classify these new features within two groups: Report Parts and Shared Datasets.
Report Parts
Report Parts allow us to employ the various strengths and roles of team members. For example, a given team member can reuse, in his or her various reports, report parts that have been created and saved by another member, say, working within a developmental capacity. Previous versions of SQL Server Reporting Services did not offer a ready means of generating report parts (such as tables, charts and parameters) that might be used pervasively throughout our business intelligence solutions. SQL Server 2008 R2 Reporting Services supports our selection of individual report parts from a given report for uploading to a library on the report server, from which we can subsequently browse and select them for inclusion in prospective reports.
The items that we can publish as Report Parts include the following:
•Charts
•Gauges
•Images and embedded images
•Lists (via the Tablix data region)
•Maps
•Matrices (via the Tablix data region)
•Parameters
•Rectangles
•Tables (via the Tablix data region)
Report Parts are stored either on a report server, or within a SharePoint site that is integrated with a report server. Report Parts can be reused in multiple reports, and they can be updated on the server.
We can reuse report parts in many reports, where each report part we add to a given report uses a unique ID to maintain a relationship to the instance of the “parent” report part on the respective server or site. Report parts that display data (for example, a matrix, table, or chart) can now be based upon a shared dataset (see the section that follows); otherwise, when a report part is published, the dataset upon which it depends is saved as an embedded dataset. Moreover, embedded datasets can be based upon embedded data sources, but credentials are not stored in embedded data sources.
Choices for the type of dataset underlying a given Report Part can have security implications: if a Report Part depends upon an embedded dataset that uses an embedded data source, for example, the credentials for the embedded data source will need to be provided by anyone reusing this Report Part. (To avoid this, we would base our embedded and shared datasets upon shared data sources with stored credentials.)
Upgrade and use considerations:
•Because of the “portability” of report parts for uploading into a library from which selections will be made to reuse them, naming of the parts (perhaps we made a habit of leaving these at their defaults, such as “Tablix1” or “Chart3,” when we worked with previous versions) will take on a new importance. Descriptive naming conventions will become important in order to allow for intuitive reuse by other development team members, etc.
•From the perspective of the new “portability” of report parts, the full authoring and sharing can only be accomplished via the new Report Builder 3.0 (although reports can be authored in Business Intelligence Development Studio). In effect, we are limited in what we can do with report parts in Business Intelligence Development Studio; we can publish report parts from the Development Studio, but cannot select or reuse the existing parts.
Shared DatasetsThe second new type of Report Server item that we see in SQL Server 2008 R2 Reporting Services is Shared Datasets. Shared Datasets can retrieve data from shared data sources that connect to external data sources. Shared Datasets use only shared data sources (embedded data sources are not allowed). A Shared Dataset can be based on a report model, or upon any data source for a supported Reporting Services data extension.
Many of us have no doubt experienced the need to use identical datasets within reports we designed that are similar in presentation, or that use similar or identical information. (This is particularly common when the datasets act as data sources for report parameters that are common to multiple reports). Before SQL Server 2008 R2 Reporting Services, we were limited to copying such a dataset’s definition from the Report Definition Language (RDL) of one report to that of another report to allow the latter to “share” the same dataset (the Business Intelligence Development Studio did not afford us the ability to copy and paste datasets between reports). The only other alternative was to create an identical dataset from scratch within the new report.
A Shared Dataset offers numerous benefits; at the top of the list is the fact that it provides a way to share a query among many reports. This practice helps report authors and others to build reports upon consistent sets of data. The dataset query can include dataset parameters. Moreover, we can configure a Shared Dataset to cache query results for specific parameter combinations. We can perform such a cache upon first use or through the specification of a schedule. Shared Dataset caching can be used together with report data feeds and report caching to help manage access to a data source, as well as to optimize our use of system resources.
We can “share” a dataset within the Report Designer simply by selecting Convert to Shared Dataset on the context menu that appears when we right-click a pre-existing dataset within a report. Once we have converted a standard dataset to a shared dataset, we can select that dataset from among a list of any other shared datasets, within the reports of the current Report Server project, simply by ensuring the selection of the radio button labeled Use a shared dataset within the Dataset Properties dialog that appears when we go to add a new dataset in the Report Data pane, Datasets folder.
When added to a report, a Shared Dataset (like a report part) maintains a relationship to the definition of the “parent” dataset on the report server. Unlike Report Parts, when the definition is changed, we do not have to accept updates. All reports that have the relationship always use the Shared Dataset definition on the report server.
Data Sources
SQL Server 2008 R2 Reporting Services offers three new data sources types: Microsoft SQL Azure, Microsoft SQL Server Parallel Data Warehouse, and Microsoft SharePoint List. Each of these data sources types, together with a description of its nature and operation, appears in Table 1.
Data Sources Type
Description
Microsoft SQL Azure
Connects to SQL Server databases in the cloud and enables reports to retrieve and render report data from SQL Azure databases.
Microsoft SQL Server Parallel Data Warehouse
Connects to a Microsoft SQL Server Parallel Data Warehouse and enables reports to retrieve and render report data from SQL Server databases.
Microsoft SharePoint List
Connects to a SharePoint site and enables reports to retrieve and render report data from SharePoint lists.
Table 1: New Data Source Types in SQL Server 2008 R2 Reporting Services
Data Visualization
SQL Server 2008 R2 Reporting Services provides three new ways to visualize data in reports: Maps, Sparklines and Data Bars, and Indicators. Let’s take a look at each of these ways, together with a description of its nature and operation, in the sections below.
Maps:
The new Map Wizard and Map Layer Wizard in Report Designer allow us to add maps and map layers to our reports, an example of which we can see in Illustration 3, to help visualize data against a geographic background.
Illustration 3: Adding a Map Visualization to a Report
Once map elements are related with report data, we can control color, size, width, or marker type on any given layer, as well as add labels, legends, titles, a color scale, and a distance scale to help assist users in interpreting the map display. Moreover, we can add interactive features, such as tooltips and drillthrough links, or provide parameters that enable a user to interactively control the visibility of each layer, and much more.
Sparklines and Data Bars
Sparklines and Data Bars have the basic chart elements of categories, series, and values, but they have no legend, axis lines, labels, or tick marks. Both are simple charts used to convey much information in a little space (and often inline with text), and they can be easily used within matrices and tables.
Sparklines and Data Bars are most effective when many of them are presented together, for rapid visual comparison (making it easy to identify outliers). Each Sparkline often represents multiple data points over time. Because Sparklines display aggregated data, they must go in a cell that is associated with a group (and are not added to a detail group in a table). By contrast, Data Bars can represent multiple data points, but typically illustrate only one (with each Data Bar typically presenting a single series).
An example of the use of Sparklines is depicted in Illustration 4.
Illustration 4: Example of a Use for Sparklines
Indicators
Indicators are available in Report Builder 3.0 and Report Designer, and are icon-like, minimal gauges that convey the state of a single data value at a glance. They are typically used in matrices or tables to present data in rows or columns.
Indicators are often used to reflect:
•Trends – with directional images such as arrows;
•Ratings – using incremental icons such as stars;
•States (of being) – with traffic lights or check marks.
Once we drag the indicator item from the toolbox to a report we are crafting in Report Designer, we are prompted to make a selection among various options as shown in Illustration 5.
Illustration 5: Selecting an Indicator Type in Report Designer
Report Layout and Rendering
SQL Server 2008 R2 Reporting Services continues to add features designed to support more rendering options. Moreover, we can now use our reports as the source of data feeds and as exports to Microsoft Excel.
Naming Excel Worksheet Tabs
We can now generate the names of worksheet tabs when we export reports to Excel, using the properties of reports and page breaks in tandem. We can provide an initial page name of a report that can be exported as the default name of worksheet tabs, or use page breaks and page names to provide different names for each worksheet tab.
Rendering Reports to Data Feeds
Via enhancements in SQL Server 2008 R2 Reporting Services, including the new Atom rendering extension, we can export a production report as a data feed or create a report whose primary purpose is provide data, in the form of data feeds, to applications. The option to use reports as a data feed gives us an additional way to provide data to applications. We might, for example, prefer to hide the complexity of a data source and make it simpler to use the data: We might, alternatively do this when targeted data is not easy to access through client data providers.) Another benefit would be the availability of familiar Reporting Services features, such as Report Manager, security, report snapshots and scheduling, to manage the reports that now provide data feeds. Whatever the business need, the capability to render reports directly to data feeds means reusability of existing reports, as well as a means of making best use of all the effort of creating and securing the underlying datasets.
Report Pagination and Text Rotation
SQL Server 2008 R2 Reporting Services introduces enhancements to page breaks in several key components, including:
•Tablix data regions (table, matrix, and list)
•Groups
•Rectangles
When we delimit the pages of a report using page breaks, we can now assign different reset page numbering and page names to those pages. Furthermore, we can leverage expressions to establish the dynamic update of page names and page numbers at runtime, or the conditional (in addition to full) disablement of page breaks.
An example of the settings involved in creating a page break, based upon a group (Product Category) within a sample report, is depicted in Illustration 6.
Illustration 6: Making a Report Break when the Product Category Changes
We can now rotate standalone text boxes up to 270 degrees in:
•Report headers
•Report footers
•Report body
•Cells of tables
•Cells of tables
This rotation, and the display of text written vertically and bottom to top that it supports, can help us to better use “available real estate” to create more readable reports, fit more data on printed reports, and create reports with more graphical appeal.
Aggregates, Expressions, and Functions
Aggregates, expressions and functions gain power in SQL Server 2008 R2Reporting Services. This includes:
•Enhanced expression support – New globals are introduced, including:
◦OverallPageNumber and OverallTotalPages - Support for page numbering for the entire rendered report.
◦PageName- Support for naming pages.
◦RenderFormat - Support for information that is specified for the renderer.
A new read-only property (set by default) for report variables is introduced. This property provides a way to persist data across report rendering, page changes, and some types of user interaction. The value of the variable is set when the report is reprocessed, but is maintained in the current session. (This is the equivalent of setting the Writable property for a report variable in RDL.)
•Support for calculating aggregates of aggregates – we can now calculate aggregates of aggregates, and use this capability, as an example, to more precisely align horizontal and vertical axes for charts and scales for gauges (when nesting within a table, etc.).
•Lookup Functions – Expressions in data regions can now include references to functions which can retrieve names and values from a dataset that is not bound to the data region.
Report Authoring Tools
In SQL Server 2008 R2 Reporting Services both report authoring tools, Report Builder and Report Designer, been improved to make report design and creation easier and more efficient.
Enhanced Preview Experience in Report Builder 3.0
The introduction of edit sessions enables the reuse of cached datasets when previewing reports in Report Builder 3.0, resulting in quicker rendering. We can now reuse cached datasets when previewing reports, thanks to the introduction of edit sessions in Report Builder 3.0. This means not only quicker rendering, but, because edit sessions are bound to a report, we can use references to subreports, as well as relative references, in reports.
Easier Report Design and Editing
Report Builder 3.0 provides a user-interface for changing credentials when it is unable to connect to the data source. We can use the Enter Data Source Credentials dialog box to change the credentials used by Report Builder 3.0 at design time, to connect to the data source as the current Windows user, or provide a user name and password. (This is to compensate for the fact that run-time credentials, specified in the data source properties of a report, might not work for design time tasks such as creating queries and previewing reports, depending upon the setup of the local environment).
New Capabilities in the Graphical Query Designer
The new query designer provides a graphical user interface to create custom relationships between related and unrelated tables. The query designer also makes it easy to include aggregates and grouping that summarize data in queries. The graphical query designer can now be used to create queries that retrieve report data from Microsoft SQL Server, Microsoft SQL Azure, and Microsoft SQL Server Parallel Data Warehouse databases, as we noted earlier.
Similarly, the new graphical query designer for the Microsoft SharePoint List data source type simplifies the design and creation of queries that retrieve report data from SharePoint lists. The query designer lists the fields in each SharePoint list item on a SharePoint site for easy inclusion into a query; the designer also helps us to intuitively define filters within our queries to limit the amount of data returned.
Cache Refresh Plans
SQL Server 2008 R2 Reporting Services introduces cache refresh plans, enabling us to cache shared dataset query results or reports from a schedule, or upon first use. (Previous versions allowed us to control caching at the report level only, resulting in the simultaneous caching of all datasets.) We can schedule cache refreshes through an item-specific schedule or a shared schedule, managing and scheduling report caching separately from report subscriptions. This means that we can refresh certain report components independently, and with varying frequencies, from others. Cache refresh plans for commonly used parameter combinations, as one illustration, can help improve data retrieval and report viewing response times.
Report Manager Enhancements
Report Manager has been considerably updated in SQL Server 2008 R2 Reporting Services, to make design and creation easier. More noticeable changes in Report Manager include an enhanced layout, which provides easier navigation to manage report properties and report server items, as well as an updated color scheme. A new drop-down menu is accessible for each report or report server item in a folder, whereby we can access the various configuration options for the report or item we choose.
The Report Manager has improvements in navigation and workflow for viewing and managing reports and report server items, provided by the new drop-down menu to access various configuration options for each report or report server item in a folder. It Eliminates of the need to render a report before accessing and configuring report properties when in default view. It has an updated Report Viewer toolbar, which includes some updates to the toolbar controls, as well as the ability (discussed earlier) to export report data to an Atom service document and data feeds. It also provides more “real estate” for Report Viewer when rendering reports.
Business Intelligence Development Studio (BIDS)
The Business Intelligence Development Studio as installed with SQL Server 2008 R2 Reporting Services supports design, creation and maintenance of both SQL Server 2008 and SQL Server 2008 R2 reports, and of Report Server projects in the SQL Server 2008 R2 version of Business Intelligence Development Studio. This means that we can open, preview, save, and deploy either version of reports or Report Server projects.
We can set Report Server project properties to specify the version of the Report Server to which we wish to deploy reports, as well as to dictate how warnings and errors we might experience, when either upgrading a report from SQL Server 2008 to SQL Server 2008 R2, or reverting a report from SQL Server 2008 R2 to SQL Server 2008, are handled.
New Report Definition Language Schema and New Web Service Endpoint
The Report Definition Language (RDL) schema introduced in SQL Server 2008 R2 includes a new
In order to support management operations of the report server in both native mode and SharePoint integrated mode, SQL Server 2008 R2 includes a new Web Service management endpoint, named ReportingService2010,that merges the functionalities of both the ReportingService2005 (which is used for managing objects on a report server that is configured for native mode) and the ReportingService2006 (used for managing objects on a report server that is configured for SharePoint integrated mode) endpoints. Features that are introduced in SQL Server 2008 R2, such as shared datasets and cache refresh (which we have discussed earlier), are included within the new endpoint.
Summary...
With the release of SQL Server 2008 Reporting Services, Microsoft introduced significant improvements that continued to roll out in the 2008 R2 update. We have reviewed many of these improvements in this article, and I hope that the extensive list makes a compelling case for upgrade, and the opportunity to develop and deploy better reports faster. Through the recent addition of entirely new capabilities, and the advances that have occurred in existing features, SQL Server Reporting Services delivers the maturity and power of a “best of class” enterprise reporting application.
The MERGE Statement in SQL Server 2008
When the SQL MERGE statement was introduced in SQL Server 2008, it allowed database programmers to replace reams of messy code with something quick, simple and maintainable. The MERGE syntax just takes a bit of explaining, and Rob Sheldon is, as always, on hand to explain with plenty of examples.
Starting with SQL Server 2008, you can use a MERGE statement to modify data in a target table based on data in a source table. The statement joins the target to the source by using a column common to both tables, such as a primary key. You can then insert, modify, or delete data from the target table—all in one statement—according to how the rows match up as a result of the join.
The MERGE statement supports several clauses that facilitate the different types of data modifications. In this article, I explain each of these clauses and provide examples that demonstrate how they work. I created the examples on a local instance of SQL Server 2008. To try them out, you’ll need to first run the following script to create and populate the tables used in the examples:
USE AdventureWorks2008
IF OBJECT_ID ('BookInventory', 'U') IS NOT NULL
DROP TABLE dbo.BookInventory;
CREATE TABLE dbo.BookInventory -- target
(
TitleID INT NOT NULL PRIMARY KEY,
Title NVARCHAR(100) NOT NULL,
Quantity INT NOT NULL
CONSTRAINT Quantity_Default_1 DEFAULT 0
);
IF OBJECT_ID ('BookOrder', 'U') IS NOT NULL
DROP TABLE dbo.BookOrder;
CREATE TABLE dbo.BookOrder -- source
(
TitleID INT NOT NULL PRIMARY KEY,
Title NVARCHAR(100) NOT NULL,
Quantity INT NOT NULL
CONSTRAINT Quantity_Default_2 DEFAULT 0
);
INSERT BookInventory VALUES
(1, 'The Catcher in the Rye', 6),
(2, 'Pride and Prejudice', 3),
(3, 'The Great Gatsby', 0),
(5, 'Jane Eyre', 0),
(6, 'Catch 22', 0),
(8, 'Slaughterhouse Five', 4);
INSERT BookOrder VALUES
(1, 'The Catcher in the Rye', 3),
(3, 'The Great Gatsby', 0),
(4, 'Gone with the Wind', 4),
(5, 'Jane Eyre', 5),
(7, 'Age of Innocence', 8);
As you can see, the script creates and populates the BookInventory and BookOrder tables. The BookInventory table represents the books that are or were available at a fictional book retailer. If the Quality value for a book is 0, then the book has sold out.
The BookOrder table shows those books for which an order has been placed and delivered. If the Quality value for a book listed in this table is 0, then the book had been requested but not included with the delivery. The Quantity values of both tables when added together represent the company’s current inventory.
NOTE: I created the BookInventory and BookOrder tables in the AdventureWorks2008 sample database. You can create the tables in any SQL Server 2008 user database. Be sure to change the USE statement in the script above to accurately reflect the target database.
Implementing the WHEN MATCHED Clause
The first MERGE clause we’ll look at is WHEN MATCHED. You should use this clause when you want to update or delete rows in the target table that match rows in the source table. Rows are considered matching when the joined column values are the same.
For example, if the BookID value in the BookInventory table matches the BookID value in the BookOrder table, the rows are considered to match, regardless of the other values in the matching rows. When rows do match, you can use the WHEN MATCHED clause to modify data in the target table. Lets look at an example to demonstrate how this works.
In the following MERGE statement, I join the BookInventory table (the target) to the BookOrder table (the source) and then use a WHEN MATCHED clause to update the Quantity column in the target table:
MERGE BookInventory bi
USING BookOrder bo
ON bi.TitleID = bo.TitleID
WHEN MATCHED THEN
UPDATE
SET bi.Quantity = bi.Quantity + bo.Quantity;
SELECT * FROM BookInventory;
As you can see, the statement begins with the MERGE keyword, followed by the name of the target table. Note that the table name should be qualified as necessary. Notice that I’ve also assigned an alias (bi) to the target table to make it easier to reference that table later in the statement.
The next line in the statement is the USING clause, which is made up of the USING keyword, followed by the source table (again, qualified as necessary). I’ve also assigned an alias (bo) to this table. I then used an ON clause to join the two tables, based on the TitleID value in each table (bi.TitleID = bo.TitleID).
After I specified the target and source tables as well as the join condition, I added a WHEN MATCHED clause. The clause includes the WHEN MATCHED keywords, followed by the THEN keyword, next the UPDATE keyword, and finally a SET subclause. In this case, the SET expression specifies that the new Quantity value in the target table should equal the sum of the Quantity values from both the target and source tables (bi.Quantity = bi.Quantity + bo.Quantity). This way, the new Quantity value in the target table will reflect the accurate number of books available for sale, that is, the number of books that were on hand plus the number that arrived with the recent order.
That’s all there is to creating a basic MATCH statement. I also included a SELECT statement so we can view the contents of the target table. The following results reflect the new values in the Quantity column, as they appear in the BookInventory table after it has been updated with the MERGE statement:
TitleID
Title
Quantity
1
The Catcher in the Rye
9
2
Pride and Prejudice
3
3
The Great Gatsby
0
5
Jane Eyre
5
6
Catch 22
0
8
Slaughterhouse Five
4
As the query results indicate, several of the rows have been updated to reflect the total inventory, based on the amounts in both the target and source tables. For example, the row in the BookInventory table with the TitleID value of 1 (The Catcher in the Rye) originally showed three books in stock. However, according to the BookOrder table, six more books were ordered, giving the company a total of nine books. As you would expect, the Quantity value in the BookInventory table is now 9.
You might have noticed that the book with the BookID value of 3 (The Great Gatsby) originally had a Quantity value of 0 in both the source and target tables. Suppose you want to remove from the BookInventory table any book whose Quantity value is 0 in both the target and source tables. You can easily delete such rows by adding a second WHEN MATCHED clause to your MATCH statement, as shown in the following example:
MERGE BookInventory bi
USING BookOrder bo
ON bi.TitleID = bo.TitleID
WHEN MATCHED AND
bi.Quantity + bo.Quantity = 0 THEN
DELETE
WHEN MATCHED THEN
UPDATE
SET bi.Quantity = bi.Quantity + bo.Quantity;
SELECT * FROM BookInventory;
Notice that the new WHEN MATCHED clause includes a specific search condition after the AND keyword (bi.Quantity + bo.Quantity = 0). As a result, whenever rows from the two tables match and the two Quantity columns from the matched rows add up to 0, the row will be deleted, indicated by the DELETE keyword. Now the SELECT statement returns the following results:
TitleID
Title
Quantity
1
The Catcher in the Rye
9
2
Pride and Prejudice
3
5
Jane Eyre
5
6
Catch 22
0
8
Slaughterhouse Five
4
As you can see, the book The Great Gatsby has been removed from the inventory. You should note, however, that a MERGE statement can include at most only two WHEN MATCHED clauses. Whenever you do include two of these clauses, the first clause must include the AND keyword, followed by a search condition, as I’ve done here. The second WHEN MATCHED clause is then applied only if the first one is not.
NOTE: The examples in this article are independent of one another. That is, for each example you run, you should first rerun the table creation script if you want your results to match the ones shown here.
Implementing the WHEN NOT MATCHED [BY TARGET] Clause
The next clause in the MERGE statement we’ll review is WHEN NOT MATCHED [BY TARGET]. (The BY TARGET keywords are optional.) You should use this clause to insert new rows into the target table. The rows you insert into the table are those rows in the source table for which there are no matching rows in the target. For example, the BookOrder table contains a row for Gone with the Wind. However, the BookInventory table does not contain this book. The following example demonstrates how to include a WHEN NOT MATCHED clause in your MERGE statement that adds Gone with the Wind to your target table:
MERGE BookInventory bi
USING BookOrder bo
ON bi.TitleID = bo.TitleID
WHEN MATCHED AND
bi.Quantity + bo.Quantity = 0 THEN
DELETE
WHEN MATCHED THEN
UPDATE
SET bi.Quantity = bi.Quantity + bo.Quantity
WHEN NOT MATCHED BY TARGET THEN
INSERT (TitleID, Title, Quantity)
VALUES (bo.TitleID, bo.Title,bo.Quantity);
SELECT * FROM BookInventory;
As the statement shows, I first specify the WHEN NOT MATCHED BY TARGET keywords, followed by the THEN keyword, and finally followed by the INSERT clause. The INSERT clause has two parts: the INSERT subclause and the VALUES subclause. You specify the target columns in the INSERT subclause and the source values in the VALUES clause. Notice that, for the VALUES subclause, I must qualify the column names with the table alias. The SELECT statement now returns the following results:
TitleID
Title
Quantity
1
The Catcher in the Rye
9
2
Pride and Prejudice
3
4
Gone with the Wind
4
5
Jane Eyre
5
6
Catch 22
0
7
Age of Innocence
8
8
Slaughterhouse Five
4
Two new rows have been added to the BookInventory table: one for Gone with the Wind and one for Age of Innocence. Because the books existed in the source table, but not in the target table, they were inserted into the BookInventory table.
Implementing the WHEN NOT MATCHED BY SOURCE Clause
As you’ll recall from the discussion about the WHEN MATCHED clause, you can use that clause to delete rows from the target table. However, you can delete a row that matches a row in the source table. But suppose you want to delete a row from the target table that does not match a row in the source table.
For example, one of the rows originally inserted into the BookInventory table is for the book Catch 22. The Quality value for that book was never updated because no order was placed for the book, that is, the book was never added to the BookOrder table. Because there are no copies of that book in stock, you might decide to delete that book from the target table. To delete a row that does not match a row in the source table, you must use the WHEN NOT MATCHED BY SOURCE clause.
NOTE: Like the WHEN MATCHED clause, you can include up to two WHEN NOT MATCHED BY SOURCE clauses in your MERGE statement. If you include two, the first clause must include the AND keyword followed by a search condition.
The following example includes a WHEN NOT MATCHED BY SOURCE clause that specifies that any rows with a quantity of 0 that do not match the source should be deleted:
MERGE BookInventory bi
USING BookOrder bo
ON bi.TitleID = bo.TitleID
WHEN MATCHED AND
bi.Quantity + bo.Quantity = 0 THEN
DELETE
WHEN MATCHED THEN
UPDATE
SET bi.Quantity = bi.Quantity + bo.Quantity
WHEN NOT MATCHED BY TARGET THEN
INSERT (TitleID, Title, Quantity)
VALUES (bo.TitleID, bo.Title,bo.Quantity)
WHEN NOT MATCHED BY SOURCE
AND bi.Quantity = 0 THEN
DELETE;
SELECT * FROM BookInventory;
After I specified the WHEN NOT MATCHED BY SOURCE keywords, I specified AND followed by a search condition (bi.Quantity = 0). I then added the THEN keyword, and next the DELETE keyword. The results returned by the SELECT statement are shown in the following table:
TitleID
Title
Quantity
1
The Catcher in the Rye
9
2
Pride and Prejudice
3
4
Gone with the Wind
4
5
Jane Eyre
5
7
Age of Innocence
8
8
Slaughterhouse Five
4
As you can see, the BookInventory table no longer includes the row for Catch 22. And because the three MERGE clauses have been used together, the BookInventory now reflects the exact number of books that are currently on hand, and no books are included that are not in stock.
Implementing the OUTPUT Clause
When SQL Server 2005 was released, it included support for the OUTPUT clause in several data modification language (DML) statements. The OUTPUT clause is also available in the MERGE statement. The OUTPUT clause returns a copy of the data that you’ve inserted into or deleted from your tables. When used with a MERGE statement, the clause provides you with a powerful tool for capturing the modified data for archiving, messaging, or application purposes.
NOTE: To learn more about the OUTPUT clause, see the article “Implementing the OUTPUT Clause in SQL Server 2008” (http://www.simple-talk.com/sql/learn-sql-server/implementing-the-output-clause-in-sql-server-2008/).
In the following example, I use an OUTPUT clause to pass the outputted data to a variable named @MergeOutput:
DECLARE @MergeOutput TABLE
(
ActionType NVARCHAR(10),
DelTitleID INT,
InsTitleID INT,
DelTitle NVARCHAR(50),
InsTitle NVARCHAR(50),
DelQuantity INT,
InsQuantity INT
);
MERGE BookInventory bi
USING BookOrder bo
ON bi.TitleID = bo.TitleID
WHEN MATCHED AND
bi.Quantity + bo.Quantity = 0 THEN
DELETE
WHEN MATCHED THEN
UPDATE
SET bi.Quantity = bi.Quantity + bo.Quantity
WHEN NOT MATCHED BY TARGET THEN
INSERT (TitleID, Title, Quantity)
VALUES (bo.TitleID, bo.Title,bo.Quantity)
WHEN NOT MATCHED BY SOURCE
AND bi.Quantity = 0 THEN
DELETE
OUTPUT
$action,
DELETED.TitleID,
INSERTED.TitleID,
DELETED.Title,
INSERTED.Title,
DELETED.Quantity,
INSERTED.Quantity
INTO @MergeOutput;
SELECT * FROM BookInventory;
SELECT * FROM @MergeOutput;
Notice that I first declare the @MergeOutput table variable. In the variable, I include a column for the action type plus three additional sets of column. Each set corresponds to the columns in the target table and includes a column that shows the deleted data and one that shows the inserted data. For example, the DelTitleID and InsTitleID columns correspond to the deleted and inserted values, respectively, in the target table.
The OUTPUT clause itself first specifies the built-in $action variable, which returns one of three nvarchar(10) values—INSERT, UPDATE, or DELETE. The variable is available only to the MERGE statement. I follow the variable with a set of column prefixes (DELETED and INSERTED) for each column in the target table. The column prefixes are followed by the name of the column they’re related to. For example, I include DELETED.TitleID and INSERTED.TitleID for the TitleID column in the target table. After I specify the column prefixes, I then include an INTO subclause, which specifies that the outputted values should be saved to the @MergeOutput variable.
After the OUTPUT clause, which is the last clause in my MERGE statement, I added a SELECT statement to retrieve the updated contents of the BookInventory table, as I’ve done in previous examples. The SELECT statement returns the following results:
TitleID
Title
Quantity
1
The Catcher in the Rye
9
2
Pride and Prejudice
3
4
Gone with the Wind
4
5
Jane Eyre
5
7
Age of Innocence
8
8
Slaughterhouse Five
4
Notice that I also include a second SELECT statement in my example above. This statement retrieves the contents of the @MergeOutput variable. The query results are shown in the following table:
ActionType
DelTitleID
InsTitleID
DelTitle
InsTitle
DelQuantity
InsQuantity
UPDATE
1
1
The Catcher in the Rye
The Catcher in the Rye
6
9
DELETE
3
NULL
The Great Gatsby
NULL
0
NULL
INSERT
NULL
4
NULL
Gone with the Wind
NULL
4
UPDATE
5
5
Jane Eyre
Jane Eyre
0
5
DELETE
6
NULL
Catch 22
NULL
0
NULL
INSERT
NULL
7
NULL
Age of Innocence
NULL
8
The results show any actions that were taken on the target table. For instance, the row for the Great Gatsby indicates that this row was deleted from the BookInventory table. The DelTitleID, DelTitle, and DelQuantity columns show the values that were deleted. However, the InsTitleID, InsTitle, and InsQuantity columns all show null values. That’s because no data was inserted into the target table for that row. If a row had been inserted, the InsTitleID, InsTitle, and InsQuantity columns will show the inserted values, but the DelTitleID, DelTitle, and DelQuantity columns will show null values because nothing is deleted when an insert is performed. Any updated rows will have values in all columns.
Starting with SQL Server 2008, you can use a MERGE statement to modify data in a target table based on data in a source table. The statement joins the target to the source by using a column common to both tables, such as a primary key. You can then insert, modify, or delete data from the target table—all in one statement—according to how the rows match up as a result of the join.
The MERGE statement supports several clauses that facilitate the different types of data modifications. In this article, I explain each of these clauses and provide examples that demonstrate how they work. I created the examples on a local instance of SQL Server 2008. To try them out, you’ll need to first run the following script to create and populate the tables used in the examples:
USE AdventureWorks2008
IF OBJECT_ID ('BookInventory', 'U') IS NOT NULL
DROP TABLE dbo.BookInventory;
CREATE TABLE dbo.BookInventory -- target
(
TitleID INT NOT NULL PRIMARY KEY,
Title NVARCHAR(100) NOT NULL,
Quantity INT NOT NULL
CONSTRAINT Quantity_Default_1 DEFAULT 0
);
IF OBJECT_ID ('BookOrder', 'U') IS NOT NULL
DROP TABLE dbo.BookOrder;
CREATE TABLE dbo.BookOrder -- source
(
TitleID INT NOT NULL PRIMARY KEY,
Title NVARCHAR(100) NOT NULL,
Quantity INT NOT NULL
CONSTRAINT Quantity_Default_2 DEFAULT 0
);
INSERT BookInventory VALUES
(1, 'The Catcher in the Rye', 6),
(2, 'Pride and Prejudice', 3),
(3, 'The Great Gatsby', 0),
(5, 'Jane Eyre', 0),
(6, 'Catch 22', 0),
(8, 'Slaughterhouse Five', 4);
INSERT BookOrder VALUES
(1, 'The Catcher in the Rye', 3),
(3, 'The Great Gatsby', 0),
(4, 'Gone with the Wind', 4),
(5, 'Jane Eyre', 5),
(7, 'Age of Innocence', 8);
As you can see, the script creates and populates the BookInventory and BookOrder tables. The BookInventory table represents the books that are or were available at a fictional book retailer. If the Quality value for a book is 0, then the book has sold out.
The BookOrder table shows those books for which an order has been placed and delivered. If the Quality value for a book listed in this table is 0, then the book had been requested but not included with the delivery. The Quantity values of both tables when added together represent the company’s current inventory.
NOTE: I created the BookInventory and BookOrder tables in the AdventureWorks2008 sample database. You can create the tables in any SQL Server 2008 user database. Be sure to change the USE statement in the script above to accurately reflect the target database.
Implementing the WHEN MATCHED Clause
The first MERGE clause we’ll look at is WHEN MATCHED. You should use this clause when you want to update or delete rows in the target table that match rows in the source table. Rows are considered matching when the joined column values are the same.
For example, if the BookID value in the BookInventory table matches the BookID value in the BookOrder table, the rows are considered to match, regardless of the other values in the matching rows. When rows do match, you can use the WHEN MATCHED clause to modify data in the target table. Lets look at an example to demonstrate how this works.
In the following MERGE statement, I join the BookInventory table (the target) to the BookOrder table (the source) and then use a WHEN MATCHED clause to update the Quantity column in the target table:
MERGE BookInventory bi
USING BookOrder bo
ON bi.TitleID = bo.TitleID
WHEN MATCHED THEN
UPDATE
SET bi.Quantity = bi.Quantity + bo.Quantity;
SELECT * FROM BookInventory;
As you can see, the statement begins with the MERGE keyword, followed by the name of the target table. Note that the table name should be qualified as necessary. Notice that I’ve also assigned an alias (bi) to the target table to make it easier to reference that table later in the statement.
The next line in the statement is the USING clause, which is made up of the USING keyword, followed by the source table (again, qualified as necessary). I’ve also assigned an alias (bo) to this table. I then used an ON clause to join the two tables, based on the TitleID value in each table (bi.TitleID = bo.TitleID).
After I specified the target and source tables as well as the join condition, I added a WHEN MATCHED clause. The clause includes the WHEN MATCHED keywords, followed by the THEN keyword, next the UPDATE keyword, and finally a SET subclause. In this case, the SET expression specifies that the new Quantity value in the target table should equal the sum of the Quantity values from both the target and source tables (bi.Quantity = bi.Quantity + bo.Quantity). This way, the new Quantity value in the target table will reflect the accurate number of books available for sale, that is, the number of books that were on hand plus the number that arrived with the recent order.
That’s all there is to creating a basic MATCH statement. I also included a SELECT statement so we can view the contents of the target table. The following results reflect the new values in the Quantity column, as they appear in the BookInventory table after it has been updated with the MERGE statement:
TitleID
Title
Quantity
1
The Catcher in the Rye
9
2
Pride and Prejudice
3
3
The Great Gatsby
0
5
Jane Eyre
5
6
Catch 22
0
8
Slaughterhouse Five
4
As the query results indicate, several of the rows have been updated to reflect the total inventory, based on the amounts in both the target and source tables. For example, the row in the BookInventory table with the TitleID value of 1 (The Catcher in the Rye) originally showed three books in stock. However, according to the BookOrder table, six more books were ordered, giving the company a total of nine books. As you would expect, the Quantity value in the BookInventory table is now 9.
You might have noticed that the book with the BookID value of 3 (The Great Gatsby) originally had a Quantity value of 0 in both the source and target tables. Suppose you want to remove from the BookInventory table any book whose Quantity value is 0 in both the target and source tables. You can easily delete such rows by adding a second WHEN MATCHED clause to your MATCH statement, as shown in the following example:
MERGE BookInventory bi
USING BookOrder bo
ON bi.TitleID = bo.TitleID
WHEN MATCHED AND
bi.Quantity + bo.Quantity = 0 THEN
DELETE
WHEN MATCHED THEN
UPDATE
SET bi.Quantity = bi.Quantity + bo.Quantity;
SELECT * FROM BookInventory;
Notice that the new WHEN MATCHED clause includes a specific search condition after the AND keyword (bi.Quantity + bo.Quantity = 0). As a result, whenever rows from the two tables match and the two Quantity columns from the matched rows add up to 0, the row will be deleted, indicated by the DELETE keyword. Now the SELECT statement returns the following results:
TitleID
Title
Quantity
1
The Catcher in the Rye
9
2
Pride and Prejudice
3
5
Jane Eyre
5
6
Catch 22
0
8
Slaughterhouse Five
4
As you can see, the book The Great Gatsby has been removed from the inventory. You should note, however, that a MERGE statement can include at most only two WHEN MATCHED clauses. Whenever you do include two of these clauses, the first clause must include the AND keyword, followed by a search condition, as I’ve done here. The second WHEN MATCHED clause is then applied only if the first one is not.
NOTE: The examples in this article are independent of one another. That is, for each example you run, you should first rerun the table creation script if you want your results to match the ones shown here.
Implementing the WHEN NOT MATCHED [BY TARGET] Clause
The next clause in the MERGE statement we’ll review is WHEN NOT MATCHED [BY TARGET]. (The BY TARGET keywords are optional.) You should use this clause to insert new rows into the target table. The rows you insert into the table are those rows in the source table for which there are no matching rows in the target. For example, the BookOrder table contains a row for Gone with the Wind. However, the BookInventory table does not contain this book. The following example demonstrates how to include a WHEN NOT MATCHED clause in your MERGE statement that adds Gone with the Wind to your target table:
MERGE BookInventory bi
USING BookOrder bo
ON bi.TitleID = bo.TitleID
WHEN MATCHED AND
bi.Quantity + bo.Quantity = 0 THEN
DELETE
WHEN MATCHED THEN
UPDATE
SET bi.Quantity = bi.Quantity + bo.Quantity
WHEN NOT MATCHED BY TARGET THEN
INSERT (TitleID, Title, Quantity)
VALUES (bo.TitleID, bo.Title,bo.Quantity);
SELECT * FROM BookInventory;
As the statement shows, I first specify the WHEN NOT MATCHED BY TARGET keywords, followed by the THEN keyword, and finally followed by the INSERT clause. The INSERT clause has two parts: the INSERT subclause and the VALUES subclause. You specify the target columns in the INSERT subclause and the source values in the VALUES clause. Notice that, for the VALUES subclause, I must qualify the column names with the table alias. The SELECT statement now returns the following results:
TitleID
Title
Quantity
1
The Catcher in the Rye
9
2
Pride and Prejudice
3
4
Gone with the Wind
4
5
Jane Eyre
5
6
Catch 22
0
7
Age of Innocence
8
8
Slaughterhouse Five
4
Two new rows have been added to the BookInventory table: one for Gone with the Wind and one for Age of Innocence. Because the books existed in the source table, but not in the target table, they were inserted into the BookInventory table.
Implementing the WHEN NOT MATCHED BY SOURCE Clause
As you’ll recall from the discussion about the WHEN MATCHED clause, you can use that clause to delete rows from the target table. However, you can delete a row that matches a row in the source table. But suppose you want to delete a row from the target table that does not match a row in the source table.
For example, one of the rows originally inserted into the BookInventory table is for the book Catch 22. The Quality value for that book was never updated because no order was placed for the book, that is, the book was never added to the BookOrder table. Because there are no copies of that book in stock, you might decide to delete that book from the target table. To delete a row that does not match a row in the source table, you must use the WHEN NOT MATCHED BY SOURCE clause.
NOTE: Like the WHEN MATCHED clause, you can include up to two WHEN NOT MATCHED BY SOURCE clauses in your MERGE statement. If you include two, the first clause must include the AND keyword followed by a search condition.
The following example includes a WHEN NOT MATCHED BY SOURCE clause that specifies that any rows with a quantity of 0 that do not match the source should be deleted:
MERGE BookInventory bi
USING BookOrder bo
ON bi.TitleID = bo.TitleID
WHEN MATCHED AND
bi.Quantity + bo.Quantity = 0 THEN
DELETE
WHEN MATCHED THEN
UPDATE
SET bi.Quantity = bi.Quantity + bo.Quantity
WHEN NOT MATCHED BY TARGET THEN
INSERT (TitleID, Title, Quantity)
VALUES (bo.TitleID, bo.Title,bo.Quantity)
WHEN NOT MATCHED BY SOURCE
AND bi.Quantity = 0 THEN
DELETE;
SELECT * FROM BookInventory;
After I specified the WHEN NOT MATCHED BY SOURCE keywords, I specified AND followed by a search condition (bi.Quantity = 0). I then added the THEN keyword, and next the DELETE keyword. The results returned by the SELECT statement are shown in the following table:
TitleID
Title
Quantity
1
The Catcher in the Rye
9
2
Pride and Prejudice
3
4
Gone with the Wind
4
5
Jane Eyre
5
7
Age of Innocence
8
8
Slaughterhouse Five
4
As you can see, the BookInventory table no longer includes the row for Catch 22. And because the three MERGE clauses have been used together, the BookInventory now reflects the exact number of books that are currently on hand, and no books are included that are not in stock.
Implementing the OUTPUT Clause
When SQL Server 2005 was released, it included support for the OUTPUT clause in several data modification language (DML) statements. The OUTPUT clause is also available in the MERGE statement. The OUTPUT clause returns a copy of the data that you’ve inserted into or deleted from your tables. When used with a MERGE statement, the clause provides you with a powerful tool for capturing the modified data for archiving, messaging, or application purposes.
NOTE: To learn more about the OUTPUT clause, see the article “Implementing the OUTPUT Clause in SQL Server 2008” (http://www.simple-talk.com/sql/learn-sql-server/implementing-the-output-clause-in-sql-server-2008/).
In the following example, I use an OUTPUT clause to pass the outputted data to a variable named @MergeOutput:
DECLARE @MergeOutput TABLE
(
ActionType NVARCHAR(10),
DelTitleID INT,
InsTitleID INT,
DelTitle NVARCHAR(50),
InsTitle NVARCHAR(50),
DelQuantity INT,
InsQuantity INT
);
MERGE BookInventory bi
USING BookOrder bo
ON bi.TitleID = bo.TitleID
WHEN MATCHED AND
bi.Quantity + bo.Quantity = 0 THEN
DELETE
WHEN MATCHED THEN
UPDATE
SET bi.Quantity = bi.Quantity + bo.Quantity
WHEN NOT MATCHED BY TARGET THEN
INSERT (TitleID, Title, Quantity)
VALUES (bo.TitleID, bo.Title,bo.Quantity)
WHEN NOT MATCHED BY SOURCE
AND bi.Quantity = 0 THEN
DELETE
OUTPUT
$action,
DELETED.TitleID,
INSERTED.TitleID,
DELETED.Title,
INSERTED.Title,
DELETED.Quantity,
INSERTED.Quantity
INTO @MergeOutput;
SELECT * FROM BookInventory;
SELECT * FROM @MergeOutput;
Notice that I first declare the @MergeOutput table variable. In the variable, I include a column for the action type plus three additional sets of column. Each set corresponds to the columns in the target table and includes a column that shows the deleted data and one that shows the inserted data. For example, the DelTitleID and InsTitleID columns correspond to the deleted and inserted values, respectively, in the target table.
The OUTPUT clause itself first specifies the built-in $action variable, which returns one of three nvarchar(10) values—INSERT, UPDATE, or DELETE. The variable is available only to the MERGE statement. I follow the variable with a set of column prefixes (DELETED and INSERTED) for each column in the target table. The column prefixes are followed by the name of the column they’re related to. For example, I include DELETED.TitleID and INSERTED.TitleID for the TitleID column in the target table. After I specify the column prefixes, I then include an INTO subclause, which specifies that the outputted values should be saved to the @MergeOutput variable.
After the OUTPUT clause, which is the last clause in my MERGE statement, I added a SELECT statement to retrieve the updated contents of the BookInventory table, as I’ve done in previous examples. The SELECT statement returns the following results:
TitleID
Title
Quantity
1
The Catcher in the Rye
9
2
Pride and Prejudice
3
4
Gone with the Wind
4
5
Jane Eyre
5
7
Age of Innocence
8
8
Slaughterhouse Five
4
Notice that I also include a second SELECT statement in my example above. This statement retrieves the contents of the @MergeOutput variable. The query results are shown in the following table:
ActionType
DelTitleID
InsTitleID
DelTitle
InsTitle
DelQuantity
InsQuantity
UPDATE
1
1
The Catcher in the Rye
The Catcher in the Rye
6
9
DELETE
3
NULL
The Great Gatsby
NULL
0
NULL
INSERT
NULL
4
NULL
Gone with the Wind
NULL
4
UPDATE
5
5
Jane Eyre
Jane Eyre
0
5
DELETE
6
NULL
Catch 22
NULL
0
NULL
INSERT
NULL
7
NULL
Age of Innocence
NULL
8
The results show any actions that were taken on the target table. For instance, the row for the Great Gatsby indicates that this row was deleted from the BookInventory table. The DelTitleID, DelTitle, and DelQuantity columns show the values that were deleted. However, the InsTitleID, InsTitle, and InsQuantity columns all show null values. That’s because no data was inserted into the target table for that row. If a row had been inserted, the InsTitleID, InsTitle, and InsQuantity columns will show the inserted values, but the DelTitleID, DelTitle, and DelQuantity columns will show null values because nothing is deleted when an insert is performed. Any updated rows will have values in all columns.
Subscribe to:
Posts (Atom)