http://reddymsbitools.blogspot.com

Thursday, 28 October 2010

SSIS Incremental Loads

SSIS Design Pattern - Incremental Loads
Introduction

Loading data from a data source to SQL Server is a common task. It's used in Data Warehousing, but increasingly data is being staged in SQL Server for non-Business-Intelligence purposes.

Maintaining data integrity is key when loading data into any database. A common way of accomplishing this is to truncate the destination and reload from the source. While this method ensures data integrity, it also loads a lot of data that was just deleted.

Incremental loads are a faster and use less server resources. Only new or updated data is touched in an incremental load.

When To Use Incremental Loads

Use incremental loads whenever you need to load data from a data source to SQL Server.

Incremental loads are the same regardless of which database platform or ETL tool you use. You need to detect new and updated rows - and separate these from the unchanged rows.

Incremental Loads in Transact-SQL

I will start by demonstrating this with T-SQL:

0. (Optional, but recommended) Create two databases: a source and destination database for this demonstration:


CREATE DATABASE [SSISIncrementalLoad_Source]

CREATE DATABASE [SSISIncrementalLoad_Dest]
1. Create a source named tblSource with the columns ColID, ColA, ColB, and ColC; make ColID is a primary unique key:

USE SSISIncrementalLoad_Source
GO
CREATE TABLE dbo.tblSource
(ColID int NOT NULL
,ColA varchar(10) NULL
,ColB datetime NULL constraint df_ColB default (getDate())
,ColC int NULL
,constraint PK_tblSource primary key clustered (ColID))

2. Create a Destination table named tblDest with the columns ColID, ColA, ColB, ColC:

USE SSISIncrementalLoad_Dest
GO
CREATE TABLE dbo.tblDest
(ColID int NOT NULL
,ColA varchar(10) NULL
,ColB datetime NULL
,ColC int NULL)

3. Let's load some test data into both tables for demonstration purposes:

USE SSISIncrementalLoad_Source
GO

-- insert an "unchanged" row
INSERT INTO dbo.tblSource
(ColID,ColA,ColB,ColC)
VALUES(0, 'A', '1/1/2007 12:01 AM', -1)

-- insert a "changed" row
INSERT INTO dbo.tblSource
(ColID,ColA,ColB,ColC)
VALUES(1, 'B', '1/1/2007 12:02 AM', -2)

-- insert a "new" row
INSERT INTO dbo.tblSource
(ColID,ColA,ColB,ColC)
VALUES(2, 'N', '1/1/2007 12:03 AM', -3)

USE SSISIncrementalLoad_Dest
GO

-- insert an "unchanged" row
INSERT INTO dbo.tblDest
(ColID,ColA,ColB,ColC)
VALUES(0, 'A', '1/1/2007 12:01 AM', -1)

-- insert a "changed" row
INSERT INTO dbo.tblDest
(ColID,ColA,ColB,ColC)
VALUES(1, 'C', '1/1/2007 12:02 AM', -2)

4. You can view new rows with the following query:

SELECT s.ColID, s.ColA, s.ColB, s.ColC
FROM SSISIncrementalLoad_Source.dbo.tblSource s
LEFT JOIN SSISIncrementalLoad_Dest.dbo.tblDest d ON d.ColID = s.ColID
WHERE d.ColID IS NULL

This should return the "new" row - the one loaded earlier with ColID = 2 and ColA = 'N'. Why? The LEFT JOIN and WHERE clauses are the key. Left Joins return all rows on the left side of the join clause (SSISIncrementalLoad_Source.dbo.tblSource in this case) whether there's a match on the right side of the join clause (SSISIncrementalLoad_Dest.dbo.tblDest in this case) or not. If there is no match on the right side, NULLs are returned. This is why the WHERE clause works: it goes after rows where the destination ColID is NULL. These rows have no match in the LEFT JOIN, therefore they must be new.

This is only an example. You occasionally find database schemas that are this easy to load. Occasionally. Most of the time you have to include several columns in the JOIN ON clause to isolate truly new rows. Sometimes you have to add conditions in the WHERE clause to refine the definition of truly new rows.

Incrementally load the row ("rows" in practice) with the following T-SQL statement:

INSERT INTO SSISIncrementalLoad_Dest.dbo.tblDest
(ColID, ColA, ColB, ColC)
SELECT s.ColID, s.ColA, s.ColB, s.ColC
FROM SSISIncrementalLoad_Source.dbo.tblSource s
LEFT JOIN SSISIncrementalLoad_Dest.dbo.tblDest d ON d.ColID = s.ColID
WHERE d.ColID IS NULL

5. There are many ways by which people try to isolate changed rows. The only sure-fire way to accomplish it is to compare each field. View changed rows with the following T-SQL statement:

SELECT d.ColID, d.ColA, d.ColB, d.ColC
FROM SSISIncrementalLoad_Dest.dbo.tblDest d
INNER JOIN SSISIncrementalLoad_Source.dbo.tblSource s ON s.ColID = d.ColID
WHERE (
(d.ColA != s.ColA)
OR (d.ColB != s.ColB)
OR (d.ColC != s.ColC)
)

This should return the "changed" row we loaded earlier with ColID = 1 and ColA = 'C'. Why? The INNER JOIN and WHERE clauses are to blame - again. The INNER JOIN goes after rows with matching ColID's because of the JOIN ON clause. The WHERE clause refines the resultset, returning only rows where the ColA's, ColB's, or ColC's don't match and the ColID's match. This is important. If there's a difference in any or some or all the rows (except ColID), we want to update it.

Extract-Transform-Load (ETL) theory has a lot to say about when and how to update changed data. You will want to pick up a good book on the topic to learn more about the variations.
To update the data in our destination, use the following T-SQL:

UPDATE d
SET
d.ColA = s.ColA
,d.ColB = s.ColB
,d.ColC = s.ColC
FROM SSISIncrementalLoad_Dest.dbo.tblDest d
INNER JOIN SSISIncrementalLoad_Source.dbo.tblSource s ON s.ColID = d.ColID
WHERE (
(d.ColA != s.ColA)
OR (d.ColB != s.ColB)
OR (d.ColC != s.ColC)
)

Incremental Loads in SSIS

Let's take a look at how you can accomplish this in SSIS using the Lookup Transformation (for the join functionality) combined with the Conditional Split (for the WHERE clause conditions) transformations.

Before we begin, let's reset our database tables to their original state using the following query:

USE SSISIncrementalLoad_Source
GO

TRUNCATE TABLE dbo.tblSource

-- insert an "unchanged" row
INSERT INTO dbo.tblSource
(ColID,ColA,ColB,ColC)
VALUES(0, 'A', '1/1/2007 12:01 AM', -1)

-- insert a "changed" row
INSERT INTO dbo.tblSource
(ColID,ColA,ColB,ColC)
VALUES(1, 'B', '1/1/2007 12:02 AM', -2)

-- insert a "new" row
INSERT INTO dbo.tblSource
(ColID,ColA,ColB,ColC)
VALUES(2, 'N', '1/1/2007 12:03 AM', -3)

USE SSISIncrementalLoad_Dest
GO

TRUNCATE TABLE dbo.tblDest

-- insert an "unchanged" row
INSERT INTO dbo.tblDest
(ColID,ColA,ColB,ColC)
VALUES(0, 'A', '1/1/2007 12:01 AM', -1)

-- insert a "changed" row
INSERT INTO dbo.tblDest
(ColID,ColA,ColB,ColC)
VALUES(1, 'C', '1/1/2007 12:02 AM', -2)

Next, create a new project using Business Intelligence Development Studio (BIDS). Name the project SSISIncrementalLoad:

Once the project loads, open Solution Explorer and rename Package1.dtsx to SSISIncrementalLoad.dtsx:

When prompted to rename the package object, click the Yes button. From the toolbox, drag a Data Flow onto the Control Flow canvas:



Double-click the Data Flow task to edit it. From the toolbox, drag and drop an OLE DB Source onto the Data Flow canvas:



Double-click the OLE DB Source connection adapter to edit it:



Click the New button beside the OLE DB Connection Manager dropdown:

Click the New button here to create a new Data Connection:

Enter or select your server name. Connect to the SSISIncrementalLoad_Source database you created earlier. Click the OK button to return to the Connection Manager configuration dialog. Click the OK button to accept your newly created Data Connection as the Connection Manager you wish to define. Select "dbo.tblSource" from the Table dropdown:



Click the OK button to complete defining the OLE DB Source Adapter.

Drag and drop a Lookup Transformation from the toolbox onto the Data Flow canvas. Connect the OLE DB connection adapter to the Lookup transformation by clicking on the OLE DB Source and dragging the green arrow over the Lookup and dropping it. Right-click the Lookup transformation and click Edit (or double-click the Lookup transformation) to edit:



When the editor opens, click the New button beside the OLE DB Connection Manager dropdown (as you did earlier for the OLE DB Source Adapter). Define a new Data Connection - this time to the SSISIncrementalLoad_Dest database. After setting up the new Data Connection and Connection Manager, configure the Lookup transformation to connect to "dbo.tblDest":



Click the Columns tab. On the left side are the columns currently in the SSIS data flow pipeline (from SSISIncrementalLoad_Source.dbo.tblSource). On the right side are columns available from the Lookup destination you just configured (from SSISIncrementalLoad_Dest.dbo.tblDest). Follow the following steps:

1. We'll need all the rows returned from the destination table, so check all the checkboxes beside the rows in the destination. We need these rows for our WHERE clauses and for our JOIN ON clauses.

2. We do not want to map all the rows between the source and destination - we only want to map the columns named ColID between the database tables. The Mappings drawn between the Available Input Columns and Available Lookup Columns define the JOIN ON clause. Multi-select the Mappings between ColA, ColB, and ColC by clicking on them while holding the Ctrl key. Right-click any of them and click "Delete Selected Mappings" to delete these columns from our JOIN ON clause.

3. Add the text "Dest_" to each column's Output Alias. These rows are being appended to the data flow pipeline. This is so we can distinguish between Source and Destination rows farther down the pipeline:

Next we need to modify our Lookup transformation behavior. By default, the Lookup operates as an INNER JOIN - but we need a LEFT (OUTER) JOIN. Click the "Configure Error Output" button to open the "Configure Error Output" screen. On the "Lookup Output" row, change the Error column from "Fail component" to "Ignore failure". This tells the Lookup transformation "If you don't find an INNER JOIN match in the destination table for the Source table's ColID value, don't fail." - which also effectively tells the Lookup "Don't act like an INNER JOIN, behave like a LEFT JOIN":

Click OK to complete the Lookup transformation configuration.

From the toolbox, drag and drop a Conditional Split Transformation onto the Data Flow canvas. Connect the Lookup to the Conditional Split as shown. Right-click the Conditional Split and click Edit to open the Conditional Split Editor:



Expand the NULL Functions folder in the upper right of the Conditional Split Transformation Editor. Expand the Columns folder in the upper left side of the Conditional Split Transformation Editor. Click in the "Output Name" column and enter "New Rows" as the name of the first output. From the NULL Functions folder, drag and drop the "ISNULL( <> )" function to the Condition column of the New Rows condition:

Next, drag Dest_ColID from the columns folder and drop it onto the "<>" text in the Condition column. "New Rows" should now be defined by the condition "ISNULL( [Dest_ColID] )". This defines the WHERE clause for new rows - setting it to "WHERE Dest_ColID Is NULL".

Type "Changed Rows" into a second Output Name column. Add the expression "(ColA != Dest_ColA) || (ColB != Dest_ColB) || (ColC != Dest_ColC)" to the Condition column for the Changed Rows output. This defines our WHERE clause for detecting changed rows - setting it to "WHERE ((Dest_ColA != ColA) OR (Dest_ColB != ColB) OR (Dest_ColC != ColC))". Note "||" is used to convey "OR" in SSIS Expressions:



Change the "Default output name" from "Conditional Split Default Output" to "Unchanged Rows":

Click the OK button to complete configuration of the Conditional Split transformation.

Drag and drop an OLE DB Destination connection adapter and an OLE DB Command transformation onto the Data Flow canvas. Click on the Conditional Split and connect it to the OLE DB Destination. A dialog will display prompting you to select a Conditional Split Output (those outputs you defined in the last step). Select the New Rows output:

Next connect the OLE DB Command transformation to the Conditional Split's "Changed Rows" output:



Your Data Flow canvas should appear similar to the following:

Configure the OLE DB Destination by aiming at the SSISIncrementalLoad_Dest.dbo.tblDest table:



Click the Mappings item in the list to the left. Make sure the ColID, ColA, ColB, and ColC source columns are mapped to their matching destination columns (aren't you glad we prepended "Dest_" to the destination columns?):



Click the OK button to complete configuring the OLE DB Destination connection adapter.

Double-click the OLE DB Command to open the "Advanced Editor for OLE DB Command" dialog. Set the Connection Manager column to your SSISIncrementalLoad_Dest connection manager:



Click on the "Component Properties" tab. Click the elipsis (button with "...") beside the SQLCommand property:

The String Value Editor displays. Enter the following parameterized T-SQL statement into the String Value textbox:

UPDATE dbo.tblDest
SET
ColA = ?
,ColB = ?
,ColC = ?
WHERE ColID = ?



The question marks in the previous parameterized T-SQL statement map by ordinal to columns named "Param_0" through "Param_3". Map them as shown below - effectively altering the UPDATE statement for each row to read:

UPDATE SSISIncrementalLoad_Dest.dbo.tblDest
SET
ColA = SSISIncrementalLoad_Source.dbo.ColA
,ColB = SSISIncrementalLoad_Source.dbo.ColB
,ColC = SSISIncrementalLoad_Source.dbo.ColC
WHERE ColID = SSISIncrementalLoad_Source.dbo.ColID

Note the query is executed on a row-by-row basis. For performance with large amounts of data, you will want to employ set-based updates instead.

Click the OK button when mapping is completed.

Your Data Flow canvas should look like that pictured below:



If you execute the package with debugging (press F5), the package should succeed and appear as shown here:



Note one row takes the "New Rows" output from the Conditional Split, and one row takes the "Changed Rows" output from the Conditional Split transformation. Although not visible, our third source row doesn't change, and would be sent to the "Unchanged Rows" output - which is simply the default Conditional Split output renamed. Any row that doesn't meet any of the predefined conditions in the Conditional Split is sent to the default output.

SQL Server Interview Questions 4

61. Can you link only other SQL Servers or any database servers such as Oracle?
We can link any server provided we have the OLE-DB provider from Microsoft to allow a link. For Oracle we have a OLE-DB provider for oracle that microsoft provides to add it as a linked server to the sql server group.
62. Which stored procedure will you be running to add a linked server?
sp_addlinkedserver, sp_addlinkedsrvlogin
63. What are the OS services that the SQL Server installation adds?
MS SQL SERVER SERVICE, SQL AGENT SERVICE, DTC (Distribution transac co-ordinator)
64. Can you explain the role of each service?
SQL SERVER - is for running the databases SQL AGENT - is for automation such as Jobs, DB Maintanance, Backups DTC - Is for linking and connecting to other SQL Servers
65. How do you troubleshoot SQL Server if its running very slow?
First check the processor and memory usage to see that processor is not above 80% utilization and memory not above 40-45% utilization then check the disk utilization using Performance Monitor, Secondly, use SQL Profiler to check for the users and current SQL activities and jobs running which might be a problem. Third would be to run UPDATE_STATISTICS command to update the indexes
66. Due to N/W or Security issues client is not able to connect to server or vice versa. How do you troubleshoot?
First I will look to ensure that port settings are proper on server and client Network utility for connections. ODBC is properly configured at client end for connection; Makepipe & readpipe are utilities to check for connection. Makepipe is run on Server and readpipe on client to check for any connection issues.
67. What are the authentication modes in SQL Server?
Windows mode and mixed mode (SQL & Windows).
68. Where do you think the users names and passwords will be stored in sql server?
They get stored in master db in the sysxlogins table.
69. What is log shipping? Can we do logshipping with SQL Server 7.0?
Logshipping is a new feature of SQL Server 2000. We should have two SQL Server - Enterprise Editions. From Enterprise Manager we can configure the logshipping. In logshipping the transactional log file from one server is automatically updated into the backup database on the other server. If one server fails, the other server will have the same db and we can use this as the DR (disaster recovery) plan.
70. SQL Server crashed and you are rebuilding the databases including the master database what procedure to you follow?
For restoring the master db we have to stop the SQL Server first and then from command line we can type SQLSERVER –m which will basically bring it into the maintenance mode after which we can restore the master db.
71. What is BCP? When do we use it?
BulkCopy is a tool used to copy huge amount of data from tables and views. But it won’t copy the structures of the same.
72. What should we do to copy the tables, schema and views from one SQL Server to another?
We have to write some DTS packages for it.

SQL Server Interview Questions 3

41. What are triggers?
Triggers are special kind of stored procedures that get executed automatically when an INSERT, UPDATE or DELETE operation takes place on a table.
42. How many triggers you can have on a table?
In SQL Server 6.5 you could define only 3 triggers per table, one for INSERT, one for UPDATE and one for DELETE. From SQL Server 7.0 onwards, this restriction is gone, and you could create multiple triggers per each action.
43. How to invoke a trigger on demand?
In SQL Server 7.0 there's no way to control the order in which the triggers fire. In SQL Server 2000 you could specify which trigger fires first or fires last using sp_settriggerorder Triggers can't be invoked on demand. They get triggered only when an associated action (INSERT, UPDATE, DELETE) happens on the table on which they are defined.
44. What are the uses of Triggers?
Triggers are generally used to implement business rules, auditing. Triggers can also be used to extend the referential integrity checks, but wherever possible, use constraints for this purpose, instead of triggers, as constraints are much faster. Till SQL Server 7.0, triggers fire only after the data modification operation happens. So in a way, they are called post triggers. But in SQL Server 2000 you could create pre triggers also. Search SQL Server 2000 books online for INSTEAD OF triggers.
45. What is a self join? Explain it with an example.
Self join is just like any other join, except that two instances of the same table will be joined in the query. Here is an example: Employees table which contains rows for normal employees as well as managers. So, to find out the managers of all the employees, you need a self join.
46. What is normalization?
The Process of organizing relational data into tables is actually referred to as normalization.
47. What is a Stored Procedure?
Its nothing but a set of T-SQL statements combined to perform a single task of several tasks. Its basically like a Macro so when you invoke the Stored procedure, you actually run a set of statements.
48. Can you give an example of Stored Procedure?
sp_helpdb , sp_who2, sp_renamedb are a set of system defined stored procedures. We can also have user defined stored procedures which can be called in similar way.
49. What is the basic difference between clustered and a non-clustered index?
The difference is that, Clustered index is unique for any given table and we can have only one clustered index on a table. The leaf level of a clustered index is the actual data and the data is resorted in case of clustered index. Whereas in case of non-clustered index the leaf level is actually a pointer to the data in rows so we can have as many non-clustered indexes as we can on the db.
50. When do we use the UPDATE_STATISTICS command?
This command is basically used when we do a large processing of data. If we do a large amount of deletions any modification or Bulk Copy into the tables, we need to basically update the indexes to take these changes into account. UPDATE_STATISTICS updates the indexes on these tables accordingly.
51. Which TCP/IP port does SQL Server run on?
SQL Server runs on port 1433 but we can also change it for better security.
52. From where can you change the default port?
From the Network Utility TCP/IP properties –> Port number.both on client and the server.
53. Can we use Truncate command on a table which is referenced by FOREIGN KEY?
No. We cannot use Truncate command on a table with Foreign Key because of referential integrity.
54. What is the use of DBCC commands?
DBCC stands for database consistency checker. We use these commands to check the consistency of the databases, i.e., maintenance, validation task and status checks.
55. What command do we use to rename a db?
sp_renamedb ‘oldname’ , ‘newname’
56. Sometimes sp_reanmedb may not work you know because if some one is using the db it will not accept this command so what do you think you can do in such cases?
In such cases we can first bring to db to single user using sp_dboptions and then we can rename that db and then we can rerun the sp_dboptions command to remove the single user mode.
57. What is the difference between a HAVING CLAUSE and a WHERE CLAUSE?
Having Clause is basically used only with the GROUP BY function in a query. WHERE Clause is applied to each row before they are part of the GROUP BY function in a query.
58. What do you mean by COLLATION?
Collation is basically the sort order. There are three types of sort order Dictionary case sensitive, Dictonary - case insensitive and Binary.
59. When do you use SQL Profiler?
SQL Profiler utility allows us to basically track connections to the SQL Server and also determine activities such as which SQL Scripts are running, failed jobs etc..
60. What is a Linked Server?
Linked Servers is a concept in SQL Server by which we can add other SQL Server to a Group and query both the SQL Server dbs using T-SQL Statements.

SQL Server Interview Questions 2

21. What is blocking and how would you troubleshoot it?
Blocking happens when one connection from an application holds a lock and a second connection requires a conflicting lock type. This forces the second connection to wait, blocked on the first.
22. How to restart SQL Server in single user mode?
SQL Server can be started from command line, using the SQLSERVR.EXE. This EXE has some very important parameters with which a DBA should be familiar with. -m is used for starting SQL Server in single user mode and -f is used to start the SQL Server in minimal confuguration mode.
23. what are the DBCC commands that you commonly use for database maintenance?
DBCC CHECKDB - Ensures that tables in the db and the indexes are correctly linked.and DBCC CHECKALLOC - To check that all pages in a db are correctly allocated. DBCC SQLPERF - It gives report on current usage of transaction log in percentage. DBCC CHECKFILEGROUP - Checks all tables file group for any damage.
24. What are statistics?
Statistics determine the selectivity of the indexes. If an indexed column has unique values then the selectivity of that index is more, as opposed to an index with non-unique values. Query optimizer uses these indexes in determining whether to choose an index or not while executing a query.
25. Under what circumstances they go out of date, how do you update statistics?
Some situations under which you should update statistics:
1) If there is significant change in the key values in the index
2) If a large amount of data in an indexed column has been added, changed, or removed (that is, if the distribution of key values has changed), or the table has been truncated using the TRUNCATE TABLE statement and then repopulated
3) Database is upgraded from a previous version Look up SQL Server books online for the following commands: UPDATE STATISTICS, STATS_DATE, DBCC SHOW_STATISTICS, CREATE STATISTICS, DROP STATISTICS, sp_autostats, sp_createstats, sp_updatestats
26. What are the different ways of moving data/databases between servers and databases in SQL Server?
BACKUP/RESTORE, dettaching and attaching databases, replication, DTS, BCP, logshipping, INSERT...SELECT, SELECT...INTO, creating INSERT scripts to generate data.
27. Explian different types of BACKUPs avaialabe in SQL Server?
Full database backup, differential database backup, transaction log backup, filegroup backup.
28. What is database replicaion?
Replication is the process of copying/moving data between databases on the same or different servers.
29. What are the different types of replication you can set up in SQL Server?
SQL Server supports the following types of replication scenarios:
* Snapshot replication
* Transactional replication (with immediate updating subscribers, with queued updating subscribers)
* Merge replication
30. How to determine the service pack currently installed on SQL Server?
The global variable @@Version stores the build number of the sqlservr.exe, which is used to determine the service pack installed.
31. What are cursors?
Cursors allow row-by-row prcessing of the resultsets.
32. Explain different types of cursors.
Types of cursors:
Static, Dynamic, Forward-only, Keyset-driven.
33. What are the disadvantages of cursors?
Disadvantages of cursors:
Each time you fetch a row from the cursor, it results in a network roundtrip, where as a normal SELECT query makes only one rowundtrip, however large the resultset is. Cursors are also costly because they require more resources and temporary storage (results in more IO operations). Furthere, there are restrictions on the SELECT statements that can be used with some types of cursors.
34. How can you avoid cursors?
Most of the times, set based operations can be used instead of cursors. Another situation in which developers tend to use cursors: You need to call a stored procedure when a column in a particular row meets certain condition. You don't have to use cursors for this. This can be achieved using WHILE loop, as long as there is a unique key to identify each row.
35. What is a join?
Joins are used in queries to explain how different tables are related. Joins also let you select data from a table depending upon data from another table.
36. Explain different types of joins.
Types of joins:
INNER JOINs, OUTER JOINs, CROSS JOINs. OUTER JOINs are further classified as LEFT OUTER JOINS, RIGHT OUTER JOINS and FULL OUTER JOINS.
37. Can you have a nested transaction?
Yes, very much.
38. What is an extended stored procedure?
An extended stored procedure is a function within a DLL (written in a programming language like C, C++ using Open Data Services (ODS) API) that can be called from T-SQL, just the way we call normal stored procedures using the EXEC statement.
39. Can you instantiate a COM object by using T-SQL?
Yes, you can instantiate a COM (written in languages like VB, VC++) object from T-SQL by using sp_OACreate stored procedure.
40. What is the system function to get the current user's user id?
USER_ID(). Some more system functions: USER_NAME(), SYSTEM_USER, SESSION_USER, CURRENT_USER, USER, SUSER_SID(), HOST_NAME().

SQL Server Interview Questions 1

1. What is denormalization and when would you go for it?
As the name indicates, denormalization is the reverse process of normalization. It's the controlled introduction of redundancy in to the database design. It helps improve the query performance as the number of joins could be reduced.
2. How do you implement one-to-one, one-to-many and many-to-many relationships while designing tables?
One-to-One relationship can be implemented as a single table and rarely as two tables with primary and foreign key relationships. One-to-Many relationships are implemented by splitting the data into two tables with primary key and foreign key relationships. Many-to-Many relationships are implemented using a junction table with the keys from both the tables forming the composite primary key of the junction table.
3. What's the difference between a primary key and a unique key?
Both primary key and unique enforce uniqueness of the column on which they are defined. But by default primary key creates a clustered index on the column, where are unique creates a nonclustered index by default. Another major difference is that, primary key doesn't allow NULLs, but unique key allows one NULL only.
4. What are user defined datatypes and when you should go for them?
User defined datatypes let you extend the base SQL Server datatypes by providing a descriptive name, and format to the database. Take for example, in your database, there is a column called ZIP_Code which appears in many tables. In all these tables it should be varchar(6). In this case you could create a user defined datatype called ZIP_Code_Type of varchar(6) and use it across all your tables.
5. What is bit datatype and what's the information that can be stored inside a bit column?
Bit datatype is used to store boolean information like 1 or 0 (true or false). Untill SQL Server 6.5 bit datatype could hold either a 1 or 0 and there was no support for NULL. But from SQL Server 7.0 onwards, bit datatype can represent a third state, which is NULL.
6. Define candidate key, alternate key, composite key.
A candidate key is one that can identify each row of a table uniquely. Generally a candidate key becomes the primary key of the table. If the table has more than one candidate key, one of them will become the primary key, and the rest are called alternate keys. A key formed by combining at least two or more columns is called composite key.
7. What are defaults? Is there a column to which a default can't be bound?
A default is a value that will be used by a column, if no value is supplied to that column while inserting data. IDENTITY columns and timestamp columns can't have defaults bound to them. See CREATE DEFUALT in books online.
8. What is a transaction and what are ACID properties?
A transaction is a logical unit of work in which, all the steps must be performed or none. ACID stands for Atomicity, Consistency, Isolation, Durability. These are the properties of a transaction. For more information and explanation of these properties, see SQL Server books online or any RDBMS fundamentals text book.
9. Explain different isolation levels An isolation level determines the degree of isolation of data between concurrent transactions.
The default SQL Server isolation level is Read Committed. Here are the other isolation levels (in the ascending order of isolation): Read Uncommitted, Read Committed, Repeatable Read, Serializable.
10. How does .NET and SQL SERVER thread is work?
There are two types of threading pre-emptive and Non-preemptive but Sql Server support Non-preemptive and .NET thread model is different. Because Sql have to handle thread in different way for SQLCLR this different thread are known as Tasking of Threads . In this thread there is a switch between SQLCLR and SQL SERVER threads .SQL SERVER uses blocking points for transition to happen between SQLCLR and SQL SERVER threads.
11. What's the maximum size of a row?
8060 bytes.
12. What is lock escalation?
Strong names are required to store shared assemblies in the global assembly cache (GAC). This is because the GAC allows multiple versions of the same assembly to reside on your system simultaneously, so that each application can find and use its own version of your assembly. This helps avoid DLL Hell, where applications that may be compiled to different versions of your assembly could potentially break because they are all forced to use the same version of your assembly. Another reason to use strong names is to make it difficult for hackers to spoof your assembly, in other words, replace or inject your assembly with a virus or malicious code.
13. What's the difference between DELETE TABLE and TRUNCATE TABLE commands?
DELETE TABLE is a logged operation, so the deletion of each row gets logged in the transaction log, which makes it slow. TRUNCATE TABLE also deletes all the rows in a table, but it won't log the deletion of each row, instead it logs the deallocation of the data pages of the table, which makes it faster. Of course, TRUNCATE TABLE can be rolled back.
14. What are constraints? Explain different types of constraints.
Constraints enable the RDBMS enforce the integrity of the database automatically, without needing you to create triggers, rule or defaults. Types of constraints: NOT NULL, CHECK, UNIQUE, PRIMARY KEY, FOREIGN KEY
15. What is an index?
Indexes in SQL Server are similar to the indexes in books. They help SQL Server retrieve the data quicker.
16. What are the types of indexes?
Indexes are of two types. Clustered indexes and non-clustered indexes. When you craete a clustered index on a table, all the rows in the table are stored in the order of the clustered index key. So, there can be only one clustered index per table. Non-clustered indexes have their own storage separate from the table data storage. Non-clustered indexes are stored as B-tree structures (so do clustered indexes), with the leaf level nodes having the index key and it's row locater. The row located could be the RID or the Clustered index key, depending up on the absence or presence of clustered index on the table.
17. I create a separate index on each column of a table. what are the advantages and disadvantages of this approach?
If you create an index on each column of a table, it improves the query performance, as the query optimizer can choose from all the existing indexes to come up with an efficient execution plan. At the same t ime, data modification operations (such as INSERT, UPDATE, DELETE) will become slow, as every time data changes in the table, all the indexes need to be updated. Another disadvantage is that, indexes need disk space, the more indexes you have, more disk space is used.
18. What is RAID and what are different types of RAID configurations?
RAID stands for Redundant Array of Inexpensive Disks, used to provide fault tolerance to database servers. There are six RAID levels 0 through 5 offering different levels of performance, fault tolerance.
19. What are the steps you will take, if you are tasked with securing an SQL Server?
Preferring NT authentication, using server, databse and application roles to control access to the data, securing the physical database files using NTFS permissions, using an unguessable SA password, restricting physical access to the SQL Server, renaming the Administrator account on the SQL Server computer, disabling the Guest account, enabling auditing, using multiprotocol encryption, setting up SSL, setting up firewalls, isolating SQL Server from the web server etc.
20. What is a deadlock and what is a live lock? How will you go about resolving deadlocks?
Deadlock is a situation when two processes, each having a lock on one piece of data, attempt to acquire a lock on the other's piece. Each process would wait indefinitely for the other to release the lock, unless one of the user processes is terminated. SQL Server detects deadlocks and terminates one user's process. A livelock is one, where a request for an exclusive lock is repeatedly denied because a series of overlapping shared locks keeps interfering. SQL Server detects the situation after four denials and refuses further shared locks. A livelock also occurs when read transactions monopolize a table or page, forcing a write transaction to wait indefinitely.

Configuring and Administering the SSRS Server

chapter summary

* A Reporting Services installation includes the report server catalog, which containsall the metadata necessary to run reporting services, including the encrypted reportdefinitions, encrypted data sources, and site and report security settings.
* The Reporting Services Configuration Manager tool helps in the setup of an SSRS instance after installation, including setup of virtual directories, service accounts, Report Server database access, and encryption keys.
* The RSConfig.exe command-line utility provides SSRS connection administration to the Report Server databases, ReportServer and ReportServerTempDB.
* The RSKeyMgmt.exe command-line utility helps manage encryption keys for backup, restore, scale-out instance management, and deletion of encrypted content.
* The RS.exe command-line utility provides a way to run SSRS script files at a command line.
* The SSRS encryption keys need to be backed up to an .snk file and stored securely to allow for a restore at a later point in the case of a sever failure or a reconfiguration of hardware.
* SSRS can be implemented in a scale-out architecture by sharing the report server catalog between multiple instances of the report server. The Reporting Services Configuration Manager tool can synchronize the encryption keys between the servers in order to access the shared catalog, or you can use the RSKeyMgmt.exe and RSConfig.exe command-line utilities to set up the scale-out environment.
* Reporting Services uses several configuration files for advanced settings and management. The main file is the RSReportServer.config file, which contains service settings, Report Manager settings, rendering device properties, delivery method settings, and so on.


Lesson 1: Installing and Confi guring Initial Instances of SSRS


1. Which of the tools and utilities described in this lesson can change or create the virtual directories for the report server and Report Manager after installation?

only Reporting Services Configuration Manager can enable and name the virtual directories for the report server and Report Manager.


2. What is the fi le name extension for an encryption key backup?

Encryption key backups have an .snk fi le name extension.


Lesson 2: Configuring Advanced SSRS Settings and Scale-Out Deployment


1. What are the three SSRS command-line utilities and their primary functions?

RSConfig.exe is used to defi ne the connection properties from the SSRS instance to the Report Server database; RSKeyMgmt.exe performs encryption key operations and scale-out deployment setup; RS.exe runs Report Server Script files that can perform report deployment and management operations.


2. If you want to disable one of the rendering options in the Save As drop-down list when viewing a report through Report Manager, where do you do that?

The visibility property for any of the rendering devices can be changed by modifying the RSReportServer.confi g fi le and locating the tag for the specifi c device.



Scaling out your SSRS Servers


Case scenario


As you begin the process of setting up a SQL Server 2008 BI environment, you are planning out your hardware architecture. Your solution will consist of an SSRS installation that will access online analytical processing (OLAP) data in SSAS built from a SQL Server data mart. Your manager has approved two blade servers for SSRS so that the service can be configured for scale-out deployment—that will help to balance the workload and will provide some load balancing. Additionally, your system administrator has set up Network Load Balancing (NLB) to distribute the report requests. How will you handle the following two setup requirements?

1. Your SSRS implementation requires that you set up a scale-out deployment of SSRS on the two allocated servers. You have already installed the first SSRS instance on one of the servers that points to a Report Server database on your database server. You are now ready to install an instance of SSRS on the second computer. How do you use command-line utilities to install and configure this instance to be part of the SSRS scale-out deployment?
2. To prevent a single point of failure for SSRS, you also need to handle failures at the report server catalog database level. In the case of a server hardware failure, you do not want the SSRS implementation to also fail. Which is the best high-availability technology to use for this situation and why?

Answers


1. Because the first SSRS server has been installed and configured, your next task is to install an unconfigured instance of SSRS, without having the installation use the default configuration, on the second server. You will then need to configure the virtual directories through Reporting Services Configuration Manager and set the service accounts. Before joining the new instance to the scale-out deployment, connect the SSRS instance to the same Report Server database by using RSConfig.exe with the /c parameter. At this point, you should run the RSKeyMgmt.exe tool with the /j command parameter to join an existing SSRS. You need to run this command statement on the server that is already configured and then reference the new instance that will join the existing scale-out deployment.


2. High availability at the database level can be handled by clustering, database mirroring, replication, or log shipping. However, the best choice is to use clustering. Clustering will prevent a server hardware failure from affecting the SSRS implementation, as the Database Engine will fail over to another available node in the cluster. In addition, this will be seamless to Reporting Services, as the database connection uses the virtual instance name of the database server, which also moves to the other server. With the other technologies, the SSRS implementation would need the catalog database connection to be repointed to the backup database from the mirroring, log shipping, or replication.

Scheduling and Securing Deployed Reports and Data Sources

Chapter summary

* Reporting Services has two main role types: item-level roles and system-level roles.These roles can be customized to meet your particular security requirements.
* Item-level roles handle the security of the object contained within the hierarchicalnamespace of report objects in Reporting Services, whereas system-level roles handlethe objects and operations outside the hierarchy.
* My Reports functionality allows users to have private space to work with their ownreports, without the need for high privileges in public folders. My Reports is disabledby default.
* Shared schedules are easier to maintain than report-specific schedules. Shared schedulesgive more control to an administrator.
* End users can subscribe to reports.
* To execute reports automatically on a schedule, you must store credentials for connectingto data sources in your report server database.
* You can quickly create multiple subscriptions with different properties by using a datadrivensubscription.
* Reporting Services lets you improve report performance by using cached reports andreport snapshots.
* Data sources represent access to data repositories. Credentials can be predefined andset to apply to everyone who accesses a report, or they can be defined at run time.

Lesson 1: Administering SSRS Item-Level Permissions and Site Security Roles


1. What types of roles are available in SSRS 2008, and what are their purposes?

Item-level roles and system-level roles are the two types of roles available in SSRS 2008. An item-level role is a collection of tasks related to operations on an object of the report object hierarchy of SSRS 2008. A system-level role is a collection of tasks related to operations on server objects outside the report object hierarchy of SSRS 2008.


2. Can a user or group belong to more than one item-level or system-level role?

yes, in SSRS 2008, a user or group can have more than one association to a system-level or an item-level role.


3. When storing the credentials of a data source in the server, are those credentials safe?

yes, the data source credentials are safe because Reporting Services encrypts them and stores them in the ReportServer SQL Server


Lesson 2: Creating Report Schedules and Subscriptions


1. What happens if you do not specify a parameter value in a subscription and the parameter does not have a default value?

If you do not specify a parameter value in a subscription and the parameter does not have a default value, the execution of the report will fail.


2. you want to create a subscription to a report. However, when you right-click the Subscription subfolder of the report, you notice that the new Subscription option is dimmed. What is wrong?

When the new Subscription option is dimmed, the report probably does not use stored credentials for accessing the data. SSRS needs these credentials stored in its own ReportServer database to execute a report on a schedule.


3. What can you do if your query with properties for a data-driven subscription does not provide values for all of the subscription properties?

If your query with properties for a data-driven subscription does not provide values for all of the subscription properties, you can use text and default values instead. These values are then used for parameters of all subscriptions you get from the


Lesson 3: Managing Report Caching and Execution Properties in Report Manager


1. What mechanisms do you identify to reduce the overhead of Reporting Services data sources?

Snapshots and cached reports can help reduce the processing pressure on data sources and improve report response time.


2. Can you always create a cache of a report?

no, you can create a cache of a report only when certain requirements, such as having credentials stored in the Report Server, are met.


3. Can you edit the .rdl code associated with a linked report?

no, because a linked report has no .rdl code of its own. It refers to the .rdl code of the base report.


Managing the Report Environment for Adventure Works


Case scenario


Reporting Services has been deployed in Adventure Works, and multiple departments are requesting access to the new server. However, some employees need to access reports from departments other than their own. At the same time, some users from the call center require the flexibility to create reports based on their customers. Adventure Works uses a mixed database environment, with most applications running on SQL Server 2008 but some running on Oracle.


1. What is the general security infrastructure required to support the reporting needs of the Adventure Works departments and employees?
2. From a data source security standpoint, how can you satisfy the requirements to access both Oracle and SQL Server from Reporting Services?


Answers


1. In general, because item-level roles are cumulative, for each department’s folder, you can assign the Browser role to the Active Directory group that represents the employees of each division. Additionally, you can add the Browser role to each person who needs access to more than one department’s folder.

2. To allow access to the Oracle databases, you will need to define a user with low-level privileges and store the credentials for that user in Report Server. For the SQL Server databases, given the fact that Adventure Works uses a Windows domain, you can enable Kerberos and use the Integrated Windows Authentication mechanism.

Extending and Deploying SSRS Reports

* Use report models to help users create their own reports without having to learn the complexity of the database schema or the SQL language.
* Use BIDS to create and personalize relational report models.
* Use query parameters to filter information retrieved from the database, and use report parameters to interact with end users. Create new datasets to provide a list of alternative values.
* Use expressions to dynamically change report formatting.
* Enhance expressions by using custom code, either embedded in the report or from a custom assembly.
* Manage the report server from your code by using the SSRS Web service.
* Include SSRS reports in your applications by using Visual Studio report viewer controls.

Lesson 1: Assigning Parameters Within Reports

1. What is the main purpose of a report parameter?

The main purpose of a report parameter is to add interactivity to your reports, letting users change the report behavior based on options they select.

2. What is the main purpose of a query parameter?

The main purpose of a query parameter is to filter data in the data source.

3. you want your users to select a parameter from a list of values in a list box. How should you configure the parameter?

you should create a data source that contains the possible values and then bind the data source to the parameter.

Lesson 2: Using Expressions to Perform Advanced Report Item Formatting

1. What is the main benefit of using embedded code in a report?

The main benefit of using embedded code in a report is that the code you write at the report level can be reused in any expression in the report.

2. What programming language would you use to create embedded functions in SSRS?

An SSRS report supports only visual Basic .nET embedded code.

3. How do you reference an embedded function in a report expression?

use the Code prefix and the name of the function to reference an embedded function in a report expression.

Lesson 3: Deploying New Reports and Changes

1. How can you manage reports from your application if the report server is deployed in SharePoint integrated mode?

use the ReportService2006 endpoint of the SSRS Web service if your report server is deployed in SharePoint integrated mode.

2. In which processing mode of a report viewer control can you use the full functionality of your report server?

you should use the remote processing mode to use the full functionality of your report server.

Lesson 4: Using Reports in Your Code

1. How can you manage reports from your application if the report server is deployed in SharePoint integrated mode?

use the ReportService2006 endpoint of the SSRS Web service if your report server is deployed in SharePoint integrated mode.

2. In which processing mode of a report viewer control can you use the full functionality of your report server?

you should use the remote processing mode to use the full functionality of your report server.

Creating a Reporting Services Infrastructure


Case scenario



You just successfully installed SSRS 2008. You also created two shared data sources: one configured to retrieve data from the AdventureWorks relational database and the other configured to retrieve information from a marketing/sales data mart stored in an online analytical processing (OLAP) database. The data mart is populated once a week. The schemas of the relational and OLAP databases are the same as the sample databases provided by SQL Server 2008. You will be the main developer of a set of reports that will be used in the Adventure-Works portal, and you need to address the following requirements:

1. End users want the ability to create their own reports. The users are knowledge workers who have Excel expertise but no database experience. What is the best way to create the reports, without giving end users direct access to the database? How will you build the infrastructure?
2. In the previous version of the reports, users had a set of reports that were identical to each other except that each report grouped information at different levels of the organization. Users still want the flexibility to look at the information grouped in different ways, but you want to build a single report rather than multiple reports. Given these requirements, what is the best way to create the new report?

Answers



1. You can configure report models that will let users create their own reports. You need to create two separate models, one for the relational engine and another for the SSAS database. The relational database model should be created and configured in BIDS, and the OLAP data source should be created from SSMS or Report Manager.


2. You could create a parameterized report that prompts the user for the desired level of aggregation and then dynamically creates the group by statement. Alternatively, you could use the same query for all the aggregation levels, use a table or matrix report item, and hide the grouping level based on the user selection. The main advantage of the first approach is that it pulls only the required data when the report is executed. The main advantage of the second option is that it allows the reuse of the cache if the report uses report caching.

Developing SSRS Reports

* Use the Report Server Project Wizard to create a new project and automatically start the Report Wizard to generate the first SSRS report. Then, use the Report Wizard to generate other reports. Use the Report Designer to enhance your reports and increase their functionality.
* Drag new items onto the Report Designer Design tab to add new report items, and use the Properties window to customize existing items.
* Use datasets to include more than one stream of data in your report.
* Hide columns and use the toggle properties to add interactivity to your report. Reduce the amount of information offered to the user at the beginning of the report, and let users expand and investigate areas they are interested in.
* Use groupings in Tablix data regions to summarize information by categories. You can also use multiple data regions.

Lesson 1: Creating SSRS Projects and Reports in BIDS

1. you want to include an image in a report. How do you display the Image Properties dialog box?

When you drag an image item from the Toolbox window to the Report Designer,

the Image Properties dialog box automatically opens.

2. you want to confi gure an amount to display the value in a currency format. Which property do you use?

To confi gure an amount to display a value in a currency format, select the report

item, and then set the format property to C or c.

3. What are data regions?

Data regions are report items that display repeated rows of summarized information

from datasets.

4. you want to generate a report that is formatted as a chart. Can you use the Report Wizard to create such a report?

no, the Report Wizard lets you create only tabular and matrix reports. you must

create the chart report directly by using the Report Designer.

5. you want to use BIDS to deploy a report to a different server than the one you chose in the Report Wizard. How can you change the server uRL?

you can right-click the project in Solution Explorer and then change the Target-Server uRL property.

6. Which rendering formats are affected by the PageSize properties?

Because only the Adobe PDf file, Word, and Image rendering extensions use

physical page breaks, they are the only formats that are affected by the PageSize

properties.

Lesson 2: Creating a Dataset from a Data Source

1. Can you use a stored procedure to provide data to an SSRS report?

yes, you can use a stored procedure to provide data to an SSRS report by con- fi guring the dataset to use a stored procedure command type. However, your stored procedure should return only a single result set. If it returns multiple result sets, only the fi rst one is used for the report dataset.

2. you want to use a perspective in an MDX query. How do you select the perspective?

use the Cube Selector in the MDX Query Designer to select a perspective.

3. Can you use data mining models in SSRS?

yes, you can use the DMX Designer to create data mining queries for SSRS reports. However, do not forget to fl atten the result set returned by the DMX query.

Lesson 3: Working with Advanced Report Object Properties

1. you want your report to display a hyperlink that will take users to your intranet. How do you configure such a hyperlink?

Create a text box item, set the action to Go To uRL, and then configure the uRL.

2. you want a report to display Sales by Category, SubCategory, and Product. You want users to see only summarized information initially but to be able to display the details as necessary. How would you create the report?

Group the Sales information by Category, SubCategory, and Product. Hide the SubCategory group and set the visibility to toggle based on the Category item. Hide the Product category group and set the visibility to toggle based on the SubCategory item.

3. you want to create an Excel interactive report from SSRS. In SSRS, can you create the same interactive experience in Excel that you would have on the Web?

no, you cannot create the same experience with SSRS. you can, however, use Excel to create such an experience.

Lesson 4: Applying Dataset Filters and Groups

1. What is the main difference between a Matrix report item and a Table report item?

The main difference between a Matrix and a Table report item is in the initial template. Actually, both report items are just templates for the Tablix data region.

2. When you do not use report caching, is it better to use parameters to filter information in the query or to use filters in the dataset?

from a performance perspective, it is better to use parameters because they let SSRS pull filtered data from the data source. In contrast, when you use filters, the queries retrieve all data and then filter the information in an additional step.

3. How do you configure a running aggregate in SSRS?

you can use the RunningValue function to configure a running aggregate.

Building Reports for the AdventureWorks Intranet



Case scenario



You have just successfully installed SSRS 2008. You have also created two shared data sources: one configured to retrieve data from the AdventureWorks relational database and the other to retrieve information from a sales and marketing data mart stored in an OLAP database. The data mart is populated once a week. The schemas of the relational and OLAP databases are the same as the sample databases provided by SQL Server 2008. You are the main developer of a set of reports that will be used in the AdventureWorks portal, and you need to handle the following requirements:

1.

Your end users want a report that gives them near real-time information about sales by Product Category, Product Subcategory, and Product model. The report should show only the past week’s sales and should have only three columns: Name, Quantity, and Amount. Users also want the ability to drill through the report from summary information to greater detail. You do not want to use the Report Wizard. Given these requirements, what is the best way to create the report?
2.

Your end users want a pivot table report that has Categories, Subcategories, and Models as columns and Year, Quarter, and Month as rows. The cell data should be filled with sales amount information. The information does not need to be real time. Given these requirements, what is the best way to create the report?

Answers



1. You can add a new report to the SSRS solution to satisfy this user requirement. Create a dataset that uses the AdventureWorks relational database. In the dataset’s query, filter the information to retrieve only the last week’s sales by product category, subcategory, and model. Use a Table data region and create two additional groupings, one by category and another by subcategory. Set the initial visibility status of the Subcategory and Detail rows to hidden, the toggle property of the Subcategory grouping to change based on Category, and the toggle property of the Detail grouping to change based on Subcategory.


2. For this requirement, you can use the Report Wizard to create the report. On the Data Source page, select the multidimensional database, and then use the MDX Query Builder to create the MDX query. In the Data pane, drag the Product Model Categories hierarchy, the Date.Calendar hierarchy, and the SalesAmount measure onto the Results pane. Remove the Calendar Semester and Calendar data levels. Select a matrix report, and then assign the date-related information to columns and the product category information to the rows. Last, assign the amount as detail information.

Working with SSAS Data Mining

* Creating a data mining model is easy with the Data Mining Wizard and Data Mining Designer tools in BIDS. And Data Mining Viewers display a model’s findings in intuitive ways.
* Preparing data for mining can be a complex and time-consuming task.
* You can mine relational data or cube data.
* You can create models that use simple cases based on a single table or complex cases with nested tables.
* Nested tables provide a lot of flexibility for modeling business questions.
* You can influence algorithms by setting their parameters.
* After creating the mining models, it is important to evaluate their accuracy.
* For evaluating predictive models, you make predictions on the test dataset. You can show the quality of predictions with lift chart, profit chart, or classification matrix tools.
* You browse the mining models by using the DMX language. You can write DMX queries in SSMS. You can also use the Prediction Query Builder in SSMS and BIDS to create prediction DMX queries.
* An SSRS report can use a mining model as its source.
* You control access to data sources, mining structures, and mining models through SSAS roles.
* SSAS supports Windows Authentication only.


Lesson 1: Preparing and Creating Data Mining Structures

1. you need to predict which customers are about to leave your company. How can you accomplish this task?

Predicting which customers might leave your company is a typical task for data mining. you should use a predictive algorithm such as Decision Trees to discover patterns in the data that you have about customers who have already left your company. you should then use the same algorithm to predict the churn based on data from existing customers.

2. What are the different components that make up the SQL Server BI suite?

The SQL Server BI suite includes SSAS cubes, SSAS data mining, SSRS, and SSIS

3. Why would you prepare separate training and test sets?

Whenever you use predictive models, you have to test the predictions. you train the models by using the training sets and then test the models by making predictions with the trained models on the test set.

4. Which algorithm is appropriate for fraud detection?

for fraud detection, you can use the Clustering algorithm. you can cluster the transactions and then find the transactions that do not fit well in any of the clusters.

5. In the practice, why do you think you had to discretize the continuous attributes?

you had to discretize the continuous attributes because of the naïve Bayes algorithm, which accepts discrete attributes only. If you did not discretize them, you would have to ignore them in the naïve Bayes model.

6. Do you get the same results when you use different algorithms for predictions?

no, different algorithms and even different parameters of a single algorithm give you different results—for example, slightly different predictions. That is why you have to evaluate the models by using them to make predictions on the test set.

7. How can you prepare training and test sets?

To prepare training and test sets, you can use the Data Mining Wizard and Data Mining Designer in BIDS to specify the percentage of the holdout data for the test set. you can also use the

TABLESAMPLE option of the T-SQL SELECT statement

or the SSIS Row Sampling Transformation and Percentage Sampling Transformation.

Lesson 2: Creating Models and Applying Algorithms

1. When you created a new cube and a dimension from the mining model, did you notice any objects other than the database and a cube dimension that were created in the Analysis Services project?

BIDS also created an additional DSV for this dimension.

2. How can you limit the depth of a Decision Tree?

you can use the MInIMuM_SuPPoRT and CoMPLEXITy_PEnALTy parameters to control the growth of the tree.

3. Can you precisely control the number of clusters in the Sequence Clustering algorithm?

no, the Sequence Clustering algorithm can add additional clusters if it finds distinctive and important sequences.

4. Can you always use order line items for sequences?

It depends on the way you do sales. With Internet sales, you typically can collect data that shows the sequence of items a customer placed into a market basket. However, in retail stores, you cannot rely on sequences because the sequence of purchasing gets completely remixed at the cashier.

5. for how many time points in the future should you do forecasting?

The further you go into the future, the less you can rely on the forecasts. However, this also depends on the algorithm you use for forecasting. The ARIMA algorithm is much better than ART for long-term forecasting.

6. Which algorithm would you use to find the best way to arrange products on shelves in a retail store?

you should use the Association Rules algorithm for this task. for example, you can order the products that are commonly in a basket close together to help your customers remember to buy all, rather than just one, of them in a single purchase.

Lesson 3: Validating Models, Using DMX Queries, and Using Prediction Queries in Reports

1. What are the three types of charts you can use to evaluate predictive models?

you can use a lift chart for global statistics, a lift chart for a single value, and a profit chart to evaluate predictive models.

2. How can you evaluate a Time Series model?

you can make historical predictions to evaluate a Time Series model.

3. How do you evaluate Clustering models?

you should evaluate Clustering models from a business perspective.

4. using DMX, how can you add a mining model to an existing structure so that you can share the structure with other models?

you can use the ALTER MINING STRUCTURE DMX statement to add a mining model to an existing structure so that it can be shared with other models.

5. Can you use DMX to drill through to the sample cases you used for training a mining model?

yes, you can use the DMX SELECT FROM .CASES syntax to drill through to the sample cases you used to train a mining model.

Lesson 4: Securing and Processing Data Mining Models

1. Can you use SQL Server logins for SSAS authentication?

no, you cannot use SQL Server logins for SSAS authentication. SSAS supports Windows Authentication only.

2. Do end users need the Process permission on a mining structure?

no, end users typically do not need any Process permission.

3. As an administrator, how would you prevent usage of the Clustering data mining algorithm?

you can disable the Clustering algorithm by using the Analysis Services Properties dialog box in SSMS.

4. What processing option deletes the training data in a mining structure without affecting its mining models?

use the Process Clear Structure option to purge the structure data without affecting the models inside the structure.

Working with SSAS Data Mining



Case scenario



The Adventure Works Sales Department wants to know the reasons for its customer churn. The department has requested that you implement one or more mining models to uncover these reasons. You have to prepare the data as well. The Adventure Works Finance Department is satisfied with the Decision Trees model it uses to predict the payment discipline of new customers. However, to better understand the reasons behind the predictions, staff members would like to see the source cases that the model used for training. The Finance Department also gets information about potential customers in the form of Excel 2007 worksheets. They would like to perform advanced checks on this data. For example, they would like to test the data for suspicious rows.

1. How would you prepare the data for finding the churn information?
2. Which algorithms would you use?
3. How many models would you create?
4. How can you allow the Finance Department to see the source cases that the data mining model used for training?
5. How can you use SSAS data mining in Excel 2007 to find the suspicious rows?

Answers



1. Preparing the data for customer churn is not an easy task. The problem is that you typically do not have a simple Churn attribute in your source data. You should find out from the Sales Department how it wants to define churn. For example, you could create a new attribute that shows whether a customer has purchased anything from Adventure Works in the past six months and then use this as a predictable attribute. Do not forget to randomly split the existing data into training sets and test sets.


2. You can use Decision Trees, Naïve Bayes, Neural Network, Clustering, Logistic Regression, and Linear Regression algorithms for predictions.


3. In a real-life project, you should create multiple mining models by using different algorithms and different parameters of the algorithms. Alternatively, you could try to define the attribute that measures the churn differently and apply the models on the new predictable variable. Last, you could use different input attributes for different models. You could have a single mining structure and ignore some attributes in some models. You would then deploy the model with the best performance into the production environment.


4. To allow the Finance Department to see the source cases that the data mining model used for training, you have to enable drillthrough on your mining model. To do this, first give the Drill Through permission to the SSAS database role that includes the Finance Department users. In addition, you should give this role the Read Access permission and the Read Definition permission for the data source. Last, the Finance Department users must have permissions that allow access to the source data. For example, if the source is in a SQL Server table, the users need Select permission on that table.


5. You can download the Microsoft SQL Server 2008 Data Mining Add-ins for Microsoft Office 2007 (see References) and install them on the computers in the Finance Department. Then, with the help of an ad hoc Clustering model, the Finance Department users will be able to use Excel 2007 Add-ins to analyze the worksheet data. With the Clustering model, you can easily find outliers, which are cases that do not fit well in any cluster. Outliers typically include suspicious data. Of course, you must allow session mining models on your SSAS.

Securing and Administering SSAS

* SSAS supports a comprehensive role-based security model that is layered on top of Windows security.
* Members of the SSAS Administrators role have unrestricted access to SSAS. You can create database roles to protect the cube data and control security all the way down to the individual cube cells.
* Dimension data security protects dimension members and data associated with those members. To enforce dimension data security, you define allowed sets and denied sets.
* Cell security lets you define security policy at the most granular level—that is, at the cube cell level.
* Backing up and restoring SSAS databases are essential disaster-recovery tasks. You can automate them by using the SQL Server Agent service.
* With SQL Server Profiler, you can audit server activity and troubleshoot performance problems. Windows Performance Monitor (PerfMon) lets you identify high-level performance bottlenecks. Query the SSAS schema rowsets to retrieve support and monitoring information from an SSAS instance.


Lesson 1: Setting Up SSAS Server Security

1. Which server property do you need to change to disallow local Windows administrators administrative access to the SSAS server?

To disallow local administrators the rights to become SSAS administrators, change the Security\BuiltinAdminsAreServerAdmins property for the Analysis Services instance to false.

2. Is a new role allowed cube access by default?

By default, a new role is not allowed cube access. you need to grant the role at least read permission to the cube so that its members can browse the cube.

3. What do you need to do to prevent a role from viewing calculated members that use a given measure?

Quick check answers

To prevent a role from seeing calculated members that derive from a measure, you need to set a read-contingent permission that revokes that measure.

Lesson 2: Managing SSAS High Availability, Backups, and Object Scripting

1. What should you do to protect a backup file from being accessed by unauthorized users?

To protect a backup file, select the Encrypt Backup file check box in the Backup Database dialog box, and then enter a password.

2. What SQL Server Agent step type do you need to run a backup script?

To run a backup script in a SQL Server Agent job, you need to select SQL Server Analysis Services Command as a step type.

3. How can you script the CREATE definition of a cube in SSMS?

Quick check answers

To script the CREATE cube definition in SSMS, right-click the cube, and then click the Script Cube As, Create To option.

Lesson 3: Managing SSAS Tuning and Logging

1. Which server property do you need to set to change the location of where SSAS saves data files?

you need to change the DataDir property to change where SSAS saves data files.

2. Can you enhance an existing aggregation design (custom or produced with the Aggregation Design Wizard) with an aggregation design from the usage-Based optimization Wizard?

yes, on the last page of the usage-Based optimization Wizard, select the Merge With An Existing Aggregation Design option.

3. Does the query logging feature in the SSAS server properties let you capture the MDX statements to a file for later review?

No, the QueryLog table does not store the actual MDX statements. It stores the levels for each of the attributes and hierarchies that are used by the query. You can use SQL Server Profiler to capture SSAS activity by tracing MDX.

Lesson 4: Tracing and Monitoring SSAS Instances

1. Which SSAS trace can you use to monitor the server activity for the past hour?

To monitor the server activity for the past hour, you can use the flight recorder trace.

2. Which tool would you use to determine how long a given query takes to run?

you can use SQL Server Profiler to determine how a long a query takes to run. The Duration column will give you the execution time.

3. Which schema rowset can you use to find the currently opened connections to the server?

you can use the DISCovER_ConnECTIonS schema rowset to discover information about current


Administering and Securing SSAS Cubes



Case scenario



You need to set up dimension data security on an SSAS database that contains several cubes. You want all cubes in the database to inherit the dimension data security allowed set for an attribute hierarchy. In addition, you want to know which users have been authenticated successfully by the server for a given time period.

1.

How can you set up dimension data security so that all cubes share the allowed set?
2.

How can you find out which users have been successfully authenticated by the server?

Answers



1. To propagate the allowed set to all cubes in a database, you need to set up dimension data security on the database dimension rather than on the cube dimension. To do so, expand the Dimension drop-down list on the Dimension Data tab, and then select the dimension below the database name.


2. To find the users who have been successfully authenticated by the server, use SQL Server Profiler to configure a trace based on the Standard template. Inspect the Audit Login/Audit Logout events to find the authenticated users. You can also query the DISCOVER_CONNECTIONS schema rowset to see a list of the current connections.

Managing SSAS Storage, Processing, and Deployment

* A measure group stores data physically in partitions. Partitions in a measure group can have different storage modes and aggregation designs.
* SSAS cubes perform best with MOLAP. Consider using HOLAP with very large and infrequently used partitions. Consider ROLAP to implement real-time partitions when you cannot use proactive caching.
* Aggregations are recalculated summaries of data. You might gain a remarkable performance increase by implementing useful aggregations. Useful aggregations are a byproduct of a good dimensional design.
* At design time, use the Aggregation Design Wizard to design aggregations using a cost/benefit algorithm.
* SSAS supports various deployment options to meet different management needs. During design time, use BIDS to deploy and process the changed object automatically.
* Deploying changes to a test or production environment typically requires more granular deployment. You can use the Deployment Wizard to preserve the existing management settings on the target server.
* Use the Synchronize Database Wizard to sync metadata and database changes between two SSAS databases.
* The SSAS processing architecture supports different processing options to minimize the database processing time. Processing a database fully is the easiest and cleanest way to process all objects contained in the database, but it might also take the longest.
* Proactive caching provides various advanced settings and notification options to meet different data latency needs.

Lesson 1: Defining Measure Group Partitions and Aggregations


1. How can partitioning improve query performance?

Partitioning can improve query performance by reducing the amount of data the server has to scan to satisfy queries.


2. What storage mode would you select to leave the detail data in the relational database and store the aggregations in the cube?

Hybrid OLAP (HoLAP) does not copy the data and stores the aggregations in the cube.


3. What types of notifications can you use with proactive caching?

Proactive caching supports SQL Server, client-initiated, and scheduled polling notification options.


4. How can aggregations improve query performance?

Aggregations can improve query performance because the server can satisfy queries from aggregations rather than having to scan and aggregate the partition data.


5. Which aggregation design option should you use at design time?

At design time, use the Aggregation Design Wizard.


Lesson 2: Deploying SSAS Objects


1. Does BIDS deployment retain the partition design and security settings on the target server?

no, BIDS deployment overwrites the target database.


2. Which deployment option would you use to deploy your local changes to a production database without overwriting the partition and security settings?

To deploy your changes to production, you would use the Deployment Wizard, which gives you more control over the deployment process than BIDS does.


3. Which deployment options can you use to synchronize a production database with a staging database?

The easiest option for synchronizing a production and a staging database is to use the Synchronize Database Wizard. But you could also execute a Synchronization XMLA command or backup and restore the database.


Lesson 3: Processing SSAS Objects


1. What is the difference between the Process update and Process Add options when they apply to dimensions?

The difference between Process Add and Process update is that the Process Add option detects only member inserts and Process update detects all changes.


2. What is the easiest way to process all objects in a database?

The easiest way to process all objects in the database is to process the database with the Process full option.


3. Why would you use the processing query with scheduled polling notifications?

you would use the processing query to support incremental processing of partitions instead of full processing when only new records are added to a partition.


Implementing Low-Latency OLAP and Deployment Strategies



Case scenario


1. You process the Adventure Works cube on a monthly basis as a last step of the ETLprocess that populates the AdventureWorksDW2008 database. However, some businessusers have reported that they would like to analyze sales data in real time. Whena new Internet order is placed in the sales system, a lightweight ETL process entersthe order data into the AdventureWorksDW2008 database. You need to enhance theAdventure Works cube to support low-latency reporting. In addition, based on somepreliminary testing, the QA users have reported inadequate query response timeswhen they browse the Adventure Works cube. You need improve the cube performance.How would you implement a low-latency partition to let business users browsethe daily order data?
2. For the same scenario, what can you do to optimize the cube performance?
3. As an administrator, you follow an iterative design, test, and deploy cycle to implement a cube. You use BIDS to design the cube and test it locally. Periodically, you need to deploy your changes to a test server for user acceptance testing. To handle high reporting volumes, the production SSAS servers are load-balanced. Instead of processing all production cubes individually, you want to process the cube on a staging server and then deploy it to the production servers. Only new members are added to a large dimension, and you need to minimize the dimension processing time. What deployment options would you use to deploy the cube to testing and production environments?
4. For the scenario described in question 3, what processing option would you use to add only new members to a dimension?

Answers



1. You could partition the Internet Sales measure group and set up a low-latency partition. The definition slice of the low-latency partition could filter the sales data for the current month only. In addition, you could enable proactive caching on that partition so that you do not have to process it explicitly.


2. Start by optimizing the dimensional design of your cube. Make sure that you have set up correct attribute relationships and have defined useful user hierarchies. Set the Aggregation-Usage property of infrequently used attributes to None. Run the Aggregation Design Wizard to create the preliminary aggregation design. When the cube has been deployed, you can turn on query logging and then use the Usage-Based Optimization Wizard, to tune the aggregations that are designed for the cube.


3. If you want to retain the partitions and security settings on the test server, your best deployment option is to use the Deployment Wizard. Rather than processing the production cubes individually, consider processing the cube on the staging server and then using the Synchronize Database Wizard to synchronize each of the production cubes with the staging cube.

4. If only new members are added to a dimension, you can process the dimension by using the Process Add option.

Extending SSAS Cubes

* The cube space is defined by attributes. Sometimes there are logical relationships among attributes within the same dimension. You should understand and explicitly define such relationships to optimize the UDM.
* In addition to attribute hierarchies, you can define user hierarchies that provide useful navigation paths in the UDM. A dimension can have several user hierarchies.
* You can browse data in a measure group by a dimension only if the dimension is joined to the measure group. You must review the Dimension Usage tab and correct the dimension relationships if needed. In addition to regular relationships, SSAS supports other relationship types to support more complex schemas.
* Key performance indicators (KPIs) are quantifiable measures that organizations can use to track business performance. A KPI has four main properties: Value, Goal, Status, and Target.


Lesson 1: Defining User Hierarchies and Dimension Relationships


1. Why should you spend time defining appropriate attribute relationships?

Proper attribute relationships optimize storage and improve query performance because the server might be able to produce the totals from the related attribute totals instead of scanning the fact table.


2. When creating a dimension, can you create different hierarchies to represent every possible combination of attributes and to maximize the options available to end users for using the hierarchies to explore cube data?

Although technically you can create different hierarchies to represent every combination of attributes, a large number of hierarchies within a dimension design will likely offer too many options and confuse end users. Generally, users can create their own hierarchies by simply nesting (cross-joining) different attributes onto the rows or columns of a given query, although this capability is somewhat dependent on the applications they are using. So having attributes available only for the most commonly requested or needed hierarchies is probably your best design strategy.


3. Can you create hierarchies directly from columns within a dimension’s table?

no, you cannot create hierarchies directly from columns within a dimension’s table. Hierarchies can be created only based on attributes that have been added to the dimension’s design.


4. Can a dimension be related to a measure group if the underlying dimension table is not related to the appropriate fact table in a primary key–to–foreign key relationship?

Although the dimension cannot be related to the measure group in a Regular relationship, you might be able to create a Referenced or Many-To-Many relationship if an intermediate dimension table or intermediate fact table related to the dimension and measure group in question is available. This capability within SSAS provides an elegant solution to various database modeling requirements.


5. Must every dimension you add to a cube be related to every measure group within that cube?

no, you do not have to relate every dimension you add to a cube to every measure group in the cube. In fact, you can create a cube that includes multiple measure groups whose source fact tables are related to different sets of dimensions. This lets end users browse the data in a way that makes sense from a business perspective rather than forcing them to analyze data in a way that is constrained by the underlying database design.


Lesson 2: Creating KPIs, Actions, Translations, and Perspectives


1. What types of actions can you identify?

Regular, drillthrough, and reporting actions are the three types of actions available.


2. How you can localize dimension members’ captions?

you can localize dimension members’ captions by selecting a translation column that stores the translated captions.


3. Can you use perspectives to enforce security?

no, you can use perspectives to make the cube easier to browse but not as a security mechanism.


4. What is a KPI?

A KPI, or key performance indicator, is a quantifiable measure used to track business performance.


5. What are the main properties of a KPI in SSAS?

The main properties of an SSAS KPI are Value, Goal, Status, and Trend.


6. What will the KPI Status expression return to indicate underperformance?

The KPI Status expression will return –1 to indicate underperformance.


Lesson 3: Creating Calculations and Queries by Using MDX


1. What are the first two axes of an MDX query?

Columns and Rows are the first two axes of an MDX query.


2. What is the difference between a calculated member and a regular measure in terms of storage?

The values of a regular measure are stored on the disk, and the values of a calculated member are calculated at run time.


3. What is a named set?

A named set is a set consisting of dimension members.


Extending SSAS Cubes



Case scenario



As it stands, the Adventure Works cube has data only for Internet sales. However, the business requirements state that the reporting solution must support consolidated reports that show both Internet and reseller sales. Adventure Works is also in the process of developing a Webbased dashboard solution, which needs to display vital business metrics in the form of KPIs.

1. What do you need to change in the dimensional model to accommodate the reseller sales?
2. How could you implement the Adventure Works KPIs?

Answers



1. In general, you can accommodate reseller sales by creating a new Reseller cube or by adding a Reseller measure group to the existing cube. If you need to handle large data volumes, you might find that a new cube will give you better performance. On the downside, you will not be able to author consolidated reports that draw data from separate cubes. With smaller cubes, consider adding a new measure group to the existing cube. Incorporate a performance- testing plan early in the design cycle to gauge the performance of the single-cube approach.

2. As an OLAP server, SSAS is a natural choice for implementing calculated measures and KPIs. OLAP browsers and third-party applications can use the MDX KPI functions to query the cube and retrieve the KPI properties.

Developing SSAS Cubes

* Every SSAS solution must include a data source view (DSV) that contains schema metadata to support the other objects within that solution. The solution might also include named calculations and named queries.
* The BIDS Cube Wizard and Dimension Wizard provide easy mechanisms for adding new cubes and dimensions to an SSAS solution.
* The BIDS Cube Designer provides tabs for reviewing and modifying various aspects of a cube. The two tabs you use to control the basic structure of a cube are the Cube Structure tab and the Dimension Usage tab.
* You use the BIDS Dimension Designer to modify a dimension and to add, modify, or delete specific attributes and multilevel hierarchies within the dimension.
* You can set various attribute properties to control the behavior of the attributes from an end user’s perspective. Likewise, you can control the behavior of dimension attributes through various properties available within the Dimension Designer.
* A measure group is a container of measures. The most important property of a measure is AggregationFunction, which tells the server how to aggregate the measure. Additive measures aggregate across all dimensions joined to the measure group. Semiadditive measures aggregate across all dimensions except time. Non-additive measures require custom calculations that define their aggregation behavior.

Lesson 1: Creating Data Sources and Data Source Views

1 . What is the purpose of a data source view (DSv)?

A data source view captures schematic information related to the relational database that serves as a source for your SSAS solution. The DSv contains metadata describing the tables and views selected from the database, the primary key definitions, and the relationships that exist between the tables. The DSv lets you extend the data source schema by adding named calculations and named queries and by defining table relationships and logical primary keys that do not exist in the data source.

2 . Within a DSv, why are primary key and table relationship definitions important?

The various designers and wizards within an SSAS project use the primary keys and the table relationships to determine how dimension members are uniquely defined and how the dimension and fact tables are related.

3 . If you have multiple tables that contain information related to a single dimension, how would you simplify the representation of that information within your DSv?

you can combine the relevant columns from each of the tables into a single named query by using a SELECT statement that joins the tables together.

Lesson 2: Creating and Modifying SSAS Cubes

1 . Which cube creation method should you select in the Cube Wizard if you want to build the cube on top of an existing data source?

you need to select the use Existing Tables method when you build the cube from the bottom up.

2 . What do you need to do to make the cube available for browsing?

you need to deploy the cube definition to the server and process its objects to load it with data. BIDS combines these two tasks in one step when you deploy the database to the server.

3 . Can you modify the design of a dimension that has been included in a cube?

Although you can modify certain properties for a dimension to change its behavior within a cube, the physical design of a “cube dimension” is inherited from the dimension at the database level; you can find that dimension definition under the Dimensions folder in Solution Explorer.

Lesson 3: Creating and Modifying Dimensions

1 . While using the Dimension Wizard, you inadvertently select the wrong primary key column and fail to select all the attributes you need for the dimension. Do you have to delete the dimension and start over?

Although you can certainly delete the dimension and start over, it is likely that you can make all necessary corrections by using the Dimension Designer. Simply update the key attribute to reflect the correct KeyColumns property and add the additional attributes as needed.

2 . When creating a new dimension by using the Dimension Wizard, can you specify the cubes to which the new dimension will be added after it is created?

no, you cannot specify the cubes to which a new dimension will be added. The Dimension Wizard simply creates any new dimension as a database-level dimension. If you want to use the new dimension within one or more existing cubes, you must open each cube within the Cube Designer and add the dimension.

3 . your factInternetSales fact table references the DimDate table three times with orderDateKey, DueDateKey, and ShipDateKey foreign keys. you want the end users to browse the cube data by these dates. Do you need to create separate Time database dimensions?

no, you add the same time dimension multiple times to the cube as a role-playing dimension.

Lesson 4: Creating Measure Groups and Measures

1 . What is the difference between measures and a measure group?

A measure group is a container of measures. Typically, a measure group represents a fact table, and its measures represent the facts.

2 . What types of measures can you identify based on the way they aggregate?

Based on how it aggregates data, a measure can be additive, semiadditive, or nonadditive.

3 . How do you set the aggregation function of a measure?

you can set a measure’s aggregation function by setting its AggregationFunction property to one of the SSAS standard aggregation functions.

Building an SSAS Solution as a Prototype

Case scenario

The business development department at Adventure Works has asked you to develop a solution to analyze Internet and reseller sales by using Excel 2007. Given the strong support for working with cubes in Excel, you decide to create an SSAS solution that provides the requested sales data, with various dimensions, hierarchies, and attributes available to support a broad range of analysis needs.

1. How can you quickly create a prototype of your solution to gain a better understanding of the data and the end-user requirements?
2. What SSAS features are you likely to take advantage of to improve the usability of the cube from an end-user perspective?

Answers

1. To build an initial prototype, you would create a new SSAS solution and add a data source and a data source view (DSV) on top of the database whose schema defines dimension and fact tables. You could then use the Cube Wizard to generate a cube with measure groups for Internet and reseller sales data, along with dimensions for business entities such as time, products, customers, geography, sales reasons, employees, and sales territories. After processing the new SSAS database, you could use the browsers within BIDS or Microsoft Office Excel to review the resulting design, possibly even sharing it with a few of your end users, to better understand the available data and the end users’ analytical requirements.
2. To improve the usability of the cube, you would first want to ensure that all the measures, measure groups, dimensions, and attributes have business-friendly names. Next, you would likely need to define format strings for all the measures that show currencies and other numbers appropriately. Last, within each dimension, consider adding additional attributes that create useful attribute hierarchies.