http://reddymsbitools.blogspot.com

Monday, 28 February 2011

MSBI Interview Questions and Answers

SSIS
1) Use Of Event Handlers
Based on event (On error, on Progress, On Pre/Post Validate..), if you want to perform some action like send email, insert log info into table etc.. we can go for Event Handler Tab.
http://msdn.microsoft.com/en-us/library/ms140011.aspx
http://consultingblogs.emc.com/jamiethomson/archive/2005/06/11/SSIS_3A00_-Custom-Logging-Using-Event-Handlers.aspx
2) Diff b/w script task & script component and its advantages.
Script task is control flow level item where as script component is data flow level item, both of the functionalities are same. This 2 are very powerful items in SSIS.
Normally we can implement custom code by using these components.
By using this two items we can perform any action like DML operations on the data from DB, file and we can do any operation using c# or vb.net code we can use it as Source, destination as well. In single sentence i can conclude the usage of this items we can use to perform any ETL operation if you are good at .net.
http://sqlblog.com/blogs/andy_leonard/archive/2007/10/14/ssis-design-pattern-read-a-dataset-from-variable-in-a-script-task.aspx
http://www.ssistalk.com/2007/04/04/ssis-using-a-script-component-as-a-source/
3) What is a Master package.
Master package is nothing but it is also a normal SSIS package, it act like a master to run the all other (child) packages. If you call the other SSIS packages in a new SSIS package using ‘Execute Package Task’, the new package called as master package. The use of this package(Master) is, instead of running all individual packages if we can run the single master package, then automatically all the child packages will run. Normally we simply schedule the master package in Sql server agent by adding the all child packages values in the master package config file.
http://sqlblog.com/blogs/jorg_klein/archive/2010/01/04/ssis-package-design-pattern-for-loading-a-data-warehouse.aspx
4) Have you worked with data source view in ssis.
You can simply say ‘i never used so far’.
A DSV allows you to create a metadata layer over an existing Data Source. Within a DSV, you can select a subset (or all) of the tables and views available from a Data Source, rename any of the tables/views and/or their columns, add new Named Calculations (which act like computed columns within a table/view), and add new Named Queries (which are complete select statements, which basically act like views).
http://database.blogs.webucator.com/2010/10/20/how-to-create-data-source-views-in-sql-server-integration-services-2008/
5) If there are 100 Packages, do you create 100 configuration files or u create how many.
The answer is based on the requirement we will create 100 config files for 100 packages.If i go for master package and if i am using same kind of connections for the child packages like source and destinations connections then we will create less config files, because Master Package will pass all the connections using master-child relation using the variables. Creation of config files is dependence of the requirement.
http://www.mssqltips.com/tip.asp?tip=1434
http://www.sql-server-performance.com/articles/dba/package_configuration_2005_p1.aspx
ssas
1) Difference b/w Cube-Deploy and Process.
Once you build the cube in the BIDS, we can say the structure of the cube is created. So once the cube structure is built we need to deploy the structure to the actual Sql server analysis server. This process is called DEPLOYMENT of cube. (like creating one object(table) in the relational database)
So once you deploy the structure to the SSAS server, we need to load the data into the structure we have created. This process of loading data into the cube and creating the aggregations is called Cube processing.(like inserting the data into your table)
So to make a cube online for the user for querying we need to build the structure of the cube in BIDS and deploy it to a SSAS server and the process it to load with data and aggregations, then the user can query on the SSAS cube.
Best Practice method to deploy a cube:
http://www.mssqltips.com/tip.asp?tip=1883
2) Different kinds of Dimensions. like Degenerate dimension, Conformed dimension, Junk dimension with example.

Different types of Dimensions:
I.SCD (Slowly Changing Dimension- Type 1 Type 2 and Type 3)
Type 1: The new record replaces the original record. No trace of the old record exists.
Type 2: A new record is added into the customer dimension table. Therefore, the customer is treated essentially as two people.
Type 3: The original record is modified to reflect the change
II. There are other three types of Dimensions:
1. Confirmed Dimensions.
2. Junk Dimensions.
3. Degenerate Dimensions.
-Confirmed Dimension:
The dimensions which is used more than one fact table is called conformed dimensions. It means we can definitely use CD in the nature
Ex: Time,Geography,customer,employee,product etc..
-Junk Dimension:
Junk dimensions are dimensions that contain miscellaneous data (like flags and indicators) that do not fit in the base dimension table.
-Degenerate Dimension :
A degenerate dimension is data that is dimensional in nature but stored in a fact table. For example, if you have a dimension that only has Order Number and Order Line Number, you would have a 1:1 relationship with the Fact table. Do you want to have two tables with a billion rows or one table with a billion rows. Therefore, this would be a degenerate dimension and Order Number and Order Line Number would be stored in the Fact table
III. Cube Dimensions
a. Fact Dimensions
b. Role-Playing Dimensions
c. Reference Dimensions
d. Many-to-Many Dimensions
http://consultingblogs.emc.com/christianwade/archive/2005/04/07/1255.aspx
http://www.informaticans.com/blog/data-warehousing-faqs/dimensions-different-types/

3) About Cube-Partions.its advantage.
We used Partitions to Improve query performance.
A partition is a file on a hard disk that contains a subset of the data included in an Analysis Services database. Partitions let you spread data over multiple hard disks. This includes combinations of both local (stored locally on hard disk) and remote (distributed across multiple hard disks) partitions. Partitions rely on storage settings to define the format and processing schedule for the database, and they use writeback settings to enable what-if analysis. What-if analysis enables a user to input their own data and evaluate the changes that cascade throughout their cube.
Method to create Partitons:
http://www.mssqltips.com/tip.asp?tip=1549

4) What is the use of data source view.
Data Source Views (DSV) have been introduced in SQL Server Analysis Services (SSAS) 2005. A DSV is a metadata layer between the cube and the underlying data source that allows us to build our cubes without being connected to the underlying source
5) iIhave 10 dimensions.i will delete one.and i will process the cube .is that deleted dimension will available in cube?
If you want to remove the dimension from the cube, you can hide/delete the dimension then re-deploy the cube and you need to perform the full process then the deleted dimension won’t appear in the cube.
6) diff b/w calculated members and calculated measures
Member is nothing but column on dimension table. Measure is nothing but column on fact table.
The term calculated member refers to the creation of any MDX object through a calculation. The calculated member can be part of the measures dimension where a simple MDX expression such as addition or subtraction of two or more base measures results in a new measure. Such calculated members on the measure dimension are referred to as calculated measures.
Calculated Member is related to the Dimension where as calculated Measure is related to the measure in measure group.
http://msdn.microsoft.com/en-us/library/ms166568.aspx
SSRS
1) types of parameters
Single value parameters, Multi value parameters,
Cascading parameters (one parameter is dependent on another parameter selection)
http://msdn.microsoft.com/en-us/library/aa337234.aspx
http://sql-bi-dev.blogspot.com/2010/07/report-parameters-in-ssrs-2008.html

2) what are drill through and drill down reports.
Drill down is when a power play model has several levels within a hierarchy. Given a location dimension with country/ state and city. A starting view will show the summary of countries, Drill down on one country will expose their states, drill down on a state will show their cities. So drill down goes to lower levels of a dimension as designed in the model.
Drill thru is the ability to leave PowerPlay and call a SQL Query, (say in Impromptu) passing the dimension line as filters to the query. If the columns in the query match the columns used to build the dimensions the query will automatically apply those values to the where clause. You can write any query or queries you want to support a drill thru from a cube. Context is important to balance the two
Database
1) How u done partitions.
If your database contains very large tables, you may benefit from partitioning those tables onto separate filegroups. This technology, introduced in SQL Server 2005, allows you to spread data onto different physical disks, leveraging the concurrent performance of those disks to optimize query performance.
http://www.mssqltips.com/tip.asp?tip=1914

2) Difference b/w view and materliazed view.
Views
A view takes the output of a query and makes it appear like a virtual table. You can use a view in most places where a table can be used.
All operations performed on a view will affect data in the base table and so are subject to the integrity constraints and triggers of the base table.
A View can be used to simplify SQL statements for the user or to isolate an application from any future change to the base table definition. A View can also be used to improve security by restricting access to a predetermined set of rows or columns.
In addition to operating on base tables, one View can be based on another, a view can also JOIN a view with a table (GROUP BY or UNION).

Materialized Views
Materialized views are schema objects that can be used to summarize, precompute, replicate, and distribute data. E.g. to construct a data warehouse.
A materialized view provides indirect access to table data by storing the results of a query in a separate schema object. Unlike an ordinary view, which does not take up any storage space or contain any data.

The existence of a materialized view is transparent to SQL, but when used for query rewrites will improve the performance of SQL execution. An updatable materialized view lets you insert, update, and delete
3) Have u worked with sequences.
CREATE SEQUENCE will enter a new sequence number generator into the current database. This involves creating and initializing a new single-row table with the name seqname. The generator will be owned by the user issuing the command.
http://blogs.msdn.com/b/sqlcat/archive/2006/04/10/sql-server-sequence-number.aspx

4) About logging in view.
http://www.stanford.edu/dept/itss/docs/oracle/10g/server.101/b10759/statements_6003.htm

Frequently asked Sql Queries with answers
http://blog.sqlauthority.com/2007/04/21/sql-server-interview-questions-and-answers-complete-list-download/
http://www.java2s.com/Code/SQLServer/CatalogSQLServer.htm

Some Other Interview Questions:
http://www.bigresource.com/MS_SQL-ssis-master-package-fail--MaixRNiy.html
Try to get the info from net.I do not have time to explain all these...
DTS vs SSIS: A basic overview
http://decipherinfosys.wordpress.com/2008/03/11/dts-vs-ssis-a-basic-overview/
How to create a basic Cube
http://www.mssqltips.com/tip.asp?tip=1532

MSBI - Some IMP URL

***->MSBI-Books to read****

SSIS-ETL World Record

Today at the launch of SQL Server 2008, you may have seen the references to world-record performance doing a load of data using SSIS. Microsoft and Unisys announced a record for loading data into a relational database using an Extract, Transform and Load (ETL) tool. Over 1 TB of TPC-H data was loaded in under 30 minutes. I wanted to provide some background material in the form of a Q&A on the record, since it’s hard to give many details in the context of a launch event. We are also planning a paper that talks about all this, so think of this article as a place-holder until the full paper comes along. I hope you find this background information useful.
-- See the below link for more info
http://blogs.msdn.com/sqlperf/archive/2008/02/27/etl-world-record.aspx

SSAS-Storage Modes in Analysis Services

Why did we only use the MOLAP storage mode in Project REAL? There is also ROLAP and HOLAP storage, but they weren’t used at all. Also, why doesn’tProject REAL illustrate pro-active caching?

The Analysis Services 2005 Performance Guide contains a more complete discussion of the storage modes in Analysis Services in Appendix B. Because the question comes up from time to time regarding Project REAL, this write-up gives a summary.

Every partition in AS has an associated storage mode. MOLAP partitions store aggregations and a copy of the source data (fact and dimension data) in a multidimensional structure on the Analysis server. This is the optimal storage mode for several reasons:
Compression. When relational data is processed into AS, the storage needed is typically 20% to 40% of the size of the un-indexed relational fact table. Less I/O means that the MOLAP data is faster to access.
Multidimensional structure. The format in which AS data is stored is optimized for the kind of access that OLAP users do – access to the data using arbitrary dimensions and arbitrary slices of the dimension.
Data local to AS. There is no need for AS to call out to another service to obtain data when MOLAP storage is used.
Indexes are not needed on relational fact tables. When AS processes fact data, it generally can do a full table scan, so indexes are not needed. When there are multiple partitions in a measure group, there will be one join to the dimension that is used for partitioning, but that’s all. Then at query time, no requests are sent to the relational database, so no indexes are needed for querying. Given that indexes can multiply the size of data in the relational system, the space savings can be significant.

HOLAP partitions store aggregations in a multidimensional structure on the Analysis server, but do not copy fact data from the original relational database. As a result, whenever Analysis Services needs to resolve a query against fact data stored in a HOLAP partition, Analysis Services must query the relational database rather than using a multidimensional structure stored on the Analysis server. Some administrators choose HOLAP because HOLAP appears to require less total storage space while yielding high performance queries. However, the disadvantages almost always outweigh the advantages:
When a query “misses” the aggregations AS will have to query the relational database for fact-level data. This is necessarily slower than getting data from the compressed and natively multidimensional MOLAP storage.
Because of the above, many administrators will create more aggregations to decrease the number of “misses.” At some point the additional aggregations can become larger than the fact data would have been. But even if it doesn’t, more aggregations take more time to process.
Processing of HOLAP partitions is not significantly faster than processing of MOLAP partitions. In both cases the entire fact table must be read into AS in order to create aggregations. The only difference is whether AS also saves the fact data. However if more aggregations were created because of the previous point, processing will be slower.

ROLAP partitions store aggregations in the same relational database that stores the fact data. ROLAP partitions take longer to process, and because all queries must be redirected to the relational database, ROLAP partitions are slower to query against as well.

Given all this, it sounds like only MOLAP should ever be used! For most implementations that is the case. The exception is when near-real-time access is needed to data that changes throughout the day. Even there, pro-active caching is the first technique of choice. However, if even pro-active caching does not allow frequent enough updates, then the use of one ROLAP partition in the measure group is a way to handle the incoming data.

The recommended way to handle near-real-time scenarios is to take advantage of the fact that the storage mode is set on a per-partition basis. A vast majority of data in an OLAP cube does not change throughout the day. That data should be stored in MOLAP partitions. One partition in the measure group is designated to receive updates as they come in. That way, only that partition needs to be updated. This partition can be updated using proactive caching or it can be defined as a ROLAP partition with zero aggregations. In the latter case, all queries to this partition will be sent to the relational source, and current data will always be as current as that table.

For more details about near-real-time data access, see the section “Near Real-Time Data Refreshes” in the Analysis Services 2005 Performance Guide.

An example of how such a partitioning scheme would be updated might look like this:
Fact tables contain weekly partitions in the RDBMS, and there are corresponding weekly partitions in AS.
Each night, data for the previous day is processed into the correct weekly partition.
As new data comes in during the day, it flows into a relational partition for current updates. AS has a near-real-time partition that maps to that relational partition. Either using pro-active caching or a ROLAP partition with zero aggs, users get access to new data throughout the day.
At night, the collected daily data is moved to the weekly partition and processed (either by full process or using incremental update). The near-real-time partition is now ready to collect the next day’s updates.

Returning to the original question: Why did we only use the MOLAP storage mode in Project REAL? The simple answer is that MOLAP storage performs the best. The slightly extended answer is that Project REAL is based on the scenario in the Barnes & Noble data warehouse. The business model centered on daily updates, not near-real-time access. Therefore, there was no need for proactive caching or ROLAP partitions

SSRS-Scale-Out Deployment Best Practices

Check out the the third of five technical note as part of the Building and Deploying Large Scale SQL Server Reporting Services Environments Technical Note Series: Reporting Services Scale-Out Deployment Best Practices
This technical note reviews the
SSRS Scale-Out Architecture
Report Catalog sizing
The benefits of File System snapshots for SSRS 2005
Why File System snapshots may not help for SSRS 2008
Using Cache Execution
Load Balancing your Network
Isolate your workloads
Report Data Performance Considerations

SSIS - Full-load vs. Delta-Load

SSRS - Adding Subreports

A subreport is a report item that displays another report inside the body of a main report. Conceptually, a subreport is similar to a frame in a Web page. It is used to embed a report within a report. Any report can be used as a subreport. The report that the subreport displays is stored on a report server, usually in the same folder as the parent report. You can design the parent report to pass parameters to the subreport. A subreport can be repeated within data regions, using a parameter to filter data in each instance of the subreport.
If you use subreports to display separate groups of data, consider using data regions (tables, matrices, lists, charts, or gauges) instead. Reports with data regions only may perform better than reports that include subreports.
Use data regions when you need to nest groups of data from the same data source within a single data region. Use subreports if you need to nest groups of data from different data sources within a single data region, reuse a subreport in multiple parent reports, or display a standalone report inside of another report. You can create a "briefing book" by placing multiple subreports inside the body of another report.
Note:
In Report Designer, if you preview a report that contains subreports, and then change the subreport, the preview may not be updated. To see the changes, click the Refresh button.
Using Parameters in Subreports
To pass parameters from the parent report to the subreport, define a report parameter in the report that you use as the subreport. When you place the subreport in the parent report, you can select the report parameter and a value to pass from the parent report to the report parameter in the subreport.
Note:
The parameter that you select from the subreport is a report parameter, not a query parameter. For more information about parameters, see Adding Parameters to Your Report.
You can place a subreport in the main body of the report, or in a data region. If you place a subreport in a data region, the subreport will repeat with each instance of the group or row in the data region. To pass a value from the group or row to the subreport, in the subreport value property, use a field expression for the field containing the value you want to pass to the subreport parameter.
For more information about working with subreports, see How to: Add a Subreport and Parameters (Reporting Services).

For More Details:-
Here is the good link aboutSubReport
http://www.simple-talk.com/content/print.aspx?article=278

SSIS-Waiting for a file

SSIS - Difference between Control Flow and Data Flow

SSIS-Parallel Execution

Hi,
this is important one to achieve parallel execution using SSIS.

http://shujaatsiddiqi.blogspot.com/2008/10/ssis-multithreading-parallel-execution.html

SSIS-Performance Tuning Techniques(SSIS Engine Overview,BlockingExecution Trees, Buffer Sizing, Parallelism)

Hi Performance tuning is very important one as a process of data loading using ETL tools.
When you architect data integration solutions, your design decisions not only determine how successfully your solution meets functional requirements, but also how well your solution meets performance requirements. To make the right performance design decisions, you need to understand the performance architecture of your data integration tool and, just as importantly, the techniques that enable you to maximize the tool’s utilization of system resources such as memory and CPU.
Microsoft® SQL Server™ 2005 Integration Services (SSIS) provides full-featured data integration and workflow engines coupled with a rich development environment for building high-performance data integration solutions. SSIS provides a variety of optimization opportunities to help you maximize resource utilization while successfully meeting the needs of your specific data integration scenario.
http://msdn.microsoft.com/en-us/library/cc966529.aspx

http://shujaatsiddiqi.blogspot.com/2008/08/performance-improvement-of-sql-server.html

SQL SERVER – Nth Highest Salary

SSIS-Transactions usage