http://reddymsbitools.blogspot.com

Thursday, 18 November 2010

Incremental Loading & Batch Processing?

Incremental Loading:Its also called retrospective data load.In this data being loaded on a regular inerval and only new data is moved.
eg in ur data warehouse ur loading "Transactional Data" on daily basis and its happning only for new Transactions because old Data is already in the Data warehouse.
Batch Processing:
Its processing of Jobs/Task together.
eg if u have a staging area u may have valid reason to get data for different Departments there first and then move all that in one shot to production.

Difference between ODS and Staging

---Scenario 1#
An operational data store (ODS) is a type of database often 
used as an interim area for a data warehouse. Unlike a data 
warehouse which contains static data the contents of the 
ODS are updated through the course of business operations. 
An ODS is designed to quickly perform relatively simple 
queries on small amounts of data (such as finding the 
status of a customer order) rather than the complex 
queries on large amounts of data typical of the data 
warehouse. An ODS is similar to your short term memory in 
that it stores only very recent information; in comparison 
the data warehouse is more like long term memory in that it 
stores relatively permanent information.

But in staging we are storing current as well as historic 
data. This data might be a raw and then need cleansing and 
transform before load into datawarehouse.
--Scenario 2#
ODS keeps the transactional data from where direct reporting is possible. The data in ODS can be partly denormalized.
Staging area is the exact replica of the transactional database which acts as a source for datawarehouse processing.

Types Of Facts?

# Additive: Additive facts are facts that can be summed up through all of the dimensions in the fact table.
# Semi-Additive: Semi-additive facts are facts that can be summed up for some of the dimensions in the fact table but not the others.
# Non-Additive: Non-additive facts are facts that cannot be summed up for any of the dimensions present in the fact table.

Fact vs Dimension Table

A fact table captures the data that measures the organization's business operations. A fact table might contain business sales events such as cash register transactions or the contributions and expenditures of a nonprofit organization. Fact tables usually contain large numbers of rows sometimes in the hundreds of millions of records when they contain one or more years of history for a large organization.

Dimension tables contain attributes that describe fact records in the fact table. Some of these attributes provide descriptive information; others are used to specify how fact table data should be summarized to provide useful information to the analyst.

Explain degenerated dimension in detail.

 Degenerate dimension: A column of the key section of the fact table that does not have the associated dimension table but used for reporting and analysis such column is called degenerate dimension or line item dimension.For ex we have a fact table with customer_id product_id branch_id employee_id bill_no date in key section and price quantity amount in measure section. In this fact table bill_no from key section is a single value it has no associated dimension table. Instead of cteating a seperate dimension table for that single value we can include it in fact table to improve performance.SO here the column bill_no is a degenerate dimension or line item dimension
-----Scenario 2
Degenerated Dimension is achieved through a gradual modeling approach following Dimensional Modeling standards. Let's take example of a Star Schema representing Sales Invoices. The FACT would have the "Invoiced Amount" as primary measure. Now when we look at the source of the Invoice, it is the body if the Paper Invoice that gives us the following particulars about each Invoice:
Invoice Date
Customer ID
Products within the Invoice
Reference to Order Number(s)
Invoice Number
Invoice Line Numbers (which are multiple lines in single Invoice)
Invoice Line Amount
Invoice Total Amount
When we model the above following Dimensional Modeling standards, we get following distinct Dimensions:
Calendar Dimension - representing the Invoice Date
Customer Dimension - representing Customer ID
Product Dimension - representing Products within the Invoice
Order Dimension - representing Orders
Invoice Dimension รข€“ representing Invoice Number & Invoice Line Numbers
Question comes - what attributes would be left to be part of the INVOICE DIMENSION, if at all we decide to have one! Only candidate attributes are Invoice Number and Invoice Line Numbers. But, this is at the granularity of the FACT, which stores references to all above said Dimensions as well as the measures i.e. Invoice Line Amount, Invoice Total Amount (Derived by aggregation).
It is at this situation, we may decide to degenerate the attributes Invoice Number & Invoice Line Number into the Fact and avoid having a distinct entity to represent Invoice Number / Line Numbers as a Dimension. What we achieve by this:
1. avoiding a huge join as both Fact and this Dimension would have the same granularity,
2. still able to query with Invoice Number as the entry point
So, when such a scenario appears, we make the left out attributes (i.e. Invoice Number & Invoice Line Number in our case) part of the Fact and part of the Primary Key in the Fact. This is why and how we model Degenerated Dimensions.

Sql server Database Backup's

OverviewThere are only two commands for backup, the primary is BACKUP DATABASE.  This allows you to do a complete backup of your database as well as differential, file, etc. backups depending on the options that you use.
ExplanationThe BACKUP DATABASE command gives you many options for creating backups.  Following are different examples.
Create a full backup to diskThe command is BACKUP DATABASE databaseName.  The "TO DISK" option specifies that the backup should be written to disk and the location and filename to create the backup is specified.
BACKUP DATABASE AdventureWorks 
TO DISK = 'C:\AdventureWorks.BAK'
GO
Create a differential backupThis command adds the "WITH DIFFERENTIAL" option.
BACKUP DATABASE AdventureWorks 
TO DISK = 'C:\AdventureWorks.BAK' 
WITH DIFFERENTIAL 
GO
Create a file level backupThis command uses the "WITH FILE" option to specify a file backup.  You need to specify the logical filename within the database which can be obtained by using the command sp_helpdb 'databaseName', specifying the name of your database.
BACKUP DATABASE TestBackup FILE = 'TestBackup' 
TO DISK = 'C:\TestBackup_TestBackup.FIL'
GO
Create a filegroup backupThis command uses the "WITH FILEGROUP" option to specify a filegroup backup.  You need to specify the filegroup name from the database which can be obtained by using the command sp_helpdb 'databaseName', specifying the name of your database.
BACKUP DATABASE TestBackup FILEGROUP = 'ReadOnly' 
TO DISK = 'C:\TestBackup_ReadOnly.FLG'
GO
Create a full backup to multiple disk filesThis command uses the "DISK" option multiple times to write the backup to three equally sized smaller files instead of one large file.
BACKUP DATABASE AdventureWorks 
TO DISK = 'C:\AdventureWorks_1.BAK',
DISK = 'D:\AdventureWorks_2.BAK',
DISK = 'E:\AdventureWorks_3.BAK'
GO
Create a full backup with a passwordThis command creates a backup with a password that will need to be supplied when restoring the database.
BACKUP DATABASE AdventureWorks 
TO DISK = 'C:\AdventureWorks.BAK'
WITH PASSWORD = 'Q!W@E#R$'
GO
Create a full backup with progress statsThis command creates a full backup and also displays the progress of the backup.  The default is to show progress after every 10%.
BACKUP DATABASE AdventureWorks 
TO DISK = 'C:\AdventureWorks.BAK'
WITH STATS
GO
Here is another option showing stats after every 1%.
BACKUP DATABASE AdventureWorks 
TO DISK = 'C:\AdventureWorks.BAK'
WITH STATS = 1
GO
Create a backup and give it a descriptionThis command uses the description option to give the backup a name.  This can later be used with some of the restore commands to see what is contained with the backup.  The maximum size is 255 characters.
BACKUP DATABASE AdventureWorks 
TO DISK = 'C:\AdventureWorks.BAK'
WITH DESCRIPTION = 'Full backup for AdventureWorks'
GO
Create a mirrored backupThis option allows you to create multiple copies of the backups, preferably to different locations.
BACKUP DATABASE AdventureWorks 
TO DISK = 'C:\AdventureWorks.BAK'
MIRROR TO DISK =  'D:\AdventureWorks_mirror.BAK'
WITH FORMAT
GO
Specifying multiple optionsThis next example shows how you can use multiple options at the same time.
BACKUP DATABASE AdventureWorks 
TO DISK = 'C:\AdventureWorks.BAK'
MIRROR TO DISK =  'D:\AdventureWorks_mirror.BAK'
WITH FORMAT, STATS, PASSWORD = 'Q!W@E#R$'
GO