SSIS Tasks Explained
Data Flow Task
•Data Flow Task: This task extracts data from a source, allows for transformations of that data, and then the data is loaded into a target data destination.
Data Preparation Tasks
•File System Task: This task allows the user to copy/move/delete files and directories on a file system.
•FTP Task: This task allows the user to copy/move/delete files and directories over FTP.
•Web Service Task: This task allows the user to execute a Web service method and store the results
•XML Task: This task is used to work with XML data. XSLT can be used along with XPath to validate, compare and merge documents. The results of this can then be stored.
•Data Profiling Task: This task can be used for checking and validating data quality. Profiles can be set up and checked for varius data quality issues such as, column length issues, column patterns, column statics, etc.
Workflow Tasks
•Execute Package Task: This task will run other SQL Server Integration Services packages.
•Execute Process Task: This task will execute an application or batch file.
•Message Queue Task: This task allows you to send and receive messages between SSIS packages, or to send messages to an application via an application queue. This task uses Message Queuing (MSMQ)
•Send Mail Task: This task allows for email messages to be created and sent using an SMTP server.
•WMI Data Reader Task: This task allows a Package to query, using WQL, computer systems (local and remote) for information regarding that computer.
•WMI Event Watcher Task: This task watches for WMI events that have occurred on a computer system, and allows the package to take an action if certain criteria are met.
Scripting Tasks
•Script Task: This task can be used to program functions that are not available in the standard SSIS tasks or transformations. In SSIS 2005 this task can be programmed in VB .NET. In SSIS 2008 VB .NET and C# can be used to program a Script Task.
SQL Server Maintenance Tasks
•Back Up Database Task: This task will allow you to backup a one or many SQL Server databases.
•Check Database Integrity Task: This task will allow you to check the integrity of all the objects in one or many SQL Server databases.
•Execute SQL Server Agent Job Task: This task allows for the execution of a SQL Server Agent job.
•Execute T-SQL Statement Task: This task is similar to the Execute SQL Task, however it only supports Transact SQL Statements. It should be used for SQL Server specific SQL statements.
•History Cleanup Task: This task allows for the cleanup of historical activity data. It will cleanup the history for database maintenance plans, backup activites, restore activities and SQL Server agent jobs.
•Maintenance Cleanup Task: This task allows for the cleanup of backup files, and the reports of maintenance plans.
•Notify Operator Task: This task allows SSIS to notify SQL Server Agent operators. They can be notifies by email, pager, or netsend.
•Rebuild Index Task: This task will rebuild an index or indexes on one or many databases.
•Reorganize Index Task: This task will reorganize an index or indexes on one or many databases.
•Shrink Database Task: This task will shrink the size of the SQL Server database data and database log files.
•Update Statistics Task: This task will update the statistics for one of many tables in one or many databases.
SQL Server Tasks
•Bulk Insert Task: This task offers an efficient way to copy large volumes of data.
•Execute SQL Task: This task allows the execution of a SQL statement. If the statement returns results, they can be stored in a variable.
•Transfer Database Task: This task will copy or move a SQL Server database between two instances of SQL Server. It can even be used to make a copy of a database on the same server. Databases can be copied either online or offline.
•Transfer Error Messages Task: This task will transfer a single or multiple SQL Server user defined error messages between SQL Server instances. It can be setup to transfer specific user messages or all error messages.
•Transfer Jobs Task: This task will transfer a single or multiple SQL Server Agent jobs between SQL Server instances.
•Transfer Logins Task: This task will transfer a single or multiple SQL Server logins between SQL Server instances.
•Transfer Master Stored Procedures Task: This task will transfer a single or multiple SQL Server Master database stored procedures between SQL Server instances.
•Transfer SQL Server Objects Task: This task will transfer a single or multiple SQL Server database objects between SQL Server instances. Most of SQL Servers DDL objects can be copied with this task.
Analysis Services Tasks
•Analysis Services Execute DDL Task: This task will run data definition language statements on Analysis Services. This allows for the create, drop, alter of cubes, dimensions and mining models.
•Analysis Services Processing Task: This task will process Analysis Services Cubes, Dimensions, and Mining Models.
•Data Mining Query Task: This task will run a DMX (Data Mining Extensions) query that create a prediction based on new data that is run against a Analysis Services data mining model.
Posted by Ravi at 5:59 AM 0 comments
Email This BlogThis! Share to Twitter Share to Facebook Share to Google Buzz
Thursday, August 26, 2010
Checkpoints,ErrorHandiling, and Transactions in SSIS
Error Handling:
To use error paths, you will need to configure the error output. There are three error-handling options for handling errors in the data flow components:
• Setting the error output to Fail Transformation will cause the data flow to fail if an error is encountered.
• Using the Ignore Failure option will allow the row to continue out the normal green data path, but the value that resulted in the error is changed to a NULL in the output.
• Setting the error output to Redirect Row will send the error row out the red error path; this is the only way to handle errors with separate components.
Check Points:
What does it do?
With Checkpoints enabled on a package it will save the state of the package as it moves through each step or task and place it in a XML file upon failure of the package. If your package does fail you can correct the problem in your package and rerun from the point of the tasks that did not successfully run the first time. Once the package completes successfully the file is no longer needed and automatically discarded.
How does this benefit you?
Just imagine your package is loading a table with 10 million records. Your package passes the Data Flow that performs this huge load without any problem (Other than the fact that it took two hours to load). The next task in your package is a Send Mail Task and for some reason fails.
You correct the problem in the Send Mail Task, but without using Checkpoints your package would still have to run that Data Flow that loads the 10 million records again (taking another two hours) even though you’ve already done it once. If you had enable Checkpoints on this package you could simply correct the problem in the Send Mail Task and then run the package again starting at the Send Mail Task. Sounds great right?
How do I configure it?
This example will run you through very basic package using Checkpoints. To see these instructions with screenshots go to my regular blog http://blogs.pragmaticworks.com/devin_knight/2009/06/enabling-checkpoints-in-your-ssis-packages.html
Example Overview
• Use Three Execute SQL Task using the AdventureWorks2009 (It can really be any database for this example) database as a connection manager.
• Configure the package to handle Checkpoints
• Configure the individual tasks to handle Checkpoints
Step 1: Configure Execute SQL Tasks
• Drag three Execute SQL Tasks on your Control Flow.
• Use any database for the Connection property on all three tasks
• Configure Execute SQL Task SQLStatement property: Select 1
• Configure Execute SQL Task 1 SQLStatement property: Select A (Set to intentionally fail)
• Configure Execute SQL Task 2 SQLStatement property: Select 1
Step 2: Configure Package to enable Checkpoints
• Open the properties menu at the package level (Just open properties in the Control Flow without any task or connection manager selected)
• Change the properties CheckpointFileName: c:\Checkpoint.xml (Feel free to use the .txt extension when naming the checkpoint if you want to open it in notepad and look at it!)
• Change the properties CheckpointUsage: IfExists
• Change the properties SaveCheckpoints: True
Step 3: Configure Each Task
• Select each task individually and open the properties menu at the task level (Just click the task once then hit F4)
• Change the FailPackageOnFailure property to True
Step 4: Run the Package
• Run the package and you will see the package fail on the second task
• This also created the file c:\Checkpoints.xml. Feel free to open it and take a look! I use the tool XML Notepad to view XML Files. It’s Free.
• You could also save this file with the.txt extension and just view in regular notepad and it still works as a Checkpoint.
• If you run the package a second time it will skip the first task that was successful and start right at the second task
Step 5: Correct the Problem and Rerun Package
• Open the Execute SQL Task 2 and configure the SQLStatement property: Select 1
• The package has now completed and skipped the first step which already succeeded. Imagine if that first step would normally take two hours to run!
Defining Package and Task Transaction Settings
You can set package transactions at the entire package level or at any control flow container level or task level. Transactions in SSIS use the Windows Distributed Transaction Coordinator (DTC); the DTC service needs to be started on the machine for transactions to work. Any service or program that is enabled to work with the DTC can be part of a transaction in SSIS.
To enable a transaction within a package, you need to set the TransactionOption property of the task or container to Required. Figure 5-1 highlights the properties of a package at the control flow level, which means they apply to the package as a whole. The TransactionOption property is the same on any control flow object.
Figure:1 A task's or container's TransactionOption property must be set to Required to enable a transaction within a package.
When deciding whether and how to implement a transaction, follow these guidelines:
• For transactions to be enabled in SSIS, you need to turn on the DTC service, and the tasks that you want to be part of the transaction must work with the DTC service natively.
• If a series of tasks must be completed as a single unit, in which either all the tasks are successful and committed or an error occurs and none of the tasks are committed, then place the tasks within a Sequence Container, and then set the TransactionOption property of the container to Required.
• A task can inherit the transaction setting of its parent when the TransactionOption property is set to Supported, which is the default setting when creating a task or container.
• You can prevent a task from participating in a transaction by setting its TransactionOp-tion setting to NotSupported.
• Transactions work at the control flow level and not within a data flow. This means that you can turn on a transaction for a data flow task, but you cannot turn it on separately for selected components within the data flow; either the entire data process will be successful or it will be rolled back.
Implementing Restartability Checkpoints
At times, especially if you are working with complicated or long-running packages, you will want the ability to restart a package if it fails and have it start at the point of failure. In other words, you might not want the tasks that were already successful to be run again if the package is restarted. This can be done by enabling checkpoints in the package.
Enabling restartability within a package requires, first, enabling a package to use checkpoints, and, second, setting the specific tasks and containers to write checkpoints. To turn on checkpoints within a package, follow these steps:
1. Within the package, open the Properties window, and then select the Control Flow tab, which will reveal the properties of the package.
2. Set the SaveCheckpoints property at the package level to True. This allows checkpoints to be saved during package execution.
3. In the CheckpointFileName property, provide a valid path and file name to a checkpoint file. Packages use files to maintain their state information, so if a package fails and is then restarted, the package can read the checkpoint file to determine where it left off and to track the state information at the last successful task.
4. Set the CheckpointUsage to IfExists, which causes the package to run from the beginning if the file is not present or to run from the identified point if the file exists.
Practice: Implementing Package and Task Transactions
In this practice, you will turn on transactions at the container level and observe the results when a task fails within the container.
Exercise 1: Enabling Transactions
1. Navigate to the Control Panel/Administrative Tools/Services console, and then start the Distributed Transaction Coordinator service.
2. Open the SSIS project that you created in Chapter 4, and then open the package called MyPackage.dtsx. Or you can use the 'Start Here' project in the Source\Ch 05 folder in the installed practice files.
3. From the View menu in the menu bar, open the Properties window, and then click the pushpin in the Properties window to lock the window in the open position.
4. In the Control Flow Designer, select the Sequence Container (by clicking on it), and then note the Sequence Container properties listed in the Properties window.
5. Set the TransactionOption to Required using the drop-down list.
6. Save the package by clicking the Save icon in the toolbar.
Exercise 2: Observing a Transaction Rollback
1. With the package from Practice 1 still opened, expand the Sequence Container, and drag and drop a new Execute SQL Task to the bottom of the Sequence Container workspace.
2. Connect the green precedence arrow from the Data Flow to the new Execute SQL Task by dragging the green arrow from the bottom of the Data Flow onto the Execute SQL Task.
3. Edit the Execute SQL Task by double-clicking on the task. In the Execute SQL Task Editor, change the Connection property to the AdventureWorks connection.
4. Change the Name property within the Execute SQL Task Editor to Force Failure.
5. Select OK in the Execute SQL Task Editor to return to the Control Flow.
6. With the Force Failure Execute SQL Task still selected, open the Properties window and change the ForceExecutionResult property to Failure. By setting this property to Failure, you are specifying that the task should fail intentionally, which is something you might do for testing purposes.
7. Open a new database query, in SSMS, against the AdventureWorks database.
8. Run the following SQL statement and observe the results: SELECT COUNT(*) FROM Sales_Summary
9. In the Business Intelligence Development Studio (BIDS), execute the SSIS package you just modified, which will intentionally fail at the last step.
10. Stop the package execution, and rerun the query from step 8.
11. Observe that even though the data flow was successful, the data was rolled back because the Sequence Container was configured with a transaction, and the last task within the Sequence Container failed.
Quick Check
1. You add a sequence container to a package that contains several tasks, one of which calls a command on a legacy system and another of which a Data Flow Task imports data into SQL Server. Even with DTC started and transactions turned on, your sequence container fails before the tasks even run. What is the problem?
2. What do you need to set in order to use checkpoint properties at the package level, after you have turned the checkpoint properties on?
Quick Check Answers
1. The transactions featured in SSIS use the DTC service. However, not all systems support DTC, and a transaction cannot be forced on a non-compliant system, so the container will fail. You should remove the legacy task from the sequence container that has the transaction.
2. You need to set the FailPackageOnFailure property to True for tasks to write to the checkpoint file. However, if you want to rerun any successful tasks that occur before the failure task, you need to use a sequence container around the group of related tasks that require transactions.
Handling Package Errors with Event Handlers
In the data flow, using data viewers provides the ability to easily debug problems while processing data. The control flow, however, is different because the focus is on workflow and execution rather than on data and transformations. Leveraging the capabilities in Visual Studio, the control flow supports visual debugging and breakpoint features. Let's first look at the event handlers that SSIS provides; then we will explore the debugging capabilities in the control flow.
Event Handlers
SSIS provides the ability to listen for certain execution events and perform other operations when an event happens (depending on the execution event). For example, if an error happens, the error event handler can send an alert or potentially fix a data problem. Event handlers use the control flow paradigm for workflow processing, which includes all the same control flow tasks and containers that are found in the toolbox of the control flow.
You can define zero, one, or more than one event handler for a package. To add an event handler to a package, you need to select the Event Handler tab in the package designer. Creating a new package event handler requires that you select the executable and the event handler event, as Figure 2 shows.
Figure:2 Selecting the executable and event handler event for a package
The executable is the task or container scope that the event will fire. You can also choose the package itself (the highest-level container) as the executable for an event. The event handler event is the actual event that causes the event workflow to execute. The following table includes the package event handler types:
In addition, event handlers assigned to an executable scope will propagate down to child events when the event fires. If an event is assigned to a container, the child executables include the tasks and containers that are embedded within the parent container. This means that if you assign an OnError event to the package and an OnError event occurs at a task, the event handler would fire for both the task and the package (and for any containers in between). You would use an event handler for tracking error details, for sending failure messages in emails, and for implementing manual rollback logic.
BEST PRACTICES Capturing error information with the OnError event
Each package contains a set of system variables that are updated for the various levels in the package during the package execution. With event handlers, you can capture these variables and values, which provide contextual information, such as the ErrorCode, ErrorDescription, and SourceName (the task), when the event fires.
Using event handlers is a great way to track package execution; they can be used to audit the execution, capturing the errors that occur in a task. In addition, the event handler Send Mail Task can be used for notification; for example, it can notify an administrator of a certain predefined condition that requires a special response.
Exam Tip
Event handlers can be turned off for any task or container by setting the Disable-EventHandlers property of the Task or Container to True. In other words, if you have an event handler defined, but you specifically do not want it to be invoked for a specific task, then you can turn off event handlers for that task only.
Debugging the Control Flow with Breakpoints
Package debugging lets you know what is going on during the execution of a package in the designer so that you can troubleshoot or validate processing logic. Control flow debugging involves setting breakpoints in the package, which will pause the control flow execution so that you can observe the execution state. SSIS takes advantage of the breakpoint functionality that comes with Visual Studio, which means you have the capabilities to view execution information about the package when you execute a package in the designer.
BEST PRACTICES Breakpoints work in control flow only
Breakpoints function in the control flow but not in the data flow. For scripting, this means that you can set breakpoints only in a control flow Script Task and not in a data flow Script Component.
To set a breakpoint, highlight the task or container, and either press F9 or navigate to the Debug/Toggle Breakpoint menu. You can set multiple breakpoints in a package, and you can embed a breakpoint within a Script Task at a line of code. Figure:3 shows a package that is running but is paused at execution.
Figure:3 A package that is running but paused
In this screen, the arrow next to the breakpoint icon indicates which task the package is currently waiting to run. When you are paused in the debug environment, you can do the following things to help troubleshoot your package:
• Open the Locals window to see all the variable values and the package status. You can find this window in the Debug toolbar, next to the package execution selections. If you have several variables in a package that you actively use to control logic and precedence, you can use a breakpoint to pause the execution, allowing you to troubleshoot variable values handling before the package execution completes.
• When you have completed working during a pause and are in a Script Task, you can continue the execution of the script and package to the next breakpoint by either clicking the Continue button in the Debug toolbar or by pressing F5. Alternatively, you can stop the package from continuing by clicking the Stop button on the toolbar. Breakpoints in a Script Task are very useful because they help you validate the code logic and branching that you are performing with the script.
• When the package is paused on a task (as opposed to within the code of a Script Task as described in the prior bullet) in the control flow, you can also continue running the package to completion (or to the next breakpoint) by selecting the Continue button on the Debug toolbar or by pressing F5.
• Stopping the package during execution, whether the package is paused at a breakpoint or executing, is handled by clicking the Stop button in the toolbar or by pressing the Shift and F5 keys at the same time.
In all, breakpoints are powerful tools for resolving errors and validating functionality. Combined with data views in the data flow, they provide comprehensive debugging support during your package development.
Practice: Identifying Data Flow Errors
In this practice, you will add an error flow path and then identify the error by observing the rows in the output through a data viewer.
Exercise: Adding Error Flow Paths
1. In SSMS, create a new database query connected to the AdventureWorks sample database and run the following code:
USE [AdventureWorks] GO
TRUNCATE TABLE dbo.Sales_Summary GO
ALTER TABLE dbo.Sales_Summary WITH CHECK ADD CONSTRAINT [CK_Summary_ProductLine]
CHECK ((upper([ProductLine])='R' OR upper([ProductLine])='M' OR upper([ProductLine])='T')) GO
2. The above code uses the Sales_Summary table that you created in Chapter 4.
3. Open the package that you modified in the previous practices in this chapter.
4. Navigate to the Data Flow designer and open the OLE DB Destination adapter.
5. In the Data Access Mode drop-down list, change the selection to Table Or View, and then verify that the Sales_Summary table is still selected in the Name Of The Table Or The View drop-down list.
6. Click OK in the OLE DB Destination Editor to return to the package designer.
7. Right-click in the Data Flow workspace, and then choose Execute Task from the drop-down list to run the Data Flow Task. Before stopping the package, navigate to the Progress tab and observe that the OLE DB Destination failed because of a constraint violation.
8. Stop the package to return to design mode.
9. From the SSIS menu in the menu bar, select Variables; this will open the Variable window.
10. Select the leftmost icon in the Variable window toolbar to create a new variable for the package and name the variable ErrorCount.
11. Open the toolbar, and then drag a Row Count transformation onto the Data Flow work-space.
12. Highlight the OLE DB Destination adapter, and then drag the red error output path and drop it on the Row Count transformation.
13. When the Configure Error Output window appears, change the value in the Error Column drop-down list to Redirect Row, and then click OK to return to the Data Flow designer.
14. Open the Row Count transformation, and then change the VariableName property to
15. User::ErrorCount.
16. Click OK in the Row Count Editor to return to the Data Flow.
17. Right-click the red error path, and then select Data Viewers from the drop-down list. 16. Choose Add in the Data Flow Path Editor, highlight Grid, and click OK in the Configure
18. Data Viewer window and OK in the Data Flow Path Editor.
19. Right-click in the Data Flow designer workspace, and select Execute Task from the drop-down list.
20. A new Data Viewer will appear, which will reveal the OLE DB Destination Error Output.
21. Observe that the ProductLine column for all the error rows is S, which violates the defined constraint that you created in step 1.
22. Select Detach in the Data Viewer window, and then stop the package execution.
23. In SSMS, run the following database query, which adds S as a valid value of the Pro-ductLine column in the Sales_Summary table:
USE [AdventureWorks] GO
ALTER TABLE dbo.Sales_Summary
DROP CONSTRAINT [CK_Summary_ProductLine] GO
ALTER TABLE dbo.Sales_Summary WITH CHECK ADD CONSTRAINT [CK_Summary_ProductLine]
CHECK ((upper([ProductLine])='R' OR upper([ProductLine])='M' OR upper([ProductLine])='T' OR upper([ProductLine])='S')) GO
24. Return to BIDS and rerun the Data Flow, observing that the OLE DB Destination is now successful, with no rows being routed to the error path output or data viewer.
Quick Check
1. A Data Conversion transformation is failing in the middle of the data flow execution, and you need to determine what is causing the error. How should you proceed?
2. Your package contains a string variable that you are updating, using a Script Task, to be a file path and file name. Your package is failing at a File System Task that is configured to use the variable to move the file to a different folder on the server. How do you troubleshoot the package?
3. You would like to log all the package errors to a custom database table you have created for auditing purposes. How can you accomplish this task?
Quick Check Answers
1. To determine what is causing the error, configure the Data Conversion transformation error path to Flat File so that any rows that are failing conversion are sent out to a file. Then, create a data viewer on the error path, and run the package in BIDS. This technique will capture the errors in a file and display the rows in the designer for troubleshooting.
2. Because the Script Task can contain embedded breakpoints in the code, set a breakpoint in the script so that you will be able to execute the package and step through the lines of code, observing the value of the variable to check the code and accuracy.
3. By using the OnError event handler assigned to the package level, you can also use an Execute SQL Task that calls a stored procedure, passing in the SourceName and ErrorDescription variable values. The procedure can then track these details into a metadata storage table for auditing.
Posted by Ravi at 11:54 PM 0 comments
Email This BlogThis! Share to Twitter Share to Facebook Share to Google Buzz
Thursday, August 19, 2010
Looping over files with the Foreach Loop Container
In SQL Server 2000 Data transformation Services (DTS) it was a bit of a hack to be able to loop over files of a given type in a certain directory and import them into your destination. It involved a lot of "Glue Code" and a certain amount of fooling the package into going back to a previous task because it still had work to do. Well thankfully in SQL Server 2005 Integration Services (SSIS) that has all changed and this article is going to show you how.
The image below shows us how incredibly simple and clean the package will look when finished. There are some things worth pointing out at this stage. In the centre of the screen we see the Foreach Enumerator container and inside that we see the Data Flow task which houses the pipeline. At the bottom in the Connection Managers tray we see our Flat File Connection Manager (My Source File) and our OLEDB Connection Manager (My Destination). The Flat File Connection Manager is the one in which we are most interested for this article. Both of these managers are used in the Data Flow behind the DataFlow task. We will not be detailing the pipeline behind the DataFlow task in this article but it consists of a Flat File Source moving data to an OLEDB destination.
Let's begin then by opening up the Foreach enumerator and moving straight to the Collection node in the tree on our left. Below we see our information already populated.
What we see on the screen is pretty self explanatory but let's go through it anyway. We have chosen to enumerate over a file collection and ths is indicated by the value next to the Enumerator property at the top. We need to specify a folder over which to loop and for which type of files to look and we do that in the centre of the form. We are given three options as to what is returned when the loop finds a file in the folder at the bottom of the form. We can return the whole filename including extension and path, the name and extension or simply the name of the file found. Because our connection manager is going to need to know exactly where to find the file and it's name we have hosen the first option. The final thing we see on this screen is the ability to traverse subfolders. In our example we do not need to do this.
When the Foreach enumerator finds a file it needs to tell us about what it found and it does this by populating a variable. Click on to the Variable Mappings node now. Our package currently has no variables able to accept the name of the file so we are going to create a new one.
The next screen we see allows us to set the values of the variable. As we can see variables can be scoped in SSIS to certain executables in the package or to the package itself.
Here is how our variable looks with all its properties set.
Because the enumerator will only return us at most one value on every iteration we map our variable to an index of 0.
We have now configured everything as far as the Foreach enumerator is concerned. We now need to set the rpoerties of the Flat File Connection Manager. Highlight the manager in the tray at the bottom, right click and choose properties.
The important part of this dialog is highlighted and that is "Expressions". Click on the ellipses and we will be taken through to the next screen where we can start to create the expression. In the screen that follows, from the Property column drop the list down and choose ConnectionString
Now hit the ellises button to the right and we are taken through to the expression editor where we will build the actual expression itself
Our requirements are pretty simple here and all we want to do is to retrieve the variable we defined earlier. To do this simply drag the variable from the list at the top to the expression text box at the bottom. Property Expressions can become very complex and we shall no dount be seeing more of them in future articles. After you have chosen the variable click OK
We now see that our expression is mapped to our ConnectionString property. Click OK
Finally we can now see our File Manager's Connection string property being mapped to an expression in the properties of the manager.
When the enumerator finds a file matching our requirements it will set the correct property on the connection manager and this will be used by the pipeline at runtime.
No comments:
Post a Comment