Control Flow Overview
ControlFlow Tasks are an easy way to run database independent sql code or to avoid the boilerplate code when you just want to execute a simple statement.
Database and logging specific tasks
Control Flow task can be split in general tasks and logging tasks. General tasks allow you to create or drop objects or to run some particular predefined statements that will execute on any database. Logging tasks are useful helper when you need to create some tables for logging or to retrieve the whole log as Json.
Control Flow Tasks reside in the ETLBox.ControlFlow
namespace - tasks for logging in the ETLBox.Logging
namespace.
General idea behind Control Flow Task
Control Flow Tasks are a set of tasks to manage, alter or query a database. With one single line of code you will be able to create a table or fire some sql on your database - you write your code in C#, and in the background database code for your specific database is generated.
ControlFlow Tasks do not strive to be an replacement for an ORM (Object relation mapper), nor is it a complete set of tasks for all the operations you actually can do on the database. They were created as useful helper to create Tests for ETL pipelines - they can help you to (re)create databases, drop tables (if they exists) or to create a test environment on your database without writing to much database specific sql code.
Avoiding boilerplate code
If you have ever wrote some ADO.NET code to simply execute some Sql, you probably found that there is some boilerplate code you have to write over and over again. The idea behind some of the Control Flow Tasks is that you don’t have to write the same code again and again, e.g. just for doing something trivial like opening up a connection and counting the rows in table. This should be doable with only one line of .NET/C# code, and it should work independently on every database.
RowCount with ADO.NET
For instance, the code for establishing a connection and doing a simple row count on a table with a classic ADO.NET connection would look like this:
RowCount with Control Flow Tasks
Now let’s have a look how to do a row count with the Control Flow Tasks library. First, we need to setup a connection manager in order to connect with database. This is the same connection manager as used within the data flow part.
Now you can use a RowCountTask
to query the number of rows within a table with only one line.
Optionally, you can set up a default connection that is used every time you don’t provide a connection manager.
Simple set the property DefaultDbConnection
on the static Settings
class.
Now a RowCount is as simple as this:
Internally, an ADO.NET connection is opened up (the default ADO.NET connection pooling is used)
and a select count(*) from demotable
is executed on the database. The result is returned from the RowCountTask.
Using the instances
For every Control Flow Tasks, there are static accessors to simplify the use of the tasks. In order to have access to all functionalities of a task, sometime you have to create a new instance of the task object.
If you want to do a row count with an instance instead of the static accessors, it would look like this:
So whenever you don’t find a static accessor satisfying your needs, try to create an instance and check the properties and methods of the object.
Configure a task
But there is more. Let’s assume you want to count the rows on a pretty big table, a “normal” row count perhaps would take some time.
So RowCount has a property called QuickQueryMode
(which will only be quick on SqlServer). If set to true, a sql statement that queries the partition tables is then executed.
This would give you the same result, but instead of doing a select count(*) from dbo.tbl
the following sql is fired on the database
Note
ConnectionManager
All examples for the ControlFlow that access databases have a ConnectionManager
object passed - this is the actual connection manager for your database.
E.g. for Sql Server you would create it with
or for MySql with
Default ConnectionManager
Every component or task related to a database operation needs to have a connection managers set in order
to connect to the right database. Sometimes it can be cumbersome to pass the same connection manager over and over
again. To avoid this, there is a static Settings
class that contains the property DefaultDbConnection
.
If you define a connection manager here, this will always be used as a fallback value if no other connection manager property was defined.