Connection Managers
There a numerous database connector packages that come with ETLBox. Add the connector package for the database that you want to connect with. So if you want to connect with a SqlServer, add the ETLBox.SqlServer pacakge to your project. For MySql, use ETLBox.MySql.
Database integration
Relational databases can be accessed with the DbSource and DbDestination. How to use the connectors will be explained in the next chapter Relational databases. But before you can use them, you need the right connection manager for you database. For Sql Server you would the SqlConnectionManager, for Postgres the PostgresConnectionManager, for MySql the MySqlConnectionManager, and so on. You get the idea.
NoSql databases are also supported. As they are based on different concepts than relation databases (e.g. document storage, key-value store or graph database) the connection and features to interact with NoSql databases can totally different to other databases.
Supported relational databases
These relational database are currently fully supported with ETLBox:
Database | Native support | Odbc support | Remarks |
---|---|---|---|
SQL Server | ✓ | ✓ | + OleDb support |
Postgres | ✓ | ✓ | |
SQLite | ✓ | ✓ | |
Oracle | ✓ | ✓ | |
MySql | ✓ | ✓ | |
MariaDB | ✓ | ✓ | |
Db2 | ✓ | X | Luw + z/OS + Cloud |
SAP ASE (Sybase) | ✓ | ✓ | |
SAP HANA | ✓ | ✓ | |
Snowflake | ✓ | ✓ | |
Microsoft Access | X | ✓ | |
Apache Spark (SQL) | X | ✓ | |
Other databases | X | * | Generic ODBC support |
Custom connector | ** | ** | Using own implementation |
*: There is a limited support for other databases as well - you can use the generic Odbc or OleDb connection manager to access these databases. Please note that using these connections comes with some major limitations.
**: Using a CustomSource
/CustomDestination
or CustomBatchSource
/CustomBatchDestination
, you can create your own connectors using your own implementation logic.
Connection manager
Connection strings
To connect to your database of choice, you will need a string that contains all information needed to connect to your database (e.g., the network address of the database, user name and password). The specific connection string syntax for each provider is defined by the ADO.NET framework. If you need assistance to create such a connection string, have a look at connectionstrings.com that provide example strings for almost every database.
Choosing the right manager
Some components (e.g. the DbSource
or the SqlTask
) can be used to connect via ADO.NET to a database server. They will need a ConnectionManager
class to connect with the right database. The easiest way to create a connection manager is to create a new instance while providing the connection string for your database.
Here is an example creating a connection manager for Sql Server:
For other databases the code looks very similar. Please be aware that the connection string might look different. Here are some more examples:
Default ConnectionManager
Every component or task related to a database operation needs to have a connection managers set in order
to connect to the right database. Sometimes it can be cumbersome to pass the same connection manager over and over
again. To avoid this, there is a static ControlFlow
class that contains the property DefaultDbConnection
.
If you define a connection manager here, this will always be used as a fallback value if no other connection manager property was defined.
Connection string wrapper
When you create a new connection manager, you have the choice to either pass the connection string directly or you
create an adequate ConnectionString object from the connection string before you pass it to the connection manager.
The ConnectionString object does exist for every database type (e.g. for MySql it is MySqlConnectionString
). The ConnectionString
wraps the raw database connection string into the appropriate ConnectionStringBuilder object and also offers some more
functionalities, e.g. like getting a connection string for the database storing system information.
Odbc & OleDb Connections
The DbSource
and DbDestination
also work with Odbc connection. If you want to connect via Odbc or OleDb, you need to have a proper driver installed on the machine where ETLBox is executing. There is a difference between the “generic” Odbc/OleDb driver and the specialised Odbc driver. In short: if you want to connect to a database that is fully supported by ETLBox, use the Odbc/OleDb driver that has the database name in it.
Here is an example how you can connect to Postgres via ODBC:
Warning: ODBC does not support bulk inserts like in “native” connections.
The DbDestination
will do a bulk insert by creating a sql insert statement that
has multiple values: INSERT INTO (..) VALUES (..),(..),(..)
Generic Odbc and OleDb conenctions
As ETLBox has some database specific code in different components, you normally would choose an Odbc or OleDb connector that fits to your database. But if your database is not fully supported yet, you can try to use the generic Odbc or OleDb connection manager. Make sure you reference the Odbc or OleDb connector package.
Unfortunately, this connector will have some limitations.
- if you use DbDestination or DbSource with this connector, you would need to pass a TableDefinition object. This object basically holds the table name and column names, because they can’t be automatically extracted from the database.
- most ControlFlow components won’t work (e.g. IfTableExistsTask). But SqlTask will be likely to work with the generic connection manager
- if you use special characters in your table or columns names, you need to set the quotation begin / quotation end properties QB & QE that fit to your database (e.g. “[” and “]” for Sql Server or “`” for MySql)
E.g. you can use the generic OleDb connection manager to connect with Sql Server via OleDb
You will need an OleDb connection string,.
You can also use this connection manager with the DbSource or the DbDestination component. But please note that this will only work if you pass the TableDefinition manually, like this:
Same for DbDestination - the property name is also TableDefinition
there.
Connection management
Connection pooling
The implementation of all connection managers is based on Microsoft ADO.NET and makes use of the underlying connection pooling. Please see here for more details of connection pooling. This means that this actually can increase your performance, and in most scenarios you never have more connections open that you actually need for your application.
You don’t need to explicitly open a connection. ETLBox will call the Open()
method on a connection manager whenever
needed - where it relies on the underlying ADO.NET connection pooling that either creates a new connection
or re-uses an existing one. Whenever the work of a component or task is done, the connection manager will return the connection back to
the pool so that it can be reused by other components or tasks when needed.
Please note that the connection pooling only works for the same connection strings. For every connection string that differs there is going to be a separate pool
Reusing connection managers
By default, you can always reuse a connection manager as often as you like. ETLBox will always create a clone of the provided connection manager object, and in combination with the ADO.NET connection pooling it will be automatically decided if a new connection needs to be created or if an existing connection can be reused.
You can modify this behaviour either by setting the LeaveOpen
property to true or by starting a transaction.
LeaveOpen
This behaviour - returning connections back to the pool when the work is done - does work very well in a scenario with concurrent tasks. There may be a use-case where you don’t won’t to query your database in parallel and you want to leave the connection open, avoiding the pooling.
For this scenario you can use the LeaveOpen
property on the connection managers.
Transactions
Each connection manager allows you to begin, commit or rollback a transaction. Usage is quite simple
BeginTransaction
let you optionally define the isolation level of the transaction. Most databases are using an isolation level comparable to Serializable
or Snapshot
as the default level. If you define a different transaction level for a transaction, make sure that your database does support this level.
If you are starting a transaction, the connection will stay open and connected with your database during the lifetime of the transaction.
When starting a Transaction, the property LeaveOpen
will be set implicitly to true, indicating that a transaction has been started and the connection needs to stay alive until the connection is either commit or rolled back. When using a transaction, the connection can’t be returned to the connection pool while the transaction is still alive.
Note
Avoiding conflicts when using transactions
Here is a simple example where you want to use 2 connection managers: If you have a database table where you want to insert data within a transaction scope, while you want to read data from the same database, you will need two connection managers to avoid conflicts:
In ETLBox, a connection manager is a wrapper for the underlying ADO.NET connection. If there would be no transaction open, this code would work without issues. The reason for this is that the ADO.NET connection pooling would open up two connections, one for reading and one for writing. In the moment where you start a transaction, the connection pooling is “deactivated” - the same connection is used for all operations.
Using Multiple Active Result Set
Some databases (e.g. SQLServer) have a feature called MARS
(Multiple Actdive Result Set. If this is enabled, you can still use the same connection for reading/writing at the same time. For SQLServer, MARS
can be enabled when connecting:
But there are also other ways to enable this feature, depending on this database. With MARS
enabled, you might be able to read and write data using the same connection with a transaction open.