From files to databases
Sometimes you are in need to transfer data across databases on different server or to integrate flat files. ETLBox is an excellent tool of choice for integrating data from different sources. This example will guide you through the most common scenarios.
Prerequisites
The example code will not go into the details of the sql code to create the necessary table or files. It will focus on how to move the data from the source to the destinations.
Import or exporting CSV files
The easiest (and yet very powerful) example is to import or export data from a database table from/into a csv file.
Let’s start with the import of a csv file.
CSV Import
First, we need to create a connection manager which describe the connection to our database. In this example, we will connect to a Postgres database - but this will also work with any other supported database.
No we need to create the components for the CSV source and the database destination.
No we link the components together and execute the data flow.
In our demo we start the data flow with Execute()
- this will read all data from the source
and post it into the dataflow. While reading data, incoming batches will already be written into the destination. The program won’t continue until the flow has finished (though most of the data processing will be done in a separate task).
If you know how to deal with the Tasks Parallel library, you can use ExecuteAsync()
and instead.
Then you could start your network with
This example only works if you have a csv file named NameList.csv
copied into your output directory. It should
look like this:
Finally, here is the code to create the necessary table on your database of choice. I used the CreateTableTask
which offers a database neutral way to create a table (this will work on SqlServer, Postgres, MySql, SQLite …)
But of course you can always use regular Sql or other frameworks like Entitiy Framework…
Note
Of course, execute the create table code before you run the dataflow…
Running this example will import the csv file into the Postgres table. It should look like this:
Id | FirstName | LastName |
---|---|---|
1 | Bunny | Bugs |
2 | Tweety | |
3 | Devil | Tasmanian |
4 | Duck | Daffy |
6 | Porky Pig | NULL |
7 | Yosemite | Sam |
8 | Fudd | Elmer |
Export CSV
No we want to export the data again, which is just the other way round.
But to spice this example a little bit up, we now add an object type that hold the data during the processing. You don’t need to specify the object type as shown above - in this case, ETLBox will use a string array instead. But if you are in the need to work with your data, e.g. by add a transformation in the flow, it is easier if you have defined a class that specifies your data type.
No we create the components for the export, link them together and start the data flow.
You will find a file called Export.csv
in your output directory. As you have
perhaps noted, we changed the default csv separator from “,” to “;” by changing the configuration.
The csv will look like this:
Note
Transfer across databases
Sometimes you just want to transfer data across different database. Let’s say we want to take the data
in the table NameTable
(which we just used for our csv import) and transfer it into a Sql Server database. (It
could have be any other database, or another Postgres database on another server).
And to make it a little bit more complicated, the destination table on Sql Server looks a little bit different - it will also have an Id column (again as an Identity), but instead of FirstName and LastName it does only have a FullName column which is obviously a combination of first and last name.
First let’s create the destination table on Sql Server with a CreateTableTask (or do this manually): To do so, we will also need a new connection manager that can connect with Sql Server:
Now we extend our NameListElement
by a new Property that will store the FullName:
With this preparations, we can now create the data flow. This time, we will add a RowTransformation
in between
the source and the destination.
The RowTransformation has a function that concatenates first and last name and writes it into the property
FullName
. Actually, this transformation wouldn’t been necessary for this example - instead, you could have
defined the property with public string FullName => LastName + "," + FirstName;
in your NameListElement
object. But this way this example demonstrated the power of the RowTransformation a little bit.
Finally, we execute the data flow and wait for the completion.
The destination table on Sql Server should now look like this:
Id | FullName |
---|---|
1 | Bugs,Bunny |
2 | Tweety, |
3 | Tasmanian,Devil |
4 | Daffy,Duck |
5 | ,Sylvester |
6 | ,Porky Pig |
7 | Sam,Yosemite |
8 | Elmer,Fudd |
You have successfully transfered table data from a Postgres database into a Sql Server database and transformed it on-the-fly. This is one simple example of an ETL process (Extract, Transform, Load).