Control and data flow basics
This purpose of this example is to give you a brief overview of the basic concepts of ETLBox. It demonstrates the basic idea of a very simple data flow and shows how to use the ControlFlow objects to manage your database metadata.
Prerequisites
This example is written in C#, based on the current .NET Core. You can use your IDE of your choice - most will probably go with Visual Studio or Visual Studio Code. The database used in this example is Sql Server. You can either you set up a standalone installation of Sql Server, or use a docker image. Also, Azure Sql could be an alternative which is easy to setup. To access your database, Azure Data Studio or Sql Server Management Studio is recommended.
Setting up a docker container
If you want to use docker to set up a sql server database, please install Docker for your OS first. Then you can start a docker image running sql server on ubuntu. Simply run the following command line statement in a command line tool:
With the command docker ps
you can see the container is up and running.
Setting up your project
Now we need to create a new dotnet core console application. You can do this either with your IDE or just execute the following command:
dotnet new console
If this doesn’t work, make sure you have the latest version of .NET Core installed.
Now, add the current version of ETLBox as a package to your project.
Now you will be able to use the full set of tools coming with ETLBox
Start coding
Now open your project and go into the main method of your program.
First, define a connection manager for Sql Server that holds the connection string
No you can use the CreateDatabaseTask
to create a new Database.
Also we would like to change the connection to the database we just created and
create a table in there using the CreateTableTask
.
Adding logging
Before we test our demo project, we want to have some logging output displayed. ETLBox logging is build on nlog. First you need to add NLog.Extensions.Logging as a package reference to your project.
Then Add the following lines as nlog.config to your project root. Make sure it is copied into the output directory.
Now we need to create a logger instance, and then assign it to the static property Settings.LogInstance
. Here is the code to create a NLog instance using the LoggerFactory
:
Running the project
Now build and run the project.
A terminal window will pop up and display the logging output. As the logging level is set to debug, you will see all SQL code which is executed against the database. Check if the database and the table was created.
A simple etl pipeline
Next we want to create a simple etl pipeline.
First we create a demo csv file named input.csv
.
The input file contains header information and some value.
Also we need to copy it into the output directory.
Now we create a CsvSource pointing to the newly created input file.
Before we continue, we will need an object that can hold our data. Let’s call it MyData.
Now we add a row transformation. The row transformation will receive a string array from the source and transform it in our Mydata object.
Actually, this transformation wouldn’t been necessary - the CsvSource could have automatically converted the
incoming data into the MyData
object. But it shows how a transformation can be used to execute any
C# code you like.
Next we add a database destination pointing to our table.
Now we need to link the components of our dataflow.
After linking the components, we want to have the source reading the input data. The destination should wait until it received all data.
Finally, we check if the data was successfully loaded into the table and write it into the console output. We use the SQLTask for this and write the result into the output.
Run again
Let’s run the project again and see the output.
You’ll see that the data was successfully copied into the database table.