In Part 1, we delved into ETLBox, a robust .NET tool for data management. We explored its features, such as its straightforward API, versatility in handling transformations, and efficiency with large data volumes. Using ETLBox, we set up a data flow to process a CSV file of supplier data, emphasizing data validation and error management.
In Part 2, we enhanced the data flow, focusing on adding custom error messages. We also addressed the challenge of handling repeated or different supplier data loads. To capture these changes effectively, we implemented a Slowly Changing Dimension Type 2 (SCD Type 2).
Looking ahead to Part 3, we are going to convert the same data flow away from static types into a dynamic and configurable one
Traditionally, C# has been anchored to the realm of static typing. While this provides a robust structure and type safety, it can sometimes be a constraint, especially when dealing with flexible and ever-evolving data structures in the world of ETL.
Enter the realm of ExpandoObject and the dynamic keyword. These tools break free from the rigidity of static types, offering a more fluid data handling experience. Just because C# is statically typed doesn’t mean we can’t explore the versatility of dynamic data.
Thankfully, ETLBox is not just confined to static types. It fully embraces and supports the dynamic nature of ExpandoObject. In this part, we’ll pivot our example, shedding the static and ushering in the dynamic, showing you how to achieve greater adaptability in your dataflow processes.
In C#, the dynamic keyword introduces a paradigm shift from the traditionally statically-typed nature of the language. The dynamic type bypasses compile-time type checking. Instead, types are resolved at runtime. This opens the door to greater flexibility, especially when the exact structure of data isn’t known until runtime.
ExpandoObject is an in-built .NET class that lets you add properties, methods, and events to an object dynamically at runtime. It’s especially useful when working with data structures that might be fluid and not set in stone.
But what makes ExpandoObject even more powerful is its ability to be treated as a dictionary where properties can be accessed with keys.
Every ExpandoObject can be cast to IDictionary<string, object>, allowing properties to be accessed, added, or removed using dictionary-like syntax.
This approach allows for extremely versatile and flexible data handling, especially when dynamically constructing or deconstructing objects based on changing requirements.
In the world of ETL, where data sources and structures can change frequently, the power of dynamic and ExpandoObject provides an invaluable toolset for adaptable and resilient data transformations.
To facilitate more dynamic behaviour in our application, we’ll establish a configuration table called Meta. This table will host essential metadata such as business keys, obligatory fields, and potential field length constraints, as well as the associated filename in the source and its corresponding database column. This configuration is aligned with our static table structure from the previous segment.
While there are robust ORM mappers like Dapper or EntityFramework, for our purpose, we’ll employ a concise ETLBox data flow to load our configurations into an in-memory object.
Instead of a fixed setup, we’ll now leverage our configuration data to create our demo table dynamically.
It’s important to note that we’re still incorporating static columns, namely the id column, ValidFrom, and ValidTo. These columns aren’t found in our dynamic configuration; therefore, our dynamic data flow will work under the assumption that they’re present.
ETLBox components are flexibly designed with both generic and non-generic class definitions. Often, in the context of ETL processes, developers define strongly-typed objects, or POCOs (Plain Old CLR Objects), to represent the data flow, as demonstrated in earlier sections. However, for more dynamic operations, using the ExpandoObject can be advantageous.
For instance, rather than using RowTransformation<VendorMaster>, we can employ RowTransformation<ExpandoObject>, which is equivalent to the simple RowTransformation.
Let’s look at the dynamic component creation and decode the details.
We’ve transitioned the Normalize method, previously part of the VendorMaster class, into a standalone function. This function has been revamped to utilize the configuration data retrieved from our Meta table.
Furthermore, previously defined attributes like DbColumnMap, used for property mappings, are now set dynamically via the ColumnMapping attribute on the DbDestination component. This method of dynamic assignment also extends to other attributes, such as those for LookupTransformation and the Distinct transformation. Importantly, these dynamically assigned properties will replace any existing attributes that may already exist on a class definition, offering a valuable tool for minor adjustments in data flow behaviour.
Other transformations, including RowTransformation, have been updated to support dynamic objects.
By introducing these nuanced changes to the data flow, we’ve successfully transitioned from a static type-based approach to a dynamic, configuration-driven data loading process. This approach significantly boosts flexibility, making it more adaptable to evolving data requirements.
Over the course of three articles, we’ve delved deep into ETLBox, a potent .NET library dedicated to data management. In the first installment, we discovered the essence of ETLBox: a tool primarily designed for ETL processes without the dependence on a GUI. Key takeaways about ETLBox include its user-friendly API and the capacity to support various transformations and data sources. A hands-on example showed us how to craft an ETL data flow with ETLBox, processing a CSV supplier dataset and emphasizing its capabilities in data validation, normalization, and error management.
The second article brought enhancements to the table. We enhanced error reporting and incorporated Slowly Changing Dimension (SCD) Type 2, using ValidFrom and ValidTo date columns, enabling us to chronicle data alterations over time.
In the third segment, we shifted gears, handling data flow using dynamic data objects. We ventured into the realms of the dynamic keyword and ExpandoObject within C#, illustrating the power of moving beyond fixed data types. A new configuration table emerged, housing pivotal metadata and fostering the app’s dynamic nature.
To sum it all up, the series spotlighted ETLBox as a formidable ETL tool for the .NET community. Its range from foundational data flow setups to intricate enhancements like SCD Type 2 and dynamic data operations underscores its versatility, dependability, and adaptability. As ETLBox introduces dynamic data flows, it sets a new standard by addressing the ever-evolving challenges of data sourcing and structuring, offering an unparalleled level of flexibility. Although there’s undeniable value in static typing, the embrace of a dynamic approach positions ETLBox prominently within the .NET ETL ecosystem.