Non-Blocking Transformation
Non-blocking transformations in ETLBox process data row by row as it becomes available in the input buffer. These transformations are optimized for performance and minimal memory usage, making them well-suited for high-throughput batch processing scenarios.
Transformation Execution Behavior
Row-by-Row Processing
Most ETLBox transformations operate in a row-by-row fashion. As soon as a row enters a transformation’s input buffer, it is processed and the result is passed to the output. This streaming behavior allows for high-performance data flows and efficient memory usage within batch execution pipelines.
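The streaming behavior can be sketched as a small data flow. This is a minimal sketch: the `Order` type and the sample data are illustrative, and the exact namespaces and linking/execution calls may differ between ETLBox versions.

```csharp
using System.Collections.Generic;
using ETLBox.DataFlow;
using ETLBox.DataFlow.Connectors;
using ETLBox.DataFlow.Transformations;

public class Order
{
    public int Id { get; set; }
    public decimal Amount { get; set; }
}

public static class StreamingExample
{
    public static void Main()
    {
        // Rows enter the flow one at a time from the source.
        var source = new MemorySource<Order>();
        source.DataAsList = new List<Order>
        {
            new Order { Id = 1, Amount = 10m },
            new Order { Id = 2, Amount = 20m },
        };

        // Non-blocking: each row is transformed and forwarded
        // as soon as it arrives in the input buffer.
        var addTax = new RowTransformation<Order>(row =>
        {
            row.Amount *= 1.19m;
            return row;
        });

        var dest = new MemoryDestination<Order>();

        source.LinkTo(addTax);
        addTax.LinkTo(dest);

        Network.Execute(source); // runs the whole flow to completion
    }
}
```

Because no component waits for the full input set, memory usage stays proportional to the buffer sizes rather than to the total row count.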
Non-Blocking vs Blocking
Transformations are categorized based on how they handle input data:
Non-blocking transformations process rows immediately as they arrive, requiring minimal memory.
Blocking transformations wait until all data (or a defined batch) has been received before producing output. These transformations are necessary for operations like sorting, aggregation, or complex joins.
This article provides details about non-blocking transformations. For information on blocking transformations, see Blocking Transformations Overview.
Buffering
Each transformation has at least one input buffer. Buffers temporarily store rows during processing and ensure smooth data flow between components.
If a transformation receives data faster than it can process it, the input buffer absorbs the excess. By default, each buffer holds up to 100,000 rows.
If a buffer fills up because downstream components cannot keep up, memory consumption can grow significantly. To limit memory usage, reduce the buffer size:
transformation.MaxBufferSize = 50000; // Per component
Settings.MaxBufferSize = 10000; // Global default
Summary of Non-Blocking Transformations
| Transformation | Description |
|---|---|
| RowTransformation | Applies custom logic to each row |
| CachedRowTransformation | Like RowTransformation, but caches previously processed rows |
| ColumnTransformation | Renames, reorders, or removes columns; outputs dynamic ExpandoObject |
| Distinct | Removes duplicate rows |
| FilterTransformation | Filters rows based on a predicate |
| LookupTransformation | Enriches rows using an in-memory lookup |
| MergeJoin | Joins two input streams using a match function |
| Multicast | Forwards each row to multiple outputs |
| RowDuplication | Duplicates rows a specified number of times |
| RowMultiplication | Splits one row into multiple output rows |
| RowValidation | Validates rows and separates valid/invalid rows |
| XmlSchemaValidation | Validates XML strings against an XSD schema |
Descriptions
RowTransformation: Applies custom C# logic to transform each row. Can also convert input to a different output type.
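A type-converting RowTransformation might look like the following sketch. The `InputRow` and `OutputRow` types are hypothetical, and the two-type-parameter constructor is assumed to match the ETLBox v2+ API:

```csharp
// Convert each InputRow into an OutputRow of a different shape
// (both types are illustrative, not part of ETLBox).
var toOutput = new RowTransformation<InputRow, OutputRow>(row =>
    new OutputRow
    {
        FullName = row.FirstName + " " + row.LastName
    });
```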
CachedRowTransformation: Similar to RowTransformation, with access to a cache of previously processed rows for comparison or deduplication logic.
ColumnTransformation: Changes the structure of the row by renaming, reordering, or removing properties. Always returns a dynamic object.
FilterTransformation: Excludes rows that do not satisfy a specified condition.
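A filter sketch, reusing the hypothetical `Order` type from above and assuming that rows for which the predicate returns true continue downstream:

```csharp
// Keep only orders above the threshold; all other rows are discarded.
var filter = new FilterTransformation<Order>(row => row.Amount > 100m);
```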
Distinct: Removes duplicate rows by comparing row content, passing only the first occurrence of each distinct row downstream.
LookupTransformation: Adds fields to each row by matching values from a preloaded lookup dataset.
MergeJoin: Combines rows from two sources using a custom equality function. Works best with pre-sorted inputs.
Multicast: Sends identical copies of each row to multiple downstream components.
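Linking a Multicast to several destinations could look like this sketch (the source and destination components are illustrative placeholders):

```csharp
var multicast = new Multicast<Order>();

source.LinkTo(multicast);
multicast.LinkTo(destinationA); // each row is copied to both outputs
multicast.LinkTo(destinationB);
```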
RowDuplication: Creates multiple instances of each row. Can be made conditional using a predicate.
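A conditional duplication sketch. The `NumberOfDuplicates` and `CanDuplicate` property names are assumptions based on the ETLBox v2+ API and may differ in your version:

```csharp
var duplication = new RowDuplication<Order>();
duplication.NumberOfDuplicates = 2;               // two extra copies per row (assumed property)
duplication.CanDuplicate = row => row.Amount > 0m; // only duplicate matching rows (assumed property)
```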
RowMultiplication: Converts one input row into multiple output rows using a custom function.
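A RowMultiplication sketch that fans one row out into several, assuming the multiplication function returns an `IEnumerable` of the output type (the `Order`/`OrderLine` types and their properties are illustrative):

```csharp
// Split each order into one output row per line item.
var split = new RowMultiplication<Order, OrderLine>(order =>
{
    var lines = new List<OrderLine>();
    foreach (var item in order.Items)
        lines.Add(new OrderLine { OrderId = order.Id, Item = item });
    return lines;
});
```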
RowValidation: Checks rows against defined validation rules and routes valid and invalid rows separately.
XmlSchemaValidation: Validates the contents of an XML string field in a row against a given XML Schema Definition (XSD).