Non-Blocking Transformation

Non-blocking transformations in ETLBox process data row by row as it becomes available in the input buffer. These transformations are optimized for performance and minimal memory usage, making them well-suited for high-throughput batch processing scenarios.

Transformation Execution Behavior

Row-by-Row Processing

Most ETLBox transformations operate in a row-by-row fashion. As soon as a row enters a transformation’s input buffer, it is processed and the result is passed to the output. This streaming behavior allows for high-performance data flows and efficient memory usage within batch execution pipelines.
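As an illustration, the following sketch wires a source, a row-by-row transformation, and a destination into one flow. It assumes the ETLBox 2.x API (`ETLBox.DataFlow` namespaces, `MemorySource`, `MemoryDestination`, `Network.Execute`); the `MyRow` type is purely illustrative:

```csharp
// Minimal in-memory data flow; a sketch assuming the ETLBox 2.x API.
// Namespaces and the execution call (Network.Execute) may differ in
// other versions. MyRow is an illustrative type.
using System.Collections.Generic;
using ETLBox.DataFlow;
using ETLBox.DataFlow.Connectors;
using ETLBox.DataFlow.Transformations;

public class MyRow { public int Id { get; set; } public string Name { get; set; } }

public static class StreamingExample
{
    public static void Run()
    {
        var source = new MemorySource<MyRow>();
        source.DataAsList = new List<MyRow> {
            new MyRow { Id = 1, Name = "alpha" },
            new MyRow { Id = 2, Name = "beta" }
        };

        // Each row is transformed as soon as it arrives in the input buffer,
        // then immediately passed on to the destination.
        var trans = new RowTransformation<MyRow>(row => {
            row.Name = row.Name.ToUpper();
            return row;
        });

        var dest = new MemoryDestination<MyRow>();

        source.LinkTo(trans);
        trans.LinkTo(dest);
        Network.Execute(source); // runs the whole flow to completion
    }
}
```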

Non-Blocking vs Blocking

Transformations are categorized based on how they handle input data:

  • Non-blocking transformations process rows immediately as they arrive, requiring minimal memory.

  • Blocking transformations wait until all data (or a defined batch) has been received before producing output. These transformations are necessary for operations like sorting, aggregation, or complex joins.

This article provides details about non-blocking transformations. For information on blocking transformations, see Blocking Transformations Overview.

Buffering

Each transformation has at least one input buffer. Buffers temporarily store rows during processing and ensure smooth data flow between components.

If a transformation receives data faster than it can process it, the input buffer absorbs the excess. By default, each buffer holds up to 100,000 rows.

If a buffer fills up because rows cannot be processed quickly enough, memory consumption can grow significantly. To limit memory usage, reduce the buffer size:

transformation.MaxBufferSize = 50000; // Per component
Settings.MaxBufferSize = 10000; // Global default

Summary of Non-Blocking Transformations

  • RowTransformation – Applies custom logic to each row

  • CachedRowTransformation – Like RowTransformation, but caches previously processed rows

  • ColumnTransformation – Renames, reorders, or removes columns; outputs dynamic ExpandoObject

  • Distinct – Removes duplicate rows

  • FilterTransformation – Filters rows based on a predicate

  • LookupTransformation – Enriches rows using an in-memory lookup

  • MergeJoin – Joins two input streams using a match function

  • Multicast – Forwards each row to multiple outputs

  • RowDuplication – Duplicates rows a specified number of times

  • RowMultiplication – Splits one row into multiple output rows

  • RowValidation – Validates rows and separates valid/invalid rows

  • XmlSchemaValidation – Validates XML strings against an XSD schema

Descriptions

RowTransformation: Applies custom C# logic to transform each row. Can also convert input to a different output type.
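A type conversion can be sketched as follows, assuming the ETLBox 2.x generic `RowTransformation<TInput, TOutput>`; `InputRow` and `OutputRow` are illustrative types defined for this example:

```csharp
// Sketch assuming the ETLBox 2.x API; InputRow and OutputRow are
// illustrative types, not part of ETLBox.
public class InputRow  { public string FirstName { get; set; } public string LastName { get; set; } }
public class OutputRow { public string FullName { get; set; } }

// The transformation function receives each row as it arrives and may
// return a different output type.
var trans = new RowTransformation<InputRow, OutputRow>(row =>
    new OutputRow { FullName = row.FirstName + " " + row.LastName });
```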

CachedRowTransformation: Similar to RowTransformation, with access to a cache of previously processed rows for comparison or deduplication logic.

ColumnTransformation: Changes the structure of the row by renaming, reordering, or removing properties. Always returns a dynamic object.

FilterTransformation: Excludes rows that do not satisfy a specified condition.
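A minimal sketch, assuming the ETLBox 2.x API; `Order` is an illustrative type, and `source`/`dest` stand for any linked source and destination components:

```csharp
// Only rows whose predicate returns true pass through; all other
// rows are discarded. Sketch assuming ETLBox 2.x.
var filter = new FilterTransformation<Order>(row => row.Amount > 0);
source.LinkTo(filter);
filter.LinkTo(dest);
```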

Distinct: Removes duplicate rows by comparing content and allowing only the first occurrence.
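Usage is minimal; a sketch assuming ETLBox 2.x, with `MyRow` illustrative and `source`/`dest` standing for any linked components:

```csharp
// Distinct compares row content and lets only the first
// occurrence of identical rows pass downstream.
var distinct = new Distinct<MyRow>();
source.LinkTo(distinct);
distinct.LinkTo(dest);
```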

LookupTransformation: Adds fields to each row by matching values from a preloaded lookup dataset.
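A sketch of an enrichment lookup, assuming the attribute-based configuration (`MatchColumn`, `RetrieveColumn`) available in ETLBox 2.x; all types shown are illustrative:

```csharp
// Sketch assuming ETLBox 2.x attribute-based lookup configuration;
// OrderRow and CustomerRow are illustrative types.
public class OrderRow {
    public int CustomerId { get; set; }
    public string CustomerName { get; set; } // filled in by the lookup
}
public class CustomerRow {
    [MatchColumn("CustomerId")]      public int Id { get; set; }
    [RetrieveColumn("CustomerName")] public string Name { get; set; }
}

var lookupSource = new MemorySource<CustomerRow>();
lookupSource.DataAsList.Add(new CustomerRow { Id = 1, Name = "ACME" });

// The lookup source is loaded into memory once; each incoming row is
// then matched and enriched as it streams through.
var lookup = new LookupTransformation<OrderRow, CustomerRow>(lookupSource);
orders.LinkTo(lookup);
lookup.LinkTo(dest);
```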

MergeJoin: Combines rows from two sources using a custom equality function. Works best with pre-sorted inputs.
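A sketch of a two-stream join, assuming the ETLBox 2.x API where `MergeJoin` exposes `LeftInput` and `RightInput` link targets (exact member names may vary by version); all row types are illustrative:

```csharp
// Sketch assuming ETLBox 2.x; LeftRow, RightRow and JoinedRow are
// illustrative types. The join function produces one output row per
// matched pair.
var join = new MergeJoin<LeftRow, RightRow, JoinedRow>(
    (left, right) => new JoinedRow { Id = left.Id, Total = left.Amount + right.Amount });

leftSource.LinkTo(join.LeftInput);
rightSource.LinkTo(join.RightInput);
join.LinkTo(dest);
```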

Multicast: Sends identical copies of each row to multiple downstream components.
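A sketch assuming ETLBox 2.x; `MyRow` is illustrative, and `source`, `destA`, `destB` stand for any linked components:

```csharp
// Multicast copies every incoming row to each linked output.
var multicast = new Multicast<MyRow>();
source.LinkTo(multicast);
multicast.LinkTo(destA);
multicast.LinkTo(destB); // destA and destB each receive every row
```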

RowDuplication: Creates multiple instances of each row. Can be made conditional using a predicate.
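A sketch assuming the ETLBox 2.x constructor that takes the number of duplicates per row; the exact overloads (including the predicate-based variant for conditional duplication) may differ by version:

```csharp
// Sketch assuming ETLBox 2.x; MyRow is an illustrative type. The
// constructor argument is the number of duplicates emitted for each
// incoming row. A predicate-based overload allows conditional
// duplication, but its exact signature may differ by version.
var duplication = new RowDuplication<MyRow>(2);
source.LinkTo(duplication);
duplication.LinkTo(dest);
```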

RowMultiplication: Converts one input row into multiple output rows using a custom function.
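A sketch assuming the ETLBox 2.x API, where the multiplication function returns an `IEnumerable` of output rows; `OrderRow` and `LineRow` are illustrative types, and `using System.Linq` is assumed:

```csharp
// Sketch assuming ETLBox 2.x; OrderRow and LineRow are illustrative
// types. Each input row can yield zero, one, or many output rows.
public class OrderRow { public int Id { get; set; } public List<string> Items { get; set; } }
public class LineRow  { public int OrderId { get; set; } public string Item { get; set; } }

var multiplication = new RowMultiplication<OrderRow, LineRow>(order =>
    order.Items.Select(item => new LineRow { OrderId = order.Id, Item = item }));
```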

RowValidation: Checks rows against defined validation rules and routes valid and invalid rows separately.

XmlSchemaValidation: Validates the contents of an XML string field in a row against a given XML Schema Definition (XSD).