Pivot
The ETLBox Pivot transformation offers a simple way to reorganize your data by converting vertical lists into horizontal tables. This transformation can be helpful for summarizing data, allowing you to pivot one or more columns and aggregate their values
Introduction
The Pivot
transformation in ETLBox allows you to reshape data by converting rows into columns, effectively pivoting vertical data into a horizontal format. This is ideal for summarizing or restructuring data, such as turning a list of monthly values into a single row where each column represents a month.
PivotValue and PivotColumns
For example, given the following input data:
The Pivot
transformation will rearrange it into a horizontal format:
This transformation involves specifying:
- Pivot columns (headers in the resulting table),
- Pivot values (values under those headers), and
- Pivot rows (rows to be grouped together).
Here’s how it’s implemented:
The Month
property is marked with the PivotColumn
attribute, and the MonthValue
property is marked with the PivotValue
attribute.
Note
Grouped Data with PivotRow
When grouping data, the PivotRow
attribute can be used to group rows by a specific property:
Input:
Output:
Implementation:
PivotColumn, PivotValue, and PivotRow
You can define pivot columns, pivot values, and rows either by using attributes on a class (as shown above) or by directly setting the PivotColumns
, PivotValues
, and PivotRows
properties when using dynamic objects like ExpandoObject
. This flexibility allows you to handle both predefined and dynamic data structures.
For example, if working with dynamic data, you can specify these properties directly:
This approach will also work with the dynamic ExpandoObjects.
Aggregating Pivot Values with Built-in and Custom Methods
The PivotValue
attribute allows you to aggregate values using built-in methods like Sum
or with a custom function by specifying AggregationMethod.Custom
and a CustomFunction
.
Built-in Aggregation
Use the AggregationMethod.Sum
to calculate the sum of all values for a pivot column.
The Pivot provides the same aggregtion methods as the Aggregation.
Custom Aggregation
When AggregationMethod.Custom
is used, a CustomFunction
can be provided to define the aggregation logic.
Grand Totals for Rows and Columns
To calculate aggregated totals for rows and columns, the AddGrandTotalColumn
and AddGrandTotalRows
properties can be used.
Example with Grand Totals:
Input:
Output:
Advanced Features
Custom Column Naming
Use ColumnNameCombination
to customize how pivot column names are formed. This only applies if there are at least 2 PivotColumns
defined.
By default, the column names are concatenated.
Generic Aggregation
You can define an overall custom aggregation logic by setting the CustomValueAggregation
. This will overwrite any custom aggregation methods that are
specified on the pivot values.
Flatten Aggregation Results
This only applies if you have 2 or more PivotValues
, the output will contain an object of type PivotAggregationResult
. This type will contain a separated result for each defined pivot value. If you want to combine or “flatten” these two values into one (e.g. to extract the data into a csv file) , you can do this by defining the FlattenAggregationResults
method.
PivotAggregationResult vs. Flattened Output
If not flattened:
If flattened:
Managing Missing Values
The KeepEmptyValues
property controls whether columns with no data are included in the output. By default, it is set to true
. This is useful in cases where you want to maintain a consistent set of columns across all rows, even if some rows do not contain values for certain pivoted columns.
Note
Having this flag enable is important when writing dynamic output to a CSV file, as CSV writers require all header rows to be present in the first record. Without these columns, the structure of the CSV file could be inconsistent, leading to errors or missing data in the output file. If you want to produce a more compact output by avoiding the inclusion of empty or null columns, simply set this property to false
.
Consider a scenario where you are pivoting sales data for different months, but not all months have data for each group.
Input:
Group | Month | Sales |
---|---|---|
A | Jan | 100 |
B | Jan | 200 |
B | Feb | 300 |
C | Mar | 400 |
With KeepEmptyValues = true
, the output will contain all potential months, even though not every group has sales data for every month.
Output:
Group | Jan | Feb | Mar |
---|---|---|---|
A | 100 | null | null |
B | 200 | 300 | null |
C | null | null | 400 |