Microsoft.Data.Analysis.DataFrame is the tabular data type of the Microsoft.Data.Analysis library for .NET. It is designed for data manipulation and analysis, similar to the pandas library in Python, and holds tabular data in typed columns.
A DataFrame stores its data column by column and supports multiple data types, including integers, strings, decimals, and dates. Operations such as filtering, aggregation, and joining are available on the frame itself, so this kind of analysis can be done within the .NET ecosystem.
Here is a simple example demonstrating how to create and manipulate a DataFrame:
In this example, three columns are created for integers, strings, and nullable doubles. These columns are then used to construct a DataFrame. The DataFrame is printed to the console to display its contents. A filter operation is performed to select rows where the values in the “Col1” column are greater than 1. The filtered DataFrame is then printed to the console, showing the rows that meet the filter criteria.
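The steps described above can be sketched as follows. This is a minimal illustration, assuming the Microsoft.Data.Analysis NuGet package is referenced and C# top-level statements are available; the column names `Col2` and `Col3` and all sample values are made up for this sketch.

```csharp
using System;
using Microsoft.Data.Analysis;

// Three columns: integers, strings, and nullable doubles (one value is missing).
// "Col2", "Col3" and the sample values are illustrative assumptions.
var col1 = new PrimitiveDataFrameColumn<int>("Col1", new[] { 1, 2, 3 });
var col2 = new StringDataFrameColumn("Col2", new[] { "A", "B", "C" });
var col3 = new PrimitiveDataFrameColumn<double>("Col3", new double?[] { 1.5, null, 3.5 });

// Combine the columns into one frame and print it.
var df = new DataFrame(col1, col2, col3);
Console.WriteLine(df);

// Build a boolean mask that is true where Col1 > 1, then keep only those rows.
PrimitiveDataFrameColumn<bool> mask = df["Col1"].ElementwiseGreaterThan(1);
DataFrame filtered = df.Filter(mask);
Console.WriteLine(filtered);
```

With these sample values, two of the three rows pass the filter.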
The output of the example code is as follows:
This introduction and example provide a basic understanding of how to use Microsoft.Data.Analysis.DataFrame for fundamental data operations.
You can connect the DataFrameSource directly to an existing DataFrame. This lets you use the underlying data frame as a source for a data flow pipeline. Here is an example of how to load the data into a memory object. Please note that this converts the columnar storage into classic row-based storage.
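To make the columnar-to-row conversion concrete, the sketch below copies a data frame into a list of objects by hand. It deliberately does not use DataFrameSource, only Microsoft.Data.Analysis itself; the `MyRow` class, the column names, and the sample values are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.Data.Analysis;

// Sample frame; column names and values are assumptions for this sketch.
var df = new DataFrame(
    new PrimitiveDataFrameColumn<int>("Id", new[] { 1, 2, 3 }),
    new StringDataFrameColumn("Name", new[] { "Ann", "Bob", "Cy" }));

// Walk the frame row by row and copy each row into one object.
// This is the row-based shape a pipeline would work with.
var rows = new List<MyRow>();
for (long i = 0; i < df.Rows.Count; i++)
{
    rows.Add(new MyRow
    {
        Id = (int)df["Id"][i],         // column indexer returns a boxed value
        Name = (string)df["Name"][i]
    });
}

foreach (var row in rows)
    Console.WriteLine($"{row.Id}: {row.Name}");

// Hypothetical POCO whose property names match the column names.
public class MyRow
{
    public int Id { get; set; }
    public string Name { get; set; } = "";
}
```

Each value is read from its column and boxed on the way out, which is the cost of leaving columnar storage.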
When extracting from an existing data frame using a POCO to map your rows into, you can also describe how the columns in the data frame should be mapped to the properties of your object. If needed, you can map columns whose names differ from your property names, or ignore columns that have the same name as a property.
Instead of extracting the data from an existing DataFrame using a strongly typed object, we can also use a dynamic ExpandoObject. The following code demonstrates this. Please note that we now use a slightly different approach to initially fill our data frame.
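The idea of dynamic rows can be illustrated without any pipeline component. The sketch below fills one ExpandoObject per row, using each column name as a property name. It is a hand-written illustration under assumptions: the frame is built with the DataFrame constructor for brevity, and the column names and values are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;
using Microsoft.Data.Analysis;

// Sample frame; column names and values are assumptions for this sketch.
var df = new DataFrame(
    new PrimitiveDataFrameColumn<int>("Id", new[] { 1, 2 }),
    new StringDataFrameColumn("Name", new[] { "Ann", "Bob" }));

var rows = new List<ExpandoObject>();
for (long i = 0; i < df.Rows.Count; i++)
{
    var row = new ExpandoObject();
    // ExpandoObject exposes its members as a dictionary, so properties
    // can be added by name at run time.
    var fields = (IDictionary<string, object?>)row;
    foreach (DataFrameColumn column in df.Columns)
        fields[column.Name] = column[i];
    rows.Add(row);
}

// Members are resolved at run time through dynamic dispatch.
dynamic first = rows[0];
Console.WriteLine($"{first.Id}: {first.Name}");
```

No row class is needed here, at the price of losing compile-time checks on property names and types.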
When loading data into a data frame, you can use the DataFrameColumnMap attribute to map a property of your object to a column with a different name in the data frame. You can also use it to exclude a property from the destination data frame.
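The effect of such a mapping can be shown by hand. The sketch below does not use the DataFrameColumnMap attribute, whose exact parameters are not shown in this text. Instead it builds the columns manually: one property is written under a different column name and one is left out. The `Person` class and all names and values are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.Data.Analysis;

var people = new List<Person>
{
    new Person { Id = 1, FullName = "Ann", Secret = "x" },
    new Person { Id = 2, FullName = "Bob", Secret = "y" }
};

// "FullName" is written to a column called "Name";
// "Secret" is deliberately not turned into a column.
var df = new DataFrame(
    new PrimitiveDataFrameColumn<int>("Id", people.Select(p => p.Id)),
    new StringDataFrameColumn("Name", people.Select(p => p.FullName)));

// List the resulting column names.
Console.WriteLine(string.Join(", ", df.Columns.Select(c => c.Name)));

// Hypothetical source object for this sketch.
public class Person
{
    public int Id { get; set; }
    public string FullName { get; set; } = "";
    public string Secret { get; set; } = "";
}
```

An attribute-based mapping declares the same two decisions, renaming and excluding, on the class instead of in the loading code.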