Dataset Terminology

This page explains terms related to the data structures, including the definitions of dataset, component, and attribute. For the detailed data types used throughout power-grid-model, refer to the Python API Reference.

Buffer Type

Defines how component data is ordered in memory. Two buffer types are supported: row-based and columnar.

Row (row-based, row-major)

Attributes of the same component are stored contiguously before moving to the next component.

Columnar (column-based, column-major)

Values are grouped by attribute: each attribute is stored contiguously across all components.
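As a minimal sketch of the two layouts, the same three components can be held either as one NumPy structured array (row-based) or as one array per attribute (columnar). The attribute names `id` and `u_rated` and the dtype below are illustrative assumptions; the library defines its own dtypes.

```python
import numpy as np

# Hypothetical dtype for illustration only; the library's real dtypes may differ.
node_dtype = np.dtype([("id", "i4"), ("u_rated", "f8")])

# Row-based: each record keeps all attributes of one component together.
row_based = np.array([(1, 10.5e3), (2, 10.5e3), (3, 400.0)], dtype=node_dtype)

# Columnar: one contiguous array per attribute, spanning all components.
columnar = {
    "id": np.ascontiguousarray(row_based["id"]),
    "u_rated": np.ascontiguousarray(row_based["u_rated"]),
}

# Both layouts carry the same values; only the memory ordering differs.
assert row_based[2]["u_rated"] == columnar["u_rated"][2]
```

Both containers describe the same data, so the choice between them is about memory layout and access pattern, not about content.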

Buffer Representation

Defines whether component data can be interpreted as a dense 2D matrix.

Dense

Dense buffers represent data as a rectangular matrix. This representation implies that all scenarios contain the same number of component entries.

Sparse

Component data is stored as a flattened 1D buffer.

Scenario boundaries are defined using an index pointer (indptr). The indptr defines how the flattened buffer is segmented into per-scenario ranges.

Sparse buffers may be either uniform or non-uniform.
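The segmentation by indptr can be sketched as follows. The helper `scenario_slice` is hypothetical (it is not part of the library) and the data values are made up. The sketch assumes that scenario `i` covers the half-open range `indptr[i]` to `indptr[i + 1]` of the flattened buffer.

```python
import numpy as np

def scenario_slice(data, indptr, scenario):
    """Return the entries of `data` that belong to one scenario (hypothetical helper)."""
    return data[indptr[scenario]:indptr[scenario + 1]]

data = np.array([10, 20, 30])                      # flattened 1D buffer
indptr = np.array([0, 1, 3, 3], dtype=np.int64)    # scenario boundaries

n_scenarios = len(indptr) - 1                      # one more boundary than scenarios
per_scenario = [scenario_slice(data, indptr, i).tolist() for i in range(n_scenarios)]
print(per_scenario)                                # [[10], [20, 30], []]
```

Here the scenarios hold one, two, and zero entries respectively, so this particular buffer is non-uniform.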

Component Dataset Independency

Defines whether all scenarios operate on the same component IDs.

Independent

All scenarios modify the same component IDs in the same order.

Each scenario starts from the original input dataset, without carrying over changes from previous scenarios; a reset is therefore required between scenarios.

Dependent

Different scenarios may modify different components.

Each scenario starts from the original input dataset, without carrying over changes from previous scenarios; a reset is therefore required between scenarios.

Component Data Uniformity

Defines whether all scenarios contain the same number of component entries. Uniformity is independent of buffer representation.

Uniform

All scenarios contain the same number of component entries.

- Dense buffers are always uniform (by construction).
- Sparse buffers may also be uniform.

Non-uniform

Scenarios contain different numbers of component entries.

- Only possible in sparse representation.
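For a sparse buffer, uniformity can be read off its index pointer: the buffer is uniform when every scenario spans the same number of entries. A small sketch, where `is_uniform` is a hypothetical helper and not a library function:

```python
import numpy as np

def is_uniform(indptr):
    """True if all scenarios delimited by `indptr` hold the same number of entries."""
    counts = np.diff(indptr)          # entries per scenario
    if counts.size == 0:              # no scenarios at all: trivially uniform
        return True
    return bool(np.all(counts == counts[0]))

assert is_uniform(np.array([0, 2, 4, 6]))        # 2 entries in every scenario
assert not is_uniform(np.array([0, 1, 3, 3]))    # 1, 2 and 0 entries
```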

Serialization Representation

Defines how datasets are serialized. Three serialization representations are supported: compact list, named map, and mixed.

Compact List

Uses positional arrays instead of named attributes. The attributes present in the dataset are stored separately.

Generated when using compact_list=True.

Named Map

Uses explicit attribute names per component.

Mixed

Combination of compact list and named map (only possible in manual construction, e.g. validation datasets).
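To illustrate the difference between the two main representations, the sketch below converts a named-map component into a compact-list form and back using plain Python. The functions `to_compact_list` and `to_named_map` are hypothetical helpers and the example records are invented; the library's own serializer and its exact output layout are not reproduced here.

```python
def to_compact_list(records, attributes):
    """Named map -> positional rows; attribute names are kept separately."""
    return [[rec.get(name) for name in attributes] for rec in records]

def to_named_map(rows, attributes):
    """Positional rows -> one name/value mapping per component."""
    return [dict(zip(attributes, row)) for row in rows]

named = [{"id": 1, "u_rated": 10500.0}, {"id": 2, "u_rated": 400.0}]
attributes = ["id", "u_rated"]   # stored once, outside the rows

compact = to_compact_list(named, attributes)
print(compact)                   # [[1, 10500.0], [2, 400.0]]
assert to_named_map(compact, attributes) == named
```

The compact form avoids repeating attribute names for every component, at the cost of needing the separate attribute list to interpret each row.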

Data structures

graph TD
    subgraph Other numpy arrays
    IndexPointer
    SingleColumn
    BatchColumn
    end

    subgraph Datasets
    Dataset --> SingleDataset
    Dataset --> BatchDataset
    end


    click Dataset href "../api_reference/python-api-reference.html#power_grid_model.data_types.Dataset"
    click SingleDataset href "../api_reference/python-api-reference.html#power_grid_model.data_types.SingleDataset"
    click BatchDataset href "../api_reference/python-api-reference.html#power_grid_model.data_types.BatchDataset"

    click IndexPointer href "../api_reference/python-api-reference.html#power_grid_model.data_types.IndexPointer"
    click SingleColumn href "../api_reference/python-api-reference.html#power_grid_model.data_types.SingleColumn"
    click BatchColumn href "../api_reference/python-api-reference.html#power_grid_model.data_types.BatchColumn"

graph TD
    subgraph Dataset values
    ComponentData --> DataArray
    ComponentData --> ColumnarData

    DataArray --> SingleArray
    DataArray --> BatchArray

    BatchArray --> DenseBatchArray
    BatchArray --> SparseBatchArray

    ColumnarData --> SingleColumnarData
    ColumnarData --> BatchColumnarData

    BatchColumnarData --> DenseBatchColumnarData
    BatchColumnarData --> SparseBatchColumnarData
    end

    click ComponentData href "../api_reference/python-api-reference.html#power_grid_model.data_types.ComponentData"
    click DataArray href "../api_reference/python-api-reference.html#power_grid_model.data_types.DataArray"
    click ColumnarData href "../api_reference/python-api-reference.html#power_grid_model.data_types.ColumnarData"
    click SingleArray href "../api_reference/python-api-reference.html#power_grid_model.data_types.SingleArray"
    click BatchArray href "../api_reference/python-api-reference.html#power_grid_model.data_types.BatchArray"
    click DenseBatchArray href "../api_reference/python-api-reference.html#power_grid_model.data_types.DenseBatchArray"
    click SparseBatchArray href "../api_reference/python-api-reference.html#power_grid_model.data_types.SparseBatchArray"
    click SingleColumnarData href "../api_reference/python-api-reference.html#power_grid_model.data_types.SingleColumnarData"
    click BatchColumnarData href "../api_reference/python-api-reference.html#power_grid_model.data_types.BatchColumnarData"
    click DenseBatchColumnarData href "../api_reference/python-api-reference.html#power_grid_model.data_types.DenseBatchColumnarData"
    click SparseBatchColumnarData href "../api_reference/python-api-reference.html#power_grid_model.data_types.SparseBatchColumnarData"

Dataset: Either a single or a batch dataset. It is a dictionary with component types (e.g., line, node) as keys and ComponentData as values.
- SingleDataset: A data type storing input data (i.e., all elements of all components) for a single scenario.
- BatchDataset: A data type storing update and/or output data for one or more scenarios. A batch dataset can contain dense or sparse representations per component.
ComponentData: The data corresponding to a component.
- DataArray: A data array can be a single or a batch array. It is a numpy structured array.
  - SingleArray: A 1D numpy structured array corresponding to a single dataset.
  - BatchArray: A batch array holds data for multiple scenarios and can be represented in dense or sparse form.
    - DenseBatchArray: A 2D numpy structured array containing a list of components of the same type for each scenario. This implies that all scenarios contain the same number of components (uniform structure).
    - SparseBatchArray: A typed dictionary with a 1D numpy array of IndexPointer type under the indptr key and a SingleArray under the data key. The data holds all components flattened across scenarios, with scenario boundaries defined by indptr.
- ColumnarData: A dictionary with attributes as keys and individual numpy arrays as values. This format is described in more detail in Native Data Interface.
  - SingleColumnarData: A dictionary with attributes as keys and SingleColumn as values, in a single dataset.
  - BatchColumnarData: Batch columnar data holds data for multiple scenarios and can be represented in dense or sparse form.
    - DenseBatchColumnarData: A dictionary with attributes as keys and 2D/3D numpy arrays of BatchColumn type as values, in a batch dataset.
    - SparseBatchColumnarData: A typed dictionary with a 1D numpy array of IndexPointer type under the indptr key and SingleColumnarData under the data key. The data holds all components flattened across scenarios, with scenario boundaries defined by indptr.
IndexPointer: A 1D numpy array of int64 type used to specify sparse batches. It indicates the range of components within each scenario. For example, an index pointer of [0, 1, 3, 3] indicates 3 batches: the element with index 0 is in the 1st batch, the elements with indices [1, 2] are in the 2nd batch, and the 3rd batch has no elements.
SingleColumn: A 1D/2D numpy array of values corresponding to a specific attribute.
BatchColumn: A 2D/3D numpy array of values corresponding to a specific attribute.
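The containers described above can be sketched with plain NumPy. The dtype and values are illustrative assumptions, and the dictionaries are built by hand here instead of through any library helper.

```python
import numpy as np

# Hypothetical update dtype for a line; the library provides the real one.
line_update = np.dtype([("id", "i4"), ("from_status", "i1"), ("to_status", "i1")])

# DenseBatchArray: 2 scenarios x 2 lines, every scenario has the same size.
dense = np.zeros((2, 2), dtype=line_update)
dense["id"] = [[4, 5], [4, 5]]

# SparseBatchArray: flattened structured data plus an index pointer.
sparse = {
    "indptr": np.array([0, 1, 3, 3], dtype=np.int64),
    "data": np.zeros(3, dtype=line_update),
}
sparse["data"]["id"] = [4, 5, 6]

# SparseBatchColumnarData: same index pointer, but one column per attribute.
sparse_columnar = {
    "indptr": sparse["indptr"],
    "data": {name: sparse["data"][name].copy() for name in line_update.names},
}

# A batch dataset maps each component type to its component data.
batch_dataset = {"line": sparse}
```

The dense and sparse examples are independent of each other; within one batch dataset, all components would have to describe the same number of scenarios.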

Dimensions of numpy arrays

The dimensions of the numpy arrays and the interpretation of each dimension are as follows.

Data Type	1D	2D	3D
SingleArray	Corresponds to a single dataset.	❌	❌
DenseBatchArray	❌	Batch number \(\times\) Component within that batch	❌
SingleColumn	Component within that batch.	Component within that batch \(\times\) Phases ✨	❌
BatchColumn	❌	Batch number \(\times\) Component within that batch	Batch number \(\times\) Component within that batch \(\times\) Phases ✨

Note

✨ The “Phases” dimension is optional and is available only when the attributes are asymmetric.
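The column shapes in the table can be made concrete with a few NumPy arrays. The sizes below (4 batches, 3 components, 3 phases) are arbitrary values chosen for illustration.

```python
import numpy as np

n_batches, n_components, n_phases = 4, 3, 3

single_column = np.zeros(n_components)                              # 1D, symmetric attribute
single_column_asym = np.zeros((n_components, n_phases))             # 2D, asymmetric attribute
batch_column = np.zeros((n_batches, n_components))                  # 2D, symmetric attribute
batch_column_asym = np.zeros((n_batches, n_components, n_phases))   # 3D, asymmetric attribute

for arr in (single_column, single_column_asym, batch_column, batch_column_asym):
    print(arr.ndim, arr.shape)
```

In each asymmetric case the phases form the last axis, so dropping that axis gives the shape of the corresponding symmetric column.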

Type of Dataset

The types of Dataset include the following: input, update, sym_output, asym_output, and sc_output. They are included under the enum DatasetType. Example attributes are given below for a dataset containing a line component.

input: Contains attributes relevant to the configuration of the grid.
- Example: id, from_node, from_status
update: Contains attributes relevant to multiple scenarios.
- Example: from_status, to_status
sym_output: Contains attributes relevant to symmetrical steady state output of power flow or state estimation calculation.
- Example: p_from, p_to
asym_output: Contains attributes relevant to asymmetrical steady state output of power flow or state estimation calculation. Attributes are similar to sym_output except some values of the asymmetrical dataset will contain detailed data for all 3 phases individually.
- Example: p_from, p_to
sc_output: Contains attributes relevant to symmetrical short circuit calculation output. As with asym_output, detailed data for all 3 phases is provided where relevant.
- Example: i_from, i_from_angle

Attributes of Components

Attribute	Description
name	Name of the attribute. It is exactly the same as the attribute name in `power_grid_model.power_grid_meta_data`. The attribute names are included under the enum `AttributeType`.
data type	Data type of the attribute. It is either a type from the table in Native Data Interface, or an enumeration as defined above. There are two special data types that are independent from one another, namely, `RealValueInput` and `RealValueOutput`.
	`RealValueInput` is used for some input attributes. It is a `double` for a symmetric class (e.g. `sym_load`) and `double[3]` for an asymmetric class (e.g. `asym_load`). It is explained in detail in the corresponding types.
	`RealValueOutput` is used for many output attributes. It is a `double` in symmetric calculation and `double[3]` for asymmetric and short circuit calculations.
unit	Unit of the attribute, if applicable. As a general rule, only standard SI units without any prefix are used.
description	Description of the attribute.
required	Whether the attribute is required. If not, it is optional. Note that if you choose not to specify an optional attribute, it should have the null value as defined in Basic Data Types.
update	Whether the attribute can be mutated by the update call `PowerGridModel.update` on an existing instance. This is only applicable when the attribute is part of an input dataset.
valid values	If applicable, an indication of which values are valid for the input data.
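The difference between the symmetric and asymmetric forms of `RealValueInput` can be sketched with NumPy structured dtypes. The dtypes below are illustrative stand-ins and not the library's actual ones. The use of NaN as the null value for a `double` is an assumption made for this sketch; the authoritative null values are those listed in Basic Data Types.

```python
import numpy as np

# Illustrative dtypes only; field layouts in the library may differ.
sym_load_like = np.dtype([("id", "i4"), ("p_specified", "f8")])          # double
asym_load_like = np.dtype([("id", "i4"), ("p_specified", "f8", (3,))])   # double[3]

sym = np.zeros(2, dtype=sym_load_like)
asym = np.zeros(2, dtype=asym_load_like)

print(sym["p_specified"].shape)    # (2,)
print(asym["p_specified"].shape)   # (2, 3)

# Assumption: an unspecified optional double is marked with NaN.
asym["p_specified"][1] = np.nan
unspecified = np.isnan(asym["p_specified"]).all(axis=1).tolist()
print(unspecified)                 # [False, True]
```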