Dataset Terminology

This section explains terms related to the data structures, including the definitions of dataset, component, and attribute. For the detailed data types used throughout power-grid-model, refer to the Python API Reference.

Buffer Type

Defines how component data is ordered in memory. Two buffer types are supported: row-based and columnar.

Row (row-based, row-major)

Attributes of the same component are stored contiguously before moving to the next component.

Columnar (column-based, column-major)

Values are grouped by attribute: all values of one attribute are stored contiguously across components before moving to the next attribute.
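As a rough illustration of the two buffer types, the sketch below stores the same two nodes once as a NumPy structured array (row-based) and once as a dictionary of per-attribute arrays (columnar). The dtype `node_dtype` is hand-written for this example and is an assumption; the actual layouts are defined by the library's metadata.

```python
import numpy as np

# Hypothetical, simplified node dtype (assumption); the library defines the real one.
node_dtype = np.dtype([("id", np.int32), ("u_rated", np.float64)])

# Row-based: one structured array; each record holds all attributes of one component.
row_based = np.zeros(2, dtype=node_dtype)
row_based["id"] = [1, 2]
row_based["u_rated"] = [10500.0, 400.0]

# Columnar: one plain array per attribute, grouped across components.
columnar = {
    "id": np.array([1, 2], dtype=np.int32),
    "u_rated": np.array([10500.0, 400.0], dtype=np.float64),
}

# Both layouts carry the same values, only the memory ordering differs.
same_values = all(
    np.array_equal(row_based[name], columnar[name]) for name in columnar
)
print(same_values)
```

The columnar form can be convenient when only a few attributes are supplied, since attributes that are not needed do not have to be allocated at all.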

Buffer Representation

Defines whether component data can be interpreted as a dense 2D matrix.

Dense

Dense buffers represent data as a rectangular matrix. This representation implies that all scenarios contain the same number of component entries.

Sparse

Component data is stored as a flattened 1D buffer.

Scenario boundaries are defined using an index pointer (indptr). The indptr defines how the flattened buffer is segmented into per-scenario ranges.

Sparse buffers may be either uniform or non-uniform.
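The role of the index pointer can be sketched in plain Python. The example assumes the usual convention that scenario `i` covers the half-open range `indptr[i]` to `indptr[i + 1]` of the flattened buffer; `split_scenarios` is a hypothetical helper written for this illustration, not a library function.

```python
def split_scenarios(flat_buffer, indptr):
    """Return one list of entries per scenario from a flattened buffer."""
    if len(indptr) == 0 or indptr[0] != 0 or indptr[-1] != len(flat_buffer):
        raise ValueError("indptr must start at 0 and end at len(flat_buffer)")
    if any(stop < start for start, stop in zip(indptr, indptr[1:])):
        raise ValueError("indptr must be non-decreasing")
    # Consecutive indptr values delimit the entries of one scenario.
    return [flat_buffer[start:stop] for start, stop in zip(indptr, indptr[1:])]


flat_buffer = ["a", "b", "c", "d", "e"]
indptr = [0, 2, 2, 5]  # three scenarios with 2, 0 and 3 entries
scenarios = split_scenarios(flat_buffer, indptr)
print(scenarios)
```

An index pointer with `n + 1` values therefore describes `n` scenarios, and an empty scenario shows up as two equal consecutive values.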

Component Dataset Independency

Defines whether all scenarios operate on the same component IDs.

Independent

All scenarios modify the same component IDs in the same order.

Each scenario starts from the original input dataset, without carrying over changes from previous scenarios. A reset is therefore required between scenarios.

Dependent

Different scenarios may modify different components.

Each scenario starts from the original input dataset, without carrying over changes from previous scenarios. A reset is therefore required between scenarios.
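The distinction between independent and dependent batches can be expressed as a simple check on the component IDs of each scenario. The helper `is_independent` below is hypothetical and only mirrors the definition given here; it makes no claim about how the library performs this check internally.

```python
def is_independent(scenario_ids):
    """True if every scenario lists the same component IDs in the same order."""
    if not scenario_ids:
        return True
    first = list(scenario_ids[0])
    return all(list(ids) == first for ids in scenario_ids[1:])


# Every scenario updates components 4 and 5, in that order.
independent_batch = [[4, 5], [4, 5], [4, 5]]

# Order differs in the second scenario and a different component appears in the third.
dependent_batch = [[4, 5], [5, 4], [6]]

print(is_independent(independent_batch), is_independent(dependent_batch))
```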

Component Data Uniformity

Defines whether all scenarios contain the same number of component entries. Uniformity is independent of buffer representation.

Uniform

All scenarios contain the same number of component entries.

  • Dense buffers are always uniform (by construction)

  • Sparse buffers may also be uniform

Non-uniform

Scenarios contain different numbers of component entries.

  • Only possible in sparse representation
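For a sparse buffer, uniformity can be read off the index pointer: the buffer is uniform when all consecutive differences are equal. The sketch below assumes the half-open range convention for `indptr`; both function names are hypothetical.

```python
def entries_per_scenario(indptr):
    """Number of component entries in each scenario."""
    return [stop - start for start, stop in zip(indptr, indptr[1:])]


def is_uniform(indptr):
    """True if every scenario holds the same number of entries."""
    return len(set(entries_per_scenario(indptr))) <= 1


uniform_indptr = [0, 2, 4, 6]      # 2 entries in each of 3 scenarios
non_uniform_indptr = [0, 1, 3, 3]  # 1, 2 and 0 entries

print(is_uniform(uniform_indptr), is_uniform(non_uniform_indptr))
```

A uniform sparse buffer carries the same information as a dense one with shape (number of scenarios, entries per scenario).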

Serialization Representation

Defines how datasets are serialized. Three serialization representations are supported: compact list, named map, and mixed.

Compact List

Uses positional arrays instead of named attributes. The names of the attributes present in the dataset are stored separately from the values.

Generated when using compact_list=True.

Named Map

Uses explicit attribute names per component.

Mixed

Combination of compact list and named map (only possible in manual construction, e.g. validation datasets).
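The difference between the named map and compact list forms can be sketched with plain Python and the standard `json` module. The helper `to_compact_list` and the top-level keys `attributes` and `data` used for printing are assumptions made for this illustration; consult the serialization documentation for the exact schema.

```python
import json


def to_compact_list(records):
    """Split named-map records into (attribute names, positional rows)."""
    # Collect attribute names in first-seen order.
    attributes = []
    for record in records:
        for name in record:
            if name not in attributes:
                attributes.append(name)
    # Attributes missing from a record become None (JSON null).
    rows = [[record.get(name) for name in attributes] for record in records]
    return attributes, rows


# Named map: every component entry spells out its attribute names.
named_map = [{"id": 1, "u_rated": 10500.0}, {"id": 2, "u_rated": 400.0}]

# Compact list: names are stored once, values become positional rows.
attributes, rows = to_compact_list(named_map)
print(json.dumps({"attributes": {"node": attributes}, "data": {"node": rows}}))
```

The compact form avoids repeating attribute names for every entry, at the cost of being harder to read and edit by hand.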

Data structures

    graph TD
        subgraph Other numpy arrays
            IndexPointer
            SingleColumn
            BatchColumn
        end

        subgraph Datasets
            Dataset --> SingleDataset
            Dataset --> BatchDataset
        end

        click Dataset href "../api_reference/python-api-reference.html#power_grid_model.data_types.Dataset"
        click SingleDataset href "../api_reference/python-api-reference.html#power_grid_model.data_types.SingleDataset"
        click BatchDataset href "../api_reference/python-api-reference.html#power_grid_model.data_types.BatchDataset"

        click IndexPointer href "../api_reference/python-api-reference.html#power_grid_model.data_types.IndexPointer"
        click SingleColumn href "../api_reference/python-api-reference.html#power_grid_model.data_types.SingleColumn"
        click BatchColumn href "../api_reference/python-api-reference.html#power_grid_model.data_types.BatchColumn"
    
    graph TD
        subgraph Dataset values
            ComponentData --> DataArray
            ComponentData --> ColumnarData

            DataArray --> SingleArray
            DataArray --> BatchArray

            BatchArray --> DenseBatchArray
            BatchArray --> SparseBatchArray

            ColumnarData --> SingleColumnarData
            ColumnarData --> BatchColumnarData

            BatchColumnarData --> DenseBatchColumnarData
            BatchColumnarData --> SparseBatchColumnarData
        end

        click ComponentData href "../api_reference/python-api-reference.html#power_grid_model.data_types.ComponentData"
        click DataArray href "../api_reference/python-api-reference.html#power_grid_model.data_types.DataArray"
        click ColumnarData href "../api_reference/python-api-reference.html#power_grid_model.data_types.ColumnarData"
        click SingleArray href "../api_reference/python-api-reference.html#power_grid_model.data_types.SingleArray"
        click BatchArray href "../api_reference/python-api-reference.html#power_grid_model.data_types.BatchArray"
        click DenseBatchArray href "../api_reference/python-api-reference.html#power_grid_model.data_types.DenseBatchArray"
        click SparseBatchArray href "../api_reference/python-api-reference.html#power_grid_model.data_types.SparseBatchArray"
        click SingleColumnarData href "../api_reference/python-api-reference.html#power_grid_model.data_types.SingleColumnarData"
        click BatchColumnarData href "../api_reference/python-api-reference.html#power_grid_model.data_types.BatchColumnarData"
        click DenseBatchColumnarData href "../api_reference/python-api-reference.html#power_grid_model.data_types.DenseBatchColumnarData"
        click SparseBatchColumnarData href "../api_reference/python-api-reference.html#power_grid_model.data_types.SparseBatchColumnarData"

    
  • Dataset: Either a single or a batch dataset. It is a dictionary with the component types (e.g., line, node) as keys and ComponentData as values.

    • SingleDataset: A data type storing input data (i.e., all elements of all components) for a single scenario.

    • BatchDataset: A data type storing update and/or output data for one or more scenarios. A batch dataset can contain dense or sparse representations per component.

  • ComponentData: The data corresponding to the component.

    • DataArray: A data array can be a single or a batch array. It is a numpy structured array.

      • SingleArray: A 1D numpy structured array corresponding to a single dataset.

      • BatchArray: Multiple batches of data can be represented in sparse or dense forms.

        • DenseBatchArray: A 2D structured numpy array containing a list of components of the same type for each scenario. This implies all scenarios contain the same number of components (uniform structure).

        • SparseBatchArray: A typed dictionary with two keys: indptr holds a 1D numpy array of IndexPointer type, and data holds a SingleArray containing all components flattened across scenarios. Scenario boundaries are defined by indptr.

    • ColumnarData: A dictionary of attributes as keys and individual numpy arrays as values. This format is described in more detail in Native Data Interface.

      • SingleColumnarData: A dictionary of attributes as keys and SingleColumn as values in a single dataset.

      • BatchColumnarData: Multiple batches of data can be represented in sparse or dense forms.

        • DenseBatchColumnarData: A dictionary of attributes as keys and 2D/3D numpy arrays of BatchColumn type as values in a batch dataset.

        • SparseBatchColumnarData: A typed dictionary with two keys: indptr holds a 1D numpy array of IndexPointer type, and data holds a dictionary of attributes as keys and SingleColumn as values, containing all components flattened across scenarios.

  • IndexPointer: A 1D numpy array of int64 type used to specify sparse batches. It indicates the range of components within each scenario: scenario i covers the elements from indptr[i] up to, but not including, indptr[i + 1]. For example, an index pointer of [0, 1, 3, 3] indicates 3 scenarios, with the element at index 0 in the 1st scenario, the elements at indices [1, 2] in the 2nd scenario, and no elements in the 3rd scenario.

  • SingleColumn: A 1D/2D numpy array of values corresponding to a specific attribute.

  • BatchColumn: A 2D/3D numpy array of values corresponding to a specific attribute.
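The types in the list above can be pictured with plain NumPy objects. This is only a sketch: the dtype `load_update_dtype` is hand-written and simplified (an assumption), and the component and attribute names are used for illustration only.

```python
import numpy as np

# Hypothetical, simplified update dtype (assumption); the library defines the real one.
load_update_dtype = np.dtype([("id", np.int32), ("p_specified", np.float64)])

# DenseBatchArray-like: 2D structured array, shape (scenarios, components per scenario).
dense_batch = np.zeros((3, 2), dtype=load_update_dtype)

# SparseBatchArray-like: flattened 1D data plus an int64 index pointer.
sparse_batch = {
    "indptr": np.array([0, 1, 3, 3], dtype=np.int64),
    "data": np.zeros(3, dtype=load_update_dtype),
}
sparse_batch["data"]["id"] = [7, 7, 8]

# A batch dataset maps component types to component data.
batch_dataset = {"sym_load": sparse_batch}

# Scenario i covers data[indptr[i]:indptr[i + 1]].
indptr = sparse_batch["indptr"]
ids_per_scenario = [
    sparse_batch["data"]["id"][indptr[i]:indptr[i + 1]].tolist()
    for i in range(len(indptr) - 1)
]
print(ids_per_scenario)
```

Here the index pointer has four values and therefore describes three scenarios, the last of which is empty.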

Dimensions of numpy arrays

The dimensions of numpy arrays and the interpretation of each dimension are as follows.

| Data Type | 1D | 2D | 3D |
| --- | --- | --- | --- |
| SingleArray | Corresponds to a single dataset. | | |
| DenseBatchArray | | Batch number \(\times\) Component within that batch | |
| SingleColumn | Component within that batch. | Component within that batch \(\times\) Phases ✨ | |
| BatchColumn | | Batch number \(\times\) Component within that batch | Batch number \(\times\) Component within that batch \(\times\) Phases ✨ |

Note

✨ The “Phases” dimension is optional and is available only when the attributes are asymmetric.
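The shapes described above can be checked with a few NumPy arrays. The sizes are arbitrary and chosen only for illustration; the arrays stand in for attribute columns and are not created through the library.

```python
import numpy as np

n_scenarios, n_components, n_phases = 4, 2, 3

# SingleColumn: 1D for a symmetric attribute, 2D when the phase axis is present.
single_sym = np.zeros(n_components)
single_asym = np.zeros((n_components, n_phases))

# BatchColumn: the scenario (batch) axis comes first.
batch_sym = np.zeros((n_scenarios, n_components))
batch_asym = np.zeros((n_scenarios, n_components, n_phases))

shapes = {
    "SingleColumn, symmetric": single_sym.shape,
    "SingleColumn, asymmetric": single_asym.shape,
    "BatchColumn, symmetric": batch_sym.shape,
    "BatchColumn, asymmetric": batch_asym.shape,
}
for name, shape in shapes.items():
    print(name, shape)
```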

Type of Dataset

The types of Dataset include the following: input, update, sym_output, asym_output, and sc_output. They are included under the enum DatasetType. The example attributes below are taken from a dataset containing a line component.

  • input: Contains attributes relevant to configuration of grid.

    • Example: id, from_node, from_status

  • update: Contains attributes relevant to multiple scenarios.

    • Example: from_status, to_status

  • sym_output: Contains attributes relevant to symmetrical steady state output of power flow or state estimation calculation.

    • Example: p_from, p_to

  • asym_output: Contains attributes relevant to asymmetrical steady state output of power flow or state estimation calculation. Attributes are similar to sym_output except some values of the asymmetrical dataset will contain detailed data for all 3 phases individually.

    • Example: p_from, p_to

  • sc_output: Contains attributes relevant to symmetrical short circuit calculation output. As with asym_output, detailed data for all 3 phases will be provided where relevant.

    • Example: i_from, i_from_angle

Attributes of Components

| Attribute | Description |
| --- | --- |
| name | Name of the attribute. It is exactly the same as the attribute name in power_grid_model.power_grid_meta_data. Attribute names are included under the enum AttributeType. |
| data type | Data type of the attribute. It is either a type from the table in Native Data Interface, or an enumeration as defined above. There are two special data types that are independent of one another, namely RealValueInput and RealValueOutput. RealValueInput is used for some input attributes. It is a double for a symmetric class (e.g. sym_load) and double[3] for an asymmetric class (e.g. asym_load). It is explained in detail in the corresponding types. RealValueOutput is used for many output attributes. It is a double in symmetric calculation and double[3] for asymmetric and short circuit calculations. |
| unit | Unit of the attribute, if applicable. As a general rule, only standard SI units without any prefix are used. |
| description | Description of the attribute. |
| required | Whether the attribute is required. If not, it is optional. Note that if you choose not to specify an optional attribute, it should have the null value as defined in Basic Data Types. |
| update | Whether the attribute can be mutated by the update call PowerGridModel.update on an existing instance. Only applicable when this attribute is part of an input dataset. |
| valid values | If applicable, an indication of which values are valid for the input data. |