Dataset Terminology
Some terms regarding the data structures are explained here, including the definition of dataset, component, and
attribute.
For detailed data types used throughout power-grid-model, please refer to
Python API Reference.
Buffer Type
Defines how component data is ordered in memory. Two buffer types are supported: row-based and columnar.
Row (row-based, row-major)
Attributes of the same component are stored contiguously before moving to the next component.
Columnar (column-based, column-major)
Values of the same attribute are stored contiguously across all components, with one array per attribute.
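The difference between the two buffer types can be sketched with plain numpy. This is a minimal illustration, assuming a hypothetical two-attribute node dtype rather than the real power-grid-model dtype: the row-based buffer is a numpy structured array, and the columnar buffer is a dictionary holding one plain array per attribute.

```python
import numpy as np

# Hypothetical, simplified node dtype; real power-grid-model dtypes have more fields.
node_dtype = np.dtype([("id", np.int32), ("u_rated", np.float64)])

# Row-based: one structured array, each element holds all attributes of one node.
row_based = np.array([(1, 10.5e3), (2, 0.4e3)], dtype=node_dtype)

# Columnar: one contiguous plain array per attribute, spanning all nodes.
columnar = {name: np.ascontiguousarray(row_based[name]) for name in node_dtype.names}

# In the row-based buffer, consecutive u_rated values are one full record apart (12 bytes);
# in the columnar buffer they are adjacent (8 bytes).
print(row_based["u_rated"].strides, columnar["u_rated"].strides)
```

The strides make the memory layout visible: reading one attribute from a row-based buffer skips over the other attributes of each record, while a columnar buffer reads it sequentially.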
Buffer Representation
Defines whether component data can be interpreted as a dense 2D matrix.
Dense
Dense buffers represent data as a rectangular matrix. This representation implies that all scenarios contain the same number of component entries.
Sparse
Component data is stored as a flattened 1D buffer.
Scenario boundaries are defined using an index pointer (indptr).
The indptr defines how the flattened buffer is segmented into per-scenario ranges.
Sparse buffers may be either uniform or non-uniform.
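A dense batch can always be flattened into the sparse layout by concatenating the scenarios and recording where each one starts. The sketch below uses one float value per component for brevity; the helper name dense_to_sparse and the data are hypothetical and only illustrate how indptr segments the flattened buffer.

```python
import numpy as np

# Dense: a rectangular (n_scenarios x n_components) matrix.
# Hypothetical values for 2 components in 3 scenarios.
dense = np.array([[1.0, 2.0],
                  [3.0, 4.0],
                  [5.0, 6.0]])

def dense_to_sparse(dense_matrix):
    """Flatten a dense batch into (indptr, data), a sketch of the sparse layout."""
    n_scenarios, n_components = dense_matrix.shape
    data = dense_matrix.reshape(-1)  # scenarios laid out one after another
    # Scenario i occupies data[indptr[i]:indptr[i + 1]].
    indptr = np.arange(n_scenarios + 1, dtype=np.int64) * n_components
    return indptr, data

indptr, data = dense_to_sparse(dense)

# A non-uniform sparse batch: scenario 0 has 1 entry, scenario 1 has 3, scenario 2 has none.
indptr_nonuniform = np.array([0, 1, 4, 4], dtype=np.int64)
data_nonuniform = np.array([7.0, 8.0, 9.0, 10.0])
```

The reverse direction is only possible when every segment has the same length, which is why a dense buffer always implies uniform data.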
Component Dataset Independency
Defines whether all scenarios operate on the same component IDs.
Independent
All scenarios modify the same component IDs in the same order.
Each scenario starts from the original input dataset, without carrying over changes from previous scenarios; therefore, a reset is required between scenarios.
Dependent
Different scenarios may modify different components.
As with independent datasets, each scenario starts from the original input dataset without carrying over changes from previous scenarios, so a reset is required between scenarios.
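Whether a sparse batch is independent follows directly from the definition above: every scenario must list the same component IDs in the same order. The helper is_independent below is hypothetical and is a sketch of that check on a flattened id column plus its index pointer, not the check power-grid-model performs internally.

```python
import numpy as np

def is_independent(indptr, ids):
    """Hypothetical helper: True if every scenario updates the same IDs in the same order.

    indptr: index pointer of a sparse batch; ids: flattened "id" values of the update data.
    """
    counts = np.diff(indptr)
    if counts.size == 0:
        return True  # no scenarios at all
    if np.any(counts != counts[0]):
        return False  # different number of entries, so the ID lists cannot match
    first = ids[indptr[0]:indptr[1]]
    # Compare every later scenario against the first one, position by position.
    return all(
        np.array_equal(ids[indptr[i]:indptr[i + 1]], first)
        for i in range(1, len(indptr) - 1)
    )
```

For example, IDs [10, 11] in both of two scenarios give an independent batch, whereas [10, 11] followed by [11, 10] is dependent because the order differs.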
Component Data Uniformity
Defines whether all scenarios contain the same number of component entries. Uniformity is a property of the data itself and is defined independently of the buffer representation.
Uniform
All scenarios contain the same number of component entries.
Dense buffers are always uniform (by construction)
Sparse buffers may also be uniform
Non-uniform
Scenarios contain different numbers of component entries.
Only possible in sparse representation
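For a sparse buffer, uniformity can be read off the index pointer: the differences between consecutive indptr entries are the per-scenario counts. The sketch below uses hypothetical helper names and also shows that only a uniform sparse buffer can be reshaped into the dense representation.

```python
import numpy as np

def is_uniform(indptr):
    """Return True if every scenario in a sparse batch has the same number of entries."""
    counts = np.diff(indptr)  # entries per scenario
    return bool(np.all(counts == counts[0])) if counts.size else True

def sparse_to_dense(indptr, data):
    """Hypothetical helper: reshape a uniform sparse batch into (n_scenarios, n_components)."""
    if not is_uniform(indptr):
        raise ValueError("non-uniform batches have no dense representation")
    counts = np.diff(indptr)
    per_scenario = int(counts[0]) if counts.size else 0
    return data[indptr[0]:indptr[-1]].reshape(len(counts), per_scenario)
```

An index pointer such as [0, 2, 4, 6] is uniform and reshapes to three rows of two entries, while [0, 1, 3, 3] is non-uniform and stays sparse.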
Serialization Representation
Defines how datasets are serialized. Three serialization representations are supported: compact list, named map, and mixed.
Compact List
Uses positional arrays instead of named attributes: the names of the attributes present in the dataset are stored once, separately, and each component lists its values in that order.
Generated when using compact_list=True.
Named Map
Uses explicit attribute names per component.
Mixed
Combination of compact list and named map (only possible in manual construction, e.g. validation datasets).
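The difference between a named map and a compact list can be illustrated with plain JSON. This is a minimal sketch of the idea only: the helper to_compact_list, the key names "attributes" and "data", and the node values are hypothetical and do not reproduce the exact power-grid-model serialization schema. It also assumes every component lists the same attributes.

```python
import json

# Named map: every component carries explicit attribute names (hypothetical node data).
named_map = [
    {"id": 1, "u_rated": 10500.0},
    {"id": 2, "u_rated": 400.0},
]

def to_compact_list(components):
    """Store attribute names once and each component as a positional list.

    Assumes all components share the same attribute keys, in the same order.
    """
    attributes = list(components[0].keys()) if components else []
    rows = [[component[name] for name in attributes] for component in components]
    return attributes, rows

attributes, rows = to_compact_list(named_map)
print(json.dumps({"attributes": attributes, "data": rows}))
```

The compact form avoids repeating the attribute names for every component, which is why it produces smaller files for large datasets.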
Data structures
Diagram (datasets and other numpy arrays): a Dataset is either a SingleDataset or a BatchDataset; the other numpy array types are IndexPointer, SingleColumn, and BatchColumn.
Diagram (dataset values): ComponentData is either a DataArray or ColumnarData. A DataArray is a SingleArray or a BatchArray (DenseBatchArray or SparseBatchArray); ColumnarData is SingleColumnarData or BatchColumnarData (DenseBatchColumnarData or SparseBatchColumnarData).
Dataset: Either a single or a batch dataset. It is a dictionary with the component types (e.g., line, node, etc.) as keys and ComponentData as values.
SingleDataset: A data type storing input data (i.e., all elements of all components) for a single scenario.
BatchDataset: A data type storing update and/or output data for one or more scenarios. A batch dataset can contain dense or sparse representations per component.
ComponentData: The data corresponding to one component type.
DataArray: A data array can be either a single or a batch array. It is a numpy structured array.
SingleArray: A 1D numpy structured array corresponding to a single dataset.
BatchArray: Data for multiple scenarios, represented in either dense or sparse form.
DenseBatchArray: A 2D numpy structured array containing, for each scenario, a list of components of the same type. This implies that all scenarios contain the same number of components (uniform structure).
SparseBatchArray: A typed dictionary with a 1D numpy array of IndexPointer type under the indptr key and a SingleArray under the data key. The SingleArray contains all components flattened across scenarios, with scenario boundaries defined by indptr.
ColumnarData: A dictionary with attribute names as keys and individual numpy arrays as values. This format is described in more detail in Native Data Interface.
SingleColumnarData: A dictionary with attribute names as keys and SingleColumn arrays as values, for a single dataset.
BatchColumnarData: Data for multiple scenarios, represented in either dense or sparse form.
DenseBatchColumnarData: A dictionary with attribute names as keys and 2D/3D numpy arrays of BatchColumn type as values, for a batch dataset.
SparseBatchColumnarData: A typed dictionary with a 1D numpy array of IndexPointer type under the indptr key and SingleColumnarData (one SingleColumn per attribute) under the data key, containing all components flattened across all batches.
IndexPointer: A 1D numpy array of int64 type used to specify sparse batches. It indicates the range of components belonging to each scenario. For example, an index pointer of [0, 1, 3, 3] indicates 3 batches: the element with index 0 in the 1st batch, the elements with indices [1, 2] in the 2nd batch, and no elements in the 3rd batch.
SingleColumn: A 1D/2D numpy array of values corresponding to a specific attribute.
BatchColumn: A 2D/3D numpy array of values corresponding to a specific attribute.
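The index pointer example [0, 1, 3, 3] can be checked by slicing a SparseBatchArray-like dictionary. This sketch assumes a hypothetical, simplified sym_load dtype with only two fields, and the helper scenario is illustrative rather than part of the power-grid-model API.

```python
import numpy as np

# Hypothetical, simplified dtype standing in for a real component dtype.
sym_load_dtype = np.dtype([("id", np.int32), ("p_specified", np.float64)])

# SparseBatchArray-like dict: 3 components flattened across 3 batches.
sparse_batch = {
    "indptr": np.array([0, 1, 3, 3], dtype=np.int64),
    "data": np.array([(5, 1.0e3), (5, 2.0e3), (6, 3.0e3)], dtype=sym_load_dtype),
}

def scenario(batch, i):
    """Return the components of batch i (0-based) as a SingleArray-like view."""
    indptr = batch["indptr"]
    return batch["data"][indptr[i]:indptr[i + 1]]

# An index pointer of length n + 1 describes n batches.
n_batches = len(sparse_batch["indptr"]) - 1
for i in range(n_batches):
    print(i, scenario(sparse_batch, i)["id"].tolist())
```

Because the pointer has four entries, it describes three batches, and the last one is empty since its start and end indices are both 3.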
Dimensions of numpy arrays
The dimensions of numpy arrays and the interpretation of each dimension are as follows.
| Data Type | 1D | 2D | 3D |
|---|---|---|---|
| SingleArray | Corresponds to a single dataset. | ❌ | ❌ |
| DenseBatchArray | ❌ | Batch number \(\times\) Component within that batch | ❌ |
| SingleColumn | Component within that batch. | Component within that batch \(\times\) Phases ✨ | ❌ |
| BatchColumn | ❌ | Batch number \(\times\) Component within that batch | Batch number \(\times\) Component within that batch \(\times\) Phases ✨ |
Note
✨ The “Phases” dimension is optional and is available only when the attributes are asymmetric.
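The shapes in the table can be reproduced with numpy. The sizes and the simplified asymmetric line dtype below are hypothetical; the sketch only shows where the optional phases dimension appears, and that extracting a field from a DenseBatchArray yields an array shaped like a BatchColumn.

```python
import numpy as np

n_batches, n_lines = 4, 3  # hypothetical sizes

# SingleColumn of a symmetric attribute: one value per component -> 1D.
p_from_sym = np.zeros(n_lines)
# SingleColumn of an asymmetric attribute: one value per phase -> 2D.
p_from_asym = np.zeros((n_lines, 3))
# BatchColumn adds a leading batch dimension -> 2D (symmetric) or 3D (asymmetric).
p_from_sym_batch = np.zeros((n_batches, n_lines))
p_from_asym_batch = np.zeros((n_batches, n_lines, 3))

# Row-based equivalent: a 2D DenseBatchArray whose asymmetric field is a (3,) subarray.
asym_line_dtype = np.dtype([("id", np.int32), ("p_from", np.float64, (3,))])
dense_batch = np.zeros((n_batches, n_lines), dtype=asym_line_dtype)
```

Note that the structured array itself stays 2D; the phases dimension only appears when a single asymmetric field is extracted.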
Type of Dataset
The types of Dataset include the following: input, update, sym_output, asym_output, and sc_output.
They are defined as members of the enum DatasetType.
Example attributes are given below for a dataset containing a line component.
input: Contains attributes relevant to the configuration of the grid.
Example:
id, from_node, from_status
update: Contains the mutable attributes used to update the model, for example to define multiple scenarios in a batch calculation.
Example:
from_status, to_status
sym_output: Contains attributes relevant to the symmetrical steady-state output of a power flow or state estimation calculation.
Example:
p_from, p_to
asym_output: Contains attributes relevant to the asymmetrical steady-state output of a power flow or state estimation calculation. Attributes are similar to sym_output, except that some values of the asymmetrical dataset contain detailed data for all 3 phases individually.
Example:
p_from, p_to
sc_output: Contains attributes relevant to symmetrical short circuit calculation output. As for asym_output, detailed data for all 3 phases is provided where relevant.
Example:
i_from, i_from_angle
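The line examples above can be sketched as datasets, i.e. dictionaries keyed by component type. The dtypes here are hypothetical and heavily simplified (real power-grid-model dtypes contain many more attributes); the point is that the same attribute name, such as p_from, holds one value per line in sym_output but one value per phase in asym_output.

```python
import numpy as np

# Hypothetical, heavily simplified line dtypes for each dataset type.
line_input_dtype = np.dtype([("id", np.int32), ("from_node", np.int32), ("from_status", np.int8)])
line_update_dtype = np.dtype([("id", np.int32), ("from_status", np.int8), ("to_status", np.int8)])
line_sym_output_dtype = np.dtype([("id", np.int32), ("p_from", np.float64), ("p_to", np.float64)])
# In asymmetric output, p_from and p_to hold one value per phase.
line_asym_output_dtype = np.dtype(
    [("id", np.int32), ("p_from", np.float64, (3,)), ("p_to", np.float64, (3,))]
)

# A dataset maps component type names to component data (here: 2 lines each).
input_data = {"line": np.zeros(2, dtype=line_input_dtype)}
update_data = {"line": np.zeros(2, dtype=line_update_dtype)}
sym_output = {"line": np.zeros(2, dtype=line_sym_output_dtype)}
asym_output = {"line": np.zeros(2, dtype=line_asym_output_dtype)}
```

Extracting p_from gives a 1D array from the symmetric output and a 2D (line, phase) array from the asymmetric one.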
Attributes of Components
| Attribute | Description |
|---|---|
| name | Name of the attribute. It is exactly the same as the attribute name in |
| data type | Data type of the attribute. It is either a type from the table in Native Data Interface, or an enumeration as defined above. There are two special data types that are independent from one another, namely, |
| unit | Unit of the attribute, if applicable. As a general rule, only standard SI units without any prefix are used. |
| description | Description of the attribute. |
| required | Whether the attribute is required. If not, it is optional. Note that if you choose not to specify an optional attribute, it should have the null value as defined in Basic Data Types. |
| update | Whether the attribute can be mutated by the update call. |
| valid values | If applicable, an indication of which values are valid for the input data. |
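When an optional attribute is left unspecified, it holds the null value. The helper below is a hypothetical sketch that detects such entries, assuming the null convention of NaN for floating-point attributes and the minimum representable value for integer attributes; consult Basic Data Types for the authoritative definitions.

```python
import numpy as np

def is_null(values):
    """Hypothetical helper: mark entries holding the null value.

    Assumes NaN as null for floating-point attributes and the minimum
    representable value of the integer type as null for integer attributes.
    """
    values = np.asarray(values)
    if np.issubdtype(values.dtype, np.floating):
        return np.isnan(values)
    if np.issubdtype(values.dtype, np.integer):
        return values == np.iinfo(values.dtype).min
    raise TypeError(f"unsupported dtype {values.dtype}")
```

Such a check can be used, for instance, to list which optional attributes were actually provided before running a calculation.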