Time Series Aggregation¶
In this example, we are going to explore different ways to cluster the temporal resolution of PyPSA models, and what impact they have on the optimisation results and solving times. Using an hourly resolved variant of the single-node capacity expansion example, we will compare three different approaches to reduce the number of time steps in the model:
- Sampling: Selecting a subset of the given snapshots based on a given frequency,
- Averaging: Aggregating the snapshots by averaging them over a given frequency, and
- Segmentation: Clustering the snapshots into segments of a given frequency and using
tsamlibrary.
We start with the usual imports and loading the hourly resolved model.
import logging
import time
import numpy as np
import pandas as pd
import tsam.timeseriesaggregation as tsam
import pypsa
logging.getLogger().setLevel(logging.WARNING)
SOLVER = "highs" # or "gurobi"
template_n = pypsa.Network(
"https://tubcloud.tu-berlin.de/s/4ra3NKrLGzE42of/download/model-energy-hourly.nc"
)
INFO:pypsa.network.io:Retrieving network data from https://tubcloud.tu-berlin.de/s/4ra3NKrLGzE42of/download/model-energy-hourly.nc.
WARNING:pypsa.network.io:Importing network from PyPSA version v0.35.0 while current version is v1.0.3. Read the release notes at `https://go.pypsa.org/release-notes` to prepare your network for import.
INFO:pypsa.network.io:Imported network 'Unnamed Network' has buses, carriers, generators, links, loads, storage_units, stores
Since we also want to monitor the solving times, we will use a small utility function that wraps around the solving process and returns the seconds it took to solve the model:
def time_it(func, *args, **kwargs):
"""Time the execution of a function and return the elapsed time in seconds."""
start_time = time.time()
result = func(*args, **kwargs)
elapsed_time = time.time() - start_time
return result, elapsed_time
logging.getLogger().setLevel(logging.WARNING)
Hourly Baseline¶
Additionally, we need a baseline model to compare aggregated models against. We will use the hourly resolved model for this.
n_hourly = template_n.copy()
_, s_hourly = time_it(n_hourly.optimize, solver_name=SOLVER, log_to_console=False)
s_hourly
Writing constraints.: 0%| | 0/21 [00:00<?, ?it/s]
Writing constraints.: 38%|███▊ | 8/21 [00:00<00:00, 49.52it/s]
Writing constraints.: 62%|██████▏ | 13/21 [00:00<00:00, 34.75it/s]
Writing constraints.: 81%|████████ | 17/21 [00:00<00:00, 34.69it/s]
Writing constraints.: 100%|██████████| 21/21 [00:00<00:00, 22.35it/s]
Writing constraints.: 100%|██████████| 21/21 [00:00<00:00, 27.06it/s]
Writing continuous variables.: 0%| | 0/11 [00:00<?, ?it/s]
Writing continuous variables.: 82%|████████▏ | 9/11 [00:00<00:00, 79.69it/s]
Writing continuous variables.: 100%|██████████| 11/11 [00:00<00:00, 77.42it/s]
78.80059361457825
Sampling¶
We will start with the sampling approach, which is the simplest one. We simply select every $N$-th snapshot from the model. The important part here is that we need to adjust the snapshot weightings accordingly, as each remaining snapshot now represents $N$ hours. We iterate over N from 2 to 11, i.e. from 2-hourly to 11-hourly resolved models.
sampling_n = {1: n_hourly}
sampling_s = {1: s_hourly}
for resolution in range(2, 12):
n = template_n.copy()
# set the sampled snapshots (time series are automatically reduced)
n.set_snapshots(n.snapshots[::resolution])
n.snapshot_weightings.loc[:, :] = resolution
_, s = time_it(n.optimize, solver_name=SOLVER, log_to_console=False)
sampling_n[resolution] = n
sampling_s[resolution] = s
Writing constraints.: 0%| | 0/21 [00:00<?, ?it/s]
Writing constraints.: 43%|████▎ | 9/21 [00:00<00:00, 80.12it/s]
Writing constraints.: 86%|████████▌ | 18/21 [00:00<00:00, 63.31it/s]
Writing constraints.: 100%|██████████| 21/21 [00:00<00:00, 50.87it/s]
Writing continuous variables.: 0%| | 0/11 [00:00<?, ?it/s]
Writing continuous variables.: 100%|██████████| 11/11 [00:00<00:00, 144.28it/s]
Writing constraints.: 0%| | 0/21 [00:00<?, ?it/s]
Writing constraints.: 48%|████▊ | 10/21 [00:00<00:00, 93.48it/s]
Writing constraints.: 95%|█████████▌| 20/21 [00:00<00:00, 69.59it/s]
Writing constraints.: 100%|██████████| 21/21 [00:00<00:00, 70.89it/s]
Writing continuous variables.: 0%| | 0/11 [00:00<?, ?it/s]
Writing continuous variables.: 100%|██████████| 11/11 [00:00<00:00, 203.31it/s]
Writing constraints.: 0%| | 0/21 [00:00<?, ?it/s]
Writing constraints.: 57%|█████▋ | 12/21 [00:00<00:00, 115.14it/s]
Writing constraints.: 100%|██████████| 21/21 [00:00<00:00, 90.01it/s]
Writing continuous variables.: 0%| | 0/11 [00:00<?, ?it/s]
Writing continuous variables.: 100%|██████████| 11/11 [00:00<00:00, 258.17it/s]
Writing constraints.: 0%| | 0/21 [00:00<?, ?it/s]
Writing constraints.: 67%|██████▋ | 14/21 [00:00<00:00, 133.95it/s]
Writing constraints.: 100%|██████████| 21/21 [00:00<00:00, 106.91it/s]
Writing continuous variables.: 0%| | 0/11 [00:00<?, ?it/s]
Writing continuous variables.: 100%|██████████| 11/11 [00:00<00:00, 308.27it/s]
Writing constraints.: 0%| | 0/21 [00:00<?, ?it/s]
Writing constraints.: 76%|███████▌ | 16/21 [00:00<00:00, 150.52it/s]
Writing constraints.: 100%|██████████| 21/21 [00:00<00:00, 122.05it/s]
Writing continuous variables.: 0%| | 0/11 [00:00<?, ?it/s]
Writing continuous variables.: 100%|██████████| 11/11 [00:00<00:00, 344.06it/s]
Writing constraints.: 0%| | 0/21 [00:00<?, ?it/s]
Writing constraints.: 81%|████████ | 17/21 [00:00<00:00, 168.41it/s]
Writing constraints.: 100%|██████████| 21/21 [00:00<00:00, 136.41it/s]
Writing continuous variables.: 0%| | 0/11 [00:00<?, ?it/s]
Writing continuous variables.: 100%|██████████| 11/11 [00:00<00:00, 392.84it/s]
Writing constraints.: 0%| | 0/21 [00:00<?, ?it/s]
Writing constraints.: 86%|████████▌ | 18/21 [00:00<00:00, 179.65it/s]
Writing constraints.: 100%|██████████| 21/21 [00:00<00:00, 149.14it/s]
Writing continuous variables.: 0%| | 0/11 [00:00<?, ?it/s]
Writing continuous variables.: 100%|██████████| 11/11 [00:00<00:00, 427.45it/s]
Averaging¶
The averaging approach also has equal snapshot durations, but instead of selecting data from every $N$-th snapshots, we average time series data for every $N$ snapshots. This means that the resulting model has $N$ times fewer snapshots, but each snapshot represents the average of $N$ original snapshots. Again, we need to adjust the snapshot weightings accordingly.
averaging_n = {1: n_hourly}
averaging_s = {1: s_hourly}
for resolution in range(2, 12):
n = template_n.copy()
# resample the time series data by averaging
n.loads_t.p_set = n.loads_t.p_set.resample(f"{resolution}h").mean()
n.generators_t.p_max_pu = n.generators_t.p_max_pu.resample(f"{resolution}h").mean()
# set the new snapshtos and adjusted snapshot weightings
n.set_snapshots(n.snapshots[::resolution])
n.snapshot_weightings.loc[:, :] = resolution
_, s = time_it(n.optimize, solver_name=SOLVER, log_to_console=False)
averaging_n[resolution] = n
averaging_s[resolution] = s
Writing constraints.: 0%| | 0/21 [00:00<?, ?it/s]
Writing constraints.: 43%|████▎ | 9/21 [00:00<00:00, 80.23it/s]
Writing constraints.: 86%|████████▌ | 18/21 [00:00<00:00, 63.67it/s]
Writing constraints.: 100%|██████████| 21/21 [00:00<00:00, 50.74it/s]
Writing continuous variables.: 0%| | 0/11 [00:00<?, ?it/s]
Writing continuous variables.: 100%|██████████| 11/11 [00:00<00:00, 146.71it/s]
Writing constraints.: 0%| | 0/21 [00:00<?, ?it/s]
Writing constraints.: 48%|████▊ | 10/21 [00:00<00:00, 93.80it/s]
Writing constraints.: 95%|█████████▌| 20/21 [00:00<00:00, 69.96it/s]
Writing constraints.: 100%|██████████| 21/21 [00:00<00:00, 71.52it/s]
Writing continuous variables.: 0%| | 0/11 [00:00<?, ?it/s]
Writing continuous variables.: 100%|██████████| 11/11 [00:00<00:00, 207.32it/s]
Writing constraints.: 0%| | 0/21 [00:00<?, ?it/s]
Writing constraints.: 57%|█████▋ | 12/21 [00:00<00:00, 114.42it/s]
Writing constraints.: 100%|██████████| 21/21 [00:00<00:00, 90.84it/s]
Writing continuous variables.: 0%| | 0/11 [00:00<?, ?it/s]
Writing continuous variables.: 100%|██████████| 11/11 [00:00<00:00, 256.15it/s]
Writing constraints.: 0%| | 0/21 [00:00<?, ?it/s]
Writing constraints.: 67%|██████▋ | 14/21 [00:00<00:00, 134.15it/s]
Writing constraints.: 100%|██████████| 21/21 [00:00<00:00, 107.84it/s]
Writing continuous variables.: 0%| | 0/11 [00:00<?, ?it/s]
Writing continuous variables.: 100%|██████████| 11/11 [00:00<00:00, 307.41it/s]
Writing constraints.: 0%| | 0/21 [00:00<?, ?it/s]
Writing constraints.: 76%|███████▌ | 16/21 [00:00<00:00, 150.61it/s]
Writing constraints.: 100%|██████████| 21/21 [00:00<00:00, 122.46it/s]
Writing continuous variables.: 0%| | 0/11 [00:00<?, ?it/s]
Writing continuous variables.: 100%|██████████| 11/11 [00:00<00:00, 351.12it/s]
Writing constraints.: 0%| | 0/21 [00:00<?, ?it/s]
Writing constraints.: 81%|████████ | 17/21 [00:00<00:00, 165.46it/s]
Writing constraints.: 100%|██████████| 21/21 [00:00<00:00, 134.71it/s]
Writing continuous variables.: 0%| | 0/11 [00:00<?, ?it/s]
Writing continuous variables.: 100%|██████████| 11/11 [00:00<00:00, 388.93it/s]
Writing constraints.: 0%| | 0/21 [00:00<?, ?it/s]
Writing constraints.: 86%|████████▌ | 18/21 [00:00<00:00, 178.22it/s]
Writing constraints.: 100%|██████████| 21/21 [00:00<00:00, 148.28it/s]
Writing continuous variables.: 0%| | 0/11 [00:00<?, ?it/s]
Writing continuous variables.: 100%|██████████| 11/11 [00:00<00:00, 415.53it/s]
Segmentation¶
The segmentation approach is more complex. It uses a separate library called tsam to cluster the snapshots into segments of varying lengths. The sequence of snapshots is preserved as segments are only formed from neighbouring snapshots based on their similarity. For measuring similarity, it is advisable to normalise the time series data. This approach promises to capture the temporal patterns more effectively, as it can opt for higher resolution during periods of high variability and lower resolution during periods with low variability. The snapshot weightings are adjusted based on the number of snapshots in each segment.
segmentation_n = {1: n_hourly}
segmentation_s = {1: s_hourly}
for resolution in range(2, 12):
n = template_n.copy()
# calculate number of segments equivalent to resolution
segments = int(8760 / resolution)
# concatenate and normalize all time series with min-max normalization
df = pd.concat([n.generators_t.p_max_pu, n.loads_t.p_set], axis=1)
df_norm = (df - df.min()) / (df.max() - df.min())
# use `tsam` to run segmentation clustering algorithm
agg = tsam.TimeSeriesAggregation(
df_norm,
hoursPerPeriod=len(df_norm),
noTypicalPeriods=1,
noSegments=segments,
segmentation=True,
solver=SOLVER,
)
agg = agg.createTypicalPeriods()
# translate segments into time stamps and calculate new weightings
weightings = agg.index.get_level_values("Segment Duration")
offsets = np.insert(np.cumsum(weightings[:-1]), 0, 0)
weightings = n.snapshot_weightings.loc[n.snapshots[offsets]].mul(weightings, axis=0)
# aggregate the hourly time series by averaging over the segments
mapping = (
pd.Series(weightings.index, index=weightings.index).reindex(n.snapshots).ffill()
)
n.generators_t.p_max_pu = n.generators_t.p_max_pu.groupby(mapping).mean()
n.loads_t.p_set = n.loads_t.p_set.groupby(mapping).mean()
# set new segmented snapshots and adjust weightings
n.set_snapshots(weightings.index)
n.snapshot_weightings = weightings
# run optimization
_, s = time_it(n.optimize, solver_name=SOLVER, log_to_console=False)
segmentation_n[resolution] = n
segmentation_s[resolution] = s
Writing constraints.: 0%| | 0/21 [00:00<?, ?it/s]
Writing constraints.: 43%|████▎ | 9/21 [00:00<00:00, 78.32it/s]
Writing constraints.: 81%|████████ | 17/21 [00:00<00:00, 62.72it/s]
Writing constraints.: 100%|██████████| 21/21 [00:00<00:00, 49.85it/s]
Writing continuous variables.: 0%| | 0/11 [00:00<?, ?it/s]
Writing continuous variables.: 100%|██████████| 11/11 [00:00<00:00, 144.06it/s]
Writing constraints.: 0%| | 0/21 [00:00<?, ?it/s]
Writing constraints.: 48%|████▊ | 10/21 [00:00<00:00, 91.04it/s]
Writing constraints.: 95%|█████████▌| 20/21 [00:00<00:00, 67.64it/s]
Writing constraints.: 100%|██████████| 21/21 [00:00<00:00, 69.17it/s]
Writing continuous variables.: 0%| | 0/11 [00:00<?, ?it/s]
Writing continuous variables.: 100%|██████████| 11/11 [00:00<00:00, 205.22it/s]
Writing constraints.: 0%| | 0/21 [00:00<?, ?it/s]
Writing constraints.: 57%|█████▋ | 12/21 [00:00<00:00, 113.14it/s]
Writing constraints.: 100%|██████████| 21/21 [00:00<00:00, 89.39it/s]
Writing continuous variables.: 0%| | 0/11 [00:00<?, ?it/s]
Writing continuous variables.: 100%|██████████| 11/11 [00:00<00:00, 255.07it/s]
Writing constraints.: 0%| | 0/21 [00:00<?, ?it/s]
Writing constraints.: 67%|██████▋ | 14/21 [00:00<00:00, 132.17it/s]
Writing constraints.: 100%|██████████| 21/21 [00:00<00:00, 106.34it/s]
Writing continuous variables.: 0%| | 0/11 [00:00<?, ?it/s]
Writing continuous variables.: 100%|██████████| 11/11 [00:00<00:00, 301.13it/s]
Writing constraints.: 0%| | 0/21 [00:00<?, ?it/s]
Writing constraints.: 76%|███████▌ | 16/21 [00:00<00:00, 149.13it/s]
Writing constraints.: 100%|██████████| 21/21 [00:00<00:00, 121.65it/s]
Writing continuous variables.: 0%| | 0/11 [00:00<?, ?it/s]
Writing continuous variables.: 100%|██████████| 11/11 [00:00<00:00, 350.87it/s]
Writing constraints.: 0%| | 0/21 [00:00<?, ?it/s]
Writing constraints.: 76%|███████▌ | 16/21 [00:00<00:00, 155.96it/s]
Writing constraints.: 100%|██████████| 21/21 [00:00<00:00, 129.46it/s]
Writing continuous variables.: 0%| | 0/11 [00:00<?, ?it/s]
Writing continuous variables.: 100%|██████████| 11/11 [00:00<00:00, 380.94it/s]
Writing constraints.: 0%| | 0/21 [00:00<?, ?it/s]
Writing constraints.: 86%|████████▌ | 18/21 [00:00<00:00, 174.93it/s]
Writing constraints.: 100%|██████████| 21/21 [00:00<00:00, 145.33it/s]
Writing continuous variables.: 0%| | 0/11 [00:00<?, ?it/s]
Writing continuous variables.: 100%|██████████| 11/11 [00:00<00:00, 416.17it/s]
Now before we go ahead with the evaluation of the different approaches, let's quickly glance at the distribution of snapshot durations obtained from the segmentation approach for a resolution equivalent to a 3-hourly model. We can see quite some variability in the segment length.
segmentation_n[3].snapshot_weightings.generators.value_counts().sort_index(
ascending=True
).plot.bar(ylabel="snapshots [number]", xlabel="snapshot duration [h]")
<Axes: xlabel='snapshot duration [h]', ylabel='snapshots [number]'>
Evaluation¶
Let's start our evaluation with a look at the solving times. We can see that across all approaches, the solving times quickly decay, especially as we go from hourly to 2-hourly resolved models and decrease less substantially afterwards.
pd.Series(sampling_s).plot(label="sampling", legend=True)
pd.Series(averaging_s).plot(label="averaging", legend=True)
pd.Series(segmentation_s).plot(
label="segmentation", ylabel="time [s]", xlabel="resolution [h]", legend=True
)
<Axes: xlabel='resolution [h]', ylabel='time [s]'>
Furthermore, we compare the relative error in total system costs compared to the hourly resolved model. We can see how the segmentation approach remains more stable than the other two approaches, especially for lower resolutions. Even with 11-hourly equivalent resolution, the segmentation approach only has a relative error of -1% compared to the hourly resolved model (i.e. it appears to be 1% cheaper). Similar patterns can be observed for the relative error in the total installed capacity of solar and batteries, two technologies that are particularly sensitive to the temporal resolution of the model.
def tsc(n):
return (n.statistics.capex().sum() + n.statistics.opex().sum()) / 1e9
pd.concat(
[
pd.Series({("sampling", res): tsc(n) for res, n in sampling_n.items()}),
pd.Series({("averaging", res): tsc(n) for res, n in averaging_n.items()}),
pd.Series({("segmentation", res): tsc(n) for res, n in segmentation_n.items()}),
]
).unstack(0).div(tsc(n_hourly)).sub(1).mul(100).plot(
ylabel="relative objective error [%]", xlabel="resolution [h]", legend=True
)
<Axes: xlabel='resolution [h]', ylabel='relative objective error [%]'>
def solar(n):
return n.generators.loc["solar", "p_nom_opt"]
pd.concat(
[
pd.Series({("sampling", res): solar(n) for res, n in sampling_n.items()}),
pd.Series({("averaging", res): solar(n) for res, n in averaging_n.items()}),
pd.Series(
{("segmentation", res): solar(n) for res, n in segmentation_n.items()}
),
]
).unstack(0).div(solar(n_hourly)).sub(1).mul(100).plot(
ylabel="relative solar capacity error [%]", xlabel="resolution [h]", legend=True
)
<Axes: xlabel='resolution [h]', ylabel='relative solar capacity error [%]'>
def battery(n):
return n.storage_units.loc["battery storage", "p_nom_opt"]
pd.concat(
[
pd.Series({("sampling", res): battery(n) for res, n in sampling_n.items()}),
pd.Series({("averaging", res): battery(n) for res, n in averaging_n.items()}),
pd.Series(
{("segmentation", res): battery(n) for res, n in segmentation_n.items()}
),
]
).unstack(0).div(battery(n_hourly)).sub(1).mul(100).plot(
ylabel="relative battery capacity error [%]", xlabel="resolution [h]", legend=True
)
<Axes: xlabel='resolution [h]', ylabel='relative battery capacity error [%]'>