Network Clustering¶

In this section, we show how PyPSA can deal with spatial clustering of networks. We first load the SciGRID example network and reduce it to standard attributes as otherwise the clustering does not work.

The important information needed for spatial clustering is in the busmap. This pandas.Series contains the mapping of which buses from the unclustered network should be grouped together, similar to the groupby argument known from pandas.

You can either calculate a busmap from the provided clustering algorithms or you can provide your own busmap.

In [2]:

Copied!





import cartopy.crs as ccrs
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd

import pypsa

crs = ccrs.LambertConformal(
    central_longitude=10, central_latitude=51, standard_parallels=(47, 55)
)

kwargs = {"boundaries": [6, 15, 47, 55]}

n = pypsa.examples.scigrid_de()

n.calculate_dependent_values()
n.lines = n.lines.reindex(columns=n.components["Line"]["defaults"].index[1:])
n.lines["type"] = "Al/St 240/40 2-bundle 220.0"
n.buses = n.buses.reindex(columns=n.components["Bus"]["defaults"].index[1:])
n.buses["frequency"] = 50
import cartopy.crs as ccrs
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd

import pypsa

crs = ccrs.LambertConformal(
    central_longitude=10, central_latitude=51, standard_parallels=(47, 55)
)

kwargs = {"boundaries": [6, 15, 47, 55]}

n = pypsa.examples.scigrid_de()

n.calculate_dependent_values()
n.lines = n.lines.reindex(columns=n.components["Line"]["defaults"].index[1:])
n.lines["type"] = "Al/St 240/40 2-bundle 220.0"
n.buses = n.buses.reindex(columns=n.components["Bus"]["defaults"].index[1:])
n.buses["frequency"] = 50

INFO:pypsa.network.io:Imported network 'SciGrid-DE' has buses, carriers, generators, lines, loads, storage_units, transformers

Custom Clustering¶

Let's start with creating an own busmap. Suppose we want to cluster by federal states. Using geopandas we can achieve this by spatially joining the bus coordinates with a geopandas.GeoDataFrame containing the federal state boundaries. Here, we use the function .sjoin_nearest() with an appropriate projection ("EPSG:3035") as some buses are located near the borders of federal states in neighbouring countries.

In [3]:

Copied!

url = "https://media.githubusercontent.com/media/wmgeolab/geoBoundaries/9469f09592ced973a3448cf66b6100b741b64c0d/releaseData/gbOpen/DEU/ADM1/geoBoundaries-DEU-ADM1-all.zip"
states = gpd.read_file(url, layer="geoBoundaries-DEU-ADM1_simplified")
states.plot(edgecolor="k")
url = "https://media.githubusercontent.com/media/wmgeolab/geoBoundaries/9469f09592ced973a3448cf66b6100b741b64c0d/releaseData/gbOpen/DEU/ADM1/geoBoundaries-DEU-ADM1-all.zip"
states = gpd.read_file(url, layer="geoBoundaries-DEU-ADM1_simplified")
states.plot(edgecolor="k")

Out[3]:

<Axes: >

No description has been provided for this image

In [4]:

Copied!





bus_coords = gpd.GeoDataFrame(
    geometry=gpd.points_from_xy(n.buses.x, n.buses.y, crs=4326), index=n.buses.index
)
busmap = bus_coords.to_crs(3035).sjoin_nearest(states.to_crs(3035), how="left").shapeISO
busmap.head(10)
bus_coords = gpd.GeoDataFrame(
    geometry=gpd.points_from_xy(n.buses.x, n.buses.y, crs=4326), index=n.buses.index
)
busmap = bus_coords.to_crs(3035).sjoin_nearest(states.to_crs(3035), how="left").shapeISO
busmap.head(10)

Out[4]:

name
1     DE-NI
2     DE-NI
3     DE-NI
4     DE-NI
5     DE-NI
6     DE-NI
7     DE-NI
8     DE-SH
9     DE-NI
10    DE-BB
Name: shapeISO, dtype: object

Now we cluster the network based on the busmap.

In [5]:

Copied!

cluster = n.cluster.spatial.get_clustering_from_busmap(busmap)
cluster = n.cluster.spatial.get_clustering_from_busmap(busmap)

cluster is a [Clustering][pypsa.clustering.spatial.Clustering] object which contains the cluster.busmap, the derived cluster.linemap (mapping of parallel lines to aggregate), and the clustered network cluster.n.

Let's compare the original to the clustered network topology:

In [6]:

Copied!





fig, axs = plt.subplots(1, 2, subplot_kw={"projection": crs}, figsize=(10, 7))
for ax in axs:
    states.to_crs(crs.proj4_init).plot(ax=ax, edgecolor="orange", facecolor="none")
n.plot(ax=axs[0], title="unclustered", bus_size=0.005, **kwargs)
cluster.n.plot(ax=axs[1], title="clustered by federal state", bus_size=0.005, **kwargs);
fig, axs = plt.subplots(1, 2, subplot_kw={"projection": crs}, figsize=(10, 7))
for ax in axs:
    states.to_crs(crs.proj4_init).plot(ax=ax, edgecolor="orange", facecolor="none")
n.plot(ax=axs[0], title="unclustered", bus_size=0.005, **kwargs)
cluster.n.plot(ax=axs[1], title="clustered by federal state", bus_size=0.005, **kwargs);

With the default settings of get_clustering_from_busmap(), the attached components are not aggregated. That means, for example, for every generator, only the bus it attaches to is changed to the respective new clustered bus.

In [7]:

Copied!

cluster.n.generators.query("bus=='DE-HB'")[["p_nom", "carrier"]]
cluster.n.generators.query("bus=='DE-HB'")[["p_nom", "carrier"]]

Out[7]:

	p_nom	carrier
name
61_220kV Gas	334.000000	Gas
61_220kV Hard Coal	350.000000	Hard Coal
62 Hard Coal	119.000000	Hard Coal
62 Run of River	9.900000	Run of River
62 Waste	44.000000	Waste
67 Gas	14.800000	Gas
67 Hard Coal	300.000000	Hard Coal
67 Oil	88.000000	Oil
67 Other	150.000000	Other
67 Storage Hydro	392.000000	Storage Hydro
67 Waste	33.000000	Waste
62 Wind Onshore	153.592986	Wind Onshore
67 Wind Onshore	72.327992	Wind Onshore
61_220kV Wind Onshore	103.005076	Wind Onshore
62 Solar	61.888351	Solar
67 Solar	29.531771	Solar
61_220kV Solar	38.075560	Solar

When clustering of attached components to one representative generator per carrier is desired, the aggregate_one_ports option can be used. For instance, to cluster generators to one representative generator per carrier and bus run:

In [8]:

Copied!





cluster = n.cluster.spatial.get_clustering_from_busmap(
    busmap, aggregate_one_ports={"Generator"}
)
cluster.n.generators.query("bus=='DE-HB'")[["p_nom", "carrier"]]
cluster = n.cluster.spatial.get_clustering_from_busmap(
    busmap, aggregate_one_ports={"Generator"}
)
cluster.n.generators.query("bus=='DE-HB'")[["p_nom", "carrier"]]

Out[8]:

	p_nom	carrier
name
DE-HB Gas	348.800000	Gas
DE-HB Hard Coal	769.000000	Hard Coal
DE-HB Oil	88.000000	Oil
DE-HB Other	150.000000	Other
DE-HB Run of River	9.900000	Run of River
DE-HB Solar	129.495682	Solar
DE-HB Storage Hydro	392.000000	Storage Hydro
DE-HB Waste	77.000000	Waste
DE-HB Wind Onshore	328.926054	Wind Onshore

If only some carriers should be aggregated, this can be controlled with the aggregate_generators_carriers option. For example, to aggregate wind, solar and hard coal, run:

In [9]:

Copied!





cluster = n.cluster.spatial.get_clustering_from_busmap(
    busmap,
    aggregate_generators_carriers=["Hard Coal", "Wind Onshore", "Solar"],
    aggregate_generators_weighted=True,
)
cluster.n.generators.query("bus=='DE-HB'")[["p_nom", "carrier"]]
cluster = n.cluster.spatial.get_clustering_from_busmap(
    busmap,
    aggregate_generators_carriers=["Hard Coal", "Wind Onshore", "Solar"],
    aggregate_generators_weighted=True,
)
cluster.n.generators.query("bus=='DE-HB'")[["p_nom", "carrier"]]

Out[9]:

	p_nom	carrier
name
DE-HB Hard Coal	769.000000	Hard Coal
DE-HB Solar	129.495682	Solar
DE-HB Wind Onshore	328.926054	Wind Onshore
61_220kV Gas	334.000000	Gas
62 Run of River	9.900000	Run of River
62 Waste	44.000000	Waste
67 Gas	14.800000	Gas
67 Oil	88.000000	Oil
67 Other	150.000000	Other
67 Storage Hydro	392.000000	Storage Hydro
67 Waste	33.000000	Waste

For each attribute, default aggregation strategies are defined (e.g. minimum, maximum, sum, or capacity-weighted sum). These can be found under:

from pypsa.clustering.spatial import DEFAULT_ONE_PORT_STRATEGIES, DEFAULT_BUS_STRATEGIES, DEFAULT_LINE_STRATEGIES

K-means Clustering¶

Let's now determine' a clustering automatically based on the K-means algorithm for a target number of 40 clusters using sklearn.cluster.KMeans. Initially, we will assume uniform weights for all buses (i.e. the clustering will be based on pure geographical distance). Later we will add bus-specific weights.

In [10]:

Copied!

weighting = pd.Series(1, n.buses.index)
busmap = n.cluster.spatial.busmap_by_kmeans(bus_weightings=weighting, n_clusters=40)
n_clustered = n.cluster.spatial.cluster_by_busmap(busmap)
weighting = pd.Series(1, n.buses.index)
busmap = n.cluster.spatial.busmap_by_kmeans(bus_weightings=weighting, n_clusters=40)
n_clustered = n.cluster.spatial.cluster_by_busmap(busmap)

We can also achieve the same in one function call without first defining the busmap manually:

In [11]:

Copied!

n_clustered = n.cluster.spatial.cluster_by_kmeans(
    bus_weightings=weighting, n_clusters=40
)
n_clustered = n.cluster.spatial.cluster_by_kmeans(
    bus_weightings=weighting, n_clusters=40
)

Again, let's plot the networks to compare (with bus sizes proportional to loads):

In [12]:

Copied!

fig, axs = plt.subplots(1, 2, subplot_kw={"projection": crs}, figsize=(10, 7))
n.plot(ax=axs[0], title="unclustered", bus_size=0.005, **kwargs)
n_clustered.plot(ax=axs[1], title="clustered by kmeans", bus_size=0.005, **kwargs);
fig, axs = plt.subplots(1, 2, subplot_kw={"projection": crs}, figsize=(10, 7))
n.plot(ax=axs[0], title="unclustered", bus_size=0.005, **kwargs)
n_clustered.plot(ax=axs[1], title="clustered by kmeans", bus_size=0.005, **kwargs);

We can also assign some weights to the buses, for example, so that low-demand buses are aggregated more strongly:

In [13]:

Copied!





loads = (
    n.loads_t.p_set.sum()
    .groupby(n.loads.bus)
    .sum()
    .reindex(index=n.buses.index)
    .fillna(1)
)
n_clustered = n.cluster.spatial.cluster_by_kmeans(bus_weightings=loads, n_clusters=40)
loads = (
    n.loads_t.p_set.sum()
    .groupby(n.loads.bus)
    .sum()
    .reindex(index=n.buses.index)
    .fillna(1)
)
n_clustered = n.cluster.spatial.cluster_by_kmeans(bus_weightings=loads, n_clusters=40)

In [14]:

Copied!





loads_clustered = (
    n_clustered.loads_t.p_set.sum()
    .groupby(n_clustered.loads.bus)
    .sum()
    .reindex(index=n_clustered.buses.index)
)
fig, axs = plt.subplots(1, 2, subplot_kw={"projection": crs}, figsize=(10, 7))
n.plot(ax=axs[0], title="unclustered", bus_size=loads.div(1e6), **kwargs)
n_clustered.plot(
    ax=axs[1],
    title="clustered by weighted kmeans",
    bus_size=loads_clustered.div(1e6),
    **kwargs,
);
loads_clustered = (
    n_clustered.loads_t.p_set.sum()
    .groupby(n_clustered.loads.bus)
    .sum()
    .reindex(index=n_clustered.buses.index)
)
fig, axs = plt.subplots(1, 2, subplot_kw={"projection": crs}, figsize=(10, 7))
n.plot(ax=axs[0], title="unclustered", bus_size=loads.div(1e6), **kwargs)
n_clustered.plot(
    ax=axs[1],
    title="clustered by weighted kmeans",
    bus_size=loads_clustered.div(1e6),
    **kwargs,
);

Hierarchical Clustering¶

The Hierarchical Agglomerative Clustering (HAC) is a bottom-up approach that initially treats each node as its own cluster. In each iteration two adjacent clusters (e.g. connected by a line or link) are aggregated that have the most similar feature(s) with respect to a given similarity measure. The implementation uses sklearn.cluster.AgglomerativeClustering.

Let's say we want to cluster the buses based on similarity in their solar generation profiles in existing plants (note that in the example dataset there are few solar generators in the central part of Germany). Then we would prepare the feature matrix as follows:

In [15]:

Copied!





feature = (
    n.generators_t.p_max_pu.filter(like="Solar")
    .T.groupby(n.generators.bus)
    .mean()
    .reindex(n.buses.index)
    .ffill()
)

n_clustered = n.cluster.spatial.cluster_by_hac(feature=feature, n_clusters=40)

feature.iloc[:5, 8:16]
feature = (
    n.generators_t.p_max_pu.filter(like="Solar")
    .T.groupby(n.generators.bus)
    .mean()
    .reindex(n.buses.index)
    .ffill()
)

n_clustered = n.cluster.spatial.cluster_by_hac(feature=feature, n_clusters=40)

feature.iloc[:5, 8:16]

Out[15]:

snapshot	2011-01-01 08:00:00	2011-01-01 09:00:00	2011-01-01 10:00:00	2011-01-01 11:00:00	2011-01-01 12:00:00	2011-01-01 13:00:00	2011-01-01 14:00:00	2011-01-01 15:00:00
name
1	0.0	0.022246	0.093249	0.138226	0.142156	0.042697	0.014608	0.0
2	0.0	0.022246	0.093249	0.138226	0.142156	0.042697	0.014608	0.0
3	0.0	0.013251	0.022272	0.027736	0.028586	0.023846	0.010311	0.0
4	0.0	0.029535	0.219936	0.272449	0.271843	0.060156	0.021124	0.0
5	0.0	0.029535	0.219936	0.272449	0.271843	0.060156	0.021124	0.0

In [16]:

Copied!





feature_clustered = (
    n_clustered.generators_t.p_max_pu.filter(like="Solar")
    .T.groupby(n_clustered.generators.bus)
    .mean()
)
fig, axs = plt.subplots(1, 2, subplot_kw={"projection": crs}, figsize=(10, 7))
n.plot(
    ax=axs[0],
    title="unclustered",
    bus_size=0.01,
    bus_color=feature.mean(axis=1),
    **kwargs,
)
n_clustered.plot(
    ax=axs[1],
    title="clustered by hac (solar)",
    bus_size=0.01,
    bus_color=feature_clustered.mean(axis=1),
    **kwargs,
);
feature_clustered = (
    n_clustered.generators_t.p_max_pu.filter(like="Solar")
    .T.groupby(n_clustered.generators.bus)
    .mean()
)
fig, axs = plt.subplots(1, 2, subplot_kw={"projection": crs}, figsize=(10, 7))
n.plot(
    ax=axs[0],
    title="unclustered",
    bus_size=0.01,
    bus_color=feature.mean(axis=1),
    **kwargs,
)
n_clustered.plot(
    ax=axs[1],
    title="clustered by hac (solar)",
    bus_size=0.01,
    bus_color=feature_clustered.mean(axis=1),
    **kwargs,
);

Greedy Modularity Clustering¶

The greedy modularity clustering approach seeks to find community structures using networkx.community.greedy_modularity_communities. The algorithm begins with each node in its own cluster and joins the pair of clusters that most increases modularity). No weights are applied for this algorithm.

In [17]:

Copied!

n_clustered = n.cluster.spatial.cluster_by_greedy_modularity(n_clusters=80)
n_clustered = n.cluster.spatial.cluster_by_greedy_modularity(n_clusters=80)

WARNING:pypsa.network.power_flow:Line(s) ['2', '4', '5', '10', '11', '12', '13', '14', '17', '18', '21', '22', '25', '26', '27', '28', '29', '33', '34', '35', '36', '37', '38', '39', '44', '45', '46', '47', '48', '49', '50', '51', '57', '59', '61', '62', '63', '64', '65', '66', '67', '68', '69', '70', '71', '72', '73', '74', '75', '76', '77', '79', '81', '82', '83', '84', '85', '89', '90', '92', '93', '94', '95', '96', '97', '98', '100', '101', '102', '103', '104', '105', '106', '107', '108', '109', '110', '111', '112', '113', '114', '115', '116', '118', '119', '120', '121', '122', '123', '124', '125', '126', '128', '129', '130', '131', '132', '134', '135', '136', '137', '138', '164', '169', '177', '178', '179', '184', '186', '187', '188', '189', '190', '192', '197', '200', '202', '204', '205', '207', '219', '220', '221', '222', '225', '226', '233', '238', '239', '241', '242', '246', '252', '253', '254', '255', '256', '260', '261', '263', '264', '265', '266', '267', '268', '269', '270', '271', '272', '273', '276', '277', '279', '280', '282', '285', '286', '287', '288', '289', '290', '291', '293', '295', '301', '302', '304', '305', '309', '310', '312', '313', '317', '318', '319', '324', '325', '326', '328', '329', '330', '332', '336', '343', '344', '347', '348', '349', '356', '357', '358', '360', '366', '369', '370', '371', '378', '379', '383', '384', '385', '386', '399', '400', '401', '402', '403', '404', '405', '406', '409', '410', '411', '412', '415', '416', '420', '421', '423', '429', '430', '431', '432', '433', '436', '437', '438', '439', '440', '441', '443', '444', '445', '446', '447', '448', '449', '450', '451', '452', '453', '454', '455', '456', '457', '458', '459', '460', '461', '462', '464', '465', '466', '468', '469', '470', '471', '474', '475', '476', '477', '478', '479', '480', '488', '489', '490', '491', '492', '493', '494', '495', '496', '498', '499', '504', '510', '511', '512', '513', '514', '515', '523', '524', '525', '526', '527', '535', '540', '541', '542', '543', '544', '545', '546', '547', '551', '554', '558', '559', '560', '561', '562', '568', '569', '570', '571', '572', '574', '580', '581', '582', '586', '587', '588', '589', '590', '591', '592', '593', '594', '595', '596', '597', '598', '599', '600', '611', '613', '614', '615', '620', '621', '622', '623', '628', '630', '631', '637', '638', '647', '651', '656', '662', '665', '666', '667', '668', '671', '672', '675', '676', '677', '682', '683', '684', '685', '686', '687', '688', '689', '690', '691', '692', '693', '700', '701', '702', '709', '710', '711', '712', '713', '714', '721', '722', '723', '724', '725', '726', '727', '728', '729', '730', '731', '732', '733', '734', '735', '736', '737', '738', '739', '740', '741', '769', '770', '771', '775', '776', '777', '784', '785', '786', '793', '794', '795', '799', '800', '801', '805', '806', '807', '808', '809', '810', '811', '812', '813', '823', '824', '825', '827', '828', '833', '834', '835', '837', '838', '839', '840', '841', '842', '843', '844', '845', '846', '847', '848', '849', '850', '851', '852', '853', '854', '855', '856', '857', '858', '859'] have a 'type' set; the manually provided values for r, x, b are overridden by the standard type. Set 'type' to "" to keep manual values.

WARNING:pypsa.network.transform:The attribute 'v_nom' is a standard attribute for other components but not for lines. This could cause confusion and it should be renamed. See also: https://go.pypsa.org/warning-attr-misleading.

References¶

For more information, consult the following research papers:

Frysztacki, Recht, Brown (2022), A comparison of clustering methods for the spatial reduction of renewable electricity optimisation models of Europe, Energy Informatics.
Frysztacki, Hörsch, Hagenmeyer, Brown (2021), The strong effect of network resolution on electricity system models with high shares of wind and solar, Applied Energy.

Temporal Clustering¶

In addition to spatial clustering, PyPSA also supports temporal clustering to reduce the number of time steps in a model while preserving total modeled hours through snapshot weighting adjustments.

Temporal clustering methods are available via n.cluster.temporal.*:

resample(offset) - Aggregate snapshots at regular intervals (e.g., "3h", "24h")
downsample(stride) - Select every Nth snapshot as representative
segment(num_segments) - TSAM agglomerative clustering for variable-duration segments
from_snapshot_map(snapshot_map) - Apply pre-computed temporal aggregation

# Resample to 3-hour resolution
m = n.cluster.temporal.resample("3h")

# Select every 4th snapshot
m = n.cluster.temporal.downsample(4)

# Segment into 100 variable-duration segments (requires tsam)
m = n.cluster.temporal.segment(100)

For a detailed comparison of temporal clustering methods and their impact on optimization results, see the Time Series Aggregation example.