In the previous post the network consisted of only 25 nodes. Who needs computers, right? SBB, the Swiss Federal Railways, definitely do. In fact, they maintain an amazing data repository comprising over sixty datasets with many thousands of entries each. The company even launched a Kaggle competition in 2018 because there was no standard software to cover their increasing needs.

Here I would like to show the bigger picture without going into task-specific problem solving. It is about connecting the dots, making a living organism out of the network, beyond coordinate systems.

Basel SBB

%matplotlib inline
import os
import numpy as np
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
from matplotlib import cm

Import Data

The original dataset contains the yearly total number of trains that pass through each route section in Switzerland. The list contains entries for 2016, 2017 and 2018. A distinction is also made between passenger transport and cargo. For this post, I will focus only on passenger transport in 2018. Comparative studies along these two axes (year and transport type) are reserved for the near future.

df = pd.read_csv('data/zugzahlen.csv')
PID Anzahl_Zuege
SBB_MOR_STJ 121566
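Restricting the table to passenger transport in 2018 could look roughly like the sketch below. It runs on a toy frame, and the column names `year` and `traffic_type` are hypothetical placeholders of mine, not confirmed names from the SBB file, so they would need to be swapped for whatever the CSV actually uses.

```python
import pandas as pd

# Toy stand-in for the real table. `year` and `traffic_type` are
# hypothetical column names, not taken from the SBB dataset.
toy = pd.DataFrame({
    "PID": ["A_B", "A_B", "B_C", "B_C"],
    "year": [2017, 2018, 2018, 2018],
    "traffic_type": ["passenger", "passenger", "cargo", "passenger"],
    "Anzahl_Zuege": [100, 120, 30, 80],
})

# Keep only passenger rows from 2018.
mask = (toy["year"] == 2018) & (toy["traffic_type"] == "passenger")

# Reset the index so that positional loops such as df.col[i] keep working.
passenger_2018 = toy[mask].reset_index(drop=True)
print(passenger_2018)
```

The `reset_index(drop=True)` matters for the loops further down, which look rows up by `range(0, len(df))`.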


As is often the case with transportation networks, there are origin and destination points. Caution :skull:! These two sets of points do not have to be congruent and in this case they aren’t. Let’s extract the respective longitudes and latitudes and have a look.
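Before looking at the map, a quick set comparison makes the mismatch concrete. This is a minimal sketch on a toy frame that reuses the column names `BP_Von_Abschnitt` and `BP_Bis_Abschnitt` from the dataset; the station codes A to D are made up.

```python
import pandas as pd

# Toy route sections: each row goes from one station code to another.
toy = pd.DataFrame({
    "BP_Von_Abschnitt": ["A", "B", "C"],
    "BP_Bis_Abschnitt": ["B", "C", "D"],
})

origins = set(toy["BP_Von_Abschnitt"])
destinations = set(toy["BP_Bis_Abschnitt"])

# Stations that appear on one side only.
only_origin = origins - destinations
only_destination = destinations - origins

print(sorted(only_origin), sorted(only_destination))
```

If both differences are empty, the two sets coincide; here each side has one station the other lacks.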

Longitude of origin

df['lon_von'] = df.geopos_von.str.split(',').str[0]
df['lon_von'] = df['lon_von'].map(float)

Latitude of origin

df['lat_von'] = df.geopos_von.str.split(',').str[1]
df['lat_von'] = df['lat_von'].map(float)

Longitude of destination

df['lon_bis'] = df.geopos_bis.str.split(',').str[0]
df['lon_bis'] = df['lon_bis'].map(float)

Latitude of destination

df['lat_bis'] = df.geopos_bis.str.split(',').str[1]
df['lat_bis'] = df['lat_bis'].map(float)
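The four steps above all follow the same pattern, so the split only needs to happen once per column. Here is a sketch on a toy frame with made-up coordinates. It treats the first component as longitude, which is simply the convention used in this post and is worth checking against the actual file.

```python
import pandas as pd

# Toy coordinates in "first,second" form; the values are invented.
toy = pd.DataFrame({"geopos_von": ["7.5,47.5", "8.5,47.4"]})

# Split once into two columns (labelled 0 and 1) and convert to float.
parts = toy["geopos_von"].str.split(",", expand=True).astype(float)

# Assumption carried over from the post: component 0 is the longitude.
toy["lon_von"] = parts[0]
toy["lat_von"] = parts[1]
print(toy)
```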

Map of origins and destinations

Spot the difference :trollface:


Coordinates of origins

# Map each origin station to its (longitude, latitude) pair
pos_von = dict(zip(df.BP_Von_Abschnitt, zip(df.lon_von, df.lat_von)))

Coordinates of destinations

# Map each destination station to its (longitude, latitude) pair
pos_bis = dict(zip(df.BP_Bis_Abschnitt, zip(df.lon_bis, df.lat_bis)))

Final list of nodes

# Magic merge of dictionaries; for stations present in both, the pos_von entry wins
pos = {**pos_bis, **pos_von}
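The merge keeps every key from both dictionaries, and when a key occurs in both, the dictionary unpacked last takes precedence. A tiny sketch with invented coordinates shows the effect:

```python
# Toy position dictionaries; the coordinates are invented.
pos_bis = {"A": (1.0, 1.0), "B": (2.0, 2.0)}
pos_von = {"B": (2.5, 2.5), "C": (3.0, 3.0)}

# Later unpacking wins: "B" takes its value from pos_von.
pos = {**pos_bis, **pos_von}
print(pos)
```

For the railway data this should be harmless as long as a station carries the same coordinates on both sides.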

Building the graph

The data is directional, so I will initialize the graph as a DiGraph with weighted edges and then proceed with an undirected copy.

df['weight'] = df['Anzahl_Zuege'].map(float)
df['weight'] /= np.max(df['weight'])
D = nx.from_pandas_edgelist(df, source = 'BP_Von_Abschnitt',
                            target = 'BP_Bis_Abschnitt',
                            edge_attr = 'weight',
                            create_using = nx.DiGraph())
G = nx.Graph(D)
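One thing to keep in mind: as far as I understand, `nx.Graph(D)` keeps a single attribute dictionary per pair of nodes, so where both directions exist only one of the two weights survives. If the two directions should be added up instead, a sketch like the following could be used. The helper `undirected_sum` is my own hypothetical function, not part of networkx, and it is not what the post does.

```python
import networkx as nx

# Toy directed graph with one reciprocal pair (A <-> B).
D = nx.DiGraph()
D.add_edge("A", "B", weight=0.4)
D.add_edge("B", "A", weight=0.6)
D.add_edge("B", "C", weight=0.2)


def undirected_sum(D, attr="weight"):
    """Hypothetical helper: undirected copy whose edge weight is the
    sum of the weights found in both directions."""
    G = nx.Graph()
    G.add_nodes_from(D.nodes(data=True))
    for u, v, w in D.edges(data=attr, default=0.0):
        if G.has_edge(u, v):
            G[u][v][attr] += w          # second direction: accumulate
        else:
            G.add_edge(u, v, **{attr: w})
    return G


G_sum = undirected_sum(D)
print(G_sum.edges(data=True))
```

Whether summing is the right choice depends on how the dataset counts trains per direction, which I have not verified.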

Graph as a Map

Now the graph is fully operational and we can very easily do all kinds of fun calculations.

bond = np.array(list(nx.get_edge_attributes(G,'weight').values()))
# Calculate eigenvector centrality
eigenvector_centrality = nx.eigenvector_centrality(G)

# Store the eigenvector centrality on each node
nx.set_node_attributes(G, eigenvector_centrality, 'ec')

# Use eigenvector centrality for visualization.
ec = np.array(list(nx.get_node_attributes(G,'ec').values()))
plt.figure(figsize = (12,9), dpi=150)
nx.draw(G, pos=pos, edge_color=bond, node_color=ec, with_labels=False,
        node_size=ec*1000, width=bond*20)
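To get a feeling for what eigenvector centrality rewards, here is a toy example: a triangle with one extra station hanging off it. The node names are invented. A node scores high when its neighbours score high, so the triangle corner with the extra connection comes out on top and the dangling station at the bottom.

```python
import networkx as nx

# Triangle a-b-c plus a pendant node d attached to a.
G_toy = nx.Graph([("a", "b"), ("b", "c"), ("c", "a"), ("a", "d")])

# Unweighted eigenvector centrality, as in the call above.
ec_toy = nx.eigenvector_centrality(G_toy)

# Sort nodes from most to least central.
ranking = sorted(ec_toy, key=ec_toy.get, reverse=True)
print(ranking)
```

Note that the call in the post does not pass `weight='weight'`, so the train counts only enter the picture through edge width and colour, not through the centrality itself.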


Graph as a Network

The new positions come from a force-directed (spring) layout. They depend solely on how the stations are connected and how heavy the edges are, abstracting away geography.

_pos = nx.spring_layout(G, seed=0)
plt.figure(figsize = (12,12), dpi=150)
nx.draw(G, pos=_pos, edge_color=bond, node_color=ec, with_labels=True,
        font_size=5, node_size=ec*2500, width=bond*20, alpha=0.8)
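The `seed=0` argument is what makes this picture reproducible, since the spring layout starts from random positions. A minimal sketch on a toy path graph, checking that two runs with the same seed land on the same coordinates:

```python
import networkx as nx
import numpy as np

# Small toy graph: five nodes in a line.
G_toy = nx.path_graph(5)

# Two independent layout runs with the same seed.
pos_a = nx.spring_layout(G_toy, seed=0)
pos_b = nx.spring_layout(G_toy, seed=0)

# Compare the coordinates node by node.
same = all(np.allclose(pos_a[n], pos_b[n]) for n in G_toy)
print(same)
```

`spring_layout` also reads the `weight` edge attribute by default, so heavier connections pull their endpoints closer together.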


P.S. The thick purple line in the core is Langstrasse with 336586 passenger trains in 2018.