Introduction
This post demonstrates how to identify the key segments of a given road network and model which paths experience the greatest stress when those segments are removed from the network.
The post focuses on patterns for accomplishing this using python-sharedstreets to extract road segment data, peartree to supply a few convenience methods for networkx graph manipulation and export, and graph-tool to do the heavy lifting of running performant network graph algorithms in a lower-level data architecture than Python provides (e.g. leveraging Boost C++ algorithms on pure-C data structures).
Get the road network data
For the purposes of this post we will access the road network from just a single SharedStreets tile. We can use the Python client along with mercantile (which converts lat/lng coordinates to vector tile coordinates) to assist in pulling it down:
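A minimal sketch of that download step might look like the following. The downtown Oakland coordinates and the zoom level are illustrative choices, and I am assuming the `get_tile(zoom, x, y)` signature from the python-sharedstreets client:

```python
import mercantile
import sharedstreets.tile

# Convert a lat/lng pair (downtown Oakland here, as an example)
# to the coordinates of its containing zoom-12 vector tile
lng, lat = -122.2711, 37.8044
tile_coords = mercantile.tile(lng, lat, 12)

# Pull down the corresponding SharedStreets tile
tile = sharedstreets.tile.get_tile(tile_coords.z, tile_coords.x, tile_coords.y)
```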
Once we have the tile, the Python client provides a convenience function for extracting the geographic data. From that, and the other attribute information, we can recast the result as a meter-based GeoDataFrame:
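Something like the following would work, assuming the client's `make_geojson` helper returns a standard GeoJSON FeatureCollection; EPSG:2163 is just one meter-based projection choice:

```python
import geopandas as gpd
import sharedstreets.tile

# Lift the tile contents into a GeoDataFrame via GeoJSON
geojson = sharedstreets.tile.make_geojson(tile)
gdf = gpd.GeoDataFrame.from_features(geojson['features'], crs='EPSG:4326')

# Reproject from lat/lng to a meter-based CRS so that segment
# lengths can later be measured in meters
gdf = gdf.to_crs(epsg=2163)
```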
This will produce the following geodata:
From this geodata, we can separate out the nodes and edges as separate tabular datasets:
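Intersections arrive as Points and street geometries as LineStrings, so the split can key off geometry type. This sketch assumes the SharedStreets properties survive the GeoJSON conversion as columns:

```python
# Split on geometry type: Points are intersections (nodes),
# LineStrings are street segments (edges)
nodes = gdf[gdf.geom_type == 'Point'].copy()
edges = gdf[gdf.geom_type == 'LineString'].copy()

# Reset the index so each edge row has a stable integer id
# we can reference later
edges = edges.reset_index(drop=True)
```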
We can then augment the edges dataframe with speed-related attributes to help us determine the cost of segment traversal (in seconds). We could be a little more sophisticated about this if we wanted, but this is sufficient for the blog post:
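Here is one rough way to do it. The speeds per SharedStreets road class are placeholder guesses for demonstration, not official figures, and depending on whether roadClass arrives as names or numeric codes the lookup keys may need adjusting:

```python
# Rough free-flow speeds (km/h) by road class; illustrative only
speed_lookup = {
    'Motorway': 100,
    'Trunk': 85,
    'Primary': 65,
    'Secondary': 50,
    'Tertiary': 40,
    'Residential': 30,
    'Unclassified': 30,
    'Service': 20,
}

edges['speed_kmph'] = edges['roadClass'].map(speed_lookup).fillna(30)

# We are in a meter-based CRS, so geometry length is in meters;
# traversal cost in seconds = meters / (meters per second)
edges['length_m'] = edges.geometry.length
edges['cost_s'] = edges['length_m'] / (edges['speed_kmph'] * 1000 / 3600)
```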
Constructing the graph-tool graph
Now that we have cleaned node and edge data, with the cost attribute we care about (time cost, in seconds) on the edges, we can convert this into a networkx graph:
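A sketch of the conversion, following peartree's conventions (x/y coordinates on nodes, traversal time in seconds stored on edges as `length`). The `gdf_row` attribute is my own bookkeeping addition so graph edges can be traced back to GeoDataFrame rows later:

```python
import networkx as nx

# peartree-style graphs store coordinates as x/y node attributes and
# travel time in seconds as the "length" edge attribute
G = nx.MultiDiGraph()

for _, node in nodes.iterrows():
    G.add_node(node['id'], x=node.geometry.x, y=node.geometry.y,
               boarding_cost=0)

for idx, edge in edges.iterrows():
    # Treat each street as traversable in both directions; gdf_row
    # records the source row for later lookups
    for a, b in [(edge['fromIntersectionId'], edge['toIntersectionId']),
                 (edge['toIntersectionId'], edge['fromIntersectionId'])]:
        G.add_edge(a, b, length=edge['cost_s'], mode='drive', gdf_row=idx)
```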
Since this is a peartree-style networkx graph, we can use peartree helper methods to examine it. For example, we can plot it and ensure that the conversion looks ok:
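With peartree imported, this is a one-liner (I am assuming peartree's `generate_plot` helper here):

```python
import peartree as pt

# Quick visual sanity check of the converted graph
fig, ax = pt.generate_plot(G)
```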
Now that we have a peartree-style networkx graph, we can use a helper method to convert from a pure-Python networkx graph to a graph-tool network graph:
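If you do not have the helper handy, a minimal standalone conversion covering just what we need downstream (coordinates, edge costs, and the row bookkeeping) might look like this; `nx_to_gt` is a hypothetical name:

```python
from graph_tool import Graph

def nx_to_gt(nxG):
    # Minimal conversion: topology plus the attributes we need
    # downstream (node coordinates, edge cost, source-row bookkeeping)
    gtG = Graph(directed=True)
    pos = gtG.new_vertex_property('vector<double>')
    cost = gtG.new_edge_property('double')
    gdf_row = gtG.new_edge_property('int')

    # Map each networkx node id to a graph-tool vertex
    vertices = {}
    for node_id, data in nxG.nodes(data=True):
        v = gtG.add_vertex()
        pos[v] = (data['x'], data['y'])
        vertices[node_id] = v

    for u, w, data in nxG.edges(data=True):
        e = gtG.add_edge(vertices[u], vertices[w])
        cost[e] = data['length']
        gdf_row[e] = data['gdf_row']

    # Register the property maps so they travel with the graph
    gtG.vertex_properties['pos'] = pos
    gtG.edge_properties['cost'] = cost
    gtG.edge_properties['gdf_row'] = gdf_row
    return gtG

gtG = nx_to_gt(G)
```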
Plotting it will produce a spring graph output that again helps us ensure that the result “looks right”:
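For example, using graph-tool's built-in sfdp layout:

```python
from graph_tool.draw import sfdp_layout, graph_draw

# Spring-style layout computed purely from topology
spring_pos = sfdp_layout(gtG)
graph_draw(gtG, pos=spring_pos, output_size=(800, 800))
```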
Calculating betweenness centrality is now super cheap. The below calculation takes a fraction of a second on just 2 cores in a Docker container on a 2014 13” MBP:
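The weighted betweenness call, using the traversal-time costs as edge weights:

```python
from graph_tool.centrality import betweenness

# Shortest paths are measured in traversal seconds via the cost map
vertex_bt, edge_bt = betweenness(gtG, weight=gtG.edge_properties['cost'])
```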
We can see the results applied to the spring graph like so:
Note that the spring diagram does not know “up” from “down” in the spatial sense, so the arrangement will look correct but may be rotated or inverted compared to the true spatial distribution under the coordinate reference system. We can tether the plot to the real coordinates using the graph-tool position vector:
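Passing the stored coordinates as the position argument pins the drawing to real space; here the edge widths are also scaled by betweenness:

```python
from graph_tool.draw import graph_draw, prop_to_size

graph_draw(gtG,
           pos=gtG.vertex_properties['pos'],
           edge_pen_width=prop_to_size(edge_bt, mi=0.5, ma=5),
           output_size=(800, 800))
```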
Identifying critical segments
We can view the distribution of the results and see that a few key segments hold most of the shortest path assignments:
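A simple histogram over the edge betweenness array shows the skew:

```python
import matplotlib.pyplot as plt

# The .a accessor exposes the property map as a numpy array
plt.hist(edge_bt.a, bins=100)
plt.xlabel('edge betweenness')
plt.ylabel('edge count')
plt.show()
```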
Now we can take the top 5% of the edges and toss them:
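One way to “toss” them without disconnecting the graph is to make them prohibitively expensive to traverse; the 1000x penalty here is an arbitrary choice:

```python
import numpy as np

# Flag the top 5% of edges by betweenness
threshold = np.percentile(edge_bt.a, 95)
top_edges = edge_bt.a >= threshold

# Penalize rather than delete, so the graph stays connected
cost = gtG.edge_properties['cost']
cost.a[top_edges] *= 1000
```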
We can see how this impacts the distribution:
The code to generate the plot looks like:
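A sketch of one way to produce such a plot, overlaying the kept edges against the full set:

```python
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.hist(edge_bt.a, bins=100, alpha=0.5, label='all edges')
ax.hist(edge_bt.a[~top_edges], bins=100, alpha=0.5, label='kept edges')
ax.set_xlabel('edge betweenness')
ax.set_ylabel('edge count')
ax.legend()
plt.show()
```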
And the subset that is just the “not-adjusted” portion:
Let’s visualize what the dropped portions of the network are:
The code to generate that looks like this:
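This sketch uses the `gdf_row` bookkeeping from the conversion step to map the flagged graph edges back to rows of the edges GeoDataFrame:

```python
import numpy as np
import matplotlib.pyplot as plt

# Recover the source rows for the penalized edges
flagged = np.unique(gtG.edge_properties['gdf_row'].a[top_edges])

fig, ax = plt.subplots(figsize=(10, 10))
edges.plot(ax=ax, color='lightgrey', linewidth=0.5)
edges.loc[flagged].plot(ax=ax, color='red', linewidth=1.5)
ax.set_axis_off()
plt.show()
```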
Modeling the handicapped network
Now that the “primary” edges have been severely handicapped, we can see where all that impacted routing has been redistributed by recalculating the betweenness values. Before we do, we should preserve the original values for comparison:
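Since the property map's array is a view into the graph, copy it out first:

```python
import numpy as np

# Snapshot the pre-modification betweenness values
original_edge_bt = np.array(edge_bt.a)
```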
At this point, we can recalculate and plot the new, impeded network the same way:
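The recomputation is the same call as before, now picking up the penalized costs:

```python
# Re-run weighted betweenness over the handicapped network; plotting
# works exactly as before, with edge_bt2 swapped in for edge_bt
vertex_bt2, edge_bt2 = betweenness(gtG, weight=gtG.edge_properties['cost'])
```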
This will result in the same network, but with 2 new core axes:
Evaluating the modified network’s path allocations
Let’s take the previous betweenness results and compare them with the new ones, plotting to see where edges experienced significant gains over the old network:
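A per-edge delta makes the comparison straightforward:

```python
import numpy as np
import matplotlib.pyplot as plt

# Change in betweenness per edge between the two runs
delta = np.array(edge_bt2.a) - original_edge_bt

plt.hist(delta, bins=100)
plt.xlabel('change in edge betweenness')
plt.ylabel('edge count')
plt.show()
```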
As we can see, the adjusted network has a larger high-betweenness spread than the original. This is likely because the paths that were removed were major boulevards and freeways - they had the greatest performance in terms of speed (we were modeling without traffic, of course, but the point likely holds either way). At any rate, removing them meant that the remaining thoroughfares were less dominant in terms of time efficiency, so routes were less likely to tack aggressively toward a select few paths. As a result, more of the smaller routes experienced increases in shortest-path assignment.
We can plot the subset that showed an increase:
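Again using the row bookkeeping to pull the gaining edges out of the GeoDataFrame:

```python
# Rows whose edges gained shortest-path assignments
gained = np.unique(gtG.edge_properties['gdf_row'].a[delta > 0])

fig, ax = plt.subplots(figsize=(10, 10))
edges.plot(ax=ax, color='lightgrey', linewidth=0.5)
edges.loc[gained].plot(ax=ax, color='blue', linewidth=1.5)
ax.set_axis_off()
plt.show()
```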
There are 2 clear paths that took the brunt of the network path re-assignment - San Pablo and Adeline. I have highlighted them in the below map (sorry for using Google Maps, OSM community):
Anecdotally, I happen to know that this result is correct - these are the two secondary corridors that experience heavy traffic, and act as local alternatives to the major freeway auto corridors.
Finally, we can also confirm that the major corridors that previously formed the spine of the betweenness results experience the greatest drop in relative centrality:
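For example, by pulling out the bottom tail of the delta distribution:

```python
# The most negative deltas should line up with the penalized corridors
biggest_drops = np.unique(
    gtG.edge_properties['gdf_row'].a[delta < np.percentile(delta, 5)])

fig, ax = plt.subplots(figsize=(10, 10))
edges.plot(ax=ax, color='lightgrey', linewidth=0.5)
edges.loc[biggest_drops].plot(ax=ax, color='red', linewidth=1.5)
ax.set_axis_off()
plt.show()
```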
Conclusion
Hope this demonstrates how lightweight Python tools can be chained together to quickly model infrastructure network modifications without the need for heavy backend architecture, thanks to free services such as SharedStreets and OpenStreetMap.
Additionally, this should demonstrate how easy and quick it would be to establish an iterative (perhaps recursive) model that repeatedly identifies the “next most important” route segments and tosses them, re-running betweenness each time to identify those “next in line.” With such a pattern, you could create a tiered ordinal classification system for an entire road network, taking, say, the top 5% each time, placing them in a group, and then tossing them.
Through such a system, you could identify how traffic dispersal patterns may occur and pinpoint touch points for intervention.
Another path to improving this model would be introducing node weights (which graph-tool fully supports). Using those, you could assign population figures to nodes, for example, to give appropriate weight to areas of higher and lower population. Right now, in the example, all nodes are weighted equally (this was done purely to keep the demonstration simple).