Above: SharedStreets (SS) segments tethered to transit stops in blue, analyzed transit stops from AC Transit in red, all OSM geometries associated with selected SS tiles in grey.
The intent of this post is to document some initial exploration of the SharedStreets format in conjunction with GTFS data processed with Peartree. In this post, I correlate Peartree data to SharedStreets (SS) and, via its linearly referenced system, perform walk analyses from network stops (bus stops) in Peartree and map out transit + walk accessibility against SharedStreets segment metadata (which comes from OpenStreetMap, or OSM). Once I have tethered the Peartree network graph stops to SS segments, I observe how I can interchange OSM data for other network datasets also associated with SS, and compare walk segment outputs easily through the SS medium.
SharedStreets (SS) is an interesting platform. By creating a linearly referenced system approximating known paths around the world, it creates a medium through which disparate datasets - from city sidewalk data to OpenStreetMap road network data to state road network shape files - can “speak to one another.”
To use SS as that medium, one need only tether their data to SS. Once that has been accomplished, identifying likely comparable segments in other datasets is also possible.
In the above image, taken from an exploratory interface available via SharedStreets, one can see the cleaned road segments shown on top of Mapbox map tiles (which render OpenStreetMap road segments, or “ways”).
Loading in transit network data
Let’s start with something I’ve likely blogged about more than a few times and load in some transit network data with Peartree.
There’s nothing unusual here; we will just load in the default busiest schedule data as a directed network graph:
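A minimal sketch of that loading step follows. `get_representative_feed` and `load_feed_as_graph` are Peartree functions; the feed path and the 7 AM to 10 AM window are placeholder assumptions rather than the values behind the plots in this post.

```python
# Analysis window in seconds after midnight (assumed: 7 AM to 10 AM).
START_SECONDS = 7 * 60 * 60
END_SECONDS = 10 * 60 * 60


def load_transit_graph(gtfs_zip_path, start=START_SECONDS, end=END_SECONDS):
    """Load the busiest service day of a GTFS feed as a directed graph."""
    # Imported lazily so this sketch can be imported without Peartree installed.
    import peartree as pt

    # By default this selects the busiest day in the schedule.
    feed = pt.get_representative_feed(gtfs_zip_path)
    # Summarizes the feed over the time window into a network graph.
    return pt.load_feed_as_graph(feed, start, end)


# Hypothetical usage; the file name is a placeholder:
# G = load_transit_graph("data/actransit.zip")
```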
Plotting the results with the library’s built-in pt.generate_plot should generate the image shown prior, with the network in blue on a black background.
Identifying SharedStreets vector tiles
Right now, I will be querying for SharedStreets vector tiles for map zoom 12. This seems to be a sufficient size, particularly as the edges themselves do not hold geometric data and thus are fairly small to pull down. What’s nice about this is that edge network data and the geometries they represent have been decoupled. Thus, once we get our data into SS segment pairs, we will be “untethered” (theoretically) from geometric data. This should vastly improve all sorts of analyses.
In order to do this, we will need to roughly capture the area that the nodes of the transit network touch. The node locations are easy to extract from the network graph. We then buffer each node by roughly 0.03 degrees, which is about 2 miles in the Bay Area. This can be rough: we just need to make sure we hit every tile we might need, and if we pull in one or two extra, that is okay.
We can accomplish this with the following method:
Which in turn relies on these 2 helper functions:
These two helper functions perform the buffer and use Mapbox’s mercantile library to get the conversion from web mercator coordinates to vector tile coordinates (quadrant-based x, y values).
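As a dependency-free sketch of the same idea, the snippet below buffers each node by 0.03 degrees and collects the zoom 12 tiles its buffer touches. It uses the standard slippy-map tile arithmetic directly, which is the lookup that mercantile's `tile(lng, lat, zoom)` provides. The function names are illustrative, not the ones from the original notebook.

```python
import math

ZOOM = 12
BUFFER_DEG = 0.03  # roughly 2 miles in the Bay Area


def lonlat_to_tile(lon, lat, zoom):
    """Return the (x, y) slippy-map tile that contains a lon/lat point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def tiles_for_nodes(lonlats, zoom=ZOOM, buffer_deg=BUFFER_DEG):
    """Return the sorted set of tiles touched by each node's square buffer."""
    tiles = set()
    for lon, lat in lonlats:
        # Tile y grows southward, so the north-west corner has the smallest x and y.
        x_min, y_min = lonlat_to_tile(lon - buffer_deg, lat + buffer_deg, zoom)
        x_max, y_max = lonlat_to_tile(lon + buffer_deg, lat - buffer_deg, zoom)
        for x in range(x_min, x_max + 1):
            for y in range(y_min, y_max + 1):
                tiles.add((x, y))
    return sorted(tiles)


# One made-up node near downtown Oakland.
tiles = tiles_for_nodes([(-122.2712, 37.8044)])
```

Because the buffer is narrower than a zoom 12 tile, each node contributes at most four tiles, and the set removes the duplicates shared by neighboring stops.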
From these methods, we can generate a JSON which represents the parsed protobuf vector tiles from SharedStreets, representing all tiles related to the Peartree network graph:
From these results, we can quickly plot the output to visually observe the network graph on top of the SharedStreets graph:
SharedStreets Python library
The SharedStreets Python library allows for easily interfacing with the SharedStreets API. It’s part of the SharedStreets organization on GitHub and its repo is here.
There are two components in a returned object:
- The references (the graph itself, with reference IDs for related components) as a list of references.
- The geometries, which are referenced by ID in the references list. The geometries are also held as a list of objects.
Combining the two is possible by first creating a lookup from the reference to the geometries, like so:
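A minimal sketch of such a lookup is below. The field names (`id`, `geometryId`, `locationReferences`, `intersectionId`, and a flat `lonlats` list) are assumptions about the parsed tile layout, and the sample records are made up.

```python
def pair_lonlats(flat):
    """Turn a flat [lon, lat, lon, lat, ...] list into (lon, lat) tuples."""
    return list(zip(flat[0::2], flat[1::2]))


def join_references_to_geometries(references, geometries):
    """Attach each reference's coordinate list via a geometry-ID lookup."""
    geometry_by_id = {g["id"]: pair_lonlats(g["lonlats"]) for g in geometries}
    joined = []
    for ref in references:
        coords = geometry_by_id.get(ref["geometryId"])
        if coords is None:
            continue  # the geometry sits in a tile that was not fetched
        first, last = ref["locationReferences"][0], ref["locationReferences"][-1]
        joined.append({
            "reference_id": ref["id"],
            "from": first["intersectionId"],
            "to": last["intersectionId"],
            "coords": coords,
        })
    return joined


# Tiny made-up example using the assumed field names.
geometries = [{"id": "g1", "lonlats": [-122.27, 37.80, -122.26, 37.81]}]
references = [{
    "id": "r1",
    "geometryId": "g1",
    "locationReferences": [{"intersectionId": "i1"}, {"intersectionId": "i2"}],
}]
edges = join_references_to_geometries(references, geometries)
```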
By doing this, we can improve the quality of the visualization of the network from straight edge connections, such as this detail:
To plots that look like this, with this level of detail:
Now, this is not important for the analysis of the network except for the fact that we can calculate the real length of the edge, instead of the straight line distance. This will make calculating walk times along edges more accurate. It will also help for the initial step of identifying which network node is assigned to which SharedStreets edge.
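To illustrate the difference, the sketch below compares the length along an edge's full geometry with the straight-line distance between its endpoints, using the haversine formula on a spherical Earth. The L-shaped edge is made up.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius, spherical approximation


def haversine_m(a, b):
    """Great-circle distance in meters between two (lon, lat) points."""
    lon1, lat1 = map(math.radians, a)
    lon2, lat2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def polyline_length_m(coords):
    """Sum the lengths of every consecutive pair of vertices."""
    return sum(haversine_m(p, q) for p, q in zip(coords, coords[1:]))


# A made-up L-shaped edge: north along one street, then east along another.
coords = [(-122.270, 37.800), (-122.270, 37.805), (-122.265, 37.805)]
full_length = polyline_length_m(coords)
straight_length = haversine_m(coords[0], coords[-1])
```

Here `full_length` is noticeably larger than `straight_length`, which is exactly the error that straight edge connections would introduce into walk times.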
Note that in the prior plotted graph with the whole system, each edge was not “fully” rendered and thus the detail shown in the 2nd of the 2 plots above was absent.
Next, we can iterate through both the nodes (via intersectionId) and the edges (via the same, but in paired form), to generate a GeoDataFrame:
Comment on edge classifications
For the purposes of this exploration, I simply accepted that I was going to use all edges of the network. In reality, I would likely want to parse out walk networks, or perhaps only highways. Right now, that is not particularly easy to do with SharedStreets.
Since we will be performing a series of distance and buffer-related calculations, we should convert the GeoDataFrames into a meter-based projection. This will help ensure we are more accurate in our geometric operations.
Above: The results of the reproduction of the edge GeoDataFrame.
The code to do this is quite simple:
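A sketch of the reprojection, assuming the nodes and edges are GeoPandas GeoDataFrames with their CRS already set. The choice of EPSG:32610 (UTM zone 10N, which covers the Bay Area) is my assumption here, as is the helper name.

```python
# UTM zone 10N: a meter-based projection covering the Bay Area (assumed choice).
METER_EPSG = 32610


def to_meter_projection(gdf, epsg=METER_EPSG):
    """Return a copy of a GeoDataFrame reprojected to a meter-based CRS."""
    # GeoDataFrame.to_crs returns a new frame; the source CRS must be set first.
    return gdf.to_crs(epsg=epsg)


# Hypothetical usage with frames built in the previous step:
# nodes_m = to_meter_projection(nodes_gdf)
# edges_m = to_meter_projection(edges_gdf)
```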
Pairing to SharedStreets
At this point, we have the SharedStreets network in a GeoDataFrame, which will make resolving the AC Transit Peartree graph network easier. Now, this is definitely a step that could be far more optimized, but since this is a casual weekend exploration, performed within a single Jupyter notebook, no effort was made to optimize (or, specifically, parallelize) this step. As a result, runtime against all edges in the related network on all 5,050 graph nodes took about 1.5 hours. The script to find the nearest edge to each network node is fairly straightforward, simply iterating through all edges in a loop. I acknowledge that even simple optimizations such as the inclusion of a spatial index could have drastically sped this up. I include it here only so someone else might use it as a starting point for performing a similar task in the future:
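The brute-force idea can be sketched without any geospatial libraries, as below. It assumes stops and edge vertices are already in projected (x, y) meters; the data and function names are made up for illustration.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to segment a-b, all in projected (x, y) meters."""
    px, py = p
    ax, ay = a
    bx, by = b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Fraction along the segment of the closest point, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def nearest_edge(point, edges):
    """Return (edge_id, distance) of the closest edge by checking every edge.

    `edges` maps an edge ID to its list of (x, y) vertices.
    """
    best_id, best_dist = None, math.inf
    for edge_id, coords in edges.items():
        for a, b in zip(coords, coords[1:]):
            d = point_segment_distance(point, a, b)
            if d < best_dist:
                best_id, best_dist = edge_id, d
    return best_id, best_dist


edges = {
    "e1": [(0.0, 0.0), (100.0, 0.0)],
    "e2": [(0.0, 50.0), (100.0, 50.0), (100.0, 150.0)],
}
stops = {"stop_a": (40.0, 10.0), "stop_b": (120.0, 100.0)}
pairs = {stop_id: nearest_edge(xy, edges) for stop_id, xy in stops.items()}
```

Every stop is compared against every segment, so the cost grows with stops times segments. A spatial index would cut this down by only testing nearby candidates.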
We can now view the results of this effort by plotting the edges that were paired (in blue) on top of the plot of all edges in the network. In addition, I have marked the stops themselves in red.
What can we do with the paired network data?
Once we have the paired network data, we can begin to contextualize the transit network stops. For example, we can create buckets and see how much of the East Bay (since AC Transit serves Oakland) can be accessed in 5, 10, 15, and 20 minutes from each bus stop. This can help visualize the coverage of the East Bay that the network has - ignoring frequencies and the transit network itself. It is just the amount of the walk network that the nodes are in close proximity to.
Using a default walk speed of 4.8 km/h, we can script this out using NetworkX’s ego graph method to calculate what is accessible from each paired edge for each time bucket:
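The sketch below shows the bucketing logic with a small hand-rolled Dijkstra search standing in for NetworkX's `ego_graph(G, n, radius=..., distance=...)`, so that it runs without dependencies. It starts from a single node rather than a paired edge, and the adjacency structure and names are assumptions. At 4.8 km/h a walker covers 80 meters per minute, so minutes are edge length in meters divided by 80.

```python
import heapq

WALK_SPEED_KMH = 4.8
METERS_PER_MINUTE = WALK_SPEED_KMH * 1000 / 60  # 80 meters per minute


def walk_minutes(length_m):
    """Convert an edge length in meters to walk time in minutes."""
    return length_m / METERS_PER_MINUTE


def reachable_within(adjacency, source, max_minutes):
    """Return {node: minutes} for every node reachable within max_minutes.

    `adjacency` maps node -> list of (neighbor, edge_length_m).
    """
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        minutes, node = heapq.heappop(queue)
        if minutes > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbor, length_m in adjacency.get(node, []):
            total = minutes + walk_minutes(length_m)
            if total <= max_minutes and total < best.get(neighbor, float("inf")):
                best[neighbor] = total
                heapq.heappush(queue, (total, neighbor))
    return best


def bucket_nodes(adjacency, source, buckets=(5, 10, 15, 20)):
    """Assign each reachable node to the smallest time bucket it fits in."""
    times = reachable_within(adjacency, source, max(buckets))
    return {n: next(b for b in sorted(buckets) if t <= b) for n, t in times.items()}


# Made-up chain of 360 m blocks: a - b - c - d - e - f
names = ["a", "b", "c", "d", "e", "f"]
adjacency = {n: [] for n in names}
for u, v in zip(names, names[1:]):
    adjacency[u].append((v, 360))
    adjacency[v].append((u, 360))
result = bucket_nodes(adjacency, "a")
```

Node `f` is 22.5 minutes away, so it falls outside the largest bucket and is left out of the result.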
Again, this is something that would be far more performant outside of NetworkX, but my intent is to just show this as a demonstration of potential - not something that would be used outside of a one-off.
With these results we can plot the output, with darker areas being closer to node points and those that are lighter being farther away. I found the results rather “pretty:”
To plot this, I wrote the following:
Looking in more detail (I plotted in black this time), we can explore certain parts of the city and see how transit + walk service exhibits itself.
For example, downtown Oakland is well serviced, and, to the west, downtown San Francisco is similarly well captured:
Looking north, we can see the core of Richmond is well serviced, but the suburban developments up the hill, as well as the areas around the large shopping mall, all have poorer levels of accessibility (20+ minutes to the nearest bus stop - any bus stop - is pretty bad).
Finally, down south, we can see connections across the Bay to the peninsula. We can also see how suburban development and large swaths of freeway create greater inconsistency in transit coverage, with a tendency for “pockets” of access to transit nodes, surrounded by lower access areas that tend to be limited by the network components primarily designed to service vehicles.
Appending the transit network edges to the SharedStreets network
Now that we can neatly calculate walk shed from each edge that the transit network serves, we can also go ahead and add in the transit network itself. Below is a large blob of code but all it does is create a list of new edges to add that connect the network to the point on the edge in between the SS intersections and then also creates walk networks from that point to each of the intersections on the SS network (in both directions).
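The core of that step can be sketched for a single stop and a single segment, as below. It assumes projected (x, y) meters, treats the segment as a straight line between its two intersections, and uses made-up IDs; the `_snap` node name is hypothetical.

```python
import math

WALK_METERS_PER_MINUTE = 80.0  # 4.8 km/h


def project_onto_segment(p, a, b):
    """Return (snapped_point, fraction_along) for p projected onto segment a-b."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0:
        return a, 0.0
    t = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / seg_len_sq
    t = max(0.0, min(1.0, t))
    return (a[0] + t * dx, a[1] + t * dy), t


def connector_edges(stop_id, stop_xy, from_id, from_xy, to_id, to_xy):
    """Build the walk edges that tie one transit stop into one street segment.

    Returns (u, v, minutes) tuples, one per direction for each leg.
    """
    snapped, fraction = project_onto_segment(stop_xy, from_xy, to_xy)
    seg_len = math.hypot(to_xy[0] - from_xy[0], to_xy[1] - from_xy[1])
    snap_id = f"{stop_id}_snap"
    legs = [
        # Stop to the point on the street segment closest to it.
        (stop_id, snap_id, math.hypot(stop_xy[0] - snapped[0], stop_xy[1] - snapped[1])),
        # Snapped point to each intersection at the ends of the segment.
        (snap_id, from_id, fraction * seg_len),
        (snap_id, to_id, (1 - fraction) * seg_len),
    ]
    edges = []
    for u, v, meters in legs:
        minutes = meters / WALK_METERS_PER_MINUTE
        edges.append((u, v, minutes))
        edges.append((v, u, minutes))
    return edges


new_edges = connector_edges(
    "stop_1", (40.0, 30.0), "i_from", (0.0, 0.0), "i_to", (160.0, 0.0)
)
```

Each tuple can then be added to a copy of the street graph as a weighted edge, which is what lets a trip move between the transit network and the walk network.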
Now that the list has been created, we can iterate through it and add each new component to the copied network graph (which will now house both the SS network and the transit edges).
Analyzing full network with transit
Now that we have a new graph object that contains both the “walk” SharedStreets network (again, remember I am assuming all components are ok for walking, but could have done some parsing back at the beginning to trim out highway segments or other car-only segments) and the transit network, we can generate accurate isochrones with the SS segments.
Let’s find the nearest node to the 12th St/City Center BART station in downtown Oakland:
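One simple way to do this is sketched below, assuming the graph's nodes are available as a mapping of node ID to (lon, lat). The station coordinates are approximate and are my assumption, and the sample nodes are made up.

```python
import math


def nearest_node(nodes, lon, lat):
    """Return the ID of the node closest to (lon, lat).

    `nodes` maps node ID -> (lon, lat). Longitude offsets are scaled by
    cos(latitude), an equirectangular approximation that is adequate for
    ranking candidates within a single metro area.
    """
    cos_lat = math.cos(math.radians(lat))

    def squared_offset(item):
        node_lon, node_lat = item[1]
        return ((node_lon - lon) * cos_lat) ** 2 + (node_lat - lat) ** 2

    return min(nodes.items(), key=squared_offset)[0]


# Approximate station location (an assumption, not taken from the post).
STATION_LON, STATION_LAT = -122.2716, 37.8037

nodes = {
    "n1": (-122.2720, 37.8040),
    "n2": (-122.2650, 37.8100),
    "n3": (-122.2800, 37.7950),
}
closest = nearest_node(nodes, STATION_LON, STATION_LAT)
```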
We can now perform an ego graph to determine what part of the two networks is accessible in a given amount of time (let’s say 20 minutes) on the walk network or the modified network with transit (G would be swapped out):
First, we have results for 20 minutes, where walk is in green and walk plus transit is in purple:
Next, we have results for 40 minutes, where walk is in green and walk plus transit is in purple:
In both of these examples, we can see how AC Transit actually provides pretty decent coverage of the overall network. In spite of what I imagine would be a hard task (getting good coverage along the width of the East Bay instead of the length, where the longer main corridors of bus and BART are), there appears to be good accessibility and high coverage (on weekdays, during peak hour, that is).
The script to generate those plots is as follows:
SharedStreets is exciting because of its potential to be a “one and done” solution. That is, once I pair my network to SharedStreets and save that lookup table, I can potentially circumvent any future expensive geometric operations when comparing against another dataset (so long as that dataset has also been paired with SharedStreets). This helps create a “Rosetta Stone” of sorts where all metadata about segments can be stored and shared amongst disparate geospatial datasets.
Adding a thank you to Disqus user Pablo who noticed that time in minutes was being calculated incorrectly in the code snippets. An edit was made to address that on April 23, 2019.