Developing pattern for generating comparative hyperpath plots against network OD pairs with Peartree: pic.twitter.com/u89QrI4Sbt
— Kuan Butts (@buttsmeister) March 15, 2018
Above: The tweet that was the basis for this blog post.
Introduction
I was recently reading about hyperpaths as part of the MTC’s Fast-Trips project.
As I have been curious to begin developing agent-based models that utilize Peartree’s generated network graph, I figured I good first step would be to have the capacity to generate these subnetworks. That is, in given a particular O-D pair, there may be a number of potential reasonable paths to get from the origin to the destination.
From this subset of potential acyclic networks, probabilities can be calculated that would then allow a modeling of route choice. This blog post seeks to only sketch out how different paths could be drawn out from the resultant Peartree graph, given a case study of the AC transit network in the East Bay.
The resulting summary data is then plotted as a graph (as seen in the second image in the tweet). The resulting summary data could also be used in conjunction with various edge and path attributes (such as number of transfers) to calculate the agent’s most likely path option.
Getting started
First, a list of the libraries we will be using:
Next, we need to load in the network as a Peartree directed graph:
Note that we turned off exempt_internal_edge_imputation
. This parameter chooses whether or not to create walk edges between close nodes in a network. So, if there are two bus stops that are close together (the connection_threshold
is defaulted to 50 meters), we create a direct link from one to the other. The time cost (set as length currently in the network edge attributes), is set to a function of the rate of the walk_speed_kmph
parameter. This is default to 4.5 kilometers per hour. The straight line distance is then divided by the walk speed to determine the amount of time that edge cost is going to represent.
Selecting paths
We now want to pick and origin and destination and some intermediary points. With this data, we can establish two clear paths from an origin and a destination.
We will need to write a simple method to tether these coordinates to their nearest node on the network graph. I’ve sketched that out with the below function:
Now that we have a way to find the nearest node id, let’s get them for all the points we defined:
Now, theoretically, there would be a better way of devising what the most likely routes are for the agent to be selecting between (and would comprise other modes and the like), but let’s say that these are the two paths that the model were to spit out that the agent would be picking between. For example, the agent could be attempting to trip chain, and the midpoints are equivalent destinations that the agent would be willing to stop off at one or the other.
Generating paths
With the resulting intermediary points, we can employ NetworkX’s shortest path algorithms to find the shortest path for each of the two alternate routes:
We can then assemble the two halves of each path and create two distinct paths:
Now that we have the paths, we will also need to calculate attributes along it. Specifically, we want to estimate the time it takes to traverse each segment, from node to node, and where transfers occur (or the agent would just be walking and not taking the bus, for example).
Let’s first just write a method to see how the two paths stack up:
And, the results using that method:
From these results, we can see one is 3 minutes slower than the other.
Visualizing the paths
We know the comparative times for the two paths, but let’s see how they stack up. Let’s make a plot of the two paths so we can see how they differ “on a map” of East Bay (cropped to West Oakland and Emeryville, approximately).
We will be using two additional methods from our libraries:
Now, in addition to plotting the graph, we will also need to plot the two paths and the nodes that are along them. So, we will need first a method to generate a GeoDataFrame of the edges:
As you can see in the comment inline, I do a quick estimate of the spatial distance that each edge represents using great circle. This will be used to contrast against the time it takes to traverse that edge segment. Better practice would be to use the actual route path length, which we could access in the GTFS for AC Transit, but for our cases right now, let’s call this method good enough for an example.
And also for the nodes, as a GeoDataFrame:
Now, we can leverage OSMnx to help us plot the network graph (as we can see, above). We can use the resulting axis object to add our nodes on top of that:
Plotting comparative paths
Not only can we plot the spatial paths of the two alternate routes, but with the new edge dataframe, we can also generate comparative plots. In the one I generate here, I will attempt to mimic the hyperpath example from Fast-Trips documentation (also shown above).
I’ll go into the code after the image, but want to just give a high level overview of what I will do. First, I will utilize the .cumsum()
Pandas API to estimate the total elapsed time at each node along the route. Then, I will do the same with the edge distance estimates to see how far along the route each segment is. Thus, the x axis will represent cumulative distance while the y axis will represent cumulative time.
Dark portions of the graph lines by color will represent the portions of the route that are on transit while the lighter portions will represent walk segments. X axis (total distance) will be measured in meters while the Y axis (total elapsed time) will be measured in minutes.
Here’s how the plot was generated: First, I generate the cumulative sums mentioned prior that represent elapsed time and total distance. I then set the distance to the index to the distance so that, when I plot a line graph of the elapsed time, the distance values will be represented along the X axis. I utilize the mode tag along the edges to “turn off” edge values for segments that are walk for one set of lines. This will allow us to expose which part of the segment is walk or not, and to easily calculate how many transfers are involved in each path.
Once we have the data frames modified for each comparative path, we can wrangle some matplotlib and spit something out.
Conclusion
Just some notes on how to get two comparative paths as Pandas series plotted. From these vector arrays, we have the capacity to run some quick calculations to “score” routes and generate probabilities. With these scores and probabilities, we are well situated to estimate or model route selection for a given agent!