Introduction
RAPTOR stands for: “novel RoundbAsed Public Transit Optimized Router.” It’s based on this paper by Microsoft Research and demonstrates a simple method for traversing published transit schedules to determine what the optimal path is between two stops.
There are numerous optimizations presented both in this paper and in broader discussion of the algorithm. The goal of this post is to skip all of that discussion and show the most stripped down version of the algorithm possible, to clearly demonstrate it’s fundamental concept. This version is not intended to be performant, but it does work and is hopefully simple enough to be easily digestible.
I’ve included all this work in a Github Gist notebook for reference as well. In this post, I will take snippets from that code, but you can reference the original for the total set of scripts to run end-to-end. In the future, I’ll follow up on this post with an example that also brings in some of the additional performance optimizations presented.
Base data and tools
For this example, we will be using AC Transit’s latest GTFS feed. I pulled this down on September 8, 2020. The data was pulled from their data resource center. I’ll be using a few key libraries in python.
First, I will use partridge to parse the raw GTFS and load it up as pandas data frames. I will also use Geopandas to perform some geo-based operations to compute proximities with nearby stops for the purpose of adding in transfers between lines as footpath (or, walk) transfers.
Algorithm
The RAPTOR algorithm has 3 main steps. Note I will be using terminology based on GTFS to describe components of the transit feed.
- Get all stops currently accessible (initially, just the starting origin stop) and find associated trip ids.
- For all stops in associated with each trip id, use arrival and departure data to figure out how long to reach each other stop in the trip.
- For each reachable stop, see if there are other stops nearby that can be walked to (footpath transfers).
We can visualize this with the above graphic. First, we can reach everything along the trips along the stop that we start at. This is just the east-west route at first. Then we can find nearby transfers (the green dot) and add these to our qualifying stop ids. On the next iteration (a trip with 1 transfer), we now can consider all possibilities from all the stops from the first iteration: the blue and the green stops. This opens up all the trips that serve the yellow stops now, which lets us navigate to the destination stop node.
Loading in the data
First let’s read in the GTFS data. We’ll just work with the busiest day for this example.
Next, we will also want to convert the stops to shapes so we can find nearest stops.
Finally, we want to get the start and end node for our example.
Helper methods
There will be 3 main helper methods to help with each of the main steps that were stated above. First, we need to get the associated trips for a given stop.
Now, given those trips, we can iterate through the stops that we can reach so far and find all other stops we can reach from these identified trips. Then, we can compute the time it takes to reach those stops and if it is faster than what we currently have (or we do not yet have them as reachable), we can add them to our managing object that holds all accessible stops.
For the 3rd step, we can calculate potential transfers. These are nearby stops that we assume you can walk to. We will use the GeoDataFrame to just find nearby stops to each qualifying stop. Of course, this is very inefficient, but we are keeping it simple for the purpose of demonstrating the basics of this algorithm.
Bringing it all together
Now that we have the 3 helper functions, we can run them together to generate an iterator that iterates through the number of transfers we have set as a limit and keeps finding or updating how long it takes to reach all/any stops in the system. At the end of the iterations, we see if our destination stop is there and we get a resulting timeframe for reaching it.
Note again that there is plenty of optimizations to be introduced but, again, the purpose is to just show the fundamental concept of the algorithm.
Here is the logged output from the above run with AC Transit data and start and end nodes set up earlier. The resulting time is 35.5 minutes which is the same as what you would get in a service like Google Maps (estimated 36 minutes).
Conclusion
I hope this was helpful in demonstrating with code the basic concept of the RAPTOR transit routing algorithm presented in this paper. This is just a skeleton. It would be easy to introduce some of the optimizations discussed in the paper as well as modify how trips and stop paths are tracked to daylight more information other than just time to reach a stop (such as the path, the associated routes, and the number of transfers).