Introduction
Recently, someone online asked for help getting up-to-date arrival information for a particular stop by combining GTFS and GTFS-RT data. This is a fairly straightforward task thanks to a couple of handy GTFS libraries in Python, and I wanted to document how it can be done. First, I will analyze static transit schedule data (GTFS). Next, I will analyze real-time transit data updates (via GTFS-RT). I will report the results of both analyses and use them to highlight how one could combine the two to show up-to-date arrival information for a given stop.
We will be using Calgary’s transit data for this analysis, for no reason in particular. Any transit operator publishing its schedule data in GTFS and its real-time updates via GTFS-RT could be used instead, if desired. A gist of the methodology below is also available for reference here.
Set up
Make sure that you have two key libraries installed: gtfs-realtime-bindings and partridge. We will be working in a Python 3 notebook with these two libraries available.
Next, download the latest GTFS zip file for the system:
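A minimal sketch of the download step using only the standard library. The URL below is a hypothetical placeholder (substitute the GTFS zip URL your agency publishes), and `download_gtfs` is an illustrative helper name, not part of any library.

```python
import shutil
import urllib.request
from pathlib import Path

# Hypothetical placeholder: replace with the GTFS zip URL your agency publishes.
GTFS_URL = "https://example.com/gtfs.zip"


def download_gtfs(url: str, dest: str = "gtfs.zip") -> Path:
    """Download the GTFS zip at `url` to `dest` and return the saved path."""
    dest_path = Path(dest)
    with urllib.request.urlopen(url) as response, open(dest_path, "wb") as out:
        # Stream to disk in chunks instead of holding the whole zip in memory.
        shutil.copyfileobj(response, out)
    return dest_path
```

Usage would be along the lines of `path = download_gtfs(GTFS_URL)`, after which `path` can be handed to partridge.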
Then, load in the GTFS as a partridge GTFS object:
Note that we are targeting a specific service date (today), but we could specify any day as desired. The one thing to be careful of is to limit the feed to a single service date.
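Limiting the feed to one date comes down to working out which `service_id` values are active that day. With partridge this is roughly `service_ids = ptg.read_service_ids_by_date(path)[date]` followed by `feed = ptg.load_feed(path, view={"trips.txt": {"service_id": service_ids}})`. To show what that selection involves, here is a standard-library sketch (not partridge's own code) applying the GTFS rules for `calendar.txt` and `calendar_dates.txt`; `service_ids_for_date` is a hypothetical helper name.

```python
from __future__ import annotations

import csv
import datetime as dt
import io
import zipfile

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday", "friday", "saturday", "sunday"]


def _read_csv(zf: zipfile.ZipFile, name: str) -> list[dict]:
    """Read one GTFS table from the zip; return [] if the optional file is absent."""
    if name not in zf.namelist():
        return []
    with zf.open(name) as raw:
        # utf-8-sig strips a byte-order mark, which some feeds include.
        text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
        return list(csv.DictReader(text))


def service_ids_for_date(gtfs_path, date: dt.date) -> frozenset[str]:
    """Return the service_ids active on `date` per calendar.txt and calendar_dates.txt."""
    ymd = date.strftime("%Y%m%d")  # GTFS dates are YYYYMMDD strings
    weekday = WEEKDAYS[date.weekday()]
    active = set()
    with zipfile.ZipFile(gtfs_path) as zf:
        for row in _read_csv(zf, "calendar.txt"):
            # Regular weekly service inside the start/end range (string compare is safe for YYYYMMDD).
            if row["start_date"] <= ymd <= row["end_date"] and row[weekday] == "1":
                active.add(row["service_id"])
        for row in _read_csv(zf, "calendar_dates.txt"):
            if row["date"] != ymd:
                continue
            # exception_type 1 adds service on this date; 2 removes it.
            if row["exception_type"] == "1":
                active.add(row["service_id"])
            elif row["exception_type"] == "2":
                active.discard(row["service_id"])
    return frozenset(active)
```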
Sanity check the loaded data
We can now review the data and select a target stop to analyze.
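One way to find a busy candidate stop is to count scheduled `stop_times` per stop. Since partridge exposes feed tables as pandas DataFrames, `feed.stop_times["stop_id"].value_counts()` gives the same ranking; the sketch below does it with the standard library, assuming `stop_times` rows are dicts and `stops` maps `stop_id` to `stop_name`. `busiest_stops` is a hypothetical helper.

```python
from collections import Counter
from typing import Iterable, Mapping


def busiest_stops(stop_times: Iterable[Mapping[str, str]],
                  stops: Mapping[str, str], n: int = 5):
    """Return the n stops with the most scheduled stop_times as (stop_id, stop_name, count)."""
    counts = Counter(row["stop_id"] for row in stop_times)
    return [
        (stop_id, stops.get(stop_id, "<unknown stop>"), count)
        for stop_id, count in counts.most_common(n)
    ]
```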
As a visual reference, we can plot both our target stop and the system as a whole to confirm that we have in fact loaded the network we intended to.
Above: Image of the system map with the target stop highlighted in red.
ETL on GTFS schedule data
We can now extract upcoming arrivals data for just the one stop we want to review. To do so, we define a window of time to look ahead; this window limits how many upcoming arrivals we show. In this case, let’s look 20 minutes ahead.
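A minimal sketch of the look-ahead window, assuming `stop_times` rows are dicts holding raw GTFS `HH:MM:SS` strings (if your loader has already converted times to seconds, skip the parsing step). GTFS hours can exceed 23 for trips running past midnight; this sketch only compares against today's service day. Both function names are hypothetical.

```python
from __future__ import annotations

import datetime as dt


def gtfs_time_to_seconds(value: str) -> int:
    """Convert a GTFS HH:MM:SS time (hours may exceed 23) to seconds after midnight."""
    hours, minutes, seconds = (int(part) for part in value.strip().split(":"))
    return hours * 3600 + minutes * 60 + seconds


def upcoming_arrivals(stop_times, stop_id: str, now: dt.datetime,
                      window_minutes: int = 20) -> list[dict]:
    """Return rows at `stop_id` arriving within [now, now + window], soonest first."""
    now_s = now.hour * 3600 + now.minute * 60 + now.second
    end_s = now_s + window_minutes * 60
    hits = []
    for row in stop_times:
        # Skip other stops and rows without an arrival time (non-timepoints).
        if row["stop_id"] != stop_id or not row.get("arrival_time"):
            continue
        arrival_s = gtfs_time_to_seconds(row["arrival_time"])
        if now_s <= arrival_s <= end_s:
            hits.append({**row, "arrival_s": arrival_s})
    return sorted(hits, key=lambda r: r["arrival_s"])
```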
Note that we selected the popular_stop value arbitrarily, based on what appeared to be a stop with a high number of arrivals in downtown Calgary. Of course, a client requesting arrivals data could instead provide a stop value to be used in lieu of this one.
The schedule shows just 3 arrivals. These are the only trips scheduled to arrive at the target stop within the window of time we specified.
Reporting scheduled information
With this resulting table, which has the route and trip data merged, we can now view all relevant arrival information.
| trip_id | route_id | service_id | trip_headsign | direction_id | block_id | shape_id | route_short_name | route_long_name | route_desc | route_type | route_url | route_color | route_text_color |
|----------:|:-----------|:--------------------------|:-----------------------|---------------:|-----------:|-----------:|-------------------:|:----------------------------------|-------------:|-------------:|------------:|--------------:|-------------------:|
| 58056923 | 1-20656 | 2021DE-1BUSCUT-Weekday-02 | FOREST LAWN | 0 | 6073510 | 10141 | 1 | Bowness/Forest Lawn | nan | 3 | nan | nan | nan |
| 58063105 | 307-20656 | 2021DE-1BUSCUT-Weekday-02 | MAX PURPLE CITY CENTRE | 1 | 6073794 | 3070045 | 307 | MAX Purple City Centre/East Hills | nan | 3 | nan | nan | nan |
| 58063164 | 307-20656 | 2021DE-1BUSCUT-Weekday-02 | MAX PURPLE EAST HILLS | 0 | 6073797 | 3070044 | 307 | MAX Purple City Centre/East Hills | nan | 3 | nan | nan | nan |
Above: Table of arrival information that is the result of the trip and route table merge on the subset of qualifying arriving trips to the target stop.
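With partridge's pandas tables, a merge like this is typically `feed.trips.merge(feed.routes, on="route_id")`. For readers without pandas, here is a standard-library sketch of the same inner join on `route_id`, keeping trip columns first as in the table above; `merge_trips_with_routes` is a hypothetical helper.

```python
from __future__ import annotations


def merge_trips_with_routes(trips: list[dict], routes: list[dict]) -> list[dict]:
    """Inner-join trip rows to route rows on route_id, trip columns first."""
    routes_by_id = {route["route_id"]: route for route in routes}
    merged = []
    for trip in trips:
        route = routes_by_id.get(trip["route_id"])
        if route is None:
            continue  # inner join: drop trips whose route is missing
        # Append only the route columns the trip row does not already have.
        extra = {k: v for k, v in route.items() if k not in trip}
        merged.append({**trip, **extra})
    return merged
```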
We can now also log the scheduled arrivals to this stop, as reported by the GTFS dataset:
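A minimal sketch of a logger producing output in this shape, assuming each arrival row carries `route_long_name` and `arrival_s` (seconds after midnight). Rounding minutes down is an assumption; the original script may round differently. `format_scheduled_arrivals` is a hypothetical name.

```python
from __future__ import annotations

import datetime as dt


def format_scheduled_arrivals(arrivals: list[dict], now: dt.datetime,
                              window_minutes: int = 20) -> list[str]:
    """Build log lines for upcoming scheduled arrivals (minutes rounded down)."""
    now_s = now.hour * 3600 + now.minute * 60 + now.second
    lines = [
        f"Evaluating at this time: {now:%H:%M:%S}",
        f"{len(arrivals)} Upcoming scheduled arrivals in the next {window_minutes} minutes:",
    ]
    for i, row in enumerate(arrivals, start=1):
        minutes = (row["arrival_s"] - now_s) // 60
        lines.append(f"{i:02d}: {row['route_long_name']} arriving in {minutes} minutes")
    return lines
```

Printing `"\n".join(format_scheduled_arrivals(rows, now))` would give one line per arrival.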
When I ran this at the time logged below, I got the following arrivals:
Evaluating at this time: 12:45:59
3 Upcoming scheduled arrivals in the next 20 minutes:
01: MAX Purple City Centre/East Hills arriving in 5 minutes
02: Bowness/Forest Lawn arriving in 6 minutes
03: MAX Purple City Centre/East Hills arriving in 18 minutes
Get GTFS-RT information
Just as we handled the scheduled arrival information, we can check what the real-time feed has to say and compare the two to see whether there are updates for any of the scheduled trips.
The following script takes all the trip updates from a single request to the real-time GTFS-RT feed and filters them down to just the trips related to the stop evaluated in the prior analysis of the static GTFS feed. We also limit the results to running trips that have not yet passed the target stop (that is, trips whose current stop sequence is less than or equal to the target stop’s stop sequence).
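A sketch of that fetch-and-filter step, assuming a TripUpdates feed URL you supply. Parsing uses `gtfs_realtime_pb2.FeedMessage` and `ParseFromString` from gtfs-realtime-bindings (imported inside the function so the filter can be used on its own); the dataclasses and both function names are hypothetical. It also assumes the feed populates `stop_sequence`, which GTFS-RT treats as optional.

```python
from __future__ import annotations

import urllib.request
from dataclasses import dataclass


@dataclass(frozen=True)
class StopTimeUpdate:
    stop_id: str
    stop_sequence: int
    arrival_time: int | None  # POSIX timestamp, if the feed provides one


@dataclass(frozen=True)
class TripUpdate:
    trip_id: str
    stop_time_updates: tuple[StopTimeUpdate, ...]


def fetch_trip_updates(url: str) -> list[TripUpdate]:
    """Fetch a GTFS-RT TripUpdates feed and convert it to plain dataclasses."""
    from google.transit import gtfs_realtime_pb2  # from gtfs-realtime-bindings

    with urllib.request.urlopen(url) as response:
        message = gtfs_realtime_pb2.FeedMessage()
        message.ParseFromString(response.read())

    updates = []
    for entity in message.entity:
        if not entity.HasField("trip_update"):
            continue  # skip vehicle positions, alerts, etc.
        tu = entity.trip_update
        stus = tuple(
            StopTimeUpdate(
                stop_id=stu.stop_id,
                stop_sequence=stu.stop_sequence,
                arrival_time=stu.arrival.time if stu.HasField("arrival") else None,
            )
            for stu in tu.stop_time_update
        )
        updates.append(TripUpdate(trip_id=tu.trip.trip_id, stop_time_updates=stus))
    return updates


def trips_not_past_stop(updates: list[TripUpdate],
                        target_sequence_by_trip: dict[str, int]) -> list[TripUpdate]:
    """Keep trips serving the target stop whose earliest remaining stop_sequence
    is <= the target stop's stop_sequence (taken from static stop_times)."""
    kept = []
    for update in updates:
        target_seq = target_sequence_by_trip.get(update.trip_id)
        if target_seq is None or not update.stop_time_updates:
            continue  # trip does not serve the target stop, or has no updates
        current_seq = min(stu.stop_sequence for stu in update.stop_time_updates)
        if current_seq <= target_seq:
            kept.append(update)
    return kept
```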
Reporting live schedule updates from GTFS-RT
Just like we did with the static data, we can report out our new list of trip updates that are filtered to just the stop we are interested in.
The following script will allow us to just log all qualifying trips from the live feed.
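A minimal sketch of the live logger, assuming qualifying trips have been reduced to `(route_long_name, predicted_arrival_ts)` pairs, where the timestamp is a POSIX time such as a GTFS-RT `StopTimeEvent.time`. Minutes are rounded down (an assumption), and this sketch numbers lines from 01, so its numbering may differ from the output shown. `format_live_arrivals` is a hypothetical name.

```python
from __future__ import annotations

import time


def format_live_arrivals(arrivals, now_ts: float | None = None) -> list[str]:
    """Format (route_name, predicted_arrival_ts) pairs as log lines."""
    if now_ts is None:
        now_ts = time.time()
    lines = ["Upcoming arrivals from live GTFS-RT feed"]
    for i, (route_name, arrival_ts) in enumerate(arrivals, start=1):
        minutes = int((arrival_ts - now_ts) // 60)
        lines.append(f"{i:02d}: {route_name} arriving in {minutes} minutes")
    return lines
```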
This will log the following text:
Upcoming arrivals from live GTFS-RT feed
03: MAX Purple City Centre/East Hills arriving in 5 minutes
From the above, I saw that the first (next) arrival from the static schedule zip file was reported as on time.
We could continue to poll the live feed every 30 seconds to learn about upcoming arrivals as well as any updates should the next arriving trip become delayed.
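A minimal polling sketch. `fetch` and `handle` are caller-supplied callables (for example, `fetch` could request the TripUpdates feed and `handle` could re-run the filtering and logging steps above); `poll` and its parameters are hypothetical. A failed request is reported and skipped rather than ending the loop.

```python
from __future__ import annotations

import time
from typing import Callable


def poll(fetch: Callable[[], list], handle: Callable[[list], None],
         interval_s: float = 30.0, max_polls: int | None = None) -> int:
    """Call `fetch` every `interval_s` seconds and pass each result to `handle`.

    Runs forever when `max_polls` is None; returns the number of polls made.
    """
    polls = 0
    while True:
        started = time.monotonic()
        try:
            handle(fetch())
        except Exception as exc:  # e.g. network errors or a malformed feed
            print(f"Poll failed: {exc!r}")
        polls += 1
        if max_polls is not None and polls >= max_polls:
            return polls
        # Sleep for whatever remains of the interval after this poll's work.
        time.sleep(max(0.0, interval_s - (time.monotonic() - started)))
```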
Conclusion
Based on the above example, it should be apparent how to then pair the scheduled subset of upcoming trips with the similarly filtered live updates to get the latest information on all scheduled trips that are next-arriving at a specified stop in a transit network.
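A minimal sketch of that pairing, joining on `trip_id`. It assumes scheduled rows carry `trip_id` and `arrival_s` (seconds from the start of the service day), that live predictions arrive as a `trip_id` to POSIX-timestamp mapping, and that you supply `service_day_start_ts` (GTFS measures times from noon minus 12 hours, which is midnight except on DST-change days). `combine_arrivals` is a hypothetical helper.

```python
from __future__ import annotations


def combine_arrivals(scheduled: list[dict], live_arrival_ts: dict[str, int],
                     service_day_start_ts: int) -> list[dict]:
    """Attach the best-known arrival time to each scheduled arrival.

    Live predictions override the schedule when present; results are sorted
    by expected arrival time.
    """
    combined = []
    for row in scheduled:
        scheduled_ts = service_day_start_ts + row["arrival_s"]
        predicted_ts = live_arrival_ts.get(row["trip_id"])
        has_live = predicted_ts is not None
        combined.append({
            **row,
            "scheduled_ts": scheduled_ts,
            "expected_ts": predicted_ts if has_live else scheduled_ts,
            "source": "realtime" if has_live else "schedule",
            "delay_s": predicted_ts - scheduled_ts if has_live else None,
        })
    return sorted(combined, key=lambda r: r["expected_ts"])
```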