Conveyal offers an open source GTFS editing tool; you can read about it here. The tool allows new lines to be drawn and described as a collection of stops plus a line (shape), with operational characteristics held as metadata. (Note: This information is out of date - see the comments at the bottom of this post for details.)
Above is an image of what the interface looks like.
This tool allows for export of the drawn lines and added stops in a summary JSON format. The notes below demonstrate how to take this information and convert it into two structured summary data frames: a stops dataset and an edges dataset.
With these two converted datasets, addition to a network graph is easy. The format produced can be passed directly into peartree, for example, so that new, custom transit can be added directly from Conveyal’s open source tools into a peartree network graph. This can be useful for lightweight, notebook-based exploration of how a new transit line interacts with an existing network (held as GTFS).
Reading in the data
To start, let’s just use geopandas’ easy read operation to read in the two GeoJSONs and hold them in data frame form.
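With geopandas this step is a single `gpd.read_file("alignments.geojson")` call per file. As a minimal sketch of what that read amounts to, here is the same parse done with the standard library on an inline FeatureCollection; the file contents, route name, and property keys are all hypothetical stand-ins, not the actual Conveyal export schema.

```python
import json

# A minimal stand-in for the exported GeoJSON (hypothetical contents);
# with geopandas this would simply be: gpd.read_file("alignments.geojson")
raw = """
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "properties": {"route_name": "New Line A"},
      "geometry": {"type": "LineString",
                   "coordinates": [[-122.27, 37.80], [-122.26, 37.81]]}
    }
  ]
}
"""

features = json.loads(raw)["features"]
# Each feature pairs route-level properties with a geometry,
# which is exactly what one row of a GeoDataFrame holds.
alignments = [
    {"route_name": f["properties"]["route_name"],
     "coords": f["geometry"]["coordinates"]}
    for f in features
]
```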
We can plot the results easily.
Above: Results of plotting the lines and stops. As we can see, it looks comparable to the results viewable in the transit editing portion of the app.
Note that the alignments data frame holds the contents of the timetable attributes as a stringified array. See the above log of the alignment data table to see what that looks like.
For stops, we just want the distance measure and the geometry. We can then group by the name of each route. We will use these to pair with the alignments.
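A minimal sketch of that grouping step, using plain Python structures (the record shape, route names, and values here are hypothetical, standing in for the stops GeoDataFrame rows):

```python
from collections import defaultdict

# Hypothetical stop records as they might come out of the stops GeoJSON:
# each has the route it belongs to, a distance-along-route measure (meters),
# and a point geometry (represented here as a plain (x, y) tuple).
stop_records = [
    {"route_name": "New Line A", "distance": 0.0,   "geometry": (0.0, 0.0)},
    {"route_name": "New Line A", "distance": 850.0, "geometry": (850.0, 0.0)},
    {"route_name": "New Line B", "distance": 0.0,   "geometry": (0.0, 500.0)},
]

# Keep only the distance measure and the geometry, grouped by route name
stops_by_route = defaultdict(list)
for rec in stop_records:
    stops_by_route[rec["route_name"]].append(
        (rec["distance"], rec["geometry"]))
```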
Pairing the stops
We need to find where each stop falls along the line and determine how long each subsegment is. This method is not necessarily the best, but it works if we rely on the stops data returned from Conveyal being vetted, in the sense that the stops do indeed exist along the route.
Indeed, in this method, stops placed too far off the route (which should not happen with default output) would get dropped. An alternative would be to use a more advanced line-matching method to pair the stops to the line.
In this case, we are assuming that this has already occurred and we just want to subdivide the route into separate lines for each segment. We are already working in a meter based projection so division of the route into segments with a small buffer avoids any minor offsets and provides a “good enough” measure for each segment distance.
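A shapely-based split would buffer each stop point slightly and difference the route against those buffers; the one-dimensional essence of the result, assuming we already have each stop's projected distance along the route in meters (hypothetical values below), reduces to consecutive differences:

```python
# Hypothetical distances (meters) of each stop along the route, as measured
# in the meter-based projection the post is already working in.
stop_distances = [0.0, 850.0, 2100.0, 3300.0]

# Segment lengths are the gaps between consecutive sorted stop distances;
# a "good enough" measure for each subsegment's length.
sorted_d = sorted(stop_distances)
segment_lengths = [b - a for a, b in zip(sorted_d, sorted_d[1:])]
```

With four stops this yields three segments, one per consecutive stop pair.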
A note before showing the full logic: We are at this point just working with one alignment. We simply need to apply the following logic to all route names to get the edges and nodes for each new line.
We can color the plot to show each segment and how the stops act as separators.
Above: Segments for a single route plotted.
Cleaning timetable metadata
As shown earlier, the alignment data is stringified. First, we should parse it for the target alignment.
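A minimal sketch of that parse, assuming the stringified array uses Python-literal syntax (the attribute names and values below are hypothetical, not the actual Conveyal schema):

```python
import ast

# Hypothetical stringified timetable attribute, as stored on the alignment row
raw_timetable = "[{'speed': 20, 'dwellTime': 30, 'headwaySecs': 600}]"

# ast.literal_eval safely parses Python-literal strings; json.loads works
# instead when the string is valid JSON (double-quoted keys).
timetable = ast.literal_eval(raw_timetable)
```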
Segment travel time is a function of the indicated speed for that segment, paired with the length calculated for that segment. Speeds are in kilometers per hour and lengths are in meters, so a 1000x conversion is also applied.
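That conversion can be sketched as a small helper (the function name is mine, not from the original code; the 1000x factor converts kilometers to meters, with a 3600 factor taking hours to seconds):

```python
def segment_travel_time_s(length_m: float, speed_km_h: float) -> float:
    """Travel time in seconds for one segment: convert km/h to m/s
    (1000 m/km, 3600 s/h), then divide length by speed."""
    speed_m_s = speed_km_h * 1000.0 / 3600.0
    return length_m / speed_m_s

# e.g. a 1000 m segment at 36 km/h (10 m/s) takes 100 seconds
```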
I also rolled the dwell times into the edge costs and shifted each cost to the following segment. I assume that if you arrive at a destination you should not have to wait that additional amount of time since, at that point, you can just deboard the vehicle.
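That shift can be sketched like this, with hypothetical travel and dwell values: each stop's dwell is charged to the segment departing from it, so the final stop's dwell is dropped (you simply deboard on arrival).

```python
# Hypothetical per-segment travel times (seconds) for 4 stops / 3 segments,
# and one dwell time (seconds) per stop.
travel_times = [100.0, 125.0, 120.0]
dwell_times = [30.0, 30.0, 30.0, 30.0]

# Dwell at stop i is rolled into the segment that departs stop i;
# the last stop's dwell is never charged.
edge_costs = [seg + dwell
              for seg, dwell in zip(travel_times, dwell_times[:-1])]
```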
We now have edge costs. We can get the stop ids paired with each edge and create a summary data frame:
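A minimal sketch of that summary structure, zipping consecutive stop ids with the per-segment costs (ids, costs, and column names here are hypothetical; in practice this list of records would be wrapped in a pandas DataFrame):

```python
# Hypothetical ordered stop ids along the route and the matching edge costs
stop_ids = ["stop_a", "stop_b", "stop_c", "stop_d"]
edge_costs = [130.0, 155.0, 150.0]

# One record per edge: from-stop, to-stop, and the cost computed for
# the segment between them.
edges = [
    {"from_stop_id": frm, "to_stop_id": to, "edge_cost": cost}
    for frm, to, cost in zip(stop_ids, stop_ids[1:], edge_costs)
]
```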
Nodes are easy at this point as well. We already have the stop geometries and just need to convert them to the EPSG 4326 (WGS 84) projection to get latitude and longitude.
We also need to add headways. This is available as a single metadata parameter we can pull out. In the future, an improvement could be to provide a better function for calculating the average wait time. For example, for certain types of routes people might intentionally schedule their arrivals, and thus wait times might be better modeled as less than half the headway.
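The classic estimate for riders arriving at random is half the headway; a sketch of that, with an assumed knob (the function and parameter names are mine, not from the original code) for the scheduled-arrival case described above:

```python
def avg_wait_s(headway_s: float, wait_fraction: float = 0.5) -> float:
    """Expected wait given a headway in seconds. For random arrivals the
    classic estimate is half the headway; wait_fraction (an assumed knob)
    lets riders who time their arrivals be modeled with a smaller share."""
    return headway_s * wait_fraction

# e.g. a 600 s (10 minute) headway -> 300 s average wait for random arrivals
```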
This operation can now be run for each alignment name to produce the necessary edges and nodes data frames that are used to create graph edges.
You can see how these are used in the peartree library by viewing the synthetic.py file and seeing how it consumes new summary nodes and edges data frames.
Here is nodes data frame generation.
Here is edges data frame generation.
- An update was requested by Anson at Conveyal. Per Anson: “For clarity, it might be good to specify that [I was] using the scenario editing features of Conveyal Analysis. The “data tools ui” [that was referenced] is actually a separate codebase now under IBI’s control.” More on that can be found here.