Cost estimating synthetic edges – kuan butts

## Introduction

Intent is to document performance cost of adding synthetic node to edge leafs. Why would one do this? One example use case is to create summary walk access edges and nodes along end of remote our outlying transit service areas.

Instead of assigning all accessible jobs to final transit stop, desire might be to “string out” jobs linearly along synthetic walk network edges that extend out, one after another, from target transit nodes.

Such a method would model walk access from transit node to jobs. Also would avoid unrealistic and oversized “boost” in jobs access the moment you are able to reach this outlying transit node.

This seems ideal from a clean modeling perspective, but what are the costs of adding synthetic nodes to a representative network? Synthetic nodes add to the complexity of the network and can slow down pathfinding operations. This post creates a simple example model and demonstrates the increased cost of addition of synthetic nodes.

## Set up an example graph

For the purpose of this post, instead of using a real transit network, let’s just create an example one using the `random_lobster` algorithm available with `networkx`. It’s a close enough approximation, creating a main “spine” and some branching paths out from that core trunk.

Drawing the graph will result in something like the following:

## Summarize existing network

Before modifying the network, capture aspects of the network as they currently exist. We will use these as references for evaluating performance before and after.

The above operation gets us the ids of the nodes that presently are in the original graph.

The above gets us the ids of the nodes on the ends of these leafs branching out from the main spine of the network graph.

Finally, we get the largest node id so that when we begin to modify the graph, we do not override any existing nodes.

## Adding synthetic nodes to the graph

Now let’s add new nodes to the graph. Let’s add 10 per each leaf nodes that we identified. We can imagine that this would allow the graph to model walk access to some target metric (like jobs) at, say, 2 minute increments and thus model walk access to this resource within a 20 minute walk of the transit station (about a mile).

We can modify the graph now by iterating through and adding those 10 nodes stretching out from each of the prior identified edge nodes.

## Evaluating modification impacts

We can now calculate how much the graph changed before and after the addition of synthetic nodes.

## Comparing performance

We can see what run time looked like originally, by calculating shortest path to all nodes from each node in the original graph and how that increases when the graph is then updated.

First let us look at the performance of the original graph:

Now let’s look at the performance of the new, modified graph:

That’s about 386% longer on a network that is 434% larger. Very roughly, a 4x increase in the number of nodes here corresponded with a similar increase in the runtime for calculating shortest paths.

While there are a variety of optimizations that can be made on graph access calculation (as well as not using a pure-python graph implementation like NetworkX), the fact stands that mutating the original graph can introduce some significant performance-related costs when attempting to model path access for leaf edge end nodes (for example).