The mdt CLI utility that Uber Movement released alongside their new speeds dataset shows extraction examples that will typically error (use bbox to ensure that you scope the extraction area).
Uber has recently begun updating some of the cities available on its Movement platform with speed data. A few US cities have speed data available; at the time of this writing, I noted two - Cincinnati and New York City. Since NYC is tired and Cinci is a cool place that often gets overlooked, I wanted to extract data from there.
The website offers a way to download CSVs with Movement ID pairs, plus a separate lookup that maps those Movement ID pairs to OSM ID pairs. Here is an example of the download options for a city. I think that SharedStreets is involved somewhere in that process - so, perhaps Movement ID pairs are related in some way to SharedStreets linearly referenced segment IDs, which are then used to point to the OSM IDs. This seems plausible, but I have not spent enough time digging into the underlying logic to fully understand.
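To make the two-download relationship concrete, here is a minimal sketch of how a Movement-to-OSM lookup could be joined against speed records. The column names and IDs are my own assumptions for illustration, not the actual Movement CSV schema:

```python
import csv
import io

# Hypothetical stand-ins for the two downloads; column names are assumptions.
lookup_csv = io.StringIO(
    "movement_id,osm_way_id,osm_start_node_id,osm_end_node_id\n"
    "seg-001,10001,201,202\n"
    "seg-002,10002,202,203\n"
)
speeds_csv = io.StringIO(
    "movement_id,hour_of_day,speed_mph_mean\n"
    "seg-001,8,24.5\n"
    "seg-002,8,31.2\n"
)

# Build a Movement ID -> OSM way ID pairing from the lookup file...
osm_by_movement = {
    row["movement_id"]: row["osm_way_id"]
    for row in csv.DictReader(lookup_csv)
}

# ...then attach each speed record to its OSM way via that pairing.
joined = [
    {**row, "osm_way_id": osm_by_movement[row["movement_id"]]}
    for row in csv.DictReader(speeds_csv)
]
print(joined[0]["osm_way_id"])  # "10001"
```

Whether SharedStreets segment IDs sit in the middle of this chain, as I suspect, would not change the shape of the join - only which column acts as the key.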
There is a technical blog post here by the SharedStreets folks, which I have been meaning to read, on linking the Uber Movement data outputs to SharedStreets to then pair with crash data. That post points to this Node package, which I thought might be on GitHub - it is, but it doesn't have any actual code in it. So, I assume the post is not suggesting that the shapes from Uber are just being matched to the reference SharedStreets shapes, but I would need to spend more time reading and researching to better understand.
All of this is to say, the download options can be confusing.
Command line toolkit
Fortunately, there is a toolkit called mdt that is also offered, via npm, for download. The README offers sufficient details to allow me to understand what I need to do to pull down a project, even providing an example script to run in the command line that is exactly what I want:
The only thing I wanted to do was look at a more typical month, so I picked September instead and set the time frame as:
I now had the following one line operation:
Errors during execution
Running this unfortunately resulted in an error.
First, the OSM ID mappings downloaded fine (I think this was the pairing of Movement IDs to OSM ID pairs):
Next, a JSON of the whole Cinci road network downloaded successfully:
All related speed data also came through (this I imagine would be small, just sets of float values with ID pairings):
What I believe remains to be executed is the mapping of this street speed data onto the appropriate GeoJSON LineString geometries, based on the Movement ID to OSM ID pairings:
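Roughly, I imagine that final reconciliation step looks something like the sketch below - this is my own illustration of the concept, not mdt's actual code, and the property names and values are assumptions:

```python
import statistics

# Hypothetical road-network GeoJSON keyed by osm_way_id (assumed property name).
network = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature",
         "properties": {"osm_way_id": 10001},
         "geometry": {"type": "LineString",
                      "coordinates": [[-84.512, 39.103], [-84.510, 39.104]]}},
    ],
}
# Hypothetical speed observations keyed the same way (e.g. hourly values).
speed_records = {10001: [24.5, 22.1, 26.0]}

# Attach a mean speed to each LineString whose OSM way has observations.
for feature in network["features"]:
    way_id = feature["properties"]["osm_way_id"]
    obs = speed_records.get(way_id)
    if obs:
        feature["properties"]["speed_mph_mean"] = round(statistics.mean(obs), 1)

print(network["features"][0]["properties"]["speed_mph_mean"])  # 24.2
```

If this is approximately what the utility does, the geometry side of the join (the full network GeoJSON) would dominate memory use, since the speed side is just floats and IDs.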
Unfortunately, the following error was thrown after the previous line was emitted:
I have 32 GB of memory available on my local machine, with a little more than half allocated to Docker. I noticed that memory pressure in Activity Monitor never really moved above 25% or so. On the off chance Docker was consuming too many resources, I turned it off and reran the operation. Again, it failed.
Again, without digging into the library being used here, I was not able to figure out where the memory issue was. I suspect there may be an issue with the library itself. One thing worth noting: by default, Node caps its old-space heap well below total system memory (historically on the order of 1.5 GB), so a Node process can exhaust its JavaScript heap while Activity Monitor shows low overall pressure; that limit can be raised with Node's --max-old-space-size flag.
At any rate: I was surprised that this was happening - I assumed that if it was able to download all of the files that preceded the reconciliation operation (moving Movement data into the GeoJSON), it should be able to handle the last part fine.
I acknowledge that compiling the whole of the Cinci area as a GeoJSON is not the most appropriate format for this use case, but it should work (albeit as a really large output).
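If the crash is happening while the entire FeatureCollection is being materialized in memory before serialization, one mitigation would be streaming the output feature by feature. The sketch below is my own illustration of that pattern, not a description of how mdt is implemented:

```python
import io
import json

# Write a FeatureCollection incrementally instead of building the whole
# object first, so memory use stays flat regardless of feature count.
def write_feature_collection(out, features):
    out.write('{"type": "FeatureCollection", "features": [')
    for i, feature in enumerate(features):
        if i:
            out.write(",")
        out.write(json.dumps(feature))  # only one feature in memory at a time
    out.write("]}")

def make_features(n):
    # Stand-in generator for per-segment features.
    for i in range(n):
        yield {"type": "Feature",
               "properties": {"osm_way_id": i},
               "geometry": {"type": "LineString",
                            "coordinates": [[0, i], [1, i]]}}

buf = io.StringIO()
write_feature_collection(buf, make_features(3))
print(json.loads(buf.getvalue())["features"][1]["properties"]["osm_way_id"])  # 1
```

A line-delimited GeoJSON variant (one feature per line) would make this even simpler and is friendlier to downstream streaming tools, though it trades away being a single valid GeoJSON document.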
After running into this for a while, I decided to see if I could download a small portion of the total area. Klokan Tech has a handy bounding box tool here. I used it to start with just a small bounding box around downtown.
The command looked like:
The file size was only 118K. This was just a few blocks in the downtown area.
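Conceptually, the bbox option just needs to drop any segment whose coordinates fall outside a (west, south, east, north) rectangle. A minimal sketch, with an approximate downtown-Cincinnati box chosen purely for illustration:

```python
# Keep only segments whose coordinates all fall inside (west, south, east, north).
def in_bbox(coords, bbox):
    west, south, east, north = bbox
    return all(west <= lon <= east and south <= lat <= north
               for lon, lat in coords)

# Approximate downtown Cincinnati extent (illustrative, not the box I used).
downtown = (-84.52, 39.09, -84.49, 39.11)

segments = {
    "downtown_block": [[-84.512, 39.100], [-84.510, 39.101]],
    "suburban_road":  [[-84.300, 39.300], [-84.298, 39.301]],
}
kept = [name for name, coords in segments.items() if in_bbox(coords, downtown)]
print(kept)  # ['downtown_block']
```

A real implementation would also have to decide what to do with segments that straddle the boundary (keep, drop, or clip); the all-coordinates-inside rule above is the strictest choice.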
I then increased the bounding box to include the downtown up to the Clifton neighborhood north of uptown. This time, the resulting GeoJSON is
I then went and upped the ante by trying everything within the I-275 beltway (which covers most of urbanized Cinci). The resulting file size was 59M. This was a pretty big GeoJSON, but still not an unreasonably large file size.
I then went and increased the size far beyond the beltway, out to Monroe and Lebanon to the north (almost halfway to Dayton, OH) and south to Alexandria. At this point we are encompassing the whole Cinci metro area. This was 117M. Again, I don't see this being enough to cause Node to crash the way it did.
Finally, I made a massive bounding box covering all of Lexington, KY and stretching north straight past Dayton, OH. At this point, the error occurred again:
So, it does appear to be a consequence of the size of the area being considered. That said, it also seems that the helper utility might be failing in a way that could be preventable, given the limited memory pressure observed while the operation runs.
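Since the bbox route clearly works for moderately sized areas, one workaround would be to tile a very large extent into a grid of smaller bounding boxes, run the extraction once per tile, and merge the outputs afterward. A sketch of the grid generation (the extent below is an approximate metro-scale box, for illustration only):

```python
# Split a (west, south, east, north) extent into an n x n grid of sub-bboxes
# that could each be extracted separately and merged afterward.
def tile_bbox(bbox, n):
    west, south, east, north = bbox
    dx, dy = (east - west) / n, (north - south) / n
    return [
        (west + i * dx, south + j * dy,
         west + (i + 1) * dx, south + (j + 1) * dy)
        for j in range(n)
        for i in range(n)
    ]

# Approximate metro-scale extent, illustrative only.
metro = (-84.9, 38.9, -84.1, 39.6)
tiles = tile_bbox(metro, 2)
print(len(tiles))  # 4
print([round(v, 2) for v in tiles[0]])  # [-84.9, 38.9, -84.5, 39.25]
```

The merge step would need to deduplicate segments that fall in more than one tile (or straddle tile edges), presumably keyed on the OSM or Movement segment IDs.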