Introduction
It is common to deal with data coming in a format where x values are represented by longitudinal coordinates and y values are represented by latitudinal coordinates. This format, known as WGS 84 or EPSG 4326, is outlined in greater detail here.
Often, in a Python workflow, this data will be read into a GeoDataFrame where the geometric shapes are paired with other metadata (in other columns). Subsequent manipulations may require distance calculations, which are not meaningful in degrees, so the data first needs to be reprojected into a meter-based CRS.
Because a system might be used to process data from one part of the world and then, later, from another, it helps to calculate the projection on the fly. This blog post documents how to do this quickly.
Note: This method mirrors, step for step, the logic in OSMnx, a tool for working with OSM network data in Python with NetworkX. You can see the code this post is based on here.
Steps
First, let’s create an example dataset from some line string segments.
In the snippet below, we take an array of coordinate arrays that represent line string geometries. Because the data is in a format where x values are represented by longitudinal coordinates and y values are represented by latitudinal coordinates, we register the input coordinate reference system (CRS) as EPSG 4326.
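A minimal sketch of this step, assuming a recent GeoPandas release that accepts a `crs="EPSG:4326"` string. The `coord_arrays` values are hypothetical sample data (a few points near Oakland, California), not the data used in the original post:

```python
import geopandas as gpd
from shapely.geometry import LineString

# Hypothetical sample data: each inner list is one line string, given as
# (longitude, latitude) pairs, i.e. x first and y second.
coord_arrays = [
    [(-122.2712, 37.8044), (-122.2690, 37.8060), (-122.2665, 37.8071)],
    [(-122.2750, 37.8010), (-122.2731, 37.8025)],
]

# Build one LineString per coordinate array and register the input CRS.
gs = gpd.GeoSeries(
    [LineString(coords) for coords in coord_arrays],
    crs="EPSG:4326",
)

print(gs.head())
```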
Printing the head of the data shows one line string geometry per row.
Note that we could be using a whole data frame but, for the purposes of this example, it is sufficient to focus on the geometry column, since it is the one that holds the CRS and is affected when the CRS changes.
We assume that the data being worked with sits within roughly a single area or region, so that extracting all the longitudes results in a cluster of values that is relatively geographically isolated. In other words, all longitudes fall roughly in the same area.
As a result, we can find the average and use this as a reference value:
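A minimal sketch of the averaging step, assuming the input is still available as plain coordinate arrays (the `coord_arrays` values are hypothetical sample data). With a GeoSeries, the same longitudes could instead be read from each geometry's `coords`:

```python
# Hypothetical sample data as (longitude, latitude) pairs.
coord_arrays = [
    [(-122.2712, 37.8044), (-122.2690, 37.8060), (-122.2665, 37.8071)],
    [(-122.2750, 37.8010), (-122.2731, 37.8025)],
]

# Flatten out every longitude (the x value of each pair).
all_lons = [lon for coords in coord_arrays for lon, _lat in coords]

# The spread should be small if the data covers a single region.
print(min(all_lons), max(all_lons))

# Use the mean longitude as the reference value.
avg_longitude = sum(all_lons) / len(all_lons)
print(avg_longitude)
```

A plain mean only makes sense under the clustering assumption stated above; data that straddles the antimeridian, for example, would need different handling.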
With that average value calculated, we can insert it into the standard UTM zone equation: zone = floor((longitude + 180) / 6) + 1. This produces an integer that identifies the UTM zone the data is clustered in. You can learn more about UTM zones here.
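As a sketch of that calculation in Python: each UTM zone spans 6 degrees of longitude, starting with zone 1 at -180 degrees. The helper name `utm_zone_from_longitude` and the example longitude are assumptions made for illustration:

```python
import math


def utm_zone_from_longitude(avg_longitude: float) -> int:
    """Return the UTM zone number for a longitude in [-180, 180)."""
    # Shift to a 0-360 range, divide into 6-degree bands, then 1-index.
    return int(math.floor((avg_longitude + 180) / 6.0) + 1)


avg_longitude = -122.27096  # assumed example value, near Oakland, California
utm_zone = utm_zone_from_longitude(avg_longitude)
print(utm_zone)  # 10
```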
Now, it is simply a matter of formatting the zone id into a CRS string that tells PyProj (via GeoPandas) to reproject from EPSG 4326 to the appropriate meter-based UTM zone.
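A minimal sketch of the reprojection, assuming a recent GeoPandas release and a PROJ-style string for the target CRS. The single line string and the hard-coded `utm_zone` value are hypothetical stand-ins for the data and zone computed in the earlier steps:

```python
import geopandas as gpd
from shapely.geometry import LineString

# Hypothetical input: one line string in EPSG 4326 (longitude, latitude).
gs = gpd.GeoSeries(
    [LineString([(-122.2712, 37.8044), (-122.2690, 37.8060)])],
    crs="EPSG:4326",
)

utm_zone = 10  # assumed result of the zone calculation for this data

# Format the zone id into a PROJ string describing a meter-based UTM CRS.
utm_crs = f"+proj=utm +zone={utm_zone} +ellps=WGS84 +datum=WGS84 +units=m +no_defs"

# Reproject; coordinates and lengths are now expressed in meters.
gs_utm = gs.to_crs(utm_crs)

print(gs_utm.head())
print(gs_utm.length)
```

Building the string from the computed zone, rather than hard-coding an EPSG code, is what lets the same code handle data from different parts of the world.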
Conclusion
To conclude, we can now examine the reprojected GeoSeries and see that the x and y values are now in meters.