Introduction
AWS has been incrementally releasing some geospatial functionality on Athena over the past few years. Athena engine version 2 released a series of major quality of life improvements (see blog post) that allowed for the assembly of geospatial primitive spatial types.
With version 3, new operators allow for more advanced spatial operations to be performed and expressed just through Athena’s SQL interface (rather than, say, needing to spin up a Spark stack and run user-defined functions over the coordinate data).
In this blog post, I will demonstrate a quick example of using these features to calculate attributes from more “complex” spatial types. The Trino documentation already describes how to use a to_spherical_geography
to calculate great circle distance between two points cast as ST_Point()
objects:
ST_Distance(to_spherical_geography(ST_Point(-71.0882, 42.3607)), to_spherical_geography(ST_Point(-74.1197, 40.6976)))
Example length calculation
Let’s say your data is stored in the following format with a coordinate array representing the path of a line. If you want to calculate, say, the length of this linestring, you can do so by building up from the sum of the pairwise coordinates using the same pattern of ST_Distance()
shown in the Trino documentation sampled above.
First, the start data:
select '[[130.266523,33.317762],[130.265568,33.317536],[130.263366,33.317021],[130.262584,33.316808],[130.261858,33.316611]]' as coords
This data can then be parsed as a JSON string:
select cast(json_parse(coords) as array(array(double))) as coords
From this point, we can use this (extremely handy) to_spherical_geometry
method to recast each coordinate as a point on a globe, which will allow for spherical distance calculation (as opposed to Euclidean distance):
transform(coords, x -> to_spherical_geography(ST_Point(x[1], x[2]))) as points
Now that we have an array of points that are in spherical projection, we need to pair them with their predecessor to enable a distance calculation:
select
array [element_at(points, 1)] || points as points_fr,
points || array [element_at(points, -1)] as points_to
Distances can now be defined between each pairing, once zipped together. For each pair, a distance can be measured.
select
reduce(
zip_with(points_fr, points_to,
(x, y) -> ST_Distance(x, y)
),
0,
(s, x) -> s + x,
s -> s
) as dist_meters
from paired
The final step will output, for that single example input, a column dist_meters
with a value of 452.05113745459886
meters.
Conclusion
Thanks to the ability to project as a spherical geography, distance measures can now be expressed directly in Athena SQL queries. Wrapping together the above steps into a single defined query outputting the results described above can look like this:
with base as (
select '[[130.266523,33.317762],[130.265568,33.317536],[130.263366,33.317021],[130.262584,33.316808],[130.261858,33.316611]]' as coords
),
parsed_base as (
select cast(json_parse(coords) as array(array(double))) as coords
from base
),
as_geoms as (
select
transform(coords, x -> to_spherical_geography(ST_Point(x[1], x[2]))) as points
from parsed_base
),
paired as (
select
array [element_at(points, 1)] || points as points_fr,
points || array [element_at(points, -1)] as points_to
from as_geoms
),
distances as (
select
reduce(
zip_with(points_fr, points_to,
(x, y) -> ST_Distance(x, y)
),
0,
(s, x) -> s + x,
s -> s
) as dist_meters
from paired
)
select * from distances