Trino/Presto spherical distance capabilities


Introduction

AWS has been incrementally releasing some geospatial functionality on Athena over the past few years. Athena engine version 2 released a series of major quality of life improvements (see blog post) that allowed for the assembly of geospatial primitive spatial types.

With version 3, new operators allow for more advanced spatial operations to be performed and expressed just through Athena’s SQL interface (rather than, say, needing to spin up a Spark stack and run user-defined functions over the coordinate data).

In this blog post, I will demonstrate a quick example of using these features to calculate attributes from more “complex” spatial types. The Trino documentation already describes how to use a to_spherical_geography to calculate great circle distance between two points cast as ST_Point() objects:

ST_Distance(to_spherical_geography(ST_Point(-71.0882, 42.3607)), to_spherical_geography(ST_Point(-74.1197, 40.6976)))

Example length calculation

Let’s say your data is stored in the following format with a coordinate array representing the path of a line. If you want to calculate, say, the length of this linestring, you can do so by building up from the sum of the pairwise coordinates using the same pattern of ST_Distance() shown in the Trino documentation sampled above.

First, the start data:

select '[[130.266523,33.317762],[130.265568,33.317536],[130.263366,33.317021],[130.262584,33.316808],[130.261858,33.316611]]' as coords

This data can then be parsed as a JSON string:

select cast(json_parse(coords) as array(array(double))) as coords

From this point, we can use this (extremely handy) to_spherical_geometry method to recast each coordinate as a point on a globe, which will allow for spherical distance calculation (as opposed to Euclidean distance):

transform(coords, x -> to_spherical_geography(ST_Point(x[1], x[2]))) as points

Now that we have an array of points that are in spherical projection, we need to pair them with their predecessor to enable a distance calculation:

select
    array [element_at(points, 1)] || points as points_fr,
    points || array [element_at(points, -1)] as points_to

Distances can now be defined between each pairing, once zipped together. For each pair, a distance can be measured.

select
    reduce(
        zip_with(points_fr, points_to,
            (x, y) -> ST_Distance(x, y)
        ),
        0,
        (s, x) -> s + x,
        s -> s
    ) as dist_meters
from paired

The final step will output, for that single example input, a column dist_meters with a value of 452.05113745459886 meters.

Conclusion

Thanks to the ability to project as a spherical geography, distance measures can now be expressed directly in Athena SQL queries. Wrapping together the above steps into a single defined query outputting the results described above can look like this:

with base as (
    select '[[130.266523,33.317762],[130.265568,33.317536],[130.263366,33.317021],[130.262584,33.316808],[130.261858,33.316611]]' as coords
),
parsed_base as (
    select cast(json_parse(coords) as array(array(double))) as coords
    from base
),
as_geoms as (
    select
        transform(coords, x -> to_spherical_geography(ST_Point(x[1], x[2]))) as points
    from parsed_base
),
paired as (
    select
        array [element_at(points, 1)] || points as points_fr,
        points || array [element_at(points, -1)] as points_to
    from as_geoms
),
distances as (
    select
        reduce(
            zip_with(points_fr, points_to,
                (x, y) -> ST_Distance(x, y)
            ),
            0,
            (s, x) -> s + x,
            s -> s
        ) as dist_meters
    from paired
)

select * from distances