TL;DR There’s a bug in GeoPandas that causes GeoPackage files to lose their fid column on a write/read round trip, as captured in this issue. The issue lies in the read step, which fails to account for the fid value having been moved out of the feature’s properties and into the feature’s index value, held by the id key.
Introduction
GeoPackage write operations in GeoPandas don’t behave as expected. When you write a GeoDataFrame to a GPKG file and then read the output file back in as a GeoDataFrame, any column named fid disappears.
We can re-create this with a simple example. First we create a GeoDataFrame:
Then we can write the file and read it back in as a second GeoDataFrame:
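As a rough illustration of what this round trip does to a single record, here is a pure-Python model that needs neither GeoPandas nor GDAL. It assumes the behavior described in the rest of this post: the GeoPackage driver takes an integer fid property as the layer’s primary key and reports it back as the feature id. FakeGpkgLayer is a hypothetical stand-in, not a Fiona or GDAL class.

```python
class FakeGpkgLayer:
    """Hypothetical in-memory stand-in for a GeoPackage layer (not a Fiona/GDAL API)."""

    def __init__(self, fid_column="fid"):
        self.fid_column = fid_column
        self._rows = {}  # primary key -> (properties, geometry)

    def write(self, feature):
        props = dict(feature["properties"])  # copy so the input is untouched
        # Assumed driver behavior: the fid property becomes the primary key
        # and is no longer stored as an ordinary field.
        fid = props.pop(self.fid_column)
        self._rows[fid] = (props, feature["geometry"])

    def read(self):
        for fid, (props, geometry) in self._rows.items():
            # The primary key is reported back as the feature id, as a string.
            yield {"id": str(fid), "type": "Feature",
                   "properties": dict(props), "geometry": geometry}


written = {
    "id": "0",
    "type": "Feature",
    "properties": {"fid": 12, "name": "a"},
    "geometry": {"type": "Point", "coordinates": (0.0, 0.0)},
}
layer = FakeGpkgLayer()
layer.write(written)
read_back = list(layer.read())
print(read_back[0]["id"])          # 12
print(read_back[0]["properties"])  # {'name': 'a'}
```

In this model, fid is absent from the properties read back, and its value reappears as a string under id.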
Underlying Fiona operations for write
To read and write these GeoDataFrames, GeoPandas merely wraps Fiona, leveraging that tool’s ability to interface with GDAL, which performs the actual write operations. The key operator for all of this is fiona.open, which requires a set of keyword arguments to execute a write operation.
First, it needs a schema object; GeoPandas has a small utility that generates this in the required format.
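The schema is a plain dictionary holding a geometry type and an ordered mapping of column names to Fiona field types. The sketch below builds one from a single sample row; build_schema and its small type table are simplified assumptions for illustration, not the GeoPandas utility itself, which handles many more types.

```python
from collections import OrderedDict

# Simplified, hypothetical mapping from Python value types to Fiona field
# type names. Anything not listed falls back to "str" in this sketch.
_FIONA_TYPES = {int: "int", float: "float", str: "str"}


def build_schema(properties, geometry_type):
    """Build a Fiona-style schema dict from one sample row (illustrative only)."""
    return {
        "geometry": geometry_type,
        "properties": OrderedDict(
            (name, _FIONA_TYPES.get(type(value), "str"))
            for name, value in properties.items()
        ),
    }


schema = build_schema({"fid": 12, "name": "a", "value": 1.5}, "Point")
print(schema["geometry"])                  # Point
print(dict(schema["properties"]))          # {'fid': 'int', 'name': 'str', 'value': 'float'}
```

Note that fid is just another integer field at this stage; nothing in the schema marks it as special.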
Additional keywords can be provided, but at the very minimum this is the information needed, plus a “record” representing each row in the GeoDataFrame. The records can be created with the GeoDataFrame’s iterfeatures() generator. Converting its output to a list and examining the first entry, we see the following:
As the example above shows, each column’s value for that row is held in the properties member of the feature, which follows the GeoJSON Feature format.
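The record shape can be mimicked with an ordinary dictionary. This sketch assumes that iterfeatures() sets id to the row’s index label converted to a string; row_to_feature is a hypothetical helper, not part of GeoPandas.

```python
def row_to_feature(index, properties, geometry):
    """Hypothetical helper: a GeoJSON-like record shaped like one iterfeatures() item."""
    return {
        "id": str(index),                # the row's index label, as a string
        "type": "Feature",
        "properties": dict(properties),  # every non-geometry column, fid included
        "geometry": geometry,            # a GeoJSON-style geometry mapping
    }


feature = row_to_feature(
    0,
    {"fid": 12, "name": "a"},
    {"type": "Point", "coordinates": (0.0, 0.0)},
)
print(feature["id"], feature["properties"]["fid"])  # 0 12
```

At write time, then, fid lives inside properties, while id is only the DataFrame index.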
Reading operation with Fiona
If we write just the one record shown above, we can read it back in as a single-row GeoDataFrame with the following content:
Now, if we look at the underlying operation, in which fiona.open is used to read instead of write, we can see what happened to the fid column.
In this operation we can see that the “12” value for the fid is indeed present, but it has been moved to the feature’s id key. All other values remain in properties. What is happening is that, during the write process, the fid is assigned to the layer’s index (its primary key), which Fiona then reports back as each feature’s id on read.
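Why the value then vanishes from the GeoDataFrame can be modeled in a few lines. The sketch below is loosely based on GeoDataFrame.from_features, which, as far as we can tell, builds each row from a feature’s geometry and properties and does not use the feature id; features_to_rows is a hypothetical simplification of that step.

```python
def features_to_rows(features):
    """Hypothetical, simplified model of turning read features into rows.

    Each row is built from "geometry" plus the "properties" mapping; the
    feature "id" is never copied, so a fid stored there does not survive.
    """
    rows = []
    for feature in features:
        row = {"geometry": feature["geometry"]}
        row.update(feature["properties"])
        rows.append(row)
    return rows


read_features = [
    {
        "id": "12",  # the fid, as reported on read
        "type": "Feature",
        "properties": {"name": "a"},
        "geometry": {"type": "Point", "coordinates": (0.0, 0.0)},
    }
]
rows = features_to_rows(read_features)
print(sorted(rows[0]))  # ['geometry', 'name']
```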
Quick fix
Without modifying the GeoPandas codebase, one can quickly recover the missing fid column by extracting it from each feature’s id with the fiona.open file reader:
Once that is done, you have an array holding all the values for the previously missing fid column, and you can simply assign it to the existing GeoDataFrame that was already read in without that column. Here’s an example of how that might look:
What is happening to fid on write
When GeoPandas writes to a file, it simply wraps Fiona operations:
It iterates through each row and writes the record generated for that row via the specified driver. Per the GDAL GeoPackage documentation (see it here), the driver supports an FID layer creation option, described as follows: “Column name to use for the OGR FID (primary key in the SQLite database). Default to ‘fid.’”
It appears that the write operation intentionally, by design, takes an “fid” attribute if one is available and uses it as that FID column. This means it expects the column to hold integer values.
Let’s say we made the column hold float values instead:
The resulting operation would error like so:
The GeoPackage driver expects the fid column to be an integer column. The same would happen if the column held strings (or anything other than integers).
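The rule the driver applies can be modeled as follows. split_fid is a hypothetical sketch of the behavior described above, not GDAL’s implementation, and its error message is this sketch’s own wording rather than GDAL’s.

```python
def split_fid(properties, fid_column="fid"):
    """Hypothetical model of the GeoPackage fid rule (not GDAL code).

    Returns (fid, remaining_properties). Raises TypeError when the fid value
    is not an integer, mirroring the write failure seen with float or str fids.
    """
    remaining = dict(properties)
    if fid_column not in remaining:
        return None, remaining  # no fid supplied: the key would be auto-assigned
    fid = remaining.pop(fid_column)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(fid, bool) or not isinstance(fid, int):
        raise TypeError(f"{fid_column!r} must be an integer, got {type(fid).__name__}")
    return fid, remaining


print(split_fid({"fid": 12, "name": "a"}))  # (12, {'name': 'a'})
try:
    split_fid({"fid": 12.0, "name": "a"})
except TypeError as err:
    print(err)  # 'fid' must be an integer, got float
```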
Accommodating this in GeoPandas would require a number of decisions. For example, if a user were to export an fid column and want it preserved, the column would have to be renamed so that the driver doesn’t raise an error on write. On read, the read step would need to look up the id of each feature and add it to that feature’s properties dictionary, which becomes a row in the to-be-created GeoDataFrame. It would need to name that new column fid and convert each value read in (which arrives as a string) to an integer.
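The two accommodations described above could look roughly like the following. Both functions, and the fid_value name used for the renamed column, are hypothetical; the write-side helper assumes only non-integer fid columns need renaming, since integer ones are accepted by the driver.

```python
def prepare_properties_for_gpkg(properties, fid_column="fid", rename_to="fid_value"):
    """Hypothetical write-side fix: rename a non-integer fid so the driver accepts it."""
    props = dict(properties)
    value = props.get(fid_column)
    if fid_column in props and (isinstance(value, bool) or not isinstance(value, int)):
        # "fid_value" is an illustrative name, not a GeoPandas convention.
        props[rename_to] = props.pop(fid_column)
    return props


def restore_fid_on_read(feature, fid_column="fid"):
    """Hypothetical read-side fix: copy the feature id back in as an integer fid."""
    props = dict(feature["properties"])
    props[fid_column] = int(feature["id"])  # ids arrive as strings
    return props


print(prepare_properties_for_gpkg({"fid": "A-1", "name": "a"}))  # {'name': 'a', 'fid_value': 'A-1'}
print(restore_fid_on_read({"id": "12", "properties": {"name": "a"}}))  # {'name': 'a', 'fid': 12}
```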
Conclusions
fid management should be carefully considered before a GeoDataFrame is submitted to an export operation. Similarly, preserving fid on GeoPackage read-in should be an explicit step, so as to avoid reading in ids that are not integer-based as fid (which would cause a downstream error if the data were later exported).