TL;DR: There’s a bug in GeoPandas that results in GeoPackage files losing their fid column, as captured in this issue. The problem lies in the read step, which fails to account for the fid value having been shifted from a property in each feature’s properties to the feature’s id.
GeoPackage write operations in GeoPandas do not produce the expected behavior. When you write to a GPKG file and then read the output file back in as a GeoDataFrame, a column named fid will disappear.
We can re-create this with a simple example. First we create a GeoDataFrame:
Then we can write the file and read it back in as a second GeoDataFrame:
Underlying Fiona operations for write
To read and write these GeoDataFrames, GeoPandas is merely wrapping Fiona and leveraging that tool’s ability to interface with GDAL and perform the actual write operations. The key operator for all this is fiona.open, which requires a set of keyword arguments to execute a write operation.
First, it needs a schema object. GeoPandas has a small utility that generates this in the required format.
Additional keywords can be provided, but at the very minimum this is the information needed, plus a “record” that represents each row in the GeoDataFrame. The records can be created using a GeoDataFrame’s iterfeatures() generator. If we convert its output to a list and examine the first entry, we can see the following:
As we can see in the above example, each column’s value for that row is held in the properties component of that feature (which follows the GeoJSON pattern in terms of object formatting).
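To make the shape of such a record concrete, here is a hand-written stand-in for a single feature. It is a sketch only: the "name" column, the point geometry, and the "0" id are assumptions made for illustration, not output captured from GeoPandas.

```python
# Hand-written stand-in for one record; only the fid value of 12 comes from
# the example in this post. The other values are hypothetical.
record = {
    "id": "0",  # assumed: the row's index label, as a string
    "type": "Feature",
    "properties": {"fid": 12, "name": "example"},
    "geometry": {"type": "Point", "coordinates": (0.0, 0.0)},
}

# Before the write, fid sits alongside every other column value.
print(record["properties"]["fid"])  # 12
```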
Reading operation with Fiona
If we just write one record as we showed in the above example, we can read that back as a single GeoDataFrame. That GeoDataFrame will have the following content:
Now, if we look at the underlying operation, in which the fiona.open method is used to read instead of write, we can see what happened to the fid:
In this operation we can see that the “12” value for the fid is indeed present but has been moved to the id column. All other values still remain in the properties column. What is happening is that, during the write process, the fid is being assigned to the index.
Without modifying the GeoPandas codebase, one can quickly recover the missing fid column by extracting it from each feature’s id with the fiona.open file reader:
Once that is done, you have an array that holds all the values for the fid column that was previously missing, and you can simply assign it to the existing GeoDataFrame that has already been read in but lacks that column. Here’s an example of how that might happen:
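A minimal sketch of that extraction step is below. The helper name fids_from_features is hypothetical, and the list of dictionaries is a stand-in for what iterating an opened GeoPackage layer might yield. It assumes each feature supports key access for "id".

```python
def fids_from_features(features):
    """Collect each feature's id as an integer, in iteration order."""
    fids = []
    for feature in features:
        # The id is read back as a string, so convert it to an integer.
        fids.append(int(feature["id"]))
    return fids


# Stand-in for the features read back from the file (hypothetical values,
# apart from the id of "12" discussed above).
features_read = [
    {"id": "12", "type": "Feature", "properties": {"name": "example"}, "geometry": None},
]

fids = fids_from_features(features_read)
print(fids)  # [12]
```

With Fiona itself, the same helper could presumably be fed the opened collection directly, along the lines of `with fiona.open(path) as src: fids = fids_from_features(src)`. The result could then be assigned with `gdf["fid"] = fids`, assuming the GeoDataFrame rows are in the same order as the features.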
What is happening to fid on write
When GeoPandas writes to a file, it simply wraps Fiona operations:
It’s iterating through each row and writing the record generated for that row via the specified driver. Per the GDAL GeoPackage documentation (see it here), the driver’s layer creation options include an FID option, described as follows: “Column name to use for the OGR FID (primary key in the SQLite database). Default to ‘fid’.”
It appears that the write operation intentionally (that is, by design) takes an “fid” attribute if available and uses it as that FID column. This means it expects the column to hold integer values.
Let’s say we made the column a float value instead:
The resulting operation would error like so:
The GeoPackage driver was expecting the fid column to be an integer column. The same would happen if that column were a string (or anything else other than an integer).
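Given that constraint, one option is to check the fid values before attempting an export. The helper below is a hypothetical sketch, not part of GeoPandas, Fiona, or GDAL. It simply reports any values that are not integers.

```python
import numbers


def non_integer_fids(values):
    """Return the values that are not integers (booleans are rejected too)."""
    return [
        value
        for value in values
        if isinstance(value, bool) or not isinstance(value, numbers.Integral)
    ]


print(non_integer_fids([12, 13]))         # []
print(non_integer_fids([12.0, "12", 7]))  # [12.0, '12']
```

An empty result suggests the column is safe to hand to the driver as its FID. Anything else points to values that would need converting, or a column that would need renaming, first.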
To accommodate this in GeoPandas, a number of decisions would need to be made. For example, if a user were to export a non-integer fid column and wanted it preserved, the column would have to be renamed so that the driver does not error on write. On read, GeoPandas would need to look up the id for each feature and add it to the properties dictionary for the corresponding row in the to-be-created GeoDataFrame. It would need to name that new column fid and convert each value read in (which arrives as a string) to an integer.
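The two steps just described can be sketched on plain dictionaries. Both function names and the "fid_original" column name are hypothetical, and this is not how GeoPandas is implemented. It only illustrates the rename on write and the id-to-fid copy on read.

```python
def prepare_properties_for_write(properties, reserved="fid", renamed="fid_original"):
    """Rename a user-supplied fid column so it no longer collides with the driver's key."""
    prepared = dict(properties)
    if reserved in prepared:
        prepared[renamed] = prepared.pop(reserved)
    return prepared


def properties_with_fid(feature, column="fid"):
    """Copy a read feature's properties and add its id as an integer column."""
    properties = dict(feature["properties"])
    properties[column] = int(feature["id"])  # ids arrive as strings
    return properties


written = prepare_properties_for_write({"fid": "a-12", "name": "example"})
read_back = properties_with_fid({"id": "12", "properties": {"name": "example"}})
print(written)    # {'name': 'example', 'fid_original': 'a-12'}
print(read_back)  # {'name': 'example', 'fid': 12}
```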
fid management is something that should be carefully considered before submitting a GeoDataFrame to an export operation. Similarly, preservation of fid on GeoPackage read-in should be explicitly called out, so as to avoid reading in ids that are not integer-based as fid (and thereby causing a downstream error should an export be attempted later).