In GeoPandas v0.6.0, the missing data handling was refactored and made more consistent across the library.
Historically, missing (“NA”) values in a GeoSeries could be represented by empty geometric objects, in addition to standard representations such as None and np.nan. At least, this was the case in GeoSeries.isna() or when a GeoSeries got aligned in geospatial operations. But other methods like dropna() and fillna() did not follow this approach and did not consider empty geometries as missing.
In GeoPandas v0.6.0, the most important change is that GeoSeries.isna() no longer treats empty geometries as missing:
- Using the small example from above, the old behaviour treated both the empty and the missing geometry as “missing”:
>>> s
0 POLYGON ((0 0, 1 1, 0 1, 0 0))
1 None
2 GEOMETRYCOLLECTION EMPTY
dtype: object
>>> s.isna()
0 False
1 True
2 True
dtype: bool
- Starting from GeoPandas v0.6.0, isna() only treats actual missing values as missing:
In [11]: s.isna()
Out[11]:
0 False
1 True
2 False
dtype: bool
For now, when isna() is called on a GeoSeries with empty geometries, a warning is raised to alert the user to the changed behaviour, with an indication of how to restore the old result.
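To flag both missing and empty geometries, as older versions did, the two checks can be combined explicitly. The following is a minimal sketch, assuming GeoPandas v0.6.0 or later; the series s is rebuilt here to mirror the small example above, and the variable names are illustrative only.

```python
import geopandas
from shapely.geometry import GeometryCollection, Polygon

# Rebuild a series like the one above: a polygon, a missing value, an empty geometry.
s = geopandas.GeoSeries(
    [Polygon([(0, 0), (1, 1), (0, 1)]), None, GeometryCollection()]
)

# isna() flags only the actual missing value (None).
missing = s.isna()

# is_empty flags only the empty geometry.
empty = s.is_empty

# Combining both masks flags missing and empty geometries alike,
# which matches what isna() returned before v0.6.0.
missing_or_empty = missing | empty
print(missing_or_empty)
```

Keeping the two masks separate also makes it possible to handle each case differently, for example dropping missing rows while keeping empty geometries.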
Additionally, the behaviour of GeoSeries.align() changed to use missing values instead of empty geometries to fill non-matching indexes. Consider the following small toy example:
In [12]: from shapely.geometry import Point
In [13]: s1 = geopandas.GeoSeries([Point(0, 0), Point(1, 1)], index=[0, 1])
In [14]: s2 = geopandas.GeoSeries([Point(1, 1), Point(2, 2)], index=[1, 2])
In [15]: s1
Out[15]:
0 POINT (0.000000000 0.000000000)
1 POINT (1.000000000 1.000000000)
dtype: geometry
In [16]: s2
Out[16]:
1 POINT (1.000000000 1.000000000)
2 POINT (2.000000000 2.000000000)
dtype: geometry
- Previously, the align method would use empty geometries to fill values:
>>> s1_aligned, s2_aligned = s1.align(s2)
>>> s1_aligned
0 POINT (0 0)
1 POINT (1 1)
2 GEOMETRYCOLLECTION EMPTY
dtype: object
>>> s2_aligned
0 GEOMETRYCOLLECTION EMPTY
1 POINT (1 1)
2 POINT (2 2)
dtype: object
This method is used under the hood when performing spatial operations on mis-aligned GeoSeries objects:
>>> s1.intersection(s2)
0 GEOMETRYCOLLECTION EMPTY
1 POINT (1 1)
2 GEOMETRYCOLLECTION EMPTY
dtype: object
- Starting from GeoPandas v0.6.0, GeoSeries.align() will use missing values to fill in the non-aligned indices, to be consistent with the behaviour in pandas:
In [17]: s1_aligned, s2_aligned = s1.align(s2)
In [18]: s1_aligned
Out[18]:
0 POINT (0.000000000 0.000000000)
1 POINT (1.000000000 1.000000000)
2 None
dtype: geometry
In [19]: s2_aligned
Out[19]:
0 None
1 POINT (1.000000000 1.000000000)
2 POINT (2.000000000 2.000000000)
dtype: geometry
This has the consequence that spatial operations will also use missing values instead of empty geometries, which can have a different behaviour depending on the spatial operation:
In [20]: s1.intersection(s2)
Out[20]:
0 None
1 POINT (1.000000000 1.000000000)
2 None
dtype: geometry
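If the earlier result with empty geometries is needed, one option is to align the two series explicitly and fill the missing values with an empty geometry before running the operation. This is a sketch under stated assumptions rather than an officially recommended recipe: it assumes GeoPandas v0.6.0 or later, and the names s1_filled, s2_filled and result are illustrative. The exact type of the empty geometries in the result may vary with the installed GEOS version.

```python
import geopandas
from shapely.geometry import GeometryCollection, Point

s1 = geopandas.GeoSeries([Point(0, 0), Point(1, 1)], index=[0, 1])
s2 = geopandas.GeoSeries([Point(1, 1), Point(2, 2)], index=[1, 2])

# Align explicitly; non-matching indices are filled with missing values.
s1_aligned, s2_aligned = s1.align(s2)

# Replace the missing values with an empty geometry to mimic the old filling.
s1_filled = s1_aligned.fillna(GeometryCollection())
s2_filled = s2_aligned.fillna(GeometryCollection())

# The indices now match, so the operation returns empty geometries
# (not missing values) where one side had no counterpart.
result = s1_filled.intersection(s2_filled)
print(result)
```

Filling after alignment, rather than before, keeps the original series untouched and makes the choice of fill value visible at the call site.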