In this notebook we will have a look at the SpaceNet 1: Building Detection challenge. The goal in this challenge is to find building footprints in satellite images. The focus of this notebook will be on how image labels can be transformed into a more standardized format that classical image recognition models can consume, and on how to measure the success of the model.

In order to use something like a neural network for image recognition, the model needs to be trained with labeled data, and this labeled data usually needs to be in a standardized format. For example, for image classification the input dimension of the model is usually strictly defined: it equals a fixed number of pixels, like 224x224 pixel RGB images. The output side also needs to comply with a fixed dimension, which usually coincides with the number of classes to be predicted. A classification into cats and dogs would output 2 numbers: the individual likelihoods for the classes cat and dog. In contrast, image segmentation models assign each individual pixel of an image to one class out of a set of class labels. Hence, the output dimension equals (or is a multiple of) the number of image pixels.
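
To make the dimensionality contrast concrete, here is a shape-only sketch (the 224x224 size and the two classes are purely illustrative, not part of the challenge):

import numpy as np

model_input = np.zeros((224, 224, 3), dtype=np.uint8)  # fixed-size RGB input image
class_output = np.zeros(2)                             # classification: one likelihood per class (cat, dog)
segmentation_output = np.zeros((224, 224), dtype=int)  # segmentation: one class label per pixel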

Although the SpaceNet challenge goal is pretty similar to an image segmentation task, the labels are not given as labeled image pixels, but as vector shapes with georeferenced coordinates. Directly predicting those vector shapes would be problematic, as they do not comply with a standardized output dimension: each building footprint might have a different number of edges, and each image might contain a different number of building footprints. Hence, it is not straightforward how the data should be transformed into a more standardized setting such that a model can be trained on it.

Additionally, one needs to think about a metric to measure whether the model did a good job or not. In image classification tasks this is rather straightforward: for each image you can check whether the image class was predicted correctly. With multiple georeferenced building footprints, however, you run into several problems where it is not directly obvious how to measure success or failure. For example, any given building footprint might be only partially detected. How much overlap between true and predicted building footprint would you then require in order to count this as a success? Also, two separate building footprints that are close together might be detected as just a single overlapping predicted footprint. How well was the task fulfilled in this case?
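
To make the overlap question concrete, here is a small Shapely sketch with two made-up rectangular footprints, where the prediction covers only half of the true building:

from shapely.geometry import box

true_fp = box(0, 0, 10, 10)   # hypothetical true footprint
pred_fp = box(5, 0, 15, 10)   # hypothetical prediction, shifted to the right

true_fp.intersection(pred_fp).area / true_fp.area  # 0.5: only half the building is covered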

We will dive into all of these problems and implement one possible solution that was proposed by SpaceNet as a helpful starting point for the challenge. If you want further details on the topic, please also have a look at the following two blog posts that I used to come up with this notebook:

Load and visualize data

Let's import some libraries first:

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.patches import Circle
import rasterio
from rasterio import features
from rasterio.features import shapes
from rasterio.plot import show  # the show function displays (georeferenced) raster images
import geopandas as gpd
import rioxarray
from shapely.geometry import Point, shape
from scipy.ndimage import distance_transform_edt

Now we can load the .tif raster data with rioxarray:

In [2]:
raster_file = '../../data/external/spacenet/3band_AOI_1_RIO_img5792.tif'
geojson_file = '../../data/external/spacenet/Geo_AOI_1_RIO_img5792.geojson'
In [3]:
img_dataarr = rioxarray.open_rasterio(raster_file)
In [4]:
img_dataarr
Out[4]:
<xarray.DataArray (band: 3, y: 406, x: 439)>
[534702 values with dtype=uint8]
Coordinates:
  * band         (band) int64 1 2 3
  * x            (x) float64 -43.54 -43.54 -43.54 ... -43.54 -43.54 -43.54
  * y            (y) float64 -22.88 -22.88 -22.88 ... -22.88 -22.88 -22.88
    spatial_ref  int64 0
Attributes:
    AREA_OR_POINT:  Area
    scale_factor:   1.0
    add_offset:     0.0

The xarray.DataArray comes with geospatial metadata and can be plotted easily with georeferenced axis ticks:

In [5]:
img_dataarr.rio.crs
Out[5]:
CRS.from_epsg(4326)
In [6]:
img_dataarr.plot.imshow()
plt.show()

Alternatively, we can also drop the geospatial location information and plot just the image from a numpy array:

In [7]:
img_data = img_dataarr.values
img_data = np.transpose(img_data, (1, 2, 0))
In [8]:
f = plt.figure()
plt.imshow(img_data)
plt.show()

Now let's read in the data labels using GeoPandas:

In [9]:
building_footprints = gpd.read_file(geojson_file)
building_footprints.head(3)
Out[9]:
timestamp version changeset user uid HGIS_OID building type id area QAStatus HGISOID TaskArea Revision1 Shape_Leng Shape_Area partialBuilding partialDec geometry
0 2016-06-24 20:32:00+00:00 1 5824 Derick 43 121412.0 yes None way/170104 yes Original_Building 121412.0 West No 0.000447 0.0 0.0 1.000000 POLYGON Z ((-43.54179 -22.87637 0.00000, -43.5...
1 2016-06-24 20:35:43+00:00 1 5824 Derick 43 121411.0 yes None way/172303 yes Original_Building 121411.0 West No 0.000655 0.0 0.0 1.000000 POLYGON Z ((-43.54187 -22.87638 0.00000, -43.5...
2 2016-06-24 20:42:34+00:00 1 5837 Derick 43 121504.0 yes None way/176428 yes Original_Building 121504.0 West No 0.000322 0.0 1.0 0.013743 POLYGON Z ((-43.54254 -22.87619 0.00000, -43.5...

GeoPandas automatically used the building footprints as the geometry column of the GeoDataFrame. The individual polygons can be easily visualized:

In [10]:
building_footprints.plot()
plt.show()

Putting raster data and image labels into a single chart, we can check how well the data was labeled:

In [11]:
fig, ax0 = plt.subplots(1, 1, figsize=(8,8))

img_dataarr.plot.imshow(ax=ax0)
building_footprints.plot(ax=ax0, alpha=0.25, edgecolor='None', facecolor='orange')
building_footprints.plot(ax=ax0, alpha=0.75, edgecolor='red', facecolor='None')
plt.show()

As you can see, the building footprint polygons are sometimes off by a few pixels.

Rasterize vector data

Now the next step is to get the labels into a more standardized format. For this, we will "rasterize" the vector data and "burn" the vector shapes into a raster image. The output will be a simple numpy array without geospatial information. In this process the rasterize function needs to know the transformation that maps image pixel coordinates to georeferenced coordinates.

In [12]:
img_dataarr.rio.transform()
Out[12]:
Affine(4.483952164001227e-06, 0.0, -43.5436217964,
       0.0, -4.492595073897002e-06, -22.876191137)
In [13]:
# Rasterize vector using the shape and coordinate system of the raster
rasterized = features.rasterize(building_footprints.geometry.values,
                                out_shape = img_dataarr.rio.shape,
                                fill = 0,
                                out = None,
                                transform = img_dataarr.rio.transform(),
                                all_touched = False,
                                default_value = 1,
                                dtype = None)
In [14]:
type(rasterized)
Out[14]:
numpy.ndarray

We now have a nice pixel / raster representation of the original vector data.
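
As a quick sanity check (a minimal sketch; with fill=0 and default_value=1 the array should only contain zeros and ones):

np.unique(rasterized)  # expected: array([0, 1])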

In [15]:
# Plot raster
fig, (ax0, ax1) = plt.subplots(1, 2, figsize = (10, 10))

img_dataarr.plot.imshow(ax=ax0)
building_footprints.plot(ax=ax0, alpha=0.25, edgecolor='None', facecolor='orange')
building_footprints.plot(ax=ax0, alpha=0.75, edgecolor='red', facecolor='None')

show(rasterized, ax=ax1)
plt.show()

Verify x/y indexing

At this point it might be helpful to have a closer look at the way the data is indexed. The raster matrix is transposed in the sense that the index with label x is stored in the columns. This can be seen from the shape of the index: x has 439 entries:

In [16]:
img_dataarr.indexes.dims
Out[16]:
Frozen({'band': 3, 'x': 439, 'y': 406})
In [17]:
img_dataarr.x.coords
Out[17]:
Coordinates:
  * x            (x) float64 -43.54 -43.54 -43.54 ... -43.54 -43.54 -43.54
    spatial_ref  int64 0
In [18]:
img_dataarr.shape
Out[18]:
(3, 406, 439)

And these 439 entries are stored as the second dimension (columns) of the numpy array:

In [19]:
img_data.shape
Out[19]:
(406, 439, 3)
In [20]:
rasterized.shape
Out[20]:
(406, 439)

In other words, in the numpy array the first dimension is y, while the second dimension is x. This is different for Shapely geometries, where the first dimension relates to the x coordinates:

In [21]:
this_poly = building_footprints.geometry[0]
In [22]:
this_poly.exterior.coords.xy
Out[22]:
(array('d', [-43.541791399999966, -43.54187379999996, -43.54189189999994, -43.541906, -43.5418708, -43.541876, -43.54182689999993, -43.54182169999996, -43.541785, -43.541771, -43.541791399999966]),
 array('d', [-22.87636889999993, -22.876382599999943, -22.87638549999997, -22.87631289999996, -22.876307099999963, -22.87628069999994, -22.876272599999936, -22.876299399999937, -22.876293399999952, -22.87636559999993, -22.87636889999993]))

Let's look at an example to verify this. We pick some arbitrary point in pixel space:

In [23]:
irow = 283 # rows, represents y coordinates
icol = 113 # columns, represents x coordinates

Now, in order to overlay this point onto the pixel image, we will create a Circle object. For a circle, the input coordinates need to be given as an (x,y) tuple. We need to keep in mind that x is represented by the columns of the raster image. Hence, the tuple needs to be (icol, irow):

In [24]:
this_circle = Circle((icol, irow), radius=5, color='red') # Circle coordinates are given as (x,y); irow represents y, icol represents x

Also, we'll manipulate some image values and overwrite them with a white rectangle:

In [25]:
img_data = img_dataarr.values.copy()
img_data = np.transpose(img_data, (1, 2, 0))

# make rectangular shape with color
this_fake_val = 250
img_data[0:10, 0:50, 0] = this_fake_val
img_data[0:10, 0:50, 1] = this_fake_val
img_data[0:10, 0:50, 2] = this_fake_val
In [26]:
img_data.shape
Out[26]:
(406, 439, 3)

Now if we visualize the data we can see that the rectangle's longer side extends along the x dimension: the 50 manipulated columns correspond to x and the 10 manipulated rows to y. This might go against the intuition that the first numpy dimension maps to x.

In [27]:
fig, ax = plt.subplots(1)
ax.imshow(img_data)
ax.add_patch(this_circle)
plt.xlabel('x')
plt.ylabel('y')
plt.show()

At this point, let us also have a closer look at the transformation between pixel space and geospatial coordinates. In order to get from pixel space to coordinates we can simply use the transformation that is stored in the xarray. For example, we can map the top left pixel (0,0) with:

In [28]:
(x_coord, y_coord) = rasterio.transform.xy(img_dataarr.rio.transform(), 0, 0)
x_coord, y_coord
Out[28]:
(-43.54361955442391, -22.876193383297537)

Let's verify this by plotting a red circle at the transformed coordinates:

In [29]:
fig, ax = plt.subplots(1)
img_dataarr.plot.imshow(ax=ax)
ax.add_patch(Circle((x_coord, y_coord), radius=0.0001, color='red'))
plt.show()

The same transformation also works for an arbitrary point, of course.

In [30]:
x_coord, y_coord = img_dataarr.rio.transform() * (icol, irow)
In [31]:
x_coord, y_coord
Out[31]:
(-43.54311510980546, -22.87746254140591)
In [32]:
# an alternative would be (note: xy() returns pixel centers, so values differ by half a pixel):
# (x_coord, y_coord) = rasterio.transform.xy(img_dataarr.rio.transform(), irow, icol)
In [33]:
fig, ax = plt.subplots(1)
img_dataarr.plot.imshow(ax=ax)
ax.add_patch(Circle((x_coord, y_coord), radius=0.00002, color='red'))
plt.show()

Measure distances

Now we want to start thinking about how to measure distances between pixels and geometries. We will use this to find an overlapping geometry for the point that we have chosen. Let's first pick some arbitrary polygon:

In [34]:
this_poly = building_footprints.geometry[0]
this_poly_gdp = gpd.GeoDataFrame(geometry=[this_poly], crs=img_dataarr.rio.crs)

fig, ax = plt.subplots(1)
img_dataarr.plot.imshow(ax=ax)

ax.add_patch(Circle((x_coord, y_coord), radius=0.00002, color='red'))
this_poly_gdp.plot(ax=ax, facecolor='None', edgecolor='blue')
plt.show()

We can see that the x/y dimensions of the point are in line with the x/y dimensions of the polygon:

In [35]:
this_poly.bounds
Out[35]:
(-43.541906, -22.87638549999997, -43.541771, -22.876272599999936)
In [36]:
this_point = Point(x_coord, y_coord)
this_point.coords[0]
Out[36]:
(-43.54311510980546, -22.87746254140591)

Hence, the distance between our point and the polygon can be easily computed:

In [37]:
newpd = this_point.distance(this_poly.boundary)
newpd
Out[37]:
0.0016298037975299594

In order to find the polygon closest to our point, we simply loop over all polygons and keep track of the minimum distance, as well as of any polygon that fully contains the point:

In [38]:
counter = 0
min_counter = 0
running_min_dist = 100000
poly_contain_index = None

for this_poly in building_footprints.geometry:
    
    this_dist = this_point.distance(this_poly.boundary)
    if this_dist < running_min_dist:
        running_min_dist = this_dist
        min_counter = counter
    
    if this_poly.contains(this_point):
        poly_contain_index = counter
        
    counter += 1
In [39]:
min_counter
Out[39]:
12
In [40]:
poly_contain_index
Out[40]:
12
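
As an aside, GeoPandas can do this search without an explicit Python loop. A minimal sketch using the vectorized distance and contains methods, which should reproduce min_counter and poly_contain_index:

# distance from the point to every polygon boundary, as a pandas Series
dists = building_footprints.geometry.boundary.distance(this_point)
dists.idxmin()  # index of the closest polygon

# boolean mask of all polygons that contain the point
contains_mask = building_footprints.geometry.contains(this_point)
contains_mask[contains_mask].index  # indices of containing polygons, if any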

Let's verify that we picked the correct polygon:

In [41]:
this_poly = building_footprints.geometry[poly_contain_index]
this_poly_gdp = gpd.GeoDataFrame(geometry=[this_poly], crs=img_dataarr.rio.crs)

fig, ax = plt.subplots(1)
img_dataarr.plot.imshow(ax=ax)

ax.add_patch(Circle((x_coord, y_coord), radius=0.00002, color='red'))
this_poly_gdp.plot(ax=ax, facecolor='None', edgecolor='blue')
plt.show()

Distance transformation

We've already seen one way the building footprint vector labels can be transformed into a standardized format: rasterizing the vector data. However, this label transformation comes with one potential problem: building footprints that are too close to each other will end up as one connected segment, and hence the model may no longer be able to distinguish the individual buildings.

There is an alternative approach that might better preserve individual buildings. The idea is to compute, for every pixel, the distance to the closest building boundary and use this distance to encode the building information. For pixels that lie inside a building we encode this distance with a negative sign, so that we can distinguish interior and exterior points.
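
Written as a formula, with $B$ denoting the union of all building footprints and $\partial B$ its boundary, every pixel location $p$ gets the value

$$d(p) = \begin{cases} -\operatorname{dist}(p, \partial B) & \text{if } p \in B \\ \operatorname{dist}(p, \partial B) & \text{otherwise} \end{cases}$$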

A straightforward implementation of this approach is to loop over all polygons for each pixel and compute the distance.

In [42]:
this_point = Point(x_coord, y_coord)
In [43]:
def get_min_distance(this_point, building_footprints):
    
    counter = 0
    running_min_dist = 100000
    
    for this_poly in building_footprints.geometry:
        
        this_dist = this_point.distance(this_poly.boundary)
        
        if this_poly.contains(this_point):
            
            this_dist = -1 * this_dist
            running_min_dist = this_dist
            
            return running_min_dist
        
        else:
            
            if this_dist < running_min_dist:
            
                running_min_dist = this_dist

    return running_min_dist
In [44]:
n_rows, n_cols, _ = img_data.shape

However, this implementation would be very slow:

In [45]:
%%time

min_dist_matr = np.zeros((n_rows, n_cols))

for icol in range(0, n_cols):
    for irow in range(0, n_rows):
        x_coord, y_coord = img_dataarr.rio.transform() * (icol, irow)
        this_point = Point(x_coord, y_coord)
        
        min_dist = get_min_distance(this_point, building_footprints)
        min_dist_matr[irow, icol] = min_dist
        
CPU times: user 3min 9s, sys: 80.3 ms, total: 3min 9s
Wall time: 3min 9s

This is how the building information looks when rasterized using the distance transformation:

In [46]:
fig, (ax0, ax1) = plt.subplots(1, 2, figsize = (10, 10))

ax0.imshow(min_dist_matr, vmin=-0.00001, vmax=0.0001, cmap='jet_r')
#ax0.colorbar()

show(rasterized, ax=ax1)
plt.show()

We can achieve a very similar result with the distance_transform_edt function from scipy's image transformation tools. We need to apply it twice in order to get distances for both interior and exterior points:

In [47]:
%%time

im_dist = distance_transform_edt(rasterized)

rasterized_inv = 1 - rasterized
im_dist_inv = distance_transform_edt(rasterized_inv)
CPU times: user 15.5 ms, sys: 0 ns, total: 15.5 ms
Wall time: 14.8 ms

Let's merge interior and exterior point distances into a single rasterized image with negative distances for interior values.

In [48]:
im_dist_merged = (-1)*im_dist.copy()
im_dist_merged[im_dist_inv > 0] = im_dist_inv[im_dist_inv > 0]

For the visualization we need to deal with the fact that the distances now live on a different scale than in the approach before:

In [49]:
np.min(im_dist_merged)
Out[49]:
-13.601470508735444
In [50]:
np.max(im_dist_merged)
Out[50]:
237.0358622656074
In [51]:
np.min(min_dist_matr), np.max(min_dist_matr)
Out[51]:
(-5.712877571919858e-05, 0.0010599574749903967)
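
The reason is units: distance_transform_edt measures distances in pixel units, while the loop above measured them in the CRS units (degrees, since the data is in EPSG:4326). A minimal sketch to put both on the same scale, assuming roughly square pixels:

t = img_dataarr.rio.transform()
pixel_size = (abs(t.a) + abs(t.e)) / 2            # pixel width and height in degrees, averaged
im_dist_merged_deg = im_dist_merged * pixel_size  # distances now approximately in degrees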

But with some reasonable settings for the color map, the result resembles the output of the slow implementation pretty closely:

In [52]:
fig, (ax0, ax1) = plt.subplots(1, 2, figsize = (10, 10))

ax0.imshow(min_dist_matr, vmin=-0.00001, vmax=0.0001, cmap='jet_r')
#ax0.colorbar()

ax1.imshow(im_dist_merged, vmin=-7, vmax=10, cmap='jet_r')
plt.show()

Post-processing

The original image labels consist of vector building footprints. When we transform those labels into rasterized proxies and train a model on them, we actually solve a proxy problem instead of the real problem. In the end, we need to transform the proxy output back to the data format that we actually want: vector data.

In order to do this, we will use a clustering algorithm similar to the one described here: https://gist.github.com/hagerty/724b84ad69897d1fe6d241bbca9e2781. The idea is pretty simple. Let's say we denote all interior pixels of buildings with positive values. Then within any building there is a maximum value somewhere near the center of the building, and from there the distance values decrease on paths towards the building borders. Hence, starting from the maximum value we can grow a cluster by repeatedly adding neighboring points that have smaller, but still positive, values, i.e. that still lie in the interior of a building.

Let's first create an image with positive values for building interiors and zeros elsewhere:

In [53]:
pos_values = (min_dist_matr < 0).astype(int)  # interior pixels were encoded with negative distances
plt.imshow(pos_values)
plt.show()
In [54]:
building_estimates = (-1)*im_dist_merged.copy()
building_estimates[building_estimates < 0] = 0

Now we will apply the cluster-growing algorithm to the image. We keep track of all building pixels that have not yet been added to a cluster in a variable to_be_classified.

In [55]:
to_be_classified = building_estimates > 0 # all positive values are inside of a building
cluster_int = 1
all_clusters = building_estimates * 0
In [56]:
plt.imshow(to_be_classified)
plt.show()

The algorithm starts with some cluster and grows it until no neighboring points can be added anymore. Then the cluster points are removed from the to_be_classified matrix, and the algorithm starts with the next cluster, until all points in to_be_classified are classified.

In [57]:
def get_remaining_maximum(building_estimates, to_be_classified):
    """
    Find maximum value of remaining pixels. Positive values denote building interiors.
    """
    
    peak = np.argmax(np.multiply(building_estimates, to_be_classified))
    
    i_row = int(peak/building_estimates.shape[1])
    i_col = int(peak - i_row * building_estimates.shape[1])
    
    return i_row, i_col

def decreasing_neighbor_check(shift_matr, matr):
    """
    For a left-, right-, top- or bottom-shifted matrix, check whether new cluster neighbors are of decreasing slope
    """

    slope_matr = shift_matr - matr
    decreasing_neighbors = np.logical_and((slope_matr > 0), shift_matr > 0)

    return decreasing_neighbors
In [58]:
while np.any(to_be_classified):
    
    if cluster_int > 100:
        print('Emergency break')
        break
        
    i_row, i_col = get_remaining_maximum(building_estimates, to_be_classified)
    
    # initialize current cluster
    this_cluster = building_estimates * 0
    this_cluster[i_row, i_col] = 1
    to_be_classified[i_row, i_col] = False

    cluster_not_final = True
    
    # start of WHILE loop

    while cluster_not_final:

        cluster_size = np.sum(this_cluster > 0)
        #print(f'Current cluster size: {cluster_size}')

        # grow current cluster with neighboring pixels (that are still to be classified) of decreasing values
        matr = this_cluster.copy()

        # create shifted matrices
        n_rows, n_cols = matr.shape
        col_zeros = np.zeros((n_rows, 1))
        row_zeros = np.zeros((1, n_cols))

        right_shift = np.hstack((col_zeros, matr[:, 0:-1]))
        left_shift = np.hstack((matr[:, 1:], col_zeros))
        top_shift = np.vstack((row_zeros, matr[0:-1, :]))
        down_shift = np.vstack((matr[1:, :], row_zeros))

        # check for decreasing neighbors
        candidates_right = decreasing_neighbor_check(right_shift, matr)
        candidates_left = decreasing_neighbor_check(left_shift, matr)
        candidates_top = decreasing_neighbor_check(top_shift, matr)
        candidates_down = decreasing_neighbor_check(down_shift, matr)

        # aggregate
        new_cluster_points = (candidates_right | candidates_left | candidates_top | candidates_down) & to_be_classified

        n_new_points = np.sum(new_cluster_points > 0)

        # update cluster and to_be_classified
        this_cluster[new_cluster_points] = cluster_int
        to_be_classified[new_cluster_points] = False

        if n_new_points == 0:
            
            print(f'New cluster done: {cluster_int} with {np.sum(this_cluster > 0)} points')
            cluster_not_final = False
            all_clusters[this_cluster > 0] = cluster_int

#             fig, (ax0, ax1) = plt.subplots(1, 2, figsize = (10, 10))

#             # visualize current cluster
#             ax0.imshow(this_cluster)
#             ax1.imshow(to_be_classified)
            
#             plt.show()
            
            cluster_int += 1
            
New cluster done: 1 with 1512 points
New cluster done: 2 with 898 points
New cluster done: 3 with 929 points
New cluster done: 4 with 341 points
New cluster done: 5 with 451 points
New cluster done: 6 with 424 points
New cluster done: 7 with 273 points
New cluster done: 8 with 200 points
New cluster done: 9 with 343 points
New cluster done: 10 with 429 points
New cluster done: 11 with 350 points
New cluster done: 12 with 342 points
New cluster done: 13 with 367 points
New cluster done: 14 with 335 points
New cluster done: 15 with 316 points
New cluster done: 16 with 309 points
New cluster done: 17 with 341 points
New cluster done: 18 with 313 points
New cluster done: 19 with 310 points
New cluster done: 20 with 311 points
New cluster done: 21 with 300 points
New cluster done: 22 with 290 points
New cluster done: 23 with 318 points
New cluster done: 24 with 308 points
New cluster done: 25 with 286 points
New cluster done: 26 with 322 points
New cluster done: 27 with 349 points
New cluster done: 28 with 324 points
New cluster done: 29 with 291 points
New cluster done: 30 with 312 points
New cluster done: 31 with 380 points
New cluster done: 32 with 315 points
New cluster done: 33 with 403 points
New cluster done: 34 with 304 points
New cluster done: 35 with 305 points
New cluster done: 36 with 372 points
New cluster done: 37 with 355 points
New cluster done: 38 with 387 points
New cluster done: 39 with 308 points
New cluster done: 40 with 304 points
New cluster done: 41 with 306 points
New cluster done: 42 with 282 points
New cluster done: 43 with 304 points
New cluster done: 44 with 340 points
New cluster done: 45 with 202 points
New cluster done: 46 with 42 points
New cluster done: 47 with 11 points
New cluster done: 48 with 4 points

Let's visualize how well this post-processing algorithm is suited to transform the rasterized labels back into separate buildings.

In [59]:
all_clusters_disp = all_clusters.copy()
all_clusters_disp[all_clusters == 0] = -200
In [60]:
# Plot raster
fig, (ax0, ax1) = plt.subplots(1, 2, figsize = (10, 10))

img_dataarr.plot.imshow(ax=ax0)
building_footprints.plot(ax=ax0, alpha=0.25, edgecolor='None', facecolor='orange')
building_footprints.plot(ax=ax0, alpha=0.75, edgecolor='red', facecolor='None')

ax1.imshow(all_clusters_disp)
plt.show()

As we can see, the individual clusters match the true vector footprints pretty well. However, the two adjacent buildings in the upper right corner erroneously end up in a single cluster.

Transform to vector geometries

Now only one final step is missing: we need to translate the pixel clusters into vector polygons with correct georeferenced coordinates.

In [61]:
n_clusters = int(np.max(all_clusters)) # implicit assumption of consecutively numbered clusters!
In [62]:
estimated_polys = []

for this_cluster_id in range(1, n_clusters+1):
    
    this_mask = all_clusters == this_cluster_id
    
    # vectorize polygon
    results = ({'properties': {'raster_val': v}, 'geometry': s}
            for i, (s, v) 
            in enumerate(
                shapes(all_clusters.astype(rasterio.int16), mask=this_mask, transform=img_dataarr.rio.transform())))
    
    # get potentially multiple polygons
    geoms = list(results)
    assert len(geoms) == 1, f'Multiple polygons found for cluster {this_cluster_id}'
    
    this_poly = shape(geoms[0]['geometry'])
    estimated_polys.append(this_poly)
    
estimated_polys_gpd = gpd.GeoDataFrame(estimated_polys, geometry=estimated_polys)

As we can see, the vectorized polygons match the original labels pretty closely:

In [63]:
# Plot raster
fig, (ax0, ax1) = plt.subplots(1, 2, figsize = (10, 10))

img_dataarr.plot.imshow(ax=ax0)
building_footprints.plot(ax=ax0, alpha=0.25, edgecolor='None', facecolor='orange')
building_footprints.plot(ax=ax0, alpha=0.75, edgecolor='red', facecolor='None')

estimated_polys_gpd.plot(ax=ax1)
plt.show()

This can best be seen when we plot both in a single image:

In [64]:
# Plot estimated and true footprints in one chart
fig, ax0 = plt.subplots(1, figsize = (6, 6))

estimated_polys_gpd.plot(ax=ax0)
building_footprints.plot(ax=ax0, alpha=0.75, edgecolor='red', facecolor='None')

plt.show()

Compute score

Now that we have an algorithm to translate pixel labels into georeferenced polygons, we still need a way to evaluate how well the post-processed polygons match the original labels. One metric proposed by SpaceNet is the Intersection-over-Union (IoU) measure:

$$IoU(A,B) = \frac{\lvert A\cap B\rvert}{\lvert A \cup B\rvert}$$
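
As a quick worked example with two made-up unit squares that overlap in half of their area (intersection 0.5, union 1.5):

from shapely.geometry import box

a = box(0, 0, 1, 1)
b = box(0.5, 0, 1.5, 1)
a.intersection(b).area / a.union(b).area  # 1/3 = 0.333...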

We compute all pairwise IoUs and keep only those above a threshold of 0.5:

In [65]:
true_polys = building_footprints.geometry
estimated_polys = estimated_polys_gpd.geometry
In [66]:
n_true_polys = len(true_polys)
n_estimated_polys = len(estimated_polys)

iou_thres = 0.5

all_ious = np.zeros((n_true_polys, n_estimated_polys))

for ii in range(0, n_true_polys):
    for jj in range(0, n_estimated_polys):
        
        this_true_poly = true_polys[ii]
        this_estimated_poly = estimated_polys[jj]

        intersect_poly = this_true_poly.intersection(this_estimated_poly)
        union_poly = this_true_poly.union(this_estimated_poly)

        iou = intersect_poly.area / union_poly.area
        
        if iou >= iou_thres:
            all_ious[ii, jj] = iou
        else:
            all_ious[ii, jj] = 0
/home/chris/research/earth-gis/venv/lib/python3.10/site-packages/shapely/set_operations.py:133: RuntimeWarning: invalid value encountered in intersection
  return lib.intersection(a, b, **kwargs)
In [67]:
plt.imshow(all_ious)
plt.title('Pairwise IoU values')
plt.xlabel('Index estimated polygon')
plt.ylabel('Index true polygon')
plt.show()

Now we can create matches between true and estimated polygons. We do this by repeatedly matching the pair of polygons with the highest IoU value and eliminating both matched polygons from further consideration, until no candidate pairs are left.

In [68]:
def get_argmax_indices(matr):

    peak = np.argmax(matr)
    
    i_row = int(peak/matr.shape[1])
    i_col = int(peak - i_row * matr.shape[1])
    
    return i_row, i_col
In [69]:
true_list_i = []
est_list_i = []
iou_list = []

running_all_ious = all_ious.copy()
counter = 0

while np.any(running_all_ious > 0):
    if counter > 50:
        print('Emergency break')
        break
    
    i_true, i_est = get_argmax_indices(running_all_ious)
    
    true_list_i.append(i_true)
    est_list_i.append(i_est)
    iou_list.append(running_all_ious[i_true, i_est])
    
    running_all_ious[i_true, :] = 0
    running_all_ious[:, i_est] = 0
    
    counter += 1

Let's keep track of all matches in a DataFrame:

In [70]:
df_matched = pd.DataFrame(true_list_i, columns=['true_polygon_i'])
df_matched['est_polygon_i'] = est_list_i
df_matched['iou'] = iou_list
df_matched.head(3)
Out[70]:
true_polygon_i est_polygon_i iou
0 47 1 0.962135
1 32 3 0.961385
2 35 4 0.953809
In [71]:
df_matched.tail(3)
Out[71]:
true_polygon_i est_polygon_i iou
44 49 46 0.705003
45 1 0 0.640250
46 2 47 0.576933

The algorithm can also leave some polygons unmatched. We will add them to the DataFrame as well:

In [72]:
n_estimated_polys
Out[72]:
48
In [73]:
unmatched_true_inds = ~pd.Series(range(0, n_true_polys)).isin(df_matched['true_polygon_i'])
unmatched_true_inds = unmatched_true_inds[unmatched_true_inds].index.values
n_unmatched = len(unmatched_true_inds)

df_unmatched = pd.DataFrame(unmatched_true_inds, columns=['true_polygon_i'])
df_unmatched['est_polygon_i'] = np.nan
df_unmatched['iou'] = np.nan
df_unmatched
Out[73]:
true_polygon_i est_polygon_i iou
0 0 NaN NaN
1 13 NaN NaN
2 38 NaN NaN
3 50 NaN NaN
In [74]:
unmatched_est_inds = ~pd.Series(range(0, n_estimated_polys)).isin(df_matched['est_polygon_i'])
unmatched_est_inds = unmatched_est_inds[unmatched_est_inds].index.values
n_unmatched = len(unmatched_est_inds)

df_unmatched_est = pd.DataFrame(unmatched_est_inds, columns=['est_polygon_i'])
df_unmatched_est['true_polygon_i'] = np.nan
df_unmatched_est['iou'] = np.nan
df_unmatched_est
Out[74]:
est_polygon_i true_polygon_i iou
0 2 NaN NaN
In [75]:
df_matched_unmatched = pd.concat([df_matched, df_unmatched, df_unmatched_est])
df_matched_unmatched.tail(5)
Out[75]:
true_polygon_i est_polygon_i iou
0 0.0 NaN NaN
1 13.0 NaN NaN
2 38.0 NaN NaN
3 50.0 NaN NaN
0 NaN 2.0 NaN

Let's inspect the matched pairs with the lowest and the highest IoU. First, the lowest:

In [76]:
this_true_poly = building_footprints.geometry[df_matched.tail(1)['true_polygon_i'].squeeze()]
this_estimated_poly = estimated_polys.geometry[df_matched.tail(1)['est_polygon_i'].squeeze()]
In [77]:
ax = gpd.GeoDataFrame([this_true_poly], geometry=[this_true_poly]).plot(facecolor='None', edgecolor='red')
gpd.GeoDataFrame([this_estimated_poly], geometry=[this_estimated_poly]).plot(ax=ax, facecolor='None', edgecolor='blue')
plt.show()

For the best-matching case, the only deviation seems to come from the fact that the vectorized polygon still reflects the original raster pixel pattern:

In [78]:
this_true_poly = building_footprints.geometry[df_matched.head(1)['true_polygon_i'].squeeze()]
this_estimated_poly = estimated_polys.geometry[df_matched.head(1)['est_polygon_i'].squeeze()]
In [79]:
ax = gpd.GeoDataFrame([this_true_poly], geometry=[this_true_poly]).plot(facecolor='None', edgecolor='red')
gpd.GeoDataFrame([this_estimated_poly], geometry=[this_estimated_poly]).plot(ax=ax, facecolor='None', edgecolor='blue')
plt.show()

Based on this table of matched and unmatched polygons we can now compute metrics like precision, recall, and F1 score.
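
With matched polygons playing the role of true positives, the metrics below reduce to simple ratios:

$$\text{precision} = \frac{n_{\text{matched}}}{n_{\text{estimated}}}, \qquad \text{recall} = \frac{n_{\text{matched}}}{n_{\text{true}}}, \qquad F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$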

In [80]:
n_matched_polys = df_matched.shape[0]
In [81]:
precision = n_matched_polys / n_estimated_polys
precision
Out[81]:
0.9791666666666666
In [82]:
recall = n_matched_polys / n_true_polys
recall
Out[82]:
0.9215686274509803
In [83]:
f1_score = 2*precision*recall / (precision+recall)
f1_score
Out[83]:
0.9494949494949494

The source of the notebook can be found here.