In [1]:
import rioxarray
from shapely.geometry import Point
from shapely import Polygon
from pystac_client import Client
import math
import matplotlib.pyplot as plt
import geopandas as gpd
import contextily as cx

There are multiple ways to map points of a 3D globe to a 2D map. Each transformation comes with its own coordinate reference system (CRS). Whenever you want to work with different data sources you will need to make sure that all your data is aligned. Aligning data from different coordinate reference systems can be achieved by reprojecting all data to a common CRS.

When working with satellite data you will usually have to deal with reprojections, as individual images are usually given in terms of a local CRS, but not with a CRS that was designed for global applications. Also, satellite images are usually oriented in line with the orbit of the satellite, so that they appear "tilted" with regards to most CRSs.

In this notebook we will inspect some properties of satellite images, reprojections to different CRSs and clipping (subsetting) of images.

Load and visualize a true color image of Las Vegas

Let's first define a point of interest for our analysis. In this case we choose Las Vegas.

In [2]:
poi = Point(-115.3036586, 36.1287561) # Las Vegas

Now we need to find a satellite image for our point of interest. The easiest way to do this is by using a STAC browser. We will use atmospherically corrected Surface Reflectance Sentinel images (Level 2A data) that are derived from the associated Level-1C products (https://sentinels.copernicus.eu/web/sentinel/user-guides/sentinel-2-msi/product-types/level-2a).

In [3]:
api_url = "https://earth-search.aws.element84.com/v0"
client = Client.open(api_url)

collection = "sentinel-s2-l2a-cogs" 
search = client.search(
    collections=[collection],
    intersects=poi,
    datetime="2023-03-26/2023-03-27"
)

For the given query we find two images:

In [4]:
items = search.get_all_items()
len(items)
Out[4]:
2

In order to better understand what the data looks like, we first load a true color overview image from the dataset. This is a rather low resolution image to give us a quick visualization.

In [5]:
href = items[0].assets['overview'].href
href
Out[5]:
'https://sentinel-cogs.s3.us-west-2.amazonaws.com/sentinel-s2-l2a-cogs/11/S/PV/2023/3/S2B_11SPV_20230326_0_L2A/L2A_PVI.tif'
In [6]:
visual_tif = rioxarray.open_rasterio(href)
In [7]:
visual_tif
Out[7]:
<xarray.DataArray (band: 3, y: 343, x: 343)>
[352947 values with dtype=uint8]
Coordinates:
  * band         (band) int64 1 2 3
  * x            (x) float64 6.002e+05 6.005e+05 ... 7.093e+05 7.096e+05
  * y            (y) float64 4e+06 4e+06 3.999e+06 ... 3.891e+06 3.89e+06
    spatial_ref  int64 0
Attributes:
    AREA_OR_POINT:       Area
    OVR_RESAMPLING_ALG:  AVERAGE
    _FillValue:          0
    scale_factor:        1.0
    add_offset:          0.0

A true color visualization can be obtained using imshow:

In [8]:
visual_tif.plot.imshow()
plt.show()

From the axis tick labels we can already see that x and y coordinates are not given in terms of latitude and longitude. The exact CRS is attached to the dataset:

In [9]:
visual_tif.rio.crs
Out[9]:
CRS.from_epsg(32611)

Visualize image bounds in CRSs

From the image metadata we can get the coordinates of the satellite image bounds:

In [10]:
items[0].geometry
Out[10]:
{'type': 'Polygon',
 'coordinates': [[[-114.69735281151316, 35.13301555004904],
   [-115.9021350633042, 35.14992495410684],
   [-115.88850843890832, 36.1397316133374],
   [-114.66879673080497, 36.12219793572046],
   [-114.69735281151316, 35.13301555004904]]]}

However, we can immediately see that those coordinates are given in terms of latitude and longitude, although the projection of the image is EPSG 32611 (as we already know):

In [11]:
items[0].properties['proj:epsg']
Out[11]:
32611

I did not find any other CRS mention, so that I assume that the geometry polygon is given in a popular global CRS, e.g. EPSG 4326. So this is what we will assume for now. Let's pull most metadata from the items and keep them in a GeoDataFrame. We will also create "granule" labels for the images that label the individual regions where satellite images exist.

In [12]:
stac_json = search.get_all_items_as_dict()
In [13]:
#gdf = gpd.GeoDataFrame.from_features(stac_json, visual_tif.rio.crs)
gdf = gpd.GeoDataFrame.from_features(stac_json, "epsg:4326")

gdf["granule"] = (
    gdf["sentinel:utm_zone"].apply(lambda x: f"{x:02d}")
    + gdf["sentinel:latitude_band"]
    + gdf["sentinel:grid_square"]
)

Now we can visualize the satellite image boxes for our images together with a base map:

In [14]:
ax = gdf.plot(
    "granule",
    edgecolor="black",
    facecolor="None",
    categorical=True,
    aspect="equal",
    alpha=0.2,
    figsize=(6, 12),
    legend=True,
    legend_kwds={"loc": "upper left", "frameon": False, "ncol": 1},
)

cx.add_basemap(
    ax, crs=gdf.crs.to_string(), source=cx.providers.CartoDB.Voyager
)

plt.title('Image bounds in EPSG 4326')
plt.show()

As we can see, the satellite image is slightly tilted in this global CRS. Let's reproject to the image CRS and visualize the image bounds again:

In [15]:
this_proj_str = '+proj=utm +zone=11 +datum=WGS84 +units=m +no_defs +type=crs'
gdf_reproj = gdf.to_crs(this_proj_str)
In [16]:
ax = gdf_reproj.plot(
    "granule",
    edgecolor="black",
    facecolor="None",
    categorical=True,
    aspect="equal",
    alpha=0.2,
    figsize=(6, 12),
    legend=True,
    legend_kwds={"loc": "upper left", "frameon": False, "ncol": 1},
)

cx.add_basemap(
    ax, crs=gdf_reproj.crs.to_string(), source=cx.providers.CartoDB.Voyager
)

plt.title('Image bounds in EPSG 32611')
plt.show()

This time the image is better aligned with the CRS.

Image clipping

Satellite images tend to be rather large files. In many cases it would be sufficient to only pick a small subset of the satellite image. Again, you will need to be careful in which CRS you define your subset.

As an example, let's now define a rectangle around our focus point. The rectangle should span a certain distance in kilometers. Hence, we first need to translate the requested width of the rectangle into coordinates of our CRS system. In applications with data from different latitudes of the world you will probably need to translate kilometers into latitude and longitude values of a global CRS.

An easy way to do this is to assume that the earth is a perfectly spherical shape. Still, we need to take into account that latitude circles around the globe are shorter the closer they are to one of the poles. Only at the equator the latitude circle has maximum size with one degree of longitude amounting to roughly 111 kilometers. The closer you get to the poles, the shorter the distance between two longitude lines will be. In the limit, the distance goes to zero.

Let's now compute the distance between longitude lines in the vicinity of our point of interest.

In [17]:
lon_val, lat_val = poi.coords.xy
In [18]:
lon_val[0]
Out[18]:
-115.3036586
In [19]:
one_degree_latitude_equator = 111
one_degree_longitude_equator = 111.321 # km
lat_scale_factor = math.cos(math.radians(lat_val[0]))
one_degree_longitude_at_point = one_degree_longitude_equator * lat_scale_factor
one_degree_longitude_at_point
Out[19]:
89.91331169978909

Hence, the distance between longitude lines is roughly 90 kilometers at our point of interest, as opposed to 111 kilometers at the equator. Given this information, we can easily translate kilometers into deltas in terms of latitude and longitude:

In [20]:
box_in_km = 60
lat_delta = box_in_km / one_degree_latitude_equator
long_delta = box_in_km / one_degree_longitude_at_point
In [21]:
lat_delta
Out[21]:
0.5405405405405406
In [22]:
long_delta
Out[22]:
0.6673094213272176

Using those delta values we can now create a rectangular box around our point of interest in our global CRS:

In [23]:
yup = lat_val[0] + lat_delta/2
ylow = lat_val[0] - lat_delta/2

xup = lon_val[0] + long_delta/2
xlow = lon_val[0] - long_delta/2
In [24]:
polygon_geom = Polygon([[xlow, ylow], [xup, ylow], [xup, yup], [xlow, yup]])
polygon = gpd.GeoDataFrame(index=[0], crs='epsg:4326', geometry=[polygon_geom])

The following image shows this rectangle together with a base map:

In [25]:
ax = polygon.plot(alpha=0.8, facecolor='None', edgecolor='black')
cx.add_basemap(
    ax, crs=polygon.crs.to_string(), source=cx.providers.CartoDB.Voyager
)

If we project the rectangular shape that we created in EPSG 4326 to the CRS of the satellite image, we will see that the rectangle gets tilted.

In [26]:
this_proj_str = '+proj=utm +zone=11 +datum=WGS84 +units=m +no_defs +type=crs'
polygon_reproj = polygon.to_crs(this_proj_str)
In [27]:
ax = polygon_reproj.plot(alpha=0.8, facecolor='None', edgecolor='black')
cx.add_basemap(
    ax, crs=polygon_reproj.crs.to_string(), source=cx.providers.CartoDB.Voyager
)

Keep in mind that our rectangle was only defined through four points. This means that the edges will not get transformed themselves, but only the four vertices of the rectangle. In the reprojected image the true projections of the vertices might not be straight lines anymore. In our image, however, those lines will be straight by construction, since the edges are plotted in a way that they combine the vertices with straight lines.

We can also manually compute the transformed vertices. For the upper left corner we get the new coordinates:

In [28]:
visual_tif.rio.transform() * (0,0)
Out[28]:
(600000.0, 4000020.0)

The lower right corner is given by the following coordinates of the new CRS:

In [29]:
visual_tif.rio.transform() * (visual_tif.rio.width, visual_tif.rio.height)
Out[29]:
(709760.0, 3890260.0)

Alternatively, these values can also be obtained from bounds which returns values in the following sequence: left, bottom, right, top

In [30]:
visual_tif.rio.bounds()
Out[30]:
(600000.0, 3890260.0, 709760.0, 4000020.0)

Now that we have the vertice coordinates in terms of the CRS of the satellite image we can easily pick the respective subset of the satellite image by clipping.

In [31]:
type(polygon_reproj)
Out[31]:
geopandas.geodataframe.GeoDataFrame
In [32]:
polygon_reproj
Out[32]:
geometry
0 POLYGON ((623038.430 3969109.826, 683297.173 3...
In [33]:
clipped = visual_tif.rio.clip(polygon_reproj.geometry)
clipped.plot.imshow()
plt.show()

As you can see, the image was clipped along the edges of the tilted rectangle that we obtained after reprojection. Hence, some of the pixels in the lower right corner do not contain values anymore, because they were outside of the clipping region.

In general there are two alternative ways how one could translate a given rectangular region into a subset of a satellite image. First, we could be really precise and map the full rectangle including the edges and not only the vertices. This way, however, we would most likely again end up with a shape other than a non-tilted rectangle and hence would get unfilled pixels again. A second way would be to only map the upper left and lower right vertices of the rectangle and then create a rectangular box from those vertices in order to avoid any unfilled pixels.

The source of the notebook can be found here