Automatic detection of soil degraded by human activities and potentially suitable for rehabilitation
Clémence Herny (Exolabs) - Gwenaëlle Salamin (Exolabs) - Clotilde Marmy (Exolabs) - Alessandro Cerioni (État de Genève) - Roxane Pott (swisstopo)
Proposed by the Canton of Ticino and the Canton of Vaud - PROJ-SDA
March 2024 to December 2024 - Published in December 2024
This work by STDL is licensed under CC BY-SA 4.0
Abstract: Each Swiss canton is required to draw up an inventory of potentially rehabilitable soils in order to maintain its land crop rotation quota. To assist the cantons in this task, the STDL has developed an artificial intelligence-based framework to automatically identify soils degraded by human activities, divided into two classes: "non-agricultural activity" and "land movement". A deep learning model was trained to segment the extent of the detected human activity in a multi-year dataset of aerial imagery. The ground truth was vectorised by the Canton of Ticino and the Canton of Vaud. The trained model achieved an f1-score of 0.53, with better detection performance for the land movement class than for the non-agricultural activity class. These average results can be explained by the limited number of ground truth elements, the complexity of the features to be detected and the diversity of the images used. The trained model was applied to historical imagery from 1946 to the present day for the two cantons. A vector layer showing the distribution of human activities by year was produced in just a few days for each canton. Recall was favoured over precision in order to obtain exhaustive results, but this implies a large number of false positive (FP) detections. Therefore, a thorough review of the results is necessary before they can be used. Despite its average performance, the model allows the identification of new areas that can be added to the inventory and speeds up the work compared with a fully manual process.
1. Introduction
Constant population increase and economic growth are putting considerable pressure on agricultural land. The Federal law on land management, adopted in 1979, aims to regulate the use of agricultural land to guarantee the food independence of the Swiss population in the event of crises and supply problems. As part of the sectoral plan [1], the high-quality arable lands that need to be protected have been secured in the form of land crop rotation areas (LCR, or surface d'assolement (SDA) in French), with a minimum area allocated to each canton. To be eligible as an LCR area, the soil must comply with specific criteria [1][2][3] ensuring the quality of the land for agriculture. However, certain construction programmes may encroach on these lands. In such cases, the area lost must be compensated for by the creation of a new LCR area of the same size. To identify areas that could be converted to LCR, Swiss cantons must provide a register or an indicative map of land that could potentially be rehabilitated to meet the LCR criteria. Among those, soils degraded by past anthropogenic activities are of interest. This includes soils affected by landfills, construction sites, pollution, etc.
For this project, the STDL was approached by the Canton of Ticino and the Canton of Vaud to develop a method to identify soils degraded by human activities in the past. The goal is to help the cantons establish the indicative map of potentially rehabilitable soils for LCR compensation by providing a vector layer delimiting the human activities affecting soils.
Some cantons have already established this inventory using different approaches, mainly based on register consultation, field investigation, human memory, visual inspection of aerial images or detection of elevation changes [4][5][6]. The Canton of Ticino commissioned a company to identify potentially rehabilitable soils in six municipalities in the Locarno region and had access to a study performed in the scope of a CFF train project in the Magadino plain. They also developed an FME workflow based on LCR criteria and applied it to the Bellinzona valley, but it was not suitable for a large-scale study.
Based on our experience with object segmentation [7][8], we proposed to automatically segment human activities in aerial images available in Switzerland over the last 70 years with a deep learning approach.
In this report, we first present the areas of interest. Next, the data used are described, including the images and the ground truth. We then present the deep learning method used, followed by a presentation and discussion of the results of the model training and inference. Finally, we provide a conclusion.
2. Study areas
Two cantons are considered in this study, the Canton of Ticino and the Canton of Vaud. Both cantons have established their LCR map and intend to finalise their indicative map of potentially rehabilitable soils. Despite a similar objective, their geography and climate raise different difficulties. On the one hand, the Canton of Ticino is mainly covered by high mountains, which limits not only the LCR quota, set at 3,500 ha, but also the area eligible for rehabilitation. On the other hand, the Canton of Vaud has larger lowland areas, with an LCR quota of 75,800 ha to be maintained. Strong population growth and development make it difficult to identify land eligible for conversion.
Smaller areas of interest (AoIs) were defined to test the inference (Fig. 1). For the Canton of Ticino, the AoI comprises the six municipalities of the Locarno region, the Magadino plain and the Bellinzona valley, for which previous studies to find potential LCR were performed and can be used for comparison. For the Canton of Vaud, an AoI located between the Jura mountains and Lausanne was selected.
3. Data
3.1 Images
Aerial orthophotos from 1946 to the present day from the swisstopo product SWISSIMAGE Journey were used (Table 1).
The images were captured in greyscale and colour using different instruments. Despite the acquisition rules and image post-processing, the photometry and colour of the images can vary from year to year and from sensor to sensor. Some images were taken in winter rather than in summer, resulting in a different appearance of the vegetation and fields, such as leafless trees or bare agricultural soils.
Product | Type | Year | Coordinate system | Spatial resolution |
---|---|---|---|---|
SWISSIMAGE 10 cm | RGB, digital | 2017 - current | CH1903+/MN95 (EPSG:2056) | 0.10 m (\(\sigma\) \(\pm\) 0.15 m) - 0.25 m |
SWISSIMAGE 25 cm | RGB, digital | 2005 - 2016 | MN03 (2005 - 2007) and MN95 since 2008 | 0.25 m (\(\sigma\) \(\pm\) 0.25 m) - 0.50 m (\(\sigma\) \(\pm\) 3.00 - 5.00 m) |
SWISSIMAGE 50 cm | RGB, photo | 1998 - 2004 | MN03 | 0.50 m (\(\sigma\) \(\pm\) 0.50 m) |
SWISSIMAGE HIST | greyscale, photo | 1946 - 1997 | MN95 | 0.50 m (\(\sigma\) \(\pm\) 1.0-5.0 m) |
The images are accessed via an XYZ connector using swisstopo's Web Map Tile Service (WMTS). Pre-rendered GeoTIFF tiles, with a size of 256 \(\times\) 256 pixels, are served on a grid of Cartesian coordinates (x, y, EPSG:3857 - WGS 84, Pseudo-Mercator). The images at zoom level (z) 16 are chosen, with a resolution of about 1.6 m per pixel, as this is a good trade-off between model performance and computational cost [8]. The images are fetched for a given year. Tiles with the same coordinates but different years can therefore exist in the dataset.
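The relation between zoom level, tile index and pixel size can be checked with the standard XYZ ("slippy map") formulas. The sketch below is only an illustration, not code from the project; the function names are hypothetical and the latitude of 46.8° N is an assumed representative value for Switzerland.

```python
import math

# WGS 84 equatorial circumference in metres, as used by Web Mercator.
EARTH_CIRCUMFERENCE_M = 2 * math.pi * 6378137.0
TILE_SIZE_PX = 256


def lonlat_to_tile(lon_deg: float, lat_deg: float, zoom: int) -> tuple[int, int]:
    """Return the (x, y) index of the XYZ tile containing a WGS 84 point."""
    n = 2 ** zoom
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    return x, y


def ground_resolution(lat_deg: float, zoom: int) -> float:
    """Approximate ground size of one pixel (m) at a given latitude and zoom."""
    equator_resolution = EARTH_CIRCUMFERENCE_M / (TILE_SIZE_PX * 2 ** zoom)
    return equator_resolution * math.cos(math.radians(lat_deg))


if __name__ == "__main__":
    # Assumed latitude; the result is close to the 1.6 m quoted for zoom 16.
    print(f"{ground_resolution(46.8, 16):.2f} m per pixel")
```

The pixel size shrinks with the cosine of the latitude, which is why zoom level 16 corresponds to roughly 1.6 m in Switzerland rather than the 2.4 m found at the equator.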
3.2 Ground truth
The ground truth (GT) was acquired manually by the beneficiaries. Two classes of human activities were defined:
- Non-agricultural activity (Fig. 2, left): illegal activity on agricultural land, e.g. landfill, building, storage area, etc.
- Land movement (Fig. 2, right): transport of material affecting soil, e.g. mineral extraction sites, quarry, backfill, excavation, construction site, etc.
The elements of the two classes are complex and have heterogeneous characteristics, particularly the elements of the non-agricultural activity class.
The labels were vectorised on SWISSIMAGE images from various years (Fig. 3) in order to obtain a diverse set of images for training the detection model. Efforts were made to obtain a GT that is well distributed between RGB and greyscale images, but it should be noted that more images and features are available in RGB. In particular, fewer elements of the non-agricultural activity class were vectorised in the greyscale images.
The two GT classes are balanced, with a total of 215 elements in the non-agricultural activity class and 234 elements in the land movement class (Table 2). We acknowledge that the number of elements is low for training a deep learning model.
Source | Non-agricultural activity | Land movement |
---|---|---|
Ticino | 73 | 93 |
Vaud | 175 | 146 |
Total | 215 | 234 |
3.3 Additional data
Potentially rehabilitable soils must meet defined criteria to be converted to LCR [2]. For example, a parcel can be located in an area that does not meet the criteria due to geographical parameters, or that meets the criteria but conflicts with another potential use. To properly identify potential LCR, the beneficiaries provided us with vector layers containing geographical and land use information, so that we could cross-reference them with the results. These layers of objects of interest (OBI) include, for instance, LCR areas already mapped, current and future building areas, polluted areas and protected areas. The elevation and slope information is extracted from the Switzerland Digital Elevation Model (DEM, approx. 25 m per pixel) and the sloping terrain layer respectively.
4. Method
4.1 Semantic segmentation
To perform the automatic detection of anthropogenic soils, we use a deep learning approach based on the object detector framework [7] developed by the STDL, a detailed description of which is available separately. It performs instance segmentation on georeferenced data and is based on the detectron2 framework [9].
The model is initially trained with tiles intersecting the labels, which are randomly split into three datasets: the training dataset (70%), the validation dataset (15%), and the test dataset (15%). The two classes described in Section 3.2 were similarly distributed across the datasets, and the split was kept fixed. The model hyperparameters were calibrated to obtain the best results. The selected model (Section 5.1) was trained with two images per batch and a learning rate of \(5 \times 10^{-3}\), with 500 iteration steps and 200 warm-up iterations. Training ran for 7000 iterations and took about 2 hours on a machine with 32 GB of RAM and an NVIDIA Tesla T4 GPU. The optimal detection model is the one minimising the validation loss, in this case at around 3000 iterations. The trained model is then evaluated and used to perform detection. Each detection made by the model is given a confidence score from 0 to 1.
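As an illustration of the 70/15/15 split described above, here is a minimal sketch of a reproducible random split of tile identifiers. It is not the object detector's own implementation: the function name, the seed and the dictionary keys are hypothetical, and the balancing of classes across datasets mentioned in the text is not reproduced.

```python
import random


def split_tiles(tile_ids, fractions=(0.70, 0.15, 0.15), seed=42):
    """Randomly split tile identifiers into training, validation and test sets.

    A fixed seed keeps the split identical from one run to the next.
    """
    shuffled = sorted(tile_ids)            # sort first so the input order does not matter
    random.Random(seed).shuffle(shuffled)  # deterministic shuffle
    n_trn = round(fractions[0] * len(shuffled))
    n_val = round(fractions[1] * len(shuffled))
    return {
        "trn": shuffled[:n_trn],
        "val": shuffled[n_trn:n_trn + n_val],
        "tst": shuffled[n_trn + n_val:],   # remainder goes to the test set
    }


if __name__ == "__main__":
    split = split_tiles([f"tile_{i}" for i in range(100)])
    print({name: len(tiles) for name, tiles in split.items()})
```

Fixing the seed only freezes the data split; it does not remove the run-to-run variability of the training itself.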
4.2 Metrics
The model performance was assessed globally and by class by comparing the results with the GT and computing the following metrics:
- Precision: proportion of correct detections among all the detections produced by the model.
\[precision = \frac{\sum_k TP_k}{\sum_k (TP_k + FP_k)}\]
- Recall: proportion of GT labels that are correctly detected by the model.
\[recall = \frac{\sum_k TP_k}{\sum_k (TP_k + FN_k)}\]
- F1-score: the harmonic mean of the precision and the recall.
\[f1 = 2 \times \frac{recall \times precision}{recall + precision}\]
where TP (true positive) is a correct detection, FP (false positive) is an incorrect detection, FN (false negative) is a labelled object that is not detected by the algorithm, and k is the label class. The sums run over all classes for the global metrics; for the metrics of a single class, only the term of that class is kept.
To evaluate our multi-class model, we have chosen to calculate micro-average metrics because the primary objective is to detect as many objects as possible regardless of their class and the classes are balanced. Knowing the object class is a secondary objective.
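A short worked example of micro-averaging, using the counts reported for Model 1 in Table 4, may help. The handling of misclassified detections is an assumption made here: each one is counted both as an FP and as an FN. Under that assumption, the values round to 0.44, 0.67 and 0.53, which are the global scores of Model 1 after processing in Table 3.

```python
# Counts for Model 1 after post-processing, copied from Table 4.
COUNTS = {
    "non-agricultural activity": {"TP": 117, "FP": 154, "FN": 74, "misclassified": 28},
    "land movement": {"TP": 195, "FP": 206, "FN": 41, "misclassified": 11},
}


def micro_average(counts):
    """Return micro-averaged (precision, recall, f1) over all classes."""
    tp = sum(c["TP"] for c in counts.values())
    # Assumption: a misclassified detection is both a wrong detection (FP)
    # and a missed label (FN), so it is added to both totals.
    mis = sum(c["misclassified"] for c in counts.values())
    fp = sum(c["FP"] for c in counts.values()) + mis
    fn = sum(c["FN"] for c in counts.values()) + mis
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    p, r, f1 = micro_average(COUNTS)
    print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Because the counts are summed before the ratios are taken, a class with many objects weighs more than a class with few, which is acceptable here since the two classes are balanced.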
4.3 Multi-year dataset
The framework can handle training and detection on images from several years in an AoI. The operator assigned to each label the year of the image used to vectorise it. Based on this year, downloaded tiles are assigned a unique identifier in the form (year, z, x, y). The model is evaluated by spatially comparing labels and detections of the same year (Fig. 4).
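The bookkeeping described above can be sketched in a few lines. The helper names and the dictionary-based features below are hypothetical and only illustrate the idea of keying tiles by (year, z, x, y) and comparing labels and detections year by year.

```python
from collections import defaultdict


def tile_id(year: int, z: int, x: int, y: int) -> str:
    """Build the identifier of a tile for a given image year."""
    return f"({year}, {z}, {x}, {y})"


def group_by_year(features):
    """Group labels or detections by the year of the image they relate to."""
    groups = defaultdict(list)
    for feature in features:
        groups[feature["year"]].append(feature)
    return dict(groups)


def pairs_to_compare(labels, detections):
    """Yield (year, labels, detections) so that only same-year features are compared."""
    labels_by_year = group_by_year(labels)
    detections_by_year = group_by_year(detections)
    for year in sorted(set(labels_by_year) | set(detections_by_year)):
        yield year, labels_by_year.get(year, []), detections_by_year.get(year, [])


if __name__ == "__main__":
    print(tile_id(2020, 16, 34000, 23000))
```

Including the year in the identifier lets two tiles with the same (z, x, y) coexist in the dataset without overwriting each other.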
4.4 Image processing
Training a model based on heterogeneous colour images can make the detection task more difficult, particularly with a reduced ground truth dataset. To homogenise the image dataset, we propose to:
- convert the RGB images to greyscale images (Fig. 5, top) to match the pre-1998 greyscale images. We define the greyscale brightness as a linear combination of the RGB bands, based on the Rec. 601 standard:
\[Y = 0.299 \, R + 0.587 \, G + 0.114 \, B\]
- colourise the greyscale historical images to RGB (Fig. 5, bottom) using a deep learning framework [10].
- homogenise the image colours by applying a histogram matching method such as those offered by the rasterio plugin rio-hist or the scikit-image library.
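The first option, the RGB to greyscale conversion, can be sketched with NumPy as follows. This is an illustration rather than the project's code: the function name is hypothetical, and the weights are the standard Rec. 601 luma coefficients, assumed here to be the linear combination the text refers to.

```python
import numpy as np

# Rec. 601 luma weights for the red, green and blue bands.
REC601_WEIGHTS = np.array([0.299, 0.587, 0.114])


def rgb_to_greyscale(rgb: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3) uint8 RGB tile to an (H, W) uint8 greyscale tile."""
    # Weighted sum over the band axis; any alpha band is ignored.
    grey = rgb[..., :3].astype(np.float64) @ REC601_WEIGHTS
    return np.clip(np.rint(grey), 0, 255).astype(np.uint8)


if __name__ == "__main__":
    tile = np.zeros((256, 256, 3), dtype=np.uint8)
    tile[..., 1] = 200  # a uniformly green tile
    print(rgb_to_greyscale(tile)[0, 0])
```

The green band carries the largest weight, so vegetated fields remain comparatively bright after conversion.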
4.5 Empty tiles
The object detector offers the possibility of using tiles without annotations, hereafter referred to as "empty tiles", during training. These tiles can be added to the dataset either randomly from a given AoI or directly as input. In particular, a list of FP labels obtained with a previous model can be provided to select the corresponding empty tiles (hereafter referred to as "FP tiles"). This allows us to confront the algorithm with problematic cases and try to improve its performance.
4.6 Detection integrity
We assume that nearby detections on different tiles belong to the same object. A buffer of 10 m is therefore used to merge detections across tiles, so that each detected object is correctly delimited (Fig. 6). The average detection score is calculated for each merged feature. The class of the detection with the largest area is assigned to the final merged polygon. The processed results can be re-evaluated for direct comparison with the vectorised GT features.
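The merging logic can be sketched without any GIS library by replacing polygons with axis-aligned bounding boxes. This is a deliberate simplification of the polygon-based processing, and the interpretation of the buffer (two detections are merged when the gap between them is at most 10 m) is an assumption. All names are hypothetical.

```python
def box_gap(a, b):
    """Distance (m) between two (xmin, ymin, xmax, ymax) boxes, 0 if they touch."""
    dx = max(a[0] - b[2], b[0] - a[2], 0.0)
    dy = max(a[1] - b[3], b[1] - a[3], 0.0)
    return (dx ** 2 + dy ** 2) ** 0.5


def box_area(box):
    return (box[2] - box[0]) * (box[3] - box[1])


def merge_detections(detections, buffer_m=10.0):
    """Group detections lying within `buffer_m` of each other and merge each group."""
    parent = list(range(len(detections)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    # Union-find: link every pair of detections that are close enough.
    for i in range(len(detections)):
        for j in range(i + 1, len(detections)):
            if box_gap(detections[i]["bbox"], detections[j]["bbox"]) <= buffer_m:
                parent[find(i)] = find(j)

    groups = {}
    for i, det in enumerate(detections):
        groups.setdefault(find(i), []).append(det)

    merged = []
    for members in groups.values():
        boxes = [d["bbox"] for d in members]
        largest = max(members, key=lambda d: box_area(d["bbox"]))
        merged.append({
            "bbox": (min(b[0] for b in boxes), min(b[1] for b in boxes),
                     max(b[2] for b in boxes), max(b[3] for b in boxes)),
            "score": sum(d["score"] for d in members) / len(members),  # average score
            "class": largest["class"],  # class of the largest member
        })
    return merged
```

With real polygons, the same grouping would typically be obtained by buffering, dissolving and then shrinking the geometries back, but the score and class rules stay the same.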
4.7 Result filtering
The results were inferred over the AoI. However, as mentioned in Section 3.3, not all areas of the AoI meet the criteria for the LCR area or the detections may conflict with other usage. To provide the beneficiaries with the most comprehensive information, we have created a detection layer with attributes that can be filtered according to the user's needs (Fig. 7).
First, the detection polygons are intersected with the polygons of other OBI provided by the beneficiaries (Section 3.3). The ratio between the overlap area and the detection area is calculated. A value of 0 indicates that there is no overlap, while a value of 1 indicates that the detection is completely covered by a feature of the OBI layer. Based on the needs of the beneficiaries, detections overlapping some OBI layers were clipped by deleting the intersecting areas (e.g. detections overlapping lakes). In addition, using the ‘sloping terrain’ layer, we provide the proportion of each detection polygon for which the slope is greater than the threshold of 18% [2].
Secondly, according to the LCR criteria, the area must be greater than 500 ha, unless the area is contiguous with another LCR area. We therefore calculate the surface area and minimum distance from an LCR polygon for each detection polygon. A distance of 0 m indicates that the polygons are in contact.
Thirdly, altitude influences the climatic zones favourable for the establishment of an LCR. The altitude of the centroid is calculated for each detection polygon from the DEM of Switzerland.
Fourthly, the confidence score given by the deep learning algorithm is provided for each detection, which can be used as a filter by the user.
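To show how such attributes could be used downstream, here is a minimal sketch of a filter applied to detections stored as dictionaries. The attribute names, the function names and every threshold value are hypothetical; the actual layer schema and the limits to apply are left to the user.

```python
def overlap_ratio(overlap_area_m2: float, detection_area_m2: float) -> float:
    """Share of a detection covered by an OBI feature: 0 = none, 1 = fully covered."""
    if detection_area_m2 <= 0:
        return 0.0
    return overlap_area_m2 / detection_area_m2


def select_detections(detections, min_area_m2, max_obi_overlap=1.0,
                      max_altitude_m=float("inf"), min_score=0.0):
    """Keep the detections that satisfy the user-defined thresholds."""
    kept = []
    for det in detections:
        # A small detection is still of interest if it touches an existing LCR area.
        if det["area_m2"] < min_area_m2 and det["dist_to_lcr_m"] > 0:
            continue
        if det["obi_overlap"] > max_obi_overlap:
            continue
        if det["altitude_m"] > max_altitude_m:
            continue
        if det["score"] < min_score:
            continue
        kept.append(det)
    return kept
```

Keeping the thresholds as arguments rather than constants reflects the intent of the layer: all detections are delivered, and each user decides how strict the selection should be.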
5. Results
5.1 Model performance
The results of model training for several parameters are presented in Table 3.
Abbreviations: NAA = non-agricultural activity, LM = land movement, FP = false positive, P = precision, R = recall, F1 = f1-score, Raw = raw detections, Proc. = processed detections.
Model | Image | Ground truth | FP tiles | Image batch size | Learning rate | Raw score threshold | Raw global P | Raw global R | Raw global F1 | Raw NAA P | Raw NAA R | Raw NAA F1 | Raw LM P | Raw LM R | Raw LM F1 | Proc. score threshold | Proc. global P | Proc. global R | Proc. global F1 | Proc. NAA P | Proc. NAA R | Proc. NAA F1 | Proc. LM P | Proc. LM R | Proc. LM F1 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | RGB + greyscale | NAA = 215, LM = 234 | 0 | 2 | 0.005 | 0.30 | 0.48 | 0.38 | 0.42 | 0.43 | 0.23 | 0.30 | 0.50 | 0.53 | 0.52 | 0.05 | 0.44 | 0.67 | 0.53 | 0.41 | 0.53 | 0.47 | 0.45 | 0.79 | 0.58 |
2 | RGB + greyscale | NAA = 215, LM = 234 | 0 | 2 | 0.005 | 0.40 | 0.43 | 0.36 | 0.39 | 0.44 | 0.27 | 0.33 | 0.42 | 0.43 | 0.43 | 0.05 | 0.41 | 0.64 | 0.50 | 0.38 | 0.55 | 0.45 | 0.43 | 0.72 | 0.54 |
3 | RGB + greyscale | NAA = 215, LM = 234 | 0 | 2 | 0.005 | 0.45 | 0.41 | 0.39 | 0.40 | 0.44 | 0.25 | 0.32 | 0.39 | 0.52 | 0.45 | 0.05 | 0.28 | 0.65 | 0.39 | 0.25 | 0.49 | 0.33 | 0.30 | 0.79 | 0.43 |
4 | RGB + greyscale | NAA = 215, LM = 234 | 0 | 2 | 0.005 | 0.25 | 0.45 | 0.36 | 0.40 | 0.44 | 0.32 | 0.37 | 0.46 | 0.41 | 0.43 | 0.05 | 0.37 | 0.59 | 0.46 | 0.28 | 0.49 | 0.36 | 0.47 | 0.68 | 0.56 |
5 | RGB + greyscale | NAA = 215, LM = 234 | 275 | 2 | 0.005 | 0.45 | 0.51 | 0.40 | 0.45 | 0.49 | 0.28 | 0.35 | 0.52 | 0.52 | 0.52 | 0.05 | 0.31 | 0.66 | 0.42 | 0.27 | 0.53 | 0.36 | 0.34 | 0.77 | 0.47 |
6 | RGB + greyscale | NAA = 215, LM = 234 | 0 | 4 | 0.005 | 0.35 | 0.44 | 0.40 | 0.42 | 0.37 | 0.30 | 0.33 | 0.49 | 0.49 | 0.49 | 0.05 | 0.32 | 0.67 | 0.44 | 0.27 | 0.54 | 0.36 | 0.37 | 0.78 | 0.50 |
7 | RGB + greyscale | NAA = 215, LM = 234 | 0 | 6 | 0.001 | 0.40 | 0.45 | 0.28 | 0.35 | 0.00 | 0.00 | - | 0.46 | 0.53 | 0.50 | 0.05 | 0.19 | 0.55 | 0.28 | 0.08 | 0.14 | 0.10 | 0.23 | 0.90 | 0.37 |
8 | greyscale | NAA = 215, LM = 234 | 0 | 2 | 0.005 | 0.35 | 0.46 | 0.33 | 0.39 | 0.52 | 0.21 | 0.30 | 0.44 | 0.45 | 0.45 | 0.05 | 0.33 | 0.61 | 0.43 | 0.34 | 0.46 | 0.39 | 0.32 | 0.74 | 0.45 |
9 | RGB | NAA = 215, LM = 234 | 0 | 2 | 0.005 | 0.45 | 0.39 | 0.36 | 0.37 | 0.41 | 0.17 | 0.24 | 0.38 | 0.52 | 0.44 | 0.05 | 0.24 | 0.63 | 0.34 | 0.18 | 0.39 | 0.25 | 0.27 | 0.84 | 0.41 |
10 | RGB + greyscale | NAA = 215 | 0 | 2 | 0.001 | 0.30 | 0.63 | 0.33 | 0.44 | 0.63 | 0.33 | 0.44 | - | - | - | 0.05 | 0.42 | 0.52 | 0.47 | 0.42 | 0.52 | 0.47 | - | - | - |
11 | RGB + greyscale | LM = 234 | 0 | 2 | 0.001 | 0.25 | 0.82 | 0.36 | 0.50 | - | - | - | 0.82 | 0.36 | 0.50 | 0.05 | 0.55 | 0.52 | 0.54 | - | - | - | 0.55 | 0.52 | 0.54 |
12 | RGB + greyscale | NAA = 120, LM = 131 | 0 | 2 | 0.001 | 0.30 | 0.43 | 0.28 | 0.34 | 0.00 | 0.00 | - | 0.43 | 0.47 | 0.45 | 0.05 | 0.14 | 0.48 | 0.21 | 0.06 | 0.10 | 0.07 | 0.16 | 0.80 | 0.27 |
Firstly, we note that replicate models (models 1, 2, 3, 4) trained with the same input parameters lead to metric values that differ by up to 15% for the selected models. Deep learning algorithms display random behaviour, but the influence on the final results should be negligible. The non-deterministic behaviour of detectron2 has been recognised [11][12][13], but no suitable solution has been provided yet.
Secondly, the mean value of the global f1-score of the raw results is 0.40 ± 0.03 (standard deviation) for the models trained with two classes. These scores are average and show a promising but limited ability of the model to detect the two classes in the multi-year dataset. For a threshold on the confidence score optimizing the f1-score, the precision is higher than the recall in most cases, meaning that the models tend to miss the objects of interest or detect them with low confidence score. The detections of the land movement class performed significantly better, 20 points on average, than the ones of the non-agricultural activity class.
Thirdly, the evaluation of the post-processed detections was performed without score filtering to maximise the number of TPs according to the needs of the beneficiaries. Accordingly, the recall is higher than the precision, reaching 0.63 ± 0.04 for the models trained with two classes. The global f1-score is 0.42 ± 0.07.
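The summary statistics quoted above can be recomputed from Table 3. The sketch below assumes that they cover models 1 to 9 (all models trained with both classes on the full GT) and that the standard deviation is the population standard deviation; with these two assumptions the values round to the figures given in the text.

```python
from statistics import mean, pstdev

# Global scores of models 1 to 9, copied from Table 3.
RAW_F1 = [0.42, 0.39, 0.40, 0.40, 0.45, 0.42, 0.35, 0.39, 0.37]
PROCESSED_RECALL = [0.67, 0.64, 0.65, 0.59, 0.66, 0.67, 0.55, 0.61, 0.63]
PROCESSED_F1 = [0.53, 0.50, 0.39, 0.46, 0.42, 0.44, 0.28, 0.43, 0.34]


def summarise(values):
    """Return the mean and population standard deviation, rounded to 2 decimals."""
    return round(mean(values), 2), round(pstdev(values), 2)


if __name__ == "__main__":
    for name, values in [("raw f1", RAW_F1),
                         ("processed recall", PROCESSED_RECALL),
                         ("processed f1", PROCESSED_F1)]:
        print(name, summarise(values))
```

The spread of the processed f1-scores is more than twice that of the raw ones, which illustrates how strongly the unfiltered results depend on the individual training run.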
Fourthly, we assessed the influence of different parameters:
- Addition of GT elements: Increasing the number of GT elements improves the metrics. Model 12 was trained with a previous version of the GT containing about half of the GT elements shown in Table 2. Its f1-scores before and after processing, 0.34 and 0.21 respectively, are lower than those of the models trained with the GT elements in Table 2. In particular, the class "non-agricultural activity" performs poorly.
- Addition of FP tiles: Adding empty tiles with previously detected FPs (Model 5) does not improve the metrics, but it should be noted that it does not reduce them either, despite the addition of tiles containing objects that could potentially be a source of confusion.
- Number of images per batch: Increasing the number of images per batch (Model 6 and Model 7) either has no influence or decreases performance for the non-agricultural activity class.
- Convert RGB images to greyscale: No influence is observed in the metrics (Model 8).
- Convert greyscale images to RGB: No influence is observed in the metrics (Model 9).
No parameter significantly improves the performance of the model, apart from adding elements to the GT. However, these observations must be treated with caution, because the non-deterministic behaviour of the algorithm makes it difficult to separate the influence of the parameters from the variability of the metrics.
Finally, Model 1 was selected for the inference because it gave satisfactory metrics before and after processing (Table 3). With this model, the land movement class is better detected and less often misclassified than the non-agricultural activity class (Table 4).
Class | TP | FP | FN | misclassified |
---|---|---|---|---|
Non-agricultural activity | 117 | 154 | 74 | 28 |
Land movement | 195 | 206 | 41 | 11 |
Figure 8 shows that no year performed significantly better or worse than the others.
5.2 Inference
The model selected in Section 5.1 was used to run inference over the Canton of Ticino and the Canton of Vaud for the SWISSIMAGE years covering the territories. This took about two to three days per canton.
As mentioned in Section 5.1, the beneficiaries preferred higher recall than precision. Consequently, no threshold was applied to the detection score (Table 3). The aim is to maximise the completeness of the detection of objects of interest, but this also implies the presence of a large number of FP detections. The detections need to be reviewed carefully before use.
We recognise that there are a significant number of FP detections in the provided results, some of which even have high detection scores (Fig. 9). The main sources of confusion for the model vary according to class. For the non-agricultural activity class, it is mainly caused by the presence of a feature, e.g. trees or bushes, in the middle of a field, isolated houses with outdoor storage, storage buildings or boats parked in the harbour (the latter can easily be eliminated by filtering the polygons that touch the bodies of water). For the land movement class, brownish fields in RGB images and lighter coloured fields in greyscale images are the main source of error. In addition, rock outcrops, river sandbeds, and variations in the colour/texture of fields are identified. The case of an open area surrounded by forest is causing issues for both classes. These examples are a source of confusion for the algorithm, but can sometimes also be a source of confusion for the human eye.
The beneficiaries are overall satisfied with the provided layer of human activity affecting soils for each year. Although the model metrics are average, the results provide additional information and have enabled them to identify areas that had not previously been registered, particularly for the oldest years. By retaining all the detections and cross-referencing them with other vector layers of interest, beneficiaries have access to an information tool that can be customised to suit their specific needs.
6. Discussion and perspectives
The results obtained with our deep learning approach are promising, but they are accompanied by a large number of FPs and average detection performance. Below we discuss several issues that may be at the root of these average results and propose potential solutions to improve them.
6.1 Detection review
Based on the model performance and the fact that no score filtering was applied, the results consist of a large number of FPs. It is therefore necessary to review them manually before using them. Given the number of detections in some years (thousands of detections), this is a tedious task. We are currently working on a detection reviewing tool that will greatly simplify this task.
6.2 Ground truth
The quality and quantity of the GT are essential when using deep learning models. As shown in Section 5.1, the GT is the main parameter influencing the quality of the results. However, some constraints prevent us from improving the GT and therefore the trained models.
The objects we are trying to detect are complex. They are made up of several elements and their contours are not always easy to delineate. Each class presents a certain heterogeneity that can blur the characteristics of the key elements (Fig. 2). This is particularly the case for the elements of the class "non-agricultural activity", which also has few examples in the greyscale images (Fig. 3). This may explain the poorer detection performance for this class compared to the class "land movement".
In addition, some GT elements are ambiguous and can be mixed between classes (Fig. 10). However, misclassification, although it reduces the metrics, is not critical in the context of this project, as the priority is to detect human activities regardless of their class.
The GT used to train the deep learning models is limited, with only 200 to 250 elements for each class, spread across different image years. GT vectorisation is a tedious task, but further expansion of the GT would benefit model performance. In addition, defining sub-classes with more homogeneous characteristics could enable better model training.
6.3 Images
In addition to the heterogeneity of the ground truth, a heterogeneous image dataset is used. The images have different characteristics depending on the year and the acquisition conditions (Section 3.1). Despite our efforts to homogenise the image dataset, i.e. by converting the RGB images to greyscale, by colourising the greyscale images or by homogenising the colour histogram, the impact on model performance is negligible. In the future, we may consider using more sophisticated methods such as that of Nguyen et al. (2024) [14], who propose a deep learning approach to cope with irregular time intervals between image acquisitions and to overcome differences in image characteristics. This method was successfully applied to the same SWISSIMAGE dataset to monitor the forests of the Swiss Alps using image segmentation.
6.4 Model variability
Unfortunately, the variability of the model results is not negligible (Section 5.1). It can be difficult to disentangle the influence of the model and the dataset parameters from the non-deterministic behaviour of the algorithm. This also poses problems for the reproducibility of our results. We will try to mitigate this effect in our future projects by following a suggestion from the developer community [12].
To mitigate the variability of the models, one solution may be to train several models and to run inference over the AoI with each of them. Intersecting the results can help to identify "strong" and "weak" detections, in addition to the confidence score, which is unfortunately not always reliable (Fig. 9). If a detection is present in several results, there is a good chance that it is a TP. On the other hand, if the detection only appears in the results of one model, there is a good chance that it is an FP. To help the beneficiaries select detections, we plan to provide a layer with a recurrence index in the future.
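One possible form of such a recurrence index is sketched below: for a given detection, count the models whose results contain at least one overlapping detection. This is only an illustration of the idea, not the planned implementation. Detections are simplified to axis-aligned bounding boxes, and the function names are hypothetical.

```python
def boxes_overlap(a, b):
    """True if two (xmin, ymin, xmax, ymax) boxes share a non-empty area."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def recurrence_index(detection_box, results_per_model):
    """Number of models with at least one detection overlapping `detection_box`.

    `results_per_model` is a list with one list of boxes per trained model.
    """
    return sum(
        any(boxes_overlap(detection_box, other) for other in boxes)
        for boxes in results_per_model
    )


if __name__ == "__main__":
    results = [
        [(0, 0, 50, 50), (200, 200, 260, 240)],  # model A
        [(10, 5, 60, 55)],                       # model B
        [(205, 190, 250, 235)],                  # model C
    ]
    print(recurrence_index((0, 0, 50, 50), results))
```

A detection found by every model would receive the maximum index, while an index of 1 would flag a detection seen by a single model and therefore more likely to be an FP.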
6.5 Detection of elevation changes
The proposed method allows the detection of human activities in aerial photographs. The interval between image acquisitions is at least 1 year, and often more than 3 years. Therefore, short or mid-term activities may be missed and the inventory may be incomplete.
An alternative method is to detect elevation changes [4][5][6] induced by mass movement, either excavation or filling. This can be achieved by subtracting multi-temporal DEM/DSMs for an AoI.
In Switzerland, a high-resolution DEM (resolution: 0.5 m to 2 m and vertical accuracy: ± 0.3 m to 0.5 m) is available through the swissALTI3D product. It is derived from airborne LiDAR acquisitions made since 2012 and updated every 6 years.
DSMs for previous years can be calculated by photogrammetry from historical aerial images [6]. The production of large-scale DSMs by photogrammetry from aerial images is complex because it is subject to distortion. However, recent advances have demonstrated that DSMs with an accuracy of 0.3 m to 0.5 m ± 3.9 m (RMSE) can be achieved in Switzerland [6].
This method has the advantage over imagery of capturing all changes occurring over the year range separating the DEM/DSMs. However, it is limited by the vertical resolution, which makes it impossible to detect ground movement at shallow depths. The use of this method as a complement to the image-based method presented here would improve the inventory of areas degraded by human activities.
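A minimal sketch of DEM differencing with NumPy is given below. It assumes two elevation grids that are already co-registered on the same cells; the function name and the 1 m threshold used in the example are hypothetical, the threshold being something to choose above the vertical uncertainty of the two models.

```python
import numpy as np


def elevation_change_masks(dem_old, dem_new, min_change_m):
    """Return the elevation difference and boolean masks of excavation and filling.

    Both grids must cover the same cells; changes smaller than `min_change_m`
    are treated as noise and ignored.
    """
    diff = np.asarray(dem_new, dtype=float) - np.asarray(dem_old, dtype=float)
    excavation = diff <= -min_change_m  # ground got lower
    filling = diff >= min_change_m      # ground got higher
    return diff, excavation, filling


if __name__ == "__main__":
    old = [[400.0, 400.0], [400.0, 400.0]]
    new = [[400.2, 403.0], [397.5, 400.0]]
    _, excavation, filling = elevation_change_masks(old, new, min_change_m=1.0)
    print(excavation.tolist(), filling.tolist())
```

The 0.2 m change in the first cell falls below the threshold and is discarded, which mirrors the limitation noted above: shallow ground movements cannot be separated from the noise of the elevation models.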
7. Conclusion
The Swiss cantons have to provide an inventory (a register or a map) of potentially rehabilitable soils, e.g. soils degraded by human activities, that can be converted into LCR, if necessary, to maintain quotas. To achieve this objective, the STDL has developed a framework based on a deep learning approach to automatically detect soils degraded by human activities, classified into two classes, namely "non-agricultural activity" and "land movement". The model obtained promising, albeit modest, results, achieving an f1-score of 0.53, with better performance for the land movement features than for those of non-agricultural activity. Only increasing the number of features in the ground truth seemed to significantly improve the performance of the model.
The final product delivered to the cantons consists of vector layers of detections inferred in SWISSIMAGE from 1946 to the present for each canton. Overall, the beneficiaries are satisfied with the results, despite the average performance of the model. Indeed, the results are very useful for identifying new areas of interest in the images within tens of minutes to a few hours per year and per canton. We are aware that the results are accompanied by numerous FP detections that need to be carefully examined before use. This remains a tedious task and we are working on a review tool to speed up this stage.
For land to be converted to LCR, a number of criteria must be met. Additional information is provided on the interaction with other features of interest that may potentially compete with LCR. This information will assist beneficiaries in making their land use decisions. Further investigations should be carried out by the beneficiaries, such as identifying the characteristics of the soil, assessing the necessary rehabilitation measures and contacting the landowner.
The method developed uses the SWISSIMAGE product, which is updated every year and available for the whole of Switzerland. Therefore, the inventory can be updated with new image acquisitions and the method can be applied to other cantons.
Code availability
The codes are stored and available on the STDL's GitHub page:
- proj-sda: framework for detecting human activities affecting agricultural soils
- object-detector: object detector framework
Acknowledgements¶
This project was made possible thanks to a close collaboration between the STDL team, the Canton of Ticino, and the Canton of Vaud. In particular, the STDL team acknowledges key contributions from Alex Sollero (Canton of Ticino), Marie Zoélie Künzler (Canton of Vaud), Michael Lanini (Canton of Ticino), Gioele Gentilini (Canton of Ticino), and Romane Claustre (Canton of Vaud). This project has been funded by "Strategie Suisse pour la Géoinformation".
1. Office fédéral du développement territorial ARE. Plan sectoriel des surfaces d'assolement (SDA). 2020. URL: https://www.are.admin.ch/are/fr/home/developpement-et-amenagement-du-territoire/strategie-et-planification/conceptions-et-plans-sectoriels/plans-sectoriels-de-la-confederation/sda.html (visited on 2024-12-11).
2. Basler & Hofmann. Carte indicative des sols valorisables et réhabilitables pour des compensations SDA. March 2021. URL: https://www.google.com/url?sa=t&source=web&rct=j&opi=89978449&url=https://www.are.admin.ch/dam/are/it/dokumente/raumplanung/dokumente/bericht/anleitung-hinweiskarte-fff-20210312.pdf.download.pdf/anleitung-hinweiskarte-fff-20210312-fr.pdf&ved=2ahUKEwiqtKH2qZ-KAxV4TKQEHXyvHUsQFnoECBsQAQ&usg=AOvVaw16dmv2iJ4fO7dns6B6Wy57.
3. Canton de Vaud. Comment identifier de nouvelles surfaces d'assolement? 2022. URL: https://www.vd.ch/territoire-et-construction/amenagement-du-territoire/proteger-les-surfaces-dassolement-sda/identifier-de-nouvelles-sda.
4. Stephan Nebiker, Natalie Lack, and Marianne Deuber. Building Change Detection from Historical Aerial Photographs Using Dense Image Matching and Object-Based Image Analysis. Remote Sensing, 6(9):8310–8336, September 2014. URL: http://www.mdpi.com/2072-4292/6/9/8310 (visited on 2024-03-26), doi:10.3390/rs6098310.
5. Giuseppe Esposito, Fabio Matano, and Marco Sacchi. Detection and Geometrical Characterization of a Buried Landfill Site by Integrating Land Use Historical Analysis, Digital Photogrammetry and Airborne Lidar Data. Geosciences, 8(9):348, September 2018. URL: http://www.mdpi.com/2076-3263/8/9/348 (visited on 2024-03-26), doi:10.3390/geosciences8090348.
6. Christian Ginzler, Livia Piermattei, Mauro Marty, and Lars T. Waser. Four nationwide Digital Surface Models from airborne historical stereo-images. EGU-2024, March 2024. URL: https://meetingorganizer.copernicus.org/EGU24/EGU24-5142.html (visited on 2024-12-13), doi:10.5194/egusphere-egu24-5142.
7. Alessandro Cerioni, Clémence Herny, Adrian Meyer, and Gwenaëlle Salamin. Object detector framework. December 2024. URL: https://tech.stdl.ch/TASK-IDET/.
8. Clémence Herny, Shanci Li, Alessandro Cerioni, and Roxane Pott. Automatic detection and observation of mineral extraction sites in Switzerland. January 2024. URL: https://tech.stdl.ch/PROJ-DQRY-TM/.
9. Kaiming He, Georgia Gkioxari, Piotr Dollár, and Ross Girshick. Mask R-CNN. January 2018. arXiv:1703.06870 [cs]. URL: http://arxiv.org/abs/1703.06870, doi:10.48550/arXiv.1703.06870.
10. Elisa Mariarosaria Farella, Salim Malek, and Fabio Remondino. Colorizing the Past: Deep Learning for the Automatic Colorization of Historical Aerial Images. Journal of Imaging, 8(10):269, October 2022. URL: https://www.mdpi.com/2313-433X/8/10/269 (visited on 2024-12-11), doi:10.3390/jimaging8100269.
11. J Rausch. Repeated training not deterministic despite identical setup and reproducibility flags. https://github.com/facebookresearch/detectron2/issues/4260, May 2022. Issue #4260.
12. Collin Mac Carthy. Manual seed does not work as expected. https://github.com/facebookresearch/detectron2/issues/4438, July 2022. Issue #4438.
13. ASDen. A simple trick for a fully deterministic roialign, and thus maskrcnn training and inference. https://github.com/facebookresearch/detectron2/issues/4723, December 2022. Issue #4723.
14. Thiên-Anh Nguyen, Marc Rußwurm, Gaston Lenczner, and Devis Tuia. Multi-temporal forest monitoring in the Swiss Alps with knowledge-guided deep learning. Remote Sensing of Environment, 305:114109, May 2024. URL: https://linkinghub.elsevier.com/retrieve/pii/S0034425724001202 (visited on 2024-12-11), doi:10.1016/j.rse.2024.114109.