
Green roofs: automatic detection of roof vegetation, vegetation type and covered surface from aerial imagery

Clotilde Marmy (ExoLabs) - Ueli Mauch (Canton of Zürich) - Swann Destouches (Uzufly) - Alessandro Cerioni (Canton of Geneva) - Roxane Pott (swisstopo)

Proposed by the Canton of Zürich and Canton of Geneva - PROJ-VEGROOFS
Project start in November 2023 - Intermediate publication on November 7, 2024

All scripts are available on GitHub.

This work by STDL is licensed under CC BY-SA 4.0


Abstract: With rising temperatures and increased rainfall, mapping green roofs is becoming important for urban planning in dense areas such as Geneva, Zürich and their surroundings. Green roofs, whether engineered or spontaneous, provide cooling, capture rainwater and host habitats, supporting biodiversity. Using national aerial imagery and land survey data, the study focuses on identifying green roofs and distinguishing among various vegetation types, namely the extensive, intensive, spontaneous, lawn and terrace categories. Machine learning and deep learning approaches have been developed to detect and classify green roofs in two study areas in the cantons of Geneva and Zürich. In the machine learning setup, statistical descriptors of the roof cover were derived from the airborne images to train a random forest and a logistic regression predicting whether a roof is green or not. Metrics on the test dataset showed that the best performance was achieved by combining the random forest and logistic regression models, trained with pixel statistics from potential vegetated areas defined by NDVI and luminosity thresholds on the original images. This combination yielded a recall of 0.87 for the green class and an F1-score of 0.85. The approach leveraging a deep neural network to classify the roofs into the six classes of the project is still in development.

1 Introduction

With rising temperatures, intensifying rain events and growing concern for biodiversity, mapping green roofs for urban planning is gaining importance. In cantons with dense urban regions, like the cantons of Geneva and Zürich, the presence of green roofs is an aspect to be taken into account in urban planning, given the role they play in creating cool islands, capturing rainfall and hosting biodiversity.

Vegetation is generally found on flat roofs, on the flat parts of roofs or on slightly tilted roofs. Formally, green roofs are engineered systems for growing plants on rooftops, such as extensive and intensive green roofs: the former host mosses, grasses and small vegetation, while the latter host lawn, bushes and even trees. The green roof concept can be extended to spontaneous green roofs and terraces; the former develop spontaneously, and both are considered green roofs for biodiversity reasons.

The detection of green roofs can be addressed by different methods applied to aerial imagery: thresholding on NDVI bands, classification by machine learning based on engineered features, or object detection by deep learning. In the literature 1, 2, the use of thresholds on NDVI gave variable performance and required substantial manual work. This is due to the variability of the NDVI, caused either by meteorological events preceding image acquisition, by the vegetation period during image acquisition, or by other site-specific reflectance factors that have an impact on image rendering. On the other hand, the classification of entire roofs with traditional machine learning showed good results 3. For the detection of green patches on roofs, object detection by deep learning has been explored 4, 5, 6, but such methods require a significant effort for ground truth (GT) labeling.

Although the binary problem, i.e. the distinction between bare and green roofs, is treated in the literature, no work on the multiclass problem was found. However, image classification techniques have been applied to classify roofs according to their geometry 7, 8 or according to their materials 9, 10. This encourages trying a similar approach for the classification of green rooftops into several classes.

The aim of this project is first to detect green roofs using national aerial imagery and land survey data. Secondly, the project will explore the classification of green roofs into different existing types. Finally, the vegetated surface area of the green roofs will be estimated.

2 Study areas

The study areas on both cantons of Geneva and Zürich have been defined to contain a variety of green roof types and bare roofs. Figures 1 and 2 show the study areas.

Study area
Figure 1: Study area over the city of Geneva and surrounding area.
Study area
Figure 2: Study area over the city of Zürich and surrounding area.

3 Data

The main input data of the project are aerial images and a vector layer of labeled building footprints as a ground truth.

3.1 Aerial imagery

Aerial images acquired in early summer every six years by the national aerial imagery survey, SWISSIMAGE RS, have been used. This corresponds to acquisitions in 2022 and 2023, for Zürich and Geneva respectively. The project makes use of the 10 cm resolution and the red, green, blue and near infrared channels of the product. The original product is delivered in the form of raw image captures encoded in 16 bit. However, for lighter processing and normalization, the imagery is converted to regular-sized 8-bit images, hereafter referred to as SWISSIMAGE RS 8-bit.

In addition, an in-development product derived from SWISSIMAGE RS is also available for testing. It consists of SWISSIMAGE RS orthorectified on the roofs of the large-scale topographic landscape model of Switzerland, swissTLM3D (layer TLM_GEBAEUDE_FOOTPRINT). The advantage of this innovative dataset is that the tilt of the buildings is mostly corrected, so that the image and the land survey vector layer are aligned, as illustrated in Figure 3.

Tilted building vs. orthorectified rooftop
Figure 3: Tilted building on the orthophoto and on the orthophoto orthorectified on the rooftop.

3.2 Ground truth

For training and testing of machine learning techniques, a ground truth is necessary. The building footprints documented in the land survey, established and maintained by the cantons, have been used as geometry for the ground truth. Then, the beneficiaries have visualized the footprints on top of the aerial imagery to attribute a vegetation tag (vegetated or not) and a class: bare, terrace, spontaneous, extensive, lawn and intensive as depicted in Figure 4.

Green roof classes
Figure 4: Alongside bare roofs, five classes of green roofs are present in the ground truth: green terraces and lawns, as well as spontaneous, extensive and intensive roofs.

The SWISSIMAGE time travel function and the construction year of the buildings have been helpful to attribute the correct class (hopefully) without going on site. However, an error-free ground truth cannot be guaranteed. Table 1 summarizes the diversity of the ground truth.

Table 1: Summary of the ground truth data, showing the number of roofs and their attribution into specific categories.

Class GE ZH Total Percentage
Bare 2102 875 2977 78.6
Extensive 47 398 445 11.8
Spontaneous 48 78 126 3.3
Lawn 64 23 87 2.3
Intensive 68 14 82 2.2
Terrace 50 17 67 1.8

Moreover, here are the characteristics of each class:

  • bare: In this project, roofs with less than 10% of vegetation cover are considered bare. They are made of roof tiles, concrete, metal, glass or solar panels.
  • extensive: They show a marbled effect, due to height variations in the substrate and to the different species used: moss, sedum and grasses.
  • spontaneous: Roofs that have been spontaneously colonized by plants. Vegetation is likely to develop in depressions of the roofs and is more dependent on external factors. Patches can be observed. Spontaneous green roofs are heterogeneous (color, height of vegetation, texture). At an early stage, they do not cover all the available space and, above all, their evolution is observable when going back a few years in time.
  • lawn: Lawn can be found on rooftops or on top of underground car parks (artificial soil). It can be considered a sub-class of intensive green roofs.
  • intensive: Intensive roofs are made out of lawn, shrubs, bushes and trees. They grow on a thicker substrate than extensive roofs.
  • terraces: These are roofs with movable vegetation. Unlike green roofs, terraces are often designed for recreational use, although some can show quite developed vegetation.

3.3 Other data

In addition, the canopy height models (CHM) derived from LiDAR acquisitions have been used to mask pixels corresponding to the vegetation of overhanging trees as illustrated in Figure 5.

Study area
Figure 5: Illustration of an overhanging tree above a garage.

For the study area in Zürich, the CHM produced by the City of Zürich from a 2022 LiDAR acquisition, available through a WMS, has been converted to a binary raster and vectorized. For the canton of Geneva, the already vectorized CHM layer from the 2019 LiDAR acquisition has been used.

4 Method

The Method chapter consists of two main parts: binary classification by machine learning and multiclass classification by deep learning.

4.1 Evaluation metrics

To evaluate the performance of the machine learning algorithms, traditional metrics have been chosen:

  • Overall accuracy (OA): the proportion of correctly predicted samples over the entire ground truth.
\[\begin{align} \ OA = {TP+TN \over P+N} \ \end{align}\]
  • Recall of the green class: measures how sensitive the model is to the green roofs.
\[\begin{align} \ Recall = {TP \over P} \ \end{align}\]
  • Balanced accuracy (BA): deals with imbalanced datasets as it corresponds to the average of recall obtained on each class.
\[\begin{align} \ Balanced\ Accuracy={{1 \over C} \sum_{i=1}^{C} {TP_i \over FN_i+TP_i}} \ \end{align}\]
  • F1-score: the harmonic mean of precision and recall, so it also keeps an eye on the precision. The F1-score overcomes the limitations of overall accuracy in cases of dataset imbalance.
\[\begin{align} \ F1{-}score = {TP \over TP+0.5*(FP+FN)} \ \end{align}\]

In the four aforementioned equations, the variables used are:

  • TP are true positives, green roofs correctly predicted as such
  • TN are true negatives, bare roofs correctly predicted as such
  • FN are false negatives, green roofs not detected
  • FP are false positives, bare roofs predicted as green
  • P are the green roofs in the ground truth
  • N are the bare roofs in the ground truth

Each metric ranges between 0 and 1, respectively the worst and the best achievable values.
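
These metrics can be computed directly with scikit-learn, which is also used later for model training. The sketch below is purely illustrative; the label arrays are hypothetical examples, not project data.

```python
# Minimal sketch: computing the evaluation metrics with scikit-learn,
# assuming binary label arrays (1 = green, 0 = bare).
from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                             f1_score, recall_score)

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # hypothetical ground truth labels
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # hypothetical model predictions

oa = accuracy_score(y_true, y_pred)                       # overall accuracy
recall_green = recall_score(y_true, y_pred, pos_label=1)  # recall of the green class
ba = balanced_accuracy_score(y_true, y_pred)              # mean of the per-class recalls
f1_green = f1_score(y_true, y_pred, pos_label=1)          # F1-score of the green class

print(f"OA={oa:.2f}  recall={recall_green:.2f}  BA={ba:.2f}  F1={f1_green:.2f}")
```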

4.2 Classification by machine learning

Machine learning algorithms make use of descriptors to learn the characteristics of the classes to predict. In this project, the descriptors were derived from the NRGB images.

4.2.1 Raster preparation

A first step was to compute the normalized difference vegetation index (NDVI) and luminosity rasters corresponding to the images following these equations:

\[\begin{align} \ NDVI = {NIR-R \over NIR+R}, \ \end{align}\]
\[\begin{align} \ Luminosity = {R + G + B}, \ \end{align}\]

where R, G, B and NIR stand respectively for the pixels of the red, green, blue and near infrared (NIR) bands. The resulting NDVI index is between -1 and 1, whereas the luminosity range depends on the image format: 0 to 765 in 8-bit, 0 to 196605 in 16-bit.
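
As a hedged illustration of this step, the sketch below computes the two rasters with numpy and rasterio. The file names and the band order (NIR, R, G, B) are assumptions for the example, not the project's actual configuration.

```python
import numpy as np
import rasterio

# hypothetical input tile; the band order NIR, R, G, B is an assumption
with rasterio.open("swissimage_rs_tile.tif") as src:
    nir, r, g, b = src.read().astype(np.float32)
    profile = src.profile

ndvi = (nir - r) / (nir + r + 1e-9)   # small epsilon avoids division by zero
luminosity = r + g + b                # 0-765 for 8-bit images, 0-196605 for 16-bit

profile.update(count=1, dtype="float32")
with rasterio.open("ndvi.tif", "w", **profile) as dst:
    dst.write(ndvi, 1)
with rasterio.open("luminosity.tif", "w", **profile) as dst:
    dst.write(luminosity, 1)
```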

4.2.2 Overhanging trees

As mentioned in Section 3.3, it was observed on the images that some big trees next to buildings may cover parts of the roofs and erroneously lead to the detection of green roofs. The mask derived from the CHM was buffered by 1 m to exclude all misleading pixels.
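
Below is a minimal sketch of this masking step, assuming the vectorized CHM and the roof footprints are available as hypothetical GeoPackage files in a metric CRS (e.g. EPSG:2056); the buffered tree polygons are subtracted from the roof geometries before the statistics are computed.

```python
import geopandas as gpd

roofs = gpd.read_file("roofs.gpkg")              # hypothetical land survey footprints
chm = gpd.read_file("chm_vegetation.gpkg")       # hypothetical vectorized CHM layer

chm["geometry"] = chm.buffer(1)                  # 1 m buffer around the tree mask

# remove the parts of the roofs covered by (buffered) overhanging vegetation
roofs_masked = roofs.overlay(chm, how="difference")
roofs_masked.to_file("roofs_masked.gpkg", driver="GPKG")
```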

4.2.3 Statistics per roof

After having filtered the bands for overhanging vegetation, the following statistics of the pixels per roof were computed on the red, green, blue, NIR, luminosity and NDVI bands:

  • mean
  • median
  • minimum
  • maximum
  • standard deviation

This leads to 30 descriptors. For instance, for the roof in Figure 6, the statistics (min, max, mean, median, standard deviation) of the pixels in the green band (image on the right) are:

  • min = 6
  • max = 255
  • mean = 122.272
  • median = 123
  • standard deviation = 43.22

Furthermore, the leaning of buildings in the image can lead to a mismatch between the building in the image and in the land survey. To overcome this, an inner buffer of 1 m was applied to the geometries prior to the statistics computation.

Illustration of statistics per roof.
Figure 6: The statistics (min, max, mean, median, standard deviation) of the pixels within the roof perimeter are computed for each building. Here, for the green band (image on the right), the statistic values obtained are: min = 6, max = 255, mean = 122.272, median = 123 and standard deviation = 43.22.
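
Such per-roof statistics can be obtained with zonal statistics. The sketch below uses the rasterstats package as one possible implementation, continuing the hypothetical files of the previous sketches, and would be repeated for each of the six bands.

```python
import geopandas as gpd
import pandas as pd
from rasterstats import zonal_stats

roofs = gpd.read_file("roofs_masked.gpkg")   # hypothetical masked footprints
roofs["geometry"] = roofs.buffer(-1)         # 1 m inner buffer to keep off the facades
roofs = roofs[~roofs.is_empty]               # drops roofs narrower than 2 m

# five statistics per roof for one band; repeat for R, G, B, NIR, luminosity and NDVI
stats = zonal_stats(roofs, "ndvi.tif", stats=["min", "max", "mean", "median", "std"])
roofs = roofs.join(pd.DataFrame(stats, index=roofs.index).add_prefix("ndvi_"))
```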

4.2.4 Potential greenery

On Figure 6, one can see that the building footprint encompasses not only the roof but also a courtyard and that, on the roof, infrastructures like solar panels are also considered in the statistics. Therefore, the extensive roof in Figure 6 is likely to show different statistics than an extensive roof without courtyard and/or without solar panels. To overcome that and focus primarily on the vegetated area, the potential greenery area on each roof was extracted based on NDVI and luminosity threshold values, and then vectorized. The term "potential greenery" is chosen because in the extracted areas, pixels corresponding to bare materials may still be found.

To choose the threshold values to apply to the NDVI and luminosity rasters, one can load the rasters in a visualizer and evaluate the effect of the thresholds via the layer styling.

This potential greenery vector layer offers an alternative to the building footprint layer from the land survey. For instance, Figure 7 shows the potential greenery extracted from the roof. The statistics per roof can be recomputed for this layer.

Illustration of extracted potential greenery.
Figure 7: Extracted potential greenery for NDVI values greater than 0 and for luminosity values lower than 500.
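
As a sketch of how such a layer can be derived, assuming the NDVI and luminosity rasters from the earlier sketch, the thresholds described above are applied and the resulting mask is vectorized with rasterio.

```python
import geopandas as gpd
import numpy as np
import rasterio
from rasterio.features import shapes
from shapely.geometry import shape

with rasterio.open("ndvi.tif") as src_ndvi, rasterio.open("luminosity.tif") as src_lum:
    ndvi, lum = src_ndvi.read(1), src_lum.read(1)
    transform, crs = src_ndvi.transform, src_ndvi.crs

# potential greenery: NDVI greater than 0 and luminosity lower than 500
mask = (ndvi > 0) & (lum < 500)

polygons = [shape(geom) for geom, value
            in shapes(mask.astype(np.uint8), mask=mask, transform=transform)
            if value == 1]
gpd.GeoDataFrame(geometry=polygons, crs=crs).to_file("potential_greenery.gpkg", driver="GPKG")
```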

4.2.5 Training and testing

The roofs in the ground truth were randomly split into a training set (70 %) and a test set (30 %) following the original multiclass distribution. Two machine learning algorithms were trained with the scikit-learn 11 library in Python, a random forest (RF) and a logistic regression (LR). The hyperparameters were optimized by means of a grid search strategy during training:

  • random forest:

    • number of trees to grow: 200, 500, 800.
    • number of features to test at split: square root of the number of descriptors plus or minus one. This leads to three values to test.
  • logistic regression:

    • solver: liblinear, newton-cg
    • regularization technique: l2
    • inverse of regularization strength: 1, 0.5, 0.1.
    • number of iterations: 200, 500, 800.

The random state is fixed before training the algorithms. The classes are weighted inversely proportional to the class frequencies in the input data.

When optimizing the training with the GridSearchCV function from the Python library scikit-learn, 5-fold cross-validation is performed and evaluated using balanced accuracy.
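
A minimal sketch of this training setup is given below, assuming a feature matrix X with the 30 descriptors and binary labels y (both placeholders here). The grids reproduce the values listed above, with class_weight="balanced" for the inverse-frequency class weighting and a fixed random state.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split

X = np.random.rand(500, 30)        # placeholder descriptors (30 per roof)
y = np.random.randint(0, 2, 500)   # placeholder binary labels (1 = green)

# 70/30 split preserving the class distribution
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42)

rf_grid = GridSearchCV(
    RandomForestClassifier(class_weight="balanced", random_state=42),
    {"n_estimators": [200, 500, 800], "max_features": [4, 5, 6]},  # sqrt(30) ~ 5, +/- 1
    scoring="balanced_accuracy", cv=5)
rf_grid.fit(X_train, y_train)

lr_grid = GridSearchCV(
    LogisticRegression(penalty="l2", class_weight="balanced", random_state=42),
    {"solver": ["liblinear", "newton-cg"], "C": [1, 0.5, 0.1], "max_iter": [200, 500, 800]},
    scoring="balanced_accuracy", cv=5)
lr_grid.fit(X_train, y_train)

print(rf_grid.best_params_, lr_grid.best_params_)
```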

The trained models are evaluated on the test set and compared with the balanced accuracy, recall and F1-score. For instance, the beneficiaries can opt for the model that misses fewer green rooftops (high recall) or for the model that makes fewer errors (high F1-score).

The importance of the descriptors has been evaluated with the permutation_importance function of the scikit-learn Python library. It shuffles the values of each descriptor and measures the change in model performance for a given scorer; in the present case, the balanced accuracy was used. Afterwards, an ablation study of the descriptors was carried out to observe the effective contribution of different sets of descriptors to the model.
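
The sketch below illustrates this evaluation with sklearn.inspection.permutation_importance; the fitted model and the test data are placeholders standing in for the project's models and descriptors.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

X_test = np.random.rand(100, 30)                       # placeholder descriptors
y_test = np.random.randint(0, 2, 100)                  # placeholder labels
model = RandomForestClassifier().fit(X_test, y_test)   # placeholder fitted model

result = permutation_importance(
    model, X_test, y_test, scoring="balanced_accuracy", n_repeats=10, random_state=42)

# mean drop in balanced accuracy when each descriptor is shuffled, most important first
for idx in result.importances_mean.argsort()[::-1]:
    print(f"descriptor {idx}: {result.importances_mean[idx]:.3f}")
```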

The models were optimized once for the binary problem (green or not) and once for the multiclass problem (is the roof bare, a green terrace, a spontaneous green roof, an extensive green roof, a lawn or an intensive green roof?).

Finally, the best set of descriptors is used to train a model on SWISSIMAGE RS orthorectified on the building footprints of the TLM and to compare the metrics with those obtained on the original images.

4.3 Multiclass classification by deep learning

[Currently in development]

5 Results and discussion

Results regarding the binary classification by machine learning and the multiclass classification by deep learning are presented and discussed together.

5.1 Binary classification by machine learning

Before presenting the results obtained by machine learning, the intermediate results from the data preprocessing steps are shown.

5.1.1 Data preprocessing

Table 2 summarizes the composition of the GT after the preprocessing steps: inner buffer of 1 m and masking with the CHM. The ratios between the classes remain in the same order of magnitude as in the original dataset. It is also worth noting that the inner buffer of 1 m leads to exclusion of 65 roofs narrower than 2 m, which are mainly small bare surfaces of tiny built parts attached to buildings or garden sheds. 95 more roofs are excluded by the mask for the overhanging vegetation.

Table 2: Composition of the ground truth after preprocessing steps.

Class GT original GT after inner buffer of 1 m GT after inner buffer of 1 m and masking with the CHM
Bare 2977 2915 2830
Extensive 445 445 445
Spontaneous 126 124 122
Lawn 87 86 82
Intensive 82 82 78
Terrace 67 67 67
Total 3784 3719 3624

Furthermore, it appeared that the mask derived from the CHM did not exclude all the pixels corresponding to overhanging vegetation. Therefore, an additional subset of bare roofs with a mean NDVI value greater than 0.05 has been excluded from the dataset; 43 bare roofs were concerned.

With this new version of the ground truth, the statistics on the red, green, blue, NIR, NDVI and luminosity bands were computed per roof. The results for the NDVI are given in Figure 8. One can observe, on the boxplot of the NDVI means, that the interquartile ranges of the bare (b) and terrace (t) classes largely overlap. That is also the case between the terrace and spontaneous (s) classes, and between the spontaneous and extensive (e) roofs. Furthermore, the intensive (i) class shows a wide interquartile range which overlaps with those of the three aforementioned classes and of the lawn (l) class. Similar observations can be made about the boxplots of the medians. The distributions of the minimum and maximum pixel values per roof and per class also show that similar values are found between classes, though the distributions for the bare, spontaneous and extensive classes have a lower interquartile range than the others. Finally, from the distributions of the standard deviation, two groups are distinguishable: high standard deviations for the terrace, lawn and intensive classes; low ones for the bare, spontaneous and extensive classes. The former roofs often have a mix of bare materials and vegetation in a good state of health, whereas the latter are often homogeneously covered, and the spontaneous and extensive vegetation may be weak.

Boxplots of NDVI.
Figure 8: Boxplots of the statistics for the NDVI pixels per roof in the study area per class.

In Appendices 7.1, 7.2, 7.3, 7.4 and 7.5, the interested reader can visualize boxplots similar to those in Figure 8, respectively for the luminosity, near infrared, red, green and blue bands. A general conclusion is that the descriptors contain information, even if overlap is observable between classes. There is potential for ML algorithms to learn patterns from these data.

5.1.2 Parameter optimization and ablation of the descriptors

The best sets of hyperparameters after optimization of the random forest and logistic regression are given in Tables 3 and 4 for the sets of descriptors tested.

In Table 3, the best results of the runs made with the random forest were achieved with all the descriptors, 800 grown trees and 6 descriptors tested at each split. During the optimization phase, the best model for each tested configuration has been kept. The results are given in Appendix 7.6. The evaluation of the models by means of the k-fold cross-validation indicated that all sets of parameters performed similarly (0.01 of difference in the mean balanced accuracy over the folds). This indicates that the range of parameters tested was suitable to extract information from the data.

Table 3: Metrics on the test set for the models trained with the random forest.

Descriptors # of trees # of descriptors Balanced accuracy Recall F1-score
ndvi+lum+nrgb 800 6 0.83 0.69 0.78
lum+nrgb 200 5 0.82 0.65 0.77
nrgb 800 5 0.83 0.67 0.78
rgb 200 5 0.80 0.60 0.73

Table 4 shows that the best model for the runs made with the logistic regression is obtained with 200 iterations, a penalty coefficient of 1 and the newton-cg solver. The rest of the optimized models can be found in Appendix 7.7. Again, one can notice the similarity of the performances (0.01 of difference in the mean balanced accuracy).

Table 4: Metrics on the test set for the models trained with the logistic regression.

Descriptors Iterations C Solver Balanced accuracy Recall F1-score
ndvi+lum+nrgb 200 1.00 newton-cg 0.89 0.86 0.80
lum+nrgb 200 1.00 newton-cg 0.89 0.87 0.81
nrgb 200 1.00 liblinear 0.89 0.86 0.80
rgb 200 0.50 liblinear 0.87 0.85 0.77

The permutation importance results indicated that the important descriptors are different for the random forest and the logistic regression. In the random forest, the six most important descriptors are:

  1. NDVI standard deviation
  2. NDVI mean
  3. standard deviation of the blue pixels
  4. NDVI median
  5. NDVI maximum
  6. NIR mean

The rest of the important descriptors are given in Appendix 7.8. Consistently, in the ablation study shown in Table 3, the recall drops from 0.69 to 0.65 after removing the descriptors derived from the NDVI pixels (ndvi). It decreases further from 0.67 to 0.60 when removing the descriptors derived from the NIR band. A lot of information is contained in the NIR band and, by extension, in the NDVI.

Regarding the logistic regression, the six most important descriptors are:

  1. luminosity median
  2. standard deviation of the blue pixels
  3. mean of the red pixels
  4. mean of the green pixels
  5. luminosity standard deviation
  6. median of the green pixels

The rest of the important descriptors are given in Appendix 7.9. Those results highlight the fact that the logistic regression learns differently from the data than the random forest. Moreover, in the ablation study in Table 4, there is only a 1% decrease in recall and a 3% decrease in F1-score between the full model and the rgb model. These results indicate that the NIR band, while not providing much information for green roof detection, helps slightly to avoid false positives.

Furthermore, when comparing the recall in Tables 3 and 4, one observes that the LR is more sensitive to the green class than the RF (0.86 vs. 0.69). However, the balanced accuracy, which is the mean of the recall for the bare class and for the green class, indicates that the recall for the bare class in the RF is higher than the one in the LR.

Therefore, the mean of the probability estimates of the LR and of the predicted class probabilities of the RF for the green class has been computed to take advantage of both ways of learning from the data. For values higher than 0.5, the corresponding roofs have been considered green; otherwise, they were assigned to the bare class. The metrics for the RF, the LR and the combination of both are summarized in Table 5. One can appreciate the stability of the balanced accuracy and the increase in the F1-score: although more green roofs are missed than with the LR alone, the overall classification improves. Given the class imbalance in reality, this leads to far fewer errors in the outputs. However, for detection purposes with resources allocated to manual control, the LR model would be more appropriate.

Table 5: Metrics obtained on the test set after training with all the statistics per roof.

Model Balanced accuracy Recall F1-score
RF 0.83 0.69 0.78
LR 0.89 0.86 0.80
RF+LR 0.89 0.81 0.83
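
A hedged sketch of this combination is shown below with placeholder data and models: the green-class probabilities of the two fitted classifiers are averaged and a 0.5 decision threshold is applied.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# placeholder data and models standing in for the fitted RF and LR of the project
X_train, y_train = np.random.rand(200, 30), np.random.randint(0, 2, 200)
X_test = np.random.rand(50, 30)
rf = RandomForestClassifier(class_weight="balanced").fit(X_train, y_train)
lr = LogisticRegression(class_weight="balanced", max_iter=200).fit(X_train, y_train)

# mean of the green-class probabilities of both models, 0.5 decision threshold
proba_green = (rf.predict_proba(X_test)[:, 1] + lr.predict_proba(X_test)[:, 1]) / 2
y_pred_combined = (proba_green > 0.5).astype(int)   # 1 = green, 0 = bare
```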

5.1.3 Performance on SWISSIMAGE RS and on SWISSIMAGE RS orthorectified on TLM

Finally, Table 6 shows the metrics on the test set for the models trained on the descriptors derived from the orthophotos projected on the rooftops. One can observe that a similar range of performance is reached. Therefore, it seems either that the tilt of the buildings in the orthoimage, and its implication for the calculation of the descriptors, is negligible, or that the application of a negative buffer on the land survey footprint geometries has made it possible to focus on the roofs and not include too much of the tilted facades in the descriptors.

Table 6: Metrics for the test set on SWISSIMAGE RS and on SWISSIMAGE RS orthorectified on TLM.

Model Images Balanced accuracy Recall F1-score
RF SWISSIMAGE RS 8-bit 0.87 0.77 0.84
RF SWISSIMAGE RS orthorectified on TLM 0.89 0.81 0.87
LR SWISSIMAGE RS 8-bit 0.91 0.89 0.85
LR SWISSIMAGE RS orthorectified on TLM 0.89 0.85 0.83

5.1.4 Results on potential greenery areas

In a second step, the focus was put on the potential greenery areas. The threshold values to apply to the NDVI and luminosity bands have been set to 0 and 500 respectively, after visualizing the rasters in QGIS, and the rasters were masked accordingly. An illustration is given in Figure 9. Pixels with an NDVI value smaller than 0 are overlaid with transparent blue and pixels with a luminosity value greater than 500 are overlaid with transparent red. The bright green pixels correspond to the identified potential vegetation.

Threshold visualization.
Figure 9: Visualization of the threshold effect on the NDVI and luminosity rasters. Pixels with an NDVI value smaller than 0 are overlaid with transparent blue and pixels with a luminosity value greater than 500 are overlaid with transparent red. The vibrant green pixels correspond to the potential vegetation.

When referring to Figure 8, which displays the boxplots of the statistics of the NDVI band computed per entire roof, one can observe that a large majority of roofs have at least one pixel with an NDVI value greater than 0. When filtering the surface of the roofs according to NDVI and luminosity to focus on potentially vegetated areas, 3189 out of 3624 roofs were indeed still included in the analysis, as shown in Table 7.

Table 7: Comparison of the composition of the ground truth before and after filtering for the potential greenery area.

Class GT after inner buffer of 1 m and masking with the CHM GT after filtering on NDVI and luminosity Difference
Bare 2830 2397 433
Extensive 445 444 1
Spontaneous 122 121 1
Lawn 82 82 0
Intensive 78 78 0
Terrace 67 67 0
Total 3624 3189 435

Once again, the statistics of the NDVI, luminosity and NRGB pixels were computed per class, but this time on the potential greenery. The boxplot of the NDVI mean for the terrace class in Figure 10, with a median around 0.18 instead of -0.18 in the boxplot of the NDVI mean per entire roof (see Figure 8), illustrates that the potential greenery layer allows focusing on the vegetated part of the terraces. An increase is also noted for the other classes, but the increase for the terraces is particularly interesting, as the median of the distribution reaches those of the lawn and intensive roofs, whereas it was previously similar to the median of the bare class (see Figure 8).

The bare class also benefits from the thresholds, with a median of the NDVI distribution similar to those of the spontaneous and extensive classes. Moreover, Figure 11 shows the boxplots of the luminosity statistics, where it can be seen that the luminosity medians are generally lower for the bare class than for the others. This corresponds to the fact that higher NDVI values are mostly found on the parts of roofs in shadow, as highlighted in Figure 9.

Boxplots of NDVI.
Figure 10: Boxplots of the statistics for the NDVI pixels per potential greenery in the study area per class.
Boxplots of luminosity.
Figure 11: Boxplots of the statistics for the luminosity pixels per potential greenery in the study area per class.

The boxplots of statistics on the NRGB bands are given in Appendices 7.10, 7.11, 7.12 and 7.13.

Since the statistics showed different characteristics on the potential greenery areas than on the entire roofs, another optimization was performed for the training based on the potential greenery areas. The corresponding optimized parameters and metrics on the test set are shown in Table 8. For the random forest, the best model is reached with 5 features to test at each split and 200 trees to grow. Regarding the logistic regression, 200 iterations are performed with a penalty coefficient of 1 and the newton-cg solver. Again, the combination of the predictions from the RF and the LR is computed and the corresponding metrics are also given in Table 8.

In the last row of Table 8, the combination of the RF and LR on the entire roofs has been trained and evaluated on the same dataset as the potential greenery one. It is worth noting that the model trained with the descriptors computed on the potential greenery surfaces performs better at detecting the green roofs than the model trained with the descriptors computed over the entire roofs (recall of 0.87 vs 0.84), but overall leads to more bare roofs predicted as green (F1-score of 0.85 against 0.86).

Table 8: Metrics for the test set for statistics over the potential greenery area.

Model Balanced accuracy Recall F1-score Optimized parameters
RF 0.87 0.76 0.82 # of features = 5, # of trees = 200
LR 0.88 0.87 0.80 C = 1, # of iterations = 200, solver = newton-cg
RF+LR 0.91 0.87 0.85
RF+LR on entire roofs 0.90 0.84 0.86

5.1.5 Results and use

Results of the best models trained on the entire roofs and on the potential greenery are shown in Figure 12. According to the metrics, a user interested in detecting green roofs should use the combination of RF and LR trained on the potential greenery, since this model is more sensitive to green roofs and produces a limited number of wrong predictions (second best F1-score obtained). The geometry of the potential greenery may help to speed up the control when zooming in on the roofs, whereas the aggregation of the results on the original roof geometries delivers a better overview of the situation.

Results on an inference area.
Figure 12: Results on an inference area.

5.1.6 Multiclass classification insights

Results for the multiclass classification with traditional machine learning were not satisfactory. The confusion between classes indicated that the scarce vegetation of terraces, spontaneous and extensive roofs leads to confusion with bare roofs. From these tests, it appears that global statistics per roof are not sufficient for the task. Hence, a strategy including the spatial structure of the rooftop might be needed (e.g. a deep learning approach).

5.2 Multiclass classification by deep learning

[Currently in development]

6 Conclusions and outlooks

This study showed the effectiveness of using aerial imagery and machine learning models in detecting green roofs.

In the machine learning part, the results demonstrated the ability of the random forest and logistic regression algorithms to detect green roofs among bare roofs, based on vegetation and material reflectance in airborne images. The metrics, with a recall of 0.87 and an F1-score of 0.85 for the green class on the test set, show that the combination of both models, trained on pixel statistics derived from vegetated areas defined by NDVI and luminosity thresholds, achieved the best performance. These metrics highlight the model's ability to accurately detect green roof coverage, making it a reliable tool for large-scale urban mapping.

Some further outlooks and insights:

  • By training and testing the models on two areas separated by approximately 300 kilometers, with images acquired in two different years and with six types of roofs represented in the ground truth, the models already show a certain ability to generalize.
  • From the scores of the k-fold cross-validations in Appendices 7.6 and 7.7, the metrics are expected to vary by approximately 5% depending on the split of the ground truth into training and test sets.
  • A machine learning approach relying on engineered features (descriptors) as input always leaves room for improvement by including additional descriptors.
  • Finally, the experts mentioned that, even using the SWISSIMAGE Time Travel WMTS and the construction year of the buildings, they could not ensure an error-free ground truth. This may have had an impact on model training and evaluation, as well as on the manual correction of results in the future.

7 Appendices

7.1 Boxplots of the statistics for the luminosity pixels per roof in the study area per class

Boxplots of luminosity.
Figure 13: Boxplots of the statistics for the luminosity pixels per roof in the study area per class.

7.2 Boxplots of the statistics for the near infrared pixels per roof in the study area per class

Boxplots of the NIR band.
Figure 14: Boxplots of the statistics for the near infrared pixels per roof in the study area per class.

7.3 Boxplots of the statistics for the red pixels per roof in the study area per class

Boxplots of the red band.
Figure 15: Boxplots of the statistics for the red pixels per roof in the study area per class.

7.4 Boxplots of the statistics for the green pixels per roof in the study area per class

Boxplots of the green band.
Figure 16: Boxplots of the statistics for the green pixels per roof in the study area per class.

7.5 Boxplots of the statistics for the blue pixels per roof in the study area per class

Boxplots of the blue band.
Figure 17: Boxplots of the statistics for the blue pixels per roof in the study area per class.

7.6 Results of the parameter optimization of the random forest

Table 9: Results of the parameter optimization of the random forest. Optimized parameters: number of trees to grow and number of descriptors to test at each split.

param_max_features param_n_estimators split0_test_score split1_test_score split2_test_score split3_test_score split4_test_score mean_test_score std_test_score
6 800 0.877 0.834 0.857 0.875 0.855 0.860 0.016
5 200 0.870 0.830 0.849 0.870 0.864 0.857 0.015
6 500 0.877 0.839 0.841 0.875 0.850 0.857 0.017
6 200 0.877 0.825 0.846 0.870 0.862 0.856 0.019
5 800 0.864 0.830 0.852 0.875 0.853 0.855 0.015
5 500 0.872 0.825 0.841 0.878 0.852 0.854 0.020
4 800 0.868 0.820 0.853 0.870 0.849 0.852 0.018
4 200 0.870 0.820 0.852 0.863 0.839 0.849 0.018
4 500 0.864 0.820 0.843 0.874 0.839 0.848 0.019

7.7 Results of the parameter optimization of the logistic regression

Table 10: Results of the parameter optimization of the logistic regression. Optimized parameters: penalty coefficient, number of iterations and solver.

param_C param_max_iter param_solver split0_test_score split1_test_score split2_test_score split3_test_score split4_test_score mean_test_score std_test_score
1 200 newton-cg 0.882 0.859 0.919 0.885 0.922 0.893 0.024
1 500 newton-cg 0.882 0.859 0.919 0.885 0.922 0.893 0.024
1 800 newton-cg 0.882 0.859 0.919 0.885 0.922 0.893 0.024
0.5 200 newton-cg 0.882 0.859 0.915 0.892 0.917 0.893 0.022
0.5 500 newton-cg 0.882 0.859 0.915 0.892 0.917 0.893 0.022
0.5 800 newton-cg 0.882 0.859 0.915 0.892 0.917 0.893 0.022
0.1 200 newton-cg 0.892 0.844 0.907 0.901 0.919 0.893 0.026
0.1 500 newton-cg 0.892 0.844 0.907 0.901 0.919 0.893 0.026
0.1 800 newton-cg 0.892 0.844 0.907 0.901 0.919 0.893 0.026
1 200 liblinear 0.881 0.856 0.908 0.893 0.912 0.890 0.020
1 500 liblinear 0.881 0.856 0.908 0.893 0.912 0.890 0.020
1 800 liblinear 0.881 0.856 0.908 0.893 0.912 0.890 0.020
0.5 200 liblinear 0.881 0.847 0.907 0.898 0.910 0.889 0.023
0.5 500 liblinear 0.881 0.847 0.907 0.898 0.910 0.889 0.023
0.5 800 liblinear 0.881 0.847 0.907 0.898 0.910 0.889 0.023
0.1 200 liblinear 0.889 0.840 0.896 0.898 0.900 0.885 0.022
0.1 500 liblinear 0.889 0.840 0.896 0.898 0.900 0.885 0.022
0.1 800 liblinear 0.889 0.840 0.896 0.898 0.900 0.885 0.022

7.8 Permutation importance of the random forest

Table 11: Permutation importance of the random forest.

Descriptor set Statistic % drop in balanced accuracy
NDVI standard deviation 0.086
NDVI mean 0.079
Blue standard deviation 0.014
NDVI median 0.013
NDVI maximum 0.011
NIR mean 0.011
NIR maximum 0.010
NIR median 0.006
NIR standard deviation 0.003
Green minimum 0.001
Red standard deviation 0.001
Red median 0.001
Blue median 0.001
NIR minimum 0.001
NDVI minimum 0.000
Luminosity minimum 0.000
Luminosity maximum 0.000
Luminosity mean 0.000
Luminosity median 0.000
Luminosity standard deviation 0.000
Red minimum 0.000
Red maximum 0.000
Red mean 0.000
Blue minimum 0.000
Blue maximum 0.000
Blue mean 0.000
Green maximum 0.000
Green mean 0.000
Green median 0.000
Green standard deviation 0.000

7.9 Permutation importance of the logistic regression

Table 12: Permutation importance of the logistic regression.

Descriptor set Statistic % drop in balanced accuracy
Luminosity median 0.318
Blue standard deviation 0.277
Red mean 0.219
Green mean 0.207
Luminosity standard deviation 0.199
Green median 0.168
NIR mean 0.157
Green standard deviation 0.144
Luminosity maximum 0.131
Green maximum 0.110
Red median 0.110
Blue mean 0.108
Blue median 0.099
Luminosity minimum 0.056
Luminosity mean 0.045
NIR median 0.044
Red maximum 0.029
NDVI maximum 0.012
Red standard deviation 0.011
NIR minimum 0.010
Green minimum 0.008
Red minimum 0.002
NIR standard deviation 0.001
NDVI median 0.000
NDVI mean -0.001
Blue minimum -0.001
NDVI standard deviation -0.002
Blue maximum -0.003
NDVI minimum -0.003
NIR maximum -0.004

7.10 Boxplots of the statistics for the near infrared pixels per potential greenery per class

Boxplots of the NIR band.
Figure 18: Boxplots of the statistics for the near infrared pixels per potential greenery in the study area per class.

7.11 Boxplots of the statistics for the red pixels per potential greenery per class

Boxplots of the red band.
Figure 19: Boxplots of the statistics for the red pixels per potential greenery in the study area per class.

7.12 Boxplots of the statistics for the green pixels per potential greenery per class

Boxplots of the green band.
Figure 20: Boxplots of the statistics for the green pixels per potential greenery in the study area per class.

7.13 Boxplots of the statistics for the blue pixels per potential greenery per class

Boxplots of the blue band.
Figure 21: Boxplots of the statistics for the blue pixels per potential greenery in the study area per class.

8 Sources and references

Indications on software and hardware requirements, as well as the code used to perform the project, are available on GitHub: https://github.com/swiss-territorial-data-lab/proj-hetres/tree/main.

Other sources of information mentioned in this documentation are listed here:


  1. Grün Stadt Zürich. Extensive Flachdachbegrünungen in der Stadt Zürich. Technical Report, Grün Stadt Zürich, March 2017. URL: https://www.stadt-zuerich.ch/content/dam/stzh/ted/Deutsch/gsz_2/publikationen/beratung-und-wissen/wohn-und-arbeitsumfeld/dach-vertikalgruen/dachbegr%C3%BCnung/ErfolgskontrolleFlachdachbegruenungen170329.pdf

  2. J Massy, P Martin, and N Wyler. Cartographie semi-automatisée des toitures végétalisées de la Ville de Genève. Géomatique Expert, 81(Juillet-Août):26 – 31, 2011. 

  3. Tanguy Louis-Lucas, Flavie Mayrand, Philippe Clergeau, and Nathalie Machon. Remote sensing for assessing vegetated roofs with a new replicable method in Paris, France. Journal of Applied Remote Sensing, 15(1):014501, January 2021. Publisher: SPIE. URL: https://www.spiedigitallibrary.org/journals/journal-of-applied-remote-sensing/volume-15/issue-1/014501/Remote-sensing-for-assessing-vegetated-roofs-with-a-new-replicable/10.1117/1.JRS.15.014501.full (visited on 2023-06-15), doi:10.1117/1.JRS.15.014501

  4. Annika Pauligk. Green Roofs 2020. March 2023. https://www.berlin.de/umweltatlas/_assets/literatur/ab_gruendach_2020.pdf. URL: https://www.berlin.de/umweltatlas/en/land-use/green-roofs/2020/methodology/ (visited on 2023-12-28). 

  5. Abraham Noah Wu and Filip Biljecki. Roofpedia: Automatic mapping of green and solar roofs for an open roofscape registry and evaluation of urban sustainability. Landscape and Urban Planning, 214:104167, October 2021. URL: https://www.sciencedirect.com/science/article/pii/S0169204621001304 (visited on 2023-10-26), doi:10.1016/j.landurbplan.2021.104167

  6. Charles H. Simpson, Oscar Brousse, Nahid Mohajeri, Michael Davies, and Clare Heaviside. An Open-Source Automatic Survey of Green Roofs in London using Segmentation of Aerial Imagery. preprint, ESSD – Land/Land Cover and Land Use, August 2022. URL: https://essd.copernicus.org/preprints/essd-2022-259/ (visited on 2023-03-21), doi:10.5194/essd-2022-259

  7. Yanjun Wang, Shaochun Li, Fei Teng, Yunhao Lin, Mengjie Wang, and Hengfan Cai. Improved Mask R-CNN for Rural Building Roof Type Recognition from UAV High-Resolution Images: A Case Study in Hunan Province, China. Remote Sensing, 14(2):265, January 2022. Number: 2 Publisher: Multidisciplinary Digital Publishing Institute. URL: https://www.mdpi.com/2072-4292/14/2/265 (visited on 2024-01-16), doi:10.3390/rs14020265

  8. M. Buyukdemircioglu, R. Can, and S. Kocaman. DEEP LEARNING BASED ROOF TYPE CLASSIFICATION USING VERY HIGH RESOLUTION AERIAL IMAGERY. The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, XLIII-B3-2021:55–60, June 2021. URL: https://isprs-archives.copernicus.org/articles/XLIII-B3-2021/55/2021/ (visited on 2024-01-16), doi:10.5194/isprs-archives-XLIII-B3-2021-55-2021

  9. Małgorzata Krówczyńska, Edwin Raczko, Natalia Staniszewska, and Ewa Wilk. Asbestos—Cement Roofing Identification Using Remote Sensing and Convolutional Neural Networks (CNNs). Remote Sensing, 12(3):408, January 2020. Number: 3 Publisher: Multidisciplinary Digital Publishing Institute. URL: https://www.mdpi.com/2072-4292/12/3/408 (visited on 2024-01-16), doi:10.3390/rs12030408

  10. Jonguk Kim, Hyansu Bae, Hyunwoo Kang, and Suk Gyu Lee. CNN Algorithm for Roof Detection and Material Classification in Satellite Images. Electronics, 10(13):1592, January 2021. Number: 13 Publisher: Multidisciplinary Digital Publishing Institute. URL: https://www.mdpi.com/2079-9292/10/13/1592 (visited on 2024-01-16), doi:10.3390/electronics10131592

  11. Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12(85):2825–2830, 2011. URL: http://jmlr.org/papers/v12/pedregosa11a.html (visited on 2024-11-01).