Detection of occupied and free surfaces on rooftops¶

Clémence Herny (Exolabs) - Gwenaëlle Salamin (Exolabs) - Alessandro Cerioni (État de Genève) - Roxane Pott (swisstopo)

Proposed by the Canton of Geneva - PROJ-ROOFTOPS
Mars 2023 to January 2024 - Published in May 2024

This work by STDL is licensed under CC BY-SA 4.0

Abstract: Free roof surfaces offer great potential for the installation of new infrastructure such as solar panels and vegetated rooftops, which are essential for adapting cities to climate change. The arrangement of objects on rooftops can be complex and dynamic. Inventories of existing roof objects are often scarce, incomplete and difficult to update, making it difficult to assess their potential.
In this project, in collaboration with the Canton of Geneva, we have developed and tested three methods to automatically identify occupied and free surfaces on roofs: (1) classification of roof plane occupancy based on a random forest, (2) segmentation of objects in LiDAR point clouds based on clustering and (3) segmentation of objects in aerial imagery based on deep learning. The results are vector layers containing information about surface occupancy. True orthophotos and LiDAR data acquired over the canton of Geneva in 2019 were used. The methods were developed using a subset of 122 buildings selected to be representative of a diversity of objects and roofs, and on which the ground truth objects were manually vectorized.
The developed methods achieved satisfactory performance. About 85% of the roof planes were correctly classified. The segmentation method was able to detect most of the objects with f1 scores of 0.78 and 0.75 for the LiDAR-based segmentation and the image-based segmentation respectively. The global shape of the occupied surface was more difficult to reproduce with a median intersection over the union of 0.35 and 0.37 respectively.
The results of all three methods were considered satisfactory by the experts, with 70% to 95% of the results considered acceptable. Considering the quality of the results and the computational time, only the classification method was selected for an application at the cantonal level.

1. Introduction¶

To address the challenges of the climate crisis and the ecological transition, local authorities need to adapt their land use policies. One possible measure is to use the surface available on rooftops to install new infrastructure while minimizing the impact on land use. For instance, solar panels can be installed on rooftops to produce local energy with a minimal impact on the landscape¹. Rooftops can also accommodate vegetated areas, promoting biodiversity in cities and mitigating the heat island effect².
Accurate knowledge of available rooftop surface and an inventory of the existing infrastructure, such as solar panels and vegetated rooftops, are required to plan and prioritize future investments. Ignoring rooftop objects could firstly lead to overestimating the potential for new infrastructure, such as the solar potential¹, and secondly, extend the process of new installations. Unfortunately, information on this topic is often scarce and difficult to keep up to date, especially in big cities, limiting our understanding of the current situation. This can be explained by the number and diversity of roofs and roof objects. In addition, the rooftop landscape is dynamic and requires regular monitoring.

With increasing urbanization and the need for sustainable cities, there is a growing interest in knowing the potential of rooftops. The availability of high-resolution satellite and aerial imagery, as well as LiDAR data, along with the development of advanced numerical methods, has yielded to the multiplication of studies.
The crowdsourcing approach³ makes it possible to vectorize objects on a large scale, but requires a large workforce and can suffer from a lack of homogeneity. Computer vision-based solutions show promising results for segmenting objects of interest. A deterministic approach based on pixel analysis and a 3D building model, developed by Narjabadifam et al. (2022)⁴, was able to detect suitable areas for installing solar panels, taking into account large roof objects (e.g. ventilation). The watershed method is commonly used for image segmentation. It can detect small objects (e.g. roof windows) in high-resolution images but involves a complex workflow to achieve satisfactory results⁵. Deep learning (DL) methods are used to train detection models for objects of interest such as solar panels⁶^, ⁷^, ⁸, vegetated roofs⁵, superstructures on roofs⁹ or free roof surface¹⁰^, ¹¹ with variable performances depending on studies and studied objects. The main difficulty in training DL models is the availability of a qualitative dataset of labels⁹ as the production of such dataset is a time-consuming task.
LiDAR data is often used to assess the solar potential of rooftops by segmenting their main planes¹²^, ¹³. Continuous improvements in point density make it possible today to retrieve the detailed morphology of the roof, including superstructures (e.g. dormers) but also smaller objects, such as chimneys. Therefore, segmentation of objects protruding from flat roof planes provides valuable information about the surface available on rooftops.

In this context, the State of Geneva, through the Cantonal Office for Energy (OCEN) and the Cantonal Office for Agriculture and Nature (OCAN), contacted the STDL to explore possibilities of improving knowledge of rooftops. Both offices have developed methods for producing vector layers for solar panels and vegetated rooftops, respectively, but neither provided a satisfactory level of automation, accuracy, or completeness. Besides, information on other objects present on the rooftops, like air conditioners, pipes or windows, is incomplete. Therefore, both offices expressed the need to further automate the detection of free roof surfaces to assess the potential, define realistic objectives and strategies to achieve them, and prioritize investments.
The objective for the STDL was to produce a vector layer of the occupied and free surfaces on roofs in the canton of Geneva.
In this report, we first describe the data used, including high-resolution aerial imagery, 3D LiDAR point clouds and available vector layers of rooftops. We then present the methods and results of three approaches developed to evaluate free rooftop surface, namely, (1) LiDAR-based classification of roof occupancy, (2) LiDAR-based object segmentation, and (3) image-based object segmentation. Next, we discuss the possibility of combining the results of the different methods to improve the results. Finally, we provide conclusions on the ability of the developed methods to address the problem and on the most appropriate solution.

2. Input data¶

2.1 LiDAR point cloud¶

The LiDAR point cloud was acquired in March 2019 by the State of Geneva. It has a density of 25 pts/m², an altimetric accuracy of +/- 10 cm and a planimetric accuracy of 20 cm. It is distributed in georeferenced tiles of 500 m each.
The point cloud is classified into 11 classes, including a "building" class. This class includes the whole building without distinction for the facades, rooftop or roof superstructures. Within the framework of the classification of the roof plane occupancy, the presence of the class "building" was evaluated, as explained in Section 4.1.2. To avoid the influence of classification errors, points from all classes were considered in the LiDAR segmentation.

2.2 True orthophotos¶

The RGB aerial imagery was acquired in May 2019 by the State of Geneva with a ground sampling distance of 5 cm. A true orthophoto was derived based on a photomesh. It has a ground sampling distance of 6.8 cm. The product, available on request, is served as RGB GeoTIFF images with a size of 500 m.
True orthophotos are more complicated to obtain than orthophotos, and thus rarer. Their use was motivated by the fact that orthorectification aligns the roofs and bases of buildings. As a result, the objects detected on true orthophotos have the true position, allowing us to compare our results with those obtained with LiDAR data.

2.3 Delimitation of the roofs¶

Information on building roofs is provided by the roof vector layer produced by the State of Geneva. It includes the main roof planes and some superstructure elements, defined by their area between 1 m² and 9 m². Each roof has been assigned the following attributes:

Roof plane identifier;
Federal identifier of the building, called EGID;
Minimum and maximum altitude of the roof plane.

The vector layer is regularly updated to reflect of the destruction and construction of buildings. The version used for this project was downloaded in March 2023.

2.4 Ground truth¶

In the Canton of Geneva, several vector layers exist for roof objects (see the SITG catalog) but are incomplete for the purposes of our project. Consequently, it was decided to produce a precise ground truth (GT) dedicated to the project instead of using existing layers. It consists of a vector layer segmenting all the visible objects on the roofs, the objects partially covering the roofs, such as trees, as well as the delimitation of free surfaces. This work was performed manually on the 2019 true orthophotos. A single GT was produced for both the LiDAR and the true orthophoto datasets as they are aligned and synchronized in time. All vectorized objects were assigned to a class listed in Figure 1.

Bar chart of training and test distribution of object classes

Figure 1: Number of objects per ground truth class for the training and test datasets.

The GT is a list of 122 buildings chosen to be representative of the diversity (villas, industrial buildings, old town...). Of these, 105 were used to develop and optimize the workflows, i.e. as a training dataset, and 17 were used to check the stability of metrics, i.e. as a test dataset. The labeled objects in the GT, occupying surfaces on the selected roofs, represent about 50% of the total surface in both training and test datasets (Table 1).

Dataset	Number of buildings	Occupied area (m²)	Free area (m²)
Training subset	25	3,087	14,147
Training	105	57,303	60,526
Test	17	6,214	7,415

Table 1: Occupied and free surface areas for the different ground truth datasets. The training subset is specific to the image segmentation workflow (see Section 6.1.4).

Buildings were classified by occupation, hereinafter referred to as the building type, and roof typology, hereinafter referred to as the roof type, to evaluate the impact of these parameters on the results.
The following building types were selected:

7 administrative,
18 industrial buildings, including commercial buildings,
97 residential buildings.

The following roof types were selected:

72 flat roofs,
29 pitched roofs,
21 mixed (combination of the first two classes) roofs.

Note that all administrative and industrial roofs have a flat roof and all the pitched roofs are residential. The GT was used to optimize and assess the different workflows. No custom training was done for this project.

3. Evaluation of the results¶

3.1 Metrics¶

The performance of the developed methods was evaluated by computing the number of GT labels detected, namely, the precision P and the ability of the algorithm to be exhaustive with its detections, namely the recall R. The two were combined to obtain the f1 score. The respective formulas are presented below:

\(\mbox{P} = \frac{\mbox{TP}}{\mbox{TP} + \mbox{FP}}\)
\(\mbox{R} = \frac{\mbox{TP}}{\mbox{TP} + \mbox{FN}}\)
\(f_1 = 2\times \frac{\mbox{P}\;\times\;\mbox{R}}{\mbox{P}\; +\; \mbox{R}}\)

with:

TP: true positives, i.e. a detection overlapping a label above a given threshold
FN: false negative, i.e. a missed label
FP: false positive, i.e. an excess detection

The main challenge in calculating these metrics is related to the count of TP. Indeed, roof objects can have complex shapes, such as pipes, aeration outlets and solar panels, which can be vectorized in many ways, all of them possibly correct (Fig. 2). It can be difficult to reproduce the labels with detections, especially from one algorithm to another. Several detections may well cover one label, just as one detection may well cover several labels and be equally correct for all.

Figure 2: Illustration of different approaches to object vectorization and possible segmentation results. Solar panels can be vectorized as a group (left), as lines (middle) or as individual panels (right).

To account for this aspect, a connected-component method was adopted. Graphs of overlapping detections and labels were generated as illustrated in Figure 3. A detection was considered to overlap a label when more than 10% of the detection surface was covered. All the elements in a connected graph were tagged as TP. Detections within the group were merged and the assigned TP value is equal to the number of labels within the connected graph. The labels and detections that were not part of any connected graph were assigned FN and FP labels respectively.

Example of intersections between graph and labels

Figure 3: Labels (a) and detected obstacles (b) for the EGID 1005001, the corresponding graphs for the numbered elements on the balconies (c) and the resulting merged tagged detections (d).

In addition to object detection, the ability to reproduce the shape of the occupied surface was evaluated. The main objective of the project is to recover the delimitation of occupied and free surfaces. Because of the difficulty of pairing detections and labels and the fact that it is not necessary to delimit the object inside an occupied surface, we calculated the intersection over union (IoU) of the detections and the labels at the roof scale:

\[\begin{align} \ \mbox{IoU} = {A_{detections \cap labels} \over A_{detections \cup labels}} \ \end{align}\]

with:

\(A_{detections \cap labels}\): the area shared by detection and label;
\(A_{detections \cup labels}\): the union area between detections and labels.

The median IoU (mIoU) of all the roof provide the evaluation metric for the dataset considered.

The optimal value for the selected metrics, i.e. f1 score and mIoU, is 1.

3.2 Hyperparameter optimization¶

The algorithms used and developed in this project involve numerous hyperparameters. We adopted the Optuna framework to automate the search for the value of each hyperparameter giving the best results. The optimization was performed for the LiDAR segmentation and the image segmentation workflows. Although the values to be optimized are different, the strategy is similar.

We sought to maximize the f1 score and the mIoU. The search for the best hyperparameter value was performed using the Tree-structured Parzen Estimator¹⁴ (TPE) algorithm. At each iteration, the workflow was executed from segmentation to assessment. At the end of the process, the best hyperparameter combinations optimizing the metrics were provided.
In addition, the relative importance of precision compared to recall can be tuned by adding one of these metrics to the list of value to be optimized.

The hyperparameters obtained for the whole training dataset are referred to as "global". Specific optimization can be performed given the building type or the roof type to take into account of specific features. In this case, the obtained hyperparameters are referred to as "specialized".

3.3 Evaluating the relevance of the detections¶

In addition to the selected metrics, the results were analyzed in terms of object characteristics relevant to the project objective, i.e. providing indications of potential surface available for the installation of new facilities such as solar panels and vegetated rooftops.
The experts expect to get an estimate of the free surface available to estimate the potential. Therefore, the occupied and free areas obtained with the different methods were computed and compared to the GT to evaluate the accuracy.
In addition, the continuity of the roof surface is an important parameter to consider when installing facilities. It depends on the size of the objects and their position on the roof. A large object or an object located in the middle of a roof can constitute an obstacle. To evaluate the models' ability to detect such objects, the surface area of the object and the position of its centroid relative to the roof edge were computed, and the metrics were analyzed accordingly.

4. Classification of roof plane occupancy¶

A first method was developed to identify potentially occupied and free surfaces on rooftops. It consists of using statistics derived from LiDAR data as an indicator of occupancy. We assumed that some LiDAR properties can capture the presence of objects on the target roofs. For instance, changes in intensity could be caused by the LiDAR hitting different objects. In addition, a surface covered with objects is likely to be rougher than a flat, free surface. Zonal statistics on these two parameters, intensity and roughness, were used in addition to the LiDAR classification and roof plane area to classify roof planes into three classes:

“occupied”: no usable surface for additional facilities;
“potentially free”: possible free surface;
“undefined”: surface under which the LiDAR point cloud is not classified as “building”.

4.1 Method¶

4.1.1 Initial processing¶

First, the intensity values of the LiDAR points classified as building were interpolated with inverse distance weighting and converted to raster. Second, a DEM was computed from the LiDAR point cloud and roughness was derived and saved as raster. The Python library WhiteboxTools was used for this processing. The roughness was calculated at a scale of 1 m, which was the smallest possible scale. The produced rasters of intensity and roughness have a resolution of 0.3 m/px.

Zonal statistics of intensity and roughness were computed for each roof plane and used to classify them with manual thresholds and a random forest (RF), as described in the next two sections. If a roof plane extended over several tiles, then the result was kept for the tile with the largest overlap.

The initial processing was performed for all the roofs of the 45 LiDAR tiles containing GT and eight test tiles selected in the city center, representing a total of 95,699 roof planes. It took around 30 minutes to create the rasters and get the zonal statistics for the LiDAR tiles, while the classification took less than a minute with 32 GB of RAM and a i7-1260P CPU.

4.1.2 Classification with manual thresholds¶

Roof planes smaller than 2 m² were classified as "occupied", because they are too small for solar or vegetated installations. In addition, roof planes for which the LiDAR point cloud was classified as "building" for less than 25% of the area were classified as "undefined". To classify the remaining roof planes, thresholds were set on the statistical values presented in Table 2. They were selected to reflect the variations in intensity and roughness induced by the presence of objects on the roof, as well as the presence of non-building classes in the LiDAR point cloud.

Variable	Threshold
Margin of error of intensity	400
Standard deviation of intensity	5500
Median roughness (m)	7.5
Overlap with interpolated pixels not classified as building (%)	25

Table 2: Variables considered to classify the roof planes and the thresholds at which they are classified as occupied.

A roof was classified as occupied if it exceeds the threshold for at least one statistical value. The thresholds were set through trials and errors until we came to a satisfying result.
The resulting classification was reviewed by the experts for 650 roof planes. A satisfaction rate was calculated (Section 4.2.2). It corresponds to the recall, i.e. the proportion of correctly classified roof planes relative to the total roofs of the class in the reviewed dataset. Further tests were performed to improve it by adjusting the thresholds, but no better combination could be found.

4.1.3 Classification with random forest¶

To avoid classification based on arbitrary thresholds, RF was used with zonal statistics (Tables A1 and A2 in Appendix A). The manual threshold classification, reviewed by the experts, was used as GT (Table 3) to train two RFs, one for each office. The roof planes of the class "undefined" and the ones smaller than 2 m² were ignored.

Office	Potentially free	Occupied
OCAN	258	324
OCEN	301	297

Table 3: Correct classification of the roof reviewed by the experts and used as the ground truth for the random forest.

The GT was split with 80% of the roof planes for training and 20% for testing. Satisfaction rates were calculated on the test dataset to evaluate the performance.

Only roofs that could be used for potential solar and vegetated installations were classified by the RF. We excluded the roof planes smaller than 2 m² that are automatically classified as "occupied", and the roof planes classified as "undefined".

4.2 Results¶

4.2.1 Classification¶

Examples of roof plane classification obtained with the manual thresholds and the RF models are shown in Figure 4. The results for the OCAN are closer to the results obtained with the manual thresholds than the ones for the OCEN. In addition, let us note that the RF for the OCAN is classifying more roof planes as "occupied" than the RF for the OCEN.

Figure 4: Results of the manual thresholds, the random forest for the classification of occupancy for the OCAN and the OCEN.

The visualization of the results shows that not only roofs with obstacles are classified as "occupied", but also some small or narrow empty roof planes, because they display high median roughness and/or high minimum roughness.
The roof planes classified as "undefined" can often be considered as occupied due to the presence of vegetation or walkways. The corresponding areas in the LiDAR point cloud is mostly classified as ground and vegetation.

4.2.2 Expert assessment¶

OCEN and OCAN experts are generally satisfied with the classification based on the manual thresholds (Table 4), with global satisfaction rates ranging from 83% to 89%.

Office	Global	Occupied	Potentially free	Undefined
OCAN	89%	86%	93%	66%
OCEN	83%	84%	81%	-

Table 4: Satisfaction rates of the OCAN and OCEN experts with the classification of 650 roof planes using manual thresholds. Global satisfaction rates were computed only for planes classified as "occupied" and "potentially free". The review of planes classified as "undefined" is not available for OCEN.

Satisfaction rates for the "occupied" roof planes are similar for both offices, while that for the "potentially free" roof planes is 12 points higher for OCAN, reaching an excellent score of 93%. For the OCEN expert, small roof planes are more easily considered as occupied than for the OCAN expert. The OCAN expert approved the "undefined" class in 66% of the cases, while this class was not reviewed by the OCEN expert.

Manual threshold classification

	Global	Occupied	Potentially free
OCAN	79%	70%	91%
OCEN	77%	72%	82%

RF classification

	Global	Occupied	Potentially free
OCAN	86%	78%	96%
OCEN	83%	74%	91%

Table 5: Satisfaction rates of the OCAN and OCEN experts on the test dataset for the two classification methods.

The satisfaction rates obtained with the manual thresholds and the RF on the test dataset are presented in Table 5. The classification with RF outperforms the manual thresholds, with satisfaction rates increasing by 7 and 6 points for OCAN and OCEN respectively. Note that the satisfaction rates are improved by between 2 and 9 points, for the "occupied" and "potentially free" classes.

4.2.3 Variable importance¶

The influence of the variables considered in the two RF models can be identified by their relative importance (Tables A1 and A2) provided by the algorithm.
The models are consistent, with four common variables, namely the margin of error (MOE) of intensity, the median roughness, the mean roughness and the minimum roughness, in the top 5 most influential variables with an importance higher than 7%. Note that the ranking differs. In particular, the median roughness is showing the greatest divergence, with a difference of 11 points between the two models. It plays the most important role in the OCAN's RF (19.3%) while its role is limited in the OCEN's model (8.3%). Difference in importance for the other variables does not exceed 3.2 points. The roof plane area plays a non-negligible role in the OCEN's RF (13.6%), while this is less the case in the OCAN's RF (5.8%). The standard deviation of intensity has a greater influence in the OCAN's RF (7.8%) than in the OCEN's RF (4.6%). The percentage of overlap with non-building pixels with less than 2%, is the least important parameter for both RF.

4.3 Discussion¶

4.3.1 Manual thresholds vs RF¶

Although both the manual threshold and the RF methods give satisfactory results (Tables 3 and 4), classification with RF is better. This result was expected, as RF is a machine learning algorithm based on 14 variables, whereas the threshold method involves manual adjustments of only 4 variables.
The choice of a small number of variables for the manual thresholds was made for simplicity sake. Our choices of selecting the MOE of intensity and the median roughness were pertinent as these variables are among the most influential (Tables A1 and A2). The standard deviation of intensity plays a stronger role for the OCEN's model but its significance remains limited (7.8%). Selecting the percentage of overlap with non-building data appears to not be relevant as this variables comes last in the list of relative importance (< 2%) for both RF models.
On the other hand, we missed important variables in the manual thresholds such as the minimum roughness and the mean roughness of roof planes playing a significant role (> 10%) in the RF models. The mean roughness was absent of the manual thresholds as it was considered redundant with the median roughness.

Both methods have their advantages. The manual threshold method is easy to set up and does not require GT, while the RF method is automated, i.e. it does not require an operator to perform manual testing, which can be tedious.

4.3.2 Classification of small roof planes¶

When using the manual thresholds, small or narrow roof planes are often classified as "occupied", because of their median roughness above the threshold. As the roughness was calculated at a scale of 1 m (Section 4.1.1), objects located up to 1 m away from the pixel will affect its value. As a result, the roughness of small or narrow roof planes is more influenced by their surroundings than the larger ones. This is the case, for example, with empty roof planes receding or protruding from other planes.
This interpretation is supported by the fact that the minimum roughness of roof planes is a critical parameter in the RF (Tables A1 and A2). For the considered roughness scale of 1 m, the minimum value strongly depends on the dimensions of the roof plane. A large roof plane can have a low minimum value, because the obstacles on it and its surroundings do not affect the roughness values over the whole plane as can be the case for a small one.

Small unobstructed roof planes could be used for the installation of solar panels or vegetation. However, the more receding or protruding they are, the more difficult it is to install facilities on them. In addition, because of the limited benefits they would represent in comparison to the effort necessary to develop them, they are not a priority in the planning strategy of the Canton of Geneva. Therefore, the fact that the algorithm often classifies small roof planes as occupied suited the experts.

4.3.3 Differences between random forests¶

The differences in the results obtained for OCAN and OCEN can be explained by their different requirements (Tables 3, A1 and A2).

From the OCAN's point of view, which aims to develop vegetated rooftops, some surfaces already covered with low vegetation can be considered as "occupied". Conversely, the presence of some obstacles on the roof plane may not prevent the installation of vegetated rooftops and can be considered as "potentially free". The tolerance of the presence of sparse objects on roof planes could be captured by the median roughness driving the OCAN's RF classification.
From the OCEN's point of view, which aims to install solar panels, large continuous area are required for a roof plane to be considered as "potentially free". This is consistent with the fact that the surface area of roof planes and the minimum roughness are critical parameters in OCEN's RF.

4.3.4 Relevance of the methods¶

The primary goal of classifying roof planes is to provide a product that assists experts in identifying available surfaces for the installation of future equipment. The surfaces classified as "potentially free" need to be examined to assess their actual potential. The surfaces classified as "occupied" are assumed to be unusable and should not be taken into account when estimating potential. It is therefore important to obtain robust results for this class. The experts did not specify a minimum satisfaction rate, but were satisfied with the provided results.
Thus, it is planned to apply the developed method on a larger scale for use by the experts.

It should be recognized that the classification only evaluates the occupancy of a roof plane. Other factors such as roof slope or roof material were ignored. In addition, although the LiDAR intensity was normalized, its value can vary from one acquisition campaign to another, potentially affecting the results of the classification.

5. LiDAR segmentation¶

The goal of this second method based on LiDAR point cloud is to detect objects on rooftops. It is assumed that each roof plane can be approximated by a flat plane and that obstacles protrude from it. The processing resulted in the production of a vector layer of occupied and free surfaces per building.

5.1 Method¶

The roof plane vectors were merged by EGID to obtain the roof delimitation for each building. Next, the point cloud was clipped according to the roof shape using WhiteboxTools. If the building extended over several LiDAR tiles, the clipped point clouds were merged. Finally, the point clouds were filtered with the minimum altitude of the roof to retain only the roof points.

Roof segmentation was performed per building using Open3D.
Each plane in the 3D point cloud was segmented using the RANSAC algorithm. The DBSCAN algorithm was applied to the points of the potential plane to mitigate noise. The cluster with the largest number of points was retained and considered as a roof plane. This process was repeated with the rest of the point cloud for the expected number of roof planes, given by the roof vector layer, as long as enough points remained. Finally, the remaining points were clustered using DBSCAN and considered as obstacles.
Despite our endeavors to fix the seed and make the process deterministic, slight variations remained in the output of the RANSAC algorithm. However, the observed impact was only a few hundredths on the final metrics.

The planes and obstacles were transformed from point clusters to concave polygons using the alpha shape algorithm. A minimum, respectively maximum, threshold was set on the projected area of the planes, respectively obstacles. If a polygon had a value that did not meet the threshold of its category, its category was changed.

The results were evaluated using the metrics described in Section 3.1. Note that the GT was adapted for the optimization of this method. Indeed, LiDAR segmentation is unable to detect low objects such as lawns, extensive vegetation and empty terraces and balconies. However, these objects can occupy entire flat roofs, creating a bias in the optimization of the process that would tend to segment entire roof planes as objects. Therefore, the aforementioned objects were excluded from the ground truth when running the optimization.
As explained in Section 3.2, the hyperparameters of the RANSAC and the DBSCAN algorithms, as well as the thresholds on area, were optimized for the training dataset and for subsets based on the building type and the roof type. Combinations were tested between results obtained with different sets of hyperparameters, depending on the types.

The resulting detection shapes were unsatisfactory. They were shaky and sometimes had a lot of overlap due to the 3D component of the LiDAR data as visible in Figure 5 (left). To improve the rendering, the polygons were smoothed by buffering and cropping and by applying the Visvalingam-Wyatt algorithm. Though the polygons aspect have been improved by the simplifications, they still present a shaky aspect (Fig. 5, right). Then, the overlapping detection polygons were merged for each EGID and compared to the roof extend to create a partition of the occupied and free surfaces on roofs. This post-processing was performed after the optimization.

Original and simplified results of LiDAR segmentation

Figure 5: Original (left) and simplified (right) polygons obtained with LiDAR segmentation.

Finally, the results were submitted to the OCAN and OCEN's experts for assessment.

5.2 Results¶

Results of the LiDAR segmentation representing the free and occupied space.

Figure 6: Example of LiDAR segmentation results.

Figure 6 shows the results for seven buildings of the GT.

5.2.1 Effect of the optimized hyperparameters¶

A f1 score of 0.70 and a mIoU of 0.34 were obtained on the adapted GT with the global hyperparameters. The specialized hyperparameters have different influence according to the building and roof type considered (Table 6). Administrative buildings and pitched roofs show lower f1 scores than those of other categories with values around 0.70. The mIoU is lower than 0.5 for all subsets.

Global hyperparameters

Metric	Administrative	Industrial	Residential		Flat	Mixed	Pitched
f1 score	0.41	0.70	0.72		0.73	0.71	0.59
mIoU	0.14	0.50	0.30		0.42	0.45	0.13

Specialized hyperparameters

Metric	Administrative	Industrial	Residential		Flat	Mixed	Pitched
f1 score	0.63	0.72	0.72		0.67	0.68	0.49
mIoU	0.11	0.49	0.38		0.44	0.23	0.31

Table 6: Metrics obtained with the global and specialized hyperparameters on the subsets for each type of building and roof with the adapted GT.

The f1 score obtained for administrative buildings using specialized hyperparameters is improved by about 50%, while the mIoU is reduced by about 20%. The impact of specialized hyperparameters on the segmentation of industrial and residential buildings is rather limited, with a variation of less than 2.5%, except for the mIoU of the residential buildings, which increases by about 25%.
The flat roofs do not benefit from specific optimization, with variations in the f1 score and mIoU of less than 8%. On the contrary, the pitched roofs are affected by the use of specialized hyperparameters, with a decrease in the f1 score of about 27% and an increase in mIoU of 150%. In this specific case, the general hyperparameters favor the segmentation of the entire roof as an obstacle, while the specialized hyperparameters improve the distinction between roof planes and obstacles (Fig. 7). The metrics of mixed roofs are a combination of the two previous types, with a f1 score that is little affected by the use of specialized hyperparameters (< 5%) and a mIoU 50% lower.

Global vs specialized parameters on pitched roofs

Figure 7: Comparison of results obtained with global (left) and specialized (right) hyperparameters on buildings with pitched roofs.

To take advantage of the best segmentation results, combinations were produced (Table 7).

Metric	Global	Combined with administrative buildings	Combined with pitched roofs
f1 score	0.72	0.73	0.70
mIoU	0.37	0.37	0.41

Table 7: Metrics obtained for the global results and for their combination with specialized results on the training dataset with the adapted GT.

The influence of combining the global results with the specialized hyperparameter ones is limited. The metrics vary by less than 4%, except for the mIoU, which improves by 11% when combined with the optimized results for the pitched roofs. The improvement in object segmentation is sufficient to choose to use specialized hyperparameters for the pitched roofs. In addition, this is necessary to ensure results sufficiently discriminating, as visible on Figure 7.

After applying the post-processing procedure to the combined results, the final metrics are a f1 score of 0.77 and a mIoU of 0.42. The metrics are improved by the better coverage of the detected objects thanks to polygons smoothing and merging of the detections as visible on Figure 5.

5.2.2 Global results¶

Ground truth	Precision	Recall	f1 score	mIoU	Relative error (%)
adapted GT, training set	0.77	0.77	0.77	0.42	11
whole GT, training set	0.78	0.77	0.78	0.35	38
whole GT, test set	0.75	0.80	0.77	0.38	26

Table 8: Metrics and relative error on the occupied area for the training dataset when using the GT adapted for the LiDAR optimization or the whole GT, as well as for the test set on the whole GT.

The f1 score remains stable when using the whole GT (Table 8), meaning that the extensive vegetation, lawn and terraces that were removed are detected. However, they are not correctly delineated, resulting in a drop of about 17% in the mIoU and an increase in the relative error of the occupied area by a factor of 3.5.
The values for the precision and recall are always close. The results obtained with the test dataset are consistent with those obtained with the training dataset (Table 8). The observations made in the following sections about the characteristics of the detections in the training dataset, are also valid for the test dataset.

5.2.3 Detection characteristics¶

Figure 8: Number of TP, FP and FN as a function of object area.

Figure 8 shows that labeled objects with an area greater than 1 m² are well detected, with f1 scores between 0.82 and 0.92. On the other hand, detection of objects with an area lower than 1 m² is less trustworthy with a majority of FP detections and almost half of the labels tagged as FN.

Count of the tags per range of the distance to the roof edge.

Figure 9: Number of TP, FP and FN as a function of the distance of the centroid of the object from the roof edge.

Figure 9 shows that labeled objects whose centroid is more than 1 m from the roof edge are well detected with a f1 score between 0.80 and 0.85. On the other hand, among the detections whose centroid is less than 1 m from the roof edge, 65% are FP, making them less trustworthy.

Visualizing the detections, we note that FP covering no obstacle, although they do exist, are rare. Most of the FP form a group of small detections delimiting a roof edge, sometimes detecting barriers that have not been vectorized as obstacles in the GT. Therefore, the FP having an area smaller than 1 m² must often be the same as those with a centroid closer than 1 m from the roof edge.

Object class	Recall
Antenna	0.24
Pipe	0.59
Lawn	0.70
Other obstacle	0.70
Extensive vegetation	0.72
Window	0.76
Chimney	0.79
Aero	0.83
Solar thermal	0.83
Intensive vegetation	0.88
Solar unknown	0.89
Balcony / terrace	0.90
Solar photovoltaic	0.92

Table 9: Recall for each object class of the ground truth.

The developed method shows good performance in detecting most of the object classes (Table 9). Aeration outlets, balconies and terraces, intensive vegetation, and solar facilities all have recalls greater than 0.80. Antennas are the most difficult class to detect with a recall of 0.24. Next come pipes lawn, other obstacles and extensive vegetation with recall values between 0.59 and 0.72. Windows and chimneys, which are low and thin objects respectively, are detected satisfactorily with a recall of 0.76 and 0.78 respectively.

Although the detection of objects is globally satisfactory, the reproduction of their shape was not assessed. For example, only the upper parts of solar panels are generally detected as shown in Figure 10.

Example of the varying delimitation of the solar panels in the segmentation.

Figure 10: Example of results for the segmentation of solar panels using the segmentation of LiDAR data.

The same goes for lawn and extensive vegetation, which are always partially detected (Fig 11).

Roof with a extensive lawn detected as TP, but badly delineated.

Figure 11: Roof with a terrace and an area of extensive vegetation (left). Both are detected as TP, but are not covered by the area detected as occupied (right).

5.2.4 Estimated area¶

	Administrative	Industrial	Residential	Flat	Mixed	Pitched
Area labeled as occupied	4,986	32,720	19,399	54,875	1,386	844
Area detected as occupied	1,195	20,953	12,986	30,980	3,052	1,102
Total area	6,692	78,011	33,278	108,415	5,018	4,584

Table 10: Occupied area for the labels and the detections, as well as the total roof area in m².

In total, 35,134 m² of roofs were detected as occupied while 57,105 m² were labeled as such (Table 10). This represents an error of 38%.
Administrative buildings have the largest error in estimating the occupied area with an error of 76% compared to less than 37% for other building types.
The occupied area is underestimated for flat roofs, while it is overestimated for pitched and mixed roofs. The mixed roofs have an error of 120%, i.e. the estimated occupied area is about twice larger than the actual value. Flat and pitched roofs each have an error of 44% and 30% respectively.

5.2.5 Expert assessment¶

The experts were at least partially satisfied by more than 69% of the segmented roofs (Table 11).

Evaluation	OCAN	OCEN
Not satisfied	22%	31%
Partially satisfied	54%	33%
Satisfied	24%	36%

Table 11: Expert's satisfaction with the results produced using segmentation of LiDAR data. OCAN's expert assessed 122 buildings, while OCEN's expert assessed 39 buildings.

The most satisfactory types were the administrative buildings and the flat roofs, while the most unsatisfactory types were the industrial buildings and the mixed roofs. This is in contradiction with the metrics for which the administrative buildings have the lowest mIoU.

5.3 Discussion¶

5.3.1 Global capability of the method¶

The method proved its ability to detect objects with a f1 score of 0.78 on the whole GT. The primary goal, which was to detect in priority large and roof-centered objects, was satisfactorily achieved. Indeed, objects larger than 1 m² have a f1 score higher than 0.81 and objects with their centroid at more than 1 m from the roof edge have a f1 score higher than 0.79.

However, the mIoU remains lower than 0.50, indicating that the shapes of the detections are poorly reproduced. It should be noted that this metric seems to be sensitive to small variations in shape, making it a very strict metric. In addition, it should be remembered that the mIoU evaluates the delimitation of the occupied surface at the roof scale, including TP, FP and FN detections.

In addition to the low mIoU, the global estimation of the occupied area is medium with an error of 38%. However, this value is reduced to 11% when the lawn, the empty balconies and most of the extensive vegetation are removed from the ground truth. This highlights that the method is generally good, except for the segmentation of low objects. Indeed, although the aforementioned classes have a recall of 0.70 or higher, we note that their total area is largely underestimated, as visible on Figures 10 and 11. Once those classes removed from the GT, most of the false detections and missed objects have a small area (Fig. 8).

5.3.2 Problematic detections and labels¶

5.3.2.1 False positive detections¶

The majority of FP detections cover small areas (Fig. 8) and are located near the roof edges (Fig. 9). In many cases, FP detections near to the edge actually detect barriers that are not labeled in the GT. It may be considered that the protruding roof edges should be included in the annotation to improve the precision and better reflect the actual performance of the method.

5.3.2.2 By object type¶

Antennas are often missed (Table 9). We think that LiDAR points due to the presence of an antenna were considered as noise during point clustering, because the object is represented by only a few points due to its morphology and the LiDAR density. Improving antenna detection in the LiDAR point cloud may require specific developments or the use of a denser point cloud. Other thin objects, such as small chimneys, are also missed.

As shown in Section 5.2.3, low objects such as windows, extensive vegetation, lawns, and pipes are more difficult to detect, as they do not protrude above roof planes. The recall, between 0.58 and 0.76, is acceptable, but the shape of objects is always only partially detected.

Finally, the method also has trouble detecting objects in the "other obstacle" class (Table 9). However, the lack of a precise definition for this category makes it difficult to label and it encompasses several types of objects. In particular, it would be necessary to define with measurable values when a roof part is labeled as a free surface or as "other obstacle".

5.3.2.3 By building type and roof type¶

We notice that administrative buildings tend to have low or small objects, such as windows, extensive vegetation, or small chimneys. In addition, there are many FP detections because of the roof edges. The use of specialized hyperparameters does not improve the metrics on the adapted GT used for the optimization (Tables 6 and 7), supporting the difficulty to detect these objects. At the cantonal level, the error on the estimated surface should have a limited impact, since the administrative buildings represent only a small fraction of all the buildings. However, as they belong to the state, they could be prioritized for the installation of facilities on their roof.

The pitched roofs are often segmented into a single obstacle when using the global hyperparameters (Fig. 7). This could be explained by the fact they have a different typology than other roofs. In addition, the training dataset is dominated by flat roofs, with 72 buildings against 29 buildings with pitched roofs. The hyperparameters resulting from the optimization are therefore better suited to the typology of flat roofs, motivating our choice to use specialized hyperparameters for pitched roofs.
Pitched roofs can be automatically identified in the Canton of Geneva using the slope available in the roofs and buildings vector layers. If this information is unavailable, for instance in another city or canton, areas of interest can be defined, as pitched roofs are generally located in residential areas and old towns.

Most of the 21 roofs assigned to the "mixed" type have the entirety of their flat or pitched planes segmented as obstacles. Some of them would have benefited from being segmented with the parameters for pitched roofs. The definition of a pitched roof have to be studied further to define more precisely when to use the specialized hyperparameters.

5.3.3 Limitation and further developments¶

The process relies on a roof vector layer. Methods exist to produce this information automatically¹²^, ¹⁵. Their application should be tested in order to extend the project to areas where a roof vector layer does not yet exist. The one for the Canton of Geneva is produced manually to guarantee its quality.

Variations in detection quality were observed from one building to another. In addition, the detection shapes are not intuitive, making them difficult to interpret and less pleasing to the eye. Therefore, despite the method's respectable results, the experts were not interested in taking the algorithm to the production stage.

The visual aspect of the results could be improved by modifying the vectorization function to smooth the polygons directly during their production.
Alternatively, more advanced processing could try to take advantage of the fact that obstacles have simple geometries, like a cylinder for straight pipes or a parallelepiped for aeration. by trying to match these shapes to the clustered point cloud, more precise and visually pleasing detections could be produced.

6. Image segmentation¶

The third method consists of segmenting all the potential objects present in a given image. The processing resulted in the production of a vector layer of occupied surfaces per building.

6.1 Method¶

Figure 12: Illustration of the different steps in the image segmentation workflow for EGID 1005027. Black polygons correspond to the roof delimitation. (a) Bounding box (blue polygon) used to clip the true orthophotos for a given roof with a 1 m positive buffer. (b) Segmentation masks (colored pixels) obtained by processing the tile with SAM. (c) Vector masks (red polygons) of the detected objects after post-processing. (d) Detection tags assigned to the vectors.

6.1.1 Image preparation¶

Similar to the LiDAR segmentation workflow, we adopted a per-roof processing strategy to process the true orthophotos. For each selected building, the roof delimitation was used to derive a bounding box from which the true orthophoto was clipped (Fig. 12(a)). In case the roof was spread over several true orthophotos, the images were first merged. One tile was obtained for each roof considered. The tiles have the same pixel resolution as the true orthophotos, but different sizes according to the roof size.

6.1.2 Object segmentation and vectorization¶

First, potential objects visible in images were segmented using Segment Anything Model¹⁶ (SAM) implemented with PyTorch. It aims to be an open-source foundation model for object segmentation in images with strong zero-shot generalization capabilities. Instance segmentation is performed using a vision transformer (ViT-H) model and a mask is produced for each detected object (Fig. 12(b)). For the project, the default pre-trained model (checkpoints: sam_vit_h_4b8939) was used without any fine-tuning specific to roof objects. Although the object classes are available in the GT dataset, no classification was performed.
Second, SAM does not handle georeferenced datasets. To simplify the process of leveraging SAM for geospatial data analysis, we used the Python library segment-geospatial¹⁷ (samgeo). Georeferenced tiles are used as input to the algorithm. The coordinate reference system of the image is assigned to the SAM masks and their corresponding polygon vectors (Fig. 12(c)).

Some large buildings, up to 300 m in length, may be encountered. In this case, the number of pixels in the tile can saturate the RAM during image segmentation. To handle this issue, large tiles are split into smaller sub-tiles of 512 px size. Boundary effects are the downside of this method. Sub-tiles are processed individually by SAM. The output masks are then merged to recover the original tile extent, but the joints between the sub-tile masks may not match, causing artifacts in the vector layer (Fig. 13).

Figure 13: The squared orange polygon is an artifact due to the tiling performed to process large tiles (EGID 1011376). The tagged detections are superimposed on (left) the segmented masks (white: detection, black: background) and (right) the true orthophotos. Grey and black polygons correspond to the building delineation.

6.1.3 Result filtering¶

To improve the quality of the results, post-processing tasks were performed. Polygons were discarded based on geometric considerations:

polygons equal to or smaller than 0.2 m² were considered noise,
polygons whose area (without holes) was close (90%) to the roof area were likely to correspond to the segmentation of the whole roof by SAM and were therefore not relevant to keep,
polygons with more than half of their area not intersecting the roof, as the detected object is unlikely to be located on the roof.

The vector layer of detected objects was clipped with the roof delimitation polygon to ensure that objects did not overlap several roofs (Fig. 12(c)).

Each building was processed independently. The vector layers were finally merged into a single layer.

6.1.4 Assessment and hyperparameter optimization¶

The detections were compared to the GT labels (Fig. 12(d)) and metrics were calculated (Section 3.1) to evaluate the performance of the algorithm and the choice of post-processing parameters.

SAM displays numerous hyperparameters for which values were assigned after running the optimization workflow presented in Section 3.2. Depending on some hyperparameter values and the image size, processing a single image can take several minutes. Since the optimization requires tens of iterations, it was unreasonable to run the process on the entire training dataset. Therefore, we chose to sub-sample the training dataset down to 25 roofs (Table 1), selected to be representative of the entire dataset. Running the optimization process on this subset for 50 iterations took between 1 and 2 days using a 16 GiB GPU machine.
Based on several replications of the optimization process, including one performed on 100 trials, four of the most influential hyperparameters¹⁶ were identified: (1) the threshold on the stability score of the predicted mask, (2) the stability score offset, (3) the box IoU cutoff used by non-maximal suppression to filter duplicated masks and (4) the prediction threshold.
The other SAM hyperparameters have a limited impact on the value ranges explored. However, we noticed that the number of points sampled per side strongly influence the processing duration (Table 12). 64 points per side is a good trade-off between performance and computation time, which guided our final choice to set this value.

Points per side	f1 score	mIoU	Duration (min)
128	0.75	0.40	43
96	0.75	0.44	25
64	0.74	0.41	12
32	0.66	0.40	4

Table 12: Influence of the number of points sampled per image side on the f1 score, mIoU and duration of segmentation using SAM on a 16 GiB GPU machine. Results obtained on the training sub-sampled dataset with the optimized hyperparameter values set in the configuration file and only varying the points per side.

The selected hyperparameter values can be found in the configuration file of the image segmentation workflow.

6.2 Results¶

6.2.1 Global¶

Example of imgseg results of free and occupied surface

Figure 14: Example of a result obtained with the image segmentation workflow. Free surfaces were obtained by subtracting detected objects from the roof boundary (black polygons).

The image segmentation method produced vectors of detected objects for each roof considered (Fig. 14). The metrics obtained for the different datasets are presented in Table 13. They were obtained for a set of hyperparameters that balanced the precision and the recall. Alternative result, obtained with a set of hyperparameters promoting the recall over the precision was also produced and evaluated but was not preferred by the experts, in particular due to the presence of large FP detections segmenting whole roofs.
Overall, similar metric values are obtained for the different datasets, demonstrating the consistency of the method.

Dataset	Precision	Recall	f1 score	mIoU	Relative error (%)
Training subset	0.73	0.78	0.75	0.41	7
Training	0.75	0.82	0.78	0.37	42
Test	0.75	0.71	0.73	0.37	23

Table 13: Metrics and relative errors on the occupied area for the training and the test datasets.

Satisfactory f1 scores, between 0.73 and 0.78, are achieved. We note a slight imbalance, from 4 to 7 points, between the precision and the recall according to the different datasets.
The value of the mIoU are modest, ranging between 0.37 and 0.41, with a standard deviation of about 0.20 (later noted as +/- 0.20). High mIoU (>= 0.70) are associated with high f1 score (0.87 +/- 0.11 in average) but the opposite is not true (Fig. 15). When an object is detected, the method usually shows good ability to segment it accurately. However, small discrepencies with GT shapes can lower down the IoU value significantly.

Example of roof with good and average IoU

Figure 15: Examples of detections with high f1 scores but variables IoU. (left) Roof segmentation (EGID 295060134) with both high f1 score and IoU and (right) roof segmentation (EGID 1023590) with a high f1 score and an averaged IoU.

Finally, the surface area occupied by the detected objects in the training and test datasets, which represents about 30% of the total area, is significantly underestimated compared with the 50% of the GT (Table 1). This results in relative errors between 23% and 42%, for the test and training datasets respectively. Note the significant variation in relative errors depending on the dataset considered, in particular, the variability between the training subset and the training datasets. This highlights that errors concerning large objects can drastically increase the relative error on area estimations.

6.2.2 Roof characteristics¶

Metric	Administrative	Industrial	Residential		Flat	Mixed	Pitched
f1 score	0.81	0.80	0.77		0.79	0.75	0.74
mIoU	0.23	0.33	0.38		0.30	0.39	0.51

Table 14: Metrics calculated by building and roof type for the training dataset.

Table 14 shows that the model detects objects similarly for the different building and roof types with a f1 score within 5 points for each types.
The mIoU depends on the roof characteristics. Industrial and residential buildings have a mIoU of about 0.35, while the value for the administrative buildings is about 35% lower. The mIoU increases from flat, to mixed, to pitched roof over a range of about 20 points.

Figure 16: Comparison of the occupied and free surface areas of the GT labels and the detections according to (top) the building types and (bottom) the roof types for the training dataset.

The detected occupied areas are underestimated regardless of the building type (Fig. 16, top). Industrial and residential buildings have a relative error of about 40%, while the administrative buildings have a higher error of 67%.
The occupied areas of pitched and mixed roofs are accurately estimated with a relative error of about 10%, while the performance for the flat roof is worse, with an error of 43% (Fig. 16, bottom). Remember that all administrative and industrial roofs have a flat roof.

Considering the similar f1 scores obtained for all the roof properties and the significant amount of time required to run the optimization workflow, no specific optimization was carried out to date.

6.2.3 Object characteristics¶

6.2.3.1 Class¶

The image segmentation method detects objects of different classes (Fig. 17) with an average recall of 0.80 +/- 0.11, with the exception of pipes, which performs significantly worse with a recall of 0.27.

Figure 17: Recall for each object class. The results are obtained for the training dataset.

Lawns, PV panels and windows are particularly well detected with recall values above 0.93.

6.2.3.2 Surface area¶

Figure 18 shows that objects with a surface area between 0.5 m² and 100 m² are detected with equal performances by the algorithm with a recall of 0.84 +/- 0.02. Smaller and larger objects are more difficult to detect, with 65% and 76% of GT objects detected, respectively.

Number of detections and labels in each assessment tag according to object surface

Figure 18: Number of TP and FN labels, as well as FP detections, depending on the object area (m²). The results are obtained on the training dataset.

The proportion of FP detections increases for surface areas of less than 1 m² leading to an average precision of 0.60 +/- 0.06, while the average precision for larger objects is 0.83 +/- 0.05.

6.2.3.3 Position on the roof¶

Objects are well detected, with an average recall of 0.83 +/- 0.04, as long as their centroid is more than 1 m from the roof edge (Fig. 19). For objects closer to the roof edge, the recall is only 0.56.

Number of detections and labels in each assessment tag according to object position to the roof edge

Figure 19: Number of TP and FN labels, as well as FP detections, depending on the distance of the object centroid to the roof edge (m). The results are obtained for the training dataset.

The precision also decreases significantly for objects located near to the roof edge, from an average of 0.77 +/- 0.03 for object centroids more than 1 m away to 0.51 below, due to an increase of FP detections.

6.2.4 Expert assessment¶

The experts are at least partially satisfied by over 86% with the image segmentation method (Table 15).

Evaluation	OCAN	OCEN
Not satisfied	6%	14%
Partially satisfied	40%	49%
Satisfied	54%	37%

Table 15: Expert's satisfaction with the results produced using image segmentation. OCAN's expert assessed 122 buildings, while OCEN's expert assessed 39.

Satisfaction is independent of the building type and the roof type which is consistent with the f1 score (Table 13). Slightly lower satisfaction are attributed to the administrative buildings and the flat roofs, which is consistent with the fact that these types have the lowest mIoU.

The experts are generally satisfied with the shapes of the detection polygons and the consistency of the results from one building to another.

6.3 Discussion¶

6.3.1 Limits to object segmentation¶

Although the workflow provides overall satisfactory results, there are inherent limitations when using the SAM algorithm to detect roof objects:

pipe detection: the detection of objects belonging to the "pipe" class is more difficult than for other classes (Fig. 20(a)). The reason could be due to:
- lack of contrast with the background roof ;
- pipes are long but narrow objects, the point sampling in the image may be too large to detect these objects.
color change: the SAM algorithm is sensitive to color changes that can be interpreted as an object and segmented.
- Roof color (Fig. 20(b)): roof surface can have different colors due to durtiness or material change. In this case, it is difficult to address this issue in both pre- and post-processing.
- Shadows (Fig. 20(c)): although aerial images were acquired at a time of the day minimizing the presence of shadows, some are still present and can be detected as objects. Methods¹⁸ exist to mitigate the presence of shadows in images and could be considered in future developments.
- Segmentation of roof planes (Fig. 20(d)): the segmentation of entire roof planes by SAM occurs on both flat and pitched roofs. This is due to color contrast under different illumination associated to different heights and/or inclinations between roof planes. A filter was applied to discard polygons with the same size as the roof, but it was not applied to roof planes because their sizes were sometimes comparable to some labeled objects. The detection of large roof planes may strongly influences the mIoU and the calculation of the occupied surface.
difficulty in detecting small objects (Fig. 18): this can be explained by the number of points sampled per image. Increasing the number may improve the detection of small objects, but it may also lead to over-segmentation of some objects. In addition, the processing time is greatly increased (Table 12). As these objects represent only a limited fraction of the occupied surface (< 0.4%), we decided to prioritize a reasonable computation time over the detection of small objects, which are considered less critical to be detected.

Figure 20: Examples of problems encountered with the image segmentation method. (a) Missing pipe detections; (b) Color changes interpreted as the presence of objects; (c) Dormer shadows detected as objects; (d) Roof plane segmentation.

SAM is a pre-trained model showing good zero-shot generalization performance but is not dedicated to detect objects on roofs. The possibility of fine-tuning the model can be considered to improve the performance. It would require additional training with the dataset at our disposal (true orthophotos plus GT annotation).

6.3.2 Reproduction of object shape¶

We acknowledge that the mIoU has low values on the training and test datasets (Table 13), but note that this metric is strict.
It is sensitive to the detection or not of an object as it was computed on all the polygons present on the roof, including TP, FP and FN. Thus, if a large object is not detected or if there is a large FP detection, the mIoU value will be strongly affected.
In addition, the IoU metric is also sensitive to discrepancies in polygon shapes with the GT (Fig. 16(b)). While the object delineation may appear satisfactory from visual inspection, the metric may display low value. This aspect is difficult to improve as it dependents on the segmentation model and the GT delineation strategy. Overall, the shape of the object is usually satisfactorily reproduced when detected (Figs. 16 and 21).

The method tends to underestimate the occupied surface area (Fig. 17). Thus, the estimated free surface constitutes an upper limit to assess the potential.
The small relative error of 10% obtained on the occupied area for the mixed and pitched roofs can be explained by the fact that they generally correspond to villas with limited roof surfaces and a "simple" arrangement of small objects. In comparison, industrial roofs can be large with complex arrangements of objects such as pipes, solar panels or ventilation systems. Therefore, detection errors on villas have usually less impact on the area estimation than detection errors on large industrial buildings.

6.3.3 Relevance of the method¶

The results provide strong arguments in favor of the ability of the image segmentation method to correctly detect and segment objects. The fact that the metrics are consistent between the different datasets (Table 13) is encouraging for its applicability to a wider area with a variety of buildings.

The performance of the method is lower for small objects (Fig. 19) and objects close to the roof edge (Fig. 20). However, the accurate detection of these objects is less critical as they interfere less with the continuity of the roof for the potential installation of solar panels and vegetated roofs.

The experts were satisfied with the results and interested in putting the method into production. However, the current processing time, about 12 min for 25 buildings, is an hindrance to extending the method to the whole canton of Geneva, gathering about 80,000 buildings. Parallelizing the algorithm to apply the method to an area of interest should be considered.

Finally, true orthophotos were used in this case. These provide the actual position of an object on a roof. Such product is rare because it is more expensive to produce and thus may not be available or regularly updated. However, we are confident that this segmentation method can be applied to orthophotos, more regularly acquired. In this case, methods for reprojecting the position of roofs and/or vectors will need to be explored.

7. Combination of results¶

The developed methods display different strengths and weaknesses. For instance, LiDAR segmentation has difficulty detecting low and thin objects, which image segmentation does not. Conversely, image segmentation has difficulty with color change segmentation and pipe detection, which LiDAR segmentation does not. Therefore, combining the two results could yield interesting outcomes.

7.1 Method¶

Two combinations of results were tested:

concatenation of detected objects from the vector layers produced by LiDAR segmentation and image segmentation;
selection of overlapping polygons of detected objects from the vector layers produced by LiDAR segmentation and image segmentation with a spatial join. In this case, the image segmentation polygons were retained because they provide the most satisfactory delineation.

The resulting combined vector layers were then assessed with metrics but not by the experts.

7.2 Results¶

Combination method	Precision	Recall	f1 score	mIoU	Relative error (%)
Concatenation	0.68	0.94	0.79	0.45	8
Spatial join	0.81	0.69	0.75	0.33	48

Table 16: Metrics obtained for the training dataset.

Comparing Tables 8 and 13 with Table 16, we note that the combination results in similar f1 scores, around 0.77. However, precision and recall values are affected differently.
The recall increases by more than 10 points with the concatenation method, reaching the excellent value of 0.94. This means that most of the GT objects are detected, including the pipes reaching the satisfactory value of 0.68. On the other hand, the proportion of FPs increases, diminishing the precision by about 8 points.
The spatial join discards all the single FP detections, improving the precision by 3 to 6 points. Non-overlapping TPs are discarded as well, reducing the recall value by more than 8 points.

The concatenation has a positive impact, more than 10 points, on the mIoU compared to the spatial join, which provides higher values than the segmentation methods.
Finally, the relative error on the occupied surface is significantly reduced to less than 10% by concatenating the results while it increases to about 50% with the spatial join method.

7.3 Discussion¶

Combining the results does not improve the f1 score, but allows for modulation of the results, i.e. whether favor precision or recall, depending on the needs (Table 16).

The high recall value obtained with concatenation proves the complementarity of the two methods for detecting different objects. Note that in this case, the final vector layer contains polygons with different aspects. A higher recall value tends to favor the mIoU, since more GT objects are detected, despite the addition of FP. The surface of the detected object is thus improved, but the addition of FPs also contributes to the reduction of the relative error on the occupied surface, which must be carefully analyzed.

Note that the results of the object segmentation can also be combined with occupancy classification to refine the information on the "potentially free" roof planes. Finally, although incomplete, the roof and roof superstructure vector layers produced by the State of Geneva contain vectors of some roof objects that can be used additionally to improve the accuracy of the results.

8. Conclusion¶

Detecting objects on rooftops is a key aspect of assessing the potential for installing facilities in cities, such as solar panels and vegetated rooftops. The STDL explored three methods to achieve this objective, based on machine learning and deep learning algorithms and on LiDAR, aerial imagery and vector data. All methods provided satisfactory results. Occupancy classification enabled roof planes to be classified with 85% accuracy. The two segmentation methods reached similar results, with a f1-score of about 0.77, a mIoU of about 0.36 and a relative error on the detected occupied area of 40%. In particular, segmentation methods have made it possible to accurately detect large objects and objects centered on the roof, which are most likely to constitute obstacles to the installation of facilities.

Overall, the beneficiaries were satisfied with all the methods, with at least 70% of buildings having satisfactory detections. Despite similar performance to image segmentation, LiDAR segmentation was considered the least satisfactory due to the appearance of the detection shapes and the varying results between buildings and object classes. Image segmentation gives satisfactory results overall, but at the current stage, the processing time is unrealistic to consider scaling up the method at the cantonal level. Further developments are required to reduce the computational cost. Finally, the classification method reconciles both accurate results and fast processing time. Therefore, it was selected for an application at the cantonal level. A vector layer indicating the presumed occupancy of roof planes will be produced helping the beneficiaries to find and assess areas potentially available for new installations.

Combining the results is an asset to enhance the strengths of the different methods. Combining segmentation results increases either precision or recall, depending on the chosen method, without changing the f1 score. A better recall translated into an enhanced delineation of the occupied area on a roof. Cross-referencing information sources, such as occupation classification and published vector layers, can improve results accuracy and help identify areas of interest.

It should be noted that the results are in line with the STDL's objective to automatically detect occupied and free surfaces on roofs. These results from numerical models are indications that need to be verified by an expert as part of an installation project. Our results do not indicate whether a facility can actually be installed. Additional parameters such as roof material, slope, solar potential, protected buildings, etc., which affect the possibility and prioritization of an installation, are not taken into account and are the responsibility of the beneficiaries.

Code availability¶

The codes are available on the STDL's GitHub repository: proj-rooftops

Acknowledgements¶

This project was made possible thanks to a tight collaboration between the STDL team and beneficiaries from the offices of the Etat de Genève. In particular, the STDL team acknowledges the key contributions from Basile Grandjean (OCEN), Benjamin Guinaudeau (OCAN), Alisa Freyre (PanData), Mayeul Gaillet (DIT) and Geraldine Chollet (OCAN). We thank PanData for the production of the ground truth. This project has been funded by Strategie Suisse pour la Géoinformation.

Appendix¶

A. Variable importance in the random forests¶

Variable	Importance for OCAN
median roughness	19.3
margin of error of intensity	16.8
mean roughness	15.7
minimum roughness	10.6
standard deviation of intensity	7.8
area	5.8
mean intensity	4.5
median intensity	3.9
minimum altitude	3.5
standard deviation of roughness	3.3
maximum intensity	3.0
maximum roughness	2.5
minimum intensity	2.4
% of overlap with non-building data	1.1

Table A1: List of the variables considered in the random forest and their importance in the classification for the OCAN.

	Importance for OCEN
margin of error of intensity	17.6
minimum roughness	17.4
area	13.5
mean roughness	10.7
median roughness	8.3
standard deviation of roughness	5.5
standard deviation of intensity	4.6
maximum roughness	4.2
maximum intensity	4.1
minimum altitude	3.6
mean intensity	3.1
minimum intensity	3.0
median intensity	2.4
% of overlap with non-building data	2

Table A2: List of the variables considered in the random forest and their importance in the classification for the OCEN.

References¶

Qing Zhong, Jake R. Nelson, Daoqin Tong, and Tony H. Grubesic. A spatial optimization approach to increase the accuracy of rooftop solar energy assessments. Applied Energy, 316:119128, June 2022. URL: https://linkinghub.elsevier.com/retrieve/pii/S0306261922005062 (visited on 2022-05-27), doi:10.1016/j.apenergy.2022.119128. ↩↩
Junjing Yang, Devi Llamathy Mohan Kumar, Andri Pyrgou, Adrian Chong, Mat Santamouris, Denia Kolokotsa, and Siew Eang Lee. Green and cool roofs’ urban heat island mitigation potential in tropical climate. Solar Energy, 173:597–609, October 2018. URL: https://linkinghub.elsevier.com/retrieve/pii/S0038092X18307667 (visited on 2024-03-21), doi:10.1016/j.solener.2018.08.006. ↩
Dan Stowell, Jack Kelly, Damien Tanner, Jamie Taylor, Ethan Jones, James Geddes, and Ed Chalstrey. A harmonised, high-coverage, open dataset of solar photovoltaic installations in the UK. Scientific Data, 7(1):394, November 2020. URL: https://www.nature.com/articles/s41597-020-00739-0 (visited on 2024-03-21), doi:10.1038/s41597-020-00739-0. ↩
Nima Narjabadifam, Mohammed Al-Saffar, Yongquan Zhang, Joseph Nofech, Asdrubal Cheng Cen, Hadia Awad, Michael Versteege, and Mustafa Gül. Framework for Mapping and Optimizing the Solar Rooftop Potential of Buildings in Urban Systems. Energies, 15(5):1738, February 2022. URL: https://www.mdpi.com/1996-1073/15/5/1738 (visited on 2024-03-21), doi:10.3390/en15051738. ↩
Youssef El Merabet, Cyril Meurie, Yassine Ruichek, Abderrahmane Sbihi, and Raja Touahni. Building Roof Segmentation from Aerial Images Using a Line and Region-Based Watershed Segmentation Technique. Sensors, 15(2):3172–3203, February 2015. URL: http://www.mdpi.com/1424-8220/15/2/3172 (visited on 2023-03-28), doi:10.3390/s150203172. ↩↩
Jordan M. Malof, Kyle Bradbury, Leslie M. Collins, and Richard G. Newell. Automatic detection of solar photovoltaic arrays in high resolution aerial imagery. Applied Energy, 183:229–240, December 2016. URL: https://linkinghub.elsevier.com/retrieve/pii/S0306261916313009 (visited on 2024-03-21), doi:10.1016/j.apenergy.2016.08.191. ↩
Abraham Noah Wu and Filip Biljecki. Roofpedia: Automatic mapping of green and solar roofs for an open roofscape registry and evaluation of urban sustainability. Landscape and Urban Planning, 214:104167, October 2021. URL: https://linkinghub.elsevier.com/retrieve/pii/S0169204621001304 (visited on 2022-05-23), doi:10.1016/j.landurbplan.2021.104167. ↩
Nils Hamel and Huriel Reichel. Detection of thermal panels on canton territory to follow renewable energy deployment. Technical Report, STDL, February 2022. URL: https://tech.stdl.ch/PROJ-TPNL/. ↩
Sebastian Krapf, Lukas Bogenrieder, Fabian Netzler, Georg Balke, and Markus Lienkamp. RID—Roof Information Dataset for Computer Vision-Based Photovoltaic Potential Assessment. Remote Sensing, 14(10):2299, May 2022. URL: https://www.mdpi.com/2072-4292/14/10/2299 (visited on 2022-05-27), doi:10.3390/rs14102299. ↩↩
Roberto Castello, Simon Roquette, Martin Esguerra, Adrian Guerra, and Jean-Louis Scartezzini. Deep learning in the built environment: automatic detection of rooftop solar panels using Convolutional Neural Networks. Journal of Physics: Conference Series, 1343(1):012034, November 2019. URL: https://iopscience.iop.org/article/10.1088/1742-6596/1343/1/012034 (visited on 2024-03-21), doi:10.1088/1742-6596/1343/1/012034. ↩
Alexander Apostolov, August Baum, Ghali Chraibi, and Roberto Castello. Automatic detection of available area for rooftop solar panel installation. Technical Report, EPFL, December 2020. URL: https://www.epfl.ch/labs/mlo/wp-content/uploads/2021/05/crpmlcourse-paper859.pdf. ↩
Fayez Tarsha Kurdi, Mohammad Awrangjeb, and Alan Wee-Chung Liew. Automated Building Footprint and 3D Building Model Generation from Lidar Point Cloud Data. In 2019 Digital Image Computing: Techniques and Applications (DICTA), 1–8. Perth, Australia, December 2019. IEEE. URL: https://ieeexplore.ieee.org/document/8946008/ (visited on 2024-03-21), doi:10.1109/DICTA47822.2019.8946008. ↩↩
Mohammad Aslani and Stefan Seipel. Automatic identification of utilizable rooftop areas in digital surface models for photovoltaics potential assessment. Applied Energy, 306:118033, January 2022. URL: https://www.sciencedirect.com/science/article/pii/S0306261921013283 (visited on 2023-03-24), doi:10.1016/j.apenergy.2021.118033. ↩
Shuhei Watanabe. Tree-Structured Parzen Estimator: Understanding Its Algorithm Components and Their Roles for Better Empirical Performance. Technical Report, University of Freiburg, May 2023. arXiv:2304.11127 [cs]. URL: http://arxiv.org/abs/2304.11127 (visited on 2024-04-29), doi:10.48550/arXiv.2304.11127. ↩
Zhen Qian, Min Chen, Teng Zhong, Fan Zhang, Rui Zhu, Zhixin Zhang, Kai Zhang, Zhuo Sun, and Guonian Lü. Deep Roof Refiner: A detail-oriented deep learning network for refined delineation of roof structure lines using satellite imagery. International Journal of Applied Earth Observation and Geoinformation, 107:102680, March 2022. URL: https://linkinghub.elsevier.com/retrieve/pii/S030324342200006X (visited on 2022-05-27), doi:10.1016/j.jag.2022.102680. ↩
Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chloe Rolland, Laura Gustafson, Tete Xiao, Spencer Whitehead, Alexander C. Berg, Wan-Yen Lo, Piotr Dollár, and Ross Girshick. Segment Anything. April 2023. arXiv:2304.02643 [cs]. URL: http://arxiv.org/abs/2304.02643 (visited on 2024-04-09). ↩↩
Qiusheng Wu and Lucas Prado Osco. Samgeo: A Python package for segmenting geospatial data with the Segment Anything Model (SAM). Journal of Open Source Software, 8(89):5663, September 2023. URL: https://joss.theoj.org/papers/10.21105/joss.05663 (visited on 2024-03-22), doi:10.21105/joss.05663. ↩
Xiaoxia Liu, Fengbao Yang, Hong Wei, and Min Gao. Shadow Removal from UAV Images Based on Color and Texture Equalization Compensation of Local Homogeneous Regions. Remote Sensing, 14(11):2616, May 2022. URL: https://www.mdpi.com/2072-4292/14/11/2616 (visited on 2024-03-22), doi:10.3390/rs14112616. ↩