A Pixel-Based Deep Learning Approach for Building Height Estimation From Single SAR Images
Russo L.; Memar B.; Gamba P.
2026-01-01
Abstract
Accurate building height estimation from very high-resolution (VHR) Synthetic Aperture Radar (SAR) imagery plays a pivotal role in urban analysis tasks. This paper presents a pixel-based deep learning (DL) framework that estimates building height maps from single COSMO-SkyMed (CSK) SAR images. Supervision is provided by a refined normalized Digital Surface Model (nDSM), constructed by fusing public building height data with a globally available DSM baseline through a distance-weighted blending scheme. The proposed architecture is a modified Attention U-Net with dual decoders specialized for built-up and background areas, trained with a Mean Absolute Error (MAE) loss for increased robustness to SAR-specific distortions. The model is evaluated on a multi-continental dataset covering eight cities and tested under both in-distribution and cross-city out-of-distribution (OOD) conditions. The results show that the approach outperforms recent object-based and multimodal benchmarks, especially in European and American cities, although challenges remain in high-rise Asian metropolises.
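
To illustrate why an MAE loss can be more robust than a squared-error loss when a few pixels are badly distorted, the following minimal sketch compares the two on a toy height patch. It is not the authors' implementation: the function names, the 2x3 patch values, and the built-up/background split via a `target > 0` mask are all assumptions made for illustration only.

```python
# Illustrative sketch only: names and values are hypothetical, not from the paper.

def _pairs(pred, target):
    """Yield (predicted, reference) height pairs from two equally sized 2-D lists."""
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            yield p, t

def mae(pred, target, keep=lambda t: True):
    """Mean absolute error in metres over pixels whose reference height passes `keep`."""
    errors = [abs(p - t) for p, t in _pairs(pred, target) if keep(t)]
    return sum(errors) / len(errors)

def mse(pred, target):
    """Mean squared error in square metres over all pixels."""
    errors = [(p - t) ** 2 for p, t in _pairs(pred, target)]
    return sum(errors) / len(errors)

# Hypothetical reference nDSM patch (metres); 0.0 marks background pixels.
target = [[0.0, 12.0, 12.0],
          [0.0, 12.0, 30.0]]

# Hypothetical prediction: five pixels are off by 1 m, one pixel by 20 m
# (standing in for a pixel corrupted by a SAR-specific distortion).
pred = [[1.0, 11.0, 13.0],
        [1.0, 13.0, 10.0]]

mae_all = mae(pred, target)                           # (5 * 1 + 20) / 6
mse_all = mse(pred, target)                           # (5 * 1 + 400) / 6
mae_built = mae(pred, target, keep=lambda t: t > 0)   # built-up pixels only
mae_back = mae(pred, target, keep=lambda t: t == 0)   # background pixels only

print(f"MAE all: {mae_all:.3f} m, MSE all: {mse_all:.3f} m^2")
print(f"MAE built-up: {mae_built:.3f} m, MAE background: {mae_back:.3f} m")
```

In this toy patch the single 20 m outlier contributes 80% of the summed absolute error but almost 99% of the summed squared error, which is the sense in which MAE is less dominated by isolated gross errors. Reporting the error separately for built-up and background pixels mirrors the split that the dual decoders are described as specializing in, under the assumed mask.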


