
High Performance Computing for Deep Learning Algorithms for Computationally Intensive Applications

GAZZONI, MARCO
2026-04-29

Abstract

In the era of exascale computing, accelerator-rich clusters, and intelligent edge devices, computation has become both the driving force and the limiting factor of modern artificial intelligence. This thesis explores the deep interdependence between High-Performance Computing (HPC) and Deep Learning (DL), arguing that recent advances in AI are inseparable from innovations in computational scale, efficiency, and system co-design. Deep learning’s demand for massive computation has drawn it toward HPC infrastructures, while HPC’s focus on throughput, optimization, and energy efficiency has shaped how DL models are trained and deployed. Together, they form a shared design space where HPC principles—profiling, parallelism, and resource efficiency—inform AI research, and HPC architectures evolve to accelerate DL operations and support irregular workloads. The work is structured around a “dual-track” methodology: (i) exploiting HPC for large-scale exploration, training, and validation, and (ii) translating those results into resource-constrained, deployable models through compression, quantization, and system optimization. This framework connects exascale experimentation with embedded deployment and recurs across applications from healthcare to industrial sensing. Methodologically, the thesis contributes three elements: (a) HPC-assisted experimentation for statistically robust architectural and preprocessing choices; (b) domain-specific DL pipelines adapted to data characteristics such as small targets or spectral signals; and (c) deployment strategies that satisfy latency and power constraints without compromising accuracy. Foundational chapters survey key DL architectures (ResNet, U-Net, Vision Transformer), modern HPC stacks (clusters, interconnects, GPU programming), and principles of efficient embedded computation, arguing that HPC is not merely a hardware domain but a unifying philosophy from exascale training to watt-level inference. 
Applications illustrate this integration. In ophthalmology, the work proposes a frequency-aware pipeline for retinal OCT analysis, detecting microscopic inflammatory lesions (hyper-reflective foci) through FFT-based enhancement, residual U-Net segmentation, and post-processing for lesion counting. Large-scale HPC sweeps over network variants and loss functions deliver clinically reliable results. For pandemic-era diagnostics, a compact pipeline classifies COVID-19 from lung ultrasound videos, coupling HPC-driven hyperparameter search with deployment on NVIDIA Jetson hardware for real-time bedside inference. The thesis extends to hyperspectral imaging (HSI) across dermatology and neurosurgery. Contributions include (1) combining spectral and spatial features for melanoma analysis and edge-ready screening tools; (2) efficient, noise-robust architectures for glioblastoma segmentation trained at HPC scale; and (3) HS2RGB, a framework converting hyperspectral cubes into enriched RGB images compatible with pretrained models, thus reducing data and compute demands. Industrial sensing provides a final case study: real-time fault detection in induction motors using only stator currents. FFT-based features feed a quantization-friendly residual MLP, whose int8-compressed version runs efficiently on microcontrollers, demonstrating the full HPC-to-edge workflow. Ultimately, the dissertation presents an integrated engineering workflow uniting deep learning and high-performance computing: use computational scale for principled discovery, and use efficiency techniques to deploy those discoveries in real-world, resource-limited environments—from supercomputers to operating rooms to industrial devices.
Files in this record:

Gazzoni_PhD_Thesis_final.pdf
Description: Doctoral thesis
Type: Doctoral thesis
Size: 3.51 MB
Format: Adobe PDF
Under embargo until 06/05/2027

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11571/1547615