
High Performance Computing for Deep Learning Algorithms for Computationally Intensive Applications

GAZZONI, MARCO
2026-04-29

Abstract

In the era of exascale computing, accelerator-rich clusters, and intelligent edge devices, computation has become both the driving force and the limiting factor of modern artificial intelligence. This thesis explores the deep interdependence between High-Performance Computing (HPC) and Deep Learning (DL), arguing that recent advances in AI are inseparable from innovations in computational scale, efficiency, and system co-design. Deep learning’s demand for massive computation has drawn it toward HPC infrastructures, while HPC’s focus on throughput, optimization, and energy efficiency has shaped how DL models are trained and deployed. Together, they form a shared design space where HPC principles—profiling, parallelism, and resource efficiency—inform AI research, and HPC architectures evolve to accelerate DL operations and support irregular workloads. The work is structured around a “dual-track” methodology: (i) exploiting HPC for large-scale exploration, training, and validation, and (ii) translating those results into resource-constrained, deployable models through compression, quantization, and system optimization. This framework connects exascale experimentation with embedded deployment and recurs across applications from healthcare to industrial sensing. Methodologically, the thesis contributes three elements: (a) HPC-assisted experimentation for statistically robust architectural and preprocessing choices; (b) domain-specific DL pipelines adapted to data characteristics such as small targets or spectral signals; and (c) deployment strategies that satisfy latency and power constraints without compromising accuracy. Foundational chapters survey key DL architectures (ResNet, U-Net, Vision Transformer), modern HPC stacks (clusters, interconnects, GPU programming), and principles of efficient embedded computation, arguing that HPC is not merely a hardware domain but a unifying philosophy from exascale training to watt-level inference. 
Applications illustrate this integration. In ophthalmology, the work proposes a frequency-aware pipeline for retinal OCT analysis, detecting microscopic inflammatory lesions (hyper-reflective foci) through FFT-based enhancement, residual U-Net segmentation, and post-processing for lesion counting. Large-scale HPC sweeps over network variants and loss functions deliver clinically reliable results. For pandemic-era diagnostics, a compact pipeline classifies COVID-19 from lung ultrasound videos, coupling HPC-driven hyperparameter search with deployment on NVIDIA Jetson hardware for real-time bedside inference. The thesis extends to hyperspectral imaging (HSI) across dermatology and neurosurgery. Contributions include (1) combining spectral and spatial features for melanoma analysis and edge-ready screening tools; (2) efficient, noise-robust architectures for glioblastoma segmentation trained at HPC scale; and (3) HS2RGB, a framework converting hyperspectral cubes into enriched RGB images compatible with pretrained models, thus reducing data and compute demands. Industrial sensing provides a final case study: real-time fault detection in induction motors using only stator currents. FFT-based features feed a quantization-friendly residual MLP, whose int8-compressed version runs efficiently on microcontrollers, demonstrating the full HPC-to-edge workflow. Ultimately, the dissertation presents an integrated engineering workflow uniting deep learning and high-performance computing: use computational scale for principled discovery, and use efficiency techniques to deploy those discoveries in real-world, resource-limited environments—from supercomputers to operating rooms to industrial devices.
Files in this record:

Gazzoni_PhD_Thesis_final.pdf
Description: Doctoral thesis
Type: Doctoral thesis
Size: 3.51 MB
Format: Adobe PDF
Under embargo until 06/05/2027

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11571/1547615