ViTMARE - A Vision Transformer Pipeline for Anomaly Detection in 3D Brain MRI
Peracchio L.; Corso L.; Santangelo G.; Bortolotto C.; Dagliati A.; Bellazzi R.; Nicora G.
2026-01-01
Abstract
AI models for medical imaging often fail under dataset shifts and on underrepresented patient subgroups. Detecting out-of-distribution scans (arising from rare pathologies, atypical anatomy, or acquisition artifacts) is therefore essential for robust deployment. We introduce ViTMARE (Vision Transformer Masked Autoencoder Reconstruction Error), a volumetric anomaly-detection pipeline for 3D brain MRI that leverages Vision Transformer Masked Autoencoders (ViTMAEs), adapted to volumetric data by treating axial slices as input channels. The model is fine-tuned on normal brain volumes and evaluated using a synthetic-lesion generator that produces anatomically plausible abnormalities. During inference, ViTMARE performs multiple reconstructions (N = 100), aggregates the resulting binary anomaly masks by majority voting, and then applies morphological closing and opening to suppress spurious noise. On a test set of real brain MRI volumes with added synthetic anomalies, ViTMARE achieves a median Dice score of 0.793, a median precision of 0.912, and a median recall of 0.748. We present a reproducible pipeline and demonstrate that combining voting-based fusion with morphological post-processing yields robust voxel-level anomaly detection.
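The fusion and post-processing steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how each reconstruction's error map is thresholded into a binary mask, what vote threshold is used, or which structuring element is applied. The sketch therefore assumes the N binary masks are already given, keeps a voxel when a strict majority (more than N/2) of masks flag it, and uses a 3x3x3 cube for closing and opening. Dice, precision, and recall use their standard voxel-wise definitions: 2TP/(2TP+FP+FN), TP/(TP+FP), and TP/(TP+FN). All function names are hypothetical and the demo data are synthetic toy data. Morphology is written in plain NumPy to keep the example dependency-light. `scipy.ndimage.binary_closing` and `binary_opening` are the usual library equivalents, though their default border handling differs from the choice made here.

```python
import itertools

import numpy as np

# Hypothetical helpers. Assumptions (not from the paper): masks are already
# binarized, voting uses a strict majority, and morphology uses a 3x3x3 cube.


def majority_vote(masks: np.ndarray) -> np.ndarray:
    """Fuse a stack of N binary masks, shape (N, D, H, W), into one mask.

    A voxel is kept if more than N/2 of the masks flag it (ties are dropped).
    """
    masks = np.asarray(masks, dtype=bool)
    votes = masks.sum(axis=0)
    return votes * 2 > masks.shape[0]


def _cube_filter(mask: np.ndarray, reduce_fn, pad_value: bool) -> np.ndarray:
    """Combine each voxel's 3x3x3 neighbourhood with reduce_fn.

    np.logical_or yields binary dilation; np.logical_and yields binary erosion.
    """
    padded = np.pad(mask, 1, mode="constant", constant_values=pad_value)
    d, h, w = mask.shape
    out = None
    for dz, dy, dx in itertools.product(range(3), repeat=3):
        window = padded[dz:dz + d, dy:dy + h, dx:dx + w]
        out = window.copy() if out is None else reduce_fn(out, window)
    return out


def dilate(mask: np.ndarray) -> np.ndarray:
    # Voxels outside the volume count as background.
    return _cube_filter(mask, np.logical_or, pad_value=False)


def erode(mask: np.ndarray) -> np.ndarray:
    # Voxels outside the volume count as foreground, so edges are not eroded.
    return _cube_filter(mask, np.logical_and, pad_value=True)


def postprocess(mask: np.ndarray) -> np.ndarray:
    """Closing (fills small holes), then opening (removes small isolated blobs)."""
    closed = erode(dilate(mask))
    return dilate(erode(closed))


def dice_precision_recall(pred, truth):
    """Voxel-wise Dice, precision, recall; zero denominators give 0.0 (a convention)."""
    pred, truth = np.asarray(pred, bool), np.asarray(truth, bool)
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    dice = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return dice, precision, recall


# Toy demo on synthetic data (not the paper's data): one cubic "lesion" and
# N = 100 noisy masks, each voxel flipped independently with probability 0.2.
rng = np.random.default_rng(0)
truth = np.zeros((16, 32, 32), dtype=bool)
truth[6:10, 10:16, 10:16] = True
flips = rng.random((100,) + truth.shape) < 0.2
noisy = truth ^ flips  # XOR flips the selected voxels in every mask
fused = postprocess(majority_vote(noisy))

print("single mask:", dice_precision_recall(noisy[0], truth))
print("fused:      ", dice_precision_recall(fused, truth))
```

On this toy data, a single noisy mask scores a low Dice because of its scattered false positives, while the fused and post-processed mask recovers the cubic lesion. These numbers only illustrate the mechanics and say nothing about ViTMARE's real performance. Closing before opening fills small gaps inside a detected region first, so the subsequent opening removes isolated specks without splitting that region.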


