ViTMARE - A Vision Transformer Pipeline for Anomaly Detection in 3D Brain MRI

Peracchio L.; Corso L.; Santangelo G.; Bortolotto C.; Dagliati A.; Bellazzi R.; Nicora G.
2026-01-01

Abstract

AI models for medical imaging often fail under dataset shifts and on underrepresented patient subgroups. Detecting out-of-distribution scans, which arise from rare pathologies, atypical anatomy, or acquisition artifacts, is therefore essential for robust deployment. We introduce ViTMARE (Vision Transformer Masked Autoencoder Reconstruction Error), a volumetric anomaly-detection pipeline for 3D brain MRI that leverages Vision Transformer Masked Autoencoders (ViTMAEs) adapted to volumetric data by treating axial slices as input channels. The model is fine-tuned on normal brain volumes and evaluated using a synthetic-lesion generator that produces anatomically plausible abnormalities. During inference, ViTMARE performs multiple reconstructions (N=100) and aggregates binary anomaly masks via majority voting, followed by morphological closing and opening to suppress spurious noise. On a test set of real images with added synthetic anomalies, ViTMARE achieves a median Dice score of 0.793, a median precision of 0.912, and a median recall of 0.748. We present a reproducible pipeline and demonstrate that combining voting-based fusion with morphological postprocessing yields robust voxel-level anomaly detection.
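The fusion and postprocessing steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`fuse_masks`, `clean_mask`, `overlap_scores`), the strict 50% vote threshold, the 6-connected structuring element, and the single iteration are all assumptions chosen for illustration, and the step that turns reconstruction error into a binary mask is left out because the abstract does not specify it.

```python
import numpy as np
from scipy import ndimage


def fuse_masks(masks: np.ndarray, vote_fraction: float = 0.5) -> np.ndarray:
    """Majority vote over a stack of binary masks shaped (N, D, H, W).

    A voxel is kept when more than `vote_fraction` of the N masks flag it.
    The 0.5 default is an assumption, not a value taken from the paper.
    """
    votes = masks.astype(bool).mean(axis=0)
    return votes > vote_fraction


def clean_mask(mask: np.ndarray, iterations: int = 1) -> np.ndarray:
    """Morphological closing followed by opening on a 3D binary mask.

    Closing fills small holes; opening removes small isolated specks.
    The 6-connected structuring element is an assumed choice.
    """
    structure = ndimage.generate_binary_structure(3, 1)
    closed = ndimage.binary_closing(mask, structure=structure, iterations=iterations)
    return ndimage.binary_opening(closed, structure=structure, iterations=iterations)


def overlap_scores(pred: np.ndarray, truth: np.ndarray) -> dict:
    """Voxel-level Dice, precision, and recall for two binary masks."""
    pred, truth = pred.astype(bool), truth.astype(bool)
    tp = int(np.logical_and(pred, truth).sum())
    fp = int(np.logical_and(pred, ~truth).sum())
    fn = int(np.logical_and(~pred, truth).sum())
    # Convention assumed here: empty denominators score 1.0.
    dice = 2 * tp / (2 * tp + fp + fn) if (tp + fp + fn) else 1.0
    precision = tp / (tp + fp) if (tp + fp) else 1.0
    recall = tp / (tp + fn) if (tp + fn) else 1.0
    return {"dice": dice, "precision": precision, "recall": recall}
```

In use, a stack of N per-reconstruction masks would be passed to `fuse_masks`, the result to `clean_mask`, and the cleaned mask compared against the synthetic-lesion ground truth with `overlap_scores`. Note that opening with a small structuring element also trims thin corners of genuine lesions, which is one plausible reason for recall being lower than precision in a pipeline of this shape.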
ISBN: 9781643686615

Use this identifier to cite or link to this document: https://hdl.handle.net/11571/1553856