MiniGPT-Pancreas: Multimodal Large Language Model for Pancreas Cancer Observation and Localization in CT Images

Cerveri, Pietro
2025-01-01

Abstract

Pancreatic cancer remains one of the deadliest malignancies, primarily because of its subtle CT appearance and frequent late-stage diagnosis. We introduce MiniGPT-Pancreas, a lightweight multimodal large language model (MLLM) that interprets computed tomography (CT) images together with natural-language queries in an interactive ChatGPT-style interface and returns precise bounding-box predictions for the pancreas and associated tumors. A cascaded fine-tuning strategy was applied to MiniGPT-v2, a multi-task general-purpose MLLM, with a focus on pancreas and tumor detection, using the National Institutes of Health (NIH) and Medical Segmentation Decathlon (MSD) pancreas datasets. Pancreas detection achieved an average Intersection over Union (IoU) of 0.57 on the NIH and MSD datasets, outperforming the base MiniGPT-v2 model and more recent MLLMs such as GLM-4.1V-9B-Base (general-purpose) and UMIT (specific to the biomedical domain). Tumor observation on MSD yielded accuracy, precision, recall, and F1 scores all of about 0.87, surpassing MiniGPT-v2, GLM-4.1V-9B-Base, and UMIT. For tumor localization, the IoU was 0.28, higher than UMIT (IoU = 0.07) but lower than GLM-4.1V-9B-Base (IoU = 0.48). On multi-organ detection on the AbdomenCT-1k dataset, MiniGPT-Pancreas outperformed GLM-4.1V-9B-Base and UMIT on all organs, with an IoU of 0.50 on the pancreas vs. 0.43 and 0.03, respectively. On a 5-point Likert scale, MiniGPT-Pancreas was rated highly by an international group of 10 expert general surgeons (Italy, Singapore, and the UK) as a potential training tool, especially for verification (4.5/5.0) and for training young specialists (4.5/5.0). While operating on 2D slices limits volumetric context, MiniGPT-Pancreas demonstrates that compact MLLMs can rival specialized vision networks in pancreas imaging, offering an intuitive, language-driven tool for AI-assisted radiology. The code is publicly available at https://github.com/elianastasio/MiniGPTPancreas.
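The abstract's detection results hinge on bounding-box Intersection over Union. As a minimal sketch of how such scores are typically computed (the `(x1, y1, x2, y2)` corner convention and the function name are assumptions, not taken from the paper):

```python
def bbox_iou(box_a, box_b):
    """Intersection over Union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

A reported IoU of 0.57 thus means that, on average, the predicted pancreas box overlaps the ground-truth box by 57% of their combined area.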
Files for this product:
No files are associated with this product.

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11571/1537935
Citations
  • PubMed Central: n/a
  • Scopus: 0
  • Web of Science: 0