PUBLICATIONS

  • PUBLICATIONS

A Pilot Study on Doubt Robustness of LLMs in Clinical Prediction Explanation

A Pilot Study on Doubt Robustness of LLMs in Clinical Prediction Explanation


Juhwan Choi, Sangchul Hahn, Eunho Yang 

 

We study large language models (LLMs) as clinical explanation generators and evaluate their robustness to user doubt in interactive settings. Using an in-hospital mortality prediction task on the MIMIC-III dataset, we examine how simple challenge prompts affect the consistency of LLM-generated explanations. We adopt the concept of doubt robustness and assess it by prompting models to explain risk predictions and indicate agreement, followed by doubt-inducing queries. Our results show that instruction-tuned models frequently reverse their initial stance, while reasoning-enhanced models exhibit improved but still limited stability. Further analysis suggests that LLMs rely heavily on model outputs rather than ground-truth labels, reducing explanation faithfulness. These findings highlight the need for robustness-oriented evaluation of clinical explanation systems.


 

View on article