Guo, Danfeng

Applying Medical Language Models to Medical Image Analysis

2024

Advisor(s): Terzopoulos, Demetri

Abstract

Deep learning has driven significant advances in medical image analysis over the past decade, with computer vision models demonstrating strong performance across tasks such as medical image classification, detection, and segmentation. However, the limited availability of annotations remains a persistent challenge: annotating medical images requires specialized professional knowledge, which makes the process costly. This dissertation aims to reduce the reliance on medical image annotations by directly leveraging medical reports, which usually accompany the corresponding images and are readily available. It investigates the application of vision-language models, including large vision-language models, to medical image analysis. Existing vision-language models are adapted and applied to three critical tasks: disease diagnosis, disease segmentation, and medical report generation. The main contributions are: (1) two prompting strategies that improve the accuracy of disease diagnosis through visual question answering with large vision-language models; (2) a disease segmentation model that uses medical reports as weak supervision; and (3) an evaluation of medical large vision-language models with respect to hallucination in generated reports across multiple complex diseases, together with the application of existing techniques to mitigate diagnostic errors in those reports.

UCLA