Transparent Medical Image AI via the MONET Model

[1] Kim C, Gadgil S U, DeGrave A J, et al. Transparent medical image AI via an image–text foundation model grounded in medical literature[J]. Nature Medicine, 2024, 30(4): 1154-1165.

Overview

The study introduces MONET (medical concept retriever), an image-text foundation model designed to enhance the transparency and trustworthiness of medical artificial intelligence (AI) systems. MONET connects medical images with text and provides dense scoring on concept presence, which is crucial for various tasks in medical AI development and deployment.

Key Features of MONET

Concept Annotation: MONET can annotate medical images with semantically meaningful concepts.
Training Data: Trained on 105,550 dermatological images paired with descriptions from medical literature.
Performance: Annotates concepts competitively with supervised models built on clinically annotated datasets.
Use Cases: Enables AI transparency across the development pipeline, including data auditing, model auditing, and interpretation.

Dermatology as a Use Case

Dermatology was chosen due to the heterogeneity in diseases, skin tones, and imaging modalities.
MONET's annotation capability was verified by board-certified dermatologists.

Technical Approach

Contrastive Learning: Trains the model directly on natural language descriptions paired with images.
Encoder: Maps images and text into a shared lower-dimensional vector space, pulling paired image-text elements close together and pushing unpaired ones apart.

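Contrastive Learning

Purpose: Contrastive learning is a training technique that lets a model learn directly from natural language descriptions paired with images.
Method: Training pulls the representations of a matched image-text pair close together in the embedding space and pushes the representations of unmatched pairs apart.
Model architecture:

Image Encoder: Uses a vision transformer architecture (such as ViT-L/14) to convert an input image into a fixed-dimensional embedding vector.

Text Encoder: Uses a transformer architecture with multiple self-attention layers to convert text into a corresponding embedding vector.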

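Data Preprocessing

Images: Images are resized, center-cropped, and normalized to match the encoder's input requirements.

Text: Text is tokenized with lowercase byte-pair encoding, and overlong text is split into segments.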
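
The preprocessing steps can be illustrated with a minimal sketch. The crop size, the mean/std values, and the maximum segment length are placeholders chosen for illustration, not values taken from the paper, and the token list stands in for the output of a real byte-pair-encoding tokenizer.

```python
def center_crop_box(width, height, size):
    """Return (left, top, right, bottom) of a centered size x size crop."""
    left = (width - size) // 2
    top = (height - size) // 2
    return left, top, left + size, top + size


def normalize_pixel(value, mean, std):
    """Scale an 8-bit channel value to [0, 1], then standardize it."""
    return (value / 255.0 - mean) / std


def split_tokens(tokens, max_len):
    """Split an overlong token sequence into segments of at most max_len."""
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]


# Hypothetical usage: a 300 x 224 image and a lowercased caption.
box = center_crop_box(300, 224, 224)
tokens = "Erythematous plaque with overlying scale".lower().split()
segments = split_tokens(tokens, max_len=3)
```

In a real pipeline an image library would apply the crop box and the normalization per channel; the helpers here only show the arithmetic involved.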

Training Procedure

Loss function: A symmetric cross-entropy loss computed over cosine similarity scores.
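
A symmetric cross-entropy loss over cosine similarities can be sketched as follows. This is a pure-Python illustration of the general CLIP-style objective, not the authors' implementation; the temperature value of 0.07 is an assumption, and the function names are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cross_entropy_rows(logits):
    """Mean cross-entropy where the correct class of row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(image_embs, text_embs, temperature=0.07):
    """Average of the image-to-text and text-to-image cross-entropy terms."""
    imgs = [l2_normalize(v) for v in image_embs]
    txts = [l2_normalize(v) for v in text_embs]
    # logits[i][j] = cosine similarity of image i and text j, scaled.
    logits = [
        [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        for img in imgs
    ]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-image view
    return 0.5 * (cross_entropy_rows(logits) + cross_entropy_rows(logits_t))
```

The loss is small when each image is most similar to its own caption and large when the pairing is scrambled, which is what pushes matched pairs together and unmatched pairs apart.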

Optimizer: The Adam optimizer with a cosine learning rate schedule.
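
A cosine learning rate schedule decays the rate from its starting value toward a floor along half a cosine wave. The sketch below shows the standard formula; the base rate, step count, and the `min_lr` floor are illustrative assumptions, and the notes do not say whether warmup was used, so none is included.

```python
import math


def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Learning rate at a given step under cosine decay (no warmup)."""
    progress = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# Hypothetical usage: inspect the rate at the start, middle, and end.
schedule = [cosine_lr(s, 100, base_lr=1e-3) for s in (0, 50, 100)]
```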

Hyperparameter tuning: The dataset is split into training and validation sets to select the best batch size and learning rate.

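Automatic Concept Annotation

Principle: Once trained, MONET can measure how close an image is to arbitrary text, which is used to annotate concepts automatically.

Method: The concept presence score is obtained by computing the cosine similarity between the image embedding and the concept prompt embedding.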
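
The scoring step can be sketched with plain vectors. The embeddings here are toy lists rather than real encoder outputs, and averaging over several prompt phrasings is an assumption of this sketch, not a detail stated in these notes.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def concept_score(image_emb, prompt_embs):
    """Mean cosine similarity between one image and a concept's prompts."""
    sims = [cosine_similarity(image_emb, p) for p in prompt_embs]
    return sum(sims) / len(sims)
```

A higher score means the image sits closer to the concept's text prompts in the shared embedding space.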

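Data Auditing

Concept difference analysis: MONET maps image sets into a common embedding space so that the features distinguishing the sets can be described in natural language.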
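
One simplified way to picture concept difference analysis is to compare the mean concept presence score of each concept between two image sets and rank concepts by the gap. This is a sketch of the idea under that assumption, not the paper's exact procedure; the function name and input layout are hypothetical.

```python
from statistics import mean


def concept_differences(scores_a, scores_b):
    """Rank concepts by the gap in mean presence score between two sets.

    scores_a and scores_b map each concept name to a list of per-image
    concept presence scores for image set A and image set B.
    """
    diffs = {c: mean(scores_a[c]) - mean(scores_b[c]) for c in scores_a}
    # Largest absolute gap first: these concepts best separate the sets.
    return sorted(diffs.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

Concepts at the top of the ranking are the ones that most distinguish the two sets, which is where an auditor would look first.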

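Model Auditing

MA-MONET: Clusters the test-set images and compares concept presence scores between low-performing and high-performing image sets to identify the medical concepts that lead to model errors.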
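
The comparison step can be sketched as below, assuming the clustering has already been done and each test image carries a cluster id. The accuracy threshold of 0.5 for calling a cluster low-performing and all function names are assumptions of this illustration, not details from the paper.

```python
from collections import defaultdict
from statistics import mean


def cluster_accuracy(cluster_ids, correct):
    """Fraction of correctly classified images within each cluster."""
    hits = defaultdict(list)
    for cid, ok in zip(cluster_ids, correct):
        hits[cid].append(1.0 if ok else 0.0)
    return {cid: mean(v) for cid, v in hits.items()}


def error_associated_concepts(cluster_ids, correct, concept_scores, threshold=0.5):
    """Rank concepts by how much higher they score in low-accuracy clusters."""
    acc = cluster_accuracy(cluster_ids, correct)
    low = {cid for cid, a in acc.items() if a < threshold}
    gaps = {}
    for concept, scores in concept_scores.items():
        low_scores = [s for cid, s in zip(cluster_ids, scores) if cid in low]
        high_scores = [s for cid, s in zip(cluster_ids, scores) if cid not in low]
        if low_scores and high_scores:
            gaps[concept] = mean(low_scores) - mean(high_scores)
    return sorted(gaps.items(), key=lambda kv: kv[1], reverse=True)
```

Concepts with a large positive gap are more present in the clusters where the audited model fails, making them candidates for what drives its errors.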

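Building Inherently Interpretable Neural Networks (Concept Bottleneck Models, CBMs)
Purpose: Create an interpretable model so that physicians or developers can understand the factors that influence the model's decisions.
Method: The concepts annotated automatically by MONET form the bottleneck layer, and a simple linear classifier is then trained on top of that layer.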
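
The bottleneck idea can be sketched with a small logistic regression trained on concept presence scores. This is a minimal pure-Python illustration; the learning rate, epoch count, and function names are assumptions, and a real setup would likely use an established library.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_linear_head(concept_rows, labels, lr=0.5, epochs=500):
    """Fit logistic regression on concept scores with batch gradient descent."""
    n = len(concept_rows)
    n_concepts = len(concept_rows[0])
    weights = [0.0] * n_concepts
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_concepts
        grad_b = 0.0
        for row, y in zip(concept_rows, labels):
            p = sigmoid(bias + sum(w * x for w, x in zip(weights, row)))
            err = p - y  # gradient of the log loss w.r.t. the logit
            for j, x in enumerate(row):
                grad_w[j] += err * x
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(row, weights, bias):
    """Predicted probability of the positive class for one image."""
    return sigmoid(bias + sum(w * x for w, x in zip(weights, row)))
```

Because each weight is attached to a named concept, the sign and size of a weight show how that concept moves the prediction, which is the source of the model's interpretability.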
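
Evaluation Setup

Prediction targets: Distinguishing malignant from benign lesions, and melanoma from its look-alike lesions.

Image types: Clinical images and dermoscopic images.

Training and testing: The evaluation is repeated over different train-test splits to verify model performance.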

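Statistical Analysis

AUROC values: Obtained from runs over different train-test splits; a paired-samples Student's t-test is used to compare MONET's performance with that of other methods.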
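
The paired t-test works on the per-split differences between two methods' AUROC values. The sketch below computes the t statistic and degrees of freedom by hand; the function name is hypothetical, and converting the statistic to a p-value would need a t-distribution table or a statistics library.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(a, b):
    """Return (t, degrees of freedom) for paired samples a and b.

    a[i] and b[i] must come from the same train-test split.
    """
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # stdev uses the sample (n - 1) denominator, as the test requires.
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1
```

Pairing by split removes the variation that comes from some splits being harder than others, so the test focuses on the consistent gap between methods.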

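Clinical Trial Evaluation

PROVE-AI study: MONET is used to replicate and evaluate the clinical trial of the ADAE algorithm and to analyze the concepts associated with low specificity.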

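Data and Code Availability

Datasets: Publicly accessible datasets are used, such as ISIC, Derm7pt, Fitzpatrick 17k, and DDI.

Code: The code used in the analysis is available on GitHub, including scripts for data collection, model training, and benchmark studies.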

Results

Automatic Concept Annotation: MONET successfully retrieves relevant clinical and dermoscopic images for various dermatological terms.
Performance Assessment: Compared favorably with supervised learning and CLIP models.
Diverse Skin Tones: MONET demonstrated consistent performance across different skin tones.
Nonclinical Concepts: Identified irrelevant artifacts that can affect AI predictions.

Data and Model Auditing

Data Auditing: MONET automatically examines datasets for irregularities, aiding in the auditing of large-scale datasets.
Model Auditing: A method called MA-MONET was developed to detect medical concepts leading to model errors.

Inherently Interpretable Models

MONET facilitates the creation of Concept Bottleneck Models (CBMs), which are inherently interpretable and allow physicians to understand factors influencing model decisions.

Real-world Application

MONET was applied to assess a clinical trial of a dermatology AI algorithm, providing insights into cases of lower specificity.

Limitations and Future Work

MONET may struggle with concepts not present in its training data.
Performance across skin tones for dermoscopic images was not examined due to dataset limitations.
MONET is not intended for diagnostic tasks and may exhibit biases present in the training data.

Conclusion

The MONET model presents a significant advancement in the transparency and interpretability of medical image AI, with potential applications in auditing, model development, and clinical deployment.

240428 Literature reading - Transparent medical image AI via an image–text foundation model grounded in medical l...