Improving the ability of AI models to explain their predictions | MIT News

In advanced settings such as medical diagnosis, users often want to know what led a computer vision model to make a particular prediction, so they can decide whether to trust its output.

Concept bottleneck modeling is one method that enables artificial intelligence systems to describe their decision-making process. These methods force a deep learning model to use a set of human-understandable concepts to make a prediction. In a new study, MIT researchers developed a method that builds on this approach to achieve better accuracy along with clearer, more concise explanations.

The concepts the model uses are usually defined in advance by human experts. For example, a doctor may suggest using concepts such as “clustered brown dots” and “distinct coloration” to predict whether a medical image shows melanoma.

But predefined concepts may be irrelevant or not detailed enough for a particular task, reducing the accuracy of the model. The new approach instead extracts concepts the model has already learned while being trained for that task, and forces the model to use those concepts, producing better explanations than conventional concept bottleneck models.

The approach uses a pair of specialized machine learning models that automatically extract information from a target model and translate it into simple language concepts. Ultimately, the technique can turn any pre-trained computer vision model into one that uses concepts to explain its thinking.

“In a way, we want to be able to read the minds of these computer vision models. The concept bottleneck model is one way for users to see what the model thinks and why it makes a certain prediction. Because our method uses better concepts, it can lead to higher accuracy and ultimately improve the accountability of black-box AI models,” said lead author Antonio De Santis, a graduate student at the Polytechnic University of Milan and the Computer Science and Artificial Intelligence Laboratory (CSAIL) at MIT.

De Santis is joined on the paper by Schrasing Tong SM ’20, PhD ’26; Marco Brambilla, professor of computer science and engineering at the Polytechnic University of Milan; and senior author Lalana Kagal, principal research scientist at CSAIL. The research will be presented at the International Conference on Learning Representations.

Building a better bottleneck

Concept bottleneck models (CBMs) are a popular way to improve AI interpretability. These techniques add an intermediate step by forcing a computer vision model to first predict the concepts present in an image, and then use those concepts to make the final prediction.

This intermediate step, or “bottleneck,” helps users understand the model’s reasoning.

For example, a bird species identification model might predict the concepts “yellow legs” and “blue wings” before predicting that an image shows a barn swallow.
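The two-stage design can be seen in a minimal sketch. This is a hypothetical illustration, not the researchers' implementation: image features are mapped to concept scores, and the final class prediction is computed from the concept scores alone, so every prediction can be traced back to concepts. All dimensions and weights here are made-up stand-ins for trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 512-dim image features, 8 concepts, 4 bird species.
n_features, n_concepts, n_classes = 512, 8, 4

# Random weights stand in for a trained model.
W_concept = rng.normal(size=(n_features, n_concepts)) * 0.01
W_label = rng.normal(size=(n_concepts, n_classes)) * 0.01

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cbm_predict(features):
    """Two-stage prediction: features -> concept scores -> class logits."""
    concept_scores = sigmoid(features @ W_concept)  # the "bottleneck"
    logits = concept_scores @ W_label               # final prediction uses ONLY the concepts
    return concept_scores, logits

features = rng.normal(size=(1, n_features))
concepts, logits = cbm_predict(features)
print(concepts.shape, logits.shape)
```

Because the class logits depend only on `concept_scores`, inspecting those scores tells a user exactly which concepts drove the prediction.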

But because these concepts are often predefined by humans or by large language models (LLMs), they may not fit a particular task. Furthermore, even when given a set of predefined concepts, the model sometimes encodes extra, unintended information alongside them, a problem known as information leakage.

“These models are trained to maximize efficiency, so the model may be using concepts that we are not aware of,” De Santis explained.

The MIT researchers had a different idea: since the model was trained on a large amount of data, it may have already learned the concepts needed to make accurate predictions for a specific task. They set out to create a CBM by extracting this existing information and converting it into human-understandable text.

In the first step of their method, a special deep learning model called a sparse autoencoder identifies the most relevant features inside the trained model and distills them into a small number of concepts. Then a multimodal LLM describes each concept in simple language.
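A sparse autoencoder of the kind described above can be sketched roughly as follows. This is only an illustration under assumed details: a simple top-k sparsity scheme, made-up dimensions, and random weights standing in for trained ones; the paper's exact architecture is not specified here.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical sizes: 256-dim model activations expanded into 1024 candidate features.
d_model, d_sparse, k = 256, 1024, 16

W_enc = rng.normal(size=(d_model, d_sparse)) * 0.05
W_dec = rng.normal(size=(d_sparse, d_model)) * 0.05

def sae_forward(acts):
    """Encode activations into a sparse code, then reconstruct them."""
    code = np.maximum(acts @ W_enc, 0.0)       # ReLU encoder: non-negative feature activations
    drop = np.argsort(code, axis=-1)[:, :-k]   # indices of everything but the k largest
    sparse = code.copy()
    np.put_along_axis(sparse, drop, 0.0, axis=-1)  # keep only the top-k features per example
    recon = sparse @ W_dec                     # decoder reconstructs the original activations
    return sparse, recon

acts = rng.normal(size=(3, d_model))
sparse, recon = sae_forward(acts)
print((sparse != 0).sum(axis=-1))  # at most k active features per example
```

The few features that survive the sparsity constraint are the candidates that get handed to the LLM for naming.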

The multimodal LLM then annotates the images in the dataset, identifying which concepts are present or absent in each one. The researchers use this annotated dataset to train a concept bottleneck module to recognize the concepts.

They integrate this module into the target model, forcing it to make predictions using only the set of learned concepts the researchers extracted.
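The training step for the concept bottleneck module can be illustrated with a toy version: given frozen features from the target model and binary concept annotations (synthesized here in place of the LLM's labels), one logistic head per concept is fit with gradient descent. Everything below is a hypothetical stand-in, not the authors' code.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic stand-in for the annotation dataset: frozen target-model features
# plus binary present/absent concept labels (as a multimodal LLM would produce).
n_images, n_features, n_concepts = 200, 64, 8
features = rng.normal(size=(n_images, n_features))
true_W = rng.normal(size=(n_features, n_concepts))
annotations = (features @ true_W > 0).astype(float)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-np.clip(x, -60, 60)))

# Fit the concept bottleneck module: one logistic head per concept,
# trained with full-batch gradient descent on binary cross-entropy.
W = np.zeros((n_features, n_concepts))
lr = 0.1
for _ in range(300):
    probs = sigmoid(features @ W)
    grad = features.T @ (probs - annotations) / n_images  # BCE gradient
    W -= lr * grad

accuracy = ((sigmoid(features @ W) > 0.5) == annotations).mean()
print(f"concept prediction accuracy: {accuracy:.2f}")
```

Once trained, this module replaces the target model's final layers, so the downstream classifier sees only the predicted concepts.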

Mind control

They overcame many challenges as they developed the method, from verifying that the LLM had annotated concepts correctly to determining whether the sparse autoencoder had identified human-understandable concepts.

To prevent the model from using unknown or unnecessary concepts, they limit it to using at most five concepts for each prediction. This forces the model to select the most relevant concepts and keeps the explanations easy to understand.
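The five-concept limit can be implemented as a simple top-k mask over the concept scores before the final prediction. This sketch assumes scores in [0, 1] and is only illustrative, not the paper's exact mechanism.

```python
import numpy as np

def select_top_concepts(concept_scores, k=5):
    """Zero out all but the k highest-scoring concepts for each example."""
    drop_idx = np.argsort(concept_scores, axis=-1)[:, :-k]  # all but the k largest
    masked = concept_scores.copy()
    np.put_along_axis(masked, drop_idx, 0.0, axis=-1)
    return masked

# Eight concept scores for one image; only the five largest survive the mask.
scores = np.array([[0.9, 0.1, 0.7, 0.4, 0.8, 0.2, 0.6, 0.3]])
masked = select_top_concepts(scores, k=5)
print(masked)  # [[0.9 0.  0.7 0.4 0.8 0.  0.6 0. ]]
```

Feeding `masked` rather than the full score vector into the classifier guarantees that every explanation mentions at most five concepts.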

When the researchers compared their method with state-of-the-art CBMs on tasks such as predicting bird species and identifying skin lesions in medical images, it achieved significantly higher accuracy while providing clearer explanations.

Their method also generated concepts that better matched the images in the dataset.

“We showed that extracting concepts from the original model can outperform other CBMs, but there is still a trade-off between interpretability and accuracy that needs to be addressed. Uninterpretable black-box models still outperform ours,” said De Santis.

In the future, the researchers want to study possible solutions to the problem of information leakage, perhaps by adding more concept modules so that unintended concepts cannot slip through. They also plan to scale up their method by using a larger multimodal LLM to describe a larger training dataset, which could improve performance.

“I am excited about this work because it pushes explainable AI in a very promising direction and creates a natural bridge to symbolic AI and knowledge graphs,” said Andreas Hotho, professor and head of the Chair of Data Science at the University of Würzburg, who was not involved in the work. “By grounding concept constraints in the model’s internal representations rather than relying solely on human-defined concepts, it provides a path to more faithful model explanations and opens up opportunities for follow-up work with symbolic information.”

This research was supported by the Progetto Rocca Doctoral Fellowship, the Italian Ministry of University and Research under the National Recovery and Resilience Plan, Thales Alenia Space, and the European Union under the NextGenerationEU project.
