AI models may be using “demographic shortcuts” when making medical diagnostic evaluations

Artificial intelligence models often play a role in medical diagnoses, especially when it comes to analyzing images such as X-rays. However, studies have found that these models don't always perform well across all demographic groups, usually faring worse on women and people of color.

These models have also been shown to develop some surprising abilities. In 2022, MIT researchers reported that AI models can make accurate predictions about a patient's race from their chest X-rays, something that the most skilled radiologists can't do.

That research team has now found that the models that are most accurate at making demographic predictions also show the biggest "fairness gaps," that is, discrepancies in their ability to accurately diagnose images of people of different races or genders. The findings suggest that these models may be using "demographic shortcuts" when making their diagnostic evaluations, which lead to incorrect results for women, Black people, and other groups, the researchers say.

"It's well-established that high-capacity machine-learning models are good predictors of human demographics such as self-reported race or sex or age. This paper re-demonstrates that capacity, and then links that capacity to the lack of performance across different groups, which has never been done," says Marzyeh Ghassemi, an MIT associate professor of electrical engineering and computer science, a member of MIT's Institute for Medical Engineering and Science, and the senior author of the study.

The researchers also found that they could retrain the models in a way that improves their fairness. However, their approach to "debiasing" worked best when the models were tested on the same types of patients they were trained on, such as patients from the same hospital. When these models were applied to patients from different hospitals, the fairness gaps reappeared.

"I think the main takeaways are, first, you should thoroughly evaluate any external models on your own data because any fairness guarantees that model developers provide on their training data may not transfer to your population. Second, whenever sufficient data is available, you should train models on your own data."

Haoran Zhang, MIT graduate student and one of the lead authors of the new paper

MIT graduate student Yuzhe Yang is also a lead author of the paper, which will appear in Nature Medicine. Judy Gichoya, an associate professor of radiology and imaging sciences at Emory University School of Medicine, and Dina Katabi, the Thuan and Nicole Pham Professor of Electrical Engineering and Computer Science at MIT, are also authors of the paper.

Removing bias

As of May 2024, the FDA has approved 882 AI-enabled medical devices, with 671 of them designed to be used in radiology. Since 2022, when Ghassemi and her colleagues showed that these diagnostic models can accurately predict race, they and other researchers have shown that such models are also very good at predicting gender and age, even though the models are not trained on those tasks.

"Many popular machine learning models have superhuman demographic prediction capacity -; radiologists cannot detect self-reported race from a chest X-ray," Ghassemi says. "These are models that are good at predicting disease, but during training are learning to predict other things that may not be desirable." In this study, the researchers set out to explore why these models don't work as well for certain groups. In particular, they wanted to see if the models were using demographic shortcuts to make predictions that ended up being less accurate for some groups. These shortcuts can arise in AI models when they use demographic attributes to determine whether a medical condition is present, instead of relying on other features of the images.

Using publicly available chest X-ray datasets from Beth Israel Deaconess Medical Center in Boston, the researchers trained models to predict whether patients had one of three different medical conditions: fluid buildup in the lungs, collapsed lung, or enlargement of the heart. Then, they tested the models on X-rays that were held out from the training data.
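As a rough illustration of that setup, the sketch below trains a multi-label image classifier for the three findings in PyTorch. The synthetic tensors standing in for X-rays, the ResNet backbone, and the hyperparameters are all illustrative assumptions, not the study's reported configuration.

# Minimal sketch of the kind of classifier described above: a multi-label
# image model predicting three findings (fluid buildup, collapsed lung,
# enlarged heart). Synthetic tensors stand in for the real chest X-rays.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from torchvision.models import resnet18

images = torch.randn(256, 3, 224, 224)            # placeholder for X-ray images
labels = torch.randint(0, 2, (256, 3)).float()    # one column per finding
loader = DataLoader(TensorDataset(images, labels), batch_size=32, shuffle=True)

model = resnet18(weights=None)
model.fc = nn.Linear(model.fc.in_features, 3)     # three findings, multi-label
criterion = nn.BCEWithLogitsLoss()                # findings can co-occur
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

model.train()
for epoch in range(2):                            # a real run would train far longer
    for x, y in loader:
        optimizer.zero_grad()
        loss = criterion(model(x), y)
        loss.backward()
        optimizer.step()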

Overall, the models performed well, but most of them displayed "fairness gaps," that is, discrepancies between accuracy rates for men and women, and for white and Black patients.

The models were also able to predict the gender, race, and age of the X-ray subjects. Additionally, there was a significant correlation between each model's accuracy in making demographic predictions and the size of its fairness gap. This suggests that the models may be using demographic categorizations as a shortcut to make their disease predictions.
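Measuring such a gap is straightforward once subgroup labels are available: compute the same performance metric separately for each group and compare. The sketch below uses scikit-learn's AUROC on synthetic arrays as one plausible way to do this; the paper's exact metric and gap definition may differ.

# Hedged sketch: a per-group "fairness gap" computed as the difference in
# AUROC between subgroups. The arrays are synthetic stand-ins for real model
# scores and patient metadata.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)             # ground-truth finding labels
y_score = rng.random(1000)                         # model's predicted probabilities
group = rng.choice(["female", "male"], size=1000)  # self-reported demographic attribute

auc_by_group = {
    g: roc_auc_score(y_true[group == g], y_score[group == g])
    for g in np.unique(group)
}
fairness_gap = max(auc_by_group.values()) - min(auc_by_group.values())
print(auc_by_group, f"gap = {fairness_gap:.3f}")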

The researchers then tried to reduce the fairness gaps using two types of strategies. For one set of models, they trained them to optimize "subgroup robustness," meaning that the models are rewarded for better performance on the subgroup where they perform worst, and penalized if the error rate for one group is higher than for the others.
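One common way to implement a subgroup-robust objective of this kind is to minimize the loss of whichever group is currently worst off, in the spirit of group distributionally robust optimization. The sketch below is one illustrative formulation, not necessarily the one used in the study.

# Illustrative worst-group ("subgroup robustness") objective: instead of
# minimizing the average loss, minimize the loss of whichever demographic
# group currently has the highest loss in the batch.
import torch
import torch.nn.functional as F

def worst_group_loss(logits, targets, group_ids):
    """Return the largest per-group mean BCE loss; targets are float labels."""
    per_sample = F.binary_cross_entropy_with_logits(
        logits, targets, reduction="none"
    ).mean(dim=1)                                   # one loss value per sample
    group_losses = [
        per_sample[group_ids == g].mean() for g in group_ids.unique()
    ]
    return torch.stack(group_losses).max()          # penalize the worst group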

In another set of models, the researchers forced them to remove any demographic information from the images, using "group adversarial" approaches. Both of these strategies worked fairly well, the researchers found.
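A typical group-adversarial construction attaches a small adversary head that tries to predict the demographic attribute from the model's internal features, while a gradient-reversal layer trains the shared encoder to defeat it, pushing demographic information out of the representation. The sketch below shows that standard construction under those assumptions; it is not a description of the paper's exact method.

# Hedged sketch of a group-adversarial setup using gradient reversal: an
# adversary head tries to predict the demographic group from shared features,
# and the reversed gradient discourages the encoder from keeping that signal.
import torch
import torch.nn as nn

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lamb):
        ctx.lamb = lamb
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.lamb * grad_output, None        # flip the gradient sign

class AdversarialDebiaser(nn.Module):
    def __init__(self, feat_dim=512, n_findings=3, n_groups=2, lamb=1.0):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.LazyLinear(feat_dim), nn.ReLU())
        self.disease_head = nn.Linear(feat_dim, n_findings)
        self.group_head = nn.Linear(feat_dim, n_groups)
        self.lamb = lamb

    def forward(self, x):
        z = self.encoder(x)
        disease_logits = self.disease_head(z)
        group_logits = self.group_head(GradReverse.apply(z, self.lamb))
        return disease_logits, group_logits          # both heads share the encoder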

"For in-distribution data, you can use existing state-of-the-art methods to reduce fairness gaps without making significant trade-offs in overall performance," Ghassemi says. "Subgroup robustness methods force models to be sensitive to mispredicting a specific group, and group adversarial methods try to remove group information completely."

Not always fairer

However, those approaches only worked when the models were tested on data from the same types of patients that they were trained on: for example, only patients from the Beth Israel Deaconess Medical Center dataset.

When the researchers tested the models that had been "debiased" using the BIDMC data to analyze patients from five other hospital datasets, they found that the models' overall accuracy remained high, but some of them exhibited large fairness gaps.

"If you debias the model in one set of patients, that fairness does not necessarily hold as you move to a new set of patients from a different hospital in a different location," Zhang says.

This is worrisome because hospitals often use models that have been developed on data from other hospitals, especially when an off-the-shelf model is purchased, the researchers say.

"We found that even state-of-the-art models which are optimally performant in data similar to their training sets are not optimal -; that is, they do not make the best trade-off between overall and subgroup performance -; in novel settings," Ghassemi says. "Unfortunately, this is actually how a model is likely to be deployed. Most models are trained and validated with data from one hospital, or one source, and then deployed widely."

The researchers found that the models debiased using group adversarial approaches showed slightly more fairness when tested on new patient groups than those debiased with subgroup robustness methods. They now plan to develop and test additional methods to see if they can create models that do a better job of making fair predictions on new datasets.

The findings suggest that hospitals that use these types of AI models should evaluate them on their own patient population before beginning to use them, to make sure they aren't giving inaccurate results for certain groups.

The research was funded by a Google Research Scholar Award, the Robert Wood Johnson Foundation Harold Amos Medical Faculty Development Program, RSNA Health Disparities, the Lacuna Fund, the Gordon and Betty Moore Foundation, the National Institute of Biomedical Imaging and Bioengineering, and the National Heart, Lung, and Blood Institute.

Source:

Massachusetts Institute of Technology

Journal reference:

Yang, Y., et al. (2024). The limits of fair medical imaging AI in real-world generalization. Nature Medicine. doi.org/10.1038/s41591-024-03113-4.
