Man vs. Machine: AI matches third-year radiology residents at reading chest x-rays in large study

It’s possible to build artificial intelligence algorithms that can match or even exceed the performance of third-year radiology residents at reading the most common imaging exam in emergency departments.

That’s according to a large-scale study out of IBM Research and USC, published Friday in JAMA Network Open. An increasing number of imaging orders and high-resolution scanners have combined to saddle radiologists with heavy workloads in the ED. But AI can help to lighten this load by performing preliminary interpretations and targeting the most obvious clinical concerns, noted author Joy Wu and colleagues.

The research team tested this idea by creating an algorithm to estimate findings on frontal chest radiographs, trained using more than 342,000 images from multiple hospitals. Pitting the program against five physicians, Wu et al. found the system went toe-to-toe with rad residents, with no statistically significant difference in sensitivity between the two. Specificity and positive predictive value, however, were “statistically higher” for the algorithm, they concluded.

“These findings suggest that well-trained AI algorithms can reach performance levels similar to radiology residents in covering the breadth of findings in [anteroposterior] frontal chest radiographs,” Wu, a postdoctoral radiology researcher with IBM, and co-authors wrote Oct. 9. This, they added, “suggests there is the potential for the use of AI algorithms for preliminary interpretations of chest radiographs in radiology workflows to expedite radiology reads, address resource scarcity, improve overall accuracy, and reduce the cost of care.”

To reach their conclusions, IBM and the University of Southern California, Los Angeles, recruited five radiology residents from multiple institutions. They then compared the performance of the fledgling physicians with that of their AI program at interpreting a separate set of 1,998 chest x-rays.

Bottom line: The algorithm notched a mean image-based sensitivity of 0.716, while third-year radiology residents landed at 0.72. Positive predictive value, meanwhile, was 0.73 for AI versus 0.682 for physicians, while specificity was 0.98 versus 0.973, respectively. Residents seemed to perform better at locating subtle anomalies, including masses and nodules, misplaced lines and tubes, and consolidation. AI, by contrast, excelled at detecting more clearly visible findings such as pleural effusion and pulmonary edema.
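For readers less familiar with the three measures reported above, the short Python sketch below shows how sensitivity, specificity and positive predictive value are conventionally derived from counts of true and false positives and negatives. The `ConfusionCounts` container, the function names and the example counts are all made up for illustration; they are not the study's data or code, and the sketch does not reproduce the image-based averaging the authors describe.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    """Hypothetical tally of a reader's calls against ground truth."""
    tp: int  # finding present and flagged
    fp: int  # finding absent but flagged
    tn: int  # finding absent and not flagged
    fn: int  # finding present but missed


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN rather than raising when a metric is undefined.
    return numerator / denominator if denominator else float("nan")


def sensitivity(c: ConfusionCounts) -> float:
    # Share of real findings that the reader caught.
    return _ratio(c.tp, c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Share of absent findings that the reader correctly left unflagged.
    return _ratio(c.tn, c.tn + c.fp)


def positive_predictive_value(c: ConfusionCounts) -> float:
    # Share of flagged findings that were actually present.
    return _ratio(c.tp, c.tp + c.fp)


# Invented counts for illustration only; not taken from the study.
example = ConfusionCounts(tp=72, fp=28, tn=872, fn=28)

if __name__ == "__main__":
    print(f"sensitivity: {sensitivity(example):.3f}")
    print(f"specificity: {specificity(example):.3f}")
    print(f"PPV:         {positive_predictive_value(example):.3f}")
```

Seen this way, a reader can match another on sensitivity yet still post a higher positive predictive value simply by raising fewer false alarms, which is the pattern the article reports for the algorithm.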

“Overall, this study points to the potential use [of] AI systems in future radiology workflows for preliminary interpretations that target the most prevalent findings, leaving the final reads performed by the attending physician to still catch any potential misses from the less-prevalent, fine-grained findings,” the research team concluded. “Having attending physicians quickly correct the automatically produced reads, we can expect to significantly expedite current dictation-driven radiology workflows, improve accuracy and ultimately reduce the overall cost of care.”

Read more about the research project in JAMA Network Open.