Q&A: AI will transform radiology—not by replacing radiologists, but by catching their mistakes in real time

Sponsored by vRad

vRad is the world’s largest teleradiology provider, with more than 500 general and subspecialty-trained teleradiologists who read up to 20,000 exams every day. One of the primary drivers behind the company’s success is its continued investment in AI technology. What can this state-of-the-art technology do for radiologists? How can it improve patient care? These are just some of the questions vRad’s team considers on a daily basis.

vRad launched a pilot program in 2020 to investigate the use of AI as a quality assurance (QA) tool. The practice sees significant potential in this field as a way to revolutionize radiology by catching misses before patient care is affected.

Chief Medical Officer Benjamin W. Strong, MD, has been leading vRad’s AI efforts since the beginning. He sat down with us for an extensive interview, addressing the current landscape of AI in radiology and highlighting why QA is one of the company’s biggest priorities.

The full conversation can be read below:

Tell me about the importance of AI in the context of QA. Why is this so important for radiologists going forward?

Using AI for QA would provide a backstop or safety net for human error. You’re not changing anything when you develop QA models—you’re adding value to the process.

It’s understandably difficult for some radiologists to acknowledge, but every radiologist does make mistakes. If a radiologist doesn’t know about those mistakes, it is because his or her work isn’t being robustly overread. Any mistake is simply going unnoticed.

Multiple studies have backed up this very point. Wu et al., for example, reported in Radiology a major discrepancy rate of 2.4% when reading CT studies. And in the Journal of the American College of Radiology, Wong et al. looked at more than 124,000 exams interpreted by teleradiologists, identifying an individual discordant rate that ranged from 0.70% to 1.41%. Another study by Soffa et al., also in the Journal of the American College of Radiology, found that a team of 26 radiologists had a general disagreement rate of 3.4%.

Against that backdrop, our own radiologists have their work overread approximately 20% of the time, with the local radiologist reviewing our preliminary report from the night before. Any discrepancy between the original and final report comes back to us through our QA system. Our radiologists have error rates well below those in the published literature. In fact, their rate of significant errors is 1.2 or 1.3 for every 1,000 studies they read, or 0.12% to 0.13%. There are many reasons for this, but that is for another article. Consider that vRad’s specialists read 15,000 to 20,000 studies each day, and you can see why QA would be so valuable on a daily basis.

In addition, QA data—and other similar types of data—help inform our decisions about which pathologies our AI models should address. So not only do we know that, yes, errors occur—we also know which errors are made most frequently and which errors are most likely to result in an unsatisfactory patient outcome. Our pilot program already includes AI models for critical pathologies such as aortic dissection, pulmonary embolism and pneumoperitoneum. We’re also working on models that cover superior mesenteric artery occlusion and epidural abscess. The decision was made to start with those specific pathologies for good reason: our data showed they were the most common life-threatening conditions that were regularly associated with significant, correctable mistakes.

Historically, many radiology QA programs have only reviewed 1–5% of radiology reports. But with these life-threatening conditions—areas such as aortic dissection, pulmonary embolism and pneumoperitoneum—the QA models we’re piloting mean that the overread rate could be 100% for studies that potentially contain those pathologies. That’s going to catch a lot of errors that would have otherwise been missed.

How do radiologists and health systems react when they learn something has been identified that they may have missed?

Radiologists are universally thankful. Many of them, when they see what our pilot QA models have caught, have gone so far as to call me to thank us personally. They feel like we really have their backs.

When it’s something in an emergency room or the hospital, they’re typically too busy to say much of anything. We did have one ER doctor I’ll never forget. I called him with a really critical finding and was quite worried about the patient. This doctor was so interested in our AI model that we couldn’t get to all of his questions, because he needed to take care of his patient. It showed that physicians recognize what our AI models can do for them: catch issues quickly enough to help patients, not two weeks later when it’s too late.

You have extensive experience working with AI. What lessons have you and your colleagues learned over the years?

We have learned a lot during these last several years. For instance, we know that AI, even in its relatively advanced state, is still most effective when applied to a binary question. Something like, “Is there an aortic dissection or is there not an aortic dissection?” People who expect AI to come to a diagnostic conclusion (“this is sarcoidosis,” for example) are going to be disappointed.

Within that binary question, however, an AI model is capable of extraordinary subtlety. I initially thought AI would only be capable of finding large or clearly visible abnormalities, but that is not the case. These models are capable of exquisite sensitivity in detecting imaging findings that even a focused, highly competent radiologist would find difficult.

We have also learned that it is best to take a purposefully careful approach to development as we continue to learn the nuances of what AI means for the clinical setting. For instance, if an AI model identifies a pulmonary embolism immediately, before a radiologist has even read the exam, should you tell the radiologist right away? Or could that distract them and make them not look as closely as they typically would, potentially missing another important finding? We just don’t know the answers to these questions yet, because they haven’t been properly studied. It’s one reason we are so interested in developing AI as a QA tool right now: QA won’t create the potential for bias while the radiologist is performing their read.

What kind of impact do you see AI having on radiologist workflow?

I see a lot of potential down the road for AI to really affect radiologist workflow. It will be able to do things like automatically capture measurements and add those numbers directly to the radiology report. In the future, radiology workflow could be radically different due to AI. Those types of changes are definitely coming.

However, for now, we’re more focused on the life-threatening situations. We want to save lives by preventing the misdiagnosis of serious conditions such as aortic dissection or pulmonary embolism. That remains our No. 1 priority.

To see how vRad AI models are accelerating care delivery today for specific critical pathologies like intracranial hemorrhage (ICH), pulmonary embolism, pneumoperitoneum, and pneumothorax, visit the Radiology AI page at vRad.com.

Michael Walter, Managing Editor

Michael has more than 16 years of experience as a professional writer and editor. He has written at length about cardiology, radiology, artificial intelligence and other key healthcare topics.