AI classifies breast masses found on ultrasound as well as radiologists

A convolutional neural network (CNN) can differentiate between benign and malignant breast masses on ultrasound images with high accuracy, according to new research published in the Japanese Journal of Radiology. The CNN model was found to perform as well as, or even better than, human radiologists.

For training data, the authors gathered images from 235 adult patients who underwent breast ultrasound at a single institution from January 2010 to December 2017. The mean patient age was 55 years. Of the masses in the training set, 96 were benign and 144 were malignant.

The CNN was then tested on images featuring 48 benign masses and 72 malignant masses. Three radiologists also read the images. Overall, the CNN model’s sensitivity (0.958), specificity (0.875) and accuracy (0.925) were higher than those of the three radiologists, though one radiologist had a sensitivity (0.917) that came close to matching the model.
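For readers who want to check how these three figures relate, here is a minimal Python sketch. The test-set sizes (72 malignant, 48 benign) come from the study as reported above; the counts of correctly classified masses (69 and 42) are an assumption, back-calculated from the reported rates rather than stated in the article.

```python
# Test-set sizes as reported in the article.
MALIGNANT = 72
BENIGN = 48

# Assumed counts, back-calculated from the reported rates (not stated in the article):
# 69/72 rounds to 0.958 and 42/48 equals 0.875.
TRUE_POSITIVES = 69   # malignant masses the CNN called malignant
TRUE_NEGATIVES = 42   # benign masses the CNN called benign


def diagnostic_metrics(tp, tn, n_pos, n_neg):
    """Return (sensitivity, specificity, accuracy) from raw counts."""
    sensitivity = tp / n_pos                   # share of malignant masses caught
    specificity = tn / n_neg                   # share of benign masses cleared
    accuracy = (tp + tn) / (n_pos + n_neg)     # share of all masses classified correctly
    return sensitivity, specificity, accuracy


sens, spec, acc = diagnostic_metrics(TRUE_POSITIVES, TRUE_NEGATIVES, MALIGNANT, BENIGN)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} accuracy={acc:.3f}")
```

Under those assumed counts, the three reported values are mutually consistent: 111 of 120 masses classified correctly gives the stated accuracy of 0.925.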

The CNN model’s area under the ROC curve (0.913) was higher than that of a radiologist who had four years of experience as a breast imager and “comparable” to the performance of the other two radiologists, who had more experience.
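The area under the ROC curve can be read as the probability that a randomly chosen malignant mass receives a higher malignancy score than a randomly chosen benign one. The Python sketch below illustrates that pairwise definition. The scores are hypothetical values invented for illustration, not data from the study, and the function name `auc_from_scores` is likewise an assumption; the article does not say how the authors computed their AUC.

```python
def auc_from_scores(pos_scores, neg_scores):
    """Pairwise AUC: fraction of (malignant, benign) pairs ranked correctly.

    Ties between a malignant and a benign score count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical malignancy scores, for illustration only (not study data).
malignant_scores = [0.9, 0.8, 0.7, 0.4]
benign_scores = [0.6, 0.3, 0.2, 0.1]

auc = auc_from_scores(malignant_scores, benign_scores)
print(f"AUC = {auc}")
```

In this toy example, 15 of the 16 malignant-benign pairs are ranked correctly, so the AUC is 0.9375. An AUC of 0.5 would mean the scores are no better than chance at separating the two groups.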

“In our study, we demonstrated that breast masses could be differentiated using ultrasound using a deep learning method with a CNN model, using deep learning with multiple hidden layers,” wrote author Tomoyuki Fujioka, MD, of the department of radiology at Tokyo Medical and Dental University in Japan, and colleagues. “We found that the CNN model was helpful in distinguishing between benign and malignant masses and showed good diagnostic performance.”

The authors also noted that the AI interpreted studies much faster than the radiologists. The radiologists took 9.5 to 17 seconds per case, while the CNN read each case within 1 second.

Fujioka and colleagues observed that the CNN model does not necessarily “think” like a radiologist. For instance, it assigned masses to BI-RADS categories 2, 3 or 5 more frequently than the radiologists did, and the interobserver agreement among the human radiologists was greater than any correlation between the AI and the radiologists.

“We must assume that the CNN models and radiologists find and evaluate completely different aspects of the images,” the authors wrote. “Human thought levels are limited to several dimensions. Conversely, existing deep learning technology can refer ‘thoughts’ up to hundreds of dimensions. This is known as ‘black box problem,’ in which it is impossible for us to understand the process of how deep learning has reached any particular answer and the cause of false positive or negative.”

The researchers did note that their study had limitations. For example, the results came from a single institution, meaning larger studies from multiple facilities are still necessary. Also, the images used for the research were converted to 256 x 256 pixels, which could have resulted in “a loss of information” and impacted the CNN model’s diagnostic performance.