Collective intelligence: 15 heads are better than one for mammography interpretation

Radiologists using collective intelligence (CI) methods consistently produce more accurate breast cancer diagnoses than a single radiologist, according to recent research published in PLOS ONE.

Max Wolf, of the Leibniz Institute of Freshwater Ecology and Inland Fisheries in Berlin, and colleagues conducted the study, asking groups of as many as 15 radiologists to analyze a mammogram and classify it as “recall” or “no recall.”

Then, three common CI methods—majority, quorum, and weighted quorum—were put to the test.

For majority CI, a patient was recalled when a majority of the group’s assessments were “recall.” For quorum CI, a patient was recalled when the fraction of “recall” assessments was higher than a pre-established quorum threshold. Weighted quorum CI is similar to quorum, with the votes of the radiologists weighted according to their own accuracy.
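The three rules can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' code: the function names, the representation of a vote as True for "recall," and the normalization of weights by their sum are all assumptions, since the article does not say how the weights were derived or applied.

```python
from typing import Sequence


def majority_rule(votes: Sequence[bool]) -> bool:
    """Recall when more than half of the assessments are 'recall' (True)."""
    if not votes:
        raise ValueError("at least one vote is required")
    return sum(votes) > len(votes) / 2


def quorum_rule(votes: Sequence[bool], threshold: float) -> bool:
    """Recall when the fraction of 'recall' votes is higher than the threshold."""
    if not votes:
        raise ValueError("at least one vote is required")
    return sum(votes) / len(votes) > threshold


def weighted_quorum_rule(
    votes: Sequence[bool], weights: Sequence[float], threshold: float
) -> bool:
    """Quorum rule in which each vote counts in proportion to its weight.

    Assumption: weights are non-negative accuracy scores, one per reader,
    and the weighted 'recall' share is compared against the threshold.
    """
    if not votes or len(votes) != len(weights):
        raise ValueError("votes and weights must be non-empty and equal in length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    recall_weight = sum(w for v, w in zip(votes, weights) if v)
    return recall_weight / total > threshold
```

With three readers voting recall, no recall, recall, the majority rule recalls the patient, while a quorum rule with a threshold of 0.7 does not, because two of three votes is a fraction of about 0.67.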

The resulting diagnosis of each group was then compared to the single radiologist in the group who was determined to be the most accurate. And across the board, these CI methods resulted in more accurate diagnoses than the single radiologist.

“We found that, compared to single radiologists, any of these CI-rules both increases true positives (i.e., recalls of patients with cancer) and decreases false positives (i.e., recalls of patients without cancer), thereby overcoming one of the fundamental limitations to decision accuracy that individual radiologists face,” the authors wrote.

For all three CI rules, the true positive rate, false positive rate, and accuracy rates were consistently equal to or better than a single radiologist’s performance for groups of three, five, seven, nine, 11, 13 or 15 radiologists. Even-numbered groups were avoided to ensure there would be no “ties.”

The authors compared their findings with common practice in the U.S., in which a single radiologist works alongside computer-aided detection (CAD) to reach a final diagnosis.

“Compared to single reading without CAD, this practice generally increases true positives while also increasing false positives,” the authors wrote. “In contrast, our findings suggest that any of the three CI-rules can increase true positives and decrease false positives simultaneously.”

And while as many as 15 opinions were often used, Wolf et al. noticed a distinct pattern in the effect of group size on accuracy.

“Interestingly, gains achieved from larger group sizes level off around a group size of nine, after which adding more radiologists only has a marginal effect,” the authors wrote. “We stress that even relatively small group sizes can achieve substantial performance improvements.”

Using CI rules does take more time, energy and resources, but Wolf and his colleagues believe using the methodologies in some form or another is still better than not using them at all.

“Of course, viewing time of specialists is costly and has to be taken into account,” the authors wrote. “In fact, a substantial proportion of mammograms may be unambiguous and may thus not require more than two independent assessments. In such cases, one may envisage a decision tree in which a mammogram first gets assessed independently by two radiologists, and only in cases of disagreements is it evaluated by using the above CI-rules.”
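The decision tree the authors envisage might look something like the following sketch. It is hypothetical: the function and parameter names are invented for illustration, the majority rule stands in for whichever CI rule is chosen, and the article does not say whether the first two readers would also sit on the larger panel.

```python
from typing import Callable, Sequence


def majority_rule(votes: Sequence[bool]) -> bool:
    """Recall when more than half of the panel votes 'recall' (True)."""
    return sum(votes) > len(votes) / 2


def two_stage_decision(
    first_read: bool,
    second_read: bool,
    get_panel_votes: Callable[[], Sequence[bool]],
    ci_rule: Callable[[Sequence[bool]], bool] = majority_rule,
) -> bool:
    """Return True for 'recall', False for 'no recall'.

    Stage 1: two independent reads; if they agree, that is the decision.
    Stage 2: only on disagreement, collect panel votes and apply a CI rule.
    The panel is requested lazily so no extra viewing time is spent
    on mammograms the first two readers agree on.
    """
    if first_read == second_read:
        return first_read
    return ci_rule(get_panel_votes())
```

Passing the panel as a callable reflects the cost argument in the quote: the additional radiologists are consulted only when the first two disagree.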