Beyond the Hype - Bringing AI to Radiology

May 12, 2020

For radiologists, detecting abnormalities in medical images can be as difficult as finding a needle in a haystack. It can take hours for a radiologist to go through a single scan, and only a limited number of medical professionals are trained in these tasks. The result is extremely long wait times for patients to receive test results.

In recent years, our society has gradually incorporated artificial intelligence (AI) into many aspects of our lives, from Siri to autonomous vehicles. If we can rely on AI for facial recognition in photos, then why not use AI for disease detection in medical images? Many researchers have set out on this path, developing AI algorithms that screen for diseases, assist radiologists with diagnostics, or both. Promising headlines about this type of research are not uncommon: “Google AI Beats Doctors at Breast Cancer Detection—Sometimes.” “Doctors are using AI to triage covid-19 patients.” The list goes on and on. While such projects are a step in the right direction, they do not tell the whole story.

My experiences working on two such projects have shown me how much further we have to go to make the integration of AI and medicine a reality. Starting in the summer of 2017, I worked with the University of Washington’s Ubiquitous Computing Lab on developing an automated AI-based hematological (blood-related) disease screening system. My research focused on designing a low-cost microscope attachment for mobile-phone cameras to capture blood cell images from a standard blood smear, and on developing a fully automated image segmentation and classification system to screen for diseases in minutes rather than days. The system makes hematological disease screening as simple as taking a picture of a blood smear with any mobile phone!

Presenting my research at the 2019 Regeneron Science Talent Search.

In the summer of 2018, I worked with the Visual Attention Lab (affiliated with Harvard Medical School and Brigham & Women’s Hospital). The goal of my research was to develop deep learning models that can emulate the ability of expert radiologists to detect lung cancer from hidden “gist” signals, which seem to indicate the presence of cancer even before the cancer is visible in the scan. I developed an AI-based model that achieved an accuracy of 97.5% in diagnosing cancer when cancer nodules were visible in the scan and 87.7% accuracy when the nodules were not visible in the CT scan (early detection). This method has the potential to proactively detect cancer even before the cancerous nodules appear in the scan, significantly increasing a patient’s chances of survival.

I approached these research projects from a computer science perspective. When we worked to bring the algorithms to the clinic, however, I quickly realized that they could not be designed without real-world medical context. For diagnostic algorithms specifically, an important consideration is whether to prioritize reducing false negatives or false positives, a tradeoff that arises in any classifier that must turn a confidence score into a yes-or-no decision. On one hand, a high false negative rate is quite dangerous because patients who have a disease are told that they do not. Consequently, those patients do not receive the care they need, and their conditions often worsen. On the other hand, a high false positive rate results in more patients needing follow-up screening, which can further overwhelm the healthcare system. False positives also put patients through unnecessary trauma, as they are told they have a disease that they do not actually have. When Google tried to implement its AI system for diabetic retinopathy screening in the clinic, it optimized for minimizing false negatives; as a result, the system refused to give results when it did not have enough confidence, and those patients were asked to visit a specialist on another day for a follow-up. These cases left both patients and nurses frustrated as their time was wasted.
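To make the tradeoff concrete, here is a minimal Python sketch. It is not taken from either of my projects or from Google's system: the scores and labels are made up for illustration, and the names `error_rates` and `triage` and the `low`/`high` cutoffs are hypothetical. It shows how moving a single decision threshold trades false negatives against false positives, and how a "refer when unsure" band pushes uncertain cases to a specialist.

```python
from dataclasses import dataclass


@dataclass
class Rates:
    false_negative_rate: float  # sick patients told they are healthy
    false_positive_rate: float  # healthy patients told they are sick


def error_rates(scores, labels, threshold):
    """Flag a scan as positive when its score is >= threshold."""
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    positives = sum(labels)
    negatives = len(labels) - positives
    return Rates(fn / positives, fp / negatives)


def triage(score, low=0.3, high=0.7):
    """Give a result only when confident; otherwise refer to a specialist."""
    if score >= high:
        return "positive"
    if score < low:
        return "negative"
    return "refer"


# Made-up model scores (illustrative only); label 1 means disease present.
scores = [0.05, 0.20, 0.35, 0.45, 0.55, 0.40, 0.60, 0.80, 0.90, 0.95]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

for threshold in (0.3, 0.5, 0.7):
    r = error_rates(scores, labels, threshold)
    print(f"threshold={threshold}: "
          f"FNR={r.false_negative_rate:.1f}, FPR={r.false_positive_rate:.1f}")

referred = sum(1 for s in scores if triage(s) == "refer")
print(f"{referred} of {len(scores)} scans referred to a specialist")
```

On this toy data, lowering the threshold removes the false negatives but flags more healthy patients, while raising it does the opposite. The referral band avoids committing to either error on borderline scans, at the cost of sending those patients for a follow-up.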

There are also deeper, structural reasons why algorithms struggle to make their way into hospitals. First, medical professionals must understand the decision-making process of the AI algorithm because they cannot diagnose a patient without proper medical justification. Currently, many AI algorithms are “black boxes” that only aim to maximize their diagnostic accuracy. If computer scientists fail to open up these “black boxes” and determine the rationale behind the AI’s decisions, then these algorithms cannot be used in practice. Second, integrating AI models into the current clinical pipeline is a logistical nightmare. Many hospitals still use outdated systems for storing and managing patient data, and the AI models would have to be deployed to these systems. Third, medical professionals would have to be trained to use these new AI tools, which is a challenge, especially as many worry that AI will take away their jobs.

All this goes to show that it is simply not enough for researchers to develop a solution in the lab and assume that it will also work in practice. Instead, experts in different disciplines must collaborate with each other to develop responsible, practical solutions. AI researchers should consult ethicists while developing algorithms to identify problems of bias and fairness, and they should consult doctors to understand the medical context in which they’re working. On the clinical side, doctors should be learning about algorithms and how to work with them as part of their medical education, and hospitals should be hiring more technologists to implement and manage these algorithms.

In recent years, the availability of datasets and powerful computers has enabled research groups to make breakthroughs in radiology, and diagnostic algorithms will continue to improve. However, we must remember that moving AI algorithms from proof of concept to practice will require a major overhaul of the healthcare system as we know it. These changes should certainly happen, but they will require collaboration across many disciplines.

This article would not have been possible without the support and advice of Dr. Milind Tambe, Gordon McKay Professor of Computer Science and Director of the Center for Research on Computation and Society at Harvard University; concurrently, he is also Director of “AI for Social Good” at Google Research India. Prof. Tambe’s research focuses on advancing AI and multiagent systems research for social good.