From self-driving cars to home assistants with voice recognition, it seems that every day some new application of artificial intelligence insinuates itself into our lives.

Although it is less evident, medical device companies have been quietly implementing artificial intelligence for over 15 years, primarily in imaging systems. From products that enhance radiology images to assist clinicians’ reviews, to those that highlight interesting regions of tissue or unusual cell populations in histopathology or hematology systems, medical devices using machine learning have been carefully marketed as improving clinical efficiency and the quality of patient outcomes, while stopping short of making diagnoses.

As consumer use of machine learning grows, innovative medical device companies are introducing FDA-approved products that use technologies similar to those employed by Facebook, Amazon and Google, and at the same time are moving from merely assistive systems to ones that make clinical suggestions. Some examples include:

  • Digital pathology systems that enable remote access to digitized patient samples (Philips WSI)
  • Systems that harness the power of image analysis in the cloud (Arterys)
  • Systems that identify features associated with known pathologies and thus provide indications for diagnosis (Quantitative Insights)

The obvious difference between medical devices and consumer products is the regulatory environment.

The regulatory environment for machine learning applications in medical devices

In its 2012 guidance on Computer-Assisted Detection Devices in Radiology, the FDA distinguished between Computer-Assisted Detection (CADe) devices and Computer-Assisted Diagnosis (CADx) devices, based on their intended use (not on the algorithms or other computational techniques that they might employ).

  • A CADe device is intended to identify, mark, highlight, or in any other manner direct attention to an area of interest on an image for clinician review.1
  • A CADx device is intended to provide an assessment of disease or other conditions in terms of the likelihood of the presence or absence of disease, or is intended to specify disease type (i.e., specific diagnosis or differential diagnosis), severity, stage, or intervention recommended.1

From a product design perspective, both CADe and CADx devices may use similar machine learning algorithms, but they will differ in how the results of the algorithms are presented to the operator, and how the operator is directed to use those results.
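To make the distinction concrete, the sketch below shows how the same underlying model score might be surfaced in a CADe style (drawing attention to a region) versus a CADx style (reporting a likelihood that informs a diagnosis). The function names, fields and threshold are illustrative assumptions only and are not drawn from any particular device or guidance.

```python
# Hypothetical sketch: one classifier output, two intended-use presentations.

def present_cade(region, score, threshold=0.5):
    """CADe-style output: direct attention to a region, without assessing disease."""
    if score >= threshold:
        return {"highlight": region,
                "message": "Region of interest flagged for clinician review"}
    return None  # below threshold: nothing is marked

def present_cadx(region, score):
    """CADx-style output: report a likelihood that supports a diagnosis."""
    return {"region": region,
            "likelihood_of_malignancy": round(score, 2),
            "message": "Assessment provided to inform diagnosis"}

# One model output (a bounding box and a probability), presented two ways.
model_output = {"region": (120, 80, 40, 40), "score": 0.87}
print(present_cade(model_output["region"], model_output["score"]))
print(present_cadx(model_output["region"], model_output["score"]))
```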

CADe devices offer clinicians modest productivity gains by enhancing and highlighting areas of interest, but because every case still requires clinician review, those gains are inherently limited. The promise of future CADx devices is that they can make a diagnosis and thereby eliminate the need for clinician review except in the most ambiguous cases. The productivity gains from such devices would be an order of magnitude greater than what CADe devices can deliver today.

While that future is still some way off, there are systems that have made a step into the CADx domain. In a widely reported ruling in July 2017, the FDA granted the Quantx system from Quantitative Insights a De Novo Classification for a new Class II device: “Radiological computer-assisted diagnostic (CADx) software for lesions suspicious for cancer”.2 The Quantx system uses machine learning and other analytical techniques to assess the similarity of breast cancer specimens to a database of cases with known pathologies. By suggesting a diagnosis, the Quantx system has clearly moved into the realm of CADx systems, and the FDA has acknowledged as much. The ruling will allow other machine learning systems that analyze cancer specimens and make diagnostic indications to use the Quantx system as a predicate device for 510(k) submissions.

Lessons from developing machine learning algorithms

The machine learning algorithms typically employed by these medical devices are “supervised” algorithms, meaning the algorithm is presented with annotated images having a known “ground truth” (e.g., known normal or disease states) and is “trained” to recognize those states. Once trained, the algorithm matches each new image presented to it to the most similar trained state.
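As a minimal sketch of that workflow, assuming image features have already been extracted into numeric vectors and ground-truth labels have been assigned by experts, a supervised classifier might be trained and applied as follows (scikit-learn and a random forest are illustrative choices, and the data here are random placeholders):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder data standing in for curated, expert-annotated cases:
# 200 cases, 16 features each; labels 0 = "normal", 1 = "abnormal".
X = rng.normal(size=(200, 16))
y = rng.integers(0, 2, size=200)

# Hold out a test set so performance is measured on cases unseen during training.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)           # "training" against the ground truth

# A new case is assigned to the most similar trained state via predicted probabilities.
new_case = rng.normal(size=(1, 16))
print(model.predict_proba(new_case))  # e.g., [[P(normal), P(abnormal)]]
print("held-out accuracy:", model.score(X_test, y_test))
```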

Although different machine learning algorithms are suitable for different types of data, any algorithm can only be as good as the ground truth data on which it is trained. Selection and curation of training data sets is therefore a major undertaking, especially if the data are to cover a large number of disease states from a wide range of patients, captured under different conditions and on different devices. Curation in particular is largely a manual operation and hence error-prone. Errors in training data can lead to sub-optimally performing algorithms, so particular attention should be paid to having multiple experts annotate the training and test data sets.
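One hedged sketch of how multiple expert annotations might be reconciled before training: accept unanimous labels as ground truth and flag any disagreement for adjudication. The cases, labels and three-reader setup below are invented purely for illustration.

```python
from collections import Counter

# Labels assigned to each case by three hypothetical annotators.
annotations = {
    "case_001": ["normal", "normal", "normal"],
    "case_002": ["abnormal", "abnormal", "normal"],
    "case_003": ["abnormal", "normal", "normal"],
}

ground_truth, needs_adjudication = {}, []
for case_id, labels in annotations.items():
    label, votes = Counter(labels).most_common(1)[0]
    if votes == len(labels):             # unanimous: accept as ground truth
        ground_truth[case_id] = label
    else:                                # any disagreement: refer to an adjudicating expert
        needs_adjudication.append(case_id)

print(ground_truth)         # {'case_001': 'normal'}
print(needs_adjudication)   # ['case_002', 'case_003']
```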

Medical device manufacturers will typically partner with institutions that see a wide variety of rare disease states and, under appropriate privacy protections, gain access to the necessary samples. Without the full range and variety of samples, algorithms will be “overtrained” on a narrow selection of cases and will appear to perform better during development than when deployed in the clinic. A highly varied data set, on the other hand, will generally prove more challenging for an algorithm, and algorithm developers will need to work closely with clinical experts to understand the diagnostically distinguishing features in their samples.
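One way to expose such overtraining during development is to hold out whole institutions (or devices) rather than a random subset of cases, so the test set reflects populations and acquisition conditions the algorithm has never seen. The sketch below uses synthetic data and scikit-learn’s GroupShuffleSplit purely as an illustration of that idea.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 16))
y = rng.integers(0, 2, size=300)
sites = rng.integers(0, 5, size=300)   # which of 5 hypothetical institutions supplied each case

# Hold out entire institutions so evaluation mimics deployment at an unseen site.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=sites))

model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X[train_idx], y[train_idx])

print("accuracy on held-out institutions:", model.score(X[test_idx], y[test_idx]))
```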

Closed systems, where the medical device manufacturer controls both the data acquisition hardware and the machine learning software (and often the reagents and sample preparation protocols), eliminate one source of variability (the hardware platform). They are therefore easier to develop, verify and validate. Where hardware and reagents are standardized, algorithms deliver more consistent and accurate outcomes.

The challenge for digital health technology

Under the existing model of medical device regulation, a machine learning algorithm is verified and validated, and then submitted for approval. Once approved, modifications to the algorithm can be time-consuming and costly to introduce.

That model is in stark contrast to the consumer experience, where the tech titans can regularly deploy upgraded voice or face recognition algorithms, and consumers expect continuous improvement.

The FDA has acknowledged that there is a disconnect between how quickly software can be iterated and the long approval cycles currently required for medical devices. It has, therefore, started exploring models under which medical device software might be released more quickly.3

Despite the current constraints, we believe that the clear productivity gains available from well-designed and targeted machine learning systems will drive innovation in medical device design. The advances being seen in radiology are directly applicable to other medical imaging disciplines, but they must be applied through user-centric designs that drive uptake by end users. Designed poorly, machine learning or other statistical techniques can overwhelm users with difficult-to-interpret data. A more sensitive approach is to design in collaboration with end users, where the users’ needs are foremost and the algorithms are truly the “ghost in the machine.”