Implementation and Prospective Real-time Evaluation of a Generalized System for in-clinic Deployment and Validation of Machine Learning Models in Radiology

James R. Hawkins, Marram P. Olson, Ahmed Harouni, Ming Melvin Qin, Christopher P. Hess, Sharmila Majumdar, Jason C. Crane


The medical imaging community has embraced Machine Learning (ML), as evidenced by the rapid increase in the number of ML models being developed, but validating and deploying these models in the clinic remains a challenge. The engineering required to integrate ML models into the clinical workflow and assess their efficacy is complex. This paper presents a general-purpose, end-to-end, clinically integrated system for deploying and validating ML models, implemented at UCSF. Engineering and usability challenges and results from three use cases are presented. A generalized validation system based on free, open-source software (OSS) was implemented, connecting clinical imaging modalities, the Picture Archiving and Communication System (PACS), and an ML inference server.


The medical imaging community is embracing Machine Learning (ML) and Artificial Intelligence (AI) to develop novel predictive models. These models show promise, and have the potential to transform radiology practice and patient care, in areas ranging from data acquisition, reconstruction, and quantification, to diagnosis, treatment response, and clinical workflow efficiency [1]. While the foundation of this work is model development using retrospectively acquired datasets [2], translating AI models from research to the clinic for event-driven, prospective validation is a critical step towards model deployment for routine use in clinical care.

Materials and methods

Fig 1 details the end-to-end AI inference system and networks presented in this work. Briefly, DICOM [12] images are sent from scanning modalities at the time of acquisition to a DICOM router. The router directs images to the clinical Picture Archiving and Communication System (PACS) [13] and to specific inference services hosted on an on-premises server running NVIDIA Clara Deploy [14], a software platform for deploying ML pipelines. Results are exported to a dedicated instance of the XNAT web application, running on the same host. XNAT stores and displays inference results separate from the clinical record; clinicians can access them from a PACS workstation in the reading room or from other UCSF computers via a browser. Custom buttons in the Visage 7 client (Visage 7 Enterprise Imaging Platform ("Visage 7"), Visage Imaging, San Diego, CA) [15] running on PACS workstations link directly to relevant results in XNAT, where reviewer feedback is captured for use in assessing model performance or for retraining (Figs 2–4).
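The fan-out performed by the DICOM router can be sketched as a simple routing rule: every series is forwarded to the clinical PACS, and series matching a model's trigger criteria are additionally forwarded to the corresponding inference service. The sketch below is a minimal illustration of that decision logic only, not the actual router configuration; the trigger table and destination names are hypothetical.

```python
# Hypothetical sketch of the router's fan-out logic: all series go to PACS,
# and a series whose description matches a model trigger is also sent to
# that model's inference pipeline. Names below are illustrative only.

MODEL_TRIGGERS = {
    "T2 SAG SPINE": "spine-inference",
    "DWI BRAIN": "stroke-inference",
}

def route_destinations(series_description: str) -> list[str]:
    """Return the list of destinations for an incoming DICOM series."""
    destinations = ["clinical-pacs"]  # every series is archived in PACS
    trigger = MODEL_TRIGGERS.get(series_description.upper())
    if trigger:
        destinations.append(trigger)  # fan out to the matching pipeline
    return destinations
```

In practice this matching would be driven by the router's own configuration (e.g., on DICOM tags such as SeriesDescription or StationName) rather than hard-coded, but the one-to-many routing pattern is the same.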


The system detailed above was used to deploy three proof-of-concept (POC) projects to support validation studies aimed at characterizing all aspects of pipeline development and integration, from data flow to system performance, extensibility, engineering robustness, and usability. The present section focuses on results related to characterizing the system's viability as a general-purpose platform for supporting clinical validation of AI models across a variety of representative workflows, workloads, and use cases. Specific details pertinent to the clinical use, model performance, and clinical impact of each model are beyond the scope of this paper and will be presented in separate papers.


Deploying and supporting an ML pipeline in the present framework requires software development and systems engineering on multiple fronts. The model must be trained, the AI inference operator built, and the pipeline execution steps designed; pipeline operators performing additional calculations or data tasks must be built; XNAT plugins need to be developed to store and display results and to capture user feedback; finally, data ingestion, pipeline execution, and results display must be tested with clinical data, which will differ from research data in unforeseen ways. Operator and plugin development efforts are estimated in Tables 3–4, but will vary with skill level and experience. Collaboration with clinical users is critical to define data display and data flow requirements. At UCSF, ci2's Computational Core [48] supports this effort, bridging the gap between scientific research and software engineering and enabling translation of AI research into the clinic.
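The inference-operator step above amounts to wrapping a trained model behind a simple array-in/array-out interface so it can be composed with upstream data-ingestion and downstream export stages. The sketch below illustrates that wrapper pattern only; it is not the authors' operator code, and the model call is replaced by a threshold so the example stays self-contained. All function names here are hypothetical.

```python
import numpy as np

# Hypothetical sketch of an inference operator: wrap the model behind a
# uniform interface so pipeline stages compose. A real operator would call
# the trained network; a simple intensity threshold stands in for it here.

def run_inference(volume: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Stand-in for model prediction: return a binary segmentation mask."""
    vmax = float(volume.max()) or 1.0  # avoid dividing by zero on empty input
    return (volume / vmax > threshold).astype(np.uint8)

def operator_main(in_volume: np.ndarray) -> dict:
    """Pipeline-facing entry point: run inference and attach summary metadata
    for downstream export (e.g., to XNAT for reviewer feedback)."""
    mask = run_inference(in_volume)
    return {"mask": mask, "voxels_flagged": int(mask.sum())}
```

Keeping the operator's interface this narrow is what lets the same pipeline scaffolding (ingestion, execution, export, feedback capture) be reused across models, which is the extensibility property the framework targets.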


Implementing a generalized, extensible, and scalable platform for validating and deploying AI-based pipelines in the clinic takes time and effort from a dedicated engineering team, working in collaboration with clinical end users who can provide guidance on usability and requirements. Considerable work in system design, infrastructure setup, and software engineering is needed to ensure high reliability and support for a diverse set of workloads and workflows, but the upfront investment returns significant value.


For their hard work in supporting the infrastructure and networking in the above system, we would like to thank: Jeff Block, Matt Denton, and Reese Webb from Radiology Clinical Infrastructure; Peter Storey and Jed Chan from Radiology Scientific Computing Services; Neil Singh and Muse Hsieh from Radiology Clinical IT Operations; and Dr. Wyatt Tellis from Radiology Innovation and Analytics. We would also like to thank Dr. Mona Flores, Dr. Sidney Bryson, Rahul Choudhury, Victor Chang, and David Bericat from NVIDIA for help deploying and developing Clara at UCSF.

Citation: Hawkins JR, Olson MP, Harouni A, Qin MM, Hess CP, Majumdar S, et al. (2023) Implementation and prospective real-time evaluation of a generalized system for in-clinic deployment and validation of machine learning models in radiology. PLOS Digit Health 2(8): e0000227.

Editor: Yuan Lai, Tsinghua University, CHINA

Received: March 7, 2023; Accepted: July 12, 2023; Published: August 21, 2023

This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.

Data Availability: All relevant data are within the manuscript.

Funding: The authors received no specific funding for this work.

Competing interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: NVIDIA provided 4 T4 cards as a grant to UCSF.
