Nathaniel Hendrix, Rishi V. Parikh, Madeline Taskier, Grace Walter, Ilia Rochlin, Sharon Saydah, Emilia H. Koumans, Oscar Rincón-Guevara, David H. Rehkopf, Robert L. Phillips
Abstract
Post-COVID conditions (PCC) have proven difficult to diagnose. In this retrospective observational study, we aimed to characterize the level of variation in PCC diagnoses observed across clinicians from a number of methodological angles and to determine whether natural language classifiers trained on clinical notes can reconcile differences in diagnostic definitions.
Introduction
Post-COVID conditions (PCC), sometimes referred to as “long COVID,” are a collection of health conditions that affect people for three or more months after infection with SARS-CoV-2 [1]. These conditions have been challenging to study, largely due to their status as a set of “potentially overlapping entities.”
Methods
We combined descriptive statistical analyses with machine learning to characterize the degree of diagnostic heterogeneity of PCC within primary care and to identify potential sources thereof. We first examined the distribution of PCC diagnoses across clinicians, which allowed for characterization of clinicians’ underlying propensity to diagnose PCC.
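As a rough illustration of the first descriptive step, the sketch below computes, for each clinician, the share of their patients who received a PCC code, and then the share of clinicians who never recorded one. It is a minimal sketch and not the study's actual pipeline: the record layout, the identifiers, and the function name `pcc_rate_by_clinician` are all hypothetical, and the toy data are invented for illustration only.

```python
from collections import defaultdict

# Hypothetical toy visit records: (clinician_id, patient_id, has_pcc_code).
# These values are invented for illustration and are not study data.
visits = [
    ("c1", "p1", False),
    ("c1", "p2", False),
    ("c2", "p3", True),
    ("c2", "p4", False),
    ("c2", "p3", True),   # repeat visit by the same patient
    ("c3", "p5", True),
]

def pcc_rate_by_clinician(records):
    """Return {clinician_id: share of that clinician's unique patients with a PCC code}."""
    patients = defaultdict(set)      # all unique patients seen by each clinician
    pcc_patients = defaultdict(set)  # unique patients with at least one PCC-coded visit
    for clinician, patient, has_pcc in records:
        patients[clinician].add(patient)
        if has_pcc:
            pcc_patients[clinician].add(patient)
    return {c: len(pcc_patients[c]) / len(patients[c]) for c in patients}

rates = pcc_rate_by_clinician(visits)

# Share of clinicians who never recorded a PCC code for any patient.
share_never = sum(1 for r in rates.values() if r == 0) / len(rates)
```

Counting unique patients rather than visits keeps a clinician with many repeat visits from a single PCC patient from appearing to diagnose more widely than they do; whether the study normalized this way is an assumption of this sketch.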
Results
The AFC contained 9,722,653 visits conducted by 3,845 clinicians at 519 practices with 4,724,507 unique patients from October 1, 2021, to November 1, 2023. Among these, 116,659 patients had a diagnostic code for COVID-19 and 6,116 had a diagnostic code for PCC.
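To put these counts in proportion, the short sketch below expresses the COVID-19 and PCC patient counts as shares of all unique patients, using only the figures reported above. The variable names are illustrative. The text does not say whether every patient with a PCC code also had a COVID-19 code, so the sketch does not compute PCC as a fraction of COVID-19 patients.

```python
# Counts as reported in the Results paragraph above.
total_patients = 4_724_507
covid_patients = 116_659
pcc_patients = 6_116

# Shares of all unique patients in the dataset.
covid_share = covid_patients / total_patients
pcc_share = pcc_patients / total_patients

print(f"COVID-19 code: {covid_share:.2%} of patients")
print(f"PCC code: {pcc_share:.2%} of patients")
```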
Discussion
Our results revealed substantial heterogeneity in the diagnostic behavior of clinicians in primary care in the 15 months following the introduction of the ICD-10 code for PCC. While a majority of clinicians in our dataset (75.7%) did not record a single diagnostic code for PCC, others applied the diagnosis very widely.
Conclusion
In this study describing diagnostic and documentation patterns of PCC, we used a range of machine learning methods to characterize the role of clinician and practice site in predicting receipt of a PCC diagnosis and to attempt to identify PCC from clinical notes.
Citation: Hendrix N, Parikh RV, Taskier M, Walter G, Rochlin I, Saydah S, et al. (2025) Heterogeneity of diagnosis and documentation of post-COVID conditions in primary care: A machine learning analysis. PLoS One 20(5): e0324017. https://doi.org/10.1371/journal.pone.0324017
Editor: Jiawen Deng, University of Toronto, CANADA
Received: February 21, 2024; Accepted: April 18, 2025; Published: May 16, 2025
This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication.
Data Availability: Data cannot be shared publicly because of patient privacy. Data are available from the Stanford Medicine Center for Population Health Sciences (contact Isabella Chu at bella.chu@stanford.edu) for researchers who meet the criteria for access to confidential data.
Funding: This work was created as part of a collaborative agreement funded by the United States Centers for Disease Control and Prevention (CDC). Authors from the CDC participated in the conceptualization and conduct of this study, and in the editing of the manuscript.
Competing interests: The authors have declared that no competing interests exist.