
Machine Learning Approaches to Enhance Diagnosis and Staging of Patients With MASLD Using Routinely Available Clinical Information

Matthew McTeer, Douglas Applegate, Peter Mesenbrink, Vlad Ratziu, Jörn M. Schattenberg, Elisabetta Bugianesi, Andreas Geier, Manuel Romero Gomez, Jean-Francois Dufour, Mattias Ekstedt, Sven Francque, Hannele Yki-Jarvinen, Michael Allison, Luca Valenti, Luca Miele, Michael Pavlides, Jeremy Cobbold, Georgios Papatheodoridis, Adriaan G. Holleboom, Dina Tiniakos, Clifford Brass, Quentin M. Anstee, Paolo Missier


Metabolic dysfunction Associated Steatotic Liver Disease (MASLD) outcomes such as MASH (metabolic dysfunction associated steatohepatitis), fibrosis and cirrhosis are ordinarily determined by resource-intensive and invasive biopsies. We aim to show that routine clinical tests offer sufficient information to predict these endpoints.


Metabolic dysfunction Associated Steatotic Liver Disease (MASLD), formerly known as Non-Alcoholic Fatty Liver Disease (NAFLD) [1], is the world’s most common chronic liver disease and, with increasingly sedentary lifestyles, poses a major challenge to healthcare systems globally. It is estimated that over 25% of the global adult population has MASLD [2], which is predicted to soon be the leading cause of liver transplantation [3]. MASLD encompasses a spectrum of disease severity, ranging from isolated increased hepatic triglyceride content (steatosis; metabolic dysfunction associated steatotic liver, MASL), through hepatic inflammation and hepatocyte injury (metabolic dysfunction associated steatohepatitis, MASH) with increasing fibrosis, and ultimately to cirrhosis and/or hepatocellular carcinoma [4]. More advanced stages of hepatic fibrosis are associated with an increased risk of liver-related and all-cause mortality [5].

Materials and methods

This study utilised data drawn from the LITMUS Metacohort from patients participating in the European NAFLD Registry (NCT04442334), an international cohort of NAFLD patients prospectively recruited following standardized procedures and monitoring; see Hardy and Wonders et al. for details [12]. Patients were required to provide informed consent prior to inclusion. Studies contributing to the Registry were approved by the relevant Ethical Committees in the participating countries and conform to the guidelines of the Declaration of Helsinki. The Metacohort enrolled subjects from sites in Belgium, Finland, France, Germany, Italy, the Netherlands, Spain, Sweden, Switzerland, and the UK between Jan 6, 2010, and Dec 29, 2017. Subjects were at least 18 years old and clinically suspected of having MASLD, having been referred for further investigation due to abnormal biochemical liver tests and/or radiological evidence of steatosis.


The ML classifiers combined multiple variables in their decision making and predicted these outcomes more effectively than any individual feature alone. Fig 1 compares the training set AUC achieved by univariate logistic regression models on each of the 35 features explored in the analysis with the training set AUC achieved by each of the ML models in predicting At-Risk MASH. All 9 ML models outperformed every individual variable used in isolation. This demonstrates that the predictive power of these ML models is substantially greater than that of the individual variables previously used to predict various MASLD outcomes. On test set AUC the ML models performed less well; however, the differences between training and test performance are small and to be expected.
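To make the comparison above concrete, here is a minimal sketch of how an AUC can be computed for a single feature and for a score that combines several features. It is an illustration only, not the study's code: the function name `auc`, the toy labels, the two made-up markers and the simple sum used as a stand-in for a multivariable model are all assumptions.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = has the endpoint, 0 = does not.
labels = [0, 0, 0, 1, 1, 1]
feature_a = [1.0, 2.0, 3.0, 2.5, 1.5, 4.0]  # made-up marker A
feature_b = [5.0, 1.0, 2.0, 6.0, 4.0, 3.0]  # made-up marker B

# Stand-in for a multivariable score (assumption: a plain sum, not a fitted model).
combined = [a + b for a, b in zip(feature_a, feature_b)]

auc_a = auc(labels, feature_a)
auc_b = auc(labels, feature_b)
auc_combined = auc(labels, combined)
print(f"A: {auc_a:.3f}  B: {auc_b:.3f}  combined: {auc_combined:.3f}")
```

On this toy data the combined score ranks more positive/negative pairs correctly than either marker alone, which is the pattern the comparison describes. Computing the same quantity separately on training and held-out data is how a training/test gap would be measured.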


Our best models achieved an AUC of 0.899 in cross-validation and 0.800 on our hold-out test set for predicting At-Risk MASH, with similar performance for other endpoints. These scores largely track the performance observed in [13] and reflect a modest improvement over individual biomarkers. Interestingly, our machine learning models using ‘Core’ features significantly outperformed established markers such as FIB-4 (AUC = 0.708 for At-Risk MASH on our test sample). Additionally, they provide similar levels of performance to the best performing specialised markers. This suggests that incremental improvement in MASLD/MASH screening is possible with established biomarker assays combined with more advanced models.
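For readers unfamiliar with FIB-4, the index is computed from four routine values as (age × AST) / (platelet count × √ALT), with age in years, AST and ALT in U/L, and platelets in 10^9/L. The sketch below shows that standard formula; the function name, parameter names and the example patient values are illustrative assumptions and are not taken from the study.

```python
import math


def fib4(age_years: float, ast_u_per_l: float, alt_u_per_l: float,
         platelets_1e9_per_l: float) -> float:
    """FIB-4 index = (age x AST) / (platelets x sqrt(ALT))."""
    if alt_u_per_l <= 0 or platelets_1e9_per_l <= 0:
        raise ValueError("ALT and platelet count must be positive")
    return (age_years * ast_u_per_l) / (
        platelets_1e9_per_l * math.sqrt(alt_u_per_l)
    )


# Hypothetical patient values, chosen only to make the arithmetic easy to follow:
# (50 * 40) / (200 * sqrt(25)) = 2000 / 1000
score = fib4(age_years=50, ast_u_per_l=40, alt_u_per_l=25, platelets_1e9_per_l=200)
print(score)
```

Because FIB-4 is a fixed formula over four inputs, it can be treated as a single score and evaluated with an AUC in the same way as any individual marker.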


Building upon previous linear approaches to predicting MASLD-related endpoints, this research highlights the capability of more complex, non-linear machine learning methods to accurately classify individuals of varying severity along the natural progression of MASLD. In particular, we have demonstrated that such outcomes can be predicted well using easily extractable and readily available information collected at routine clinical appointments or from standard blood tests. Using the ML algorithm XGBoost together with the missing-data imputation algorithm MICE and the class balancing tool SMOTE on easily accessible variables, we obtain a classifier with a cross-validation AUC of 0.899 for predicting At-Risk MASH (0.800 on the hold-out test set).
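The class balancing step named above, SMOTE, creates synthetic minority-class samples by interpolating between a minority sample and one of its nearest minority neighbours. The following is a simplified, standard-library sketch of that core idea only. It is not the study's pipeline and not a library implementation; the function name `smote_sketch`, its parameters and the toy points are assumptions.

```python
import math
import random
from typing import List, Sequence

Point = Sequence[float]


def smote_sketch(minority: List[Point], n_new: int, k: int = 2,
                 seed: int = 0) -> List[List[float]]:
    """Create n_new synthetic minority samples by neighbour interpolation."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(p, base))
        neighbour = rng.choice(others[:k])
        lam = rng.random()  # interpolation weight in [0, 1)
        # New point lies on the segment between base and neighbour.
        synthetic.append([b + lam * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical two-feature minority class (values are made up).
minority = [[1.0, 1.0], [2.0, 1.5], [1.5, 3.0]]
new_points = smote_sketch(minority, n_new=4)
print(new_points)
```

Each synthetic point lies on a line segment between two real minority samples, so it stays inside the range the minority class already covers. Oversampling of this kind is usually applied to training data only, so that evaluation data contain real patients exclusively.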


The LITMUS consortium is coordinated by Quentin M. Anstee. Below are all investigators as part of the LITMUS consortium and their respective affiliations:

Citation: McTeer M, Applegate D, Mesenbrink P, Ratziu V, Schattenberg JM, Bugianesi E, et al. (2024) Machine learning approaches to enhance diagnosis and staging of patients with MASLD using routinely available clinical information. PLoS ONE 19(2): e0299487.

Editor: Pavel Strnad, Medizinische Fakultat der RWTH Aachen, GERMANY

Received: December 4, 2023; Accepted: February 9, 2024; Published: February 29, 2024

Copyright: © 2024 McTeer et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: Data underpinning this study are not publicly available. The European NAFLD Registry protocol has been published in [1], including details of sample handling and processing, and the network of recruitment sites. Patient level data will not be made available due to the various constraints imposed by ethics panels across all the different countries from which patients were recruited and the need to maintain patient confidentiality. The point of contact for any enquiries regarding the European NAFLD Registry is the oversight group via email:

Funding: This work was supported by Newcastle University and Red Hat UK. This work has been supported by the LITMUS project, which has received funding from the Innovative Medicines Initiative 2 Joint Undertaking under grant agreement No. 777377. This Joint Undertaking receives support from the European Union’s Horizon 2020 research and innovation programme and EFPIA. QMA is an NIHR Senior Investigator and is supported by the Newcastle NIHR Biomedical Research Centre. This communication reflects the view of the authors and neither IMI nor the European Union and EFPIA are liable for any use that may be made of the information contained herein.

Competing interests: I have read the journal’s policy and the authors of this manuscript have the following competing interests: Quentin M. Anstee has received research grant funding from AstraZeneca, Boehringer Ingelheim, and Intercept Pharmaceuticals, Inc.; has served as a consultant on behalf of Newcastle University for Alimentiv, Akero, AstraZeneca, Axcella, 89bio, Boehringer Ingelheim, Bristol Myers Squibb, Galmed, Genfit, Genentech, Gilead, GSK, Hanmi, HistoIndex, Intercept Pharmaceuticals, Inc., Inventiva, Ionis, IQVIA, Janssen, Madrigal, Medpace, Merck, NGM Bio, Novartis, Novo Nordisk, PathAI, Pfizer, Poxel, Resolution Therapeutics, Roche, Ridgeline Therapeutics, RTI, Shionogi, and Terns; has served as a speaker for Fishawack, Integritas Communications, Kenes, Novo Nordisk, Madrigal, Medscape, and Springer Healthcare; and receives royalties from Elsevier Ltd. Jörn M. Schattenberg has served as consultant for Alentis Therapeutics, Astra Zeneca, Apollo Endosurgery, Bayer, Boehringer Ingelheim, Gilead Sciences, GSK, Ipsen, Inventiva Pharma, Madrigal, MSD, Northsea Therapeutics, Novartis, Novo Nordisk, Pfizer, Roche, Sanofi, Siemens Healthineers. Research Funding: Gilead Sciences, Boehringer Ingelheim, Siemens Healthcare GmbH. Stock Options: AGED diagnostics, Hepta Bio. Speaker Honorarium: Advanz, Echosens, MedPublico GmbH. Andreas Geier served as a speaker and consultant for AbbVie, Advanz, Alexion, AstraZeneca, Bayer, BMS, Burgerstein, CSL Behring, Eisai, Falk, Gilead, Heel, Intercept, Ipsen, Merz, MSD, Novartis, Pfizer, Roche, Sanofi-Aventis; received research funding from Intercept, Falk, Novartis. Dina Tiniakos served as consultant on behalf of the University or for ICON, Merck Greece, Madrigal, Inventiva, Histoindex, Cymabay and Clinnovate. This does not alter our adherence to PLOS ONE policies on sharing data and materials. We are not opposed to any reviewers.


