Accurate Data Is the Key to Reliable AI in Healthcare Systems

Dr. Varadraj Gurupur, Professor, School of Global Health Management and Informatics, University of Central Florida

The use of Artificial Intelligence (AI) systems in healthcare is on the rise. While AI promises much-needed process improvement in healthcare, an AI system is only as good as the data fed into its decision-making process.

In recent years, the use of artificial intelligence (AI) in decision-making has become prominent, and this is especially true within the healthcare ecosystem. This trend was made possible by the HITECH Act of 2009, which drove the digitisation of healthcare in the United States. While digital health has its own inherent challenges, such as the various facets and manifestations of the digital divide, the use of AI, if not handled well, can create further problems. We must understand that AI systems developed for any purpose are only as good as the knowledge contained in them, and that this knowledge is created from information derived from data captured from different sources.

The reliability of these data sources is key to the generation of accurate knowledge: unreliable data sources produce an equally unreliable system. Therefore, the reliability of data and its associated sources is paramount to the development of reliable AI systems. In this context, there has recently been debate about generating synthetic data for decision-making and for developing AI systems. IBM defines synthetic data as “artificial data designed to mimic real-world data” (Caballar), and synthetic data is indeed commonly used in developing models for AI-based decision-making. By this definition and others like it, however, synthetic data is not real data derived from the real world.
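To make the concern concrete, the following minimal sketch (with hypothetical blood-pressure figures chosen purely for illustration) shows how synthetic data generated to mimic the aggregate statistics of a real sample can still under-represent a clinically critical subgroup:

```python
import random
import statistics

random.seed(0)  # deterministic, for reproducibility

# Hypothetical "real" systolic blood-pressure readings: a general population
# plus a small hypertensive subgroup (values chosen only for illustration).
real = [random.gauss(120, 10) for _ in range(950)] + \
       [random.gauss(165, 8) for _ in range(50)]

# Naive synthetic generator: mimic only the overall mean and spread.
mu, sigma = statistics.mean(real), statistics.stdev(real)
synthetic = [random.gauss(mu, sigma) for _ in range(1000)]

# The aggregate statistics match closely...
print(f"real mean {statistics.mean(real):.1f}, "
      f"synthetic mean {statistics.mean(synthetic):.1f}")

# ...but the clinically critical hypertensive tail is under-represented.
real_tail = sum(r > 160 for r in real) / len(real)
synth_tail = sum(s > 160 for s in synthetic) / len(synthetic)
print(f"share above 160 mmHg: real {real_tail:.3f}, synthetic {synth_tail:.3f}")
```

The synthetic sample reproduces the overall mean and spread, yet the hypertensive tail that matters most for clinical decision-making shrinks to a fraction of its true prevalence; this is precisely the kind of distortion that makes synthetic data risky as a basis for policy analysis.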

Reliable decision-making is extremely critical within healthcare systems, especially because inaccurate decisions leading to misdiagnosis or other clinical errors can harm the patient and thereby increase healthcare costs. Such errors can also lead to lawsuits and loss of revenue for healthcare providers. Here, we must be careful to acknowledge that synthetic data can be useful for training systems when real data is not available; we resist its use only in analyses that inform policy making for healthcare providers, non-profit organisations, and government agencies.

AI-based systems are inherently biased, as described by Gurupur and Wan (2020). This bias stems from several factors, including skewness in the real data used to generate knowledge within AI systems. That seminal work sets out several reasons for this inherent bias, and its magnitude depends on how a system was developed and on the data that gets transformed into information and, ultimately, the knowledge that drives the system. Even exceptionally good systems can underperform if the user is not well trained in using them. Self-learning AI systems, in particular, can malfunction when misused: an untrained user creates an inaccurate feedback loop that injects inaccurate data or information back into the system. Viewed through the lens of inherent bias, such learning biases compound the system's inherent bias until it underperforms. Consider, for example, an AI system developed to support evidence-based medicine. If it is used by users who lack training, and who are also responsible for the feedback that ultimately shapes the system's knowledge, the system may well underperform, and this underperformance might result in harm to patients.

While discussing the accuracy of data, we should also consider the use of incomplete data to develop clinical decision support systems. Incomplete data can be equally harmful, since it leads to incomplete information and a system that provides inaccurate or debatable outcomes. A key aspect of incompleteness in electronic health records is the absence of a clear standard for quantifying it, although Nasir et al. (2016) have proposed a method for quantifying the incompleteness of electronic health records, and other researchers have since explored the problem further. Data is transformed into information, which is in turn converted into knowledge; if the data is incomplete, the quality of the knowledge generated will be debatable. Ascertaining the completeness of data used for decision support is therefore critical.
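As a simple illustration of what quantifying incompleteness might look like (a simplified sketch with hypothetical field names and weights, not the method of Nasir et al., 2016), a record can be scored as the weighted fraction of clinically important fields that are actually populated:

```python
# Hypothetical clinical-importance weights per field (illustrative only).
FIELD_WEIGHTS = {
    "patient_id": 1.0,
    "date_of_birth": 0.9,
    "allergies": 0.8,
    "current_medications": 0.8,
    "blood_pressure": 0.5,
    "smoking_status": 0.3,
}

def completeness_score(record: dict) -> float:
    """Weighted share of important fields present and non-empty (0.0 to 1.0)."""
    total = sum(FIELD_WEIGHTS.values())
    present = sum(w for field, w in FIELD_WEIGHTS.items()
                  if record.get(field) not in (None, ""))
    return present / total

# A record missing medications and smoking status, with an empty BP reading.
record = {"patient_id": "P-001", "date_of_birth": "1960-04-12",
          "allergies": "penicillin", "blood_pressure": ""}
print(round(completeness_score(record), 2))
```

A score like this could feed a threshold policy, for instance excluding records below a chosen completeness level from model training, though the weights themselves would need clinical justification.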

The use of synthetic data only complicates these systems by adding unnecessary bias. If that bias is carried into diagnosis and treatment plans, it substantially increases the possibility of harm to the patient. We can therefore conclude that synthetic data compounds the possibility of harm and loss, with the exception of its use in training systems, which is itself debatable.

From a different perspective, unreliable AI systems used in healthcare that produce adverse outcomes will damage the reputation of their vendors, and of AI systems more broadly. If this situation persists, it could trigger a domino effect, further eroding the reputation of the AI industry, and it is critically important to avoid this. A key step in preventing it is to collect data from highly reliable sources; data from unreliable sources must be avoided under all circumstances. We hope the young AI industry will focus on the reliability of the data used to develop efficient systems, thereby mitigating possible harm to patients and to providers' reputations resulting from the use of decision support systems in healthcare.

To summarise, AI systems have in recent years transformed processes across healthcare. However, this transformation must be approached with caution, and the data and data systems behind AI-based decision support systems must be analysed to ascertain their reliability and accuracy. This might lead to the idea of ranking data sources for accuracy and reliability. With the advent of AI-based systems in healthcare, data is the new oil, and reliable data can be compared to high-quality combustible oil, much needed to run and maintain high-quality AI-based decision support systems. We hope that the developers of AI systems for healthcare will keep this in mind while implementing these systems.
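The idea of ranking data sources could be sketched as follows; the source names, metrics, and weights here are purely illustrative assumptions, not an established standard:

```python
# Hypothetical data sources with illustrative quality metrics in [0, 1].
sources = [
    {"name": "Hospital EHR",        "accuracy": 0.97, "completeness": 0.90},
    {"name": "Claims database",     "accuracy": 0.92, "completeness": 0.95},
    {"name": "Patient self-report", "accuracy": 0.80, "completeness": 0.60},
]

def reliability(source: dict, w_acc: float = 0.6, w_comp: float = 0.4) -> float:
    """Weighted reliability score in [0, 1]; weights are assumed, not standard."""
    return w_acc * source["accuracy"] + w_comp * source["completeness"]

# Rank sources from most to least reliable.
ranked = sorted(sources, key=reliability, reverse=True)
for s in ranked:
    print(f'{s["name"]}: {reliability(s):.2f}')
```

In practice, the metrics themselves (accuracy against a gold standard, completeness as discussed above) would have to be measured and the weights agreed upon before such a ranking could guide procurement or model-training decisions.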

References:

  1. Caballar. What is synthetic data? IBM. https://www.ibm.com/think/topics/synthetic-data. Accessed: Feb-10-2026.
  2. Gurupur, V., Wan, T.T.H. (2020). Inherent Bias in Artificial Intelligence-Based Decision Support Systems for Healthcare. Medicina, 56(3), 141.
  3. Nasir, A., Gurupur, V., Liu, X. (2016). A new paradigm to analyse data completeness of patient data, Applied Clinical Informatics, DOI: 10.4338/ACI-2016-04-RA-0063.
Dr. Varadraj Gurupur

Varadraj Gurupur, PhD, is a tenured Full Professor in the School of Global Health Management and Informatics, with a joint appointment in the Department of Computer Science and the Department of Electrical and Computer Engineering at the University of Central Florida (UCF). Dr. Gurupur received his doctoral degree in Computer Engineering from the University of Alabama at Birmingham in 2010.