Artificial Intelligence in Clinical Decision-Making: Contemporary Approaches and Emerging Challenges

Dr. Leila Moradi¹, Dr. Arman Rezaei²

¹ Department of Biomedical Informatics, Shiraz University of Medical Sciences, Shiraz, Iran
² Department of Computer Science and Engineering, University of Tehran, Tehran, Iran

Correspondence

Correspondence to: Dr. Leila Moradi, Department of Biomedical Informatics, Shiraz University of Medical Sciences, Shiraz, Iran.
Email: [email protected]

Abstract

Artificial intelligence (AI) has transformed the landscape of clinical decision-making by offering innovative solutions to complex healthcare challenges. Its rapid integration into medicine has been enabled by the availability of large, annotated datasets, advances in computational capacity, and the expansion of cloud-based infrastructure. The effectiveness of these tools relies on understanding the underlying cognitive and analytical pathways traditionally employed by clinicians.

This review examines the current state of AI in healthcare, focusing on four critical dimensions: data acquisition, feature extraction, clinical interpretation, and decision support. It further highlights a range of clinical applications where AI has been successfully deployed. At the same time, the adoption of AI raises considerable concerns from technical, medical, and ethical perspectives. Issues related to transparency, validation, interpretability, and regulatory frameworks remain central challenges.

By synthesizing these perspectives, this paper underscores both the transformative potential of AI and the barriers that must be addressed to ensure its safe, effective, and equitable integration into clinical practice.

Keywords: Artificial Intelligence, Clinical Decision-Making, Clinical Decision Support Systems, Machine Learning, Healthcare

© The Author(s) 2025. Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third-party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by-nc-nd/4.0/

INTRODUCTION

Computational tools began to support clinical decision-making during the 1970s and 1980s. Although the initial applications were technically constrained and clinically limited, subsequent decades witnessed significant progress that encouraged the adoption of more advanced methods. In recent years, the deployment of data-driven and knowledge-based systems across hospitals, medical centers, and healthcare institutions has expanded considerably, supported by approval from the Food and Drug Administration (FDA) (Adams et al., 1986; Adlung et al., 2021; Benjamens et al., 2020; Harish et al., 2021; Moja et al., 2014). Within the healthcare sector, artificial intelligence (AI) has increasingly been developed to optimize administrative operations, enhance diagnostic and therapeutic processes, and strengthen patient engagement. Parallel to this development, investment in healthcare AI companies has grown substantially, reaching $2.5 billion in the first quarter of 2021. This growth is projected to continue, with forecasts suggesting that nearly three-quarters of healthcare organizations will increase their investments in AI-driven solutions. Among the primary areas of application is the integration of AI into clinical decision support systems (CDSS) to aid clinicians. Nevertheless, despite rising investments in healthcare AI, the large-scale implementation of AI-enabled CDSS remains limited. Experiences with such systems reveal persistent barriers to broader adoption, including insufficient evidence of clinical effectiveness, the absence of standardized frameworks for assessing AI technologies, interoperability challenges, and constraints related to financial support (Rajkomar et al., 2019; Topol, 2019).

When decision support systems are gradually integrated into clinical practice, one of the key principles is the careful evaluation of their application, performance, and inherent limitations. Importantly, clinical users must also be adequately familiarized with the knowledge underlying these platforms. AI enhances clinical decision-making in multiple ways, including generating early warnings, supporting diagnosis, enabling large-scale screening, tailoring treatments, and evaluating patient responses. Any AI framework must be validated against clinical characteristics as well as existing standards, which include clinician expertise and experience. In several contexts, AI models have demonstrated performance comparable to that of healthcare professionals. In recent years, proposed guidelines for AI-related clinical trials have sought to address these concerns by introducing structured frameworks to improve consistency in AI evaluation (Montani & Striani, 2019; Sanchez-Martinez et al., 2022; Topol, 2019; Uddin et al., 2022).

The availability of more comprehensive data and the ability to minimize uncertainty in choosing and implementing interventions—such as surgical procedures, pharmacological treatments, or medical devices—can substantially improve patient outcomes (Harish et al., 2021). Within the era of evidence-based medicine, millions of patients are systematically assessed, producing vast, complex, and heterogeneous datasets. Leveraging algorithmic methods to analyze such data and reinforce clinical decision-making has become feasible due to the continual expansion of AI’s computational capabilities. The use of big data by AI provides clinicians with detailed insights that support more accurate diagnoses and personalized treatment strategies. Moreover, AI tools can estimate the probabilities and costs associated with potential outcomes. By incorporating AI-enhanced decision support, clinicians can improve patient outcomes, lower healthcare costs, and enhance overall patient satisfaction. Whereas Fig. 1 depicts the clinical decision-making pathway, Fig. 2 demonstrates how AI contributes to specific tasks within this process. Research findings indicate that AI achieves human-like performance in lower-level tasks such as data acquisition and feature extraction. For higher-level tasks, including patient condition interpretation and decision support, AI enables the integration of heterogeneous and complex datasets into clinical reasoning. Nonetheless, these higher-level capabilities remain underdeveloped and require further validation. Advancing through the clinical decision-making stages—data acquisition, feature extraction, interpretation, and decision support—also carries inherent risks, as errors in these processes may directly impact patients and cause irreversible harm. Despite differences in models, methodologies, or architectures, the fundamental steps of AI remain consistent and can be regarded as a general process.

Beyond these four stages outlined in Fig. 2, data normalization can provide an essential foundation for improving data sharing and aggregation. Data normalization, as a hierarchical approach to standardization, enhances usability, reduces ambiguity, and ensures that clinical information is harmonized to support sharing and care-chain analysis. Clinicians and healthcare professionals recognize normalization as a crucial step for effective collaboration in health information systems. They argue that normalization ensures data compatibility with software systems, thereby maximizing efficiency during processing. Given the critical role of optimizing clinical staff performance—particularly in the context of the current global health challenges—improved normalization strategies enable decision-makers to prioritize patients more effectively, minimize errors, and allocate resources with greater precision. Since fragmented or inconsistent data are rarely useful for AI models, healthcare organizations with appropriate resources can derive meaningful insights only when data are properly normalized. Thus, selecting suitable normalization techniques is vital. Several recent methods have been proposed to transform heterogeneous inputs into dimensionless forms. These approaches allow decision-makers to compare alternatives across different criteria by mapping decision matrices onto the interval [0,1]. Vafaei et al. (2022) present a detailed discussion of normalization strategies, grouping six major techniques into three categories: linear, semi-linear, and non-linear. Linear methods include ‘Max,’ ‘Max-Min,’ and ‘Sum,’ while the ‘Vector’ method represents a semi-linear approach. Non-linear techniques encompass both ‘Logarithmic’ and ‘Fuzzification’ methods.
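
To make these techniques concrete, the sketch below implements the three linear methods (‘Max,’ ‘Max-Min,’ and ‘Sum’) and the semi-linear ‘Vector’ method for a benefit-type decision matrix. The function name, the toy triage scores, and the NumPy-based formulation are illustrative assumptions, not the exact procedure of Vafaei et al. (2022).

```python
import numpy as np

def normalize(X, method="max"):
    """Map each column (criterion) of a decision matrix onto [0, 1].

    Assumes benefit-type criteria (higher is better); cost-type
    criteria would be inverted before normalization.
    """
    X = np.asarray(X, dtype=float)
    if method == "max":        # linear 'Max': x / max(x)
        return X / X.max(axis=0)
    if method == "max-min":    # linear 'Max-Min': (x - min) / (max - min)
        lo, hi = X.min(axis=0), X.max(axis=0)
        return (X - lo) / (hi - lo)
    if method == "sum":        # linear 'Sum': x / sum(x)
        return X / X.sum(axis=0)
    if method == "vector":     # semi-linear 'Vector': x / ||x||_2
        return X / np.linalg.norm(X, axis=0)
    raise ValueError(f"unknown method: {method!r}")

# Hypothetical triage matrix: three patients scored on two criteria.
scores = [[70, 3.2], [55, 4.1], [90, 2.5]]
print(normalize(scores, method="max-min"))
```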

In this study, we review recent research on advanced AI technologies with an emphasis on their role in facilitating healthcare delivery. We highlight the defining characteristics of AI systems, provide an overall assessment of AI perspectives and learning approaches, and critically examine the challenges of integrating AI into clinical decision-making.

AI PERSPECTIVES

As AI methods and model optimization continue to be applied across diverse domains (Kurshan et al., 2020; Liu et al., 2021; Saeidi et al., 2019; Wang et al., 2020; Ying et al., 2018; Zhou et al., 2018), the pace of algorithmic development in clinical and medical contexts is steadily accelerating. Nevertheless, recent publications highlight persistent deficiencies in healthcare that hinder timely and comprehensive progress. These shortcomings include prediction inaccuracies, workflow inefficiencies, treatment errors, misallocation of resources, and disparities in access. Researchers in the AI era argue that the integration of AI technologies offers promising solutions to mitigate these limitations, with future advancements expected to yield more positive outcomes. The following sections review the available evidence on AI’s contributions across various clinical and medical domains.

2.1. AI and Data Analysis

A critical dimension of AI in clinical data analysis lies in understanding cancer evolution, achieved through transfer learning applied to multiregional tumor sequencing and machine vision techniques for analyzing living cancer cells at single-cell resolution. Such innovations hold significant potential for enhancing decision-making processes. Another key application involves reconstructing neural circuits, enabling a connectomic understanding through electron microscopy. One of the most notable advances in medical AI is the exploration of human brain networks. This line of research facilitates faster computation and contributes to brain–machine interface development in neuromorphic computing, as well as the reverse engineering of neural mechanisms to design computer chips. Additional examples include advancements in behavior tracking across humans, animals, and machines by applying transfer learning in combination with both medical and non-medical knowledge (Ameen et al., 2022a; Giordano et al., 2021; Lee et al., 2019).

AI has also become increasingly important in drug discovery. Its applications include mining molecular structures, designing and synthesizing novel compounds, and predicting optimal dosages for patients under varying clinical conditions. Emerging applications extend to predicting toxicities, implementing secure data encryption within the pharmaceutical sector, and detecting drug–drug interactions using sophisticated AI algorithms. A prominent example is the Eve robot, developed at the University of Cambridge and the University of Manchester, which autonomously discovered an antimalarial compound that is also found in toothpaste. Such initiatives demonstrate how established companies and pharmaceutical startups can accelerate progress in this area (Williams et al., 2015).

2.2. AI and Clinicians

In the near future, clinicians, specialists, and healthcare organizations are expected to rely extensively on the practical applications of AI and deep learning (DL) in routine practice. These applications primarily focus on the analysis and evaluation of diverse medical imaging modalities, including histopathology, radiography, MRI, CT scans, ultrasound, ECG, endoscopy, and gastroscopy. As previously noted, AI algorithms provide numerous applications across clinical settings (Huss & Coupland, 2020; Kim, 2018). Specific examples include supporting neurologists in diagnosing conditions such as stroke, autism, or abnormalities in electroencephalography; assisting anesthesiologists in preventing intraoperative hypoxia; aiding paramedics in the rapid diagnosis of stroke or myocardial infarction; selecting viable embryos in in vitro fertilization; and guiding preventive surgical strategies for breast cancer patients. Figure 3 illustrates the development of AI applications throughout the human lifespan.

In addition to these clinical uses, substantial efforts are underway among startups and major technology companies to expand the scope of AI in healthcare. Notable contributors include Google, Microsoft, Orbita, Robin Healthcare, Tenor.ai, and Sopris Health, all of which are actively pursuing innovations in this field (Acosta et al., 2022; Ding et al., 2022; Wong et al., 2022; Zheng et al., 2021).

2.3. AI in Healthcare Systems

The ability to predict outcomes such as mortality risk in hospital palliative care has the potential to significantly influence system performance and patient care quality. For instance, estimating the duration of a patient’s stay in palliative care requires the continued development and refinement of AI methodologies (Bhagwat et al., 2018; Elfiky et al., 2018; Makar et al., 2017; Miotto et al., 2016). Similarly, forecasting the likelihood of hospitalization over time underscores the demand for advanced AI solutions, which can be achieved through the integration of electronic health records (EHR), AI, and deep learning (DL) techniques. Despite the need for further progress, several companies are actively pursuing innovations in this field. One example is Careskore, which has developed systems capable of predicting mortality risk using available datasets. However, uncertainty in predictions remains a significant issue. For instance, a model with an AUC (Area Under the ROC Curve) of 0.95 demonstrates high discriminative performance overall, yet it may still lack the precision to make accurate predictions at the individual level.

Beyond EHR data, imaging can also enhance predictive accuracy. Numerous studies have attempted to estimate biological age, with results indicating that this task can be performed using biomarkers such as DNA methylation. Nonetheless, predictive accuracy is affected by the unstructured and incomplete nature of much clinical data, which often includes socioeconomic, behavioral, biological, and physiological sensor information derived from clinician notes. These limitations prevent direct integration of such data into AI algorithms. Moreover, datasets with limited sample sizes represent an additional challenge (Daunay et al., 2019). While the AUC metric remains widely used, it summarizes performance across all decision thresholds and therefore does not reflect the specific sensitivities and specificities valued by clinicians. Consequently, the full impact of AI within healthcare environments remains uncertain until robust, statistically validated criteria are established in real-world clinical settings.

Drawing upon the three AI perspectives discussed—data, clinicians, and healthcare systems—big data and DL offer unique opportunities for advancement. Real-world data serve as a critical foundation for clinical applications. For clinicians, AI facilitates efficient data analysis, accelerates patient outcome predictions, and supports the development of personalized medical guidelines. At the system level, AI enhances operational efficiency, reduces workflow delays, and minimizes medical errors. Harnessing the full potential of big data requires interdisciplinary expertise to integrate diverse datasets into a comprehensive and productive resource. The rapid proliferation of high-throughput technologies and the widespread adoption of EHRs have driven exponential growth in electronic health record data. However, analyzing multimodal datasets—encompassing both medical and clinical information—requires advanced computational tools to extract actionable insights aligned with specific clinical tasks.

A major challenge associated with EHR data is its high dimensionality, which necessitates substantial computational power. Dimension reduction techniques offer one pathway to address this issue while preserving valuable information. Additional concerns relate to ethical and legal dimensions, including privacy, personal autonomy, trust, and fairness, all of which require careful attention in healthcare big data applications. Technical and infrastructural barriers also pose risks, stemming from limitations in data processing workflows, security protocols, data heterogeneity, and inadequate storage infrastructure.

These considerations are illustrated in greater detail in Fig. 4. As shown, the general process includes stages such as data sources, data storage, analytics, and improved outputs. Data collected from hospitals and medical research—spanning clinical information, public health records, sensor data, EHRs, and OMICS—are stored using master person index (MPI) systems and operational data stores (ODS). Clinical data warehouses further enable healthcare organizations to evaluate disease management programs that directly and indirectly influence patient outcomes. Analytical processes across diagnostic, prescriptive, descriptive, and predictive domains yield outputs aimed at system improvement. These outputs span a wide range of applications, including risk and disease management, fraud reduction, medical imaging, personalized care, novel therapy development, population health improvement, cost reduction, prevention of medication abuse, medical research, precision medicine, preventive medicine, and enhanced patient engagement (Sahu et al., 2022).

CATEGORIES OF LEARNING METHODS

Broadly, AI approaches supporting clinical applications can be categorized into the following types:

3.1. Unsupervised Learning Approaches

In this framework, input information lacks labels, and the objective is to uncover and interpret complex patterns within the data. Techniques such as factor analysis (FA), principal component analysis (PCA), and autoencoders aim to obtain a low-dimensional understanding of the relationships among features. Conversely, mixture modeling and clustering methods classify data into groups. For instance, Le et al. (2022) proposed a method to address data sparsity by compressing the feature space of clinical presentations, allowing effective handling of limited clinical notes. Their model, based on an autoencoder, applied sparsity reduction for clinical note representation. The central goal was to reduce the dimensionality of sparse, high-dimensional data while preserving meaningful representation. Similarly, Chushig-Muzo et al. (2021) presented an AE-based approach that combined probabilistic models using Gaussian mixture modeling and hierarchical clustering, supported by Kullback–Leibler divergence. For clinical validation, this method utilized real EHR data from the University Hospital of Fuenlabrada in Spain. The findings demonstrated strong clustering performance, successfully grouping patients with similar health conditions based on diagnostic and medication codes (Adlung et al., 2021).
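
A minimal sketch of this pipeline on synthetic data is shown below: dimensionality reduction followed by clustering of patients by their diagnostic-code profiles. PCA stands in here for the autoencoders used in the cited studies, and the data, cluster count, and other parameters are illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic patient-by-code matrix: rows are patients, columns are
# counts of diagnostic/medication codes (purely illustrative data).
rng = np.random.default_rng(0)
codes = rng.poisson(lam=1.5, size=(200, 50)).astype(float)

# Step 1: learn a low-dimensional representation of the feature space
# (PCA here; the cited studies use autoencoders for the same purpose).
embedding = PCA(n_components=5).fit_transform(
    StandardScaler().fit_transform(codes)
)

# Step 2: group patients with similar code profiles in the latent space.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(embedding)
print(np.bincount(labels))  # number of patients per discovered subgroup
```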

3.2. Supervised Learning Approaches

In contrast to unsupervised learning, supervised methods require labeled data to establish mappings between inputs and outputs. These techniques are widely implemented in clinical decision-making systems, where models are trained to map labeled input data to output functions that predict outcomes for new inputs. Supervised learning is predominantly applied in regression, where the output is continuous, and classification, where the output is discrete. Common supervised techniques include artificial neural networks (ANN), decision trees, linear regression (LR), and support vector machines (SVM). Among these, SVM and LR are relatively straightforward models that examine linear relationships between input variables and outcomes. Decision trees, by contrast, partition the feature space into smaller domains and assign each domain an outcome based on training data. When unrestricted, trees may overfit, creating partitions that only reflect training data, while shallow trees risk missing complex data structures. Ensemble techniques such as random forests, AdaBoost, and XGBoost mitigate these issues by integrating multiple weaker learners into a more accurate and generalizable model. Additionally, ANNs transform input into latent feature spaces, applying regression to these latent representations for precise predictions. Latent representations are obtained through successive linear mappings combined with nonlinear activation functions, generating hierarchical layers of data representation. However, ANNs require sufficiently large training datasets to avoid overfitting and memorization (Patrício et al., 2022; Uddin et al., 2022).
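
The contrast between a single unrestricted tree and an ensemble can be illustrated in a few lines. The sketch below compares cross-validated performance on a public breast-cancer benchmark bundled with scikit-learn; the dataset choice and hyperparameters are assumptions made purely for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # public clinical benchmark

# An unrestricted tree may carve partitions that merely echo its
# training data; an ensemble of weaker trees usually generalizes better.
models = {
    "single tree": DecisionTreeClassifier(random_state=0),
    "random forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
for name, model in models.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean cross-validated AUC = {auc:.3f}")
```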

3.3. Weakly Supervised Learning Approaches

This paradigm focuses on constructing predictive models with limited, imprecise, or noisy data annotations, addressing the challenge of labeled dataset requirements in supervised models. For instance, weakly supervised learning has been employed to detect cancerous lesions from histopathology image datasets using multiple-instance learning. Hu et al. (2020) proposed a weakly supervised approach for classifying COVID-19 infections in CT images. While reducing the burden of manual image labeling, this method still enabled accurate detection and differentiation of COVID-19 cases. The outcomes highlight the potential of such methods for broader implementation in clinical research and practice. Similarly, Ouyang et al. (2020) applied weakly supervised learning for abnormality localization in medical imaging. Despite high diagnostic accuracy, clinician trust remains limited due to the absence of transparent reasoning. To address this, they proposed an attention-driven weakly supervised algorithm incorporating a hierarchical attention mining framework that combined activation-based and gradient-based visual attention. This approach achieved significant improvements in localization when evaluated on a large-scale chest X-ray dataset.
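
As a toy illustration of bag-level supervision, the sketch below implements a naive multiple-instance baseline: each patch inherits its slide’s label, an instance classifier is trained, and a slide is scored by its most suspicious patch. The synthetic data and the max-pooling heuristic are assumptions, far simpler than the attention-based methods cited above.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each "bag" is a slide made of patches (instances); only the bag
# carries a label (lesion present or not). All data are synthetic.
rng = np.random.default_rng(1)
bags, bag_labels = [], []
for _ in range(100):
    label = int(rng.integers(0, 2))
    patches = rng.normal(0.0, 1.0, size=(20, 8))
    if label:  # positive bags contain a few shifted (lesion-like) patches
        patches[:3] += 2.0
    bags.append(patches)
    bag_labels.append(label)

# Naive baseline: propagate each bag's label to its patches, train an
# instance classifier, then score a bag by its maximum patch score.
X = np.vstack(bags)
y = np.repeat(bag_labels, 20)
clf = LogisticRegression(max_iter=1000).fit(X, y)
bag_scores = np.array([clf.predict_proba(b)[:, 1].max() for b in bags])
acc = np.mean((bag_scores > 0.5) == np.array(bag_labels))
print(f"bag-level accuracy: {acc:.2f}")
```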

3.4. Self-Supervised Learning Approaches

In this model, training occurs by predicting one part of the input from another. For example, Xue et al. (2020) trained a self-supervised model using an unlabeled EHR dataset, later fine-tuned for supervised estimation of time-to-event outcomes such as mortality and kidney failure with a smaller labeled dataset. This method exemplifies transfer learning, which addresses the lack of large labeled datasets by leveraging pre-trained knowledge for new tasks. Self-supervised learning has also been applied in histopathology image segmentation for cancer diagnosis and in assessing diabetic retinopathy severity from fundus images. Krishnan et al. (2022) reviewed such approaches, emphasizing their utility in healthcare for modeling multivariate datasets, while also highlighting challenges of bias and data collection. By leveraging unlabeled data, this method enables extraction of critical insights from medical images and signals without requiring extensive labeled datasets, making it a frontier area of research. For instance, models have achieved radiologist-level classification of chest radiographs and identification of various pathologies, even from unannotated data. Similarly, Li et al. (2020) demonstrated the importance of multimodal self-supervised methods for retinal disease diagnosis, advancing clinical decision-making. Nonetheless, significant challenges remain due to the reliance on large-scale patient data.
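
The pretrain-then-fine-tune pattern can be sketched with synthetic tabular records: a pretext task predicts one masked feature from the others on unlabeled data, and the learned hidden layer is then reused as an encoder for a small labeled outcome task. The data, architecture, and the encode helper below are hypothetical stand-ins for the EHR-scale models in the cited work.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)

# Unlabeled records in which feature 0 is predictable from features 1-3,
# giving the pretext task some structure to learn (synthetic data).
unlabeled = rng.normal(size=(2000, 10))
unlabeled[:, 0] = unlabeled[:, 1:4].sum(axis=1)

# Self-supervised step: predict the masked feature from the rest.
pretext = MLPRegressor(hidden_layer_sizes=(16,), max_iter=1000, random_state=0)
pretext.fit(unlabeled[:, 1:], unlabeled[:, 0])

def encode(X):
    """Reuse the pretext model's hidden (ReLU) layer as a feature encoder."""
    return np.maximum(0.0, X @ pretext.coefs_[0] + pretext.intercepts_[0])

# Fine-tuning step: a small labeled set for a downstream outcome task.
labeled = rng.normal(size=(100, 10))
outcome = (labeled[:, 1:4].sum(axis=1) > 0).astype(int)
clf = LogisticRegression().fit(encode(labeled[:, 1:]), outcome)
print(f"training accuracy: {clf.score(encode(labeled[:, 1:]), outcome):.2f}")
```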

3.5. Reinforcement Learning Approaches

Reinforcement learning operates on a framework of reward and penalty, where models are trained to maximize favorable outcomes by receiving rewards for correct actions and penalties for incorrect ones. In healthcare, reinforcement learning has been applied to longitudinal treatment planning, where models recommend therapies associated with positive clinical outcomes. Deep reinforcement learning extends these concepts by incorporating Q-learning and Markov decision processes into neural network architectures. Liu et al. (2019) reviewed applications of deep reinforcement learning in clinical decision-making, exploring key approaches, challenges, and opportunities. Notable applications include constructing motifs from clinical notes, optimizing mechanical ventilation strategies, determining appropriate drug dosages, and personalizing treatments using diverse data sources such as genomic databases and EHRs. For example, Nemati et al. (2016) applied reinforcement learning to optimize drug dosing, while Prasad et al. (2017) used it to manage weaning from mechanical ventilation in intensive care. Although these models demonstrate strong potential for improving care, issues of safety, transparency, and accountability remain critical. Unlike controlled simulations, applying reinforcement learning in clinical practice introduces unique challenges related to dynamic decision-making and trust in algorithmic logic (Giordano et al., 2021; Haarnoja et al., 2017; Nemati et al., 2016).
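
A minimal tabular Q-learning sketch can make the reward-and-penalty loop concrete. In the toy dosing problem below, states are coarse severity levels and actions are dose levels; the transition rule and rewards are entirely synthetic and bear no relation to the clinical models cited above.

```python
import numpy as np

rng = np.random.default_rng(3)
n_states, n_actions = 5, 3          # severity 0 (recovered) .. 4 (critical)
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.2

def step(state, action):
    """Synthetic transition: the 'right' dose for a state lowers severity."""
    right_dose = state % n_actions
    drift = -1 if action == right_dose else int(rng.choice([0, 1]))
    nxt = int(np.clip(state + drift, 0, n_states - 1))
    reward = 1.0 if nxt < state else -0.1   # reward improvement, penalize harm
    return nxt, reward

for _ in range(5000):                        # training episodes
    s = int(rng.integers(1, n_states))
    for _ in range(20):
        # epsilon-greedy action selection
        a = int(rng.integers(n_actions)) if rng.random() < epsilon else int(Q[s].argmax())
        s2, r = step(s, a)
        Q[s, a] += alpha * (r + gamma * Q[s2].max() - Q[s, a])  # Q-learning update
        if s2 == 0:                          # patient recovered
            break
        s = s2

print("learned policy (dose per severity level):", Q.argmax(axis=1))
```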

BARRIERS AND CONSTRAINTS

This section outlines the principal challenges that arise when applying AI methods to clinical problems.

4.1. Interpretability

AI approaches used for regression and classification depend on datasets that are sufficiently robust and reliable to generate predictions from previously unseen cases. Model interpretability in AI is primarily statistical, reflecting how much variance in the output labels can be explained by a trained model. However, such interpretability should not be mistaken for causal inference or observational conclusions. For instance, decision trees—structured as a series of yes/no questions—provide a straightforward example of interpretable models. Similarly, the magnitude and sign of linear regression coefficients indicate the strength and direction of each predictor’s effect (Kovalchuk et al., 2022).
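
As an illustration, a depth-limited tree can be printed as exactly such a series of yes/no questions. The sketch below uses scikit-learn’s export_text on a public benchmark, purely as an illustration rather than a clinical model.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a shallow tree so that its decision rules stay directly readable.
data = load_breast_cancer()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(data.data, data.target)

# Render the model as human-readable yes/no splits on named features.
print(export_text(tree, feature_names=list(data.feature_names)))
```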

4.2. Causality

This limitation relates to the quality of explanations provided by AI algorithms. In clinical decision-making, practitioners require clear justification for why an algorithm produces a particular output. Interactive AI can support this process by integrating expert knowledge into the training phase. A thorough grasp of causality is essential to ensure that AI-based medical discoveries are interpreted appropriately and remain compliant with regulatory frameworks.

4.3. Data Availability and Quality

Despite growing public access to clinical datasets, high-quality data remain scarce—particularly for rare conditions. A key challenge is the integration of electronic health record (EHR) data from multiple sources, given their varying formats and complex structures. Using inadequate data to train AI models risks the selection of spurious correlations arising from sample bias or small datasets, leading to weak generalizability and inaccurate predictions. Moreover, limited sharing of data and code hampers reproducibility, collaboration, and benchmarking across studies. In many cases, medical data remain siloed in separate systems, making cross-population comparisons difficult or impossible. Since EHRs often contain unstructured information, clinicians and researchers may struggle to utilize them effectively. Machine learning techniques can address this challenge by organizing such data or directly integrating unstructured content for large-scale phenotyping and patient subgroup identification (Ameen et al., 2022a; Kelly et al., 2019).

4.4. Data Security and Privacy

A critical concern for the future of AI in medicine is safeguarding privacy and security. Given persistent risks of hacking and breaches, clinicians and organizations hesitate to adopt algorithms that may expose sensitive patient data (Topol, 2019). Since deep learning requires large datasets, privacy concerns intensify. Secure data exchange protocols between institutions remain unclear, and stakeholders are increasingly wary of the consequences of compromised health information. Hackers may even manipulate decision-making models, causing widespread harm. Blockchain offers one potential solution by enabling cryptographically secure and immutable data exchange systems. Yet, blockchain faces limitations, including high costs, scalability challenges, and slow performance. Federated learning provides another promising approach, allowing local model updates on decentralized servers without sharing raw patient data. Future strategies should also incorporate new data ownership models, secure platforms, and stronger regulatory frameworks to ensure progress is not undermined by unresolved privacy concerns (Sanchez-Martinez et al., 2022).
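
A minimal sketch of the federated-averaging idea follows, assuming three hypothetical hospital sites and a simple linear model: each site takes a few gradient steps on its own synthetic data, and only the resulting weight vectors, never the raw records, are aggregated by the server.

```python
import numpy as np

rng = np.random.default_rng(4)
true_w = np.array([0.5, -1.0, 2.0])   # ground truth for the synthetic task

def make_site(n_patients):
    """Synthetic per-hospital dataset; raw rows never leave this site."""
    X = rng.normal(size=(n_patients, 3))
    y = X @ true_w + rng.normal(0, 0.1, n_patients)
    return X, y

hospitals = [make_site(n) for n in (80, 120, 200)]
global_w = np.zeros(3)

for _ in range(50):                    # communication rounds
    local_ws, sizes = [], []
    for X, y in hospitals:
        w = global_w.copy()
        for _ in range(5):             # local gradient steps, on-site only
            w -= 0.01 * (2.0 / len(y)) * X.T @ (X @ w - y)
        local_ws.append(w)             # only model weights are shared
        sizes.append(len(y))
    # Server: size-weighted average of the local models (FedAvg).
    global_w = np.average(local_ws, axis=0, weights=sizes)

print("recovered weights:", np.round(global_w, 2))
```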

4.5. Generalizability of Models

In practice, AI models often perform inconsistently across different clinical environments. Generalizability measures the extent to which a trained model remains effective on real-world data beyond the original training context. For instance, a model trained on data from one hospital may perform poorly when applied in another setting. Models with greater generalizability are therefore more suitable for clinical adoption. Despite its importance, the concept lacks a universal definition, and new guidelines typically emphasize the need for independent validation (Sanchez-Martinez et al., 2022).

4.6. Real-World Clinical Trials

One of the most pressing challenges in clinical AI is determining whether a model is fit for routine use. Most AI systems are validated through retrospective or prospective studies, sometimes relying on simulated datasets. While such trials can confirm proof of concept, more real-world trials are essential for robust evaluation. Even when trials indicate strong performance, interpretive difficulties often remain due to limited controls. A notable example is Google Research’s deployment of an AI diagnostic system, which revealed unanticipated barriers in clinical practice that increased costs and disrupted workflows. This underscores the necessity of further testing and refinement to ensure AI models can withstand real-world challenges in hospitals, clinics, and research environments (Adlung et al., 2021; Topol, 2019).

4.7. Regulatory Challenges

The adoption of AI in clinical decision-making raises significant legal concerns, particularly regarding liability for errors or negligence. Questions of accountability emerge when AI recommendations contribute to adverse outcomes. Moreover, the rapidly evolving nature of AI complicates regulatory oversight, as agencies face difficulties determining how to evaluate iterative updates. Certification processes for clinical AI software often encounter regulatory obstacles, especially in adapting systems to real-world use. Ethical concerns further compound these challenges, particularly around data sharing and compliance with privacy laws. Predictive health models may also carry economic and medical consequences, reinforcing the need for strong legal and ethical frameworks (Sanchez-Martinez et al., 2022).

4.8. Causal AI versus Predictive AI

Traditional predictive AI, which relies on correlations between inputs and outcomes, may be insufficient for healthcare applications. Such methods can mislead clinicians if critical causal factors are ignored. By contrast, causal AI seeks to identify underlying mechanisms, providing a deeper explanation of variable interactions and enabling simulations of cause–effect scenarios. This shift is essential for building AI systems that support more reliable and clinically meaningful decision-making (Topol, 2019).

4.9. Validation Requirements

Even when AI models surpass human performance in predictive accuracy, rigorous validation remains essential. Deployment in hospitals requires robust evidence that algorithms can improve clinical and financial outcomes. Multicenter, randomized, prospective trials are particularly important for confirming whether trained models generalize across diverse settings. Yet, prospective trials in real clinical environments remain rare. Although AI performance typically improves with larger datasets, challenges such as catastrophic forgetting—where neural networks abruptly lose prior knowledge during new training—pose significant obstacles. Retraining entire datasets is also costly and time-intensive. Federated learning offers a potential solution by enabling local model improvements without centralized data pooling.

CONCLUSIONS AND FUTURE PERSPECTIVES

The exponential growth of AI research and the active participation of laboratories worldwide have significantly advanced clinical decision-making systems. However, sustaining this momentum is crucial for continued progress. AI systems are evolving from specific clinical applications to broader domains, where their impact could be transformative—provided that extensive, ongoing research continues. Clinical evaluation through standardized performance metrics remains essential to gauge AI’s contribution to care quality. Moreover, systems must be designed to optimize human–algorithm interaction for global healthcare use, ensuring interpretability and acceptance among practitioners.

Hybrid systems that integrate data-driven and knowledge-based approaches are emerging as promising directions. Such integration can leverage both empirical data and abstract reasoning to provide explainable decision support. Despite the abundance of empirical studies, the most pressing need remains demonstrating tangible improvements in patient outcomes and care processes. Regulatory guidance will play a central role in helping clinicians, innovators, and organizations navigate this development path.

Current research also emphasizes three key areas: (1) human-centered AI design, (2) cloud infrastructures for managing structured and unstructured clinical data, and (3) advancements in computational power through cloud and quantum computing (Ameen et al., 2022b). The convergence of these domains is expected to accelerate progress toward artificial general intelligence. Nevertheless, widespread adoption of AI raises unresolved ethical questions, particularly regarding patient autonomy and reliance on opaque systems. Epistemologically, differences between human and AI interpretations of clinical data highlight a persistent gap. This raises a crucial open question: if AI consistently outperforms humans in longitudinal tasks, at what point should clinicians defer judgment to AI?

According to EU regulators, systems that cannot provide adequate explanations should not be used in clinical practice. While this stance remains valid, maintaining it without adaptation could present future ethical dilemmas as AI evidence grows. Addressing this balance between trust and accountability will be critical for the safe and effective integration of AI in healthcare.

Acknowledgements

The authors would like to thank colleagues at the Department of Biomedical Informatics, Shiraz University of Medical Sciences, and the Department of Computer Science and Engineering, University of Tehran, for their constructive feedback and academic support during the preparation of this manuscript.

Funding

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.

Conflict of Interest

The authors declare that there are no conflicts of interest regarding the publication of this article.

References

  1. Acosta, J. N., Falcone, G. J., Rajpurkar, P., & Topol, E. J. (2022). Multimodal biomedical AI. Nature Medicine, 28(9), 1773–1784. https://doi.org/10.1038/s41591-022-01981-2
  2. Adams, I. D., Chan, M., Clifford, P. C., Cooke, W. M., Dallos, V., de Dombal, F. T., Edwards, M. H., Hancock, D. M., Hewett, D. J., & McIntyre (1986). Computer-aided diagnosis of acute abdominal pain: a multicentre study. BMJ, 293(6550), 800–804. https://doi.org/10.1136/bmj.293.6550.800
  3. Adlung, L., Cohen, Y., Mor, U., & Elinav, E. (2021). Machine learning in clinical decision making. Med, 2(6), 642–665. https://doi.org/10.1016/j.medj.2021.04.006
  4. Ameen, S., Wong, M.-C., Yee, K.-C., & Turner, P. (2022a). AI and Clinical Decision Making: The Limitations and Risks of Computational Reductionism in Bowel Cancer Screening. Applied Sciences, 12(7), 3341. https://doi.org/10.3390/app12073341
  5. Ameen, S., Wong, M.-C., Yee, K.-C., & Turner, P. (2022b). AI and Clinical Decision Making: The Limitations and Risks of Computational Reductionism in Bowel Cancer Screening. Applied Sciences, 12(7), 3341. https://doi.org/10.3390/app12073341
  6. Benjamens, S., Dhunnoo, P., & Meskó, B. (2020). The state of artificial intelligence-based FDA-approved medical devices and algorithms: an online database. NPJ Digital Medicine, 3, 118. https://doi.org/10.1038/s41746-020-00324-0
  7. Bhagwat, N., Viviano, J. D., Voineskos, A. N., Chakravarty, M. M., & Alzheimer’s Disease Neuroimaging Initiative (2018). Modeling and prediction of clinical symptom trajectories in Alzheimer’s disease using longitudinal data. PLOS Computational Biology, 14(9), 1–25. https://doi.org/10.1371/journal.pcbi.1006376
  8. Chushig-Muzo, D., Soguero-Ruiz, C., de Miguel-Bohoyo, P., & Mora-Jiménez, I. (2021). Interpreting clinical latent representations using autoencoders and probabilistic models. Artificial Intelligence in Medicine, 122, 102211. https://doi.org/10.1016/j.artmed.2021.102211
  9. Daunay, A., Baudrin, L. G., Deleuze, J.-F., & How-Kit, A. (2019). Evaluation of six blood-based age prediction models using DNA methylation analysis by pyrosequencing. Scientific Reports, 9(1), 8862. https://doi.org/10.1038/s41598-019-45197-w
  10. Ding, K., Zhou, M., Wang, Z., Liu, Q., Arnold, C. W., Zhang, S., & Metaxas, D. N. (2022). Graph Convolutional Networks for Multi-modality Medical Imaging: Methods, Architectures, and Clinical Applications. ArXiv, abs/2202.0.
  11. Elfiky, A. A., Pany, M. J., Parikh, R. B., & Obermeyer, Z. (2018). Development and Application of a Machine Learning Approach to Assess Short-term Mortality Risk Among Patients With Cancer Starting Chemotherapy. JAMA Network Open, 1(3), e180926. https://doi.org/10.1001/jamanetworkopen.2018.0926
  12. Giordano, C., Brennan, M., Mohamed, B., Rashidi, P., Modave, F., & Tighe, P. (2021). Accessing Artificial Intelligence for Clinical Decision-Making. Frontiers in Digital Health, 3. https://doi.org/10.3389/fdgth.2021.645232
  13. Haarnoja, T., Tang, H., Abbeel, P., & Levine, S. (2017). Reinforcement Learning with Deep Energy-Based Policies. Proceedings of the 34th International Conference on Machine Learning – Volume 70, 1352–1361.
  14. Harish, V., Morgado, F., Stern, A. D., & Das, S. (2021). Artificial Intelligence and Clinical Decision Making: The New Nature of Medical Uncertainty. Academic Medicine: Journal of the Association of American Medical Colleges, 96(1), 31–36. https://doi.org/10.1097/ACM.0000000000003707
  15. Hu, S., Gao, Y., Niu, Z., Jiang, Y., Li, L., Xiao, X., Wang, M., Fang, E. F., Menpes-Smith, W., & Xia, J. (2020). Weakly supervised deep learning for COVID-19 infection detection and classification from CT images. IEEE Access, 8, 118869–118883.
  16. Huss, R., & Coupland, S. E. (2020). Software-assisted decision support in digital histopathology. The Journal of Pathology, 250(5), 685–692. https://doi.org/10.1002/path.5388
  17. Kelly, C. J., Karthikesalingam, A., Suleyman, M., Corrado, G., & King, D. (2019). Key challenges for delivering clinical impact with artificial intelligence. BMC Medicine, 17(1), 1–9.
  18. Kim, J. T. (2018). Application of Machine and Deep Learning Algorithms in Intelligent Clinical Decision Support Systems in Healthcare. Journal of Health and Medical Informatics, 09.
  19. Kovalchuk, S. V., Kopanitsa, G. D., Derevitskii, I. V., Matveev, G. A., & Savitskaya, D. A. (2022). Three-stage intelligent support of clinical decision-making for higher trust, validity, and explainability. Journal of Biomedical Informatics, 127, 104013. https://doi.org/10.1016/j.jbi.2022.104013
  20. Krishnan, R., Rajpurkar, P., & Topol, E. J. (2022). Self-supervised learning in medicine and healthcare. Nature Biomedical Engineering. https://doi.org/10.1038/s41551-022-00914-1
  21. Kurshan, E., Shen, H., & Yu, H. (2020). Financial Crime & Fraud Detection Using Graph Computing: Application Considerations & Outlook. Proceedings – 2020 2nd International Conference on Transdisciplinary AI, TransAI 2020, 125–130. https://doi.org/10.1109/TRANSAI49837.2020.00029
  22. Le, T.-D., Noumeir, R., Rambaud, J., Sans, G., & Jouvet, P. (2022). Adaptation of Autoencoder for Sparsity Reduction From Clinical Notes Representation Learning. ArXiv preprint arXiv:2209.12831.
  23. Lee, K., Turner, N., Macrina, T., Wu, J., Lu, R., & Seung, H. S. (2019). Convolutional nets for reconstructing neural circuits from brain images acquired by serial section electron microscopy. Current Opinion in Neurobiology, 55, 188–198. https://doi.org/10.1016/j.conb.2019.04.001
  24. Li, X., Jia, M., Islam, M. T., Yu, L., & Xing, L. (2020). Self-Supervised Feature Learning via Exploiting Multi-Modal Data for Retinal Disease Diagnosis. IEEE Transactions on Medical Imaging, 39(12), 4023–4033. https://doi.org/10.1109/TMI.2020.3008871
  25. Liu, S., Ngiam, K. Y., & Feng, M. (2019). Deep Reinforcement Learning for Clinical Decision Support: A Brief Survey. ArXiv, abs/1907.0.
  26. Liu, S., Ni’mah, I., Menkovski, V., Mocanu, D. C., & Pechenizkiy, M. (2021). Efficient and effective training of sparse recurrent neural networks. Neural Computing and Applications, 33(15), 9625–9636. https://doi.org/10.1007/s00521-021-05727-y
  27. Makar, M., Oh, J., Fusco, C., Marchesani, J., McCaffrey, R., Rao, K., Ryan, E. E., Washer, L., West, L. R., Young, V. B., Guttag, J., Hooper, D. C., Shenoy, E. S., & Wiens, J. (2017). A data-driven approach to predict the daily risk of Clostridium difficile infection at two large academic health centers. Open Forum Infectious Diseases, 4(suppl_1), S403–S404. https://doi.org/10.1093/ofid/ofx163.1009
  28. Miotto, R., Li, L., Kidd, B. A., & Dudley, J. T. (2016). Deep Patient: An Unsupervised Representation to Predict the Future of Patients from the Electronic Health Records. Scientific Reports, 6(1), 26094. https://doi.org/10.1038/srep26094
  29. Moja, L., Kwag, K. H., Lytras, T., Bertizzolo, L., Brandt, L., Pecoraro, V., Rigon, G., Vaona, A., Ruggiero, F., Mangia, M., Iorio, A., Kunnamo, I., & Bonovas, S. (2014). Effectiveness of Computerized Decision Support Systems Linked to Electronic Health Records: A Systematic Review and Meta-Analysis. American Journal of Public Health, 104(12), e12–e22. https://doi.org/10.2105/AJPH.2014.302164
  30. Montani, S., & Striani, M. (2019). Artificial Intelligence in Clinical Decision Support: a Focused Literature Survey. Yearbook of Medical Informatics, 28(1), 120–127. https://doi.org/10.1055/s-0039-1677911
  31. Nemati, S., Ghassemi, M. M., & Clifford, G. D. (2016). Optimal medication dosing from suboptimal clinical examples: a deep reinforcement learning approach. Annual International Conference of the IEEE Engineering in Medicine and Biology Society, 2016, 2978–2981. https://doi.org/10.1109/EMBC.2016.7591355
  32. Ouyang, X., Karanam, S., Wu, Z., Chen, T., Huo, J., Zhou, X. S., Wang, Q., & Cheng, J.-Z. (2020). Learning hierarchical attention for weakly-supervised chest X-ray abnormality localization and diagnosis. IEEE Transactions on Medical Imaging, 40(10), 2698–2710.
  33. Patrício, C., Neves, J. C., & Teixeira, L. F. (2022). Explainable Deep Learning Methods in Medical Imaging Diagnosis: A Survey. https://doi.org/10.48550/arxiv.2205.04766
  34. Prasad, N., Cheng, L.-F., Chivers, C., Draugelis, M., & Engelhardt, B. (2017). A Reinforcement Learning Approach to Weaning of Mechanical Ventilation in Intensive Care Units.
  35. Rajkomar, A., Dean, J., & Kohane, I. (2019). Machine Learning in Medicine. The New England Journal of Medicine, 380(14), 1347–1358. https://doi.org/10.1056/NEJMra1814259
  36. Saeidi, N., Karshenas, H., & Mohammadi, H. M. (2019). Single Sample Face Recognition Using Multi Cross Pattern and Learning Discriminative Binary Features. Journal of Applied Security Research, 14(2), 169–190.
  37. Sahu, M., Gupta, R., Ambasta, R. K., & Kumar, P. (2022). Chapter Three – Artificial intelligence and machine learning in precision medicine: A paradigm shift in big data analysis. In B. Teplow (Ed.), Precision Medicine (Vol. 190, pp. 57–100). Academic Press. https://doi.org/10.1016/bs.pmbts.2022.03.002
  38. Sanchez-Martinez, S., Camara, O., Piella, G., Cikes, M., González-Ballester, M. Á., Miron, M., Vellido, A., Gómez, E., Fraser, A. G., & Bijnens, B. (2022). Machine Learning for Clinical Decision-Making: Challenges and Opportunities in Cardiovascular Imaging. Frontiers in Cardiovascular Medicine, 8, 765693. https://doi.org/10.3389/fcvm.2021.765693
  39. Topol, E. J. (2019). High-performance medicine: the convergence of human and artificial intelligence. Nature Medicine, 25(1), 44–56. https://doi.org/10.1038/s41591-018-0300-7
  40. Uddin, S., Ong, S., & Lu, H. (2022). Machine learning in project analytics: a data-driven framework and case study. Scientific Reports, 12(1), 15252. https://doi.org/10.1038/s41598-022-19728-x
  41. Vafaei, N., Delgado-Gomes, V., Agostinho, C., & Jardim-Goncalves, R. (2022). Analysis of Data Normalization in Decision-Making Process for ICU’s Patients During the Pandemic. Procedia Computer Science, 214, 809–816.
  42. Wang, H., Wang, Z., Wang, W., Xiao, Y., Zhao, Z., & Yang, K. (2020). A Note on Graph-Based Nearest Neighbor Search.
  43. Williams, K., Bilsland, E., Sparkes, A., Aubrey, W., Young, M., Soldatova, L., De Grave, K., Ramon, J., Clare, M., Sirawaraporn, W., Oliver, S., & King, R. (2015). Cheaper faster drug development validated by the repositioning of drugs against neglected tropical diseases. Journal of The Royal Society Interface, 12, 20141289. https://doi.org/10.1098/rsif.2014.1289
  44. Wong, A. N. N., He, Z., Leung, K. L., To, C. C. K., Wong, C. Y., Wong, S. C. C., Yoo, J. S., Chan, C. K. R., Chan, A. Z., Lacambra, M. D., & Yeung, M. H. Y. (2022). Current Developments of Artificial Intelligence in Digital Pathology and Its Future Clinical Applications in Gastrointestinal Cancers. Cancers, 14(15). https://doi.org/10.3390/cancers14153780
  45. Xue, Y., Du, N., Mottram, A., Seneviratne, M., & Dai, A. M. (2020). Learning to Select Best Forecast Tasks for Clinical Outcome Prediction. In H. Larochelle, M. Ranzato, R. Hadsell, M. F. Balcan, & H. Lin (Eds.), Advances in Neural Information Processing Systems (Vol. 33, pp. 15031–15041). Curran Associates, Inc. https://proceedings.neurips.cc/paper/2020/file/abc99d6b9938aa86d1f30f8ee0fd169f-Paper.pdf
  46. Ying, H., Zhuang, F., Zhang, F., Liu, Y., Xu, G., Xie, X., Xiong, H., & Wu, J. (2018). Sequential recommender system based on hierarchical attention network. IJCAI International Joint Conference on Artificial Intelligence, 2018-July, 3926–3932. https://doi.org/10.24963/ijcai.2018/546
  47. Zheng, Y., Jiang, Z., Zhang, H., Xie, F., Shi, J., & Xue, C. (2021). Histopathology WSI Encoding based on GCNs for Scalable and Efficient Retrieval of Diagnostically Relevant Regions. https://arxiv.org/abs/2104.07878v
  48. Zhou, C., Bai, J., Song, J., Liu, X., Zhao, Z., Chen, X., & Gao, J. (2018). ATRank: An attention-based user behavior modeling framework for recommendation. 32nd AAAI Conference on Artificial Intelligence, AAAI 2018.