Introduction

The most widely established modifiable risk factors for developing coronary artery disease (CAD) include arterial hypertension, smoking, dyslipidemia, and diabetes mellitus. However, an increasing proportion of patients develop CAD, especially acute coronary syndrome (ACS), despite an absence of these risk factors.1

Baseline characteristics of patients enrolled in randomized controlled trials (RCTs) of complex CAD are inherently related to the study inclusion and exclusion criteria.2 The anatomic SYNTAX score (aSS), which is recommended by the European Society of Cardiology and American College of Cardiology guidelines for revascularization,3,4 semiquantifies the extent and complexity of CAD and helps stratify the risk of adverse outcomes after percutaneous coronary intervention (PCI) and coronary artery bypass grafting (CABG). In 2020, the SYNTAX score II was redeveloped (SYNTAX score II 2020 [SS2020]) to predict 5-year mortality in patients with 3-vessel disease (3VD) and / or left main (LM) CAD after PCI or CABG. It combines the anatomic complexity assessment with 7 clinical prognostic factors identified by Cox regression: age, diabetes mellitus medically treated with or without insulin, chronic obstructive pulmonary disease (COPD), peripheral vascular disease (PVD), current smoking, creatinine clearance (CrCl), and left ventricular ejection fraction (LVEF).5

Machine learning (ML) is a novel approach to establishing risk models for predicting clinical outcomes. ML may overcome some of the limitations of conventional statistical approaches to risk prediction. It does so by applying computer algorithms to large datasets with numerous variables and by capturing high-dimensional, nonlinear relationships among clinical characteristics to create data-driven outcome predictions.6 Several studies have shown that advanced ML algorithms achieved better risk prediction than conventional models in patients with CAD.7,8 Furthermore, ML may identify hidden variables associated with clinical events, as reported by our group in the SYNTAX (Synergy between Percutaneous Coronary Intervention with Taxus and Cardiac Surgery) trial with very long-term follow-up.9,10

This study used ML algorithms in a registry population of patients with 3VD with / without LMCAD to identify novel, registry-specific risk factors for long-term mortality by integrating clinical characteristics, biological markers, anatomical complexity, and revascularization strategy.

Patients and Methods

Study population and dataset

The design and primary results of this single-center, registry-based study conducted at the First Department of Cardiology, Medical University of Warsaw, Poland, have been published previously.11 A total of 1509 patients with CAD discussed during 176 multidisciplinary heart team meetings in the years 2016–2019 were enrolled in this retrospective study. During screening, clinical, biologic, echocardiographic, and angiographic characteristics were recorded in a dedicated database. The angiographic inclusion criterion for the final analysis was severe CAD, defined as 3VD and / or LM disease.

The study flowchart is presented in Supplementary material, Figure S1. The final analysis comprised 1035 patients undergoing revascularization (CABG, n = 356; PCI, n = 679). A further 251 patients were treated with medical therapy only, as recommended by the heart team. Five-year survival data were available for all patients. Of note, the registry included patients with ACS, including ST-segment elevation myocardial infarction (STEMI), as well as patients with prior revascularization, whereas the SYNTAX trial excluded such patients.

All experimental protocols, if undertaken, were approved by the Medical University of Warsaw. Informed consent was obtained from all participating patients or their legal guardians. All experiments and analyses were performed in accordance with the relevant guidelines and regulations.12

Feature selection and data preprocessing

Our objectives for developing the new ML model were 1) to identify potential risk factors that were recognized as important baseline characteristics of patients included in the SYNTAX study but were not identified as such in the SS2020; and 2) to compare the predicted 5-year mortality between the new ML model and the SS2020 by selecting baseline factors and applying definitions common to both the Polish registry and the SYNTAX study.

The structured dataset included 42 common variables divided into 5 categories: 1) revascularization strategy (1 variable): PCI or CABG; 2) clinical characteristics (21 variables): age, sex, body mass index, COPD, PVD, previous myocardial infarction (MI), history of heart failure (defined according to the BARI [Bypass Angioplasty Revascularization Investigation] trial13), current smoking, medically treated diabetes, use of insulin, hypertension, dyslipidemia, systolic blood pressure, diastolic blood pressure, previous stroke, carotid stenosis, active malignancy, LVEF, severe pulmonary hypertension (PH; defined as estimated right ventricular systolic pressure ≥55 mm Hg on Doppler echocardiography), and mental and physical component summary from the 36-Item Short Form Health Survey (SF-36); 3) biologic characteristics (12 variables): alanine transaminase (ALT), CrCl, C-reactive protein (CRP), total cholesterol, high-density lipoprotein cholesterol, low-density lipoprotein cholesterol, triglyceride, hematocrit, hemoglobin (Hb), fasting blood glucose, and glycated Hb (HbA1c); 4) anatomic characteristics (5 variables): aSS, disease type (3VD or LMCAD), total occlusion (TO), heavy calcification, bifurcation lesion; and 5) baseline pharmacotherapy (3 variables): statin, renin-angiotensin system inhibitors, and β-blockers.

Notably, there were no missing values for any of these 42 variables in any of the 1035 patients included in the Polish registry.

Model development and comparison

Least absolute shrinkage and selection operator (LASSO) regression was the ML algorithm used to develop the clinical prognostic index for predicting 5-year death. This method selects predictors by shrinking some coefficients to exactly 0, which it achieves by placing a limit on the sum of the absolute standardized coefficients. To avoid overfitting, 10-fold cross-validation was used to select the best λ (the penalty parameter) of the LASSO regression model. As most RCTs of 3VD and / or LMCAD exclude STEMI patients, a sensitivity analysis restricted to the non-STEMI population was planned. A linear predictor was calculated based on the survival probability derived from the ML model.
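The LASSO step with a cross-validated λ can be sketched as follows. The study itself was analyzed in R. This pure-Python version is only an illustration, and it rests on several stated assumptions:

- the outcome is binary (death within 5 years);
- predictors are standardized;
- the L1-penalized logistic model is fitted by proximal gradient descent (ISTA), which is one of several valid solvers;
- λ is chosen as the value with the lowest mean held-out log-loss.

All function names and the synthetic data are hypothetical.

```python
import math
import random
import statistics

# Illustrative sketch only (not the study's R code): L1-penalized (LASSO)
# logistic regression fitted by proximal gradient descent (ISTA), with lambda
# chosen by k-fold cross-validated log-loss. All names are hypothetical.

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def soft_threshold(z, t):
    # Proximal operator of t*|w|: shrinks z toward 0 and returns exactly 0
    # when |z| <= t, which is how LASSO drops predictors from the model.
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0

def standardize(X):
    # Center each column and scale it to unit (population) SD.
    cols = list(zip(*X))
    means = [statistics.fmean(c) for c in cols]
    sds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in X]

def fit_lasso_logistic(X, y, lam, n_iter=200, step=0.5):
    # Minimizes mean log-loss + lam * sum(|w_j|); the intercept b is unpenalized.
    n, p = len(X), len(X[0])
    b, w = 0.0, [0.0] * p
    for _ in range(n_iter):
        grad_b, grad_w = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += r
            for j in range(p):
                grad_w[j] += r * xi[j]
        b -= step * grad_b / n
        w = [soft_threshold(w[j] - step * grad_w[j] / n, step * lam) for j in range(p)]
    return b, w

def mean_log_loss(X, y, b, w):
    eps = 1e-12
    total = 0.0
    for xi, yi in zip(X, y):
        p = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi)))
        p = min(max(p, eps), 1 - eps)
        total -= yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total / len(y)

def cv_select_lambda(X, y, lambdas, k=10, seed=0):
    # Returns the lambda with the lowest mean held-out log-loss over k folds.
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::k] for f in range(k)]
    cv_loss = {}
    for lam in lambdas:
        losses = []
        for fold in folds:
            held = set(fold)
            train = [i for i in idx if i not in held]
            b, w = fit_lasso_logistic([X[i] for i in train], [y[i] for i in train], lam)
            losses.append(mean_log_loss([X[i] for i in fold], [y[i] for i in fold], b, w))
        cv_loss[lam] = statistics.fmean(losses)
    return min(cv_loss, key=cv_loss.get)

def make_demo_data(n, seed):
    # Synthetic data: x0 raises risk, x1 lowers it, x2 and x3 are pure noise.
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(4)]
        X.append(x)
        y.append(1 if rng.random() < sigmoid(-1.0 + 2.0 * x[0] - 1.0 * x[1]) else 0)
    return standardize(X), y
```

For example, `X, y = make_demo_data(150, 1)` followed by `fit_lasso_logistic(X, y, lam=1.0)` leaves every coefficient at exactly 0 for standardized predictors. A small λ such as 0.01 gives the informative predictors nonzero coefficients. `cv_select_lambda(X, y, grid)` picks the λ in between that generalizes best.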

In the second stage, Cox regression was used incorporating the ML linear predictor. Multicollinearity was checked among the factors selected by the LASSO regression, and any 2 predictors with a Pearson r greater than or equal to 0.6 were considered correlated. In that case, only 1 of the correlated predictors was retained for the final model. The choice was based on each factor's association with 5-year mortality in univariable analysis and on its established prognostic relevance. Each remaining factor was examined by univariable Cox regression analysis, and those with a P value below 0.2 were identified as potential risk factors and entered into the multivariable model. The factors with a P value below 0.05 in the multivariable Cox regression model were included in the final linear model for predicting 5-year mortality, which was then assessed in internal validation. Model performance was evaluated using calibration curves, the concordance index (C-index), and receiver operating characteristic (ROC) analysis with the area under the curve (AUC), and was compared with that of SS2020.
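The 2 screening rules described above, the Pearson r ≥0.6 collinearity check and the univariable P <0.2 filter, can be sketched as follows. This is a minimal illustration, not the authors' code, and it makes 3 assumptions:

- the function names and the example P values are hypothetical;
- the correlation threshold is applied to the absolute value of r;
- univariable P values are supplied by the caller rather than computed from Cox models here.

```python
import itertools
import math

# Hypothetical sketch of the two screening rules. We assume the 0.6 threshold
# applies to |r|; P values are passed in (e.g., from univariable Cox models).

def pearson_r(a, b):
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)

def correlated_pairs(columns, threshold=0.6):
    """columns: dict of name -> values. Returns (name1, name2, r) with |r| >= threshold."""
    pairs = []
    for (n1, c1), (n2, c2) in itertools.combinations(columns.items(), 2):
        r = pearson_r(c1, c2)
        if abs(r) >= threshold:
            pairs.append((n1, n2, round(r, 3)))
    return pairs

def univariable_screen(p_values, cutoff=0.2):
    """Keep candidate predictors whose univariable P value is below the cutoff."""
    return [name for name, p in p_values.items() if p < cutoff]
```

With illustrative P values, `univariable_screen({"age": 0.01, "sex": 0.35, "ALT": 0.5, "CRP": 0.15})` keeps `["age", "CRP"]`. For every pair returned by `correlated_pairs`, only 1 member would be carried forward.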

Statistical analysis

Categorical variables are reported as numbers and percentages, and continuous variables as means with SD. Survival time and follow-up duration did not follow a normal distribution and are therefore presented as median (interquartile range [IQR]). We used the t test to compare normally distributed continuous variables, the Mann–Whitney test for non-normally distributed variables, and the χ2 test for categorical variables. No correction for multiple testing was applied. A 2-sided P value below 0.05 was considered significant. Kaplan–Meier curves were plotted to describe survival over 5 years, and the log-rank test was used to compare survival between groups. In the Cox regression analysis, hazard ratios (HRs) with 95% CIs are reported for both the univariable and multivariable analyses. The DeLong test was used to compare the AUCs derived from SS2020 and the new ML model. All analyses were performed using R, version 4.1.1 (R Foundation for Statistical Computing, Vienna, Austria).
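The survival methods listed above can be illustrated with a minimal pure-Python sketch of the Kaplan–Meier product-limit estimator and the 2-group log-rank test. It is not the R code used in the study, and its conventions are assumptions:

- the function names are hypothetical;
- events are coded 1 for death and 0 for censoring;
- patients censored at an event time stay in the risk set for that time, which is a common convention.

```python
import math

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time (events: 1 = death, 0 = censored)."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = n_at_t = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            n_at_t += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk  # product-limit step
            curve.append((t, surv))
        at_risk -= n_at_t  # deaths and censorings leave the risk set after time t
    return curve

def log_rank(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square, P value) with 1 degree of freedom."""
    pooled = [(t, e, 0) for t, e in zip(times_a, events_a)]
    pooled += [(t, e, 1) for t, e in zip(times_b, events_b)]
    o_minus_e = var = 0.0
    for t in sorted({s for s, e, _ in pooled if e}):
        n_a = sum(1 for s, _, g in pooled if s >= t and g == 0)
        n_b = sum(1 for s, _, g in pooled if s >= t and g == 1)
        d_a = sum(1 for s, e, g in pooled if s == t and e and g == 0)
        d = sum(1 for s, e, _ in pooled if s == t and e)
        n = n_a + n_b
        o_minus_e += d_a - d * n_a / n  # observed minus expected deaths in group A
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = o_minus_e ** 2 / var
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p
```

For instance, `kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])` steps down to about 0.8, 0.6, and 0.3 at times 1, 2, and 3.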

Results

Baseline characteristics and outcomes

The median (IQR) follow-up of this population was 1676 (1274–2097) days, and their baseline characteristics are listed in Table 1. The mean (SD) age of the patients was 68.2 (9.9) years, and 75.7% of them were men. Previous MI and a history of heart failure were frequent (48.5% and 70.9%, respectively). The mean values of CrCl and LVEF were below the normal range. The mean number of lesions was 4.2, with a high prevalence of bifurcation lesions, heavy calcification, and LMCAD, resulting in a mean (SD) aSS of 30.2 (6.4). The registry also included information on physical and mental status obtained from the SF-36. The rate of statin prescription at baseline was 88%. Overall, the prevalence of clinical, biologic, and anatomic parameters differed significantly between the registry patients and the SYNTAX trial population (Supplementary material, Table S1).

Table 1. Baseline characteristics of the patients included in the Polish registry (n = 1035)

Parameter | Value
--- | ---
Age, y, mean (SD) | 68.2 (9.9)
Male sex, n (%) | 784 (75.7)
BMI, kg/m², mean (SD) | 28.1 (3.6)
Previous MI, n (%) | 507 (48.5)
Previous heart failure, n (%) | 734 (70.9)
Previous stroke, n (%) | 86 (8.3)
Medically treated diabetes, n (%) | 317 (30.6)
Insulin use, n (%) | 105 (10.1)
Fasting blood glucose, mg/dl, mean (SD) | 101.6 (12.8)
HbA1c, %, mean (SD) | 5.8 (1.8)
Dyslipidemia, n (%) | 840 (81.2)
Total cholesterol, mg/dl, mean (SD) | 222 (27.3)
HDL-C, mg/dl, mean (SD) | 39.0 (6.8)
Triglycerides, mg/dl, mean (SD) | 204.6 (27.7)
Hypertension, n (%) | 867 (83.8)
Systolic blood pressure, mm Hg, mean (SD) | 134.1 (10.9)
Diastolic blood pressure, mm Hg, mean (SD) | 85.3 (8.7)
Current smoker, n (%) | 197 (19.0)
COPD, n (%) | 99 (9.6)
Peripheral vascular disease, n (%) | 63 (6.1)
Carotid stenosis, n (%) | 111 (10.7)
Active malignancy, n (%) | 27 (2.6)
Severe pulmonary hypertension, n (%) | 90 (8.7)
LVEF, %, mean (SD) | 37.7 (10.6)
Total occlusion, n (%) | 241 (23.3)
Heavy calcification, n (%) | 292 (28.2)
Bifurcation lesion, n (%) | 752 (72.7)
LMCAD, n (%) | 267 (25.8)
Lesion number, mean (SD) | 4.2 (1.4)
Anatomic SYNTAX score, mean (SD) | 30.2 (6.4)
CrCl, ml/min, mean (SD) | 56.8 (16.6)
CRP, mg/dl, mean (SD) | 0.69 (0.25)
ALT, IU/l, mean (SD) | 42.7 (8.1)
Hematocrit, %, mean (SD) | 43.9 (5.8)
Hemoglobin, g/dl, mean (SD) | 13.3 (1.9)
SF-36 PCS, mean (SD) | 72.5 (18.3)
SF-36 MCS, mean (SD) | 51.9 (9.3)
Statin, n (%) | 911 (88.0)
β-Blocker, n (%) | 785 (75.8)
RAS inhibitors, n (%) | 870 (84.1)
CABG, n (%) | 356 (34.4)

SI conversion factors: to convert total cholesterol and HDL-C to mmol/l, multiply by 0.0259; triglycerides to mmol/l, by 0.0113; CRP to nmol/l, by 95.24; ALT to μkat/l, by 0.0167; hemoglobin to g/l, by 10.

Abbreviations: ALT, alanine transaminase; BMI, body mass index; CABG, coronary artery bypass grafting; COPD, chronic obstructive pulmonary disease; CrCl, creatinine clearance; CRP, C-reactive protein; HbA1c, glycated hemoglobin; HDL-C, high-density lipoprotein cholesterol; LMCAD, left main coronary artery disease; LVEF, left ventricular ejection fraction; MCS, mental component summary; MI, myocardial infarction; PCS, physical component summary; RAS, renin-angiotensin system; SF-36, the 36-Item Short Form Health Survey

In the Polish registry, fewer patients were treated with CABG than with PCI (34.4% vs 65.6%). At 5-year follow-up, there were 127 deaths (12.3%). Survival after PCI was numerically, but not significantly, lower than after CABG (Figure 1). The registry enrolled 121 STEMI patients (11.6%), who had numerically higher 5-year mortality than the stable and non-STEMI patients (17.5% vs 11.9%; log-rank P = 0.07; Supplementary material, Figure S2).

Figure 1. Kaplan–Meier curve of the survival rate up to 5 years

Abbreviations: PCI, percutaneous coronary intervention; others, see Table 1

Least absolute shrinkage and selection operator screening, Cox regression analysis, and establishing the prediction model

The LASSO regression identified 19 prognostic factors for predicting 5-year mortality: age, sex, COPD, PVD, carotid stenosis, active malignancy, LVEF, pulmonary hypertension, history of MI, diabetes, insulin use, CRP, ALT, Hb, HbA1c, β-blocker use, TO, bifurcation lesion, and LMCAD (Figure 2, Table 2). Notably, almost the same factors were identified by the LASSO regression in a sensitivity analysis performed following the exclusion of STEMI patients (17 common factors; Supplementary material, Figure S3 and Table S2).

Figure 2. Top prognostic factors detected by the least absolute shrinkage and selection operator (LASSO) regression for predicting 5-year death in the Polish registry. The best λ was selected by 10-fold cross-validation.

Abbreviations: PVD, peripheral vascular disease; others, see Table 1

Table 2. Prognostic factors selected by the least absolute shrinkage and selection operator (LASSO) regression for predicting 5-year death in the Polish registry

Prognostic factor | Coefficient calculated by LASSO
--- | ---
Age | –0.021
Male sex | 0.244
COPD | 1.784
Peripheral vascular disease | 0.572
Carotid artery disease | –0.314
Active malignancy | –0.193
LVEF | –0.008
Pulmonary hypertension | 1.679
History of MI | –0.471
Medically treated diabetes | 1.761
Insulin use | 0.936
CRP | 0.054
ALT | 0.006
Hemoglobin | –0.117
HbA1c | –0.547
β-Blocker use | –0.332
Total occlusion | 0.383
Bifurcation lesion | –0.052
LMCAD | 0.159

Abbreviations: see Table 1

There was no multicollinearity among the factors selected by the LASSO regression in the whole population (Supplementary material, Figure S4). Sex, ALT, history of MI, β-blocker use, and bifurcation lesion had a P value above 0.2 in the univariable analysis and were therefore removed, leaving 14 variables to be tested in the multivariable analysis. Of those, 11 showed a significant relationship with mortality (Table 3; Supplementary material, Table S3). No pair among the 6 selected continuous variables was correlated (Pearson r ≥0.6). All of these factors were therefore retained in the multivariable analysis and in the final linear model for predicting 5-year mortality.

Table 3. Coefficient and hazard ratio for predicting the risk of all-cause death at 5 years in the final linear model

Prognostic factor | Coefficient | HR (95% CI) | P value
--- | --- | --- | ---
Age, per 1-year increase | –0.041 | 0.95 (0.93–0.98) | 0.001
Carotid artery disease | –0.556 | 0.57 (0.36–0.92) | 0.02
COPD | 1.720 | 5.59 (3.07–10.16) | <0.001
CRP, per 1-mg/dl increase | 0.084 | 1.08 (1.02–1.15) | 0.004
Medically treated diabetes | 2.581 | 13.21 (6.67–26.18) | <0.001
Hemoglobin, per 1-g/dl increase | –0.152 | 0.86 (0.77–0.96) | 0.006
HbA1c, per 1% increase | –0.910 | 0.40 (0.31–0.52) | <0.001
Insulin use | 1.162 | 3.19 (1.88–5.45) | <0.001
Pulmonary hypertension | 1.734 | 5.67 (3.37–9.52) | <0.001
Peripheral vascular disease | 0.699 | 2.01 (1.23–3.27) | 0.005
Total occlusion | 0.530 | 1.70 (1.16–2.49) | 0.007

Abbreviations: see Table 1

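Using these factors, the final model for predicting 5-year mortality in the Polish registry was as follows:

Expected 5-year mortality = 1 – exp (–0.055 × exp [–0.041 × age – 0.556 × carotid stenosis + 1.720 × COPD + 0.084 × CRP + 2.581 × DM – 0.152 × Hb – 0.910 × HbA1c + 1.162 × insulin use + 1.734 × PH + 0.699 × PVD + 0.530 × TO + 8.198])

In this equation, DM denotes medically treated diabetes mellitus. Age is expressed in years, CRP in mg/dl, Hb in g/dl, and HbA1c in %. Binary factors equal 1 if present and 0 otherwise.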
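To show how the equation is evaluated for an individual patient, the sketch below codes it directly, using the coefficients reported in Table 3. It relies on the following assumptions:

- The constant 8.198 belongs to the linear predictor inside the inner exponential. Placing it outside would yield impossible values far below 0.
- The term 0.055 acts as the baseline cumulative hazard at 5 years.
- Binary factors are coded 1 if present and 0 otherwise.

The dictionary keys and the example patient are hypothetical.

```python
import math

# Coefficients of the final model (Table 3); keys are hypothetical names.
COEFFICIENTS = {
    "age": -0.041, "carotid_stenosis": -0.556, "copd": 1.720, "crp": 0.084,
    "diabetes": 2.581, "hb": -0.152, "hba1c": -0.910, "insulin": 1.162,
    "ph": 1.734, "pvd": 0.699, "total_occlusion": 0.530,
}
BASELINE_CUM_HAZARD_5Y = 0.055  # assumed interpretation of the 0.055 term
CENTERING_CONSTANT = 8.198      # assumed to sit inside the linear predictor

def expected_5y_mortality(patient):
    """Expected 5-year mortality = 1 - exp(-H0 * exp(linear predictor))."""
    lp = sum(COEFFICIENTS[k] * patient[k] for k in COEFFICIENTS) + CENTERING_CONSTANT
    return 1.0 - math.exp(-BASELINE_CUM_HAZARD_5Y * math.exp(lp))
```

Consider a hypothetical 68-year-old patient with medically treated diabetes, CRP of 0.69 mg/dl, Hb of 13.3 g/dl, HbA1c of 5.8%, and no other factors. Under these assumptions, the function returns about 0.11. Setting COPD to 1 raises the estimate.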

Comparison of calibration between the SYNTAX score II 2020 and the machine learning model

The SS2020 was used to predict mortality in the registry population, and its calibration curve showed systematic underestimation, except in low-risk patients (Figure 3). The C-index was 0.753, while the AUC obtained from the ROC analysis was 0.763.
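The calibration assessment described here can be sketched as follows. Patients are sorted by predicted 5-year risk and split into equal-sized groups (quartiles by default). The mean predicted risk in each group is then compared with the observed death rate, so underestimation appears as observed rates above predicted ones. This minimal sketch assumes complete 5-year follow-up, so that death by 5 years is a binary outcome. With censoring, a Kaplan–Meier estimate per group would replace the raw rate. The function name and toy numbers are hypothetical.

```python
def calibration_by_quantile(predicted, observed, groups=4):
    """Sort by predicted risk, split into equal-sized groups, and return
    (mean predicted risk, observed event rate) for each group."""
    order = sorted(range(len(predicted)), key=lambda i: predicted[i])
    n = len(order)
    points = []
    for g in range(groups):
        members = order[g * n // groups:(g + 1) * n // groups]
        mean_pred = sum(predicted[i] for i in members) / len(members)
        obs_rate = sum(observed[i] for i in members) / len(members)
        points.append((mean_pred, obs_rate))
    return points
```

Plotting these points against the identity line gives the calibration curve. Points above the line indicate that the model underestimates risk in that group.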

Figure 3. Calibration curve (A) and receiver operating characteristic curve (B) for the predictions of the SYNTAX score II 2020

Abbreviations: AUC, area under the curve

The established LASSO model was also applied internally to the same dataset. The 4 points representing the average of each quartile lay closer to the identity line than those for SS2020 (C-index, 0.924; Figure 4). The AUC obtained from the ROC analysis was 0.935, significantly higher than that of SS2020 (DeLong test, P <0.001).
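The DeLong comparison of 2 correlated AUCs, where both models score the same patients, can be sketched as below. This is a minimal pure-Python illustration of DeLong's nonparametric method, with hypothetical function names and toy data. It is not the software used in the study.

The method works in 3 steps:

1. Each AUC is written as a Mann–Whitney statistic.
2. Per-case and per-control "placement values" yield the variance of the difference between the 2 AUCs.
3. The difference is tested with a z statistic.

```python
import math

def psi(x, y):
    # Mann-Whitney kernel: 1 if the case outranks the control, 0.5 for a tie, else 0.
    if x > y:
        return 1.0
    if x == y:
        return 0.5
    return 0.0

def sample_cov(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (len(u) - 1)

def delong_test(scores_a, scores_b, labels):
    """DeLong test for two correlated AUCs computed on the same patients.
    labels: 1 = event (death), 0 = no event. Returns (auc_a, auc_b, z, two-sided P)."""
    cases = [i for i, lab in enumerate(labels) if lab == 1]
    controls = [i for i, lab in enumerate(labels) if lab == 0]
    m, n = len(cases), len(controls)
    aucs, v10, v01 = [], [], []
    for s in (scores_a, scores_b):
        # Placement values: how each case ranks against all controls, and vice versa.
        v10_s = [sum(psi(s[i], s[j]) for j in controls) / n for i in cases]
        v01_s = [sum(psi(s[i], s[j]) for i in cases) / m for j in controls]
        aucs.append(sum(v10_s) / m)
        v10.append(v10_s)
        v01.append(v01_s)
    # Variance of the AUC difference from case-level and control-level covariances.
    var_diff = (
        (sample_cov(v10[0], v10[0]) + sample_cov(v10[1], v10[1])
         - 2 * sample_cov(v10[0], v10[1])) / m
        + (sample_cov(v01[0], v01[0]) + sample_cov(v01[1], v01[1])
           - 2 * sample_cov(v01[0], v01[1])) / n
    )
    z = (aucs[0] - aucs[1]) / math.sqrt(var_diff)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided P from the standard normal
    return aucs[0], aucs[1], z, p
```

Because both AUCs come from the same patients, their covariance is subtracted. This usually makes the paired comparison more powerful than comparing 2 independent AUCs.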

Figure 4. Calibration curve (A) and receiver operating characteristic curve (B) for the predictions of the least absolute shrinkage and selection operator model

Abbreviations: see Figure 3

Risk factors not included in the SYNTAX score II 2020 and their impact on the registry

Among the top 5 risk factors for predicting 5-year mortality identified by the LASSO regression model, the only one that was not included in SS2020 was PH. The Kaplan–Meier curve displays a substantial difference in mortality between the registry patients with and without PH (Figure 5).

Figure 5. Kaplan–Meier curve of the mortality stratified by the presence of pulmonary hypertension (PH)

Discussion

The main findings of the present study can be summarized as follows:

1) The SS2020, currently the most reliable model for predicting 5-year mortality in patients undergoing revascularization with CABG or PCI for severe CAD, underestimated 5-year mortality in the Polish registry of CAD patients evaluated by a heart team.

2) After a systematic, extensive, and complete screening of clinical, biologic, and anatomic parameters in the Polish registry, ML models identified additional risk factors for 5-year all-cause death that were not incorporated in the SS2020: PH, TO, lower Hb levels, and higher CRP levels. Among these, PH had the greatest impact on mortality.

Latent benefit of machine learning

Previously, the SYNTAX score II and SS2020 were developed using conventional Cox regression approaches to predict individual risk following revascularization with PCI or CABG.5 Farooq et al14 investigated independent risk factors predictive of 4-year mortality in the SYNTAX trial; however, unconventional variables, such as PH and CRP, were not part of their model, as they only tested conventional risk factors that were selected on the basis of published data and clinical experience. Furthermore, the number of factors that could be included in the Cox model was limited by the low prevalence of these variables.15

In contrast to these conventional statistical approaches, ML can handle large amounts of data and a wide variety of parameter types. ML algorithms could therefore represent a novel approach to the pressing need for personalized risk assessment.6 The PRAISE (Prediction of Adverse Events Following an Acute Coronary Syndrome) score,7 developed using ML, has shown accurate discrimination for predicting all-cause death, recurrent acute MI, and major bleeding after ACS, and is now used in daily clinical practice. Rousset et al8 applied ML algorithms to the FOURIER (Evolocumab and Clinical Outcomes in Patients with Cardiovascular Disease) dataset, from a randomized clinical trial testing the efficacy of evolocumab, to predict the risk of major adverse cardiovascular events. ML algorithms can identify unconventional prognostic factors and may therefore improve individual risk prediction, as compared with conventional statistical approaches.

An important limitation of the ML approach is the handling of missing values, which are inevitable in clinical practice, whereas ML algorithms require large, complete samples to be trained appropriately. The present dataset was particularly well suited for ML, as there were no missing values among the 43 470 data points analyzed (42 variables × 1035 patients).

Extensive approach and unconventional risk factors

As a reference, the SS2020 was also applied to the registry dataset. Its discrimination in the Polish registry (C-index, 0.75) was consistent with previous evaluations and comparable to that observed in a large Japanese registry (C-index, 0.72).16 However, it systematically underestimated mortality in the Polish registry, whereas the ML model was better calibrated.
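Harrell's C-index, used above to compare the performance of SS2020 across registries, is the proportion of usable patient pairs in which the patient who died earlier had the higher predicted risk. In the minimal sketch below, the function name is hypothetical. Pairs with tied follow-up times are skipped, which is a simplification, and ties in predicted risk count as one half.

```python
def harrell_c_index(times, events, risk):
    """Harrell's concordance index for right-censored survival data.
    A pair (i, j) is usable when patient i died (events[i] == 1) before time j;
    it is concordant when patient i also has the higher predicted risk."""
    concordant = 0.0
    usable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored patient cannot anchor a usable pair
        for j in range(len(times)):
            if times[i] < times[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking. A C-index of 0.75 therefore means that about 3 of 4 usable pairs were ranked correctly.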

In 2021, Hara et al17 reported that preprocedural biologic parameters, such as CRP and Hb, were associated with 10-year mortality after revascularization, regardless of revascularization technique. In our study, the ML model was built using a combination of baseline clinical characteristics, biochemical and imaging parameters, and unconventional but potentially important prognostic factors, such as PH, CRP, Hb, carotid stenosis, and TO, all of which played a key role in predicting 5-year all-cause death. Strong correlations were observed between some factors included in the final model: medically treated diabetes with HbA1c, and age with carotid artery disease, PVD, COPD, and PH (Supplementary material, Figures S5 and S6). These relationships led to substantial changes in HRs and regression coefficients between the univariable and multivariable analyses (Supplementary material, Table S3).

The abovementioned PRAISE score has been effective in predicting all-cause mortality, recurrent MI, and major bleeding events. However, its predictive capability appeared limited in a recent validation study involving a real-world Asian cohort with ACS, suggesting that the model may need adjustment to specific populations.18 The multidisciplinary heart team decisions might have been affected by the inclusion criteria of the previous RCT.19 Nevertheless, the present Polish registry included an all-comer population amenable to either surgical or percutaneous revascularization. In this population, systematic screening by the heart team and the ML approach enabled the detection of specific risks, including novel ones such as PH, that are not necessarily collected in RCTs with numerous restrictive inclusion and exclusion criteria. This may partially explain the geographic disparities in very long-term mortality observed in the SYNTAXES (SYNTAX Extended Survival) trial,20 which remained unexplained at the time of that substudy.

The new prognostic factors identified in the Polish registry corroborate the findings of the recent ML analysis of the SYNTAX trial at 10 years.9,10 The shared risk factors detected by the LASSO regression in the Polish registry and the SYNTAX trial are tabulated in Supplementary material, Table S4, and show some degree of concordance. In contrast, PH was not a significant factor for 5-year mortality in the SYNTAX trial, whereas it was the most prominent risk factor in the Polish registry. More detailed information on the diagnosis of PH would be useful. This includes its interaction with COPD, whether right ventricular systolic pressure was assessed invasively or noninvasively, and the availability of spirometry parameters. Such information would clarify the pathophysiologic background of this population and the impact of comorbidities, and would inform the planning of further studies. Moreover, region-specific factors (eg, smoking habits and exposure to air pollution, soot, and smoke) should also be taken into account.21

Of note, the set of important variables identified by the LASSO regression may differ when another ML approach, such as the gradient boosting machine, is used.9 The LASSO method selects its predictors by shrinking some coefficients to 0 through a limit on the sum of the absolute standardized coefficients.22 A major advantage of LASSO is that it combines shrinkage with variable selection. We used the LASSO regression to preselect event-related factors by ML. However, other approaches (eg, gradient boosting, neural networks, or random forests) could also be applied to select the most important factors for predicting mortality in complex CAD populations.
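For reference, the penalized and constrained forms of the LASSO can be written as below. We assume the binomial (logistic) log-likelihood implied by the binary 5-year death outcome, standardized covariates x_i, and an unpenalized intercept β0. Each penalty λ ≥0 corresponds to some bound t in the constrained form. The soft-thresholding operator S, used by coordinate-wise and proximal solvers, shows why some coefficients become exactly 0.

```latex
% Penalized form (binomial log-likelihood, L1 penalty on the slopes only)
(\hat{\beta}_0, \hat{\beta}) = \underset{\beta_0,\,\beta}{\arg\min}\;
  -\frac{1}{n}\sum_{i=1}^{n}\Big[\, y_i\,(\beta_0 + x_i^{\top}\beta)
  - \log\big(1 + e^{\beta_0 + x_i^{\top}\beta}\big) \Big]
  + \lambda \sum_{j=1}^{p} \lvert \beta_j \rvert

% Equivalent constrained form: minimize the same loss subject to
\sum_{j=1}^{p} \lvert \beta_j \rvert \le t

% Soft-thresholding: any coefficient with |z| <= gamma is set exactly to 0
S(z, \gamma) = \operatorname{sign}(z)\,\max\big(\lvert z \rvert - \gamma,\; 0\big)
```

Larger λ, or equivalently smaller t, removes more predictors. Cross-validation selects the λ that balances fit and sparsity.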

Furthermore, even though many parameters are collected in large clinical trials, some important ones may be missed or underestimated in a conventional logistic regression, because the number of variables that can be considered in a single model is limited by the number of observed events. In addition, baseline characteristics of patients with CAD may differ between a regional population and a global randomized population.23 ML algorithms can potentially improve risk prediction and reveal undiscovered, clinically important predictors of clinical events. Such predictors could be incorporated into novel risk models to improve long-term prediction among patients with complex CAD.24,25
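The event-driven limit on model size can be made concrete with the registry's own numbers. The sketch below assumes the commonly cited rule of thumb of about 10 events per candidate predictor. This is a heuristic, not a criterion used in this study, and the function names are hypothetical.

```python
def max_candidate_predictors(n_events, events_per_variable=10):
    # Rule-of-thumb cap on candidate predictors for a conventional regression model.
    return n_events // events_per_variable

def events_needed(n_predictors, events_per_variable=10):
    # Events required to consider n_predictors under the same rule of thumb.
    return n_predictors * events_per_variable
```

With 127 deaths at 5 years, this heuristic allows about 12 candidate predictors. Considering all 42 collected variables at once would require about 420 events.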

Limitations

Our study has several limitations. First, the ML models were developed using registry data from patients treated in the years 2016–2019. Subsequent technological improvements in PCI devices and surgical techniques, as well as in adjunctive pharmacologic therapy, may limit the generalizability of our findings to rapidly evolving future practice. In addition, frailty was evaluated in the Polish registry but was not recorded in the SYNTAX trial because of the nature of the RCT. Thus, the current ML models did not consider this factor, which could be a significant predictor of survival.

Second, this is a post hoc analysis, and the model has not yet been validated externally, although the cross-validated LASSO algorithm is designed to limit overfitting in internal validation.

Our approach to predicting the final probability of 5-year mortality involves a 3-stage process (LASSO, univariable Cox proportional hazards models, and a multivariable Cox proportional hazards model) to select variables and finally predict probabilities.

In practice, we compared the internal performance of a new ML model combining LASSO and a Cox proportional hazards model with the external performance of SS2020 in the same population. This does not comply with the principle that a model tested in internal validation must subsequently be tested in external validation. External validation should be performed by the Polish center in its local practice in the coming years.

Our method is somewhat unorthodox. The initial LASSO regression was applied to screen potential novel baseline prognostic factors using a binomial approach, and these novel factors, along with previously well-established ones, were then fed forward into the Cox proportional hazards model. In other words, neither the LASSO coefficients nor the LASSO linear predictor were used in the Cox proportional hazards model. The Cox model estimated all coefficients anew, as if the LASSO coefficients had been reset to 0.

Ideally, each of the 3 stages should be evaluated within the cross-validation procedure to avoid overfitting at every stage. Because the prognostic model was evaluated in the same dataset in which it was trained, its performance estimates may be overly optimistic.
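The optimism described above can be illustrated with a small simulation. When predictors are selected on the full dataset and the model is evaluated on that same dataset, the apparent AUC is inflated. Repeating the selection inside each cross-validation fold gives a more honest estimate. In this hypothetical sketch, none of the components is the study's pipeline:

- the data are pure noise;
- the largest mean difference between outcome groups stands in for LASSO screening;
- a signed sum of the selected features stands in for the Cox model.

```python
import random

def auc(scores, labels):
    # Mann-Whitney AUC: probability that a random case outscores a random control.
    pos = [s for s, lab in zip(scores, labels) if lab]
    neg = [s for s, lab in zip(scores, labels) if not lab]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def select_and_score(X_train, y_train, top_k=5):
    # Stage 1 (selection): keep the top_k features with the largest absolute
    # difference in means between outcome groups in the training data.
    # Stage 2 (model): score = sum of selected features, signed by association.
    p = len(X_train[0])
    effects = []
    for j in range(p):
        m1 = [x[j] for x, y in zip(X_train, y_train) if y]
        m0 = [x[j] for x, y in zip(X_train, y_train) if not y]
        effects.append(sum(m1) / len(m1) - sum(m0) / len(m0))
    chosen = sorted(range(p), key=lambda j: abs(effects[j]), reverse=True)[:top_k]
    signs = {j: (1.0 if effects[j] > 0 else -1.0) for j in chosen}

    def score(x):
        return sum(signs[j] * x[j] for j in chosen)
    return score

def apparent_vs_cv_auc(X, y, k=10, seed=0):
    # Apparent: select and evaluate on all data. Cross-validated: select on
    # training folds only and score the held-out patients.
    apparent = auc([select_and_score(X, y)(x) for x in X], y)
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    scores = [0.0] * len(y)
    for f in range(k):
        held = set(idx[f::k])
        train = [i for i in idx if i not in held]
        fold_model = select_and_score([X[i] for i in train], [y[i] for i in train])
        for i in held:
            scores[i] = fold_model(X[i])
    return apparent, auc(scores, y)
```

With pure-noise data, the apparent AUC typically lands well above 0.5, while the cross-validated AUC stays near 0.5. The gap is the optimism that full-pipeline cross-validation is meant to expose.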

Finally, since LASSO has been an established statistical method since the mid-1990s, presenting it as an ML method may be seen as veering into ML hype. However, in a special dataset including systematic Doppler echocardiographic measurements, it helped us identify a major prognostic factor of death not previously used in our conventional statistical model, namely PH.

The question remains whether this ML model, which is specific to this single-center Polish registry, improves risk prediction only for local or regional Polish patients. If so, it might suggest that geographic disparities are not necessarily mitigated by large randomized multicenter and multinational trials. Paradoxically, in the future we may need relatively smaller but more granular local datasets, systematically collected and analyzed with ML, to predict future events in a local population. Such datasets could come from a single center or from a consortium of regional centers collecting data with the same preprocedural protocol.

Conclusions

The presented ML models proved useful in identifying nonconventional risk factors for all-cause death at 5 years by using all baseline components in the dataset. On the one hand, trial-specific characteristics can be used to establish a risk prediction model in a single dataset, which must be interpreted carefully before being applied to another dataset. On the other hand, a sophisticated prediction model trained on a large number of granular parameters may harmonize decision-making globally and help foster the concept of precision medicine.