Comparison of an interpretable extreme gradient boosting model and an artificial neural network model for prediction of severe acute pancreatitis

Abstract

Introduction: Acute pancreatitis (AP) that progresses to persistent organ failure is referred to as severe acute pancreatitis (SAP). It is a condition associated with a relatively high mortality. A prediction model that would facilitate early recognition of patients at risk for SAP is crucial for improvement of patient prognosis.

Objectives: The aim of this study was to evaluate the accuracy of extreme gradient boosting (XGBoost) and artificial neural network (ANN) models for predicting SAP.

Patients and methods: A total of 648 patients with AP were enrolled. XGBoost and ANN models were developed and validated in the training (519 patients) and test sets (129 patients). The accuracy and predictive performance of the XGBoost and ANN models were evaluated using both the area under the receiver operating characteristic curves (AUCs) and the area under the precision-recall curves (AUC-PRs).

Results: A total of 15 variables were selected for model construction through a univariable analysis. The AUCs of the XGBoost and ANN models in 5-fold cross-validation of the training set were 0.92 (95% CI, 0.87–0.97) and 0.86 (95% CI, 0.78–0.92), respectively, whereas the AUCs for the test set were 0.93 (95% CI, 0.85–1) and 0.87 (95% CI, 0.79–0.96), respectively. The XGBoost model outperformed the ANN model in terms of both diagnostic accuracy and AUC-PR. Individual predictions of the XGBoost model were explained using a local interpretable model-agnostic explanation plot.

Conclusions: An interpretable XGBoost model showed better discriminatory efficiency for predicting SAP than the ANN model, and could be used in clinical practice to identify patients at risk for SAP.

What’s new?

Acute pancreatitis (AP) is considered an emergency gastrointestinal condition. A certain proportion of AP patients progress to severe acute pancreatitis (SAP), which is associated with high mortality. Early identification of SAP and initiation of treatment may improve patient prognosis. In this study, we developed and compared 2 types of machine learning models to predict SAP: an extreme gradient boosting (XGBoost) model and an artificial neural network model. To the best of our knowledge, this is the first study to create an interpretable XGBoost model using local interpretable model-agnostic explanation diagrams for predicting SAP development. Moreover, we identified important prognostic factors of SAP. Owing to its high discriminatory efficiency, the XGBoost model could aid clinicians in early recognition of patients with SAP and enable them to take appropriate measures.

Introduction

Acute pancreatitis (AP) is one of the most common gastrointestinal diseases requiring hospitalization. It is associated with a high hospitalization cost in many countries. The global incidence of AP has risen over the last decades, with an average annual percent change of 3.07%,¹ resulting in an increased burden on health care systems. Although most patients with AP experience a mild, self-limited disease that lasts approximately a week, about 20% to 30% progress to severe acute pancreatitis (SAP), with mortality rates ranging between 13% and 35%.² A majority of patients with SAP require acute care and nutritional support in an intensive care unit (ICU).³ Early and accurate identification of SAP is crucial to reduce mortality rates and improve clinical outcomes.⁴ Therefore, it is important to recognize prognostic factors and establish a prediction model (prognostic scoring system) with high discriminatory efficiency for SAP.

Several prognostic scoring systems, such as the Ranson score, Acute Physiology and Chronic Health Evaluation (APACHE) II, Bedside Index of Severity in Acute Pancreatitis (BISAP), and Japanese severity score (JSS), have been commonly used for the prediction of AP severity in clinical practice.³ However, each of them has certain limitations. For example, some variables included in the Ranson score need to be calculated within 48 hours of hospital admission, resulting in a high risk of missing the optimal timing of treatment.⁵ APACHE II is difficult and cumbersome to be widely applied in clinical practice, as it comprises 12 mandatory variables that are not routinely obtained in patients who are not critically ill.⁵ Due to its simplicity, BISAP is useful for early prediction of severity in patients with AP; however, its accuracy is relatively low.⁵ Therefore, there is still no gold standard prognostic score for predicting SAP.

Nowadays, artificial intelligence (AI) methods are being widely utilized to determine prognosis of various diseases, and play an important role in clinical settings, as they can assist in clinical decision-making.^6-8 Artificial neural networks (ANNs) are a subset of traditional machine learning methods, belonging to the field of AI. Their structure and function are designed to resemble biologic nervous systems, with powerful learning algorithms and training capabilities to perform simulations with high accuracy.⁶ Using high-performance computer clusters, Andersson et al⁹ established an ANN model for prediction of SAP which outperformed a logistic regression model and APACHE II (area under the curve [AUC] values of 0.92, 0.84, and 0.63, respectively, for the ANN model, logistic regression model, and APACHE II). ANN models have relatively high sensitivity and specificity, but their interpretability is low because of the black box effect, which limits their clinical application.¹⁰

The extreme gradient boosting (XGBoost) algorithm has remarkable features that enable flexible and efficient processing of missing data. Additionally, it assembles weak prediction models to construct an accurate one, and has been used in clinical practice to predict the severity and outcomes of AP.^11,12 Thapa et al¹¹ developed an XGBoost algorithm to identify patients who would benefit from treatment of SAP. Their study was limited by the fact that persistent systemic inflammatory response syndrome (SIRS), rather than persistent organ failure, was considered a gold standard for establishing the SAP diagnosis.¹¹ In addition, local individualized prediction was not accounted for. Kui et al¹² developed an early achievable severity index using the XGBoost machine learning algorithm for prediction of severe AP within 24 hours of hospital admission. However, the authors did not exclude patients with organ failure on admission, nor did they provide a comparison between XGBoost and ANN models.¹² Therefore, the aim of the present study was to develop and validate an interpretable XGBoost model, and to compare its performance with that of the traditional ANN model for predicting SAP.

Patients and methods

Inclusion and exclusion criteria

This study was a post-hoc analysis of our previous cohort studies, which included 648 consecutive, eligible patients with AP, treated at the First Affiliated Hospital of Wenzhou Medical University, a tertiary referral center in mainland China.^4,13 Patients with AP admitted to the hospital within 72 hours of the symptom onset were enrolled in the study between April 1, 2012 and December 31, 2015.¹³ For the diagnosis of AP, at least 2 of the following features were required: characteristic abdominal pain consistent with AP, laboratory investigations with amylase and / or lipase levels more than 3 times the upper limit of normal, and typical abdominal findings on cross-sectional imaging.⁴ The exclusion criteria were described in detail previously, and comprised pancreatitis induced by trauma or endoscopy, concomitant pancreatic cancer, acute exacerbation of chronic pancreatitis, a history of surgery or treatment with lipid-lowering agents, malnutrition, and liver or kidney disease.¹³

Data collection

The collected data included age, sex, body mass index (BMI), duration of symptoms, presence of SIRS, etiology of AP, and selected laboratory parameters. Duration of symptoms was defined as the time from the onset of symptoms to admission. The analyzed symptoms included abdominal pain and other gastrointestinal symptoms related to AP. SIRS was defined as the presence of at least 2 of the following criteria: 1) body temperature greater than 38 °C or lower than 36 °C; 2) respiratory rate greater than 20/min or partial pressure of carbon dioxide lower than 32 mm Hg; 3) heart rate greater than 90 bpm; 4) leukocyte count greater than 12 × 10⁹/l or lower than 4 × 10⁹/l, or more than 10% of immature forms.¹⁴ Various blood biochemical indicators, including liver and kidney function parameters, blood glucose, lipids, coagulation parameters, serum calcium, and C-reactive protein (CRP), were collected according to the previously described data regarding predictive scores, such as APACHE II and BISAP.⁴ Imaging examinations (computed tomography or ultrasonography) were performed to determine the presence of pleural effusion.¹³
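The SIRS definition above can be written down as a short rule. The following Python sketch is for illustration only: the field names are hypothetical and do not come from the study database, and the optional fields reflect the assumption that carbon dioxide pressure and immature forms are not always measured.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Vitals:
    """Admission measurements; all field names are hypothetical."""
    temperature_c: float                        # body temperature, degrees C
    respiratory_rate: float                     # breaths per minute
    heart_rate: float                           # beats per minute
    leukocytes_e9_per_l: float                  # leukocyte count, x 10^9/l
    paco2_mmhg: Optional[float] = None          # partial pressure of CO2, mm Hg
    immature_forms_pct: Optional[float] = None  # immature forms, %


def sirs_criteria_met(v: Vitals) -> int:
    """Count how many of the 4 SIRS criteria are fulfilled."""
    temperature = v.temperature_c > 38 or v.temperature_c < 36
    respiration = v.respiratory_rate > 20 or (
        v.paco2_mmhg is not None and v.paco2_mmhg < 32
    )
    heart = v.heart_rate > 90
    leukocytes = (
        v.leukocytes_e9_per_l > 12
        or v.leukocytes_e9_per_l < 4
        or (v.immature_forms_pct is not None and v.immature_forms_pct > 10)
    )
    return sum([temperature, respiration, heart, leukocytes])


def has_sirs(v: Vitals) -> bool:
    """SIRS is present when at least 2 of the 4 criteria are met."""
    return sirs_criteria_met(v) >= 2
```

For example, `has_sirs(Vitals(38.5, 24, 80, 9.0))` is true because the temperature and respiratory rate criteria are both fulfilled.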

Definition of severity and study end point

The criterion for SAP diagnosis was persistent organ failure lasting more than 48 hours.¹³ The definition of organ failure was based on a modified Marshall score greater than or equal to 2, which means that at least 1 organ system, including the respiratory, cardiovascular, and renal systems, is functionally impaired.⁴ The primary end point of the study was occurrence of SAP during hospitalization.
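The end point definition can be illustrated with a minimal Python sketch. It assumes that modified Marshall scores are available as timed assessments per organ system and that failure is continuous between 2 consecutive assessments that both score 2 or more; the data layout and function name are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Tuple

# One assessment: (hours since admission, {organ system: modified Marshall score}).
Assessment = Tuple[float, Dict[str, int]]

ORGAN_SYSTEMS = ("respiratory", "cardiovascular", "renal")


def has_persistent_organ_failure(
    assessments: List[Assessment], min_hours: float = 48.0
) -> bool:
    """Return True if any organ system scores >= 2 for more than `min_hours`.

    Assumption: failure is considered continuous between two consecutive
    assessments that both score >= 2 for the same organ system.
    """
    ordered = sorted(assessments, key=lambda a: a[0])
    for system in ORGAN_SYSTEMS:
        failure_start = None  # time at which the current failure episode began
        for hours, scores in ordered:
            if scores.get(system, 0) >= 2:
                if failure_start is None:
                    failure_start = hours
                if hours - failure_start > min_hours:
                    return True
            else:
                failure_start = None  # episode resolved, i.e. transient failure
    return False
```

Under these assumptions, a renal score of 2 at 0 and 24 hours that falls to 1 at 48 hours counts as transient organ failure, not SAP.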

Sample size and missing values

We calculated the sample size of this study based on data from our previous paper.⁴ Data on serum calcium and CRP values were partially missing in our cohort. In order to address this issue, we used the multiple imputation by chained equations (MICE) method to sustain the completeness of the sample size, reduce biased parameter estimates, and increase statistical power of the XGBoost and ANN analyses.¹⁵ MICE is one of the most common and flexible algorithms, which iteratively fits a predictive model for variables with missing values and creates a “complete” dataset.^15,16

Ethics statement

The Ethics Committee of the First Affiliated Hospital of Wenzhou Medical University approved this study protocol (KY2023-R270). Written informed consent from the participants was not required because their data were analyzed retrospectively and anonymously.

Statistical analysis

Categorical variables were presented as numbers and percentages and compared by the Fisher exact test or the χ² test. According to the results of the Shapiro–Wilk test, continuous variables were expressed as mean (SD) when they were normally distributed, or as median with interquartile range (IQR) when their distribution was non-normal. Continuous variables were compared by the t test or the Wilcoxon rank-sum test, as appropriate.

An exploratory variable importance analysis was performed to evaluate the role of different variables in SAP prediction by both XGBoost and ANN models. For the XGBoost model, variable importance was quantified by a Shapley additive explanations (SHAP) summary plot, and the individual predictions were explained by a SHAP force plot.¹⁷ For the ANN model, the importance of each variable was determined by the mean decrease in accuracy, that is, by evaluating how much the accuracy of the ANN model decreases when the information carried by that variable is removed.⁶
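One common way to obtain a mean decrease in accuracy is permutation importance: the values of a single variable are shuffled across patients and the resulting loss of accuracy is recorded. The sketch below illustrates this idea in plain Python. It is an assumption that this variant matches the procedure used for the ANN model, and `predict` stands for any fitted classifier.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def accuracy(predict: Callable[[Row], int], X: Sequence[Row], y: Sequence[int]) -> float:
    """Fraction of rows for which the predicted class equals the label."""
    return sum(predict(row) == label for row, label in zip(X, y)) / len(y)


def mean_decrease_accuracy(
    predict: Callable[[Row], int],
    X: Sequence[Row],
    y: Sequence[int],
    n_repeats: int = 20,
    seed: int = 0,
) -> List[float]:
    """Average drop in accuracy after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = accuracy(predict, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the outcome
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(predict, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances
```

A variable that the model ignores has an importance of exactly 0, whereas shuffling a variable the model relies on lowers accuracy.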

Model evaluation combined a hold-out test set with 5-fold cross-validation. The entire cohort of 648 patients was randomly divided into 5 approximately equal subsets. One of these subsets was randomly selected as the test set (129 patients), while the remaining ones formed the training set (519 patients). The XGBoost and ANN models were developed on the training set (n = 519) and independently validated on the test set (n = 129) using the “caret” package.¹⁸ To build and tune the XGBoost and ANN models on the training set, we used 5-fold cross-validation as the resampling method to limit overfitting and to estimate performance on new data.¹⁸ The training set (n = 519) was divided into 5 equal-size subsamples, of which 4 subsamples (n = 415) served for training and the remaining one (n = 104) for testing, with each subsample serving as the testing fold in turn. The cross-validation was repeated 3 times.¹⁷ The mean area under the receiver operating characteristic (ROC) curve (AUC) with 95% CI, as well as the area under the precision-recall curve (AUC-PR), was used to evaluate the discriminatory power of the models.¹⁷ Comparison of the AUC values was performed using the method proposed by Cleves et al.¹⁹
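The partitioning scheme and the 2 discrimination measures can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the original R code: the random seed and function names are hypothetical, the AUC uses the rank-based (Mann–Whitney) formulation, and the area under the precision-recall curve is approximated by average precision, assuming no tied scores.

```python
import random
from typing import List, Sequence, Tuple


def holdout_and_folds(n: int, k: int = 5, seed: int = 0) -> Tuple[List[int], List[List[int]]]:
    """Shuffle patient indices, hold out one k-th as the test set, and
    split the remaining training indices into k cross-validation folds."""
    rng = random.Random(seed)
    indices = list(range(n))
    rng.shuffle(indices)
    n_test = n // k
    test, train = indices[:n_test], indices[n_test:]
    folds = [train[i::k] for i in range(k)]  # near-equal fold sizes
    return test, folds


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case outranks a
    random negative case (ties count as one half)."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision)."""
    ranked = sorted(zip(scores, y_true), key=lambda t: -t[0])
    n_pos = sum(y_true)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each recalled positive
    return total / n_pos
```

With 648 patients, `holdout_and_folds(648)` yields a test set of 129 and a training set of 519, whose 5 folds contain 103 or 104 patients each. Because SAP is a minority outcome, the precision-recall measure is more sensitive to false positives than the ROC AUC.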

Sensitivity, specificity, and diagnostic accuracy of the XGBoost and ANN models were calculated, and the optimal cutoff value was selected according to the maximum value of the Youden index (sensitivity + specificity – 1). The local interpretable model-agnostic explanation (LIME) plot was used to explain the individual prediction to overcome the black box effect of the XGBoost output and improve its interpretability.¹⁷ With this novel explanation technique, classifier predictions were interpreted and reliably explained by learning interpretable models locally around the prediction.²⁰
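Cutoff selection by the Youden index can be illustrated as follows. This minimal Python sketch assumes that a patient is classified as SAP when the predicted probability is greater than or equal to the cutoff and that candidate cutoffs are the observed probabilities; both conventions and the function names are assumptions.

```python
from typing import Optional, Sequence, Tuple


def sensitivity_specificity(
    y_true: Sequence[int], probs: Sequence[float], cutoff: float
) -> Tuple[float, float]:
    """Classify as SAP when the predicted probability is >= cutoff."""
    tp = sum(1 for y, p in zip(y_true, probs) if y == 1 and p >= cutoff)
    fn = sum(1 for y, p in zip(y_true, probs) if y == 1 and p < cutoff)
    tn = sum(1 for y, p in zip(y_true, probs) if y == 0 and p < cutoff)
    fp = sum(1 for y, p in zip(y_true, probs) if y == 0 and p >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)


def youden_optimal_cutoff(
    y_true: Sequence[int], probs: Sequence[float]
) -> Tuple[Optional[float], float]:
    """Return (cutoff, J) maximising J = sensitivity + specificity - 1."""
    best_cutoff, best_j = None, -1.0
    for cutoff in sorted(set(probs)):
        sens, spec = sensitivity_specificity(y_true, probs, cutoff)
        j = sens + spec - 1
        if j > best_j:  # keep the lowest cutoff when several share the maximum
            best_cutoff, best_j = cutoff, j
    return best_cutoff, best_j
```

Because the Youden index weights sensitivity and specificity equally, the selected cutoff can be far below 0.5 when the outcome is uncommon.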

A data flow diagram of our study is shown in Supplementary material, Figure S1. All statistical analyses were performed with R software, version 4.1.1 (R Foundation for Statistical Computing, Vienna, Austria) and STATA software, version 10.0 (StataCorp LP, College Station, Texas, United States). A 2-tailed P value below 0.05 was considered significant.

Results

Baseline characteristics

The 3 most common etiologies of AP in the study population were biliary abnormalities (42.4%), excessive use of alcohol (16.4%), and hypertriglyceridemia (5.6%). The median (IQR) length of hospital stay for patients with and without SAP was 16 (10–31) and 10 (7–13) days, respectively. The incidence of SAP and mortality during hospitalization were 10% and 1.54%, respectively. There was no difference between the training and test sets with respect to most laboratory and clinical characteristics. However, the patients in the training set had higher BMI than those included in the test set. The SIRS rate and serum creatinine level in the test set were higher than those observed in the training set (P <0.05). Baseline characteristics of patients included in the training and test sets are presented in Supplementary material, Table S1.

Univariable analysis of the training sample

A total of 23 variables were included in the univariable analysis (Table 1). Of those, 8 did not differ significantly between the patients with and without SAP. The patients with SAP more often presented with SIRS, pleural effusion, and abnormal serum total cholesterol (TC) concentration, as compared with the patients without SAP. Similarly, the SAP group had higher hematocrit, aspartate aminotransferase (AST), glucose, serum creatinine, blood urea nitrogen (BUN), CRP, and triglyceride levels and longer prothrombin time, as compared with the non-SAP group. The individuals with SAP also had lower levels of serum albumin, high-density lipoprotein cholesterol (HDL-C), low-density lipoprotein cholesterol (LDL-C), and serum calcium.

**Table 1**. Clinical and laboratory characteristics of patients with and without severe acute pancreatitis included in the training sample (n = 519)
| Variable | Category | No SAP (n = 467) | SAP (n = 52) | P value |
| --- | --- | --- | --- | --- |
| Age, y | | 47 (37–62) | 52.5 (38–68) | 0.06 |
| Male sex | | 288 (61.7) | 32 (61.5) | 0.99 |
| Duration of symptoms, d, mean (SD) | | 1.8 (0.8) | 1.9 (0.8) | 0.42 |
| BMI, kg/m² | | 23.8 (20.3–26.3) | 23.9 (22–25.9) | 0.94 |
| SIRS | | 160 (34.3) | 35 (67.3) | <0.001 |
| Etiology of AP | Biliary etiology | 205 (43.9) | 19 (36.5) | 0.23 |
| | Hypertriglyceridemia | 23 (4.9) | 5 (9.6) | |
| | Alcohol | 61 (13.1) | 4 (7.7) | |
| | Other | 178 (38.1) | 24 (46.2) | |
| Pleural effusion | | 60 (12.9) | 35 (67.3) | <0.001 |
| Laboratory findings | | | | |
| Hematocrit, l/l | | 0.42 (0.38–0.45) | 0.44 (0.4–0.47) | 0.01 |
| Platelets, × 10⁹/l | | 199 (164–233) | 190 (142–233) | 0.1 |
| Prothrombin time, s | | 13.8 (13.1–14.6) | 14.6 (13.2–15.3) | 0.002 |
| Albumin, g/l | | 36.9 (33.6–40.1) | 31.5 (27.7–35) | <0.001 |
| Total bilirubin, µmol/l | | 20 (14–33) | 19 (14–26.5) | 0.36 |
| ALT, U/l | | 40 (18–119) | 48 (24–75) | 0.61 |
| AST, U/l | | 33 (20–88) | 63 (36–89) | 0.003 |
| Glucose, mmol/l | | 7.8 (6.5–10.6) | 10.3 (8.4–14.7) | <0.001 |
| Serum creatinine, μmol/l | | 63 (53–75) | 79 (58–128) | <0.001 |
| BUN, mmol/l | | 4.5 (3.5–5.9) | 7.9 (5.3–11.4) | <0.001 |
| Total cholesterol | <4.2 mmol/l | 145 (31.3) | 26 (50) | 0.002 |
| | 4.2–6.2 mmol/l | 205 (43.9) | 10 (19.2) | |
| | >6.2 mmol/l | 117 (25.1) | 16 (30.8) | |
| HDL-C, mmol/l | | 1.1 (0.8–1.3) | 0.6 (0.4–1) | <0.001 |
| LDL-C, mmol/l | | 2.4 (1.9–3.2) | 1.7 (1.3–2.7) | <0.001 |
| Triglycerides, mmol/l | | 1.3 (0.8–3.4) | 2.4 (1.3–7.1) | <0.001 |
| Serum calcium, mmol/l | | 2.2 (2.1–2.3) | 2 (1.6–2.2) | <0.001 |
| CRP, mg/l | | 31 (9.6–84.9) | 76.1 (26.4–90) | 0.009 |
Data are shown as numbers and percentages or median (interquartile range) unless indicated otherwise. SI conversion factors: to convert ALT and AST to μkat/l, multiply by 0.0167; CRP to nmol/l, by 9.524. Abbreviations: ALT, alanine aminotransferase; AST, aspartate aminotransferase; BMI, body mass index; BUN, blood urea nitrogen; CRP, C-reactive protein; HDL-C, high-density lipoprotein cholesterol; LDL-C, low-density lipoprotein cholesterol; SAP, severe acute pancreatitis; SIRS, systemic inflammatory response syndrome

Exploratory variable importance analysis of the training sample

The 15 variables (SIRS, hematocrit, prothrombin time, albumin, AST, glucose, serum creatinine, BUN, TC, HDL-C, LDL-C, triglycerides, serum calcium, CRP, and pleural effusion) that were significant in the univariable analysis were used to build the XGBoost and ANN machine learning models. In the ANN model, glucose was found to be the most important predictor of SAP, followed by albumin and presence of pleural effusion (Figure 1). The SHAP summary plot visualized the relative importance of each variable included in the XGBoost model. The 3 most important variables were BUN, presence of pleural effusion, and HDL-C (Figure 2).
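To illustrate what a SHAP value expresses for a single patient, the sketch below computes exact Shapley values by enumerating all feature coalitions. It is a simplified illustration under stated assumptions: variables outside a coalition are set to a baseline (for instance, cohort means), which is only one possible value function, and it is not the tree-specific algorithm implemented in SHAP libraries. Enumeration is feasible only for a small number of variables.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def exact_shapley_values(
    predict: Callable[[List[float]], float],
    instance: Sequence[float],
    baseline: Sequence[float],
) -> List[float]:
    """Exact Shapley values for one prediction.

    Features outside a coalition take their `baseline` value; this is an
    illustrative value function, not the TreeSHAP algorithm.
    """
    n = len(instance)

    def value(coalition) -> float:
        x = [instance[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(x)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        contribution = 0.0
        for size in range(n):
            # Shapley weight of a coalition with `size` members
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                contribution += weight * (value(s | {i}) - value(s))
        phi.append(contribution)
    return phi
```

The values sum to the difference between the prediction for the patient and the prediction at the baseline, which is what allows each variable's contribution to be read off additively.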

**Figure 1**. Variable importance plot of the artificial neural network (ANN) model for severe acute pancreatitis
Abbreviations: see Table 1

**Figure 2**. Variable importance plot of the extreme gradient boosting model for severe acute pancreatitis. The Shapley additive explanation (SHAP) value (x axis) reflects the predictive ability of each parameter.
Abbreviations: see Table 1

Model development, 5-fold cross-validation, and calibration on the training sample

The results of 5-fold cross-validation indicated that the XGBoost model achieved a greater mean AUC than the ANN model (mean AUC = 0.92; 95% CI, 0.87–0.97 vs mean AUC = 0.86; 95% CI, 0.78–0.92, respectively; P <⁠0.001) (Figure 3). A greater AUC-PR was also observed for the XGBoost model than for the ANN model (0.63 vs 0.48) (Figure 4). The calibration plots indicated adequate predicted probabilities against the observed proportions of SAP for both the XGBoost and ANN models (Supplementary material, Figure S2).

**Figure 3**. Receiver operator characteristic curves of the extreme gradient boosting (XGBoost) and artificial neural network (ANN) models for 5-fold cross-validation on the training set
Abbreviations: AUC, area under the curve

**Figure 4**. Precision-recall (PR) curves for the extreme gradient boosting (XGBoost) and artificial neural network (ANN) models for 5-fold cross-validation in the training set. The vertical line represents precision of the XGBoost and ANN models when the recall (sensitivity) equals 1.
Abbreviations: see Figure 3

Validation, comparison, and calibration of the prediction models on the test samples

The ROC curves for the XGBoost model, the ANN model, and the BISAP score for the prediction of SAP are shown in Supplementary material, Figure S3. The XGBoost model achieved the highest AUC (AUC = 0.93; 95% CI, 0.85–1), followed by the ANN model (AUC = 0.87; 95% CI, 0.79–0.96), and the BISAP score (AUC = 0.74; 95% CI, 0.58–0.89; P <⁠0.001). The AUC-PR of the XGBoost model was higher than that of the ANN model (0.59 vs 0.49) (Supplementary material, Figure S4).

Based on the maximum value of the Youden index, the optimal cutoff values of the XGBoost model and the ANN model were 0.24 and 0.05, respectively. The XGBoost model achieved sensitivity of 92.3%, specificity of 92.2%, and diagnostic accuracy of 92.2%. In comparison, the ANN model achieved similar sensitivity (92.3%), lower specificity (73.2%), and lower diagnostic accuracy (75.2%).

The calibration plots visualizing the predicted probabilities against the observed proportions of SAP for the XGBoost and ANN models are shown in Supplementary material, Figure S5.

Explanation: individual prediction on the test sample

To clarify the model prediction for individual patients, the LIME plot was used to visualize 2 typical predictions made by the XGBoost model; 1 for a non-SAP and 1 for a SAP patient (Figure 5).
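The principle behind a LIME explanation can be sketched in plain Python: the patient's values are perturbed, the black-box model is queried at each perturbed point, and a linear model weighted by proximity to the patient is fitted to the responses. This is a simplified illustration under stated assumptions. The kernel, the sample size, and all names are hypothetical, and actual LIME implementations additionally discretize variables and select a subset of them.

```python
import math
import random
from typing import Callable, List, Sequence


def solve_linear_system(A: List[List[float]], b: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def local_linear_explanation(
    predict: Callable[[List[float]], float],
    instance: Sequence[float],
    scales: Sequence[float],
    n_samples: int = 500,
    kernel_width: float = 1.0,
    seed: int = 0,
) -> List[float]:
    """Fit a proximity-weighted linear surrogate around `instance`.

    Returns [intercept, coef_1, ..., coef_p] on the original feature scale.
    """
    rng = random.Random(seed)
    p = len(instance)
    rows, targets, weights = [], [], []
    for _ in range(n_samples):
        z = [rng.gauss(0.0, 1.0) for _ in range(p)]  # standardised perturbation
        x = [xi + zi * si for xi, zi, si in zip(instance, z, scales)]
        squared_distance = sum(zi * zi for zi in z)
        rows.append([1.0] + x)
        targets.append(predict(x))
        weights.append(math.exp(-squared_distance / kernel_width ** 2))
    # Weighted normal equations: (X^T W X) beta = X^T W y
    A = [[sum(w * r[i] * r[j] for r, w in zip(rows, weights)) for j in range(p + 1)]
         for i in range(p + 1)]
    b = [sum(w * r[i] * t for r, t, w in zip(rows, targets, weights))
         for i in range(p + 1)]
    return solve_linear_system(A, b)
```

The sign and size of each coefficient indicate how the corresponding variable pushes the prediction for this particular patient, which is the information displayed as colored bars in a LIME plot.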

**Figure 5**. Local interpretable model-agnostic explanation plot for the individual likelihood of 2 typical predictions, showing the main contributing features behind the model prediction. The length of the color bar represents the degree of contribution. A – a correctly identified case of a non-SAP patient: a 76-year-old woman with no SIRS, hematocrit of 0.3 l/l, prothrombin time of 14.2 s, albumin of 31.1 g/l, AST of 58 U/l, glucose of 4.6 mmol/l, serum creatinine of 78 μmol/l, BUN of 6.1 mmol/l, total cholesterol of 5.9 mmol/l, HDL-C of 1.17 mmol/l, LDL-C of 3.73 mmol/l, triglycerides of 1.64 mmol/l, calcium of 1.99 mmol/l, CRP of 137 mg/l, and no pleural effusion. The absence of pleural effusion and normal glucose values were the main reasons for classification in the non-SAP group, outweighing other factors, such as increased BUN and AST values and decreased calcium levels. B – a correctly identified case of a SAP patient: a 41-year-old woman with SIRS, hematocrit of 0.41 l/l, prothrombin time of 13.9 s, albumin of 28.6 g/l, AST of 50 U/l, glucose of 10.2 mmol/l, serum creatinine of 55 μmol/l, BUN of 5.8 mmol/l, total cholesterol of 18.3 mmol/l, HDL-C of 0.62 mmol/l, LDL-C of 2.01 mmol/l, triglycerides of 48.2 mmol/l, calcium of 0.87 mmol/l, CRP of 90 mg/l, and pleural effusion. The presence of pleural effusion and low HDL-C values were the main reasons for classification in the SAP group, outweighing other factors, such as normal BUN and LDL-C values.
Abbreviations: see Table 1

For example, the first correctly classified case (case 222) was a 76-year-old woman without SAP. The absence of pleural effusion and a normal glucose level were the main reasons for classifying the patient as non-SAP, outweighing other factors, such as increased BUN and AST levels and decreased calcium concentration.

The second correctly classified case (case 224) was a 41-year-old woman with SAP. The presence of pleural effusion and a low HDL-C level were the main reasons for classifying the patient into the SAP group, outweighing other factors, such as normal BUN and LDL-C levels.

Discussion

SAP is characterized by persistent organ failure and high mortality.² To improve patient prognosis, early identification of this condition is very important. Our study developed the XGBoost and ANN models and compared their efficiency for SAP prediction. The results of 5-fold cross-validation showed that the XGBoost model outperformed the ANN model on the training set, with AUC values of 0.92 and 0.86, respectively. A greater AUC-PR was also observed for the XGBoost model than for the ANN model (0.63 vs 0.48). We validated the results on the test set and utilized a LIME plot to explain individual predictions made by the XGBoost model. Finally, we identified important predictors of SAP, including BUN, pleural effusion, and HDL-C, which were the 3 most important parameters in the XGBoost model.

Increased serum glucose levels are a common early characteristic of AP, and have been generally considered a transient phenomenon throughout AP.²¹ Their occurrence leads to damage of various pancreatic cells and activation of the neuroendocrine system, which causes exocrine and endocrine dysfunction and affects glucose homeostasis.^3,22 According to a cross-sectional study, almost 40% of patients without a history of diabetes mellitus showed altered glucose metabolism (AGM) after an episode of AP.²³ Rekeneire et al,²⁴ using tumor necrosis factor α, CRP, and interleukin 6 (IL-6) levels as indicators of the inflammatory event, concluded that dysglycemia was associated with inflammation and showed that this relationship also extended to hyperglycemia. Moreover, several previous longitudinal studies have shown that inflammation may be a predictive factor for the onset of diabetes.^25-27 Therefore, the association between diabetes and inflammation could be explained by a reciprocal interaction, suggesting that an incipient rise in serum IL-6 during AP may lead to AGM. High blood sugar levels and abnormal glucose metabolism could be an indication of more serious AP,²⁸ and have been used in prognostic models to predict SAP.^13,29 We showed that the glucose level was the most significant variable for forecasting SAP in the ANN model (Figure 1), and ranked fourth in the XGBoost model (Figure 2); therefore, it plays an important role in predicting SAP.

Albumin, an indispensable liver protein responsible for the maintenance of osmolar balance, generation of antioxidative compounds, and trapping free radicals, has also been long considered a negative acute phase protein whose production is reduced in inflammation, opening the way for proinflammatory cytokines.³⁰ Ocskay et al³¹ showed that a low albumin level on admission was an independent risk factor for both mortality and severe disease in patients with AP, with an odds ratio of 5.256 and 3.62, respectively, in the groups with albumin levels lower than 25 g/l. In our study, albumin was found to be an impactful indicator of SAP according to the variable relevance assessment (Figure 1) in the ANN model, which is consistent with previous studies.^30,31 However, it did not show good discriminatory performance for predicting SAP in the SHAP analysis for the XGBoost model (Figure 2).

Pleural effusion is common in AP patients, and it usually resolves as pancreatitis attenuates. Various reasons have been suggested to explain the development of pleural effusion in the setting of pancreatitis, for example, transdiaphragmatic lymphatic blockage, disruption of pancreatic duct, pancreatic pseudocyst, and anatomy (certain anatomic tracts between the chest and abdominal spaces).³² Generally, left-sided effusions that show normal levels of amylase in the fluid are caused by chemical or sympathetic factors.³² If pleural effusion occurs on the right side, a pseudocyst of the pancreas or a fistula between the pancreas and the pleura may be involved in its genesis.³³ Pleural effusion has been integrated into several clinical grading scores for predicting SAP, such as the BISAP score,³ and its volume is considered to be a valid imaging biomarker for assessing the severity and clinical course of AP.^34,35 In our study, pleural effusion was found to be of great importance in the analysis of variable importance in the ANN model (Figure 1), and the SHAP summary plot for the XGBoost model showed that it was useful for prediction of SAP (Figure 2).

Urea, measured as BUN, is the main end product of protein metabolism and is excreted mainly by the kidneys. Acute renal failure is a common organ injury in patients with SAP that increases the risk of mortality.³⁶ Elevated BUN levels can be explained by several factors contributing to the loss of intravascular volume.³⁷ On the one hand, endothelial dysfunction has been found in patients with AP,^37,38 which manifests as increased capillary permeability resulting in decreased blood volume. On the other hand, premature activation of pancreatic enzymes during AP leads to autodigestion of surrounding tissues, directly leading to renal injury.³ In addition, cytokines, including IL-1β, IL-8, and IL-6, interact with endothelial cells, leading to renal ischemia and secretion of free oxygen radicals.³⁹ For these reasons, a decrease in splanchnic perfusion, followed by impaired renal blood flow, leads to renal impairment and acute necrosis of the tubules.³⁷ As expected, based on the importance analysis of the ANN model variables (Figure 1), we showed that BUN was a major predictor of SAP. The SHAP analysis indicated that an increased BUN level on admission was the most salient parameter in the XGBoost model (Figure 2).

Cholesterol homeostasis, which requires a complex balance between biosynthesis, absorption, excretion, and esterification, is important for maintaining adequate cellular and systemic responses.⁴⁰ Elevated HDL-C levels are strongly associated with a decreased risk of cardiovascular disease, and HDL-C is therefore regarded as a protective factor.⁴¹ Recent studies have indicated that HDL-C may be of vital importance to the immune system, and that decreased HDL-C concentration correlates with elevated serum CRP levels.⁴² HDL-C concentration is significantly reduced during the acute phase of inflammation.⁴³ Studies have further demonstrated that low levels of HDL-C correlate with a worse prognosis in septic patients.⁴⁴ A possible explanation of this association could be the ability of HDL-C, which carries a lipopolysaccharide-binding protein, to neutralize and clear proinflammatory endotoxins as a component of innate immunity. HDL-C has also been shown to exhibit antioxidant and anti-inflammatory effects,⁴⁵ whereas free radicals and oxidative stress, in relation to the severity of pancreatitis, are implicated in causing AP.⁴⁶ In addition, HDL-C inhibits bone marrow–derived hematopoietic stem cell proliferation; this way, the development of immune cells is controlled and inappropriate leukopoiesis is avoided.⁴⁷ Li et al⁴⁸ found that serum concentrations of HDL-C correlated negatively with SAP. In our study, patients with SAP presented with low serum HDL-C and calcium values, as compared with those without this condition. The SHAP analysis showed that HDL-C was a useful parameter for predicting SAP in the XGBoost model (Figure 2).

Machine learning has been widely applied to predict the severity or complications of AP.⁶ ANN models, one of the AI methods designed to mimic the structure and performance of biological nervous systems, have been applied to predict SAP. However, previous research has been hampered by a lack of individual predictability of the model tested.

XGBoost is designed to be a highly scalable, end-to-end tree boosting system. It uses a sparsity-aware algorithm for parallel tree learning, assigns a default split direction to missing values, and employs a cache-aware structure to increase training efficiency.⁴⁹ As with other machine learning methods, however, interpreting its output remains a challenge. The contribution of each variable can be measured and described using a SHAP summary plot, which improves interpretability of the model.¹⁷ This plot displays the relationship between feature values and SHAP values in the training set, and can also be used to learn how individual patient characteristics affect the model prediction.¹⁷ We found that, as compared with the BISAP score and the ANN model, the XGBoost model showed a greater discriminatory ability to predict SAP in both training and test sets (Figures 3 and 4; Supplementary material, Figures S3 and S4). With the help of the XGBoost algorithm, we could identify key parameters and build a prediction model capable of identifying individuals at risk for SAP with high accuracy. The LIME plot offered a visual representation of the individual variable importance, which might help clinicians better interpret the results of the ANN and XGBoost models (Figure 5). With respect to accuracy, the XGBoost model showed the highest discriminatory performance on the test samples (AUC = 0.93), followed by the ANN model (AUC = 0.87), and the BISAP score (AUC = 0.74) (Supplementary material, Figure S3). The AUC-PR analysis confirmed that the XGBoost model performed better than the ANN model, both on the training and the test samples (Figure 4; Supplementary material, Figure S4).

To the best of our knowledge, this is the first study to present an interpretable XGBoost model with LIME diagrams for predicting SAP development. The strengths of this study include a large number of patients, which provides strong statistical power. Patients admitted to the ICU as well as those treated in the general ward were enrolled, which reduced selection bias. However, certain limitations need to be acknowledged. First, this was a single-center analysis; thus, the applicability of our model to other cohorts is unknown. Second, we did not further subdivide the non-SAP group into mild and moderately severe groups when developing the prediction models, which may have affected the accuracy of the established models to a certain extent. Third, we did not compare the XGBoost model with other prediction scores used in clinical practice, such as APACHE II and JSS, which may be another shortcoming. Additionally, although our XGBoost model was validated internally using 5-fold cross-validation, its performance still needs to be tested on an independent external sample. Lastly, XGBoost models are complex and difficult to understand even when proven effective, and thus resemble a “black box.” For this reason, we demonstrated how LIME graphs can make their results easier to interpret.
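The idea behind a LIME explanation can be illustrated with a simplified local surrogate: samples are perturbed around 1 patient, weighted by their proximity to that patient, and a weighted linear model is fitted to the black-box predictions. The Python sketch below is not the implementation used in this study or in the LIME software; the function names, the Gaussian perturbation, the kernel width, and the toy risk function with 2 standardized features are all assumptions made for illustration, and the feature discretization and feature selection steps of the published method are omitted.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def local_surrogate(predict, x, n_samples=500, scale=0.5, width=1.0, seed=0):
    """Fit a proximity-weighted linear model around x; return (intercept, coefs)."""
    rng = random.Random(seed)
    d = len(x)
    # Accumulate the weighted normal equations (Z^T W Z) beta = Z^T W y.
    ztz = [[0.0] * (d + 1) for _ in range(d + 1)]
    zty = [0.0] * (d + 1)
    for _ in range(n_samples):
        z = [xi + rng.gauss(0.0, scale) for xi in x]  # perturbed sample
        dist2 = sum((zi - xi) ** 2 for zi, xi in zip(z, x))
        w = math.exp(-dist2 / width ** 2)  # closer samples get larger weights
        row = [1.0] + z  # leading 1.0 models the intercept
        y = predict(z)
        for i in range(d + 1):
            zty[i] += w * row[i] * y
            for j in range(d + 1):
                ztz[i][j] += w * row[i] * row[j]
    beta = solve(ztz, zty)
    return beta[0], beta[1:]


def toy_risk(z):
    """Hypothetical black-box model: risk rises with feature 0, falls with feature 1."""
    return 1.0 / (1.0 + math.exp(-(1.5 * z[0] - 0.8 * z[1])))


intercept, coefs = local_surrogate(toy_risk, [0.2, -0.1])
print("local coefficients:", coefs)
```

The sign and size of each local coefficient indicate how the corresponding feature pushes the predicted risk up or down for that particular patient, which is the kind of information a LIME plot presents graphically.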

In conclusion, an interpretable XGBoost model showed higher discriminatory efficiency for predicting SAP than the ANN model. Interpreting the model with a LIME diagram may be of practical value in the field of precision medicine.

Supplementary material.pdf

Correspondence to

Wandong Hong, MD, Department of Gastroenterology and Hepatology, the First Affiliated Hospital of Wenzhou Medical University, 325000 Nanbaixiang, Ouhai District, Wenzhou City, Zhejiang, People’s Republic of China, phone: +86 0577 55579122, email: xhnk-hwd@163.com

Received

November 15, 2023.

Revision accepted

March 9, 2024.

Published online

March 15, 2024.

Data availability statement

The datasets used and/or analyzed during the current study are available from the corresponding author upon request.

Acknowledgments

None.

Funding

This work was supported by Wenzhou Science and Technology Bureau (Y2020010 and Y20220317) and Zhejiang Medical and Health Science and Technology Plan Project (2022KY886), both awarded to WH.

Contribution statement

WH conceived the concept of the study and was responsible for data collection and statistical analysis. WH, YL, MQ, and SP drafted the manuscript. ZB, MZ, and SF helped to finalize the manuscript. All authors read and approved the manuscript.

Conflict of interest

ZB is an employee of Alpha Genomics Private Limited. Other authors declare no conflict of interest.

How to cite

Lu Y, Qiu M, Pan S, et al. Comparison of an interpretable extreme gradient boosting model and an artificial neural network model for prediction of severe acute pancreatitis. Pol Arch Intern Med. 2024; 134: 16700. doi:10.20452/pamw.16700

1. Iannuzzi JP, King JA, Leong JH, et al. Global incidence of acute pancreatitis is increasing over time: a systematic review and meta-analysis. Gastroenterology. 2022; 162: 122-134.
2. Leppäniemi A, Tolonen M, Tarasconi A, et al. 2019 WSES Guidelines for the management of severe acute pancreatitis. World J Emerg Surg. 2019; 14: 27.
3. Boxhoorn L, Voermans RP, Bouwense SA, et al. Acute pancreatitis. Lancet. 2020; 396: 726-734.
4. Hong W, Zimmer V, Basharat Z, et al. Association of total cholesterol with severe acute pancreatitis: a U-shaped relationship. Clin Nutr. 2020; 39: 250-257.
5. Hong W, Lillemoe KD, Pan S, et al. Development and validation of a risk prediction score for severe acute pancreatitis. J Transl Med. 2019; 17: 146.
6. Hong WD, Chen XR, Jin SQ, et al. Use of an artificial neural network to predict persistent organ failure in patients with acute pancreatitis. Clinics. 2013; 68: 27-31.
7. Qin X, Zhang W, Hu X, et al. A deep learning model to identify the fluid overload status in critically ill patients based on chest X-ray images. Pol Arch Intern Med. 2023; 133: 16396.
8. Chrzan R, Wojciechowska W, Terlecki M, et al. The role of artificial intelligence technology analysis of high-resolution computed tomography images in predicting the severity of COVID-19 pneumonia. Pol Arch Intern Med. 2022; 132: 16332.
9. Andersson B, Andersson R, Ohlsson M, et al. Prediction of severe acute pancreatitis at admission to hospital using artificial neural networks. Pancreatology. 2011; 11: 328-335.
10. Zhou Y, Ge YT, Shi XL, et al. Machine learning predictive models for acute pancreatitis: a systematic review. Int J Med Inform. 2022; 157: 104641.
11. Thapa R, Iqbal Z, Garikipati A, et al. Early prediction of severe acute pancreatitis using machine learning. Pancreatology. 2022; 22: 43-50.
12. Kui B, Pinter J, Molontay R, et al. EASY-APP: an artificial intelligence model and application for early and easy prediction of severity in acute pancreatitis. Clin Transl Med. 2022; 12: e842.
13. Hong W, Lu Y, Zhou X, et al. Usefulness of random forest algorithm in predicting severe acute pancreatitis. Front Cell Infect Microbiol. 2022; 12: 893294.
14. Bone RC, Balk RA, Cerra FB, et al. Definitions for sepsis and organ failure and guidelines for the use of innovative therapies in sepsis. The ACCP/SCCM Consensus Conference Committee. American College of Chest Physicians / Society of Critical Care Medicine. Chest. 1992; 101: 1644-1655.
15. Royston P. Multiple imputation of missing values: update of ice. The Stata Journal. 2005; 5: 527-536.
16. Laqueur HS, Shev AB, Kagawa RMC. SuperMICE: an ensemble machine learning approach to multiple imputation by chained equations. Am J Epidemiol. 2022; 191: 516-525.
17. Hong W, Zhou X, Jin S, et al. A comparison of XGBoost, random forest, and nomograph for the prediction of disease severity in patients with COVID-19 pneumonia: implications of cytokine and immune cell profile. Front Cell Infect Microbiol. 2022; 12: 819267.
18. Kuhn M. Building predictive models in R using the caret package. J Stat Softw. 2008; 28: 1-26.
19. Cleves MA. From the help desk: comparing areas under receiver operating characteristic curves from two or more probit or logit models. The Stata Journal. 2002; 2: 301-313.
20. Ribeiro MT, Singh S, Guestrin C. “Why should I trust you?”: explaining the predictions of any classifier. KDD ’16: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016; 1135-1144.
21. Szatmary P, Grammatikopoulos T, Cai W, et al. Acute pancreatitis: diagnosis and treatment. Drugs. 2022; 82: 1251-1276.
22. Wynne K, Devereaux B, Dornhorst A. Diabetes of the exocrine pancreas. J Gastroenterol Hepatol. 2019; 34: 346-354.
23. Pendharkar SA, Asrani VM, Xiao AY, et al. Relationship between pancreatic hormones and glucose metabolism: a cross-sectional study in patients after acute pancreatitis. Am J Physiol Gastrointest Liver Physiol. 2016; 311: G50-G58.
24. de Rekeneire N, Peila R, Ding J, et al. Diabetes, hyperglycemia, and inflammation in older individuals: the health, aging and body composition study. Diabetes Care. 2006; 29: 1902-1908.
25. Pradhan AD, Manson JE, Rifai N, et al. C-reactive protein, interleukin 6, and risk of developing type 2 diabetes mellitus. JAMA. 2001; 286: 327-334.
26. Freeman DJ, Norrie J, Caslake MJ, et al. C-reactive protein is an independent predictor of risk for the development of diabetes in the West of Scotland Coronary Prevention Study. Diabetes. 2002; 51: 1596-1600.
27. Esposito K, Nappo F, Marfella R, et al. Inflammatory cytokine concentrations are acutely increased by hyperglycemia in humans: role of oxidative stress. Circulation. 2002; 106: 2067-2072.
28. Rajaratnam SG, Martin IG. Admission serum glucose level: an accurate predictor of outcome in gallstone pancreatitis. Pancreas. 2006; 33: 27-30.
29. Hong W, Dong L, Huang Q, et al. Prediction of severe acute pancreatitis using classification and regression tree analysis. Dig Dis Sci. 2011; 56: 3664-3671.
30. Hong W, Lin S, Zippi M, et al. Serum albumin is independently associated with persistent organ failure in acute pancreatitis. Can J Gastroenterol Hepatol. 2017; 2017: 5297143.
31. Ocskay K, Vinko Z, Nemeth D, et al. Hypoalbuminemia affects one third of acute pancreatitis patients and is independently associated with severity and mortality. Sci Rep. 2021; 11: 24158.
32. Kumar P, Gupta P, Rana S. Thoracic complications of pancreatitis. JGH Open. 2019; 3: 71-79.
33. Chmielecki J, Kościński T, Banasiewicz T. Pancreaticopleural fistula as a rare cause of both-sided pleural effusion. Case Rep Surg. 2021; 2021: 6615612.
34. Yan G, Li H, Bhetuwal A, et al. Pleural effusion volume in patients with acute pancreatitis: a retrospective study from three acute pancreatitis centers. Ann Med. 2021; 53: 2003-2018.
35. Peng R, Zhang L, Zhang ZM, et al. Chest computed tomography semi-quantitative pleural effusion and pulmonary consolidation are early predictors of acute pancreatitis severity. Quant Imaging Med Surg. 2020; 10: 451-463.
36. Garg PK, Singh VP. Organ failure due to systemic injury in acute pancreatitis. Gastroenterology. 2019; 156: 2008-2023.
37. Tomkötter L, Erbes J, Trepte C, et al. The effects of pancreatic microcirculatory disturbances on histopathologic tissue damage and the outcome in severe acute pancreatitis. Pancreas. 2016; 45: 248-253.
38. Cuthbertson CM, Christophi C. Disturbances of the microcirculation in acute pancreatitis. Br J Surg. 2006; 93: 518-530.
39. Nassar TI, Qunibi WY. AKI-associated with acute pancreatitis. Clin J Am Soc Nephrol. 2019; 14: 1106-1115.
40. Luo J, Yang H, Song BL. Mechanisms and regulation of cholesterol homeostasis. Nat Rev Mol Cell Biol. 2020; 21: 225-245.
41. Ben-Aicha S, Badimon L, Vilahur G. Advances in HDL: much more than lipid transporters. Int J Mol Sci. 2020; 21: 732.
42. Choy E, Sattar N. Interpreting lipid levels in the context of high-grade inflammatory states with a focus on rheumatoid arthritis: a challenge to conventional cardiovascular risk actions. Ann Rheum Dis. 2009; 68: 460-469.
43. Jahangiri A. High-density lipoprotein and the acute phase response. Curr Opin Endocrinol Diabetes Obes. 2010; 17: 156-160.
44. Madsen CM, Varbo A, Tybjærg-Hansen A, et al. U-shaped relationship of HDL and risk of infectious disease: two prospective population-based cohort studies. Eur Heart J. 2018; 39: 1181-1190.
45. Murch O, Collin M, Hinds CJ, et al. Lipoproteins in inflammation and sepsis. I. Basic science. Intensive Care Med. 2007; 33: 13-24.
46. Kong L, Deng J, Zhou X, et al. Sitagliptin activates the p62-Keap1-Nrf2 signalling pathway to alleviate oxidative stress and excessive autophagy in severe acute pancreatitis-related acute lung injury. Cell Death Dis. 2021; 12: 928.
47. Wang SH, Yuan SG, Peng DQ, et al. HDL and ApoA-I inhibit antigen presentation-mediated T cell activation by disrupting lipid rafts in antigen presenting cells. Atherosclerosis. 2012; 225: 105-114.
48. Li Y, Zheng R, Gao F, et al. Association between high-density lipoprotein cholesterol and apolipoprotein A-I and severe acute pancreatitis: a case-control study. Eur J Gastroenterol Hepatol. 2021; 33: 1517-1523.
49. Chen T, Guestrin C. XGBoost: a scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. San Francisco, California, USA: Association for Computing Machinery; 2016; 785-794.