Individualized prediction of psychiatric readmissions for patients with major depressive disorder: a 10-year retrospective cohort study


Data source

The sample was drawn from the electronic medical record (EMR) system of West China Hospital (WCH), Sichuan University. WCH is one of the largest single-site hospitals in the world and a leading medical center in western China, treating complicated and severe cases. The Health Information System (HIS) was gradually adopted at WCH in late 2008; through it, physicians kept records of admissions, progress notes, and discharges, entered orders, and requested and viewed the results of laboratory tests and examinations. Nurses and clinical laboratory staff also managed patients and controlled the quality of clinical practice via the HIS and the Laboratory Information System (LIS), respectively. In 2009, an EMR system integrated with the HIS and the LIS was adopted in all departments throughout the hospital; this was set as the starting point of our data extraction.

Study subjects

The aim of this study was to predict an individual MDD patient’s risk of psychiatric readmission within 30, 60, 90, 180, and 365 days, based on the medical records of the index admission. The index admission was defined as the initial inpatient record of major depression for each patient. The initial sample came from a research database created by extracting information from the EMR of WCH, containing 36,780 admission records of 21,964 inpatients whose diagnosis names included the words “depression”, “mania”, or “bipolar disorder” and who were hospitalized and then discharged at least once between January 1, 2009 and December 31, 2018. At baseline, a sample of 13,177 MDD patients was recruited as the study population, and each patient’s index admission record was extracted for analysis (Supplementary Information 1.1). No other exclusion criteria were applied except for data-cleaning considerations. Fig. 2 shows the process of data extraction.

Fig. 2

Flow diagram of the subject inclusion/exclusion process.

We used the presence of an ICD-10 code F32 or F33 in either the main or a supplementary position to signify an MDD admission. Of note, an ICD-10 code in the main diagnostic position indicates the principal condition the patient is being treated for, whilst supplementary codes signify active comorbidities that contributed to the overall episode [27]. In this setting, psychiatric rehospitalization risk could be predicted both for MDD patients hospitalized for an acute episode of MDD and for those hospitalized primarily for another condition, so the prediction tool can be utilized in a wider range of scenarios.


Definition of psychiatric readmission

The psychiatric readmission measure in this study was defined as a binary indicator of whether an individual had another admission record, with a principal psychiatric diagnosis ICD-10 code in the range F00-F99, within 30 days (or 60, 90, 180, and 365 days) of the initial MDD hospitalization. For example, in the prediction-modeling cohort for 30-day psychiatric readmission, cases referred to patients whose discharge date of the index admission fell between January 2009 and November 2018 (Fig. 3). For each patient, follow-up began at the date of discharge and ended at the earliest of a 30-day psychiatric readmission (if any) or the end of the 30-day follow-up period after discharge. Notably, patients whose discharge date of the index admission fell between December 1, 2018 and December 31, 2018 were removed from the initial cohort, since their follow-up could not cover a whole month (Supplementary Information 1.2).
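The labeling rule above can be sketched as follows. This is an illustrative Python reconstruction (the study used R), with hypothetical record tuples standing in for the EMR fields; the F00-F99 check and the follow-up window mirror the definition in the text.

```python
from datetime import date, timedelta

# Hypothetical admission records: (patient_id, admit_date, discharge_date, principal_icd10)
records = [
    ("p1", date(2010, 3, 1), date(2010, 3, 20), "F32.1"),  # index MDD admission
    ("p1", date(2010, 4, 5), date(2010, 4, 30), "F33.2"),  # psychiatric readmission 16 days later
    ("p2", date(2011, 6, 1), date(2011, 6, 15), "F32.9"),  # index MDD admission
    ("p2", date(2012, 1, 2), date(2012, 1, 9),  "I10"),    # later non-psychiatric admission
]

def is_psychiatric(code: str) -> bool:
    """ICD-10 codes F00-F99 denote mental and behavioural disorders."""
    return code.startswith("F")

def readmission_label(patient_records, window_days: int) -> int:
    """1 if a psychiatric readmission occurs within `window_days` of the
    index (first) admission's discharge date, else 0."""
    recs = sorted(patient_records, key=lambda r: r[1])  # chronological order
    index_discharge = recs[0][2]
    cutoff = index_discharge + timedelta(days=window_days)
    for _, admit, _, code in recs[1:]:
        if index_discharge < admit <= cutoff and is_psychiatric(code):
            return 1
    return 0

by_patient = {}
for r in records:
    by_patient.setdefault(r[0], []).append(r)

labels_30 = {pid: readmission_label(rs, 30) for pid, rs in by_patient.items()}
```

The same function applied with `window_days` of 60, 90, 180, or 365 yields the labels for the other prediction-modeling cohorts.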

Fig. 3

Cohort establishment for the 30-, 60-, 90-, 180-, and 365-day psychiatric readmission prediction.

Variables and features

Following the Andersen sociobehavioral model [33, 34] and other previous studies, the determinants of readmission included predisposing factors (age, sex (male/female), marital status, and race/ethnicity) and enabling factors (whether the patient had a diagnosis of a medical comorbidity, source of payment, primary psychiatric diagnosis, severity of symptoms, and length of stay of the index admission) [10].

In our study, the database consists of a representative set of items recording all information generated during a patient’s hospitalization, and various categories of features were extracted from the original medical records. Sociodemographic information includes age at admission, gender, marital status, job, ethnicity, source of payment, and province of hometown. Basic information about the hospitalization was also collected: month/season/year at admission; the specialty care unit in which the patient stayed; whether the patient was transferred to other units; diagnoses; severity of MDD; whether the diagnosis was recurrent MDD; whether the patient had a diagnosis of a medical comorbidity that increases the relative risk of psychiatric readmission (other mental disorders, endocrine diseases, nervous diseases, digestive diseases, circulatory diseases, respiratory diseases, and cancer); the number of comorbidities in each disease system; the number of surgeries; the number of admissions before the index admission; length of stay; vital signs from the basic body check at admission (subcutaneous bleeding, sickly look, facial expression, nutrition, cooperation, consciousness, gait, body position, body temperature, pulse, breath, and systolic and diastolic blood pressure); and other information typically used for billing purposes. In addition, some information about each patient’s past illness history and life behaviors was also recorded and extracted, including histories of surgery, allergy, blood transfusion, medication use, smoking, and alcoholism.
By analyzing the doctors’ order dataset, features of each patient’s treatment patterns were captured and extracted from the EMR system, including inpatient medication prescriptions (the type and usage frequency of common antidepressant, anxiolytic, antipsychotic, and anti-side-effect treatments), physiotherapy patterns (modified electroconvulsive therapy, biofeedback therapy, transcranial magnetic stimulation, and electroencephalographic biofeedback therapy), and psychotherapy patterns.

Unstructured data such as narrative notes (chief complaints) recording patients’ main symptoms and current mood states were processed by removing all digits (durations of specific symptoms) and punctuation. The text was split into words with the jieba package in R (Version 3.6.1 for Windows). By adding stop words and medical dictionaries, meaningless words were deleted and medical nouns were prevented from being separated. Term frequency scores were used to identify the most informative words in patients’ chief complaints, and these words served as features of patients’ main symptoms and current mood states (mood-down, bad sleep, loss of interest, flustered, worry, tension, upset, headache, dizziness, physical discomfort, fatigue, suicide ideation, self-harm, hallucination, less activity, chest tightness, afraid, irritability, fidget, slow response, and recurrence or worsening of symptoms). The value of each symptom feature depended on whether the patient’s chief complaint included the corresponding keyword: 1 if the specific word was present and 0 otherwise (Supplementary Information 1.3).
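The binary symptom-feature construction can be sketched as below. This is an illustrative Python version (the study used R’s jieba for Chinese word segmentation); the complaint text and the shortened keyword list here are hypothetical stand-ins for the full feature set listed above.

```python
import re

# Hypothetical subset of the symptom keyword list described in the text.
SYMPTOM_KEYWORDS = ["mood-down", "bad sleep", "loss of interest", "suicide ideation"]

def symptom_features(chief_complaint: str) -> dict:
    """Strip digits (symptom durations) and punctuation, then flag each
    keyword: 1 if present in the complaint, 0 otherwise."""
    text = re.sub(r"\d+", "", chief_complaint.lower())  # remove durations
    text = re.sub(r"[^\w\s-]", "", text)                # remove punctuation
    return {kw: int(kw in text) for kw in SYMPTOM_KEYWORDS}

feats = symptom_features("Mood-down and bad sleep for 3 months.")
```

In the actual pipeline, segmentation would precede matching and the full dictionary of stop words and medical nouns would be applied first.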

Overall, 232 features were included in the original data pool (Supplementary Information 1.4). The details of the included features are summarized in Table S1, and the drugs, physiotherapies, and psychotherapies used in antidepressant therapy are summarized in Table S2. Furthermore, we analyzed the quality of our raw data; the results are presented in Tables S3 and S4. The data were checked for missing values, which were filled in with data from the EMR, and outliers in each record were detected and removed before the start of the analysis.

ML models

Data analyses and ML algorithms were implemented in R (Version 3.6.1 for Windows) using RStudio. The process of model development and evaluation was accomplished in three phases.

Model building

To predict 30-, 60-, 90-, 180-, and 365-day psychiatric readmissions for patients with MDD, we trained four different ML models: a multivariate binary logistic regression (LR) model, a radial basis function (RBF) kernel two-class support vector machine (SVM) model, a random forest (RF) model, and an extreme gradient boosting (XGBoost, XGB) model. LR is a statistical approach that presumes a linear relation between the predictor variables and the log-odds of the outcome [35]. The expected value of the outcome is fit to the predictors via a sigmoid regression function that maps any real number to a value between 0 and 1. LR has been the dominant approach to binary outcome prediction in clinical medicine for decades. SVM generates hyperplanes to separate the data into different regions using a radial kernel function, which enlarges the feature space so that classes that are not linearly separable in the original space can be separated. RF is an ensemble algorithm that combines numerous decision trees, each fit to an independent bootstrap sample, to optimize predictive accuracy. Gradient boosting is a well-known ML technique for classification and regression that produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. The XGB algorithm builds the model in a stepwise fashion, like other boosting methods, but is more general than many approaches because it allows the optimization of an arbitrary differentiable loss function. These approaches were selected because they have strong theoretical foundations and vary in complexity.
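The four model families can be sketched as follows. This is an illustrative Python/scikit-learn version (the study used R); synthetic data stands in for the EMR features, and scikit-learn’s gradient boosting classifier stands in for XGBoost.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.metrics import roc_auc_score

# Synthetic stand-in for the readmission cohort features and labels.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "LR":  LogisticRegression(max_iter=1000),           # linear log-odds model
    "SVM": SVC(kernel="rbf", probability=True),         # RBF-kernel SVM
    "RF":  RandomForestClassifier(n_estimators=200, random_state=0),
    "GBM": GradientBoostingClassifier(random_state=0),  # stand-in for XGBoost
}

aucs = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)                               # fit on the training split
    aucs[name] = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
```

Each fitted model yields predicted readmission probabilities on the held-out split, from which the AUC is computed.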

To assess the prediction performance of the ML models on unseen data and to ensure an unbiased approximation of each model’s generalizability to new patients, each prediction-modeling cohort was split into multiple training and testing datasets according to the period in which the data were obtained (Supplementary Information 1.5 and Table S5). The initial train-test splits reserved 1950 patients (15.03% of the 30-day cohort), 1759 patients (13.76% of the 60-day cohort), 1596 patients (12.64% of the 90-day cohort), 956 patients (7.98% of the 180-day cohort), and 2023 patients (18.35% of the 365-day cohort), respectively, for testing of our final models. All performance metrics were derived from the testing datasets.

For each train-test split of each prediction-modeling cohort (30-, 60-, 90-, 180-, and 365-day), we conducted feature selection, hyperparameter optimization, and the fitting of the SVM, LR, RF, and XGB models in an inner cross-validation loop. To begin, for each prediction-modeling cohort, the two groups (readmission and non-readmission) were compared on all included features using only the training dataset. This feature-filtering phase has the dual benefit of variable selection and partial mitigation of overfitting. A two-sided t test for continuous variables and a chi-squared test for categorical variables were used to compute each feature’s p value; features with p < 0.05 were considered significant and entered into the ML algorithms. We then ran the four algorithms on the training set to build classifiers in the inner cross-validation loop, conducting hyperparameter optimization and fitting for each algorithm. Specifically, the SVM model was derived by ten-fold cross-validation of hyperparameters (gamma and cost) using the tune.svm function. The link function of the LR model was the logit, for the Bernoulli-distributed binary dependent variable. The parameters of the RF model (number of trees and number of variables tried at each split) were selected by grid search using ten-fold cross-validation on the training set. For XGB, the optimal hyperparameters (number of optimization rounds, maximum tree depth, minimum weight in each child node, minimum loss reduction, regularization penalty, subsampling rate, etc.) were selected by grid search using the expand.grid function. The hyperparameters that maximized the area under the receiver operating characteristic (ROC) curve (AUC) were selected. All steps were completed in an inner cross-validation loop with five repeats of ten-fold cross-validation.
See Supplementary Information 1.6 for further details on the selected hyperparameter values for each algorithm.
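The per-cohort filter-then-tune pipeline can be sketched as below. This is an illustrative Python version (the study used tune.svm and expand.grid in R): univariate t-test filtering on the training data at p < 0.05, followed by a ten-fold grid search that selects hyperparameters by ROC AUC. The data and the RF grid are synthetic stand-ins, not the study’s actual grids.

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for one cohort's training data.
X, y = make_classification(n_samples=400, n_features=30, n_informative=5,
                           random_state=1)

# Univariate filtering: keep features whose group means differ at p < 0.05.
pvals = np.array([ttest_ind(X[y == 1, j], X[y == 0, j]).pvalue
                  for j in range(X.shape[1])])
keep = pvals < 0.05
X_sel = X[:, keep]

# Hyperparameter grid search with ten-fold CV, selecting by ROC AUC.
grid = GridSearchCV(
    RandomForestClassifier(random_state=1),
    param_grid={"n_estimators": [100, 300], "max_features": ["sqrt", None]},
    scoring="roc_auc",
    cv=10,
)
grid.fit(X_sel, y)
```

For categorical features the t test would be replaced by a chi-squared test, and the same pattern would be repeated for the SVM, LR, and XGB grids.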

The learned parameters were then used to construct a model on the entire training set and to make predictions on the testing set, which was never used for model selection or parameter tuning. We remedied the imbalanced nature of the dataset, which had more non-readmissions than readmissions, by randomly sub-sampling the negative class (non-readmissions) in the training set using the ovun.sample function to produce a balanced ratio between the positive class (readmissions) and the negative class. Testing set distributions were never modified, to reflect the reality of class imbalance during prediction, and reported performance reflects those raw distributions. Topic features extracted to characterize patients’ symptoms were derived from the complete dataset, including both the training and testing subsets. Because the derived topics do not incorporate any knowledge of future readmission, the inclusion of the testing set does not lead to overfitting or inflated estimates of discrimination.
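The undersampling step can be sketched as below. This is a minimal Python stand-in for R’s ovun.sample, applied to the training set only; the tiny label vector is illustrative.

```python
import random

def undersample(X, y, seed=0):
    """Randomly subsample the majority class so both classes have equal size."""
    rng = random.Random(seed)
    pos = [i for i, lab in enumerate(y) if lab == 1]
    neg = [i for i, lab in enumerate(y) if lab == 0]
    major, minor = (neg, pos) if len(neg) > len(pos) else (pos, neg)
    idx = minor + rng.sample(major, len(minor))  # keep all minority, sample majority
    rng.shuffle(idx)
    return [X[i] for i in idx], [y[i] for i in idx]

X = [[v] for v in range(10)]
y = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]  # 2 readmissions, 8 non-readmissions
X_bal, y_bal = undersample(X, y)
```

The balanced training set is then used for fitting, while the untouched test set preserves the true class ratio.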

To avoid reliance on favorable train-test splits, this process was repeated four times for each prediction-modeling cohort, so that for the testing of our final models we obtained performance estimates from four different test sets. The statistical significance of differences between the AUCs obtained from the different test sets for each algorithm was assessed by DeLong’s test. The code is available upon request.

Explainable ML predictions

An important step forward for ML in actual medical decision support is the ability to provide simple explanations of predictions from arbitrarily complex models, helping eliminate the typical trade-off between accuracy and interpretability. In our work, we applied model-agnostic methods (SHapley Additive exPlanations (SHAP) [36] and Break Down [37]) to the prediction models to obtain explanations of the features that drive patient-specific predictions, mitigating the issue of black-box predictions [38]. SHAP is a model-agnostic representation of feature importance in which the impact of each feature on a particular prediction is represented using Shapley values inspired by cooperative game theory [36]. A Shapley value states, given the current set of feature values, how much a single feature, in the context of its interactions with the other features, contributes to the difference between the actual prediction and the mean prediction [39]. The main goal of Break Down is to decompose model predictions into parts that can be attributed to specific variables [37]; the contributions of variables are calculated sequentially and presented in the form of waterfall plots [38]. The Shapley value is the average of the Break Down contributions over all possible orderings of the variables [38]. For comparison, we also analyzed how specific features contribute in the different prediction scenarios, i.e., 30-, 60-, 90-, 180-, and 365-day psychiatric readmission.
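The stated relationship between the two methods can be verified on a toy example: the Shapley value of a feature equals its Break Down contribution averaged over every ordering of the features. The three-feature model below (f = a + 2b + a·c) is hypothetical; in practice the SHAP and Break Down packages approximate this computation efficiently rather than enumerating orderings.

```python
from itertools import permutations
from math import factorial

def value(subset, x, baseline):
    """Toy model output when only features in `subset` take their actual
    values and the rest are held at baseline: f = a + 2b + a*c."""
    f = dict(baseline)
    f.update({k: x[k] for k in subset})
    return f["a"] + 2 * f["b"] + f["a"] * f["c"]

def shapley(x, baseline):
    names = list(x)
    contrib = {k: 0.0 for k in names}
    for order in permutations(names):   # every possible Break Down ordering
        added = []
        for k in order:                 # sequential (Break Down) contributions
            before = value(added, x, baseline)
            added.append(k)
            contrib[k] += value(added, x, baseline) - before
    return {k: v / factorial(len(names)) for k, v in contrib.items()}

phi = shapley({"a": 1, "b": 1, "c": 1}, {"a": 0, "b": 0, "c": 0})
```

Here the additive feature b receives its full effect (2.0), while the interaction a·c is split evenly between a and c, and the attributions sum exactly to the prediction minus the baseline.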

Model evaluation

The discrimination of the models was assessed using the AUC, sensitivity, specificity, and the positive and negative predictive values (PPV and NPV). A high sensitivity (also called the true positive rate, recall, or probability of detection) may be preferable, so that the MDD patients at greatest risk of psychiatric readmission can be effectively identified. Specificity (also called the true negative rate), in contrast, measures the percentage of patients who are correctly identified as not readmitted with any mental disorder. The AUC, which combines sensitivity and specificity, is a measure for comparing different classification models.
