A predictive model of hospitalization and survival to COVID-19 in a retrospective population study – Scientific Reports

In this study, we analyzed different types of patients with COVID-19 in southeastern Spain (n = 86,867), unlike most COVID-19 studies in the literature that have developed predictive models, which deal with fewer than 5,000 patients17,18,19,20,21. In addition, we presented IPIP, a technique specifically designed for the treatment of imbalance problems, with which we developed machine learning models to predict a patient’s final condition and need for hospitalization. We trained and evaluated models with and without IPIP; according to our results, IPIP effectively resolves the imbalance in the data (Fig. 3).
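
To make the imbalance problem concrete, the sketch below shows one generic way to build class-balanced training subsets by undersampling the majority class. It is not the IPIP algorithm, whose details are not given in this section; the function name `balanced_subsets`, the label encoding (1 = minority class) and the toy labels are all assumptions made for illustration.

```python
import random


def balanced_subsets(labels, minority_label=1, seed=0):
    """Split sample indices into class-balanced subsets (hypothetical helper).

    Every subset reuses all minority-class indices and pairs them with a
    disjoint, equally sized random chunk of majority-class indices.
    """
    minority = [i for i, y in enumerate(labels) if y == minority_label]
    majority = [i for i, y in enumerate(labels) if y != minority_label]
    if not minority:
        raise ValueError("no minority-class samples found")

    rng = random.Random(seed)
    rng.shuffle(majority)

    k = len(minority)
    subsets = []
    # Walk through the shuffled majority class in chunks of size k;
    # a trailing chunk smaller than k is dropped so every subset stays balanced.
    for start in range(0, len(majority) - k + 1, k):
        subsets.append(minority + majority[start:start + k])
    return subsets


# Hypothetical toy labels: 3 minority cases (label 1) among 12 patients.
toy_labels = [1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
toy_subsets = balanced_subsets(toy_labels)
```

A separate model could then be trained on each subset and the predictions combined, for instance by majority vote; whether IPIP proceeds this way is not stated here.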

Regarding the characterization of the different types of prototypical COVID-19 patients, in this region the most common type of non-hospitalized COVID-19 patient is a 38-year-old woman with 2 chronic pathologies, while the prototypical hospitalized COVID-19 patient is a 62-year-old man with 5 chronic pathologies. We identified age, sex, and number of comorbidities as important for distinguishing outpatients from inpatients. Several studies have also found that hospitalized patients with COVID-19 are more likely to be older, male, and to have more comorbidities such as obesity, diabetes mellitus, and hypertension22,23. In addition, we found statistically significant differences in age (p < 8.0 × 10⁻³), number of comorbidities (p < 2.5 × 10⁻³) and gender (p < 2.2 × 10⁻¹⁶) between ICU patients and non-ICU inpatients, although these differences are smaller than those between outpatients and inpatients. ICU patients were approximately one year younger than non-ICU inpatients and had fewer comorbidities (Supplementary Fig. S1). We therefore hypothesized that physicians admit to the ICU those patients who are more likely to survive, because of the limited number of available ICU slots or the risk associated with male gender. We also found even greater differences in these features between survivors (discharged patients) and non-survivors (deceased patients) (Fig. 2). In our region, the prototype of a discharged patient is a 39-year-old woman with 2 chronic pathologies, while the prototype of a deceased patient is an 83-year-old man with 8 chronic pathologies. In agreement with several studies, our results show that older patients are more likely to die24,25,26 and that male patients are also more likely to die (OR = 2.41, 95% CI 2.11, 2.75) (Table 2, Supplementary Fig. S2)27,28. Regarding comorbidities, we found that asthma, osteoporosis, and osteoarthritis were not associated with death related to COVID-19.
A large number of studies29,30 indicate that patients with asthma are not at risk of severe disease from COVID-19. For the association of osteoarthritis with death related to COVID-19, we found a study that reported a similar OR = 0.84 (95% CI 0.65–1.08)31. Regarding osteoporosis, women are known to be at greater risk of developing it than men32. Certain specific types of osteoporosis complications appear to be associated with a greater risk of death from COVID-19; however, that study did not adjust risk for age and sex33. The rest of the comorbidities evaluated in our study were associated with an increased risk of mortality: diabetes mellitus, dementia, obesity, heart failure, COPD, arterial hypertension, ischemic cardiomyopathy, stroke, renal insufficiency, cirrhosis and arthritis. Several studies reported the same results for these comorbidities31,34,35. Regarding depression, consistent with our results, a meta-analysis36 found that depression is associated with more deaths related to COVID-19. All of the above results are important because they indicate that the characteristics and comorbidities of our population are not unique. Additionally, we believe that, given the similarities with other COVID-19 studies, our data could be useful for the development of predictive models.
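
The odds ratios quoted above, such as OR = 2.41 (95% CI 2.11, 2.75), can be read with the help of a small worked example. The sketch below computes an unadjusted odds ratio and a Wald 95% confidence interval from a 2×2 table. The counts are hypothetical and the function name `odds_ratio_wald_ci` is an assumption; this section does not say how the reported ORs were estimated, and they may well come from an adjusted model rather than from this formula.

```python
import math


def odds_ratio_wald_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval (hypothetical helper).

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, NOT taken from the study:
# 20 of 100 exposed patients died, 10 of 100 unexposed patients died.
example_or, example_lower, example_upper = odds_ratio_wald_ci(20, 80, 10, 90)
# example_or is (20 * 90) / (80 * 10) = 2.25
```

An interval that contains 1, as in the osteoarthritis result (95% CI 0.65–1.08), means the association is not statistically significant at the 5% level.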

Since the beginning of the pandemic, many studies have reported important clinical characteristics (predictors) of mortality in patients with COVID-19 through the development of ML-based models. The characteristics selected as inputs to these models included baseline data, clinical symptoms, associated comorbidities, and clinical indicators. However, these studies have two fundamental problems: a low number of patients relative to the number of parameters studied, which greatly limits the cohort, and strongly imbalanced data. To overcome these drawbacks, in this work we tested different ML models that use basic data readily available in the emergency care environment, based on clinical data from the EHR, to aid in early triage of patients. We obtained promising results in predicting the final patient status using the LR-IPIP model (0.92 balanced accuracy, ROC-AUC = 0.94). Regarding variable importance, the model identifies age (FI: 1.0), gender (FI: 0.366), osteoarthritis (FI: 0.194), renal insufficiency (FI: 0.144), obesity (FI: 0.123) and number of affected systems (FI: 0.117) as the most important predictor variables. The model also detected comorbidities such as dementia, diabetes mellitus and COPD. According to our model, these characteristics are associated with a greater risk of death related to COVID-19. Similarly, these comorbidities are associated with severe clinical manifestations observed in older adult patients37,38. Several studies31,34,35 have associated comorbidities such as cardiovascular disease, hypertension, and diabetes, which are highly prevalent in older adults, with poorer outcomes in COVID-19. ML-based studies that rely on comorbidities to predict death typically rank age as one of the most influential variables39,40. In fact, a meta-analysis of 611,583 patients shows an age-related increase in mortality.
The highest mortality occurred in patients > 80 years old, in whom it was 6 times higher than in younger patients41. Similarly, gender is an important feature in several ML-based studies39,42. Our model identified that males are more likely to die, possibly due to the distribution of our data (OR = 2.41, 95% CI: 2.11, 2.75), which is consistent with previous work27,28. Similar to our model, another ML-based study identified obesity as an important feature43. However, to our knowledge, this is the first time a model has reported osteoarthritis as an important feature. Beta values in the ensemble model showed that osteoarthritis is associated with a lower risk of death related to COVID-19 (Supplementary Table S2). This could be consistent with a study using UK Biobank data (OR = 0.84, 95% CI 0.65–1.08), although that result was not statistically significant31. Moreover, the distribution of osteoarthritis in our population is not statistically associated with the patient’s final condition. Note that, although we have no conclusive evidence, patients with osteoarthritis may be receiving treatment. One might think that medication could play a role in patients with osteoarthritis and COVID-19; however, Wong et al.44 reported that treatment with nonsteroidal anti-inflammatory drugs (NSAIDs) is not associated with a higher risk of death from COVID-19 in patients with osteoarthritis. Dementia, along with the number of affected systems and the number of comorbidities, also appears among the most relevant characteristics, consistent with other studies. For dementia, this agrees with results from a UK Biobank cohort45 of 12,863 community-dwelling individuals over 65 years of age (1,814 of them ≥ 80 years) who were tested for COVID-19, in which all-cause dementia was seen to increase the risk of death related to COVID-19.
In terms of accuracy, our LR-IPIP model obtained a balanced accuracy between 89 and 93% (ROC-AUC = 0.94) in predicting the final patient status. Compared with several other studies, this accuracy was similar or higher. For example, Gao et al.18 reported accuracy between 80.6 and 96.8%, which is a wide interval, and they used more complex clinical data at admission. Chatterjee et al.20 reported a balanced accuracy of 72%, possibly due to the low number of patients with COVID-19. Finally, another ML-based study21 was able to predict the risk of death at diagnosis with a ROC-AUC of 0.902.
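
For readers less familiar with the two metrics compared above, the following sketch computes balanced accuracy and ROC-AUC from scratch on a tiny, hypothetical set of labels and scores. The function names and the toy data are assumptions for illustration and are unrelated to the study's data or code. Balanced accuracy is the mean of sensitivity and specificity; ROC-AUC is computed here as the fraction of positive/negative pairs in which the positive case receives the higher score, with ties counted as one half.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def roc_auc(y_true, scores):
    """ROC-AUC as the probability that a positive outranks a negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present in y_true")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy example: 2 positives and 4 negatives.
toy_true = [1, 1, 0, 0, 0, 0]
toy_pred = [1, 0, 0, 0, 0, 1]                # hard class predictions
toy_scores = [0.9, 0.4, 0.1, 0.2, 0.3, 0.8]  # predicted probabilities

toy_bal_acc = balanced_accuracy(toy_true, toy_pred)  # (0.5 + 0.75) / 2 = 0.625
toy_auc = roc_auc(toy_true, toy_scores)              # 7 of 8 pairs ordered correctly = 0.875
```

On imbalanced data, a classifier that always predicts the majority class can reach a high plain accuracy while its balanced accuracy stays at 0.5, which is why balanced accuracy is the more informative figure in this setting.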

The LR-IPIP model was less effective at deciding whether to admit new patients (balanced accuracy = 0.72; ROC-AUC = 0.75). Regarding the importance of variables, the model again found age, sex, and number of comorbidities to be important. Obesity reappears among them, and renal insufficiency and depression are also prominent. Indeed, it has been shown that acute renal failure is common in patients hospitalized for COVID-19 and that only 30% survive with recovery of renal function at discharge46.

This research has specific shortcomings. First, due to the highly specific nature of this cohort and its inevitable novelty, we were unable to easily obtain an alternative cohort that could be used to replicate and validate our findings. This was partly overcome by the fact that the individuals came from different hospitals in our region that share electronic health record data management. As this was a retrospective study, the lack of some data was compensated for by including only those demographics and comorbidities that were correctly recorded. Second, a further problem stems from the strong data imbalance inherent in our research question. We tried to compensate for this by developing the IPIP method. Third, it should be noted that the data used to build the models were obtained in the absence of vaccination schedules and of new variants of the SARS-CoV-2 virus. However, the modeling methodology can easily be adapted to these new scenarios. Finally, a better understanding of the contribution of different symptoms or comorbidities to disease diagnosis could serve to introduce new features into future models, especially to improve the prediction of which patients do or do not require hospitalization.

In conclusion, this paper presents the analysis and development of ML-based predictive models with one of the largest COVID-19 datasets (n = 86,867), obtained from the health department of the Region of Murcia (Spain). In addition, the problem of class imbalance has been addressed by developing a new algorithm called IPIP that deals with this problem automatically. The resulting model makes it possible to predict the final state of the patient with high accuracy, and which patients will need to be hospitalized with reasonable accuracy, simply by using demographic data and comorbidities available to clinicians when diagnosing COVID-19. This LR-IPIP predictive model can be used, among other things, to prioritize the triage of patients with COVID-19 when health system resources are limited, as is often the case during the different waves of COVID-19. To facilitate this prioritization of resources, the corresponding web application and predictive models are readily available in open repositories (GitHub), which will facilitate their adaptation to new datasets from future epidemic waves of this disease or of other respiratory viruses in general.

