Background:
For the analysis of length-of-stay (LOS) data, which is characteristically right-skewed, a number of statistical estimators have been proposed as alternatives to the traditional ordinary least squares (OLS) regression with a log-transformed dependent variable.
Methods:
Using a cohort of patients identified in the Australian and New Zealand Intensive Care Society Adult Patient Database, 2008-2009, 12 different methods were used for estimation of intensive care (ICU) length of stay. These encompassed risk-adjusted regression analysis of firstly: log LOS using OLS, linear mixed model [LMM], treatment effects, skew-normal and skew-t models; and secondly: unmodified (raw) LOS via OLS, generalised linear models [GLMs] with log-link and 4 different distributions [Poisson, gamma, negative binomial and inverse-Gaussian], extended estimating equations [EEE] and a finite mixture model including a gamma distribution. A fixed covariate list and ICU-site clustering with robust variance were utilised for model fitting with split-sample determination (80%) and validation (20%) data sets, and model simulation was undertaken to establish over-fitting (Copas test). Indices of model specification using Bayesian information criterion [BIC: lower values preferred] and residual analysis as well as predictive performance (R², concordance correlation coefficient (CCC), mean absolute error [MAE]) were established for each estimator.
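The three predictive-performance indices named above can be sketched as follows. This is a minimal illustration, not the study's code: the function names are hypothetical, R² is assumed to be the coefficient of determination (1 - SS_res/SS_tot), CCC is assumed to be Lin's concordance correlation coefficient computed with population (1/n) moments, and the toy values are invented for demonstration only.

```python
import numpy as np


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    ss_res = np.sum((o - p) ** 2)
    ss_tot = np.sum((o - o.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def concordance_correlation(observed, predicted):
    """Lin's concordance correlation coefficient with population (1/n) moments."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    covariance = np.mean((o - o.mean()) * (p - p.mean()))
    # np.var uses ddof=0 by default, matching the 1/n covariance above.
    return 2.0 * covariance / (o.var() + p.var() + (o.mean() - p.mean()) ** 2)


def mean_absolute_error(observed, predicted):
    """Mean of the absolute differences between observed and predicted values."""
    o = np.asarray(observed, dtype=float)
    p = np.asarray(predicted, dtype=float)
    return np.mean(np.abs(o - p))


# Toy LOS values in days (illustrative only, not taken from the study).
observed_los = [1.0, 2.0, 3.0, 4.0]
predicted_los = [2.0, 3.0, 4.0, 5.0]
print(r_squared(observed_los, predicted_los))
print(concordance_correlation(observed_los, predicted_los))
print(mean_absolute_error(observed_los, predicted_los))
```

Unlike the Pearson correlation, the CCC penalises systematic shifts between predicted and observed values, so a uniformly over-predicting estimator scores below 1 even when the two series are perfectly correlated.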
Results:
The data-set consisted of 111663 patients from 131 ICUs; mean (SD) age was 60.6 (18.8) years, 43.0% were female, 40.7% were mechanically ventilated and ICU mortality was 7.8%. ICU length-of-stay was 3.4 (5.1) (median 1.8, range 0.17-60) days and demonstrated marked kurtosis and right skew (29.4 and 4.4 respectively). BIC showed considerable spread, from a maximum of 509801 (OLS-raw scale) to a minimum of 210286 (LMM). R² ranged from 0.22 (LMM) to 0.17 and the CCC from 0.334 (LMM) to 0.149, with MAE 2.2-2.4. Superior residual behaviour was established for the log-scale estimators. There was a general tendency for over-prediction (negative residuals) and for over-fitting, the exception being the GLM negative binomial estimator. The mean-variance function was best approximated by a quadratic function, consistent with log-scale estimation; the link function was estimated (EEE) as 0.152 (0.019, 0.285), consistent with a fractional-root function.
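To illustrate what a link parameter of 0.152 implies, the sketch below evaluates a Box-Cox-type power link, g(mu) = (mu^lambda - 1)/lambda, which tends to log(mu) as lambda approaches 0. This parameterisation is an assumption (it is the form commonly used with EEE; the abstract does not state it), and the function names are hypothetical.

```python
import math


def power_link(mu, lam):
    """Box-Cox-type link (assumed form): (mu**lam - 1) / lam, or log(mu) at lam = 0."""
    if lam == 0:
        return math.log(mu)
    return (mu ** lam - 1.0) / lam


def inverse_power_link(eta, lam):
    """Map a linear predictor eta back to the mean scale (requires lam * eta + 1 > 0)."""
    if lam == 0:
        return math.exp(eta)
    return (lam * eta + 1.0) ** (1.0 / lam)


estimated_lambda = 0.152  # point estimate reported in the Results
mean_los = 3.4            # mean ICU length of stay in days, from the Results

eta = power_link(mean_los, estimated_lambda)
print(eta, math.log(mean_los))                    # power link versus log link at the mean
print(inverse_power_link(eta, estimated_lambda))  # round trip back to the mean scale
```

Under this assumed form, a lambda between 0 and 1 describes a link lying between the log link (lambda = 0) and the identity link (lambda = 1), which is one way to read "fractional-root function".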
Conclusions:
For ICU length of stay, log-scale estimation, in particular the LMM, appeared to be the most consistently performing estimator(s). Neither the GLM variants nor the skew-regression estimators dominated.