Background:
When outcomes are binary, the c-statistic (equivalent to the area under the Receiver Operating Characteristic curve) is a standard measure of the predictive accuracy of a logistic regression model.
Methods:
An analytical expression was derived under the assumption that a continuous explanatory variable follows a normal distribution in those with and without the condition. We then conducted an extensive set of Monte Carlo simulations to examine whether the expressions derived under the assumption of binormality allowed for accurate prediction of the empirical c-statistic when the explanatory variable followed a normal distribution in the combined sample of those with and without the condition. We also examined the accuracy of the predicted c-statistic when the explanatory variable followed a gamma, log-normal or uniform distribution in the combined sample of those with and without the condition.
Results:
Under the assumption of binormality with equality of variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the product of the standard deviation of the normal components (reflecting more heterogeneity) and the log-odds ratio (reflecting larger effects). Under the assumption of binormality with unequal variances, the c-statistic follows a standard normal cumulative distribution function with dependence on the standardized difference of the explanatory variable in those with and without the condition. In our Monte Carlo simulations, we found that these expressions allowed for reasonably accurate prediction of the empirical c-statistic when the distribution of the explanatory variable was normal, gamma, log-normal, or uniform in the entire sample of those with and without the condition.
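The two binormal expressions can be illustrated with a minimal sketch. It assumes the standard closed forms c = Φ(σβ/√2) for equal variances and c = Φ(d/√2) for unequal variances, where β is the log-odds ratio, σ the common standard deviation, and d the standardized difference. The function names, sample sizes, and parameter values are hypothetical and chosen only for illustration; they are not the settings of the simulations described above.

```python
import math
import random
from bisect import bisect_left, bisect_right


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def predicted_c_equal_var(sigma, log_odds_ratio):
    """Equal variances: c = Phi(sigma * beta / sqrt(2))."""
    return norm_cdf(sigma * log_odds_ratio / math.sqrt(2.0))


def predicted_c_unequal_var(mu0, sd0, mu1, sd1):
    """Unequal variances: c = Phi(d / sqrt(2)), d = standardized difference."""
    d = (mu1 - mu0) / math.sqrt((sd0 ** 2 + sd1 ** 2) / 2.0)
    return norm_cdf(d / math.sqrt(2.0))


def empirical_c(x0, x1):
    """Proportion of (without, with) pairs in which the subject with the
    condition has the larger value; tied pairs count one half."""
    s0 = sorted(x0)
    total = 0.0
    for v in x1:
        lo = bisect_left(s0, v)   # number of x0 values strictly below v
        hi = bisect_right(s0, v)  # number of x0 values at or below v
        total += lo + 0.5 * (hi - lo)
    return total / (len(x0) * len(x1))


# Illustrative (assumed) parameters for a binormal setting with equal variances.
rng = random.Random(0)
mu0, mu1, sigma = 0.0, 1.0, 2.0
x0 = [rng.gauss(mu0, sigma) for _ in range(5000)]  # without the condition
x1 = [rng.gauss(mu1, sigma) for _ in range(5000)]  # with the condition

# With equal variances, the log-odds ratio per unit is (mu1 - mu0) / sigma^2.
beta = (mu1 - mu0) / sigma ** 2

c_pred = predicted_c_equal_var(sigma, beta)
c_emp = empirical_c(x0, x1)
print(f"predicted c = {c_pred:.3f}, empirical c = {c_emp:.3f}")
```

With equal standard deviations the two predicted values coincide, because the standardized difference then equals σβ.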
Conclusions:
The discriminative ability of a continuous explanatory variable cannot be judged by its odds ratio alone, but always needs to be considered in relation to the heterogeneity of the population.
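As a brief illustration of this point, the sketch below holds the odds ratio fixed and varies only the standard deviation of the explanatory variable. It assumes the equal-variance binormal form c = Φ(σ·ln(OR)/√2); the odds ratio of 2 per unit and the listed standard deviations are hypothetical values, not results from the study.

```python
import math


def predicted_c(sigma, odds_ratio):
    """Equal-variance binormal prediction: c = Phi(sigma * ln(OR) / sqrt(2))."""
    z = sigma * math.log(odds_ratio) / math.sqrt(2.0)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


odds_ratio = 2.0                 # assumed odds ratio per unit increase
sigmas = [0.25, 0.5, 1.0, 2.0]   # assumed standard deviations (heterogeneity)
c_values = [predicted_c(s, odds_ratio) for s in sigmas]

for s, c in zip(sigmas, c_values):
    print(f"sigma = {s:4.2f}  ->  predicted c = {c:.3f}")
```

The same odds ratio yields a predicted c-statistic close to 0.5 in a homogeneous population and a substantially higher one in a heterogeneous population.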