Abstract
Principal components analysis (PCA) and partial least squares (PLS) have been used to construct socioeconomic status (SES) indices that serve as predictors of well-being in targeted programs. Generally, these indices are defined by the first component alone, that is, as a single linear combination of the original variables. Because socioeconomic data typically include non-metric variables, several extensions of PCA and PLS that handle such variables have been proposed for these applications. In this paper, we compare the predictive performance of SES indices constructed from more than one component. In addition, to include non-metric variables, we propose a variant of normal mean coding that takes into account the multivariate nature of the variables, which we call multivariate normal mean coding (MNMC). Using simulations and real data, we find that PLS using either MNMC or classical dummy encoding gives the best predictive results with a more parsimonious SES index, in both regression and classification problems.
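As a rough illustration of the classical baseline referred to above (dummy encoding of non-metric variables followed by PCA), the sketch below builds an SES index from the first component and, optionally, from the first k components. It is not the paper's procedure: the household variables and the function names `dummy_encode` and `pca_ses_index` are hypothetical, column standardization and dropping one reference level per categorical variable are assumptions, and neither PLS nor MNMC is shown.

```python
import numpy as np


def dummy_encode(column):
    """One-hot encode a categorical column, dropping the first (sorted) level as reference."""
    levels = sorted(set(column))
    return np.array([[1.0 if v == lvl else 0.0 for lvl in levels[1:]] for v in column])


def pca_ses_index(X, n_components=1):
    """Return the scores of the standardized matrix X on its first n_components principal axes."""
    # Assumption: every column is standardized to mean 0 and unit variance.
    Z = (X - X.mean(axis=0)) / X.std(axis=0)
    # The rows of vt (right singular vectors of Z) are the principal axes.
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ vt[:n_components].T


# Hypothetical household data: floor material (categorical), fridge ownership, rooms.
floor = ["dirt", "cement", "tile", "cement", "dirt", "tile", "cement", "dirt"]
fridge = np.array([0, 1, 1, 1, 0, 1, 0, 0], dtype=float)
rooms = np.array([1, 3, 4, 2, 1, 5, 2, 2], dtype=float)

X = np.column_stack([dummy_encode(floor), fridge, rooms])

# One-component index; the sign of a component is arbitrary, so orient it
# (an assumption) so that a higher index goes with more rooms.
ses = pca_ses_index(X, n_components=1)
if np.corrcoef(ses[:, 0], rooms)[0, 1] < 0:
    ses = -ses

# Multi-component alternative: the first two score columns, which could be
# fed jointly into a downstream regression or classification model.
scores2 = pca_ses_index(X, n_components=2)
print(ses.ravel())
```

Because the component scores are mutually uncorrelated, using more than one component adds information without introducing collinearity among the index columns; whether it improves prediction is the empirical question the paper addresses.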