The SF-6Dv2, the latest version of the SF-6D, was developed recently, and its measurement properties have yet to be evaluated and compared with those of the EQ-5D-5L. This study aimed to assess and compare the measurement properties of the SF-6Dv2 and the EQ-5D-5L in a large-sample health survey of the Chinese population.
Data were obtained from the 2020 Health Service Survey in Tianjin, China. Respondents were randomly selected and invited to complete both the EQ-5D-5L and the SF-6Dv2 through face-to-face interviews or self-administration. Health utility values were calculated using the Chinese value sets for the two measures. Ceiling and floor effects were evaluated first. Convergent validity and discriminant validity were examined using Spearman’s rank correlation and effect sizes, respectively. Agreement between the two measures was assessed using intraclass correlation coefficients (ICC). Sensitivity was compared using relative efficiency and receiver operating characteristic (ROC) curve analysis.
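As a rough illustration of two of the statistics named above, the sketch below computes a ceiling effect (the share of respondents at the best possible utility) and a standardized effect size between two known groups. The abstract does not state which effect size formula was used, so Cohen's d with a pooled standard deviation is an assumption here; the function names and the utility values are hypothetical and are not study data.

```python
from statistics import mean, stdev


def ceiling_share(utilities, best=1.0, tol=1e-9):
    """Fraction of respondents whose utility equals the best possible value."""
    return sum(abs(u - best) <= tol for u in utilities) / len(utilities)


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample SD (assumed formula)."""
    na, nb = len(group_a), len(group_b)
    pooled_var = (
        (na - 1) * stdev(group_a) ** 2 + (nb - 1) * stdev(group_b) ** 2
    ) / (na + nb - 2)
    return (mean(group_a) - mean(group_b)) / pooled_var ** 0.5


# Made-up utilities for illustration only (not taken from the survey).
no_chronic_condition = [1.0, 1.0, 0.95, 0.90, 1.0]
chronic_condition = [0.80, 0.70, 0.85, 0.60, 0.75]

print(ceiling_share(no_chronic_condition))
print(cohens_d(no_chronic_condition, chronic_condition))
```

A larger effect size between groups that are expected to differ in health would be read as stronger discriminant validity for that measure.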
Among the 19,177 respondents included in this study (49.3% male; mean age 55.2 years, range 18–102 years), the mean utility was 0.939 (0.168) for the EQ-5D-5L and 0.872 (0.184) for the SF-6Dv2. A higher ceiling effect was observed for the EQ-5D-5L than for the SF-6Dv2 (72.8% vs. 36.1%). Spearman’s rank correlation coefficients (range: 0.30–0.69) indicated acceptable convergent validity between the dimensions of the EQ-5D-5L and the SF-6Dv2. The SF-6Dv2 showed slightly better discriminative capacity than the EQ-5D-5L (effect size [ES]: 0.126–2.675 vs. 0.061–2.256). The ICC between the EQ-5D-5L and SF-6Dv2 utility values in the total sample was 0.780 (p < 0.05). The SF-6Dv2 was 29.0–179.2% more efficient than the EQ-5D-5L at distinguishing between respondents with different external health indicators, whereas the EQ-5D-5L was 8.2% more efficient than the SF-6Dv2 at detecting differences in self-reported health status.
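The efficiency percentages above can be read through a relative efficiency (RE) ratio. The abstract does not define RE, so the sketch below assumes a definition often used in this literature: the one-way ANOVA F statistic of one measure divided by that of the reference measure, computed over the same known groups. The function names and the toy numbers are hypothetical and are not study data.

```python
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F statistic for a list of groups of utility values."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = mean(v for g in groups for v in g)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def relative_efficiency(groups_measure, groups_reference):
    """Ratio of F statistics (assumed RE definition); values above 1 favor the first measure."""
    return f_statistic(groups_measure) / f_statistic(groups_reference)


# Toy scores for the same two known groups under two hypothetical measures.
measure_a = [[1, 2, 3], [3, 4, 5]]
measure_b = [[1, 2, 3], [2, 3, 4]]

print(relative_efficiency(measure_a, measure_b))
```

Under this assumed definition, an RE of 1.29 would correspond to 29.0% higher efficiency than the reference measure, and an RE below 1 would favor the reference.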
Both the SF-6Dv2 and the EQ-5D-5L were shown to be comparably valid and sensitive when used in Chinese population health surveys. However, the two measures may not be interchangeable, given the moderate ICC and the systematic difference between their utility values. Further research is warranted to compare the test–retest reliability and responsiveness of the two measures.