Evaluating data quality across surveys conducted in different modes is complicated by the confounding of selection and measurement effects. This study evaluates the utility of propensity score matching (PSM), which has been proposed as a means of removing selection effects when comparing surveys conducted in different modes. Our results show large differences in estimates for the same variables between parallel face-to-face and online surveys, even after matching on standard demographic variables. Moreover, discrepancies in estimates persist after matching between surveys conducted in the same (online) mode, where differences in measurement properties can be ruled out a priori. Our findings suggest that PSM has substantial limitations as a method for separating measurement and selection differences across modes and should be used only with caution.
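To make the matching step concrete, the following is a minimal sketch of one common PSM variant: a logistic propensity model followed by nearest-neighbour matching with replacement inside a caliper. It is not the procedure used in this study. The data are simulated, and every name in it (`fit_logistic`, `match_nearest`, the single standardized age covariate, the 0.05 caliper) is a hypothetical choice made for illustration. The sketch assumes that selection operates only through the observed covariate, which is exactly the assumption the findings above call into question.

```python
import math
import random


def propensity(w, x):
    """Predicted probability of belonging to the face-to-face sample."""
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, t, lr=0.5, epochs=500):
    """Fit a logistic regression by batch gradient descent.

    Returns the weights with the intercept first.
    """
    n, k = len(X), len(X[0])
    w = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for xi, ti in zip(X, t):
            err = propensity(w, xi) - ti
            grad[0] += err
            for j in range(k):
                grad[j + 1] += err * xi[j]
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def match_nearest(ps_a, ps_b, caliper=0.05):
    """For each score in ps_a, return the index of the closest score in ps_b.

    Matching is with replacement; a unit with no partner inside the
    caliper gets None.
    """
    matches = []
    for pa in ps_a:
        j = min(range(len(ps_b)), key=lambda i: abs(ps_b[i] - pa))
        matches.append(j if abs(ps_b[j] - pa) <= caliper else None)
    return matches


def mean(values):
    return sum(values) / len(values)


# Simulated data (an assumption, not the study's data): the face-to-face
# sample is older on average, and the outcome depends on age only, so there
# is no measurement effect by construction.
random.seed(0)
n = 200
f2f_x = [[random.gauss(0.5, 1.0)] for _ in range(n)]   # standardized age
web_x = [[random.gauss(-0.5, 1.0)] for _ in range(n)]
f2f_y = [x[0] + random.gauss(0.0, 0.5) for x in f2f_x]
web_y = [x[0] + random.gauss(0.0, 0.5) for x in web_x]

# Step 1: model membership in the face-to-face sample from the covariate.
w = fit_logistic(f2f_x + web_x, [1] * n + [0] * n)
ps_f2f = [propensity(w, x) for x in f2f_x]
ps_web = [propensity(w, x) for x in web_x]

# Step 2: match each face-to-face respondent to an online respondent.
matches = match_nearest(ps_f2f, ps_web)
pairs = [(i, j) for i, j in enumerate(matches) if j is not None]

# Step 3: compare the outcome gap before and after matching.
raw_gap = mean(f2f_y) - mean(web_y)
print("unmatched gap:", round(raw_gap, 3))
if pairs:
    matched_gap = mean([f2f_y[i] - web_y[j] for i, j in pairs])
    print("matched gap:", round(matched_gap, 3), "pairs:", len(pairs))
```

In this toy setting the matched gap can shrink only because selection runs entirely through the matched covariate. When selection also depends on unobserved characteristics, matching on demographics leaves a residual gap that cannot be attributed to measurement, which is consistent with the same-mode comparison reported above.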