Evaluation Review, Ahead of Print.
We consider estimating the effect of a treatment on an outcome measured on subjects both before and after treatment assignment in observational studies. A vast literature compares two competing approaches: modelling the post-test score conditionally on the pre-test score, and modelling the difference between the two scores, namely the gain score. Our contribution is to analyze the merits and drawbacks of the two approaches in a multilevel setting. This setting is relevant in many fields, such as education, where students are nested within schools. The multilevel structure raises specific issues related to contextual effects and to the distinction between individual-level and cluster-level treatments. We compare the two approaches through a simulation study. For individual-level treatments, our findings align with the existing literature. For cluster-level treatments, however, the picture is more complex, because the cluster mean of the pre-test score plays a key role. Its reliability depends crucially on the cluster size, so the resulting estimators can be unsatisfactory when clusters are small.
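The two specifications can be illustrated with a small simulation. The Python sketch below is not the authors' simulation design: the data-generating process, the parameter values, and the function names (`simulate`, `ols_coef`, `estimate_effects`) are hypothetical. Treatment is assigned at the cluster level and depends on a latent cluster mean, so it is confounded with the pre-test. The covariate-adjusted model regresses the post-test on treatment, the individual pre-test, and the observed cluster mean of the pre-test; the gain-score model regresses the post-minus-pre difference on treatment. Because the observed cluster mean is only a noisy proxy for the latent one, its reliability, and with it the adjusted estimate, depends on the cluster size.

```python
import numpy as np


def simulate(n_clusters, cluster_size, tau=1.0, beta_within=0.8,
             beta_between=0.5, sigma_u=1.0, sigma_e=1.0, seed=0):
    """Simulate pre/post scores for subjects nested in clusters.

    Hypothetical data-generating process (an assumption, not the paper's design):
    the cluster-level treatment depends on the latent cluster mean u_j.
    """
    rng = np.random.default_rng(seed)
    u = rng.normal(0.0, sigma_u, n_clusters)                        # latent cluster mean
    z = (u + rng.normal(0.0, 1.0, n_clusters) > 0).astype(float)  # cluster-level treatment
    cluster = np.repeat(np.arange(n_clusters), cluster_size)
    e = rng.normal(0.0, sigma_e, cluster.size)
    pre = u[cluster] + e                                            # individual pre-test
    post = (tau * z[cluster] + beta_within * e + beta_between * u[cluster]
            + rng.normal(0.0, 1.0, cluster.size))                   # post-test
    return cluster, z[cluster], pre, post


def ols_coef(columns, y):
    """Return OLS coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones_like(y)] + list(columns))
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


def estimate_effects(cluster, z, pre, post):
    """Treatment-effect estimates from the two competing specifications."""
    # Observed cluster mean of the pre-test: a noisy proxy for u_j whose
    # reliability grows with the cluster size.
    pre_mean = (np.bincount(cluster, weights=pre) / np.bincount(cluster))[cluster]
    # (a) Covariate adjustment: post ~ treatment + pre + cluster mean of pre.
    adjusted = ols_coef([z, pre, pre_mean], post)[1]
    # (b) Gain score: (post - pre) ~ treatment.
    gain = ols_coef([z], post - pre)[1]
    return adjusted, gain


for size in (2, 5, 30):
    a, g = estimate_effects(*simulate(n_clusters=2000, cluster_size=size))
    print(f"cluster size {size:>3}: covariate-adjusted {a:.3f}, gain score {g:.3f}")
```

Because the gain score implicitly fixes the pre-test coefficient at one, it is biased in this particular process whenever `beta_between` differs from 1; setting `beta_between=1.0` removes that source of bias. The sketch computes point estimates only; valid standard errors would also need to account for clustering.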