Objective: To examine the dependability of alliance scores at the patient and therapist levels, to evaluate the potential causal direction of session-to-session changes in the alliance and depressive symptoms, and to investigate how aggregating alliance scores over progressively more sessions affects the size of the alliance–outcome relationship. Method: We used data from a study of psychotherapy for major depressive disorder (N = 45 patients; N = 9 therapists) in which the alliance was measured at every treatment session. These data were used to calculate generalizability coefficients and to predict change in depressive symptoms from alliance scores. Two replication samples were also used. Results: At the therapist level, a large number of patients per therapist (about 60) is needed to obtain a dependable therapist-level alliance score. At the patient level, generalizability coefficients showed that a single assessment of the alliance is only marginally acceptable; very good dependability (>.90) is achieved only by aggregating 4 or more alliance assessments. Session-to-session changes in the alliance predicted subsequent session-to-session changes in symptoms. Evidence for reverse causation emerged in later treatment sessions, suggesting that only aggregates of early-treatment alliance scores should be used to predict outcome. Session 3 alliance scores explained 4.7% of outcome variance, whereas the average of Sessions 3–9 explained 14.7%. Conclusion: Adequate assessment of the alliance, using multiple patients per therapist and at least 4 treatment sessions, is crucial for fully understanding the size of the alliance–outcome relationship. (PsycINFO Database Record (c) 2011 APA, all rights reserved)
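The claim that dependability rises as more sessions are averaged follows the standard generalizability-theory projection for a mean over n occasions. The sketch below assumes a simple one-facet crossed design (patients × sessions). In that design, the relative coefficient is σ²_p / (σ²_p + σ²_error / n), which is algebraically equivalent to the Spearman–Brown formula applied to the single-session coefficient. The function names and the single-session value of 0.70 used in the example are hypothetical illustrations, not values reported by the study. The same projection applies at the therapist level, with patients per therapist as the averaged facet.

```python
import math


def g_coefficient(var_object: float, var_error: float, n: int) -> float:
    """Relative G coefficient for the mean of n occasions (one-facet crossed design).

    var_object: variance component for the object of measurement (e.g., patients)
    var_error: residual (object x occasion + error) variance for a single occasion
    n: number of occasions averaged
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Averaging n occasions divides the error variance by n.
    return var_object / (var_object + var_error / n)


def spearman_brown(g1: float, n: int) -> float:
    """Project a single-occasion coefficient g1 to the mean of n occasions."""
    return n * g1 / (1 + (n - 1) * g1)


def min_occasions(g1: float, target: float) -> int:
    """Smallest n whose projected coefficient reaches target (0 < g1, target < 1)."""
    # Solving n*g1 / (1 + (n-1)*g1) >= target for n gives the bound below.
    n = target * (1 - g1) / (g1 * (1 - target))
    # Subtract a tiny tolerance so exact integer solutions are not rounded up.
    return max(1, math.ceil(n - 1e-12))


if __name__ == "__main__":
    g1 = 0.70  # hypothetical single-session coefficient, for illustration only
    for n in (1, 2, 3, 4):
        print(n, round(spearman_brown(g1, n), 3))
    print("sessions needed for > .90:", min_occasions(g1, 0.90))
```

With this assumed single-session value, three sessions fall short of .90 and four sessions just exceed it. This pattern matches the shape of the result described above, but the study's own variance components would be needed to reproduce its exact figures.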