Background There is a need for brief instruments to ascertain the diagnosis of major depressive disorder. In this study, we present the reliability, construct validity and accuracy of the PHQ-9 and PHQ-2 to detect major depressive disorder in primary care.Methods Cross-sectional analyses within a large prospective cohort study (PREDICT-NL). Data was collected in seven large general practices in the centre of the Netherlands. 1338 subjects were recruited in the general practice waiting room, irrespective of their presenting complaint. The diagnostic accuracy (the area under the ROC curve and sensitivities and specificities for various thresholds) was calculated against a diagnosis of major depressive disorder determined with the Composite International Diagnostic Interview (CIDI).Results The PHQ-9 showed a high degree of internal consistency (ICC=0.88) and test-retest reliability (correlation=0.94). With respect to construct validity, it showed a clear association with functional status measurements, sick days and number of consultations. The discriminative ability was good for the PHQ-9 (area under the ROC curve = 0.87, 95% CI: 0.84-0.90) and the PHQ-2 (ROC area = 0.83, 95% CI 0.80-0.87). Sensitivities at the recommended thresholds were 0.49 for the PHQ-9 at a score of 10 and 0.28 for a categorical algorithm. Adjustment of the threshold and the algorithm improved sensitivities to 0.82 and 0.84 respectively but the specificity decreased from 0.95 to 0.82 (threshold) and from 0.98 to 0.81 (algorithm). Similar results were found for the PHQ-2: the recommended threshold of 3 had a sensitivity of 0.42 and lowering the threshold resulted in an improved sensitivity of 0.81.Conclusion The PHQ-9 and the PHQ-2 are useful instruments to detect major depressive disorder in primary care, provided a high score is followed by an additional diagnostic work-up. However, often recommended thresholds for the PHQ-9 and the PHQ-2 resulted in many undetected major depressive disorders.