When Can We Trust Regression Discontinuity Design Estimates from Close Elections? Evidence from Experimental Benchmarks