Decision, Vol 11(1), Jan 2024, 7-34; doi:10.1037/dec0000187
The wisdom of a crowd can be extracted by simply averaging judgments, but weighting judges by their past performance may improve accuracy. The reliability of any proposed weighting scheme depends on how precisely the features that determine the weights can be estimated, and in practice they can never be known perfectly. Therefore, we can never guarantee that any weighted average will be more accurate than the simple average. However, depending on the statistical properties of the judgments (i.e., their estimated biases, variances, and correlations) and the sample size (i.e., the number of judgments from each individual), we may be reasonably confident that a weighted average will outperform the simple average. We develop a general algorithm that tests whether sufficiently many judgments have been observed for practitioners to reject the simple average and instead trust a weighted average as a reliably more accurate aggregation method. In simulations, our test provides better guidance than cross-validation. Using real data, we demonstrate how many judgments may be required before commonly used weighted averages can be trusted. Our algorithm can also be used for power analysis when planning data collection, and as a decision tool for optimizing crowd wisdom given existing data.
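To make the contrast concrete, here is a minimal sketch of the two aggregation methods the abstract compares: the simple (unweighted) average and a performance-weighted average. The inverse-error weighting below is a common illustrative heuristic, not the paper's algorithm; the function names and the weighting rule are assumptions for exposition only.

```python
def simple_average(judgments):
    """Unweighted mean of a crowd's judgments."""
    return sum(judgments) / len(judgments)

def weighted_average(judgments, past_errors):
    """Weight each judge by the inverse of their mean past absolute error.

    NOTE: This inverse-error rule is an illustrative heuristic, not the
    paper's method. The paper's point is that past_errors are themselves
    estimated from finite data, so the resulting weights are noisy; their
    test asks whether enough judgments exist to trust them.
    """
    # Small epsilon guards against division by zero for a perfect judge.
    weights = [1.0 / (e + 1e-9) for e in past_errors]
    total = sum(weights)
    return sum(w * j for w, j in zip(weights, judgments)) / total

# Two judges estimate the same quantity; the first has been more
# accurate historically, so the weighted average leans toward them.
crowd = [10.0, 20.0]
print(simple_average(crowd))                     # midpoint of the crowd
print(weighted_average(crowd, [0.5, 2.0]))       # pulled toward judge 1
```

With equal past errors the weighted average reduces to the simple average; the gap between the two grows as the judges' track records diverge, which is exactly when trusting the (estimated) weights matters most.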