
information for practice

news, new scholarship & more from around the world


Adapting Nearest Neighbor for Multiple Imputation: Advantages, Challenges, and Drawbacks

Abstract

The U.S. Census Bureau has historically used nearest neighbor (NN) or random hot deck (RHD) imputation to handle missing data for many types of survey data. Using these methods removes the need to parametrically model values in imputation models. With strong auxiliary information, NN imputation is preferred because it produces more precise estimates than RHD. In addition, NN imputation is robust against a misspecified response mechanism if missingness depends on the auxiliary variable, in contrast to RHD, which ignores the auxiliary information. A compromise between these two methods is k-NN imputation, which identifies a set of the k closest neighbors (“donor pool”) and randomly selects a single donor from this set. Recently these methods have been used for multiple imputation (MI), enabling variance estimation via Rubin’s combining rules. The Approximate Bayesian Bootstrap (ABB) is a simple-to-implement algorithm that makes the RHD “proper” for MI. In principle, ABB should work to propagate uncertainty for NN MI; bootstrapping respondents means each nonrespondent’s one “nearest” donor will not be available for every imputation. However, we demonstrate through simulation that NN MI using ABB leads to variance underestimation. This underestimation is somewhat but not entirely attenuated with k-NN imputation. An alternative approach to variance estimation after MI, bootstrapped MI, eliminates the underestimation with NN imputation, but we show that it suffers from overestimation of variance with nonnegligible sampling fractions under both equal and unequal probability sampling designs. We propose a modification to bootstrapped MI to account for nonnegligible sampling fractions. We compare the performance of RHD and the various NN MI methods under a variety of sampling designs, sampling fractions, distribution shapes, and missingness mechanisms.
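To make the mechanics concrete, here is a minimal sketch of k-NN multiple imputation with the Approximate Bayesian Bootstrap, following the description in the abstract. The function name, variable names, and single-auxiliary-variable setup are illustrative assumptions, not the authors' implementation; the ABB step resamples the respondents with replacement before each imputation so that a nonrespondent's nearest available donor varies across the m completed datasets.

```python
import numpy as np

rng = np.random.default_rng(0)

def knn_abb_impute(x, y, m=5, k=5):
    """Hedged sketch of k-NN multiple imputation with ABB.

    x : auxiliary variable, fully observed (1-D array)
    y : study variable, with np.nan marking nonrespondents
    m : number of multiple imputations
    k : donor-pool size (k=1 reduces to plain NN imputation)
    Returns a list of m completed copies of y.
    """
    miss = np.isnan(y)
    resp_idx = np.flatnonzero(~miss)          # respondent positions
    imputations = []
    for _ in range(m):
        # ABB step: bootstrap the respondents with replacement, so the
        # effective donor set differs from one imputation to the next.
        boot = rng.choice(resp_idx, size=resp_idx.size, replace=True)
        y_imp = y.copy()
        for i in np.flatnonzero(miss):
            # "Nearest" is defined by distance on the auxiliary variable.
            d = np.abs(x[boot] - x[i])
            pool = boot[np.argsort(d)[:k]]     # k closest bootstrap donors
            y_imp[i] = y[rng.choice(pool)]     # random draw from the pool
        imputations.append(y_imp)
    return imputations
```

After imputation, the m completed-data estimates would be combined with Rubin's rules (the average of the point estimates, with total variance equal to the within-imputation variance plus an inflated between-imputation component), which is exactly the variance calculation the abstract reports as underestimated for NN MI under ABB.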

Read the full article ›

Posted in: Journal Article Abstracts on 01/17/2022 | Link to this post on IFP |

© 1993-2023 Dr. Gary Holden. All rights reserved.

gary.holden@nyu.edu
@Info4Practice