How to and how not to impute incomplete count data | Dr. Kristian Kleinke

How to and how not to impute incomplete count data

Abstract

Missing data pose a threat to the validity of statistical inferences when they are numerous, not missing completely at random, and handled in an inadequate way. Multiple imputation (MI) is a state-of-the-art method for handling missing data and produces unbiased inferences when its (distributional) assumptions are at least approximately met. Count data are non-negative integers and are often skewed. Most MI software either does not support count models or supports only basic ones. Van Buuren (2012) therefore recommends the following strategies for imputing count data: predictive mean matching, ordered categorical regression, (zero-inflated) Poisson regression, and (zero-inflated) negative binomial regression. In the present paper, we evaluate these recommendations by means of Monte Carlo simulation. Based on our findings, we discourage the use of proxy strategies with ill-fitting (distributional) assumptions.
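
To make one of the listed strategies concrete, the following is a minimal sketch of predictive mean matching for a single incomplete count variable. It is not the implementation evaluated in the paper: the function name `pmm_impute`, the simulated data, and the choice of `k = 5` donors are assumptions made for illustration, and the sketch is deliberately simplified (a proper MI procedure would also draw the regression parameters from their posterior distribution and repeat the imputation several times).

```python
import numpy as np

def pmm_impute(x, y, k=5, rng=None):
    """Simplified predictive mean matching (hypothetical helper).

    x: complete predictor (1-D), y: count variable with np.nan for missing.
    Returns a copy of y in which each missing value is replaced by the
    observed value of a randomly chosen donor among the k nearest matches.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    missing = np.isnan(y)
    observed = ~missing

    # Linear regression of y on x, fitted to the observed cases only.
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X[observed], y[observed], rcond=None)
    predicted = X @ beta

    donor_pred = predicted[observed]
    donor_y = y[observed]
    imputed = y.copy()
    for i in np.flatnonzero(missing):
        # Donors are the observed cases whose predicted means are closest.
        nearest = np.argsort(np.abs(donor_pred - predicted[i]))[:k]
        imputed[i] = donor_y[rng.choice(nearest)]
    return imputed

# Simulated example: Poisson counts with values deleted completely at random.
rng = np.random.default_rng(42)
x = rng.normal(size=200)
y = rng.poisson(np.exp(0.5 + 0.8 * x)).astype(float)
y_incomplete = y.copy()
y_incomplete[rng.random(200) < 0.3] = np.nan

y_imputed = pmm_impute(x, y_incomplete, k=5, rng=rng)
```

Because every imputed value is borrowed from an observed case, the imputations are always non-negative integers, even though the matching model itself is an ordinary linear regression.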

Publication
Proceedings from the 9th European Congress of Methodology