Count data are non-negative integer values that give the frequency of occurrence of a certain event or behavior within a given timespan. Count data are usually not normally distributed. They are often skewed and therefore require special analysis and imputation techniques. Yet most currently available multiple imputation packages offer only limited support for count data. The countimp package provides easy-to-use multiple imputation (MI) procedures for incomplete count data, based on either a Bayesian regression approach or a bootstrap regression approach within a chained equations MI framework. Our software extends the functionality of the popular and powerful mice package in R (van Buuren & Groothuis-Oudshoorn, 2011). The current version of countimp supports the following:

- imputation of ordinary count data under the Poisson model;
- imputation of incomplete overdispersed count data under either the quasi-Poisson or the negative binomial model;
- imputation of zero-inflated ordinary or overdispersed count data based on a zero-inflated Poisson or negative binomial model, or on a hurdle model;
- imputation of multilevel count data based on generalized linear mixed-effects count models, with support for overdispersion and zero-inflation.

Additionally, we provide a predictive mean matching (PMM) variant based on a two-level model. It may be used when count data are not too heavily skewed (for an evaluation of flat-file count data imputation by PMM, see, for example, Kleinke, 2017).
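The bootstrap regression approach can be illustrated for the simplest case, the Poisson model. The following is a minimal Python sketch using only NumPy. It is not countimp's implementation, since countimp is an R package built on mice. The function names `fit_poisson_glm` and `impute_poisson_boot` are hypothetical. The sketch assumes a fully observed design matrix `X` whose first column is an intercept, and missing counts in `y` coded as `NaN`. Each imputation step has two parts. First, a Poisson regression is refitted on a bootstrap resample of the observed cases, which propagates parameter uncertainty. Second, the missing counts are drawn from a Poisson distribution at the resulting predicted means.

```python
import numpy as np

def fit_poisson_glm(X, y, n_iter=50, tol=1e-10):
    """Fit a Poisson GLM with log link by Newton-Raphson (equivalent to IRLS)."""
    # Start from a least-squares fit on log(y + 0.5) for stable convergence.
    beta = np.linalg.lstsq(X, np.log(y + 0.5), rcond=None)[0]
    for _ in range(n_iter):
        mu = np.exp(X @ beta)
        grad = X.T @ (y - mu)             # score vector
        info = X.T @ (X * mu[:, None])    # Fisher information X' W X
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta

def impute_poisson_boot(X, y, rng):
    """One bootstrap-regression imputation draw for the missing counts in y.

    Hypothetical helper: X is an (n, p) fully observed design matrix with an
    intercept column, and y is an (n,) float array with np.nan marking missing counts.
    """
    obs = ~np.isnan(y)
    X_obs, y_obs = X[obs], y[obs]
    # 1. Resample the observed cases and refit, reflecting parameter uncertainty.
    idx = rng.integers(0, len(y_obs), size=len(y_obs))
    beta_star = fit_poisson_glm(X_obs[idx], y_obs[idx])
    # 2. Draw imputed counts from the Poisson model at the bootstrap estimates.
    y_imp = y.copy()
    y_imp[~obs] = rng.poisson(np.exp(X[~obs] @ beta_star))
    return y_imp

# Usage: simulate Poisson counts, delete about 30% completely at random,
# and create m = 5 imputed versions of the incomplete variable.
rng = np.random.default_rng(1)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = rng.poisson(np.exp(0.5 + 0.8 * x)).astype(float)
y[rng.random(n) < 0.3] = np.nan
imputations = [impute_poisson_boot(X, y, rng) for _ in range(5)]
```

In a chained equations framework such as mice, a step like this would be applied in turn to each incomplete variable, with `X` holding the current imputations of the other variables. A Bayesian regression variant would instead draw the coefficients from an approximate posterior, for example a normal distribution centered at the maximum likelihood estimate with covariance equal to the inverse Fisher information, rather than refitting on a bootstrap sample. Overdispersed, zero-inflated, hurdle, and multilevel models replace the Poisson fitting and drawing steps accordingly.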
Kleinke, K., & Reinecke, J. (2019). countimp version 2: A multiple imputation package for incomplete count data (Technical Report). Siegen, Germany: University of Siegen, Department of Education Studies and Psychology. Available from https://www.kkleinke.de/countimp/