Parametric, semiparametric, nonparametric? How to impute missing data when convenient model assumptions are violated

14th Conference of the ‘Methods and Evaluation’ section (FGME) of the German Psychological Society (DGPs), Kiel, Germany

Authors

Kristian Kleinke

Jost Reinecke

Date

September 16, 2019

Abstract

One of the standard methods for analysing incomplete data is multiple imputation based on Rubin’s (1987) theory. Over the years, many different solutions have emerged for creating multiple imputations in various scenarios: methods based on fully parametric models, like Schafer’s (1997) norm approach; semi-parametric methods like predictive mean matching, as implemented, for example, in the R package mice; other semi-parametric techniques, e.g. based on generalized additive models for location, scale, and shape (e.g. R package ImputeRobust); and methods based on non-parametric quantile regression (R package Qtools). In this paper, we discuss advantages and disadvantages of the respective methods, outline the scenarios in which these techniques might be applied, and illustrate their use with highly skewed empirical panel data from the CrimoC project, focussing on the development of juvenile delinquency throughout adolescence.
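
As a rough illustration of the predictive mean matching idea named in the abstract, the following Python sketch imputes a skewed count variable from a single predictor. It is a simplified teaching example, not the algorithm used in mice or in the talk: the function names (`fit_line`, `pmm_impute`), the toy data, and the donor pool size `k=3` are all assumptions made here, and the sketch omits the random draw of regression parameters that full implementations typically add to reflect estimation uncertainty.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x (one predictor)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pmm_impute(x, y, k=3, rng=None):
    """Return a copy of y in which None entries are filled by donor values.

    Simplified predictive mean matching (hypothetical helper):
    1. fit a linear model on the observed cases,
    2. predict y for every case,
    3. for each missing case, take the k observed cases whose predictions
       are closest and copy the observed value of one randomly chosen donor.
    """
    rng = rng or random.Random()
    observed_idx = [i for i, yi in enumerate(y) if yi is not None]
    intercept, slope = fit_line(
        [x[i] for i in observed_idx], [y[i] for i in observed_idx]
    )
    predicted = [intercept + slope * xi for xi in x]

    completed = list(y)
    for i, yi in enumerate(y):
        if yi is None:
            donors = sorted(
                observed_idx, key=lambda j: abs(predicted[j] - predicted[i])
            )[:k]
            completed[i] = y[rng.choice(donors)]
    return completed


# Hypothetical, highly skewed toy data; None marks a missing value.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, None, 1, 0, None, 4, 12]

# Five completed data sets, one per seed, as in multiple imputation.
imputations = [pmm_impute(x, y, k=3, rng=random.Random(seed)) for seed in range(5)]
```

Because every imputed value is copied from an observed case, the imputations stay within the range and shape of the observed data, which is one reason the approach is attractive for skewed variables.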