Multiple imputation of longitudinal data: A comparison of robust imputation methods regarding sample size requirements, with an application to corporal punishment data

PD Dr. Kristian Kleinke, Markus Fritsch, Mark Stemmler, Friedrich Lösel

October 2024

Abstract

Models for longitudinal data are often based on strong parametric assumptions such as independence, normality, and homoscedasticity of errors. Frequently, both the target and the explanatory variables are affected by missing data and require imputation. To produce unbiased statistical inferences, the imputation model should reflect the true data-generating process and the mechanism creating the missing values. The present chapter discusses the strengths and weaknesses, and evaluates the robustness, of three approaches for creating multiple imputations of missing data that do not necessarily rely on the assumption of independent, normal, and homoscedastic errors: semi-parametric predictive mean matching, quantile regression-based multiple imputation, and random forest-based multiple imputation. Our simulations contribute to establishing practical guidelines regarding the sample size requirements of the respective approaches. Our results show that robust methods for multiple imputation require a sufficiently large sample size of $N \ge 500$ to produce acceptable statistical inferences.

Type

Book section

Publication

In M. Stemmler, W. Wiedermann, & F. L. Huang (Eds.), Dependent data in social sciences research (pp. 565-588). Springer