Multiple imputation of incomplete panel data based on a piecewise growth curve model: An evaluation and application to juvenile delinquency data | Dr. Kristian Kleinke

Abstract

Modern model-based approaches to missing data imputation aim to recreate the sample as if no information had gone missing. To this end, the imputation model usually needs to reflect the assumed (and typically unknown) true data-generating process (DGP) and, if necessary, the mechanism that created the missing data patterns. The present chapter focuses on non-linear trajectories of the target variable over time. We propose piecewise growth curve models as a relatively simple way to approximate a non-linear trajectory and to impute incomplete non-linear panel data. The purpose of the chapter is to elucidate how the choice of imputation method and model affects substantive model results. We present results of a Monte Carlo simulation in which the data-generating model was a piecewise growth curve model with two piecewise linear splines over 12 panel waves, reflecting first an increase and later a decrease in the target variable. Data were imputed based on (a) a piecewise growth curve model, (b) a related model with a close fit (a growth curve model with a linear and a quadratic time trend), or (c) a relatively robust all-round method (semi-parametric predictive mean matching). In empirical analyses, the true data-generating process is usually unknown, and applied researchers need to know whether and to what extent minor misspecifications of the imputation model affect statistical inferences. Our simulations show that statistical inferences are largely unbiased when the imputation model is correctly specified, and that minor misspecifications of the imputation model did little damage. Falsely assuming that the data are missing at random, on the other hand, could lead to biased statistical inferences. The chapter ends with an application of the proposed method to data from 12 waves of the Crime in the Modern City (CrimoC) study and a general discussion of the results.
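To make the idea of a two-piece linear trajectory more concrete, the following minimal Python sketch shows one common way to code time for a piecewise growth curve model over 12 waves and to simulate a rise-then-decline trajectory with a random intercept. It is only an illustration, not the chapter's simulation design: the knot at wave 6, all parameter values, and the function names `piecewise_time_codes` and `simulate_person` are assumptions made for this example.

```python
import random


def piecewise_time_codes(n_waves=12, knot_wave=6):
    """Return two slope codes (t1, t2) for a two-piece linear growth model.

    t1 increases by 1 per wave up to the knot and is then held constant;
    t2 is 0 up to the knot and increases by 1 per wave afterwards.
    The knot location is an assumption chosen for illustration.
    """
    waves = range(1, n_waves + 1)
    t1 = [min(w, knot_wave) - 1 for w in waves]
    t2 = [max(w - knot_wave, 0) for w in waves]
    return t1, t2


def simulate_person(t1, t2, rng, b0=1.0, b1=0.5, b2=-0.4,
                    sd_intercept=0.5, sd_residual=0.3):
    """Simulate one person's trajectory with a random intercept.

    b1 > 0 gives the initial increase, b2 < 0 the later decrease.
    All parameter values are hypothetical.
    """
    u0 = rng.gauss(0, sd_intercept)  # person-specific intercept deviation
    return [b0 + u0 + b1 * a + b2 * b + rng.gauss(0, sd_residual)
            for a, b in zip(t1, t2)]


t1, t2 = piecewise_time_codes()
rng = random.Random(42)  # fixed seed for reproducibility
panel = [simulate_person(t1, t2, rng) for _ in range(200)]

# Wave-wise sample means: they should rise up to the knot and fall afterwards.
wave_means = [sum(person[w] for person in panel) / len(panel)
              for w in range(len(t1))]
```

In an imputation model of this kind, `t1` and `t2` would enter as two separate time predictors, so that each segment of the trajectory gets its own slope.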

Publication
In M. Stemmler, W. Wiedermann, & F. L. Huang (Eds.), Dependent data in social sciences research (pp. 589-615). Springer