Robust multiple imputation based on quantile forests | Dr. Kristian Kleinke

Abstract

Random Forest (RF) is a machine learning method for classification and regression problems that belongs to the ensemble methods, i.e. the classification decision or prediction is based on an ensemble (forest) of relatively independent statistical models (trees). Imputation by RF is particularly attractive for large datasets: neither an imputation model nor auxiliary variables need to be specified, and no functional form has to be assumed, since the underlying functional form is approximated in a data-driven fashion. However, little is yet known about the robustness of RF-based imputation. The purpose of the present paper is to elucidate to what extent RF-based multiple imputation is robust, and whether imputation based on quantile forests might work 'better' when the data are skewed and heteroscedastic.
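To make the idea of drawing imputations from a forest-based conditional distribution more concrete, here is a minimal sketch. It is not the procedure evaluated in the talk. It assumes a single predictor, uses depth-1 trees (stumps) instead of fully grown trees, and pools the observed responses from the matching leaves as a crude estimate of the conditional distribution. The names `fit_stump` and `impute_draws` are hypothetical and chosen for illustration only.

```python
import random


def sse(vals):
    """Sum of squared deviations from the mean."""
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals)


def fit_stump(xs, ys, rng, min_leaf=5):
    """Fit a depth-1 regression tree (stump) on a bootstrap sample.

    Returns (threshold, left_ys, right_ys): the observed responses that
    fall on each side of the chosen split.
    """
    n = len(xs)
    sample = sorted((xs[i], ys[i]) for i in (rng.randrange(n) for _ in range(n)))
    all_ys = [y for _, y in sample]
    # Fallback when no valid split exists: every case goes to the left leaf.
    best = (float("inf"), float("inf"), all_ys, [])
    for k in range(min_leaf, n - min_leaf + 1):
        if sample[k - 1][0] == sample[k][0]:
            continue  # cannot split between tied predictor values
        left, right = all_ys[:k], all_ys[k:]
        total = sse(left) + sse(right)
        if total < best[0]:
            threshold = (sample[k - 1][0] + sample[k][0]) / 2
            best = (total, threshold, left, right)
    return best[1:]


def impute_draws(forest, x, m, rng):
    """Draw m imputations for an incomplete case with predictor value x."""
    # Pool the observed responses from the leaf that x falls into in each tree.
    pool = sorted(
        y for thr, left, right in forest for y in (left if x <= thr else right)
    )
    # Inverse-CDF sampling: pick a random quantile level and return the
    # matching empirical quantile of the pooled leaf responses.
    return [pool[min(int(rng.random() * len(pool)), len(pool) - 1)] for _ in range(m)]


# Toy data (made up for illustration): the noise grows with x, so the
# residual variance is heteroscedastic.
rng = random.Random(1)
xs = [rng.random() for _ in range(200)]
ys = [10 * x + rng.gauss(0, 0.1 + x) for x in xs]

forest = [fit_stump(xs, ys, rng) for _ in range(25)]
print(impute_draws(forest, 0.9, m=5, rng=rng))  # five imputed values for x = 0.9
```

Because every draw is an observed response from a matching leaf, the imputations stay within the range of the observed data and inherit the local spread and skewness, rather than assuming a symmetric, constant-variance error distribution.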

Date
Feb 9, 2023 3:00 PM
Location
Bielefeld, Germany