Fernando TUSELL
Facultad de CC.EE. y Empresariales
Avenida del Lehendakari Aguirre, 83
E-48015 BILBAO
email: etptupaf@bs.ehu.es
We address the problem of imputing vectors of values. This is required, for instance, when a large survey is supplemented with a second one in which only a subset of the questions is asked, and the answers to the unasked questions must be imputed.
In Bárcena and Tusell (2000) we proposed a tree-based method that afforded easy, nonparametric imputation of multivariate responses, such as is required, for example, when linking two surveys. While its performance is quite competitive with existing methods (and best in some circumstances), the method suffers from the discreteness intrinsic to the piecewise-constant approximation that trees make of a continuous function.
The basic ideas of our previous work (predictive matching and flexible, nonparametric approximation) are now carried one step further. We show how neural networks can be used to approximate a predictive distribution, which yields a simple, nonparametric, distribution-free analogue of multiple imputation.
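The abstract does not spell out the algorithm, so the following is only a minimal sketch of generic predictive matching with repeated donor draws, not the authors' method. It assumes a donor survey with both covariates and responses, a recipient survey with covariates only, and a fitted predictor. The names `predictive_match_impute`, `predict`, `k` and `n_draws` are hypothetical, and the `predict` function used here is a hand-written stand-in for a trained neural network.

```python
import random


def predictive_match_impute(donor_x, donor_y, recipient_x, predict,
                            k=3, n_draws=5, seed=0):
    """Impute a response vector for each recipient by predictive matching.

    predict: any function mapping a covariate vector to a predicted
             response vector (hypothetical stand-in for a fitted network).
    Returns, for each recipient, a list of n_draws observed donor vectors.
    """
    rng = random.Random(seed)
    # Predicted responses for every donor, computed once.
    donor_pred = [predict(x) for x in donor_x]
    imputations = []
    for x in recipient_x:
        target = predict(x)
        # Squared Euclidean distance between predicted response vectors.
        dist = [sum((a - b) ** 2 for a, b in zip(target, p))
                for p in donor_pred]
        # Indices of the k donors whose predictions are closest.
        nearest = sorted(range(len(donor_x)), key=dist.__getitem__)[:k]
        # Each draw copies the whole observed vector of a randomly chosen
        # near donor, so imputed values are always real, jointly observed.
        imputations.append([donor_y[rng.choice(nearest)]
                            for _ in range(n_draws)])
    return imputations


# Toy data (made up for illustration): one covariate, two responses.
donor_x = [[0.0], [1.0], [2.0], [3.0], [4.0]]
donor_y = [[0.1, 1.0], [1.2, 0.8], [1.9, 1.1], [3.2, 0.9], [4.1, 1.0]]
recipient_x = [[0.2], [3.9]]


def predict(x):
    # Stand-in predictor; a trained network would replace this.
    return [x[0], 1.0]


draws = predictive_match_impute(donor_x, donor_y, recipient_x, predict,
                                k=2, n_draws=4)
```

Drawing several donors per recipient, instead of always taking the single closest one, is what gives the sketch its resemblance to multiple imputation: the spread among the draws reflects uncertainty about the missing vector.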
We will show the performance of the method on both real and simulated data.
References
Bárcena, M.J. and Tusell, F. (2000) "Tree based algorithms for missing data imputation", COMPSTAT'2000: Proceedings in Computational Statistics, Physica-Verlag, Heidelberg.