IMPUTATION METHODS FOR ESTIMATING PAY DISTRIBUTIONS FROM HOUSEHOLD SURVEY DATA

Gabriele Beissel and Chris Skinner

Department of Social Statistics,
University of Southampton,
Southampton SO17 1BJ, UK,
gbeissel@socsci.soton.ac.uk, cjs@socsci.soton.ac.uk

Distributions of hourly pay are important for a wide range of social and economic policy issues. However, reliable data on earnings and hours are difficult to obtain because both are prone to measurement error.

We use data from the U.K. Labour Force Survey, a large household survey that collects information on the hours worked and earnings of employees. However, these variables appear to be subject to considerable measurement error, which is thought to cause substantial upward bias in estimates at the lower end of the pay distribution. An alternative hourly earnings variable is obtained directly from respondents. It appears to give very accurate information but has a high rate of missing data, because many individuals are unable to report their hourly pay. The aim is to impute the missing values of this direct variable using the error-prone variable and other covariates, so that the imputation method effectively corrects for the measurement error in the pay variable.

Under the assumption of ignorable nonresponse, we apply an imputation method that uses a random hot deck procedure within imputation classes defined by a regression model, and compare it with more established methods such as predictive mean matching, as investigated in Skinner and Beissel (2001). The imputation is applied multiple times. Using a design-based approach (Rao and Sitter, 1995), we derive a variance estimation formula for this imputation method that accounts for imputation, response and sampling variability and for the complex weighting scheme of the survey. A computer-intensive simulation study shows good performance of both the point and the variance estimators.
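The regression-based hot deck described above can be sketched as follows. This is a minimal illustration under simplifying assumptions, not the authors' implementation: a single covariate (the error-prone derived hourly pay) stands in for the full set of covariates, the imputation classes are hypothetical quantile groups of the regression predictions, donors are drawn with equal probability (the survey weights are ignored), and the names `fit_ols` and `regression_class_hot_deck` are illustrative only.

```python
import bisect
import random
from statistics import mean, quantiles


def fit_ols(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)  # assumed non-zero (x not constant)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def regression_class_hot_deck(derived, direct, n_classes=5, rng=None):
    """Random hot deck within classes formed from regression predictions.

    derived: error-prone hourly pay, observed for every unit (assumption).
    direct:  directly reported hourly pay, None where missing.
    Returns a completed copy of `direct`.
    """
    rng = rng or random.Random()
    resp = [i for i, v in enumerate(direct) if v is not None]
    # Regression of the direct variable on the derived one, fitted on respondents.
    a, b = fit_ols([derived[i] for i in resp], [direct[i] for i in resp])
    pred = [a + b * xi for xi in derived]
    # Hypothetical class boundaries: quantiles of respondents' predictions.
    cuts = quantiles([pred[i] for i in resp], n=n_classes)
    cls = [bisect.bisect_right(cuts, p) for p in pred]
    donors = {}
    for i in resp:
        donors.setdefault(cls[i], []).append(i)
    imputed = list(direct)
    for i, v in enumerate(direct):
        if v is None:
            # Draw a donor at random from respondents in the same class;
            # fall back to all respondents if the class has no donors.
            pool = donors.get(cls[i]) or resp
            imputed[i] = direct[rng.choice(pool)]
    return imputed
```

Calling the function M times with independently seeded generators (for example `random.Random(m)` for m = 1, ..., M) yields M completed data sets. Because the regression coefficients are estimated once rather than redrawn for each repetition, the repetitions do not reflect uncertainty in those coefficients, which is one reason a procedure of this kind is improper in Rubin's sense.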

In addition, we consider variance estimation using Rubin's multiple imputation formula. However, this formula is designed for proper multiple imputation, and we find that it underestimates the variance under our imputation procedure, which is improper.
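Rubin's formula pools M completed-data estimates q_1, ..., q_M with estimated variances u_1, ..., u_M into a total variance T = W + (1 + 1/M) B, where W is the average within-imputation variance and B is the between-imputation variance of the q_m. The sketch below (the function name `rubin_combine` is hypothetical) computes this quantity; it is not the design-based variance estimator derived in the paper, and under improper imputation it may understate the true variance, as noted above.

```python
from statistics import mean, variance


def rubin_combine(estimates, within_variances):
    """Combine M completed-data results with Rubin's rules.

    estimates: point estimates q_1..q_M, one per imputed data set.
    within_variances: their estimated variances u_1..u_M.
    Returns (pooled estimate, total variance T).
    """
    m = len(estimates)
    if m < 2 or len(within_variances) != m:
        raise ValueError("need at least two imputations with matching variances")
    q_bar = mean(estimates)          # pooled point estimate
    w_bar = mean(within_variances)   # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor M - 1)
    t = w_bar + (1 + 1 / m) * b      # Rubin's total variance
    return q_bar, t
```

For example, three imputations giving estimates 1, 2 and 3, each with variance 0.5, yield a pooled estimate of 2 and a total variance of 0.5 + (4/3)(1), about 1.83.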

References

[1] Rao, J.N.K. and Sitter, R.R. (1995): Variance Estimation under Two-Phase Sampling with Applications to Imputation for Missing Data, Biometrika, 82(2), pp. 453-460.
[2] Skinner, C.J. and Beissel, G. (2001): Estimating the Distribution of Hourly Pay from Survey Data, paper presented at the CHINTEX Workshop "The Future of Social Surveys in Europe", Helsinki, 29-30 November 2001.

Pasi Koikkalainen
Fri Oct 18 19:03:41 EET DST 2002