COMBINING EDITING AND IMPUTATION METHODS IN HOUSEHOLD SURVEYS: AN EXPERIMENTAL APPLICATION ON CENSUS DATA.

Antonia Manzari

ISTAT
c/o Servizio MPS
Via Cesare Balbo,16 - Roma
Italy
manzari@istat.it

Data from household surveys are generally characterised by a hierarchical structure: data are collected at the household level, with information for each person within the household. Some of the collected variables describe features of the household, while the remaining ones concern the person. Some person variables are demographic, others are non-demographic. Most variables are qualitative (categorical), although stored as integer codes, but some are quantitative (numeric). These features make the Editing and Imputation (E&I) phase a complex matter: because the values of demographic variables for different persons within the same household are related, the user of the E&I system must take into account between-person constraints together with the constraints among the values of variables for a given person (within-person constraints). Moreover, joint E&I of qualitative and quantitative variables is required, but while constraints involving qualitative data can be defined by logical edit rules, the relationships among numeric variables are generally expressed by arithmetic edit rules (usually linear inequalities). E&I systems that treat invalid or inconsistent responses for qualitative and numeric variables simultaneously are therefore needed.
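The three kinds of constraint described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration only: the variable names (age, marital, hours, relation), the thresholds and the rules themselves are hypothetical and are not taken from the SARs data or from any edit set used in the paper.

```python
from typing import Any, Dict, List

Person = Dict[str, Any]
Household = List[Person]


def fails_age_marital(p: Person) -> bool:
    """Within-person logical edit (hypothetical): under 16 and married."""
    return p["age"] < 16 and p["marital"] == "married"


def fails_hours(p: Person) -> bool:
    """Within-person arithmetic edit (hypothetical linear inequalities):
    weekly hours worked must satisfy 0 <= hours <= 168."""
    return not (0 <= p["hours"] <= 168)


def fails_parent_child_gap(h: Household) -> bool:
    """Between-person edit (hypothetical): exactly one head is required,
    and each child must be at least 12 years younger than the head."""
    heads = [p for p in h if p["relation"] == "head"]
    if len(heads) != 1:
        return True
    head_age = heads[0]["age"]
    return any(
        p["relation"] == "child" and head_age - p["age"] < 12 for p in h
    )


def failed_edits(h: Household) -> List[str]:
    """Return a label for every edit that the household record fails."""
    failures = []
    for i, p in enumerate(h):
        if fails_age_marital(p):
            failures.append(f"person {i}: age/marital")
        if fails_hours(p):
            failures.append(f"person {i}: hours")
    if fails_parent_child_gap(h):
        failures.append("household: parent/child age gap")
    return failures
```

Calling failed_edits on a household record (a list of person dictionaries) returns the labels of the failed edits; an empty list means the record passes all three illustrative rules. A real E&I system would go further and decide which fields to change and how to impute them, which this sketch does not attempt.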

How can the E&I phase be performed in such a complex situation using automatic generalised systems for micro-editing? A complex E&I task can be tackled by dividing the E&I phase into simpler sub-problems and finding the most appropriate solution for each of them. An E&I strategy that combines several methodologies can be a useful way to clean data: data quality is maintained because each specific problem is handled by a suitable tool.

This paper presents the E&I strategy used to clean a perturbed Sample of Anonymised Records for individuals from the 1991 UK Census (SARs data). The strategy was developed in the EUREDIT project using methods currently in use (or "standard" methods) for data E&I, in order to obtain a performance benchmark for advanced techniques.

The E&I phase was divided into two macro sub-phases, in which different automatic systems (CANCEIS and SCIA) were used. Within the second macro sub-phase (SCIA), several applications were run, varying the variables handled and/or the edit rules specified.

Pasi Koikkalainen
Fri Oct 18 19:03:41 EET DST 2002