Ray Chambers, Xinqiang Zhao, and Adao Hentges
Department of Social Statistics
University of Southampton
Highfield, Southampton, SO17 1BJ, U.K.
Editing in business surveys is often complicated by the fact that
outliers due to errors in the data are mixed in with correct, but
extreme, data values. In this paper we focus on a technique for error
identification in such long-tailed data distributions based on
fitting outlier-robust tree-based models to data contaminated by both
outliers and errors. An application to a trial data set created as part
of the EUREDIT project that contains a mix of extreme errors and
"real" values will be demonstrated. The tree-based approach can be
carried out on a variable-by-variable basis or on a multivariate
basis. Initial results from both these approaches will be contrasted
using this data set. Issues associated with "correcting" identified
outliers in these data will also be explored.