
NEURAL NETWORKS FOR EDITING AND IMPUTATION

Pasi Koikkalainen

Laboratory of Data Analysis
University of Jyväskylä
P.O.Box 35, FIN-40351 Jyväskylä
Finland

The concern of this presentation is how neural networks can be used for editing and imputation.

In imputation tasks, the promise is that neural networks can overcome the "curse of dimensionality":
dense samples are needed to learn a probability density function (pdf) well, but dense samples are hard to obtain in high dimensions.
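
As a rough illustration of why dense samples are hard to obtain, the sketch below counts the cells of a regular grid with a fixed number of bins per axis. The function name samples_needed and the choice of 10 bins per axis are assumptions made for this example only; they do not come from the presentation.

```python
def samples_needed(bins_per_axis: int, dims: int) -> int:
    """Cells in a regular grid, i.e. samples needed for one sample per cell."""
    return bins_per_axis ** dims


if __name__ == "__main__":
    # The required sample count grows exponentially with the dimension.
    for dims in (1, 2, 5, 10):
        print(dims, samples_needed(10, dims))
```

Keeping the same per-axis resolution therefore multiplies the required sample size by the number of bins for every added variable.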

When using neural networks, the dimensionality of the data is not the problem; rather, it is the COMPLEXITY of the data. The self-organizing map, for example, combines dimension reduction and data modelling under a single learning algorithm. This allows us to model multivariate distributions relatively effectively. The imputation model is then obtained by conditioning the modelled distribution on the observed values.
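
The idea can be sketched with a small one-dimensional self-organizing map written in NumPy. This is a minimal sketch under stated assumptions, not the method of the presentation: the names train_som and impute, the learning-rate and neighbourhood schedules, and the use of the best-matching prototype as a crude stand-in for conditioning on the observed values are all choices made for this example.

```python
import numpy as np


def train_som(data, n_units=10, n_epochs=50, seed=0):
    """Train a 1-D SOM on complete rows; returns an (n_units, n_features) codebook."""
    rng = np.random.default_rng(seed)
    # Initialise the prototypes from randomly chosen training rows.
    codebook = data[rng.choice(len(data), size=n_units, replace=False)].astype(float)
    grid = np.arange(n_units)          # positions of the units on the map
    n_steps = n_epochs * len(data)
    step = 0
    for _ in range(n_epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = 0.5 * (1.0 - frac) + 0.01                   # decaying learning rate
            sigma = max(n_units / 2 * (1.0 - frac), 0.5)     # shrinking neighbourhood
            bmu = np.argmin(((codebook - x) ** 2).sum(axis=1))
            h = np.exp(-((grid - bmu) ** 2) / (2 * sigma ** 2))
            codebook += lr * h[:, None] * (x - codebook)
            step += 1
    return codebook


def impute(row, codebook):
    """Fill NaN entries from the prototype that best matches the observed entries."""
    row = np.asarray(row, dtype=float)
    observed = ~np.isnan(row)
    dist = ((codebook[:, observed] - row[observed]) ** 2).sum(axis=1)
    best = codebook[np.argmin(dist)]
    return np.where(observed, row, best)


if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 200)
    data = np.column_stack([x, 2.0 * x])   # toy data on the line y = 2x
    codebook = train_som(data)
    print(impute([0.5, np.nan], codebook))
```

Matching only on the observed coordinates restricts the model to the part that is compatible with what was seen; a fuller treatment would weight several nearby prototypes rather than copy a single one.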

In editing, neural networks can be used for both strong and weak types of error localization. Strong knowledge assumes that the errors themselves can be modelled, while weak knowledge expects only that we are able to discriminate between acceptable and erroneous observations. The use of weak knowledge is more common in neural systems. The objective is to build a model that explains all clean observations well but gives low matching probabilities to erroneous ones. This can be done in two ways:

i)
Clean data is used for model building. Because most models are based on mean values, a measure of the accepted spread around the model is also needed.
ii)
When no clean training data is available, robust methods must be used during training. Then, according to some criterion, suspicious samples are given less weight in the model or are ignored entirely. As in case i), a measure of the accepted spread must be computed before actual error (outlier) detection can be done.
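
The two cases above can be sketched with a deliberately simple model in place of a neural network. Everything in this example is an assumption made for illustration: the function names, the threshold k = 3, and the use of the median and the median absolute deviation (MAD) as the robust estimator in case ii), where extreme samples have effectively no influence on the fit.

```python
import numpy as np


def fit_clean(clean, k=3.0):
    """Case i): model = mean of clean data, accepted spread = k standard deviations."""
    return clean.mean(axis=0), k * clean.std(axis=0)


def fit_robust(data, k=3.0):
    """Case ii): median and scaled MAD, which are barely affected by outliers."""
    center = np.median(data, axis=0)
    mad = np.median(np.abs(data - center), axis=0)
    # 1.4826 scales the MAD to the standard deviation for Gaussian data.
    return center, k * 1.4826 * mad


def is_suspicious(x, center, spread):
    """Flag an observation if any variable lies outside the accepted spread."""
    return bool(np.any(np.abs(np.asarray(x, dtype=float) - center) > spread))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(0.0, 1.0, size=(500, 2))
    dirty = np.vstack([clean, np.full((25, 2), 50.0)])   # 25 gross errors added
    print(is_suspicious([10.0, 0.0], *fit_clean(clean)))
    print(is_suspicious([50.0, 50.0], *fit_robust(dirty)))
```

In both cases the model (a centre) and the accepted spread are estimated first, and detection is a separate step that compares each observation against them.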



Pasi Koikkalainen
Fri Oct 18 19:03:41 EET DST 2002