Next: PRESENTATION OF THE INSPECTOR Up: EDITING AND IMPUTATION SYSTEMS Previous: CANADIAN CENSUS EDIT AND

A HIGH PERFORMANCE SCALABLE IMPUTATION SYSTEM

M. Weeks, K. Lees, S. O'Keefe, and J. Austin

Advanced Computer Architectures Group
Department of Computer Science
University of York
Heslington
YORK
YO10 5DD
UK

This paper describes the implementation of a highly scalable method for imputation, in which specialised hardware in a distributed environment provides high performance and scalability. Imputation is the process by which missing fields in a data set are generated from known acceptable data. As part of the Euredit project for the development and evaluation of new methods for editing and imputation, we use the k-nearest-neighbour (kNN) approach to determine the imputed data. However, kNN processing can be slow, because the vector distance must be calculated between the query point and every point in the data set; for very large data sets, performance can be prohibitively slow. To solve this problem we apply AURA (Advanced Uncertain Reasoning Architecture), a generic family of neural-network-based techniques and implementations intended for high-speed approximate search and match operations on large data sets. AURA is based upon a binary neural network called a Correlation Matrix Memory (CMM). PRESENCE (PaRallEl StructurEd Neural Computing Engine) hardware cards have been developed to accelerate core CMM functionality, and mapping a CMM onto multiple PRESENCE cards provides data and performance scalability. The Cortex-1 distributed neural processor is a high-performance environment for AURA development: a seven-node PC cluster containing 28 PRESENCE cards, providing 3.5 gigabytes of CMM storage. By using AURA to perform a high-speed sift of a large data set, we create a smaller subset of the data. Traditional kNN methods can then be applied to this subset to determine the imputed values. This paper describes how the AURA imputation technique maps onto Cortex-1, and determines how its performance scales with increasing data set size.
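The two-stage idea in the abstract (a fast approximate sift followed by exact kNN on the surviving subset) can be illustrated with a minimal Python sketch. This is not the AURA or PRESENCE implementation: the sift below is a simple bin-matching score chosen as a software stand-in for a CMM lookup, and every name in it (`quantise`, `sift`, `knn_impute`, the field names, the value ranges, and the toy records) is an assumption made for illustration only.

```python
import math

def quantise(value, lo, hi, bins):
    """Map a numeric value to a bin index in [0, bins - 1]."""
    if hi == lo:
        return 0
    idx = int((value - lo) / (hi - lo) * bins)
    return min(max(idx, 0), bins - 1)

def sift(records, query, known, ranges, bins=4, keep=4):
    """Cheap first pass (stand-in for the AURA sift, not the real thing):
    score each record by the number of known fields that fall in the same
    bin as the query, then keep the `keep` best-scoring records."""
    q_bins = {}
    for f in known:
        lo, hi = ranges[f]
        q_bins[f] = quantise(query[f], lo, hi, bins)
    scored = []
    for i, rec in enumerate(records):
        score = 0
        for f in known:
            lo, hi = ranges[f]
            if quantise(rec[f], lo, hi, bins) == q_bins[f]:
                score += 1
        scored.append((score, i))
    # Highest score first; ties broken by original record order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [records[i] for _, i in scored[:keep]]

def knn_impute(candidates, query, known, target, k=3):
    """Exact second pass: Euclidean kNN over the candidate subset only,
    imputing the missing field as the mean over the k nearest records."""
    def dist(rec):
        return math.sqrt(sum((rec[f] - query[f]) ** 2 for f in known))
    nearest = sorted(candidates, key=dist)[:k]
    return sum(rec[target] for rec in nearest) / len(nearest)

# Hypothetical toy data: "income" is missing in the query record.
records = [
    {"age": 25, "hours": 40, "income": 20000},
    {"age": 27, "hours": 38, "income": 22000},
    {"age": 26, "hours": 42, "income": 21000},
    {"age": 60, "hours": 10, "income": 9000},
    {"age": 58, "hours": 12, "income": 9500},
    {"age": 45, "hours": 45, "income": 40000},
]
query = {"age": 26, "hours": 40, "income": None}
known = ["age", "hours"]
ranges = {"age": (0, 100), "hours": (0, 60)}  # assumed field ranges

candidates = sift(records, query, known, ranges, bins=4, keep=4)
imputed = knn_impute(candidates, query, known, "income", k=3)
print(len(candidates), imputed)
```

The point of the split is that the expensive distance calculation runs only over `candidates` rather than over every record. In this sketch the fields are left unscaled for brevity; a practical kNN would normally normalise each field before computing distances.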



Pasi Koikkalainen
Fri Oct 18 19:03:41 EET DST 2002