A Fast Multivariate Nearest Neighbour Imputation Algorithm

Solomon, Norman, Oatley, Giles and McGarry, Kenneth (2007) A Fast Multivariate Nearest Neighbour Imputation Algorithm. Lecture Notes in Engineering and Computer Science, 2166 (1). pp. 940-948. ISSN 2078-0958

Full text: WCE2007_pp940-947.pdf (Published Version, PDF, 298kB)

Abstract

Imputation of missing data is important in many areas, such as
reducing non-response bias in surveys and maintaining medical
documentation. Nearest neighbour (NN) imputation algorithms replace
the missing values within any particular observation by taking
copies of the corresponding known values from the most similar
observation found in the dataset. However, when NN algorithms are
executed against large multivariate datasets, the poor performance
(program execution speed) of these algorithms can present major
practical problems.
We argue that these problems have not been sufficiently addressed,
and we present a fast NN imputation algorithm that can employ any
method for measuring the similarity between observations. The
algorithm has been designed for the imputation of missing values in
large multivariate datasets that contain many different missingness
patterns with large proportions of missing data. The ideas
underpinning the algorithm are explained in detail, and experiments
are described which show that the algorithm delivers very good
performance when it is used for imputation in both segmented and
non-segmented datasets containing several million rows.
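The basic NN imputation step described in the abstract (copy each missing value from the most similar observation, using any similarity measure) can be sketched as a naive brute-force baseline. This is not the authors' fast algorithm, whose internals are not given on this page. Assumptions: rows are Python lists with `None` marking a missing value; similarity is a pluggable distance function; the default distance compares only attributes known in both rows; and a donor must have known values for every attribute the recipient is missing. The names `nn_impute` and `euclidean_on_shared` are hypothetical.

```python
import math
from typing import Callable, List, Optional, Sequence

Row = List[Optional[float]]  # None marks a missing value (assumed encoding)
Distance = Callable[[Sequence[Optional[float]], Sequence[Optional[float]]], float]


def euclidean_on_shared(a: Sequence[Optional[float]], b: Sequence[Optional[float]]) -> float:
    """Hypothetical default: Euclidean distance over attributes known in both rows."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        return math.inf  # no common known attributes: treat as infinitely far
    return math.sqrt(sum((x - y) ** 2 for x, y in shared))


def nn_impute(rows: List[Row], distance: Distance = euclidean_on_shared) -> List[Row]:
    """Naive O(n^2) NN imputation; returns a new list and leaves `rows` unchanged."""
    result = [list(r) for r in rows]
    for i, recipient in enumerate(rows):
        missing = [j for j, v in enumerate(recipient) if v is None]
        if not missing:
            continue  # complete row, nothing to impute
        best, best_d = None, math.inf
        for k, donor in enumerate(rows):
            # A donor must be another row with every needed value present.
            if k == i or any(donor[j] is None for j in missing):
                continue
            d = distance(recipient, donor)
            if d < best_d:
                best, best_d = donor, d
        if best is not None:
            # Copy the corresponding known values from the most similar row.
            for j in missing:
                result[i][j] = best[j]
    return result


data = [
    [1.0, 2.0, 3.0],
    [1.1, None, 2.9],
    [5.0, 6.0, 7.0],
    [None, 6.1, 7.2],
]
print(nn_impute(data))
```

Every recipient scans every candidate donor, so the number of distance computations grows quadratically with the number of rows; this is the kind of cost that makes naive NN imputation impractical on datasets with several million rows. Donors are drawn from the original rows, so imputed values are never reused as donor values. Distances computed over different numbers of shared attributes are not strictly comparable, so a practical implementation might normalise them.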

Item Type: Article
Subjects: Computing > Artificial Intelligence
Computing > Databases
Divisions: Faculty of Applied Sciences
Depositing User: Kenneth McGarry
Date Deposited: 12 Mar 2015 09:55
Last Modified: 08 Mar 2017 23:04
URI: http://sure.sunderland.ac.uk/id/eprint/5284

