A Stochastic Method for Estimating Imputation Accuracy

Solomon, Norman (2008) A Stochastic Method for Estimating Imputation Accuracy. Doctoral thesis, University of Sunderland.

N_Solomon_MPhil_Thesis_2008.pdf - Accepted Version


This thesis describes a novel imputation evaluation method and shows how this method can
be used to estimate the accuracy of the imputed values generated by any imputation
technique. This is achieved by using an iterative stochastic procedure that repeatedly measures
how accurately a set of randomly deleted values is “put back” by the imputation process.
The proposed approach builds on the ideas underpinning uncertainty estimation methods, but
differs from them in that it estimates the accuracy of the imputed values, rather than
estimating the uncertainty inherent within those values. In addition, the proposed method
includes a procedure for comparing the accuracy of the imputed values in different data
segments, which uncertainty estimation methods do not provide.
The proposed method is implemented as a software application. This application is used to
estimate the accuracy of the imputed values generated by the expectation-maximisation (EM)
and nearest neighbour (NN) imputation algorithms. These algorithms are implemented
alongside the method, with particular attention being paid to the use of implementation
techniques which decrease algorithm execution times, so as to support the computationally
intensive nature of the method. A novel NN imputation algorithm is developed and the
experimental evaluation of this algorithm shows that it can be used to decrease the execution
time of the NN imputation process for both simulated and real datasets. The execution time of
the new NN algorithm was found to decrease steadily as the proportion of missing values in
the dataset increased.
The method is experimentally evaluated and the results show that the proposed approach
produces reliable and valid estimates of imputation accuracy when it is used to compare the
accuracy of the imputed values generated by the EM and NN imputation algorithms. Finally,
a case study is presented which shows how the method has been applied in practice, including
a detailed description of the experiments that were performed in order to find the most
accurate methods of imputing the missing values in the case study dataset. A comprehensive
set of experimental results is given, the associated imputation accuracy statistics are analysed
and the feasibility of imputing the missing case study data is assessed.
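
The delete-and-re-impute idea summarised in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the mean imputer used as a stand-in for EM or NN, the mean absolute error metric, and the parameters delete_fraction, iterations and seed are all assumptions made for illustration only.

```python
import random
import statistics


def mean_impute(rows):
    """Hypothetical placeholder imputer: fill each None with its column mean."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(statistics.fmean(observed) if observed else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(r)] for r in rows]


def estimate_imputation_error(rows, impute, delete_fraction=0.1,
                              iterations=50, seed=0):
    """Repeatedly delete known values at random, re-impute them, and
    measure how closely the imputed values match the deleted ones.

    Returns (mean, standard deviation) of the per-iteration mean
    absolute error. The metric is an assumption, not taken from the thesis.
    """
    rng = random.Random(seed)
    # Only cells with a known value can be deleted and later checked.
    known = [(i, j) for i, r in enumerate(rows)
             for j, v in enumerate(r) if v is not None]
    n_delete = max(1, int(delete_fraction * len(known)))

    errors = []
    for _ in range(iterations):
        deleted = rng.sample(known, n_delete)
        masked = [list(r) for r in rows]      # work on a copy
        for i, j in deleted:
            masked[i][j] = None
        imputed = impute(masked)
        abs_errors = [abs(imputed[i][j] - rows[i][j]) for i, j in deleted]
        errors.append(statistics.fmean(abs_errors))

    spread = statistics.stdev(errors) if len(errors) > 1 else 0.0
    return statistics.fmean(errors), spread


if __name__ == "__main__":
    data = [[1.0, 10.0], [2.0, None], [3.0, 30.0], [4.0, 40.0], [None, 50.0]]
    mean_error, spread = estimate_imputation_error(data, mean_impute)
    print(f"estimated error: {mean_error:.3f} (spread {spread:.3f})")
```

Any imputer that takes a table containing missing cells and returns a completed table could be passed in place of mean_impute, so the same loop could in principle be used to compare two techniques on the same dataset.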

Item Type: Thesis (Doctoral)
Subjects: Computing > Information Systems; Computing > Software Engineering
Divisions: Collections > Theses
Depositing User: Barry Hall
Date Deposited: 15 Apr 2013 15:56
Last Modified: 20 May 2019 13:31
URI: http://sure.sunderland.ac.uk/id/eprint/3785
