SURE

Sunderland Repository records the research produced by the University of Sunderland including practice-based research and theses.

Two-Level Text Classification Using Hybrid Machine Learning Techniques

Tripathi, Nandita (2012) Two-Level Text Classification Using Hybrid Machine Learning Techniques. Doctoral thesis, University of Sunderland.

Item Type: Thesis (Doctoral)

Abstract

Nowadays, documents are increasingly being associated with multi-level
category hierarchies rather than a flat category scheme. To access these
documents in real time, we need fast automatic methods to navigate these
hierarchies. Today’s vast data repositories, such as the web, also contain many
broad domains of data which are quite distinct from each other, e.g. medicine,
education, sports and politics. Each domain constitutes a subspace of the data
within which the documents are similar to each other but quite distinct from the
documents in another subspace. The data within these domains is frequently
further divided into many subcategories.

Subspace Learning is a technique popular in non-text domains such as
image recognition, where it is used to increase speed and accuracy. Subspace
analysis lends itself naturally to the idea of hybrid classifiers. Each
subspace can be processed by a classifier best suited to the characteristics
of that particular subspace. Instead of using the complete set of full space
feature dimensions, classifier performance can be boosted by using only a
subset of the dimensions.

This thesis presents a novel hybrid parallel architecture using separate
classifiers trained on separate subspaces to improve two-level text
classification. The classifier to be used on a particular input and the relevant
feature subset to be extracted are determined dynamically by using a novel
method based on the maximum significance value. A novel vector
representation which enhances the distinction between classes within the
subspace is also developed. This novel system, the Hybrid Parallel Classifier,
was compared against the baselines of several single classifiers such as the
Multilayer Perceptron and was found to be faster and to achieve higher
two-level classification accuracy. The improvement in performance was even
greater when dealing with more complex category hierarchies.
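
The routing idea described in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the abstract does not define the significance value, so the sketch assumes it is the share of a term's occurrences that fall in a top-level category. The function names, the toy training data and the overlap-based second-level classifier are all hypothetical stand-ins.

```python
from collections import Counter, defaultdict


def term_significance(train):
    """Assumed significance: a term's count in a top-level category divided
    by its count over all categories. Returns {top_label: {term: share}}."""
    counts = defaultdict(Counter)
    totals = Counter()
    for tokens, top, _sub in train:
        counts[top].update(tokens)
        totals.update(tokens)
    return {top: {t: n / totals[t] for t, n in c.items()}
            for top, c in counts.items()}


def pick_subspace(tokens, sig):
    """Route a document to the top-level category (subspace) with the
    maximum summed significance over the document's terms."""
    scores = {top: sum(table.get(t, 0.0) for t in tokens)
              for top, table in sig.items()}
    return max(scores, key=scores.get)


def train_overlap_classifier(examples):
    """Hypothetical stand-in for a per-subspace classifier: predicts the
    subcategory whose training terms overlap most with the input."""
    profiles = defaultdict(Counter)
    for tokens, sub in examples:
        profiles[sub].update(tokens)

    def predict(tokens):
        return max(profiles, key=lambda s: sum(profiles[s][t] for t in tokens))

    return predict


def train_two_level(train):
    """Build the significance tables and one classifier per subspace.
    A hybrid system could plug a different classifier type into each slot."""
    sig = term_significance(train)
    by_top = defaultdict(list)
    for tokens, top, sub in train:
        by_top[top].append((tokens, sub))
    classifiers = {top: train_overlap_classifier(ex)
                   for top, ex in by_top.items()}
    return sig, classifiers


def classify(tokens, sig, classifiers):
    """Two-level classification: choose the subspace, keep only the terms
    known to that subspace, then apply that subspace's classifier."""
    top = pick_subspace(tokens, sig)
    subset = [t for t in tokens if t in sig[top]]
    return top, classifiers[top](subset)


# Toy data, invented for illustration: (tokens, top-level label, subcategory).
TRAIN = [
    ("striker scores goal in cup final".split(), "sports", "football"),
    ("bowler takes wicket in test match".split(), "sports", "cricket"),
    ("vaccine trial shows immune response".split(), "medicine", "immunology"),
    ("surgeon repairs heart valve".split(), "medicine", "cardiology"),
]

sig, classifiers = train_two_level(TRAIN)
result = classify("late goal wins the final".split(), sig, classifiers)
print(result)
```

Because each subspace owns its classifier and its feature subset, only one small model runs per input, which is one plausible reading of why such an architecture could be faster than a single full-space classifier.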

PDF: Two-Level_Text_Classification.pdf - Accepted Version (4MB)

More Information

Depositing User: Barry Hall

Identifiers

Item ID: 3305
URI: http://sure.sunderland.ac.uk/id/eprint/3305

Catalogue record

Date Deposited: 08 Jan 2013 16:03
Last Modified: 02 Jul 2019 09:07

Contributors

Author: Nandita Tripathi

University Divisions

Faculty of Technology > School of Computer Science

Subjects

Computing > Information Systems
