
Ulster Institutional Repository

An optimization of ReliefF for classification in large datasets


Huang, Yue, McCullagh, Paul and Black, Norman (2009) An optimization of ReliefF for classification in large datasets. Data & Knowledge Engineering, 68 (11). pp. 1348-1356. [Journal article]


URL: http://dx.doi.org/10.1016/j.datak.2009.07.011

DOI: 10.1016/j.datak.2009.07.011

Abstract

ReliefF has proved to be a successful feature selector, but it is computationally expensive when handling large datasets. We present an optimization using Supervised Model Construction, which improves starter selection. Its effectiveness was evaluated using 12 UCI datasets and a clinical diabetes database. Experiments indicate that, compared with ReliefF, the proposed method improved computational efficiency whilst maintaining classification accuracy. On the clinical dataset (20,000 records with 47 features), feature selection via Supervised Model Construction (FSSMC) reduced processing time by 80% compared to ReliefF and maintained accuracy for the Naive Bayes, IB1 and C4.5 classifiers.
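
To illustrate why Relief-style feature weighting becomes expensive on large datasets, the following is a minimal sketch of the basic Relief scheme (one nearest hit and one nearest miss per sampled instance). It is not the ReliefF variant evaluated in the paper, which uses several neighbours and handles multi-class and noisy data, and it is not the FSSMC method. The function name `relief_weights` and its parameters are hypothetical, and numeric features are assumed.

```python
import random

def relief_weights(X, y, n_samples=None, seed=0):
    """Basic Relief for numeric features (illustrative sketch only).

    X: list of feature rows; y: list of class labels.
    Returns one weight per feature; larger suggests more relevance.
    """
    n, d = len(X), len(X[0])
    # Per-feature range, used to scale differences into [0, 1].
    lo = [min(row[j] for row in X) for j in range(d)]
    hi = [max(row[j] for row in X) for j in range(d)]
    span = [(hi[j] - lo[j]) or 1.0 for j in range(d)]

    def diff(a, b, j):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    rng = random.Random(seed)
    if n_samples is None or n_samples >= n:
        picks = list(range(n))          # use every instance once
    else:
        picks = rng.sample(range(n), n_samples)

    w = [0.0] * d
    for i in picks:
        # The costly step: a full scan of all n instances per pick,
        # so the total work grows as len(picks) * n * d.
        hit = miss = None
        for k in range(n):
            if k == i:
                continue
            dk = dist(X[i], X[k])
            if y[k] == y[i]:
                if hit is None or dk < hit[0]:
                    hit = (dk, k)
            elif miss is None or dk < miss[0]:
                miss = (dk, k)
        if hit is None or miss is None:
            continue  # no same-class or other-class neighbour exists
        for j in range(d):
            # Reward features that differ on the nearest miss and
            # agree on the nearest hit.
            gain = diff(X[i], X[miss[1]], j) - diff(X[i], X[hit[1]], j)
            w[j] += gain / len(picks)
    return w

# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[0.0, 0.3], [0.1, 0.9], [0.2, 0.1], [0.9, 0.2], [1.0, 0.8], [0.8, 0.5]]
y = [0, 0, 0, 1, 1, 1]
weights = relief_weights(X, y)
```

In this sketch the nearest-neighbour scan dominates the running time, which is why reducing the number of instances that must be sampled or searched is a natural target for optimization.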

Item Type: Journal article
Keywords: Relief; Feature selection; Classification; Efficiency
Faculties and Schools: Faculty of Computing & Engineering; Faculty of Computing & Engineering > School of Computing and Mathematics
Research Institutes and Groups: Computer Science Research Institute > Smart Environments
ID Code: 239
Deposited By: Dr Paul McCullagh
Deposited On: 20 Nov 2009 09:20
Last Modified: 22 Jul 2011 11:42
