
Ulster Institutional Repository

Multiple Sets of Rules for Text Categorization


Bi, Y, Anderson, TJ and McClean, SI (2004) Multiple Sets of Rules for Text Categorization. In: Advances in Information Systems 2004, Izmir, Turkey. Springer. 10 pp. [Conference contribution]

Full text not available from this repository.

URL: http://www.springerlink.com/index/X95B2KA5Y1PC925M

DOI: 10.1007/s10462-007-9049-y

Abstract

An important issue in text mining is how to make use of multiple pieces of discovered knowledge to improve future decisions. In this paper, we propose a new approach to combining multiple sets of rules for text categorization using Dempster's rule of combination. We develop a boosting-like technique for generating multiple sets of rules based on rough set theory, and we model the classification decisions from multiple sets of rules as pieces of evidence that can be combined by Dempster's rule of combination. We apply these methods to 10 of the 20 Newsgroups, a benchmark data collection (Baker and McCallum 1998), both individually and in combination. Our experimental results show that the performance of the best combination of the multiple sets of rules on the 10 groups of the benchmark data is statistically significantly better than that of the best single set of rules. A comparative analysis of the Dempster-Shafer and majority voting (MV) methods, together with an overfitting study, confirms the advantage and robustness of our approach.
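This record does not say how the rule sets' outputs are turned into evidence. The sketch below therefore assumes each rule set's decision for a document is already a mass function over single categories, with the remaining mass placed on the whole frame to represent ignorance. It combines these with the standard form of Dempster's rule and, for comparison, a plain majority vote. The class labels `c1`..`c3`, the helper `mass`, `FRAME`, and all numbers are hypothetical illustrations, not the authors' construction or data.

```python
from collections import Counter
from functools import reduce
from itertools import product

# Hypothetical frame of discernment: three category labels.
FRAME = frozenset({"c1", "c2", "c3"})

def dempster_combine(m1, m2):
    """Combine two mass functions (dict: frozenset -> mass) by Dempster's rule."""
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Mass falling on the empty set measures conflict between the sources.
            conflict += wa * wb
    if conflict >= 1.0 - 1e-12:
        raise ValueError("sources are in total conflict")
    # Renormalise so the non-conflicting masses sum to 1.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

def pignistic(m, frame=FRAME):
    """Spread each focal element's mass evenly over its classes (BetP)."""
    bet = {c: 0.0 for c in frame}
    for focal, w in m.items():
        for c in focal:
            bet[c] += w / len(focal)
    return bet

def ds_decision(masses):
    """Fold all pieces of evidence with Dempster's rule, then pick the top class."""
    combined = reduce(dempster_combine, masses)
    bet = pignistic(combined)
    return max(bet, key=bet.get), combined

def majority_vote(masses):
    """Each rule set votes only for its highest-mass single class."""
    votes = []
    for m in masses:
        singletons = {next(iter(s)): w for s, w in m.items() if len(s) == 1}
        votes.append(max(singletons, key=singletons.get))
    return Counter(votes).most_common(1)[0][0]

def mass(p1, p2):
    # Hypothetical evidence: mass on {c1} and {c2}; the rest is ignorance (whole frame).
    return {frozenset({"c1"}): p1, frozenset({"c2"}): p2, FRAME: 1.0 - p1 - p2}

rule_set_outputs = [mass(0.6, 0.1), mass(0.3, 0.5), mass(0.35, 0.4)]
label, combined = ds_decision(rule_set_outputs)
print(label, majority_vote(rule_set_outputs))  # c1 c2
```

In this made-up case the two schemes disagree. Majority voting counts only each rule set's top class (two of three favour `c2`). Dempster's rule also weighs how strongly each set backs its choice, so the first set's confident support for `c1` carries the combined decision.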

Item Type: Conference contribution (Paper)
Faculties and Schools: Faculty of Computing & Engineering
Faculty of Computing & Engineering > School of Computing and Information Engineering
Faculty of Computing & Engineering > School of Computing and Mathematics
Research Institutes and Groups: Computer Science Research Institute
Computer Science Research Institute > Artificial Intelligence and Applications
Computer Science Research Institute > Information and Communication Engineering
ID Code: 7576
Deposited By: Professor Sally McClean
Deposited On: 02 Jul 2010 11:24
Last Modified: 02 Jul 2010 11:24
