Ulster Institutional Repository

Knowledge discovery by probabilistic clustering of distributed databases

McClean, SI, Scotney, BW, Morrow, PJ and Greer, KRC (2005) Knowledge discovery by probabilistic clustering of distributed databases. Data and Knowledge Engineering, 54 (2). pp. 189-210. [Journal article]

Full text not available from this repository.

DOI: 10.1016/j.datak.2004.12.001

Abstract

Clustering of distributed databases facilitates knowledge discovery through learning of new concepts that characterise common features and differences between datasets. Hence, general patterns can be learned rather than restricting learning to specific databases from which rules may not be generalisable. We cluster databases that hold aggregate count data on categorical attributes that have been classified according to homogeneous or heterogeneous classification schemes. Clustering of datasets is carried out via the probability distributions that describe their respective aggregates. The homogeneous case is straightforward. For heterogeneous data we investigate a number of clustering strategies, of which the most efficient avoid the need to compute a dynamic shared ontology to homogenise the classification schemes prior to clustering.
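
The homogeneous case described in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes every dataset reports counts over the same ordered list of categories, uses Jensen-Shannon divergence as the measure of dissimilarity between the datasets' probability distributions, and merges greedily until no pair falls below a threshold. The function names, the divergence measure, the `threshold` value, and the example counts are all hypothetical choices made for illustration.

```python
import math


def to_distribution(counts):
    """Normalise aggregate counts to a probability distribution (assumes a non-zero total)."""
    total = sum(counts)
    return [c / total for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two distributions of equal length."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the sum.
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def cluster_datasets(count_tables, threshold=0.02):
    """Greedy agglomerative clustering of datasets by their aggregate distributions.

    count_tables maps a dataset name to its list of category counts.
    The threshold is an illustrative assumption, not a value from the paper.
    """
    clusters = [
        {"members": [name], "counts": list(counts)}
        for name, counts in count_tables.items()
    ]
    while len(clusters) > 1:
        # Find the closest pair of clusters under the chosen divergence.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = js_divergence(
                    to_distribution(clusters[i]["counts"]),
                    to_distribution(clusters[j]["counts"]),
                )
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break  # remaining clusters are too dissimilar to merge
        # Merge by pooling the aggregate counts of the two clusters.
        merged = {
            "members": clusters[i]["members"] + clusters[j]["members"],
            "counts": [a + b for a, b in zip(clusters[i]["counts"], clusters[j]["counts"])],
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return [sorted(c["members"]) for c in clusters]


# Hypothetical aggregate counts over three shared categories.
tables = {
    "A": [50, 30, 20],
    "B": [48, 33, 19],
    "C": [10, 20, 70],
    "D": [12, 18, 70],
}
print(cluster_datasets(tables))
```

Pooling counts on merge means each cluster is itself summarised by an aggregate, so it can be compared with other datasets in the same way as an individual database. The heterogeneous case, where classification schemes differ between databases, is not covered by this sketch.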

Item Type: Journal article
Keywords: Distributed databases; Probabilistic clustering; Aggregates; Dynamic shared ontology
Faculties and Schools: Faculty of Computing & Engineering; Faculty of Computing & Engineering > School of Computing and Information Engineering
Research Institutes and Groups: Computer Science Research Institute; Computer Science Research Institute > Information and Communication Engineering
ID Code: 6815
Deposited By: Professor Bryan Scotney
Deposited On: 20 Jan 2010 15:54
Last Modified: 15 Jun 2011 11:07