Ulster Institutional Repository

Conceptual Clustering of Heterogeneous Gene Expression Sequences

McClean, SI, Scotney, BW and Robinson, S (2003) Conceptual Clustering of Heterogeneous Gene Expression Sequences. Artificial Intelligence Review (Special Issue on Life Science and AI), 20 (1-2). pp. 53-73. [Journal article]

Full text not available from this repository.

DOI: 10.1023/A:1026036631075

Abstract

We are concerned with clustering and characterising gene expression sequences that have been classified according to heterogeneous classification schemes. We adopt a model-based approach that uses a Hidden Markov Model (HMM) that has as states the stages of the underlying process that generates the gene sequences, thus allowing us to handle complex and heterogeneous data. Each cluster is described in terms of an HMM where we seek to find schema mappings between the states of the original sequences and the states of the HMM.

The general solution that we propose involves several distinct tasks. Firstly, there is a clustering problem where we seek to group similar sequences; for this we use mutual entropy to identify associations between sequence states. Secondly, because we are concerned with clustering heterogeneous sequences, we must determine the mappings between the states of each sequence in a cluster and the states of an underlying hidden process; for this we compute the most probable mapping. Thirdly, using these mappings we employ maximum likelihood techniques to learn the probabilistic description of the hidden Markov process for each cluster. Fourthly, we use these descriptions to characterise the clusters using Dynamic Programming to determine the most probable pathway for each cluster. Finally, we derive linguistic labels to describe the clusters in a user-friendly manner. Such an approach provides an intuitive way of describing the underlying shape of the process by explicitly modelling the temporal aspects of the data. Non-time-homogeneous HMMs are used to capture the full temporal semantics.
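
As an illustration of the fourth task only, the following is a minimal sketch of how dynamic programming can find the most probable pathway through a non-time-homogeneous Markov process. It is not the authors' implementation. It assumes the schema mappings have already been applied, so every sequence is expressed in the hidden states, and that the initial distribution and one transition matrix per time step have already been learned. The function name `most_probable_pathway` and the parameters `initial` and `transitions` are hypothetical.

```python
import math


def _log(p):
    """Logarithm that maps a zero probability to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def most_probable_pathway(initial, transitions):
    """Return (path, probability) of the most probable state pathway.

    initial:     initial[i] = P(state i at time 0)              (assumed input)
    transitions: transitions[t][i][j] = P(state j at time t+1 | state i at time t);
                 one matrix per step, so the matrices may differ over time
                 (non-time-homogeneous).                         (assumed input)
    """
    n = len(initial)
    # score[i] = log-probability of the best pathway ending in state i so far
    score = [_log(p) for p in initial]
    backpointers = []

    for matrix in transitions:
        new_score, pointers = [], []
        for j in range(n):
            # Best predecessor state for arriving in state j at the next step
            best_i = max(range(n), key=lambda i: score[i] + _log(matrix[i][j]))
            new_score.append(score[best_i] + _log(matrix[best_i][j]))
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)

    # Trace back from the best final state to recover the whole pathway
    last = max(range(n), key=lambda j: score[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, math.exp(score[last])
```

Working in log space avoids numerical underflow on long sequences, and supplying a separate matrix for each step is what lets the sketch represent transition probabilities that change over time.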

Item Type: Journal article
Faculties and Schools: Faculty of Computing & Engineering
Faculty of Computing & Engineering > School of Computing and Information Engineering
Research Institutes and Groups: Computer Science Research Institute
Computer Science Research Institute > Information and Communication Engineering
ID Code: 49
Deposited By: Professor Sally McClean
Deposited On: 23 Sep 2009 14:10
Last Modified: 15 Jun 2011 11:07