Ulster Institutional Repository

A multi-dimensional sequence approach to measuring tree similarity

Lin, Zhiwei, Wang, Hui and McClean, Sally (2012) A multi-dimensional sequence approach to measuring tree similarity. IEEE Transactions on Knowledge and Data Engineering, 24 (2). pp. 197-208. [Journal article]

Full text not available from this repository.

URL: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=5645628

DOI: 10.1109/TKDE.2010.239

Abstract

The tree is one of the most common and well-studied data structures in computer science. Measuring the similarity of such structures is key to analyzing this type of data. However, measuring tree similarity is not trivial because of the inherent complexity of trees and the resulting large search space. In this paper, trees are represented as multi-dimensional sequences, and their similarity is measured on the basis of these sequence representations. Multi-dimensional sequences have sequential dimensions and spatial dimensions. We measure sequential similarity with either the all common subsequences similarity measure or the longest common subsequence measure, and we measure spatial similarity with dynamic time warping. We then combine the two to give a measure of tree similarity. A brute-force algorithm for calculating this similarity would have a high computational cost. In the spirit of dynamic programming, two efficient algorithms are designed for calculating this similarity, both with quadratic time complexity. The new measures are evaluated in terms of classification accuracy with two popular classifiers (k-nearest neighbor and support vector machine) and in terms of search effectiveness and efficiency in kNN similarity search, using three different datasets from natural language processing and information retrieval. Experimental results show that the new measures outperform the benchmark measures consistently and significantly.
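
As a rough illustration of two building blocks named in the abstract, the sketch below shows textbook dynamic-programming versions of longest common subsequence length and dynamic time warping distance, each running in quadratic time. It is not the authors' algorithm: the function names `lcs_length` and `dtw_distance`, the absolute-difference local cost, and the use of plain one-dimensional sequences are assumptions made for this example, and the step that combines the two into a tree similarity is not shown because the abstract does not specify it.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of sequences a and b (O(len(a) * len(b)))."""
    m, n = len(a), len(b)
    # dp[i][j] holds the LCS length of a[:i] and b[:j].
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]


def dtw_distance(x, y):
    """Dynamic time warping distance between numeric sequences x and y.

    Assumption for this sketch: the local cost is the absolute difference.
    """
    inf = float("inf")
    m, n = len(x), len(y)
    # d[i][j] holds the minimal cumulative cost of aligning x[:i] with y[:j].
    d = [[inf] * (n + 1) for _ in range(m + 1)]
    d[0][0] = 0.0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = abs(x[i - 1] - y[j - 1])
            d[i][j] = cost + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[m][n]
```

For instance, `lcs_length("ABCBDAB", "BDCABA")` compares two label sequences, and `dtw_distance([1, 2, 3], [1, 2, 2, 3])` aligns two numeric sequences of different lengths.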

Item Type: Journal article
Faculties and Schools: Faculty of Computing & Engineering
Faculty of Computing & Engineering > School of Computing and Information Engineering
Faculty of Computing & Engineering > School of Computing and Mathematics
Research Institutes and Groups: Computer Science Research Institute
Computer Science Research Institute > Artificial Intelligence and Applications
Computer Science Research Institute > Information and Communication Engineering
ID Code: 17369
Deposited By: Professor Hui Wang
Deposited On: 02 Mar 2011 12:18
Last Modified: 20 Apr 2012 11:13