Abstract:
Data mining uses sophisticated data analysis tools to discover previously unknown, valid patterns and relationships in large data sets. Sequential pattern mining finds all frequent sequential patterns that satisfy a user-specified minimum support threshold. PrefixSpan is a pattern-growth method that is particularly popular in biomedical fields. It is the most promising of the pattern-growth methods and works by recursively growing a pattern while restricting the search to the corresponding projected database. At each step, the algorithm looks for frequent sequences that extend the current prefix in that projected database. PrefixSpan does not need to generate candidate sequences. Because the search space shrinks at each step, the algorithm performs well even when the support threshold is small. In this paper, the PrefixSpan approach is used to discover patterns in protein sequences in order to analyse the structure of amino acids, which are the building blocks of proteins.
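The recursive prefix growth and database projection described above can be illustrated with a minimal Python sketch. It is a simplified illustration, not the implementation used in this paper: it assumes each sequence is a plain string of single-letter amino acid codes, that patterns may contain gaps, and that support is the number of sequences containing the pattern. The function name `prefixspan` and its parameters are hypothetical.

```python
from collections import defaultdict


def prefixspan(sequences, min_support):
    """Return {pattern: support} for every frequent sequential pattern.

    Assumption: each sequence is a string of single-character items,
    and min_support is an absolute count of sequences.
    """
    results = {}

    def mine(prefix, projected):
        # `projected` is the projected database for `prefix`: a list of
        # (sequence index, offset) pairs, one per supporting sequence,
        # where the offset marks the start of the remaining suffix.
        extensions = defaultdict(list)
        for seq_id, start in projected:
            seq = sequences[seq_id]
            seen = set()
            for pos in range(start, len(seq)):
                item = seq[pos]
                if item not in seen:
                    # Keep only the first occurrence per sequence, so each
                    # sequence contributes at most once to the support.
                    seen.add(item)
                    extensions[item].append((seq_id, pos + 1))

        for item, new_projected in sorted(extensions.items()):
            support = len(new_projected)
            if support >= min_support:
                # Grow the prefix directly from a frequent item; no
                # candidate sequences are generated and tested.
                pattern = prefix + item
                results[pattern] = support
                mine(pattern, new_projected)

    # The initial projected database is every full sequence.
    mine("", [(i, 0) for i in range(len(sequences))])
    return results
```

For instance, calling `prefixspan(["MKV", "MKL", "MAV"], 2)` on three toy strings reports `M` with support 3 and `MK`, `MV`, `K` and `V` with support 2. Each recursive call scans only the suffixes that follow the current prefix, which is how the search space shrinks at each step.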