Abstract:
Sequential pattern mining is an important data
mining problem with broad applications. Most
sequential pattern mining methods, such as GSP
(Generalized Sequential Pattern) and AprioriAll,
adopt a candidate generation-and-test approach to
reduce the number of candidates to be examined.
These approaches can be inefficient when mining
large sequence databases that contain numerous
and/or long patterns. A more efficient approach to
sequential pattern mining is pattern growth, a
divide-and-conquer method that projects and
partitions the database based on the currently
identified frequent patterns and grows those patterns
into longer ones using the projected databases. This
paper presents the mining of sequential patterns from
a library database using the PrefixSpan algorithm,
which uses prefix projection in sequential pattern
mining.
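To illustrate prefix projection, here is a minimal Python sketch of how the projected database for a one-item prefix might be built. It assumes each sequence is a time-ordered list of single items (full PrefixSpan also handles itemsets), and the `loans` database of book categories is a hypothetical toy example, not the paper's library data.

```python
from typing import List

Sequence = List[str]


def project(db: List[Sequence], item: str) -> List[Sequence]:
    """Build the <item>-projected database (hypothetical helper).

    For every sequence that contains `item`, keep only the suffix that
    follows its first occurrence. Sequences without `item`, and sequences
    where nothing follows it, are dropped because they cannot extend
    the prefix any further.
    """
    projected: List[Sequence] = []
    for seq in db:
        if item in seq:
            suffix = seq[seq.index(item) + 1:]
            if suffix:
                projected.append(suffix)
    return projected


# Hypothetical toy database: each sequence is one borrower's loans in
# time order, recorded as one book category per loan.
loans = [
    ["db", "ai", "os", "ai"],
    ["ai", "os", "net"],
    ["db", "os", "ai"],
    ["net", "db"],
]

print(project(loans, "db"))  # [['ai', 'os', 'ai'], ['os', 'ai']]
```

Only the suffixes that can still extend the prefix survive, which is why each projected database is smaller than the one it came from.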
PrefixSpan mines the complete set of patterns but
greatly reduces the effort of candidate subsequence
generation. Moreover, prefix projection substantially
reduces the size of projected databases and leads to
efficient processing. The experimental results of our
system are also discussed in this paper.
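The complete pattern-growth loop can be sketched as a small recursive miner. This is a simplified sketch under the same single-item assumption as above; `prefixspan`, `min_support`, and the `loans` data are hypothetical names. Rather than copying suffixes, it records (sequence index, offset) pairs, a pseudo-projection, so each projected database costs one pointer per sequence.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Sequence = List[str]
Pattern = Tuple[str, ...]


def prefixspan(db: List[Sequence], min_support: int) -> Dict[Pattern, int]:
    """Mine every sequential pattern whose support (number of sequences
    containing it) is at least `min_support`. Simplified sketch: each
    sequence element is a single item, not an itemset."""
    results: Dict[Pattern, int] = {}

    def mine(prefix: Pattern, projected: List[Tuple[int, int]]) -> None:
        # `projected` is a pseudo-projected database: (sequence index,
        # start offset) pairs pointing into `db` instead of copied suffixes.
        postings: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
        for sid, start in projected:
            seen = set()
            seq = db[sid]
            for pos in range(start, len(seq)):
                item = seq[pos]
                if item not in seen:
                    # Count each item once per sequence and remember where
                    # the suffix after its first occurrence begins.
                    seen.add(item)
                    postings[item].append((sid, pos + 1))
        for item, new_projected in sorted(postings.items()):
            support = len(new_projected)
            if support >= min_support:
                new_prefix = prefix + (item,)
                results[new_prefix] = support
                # Grow the pattern using only its own projected database.
                mine(new_prefix, new_projected)

    mine((), [(sid, 0) for sid in range(len(db))])
    return results


# Hypothetical toy database of loan sequences (book categories).
loans = [
    ["db", "ai", "os", "ai"],
    ["ai", "os", "net"],
    ["db", "os", "ai"],
    ["net", "db"],
]

patterns = prefixspan(loans, min_support=2)
for pattern, support in sorted(patterns.items()):
    print(pattern, support)
```

With `min_support=2`, this toy run finds nine patterns, including `('db', 'os', 'ai')` with support 2. No candidate subsequences are generated: every pattern is extended only by items that actually occur in its projected database.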