Abstract:
Recent research has demonstrated the strong performance of hidden Markov models (HMMs) applied to information extraction, the task of populating database slots with corresponding phrases from text documents. HMMs are a powerful probabilistic tool for modeling time series data and have been applied with success to many language-related tasks, such as part-of-speech tagging, speech recognition, text segmentation, and topic detection. This paper describes the application of HMMs to another language-related task: information extraction, the problem of locating textual sub-segments that answer a particular information need. In this paper, the HMM state transition probabilities and word emission probabilities are learned from labeled training data.
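As a rough illustration of learning transition and emission probabilities from labeled data, the sketch below estimates both by relative-frequency counting over token sequences whose hidden states are already known. It is not the paper's procedure: the function name `estimate_hmm`, the `<start>` pseudo-state, the add-one smoothing, the state labels `speaker` and `other`, and the toy sentences are all assumptions made for this example.

```python
from collections import Counter, defaultdict

START = "<start>"  # hypothetical pseudo-state marking the beginning of a sequence


def estimate_hmm(labeled_sequences, smoothing=1.0):
    """Estimate HMM parameters from sequences of (word, state) pairs.

    Returns (transitions, emissions), where transitions[s][t] approximates
    P(next state = t | state = s) and emissions[s][w] approximates
    P(word = w | state = s). Additive smoothing is an assumption of this
    sketch, used so that unseen events keep non-zero probability.
    """
    trans_counts = defaultdict(Counter)
    emit_counts = defaultdict(Counter)
    states, vocab = set(), set()

    for sequence in labeled_sequences:
        prev = START
        for word, state in sequence:
            trans_counts[prev][state] += 1  # count state-to-state moves
            emit_counts[state][word] += 1   # count words emitted per state
            states.add(state)
            vocab.add(word)
            prev = state

    def normalize(counts, outcomes):
        # Turn smoothed counts into a probability distribution over outcomes.
        total = sum(counts.values()) + smoothing * len(outcomes)
        return {o: (counts[o] + smoothing) / total for o in outcomes}

    state_list = sorted(states)
    vocab_list = sorted(vocab)
    transitions = {s: normalize(trans_counts[s], state_list)
                   for s in [START] + state_list}
    emissions = {s: normalize(emit_counts[s], vocab_list) for s in state_list}
    return transitions, emissions


# Toy labeled data (illustrative only): each token is tagged with its state.
data = [
    [("talk", "other"), ("by", "other"), ("john", "speaker"), ("smith", "speaker")],
    [("speaker", "other"), ("mary", "speaker"), ("jones", "speaker")],
]
transitions, emissions = estimate_hmm(data)
```

With labeled data, no iterative re-estimation is needed; each probability is a normalized count. A decoder such as the Viterbi algorithm could then use these tables to label new text, though that step is not shown here.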