Abstract:
Efficiently extracting useful information is a growing problem
in Big Data, because the amount of information gathered across
domains is growing at an increasing rate. As a result, it takes
more time to understand the underlying themes of a document
collection. To address this problem in the context of Big Data,
the proposed approach implements the correlated topic model
(CTM) on the MapReduce framework to reveal the thematic
information represented by words, to speed up processing, and
to increase the scalability of the model.
We use variational Expectation-Maximization (EM) to perform
inference for CTM. In this paper, academic articles are collected
with a web crawler, and CTM is then applied to uncover the
underlying themes of the collection. Implementing CTM with
MapReduce improves accuracy and performance in a reliable and
scalable manner.
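
As a rough illustration of how one EM iteration can be split across a MapReduce job, the sketch below has mappers compute per-document expected counts, a reducer sum them, and a driver re-estimate the topic-word distributions. It is not the implementation used in this paper. The function names (`map_document`, `reduce_stats`, `m_step`, `em_iteration`) are hypothetical, the job runs in a single process rather than on a cluster, and the per-document E-step is a simplified placeholder that weights each word by the current topic-word probabilities instead of performing the CTM logistic-normal variational update.

```python
from collections import defaultdict
from functools import reduce


def map_document(doc, beta):
    """Mapper: placeholder E-step for one document.

    doc maps word -> count; beta is a list of dicts (topic -> word -> prob).
    Emits expected counts keyed by (topic, word). ASSUMPTION: a real CTM
    mapper would also optimize the logistic-normal variational parameters.
    """
    stats = defaultdict(float)
    for word, count in doc.items():
        # Small floor so words unseen by a topic still get nonzero weight.
        weights = [topic.get(word, 1e-12) for topic in beta]
        total = sum(weights)
        for k, weight in enumerate(weights):
            stats[(k, word)] += count * weight / total
    return stats


def reduce_stats(left, right):
    """Reducer: add two partial sets of expected counts."""
    merged = defaultdict(float, left)
    for key, value in right.items():
        merged[key] += value
    return merged


def m_step(stats, num_topics):
    """Driver: normalize summed counts into topic-word distributions."""
    beta = [defaultdict(float) for _ in range(num_topics)]
    for (k, word), value in stats.items():
        beta[k][word] += value
    return [
        {word: value / sum(topic.values()) for word, value in topic.items()}
        for topic in beta
    ]


def em_iteration(docs, beta):
    """Run one map -> reduce -> M-step pass over the collection."""
    partials = [map_document(doc, beta) for doc in docs]  # parallel on a cluster
    stats = reduce(reduce_stats, partials, defaultdict(float))
    return stats, m_step(stats, len(beta))
```

Because each mapper touches only one document and the reducer only adds counts, the expensive per-document work can be distributed, while the driver's update stays small. That is the property a MapReduce formulation of EM relies on.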