Abstract:
Internet technology provides access to a vast supply of data, information and
knowledge across many different areas. Because the amount of digital information
is increasing rapidly, big data analytics has become a challenge for
recommendation systems. With increasingly large quantities of digital information
available in the academic area, it is becoming more difficult to find and extract
information relevant to a user's interests. In addition, more effort is required
to summarize and understand large collections of digital documents. In this
research, the correlated topic model (CTM) implemented in the MapReduce framework is
proposed for generating relevant recommendations within a short response time when
the user provides a search query.
First, full-text documents from publicly available digital libraries are
collected to improve the accuracy of recommendations. To extract latent semantic
topics from a collection of documents, the MapReduce CTM employs a variational
Expectation-Maximization (variational EM) algorithm. When the learned topics are
coherent and interpretable, they can be valuable for recommendation.
To address the poor prediction problem of recommendation systems, the
information-theoretic measure of entropy is proposed to measure the predictability
between documents. Finally, when the user enters a search query, the semantic
similarity between the search query and the extracted topics is calculated for
retrieving and recommending relevant documents in the top-N recommendation list.
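
The retrieval step described above can be illustrated with a minimal sketch. The abstract does not state which similarity measure or entropy formulation is used, so the code below assumes cosine similarity over topic-proportion vectors and plain Shannon entropy of a single distribution. The function names, document identifiers and topic proportions are all hypothetical and chosen only for illustration.

```python
import math

def shannon_entropy(dist):
    """Shannon entropy (in bits) of a discrete probability distribution.

    Assumption: entropy is computed over one topic distribution; the
    abstract does not give the exact formulation used between documents.
    """
    return -sum(p * math.log2(p) for p in dist if p > 0)

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend_top_n(query_topics, doc_topics, n):
    """Rank documents by similarity to the query and keep the best n."""
    scored = [
        (doc_id, cosine_similarity(query_topics, vec))
        for doc_id, vec in doc_topics.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]

# Hypothetical topic proportions for three documents and one query.
doc_topics = {
    "paper_a": [0.8, 0.1, 0.1],
    "paper_b": [0.1, 0.8, 0.1],
    "paper_c": [0.3, 0.3, 0.4],
}
query_topics = [0.7, 0.2, 0.1]

top = recommend_top_n(query_topics, doc_topics, n=2)
for doc_id, score in top:
    print(doc_id, round(score, 3))
```

In this sketch the query is assumed to have already been mapped into the same topic space as the documents; how that mapping is done in the proposed system is not specified in the abstract.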
To evaluate the MapReduce CTM, the topic coherence measures UCI and UMass are
used to assess the semantic relatedness of the extracted topics. The results of
MapReduce CTM are then compared with those of another topic model, latent
Dirichlet allocation (LDA), and it is observed that the proposed model learns more
coherent and specific topics. Moreover, the processing time of MapReduce CTM for
extracting the latent
topics is also analysed. The retrieval performance of the proposed paper
recommendation system is evaluated using the precision and recall metrics.
According to the experimental results, the proposed paper recommendation system
incorporating MapReduce CTM achieves the best performance in terms of
recommendation quality.
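
As a rough illustration of the evaluation measures named above, the sketch below computes precision and recall over a top-N list and the UMass coherence of one topic from document co-occurrence counts. It is not the evaluation code used in this research. The function names and the toy data are hypothetical, and the UMass variant shown assumes the commonly used form with a smoothing count of 1.

```python
import math

def precision_recall_at_n(recommended, relevant, n):
    """Precision and recall of the first n recommended items."""
    top_n = recommended[:n]
    hits = sum(1 for doc in top_n if doc in relevant)
    precision = hits / len(top_n) if top_n else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

def umass_coherence(top_words, documents):
    """UMass coherence of one topic.

    Sums log((D(w_m, w_l) + 1) / D(w_l)) over word pairs where w_l is
    ranked above w_m; D counts the documents containing the given words.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for m in range(1, len(top_words)):
        for l in range(m):
            d_l = doc_freq(top_words[l])
            if d_l == 0:
                continue  # skip words that never occur in the corpus
            co = doc_freq(top_words[m], top_words[l])
            score += math.log((co + 1) / d_l)
    return score

# Hypothetical ranked list, relevance judgements and tokenized corpus.
recommended = ["p1", "p2", "p3", "p4"]
relevant = {"p1", "p3", "p5"}
precision, recall = precision_recall_at_n(recommended, relevant, n=3)

documents = [
    ["topic", "model"],
    ["topic", "model", "lda"],
    ["topic", "query"],
]
coherence = umass_coherence(["topic", "model"], documents)
print(precision, recall, coherence)
```

UCI coherence is omitted from the sketch because it relies on pointwise mutual information estimated from an external reference corpus, which the abstract does not describe.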