Abstract:
Applying Semantic Web technology to the Information Retrieval (IR) process is
becoming an effective way to improve search accuracy and retrieve more relevant
results in web-based systems, especially digital libraries. In the digital
library field, an ontology can be used to organize bibliographic descriptions,
represent and expose document contents, and share knowledge among users.
Therefore, this research proposes an IR model for digital libraries that adapts
the Vector Space Model (VSM) and combines it with Semantic Web technologies: the
Web Ontology Language (OWL) and the SPARQL Protocol and RDF Query Language
(SPARQL). The main concept of the proposed IR model is that resource metadata is
stored in Resource Description Framework (RDF) format and retrieved not only by
the keywords contained in the user query but also by the contexts defined in the
domain ontology. The proposed IR model comprises three steps: preprocessing,
context matching, and similarity calculation. An algorithm for formulating
SPARQL queries is developed for the context matching step of the IR model.
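
The context matching and similarity steps described above can be illustrated
with a minimal Python sketch. It is not the algorithm developed in this
research: the namespace, the property name passed as context_property, and the
helper names build_sparql_query and cosine_similarity are hypothetical, and raw
term frequencies stand in for whatever term weighting the adapted VSM actually
uses.

```python
import math
from collections import Counter

# Hypothetical namespace: the abstract does not give the ontology's IRI.
DL_PREFIX = "PREFIX dl: <http://example.org/digital-library#>"


def build_sparql_query(keyword, context_property):
    """Format a SELECT query matching a keyword against one ontology property.

    context_property is an assumed property name such as "hasTitle".
    """
    # Escape backslashes and double quotes so the keyword stays inside the literal.
    safe = keyword.replace("\\", "\\\\").replace('"', '\\"')
    return (
        f"{DL_PREFIX}\n"
        "SELECT ?doc ?value WHERE {\n"
        f"  ?doc dl:{context_property} ?value .\n"
        f'  FILTER(CONTAINS(LCASE(STR(?value)), LCASE("{safe}")))\n'
        "}"
    )


def tokenize(text):
    """Lowercase and split on whitespace (a stand-in for real preprocessing)."""
    return text.lower().split()


def cosine_similarity(query, document):
    """Cosine similarity between raw term-frequency vectors of two texts."""
    q = Counter(tokenize(query))
    d = Counter(tokenize(document))
    dot = sum(q[term] * d[term] for term in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    print(build_sparql_query("semantic web", "hasTitle"))
    print(cosine_similarity("semantic web", "semantic web search in libraries"))
```

In this sketch the SPARQL query narrows the candidate set by context, and the
cosine score then ranks the candidates returned for that context.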
Based on the proposed IR model, an ontology-based IR system for digital
libraries is implemented in a Service-Oriented Architecture (SOA) using XML Web
Service technology and ASP.NET. The architecture of the proposed system consists
of file storage for documents, one ontology dataset, and two software
components: the Digital Library Web Service and the Web Application. The
ontology for the digital library is designed in the Web Ontology Language (OWL)
using the Protégé v3.5 tool. Functions for publishing and retrieving documents
are implemented as a web service in the C# programming language. The user
interface is designed and implemented as an ASP.NET web application that
consumes the functions of the web service.
To assess the proposed IR system, 415 training documents of various file types
(.doc, .pdf, .txt) were tested with 33 queries targeting different document
properties. Precision, recall, and F-values were measured and compared.
According to the comparison results, the ontology-based IR system is more
accurate when searching on properties of the ObjectProperty type. As a result,
the proposed system provides user-friendly, high-performance, and scalable
semantic search for information in the digital library.
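
For reference, the three evaluation measures named above can be computed per
query as in the following Python sketch. It assumes the F-value is the balanced
F1 score (the harmonic mean of precision and recall), which the abstract does
not state. The function name evaluate_query and the document identifiers are
hypothetical and do not come from the 415-document test collection.

```python
def evaluate_query(retrieved, relevant):
    """Return (precision, recall, f1) for one query.

    retrieved: set of document ids returned by the system
    relevant:  set of document ids judged relevant for the query
    """
    true_positives = len(retrieved & relevant)
    precision = true_positives / len(retrieved) if retrieved else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Balanced F-measure (F1): an assumption about which F-value is meant.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


if __name__ == "__main__":
    # Illustrative ids only.
    retrieved = {"d1", "d2", "d3", "d4"}
    relevant = {"d1", "d2", "d5"}
    print(evaluate_query(retrieved, relevant))
```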