Abstract:
The World Wide Web has grown into a very large,
distributed collection of digital information, and
its content is published in different formats such
as HTML and XML. The role of information retrieval
over the Web is important because web search users
can make the best decisions for their society if
they get precise information from the Web. However,
because of the huge amount of information on the
Web and its growth rate, it is difficult for web
search users to extract useful information. The
field of information retrieval (IR) emerged to
address this problem, and IR systems are now used
every day by a wide variety of users. This
paper combines the advantages of IR with the Brute
Force pattern matching algorithm to develop a Web
Document Retrieval System. It first describes the
text preprocessing steps applied to web documents,
then the indexing technique used by the system, and
finally how query terms are matched against
document terms.
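
The three stages named above (preprocessing, indexing, and term matching) can be illustrated with a minimal sketch. It is not the system's actual implementation: the stop-word list, the function names, the use of a simple inverted index, and the choice to treat a query term as a substring of a document term are all assumptions made for illustration only.

```python
import re
from collections import defaultdict

# Hypothetical stop-word list; the list used by the system is not given here.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "on"}


def preprocess(html):
    """Strip markup tags, lowercase, tokenise, and drop stop words."""
    text = re.sub(r"<[^>]+>", " ", html)
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, html in docs.items():
        for term in preprocess(html):
            index[term].add(doc_id)
    return index


def brute_force_match(text, pattern):
    """Return the first offset where pattern occurs in text, or -1.

    Every alignment of pattern against text is tried in turn and
    compared character by character.
    """
    n, m = len(text), len(pattern)
    for i in range(n - m + 1):
        j = 0
        while j < m and text[i + j] == pattern[j]:
            j += 1
        if j == m:
            return i
    return -1


def search(index, query):
    """Return sorted ids of documents with a term containing a query term."""
    hits = set()
    for q in preprocess(query):
        for term, doc_ids in index.items():
            # Assumption: a query term matches if it is a substring of the term.
            if brute_force_match(term, q) != -1:
                hits |= doc_ids
    return sorted(hits)
```

With two hypothetical documents, `docs = {1: "<p>Information retrieval on the Web</p>", 2: "<p>Pattern matching algorithms</p>"}`, calling `search(build_index(docs), "match")` selects document 2, because the brute-force comparison finds "match" inside the indexed term "matching".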