Abstract:
Web pages are now written in many different languages, and web crawlers are an essential
component of a search engine. Language-specific crawlers traverse and collect the relevant web
pages by following the successive URLs found in each page. There has been very little research on crawling
for Myanmar-language web sites. Most language-specific crawlers are based on n-gram character
sequences, which require training documents; the proposed crawler differs from those crawlers. The
proposed system focuses only on the crawler component, which searches for and retrieves Myanmar web
pages for a Myanmar-language search engine. The proposed crawler detects Myanmar characters and uses
a rule-based syllable threshold to judge the relevance of each page. According to experimental results, the
proposed crawler has better performance and achieves good accuracy, and the storage space required by
the search engine is smaller because only the relevant documents from Myanmar web sites are crawled.
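
The relevance judgment described above can be illustrated with a minimal Python sketch. It is an assumption-laden illustration, not the implementation evaluated in this work: the function names, the simplified syllable rule, and the default threshold of 5 are all hypothetical. The only fixed fact it relies on is that Myanmar script occupies the Unicode block U+1000 to U+109F.

```python
import re

# Simplified, assumed syllable rule: a Myanmar consonant (U+1000..U+1021)
# starts a new syllable unless it is stacked (preceded by virama U+1039)
# or acts as a final (followed by asat U+103A or virama U+1039).
SYLLABLE_START = re.compile(r"(?<!\u1039)[\u1000-\u1021](?![\u103A\u1039])")


def count_myanmar_chars(text):
    """Count code points that fall inside the Myanmar Unicode block."""
    return sum(1 for ch in text if "\u1000" <= ch <= "\u109F")


def count_syllables(text):
    """Approximate the number of Myanmar syllables with the rule above."""
    return len(SYLLABLE_START.findall(text))


def is_relevant(text, min_syllables=5):
    """Judge a page relevant when its syllable count reaches the threshold.

    min_syllables is a hypothetical default chosen for illustration only.
    """
    return count_syllables(text) >= min_syllables


# Example: a three-syllable Myanmar word, written with escapes for portability.
sample = "\u1019\u103C\u1014\u103A\u1019\u102C\u1005\u102C"
relevant = is_relevant(sample, min_syllables=3)
```

In a crawler, a check like `is_relevant` would be applied to the extracted text of each fetched page, and only pages that pass would be stored and have their outgoing URLs followed.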