Abstract:
Web pages are now written in many different languages, and web crawlers are an essential
component of search engines. Language-specific crawlers traverse the Web and collect the
relevant pages by following the successive URLs found on each page. There is very little
research on crawling Myanmar-language web sites. Most language-specific crawlers are based
on n-gram character sequences, which require training documents; the proposed crawler
differs from those crawlers. The proposed system focuses only on the part of the
crawler that searches for and retrieves Myanmar web pages for a Myanmar-language search
engine. The proposed crawler detects Myanmar characters and applies a rule-based syllable
threshold to judge the relevance of each page. According to the experimental results, the
proposed crawler performs better and achieves good accuracy, and it reduces the storage
space a search engine needs because it crawls only the documents relevant to Myanmar web
sites.
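
The relevance judgment described above (detect Myanmar characters, then compare a
rule-based syllable count against a threshold) can be sketched roughly as follows. This is
a minimal illustration under stated assumptions, not the crawler's actual rule set: the
function names, the simplified syllable rule, and the threshold value of 3 are all
hypothetical, and only the main Myanmar Unicode block (U+1000 to U+109F) is considered.

```python
import re

# Any code point in the main Myanmar Unicode block (U+1000..U+109F).
# Assumption: the extended Myanmar blocks are ignored in this sketch.
MYANMAR_CHAR = re.compile(r"[\u1000-\u109F]")

# Simplified syllable rule (an assumption, not the paper's rule set):
# a consonant or independent vowel (U+1000..U+102A) starts a syllable unless
# it is stacked under a virama (U+1039) or is a final consonant, i.e. it is
# directly followed by asat (U+103A) or by a virama.
SYLLABLE_HEAD = re.compile(r"(?<!\u1039)[\u1000-\u102A](?![\u103A\u1039])")


def has_myanmar(text: str) -> bool:
    """Return True if the text contains at least one Myanmar character."""
    return MYANMAR_CHAR.search(text) is not None


def count_myanmar_syllables(text: str) -> int:
    """Approximate the number of Myanmar syllables in the text."""
    return len(SYLLABLE_HEAD.findall(text))


def is_relevant(text: str, min_syllables: int = 3) -> bool:
    """Judge a page relevant if its syllable count reaches the threshold.

    min_syllables is a hypothetical value chosen for illustration only.
    """
    return has_myanmar(text) and count_myanmar_syllables(text) >= min_syllables
```

A crawler loop would call is_relevant on the extracted text of each fetched page and store
or follow only the pages that pass. Because the rule needs no training documents, it can be
applied directly to each page, which is the contrast with n-gram based approaches drawn in
the abstract.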