Abstract:
Email is one of the most common modes of text-based communication. People use email to send and receive many messages every day and to communicate with partners and friends. However, most email data is very noisy, so it must be cleaned before it can be processed further, and text normalization is the most widely used way to do this. Text cleaning and normalization are therefore important steps in developing text processing and information extraction applications, many of which take email as input. Several methods exist for normalizing text and extracting the useful information it contains; among them, a cascaded approach is well suited to cleaning email data. Our proposed system uses text normalization to convert "informally inputted" text into a canonical form. In addition, the system eliminates "noise" from the text and detects paragraph and sentence boundaries.
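To make the idea of a cascade concrete, the following is a minimal Python sketch. It is not the system described in this paper. Each stage consumes the output of the previous one: noise removal, then paragraph boundary detection, then sentence boundary detection, then word-level normalization. The stage internals are assumptions chosen for illustration only: simple rules for header lines, ">"-quoted replies, and the "--" signature delimiter; blank lines as paragraph breaks; sentence-final punctuation as sentence breaks. All function names are hypothetical.

```python
import re

# Assumed heuristic: lines that look like header fields are treated as noise.
# A real system would restrict this to the header block or learn it from data.
HEADER_RE = re.compile(r"^(From|To|Cc|Subject|Date|Sent):", re.IGNORECASE)
# Assumed heuristic: a sentence ends at ., ! or ? followed by whitespace.
SENTENCE_END_RE = re.compile(r"(?<=[.!?])\s+")


def remove_noise(text: str) -> str:
    """Stage 1 (hypothetical): drop header lines, quoted replies and the signature."""
    kept = []
    for line in text.splitlines():
        if line.rstrip() == "--":
            break  # everything after the "-- " delimiter is treated as signature
        if HEADER_RE.match(line) or line.lstrip().startswith(">"):
            continue  # header field or text quoted from an earlier message
        kept.append(line)
    return "\n".join(kept)


def detect_paragraphs(text: str) -> list[str]:
    """Stage 2 (hypothetical): blank lines separate paragraphs; wrapped lines are joined."""
    blocks = re.split(r"\n\s*\n", text.strip())
    return [
        " ".join(line.strip() for line in block.splitlines())
        for block in blocks
        if block.strip()
    ]


def split_sentences(paragraph: str) -> list[str]:
    """Stage 3 (hypothetical): split a paragraph at sentence-final punctuation."""
    return [s for s in SENTENCE_END_RE.split(paragraph) if s]


def normalize_words(sentence: str) -> str:
    """Stage 4 (hypothetical): collapse repeated punctuation and extra spaces, restore initial case."""
    s = re.sub(r"([!?.])\1+", r"\1", sentence)  # e.g. "!!!" -> "!"
    s = re.sub(r"\s+", " ", s).strip()
    return s[:1].upper() + s[1:]


def normalize_email(raw: str) -> list[list[str]]:
    """Run the stages in cascade; returns a list of paragraphs, each a list of sentences."""
    cleaned = remove_noise(raw)
    return [
        [normalize_words(s) for s in split_sentences(p)]
        for p in detect_paragraphs(cleaned)
    ]


if __name__ == "__main__":
    raw = (
        "From: alice@example.com\nSubject: meeting\n\n"
        "hi bob,   the meeting is moved\nto friday!!! please confirm.\n\n"
        "> are we still on for thursday?\n\nthanks\n--\nAlice\n"
    )
    for paragraph in normalize_email(raw):
        print(paragraph)
```

For the sample message, the sketch drops the header, the quoted line and the signature. It returns two paragraphs: `["Hi bob, the meeting is moved to friday!", "Please confirm."]` and `["Thanks"]`. The cascade structure is the point here: because each stage only sees cleaned input from the stage before it, the sentence splitter never has to cope with quoted text or signatures.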