Detection and Elimination of Duplicate Data using Smart Token-based Method for Airline Ticket Reservation System

San, Hsu Mon Mon; Thwin, Khin Lay

Conference: Fifth Local Conference on Parallel and Soft Computing

URI: http://onlineresource.ucsy.edu.mm/handle/123456789/1163

Date: 2010-12-16

Abstract:

Data cleaning is the process of determining whether two or more records that are represented differently in a database actually refer to the same real-world object. During data cleaning, multiple records representing the same real-world object are identified and assigned a single unique database identifier, and only one copy of exact duplicate records is retained. The token formation algorithm handles noisy data efficiently by expanding abbreviations, removing unimportant characters, and eliminating duplicates. An attribute selection algorithm is used to select attributes before token formation. Together, the attribute selection and token formation algorithms reduce the complexity of the data cleaning process and allow data to be cleaned flexibly, with little effort, and without confusion. This paper uses smart tokens to increase the speed of the cleaning process and improve the quality of the data.
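
The abstract names the steps but gives no algorithmic detail, so the following is only a minimal Python sketch of how token-based duplicate elimination might look, not the authors' implementation. The field names (`name`, `passport_no`, `flight`), the abbreviation table, the list of unimportant words, and the token rules are all assumptions made for illustration.

```python
import re

# Hypothetical abbreviation table; the paper's actual table is not given.
ABBREVIATIONS = {"mr": "mister", "intl": "international", "st": "street"}

# Hypothetical words treated as unimportant once abbreviations are expanded.
NOISE_WORDS = {"mister", "mrs", "ms", "dr"}

# Assumed result of attribute selection: which fields identify a passenger,
# and whether each is free text or an identifier-like code.
SELECTED_ATTRIBUTES = {"name": "text", "passport_no": "code"}


def form_token(value, kind):
    """Build a simplified token from one field value (illustrative rule)."""
    lowered = value.lower()
    if kind == "code":
        # Identifier-like fields: drop every non-alphanumeric character.
        return re.sub(r"[^a-z0-9]", "", lowered)
    # Text fields: replace punctuation with spaces and split into words.
    words = re.sub(r"[^a-z0-9]+", " ", lowered).split()
    # Expand abbreviations, then drop unimportant words.
    words = [ABBREVIATIONS.get(w, w) for w in words]
    words = [w for w in words if w not in NOISE_WORDS]
    # Sort the words so that word order does not affect the token.
    return " ".join(sorted(words))


def deduplicate(records):
    """Keep the first record seen for each combination of tokens."""
    kept = {}
    for record in records:
        key = tuple(
            form_token(record[attr], kind)
            for attr, kind in SELECTED_ATTRIBUTES.items()
        )
        if key not in kept:
            kept[key] = record
    return list(kept.values())


records = [
    {"name": "Mr. Aung  Kyaw", "passport_no": "MA-123456", "flight": "UB101"},
    {"name": "kyaw aung", "passport_no": "ma123456", "flight": "UB101"},
    {"name": "Su Su", "passport_no": "MB-654321", "flight": "UB202"},
]
unique_records = deduplicate(records)
```

In this sketch the first two records produce the same tokens and are treated as one passenger, so only the first is kept. Comparing short tokens instead of full field values is what keeps the matching step cheap. A real system would need more careful token rules to avoid merging distinct passengers.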
