National Journal of Parallel and Soft Computing (2020)
https://onlineresource.ucsy.edu.mm/handle/123456789/2583

Information Retrieval System Using BM25, Pivoted Normalization and CombSUM Method
https://onlineresource.ucsy.edu.mm/handle/123456789/2587
Khaing, Nu Yin; Htwe, Ah Nge
Retrieving information from the large and varied collection of documents in a digital library is difficult and time-consuming. This paper intends to implement an effective keyword search system for a digital library. BM25 and Pivoted Normalization are among the best-performing retrieval models for information retrieval systems, and CombSUM combines the scores of these two methods to retrieve more relevant documents and produce a better result. The proposed system will help the user get all relevant documents for a given query. When the user enters a query, the most relevant documents are ranked using the BM25, Pivoted Normalization and CombSUM methods.
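
The following sketch (not the authors' implementation) illustrates the ranking pipeline the abstract describes: each document is scored with BM25 and with Singhal-style pivoted length normalization, and CombSUM fuses the two score lists. The parameter values (k1, b, s), the min-max score normalisation before fusion, and the toy corpus are assumptions for illustration only.

import math
from collections import Counter

# Illustrative sketch: documents and the query are pre-tokenised lists of terms.

def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    # Okapi BM25; the "+1" inside the idf log keeps scores non-negative (an assumption).
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    tf = Counter(doc)
    score = 0.0
    for t in query:
        df = sum(1 for d in docs if t in d)
        if df == 0 or tf[t] == 0:
            continue
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)
        score += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(doc) / avgdl))
    return score

def pivoted_score(query, doc, docs, s=0.2):
    # Singhal-style pivoted document length normalisation.
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    tf = Counter(doc)
    score = 0.0
    for t in query:
        df = sum(1 for d in docs if t in d)
        if df == 0 or tf[t] == 0:
            continue
        norm_tf = (1 + math.log(1 + math.log(tf[t]))) / ((1 - s) + s * len(doc) / avgdl)
        score += norm_tf * math.log((N + 1) / df)
    return score

def comb_sum_rank(query, docs):
    # CombSUM: add each document's (min-max normalised) scores from both models.
    bm = [bm25_score(query, d, docs) for d in docs]
    pv = [pivoted_score(query, d, docs) for d in docs]
    def norm(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
    fused = [a + b for a, b in zip(norm(bm), norm(pv))]
    return sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)

docs = [["digital", "library", "keyword", "search"],
        ["keyword", "search", "system"],
        ["parallel", "soft", "computing"]]
print(comb_sum_rank(["keyword", "search"], docs))  # document indices, best first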

Duplicate Record Detection in Data Cleaning Using DCS++ Algorithm
https://onlineresource.ucsy.edu.mm/handle/123456789/2584
Phyo, Yin Yin; Win, Thidar
Duplicate record detection is the process of finding multiple records that represent the same physical entity in a dataset. It is also known as record linkage or entity matching [1]. Databases contain very large datasets, and these datasets contain duplicate records that do not share a common key or that contain errors such as incomplete information, transcription errors, and missing or differing standard formats (non-standardized abbreviations) in the detailed schemas of records from multiple databases. Duplicate detection therefore needs to complete its work in as short a time as possible, and it requires an algorithm for determining whether records are duplicates or not.
In this paper, a similarity metric that is commonly used to find similar field values is calculated, and the Duplicate Count Strategy with Multiple Record Increase (DCS++) algorithm is applied to detect approximately duplicate records in a publication XML dataset.
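
A minimal sketch, not the paper's implementation, of adaptive windowing in the spirit of DCS++: records are sorted by a blocking key, a window is grown whenever a duplicate is found, and a simple Jaccard token similarity stands in for the paper's similarity metric. The field name "title", the window size w and the thresholds are assumptions for illustration.

def jaccard(a, b):
    # Token-set similarity used here as a stand-in field similarity metric.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def dcs_plus_plus(records, key, w=4, sim_threshold=0.8, phi=None):
    # phi <= 1/(w-1) is the bound under which DCS++ finds at least as many
    # duplicates as the classic sorted-neighbourhood method.
    phi = phi if phi is not None else 1.0 / (w - 1)
    recs = sorted(records, key=key)
    skip = set()   # indices already confirmed as duplicates of an earlier record
    pairs = []
    for i in range(len(recs)):
        if i in skip:
            continue   # the window of an already-detected duplicate is skipped
        win = list(range(i + 1, min(i + w, len(recs))))
        dups, comps, j = 0, 0, 0
        while j < len(win):
            k = win[j]
            comps += 1
            if jaccard(key(recs[i]), key(recs[k])) >= sim_threshold:
                pairs.append((recs[i], recs[k]))
                dups += 1
                skip.add(k)
                # multiple-record increase: pull the next w-1 records after k into the window
                win.extend(x for x in range(k + 1, min(k + w, len(recs))) if x not in win)
            j += 1
            if j >= w - 1 and dups / comps < phi:
                break   # duplicate/comparison ratio too low: stop growing the window
    return pairs

records = [{"title": "Data Cleaning with DCS++"},
           {"title": "Data cleaning with DCS++ algorithm"},
           {"title": "Parallel computing survey"}]
print(dcs_plus_plus(records, key=lambda r: r["title"]))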