National Journal of Parallel and Soft Computing (2020)
https://onlineresource.ucsy.edu.mm/handle/123456789/2583

Information Retrieval System Using BM25, Pivoted Normalization and CombSUM Method
https://onlineresource.ucsy.edu.mm/handle/123456789/2587
Khaing, Nu Yin; Htwe, Ah Nge
Retrieving information from the large and varied collection of documents in a digital library is difficult and time-consuming. This paper intends to implement an effective keyword search system for a digital library. BM25 and Pivoted Normalization are among the best-performing retrieval models for information retrieval systems, and CombSUM combines the scores of these two methods to retrieve more relevant documents and produce a better result. The proposed system will help the user get all relevant documents for a given query. When the user enters a query, the most relevant documents are ranked using the BM25, Pivoted Normalization and CombSUM methods.
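
The following sketch (not the authors' implementation) illustrates the ranking pipeline the abstract describes: each document is scored with BM25 and with Singhal-style pivoted length normalization, and CombSUM fuses the two score lists. The parameter values (k1, b, s), the min-max score normalisation before fusion, and the toy corpus are assumptions for illustration only.

import math
from collections import Counter

# Illustrative sketch: documents and the query are pre-tokenised lists of terms.

def bm25_score(query, doc, docs, k1=1.2, b=0.75):
    # Okapi BM25; the "+1" inside the idf log keeps scores non-negative (an assumption).
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    tf = Counter(doc)
    score = 0.0
    for t in query:
        df = sum(1 for d in docs if t in d)
        if df == 0 or tf[t] == 0:
            continue
        idf = math.log((N - df + 0.5) / (df + 0.5) + 1)
        score += idf * tf[t] * (k1 + 1) / (tf[t] + k1 * (1 - b + b * len(doc) / avgdl))
    return score

def pivoted_score(query, doc, docs, s=0.2):
    # Singhal-style pivoted document length normalisation.
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N
    tf = Counter(doc)
    score = 0.0
    for t in query:
        df = sum(1 for d in docs if t in d)
        if df == 0 or tf[t] == 0:
            continue
        norm_tf = (1 + math.log(1 + math.log(tf[t]))) / ((1 - s) + s * len(doc) / avgdl)
        score += norm_tf * math.log((N + 1) / df)
    return score

def comb_sum_rank(query, docs):
    # CombSUM: add each document's (min-max normalised) scores from both models.
    bm = [bm25_score(query, d, docs) for d in docs]
    pv = [pivoted_score(query, d, docs) for d in docs]
    def norm(xs):
        lo, hi = min(xs), max(xs)
        return [(x - lo) / (hi - lo) if hi > lo else 0.0 for x in xs]
    fused = [a + b for a, b in zip(norm(bm), norm(pv))]
    return sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)

docs = [["digital", "library", "keyword", "search"],
        ["keyword", "search", "system"],
        ["parallel", "soft", "computing"]]
print(comb_sum_rank(["keyword", "search"], docs))  # document indices, best first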

Duplicate Record Detection in Data Cleaning Using DCS++ Algorithm
https://onlineresource.ucsy.edu.mm/handle/123456789/2584
Phyo, Yin Yin; Win, Thidar
Duplicate record detection is the process of finding multiple records that represent the same physical entity in a dataset. It is also known as record linkage or entity matching [1]. Databases contain very large datasets, and these datasets contain duplicate records that do not share a common key or that contain errors such as incomplete information, transcription errors, and missing or differing standard formats (non-standardized abbreviations) in the detailed schemas of records from multiple databases. Duplicate detection therefore needs to complete its work in as short a time as possible, and it requires an algorithm for determining whether records are duplicates or not.
In this paper, a similarity metric that is commonly used to find similar field values is calculated, and the Duplicate Count Strategy with Multiple Record Increase (DCS++) algorithm is applied to detect approximately duplicate records in a publication XML dataset.
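
A minimal sketch, not the paper's implementation, of adaptive windowing in the spirit of DCS++: records are sorted by a blocking key, a window is grown whenever a duplicate is found, and a simple Jaccard token similarity stands in for the paper's similarity metric. The field name "title", the window size w and the thresholds are assumptions for illustration.

def jaccard(a, b):
    # Token-set similarity used here as a stand-in field similarity metric.
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def dcs_plus_plus(records, key, w=4, sim_threshold=0.8, phi=None):
    # phi <= 1/(w-1) is the bound under which DCS++ finds at least as many
    # duplicates as the classic sorted-neighbourhood method.
    phi = phi if phi is not None else 1.0 / (w - 1)
    recs = sorted(records, key=key)
    skip = set()   # indices already confirmed as duplicates of an earlier record
    pairs = []
    for i in range(len(recs)):
        if i in skip:
            continue   # the window of an already-detected duplicate is skipped
        win = list(range(i + 1, min(i + w, len(recs))))
        dups, comps, j = 0, 0, 0
        while j < len(win):
            k = win[j]
            comps += 1
            if jaccard(key(recs[i]), key(recs[k])) >= sim_threshold:
                pairs.append((recs[i], recs[k]))
                dups += 1
                skip.add(k)
                # multiple-record increase: pull the next w-1 records after k into the window
                win.extend(x for x in range(k + 1, min(k + w, len(recs))) if x not in win)
            j += 1
            if j >= w - 1 and dups / comps < phi:
                break   # duplicate/comparison ratio too low: stop growing the window
    return pairs

records = [{"title": "Data Cleaning with DCS++"},
           {"title": "Data cleaning with DCS++ algorithm"},
           {"title": "Parallel computing survey"}]
print(dcs_plus_plus(records, key=lambda r: r["title"]))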