Abstract:
Binarization is one of the sub-phases of the
preprocessing step of optical character recognition
(OCR). It separates the foreground text from the
background of a document image. The accuracy of
OCR depends largely on the binarization result. This
paper compares several alternative binarization
algorithms for aged printed Myanmar documents.
The algorithms evaluated are a global thresholding
method (Otsu) and local thresholding methods
(Niblack, Sauvola, Wolf, Feng and Nick). The
binarized images are found to be more stable when
filters (Wiener and Gaussian) are applied before the
binarization algorithms.
Another finding is that local thresholding is suited to
aged Myanmar documents. Among the local
thresholding methods, Niblack, Sauvola and Wolf are
the more suitable algorithms based on the
experimental results. The
quality of the binarized images is assessed using
measures such as mean square error (MSE),
signal-to-noise ratio (SNR) and peak signal-to-noise
ratio (PSNR). This work aims to achieve high
accuracy in the recognition steps, with the main
objective of developing an OCR system for aged
printed Myanmar documents.
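
As an illustration of the local thresholding rules and quality measures named above, the following is a minimal sketch, not the implementation used in this paper. It assumes the commonly cited Niblack rule T = m + k*s and Sauvola rule T = m*(1 + k*(s/R - 1)), where m and s are the mean and standard deviation of the pixels in a local window. The function names, the default values of k and R, the flat-list image representation and the 8-bit peak value of 255 are all assumptions made for the example.

```python
import math
from statistics import mean, pstdev


def niblack_threshold(window, k=-0.2):
    """Niblack rule for one local window: T = m + k * s (k is an assumed default)."""
    return mean(window) + k * pstdev(window)


def sauvola_threshold(window, k=0.5, R=128.0):
    """Sauvola rule for one local window: T = m * (1 + k * (s / R - 1)).

    k and R are assumed defaults; R is the dynamic range of the standard deviation.
    """
    m, s = mean(window), pstdev(window)
    return m * (1 + k * (s / R - 1))


def mse(ref, test):
    """Mean square error between two equally sized flat lists of pixel values."""
    return sum((a - b) ** 2 for a, b in zip(ref, test)) / len(ref)


def snr(ref, test):
    """Signal-to-noise ratio in dB: reference energy over error energy."""
    noise = sum((a - b) ** 2 for a, b in zip(ref, test))
    signal = sum(a * a for a in ref)
    return math.inf if noise == 0 else 10 * math.log10(signal / noise)


def psnr(ref, test, peak=255):
    """Peak signal-to-noise ratio in dB for images with the given peak value."""
    err = mse(ref, test)
    return math.inf if err == 0 else 10 * math.log10(peak ** 2 / err)


# Hypothetical 3x3 window of grayscale values, flattened row by row.
window = [10, 20, 30, 200, 210, 220, 15, 25, 205]
centre = window[4]
# A pixel is labelled background (white) when it exceeds the local threshold.
is_background = centre > sauvola_threshold(window)
```

In practice the window slides over every pixel of the filtered image, and the measures are computed between the binarized output and a reference image. A lower MSE and a higher SNR or PSNR indicate a result closer to the reference.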