Abstract:
Document page segmentation and classification are important parts of the document analysis process. Page segmentation decomposes a page into blocks and is also called “page decomposition” or “zoning”. Page classification then labels these blocks according to their contents. This paper presents the segmentation of multimedia documents and the feature-based classification of the resulting blocks. Multimedia documents usually contain a mixture of text, graphics, and images. For the segmentation task, the paper focuses on a one-scan run-length smearing algorithm with block merging, which segments the page bottom-up. The classification task is performed using features of text, graphics, and images. Separating and classifying text, graphics, and images is useful for reproducing, transmitting, and storing multimedia documents and for extracting their individual parts.
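The abstract names run-length smearing without describing it. The following is a minimal sketch, assuming a binary page image stored as a list of rows (1 = black). It implements the classic two-direction run-length smearing algorithm (RLSA): the original image is smeared horizontally and vertically, the two results are combined with a logical AND, and 4-connected components are then boxed and nearby boxes merged. It is not the one-scan variant used in this paper. The thresholds `h_thresh`, `v_thresh`, and `gap`, and all function names, are hypothetical choices for illustration.

```python
from typing import List, Tuple

Image = List[List[int]]          # binary image, 1 = black (foreground), 0 = white
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def smear_line(line: List[int], threshold: int) -> List[int]:
    """Blacken a white run if it is <= threshold long and has black on both sides."""
    out = list(line)
    last_black = None
    for i, px in enumerate(line):
        if px == 1:
            gap_len = i - last_black - 1 if last_black is not None else 0
            if 0 < gap_len <= threshold:
                for j in range(last_black + 1, i):
                    out[j] = 1
            last_black = i
    return out


def rlsa(img: Image, h_thresh: int, v_thresh: int) -> Image:
    """Classic RLSA: smear rows and columns of the original, then AND the results."""
    rows, cols = len(img), len(img[0])
    horiz = [smear_line(row, h_thresh) for row in img]
    # vert[c] holds column c after vertical smearing
    vert = [smear_line([img[r][c] for r in range(rows)], v_thresh) for c in range(cols)]
    return [[horiz[r][c] & vert[c][r] for c in range(cols)] for r in range(rows)]


def components(img: Image) -> List[Box]:
    """Bounding boxes of 4-connected black components (iterative flood fill)."""
    rows, cols = len(img), len(img[0])
    seen = [[False] * cols for _ in range(rows)]
    boxes: List[Box] = []
    for r in range(rows):
        for c in range(cols):
            if img[r][c] == 1 and not seen[r][c]:
                seen[r][c] = True
                stack = [(r, c)]
                top, left, bottom, right = r, c, r, c
                while stack:
                    y, x = stack.pop()
                    top, bottom = min(top, y), max(bottom, y)
                    left, right = min(left, x), max(right, x)
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and img[ny][nx] == 1 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes


def merge_boxes(boxes: List[Box], gap: int) -> List[Box]:
    """Repeatedly merge any two boxes that lie within `gap` pixels of each other."""
    boxes = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(boxes)):
            for j in range(i + 1, len(boxes)):
                a, b = boxes[i], boxes[j]
                if a[0] <= b[2] + gap and b[0] <= a[2] + gap and a[1] <= b[3] + gap and b[1] <= a[3] + gap:
                    boxes[i] = (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
                    del boxes[j]
                    merged = True
                    break
            if merged:
                break
    return boxes


def segment(img: Image, h_thresh: int, v_thresh: int, gap: int) -> List[Box]:
    """Bottom-up segmentation: smear, box connected components, merge nearby boxes."""
    return sorted(merge_boxes(components(rlsa(img, h_thresh, v_thresh)), gap))
```

On the toy grid used for checking, smearing joins four isolated pixels into one block; on real scans the thresholds would be set relative to character size and line spacing.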
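The abstract states that blocks are classified from features of text, graphics, and images but does not list those features. As a hedged illustration only, the sketch below computes two simple features, black-pixel density and mean horizontal black run length, and applies fixed thresholds. The feature set, the threshold values, and the names `block_features` and `classify_block` are assumptions for illustration, not the classifier used in the paper.

```python
from typing import List, NamedTuple

Block = List[List[int]]  # binary block, 1 = black, 0 = white


class Features(NamedTuple):
    density: float   # fraction of black pixels in the block
    mean_run: float  # mean length of horizontal black runs


def block_features(block: Block) -> Features:
    total = len(block) * len(block[0])
    black = sum(sum(row) for row in block)
    runs: List[int] = []
    for row in block:
        run = 0
        for px in row:
            if px == 1:
                run += 1
            elif run:
                runs.append(run)
                run = 0
        if run:  # run that touches the right edge of the block
            runs.append(run)
    mean_run = sum(runs) / len(runs) if runs else 0.0
    return Features(black / total, mean_run)


def classify_block(block: Block,
                   image_density: float = 0.5,
                   image_run: float = 4.0,
                   graphic_density: float = 0.1) -> str:
    """Label a block 'image', 'graphic' or 'text' using hypothetical fixed thresholds."""
    f = block_features(block)
    if f.density > image_density and f.mean_run > image_run:
        return "image"    # dense region made of long black runs
    if f.density < graphic_density:
        return "graphic"  # sparse strokes, e.g. line drawings
    return "text"         # moderate density made of short runs
```

Fixed thresholds keep the sketch readable; a practical system would tune them on sample pages or replace the rules with a trained classifier.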