Abstract:
The vocabulary of a natural language
processing (NLP) system is usually limited to the
word forms learnt by the system in a preliminary
step, for example, the word forms seen in the training
corpus. Thus, the out-of-vocabulary (OOV) problem is
an important issue in NLP. In this paper, we study
the plausibility of unseen word forms generated from
analogical grids on Indonesian, a language known
for its richness in derivational morphology. We
construct analogical grids from a list of word forms
contained in an annotated Indonesian corpus. We
generate new word forms by filling the empty cells in
the analogical grids. We verify these generated word
forms using a morphological analyzer and count how
many of them are valid Indonesian word forms.
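
The idea of filling empty cells in an analogical grid can be illustrated with a
minimal sketch. It is not the method used in the paper: it assumes a toy grid
in which any two rows and two columns form a proportional analogy
A : B :: C : D, and it uses a deliberately restricted solver that only handles
the case where B is A with a prefix and/or suffix attached. The function names
(solve_analogy, fill_grid) and the example grid are hypothetical, and the
verification step with a morphological analyzer is not shown.

```python
from typing import Dict, List, Optional, Set, Tuple

Grid = List[List[Optional[str]]]


def solve_analogy(a: str, b: str, c: str) -> Optional[str]:
    """Solve a : b :: c : x, assuming b is a with a prefix and/or suffix added.

    This is a simplification for illustration; general analogy solvers
    handle many more cases (infixation, alternations, and so on).
    """
    idx = b.find(a)
    if idx == -1:
        return None  # b does not contain a, so this toy solver gives up
    prefix, suffix = b[:idx], b[idx + len(a):]
    return prefix + c + suffix


def fill_grid(grid: Grid) -> Dict[Tuple[int, int], Set[str]]:
    """Propose candidate word forms for every empty (None) cell.

    For an empty cell (i, j), every other row k and column l whose three
    cells are filled yields the analogy
    grid[k][l] : grid[k][j] :: grid[i][l] : x.
    """
    candidates: Dict[Tuple[int, int], Set[str]] = {}
    n_rows, n_cols = len(grid), len(grid[0])
    for i in range(n_rows):
        for j in range(n_cols):
            if grid[i][j] is not None:
                continue
            for k in range(n_rows):
                for l in range(n_cols):
                    if k == i or l == j:
                        continue
                    a, b, c = grid[k][l], grid[k][j], grid[i][l]
                    if a is None or b is None or c is None:
                        continue
                    x = solve_analogy(a, b, c)
                    if x is not None:
                        candidates.setdefault((i, j), set()).add(x)
    return candidates


# Toy grid: rows are roots, columns are base form, di- form, -an form.
grid: Grid = [
    ["makan", "dimakan", "makanan"],
    ["minum", "diminum", None],
    ["baca", None, "bacaan"],
]

candidates = fill_grid(grid)
for cell, forms in sorted(candidates.items()):
    print(cell, sorted(forms))
```

In a setting like the one described above, each proposed form would then be
passed to a morphological analyzer to decide whether it is a valid word form.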