Abstract:
We propose a pipeline to explain, on the level of
form, the unseen words contained in an Indonesian test
set, by using analogical clusters. Analogical clusters
are extracted from a training set by relying on formal
relations between words. The unseen words which can
be explained on the level of form are then verified on
two other levels of representation: morphological and
semantic. In our experiments on the BPPT corpus,
98 % of unseen words were explained on the level of
form, of which 58 % could also be explained on both
the morphological and the semantic levels of
representation.