Old Catalan Morphosyntax: Developing an Annotated Corpus
This paper presents a full procedure for the development of a Part-of-Speech (POS) tagged corpus of Old Catalan. As an extremely low-resource language with rich inflection and frequent homographs, Old Catalan poses non-trivial problems in the development of a searchable constituency-based treebank. We demonstrate, however, that a semi-supervised method of incrementally building training data using both neural and memory-based taggers, together with the Pyrrha annotation tool, is highly efficient and yields accurate results. We propose that this simple and effective method could easily be extended to other low-resource historical languages for which no NLP tools exist yet.
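
The incremental idea can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the most-frequent-tag lookup below is a toy stand-in for the neural and memory-based taggers, the `correct` function is a hypothetical placeholder for manual correction in an annotation tool such as Pyrrha, and the tokens and tags are invented for illustration only. Each round trains on everything annotated so far, pre-tags a new batch, and adds the corrected batch back into the training data.

```python
from collections import Counter, defaultdict


def train(tagged_sentences):
    """Build a token -> most frequent tag lookup (toy stand-in for a real tagger)."""
    counts = defaultdict(Counter)
    for sentence in tagged_sentences:
        for token, tag in sentence:
            counts[token.lower()][tag] += 1
    return {token: tags.most_common(1)[0][0] for token, tags in counts.items()}


def pre_tag(model, tokens, unknown="UNK"):
    """Tag a tokenised sentence; tokens never seen in training get `unknown`."""
    return [(tok, model.get(tok.lower(), unknown)) for tok in tokens]


def correct(pre_tagged, gold_lookup):
    """Hypothetical stand-in for a human annotator fixing the pre-tagged output."""
    return [(tok, gold_lookup.get(tok.lower(), tag)) for tok, tag in pre_tagged]


def bootstrap(seed, batches, gold_lookup):
    """Grow the training data batch by batch, retraining before each batch."""
    training = list(seed)
    corrections_per_round = []
    for batch in batches:
        model = train(training)  # retrain on everything annotated so far
        fixed_count = 0
        for tokens in batch:
            guess = pre_tag(model, tokens)
            fixed = correct(guess, gold_lookup)
            fixed_count += sum(g != f for g, f in zip(guess, fixed))
            training.append(fixed)  # corrected output becomes training data
        corrections_per_round.append(fixed_count)
    return train(training), corrections_per_round


# Invented toy data (assumption): a tiny seed, two unannotated batches,
# and a lookup that plays the role of the annotator's decisions.
seed = [[("lo", "DET"), ("rey", "NOUN")]]
batches = [[["lo", "rey", "diu"]], [["la", "terra", "diu"]]]
gold = {"lo": "DET", "rey": "NOUN", "diu": "VERB", "la": "DET", "terra": "NOUN"}

model, corrections = bootstrap(seed, batches, gold)
```

A context-free lookup like this cannot separate homographs, which is one reason a real setup would rely on context-aware taggers. The sketch only shows the loop structure: pre-tag, correct, retrain.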