Segmentation in super-chunks with a finite-state approach

Since Harris’ parser in the late 50s, multiword units have been progressively integrated into parsers. Nevertheless, for the most part, they are still restricted to compound words, which are more stable and less numerous. In fact, language is full of semi-fixed expressions that also form basic semantic units: semi-fixed adverbial expressions (e.g. time) and collocations. As with compounds, identifying these structures limits the combinatorial complexity induced by lexical ambiguity. In this paper, we detail an experiment that largely integrates these notions into a finite-state procedure of segmentation into super-chunks, as a preliminary step to parsing. We show that the chunker, developed for French, reaches 92.9% precision and 98.7% recall. Moreover, multiword units realize 36.6% of the attachments within nominal and prepositional phrases.

Identifier oai:union.ndltd.org:Potsdam/oai:kobv.de-opus-ubp:2713
Date January 2008
Creators Blanc, Olivier, Constant, Matthieu, Watrin, Patrick
Publisher Universität Potsdam, Extern. Extern
Source Sets Potsdam University
Language English
Detected Language English
Type InProceedings
Format application/pdf
Rights http://opus.kobv.de/ubp/doku/urheberrecht.php
