Global ETD Search

1	Extraction of Basic Noun Phrases from Natural Language Using Statistical Context-Free Grammar
Afrin, Taniza, 31 May 2001
The objective of this research was to extract simple noun phrases from natural language texts using two different grammars: a stochastic context-free grammar (SCFG) and a non-statistical context-free grammar (CFG). Precision and recall were calculated to measure how accurately and how completely each grammar extracted noun phrases. Several text files containing sentences from English natural language specifications were analyzed manually to obtain the test set of simple noun phrases. To obtain precision and recall, this test set of manually extracted noun phrases was compared with the sets of noun phrases extracted using each of the two grammars. A probabilistic chart parser was developed by modifying a deterministic parallel chart parser. Extraction of simple noun phrases with the SCFG was accomplished using this probabilistic chart parser, a dictionary containing word probabilities along with word meanings, context-free grammar rules with associated rule probabilities, and an algorithm to extract the most likely parses of a sentence. The probabilistic parsing algorithm and the algorithm to determine figures of merit were implemented in C++.
Master of Science
Keywords: Information Extraction; Probabilistic Parser; Noun Phrase; Stochastic Grammar
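The evaluation described in this abstract, comparing a manually built test set of noun phrases against the set extracted by a grammar, can be sketched as follows. This is only an illustration in Python (the thesis itself used C++); the function name `precision_recall` and the sample phrases are hypothetical, and exact string matching is an assumption about how extracted phrases were counted as correct.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) for two collections of noun-phrase strings.

    Assumption: a phrase counts as correct only if it matches a gold
    phrase exactly; the thesis may have used a different matching rule.
    """
    extracted_set = set(extracted)
    gold_set = set(gold)
    correct = extracted_set & gold_set  # phrases found in both sets

    # Precision: share of extracted phrases that are correct.
    precision = len(correct) / len(extracted_set) if extracted_set else 0.0
    # Recall: share of gold phrases that were extracted.
    recall = len(correct) / len(gold_set) if gold_set else 0.0
    return precision, recall


# Hypothetical toy data, not taken from the thesis.
gold = {"the system", "a valid password", "the user"}
extracted = {"the system", "the user", "valid"}

p, r = precision_recall(extracted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Running the same function once on the SCFG output and once on the CFG output would give the two pairs of figures that the abstract compares.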
2	Construction de ressources linguistiques arabes à l’aide du formalisme de grammaires de propriétés en intégrant des mécanismes de contrôle / Building Arabic linguistic resources using the property grammar formalism by integrating control mechanisms
Bensalem, Raja, 14 December 2017
Building Arabic linguistic resources that are rich in syntactic information is a major issue for the development of new automatic language processing tools. This thesis proposes an approach for creating an Arabic treebank that integrates a new type of information, based on the Property Grammar formalism. A syntactic property characterizes a relation that can exist between two units of a given syntactic structure. The grammar is induced automatically from the Arabic treebank ATB, and the resource is enriched with the property representations of this grammar while retaining its qualities. The same enrichment was applied to the parsing results of a state-of-the-art analyzer, the Stanford Parser, which makes it possible to evaluate those results using a set of measures computed on this resource. The tags of the units in this grammar are structured according to a type hierarchy. This makes it possible to vary the granularity of the units, and consequently the level of precision of the information. Using this grammar, we were thus able to build other Arabic linguistic resources. Building on this new resource, we developed a probabilistic syntactic parser based on syntactic properties, the first parser of this type applied to Arabic. Its learning model includes a probabilistic lexicalized property grammar, which may positively affect the parsing result and characterizes the resulting syntactic structures with the properties of the model. Finally, we evaluated the parsing results of this approach by comparing them to those of the Stanford Parser.
Keywords: Property grammars; Arabic language; Treebank; Control mechanisms; Probabilistic parser; 004
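As a rough illustration of what inducing a property from a treebank can look like, the sketch below derives one kind of property, linear precedence between sibling categories, from a handful of constituents. Everything here is an assumption made for illustration: the function name `induce_linearity`, the toy labels, and the rule "A precedes B if B is never observed before A" are not taken from the thesis, which induces its grammar from the ATB by its own procedure.

```python
from collections import defaultdict
from itertools import combinations


def induce_linearity(constituents):
    """Induce linear-precedence properties from observed constituents.

    constituents: iterable of (parent_label, [child_labels in surface order]).
    Returns a set of (parent, a, b) triples meaning: inside `parent`,
    category `a` was seen before `b` and `b` was never seen before `a`.
    """
    before = defaultdict(int)  # (parent, a, b) -> times a occurred before b
    for parent, children in constituents:
        for i, j in combinations(range(len(children)), 2):
            a, b = children[i], children[j]
            if a != b:
                before[(parent, a, b)] += 1

    # Keep only orderings that are never contradicted in the data.
    return {(p, a, b) for (p, a, b) in before if (p, b, a) not in before}


# Hypothetical toy constituents, not drawn from any real treebank.
toy = [
    ("NP", ["DET", "ADJ", "NOUN"]),
    ("NP", ["DET", "NOUN"]),
    ("NP", ["NOUN", "ADJ"]),
]

properties = induce_linearity(toy)
print(sorted(properties))
# ADJ and NOUN occur in both orders, so no precedence is induced for them.
```

A real property grammar covers several property types beyond precedence, and a probabilistic variant would keep the counts rather than discard them.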
