The objective of this research was to extract simple noun phrases from natural language texts using two different grammars: a stochastic context-free grammar (SCFG) and a non-statistical context-free grammar (CFG). Precision and recall were calculated to determine how accurately and how completely each grammar extracted correct noun phrases. Several text files containing sentences from English natural language specifications were analyzed manually to obtain a test set of simple noun phrases. To obtain precision and recall, this manually extracted test set was compared with the sets of noun phrases extracted using each grammar, SCFG and CFG. A probabilistic chart parser was developed by modifying a deterministic parallel chart parser. Extraction of simple noun phrases with the SCFG was accomplished using this probabilistic chart parser, a dictionary containing word probabilities along with word meanings, context-free grammar rules with associated rule probabilities, and an algorithm to extract the most likely parses of a sentence. The probabilistic parsing algorithm and the algorithm to determine figures of merit were implemented in C++. / Master of Science
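The pipeline described above (word probabilities, rule probabilities, most likely parse, noun-phrase extraction, then precision and recall against a hand-built test set) can be illustrated with a small sketch. This is not the thesis's modified parallel chart parser: it swaps in a Viterbi CKY chart parser and assumes the SCFG is in Chomsky normal form. All names (`BinaryRule`, `Lexicon`, `viterbiParse`, `collectSpans`, `precisionRecall`) are hypothetical, and scoring noun phrases as exact word spans is an assumption about how "correct" is judged.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <tuple>
#include <utility>
#include <vector>

// Hypothetical binary rule A -> B C with probability prob (grammar assumed to be in CNF).
struct BinaryRule { std::string lhs, left, right; double prob; };

// Stand-in for a dictionary of word probabilities: word -> (part of speech -> P(tag -> word)).
using Lexicon = std::map<std::string, std::map<std::string, double>>;

struct Cell {
  std::map<std::string, double> best;  // nonterminal -> highest inside probability found
  // Backpointer per nonterminal: (split point, left child, right child); split == -1 marks a word.
  std::map<std::string, std::tuple<int, std::string, std::string>> back;
};

// Viterbi CKY: chart[i][j] holds the best analyses covering words i .. j-1.
std::vector<std::vector<Cell>> viterbiParse(const std::vector<std::string>& words,
                                            const Lexicon& lex,
                                            const std::vector<BinaryRule>& rules) {
  const int n = static_cast<int>(words.size());
  std::vector<std::vector<Cell>> chart(n + 1, std::vector<Cell>(n + 1));
  for (int i = 0; i < n; ++i) {  // seed the diagonal with lexical probabilities
    auto it = lex.find(words[i]);
    if (it == lex.end()) continue;  // unknown word: no entry in the sketch's dictionary
    for (const auto& [tag, p] : it->second) {
      chart[i][i + 1].best[tag] = p;
      chart[i][i + 1].back[tag] = std::make_tuple(-1, words[i], std::string());
    }
  }
  for (int len = 2; len <= n; ++len)
    for (int i = 0; i + len <= n; ++i) {
      const int j = i + len;
      auto& cell = chart[i][j];
      for (int k = i + 1; k < j; ++k)
        for (const auto& r : rules) {
          auto l = chart[i][k].best.find(r.left);
          auto rt = chart[k][j].best.find(r.right);
          if (l == chart[i][k].best.end() || rt == chart[k][j].best.end()) continue;
          const double p = r.prob * l->second * rt->second;
          auto cur = cell.best.find(r.lhs);
          if (cur == cell.best.end() || p > cur->second) {  // keep only the most likely analysis
            cell.best[r.lhs] = p;
            cell.back[r.lhs] = std::make_tuple(k, r.left, r.right);
          }
        }
    }
  return chart;
}

// Follow backpointers of the most likely parse and record every span labeled `target` (e.g. "NP").
void collectSpans(const std::vector<std::vector<Cell>>& chart, int i, int j,
                  const std::string& sym, const std::string& target,
                  std::set<std::pair<int, int>>& out) {
  assert(chart[i][j].back.count(sym) == 1 && "symbol must exist in this chart cell");
  if (sym == target) out.emplace(i, j);
  const auto& [k, left, right] = chart[i][j].back.at(sym);
  if (k < 0) return;  // reached a word
  collectSpans(chart, i, k, left, target, out);
  collectSpans(chart, k, j, right, target, out);
}

// Precision = |gold ∩ extracted| / |extracted|; recall = |gold ∩ extracted| / |gold|.
std::pair<double, double> precisionRecall(const std::set<std::pair<int, int>>& gold,
                                          const std::set<std::pair<int, int>>& extracted) {
  int hits = 0;
  for (const auto& span : extracted) hits += static_cast<int>(gold.count(span));
  const double precision = extracted.empty() ? 0.0 : static_cast<double>(hits) / extracted.size();
  const double recall = gold.empty() ? 0.0 : static_cast<double>(hits) / gold.size();
  return {precision, recall};
}
```

As a usage example: with a toy grammar `S -> NP VP`, `NP -> Det N`, `VP -> V NP` (each probability 1.0) and a lexicon giving "system" and "file" probability 0.5 as `N`, parsing "the system reads the file" yields an `S` over the whole sentence with probability 0.25. Calling `collectSpans(chart, 0, 5, "S", "NP", nps)` then returns the spans (0,2) and (3,5). Scoring those against a hand-marked set containing only (0,2) gives precision 0.5 and recall 1.0. Running the same extraction with all rule and word probabilities ignored would correspond to the non-statistical CFG case, where no single most likely parse is singled out.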
Identifier | oai:union.ndltd.org:VTETD/oai:vtechworks.lib.vt.edu:10919/33353 |
Date | 31 May 2001 |
Creators | Afrin, Taniza |
Contributors | Electrical and Computer Engineering, Cyre, Walling R., VanLandingham, Hugh F., Pratt, Timothy J. |
Publisher | Virginia Tech |
Source Sets | Virginia Tech Theses and Dissertation |
Detected Language | English |
Type | Thesis |
Format | application/pdf |
Rights | In Copyright, http://rightsstatements.org/vocab/InC/1.0/ |
Relation | ThesisTaniza.pdf |