Corpus-based methods are now dominant in Natural Language Processing (NLP). Creating large corpora is no longer difficult, and the technology to analyze them is becoming faster, more robust and more accurate. However, when an NLP application performs well on one corpus, it is unclear whether that level of performance would be maintained on others. To make progress on this question, we need methods for comparing corpora. This thesis investigates comparison methods based on the notions of corpus homogeneity and similarity.
Identifier | oai:union.ndltd.org:bl.uk/oai:ethos.bl.uk:421298 |
Date | January 2005 |
Creators | Cavaglia, Gabriela Maria Chiara |
Publisher | University of Brighton |
Source Sets | Ethos UK |
Detected Language | English |
Type | Electronic Thesis or Dissertation |
Source | https://research.brighton.ac.uk/en/studentTheses/8b46265d-65c5-477e-9296-412fbb053ed0 |