161

Establishing the reliability of natural language processing evaluation through linear regression modelling / E.R. Eiselen.

Eiselen, Ernst Roald January 2013 (has links)
Determining the quality of natural language applications is one of the most important aspects of technology development. There has, however, been very little work done on establishing how well the methods and measures represent the quality of the technology, and how reliable the evaluation results presented in most research are. This study presents a new stepwise evaluation reliability methodology that provides a step-by-step framework for creating predictive models of evaluation metric reliability that take into account inherent evaluation variables. These models can then be used to predict, based on the variables present in the evaluation data, how reliable a particular evaluation will be before it is carried out, allowing evaluators to adjust the evaluation data in advance to ensure reliable results. Furthermore, this permits researchers to compare results when the same evaluation data is not available. The new methodology is first applied to a well-defined technology, namely spelling checkers, with a detailed discussion of the evaluation techniques and statistical procedures required to accurately model an evaluation. The spelling checker evaluations are investigated in more detail to show how individual variables affect the evaluation results. Finally, a predictive regression model for each of the spelling checker evaluations is created and validated to verify the accuracy of its predictive capability. After this in-depth analysis and application of the stepwise evaluation reliability methodology to spelling checkers, the methodology is applied to two more technologies, namely part-of-speech tagging and named entity recognition. These validation procedures are applied across multiple languages, specifically Dutch, English, Spanish and Iberian Portuguese. These additional evaluations show that the methodology is applicable to a broader set of technologies across multiple languages. / Thesis (PhD (Linguistics and Literary Theory))--North-West University, Potchefstroom Campus, 2013.
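
To make the regression step concrete, the following minimal sketch fits a linear model that predicts an evaluation metric's reliability from properties of the evaluation data; the feature names and numbers are hypothetical illustrations, not the variables or data used in the thesis.

```python
# Minimal illustration of the regression step: predict how reliable an
# evaluation will be from properties of the evaluation data.
# Feature names and values are hypothetical, not taken from the thesis.
import numpy as np
from sklearn.linear_model import LinearRegression

# Each row describes one past evaluation: [sample size, mean word length,
# proportion of out-of-vocabulary tokens]; y is the observed reliability
# (e.g. agreement between the metric and a gold-standard judgement).
X = np.array([
    [500, 5.1, 0.02],
    [1000, 4.8, 0.05],
    [2000, 5.3, 0.01],
    [250, 4.9, 0.10],
])
y = np.array([0.71, 0.80, 0.93, 0.55])

model = LinearRegression().fit(X, y)

# Predict the reliability of a planned evaluation before running it.
planned = np.array([[1500, 5.0, 0.03]])
print(model.predict(planned))
```
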
162

Automatic Supervised Thesauri Construction with Roget’s Thesaurus

Kennedy, Alistair H 07 December 2012 (has links)
Thesauri are important tools for many Natural Language Processing applications. Roget's Thesaurus is particularly useful. It is of high quality and has been in development for over a century and a half. Yet its applications have been limited, largely because the only publicly available edition dates from 1911. This thesis proposes and tests methods of automatically updating the vocabulary of the 1911 Roget’s Thesaurus. I use the Thesaurus as a source of training data in order to learn from Roget’s for the purpose of updating Roget’s. The lexicon is updated in two stages. First, I develop a measure of semantic relatedness that enhances existing distributional techniques. I improve existing methods by using known sets of synonyms from Roget’s to train a distributional measure to better identify near synonyms. Second, I use the new measure of semantic relatedness to find where in Roget’s to place a new word. Existing words from Roget’s are used as training data to tune the parameters of three methods of inserting words. Over 5000 new words and word-senses were added using this process. I conduct two kinds of evaluation on the updated Thesaurus. One is on the procedure for updating Roget’s. This is accomplished by removing some words from the Thesaurus and testing my system's ability to reinsert them in the correct location. Human evaluation of the newly added words is also performed. Annotators must determine whether a newly added word is in the correct location. They found that in most cases the new words were almost indistinguishable from those already existing in Roget's Thesaurus. The second kind of evaluation is to establish the usefulness of the updated Roget’s Thesaurus on actual Natural Language Processing applications. These applications include determining semantic relatedness between word pairs or sentence pairs, identifying the best synonym from a set of candidates, solving SAT-style analogy problems, pseudo-word-sense disambiguation, and sentence ranking for text summarization. The updated Thesaurus consistently performed at least as well as, or better than, the original Thesaurus on all these applications.
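
The word-placement step can be pictured with the following sketch, which scores each Roget's paragraph by the average relatedness of its members to the new word; the relatedness function and the toy paragraphs are hypothetical stand-ins, not the trained measure or insertion methods developed in the thesis.

```python
# Illustrative sketch (not the thesis's exact procedure): place a new word
# into the Roget's paragraph whose existing members it is most related to,
# according to some distributional relatedness function.
from typing import Callable, Dict, List

def best_paragraph(new_word: str,
                   paragraphs: Dict[str, List[str]],
                   relatedness: Callable[[str, str], float]) -> str:
    """Return the paragraph with the highest average relatedness
    between its member words and the new word."""
    def avg_score(members: List[str]) -> float:
        return sum(relatedness(new_word, m) for m in members) / len(members)
    return max(paragraphs, key=lambda name: avg_score(paragraphs[name]))

# Toy usage with a hypothetical relatedness function.
toy_paragraphs = {
    "Cheerfulness": ["happy", "joyful", "merry"],
    "Dejection": ["sad", "gloomy", "sombre"],
}
toy_scores = {("glad", "happy"): 0.9, ("glad", "joyful"): 0.8}

def toy_relatedness(a: str, b: str) -> float:
    return toy_scores.get((a, b), 0.1)

print(best_paragraph("glad", toy_paragraphs, toy_relatedness))  # Cheerfulness
```
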
163

Evaluating Text Segmentation

Fournier, Christopher 24 April 2013 (has links)
This thesis investigates the evaluation of automatic and manual text segmentation. Text segmentation is the process of placing boundaries within text to create segments according to some task-dependent criterion. An example of text segmentation is topical segmentation, which aims to segment a text according to the subjective definition of what constitutes a topic. A number of automatic segmenters have been created to perform this task, and the question that this thesis answers is how to select the best automatic segmenter for such a task. This requires choosing an appropriate segmentation evaluation metric, confirming the reliability of a manual solution, and then finally employing an evaluation methodology that can select the automatic segmenter that best approximates human performance. A variety of comparison methods and metrics exist for comparing segmentations (e.g., WindowDiff, Pk), and all save a few are able to award partial credit for nearly missing a boundary. Those comparison methods that can award partial credit unfortunately lack consistency, symmetry, intuition, and a host of other desirable qualities. This work proposes a new comparison method named boundary similarity (B) which is based upon a new minimal boundary edit distance to compare two segmentations. Near misses are frequent, even among manual segmenters (as is exemplified by the low inter-coder agreement reported by many segmentation studies). This work adapts some inter-coder agreement coefficients to award partial credit for near misses using the new metric proposed herein, B. The methodologies employed by many works introducing automatic segmenters evaluate them simply in terms of a comparison of their output to one manual segmentation of a text, and often only by presenting nothing other than a series of mean performance values (along with no standard deviation, standard error, or little if any statistical hypothesis testing). This work asserts that one segmentation of a text cannot constitute a “true” segmentation; specifically, one manual segmentation is simply one sample of the population of all possible segmentations of a text and of that subset of desirable segmentations. This work further asserts that the adapted inter-coder agreement statistics proposed herein should be used to determine the reproducibility and reliability of a coding scheme and set of manual codings, and then statistical hypothesis testing using the specific comparison methods and methodologies demonstrated herein should be used to select the best automatic segmenter. This work proposes new segmentation evaluation metrics, adapted inter-coder agreement coefficients, and methodologies. Most importantly, this work experimentally compares the state-of-the-art comparison methods to those proposed herein upon artificial data that simulates a variety of scenarios and chooses the best one (B). The ability of adapted inter-coder agreement coefficients, based upon B, to discern between various levels of agreement in artificial and natural data sets is then demonstrated. Finally, a contextual evaluation of three automatic segmenters is performed using the state-of-the-art comparison methods and B using the methodology proposed herein to demonstrate the benefits and versatility of B as opposed to its counterparts.
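
For reference, one of the existing comparison methods named above, WindowDiff, can be computed as in the sketch below; boundary similarity (B) itself rests on a minimal boundary edit distance and is not reproduced here.

```python
# A minimal implementation of WindowDiff, one of the existing comparison
# methods discussed above: slide a window over the text and count how often
# the reference and hypothesis disagree on the number of boundaries inside it.
def window_diff(reference, hypothesis, k=None):
    """reference/hypothesis: lists of 0/1 flags, 1 meaning a boundary follows
    that unit. Returns the proportion of windows in which the two
    segmentations disagree on the number of boundaries."""
    assert len(reference) == len(hypothesis)
    n = len(reference)
    if k is None:
        # Conventional choice: half the average reference segment length.
        k = max(1, n // (2 * (sum(reference) + 1)))
    disagreements = 0
    for i in range(n - k):
        if sum(reference[i:i + k]) != sum(hypothesis[i:i + k]):
            disagreements += 1
    return disagreements / (n - k)

# Example: a near miss is penalised, but less than under strict 0/1 matching.
ref = [0, 0, 1, 0, 0, 0, 1, 0]
hyp = [0, 0, 0, 1, 0, 0, 1, 0]
print(window_diff(ref, hyp, k=2))
```
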
164

Coping with uncertainty : noun phrase interpretation and early semantic analysis

Mellish, Christopher Stuart January 1981 (has links)
A computer program which can "understand" natural language texts must have both syntactic knowledge about the language concerned and semantic knowledge of how what is written relates to its internal representation of the world. It has been a matter of some controversy how these sources of information can best be integrated to translate from an input text to a formal meaning representation. The controversy has concerned largely the question as to what degree of syntactic analysis must be performed before any semantic analysis can take place. An extreme position in this debate is that a syntactic parse tree for a complete sentence must be produced before any investigation of that sentence's meaning is appropriate. This position has been criticised by those who see understanding as a process that takes place gradually as the text is read, rather than in sudden bursts of activity at the ends of sentences. These people advocate a model where semantic analysis can operate on fragments of text before the global syntactic structure is determined - a strategy which we will call early semantic analysis. In this thesis, we investigate the implications of early semantic analysis in the interpretation of noun phrases. One possible approach is to say that a noun phrase is a self-contained unit and can be fully interpreted by the time it has been read. Thus it can always be determined what objects a noun phrase refers to without consulting much more than the structure of the phrase itself. This approach was taken in part by Winograd [Winograd 72], who saw the constraint that a noun phrase have a referent as a valuable aid in resolving local syntactic ambiguity. Unfortunately, Winograd's work has been criticised by Ritchie, because it is not always possible to determine what a noun phrase refers to purely on the basis of local information. In this thesis, we will go further than this and claim that, because the meaning of a noun phrase can be affected by so many factors outside the phrase itself, it makes no sense to talk about "the referent" as a function of a noun phrase. Instead, the notion of "referent" is something defined by global issues of structure and consistency. Having rejected one approach to the early semantic analysis of noun phrases, we go on to develop an alternative, which we call incremental evaluation. The basic idea is that a noun phrase does provide some information about what it refers to. It should be possible to represent this partial information and gradually refine it as relevant implications of the context are followed up. Moreover, the partial information should be available to an inference system, which, amongst other things, can detect the absence of a referent and provide the advantages of Winograd's system. In our system, noun phrase interpretation does take place locally, but the point is that it does not finish there. Instead, the determination of the meaning of a noun phrase is spread over the subsequent analysis of how it contributes to the meaning of the text as a whole.
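
The idea of incremental evaluation can be caricatured in modern terms as follows: a noun phrase opens a set of candidate referents that is narrowed as constraints from the surrounding text arrive, with an empty set signalling referential failure. This is only a schematic illustration, not the representation or inference system used in the thesis.

```python
# Schematic illustration of incremental evaluation: the referent of a noun
# phrase is held as partial information (a candidate set) that is refined as
# further constraints become available; an empty set can be used, as in
# Winograd's approach, to reject a syntactic reading.
class IncrementalReferent:
    def __init__(self, candidates):
        self.candidates = set(candidates)

    def constrain(self, predicate):
        """Apply a newly discovered constraint to the candidate set."""
        self.candidates = {c for c in self.candidates if predicate(c)}
        return self.candidates

# Toy domain of objects, each a (name, colour, size) record.
world = [("b1", "red", "small"), ("b2", "red", "large"), ("b3", "blue", "small")]

ref = IncrementalReferent(world)          # "the block ..."
ref.constrain(lambda o: o[1] == "red")    # "... the red block ..."
ref.constrain(lambda o: o[2] == "large")  # "... the large red block"
print(ref.candidates)                     # {("b2", "red", "large")}
```
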
165

Personalized Medicine through Automatic Extraction of Information from Medical Texts

Frunza, Oana Magdalena 17 April 2012 (has links)
The wealth of medical-related information available today gives rise to a multidimensional source of knowledge. Research discoveries published in prestigious venues, electronic-health records data, discharge summaries, clinical notes, etc., all represent important medical information that can assist in the medical decision-making process. The challenge that comes with accessing and using such vast and diverse sources of data lies in the ability to distil and extract reliable and relevant information. Computer-based tools that use natural language processing and machine learning techniques have proven to help address such challenges. The current work proposes reliable automatic solutions for tasks that can help achieve personalized medicine, a medical practice that brings together general medical knowledge and case-specific medical information. Phenotypic medical observations, along with data coming from test results, are not enough when assessing and treating a medical case. Genetic, lifestyle, background and environmental data also need to be taken into account in the medical decision process. This thesis's goal is to show that natural language processing and machine learning techniques represent reliable solutions for solving important medical-related problems. Of the numerous research problems that need to be answered when implementing personalized medicine, the scope of this thesis is restricted to four, as follows: 1. Automatic identification of obesity-related diseases by using only textual clinical data; 2. Automatic identification of relevant abstracts of published research to be used for building systematic reviews; 3. Automatic identification of gene functions based on textual data of published medical abstracts; 4. Automatic identification and classification of important medical relations between medical concepts in clinical and technical data. This thesis's investigation into automatic solutions for achieving personalized medicine through information identification and extraction focuses on individual, specific problems that can later be linked in a puzzle-building manner. A diverse representation technique that follows a divide-and-conquer methodological approach proves to be the most reliable solution for building automatic models that solve the above-mentioned tasks. The methodologies that I propose are supported by in-depth research experiments and thorough discussions and conclusions.
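
Tasks 1 and 2 are, at their core, supervised text classification; the sketch below shows the general shape of such a pipeline, with hypothetical placeholder data rather than the corpora or feature sets used in the thesis.

```python
# A minimal sketch of the kind of supervised text-classification pipeline
# used for tasks such as identifying relevant abstracts for a systematic
# review. The training examples here are hypothetical placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

abstracts = [
    "Randomised trial of drug X for type 2 diabetes in obese patients.",
    "A survey of hospital staffing levels in rural clinics.",
    "Effect of diet and exercise on obesity-related comorbidities.",
    "Annual report of the radiology department.",
]
relevant = [1, 0, 1, 0]  # 1 = include in the systematic review

classifier = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                           LogisticRegression(max_iter=1000))
classifier.fit(abstracts, relevant)

print(classifier.predict(["Weight-loss intervention outcomes in diabetic adults."]))
```
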
166

Using Rhetorical Figures and Shallow Attributes as a Metric of Intent in Text

Strommer, Claus Walter January 2011 (has links)
In this thesis we propose a novel metric of document intent evaluation based on the detection and classification of rhetorical figures. In doing so we dispel the notion that rhetoric lacks the structure and consistency necessary to be relevant to computational linguistics. We show how the combination of document attributes available through shallow parsing and rules extracted from the definitions of rhetorical figures produces a metric which can be used to reliably classify the intent of texts. This metric works equally well on entire documents and on portions of a document.
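
As an illustration of the rule-plus-shallow-attributes idea, the sketch below detects a single figure, anaphora (successive sentences opening with the same words), from tokenised sentences alone; the thesis's actual rule set and attribute inventory are broader.

```python
# An illustrative rule for one rhetorical figure, anaphora, using only
# shallow attributes (sentence splits and tokens). This is a sketch of the
# general approach, not the rule set developed in the thesis.
import re

def detect_anaphora(text: str, prefix_len: int = 2, min_run: int = 2):
    """Return runs of consecutive sentences sharing the same opening tokens."""
    sentences = [s.strip() for s in re.split(r"[.!?]+", text) if s.strip()]
    openings = [tuple(s.lower().split()[:prefix_len]) for s in sentences]
    runs, start = [], 0
    for i in range(1, len(openings) + 1):
        if i == len(openings) or openings[i] != openings[start]:
            if i - start >= min_run:
                runs.append(sentences[start:i])
            start = i
    return runs

sample = ("We shall fight on the beaches. We shall fight on the landing grounds. "
          "We shall fight in the fields. The enemy will not prevail.")
print(detect_anaphora(sample))
```
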
167

Class-free answer typing

Pinchak, Christopher 11 1900 (has links)
Answer typing is an important aspect of the question answering process. Most commonly addressed with the use of a fixed set of possible answer classes via question classification, answer typing influences which answers will ultimately be selected as correct. Answer typing introduces the concept of type-appropriate responses. Such responses are plausible in the context of question answering when they are believable as answers to a given question. This notion of type-appropriateness is distinct from correctness, as there may exist many type-appropriate responses that are not correct answers. Type-appropriate responses can even exist for other kinds of queries that are not strictly questions. This work introduces class-free models of answer type for certain kinds of questions as well as models of type-appropriateness useful to the domain of information retrieval. Models built for both open-ended noun phrase questions and how-adjective questions are designed to evaluate the type-appropriateness of a candidate answer directly rather than via the use of an intermediary question class (as is done with question classification). Experiments show a meaningful improvement over alternative typing strategies for these kinds of questions. Ideas from these models are then applied outside of the domain of question answering in an effort to improve traditional information retrieval results. Experiments comparing reranked results with those of the Google search engine show improvements are made in those rare situations for which Google provides less than ideal results.
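
The class-free idea can be sketched as follows: rather than classifying the question, substitute each candidate answer into the question's context and rank candidates by how plausible the resulting string is. The plausibility scorer below is a toy stand-in for the distributional models developed in the thesis.

```python
# Schematic sketch of class-free answer typing: score type-appropriateness
# by substituting each candidate into the question context and ranking the
# results, without mapping the question to a fixed answer class.
from typing import Callable, List, Tuple

def rank_candidates(question_template: str,
                    candidates: List[str],
                    plausibility: Callable[[str], float]) -> List[Tuple[str, float]]:
    """Substitute each candidate for the placeholder and rank by plausibility."""
    scored = [(c, plausibility(question_template.replace("___", c)))
              for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Toy usage with a hypothetical plausibility scorer.
toy_scores = {"product": 0.9, "happiness": 0.1, "Tuesday": 0.3}

def toy_plausibility(sentence: str) -> float:
    return max((v for k, v in toy_scores.items() if k in sentence), default=0.0)

print(rank_candidates("What ___ did the company release?",
                      ["product", "happiness", "Tuesday"], toy_plausibility))
```
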
168

Computer aided pronunciation system (CAPS)

Ananthakrishnan, Kollengode Subramanian. Unknown Date (has links)
Thesis (MEng (Telecommunications by Research))--University of South Australia, 2003.
169

Orthographic support for passing the reading hurdle in Japanese

Yencken, Lars January 2010 (has links)
Learning a second language is, for the most part, a day-in day-out struggle against the mountain of new vocabulary a learner must acquire. Furthermore, since the number of new words to learn is so great, learners must acquire them autonomously. Evidence suggests that for languages with writing systems, native-like vocabulary sizes are only developed through reading widely, and that reading is only fruitful once learners have acquired the core vocabulary required for it to become smooth. Learners of Japanese have an especially high barrier in the form of the Japanese writing system, in particular its use of kanji characters. Recent work on dictionary accessibility has focused on compensating for learner errors in pronouncing unknown words, however much difficulty remains. / This thesis uses the rich visual nature of the Japanese orthography to support the study of vocabulary in several ways. Firstly, it proposes a range of kanji similarity measures and evaluates them over several new data sets, finding that the stroke edit distance and tree edit distance metrics best approximate human judgements. Secondly, it uses stroke edit distance to construct a model of kanji misrecognition, which we use as the basis for a new form of kanji search by similarity. Analysing query logs, we find that this new form of search was rapidly adopted by users, indicating its utility. We finally combine kanji confusion and pronunciation models into a new adaptive testing platform, Kanji Tester, modelled after aspects of the Japanese Language Proficiency Test. As the user tests themselves, the system adapts to their error patterns and uses this information to make future tests more difficult. Investigating logs of use, we find a weak positive correlation between ability estimates and the time the system has been used. Furthermore, our adaptive models generated questions which were significantly more difficult than their control counterparts. / Overall, these contributions make a concerted effort to improve tools for learner self-study, so that learners can successfully overcome the reading hurdle and propel themselves towards greater proficiency. The data collected from these tools also forms a useful basis for further study of learner error and vocabulary development.
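
The stroke edit distance idea can be sketched as a normalised Levenshtein distance over stroke sequences; the stroke labels below are hypothetical placeholders rather than entries from a real stroke-order database.

```python
# A minimal sketch of stroke edit distance: estimate the similarity of two
# kanji from the Levenshtein distance between their stroke sequences.
def levenshtein(a, b):
    """Standard edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

def stroke_similarity(strokes_a, strokes_b):
    """Normalise edit distance into a 0-1 similarity score."""
    longest = max(len(strokes_a), len(strokes_b))
    return 1.0 - levenshtein(strokes_a, strokes_b) / longest

# Toy stroke sequences (labels are placeholders, not a real stroke encoding).
kanji_a = ["h", "v", "d", "h", "v", "h", "d", "p"]
kanji_b = ["h", "v", "d", "h", "v", "h", "p", "p"]
print(stroke_similarity(kanji_a, kanji_b))
```
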
