131

GeneTUC: Natural Language Understanding in Medical Text

Sætre, Rune January 2006 (has links)
Natural Language Understanding (NLU) is a 50-year-old research field, but its application to the molecular biology literature (BioNLU) is less than 10 years old. After the complete human genome sequence was published by the Human Genome Project and Celera in 2001, there has been an explosion of research, shifting the NLU focus from domains such as news articles to molecular biology and medical literature. BioNLU is needed because almost 2000 new articles are published and indexed every day, and biologists need to know about existing knowledge relevant to their own research. So far, BioNLU results are not as good as in other NLU domains, so more research is needed to solve the challenges of creating useful NLU applications for biologists.

The work in this PhD thesis is a “proof of concept”. It is the first to show that an existing Question Answering (QA) system can be successfully applied in the hard BioNLU domain once the essential challenge of unknown entities is solved. The core contribution is a system that automatically discovers and classifies unknown entities and the relations between them. The World Wide Web (through Google) is used as the main resource, and the performance is almost as good as other named entity extraction systems; the advantage of this approach is that it is much simpler and requires less manual labor than any of the comparable systems.

The first paper in this collection gives an overview of the field of NLU and shows how the Information Extraction (IE) problem can be formulated with Local Grammars. The second paper uses machine learning to automatically recognize protein names based on features from the GSearch Engine. In the third paper, GSearch is replaced with Google, and the task is to extract all unknown names belonging to one of 273 biomedical entity classes, such as genes, proteins, and processes. After getting promising results with Google, the fourth paper shows that this approach can also be used to retrieve interactions or relationships between the named entities. The fifth paper describes an online implementation of the system and shows that the method scales well to a larger set of entities.

The final paper concludes the “proof of concept” research and shows that the performance of the original GeneTUC NLU system has increased from handling 10% of the sentences in a large collection of abstracts in 2001 to 50% in 2006. This is still not good enough for a commercial system, but it is believed that another 40% performance gain can be achieved by importing more verb templates into GeneTUC, just as nouns were imported during this work. Work on this has already begun, in the form of a local Master's thesis.
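The core method above classifies an unknown name by issuing web queries for each candidate class and comparing hit counts. The sketch below is only an illustration of that idea, not the thesis's actual pipeline: the "X is a CLASS" query pattern is an assumed surface cue, and web_hit_count is a placeholder for whatever search backend is available.

```python
# Rough illustration of hit-count-based classification of an unknown name.
# Assumptions (not from the thesis): the '"<name> is a <class>"' query pattern
# and the web_hit_count() stub, which must be wired to a real search backend.
from typing import Dict, List

def web_hit_count(query: str) -> int:
    """Return the number of web hits for an exact-phrase query (placeholder)."""
    raise NotImplementedError("plug in a search API here")

def class_scores(name: str, classes: List[str]) -> Dict[str, int]:
    """Score each candidate class by the hit count of a simple surface pattern."""
    return {cls: web_hit_count(f'"{name} is a {cls}"') for cls in classes}

def best_class(name: str, classes: List[str]) -> str:
    """Pick the class with the most supporting web evidence."""
    scores = class_scores(name, classes)
    return max(scores, key=scores.get)

# Example call (once web_hit_count is implemented):
# best_class("p53", ["gene", "protein", "process"])
```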
132

Exploiting Lexical Regularities in Designing Natural Language Systems

Katz, Boris, Levin, Beth 01 April 1988 (has links)
This paper presents the lexical component of the START Question Answering system developed at the MIT Artificial Intelligence Laboratory. START correctly interprets a wide range of semantic relationships associated with alternate expressions of the arguments of verbs. The design of the system takes advantage of recent linguistic research into the structure of the lexicon, allowing START to attain broader coverage than many existing systems.
134

Personalized Medicine through Automatic Extraction of Information from Medical Texts

Frunza, Oana Magdalena 17 April 2012 (has links)
The wealth of medical information available today gives rise to a multidimensional source of knowledge. Research discoveries published in prestigious venues, electronic health record data, discharge summaries, clinical notes, and similar sources all represent important medical information that can assist in the medical decision-making process. The challenge of accessing and using such vast and diverse sources of data lies in the ability to distill and extract reliable and relevant information. Computer-based tools that use natural language processing and machine learning techniques have proven helpful in addressing such challenges. This work proposes reliable automatic solutions for tasks that can help achieve personalized medicine, a medical practice that brings together general medical knowledge and case-specific medical information. Phenotypic medical observations, along with data coming from test results, are not enough when assessing and treating a medical case; genetic, lifestyle, background, and environmental data also need to be taken into account in the medical decision process. The goal of this thesis is to show that natural language processing and machine learning techniques represent reliable solutions for important medical problems. Of the numerous research problems that need to be answered when implementing personalized medicine, the scope of this thesis is restricted to the following four:
1. Automatic identification of obesity-related diseases using only textual clinical data;
2. Automatic identification of relevant abstracts of published research to be used for building systematic reviews;
3. Automatic identification of gene functions based on textual data of published medical abstracts;
4. Automatic identification and classification of important medical relations between medical concepts in clinical and technical data.
This investigation into automatic solutions for personalized medicine through information identification and extraction focuses on individual, specific problems that can later be linked together in a puzzle-building manner. A diverse representation technique that follows a divide-and-conquer methodology proves to be the most reliable way to build automatic models that solve the tasks above. The methodologies I propose are supported by in-depth experiments and thorough discussions and conclusions.
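All four tasks listed above are, at bottom, supervised text classification problems. The thesis does not prescribe a specific toolkit; purely as a hedged illustration of the kind of baseline involved, a bag-of-words classifier for task 1 might look like the following (scikit-learn and the toy example notes are assumptions):

```python
# Toy bag-of-words baseline for task 1 (obesity-related conditions in clinical
# text). scikit-learn and the tiny example notes are illustrative assumptions;
# the thesis does not prescribe this toolkit or these features.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

notes = [
    "Patient with BMI 42, hypertension and type 2 diabetes.",
    "Routine follow-up, no chronic conditions noted.",
]
labels = [1, 0]  # 1 = obesity-related comorbidity mentioned, 0 = not mentioned

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(notes, labels)
print(model.predict(["Morbidly obese patient with sleep apnea."]))
```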
135

Automatic Supervised Thesauri Construction with Roget’s Thesaurus

Kennedy, Alistair H 07 December 2012 (has links)
Thesauri are important tools for many Natural Language Processing applications. Roget's Thesaurus is particularly useful: it is of high quality and has been in development for over a century and a half. Yet its applications have been limited, largely because the only publicly available edition dates from 1911. This thesis proposes and tests methods of automatically updating the vocabulary of the 1911 Roget's Thesaurus. I use the Thesaurus itself as a source of training data, learning from Roget's for the purpose of updating Roget's. The lexicon is updated in two stages. First, I develop a measure of semantic relatedness that enhances existing distributional techniques, using known sets of synonyms from Roget's to train the measure to better identify near-synonyms. Second, I use the new measure of semantic relatedness to find where in Roget's to place a new word. Existing words from Roget's are used as training data to tune the parameters of three methods of inserting words. Over 5000 new words and word senses were added using this process. I conduct two kinds of evaluation on the updated Thesaurus. The first evaluates the updating procedure itself: some words are removed from the Thesaurus and the system's ability to reinsert them in the correct location is tested. Human evaluation of the newly added words is also performed; annotators determined whether each newly added word is in the correct location and found that, in most cases, the new words were almost indistinguishable from those already in Roget's Thesaurus. The second kind of evaluation establishes the usefulness of the updated Roget's Thesaurus in actual Natural Language Processing applications, including determining semantic relatedness between word pairs or sentence pairs, identifying the best synonym from a set of candidates, solving SAT-style analogy problems, pseudo-word-sense disambiguation, and sentence ranking for text summarization. The updated Thesaurus consistently performed at least as well as, or better than, the original Thesaurus on all of these applications.
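The second stage, placing a new word, amounts to scoring each Roget's grouping by how related its existing members are to the new word under the learned relatedness measure. The following is only a schematic sketch of that step: plain cosine similarity over sparse context vectors stands in for the tuned distributional measure, and the category structure is simplified to flat lists of member vectors.

```python
# Schematic sketch of placing a new word: score each category by the average
# relatedness of its members to the new word. Plain cosine similarity over
# sparse context vectors is assumed here in place of the tuned measure.
import math
from typing import Dict, List

Vector = Dict[str, float]  # context word -> weight

def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two sparse context vectors."""
    num = sum(u[w] * v[w] for w in set(u) & set(v))
    den = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return num / den if den else 0.0

def best_category(new_vec: Vector, categories: Dict[str, List[Vector]]) -> str:
    """Return the category whose member vectors are, on average, most related."""
    def avg_sim(members: List[Vector]) -> float:
        return sum(cosine(new_vec, m) for m in members) / len(members)
    return max(categories, key=lambda name: avg_sim(categories[name]))
```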
136

Corpus construction based on Ontological domain knowledge

Benis, Nirupama, Kaliyaperumal, Rajaram January 2011 (has links)
The purpose of this thesis is to contribute a corpus for sentence-level interpretation of biomedical language. The available corpora for the biomedical domain are small in terms of the amount of text and the number of predicates; moreover, these corpora have been developed rather intuitively. In this effort, which we call BioOntoFN, we created a corpus from the domain knowledge provided by an ontology. By doing so, we believe we can provide a rough set of rules for creating corpora from ontologies. We also designed an annotation tool specifically for building our corpus. We built a corpus for biological transport events. The ontology we used is the part of the Gene Ontology pertaining to transport: the term transport (GO:0006810) and all of its child concepts, which together can be called a sub-ontology. The annotation of the corpus follows the rules of FrameNet, and the output is annotated text in an XML format similar to that of FrameNet. The text for the corpus is taken from abstracts of MEDLINE articles. The annotation tool is a GUI written in Java.
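Selecting the sub-ontology described above reduces to collecting GO:0006810 and every term reachable from it through child links. As a small illustration only, with a toy edge map standing in for a parsed Gene Ontology release:

```python
# Toy sketch of extracting the "transport" sub-ontology: GO:0006810 plus all
# terms reachable through child links. The edge map below is an illustrative
# stand-in for a parsed Gene Ontology file, not real GO data handling.
from collections import deque
from typing import Dict, List, Set

def descendants(root: str, children: Dict[str, List[str]]) -> Set[str]:
    """Collect the root GO term and everything below it (breadth-first)."""
    seen = {root}
    queue = deque([root])
    while queue:
        term = queue.popleft()
        for child in children.get(term, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen

toy_edges = {
    "GO:0006810": ["GO:0006811", "GO:0015031"],  # ion transport, protein transport
    "GO:0015031": ["GO:0006886"],                # intracellular protein transport
}
print(descendants("GO:0006810", toy_edges))
```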
137

Using Rhetorical Figures and Shallow Attributes as a Metric of Intent in Text

Strommer, Claus Walter January 2011 (has links)
In this thesis we propose a novel metric of document intent evaluation based on the detection and classification of rhetorical figures. In doing so we dispel the notion that rhetoric lacks the structure and consistency necessary to be relevant to computational linguistics. We show how the combination of document attributes available through shallow parsing and rules extracted from the definitions of rhetorical figures produces a metric that can be used to reliably classify the intent of texts. The metric works equally well on entire documents and on portions of a document.
138

Word Sense Disambiguation Using WordNet and Conceptual Expansion

Guo, Jian-Yi 24 January 2006 (has links)
Just as a single English word can have several different meanings, a single meaning can be expressed by several different English words. The meaning of a word depends on the sense intended, so selecting the most appropriate meaning for an ambiguous word within a context is a critical problem for applications that use natural language processing technologies. At present, however, most word sense disambiguation methods either handle only restricted parts of speech, such as nouns, or achieve unsatisfactory accuracy, and this ambiguity often hinders users. In this study, a new word sense disambiguation method using the WordNet lexical database, SemCor text files, and the Web is presented. In addition to nouns, the proposed method also attempts to disambiguate verbs, adjectives, and adverbs in sentences. The text files and sentences investigated in the experiments were randomly selected from SemCor. The semantic similarity between the senses of the two semantically ambiguous words in a word pair is measured to select the applicable candidate senses of the target word in that pair. A synonym weighting method accounts for the possible sense diversity within the synonym sets WordNet provides, and the corresponding synonym sets of the candidate senses are thus determined. The candidate senses, expanded with the senses in the corresponding synonym sets and enhanced by a context window technique, form new queries. After the new queries are submitted to a search engine to find matching documents on the Web, the candidate senses are ranked by the number of matching documents found. The first sense in the ranked list is taken as the most appropriate sense of the target word. The proposed method, as well as Stetina et al.'s and Mihalcea et al.'s methods, is evaluated on the SemCor text files. The experimental results show that, for the top sense selected, this method achieves an average accuracy of 81.3% for nouns, verbs, adjectives, and adverbs, slightly better than Stetina et al.'s method at 80% and Mihalcea et al.'s method at 80.1%. Furthermore, the proposed method is the only one whose accuracy for verbs reaches 70% for the top sense selected. Moreover, for the top three senses selected, its average accuracy over the four parts of speech exceeds 96%, surpassing the other two methods. It is expected that the proposed method can improve the performance of word sense disambiguation applications in machine translation, document classification, and information retrieval.
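The ranking step described above (expand each candidate sense with its WordNet synonyms, add the context window, query a search engine, and rank by match counts) can be sketched roughly as follows. This is only an illustration under assumptions, not the thesis's implementation: NLTK's WordNet interface stands in for the lexicon (its corpus data must be installed), and web_hit_count is a placeholder for the search backend.

```python
# Rough sketch of the sense-ranking step. Assumptions: NLTK's WordNet corpus
# stands in for the WordNet lexicon, and web_hit_count() is a placeholder for
# the search engine used in the thesis.
from nltk.corpus import wordnet as wn

def web_hit_count(query: str) -> int:
    """Number of matching web documents for a query (placeholder)."""
    raise NotImplementedError("plug in a search API here")

def rank_senses(target: str, context_words, pos=None):
    """Rank WordNet senses of `target` by web evidence for sense-expanded queries."""
    scored = []
    for sense in wn.synsets(target, pos=pos):
        synonyms = [lemma.replace("_", " ") for lemma in sense.lemma_names()]
        query = " ".join(synonyms + list(context_words))  # synonyms + context window
        scored.append((web_hit_count(query), sense))
    # Highest hit count first; the top sense is taken as the answer.
    return [sense for _, sense in sorted(scored, key=lambda pair: pair[0], reverse=True)]

# Example call (once web_hit_count is implemented):
# rank_senses("bank", ["river", "water", "shore"], pos=wn.NOUN)
```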
139

Generating natural language text in response to questions about database structure

McKeown, Kathleen R. 1982 (has links)
Thesis (Ph.D.), University of Pennsylvania, 1982. Cover title. Includes bibliographical references and index.
140

TERRESA: a task-based, message-driven parallel semantic network system

Lee, Chain-Wu. 1999 (has links)
Thesis (Ph.D.), State University of New York at Buffalo, 1999. Dated "January 25, 1999." Includes bibliographical references (leaves 201-209). Also available in print.
