  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
131

SPEECH AND LANGUAGE TECHNOLOGIES FOR SEMANTICALLY LINKED INSTRUCTIONAL CONTENT

Swaminathan, Ranjini January 2011
Recent advances in technology have made it possible to offer educational content online in the form of e-learning systems. The Semantically Linked Instructional Content (SLIC) system, developed at The University of Arizona, is one such system that hosts educational and technical videos online. This dissertation proposes the integration of speech and language technologies with the SLIC system. Speech transcripts are being used increasingly in video browsing systems to help viewers understand the video content better and to search the content with text queries. Transcripts are especially useful for people with disabilities and those who have a limited understanding of the language of the video. Automatic Speech Recognizers (ASRs) are commonly used to generate speech transcripts for videos but are not consistent in their performance. This issue is more pronounced in a system like SLIC because of the technical nature of the talks, which contain words not seen in the ASR vocabulary, and the many speakers with different voices and accents, which make recognition harder. The videos in SLIC come with presentation slides that contain words specific to the talk subject, and the speech transcript itself can be considered to be composed of these slide words interspersed with other words. Furthermore, the errors in the transcript are words that sound similar to what was actually spoken, for example notes instead of nodes. The errors that occur due to misrecognized slide words can be fixed if we know which slide words were actually spoken and where they occur in the transcript. In other words, the slide words are matched or aligned with the transcript. In this dissertation two algorithms are developed to phonetically align transcript words with slide words, based on a Hidden Markov Model and a hybrid hidden semi-Markov model respectively. The slide words constitute the hidden states and the transcript words are the observed states in both models. 
The alignment algorithms are adapted for different applications such as transcript correction (as already mentioned), search and indexing, video segmentation and closed captioning. Experimental results show that the corrected transcripts have improved accuracy and yield better search results for slide-word queries.
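The HMM-based alignment described above can be illustrated with a small Viterbi decoder. This is a minimal sketch under stated assumptions, not the dissertation's algorithm: spelling similarity from Python's `difflib` stands in for the phonetic similarity the abstract describes, a hypothetical `FILLER` state absorbs transcript words that are not slide words, and the parameters `p_filler`, `filler_emit` and `min_sim` are illustrative values rather than ones taken from the thesis.

```python
import math
from difflib import SequenceMatcher

FILLER = "<filler>"  # hypothetical state for transcript words that are not slide words


def similarity(a: str, b: str) -> float:
    # Assumption: spelling similarity stands in for phonetic similarity.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def viterbi_align(transcript, slide_words, p_filler=0.5, filler_emit=0.1, min_sim=0.6):
    """Return the most likely hidden state (a slide word or FILLER) for each transcript word."""
    if not transcript:
        return []
    states = [FILLER] + list(slide_words)
    n_states, n_slide = len(states), len(slide_words)
    neg_inf = float("-inf")

    def log_trans(prev: int, nxt: int) -> float:
        # Illustrative uniform transitions: the previous state is ignored here.
        if nxt == 0:
            return math.log(p_filler)
        return math.log((1 - p_filler) / n_slide)

    def log_emit(state: int, word: str) -> float:
        if state == 0:
            return math.log(filler_emit)
        sim = similarity(word, states[state])
        return math.log(sim) if sim >= min_sim else neg_inf

    # score[j] = best log-probability of any path ending in state j at the current word.
    # The initial distribution reuses the (uniform) transition probabilities.
    score = [log_trans(0, j) + log_emit(j, transcript[0]) for j in range(n_states)]
    backptrs = []
    for word in transcript[1:]:
        new_score, ptr = [], []
        for j in range(n_states):
            best_i = max(range(n_states), key=lambda i: score[i] + log_trans(i, j))
            ptr.append(best_i)
            new_score.append(score[best_i] + log_trans(best_i, j) + log_emit(j, word))
        score = new_score
        backptrs.append(ptr)

    # Trace the best path backwards from the highest-scoring final state.
    path = [max(range(n_states), key=lambda j: score[j])]
    for ptr in reversed(backptrs):
        path.append(ptr[path[-1]])
    path.reverse()
    return [states[j] for j in path]


def correct(transcript, alignment):
    # Replace each transcript word aligned to a slide word with that slide word.
    return [w if s == FILLER else s for w, s in zip(transcript, alignment)]
```

For a transcript `["the", "notes", "in", "the", "graph"]` and slide words `["nodes", "graph", "edges"]`, "notes" aligns to the slide word "nodes" and `correct` rewrites it. Because the transitions ignore the previous state, decoding reduces to a per-word choice; a fuller model would also encode how likely the speaker is to move from one slide word to another.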
132

Mental representation and processing of syntactic structure : evidence from Chinese

Cai, Zhenguang January 2011
From the perspective of cognitive psychology, our knowledge of language can be viewed as mental representations and our use of language can be understood as the computation or processing of mental representations. This thesis explores the mental representation and processing of syntactic structure. The method used in this thesis is structural priming, a phenomenon in which people tend to repeat the linguistic structure that they have recently processed. The language under investigation is Chinese. The main research theme is divided into four questions. The first question is how syntactic structure is mentally represented. For a long time this has been a question for syntacticians, whose main evidence is their intuition. There are, however, recent calls for experimental methods in the investigation of syntactic representation. I propose that structural priming can be used as an experimental approach to the investigation of syntactic representation. More specifically, structural priming can illuminate the constituent structure of a syntactic construction and help us determine which syntactic analysis corresponds to the representation of the construction. Three structural priming experiments on some controversial constructions in Mandarin were reported to show that structural priming can be used to distinguish alternative analyses of a syntactic construction. The second question concerns the use of thematic and lexical information in grammatical encoding in sentence production. Models of grammatical encoding differ in the locus of conceptual effects on grammatical encoding and the extent to which grammatical encoding is lexically guided. Five experiments were reported on these two issues. First, the results indicate that thematic information affects grammatical encoding by prompting the processor to map thematic roles onto the same linear order as they were previously mapped. 
Though conceptual information was previously believed to affect only the assignment of grammatical functions (e.g., subject and object) to nouns (i.e., functional processing), this finding suggests that it can influence the linear order of sentence constituents (i.e., positional processing) as well. The results also show that the processor persists in using the same argument structure of the verb, implying that grammatical encoding is lexically guided to some extent. The third question concerns the processing of verb-phrase (VP) ellipsis in comprehension. Previous research on this topic disagrees on whether the interpretation of VP ellipsis is based on the syntactic or the semantic representation of the antecedent, and whether the antecedent representation is copied or reconstructed at the ellipsis site. An experiment was presented, and the results show no structural priming effect from the ellipsis site. This suggests that no syntactic structure is reconstructed at the ellipsis site, and possibly that the antecedent structure is not copied either. The results therefore favour a semantic account of VP ellipsis processing. The last question concerns the lexico-syntactic representation of cognates in Cantonese-Mandarin bilinguals. Previous research has paid little attention to whether cognates have shared or distinct lemmas in bilinguals. Two experiments show that the structural priming effect from the cognate of a verb was smaller than that from the verb itself, suggesting that Cantonese/Mandarin cognates have distinct rather than shared lemmas, though the syntactic information associated with cognates is collectively represented across the two languages. At the end of the thesis, I discuss the implications of these empirical studies and directions for further research.
133

Personality and alignment processes in dialogue : towards a lexically-based unified model

Brockmann, Carsten January 2009
This thesis explores approaches to modelling individual differences in language use. The differences under consideration fall into two broad categories: variation in the personality projected through language, and modelling of language alignment behaviour between dialogue partners. In a way, these two aspects oppose each other: language related to varying personalities should be recognisably different, while aligning speakers agree on common language during a dialogue. The central hypothesis is that such variation can be captured and produced with restricted computational means. Results from research on personality psychology and psycholinguistics are transformed into a series of lexically-based Affective Language Production Models (ALPMs) which are parameterisable for personality and alignment. The models are then explored by varying the parameters and observing the language they generate. ALPM-1 and ALPM-2 re-generate dialogues from existing utterances which are ranked and filtered according to manually selected linguistic and psycholinguistic features that were found to be related to personality. ALPM-3 is based on true overgeneration of paraphrases from semantic representations using the OpenCCG framework for Combinatory Categorial Grammar (CCG), in combination with corpus-based ranking and filtering by way of n-gram language models. Personality effects are achieved through language models built from the language of speakers of known personality. In ALPM-4, alignment is captured via a cache language model that remembers the previous utterance and thus influences the choice of the next. This model provides a unified treatment of personality and alignment processes in dialogue. In order to evaluate the ALPMs, dialogues between computer characters were generated and presented to human judges who were asked to assess the characters’ personality. 
In further internal simulations, cache language models were used to reproduce results of psycholinguistic priming studies. The experiments showed that the models are capable of producing natural language dialogue which exhibits human-like personality and alignment effects.
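The cache mechanism behind ALPM-4 can be sketched as a base language model interpolated with a cache built from the partner's previous utterance, used to rank candidate paraphrases. This is an illustrative simplification, not the thesis's model: it assumes a unigram base model with add-one smoothing instead of the n-gram models described above, and the class name `CacheLM` and the interpolation weight `cache_weight` are hypothetical.

```python
import math
from collections import Counter


class CacheLM:
    """Hypothetical unigram model interpolated with a cache of the previous utterance."""

    def __init__(self, corpus_tokens, cache_weight=0.3):
        self.base = Counter(corpus_tokens)
        self.total = sum(self.base.values())
        self.vocab_size = len(self.base) + 1  # +1 reserves smoothing mass for unseen words
        self.cache = Counter()
        self.cache_weight = cache_weight

    def remember(self, utterance):
        # Only the most recent utterance is kept in the cache.
        self.cache = Counter(utterance)

    def prob(self, word):
        # Add-one smoothed base probability, always > 0.
        p_base = (self.base[word] + 1) / (self.total + self.vocab_size)
        n_cache = sum(self.cache.values())
        if n_cache == 0:
            return p_base  # empty cache: fall back to the base model
        p_cache = self.cache[word] / n_cache
        return (1 - self.cache_weight) * p_base + self.cache_weight * p_cache

    def score(self, utterance):
        # Average log-probability per token, so longer candidates are not penalised.
        if not utterance:
            return float("-inf")
        return sum(math.log(self.prob(w)) for w in utterance) / len(utterance)


def pick(lm, candidates):
    # Choose the candidate paraphrase the (cache-biased) model prefers.
    return max(candidates, key=lm.score)
```

With a base corpus in which "sofa" is more frequent than "couch", `pick` prefers "i like the sofa"; after `remember("the couch is nice".split())` it switches to "i like the couch", mimicking a speaker aligning with the partner's word choice.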
134

Automated question answering for clinical comparison questions

Leonhard, Annette Christa January 2012
This thesis describes the development and evaluation of new automated Question Answering (QA) methods tailored to clinical comparison questions that give clinicians a rank-ordered list of MEDLINE® abstracts targeted to natural language clinical drug comparison questions (e.g. “Have any studies directly compared the effects of Pioglitazone and Rosiglitazone on the liver?”). Three corpora were created to develop and evaluate a new QA system for clinical comparison questions called RetroRank. RetroRank takes the clinician’s plain text question as input, processes it and outputs a rank-ordered list of potential answer candidates, i.e. MEDLINE® abstracts, that is reordered using new post-retrieval ranking strategies to ensure the most topically-relevant abstracts are displayed as high in the result set as possible. RetroRank achieves a significant improvement over the PubMed recency baseline and performs as well as or better than previous approaches to post-retrieval ranking that rely on query frames and annotated data, such as the approach by Demner-Fushman and Lin (2007). The performance of RetroRank shows that it is possible to successfully use natural language input and a fully automated approach to obtain answers to clinical drug comparison questions. This thesis also introduces two new evaluation corpora of clinical comparison questions with “gold standard” references that are freely available and are a valuable resource for future research in medical QA.
135

Using a rewriting system to model individual writing styles

Lin, Jing January 2012
Each individual has a distinctive writing style, but natural language generation (NLG) systems produce text with much less variety. Is it possible to produce more human-like text from NLG systems by mimicking the style of particular authors? We start by analysing the text of real authors. We collect a corpus of texts from a single genre (food recipes), with each text identified with its author, and summarise a variety of writing features in these texts. Each author's writing style is the combination of a set of features. Analysis of the writing features shows not only that each individual author writes differently but also that the differences are consistent over the whole of their corpus. Hence we conclude that authors do keep a consistent style consisting of a variety of different features. When we discuss notions such as the style and meaning of texts, we are referring to the reaction that readers have to them. It is important, therefore, in the field of computational linguistics to experiment by showing texts to people and assessing their interpretation of the texts. In our research we move the thesis from simple discussion and statistical analysis of the properties of text and NLG systems to experiments that verify the actual impact that lexical preference has on real readers. Through experiments that require participants to follow a recipe and prepare food, we conclude that it is possible to alter the lexicon of a recipe without altering the actions performed by the cook, hence that word choice is an aspect of style rather than semantics, and also that word choice is one of the writing features employed by readers in identifying the author of a text. Among all writing features, individual lexical preference is very important both for analysing and generating texts, so we choose individual lexical choice as our principal topic of research. 
Using a modified version of distributional similarity (DS) helps us to choose words used by individual authors without the limitations of many other solutions, such as a pre-built thesaurus. We present an algorithm for analysis and rewriting, and assess the results. Based on the results, we propose some further improvements.
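Plain distributional similarity, the starting point that the thesis modifies, can be illustrated with window-based co-occurrence vectors compared by cosine similarity. The sketch below is a hypothetical illustration rather than the thesis's algorithm: `rewrite_word` and its `threshold` are assumptions, and it simply swaps a word for the distributionally similar word that the target author uses most often.

```python
import math
from collections import Counter, defaultdict


def context_vectors(sentences, window=2):
    # Count the words that appear within `window` positions of each word.
    vecs = defaultdict(Counter)
    for sent in sentences:
        for i, w in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vecs[w][sent[j]] += 1
    return vecs


def cosine(u, v):
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(x * x for x in u.values()))
    nv = math.sqrt(sum(x * x for x in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def rewrite_word(word, vecs, author_counts, threshold=0.8):
    """Hypothetical rewrite: prefer the author's most frequent distributional neighbour."""
    target = vecs.get(word, Counter())
    best, best_count = word, author_counts.get(word, 0)
    for cand, count in author_counts.items():
        if cand != word and count > best_count and cosine(target, vecs.get(cand, Counter())) >= threshold:
            best, best_count = cand, count
    return best
```

In a toy recipe corpus where "chop" and "dice" occur in identical contexts, an author who writes "dice" more often gets "chop" rewritten to "dice", while "heat", whose contexts differ, is left alone.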
136

Domain independent generation from RDF instance data

Sun, Xiantang January 2008
The next generation of the web, the Semantic Web, integrates distributed web resources from various domains by allowing data (instantial and ontological data) to be shared and reused across application, enterprise and community boundaries based on the Resource Description Framework (RDF). Nevertheless, RDF was not developed for casual users who are unfamiliar with it but interested in data represented using it. Natural language generation (NLG) may be a possible solution for bridging the gap between casual users and RDF data, but the cost of separately applying fine-grained NLG techniques to every domain in the Semantic Web would be extremely high, and hence not realistic.
137

Topics in Arabic auditory word recognition: effects of morphology and diglossia

Al-Omari, Moh'd A. 05 January 2017
This dissertation investigates the cognitive relevance of Arabic morphology and diglossia in spoken word recognition. The current study asks four main questions: (1) Does Arabic morphology influence word recognition? (2) Which view of Arabic morphology (i.e., the root-based or the stem-based) has an online role in spoken word recognition? (3) Does Arabic diglossia (i.e., using colloquial Arabic (CA) and Modern Standard Arabic (MSA) as the dominant language of speaking and literacy, respectively) affect spoken word processing? (4) How can Arabic diglossia affect spoken word recognition? Three different lexical decision experiments and one phoneme-monitoring task were designed and conducted on a group of 140 literate native speakers of Jordanian colloquial Arabic (JCA). In the first experiment, participants responded to MSA words that varied in their surface, root, and stem frequencies. Results revealed that the token frequencies of the three tested units affected the speed of word recognition to the same extent. This suggests that both roots and stems, along with the surface words, are valid units of the Arabic mental lexicon. The next two experiments compared the processing of JCA and MSA words when embedded in sentences of the same or the other variety of Arabic and when primed by intra-variety vs. cross-variety words. Results showed a lexical switching cost only when the target word was processed in a sentential context. Moreover, while the sentence experiment reported a processing advantage for MSA words relative to JCA words, the priming experiment found a processing advantage for JCA words. The priming effects were larger when the related primes were presented in JCA than when they were presented in MSA. The fourth experiment compared phoneme monitoring of consonants and short vowels in JCA and MSA words. 
Results showed a detection advantage for consonants relative to short vowels and no difference between the carrier words of the two varieties of Arabic. On the whole, the last three experiments suggest that both spoken language (i.e., CA) experience and literary language (i.e., MSA) experience can affect auditory word recognition. This work emphasizes the relevance of (alphabetic) literacy and experimental task in speech processing. / February 2017
138

Automatic Tagging of Communication Data

Hoyt, Matthew Ray 08 1900
Globally distributed software teams are widespread throughout industry. But finding reliable methods that can properly assess a team's activities is a real challenge. Methods such as surveys and manual coding of activities are too time consuming and are often unreliable. Recent advances in information retrieval and linguistics, however, suggest that automated and/or semi-automated text classification algorithms could be an effective way of finding differences in the communication patterns among individuals and groups. Communication among group members is frequent and generates a significant amount of data. Thus having a web-based tool that can automatically analyze the communication patterns among global software teams could lead to a better understanding of group performance. The goal of this thesis, therefore, is to compare automatic and semi-automatic measures of communication and evaluate their effectiveness in classifying different types of group activities that occur within a global software development project. In order to achieve this goal, we developed a web-based component that can be used to help clean and classify communication activities. The component was then used to compare different automated text classification techniques on various group activities to determine their effectiveness in correctly classifying data from a global software development team project.
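As one concrete example of automated text classification applied to communication data, a multinomial naive Bayes classifier fits in a few lines. The abstract does not name the techniques that were compared, so this choice of classifier, the class name `NaiveBayesTagger`, and the activity labels used with it are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTagger:
    """Hypothetical multinomial naive Bayes tagger with add-one smoothing."""

    def fit(self, messages, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for msg, label in zip(messages, labels):
            self.word_counts[label].update(msg.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, message):
        words = message.lower().split()
        n = sum(self.label_counts.values())

        def log_posterior(label):
            # log P(label) + sum of log P(word | label), with add-one smoothing.
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            return math.log(self.label_counts[label] / n) + sum(
                math.log((counts[w] + 1) / denom) for w in words)

        return max(self.label_counts, key=log_posterior)
```

Trained on a handful of messages tagged "technical" or "coordination", the tagger labels "bug in the build" as technical and "schedule a meeting" as coordination; comparing such a baseline against other classifiers on held-out messages is the kind of evaluation the abstract describes.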
139

Content-Based Geolocation Prediction of Canadian Twitter Users and Their Tweets

Metin, Ali Mert 13 August 2019
The last decade has witnessed the rise of online social networks, especially Twitter. Today, Twitter is a giant social platform with over 250 million users who produce massive amounts of data every day. This creates many research opportunities, specifically for Natural Language Processing (NLP), in which text is utilized to extract information that can be used in many applications. One problem NLP might help solve is geolocation inference, or geolocation detection, from online social networks. Detecting the location of Twitter users based on the text of their tweets is useful, since not many users publicly declare their locations or geotag their tweets. Location information is crucial for a variety of applications such as event detection, disease and illness tracking, and user profiling. These tasks are not trivial, because online content is often noisy; it includes misspellings, incomplete words or phrases, idiomatic expressions, abbreviations, acronyms, and Twitter-specific literature. In this work, we attempted to detect the location of Canadian users (and of tweets sent from Canada) at the metropolitan-area and province levels; to the best of our knowledge, this had not been done before. To do this, we collected two different datasets and applied a variety of machine learning methods, including deep learning. We also attempted to geolocate users based on their social graph (i.e., a user's friends and followers) as a novel approach.
140

Geographic referring expressions : doing geometry with words

Gomes de Oliveira, Rodrigo January 2017
No description available.
