111

Segmentação de sentenças e detecção de disfluências em narrativas transcritas de testes neuropsicológicos / Sentence Segmentation and Disfluency Detection in Narrative Transcripts from Neuropsychological Tests

Treviso, Marcos Vinícius 20 December 2017
Background: In recent years, mild cognitive impairment (MCI) has received great attention because it may represent a pre-clinical stage of Alzheimer's disease (AD). Several studies on distinguishing healthy elderly controls (CTL) from MCI patients have shown that speech production is a sensitive task for detecting aging effects and for differentiating individuals with MCI from healthy ones. Natural Language Processing (NLP) tools have been applied to transcripts of narratives in English and also in Brazilian Portuguese, for example, the Coh-Metrix-Dementia environment.
Gaps: However, the absence of sentence boundary information and the presence of disfluencies in transcripts prevent the direct application of tools that depend on well-formed text, such as taggers and parsers.
Objectives: The main objective of this work is to develop methods to segment the transcripts into sentences and to detect and remove the disfluencies present in them (independently and jointly), so that the output can serve as a preprocessing step for subsequent NLP tools.
Methods and Evaluation: We proposed a method based on recurrent convolutional neural networks (RCNNs) with prosodic, morphosyntactic and word-embedding features for the sentence segmentation (SS) task. For the disfluency detection (DD) task, we divided the method and the evaluation according to the categories of disfluencies: (i) for fillers (filled pauses and discourse markers), we proposed the same RCNN with the same SS features along with a predetermined list of words; (ii) for edit disfluencies (repetitions, revisions and restarts), we added features traditionally employed in related work and introduced a CRF model after the RCNN output layer. We evaluated all the tasks intrinsically, analyzing the most important features, comparing the proposed methods to simpler ones, and identifying the main hits and misses. In addition, a final method, called DeepBonDD, was created by combining all tasks and was evaluated extrinsically using 9 syntactic metrics of Coh-Metrix-Dementia.
Conclusion: For SS, we obtained F1 = 0.77 on CTL transcripts and F1 = 0.74 on MCI transcripts, achieving the state of the art for this task on impaired speech. For filler detection, we obtained, on average, F1 = 0.90 for CTL and F1 = 0.92 for MCI, results that are within the range reported by related work on English. When restarts were ignored in the detection of edit disfluencies, we obtained, on average, F1 = 0.70 for CTL and F1 = 0.75 for MCI. In the extrinsic evaluation, only 3 metrics showed a significant difference between the manual MCI transcripts and those generated by DeepBonDD, suggesting that, despite variations in sentence boundaries and disfluencies, DeepBonDD is able to generate transcripts that can be processed by NLP tools.
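The F1 values reported for sentence segmentation can be read as the harmonic mean of precision and recall over predicted sentence boundaries. The following is only a minimal sketch under that assumption: the abstract does not say how boundaries were scored, and the function name `boundary_f1` and the representation of boundaries as token indices are hypothetical choices made for illustration.

```python
def boundary_f1(gold, predicted):
    """F1 over sentence boundaries.

    Each boundary is represented (by assumption) as the index of the
    token after which a sentence ends.
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    if true_positives == 0:
        # No correct boundary: precision and recall are both zero.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    # Harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: three gold boundaries, four predicted ones.
score = boundary_f1(gold=[4, 9, 15], predicted=[4, 9, 12, 15])
print(round(score, 3))
```

In this toy case one spurious boundary lowers precision to 0.75 while recall stays at 1.0, which is the kind of trade-off a single F1 figure summarizes.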
112

Outomatiese Afrikaanse tekseenheididentifisering / Automatic Afrikaans text unit identification, by Martin J. Puttkammer

Puttkammer, Martin Johannes January 2006
An important core technology in the development of human language technology applications is an automatic morphological analyser. Such a morphological analyser consists of various modules, one of which is a tokeniser. At present no tokeniser exists for Afrikaans, and it has therefore been impossible to develop a morphological analyser for Afrikaans. This research project develops such a tokeniser and has two objectives: i) to postulate a tag set for integrated tokenisation, and ii) to develop an algorithm for integrated tokenisation. To achieve the first objective, a tag set for the tagging of sentences, named entities, words, abbreviations and punctuation is proposed specifically for the annotation of Afrikaans texts. It consists of 51 tags, which can be expanded in future to establish a larger, more specific tag set. The postulated tag set can also be simplified according to the level of specificity required by the user. It is subsequently shown that an effective tokeniser cannot be developed using only linguistic or only statistical methods. This is due to the complexity of the task: rule-based modules should be used for certain processes (for example, sentence recognition), while other processes (for example, named-entity recognition) can only be executed successfully by means of a machine-learning module. It is argued that a hybrid system (a system in which rule-based and statistical components are integrated) would achieve the best results on Afrikaans tokenisation. Various rule-based and statistical techniques, including a TiMBL-based classifier, are then employed to develop such a hybrid tokeniser for Afrikaans. The final tokeniser achieves an f-score of 97.25% when the complete set of tags is used. For sentence recognition an f-score of 100% is achieved. The tokeniser also recognises 81.39% of named entities. When a simplified tag set (consisting of only 12 tags) is used to annotate named entities, the f-score rises to 94.74%. The conclusion of the study is that a hybrid approach is indeed suitable for Afrikaans sentencisation, named-entity recognition and tokenisation. The tokeniser will improve if it is trained with more data, while the expansion of gazetteers as well as the tag set will also lead to a more accurate system. / Thesis (M.A. (Applied Language and Literary Studies))--North-West University, Potchefstroom Campus, 2006.
113

Funkcinė stilistinė sintaksinių priemonių diferenciacija V. Juknaitės tekstuose / Functional stylistic differentiation of syntactic means in V. Juknaitė's texts

Pranskevičiūtė, Inga 11 July 2011
The criteria for distinguishing the functional styles of Lithuanian depend on the features that individual researchers emphasize. The Dictionary of Linguistic Terms describes a functional style as a language variety used in a particular sphere of human activity. The following functional styles can be distinguished: everyday, administrative, artistic, publicistic and scientific. Each functional style uses different linguistic means and organizes them in its own way. Genre is one of the more important stylistic categories through which information is conveyed. It is a very handy tool for classifying texts, but it lacks clearly defined classification criteria. The object of this work is the sentence and its functioning in V. Juknaitė's texts across various language varieties and genres. Sentence length is one of the sentence parameters that helps reveal the distinctiveness of a text. This work attempts, for the first time, to analyze spoken-language sentences and their parameters; sentence length in spoken language has not yet been studied. Having examined the structure of sentence length in spoken and written language and the distinctive linguistic expression of V. Juknaitė's letters, one can speak of the author's idiostyle. It is authorial will that shapes the idiostyle of V. Juknaitė's various forms of writing (fiction, publicistic and scientific texts) as well as of her spoken language. The writer's aim of shaping changes in a work's space and time through artistic images encourages the use of various means of expression and their selection according to individual needs. Therefore language is a very important tool... [see full text]
115

Zpracování vět s věrohodnými a nevěrohodnými aktanty v češtině / Processing of sentences containing plausible and implausible actants in Czech

Bažantová, Olga January 2017
This diploma thesis is part of the research on good-enough sentence processing. In the theoretical part, I describe the origin of this approach and its main research areas: garden-path sentences and noncanonical sentences. The practical part of the thesis introduces three experiments that partially replicate the experiments of F. Ferreira (2003), presents and interprets their results, and compares them with the results of the experiments in English. The Czech results show that Czech speakers, unlike English speakers, tend to rely only on the plausibility heuristic and do not use the NVN strategy.
117

Mean Length of Utterance and Developmental Sentence Scoring in the Analysis of Children's Language Samples

Chamberlain, Laurie Lynne 01 June 2016
Developmental Sentence Scoring (DSS) is a standardized language sample analysis procedure that uses complete sentences to evaluate and score a child’s use of standard American-English grammatical rules. Automated DSS software can potentially increase efficiency and decrease the time needed for DSS analysis. This study examines the accuracy of one automated DSS software program, DSSA Version 2.0, compared to manual DSS scoring on previously collected language samples from 30 children between the ages of 2;5 and 7;11 (years;months). The overall accuracy of DSSA 2.0 was 86%. Additionally, the present study sought to determine the relationship between DSS, DSSA Version 2.0, the mean length of utterance (MLU), and age. MLU is a measure of linguistic ability in children and is a widely used indicator of language impairment. This study found that MLU and DSS are both strongly correlated with age and that these correlations are statistically significant, r = .605, p < .001 and r = .723, p < .001, respectively. In addition, MLU and DSSA were also strongly correlated with age, and these correlations were statistically significant, r = .605, p < .001 and r = .669, p < .001, respectively. The correlation between MLU and DSS was high and statistically significant, r = .873, p < .001, indicating that the correlation between MLU and DSS is not simply an artifact of both measures being correlated with age. Furthermore, the correlation between MLU and DSSA was high, r = .794, suggesting that the correlation between MLU and DSSA is not simply an artifact of both variables being correlated with age. Lastly, the relationship between DSS and age while controlling for MLU was moderate but still statistically significant, r = .501, p = .006. Therefore, DSS appears to add information beyond MLU.
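The reported relationship between DSS and age while controlling for MLU can be cross-checked from the three pairwise correlations given in the abstract. The sketch below assumes the standard first-order partial correlation formula; the abstract does not state which procedure the author used, and the function and variable names are chosen here for illustration only.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y, controlling for z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Pairwise correlations as reported in the abstract (rounded values).
r_dss_age = 0.723
r_dss_mlu = 0.873
r_age_mlu = 0.605

# x = DSS, y = age, z = MLU
r_dss_age_given_mlu = partial_correlation(r_dss_age, r_dss_mlu, r_age_mlu)
print(round(r_dss_age_given_mlu, 2))
```

Because the inputs are already rounded to three decimals, the result agrees with the reported r = .501 only approximately (to about two decimal places).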
118

On Construction of a Manual for Item 27 on the SCTi-MAP

Zavarella, Cristi A. 16 June 2009
No description available.
119

An Analysis of Sentence Repetitions in a Single-Talker Interference Task

Parlette, Hilary 28 April 2015
No description available.
120

Children's Sentence Comprehension: The Influence of Working Memory on Lexical Retrieval During Complex Sentence Processing

Finney, Mianisha C. 19 September 2016
No description available.
