Global ETD Search

1	A probabilistic approach for Chinese information retrieval: theory, analysis and experiments Huang, Xiangji January 2001 No description available. 020 Character based systems
2	Record Linkage Larsen, Stasha Ann Bown 11 December 2013 This document explains the use of different metrics involved with record linkage. There are two forms of record linkage: deterministic and probabilistic. We will focus on probabilistic record linkage used in merging and updating two databases. Record pairs will be compared using character-based and phonetic-based similarity metrics to determine at what level they match. Performance measures are then calculated and Receiver Operating Characteristic (ROC) curves are formed. Finally, an economic model is applied that returns the optimal tolerance level two databases should use to determine a record pair match in order to maximize profit. Probabilistic record linkage Character-based similarity metrics Phonetic-based similarity metrics ROC curves Mathematics
3	Spelling Normalization of English Student Writings HONG, Yuchan January 2018 Spelling normalization is the task of normalizing non-standard words into standard words in texts. It decreases the number of out-of-vocabulary (OOV) words in texts for natural language processing (NLP) tasks such as information retrieval, machine translation, and opinion mining, and so improves the performance of various NLP applications on normalized texts. In this thesis, we explore different methods for spelling normalization of English student writings, including traditional Levenshtein edit distance comparison, phonetic similarity comparison, character-based Statistical Machine Translation (SMT) and character-based Neural Machine Translation (NMT) methods. An important improvement of our implementation is an approach that combines the Levenshtein edit distance and phonetic similarity methods with added components of frequency count and compound splitting; it is evaluated as the best approach, with 0.329% accuracy improvement and 63.63% error reduction on the original unnormalized test set. spelling normalization English student writings phonetic similarity comparison Levenshtein edit distance General Language Studies and Linguistics
4	Znakově-orientované metody DNA barcodingu / Character-based methods for DNA barcoding Kalianková, Kateřina January 2015 This work deals with character-based DNA barcoding. The introduction describes DNA barcoding and character-based DNA barcoding methods. A further part covers the CAOS (Characteristic Attributes Organization) and BLOG (Barcoding with LOGic) methods. The practical part describes the programs, and the final part presents the results.
5	Znakově-orientované metody DNA barcodingu / Character-based methods for DNA barcoding Kalianková, Kateřina January 2016 This work deals with character-based DNA barcoding. The introduction describes DNA barcoding and character-based DNA barcoding methods. A further part covers the CAOS (Characteristic Attributes Organization), BLOG (Barcoding with LOGic) and BLAST methods. The practical part describes the programs, and the final part presents the results.
6	Spelling Normalisation and Linguistic Analysis of Historical Text for Information Extraction Pettersson, Eva January 2016 Historical text constitutes a rich source of information for historians and other researchers in humanities. Many texts are, however, not available in an electronic format, and even if they are, there is a lack of NLP tools designed to handle historical text. In my thesis, I aim to provide a generic workflow for automatic linguistic analysis and information extraction from historical text, with spelling normalisation as a core component in the pipeline. In the spelling normalisation step, the historical input text is automatically normalised to a more modern spelling, enabling the use of existing taggers and parsers trained on modern language data in the succeeding linguistic analysis step. In the final information extraction step, certain linguistic structures are identified based on the annotation labels given by the NLP tools, and ranked in accordance with the specific information need expressed by the user. An important consideration in my implementation is that the pipeline should be applicable to different languages, time periods, genres, and information needs by simply substituting the language resources used in each module. Furthermore, the reuse of existing NLP tools developed for the modern language is crucial, considering the lack of linguistically annotated historical data combined with the high variability in historical text, making it hard to train NLP tools specifically aimed at analysing historical text. In my evaluation, I show that spelling normalisation can be a very useful technique for easy access to historical information content, even in cases where there is little (or no) annotated historical training data available. 
For the specific information extraction task of automatically identifying verb phrases describing work in Early Modern Swedish text, 91 out of the 100 top-ranked instances are true positives in the best setting. NLP for historical text spelling normalisation digital humanities information extraction SMT Levenshtein edit distance language technology computational linguistics
7	Identificação de espécies de carnívoros (Mammalia, Carnivora) utilizando sequências de DNA e sua aplicação em amostras não-invasivas / Identification of carnivore species (Mammalia, Carnivora) using DNA sequences and its application to noninvasive samples Chaves, Paulo Bomfim 20 March 2008 DNA sequences for species-level identification of biological materials have achieved considerable popularity in the last few years, especially in the context of the DNA barcoding initiative. 
Species assignment of biological samples such as hairs, feathers, pelts and particularly faeces is a crucial step for those interested in studying ecology and evolution of many species with these samples. This is especially the case for carnivores, whose elusive habits and low densities highlight the importance of studies based on noninvasive samples. However, the current lack of standardized assays for carnivore identification often poses challenges to the large-scale application of this approach, as well as the cross-comparison of results among sites. Here we evaluate the potential of two short (<250 bp) mitochondrial DNA (mtDNA) segments located within the genes ATP synthase 6 and cytochrome oxidase I as standardized markers for carnivore identification. Between one and eleven individuals of 66 carnivore species were sequenced for one or both of these mtDNA segments and analyzed using three different approaches (tree-based, distance-based and character-based), in conjunction with sequences retrieved from public databases. In most cases, conspecific individuals had lower genetic distances from each other relative to other species, resulting in diagnosable monophyletic clusters. Notable exceptions were the more recently diverged species, some of which could still be identified using diagnostic character attributes, species-specific haplotypes, or by reducing the geographic scope of the comparison (restricting the analysis to a single zoogeographic region). Additional in silico analyses using a short cytochrome b segment frequently employed in carnivore identification were also performed aiming to compare performance to that of our two focal markers. 
We then tested the performance of these segments in the identification of carnivore faeces via three case studies: (i) felid faeces collected in a controlled zoo experiment, aimed at assessing whether DNA from rabbit prey would contaminate the resulting sequences; (ii) field-collected faeces from the Brazilian Cerrado presumed to be from maned wolves and containing prey remains (hairs, bones, feathers), aimed at investigating the efficiency of predator identification and occurrence of prey DNA interference; and (iii) field-collected scats from an Atlantic Forest study site, also addressing the issue of PCR success rate and identification efficiency. In spite of some relevant differences in some aspects of their performance, our results indicate that both of our focal segments have a good potential to serve as efficient molecular markers for accurate species-level identification of carnivore samples. DNA Barcoding Character-Based COI ATP6 Faeces Species Identification Biological Sciences::Zoology
