1 |
A probabilistic approach for Chinese information retrieval: theory, analysis and experiments
Huang, Xiangji, January 2001
No description available.
|
2 |
Record Linkage
Larsen, Stasha Ann Bown, 11 December 2013
This document explains the use of different metrics involved in record linkage. There are two forms of record linkage: deterministic and probabilistic. We focus on probabilistic record linkage as used in merging and updating two databases. Record pairs are compared using character-based and phonetic-based similarity metrics to determine the level at which they match. Performance measures are then calculated and Receiver Operating Characteristic (ROC) curves are constructed. Finally, an economic model is applied that returns the optimal tolerance level that two databases should use to declare a record pair a match in order to maximize profit.
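As a rough illustration of the comparison step described in this abstract, the sketch below scores a pair of names with one character-based metric and one phonetic metric and then applies a tolerance level. It is not the thesis's method: the choice of `difflib.SequenceMatcher` and Soundex, the equal weighting of the two scores, and the names `pair_score` and `is_match` are all assumptions made for the example.

```python
from difflib import SequenceMatcher

# Soundex digit groups (letters not listed here carry no code).
_SOUNDEX_GROUPS = (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                   ("L", "4"), ("MN", "5"), ("R", "6"))
_SOUNDEX = {ch: digit for letters, digit in _SOUNDEX_GROUPS for ch in letters}


def soundex(name):
    """Return the four-character Soundex code of a name ('' if no letters)."""
    letters = [ch for ch in name.upper() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = _SOUNDEX.get(letters[0], "")
    for ch in letters[1:]:
        digit = _SOUNDEX.get(ch, "")
        if digit and digit != prev:
            code += digit
        if ch not in "HW":  # H and W do not separate repeated codes
            prev = digit
    return (code + "000")[:4]


def pair_score(a, b):
    """Average of a character-based ratio and a phonetic agreement flag.

    The 50/50 weighting is a hypothetical choice for this sketch.
    """
    char_sim = SequenceMatcher(None, a.lower(), b.lower()).ratio()
    phon_sim = 1.0 if soundex(a) == soundex(b) else 0.0
    return 0.5 * char_sim + 0.5 * phon_sim


def is_match(a, b, tolerance):
    """Declare a record pair a match when its score reaches the tolerance."""
    return pair_score(a, b) >= tolerance
```

Raising or lowering `tolerance` trades false matches against missed matches, which is the trade-off that ROC curves summarise.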
|
3 |
Spelling Normalization of English Student Writings
HONG, Yuchan, January 2018
Spelling normalization is the task of converting non-standard words into standard words in texts. It reduces the number of out-of-vocabulary (OOV) words for natural language processing (NLP) tasks such as information retrieval, machine translation, and opinion mining, and thereby improves the performance of NLP applications on the normalized texts. In this thesis, we explore different methods for spelling normalization of English student writings, including traditional Levenshtein edit distance comparison, phonetic similarity comparison, character-based Statistical Machine Translation (SMT), and character-based Neural Machine Translation (NMT). A key contribution of our implementation is an approach that combines Levenshtein edit distance and phonetic similarity with added frequency-count and compound-splitting components; it is evaluated as the best approach, with a 0.329% accuracy improvement and a 63.63% error reduction on the original unnormalized test set.
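The edit-distance component mentioned in this abstract can be sketched roughly as follows. This is an illustration only, not the thesis's implementation: the word counts, the `max_distance` cut-off and the function name `normalize` are hypothetical, and the phonetic and compound-splitting components are left out.

```python
def levenshtein(a, b):
    """Classic dynamic-programming Levenshtein distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]


def normalize(word, lexicon_counts, max_distance=2):
    """Map a word to its closest lexicon entry, preferring frequent words on ties.

    lexicon_counts: dict of standard word -> corpus frequency (hypothetical data).
    Words already in the lexicon, or too far from every entry, are returned unchanged.
    """
    if word in lexicon_counts or not lexicon_counts:
        return word
    best = min(lexicon_counts,
               key=lambda w: (levenshtein(word, w), -lexicon_counts[w]))
    return best if levenshtein(word, best) <= max_distance else word
```

For example, with the made-up counts `{"writing": 40, "written": 25, "waiting": 30}`, the misspelling `writting` maps to `writing`, its only neighbour at distance 1.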
|
4 |
Znakově-orientované metody DNA barcodingu / Character-based methods for DNA barcoding
Kalianková, Kateřina, January 2015
This work deals with character-based DNA barcoding. The introduction describes DNA barcoding in general and character-based methods in particular. The next part covers the CAOS (Characteristic Attribute Organization System) and BLOG (Barcoding with LOGic) methods. The practical part describes the programs, and the final part presents the results.
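To make the idea of a character-based barcode concrete, the sketch below finds simple diagnostic characters in an alignment: positions where every individual of a species carries the same nucleotide and no other species carries it. This is a deliberately simplified illustration, not the CAOS or BLOG algorithm; the function name and the input layout are assumptions.

```python
def diagnostic_characters(alignment):
    """Find positions whose nucleotide is fixed in one species and absent elsewhere.

    alignment: dict of species name -> list of aligned, equal-length sequences
               (hypothetical input layout for this sketch).
    Returns a dict of species name -> {position: nucleotide}.
    """
    length = len(next(iter(alignment.values()))[0])
    result = {species: {} for species in alignment}
    for pos in range(length):
        # Set of nucleotides observed at this position, per species.
        states = {sp: {seq[pos] for seq in seqs} for sp, seqs in alignment.items()}
        for sp, own in states.items():
            if len(own) != 1:
                continue  # polymorphic within the species, so not diagnostic
            others = set().union(*(s for o, s in states.items() if o != sp))
            (base,) = own
            if base not in others:
                result[sp][pos] = base
    return result
```

A query sequence can then be assigned to the species whose diagnostic characters it carries, rather than to its nearest neighbour by overall distance.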
|
5 |
Znakově-orientované metody DNA barcodingu / Character-based methods for DNA barcoding
Kalianková, Kateřina, January 2016
This work deals with character-based DNA barcoding. The introduction describes DNA barcoding in general and character-based methods in particular. The next part covers the CAOS (Characteristic Attribute Organization System), BLOG (Barcoding with LOGic), and BLAST methods. The practical part describes the programs, and the final part presents the results.
|
6 |
Spelling Normalisation and Linguistic Analysis of Historical Text for Information Extraction
Pettersson, Eva, January 2016
Historical text constitutes a rich source of information for historians and other researchers in the humanities. Many texts, however, are not available in an electronic format, and even if they are, there is a lack of NLP tools designed to handle historical text. In my thesis, I aim to provide a generic workflow for automatic linguistic analysis and information extraction from historical text, with spelling normalisation as a core component in the pipeline. In the spelling normalisation step, the historical input text is automatically normalised to a more modern spelling, enabling the use of existing taggers and parsers trained on modern language data in the succeeding linguistic analysis step. In the final information extraction step, certain linguistic structures are identified based on the annotation labels given by the NLP tools, and ranked in accordance with the specific information need expressed by the user. An important consideration in my implementation is that the pipeline should be applicable to different languages, time periods, genres, and information needs by simply substituting the language resources used in each module. Furthermore, the reuse of existing NLP tools developed for the modern language is crucial, considering the lack of linguistically annotated historical data combined with the high variability in historical text, which makes it hard to train NLP tools specifically aimed at analysing historical text. In my evaluation, I show that spelling normalisation can be a very useful technique for easy access to historical information content, even in cases where there is little (or no) annotated historical training data available. For the specific information extraction task of automatically identifying verb phrases describing work in Early Modern Swedish text, 91 out of the 100 top-ranked instances are true positives in the best setting.
|
7 |
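Identificação de espécies de carnívoros (Mammalia, Carnivora) utilizando sequências de DNA e sua aplicação em amostras não-invasivas / Identification of carnivore species (Mammalia, Carnivora) using DNA sequences and its application to noninvasive samples
Chaves, Paulo Bomfim, 20 March 2008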
DNA sequences for species-level identification of biological materials have achieved considerable
popularity in the last few years, especially in the context of the DNA barcoding initiative. Species
assignment of biological samples such as hairs, feathers, pelts and particularly faeces is a crucial step
for those interested in studying ecology and evolution of many species with these samples. This is
especially the case for carnivores, whose elusive habits and low densities highlight the importance of
studies based on noninvasive samples. However, the current lack of standardized assays for
carnivore identification often poses challenges to the large-scale application of this approach, as well
as the cross-comparison of results among sites. Here we evaluate the potential of two short (<250 bp)
mitochondrial DNA (mtDNA) segments located within the genes ATP synthase 6 and cytochrome
oxidase I as standardized markers for carnivore identification. Between one and eleven individuals of
66 carnivore species were sequenced for one or both of these mtDNA segments and analyzed using
three different approaches (tree-based, distance-based and character-based), in conjunction with
sequences retrieved from public databases. In most cases, conspecific individuals had lower genetic
distances from each other relative to other species, resulting in diagnosable monophyletic clusters.
Notable exceptions were the more recently diverged species, some of which could still be identified
using diagnostic character attributes, species-specific haplotypes, or by reducing the geographic
scope of the comparison (restricting the analysis to a single zoogeographic region). Additional in silico
analyses using a short cytochrome b segment frequently employed in carnivore identification were
also performed to compare its performance with that of our two focal markers. We then tested the
performance of these segments in the identification of carnivore faeces via three case studies: (i) felid
faeces collected in a controlled zoo experiment, aimed at assessing whether DNA from rabbit prey
would contaminate the resulting sequences; (ii) field-collected faeces from the Brazilian Cerrado
presumed to be from maned wolves and containing prey remains (hairs, bones, feathers), aimed at
investigating the efficiency of predator identification and occurrence of prey DNA interference; and (iii)
field-collected scats from an Atlantic Forest study site, also addressing the issue of PCR success rate
and identification efficiency. In spite of some relevant differences in some aspects of their
performance, our results indicate that both of our focal segments have a good potential to serve as
efficient molecular markers for accurate species-level identification of carnivore samples.
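The distance-based approach named in this abstract can be illustrated with a minimal sketch: compute the uncorrected p-distance (the proportion of differing sites) between a query and each reference sequence, and report the species of the closest reference. The function names and the reference layout are hypothetical, and the study itself may have used a different distance model.

```python
def p_distance(a, b):
    """Proportion of aligned sites at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def identify(query, references):
    """Return (species, distance) for the reference closest to the query.

    references: list of (species name, aligned sequence) pairs
                (hypothetical layout for this sketch).
    """
    species, seq = min(references, key=lambda ref: p_distance(query, ref[1]))
    return species, p_distance(query, seq)
```

As the abstract notes, recently diverged species can lie too close together for this criterion alone, which is where diagnostic characters or a narrower geographic scope come in.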
|