231 |
Molecular similarity and xenobiotic metabolism. Adams, Samuel E. January 2010 (has links)
MetaPrint2D, a new software tool implementing a data-mining approach for predicting sites of xenobiotic metabolism, has been developed. The algorithm is based on a statistical analysis of the occurrences of atom-centred circular fingerprints in both substrates and metabolites. This approach has undergone extensive evaluation and been shown to be of comparable accuracy to current best-in-class tools, but is able to make much faster predictions, for the first time enabling chemists to explore the effects of structural modifications on a compound’s metabolism in a highly responsive and interactive manner. MetaPrint2D is able to assign a confidence score to the predictions it generates, based on the availability of relevant data and the degree to which a compound is modelled by the algorithm. In the course of the evaluation of MetaPrint2D, a novel metric for assessing the performance of site-of-metabolism predictions has been introduced. This overcomes the bias, introduced by molecule size and the number of sites of metabolism, that is inherent to the most commonly reported metrics used to evaluate site-of-metabolism predictions. This data-mining approach to site-of-metabolism prediction has been augmented by a set of reaction type definitions to produce MetaPrint2D-React, enabling prediction of the types of transformations a compound is likely to undergo and the metabolites that are formed. This approach has been evaluated against both historical data and metabolic schemes reported in a number of recently published studies. Results suggest that the ability of this method to predict metabolic transformations is highly dependent on the relevance of the training set data to the query compounds. MetaPrint2D has been released as an open-source software library, and both MetaPrint2D and MetaPrint2D-React are available for chemists to use through the Unilever Centre for Molecular Science Informatics website.
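The occurrence-ratio idea behind this kind of data mining can be sketched as follows. This is a simplified illustration, not the released MetaPrint2D code; it assumes the fingerprinting step yields one hashable environment descriptor per atom:

```python
from collections import Counter

def occurrence_ratios(substrate_envs, metabolized_envs):
    """Score atom environments by how often they coincide with metabolism.

    substrate_envs: descriptors of every atom environment seen in substrates
    metabolized_envs: descriptors seen at reported sites of metabolism
    Returns a dict mapping each known environment to a ratio in [0, 1].
    """
    substrate_counts = Counter(substrate_envs)
    metabolized_counts = Counter(metabolized_envs)
    return {env: metabolized_counts[env] / count
            for env, count in substrate_counts.items()}

def predict_sites(atom_envs, ratios):
    """Rank a query molecule's atoms by their environments' occurrence ratios."""
    scores = [ratios.get(env, 0.0) for env in atom_envs]
    top = max(scores) if scores and max(scores) > 0 else 1.0
    return [s / top for s in scores]  # the most likely site scores 1.0
```

A prediction is then just a ranking of the query molecule's atoms, which is what makes the fast, interactive structure-editing workflow described above feasible.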
|
232 |
Evaluating Text Segmentation. Fournier, Christopher January 2013 (has links)
This thesis investigates the evaluation of automatic and manual text segmentation. Text segmentation is the process of placing boundaries within text to create segments according to some task-dependent criterion. An example of text segmentation is topical segmentation, which aims to segment a text according to the subjective definition of what constitutes a topic. A number of automatic segmenters have been created to perform this task, and the question that this thesis answers is how to select the best automatic segmenter for such a task. This requires choosing an appropriate segmentation evaluation metric, confirming the reliability of a manual solution, and then finally employing an evaluation methodology that can select the automatic segmenter that best approximates human performance.
A variety of comparison methods and metrics exist for comparing segmentations (e.g., WindowDiff, Pk), and all save a few are able to award partial credit for nearly missing a boundary. Those comparison methods that can award partial credit unfortunately lack consistency, symmetry, intuitiveness, and a host of other desirable qualities. This work proposes a new comparison method named boundary similarity (B), which is based upon a new minimal boundary edit distance between two segmentations. Near misses are frequent, even among manual segmenters (as is exemplified by the low inter-coder agreement reported by many segmentation studies). This work adapts some inter-coder agreement coefficients to award partial credit for near misses using the new metric proposed herein, B.
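For reference, a minimal sketch of WindowDiff, one of the penalized-window metrics under comparison here, assuming each segmentation is given as a 0/1 list marking whether a boundary follows each position (an illustration of the standard definition, not the thesis's code):

```python
def window_diff(reference, hypothesis, k=None):
    """WindowDiff (Pevzner & Hearst, 2002).

    Slides a window of size k over both segmentations and counts the
    positions where they disagree on the number of boundaries inside
    the window. 0 is a perfect match; higher is worse.
    """
    n = len(reference)
    if k is None:
        # conventional choice: half the mean reference segment length
        num_segments = sum(reference) + 1
        k = max(1, round(n / (2 * num_segments)))
    windows = n - k + 1
    disagreements = sum(
        sum(reference[i:i + k]) != sum(hypothesis[i:i + k])
        for i in range(windows)
    )
    return disagreements / windows
```

A near miss is penalized in fewer windows than a full miss, which is how such metrics award partial credit; the inconsistencies of this window-based scheme are among the motivations for B.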
The methodologies employed by many works introducing automatic segmenters evaluate them simply by comparing their output to one manual segmentation of a text, and often only by presenting a series of mean performance values (with no standard deviation or standard error, and little if any statistical hypothesis testing). This work asserts that one segmentation of a text cannot constitute a “true” segmentation; specifically, one manual segmentation is simply one sample from the population of all possible segmentations of a text and of that subset of desirable segmentations. This work further asserts that the adapted inter-coder agreement statistics proposed herein should be used to determine the reproducibility and reliability of a coding scheme and set of manual codings, and that statistical hypothesis testing, using the specific comparison methods and methodologies demonstrated herein, should then be used to select the best automatic segmenter.
This work proposes new segmentation evaluation metrics, adapted inter-coder agreement coefficients, and methodologies. Most importantly, this work experimentally compares the state-of-the-art comparison methods to those proposed herein upon artificial data that simulates a variety of scenarios, and chooses the best one (B). The ability of adapted inter-coder agreement coefficients, based upon B, to discern between various levels of agreement in artificial and natural data sets is then demonstrated. Finally, a contextual evaluation of three automatic segmenters is performed using the state-of-the-art comparison methods and B, following the methodology proposed herein, to demonstrate the benefits and versatility of B as opposed to its counterparts.
|
233 |
Contribution au recalage d'images de modalités différentes à travers la mise en correspondance de nuages de points : Application à la télédétection / Contribution to the registration of images of different modalities through point cloud matching: application to remote sensing. Palmann, Christophe 23 June 2011 (has links)
The use of several images of various modalities has proved to be quite useful for solving problems arising in many different applications of remote sensing. The main reason is that each image of a given modality conveys its own specific information, which can be integrated into a single model in order to improve our knowledge of a given area. Given the large amount of available data, any such integration must be performed automatically. At the very first stage of an automated integration process, a rather direct problem arises: given a region of interest within a first image, the question is to find its equivalent within a second image acquired over the same scene but with a different modality. This problem is difficult because the decision to match two regions must rely on the part of the information that the two images have in common, even if their modalities are quite different. This is the problem that we address in this thesis.
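The emphasis on the information shared by the two images points to information-theoretic matching criteria; below is a minimal sketch of mutual information between two co-registered patches, a classical multimodal similarity measure offered as an illustration rather than as the thesis's own method:

```python
import numpy as np

def mutual_information(patch_a, patch_b, bins=32):
    """Estimate the mutual information between two co-registered patches.

    MI is high when the intensity distributions are statistically
    dependent, even when the two modalities look nothing alike.
    """
    joint, _, _ = np.histogram2d(patch_a.ravel(), patch_b.ravel(), bins=bins)
    pxy = joint / joint.sum()          # joint intensity distribution
    px = pxy.sum(axis=1)               # marginal of patch_a
    py = pxy.sum(axis=0)               # marginal of patch_b
    nz = pxy > 0                       # avoid log(0)
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px[:, None] * py[None, :])[nz])))
```

Maximizing such a measure over candidate regions in the second image is one straightforward way to cast the region-correspondence search described above.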
|
234 |
Tratamento de condições especiais para busca por similaridade em bancos de dados complexos / Treatment of special conditions for similarity searching in complex databases. Daniel dos Santos Kaster 23 April 2012 (has links)
The amount of complex data (images, videos, time series and others) has been growing at a very fast pace. Complex data are well suited to being searched by similarity, which means defining queries according to a given similarity criterion. Moreover, complex data are usually associated with other information, usually of conventional data types, which must be employed in conjunction with similarity operations to answer complex queries. Several works have proposed techniques for similarity searching; however, the majority of the approaches were not conceived to be integrated into a DBMS, treating similarity queries as isolated operations detached from the query processor. The main objective of this thesis is to propose algebraic alternatives, data structures and algorithms to allow a wide use of similarity queries associated with the search operations provided by relational DBMSs, and to execute such composite queries efficiently. To reach this goal, this work presents two main contributions. The first contribution is the proposal of a new similarity operation, called the condition-extended k-Nearest Neighbor query (ck-NNq), which extends the k-Nearest Neighbor query (k-NNq) to provide an additional condition, modifying the operation’s semantics. The proposed operation allows representing queries required by several applications that could not be represented before, and allows complementary filtering conditions to be homogeneously integrated into the k-NNq. The second contribution is the development of FMI-SiR (user-defined Features, Metrics and Indexes for Similarity Retrieval), a database module that allows executing similarity queries integrated with the other DBMS operations. The module allows including user-defined feature extraction methods and distance functions in the database core, providing great flexibility, and also has special treatment for medical images. Moreover, it was verified through experiments over real datasets that the implementation of FMI-SiR over the Oracle DBMS is able to efficiently search very large complex databases.
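An illustrative reading of the ck-NNq semantics follows; this is a sketch, not the thesis's algebra or the FMI-SiR code, and `distance` and `condition` are assumed callables over the feature vectors and conventional attributes respectively:

```python
import heapq

def ck_nn(query, candidates, k, distance, condition):
    """Condition-extended k-nearest-neighbour query (sketch).

    candidates: iterable of (features, attributes) pairs
    distance:   metric over feature vectors
    condition:  predicate over the conventional attributes
    """
    eligible = ((distance(query, features), features)
                for features, attributes in candidates
                if condition(attributes))   # filter *before* ranking
    return heapq.nsmallest(k, eligible, key=lambda pair: pair[0])
```

The point of the changed semantics is that filtering happens before the k nearest are taken, so the result contains k qualifying elements (when they exist) rather than merely the qualifying subset of the unconditioned k nearest.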
|
235 |
Exploring the Impacts of Fashion Blog Type and Message Type on Female Consumer Response Towards the Brand. Melton, Rebecca 12 1900 (has links)
The current study examines the influences of blog type and blog message type on consumers’ perceptions of brand credibility and brand similarity. Additionally, the study seeks to understand the interaction effects of blog type and message type on brand credibility and brand similarity and on consumer engagement with a blog. The findings reveal that message type, specifically product message, is an important consideration when marketers want to illustrate similarities between the brand and consumers. Additionally, it was found that product messages should be considered when encouraging consumer engagement with a blog. However, blog type did not have an effect on consumer perceptions of brand credibility and similarity or consumer engagement.
|
236 |
Lícování skici objektu s jeho snímkem / Objects outline and image matching. Kvasnička, Petr January 2018 (has links)
The aim of the diploma thesis is to become acquainted with the problem of matching an object against a sketch, diagram, or contour that is meant to represent it. The thesis examines how this problem can be solved. Part of the thesis presents theoretical methods capable of evaluating the degree of similarity between an unknown object and a less accurate sketch, each with its own customized image pre-processing steps; some of these methods are covered in more detail. Selected methods are implemented and tested on a purpose-built database, and finally the implemented methods are compared with each other.
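One classical baseline for scoring the similarity between an object's outline and a rougher sketch, offered here as an illustration rather than one of the thesis's selected methods, is Hu-moment contour matching via OpenCV:

```python
import cv2

def sketch_similarity(object_image_path, sketch_path):
    """Compare the dominant contour of an object photo with a sketch.

    cv2.matchShapes returns a dissimilarity based on log-scaled Hu
    moments, so 0.0 means the two outlines have identical shape.
    """
    def main_contour(path):
        gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
        _, binary = cv2.threshold(gray, 0, 255,
                                  cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        return max(contours, key=cv2.contourArea)  # keep the largest outline
    return cv2.matchShapes(main_contour(object_image_path),
                           main_contour(sketch_path),
                           cv2.CONTOURS_MATCH_I1, 0.0)
```

Because Hu moments are invariant to translation, scale, and rotation, this tolerates exactly the kinds of inaccuracy a hand-drawn sketch introduces.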
|
237 |
Exploring Frameworks for Rapid Visualization of Viral Proteins Common for a Given Host. Subramaniam, Rajesh January 2019 (has links)
Viruses are unique organisms that lack the protein machinery necessary for their propagation (such as a polymerase) yet possess other proteins that facilitate their propagation (such as host-cell anchoring proteins). This study explores seven different frameworks to assist rapid visualization of proteins that are common to viruses residing in a given host. The proposed frameworks rely only on protein sequence information. It was found that the sequence-similarity-based framework with an associated profile hidden Markov model was a better tool for visualizing proteins common to a given host than the other proposed frameworks based only on amino acid composition or other amino acid properties. The lack of known profile hidden Markov models for many protein structures limits the utility of the proposed sequence-similarity-based framework. The study concludes with an attempt to extrapolate the utility of the proposed framework to predict viruses that may pose potential human health risks.
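The simplest framework mentioned, comparison by amino acid composition, can be sketched as follows (assuming each protein is given as a one-letter-code string; an illustration, not the thesis's code):

```python
from collections import Counter
from math import sqrt

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition_vector(sequence):
    """Fraction of each of the 20 standard amino acids in a sequence."""
    counts = Counter(sequence.upper())
    total = sum(counts[a] for a in AMINO_ACIDS) or 1
    return [counts[a] / total for a in AMINO_ACIDS]

def composition_similarity(seq_a, seq_b):
    """Cosine similarity of two composition vectors (1.0 = identical)."""
    u, v = composition_vector(seq_a), composition_vector(seq_b)
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0
```

Composition vectors ignore residue order entirely, which helps explain why the profile-HMM framework, which models position-specific preferences, fared better in this study.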
|
238 |
Porovnávání 3D objektů v jazyce JAVA / Comparison of 3D objects in JAVA language. Zapletal, Tomáš January 2013 (has links)
The main aims of the thesis are to load and display 3D models in different file formats, to create slices from 3D models and, conversely, to build 3D models from slices, to quantify the inaccuracy introduced by anti-aliasing and other reformatting, and to investigate the similarity of the original and edited models. Everything is implemented in the Java programming language and its extensions for 3D object tasks. The thesis builds on the earlier project „3D shape from MRI“.
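One simple way to quantify the similarity of an original model and its slice-based reconstruction is voxel overlap. A minimal sketch follows (in Python with NumPy for brevity, although the thesis itself works in Java; boolean occupancy grids of equal shape are assumed):

```python
import numpy as np

def dice_similarity(voxels_a, voxels_b):
    """Dice overlap between two boolean occupancy grids.

    Returns 1.0 when the voxelized models coincide exactly; lower
    values quantify loss from slicing, anti-aliasing, or reformatting.
    """
    a = voxels_a.astype(bool)
    b = voxels_b.astype(bool)
    intersection = np.logical_and(a, b).sum()
    total = a.sum() + b.sum()
    return 2.0 * intersection / total if total else 1.0
```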
|
239 |
Automatická anotace obrazu / Automatic image annotation. Hegmon, Jiří January 2013 (has links)
The recognition and comparison of images is one of the main problems in the field of computer vision. This thesis adds a third issue to these two: the recognition of image semantics, that is, annotations or labels. The work uses methods for measuring the similarity of images to create a tool that, given a training dataset of images and annotations, produces the most likely set of annotations for a test set of images. Several types of datasets suitable for deriving annotation information are presented, and the best one, with a sufficient training-set size and enough annotation information, is selected. Based on this training dataset, an algorithm is designed to load the test set easily and without large demands on computer performance. The annotation information is evaluated using different similarity algorithms: at the beginning of the work, simple but not very effective methods (MSE and colour histogram comparison) were used, but it proved necessary to move on to more advanced methods (such as Tamura and Gabor features, CEDD, or various kinds of histograms). The results of this comparison are then used to evaluate the likelihood of each annotation for the images in the test set. The last part is an evaluation of the accuracy of the annotations based on information from the test set.
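The label-propagation step described above can be sketched as follows (a simplified illustration: any of the mentioned descriptors, from colour histograms to CEDD, can stand in for the feature vectors):

```python
from collections import Counter

def annotate(test_features, training_set, similarity, k=5, n_labels=3):
    """Propose the most likely annotations for a test image.

    training_set: list of (features, labels) pairs
    similarity:   function scoring how alike two feature vectors are
    Labels are pooled from the k most similar training images and the
    n_labels most frequent ones are returned.
    """
    neighbours = sorted(training_set,
                        key=lambda item: similarity(test_features, item[0]),
                        reverse=True)[:k]
    votes = Counter(label for _, labels in neighbours for label in labels)
    return [label for label, _ in votes.most_common(n_labels)]
```

Swapping in a better `similarity` (Tamura or Gabor features instead of raw histograms) improves the neighbours and hence the proposed annotations, which matches the progression reported above.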
|
240 |
A framework for data loss prevention using document semantic signature. Alhindi, Hanan 22 November 2019 (has links)
The theft and exfiltration of sensitive data (e.g., state secrets, trade secrets, company records, etc.) represent one of the most damaging threats that malicious insiders can carry out against institutions and organizations, because it can seriously diminish the confidentiality, integrity, and availability of the organization’s data. Data protection and insider threat detection and prevention are significant steps for any organization seeking to enhance its internal security. In the last decade, data loss prevention (DLP) has emerged as one of the key mechanisms currently used by organizations to detect and block unauthorized data transfer across the organization’s perimeter. However, existing DLP approaches face several practical challenges, such as their relatively low accuracy, which in turn affects their prevention capability. Current DLP approaches are also ineffective in handling unstructured data, or in searching and comparing content semantically when confronted with evasion tactics in which sensitive content is rewritten without changing its meaning. In this dissertation, we present a new DLP model that tracks sensitive data using a summarized version of the content’s semantics called the document semantic signature (DSS). The DSS can be updated dynamically as the protected content changes, and it is resilient against evasion tactics such as content rewriting. We use domain-specific ontologies to capture content semantics and track conceptual similarity and relevancy using adequate metrics to identify data leaks from sensitive documents. The evaluation of the DSS model on two public datasets from different domains of interest achieved very encouraging results in terms of detection effectiveness.
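A minimal sketch of the document semantic signature idea, assuming the domain ontology is available as a mapping from surface terms to concept identifiers (the thesis's actual DSS construction and relevancy metrics are richer):

```python
from collections import Counter
from math import sqrt

def semantic_signature(text, term_to_concept):
    """Summarize a document as a frequency vector over ontology concepts.

    Rewriting the text with synonyms that map to the same concepts
    leaves the signature largely unchanged, which is what makes the
    approach resilient to content-rewriting evasion.
    """
    tokens = text.lower().split()
    return Counter(term_to_concept[t] for t in tokens if t in term_to_concept)

def signature_similarity(sig_a, sig_b):
    """Cosine similarity of two signatures; values near 1 flag a possible leak."""
    shared = set(sig_a) & set(sig_b)
    dot = sum(sig_a[c] * sig_b[c] for c in shared)
    norm = sqrt(sum(v * v for v in sig_a.values())) * \
           sqrt(sum(v * v for v in sig_b.values()))
    return dot / norm if norm else 0.0
```

An outbound document whose signature scores above a tuned threshold against any protected document's signature would then be blocked or escalated.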
|