101 |
Investigating the Neural Representations of Taste and Health
Londeree, Allison M. 23 October 2019 (has links)
No description available.
|
102 |
A Hierarchical Approach to the Analysis of Intermediary Structures Within the Modified Contour Reduction Algorithm
Wallentinsen, Kristen M 01 January 2013 (has links) (PDF)
Robert Morris’s (1993) Contour-Reduction Algorithm—later modified by Rob Schultz (2008) and hereafter referred to as the Modified Contour Reduction Algorithm (MCRA)—recursively prunes a contour down to its prime: its first, last, highest, and lowest contour pitches. The algorithm follows a series of steps in two stages. The first stage prunes c-pitches that are neither local high points (maxima) nor low points (minima). The second stage prunes pitches that are neither maxima within the max-list (pitches that were maxima in the first stage) nor minima within the min-list (pitches that were minima in the first stage). This second stage is repeated until no more pitches can be pruned. What remains is the contour’s prime.
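To make the two-stage pruning concrete, the sketch below reduces a toy c-seg to its prime. It is a minimal Python illustration assuming integer contour pitches; the endpoint and tie-handling conventions are simplifying assumptions, not the exact published MCRA rules.

```python
def mcra_prime(cseg):
    """Hedged sketch of the two-stage pruning described above. `cseg` is a
    list of contour pitches (integers); returns the prime together with
    every intermediary level. Endpoint and tie conventions are simplifying
    assumptions, not the exact published MCRA rules."""

    def is_max(seq, i):
        return ((i == 0 or seq[i] >= seq[i - 1]) and
                (i == len(seq) - 1 or seq[i] >= seq[i + 1]))

    def is_min(seq, i):
        return ((i == 0 or seq[i] <= seq[i - 1]) and
                (i == len(seq) - 1 or seq[i] <= seq[i + 1]))

    # Stage 1: keep only local maxima and minima; endpoints carry both flags.
    pitches, max_flag, min_flag = [], [], []
    for i, p in enumerate(cseg):
        mx, mn = is_max(cseg, i), is_min(cseg, i)
        if i in (0, len(cseg) - 1):
            mx = mn = True
        if mx or mn:
            pitches.append(p)
            max_flag.append(mx)
            min_flag.append(mn)
    levels = [list(cseg), list(pitches)]

    # Stage 2: drop max flags that are not maxima of the max-list and min
    # flags that are not minima of the min-list, prune unflagged pitches,
    # and repeat until nothing changes (endpoints are never pruned).
    while True:
        max_pos = [i for i, f in enumerate(max_flag) if f]
        min_pos = [i for i, f in enumerate(min_flag) if f]
        max_seq = [pitches[i] for i in max_pos]
        min_seq = [pitches[i] for i in min_pos]
        for j, i in enumerate(max_pos):
            if 0 < i < len(pitches) - 1 and not is_max(max_seq, j):
                max_flag[i] = False
        for j, i in enumerate(min_pos):
            if 0 < i < len(pitches) - 1 and not is_min(min_seq, j):
                min_flag[i] = False
        keep = [i for i in range(len(pitches)) if max_flag[i] or min_flag[i]]
        if len(keep) == len(pitches):
            break
        pitches = [pitches[i] for i in keep]
        max_flag = [max_flag[i] for i in keep]
        min_flag = [min_flag[i] for i in keep]
        levels.append(list(pitches))

    return pitches, levels


prime, levels = mcra_prime([1, 4, 2, 5, 0, 3, 6, 2])
print(prime)   # first, last, highest, and lowest contour pitches: [1, 0, 6, 2]
print(levels)  # the intermediary sub-csegs used for comparisons below
```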
By examining how the reduction process is applied to a given c-seg, one can discern a hierarchy of levels that reveals new types of relationships between c-segs. In this thesis, I aim to highlight relationships between c-segs by analyzing the distinct subsets created at the different levels obtained by applying the MCRA. These subsets, or sub-csegs, can be used to delineate further relationships between c-segs beyond their respective primes. As such, I posit a new method in which each sub-cseg produced by the MCRA is examined to create a system of hierarchical comparison that measures relationships between c-segs, using sub-cseg equivalence to calculate an index value representing their degree of similarity. The similarity index compares the number of levels at which two c-segs are similar to the total number of comparable levels.
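As an illustration of such an index, the sketch below compares two c-segs level by level, treating sub-csegs as equivalent when their contour ranks coincide; this equivalence criterion and the example levels are assumptions made only for the demonstration and may differ from the thesis's definitions.

```python
def contour_ranks(cseg):
    """Normalize a sub-cseg to contour ranks so that, e.g., [10, 50, 30]
    and [0, 2, 1] count as equivalent contours."""
    order = sorted(set(cseg))
    return [order.index(p) for p in cseg]


def similarity_index(levels_a, levels_b):
    """Number of levels whose sub-csegs are equivalent, divided by the
    total number of comparable levels."""
    comparable = min(len(levels_a), len(levels_b))
    if comparable == 0:
        return 0.0
    matches = sum(contour_ranks(levels_a[i]) == contour_ranks(levels_b[i])
                  for i in range(comparable))
    return matches / comparable


# Hypothetical levels for two chant verses that share the same prime:
verse_a = [[0, 2, 1, 3, 0], [0, 2, 3, 0], [0, 3, 0]]
verse_b = [[0, 1, 2, 3, 0], [0, 2, 3, 0], [0, 3, 0]]
print(similarity_index(verse_a, verse_b))   # 2/3: equivalent at two of three levels
```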
I then implement this analytical method by examining the similarities and differences between thirteen mode-2 Alleluias from the Liber Usualis that share the same alleluia and jubilus. The verses of these thirteen chants are highly similar in melodic content in that they all have the same prime, yet they are not fully identical. I examine the verses of these chants using my method of comparison, analyzing the intermediary sub-csegs of these thirteen chants in order to reveal differences in the way the primes that govern their basic structures are composed out.
|
103 |
Using the SKOS Model for Standardizing Semantic Similarity and Relatedness Measures for Ontological Terminologies
Arockiasamy, Savarimuthu 14 August 2009 (has links)
No description available.
|
104 |
IDSF II: Integrated Decision Support Framework and its Application for Dispatching Policy Based on Part Similarity
Guo, Jia 18 April 2012 (has links)
No description available.
|
105 |
Feature extraction and similarity-based analysis for proteome and genome databases
Ozturk, Ozgur 20 September 2007 (has links)
No description available.
|
106 |
Rough Sets, Similarity, and Optimal Approximations
Lenarcic, Adam 11 1900 (has links)
Rough sets have been studied for over 30 years, and the basic concepts of lower and upper approximations have been analysed in detail, yet nowhere has the idea of an `optimal' rough approximation been proposed or investigated. In this thesis, several concepts are used in proposing a generalized definition: measures, rough sets, similarity, and approximation are each surveyed. Measure Theory allows us to generalize the definition of the `size' of a set. Rough set theory is the foundation that we use to define the term `optimal' and what constitutes an `optimal rough set'. Similarity indexes are used to compare two sets, and determine how alike or different they are. These sets can be rough or exact. We use similarity indexes to compare sets to intermediate approximations, and isolate the optimal rough sets. The historical roots of these concepts are explored, and the foundations are formally defined. A definition of an optimal rough set is proposed, as well as a simple algorithm to find it. Properties of optimal approximations such as minimum, maximum, and symmetry are explored, and examples are provided to demonstrate algebraic properties and illustrate the mechanics of the algorithm. / Thesis / Doctor of Philosophy (PhD) / Until now, in the context of rough sets, only an upper and lower approximation had been proposed. Here, the concept of an optimal/best approximation is proposed, and a method to obtain it is presented.
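A minimal sketch of these ideas, assuming a finite universe partitioned into equivalence classes and using the Jaccard index as one example of a similarity index (the thesis surveys several indexes and defines its own notion of optimality), might look like the following.

```python
from itertools import combinations


def jaccard(a, b):
    """One simple similarity index between two sets (an illustrative choice;
    the thesis surveys several such indexes)."""
    return 1.0 if not a and not b else len(a & b) / len(a | b)


def optimal_approximation(partition, target):
    """Among all sets built from the lower approximation plus any union of
    boundary classes, return the one most similar to the target. Brute force
    over the boundary classes, so only suitable for small examples."""
    lower = set().union(*[c for c in partition if c <= target])
    upper = set().union(*[c for c in partition if c & target])
    boundary = [c for c in partition if c & target and not c <= target]
    best, best_score = set(lower), jaccard(lower, target)
    for r in range(1, len(boundary) + 1):
        for combo in combinations(boundary, r):
            candidate = lower | set().union(*combo)
            score = jaccard(candidate, target)
            if score > best_score:
                best, best_score = candidate, score
    return lower, upper, best, best_score


# Toy universe partitioned into equivalence classes, and a target set X:
partition = [frozenset({1, 2}), frozenset({3, 4}), frozenset({5, 6, 7}), frozenset({8})]
X = {1, 2, 3, 5}
lower, upper, best, score = optimal_approximation(partition, X)
print(lower, upper, best, round(score, 2))   # the optimal set {1, 2, 3, 4} scores 0.6
```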
|
107 |
Detecting code duplications in the NPM community
Liu, Hanwen 09 September 2021 (has links)
In the modern software development process, it has become mainstream practice to build software projects on top of third-party packages to simplify development. In this development method, it is quite common to copy existing code or files from other libraries instead of making regular calls. Although this approach can reduce a project's dependence on other libraries and make the project more streamlined, it also makes maintenance and comprehension more difficult. Ignorance of code duplication in the third-party library community can even be exploited for malicious purposes, such as typo-squatting attacks. This paper serves as a starting point for analyzing the growing code duplication issues surrounding third-party open source packages and the root causes of code duplication. In this paper, I conducted code duplication-related research on popular packages in a third-party open source package community, the NPM community, using a tokenizer tool and a code comparison tool to compute code similarity. I quantitatively analyzed the prevalence of code duplication in the NPM community and ran related experiments based on this similarity. In the experiments, I found that code duplication is very common in the NPM community: 17.1% of all files have 1-93 similar files in other packages when the similar-file threshold is set to 0.5, and 29.3% of all packages have at least one "similar package" when the similar-package threshold is set to 0.5. Of all 951 similar package pairs, 33.9% of them, 323 package pairs, come from the same domain. The ultimate goal of this paper is to promote awareness of the commonness and importance of code duplication in the third-party package community and the reasonable use of code duplication by developers during project development. / In the modern software development process, developers often call other people's completed code to build their own programs. There are generally two ways to do this: indirectly call other people's code through "import" or similar instructions in the program, or directly copy and paste other people's code and make slight modifications. The second method can make a program more independent and easy to use, but the code duplication it introduces also carries significant security risks. This paper serves as a starting point for analyzing the growing code duplication issues and the root causes of code duplication. In this paper, I conducted code duplication-related research on popular code packages in the NPM community. I used tools to compute a value that measures how similar different pieces of code are to each other, quantitatively analyzed the prevalence of code duplication in the NPM community, and ran related experiments based on this similarity. In the experiments, I found that code duplication is very common in the NPM community: 17.1% of all files have 1-93 similar files in other packages, and 29.3% of all packages have at least one "similar package", when the definition of similar files and packages is not that "strict". Of all 951 similar package pairs, 33.9% of them, 323 package pairs, come from the same domain. The ultimate goal of this paper is to promote awareness of the commonness and importance of code duplication in the third-party package community and the reasonable use of code duplication by developers during project development.
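To illustrate the kind of measurement involved, the sketch below tokenizes two source files and scores their overlap. The crude tokenizer, the Jaccard-style score, and the package paths in the trailing comment are stand-ins chosen for the example, not necessarily the tools, metric, or data used in the study.

```python
import re
from pathlib import Path


def tokenize(source):
    """Very rough tokenizer: identifiers, numbers, and punctuation runs.
    A stand-in for the dedicated tokenizer tool used in the study."""
    return re.findall(r"[A-Za-z_]\w*|\d+|[^\sA-Za-z_0-9]+", source)


def file_similarity(path_a, path_b):
    """Jaccard overlap of the two files' token sets, giving a 0-1 score;
    not necessarily the similarity metric used in the thesis."""
    tokens_a = set(tokenize(Path(path_a).read_text(encoding="utf-8", errors="ignore")))
    tokens_b = set(tokenize(Path(path_b).read_text(encoding="utf-8", errors="ignore")))
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# A file pair would be flagged as similar at the 0.5 threshold mentioned above:
# file_similarity("package_a/index.js", "package_b/index.js") >= 0.5
```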
|
108 |
Geometry of Self-Similar Sets
Roinestad, Kristine A. 22 May 2007 (has links)
This paper examines self-similar sets and some of their properties, including the natural equivalence relation found in bilipschitz equivalence. Both dimension and preservation of paths are shown to be invariant under this equivalence. More sophisticated techniques, one involving directed graphs, are also used to establish the equivalence of two spaces. / Master of Science
|
109 |
Enhancing Document Retrieval in the FinTech Domain: Applications of Advanced Language Models
Hansen, Jesper January 2024 (has links)
In this thesis, methods of creating an information retrieval (IR) model within the FinTech domain are explored. Given the domain-specific and data-scarce environment, methods of artificially generating data to train and evaluate IR models are implemented, and their limitations are discussed. The generative model GPT-J 6B is used to generate pseudo-queries for a document corpus, resulting in a training set and a test set of 148 and 166 query-document pairs respectively. Transformer-based models, both fine-tuned and original versions, are tested against the baseline model BM25, which has historically been regarded as an effective document retrieval model. The models are evaluated using mean reciprocal rank at k (MRR@k) and the time cost of retrieving relevant documents. The main finding is that the classical BM25 model performs well compared to the transformer alternatives, reaching the highest score at MRR@2 = 0.612. The results show that for MRR@5 and MRR@10, a combination of BM25 and a cross-encoder slightly outperforms the baseline, reaching MRR@5 = 0.655 and MRR@10 = 0.672. However, the increase in performance is slim and may not be enough to motivate an implementation. Finally, further research using real-world data is required to argue that transformer-based models are more robust in a real-world setting.
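For reference, MRR@k can be computed as in the sketch below; the toy query and document identifiers are hypothetical and serve only to show the metric's mechanics under the one-relevant-document-per-query setup described above.

```python
def mrr_at_k(rankings, relevant, k):
    """Mean reciprocal rank at cutoff k. `rankings` maps each query id to its
    ranked list of retrieved document ids; `relevant` maps each query id to
    the single relevant document id (as with the pseudo-query/document pairs
    described above). Queries whose relevant document misses the top k add 0."""
    total = 0.0
    for query_id, ranked in rankings.items():
        target = relevant[query_id]
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id == target:
                total += 1.0 / rank
                break
    return total / len(rankings)


# Hypothetical toy run with three queries:
rankings = {"q1": ["d3", "d1", "d7"], "q2": ["d2", "d9", "d4"], "q3": ["d8", "d5", "d6"]}
relevant = {"q1": "d1", "q2": "d2", "q3": "d6"}
print(mrr_at_k(rankings, relevant, k=2))   # (1/2 + 1 + 0) / 3 = 0.5
```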
|
110 |
Explorando variedade em consultas por similaridade / Investigating variety in similarity queries
Santos, Lúcio Fernandes Dutra 26 October 2012 (links)
A complexidade dos dados armazenados em grandes bases de dados aumenta sempre, criando a necessidade de novas formas de consulta. As consultas por similaridade vêm apresentando crescente interesse para tratar de dados complexos, sendo as mais representativas a consulta por abrangência (R_q, range query) e a consulta aos k-vizinhos mais próximos (kNN_q, k-nearest neighbor query). Até recentemente, essas consultas não estavam disponíveis nos Sistemas de Gerenciamento de Bases de Dados (SGBD). Agora, com o início de sua disponibilidade, tem se tornado claro que os operadores de busca fundamentais usados para executá-las não são suficientes para atender às necessidades das aplicações que as demandam. Assim, estão sendo estudadas variações e extensões aos operadores fundamentais, em geral voltados às necessidades de domínios de aplicações específicas. Além disso, os seguintes problemas vêm impactando diretamente sua aceitação por parte dos usuários e, portanto, sua usabilidade: (i) os operadores fundamentais são pouco expressivos em situações reais; (ii) a cardinalidade dos resultados tende a ser grande, obrigando o usuário a analisar muitos elementos; e (iii) os resultados nem sempre atendem ao interesse do usuário, implicando na reformulação e ajuste frequente das consultas. O objetivo desta dissertação é o desenvolvimento de uma técnica inédita para exibir um grau de variedade nas respostas às consultas aos k-vizinhos mais próximos em domínios de dados métricos, explorando aspectos de diversidade em extensões dos operadores fundamentais usando apenas as propriedades básicas do espaço métrico sem a solicitação de outra informação por parte do usuário. Neste sentido, são apresentados: a formalização de um modelo de variedade que possibilita inserir diversidade nas consultas por similaridade sem a definição de parâmetros por parte do usuário; um algoritmo incremental para responder às consultas aos k-vizinhos mais próximos com variedade; um método de avaliação de sobreposição de variedade para as consultas por similaridade. As propriedades desses resultados permitem usar as técnicas desenvolvidas para apoiar a propriedade de variedade nas consultas aos k-vizinhos mais próximos em Sistemas de Gerenciamento de Bases de Dados. / The data being collected and generated nowadays increases not only in volume, but also in complexity, leading to the need for new query operators. Similarity queries are one of the most pursued resources to retrieve complex data. The most studied operators to perform similarity search are the range query (R_q) and the k-nearest neighbor query (kNN_q). Until recently, those queries were not available in Database Management Systems. Now they are starting to become available, but from their earliest applications in real systems it has become clear that the basic similarity query operators are not enough to meet the requirements of the target applications. Therefore, new variations and extensions to the basic operators are being studied, although every work up to now pursues only the requirements of specific application domains. Furthermore, the following issues directly impact their acceptance by users and therefore their usability: (i) the basic operators are not expressive enough in real situations, (ii) the result-set cardinality tends to be large, forcing the user to analyze too many elements, and (iii) the results do not always meet the user's interest, resulting in frequent reformulation and adjustment of the queries.
The goal of this dissertation is the development of a novel technique to enable a degree of variety in the answers to k-nearest neighbor queries in metric spaces, investigating aspects of diversity in extensions of the basic operators using only the properties of metric spaces, never requesting extra information from the user. In this monograph, we present: the formalization of a variety model that supports diversity in similarity queries without requiring diversification parameters from the user; a greedy algorithm to answer k-nearest neighbor queries with variety; and an evaluation method to assess the diversification ratio of a subset of elements in a metric space. The properties of those results allow the proposed techniques to support variety in k-nearest neighbor queries in Database Management Systems.
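To give a flavor of a greedy, parameter-free diversified k-NN over a metric space, the sketch below visits candidates in order of distance to the query and skips any candidate that lies closer to an already selected result than to the query; this influence rule and the toy 2-D data are illustrative assumptions, not the dissertation's actual criterion.

```python
import math


def diverse_knn(query, candidates, k, dist=math.dist):
    """Greedy k-nearest-neighbor selection with variety: visit candidates in
    order of distance to the query and skip any candidate that lies closer to
    an already selected result than to the query itself. No tuning parameter
    beyond k; the skipping rule is an illustrative assumption, not the
    dissertation's actual influence criterion."""
    result = []
    for cand in sorted(candidates, key=lambda c: dist(query, c)):
        d_query = dist(query, cand)
        if all(dist(cand, chosen) >= d_query for chosen in result):
            result.append(cand)
        if len(result) == k:
            break
    return result


# Toy 2-D example: two tight clusters plus an isolated point around the query.
query = (0.0, 0.0)
candidates = [(1.0, 0.0), (1.1, 0.1), (1.2, -0.1),   # cluster A
              (0.0, 2.0), (0.1, 2.1),                # cluster B
              (-3.0, 0.0)]                           # isolated point
print(diverse_knn(query, candidates, k=3))
# picks one element per region: [(1.0, 0.0), (0.0, 2.0), (-3.0, 0.0)]
```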
|