271 |
Geometry of Fractal Squares
Roinestad, Kristine A., 29 April 2010
This paper will examine analogues of Cantor sets, called fractal squares, and some of the geometric ways in which fractal squares raise issues not raised by Cantor sets. Also discussed will be a technique using directed graphs to prove bilipschitz equivalence of two fractal squares. / Ph. D.
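The abstract does not define a fractal square. A common definition in the literature, assumed here, fixes an integer n ≥ 2 and a digit set D ⊆ {0, …, n−1}², and takes F to be the unique nonempty compact set with F = (F + D)/n. The Sierpinski carpet is the case n = 3 with the centre digit removed, which makes it a natural two-dimensional analogue of the middle-thirds Cantor set. The Python sketch below, with hypothetical function names, builds the level-k approximation of F as grid cells of side n^(−k).

```python
def fractal_square_cells(n, digits, k):
    """Cells (i, j) of the level-k approximation of F = (F + D) / n.

    Cell (i, j) stands for the square [i, i + 1] x [j, j + 1] scaled by n**-k.
    """
    cells = {(0, 0)}  # level 0: the unit square itself
    for _ in range(k):
        # Split every kept cell into an n x n grid and keep only the digit positions.
        cells = {(n * i + a, n * j + b) for (i, j) in cells for (a, b) in digits}
    return cells


def render(n, digits, k):
    """ASCII picture of the level-k approximation ('#' marks a kept cell)."""
    cells = fractal_square_cells(n, digits, k)
    size = n ** k
    # Print the top row first so the picture is not upside down.
    return "\n".join(
        "".join("#" if (x, y) in cells else "." for x in range(size))
        for y in reversed(range(size))
    )


# Sierpinski carpet: n = 3, every digit except the centre (1, 1).
carpet = [(a, b) for a in range(3) for b in range(3) if (a, b) != (1, 1)]
print(render(3, carpet, 2))
```

Because distinct digit strings give distinct cells, the level-k approximation always has |D|^k cells, for example 8² = 64 for the carpet at k = 2.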
|
272 |
Generative models meet similarity search: efficient, heuristic-free and robust retrieval
Doan, Khoa Dang, 23 September 2021
The rapid growth of digital data, especially visual and textual contents, brings many challenges to the problem of finding similar data. Exact similarity search, which aims to exhaustively find all relevant items through a linear scan in a dataset, is impractical due to its high computational complexity. Approximate-nearest-neighbor (ANN) search methods, especially the Learning-to-hash or Hashing methods, provide principled approaches that balance the trade-offs between the quality of the guesses and the computational cost for web-scale databases. In this era of data explosion, it is crucial for the hashing methods to be both computationally efficient and robust to various scenarios such as when the application has noisy data or data that slightly changes over time (i.e., out-of-distribution).
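Hashing-based ANN search replaces a linear scan over real-valued vectors with compact binary codes compared by Hamming distance. The sketch below illustrates only that retrieval side. It uses data-independent random-hyperplane hashing as a stand-in for the learned hash functions the thesis proposes; `make_hash`, `hamming_search`, and the data are hypothetical, and NumPy is assumed.

```python
import numpy as np

def make_hash(dim, n_bits, seed=0):
    """Random-hyperplane hashing: bit b is 1 if x lies on the positive side of hyperplane b."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((dim, n_bits))
    return lambda X: (np.atleast_2d(X) @ W > 0).astype(np.uint8)

def hamming_search(query_code, db_codes, top_k=5):
    """Rank database codes by Hamming distance to one query code of shape (n_bits,)."""
    dists = np.count_nonzero(db_codes != query_code, axis=1)
    order = np.argsort(dists, kind="stable")[:top_k]
    return order, dists[order]

# Usage with hypothetical data: 1,000 items with 64-dimensional features, 32-bit codes.
rng = np.random.default_rng(1)
database = rng.standard_normal((1000, 64))
hash_fn = make_hash(64, 32)
db_codes = hash_fn(database)                      # computed once, offline
query = database[42] + 0.05 * rng.standard_normal(64)
ids, dists = hamming_search(hash_fn(query)[0], db_codes)
print(ids, dists)
```

Hamming ranking is still a scan, but over 32-bit codes rather than 64 floats per item, and hash-table lookups on the codes can avoid even that. Learned hash functions aim to make nearby codes correspond to semantically similar items more reliably than random hyperplanes do.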
This Thesis focuses on the development of practical generative learning-to-hash methods and explainable retrieval models. We first identify and discuss the various aspects where the framework of generative modeling can be used to improve the model designs and generalization of the hashing methods. Then we show that these generative hashing methods similarly enjoy several appealing empirical and theoretical properties of generative modeling. Specifically, the proposed generative hashing models generalize better with important properties such as low-sample requirement, and out-of-distribution and data-corruption robustness. Finally, in domains with structured data such as graphs, we show that the computational methods in generative modeling have an interesting utility beyond estimating the data distribution and describe a retrieval framework that can explain its decision by borrowing the algorithmic ideas developed in these methods.
Two subsets of generative hashing methods and a subset of explainable retrieval methods are proposed. For the first hashing subset, we propose a novel adversarial framework that can be easily adapted to a new problem domain and three training algorithms that learn the hash functions without several hyperparameters commonly found in previous hashing methods. The contributions of our work include: (1) Propose novel algorithms, based on adversarial learning, to learn the hash functions; (2) Design Wasserstein-related adversarial approaches with low computational cost and high sample efficiency; (3) Conduct extensive experiments on several benchmark datasets in various domains, including computational advertising and text and image retrieval, for performance evaluation. For the second hashing subset, we propose energy-based hashing solutions that can improve the generalization and robustness of existing hashing approaches. The contributions of our work for this task include: (1) Propose data-synthesis solutions to improve the generalization of existing hashing methods; (2) Propose energy-based hashing solutions that exhibit better robustness against out-of-distribution and corrupted data; (3) Conduct extensive experiments for performance evaluation on several benchmark datasets in the image retrieval domain.
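One reason a Wasserstein-type objective can be cheap is that, for two one-dimensional empirical samples of equal size, the Wasserstein-1 distance is simply the mean absolute difference of the sorted samples. Averaging it over code dimensions gives a component-wise distance between a batch of relaxed hash codes and samples from a target binary distribution. This is an assumed, simplified objective for illustration, not the thesis's exact training algorithm; the function names and the balanced ±1 target are hypothetical.

```python
import numpy as np

def wasserstein_1d(x, y):
    """W1 between two equal-size 1-D empirical samples (closed form via sorting)."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(x - y)))


def componentwise_wasserstein(codes, target):
    """Mean W1 over code dimensions; codes and target have shape (batch, n_bits)."""
    return float(np.mean([wasserstein_1d(codes[:, j], target[:, j])
                          for j in range(codes.shape[1])]))


# Usage: relaxed codes in (-1, 1) versus a balanced +/-1 target (hypothetical data).
rng = np.random.default_rng(0)
codes = np.tanh(rng.standard_normal((256, 16)))
target = rng.choice([-1.0, 1.0], size=(256, 16))
print(componentwise_wasserstein(codes, target))
```

Each dimension costs one sort, O(B log B) for a batch of B, which is far cheaper than solving a general optimal-transport problem between high-dimensional point sets.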
Finally, for the explainable retrieval methods, we propose an optimal alignment algorithm that achieves a better similarity approximation for a pair of structured objects, such as graphs, while capturing the alignment between the nodes of the graphs to explain the similarity calculation. The contributions of our work for this task include: (1) Propose a novel optimal alignment algorithm for comparing two sets of bag-of-vectors embeddings; (2) Propose a differentiable computation to learn the parameters of the proposed optimal alignment model; (3) Conduct extensive experiments, for performance evaluation of both the similarity approximation task and the retrieval task, on several benchmark graph datasets. / Doctor of Philosophy / Searching for similar items, or similarity search, is one of the fundamental tasks in this information age, especially given the rapid growth of visual and textual content. For example, in a search engine such as Google, a user searches for images with content similar to a reference image; in online advertising, an advertiser finds new users whose profiles are similar to those of reference users who have previously responded positively to the same or similar advertisements, and then targets these new users with advertisements; in the chemical domain, scientists search for proteins with a structure similar to a reference protein. Practical search applications in these domains often face several challenges, especially when the datasets or databases contain a large number (e.g., millions or even billions) of complex-structured items (e.g., texts, images, and graphs). These challenges can be organized into two central themes: search efficiency (the economical use of resources such as computation and time) and model-design effort (the ease of building the search model).
Besides search efficiency and model-design effort, search models are increasingly required to explain their results, especially in scientific domains where the items are structured objects such as graphs.
This dissertation tackles the aforementioned challenges in practical search applications by using computational techniques that learn to generate data. First, we avoid scanning the entire large dataset for similar items by considering an approximate similarity search technique called hashing. Then, we propose an unsupervised hashing framework that learns the hash functions with simpler objective functions directly from raw data. The proposed retrieval framework can be easily adapted to new domains with significantly less model-design effort. When labeled data is available but limited (a common scenario in practical search applications), we propose a hashing network that can synthesize additional data to improve the hash function learning process. The learned model also exhibits significant robustness against data corruption and slight changes in the underlying data. Finally, in domains with structured data such as graphs, we propose a computational approach that can simultaneously estimate the similarity of structured objects, such as graphs, and capture the alignment between their substructures, e.g., nodes. The alignment mechanism can help explain why two objects are similar or dissimilar. This is a useful tool for domain experts who not only want to search for similar items but also want to understand how the search model makes its predictions.
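The abstract describes a differentiable optimal-alignment model between bag-of-vectors (node-embedding) representations of two graphs. One standard differentiable way to compute such an alignment, assumed here and not confirmed as the thesis's algorithm, is entropy-regularised optimal transport solved with Sinkhorn iterations: the transport plan doubles as a soft node-to-node alignment, and the transport cost serves as a dissimilarity. `sinkhorn_alignment`, the squared-Euclidean cost, and the default `eps` are illustrative choices.

```python
import numpy as np

def sinkhorn_alignment(A, B, eps=0.05, n_iters=500):
    """Entropy-regularised optimal transport between two sets of node embeddings.

    A has shape (m, d) and B has shape (n, d). Returns (plan, cost): plan[i, j]
    is the mass moved from node i of A to node j of B under uniform marginals,
    and cost is the (normalised) transport cost, lower meaning more similar.
    """
    # Pairwise squared Euclidean costs, scaled to [0, 1] so exp(-C / eps)
    # stays away from underflow for moderate eps.
    C = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
    C = C / max(C.max(), 1e-12)
    K = np.exp(-C / eps)
    a = np.full(A.shape[0], 1.0 / A.shape[0])
    b = np.full(B.shape[0], 1.0 / B.shape[0])
    u = np.ones_like(a)
    for _ in range(n_iters):
        # Alternately rescale columns and rows to match the marginals b and a.
        v = b / (K.T @ u)
        u = a / (K @ v)
    plan = u[:, None] * K * v[None, :]
    return plan, float((plan * C).sum())


# Usage: the same toy "graph" with its nodes reordered.
A = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
perm = [2, 0, 3, 1]
plan, cost = sinkhorn_alignment(A, A[perm])
print(plan.argmax(axis=1), cost)   # best partner in B for each node of A
```

Because every step is a matrix product or an element-wise operation, the same loop written in an autodiff framework is differentiable with respect to the embeddings, which is what allows alignment parameters to be learned end to end.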
|
273 |
Similarity concept in theory lecturing: application to transportation studies
Pu, Jaan H., 07 July 2017
In this paper, a similarity concept is proposed to improve student understanding of difficult and complicated engineering theory. The approach is planned for the Transportation Studies module (CSE6004-A) at the School of Engineering, University of Bradford, United Kingdom. The module teaches in depth the noise induced by road transport and vehicles, and the proposed teaching method will be applied to help students understand the numerical treatment of vibration and noise in vehicle braking systems. As part of the module planning, the full numerical solution of the brake judder (vibration) effect, including shaking (forced vibration) and nibbling (torsional vibration), will be introduced to students, and the similarity concept will be used to teach it. If successful, the concept could also be applied in other engineering modules.
|
274 |
An Exploration of Circuit Similarity for Discovering and Predicting Reusable Hardware
Zeng, Kevin, 27 April 2016
A modular, reuse-based design methodology has been one of the most important factors in improving hardware design productivity. Traditionally, reuse involves manually searching repositories for existing components, a search that can be tedious and often fruitless. To enhance design reuse, an automated discovery technique is proposed: a reference circuit is compared with an archive of existing designs so that similar circuits can be suggested throughout the design phase. This requires methods for assessing the similarity of two designs. Several techniques for comparing circuits, drawing on concepts from different domains, are explored. A new similarity measure based on birthmarks was developed that allows fast and efficient comparison of large and complex designs. Applications of circuit similarity matching, such as IP theft detection and reverse engineering, are also examined. Productivity experiments show that automatically suggesting reusable designs to the user could increase productivity by more than 34% on average. / Ph. D.
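The abstract does not say how birthmarks are compared. As a hedged illustration of why a compact feature-based birthmark makes archive-wide comparison fast, the sketch below assumes each design is summarised as a multiset of hypothetical structural features (for example operator types or constants) and ranks an archive by weighted Jaccard similarity. The feature names, the similarity function, and `suggest` are all assumptions, not the thesis's measure.

```python
from collections import Counter

def birthmark(features):
    """Hypothetical birthmark: a multiset of structural features of a design."""
    return Counter(features)

def birthmark_similarity(b1, b2):
    """Weighted Jaccard similarity: sum of min counts divided by sum of max counts."""
    keys = b1.keys() | b2.keys()
    overlap = sum(min(b1[k], b2[k]) for k in keys)
    total = sum(max(b1[k], b2[k]) for k in keys)
    return overlap / total if total else 1.0

def suggest(reference, archive, top_k=3):
    """Rank archived designs by birthmark similarity to the reference design."""
    scored = [(birthmark_similarity(reference, bm), name) for name, bm in archive.items()]
    return sorted(scored, reverse=True)[:top_k]


# Usage with made-up feature strings.
archive = {
    "fir_filter": birthmark(["mul16", "mul16", "add32", "reg"]),
    "uart_tx": birthmark(["reg", "reg", "cmp8", "const:0x55"]),
}
reference = birthmark(["mul16", "add32", "add32", "reg"])
print(suggest(reference, archive))
```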
|
275 |
Characterizing levels of granularity in the neural bases of motivated memory
Horwath, Elizabeth, 08 1900
Our memory system is highly complex and contains numerous features, ranging from fine-grained, event-specific details to high-level conceptual knowledge. Given the immense amount of constantly incoming information, limits on our memory system prevent us from encoding every detail into long-term memory. Thus, memory is prioritized for the information that is most valuable or important to our current or future goals (Adcock et al., 2006; Murty & Adcock, 2017). While this literature has shown a link between motivation and memory in general, our recent work has begun to characterize how motivation targets different aspects of memory, with evidence suggesting a focus on higher-level features (Horwath et al., 2023; Horwath & Murty, in-prep). Investigating the neural bases of this process will further our understanding of the structure of memory. We tackled this question using representational similarity analysis (RSA), first to characterize the granularity at which reward is represented (categorically or continuously) in the brain, and then to measure how those representations relate to subsequent memory. We measured pattern similarity in the medial temporal lobe (MTL) and in a larger network containing anterior temporal (AT) regions, which support conceptual information, and posterior medial (PM) regions, which support event-specific details. Results showed hippocampal (HPC) and AT involvement in representing categorical aspects of motivation, while PM tracked continuity across value. The AT and PM networks also played important roles in supporting successful memory for high- and low-value information, respectively. Together, this work highlights the importance of understanding the neural processes underlying the complexities of motivated memory. / Psychology
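In RSA, a region's activity patterns are summarised as a representational dissimilarity matrix (RDM) and correlated with model RDMs that encode competing hypotheses, here value coded continuously versus categorically. The sketch below uses 1 − Pearson r as the pattern distance and Spearman correlation for the comparison, which are common choices assumed here; the patterns, values, and category labels are made up, and NumPy and SciPy are assumed.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def neural_rdm(patterns):
    """Condensed RDM: 1 - Pearson r between the voxel patterns of each pair of conditions."""
    return pdist(patterns, metric="correlation")

def continuous_model_rdm(values):
    """Model RDM if value is coded continuously: |v_i - v_j|."""
    return pdist(np.asarray(values, dtype=float)[:, None], metric="cityblock")

def categorical_model_rdm(labels):
    """Model RDM if value is coded categorically: 0 for the same category, else 1."""
    return pdist(np.asarray(labels, dtype=float)[:, None], metric="hamming")

def rsa(patterns, model_rdm):
    """Spearman correlation between the neural RDM and a model RDM."""
    rho, _ = spearmanr(neural_rdm(patterns), model_rdm)
    return float(rho)


# Usage with made-up data: 8 reward-value conditions x 50 voxels in one region.
rng = np.random.default_rng(0)
patterns = rng.standard_normal((8, 50))
values = [1, 2, 3, 4, 5, 6, 7, 8]
labels = [0, 0, 0, 0, 1, 1, 1, 1]          # low- vs high-value category
print(rsa(patterns, continuous_model_rdm(values)),
      rsa(patterns, categorical_model_rdm(labels)))
```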
|
276 |
Transit Bus Number Identification for Frictionless Fare Collection Using Passenger Location Data
Ghorbankhani, Nafise, January 2024
Public transportation ticketing has evolved from traditional paper tickets to advanced digital systems. This study combines GPS data from users’ smartphones with General Transit Feed Specification (GTFS) data from the bus network in Hamilton, Ontario, to analyze trajectory similarities using Dynamic Time Warping (DTW) and Longest Common Subsequence (LCSS) algorithms. By matching user trajectories with GTFS data, the system accurately identifies the bus services used, enabling frictionless fare calculation and integration of payment systems. Our results show that DTW is more effective than LCSS, particularly for longer trips due to the large quantity of data points. This research demonstrates the practicality of this approach, providing a promising solution for improving fare collection and the efficiency of public transportation. These findings make a significant contribution to the development of smart, user-friendly transportation infrastructure. / Thesis / Master of Applied Science (MASc)
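The sketch below implements textbook DTW and LCSS over point sequences and picks the candidate trip with the smallest DTW distance. It assumes GPS fixes and GTFS shape points have already been projected to planar coordinates in metres (raw latitude/longitude would need a great-circle distance). `identify_trip`, the candidate dictionary, and `eps` are hypothetical, and the optional time-window constraint often used with LCSS is omitted.

```python
import math

def dtw_distance(a, b):
    """Classic dynamic-programming DTW between two point sequences (Euclidean cost)."""
    n, m = len(a), len(b)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = math.dist(a[i - 1], b[j - 1])
            # Extend the cheapest warping path: step in a, step in b, or step in both.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

def lcss_similarity(a, b, eps):
    """LCSS: longest chain of point pairs within eps of each other, divided by the shorter length."""
    n, m = len(a), len(b)
    L = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if math.dist(a[i - 1], b[j - 1]) <= eps:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[n][m] / min(n, m)

def identify_trip(user_points, candidate_trips):
    """Hypothetical matcher: id of the candidate path with the smallest DTW distance."""
    return min(candidate_trips, key=lambda t: dtw_distance(user_points, candidate_trips[t]))


# Usage with made-up planar coordinates (metres).
user = [(0, 0), (10, 1), (20, 0), (30, 2)]
trips = {"route_1": [(0, 0), (15, 0), (30, 0)], "route_2": [(0, 50), (15, 50), (30, 50)]}
print(identify_trip(user, trips))   # route_1
```

DTW accumulates a distance over every aligned pair, so longer trips give it more evidence to separate routes, whereas LCSS only counts matches within `eps` and discards how far off the unmatched points are.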
|
277 |
A study of model parameters for scaling up word to sentence similarity tasks in distributional semantics
Milajevs, Dmitrijs, January 2018
Sentence representations that capture semantics are an essential part of natural language processing systems such as information retrieval and machine translation. A sentence representation is commonly built by combining the representations of the words the sentence consists of. Similarity between words is widely used as a proxy for evaluating semantic representations. Word similarity models are well studied and have been shown to correlate positively with human similarity judgements. Current evaluation of models of sentential similarity builds on the results obtained in lexical experiments, focusing on how the lexical representations are used rather than on what they should be. It is often assumed that the optimal representations for word similarity are also optimal for sentence similarity. This work discards this assumption and systematically looks for lexical representations that are optimal for measuring similarity between sentences. We find that the best representation for word similarity is not always the best for sentence similarity, and vice versa. The best models in word similarity tasks perform best with additive composition. However, the best result on compositional tasks is achieved with Kronecker-based composition. Some representations are equally good in both tasks when used with multiplicative composition. The systematic study of the parameters of similarity models reveals that the more information lexical representations contain, the more attention should be paid to noise. In particular, word vectors in models whose feature size is of the order of the vocabulary size should be sparse, whereas if a small number of context features is used, the vectors should be dense. Given the right lexical representations, compositional operators achieve state-of-the-art performance, improving over models that use neural word embeddings.
To avoid overfitting, either several test datasets should be used or parameter selection should be based on parameters' average behaviours.
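To make the three composition operators concrete, the sketch below composes word vectors by addition, element-wise multiplication, and the Kronecker (tensor) product, then compares sentences by cosine similarity. It is a generic illustration with hypothetical random vectors, not the thesis's models or parameter settings. Kronecker composition as written requires both sentences to have the same number of words, because the composed dimension is d raised to the sentence length.

```python
import numpy as np
from functools import reduce

def compose(word_vectors, method):
    """Compose a sentence vector from its word vectors."""
    if method == "add":
        return np.sum(word_vectors, axis=0)
    if method == "mult":
        return np.prod(word_vectors, axis=0)      # element-wise product
    if method == "kron":
        return reduce(np.kron, word_vectors)      # dimension d ** len(word_vectors)
    raise ValueError(f"unknown method: {method}")

def cosine(x, y):
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def sentence_similarity(s1, s2, method):
    return cosine(compose(s1, method), compose(s2, method))


# Usage with hypothetical 50-dimensional word vectors.
rng = np.random.default_rng(0)
dogs, bark, cats, meow = rng.standard_normal((4, 50))
for method in ("add", "mult", "kron"):
    print(method, sentence_similarity([dogs, bark], [cats, meow], method))
```

For two-word sentences, cos(u₁ ⊗ v₁, u₂ ⊗ v₂) = cos(u₁, u₂) · cos(v₁, v₂), so Kronecker composition multiplies word-level similarities rather than averaging them, which is one reason it can behave differently from additive composition.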
|
278 |
Initial business-to-business sales encounters: the impact of the similarity-attraction effect
Dekker, Johannes J., January 2016
During initial business-to-business encounters, salespeople try to enhance buyers’ future interaction intentions. A common belief is that increasing buyers’ perceptions of similarity increases the chances of future interaction. This study assesses the impact of the similarity-attraction effect on future interaction. By synthesising social psychology and marketing literature, a conceptual framework is proposed in which perceived similarity influences salesperson trust. This relationship is mediated by buyers’ task-related and social assessments. Task-related assessments comprise willingness (benevolence and integrity) and competence (power and expertise). Social attraction is conceptualised as likeability. Salesperson trust drives anticipated future interaction, together with organisational trust and anticipated added value. The conceptual framework was tested empirically through a cross-sectional survey in which Dutch professional buyers assessed recent initial sales encounters. A sample of 162 dyads was analysed using PLS-SEM, including FIMIX segmentation. This study demonstrates support for a third willingness construct: willingness behaviour. This construct implies that buyers are more influenced by expectations regarding behaviour than by assessments of salespeople’s attitudes. A homogeneous analysis supports the influence of perceived similarity on salesperson trust, both directly and through willingness behaviour. However, model-based segmentation uncovers a segment of cost-oriented dyads and a segment of more profit-oriented dyads. In cost-oriented dyads, there is no significant direct effect of perceived similarity on salesperson trust, and willingness behaviour nearly fully mediates this relationship. In more profit-oriented dyads, the similarity-attraction effect is not present. The theoretical and methodological contributions and managerial implications of these findings are discussed.
|
279 |
Análise da produtividade da soja associada a fatores agrometeorológicos, por meio de estatística espacial de área na Região Oeste do Estado do Paraná / Analysis of soybean yield associated with agrometeorological factors using areal spatial statistics in the western region of the State of Paraná
Araújo, Everton Coimbra de, 10 December 2012
This work presents areal spatial statistics methods applied to soybean yield and agrometeorological factors in the western region of Paraná State. The data cover the crop years 2000/2001 to 2007/2008 and comprise soybean yield (t ha⁻¹) and the agrometeorological variables rainfall (mm), mean temperature (°C), and mean global solar radiation (W m⁻²). In the first phase, spatial autocorrelation indices (global and local Moran's I) were used, and multiple spatial regression models were fitted and their performance evaluated. Parameters were estimated by maximum likelihood, and model performance was assessed with the coefficient of determination (R²), the maximized log-likelihood, and the Schwarz Bayesian information criterion. In the second phase, spatial cluster analyses were performed using multivariate statistics, seeking to identify associations in the same set of variables over a larger number of crop years. Finally, data from one crop year were analyzed with a fuzzy clustering approach based on the Fuzzy C-Means algorithm, with similarity measured by an index defined for this purpose. The first phase showed the correlation and spatial autocorrelation between soybean yield and the agrometeorological elements, using areal spatial analysis techniques such as univariate and bivariate global and local Moran's I and significance tests. The performance indicators showed that the SAR and CAR models gave better results than the classical multiple regression model. In the second phase, groups of municipalities were formed based on the similarity of the variables under analysis.
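The global and local Moran's I indices used in the first phase follow standard formulas: with deviations z = x − x̄, spatial weights w_ij, and S₀ = Σ w_ij, the global index is I = (n / S₀) · Σᵢⱼ w_ij zᵢ zⱼ / Σᵢ zᵢ², and the local index is Iᵢ = (zᵢ / m₂) · Σⱼ w_ij zⱼ with m₂ = Σ zᵢ² / n. The sketch below computes both. The four-region contiguity matrix and values are hypothetical (the thesis worked with municipalities of western Paraná), and significance testing is omitted.

```python
import numpy as np

def global_morans_i(x, W):
    """Global Moran's I for values x (n,) and spatial weights W (n, n) with zero diagonal."""
    z = np.asarray(x, dtype=float) - np.mean(x)
    n, s0 = len(z), W.sum()
    return float(n / s0 * (z @ W @ z) / (z @ z))

def local_morans_i(x, W):
    """Local Moran's I_i = (z_i / m2) * sum_j w_ij z_j, with m2 = sum(z**2) / n."""
    z = np.asarray(x, dtype=float) - np.mean(x)
    m2 = (z @ z) / len(z)
    return z / m2 * (W @ z)


# Usage: four hypothetical regions in a row, binary contiguity weights.
W = np.array([[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]], dtype=float)
print(global_morans_i([1, 2, 3, 4], W))   # smooth trend: positive, about 1/3
print(global_morans_i([1, 0, 1, 0], W))   # alternating values: negative, about -1
print(local_morans_i([1, 2, 3, 4], W))
```

The local indices sum to S₀ times the global index, which is why they can be read as a decomposition of global autocorrelation into each region's contribution.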
Cluster analysis proved a useful tool for managing agricultural production, since the grouping established similarities that provide parameters for better management of production processes, helping the farmer obtain quantitatively and qualitatively better results. In the final step, the Fuzzy C-Means algorithm was used to form groups of municipalities with similar soybean yield, using the Highest Membership Degree decision method (MDMGP) and the β-Threshold decision method (MDL β). The appropriate number of clusters was then identified using the modified partition entropy. To measure the degree of similarity within each cluster, a Cluster Similarity Index (ISCl) was created and used, which takes into account the membership degree of each municipality in the cluster to which it belongs. Within the scope of this study, the method proved adequate, identifying clusters of municipalities with degrees of similarity of 60 to 78%.
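The fuzzy clustering step can be sketched with the standard Fuzzy C-Means updates (fuzzifier m, alternating centre and membership updates), followed by the highest-membership decision rule that MDMGP names. The β-threshold rule, the modified partition entropy, and the ISCl index are not reproduced, because the abstract does not give their formulas. The data and function name are hypothetical.

```python
import numpy as np

def fuzzy_c_means(X, c, m=2.0, n_iters=100, seed=0):
    """Standard Fuzzy C-Means. Returns memberships U (n, c) and cluster centres (c, d)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)           # each row of memberships sums to 1
    for _ in range(n_iters):
        Um = U ** m
        centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                   # guard against a point sitting on a centre
        # u_ik is proportional to d_ik ** (-2 / (m - 1)), normalised over clusters.
        U = d ** (-2.0 / (m - 1.0))
        U /= U.sum(axis=1, keepdims=True)
    return U, centers


# Usage with hypothetical standardised features (e.g. yield, rainfall) for six municipalities.
X = np.array([[0.0, 0.0], [0.0, 0.1], [0.1, 0.0], [5.0, 5.0], [5.0, 5.1], [5.1, 5.0]])
U, centers = fuzzy_c_means(X, c=2)
labels = U.argmax(axis=1)                       # highest-membership decision rule
print(np.round(U, 3), labels)
```

Unlike hard k-means, each municipality keeps a membership degree in every cluster, which is the quantity a membership-weighted index such as the ISCl can build on.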
|