Global ETD Search

271	Geometry of Fractal Squares Roinestad, Kristine A. 29 April 2010 (has links) This paper will examine analogues of Cantor sets, called fractal squares, and some of the geometric ways in which fractal squares raise issues not raised by Cantor sets. Also discussed will be a technique using directed graphs to prove bilipschitz equivalence of two fractal squares. / Ph. D. Self-Similarity Bilipschitz Equivalences Fractal Squares Cantor Sets
272	Generative models meet similarity search: efficient, heuristic-free and robust retrieval Doan, Khoa Dang 23 September 2021 (has links) The rapid growth of digital data, especially visual and textual contents, brings many challenges to the problem of finding similar data. Exact similarity search, which aims to exhaustively find all relevant items through a linear scan in a dataset, is impractical due to its high computational complexity. Approximate-nearest-neighbor (ANN) search methods, especially the Learning-to-hash or Hashing methods, provide principled approaches that balance the trade-offs between the quality of the guesses and the computational cost for web-scale databases. In this era of data explosion, it is crucial for the hashing methods to be both computationally efficient and robust to various scenarios such as when the application has noisy data or data that slightly changes over time (i.e., out-of-distribution). This Thesis focuses on the development of practical generative learning-to-hash methods and explainable retrieval models. We first identify and discuss the various aspects where the framework of generative modeling can be used to improve the model designs and generalization of the hashing methods. Then we show that these generative hashing methods similarly enjoy several appealing empirical and theoretical properties of generative modeling. Specifically, the proposed generative hashing models generalize better with important properties such as low-sample requirement, and out-of-distribution and data-corruption robustness. Finally, in domains with structured data such as graphs, we show that the computational methods in generative modeling have an interesting utility beyond estimating the data distribution and describe a retrieval framework that can explain its decision by borrowing the algorithmic ideas developed in these methods. Two subsets of generative hashing methods and a subset of explainable retrieval methods are proposed. For the first hashing subset, we propose a novel adversarial framework that can be easily adapted to a new problem domain and three training algorithms that learn the hash functions without several hyperparameters commonly found in the previous hashing methods. The contributions of our work include: (1) Propose novel algorithms, which are based on adversarial learning, to learn the hash functions; (2) Design computationally efficient Wasserstein-related adversarial approaches which have low computational and sample efficiency; (3) Conduct extensive experiments on several benchmark datasets in various domains, including computational advertising, and text and image retrieval, for performance evaluation. For the second hashing subset, we propose energy-based hashing solutions which can improve the generalization and robustness of existing hashing approaches. The contributions of our work for this task include: (1) Propose data-synthesis solutions to improve the generalization of existing hashing methods; (2) Propose energy-based hashing solutions which exhibit better robustness against out-of-distribution and corrupted data; (3) Conduct extensive experiments for performance evaluations on several benchmark datasets in the image retrieval domain. Finally, for the last subset of explainable retrieval methods, we propose an optimal alignment algorithm that achieves a better similarity approximation for a pair of structured objects, such as graphs, while capturing the alignment between the nodes of the graphs to explain the similarity calculation. The contributions of our work for this task include: (1) Propose a novel optimal alignment algorithm for comparing two sets of bag-of-vectors embeddings; (2) Propose a differentiable computation to learn the parameters of the proposed optimal alignment model; (3) Conduct extensive experiments, for performance evaluation of both the similarity approximation task and the retrieval task, on several benchmark graph datasets. / Doctor of Philosophy / Searching for similar items, or similarity search, is one of the fundamental tasks in this information age, especially when there is a rapid growth of visual and textual contents. For example, in a search engine such as Google, a user searches for images with similar content to a referenced image; in online advertising, an advertiser finds new users, and eventually targets these users with advertisements, where the new users have similar profiles to some referenced users who have previously responded positively to the same or similar advertisements; in the chemical domain, scientists search for proteins with a similar structure to a referenced protein. The practical search applications in these domains often face several challenges, especially when these datasets or databases can contain a large number (e.g., millions or even billions) of complex-structured items (e.g., texts, images, and graphs). These challenges can be organized into three central themes: search efficiency (the economical use of resources such as computation and time) and model-design effort (the ease of building the search model). Besides search efficiency and model-design effort, it is increasingly a requirement of a search model to possess the ability to explain the search results, especially in the scientific domains where the items are structured objects such as graphs. This dissertation tackles the aforementioned challenges in practical search applications by using the computational techniques that learn to generate data. First, we overcome the need to scan the entire large dataset for similar items by considering an approximate similarity search technique called hashing. Then, we propose an unsupervised hashing framework that learns the hash functions with simpler objective functions directly from raw data. The proposed retrieval framework can be easily adapted into new domains with significantly lower effort in model design. When labeled data is available but is limited (which is a common scenario in practical search applications), we propose a hashing network that can synthesize additional data to improve the hash function learning process. The learned model also exhibits significant robustness against data corruption and slight changes in the underlying data. Finally, in domains with structured data such as graphs, we propose a computation approach that can simultaneously estimate the similarity of structured objects, such as graphs, and capture the alignment between their substructures, e.g., nodes. The alignment mechanism can help explain the reason why two objects are similar or dissimilar. This is a useful tool for domain experts who not only want to search for similar items but also want to understand how the search model makes its predictions. learning to hash generative modeling similarity search explainable retrieval
273	Similarity concept in theory lecturing: application to transportation studies. Pu, Jaan H. 07 July 2017 (has links) No / In this paper, a similarity concept is proposed to improve student understanding on difficult and complicated engineering theory. The planned application of this approach is for the Transportation Studies module (CSE6004-A) at School of Engineering, University of Bradford, United Kingdom. In the module, noise induced by road transport and vehicles are taught in depth, where the proposed teaching method will be applied to aid student understanding on the numerical concept of the vibration effect and noise on vehicle braking system. As part of the module planning, the full numerical solution of brake judder/vibration effect, which includes shaking (forced vibration) and nibbling (torsional vibration) effects will be introduced to students where similarity concept will be adapted in its teaching. The successfully applied concept will also be able to utilize by other engineering teaching and modules.
274	An Exploration of Circuit Similarity for Discovering and Predicting Reusable Hardware Zeng, Kevin 27 April 2016 (has links) A modular reuse-based design methodology has been one of the most important factors in improving hardware design productivity. Traditionally, reuse involves manually searching through repositories for existing components. This search can be tedious and often unfruitful. In order to enhance design reuse, an automated discovery technique is proposed: a reference circuit is compared with an archive of existing designs such that similar circuits are suggested throughout the design phase. To achieve this goal, methods for assessing the similarity of two designs are necessary. Different techniques for comparing the similarity of circuits are explored utilizing concepts from different domains. A new similarity measure was developed using birthmarks that allows for fast and efficient comparison of large and complex designs. Applications where circuit similarity matching can be utilized are examined such as IP theft detection and reverse engineering. Productivity experiments show that automatically suggesting reusable designs to the user could potentially increase productivity by more than 34% on average. / Ph. D. Productivity Reuse Hardware Field programmable gate arrays Similarity Matching Birthmarking
275	A study of model parameters for scaling up word to sentence similarity tasks in distributional semantics Milajevs, Dmitrijs January 2018 (has links) Representation of sentences that captures semantics is an essential part of natural language processing systems, such as information retrieval or machine translation. The representation of a sentence is commonly built by combining the representations of the words that the sentence consists of. Similarity between words is widely used as a proxy to evaluate semantic representations. Word similarity models are well-studied and are shown to positively correlate with human similarity judgements. Current evaluation of models of sentential similarity builds on the results obtained in lexical experiments. The main focus is how the lexical representations are used, rather than what they should be. It is often assumed that the optimal representations for word similarity are also optimal for sentence similarity. This work discards this assumption and systematically looks for lexical representations that are optimal for similarity measurement between sentences. We find that the best representation for word similarity is not always the best for sentence similarity and vice versa. The best models in word similarity tasks perform best with additive composition. However, the best result on compositional tasks is achieved with Kroneckerbased composition. There are representations that are equally good in both tasks when used with multiplicative composition. The systematic study of the parameters of similarity models reveals that the more information lexical representations contain, the more attention should be paid to noise. In particular, the word vectors in models with the feature size at the magnitude of the vocabulary size should be sparse, but if a small number of context features is used then the vectors should be dense. Given the right lexical representations, compositional operators achieve state-of-the-art performance, improving over models that use neural-word embeddings. To avoid overfitting, either several test datasets should be used or parameter selection should be based on parameters' average behaviours.
276	Initial business-to-business sales encounters : the impact of the similarity-attraction effect Dekker, Johannes J. January 2016 (has links) During initial business-to-business encounters, salespeople try to enhance buyers’ future interaction intentions. A common belief is that increasing buyers’ similarity perceptions increases the chances of future interaction. This study assesses the impact of the similarity-attraction effect on future interaction. By synthesising social psychology and marketing literature, a conceptual framework is proposed, in which perceived similarity influences salesperson trust. This relationship is mediated by task-related and social assessments of buyers. Task-related assessments comprise willingness (benevolence and integrity) and competence (power and expertise). Social attraction is conceptualised as likeability. Salesperson trust drives anticipated future interaction, together with organisational trust and anticipated added value. The conceptual framework was empirically tested through a cross-sectional survey. Dutch professional buyers assessed recent initial sales encounters. A sample of 162 dyads was analysed, using PLS-SEM, including FIMIX segmentation. This study demonstrates support for a third willingness construct: willingness behaviour. This construct implies that buyers are more influenced by expectations regarding behaviour, than assessments of salespeople’s attitudes. A homogeneous analysis supports the influence of perceived similarity on salesperson trust, both directly and through willingness behaviour. However, model-based segmentation uncovers a segment of cost-oriented dyads and a segment of more profit-oriented dyads. In cost-oriented dyads, there is no significant direct effect between perceived similarity and salesperson trust, and willingness behaviour nearly fully mediates this relationship. In more profit-oriented dyads, the similarity-attraction effect is not present. Theoretical and methodological contributions and managerial implications of these findings are discussed. 658
277	Análise da produtividade da soja associada a fatores agrometeorológicos, por meio de estatística espacial de área na Região Oeste do Estado do Paraná / Productivity analysis of factors associated with soy agrometeorological, through spatial statistical area in west region of the state of Paraná Araújo, Everton Coimbra de 10 December 2012 (has links) Made available in DSpace on 2017-07-10T19:25:24Z (GMT). No. of bitstreams: 1 Everton.pdf: 4714140 bytes, checksum: 519aa9b0b92961245b0d80158227dea4 (MD5) Previous issue date: 2012-12-10 / This paper aimed to present methods to be applied in the area of spatial statistics on soybean yield and agrometeorological factors in Western Paraná state. The data used, related to crop years from 2000/2001 to 2007/2008, are the following variables: soybean yield (t ha-1) and agrometeorological factors, such as rainfall (mm), average temperature (oC) and solar global radiation average (W m-2). In the first phase,it was used indices of spatial autocorrelation (Moran Global and Local) and presented multiple spatial regression models, with performance evaluations. The estimation of parameters occurred when using the Maximum Likelihood method and the performance evaluation of the models was based on the coefficient of determination (R2), the maximum value of the function of the logarithm of the maximum value of the likelihood function logarithm and the Bayesian information criterion of Schwarz. In a second step, cluster analysis was performed using spatial statistical multivariate associations, seeking to identify the same set of variables, but with a larger number of crop years. Finally, the data from one crop year were utilized in an approach based on fuzzy clustering, through the Fuzzy C-Means algorithm and the similarity measure by defining an index for this purpose. The first phase of the study showed the correlation between spatial autocorrelation and soybean yield and agrometeorological elements, through the analysis of spatial area, using techniques such as index Global Moran's I and Local univariate and bivariate and significance tests. It was possible to demonstrate, through the performance indicators used, that the SAR and CAR models offered better results than the classical multiple regression model. In the second phase, it was possible to present the formation of groups of cities using the similarities of the variables under analysis. Cluster analysis is a useful tool for better management of production activities in agriculture, since, with the grouping, it was possible to establish similarities parameters that provide better management of production processes that bring quantitative and qualitatively better, results sought by the farmer. In the final step, through the use of Fuzzy C-Means algorithm, it was possible to form groups of cities of similar soybean yield using the method of decision by the Higher Degree of Relevance (MDMGP) and Method of Decision Threshold by β (β CDM). Subsequently, identification of the adequate number of clusters was obtained using modified partition entropy. To measure the degree of similarity of each cluster, a Cluster Similarity Index (ISCl) was designed and used, which considers the degree of relevance of each city within the group to which it belongs. Within the perspective of this study, the method used was adequate, allowing to identify clusters of cities with degrees of similarities in the order of 60 to 78% / Este trabalho apresenta métodos para serem aplicados na estatística espacial de área na produtividade da soja e fatores agrometeorológicos na região oeste do estado do Paraná. Os dados utilizados estão relacionados aos anos-safra de 2000/2001 a 2007/2008, sendo as variáveis: produtividade da soja (t ha-1) e agrometeorológicas, tais como precipitação pluvial (mm), temperatura média (oC) e radiação solar global média (W m-2). Em uma primeira fase foram utilizados índices de autocorrelação espacial (Moran Global e Local) e apresentados modelos de regressão espacial múltipla, com avaliações de desempenho. A estimativa dos parâmetros dos modelos ajustados se deu pelo uso do método de Máxima Verossimilhança e a avaliação do desempenho dos modelos foi realizada com base no coeficiente de determinação (R2), no máximo valor do logaritmo da função do máximo valor do logaritmo da função verossimilhança e no critério de informação bayesiano de Schwarz. Em uma segunda etapa foram realizadas análises de agrupamento espacial por meio da estatística multivariada, buscando identificar associações no mesmo conjunto de variáveis, porém com um número maior de anos-safra. Finalmente, os dados de um ano-safra foram aplicados em uma abordagem baseada em agrupamento difuso, por meio do algoritmo Fuzzy c-Means, tendo a similaridade medida pela definição de um índice com este objetivo. O estudo da primeira fase permitiu verificar a correlação e a autocorrelação espacial entre a produtividade da soja e os elementos agrometeorológicos, por meio da análise espacial de área, usando técnicas como o índice I de Moran Global e Local uni e bivariado e os testes de significância. Foi possível demonstrar que, por meio dos indicadores de desempenho utilizados, os modelos SAR e CAR ofereceram melhores resultados em relação ao modelo de regressão múltipla clássica. Na segunda fase, foi possível apresentar a formação de grupos de municípios utilizando as similaridades das variáveis em análise. A análise de agrupamento foi um instrumento útil para uma melhor gestão das atividades de produção da agricultura, em função de que, com o agrupamento, foi possível se estabelecer similaridades que proporcionem parâmetros para uma melhor gestão dos processos de produção que traga, quantitativa e qualitativamente, resultados almejados pelo agricultor. Na etapa final, por meio do algoritmo Fuzzy c-Means, foi possível a formação de grupos de municípios similares à produtividade de soja, utilizando o Método de Decisão pelo Maior Grau de Pertinência (MDMGP) e o Método de Decisão pelo Limiar β (MDL β). Posteriormente, a identificação do número adequado de agrupamentos foi obtida utilizando a Entropia de Partição Modificada. Para mensurar o nível de similaridade de cada agrupamento, foi criado e utilizado um Índice de Similaridade de Clusters (ISCl), que considera o grau de pertinência de cada município dentro do agrupamento a que pertence. Dentro das perspectivas deste estudo, o método empregado se mostrou adequado, permitindo identificar agrupamentos de municípios com graus de similaridades da ordem de 60 a 78% Autocorrelação espacial Similaridade espacial Regressão espacial Spatial autocorrelation spatial similarity similarity index
278	Análise da produtividade da soja associada a fatores agrometeorológicos, por meio de estatística espacial de área na Região Oeste do Estado do Paraná / Productivity analysis of factors associated with soy agrometeorological, through spatial statistical area in west region of the state of Paraná Araújo, Everton Coimbra de 10 December 2012 (has links) Made available in DSpace on 2017-05-12T14:48:47Z (GMT). No. of bitstreams: 1 Everton.pdf: 4714140 bytes, checksum: 519aa9b0b92961245b0d80158227dea4 (MD5) Previous issue date: 2012-12-10 / This paper aimed to present methods to be applied in the area of spatial statistics on soybean yield and agrometeorological factors in Western Paraná state. The data used, related to crop years from 2000/2001 to 2007/2008, are the following variables: soybean yield (t ha-1) and agrometeorological factors, such as rainfall (mm), average temperature (oC) and solar global radiation average (W m-2). In the first phase,it was used indices of spatial autocorrelation (Moran Global and Local) and presented multiple spatial regression models, with performance evaluations. The estimation of parameters occurred when using the Maximum Likelihood method and the performance evaluation of the models was based on the coefficient of determination (R2), the maximum value of the function of the logarithm of the maximum value of the likelihood function logarithm and the Bayesian information criterion of Schwarz. In a second step, cluster analysis was performed using spatial statistical multivariate associations, seeking to identify the same set of variables, but with a larger number of crop years. Finally, the data from one crop year were utilized in an approach based on fuzzy clustering, through the Fuzzy C-Means algorithm and the similarity measure by defining an index for this purpose. The first phase of the study showed the correlation between spatial autocorrelation and soybean yield and agrometeorological elements, through the analysis of spatial area, using techniques such as index Global Moran's I and Local univariate and bivariate and significance tests. It was possible to demonstrate, through the performance indicators used, that the SAR and CAR models offered better results than the classical multiple regression model. In the second phase, it was possible to present the formation of groups of cities using the similarities of the variables under analysis. Cluster analysis is a useful tool for better management of production activities in agriculture, since, with the grouping, it was possible to establish similarities parameters that provide better management of production processes that bring quantitative and qualitatively better, results sought by the farmer. In the final step, through the use of Fuzzy C-Means algorithm, it was possible to form groups of cities of similar soybean yield using the method of decision by the Higher Degree of Relevance (MDMGP) and Method of Decision Threshold by β (β CDM). Subsequently, identification of the adequate number of clusters was obtained using modified partition entropy. To measure the degree of similarity of each cluster, a Cluster Similarity Index (ISCl) was designed and used, which considers the degree of relevance of each city within the group to which it belongs. Within the perspective of this study, the method used was adequate, allowing to identify clusters of cities with degrees of similarities in the order of 60 to 78% / Este trabalho apresenta métodos para serem aplicados na estatística espacial de área na produtividade da soja e fatores agrometeorológicos na região oeste do estado do Paraná. Os dados utilizados estão relacionados aos anos-safra de 2000/2001 a 2007/2008, sendo as variáveis: produtividade da soja (t ha-1) e agrometeorológicas, tais como precipitação pluvial (mm), temperatura média (oC) e radiação solar global média (W m-2). Em uma primeira fase foram utilizados índices de autocorrelação espacial (Moran Global e Local) e apresentados modelos de regressão espacial múltipla, com avaliações de desempenho. A estimativa dos parâmetros dos modelos ajustados se deu pelo uso do método de Máxima Verossimilhança e a avaliação do desempenho dos modelos foi realizada com base no coeficiente de determinação (R2), no máximo valor do logaritmo da função do máximo valor do logaritmo da função verossimilhança e no critério de informação bayesiano de Schwarz. Em uma segunda etapa foram realizadas análises de agrupamento espacial por meio da estatística multivariada, buscando identificar associações no mesmo conjunto de variáveis, porém com um número maior de anos-safra. Finalmente, os dados de um ano-safra foram aplicados em uma abordagem baseada em agrupamento difuso, por meio do algoritmo Fuzzy c-Means, tendo a similaridade medida pela definição de um índice com este objetivo. O estudo da primeira fase permitiu verificar a correlação e a autocorrelação espacial entre a produtividade da soja e os elementos agrometeorológicos, por meio da análise espacial de área, usando técnicas como o índice I de Moran Global e Local uni e bivariado e os testes de significância. Foi possível demonstrar que, por meio dos indicadores de desempenho utilizados, os modelos SAR e CAR ofereceram melhores resultados em relação ao modelo de regressão múltipla clássica. Na segunda fase, foi possível apresentar a formação de grupos de municípios utilizando as similaridades das variáveis em análise. A análise de agrupamento foi um instrumento útil para uma melhor gestão das atividades de produção da agricultura, em função de que, com o agrupamento, foi possível se estabelecer similaridades que proporcionem parâmetros para uma melhor gestão dos processos de produção que traga, quantitativa e qualitativamente, resultados almejados pelo agricultor. Na etapa final, por meio do algoritmo Fuzzy c-Means, foi possível a formação de grupos de municípios similares à produtividade de soja, utilizando o Método de Decisão pelo Maior Grau de Pertinência (MDMGP) e o Método de Decisão pelo Limiar β (MDL β). Posteriormente, a identificação do número adequado de agrupamentos foi obtida utilizando a Entropia de Partição Modificada. Para mensurar o nível de similaridade de cada agrupamento, foi criado e utilizado um Índice de Similaridade de Clusters (ISCl), que considera o grau de pertinência de cada município dentro do agrupamento a que pertence. Dentro das perspectivas deste estudo, o método empregado se mostrou adequado, permitindo identificar agrupamentos de municípios com graus de similaridades da ordem de 60 a 78% Autocorrelação espacial Similaridade espacial Regressão espacial Spatial autocorrelation spatial similarity similarity index
279	Effective and efficient similarity search in databases Lange, Dustin January 2013 (has links) Given a large set of records in a database and a query record, similarity search aims to find all records sufficiently similar to the query record. To solve this problem, two main aspects need to be considered: First, to perform effective search, the set of relevant records is defined using a similarity measure. Second, an efficient access method is to be found that performs only few database accesses and comparisons using the similarity measure. This thesis solves both aspects with an emphasis on the latter. In the first part of this thesis, a frequency-aware similarity measure is introduced. Compared record pairs are partitioned according to frequencies of attribute values. For each partition, a different similarity measure is created: machine learning techniques combine a set of base similarity measures into an overall similarity measure. After that, a similarity index for string attributes is proposed, the State Set Index (SSI), which is based on a trie (prefix tree) that is interpreted as a nondeterministic finite automaton. For processing range queries, the notion of query plans is introduced in this thesis to describe which similarity indexes to access and which thresholds to apply. The query result should be as complete as possible under some cost threshold. Two query planning variants are introduced: (1) Static planning selects a plan at compile time that is used for all queries. (2) Query-specific planning selects a different plan for each query. For answering top-k queries, the Bulk Sorted Access Algorithm (BSA) is introduced, which retrieves large chunks of records from the similarity indexes using fixed thresholds, and which focuses its efforts on records that are ranked high in more than one attribute and thus promising candidates. The described components form a complete similarity search system. Based on prototypical implementations, this thesis shows comparative evaluation results for all proposed approaches on different real-world data sets, one of which is a large person data set from a German credit rating agency. / Ziel von Ähnlichkeitssuche ist es, in einer Menge von Tupeln in einer Datenbank zu einem gegebenen Anfragetupel all diejenigen Tupel zu finden, die ausreichend ähnlich zum Anfragetupel sind. Um dieses Problem zu lösen, müssen zwei zentrale Aspekte betrachtet werden: Erstens, um eine effektive Suche durchzuführen, muss die Menge der relevanten Tupel mithilfe eines Ähnlichkeitsmaßes definiert werden. Zweitens muss eine effiziente Zugriffsmethode gefunden werden, die nur wenige Datenbankzugriffe und Vergleiche mithilfe des Ähnlichkeitsmaßes durchführt. Diese Arbeit beschäftigt sich mit beiden Aspekten und legt den Fokus auf Effizienz. Im ersten Teil dieser Arbeit wird ein häufigkeitsbasiertes Ähnlichkeitsmaß eingeführt. Verglichene Tupelpaare werden entsprechend der Häufigkeiten ihrer Attributwerte partitioniert. Für jede Partition wird ein unterschiedliches Ähnlichkeitsmaß erstellt: Mithilfe von Verfahren des Maschinellen Lernens werden Basisähnlichkeitsmaßes zu einem Gesamtähnlichkeitsmaß verbunden. Danach wird ein Ähnlichkeitsindex für String-Attribute vorgeschlagen, der State Set Index (SSI), welcher auf einem Trie (Präfixbaum) basiert, der als nichtdeterministischer endlicher Automat interpretiert wird. Zur Verarbeitung von Bereichsanfragen wird in dieser Arbeit die Notation der Anfragepläne eingeführt, um zu beschreiben welche Ähnlichkeitsindexe angefragt und welche Schwellwerte dabei verwendet werden sollen. Das Anfrageergebnis sollte dabei so vollständig wie möglich sein und die Kosten sollten einen gegebenen Schwellwert nicht überschreiten. Es werden zwei Verfahren zur Anfrageplanung vorgeschlagen: (1) Beim statischen Planen wird zur Übersetzungszeit ein Plan ausgewählt, der dann für alle Anfragen verwendet wird. (2) Beim anfragespezifischen Planen wird für jede Anfrage ein unterschiedlicher Plan ausgewählt. Zur Beantwortung von Top-k-Anfragen stellt diese Arbeit den Bulk Sorted Access-Algorithmus (BSA) vor, der große Mengen von Tupeln mithilfe fixer Schwellwerte von den Ähnlichkeitsindexen abfragt und der Tupel bevorzugt, die hohe Ähnlichkeitswerte in mehr als einem Attribut haben und damit vielversprechende Kandidaten sind. Die vorgestellten Komponenten bilden ein vollständiges Ähnlichkeitssuchsystem. Basierend auf einer prototypischen Implementierung zeigt diese Arbeit vergleichende Evaluierungsergebnisse für alle vorgestellten Ansätze auf verschiedenen Realwelt-Datensätzen; einer davon ist ein großer Personendatensatz einer deutschen Wirtschaftsauskunftei. Datenbanken Ähnlichkeitssuche Suchverfahren Ähnlichkeitsmaße Indexstrukturen Databases Similarity Search Search Algorithms Similarity Measures Index Structures Data processing Computer science
280	Category Knowledge, Skeleton-based Shape Matching And Shape Classification Erdem, Ibrahim Aykut 01 October 2008 (has links) (PDF) Skeletal shape representations, in spite of their structural instabilities, have proven themselves as effective representation schemes for recognition and classification of visual shapes. They capture part structure in a compact and natural way and provide insensitivity to visual transformations such as occlusion and articulation of parts. In this thesis, we explore the potential use of disconnected skeleton representation for shape recognition and shape classification. Specifically, we first investigate the importance of contextual information in recognition where we extend the previously proposed disconnected skeleton based shape matching methods in different ways by incorporating category knowledge into matching process. Unlike the view in syntactic matching of shapes, our interpretation differentiates the semantic roles of the shapes in comparison in a way that a query shape is being matched with a database shape whose category is known a priori. The presence of context, i.e. the knowledge about the category of the database shape, influences the similarity computations, and helps us to obtain better matching performance. Next, we build upon our category-influenced matching framework in which both shapes and shape categories are represented with depth-1 skeletal trees, and develop a similarity-based shape classification method where the category trees formed for each shape category provide a reference set for learning the relationships between categories. As our classification method takes into account both within-category and between-category information, we attain high classification performance. Moreover, using the suggested classification scheme in a retrieval task improves both the efficiency and accuracy of matching by eliminating unrelated comparisons.

Search results