Global ETD Search

51	Optimal and Hereditarily Optimal Realizations of Metric Spaces / Optimala och ärftligt optimala realiseringar av metriker Lesser, Alice January 2007
This PhD thesis, consisting of an introduction, four papers, and some supplementary results, studies the problem of finding an optimal realization of a given finite metric space: a weighted graph which preserves the metric's distances and has minimal total edge weight. This problem is known to be NP-hard, and solutions are not necessarily unique. It has been conjectured that extremally weighted optimal realizations may be found as subgraphs of the hereditarily optimal realization Γd, a graph which in general has a higher total edge weight than the optimal realization but has the advantages of being unique and of being explicitly constructible via the tight span of the metric. In Paper I, we prove that the graph Γd is equivalent to the 1-skeleton of the tight span precisely when the metric considered is totally split-decomposable. For the subset of totally split-decomposable metrics known as consistent metrics, this implies that Γd is isomorphic to the easily constructed Buneman graph. In Paper II, we show that for any metric on at most five points, any optimal realization can be found as a subgraph of Γd. In Paper III we provide a series of counterexamples: metrics for which there exist extremally weighted optimal realizations which are not subgraphs of Γd. However, for these examples there also exists at least one optimal realization which is a subgraph. Finally, Paper IV examines a weakened conjecture suggested by the above counterexamples: can we always find some optimal realization as a subgraph of Γd? Defining extremal optimal realizations as those having the maximum possible number of shortest paths, we prove that any embedding of the vertices of an extremal optimal realization into Γd is injective. Moreover, we prove that this weakened conjecture holds for the subset of consistent metrics which have a 2-dimensional tight span.
Keywords: Applied mathematics; optimal realization; hereditarily optimal realization; tight span; phylogenetic network; Buneman graph; split decomposition; T-theory; finite metric space; topological graph theory; discrete geometry; Tillämpad matematik
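To make the notion of a realization in entry 51 concrete, the following minimal Python sketch checks whether a weighted graph preserves the distances of a small metric and compares total edge weights. It is only an illustration and not code from the thesis: the function names (shortest_paths, is_realization, total_weight) and the three-point example metric are assumptions made for this sketch.

```python
from itertools import product


def shortest_paths(n, edges):
    """All-pairs shortest paths (Floyd-Warshall) on an undirected weighted graph."""
    inf = float("inf")
    dist = [[0.0 if i == j else inf for j in range(n)] for i in range(n)]
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    # product() varies the last index fastest, so k is the outermost loop.
    for k, i, j in product(range(n), repeat=3):
        if dist[i][k] + dist[k][j] < dist[i][j]:
            dist[i][j] = dist[i][k] + dist[k][j]
    return dist


def is_realization(metric, n_vertices, edges, tol=1e-9):
    """True if graph distances among vertices 0..len(metric)-1 equal the metric."""
    dist = shortest_paths(n_vertices, edges)
    m = len(metric)
    return all(abs(dist[i][j] - metric[i][j]) <= tol
               for i in range(m) for j in range(m))


def total_weight(edges):
    """Sum of all edge weights."""
    return sum(w for _, _, w in edges)


# Hypothetical three-point metric: d(0,1)=3, d(0,2)=4, d(1,2)=5.
metric = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
# Realization 1: the complete graph on the three points.
triangle = [(0, 1, 3), (0, 2, 4), (1, 2, 5)]
# Realization 2: a star through one auxiliary vertex (index 3) with legs 1, 2, 3.
star = [(0, 3, 1), (1, 3, 2), (2, 3, 3)]

print(is_realization(metric, 3, triangle), total_weight(triangle))
print(is_realization(metric, 4, star), total_weight(star))
```

Both graphs realize the metric, but the star does so with half the total edge weight of the triangle, which is the kind of saving an optimal realization looks for.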
52	Similaridade em big data / Similarity in big data Lúcio Fernandes Dutra Santos 19 July 2017
Os volumes de dados armazenados em grandes bases de dados aumentam em ritmo sempre crescente, pressionando o desempenho e a flexibilidade dos Sistemas de Gerenciamento de Bases de Dados (SGBDs). Os problemas de se tratar dados em grandes quantidades, escopo, complexidade e distribuição vêm sendo tratados também sob o tema de big data. O aumento da complexidade cria a necessidade de novas formas de busca - representar apenas números e pequenas cadeias de caracteres já não é mais suficiente. Buscas por similaridade vêm se mostrando a maneira por excelência de comparar dados complexos, mas até recentemente elas não estavam disponíveis nos SGBDs. Agora, com o início de sua disponibilidade, está se tornando claro que apenas os operadores de busca por similaridade fundamentais não são suficientes para lidar com grandes volumes de dados. Um dos motivos disso é que "similaridade" é, usualmente, definida considerando seu significado quando apenas poucos elementos estão envolvidos. Atualmente, o principal foco da literatura em big data é aumentar a eficiência na recuperação dos dados usando paralelismo, existindo poucos estudos sobre a eficácia das respostas obtidas. Esta tese visa propor e desenvolver variações dos operadores de busca por similaridade para torná-los mais adequados para processar big data, apresentando visões mais abrangentes da base de dados, aumentando a eficácia das respostas, porém sem causar impactos consideráveis na eficiência dos algoritmos de busca e viabilizando sua execução escalável sobre grandes volumes de dados. Para alcançar esse objetivo, este trabalho apresenta quatro frentes de contribuições: A primeira consistiu em um modelo de diversificação de resultados que pode ser aplicado usando qualquer critério de comparação e operador de busca por similaridade. A segunda focou em definir técnicas de amostragem e de agrupamento de dados com o modelo de diversificação proposto, acelerando o processo de análise dos conjuntos de resultados. A terceira contribuição desenvolveu métodos de avaliação da qualidade dos conjuntos de resultados diversificados. Por fim, a última frente de contribuição apresentou uma abordagem para integrar os conceitos de mineração visual de dados e buscas por similaridade com diversidade em sistemas de recuperação por conteúdo, aumentando o entendimento de como a propriedade de diversidade pode ser aplicada.
/ The data collected and generated today are growing not only in volume but also in complexity, and therefore require new query operators. Health care centers collecting image exams, and remote sensing from satellites and earth-based stations, are examples of application domains that need more powerful and flexible operators. Storing, retrieving and analyzing data that are large in volume, structure, complexity and distribution is now referred to as big data. Representing and querying big data using only the traditional scalar data types is no longer enough. Similarity queries are the most sought-after means of retrieving complex data, but until recently they were not available in Database Management Systems (DBMSs). Now that they are becoming available, their first uses in real systems make it clear that the basic similarity query operators do not meet the requirements of the target applications. The main reason is that similarity is a concept usually formulated with only small numbers of data elements in mind. Currently, research on big data focuses mainly on parallel architectures for efficiency, and only a few studies address the efficacy of the query answers. This Ph.D. work develops variations of the basic similarity operators that are better suited to big data: they present a more comprehensive view of the database and increase the effectiveness of the answers without considerably affecting the efficiency of the search algorithms. To achieve this goal, four main contributions are presented. The first is a result diversification model that can be applied with any comparison criterion and similarity search operator. The second defines sampling and grouping techniques based on the proposed diversification model, aiming to speed up the analysis of the result sets. The third concentrates on evaluation methods for measuring the quality of diversified result sets. Finally, the last defines an approach that integrates visual data mining and similarity search with diversity into content-based retrieval systems, allowing a better understanding of how the diversity property is applied in the query process.
Keywords: Análise de qualidade de resultados; Big data; Buscas em espaços métricos; Buscas por similaridade; Diversificação de resultados; Analysis of results quality; Big data; Result diversification; Similarity queries; Similarity search in metric space
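The idea of diversifying a similarity query result, as described in entry 52, can be sketched with a simple greedy filter. This is not the diversification model proposed in the thesis, whose details the abstract does not give. It is a generic illustration under assumed names: diversified_knn and its min_sep parameter (a minimum separation between returned elements) are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.dist(a, b)


def diversified_knn(query, data, k, min_sep, dist=euclidean):
    """Greedy sketch: visit elements by increasing distance to the query and
    keep one only if it is at least min_sep away from every element kept so far."""
    result = []
    for cand in sorted(data, key=lambda x: dist(query, x)):
        if all(dist(cand, kept) >= min_sep for kept in result):
            result.append(cand)
            if len(result) == k:
                break
    return result


# Hypothetical data: three near-duplicates close to the query, two distinct points.
data = [(0, 0.1), (0, 0.2), (0, 0.3), (5, 0), (0, 6)]
query = (0, 0)

plain = diversified_knn(query, data, 3, 0.0)        # behaves like ordinary kNN
diverse = diversified_knn(query, data, 3, 1.0)      # skips near-duplicates
print(plain)
print(diverse)
```

With min_sep set to zero the function returns the three near-duplicates, while a positive separation trades some closeness for a broader view of the data set.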
53	Operação de busca exata aos K-vizinhos mais próximos reversos em espaços métricos / Answering exact reverse k-nearest neighbors queries in metric space Willian Dener de Oliveira 19 March 2010
A complexidade dos dados armazenados em grandes bases de dados aumenta cada vez mais, criando a necessidade de novas operações de consulta. Uma classe de operações que tem apresentado interesse crescente são as chamadas Consultas por Similaridade, sendo as mais conhecidas as consultas por Abrangência (Rq) e por k-Vizinhos mais Próximos (kNN), sendo que esta última obtém quais são os k elementos armazenados mais similares a um dado elemento de referência. Outra consulta que é interessante tanto para consultas diretas quanto como parte de operações de análises mais complexas é a operação de consulta aos k-Vizinhos mais Próximos Reversos (RkNN). Seu objetivo é obter todos os elementos armazenados que têm um dado elemento de referência como um dos seus k elementos mais similares. Devido à complexidade de execução da operação de RkNN, a grande maioria das soluções existentes restringem-se a dados representados em espaços multidimensionais euclidianos (nos quais estão definidas também operações cardinais e topológicas, além de se considerar a similaridade como sendo a distância Euclidiana entre dois elementos), ou então obtêm apenas respostas aproximadas, sujeitas à existência de falsos negativos. Várias aplicações de análise de dados científicos, médicos, de engenharia, financeiros, etc. requerem soluções eficientes para o problema da operação de RkNN sobre dados representados em espaços métricos, onde os elementos não podem ser considerados estar em um espaço nem Euclidiano nem multidimensional. Num espaço métrico, além dos próprios elementos armazenados existe apenas uma função de comparação métrica entre pares de objetos. Neste trabalho, são propostas novas podas de espaço de busca e o algoritmo RkNN-MG que utiliza essas novas podas para solucionar o problema de consultas RkNN exatas em espaços métricos sem limitações. Toda a proposta supõe que o conjunto de dados está em um espaço métrico imerso isometricamente em espaço euclidiano e utiliza propriedades da geometria métrica válida neste espaço para realizar podas eficientes por lei dos cossenos combinada com as podas tradicionais por desigualdade triangular. Os experimentos demonstram comparativamente que as novas podas são mais eficientes que as tradicionais podas por desigualdade triangular, tendo desempenho equivalente quando comparadas em conjuntos de alta dimensionalidade ou com dimensão fractal alta. Assim, os resultados confirmam as novas podas propostas como soluções alternativas eficientes para o problema de consultas RkNN.
/ Data stored in large databases are increasingly complex, pressing for the development of new classes of query operators. One such class, which is attracting increasing interest, is that of similarity queries, the most common being similarity range queries (Rq) and k-nearest neighbor queries (kNN). A k-nearest neighbor query retrieves the k stored elements nearest (or most similar) to a given reference element. Another important similarity query is the reverse k-nearest neighbor query (RkNN), useful both for queries posed directly by the analyst and for queries that are part of more complex analysis processes. The objective of a reverse k-nearest neighbor query is to obtain the stored elements that have the query reference element as one of their k nearest neighbors. As RkNN is a computationally expensive operation, most existing solutions either answer the query only over Euclidean multidimensional spaces (which define cardinal and topological operations in addition to the Euclidean distance between pairs of elements) or retrieve only approximate answers, in which false negatives can occur. Several applications, such as the analysis of scientific, medical, engineering or financial data, require efficient and exact answers to RkNN queries over data that are frequently represented in metric spaces, that is, spaces where no property other than the similarity measure exists. For applications handling metric data, therefore, neither a Euclidean metric nor multidimensional data can be assumed. In this work, we propose new pruning rules based on the law of cosines, and the RkNN-MG algorithm, which uses them to solve RkNN queries in a way that is exact, faster than the existing approaches, not limited to particular values of k, and applicable to both static and dynamic datasets. The new pruning rules assume that the data set lies in a metric space that can be embedded into a Euclidean space, and use metric geometry properties valid in this space to perform effective pruning based on the law of cosines combined with the traditional pruning based on the triangle inequality. The experiments show that the new pruning rules are always more efficient than the traditional pruning rules based solely on the triangle inequality. For high-dimensionality datasets, or for metric datasets with high fractal dimensionality, the performance improvement is smaller than for lower-dimensionality datasets, but performance is never worse. Thus, the results confirm that our pruning rules are an efficient alternative for solving RkNN queries in general.
Keywords: Consulta por similaridade; Espaço numérico; Indexação; RkNN; Vizinhos mais próximos reversos; Access method; Metric space; Reverse k-nearest neighbor; RkNN; Similarity query
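The definition of the RkNN query in entry 53 can be stated directly as a brute-force procedure. The sketch below is not the RkNN-MG algorithm and uses no pruning. It only spells out what an exact answer is, using hypothetical helper names (knn, rknn_bruteforce) and a toy one-dimensional data set. It needs only a distance function, so it applies to any metric space.

```python
def knn(data, i, k, dist):
    """Indices of the k nearest neighbours of data[i] among the other elements."""
    others = [j for j in range(len(data)) if j != i]
    others.sort(key=lambda j: dist(data[i], data[j]))
    return others[:k]


def rknn_bruteforce(data, q, k, dist):
    """Indices i such that data[q] is among the k nearest neighbours of data[i].

    Every element runs its own kNN query, which is why the operation is
    expensive without pruning rules.
    """
    return [i for i in range(len(data))
            if i != q and q in knn(data, i, k, dist)]


def absdiff(a, b):
    """Distance on the real line."""
    return abs(a - b)


# Hypothetical one-dimensional data set; indices are 0, 1, 2, 3.
data = [0, 1, 3, 10]

print(knn(data, 2, 1, absdiff))              # nearest neighbour of the value 3
print(rknn_bruteforce(data, 2, 1, absdiff))  # who has the value 3 as nearest neighbour
print(rknn_bruteforce(data, 3, 1, absdiff))  # nobody has the value 10 as nearest neighbour
```

The example shows that the relation is not symmetric: the nearest neighbour of the value 3 is 1, yet the only element that has 3 as its nearest neighbour is 10.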
54	Transformação de espaços métricos otimizando a recuperação de imagens por conteúdo e avaliação por análise visual / Metric space transformation optimizing content-based image retrieval and visual analysis evaluation Letrícia Pereira Soares Avalhais 30 January 2012
O problema da descontinuidade semântica tem sido um dos principais focos de pesquisa no desenvolvimento de sistemas de recuperação de imagens baseada em conteúdo (CBIR). Neste contexto, as pesquisas mais promissoras focam principalmente na inferência de pesos de características contínuos e na seleção de características. Entretanto, os processos tradicionais de inferência de pesos contínuos são computacionalmente caros e a seleção de características equivale a uma ponderação binária. Visando tratar adequadamente o problema de lacuna semântica, este trabalho propõe dois métodos de transformação de espaço de características métricos baseados na inferência de funções de transformação por meio de algoritmo genético. O método WF infere funções de ponderação para ajustar a função de dissimilaridade e o método TF infere funções para transformação das características. Comparados às abordagens de inferência de pesos contínuos da literatura, ambos os métodos propostos proporcionam uma redução drástica do espaço de busca ao limitar a busca à escolha de um conjunto ordenado de funções de transformação. Análises visuais do espaço transformado e de gráficos de precisão vs. revocação confirmam que TF e WF superam a abordagem tradicional de ponderação de características. Adicionalmente, foi verificado que TF supera significativamente WF em termos de precisão dos resultados de consultas por similaridade por permitir transformações não lineares no espaço de característica, conforme constatado por análise visual.
/ The semantic gap problem has been a major focus of research in the development of content-based image retrieval (CBIR) systems. In this context, the most promising research focuses primarily on the inference of continuous feature weights and on feature selection. However, the traditional processes of continuous feature weighting are computationally expensive, and feature selection is equivalent to a binary weighting. Aiming to alleviate the semantic gap problem, this master's dissertation proposes two methods for transforming metric feature spaces, based on the inference of transformation functions using genetic algorithms. The WF method infers weighting functions to adjust the dissimilarity function, and the TF method infers transformation functions for the features. Compared with existing continuous weight inference approaches, both proposed methods drastically reduce the search space by limiting the search to the choice of an ordered set of transformation functions. Visual analysis of the transformed space and precision vs. recall graphs confirm that both TF and WF outperform the traditional feature weighting methods. Additionally, we found that the TF method significantly outperforms WF in the precision of similarity query results because it allows nonlinear transformations of the feature space, as found in the visual analysis.
Keywords: Algoritmo genético; Consultas por similaridade; Realimentação de relevância; Transformação de espaço métrico; Visualização; Genetic algorithm; Metric space transformation; Relevance feedback; Similarity queries; Visualization
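The remark in entry 54 that feature selection is equivalent to a binary weighting can be illustrated with a per-feature weighted Euclidean distance. This sketch shows only plain feature weighting, the baseline the abstract compares against. It does not implement the WF or TF methods or the genetic algorithm, and the function name weighted_euclidean and the sample vectors are assumptions.

```python
import math


def weighted_euclidean(a, b, weights):
    """Euclidean distance in which each squared feature difference is scaled
    by a non-negative weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


# Hypothetical three-feature vectors.
a = (0.0, 0.0, 0.0)
b = (3.0, 4.0, 12.0)

print(weighted_euclidean(a, b, (1, 1, 1)))      # all features count equally
print(weighted_euclidean(a, b, (1, 1, 0)))      # binary weights: third feature dropped
print(weighted_euclidean(a, b, (1, 1, 0.25)))   # continuous weight: third feature damped
```

Setting a weight to 0 or 1 selects or discards a feature, whereas continuous weights form a much larger search space, which is the cost the abstract attributes to traditional weight inference.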
55	Vyhledávání v multimodálních databázích / Multimodal Database Search Krejčíř, Tomáš January 2009
The field that deals with the storage and effective searching of multimedia documents is called information retrieval. This paper describes a solution for effective searching in collections of shots. Multimedia documents are represented as vectors in a high-dimensional space, because in such a collection of documents it is easier to define semantics as well as the search mechanisms. The work addresses similarity searching based on metric spaces, which uses distance functions such as Euclidean, Chebyshev or Mahalanobis for comparing global features, and cosine or binary rating for comparing local features. Experiments on the TRECVid dataset compare the implemented distance functions. The best distance function for global features appears to be Mahalanobis, and for local features the cosine rating.
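Three of the comparison functions named in entry 55 have short standard definitions, sketched below in plain Python. The function names and sample vectors are assumptions for illustration. The Mahalanobis distance is omitted because it additionally needs a covariance matrix estimated from the data, which the abstract does not describe.

```python
import math


def euclidean(a, b):
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chebyshev(a, b):
    """Largest absolute coordinate difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors (1 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


print(euclidean((0, 0), (3, 4)))
print(chebyshev((0, 0), (3, 4)))
print(cosine_similarity((1, 0), (4, 3)))
```

Note that cosine similarity is a similarity score rather than a distance: larger values mean more similar vectors, and it ignores vector length.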
56	An Introduction to Metric Spaces Erickson Andersson, Samuel; Wiman, David January 2022
In this thesis we start off by ensuring that the reader is up to speed when it comes to some well-known definitions and theorems from real analysis. We then introduce the reader to metric spaces and provide them with some examples such as the real numbers with the Euclidean distance, and compact sets with the Hausdorff distance. Then, we go on to define important concepts such as inner points, limit points, open sets, boundary and much more. We also show, whenever we can, how these concepts are connected. With these tools in place we move on to explain how limits and continuity are defined in metric spaces, and provide the reader with several examples. We then introduce the reader to the concepts of compactness and uniform convergence, for which we show some interesting results such as how uniform convergence and the supremum norm are related. We finish off by covering curves and connectedness (including path-connectedness) in metric spaces, before we briefly touch on topological spaces so as to give the reader a hint of what further mathematics studies might hold.
/ I detta examensarbete börjar vi med att försäkra oss om att läsaren har de förkunskaper som behövs för att kunna ta del av arbetet. Detta görs genom att påminna läsaren om viktiga definitioner och satser från reell analys. Därefter introducerar vi läsaren till metriska rum och ger en mängd olika exempel på dessa som läsaren förhoppningsvis redan stött på. Detta inkluderar bland annat de reella talen med euklidiskt avstånd och slutna och begränsade mängder med Hausdorff-avstånd. När vi väl förklarat distanskonceptet introducerar vi inre punkter, hopningspunkter, öppna mängder, randpunkter och mycket mer. Vi visar dessutom, närhelst vi kan, hur dessa koncept hänger samman. När alla dessa grundbegrepp är etablerade kan vi fortsätta med att förklara gränsvärden och kontinuitet i metriska rum. Vi ger även läsaren flera exempel på detta. I arbetets andra hälft tar vi upp kompakthet och likformig konvergens, för vilka vi presenterar en del intressanta resultat, såsom hur likformig konvergens och supremumnormen är relaterade. Vi avslutar examensarbetet genom att gå igenom kurvor och sammanhängande mängder (inklusive bågvis sammanhängande mängder) i metriska rum, innan vi kort tar upp topologiska rum för att ge läsaren en föraning om vad vidare matematikstudier kan innehålla.
Keywords: Analysis; compact; convergence; limit; metric space; sequence; set; topology; Analys; följd; gränsvärde; kompakt; konvergens; metriskt rum; mängd; topologi; Mathematical Analysis; Matematisk analys
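The Hausdorff distance mentioned in entry 56 is easiest to see on finite sets, which are always compact. The sketch below is a direct transcription of the usual definition under assumed names (hausdorff, directed). The two example sets are hypothetical and the thesis itself may present the distance differently.

```python
def hausdorff(A, B, dist):
    """Hausdorff distance between two non-empty finite sets A and B."""
    def directed(X, Y):
        # Farthest that any point of X is from its closest point in Y.
        return max(min(dist(x, y) for y in Y) for x in X)
    return max(directed(A, B), directed(B, A))


def absdiff(a, b):
    """Euclidean distance on the real line."""
    return abs(a - b)


A = [0, 1]
B = [0, 5]
print(hausdorff(A, B, absdiff))
print(hausdorff(A, A, absdiff))
```

Taking the maximum of both directed values is what makes the result symmetric: every point of A lies within 1 of B, but the point 5 in B is 4 away from A, so the distance is 4.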
57	Neue Indexingverfahren für die Ähnlichkeitssuche in metrischen Räumen über großen Datenmengen / New indexing techniques for similarity search in metric spaces Guhlemann, Steffen 06 July 2016
Ein zunehmend wichtiges Thema in der Informatik ist der Umgang mit Ähnlichkeit in einer großen Anzahl unterschiedlicher Domänen. Derzeit existiert keine universell verwendbare Infrastruktur für die Ähnlichkeitssuche in allgemeinen metrischen Räumen. Ziel der Arbeit ist es, die Grundlage für eine derartige Infrastruktur zu legen, die in klassische Datenbankmanagementsysteme integriert werden könnte. Im Rahmen einer Analyse des State of the Art wird der M-Baum als am besten geeignete Basisstruktur identifiziert. Dieser wird anschließend zum EM-Baum erweitert, wobei strukturelle Kompatibilität mit dem M-Baum erhalten wird. Die Abfragealgorithmen werden im Hinblick auf eine Minimierung notwendiger Distanzberechnungen optimiert. Aufbauend auf einer mathematischen Analyse der Beziehung zwischen Baumstruktur und Abfrageaufwand werden Freiheitsgrade in Baumänderungsalgorithmen genutzt, um Bäume so zu konstruieren, dass Ähnlichkeitsanfragen mit einer minimalen Anzahl an Anfrageoperationen beantwortet werden können.
/ A topic of growing importance in computer science is the handling of similarity in multiple heterogeneous domains. Currently there is no common infrastructure to support this for general metric spaces. The goal of this work is to lay the foundation for such an infrastructure, which could be integrated into classical database management systems. After an analysis of the state of the art, the M-Tree is identified as the most suitable base structure and is extended in multiple ways to the EM-Tree, retaining structural compatibility. The query algorithms are optimized to reduce the number of necessary distance calculations. On the basis of a mathematical analysis of the relation between tree structure and query performance, degrees of freedom in the tree edit algorithms are used to build trees optimized for answering similarity queries with a minimal number of distance calculations.
Keywords: Metrik; Metrischer Raum; Indexing; Curse of Dimensionality; EM-Baum; M-Baum; Ähnlichkeitssuche; Bereichssuche; k-Nächste-Nachbarn-Suche; Metric; Metric space; Indexing; Curse of Dimensionality; EM-Tree; M-Tree; Similarity search; Range query; k-Nearest-Neighbor-Query; ddc:004; rvk:ST 270
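Entry 57 stresses reducing the number of distance calculations. The basic mechanism that metric indexes rely on for this is the triangle inequality, which the sketch below applies to a range query with a single pivot. This is a generic pivot filter, not the M-Tree or EM-Tree: the function name range_query_with_pivot, the choice of pivot and the sample data are all assumptions.

```python
def range_query_with_pivot(data, pivot_dists, pivot, q, radius, dist):
    """Return (hits, computed): the elements within radius of q and the number
    of distance computations performed at query time.

    pivot_dists[i] must hold dist(pivot, data[i]), precomputed at index time.
    """
    d_qp = dist(q, pivot)
    computed = 1  # the distance from the query to the pivot
    hits = []
    for x, d_px in zip(data, pivot_dists):
        # Triangle inequality: dist(q, x) >= |dist(q, pivot) - dist(pivot, x)|.
        # If that lower bound already exceeds the radius, x cannot qualify.
        if abs(d_qp - d_px) > radius:
            continue
        computed += 1
        if dist(q, x) <= radius:
            hits.append(x)
    return hits, computed


def absdiff(a, b):
    """Distance on the real line."""
    return abs(a - b)


data = [1, 2, 8, 9, 15]
pivot = 0
pivot_dists = [absdiff(pivot, x) for x in data]  # built once, reused per query

hits, computed = range_query_with_pivot(data, pivot_dists, pivot, 8.5, 1.0, absdiff)
print(hits, computed)
```

Here the filter discards three of the five elements without touching them, so the query costs three distance computations instead of the five a linear scan would need. Tree structures apply the same bound to whole subtrees.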
58	Introduction to some modes of convergence : Theory and applications Bolibrzuch, Milosz January 2017
This thesis aims to provide a brief exposition of some chosen modes of convergence, namely uniform convergence, pointwise convergence and L1 convergence. Theoretical discussion is complemented by simple applications to scientific computing. The latter include solving differential equations with various methods and estimating the convergence, as well as modelling problematic situations to investigate odd behaviors of usually convergent methods.
Keywords: Modes of convergence; uniform convergence; pointwise convergence; L1 convergence; numerical methods; differential equations; big O notation; central difference method; topological space; metric space; bump function; support; Computational Mathematics; Beräkningsmatematik; Mathematical Analysis; Matematisk analys
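The difference between pointwise and uniform convergence named in entry 58 shows up in a standard textbook example, f_n(x) = x^n on the interval [0, 1). The sketch below is an assumed illustration, not taken from the thesis, and the helper names f and witness are hypothetical. At any fixed x the values tend to 0, yet for every n there is a point where f_n equals 1/2, so the supremum of |f_n - 0| never drops below 1/2.

```python
def f(n, x):
    """The n-th function of the sequence f_n(x) = x**n."""
    return x ** n


def witness(n):
    """A point in [0, 1) at which f_n takes the value 1/2."""
    return 0.5 ** (1.0 / n)


# Pointwise convergence: at the fixed point x = 0.9 the values shrink to 0.
for n in (1, 10, 100, 1000):
    print(n, f(n, 0.9))

# No uniform convergence: the witness point keeps f_n at 1/2 for every n.
for n in (1, 10, 100, 1000):
    x_n = witness(n)
    print(n, x_n, f(n, x_n))
```

The witness points move towards 1 as n grows, which is exactly why no single n makes the functions uniformly small on the whole interval.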
59	Indexação de dados em domínios métricos generalizáveis / Indexing complex data in Generic Metric Domains. Pola, Ives Renê Venturini 10 June 2005
Os sistemas Gerenciadores de Bases de Dados (SGBDs) foram desenvolvidos para manipular domínios de dados numéricos e/ou pequenas seqüências de caracteres (palavras) e não foram projetados prevendo a manipulação de dados complexos, como por exemplo dados multimídia. Os operadores em domínios de dados que requisitam a relação de ordem têm pouca utilidade para manipular operações que envolvem dados complexos. Uma classe de operadores que se adequa melhor para manipular esses dados são os operadores por similaridade: consulta por abrangência ("range queries") e consulta de vizinhos mais próximos ("k-nearest neighbor queries"). Embora muitos resultados já tenham sido obtidos na elaboração de algoritmos de busca por similaridade, todos eles consideram uma única função para a medida de similaridade, que deve ser universalmente aplicável a todos os pares de elementos do conjunto de dados. Este projeto propõe explorar a possibilidade de trabalhar com estruturas de dados concebidas dentro dos conceitos de dados em domínios métricos, mas que admitam o uso de uma função de distância adaptável, ou seja, que mude para determinados grupos de objetos, dependendo de algumas características universais, e assim permitindo acomodar características que sejam particulares a algumas classes de imagens e não de todo o conjunto delas, classificando as imagens em uma hierarquia de tipos, onde cada tipo está associado a uma função de distância diferente e vetores de características diferentes, todos indexados numa mesma árvore.
/ DBMSs were developed to manipulate data in numeric domains and short strings, and were not designed for complex data such as multimedia data. Operators on data domains that require the total order property are of little use for handling complex data. A class of operators better suited to this type of data is that of similarity operators: range queries and nearest neighbor queries. Although many results on answering similarity queries have been reported, all of them use a single distance function to measure similarity, which must be applicable to all pairs of elements in the set. The goal of this work is to explore the possibility of handling complex data in metric domains with an adaptable distance function, one that changes its behavior for certain groups of data depending on some universal features, so that features specific to some classes of data, and not shared by the entire set, can be used. This flexibility makes it possible to reduce the set of useful features of each element individually, relying on the values obtained for one or a few features extracted first. These values then guide which other important features to extract from the data.
Keywords: access methods; domínio métrico generalizável; espaço métrico; estruturas de indexação métricas; Generic Metric Domain; métodos de acesso; metric access methods; Metric space; múltiplas características; múltiplas funções de distância; multiple distance functions; multiple features
