31

New LSH-based Algorithm for Approximate Nearest Neighbor

Andoni, Alexandr, Indyk, Piotr 04 November 2005
We present an algorithm for the c-approximate nearest neighbor problem in d-dimensional Euclidean space, achieving query time O(dn^{1/c^2+o(1)}) and space O(dn + n^{1+1/c^2+o(1)}).
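To make the flavor of LSH concrete, here is a minimal sketch of the classic p-stable-distribution hash family for Euclidean space, the family this line of work builds on. The paper's own data structure (based on ball carving) is more involved, so treat this as hedged background rather than the authors' method; all names and parameters below are illustrative.

```python
import numpy as np

class EuclideanLSH:
    """One p-stable LSH table: h(v) = floor((a.v + b) / w).
    A sketch of the classic Euclidean LSH family, not the paper's
    ball-carving construction."""

    def __init__(self, dim, n_hashes=8, w=4.0, seed=0):
        rng = np.random.default_rng(seed)
        self.a = rng.normal(size=(n_hashes, dim))   # Gaussian entries: 2-stable
        self.b = rng.uniform(0, w, size=n_hashes)
        self.w = w
        self.buckets = {}

    def key(self, v):
        return tuple(np.floor((self.a @ v + self.b) / self.w).astype(int))

    def insert(self, idx, v):
        self.buckets.setdefault(self.key(v), []).append(idx)

    def query(self, v):
        return self.buckets.get(self.key(v), [])

rng = np.random.default_rng(1)
data = rng.normal(size=(1000, 32))
index = EuclideanLSH(dim=32)
for i, v in enumerate(data):
    index.insert(i, v)
print(index.query(data[0]))   # candidate near neighbors; 0 is among them
```

Points sharing a bucket are candidate near neighbors; in practice one builds several independent tables and unions their candidates to raise recall.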
32

Finite state automaton construction through regular expression hashing

Coetser, Rayner Johannes Lodewikus 25 August 2010
In this study, the regular expressions forming abstract states in Brzozowski's algorithm are not remapped to sequential state transition table addresses as in the classical approach, but are hashed to integers. Two regular expressions that hash to the same code are assigned the same address in the state transition table, reducing the number of states in the automaton. This reduction does not necessarily yield a minimal automaton: no restrictions are placed on which regular expressions the hash function may map to the same code. Depending on the quality of the hash function, either a super-automaton (previously referred to as an approximate automaton) or an exact automaton is constructed. When two regular expressions hash to the same state but do not represent the same regular language, a super-automaton is constructed; it accepts the regular language of the input regular expression plus some extra strings. A poor hash function maps together many regular expressions that represent different languages, yielding a smaller automaton that accepts extra strings. In the ideal case, two regular expressions hash together only when they represent the same regular language, and an exact minimal automaton is constructed. It is shown that the hashing approach always constructs either an exact automaton or a super-automaton. Another consequence of the hashing approach is that a non-deterministic automaton may be constructed; a new version of the hashing variant of Brzozowski's algorithm is therefore put forward which constructs a deterministic automaton. A method is also proposed for measuring the difference between an exact automaton and a super-automaton: the k-equivalence measure, which gives the number of characters up to which the strings of two regular expressions are equal. The better the hash function, the higher the value of k, up to the point where regular expressions are hashed together if and only if they denote the same regular language. Using the k-equivalence measure, eight generated hash functions and one hand-coded hash function are evaluated on a large number of short regular expressions, generated using Gödel numbers. The k-equivalence concept is extended to an average k-equivalence value in order to evaluate the hash functions on longer regular expressions. The hand-coded hash function is found to produce good results. / Dissertation (MEng)--University of Pretoria, 2009. / Computer Science / unrestricted
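As a rough illustration of the idea, the sketch below builds an automaton from Brzozowski derivatives and merges states through a caller-supplied hash function h. It is a hypothetical reconstruction under simplified assumptions (tuple-encoded regexes, no derivative simplification, first regex reaching a code defines that state), not Coetser's implementation; the names `nullable`, `deriv`, and `build` are mine. Because derivatives are not simplified, a finite-range h (e.g. `hash(r) % 256`) is what guarantees termination.

```python
EMPTY, EPS = ('empty',), ('eps',)

def nullable(r):
    """Does r accept the empty string?"""
    tag = r[0]
    if tag in ('eps', 'star'): return True
    if tag in ('empty', 'sym'): return False
    if tag == 'cat': return nullable(r[1]) and nullable(r[2])
    return nullable(r[1]) or nullable(r[2])          # alt

def deriv(r, a):
    """Brzozowski derivative of regex r with respect to symbol a."""
    tag = r[0]
    if tag in ('empty', 'eps'): return EMPTY
    if tag == 'sym': return EPS if r[1] == a else EMPTY
    if tag == 'alt': return ('alt', deriv(r[1], a), deriv(r[2], a))
    if tag == 'star': return ('cat', deriv(r[1], a), r)
    head = ('cat', deriv(r[1], a), r[2])             # cat
    return ('alt', head, deriv(r[2], a)) if nullable(r[1]) else head

def build(r, alphabet, h):
    """Hash each derivative to a state id with h; colliding regexes share
    a state, so a coarse h yields a smaller (super-)automaton. Derivatives
    are unsimplified, so h's finite range is what bounds the state count."""
    table, accept, work, seen = {}, set(), [r], {h(r)}
    while work:
        q = work.pop()
        code = h(q)
        if nullable(q): accept.add(code)
        for a in alphabet:
            d = deriv(q, a)
            table[(code, a)] = h(d)
            if h(d) not in seen:
                seen.add(h(d)); work.append(d)
    return table, accept

# (ab)* over {a, b}; a 64-slot range forces state merging
r = ('star', ('cat', ('sym', 'a'), ('sym', 'b')))
table, accept = build(r, 'ab', h=lambda x: hash(x) % 64)
print(len(accept), len(table))
```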
33

Learning hash codes for multimedia retrieval

Chen, Junjie 28 August 2019
The explosive growth of multimedia data in online media repositories and social networks has created high demand for fast and accurate large-scale multimedia retrieval services. Hashing, owing to its effectiveness in coding high-dimensional data into a low-dimensional binary space, has been found well suited to this retrieval application. Despite recent progress, how to learn hashing models that make the best trade-off between retrieval efficiency and accuracy remains an open research issue. This thesis develops hashing models that are effective for image and video retrieval. An unsupervised hashing model called APHash is first proposed to learn hash codes for images by exploiting the distribution of the data; to reduce the underlying computational complexity, a methodology based on an asymmetric similarity matrix is explored and found effective. The deep learning approach to learning hash codes for images is also studied: a novel deep model called DeepQuan incorporates product quantization methods into an unsupervised deep model. Rather than adopting only the quadratic loss as the optimization objective, as most related deep models do, DeepQuan optimizes the data representations and their quantization codebooks to explore the clustering structure of the underlying data manifold, where the introduction of a weighted triplet loss into the learning objective is found to be effective. Furthermore, the case where some labeled data are available for learning is also considered. To alleviate the high training cost (especially crucial for a large-scale database), another hashing model, Similarity Preserving Deep Asymmetric Quantization (SPDAQ), is proposed for both image and video retrieval, in which the compact binary codes and quantization codebooks for all items in the database are explicitly learned in an efficient manner. All the proposed hashing methods have been rigorously evaluated on benchmark datasets and found to outperform the related state-of-the-art methods.
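The learning models above are not reproduced here, but the retrieval step all such binary-code methods share is easy to sketch: rank database items by Hamming distance to the query's code, computed with XOR and popcount. The code size and data below are made up for illustration.

```python
import numpy as np

def hamming_rank(query_code, db_codes, top_k=5):
    """Rank database items by Hamming distance to a query's binary code.
    Codes are packed into uint8 arrays; XOR + bit-count gives the distance."""
    x = np.bitwise_xor(db_codes, query_code)        # (n, n_bytes)
    dists = np.unpackbits(x, axis=1).sum(axis=1)    # popcount per row
    return np.argsort(dists, kind='stable')[:top_k]

# hypothetical 64-bit codes for 1000 items and one query
rng = np.random.default_rng(0)
db = rng.integers(0, 256, size=(1000, 8), dtype=np.uint8)
q = rng.integers(0, 256, size=8, dtype=np.uint8)
print(hamming_rank(q, db))
```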
34

An Experimental Fast Approach of Self-collision Handling in Cloth Simulation Using GPU

Jichun Zheng (10719285) 01 June 2021
This study describes a fast GPU approach to handling self-collision in cloth animation without significant compromise in physical accuracy. The approach is built on a modification of the mass-spring model seen in a variety of cloth simulation studies. Instead of using a hierarchical data structure that must be updated each frame, it adopts a spatial hashing technique that virtually partitions the space containing the cloth into small cubes and stores the particles held in each cell in an integer array. With the particle data and the cell occupancy information, self-collision detection can be processed at very limited cost in each GPU thread, regardless of growth in the number of particles. The method is capable of visualizing self-collision detection and response in real time with limited GPU memory-access cost.

The idea behind the approach is extremely straightforward; its weakness is the amount of memory it consumes, and it sacrifices some physical accuracy in exchange for performance.
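A minimal CPU sketch of the spatial hashing step described above: grid cells keyed by integer coordinates, with collision candidates gathered from the 27 surrounding cells. The thesis performs this lookup independently in each GPU thread and flattens the grid into an integer array, details this sketch does not reproduce; particle data and cell size are hypothetical.

```python
import numpy as np
from collections import defaultdict

def build_grid(positions, cell_size):
    """Hash each particle into the grid cell containing it."""
    grid = defaultdict(list)
    cells = np.floor(positions / cell_size).astype(int)
    for i, c in enumerate(map(tuple, cells)):
        grid[c].append(i)
    return grid, cells

def collision_candidates(i, grid, cells):
    """Particles in the 27 cells around particle i's cell; a GPU thread
    would perform exactly this per-particle lookup."""
    cx, cy, cz = cells[i]
    out = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                out.extend(j for j in grid.get((cx+dx, cy+dy, cz+dz), ())
                           if j != i)
    return out

# hypothetical cloth particles; cell size of order the rest spring length
pos = np.random.default_rng(1).uniform(0, 1, size=(500, 3))
grid, cells = build_grid(pos, cell_size=0.05)
print(len(collision_candidates(0, grid, cells)))
```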
35

Towards Learning Compact Visual Embeddings using Deep Neural Networks

January 2019
Feature embeddings differ from raw features in that the former obey certain properties, such as a notion of similarity/dissimilarity in their embedding space. word2vec is a preeminent example: similarity in its embedding space is measured by cosine similarity. Such language embedding models have seen numerous applications in both the language and vision communities, as they capture the information in their modality (the English language) efficiently. Inspired by these language models, this work focuses on learning embedding spaces for two visual computing tasks: 1. Image Hashing, 2. Zero-Shot Learning. The training set was used to learn embedding spaces over which similarity/dissimilarity is measured using distance metrics such as Hamming, Euclidean, and cosine distances. While the above-mentioned language models learn generic word embeddings, in this work task-specific embeddings were learned that can be used separately for image retrieval and classification. Image hashing is the task of mapping images to binary codes such that some notion of user-defined similarity is preserved. The first part of this work designs a new framework that uses the hash-tags associated with web images to learn the binary codes. Such codes can be used in several applications, such as image retrieval and image classification; further, the framework requires no labelled data, making it very inexpensive. Results show that the proposed approach surpasses the state-of-the-art approaches by a significant margin. Zero-shot classification is the task of classifying a test sample into a new class not seen during training, which is possible by establishing a relationship between the training and testing classes using auxiliary information. In the second part of this thesis, a framework is designed that trains with hand-crafted attribute vectors and word vectors but does not require the expensive attribute vectors at test time. More specifically, an intermediate space is learned between the word vector space and the image feature space using the hand-crafted attribute vectors. Preliminary results on two zero-shot classification datasets show that this is a promising direction to explore. / Dissertation/Thesis / Master's Thesis, Computer Engineering, 2019
36

Salient Index for Similarity Search Over High Dimensional Vectors

Lu, Yangdi January 2018
Approximate nearest neighbor (ANN) search over high-dimensional data has become an unavoidable service for online applications. Returning fast, high-quality results for unknown queries is the greatest challenge most algorithms face. Locality-Sensitive Hashing (LSH) is a well-known ANN search algorithm, but it suffers from an inefficient index structure and poor accuracy in distributed schemes. Traditional index structures exhibit the most-significant-bits (MSB) problem: their indexing strategies implicitly assume that bits from one direction in the hash value have higher priority. In this thesis, we propose a new content-based index called Random Draw Forest (RDF), which not only uses an adaptive tree structure, applying compound hash functions of dynamic length to accommodate different data cardinalities, but also applies shuffling permutations to solve the MSB problem of traditional LSH-based indexes. To raise accuracy in the distributed scheme, we design a variable-step lookup strategy that searches the multi-step sub-indexes most likely to hold mistakenly partitioned similar objects. By analyzing the index, we show that RDF has a higher probability of retrieving similar objects than the original index structure. In the experiments, we first examine the performance of different hash functions, then show the effect of the parameters in RDF and the performance of RDF compared with other LSH-based ANN search methods. / Thesis / Master of Science (MSc)
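The thesis itself is the authority on RDF's structure; the sketch below only illustrates the shuffling-permutation idea it describes, in which each tree of a forest indexes hash bits under its own random permutation so that no fixed bit direction (the MSB bias) dominates the bucket key. Class and parameter names here are hypothetical, not RDF's.

```python
import random

class PermutedPrefixIndex:
    """One tree of the forest: store each item's hash bits under a random
    permutation, so no fixed bit position dominates the bucket prefix.
    A sketch of the shuffling idea described above, not the RDF code."""

    def __init__(self, n_bits, prefix_len, seed):
        self.perm = random.Random(seed).sample(range(n_bits), n_bits)
        self.prefix_len = prefix_len
        self.buckets = {}

    def _key(self, bits):
        return tuple(bits[p] for p in self.perm[:self.prefix_len])

    def insert(self, idx, bits):
        self.buckets.setdefault(self._key(bits), []).append(idx)

    def query(self, bits):
        return self.buckets.get(self._key(bits), [])

def forest_query(trees, bits):
    """Union candidates across trees; different permutations recover
    neighbors that a single fixed prefix ordering would split apart."""
    cand = set()
    for t in trees:
        cand.update(t.query(bits))
    return cand

bits = tuple(random.getrandbits(1) for _ in range(32))
trees = [PermutedPrefixIndex(32, prefix_len=8, seed=s) for s in range(4)]
for t in trees:
    t.insert(0, bits)
print(forest_query(trees, bits))   # -> {0}
```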
37

An Experimental Evaluation of Consistent Hashing with Bounded Loads in Online Video Distribution

Bernardo de Campos Vidal Camilo 14 December 2018
Video consumption accounts for a large share of Internet traffic today and is expected to grow further in the coming years. In this work, we investigate ways to improve caching in video content delivery networks (CDNs) so as to reduce their response time and increase users' quality of experience. From an analysis of different techniques, we concluded that consistent hashing with bounded loads has interesting characteristics for this purpose and fits the video delivery scenario well. To verify its performance, we built an experimentation platform and, using data from a real video CDN, compared it against plain consistent hashing and the least-connections balancing method, all implemented in an equivalent manner to permit a fair comparison. Lastly, we discuss the results of this evaluation, highlighting the benefits and limitations of the technique in the considered context.
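The balancing rule under evaluation admits a compact sketch: each request walks the hash ring clockwise from its hash point and spills past any cache whose load has reached the cap of c times the current average. This is a generic illustration of consistent hashing with bounded loads (cache names and parameters hypothetical), not the dissertation's experimental platform.

```python
import bisect, hashlib, math

def h(s):
    return int(hashlib.sha1(s.encode()).hexdigest(), 16) % (2**32)

class BoundedLoadRing:
    """Consistent hashing with bounded loads: a request lands on the first
    cache clockwise from its hash point whose load is still under
    cap = ceil(c * total / num_caches)."""

    def __init__(self, caches, c=1.25, vnodes=100):
        self.ring = sorted((h(f"{s}#{v}"), s)
                           for s in caches for v in range(vnodes))
        self.points = [p for p, _ in self.ring]
        self.caches, self.c = list(caches), c
        self.load = {s: 0 for s in caches}
        self.total = 0

    def assign(self, key):
        self.total += 1
        cap = math.ceil(self.c * self.total / len(self.caches))
        i = bisect.bisect(self.points, h(key)) % len(self.ring)
        while self.load[self.ring[i][1]] >= cap:
            i = (i + 1) % len(self.ring)   # spill to the next cache on the ring
        self.load[self.ring[i][1]] += 1
        return self.ring[i][1]

ring = BoundedLoadRing(["cache-a", "cache-b", "cache-c"])
for vid in range(12):
    print(vid, ring.assign(f"video-{vid}"))
```

Since the loads assigned so far always sum to less than c times the current total, at least one cache is below the cap, so the walk always terminates.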
38

Metric space indexing for nearest neighbor search in multimedia context

Silva, Eliezer de Souza da, 1988- 26 August 2018
Advisor: Eduardo Alves do Valle Junior / Master's dissertation - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação / The increasing availability of multimedia content poses a challenge for information retrieval researchers. Users want not only to have access to multimedia documents, but also to make sense of them: the ability to find specific content in extremely large collections of textual and non-textual documents is paramount. At such large scales, multimedia information retrieval systems must rely on the ability to perform search by similarity efficiently. However, multimedia documents are often represented by high-dimensional feature vectors, or by other complex representations in metric spaces, and providing efficient similarity search for that kind of data is extremely challenging. In this project, we explore one of the most cited families of solutions for similarity search, Locality-Sensitive Hashing (LSH), which is based on the creation of hashing functions that assign, with higher probability, the same key to data that are similar. LSH is available only for a handful of distance functions, but, where available, it has been found extremely efficient for architectures with uniform access cost to the data. Most existing LSH functions are restricted to vector spaces. We propose two novel LSH methods (VoronoiLSH and VoronoiPlex LSH) for generic metric spaces based on metric hyperplane partitioning (random centroids and K-medoids). We present a comparison with well-established LSH methods in vector spaces and with recent competing methods for metric spaces. We develop a theoretical probabilistic model of the behavior of the proposed algorithms and show some relations and bounds for the probability of hash collision. Among the algorithms proposed for generalizing LSH to metric spaces, this theoretical development is new. Although the problem is very challenging, our results demonstrate that it can be successfully tackled. This dissertation presents the development of the methods, their theoretical formulation, and an experimental discussion of their performance / Master's in Computer Engineering / Master in Electrical Engineering
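A minimal sketch of the random-centroid idea behind VoronoiLSH as described above: pick a few centroids from the data and hash every object to its nearest centroid, so only a distance function is required, never vector coordinates. The edit-distance example data are hypothetical, and the dissertation's VoronoiPlex variant and probabilistic analysis are not reproduced here.

```python
import random

def make_voronoi_hash(dataset, n_centroids, dist, seed=0):
    """VoronoiLSH-style hash for a generic metric space: sample random
    centroids from the data and hash each object to its nearest one."""
    centroids = random.Random(seed).sample(dataset, n_centroids)

    def hash_fn(x):
        return min(range(n_centroids), key=lambda i: dist(x, centroids[i]))
    return hash_fn

# hypothetical metric-space data: strings under Levenshtein edit distance
def edit_distance(a, b):
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[-1] + 1,            # insertion
                           prev[j-1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

words = ["hash", "hashing", "sorting", "sort", "cache", "batch"]
g = make_voronoi_hash(words, n_centroids=2, dist=edit_distance)
print({w: g(w) for w in words})   # objects sharing a bucket are candidates
```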
39

New delayed stochastic simulation algorithms for gene networks

Silva, Camillo de Lellis Falcão da 22 May 2014
CAPES - Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / Recently, the time efficiency of stochastic simulation algorithms for gene regulatory networks (GRN) has motivated several scientific works. Interest in such algorithms stems from the fact that new technologies in cell biology, called high-throughput cell biology technologies, have shown that gene expression is a stochastic process. For GRNs with delays, the existing stochastic simulation algorithms have drawbacks, such as linear growth in asymptotic complexity, excessive discarding of random numbers during simulation, and cumbersome coding in a programming language, that result in poor performance when simulating very large GRNs. This work presents an algorithm for the stochastic simulation of GRNs, which we call the simplified next reaction method (SNRM). This algorithm proved more efficient than the other existing approaches for stochastically simulating GRNs with delays. Besides SNRM, a new dependency graph for delayed reactions is also presented. The use of this new graph, which we named the delayed dependency graph (DDG), greatly increased the efficiency of all versions of the delayed stochastic simulation algorithms presented in this work. Finally, a data structure that we named the hashing sorted list is used to handle the waiting list of products in simulations of GRNs with delays. This data structure also proved more efficient than a heap in all tested simulations. With all the improvements mentioned, this work presents a set of strategies that contribute effectively to the performance of delayed stochastic simulation algorithms for gene regulatory networks.
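The exact layout of the thesis's hashing sorted list is not given here, so the sketch below is only a guess at the general shape: waiting (delayed) products are bucketed by hashing their release time into fixed-width time bins, making insertion O(1) and release scans local, in contrast to a heap's O(log n) operations. Species names and the bin width are hypothetical.

```python
import math
from collections import defaultdict

class HashingSortedList:
    """Waiting list of delayed products bucketed by release time: each
    entry hashes to the time bin floor(t / bin_width), so scheduling is
    O(1) and popping due entries only scans the bins at or before 'now'.
    A sketch of the idea described above, not the thesis's exact structure."""

    def __init__(self, bin_width):
        self.w = bin_width
        self.bins = defaultdict(list)

    def schedule(self, release_time, species, count):
        self.bins[math.floor(release_time / self.w)].append(
            (release_time, species, count))

    def pop_due(self, now):
        """Release every product whose delay has elapsed by time 'now'."""
        due = []
        for b in [k for k in self.bins if k <= now // self.w]:
            keep = []
            for entry in self.bins.pop(b):
                (due if entry[0] <= now else keep).append(entry)
            if keep:
                self.bins[b] = keep
        return due

wl = HashingSortedList(bin_width=0.1)
wl.schedule(0.23, "protein_X", 1)   # delayed translation finishes at t=0.23
wl.schedule(0.07, "mRNA_Y", 1)
print(wl.pop_due(0.1))              # -> [(0.07, 'mRNA_Y', 1)]
```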
40

Cache-Efficient Aggregation: Hashing Is Sorting

Müller, Ingo, Sanders, Peter, Lacurie, Arnaud, Lehner, Wolfgang, Färber, Franz 14 June 2022
For decades researchers have studied the duality of hashing and sorting in the implementation of relational operators, especially for efficient aggregation. Depending on the underlying hardware and software architecture, the specific algorithms implemented, and the data sets used in the experiments, different authors have come to different conclusions about which is the better approach. In this paper we argue that, in terms of cache efficiency, the two paradigms are actually the same. We support our claim by showing that the complexity of hashing is the same as the complexity of sorting in the external memory model. Furthermore, we make the similarity of the two approaches evident by designing an algorithmic framework that allows switching seamlessly between hashing and sorting during execution; mixing the two routines in the same framework lets us leverage the advantages of both. On a more practical note, we also show how to achieve very low constant factors by tuning both the hashing and the sorting routines to modern hardware. Since the constant factors of the two routines depend on the locality of the input in complementary ways, we exploit our framework to switch to the faster routine where appropriate. The result is a novel relational aggregation algorithm that is cache-efficient, independently of and without prior knowledge of input skew and output cardinality, highly parallelizable on modern multi-core systems, and operating at a speed close to the memory bandwidth, thus outperforming the state of the art by up to 3.7x.
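A toy sketch of the hashing-is-sorting observation, not the paper's tuned implementation: aggregate in a hash table while the distinct keys fit a cache-like budget, and otherwise partition the input on a few bits of the key's hash and recurse. That recursive partitioning is exactly the distribution step of a most-significant-digit radix sort on hash values. The budget and partition fan-out below are arbitrary illustration values.

```python
from collections import defaultdict

def aggregate(rows, budget=1024, shift=0):
    """Group-by-key sum. Builds an in-memory hash table while the number
    of distinct keys fits the budget; otherwise partitions rows on 4 bits
    of the key's hash and recurses (the partial table is simply discarded
    in this sketch and the rows rescanned)."""
    table = defaultdict(int)
    for key, val in rows:
        table[key] += val
        if len(table) > budget:
            break
    else:
        return dict(table)               # everything fit in the "cache" budget

    parts = defaultdict(list)            # spill: MSD-radix-style distribution
    for key, val in rows:
        parts[(hash(key) >> shift) & 0xF].append((key, val))
    out = {}
    for part in parts.values():          # same key always lands in one part
        out.update(aggregate(part, budget, shift + 4))
    return out

rows = [(f"user{i % 5000}", 1) for i in range(20000)]
print(len(aggregate(rows)))              # -> 5000 groups
```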
