About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
1

Spatial Range Querying for Gaussian-Based Imprecise Query Objects

Ishikawa, Yoshiharu, Iijima, Yuichi, Yu, Jeffrey Xu 03 1900 (has links)
No description available.
2

Secure and efficient query processing in outsourced databases

Bogatov, Dmytro 16 September 2022 (has links)
As organizations struggle with processing vast amounts of information, outsourcing sensitive data to third parties becomes a necessity. Various cryptographic techniques are used in outsourced database systems to ensure data privacy while allowing for efficient querying. This thesis proposes a definition and components of a new secure and efficient outsourced database system, which answers various types of queries with different privacy guarantees in different security models. The work starts with a survey of five order-preserving and order-revealing encryption schemes that can be used directly in many database indices, such as the B+ tree, and five range query protocols with various tradeoffs in terms of security and efficiency. The survey systematizes the state-of-the-art range query solutions in a snapshot adversary setting and offers some non-obvious observations regarding the efficiency of the constructions. The thesis then proceeds with Epsolute, an efficient range query engine in a persistent adversary model. In Epsolute, security is achieved in a setting with a much stronger adversary, who can continuously observe everything on the server and for whom leaking even the result size can enable a reconstruction attack. Epsolute proposes a definition, construction, analysis, and experimental evaluation of a system that provably hides both the access pattern and the communication volume while remaining efficient. The dissertation concludes with k-anon, a secure similarity search engine in a snapshot adversary model. The work presents a construction in which the security of kNN queries is achieved similarly to OPE/ORE solutions: the input is encrypted with an approximate Distance Comparison Preserving Encryption scheme so that the inputs, points in a hyperspace, are perturbed, but the query algorithm still produces accurate results. To analyze the solution, we run a series of experiments to observe the tradeoff between search accuracy and attack effectiveness. We use TREC datasets and queries for the search and track rank-quality metrics such as MRR and nDCG. For the attacks, we build an LSTM model that trains on the correlation between a sentence and its embedding and then predicts words from the embedding. We conclude on the viability and practicality of the solution.
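To make the k-anon idea above concrete, here is a minimal sketch, assuming a synthetic embedding set and a simple Gaussian noise model rather than the thesis's actual Distance Comparison Preserving Encryption scheme: the embeddings are perturbed, and brute-force kNN on the perturbed points is compared with kNN on the originals.

```python
# Toy sketch of (approximately) distance-comparison-preserving perturbation:
# perturb embedding vectors with small noise, then check how well brute-force
# kNN on the perturbed points matches kNN on the originals. Illustration only,
# not the DCPE scheme used in the thesis.
import numpy as np

rng = np.random.default_rng(0)
n, dim, k = 1000, 64, 10
points = rng.normal(size=(n, dim))          # stand-in for text embeddings
noise_scale = 0.05                          # assumed perturbation magnitude
perturbed = points + rng.normal(scale=noise_scale, size=points.shape)

def knn(data: np.ndarray, query: np.ndarray, k: int) -> np.ndarray:
    """Indices of the k nearest neighbours of `query` under L2 distance."""
    dists = np.linalg.norm(data - query, axis=1)
    return np.argsort(dists)[:k]

query = rng.normal(size=dim)
exact = set(knn(points, query, k))
approx = set(knn(perturbed, query + rng.normal(scale=noise_scale, size=dim), k))
print(f"overlap between exact and perturbed kNN: {len(exact & approx)}/{k}")
```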
3

Space-Efficient Data Structures in the Word-RAM and Bitprobe Models

Nicholson, Patrick 06 August 2013 (has links)
This thesis studies data structures in the word-RAM and bitprobe models, with an emphasis on space efficiency. In the word-RAM model of computation the space cost of a data structure is measured in terms of the number of w-bit words stored in memory, and the cost of answering a query is measured in terms of the number of read, write, and arithmetic operations that must be performed. In the bitprobe model the space cost is measured in terms of the number of bits stored in memory, but the query cost is measured solely in terms of the number of bit accesses, or probes, that are performed.

First, we examine the problem of succinctly representing a partially ordered set, or poset, in the word-RAM model with word size Theta(lg n) bits. A succinct representation of a combinatorial object is one that occupies space matching the information-theoretic lower bound to within lower order terms. We show how to represent a poset on n vertices using a data structure that occupies n^2/4 + o(n^2) bits and can answer precedence (i.e., less-than) queries in constant time. Since the transitive closure of a directed acyclic graph is a poset, this implies that we can support reachability queries on an arbitrary directed graph in the same space bound. As far as we are aware, this is the first representation of an arbitrary directed graph that supports reachability queries in constant time and stores fewer than (n choose 2) bits. We also consider several additional query operations.

Second, we examine the problem of supporting range queries on strings of n characters (or, equivalently, arrays of n elements) in the word-RAM model with word size Theta(lg n) bits. We focus on the specific problem of answering range majority queries: i.e., given a range, report the character that is the majority among those in the range, if one exists. We show that these queries can be supported in constant time using a linear-space (in words) data structure. We generalize this result in several directions, considering various frequency thresholds, geometric variants of the problem, and dynamism. These results are in stark contrast to recent work on the similar range mode problem, in which the query operation asks for the mode (i.e., most frequent) character in a given range. The current best linear-space data structures for the range mode problem take soft-Oh(n^(1/2)) time per query.

Third, we examine the deterministic membership (or dictionary) problem in the bitprobe model. This problem asks us to store a set of n elements drawn from a universe [1,u] such that membership queries can always be answered in t bit probes. We present several new fully explicit results for this problem, in particular for the case when n = 2, answering an open problem posed by Radhakrishnan, Shah, and Shannigrahi [ESA 2010]. We also present a general strategy for the membership problem that can be used to solve many related fundamental problems, such as rank, counting, and emptiness queries.

Finally, we conclude with a list of open problems and avenues for future work.
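For intuition about the range majority queries mentioned above, the following sketch answers such a query naively with a Boyer-Moore pass over the queried range; it only illustrates the query semantics, not the constant-time, linear-space structure developed in the thesis.

```python
# Naive range-majority query: report the character that occurs in more than
# half of the positions of s[lo:hi], if any. Illustrates the query semantics;
# the thesis achieves O(1) query time with a linear-space structure instead.
from typing import Optional

def range_majority(s: str, lo: int, hi: int) -> Optional[str]:
    # Boyer-Moore majority vote over the range, then one verification pass.
    candidate, count = None, 0
    for ch in s[lo:hi]:
        if count == 0:
            candidate, count = ch, 1
        elif ch == candidate:
            count += 1
        else:
            count -= 1
    if candidate is not None and s[lo:hi].count(candidate) * 2 > (hi - lo):
        return candidate
    return None

print(range_majority("abacabadab", 0, 7))  # 'a' is the majority in "abacaba"
```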
4

Avaliação da qualidade de funções de similaridade no contexto de consultas por abrangência / Quality evaluation of similarity functions for range queries

Stasiu, Raquel Kolitski January 2007 (has links)
In real systems, stored data typically have inconsistencies caused by typing errors, abbreviations, transposed characters, and the like. For this reason, different representations of the same real-world object are stored as distinct elements, causing problems during query processing. This thesis therefore investigates range queries, which find the objects that represent the same real-world object being queried. This type of query cannot be processed by exact matching, thus requiring support for querying by similarity. For each query submitted to a given collection, the similarity function produces a ranked list of all elements in the collection, sorted in decreasing order of the similarity score between each element and the query object. Only the variations of the query object should be part of the result, as only those items are relevant. For this reason, it is necessary to apply a threshold value to properly split the ranking. The first challenge of range queries is the definition of a proper threshold. Usually, a human specialist makes the estimation manually, identifying relevant and irrelevant elements for each query and then using measures such as recall and precision (R&P). The high dependency on the human specialist is the main difficulty in using range queries in real situations, especially for large collections. The method presented in this thesis therefore aims to estimate R&P at several thresholds with low human intervention. As a by-product of the method, it is possible to select the most suitable threshold for a similarity function on a given collection. Considering that similarity functions are imperfect and vary in quality, it is necessary to evaluate the similarity function for each collection, as the result is data dependent. A threshold value for one collection may be totally inappropriate for another, even when the same similarity function is applied. As a measure of the quality of similarity functions for range queries, this thesis introduces discernability, a measure that quantifies the ability of the similarity function to separate relevant and irrelevant elements. Compared with mean average precision, discernability captures variations that precision-based measures do not notice, which shows that discernability is better suited to evaluating similarity functions for range queries. An extensive experimental evaluation using real data shows the viability of both the estimation method and the discernability measure applied to range queries.
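As an illustration of the R&P-per-threshold idea described above, the sketch below computes recall and precision of a similarity ranking at several thresholds on a tiny hypothetical dataset; unlike the thesis's method, it assumes the relevance judgements are already available.

```python
# Illustrative computation of recall and precision (R&P) at several similarity
# thresholds for one query, given human relevance judgements. The thesis aims
# to estimate these curves with far less human labelling; this sketch assumes
# the labels are available and simply evaluates the thresholds.
from typing import Dict, List, Set, Tuple

def rp_at_thresholds(scores: Dict[str, float],
                     relevant: Set[str],
                     thresholds: List[float]) -> List[Tuple[float, float, float]]:
    results = []
    for t in thresholds:
        returned = {e for e, s in scores.items() if s >= t}
        tp = len(returned & relevant)
        precision = tp / len(returned) if returned else 1.0
        recall = tp / len(relevant) if relevant else 1.0
        results.append((t, recall, precision))
    return results

scores = {"Jon Smith": 0.95, "John Smith": 0.90, "J. Smith": 0.72,
          "Joan Smyth": 0.55, "Bob Jones": 0.20}
relevant = {"Jon Smith", "John Smith", "J. Smith"}
for t, r, p in rp_at_thresholds(scores, relevant, [0.9, 0.7, 0.5]):
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    print(f"threshold={t:.1f}  recall={r:.2f}  precision={p:.2f}  F1={f1:.2f}")
```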
5

Query Support for Multi-Dimensional and Dynamic Databases

Apaydin, Tan 29 September 2008 (has links)
No description available.
6

Efficient Processing of Range Queries in Main Memory

Sprenger, Stefan 11 March 2019 (has links)
Database systems employ index structures to accelerate search queries. Over the last years, the research community has proposed many in-memory approaches that, in contrast to disk-based systems, optimize cache misses instead of disk I/O and make use of the grown parallel capabilities of modern CPUs. However, these techniques mainly focus on single-key lookups and neglect equally important range queries. Range queries are a ubiquitous operator in data management, commonly used in numerous domains such as genomic analysis, sensor networks, or online analytical processing. The main goal of this dissertation is thus to improve the capabilities of main-memory database systems with regard to executing range queries. To this end, we first propose a cache-optimized, updateable main-memory index structure, the cache-sensitive skip list, which targets the execution of range queries on single database columns. Second, we study the performance of multidimensional range queries on modern hardware, where data are stored in main memory and processors support SIMD instructions and multi-threading. We re-evaluate a previous rule of thumb suggesting that, on disk-based systems, scans outperform index structures for selectivities of approximately 15-20% or more. To increase the practical relevance of our analysis, we also contribute a novel benchmark consisting of several realistic multidimensional range queries applied to real-world genomic data. Third, based on the outcomes of our experimental analysis, we devise a novel, fast, and space-efficient main-memory index structure, the BB-Tree, which supports multidimensional range and point queries and provides a parallel search operator that leverages the multi-threading capabilities of modern CPUs.
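As a point of reference for the single-column range queries targeted by the cache-sensitive skip list, here is a minimal binary-search-based range scan over a sorted in-memory column; it is an illustration of the operation, not an implementation of the CSSL or the BB-Tree.

```python
# Minimal single-column range query over a sorted in-memory array using binary
# search. This only illustrates the operation that cache-optimized structures
# such as the cache-sensitive skip list or the BB-Tree accelerate; it is not an
# implementation of either.
import bisect
import random

column = sorted(random.sample(range(1_000_000), 100_000))  # assumed column data

def range_query(sorted_col, low, high):
    """Return all values v with low <= v <= high."""
    left = bisect.bisect_left(sorted_col, low)
    right = bisect.bisect_right(sorted_col, high)
    return sorted_col[left:right]

hits = range_query(column, 10_000, 10_500)
print(f"{len(hits)} values in [10000, 10500]")
```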
7

Δομές δεικτοδότησης και υπολογισμός ερωτημάτων εύρους κ-διαστάσεων σε κατανεμημένα περιβάλλοντα / Indexing structures and computation of k-dimensional range queries in distributed environments

Καπλάνης, Αθανάσιος 24 November 2014 (has links)
Traditionally, the human need for information has been one that people have sought to satisfy as fully as possible. Information is in every way a valuable tool in decision making, and people quickly realized its importance, especially in modern times, when information technology gave the public access to vast volumes of data that, with proper processing, can be turned into information. The challenge that IT specialists now face is finding and implementing new methods of fast and inexpensive data collection, efficient storage, and constructive analysis, in order to turn data into rich, high-quality, and useful information. Nowadays, the development of both distributed systems and the Internet lets us use computational resources with low requirements to process large amounts of data in parallel. The fields that deal extensively with these systems are peer-to-peer (p2p) systems and distributed computing. The present dissertation aims at finding points in two-dimensional space in a distributed environment. A k-dimensional space, the grid, is defined, in which the user identifies points of interest by issuing range queries. The system evaluates such a query to determine which rectangular sections of the grid are involved, and these sections are then returned. More specifically, the grid is divided into square regions, and each node of the distributed network hosts the points of one square region. All these nodes are organized into a Hadoop cluster, and the data is imported into the HBase distributed database, which is based on the BigTable architecture over the Google File System. Data in HBase is organized in a distributed way and B+ -trees are used. The utility of B+ -trees, in conjunction with the distributed Hadoop framework, lies in the fact that, using the tools of both HBase and the Hadoop file system, we can determine which node of the Hadoop cluster stores the requested B+ -tree nodes and thus quickly retrieve the results of a range query. The structure of the work is as follows. The first chapter is an introduction to the concepts of distributed computing over distributed environments. The second chapter covers peer-to-peer (p2p) networks and, more specifically, analyzes the BATON overlay network, which has a tree structure similar to that of the B+ -tree. The third chapter presents an implementation of indexing and answering range queries in the computing cloud using basic B+ -tree data structures. The ART (Autonomous Range Tree) structure is also presented; it supports range queries in such large-scale decentralized environments and scales both in the number of nodes and in the number of data items stored. ART outperforms the most popular decentralized structures, including Chord (and some of its successors), BATON (and its successor), and Skip Graphs. The fourth and fifth chapters cover the main points of the architecture and operation of the Hadoop framework and of HBase, respectively. The sixth chapter describes the implementation of this dissertation, together with its algorithms and how they operate. The next chapter evaluates the experimental results and the conclusions that derive from them. Finally, the eighth and last chapter gives an overview of the dissertation, its main parts, and possible extensions that would improve system performance.
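As a rough sketch of the grid-partitioning idea described above, the code below buckets 2D points into square cells and answers a rectangular range query by scanning only the intersecting cells; the cell size and in-memory layout are assumptions, and the thesis's Hadoop/HBase distribution is not modeled.

```python
# Toy sketch of grid-partitioned 2D range queries: points are bucketed into
# square cells, and a rectangular query only scans the cells it intersects.
# In the dissertation each cell is hosted by a node of a Hadoop/HBase cluster;
# here the "cells" are just an in-memory dictionary.
from collections import defaultdict
import random

CELL = 100  # assumed side length of each square grid cell

def cell_of(x, y):
    return (x // CELL, y // CELL)

grid = defaultdict(list)
for _ in range(10_000):
    x, y = random.randrange(1000), random.randrange(1000)
    grid[cell_of(x, y)].append((x, y))

def range_query(x1, y1, x2, y2):
    """Return all points (x, y) with x1 <= x <= x2 and y1 <= y <= y2."""
    hits = []
    for cx in range(x1 // CELL, x2 // CELL + 1):
        for cy in range(y1 // CELL, y2 // CELL + 1):
            for x, y in grid[(cx, cy)]:          # only intersecting cells
                if x1 <= x <= x2 and y1 <= y <= y2:
                    hits.append((x, y))
    return hits

print(len(range_query(120, 340, 410, 620)), "points in the query rectangle")
```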
