Global ETD Search

21	Sistema de aprendizado reconfigurável para classificação de dados utilizando processamento paralelo / Reconfigurable learning system for classification of data using parallel processing Moreira, Eduardo Marmo 07 May 2014 (has links) Esta tese apresenta a arquitetura de um sistema de aprendizado, com um escalonador de tarefas que possibilita a utilização de vários métodos de classificação e validação, permitindo a distribuição dessas tarefas entre os módulos do sistema. Esta arquitetura está estruturada de forma que classificações obtidas através de uma técnica sejam reutilizadas em paralelo pelo mesmo algoritmo ou por outras técnicas, produzindo novas classificações através do refinamento dos resultados alcançados e ampliando o uso em bases de dados com características diferentes. O sistema foi estruturado em quatro partes denominadas, respectivamente, Módulo de Inicialização, Módulo de Validação, Módulo de Refinamento e Módulo Especial de Escalonamento. Em cada módulo, podem ser usados vários algoritmos para atender aos seus objetivos. A estrutura deste sistema permite sua configuração, utilizando diversos métodos, inclusive com técnicas de inteligência artificial. Com isso, é possível a obtenção de resultados mais precisos por meio da escolha do melhor método para cada caso. Os resultados apresentados neste trabalho foram obtidos a partir de bases conhecidas na literatura, o que possibilita comparar as implementações dos métodos tradicionais que foram adicionadas ao sistema e, principalmente, verificar a qualidade dos refinamentos produzidos pela integração de técnicas diferentes. Os resultados demonstram que através de um sistema de aprendizado, minimiza-se a complexidade na análise de grandes bases de dados, permitindo verificar bases com estruturas diferentes e aumentar os métodos aplicados na análise de cada estrutura. Isto favorece a comparação entre os métodos e proporciona resultados mais confiáveis. Para uniformizar os dados provenientes de bases distintas, foi elaborada a modelagem de dados do sistema, o que favorece a escalabilidade do sistema de maneira uniforme. / This thesis presents the architecture of a System Learning with a task scheduler, which makes possible the utilization of several classification and validation methods, allowing the distribution of tasks between the module systems. This architecture is structured of such way that the classifications obtained through a specific technique can be reutilized in parallel by the same algorithm or by other techniques, producing new classifications through the refinement of the results achieved and expanding the use in databases with different characteristics. The system was structured in four parts denominated, respectively, Initialization module; Validation module; Refinement module; and Especial scheduling module. In each module, various algorithms can be employed to reach its objectives. The structure of this system allows its configuration, utilizing various methods, including artificial intelligence techniques. Thus, it is possible to obtain more precise results through the choice of the best method to each case. The results presented in this work were obtained from basis that are known in the literature, which allows to compare the implementations of the traditional methods that were added to the system and, especially, to verify the quality of the refinements produced by the integration of different techniques. The results demonstrated that through a learning system, the complexity of the analysis of great databases is minimized, allowing to verify basis with different structures and to increase the methods applied in the analysis of each structure. It favors the comparison between the methodologies and provides more reliable results. To standardize the data originated of distinct bases, the data modelling system was elaborated, which will favor the uniform scalability of the system. Artificial intelligence Banco de dados Classificação de dados Clustering Data partitioning Database Inteligência artificial Machine learning Máquina de aprendizado Particionamento de dados Sistema de aprendizado System learning
22	Automatic Data Partitioning By Hierarchical Genetic Search Shenoy, U Nagaraj 09 1900 (has links) CDAC / The introduction of languages like High Performance Fortran (HPF) which allow the programmer to indicate how the arrays used in the program have to be distributed across the local memories of a multi-computer has not completely unburdened the parallel programmer from the intricacies of these architectures. In order to tap the full potential of these architectures, the compiler has to perform this crucial task of data partitioning automatically. This would not only unburden the programmer but would make the programs more efficient since the compiler can be made more intelligent to take care of the architectural nuances. The topic of this thesis namely the automatic data partitioning deals with finding the best data partition for the various arrays used in the entire program in such a way that the cost of execution of the entire program is minimized. The compiler could resort to runtime redistribution of the arrays at various points in the program if found profitable. Several aspects of this problem have been proven to be NP-complete. Other researchers have suggested heuristic solutions to solve this problem. In this thesis we propose a genetic algorithm namely the Hierarchical Genetic Search algorithm to solve this problem. Computer and Information Science Genetic Search Automatic Data Partitioning Parallelizing Compiler Multiprogramming Parallel Processing Distributed Memory Multi-Computers Distributed Memory Machines Genetic Algorithms Hierarchical Genetic Search (HGS)
23	Hyperplane Partitioning : An Approach To Global Data Partitioning For Distributed Memory Machines Prakash, S R 07 1900 (has links) Automatic Global Data Partitioning for Distributed Memory Machines (DMMs) is a difficult problem. Distributed memory machines are scalable, but since the memory is distributed across processors, the scheme of placement of data (arrays) onto local memories of different processors become crucial since any communication between processors for non-local data access is an order of magnitude costlier than access to local memory. Researchers have given varied solutions to this problem, most of which work for uniform dependences in loops and they suggest HPF-like distributions only. For non-uniform dependences the loop was made to run sequentially. In this work, we present a partitioning strategy called Hyperplane Partitioning which works well with loops with non-uniform dependences also. In this method of partitioning, the iteration space is partitioned into as many number of partitions as there are number of logical processors, in such a way that the overall inter-processor communication will be minimum. The idea is to localize as many as dependences as possible so that overall communication both beacuse of non-local data as well as inter-processor synchronizations are reduced. These partitions are then induced into data spaces of the arrays referenced in the loop. Each processor then runs its part of iteration space keeping the data partition that it owns locally. Any non-local data access is implemented by inter-processor communication at run-time.The Hyperplane Partitioning is also extended to a sequence of loops. This is done by first finding Best Local Distribution (BLD) for every loop first and then finding the best way of grouping different adjacent loops (just for finding the data partition) which gives best global data partition. This sequence of distributions/redistributions is found by constructing a data structure called Data Distribution Tree (DDT) and finding the least cost path from the source to any of the leaf nodes in the DDT. The costs for the edges come from the communication cost incurred while running a loop with a particular distribution and redistribution to suit the requirement at the next loop. For this a communication cost estimator is developed which works well for fewer dimensions. To handle complete programs we use some heuristic to find the best global distribution for the entire program.Some optimizations like message optimization to reduce the number of messages sent across processors, time optimization which is done by uniform scheduling across processors, and space optimization to keep only the part of array space that any processor owns onto its local memory, are studied. Hyperplane Partitioning is also implemented using an algorithm for synchronization to handle non-local memory access as well as obeying data dependence constraints. The algorithm is also proved to be correct. The target machine is IBM-SP2 using PVM for the message passing library. The performance of the tool on some standard benchmarks (ADI and RHS) and also on some programs designed by us to show the specific merits of the tool. The results show that the loops which have non-uniform dependences also can be run on DMM with good speed-ups. Computer and Information Science Parallellizing Compiler Automatic Data Partitioning Hyper-plane Partitioning Distributed Memory Machine Electronic Data Processing Multiprogramming Distributed Memory Multiprocessors Distributed Memory Multicomputers
24	Distribuição de dados para implementações paralelas do Método de Lattice Boltzmann / Data distribution for parallel implementations of the Lattice Boltzmann Method Schepke, Claudio January 2007 (has links) A Dinâmica de Fluidos Computacional é uma importante área de pesquisa no contexto da Computação Científica. Através da modelagem e simulação das propriedades de líquidos e gases é possível obter resultados numéricos para diferentes estruturas e fenômenos físicos cotidianos e de grande importância econômica. A evolução dos sistemas computacionais possibilitou a essa área o surgimento de novas técnicas e abordagens de simulação. Uma das técnicas computacionais atualmente empregadas é o Método de Lattice Boltzmann, um método numérico iterativo para a modelagem e simulação mesoscópica da dinâmica de fluxos de fluidos. Diferentes tipos de sistemas físicos podem ser tratados através dessa técnica, como é o caso de fluxos em meios porosos ou de substâncias imiscíveis. No entanto, por causa da dimensão dos sistemas físicos, é necessário adotar estratégias que permitam a obtenção de resultados precisos ou em tempos computacionais aceitáveis. Assim, paralelizar as operações é a solução mais indicada para aumentar o desempenho do método. Uma maneira eficiente de paralelizar um método numérico é fazer uso de técnicas de distribuição de dados refinadas, como é o caso do particionamento em blocos. Tais abordagens de paralelização foram adotadas neste trabalho em implementações bi- e tridimensionais do Método de Lattice Boltzmann, com o intuito de avaliar o ganho de desempenho oferecido através dessa técnica. Além disso, foram definidos os fatores que influenciam as melhores configurações de particionamento. Os resultados obtidos demonstraram que o particionamento em blocos prove um aumento considerável do desempenho das aplicações paralelas, especialmente para a versão tridimensional do método. Para algumas configurações dos estudos de caso os tempos de execução diminuíram em até 30% em relação aos tempos obtidos com o particionamento unidimensional. Já as melhores configurações para a distribuição dos dados em blocos foram aquelas em que a disposição dos dados manteve-se mais quadrada ou cúbica em relação a cada uma das dimensões coordenadas. / Computational Fluid Dynamics is an important research area in the Scientific Computing context. Through the modeling and simulation of liquids and gases properties it is possible to get numerical results for different physical structures and daily phenomena that have great economic importance. The evolution of the computational systems made it possible to develop new techniques and approaches of simulation in this area. One of these techniques currently used is the Lattice Boltzmann Method. This method is an iterative numerical strategy for modeling and simulating mesoscopic dynamics of fluid flows. Different types of physical systems can be simulated through this technique, like immiscible substances and flows in porous media. However, since the dimension of the physical systems is usually large, it is necessary to adopt strategies that allow to get accurate results or results in an acceptable computational time. Thus, the parallelization of the operations is the best alternative to increase the performance of the method. An efficient way to parallelize a numerical method is to make use of refined data distribution techniques, like data partitioning in blocks. Such parallelization approach had been adopted in this work for bi- and three-dimensional implementations of the Lattice Boltzmann Method. The objective of the work was to evaluate the performance enhancement offered through the parallelization. Moreover, another objective is to define the elements that influence the best partitioning configurations. The results shown that data partitioning in blocks provide a considerable performance increase for parallel implementations, especially for the three-dimensional version of the method. For some configurations adopted in the case studies, the execution time was reduced of up to 30% in relation to the one-dimensional partitioning strategy. The best configurations for data distribution in blocks were that where the data disposal are more square or cubical shaped in relation to each one of the coordinate dimensions Análise numérica Mecanica : Fluidos Processamento : Alto desempenho Equacao Boltzmann Lattice-Boltzmann Block Data Partitioning Parallel processing Computational fluid dynamics High performance application Numerical simulation Scientific computing
25	Distribuição de dados para implementações paralelas do Método de Lattice Boltzmann / Data distribution for parallel implementations of the Lattice Boltzmann Method Schepke, Claudio January 2007 (has links) A Dinâmica de Fluidos Computacional é uma importante área de pesquisa no contexto da Computação Científica. Através da modelagem e simulação das propriedades de líquidos e gases é possível obter resultados numéricos para diferentes estruturas e fenômenos físicos cotidianos e de grande importância econômica. A evolução dos sistemas computacionais possibilitou a essa área o surgimento de novas técnicas e abordagens de simulação. Uma das técnicas computacionais atualmente empregadas é o Método de Lattice Boltzmann, um método numérico iterativo para a modelagem e simulação mesoscópica da dinâmica de fluxos de fluidos. Diferentes tipos de sistemas físicos podem ser tratados através dessa técnica, como é o caso de fluxos em meios porosos ou de substâncias imiscíveis. No entanto, por causa da dimensão dos sistemas físicos, é necessário adotar estratégias que permitam a obtenção de resultados precisos ou em tempos computacionais aceitáveis. Assim, paralelizar as operações é a solução mais indicada para aumentar o desempenho do método. Uma maneira eficiente de paralelizar um método numérico é fazer uso de técnicas de distribuição de dados refinadas, como é o caso do particionamento em blocos. Tais abordagens de paralelização foram adotadas neste trabalho em implementações bi- e tridimensionais do Método de Lattice Boltzmann, com o intuito de avaliar o ganho de desempenho oferecido através dessa técnica. Além disso, foram definidos os fatores que influenciam as melhores configurações de particionamento. Os resultados obtidos demonstraram que o particionamento em blocos prove um aumento considerável do desempenho das aplicações paralelas, especialmente para a versão tridimensional do método. Para algumas configurações dos estudos de caso os tempos de execução diminuíram em até 30% em relação aos tempos obtidos com o particionamento unidimensional. Já as melhores configurações para a distribuição dos dados em blocos foram aquelas em que a disposição dos dados manteve-se mais quadrada ou cúbica em relação a cada uma das dimensões coordenadas. / Computational Fluid Dynamics is an important research area in the Scientific Computing context. Through the modeling and simulation of liquids and gases properties it is possible to get numerical results for different physical structures and daily phenomena that have great economic importance. The evolution of the computational systems made it possible to develop new techniques and approaches of simulation in this area. One of these techniques currently used is the Lattice Boltzmann Method. This method is an iterative numerical strategy for modeling and simulating mesoscopic dynamics of fluid flows. Different types of physical systems can be simulated through this technique, like immiscible substances and flows in porous media. However, since the dimension of the physical systems is usually large, it is necessary to adopt strategies that allow to get accurate results or results in an acceptable computational time. Thus, the parallelization of the operations is the best alternative to increase the performance of the method. An efficient way to parallelize a numerical method is to make use of refined data distribution techniques, like data partitioning in blocks. Such parallelization approach had been adopted in this work for bi- and three-dimensional implementations of the Lattice Boltzmann Method. The objective of the work was to evaluate the performance enhancement offered through the parallelization. Moreover, another objective is to define the elements that influence the best partitioning configurations. The results shown that data partitioning in blocks provide a considerable performance increase for parallel implementations, especially for the three-dimensional version of the method. For some configurations adopted in the case studies, the execution time was reduced of up to 30% in relation to the one-dimensional partitioning strategy. The best configurations for data distribution in blocks were that where the data disposal are more square or cubical shaped in relation to each one of the coordinate dimensions Análise numérica Mecanica : Fluidos Processamento : Alto desempenho Equacao Boltzmann Lattice-Boltzmann Block Data Partitioning Parallel processing Computational fluid dynamics High performance application Numerical simulation Scientific computing
26	Distribuição de dados para implementações paralelas do Método de Lattice Boltzmann / Data distribution for parallel implementations of the Lattice Boltzmann Method Schepke, Claudio January 2007 (has links) A Dinâmica de Fluidos Computacional é uma importante área de pesquisa no contexto da Computação Científica. Através da modelagem e simulação das propriedades de líquidos e gases é possível obter resultados numéricos para diferentes estruturas e fenômenos físicos cotidianos e de grande importância econômica. A evolução dos sistemas computacionais possibilitou a essa área o surgimento de novas técnicas e abordagens de simulação. Uma das técnicas computacionais atualmente empregadas é o Método de Lattice Boltzmann, um método numérico iterativo para a modelagem e simulação mesoscópica da dinâmica de fluxos de fluidos. Diferentes tipos de sistemas físicos podem ser tratados através dessa técnica, como é o caso de fluxos em meios porosos ou de substâncias imiscíveis. No entanto, por causa da dimensão dos sistemas físicos, é necessário adotar estratégias que permitam a obtenção de resultados precisos ou em tempos computacionais aceitáveis. Assim, paralelizar as operações é a solução mais indicada para aumentar o desempenho do método. Uma maneira eficiente de paralelizar um método numérico é fazer uso de técnicas de distribuição de dados refinadas, como é o caso do particionamento em blocos. Tais abordagens de paralelização foram adotadas neste trabalho em implementações bi- e tridimensionais do Método de Lattice Boltzmann, com o intuito de avaliar o ganho de desempenho oferecido através dessa técnica. Além disso, foram definidos os fatores que influenciam as melhores configurações de particionamento. Os resultados obtidos demonstraram que o particionamento em blocos prove um aumento considerável do desempenho das aplicações paralelas, especialmente para a versão tridimensional do método. Para algumas configurações dos estudos de caso os tempos de execução diminuíram em até 30% em relação aos tempos obtidos com o particionamento unidimensional. Já as melhores configurações para a distribuição dos dados em blocos foram aquelas em que a disposição dos dados manteve-se mais quadrada ou cúbica em relação a cada uma das dimensões coordenadas. / Computational Fluid Dynamics is an important research area in the Scientific Computing context. Through the modeling and simulation of liquids and gases properties it is possible to get numerical results for different physical structures and daily phenomena that have great economic importance. The evolution of the computational systems made it possible to develop new techniques and approaches of simulation in this area. One of these techniques currently used is the Lattice Boltzmann Method. This method is an iterative numerical strategy for modeling and simulating mesoscopic dynamics of fluid flows. Different types of physical systems can be simulated through this technique, like immiscible substances and flows in porous media. However, since the dimension of the physical systems is usually large, it is necessary to adopt strategies that allow to get accurate results or results in an acceptable computational time. Thus, the parallelization of the operations is the best alternative to increase the performance of the method. An efficient way to parallelize a numerical method is to make use of refined data distribution techniques, like data partitioning in blocks. Such parallelization approach had been adopted in this work for bi- and three-dimensional implementations of the Lattice Boltzmann Method. The objective of the work was to evaluate the performance enhancement offered through the parallelization. Moreover, another objective is to define the elements that influence the best partitioning configurations. The results shown that data partitioning in blocks provide a considerable performance increase for parallel implementations, especially for the three-dimensional version of the method. For some configurations adopted in the case studies, the execution time was reduced of up to 30% in relation to the one-dimensional partitioning strategy. The best configurations for data distribution in blocks were that where the data disposal are more square or cubical shaped in relation to each one of the coordinate dimensions Análise numérica Mecanica : Fluidos Processamento : Alto desempenho Equacao Boltzmann Lattice-Boltzmann Block Data Partitioning Parallel processing Computational fluid dynamics High performance application Numerical simulation Scientific computing
27	Sistema de aprendizado reconfigurável para classificação de dados utilizando processamento paralelo / Reconfigurable learning system for classification of data using parallel processing Eduardo Marmo Moreira 07 May 2014 (has links) Esta tese apresenta a arquitetura de um sistema de aprendizado, com um escalonador de tarefas que possibilita a utilização de vários métodos de classificação e validação, permitindo a distribuição dessas tarefas entre os módulos do sistema. Esta arquitetura está estruturada de forma que classificações obtidas através de uma técnica sejam reutilizadas em paralelo pelo mesmo algoritmo ou por outras técnicas, produzindo novas classificações através do refinamento dos resultados alcançados e ampliando o uso em bases de dados com características diferentes. O sistema foi estruturado em quatro partes denominadas, respectivamente, Módulo de Inicialização, Módulo de Validação, Módulo de Refinamento e Módulo Especial de Escalonamento. Em cada módulo, podem ser usados vários algoritmos para atender aos seus objetivos. A estrutura deste sistema permite sua configuração, utilizando diversos métodos, inclusive com técnicas de inteligência artificial. Com isso, é possível a obtenção de resultados mais precisos por meio da escolha do melhor método para cada caso. Os resultados apresentados neste trabalho foram obtidos a partir de bases conhecidas na literatura, o que possibilita comparar as implementações dos métodos tradicionais que foram adicionadas ao sistema e, principalmente, verificar a qualidade dos refinamentos produzidos pela integração de técnicas diferentes. Os resultados demonstram que através de um sistema de aprendizado, minimiza-se a complexidade na análise de grandes bases de dados, permitindo verificar bases com estruturas diferentes e aumentar os métodos aplicados na análise de cada estrutura. Isto favorece a comparação entre os métodos e proporciona resultados mais confiáveis. Para uniformizar os dados provenientes de bases distintas, foi elaborada a modelagem de dados do sistema, o que favorece a escalabilidade do sistema de maneira uniforme. / This thesis presents the architecture of a System Learning with a task scheduler, which makes possible the utilization of several classification and validation methods, allowing the distribution of tasks between the module systems. This architecture is structured of such way that the classifications obtained through a specific technique can be reutilized in parallel by the same algorithm or by other techniques, producing new classifications through the refinement of the results achieved and expanding the use in databases with different characteristics. The system was structured in four parts denominated, respectively, Initialization module; Validation module; Refinement module; and Especial scheduling module. In each module, various algorithms can be employed to reach its objectives. The structure of this system allows its configuration, utilizing various methods, including artificial intelligence techniques. Thus, it is possible to obtain more precise results through the choice of the best method to each case. The results presented in this work were obtained from basis that are known in the literature, which allows to compare the implementations of the traditional methods that were added to the system and, especially, to verify the quality of the refinements produced by the integration of different techniques. The results demonstrated that through a learning system, the complexity of the analysis of great databases is minimized, allowing to verify basis with different structures and to increase the methods applied in the analysis of each structure. It favors the comparison between the methodologies and provides more reliable results. To standardize the data originated of distinct bases, the data modelling system was elaborated, which will favor the uniform scalability of the system. Banco de dados Classificação de dados Inteligência artificial Máquina de aprendizado Particionamento de dados Sistema de aprendizado Artificial intelligence Clustering Data partitioning Database Machine learning System learning
28	DSI-RTree - Um Índice R-Tree Distribuído Escalável / DSI-RTree - A distributed Scalable R-Tree Index OLIVEIRA, Thiago Borges de 15 December 2010 (has links) Made available in DSpace on 2014-07-29T14:57:47Z (GMT). No. of bitstreams: 1 dissertacao thiago b de oliveira 2010.pdf: 575961 bytes, checksum: 7a5a7e195780fa853d33c7629520df2a (MD5) Previous issue date: 2010-12-15 / The demand for spatial data processing systems that support the creation of massive applications has steadily grown in the increasingly ubiquitous computing world. These demands aims to explore the large amount of existing data to assist people s daily lives and provide new tools for business and government. Most of the current solutions to process spatial data do not meet the scalability needed, and thus new solutions that efficiently use distributed computing resources are needed. This work presents a distributed and scalable system called DSI-RTree, which implements a distributed index to process spatial data in a cluster of computers. We also have done a review of details related to the construction of the distributed spatial index, by addressing issues such as the size of data partitions, how that partitions are distributed and the impact of these definitions in the message flow on the cluster. An equation to calculate the size of the partitions based on the size of data sets is proposed, to ensure efficiently query processing on the proposed architecture. We have done some experiments running window queries in spatial data sets of 33,000 and 158,000 polygons and the results showed a scalability greater than linear. / Em face de um mundo computacional ubíquo cada vez mais possível, tem crescido constantemente a necessidade de sistemas de processamento de dados espaciais que suportem a criação de aplicações massivas para explorar a grande quantidade de dados existente, a fim de auxiliar a vida cotidiana das pessoas e prover novas ferramentas para empresas e governo. Soluções atuais de processamento, em sua maioria, não possuem a escalabilidade necessária para atender esta demanda e novas soluções distribuídas que usam eficientemente os recursos computacionais são necessárias. Este trabalho apresenta o DSIRTree, um sistema distribuído e escalável, que implementa a indexação e processamento distribuído de dados espaciais em um cluster de computadores. Uma avaliação de parâmetros da construção do índice espacial distribuído é realizada, abordando aspectos como o tamanho das partições criadas, a forma de distribuição destas partições e o impacto destas definições na troca de mensagens entre as máquinas do cluster. Uma fórmula para cálculo do tamanho das partições conforme o tamanho dos datasets é proposta, a fim de garantir eficiência no processamento de consultas na arquitetura projetada. Testes práticos do sistema mostraram uma escalabilidade maior que linear no processamento de consultas de janela em datasets espaciais de 32 e 158 mil polígonos. Processamento Distribuído Dados Espaciais Particionamento de Dados RTree Distributed Processing Spatial Data Data Partitioning R-Tree
29	Optimisation de requêtes spatiales et serveur de données distribué - Application à la gestion de masses de données en astronomie / Spatial Query Optimization and Distributed Data Server - Application in the Management of Big Astronomical Surveys Brahem, Mariem 31 January 2019 (has links) Les masses de données scientifiques générées par les moyens d'observation modernes, dont l’observation spatiale, soulèvent des problèmes de performances récurrents, et ce malgré les avancées des systèmes distribués de gestion de données. Ceci est souvent lié à la complexité des systèmes et des paramètres qui impactent les performances et la difficulté d’adapter les méthodes d’accès au flot de données et de traitement.Cette thèse propose de nouvelles techniques d'optimisations logiques et physiques pour optimiser les plans d'exécution des requêtes astronomiques en utilisant des règles d'optimisation. Ces méthodes sont intégrées dans ASTROIDE, un système distribué pour le traitement de données astronomiques à grande échelle.ASTROIDE allie la scalabilité et l’efficacité en combinant les avantages du traitement distribué en utilisant Spark avec la pertinence d’un optimiseur de requêtes astronomiques.Il permet l'accès aux données à l'aide du langage de requêtes ADQL, couramment utilisé.Il implémente des algorithmes de requêtes astronomiques (cone search, kNN search, cross-match, et kNN join) en exploitant l'organisation physique des données proposée.En effet, ASTROIDE propose une méthode de partitionnement des données permettant un traitement efficace de ces requêtes grâce à l'équilibrage de la répartition des données et à l'élimination des partitions non pertinentes. Ce partitionnement utilise une technique d’indexation adaptée aux données astronomiques, afin de réduire le temps de traitement des requêtes. / The big scientific data generated by modern observation telescopes, raises recurring problems of performances, in spite of the advances in distributed data management systems. The main reasons are the complexity of the systems and the difficulty to adapt the access methods to the data. This thesis proposes new physical and logical optimizations to optimize execution plans of astronomical queries using transformation rules. These methods are integrated in ASTROIDE, a distributed system for large-scale astronomical data processing.ASTROIDE achieves scalability and efficiency by combining the benefits of distributed processing using Spark with the relevance of an astronomical query optimizer.It supports the data access using the query language ADQL that is commonly used.It implements astronomical query algorithms (cone search, kNN search, cross-match, and kNN join) tailored to the proposed physical data organization.Indeed, ASTROIDE offers a data partitioning technique that allows efficient processing of these queries by ensuring load balancing and eliminating irrelevant partitions. This partitioning uses an indexing technique adapted to astronomical data, in order to reduce query processing time. Bases de données astronomiques Big Data Optimisation de requêtes Systèmes distribués Partitionnement Spark Astronomical Databases Big Data Query optimization Distributed systems Data partitioning Spark 005.74
30	Processamento distribu?do da consulta espa?o textual top-k Novaes, Tiago Fernandes de Athayde 17 July 2017 (has links) Submitted by Ricardo Cedraz Duque Moliterno (ricardo.moliterno@uefs.br) on 2017-11-28T21:38:06Z No. of bitstreams: 1 dissertacao-versao-final.pdf: 2717503 bytes, checksum: a1476bba65482b40daa1a139191ea912 (MD5) / Made available in DSpace on 2017-11-28T21:38:06Z (GMT). No. of bitstreams: 1 dissertacao-versao-final.pdf: 2717503 bytes, checksum: a1476bba65482b40daa1a139191ea912 (MD5) Previous issue date: 2017-07-17 / With the popularization of databases containing objects with spatial and textual information (spatio-textual object), the interest in new queries and techniques for retrieving these objects have increased. In this scenario, the main query is the the top-k spatio-textual query. This query retrieves the k best spatio-textual objects considering the distance of the object to the query location and the textual similarity between the query keywords and the textual information of the objects. However, most the studies related to top-k spatio-textual query are performed in centralized environments, not addressing real world problems such as scalability. In this paper, we study different strategies for partitioning the data and processing the top-k spatio-textual query in a distributed environment. We evaluate each strategy in a real distributed environment, employing real datasets. / Com a populariza??o de bases de dados contendo objetos que possuem informa??o espacial e textual (objeto espa?o-textual), aumentou o interesse por novas consultas e t?cnicas capazes de recuperar esses objetos de forma eficiente. Uma das principais consultas para objetos espa?o-textuais ? a consulta espa?o-textual top-k. Essa consulta visa recuperar os k melhores objetos considerando a dist?ncia do objeto at? um local informado na consulta e a similaridade textual entre palavras-chave de busca e a informa??o textual dos objetos. No entanto, a maioria dos estudos para consultas espa?o-textual top-k assumem ambientes centralizados, n?o abordando problemas frequentes em aplica??es do mundo real como escalabilidade. Nesta disserta??o s?o estudadas diferentes formas de particionar os dados e o impacto destes particionamentos no processamento da consulta espa?o-textual top-k em um ambiente distribu?do. Todas as estrat?gias propostas s?o avaliadas em um ambiente distribu?do real, utilizando dados reais. Particionamento de dados Processamento de consultas distribu?das Consultas espa?o-textuais Sistemas de informa??o Recupera??o de informa??o Data partitioning Distributed query processing Spatio-textual query Information systems Information retrieval

Search results