21

Citation analysis of database publications

Rahm, Erhard, Thor, Andreas 19 October 2018 (has links)
We analyze citation frequencies for two main database conferences (SIGMOD, VLDB) and three database journals (TODS, VLDB Journal, SIGMOD Record) over 10 years. The citation data is obtained by integrating and cleaning data from DBLP and Google Scholar. Our analysis considers different comparative metrics per publication venue, in particular the total and average number of citations as well as the impact factor, which has so far been considered only for journals. We also determine the most cited papers, authors, author institutions, and their countries.
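
A minimal sketch of how the metrics named in this abstract could be computed, assuming a simple per-paper record of venue, year, and citation count; the data and layout below are hypothetical, not the integrated DBLP/Google Scholar dataset used in the paper.

```python
from collections import defaultdict

# Hypothetical records: (venue, publication_year, citations). Not the paper's data.
publications = [
    ("SIGMOD", 2001, 120), ("SIGMOD", 2002, 60),
    ("VLDB", 2001, 85), ("TODS", 2002, 40),
]

def totals_and_averages(pubs):
    """Total and average number of citations per publication venue."""
    totals, counts = defaultdict(int), defaultdict(int)
    for venue, _, cites in pubs:
        totals[venue] += cites
        counts[venue] += 1
    return {v: (totals[v], totals[v] / counts[v]) for v in totals}

def impact_factor(pubs, year, cites_received):
    """Two-year impact factor: citations received in `year` by papers published in
    the two preceding years, divided by the number of those papers.
    `cites_received` maps (venue, pub_year) -> citations received during `year`."""
    window = {year - 1, year - 2}
    factors = {}
    for venue in {p[0] for p in pubs}:
        papers = [p for p in pubs if p[0] == venue and p[1] in window]
        cites = sum(cites_received.get((venue, y), 0) for y in window)
        factors[venue] = cites / len(papers) if papers else 0.0
    return factors

print(totals_and_averages(publications))
print(impact_factor(publications, 2003, {("SIGMOD", 2001): 30, ("SIGMOD", 2002): 25}))
```
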
22

Dynamic load balancing in parallel database systems

Rahm, Erhard 19 October 2018 (has links)
Dynamic load balancing is a prerequisite for effectively utilizing large parallel database systems. Load balancing is required at several levels, in particular for assigning transactions, queries, and subqueries to nodes. Special problems are posed by the need to support both inter-transaction/query and intra-transaction/query parallelism, whose performance requirements conflict. We compare the major architectures for parallel database systems, Shared Nothing and Shared Disk, with respect to their load balancing potential. For this purpose, we focus on parallel scan and join processing in multi-user mode. It turns out that both the degree of query parallelism and the processor allocation should be determined in a coordinated way, based on the current utilization of critical resource types, in particular CPU and memory.
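
A rough sketch of the kind of coordinated decision the abstract argues for, choosing the degree of parallelism and the processor allocation together from current CPU and memory utilization; the node data, thresholds, and allocation rule are assumptions for illustration, not taken from the paper.

```python
# Hypothetical monitoring data: node id, CPU utilization, free memory in MB.
nodes = [
    ("n1", 0.35, 4096),
    ("n2", 0.80, 1024),
    ("n3", 0.20, 8192),
    ("n4", 0.55, 2048),
]

def plan_operator(mem_needed_mb, max_parallelism, cpu_limit=0.7):
    """Pick nodes that are not CPU-saturated, ordered by free memory, and derive
    the degree of parallelism from how many of them can hold an equal share of
    the operator's memory requirement."""
    candidates = sorted(
        (n for n in nodes if n[1] < cpu_limit),
        key=lambda n: n[2], reverse=True,
    )
    chosen = []
    for node_id, _, free_mem in candidates:
        if len(chosen) >= max_parallelism:
            break
        share = mem_needed_mb / (len(chosen) + 1)  # share if this node joins
        if free_mem >= share:
            chosen.append(node_id)
    return len(chosen), chosen

degree, allocation = plan_operator(mem_needed_mb=3000, max_parallelism=4)
print(degree, allocation)
```
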
23

Ein Simulationsansatz zur Bewertung paralleler Shared-Disk-Datenbanksysteme (A Simulation Approach for Evaluating Parallel Shared-Disk Database Systems)

Stöhr, Thomas 23 October 2018 (has links)
No description available.
24

On Parallel Join Processing in Object-Relational Database Systems

Märtens, Holger, Rahm, Erhard 06 November 2018 (has links)
So far, only a few performance studies of parallel object-relational database systems are available. In particular, the relative performance of relational versus reference-based join processing in a parallel environment has not been investigated sufficiently. We present a performance study based on the BUCKY benchmark that compares parallel join processing using reference attributes with relational hash- and merge-join algorithms. In addition, we propose a data allocation scheme especially suited for object hierarchies and set-valued attributes.
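
The two join styles compared in the study can be contrasted with a toy sketch: a reference-based join follows object identifiers directly, while a relational hash join builds a hash table on the join attribute and probes it. The relation layouts below are assumptions for illustration, not the BUCKY benchmark schema.

```python
# Departments keyed by a hypothetical object identifier (OID).
departments = {101: {"oid": 101, "name": "CS"}, 102: {"oid": 102, "name": "EE"}}
employees = [
    {"name": "A", "dept": 101},  # 'dept' serves as both reference (OID) and join key
    {"name": "B", "dept": 102},
    {"name": "C", "dept": 101},
]

def reference_join(emps, dept_store):
    """Follow the reference attribute: one direct lookup per tuple, no build phase."""
    return [(e["name"], dept_store[e["dept"]]["name"]) for e in emps]

def hash_join(emps, depts):
    """Relational hash join: build a hash table on the department key, then probe."""
    build = {d["oid"]: d for d in depts}           # build phase
    return [(e["name"], build[e["dept"]]["name"])  # probe phase
            for e in emps if e["dept"] in build]

print(reference_join(employees, departments))
print(hash_join(employees, departments.values()))
```
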
25

On Disk Allocation of Intermediate Query Results in Parallel Database Systems

Märtens, Holger 07 November 2018 (has links)
For complex queries in parallel database systems, substantial amounts of data must be redistributed between operators executed on different processing nodes. Frequently, such intermediate results cannot be held in main memory and must be stored on disk. To limit the ensuing performance penalty, a data allocation must be found that supports parallel I/O to the greatest possible extent. In this paper, we propose declustering even self-contained units of temporary data processed in a single operation (such as individual buckets of parallel hash joins) across multiple disks. Using a suitable analytical model, we find that the improvement of parallel I/O outweighs the penalty of increased fragmentation.
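
A tiny sketch of the declustering idea, assuming the pages of a single temporary hash-join bucket and a fixed set of disks; the allocation rule here is a simple round-robin with an offset, not necessarily the paper's scheme.

```python
NUM_DISKS = 4

def decluster(bucket_id, num_pages, num_disks=NUM_DISKS):
    """Assign each page of one temporary bucket to a disk, round-robin, starting at
    an offset derived from the bucket id so that buckets do not all start on disk 0."""
    start = bucket_id % num_disks
    return {page: (start + page) % num_disks for page in range(num_pages)}

# Pages of bucket 3 are spread over all four disks and can be fetched in parallel.
print(decluster(bucket_id=3, num_pages=10))
```
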
26

A Classification of Skew Effects in Parallel Database Systems

Märtens, Holger 07 November 2018 (has links)
Skew effects are a serious problem in parallel database systems, but the relationship between different skew types and load balancing methods is still not fully understood. We develop two classifications, one of skew effects and one of load balancing strategies, and match their relevant properties. Our conclusions highlight the importance of highly dynamic scheduling for optimizing both the complexity and the success of load balancing. We also suggest the tuning of database schemata as a new anti-skew measure.
27

Efficient Spatio-Temporal Network Analytics in Epidemiological Studies using Distributed Databases

Khan, Mohammed Saquib Akmal 26 January 2015 (has links)
Real-time spatio-temporal analytics has become an integral part of epidemiological studies. The size of spatio-temporal data has been increasing tremendously over the years, gradually evolving into Big Data. Processing in such domains is highly data- and compute-intensive. High-performance computing resources are actively being used to handle such workloads over massive datasets. This confluence of high-performance computing and datasets with Big Data characteristics poses great challenges for data handling and processing. The resource management of supercomputers is in conflict with the data-intensive nature of spatio-temporal analytics, which is further exacerbated by the fact that data management is decoupled from the computing resources. Problems of this nature have provided great opportunities for the growth and development of tools and concepts centered around MapReduce-based solutions. However, we believe that advanced relational concepts can still be employed to provide an effective solution to these issues and challenges. In this study, we explore distributed databases to efficiently handle spatio-temporal Big Data for epidemiological studies. We propose DiceX (Data Intensive Computational Epidemiology using supercomputers), which couples high-performance, Big Data, and relational computing by embedding distributed data storage and processing engines within the supercomputer. It is characterized by scalable strategies for data ingestion, a unified framework to set up and configure various processing engines, and the ability to pause, materialize, and restore images of a data session. In addition, we have successfully configured DiceX to support approximation algorithms from the MADlib Analytics Library [54], primarily the Count-Min Sketch or CM Sketch [33][34][35]. DiceX enables a new style of Big Data processing centered around the use of clustered databases that exploits supercomputing resources. It can effectively exploit the cores, memory, and compute nodes of supercomputers to scale the processing of spatio-temporal queries on large datasets. Thus, it provides a scalable and efficient tool for the management and processing of spatio-temporal data. Although DiceX has been designed for computational epidemiology, it can easily be extended to other data-intensive domains facing similar issues and challenges. We thank our external collaborators and members of the Network Dynamics and Simulation Science Laboratory (NDSSL) for their suggestions and comments. This work has been partially supported by DTRA CNIMS Contract HDTRA1-11-D-0016-0001, DTRA Validation Grant HDTRA1-11-1-0016, NSF Network Science and Engineering Grant CNS-1011769, and NIH/NIGMS Models of Infectious Disease Agent Study Grant 5U01GM070694-11. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the U.S. Government. / Master of Science
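
For reference, a generic, textbook-style Count-Min Sketch in a few lines; this only illustrates the approximation algorithm cited above [33][34][35], not the MADlib implementation configured in DiceX, and the width/depth values and example keys are made up.

```python
import hashlib

class CountMinSketch:
    def __init__(self, width=256, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        # One cheap "hash function" per row, derived from a salted digest.
        digest = hashlib.md5(f"{row}:{item}".encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # The minimum over all rows; it overestimates only when collisions occur.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))

cms = CountMinSketch()
for county in ["fairfax", "fairfax", "montgomery"]:
    cms.add(county)
print(cms.estimate("fairfax"))  # approximately 2
```
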
28

A Model and Intelligent Software Agent for the Selection and Implementation of Open Source Software

House, Terry Carl 01 January 2009 (has links)
In this study, the researcher created a model and software application for identifying the factors that are relevant in the decision making process to select and implement Open-source applications in higher education. Open-source applications provide the programming syntax to the user for customization. Unlike proprietary software, where the source code is unavailable and illegal to alter, an Open-Source Software (OSS) application authorizes the user to edit and recompile the application to meet the specific needs of the institution or organization. OSS applications are either free or purchasable for a one-time fee. The rising cost of proprietary software has motivated many academic institutions to consider implementing OSS. Many IT professionals are investigating the advantages and disadvantages of open-source applications in an attempt to mitigate expensive yearly fees, licensures and maintenance costs required by proprietary software vendors. The data collected in the study represented OSS and non-OSS enabled institutions that were members of the Council of Higher Education and Accreditation (CHEA) Organization. Of the data collected from the institutions, a portion of the information was set aside for validation purposes. The model created in this research addressed the OSS concerns in higher education by identifying the experiences, institutional characteristics, and technical systems relevant to the selection and implementation of OSS applications. The researcher used the Visual Basic .NET programming language to create the model and software application that provided academic institutions with technical OSS information and support. The Intelligent Software Agent (ISA) simplified the data analysis process by providing a Graphical User Interface (GUI) for the user to enter and receive data. The results of this research allowed institutions to specify certain criteria such as highest degree awarded, relevant characteristics, and technological factors and then receive implementation suggestions for adopting OSS applications. The validation process indicated that the tabled data in the model and generated suggestions of the ISA were statistically comparable with the data that was set-aside for validation purposes.
29

Systèmes de gestion de base de données embarqués dans une puce électronique (Database Systems on Chip)

Anciaux, Nicolas 17 December 2004 (has links) (PDF)
The current evolution of computing is leading to an environment composed of a large number of small, specialized devices serving individuals. Mark Weiser, the pioneer of ubiquitous computing, depicted this vision of the computing world in the early 1990s [Wei91] and defined it as the opposite of virtual reality: "virtual reality puts people at the heart of a computer-generated world, whereas ubiquitous computing forces the computer to live in the real world of people." In this vision, individuals are surrounded by a multitude of smart objects. These communicating devices are aware of the people around them, attentive to their needs, and adjust their actions accordingly.

The concept of the smart object emerged in the 1990s, at the crossroads of component miniaturization, information processing technologies, and wireless communication. In this thesis, we view smart objects as devices equipped with data acquisition, storage, processing, and communication capabilities. Smart cards (multi-application Java cards, bank cards, SIM cards, electronic identity cards, etc.), sensors monitoring their environment (collecting weather, air quality, or road traffic information), and devices embedded in household appliances [BID+99, KOA+99, Wei96] (e.g., a chip embedded in a television set-top box) are representative examples of smart objects. Medical researchers (the Smartdust project [KKP]) also count on chips that can be inhaled or ingested (to replace X-rays, to automate blood-sugar regulation for diabetics, etc.).

As soon as smart objects acquire and process data, embedded database components may become necessary. For example, in [BGS01], sensor networks collecting environmental data are compared to distributed databases, with each sensor acting as a micro data server answering queries. [MaF02] stresses the need for local processing of the accumulated data, such as computing aggregates, to reduce the amount of data to transmit and thereby save energy. Moreover, protecting contextual data or portable folders (such as a medical record, an address book, or a calendar) stored in smart objects leads to embedding sophisticated query execution engines to avoid disclosing sensitive information [PBV+01]. These examples illustrate the growing need for embedded database components.

This thesis focuses on the impact of the constraints of smart objects on the database techniques to be embedded in such devices. The introduction is organized as follows. Its first section highlights these constraints and motivates the need for embedded database components. The next section presents the three major contributions of the thesis, and the last section describes the organization of this summary.
30

Automatic Tuning of Data-Intensive Analytical Workloads

Herodotou, Herodotos January 2012 (has links)
Modern industrial, government, and academic organizations are collecting massive amounts of data ("Big Data") at an unprecedented scale and pace. The ability to perform timely and cost-effective analytical processing of such large datasets in order to extract deep insights is now a key ingredient for success. These insights can drive automated processes for advertisement placement, improve customer relationship management, and lead to major scientific breakthroughs.

Existing database systems are adapting to the new status quo while large-scale dataflow systems (like Dryad and MapReduce) are becoming popular for executing analytical workloads on Big Data. Ensuring good and robust performance automatically on such systems poses several challenges. First, workloads often analyze a hybrid mix of structured and unstructured datasets stored in nontraditional data layouts. The structure and properties of the data may not be known upfront, and will evolve over time. Complex analysis techniques and rapid development needs necessitate the use of both declarative and procedural programming languages for workload specification. Finally, the space of workload tuning choices is very large and high-dimensional, spanning configuration parameter settings, cluster resource provisioning (spurred by recent innovations in cloud computing), and data layouts.

We have developed a novel dynamic optimization approach that can form the basis for tuning workload performance automatically across different tuning scenarios and systems. Our solution is based on (i) collecting monitoring information in order to learn the run-time behavior of workloads, (ii) deploying appropriate models to predict the impact of hypothetical tuning choices on workload behavior, and (iii) using efficient search strategies to find tuning choices that give good workload performance. The dynamic nature enables our solution to overcome the new challenges posed by Big Data, and also makes our solution applicable to both MapReduce and database systems. We have developed the first cost-based optimization framework for MapReduce systems for determining the cluster resources and configuration parameter settings to meet desired requirements on execution time and cost for a given analytic workload. We have also developed a novel tuning-based optimizer in database systems to collect targeted run-time information, perform optimization, and repeat as needed to perform fine-grained tuning of SQL queries. / Dissertation
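
A compact sketch of the three-step loop described in the abstract (monitoring, prediction, search), applied to a made-up two-parameter tuning space; the profile numbers, parameter names, and cost model are placeholders, not the thesis's actual models.

```python
import itertools

# (i) profile collected from a monitored run (hypothetical numbers)
profile = {"input_gb": 100, "cpu_cost_per_gb": 2.0, "spill_penalty": 30.0}

def predicted_runtime(profile, reducers, sort_buffer_mb):
    """(ii) toy cost model: more reducers spread CPU work but add startup overhead,
    and a small sort buffer triggers a spill penalty. A placeholder, not a real model."""
    cpu = profile["input_gb"] * profile["cpu_cost_per_gb"] / reducers
    spill = profile["spill_penalty"] if sort_buffer_mb < 200 else 0.0
    startup = 0.5 * reducers  # per-task startup overhead
    return cpu + spill + startup

def enumerate_best(profile):
    """(iii) exhaustive search over a small discretized tuning space."""
    space = itertools.product([4, 8, 16, 32], [100, 200, 400])
    return min(space, key=lambda choice: predicted_runtime(profile, *choice))

best = enumerate_best(profile)
print("best (reducers, sort_buffer_mb):", best,
      "predicted runtime:", predicted_runtime(profile, *best))
```
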
