151 |
Performance Evaluation of Time series Databases based on Energy Consumption
Sanaboyina, Tulasi Priyanka. January 2016
The vision of the future Internet of Things is posing new challenges due to gigabytes of data being generated every day by millions of sensors, actuators, RFID tags, and other devices. As the volume of data is growing dramatically, so is the demand for performance enhancement. When it comes to this big data problem, much attention has been given to cloud computing and virtualization for their almost unlimited resource capacity, flexible resource allocation and management, and distributed processing ability that promise high scalability and availability. On the other hand, the variety of types and nature of data is continuously increasing. Almost without exception, data centers supporting cloud-based services are monitored for performance and security, and the resulting monitoring data needs to be stored somewhere. Similarly, billions of sensors that are scattered throughout the world are pumping out huge amounts of data, which is handled by a database. Typically, the monitoring data consists of time series, that is, numbers indexed by time. To handle this type of time series data, a distributed time series database is needed. Nowadays, many database systems are available, but it is difficult to use them for storing and managing large volumes of time series data. Monitoring large amounts of periodic data would be better done using a database optimized for storing time series data. The traditional and dominant relational database systems have been questioned as to whether they can still be the best choice for current systems with all the new requirements. Choosing an appropriate database for storing huge amounts of time series data is not trivial, as one must take into account different aspects such as manageability, scalability and extensibility. In recent years, NoSQL databases have been developed to address the needs of tremendous performance, reliability and horizontal scalability.
NoSQL time series databases (TSDBs) have emerged to combine valuable NoSQL properties with characteristics of time series data from a variety of use cases. In the same way that performance has been central to systems evaluation, energy efficiency is quickly growing in importance for minimizing IT costs. In this thesis, we compared the performance of two NoSQL distributed time series databases, OpenTSDB and InfluxDB, based on the energy consumed by them in different scenarios, using the same set of machines and the same data. We evaluated the amount of energy consumed by each database on a single host and on multiple hosts, as the databases compared are distributed time series databases. Individual and comparative analyses of the databases are carried out. In this report we present the results of this study and the performance of these databases based on energy consumption.
|
152 |
Scalable Community Detection in Massive Networks using Aggregated Relational Data
Jones, Timothy. January 2019
The analysis of networks is used in many fields of study including statistics, social science, computer science, physics, and biology. The interest in networks is diverse as it usually depends on the field of study. For instance, social scientists are interested in interpreting how edges arise, while biologists seek to understand underlying biological processes. Among the problems being explored in network analysis, community detection stands out as being one of the most important. Community detection seeks to find groups of nodes with a large concentration of links within but few between. Inferring groups is important in many applications as they are used for further downstream analysis. For example, identifying clusters of consumers with similar purchasing behavior in a customer and product network can be used to create better recommendation systems. Finding a node with a high concentration of its edges to other nodes in the community may give insight into how the community formed.
Many statistical models for networks implicitly define the notion of a community. Statistical inference aims to fit a model that posits how vertices are connected to each other. One of the most common models for community detection is the stochastic block model (SBM) [Holland et al., 1983]. Although simple, it is a highly expressive family of random graphs. However, it does have its drawbacks. First, it does not capture the degree distribution of real-world networks. Second, it allows nodes to only belong to one community. In many applications, it is useful to consider overlapping communities. The Mixed Membership Stochastic Blockmodel (MMSB) is a Bayesian extension of the SBM that allows nodes to belong to multiple communities.
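To make the SBM's generative assumption concrete, the following is a minimal sketch that samples a graph from a simplified two-probability variant of the model: nodes in the same community are linked with probability `p_within`, and nodes in different communities with probability `p_between`. The function name `sample_sbm` and its parameters are hypothetical and chosen for illustration; the general SBM allows a full matrix of block-to-block probabilities, and this sketch is not the inference method of the thesis.

```python
import random


def sample_sbm(block_sizes, p_within, p_between, seed=0):
    """Sample an undirected graph from a simplified stochastic block model.

    Hypothetical helper for illustration only: one edge probability is used
    inside communities and another between them.
    """
    rng = random.Random(seed)
    # Each node gets exactly one community label (the SBM restriction
    # that the MMSB relaxes).
    labels = [b for b, size in enumerate(block_sizes) for _ in range(size)]
    n = len(labels)
    edges = []
    for i in range(n):
        for j in range(i + 1, n):
            # The edge probability depends only on the two community labels.
            p = p_within if labels[i] == labels[j] else p_between
            if rng.random() < p:
                edges.append((i, j))
    return labels, edges


labels, edges = sample_sbm([20, 20], p_within=0.5, p_between=0.02)
within = sum(labels[i] == labels[j] for i, j in edges)
print(f"{within} of {len(edges)} sampled edges fall inside a community")
```

With `p_within` much larger than `p_between`, most sampled edges fall inside a community, which is the structure community detection tries to recover from the edges alone.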
Fitting large Bayesian network models quickly becomes computationally infeasible when the number of nodes grows into the hundreds of thousands and millions. In particular, the number of parameters in the MMSB grows as the number of nodes squared. This thesis introduces an efficient method for fitting a Bayesian model to massive networks through use of aggregated relational data. Our inference method converges faster than existing methods by leveraging nodal information that often accompanies real-world networks. Conditioning on this extra information leads to a model that admits a parallel variational inference algorithm. We apply our method to a citation network with over three million nodes and 25 million edges. Our method converges faster than existing posterior inference algorithms for the MMSB and recovers parameters better on simulated networks generated according to the MMSB.
|
153 |
Biological database indexing and its applications.
January 2002
Cheung Ching Fung. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2002. / Includes bibliographical references (leaves 71-73). / Abstracts in English and Chinese. / Chapter 1 --- Introduction / Chapter 1.1 --- Biological Sequences / Chapter 1.2 --- User Queries on Biological Sequences / Chapter 1.3 --- Research Contributions / Chapter 1.4 --- Organization of Thesis / Chapter 2 --- Background / Chapter 2.1 --- What is a Suffix-Tree? / Chapter 2.2 --- Disk-Based Suffix-Trees / Chapter 3 --- Disk-Based Suffix Tree Constructions / Chapter 3.1 --- An Existing Algorithm: PrePar-Suffix / Chapter 3.1.1 --- Three Issues: Edge Splitting, Random Access and Data Skew / Chapter 3.2 --- DynaCluster-Suffix: A New Novel Disk-Based Suffix-Tree Construction Algorithm / Chapter 4 --- Suffix Links Rebuilt / Chapter 4.1 --- Suffix-links and Least Common Ancestors / Chapter 5 --- q-Length Exact Sequence Matching / Chapter 5.1 --- q-Length Exact Sequence Matching by Suffix-Tree / Chapter 6 --- Implementation / Chapter 6.1 --- System Overview / Chapter 6.1.1 --- Index Builder / Chapter 6.1.2 --- Exact Query Processor / Chapter 6.1.3 --- Suffix Links Regenerator / Chapter 6.1.4 --- Tandem Repeats Finder / Chapter 6.2 --- Data Structures / Chapter 6.2.1 --- Representation of a Node / Chapter 6.2.2 --- An Alternative Node Representation / Chapter 6.2.3 --- Representation of a Leaf / Chapter 6.3 --- Buffering / Chapter 6.3.1 --- Page Format / Chapter 6.3.2 --- Address Translation / Chapter 6.3.3 --- Page Replacement Strategies / Chapter 7 --- A Performance Studies / Chapter 7.1 --- When Everything Can be Held In Memory / Chapter 7.2 --- When Main Memory Is Limited / Chapter 7.3 --- The Effectiveness of DNA Lengths with Fixed Memory Sizes / Chapter 7.4 --- The Effectiveness of Memory Sizes / Chapter 7.5 --- Answering q-Length Exact Sequence Matching Queries / Chapter 7.6 --- Suffix Link Rebuilt / Chapter 8 --- Conclusions and Future Works / Chapter 8.1 --- Conclusions / Chapter 8.2 --- Future Works / Bibliography
|
154 |
An improved method for database design.
January 2004
Chan, Chi Wai Alan. / Thesis (M.Phil.)--Chinese University of Hong Kong, 2004. / Includes bibliographical references (leaves 121-126). / Abstracts in English and Chinese. / Abstract / Acknowledgements / List of Figures / List of Tables / Chapter 1. --- Introduction / Chapter 1.1. --- Object-oriented databases / Chapter 1.2. --- Object-oriented Data Model / Chapter 1.3. --- Class and Object Instances / Chapter 1.4. --- Inheritance / Chapter 1.5. --- Constraint / Chapter 1.6. --- Physical Design for OODB Storage / Chapter 1.7. --- Problem Description / Chapter 1.8. --- Genetic Algorithm / Chapter 1.8.1. --- Constraint Handling Methods in GA / Chapter 1.9. --- Contributions of this work / Chapter 1.10. --- Outline of this work / Chapter 2. --- Literature Review / Chapter 2.1. --- Object-oriented database / Chapter 2.2. --- Object-Oriented Data model / Chapter 2.3. --- Physical Storage Model for OODBs / Chapter 2.3.1. --- Home Class (HC) Model / Chapter 2.3.2. --- Repeated Class (RC) Model / Chapter 2.3.3. --- Split Instance (SI) Model / Chapter 2.4. --- Solving physical storage design for OODBs / Chapter 2.5. --- Transaction-Based Approach / Chapter 2.6. --- Minimize database operational cost / Chapter 2.7. --- Combinational Optimization Method / Chapter 2.8. --- Research in Genetic Algorithm / Chapter 2.9. --- Implementation in GA / Chapter 2.10. --- Fitness function / Chapter 2.11. --- Crossover operation / Chapter 2.12. --- Encoding and Representation / Chapter 2.13. --- Parent Selection in Crossover Operation / Chapter 2.14. --- Reproductive selection / Chapter 2.14.1. --- Selection of Crossover Operator / Chapter 2.14.2. --- Replacement / Chapter 2.15. --- The Use of Constraint Handling Method / Chapter 2.15.1. --- Penalty function / Chapter 2.15.2. --- Decoder gives instruction to build feasible solution / Chapter 2.15.3. --- Adjustment method / Chapter 3. --- Solving Physical Storage Problem for OODB using GA / Chapter 3.1. --- Physical storage models for OODB / Chapter 3.2. --- Database operation for transactions / Chapter 3.3. --- Properly designed physical storage structure / Chapter 3.4. --- Fitness Evaluation / Chapter 3.5. --- Initial population / Chapter 3.6. --- Cross-breeding / Chapter 3.7. --- GA Operators / Chapter 3.8. --- Physical Design Problem Formulation for GA / Chapter 3.9. --- Representation and Encoding / Chapter 3.10. --- Solving Physical Storage Problem for OODB in GA / Chapter 3.10.1. --- Representation of design solution / Chapter 3.10.2. --- Encoding / Chapter 3.10.3. --- Initial population / Chapter 3.10.4. --- Parent Selection for breeding / Chapter 3.11. --- Traditional Constraint handling method / Chapter 3.11.1. --- Improve the Performance of Inheritance Constraint Handling methods / Chapter 3.12. --- Weakness in Gorla's GA approach / Chapter 4. --- Proposed Methodology / Chapter 4.1. --- Enhanced Crossover Operator / Chapter 4.2. --- Infeasible Solutions and Enhanced Adjustment Method / Chapter 4.3. --- Propagation Adjustment Method / Chapter 5. --- Computational Experiments / Chapter 5.1. --- Introduction / Chapter 5.2. --- Experiment Objective / Chapter 5.3. --- Tools and Setup / Chapter 5.4. --- Crossover Operator / Chapter 5.5. --- Mutation Operator / Chapter 5.6. --- Termination condition / Chapter 5.7. --- Computational Experiments / Chapter 5.7.1. --- An Illustrative Example - UNIVERSITY database / Chapter 5.7.2. --- Simulation - 9 classes and 25 classes / Chapter 5.7.3. --- Result / Chapter 6. --- Conclusions / Chapter 6.1. --- Summary of Achievements / Chapter 7. --- Bibliography / Chapter 8. --- Appendix
|
155 |
Universalism and particularism: explaining the emergence and growth of regional journal indexing systems
Chavarro Bohórquez, Diego Andrés. January 2017
Journal indexing systems (JIS) are bibliographic databases that are used to search for scientific literature and for bibliometric analyses. This thesis addresses the emergence and growth of regional JIS, focusing on the Scientific Electronic Library Online (Scielo) and the Red de Revistas Científicas de América Latina, el Caribe, España, y Portugal (RedALyC) in a challenging environment in which the Web of Science (WoS) and Scopus prevail. WoS and Scopus are referred to as mainstream JIS and Scielo and RedALyC as alternative JIS. The research questions are: (1) Why did alternative JIS emerge in light of the dominance of WoS? (2) Why do researchers publish in journals indexed by alternative JIS? The research draws on the concepts of cognitive authority from information science, and universalism and particularism from the sociology of science. A cognitive authority is an information source that is credible. JIS are becoming cognitive authorities in the science communication system. Their credibility relies on their application of objective criteria to select journals (universalism). However, journal selection can be influenced by subjective criteria (particularism). The tensions between universalism and particularism suggest two scenarios for the emergence and growth of alternative JIS. A universalistic view suggests that they emerge to cover journals with low scientific impact and editorial standards. A particularistic view posits that they emerge to cover disciplinary, linguistic, and regional gaps created by biases in mainstream JIS, particularly in the coverage of WoS. The research questions were addressed through mixed methods to produce quantitative and qualitative evidence. The evidence was obtained from (1) documentary and literature reviews; (2) descriptive and correlational statistics; and (3) a case study that involved interviews with researchers in private and public universities in Colombia in agricultural sciences, business and management, and chemistry.
The findings indicate that disciplinary, linguistic, and geographical biases in the coverage of mainstream JIS motivated the development of Scielo and RedALyC. The reasons for their growth have been conceptualised in this thesis as: (1) training; (2) knowledge-gap filling; and (3) knowledge bridging. This thesis addresses a significant gap in the sociology of science by studying new authorities in the science communication system. It contributes to debates on universalism and particularism, showing that both are involved in the selection of journals by JIS. It also contributes to understanding how particularism in mainstream JIS can pose barriers to the communication of scientific knowledge that has the potential to address pressing social demands. The findings could contribute to the design of research policy and research evaluation in contexts not widely covered by mainstream JIS.
|
156 |
Comparison of Functional Dependency Extraction Methods and an Application of Depth First Search
Sood, Kanika. 29 September 2014
Extracting functional dependencies from existing databases is a useful technique in relational theory, database design and data mining.
Functional dependencies are a key property of relational schema design. A functional dependency is a database constraint between two sets of attributes. In this study we present a comparative study of TANE, FUN, FD_Mine, FastFDs and Dep_Miner, and we propose a new technique, KlipFind, to extract dependencies from relations efficiently. KlipFind employs a depth-first, heuristic-driven approach as a solution. Our study indicates that KlipFind is more space efficient than any of the existing solutions and highly efficient in finding keys for relations.
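As a small illustration of what a functional dependency constrains, the sketch below tests whether a dependency X -> Y holds in a relation: any two rows that agree on the attributes in X must also agree on the attributes in Y. The function `fd_holds`, the attribute names, and the sample rows are all hypothetical; this is a naive single-dependency check, not TANE, KlipFind, or any of the other extraction algorithms compared in the thesis.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in rows.

    rows is a list of dicts (one per tuple); lhs and rhs are lists of
    attribute names. Hypothetical helper for illustration only.
    """
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # The first row with this lhs value fixes the expected rhs value;
        # any later row that disagrees violates the dependency.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Made-up relation used only to exercise the check.
rows = [
    {"emp": 1, "zip": "97401", "city": "Eugene"},
    {"emp": 2, "zip": "97401", "city": "Eugene"},
    {"emp": 3, "zip": "97403", "city": "Eugene"},
]

print(fd_holds(rows, ["zip"], ["city"]))  # zip determines city in this data
print(fd_holds(rows, ["city"], ["zip"]))  # city does not determine zip
```

Extraction algorithms have to search over many candidate attribute sets rather than test one dependency at a time, which is where the space and time trade-offs compared in the study arise.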
|
157 |
A sem-odb application for the western cultures database
Ghersgorin, Raquel. 21 July 1998
This thesis presents the evolution of the Western Cultures Database. The project starts with a database design using semantic modeling, and continues with the implementation following two techniques: a Relational and a Semantic approach. The two implementations proceed in parallel until the Relational approach is set aside because of the advantages of the Semantic (Sem-ODB) approach.
The Semantic implementation results in the Western Culture Semantic Database Application with a web interface, which is the main contribution of this thesis. The database is created and populated using Sem-ODB, and the web interface is built using WebRG (report generator), HTML, JavaScript and JavaChart (applets for graphical representation). The resulting semantic application permits the storage and retrieval of data, the display of reports and the graphical representation of the data through a Web interface. All of this supports research assertions about the impact of historical figures in Western Cultures.
|
158 |
Konstruktion och utredning av databasstöd för Länsstyrelsens Kalkningsverksamhet [Construction and investigation of database support for the County Administrative Board's liming operations]
Moen, Daniel. January 2008
For database development, a number of rules and standards have been established so that the database will be as efficient and error-free as possible. Sometimes these are not followed fully, for various reasons. This can lead to problems which, in the worst case, can have consequences for the organization that uses the database. One example of a database that deviates from the applicable design rules is the one that was intended to be introduced for the liming operations of the County Administrative Board of Gävleborg (Länsstyrelsen i Gävleborg). In this work I have investigated this database, identified these deviations, and examined why they arose and what they have led to or could lead to in this context. I have also built a new database, following the applicable rules, adapted to Gävleborg's liming operations. My conclusion is that these deviations from standards and the errors found are probably due to a combination of inadequate interfaces in the form of data-entry forms and a lack of knowledge of database design and administration among the staff.
|
159 |
Discovering Moving Clusters from Spatial-Temporal Databases
Lee, Chien-Ming. 28 July 2007
Owing to advances in computer and communication technologies, clustering analysis of moving objects has attracted increasing attention in recent years. An interesting problem is to find moving clusters, composed of objects that move together for a sufficiently long period of time. However, a moving cluster tends to break apart after some time because the goals of its individual objects change. In order to identify the set of moving clusters, we propose a formal definition of moving clusters with semantically clear parameters. Based on this definition, we propose approaches to cluster moving objects. The proposed approaches are evaluated using data generated with and without an underlying model. We validate our approaches with a thorough experimental evaluation and comparison.
|
160 |
GeoExpert - An Expert System Based Framework for Data Quality in Spatial Databases
Kumar, Aditya. 01 August 2006
The use of very large sets of historical spatial data in the knowledge discovery process has become common, and to obtain better results from this process the data should be of high quality. In this thesis we propose 'GeoExpert', a framework for a data quality assessment and cleansing tool for spatial data that integrates the spatial data visualization and analysis capabilities of ArcGIS with the reasoning and inference capability of an expert system. We implemented the proposed framework in both stand-alone and web versions, using ArcGIS Engine and ArcGIS Server, respectively. We used the JESS expert system shell for the expert system part of GeoExpert. Using an expert system shell separates the application logic from the framework itself, which makes the framework easily updatable and domain independent. We applied GeoExpert to spatially referenced water quality data.
|