Global ETD Search

211	Online aggregate tables : A method for implementing big data analysis in PostgreSQL using real time pre-calculations / Realtidsaggregerade tabeller : En metod för analys av stora datamängder i PostgreSQL med hjälp av realtidsuppdaterade förberäkningar Bergmark, Fabian January 2017 (has links) In modern user-centric applications, data gathering and analysis are often of vital importance. Current trends in data management software show that traditional relational databases fail to keep up with the growing data sets. Outsourcing data analysis often means data is locked in with a particular service, making transitions between analysis systems nearly impossible. This thesis implements and evaluates a data analysis framework built entirely within a relational database. The framework provides a structure for implementations of online algorithms of analytical methods to store precomputed results. The result is an even resource utilization with predictable performance that does not decrease over time. The system keeps all raw data gathered to allow for future exportation. A full implementation of the framework is tested based on the current analysis requirements of the company Shortcut Labs, and performance measurements show no problem with managing data sets of over a billion data points. / In modern user-centric applications, the collection and analysis of data is often of business-critical importance. Traditional relational databases struggle to handle the growing volumes of data. At the same time, using external services for data analysis often leads to data lock-in, which makes it harder to switch analysis services. This report presents and evaluates a framework for data analysis that is implemented in a relational database. The framework provides structures for precomputing the results of analytical calculations efficiently. The result is even resource usage with predictable performance that does not degrade over time. The framework also stores all collected data, which makes export possible. The framework is evaluated at the company Shortcut Labs, and the results show that it handles data sets of over a billion points. big data aggregation real-time PostgreSQL Computer Sciences Datavetenskap (datalogi)
212	Measuring Racial Animus and Its Consequences: Incorporating Big Data into Criminology Rubenstein, Batya 23 August 2022 (has links) No description available. Criminology racism race relations measurement Big Data Google Trends
213	Cascading permissions policy model for token-based access control in the web of things Amir, Mohammad, Pillai, Prashant, Hu, Yim Fun January 2014 (has links) The merger of the Internet of Things (IoT) with cloud computing has given birth to a Web of Things (WoT) which hosts heterogeneous and rapidly varying data. Traditional access control mechanisms such as Role-Based Access schemes are no longer suitable for modelling access control on such a large and dynamic scale as the actors may also change all the time. For such a dynamic mix of applications, data and actors, a more distributed and flexible model is required. Token-Based Access Control is one such scheme which can easily model and comfortably handle interactions with big data in the cloud and enable provisioning of access to fine levels of granularity. However, simple token access models quickly become hard to manage in the face of a rapidly growing repository. This paper proposes a novel token access model based on a cascading permissions policy model which can easily control interactivity with big data without becoming a menace to manage and administer.
214	Applications of big data approaches to topics in infectious disease epidemiology Benedum, Corey Michael 04 June 2019 (has links) The availability of big data (i.e., a large number of observations and variables per observation) and advancements in statistical methods present numerous exciting opportunities and challenges in infectious disease epidemiology. The studies in this dissertation address questions regarding the epidemiology of dengue and sepsis by applying big data and traditional epidemiologic approaches. In doing so, we aim to advance our understanding of both diseases and to critically evaluate traditional and novel methods to understand how these approaches can be leveraged to improve epidemiologic research. In the first study, we examined the ability of machine learning and regression modeling approaches to predict dengue occurrence in three endemic locations. When we utilized models with historical surveillance, population, and weather data, machine learning models predicted weekly case counts more accurately than regression models. When we removed surveillance data, regression models were more accurate. Furthermore, machine learning models were able to accurately forecast the onset and duration of dengue outbreaks up to 12 weeks in advance without using surveillance data. This study highlighted potential benefits that machine learning models could bring to a dengue early warning system. The second study utilized machine learning approaches to identify the rainfall conditions that lead to mosquito larvae being washed away from breeding sites in roadside storm drains in Singapore. We then used conventional epidemiologic approaches to evaluate how the occurrence of these washout events affects dengue occurrence in subsequent weeks. This study demonstrated an inverse relationship between washout events and dengue outbreak risk. 
The third study compared algorithmic-based and conventional epidemiologic approaches used to evaluate variables for statistical adjustment. We used these approaches to identify what variables to adjust for when estimating the effect of autoimmune disease on 30-day mortality among ICU patients with sepsis. In this study, autoimmune disease presence was associated with an approximate 10-20% reduction in mortality risk. Risk estimates identified with algorithmic-based approaches were compatible with conventional approaches and did not differ by more than 9%. This study revealed that algorithmic-based approaches can approximate conventional selection methods, and may be useful when the appropriate set of variables to adjust for is unknown. Epidemiology Big data Dengue Epidemiology Machine learning Sepsis
215	Big Data Phylogenomics: Methods and Applications Sharma, Sudip, 0000-0002-0469-1211 08 1900 (has links) Phylogenomics, the study of genome-scale data containing many genes and species, has advanced our understanding of patterns of evolutionary relationships and processes throughout the Tree of Life. Recent research studies frequently use such large-scale datasets with the expectation of recovering historical species relationships with high statistical confidence. At the same time, the computational complexity and resource requirements for analyzing such large-scale data increase with the number of genomic loci and sites. Therefore, different crucial steps of phylogenomic studies, like model selection and estimating bootstrap confidence limits on inferred phylogenetic trees, are often not feasible on regular desktop computers and generally time-consuming on high-performance computing systems. Moreover, increasing the number of genes in the data increases the chance of including genomic loci that may cause biased and fragile species relationships that spuriously receive high statistical support. Such data errors in phylogenomic datasets are major impediments to building a robust tree of life. Contemporary approaches to detect such data errors require alternative tree hypotheses for the fragile clades, which may be unavailable a priori or too numerous to evaluate. In addition, finding causal genomic loci under these contemporary statistical frameworks is also computationally expensive, and the cost increases with the number of alternatives to be compared. In my Ph.D. dissertation, I have pursued three major research projects: (1) Introduction and advancement of the bag of little bootstraps approach for placing the confidence limits on species relationships from genome-scale phylogenetic trees. (2) Development of a novel site-subsampling approach to select the best-fit substitution model for genome-scale phylogenomic datasets. 
Both of these approaches analyze data subsamples containing a small fraction of sites from the full phylogenomic alignment. Before analysis, sites in a subsample are repeatedly chosen randomly to build a new alignment that contains as many sites as the original dataset, which is shown to retain the statistical properties of the full dataset. Analyses of simulated and empirical datasets showed that these approaches are fast and require a minuscule amount of computer memory while retaining accuracy similar to that achieved by full dataset analysis. (3) Development of a supervised machine learning approach based on the Evolutionary Sparse Learning framework for detecting fragile clades and associated gene-species combinations. This approach first builds a genetic model for a monophyletic clade of interest, clade probability for the clade, and gene-species concordance scores. The clade model and these novel matrices expose fragile clades and highly influential as well as disruptive gene-species candidates underlying the fragile clades. The efficiency and usefulness of this approach are demonstrated by analyzing a set of simulated and empirical datasets and comparing its performance with the state-of-the-art approaches. Furthermore, I have actively contributed to research projects exploring applications of these newly developed approaches. / Biology Biology Big data Machine learning Molecular evolution Phylogenetics Phylogenomics
216	Benchmarking Performance for Migrating a Relational Application to a Parallel Implementation Gadiraju, Krishna Karthik 13 October 2014 (has links) No description available. Computer Science Hive Hadoop benchmarking big data SQL queries
217	Predicting Diffusion of Contagious Diseases Using Social Media Big Data Elkin, Lauren S. 06 February 2015 (has links) No description available. Computer Science Information Science social media big data influenza diffusion
218	Conditional Correlation Analysis Bhatta, Sanjeev 05 June 2017 (has links) No description available. Computer Science subpopulation conditional correlation big data patterns unusualness
219	Using the Architectural Tradeoff Analysis Method to Evaluate the Software Architecture of a Semantic Search Engine: A Case Study Chatra Raveesh, Sandeep January 2013 (has links) No description available. Computer Science Semantic Hadoop Big data Search engine
220	Performance Characterization and Improvements of SQL-On-Hadoop Systems Kulkarni, Kunal Vikas 28 December 2016 (has links) No description available. Computer Science Hadoop SQL Impala Hive Big Data Joins HDFS

Search results