61

Identifying and Evaluating Early Stage Fintech Companies: Working with Consumer Internet Data and Analytic Tools

Shoop, Alexander 24 January 2018 (has links)
The purpose of this project is to work as an interdisciplinary team whose primary role is to mentor a team of WPI undergraduate students completing their Major Qualifying Project (MQP) in collaboration with Vestigo Ventures, LLC. ("Vestigo Ventures") and Cogo Labs. We worked closely with the project sponsors at Vestigo Ventures and Cogo Labs to understand each sponsor's goals and desires, and then translated those thoughts into actionable items and concrete deliverables to be completed by the undergraduate student team. As a graduate student team with a diverse set of educational backgrounds and a range of academic and professional experiences, we provided two primary functions throughout the duration of this project. The first function was to develop a roadmap for each individual project, with concrete steps, justification, goals and deliverables. The second function was to provide the undergraduate team with clarification and assistance throughout the implementation and completion of each project, as well as provide our opinions and thoughts on any proposed changes. The two teams worked together in lock-step in order to provide the project sponsors with a complete set of deliverables, with the undergraduate team primarily responsible for implementation and final delivery of each completed project.
62

Rättidiga beslut genererade från olika typer av dataanalyser : En fallstudie inom Landstinget i Värmland / Right-time Decisions Generated from Different types of Data Analysis : A Case Study Within the County Council of Värmland

Molin, Mattias January 2019 (has links)
The purpose of this bachelor's thesis is to identify and describe, through qualitative interviews, a county council's process for how patient and healthcare data, through different types of data analysis, can generate right-time decisions. Within the chosen qualitative approach, a semi-structured interview guide was created, based on an analysis model. Five interviews with employees of the case study organization were conducted. To assist the respondents and give them a clear overall view of the interview's content, the respondents were shown the analysis model before the interview. Descriptive analysis is the most common data analysis model used by the respondents, in the form of operational and production follow-up. The study also shows that a broad data basis is important for decision-making and that the County Council of Värmland is, overall, a very fact-based organization. The prerequisites for making right-time decisions generated from data analysis are considered to be knowing which questions are to be answered, having data quickly available, and performing analyses on that available data. But even if data is quickly available to decision-makers and analyses have been performed, this does not by itself produce right-time decisions. Data does not always give a correct or true picture, but must first be interpreted before decisions can be made. Interpretations of data can differ, which means that an additional person needs to examine the material, which in turn delays decisions. There is also considered to be a lack of a uniform way of working among the employees in the care systems. Right-time decisions exist at different levels within the county council. At the overall level, decisions are not urgent, but the more operationally the staff work, the more important fast decisions become. This shows that "right-time decisions" has different meanings depending on where in the organization the employees are. Healthcare is pressed for time, and it is not always possible to analyze the available data sufficiently.
63

Centralized and distributed learning methods for predictive health analytics

Brisimi, Theodora 02 November 2017 (has links)
The U.S. health care system is considered costly and highly inefficient, devoting substantial resources to the treatment of acute conditions in a hospital setting rather than focusing on prevention and keeping patients out of the hospital. The potential for cost savings is large; in the U.S. more than $30 billion is spent each year on hospitalizations deemed preventable, 31% of which is attributed to heart diseases and 20% to diabetes. Motivated by this, our work focuses on developing centralized and distributed learning methods to predict future heart- or diabetes-related hospitalizations based on patient Electronic Health Records (EHRs). We explore a variety of supervised classification methods and we present a novel likelihood-ratio-based method (K-LRT) that predicts hospitalizations and offers interpretability by identifying the K most significant features that lead to a positive prediction for each patient. Next, assuming that the positive class consists of multiple clusters (patients hospitalized for different reasons), while the negative class is drawn from a single cluster (non-hospitalized patients healthy in every aspect), we present an alternating optimization approach, which jointly discovers the clusters in the positive class and optimizes the classifiers that separate each positive cluster from the negative samples. We establish the convergence of the method and characterize its VC dimension. Last, we develop a decentralized cluster Primal-Dual Splitting (cPDS) method for large-scale problems that is computationally efficient and privacy-aware. Such a distributed learning scheme is relevant for multi-institutional collaborations or peer-to-peer applications, allowing the agents to collaborate while keeping every participant's data private. cPDS is proved to have an improved convergence rate compared to existing centralized and decentralized methods. We test all methods on real EHR data from the Boston Medical Center and compare results in terms of prediction accuracy and interpretability.
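A minimal sketch of the K-LRT idea (score a patient by the K strongest per-feature likelihood ratios, which also names the features driving the prediction); the Gaussian per-feature likelihoods and synthetic class statistics are illustrative assumptions, not the thesis's implementation:

```python
import numpy as np

def k_lrt_score(x, pos_mu, pos_var, neg_mu, neg_var, k=3):
    """Sum the K largest per-feature log-likelihood ratios
    (hospitalized vs. not), returning the score and the K features."""
    def loglik(x, mu, var):
        # Gaussian log-density of each feature, elementwise
        return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
    lr = loglik(x, pos_mu, pos_var) - loglik(x, neg_mu, neg_var)
    top_k = np.argsort(lr)[-k:][::-1]   # indices of the K strongest features
    return lr[top_k].sum(), top_k

# toy example: 5 features with class statistics estimated elsewhere
rng = np.random.default_rng(0)
pos_mu, neg_mu = rng.normal(1, 1, 5), rng.normal(0, 1, 5)
pos_var = neg_var = np.ones(5)
score, feats = k_lrt_score(rng.normal(1, 1, 5), pos_mu, pos_var, neg_mu, neg_var)
print(f"K-LRT score {score:.2f}, driven by features {feats}")
```

A positive prediction then follows from thresholding the score, with the returned feature indices serving as the per-patient explanation.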
64

Chromosome 3D Structure Modeling and New Approaches For General Statistical Inference

Rongrong Zhang (5930474) 03 January 2019 (has links)
This thesis consists of two separate topics: the use of piecewise helical models for the inference of 3D spatial organizations of chromosomes, and new approaches for general statistical inference. The recently developed Hi-C technology enables a genome-wide view of chromosome spatial organizations and has shed deep insights into genome structure and genome function. However, multiple sources of uncertainty make downstream data analysis and interpretation challenging. Specifically, statistical models for inferring three-dimensional (3D) chromosomal structure from Hi-C data are far from their maturity. Most existing methods are highly over-parameterized, lacking clear interpretations, and sensitive to outliers. We propose a parsimonious, easy-to-interpret, and robust piecewise helical curve model for the inference of 3D chromosomal structures from Hi-C data, for both individual topologically associated domains and whole chromosomes. When applied to a real Hi-C dataset, the piecewise helical model not only achieves much better model fitting than existing models, but also reveals that geometric properties of chromatin spatial organization are closely related to genome function.

For potential applications in big data analytics and machine learning, we propose to use deep neural networks to automate the Bayesian model selection and parameter estimation procedures. Two such frameworks are developed under different scenarios. First, we construct a deep neural network-based Bayes estimator for the parameters of a given model. The neural Bayes estimator mitigates the computational challenges faced by traditional approaches for computing Bayes estimators. When applied to generalized linear mixed models, the neural Bayes estimator outperforms existing methods implemented in R packages and SAS procedures. Second, we construct a deep convolutional neural network-based framework to perform simultaneous Bayesian model selection and parameter estimation. We refer to the neural networks for model selection and parameter estimation in the framework as the neural model selector and parameter estimator, respectively, which can be properly trained using labeled data systematically generated from candidate models. A simulation study shows that both the neural selector and estimator demonstrate excellent performance.

The theory of Conditional Inferential Models (CIMs) has been introduced to combine information for efficient inference in the Inferential Models framework for prior-free and yet valid probabilistic inference. While the general theory is subject to further development, the so-called regular CIMs are simple. We establish and prove a necessary and sufficient condition for the existence and identification of regular CIMs. More specifically, it is shown that for inference based on a sample from continuous distributions with unknown parameters, the corresponding CIM is regular if and only if the unknown parameters are generalized location and scale parameters indexing the transformations of an affine group.
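A minimal sketch of the kind of piecewise helical curve the model builds on; the parametrization and the rule for joining pieces are illustrative assumptions, not the thesis's fitting procedure:

```python
import numpy as np

def helix_piece(n_points, radius, pitch, t0=0.0):
    """Sample one helical piece: (r cos t, r sin t, pitch * t)."""
    t = np.linspace(t0, t0 + 2 * np.pi, n_points)
    return np.column_stack([radius * np.cos(t), radius * np.sin(t), pitch * t])

def piecewise_helix(params, n_points=50):
    """Concatenate helical pieces, translating each piece so the
    full curve stays continuous at the joins."""
    pieces, end = [], np.zeros(3)
    for radius, pitch in params:
        piece = helix_piece(n_points, radius, pitch)
        piece += end - piece[0]   # anchor this piece at the previous endpoint
        pieces.append(piece)
        end = piece[-1]
    return np.vstack(pieces)

# e.g. a chromosome-like curve from three pieces with different geometry,
# loosely one piece per topologically associated domain
curve = piecewise_helix([(1.0, 0.2), (0.5, 0.4), (1.5, 0.1)])
print(curve.shape)   # (150, 3) points in 3D
```

Fitting such a curve to Hi-C contact data then reduces to estimating a few parameters per piece, which is what makes the model parsimonious compared to per-point reconstructions.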
65

Security Analysis of Interdependent Critical Infrastructures: Power, Cyber and Gas

January 2018 (has links)
Our daily life is becoming more and more reliant on services provided by infrastructures such as power, gas, and communication networks. Ensuring the security of these infrastructures is of utmost importance. This task becomes ever more challenging as the interdependence among these infrastructures grows and a security breach in one infrastructure can spill over to the others. The implication is that the security practices/analysis recommended for these infrastructures should be done in coordination. This thesis, focusing on the power grid, explores strategies to secure the system that look into the coupling of the power grid to the cyber infrastructure, used to manage and control it, and to the gas grid, which supplies an increasing amount of reserves to overcome contingencies. The first part (Part I) of the thesis, comprising chapters 2 through 4, focuses on the coupling of the power and the cyber infrastructure that is used for its control and operations. The goal is to detect malicious attacks gaining information about the operation of the power grid to later attack the system. In chapter 2, we propose a hierarchical architecture that correlates the analysis of high-resolution micro-Phasor Measurement Unit (microPMU) data and traffic analysis on Supervisory Control and Data Acquisition (SCADA) packets to infer the security status of the grid and detect the presence of possible intruders. An essential part of this architecture is tied to the analysis of the microPMU data. In chapter 3 we establish a set of anomaly detection rules on microPMU data that flag "abnormal behavior". A placement strategy for microPMU sensors is also proposed to maximize the sensitivity in detecting anomalies. In chapter 4, we focus on developing rules that can localize the source of an event using microPMUs, to further check whether a cyber attack is causing the anomaly, by correlating SCADA traffic with the microPMU data analysis results. The thread that unifies the data analysis in this chapter is the fact that decisions are made without fully estimating the state of the system; on the contrary, decisions are made using a set of physical measurements that falls short by orders of magnitude of what is needed for observability. More specifically, in the first part of this chapter (sections 4.1-4.2), using microPMU data in the substation, methodologies for online identification of the source Thevenin parameters are presented. This methodology is used to identify reconnaissance activity on the normally-open switches in the substation, initiated by attackers to gauge their controllability over the cyber network. The application of this methodology to monitoring the voltage stability of the grid is also discussed. In the second part of this chapter (sections 4.3-4.5), we investigate the localization of faults. Since the number of PMU sensors available to carry out the inference is insufficient to ensure observability, the problem can be viewed as that of under-sampling a "graph signal"; the analysis leads to a PMU placement strategy that can achieve the highest resolution in localizing the fault for a given number of sensors. In both cases, the results of the analysis are leveraged in the detection of cyber-physical attacks, where microPMU data and relevant SCADA network traffic information are compared to determine if a network breach has affected the integrity of the system information and/or operations.
In the second part of this thesis (Part II), the security analysis considers the adequacy and reliability of schedules for the gas and power networks. Scheduling supply jointly in gas and power networks is motivated by the increasing reliance of power grids on natural gas generators (and, indirectly, on gas pipelines) as providers of critical reserves. Chapter 5 focuses on unveiling the challenges and providing solutions to this problem. / Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2018
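A minimal sketch of the flavor of anomaly detection rule described for microPMU streams in chapter 3, here a rolling z-score on voltage magnitude; the window length, threshold, and synthetic data are illustrative assumptions, not the thesis's rules:

```python
import numpy as np

def flag_anomalies(v_mag, window=120, z_thresh=4.0):
    """Flag samples that deviate strongly from a rolling baseline."""
    flags = np.zeros(len(v_mag), dtype=bool)
    for i in range(window, len(v_mag)):
        baseline = v_mag[i - window:i]
        mu, sigma = baseline.mean(), baseline.std()
        if sigma > 0 and abs(v_mag[i] - mu) > z_thresh * sigma:
            flags[i] = True
    return flags

# synthetic stream: nominal 1.0 p.u. with noise and an injected sag
rng = np.random.default_rng(1)
v = 1.0 + 0.002 * rng.standard_normal(600)
v[400:405] -= 0.05                       # brief voltage sag
print(np.nonzero(flag_anomalies(v))[0])  # indices around 400 get flagged
```

In the architecture described above, flags like these would be cross-checked against SCADA traffic to decide whether an anomaly reflects a physical event or a cyber intrusion.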
66

Utilization of automated location tracking for clinical workflow analytics and visualization

January 2018 (has links)
The analysis of clinical workflow offers many challenges to clinical stakeholders and researchers, especially in environments characterized by dynamic and concurrent processes. Workflow analysis in such environments is essential for monitoring performance and finding bottlenecks and sources of error. Clinical workflow analysis has been enhanced with the inclusion of modern technologies. One such intervention is automated location tracking, a system that detects the movement of clinicians and equipment. Utilizing the data produced from automated location tracking technologies can lead to the development of novel workflow analytics that can be used to complement more traditional approaches such as ethnography and grounded-theory-based qualitative methods. The goals of this research are to: (i) develop a series of analytic techniques to derive deeper workflow-related insight in an emergency department setting, (ii) overlay data from disparate sources (quantitative and qualitative) to develop strategies that facilitate workflow redesign, and (iii) incorporate visual analytics methods to improve the targeted visual feedback received by providers based on the findings. The overarching purpose is to create a framework that demonstrates the utility of automated location tracking data used in conjunction with clinical data like EHR logs, and its vital role in the future of clinical workflow analysis/analytics. This document is categorized based on two primary aims of the research. The first aim deals with the use of automated location tracking data to develop a novel methodological/exploratory framework for clinical workflow. The second aim is to overlay the quantitative data generated from the previous aim on data from qualitative observation and shadowing studies (mixed methods) to develop a deeper view of clinical workflow that can be used to facilitate workflow redesign. The final sections of the document speculate on the direction of this work, discussing the potential of this research in the creation of fully integrated clinical environments, i.e., environments with state-of-the-art location tracking and other data collection mechanisms. The main purpose of this research is to demonstrate ways by which clinical processes can be continuously monitored, allowing for proactive adaptations in the face of technological and process changes to minimize any negative impact on the quality of patient care and provider satisfaction. / Dissertation/Thesis / Doctoral Dissertation Biomedical Informatics 2018
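A minimal sketch of one such location-tracking analytic: per-clinician dwell time per location, derived from entry events. The event schema, identifiers, and timestamps are hypothetical, not the system described in the thesis:

```python
from collections import defaultdict
from datetime import datetime

# hypothetical location-tracking event log: (clinician, location, entry time)
events = [
    ("RN-1", "Triage",  "2018-03-01 08:00"),
    ("RN-1", "Bay-3",   "2018-03-01 08:12"),
    ("MD-2", "Bay-3",   "2018-03-01 08:15"),
    ("RN-1", "Station", "2018-03-01 08:30"),
]

def dwell_times(events):
    """Minutes each clinician spends in each location, measured
    between consecutive sensor reads for that clinician."""
    by_person = defaultdict(list)
    for who, room, ts in events:
        by_person[who].append((room, datetime.strptime(ts, "%Y-%m-%d %H:%M")))
    totals = defaultdict(float)
    for who, visits in by_person.items():
        for (room, t0), (_, t1) in zip(visits, visits[1:]):
            totals[(who, room)] += (t1 - t0).total_seconds() / 60
    return dict(totals)

print(dwell_times(events))
# {('RN-1', 'Triage'): 12.0, ('RN-1', 'Bay-3'): 18.0}
```

Metrics like these are the kind of quantitative layer that can be overlaid on EHR log timestamps and qualitative shadowing notes to locate bottlenecks.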
67

Fast demand response with datacenter loads: a green dimension of big data

McClurg, Josiah 01 August 2017 (has links)
Demand response is one of the critical technologies necessary for allowing large-scale penetration of intermittent renewable energy sources in the electric grid. Data centers are especially attractive candidates for providing flexible, real-time demand response services to the grid because they are capable of fast power ramp-rates, large dynamic range, and finely-controllable power consumption. This thesis makes a contribution toward implementing load shaping with server clusters through a detailed experimental investigation of three broadly-applicable datacenter workload scenarios. We experimentally demonstrate the eminent feasibility of datacenter demand response with a distributed video transcoding application and a simple distributed power controller. We also show that while some software power capping interfaces performed better than others, all the interfaces we investigated had the high dynamic range and low power variance required to achieve high quality power tracking. Our next investigation presents an empirical performance evaluation of algorithms that replace arithmetic operations with low-level bit operations for power-aware Big Data processing. Specifically, we compare two different data structures in terms of execution time and power efficiency: (a) a baseline design using arrays, and (b) a design using bit-slice indexing (BSI) and distributed BSI arithmetic. Across three different datasets and three popular queries, we show that the bit-slicing queries consistently outperform the array algorithm in both power efficiency and execution time. In the context of datacenter power shaping, this performance optimization enables additional power flexibility -- achieving the same or greater performance than the baseline approach, even under power constraints. The investigation of read-optimized index queries leads up to an experimental investigation of the tradeoffs among power constraint, query freshness, and update aggregation size in a dynamic big data environment. We compare several update strategies, presenting a bitmap update optimization that allows improved performance over both a baseline approach and an existing state-of-the-art update strategy. Performing this investigation in the context of load shaping, we show that read-only range queries can be served without performance impact under power cap, and index updates can be tuned to provide a flexible base load. This thesis concludes with a brief discussion of control implementation and summary of our findings.
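A minimal sketch of bit-slice indexing (BSI) for one aggregate query, replacing per-row arithmetic with popcounts over bit slices; the pure-Python bitmaps here stand in for the distributed BSI arithmetic investigated in the thesis:

```python
def to_bit_slices(values, bits=8):
    """Encode a column of small ints as `bits` bitmaps: slice b has
    row i set iff bit b of values[i] is 1."""
    slices = [0] * bits
    for i, v in enumerate(values):
        for b in range(bits):
            if (v >> b) & 1:
                slices[b] |= 1 << i
    return slices

def bsi_sum(slices):
    """SUM over the column using only popcounts on the slices."""
    return sum(bin(s).count("1") << b for b, s in enumerate(slices))

values = [5, 9, 12, 7, 3]
slices = to_bit_slices(values)
assert bsi_sum(slices) == sum(values)  # 36, with no per-row additions
print(bsi_sum(slices))
```

Representations like this shift work from per-row arithmetic to bulk bitwise operations, which is the lever the power-aware comparison in the thesis examines.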
68

RETAIL DATA ANALYTICS USING GRAPH DATABASE

Priya, Rashmi 01 January 2018 (has links)
Big data is an area focused on storing, processing and visualizing huge amounts of data. Today data is growing faster than ever before. We need to find the right tools and applications and build an environment that can help us to obtain valuable insights from the data. Retail is one of the domains that collects huge amounts of transaction data every day. Retailers need to understand their customers' purchasing patterns and behavior in order to make better business decisions. Market basket analysis is a field in data mining focused on discovering patterns in retail transaction data. Our goal is to find tools and applications that can be used by retailers to quickly understand their data and make better business decisions. Due to the amount and complexity of the data, it is not possible to do such activities manually. Trends change very quickly, and retailers want to be quick in adapting to the change and taking action. This requires automating processes and using algorithms that are efficient and fast. In our work, we mine transaction data by modeling the data as graphs. We use clustering algorithms to discover communities (clusters) in the data and then use the clusters for building a recommendation system that can recommend products to customers based on their buying behavior.
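A minimal sketch of the pipeline described: model co-purchases as a graph, discover communities, and recommend within a customer's community. The toy graph and the naive label-propagation step are illustrative stand-ins for real transaction data and the clustering algorithms used in the thesis:

```python
import random
from collections import Counter, defaultdict

# toy co-purchase graph: an edge joins products bought together
edges = [("bread", "butter"), ("bread", "jam"), ("butter", "jam"),
         ("jam", "honey"), ("phone", "case"), ("phone", "charger"),
         ("case", "charger")]
graph = defaultdict(set)
for a, b in edges:
    graph[a].add(b)
    graph[b].add(a)

def label_propagation(graph, iters=10, seed=42):
    """Naive community detection: nodes repeatedly adopt the most
    common label among their neighbours."""
    rng = random.Random(seed)
    labels = {n: n for n in graph}
    for _ in range(iters):
        nodes = list(graph)
        rng.shuffle(nodes)
        for n in nodes:
            labels[n] = Counter(labels[m] for m in graph[n]).most_common(1)[0][0]
    return labels

def recommend(product, labels, graph):
    """Suggest same-community products not already bought alongside."""
    community = {n for n, l in labels.items() if l == labels[product]}
    return sorted(community - graph[product] - {product})

labels = label_propagation(graph)
print(recommend("bread", labels, graph))  # typically ['honey']
```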
69

Shared and distributed memory parallel algorithms to solve big data problems in biological, social network and spatial domain applications

Sharma, Rahil 01 December 2016 (has links)
Big data refers to information which cannot be processed and analyzed using traditional approaches and tools, due to the 4 V's: sheer Volume, the Velocity at which data is received and processed, and data Variety and Veracity. Today massive volumes of data originate in domains such as geospatial analysis, biological and social networks, etc. Hence, designing scalable algorithms for efficient processing of this massive data is a significant challenge in the field of computer science. One way to achieve such efficient and scalable algorithms is by using shared- and distributed-memory parallel programming models. In this thesis, we present a variety of such algorithms to solve problems in the above-mentioned domains. We solve five problems that fall into two categories. The first group of problems deals with the issue of community detection. Detecting communities in real world networks is of great importance because they consist of patterns that can be viewed as independent components, each of which has distinct features and can be detected based upon network structure. For example, communities in social networks can help target users for marketing purposes, provide user recommendations to connect with and join communities or forums, etc. We develop a novel sequential algorithm to accurately detect community structures in biological protein-protein interaction networks, where a community corresponds to a functional module of proteins. Generally, such sequential algorithms are computationally expensive, which makes them impractical to use for large real world networks. To address this limitation, we develop a new highly scalable Symmetric Multiprocessing (SMP) based parallel algorithm to detect high quality communities in large subsections of social networks like Facebook and Amazon. Due to the SMP architecture, however, our algorithm cannot process networks whose size is greater than the size of the RAM of a single machine. With the increasing size of social networks, community detection has become even more difficult, since network size can reach up to hundreds of millions of vertices and edges. Processing such massive networks requires several hundred gigabytes of RAM, which is only possible by adopting distributed infrastructure. To address this, we develop a novel hybrid (shared + distributed memory) parallel algorithm to efficiently detect high quality communities in massive Twitter and .uk domain networks. The second group of problems deals with the issue of efficiently processing spatial Light Detection and Ranging (LiDAR) data. LiDAR data is widely used in forest and agricultural crop studies, landscape classification, 3D urban modeling, etc. Technological advancements in building LiDAR sensors have enabled highly accurate and dense LiDAR point clouds resulting in massive data volumes, which pose computing issues with processing and storage. We develop the first published landscape-driven data reduction algorithm, which uses the slope-map of the terrain as a filter to reduce the data without sacrificing its accuracy. Our algorithm is highly scalable and adopts a shared-memory-based parallel architecture. We also develop a parallel interpolation technique that is used to generate highly accurate continuous terrains, i.e. Digital Elevation Models (DEMs), from discrete LiDAR point clouds.
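A minimal sketch of slope-driven LiDAR thinning: keep every point in steep grid cells and only a sparse sample in flat ones. The cell-wise elevation-range slope proxy and the thresholds are illustrative assumptions, not the published algorithm:

```python
from collections import defaultdict
import numpy as np

def slope_filter(points, cell=1.0, slope_thresh=0.15, keep_flat=0.1, seed=0):
    """Thin an (N, 3) point cloud using a slope map as the filter."""
    rng = np.random.default_rng(seed)
    ij = np.floor(points[:, :2] / cell).astype(int)  # 2D grid bin per point
    cells = defaultdict(list)
    for idx, key in enumerate(map(tuple, ij)):
        cells[key].append(idx)
    keep = np.zeros(len(points), dtype=bool)
    for idxs in cells.values():
        z = points[idxs, 2]
        slope = (z.max() - z.min()) / cell   # crude slope proxy per cell
        if slope > slope_thresh:
            keep[idxs] = True                # steep terrain: keep all points
        else:
            chosen = rng.random(len(idxs)) < keep_flat
            keep[np.array(idxs)[chosen]] = True  # flat terrain: sparse sample
    return points[keep]

# synthetic cloud: a nearly flat plain plus a ridge along x = 5
rng = np.random.default_rng(2)
pts = rng.uniform(0, 10, (5000, 3))
pts[:, 2] *= 0.01
ridge = np.abs(pts[:, 0] - 5) < 0.5
pts[ridge, 2] += 2 * (0.5 - np.abs(pts[ridge, 0] - 5))
print(len(slope_filter(pts)), "of", len(pts), "points kept")
```

Each cell is processed independently, so a loop like this parallelizes naturally over threads, which is the property a shared-memory implementation of such a filter exploits.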
70

Komplexe Datenanalyseprozesse in serviceorientierten Umgebungen / Complex Data Analysis Processes in Service-Oriented Environments

Habich, Dirk 24 January 2009 (has links) (PDF)
This dissertation deals with the embedding of complex data analysis processes in service-oriented environments. The discussion begins with a concrete application domain in which such analysis processes play a decisive role in knowledge discovery and without whose help no progress can be made. In the second part, concrete complex data analysis processes are developed, which form the starting point for examining their embedding in a service-oriented environment. This embedding is finally addressed in the third part of the dissertation, and corresponding extensions to the technologies of the best-known realization form are presented. The evaluation shows that this new form is substantially better suited to complex data analysis processes than the previous variant.
