61

Identifying and Evaluating Early Stage Fintech Companies: Working with Consumer Internet Data and Analytic Tools

Shoop, Alexander 24 January 2018 (has links)
The purpose of this project is to work as an interdisciplinary team whose primary role is to mentor a team of WPI undergraduate students completing their Major Qualifying Project (MQP) in collaboration with Vestigo Ventures, LLC ("Vestigo Ventures") and Cogo Labs. We worked closely with the project sponsors at Vestigo Ventures and Cogo Labs to understand each sponsor's goals and desires, and then translated those thoughts into actionable items and concrete deliverables to be completed by the undergraduate student team. As a graduate student team with a diverse set of educational backgrounds and a range of academic and professional experiences, we provided two primary functions throughout the duration of this project. The first function was to develop a roadmap for each individual project, with concrete steps, justification, goals, and deliverables. The second function was to provide the undergraduate team with clarification and assistance throughout the implementation and completion of each project, as well as to provide our opinions and thoughts on any proposed changes. The two teams worked together in lock-step to provide the project sponsors with a complete set of deliverables, with the undergraduate team primarily responsible for the implementation and final delivery of each completed project.
62

Rättidiga beslut genererade från olika typer av dataanalyser : En fallstudie inom Landstinget i Värmland / Right-time Decisions Generated from Different types of Data Analysis : A Case Study Within the County Council of Värmland

Molin, Mattias January 2019 (has links)
The purpose of this bachelor's thesis is to identify and describe, through qualitative interviews, a county council's process for how patient and healthcare data, through different types of data analysis, can generate right-time decisions. Within the chosen qualitative approach, a semi-structured interview guide was created, based on an analysis model. Five interviews were conducted with employees of the case study organization. To make the interviews easier for the respondents and to give a clear overall view of their content, the respondents were shown the analysis model before the interview. Descriptive analysis is the data analysis model most commonly used by the respondents, in the form of operational and production follow-up. The study also shows that a broad data basis is important for decision-making and that the County Council of Värmland is, overall, a highly fact-based organization. The prerequisites for making right-time decisions generated from data analyses are considered to be knowing which questions are to be answered, having data quickly available, and performing analyses on that available data. However, even if data is quickly available to decision-makers and analyses have been performed, this does not in itself yield right-time decisions. Data does not always give a correct or true picture; it first needs to be interpreted before decisions can be made. Interpretations of data can differ, which means an additional person may need to examine the material, which in turn postpones the decisions. There is also considered to be a lack of a uniform way of working in the care systems among the employees. Right-time decisions exist at different levels within the county council. At the overall level, decisions are not urgent, but the more operationally the staff work, the more important quick decisions become. This shows that "right-time decisions" means different things depending on where in the organization the employees work. Healthcare is pressed for time, and there is not always an opportunity to analyze the available data sufficiently.
63

Chromosome 3D Structure Modeling and New Approaches For General Statistical Inference

Rongrong Zhang (5930474) 03 January 2019 (has links)
This thesis consists of two separate topics: the use of piecewise helical models for the inference of 3D spatial organizations of chromosomes, and new approaches for general statistical inference. The recently developed Hi-C technology enables a genome-wide view of chromosome spatial organizations, and has shed deep insights into genome structure and genome function. However, multiple sources of uncertainty make downstream data analysis and interpretation challenging. Specifically, statistical models for inferring three-dimensional (3D) chromosomal structure from Hi-C data are far from their maturity. Most existing methods are highly over-parameterized, lacking clear interpretations, and sensitive to outliers. We propose a parsimonious, easy-to-interpret, and robust piecewise helical curve model for the inference of 3D chromosomal structures from Hi-C data, for both individual topologically associated domains and whole chromosomes. When applied to a real Hi-C dataset, the piecewise helical model not only achieves much better model fitting than existing models, but also reveals that geometric properties of chromatin spatial organization are closely related to genome function.

For potential applications in big data analytics and machine learning, we propose to use deep neural networks to automate the Bayesian model selection and parameter estimation procedures. Two such frameworks are developed under different scenarios. First, we construct a deep neural network-based Bayes estimator for the parameters of a given model. The neural Bayes estimator mitigates the computational challenges faced by traditional approaches for computing Bayes estimators. When applied to generalized linear mixed models, the neural Bayes estimator outperforms existing methods implemented in R packages and SAS procedures. Second, we construct a deep convolutional neural network-based framework to perform simultaneous Bayesian model selection and parameter estimation. We refer to the neural networks for model selection and parameter estimation in the framework as the neural model selector and parameter estimator, respectively, which can be properly trained using labeled data systematically generated from candidate models. Simulation studies show that both the neural selector and estimator demonstrate excellent performance.

The theory of Conditional Inferential Models (CIMs) has been introduced to combine information for efficient inference in the Inferential Models framework for prior-free and yet valid probabilistic inference. While the general theory is subject to further development, the so-called regular CIMs are simple. We establish and prove a necessary and sufficient condition for the existence and identification of regular CIMs. More specifically, it is shown that for inference based on a sample from continuous distributions with unknown parameters, the corresponding CIM is regular if and only if the unknown parameters are generalized location and scale parameters, indexing the transformations of an affine group.
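To make the piecewise helical idea concrete, here is a minimal sketch: each segment is a helix defined by a radius, pitch, and number of turns, and segments are chained end-to-end for positional continuity. The parameter names and the chaining rule are illustrative assumptions, not the thesis's fitted model, which is estimated from Hi-C contact data.

```python
# A minimal sketch of a piecewise helical curve, not the thesis's fitted model:
# each segment is a helix with an assumed (radius, pitch, turns) triple,
# chained end-to-end for positional continuity.
import numpy as np

def helix_segment(radius, pitch, turns, n_points, start):
    """Generate n_points along one helical segment beginning at `start`."""
    t = np.linspace(0.0, 2.0 * np.pi * turns, n_points)
    seg = np.column_stack([radius * np.cos(t),
                           radius * np.sin(t),
                           pitch * t / (2.0 * np.pi)])
    return seg - seg[0] + start  # translate so the segment starts at `start`

def piecewise_helix(params, n_points=50):
    """Chain helical segments; `params` is a list of (radius, pitch, turns)."""
    curve, start = [], np.zeros(3)
    for radius, pitch, turns in params:
        seg = helix_segment(radius, pitch, turns, n_points, start)
        curve.append(seg)
        start = seg[-1]  # next segment begins where this one ends
    return np.vstack(curve)

# Example: a curve built from three helical pieces, e.g. one per domain.
curve = piecewise_helix([(1.0, 0.4, 2.0), (0.6, 0.8, 3.0), (1.2, 0.3, 1.0)])
print(curve.shape)  # (150, 3)
```

A fitting procedure would then choose each segment's parameters so that inter-locus distances along the curve agree with the distances implied by Hi-C contact frequencies.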
64

Security Analysis of Interdependent Critical Infrastructures: Power, Cyber and Gas

January 2018 (has links)
abstract: Our daily life is becoming more and more reliant on the services provided by infrastructures such as the power, gas, and communication networks. Ensuring the security of these infrastructures is of utmost importance. This task becomes ever more challenging as the interdependence among these infrastructures grows and a security breach in one infrastructure can spill over to the others. The implication is that the security practices/analysis recommended for these infrastructures should be done in coordination. This thesis, focusing on the power grid, explores strategies to secure the system that look into the coupling of the power grid to the cyber infrastructure, used to manage and control it, and to the gas grid, which supplies an increasing amount of reserves to overcome contingencies. The first part (Part I) of the thesis, including chapters 2 through 4, focuses on the coupling of the power and the cyber infrastructure that is used for its control and operations. The goal is to detect malicious attacks that gain information about the operation of the power grid in order to later attack the system. In chapter 2, we propose a hierarchical architecture that correlates the analysis of high-resolution micro-Phasor Measurement Unit (microPMU) data and traffic analysis on the Supervisory Control and Data Acquisition (SCADA) packets, to infer the security status of the grid and detect the presence of possible intruders. An essential part of this architecture is tied to the analysis of the microPMU data. In chapter 3, we establish a set of anomaly detection rules on microPMU data that flag "abnormal behavior". A placement strategy for microPMU sensors is also proposed to maximize the sensitivity in detecting anomalies. In chapter 4, we focus on developing rules that can localize the source of an event using microPMU data, to further check whether a cyber attack is causing the anomaly, by correlating SCADA traffic with the microPMU data analysis results. The thread that unifies the data analysis in this chapter is the fact that decisions are made without fully estimating the state of the system; on the contrary, decisions are made using a set of physical measurements that falls short, by orders of magnitude, of meeting the needs for observability. More specifically, in the first part of this chapter (sections 4.1-4.2), using microPMU data in the substation, methodologies for online identification of the source Thevenin parameters are presented. This methodology is used to identify reconnaissance activity on the normally-open switches in the substation, initiated by attackers to gauge their controllability over the cyber network. The application of this methodology to monitoring the voltage stability of the grid is also discussed. In the second part of this chapter (sections 4.3-4.5), we investigate the localization of faults. Since the number of PMU sensors available to carry out the inference is insufficient to ensure observability, the problem can be viewed as that of under-sampling a "graph signal"; the analysis leads to a PMU placement strategy that can achieve the highest resolution in localizing the fault, for a given number of sensors. In both cases, the results of the analysis are leveraged in the detection of cyber-physical attacks, where microPMU data and relevant SCADA network traffic information are compared to determine if a network breach has affected the integrity of the system information and/or operations.
In the second part of this thesis (Part II), the security analysis considers the adequacy and reliability of schedules for the gas and power networks. Scheduling supply jointly in gas and power networks is motivated by the increasing reliance of power grids on natural gas generators (and, indirectly, on gas pipelines) as providers of critical reserves. Chapter 5 focuses on unveiling the challenges of this problem and providing solutions to it. / Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2018
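The online Thevenin identification mentioned for chapter 4 can be illustrated with a hedged sketch: assuming the textbook source model V_k = E_th - Z_th * I_k at a monitored bus, a handful of phasor snapshots yields a least-squares estimate of the Thevenin pair. The estimator below is an illustrative assumption, not the thesis's exact microPMU-based method.

```python
# A hedged sketch of online Thevenin identification, assuming the textbook
# source model V_k = E_th - Z_th * I_k at the monitored bus; the thesis's
# actual microPMU-based estimator may differ.
import numpy as np

def estimate_thevenin(V, I):
    """Least-squares estimate of (E_th, Z_th) from complex phasor snapshots."""
    A = np.column_stack([np.ones_like(I), -I])  # rows: [1, -I_k]
    x, *_ = np.linalg.lstsq(A, V, rcond=None)   # solves A @ [E_th, Z_th] ~= V
    return x[0], x[1]

# Synthetic check: a source behind an impedance, observed under varying load.
rng = np.random.default_rng(0)
E_true, Z_true = 1.02 * np.exp(1j * 0.1), 0.02 + 0.15j
I = rng.uniform(0.5, 1.5, 20) * np.exp(1j * rng.uniform(-0.3, 0.3, 20))
noise = 1e-4 * (rng.standard_normal(20) + 1j * rng.standard_normal(20))
V = E_true - Z_true * I + noise
print(estimate_thevenin(V, I))  # approximately (E_true, Z_true)
```

An abrupt jump in the estimated Z_th between successive windows is the kind of signature that could flag a topology change, such as a normally-open switch being exercised during reconnaissance.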
65

Utilization of automated location tracking for clinical workflow analytics and visualization

January 2018 (has links)
abstract: The analysis of clinical workflow offers many challenges to clinical stakeholders and researchers, especially in environments characterized by dynamic and concurrent processes. Workflow analysis in such environments is essential for monitoring performance and finding bottlenecks and sources of error. Clinical workflow analysis has been enhanced by the inclusion of modern technologies. One such intervention is automated location tracking, a system that detects the movement of clinicians and equipment. Utilizing the data produced by automated location tracking technologies can lead to the development of novel workflow analytics that complement more traditional approaches such as ethnography and grounded-theory-based qualitative methods. The goals of this research are to: (i) develop a series of analytic techniques to derive deeper workflow-related insight in an emergency department setting, (ii) overlay data from disparate sources (quantitative and qualitative) to develop strategies that facilitate workflow redesign, and (iii) incorporate visual analytics methods to improve the targeted visual feedback received by providers based on the findings. The overarching purpose is to create a framework that demonstrates the utility of automated location tracking data used in conjunction with clinical data, such as EHR logs, and its vital role in the future of clinical workflow analysis/analytics. This document is organized around the two primary aims of the research. The first aim deals with the use of automated location tracking data to develop a novel methodological/exploratory framework for clinical workflow. The second aim is to overlay the quantitative data generated from the previous aim on data from qualitative observation and shadowing studies (mixed methods) to develop a deeper view of clinical workflow that can be used to facilitate workflow redesign. The final sections of the document speculate on the direction of this work, discussing the potential of this research in the creation of fully integrated clinical environments, i.e., environments with state-of-the-art location tracking and other data collection mechanisms. The main purpose of this research is to demonstrate ways by which clinical processes can be continuously monitored, allowing for proactive adaptations in the face of technological and process changes to minimize any negative impact on the quality of patient care and provider satisfaction. / Dissertation/Thesis / Doctoral Dissertation Biomedical Informatics 2018
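As one concrete example of the kind of analytic such tracking data supports, the sketch below computes per-clinician dwell time by location from a stream of badge reads; the event format and field names are assumptions for illustration, not the system described in the thesis.

```python
# A sketch of one analytic derivable from location tracking: per-clinician
# dwell time by location, computed from ordered (who, where, when) badge
# reads. The event format and field names are assumptions for illustration.
from collections import defaultdict
from datetime import datetime

events = [
    ("rn_01", "triage",  datetime(2018, 3, 1, 8, 0)),
    ("rn_01", "bay_3",   datetime(2018, 3, 1, 8, 12)),
    ("rn_01", "station", datetime(2018, 3, 1, 8, 40)),
]

def dwell_minutes(events):
    """Attribute the time between consecutive reads to the earlier location."""
    last, totals = {}, defaultdict(float)
    for who, where, when in events:
        if who in last:
            prev_where, prev_when = last[who]
            totals[(who, prev_where)] += (when - prev_when).total_seconds() / 60
        last[who] = (where, when)
    return dict(totals)

print(dwell_minutes(events))
# {('rn_01', 'triage'): 12.0, ('rn_01', 'bay_3'): 28.0}
```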
66

Fast demand response with datacenter loads: a green dimension of big data

McClurg, Josiah 01 August 2017 (has links)
Demand response is one of the critical technologies necessary for allowing large-scale penetration of intermittent renewable energy sources in the electric grid. Data centers are especially attractive candidates for providing flexible, real-time demand response services to the grid because they are capable of fast power ramp rates, large dynamic range, and finely controllable power consumption. This thesis makes a contribution toward implementing load shaping with server clusters through a detailed experimental investigation of three broadly applicable datacenter workload scenarios. We experimentally demonstrate the eminent feasibility of datacenter demand response with a distributed video transcoding application and a simple distributed power controller. We also show that while some software power capping interfaces performed better than others, all the interfaces we investigated had the high dynamic range and low power variance required to achieve high-quality power tracking. Our next investigation presents an empirical performance evaluation of algorithms that replace arithmetic operations with low-level bit operations for power-aware Big Data processing. Specifically, we compare two different data structures in terms of execution time and power efficiency: (a) a baseline design using arrays, and (b) a design using bit-slice indexing (BSI) and distributed BSI arithmetic. Across three different datasets and three popular queries, we show that the bit-slicing queries consistently outperform the array algorithm in both power efficiency and execution time. In the context of datacenter power shaping, this performance optimization enables additional power flexibility, achieving the same or greater performance than the baseline approach even under power constraints. The investigation of read-optimized index queries leads up to an experimental investigation of the tradeoffs among power constraints, query freshness, and update aggregation size in a dynamic big data environment. We compare several update strategies, presenting a bitmap update optimization that improves performance over both a baseline approach and an existing state-of-the-art update strategy. Performing this investigation in the context of load shaping, we show that read-only range queries can be served without performance impact under a power cap, and that index updates can be tuned to provide a flexible base load. This thesis concludes with a brief discussion of control implementation and a summary of our findings.
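To illustrate the bit-slice indexing idea, here is a minimal single-machine sketch: the column is stored as one bitmap per bit position, so an aggregate such as SUM over selected rows becomes weighted popcounts instead of per-row arithmetic. This is a simplification for illustration; the thesis evaluates distributed BSI arithmetic on real datasets.

```python
# A minimal single-machine sketch of bit-slice indexing (BSI): store one
# bitmap per bit position of a numeric column, so SUM over selected rows
# becomes weighted popcounts. The thesis evaluates distributed BSI
# arithmetic; this shows only the core idea.
def build_bsi(values, bits=8):
    """slices[j] is an int whose i-th bit equals bit j of values[i]."""
    slices = [0] * bits
    for i, v in enumerate(values):
        for j in range(bits):
            if (v >> j) & 1:
                slices[j] |= 1 << i
    return slices

def bsi_sum(slices, mask=None):
    """SUM(column) over rows selected by bitmap `mask` (None = all rows)."""
    total = 0
    for j, s in enumerate(slices):
        sel = s if mask is None else s & mask
        total += sel.bit_count() << j  # popcount(slice) * 2**j (Python 3.10+)
    return total

values = [5, 9, 12, 3, 7]
slices = build_bsi(values)
print(bsi_sum(slices))         # 36 == sum(values)
print(bsi_sum(slices, 0b101))  # 17 == values[0] + values[2]
```

Because the query is a handful of AND and popcount operations over packed words, it stays cheap and predictable even when the processor is throttled under a power cap.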
67

RETAIL DATA ANALYTICS USING GRAPH DATABASE

Priya, Rashmi 01 January 2018 (has links)
Big data is an area focused on storing, processing, and visualizing huge amounts of data. Today, data is growing faster than ever before. We need to find the right tools and applications and build an environment that can help us obtain valuable insights from the data. Retail is one of the domains that collect huge amounts of transaction data every day. Retailers need to understand their customers' purchasing patterns and behavior in order to make better business decisions. Market basket analysis is a field in data mining that focuses on discovering patterns in retail transaction data. Our goal is to find tools and applications that can be used by retailers to quickly understand their data and make better business decisions. Due to the amount and complexity of the data, it is not possible to do such activities manually. Trends change very quickly, and retailers want to be quick in adapting to change and taking action. This requires automating processes and using algorithms that are efficient and fast. In our work, we mine transaction data by modeling the data as graphs. We use clustering algorithms to discover communities (clusters) in the data and then use the clusters to build a recommendation system that can recommend products to customers based on their buying behavior.
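A minimal sketch of this pipeline, assuming a co-purchase graph and networkx's greedy modularity communities (the thesis's graph database and clustering algorithm may differ): build an edge per product pair bought together, detect communities, and recommend unpurchased products from the clusters a customer already buys in.

```python
# A minimal sketch of the pipeline described above, assuming a co-purchase
# graph and networkx's greedy modularity communities; the thesis's graph
# database and clustering algorithm may differ.
from itertools import combinations
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

transactions = [  # each basket: products bought together
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"beer", "chips"},
    {"beer", "chips", "salsa"},
    {"milk", "butter"},
]

# Co-purchase graph: edge weight counts how often two products co-occur.
G = nx.Graph()
for basket in transactions:
    for a, b in combinations(sorted(basket), 2):
        w = G.get_edge_data(a, b, {"weight": 0})["weight"]
        G.add_edge(a, b, weight=w + 1)

clusters = list(greedy_modularity_communities(G, weight="weight"))

def recommend(bought, clusters):
    """Suggest unpurchased products from the clusters the customer buys in."""
    recs = set()
    for cluster in clusters:
        if bought & cluster:
            recs |= cluster - bought
    return recs

print(recommend({"beer"}, clusters))  # e.g. {'chips', 'salsa'}
```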
68

Shared and distributed memory parallel algorithms to solve big data problems in biological, social network and spatial domain applications

Sharma, Rahil 01 December 2016 (has links)
Big data refers to information which cannot be processed and analyzed using traditional approaches and tools due to the four V's: sheer Volume, the Velocity at which data is received and processed, and data Variety and Veracity. Today, massive volumes of data originate in domains such as geospatial analysis and biological and social networks. Hence, scalable algorithms for efficient processing of this massive data are a significant challenge in the field of computer science. One way to achieve such efficient and scalable algorithms is by using shared- and distributed-memory parallel programming models. In this thesis, we present a variety of such algorithms to solve problems in the domains mentioned above. We solve five problems that fall into two categories. The first group of problems deals with community detection. Detecting communities in real-world networks is of great importance because communities consist of patterns that can be viewed as independent components, each of which has distinct features and can be detected based upon network structure. For example, communities in social networks can help target users for marketing purposes, provide user recommendations to connect with and join communities or forums, etc. We develop a novel sequential algorithm to accurately detect community structures in biological protein-protein interaction networks, where a community corresponds to a functional module of proteins. Generally, such sequential algorithms are computationally expensive, which makes them impractical for large real-world networks. To address this limitation, we develop a new, highly scalable Symmetric Multiprocessing (SMP) based parallel algorithm to detect high-quality communities in large subsections of social networks like Facebook and Amazon. Due to the SMP architecture, however, our algorithm cannot process networks whose size is greater than the RAM of a single machine. With the increasing size of social networks, community detection has become even more difficult, since network size can reach hundreds of millions of vertices and edges. Processing such massive networks requires several hundred gigabytes of RAM, which is only possible by adopting distributed infrastructure. To address this, we develop a novel hybrid (shared + distributed memory) parallel algorithm to efficiently detect high-quality communities in massive Twitter and .uk domain networks. The second group of problems deals with efficiently processing spatial Light Detection and Ranging (LiDAR) data. LiDAR data is widely used in forest and agricultural crop studies, landscape classification, 3D urban modeling, etc. Technological advancements in LiDAR sensors have enabled highly accurate and dense LiDAR point clouds, resulting in massive data volumes that pose computing issues with processing and storage. We develop the first published landscape-driven data reduction algorithm, which uses the slope map of the terrain as a filter to reduce the data without sacrificing its accuracy. Our algorithm is highly scalable and adopts a shared-memory parallel architecture. We also develop a parallel interpolation technique that is used to generate highly accurate continuous terrains, i.e., Digital Elevation Models (DEMs), from discrete LiDAR point clouds.
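The slope-driven reduction can be sketched as follows: bin points into grid cells, estimate the local slope per cell from the elevation range, keep all points in steep cells, and thin flat ones. The slope estimate, threshold, and keep rate below are illustrative assumptions, not the published algorithm's exact parameters.

```python
# A hedged sketch of slope-driven LiDAR reduction: bin points into grid
# cells, treat each cell's elevation range as a slope proxy, keep steep
# cells intact and thin flat ones. The slope estimate, threshold, and keep
# rate are illustrative assumptions about the published algorithm.
import numpy as np

def slope_filter(points, cell=1.0, slope_thresh=0.15, flat_keep=0.1, seed=0):
    """points: (n, 3) array of x, y, z; returns a reduced subset."""
    rng = np.random.default_rng(seed)
    keys = np.floor(points[:, :2] / cell).astype(int)
    keep = np.zeros(len(points), dtype=bool)
    order = np.lexsort((keys[:, 1], keys[:, 0]))          # group rows by cell
    new_cell = np.r_[True, np.any(np.diff(keys[order], axis=0) != 0, axis=1)]
    starts = np.flatnonzero(new_cell)
    for s, e in zip(starts, np.r_[starts[1:], len(order)]):
        idx = order[s:e]
        z = points[idx, 2]
        if (z.max() - z.min()) / cell > slope_thresh:
            keep[idx] = True                               # steep: keep all
        else:
            keep[idx] = rng.random(len(idx)) < flat_keep   # flat: sample
    return points[keep]

# Synthetic terrain: a nearly flat half and a steep half.
rng = np.random.default_rng(1)
x, y = rng.uniform(0, 50, 10000), rng.uniform(0, 50, 10000)
z = np.where(x < 25, 0.01 * x, 0.5 * x)
reduced = slope_filter(np.column_stack([x, y, z]))
print(len(reduced))  # roughly half the points: the flat half is thinned to ~10%
```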
69

Komplexe Datenanalyseprozesse in serviceorientierten Umgebungen

Habich, Dirk 24 January 2009 (has links) (PDF)
This dissertation addresses the embedding of complex data analysis processes in service-oriented environments. The discussion begins with a concrete application domain in which such analysis processes play a decisive role in knowledge discovery and without whose help no progress can be made. In the second part, concrete complex data analysis processes are developed, which form the starting point for the discussion of their embedding in a service-oriented environment. This embedding is finally addressed in the third part of the dissertation, and corresponding extensions to the technologies of the best-known form of realization are presented. The evaluation shows that this new form is substantially better suited to complex data analysis processes than the previous variant.
70

Middleware for online scientific data analytics at extreme scale

Zheng, Fang 22 May 2014 (has links)
Scientific simulations running on High End Computing machines in domains like Fusion, Astrophysics, and Combustion now routinely generate terabytes of data in a single run, and these data volumes are only expected to increase. Since such massive simulation outputs are key to scientific discovery, the ability to rapidly store, move, analyze, and visualize data is critical to scientists' productivity. Yet there are already serious I/O bottlenecks on current supercomputers, and movement toward the Exascale is further accelerating this trend. This dissertation is concerned with the design, implementation, and evaluation of middleware-level solutions to enable high-performance and resource-efficient online data analytics to process massive simulation output data at large scales. Online data analytics can effectively overcome the I/O bottleneck for scientific applications at large scales by processing data as it moves through the I/O path. Online analytics can extract valuable insights from live simulation output in a timely manner, better prepare data for subsequent deep analysis and visualization, and achieve improved performance and reduced data movement cost (both in time and in power) compared to the conventional post-processing paradigm. The thesis identifies the key challenges for online data analytics based on the needs of a variety of large-scale scientific applications, and proposes a set of novel and effective approaches to efficiently program, distribute, and schedule online data analytics along the critical I/O path. In particular, its solution approach i) provides a high-performance data movement substrate to support parallel and complex data exchanges between simulation and online data analytics, ii) enables placement flexibility of analytics to exploit distributed resources, iii) for co-placement of analytics with simulation codes on the same nodes, uses fine-grained scheduling to harvest idle resources for running online analytics with minimal interference to the simulation, and finally, iv) supports scalable, efficient online spatial indices to accelerate data analytics and visualization on the deep memory hierarchies of high-end machines. Our middleware approach is evaluated with leadership scientific applications in domains like Fusion, Combustion, and Molecular Dynamics, and on different High End Computing platforms. Substantial improvements are demonstrated in end-to-end application performance and in resource efficiency at scales of up to 16384 cores, for a broad range of analytics and visualization codes. The outcome is a useful and effective software platform for online scientific data analytics, facilitating large-scale scientific data exploration.
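The contrast with post-processing can be made concrete with a toy sketch: instead of writing every snapshot to disk and analyzing it later, the analytics consume the stream in transit and only small summaries reach storage. The generator names and the reduction performed are illustrative only.

```python
# A toy contrast between post-processing and online analytics: summaries are
# computed as snapshots stream through the I/O path, so only bytes per step
# reach storage instead of the full field. Names and the reduction performed
# are illustrative only.
import numpy as np

def simulation_steps(n_steps, n_cells):
    """Stand-in for a simulation emitting one field snapshot per timestep."""
    rng = np.random.default_rng(0)
    for t in range(n_steps):
        yield t, rng.standard_normal(n_cells)

def online_analytics(stream):
    """Reduce each snapshot in transit; the raw field never touches disk."""
    for t, field in stream:
        yield t, {"mean": float(field.mean()), "max": float(field.max())}

for t, summary in online_analytics(simulation_steps(5, 1_000_000)):
    print(t, summary)  # a few bytes per step instead of ~8 MB
```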
