401

Opportunities and challenges of Big Data Analytics in healthcare: An exploratory study on the adoption of big data analytics in the management of Sickle Cell Anaemia.

Saenyi, Betty January 2018 (has links)
Background: With increasing technological advancements, healthcare providers are adopting electronic health records (EHRs) and new health information technology systems. Consequently, data from these systems is accumulating at an ever-faster rate, creating a need for more robust ways of capturing, storing and processing it. Big data analytics is used to extract insight from such large amounts of medical data and is increasingly becoming a valuable practice for healthcare organisations. Could these strategies be applied in disease management, especially in rare conditions like Sickle Cell Disease (SCD)? The study answers the following research questions:
1. What data management practices are used in sickle cell anaemia management?
2. What areas in the management of sickle cell anaemia could benefit from the use of big data analytics?
3. What are the challenges of applying big data analytics in the management of sickle cell anaemia?
Purpose: The purpose of this research was to serve as a pre-study in establishing the opportunities and challenges of applying big data analytics in the management of SCD.
Method: The study adopted both deductive and inductive approaches. Data was collected through interviews based on a framework modified specifically for this study, and was then inductively analysed to answer the research questions.
Conclusion: Although there is a lot of potential for big data analytics in SCD in areas like population health management, evidence-based medicine and personalised care, its adoption is not assured. This is because of the lack of interoperability between existing systems and the strenuous legal compliance processes involved in data acquisition.
402

Management, visualisation & mining of quantitative proteomics data

Ahmad, Yasmeen January 2012 (has links)
Exponential data growth in the life sciences demands cross-discipline work that brings together computing and life sciences in a usable manner that can enhance knowledge and understanding in both fields. High-throughput approaches, advances in instrumentation and the overall complexity of mass spectrometry data have made it impossible for researchers to manually analyse data using existing market tools. By applying a user-centred approach to effectively capture the domain knowledge and experience of biologists, this thesis has bridged the gap between computation and biology through the PepTracker software (http://www.peptracker.com). This software provides a framework for the systematic detection and analysis of proteins that can be correlated with biological properties to expand the functional annotation of the genome. The tools created in this study aim to place analysis capabilities back in the hands of biologists, who are experts in evaluating their data. Another major advantage of the PepTracker suite is the implementation of a data warehouse, which manages and collates highly annotated experimental data from numerous experiments carried out by many researchers. This repository captures the collective experience of a laboratory, which can be accessed via user-friendly interfaces. Rather than viewing datasets as isolated components, this thesis explores the potential that can be gained from collating datasets in a “super-experiment” ideology, leading to the formation of broad-ranging questions and promoting biology-driven lines of questioning. This has been uniquely implemented by integrating tools and techniques from the field of Business Intelligence with the life sciences, and successfully shown to aid in the analysis of proteomic interaction experiments. Having established a means of documenting a static proteomics snapshot of cells, the proteomics field is progressing towards understanding the extremely complex nature of cell dynamics. PepTracker facilitates this by providing the means to gather and analyse many protein properties to generate new biological insight, as demonstrated by the identification of novel protein isoforms.
403

Security Analysis of Interdependent Critical Infrastructures: Power, Cyber and Gas

January 2018 (has links)
abstract: Our daily life is becoming more and more reliant on services provided by infrastructures such as power, gas, and communication networks. Ensuring the security of these infrastructures is of utmost importance. This task becomes ever more challenging as the interdependence among these infrastructures grows and a security breach in one infrastructure can spill over to the others. The implication is that the security practices and analysis recommended for these infrastructures should be done in coordination. This thesis, focusing on the power grid, explores strategies to secure the system that look into the coupling of the power grid to the cyber infrastructure used to manage and control it, and to the gas grid, which supplies an increasing amount of reserves to overcome contingencies. The first part (Part I) of the thesis, including chapters 2 through 4, focuses on the coupling of the power and cyber infrastructures. The goal is to detect malicious attempts to gain information about the operation of the power grid in order to later attack the system. In chapter 2, we propose a hierarchical architecture that correlates the analysis of high-resolution Micro-Phasor Measurement Unit (microPMU) data and traffic analysis of Supervisory Control and Data Acquisition (SCADA) packets to infer the security status of the grid and detect the presence of possible intruders. An essential part of this architecture is tied to the analysis of the microPMU data. In chapter 3 we establish a set of anomaly detection rules on microPMU data that flag "abnormal behavior". A placement strategy for microPMU sensors is also proposed to maximize the sensitivity in detecting anomalies. In chapter 4, we focus on developing rules that can localize the source of an event using microPMU data, to further check whether a cyber attack is causing the anomaly, by correlating SCADA traffic with the microPMU data analysis results. The thread that unifies the data analysis in this chapter is that decisions are made without fully estimating the state of the system; on the contrary, decisions are made using a set of physical measurements that falls short, by orders of magnitude, of meeting the needs for observability. More specifically, in the first part of this chapter (sections 4.1-4.2), using microPMU data in the substation, methodologies for online identification of the source Thevenin parameters are presented. This methodology is used to identify reconnaissance activity on the normally-open switches in the substation, initiated by attackers to gauge their controllability over the cyber network. The application of this methodology in monitoring the voltage stability of the grid is also discussed. In the second part of this chapter (sections 4.3-4.5), we investigate the localization of faults. Since the number of PMU sensors available to carry out the inference is insufficient to ensure observability, the problem can be viewed as that of under-sampling a "graph signal"; the analysis leads to a PMU placement strategy that achieves the highest resolution in localizing the fault for a given number of sensors. In both cases, the results of the analysis are leveraged in the detection of cyber-physical attacks, where microPMU data and relevant SCADA network traffic information are compared to determine if a network breach has affected the integrity of the system information and/or operations.
In the second part of this thesis (Part II), the security analysis considers the adequacy and reliability of schedules for the gas and power networks. Jointly scheduling supply in gas and power networks is motivated by the increasing reliance of power grids on natural gas generators (and, indirectly, on gas pipelines) as providers of critical reserves. Chapter 5 focuses on unveiling the challenges of this problem and providing solutions to it. / Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2018
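The abstract describes the detection pipeline only at a high level; the actual rules are developed in chapters 3 and 4 of the thesis. As a rough illustration of the general idea, the sketch below flags microPMU voltage-magnitude samples that deviate from a rolling baseline and pairs each flagged instant with nearby SCADA events. The window size, threshold, and data layout are assumptions made for the example, not values taken from the thesis.

```python
import numpy as np

def flag_anomalies(v_mag, window=120, threshold=0.02):
    """Flag microPMU voltage-magnitude samples that deviate from a rolling median.

    v_mag: 1-D array of per-unit voltage magnitudes (microPMUs sample ~120 times/s).
    window: number of samples used as the rolling baseline.
    threshold: relative deviation (p.u.) treated as "abnormal behavior" (assumed value).
    """
    flags = np.zeros(len(v_mag), dtype=bool)
    for i in range(window, len(v_mag)):
        baseline = np.median(v_mag[i - window:i])
        flags[i] = abs(v_mag[i] - baseline) > threshold
    return flags

def correlate_with_scada(anomaly_times, scada_times, max_gap=5.0):
    """Pair each flagged anomaly with SCADA events occurring within max_gap seconds.

    An anomaly with no matching SCADA command suggests a physical event; one preceded
    by unexpected SCADA traffic is a candidate cyber-physical attack.
    """
    pairs = []
    for t in anomaly_times:
        nearby = [s for s in scada_times if abs(s - t) <= max_gap]
        pairs.append((t, nearby))
    return pairs

# Example: a synthetic 1 p.u. signal with a brief voltage sag injected at sample 500.
v = np.ones(1000) + 0.001 * np.random.default_rng(0).normal(size=1000)
v[500:520] -= 0.05
print(np.flatnonzero(flag_anomalies(v))[:5])
```

Anomalies that coincide with unexpected SCADA commands would then be escalated for cyber-physical investigation, in the spirit of the hierarchical architecture described in chapter 2.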
404

Utilization of automated location tracking for clinical workflow analytics and visualization

January 2018 (has links)
abstract: The analysis of clinical workflow offers many challenges to clinical stakeholders and researchers, especially in environments characterized by dynamic and concurrent processes. Workflow analysis in such environments is essential for monitoring performance and finding bottlenecks and sources of error. Clinical workflow analysis has been enhanced by the inclusion of modern technologies. One such intervention is automated location tracking, a system that detects the movement of clinicians and equipment. Utilizing the data produced by automated location tracking technologies can lead to the development of novel workflow analytics that complement more traditional approaches such as ethnography and grounded-theory-based qualitative methods. The goals of this research are to: (i) develop a series of analytic techniques to derive deeper workflow-related insight in an emergency department setting, (ii) overlay data from disparate sources (quantitative and qualitative) to develop strategies that facilitate workflow redesign, and (iii) incorporate visual analytics methods to improve the targeted visual feedback received by providers based on the findings. The overarching purpose is to create a framework that demonstrates the utility of automated location tracking data used in conjunction with clinical data such as EHR logs, and its vital role in the future of clinical workflow analysis/analytics. This document is organized around the two primary aims of the research. The first aim deals with the use of automated location tracking data to develop a novel methodological/exploratory framework for clinical workflow. The second aim is to overlay the quantitative data generated from the previous aim on data from qualitative observation and shadowing studies (mixed methods) to develop a deeper view of clinical workflow that can be used to facilitate workflow redesign. The final sections of the document speculate on the direction of this work, discussing its potential in the creation of fully integrated clinical environments, i.e., environments with state-of-the-art location tracking and other data collection mechanisms. The main purpose of this research is to demonstrate ways by which clinical processes can be continuously monitored, allowing for proactive adaptations in the face of technological and process changes to minimize any negative impact on the quality of patient care and provider satisfaction. / Dissertation/Thesis / Doctoral Dissertation Biomedical Informatics 2018
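The dissertation does not publish its data schema, so the following is only a hypothetical sketch of one of the workflow analytics the abstract alludes to: turning raw location-tracking (RTLS) badge reads into time-per-zone summaries per clinician, which could later be overlaid with EHR log timestamps. All column names and values below are invented for illustration.

```python
import pandas as pd

# Hypothetical RTLS event log: one row per badge read (invented schema and values).
rtls = pd.DataFrame({
    "clinician_id": ["RN01", "RN01", "RN01", "MD02", "MD02"],
    "zone":         ["triage", "room_3", "nurse_station", "room_3", "room_5"],
    "timestamp":    pd.to_datetime([
        "2018-05-01 08:00", "2018-05-01 08:12", "2018-05-01 08:40",
        "2018-05-01 08:05", "2018-05-01 08:30",
    ]),
})

# Duration in each zone = time until the same clinician's next badge read.
rtls = rtls.sort_values(["clinician_id", "timestamp"])
rtls["minutes_in_zone"] = (
    rtls.groupby("clinician_id")["timestamp"].shift(-1) - rtls["timestamp"]
).dt.total_seconds() / 60

# A simple workflow metric: total minutes per clinician per zone.
workload = (
    rtls.dropna(subset=["minutes_in_zone"])
        .groupby(["clinician_id", "zone"])["minutes_in_zone"]
        .sum()
        .reset_index()
)
print(workload)
```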
405

Visualizing multidimensional data similarities: improvements and applications / Visualizando similaridades em dados multidimensionais: melhorias e aplicações

Silva, Renato Rodrigues Oliveira da 05 December 2016 (has links)
Multidimensional datasets are increasingly more prominent and important in data science and many application domains. Such datasets typically consist of a large set of observations, or data points, each of which is described by several measurements, or dimensions. During the design of techniques and tools to process such datasets, a key component is to gather insights into their structure and patterns, a goal targeted by multidimensional visualization methods. Structures and patterns of high-dimensional data can be described, at a core level, by the notion of similarity of observations. Hence, to visualize such patterns, we need effective and efficient ways to depict similarity relations between a large number of observations, each having a potentially large number of dimensions. Within the realm of multidimensional visualization methods, two classes of techniques, projections and similarity trees, effectively capture similarity patterns and also scale well to the number of observations and dimensions of the data. However, while such techniques show similarity patterns, understanding and interpreting these patterns in terms of the original data dimensions is still hard. This thesis addresses the development of visual explanatory techniques for the easy interpretation of similarity patterns present in multidimensional projections and similarity trees, through several contributions. First, we propose methods that make the computation of similarity trees efficient for large datasets and also allow their visual explanation at multiple scales, or levels of detail. We also propose ways to construct simplified representations of similarity trees, thereby extending their visual scalability even further. Secondly, we propose methods for the visual explanation of multidimensional projections in terms of automatically detected groups of related observations, which are also automatically annotated in terms of their similarity in the high-dimensional data space. We then show how these explanatory mechanisms can be adapted to handle both static and time-dependent multidimensional datasets. Our proposed techniques are designed to be easy to use, work nearly automatically, handle any type of quantitative multidimensional dataset and multidimensional projection technique, and are demonstrated on a variety of real-world large datasets obtained from image collections, text archives, scientific measurements, and software engineering.
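The thesis's own algorithms are not reproduced in the abstract; the sketch below only illustrates the general idea of explaining a projection through automatically detected, automatically annotated groups, using off-the-shelf scikit-learn components (t-SNE and k-means) and a simple "lowest within-group variance" annotation rule as stand-ins for the techniques actually proposed.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.preprocessing import StandardScaler
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans

# Load a small multidimensional dataset and standardize its dimensions.
data = load_wine()
X = StandardScaler().fit_transform(data.data)

# Project the observations to 2-D so similarity patterns become visible.
proj = TSNE(n_components=2, random_state=0).fit_transform(X)

# Automatically detect groups of related observations in the projection.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(proj)

# Annotate each group with the original dimensions on which its members agree most,
# taken here as the dimensions with the lowest within-group variance.
for group in np.unique(labels):
    members = X[labels == group]
    ranked = np.argsort(members.var(axis=0))[:3]
    names = [data.feature_names[i] for i in ranked]
    print(f"group {group}: most homogeneous dimensions -> {names}")
```

A visual explanation tool would render these annotations as labels or color-coded overlays on the projection itself rather than printing them.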
406

You Can View the Tweets!: Content Analysis of Tweets Mentioning Works in an Institutional Repository

Lowery, Ashley 21 April 2017 (has links)
Academic libraries provide resources scholars can use to measure their scholarly output, including altmetrics products. Altmetrics recently emerged to accommodate the sharing and dissemination of scholarship on the social web, and the scholarly community is grappling with understanding and utilizing the altmetrics tracked by these products. This study uses altmetrics provided by Plum Analytics products to analyze the content of tweets mentioning works from a Digital Commons institutional repository. Plum Analytics provides quantitative (number of tweets and retweets) and qualitative (content of the tweets) data from Twitter. In this study, qualitative data are collected and coded to determine the tone of the tweets (negative, neutral, or positive) as well as other information, including each tweet’s author, intended audience, and hashtags. Results from the study will help clarify the meaning behind Twitter data and consequently guide scholars in effectively using tweets as measures of scholarship.
407

Understanding, Analyzing and Predicting Online User Behavior

January 2019 (has links)
abstract: Due to the growing popularity of the Internet and smart mobile devices, massive amounts of data are produced every day; in particular, more and more of users’ online behavior and activities have been digitalized. Making better use of this massive data and gaining a better understanding of user behavior have become central concerns for industrial firms as well as academia. However, the large size and unstructured format of user behavioral data, together with the heterogeneous nature of individuals, make it harder to identify the specific behavior that researchers are looking at, how to distinguish it, and what results from it. Differences in user behavior come from different causes; in my dissertation, I study three circumstances of behavior that potentially bring turbulent or detrimental effects, from precursory culture to preparatory strategy and delusory fraudulence. Meanwhile, I have access to a versatile analytical toolkit: econometrics and quasi-experiments, together with machine learning techniques such as text mining, sentiment analysis, and predictive analytics. This study creatively leverages the power of these combined methodologies and applies them beyond individual-level data and network data. This dissertation makes a first step toward discovering user behavior in these newly emerging contexts. My study conceptualizes theoretically, and tests empirically, the effect of cultural values on rating behavior, and I find that an individualist cultural background is more likely to lead to deviation and more expression in review behavior. I also find evidence of strategic behavior: users tend to leverage reporting to increase the likelihood of maximizing their benefits. Moreover, the dissertation proposes features that moderate this preparation behavior. Finally, it introduces a unified and scalable framework for detecting delusory behavior that meets the current need to fully utilize multiple data sources. / Dissertation/Thesis / Doctoral Dissertation Business Administration 2019
408

Estimating difficulty of learning activities in design stages: A novel application of Neuroevolution

Gallego-Durán, Francisco J. 18 December 2015 (has links)
In every learning or training environment, exercises are the basis for practical learning. Learners need to practice in order to acquire new abilities and perfect those gained previously. However, not every exercise is valid for every learner: learners require exercises that match their ability levels. Hence, the difficulty of an exercise could be defined as the amount of effort that a learner requires to successfully complete it (its learning cost). Exercises that are too difficult tend to discourage learners and make them drop out, whereas exercises that are too easy are perceived as unchallenging, resulting in loss of interest. Correctly estimating difficulty is a hard and error-prone problem that tends to be tackled manually using domain-expert knowledge. Underestimating or overestimating difficulty creates a problem for learners, increasing dropout rates in learning environments. This paper presents a novel approach to improving difficulty estimates by using Neuroevolution. The method is based on measuring the computational cost that Neuroevolution algorithms require to successfully complete a given exercise and establishing similarities with previously gathered information from learners. For the specific experiments presented, a game called PLMan has been used. PLMan is a PacMan-like game in which users program the Artificial Intelligence of the main character using a Prolog knowledge base. Results show that there is a correlation between students’ learning costs and those of Neuroevolution. This suggests that the approach is valid and that the measured difficulty of Neuroevolution algorithms may be used as an estimate of student difficulty in the proposed environment.
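The abstract gives the approach only in outline. As a minimal, self-contained sketch of the idea, the code below counts how many fitness evaluations a toy (1+lambda) neuroevolution loop needs to "solve" an exercise and correlates that computational cost with illustrative, invented student effort figures. The toy fitness function stands in for the PLMan game, which is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

def evaluations_to_solve(fitness_fn, n_weights=20, target=0.95,
                         pop_size=10, sigma=0.1, max_evals=20000):
    """Run a minimal (1+lambda) neuroevolution loop and count the fitness
    evaluations needed to reach the target score; this count is the
    exercise's estimated computational difficulty."""
    best = rng.normal(size=n_weights)
    best_fit = fitness_fn(best)
    evals = 1
    while best_fit < target and evals < max_evals:
        for _ in range(pop_size):
            child = best + sigma * rng.normal(size=n_weights)
            fit = fitness_fn(child)
            evals += 1
            if fit > best_fit:
                best, best_fit = child, fit
    return evals

def make_exercise(hardness):
    """Toy stand-in for an exercise: the harder it is, the tighter the region
    of weight space that counts as a solution."""
    goal = rng.normal(size=20)
    return lambda w: float(np.exp(-hardness * np.mean((w - goal) ** 2)))

# Estimated difficulty for five exercises of growing hardness, paired with
# hypothetical average student attempt counts (illustrative values, not real data).
machine_cost = [evaluations_to_solve(make_exercise(h)) for h in (0.5, 1, 2, 4, 8)]
student_attempts = [2.1, 3.4, 5.0, 8.2, 13.7]

r = np.corrcoef(machine_cost, student_attempts)[0, 1]
print("machine evaluation counts:", machine_cost)
print("Pearson correlation with student effort:", round(r, 3))
```

A high correlation between the two cost series is what would justify using the machine-measured cost as a proxy for learner difficulty, which is the claim the paper tests empirically.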
409

Fast demand response with datacenter loads: a green dimension of big data

McClurg, Josiah 01 August 2017 (has links)
Demand response is one of the critical technologies necessary for allowing large-scale penetration of intermittent renewable energy sources in the electric grid. Data centers are especially attractive candidates for providing flexible, real-time demand response services to the grid because they are capable of fast power ramp-rates, large dynamic range, and finely-controllable power consumption. This thesis makes a contribution toward implementing load shaping with server clusters through a detailed experimental investigation of three broadly-applicable datacenter workload scenarios. We experimentally demonstrate the feasibility of datacenter demand response with a distributed video transcoding application and a simple distributed power controller. We also show that while some software power capping interfaces performed better than others, all the interfaces we investigated had the high dynamic range and low power variance required to achieve high-quality power tracking. Our next investigation presents an empirical performance evaluation of algorithms that replace arithmetic operations with low-level bit operations for power-aware Big Data processing. Specifically, we compare two different data structures in terms of execution time and power efficiency: (a) a baseline design using arrays, and (b) a design using bit-slice indexing (BSI) and distributed BSI arithmetic. Across three different datasets and three popular queries, we show that the bit-slicing queries consistently outperform the array algorithm in both power efficiency and execution time. In the context of datacenter power shaping, this performance optimization enables additional power flexibility, achieving the same or greater performance than the baseline approach even under power constraints. The investigation of read-optimized index queries leads into an experimental investigation of the tradeoffs among power constraint, query freshness, and update aggregation size in a dynamic big data environment. We compare several update strategies, presenting a bitmap update optimization that improves performance over both a baseline approach and an existing state-of-the-art update strategy. Performing this investigation in the context of load shaping, we show that read-only range queries can be served without performance impact under a power cap, and that index updates can be tuned to provide a flexible base load. This thesis concludes with a brief discussion of control implementation and a summary of our findings.
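The abstract names bit-slice indexing (BSI) as the key data structure but does not show it. The following is a small, single-machine sketch of the core idea, using Python integers as bitmaps; the distributed BSI arithmetic, power capping, and update strategies studied in the thesis are not represented.

```python
def build_slices(values, bits=8):
    """Bit-slice index: slices[j] is a bitmap (Python int) whose i-th bit is
    bit j of values[i]."""
    slices = [0] * bits
    for i, v in enumerate(values):
        for j in range(bits):
            if (v >> j) & 1:
                slices[j] |= 1 << i
    return slices

def bsi_sum(slices):
    """Column sum computed purely from popcounts of the slices."""
    return sum(bin(s).count("1") << j for j, s in enumerate(slices))

def bsi_greater_than(slices, n_rows, c):
    """Bitmap of rows whose value is strictly greater than c (MSB-to-LSB scan)."""
    eq = (1 << n_rows) - 1      # rows still equal to c on the bits seen so far
    gt = 0                      # rows already known to be greater than c
    for j in reversed(range(len(slices))):
        if (c >> j) & 1:
            eq &= slices[j]
        else:
            gt |= eq & slices[j]
            eq &= ~slices[j]
    return gt

# Tiny demo on an 8-bit column.
values = [5, 200, 17, 42, 99, 3, 180, 64]
slices = build_slices(values)
assert bsi_sum(slices) == sum(values)
mask = bsi_greater_than(slices, len(values), 64)
print("rows with value > 64:", [i for i in range(len(values)) if (mask >> i) & 1])
```

Because both the sum and the comparison touch only bitmaps, such queries reduce to cheap bitwise instructions, which is the property the thesis exploits for power-aware processing.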
410

International Music Preferences: An Analysis of the Determinants of Song Popularity on Spotify for the U.S., Norway, Taiwan, Ecuador, and Costa Rica

Suh, Brendan Joseph 01 January 2019 (has links)
This paper examines data from Spotify’s API for 2017-2018 to determine the effects of song attributes on the success of tracks on Spotify’s Top 200 Chart across five different countries: the U.S., Norway, Taiwan, Ecuador, and Costa Rica. Two dependent variables are used to measure the success of a song – a track’s peak position on the charts and the number of days it survives on a country’s Top 200 Chart. Using ten separate regressions, one for each dependent variable in all five countries, it is concluded that the presence of a featured guest on a track increases a song’s peak position and the number of days it survives on the charts in almost every country. Further, songs that are perceived as “happier” are more successful on both metrics in Norway and Taiwan, while those that are louder and more aggressive have a shorter lifespan on the charts in three out of the five countries studied. The paper concludes that further research should be conducted with a larger, more diverse dataset to see if these findings hold and whether they are present in other countries as well.
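As a hedged illustration of the kind of model the paper describes (one regression per dependent variable per country), the sketch below fits an ordinary least squares model of days-on-chart on a few audio features using statsmodels. The data are fabricated and the feature list is only an assumed subset of what the paper may have used; the feature names mirror Spotify's API.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 200

# Synthetic stand-in for one country's Top 200 tracks (fabricated values).
tracks = pd.DataFrame({
    "featured_guest": rng.integers(0, 2, n),       # 1 if the track has a featured artist
    "valence":        rng.uniform(0, 1, n),        # Spotify's "happiness" proxy
    "loudness":       rng.uniform(-20, 0, n),      # dB, typically negative
    "energy":         rng.uniform(0, 1, n),
})

# Fabricate the outcome so the demo regression has something to recover.
tracks["days_on_chart"] = (
    20
    + 15 * tracks["featured_guest"]                # featured guests extend chart life here
    + 10 * tracks["valence"]
    + rng.normal(0, 8, n)                          # unexplained variation
).clip(lower=1)

# One regression per dependent variable and country; this is the days-on-chart
# model for a single synthetic country.
model = smf.ols("days_on_chart ~ featured_guest + valence + loudness + energy",
                data=tracks).fit()
print(model.summary())
```

The peak-position regressions would follow the same pattern with the other dependent variable.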
