141

Catching the flu : syndromic surveillance, algorithmic governmentality and global health security

Roberts, Stephen L. January 2018 (has links)
This thesis offers a critical analysis of the rise of syndromic surveillance systems for the advanced detection of pandemic threats within contemporary global health security frameworks. The thesis traces the iterative evolution and ascendancy of three such novel syndromic surveillance systems for the strengthening of health security initiatives over the past two decades: 1) The Program for Monitoring Emerging Diseases (ProMED-mail); 2) The Global Public Health Intelligence Network (GPHIN); and 3) HealthMap. This thesis demonstrates how each newly introduced syndromic surveillance system has become increasingly oriented towards the integration of digital algorithms into core surveillance capacities to continually harness, and forecast upon, ever-expanding sets of digital, open-source data potentially indicative of forthcoming pandemic threats. This thesis argues that the increased centrality of the algorithm within these next-generation syndromic surveillance systems produces a new and distinct form of infectious disease surveillance for the governing of emergent pathogenic contingencies. Conceptually, the thesis also shows how the rise of this algorithmic mode of infectious disease surveillance produces divergences in the governmental rationalities of global health security, leading to the rise of an algorithmic governmentality within contemporary contexts of Big Data and these surveillance systems. Empirically, this thesis demonstrates how this new form of algorithmic infectious disease surveillance has been rapidly integrated into diplomatic, legal, and political frameworks to strengthen the practice of global health security, producing subtle yet distinct shifts in the outbreak notification and reporting transparency of states, increasingly scrutinized by the algorithmic gaze of syndromic surveillance.
142

Multi-agent-based DDoS detection on big data systems

Osei, Solomon January 2018 (has links)
The Hadoop framework has become the most widely deployed platform for processing Big Data. Despite its advantages, Hadoop's infrastructure is still deployed within a secured network perimeter because the framework lacks adequate inherent security mechanisms against various security threats. However, this approach does not provide an adequate security layer against attacks such as Distributed Denial of Service (DDoS). Furthermore, current work to secure Hadoop's infrastructure against DDoS attacks is unable to provide a distributed, node-level detection mechanism. This thesis presents a software agent-based framework that allows distributed, real-time intelligent monitoring and detection of DDoS attacks at Hadoop's node level. The agent's cognitive system is built on the cumulative sum (CUSUM) statistical technique to analyse network utilisation and average server load and to detect attacks from these measurements. The framework is a multi-agent architecture with transducer agents that interface with each Hadoop node to provide a real-time detection mechanism. Moreover, the agents contextualise their beliefs by training themselves with the contextual information of each node and monitor the activities of the node to differentiate between normal and anomalous behaviours. In the experiments, the framework was exposed to TCP SYN and UDP flooding attacks during a legitimate MapReduce job on the Hadoop testbed. The experimental results were evaluated against performance metrics such as false-positive ratio, false-negative ratio and response time to attack. The results show that UDP and TCP SYN flooding attacks can be detected and confirmed on multiple nodes within nineteen seconds, with a 5.56% false-positive ratio, a 7.70% false-negative ratio and a 91.5% detection success rate. These results represent an improvement over the state of the art.
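For readers unfamiliar with the detection technique named in this abstract, the cumulative sum (CUSUM) change-point method can be sketched in a few lines. The following Python snippet is a minimal, hypothetical illustration of one-sided CUSUM over per-node traffic samples; the target mean, drift and threshold values are illustrative assumptions, not parameters taken from the thesis.

```python
# Minimal one-sided CUSUM sketch for flagging a sudden rise in network
# utilisation on a single Hadoop node. Threshold/drift values are assumptions.

def cusum_alarm(samples, target_mean, drift=2.0, threshold=15.0):
    """Return the index at which the positive CUSUM statistic exceeds
    the threshold, or None if no change is detected."""
    s_pos = 0.0
    for i, x in enumerate(samples):
        # Accumulate deviations above the expected mean minus a drift allowance.
        s_pos = max(0.0, s_pos + (x - target_mean - drift))
        if s_pos > threshold:
            return i
    return None

if __name__ == "__main__":
    # Synthetic per-second packet counts: normal MapReduce traffic, then a flood.
    normal = [48, 52, 50, 47, 53, 49, 51, 50]
    flood = [180, 210, 240, 260]
    alarm_at = cusum_alarm(normal + flood, target_mean=50.0)
    print("attack suspected at sample", alarm_at)
```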
143

SOCIAL MEDIA FOOTPRINTS OF PUBLIC PERCEPTION ON ENERGY ISSUES IN THE CONTERMINOUS UNITED STATES

Leifer, David 01 August 2019 (has links)
Energy has been at the top of the national and global political agenda along with other
144

Big Data Analytics and Engineering for Medicare Fraud Detection

Unknown Date (has links)
The United States (U.S.) healthcare system produces an enormous volume of data, with a vast number of financial transactions generated by physicians administering healthcare services. This makes healthcare fraud difficult to detect, especially when there are considerably fewer fraudulent transactions than non-fraudulent ones. Fraud is an extremely important issue for healthcare, as fraudulent activities within the U.S. healthcare system contribute to significant financial losses. In the U.S., the elderly population continues to rise, increasing the need for programs, such as Medicare, to help with associated medical expenses. Unfortunately, due to healthcare fraud, these programs are being adversely affected, draining resources and reducing the quality and accessibility of necessary healthcare services. In response, advanced data analytics have recently been explored to detect possible fraudulent activities. The Centers for Medicare and Medicaid Services (CMS) released several 'Big Data' Medicare claims datasets for different parts of their Medicare program to help facilitate this effort. In this dissertation, we employ three CMS Medicare Big Data datasets to evaluate the fraud detection performance available using advanced data analytics techniques, specifically machine learning. We use two distinct approaches, designated as anomaly detection and traditional fraud detection, each with its own data processing and feature engineering. Anomaly detection experiments classify by provider specialty, determining whether outlier physicians within the same specialty signal fraudulent behavior. Traditional fraud detection refers to the experiments directly classifying physicians as fraudulent or non-fraudulent, leveraging machine learning algorithms to discriminate between classes. We present our novel data engineering approaches for both anomaly detection and traditional fraud detection, including data processing, fraud mapping, and the creation of a combined dataset consisting of all three Medicare parts. We incorporate the List of Excluded Individuals and Entities database to identify real-world fraudulent physicians for model evaluation. Regarding features, the final datasets for anomaly detection contain only claim counts for every procedure a physician submits, while traditional fraud detection incorporates aggregated counts and payment information, specialty, and gender. Additionally, we compare cross-validation to the real-world application of building a model on a training dataset and evaluating on a separate test dataset under severe class imbalance and rarity. / Includes bibliography. / Dissertation (Ph.D.)--Florida Atlantic University, 2019. / FAU Electronic Theses and Dissertations Collection
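To make the class-imbalance point concrete, the sketch below shows one common setup in which a classifier is trained on an undersampled split and then evaluated on a separate, still-imbalanced test set. It is a generic, hypothetical illustration using synthetic data and scikit-learn, not the dissertation's pipeline; the 0.5% fraud rate and 10:1 undersampling ratio are assumptions made for the example.

```python
# Hypothetical sketch: fraud classification under severe class imbalance,
# trained on one split and evaluated on a separate held-out test set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for aggregated claim counts/payments (not CMS data).
X, y = make_classification(n_samples=20000, n_features=20,
                           weights=[0.995, 0.005], random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y,
                                                    test_size=0.3, random_state=0)

# Random undersampling of the non-fraud majority in the training split only.
rng = np.random.default_rng(0)
fraud_idx = np.where(y_train == 1)[0]
nonfraud_idx = rng.choice(np.where(y_train == 0)[0],
                          size=len(fraud_idx) * 10, replace=False)
keep = np.concatenate([fraud_idx, nonfraud_idx])

clf = RandomForestClassifier(n_estimators=200, random_state=0)
clf.fit(X_train[keep], y_train[keep])

# Evaluate on the untouched, still-imbalanced test set.
print("test AUC:", roc_auc_score(y_test, clf.predict_proba(X_test)[:, 1]))
```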
145

PREDICTING MELANOMA RISK FROM ELECTRONIC HEALTH RECORDS WITH MACHINE LEARNING TECHNIQUES

Unknown Date (has links)
Melanoma is one of the fastest growing cancers in the world, and can affect patients earlier in life than most other cancers. Therefore, it is imperative to be able to identify patients at high risk for melanoma and enroll them in screening programs to detect the cancer early. Electronic health records collect an enormous amount of data about real-world patient encounters, treatments, and outcomes. This data can be mined to increase our understanding of melanoma as well as build personalized models to predict risk of developing the cancer. Cancer risk models built from structured clinical data are limited in current research, with most studies involving just a few variables from institutional databases or registries. This dissertation presents data processing and machine learning approaches to build melanoma risk models from a large database of de-identified electronic health records. The database contains consistently captured structured data, enabling the extraction of hundreds of thousands of data points each from millions of patient records. Several experiments are performed to build effective models, particularly to predict sentinel lymph node metastasis in known melanoma patients and to predict individual risk of developing melanoma. Data for these models suffer from high dimensionality and class imbalance. Thus, classifiers such as logistic regression, support vector machines, random forest, and XGBoost are combined with advanced modeling techniques such as feature selection and data sampling. Risk factors are evaluated using regression model weights and decision trees, while personalized predictions are provided through random forest decomposition and Shapley additive explanations. Random undersampling on the melanoma risk dataset shows that many majority samples can be removed without a decrease in model performance. To determine how much data is truly needed, we explore learning curve approximation methods on the melanoma data and three publicly-available large-scale biomedical datasets. We apply an inverse power law model as well as introduce a novel semi-supervised curve creation method that utilizes a small amount of labeled data. / Includes bibliography. / Dissertation (Ph.D.)--Florida Atlantic University, 2019. / FAU Electronic Theses and Dissertations Collection
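The inverse power law learning-curve model referred to in this abstract is typically written as err(n) ≈ a·n^(-b) + c, where n is the number of training samples and c is the asymptotic error. As a hedged sketch (not the dissertation's code), the snippet below fits that form to a handful of invented (training size, error) measurements with SciPy and extrapolates to a larger sample size.

```python
# Hedged sketch: fit an inverse power law err(n) = a * n**(-b) + c to a few
# observed learning-curve points and extrapolate. Data points are illustrative.
import numpy as np
from scipy.optimize import curve_fit

def inverse_power_law(n, a, b, c):
    return a * np.power(n, -b) + c

# Hypothetical (training size, validation error) measurements.
sizes = np.array([500, 1000, 2000, 4000, 8000], dtype=float)
errors = np.array([0.31, 0.26, 0.22, 0.20, 0.185])

params, _ = curve_fit(inverse_power_law, sizes, errors,
                      p0=[1.0, 0.5, 0.1], maxfev=10000)
a, b, c = params
print(f"fitted: a={a:.3f}, b={b:.3f}, c (asymptotic error)={c:.3f}")
print("predicted error at n=50000:", inverse_power_law(50000, a, b, c))
```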
146

Social Media Data Strategies Bankers Use to Increase Customer Market Share

Wright, Michelle Christine 01 January 2019 (has links)
Banking leaders who fail to implement social media data-driven marketing strategies lose opportunities to increase customer market share through customer acquisition and retention, improved customer service, heightened brand awareness, customer cocreation, and relationship management. The purpose of this multiple case study was to explore strategies banking leaders used to integrate social media analytics into marketing strategies to increase customer market share. The target population was 6 senior bankers from 2 banks in the mid-Atlantic region of the United States with a significant social media presence, including 25,000 or more followers across 2 social media platforms. The disruptive innovation theory served as the conceptual framework for the study. Data were collected from semistructured interviews and a review of the organizations' public business documents, websites, and social media websites. Data were analyzed using coding to determine themes. By analyzing these sources and performing methodological triangulation, 8 key themes emerged that were categorized into 4 groupings: (a) social media knowledge management, (b) social media marketing strategy implementation, (c) social media data challenges and communication, and (d) social media competitive gain and future enhancements. The implications of this study for positive social change include social and environmental benefits such as creating jobs and economic growth through a corporate social responsibility initiative. Current and prospective customer bases, local communities, bankers, and stakeholders might benefit from the findings of this study.
147

Modelos agrometeorológicos para previsão de pragas e doenças em Coffea arabica L. em Minas Gerais [Agrometeorological models for forecasting pests and diseases in Coffea arabica L. in Minas Gerais]

Aparecido, Lucas Eduardo de Oliveira. January 2019 (has links)
Advisor: Glauco de Souza Rolim / Abstract: Coffee is the most consumed beverage in the world, but phytosanitary problems are amongst the main causes of reduced productivity and quality. The application of foliar fungicides and insecticides is the most common strategy for controlling these diseases and pests, depending on their intensity in a region. This traditional method can be improved by using alert systems based on models that estimate disease and pest indices. This work has as OBJECTIVES: A) To calibrate the meteorological variables air temperature and rainfall from the European Centre for Medium-Range Weather Forecasts (ECMWF) against the real surface data measured by the national meteorological system (INMET) for the state of Minas Gerais; B) To evaluate which meteorological elements, and at what time, have the greatest influence on the main pests (coffee berry borer and coffee leaf miner) and diseases (coffee rust and cercosporiosis) of Coffea arabica in the main coffee regions of the South of Minas Gerais and the Cerrado Mineiro; C) To develop agrometeorological models for pest and disease prediction as a function of the meteorological variables of the South of Minas Gerais and the Cerrado Mineiro, using machine learning algorithms with sufficient temporal anticipation for decision making. MATERIAL AND METHODS: To achieve objective "A", monthly climatic data of air temperature (T, ºC) and rainfall (P, mm) from the ECMWF and INMET from 1979 to 2015 were used. Potential evapotranspiration was estimated by Thornthwaite (1948) and the water balance by Thornthwaite and Mather... (Complete abstract: click electronic access below) / Doctorate
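Since the methods above rely on Thornthwaite (1948) for potential evapotranspiration, a short sketch of that classical formula may help: an annual heat index is computed from the twelve monthly mean temperatures, and uncorrected monthly PET follows from it. This is a generic textbook implementation, not code from the thesis; the sample temperatures are invented, and a full implementation would also apply the day-length and month-length correction.

```python
# Hedged sketch of the classical Thornthwaite (1948) monthly PET estimate
# (uncorrected for day length); monthly mean temperatures are illustrative.

def heat_index(monthly_mean_temps):
    """Annual heat index I from 12 monthly mean temperatures (degrees C)."""
    return sum((t / 5.0) ** 1.514 for t in monthly_mean_temps if t > 0)

def thornthwaite_pet(temp_c, annual_heat_index):
    """Uncorrected monthly PET (mm) for a 30-day month with 12-hour days."""
    i = annual_heat_index
    a = 6.75e-7 * i**3 - 7.71e-5 * i**2 + 1.792e-2 * i + 0.49239
    if temp_c <= 0:
        return 0.0
    return 16.0 * (10.0 * temp_c / i) ** a

if __name__ == "__main__":
    # Illustrative monthly means (not measured station data).
    temps = [23.1, 23.4, 22.9, 21.5, 19.6, 18.2, 18.0, 19.8, 21.3, 22.5, 22.7, 22.9]
    I = heat_index(temps)
    for month, t in enumerate(temps, start=1):
        print(month, round(thornthwaite_pet(t, I), 1), "mm")
```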
148

Smart Urban Metabolism : Toward a New Understanding of Causalities in Cities

Shahrokni, Hossein January 2015 (has links)
For half a century, urban metabolism has been used to provide insights to support transitions to sustainable urban development (SUD). Information and Communication Technology (ICT) has recently been recognized as a potential technology enabler to advance this transition. This thesis explored the potential for an ICT-enabled urban metabolism framework aimed at improving resource efficiency in urban areas by supporting decision-making processes. Three research objectives were identified: i) investigation of how the urban metabolism framework, aided by ICT, could be utilized to support decision-making processes; ii) development of an ICT platform that manages real-time, high spatial and temporal resolution urban metabolism data and evaluation of its implementation; and iii) identification of the potential for efficiency improvements through the use of the resulting high spatial and temporal resolution urban metabolism data. The work to achieve these objectives was based on literature reviews, single-case study research in Stockholm, software engineering research, and big data analytics of the resulting data. The evolved framework, Smart Urban Metabolism (SUM), enabled by the emerging context of smart cities, operates at higher temporal (up to real-time) and spatial (up to household/individual) data resolution. A key finding was that the new framework overcomes some of the barriers identified for the conventional urban metabolism framework. The results confirm that there are hidden urban patterns that may be uncovered by analyzing structured big urban data. Some of those patterns may lead to the identification of appropriate intervention measures for SUD. / Smart City SRS
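To give a flavour of what household-resolution urban metabolism data can look like in practice, the sketch below rolls hourly household meter readings up into a district-level energy-per-capita indicator. It is a purely hypothetical illustration; the data model, district labels and figures are invented and do not come from the Stockholm case study.

```python
# Hypothetical sketch: rolling up real-time household meter readings into a
# district-level urban-metabolism indicator (kWh per capita for one hour).
from collections import defaultdict

def district_energy_per_capita(readings, residents_per_district):
    """readings: iterable of (district, household_id, kwh) tuples for one hour."""
    totals = defaultdict(float)
    for district, _household, kwh in readings:
        totals[district] += kwh
    return {d: totals[d] / residents_per_district[d] for d in totals}

if __name__ == "__main__":
    # Invented example data for two districts.
    hourly = [("A", 1, 1.2), ("A", 2, 0.8), ("B", 7, 2.1), ("B", 9, 1.7)]
    print(district_energy_per_capita(hourly, {"A": 5, "B": 6}))
```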
149

Scalable Embeddings for Kernel Clustering on MapReduce

Elgohary, Ahmed 14 February 2014 (has links)
There is an increasing demand from businesses and industries to make the best use of their data. Clustering is a powerful tool for discovering natural groupings in data. The k-means algorithm is the most commonly used data clustering method, having gained popularity for its effectiveness on various data sets and ease of implementation on different computing architectures. It assumes, however, that data are available in an attribute-value format, and that each data instance can be represented as a vector in a feature space where the algorithm can be applied. These assumptions are impractical for real data, and they hinder the use of complex data structures in real-world clustering applications. Kernel k-means is an effective method for data clustering which extends the k-means algorithm to work on a similarity matrix over complex data structures. The kernel k-means algorithm is, however, computationally very expensive, as it requires the complete kernel matrix to be calculated and stored. Further, the kernelized nature of the kernel k-means algorithm hinders the parallelization of its computations on modern infrastructures for distributed computing. This thesis defines a family of kernel-based low-dimensional embeddings that allows for scaling kernel k-means on MapReduce via an efficient and unified parallelization strategy. Then, three practical methods for low-dimensional embedding that adhere to our definition of the embedding family are proposed. Combining the proposed parallelization strategy with any of the three embedding methods constitutes a complete scalable and efficient MapReduce algorithm for kernel k-means. The efficiency and the scalability of the presented algorithms are demonstrated analytically and empirically.
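One well-known way to obtain a kernel-based low-dimensional embedding of the sort described here is the Nyström approximation, in which kernel values are computed only against a small set of landmark points and ordinary k-means is then run on the embedded data. The sketch below is a generic, single-machine illustration of that idea, not the specific embeddings or MapReduce parallelization proposed in the thesis; the data, landmark count and RBF gamma are arbitrary.

```python
# Hedged sketch: Nystrom-style embedding followed by ordinary k-means, as a
# generic stand-in for scalable kernel k-means (not the thesis's exact method).
import numpy as np
from scipy.linalg import pinv, sqrtm
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))                        # toy data
landmarks = X[rng.choice(len(X), size=100, replace=False)]

C = rbf_kernel(X, landmarks, gamma=0.1)                # n x m kernel block
W = rbf_kernel(landmarks, landmarks, gamma=0.1)        # m x m kernel block

# Embedding E = C @ W^{-1/2}; inner products of rows of E approximate the kernel.
W_inv_sqrt = np.real(sqrtm(pinv(W)))
E = C @ W_inv_sqrt

labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(E)
print("cluster sizes:", np.bincount(labels))
```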
150

Integrating Fuzzy Decisioning Models With Relational Database Constructs

Durham, Erin-Elizabeth A 18 December 2014 (has links)
Human learning and classification is a nebulous area in computer science. Classic decisioning problems can be solved given enough time and computational power, but discrete algorithms cannot easily solve fuzzy problems. Fuzzy decisioning can resolve more real-world fuzzy problems, but existing algorithms are often slow, cumbersome and unable to give responses within a reasonable timeframe to anything other than predetermined, small-dataset problems. We have developed a database-integrated, highly scalable solution for training and using fuzzy decision models on large datasets. The Fuzzy Decision Tree algorithm is the integration of the Quinlan ID3 decision-tree algorithm with fuzzy set theory and fuzzy logic. In existing research, when applied to the microRNA prediction problem, the Fuzzy Decision Tree outperformed other machine learning algorithms including Random Forest, C4.5, SVM and k-NN. In this research, we propose that the effectiveness with which large-dataset fuzzy decisions can be resolved via the Fuzzy Decision Tree algorithm is significantly improved when using a relational database as the storage unit for the fuzzy ID3 objects, versus traditional storage objects. Furthermore, it is demonstrated that pre-processing certain pieces of the decisioning within the database layer can lead to much swifter membership determinations, especially on Big Data datasets. The proposed algorithm uses concepts inherent to databases: separated schemas, indexing, partitioning, pipe-and-filter transformations, preprocessing of data, materialized and regular views, etc., to present a model with the potential to learn from itself. Further, this work presents a general application model for re-architecting Big Data applications in order to efficiently present decisioned results: lowering the volume of data handled by the application itself and significantly decreasing response wait times, while allowing the flexibility and permanence of a standard relational SQL database, supplying optimal user satisfaction in today's Data Analytics world. We experimentally demonstrate the effectiveness of our approach.
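As a toy illustration of pushing part of the fuzzy decisioning into the database layer, the snippet below registers a triangular membership function with SQLite and materializes membership degrees in a table, so that later tree logic can filter on stored degrees rather than recompute them. The table, column, and membership parameters are all invented for the example and are not taken from the dissertation.

```python
# Hypothetical sketch: precompute triangular fuzzy membership degrees inside the
# database so the fuzzy decision tree can query memberships instead of raw values.
import sqlite3

def triangular(x, left, peak, right):
    """Triangular membership function; returns a degree in [0, 1]."""
    if x <= left or x >= right:
        return 0.0
    return (x - left) / (peak - left) if x <= peak else (right - x) / (right - peak)

conn = sqlite3.connect(":memory:")
conn.create_function("mu_high", 1, lambda v: triangular(v, 50.0, 80.0, 100.0))

conn.executescript("""
    CREATE TABLE samples (id INTEGER PRIMARY KEY, expression REAL);
    INSERT INTO samples (expression) VALUES (42.0), (65.0), (88.0), (97.0);
    -- Materialize memberships once; the tree then filters on the stored degree.
    CREATE TABLE memberships AS
        SELECT id, mu_high(expression) AS mu_high_expr FROM samples;
""")

for row in conn.execute(
        "SELECT id, mu_high_expr FROM memberships WHERE mu_high_expr > 0.5"):
    print(row)
```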
