141

A scalable data store and analytic platform for real-time monitoring of data-intensive scientific infrastructure

Suthakar, Uthayanath January 2017 (has links)
Monitoring data-intensive scientific infrastructures in real time, covering jobs, data transfers, and hardware failures, is vital for efficient operation. Due to the high volume and velocity of the events produced, traditional methods are no longer optimal. Several techniques, as well as enabling architectures, are available to address the Big Data problem. In this respect, this thesis complements existing survey work by contributing an extensive literature review of both traditional and emerging Big Data architectures. Scalability, low latency, fault tolerance, and intelligence are key challenges for traditional architectures, whereas Big Data technologies and approaches have become increasingly popular for use cases that demand scalable, parallel, data-intensive processing, fault tolerance through data replication, and support for low-latency computation. In the context of a scalable data store and analytics platform for monitoring data-intensive scientific infrastructure, the Lambda Architecture was adapted and evaluated on the Worldwide LHC Computing Grid, where it proved effective, especially for computationally and data-intensive use cases. This thesis presents an efficient strategy for the collection and storage of large volumes of data for computation. Moving the transformation logic out of the data pipeline and into the analytics layers simplifies the architecture and the overall process: processing time is reduced, untampered raw data are kept at the storage level for fault tolerance, and the required transformations can be performed when needed. An optimised Lambda Architecture (OLA) is presented, which models an efficient way of joining the batch and streaming layers with minimal code duplication in order to support scalability, low latency, and fault tolerance. Several models were evaluated: a pure streaming layer, a pure batch layer, and a combination of the two.
Experimental results demonstrate that the OLA performed better than both the traditional architecture and the standard Lambda Architecture. The OLA was further enhanced with an intelligence layer for predicting data access patterns. This layer actively adapts and updates the model built by the batch layer, eliminating re-training time while providing a high level of accuracy using Deep Learning techniques. The fundamental contribution to knowledge is a scalable, low-latency, fault-tolerant, intelligent, and heterogeneous architecture for monitoring data-intensive scientific infrastructure that can benefit from Big Data technologies and approaches.
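The batch/streaming join that a Lambda-style design performs can be illustrated with a toy serving layer. This is a generic sketch, not the thesis's actual WLCG implementation; the site names and job counts are made up:

```python
from collections import defaultdict

class LambdaView:
    """Minimal sketch of a Lambda-style serving layer: a batch view built
    from the immutable raw event log is merged with a real-time
    (speed-layer) view covering events that arrived after the last batch run."""

    def __init__(self):
        self.batch_view = defaultdict(int)      # precomputed aggregates
        self.realtime_view = defaultdict(int)   # incremental aggregates

    def rebuild_batch(self, raw_events):
        """Batch layer: recompute the view from the full, untampered raw log."""
        view = defaultdict(int)
        for site, jobs in raw_events:
            view[site] += jobs
        self.batch_view = view
        self.realtime_view.clear()  # speed layer resets after a batch run

    def ingest(self, site, jobs):
        """Speed layer: apply the same aggregation logic incrementally."""
        self.realtime_view[site] += jobs

    def query(self, site):
        """Serving layer: merge both views to answer a query."""
        return self.batch_view[site] + self.realtime_view[site]

# Batch covers historical events; the speed layer covers fresh ones.
lv = LambdaView()
lv.rebuild_batch([("CERN", 120), ("FNAL", 80), ("CERN", 30)])
lv.ingest("CERN", 5)
print(lv.query("CERN"))  # 155
```

Sharing the same aggregation logic between `rebuild_batch` and `ingest` is the kind of code-duplication problem the OLA aims to minimise.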
142

Catching the flu : syndromic surveillance, algorithmic governmentality and global health security

Roberts, Stephen L. January 2018 (has links)
This thesis offers a critical analysis of the rise of syndromic surveillance systems for the advanced detection of pandemic threats within contemporary global health security frameworks. The thesis traces the iterative evolution and ascendancy of three such novel syndromic surveillance systems for the strengthening of health security initiatives over the past two decades: 1) the Program for Monitoring Emerging Diseases (ProMED-mail); 2) the Global Public Health Intelligence Network (GPHIN); and 3) HealthMap. This thesis demonstrates how each newly introduced syndromic surveillance system has become increasingly oriented towards the integration of digital algorithms into core surveillance capacities, in order to continually harness and forecast from ever-expanding sets of digital, open-source data potentially indicative of forthcoming pandemic threats. This thesis argues that the increased centrality of the algorithm within these next-generation syndromic surveillance systems produces a new and distinct form of infectious disease surveillance for the governing of emergent pathogenic contingencies. Conceptually, the thesis also shows how the rise of this algorithmic mode of infectious disease surveillance produces divergences in the governmental rationalities of global health security, leading to the rise of an algorithmic governmentality within contemporary contexts of Big Data and these surveillance systems. Empirically, this thesis demonstrates how this new form of algorithmic infectious disease surveillance has been rapidly integrated into diplomatic, legal, and political frameworks to strengthen the practice of global health security, producing subtle yet distinct shifts in the outbreak notification and reporting transparency of states, which are increasingly scrutinized by the algorithmic gaze of syndromic surveillance.
143

Multi-agent-based DDoS detection on big data systems

Osei, Solomon January 2018 (has links)
The Hadoop framework has become the most widely deployed platform for processing Big Data. Despite its advantages, Hadoop's infrastructure is still deployed within a secured network perimeter because the framework lacks adequate inherent security mechanisms against various threats. However, this approach does not provide an adequate security layer against attacks such as Distributed Denial of Service (DDoS). Furthermore, current work to secure Hadoop's infrastructure against DDoS attacks is unable to provide a distributed node-level detection mechanism. This thesis presents a software agent-based framework that allows distributed, real-time intelligent monitoring and detection of DDoS attacks at Hadoop's node level. The agent's cognitive system is built around the cumulative sum (CUSUM) statistical technique to analyse network utilisation and average server load and to detect attacks from these measurements. The framework is a multi-agent architecture with transducer agents that interface with each Hadoop node to provide a real-time detection mechanism. Moreover, the agents contextualise their beliefs by training themselves with the contextual information of each node, and they monitor the node's activities to differentiate between normal and anomalous behaviour. In the experiments, the framework was exposed to TCP SYN and UDP flooding attacks during a legitimate MapReduce job on the Hadoop testbed. The experimental results were evaluated with respect to performance metrics such as false-positive ratio, false-negative ratio, and response time to attack. The results show that UDP and TCP SYN flooding attacks can be detected and confirmed on multiple nodes within nineteen seconds, with a 5.56% false-positive ratio, a 7.70% false-negative ratio, and a 91.5% detection success rate. These results represent an improvement over the state-of-the-art.
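The node-level detection described above rests on the cumulative sum (CUSUM) change-detection technique applied to metrics such as server load. A minimal one-sided CUSUM sketch follows; the slack `k`, threshold `h`, and load values are illustrative assumptions, not the thesis's configuration:

```python
def cusum_detect(samples, target_mean, k=0.5, h=5.0):
    """One-sided CUSUM: accumulate deviations above target_mean + k and
    flag an alarm once the cumulative sum exceeds threshold h.
    Returns the index of the first alarming sample, or None."""
    s = 0.0
    for i, x in enumerate(samples):
        s = max(0.0, s + (x - target_mean - k))  # reset at zero, never negative
        if s > h:
            return i
    return None

# Normal server load hovers around 1.0, then a simulated flood pushes it up.
load = [1.0, 1.1, 0.9, 1.0, 1.2, 4.0, 4.5, 5.0, 4.8]
print(cusum_detect(load, target_mean=1.0))  # 6
```

The small persistent deviations before the flood are absorbed by the slack term, so the detector alarms only when the shift is sustained.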
144

Social Media Footprints of Public Perception on Energy Issues in the Conterminous United States

Leifer, David 01 August 2019 (has links)
Energy has been at the top of the national and global political agenda along with other
145

Big Data Analytics and Engineering for Medicare Fraud Detection

Unknown Date (has links)
The United States (U.S.) healthcare system produces an enormous volume of data, with a vast number of financial transactions generated by physicians administering healthcare services. This makes healthcare fraud difficult to detect, especially when there are considerably fewer fraudulent transactions than non-fraudulent ones. Fraud is an extremely important issue for healthcare, as fraudulent activities within the U.S. healthcare system contribute to significant financial losses. In the U.S., the elderly population continues to rise, increasing the need for programs, such as Medicare, to help with the associated medical expenses. Unfortunately, due to healthcare fraud, these programs are being adversely affected, draining resources and reducing the quality and accessibility of necessary healthcare services. In response, advanced data analytics have recently been explored to detect possible fraudulent activities. The Centers for Medicare and Medicaid Services (CMS) released several ‘Big Data’ Medicare claims datasets for different parts of their Medicare program to help facilitate this effort. In this dissertation, we employ three CMS Medicare Big Data datasets to evaluate the fraud detection performance of advanced data analytics techniques, specifically machine learning. We use two distinct approaches, designated anomaly detection and traditional fraud detection, each with its own data processing and feature engineering. The anomaly detection experiments classify by provider specialty, determining whether outlier physicians within the same specialty signal fraudulent behavior. Traditional fraud detection refers to experiments that directly classify physicians as fraudulent or non-fraudulent, leveraging machine learning algorithms to discriminate between the classes.
We present our novel data engineering approaches for both anomaly detection and traditional fraud detection, including data processing, fraud mapping, and the creation of a combined dataset consisting of all three Medicare parts. We incorporate the List of Excluded Individuals and Entities database to identify real-world fraudulent physicians for model evaluation. Regarding features, the final datasets for anomaly detection contain only claim counts for every procedure a physician submits, while traditional fraud detection incorporates aggregated counts and payment information, specialty, and gender. Additionally, we compare cross-validation to the real-world application of building a model on a training dataset and evaluating it on a separate test dataset under severe class imbalance and rarity. / Includes bibliography. / Dissertation (Ph.D.)--Florida Atlantic University, 2019. / FAU Electronic Theses and Dissertations Collection
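The anomaly detection approach described above, flagging outlier physicians within a specialty, can be sketched with a simple z-score rule. The providers, claim counts, and threshold below are fabricated for illustration and are not the dissertation's data or method details:

```python
import math

def specialty_outliers(claims_by_provider, threshold=2.0):
    """Flag providers whose total claim count lies more than `threshold`
    standard deviations from their specialty's mean. A toy stand-in for
    anomaly detection by provider specialty."""
    values = list(claims_by_provider.values())
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return [p for p, v in claims_by_provider.items()
            if std > 0 and abs(v - mean) / std > threshold]

# One (hypothetical) dermatologist bills far more than peers in the specialty.
dermatology = {"prov_a": 110, "prov_b": 95, "prov_c": 105,
               "prov_d": 100, "prov_e": 98, "prov_f": 900}
print(specialty_outliers(dermatology))  # ['prov_f']
```

Real pipelines would normalise per procedure code and per patient volume, but the within-specialty comparison is the core idea.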
146

Predicting Melanoma Risk from Electronic Health Records with Machine Learning Techniques

Unknown Date (has links)
Melanoma is one of the fastest growing cancers in the world, and can affect patients earlier in life than most other cancers. Therefore, it is imperative to be able to identify patients at high risk for melanoma and enroll them in screening programs to detect the cancer early. Electronic health records collect an enormous amount of data about real-world patient encounters, treatments, and outcomes. This data can be mined to increase our understanding of melanoma as well as build personalized models to predict risk of developing the cancer. Cancer risk models built from structured clinical data are limited in current research, with most studies involving just a few variables from institutional databases or registries. This dissertation presents data processing and machine learning approaches to build melanoma risk models from a large database of de-identified electronic health records. The database contains consistently captured structured data, enabling the extraction of hundreds of thousands of data points each from millions of patient records. Several experiments are performed to build effective models, particularly to predict sentinel lymph node metastasis in known melanoma patients and to predict individual risk of developing melanoma. Data for these models suffer from high dimensionality and class imbalance. Thus, classifiers such as logistic regression, support vector machines, random forest, and XGBoost are combined with advanced modeling techniques such as feature selection and data sampling. Risk factors are evaluated using regression model weights and decision trees, while personalized predictions are provided through random forest decomposition and Shapley additive explanations. Random undersampling on the melanoma risk dataset shows that many majority samples can be removed without a decrease in model performance. 
To determine how much data is truly needed, we explore learning curve approximation methods on the melanoma data and three publicly available large-scale biomedical datasets. We apply an inverse power law model and introduce a novel semi-supervised curve creation method that utilizes a small amount of labeled data. / Includes bibliography. / Dissertation (Ph.D.)--Florida Atlantic University, 2019. / FAU Electronic Theses and Dissertations Collection
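An inverse power law learning curve of the kind mentioned above can be fitted by ordinary least squares in log-log space. This is a generic sketch on synthetic points, not the dissertation's exact procedure or data:

```python
import math

def fit_inverse_power_law(sizes, errors):
    """Fit error ≈ a * n**(-b) by least squares on (log n, log error).
    Assumes error decays monotonically toward zero with training size n."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(e) for e in errors]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    a = math.exp(my - slope * mx)
    return a, -slope  # so that error = a * n**(-b)

# A synthetic curve generated with a=2.0, b=0.5 is recovered from 5 points.
sizes = [100, 200, 400, 800, 1600]
errors = [2.0 * n ** -0.5 for n in sizes]
a, b = fit_inverse_power_law(sizes, errors)
print(round(a, 3), round(b, 3))  # 2.0 0.5
```

Extrapolating the fitted curve to larger n gives an estimate of how much additional data would still reduce error, which is the question the learning-curve experiments address.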
147

Social Media Data Strategies Bankers Use to Increase Customer Market Share

Wright, Michelle Christine 01 January 2019 (has links)
Banking leaders who fail to implement social media data-driven marketing strategies lose opportunities to increase customer market share through customer acquisition and retention, improved customer service, heightened brand awareness, customer cocreation, and relationship management. The purpose of this multiple case study was to explore strategies banking leaders used to integrate social media analytics into marketing strategies to increase customer market share. The target population was 6 senior bankers from 2 banks in the mid-Atlantic region of the United States with a significant social media presence, defined as 25,000 or more followers across 2 social media platforms. The disruptive innovation theory served as the conceptual framework for the study. Data were collected from semistructured interviews and a review of the organizations' public business documents, websites, and social media websites. Data were analyzed using coding to determine themes. By analyzing these sources and performing methodological triangulation, 8 key themes emerged that were categorized into 4 groupings: (a) social media knowledge management, (b) social media marketing strategy implementation, (c) social media data challenges and communication, and (d) social media competitive gain and future enhancements. The implications of this study for positive social change include social and environmental benefits such as creating jobs and economic growth through a corporate social responsibility initiative. Current and prospective customer bases, local communities, bankers, and stakeholders might benefit from the findings of this study.
148

Modelos agrometeorológicos para previsão de pragas e doenças em Coffea arabica L. em Minas Gerais /

Aparecido, Lucas Eduardo de Oliveira. January 2019 (has links)
Advisor: Glauco de Souza Rolim / Abstract: Coffee is the most consumed beverage in the world, but phytosanitary problems are amongst the main causes of reduced productivity and quality. The application of foliar fungicides and insecticides is the most common strategy for controlling these diseases and pests, depending on their intensity in a region. This traditional method can be improved by using alert systems built on models that estimate disease and pest indices. This work has as OBJECTIVES: A) To calibrate the meteorological variables air temperature and rainfall from the European Centre for Medium-Range Weather Forecasts (ECMWF) against the real surface data measured by the national meteorological system (INMET) for the state of Minas Gerais; B) To evaluate which meteorological elements, and at what time, have the greatest influence on the main pests (coffee berry borer and coffee leaf miner) and diseases (coffee rust and cercosporiosis) of Coffea arabica in the main coffee-growing regions of the South of Minas Gerais and the Cerrado Mineiro; C) To develop agrometeorological models for pest and disease prediction as a function of the meteorological variables of the South of Minas Gerais and the Cerrado Mineiro, using machine learning algorithms with sufficient temporal anticipation for decision making. MATERIAL AND METHODS: For objective "A", monthly air temperature (T, ºC) and rainfall (P, mm) data from the ECMWF and INMET from 1979 to 2015 were used. Potential evapotranspiration was estimated by Thornthwaite (1948) and the water balance by Thornthwaite and Mathe... (Complete abstract: click electronic access below) / Doutor
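Objective B above amounts to screening which meteorological variable best tracks a pest or disease index, for which a correlation measure is a natural first step. The sketch below uses Pearson correlation on fabricated monthly series, not the thesis's Minas Gerais data:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series,
    used here to screen meteorological drivers of a disease index."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical monthly series: rust incidence tracks rainfall more closely
# than it tracks temperature in this made-up example.
rainfall = [220, 180, 150, 60, 30, 15, 10, 12, 40, 120, 180, 210]
temp     = [23, 23, 22, 21, 19, 18, 18, 20, 22, 23, 23, 23]
rust     = [55, 48, 40, 20, 12, 8, 6, 7, 15, 35, 46, 52]
print(pearson(rainfall, rust) > pearson(temp, rust))  # True
```

A screened variable would then feed the machine learning models of objective C, lagged far enough ahead of the outbreak to be useful for decision making.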
149

Smart Urban Metabolism : Toward a New Understanding of Causalities in Cities

Shahrokni, Hossein January 2015 (has links)
For half a century, urban metabolism has been used to provide insights to support transitions to sustainable urban development (SUD). Information and Communication Technology (ICT) has recently been recognized as a potential enabler of this transition. This thesis explored the potential of an ICT-enabled urban metabolism framework aimed at improving resource efficiency in urban areas by supporting decision-making processes. Three research objectives were identified: i) investigation of how the urban metabolism framework, aided by ICT, could be utilized to support decision-making processes; ii) development of an ICT platform that manages real-time, high spatial and temporal resolution urban metabolism data, and evaluation of its implementation; and iii) identification of the potential for efficiency improvements through the use of the resulting high spatial and temporal resolution urban metabolism data. The work to achieve these objectives was based on literature reviews, single-case study research in Stockholm, software engineering research, and big data analytics of the resulting data. The evolved framework, Smart Urban Metabolism (SUM), enabled by the emerging context of smart cities, operates at higher temporal (up to real-time) and spatial (up to household/individual) data resolution. A key finding was that the new framework overcomes some of the barriers identified for the conventional urban metabolism framework. The results confirm that there are hidden urban patterns that may be uncovered by analyzing structured big urban data. Some of those patterns may lead to the identification of appropriate intervention measures for SUD. / Smart City SRS
150

Scalable Embeddings for Kernel Clustering on MapReduce

Elgohary, Ahmed 14 February 2014 (has links)
There is an increasing demand from businesses and industries to make the best use of their data. Clustering is a powerful tool for discovering natural groupings in data. The k-means algorithm is the most commonly used data clustering method, having gained popularity for its effectiveness on various data sets and ease of implementation on different computing architectures. It assumes, however, that data are available in an attribute-value format and that each data instance can be represented as a vector in a feature space where the algorithm can be applied. These assumptions are impractical for real data, and they hinder the use of complex data structures in real-world clustering applications. Kernel k-means is an effective method for data clustering that extends the k-means algorithm to work on a similarity matrix over complex data structures. The kernel k-means algorithm is, however, computationally very complex, as it requires the complete kernel matrix to be calculated and stored. Further, the kernelized nature of the algorithm hinders the parallelization of its computations on modern infrastructures for distributed computing. This thesis defines a family of kernel-based low-dimensional embeddings that allows for scaling kernel k-means on MapReduce via an efficient and unified parallelization strategy. Three practical methods for low-dimensional embedding that adhere to this definition of the embedding family are then proposed. Combining the proposed parallelization strategy with any of the three embedding methods constitutes a complete, scalable, and efficient MapReduce algorithm for kernel k-means. The efficiency and scalability of the presented algorithms are demonstrated analytically and empirically.
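The general idea of a kernel-based low-dimensional embedding can be illustrated with a simple landmark construction: map each point to its kernel similarities against a few landmark points, then run ordinary (easily parallelised) k-means on the embedded vectors instead of forming the full kernel matrix. This is a generic illustrative stand-in, not one of the thesis's three proposed methods:

```python
import math

def rbf(x, y, gamma=0.5):
    """Gaussian (RBF) kernel between two points given as tuples."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))

def landmark_embedding(points, landmarks, gamma=0.5):
    """Embed each point as its vector of kernel similarities to a small
    set of landmarks. Needs only |points| x |landmarks| kernel evaluations
    rather than the full |points| x |points| kernel matrix, and each row
    can be computed independently (e.g. in a map task)."""
    return [[rbf(p, l, gamma) for l in landmarks] for p in points]

# Two well-separated groups map to distinct regions of the embedded space.
group_a = [(0.0, 0.1), (0.1, 0.0), (0.05, 0.05)]
group_b = [(5.0, 5.1), (5.1, 5.0), (5.05, 5.05)]
landmarks = [(0.0, 0.0), (5.0, 5.0)]
emb = landmark_embedding(group_a + group_b, landmarks)
print(all(e[0] > e[1] for e in emb[:3]),
      all(e[1] > e[0] for e in emb[3:]))  # True True
```

Because each embedded row depends only on the point and the landmarks, the embedding step maps naturally onto MapReduce, which is the property the thesis's unified parallelization strategy exploits.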
