  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Tackling the Antibiotic Resistant Bacteria Crisis Using Longitudinal Antibiograms

Tlachac, Monica 31 May 2018
Antibiotic-resistant bacteria, a growing health crisis, arise from antibiotic overuse and misuse. Resistant infections endanger the lives of patients and are financially burdensome. Aggregate antimicrobial susceptibility reports, called antibiograms, are critical for tracking antibiotic susceptibility and for evaluating how likely different antibiotics are to treat an infection effectively before patient-specific susceptibility data are available. This research leverages the Massachusetts Statewide Antibiogram database, a rich dataset composed of antibiograms for 754 antibiotic-bacteria pairs collected by the Massachusetts Department of Public Health from 2002 to 2016. However, these antibiograms are at least a year old, meaning antibiotics are prescribed based on outdated data, which unnecessarily furthers resistance. Our objective is to apply data science techniques to these antibiograms to assist in developing more responsible antibiotic prescription practices. First, we use model selectors with regression-based techniques to forecast the current antimicrobial resistance. Next, we develop an assistant to immediately identify clinically and statistically significant changes in antimicrobial resistance between years once the most recent year of antibiograms is collected. Lastly, we use k-means clustering on resistance trends to detect antibiotic-bacteria pairs with resistance trends for which forecasting will not be effective. These three strategies can be implemented to guide more responsible antibiotic prescription practices and thus reduce unnecessary increases in antibiotic resistance.
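The forecasting step described in this abstract can be illustrated with a minimal sketch. It assumes a plain least-squares linear trend over yearly susceptibility percentages; the thesis itself uses model selectors over several regression-based techniques, which are not reproduced here. The function names and the sample values are hypothetical.

```python
def fit_line(years, values):
    """Ordinary least-squares fit of values = intercept + slope * year."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def forecast(years, values, target_year):
    """Extrapolate the fitted trend to target_year, clamped to 0-100 percent."""
    intercept, slope = fit_line(years, values)
    prediction = intercept + slope * target_year
    return min(100.0, max(0.0, prediction))


# Hypothetical susceptibility percentages for one antibiotic-bacteria pair.
years = [2012, 2013, 2014, 2015, 2016]
susceptible_pct = [92.0, 90.0, 88.0, 86.0, 84.0]
print(forecast(years, susceptible_pct, 2017))
```

A single linear trend is the simplest candidate such a model selector might consider; pairs whose history is not well described by any trend are the ones the abstract proposes to flag through clustering.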
22

Avaliação e seleção de modelos em detecção não supervisionada de outliers / On the internal evaluation of unsupervised outlier detection

Marques, Henrique Oliveira 23 March 2015
A área de detecção de outliers (ou detecção de anomalias) possui um papel fundamental na descoberta de padrões em dados que podem ser considerados excepcionais sob alguma perspectiva. Uma importante distinção se dá entre as técnicas supervisionadas e não supervisionadas. O presente trabalho enfoca as técnicas de detecção não supervisionadas. Existem dezenas de algoritmos desta categoria na literatura, porém cada um deles utiliza uma intuição própria do que deve ser considerado um outlier ou não, que é naturalmente um conceito subjetivo. Isso dificulta sensivelmente a escolha de um algoritmo em particular e também a escolha de uma configuração adequada para o algoritmo escolhido em uma dada aplicação prática. Isso também torna altamente complexo avaliar a qualidade da solução obtida por um algoritmo/configuração em particular adotados pelo analista, especialmente em função da problemática de se definir uma medida de qualidade que não seja vinculada ao próprio critério utilizado pelo algoritmo. Tais questões estão inter-relacionadas e se referem respectivamente aos problemas de seleção de modelos e avaliação (ou validação) de resultados em aprendizado de máquina não supervisionado. Neste trabalho foi desenvolvido um índice pioneiro para avaliação não supervisionada de detecção de outliers. O índice, chamado IREOS (Internal, Relative Evaluation of Outlier Solutions), avalia e compara diferentes soluções (top-n, i.e., rotulações binárias) candidatas baseando-se apenas nas informações dos dados e nas próprias soluções a serem avaliadas. O índice também é ajustado estatisticamente para aleatoriedade e extensivamente avaliado em vários experimentos envolvendo diferentes coleções de bases de dados sintéticas e reais. / Outlier detection (or anomaly detection) plays an important role in the pattern discovery from data that can be considered exceptional in some sense. An important distinction is that between the supervised and unsupervised techniques. 
In this work we focus on unsupervised outlier detection techniques. There are dozens of algorithms in this category in the literature; however, each uses its own intuition to judge what should be considered an outlier, which is naturally a subjective concept. This substantially complicates the selection of a particular algorithm and the choice of an appropriate parameter configuration for a given algorithm in a practical application. It also makes it highly complex to evaluate the quality of the solution obtained by the algorithm or configuration adopted by the analyst, especially given the difficulty of defining a quality measure that is not tied to the criterion used by the algorithm itself. These issues are interrelated and refer, respectively, to the problems of model selection and evaluation (or validation) of results in unsupervised learning. Here we developed a pioneering index for the unsupervised evaluation of outlier detection results. The index, called IREOS (Internal, Relative Evaluation of Outlier Solutions), can evaluate and compare different candidate solutions (top-n, i.e., binary labelings) based only on the data and the solutions to be evaluated. The index is also statistically adjusted for chance and extensively evaluated in several experiments involving different collections of synthetic and real data sets.
23

Caracterização de classes e detecção de outliers em redes complexa / Characterization of classes and outliers detection in complex networks

Berton, Lilian 25 April 2011
As redes complexas surgiram como uma nova e importante maneira de representação e abstração de dados capaz de capturar as relações espaciais, topológicas, funcionais, entre outras características presentes em muitas bases de dados. Dentre as várias abordagens para a análise de dados, destacam-se a classificação e a detecção de outliers. A classificação de dados permite atribuir uma classe aos dados, baseada nas características de seus atributos e a detecção de outliers busca por dados cujas características se diferem dos demais. Métodos de classificação de dados e de detecção de outliers baseados em redes complexas ainda são pouco estudados. Tendo em vista os benefícios proporcionados pelo uso de redes complexas na representação de dados, o presente trabalho apresenta o desenvolvimento de um método baseado em redes complexas para detecção de outliers que utiliza a caminhada aleatória e um índice de dissimilaridade. Este método possibilita a identificação de diferentes tipos de outliers usando a mesma medida. Dependendo da estrutura da rede, os vértices outliers podem ser tanto aqueles distantes do centro como os centrais, podem ser hubs ou vértices com poucas ligações. De um modo geral, a medida proposta é uma boa estimadora de vértices outliers em uma rede, identificando, de maneira adequada, vértices com uma estrutura diferenciada ou com uma função especial na rede. Foi proposta também uma técnica de construção de redes capaz de representar relações de similaridade entre classes de dados, baseada em uma função de energia que considera medidas de pureza e extensão da rede. Esta rede construída foi utilizada para caracterizar mistura entre classes de dados. A caracterização de classes é uma questão importante na classificação de dados, porém ainda é pouco explorada. 
Considera-se que o trabalho desenvolvido é uma das primeiras tentativas nesta direção / Complex networks have emerged as a new and important way of representing and abstracting data, capable of capturing spatial, topological, functional, and other relationships present in many databases. Among the various approaches to data analysis, we highlight classification and outlier detection. Data classification assigns a class to the data based on the characteristics of their attributes, whereas outlier detection searches for data whose characteristics differ from the others. Methods of data classification and outlier detection based on complex networks are still little studied. Given the benefits provided by the use of complex networks in data representation, this study developed a complex-network method for outlier detection that uses a random walk and a dissimilarity index. The method allows different types of outliers to be identified using the same measure. Depending on the structure of the network, the outlier vertices can be either those distant from the center or the central ones, and can be hubs or vertices with few connections. In general, the proposed measure is a good estimator of outlier vertices in a network, properly identifying vertices with a distinct structure or a special function in the network. We also propose a technique for building networks capable of representing similarity relationships between classes of data, based on an energy function that considers measures of purity and extension of the network. The resulting network was used to characterize mixing among data classes. Characterization of classes is an important issue in data classification, but it is still little explored. We consider this work to be one of the first attempts in this direction.
24

Identificação de outliers em redes complexas baseado em caminhada aleatória / Outlier detection in complex networks based on random walk

Araújo, Bilzã Marques de 20 September 2010
Na natureza e na ciência, dados e informações que desviam significativamente da média frequentemente possuem grande relevância. Esses dados são usualmente denominados na literatura como outliers. A identificação de outliers é importante em muitas aplicações reais, tais como detecção de fraudes, diagnóstico de falhas, e monitoramento de condições médicas. Nos últimos anos tem-se testemunhado um grande interesse na área de Redes Complexas. Redes complexas são grafos de grande escala que possuem padrões de conexão não trivial, mostrando-se uma poderosa maneira de representação e abstração de dados. Embora um grande montante de resultados tenham sido reportados nesta área de pesquisa, pouco tem sido explorado acerca de detecção de outliers em redes complexas. Considerando-se a dinâmica de uma caminhada aleatória, foram propostos neste trabalho uma medida de distância e um método de ranqueamento de outliers. Através desta técnica, é possível detectar como outlier não somente nós periféricos, mas também nós centrais (hubs), dependendo da estrutura da rede. Também foi identificado que existem características bem definidas entre os nós outliers, relacionadas a funcionalidade dos mesmos para a rede. Além disso, foi descoberto que nós outliers têm papel importante para a rotulação a priori na tarefa de detecção de comunidades semi-supervisionada. Isto porque os nós centrais são bons difusores de informação e os nós periféricos encontram-se em regiões de borda de comunidade. Baseado nessa observação, foi proposto um método de detecção de comunidades semi-supervisionado. Os resultados de simulações mostram que essa abordagem é promissora / In nature and science, information and data that deviate significantly from the average value often have great relevance. Such data are usually called outliers in the literature. Outlier identification is important in many real applications, such as fraud detection, fault diagnosis, and monitoring of medical conditions.
In recent years, there has been great interest in the area of Complex Networks. Complex networks are large-scale graphs with non-trivial connection patterns, and they have proved to be a powerful means of data representation and abstraction. Although a large number of results have been reported in this research area, little has been explored about outlier detection in complex networks. Considering the dynamics of a random walk, we propose in this work a distance measure and an outlier ranking method. Using this technique, we can detect not only peripheral nodes but also central nodes (hubs) as outliers, depending on the network structure. We also identified well-defined characteristics among the outlier nodes, related to their function in the network. Furthermore, we found that outliers play an important role in the a priori labeling of nodes in the task of semi-supervised community detection. This is because hubs are good information disseminators and peripheral nodes are usually located in the border regions of communities. Based on this observation, we propose a method for semi-supervised community detection. The simulation results show that this approach is promising.
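The idea that random-walk dynamics can expose both hubs and peripheral nodes can be illustrated with a small sketch. This is not the distance measure or ranking method proposed in the thesis, which the abstract does not specify. It is a swapped-in stand-in that assumes an undirected, connected graph with no isolated nodes, estimates how often a lazy random walk visits each node, and ranks nodes by how far that visiting probability lies from the average. All names and the example graph are hypothetical.

```python
def stationary_distribution(adjacency, steps=1000):
    """Power iteration for a lazy random walk on an undirected graph.

    The walker stays put with probability 1/2 and otherwise moves to a
    uniformly chosen neighbour, which avoids oscillation on bipartite graphs.
    """
    nodes = list(adjacency)
    prob = {v: 1.0 / len(nodes) for v in nodes}
    for _ in range(steps):
        nxt = {v: 0.5 * prob[v] for v in nodes}
        for v in nodes:
            share = 0.5 * prob[v] / len(adjacency[v])
            for w in adjacency[v]:
                nxt[w] += share
        prob = nxt
    return prob


def rank_outliers(adjacency):
    """Order nodes by how far their visiting probability is from the mean."""
    prob = stationary_distribution(adjacency)
    mean = sum(prob.values()) / len(prob)
    return sorted(prob, key=lambda v: abs(prob[v] - mean), reverse=True)


# Hypothetical graph: hub "h" joined to a ring a..f, plus a pendant node "p".
graph = {
    "h": ["a", "b", "c", "d", "e", "f"],
    "a": ["h", "b", "f", "p"],
    "b": ["h", "a", "c"],
    "c": ["h", "b", "d"],
    "d": ["h", "c", "e"],
    "e": ["h", "d", "f"],
    "f": ["h", "e", "a"],
    "p": ["a"],
}
print(rank_outliers(graph)[:2])
```

On this toy graph the two highest-ranked nodes are the hub and the pendant node, which mirrors the abstract's observation that either central or peripheral nodes can be outliers depending on the network structure.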
25

Técnica de aprendizado semissupervisionado para detecção de outliers / A semi-supervised technique for outlier detection

Zamoner, Fabio Willian 23 January 2014
Detecção de outliers desempenha um importante papel para descoberta de conhecimento em grandes bases de dados. O estudo é motivado por inúmeras aplicações reais como fraudes de cartões de crédito, detecção de falhas em componentes industriais, intrusão em redes de computadores, aprovação de empréstimos e monitoramento de condições médicas. Um outlier é definido como uma observação que desvia das outras observações em relação a uma medida e exerce considerável influência na análise de dados. Embora existam inúmeras técnicas de aprendizado de máquina para tratar desses problemas, a maioria delas não faz uso de conhecimento prévio sobre os dados. Técnicas de aprendizado semissupervisionado para detecção de outliers são relativamente novas e incluem apenas um pequeno número de rótulos da classe normal para construir um classificador. Recentemente um modelo semissupervisionado baseado em rede foi proposto para classificação de dados empregando um mecanismo de competição e cooperação de partículas. As partículas são responsáveis pela propagação dos rótulos para toda a rede. Neste trabalho, o modelo foi adaptado a fim de detectar outliers através da definição de um escore de outlier baseado na frequência de visitas. O número de visitas recebido por um outlier é significativamente diferente dos demais objetos de mesma classe. Essa abordagem leva a uma maneira não tradicional de tratar os outliers. Avaliações empíricas sobre bases artificiais e reais demonstram que a técnica proposta funciona bem para bases desbalanceadas e atinge precisão comparável às obtidas pelas técnicas tradicionais de detecção de outliers. Além disso, a técnica pode fornecer novas perspectivas sobre como diferenciar objetos, pois considera não somente a distância física, mas também a formação de padrão dos dados / Outlier detection plays an important role in discovering knowledge in large data sets.
The study is motivated by a plethora of real applications such as credit card fraud, fault detection in industrial components, network intrusion detection, loan application processing, and medical condition monitoring. An outlier is defined as an observation that deviates from other observations with respect to a measure and exerts a substantial influence on data analysis. Although numerous machine learning techniques have been developed to address this problem, most of them use no prior knowledge of the data. Semi-supervised outlier detection techniques are relatively new and use only a small number of labels from the normal class to build a classifier. Recently, a network-based semi-supervised model was proposed for data classification, employing a mechanism based on particle competition and cooperation. The particles are responsible for propagating labels throughout the network. In this work, we adapt this model by defining a new outlier score based on visit frequency counting. The number of visits received by an outlier is significantly different from that of the remaining objects of the same class. This approach leads to an unorthodox way of dealing with outliers. Our empirical evaluations on both real and simulated data sets demonstrate that the proposed technique works well with unbalanced data sets and achieves a precision comparable to that of traditional outlier detection techniques. Moreover, the technique might provide new insights into how to differentiate objects, because it considers not only the physical distance but also the pattern formation of the data.
26

Distributed Local Outlier Factor with Locality-Sensitive Hashing

Zheng, Lining 08 November 2019
Outlier detection remains an active research area due to its essential role in a wide range of applications, including intrusion detection, fraud detection in finance, and medical diagnosis. Local Outlier Factor (LOF) has been one of the most influential outlier detection techniques over the past decades. LOF has distinctive advantages on skewed datasets with regions of various densities. However, the traditional centralized LOF faces new challenges in the era of big data and no longer satisfies the rigid time constraints required by many modern applications, due to its high computational overhead. A few researchers have explored distributed solutions for LOF, but existing methods are limited by their grid-based data partitioning strategy, which falls short when applied to high-dimensional data. In this thesis, we study efficient distributed solutions for LOF. A baseline MapReduce solution for LOF implemented with Apache Spark, named MR-LOF, is introduced. We demonstrate its disadvantages in communication cost and execution time through complexity analysis and experimental evaluation. Then an approximate LOF method is proposed, which relies on locality-sensitive hashing (LSH) for partitioning data and enables fully distributed local computation. We name it MR-LOF-LSH. To further improve the approximate LOF, we introduce a process called cross-partition updating. With cross-partition updating, the actual global k-nearest neighbors (k-NN) of the outlier candidates are found, and the related information of the neighbors is used to update the outlier scores of the candidates. The experimental results show that MR-LOF achieves a speedup of up to 29 times over the centralized LOF. MR-LOF-LSH further reduces the execution time by a factor of up to 9.9 compared to MR-LOF. The results also highlight that MR-LOF-LSH scales well as the cluster size increases.
Moreover, with a sufficient candidate size, MR-LOF-LSH is able to detect in most scenarios over 90% of the top outliers with the highest LOF scores computed by the centralized LOF algorithm.
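For readers unfamiliar with the score being distributed here, the following is a minimal brute-force sketch of centralized LOF. It is not the MR-LOF or MR-LOF-LSH implementation from the thesis. It assumes distinct points, uses exactly k neighbours per point rather than the full tie-inclusive k-distance neighbourhood of the original LOF definition, and relies on `math.dist` (Python 3.8 or later). The function names and sample points are hypothetical.

```python
import math


def k_nearest(points, i, k):
    """Return (distance, index) pairs for the k nearest neighbours of point i."""
    dists = sorted(
        (math.dist(points[i], points[j]), j)
        for j in range(len(points))
        if j != i
    )
    return dists[:k]


def lof_scores(points, k):
    """Compute a Local Outlier Factor score for every point (O(n^2) distances)."""
    n = len(points)
    neighbours = [k_nearest(points, i, k) for i in range(n)]
    # k-distance: distance from each point to its k-th nearest neighbour.
    k_distance = [nb[-1][0] for nb in neighbours]

    # Local reachability density: inverse of the mean reachability distance,
    # where reach(i, j) = max(k_distance(j), dist(i, j)).
    lrd = []
    for i in range(n):
        reach = [max(k_distance[j], d) for d, j in neighbours[i]]
        lrd.append(len(reach) / sum(reach))

    # LOF: average density of the neighbours divided by the point's own density.
    return [
        sum(lrd[j] for _, j in neighbours[i]) / (len(neighbours[i]) * lrd[i])
        for i in range(n)
    ]


# Hypothetical data: a tight cluster plus one distant point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5), (8.0, 8.0)]
scores = lof_scores(points, k=2)
print(max(range(len(scores)), key=scores.__getitem__))
```

Scores near 1 indicate a point whose local density matches that of its neighbours, while scores well above 1 indicate an outlier. The all-pairs distance computation in `k_nearest` is the cost that motivates partitioning the data across a cluster.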
27

Adaptive Variation in Tiger Salamander Populations

Parsley, Meghan 01 October 2017
Amphibians face an unknown future in a time of rapid environmental change due to global climate perturbations. Since amphibians are perceived to be indicators of ecosystem health, understanding the causes of their declines can improve our perception of threats to other species. Molecular techniques have allowed us to explore how environmental change affects genetic variation and to predict evolutionary adaptive potential of amphibian populations. The identification of populations with the greatest potential to respond to changing environmental variables may be an important conservation strategy to aid in future management efforts. I utilized targeted exon capture sequencing to identify adaptive variation in California tiger salamanders (CTS; Ambystoma californiense), a species threatened by land use change and hybridization with barred tiger salamanders (A. mavortium). I identified 17 and 26 outlier loci for balancing selection in historic and recent samples of CTS respectively. The outlier loci corresponded to genes of various functions, though none of the outliers associated significantly with the change in several tested environmental variables. Despite the lack of environmental correlations detected, it must also be considered that the outlier loci could be involved in epistatic interactions where many genes with small effects influence a single phenotype with fitness benefits. Additional hypotheses to explain the observed changes in allele frequencies and outliers may be the effects of UV-B radiation, pesticide use, or indirect effects of climate change.
28

Performance monitoring of wind turbines : a data-mining approach

Verma, Anoop Prakash 01 July 2012
The rapid growth of wind turbines in terms of turbine size, number of installations, and rated capacity has had a huge impact on their operations and maintenance costs. Monitoring the performance of wind turbines and predicting faults early are highly desirable. To date, traditional maintenance strategies such as reactive and periodic maintenance have been more prevalent in the wind industry. However, over the last couple of years, research on wind turbines has shifted towards condition monitoring and maintenance. Condition monitoring approaches have shown their potential in the wind industry by providing continuous monitoring of wind turbines and identifying fault signatures when faults occur. However, most of the studies reported in the literature are based on simulated datasets or constrained experiments. In reality, the external environment plays an important role in governing turbine operations. Moreover, the cost associated with condition monitoring cannot be justified, as it often requires the installation of specific sensors and equipment. Another stream of research focuses on using historical turbine data to assess turbine performance in real time. The cost associated with such approaches is almost negligible, as most wind farms are equipped with SCADA systems that record turbine performance data at regular time intervals. Such approaches are called performance monitoring. In this dissertation, the performance monitoring of wind turbines is accomplished using historical wind turbine data. The information from SCADA operational data and fault logs is used to construct accurate models that predict critical wind turbine faults. Depending on the nature of the turbine faults, monitoring with different objectives is studied to accomplish different research goals.
Two research directions in wind turbine performance are pursued: (1) identification and prediction of critical turbine faults, and (2) monitoring the performance of the overall wind farm. The goal of predicting critical faults is to facilitate planned maintenance, whereas monitoring the performance of the overall wind farm provides the current status of all wind turbines installed in the farm. Depending on the requirement, the performance of the overall wind farm can be assessed on a daily, weekly, or monthly basis. The solution methodologies presented in the dissertation are generic enough to be applicable to other domains, such as wastewater treatment facilities and flood prediction.
29

Airway on a chip: Data processing of occluded pulmonary airway reopening at bifurcations

January 2013
In the reopening of fluid occluded airways, the pressure gradient due to the propagation of an air bubble causes extensive epithelial cell damage. The mechanism of cell necrosis and biotransport may be further understood by characterizing the flow fields near the tip of a semi-infinite bubble propagating through a fluid-filled bifurcation. A symmetric microfluidic pulmonary bifurcation model was fabricated for optical diagnostics with an instantaneous μ-PIV/ shadowgraphy microscopy system. Data handling and processing techniques were developed to calculate interfacial characteristics of multiphase flow from the microscopy system and accuracy was quantified through varying the apparatus set up. Differences in the interfacial geometric characteristics were quantified for changes in static and dynamic surface tension in comparisons of water, SDS, and Infasurf that may reflect changes in the mechanical stress that stimulate, and potentially damage, epithelial cells that line the airways. From these results, the asymmetrical tendencies of opening a symmetric pulmonary bifurcation model were quantified. It was found that pulmonary surfactant stabilized symmetric bifurcations that opened asymmetrically without the aid of surfactant. / acase@tulane.edu
30

Modelling and grey-box identification of curl and twist in paperboard manufacturing

Bortolin, Gianantonio January 2005
The contents of this thesis can be divided into two main parts. The first is the development of an identification methodology for the modelling of complex industrial processes. The second is the application of this methodology to the curl and twist problem. The main purpose of the proposed methodology is to provide a schematic plan, together with some suggested tools, for anyone confronted with the challenge of building a complex model of an industrial process. Particular attention has been paid to outlier detection and data analysis when building a model from old, or historical, process data. Another aspect carefully handled in the proposed methodology is identifiability analysis. In fact, it is rather common in process modelling that the model structure turns out to be weakly identifiable. Consequently, the problem of variable selection is treated at length in this thesis, and a new algorithm for variable selection based on regularization has been proposed and compared with some of the classical methods, yielding promising results. The second part of the thesis concerns the development of a curl predictor. Curl is the tendency of paper to assume a curved shape and is observed mainly during humidity changes. Curl in paper and paperboard is a long-standing problem because it may seriously affect the processing of the paper. Unfortunately, curl cannot be measured online, but only in the laboratory after an entire tambour has been produced. The main goal of this project is therefore to develop a model for curl and twist, and eventually to implement it as an on-line predictor to be used by operators and process engineers as a tool for decision and control. The approach we used to tackle this problem is based on grey-box modelling. The reason for such an approach is that the physical process is very complex and nonlinear.
The influence of some inputs is not entirely understood, and it also depends on a number of unknown parameters and unmodelled or unmeasurable disturbances. Simulations on real data show good agreement with the measurements, particularly for MD and CD curl, and hence we believe that the model is accurate enough to be implemented as an on-line predictor. / QC 20100928
