Global ETD Search

1	Optimization-Based Network Analysis with Applications in Clustering and Data Mining Shahinpour, Shahram 16 December 2013 In this research we develop theoretical foundations and efficient solution methods for two classes of cluster-detection problems from an optimization point of view. In particular, the s-club model and the biclique model are considered because of their wide range of applications. An analytical review of the optimization problems is followed by the theoretical results and algorithmic solution methods developed in this research. The maximum s-club problem has applications in graph-based data mining and robust network design, where high reachability is often considered a critical property. The massive size of real-life instances makes a scalable solution method necessary for practical purposes. Moreover, the lack of the heredity property in s-clubs poses challenges in the design of optimization algorithms. Motivated by these properties, we propose a sufficient condition for checking the maximality, by inclusion, of a given s-club. The sufficient condition can be employed in the design of optimization algorithms to reduce the computational effort. A variable neighborhood search algorithm is proposed for the maximum s-club problem to facilitate the solution of large instances with reasonable computational effort. In addition, a hybrid exact algorithm has been developed for the problem. Inspired by the wide usability of bipartite graphs in modeling and data mining, we consider three classes of the maximum biclique problem: the maximum edge biclique, the maximum vertex biclique, and the maximum balanced biclique problems. Asymptotic lower and upper bounds on the size of these structures in uniform random graphs are developed. These bounds give insight into the evolution and growth rate of bicliques in large-scale graphs.
To overcome the computational difficulty of solving large instances, a scale-reduction technique for the maximum vertex and maximum edge biclique problems in general graphs is proposed. The procedure shrinks the underlying network by identifying and removing edges that cannot be in the optimal solution, thus enabling exact solution methods to solve large-scale sparse instances to optimality. A combinatorial branch-and-bound algorithm is also developed; it is best suited to dense instances, where the scale-reduction method might be less effective. The proposed algorithms are flexible and, with small modifications, can solve the weighted versions of the problems. s-club biclique cluster-detection clustering scale-reduction asymptotic bounds
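The lack of heredity in s-clubs noted in the abstract above can be made concrete with a small check. The sketch below is not the thesis's algorithm; it only assumes the standard definition (a vertex set is an s-club if the subgraph it induces has diameter at most s) and uses a hypothetical adjacency-list dictionary `adj` and a toy star graph.

```python
from collections import deque

def is_s_club(adj, nodes, s):
    """Return True if the subgraph induced by `nodes` has diameter <= s."""
    nodes = set(nodes)
    for source in nodes:
        # Breadth-first search restricted to the induced subgraph, cut off at depth s.
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            if dist[u] == s:
                continue  # vertices farther than s are out of reach by definition
            for v in adj[u]:
                if v in nodes and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) < len(nodes):
            return False  # some vertex is more than s steps from `source`
    return True

# Toy star graph (illustrative only): center 0 with leaves 1, 2, 3.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
whole_star_is_2_club = is_s_club(star, [0, 1, 2, 3], 2)
leaves_are_2_club = is_s_club(star, [1, 2, 3], 2)
```

The whole star is a 2-club, but the leaves alone are not, because removing the center disconnects them. A subset of an s-club therefore need not be an s-club, which is why pruning rules that work for cliques do not carry over directly.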
2	Semi-supervised Information Fusion for Clustering, Classification and Detection Applications Li, Huaying January 2017 Information fusion techniques have been widely applied in many areas, including clustering, classification, and detection. The major objective is to improve performance by using information derived from multiple sources rather than information obtained from any single source. In our previous work, we demonstrated the performance improvement of electroencephalography (EEG)-based seizure detection using information fusion. In the detection problem, the optimal fusion rule is usually derived under the assumption that local decisions are conditionally independent given the hypotheses. However, because local detectors observe the same phenomenon, it is highly likely that local decisions are correlated. To address the issue of correlation, we implement the fusion rule sub-optimally by first estimating the unknown parameters under one of the hypotheses and then treating them as known when estimating the remaining unknown parameters. In the aforementioned scenario, the hypotheses are uniquely defined, i.e., all local detectors follow the same labeling convention. However, in certain applications, the regions of interest (decisions, hypotheses, clusters, etc.) are not unique, i.e., they may vary locally (from source to source). In this case, information fusion becomes more complicated. Historically, this problem was first observed in classification and clustering. In classification applications, the category information is pre-defined and training data is required. Therefore, a classification problem can be viewed as a detection problem by considering the pre-defined classes as the hypotheses in detection. However, information fusion in clustering applications is more difficult because of the lack of prior information and the correspondence problem caused by symbolic cluster labels.
In the literature, information fusion in clustering is usually referred to as the clustering ensemble problem. Most existing clustering ensemble methods are unsupervised. In this thesis, we propose two semi-supervised clustering ensemble algorithms (SEA). Similar to existing ensemble methods, SEA consists of two major steps: the generation and the fusion of base clusterings. Analogous to distributed detection, we propose a distributed clustering system that consists of a base clustering generator and a decision fusion center. The role of the base clustering generator is to generate multiple base clusterings for the given data set. The role of the decision fusion center is to combine all base clusterings into a single consensus clustering. Although training data is not required by conventional clustering algorithms (which are usually unsupervised), in many applications expert opinions are available to label a small portion of the data observations. These labels can be used as guidance information in the fusion process. Therefore, we design two operational modes for the fusion center according to the absence or presence of training data. In the unsupervised mode, any existing unsupervised clustering ensemble method can be implemented as the fusion rule. In the semi-supervised mode, the proposed semi-supervised clustering ensemble methods can be implemented. In addition, a parallel distributed clustering system is proposed to reduce the computation time of clustering high-volume data sets. Moreover, we propose a new cluster detection algorithm based on SEA. It is implemented in the system to provide feedback information. When data observations from a new class (other than the existing training classes) are detected, a signal is sent out to request new training data or to switch from the semi-supervised mode to the unsupervised mode. / Thesis / Doctor of Philosophy (PhD)
3	Anomaly detection with extreme value and uncertainty considerations Dudgeon, Shelby Hart 13 December 2024 This dissertation examines a method for detecting clusters in financial loan amount data. After a literature review of scan statistics, order statistics, and extreme value theory, this study introduces a method that uses a scan statistic approach for anomaly detection, along with a tuning parameter that can help account for model uncertainty. Once these methods have been applied to the lower tail of the financial data and clusters have been detected, they are extended and modified to better handle the upper tail of the data. The upper tail is first fit using a peaks-over-threshold approach. The data in the upper tail are then transformed using the generalized Pareto CDF, and the scan-based method is applied to the transformed data to identify anomalous loan amounts in the upper tail. In a case study, these methods were applied to two banks that participated in the Paycheck Protection Program, a program that was previously linked with misreporting and fraud.
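The upper-tail transform described in the abstract above can be sketched as follows. This is a minimal illustration, not the dissertation's implementation: the threshold and the generalized Pareto scale `sigma` and shape `xi` are assumed to have been fitted already by a peaks-over-threshold step, and the loan amounts and parameter values are made up. Under a well-fitting model the transformed excesses are approximately uniform on (0, 1), which gives a scan over the unit interval a simple baseline to compare against.

```python
import math

def gpd_cdf(y, sigma, xi):
    """CDF of the generalized Pareto distribution for an excess y over the threshold."""
    if y < 0:
        return 0.0
    if abs(xi) < 1e-12:
        # The shape xi -> 0 limit is the exponential distribution.
        return 1.0 - math.exp(-y / sigma)
    z = 1.0 + xi * y / sigma
    if z <= 0:
        return 1.0  # beyond the finite upper endpoint that exists when xi < 0
    return 1.0 - z ** (-1.0 / xi)

def transform_upper_tail(values, threshold, sigma, xi):
    """Map every value above the threshold to its GPD probability in [0, 1]."""
    return [gpd_cdf(v - threshold, sigma, xi) for v in values if v > threshold]

# Hypothetical loan amounts and already-fitted parameters (illustrative only).
amounts = [5.0, 12.0, 20.0]
u_values = transform_upper_tail(amounts, threshold=10.0, sigma=10.0, xi=1.0)
```

Values at or below the threshold are dropped because the peaks-over-threshold model only describes the excesses.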
4	Spatial Variation in Risk Factors for Malaria in Muleba, Tanzania Thickstun, Charles Russell 18 April 2019 Despite the rich knowledge of risk factors for malaria, the spatial processes of malaria transmission and vector control interventions are underexplored. This thesis aims 1) to describe the spatial variation of risk factor effects on malaria infection, and 2) to determine the presence and range of any community effect from malaria vector control interventions. Data from a cluster-randomized controlled trial in Tanzania were analyzed to determine the geographically weighted odds of malaria infection in children at trial baseline and post-intervention. The spatial range of intervention effects on malaria infection was estimated post-intervention using semivariance models. Spatial heterogeneity was found in malaria infection and in each covariate under study. The median effective semivariance range of intervention effects was approximately 1200 meters, suggesting the presence of a community effect that may cause contamination between trial clusters. Trials should consider these spatial effects when examining interventions and ensure that clusters are adequately insulated from contamination. Malaria Geographically-weighted Regression Cluster Detection Community Effect Buffer Semivariance Spatial Analysis
5	Evaluation des méthodes statistiques en épidémiologie spatiale : cas des méthodes locales de détection d'agrégats / Evaluation of statistical methods in spatial epidemiology: the case of cluster detection tests Guttmann, Aline 27 November 2014 Although performance assessment of cluster detection tests is a critical issue in spatial epidemiology, there is, paradoxically, no consensus on how it should be carried out. The issue is all the more important because new information-sharing technologies promise major changes in the data sources available for epidemiology and health surveillance, which will increase the need for performance evaluation. Field specialists currently evaluate cluster detection tests with several complementary performance indicators, such as indicators derived from the evaluation of diagnostic tools or various definitions of conditional power. These evaluations follow classical simulation protocols for power assessment and are often limited to a small number of simulated alternative hypotheses, which restricts the interpretation and scope of the results. Furthermore, because the indicators used vary from one protocol to another, comparison between studies is difficult and results are harder to interpret. This work proposes and evaluates several global performance indicators that take into account both the usual power and location accuracy. Their benefit for the systematic spatial evaluation of cluster detection tests is illustrated by the creation of performance maps. In addition to evaluating performance when clusters exist, we also propose a method for evaluating the spatial distribution of type I error, together with a new statistical test for an edge effect. Performance Spatial epidemiology Cluster detection tests Performance evaluation
6	Three Dimensional Spatio-Temporal Cluster Analysis of SARS-CoV-2 Infections Allison, Keith W 28 June 2022 The COVID-19 pandemic has heightened the need for fine-scale analysis of the clustering of infectious disease cases in order to better understand and prevent the localized spread of infection. The students living on the University of Massachusetts, Amherst campus provided a unique opportunity to do so because of frequent mandatory testing during the 2020-2021 academic year and dense living conditions. The Southwest dormitory area is of particular interest because of its extremely high population density; it houses around half of the students living on campus under normal conditions. Using data gathered by the Public Health Promotion Center (PHPC), we analyzed the clustering of SARS-CoV-2 cases in three-dimensional space as well as time within and between the three tallest occupied buildings in the Southwest dormitory area: John Quincy Adams, Kennedy, and Coolidge. We used the SaTScan program and its Space-Time Permutation Model, which searches for areas with a greater than expected number of cases. Analysis was done at various levels of spatial detail. Additionally, this analysis was compared to a purely temporal surveillance method, CDC’s Early Aberration Reporting System (EARS). Analysis with SaTScan at the room and floor levels showed multiple significant clusters within the Coolidge dormitory building. Floor-level analysis was found to be as sensitive as room-level analysis and less burdensome. We recommend using scan statistics in conjunction with other methods, such as purely temporal scans and wastewater analysis, to detect and respond to outbreaks on campus. COVID-19 SARS-CoV-2 Cluster detection Outbreak detection Spatial analysis Temporal analysis Biostatistics Epidemiology
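The "greater than expected number of cases" in the abstract above depends on how the expected count is defined. In a space-time permutation model, the expected count for a location and time period is the location's total cases multiplied by the period's total cases, divided by the overall total, so no population data are needed. The sketch below computes that quantity for a candidate space-time window; it is not SaTScan code, and the floor labels and week numbers are hypothetical.

```python
from collections import Counter

def expected_cases(cases, zones, times):
    """Expected number of cases in the window `zones` x `times`.

    cases: list of (zone, time) pairs, one pair per observed case.
    Assumes no space-time interaction, i.e. each zone's share of cases
    is the same in every time period.
    """
    total = len(cases)
    by_zone = Counter(zone for zone, _ in cases)
    by_time = Counter(time for _, time in cases)
    return sum(by_zone[z] * by_time[t] for z in zones for t in times) / total

def observed_cases(cases, zones, times):
    """Number of cases that actually fall inside the window."""
    return sum(1 for zone, time in cases if zone in zones and time in times)

# Hypothetical cases: (floor label, week number).
cases = [("floor1", 1), ("floor1", 1), ("floor1", 2), ("floor2", 2)]
expected = expected_cases(cases, zones={"floor1"}, times={1})
observed = observed_cases(cases, zones={"floor1"}, times={1})
```

A scan would evaluate many such windows and rank them by how far the observed count exceeds the expected one. Significance testing by random permutation of the case times is omitted here.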
7	CLUSTER AND COLLECT : Compile Time Optimization For Effective Garbage Collection Ravindar, Archana 05 1900 No description available. Compilers Garbage (Computer Science) Object Clustering Cluster Detection Algorithm Cluster (Computing) Cluster Analysis Clustering Garbage Collection (Computer Science) Computer Science
8	多重群集的偵測研究 / A study of methods for detecting multiple clusters 黃柏誠, Huang, Bo Cheng Unknown Date Cluster detection, that is, identifying areas with higher disease incidence rates, has in recent years been one of the main applications of spatial statistics in epidemiology. Well-known detection methods include SaTScan (Kulldorff, 1995) and the Spatial Scan Statistic (Li et al., 2011). Most of these methods detect clusters in a single pass by comparing the relative risk inside and outside each suspected cluster, which is computationally efficient and examines all suspected clusters at once. However, single-pass detection is affected by other high-incidence clusters outside the candidate cluster and is overly conservative for clusters with smaller relative risk (Zhang et al., 2010). The goal of this study is to detect multiple clusters by applying sequential search to SaTScan and similar methods, screening out significant high-incidence clusters one at a time, in order to improve the power of detecting clusters with smaller relative risk, and to explore when the sequential method is appropriate and what its limitations are. Computer simulation and an empirical study of Taiwan cancer mortality data are used to evaluate the sequential SaTScan and to compare the detection results. We found that the sequential method improves the power of cluster detection and is especially effective for clusters with relative risk not greater than 1.6; when the relative risk is greater than 1.6, the detection power of SaTScan is not affected by multiple clusters. However, the sequential method also suffers from identifying false clusters. Cluster detection Spatial statistics Sequential method Computer simulation
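The masking effect described in the abstract above, where a strong cluster hides a weaker one in a single pass, can be illustrated with a toy calculation. The sketch below is not the thesis's sequential SaTScan: it assumes the usual Poisson log-likelihood ratio for a candidate zone, restricts candidates to single regions, skips Monte Carlo significance testing, and uses made-up counts. The names `poisson_llr` and `sequential_scan` are hypothetical.

```python
import math

def poisson_llr(c, e, total):
    """Log-likelihood ratio for a zone with c observed and e expected cases
    out of `total` cases overall; zero unless the zone shows an excess."""
    if c <= e:
        return 0.0
    inside = c * math.log(c / e)
    rest = total - c
    outside = rest * math.log(rest / (total - e)) if rest > 0 else 0.0
    return inside + outside

def sequential_scan(observed, population, rounds):
    """Greedy sketch: pick the region with the largest ratio, drop it,
    recompute expectations from the remaining regions, and repeat."""
    remaining = set(observed)
    found = []
    for _ in range(rounds):
        total = sum(observed[r] for r in remaining)
        pop = sum(population[r] for r in remaining)
        if total == 0 or len(remaining) < 2:
            break
        scores = {
            r: poisson_llr(observed[r], total * population[r] / pop, total)
            for r in remaining
        }
        best = max(scores, key=scores.get)
        if scores[best] == 0.0:
            break  # no region shows an excess any more
        found.append(best)
        remaining.remove(best)
    return found

# Made-up counts: A is a strong cluster, B a weaker one, equal populations.
observed = {"A": 30, "B": 15, "C": 10, "D": 10}
population = {"A": 100, "B": 100, "C": 100, "D": 100}
clusters = sequential_scan(observed, population, rounds=2)
```

In the first round region B has fewer cases than its expected 16.25, because A inflates the overall rate, so its ratio is zero. Once A is removed, B's expected count falls to about 11.7 and its excess becomes visible.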
9	焦點檢定方法比較 / A simulation study for evaluating focused tests of cluster detection 蔡丞庭 Unknown Date Cancer incidence and mortality rates in Taiwan have been increasing over the past 30 years. Previous studies indicate that pollution sources, especially those creating air pollution and excess radiation, are one of the potential causes of the increase. Detecting whether cancer clusters exist near possible pollution sources can therefore help future cancer prevention. In spatial statistics, many methods can be used to detect clustering; those that test whether clustering occurs around a specific location are called focused tests. We introduce and evaluate frequently used focused tests and apply the better-performing ones to suspected pollution sources in Taiwan. First, we use computer simulation to compare the power of the focused tests under different scenarios, such as the size of the study region and the cluster shape. Next, we apply the focused tests to township-level Taiwan cancer mortality data to determine whether cancer mortality rates are higher around the Chinshan nuclear power plant, the Maanshan nuclear power plant, and the Mailiao sixth naphtha cracker. The results show that cancer mortality rates around the Chinshan nuclear power plant and the Mailiao sixth naphtha cracker are significantly higher. cluster detection focused test cancer mortality power computer simulation
