Global ETD Search

11	Text Clustering with String Kernels in R Karatzoglou, Alexandros, Feinerer, Ingo January 2006 (has links) (PDF) We present a package which provides a general framework, including tools and algorithms, for text mining in R using the S4 class system. Using this package and the kernlab R package we explore the use of kernel methods for clustering (e.g., kernel k-means and spectral clustering) on a set of text documents, using string kernels. We compare these methods to a more traditional clustering technique, k-means on a bag-of-words representation of the text, and evaluate the viability of kernel-based methods as a text clustering technique. (author's abstract) / Series: Research Report Series / Department of Statistics and Mathematics
12	Improving Search Results with Automated Summarization and Sentence Clustering Cotter, Steven 23 March 2012 (has links) Have you ever searched for something on the web and been overloaded with irrelevant results? Many search engines tend to cast a very wide net and rely on ranking to show you the relevant results first. But, this doesn't always work. Perhaps the occurrence of irrelevant results could be reduced if we could eliminate the unimportant content from each webpage while indexing. Instead of casting a wide net, maybe we can make the net smarter. Here, I investigate the feasibility of using automated document summarization and clustering to do just that. The results indicate that such methods can make search engines more precise, more efficient, and faster, but not without costs. / McAnulty College and Graduate School of Liberal Arts / Computational Mathematics / MS / Thesis
13	Experimental Designs for Generalized Linear Models and Functional Magnetic Resonance Imaging January 2014 (has links) abstract: In this era of fast computational machines and new optimization algorithms, there have been great advances in experimental designs. We focus our research on design issues in generalized linear models (GLMs) and functional magnetic resonance imaging (fMRI). The first part of our research tackles the challenging problem of constructing exact designs for GLMs that are robust against parameter, link and model uncertainties, by improving an existing algorithm and providing a new one based on a continuous particle swarm optimization (PSO) and spectral clustering. The proposed algorithm is sufficiently versatile to accommodate most popular design selection criteria, and we concentrate on providing robust designs for GLMs using the D- and A-optimality criteria. The second part of our research provides an algorithm that is a faster alternative to a recently proposed genetic algorithm (GA) for constructing optimal designs for fMRI studies. Our algorithm is built upon a discrete version of the PSO. / Dissertation/Thesis / Doctoral Dissertation Statistics 2014 Statistics fMRI GLMs Locally optimal designs PSO Robust designs spectral clustering
14	Detection and Analysis of Online Extremist Communities Benigni, Matthew Curran 01 May 2017 (has links) Online social networks have become a powerful venue for political activism. In many cases large, insular online communities form that have been shown to be powerful diffusion mechanisms of both misinformation and propaganda. In some cases these groups' users advocate actions or policies that could be construed as extreme along nearly any distribution of opinion, and are thus called Online Extremist Communities (OECs). Although these communities appear increasingly common, little is known about how these groups form or the methods used to influence them. The work in this thesis provides researchers with a methodological framework to study these groups by answering three critical research questions: How can we detect large dynamic online activist or extremist communities? What automated tools are used to build, isolate, and influence these communities? What methods can be used to gain novel insight into large online activist or extremist communities? These group members' social ties can be inferred based on the various affordances offered by OSNs for group curation. By developing heterogeneous, annotated graph representations of user behavior I can efficiently extract online activist discussion cores using an ensemble of unsupervised machine learning methods. I call this technique Ensemble Agreement Clustering. Through manual inspection, these discussion cores can then often be used as training data to detect the larger community. I present a novel supervised learning algorithm called Multiplex Vertex Classification for network bipartition on heterogeneous, annotated graphs. This methodological pipeline has also proven useful for social botnet detection, and a study of large, complex social botnets used for propaganda dissemination is provided as well.
Throughout this thesis I provide Twitter case studies including communities focused on the Islamic State of Iraq and al-Sham (ISIS), the ongoing Syrian Revolution, the Euromaidan Movement in Ukraine, as well as the alt-Right. Covert Network Detection Community Detection Annotated Networks Multilayer Networks Heterogeneous Networks Spectral Clustering
15	p-Laplacian Spectral Clustering Applied in Software Testing / p-Laplacian Spektralklustring tillämpat på mjukvarutestning Ghafoory, Jones January 2019 (has links) Software testing plays a vital role in the software development life cycle. A more accurate and cost-efficient testing process is still in demand in industry. Thus, test optimization has become an important topic in both the state of the art and the state of the practice. Software testing today can be performed manually, automatically or semi-automatically. A manual test procedure is still popular, for instance for testing safety-critical systems. To test a software product manually, we need to create a set of manual test case specifications. The number of test cases required depends on the product size, its complexity, company policies, etc. Moreover, generating and executing test cases manually is a time- and resource-consuming process. Ranking the test cases for execution can therefore help reduce the testing cost and release the product to market faster. In order to rank test cases for execution, we need to distinguish test cases from each other. In other words, the properties of each test case should be detected in advance. Requirement coverage has been identified as a critical criterion for test case optimization. In this thesis we propose an approach based on $p$-Laplacian Spectral Clustering for detecting the traceability matrix between manual test cases and the requirements, in order to find the requirement coverage of the test cases. The feasibility of the proposed approach is studied through an empirical evaluation performed on a railway use case at Bombardier Transportation in Sweden. In the experiments, the proposed method achieved an $F_1$-score of up to $4.4\%$. Although the proposed approach under-performed on this specific problem compared to previous studies, it yielded some insights into the limitations of $p$-Laplacian Spectral Clustering and how it could potentially be modified for similar kinds of problems. Applied mathematics Software testing Spectral Clustering Computational Mathematics
16	Computational Study of Calmodulin’s Ca2+-dependent Conformational Ensembles Westerlund, Annie M. January 2018 (has links) Ca2+ and calmodulin play important roles in many physiologically crucial pathways. The conformational landscape of calmodulin is intriguing. Conformational changes allow for binding target-proteins, while binding Ca2+ yields population shifts within the landscape. Thus, target-proteins become Ca2+-sensitive upon calmodulin binding. Calmodulin regulates more than 300 target-proteins, and mutations are linked to lethal disorders. The mechanisms underlying Ca2+ and target-protein binding are complex and pose interesting questions. Such questions are typically addressed with experiments which fail to provide simultaneous molecular and dynamics insights. In this thesis, questions on binding mechanisms are probed with molecular dynamics simulations together with tailored unsupervised learning and data analysis. In Paper 1, a free energy landscape estimator based on Gaussian mixture models with cross-validation was developed and used to evaluate the efficiency of regular molecular dynamics compared to temperature-enhanced molecular dynamics. This comparison revealed interesting properties of the free energy landscapes, highlighting different behaviors of the Ca2+-bound and unbound calmodulin conformational ensembles. In Paper 2, spectral clustering was used to shed light on Ca2+ and target-protein binding. With these tools, it was possible to characterize differences in target-protein binding depending on Ca2+-state as well as N-terminal or C-terminal lobe binding. This work invites data-driven analysis into the field of biomolecule molecular dynamics, provides further insight into calmodulin’s Ca2+ and target-protein binding, and serves as a stepping-stone towards a complete understanding of calmodulin’s Ca2+-dependent conformational ensembles.
/ QC 20180912 Molecular dynamics Calmodulin Free energy estimation Gaussian mixture models Spectral clustering conformational selection Biophysics Biofysik
17	Classification spectrale semi-supervisée : Application à la supervision de l'écosystème marin / Constrained spectral clustering : Application to the monitoring of the marine ecosystem Wacquet, Guillaume 08 December 2011 (has links) In decision support systems, abundant numerical data are generally available, possibly along with some qualitative contextual knowledge, either available a priori or provided a posteriori through feedback. The performance of classification approaches, particularly spectral ones, depends on how this knowledge is integrated into their design. Spectral classification algorithms treat classification as a graph-cut problem. They classify the data in the eigenspace of the graph Laplacian matrix. This space is expected to better reveal the presence of natural, linearly separable clusters. In this work, we are particularly interested in algorithms that integrate knowledge in the form of pairwise constraints: constrained spectral clustering. In this case, the spectral space must reveal the class structure while respecting the pairwise constraints as far as possible. We present a state of the art of constrained semi-supervised spectral approaches. We propose a new algorithm that generates a projection subspace by optimizing a normalized multicut criterion with adjustment of the penalty coefficients due to the constraints. The performance of the algorithm is demonstrated on several datasets by comparison with other algorithms from the literature. As part of the monitoring of the marine ecosystem, we developed an automatic classification system for phytoplankton cells analyzed by flow cytometry. For this purpose, we proposed measuring the similarities between cells by elastic comparison of their characteristic profile signals. Spectral clustering Pairwise Constraints Dimensionality reduction Phytoplankton Marine ecosystem
18	Robust Image Segmentation applied to Magnetic Resonance and Ultrasound Images of the Prostate Ghose, Soumya 01 October 2012 (has links) (PDF) Prostate segmentation in trans-rectal ultrasound (TRUS) and magnetic resonance images (MRI) facilitates volume estimation, multi-modal image registration, surgical planning and image guided prostate biopsies. The objective of this thesis is to develop shape and region prior deformable models for accurate, robust and computationally efficient prostate segmentation in TRUS and MRI images. The primary contribution of this thesis is in adopting a probabilistic learning approach to achieve soft classification of the prostate for automatic initialization and evolution of shape and region prior deformable models for prostate segmentation in TRUS images. Two deformable models are developed for the purpose. An explicit shape and region prior deformable model is derived from principal component analysis (PCA) of the contour landmarks obtained from the training images and PCA of the probability distribution inside the prostate region. Moreover, an implicit deformable model is derived from PCA of the signed distance representation of the labeled training data, and curve evolution is guided by the energy minimization framework of the Mumford-Shah (MS) functional. Region based energy is determined from region based statistics of the posterior probabilities. A graph cut energy minimization framework is adopted for prostate segmentation in MRI. Posterior probabilities obtained in a supervised learning schema and from a probabilistic segmentation of the prostate using an atlas are fused in the logarithmic domain to reduce segmentation error. Finally, a graph cut energy minimization in the stochastic framework achieves prostate segmentation in MRI. Statistically significant improvements in segmentation accuracies are achieved compared to some of the works in the literature.
Stochastic representation of the prostate region and use of the probabilities in optimization significantly improve segmentation accuracies. Prostate segmentation TRUS MRI Random forest spectral clustering
19	Solutions parallèles pour les grands problèmes de valeurs propres issus de l'analyse de graphe / Parallel solutions for large-scale eigenvalue problems arising in graph analytics Fender, Alexandre 13 December 2017 (has links) Graphs, or networks, are mathematical structures that represent relations between elements. These systems can be analyzed to extract information about the overall structure or about individual components. The analysis of networks often results in problems of high complexity. At large scale, the exact solution is prohibitively expensive to compute. Fortunately, iterative approximation methods can be employed to obtain accurate estimates. Historical methods suited to a small number of variables do not scale to the large, sparse matrices arising in graph applications. Therefore, the design of reliable, scalable and efficient solvers remains an essential problem. At the same time, the emergence of parallel architectures such as the GPU opens new perspectives, with advances in both computing power and energy efficiency. In this dissertation, we focus on solving large eigenvalue problems arising in network analytics, with the goal of efficiently utilizing parallel architectures. We revisit spectral graph analysis theory and propose novel parallel algorithms and implementations. Experimental results indicate substantial improvements on real, large applications in the context of ranking and clustering problems, such as popularity indicators and community detection. Numerical methods Data science GPU Pagerank Spectral clustering Parallelism
20	Clustering Based Outlier Detection for Improved Situation Awareness within Air Traffic Control / Förbättrad översiktsbild inom flygtrafikledning med hjälp av klusterbaserad anomalidetektering Gustavsson, Hanna January 2019 (has links) The aim of this thesis is to examine clustering based outlier detection algorithms on their ability to detect abnormal events in flight traffic. A nominal model is trained on a data-set containing only flights which are labeled as normal. A detection scoring function based on the nominal model is used to decide whether a new, previously unseen data-point behaves like the nominal model or not. Because the structure of the data-set is unknown, three different clustering algorithms are examined for training the nominal model: K-means, Gaussian Mixture Model and Spectral Clustering. Depending on the nominal model, different methods are used to obtain a detection score, such as metric distance, probability and One-class Support Vector Machine. This thesis concludes that a clustering based outlier detection algorithm is feasible for detecting abnormal events in flight traffic. The best performance was obtained by using Spectral Clustering combined with a One-class Support Vector Machine. The accuracy on the test data-set was 95.8%. The algorithm correctly classified 89.4% of the data-points labeled as abnormal and 96.2% of the data-points labeled as normal. Applied Mathematics Clustering Spectral Clustering Graph Theory GMM Outlier Detection Mathematics
