Global ETD Search

41	Statistical Learning and Analysis on Homology-Based Features / Statistisk analys och maskininlärning med homologibaserad data Agerberg, Jens January 2020 (has links) Stable rank has recently been proposed as an invariant to encode the result of persistent homology, a method used in topological data analysis. In this thesis we develop methods for statistical analysis as well as machine learning methods based on stable rank. As stable rank may be viewed as a mapping to a Hilbert space, a kernel can be constructed from the inner product in this space. First, we investigate this kernel in the context of kernel learning methods such as support-vector machines. Next, using the theory of kernel embedding of probability distributions, we give a statistical treatment of the kernel by showing some of its properties and develop a two-sample hypothesis test based on the kernel. As an alternative approach, a mapping to a Euclidean space with learnable parameters can be conceived, serving as an input layer to a neural network. The developed methods are first evaluated on synthetic data. Then the two-sample hypothesis test is applied on the OASIS open access brain imaging dataset. Finally a graph classification task is performed on a dataset collected from Reddit. / Stable rank har föreslagits som en sammanfattning på datanivå av resultatet av persistent homology, en metod inom topologisk dataanalys. I detta examensarbete utvecklar vi metoder inom statistisk analys och maskininlärning baserade på stable rank. Eftersom stable rank kan ses som en avbildning i ett Hilbertrum kan en kärna konstrueras från inre produkten i detta rum. Först undersöker vi denna kärnas egenskaper när den används inom ramen för maskininlärningsmetoder som stödvektormaskin (SVM). Därefter, med grund i teorin för inbäddning av sannolikhetsfördelningar i reproducing kernel Hilbertrum, undersöker vi hur kärnan kan användas för att utveckla ett test för statistisk hypotesprövning. Slutligen, som ett alternativ till metoder baserade på kärnor, utvecklas en avbildning i ett euklidiskt rum med optimerbara parametrar, som kan användas som ett ingångslager i ett neuralt nätverk. Metoderna utvärderas först på syntetisk data. Vidare utförs ett statistiskt test på OASIS, ett öppet dataset inom neuroradiologi. Slutligen utvärderas metoderna på klassificering av grafer, baserat på ett dataset insamlat från Reddit. / <p>QC 20200523</p> Applied topology topological data analysis TDA persistent homology machine learning Topologisk dataanalys maskininlärning Probability Theory and Statistics Sannolikhetsteori och statistik
42	Unveiling patterns in data: harnessing computational topology in machine learning Soham Mukherjee (17874230) 31 January 2024 (has links) <p dir="ltr">Topological Data Analysis (TDA) with its roots embedded in the field of algebraic topology has successfully found its applications in computational biology, drug discovery, machine learning and in many diverse areas of science. One of its cornerstones, persistent homology, captures topological features latent in the data. Recent progress in TDA allows us to integrate these finer topological features into traditional machine learning and deep learning pipelines. However, the utilization of topological methods within a conventional deep learning framework remains relatively uncharted. This thesis presents four scenarios where computational topology tools are employed to advance machine learning.</p><p dir="ltr">The first one involves integrating persistent homology to explore high-dimensional cytometry data. The second one incorporates Extended persistence in a supervised graph classification framework and demonstrates leveraging TDA in cases where data naturally aligns with higher-order elements by extending graph neural networks to higher-order networks, applied specifically in non-manifold mesh classification. The third and fourth scenarios delve into enhancing graph neural networks through multiparameter persistence.</p> Neural networks topological data analysis methods Persistent homology Graph Neural Networks (GNNs) Multiparameter persistence
43	Numerical algorithms for the mathematics of information Mendoza-Smith, Rodrigo January 2017 (has links) This thesis presents a series of algorithmic innovations in Combinatorial Compressed Sensing and Persistent Homology. The unifying strategy across these contributions is in translating structural patterns in the underlying data into specific algorithmic designs in order to achieve: better guarantees in computational complexity, the ability to operate on more complex data, highly efficient parallelisations, or any combination of these.
44	Low-dimensional data analysis and clustering by means of Delaunay triangulation / Analyse et clustering de données en basse dimension par triangulation de Delaunay Razafindramanana, Octavio 05 December 2014 (has links) Les travaux présentés et discutés dans cette thèse ont pour objectif de proposer plusieurs solutions au problème de l’analyse et du clustering de nuages de points en basse dimension. Ces solutions s’appuyent sur l’analyse de triangulations de Delaunay. Deux types d’approches sont présentés et discutés. Le premier type suit une approche en trois-passes classique: 1) la construction d’un graphe de proximité contenant une information topologique, 2) la construction d’une information statistique à partir de ce graphe et 3) la suppression d’éléments inutiles au regard de cette information statistique. L’impact de différentes measures sur le clustering ainsi que sur la reconnaissance de caractères est discuté. Ces mesures s’appuyent sur l’exploitation du complexe simplicial et non pas uniquement sur celle du graphe. Le second type d’approches est composé d’approches en une passe extrayant des clusters en même temps qu’une triangulation de Delaunay est construite. / This thesis aims at proposing and discussing several solutions to the problem of low-dimensional point cloudanalysis and clustering. These solutions are based on the analysis of the Delaunay triangulation.Two types of approaches are presented and discussed. The first one follows a classical three steps approach:1) the construction of a proximity graph that embeds topological information, 2) the construction of statisticalinformation out of this graph and 3) the removal of pointless elements regarding this information. The impactof different simplicial complex-based measures, i.e. not only based on a graph, is discussed. Evaluation is madeas regards point cloud clustering quality along with handwritten character recognition rates. The second type ofapproaches consists of one-step approaches that derive clustering along with the construction of the triangulation. Analyse de données Clustering Graphes de proximité Triangulation de Delaunay Analyse de données topologique Data analysis Point cloud clustering Proximity graph Delaunay triangulation Topological data analysis
45	Similarity between Scalar Fields Narayanan, Vidya January 2016 (has links) (PDF) Scientific phenomena are often studied through collections of related scalar fields such as data generated by simulation experiments that are parameter or time dependent . Exploration of such data requires robust measures to compare them in a feature aware and intuitive manner. Topological data analysis is a growing area that has had success in analyzing and visualizing scalar fields in a feature aware manner based on the topological features. Various data structures such as contour and merge trees, Morse-Smale complexes and extremum graphs have been developed to study scalar fields. The extremum graph is a topological data structure based on either the maxima or the minima of a scalar field. It preserves local geometrical structure by maintaining relative locations of extrema and their neighborhoods. It provides a suitable abstraction to study a collection of datasets where features are expressed by descending or ascending manifolds and their proximity is of importance. In this thesis, we design a measure to understand the similarity between scalar fields based on the extremum graph abstraction. We propose a topological structure called the complete extremum graph and define a distance measure on it that compares scalar fields in a feature aware manner. We design an algorithm for computing the distance and show its applications in analyzing time varying data such as understanding periodicity, feature correspondence and tracking, and identifying key frames. Scalar Fields Topological Data Analysis Extremum Graphs Complete Extremum Graphs Scalar Field Theory Similarity Distance Measures Scalar Field Computer Graphics Computational Topology Computational Geometry Mathematical Physics
46	Two Problems in Applied Topology Nathanael D Cox (11008509) 23 July 2021 (has links) <div>In this thesis, we present two main results in applied topology.</div><div> In our first result, we describe an algorithm for computing a semi-algebraic description of the quotient map of a proper semi-algebraic equivalence relation given as input. The complexity of the algorithm is doubly exponential in terms of the size of the polynomials describing the semi-algebraic set and equivalence relation.</div><div> In our second result, we use the fact that homology groups of a simplicial complex are isomorphic to the space of harmonic chains of that complex to obtain a representative cycle for each homology class. We then establish stability results on the harmonic chain groups.</div> Topological Data Analysis Topology semi-algebraic sets Quotient Algorithm
47	NONLINEAR DIFFUSIONS ON GRAPHS FOR CLUSTERING, SEMI-SUPERVISED LEARNING AND ANALYZING PREDICTIONS Meng Liu (14075697) 09 November 2022 (has links) <p>Graph diffusion is the process of spreading information from one or few nodes to the rest of the graph through edges. The resulting distribution of the information often implies latent structure of the graph where nodes more densely connected can receive more signal. This makes graph diffusions a powerful tool for local clustering, which is the problem of finding a cluster or community of nodes around a given set of seeds. Most existing literatures on using graph diffusions for local graph clustering are linear diffusions as their dynamics can be fully interpreted through linear systems. They are also referred as eigenvector, spectral, or random walk based methods. While efficient, they often have difficulty capturing the correct boundary of a target label or target cluster. On the contrast, maxflow-mincut based methods that can be thought as 1-norm nonlinear variants of the linear diffusions seek to "improve'' or "refine'' a given cluster and can often capture the boundary correctly. However, there is a lack of literature to adopt them for problems such as community detection, local graph clustering, semi-supervised learning, etc. due to the complexity of their formulation. We addressed these issues by performing extensive numerical experiments to demonstrate the performance of flow-based methods in graphs from various sources. We also developed an efficient LocalGraphClustering Python Package that allows others to easily use these methods in their own problems. While studying these flow-based methods, we find that they cannot grow from small seed set. Although there are hybrid procedures that incorporate ideas from both linear diffusions and flow-based methods, they have many hard to set parameters. To tackle these issues, we propose a simple generalization of the objective function behind linear diffusion and flow-based methods which we call generalized local graph min-cut problem. We further show that by involving p-norm in this cut problem, we can develop a nonlinear diffusion procedure that can find local clusters from small seed set and capture the correct boundary simultaneously. Our method can be thought as a nonlinear generalization of the Anderson-Chung-Lang push procedure to approximate a personalized PageRank vector efficiently and is a strongly local algorithm-one whose runtime depends on the size of the output rather than the size of the graph. We also show that the p-norm cut functions improve on the standard Cheeger inequalities for linear diffusion methods. We further extend our generalized local graph min-cut problem and the corresponding diffusion solver to hypergraph-based machine learning problems. Although many methods for local graph clustering exist, there are relatively few for localized clustering in hypergraphs. Moreover, those that exist often lack flexibility to model a general class of hypergraph cut functions or cannot scale to large problems. Our new hypergraph diffusion method on the other hand enables us to compute with a wide variety of cardinality-based hypergraph cut functions and still maintains the strongly local property. We also show that the clusters found by solving the new objective function satisfy a Cheeger-like quality guarantee.</p> <p>Besides clustering, recent work on graph-based learning often focuses on node embeddings and graph neural networks. Although these GNN based methods can beat traditional ones especially when node attributes data is available, it is challenging to understand them because they are highly over-parameterized. To solve this issue, we propose a novel framework that combines topological data analysis and diffusion to transform the complex prediction space into human understandable pictures. The method can be applied to other datasets not in graph formats and scales up to large datasets across different domains and enable us to find many useful insights about the data and the model.</p> Graph, social and multimedia data Topology topological data analysis clustering pagerank semi-supervised learning visualization neural networks diffusions graph social network hypergraph
48	An investigation of average stable ranks : On plane geometric objects and financial transaction data / En undersökning av den genomsnittliga stabila rangen hos plana geometriska figurer och finansiella transaktioner Odelius, Linn January 2020 (has links) This thesis concerns the topological features of plane geometric shapes and financial transaction data. Topological properties of the data such as homology groups and their stable ranks are analysed. It is investigated how to mathematically describe differences between data sets and it is found that stable ranks can be used to capture these differences. Sub sampling is introduced as a way to apply stochastic methods to geometric structures. It is found that the average stable rank can be used to differentiate data sets. Furthermore, the sensitivity of average stable ranks to random noise is explored and it is studied how a single point changes the average stable ranks of geometric shapes and financial transaction data. A method to incorporate categorical data within the analysis is introduced. The theory is applied to financial transaction data with the objective to understand if there are topological differences between fraudulent and legit transactions which can be used to classify them. / I denna uppsats analyseras finansiell transaktionsdata samt plana geometriska objekt med hjälp av verktyg inom Topologisk Dataanalys. Topologiska egenskaper såsom homologi samt stabil rang analyseras och det undersöks hur en matematiskt kan beskriva skillnaden mellan geometriska objekt. Det visar sig att simplistiska komplex och dess motsvarande stabila rang kan användas för att beskriva dessa skillnader. Det undersöks även hur stokastiska metoder kan appliceras på geometrisk data och begreppet genomsnittlig stabil rang introduceras. Känsligheten för brus hos den genomsnittliga stabila rangen undersöks för plana objekt och det undersöks hur den genomsnittliga stabila rangen av en datamängd ändras om en datapunkt läggs till. En metod för att beskriva avstånd på kategorisk data introduceras eftersom analysen av stabil rang kräver ett definierat avstånd mellan datapunkter. Det undersöks huruvida det finns topologiska skillnader mellan bedrägliga och icke-bedrägliga transaktioner, samt om det finns skillnader mellan olika typer av bedrägliga transaktioner. Topological data analysis TDA Mathematics Stable rank Topology Data analysis Financial transactions Topologisk dataanalys Topologi TDA Matematik Teoretisk Matematik Dataanalys Finans Stabil rang Mathematics Matematik
49	Quantifying Traffic Congestion in Nairobi / Kvantifiering av trafik i Nairobi Bojs, Eric January 2020 (has links) This thesis aims to give insight into a novel approach for quantifying car traffic in developing cities. This is necessary to improve efficiency in resource allocation for improvements in infrastructure. The project took form of a case study of neighborhoods in the city of Nairobi, Kenya. The approach consists of a method which relies on topics from the field of Topological Data Analysis, together with the use of large data sources from taxi services in the city. With this, both qualitative and quantitative insight can be given about the traffic. The method was proven useful for understanding how traffic spreads, and to differentiate between levels of congestion: quantifying it. However, it failed to detect the effect of previous improvements of infrastructure. / Målet med rapporten är att ge insikt i en innovativ ansats för att kvantifiera biltrafik i utvecklingsstäder. Detta kommer som en nödvändighet för att kunna förbättra resursfördelning i utvecklandet av infrastruktur. Projektet utspelade sig som en fallstudie där stadsdelar i Nairobi, Kenya studerades. Ansatsen innefattar en metod som bygger på tekniker från topologisk dataanalys (eng. \textit{Topological Data Analysis}), tillsammans med stora datakällor från taxitjänster i staden. Detta hoppas ge både kvalitativ och kvantitativ information om trafiken i staden. Metoden visade sig vara användbar för att förstå hur trafik sprider sig och att differentiera mellan nivåer av trafik, alltså att kvantifiera den. Tyvärr så misslyckades metoden visa sig användbar för att mäta förbättringar i infrastruktur. Topological Data Analysis Traffic congestion Quantification Big Data Smart Cities Topologisk dataanalys trafik kvantifiering big data smarta städer Probability Theory and Statistics Sannolikhetsteori och statistik
50	Topologia computacional para análise de série temporal / Computational topology for time series analysis Miranda, Vanderlei Luiz Daneluz 13 March 2019 (has links) Mudanças de padrão são variações nos dados da série temporal. Tais mudanças podem representar transições que ocorrem entre estados. A análise de dados topológicos (TDA) permite uma caracterização de dados de séries temporais obtidos a partir de sistemas dinâmicos complexos. Neste trabalho, apresentamos uma técnica de detecção de mudança de padrão baseada em TDA. Especificamente, a partir de uma determinada série temporal, dividimos o sinal em janelas deslizantes sem sobreposição e para cada janela calculamos a homologia persistente, ou seja, o barcode associado. A partir desse barcode, o intervalo médio e a entropia persistente são calculados e plotados em relação à duração do sinal. Resultados experimentais em conjuntos de dados reais e artificiais mostram bons resultados do método proposto: 1) Detecta mudança de padrões identificando a mudança no intervalo médio e calculando a entropia persistente para os barcodes gerados pelo conjunto de dados de entrada. 2) Mostra qualitativamente quão sensível é a escolha do método de filtragem para evidenciar características topológicas do espaço original sob exame. Isto é conseguido usando duas filtragens: uma filtragem métrica e uma do tipo lower-star. 3) Variando o tamanho da janela, o método pode caracterizar a presença de estruturas locais do conjunto de dados, como o período de convulsão nos sinais EEG. 4) O método proposto é capaz de caracterizar a complexidade pela medida de entropia persistente dos barcodes, uma medida de entropia baseada na definição de entropia de Shannon. Além disso, neste trabalho, mostramos a evidência de mudanças de complexidade associadas a um período de convulsão de um sinal de EEG / Pattern changings are variations in time series data. Such changes may represent transitions that occur between states. Topological data analysis (TDA) allows characterization of time-series data obtained from complex dynamical systems. In this work, we present a pattern changing detection technique based on TDA. Specifically, starting from a given time series, we divide the signal in slicing windows with no overlapping and for each window we calculate the persistent homology, i.e., the associated barcode. From the barcode the average interval size and persistent entropy are calculated and plotted against the signal duration. Experimental results on artificial and real data sets show good results of the proposed method: 1) It detects pattern changing by identifying the change in the average interval size and calculated persistent entropy for the barcodes generated by the input data set. 2) It shows qualitatively how sensible the choice of filtration method is to evidence topological features of the original space under examination. This is accomplished by using two filtrations: a metric and a lower-star filtration. 3) By varying the slice window size, the method can characterize the presence of local structures of the data set such as the seizure period in EEG signals. 4) The proposed method can characterize complexity by the measure persistent entropy for barcodes, an entropy measure based on Shannon´s entropy definition. Moreover, in this work, we show the evidence of complexity changes associated with a seizure period of an EEG signal Análise de série temporal Análise topológica de dados Complex networks Complexidade Complexity Entropia persistente Homologia persistente Mudança de padrão Pattern changing detection Persistent entropy Persistent homology Redes complexas Time series analysis Topological data analysis

Search results