131. Recalage déformable à base de graphes : mise en correspondance coupe-vers-volume et méthodes contextuelles / Graph-based deformable registration : slice-to-volume mapping and context specific methods
Ferrante, Enzo, 03 May 2016
Image registration methods, which aim at aligning two or more images into one coordinate system, are among the oldest and most widely used algorithms in computer vision. Registration methods serve to establish correspondence relationships among images (captured at different times, from different sensors or from different viewpoints) that are not obvious to the human eye. A particular type of registration algorithm, known as graph-based deformable registration, has become popular during the last decade given its robustness, scalability, efficiency and theoretical simplicity. The range of problems to which it can be adapted is particularly broad. In this thesis, we propose several extensions to graph-based deformable registration theory, by exploring new application scenarios and developing novel methodological contributions.
Our first contribution is an extension of the graph-based deformable registration framework to the challenging slice-to-volume registration problem. Slice-to-volume registration aims at registering a 2D image within a 3D volume, i.e., we seek a mapping function that optimally maps a tomographic slice to the 3D coordinate space of a given volume. We introduce a scalable, modular and flexible formulation accommodating low-rank and high-order terms, which simultaneously selects the plane and estimates the in-plane deformation through a single-shot optimization approach. The proposed framework is instantiated into different variants based on different graph topologies, label space definitions and energy constructions. Experiments on simulated and real data in the context of ultrasound and magnetic resonance registration (where both framework instantiations and different optimization strategies are considered) demonstrate the potential of our method.
The other two contributions included in this thesis are related to how semantic information can be incorporated into the registration process (independently of the dimensionality of the images). Currently, most methods rely on a single metric function explaining the similarity between the source and target images. We argue that incorporating semantic information to guide the registration process will further improve the accuracy of the results, particularly in the presence of semantic labels that make registration a domain-specific problem.
We consider a first scenario where we are given a classifier inferring probability maps for different anatomical structures in the input images. Our method seeks to simultaneously register and segment a set of input images, incorporating this information within the energy formulation. The main idea is to use these estimated maps of semantic labels (provided by an arbitrary classifier) as a surrogate for unlabeled data, and to combine them with population deformable registration to improve both alignment and segmentation.
Our last contribution also aims at incorporating semantic information into the registration process, but in a different scenario. In this case, instead of assuming that pre-trained arbitrary classifiers are at our disposal, we are given a set of accurate ground-truth annotations for a variety of anatomical structures. We present a methodological contribution that aims at learning context-specific matching criteria as an aggregation of standard similarity measures from these annotated data, using an adapted version of the latent structured support vector machine (LSSVM) framework.
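The abstract describes registration as discrete labeling on a graph. The sketch below makes that idea concrete at toy scale under stated assumptions: 1D signals, control points on a chain, integer displacements as labels, a squared-intensity data term and a linear smoothness term. On a chain the energy can be minimized exactly by min-sum dynamic programming. This is not the thesis's slice-to-volume formulation, and the function name `chain_mrf_registration` is hypothetical.

```python
import numpy as np

def chain_mrf_registration(src, tgt_positions, tgt_values, labels, lam):
    """Exact min-sum dynamic programming on a chain of control points.

    Each node k takes a discrete displacement label d; the unary term is
    (src[p_k + d] - tgt_k)^2 and the pairwise term is lam * |d_k - d_{k-1}|.
    """
    labels = np.asarray(labels)
    n, L = len(tgt_positions), len(labels)
    # Unary (data) costs for every node/label pair.
    unary = np.empty((n, L))
    for k, p in enumerate(tgt_positions):
        unary[k] = (src[p + labels] - tgt_values[k]) ** 2
    # Pairwise smoothness costs, indexed [previous label, current label].
    pairwise = lam * np.abs(labels[:, None] - labels[None, :])
    cost = unary[0].copy()
    back = np.zeros((n, L), dtype=int)
    for k in range(1, n):
        total = cost[:, None] + pairwise          # rows: previous label, cols: current
        back[k] = np.argmin(total, axis=0)        # best predecessor for each label
        cost = unary[k] + total[back[k], np.arange(L)]
    # Backtrack the optimal labeling from the last node.
    best = np.empty(n, dtype=int)
    best[-1] = int(np.argmin(cost))
    for k in range(n - 1, 0, -1):
        best[k - 1] = back[k, best[k]]
    return labels[best]

# Toy example: the target is the source shifted by 2 samples.
src = np.arange(24, dtype=float) ** 2
positions = np.arange(4, 20)
tgt = src[positions + 2]
disp = chain_mrf_registration(src, positions, tgt, labels=range(-3, 4), lam=1.0)
# every entry of disp is 2
```

Real graph-based registration uses 2D or 3D grids with loops, where approximate solvers replace this exact chain recursion.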
132. Quelques applications de l’optimisation numérique aux problèmes d’inférence et d’apprentissage / Few applications of numerical optimization in inference and learning
Kannan, Hariprasad, 28 September 2018
Numerical optimization and machine learning have had a fruitful relationship, from the perspective of both theory and application. In this thesis, we present an application-oriented take on some inference and learning problems. Linear programming relaxations are central to maximum a posteriori (MAP) inference in discrete Markov Random Fields (MRFs). In particular, inference in higher-order MRFs presents challenges in terms of efficiency, scalability and solution quality. We study the benefit of using Newton methods to efficiently optimize the Lagrangian dual of a smooth version of the problem. We investigate their ability to achieve superior convergence behavior and to better handle the ill-conditioned nature of the formulation, compared to first-order methods. We show that it is indeed possible to obtain an efficient trust-region Newton method, which uses the true Hessian, for a broad range of MAP inference problems.
Given the specific opportunities and challenges of the MAP inference formulation, we present details concerning (i) efficient computation of the Hessian and Hessian-vector products, (ii) a strategy to damp the Newton step that aids efficient and correct optimization, and (iii) steps to improve the efficiency of the conjugate gradient method through a truncation rule and a preconditioner. We also demonstrate through numerical experiments how a quasi-Newton method could be a good choice for MAP inference in large graphs. MAP inference based on a smooth formulation could greatly benefit from efficient sum-product computation, which is required for computing the gradient and the Hessian. We show a way to perform sum-product computation for trees with sparse clique potentials; this result could also be readily used by other algorithms. We present results demonstrating the usefulness of our approach on higher-order MRFs. Then, we discuss potential research topics regarding tightening the LP relaxation and parallel algorithms for MAP inference.
Unsupervised learning is an important topic in machine learning, and it could potentially help with high-dimensional problems such as inference in graphical models. We present a general framework for unsupervised learning based on optimal transport and sparse regularization. Optimal transport presents interesting challenges from an optimization point of view, with its simplex constraints on the rows and columns of the transport plan. We show one way to formulate efficient optimization problems inspired by optimal transport: imposing only one set of the simplex constraints and imposing structure on the transport plan through sparse regularization. We show how unsupervised learning algorithms such as exemplar clustering, center-based clustering and kernel PCA fit into this framework through different forms of regularization. In particular, we demonstrate a promising approach to the pre-image problem in kernel PCA.
Several methods have been proposed over the years for this problem; they generally assume certain types of kernels, have too many hyperparameters, or make restrictive approximations of the underlying geometry. We present a more general method, with only one hyperparameter to tune and with some interesting geometric properties. From an optimization point of view, we show how to compute the gradient of a smooth version of the Schatten p-norm and how it can be used within a majorization-minimization scheme. Finally, we present results from our various experiments.
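The abstract mentions the gradient of a smooth Schatten p-norm. The sketch below assumes one common smoothing, f(X) = tr((XᵀX + εI)^{p/2}) = Σᵢ (σᵢ² + ε)^{p/2}. The thesis's exact smoothing may differ. Differentiating the trace of a matrix function gives ∇f(X) = p·X·(XᵀX + εI)^{p/2 − 1}.

The code computes this with a symmetric eigendecomposition and checks it against central finite differences. The function names are illustrative, and the majorization-minimization scheme itself is not shown.

```python
import numpy as np

def _sym_power(M, a):
    # M^a for a symmetric positive-definite M, via M = V diag(w) V^T.
    w, V = np.linalg.eigh(M)
    return (V * w ** a) @ V.T

def smooth_schatten_p(X, p, eps):
    # tr((X^T X + eps*I)^(p/2)): sum over singular values of (sigma^2 + eps)^(p/2),
    # plus a constant eps^(p/2) per extra column when X has more columns than rows.
    M = X.T @ X + eps * np.eye(X.shape[1])
    return float(np.sum(np.linalg.eigvalsh(M) ** (p / 2)))

def smooth_schatten_p_grad(X, p, eps):
    # Gradient: p * X (X^T X + eps*I)^(p/2 - 1).
    M = X.T @ X + eps * np.eye(X.shape[1])
    return p * X @ _sym_power(M, p / 2 - 1)

# Finite-difference check on a random matrix.
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
p, eps, h = 1.0, 1e-2, 1e-6
G = smooth_schatten_p_grad(X, p, eps)
fd = np.zeros_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        E = np.zeros_like(X)
        E[i, j] = h
        fd[i, j] = (smooth_schatten_p(X + E, p, eps) - smooth_schatten_p(X - E, p, eps)) / (2 * h)
max_err = float(np.max(np.abs(fd - G)))
```

A larger ε makes f smoother, but f then drifts further from the true Schatten p-norm.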
133. Nonconvex Alternating Direction Optimization for Graphs : Inference and Learning / L'algorithme des directions alternées non convexe pour graphes : inférence et apprentissage
Lê-Huu, Dien Khuê, 04 February 2019
This thesis presents our contributions to inference and learning of graph-based models in computer vision. First, we propose a novel class of decomposition algorithms for solving graph and hypergraph matching based on the nonconvex alternating direction method of multipliers (ADMM). These algorithms are computationally efficient and highly parallelizable. Furthermore, they are very general and can be applied to arbitrary energy functions as well as arbitrary assignment constraints. Experiments show that they outperform existing state-of-the-art methods on popular benchmarks. Second, we propose a nonconvex continuous relaxation of maximum a posteriori (MAP) inference in discrete Markov random fields (MRFs). We show that this relaxation is tight for arbitrary MRFs. This allows us to apply continuous optimization techniques to solve the original discrete problem without loss in accuracy after rounding. We study two popular gradient-based methods, and further propose a more effective solution using nonconvex ADMM. Experiments on different real-world problems demonstrate that the proposed ADMM compares favorably with state-of-the-art algorithms in different settings. Finally, we propose a method for learning the parameters of these graph-based models from training data, based on nonconvex ADMM. This method consists of viewing ADMM iterations as a sequence of differentiable operations, which allows efficient computation of the gradient of the training loss with respect to the model parameters, enabling efficient training using stochastic gradient descent. In the end, we obtain a unified framework for inference and learning with nonconvex ADMM. Thanks to its flexibility, this framework also allows a graph-based model to be trained jointly end-to-end with another model, such as a neural network, thus combining the strengths of both. We present experiments on a popular semantic segmentation dataset, demonstrating the effectiveness of our method.
134. An Experimental Evaluation of Probabilistic Deep Networks for Real-time Traffic Scene Representation using Graphical Processing Units
El-Shaer, Mennat Allah, 03 September 2019
No description available.
135. A Machine Learning Approach to the analysis of mortality in patients with cardiovascular diseases
Aldamiz Orcajo, Juan Miguel, January 2021
Cardiovascular diseases (CVDs) are the main cause of mortality worldwide, accounting for a third of all deaths. Consequently, early detection of these pathologies and of their underlying factors can play a critical role in successful treatment. Many researchers have applied machine learning (ML) to mortality risk estimation in CVDs. However, this is difficult due to the complex, multifactorial nature of these diseases and the lack of large, unbiased data collections. This thesis presents statistical analysis results and a binary classification model for CVD mortality prediction based on the ESCARVAL-RISK study, a large cohort study (54,678 patients) running from January 2008 until December 2012. The study data have highly imbalanced classes, which may lead to classification models with low specificity and sensitivity. To overcome this problem, this work proposes several ways to balance the classes, including hyperparameter optimization and sampling techniques, tested over 15 different classification algorithms. While the specificity is low, the proposed approach using SHapley Additive exPlanations (SHAP) identifies factors that may be optimal targets for intensified preventive interventions.
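The abstract cites sampling techniques for the class-imbalance problem. The simplest such technique is random oversampling, which duplicates minority-class rows until all classes have equal counts. The sketch below illustrates that generic idea only; the thesis does not say it used exactly this scheme, and `random_oversample` is a hypothetical helper. It should be applied to the training split only, so that duplicated patients do not leak into evaluation.

```python
import numpy as np

def random_oversample(X, y, seed=0):
    """Duplicate randomly chosen rows of each smaller class until it matches the largest class."""
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    idx = []
    for c, n in zip(classes, counts):
        members = np.flatnonzero(y == c)
        extra = rng.choice(members, size=target - n, replace=True)  # sample with replacement
        idx.append(np.concatenate([members, extra]))
    idx = rng.permutation(np.concatenate(idx))  # shuffle so classes are interleaved
    return X[idx], y[idx]

# Toy imbalanced data: 8 survivors (label 0), 2 deaths (label 1).
X = np.arange(20).reshape(10, 2)
y = np.array([0] * 8 + [1] * 2)
Xb, yb = random_oversample(X, y)
```

Oversampling keeps every original record, unlike undersampling. The price is a higher risk of overfitting the duplicated minority examples.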
136. Knowledge-empowered Probabilistic Graphical Models for Physical-Cyber-Social Systems
Anantharam, Pramod, 31 May 2016
No description available.
137. Probabilistic Graphical Models: an Application in Synchronization and Localization
Goodarzi, Meysam, 16 June 2023
Mobile User (MU) localization in ultra-dense networks often requires, on the one hand, the Access Points (APs) to be synchronized with each other and, on the other hand, MU-AP synchronization. In this work, we first address the former, which eventually provides a basis for the latter, i.e., joint MU synchronization and localization (sync&loc). In particular, this work first tackles the time synchronization problem in 5G networks by adopting a hybrid Bayesian approach for clock offset and skew estimation. Specifically, we investigate and demonstrate the substantial benefit of Belief Propagation (BP) running on Factor Graphs (FGs) in achieving precise network-wide synchronization. Moreover, we take advantage of Bayesian Recursive Filtering (BRF) to mitigate the time-stamping error in pairwise synchronization.
Finally, we reveal the merit of hybrid synchronization by dividing a large-scale network into common and local synchronization domains, which makes it possible to apply the most suitable synchronization algorithm (BP- or BRF-based) to each domain.
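Bayesian recursive filtering of clock parameters can be illustrated with the simplest linear-Gaussian case: a Kalman filter tracking offset θ and skew α. The minimal sketch below assumes noisy direct offset measurements once per period and the model θₖ₊₁ = θₖ + α·T with a nearly constant α. It is not the thesis's BRF design, and the noise values and names are illustrative.

```python
import numpy as np

def kalman_clock_tracker(z, period=1.0, meas_var=0.25, q=1e-6):
    """Kalman filter on the state [offset theta, skew alpha] from noisy offset measurements z."""
    F = np.array([[1.0, period], [0.0, 1.0]])   # theta += alpha * period; alpha constant
    H = np.array([[1.0, 0.0]])                  # only the offset is measured
    Q = q * np.eye(2)                           # small process noise (assumed)
    R = np.array([[meas_var]])                  # measurement noise variance (assumed)
    x = np.array([z[0], 0.0])                   # start from first measurement, zero skew
    P = np.diag([meas_var, 1.0])                # broad prior on the skew
    for k in range(1, len(z)):
        x = F @ x                               # predict
        P = F @ P @ F.T + Q
        S = H @ P @ H.T + R                     # innovation covariance
        K = P @ H.T @ np.linalg.inv(S)          # Kalman gain
        x = x + (K @ (z[k] - H @ x)).ravel()    # update with measurement z[k]
        P = (np.eye(2) - K @ H) @ P
    return x                                    # [offset at the last step, skew]

# Simulated link: initial offset 5.0, skew 0.2 per period, measurement noise std 0.5.
rng = np.random.default_rng(1)
k = np.arange(200)
z = 5.0 + 0.2 * k + rng.normal(0.0, 0.5, size=k.size)
theta, alpha = kalman_clock_tracker(z)
```

Raising q lets the filter follow a drifting skew faster, at the cost of noisier estimates.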
Secondly, we propose a Deep Neural Network (DNN)-assisted Particle Filter-based (DePF) approach to address the joint MU sync&loc problem. In particular, DePF deploys an asymmetric time-stamp exchange mechanism between the MUs and the APs, which provides information about the MUs' clock offset and skew and about the AP-MU distance. In addition, to estimate the Angle of Arrival (AoA) of the received synchronization packet, DePF draws on the Multiple Signal Classification (MUSIC) algorithm, fed by the Channel Impulse Response (CIR) experienced by the sync packets. The CIR is also leveraged to determine the link condition, i.e., Line-of-Sight (LoS) or Non-LoS (NLoS). Finally, DePF capitalizes on particle Gaussian mixtures, which allow a hybrid particle-based and parametric BRF fusion of the aforementioned pieces of information, to jointly estimate the position and clock parameters of the MUs.
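A time-stamp exchange yields offset and distance information. The classic two-way exchange used by NTP- and PTP-style protocols shows how. It assumes equal propagation delay in both directions and ignores skew during the exchange. DePF's asymmetric mechanism is not reproduced here, and the function name is illustrative.

```python
def two_way_offset_delay(t1, t2, t3, t4):
    """Classic two-way time transfer (symmetric-delay assumption).

    t1: request sent (reference clock)   t2: request received (local clock)
    t3: reply sent (local clock)         t4: reply received (reference clock)
    Returns (offset of the local clock, one-way propagation delay).
    """
    offset = ((t2 - t1) - (t4 - t3)) / 2.0
    delay = ((t2 - t1) + (t4 - t3)) / 2.0
    return offset, delay

# Local clock runs 5 units ahead; one-way delay is 2 units.
offset, delay = two_way_offset_delay(0.0, 7.0, 10.0, 7.0)
```

Multiplying the delay by the speed of light gives the AP-MU range used for localization. Any asymmetry between the two paths biases the offset estimate by half the asymmetry.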
138. Probabilistic models for quality control in environmental sensor networks
Dereszynski, Ethan W., 04 June 2012
Networks of distributed, remote sensors are providing ecological scientists with a view of our environment that is unprecedented in detail. However, these networks are subject to harsh conditions, which lead to malfunctions in individual sensors and failures in network communications. This behavior manifests as corrupt or missing measurements in the data. Consequently, before the data can be used in ecological models, future experiments, or even policy decisions, they must be quality controlled (QC'd) to flag affected measurements and impute corrected values. This dissertation describes a probabilistic modeling approach for real-time automated QC that exploits the spatial and temporal correlations in the data to distinguish sensor failures from valid observations. The model adapts to a site by learning a Bayesian network structure that captures spatial relationships among sensors, and then extends this structure to a dynamic Bayesian network to incorporate temporal correlations. The final QC model contains both discrete and continuous variables, which makes exact inference intractable for large sensor networks. Consequently, we examine the performance of three approximate inference methods in this probabilistic framework. Two of these algorithms represent contemporary approaches to inference in hybrid models, while the third is a greedy search-based method of our own design. We demonstrate the results of these algorithms on synthetic datasets and on real environmental sensor data gathered from an ecological sensor network located in western Oregon. Our results suggest that including additional sensors and applying approximate algorithms can improve performance over networks with fewer sensors that use exhaustive asynchronous inference. / Graduation date: 2013
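The core QC idea is a hybrid model: a discrete sensor state ("ok" or "broken") paired with a continuous true value. Inference in such a model separates failures from valid readings. The minimal sketch below performs exact inference for a single sensor at a single time step. It assumes that the prediction of the true value (mean and variance, e.g. from neighbors or history) is given and that a broken sensor emits values from a very wide Gaussian. All parameter values and the function name are hypothetical, and the thesis's learned DBN and approximate inference are not shown.

```python
import math

def normal_logpdf(x, mean, var):
    return -0.5 * ((x - mean) ** 2 / var + math.log(2.0 * math.pi * var))

def prob_sensor_broken(obs, pred_mean, pred_var, ok_noise_var=0.1,
                       broken_var=1e4, prior_broken=0.01):
    """Posterior P(broken | obs) with T ~ N(pred_mean, pred_var).

    ok:     obs | T ~ N(T, ok_noise_var)  => obs ~ N(pred_mean, pred_var + ok_noise_var)
    broken: obs ~ N(pred_mean, broken_var), unrelated to T (very wide)
    """
    log_ok = math.log(1.0 - prior_broken) + normal_logpdf(obs, pred_mean, pred_var + ok_noise_var)
    log_broken = math.log(prior_broken) + normal_logpdf(obs, pred_mean, broken_var)
    d = log_ok - log_broken
    # Numerically stable logistic: P(broken) = 1 / (1 + exp(d)).
    if d > 0:
        e = math.exp(-d)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(d))

p_plausible = prob_sensor_broken(12.3, pred_mean=12.0, pred_var=0.5)   # near the prediction
p_spike = prob_sensor_broken(-40.0, pred_mean=12.0, pred_var=0.5)      # far from it
```

Marginalizing out the true value keeps this single-step case exact. Once many such variables are chained over space and time, exact inference becomes intractable, which is the point the abstract makes.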
139. Probabilistic models in noisy environments : and their application to a visual prosthesis for the blind
Archambeau, Cédric, 26 September 2005
In recent years, probabilistic models have become fundamental techniques in machine learning. They are successfully applied in various engineering problems, such as robotics, biometrics, brain-computer interfaces or artificial vision, and will gain in importance in the near future. This work deals with the difficult but common situation where the data are either very noisy or scarce compared to the complexity of the process to model. We focus on latent variable models, which can be formalized as probabilistic graphical models and learned by the expectation-maximization algorithm or its variants (e.g., variational Bayes).
After carefully studying a non-exhaustive list of multivariate kernel density estimators, we established that in most applications locally adaptive estimators should be preferred. Unfortunately, these methods are usually sensitive to outliers and often have too many parameters to set. Therefore, we focus on finite mixture models, which do not suffer from these drawbacks provided some structural modifications are made.
Two questions are central in this dissertation: (i) how to make mixture models robust to noise, i.e., deal efficiently with outliers, and (ii) how to exploit side-channel information, i.e., additional information intrinsic to the data. To tackle the first question, we extend the training algorithms of the popular Gaussian mixture models to Student-t mixture models. The Student-t distribution can be viewed as a heavy-tailed alternative to the Gaussian distribution, its robustness being tuned by an extra parameter, the degrees of freedom. Furthermore, we introduce a new variational Bayesian algorithm for learning Bayesian Student-t mixture models. This algorithm leads to very robust density estimators and clustering. To address the second question, we introduce manifold-constrained mixture models. This new technique exploits the information that the data live on a manifold of lower dimension than the feature space. Taking the implicit geometrical arrangement of the data into account results in better generalization on unseen data.
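The robustness of the Student-t distribution shows up in its EM updates. Each point receives a weight (ν+1)/(ν + (x−μ)²/σ²) that shrinks for points far from the current location, so outliers barely move the estimates. The minimal sketch below covers a single univariate component with fixed ν, rather than the thesis's mixtures or its variational Bayesian algorithm, and the function name is illustrative.

```python
import numpy as np

def student_t_em(x, nu=3.0, iters=100):
    """EM for the location mu and scale sigma2 of a univariate Student-t with fixed nu."""
    x = np.asarray(x, dtype=float)
    mu = np.median(x)                                  # robust initialization
    mad = np.median(np.abs(x - mu))
    sigma2 = max((1.4826 * mad) ** 2, 1e-12)
    for _ in range(iters):
        # E-step: expected latent precision weights; far-away points get small weights.
        w = (nu + 1.0) / (nu + (x - mu) ** 2 / sigma2)
        # M-step: weighted mean, then weighted variance around the new mean.
        mu = np.sum(w * x) / np.sum(w)
        sigma2 = np.sum(w * (x - mu) ** 2) / x.size
    return mu, sigma2

data = [-0.2, -0.1, 0.0, 0.1, 0.2, 100.0]   # five inliers and one gross outlier
mu_t, sigma2_t = student_t_em(data)
mean_gauss = float(np.mean(data))            # Gaussian ML location, dragged by the outlier
```

As ν → ∞ the weights tend to 1 and the updates reduce to the Gaussian ones. A small ν gives heavier tails and stronger down-weighting.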
Finally, we show that the latent variable framework used for learning mixture models can be extended to construct probabilistic regularization networks, such as Relevance Vector Machines. Subsequently, we make use of these methods in the context of an optic nerve visual prosthesis to restore partial vision to blind people whose optic nerve is still functional. Although visual sensations can be induced electrically in the visual field of blind people, the coding scheme of visual information along the visual pathways is poorly understood. Therefore, we use probabilistic models to link the stimulation parameters to the features of the visual perceptions. Both black-box and grey-box models are considered. The grey-box models take advantage of known neurophysiological information and are more informative for medical doctors and psychologists.
140. Data Mining Meets HCI: Making Sense of Large Graphs
Chau, Dueng Horng, 01 July 2012
We have entered the age of big data. Massive datasets are now common in science, government and enterprises. Yet, making sense of these data remains a fundamental challenge. Where do we start our analysis? Where to go next? How to visualize our findings?
We answer these questions by bridging Data Mining and Human-Computer Interaction (HCI) to create tools for making sense of graphs with billions of nodes and edges, focusing on:
(1) Attention Routing: we introduce this idea, based on anomaly detection, that automatically draws people’s attention to interesting areas of the graph to start their analyses. We present three examples: Polonium unearths malware from 37 billion machine-file relationships; NetProbe fingers bad guys who commit auction fraud.
(2) Mixed-Initiative Sensemaking: we present two examples that combine machine inference and visualization to help users locate next areas of interest: Apolo guides users to explore large graphs by learning from few examples of user interest; Graphite finds interesting subgraphs, based on only fuzzy descriptions drawn graphically.
(3) Scaling Up: we show how to enable interactive analytics of large graphs by leveraging Hadoop, staging of operations, and approximate computation.
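Attention routing in Polonium is based on belief propagation over a machine-file graph. Files seen on the same machines tend to share a label, so known-bad files raise suspicion of their neighbors. The sketch below runs generic loopy sum-product BP with a homophily edge potential on a toy graph. It assumes binary labels (0 = good, 1 = bad) and a single coupling strength ε. Polonium's actual priors and potentials are not reproduced, and all names are hypothetical.

```python
import numpy as np

def belief_propagation(edges, priors, eps=0.1, iters=20):
    """Loopy sum-product BP for binary labels with a homophily edge potential.

    edges: undirected (i, j) pairs; priors: dict node -> array [p_good, p_bad].
    """
    psi = np.array([[0.5 + eps, 0.5 - eps],
                    [0.5 - eps, 0.5 + eps]])          # neighbors tend to share labels
    nbrs = {v: [] for v in priors}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)
    msgs = {(i, j): np.full(2, 0.5) for i in nbrs for j in nbrs[i]}
    for _ in range(iters):
        new = {}
        for (i, j) in msgs:
            h = priors[i].copy()                      # prior times incoming messages except j's
            for k in nbrs[i]:
                if k != j:
                    h = h * msgs[(k, i)]
            m = psi.T @ h                             # m[x_j] = sum_{x_i} psi[x_i, x_j] h[x_i]
            new[(i, j)] = m / m.sum()
        msgs = new
    beliefs = {}
    for i in nbrs:
        b = priors[i].copy()
        for k in nbrs[i]:
            b = b * msgs[(k, i)]
        beliefs[i] = b / b.sum()
    return beliefs

priors = {
    "m1": np.array([0.5, 0.5]), "m2": np.array([0.5, 0.5]),                # machines: unknown
    "bad.exe": np.array([0.01, 0.99]), "good.dll": np.array([0.99, 0.01]),  # labeled files
    "x.exe": np.array([0.5, 0.5]), "y.exe": np.array([0.5, 0.5]),         # unlabeled files
}
edges = [("m1", "bad.exe"), ("m1", "x.exe"), ("m2", "good.dll"), ("m2", "y.exe")]
beliefs = belief_propagation(edges, priors)
```

On this tree-shaped toy graph, BP is exact. The unlabeled file that shares a machine with a known-bad file ends up with a belief of being bad above 0.5.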
This thesis contributes to data mining, HCI, and importantly their intersection, including: interactive systems and algorithms that scale; theories that unify graph mining approaches; and paradigms that overcome fundamental challenges in visual analytics.
Our work is making an impact on academia and society: Polonium protects 120 million people worldwide from malware; NetProbe made headlines on CNN, WSJ and USA Today; Pegasus won an open-source software award; Apolo helps DARPA detect insider threats and prevent exfiltration.
We hope our Big Data Mantra “Machine for Attention Routing, Human for Interaction” will inspire more innovations at the crossroads of data mining and HCI.