Global ETD Search

131	[en] A MODEL-BASED FRAMEWORK FOR SEMI-SUPERVISED CLUSTERING AND COMMUNITY DETECTION / [pt] UM FRAMEWORK BASEADO EM MODELO PARA CLUSTERIZAÇÃO SEMISSUPERVISIONADA E DETECÇÃO DE COMUNIDADES DANIEL LEMES GRIBEL 09 September 2021 (has links) [pt] Em clusterização baseada em modelos, o objetivo é separar amostras de dados em grupos significativos, otimizando a aderência dos dados observados a um modelo matemático. A recente adoção de clusterização baseada em modelos tem permitido a profissionais e usuários mapearem padrões complexos nos dados e explorarem uma ampla variedade de aplicações. Esta tese investiga abordagens orientadas a modelos para detecção de comunidades e para o estudo de clusterização semissupervisionada, adotando uma perspectiva baseada em máxima verossimilhança. Focamos primeiramente na exploração de técnicas de otimização com restrições para apresentar um novo modelo de detecção de comunidades por meio de modelos de blocos estocásticos (SBMs). Mostramos que a formulação com restrições revela comunidades estruturalmente diferentes daquelas obtidas com modelos clássicos. Em seguida, estudamos um cenário onde anotações imprecisas são fornecidas na forma de relações must-link e cannot-link, e propomos um modelo de clusterização semissupervisionado. Nossa análise experimental mostra que a incorporação de supervisão parcial e de conhecimento prévio melhoram significativamente os agrupamentos. Por fim, examinamos o problema de clusterização semissupervisionada na presença de rótulos de classe não confiáveis. Investigamos o caso em que grupos de anotadores deliberadamente classificam incorretamente as amostras de dados e propomos um modelo para lidar com tais anotações incorretas. / [en] In model-based clustering, we aim to separate data samples into meaningful groups by optimizing the fit of some observed data to a mathematical model. The recent adoption of model-based clustering has allowed practitioners to model complex patterns in data and explore a wide range of applications. This thesis investigates model-driven approaches for community detection and semisupervised clustering by adopting a maximum-likelihood perspective. We first focus on exploiting constrained optimization techniques to present a new model for community detection with stochastic block models (SBMs). We show that the proposed constrained formulation reveals communities structurally different from those obtained with classical community detection models. We then study a setting where inaccurate annotations are provided as must-link and cannot-link relations, and propose a novel semi-supervised clustering model. Our experimental analysis shows that incorporating partial supervision and appropriately encoding prior user knowledge significantly enhance clustering performance. Finally, we examine the problem of semi-supervised clustering in the presence of unreliable class labels. We focus on the case where groups of untrustworthy annotators deliberately misclassify data samples and propose a model to handle such incorrect statements. [pt] APRENDIZADO DE MAQUINA [pt] MODELOS DE BLOCOS ESTOCASTICOS [pt] AGRUPAMENTO SEMISSUPERVISIONADO [pt] DETECCAO DE COMUNIDADES [pt] AGRUPAMENTO [pt] OTIMIZACAO [pt] MINERACAO DE DADOS [en] MACHINE LEARNING [en] STOCHASTICK BLOCK MODELS [en] SEMISUPERVISED CLUSTERING [en] COMMUNITY DETECTION [en] GROUPING [en] OPTIMIZATION [en] DATA MINING
132	Social Graph Anonymization / Anonymisation de graphes sociaux Nguyen, Huu-Hiep 04 November 2016 (has links) La vie privée est une préoccupation des utilisateurs des réseaux sociaux. Les réseaux sociaux sont une source de données précieuses pour des analyses scientifiques ou commerciales. Cette thèse aborde trois problèmes de confidentialité des réseaux sociaux: l'anonymisation de graphes sociaux, la détection de communautés privées et l'échange de liens privés. Nous abordons le problème d'anonymisation de graphes via la sémantique de l'incertitude et l'intimité différentielle. Pour la première, nous proposons un modèle général appelé Uncertain Adjacency Matrix (UAM) qui préserve dans le graphe anonymisé les degrés des nœuds du graphe non-anonymisé. Nous analysons deux schémas proposés récemment et montrons leur adaptation dans notre modèle. Nous aussi présentons notre approche dite MaxVar. Pour la technique d'intimité différentielle, le problème devient difficile en raison de l'énorme espace des graphes anonymisés possibles. Un grand nombre de systèmes existants ne permettent pas de relâcher le budget contrôlant la vie privée, ni de déterminer sa borne supérieure. Dans notre approche nous pouvons calculer cette borne. Nous introduisons le nouveau schéma Top-m-Filter de complexité linéaire et améliorons la technique récente EdgeFlip. L'évaluation de ces algorithmes sur une large gamme de graphes donne un panorama de l'état de l'art. Nous présentons le problème original de la détection de la communauté dans le cadre de l'intimité différentielle. Nous analysons les défis majeurs du problème et nous proposons quelques approches pour les aborder sous deux angles: par perturbation d'entrée (schéma LouvainDP) et par perturbation d'algorithme (schéma ModDivisive) / Privacy is a serious concern of users in daily usage of social networks. Social networks are a valuable data source for large-scale studies on social organization and evolution and are usually published in anonymized forms. This thesis addresses three privacy problems of social networks: graph anonymization, private community detection and private link exchange. First, we tackle the problem of graph anonymization via uncertainty semantics and differential privacy. As for uncertainty semantics, we propose a general obfuscation model called Uncertain Adjacency Matrix (UAM) that keep expected node degrees equal to those in the unanonymized graph. We analyze two recently proposed schemes and show their fitting into the model. We also present our scheme Maximum Variance (MaxVar) to fill the gap between them. Using differential privacy, the problem is very challenging because of the huge output space of noisy graphs. A large body of existing schemes on differentially private release of graphs are not consistent with increasing privacy budgets as well as do not clarify the upper bounds of privacy budgets. In this thesis, such a bound is provided. We introduce the new linear scheme Top-m-Filter (TmF) and improve the existing technique EdgeFlip. Thorough comparative evaluation on a wide range of graphs provides a panorama of the state-of-the-art's performance as well as validates our proposed schemes. Second, we present the problem of community detection under differential privacy. We analyze the major challenges behind the problem and propose several schemes to tackle them from two perspectives: input perturbation (LouvainDP) and algorithm perturbation (ModDivisive) Réseaux sociaux Incertaine matrice d'adjacence Maximum Variance Vie privée différentielle Top-M-Filte Détection de communautés LouvainDP ModDivisive Échange intime des liens $(\alpha \beta)$-Échange Social networks Uncertain Adjacency Matrix Maximum Variance Differential privacy Top-M-Filter Community detection LouvainDP ModDivisive Private link exchange $(\alpha \beta)$-Exchange 005.8
133	Spectral inference methods on sparse graphs : theory and applications / Méthodes spectrales d'inférence sur des graphes parcimonieux : théorie et applications Saade, Alaa 03 October 2016 (has links) Face au déluge actuel de données principalement non structurées, les graphes ont démontré, dans une variété de domaines scientifiques, leur importance croissante comme language abstrait pour décrire des interactions complexes entre des objets complexes. L’un des principaux défis posés par l’étude de ces réseaux est l’inférence de propriétés macroscopiques à grande échelle, affectant un grand nombre d’objets ou d’agents, sur la seule base des interactions microscopiquesqu’entretiennent leurs constituants élémentaires. La physique statistique, créée précisément dans le but d’obtenir les lois macroscopiques de la thermodynamique à partir d’un modèle idéal de particules en interaction, fournit une intuition décisive dans l’étude des réseaux complexes.Dans cette thèse, nous utilisons des méthodes issues de la physique statistique des systèmes désordonnés pour mettre au point et analyser de nouveaux algorithmes d’inférence sur les graphes. Nous nous concentrons sur les méthodes spectrales, utilisant certains vecteurs propres de matrices bien choisies, et sur les graphes parcimonieux, qui contiennent une faible quantité d’information. Nous développons une théorie originale de l’inférence spectrale, fondée sur une relaxation de l’optimisation de certaines énergies libres en champ moyen. Notre approche est donc entièrement probabiliste, et diffère considérablement des motivations plus classiques fondées sur l’optimisation d’une fonction de coût. Nous illustrons l’efficacité de notre approchesur différents problèmes, dont la détection de communautés, la classification non supervisée à partir de similarités mesurées aléatoirement, et la complétion de matrices. / In an era of unprecedented deluge of (mostly unstructured) data, graphs are proving more and more useful, across the sciences, as a flexible abstraction to capture complex relationships between complex objects. One of the main challenges arising in the study of such networks is the inference of macroscopic, large-scale properties affecting a large number of objects, based solely on he microscopic interactions between their elementary constituents. Statistical physics, precisely created to recover the macroscopic laws of thermodynamics from an idealized model of interacting particles, provides significant insight to tackle such complex networks.In this dissertation, we use methods derived from the statistical physics of disordered systems to design and study new algorithms for inference on graphs. Our focus is on spectral methods, based on certain eigenvectors of carefully chosen matrices, and sparse graphs, containing only a small amount of information. We develop an original theory of spectral inference based on a relaxation of various meanfield free energy optimizations. Our approach is therefore fully probabilistic, and contrasts with more traditional motivations based on the optimization of a cost function. We illustrate the efficiency of our approach on various problems, including community detection, randomized similarity-based clustering, and matrix completion. Opérateur non retraçant Hessienne de Bethe Méthodes spectrales Détection de communautés Partitionnement spectral Complétion de matrices Modèles graphiques Inférence bayésienne Approximations de champ moyen Systèmes désordonnés Propagation des convictions Algorithmes de passage de messages Non-backtracking operator Bethe Hessian Spectral methods Community detection Spectral clustering Matrix completion Graphical models Bayesian inference Meanfield approximations Disordered systems Belief propagation Message-passing algorithms 530
134	Sur deux problèmes d’apprentissage automatique : la détection de communautés et l’appariement adaptatif / On two problems in machine learning : community detection and adaptive matching Gulikers, Lennart 13 November 2017 (has links) Dans cette thèse, nous étudions deux problèmes d'apprentissage automatique : (I) la détection des communautés et (II) l'appariement adaptatif. I) Il est bien connu que beaucoup de réseaux ont une structure en communautés. La détection de ces communautés nous aide à comprendre et exploiter des réseaux de tout genre. Cette thèse considère principalement la détection des communautés par des méthodes spectrales utilisant des vecteurs propres associés à des matrices choisiesavec soin. Nous faisons une analyse de leur performance sur des graphes artificiels. Au lieu du modèle classique connu sous le nom de « Stochastic Block Model » (dans lequel les degrés sont homogènes) nous considérons un modèle où les degrés sont plus variables : le « Degree-Corrected Stochastic Block Model » (DC-SBM). Dans ce modèle les degrés de tous les nœuds sont pondérés - ce qui permet de générer des suites des degrés hétérogènes. Nous étudions ce modèle dans deux régimes: le régime dense et le régime « épars », ou « dilué ». Dans le régime dense, nous prouvons qu'un algorithme basé sur une matrice d'adjacence normalisée réussit à classifier correctement tous les nœuds sauf une fraction négligeable. Dans le régime épars il existe un seuil en termes de paramètres du modèle en-dessous lequel n'importe quel algorithme échoue par manque d'information. En revanche, nous prouvons qu'un algorithme utilisant la matrice « non-backtracking » réussit jusqu'au seuil - cette méthode est donc très robuste. Pour montrer cela nous caractérisons le spectre des graphes qui sont générés selon un DC-SBM dans son régime épars. Nous concluons cette partie par des tests sur des réseaux sociaux. II) Les marchés d'intermédiation en ligne tels que des plateformes de Question-Réponse et des plateformes de recrutement nécessitent un appariement basé sur une information incomplète des deux parties. Nous développons un modèle de système d'appariement entre tâches et serveurs représentant le comportement de telles plateformes. Pour ce modèle nous donnons une condition nécessaire et suffisante pour que le système puisse gérer un certain flux de tâches. Nous introduisons également une politique de « back-pressure » sous lequel le débit gérable par le système est maximal. Nous prouvons que cette politique atteint un débit strictement plus grand qu'une politique naturelle « gloutonne ». Nous concluons en validant nos résultats théoriques avec des simulations entrainées par des données de la plateforme Stack-Overflow. / In this thesis, we study two problems of machine learning: (I) community detection and (II) adaptive matching. I) It is well-known that many networks exhibit a community structure. Finding those communities helps us understand and exploit general networks. In this thesis we focus on community detection using so-called spectral methods based on the eigenvectors of carefully chosen matrices. We analyse their performance on artificially generated benchmark graphs. Instead of the classical Stochastic Block Model (which does not allow for much degree-heterogeneity), we consider a Degree-Corrected Stochastic Block Model (DC-SBM) with weighted vertices, that is able to generate a wide class of degree sequences. We consider this model in both a dense and sparse regime. In the dense regime, we show that an algorithm based on a suitably normalized adjacency matrix correctly classifies all but a vanishing fraction of the nodes. In the sparse regime, we show that the availability of only a small amount of information entails the existence of an information-theoretic threshold below which no algorithm performs better than random guess. On the positive side, we show that an algorithm based on the non-backtracking matrix works all the way down to the detectability threshold in the sparse regime, showing the robustness of the algorithm. This follows after a precise characterization of the non-backtracking spectrum of sparse DC-SBM's. We further perform tests on well-known real networks. II) Online two-sided matching markets such as Q&A forums and online labour platforms critically rely on the ability to propose adequate matches based on imperfect knowledge of the two parties to be matched. We develop a model of a task / server matching system for (efficient) platform operation in the presence of such uncertainty. For this model, we give a necessary and sufficient condition for an incoming stream of tasks to be manageable by the system. We further identify a so-called back-pressure policy under which the throughput that the system can handle is optimized. We show that this policy achieves strictly larger throughput than a natural greedy policy. Finally, we validate our model and confirm our theoretical findings with experiments based on user-contributed content on an online platform. Apprentissage automatique Détection des communautés Réseaux sociaux Degree-corrected stochastic block models Méthodes spectrales Matrices aléatoires Matrice non-backtracking Graphes aléatoires Appariement adaptatif Apprentissage par renforcement Théorie des files d’attente Machine learning Community detection Social networks Degree-corrected stochastic block models Spectral methods Random matrices Nonbacktracking matrix Random graphs Recommendation systems Reinforcement learning Queueing theory 519
135	Analyse temporelle et sémantique des réseaux sociaux typés à partir du contenu de sites généré par des utilisateurs sur le Web / Temporal and semantic analysis of richly typed social networks from user-generated content sites on the web Meng, Zide 07 November 2016 (has links) Nous proposons une approche pour détecter les sujets, les communautés d'intérêt non disjointes,l'expertise, les tendances et les activités dans des sites où le contenu est généré par les utilisateurs et enparticulier dans des forums de questions-réponses tels que StackOverFlow. Nous décrivons d'abordQASM (Questions & Réponses dans des médias sociaux), un système basé sur l'analyse de réseauxsociaux pour gérer les deux principales ressources d’un site de questions-réponses: les utilisateurs et lecontenu. Nous présentons également le vocabulaire QASM utilisé pour formaliser à la fois le niveaud'intérêt et l'expertise des utilisateurs. Nous proposons ensuite une approche efficace pour détecter lescommunautés d'intérêts. Elle repose sur une autre méthode pour enrichir les questions avec un tag plusgénéral en cas de besoin. Nous comparons trois méthodes de détection sur un jeu de données extrait dusite populaire StackOverflow. Notre méthode basée sur le se révèle être beaucoup plus simple et plusrapide, tout en préservant la qualité de la détection. Nous proposons en complément une méthode pourgénérer automatiquement un label pour un sujet détecté en analysant le sens et les liens de ses mots-clefs.Nous menons alors une étude pour comparer différents algorithmes pour générer ce label. Enfin, nousétendons notre modèle de graphes probabilistes pour modéliser conjointement les sujets, l'expertise, lesactivités et les tendances. Nous le validons sur des données du monde réel pour confirmer l'efficacité denotre modèle intégrant les comportements des utilisateurs et la dynamique des sujets / We propose an approach to detect topics, overlapping communities of interest, expertise, trends andactivities in user-generated content sites and in particular in question-answering forums such asStackOverFlow. We first describe QASM (Question & Answer Social Media), a system based on socialnetwork analysis to manage the two main resources in question-answering sites: users and contents. Wealso introduce the QASM vocabulary used to formalize both the level of interest and the expertise ofusers on topics. We then propose an efficient approach to detect communities of interest. It relies onanother method to enrich questions with a more general tag when needed. We compared threedetection methods on a dataset extracted from the popular Q&A site StackOverflow. Our method basedon topic modeling and user membership assignment is shown to be much simpler and faster whilepreserving the quality of the detection. We then propose an additional method to automatically generatea label for a detected topic by analyzing the meaning and links of its bag of words. We conduct a userstudy to compare different algorithms to choose the label. Finally we extend our probabilistic graphicalmodel to jointly model topics, expertise, activities and trends. We performed experiments with realworlddata to confirm the effectiveness of our joint model, studying the users’ behaviors and topicsdynamics Web social sémantique Analyse des médias sociaux Modèle graphique probabiliste Sites de questions-réponses Contenu généré par l’utilisateur Modélisation des thématiques Détection d’expertise Détection de communautés recouvrantes Social semantic web Social media mining Probabilistic graphical model Question answer sites User-generated content Topic modeling Expertise detection Overlapping community detection
136	Designing a Data-Driven Pipeline to Explore the Complexity of Emergency Medicine Patients Admitted to Hospital Wards / Design av en datadriven pipeline för att undersöka komplexiteten hos akutmedicinska patienter inlagda på sjukvårdsavdelningar Byström, Matilda January 2024 (has links) A prominent challenge in the healthcare system today is the limitation of resources in combi- nation with an increasing need for healthcare services. The pressure on healthcare is already extremely high and increasing due to a larger number of people seeking care as well as an aging population with an increased need for care. Therefore, it becomes more important to distribute resources effectively within healthcare to ensure high-quality care for everyone. Still, research shows that overcrowding of emergency departments and hospital wards is increasing affecting patient safety negatively with several negative implications including higher rates of medical errors and higher mortality. The problem is that healthcare is a complex system with many components that are interrelated and therefore hard to study with traditional approaches. Despite the huge quantity of studies on the overcrowding problem, there is yet to find a solution that could solve the problem. Thus, this thesis aims to design a data-driven pipeline to explore the clinical and logistical complexity of Emergency medicine patients admitted to hospital wards adopting a complex graph approach. Complex network theory provides a suitable tool to investigate complex networks by breaking complex systems down into smaller graphs with objects (nodes) and studying the relationship between these through various analysis tools. In this thesis, five complex networks were constructed representing co-morbidities in the car- diac, medicine, surgery, stroke, and orthopedic wards of the Academic Hospital of Uppsala, a hospital suffering from overcrowding. These networks were analyzed using degree distribution, centrality metrics, clustering coefficient, and community detection to reveal structural and clin- ical patterns. A comprehensive network of all hospital co-morbidities was also created and an- alyzed to compare it with the ward structures. Additionally, a network mapping patient flow from the emergency department based on chief complaints and ICD codes to wards was created and analyzed to identify admission patterns. The analysis of the co-morbidity networks revealed that there was an indication of structure between the wards. This was based on the visualization of nodes and edges of the networks, identified communities, and community comparisons between the wards. Further, it showed that there was a big overlap of common co-morbidities which could indicate the contrary. But it was also revealed that in terms of community structure, the wards were considerably different from each other indicating a good separation of diseases. The results of this research show that complex network theory could be used to increase the understanding of the complexity of healthcare wards in terms of the structure of diseases as well as clinical variability and allow for a discussion regarding if this is related to clinical or logistical factors. It also shows the potential of using complex network theory to increase the understanding of the path patients take from the emergency department to the wards based on the community detection analysis showing that there is a structure of where patient ends up based on the assigned ICD code and chief complaint in the emergency department. Previous studies have typically focused on specific diseases or patient flow within a single ward or the emergency department. This approach offers a tool to examine patient logistics across multiple wards alongside their clinical characteristics. The insights gained could help improve hospital structure by more efficiently distributing patients between wards, thereby enhancing resource use and hospital operations. Further research using complex network theory could deepen understanding of overcrowding issues and identify potential solutions. / En stor utmaning inom sjukvårdssystemet idag är begräsningen av resurser i kombination med ett ökat vårdbehov. Trycket på sjukvården är redan högt och ökar till följd av ett ökat antal personer som söker vård samt en åldrande befolkning med ett ökat vårdbehov. Därav blir det viktigare att fördela resurser inom sjukvården på ett effektivt sätt för att säkerställa en högkva- litativ vård till alla. Forskning visar dock att överbeläggningar på akutvårdsavdelningar och sjukvårdsavdelningar ökar vilket påverkar patientsäkerheten negativt med flera negativa kon- sekvenser däribland en högre andel medicinska misstag och en högre mortalitet. Problemet är att sjukvården är ett komplext system med många komponenter som samverkar och det är därav svårt att studera med traditionella tillvägagångssätt. Trots det höga antalet studier på överbeläggningar inom sjukvården behöver man fortfarande hitta en lösning på problemet. Därav är målet med denna avhandling att designa en datadriven pipeline för att undersöka den kliniska och logistiska komplexiteten hos patienter inlagda från akutvårdsavdelningen med hjälp av en komplex grafmetodik. Komplex nätverksteori är ett lämpligt verktyg för att studera komplexa nätverk genom att bryta ned det i mindre komponen- ter och undersöka sambanden mellan dem med hjälp av olika analysverktyg. I denna avhandling skapades 5 komplexa nätverk som representerade komorbiditeter utifrån tilldelad ICD-10-kod på hjärt-, medicin-, kirurgi-, stroke- och ortopediska avdelningen vid det akademiska sjukhuset i Uppsala, ett sjukhus som för närvarande lider av överbeläggningar. Nätverken analyserades med hjälp av gradfördelning, olika centralitetsmått, klusterkoefficient och samhällsdetektering för att identifiera skillnader eller likheter när det gäller struktur och klinisk variation. Ett heltäckande komplext nätverk skapades där alla komorbiditeter på hela sjukhuset inkluderades för att möjliggöra en jämförelse med strukturen på avdelningarna. Utö- ver detta, skapades och analyserades ett nätverk för att kartlägga patientflödet från akuten till sjukvårdsavdelningarna baserat på huvudorsak till patientens akutbesök och ICD kod. Analysen av samhällsstrukturen visade att det fanns en indikation av struktur mellan avdelning- arna. Detta baserat på visualisering av noder och kopplingar i nätverken, identifierade sam- hällen samt jämförelser av samhällen mellan avdelningarna. Vidare visade det dock att det fanns ett stort överlapp av vanliga komorbiditeter vilket kunde indikera motsatsen. Det visades dock att även när det gäller samhällsstruktur var avdelningarna väldigt olika vilket indikerade en god separering av sjukdomar. Resultaten av denna forskning visar att komplex nätverksteori kan användas för att öka förstå- elsen för komplexiteten på sjukvårdsavdelningarna gällande strukturen mellan sjukdomar såväl som klinisk variationen och öppnar upp för en diskussion om dessa är relaterade till kliniska eller logistiska faktorer. Det visar också potentialen att använda komplex nätverksteori för att öka förståelsen för den väg som patienterna tar från akutvårdsavdelningen till avdelningarna baserat på samhällsdetekteringsanalysen som visar att det finns en struktur av var patienten hamnar baserat på den tilldelade ICD-koden och huvudklagomål från akutvårdsavdelningen. Tidigare studier som har använt detta tillvägagångssätt har i huvudsak undersökt specifika sjuk- domar eller flöden på en specifik avdelning eller akutvårdsavdelning. Det här tillvägagångssät- tet ger ett verktyg för att utforska logistiken för patienters rutter till olika avdelningar samtidigt som deras kliniska egenskaper beaktas. Resultaten genom denna pipeline kan ge en grund för att öka förståelsen för hur man bättre kan strukturera sjukhuset genom att dela patienter mellanvavdelningar och genom detta effektivisera användningen av resurser och potentiellt förbättra rutiner på sjukhuset. Genom vidare studier, kan komplex nätverksteori användas för att öka förståelsen kring faktorer relaterade till problemet med överbeläggningar och hitta potentiella lösningar på problemet. Overcrowding Of Healthcare Complex Network Theory Co-morbidity Networks Complex Network Analysis Degree Distribution Centrality Metrics Clustering Coefficient Community Detection Överbeläggningar inom sjukvården komplex nätverks teori co-morbiditetsnät- verk komplex nätverksanalys gradfördelning centralitetsmått klusteringskoefficient sam- hällsdetektering Medical Engineering Medicinteknik Health Sciences Hälsovetenskaper Medical and Health Sciences Medicin och hälsovetenskap Transport Systems and Logistics Transportteknik och logistik

Page generated in 0.0983 seconds