Global ETD Search

111	事件導向動態社會網路分析應用於政治權力變化之觀察 / An application of event-based dynamic social network analysis for observing political power evolution 莊婉君, Chuang, Wan Chun Unknown Date (has links) 如何從大量的資料中擷取隱匿或不容易直接觀察的資訊，是重要的議題，社會網路提供了一個適當的系統描述模型與內部檢視分析的方法，過去社會網路分析多著重於靜態的分析，無法解釋發生在網路上的動態行為；我們的研究目的是從動態社會網路分析的角度，觀察政治權力的變化，將資料依時間切分成多個資料集，在各個資料集中，利用官員共同異動職務及共事資料建構網路，並使用EdgeBetweenness分群方法將網路做分群，以找出潛在的政治群組，接著再採用事件導向的方法(Event-based Framework)，比較連續兩個時間區間的網路分群結果，以觀察政治群體的動態發展，找出政治群組事件，並將其匯集成政治群組指標，以用來衡量政治群組的變動性及穩定性。我們提供了一個觀察政治權力變化的模型，透過網路建立、網路分群到觀察網路動態行為，找到不容易直接取得的資訊，我們也以此觀察模型解決以下問題：(1)觀察部門之接班梯隊之變化，(2)觀察特定核心人物之核心成員組成模式，(3)部門專業才能單一性或多元性之觀察。實驗結果顯示，利用政治群組事件設計的政治群組指標，可實際反應政府部門選用人才的傾向為內部調任或外部選用。 / Extracting implicit information from large amounts of data is an important intelligent data processing task. Social network analysis is well suited to this purpose because it emphasizes the relationships between nodes and the structure of networked interactions. Most past research has taken a static point of view, which cannot account for the dynamic behavior taking place in the network. Our research objective is to observe the evolution of political power through dynamic social network analysis. We begin by creating static graphs for different time periods, based on synchronous job changes among government officials or on the relationships among officials who work in the same government agency. We obtain political communities from each of these snapshot graphs using the edge betweenness clustering method. Next, we define a set of evolutionary events for political communities using an event-based framework, and compare each pair of consecutive snapshots to capture these events. We also develop two evolutionary political community metrics to measure the stability of political communities. We propose a model for observing the evolution of political power in three steps: network construction, community identification, and community evolution tracking. 
The approach is shown to be effectual for the purposes of: (1) finding succession pool members in government agencies, (2) observing the inner circle of a leading political figure, (3) measuring the specialized degree of government agencies. Experiments also show that our community evolution metrics reflect the tendency of whether a government agency conducts internal succession or outside appointment. 動態社會網路分析網路分群政府專業團隊政治權力觀察 dynamic social network community detection political community political power observation
112	Inférence statistique en grande dimension pour des modèles structurels. Modèles linéaires généralisés parcimonieux, méthode PLS et polynômes orthogonaux et détection de communautés dans des graphes. / Statistical inference for structural models in high dimension. Sparse generalized linear models, PLS through orthogonal polynomials and community detection in graphs Blazere, Melanie 01 July 2015 (has links) Cette thèse s'inscrit dans le cadre de l'analyse statistique de données en grande dimension. Nous avons en effet aujourd'hui accès à un nombre toujours plus important d'information. L'enjeu majeur repose alors sur notre capacité à explorer de vastes quantités de données et à en inférer notamment les structures de dépendance. L'objet de cette thèse est d'étudier et d'apporter des garanties théoriques à certaines méthodes d'estimation de structures de dépendance de données en grande dimension.La première partie de la thèse est consacrée à l'étude de modèles parcimonieux et aux méthodes de type Lasso. Après avoir présenté les résultats importants sur ce sujet dans le chapitre 1, nous généralisons le cas gaussien à des modèles exponentiels généraux. La contribution majeure à cette partie est présentée dans le chapitre 2 et consiste en l'établissement d'inégalités oracles pour une procédure Group Lasso appliquée aux modèles linéaires généralisés. Ces résultats montrent les bonnes performances de cet estimateur sous certaines conditions sur le modèle et sont illustrés dans le cas du modèle Poissonien. Dans la deuxième partie de la thèse, nous revenons au modèle de régression linéaire, toujours en grande dimension mais l'hypothèse de parcimonie est cette fois remplacée par l'existence d'une structure de faible dimension sous-jacente aux données. Nous nous penchons dans cette partie plus particulièrement sur la méthode PLS qui cherche à trouver une décomposition optimale des prédicteurs étant donné un vecteur réponse. 
Nous rappelons les fondements de la méthode dans le chapitre 3. La contribution majeure à cette partie consiste en l'établissement pour la PLS d'une expression analytique explicite de la structure de dépendance liant les prédicteurs à la réponse. Les deux chapitres suivants illustrent la puissance de cette formule aux travers de nouveaux résultats théoriques sur la PLS . Dans une troisième et dernière partie, nous nous intéressons à la modélisation de structures au travers de graphes et plus particulièrement à la détection de communautés. Après avoir dressé un état de l'art du sujet, nous portons notre attention sur une méthode en particulier connue sous le nom de spectral clustering et qui permet de partitionner les noeuds d'un graphe en se basant sur une matrice de similarité. Nous proposons dans cette thèse une adaptation de cette méthode basée sur l'utilisation d'une pénalité de type l1. Nous illustrons notre méthode sur des simulations. / This thesis falls within the context of high-dimensional data analysis. Nowadays we have access to an increasing amount of information. The major challenge relies on our ability to explore a huge amount of data and to infer their dependency structures.The purpose of this thesis is to study and provide theoretical guarantees to some specific methods that aim at estimating dependency structures for high-dimensional data. The first part of the thesis is devoted to the study of sparse models through Lasso-type methods. In Chapter 1, we present the main results on this topic and then we generalize the Gaussian case to any distribution from the exponential family. The major contribution to this field is presented in Chapter 2 and consists in oracle inequalities for a Group Lasso procedure applied to generalized linear models. These results show that this estimator achieves good performances under some specific conditions on the model. We illustrate this part by considering the case of the Poisson model. 
The second part concerns linear regression in high dimension, but the sparsity assumption is replaced by a low-dimensional structure underlying the data. We focus in particular on the PLS method, which attempts to find an optimal decomposition of the predictors given a response. We recall the main idea in Chapter 3. The major contribution of this part is a new explicit analytical expression of the dependency structure that links the predictors to the response. The next two chapters illustrate the power of this formula through new theoretical results for PLS. The third and last part is dedicated to graph modelling and especially to community detection. After presenting the main trends on this topic, we turn our attention to spectral clustering, which makes it possible to cluster the nodes of a graph with respect to a similarity matrix. In this thesis, we propose an adaptation of this method based on an $\ell_1$ penalty. We illustrate this method through simulations. Grande dimension Méthode de régularisation Méthode de réduction de dimension High dimension Sparse generalized linear models Regularization methods Dimension reduction methods Partial least squares Community detection in graphs 519
113	Mineração de estruturas musicais e composição automática utilizando redes complexas / Musical structures mining and composition using complex networks Andrés Eduardo Coca Salazar 26 November 2014 (has links) A teoria das redes complexas tem se tornado cada vez mais em uma poderosa teoria computacional capaz de representar, caracterizar e examinar sistemas com estrutura não trivial, revelando características intrínsecas locais e globais que facilitam a compreensão do comportamento e da dinâmica de tais sistemas. Nesta tese são exploradas as vantagens das redes complexas na resolução de problemas relacionados com tarefas do âmbito musical, especificamente, são estudadas três abordagens: reconhecimento de padrões, mineração e síntese de músicas. A primeira abordagem é desempenhada através do desenvolvimento de um método para a extração do padrão rítmico de uma peça musical de caráter popular. Nesse tipo de peças coexistem diferentes espécies de padrões rítmicos, os quais configuram uma hierarquia que é determinada por aspectos funcionais dentro da base rítmica. Os padrões rítmicos principais são caracterizados por sua maior incidência dentro do discurso musical, propriedade que é refletida na formação de comunidades dentro da rede. Técnicas de detecção de comunidades são aplicadas na extração dos padrões rítmicos, e uma medida para diferenciar os padrões principais dos secundários é proposta. Os resultados mostram que a qualidade da extração é sensível ao algoritmo de detecção, ao modo de representação do ritmo e ao tratamento dado às linhas de percussão na hora de gerar a rede. Uma fase de mineração foi desempenhada usando medidas topológicas sobre a rede obtida após a remoção dos padrões secundários. Técnicas de aprendizado supervisionado e não-supervisionado foram aplicadas para discriminar o gênero musical segundo os atributos calculados na fase de mineração. 
Os resultados revelam a eficiência da metodologia proposta, a qual foi constatada através de um teste de significância estatística. A última abordagem foi tratada mediante o desenvolvimento de modelos para a composição de melodias através de duas perspectivas, na primeira perspectiva é usada uma caminhada controlada por critérios sobre redes complexas predefinidas e na segunda redes neurais recorrentes e sistemas dinâmicos caóticos. Nesta última perspectiva, o modelo é treinado para compor uma melodia com um valor preestabelecido de alguma característica tonal subjetiva através de uma estratégia de controle proporcional que modifica a complexidade de uma melodia caótica, melodia que atua como entrada de inspiração da rede. / The theory of complex networks has become increasingly a powerful computational tool capable of representing, characterizing and examining systems with non-trivial structure, revealing both local and global intrinsic structures that facilitate the understanding of the behavior and dynamics of such systems. In this thesis, the virtues of complex networks in solving problems related to tasks within the musical scope are explored. Specifically, three approaches are studied: pattern recognition, data mining, and synthesis. The first perspective is addressed by developing a method for extracting the rhythmic pattern of a piece of popular music. In that type of musical pieces, there coexist different types of rhythm patterns which constitute a hierarchy determined by functional aspects within the basic rhythm. The main rhythmic patterns are characterized by a higher incidence within the musical discourse and this factor is reflected in the formation of communities within the network constructed from the music piece. Community detection techniques are applied in the extraction of rhythmic patterns, and a measure to distinguish the main patterns of the secondary is proposed. 
The results show that the quality of extraction is sensitive to the detection algorithm, the method of representing rhythm, and the treatment of percussion lines when generating the network. Data mining is performed using topological measures over the network obtained after the removal of secondary patterns. Supervised and unsupervised learning techniques are applied to discriminate the musical genre according to the attributes calculated in the data mining phase. The quantitative results show the efficiency of the proposed methodology, which is confirmed by a test of statistical significance. For melody generation, two kinds of models are developed: an algorithm that uses a criteria-controlled walk on predefined complex networks, and melody composition models that use recurrent neural networks and chaotic dynamical systems. In the latter approach, the model is trained to compose a melody with a pre-established value of some subjective tonal characteristic, through a proportional control strategy that modifies the complexity of a chaotic melody, which serves as the inspiration input of the network. Detecção de comunidades Identificação de gêneros musicais Reconhecimento de padrões rítmicos Redes complexas Redes neurais artificiais Sistemas dinâmicos caóticos Artificial neural networks Chaotic dynamical systems Community detection Music genre recognition Rhythmic pattern recognition
114	„Community Detection“ als Ansatz zur Identifikation von Innovatoren in Sozialen Netzwerken Zeini, Sam, Hoppe, Ulrich January 2010 (has links) No description available. info:eu-repo/classification/ddc/330 ddc:330
115	Community Detection of Anomaly in Large-Scale Network Dissertation - Adefolarin Bolaji .pdf Adefolarin Alaba Bolaji (10723926) 29 April 2021 (has links) The detection of anomalies in real-world networks is applicable in many domains; applications include, but are not limited to, credit card fraud detection, malware identification and classification, cancer detection from diagnostic reports, abnormal traffic detection, and identification of fake media posts. Much ongoing research provides tools for analyzing labeled and unlabeled data; however, finding anomalies and patterns in large-scale datasets remains challenging because of rapid changes in the threat landscape. In this study, I implemented a novel and robust solution that combines data science and cybersecurity to solve complex network security problems. I used a Long Short-Term Memory (LSTM) model, the Louvain algorithm, and the PageRank algorithm to identify and group anomalies in large-scale real-world networks. The network has billions of packets. The developed model used different visualization techniques to provide further insight into how the anomalies in the network are related. Mean absolute error (MAE) and root mean square error (RMSE) were used to validate the anomaly detection models; the results obtained are 5.1813e-04 and 1e-03, respectively. The low loss from the training phase confirmed the low RMSE at loss: 5.1812e-04, mean absolute error: 5.1813e-04, validation loss: 3.9858e-04, validation mean absolute error: 3.9858e-04. The community detection results show an overall modularity value of 0.914, which indicates the existence of very strong communities among the anomalies. The largest sub-community of the anomalies connects 10.42% of the total nodes of the anomalies. The broader aim and impact of this study was to provide sophisticated, AI-assisted countermeasures to cyber-threats in large-scale networks. 
To close the existing gaps created by the shortage of skilled and experienced cybersecurity specialists and analysts, solutions based on out-of-the-box thinking are essential; this research aimed to yield one such solution. It was built to detect specific and collaborating threat actors in large networks and to help speed up how the activities of anomalies in any given large-scale network can be curtailed in time. Applied Computer Science Computer System Security Computer Communications Networks Anomaly Detection Community Detection Artificial Intelligence Deep Learning Network Traffic Large Scale Networks Big Data Analytics Network Graph Modularity Data Visualization
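The abstract of entry 115 reports a modularity value but does not say how it was computed. As a rough illustration only, the sketch below uses the standard Newman modularity for an undirected, unweighted graph, Q = sum over communities of (internal edges / m) minus (degree sum / 2m) squared. The function name `modularity` and the toy graph are assumptions made for this example and are not taken from the dissertation.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity of a partition of an undirected, unweighted graph.

    edges     -- list of (u, v) pairs, each undirected edge listed once
    community -- dict mapping every node to its community label
    """
    m = len(edges)
    internal = defaultdict(int)    # number of edges inside each community
    degree_sum = defaultdict(int)  # sum of node degrees in each community
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Toy graph (hypothetical): two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(edges, two_groups))
```

Values close to 1 indicate that almost all edges fall inside communities, which is how a figure such as 0.914 is usually read.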
116	Information diffusion and opinion dynamics in social networks / Dissémination de l’information et dynamique des opinions dans les réseaux sociaux Louzada Pinto, Julio Cesar 14 January 2016 (has links) La dissémination d'information explore les chemins pris par l'information qui est transmise dans un réseau social, afin de comprendre et modéliser les relations entre les utilisateurs de ce réseau, ce qui permet une meilleur compréhension des relations humaines et leurs dynamique. Même si la priorité de ce travail soit théorique, en envisageant des aspects psychologiques et sociologiques des réseaux sociaux, les modèles de dissémination d'information sont aussi à la base de plusieurs applications concrètes, comme la maximisation d'influence, la prédication de liens, la découverte des noeuds influents, la détection des communautés, la détection des tendances, etc. Cette thèse est donc basée sur ces deux facettes de la dissémination d'information: nous développons d'abord des cadres théoriques mathématiquement solides pour étudier les relations entre les personnes et l'information, et dans un deuxième moment nous créons des outils responsables pour une exploration plus cohérente des liens cachés dans ces relations. Les outils théoriques développés ici sont les modèles de dynamique d'opinions et de dissémination d'information, où nous étudions le flot d'informations des utilisateurs dans les réseaux sociaux, et les outils pratiques développés ici sont un nouveau algorithme de détection de communautés et un nouveau algorithme de détection de tendances dans les réseaux sociaux / Our aim in this Ph. D. thesis is to study the diffusion of information as well as the opinion dynamics of users in social networks. Information diffusion models explore the paths taken by information being transmitted through a social network in order to understand and analyze the relationships between users in such network, leading to a better comprehension of human relations and dynamics. 
This thesis is based on both sides of information diffusion: first by developing mathematical theories and models to study the relationships between people and information, and in a second time by creating tools to better exploit the hidden patterns in these relationships. The theoretical tools developed in this thesis are opinion dynamics models and information diffusion models, where we study the information flow from users in social networks, and the practical tools developed in this thesis are a novel community detection algorithm and a novel trend detection algorithm. We start by introducing an opinion dynamics model in which agents interact with each other about several distinct opinions/contents. In our framework, agents do not exchange all their opinions with each other, they communicate about randomly chosen opinions at each time. We show, using stochastic approximation algorithms, that under mild assumptions this opinion dynamics algorithm converges as time increases, whose behavior is ruled by how users choose the opinions to broadcast at each time. We develop next a community detection algorithm which is a direct application of this opinion dynamics model: when agents broadcast the content they appreciate the most. Communities are thus formed, where they are defined as groups of users that appreciate mostly the same content. This algorithm, which is distributed by nature, has the remarkable property that the discovered communities can be studied from a solid mathematical standpoint. In addition to the theoretical advantage over heuristic community detection methods, the presented algorithm is able to accommodate weighted networks, parametric and nonparametric versions, with the discovery of overlapping communities a byproduct with no mathematical overhead. In a second part, we define a general framework to model information diffusion in social networks. 
The proposed framework takes into consideration not only the hidden interactions between users, but also the interactions between contents and multiple social networks. It also accommodates dynamic networks and various temporal effects of the diffusion. This framework can be combined with topic modeling, for which several estimation techniques are derived, based on nonnegative tensor factorization. Together with a dimensionality reduction argument, these techniques also discover the latent community structure of the users in the social networks. Finally, we use one instance of the previous framework to develop a trend detection algorithm designed to find trendy topics in a social network. Taking into consideration the interaction between users and topics, we formally define trendiness and derive trend indices for each topic being disseminated in the social network. These indices take into consideration the distance between the real broadcast intensity and the maximum expected broadcast intensity, as well as the social network topology. The proposed trend detection algorithm uses stochastic control techniques to calculate the trend indices; it is fast and aggregates all the information of the broadcasts into a simple one-dimensional process, thus reducing its complexity and the quantity of data needed for detection. To the best of our knowledge, this is the first trend detection algorithm that is based solely on the individual performances of topics. Dynamique d'opinions Algorithme d'approximation stochastique Détection des communautés Dissémination d'information Processus de Hawkes Détection des tendances Contrôle stochastique Opinion dynamics Stochastic approximation algorithms Community detection Information diffusion Hawkes processes Trend detection Stochastic control
117	Machine Learning Algorithms for Influence Maximization on Social Networks Abhishek Kumar Umrawal (16787802) 08 August 2023 (has links) With an increasing number of users spending time on social media platforms and engaging with family, friends, and influencers within communities of interest (such as fashion, cooking, gaming, etc.), there are significant opportunities for marketing firms to leverage word-of-mouth advertising on these platforms. In particular, marketing firms can select sets of influencers within relevant communities to sponsor, namely by providing free product samples to those influencers so that they will discuss and promote the product on their social media accounts. The question of which set of influencers to sponsor is known as influence maximization (IM), formally defined as follows: "if we can try to convince a subset of individuals in a social network to adopt a new product or innovation, and the goal is to trigger a large cascade of further adoptions, which set of individuals should we target?" Under standard diffusion models, this optimization problem is known to be NP-hard. This problem has been widely studied in the literature and several approaches for solving it have been proposed. Some approaches provide near-optimal solutions but are costly in terms of runtime. Other approaches are faster but are heuristics, i.e., they do not have approximation guarantees. In this dissertation, we study the influence maximization problem extensively. We provide efficient algorithms for solving the original problem and its important generalizations. 
Furthermore, we provide theoretical guarantees and experimental evaluations to support the claims made in this dissertation. We first study the original IM problem, referred to as the discrete influence maximization (DIM) problem, where the marketer can either provide a free sample to an influencer or not, i.e., they cannot give fractional discounts such as 10% off. As already mentioned, the existing solution methods (for instance, the simulation-based greedy algorithm) provide near-optimal solutions that are costly in terms of runtime, and the approaches that are faster do not have approximation guarantees. Motivated by the idea of addressing this trade-off between accuracy and runtime, we propose a community-aware divide-and-conquer framework to provide a time-efficient solution to the DIM problem. The proposed framework outperforms the standard methods in terms of runtime and the heuristic methods in terms of influence. We next study a natural extension of the DIM problem, referred to as the fractional influence maximization (FIM) problem, where the marketer may offer fractional discounts to the influencers (as opposed to either providing a free sample or not, as in the DIM problem). Clearly, the FIM problem provides more flexibility to the marketer in allocating the available budget among different influencers. The existing solution methods propose to use a continuous extension of the simulation-based greedy approximation algorithm for solving the DIM problem. This continuous extension suggests greedily building the solution for the given fractional budget by taking small steps through the interior of the feasible region. In contrast, we first characterize the solution to the FIM problem in terms of the solution to the DIM problem. We then use this characterization to propose an efficient greedy approximation algorithm that only iterates through the corners of the feasible region. 
This leads to large savings in runtime compared to the existing methods that iterate through the interior of the feasible region. Furthermore, we provide an approximation guarantee for the proposed greedy algorithm to solve the FIM problem. Finally, we study another extension of the DIM problem, referred to as the online discrete influence maximization (ODIM) problem, where the marketer provides free samples not just once but repeatedly over a given time horizon and the goal is to maximize the cumulative influence over time while receiving instantaneous feedback. The existing solution methods are based on semi-bandit instantaneous feedback, where knowledge of some intermediate aspects of how the influence propagates in the social network is assumed or observed, for instance, which specific individuals became influenced at intermediate steps of the propagation. However, for social networks with user privacy, this information is not available. Hence, we consider the ODIM problem with full-bandit feedback, where no knowledge of the underlying social network or diffusion process is assumed. We note that the ODIM problem is an instance of the stochastic combinatorial multi-armed bandit (CMAB) problem with submodular rewards. To solve the ODIM problem, we provide an efficient algorithm that outperforms the existing methods in terms of influence as well as time and space complexity. Furthermore, we point out the connections of influence maximization with a related problem of disease outbreak prevention and a more general problem of submodular maximization. 
The methods proposed in this dissertation can also be used to solve those problems. Industrial engineering Reinforcement learning Operations research Optimisation Social networks Viral marketing Influence maximization Submodular maximization Discrete influence maximization Community detection Fractional influence maximization Partial incentives Online discrete influence maximization Combinatorial multi-armed bandits
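Entry 117 takes the simulation-based greedy algorithm as its baseline for discrete influence maximization. The sketch below shows what such a baseline commonly looks like, assuming the independent cascade diffusion model with a single activation probability `p`; the abstract does not name the diffusion model, and the function names, parameters, and toy graph are all hypothetical. It is not the community-aware method proposed in the dissertation.

```python
import random


def simulate_ic(graph, seeds, p, rng):
    """One independent-cascade run; returns the number of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                # Each newly active node gets one chance to activate each neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def expected_spread(graph, seeds, p, runs, rng):
    """Monte Carlo estimate of the expected cascade size for a seed set."""
    return sum(simulate_ic(graph, seeds, p, rng) for _ in range(runs)) / runs


def greedy_im(graph, k, p=0.1, runs=200, seed=0):
    """Pick k seeds, each time adding the node with the best estimated spread."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for nbrs in graph.values() for v in nbrs}
    chosen = []
    for _ in range(k):
        best, best_spread = None, -1.0
        for cand in sorted(nodes - set(chosen)):
            spread = expected_spread(graph, chosen + [cand], p, runs, rng)
            if spread > best_spread:
                best, best_spread = cand, spread
        chosen.append(best)
    return chosen


# Toy directed graph (hypothetical): node -> list of followers it can influence.
graph = {"a": ["b", "c", "d"], "e": ["f"]}
print(greedy_im(graph, k=2, p=1.0, runs=1))
```

Every greedy step re-estimates the spread of each remaining candidate by simulation, which is the runtime cost the abstract refers to when it calls near-optimal methods expensive.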
118	Spam Analysis and Detection for User Generated Content in Online Social Networks Tan, Enhua 23 July 2013 (has links) No description available. Computer Engineering Computer Science user generated content online social networks user behavior stretched exponential distribution spam filtering spam detection spam classification decision tree social graph user-link graph Sybil attack community detection BARS UNIK
119	[en] A MIP APPROACH FOR COMMUNITY DETECTION IN THE STOCHASTIC BLOCK MODEL / [pt] UMA ABORDAGEM DE PROGRAMAÇÃO INTEIRA MISTA PARA DETECÇÃO DE COMUNIDADES NO STOCHASTIC BLOCK MODEL BRENO SERRANO DE ARAUJO 04 November 2020 (has links) [pt] O Degree-Corrected Stochastic Block Model (DCSBM) é um modelo popular para geração de grafos aleatórios com estrutura de comunidade, dada uma sequência de graus esperados. O princípio básico de algoritmos que utilizam o DCSBM para detecção de comunidades é ajustar os parâmetros do modelo a dados observados, de forma a encontrar a estimativa de máxima verossimilhança, ou maximum likelihood estimate (MLE), dos parâmetros do modelo. O problema de otimização para o MLE é comumente resolvido por meio de heurísticas. Neste trabalho, propomos métodos de programação matemática, para resolver de forma exata o problema de otimização descrito, e comparamos os métodos propostos com heurísticas baseadas no algoritmo de expectation-maximization (EM). Métodos exatos são uma ferramenta fundamental para a avaliação de heurísticas, já que nos permitem identificar se uma solução heurística é sub-ótima e medir seu gap de otimalidade. / [en] The Degree-Corrected Stochastic Block Model (DCSBM) is a popular model to generate random graphs with community structure given an expected degree sequence. The standard approach of community detection algorithms based on the DCSBM is to search for the model parameters which are the most likely to have produced the observed network data, via maximum likelihood estimation (MLE). Current techniques for the MLE problem are heuristics and therefore do not guarantee convergence to the optimum. We present mathematical programming formulations and exact solution methods that can provably find the model parameters and community assignments of maximum likelihood given an observed graph. We compare the proposed exact methods with classical heuristic algorithms based on expectation-maximization (EM). 
The solutions given by exact methods give us a principled way of recognizing when heuristic solutions are sub-optimal and measuring how far they are from optimality. [pt] PROGRAMACAO INTEIRA MISTA [pt] APRENDIZADO NAO SUPERVISIONADO [pt] STOCHASTIC BLOCK MODEL [pt] DETECCAO DE COMUNIDADES [pt] BUSCA LOCAL [pt] MACHINE LEARNING [en] MIXED INTEGER PROGRAMMING [en] UNSUPERVISED LEARNING [en] STOCHASTIC BLOCK MODEL [en] COMMUNITY DETECTION [en] LOCAL SEARCH [en] MACHINE LEARNING
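To make the maximum likelihood objective in entry 119 concrete, the sketch below evaluates the commonly used profile log-likelihood of the degree-corrected stochastic block model for a fixed community assignment, in the form given by Karrer and Newman (constants dropped): the sum over group pairs (r, s) of m_rs log(m_rs / (kappa_r kappa_s)). Here m_rs counts edges between groups r and s (within-group edges counted twice) and kappa_r is the sum of degrees in group r. Whether the thesis optimizes exactly this expression is an assumption; the function name and the toy graph are hypothetical.

```python
import math
from collections import defaultdict


def dcsbm_log_likelihood(edges, group):
    """Profile log-likelihood of a DCSBM partition, up to additive constants.

    edges -- list of (u, v) pairs of an undirected graph without self-loops
    group -- dict mapping every node to its group label
    """
    m = defaultdict(int)      # m[(r, s)]: edge counts between groups r and s
    kappa = defaultdict(int)  # kappa[r]: sum of degrees of nodes in group r
    for u, v in edges:
        r, s = group[u], group[v]
        m[(r, s)] += 1
        m[(s, r)] += 1  # when r == s, a within-group edge is counted twice
        kappa[r] += 1
        kappa[s] += 1
    return sum(
        count * math.log(count / (kappa[r] * kappa[s]))
        for (r, s), count in m.items()
    )


# Toy graph (hypothetical): two triangles joined by a single bridge edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
merged = {n: 0 for n in range(6)}
print(dcsbm_log_likelihood(edges, split), dcsbm_log_likelihood(edges, merged))
```

A heuristic such as EM or local search moves nodes between groups to increase this value, while an exact method would certify that no assignment scores higher.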
120	[en] A MODEL-BASED FRAMEWORK FOR SEMI-SUPERVISED CLUSTERING AND COMMUNITY DETECTION / [pt] UM FRAMEWORK BASEADO EM MODELO PARA CLUSTERIZAÇÃO SEMISSUPERVISIONADA E DETECÇÃO DE COMUNIDADES DANIEL LEMES GRIBEL 09 September 2021 (has links) [pt] Em clusterização baseada em modelos, o objetivo é separar amostras de dados em grupos significativos, otimizando a aderência dos dados observados a um modelo matemático. A recente adoção de clusterização baseada em modelos tem permitido a profissionais e usuários mapearem padrões complexos nos dados e explorarem uma ampla variedade de aplicações. Esta tese investiga abordagens orientadas a modelos para detecção de comunidades e para o estudo de clusterização semissupervisionada, adotando uma perspectiva baseada em máxima verossimilhança. Focamos primeiramente na exploração de técnicas de otimização com restrições para apresentar um novo modelo de detecção de comunidades por meio de modelos de blocos estocásticos (SBMs). Mostramos que a formulação com restrições revela comunidades estruturalmente diferentes daquelas obtidas com modelos clássicos. Em seguida, estudamos um cenário onde anotações imprecisas são fornecidas na forma de relações must-link e cannot-link, e propomos um modelo de clusterização semissupervisionado. Nossa análise experimental mostra que a incorporação de supervisão parcial e de conhecimento prévio melhoram significativamente os agrupamentos. Por fim, examinamos o problema de clusterização semissupervisionada na presença de rótulos de classe não confiáveis. Investigamos o caso em que grupos de anotadores deliberadamente classificam incorretamente as amostras de dados e propomos um modelo para lidar com tais anotações incorretas. / [en] In model-based clustering, we aim to separate data samples into meaningful groups by optimizing the fit of some observed data to a mathematical model. 
The recent adoption of model-based clustering has allowed practitioners to model complex patterns in data and explore a wide range of applications. This thesis investigates model-driven approaches for community detection and semi-supervised clustering by adopting a maximum-likelihood perspective. We first focus on exploiting constrained optimization techniques to present a new model for community detection with stochastic block models (SBMs). We show that the proposed constrained formulation reveals communities structurally different from those obtained with classical community detection models. We then study a setting where inaccurate annotations are provided as must-link and cannot-link relations, and propose a novel semi-supervised clustering model. Our experimental analysis shows that incorporating partial supervision and appropriately encoding prior user knowledge significantly enhance clustering performance. Finally, we examine the problem of semi-supervised clustering in the presence of unreliable class labels. We focus on the case where groups of untrustworthy annotators deliberately misclassify data samples and propose a model to handle such incorrect statements. [pt] APRENDIZADO DE MAQUINA [pt] MODELOS DE BLOCOS ESTOCASTICOS [pt] AGRUPAMENTO SEMISSUPERVISIONADO [pt] DETECCAO DE COMUNIDADES [pt] AGRUPAMENTO [pt] OTIMIZACAO [pt] MINERACAO DE DADOS [en] MACHINE LEARNING [en] STOCHASTIC BLOCK MODELS [en] SEMISUPERVISED CLUSTERING [en] COMMUNITY DETECTION [en] GROUPING [en] OPTIMIZATION [en] DATA MINING
