141 |
Topics in spatial and dynamical phase transitions of interacting particle systems
Restrepo Lopez, Ricardo 19 August 2011 (has links)
In this work we provide several improvements in the study of phase transitions
of interacting particle systems:
- We determine a quantitative relation between the non-extremality of the limiting Gibbs measure of a tree-based spin system and the temporal mixing of the Glauber dynamics over its finite projections. We define the concept of 'sensitivity' of a reconstruction scheme to establish this relation. In particular, we focus on the independent sets model, determining a phase transition for the mixing time of the Glauber dynamics at the same location as the extremality threshold of the simple invariant Gibbs version of the model.
- We develop the technical analysis of the so-called spatial mixing conditions for interacting particle systems to account for the connectivity structure of the underlying graph. This analysis yields improvements on the location of the uniqueness/non-uniqueness phase transition for the independent sets model over amenable graphs; among these, the elusive hard-square model of lattice statistics, which has received attention since Baxter's solution of the analogous hard-hexagon model in 1980.
- We build on the work of Montanari and Gerschenfeld to determine the existence of correlations for the coloring model in sparse random graphs. In particular, we prove that correlations exist above the 'clustering' threshold of the model, thus providing further evidence for the conjectured algorithmic 'hardness' occurring at that point.
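For intuition, the Glauber dynamics on the independent sets (hard-core) model referenced above is the single-site heat-bath chain: pick a uniformly random vertex and resample it from its conditional distribution given its neighbors. The following minimal Python sketch simulates the chain; the fugacity value and the toy 4-cycle graph are illustrative assumptions, not choices from the thesis.

```python
import random

def glauber_hardcore(adj, lam, steps, seed=0):
    # Heat-bath Glauber dynamics for the hard-core model: a random vertex v
    # is resampled to "occupied" with probability lam/(1+lam) whenever no
    # neighbor is occupied, and to "unoccupied" otherwise.
    rng = random.Random(seed)
    vertices = list(adj)
    occupied = {v: False for v in vertices}
    for _ in range(steps):
        v = rng.choice(vertices)
        free = not any(occupied[u] for u in adj[v])
        occupied[v] = free and rng.random() < lam / (1.0 + lam)
    return {v for v, occ in occupied.items() if occ}

# Toy example: the 4-cycle at fugacity lam = 1 (illustrative values).
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
print(glauber_hardcore(cycle, lam=1.0, steps=1000))
```

The configuration returned is always an independent set, since a vertex can only become occupied when none of its neighbors is.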
|
142 |
A multi-resolution discontinuous Galerkin method for rapid simulation of thermal systems
Gempesaw, Daniel 29 August 2011 (has links)
Efficient, accurate numerical simulation of coupled heat transfer and fluid dynamics systems continues to be a challenge. Direct numerical simulation (DNS) packages like FLUENT exist and are sufficient for design and for predicting flow in a static system, but in larger systems where input parameters can change rapidly, the cost of DNS increases prohibitively. Major obstacles include handling the scales of the system accurately: some applications span multiple orders of magnitude in both the spatial and temporal dimensions, making an accurate simulation very costly. There is a need for a simulation method that returns accurate results for multi-scale systems in real time. To address these challenges, the Multi-Resolution Discontinuous Galerkin (MRDG) method has been shown to have advantages over other reduced-order methods. Using multi-wavelets as the local approximation space provides an inherently efficient method of data compression, while the unique features of the Discontinuous Galerkin method make it well suited to composition with wavelet theory. This research further exhibits the viability of the MRDG as a new approach to efficient, accurate thermal system simulations. The development and execution of the algorithm are detailed, and several examples of the utility of the MRDG are included. A comparison between the MRDG and the "vanilla" DG method is also featured as justification of the advantages of the MRDG method.
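To make the data-compression claim concrete, here is a toy scalar sketch of one Haar wavelet level with hard thresholding, which illustrates the principle (though not the actual multi-wavelet DG machinery) by which multi-resolution bases discard negligible detail at a controlled error; the test field and tolerance are arbitrary.

```python
import numpy as np

def haar_compress(u, tol):
    # One Haar level: averages carry the coarse field, details the local
    # corrections; details below `tol` are dropped (hard thresholding),
    # bounding the pointwise reconstruction error by `tol`.
    avg = (u[0::2] + u[1::2]) / 2.0
    det = (u[0::2] - u[1::2]) / 2.0
    det = np.where(np.abs(det) < tol, 0.0, det)
    return avg, det

def haar_reconstruct(avg, det):
    u = np.empty(2 * avg.size)
    u[0::2], u[1::2] = avg + det, avg - det
    return u

# A smooth field compresses well: many detail coefficients fall below tol.
x = np.linspace(0.0, 1.0, 256)
u = np.sin(2.0 * np.pi * x)
avg, det = haar_compress(u, tol=1e-2)
print("details kept:", np.count_nonzero(det), "of", det.size)
print("max reconstruction error:", np.abs(u - haar_reconstruct(avg, det)).max())
```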
|
143 |
Sub-Polyhedral Compilation using (Unit-)Two-Variables-Per-Inequality Polyhedra / Compilation sous-polyédrique reposant sur des systèmes à deux variables par inégalité
Upadrasta, Ramakrishna 13 March 2013 (has links)
The goal of this thesis is to design algorithms that run with better complexity when compiling or parallelizing loop programs. The framework within which our algorithms operate is the polyhedral model of compilation, which has been successful in the design and implementation of complex loop nest optimizers and parallelizing compilers. The algorithmic complexity and scalability limitations of this framework remain one important weakness. We address them by introducing sub-polyhedral compilation using (Unit-)Two-Variable-Per-Inequality or (U)TVPI polyhedra, namely polyhedra with constraints restricted to the form $ax_i + bx_j \le c$ ($\pm x_i \pm x_j \le c$). A major focus of our sub-polyhedral compilation is the introduction of sub-polyhedral scheduling, where we propose a technique for scheduling using (U)TVPI polyhedra. As part of this, we introduce algorithms that can be used to construct under-approximations of the systems of constraints resulting from affine scheduling problems. This technique relies on simple polynomial-time algorithms to under-approximate a general polyhedron by a (U)TVPI polyhedron. These under-approximation algorithms are generic enough to be used for many kinds of loop parallelization scheduling problems, reducing each of their complexities to asymptotically polynomial time.
We also introduce sub-polyhedral code generation, where we propose algorithms that exploit the improved complexities of (U)TVPI sub-polyhedra in polyhedral code generation. Here we show that the exponential complexities associated with widely used polyhedral code generators can be reduced to polynomial time using the improved complexities of (U)TVPI sub-polyhedra. The sub-polyhedral scheduling techniques presented above are evaluated in an experimental framework. For this, we modify the state-of-the-art PLuTo compiler, which can parallelize for multi-core architectures using permutation and tiling transformations. We show that using our scheduling technique, the above under-approximations yield polyhedra that are non-empty for 10 out of 16 benchmarks from the Polybench (2.0) kernels. Solving the under-approximated system leads to asymptotic gains in complexity and shows practically significant improvements when compared to a traditional LP solver. We also verify that code generated by our sub-polyhedral parallelization prototype matches the performance of PLuTo-optimized code when the under-approximation preserves feasibility.
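To make the notion of a sound under-approximation concrete, here is a deliberately naive Python sketch: a single inequality over many variables is replaced by a conjunction of at-most-two-variable (TVPI) inequalities that implies it, by splitting the right-hand-side budget evenly across variable pairs. The even split is an assumption for illustration only; it is far lossier than the polynomial-time algorithms developed in the thesis.

```python
from fractions import Fraction

def split_to_tvpi(coeffs, c):
    # Replace  sum_i a_i*x_i <= c  by a conjunction of inequalities that
    # each mention at most two variables (TVPI; UTVPI when all a_i are +-1):
    # pair up the nonzero terms and give each pair an equal share of c.
    # Any point satisfying all the shares satisfies the original inequality,
    # so the result under-approximates the original half-space.
    terms = [(i, a) for i, a in enumerate(coeffs) if a != 0]
    pairs = [terms[k:k + 2] for k in range(0, len(terms), 2)]
    share = Fraction(c, len(pairs))
    return [(dict(pair), share) for pair in pairs]

# x0 + x1 - x2 <= 6  becomes  x0 + x1 <= 3  and  -x2 <= 3.
for lhs, rhs in split_to_tvpi([1, 1, -1], 6):
    print(lhs, "<=", rhs)
```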
|
144 |
O problema da subsequência comum máxima sem repetições / The repetition-free longest common subsequence problem
Christian Tjandraatmadja 26 July 2010 (has links)
We explore the following problem: given two sequences X and Y over a finite alphabet, find a longest common subsequence of X and Y without repeated symbols. We study the structure of this problem, particularly from the point of view of graphs and polyhedral combinatorics. We develop approximation algorithms and heuristics for it. The focus of this work is on the construction of an algorithm based on the branch-and-cut technique, taking advantage of an efficient separation algorithm and of heuristics and techniques to find an optimal solution earlier. We also study the easier problem on which this one is based: given two sequences X and Y over a finite alphabet, find a longest common subsequence of X and Y. We explore this problem from the point of view of polyhedral combinatorics and describe several known algorithms to solve it.
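For reference, the unrestricted problem mentioned at the end admits the classic quadratic-time dynamic program sketched below (the example strings are arbitrary). On these inputs the plain LCS is "abab", while any repetition-free common subsequence is limited by the alphabet size to length 3 (e.g., "abc"), which illustrates how the restriction changes the problem.

```python
def lcs(x, y):
    # Classic O(|x|*|y|) dynamic program for the unrestricted longest
    # common subsequence; dp[i][j] is the LCS length of x[:i] and y[:j].
    m, n = len(x), len(y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    # Trace back one optimal subsequence.
    out, i, j = [], m, n
    while i and j:
        if x[i - 1] == y[j - 1]:
            out.append(x[i - 1]); i -= 1; j -= 1
        elif dp[i - 1][j] >= dp[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))

print(lcs("ababc", "abcab"))  # prints "abab" -- note the repeated symbols
```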
|
145 |
Finding A Subset Of Non-defective Items From A Large Population: Fundamental Limits And Efficient Algorithms
Sharma, Abhay 05 1900 (has links) (PDF)
Consider a large population containing a small number of defective items. A commonly
encountered goal is to identify the defective items, for example, to isolate them. In the classical non-adaptive group testing (NAGT) approach, one groups the items into subsets, or pools, and runs tests for the presence of a defective item on each pool. Using the outcomes of the tests, a fundamental goal of group testing is to reliably identify the complete set of defective items with as few tests as possible. In contrast, this thesis studies a non-defective subset identification problem, where the primary goal is to identify a "subset" of "non-defective" items given the test outcomes. The main contributions of this thesis are:
We derive upper and lower bounds on the number of non-adaptive group tests
required to identify a given number of non-defective items with arbitrarily small
probability of incorrect identification as the population size goes to infinity. We
show that an impressive reduction in the number of tests is achievable compared
to the approach of first identifying all the defective items and then picking the
required number of non-defective items from the complement set. For example, in the asymptotic regime with the population size N → ∞, to identify L non-defective items out of a population containing K defective items, when the tests are reliable, our results show that $O\left(K \log K \frac{L}{N}\right)$ measurements are sufficient when L ≪ N − K and K is fixed. In contrast, the number of tests necessary using the conventional approach grows with N as $O\left(K \log K \log \frac{N}{K}\right)$ measurements. Our
results are derived using a general sparse signal model, by virtue of which, they
are also applicable to other important sparse signal based applications such as
compressive sensing.
We present a bouquet of computationally efficient and analytically tractable non-defective subset recovery algorithms. By analyzing the probability of error of the algorithms, we obtain bounds on the number of tests required for non-defective subset recovery with arbitrarily small probability of error. By comparing with the information-theoretic lower bounds, we show that the upper bounds on the number of tests are order-wise tight up to a $\log K$ factor, where K is the number of defective items. Our analysis accounts for the impact of both the additive noise (false positives) and the dilution noise (false negatives). We also provide extensive simulation results that compare the relative performance of the
different algorithms and provide further insights into their practical utility. The
proposed algorithms significantly outperform the straightforward approaches of testing items one-by-one, and of first identifying the defective set and then choosing the non-defective items from the complement set, in terms of the number of measurements required to ensure a given success rate.
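A baseline against which the contributions above can be understood is the one-sided decoding rule for noiseless non-adaptive tests: every item that appears in at least one negative pool is certainly non-defective. The Python sketch below simulates this rule; the pool size and number of tests are arbitrary illustrative values rather than the optimized choices analyzed in the thesis.

```python
import random

def find_nondefective(n, defective, num_tests, pool_size, L, seed=1):
    # Noiseless non-adaptive group testing: pool randomly; any item that
    # appears in a pool testing negative (no defectives in it) is certainly
    # non-defective. Return up to L such items.
    rng = random.Random(seed)
    cleared = set()
    for _ in range(num_tests):
        pool = rng.sample(range(n), pool_size)
        if not any(item in defective for item in pool):  # negative outcome
            cleared.update(pool)
    return sorted(cleared)[:L]

# 1000 items, 5 defectives; ask for 100 guaranteed non-defective items.
defective = {3, 141, 272, 653, 997}
found = find_nondefective(1000, defective, num_tests=30, pool_size=100, L=100)
print(len(found), "cleared; any defective among them?", bool(set(found) & defective))
```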
We investigate the use of adaptive group testing in the application of finding a
spectrum hole of a specified bandwidth in a given wideband of interest. We propose
a group testing based spectrum hole search algorithm that exploits sparsity in the primary spectral occupancy by testing a group of adjacent sub-bands in a single test. This is enabled by a simple and easily implementable sub-Nyquist sampling scheme for signal acquisition by the cognitive radios. Energy-based hypothesis tests are used to provide an occupancy decision over the group of sub-bands, and this forms the basis of the proposed algorithm to find contiguous spectrum holes of a specified bandwidth. We extend this framework to a multistage sensing algorithm that can be employed in a variety of spectrum sensing scenarios, including non-contiguous spectrum hole search. Our analysis allows one to identify the sparsity and SNR regimes where group testing can lead to significantly lower detection delays compared to a conventional bin-by-bin energy detection scheme. We illustrate the performance of the proposed algorithms via Monte Carlo simulations.
|
146 |
Optimisation physique et logique de systèmes de production / Physical and logical optimization of production systems
Bernate Lara, Andres Felipe 04 April 2014 (has links)
This thesis considers a complex workshop scheduling problem which, to our knowledge, has rarely been studied. The workshop has a hybrid composition: one or several machines are available at each stage. The main constraints considered are batch processing and total tardiness minimization. The solution methods are embedded in the decision-support systems of the Soufflet Group's research program. Given the complex structure of the workshop, we decompose it in order to study the identical parallel machine scheduling problem on its own. Different solution methods are developed, and the results are used to build a classification of instances and solution methods. To solve the described problems, exact and approximate solution methods are proposed: we have adapted iterated search, tabu search, and genetic algorithms, among others. Findings from the parallel machine scheduling problem are then used to develop a two-level solution method for the complete scheduling problem. Results show that the developed algorithms find good-quality solutions for the problem at hand. Similar industrial problems are also addressed, with the aim of optimizing the operation of the research center.
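As a point of reference for the parallel machine subproblem, a standard baseline heuristic for total tardiness is earliest-due-date (EDD) list scheduling, sketched below with made-up job data; the metaheuristics developed in the thesis start from and improve on baselines of this kind.

```python
def edd_list_schedule(jobs, m):
    # Sort jobs by due date (EDD) and always assign the next job to the
    # machine that frees up first; return the resulting total tardiness.
    machines = [0] * m                       # next free time of each machine
    total_tardiness = 0
    for proc, due in sorted(jobs, key=lambda j: j[1]):   # (processing, due)
        k = min(range(m), key=machines.__getitem__)
        machines[k] += proc                  # completion time of this job
        total_tardiness += max(0, machines[k] - due)
    return total_tardiness

jobs = [(4, 5), (2, 3), (6, 8), (3, 4), (5, 12)]   # illustrative job data
print(edd_list_schedule(jobs, m=2))                # total tardiness: 2
```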
|
147 |
The k-hop connected dominating set problem: approximation algorithms and hardness results / O problema do conjunto dominante conexo com k-saltos: aproximação e complexidade
Rafael Santos Coelho 13 June 2017 (has links)
Let G be a connected graph and k be a positive integer. A vertex subset D of G is a k-hop connected dominating set if the subgraph of G induced by D is connected and, for every vertex v in G, there is a vertex u in D such that the distance between v and u in G is at most k. We study the problem of finding a minimum k-hop connected dominating set of a graph (Mink-CDS). We prove that Mink-CDS is NP-hard on planar bipartite graphs of maximum degree 4. We also prove that Mink-CDS is APX-complete on bipartite graphs of maximum degree 4. We present inapproximability thresholds for Mink-CDS on bipartite and on (1, 2)-split graphs. Interestingly, one of these thresholds is a parameter of the input graph which is not a function of its number of vertices. We also discuss the complexity of computing this graph parameter. On the positive side, we show an approximation algorithm for Mink-CDS. When k = 1, we present two new approximation algorithms for the weighted version of the problem, one of them restricted to graphs with a polynomially bounded number of minimal separators. Finally, also for the weighted variant of the problem where k = 1, we discuss an integer linear programming formulation and conduct a polyhedral study of its associated polytope.
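For intuition, the following Python sketch gives a simple greedy heuristic (not the approximation algorithm of the thesis): grow a connected set from a seed vertex, repeatedly adding the neighbor of the current set that newly k-covers the most vertices, breaking ties toward the nearest uncovered vertex. It assumes a connected input graph; the example path graph is illustrative.

```python
from collections import deque

def bfs_dist(adj, sources):
    # Multi-source BFS distances; assumes the graph is connected.
    dist, q = {s: 0 for s in sources}, deque(sources)
    while q:
        v = q.popleft()
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                q.append(u)
    return dist

def khop_cds_greedy(adj, k, root):
    # Grow a connected set D from `root`; add the neighbor of D that newly
    # k-covers the most vertices, tie-breaking toward uncovered vertices.
    D = {root}
    covered = {v for v, d in bfs_dist(adj, [root]).items() if d <= k}
    while len(covered) < len(adj):
        frontier = {u for v in D for u in adj[v]} - D
        to_uncov = bfs_dist(adj, [v for v in adj if v not in covered])
        def score(u):
            new = sum(1 for v, d in bfs_dist(adj, [u]).items()
                      if d <= k and v not in covered)
            return (new, -to_uncov[u])
        best = max(frontier, key=score)
        D.add(best)
        covered |= {v for v, d in bfs_dist(adj, [best]).items() if d <= k}
    return D

# Path on 7 vertices, k = 1: the greedy returns a connected dominating set.
path = {i: [j for j in (i - 1, i + 1) if 0 <= j <= 6] for i in range(7)}
print(sorted(khop_cds_greedy(path, k=1, root=3)))
```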
|
148 |
Algorithmic and Graph-Theoretic Approaches for Optimal Sensor Selection in Large-Scale Systems
Lintao Ye (9741149) 15 December 2020 (has links)
Using sensor measurements to estimate the states and parameters of a system is a fundamental task in understanding the behavior of the system. Moreover, as modern systems grow rapidly in scale and complexity, it is not always possible to deploy sensors to measure all of the states and parameters of the system, due to cost and physical constraints. Therefore, selecting an optimal subset of all the candidate sensors to deploy and gather measurements of the system is an important and challenging problem. In addition, the systems may be targeted by external attackers who attempt to remove or destroy the deployed sensors. This further motivates the formulation of resilient sensor selection strategies. In this thesis, we address the sensor selection problem under different settings as follows.

First, we consider the optimal sensor selection problem for linear dynamical systems with stochastic inputs, where the Kalman filter is applied based on the sensor measurements to give an estimate of the system states. The goal is to select a subset of sensors under certain budget constraints such that the trace of the steady-state error covariance of the Kalman filter with the selected sensors is minimized. We characterize the complexity of this problem by showing that the Kalman filtering sensor selection problem is NP-hard and cannot be approximated within any constant factor in polynomial time for general systems. We then consider the optimal sensor attack problem for Kalman filtering, which is to attack a subset of selected sensors under certain budget constraints in order to maximize the trace of the steady-state error covariance of the Kalman filter with the sensors remaining after the attack. We show that the same results as for the Kalman filtering sensor selection problem also hold for the Kalman filtering sensor attack problem. Having shown that the general sensor selection and sensor attack problems for Kalman filtering are hard to solve, our next step is to consider special classes of the general problems. Specifically, we consider the underlying directed network corresponding to a linear dynamical system and investigate the case when there is a single node of the network that is affected by a stochastic input. In this setting, we show that the corresponding sensor selection and sensor attack problems for Kalman filtering can be solved in polynomial time. We further study the resilient sensor selection problem for Kalman filtering, where the problem is to find a sensor selection strategy under sensor selection budget constraints such that the trace of the steady-state error covariance of the Kalman filter is minimized after an adversary removes some of the deployed sensors. We show that the resilient sensor selection problem for Kalman filtering is NP-hard, and provide a pseudo-polynomial-time algorithm to solve it optimally.

Next, we consider the sensor selection problem for binary hypothesis testing. The problem is to select a subset of sensors under certain budget constraints such that a certain metric of the Neyman-Pearson (resp., Bayesian) detector corresponding to the selected sensors is optimized. We show that this problem is NP-hard if the objective is to minimize the miss probability (resp., error probability) of the Neyman-Pearson (resp., Bayesian) detector. We then consider three optimization objectives based on the Kullback-Leibler distance, J-divergence, and Bhattacharyya distance, respectively, in the hypothesis testing sensor selection problem, and provide performance bounds on greedy algorithms when applied to the sensor selection problem associated with these optimization objectives.

Moving beyond the binary hypothesis setting, we also consider the setting where the true state of the world comes from a set that can have cardinality greater than two. A Bayesian approach is then used to learn the true state of the world based on the data streams provided by the data sources. We formulate the Bayesian learning data source selection problem under this setting, where the goal is to minimize the cost spent on the data sources such that the learning error is within a certain range. We show that the Bayesian learning data source selection problem is also NP-hard, and provide greedy algorithms with performance guarantees.

Finally, in light of the COVID-19 pandemic, we study the parameter estimation measurement selection problem for epidemics spreading in networks. Here, the measurements (with certain costs) are collected by conducting virus and antibody tests on the individuals in the epidemic spread network. The goal of the problem is then to optimally estimate the parameters (i.e., the infection rate and the recovery rate of the virus) in the epidemic spread network, while satisfying the budget constraint on collecting the measurements. Again, we show that the measurement selection problem is NP-hard, and provide approximation algorithms with performance guarantees.
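To illustrate the Kalman filtering sensor selection problem above, here is a minimal greedy Python sketch: candidate sensors are scored by the trace of the steady-state error covariance, computed by iterating the filtering Riccati equation. The dynamics, noise levels, and greedy rule are illustrative assumptions; since the thesis shows no polynomial-time constant-factor approximation exists in general, a greedy baseline of this kind carries no guarantee.

```python
import numpy as np

def steady_state_cov(A, C, Q, R, iters=500):
    # Fixed-point iteration of the filtering Riccati equation; for a stable
    # A it converges to the steady-state error covariance of the Kalman
    # filter that uses the measurement matrix C.
    P = np.eye(A.shape[0])
    for _ in range(iters):
        S = C @ P @ C.T + R
        P = A @ (P - P @ C.T @ np.linalg.solve(S, C @ P)) @ A.T + Q
    return P

def greedy_sensor_selection(A, rows, Q, r, budget):
    # Greedy baseline: add, one sensor at a time, the candidate whose
    # inclusion yields the smallest trace of the steady-state covariance.
    chosen = []
    for _ in range(budget):
        def cost(extra):
            C = np.vstack([rows[i] for i in chosen + [extra]])
            R = r * np.eye(len(chosen) + 1)
            return np.trace(steady_state_cov(A, C, Q, R))
        chosen.append(min((i for i in range(len(rows)) if i not in chosen), key=cost))
    return chosen

A = np.array([[0.9, 0.2], [0.0, 0.8]])   # stable toy dynamics (assumed)
Q = 0.1 * np.eye(2)
rows = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]), np.array([[1.0, 1.0]])]
print(greedy_sensor_selection(A, rows, Q, r=0.5, budget=2))
```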
|
149 |
Computational and communication complexity of geometric problems
Hajiaghaei Shanjani, Sima 26 July 2021 (has links)
In this dissertation, we investigate a number of geometric problems in different settings. We present lower bounds and approximation algorithms for geometric problems in sequential and distributed settings.
For the sequential setting, we prove the first hardness of approximation results for the following problems:
- Red-Blue Geometric Set Cover is APX-hard when the objects are axis-aligned rectangles.
- Red-Blue Geometric Set Cover cannot be approximated to within $2^{\log^{1-1/{(\log\log m)^c}}m}$ in polynomial time for any constant $c < 1/2$, unless $P=NP$, when the given objects are $m$ triangles or convex objects. This shows that Red-Blue Geometric Set Cover is a harder problem than Geometric Set Cover for some classes of objects.
- Boxes Class Cover is APX-hard.
We also define MaxRM-3SAT, a restricted version of Max3SAT, and we prove that this problem is APX-hard. This problem might be interesting in its own right.
In the distributed setting, we define a new model, the fixed-link model, where each processor has a position on the plane and processors can communicate to each other if and only if there is an edge between them. We motivate the model and study a number of geometric problems in this model. We prove lower bounds on the communication complexity of the problems in the fixed-link model and present approximation algorithms for them.
We prove lower bounds on the number of expected bits required for any randomized algorithm in the fixed-link model with $n$ nodes to solve the following problems, when the communication is in the asynchronous KT1 model:
- $\Omega(n^2/\log n)$ expected bits of communication are required for solving Diameter, Convex Hull, or Closest Pair, even if the graph has only a linear number of edges.
- $\Omega(\min\{n^2, 1/\epsilon\})$ expected bits of communication are required for approximating Diameter within a $1-\epsilon$ factor of optimal, even if the graph is planar.
- $\Omega(n^2)$ bits of communication are required for approximating Closest Pair in a graph on an $[n^c] \times [n^c]$ grid, for any constant $c>1+1/(2\lg n)$, within a $\frac{n^{c-1/2}}{4}-\epsilon$ factor of optimal, even if the graph is planar.
We also present approximation algorithms in geometric communication networks with $n$ nodes, when the communication is in the asynchronous CONGEST KT1 model:
- An $\epsilon$-kernel, and consequently a $(1-\epsilon)$-approximate Diameter and an $\epsilon$-approximate Hull, with $O(n/\sqrt{\epsilon})$ messages plus the cost of constructing a spanning tree.
- An $\frac{n^c}{\sqrt{k/2}}$-approximate Closest Pair on an $[n^c] \times [n^c]$ grid, for a constant $c>1/2$, plus the cost of computing a spanning tree, for any $k \leq n-1$.
We also define a new version of the two-party communication problem, Path Computation, where two parties communicate through a path. We prove a lower bound on the communication complexity of this problem.
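To give a flavor of the $\epsilon$-kernel technique behind the upper bounds, the centralized sketch below approximates the diameter of a planar point set by keeping only the extreme points along $O(1/\sqrt{\epsilon})$ evenly spaced directions; the distributed version in the dissertation additionally pays the message costs along a spanning tree. The point set and $\epsilon$ are illustrative.

```python
import numpy as np

def approx_diameter(points, eps):
    # Keep only the extreme points along O(1/sqrt(eps)) evenly spaced
    # directions (an epsilon-kernel idea in the plane); the diameter of the
    # kept points is at least (1 - eps) times the true diameter, since the
    # diametral direction lies within angle sqrt(2*eps) of a sampled one.
    k = int(np.ceil(np.pi / np.sqrt(2.0 * eps)))
    thetas = np.arange(k) * np.pi / k
    dirs = np.stack([np.cos(thetas), np.sin(thetas)], axis=1)
    proj = points @ dirs.T                                   # (n, k)
    keep = np.unique(np.concatenate([proj.argmax(0), proj.argmin(0)]))
    ext = points[keep]
    diff = ext[:, None, :] - ext[None, :, :]
    return np.sqrt((diff ** 2).sum(-1)).max()

rng = np.random.default_rng(0)
pts = rng.standard_normal((10000, 2))
print(approx_diameter(pts, eps=0.01))   # close to the true diameter
```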
|
150 |
Information diffusion and opinion dynamics in social networks / Dissémination de l’information et dynamique des opinions dans les réseaux sociaux
Louzada Pinto, Julio Cesar 14 January 2016 (has links)
Our aim in this Ph.D. thesis is to study the diffusion of information as well as the opinion dynamics of users in social networks. Information diffusion models explore the paths taken by information being transmitted through a social network, in order to understand and analyze the relationships between users in such a network, leading to a better comprehension of human relations and dynamics. This thesis is based on both sides of information diffusion: first, developing mathematical theories and models to study the relationships between people and information, and second, creating tools to better exploit the hidden patterns in these relationships. The theoretical tools developed in this thesis are opinion dynamics models and information diffusion models, where we study the information flow from users in social networks, and the practical tools developed in this thesis are a novel community detection algorithm and a novel trend detection algorithm. We start by introducing an opinion dynamics model in which agents interact with each other about several distinct opinions/contents. In our framework, agents do not exchange all their opinions with each other; they communicate about randomly chosen opinions at each time. We show, using stochastic approximation algorithms, that under mild assumptions this opinion dynamics algorithm converges as time increases, with a behavior ruled by how users choose the opinions to broadcast at each time. We next develop a community detection algorithm which is a direct application of this opinion dynamics model when agents broadcast the content they appreciate the most: communities are formed, defined as groups of users that appreciate mostly the same content. This algorithm, which is distributed by nature, has the remarkable property that the discovered communities can be studied from a solid mathematical standpoint. In addition to the theoretical advantage over heuristic community detection methods, the presented algorithm is able to accommodate weighted networks in parametric and nonparametric versions, with the discovery of overlapping communities a byproduct with no mathematical overhead. In a second part, we define a general framework to model information diffusion in social networks. The proposed framework takes into consideration not only the hidden interactions between users, but also the interactions between contents and multiple social networks. It also accommodates dynamic networks and various temporal effects of the diffusion. This framework can be combined with topic modeling, for which several estimation techniques are derived, based on nonnegative tensor factorization techniques. Together with a dimensionality reduction argument, these techniques also discover the latent community structure of the users in the social networks. Finally, we use one instance of the previous framework to develop a trend detection algorithm designed to find trendy topics in a social network. We take into consideration the interaction between users and topics, formally define trendiness, and derive trend indices for each topic being disseminated in the social network. These indices take into consideration the distance between the real broadcast intensity and the maximum expected broadcast intensity, as well as the social network topology. The proposed trend detection algorithm uses stochastic control techniques to calculate the trend indices; it is fast and aggregates all the information of the broadcasts into a simple one-dimensional process, thus reducing its complexity and the quantity of data necessary for detection. To the best of our knowledge, this is the first trend detection algorithm that is based solely on the individual performances of topics.
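As a toy illustration of the broadcast-based opinion dynamics described above (and not the thesis's exact update rule), the following Python sketch lets a random agent broadcast one randomly chosen content at each step, with its neighbors averaging toward the broadcast value; the step size, graph, and initial opinions are arbitrary assumptions.

```python
import random

def opinion_gossip(adj, opinions, steps, eta=0.3, seed=0):
    # At each step a random agent broadcasts one randomly chosen
    # opinion/content; each neighbor moves its own opinion on that content
    # a fraction eta of the way toward the broadcast value.
    rng = random.Random(seed)
    agents = list(adj)
    for _ in range(steps):
        i = rng.choice(agents)
        t = rng.randrange(len(opinions[i]))
        for j in adj[i]:
            opinions[j][t] += eta * (opinions[i][t] - opinions[j][t])
    return opinions

# Six agents, two contents; on a connected graph the opinions mix toward a
# common value on each content.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
rng0 = random.Random(42)
opinions = {v: [rng0.random(), rng0.random()] for v in adj}
print(opinion_gossip(adj, opinions, steps=5000))
```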
|