11

Identifying Communities as Core-Periphery Structures in Evolving Networks

Kantamneni, Anusha 20 October 2016
No description available.
12

Parallel Mining and Analysis of Triangles and Communities in Big Networks

Arifuzzaman, S M. 19 August 2016
A network (graph) is a powerful abstraction for interactions among entities in a system. Examples include various social, biological, collaboration, citation, and co-purchase networks. Real-world networks are often characterized by an abundance of triangles and the existence of well-structured communities. Thus, counting triangles and detecting communities in networks have become important algorithmic problems in network mining and analysis. In the era of big data, the network data emerging from numerous scientific disciplines are very large. Online social networks such as Twitter and Facebook have millions to billions of users. Such massive networks often do not fit in the main memory of a single machine, and existing sequential methods may take prohibitively long to run. This motivates the need for scalable parallel algorithms for mining and analysis. We design MPI-based distributed-memory parallel algorithms for counting triangles and detecting communities in big networks and present related analysis. The dissertation consists of four parts. In Part I, we devise parallel algorithms for counting and enumerating triangles. The first algorithm employs an overlapping partitioning scheme and novel load-balancing schemes, leading to a fast algorithm. We also design a space-efficient algorithm using non-overlapping partitioning and an efficient communication scheme; this space efficiency allows the algorithm to work on even larger networks. We then present our third parallel algorithm, based on dynamic load balancing. All these algorithms work on big networks, scale to a large number of processors, and demonstrate very good speedups. An important property of many real-world networks, closely related to triangles, is high transitivity: two nodes with common neighbors tend to become neighbors themselves. In Part II, we characterize networks by quantifying the number of common neighbors and demonstrate its relationship to the community structure of networks. In Part III, we design parallel algorithms for detecting communities in big networks. We propose efficient load-balancing and communication approaches, which lead to fast and scalable algorithms. Finally, in Part IV, we present scalable parallel algorithms for a useful graph preprocessing problem: converting an edge list to an adjacency list. We present a non-trivial parallelization with efficient HPC-based techniques, leading to fast and space-efficient algorithms. / Ph. D.
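As a concrete reference point for the two quantities at the heart of Parts I and II, here is a minimal single-machine sketch of triangle counting and global transitivity. It is only the textbook node-iterator baseline on a toy edge list, not the MPI-based distributed algorithms the dissertation develops; the sample graph and names are illustrative.

```python
# Single-machine triangle counting and transitivity sketch; the
# dissertation's MPI-based distributed algorithms are far more involved.
from itertools import combinations

def count_triangles(edges):
    """Count triangles via the standard node-iterator approach."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    triangles = 0
    wedges = 0  # paths of length two (open plus closed)
    for u, nbrs in adj.items():
        d = len(nbrs)
        wedges += d * (d - 1) // 2
        for v, w in combinations(nbrs, 2):
            if w in adj[v]:
                triangles += 1
    triangles //= 3  # each triangle is counted once per corner
    # Transitivity: fraction of wedges that close into triangles.
    transitivity = 3 * triangles / wedges if wedges else 0.0
    return triangles, transitivity

edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
print(count_triangles(edges))  # (1, 0.6) for this toy graph
```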
13

Hierarchical Portfolio Allocation with Community Detection

Fatah, Kiar, Nazar, Taariq January 2022
Traditionally, practitioners use modern portfolio theory to invest optimally. Its appeal lies in its mathematical simplicity and elegance. Despite its beauty, however, the theory is plagued with many problems, known in combination as the Markowitz curse. López de Prado introduced Hierarchical Risk Parity (HRP), which deals with the problems of Markowitz's theory by introducing hierarchical structures into the portfolio allocation step. This thesis is a continuation of HRP. In contrast to de Prado's work, we build hierarchical clusters that do not have a predetermined structure, and we use portfolio allocation methods that incorporate the mean estimates. We use an algorithm called community detection, which derives from graph theory and generates clusters purely from the data, without user specification. A problem to overcome is the correct identification of the market mode, which is non-trivial for futures contracts. This is a serious problem, since the specific clustering method we use hinges on correctly identifying this mode. Therefore, in this thesis we introduce a method for finding the market mode in futures data. Finally, we compare the portfolios constructed from the hierarchical clusters to traditional methods. We find that the hierarchical portfolios perform slightly worse than the traditional methods when we incorporate the mean, and better for risk-based portfolios. In general, we find that the hierarchical portfolios result in less extreme outcomes.
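For readers unfamiliar with the market mode: a common baseline, at least for equity markets, identifies it with the eigenvector belonging to the largest eigenvalue of the return correlation matrix. The sketch below illustrates only that standard approach on synthetic data; the thesis's actual identification method for futures data is its own contribution and is not reproduced here.

```python
# Baseline market-mode identification: leading eigenvector of the
# correlation matrix. Synthetic returns with a common factor are used
# so that a market mode exists by construction.
import numpy as np

rng = np.random.default_rng(0)
n_assets, n_obs = 10, 500
market = rng.normal(size=n_obs)
returns = 0.5 * market[:, None] + rng.normal(size=(n_obs, n_assets))

corr = np.corrcoef(returns, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(corr)   # eigenvalues in ascending order
market_mode = eigvecs[:, -1]              # leading eigenvector

# Remove the market mode before clustering, so communities reflect
# sector-like structure rather than the common factor.
residual = corr - eigvals[-1] * np.outer(market_mode, market_mode)
print(eigvals[-1], market_mode.round(2))
```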
14

Adaptive Weights Clustering and Community Detection

Besold, Franz Jürgen 19 April 2023
This thesis presents a theoretical study of two novel algorithms for clustering and community detection: AWC (Adaptive Weights Clustering) and AWCD (Adaptive Weights Community Detection). Most importantly, we discuss rates of consistency. For AWC, we focus on the asymptotics of the depth ε of the gap between clusters, i.e. the relative difference between the density level of the clusters and the density level of the area between them. We show that AWC is consistent with a rate that is optimal up to logarithmic factors. This extends the low-dimensional results of Efimov, Adamyan and Spokoiny (2019) to the manifold model while also considering much more general assumptions on the underlying density and the shape of clusters. In particular, we also consider the case of two points in the same cluster that are relatively close to the boundary. Moreover, we provide finite-sample guarantees as well as the optimal choice of the tuning parameter λ. For AWCD, we consider the asymptotics of the difference θ − ρ between the two Bernoulli parameters of a symmetric stochastic block model. As it turns out, the resulting regime of strong consistency is far from optimal. However, we propose two major modifications to the algorithm: firstly, we discuss an approach to minimize the bias of the involved estimates; secondly, we suggest increasing the starting neighborhood guess of the algorithm by taking into account paths of minimal path length k. Using these modifications, we are able to show that AWCD achieves a nearly optimal rate of strong consistency. We partially extend these results to more general stochastic block models. For both problems, we illustrate and validate the theoretical study through a wide range of numerical experiments. To summarize, this thesis closes the gap between the practical and theoretical studies for AWC and AWCD. In particular, after some modifications, both algorithms exhibit a nearly optimal performance on relevant models.
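For reference, the symmetric stochastic block model the abstract refers to is the standard one: given block labels c_i, edges are independent Bernoulli variables whose parameter depends only on whether the endpoints share a block.

```latex
% Symmetric stochastic block model: edge (i, j) is present with a
% probability that depends only on whether i and j share a block.
\Pr(A_{ij} = 1) =
  \begin{cases}
    \theta, & c_i = c_j \quad \text{(within a block)},\\
    \rho,   & c_i \neq c_j \quad \text{(between blocks)},
  \end{cases}
\qquad \theta > \rho .
```

Consistency results are then stated in terms of how the gap θ − ρ may shrink as the number of nodes grows.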
15

Agglomerative clustering for community detection in dynamic graphs

Godbole, Pushkar J. 27 May 2016
Agglomerative clustering techniques work by recursively merging graph vertices into communities to maximize a clustering quality metric. The modularity metric, coined by Newman and Girvan, measures cluster quality on the premise that a cluster contains collections of vertices more strongly connected internally than would occur by random chance. Various fast and efficient algorithms for community detection based on modularity maximization have been developed for static graphs. However, since many contemporary networks are not static but rather evolve over time, the static approaches are inappropriate for clustering dynamic graphs. Modularity optimization in changing graphs is a relatively new field that requires efficient algorithms for detecting and maintaining a community structure while minimizing the “size of change” and computational effort. The objective of this work was to develop an efficient dynamic agglomerative clustering algorithm that attempts to maximize modularity while minimizing the “size of change” in the transitioning community structure. First we briefly discuss the previous memoryless dynamic reagglomeration approach with localized vertex freeing and illustrate its performance and limitations. Then we describe the new backtracking algorithm, followed by its performance results and observations. In an experimental analysis of both typical and pathological cases, we evaluate and justify various backtracking and agglomeration strategies in the context of the graph structure and incoming stream topologies. Evaluation of the algorithm on social network datasets, including the Facebook (SNAP) and PGP Giant Component networks, shows significantly improved performance over its conventional static counterpart in terms of execution time, modularity, and size of change.
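For concreteness, the Newman–Girvan modularity of a partition assigning vertex i to community c_i, on a graph with adjacency matrix A, vertex degrees k_i, and m edges in total, is the standard expression

```latex
Q = \frac{1}{2m} \sum_{i,j}
    \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j),
```

that is, the observed fraction of intra-community edges minus the fraction expected under a degree-preserving random null model.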
16

Using Network Science to Estimate the Cost of Architectural Growth

Dabkowski, Matthew Francis January 2016
Between 1997 and 2009, 47 major defense acquisition programs experienced cost overruns of at least 15% or 30% over their current or original baseline estimates, respectively (GAO, 2011, p. 1). Known formally as a Nunn-McCurdy breach (GAO, 2011, p. 1), the reasons for this excessive growth are myriad, although nearly 70% of the cases identified engineering and design issues as a contributing factor (GAO, 2011, p. 5). Accordingly, Congress legislatively acknowledged the need for change in 2009 with the passage of the Weapon Systems Acquisition Reform Act (WSARA, 2009), which mandated additional rigor and accountability in early life cycle (or Pre-Milestone A) cost estimation. Consistent with this effort, the Department of Defense has recently required more system specification earlier in the life cycle, notably the submission of detailed architectural models, and this has created opportunities for new approaches. In this dissertation, I describe my effort to transform one such model (or view), namely the SV-3, into computational knowledge that can be leveraged in Pre-Milestone A cost estimation and risk analysis. The principal contribution of my work is Algorithm 3, a novel network-science-based method for estimating the cost of unforeseen architectural growth in defense programs. Specifically, using number theory, network science, simulation, and statistical analysis, I simultaneously find the best-fitting probability mass functions and strengths of preferential attachment for an incoming subsystem's interfaces, and I apply blockmodeling to find the SV-3's globally optimal macrostructure. Leveraging these inputs, I use Monte Carlo simulation and the Constructive Systems Engineering Cost Model to estimate the systems engineering effort required to connect a new subsystem to the existing architecture. This effort is chronicled in the five articles given in Appendices A through C, and it is summarized in Chapter 2. In addition to Algorithm 3, there are several important tangential outcomes of this work, including: an explicit connection between Model-Based Systems Engineering and parametric cost modeling, a general procedure for organizations to improve the measurement reliability of their early life cycle cost estimates, and several exact and heuristic methods for the blockmodeling of one-, two-, and mixed-mode networks. More generally, this research highlights the benefits of applying network science to systems engineering, and it reinforces the value of viewing architectural models as computational objects.
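To illustrate one ingredient of the growth simulation described above, here is a sketch of attaching a new subsystem's interfaces to an existing architecture by degree-based preferential attachment. The subsystem names, degree figures, and the attachment-strength parameter alpha are all hypothetical; the dissertation's Algorithm 3 instead fits these quantities to the actual SV-3.

```python
# Degree-based preferential attachment of a new subsystem's interfaces.
# Assumes all existing subsystems have positive degree.
import random

def attach_new_subsystem(degrees, n_interfaces, alpha=1.0):
    """Pick distinct target subsystems with probability proportional to
    degree**alpha (alpha tunes the strength of preferential attachment)."""
    nodes = list(degrees)
    weights = [degrees[v] ** alpha for v in nodes]
    targets = set()
    while len(targets) < min(n_interfaces, len(nodes)):
        targets.add(random.choices(nodes, weights=weights, k=1)[0])
    return targets

# Hypothetical architecture: subsystem -> current interface count.
existing = {"radar": 5, "bus": 9, "gps": 2, "display": 3}
print(attach_new_subsystem(existing, n_interfaces=2, alpha=1.2))
```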
17

Narrowing the gap between network models and real complex systems

Viamontes Esquivel, Alcides January 2014
Simple network models that focus only on graph topology or, at best, basic interactions are often insufficient to capture all the aspects of a dynamic complex system. In this thesis, I explore those limitations and some concrete methods of resolving them. I argue that, in order to succeed at interpreting and influencing complex systems, we need to take slightly more complex parts, interactions, and information flows into account in our models. This thesis supports that claim with five examples of applied research. Each case study takes a closer look at the dynamics of the problem under study and complements the network model with techniques from information theory, machine learning, discrete mathematics, and/or ergodic theory. By using these techniques to study the concrete dynamics of each system, we obtained interesting new information. Concretely, we obtained better models of the network walks that are used in everyday applications like journal ranking. We also uncovered asymptotic characteristics of an agent-based information propagation model, which we believe underlies phenomena such as belief propagation and technology adoption in society. Finally, we were able to spot associations between antibiotic resistance genes in bacterial populations, a problem which is becoming more serious every day.
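For orientation, the baseline "network walk" behind journal ranking is the stationary distribution of a random walk with teleportation, i.e. PageRank. A minimal power-iteration sketch of that baseline, not of the extended models developed in the thesis, might look like this:

```python
# PageRank via power iteration; the baseline network-walk model that
# journal-ranking applications build on.
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-10):
    """adj[i][j] = 1 if i links to j; returns the stationary distribution."""
    A = np.asarray(adj, dtype=float)
    n = A.shape[0]
    out = A.sum(axis=1, keepdims=True)
    out[out == 0] = 1.0            # avoid division by zero for dangling nodes
    P = A / out
    r = np.full(n, 1.0 / n)
    while True:
        r_new = damping * r @ P + (1 - damping) / n
        r_new /= r_new.sum()       # reabsorb mass lost to dangling nodes
        if np.abs(r_new - r).sum() < tol:
            return r_new
        r = r_new

adj = [[0, 1, 1], [0, 0, 1], [1, 0, 0]]  # toy citation graph
print(pagerank(adj).round(3))
```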
18

Large-Scale Network Analysis for Online Social Brand Advertising

Zhang, Kunpeng, Bhattacharyya, Siddhartha, Ram, Sudha 12 1900
This paper proposes an audience selection framework for online brand advertising based on user activities on social media platforms. To our knowledge, it is one of the first studies to develop and analyze implicit brand-brand networks for online brand advertising. This paper makes several contributions. We first extract and analyze implicit weighted brand-brand networks, representing interactions among users and brands, from a large dataset. We examine network properties and community structures and propose a framework combining text and network analyses to find target audiences. As part of this framework, we develop a hierarchical community detection algorithm to identify a set of brands that are closely related to a specific brand, referred to as the "focal brand." We also develop a global ranking algorithm to calculate brand influence and select influential brands from this set of closely related brands. This is then combined with sentiment analysis to identify target users from these selected brands. To process large-scale datasets and networks, we implement several MapReduce-based algorithms. Finally, we design a novel evaluation technique to test the effectiveness of our targeting framework. Experiments conducted with Facebook data show that our framework provides significant performance improvements in identifying target audiences for focal brands.
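To make the notion of an implicit brand-brand network concrete, the sketch below links two brands with a weight equal to the number of users who engaged with both. This is only an assumed in-memory reading of the construction; the paper performs it at scale with MapReduce, and the interaction data here is invented.

```python
# Build an implicit weighted brand-brand network from user-brand
# interactions: edge weight = number of shared engaged users.
from collections import defaultdict
from itertools import combinations

interactions = [  # (user, brand) pairs, e.g. likes or comments
    ("u1", "acme"), ("u1", "globex"), ("u2", "acme"),
    ("u2", "initech"), ("u3", "acme"), ("u3", "globex"),
]

brands_by_user = defaultdict(set)
for user, brand in interactions:
    brands_by_user[user].add(brand)

edge_weights = defaultdict(int)
for brands in brands_by_user.values():
    for a, b in combinations(sorted(brands), 2):
        edge_weights[(a, b)] += 1  # one shared user per co-engagement

print(dict(edge_weights))  # {('acme', 'globex'): 2, ('acme', 'initech'): 1}
```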
19

Strategies to improve results from genomic analyses in small dairy cattle populations

Perez, Bruno da Costa 12 February 2019
The main objective of the present thesis was to propose a procedure to optimize the value of genotypic information in small dairy cattle populations and to investigate the impact of including genotypes and phenotypes of cows chosen by different strategies on the performance of genome-wide association studies and genomic selection. The first study proposes innovative methods, based on graph theory, that support alternative inference about population structure in livestock populations. It reviews general aspects of graphs and how each element relates to the theoretical and practical concepts of traditional pedigree structure studies. This chapter also presents a computational application (PedWorks), built in the Python 2.7 programming language, and demonstrates that graph theory is a suitable framework for modeling pedigree data. The second study assessed how graph community detection algorithms can help reveal population partitioning. This concept was used to develop a method for establishing new, community-based cow genotyping strategies. The results showed that accounting for population structure, by using community detection to choose the cows included in the reference population, may improve the results of genomic selection. The methods presented are easily applied to animal breeding programs. The third study examined the impact of different genotyping strategies (including the proposed community-based one) on the ability to detect quantitative trait loci in genome-wide association studies; distinct models for genomic analysis were also tested. The results showed that including cows with extreme phenotypic observations, sampled proportionally from communities, can improve the ability to detect quantitative trait loci in genomic evaluations. The last chapter studied the possible deleterious impact of preferential treatment (at different levels) on the accuracy and bias of genomic selection in a small dairy cattle population. Different proportions of cows with artificially inflated phenotypic observations were included in the reference population. The results suggest that both accuracy and bias are affected by the presence of preferential treatment in the evaluated population. Preferential treatment is expected to affect the performance of genomic selection much more in small than in large dairy cattle populations, because the information from cows carries proportionally more weight in such reduced-size breeds.
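A minimal sketch of the community-based genotyping idea described above: sample cows with extreme phenotypes from each community in proportion to community size. The community labels, phenotypes, and budget are hypothetical; in the thesis, communities come from community detection on the pedigree graph.

```python
# Community-proportional sampling of extreme-phenotype cows for a
# genotyping reference population.
from collections import defaultdict

def community_sample(phenotypes, communities, budget):
    """phenotypes: {cow: value}; communities: {cow: community id}."""
    by_comm = defaultdict(list)
    for cow, comm in communities.items():
        by_comm[comm].append(cow)
    n_total = len(communities)
    chosen = []
    for comm, cows in by_comm.items():
        quota = max(1, round(budget * len(cows) / n_total))
        # Take the most extreme phenotypes (largest absolute deviation).
        cows.sort(key=lambda c: abs(phenotypes[c]), reverse=True)
        chosen.extend(cows[:quota])
    return chosen[:budget]

phen = {"c1": 2.1, "c2": -1.8, "c3": 0.2, "c4": 3.0, "c5": -0.1}
comm = {"c1": "A", "c2": "A", "c3": "A", "c4": "B", "c5": "B"}
print(community_sample(phen, comm, budget=3))  # ['c1', 'c2', 'c4']
```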
20

Boosting Gene Expression Clustering with System-Wide Biological Information and Deep Learning

Cui, Hongzhu 24 April 2019
Gene expression analysis provides genome-wide insights into the transcriptional activity of a cell. One of the first computational steps in the exploration and analysis of gene expression data is clustering. Although a number of standard clustering methods are routinely used, most do not take prior biological information into account. Here, we propose a new approach for gene expression clustering analysis. The approach benefits from a new deep learning architecture, the Robust Autoencoder, which provides a more accurate high-level representation of the feature sets, and from incorporating prior system-wide biological information into the clustering process. We tested our approach on two gene expression datasets and compared its performance with two widely used clustering methods, hierarchical clustering and k-means, and with a recent deep learning clustering approach. Our approach outperformed all other clustering methods on the labeled yeast gene expression dataset. Furthermore, we showed that it is better than k-means at identifying functionally common clusters on the unlabeled human gene expression dataset. The results demonstrate that our new deep learning architecture generalizes well to the specific properties of gene expression profiles, and they confirm our hypothesis that prior biological network knowledge is helpful in gene expression clustering.
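The general pattern the abstract describes, compressing expression profiles with an autoencoder and then clustering the latent codes, can be sketched as follows. This uses a plain autoencoder plus k-means on random stand-in data; it is not the Robust Autoencoder, nor does it incorporate the biological-network information that is the thesis's second contribution. Shapes and hyperparameters are illustrative.

```python
# Autoencoder representation learning followed by k-means clustering.
import torch
from sklearn.cluster import KMeans

n_genes, n_samples, latent = 2000, 64, 32
X = torch.randn(n_genes, n_samples)          # stand-in expression matrix

encoder = torch.nn.Sequential(torch.nn.Linear(n_samples, 128),
                              torch.nn.ReLU(),
                              torch.nn.Linear(128, latent))
decoder = torch.nn.Sequential(torch.nn.Linear(latent, 128),
                              torch.nn.ReLU(),
                              torch.nn.Linear(128, n_samples))
opt = torch.optim.Adam(list(encoder.parameters()) +
                       list(decoder.parameters()), lr=1e-3)

for _ in range(200):                          # reconstruction training
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(decoder(encoder(X)), X)
    loss.backward()
    opt.step()

codes = encoder(X).detach().numpy()           # high-level representation
labels = KMeans(n_clusters=10, n_init=10).fit_predict(codes)
print(labels[:10])                            # cluster labels for 10 genes
```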
