41

Novel Models and Efficient Algorithms for Network-based Optimization in Biomedical Applications

Sajjadi, Seyed Javad, 30 June 2014
We introduce and study a novel graph optimization problem that searches for multiple cliques with maximum overall weight, which we call the Maximum Weighted Multiple Clique Problem (MWMCP). This problem arises in network-based data mining, specifically in bioinformatics, where complex diseases such as various types of cancer and diabetes are conjectured to be triggered and influenced by a combination of genetic and environmental factors. To integrate potential effects from interplay among underlying candidate factors, we propose a new network-based framework that identifies effective biomarkers by searching for "groups" of synergistic risk factors with high predictive power for disease outcome. An interaction network is constructed with vertex weights representing the individual predictive power of candidate factors and edge weights representing pairwise synergistic interactions among factors. This network-based biomarker identification problem is then formulated as an MWMCP. To achieve near-optimal solutions for large-scale networks, an analytical algorithm based on the column generation method and a fast greedy heuristic have been derived. To obtain exact solutions, an advanced branch-price-and-cut algorithm is designed after studying the properties of the problem. Our algorithms for MWMCP have been implemented and tested on random graphs with promising results. They are also used to analyze two biomedical datasets: a Type 1 Diabetes (T1D) dataset from the Diabetes Prevention Trial-Type 1 (DPT-1) Study, and a breast cancer genomics dataset for metastasis prognosis. The results demonstrate that our network-based methods can identify important biomarkers with better prediction accuracy than conventional feature selection, which considers only individual effects.
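The greedy heuristic mentioned in the abstract can be pictured as repeatedly growing one clique at a time by best marginal weight gain. The sketch below is a minimal illustration of that idea under an assumed dictionary interface for vertex weights, edge weights, and adjacency; it is not the thesis's implementation.

```python
def greedy_mwmcp(vw, ew, adj):
    """Greedy sketch for the Maximum Weighted Multiple Clique Problem:
    grow one clique at a time by best marginal weight gain, remove it,
    and repeat until the graph is exhausted.
    vw:  {vertex: weight}            -- individual predictive power
    ew:  {frozenset({u, v}): weight} -- pairwise synergy, one key per edge
    adj: {vertex: set of neighbours} (hypothetical interface)"""
    remaining = set(vw)
    cliques = []
    while remaining:
        seed = max(remaining, key=vw.get)           # heaviest free vertex
        clique, cand = {seed}, adj[seed] & remaining
        while cand:
            # gain of v = its own weight plus its edges into the clique
            gain = {v: vw[v] + sum(ew[frozenset((v, u))] for u in clique)
                    for v in cand}
            best = max(gain, key=gain.get)
            if gain[best] <= 0:                     # no profitable extension
                break
            clique.add(best)
            cand = (cand & adj[best]) - clique      # keep it a clique
        cliques.append(clique)
        remaining -= clique
    return cliques
```

Intersecting the candidate set with each new member's neighbourhood guarantees every intermediate set stays a clique; the thesis's column generation and branch-price-and-cut methods refine this kind of starting solution toward optimality.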
42

Transcriptomic Data Analysis Using Graph-Based Out-of-Core Methods

Rogers, Gary L, 01 August 2011
Biological data derived from high-throughput microarrays can be transformed into finite, simple, undirected graphs and analyzed using tools first introduced by the Langston Lab at the University of Tennessee. Transforming the raw data can be broken down into three main tasks: data normalization, generation of similarity metrics, and threshold selection. The choice of methods used in each of these steps affects the final outcome of the graph with respect to size, density, and structure. A number of different algorithms are examined and analyzed to illustrate the magnitude of these effects. Graph-based tools are then used to extract putative gene networks. These tools are loosely based on the concept of a clique, which yields clusters optimized for density. Innovative additions to the paraclique algorithm, developed at the Langston Lab, are introduced to generate results with the highest average correlation or highest density. A new suite of algorithms is then presented that exploits a priori gene interactions. Aptly named the anchored analysis toolkit, these algorithms use known interactions as anchor points for generating subgraphs, which are then analyzed for their graph structure. This yields clusters that might otherwise have been lost in noise. A main product of this thesis is a novel collection of algorithms that generate exact solutions to the maximum clique problem for graphs too large to fit within core memory. No other algorithms are currently known that produce exact solutions to this problem for extremely large graphs. A combination of in-core and out-of-core techniques is used in conjunction with a distributed-memory programming model. These algorithms take into consideration such pitfalls as external disk I/O and hardware failure and recovery. Finally, a web-based tool is described that provides researchers access to the aforementioned algorithms. The Graph Algorithms Pipeline for Pathway Analysis tool, GrAPPA, was previously developed by the Langston Lab and provides the software needed to take raw microarray data as input and preprocess, analyze, and post-process it in a single package. GrAPPA also provides access to high-performance computing resources via the TeraGrid.
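As a rough illustration of the transformation pipeline the abstract describes (normalization aside), the following sketch builds a gene graph by thresholding pairwise Pearson correlations; the cutoff of 0.85 is an arbitrary stand-in, not a value chosen in the thesis.

```python
import numpy as np

def correlation_graph(expr, threshold=0.85):
    """expr: genes-by-samples matrix of normalized expression values.
    Returns the edge list of a simple undirected graph in which two
    genes are joined when |Pearson r| meets the (illustrative) cutoff."""
    corr = np.corrcoef(expr)              # gene-by-gene correlation matrix
    n = corr.shape[0]
    return [(i, j) for i in range(n) for j in range(i + 1, n)
            if abs(corr[i, j]) >= threshold]
```

As the abstract notes, the similarity metric and threshold jointly determine the size and density of the resulting graph, and hence which cliques and paracliques the downstream tools can find.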
43

Predicting Protein Calcium Binding Sites

Wang, Xue, 01 December 2009
Calcium is one of the most biologically important metal ions, involved in numerous physicochemical activities in the human body, including cell division and apoptosis, muscle contraction, neurotransmitter release, enzyme activation, and blood clotting. Calcium fulfills its functions through binding to different classes of calcium-binding proteins. To facilitate our understanding of the roles of calcium in biological systems, and to design novel metal-binding proteins with tailored binding capabilities, it is important to develop computational algorithms that predict calcium-binding sites in different classes of proteins. In the literature, a calcium-binding site may be represented either by a spatial point or by the set of residues chelating the calcium ion. A thorough statistical analysis of known calcium-binding proteins deposited in the Protein Data Bank gives reference values for various parameters characterizing the geometric and chemical features of calcium-binding sites, including distances, angles, dihedral angles, the Hull property, coordination numbers, ligand types, and formal charges. It also reveals clear differences between the well-known EF-hand calcium-binding motif and other calcium-binding motifs. Utilizing these multiple geometric and chemical parameters of well-formed calcium-binding sites, we developed the MUG (MUltiple Geometries) program. MUG can re-identify the coordinates of the documented calcium ion and the set of ligand residues. Three previously published data sets were tested, comprising, respectively, 19, 44, and 54 holo protein structures with 48, 92, and 91 documented calcium-binding sites. Defining a "correct hit" as a point within 3.5 angstroms of the documented calcium location, MUG has a sensitivity around 90% and a selectivity around 80%. The sets of ligand residues (calcium-binding pockets) were identified for 43, 66, and 63 documented calcium ions in these three data sets, respectively. To achieve true prediction, our program was then enhanced to predict calcium-binding pockets in apo (calcium-free) proteins. Our new program, MUGSR, accounts for the conformational changes that calcium-binding pockets undergo upon calcium binding. It is able to capture calcium-binding pockets that may undergo local conformational changes or side-chain torsional rotations, which is validated by referring back to the corresponding holo protein structure sharing more than 98% sequence similarity with the apo protein.
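To make the geometric screening concrete, here is a toy version of the kind of distance test such predictors rely on. The 3.5 Å "correct hit" cutoff comes from the abstract; the Ca–O distance shell and minimum coordination count are illustrative assumptions, not MUG's actual parameters.

```python
import numpy as np

def looks_like_ca_site(point, oxygens, dmin=2.0, dmax=3.0, min_coord=4):
    """Crude geometric filter: keep a candidate point when enough
    ligand oxygens fall inside a plausible Ca-O distance shell.
    dmin, dmax, and min_coord are assumed values, not MUG's."""
    d = np.linalg.norm(np.asarray(oxygens) - np.asarray(point), axis=1)
    return np.count_nonzero((d >= dmin) & (d <= dmax)) >= min_coord

def is_correct_hit(predicted, documented, cutoff=3.5):
    """The abstract's evaluation rule: a prediction counts as a hit
    when it lies within 3.5 angstroms of the documented calcium."""
    return np.linalg.norm(np.asarray(predicted) - np.asarray(documented)) <= cutoff
```

MUG combines many more parameters (angles, dihedrals, the Hull property, ligand types, formal charges) than this single distance shell, but the hit criterion used to score sensitivity and selectivity is exactly the 3.5 Å test above.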
44

Scalable Scientific Computing Algorithms Using MapReduce

Xiang, Jingen, January 2013
Cloud computing systems, like MapReduce and Pregel, provide a scalable and fault-tolerant environment for running computations at massive scale. However, these systems are designed primarily for data-intensive computational tasks, while a large class of problems in scientific computing and business analytics are computationally intensive (i.e., they require a lot of CPU in addition to I/O). In this thesis, we investigate the use of cloud computing systems, in particular MapReduce, for computationally intensive problems, focusing on two classic problems that arise in scientific computing and also in analytics: maximum clique and matrix inversion. The key contribution that enables us to effectively use MapReduce to solve the maximum clique problem on dense graphs is a recursive partitioning method that partitions the graph into several subgraphs of similar size and running time complexity. After partitioning, the maximum cliques of the different partitions can be computed independently, and the computation is sped up using a branch and bound method. Our experiments show that our approach leads to good scalability, which is unachievable by other partitioning methods since they result in partitions of different sizes and hence lead to load imbalance. Our method is more scalable than an MPI algorithm, and is simpler and more fault tolerant. For the matrix inversion problem, we show that a recursive block LU decomposition allows us to effectively compute in parallel both the lower triangular (L) and upper triangular (U) matrices using MapReduce. After computing the L and U matrices, their inverses are computed using MapReduce. The inverse of the original matrix, which is the product of the inverses of the L and U matrices, is also obtained using MapReduce. Our technique is the first matrix inversion technique that uses MapReduce. We show experimentally that our technique has good scalability, and it is simpler and more fault tolerant than MPI implementations such as ScaLAPACK.
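The partitioning idea can be sketched without Hadoop: order the vertices, give each map task the subproblem "cliques whose lowest-ordered vertex is v" (v plus its later-ordered neighbours), and reduce by keeping the largest result. The branch and bound below is a textbook version used only to make the decomposition concrete; it is not the thesis's code.

```python
def max_clique(cand, adj, current=frozenset(), best=frozenset()):
    """Textbook branch and bound: prune a branch when even taking
    every remaining candidate cannot beat the incumbent."""
    if len(current) + len(cand) <= len(best):
        return best
    if not cand:
        return current              # strictly larger, by the prune above
    v = next(iter(cand))
    best = max_clique(cand & adj[v], adj, current | {v}, best)  # take v
    return max_clique(cand - {v}, adj, current, best)            # skip v

def partitioned_max_clique(adj):
    """Simulated map phase: one independent subproblem per vertex v,
    restricted to v plus its later-ordered neighbours; the reduce
    step keeps the largest clique found across all subproblems."""
    rank = {v: i for i, v in enumerate(adj)}
    subproblems = ({u for u in adj[v] if rank[u] > rank[v]} | {v}
                   for v in adj)
    return max((max_clique(p, adj) for p in subproblems), key=len)
```

Every clique has a unique lowest-ordered vertex, so some subproblem contains it and correctness is preserved; the thesis's contribution is the recursive partitioning that additionally balances subproblem size and running time across map tasks.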
45

Graph theoretic generalizations of clique: optimization and extensions

Balasundaram, Balabhaskar, 15 May 2009
This dissertation considers graph theoretic generalizations of the maximum clique problem. Models originally proposed in the social network analysis literature are investigated from a mathematical programming perspective for the first time. A social network is usually represented by a graph, and cliques were the first models of "tightly knit groups" in social networks, referred to as cohesive subgroups. Cliques are idealized models, and their overly restrictive nature motivated the development of clique relaxations that relax different aspects of a clique. Identifying large cohesive subgroups in social networks has traditionally been used in criminal network analysis to study organized crime such as terrorism, narcotics trafficking, and money laundering. More recent applications include clustering and data mining in wireless networks and biological networks, as well as in graph models of databases and the internet. This research has the potential to impact homeland security, bioinformatics, internet research, and the telecommunication industry, among others. The focus of this dissertation is a degree-based relaxation called k-plex. A distance-based relaxation called k-clique and a diameter-based relaxation called k-club are also investigated. We present the first systematic study of the complexity aspects of these problems and the application of mathematical programming techniques to solving them. Graph theoretic properties of the models are identified and used in the development of theory and algorithms. Optimization problems associated with the three models are formulated as binary integer programs, and the properties of the associated polytopes are investigated. Facets and valid inequalities are identified based on combinatorial arguments. A branch-and-cut framework is designed and implemented to solve the optimization problems exactly. Specialized preprocessing techniques are developed that, in conjunction with the branch-and-cut algorithm, optimally solve the problems on real-life power-law graphs, a general class of graphs that includes social and biological networks. Computational experiments are performed to study the effectiveness of the proposed solution procedures on benchmark instances and real-life instances. The relationship of these models to the classical maximum clique problem is studied, leading to several interesting observations, including a new compact integer programming formulation. We also derive new continuous non-linear formulations of the classical maximum independent set problem that maximize continuous functions over the unit hypercube, and characterize their local and global maxima. Finally, clustering and network design extensions of the clique relaxation models are explored.
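For concreteness, the degree-based relaxation at the heart of the dissertation can be stated in a few lines: a vertex set S is a k-plex when every member is adjacent to at least |S| − k other members, i.e., each vertex misses at most k − 1 of the others. A small membership test, assuming adjacency sets:

```python
def is_k_plex(S, adj, k):
    """S is a k-plex iff each member is adjacent to at least |S| - k
    vertices of S; k = 1 recovers an ordinary clique.
    adj: {vertex: set of neighbours} (assumed representation)."""
    S = set(S)
    return all(len(adj[v] & (S - {v})) >= len(S) - k for v in S)
```

Checking membership is easy; the optimization problems the dissertation studies, such as finding a maximum k-plex, inherit the hardness of maximum clique, which motivates the polyhedral analysis and branch-and-cut machinery described above.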
46

Leakage power driven behavioral synthesis of pipelined ASICs

Gopalan, Ranganath, 01 June 2005
Traditional approaches to power optimization during high-level synthesis have targeted single-cycle designs, where only one input is processed by the datapath at any given time. The throughput of large single-cycle designs can be improved by means of pipelining. In this work, we present a framework for the high-level synthesis of pipelined datapaths with low leakage power dissipation. We explore the effect of pipelining on the leakage power dissipation of data-flow-intensive designs. An algorithm for minimizing leakage power during behavioral pipelining is presented. The transistor-level leakage reduction technique employed here is based on Multi-Threshold CMOS (MTCMOS) technology. Pipelined allocation of functional units and registers is performed considering fixed data introduction intervals. Our algorithm uses simulated annealing to perform scheduling, allocation, and binding, obtaining pipelined datapaths with low leakage dissipation.
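The optimization engine here is simulated annealing over the combined scheduling/allocation/binding space. A generic annealing loop of that shape is sketched below; the move generator, the leakage cost model, and every numeric knob are placeholders, not the paper's actual formulation.

```python
import math
import random

def anneal(initial, neighbour, leakage_cost,
           t0=100.0, cooling=0.95, moves_per_t=50, t_min=1e-3):
    """Generic simulated-annealing skeleton: `neighbour` perturbs a
    candidate schedule/allocation/binding, and `leakage_cost` scores
    its leakage dissipation. All parameters are illustrative."""
    state = initial
    cost = leakage_cost(state)
    best, best_cost = state, cost
    t = t0
    while t > t_min:
        for _ in range(moves_per_t):
            cand = neighbour(state)
            c = leakage_cost(cand)
            # downhill moves always accepted; uphill with Boltzmann probability
            if c <= cost or random.random() < math.exp((cost - c) / t):
                state, cost = cand, c
                if cost < best_cost:
                    best, best_cost = state, cost
        t *= cooling
    return best
```

Accepting occasional uphill moves lets the search escape poor local bindings early on, while the cooling schedule gradually narrows it toward a low-leakage pipelined datapath.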
48

Clique conflicts in rural and urban areas: a consequence of immigration? (Cliquenkonflikte im ländlichen und städtischen Raum – eine Folge von Zuwanderung?)

Thielen-Reffgen, Caroline, January 2006
Also published as: Doctoral dissertation, University of Trier, 2006
49

Robust flight gate assignment

Jaehn, Florian, January 1900
Thesis (Ph.D.), Universität-Gesamthochschule-Siegen, 2007. Includes bibliographical references (p. 123-129).
50

Recommendation based on data mining for relationship marketing (Mineração de dados voltada para recomendação no âmbito de marketing de relacionamento)

Fracalanza, Livia Fonseca, 24 August 2009
Cross-selling is a strategy for recommending products to customers based on their past purchases or the purchases of other customers with the same profile. The best-known technique for analyzing a customer's shopping basket is market basket analysis. This dissertation addresses the discovery of sequential patterns in large databases and presents an efficient algorithm that transforms the shopping basket problem into a maximum clique problem. First, the input data are transformed into a graph, and maximum cliques are detected to reveal the most frequent relationships among the items in a transaction. Experiments reported in the dissertation demonstrate the efficiency of the algorithm on large data volumes.
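The reduction the abstract describes, from baskets to a graph whose cliques expose co-purchased item groups, can be sketched as follows; the support threshold is an illustrative assumption, not the dissertation's parameter.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_graph(transactions, min_support=2):
    """Items become vertices; an edge joins two items bought together
    in at least `min_support` baskets (threshold assumed). A set of
    mutually co-purchased items then appears as a clique, so maximum
    clique search exposes the strongest cross-sell group."""
    pairs = Counter(frozenset(p)
                    for basket in transactions
                    for p in combinations(sorted(set(basket)), 2))
    adj = {}
    for pair, count in pairs.items():
        if count >= min_support:
            u, v = tuple(pair)
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj
```

On a graph built this way, any exact maximum clique routine (such as the branch and bound sketched for entry 44) recovers the largest set of items that all co-occur pairwise, which is the recommendation group the dissertation targets.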
