11 |
Methods for Differential Analysis of Gene Expression and Metabolic Pathway Activity
Temate Tiagueu, Yvette Charly B., 09 May 2016
RNA-Seq is an increasingly popular approach to transcriptome profiling that uses the capabilities of next-generation sequencing technologies and provides better measurement of the levels of transcripts and their isoforms. In this thesis, we apply the RNA-Seq protocol and transcriptome quantification to estimate gene expression and pathway activity levels. We present a novel method, called IsoDE, for differential gene expression analysis based on bootstrapping. We compared the first version of IsoDE against four existing methods (Fisher's exact test, GFOLD, edgeR and Cuffdiff) on RNA-Seq datasets generated with three different sequencing technologies, both with and without replicates. We also introduce a second version of IsoDE, which runs 10 times faster than the first implementation thanks to in-memory processing in the underlying tool that estimates gene expression frequencies, together with further optimization of the analysis.
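The bootstrapping idea can be illustrated with a minimal Python sketch. It is not IsoDE's implementation: it assumes that, for one gene, we already have one expression estimate per bootstrap resample of the reads in each condition, and it calls the gene differentially expressed when a large enough fraction of all cross-condition pairs agree on a fold change above a threshold. The function name, the thresholds (`min_log2_fc`, `confidence`) and the pseudocount are illustrative assumptions.

```python
import math
from itertools import product
from typing import Sequence, Tuple


def bootstrap_de_call(
    boot_a: Sequence[float],
    boot_b: Sequence[float],
    min_log2_fc: float = 1.0,
    confidence: float = 0.95,
    pseudocount: float = 1e-3,
) -> Tuple[bool, float]:
    """Call one gene differentially expressed from bootstrap expression estimates.

    boot_a / boot_b hold one expression estimate per bootstrap resample of the
    reads in condition A / B. Every (a, b) pair is compared; the support is the
    fraction of pairs whose log2 fold change exceeds min_log2_fc in the majority
    direction (up or down). The gene is called DE if support >= confidence.
    """
    if not boot_a or not boot_b:
        raise ValueError("need at least one bootstrap estimate per condition")
    up = down = 0
    for a, b in product(boot_a, boot_b):
        # The pseudocount avoids division by zero for unexpressed genes.
        lfc = math.log2((b + pseudocount) / (a + pseudocount))
        if lfc >= min_log2_fc:
            up += 1
        elif lfc <= -min_log2_fc:
            down += 1
    support = max(up, down) / (len(boot_a) * len(boot_b))
    return support >= confidence, support
```

For hypothetical resamples `[10, 11, 9]` in condition A and `[40, 42, 39]` in condition B, every pair has a log2 fold change close to 2, so the call returns `(True, 1.0)`. Comparing all pairs, rather than only the means, lets the spread of the bootstrap estimates directly lower the support for noisy genes.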
The second part of this thesis presents a set of tools for the differential analysis of metabolic pathways from RNA-Seq data. Metabolic pathways are series of chemical reactions occurring within a cell. We focus on two main problems in the differential analysis of metabolic pathways, namely the differential analysis of their inferred activity level and of their estimated abundance. We validate our approaches through differential expression analysis at the transcript and gene levels, and through real-time quantitative PCR experiments. In Part Four, we present the packages created or updated in the course of this study. We conclude with our plans for further improving IsoDE 2.0.
|
12 |
Mining Tera-Scale Graphs: Theory, Engineering and Discoveries
Kang, U, 01 May 2012
How do we find patterns and anomalies in graphs with billions of nodes and edges that do not fit in memory? How can we use parallelism for such tera- or peta-scale graphs? In this thesis, we propose PEGASUS, a large-scale graph mining system implemented on top of the HADOOP platform, the open-source implementation of MAPREDUCE. PEGASUS includes algorithms that help us spot patterns and anomalous behaviors in large graphs.
PEGASUS enables structure analysis of large graphs. We unify many different structure analysis algorithms, including connected components, PageRank, and radius/diameter computation, into a general primitive called GIM-V. GIM-V is highly optimized and scales well with the number of edges and the number of available machines. Using GIM-V, we discover surprising patterns, including 7 degrees of separation in one of the largest publicly available Web graphs, with 7 billion edges.
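GIM-V generalizes matrix-vector multiplication by replacing multiply, sum and assignment with three user-defined operations, usually called combine2, combineAll and assign. The single-machine Python sketch below assumes an unweighted 0/1 matrix stored as a list of nonzero positions and instantiates the primitive for connected components (combine2 = multiply, combineAll = min, assign = min), repeating the step until labels stop changing. It only illustrates the idea; PEGASUS runs such steps as MapReduce jobs on HADOOP, which this sketch does not model.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Edge = Tuple[int, int]


def gim_v(
    n: int,
    edges: Iterable[Edge],
    v: List[float],
    combine2: Callable[[float, float], float],
    combine_all: Callable[[List[float]], float],
    assign: Callable[[float, float], float],
) -> List[float]:
    """One GIM-V step: v'_i = assign(v_i, combineAll_i({combine2(m_ij, v_j)})).

    The matrix is given as nonzero positions (i, j) with m_ij = 1; rows without
    nonzero entries keep their old value.
    """
    partial: Dict[int, List[float]] = {}
    for i, j in edges:
        partial.setdefault(i, []).append(combine2(1.0, v[j]))
    return [assign(v[i], combine_all(partial[i])) if i in partial else v[i]
            for i in range(n)]


def connected_components(n: int, undirected_edges: List[Edge]) -> List[int]:
    """Label every node with the smallest node id in its component."""
    # Store both directions so labels propagate across each undirected edge.
    edges = list(undirected_edges) + [(w, u) for u, w in undirected_edges]
    v = [float(i) for i in range(n)]  # every node starts with its own id
    while True:
        new_v = gim_v(n, edges, v,
                      combine2=lambda m, x: m * x,
                      combine_all=min,
                      assign=min)
        if new_v == v:  # fixed point reached: no label changed
            return [int(x) for x in v]
        v = new_v
```

On a hypothetical graph with edges (0, 1), (1, 2) and (3, 4), the labels converge to `[0, 0, 0, 3, 3]`. Swapping in other operations (for example, a weighted sum for PageRank) reuses the same iteration skeleton, which is the point of the unification.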
PEGASUS also enables inference and spectral analysis on large graphs. We design an efficient distributed belief propagation algorithm that infers the states of unlabeled nodes given a set of labeled nodes. We also develop an eigensolver for computing the top k eigenvalues and eigenvectors of the adjacency matrices of very large graphs, and use it to discover anomalous adult advertisers in the who-follows-whom Twitter graph with 3 billion edges. In addition, we develop an efficient tensor decomposition algorithm and use it to analyze a large knowledge-base tensor.
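What "top k eigenvalues and eigenvectors" means can be illustrated on a small dense symmetric matrix with power iteration plus deflation. This is a deliberately simple, single-machine technique chosen for clarity, not the distributed eigensolver described in the thesis; it assumes a symmetric matrix whose top k eigenvalues have distinct magnitudes.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def matvec(a: Matrix, x: List[float]) -> List[float]:
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def top_k_eigen(a: Matrix, k: int, iters: int = 1000, tol: float = 1e-12,
                seed: int = 0) -> List[Tuple[float, List[float]]]:
    """Top-k eigenpairs (largest magnitude first) of a symmetric matrix.

    Power iteration finds the dominant eigenpair; Hotelling deflation
    (A <- A - lambda * v v^T) removes it so the next run finds the next one.
    """
    rng = random.Random(seed)
    n = len(a)
    work = [row[:] for row in a]
    pairs: List[Tuple[float, List[float]]] = []
    for _ in range(k):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        lam = 0.0
        for _ in range(iters):
            y = matvec(work, x)
            norm = math.sqrt(sum(v * v for v in y))
            if norm == 0.0:
                break  # x lies in the null space of the deflated matrix
            x = [v / norm for v in y]
            new_lam = sum(xi * wi for xi, wi in zip(x, matvec(work, x)))  # Rayleigh quotient
            converged = abs(new_lam - lam) < tol
            lam = new_lam
            if converged:
                break
        pairs.append((lam, x))
        work = [[work[i][j] - lam * x[i] * x[j] for j in range(n)] for i in range(n)]
    return pairs
```

For the matrix `[[2, 1], [1, 2]]`, the two eigenvalues returned are approximately 3 and 1. At billion-edge scale the same question is answered with distributed matrix-vector products, since the matrix cannot be held densely.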
Finally, PEGASUS supports the management of large graphs. We propose efficient graph storage and indexing methods to answer graph mining queries quickly, and we develop an edge layout algorithm that compresses graphs better.
|
13 |
A high-performance framework for analyzing massive complex networks
Madduri, Kamesh, 08 July 2008
Graphs are a fundamental and widely-used abstraction for representing data. We can analytically study interesting aspects of real-world complex systems such as the Internet, social systems, transportation networks, and biological interaction data by modeling them as graphs. Graph-theoretic and combinatorial problems are also pervasive in scientific computing and engineering applications. In this dissertation, we address the problem of analyzing large-scale complex networks that represent interactions between hundreds of thousands to billions of entities. We present SNAP, a new high-performance computational framework for efficiently processing graph-theoretic queries on massive datasets.
Graph analysis is computationally very different from traditional scientific computing, and solving massive graph-theoretic problems on current high-performance computing systems is challenging for several reasons. First, real-world graphs are often characterized by a low diameter and unbalanced degree distributions, and are difficult to partition on parallel systems. Second, parallel algorithms for solving graph-theoretic problems are typically memory intensive, and their memory accesses are fine-grained and highly irregular. The primary contributions of this dissertation are the design and implementation of novel parallel graph algorithms for traversal, shortest paths, and centrality computations, optimized for small-world network topologies and for high-performance multithreaded architectures and multicore servers. SNAP (Small-world Network Analysis and Partitioning) is a modular, open-source framework for the exploratory analysis and partitioning of large-scale networks. With SNAP, we demonstrate the capability to process massive graphs with billions of vertices and edges, and achieve up to two orders of magnitude speedup over state-of-the-art network analysis approaches. We also design a new parallel computing benchmark for characterizing the performance of graph-theoretic problems on high-end systems, study data representations for dynamic graph problems on parallel systems, and apply algorithms in SNAP to solve real-world problems in social network analysis and systems biology.
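Brandes' algorithm is the standard way to compute betweenness centrality, one of the centrality computations mentioned above: one breadth-first search per source counts shortest paths, then dependencies are accumulated in reverse BFS order. The Python sketch below is a sequential version for unweighted graphs given as a dict of neighbor lists (an assumption of this sketch); the parallel, multithreaded variants that SNAP focuses on are not reproduced here.

```python
from collections import deque
from typing import Dict, Hashable, List

Graph = Dict[Hashable, List[Hashable]]


def betweenness_centrality(graph: Graph) -> Dict[Hashable, float]:
    """Brandes' algorithm for unweighted graphs given as adjacency lists.

    For undirected graphs each unordered pair is counted from both endpoints;
    halve the scores if the conventional undirected values are needed.
    """
    bc = {v: 0.0 for v in graph}
    for s in graph:
        stack = []
        preds: Dict[Hashable, List[Hashable]] = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)   # number of shortest s-v paths
        dist = dict.fromkeys(graph, -1)   # BFS distance from s (-1 = unseen)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:  # BFS phase: count shortest paths from s
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:  # accumulation phase, in order of decreasing distance
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

On the path a-b-c, node b scores 2.0 (the pair a, c counted from each endpoint) and the endpoints score 0.0. The per-source searches are independent, which is the natural place to introduce parallelism.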
|
14 |
Signal Processing on Graphs - Contributions to an Emerging Field / Traitement du signal sur graphes - Contributions à un domaine émergent
Girault, Benjamin, 01 December 2015
This dissertation introduces, in its first part, the field of signal processing on graphs. We start by recalling the required elements of linear algebra and spectral graph theory. Then, we define signal processing on graphs and give intuitions about its current strengths and weaknesses compared with classical signal processing. In the second part, we introduce our contributions to the field.
Chapter 4 studies structural properties of graphs with classical signal processing, through a transformation from graphs to time series. To do so, we use a unified method of semi-supervised learning on graphs, dedicated to classification, to obtain a smooth time series. Finally, we show that our method can be interpreted as a smoothing operator on graph signals. Chapter 5 introduces a new translation operator on graphs, defined by analogy with the classical time-shift operator and satisfying the key property of isometry. Our operator is compared with the two operators from the literature, and its action is described empirically on several key graphs. Chapter 6 uses this operator to define stationary graph signals. After giving a spectral characterization of such signals, we give a method to study and test stationarity on real graph signals. The closing chapter describes the MATLAB toolbox developed and used throughout this PhD.
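Signal processing on graphs defines a graph Fourier transform through the eigenvectors of the graph Laplacian. On a cycle graph those eigenvectors are exactly the discrete Fourier modes, which allows a dependency-free Python sketch. The sketch applies an operator that multiplies each Fourier coefficient by the unit-modulus factor exp(iπ·sqrt(λ_k/ρ)), with ρ the largest Laplacian eigenvalue; since every factor has modulus 1, the operator is an isometry by Parseval's identity. This multiplier is an illustrative assumption in the spirit of the isometric translation described above, not necessarily the exact operator defined in the thesis; the sketch also assumes at least 3 nodes.

```python
import cmath
import math
from typing import List, Sequence


def cycle_gft_basis(n: int) -> List[List[complex]]:
    """Orthonormal Laplacian eigenvectors of the cycle graph C_n (Fourier modes)."""
    return [[cmath.exp(2j * math.pi * k * t / n) / math.sqrt(n) for t in range(n)]
            for k in range(n)]


def cycle_laplacian_eigenvalues(n: int) -> List[float]:
    """Eigenvalue of mode k for L = 2I - A on C_n: 2 - 2cos(2*pi*k/n)."""
    return [2.0 - 2.0 * math.cos(2 * math.pi * k / n) for k in range(n)]


def gft(x: Sequence[complex], basis: List[List[complex]]) -> List[complex]:
    """Graph Fourier transform: inner products with each eigenvector."""
    return [sum(u.conjugate() * xt for u, xt in zip(uk, x)) for uk in basis]


def igft(xhat: Sequence[complex], basis: List[List[complex]]) -> List[complex]:
    n = len(basis)
    return [sum(xhat[k] * basis[k][t] for k in range(n)) for t in range(n)]


def spectral_unit_modulus_operator(x: Sequence[complex]) -> List[complex]:
    """Multiply each graph-Fourier coefficient by exp(i*pi*sqrt(lambda_k/rho))."""
    n = len(x)
    basis = cycle_gft_basis(n)
    lams = cycle_laplacian_eigenvalues(n)
    rho = max(lams)
    xhat = gft(x, basis)
    # max(lam, 0.0) guards against tiny negative rounding errors.
    shifted = [c * cmath.exp(1j * math.pi * math.sqrt(max(lam, 0.0) / rho))
               for c, lam in zip(xhat, lams)]
    return igft(shifted, basis)


def norm(x: Sequence[complex]) -> float:
    return math.sqrt(sum(abs(v) ** 2 for v in x))
```

Applied to the impulse `[1, 0, 0, 0, 0, 0, 0, 0]` on an 8-node cycle, the output keeps norm 1 up to floating-point rounding. On a general graph the basis would come from a numerical eigendecomposition of the Laplacian instead of a closed form.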
|
15 |
Unraveling the Structure and Assessing the Quality of Protein Interaction Networks with Power Graph Analysis
Royer, Loic, 12 December 2017
Molecular biology has entered an era of systematic and automated experimentation. High-throughput techniques have moved biology from small-scale experiments focused on specific genes and proteins to genome- and proteome-wide screens. One result of this endeavor is the compilation of complex networks of interacting proteins. Molecular biologists hope to understand life's complex molecular machines by studying these networks. This thesis addresses three open problems centered on their analysis and quality assessment.
First, we introduce power graph analysis as a novel approach to the representation and visualization of biological networks. Power graphs are a graph-theoretic approach to the lossless and compact representation of complex networks: they group edges into cliques and bicliques, and nodes into a neighborhood hierarchy. We demonstrate power graph analysis on five examples and show its advantages over traditional network representations. Moreover, we evaluate the algorithm's performance on a benchmark, test its robustness to noise, and measure its empirical time complexity at O(e^1.71), which is sub-quadratic in the number of edges e.
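An empirical complexity such as O(e^1.71) corresponds to the slope of a log-log fit of running time against the number of edges: if t ≈ c·e^k, then log t = log c + k·log e. The Python sketch below computes that least-squares slope; the timings in the usage note are synthetic values generated from t = c·e^1.71, not measurements from the thesis.

```python
import math
from typing import Sequence


def fit_power_law_exponent(sizes: Sequence[float], times: Sequence[float]) -> float:
    """Least-squares slope of log(time) against log(size), i.e. k in time ~ c * size**k."""
    xs = [math.log(s) for s in sizes]
    ys = [math.log(t) for t in times]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx
```

With sizes e in {10^3, 10^4, 10^5, 10^6} and synthetic times 2e-6·e^1.71, the fitted exponent is 1.71 up to rounding. Any fitted exponent below 2 means the running time grows sub-quadratically in e; with real, noisy timings the fit should be read as an estimate over the measured range only.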
Second, we tackle the difficult and controversial problem of data quality in protein interaction networks. We propose a novel measure of the accuracy and completeness of genome-wide protein interaction networks based on network compressibility. We validate this new measure by i) verifying the detrimental effect of false positives and false negatives, ii) showing that gold-standard networks are highly compressible, iii) showing that authors' choices of confidence thresholds are consistent with high network compressibility, iv) presenting evidence that compressibility is correlated with co-expression, co-localization and shared function, and v) showing that complete and accurate networks of complex systems in other domains exhibit levels of compressibility similar to those of current high-quality interactomes.
Third, we apply power graph analysis to networks derived from text mining as well as to gene expression microarray data. In particular, we present i) the network-based analysis of genome-wide expression profiles of the neuroectodermal conversion of mesenchymal stem cells, ii) the analysis of regulatory modules in a rare mitochondrial cytopathy, Mitochondrial Encephalomyopathy, Lactic acidosis, and Stroke-like episodes (MELAS), and iii) an investigation of the biochemical causes behind the enhanced biocompatibility of tantalum compared with titanium.
|
16 |
Compile- and run-time approaches for the selection of efficient data structures for dynamic graph analysis
Schiller, Benjamin, Deusser, Clemens, Castrillon, Jeronimo, Strufe, Thorsten, 11 January 2017
Graphs are used to model a wide range of systems from different disciplines, including social network analysis, biology, and big data processing. When such constantly changing, dynamic graphs are analyzed at high frequency, performance is the main concern. Depending on the graph size and structure, the update frequency, and the read accesses of the analysis, different data structures can yield large performance variations. Even for expert programmers, it is not always obvious which data structure is the best choice for a given scenario.
In previous work, we presented a compile-time approach for automatically selecting the most efficient data structures, which is well suited for constant workloads.
We extend this work with a measurement study of seven data structures and use the results to fit actual cost estimation functions. In addition, we evaluate our approach on the computation of seven different graph metrics. In analyses of real-world dynamic graphs with a constant workload, our approach achieves a speedup of up to 5.4× compared to basic data structure configurations.
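The compile-time selection step can be sketched as fitting a cost function per data structure and operation from measurements, then picking the structure with the lowest predicted total cost for a known workload. In the Python sketch below, the linear cost model, the structure names (`array_list`, `hash_set`) and all numbers in `MEASUREMENTS` are hypothetical; the paper's actual cost functions and candidate structures may differ.

```python
from typing import Dict, List, Tuple

Samples = List[Tuple[float, float]]  # (graph size n, measured time per operation)

# Hypothetical benchmark results: structure -> operation -> samples.
MEASUREMENTS: Dict[str, Dict[str, Samples]] = {
    "array_list": {"add": [(100, 1.0), (1000, 1.0)],
                   "contains": [(100, 10.0), (1000, 100.0)]},
    "hash_set": {"add": [(100, 3.0), (1000, 3.0)],
                 "contains": [(100, 2.0), (1000, 2.0)]},
}


def fit_linear(samples: Samples) -> Tuple[float, float]:
    """Least-squares fit of time = a + b * n; returns (a, b)."""
    k = len(samples)
    mean_n = sum(n for n, _ in samples) / k
    mean_t = sum(t for _, t in samples) / k
    sxx = sum((n - mean_n) ** 2 for n, _ in samples)
    sxy = sum((n - mean_n) * (t - mean_t) for n, t in samples)
    b = sxy / sxx if sxx else 0.0  # constant cost if all samples share one size
    return mean_t - b * mean_n, b


def predicted_cost(ops: Dict[str, Samples], workload: Dict[str, int], n: float) -> float:
    """Sum over operations of (operation count) * (fitted per-operation cost at size n)."""
    total = 0.0
    for op, count in workload.items():
        a, b = fit_linear(ops[op])
        total += count * (a + b * n)
    return total


def select_data_structure(measurements: Dict[str, Dict[str, Samples]],
                          workload: Dict[str, int], n: float) -> str:
    """Return the structure with the lowest predicted cost for the workload."""
    return min(measurements, key=lambda ds: predicted_cost(measurements[ds], workload, n))
```

With these hypothetical measurements and n = 1000, an add-heavy workload such as `{"add": 1000, "contains": 1}` selects `array_list`, while a lookup-heavy one such as `{"add": 10, "contains": 1000}` selects `hash_set`. Because the workload must be known in advance, this kind of selection fits the constant-workload case.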
Such a compile-time approach cannot yield optimal results when the behavior of the system changes later and the workload becomes non-constant. To close this gap, we present a run-time approach that provides live profiling and automatically exchanges data structures during execution. We analyze the performance of this approach on an artificial, non-constant workload, where it achieves speedups of up to 7.3× compared to basic configurations.
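The run-time variant can be sketched as a wrapper that counts its own operations over a fixed window and, at the end of each window, migrates its contents to whichever backing structure the cost model predicts is cheapest for the observed operation mix. The Python sketch below is a hypothetical illustration: the two backing structures (a list and a set), the cost formulas in `COSTS` and the window size are assumptions, and the cost of migrating is ignored for brevity.

```python
from typing import Callable, Dict, Hashable, List

# Hypothetical per-operation cost models (arbitrary units) as functions of size n.
COSTS: Dict[str, Dict[str, Callable[[int], float]]] = {
    "list": {"add": lambda n: 0.5 * n, "contains": lambda n: 0.5 * n,
             "iterate": lambda n: 1.0 * n},
    "set": {"add": lambda n: 2.0, "contains": lambda n: 2.0,
            "iterate": lambda n: 3.0 * n},
}


class AdaptiveNodeSet:
    """A node set that profiles its own workload and swaps its backing structure."""

    def __init__(self, window: int = 50) -> None:
        self.kind = "list"
        self.data = []  # a list or a set, depending on self.kind
        self.window = window
        self.counts = dict.fromkeys(COSTS["list"], 0)

    def _record(self, op: str) -> None:
        """Count one operation; at the end of a window, re-select the structure."""
        self.counts[op] += 1
        if sum(self.counts.values()) >= self.window:
            n = len(self.data)

            def cost(kind: str) -> float:
                return sum(c * COSTS[kind][o](n) for o, c in self.counts.items())

            best = min(COSTS, key=cost)
            if best != self.kind:  # exchange: migrate elements to the new structure
                self.data = set(self.data) if best == "set" else list(self.data)
                self.kind = best
            self.counts = dict.fromkeys(self.counts, 0)

    def add(self, x: Hashable) -> None:
        self._record("add")
        if self.kind == "set":
            self.data.add(x)
        elif x not in self.data:  # keep set semantics in list mode
            self.data.append(x)

    def contains(self, x: Hashable) -> bool:
        self._record("contains")
        return x in self.data

    def snapshot(self) -> List[Hashable]:
        self._record("iterate")
        return list(self.data)
```

Starting as a list, 50 insertions trigger an exchange to a set under these cost formulas, and a following run of 50 iteration-only snapshots switches it back to a list. A real system would also weigh the migration cost against the expected savings before exchanging.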
|
17 |
Functional network centrality in obesity: a resting-state and task fMRI study
García-García, Isabel, Jurado, María Ángeles, Garolera, Maite, Marqués-Iturria, Idoia, Horstmann, Annette, Segura, Bàrbara, Pueyo, Roser, Sender-Palacios, María José, Vernet-Vernet, Maria, Villringer, Arno, Junqué, Carme, Margulies, Daniel S., Neumann, Jane, January 2015
Obesity is associated with structural and functional alterations in brain areas that are often functionally distinct and anatomically distant. This suggests that obesity is associated with differences in the functional connectivity of regions distributed across the brain. However, studies addressing whole-brain functional connectivity in obesity remain scarce. Here, we compared voxel-wise degree centrality and eigenvector centrality between participants with obesity (n=20) and normal-weight controls (n=21). We analyzed resting-state and task-related fMRI data acquired from the same individuals. Relative to normal-weight controls, participants with obesity exhibited reduced degree centrality in the right middle frontal gyrus in the resting-state condition. During the task fMRI condition, participants with obesity exhibited lower degree centrality in the left middle frontal gyrus and the lateral occipital cortex, along with reduced eigenvector centrality in the lateral occipital cortex and occipital pole. Our results highlight the central role in the pathophysiology of obesity of the middle frontal gyrus, a structure involved in several brain circuits supporting attention, executive functions and motor functions. Additionally, our analysis suggests task-dependent reduced centrality in occipital areas, regions that play a role in perceptual processes and are profoundly modulated by attention.
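Voxel-wise degree and eigenvector centrality are computed on a graph whose nodes are voxels and whose edges link voxels with strongly correlated time series. The Python sketch below assumes a binary graph built by thresholding Pearson correlations (the threshold is an assumption; studies choose it differently), counts neighbors for degree centrality, and uses power iteration for eigenvector centrality. The four toy time series in `TOY_TS` are made up for illustration.

```python
import math
from typing import List

Series = List[float]

# Made-up voxel time series: three mutually correlated voxels, one anti-correlated.
TOY_TS: List[Series] = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],
    [1, 2, 3, 4, 6],
    [5, 4, 3, 2, 1],
]


def pearson(x: Series, y: Series) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def adjacency_from_timeseries(ts: List[Series], threshold: float) -> List[List[int]]:
    """Binary graph: edge between voxels i != j when corr(ts_i, ts_j) > threshold."""
    n = len(ts)
    return [[1 if i != j and pearson(ts[i], ts[j]) > threshold else 0
             for j in range(n)] for i in range(n)]


def degree_centrality(adj: List[List[int]]) -> List[int]:
    return [sum(row) for row in adj]


def eigenvector_centrality(adj: List[List[int]], iters: int = 200) -> List[float]:
    """Power iteration, scaled so the largest entry is 1.

    Iterating on A + I (the "+ x[i]" term) has the same leading eigenvector as A
    but avoids oscillation on bipartite graphs.
    """
    n = len(adj)
    x = [1.0] * n
    for _ in range(iters):
        y = [sum(adj[i][j] * x[j] for j in range(n)) + x[i] for i in range(n)]
        m = max(y)
        x = [v / m for v in y]
    return x
```

With a threshold of 0.5, the degrees of the toy voxels are `[2, 2, 2, 0]`, and the isolated voxel's eigenvector centrality decays toward 0. Degree centrality only counts direct connections, whereas eigenvector centrality also rewards being connected to well-connected voxels, which is why the two measures can differ by region.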
|
18 |
Analysis, integration and applications of the human interactome
Chaurasia, Gautam, 12 December 2012
Protein interaction networks aim to provide scaffold maps for systematic studies of the complex molecular machinery of the cell. However, the complexity of protein interactions poses large experimental and computational challenges for their identification, validation and annotation. Storing and linking these data is also demanding, since new data are accumulating rapidly. In this research work, I addressed these issues and provided solutions to overcome the limitations of current human protein-protein interaction (PPI) maps. My thesis can be divided into two parts. In the first part, I conducted a comparative assessment of eight recently constructed human PPI networks to identify experimental biases. The results showed strong selection and detection biases that must be taken into account in future applications of these maps. One important conclusion of this study was that the current human interaction networks contain complementary information; hence, a database termed the Unified Human Interactome (UniHI) was developed, integrating human PPI data from twelve major sources. Several new tools were included for querying, analyzing and visualizing human PPI networks. In the second part of this research work, the UniHI dataset was applied to characterize the genetic modifiers involved in a specific disease, Huntington's disease (HD), an autosomal dominant neurodegenerative disease. To find the modifiers, a network-based modeling approach was implemented that integrates a huntingtin-specific protein interaction network with gene expression data from HD patients through a multi-step filtering procedure. Using this approach, a caudate nucleus-specific HD PPI network was predicted, connecting 14 potentially dysregulated proteins directly or indirectly to the huntingtin protein and showing a possible link to molecular processes such as pro- and anti-apoptotic pathways, cell survival, growth, metabolism, neuronal development and neuronal diseases.
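The integration step behind a resource like UniHI, and the later search for candidate proteins linked directly or indirectly to a disease protein, can be sketched in a few lines of Python. This is a hypothetical illustration, not UniHI's pipeline: interactions are treated as undirected pairs, duplicates across sources are merged while recording provenance, and a breadth-first search keeps flagged (for instance, dysregulated) proteins within a chosen number of hops of a seed protein. All source names and protein identifiers in `SOURCES` are made up.

```python
from collections import deque
from typing import Dict, FrozenSet, Iterable, Set, Tuple

Interaction = Tuple[str, str]
Network = Dict[FrozenSet[str], Set[str]]

# Made-up interaction lists from two hypothetical sources.
SOURCES: Dict[str, Iterable[Interaction]] = {
    "source_a": [("HTT", "P1"), ("P1", "P2")],
    "source_b": [("P1", "HTT"), ("P2", "P3")],
}


def integrate_sources(sources: Dict[str, Iterable[Interaction]]) -> Network:
    """Merge PPI lists into one undirected network.

    Keys are unordered protein pairs; values record which sources report them.
    """
    merged: Network = {}
    for name, pairs in sources.items():
        for a, b in pairs:
            if a == b:
                continue  # self-interactions are skipped in this sketch
            merged.setdefault(frozenset((a, b)), set()).add(name)
    return merged


def linked_candidates(merged: Network, seed: str, flagged: Set[str],
                      max_hops: int = 2) -> Dict[str, int]:
    """Flagged proteins reachable from seed within max_hops, with hop distances."""
    adj: Dict[str, Set[str]] = {}
    for pair in merged:
        a, b = tuple(pair)
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)
    dist = {seed: 0}
    queue = deque([seed])
    while queue:
        v = queue.popleft()
        if dist[v] == max_hops:
            continue  # do not expand beyond the hop limit
        for w in adj.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return {p: d for p, d in dist.items() if p in flagged and p != seed}
```

With these toy sources, the pair (HTT, P1) is reported by both sources, and with `max_hops=2` only P1 (1 hop) and P2 (2 hops) are returned, since P3 lies 3 hops away. Keying interactions by unordered pairs is what makes the same interaction, listed in different orders by different sources, collapse into one edge.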
|
20 |
Statistické zhodnocení dat / Statistical data evaluation
Fadrný, Tomáš, January 2009
This diploma thesis evaluates and processes data from final device checks. All the devices are similar types of thermal overcurrent relays made by ABB. The statistical processing was carried out in the Minitab 14 statistical software, applying various statistical methods. Results are listed for each device type and each method used. The thesis is divided into two parts: the first describes the methods used, and the second presents their results. The thesis also includes an overall evaluation of the processed data.
|