1 |
Transcriptional regulation landscape in health and disease
Carrasco Pro, Sebastian / 26 January 2021
Transcription factors (TFs) control gene expression by binding to highly specific DNA sequences in gene regulatory regions. This TF binding is central to the control of myriad biological processes. Indeed, transcriptional dysregulation has been associated with many diseases such as autoimmune diseases and cancer. In this thesis, I studied the transcriptional regulation of cytokines and gene transcriptional dysregulation in cancer. Cytokines are small proteins produced by immune cells that play a key role in the development of the immune system and in the response to pathogens and inflammation. I mined three decades of research and developed a user-friendly database, CytReg, containing 843 human and 647 mouse interactions between TFs and cytokines. I analyzed CytReg and integrated it with phenotypic and functional datasets to provide novel insights into the general principles that govern cytokine regulation. I also predicted novel cytokine promoter-TF interactions based on cytokine co-expression patterns and motif analysis, and studied the association of cytokine transcriptional dysregulation with disease. Transcriptional dysregulation can be caused by single nucleotide variants (SNVs) affecting TF binding sites (TFBSs). Therefore, I created a database of altered TFBSs (aTFBS-DB) by calculating the effect (gain/loss) of all possible SNVs across the human genome for 741 TFs. I showed how the probabilities of gaining or disrupting TFBSs in regulatory regions differ between the major TF families, and that cis-eQTL SNVs are more likely to perturb TFBSs than common SNVs in the human population. To further study the effect of somatic SNVs in TFBSs, I used the aTFBS-DB to develop the TF-aware burden test (TFABT), a novel algorithm to predict cancer driver SNVs in gene promoters. I applied the TFABT to the Pan-Cancer Analysis of Whole Genomes (PCAWG) cohort and identified 2,555 candidate driver SNVs across 20 cancer types. Further, I characterized these cancer drivers using functional and biophysical assay data from three cancer cell lines, demonstrating that most SNVs alter transcriptional activity and differentially recruit cofactors. Taken together, these studies can be used as a blueprint to study transcriptional mechanisms in specific cellular processes (e.g., cytokine expression) and the effect of transcriptional dysregulation in disease (e.g., cancer).
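The gain/loss calculation mentioned in this abstract can be illustrated with a minimal sketch. It assumes a position weight matrix (PWM) scoring model with a fixed binding threshold, which is one common way to call TF binding sites; the abstract does not state which model aTFBS-DB actually uses. The matrix `TOY_PWM`, the threshold, and the function names are hypothetical and exist only for illustration.

```python
# Hypothetical toy PWM for the consensus site "ACGT": +1 for the consensus
# base at each position, -1 otherwise. Not taken from aTFBS-DB.
TOY_PWM = [
    {"A": 1.0, "C": -1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": 1.0, "G": -1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": 1.0, "T": -1.0},
    {"A": -1.0, "C": -1.0, "G": -1.0, "T": 1.0},
]
TOY_THRESHOLD = 3.0  # hypothetical cutoff: only a perfect match (score 4.0) passes


def best_overlapping_score(seq, pos, pwm):
    """Best PWM score among the windows of seq that overlap position pos."""
    width = len(pwm)
    first = max(0, pos - width + 1)
    last = min(pos, len(seq) - width)
    return max(
        sum(pwm[i][seq[start + i]] for i in range(width))
        for start in range(first, last + 1)
    )


def classify_snv(ref_seq, pos, alt_base, pwm, threshold):
    """Return 'gain', 'loss', or 'none' for a single nucleotide variant."""
    alt_seq = ref_seq[:pos] + alt_base + ref_seq[pos + 1:]
    ref_bound = best_overlapping_score(ref_seq, pos, pwm) >= threshold
    alt_bound = best_overlapping_score(alt_seq, pos, pwm) >= threshold
    if ref_bound and not alt_bound:
        return "loss"
    if alt_bound and not ref_bound:
        return "gain"
    return "none"


loss_call = classify_snv("TTACGTTT", 3, "G", TOY_PWM, TOY_THRESHOLD)
gain_call = classify_snv("TTAGGTTT", 3, "C", TOY_PWM, TOY_THRESHOLD)
neutral_call = classify_snv("TTTTTTTT", 3, "A", TOY_PWM, TOY_THRESHOLD)
```

Only windows that overlap the variant position are scored, so a binding site elsewhere in the sequence does not mask the effect of the variant.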
|
2 |
John Wesley's concept of perfect love: a motif analysis
Cubie, David Livingstone / January 1965
Thesis (Ph.D.)--Boston University / The problem of the dissertation is to discover what John Wesley meant by perfect love. Statements of both approbation and criticism regarding his doctrine are usually made from the vantage of various present-day interpretations. The goal of this study is to describe the type of perfection and love which was uppermost in Wesley's thought.
The method used is motif analysis as developed by Anders Nygren in his book, Agape and Eros. Nygren's method and motifs (Agape, the New Testament motif; Eros, the Greek motif; Nomos, the Judaistic motif; and Caritas, Augustine's union of the Greek and New Testament motifs) are examined to determine their usefulness for research. While Nygren's description of Agape or New Testament love is not sufficiently complete, his description of the contrasting ways and systems of thought is sufficiently demonstrated to warrant the use of motif research. The method proved to be valuable in the examination of Wesley's thought [TRUNCATED] / 2031-01-01
|
3 |
Subgraph Covers - An Information Theoretic Approach to Motif Analysis in Networks
Wegner, Anatol Eugen / 16 February 2015
A large number of complex systems can be modelled as networks of interacting units. From a mathematical point of view the topology of such systems can be represented as graphs of which the nodes represent individual elements of the system and the edges interactions or relations between them. In recent years networks have become a principal tool for analyzing complex systems in many different fields.
This thesis introduces an information theoretic approach for finding characteristic connectivity patterns of networks, also called network motifs. Network motifs are sometimes also referred to as basic building blocks of complex networks. Many real world networks contain a statistically surprising number of certain subgraph patterns called network motifs. In biological and technological networks motifs are thought to contribute to the overall function of the network by performing modular tasks such as information processing. Therefore, methods for identifying network motifs are of great scientific interest.
In the prevalent approach to motif analysis, network motifs are defined to be subgraphs that occur significantly more often in a network than in a null model that preserves certain features of the network. However, defining appropriate null models and sampling from them has proven to be challenging. This thesis introduces an alternative approach to motif analysis which looks at motifs as regularities of a network that can be exploited to obtain a more efficient representation of the network. The approach is based on finding a subgraph cover that represents the network using minimal total information. Here, a subgraph cover is a set of subgraphs such that every edge of the graph is contained in at least one subgraph in the cover, while the total information of a subgraph cover is the information required to specify the connectivity patterns occurring in the cover together with their position in the graph.
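The definition of a subgraph cover given above can be made concrete with a short sketch. It only checks the covering property (every edge of the graph lies in at least one subgraph of the cover) and does not compute the total information, whose exact form the abstract does not give. The function name and the edge-list representation are assumptions made for illustration.

```python
def is_subgraph_cover(graph_edges, subgraphs):
    """Check that the subgraphs together contain every edge of the graph.

    graph_edges: iterable of undirected edges (u, v).
    subgraphs: iterable of edge lists, one per subgraph in the candidate cover.
    """
    graph = {frozenset(edge) for edge in graph_edges}
    covered = set()
    for subgraph in subgraphs:
        sub_edges = {frozenset(edge) for edge in subgraph}
        if not sub_edges <= graph:
            # A subgraph may only use edges that exist in the graph.
            return False
        covered |= sub_edges
    return covered == graph


# Toy graph: a triangle on nodes 1, 2, 3 with a pendant edge 3-4.
toy_graph = [(1, 2), (2, 3), (1, 3), (3, 4)]
triangle = [(1, 2), (2, 3), (3, 1)]
single_edge = [(3, 4)]

full_cover = is_subgraph_cover(toy_graph, [triangle, single_edge])
partial_cover = is_subgraph_cover(toy_graph, [triangle])
```

In this toy case the cover describes the graph as one triangle plus one single edge, which is the kind of compact description the approach looks for.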
The thesis also studies the connection between motif analysis and random graph models for networks. Developing random graph models that incorporate high densities of triangles and other motifs has long been a goal of network research. In recent years, two such models have been proposed. However, their applications have remained limited because of the lack of a method for fitting such models to networks. In this thesis, we address this problem by showing that these models can be formulated as ensembles of subgraph covers and that the total information optimal subgraph covers can be used to match networks with such models. Moreover, these models can be solved analytically for many of their properties, allowing for more accurate modelling of networks in general.
Finally, the thesis also analyzes the problem of finding a total information optimal subgraph cover with respect to its computational complexity. The problem turns out to be NP-hard; hence, we propose a greedy heuristic for it. Empirical results for several real world networks from different fields are presented. In order to test the presented algorithm, we also consider some synthetic networks with predetermined motif structure.
|
4 |
A Mixture-of-Experts Approach for Gene Regulatory Network Inference
Shao, Borong / January 2014
Context. Gene regulatory network (GRN) inference is an important and challenging problem in bioinformatics. A variety of machine learning algorithms have been applied to increase GRN inference accuracy. Ensemble learning methods have been shown to yield a higher inference accuracy than individual algorithms. Objectives. We propose an ensemble GRN inference method based on the principle of Mixture-of-Experts ensemble learning. The proposed method can quantitatively measure the accuracy of individual GRN inference algorithms at the network motif level. Based on the accuracy of the individual algorithms at predicting different types of network motifs, weights are assigned to the individual algorithms so as to take their strengths and weaknesses into account. In this way, we can improve the accuracy of the ensemble prediction. Methods. The research methodology is a controlled experiment. The independent variable is the method. It has eight groups: five individual algorithms, the generic average ranking method used in the DREAM5 challenge, and the proposed ensemble method in two variants, one including four types of network motifs and one including five. The dependent variable is GRN inference accuracy, measured by the area under the precision-recall curve (AUPR). The experiment has training and testing phases. In the training phase, we analyze the accuracy of the five individual algorithms at the network motif level to decide their weights. In the testing phase, the weights are used to combine predictions from the five individual algorithms to generate ensemble predictions. We compare the accuracy of the eight method groups on an Escherichia coli microarray dataset using AUPR. Results. In the training phase, we obtain the AUPR values of the five individual algorithms at predicting each type of network motif. In the testing phase, we collect the AUPR values of the eight methods at predicting the GRN of the Escherichia coli microarray dataset. Each method group has a sample size of ten (ten AUPR values). Conclusions. Statistical tests on the experimental results show that the proposed method yields a significantly higher accuracy than the generic average ranking method. In addition, a new type of network motif is found in the GRN, the inclusion of which significantly increases the accuracy of the proposed method. / Genes are DNA molecules that control the biological traits and biochemical processes that comprise life. They interact with each other to realize the precise regulation of life activities. Biologists aim to understand the regulatory network among the genes with the help of high-throughput technologies such as microarrays and RNA-seq. These technologies produce large amounts of gene expression data which contain useful information. Therefore, effective data mining is necessary to extract this information and promote biological research. Gene regulatory network (GRN) inference aims to infer gene interactions from gene expression data, such as microarray datasets. The inference results can be used to guide further experiments to discover or validate gene interactions. A variety of machine learning (data mining) methods have been proposed to solve this problem. In recent years, experiments have shown that ensemble learning methods achieve higher accuracy than individual learning methods, because they can take advantage of the strengths of different individual methods and are robust to different network structures. In this thesis, we propose an ensemble GRN inference method based on the principle of Mixture-of-Experts ensemble learning. By quantitatively measuring the accuracy of individual methods at the network motif level, the proposed method is able to take advantage of the complementarity among the individual methods. The proposed method yields a significantly higher accuracy than the generic average ranking method, which is the most accurate method out of 35 GRN inference methods in the DREAM5 challenge.
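The weighting step described in this abstract can be sketched roughly as follows. This is a deliberate simplification, not the thesis's method: it assumes a single weight per algorithm, whereas the abstract describes weights derived from accuracy on different network motif types. The algorithm names, edge scores, and weights are hypothetical.

```python
def combine_predictions(predictions, weights):
    """Weighted average of per-edge confidence scores across algorithms.

    predictions: {algorithm_name: {edge: score}}; a missing edge counts as 0.0.
    weights: {algorithm_name: non-negative weight}, e.g. derived from
             training-phase accuracy (an assumption of this sketch).
    """
    total_weight = sum(weights[name] for name in predictions)
    all_edges = set().union(*(scores.keys() for scores in predictions.values()))
    return {
        edge: sum(
            weights[name] * scores.get(edge, 0.0)
            for name, scores in predictions.items()
        ) / total_weight
        for edge in all_edges
    }


# Hypothetical scores from two inference algorithms for two candidate edges.
toy_predictions = {
    "alg_a": {("g1", "g2"): 0.9, ("g2", "g3"): 0.1},
    "alg_b": {("g1", "g2"): 0.5, ("g2", "g3"): 0.7},
}
toy_weights = {"alg_a": 3.0, "alg_b": 1.0}  # alg_a is trusted three times as much

ensemble_scores = combine_predictions(toy_predictions, toy_weights)
```

With these toy numbers the ensemble follows the more heavily weighted algorithm, ranking the edge g1-g2 above g2-g3.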
|
5 |
Stereochemical Analysis On Protein Structures - Lessons For Design, Engineering And Prediction
Gunasekaran, K / 12 1900
No description available.
|
6 |
RAD21 Cooperates with Pluripotency Transcription Factors in the Maintenance of Embryonic Stem Cell Identity
Buchholz, Frank; Nitzsche, Anja; Paszkowski-Rogacz, Maciej; Matarese, Filomena; Janssen-Megens, Eva M.; Hubner, Nina C.; Schulz, Herbert; de Vries, Ingrid; Ding, Li; Huebner, Norbert; Mann, Matthias; Stunnenberg, Hendrik G. / 18 January 2016
For self-renewal, embryonic stem cells (ESCs) require the expression of specific transcription factors accompanied by a particular chromosome organization to maintain a balance between pluripotency and the capacity for rapid differentiation. However, how transcriptional regulation is linked to chromosome organization in ESCs is not well understood. Here we show that the cohesin component RAD21 plays a functional role in maintaining ESC identity through association with the pluripotency transcriptional network. ChIP-seq analyses of RAD21 reveal an ESC-specific cohesin binding pattern that is characterized by CTCF-independent co-localization of cohesin with the pluripotency-related transcription factors Oct4, Nanog, Sox2, Esrrb and Klf4. Upon ESC differentiation, most of these binding sites disappear and new CTCF-independent RAD21 binding sites emerge instead, which are enriched for binding sites of transcription factors implicated in early differentiation. Furthermore, knock-down of RAD21 causes expression changes similar to those after Nanog depletion, demonstrating the functional relevance of the association between RAD21 and the pluripotency transcriptional network. Finally, we show that Nanog physically interacts with the cohesin or cohesin-interacting proteins STAG1 and WAPL, further substantiating this association. Based on these findings, we propose that a dynamic placement of cohesin by pluripotency transcription factors contributes to a chromosome organization supporting the ESC expression program.
|
8 |
Subgraph Covers - An Information Theoretic Approach to Motif Analysis in Networks
Wegner, Anatol Eugen / 02 April 2015
A large number of complex systems can be modelled as networks of interacting units. From a mathematical point of view the topology of such systems can be represented as graphs of which the nodes represent individual elements of the system and the edges interactions or relations between them. In recent years networks have become a principal tool for analyzing complex systems in many different fields.
This thesis introduces an information theoretic approach for finding characteristic connectivity patterns of networks, also called network motifs. Network motifs are sometimes also referred to as basic building blocks of complex networks. Many real world networks contain a statistically surprising number of certain subgraph patterns called network motifs. In biological and technological networks motifs are thought to contribute to the overall function of the network by performing modular tasks such as information processing. Therefore, methods for identifying network motifs are of great scientific interest.
In the prevalent approach to motif analysis, network motifs are defined to be subgraphs that occur significantly more often in a network than in a null model that preserves certain features of the network. However, defining appropriate null models and sampling from them has proven to be challenging. This thesis introduces an alternative approach to motif analysis which looks at motifs as regularities of a network that can be exploited to obtain a more efficient representation of the network. The approach is based on finding a subgraph cover that represents the network using minimal total information. Here, a subgraph cover is a set of subgraphs such that every edge of the graph is contained in at least one subgraph in the cover, while the total information of a subgraph cover is the information required to specify the connectivity patterns occurring in the cover together with their position in the graph.
The thesis also studies the connection between motif analysis and random graph models for networks. Developing random graph models that incorporate high densities of triangles and other motifs has long been a goal of network research. In recent years, two such models have been proposed. However, their applications have remained limited because of the lack of a method for fitting such models to networks. In this thesis, we address this problem by showing that these models can be formulated as ensembles of subgraph covers and that the total information optimal subgraph covers can be used to match networks with such models. Moreover, these models can be solved analytically for many of their properties, allowing for more accurate modelling of networks in general.
Finally, the thesis also analyzes the problem of finding a total information optimal subgraph cover with respect to its computational complexity. The problem turns out to be NP-hard; hence, we propose a greedy heuristic for it. Empirical results for several real world networks from different fields are presented. In order to test the presented algorithm, we also consider some synthetic networks with predetermined motif structure.
|
9 |
Efficient and Scalable Subgraph Statistics using Regenerative Markov Chain Monte Carlo
Mayank Kakodkar (12463929) / 26 April 2022
In recent years there has been growing interest in data mining and graph machine learning in techniques that can obtain the frequencies of k-node Connected Induced Subgraphs (k-CIS) contained in large real-world graphs. While recent work has shown that 5-CISs can be counted exactly, no exact polynomial-time algorithms are known that solve this task for k > 5. In the past, sampling-based algorithms that work well in moderately sized graphs for k ≤ 8 have been proposed. In this thesis I push this boundary up to k ≤ 16 for graphs containing up to 120M edges, and to k ≤ 25 for smaller graphs containing between one million and 20M edges. I do so by re-imagining two older but elegant and memory-efficient algorithms, FANMOD and PSRW, which have large estimation errors by modern standards. This is because FANMOD produces highly correlated k-CIS samples and sampling the PSRW Markov chain becomes prohibitively expensive for k-CISs with k > 8.
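To make the counted quantity concrete, here is a minimal brute-force sketch for k = 3. It enumerates every 3-node subset and keeps those whose induced subgraph is connected. This is not FANMOD, PSRW, or any method from the thesis, and it is only feasible on tiny graphs, which is exactly why sampling-based estimators are needed at scale. The function names and the toy graph are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def build_adjacency(edges):
    """Adjacency sets for an undirected graph given as an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def is_connected(nodes, adj):
    """True if the subgraph induced by nodes is connected (depth-first search)."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u] & nodes:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == nodes


def count_3_cis(edges):
    """Exact counts of connected induced 3-node subgraphs, by pattern."""
    adj = build_adjacency(edges)
    counts = Counter()
    for nodes in combinations(sorted(adj), 3):
        if is_connected(nodes, adj):
            n_edges = sum(1 for a, b in combinations(nodes, 2) if b in adj[a])
            # For k = 3 the edge count alone identifies the pattern.
            counts["triangle" if n_edges == 3 else "wedge"] += 1
    return counts


# Toy graph: a triangle on nodes 1, 2, 3 with a pendant edge 3-4.
toy_counts = count_3_cis([(1, 2), (2, 3), (1, 3), (3, 4)])
```

The number of k-node subsets grows combinatorially with k and graph size, so exhaustive enumeration of this kind does not extend to the graph sizes discussed in the abstract.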
In this thesis, I introduce:
(a) RTS: a novel regenerative Markov chain Monte Carlo (MCMC) sampling procedure on the tree generated on the fly by the FANMOD algorithm. RTS is able to run on multiple cores and multiple machines (it is embarrassingly parallel) and to compute confidence intervals of estimates, all while preserving the memory-efficient nature of FANMOD. RTS is thus able to estimate subgraph statistics for k ≤ 16 for larger graphs containing up to 120M edges, and for k ≤ 25 for smaller graphs containing between one million and 20M edges.
(b) R-PSRW: scales the PSRW algorithm to larger CIS sizes using a rejection sampling procedure to efficiently sample transitions from the PSRW Markov chain. R-PSRW matches RTS in terms of scaling to larger CIS sizes.
(c) Ripple: achieves unprecedented scalability by stratifying the R-PSRW Markov chain state space into ordered strata via a new technique that I call sequential stratified regeneration. I show that the Ripple estimator is consistent, highly parallelizable, and scales well. Ripple is able to count CISs of size k ≤ 12 in real world graphs containing up to 120M edges.
My empirical results show that the proposed methods offer a considerable improvement over the state of the art. Moreover, my methods are able to run at a scale that has been considered unreachable until now, not only by prior MCMC-based methods but also by other sampling approaches.
Optimization of Restricted Boltzmann Machines. In addition, I propose a regenerative transformation of MCMC samplers of Restricted Boltzmann Machines (RBMs). My approach, Markov Chain Las Vegas (MCLV), gives statistical guarantees in exchange for random running times. MCLV uses a stopping set built from the training data and a maximum Markov chain step count K (referred to as MCLV-K). I present an MCLV-K gradient estimator (LVS-K) for RBMs and explore the correspondence and differences between LVS-K and Contrastive Divergence (CD-K). LVS-K significantly outperforms CD-K in the task of training RBMs over the MNIST dataset, indicating MCLV to be a promising direction in learning generative models.
|