651

Application of bioinformatics on gene regulation studies and regulatory network construction with omics data

Qin, Jing, 覃静 January 2013 (has links)
Gene expression is a multi-step process that involves various regulators. From whole-genome sequences to the complex gene regulatory system, high-throughput technologies have generated a large amount of omics data, but information on such a scale is hard to interpret manually. Bioinformatics can help process this huge body of biological information and infer biological insights using mathematics, statistics and computational techniques. In this study, we applied various bioinformatic techniques to several aspects of gene regulation. Multiple primary transcripts of a gene can be initiated at different promoters, termed alternative promoters (APs), and most human genes have multiple APs. However, whether the usage of APs is independent is still controversial. In this study, we analyze the roles of APs in gene regulation using various bioinformatics approaches. Chromosomal interactions between APs are found to be more frequent than interactions between different genes. By comparing the APs at the two ends of a gene, we find that they differ significantly in sequence content, conservation and motif frequency. The position of and distance between two APs are important for their combined effects, demonstrating that their regulation is not independent and that one AP can affect the transcription of the other. To understand the multi-level gene regulatory system in various biological processes, a mass of high-throughput omics data has been generated. However, each omics technology measures molecular abundance or behavior at a single level and therefore has a limited ability to depict the multi-level system. Integrating omics data can describe the multi-level gene regulatory system more comprehensively and reduce false positives. In this study, two web servers, ChIP-Array and ProteoMirExpress, have been built to construct transcriptional and post-transcriptional regulatory networks by integrating omics data. ChIP-Array is a web server for biologists to construct a TF-centered network from their own data. A network library is further constructed by ChIP-Array from publicly available data. Given a series of mRNA expression profiles in a biological process, master regulators can be identified by matching the profiles with the networks in the library. To explore gene regulatory networks controlled by multiple TFs, least absolute shrinkage and selection operator (LASSO)-type regularization models are applied to multiple integrative data. Gold-standard-based evaluations demonstrate that the L0 and L1/2 regularization models are efficient and applicable to gene regulatory network inference in large genomes with small numbers of samples. ProteoMirExpress integrates transcriptomic and proteomic data to infer miRNA-centered networks. It successfully infers the perturbed miRNA and those that co-express with it. The resulting network reports miRNA targets whose mRNA and protein levels are uncorrelated, which are usually ignored by tools that consider only mRNA abundance, even though some of them may be important downstream regulators. In summary, in this study we analyze gene regulation at multiple levels and develop several tools for gene network construction and regulator analysis with multiple omics data, helping researchers process high-throughput raw data efficiently and draw biological hypotheses and interpretations. / published_or_final_version / Biochemistry / Doctoral / Doctor of Philosophy
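As a concrete illustration of the LASSO-type regularization mentioned in this abstract, the sketch below regresses one target gene's expression on a panel of TF profiles under an L1 penalty and keeps the TFs with non-zero coefficients as candidate regulators. The data, dimensions and the plain L1 penalty are assumptions for illustration only; the thesis evaluates L0 and L1/2 penalties on integrative omics data, which scikit-learn's Lasso does not implement.

```python
# Hypothetical sketch of LASSO-based gene regulatory network inference
# (not the thesis' own implementation; gene/TF data are simulated).
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
n_samples, n_tfs = 40, 15
tf_expr = rng.normal(size=(n_samples, n_tfs))          # TF expression profiles
# Simulated target gene driven by TF 2 and TF 7 plus noise.
target_expr = tf_expr[:, 2] - 0.8 * tf_expr[:, 7] + 0.1 * rng.normal(size=n_samples)

# The L1 penalty shrinks most coefficients to zero; the surviving TFs
# become candidate regulators (edges) of the target gene.
model = Lasso(alpha=0.05).fit(tf_expr, target_expr)
regulators = [i for i, w in enumerate(model.coef_) if abs(w) > 1e-6]
print("candidate regulator TFs for the target gene:", regulators)
```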
652

New algorithms in factor analysis : applications, model selection and findings in bioinformatics

Wu, Ho-chun, 胡皓竣 January 2013 (has links)
Advancements in microelectronic devices and in computational and storage technologies enable the collection of high-volume, high-speed and high-dimensional data in many applications. Because of the high dimensionality of these measurements, the exact dependence of the observations on the various parameters or variables may not be known. Factor analysis (FA) is a useful multivariate technique that exploits the redundancies among observations and reveals their dependence on latent variables called factors. Major issues of conventional FA are the high arithmetic complexity for real-time online implementation, the assumption of static system parameters, the need for interval forecasting, robustness against outlying observations, and model selection in problems with high dimension but a low number of samples (HDLS). This thesis addresses these issues and proposes new extensions to existing FA algorithms. First, to reduce the arithmetic complexity, we propose new recursive FA (RFA) algorithms that recursively compute only the dominant principal components (PCs) and eigenvalues in the major subspace tracked by efficient subspace tracking algorithms. Specifically, two new approaches, rank-1 modification and deflation, are proposed for updating the PCs and eigenvalues in the classical fault detection problem, with different tradeoffs between accuracy and arithmetic complexity. They significantly reduce the online arithmetic complexity and allow adaptation to time-varying system parameters. Second, we extend the RFA algorithm to the forecasting of time series and propose a new recursive dynamic factor analysis (RDFA) algorithm for electricity price forecasting. While the PCs are recursively tracked by the subspace algorithm, a random walk or a state dynamical model can be incorporated to describe the latest state of the time-varying auto-regressive (AR) model built from the factors. This formulation can be solved by the celebrated Kalman filter (KF), which in turn allows future values to be forecast with estimated confidence intervals. Third, based on the concept of robust M-estimation, we propose new robust covariance and outlier detection criteria to improve the robustness of the proposed RFA and RDFA algorithms against outlying observations. Experimental results show that the proposed methods can effectively suppress the adverse contributions of outliers to the factors and PCs. Finally, to improve the consistency of model selection and facilitate the estimation of p-values in HDLS problems, we propose a new automatic model selection method based on ridge partial least squares and recursive feature elimination. Furthermore, a novel performance criterion is proposed for ranking variables according to how consistently they are chosen under different perturbations of the samples. Using this criterion, the associated p-values can be estimated under the HDLS setting. Experimental results using real cancer gene microarray datasets show that the proposed approach yields improved prognostic performance compared with conventional techniques. Furthermore, to quantify their statistical significance, the p-values of the identified genes are estimated, and functional analysis of the significant genes found in the diffuse large B-cell lymphoma (DLBCL) gene microarray data is performed to validate the findings. While we focus on a few engineering problems, these algorithms are also applicable to other related applications.
/ published_or_final_version / Electrical and Electronic Engineering / Doctoral / Doctor of Philosophy
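The recursive factor analysis idea above can be sketched, under stated assumptions, as tracking the dominant PCs of an exponentially weighted covariance that is updated by a rank-1 term for each new observation. The full eigendecomposition used here is for clarity only; the thesis' RFA/RDFA algorithms avoid it through efficient subspace tracking, and the data and forgetting factor below are invented.

```python
# Minimal sketch of tracking dominant principal components with a forgetting
# factor and rank-1 covariance updates (assumed setup, not the RFA algorithm).
import numpy as np

def track_pcs(stream, n_pcs=2, beta=0.98):
    d = stream.shape[1]
    cov = np.zeros((d, d))
    for x in stream:                                   # one new observation per step
        cov = beta * cov + (1 - beta) * np.outer(x, x) # rank-1 update with forgetting
        vals, vecs = np.linalg.eigh(cov)               # full decomposition, illustrative only
        pcs = vecs[:, -n_pcs:][:, ::-1]                # dominant PCs, largest eigenvalue first
    return pcs, vals[-n_pcs:][::-1]

rng = np.random.default_rng(1)
data = rng.normal(size=(200, 5)) @ np.diag([3.0, 2.0, 1.0, 0.5, 0.2])
pcs, eigvals = track_pcs(data)
print("tracked dominant eigenvalues:", np.round(eigvals, 2))
```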
653

Structure-Preserving Rearrangements: Algorithms for Structural Comparison and Protein Analysis

Bliven, Spencer Edward 13 August 2015 (has links)
Protein structure is fundamental to a deep understanding of how proteins function. Since structure is highly conserved, structural comparison can provide deep information about the evolution and function of protein families. The Protein Data Bank (PDB) continues to grow rapidly, providing copious opportunities for advancing our understanding of proteins through large-scale searches and structural comparisons. In this work I present several novel structural comparison methods for specific applications, as well as apply structure comparison tools systematically to better understand global properties of protein fold space. Circular permutation describes a relationship between two proteins where the N-terminal portion of one protein is related to the C-terminal portion of the other. Proteins that are related by a circular permutation generally share the same structure despite the rearrangement of their primary sequence. This non-sequential relationship makes them difficult for many structure alignment tools to detect. Combinatorial Extension for Circular Permutations (CE-CP) was developed to align proteins that may be related by a circular permutation. It is widely available due to its incorporation into the RCSB PDB website. Symmetry and structural repeats are common in protein structures at many levels. The CE-Symm tool was developed in order to detect internal pseudosymmetry within individual polypeptide chains. Such internal symmetry can arise from duplication events, so aligning the individual symmetry units provides insights about conservation and evolution. In many cases, internal symmetry can be shown to be important for a number of functions, including ligand binding, allostery, folding, stability, and evolution. Structural comparison tools were applied comprehensively across all PDB structures for systematic analysis. Pairwise structural comparisons of all proteins in the PDB have been computed using the Open Science Grid computing infrastructure, and are kept continually up-to-date with the release of new structures. These provide a network-based view of protein fold space. CE-Symm was also applied to systematically survey the PDB for internally symmetric proteins. It is able to detect symmetry in ~20% of all protein families. Such PDB-wide analyses give insights into the complex evolution of protein folds.
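A minimal, sequence-level analogue of the circular-permutation relationship CE-CP is designed to detect: if the N-terminal segment of one protein corresponds to the C-terminal segment of the other, the permuted sequence appears as a substring of the original sequence doubled. The sequences below are invented, and CE-CP itself operates on 3D structures rather than strings; this toy check only illustrates the rearrangement being aligned.

```python
# Toy sequence-level illustration of circular permutation (not CE-CP itself).
def is_circular_permutation(a: str, b: str) -> bool:
    """True if b is a rotation (circular permutation) of a."""
    return len(a) == len(b) and b in (a + a)

original = "MKTAYIAKQR"
permuted = original[4:] + original[:4]   # move the N-terminal segment to the C-terminus
print(is_circular_permutation(original, permuted))   # True
```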
654

Deciphering the Biological Mechanisms Driving the Phenotype of Interest

Quiroz, Alejandro 06 February 2015 (has links)
The two key concepts of Neo-Darwinian evolutionary theory are genotype and phenotype. Genotype is defined as the genetic constitution of an organism, and phenotype refers to the observable characteristics of that organism. Schematically, the relationship between genotype and phenotype can be stated as Genotype + Environment + Random Variation \(\xrightarrow{\text{yields}}\) Phenotype. This schematic representation leads to a fundamental problem: given the interactions of genes and environment, to what extent is it possible to relate gene structure and function to the phenotype (Weatherall, D. J., et al., 2001)? From R. A. Fisher's establishment of the basis of quantitative trait loci to the gene set enrichment analysis of Subramanian et al. (2005), several statistical methods have been devoted to answering this question, some with more success and scientific repercussion than others. In this work we attempt to answer this question by delineating the biological mechanisms driven by the genes that characterize the differences and actions of the phenotypes of interest. Our contribution rests on two pillars: an alternative way to conceive gene expression measurements, and the use of functional gene set annotation systems as guided prior knowledge of the biological mechanisms that drive the phenotype of interest. Based on these two pillars, we propose a Functional Network Inference method and an alternative method to perform expression quantitative trait loci (eQTL) analysis. With the Functional Network Inference method we are able to identify which mechanisms describe most of the observed behavior, thereby establishing their importance. The alternative eQTL method we present is a more direct way to associate variation at the sequence level with the biological mechanisms it affects. With this proposal we attempt to address two important issues of traditional eQTL analysis: statistical power and biological implications.
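For context, a conventional single-marker eQTL test (the baseline the alternative method is contrasted with) can be sketched as a regression of one gene's expression on SNP allele dosage; the genotype and expression values below are simulated, and this gene-level test stands in for the gene-set-level association proposed in the thesis.

```python
# Hedged sketch of a single-marker eQTL test on simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n = 120
genotype = rng.integers(0, 3, size=n)                 # allele dosage (0/1/2) per individual
expression = 0.4 * genotype + rng.normal(size=n)      # simulated expression with a true effect

# Regress expression on genotype and test whether the slope differs from zero.
slope, intercept, r, p_value, stderr = stats.linregress(genotype, expression)
print(f"eQTL effect = {slope:.2f}, p = {p_value:.2e}")
```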
655

Efficient algorithms for optimizing whole genome alignment

Lu, Ning, 陸宁 January 2004 (has links)
published_or_final_version / abstract / toc / Computer Science / Master / Master of Philosophy
656

Pulsed induction, a method to identify genetic regulators of determination events

Pennington, Steven 23 October 2015 (has links)
Determination is the process by which a stem cell commits to differentiation. How a cell goes through determination is not well understood. Determination is important for proper regulation of cell turnover in tissues and for maintaining the adult stem cell population. Deregulation of determination or differentiation can lead to disease, including several forms of cancer. In this study I use microarrays to identify candidate genes involved in determination by pulsed induction of mouse erythroleukemia (MEL) cells with DMSO, examining gene expression changes as the cells go through the early stages of erythropoiesis. The pulsed induction method I have developed to identify candidate genes is to induce cells for a short time (30 min, 2 hours, etc.) and then allow them to grow for the duration of their differentiation time (8 days). For reference, cells were also harvested at the time the inducer was removed from the media. The results show large numbers of differentially expressed genes, including erythropoiesis-specific genes such as GATA1 and the globin genes, as well as many novel candidate genes that have also been implicated in the dynamic early signaling of erythropoiesis. In addition, several genes showed a pendulum effect when allowed to recover, making them interesting candidates for maintaining self-renewal of the adult stem cell population.
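The downstream computational step, calling differentially expressed genes between pulsed-induced and control arrays, might be sketched as below with a t-test and Benjamini-Hochberg correction; the expression values are simulated, and this is not the author's actual analysis pipeline.

```python
# Illustrative differential-expression sketch on simulated microarray data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n_genes = 500
control = rng.normal(loc=5.0, size=(n_genes, 4))   # 4 control arrays
induced = rng.normal(loc=5.0, size=(n_genes, 4))   # 4 pulsed-induced arrays
induced[:25] += 1.5                                # 25 genes truly induced

p_vals = stats.ttest_ind(induced, control, axis=1).pvalue

# Benjamini-Hochberg: largest k with sorted p_(k) <= 0.05 * k / m.
order = np.argsort(p_vals)
ranks = np.arange(1, n_genes + 1)
bh_threshold = 0.05 * ranks / n_genes
n_hits = int(np.max(np.where(p_vals[order] <= bh_threshold, ranks, 0)))
print(f"differentially expressed genes at FDR 0.05: {n_hits}")
```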
657

Development of a web application for optimal identification of peptides and proteins from proteomic analysis data

Αλεξανδρίδου, Αναστασία 08 February 2008 (has links)
The methods and techniques used to search protein and peptide sequences in biological databases are presented. The aim of this work is to create a freely available, web-based Bioinformatics tool for identifying peptides and proteins from mass spectrometric data, regardless of how the original samples were processed.
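As a rough, assumed illustration of the kind of database search such a tool performs, the sketch below matches a measured peptide mass against tryptic peptides generated from a tiny protein database within a mass tolerance; real search engines score full fragment spectra, and all sequences, masses and tolerances here are invented.

```python
# Toy peptide-mass matching sketch (not the application's actual search logic).
# Average residue masses (Da) for a few amino acids; water mass added per peptide.
RESIDUE_MASS = {"G": 57.05, "A": 71.08, "S": 87.08, "K": 128.17, "R": 156.19, "L": 113.16}
WATER = 18.02

def tryptic_peptides(seq):
    """Cleave after K or R (no missed cleavages) -- simplified trypsin rule."""
    peptides, current = [], ""
    for aa in seq:
        current += aa
        if aa in "KR":
            peptides.append(current)
            current = ""
    if current:
        peptides.append(current)
    return peptides

def peptide_mass(pep):
    return sum(RESIDUE_MASS[aa] for aa in pep) + WATER

database = {"protA": "GASKLLRGAK", "protB": "LLGASRKAAG"}   # invented sequences
measured_mass = 400.5                                        # hypothetical precursor mass

for protein, seq in database.items():
    for pep in tryptic_peptides(seq):
        if abs(peptide_mass(pep) - measured_mass) < 0.5:
            print(f"match: {pep} from {protein} ({peptide_mass(pep):.2f} Da)")
```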
658

Concept Matching in Informal Node-Link Knowledge Representations

Marshall, Byron Bennett January 2005 (has links)
Information stored by managed organizations in free-text documents, databases, and engineered knowledge repositories can often be processed as networks of conceptual nodes and relational links (concept graphs). However, these models tend to be informal when applied to new or multi-source tasks. This work contributes to the understanding of techniques for matching knowledge elements in informal node-link knowledge representations, drawn from existing data resources, to support user-guided analysis. Its guiding focus is the creation of tools that compare, retrieve, and merge existing information resources. Three essays explore important algorithmic and heuristic elements needed to leverage concept graphs in real-world applications. Section 2 documents an algorithm that identifies likely matches between student and instructor concept maps, aiming to support semi-automatic matching and scoring for both classroom and unsupervised environments. The knowledge-anchoring similarity flooding algorithm significantly improves on term-based matching by leveraging map structure, and it also has potential as a methodology for combining other informal, human-created knowledge representations. Section 3 describes a decompositional tagging approach to organizing (aggregating) automatically extracted biomedical pathway relations. We propose a five-level aggregation strategy for extracted relations and measure the effectiveness of the BioAggregate tagger in preparing extracted information for analysis and visualization. Section 4 evaluates an importance flooding algorithm designed to assist law enforcement investigators in identifying useful investigational leads. While association networks have a long history as an investigational tool, more systematic processes are needed to guide the development of high-volume, cross-jurisdictional data-sharing initiatives. We test path-based selection heuristics and importance flooding to improve on traditional association-closeness methodologies. Together, these essays demonstrate how structural and semantic information can be processed in parallel to effectively leverage ambiguous network representations of data. They also show that real applications can be addressed by processing available data using an informal concept graph paradigm. This approach and these techniques are potentially useful for workflow systems, business intelligence analysis, and other knowledge management applications where information can be represented as an informal conceptual network and needs to be analyzed and converted into actionable, communicable human knowledge.
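The similarity flooding idea referenced in Section 2 can be sketched as follows: an initial term-based similarity between node pairs from two concept graphs is iteratively reinforced by the similarity of their neighbors. The graphs, the propagation weight and the normalization below are toy assumptions, not the knowledge-anchoring variant developed in the essay.

```python
# Minimal, assumed sketch of similarity flooding between two small concept graphs.
import itertools

g1 = {"cell": ["nucleus", "membrane"], "nucleus": [], "membrane": []}
g2 = {"Cell": ["Nucleus", "Wall"], "Nucleus": [], "Wall": []}

def term_sim(a, b):
    # Crude term-based seed similarity: exact match ignoring case.
    return 1.0 if a.lower() == b.lower() else 0.0

sim = {(a, b): term_sim(a, b) for a, b in itertools.product(g1, g2)}

for _ in range(5):                         # a few flooding iterations
    new_sim = {}
    for (a, b), s in sim.items():
        # Each matched neighbor pair feeds similarity back into (a, b).
        spill = sum(sim[(na, nb)] for na in g1[a] for nb in g2[b])
        new_sim[(a, b)] = s + 0.5 * spill
    # Normalize so scores stay comparable across iterations.
    top = max(new_sim.values()) or 1.0
    sim = {k: v / top for k, v in new_sim.items()}

best = sorted(sim.items(), key=lambda kv: -kv[1])[:3]
print(best)
```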
659

Informatic approaches to evolutionary systems biology

Hudson, Corey M. 11 February 2014 (has links)
The sheer complexity of evolutionary systems biology requires us to develop more sophisticated tools for analysis, as well as more probing and biologically relevant representations of the data. My research has focused on three aspects of evolutionary systems biology. First, I ask whether a gene's position in the human metabolic network affects the degree to which natural selection prunes variation in that gene. Using a novel orthology inference tool that uses both sequence similarity and gene synteny, I inferred orthologous groups of genes for the full genomes of 8 mammals. With these orthologs, I estimated the selective constraint (the ratio of non-synonymous to synonymous nucleotide substitutions) on 1190 (or 80.2%) of the genes in the metabolic network using a maximum likelihood model of codon evolution and compared this value to the betweenness centrality of each enzyme (a measure of that enzyme's relative global position in the network). Second, I have focused on the evolution of metabolic systems in the presence of gene and genome duplication. I show that increases in a particular gene's copy number are correlated with limiting metabolic flux in the reaction associated with that gene. Finally, I have investigated the proliferative cell programs present in 6 different cancers (breast, colorectal, gastrointestinal, lung, oral squamous and prostate cancers). I found an overabundance of genes that share expression between cancer and embryonic tissue, and that these genes form modular units within regulatory, protein-interaction, and metabolic networks. This is despite the fact that these genes, as well as the proteins they encode and the reactions they catalyze, show little overlap among cancers, suggesting parallel, independent reversion to an embryonic pattern of gene expression.
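The network analysis in the first study pairs each enzyme's betweenness centrality with its selective constraint; a hedged sketch of that comparison, on an invented toy network with made-up dN/dS values, is shown below.

```python
# Hedged sketch: correlate betweenness centrality with selective constraint (dN/dS).
# The network and dN/dS values are toy data, not the thesis' mammalian dataset.
import networkx as nx
from scipy import stats

metabolic_net = nx.Graph([("hk", "pgi"), ("pgi", "pfk"), ("pfk", "aldo"),
                          ("aldo", "tpi"), ("pfk", "fbp"), ("pgi", "g6pd")])
dn_ds = {"hk": 0.08, "pgi": 0.05, "pfk": 0.03, "aldo": 0.10, "tpi": 0.12,
         "fbp": 0.15, "g6pd": 0.20}

centrality = nx.betweenness_centrality(metabolic_net)
genes = sorted(dn_ds)
rho, p = stats.spearmanr([centrality[g] for g in genes], [dn_ds[g] for g in genes])
print(f"Spearman rho = {rho:.2f}, p = {p:.2f}")
```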
660

Polygenic analysis of genome-wide SNP data

Simonson, Matthew A. 28 June 2013 (has links)
One of the central motivators behind genetic research is to understand how genetic variation relates to human health and disease. Recently, there has been a large-scale effort to find common genetic variants associated with many forms of disease and disorder using single nucleotide polymorphisms (SNPs). Several genome-wide association studies (GWAS) have successfully identified SNPs associated with phenotypes. However, the effect sizes attributed to individual variants are generally small, explaining only a very small amount of the genetic risk and heritability expected from family and twin studies. Several explanations exist for the inability of GWAS to find the "missing heritability." The results of recent research appear to confirm the prediction made by population genetics theory that most complex phenotypes are highly polygenic, occasionally influenced by a few alleles of relatively large effect and usually by several of small effect. Studies have also confirmed that common variants are only part of what contributes to the total genetic variance for most traits, indicating that rare variants may play a significant role. This research addresses some of the most glaring weaknesses of the traditional GWAS approach through the application of methods of polygenic analysis. We apply several methods, including those that investigate the net effects of large sets of SNPs, more sophisticated approaches informed by biology rather than the purely statistical approach of GWAS, and methods that infer the effects of recessive rare variants. Our results indicate that traditional GWAS is well complemented and improved upon by methods of polygenic analysis. We demonstrate that polygenic approaches can be used to significantly predict individual risk for disease, provide an unbiased estimate of a substantial proportion of the heritability for multiple phenotypes, identify sets of genes grouped into biological pathways that are enriched for associations, and, finally, detect the significant influence of recessive rare variants.
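One of the polygenic approaches mentioned, prediction of individual risk from the net effect of many SNPs, reduces to a polygenic score: a weighted sum of allele counts across SNPs. The sketch below uses simulated genotypes and effect sizes purely for illustration; real analyses take the weights from a training GWAS and validate prediction out of sample.

```python
# Minimal polygenic-score sketch on simulated data (illustration only).
import numpy as np

rng = np.random.default_rng(4)
n_individuals, n_snps = 100, 1000
genotypes = rng.integers(0, 3, size=(n_individuals, n_snps))   # allele counts 0/1/2
effect_sizes = rng.normal(scale=0.01, size=n_snps)             # per-SNP weights (stand-in for GWAS estimates)

# Polygenic score: weighted sum of risk-allele counts across SNPs.
prs = genotypes @ effect_sizes
print("first five polygenic scores:", np.round(prs[:5], 3))
```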
