311

Large Scale Machine Learning in Biology

Raj, Anil January 2011 (has links)
Rapid technological advances during the last two decades have led to a data-driven revolution in biology, opening up a plethora of opportunities to infer informative patterns that could lead to deeper biological understanding. The large volumes of data provided by such technologies, however, are not analyzable using hypothesis-driven significance tests and other cornerstones of orthodox statistics. We present powerful tools in machine learning and statistical inference for extracting biologically informative patterns and clinically predictive models from these data.

Motivated by an existing graph partitioning framework, we first derive relationships between optimizing the regularized min-cut cost function used in spectral clustering and the relevance information as defined in the Information Bottleneck method. For fast-mixing graphs, we show that the regularized min-cut cost functions introduced by Shi and Malik over a decade ago can be well approximated as the rate of loss of predictive information about the location of random walkers on the graph. For graphs drawn from a generative model designed to describe community structure, the optimal information-theoretic partition and the optimal min-cut partition are shown to be the same with high probability.

Next, we formulate the problem of identifying emerging viral pathogens and characterizing their transmission in terms of learning linear models that can predict the host of a virus from its sequence information. Motivated by an existing framework for representing biological sequence information, we learn sparse, tree-structured models, built from decision rules based on subsequences, to predict viral hosts from protein sequence data using multi-class AdaBoost, a powerful discriminative machine learning algorithm. The predictive motifs robustly selected by the learning algorithm show strong host-specificity and occur in highly conserved regions of the viral proteome.

We then extend this learning algorithm to the problem of predicting disease risk in humans using single nucleotide polymorphisms (SNPs) -- single-base-pair variations -- across the entire genome. While genome-wide association studies usually aim to infer individual SNPs that are strongly associated with disease, we use popular supervised learning algorithms to infer sufficiently complex tree-structured models, built from single-SNP decision rules, that are both highly predictive (for clinical goals) and biologically interpretable (for basic science goals). In addition to achieving high prediction accuracies, the models identify 'hotspots' in the genome that contain putative causal variants for the disease and also suggest combinatorial interactions that are relevant to the disease.

Finally, motivated by the insufficiency of quantifying biological interpretability in terms of model sparsity alone, we propose a hierarchical Bayesian model that infers hidden structured relationships between features while simultaneously regularizing the classification model using the inferred group structure. The appropriate hidden structure maximizes the log-probability of the observed data, thus regularizing the classifier while increasing its predictive accuracy. We conclude by describing extensions of this model applicable to various biological problems, specifically those described in this thesis, and enumerate promising directions for future research.
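To make the first contribution concrete, here is a minimal sketch of the standard Shi-Malik normalized-cut relaxation that the abstract connects to random walks (an illustration of the classical method, not the thesis's information-theoretic derivation; the toy graph is an assumption for demonstration):

```python
import numpy as np

def spectral_bipartition(A):
    """Two-way normalized cut in the style of Shi and Malik.

    The relaxed min-cut solution is the second eigenvector of the
    random-walk Laplacian L_rw = I - D^{-1} A; thresholding its entries
    gives the partition. The random-walk view of this operator is what
    links the cut cost to how quickly a walker's location becomes
    unpredictable.
    """
    d = A.sum(axis=1)
    D_isqrt = np.diag(1.0 / np.sqrt(d))
    # Symmetric normalized Laplacian: similar to L_rw, stabler to decompose.
    L_sym = np.eye(len(A)) - D_isqrt @ A @ D_isqrt
    vals, vecs = np.linalg.eigh(L_sym)      # eigenvalues in ascending order
    fiedler = D_isqrt @ vecs[:, 1]          # corresponding eigenvector of L_rw
    return fiedler >= np.median(fiedler)

# Toy graph: two 4-node cliques joined by a single bridge edge.
A = np.ones((8, 8)) - np.eye(8)
A[:4, 4:] = A[4:, :4] = 0
A[3, 4] = A[4, 3] = 1
print(spectral_bipartition(A))   # True/False blocks recover the two cliques
```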
312

Genome-wide Predictive Simulation on the Effect of Perturbation and the Cause of Phenotypic Variations with a Network Biology Approach

Jang, In Sock January 2012 (has links)
Thanks to modern high-throughput technologies such as microarray-based gene expression profiling, large amounts of molecular profile data have been generated in several disease-related contexts. Despite the fact that these data likely contain systems-level information about disease regulation, revealing the underlying dynamics between genes and the mechanisms of gene regulation in a genome-wide way remains a major challenge. Understanding these mechanisms in genome-wide fashion, and the resulting dynamical behavior, is a key goal of the nascent field of systems biology. One approach to dissecting the logic of the cell is to use reverse-engineering algorithms that infer regulatory interactions from molecular profile data. In this context, information-theoretic approaches have been very successful: for instance, the ARACNe algorithm has been able to infer transcriptional interactions between transcription factors and their target genes; similarly, the MINDy algorithm has identified post-translational modulators of transcription factor activity by multivariate analysis of large gene expression profile datasets. Many methods have been proposed to improve ARACNe, both from a computational-efficiency perspective and in terms of increasing the accuracy of the predicted interactions. Yet the main core of ARACNe, i.e., the data processing inequality (DPI), has remained virtually unaffected, even though modern information theory has extended the DPI theorem to higher-order interactions. First, we introduce an improvement of ARACNe, hARACNe, which recursively applies a higher-order DPI analysis. We show that the new algorithm successfully detects false-positive feed-forward loops involving more than three genes. Second, we extend the MINDy algorithm using co-information as a novel metric, thus replacing the conditional mutual information and significantly improving the algorithm's predictions.

Broadly, the two ultimate goals of systems perturbation studies are to reveal how human diseases are connected with genes and to find the regulatory mechanisms that determine disease cell behavior. These goals remain daunting, however: even the most talented researchers still have to rely on laborious genetic screens, and only very simplified hypotheses about the effects of a given perturbation have been experimentally validated, typically analyzed against very limited regulatory sub-networks such as individual pathways. To overcome these limitations, this thesis explores the use of gene regulatory networks. Specifically, we propose a new algorithm that can accurately predict cell state in genome-wide fashion following perturbation of individual genes, such as in silencing or ectopic-expression experiments. Experimentally validated methods to predict genome-wide changes in a cellular system following a genetic perturbation are still unavailable, and even when phenotypic variations are experimentally profiled and gene signatures are selected by statistical testing, finding the exact regulator that systematically causes significant variations in a gene signature remains quite challenging. In this research, I introduce and experimentally validate a probabilistic Bayesian method to simulate the propagation of genetic perturbations on integrated gene regulatory networks inferred by the hARACNe and coMINDy algorithms from human B cell data.

With the same predictive framework, we also computationally predict the master driver (regulator) that is most likely to have produced the observed variations in gene expression levels; these studies serve as a systematized pre-screening process before genetic manipulation. I predict in silico the effects of silencing several genes, as well as the causes of phenotypic variations. Performance analysis, tested by Gene Set Enrichment Analysis (GSEA), shows that the new methods are highly predictive, thus providing an initial step toward building predictive probabilistic regulatory models, which may be applicable as pre-screening steps in perturbation studies.
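For readers unfamiliar with the DPI step at ARACNe's core, here is a minimal sketch of the classic first-order (three-gene) pruning rule; hARACNe, per the abstract, recursively extends this to higher-order interactions. The mutual-information matrix is a hypothetical example:

```python
import numpy as np
from itertools import combinations

def dpi_prune(mi, tol=0.0):
    """Classic ARACNe data-processing-inequality step: in every fully
    connected gene triplet, the weakest mutual-information edge is
    assumed to be an indirect interaction and removed. (Sketch only;
    the published algorithm scores all triangles against the original
    MI network with a bootstrap/tolerance procedure.)
    """
    keep = mi > 0
    n = mi.shape[0]
    for i, j, k in combinations(range(n), 3):
        if keep[i, j] and keep[j, k] and keep[i, k]:
            edges = {(i, j): mi[i, j], (j, k): mi[j, k], (i, k): mi[i, k]}
            (a, b), w = min(edges.items(), key=lambda e: e[1])
            # drop the weakest edge if it is dominated by the other two
            if w < (1 - tol) * min(v for e, v in edges.items() if e != (a, b)):
                keep[a, b] = keep[b, a] = False
    return keep

# Hypothetical symmetric MI matrix for a 3-gene chain X -> Y -> Z:
# the indirect X-Z edge should be pruned.
mi = np.array([[0.0, 0.8, 0.3],
               [0.8, 0.0, 0.7],
               [0.3, 0.7, 0.0]])
print(dpi_prune(mi))
```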
313

Topics in Genomic Signal Processing

Jajamovich, Guido Hugo January 2012 (has links)
Genomic information is digital in its nature and admits mathematical modeling in order to gain biological knowledge. This dissertation focuses on the development and application of detection and estimation theories for solving problems in genomics: describing biological problems in mathematical terms and proposing solutions in that domain. More specifically, a novel framework for hypothesis testing is presented for settings where one must decide among multiple hypotheses and where each hypothesis involves unknown parameters. Within this framework, a test is developed that performs detection and estimation jointly in an optimal sense. The proposed test is then applied to the problem of detecting and estimating periodicities in DNA sequences. Moreover, the problem of motif discovery in DNA sequences is presented, where a set of sequences is observed and one must determine which sequences contain instances (if any) of an unknown motif and estimate their positions. A statistical description of the problem is used, and a sequential Monte Carlo method is applied for the inference. Finally, the phasing of haplotypes for diploid organisms is introduced, for which a novel mathematical model is proposed. The haplotypes that are used to reconstruct the observed genotypes of a group of unrelated individuals are detected, and the haplotype pair for each individual in the group is estimated. The model translates a biological principle, the maximum parsimony principle, into a sparseness condition.
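As a flavor of the periodicity-detection application, here is a sketch of the classic period-3 DFT statistic from genomic signal processing (protein-coding regions show excess spectral power at period 3). This is only the underlying statistic; the thesis's contribution is a joint detection/estimation test built around quantities of this kind:

```python
import numpy as np

def period3_power(seq):
    """Normalized spectral power of a DNA sequence at period 3: each base
    is mapped to a binary indicator sequence, and the squared DFT
    magnitudes at frequency N/3 are summed over the four bases.
    """
    N = len(seq)
    total = 0.0
    for base in "ACGT":
        x = np.array([1.0 if c == base else 0.0 for c in seq.upper()])
        total += abs(np.fft.fft(x)[N // 3]) ** 2
    return total / N**2

rng = np.random.default_rng(0)
coding_like = "ATG" * 100                                  # strong period 3
random_like = "".join(rng.choice(list("ACGT"), size=300))  # no periodicity
print(period3_power(coding_like), period3_power(random_like))
```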
314

Integrating Functional Genomics with Systems Biology to Discover Drivers and Therapeutic Targets of Human Malignancies

Yu, Jiyang January 2012 (has links)
Genome-wide RNAi screening has emerged as a powerful tool for loss-of-function studies that may lead to therapeutic target discovery for human malignancies in the era of personalized medicine. However, due to high false-positive and false-negative rates arising from the noise of high-throughput measurements and from off-target effects, powerful computational tools and additional knowledge are much needed to analyze and complement such screens. The availability of high-throughput genomic data, including gene expression profiles and copy number variations from large cohorts of primary patients and cell lines, allows us to identify underlying drivers causally associated with tumorigenesis or drug resistance. In my dissertation, I have developed a framework that integrates functional RNAi screens with systems biology of cancer genomics to tailor potential therapeutics for reversal of drug resistance or treatment of aggressive tumors. I developed a series of algorithms and tools to deconvolute, quality-control, and post-analyze high-throughput shRNA screening data generated by next-generation sequencing (shSeq), in particular a novel Bayesian hierarchical modeling approach to integrate multiple shRNAs targeting the same gene, which outperforms existing methods. In parallel, I developed a systems biology algorithm, NetBID2, to infer disease drivers from high-throughput genomic data by reverse-engineering networks and Bayesian inference; it is able to detect hidden drivers that traditional methods fail to find. Integrating NetBID2 with functional RNAi screens, I have identified known and novel driver-type therapeutic targets in various disease contexts. For example, I discovered that AKT1 is a driver of glucocorticoid (GC) resistance, a major problem in the treatment of T-cell acute lymphoblastic leukemia (T-ALL); inhibition of AKT1 was validated to reverse GC resistance. Additionally, upon silencing predicted master regulators of GC resistance with shRNA screens, 13 out of 16 were validated to significantly overcome resistance. In breast cancer, I discovered that STAT3 is required for transformation of HER2+ breast cancer, an aggressive breast tumor subtype. Suppression of STAT3 was confirmed in vitro and in vivo to be an effective therapy for HER2+ breast cancer; moreover, my analysis revealed that STAT3 silencing works only in estrogen receptor-negative (ER-) cases. Using my framework, I have also identified potential therapeutic targets for ABC- and GCB-type diffuse large B-cell lymphoma (DLBCL) and for breast cancer subtypes that are currently being validated.
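The intuition behind integrating multiple shRNAs per gene can be illustrated with a minimal normal-normal shrinkage model, a much-simplified stand-in for the hierarchical Bayesian approach the abstract describes (all numbers are hypothetical):

```python
import numpy as np

def gene_effect_posterior(shrna_scores, shrna_se, prior_mean=0.0, prior_var=1.0):
    """Conjugate normal-normal estimate of a gene-level effect from several
    shRNA-level depletion scores: each shRNA is treated as a noisy
    observation of the same latent gene effect, so precisions add and
    unreliable hairpins are automatically down-weighted.
    """
    scores = np.asarray(shrna_scores, dtype=float)
    var = np.asarray(shrna_se, dtype=float) ** 2
    prec = 1.0 / prior_var + np.sum(1.0 / var)
    mean = (prior_mean / prior_var + np.sum(scores / var)) / prec
    return mean, 1.0 / prec            # posterior mean and variance

# Hypothetical gene with 4 shRNAs; the discordant +0.5 score has a large
# standard error (likely off-target) and contributes little.
mean, var = gene_effect_posterior([-1.2, -0.9, -1.4, 0.5],
                                  [0.3, 0.4, 0.35, 1.5])
print(f"posterior effect {mean:.2f} +/- {np.sqrt(var):.2f}")
```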
315

A Novel Platform to Perform Cancer-Relevant Synthetic Dosage Lethality Screens in Saccharomyces cerevisiae

Dittmar, John January 2013 (has links)
The most significant challenge in developing cancer therapies is to selectively kill cancer cells while leaving normal cells unharmed. One approach to specifically target tumors is to exploit gene expression changes specific to cancer cells. For example, synthetic dosage lethal (SDL) interactions occur when increased gene dosage is lethal only in combination with a specific gene disruption. Since cancer cells often over-express a host of gene products, discovering SDL interactions could reveal new therapeutic targets for cancer treatment: cancer cells over-expressing a particular gene would be killed when another gene's expression was silenced. Critically, normal cells would be unaffected because the gene whose expression is knocked down is not essential in normal cells.

In this study, I describe the use of Saccharomyces cerevisiae as a model system to speed the discovery of cancer-relevant SDL interactions. This approach is based on the observation that many genes that are over-expressed in human cancers are involved in essential functions, such as regulation of the cell cycle and DNA replication. Consequently, many of these genes are highly conserved from humans to Saccharomyces cerevisiae, and therefore any SDL interaction discovered through these screens that involves a gene with a mammalian ortholog is a potential therapeutic cancer target. Using Saccharomyces cerevisiae as a model has several advantages over performing similar screens in mammalian systems, including reduced costs and faster results. For example, a novel technique called Selective Ploidy Ablation (SPA) allows the completion of a full genome-wide SDL screen in only 6 days. Using SPA to perform SDL screens produces tens of thousands of yeast colonies that must be analyzed appropriately to identify affected mutants. To aid in the data processing, I developed ScreenMill, a suite of software tools for the quantification and review of high-throughput screen data. As a companion to ScreenMill, I also developed a tool termed CLIK (Cutoff Linked to Interaction Knowledge), which uses the wealth of known yeast genetic and physical interactions, instead of statistical models, to inform screen cutoffs and evaluate screen results. Together these tools aided in the completion and evaluation of 23 cancer-relevant yeast SDL screens, from which I prioritized a list of validated SDL interactions that have human orthologs and thus represent potential targets for cancer treatment.

To understand the mechanism underlying one SDL interaction discovered in these screens, I analyzed the results of over-expressing NPL3 in more detail. The Npl3 protein plays a role in mRNA processing and translation, and its human ortholog, SFRS1 (ASF), is involved in pre-mRNA splicing, mRNA nuclear export, and translation, and is also up-regulated in PTEN-deficient breast cancer. In yeast, over 50 novel SDL interactions with NPL3 were discovered, including several with deletions of lysine deacetylases (KDACs). These are particularly interesting because KDACs are evolutionarily conserved and are currently being explored as potential anti-cancer drug targets. Furthermore, using several mutant alleles of NPL3, I show that most of the SDL interactions defined are due to its role in the nucleus and are linked to the acetylation state of the Npl3 protein. Thus, by performing SDL screens in yeast, I demonstrate their utility in defining potential cancer-relevant drug targets, as well as in uncovering novel gene functions.
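To give a feel for what scoring tens of thousands of colonies involves, here is a hypothetical, much-simplified scoring sketch in the spirit of colony quantification for an SDL screen. The function, normalization, and numbers are illustrative assumptions, not the published ScreenMill pipeline:

```python
import numpy as np

def sdl_scores(control_sizes, query_sizes, pseudocount=0.05):
    """Toy SDL scoring: plate-normalized colony sizes are compared between
    the over-expression (query) and control conditions; growth defects
    appear as negative log2 ratios. Real pipelines add spatial correction,
    replicate handling, and manual review.
    """
    c = np.asarray(control_sizes, float)
    q = np.asarray(query_sizes, float)
    c = c / np.median(c)               # per-plate normalization
    q = q / np.median(q)
    return np.log2((q + pseudocount) / (c + pseudocount))

# Hypothetical colony areas for 5 deletion mutants; mutant 4 is sick only
# when the query gene is over-expressed -- a candidate SDL interaction.
control = [420, 390, 405, 410, 398]
query   = [430, 400, 415,  60, 405]
print(sdl_scores(control, query).round(2))
```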
316

Analyzing Genomic Studies and a Screen for Genes that Suppress Information Loss During DNA Damage Repair

Pierce, Steven January 2013 (has links)
This thesis is concerned with the means by which cells preserve genetic information and, in particular, with the competition between different DNA damage responses. DNA is continuously damaged, and imperfect repair can have extremely detrimental effects. Double-strand breaks (DSBs) are the most severe form of damage and can be repaired in several different ways or countered by other cellular responses. DNA context is important: cell cycle stage, chromosomal structure, and sequence all can make DSBs more likely or more problematic to repair. Saccharomyces cerevisiae is very resilient to DSBs and primarily uses a process called homologous recombination (HR) to repair DNA damage. To further our understanding of how S. cerevisiae efficiently uses homologous recombination, and thereby minimizes genetic degradation, I performed a screen for genes affecting this process.

In devising this study, I set out to quickly quantify the contribution of every non-essential yeast gene to suppressing genetic rearrangements and deletions at a single locus. Before I began, I did not fully appreciate how variable and contingent this type of recombination phenotype could be. Accounting for the complex and changing recombination baseline across many tests became a significant effort unto itself. The requirements of the experimental protocols precluded the use of traditional recombination rate calculation methods. Searching for the means to compare the utility of normalizations and to validate my results, I sought general approaches for analyzing genome-wide screen data and coordinating interpretation with existing knowledge. It was advantageous during this study to develop novel analysis tools. The second chapter describes one of the tools we developed, a technique called CLIK (Cutoff Linked to Interaction Knowledge). CLIK uses preexisting biological information to evaluate screen performance and to empirically define a significance threshold. This technique was used to analyze the screen results described in chapter three.

The screen in chapter three represents the primary work of this dissertation. Its purpose was to identify genes and biological processes important for the suppression of recombination between DNA tandem repeats in yeast. By searching for gene deletion strains that show an increase in non-conservative single-strand annealing (SSA), I found that many genetic backgrounds could induce altered recombination frequencies, with genes involved in DNA repair, mitochondrial structure, ribosome function, and chromatin remodeling being most important for minimizing the loss of genetic information by HR. In addition, I found that ARP8 and IES5, subunits of the INO80 chromatin-remodeling complex, are significant in suppressing SSA.
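For context on the "traditional recombination rate calculation methods" the abstract says were precluded, here is a sketch of the classic Lea-Coulson method-of-the-median fluctuation-analysis estimator that such rate calculations are typically built on (the culture counts are hypothetical):

```python
import numpy as np
from scipy.optimize import brentq

def lea_coulson_rate(event_counts, cells_per_culture):
    """Classic Lea-Coulson method of the median: solve
    r_median/m - ln(m) = 1.24 for m, the expected number of events per
    culture, then divide by the final cell count to get a per-cell,
    per-division rate. Shown for context only; the thesis's protocol
    required different normalizations.
    """
    r_med = float(np.median(event_counts))
    f = lambda m: r_med / m - np.log(m) - 1.24
    m = brentq(f, 1e-8, 1e6)           # bracket guarantees a sign change
    return m / cells_per_culture

# Hypothetical recombinant colony counts from 8 parallel cultures.
counts = [3, 7, 2, 11, 5, 4, 9, 6]
print(f"rate ~ {lea_coulson_rate(counts, 2e7):.2e} events/cell/division")
```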
317

Computational Contributions Towards Scalable and Efficient Genome-wide Association Methodology

Prabhu, Snehit January 2013 (has links)
Genome-wide association studies are experiments designed to find the genetic bases of physical traits: for example, finding markers correlated with disease status by comparing the DNA of healthy individuals to the DNA of affected individuals. Over the past two decades, an exponential increase in the resolution of DNA-testing technology, coupled with a substantial drop in its cost, has allowed us to amass huge and potentially invaluable datasets for such comparative studies. For many common diseases, datasets as large as a hundred thousand individuals exist, each individual tested at millions of markers (called SNPs) across the genome. Despite this treasure trove, so far only a small fraction of the genetic markers underlying most common diseases have been identified. Simply stated, our ability to predict phenotype (disease status) from a person's genetic constitution is still very limited today, even for traits that we know to be heritable (e.g. height, diabetes, cardiac health). As a result, genetics today often lags far behind conventional indicators like family history of disease in terms of predictive power. To borrow a popular metaphor from astronomy, this veritable "dark matter" of perceivable but unlocatable genetic signal has come to be known as missing heritability. This thesis presents my research contributions to two hotly pursued scientific hypotheses that aim to close this gap: (1) gene-gene interactions and (2) ultra-rare genetic variants, neither of which is yet widely tested. First, I discuss the challenges that have made interaction testing difficult and present a novel approximate statistic to measure interaction. This statistic can be exploited in a Monte Carlo-like randomization scheme, making an exhaustive search through trillions of potential interactions tractable on ordinary desktop computers. A software implementation of our algorithm found a reproducible interaction between SNPs in two calcium channel genes in bipolar disorder. Next, I discuss the functional enrichment pipeline we subsequently developed to identify sets of interacting genes underlying this disease. Lastly, I discuss the application of coding theory to cost-efficient measurement of ultra-rare genetic variation (sometimes as rare as a single individual carrying the mutation in the entire population).
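A toy version of the general scheme can make this concrete: a cheap interaction statistic (here, a case/control correlation contrast between two SNPs) evaluated against a Monte Carlo permutation null. This is an illustrative stand-in, not the thesis's exact statistic, and the genotype data are simulated:

```python
import numpy as np

rng = np.random.default_rng(0)

def ld_contrast(snp_a, snp_b, case):
    """Toy epistasis statistic: absolute difference in SNP-SNP correlation
    between cases and controls. A genuine interaction effect inflates the
    correlation among cases relative to controls.
    """
    r_case = np.corrcoef(snp_a[case], snp_b[case])[0, 1]
    r_ctrl = np.corrcoef(snp_a[~case], snp_b[~case])[0, 1]
    return abs(r_case - r_ctrl)

# Simulated minor-allele counts (0/1/2) with an interaction: carrying
# minor alleles at BOTH loci raises disease probability.
n = 4000
a = rng.integers(0, 3, n)
b = rng.integers(0, 3, n)
p = 1 / (1 + np.exp(-(-2.0 + 0.8 * (a > 0) * (b > 0))))
case = rng.random(n) < p

obs = ld_contrast(a, b, case)
null = [ld_contrast(a, b, rng.permutation(case)) for _ in range(999)]
pval = (1 + sum(s >= obs for s in null)) / 1000
print(f"observed contrast {obs:.3f}, permutation p ~ {pval:.3f}")
```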
318

Probabilistic Reconstruction and Comparative Systems Biology of Microbial Metabolism

Plata Caviedes, German January 2013 (has links)
With the number of sequenced microbial species soon to be in the tens of thousands, we are in a unique position to investigate microbial function, ecology, and evolution on a large scale. In this dissertation I first describe the use of hundreds of in silico models of bacterial metabolic networks to study the long-term evolution of growth and gene-essentiality phenotypes. The results show that, over billions of years of evolution, the conservation of bacterial phenotypic properties drops by a similar fraction per unit time, following an exponential decay. The analysis provides a framework to generate and test hypotheses related to the phenotypic evolution of different microbial groups, and for comparative analyses based on the phenotypic properties of species. Mapping genome sequences to phenotypic predictions (such as those used in the analysis just described) critically relies on accurate functional annotations. In this context, I next describe GLOBUS, a probabilistic method for genome-wide biochemical annotation. GLOBUS uses Gibbs sampling to calculate probabilities for each possible assignment of genes to metabolic functions, based on sequence information and both local and global genomic context data. Several important functional predictions made by GLOBUS were experimentally validated in Bacillus subtilis, and hundreds more were obtained across other species. Complementary to the automated annotation method, I also describe the manual reconstruction and constraint-based analysis of the metabolic network of the malaria parasite Plasmodium falciparum. After careful reconciliation of the model with available biochemical and phenotypic data, the high-quality reconstruction allowed the prediction and in vivo validation of a novel potential antimalarial target. The model was also used to contextualize different types of genome-scale data, such as gene expression and metabolomics measurements. Finally, I present two projects related to population-genetic aspects of sequence and genome evolution. The first addresses the question of why highly expressed proteins evolve slowly, showing that, at least for Escherichia coli, this is more likely to be a consequence of selection for translational efficiency than of selection to avoid misfolded-protein toxicity. The second investigates genetic robustness mediated by gene duplicates in the context of large natural microbial populations. The analysis shows that, under these conditions, the ability of duplicated yeast genes to effectively compensate for the loss of their paralogs is not a monotonic function of their sequence divergence.
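The constraint-based analysis mentioned for the P. falciparum reconstruction is typically flux balance analysis (FBA): maximize a biomass objective subject to steady-state mass balance and flux bounds. Here is a minimal sketch on a hypothetical three-metabolite toy network, far smaller than any real reconstruction:

```python
import numpy as np
from scipy.optimize import linprog

# Toy network:
#   R1: -> A            (uptake, bounded at 10)
#   R2: A -> B
#   R3: A -> C
#   R4: B + C -> biomass
# Steady state requires S v = 0 for internal metabolites A, B, C.
S = np.array([
    [ 1, -1, -1,  0],   # metabolite A
    [ 0,  1,  0, -1],   # metabolite B
    [ 0,  0,  1, -1],   # metabolite C
])
bounds = [(0, 10), (0, None), (0, None), (0, None)]
c = np.array([0, 0, 0, -1.0])    # maximize v4 (biomass) => minimize -v4

res = linprog(c, A_eq=S, b_eq=np.zeros(3), bounds=bounds, method="highs")
# R4 needs one unit each of B and C, so v1 = 2*v4 and the uptake bound
# of 10 caps biomass flux at 5.
print("optimal biomass flux:", res.x[3])
```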
319

Network and Algebraic Topology of Influenza Evolution

Chan, Joseph January 2013 (has links)
Evolution is a force that has molded human existence since its divergence from chimpanzees about 5.4 million years ago. An influenza virus, which replicates every six hours, would undergo an equivalent number of generations in only a hundred years. The fast replication times of influenza, coupled with its high mutation rate, make the virus a perfect model for studying real-time evolution at a mega-darwin scale, more than a million times faster than human evolution. While recent developments in high-throughput sequencing provide an optimal opportunity to dissect the virus's genetic evolution, a concurrent growth in computational tools is necessary to analyze the large influx of complex genomic data. In my thesis, I present novel computational methods to examine different aspects of influenza evolution. I first focus on seasonal influenza, particularly the problems that hamper public health initiatives to combat the virus, and introduce two new approaches: (1) the q2-coefficient, a method of quantifying pathogen surveillance, and (2) FluGraph, a technique that employs network topology to track the spread of seasonal influenza around the world. The second chapter of my thesis examines how mutations and reassortment combine to alter the course of influenza evolution towards pandemic formation. I highlight inherent deficiencies in the current phylogenetic paradigm for analyzing evolution and offer a novel methodology, based on algebraic topology, that comprehensively reconstructs both vertical and horizontal evolutionary events. I apply this method to viruses, with emphasis on influenza, but foresee broader application to cancer cells, bacteria, eukaryotes, and other taxa.
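The key idea of the algebraic-topology approach is that horizontal events (such as reassortment) create loops in sequence space that no tree can represent; these loops appear as one-dimensional persistent homology classes. A minimal sketch, assuming the ripser.py package is available and using four toy aligned sequences where one looks like a reassortant of two others:

```python
import numpy as np
from ripser import ripser   # assumes the ripser.py package is installed

def hamming_matrix(seqs):
    """Pairwise Hamming distances between equal-length aligned sequences."""
    arr = np.array([list(s) for s in seqs])
    n = len(seqs)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = np.sum(arr[i] != arr[j])
    return D

# Toy alignment: D carries the first half of B and the second half of C,
# a reassortant-like pattern that forms a 4-cycle (A-B-D-C-A) rather
# than a tree, producing a persistent H1 interval.
seqs = ["AAAAAAAA",   # A: ancestor
        "TTTTAAAA",   # B: mutations in first half
        "AAAATTTT",   # C: mutations in second half
        "TTTTTTTT"]   # D: both halves combined
dgms = ripser(hamming_matrix(seqs), maxdim=1, distance_matrix=True)["dgms"]
print("H1 intervals (loops => candidate horizontal events):", dgms[1])
```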
320

Large-scale Functional Connectivity in the Human Brain Reveals Fundamental Mechanisms of Cognitive, Sensory and Emotion Processing in Health and Psychiatric Disorders

Pantazatos, Spiro January 2014 (has links)
Functional connectivity networks that integrate remote areas of the brain as working functional units are thought to underlie fundamental mechanisms of perception and cognition, and their study has emerged as an active area of investigation. However, traditional approaches to measuring functional connectivity are limited in that they rely on a priori specification of one or a few brain regions. The development of data-driven, exploratory approaches that assess functional connectivity on a large scale is therefore required to further understand the functional network organization of these processes in both health and disease. In this thesis project, I investigate the roles of functional connectivity in visual search (Chapter 2; Pantazatos, Yanagihara et al., 2012) and bistable perception (Chapter 3; Karten et al., 2013) using traditional functional connectivity approaches, and develop and apply new approaches to characterize the large-scale networks underlying the processing of supraliminal (Chapter 4; Pantazatos et al., 2012a) and subliminal (Chapter 5; Pantazatos, Talati et al., 2012b) emotional threat signals, speech and song processing in autism (Chapter 6; Lai et al., 2012), and face processing in social anxiety disorder (Chapter 7; Pantazatos et al., 2013). Finally, I complement the latter study with an investigation of structural morphological abnormalities in social anxiety disorder (Chapter 8; Talati et al., 2013). Each of these chapters has been or is about to be published in a peer-reviewed journal, and this thesis provides an overview of the entire body of investigation, based on advances in understanding the role of large-scale neural processes as fundamental organizational units that underlie behavior. In Chapter 2, Independent Components Analysis (ICA), Psychophysiological Interactions (PPI), and Dynamic Causal Modeling (DCM) analyses were used to investigate the hypothesis that expectation- and attention-related interactions between ventral and medial prefrontal cortex and association visual cortex underlie visual search for an object. The results extend previous models of visual search to include specific frontal-occipital neuronal interactions during a natural and complex search task. In Chapter 3, PPI analyses revealed percept-dependent changes in connectivity between visual cortex, frontoparietal attention, and default mode networks during bistable image perception. These findings advance neural models of bistable perception by implicating the default mode and frontoparietal networks in image segmentation. In Chapters 4 and 5, an exploratory approach based on multivariate pattern analysis of large-scale, condition-dependent functional connectivity was developed and applied in order to further understand the neural mechanisms of threat-related emotion processing. This approach extracted sufficient information to "brain-read" both unattended supraliminal (Chapter 4) and subliminal (Chapter 5) fear perception in healthy subjects. Informative features for supraliminal fear perception included functional connections between thalamus and superior temporal gyrus, angular gyrus and hippocampus, and fusiform and amygdala, while informative features for subliminal fear perception included middle temporal gyrus, cerebellum, and angular gyrus. In psychiatric disorders, large-scale functional connectivity is typically assessed during the resting state (i.e., no task or stimulus).
However, disorder-dependent alterations in functional network architecture may be more or less prominent during a stimulus or task that is behaviorally relevant to the disorder, as exemplified by enhanced long-range, frontal-posterior connectivity during song (vs. speech) perception in autism (Chapter 6). In the case of social anxiety disorder (SAD), pattern analysis of large-scale functional connectivity during neutral face perception was sensitive enough to discriminate individual subjects with SAD from both healthy controls and subjects with panic disorder (Chapter 7). The most informative feature was functional connectivity between the left hippocampus and left temporal pole, which was reduced in medication-free SAD subjects and which increased following 8 weeks of SSRI treatment, with greater increases correlating with greater decreases in symptom severity. This finding parallels observed neuroanatomical abnormalities in SAD, which include reduced grey matter volume in the temporal pole, in addition to increased grey matter volume in cerebellum and fusiform (Chapter 8). Together, these findings suggest promise for emerging functional connectivity-based and structural neurobiomarkers of SAD diagnosis and treatment response.
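The pattern-analysis approach used in Chapters 4, 5, and 7 can be sketched in a few lines: vectorize each subject's region-by-region correlation matrix and feed it to a cross-validated linear classifier. The data below are simulated with a hypothetical group difference in one connection; this is a minimal stand-in for the published analyses:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def connectivity_features(ts):
    """Upper triangle of the region-by-region correlation matrix: the
    large-scale functional-connectivity pattern used as the feature
    vector for 'brain-reading' classifiers."""
    C = np.corrcoef(ts)
    return C[np.triu_indices_from(C, k=1)]

# Simulated data: 40 subjects x 20 regions x 100 timepoints; the
# 'patient' group has weakened coupling between regions 0 and 1.
X, y = [], []
for s in range(40):
    ts = rng.standard_normal((20, 100))
    shared = rng.standard_normal(100)
    w = 0.2 if s < 20 else 1.0      # group-dependent coupling strength
    ts[0] += shared
    ts[1] += w * shared
    X.append(connectivity_features(ts))
    y.append(int(s < 20))

clf = LogisticRegression(max_iter=1000)
acc = cross_val_score(clf, np.array(X), np.array(y), cv=5)
print("cross-validated accuracy:", round(acc.mean(), 2))
```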
