591 |
Analyzing Cell Painting images using different CNNs and Conformal Prediction variations: Optimization of a Deep Learning model to predict the MoA of different drugs
Hillver, Anna, January 2022
Microscopy-based imaging techniques, such as the Cell Painting assay, could be used to generate images that visualize the Mechanism of Action (MoA) of a drug, which could be of great use in drug development. Extracting information from these images and predicting the MoA of a new compound requires powerful image analysis tools. The purpose of this project is to further develop a Deep Learning model that predicts the MoA of different drugs from Cell Painting images using Convolutional Neural Networks (CNNs) and Conformal Prediction. The specific task was to compare the accuracy of different CNN architectures and the efficiency of different nonconformity functions. The CNN architectures ResNet50, ResNet101 and DenseNet121 were compared, as were the nonconformity functions Inverse Probability, Margin and a combination of the two. No significant difference in accuracy between the CNNs and no difference in efficiency between the nonconformity functions was measured. The results showed that the model could predict the MoA of a compound with high accuracy when all compounds were used in training, validation and testing of the model, which validates the implementations. However, it is desirable for the model to be able to predict the MoA of a new compound when it has been trained on other compounds with the same MoA. This could not be confirmed in this project, and the model needs to be further investigated and tested with another dataset before it can be used for that purpose.
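The two named nonconformity functions can be illustrated with a minimal sketch. It assumes the CNN outputs a softmax probability vector per image and uses the textbook definitions of the Inverse Probability and Margin scores; the function names, the calibration scores and the significance level are hypothetical, and the thesis's combined score is not shown because the abstract does not define it.

```python
def inverse_probability(probs, label):
    # Nonconformity is high when the model gives the candidate label little probability.
    return 1.0 - probs[label]


def margin(probs, label):
    # Nonconformity is high when another class is clearly preferred over the candidate label.
    best_other = max(p for i, p in enumerate(probs) if i != label)
    return best_other - probs[label]


def p_value(calibration_scores, test_score):
    # Share of calibration scores at least as nonconforming as the test score,
    # with the usual +1 correction that counts the test example itself.
    at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(probs, calibration_scores, score_fn, significance):
    # Keep every label whose p-value exceeds the chosen significance level.
    return [
        label
        for label in range(len(probs))
        if p_value(calibration_scores, score_fn(probs, label)) > significance
    ]


if __name__ == "__main__":
    softmax_output = [0.7, 0.2, 0.1]            # hypothetical CNN output for one image
    calibration = [0.1, 0.2, 0.3, 0.4]          # hypothetical calibration scores
    print(prediction_set(softmax_output, calibration, inverse_probability, 0.2))
```

In this setting, efficiency is commonly measured by how small the prediction sets are at a given significance level, so two nonconformity functions can be compared by averaging the set sizes they produce on the same test images.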
|
592 |
Experimental Illumination of Comprehensive Fitness Landscapes: A Dissertation
Hietpas, Ryan T., 24 June 2013
Evolution is the single cohesive logical framework in which all biological processes may exist simultaneously. Incremental changes in phenotype over imperceptibly large timescales have given rise to the enormous diversity of life we witness on earth both presently and through the natural record. The basic unit of evolution is mutation, and by perturbing biological processes, mutations may alter the fitness of an individual. However, the fitness effect of a mutation is difficult to infer from the historical record, and complex to obtain experimentally in an efficient and accurate manner.
We have recently developed a high-throughput method, termed Exceedingly Methodical and Parallel Investigation of Randomized Individual Codons (EMPIRIC), to iteratively mutagenize regions of essential genes in yeast and subsequently analyze the fitness of individual mutants. Utilizing this technique, as exemplified in Chapters II and III, it is possible to determine the fitness effects of all possible point mutations in parallel through growth competition followed by a high-throughput sequencing readout. We have employed this technique to determine the distribution of fitness effects in a nine amino acid region of the Hsp90 gene of S. cerevisiae under elevated temperature, and found the bimodal distribution of fitness effects to be remarkably consistent with near-neutral theory. When the measured fitness effects of mutants are compared to the natural record, phylogenetic alignments appear to be a poor predictor of experimental fitness.
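One common way to turn a growth-competition sequencing readout into a fitness estimate is to fit the slope of the log ratio of mutant to wild-type read counts against the number of generations. The sketch below assumes that approach; it is not taken from the dissertation, and the function name and sample counts are hypothetical.

```python
import math


def selection_coefficient(mutant_counts, wildtype_counts, generations):
    # Log ratio of mutant to wild-type reads at each sampled time point.
    log_ratios = [math.log(m / w) for m, w in zip(mutant_counts, wildtype_counts)]

    # Ordinary least-squares slope of log ratio versus generations.
    n = len(generations)
    mean_t = sum(generations) / n
    mean_y = sum(log_ratios) / n
    covariance = sum((t - mean_t) * (y - mean_y) for t, y in zip(generations, log_ratios))
    variance = sum((t - mean_t) ** 2 for t in generations)
    return covariance / variance


if __name__ == "__main__":
    gens = [0, 2, 4, 6]                          # hypothetical sampling points
    wildtype = [1000, 1000, 1000, 1000]          # hypothetical read counts
    mutant = [1000, 819, 670, 549]               # mutant declining relative to wild type
    print(round(selection_coefficient(mutant, wildtype, gens), 3))
```

A slope near zero indicates a mutant that grows like wild type, a negative slope indicates a fitness defect, and a positive slope indicates a fitness advantage.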
In Chapter IV, to further interrogate the properties of this region, library competitions under conditions of elevated temperature and salinity were performed to study the potential for protein adaptation. Strikingly, whereas both optimal and elevated temperatures produced no statistically significant beneficial mutations, under conditions of elevated salinity adaptive mutations appear with fitness advantages up to 8% greater than wild type. Of particular interest, mutations conferring fitness benefits under conditions of elevated salinity almost always show a fitness defect in other experimental conditions, indicating that these mutations are environmentally specialized. When the experimental fitness measurements are applied to long-standing theoretical predictions of adaptation, our results are remarkably consistent with Fisher’s Geometric Model of protein evolution.
Epistasis between mutations can have profound effects on evolutionary trajectories. Although the importance of epistasis has been realized since the early 1900s, the interdependence of mutations is difficult to study in vivo due to the stochastic and constant nature of background mutations. In Chapter V, utilizing the EMPIRIC methodology allows us to study the distribution of fitness effects in the context of mutant genetic backgrounds with minimal influence from unintended background mutations. By analyzing intragenic epistatic interactions, we uncovered a complex interplay between solvent shielded structural residues and solvent exposed hydrophobic surface in the amino acid 582-590 region of Hsp90. Additionally, negative epistasis appears to be negatively correlated with mutational promiscuity while additive interactions are positively correlated, indicating potential avenues for proteins to navigate fitness ‘valleys’.
In summary, the work presented in this dissertation is focused on applying experimental context to the theory-rich field of evolutionary biology. The development and implementation of a novel methodology for the rapid and accurate assessment of organismal fitness has allowed us to address some of the most basic processes of evolution including adaptation and protein expression level. Through the work presented here and by investigators across the world, the application of experimental data to evolutionary theory has the potential to improve drug design and human health in general, as well as allow for predictive medicine in the coming era of personalized medicine.
|
593 |
Systematic Experimental Determination of Functional Constraints on Proteins and Adaptive Potential of Mutations: A Dissertation
Jiang, Li, 23 May 2016
The sequence-function relationship is a fundamental question for many branches of modern biomedical research. It connects the primary sequence of proteins to the function of proteins and the fitness of organisms, holding answers to critical questions such as the functional consequences of mutations identified in whole genome sequencing and the adaptive potential of fast-evolving pathogenic viruses and microbes. Many different approaches have been developed to delineate the genotype-phenotype map for different proteins, but they are generally limited by their throughput or precision. To systematically quantify the fitness of large numbers of mutations, I modified a novel high-throughput mutational scanning approach (EMPIRIC) to investigate the fitness landscape of mutations in important regions of essential proteins from yeast or RNA viruses. Using EMPIRIC, I analyzed the interplay of the expression level and sequence of Hsp90 on yeast growth and revealed latent effects of mutations at reduced expression levels of Hsp90. I also examined the functional constraint on the receptor binding site of the Env of Human Immunodeficiency Virus (HIV) and uncovered enhanced receptor binding capacity as a common pathway for adaptation of HIV to laboratory conditions. Moreover, I explored the adaptive potential of neuraminidase (NA) of influenza A virus to an NA inhibitor, oseltamivir, and identified novel oseltamivir resistance mutations with distinct molecular mechanisms. In summary, I applied a high-throughput functional genomics approach to map the sequence-function relationship in various systems and examined the evolutionary constraints and adaptive potential of essential proteins ranging from molecular chaperones to drug-targetable viral proteins.
|
594 |
Genomic and Transcriptomic Investigation of Endemic Burkitt Lymphoma and Epstein Barr Virus
Kaymaz, Yasin, 31 July 2017
Endemic Burkitt lymphoma (eBL) is the most common pediatric cancer in malaria-endemic equatorial Africa and nearly always contains Epstein-Barr virus (EBV), unlike sporadic Burkitt Lymphoma (sBL), which occurs with a lower incidence in developed countries. Despite this increased burden, the study of eBL has lagged. Additionally, while EBV was isolated from an African Burkitt lymphoma tumor 50 years ago, the impact of viral variation in oncogenesis is just beginning to be fully explored. In my thesis research, I focused on investigating the molecular genetics of the endemic form of this lymphoma, with a particular emphasis on the role of the virus and its variation in pathogenesis, using novel sequencing and bioinformatic strategies.
First, we sought to understand pathogenesis by investigating transcriptomes using RNA sequencing (RNAseq) from 30 primary eBL tumors and comparing them to sBL tumors. BL tumor samples were prospectively obtained from 2009 until 2012 in Kenya. Within eBL tumors, minimal expression differences were found based on anatomical presentation site, in-hospital survival rates, and EBV genome type, suggesting that eBL tumors are homogeneous without marked subtypes. The outstanding difference detected using surrogate variable analysis was the significantly decreased expression of key genes in the immunoproteasome complex in eBL tumors carrying type 2 EBV compared to type 1 EBV. Secondly, in comparison to previously published pediatric sBL specimens, the majority of the expression and pathway differences were related to the PTEN/PI3K/mTOR signaling pathway and were correlated most strongly with EBV status rather than the geographic designation. Moreover, the common mutations were observed significantly less frequently in eBL tumors harboring EBV type 1, with mutation frequencies similar between tumors with EBV type 2 and without EBV. In addition to the previously reported genes, we identified a set of new genes mutated in BL. Overall, these findings suggest that EBV, particularly EBV type 1, supports BL oncogenesis, alleviating the need for particular driver mutations in the human genome.
Second, we sought to comprehensively define sequence variations of EBV across the viral genome in eBL tumor cells and normal infections, and to correlate variations with clinical phenotypes and disease risk. We investigated the whole genome sequence of EBV from primary tumors (N=41) and plasma from eBL patients (N=21) as well as EBV in the blood of healthy children (N=29) within the same malaria-endemic region. We conducted a genome-wide association study with viral genomes from healthy children and children with BL. Furthermore, we found that the frequencies of EBV types among healthy children were at equal levels, while they were skewed in favor of type 1 (70%) among children with eBL. To pinpoint the fundamental divergence between the viral genome subtypes, type 1 and type 2, we constructed phylogenetic trees comparing them to all public EBV genomes. The pattern of variation defined the substructures correlated with the subtypes. This investigation not only deciphers the puzzling pathogenic differences between subtypes but also helps to understand how these two EBV types persist in the population at the same time.
Overall, this research provides insight into the molecular underpinning of eBL and the role of EBV. It further provides the groundwork and means to unravel the complexity of EBV population structure and provide insight into the viral variation that may influence oncogenesis and outcomes in eBL and other EBV-associated diseases. In addition, genomic and mutational analyses of Burkitt lymphoma tumors identify key differences based on viral content and clinical outcomes suggesting new avenues for the development of prognostic molecular biomarkers and therapeutic interventions.
|
595 |
On Identifying Signatures of Positive Selection in Human Populations: A Dissertation
Crisci, Jessica L., 25 June 2013
As sequencing technology continues to produce better quality genomes at decreasing costs, there has been a recent surge in the variety of data that we are now able to analyze. This is particularly true with regard to our understanding of the human genome, where the last decade has seen data advances in primate epigenomics, ancient hominid genomics, and a proliferation of human polymorphism data from multiple populations. In order to utilize such data, however, it has become critical to develop increasingly sophisticated tools spanning both bioinformatics and statistical inference. In population genetics particularly, new statistical approaches for analyzing population data are constantly being developed, unfortunately often without proper model testing and evaluation of type-I and type-II error. Because the common Wright-Fisher assumptions underlying such models are generally violated in natural populations, this statistical testing is critical. Thus, my dissertation has two distinct but related themes: 1) evaluating methods of statistical inference in population genetics, and 2) utilizing these methods to analyze the evolutionary history of humans and our closest relatives. The resulting collection of work has not only provided important biological insights, including some of the first strong evidence of selection on human-specific epigenetic modifications (Shulha, Crisci, Reshetov, Tushir et al. 2012, PLoS Bio) and a characterization of human-specific genetic changes distinguishing modern humans from Neanderthals (Crisci et al. 2011, GBE), but also important insights into the performance of population genetic methodologies, which will motivate the future development of improved approaches for statistical inference (Crisci et al., in review).
|
596 |
Optimizing RNA Library Preparation to Redefine the Translational Status of 80S Monosomes: A Dissertation
Heyer, Erin E., 06 October 2015
Deep sequencing of strand-specific cDNA libraries is now a ubiquitous tool for identifying and quantifying RNAs in diverse sample types. The accuracy of conclusions drawn from these analyses depends on precise and quantitative conversion of the RNA sample into a DNA library suitable for sequencing. Here, we describe an optimized method of preparing strand-specific RNA deep sequencing libraries from small RNAs and variably sized RNA fragments obtained from ribonucleoprotein particle footprinting experiments or fragmentation of long RNAs. Because all enzymatic reactions were optimized and driven to apparent completion, sequence diversity and species abundance in the input sample are well preserved. This optimized method was used in an adapted ribosome-profiling approach to sequence mRNA footprints protected either by 80S monosomes or polysomes in S. cerevisiae. Contrary to popular belief, we show that 80S monosomes are translationally active as demonstrated by strong three-nucleotide phasing of monosome footprints across open reading frames. Most mRNAs exhibit some degree of monosome occupancy, with monosomes predominating on upstream ORFs, canonical ORFs shorter than ~590 nucleotides and any ORF for which the total time required to complete elongation is substantially shorter than the time required for initiation. Additionally, endogenous NMD targets tend to be monosome-enriched. Thus, rather than being inactive, 80S monosomes are significant contributors to overall cellular translation.
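Three-nucleotide phasing can be checked by tallying footprint positions by reading frame relative to the ORF start. The following is a minimal sketch that assumes each footprint has already been reduced to a single representative coordinate (for example an inferred P-site position); the function name and coordinates are hypothetical and do not come from the dissertation.

```python
def frame_fractions(footprint_positions, orf_start):
    # Tally footprints by reading frame (0, 1 or 2) relative to the first ORF nucleotide.
    counts = [0, 0, 0]
    for position in footprint_positions:
        counts[(position - orf_start) % 3] += 1

    total = sum(counts)
    if total == 0:
        return [0.0, 0.0, 0.0]
    return [count / total for count in counts]


if __name__ == "__main__":
    # Hypothetical footprint coordinates on an ORF that starts at position 100.
    positions = [100, 103, 106, 109, 101]
    print(frame_fractions(positions, orf_start=100))
```

Footprints from actively translating ribosomes are expected to concentrate in one frame, whereas an even split across the three frames would suggest no phasing.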
|
597 |
Systematic Analysis of Duplications and Deletions in the Malaria Parasite P. falciparum: A Dissertation
DeConti, Derrick K., 15 April 2015
Duplications and deletions are a major source of genomic variation. Duplications, specifically, have a significant impact on gene genesis and dosage, and the malaria parasite P. falciparum has developed resistance to a growing number of anti-malarial drugs via gene duplication. It also contains highly duplicated families of antigenically variable allelic genes. While specific genes and families have been studied, a comprehensive analysis of duplications and deletions within the reference genome and population has not been performed. We analyzed the extent of segmental duplications (SD) in the reference genome for P. falciparum, primarily by a whole-genome self-alignment. We discovered that while 5% of the genome was identified as SD, the distribution within the genome was partition clustered, with the vast majority localized to the subtelomeres. Within the SDs, we found an overrepresentation of genes encoding antigenically diverse proteins exposed to the extracellular membrane, specifically the var, rifin, and stevor gene families. To examine variation of duplications and deletions within the parasite populations, we designed a novel computational methodology to identify copy number variants (CNVs) from high-throughput sequencing, using a read-depth-based approach refined with discordant read pairs. After validating the program against in vitro lab cultures, we analyzed isolates from Senegal as an initial test on clinical isolates. We then expanded our search to a global sample of 610 strains from Africa and South East Asia, identifying 68 CNV regions. Geographically, genic CNVs were found on average in less than 10% of the population, indicating that CNVs are rare. However, CNVs at high frequency were almost exclusively duplications associated with known drug resistance CNVs. We also identified a novel biallelic duplication of the crt gene, containing both the chloroquine-resistant and chloroquine-sensitive alleles.
The synthesis of our SD and CNV analysis indicates a CNV conservative P. falciparum genome except where drug and human immune pressure select for gene duplication.
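The read-depth idea mentioned above can be sketched as follows. This is a simplified illustration, not the dissertation's program: it assumes per-window read depths are already computed, uses the median depth as the single-copy baseline, and applies hypothetical ratio thresholds. The refinement with discordant read pairs is not shown.

```python
from statistics import median


def call_cnv_windows(window_depths, dup_ratio=1.5, del_ratio=0.5):
    # Normalize each window by the median depth, taken here as the single-copy baseline.
    baseline = median(window_depths)
    if baseline == 0:
        raise ValueError("median depth is zero; cannot normalize")

    calls = []
    for depth in window_depths:
        ratio = depth / baseline
        if ratio >= dup_ratio:
            calls.append("dup")
        elif ratio <= del_ratio:
            calls.append("del")
        else:
            calls.append("normal")
    return calls


def merge_calls(calls):
    # Merge runs of adjacent windows with the same non-normal call into
    # (start_window, end_window_exclusive, call) regions.
    regions = []
    start = 0
    for i in range(1, len(calls) + 1):
        if i == len(calls) or calls[i] != calls[start]:
            if calls[start] != "normal":
                regions.append((start, i, calls[start]))
            start = i
    return regions


if __name__ == "__main__":
    depths = [30, 31, 29, 62, 60, 30, 2, 30]     # hypothetical mean depth per window
    print(merge_calls(call_cnv_windows(depths)))
```

In practice the candidate boundaries from such a scan are coarse, which is why read-pair evidence is typically used to refine the breakpoints.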
|
598 |
An RNA comparison study between the Amazonian, Centro-American and Orinocan semispecies of Drosophila paulistorum
Hedman, Erik, January 2020
Differential expression analysis can be a powerful method to investigate expression differences between closely related species. Our ambition is to highlight differentially expressed nuclear genes to explain the hybrid incompatibilities among the Amazonian, Centro-American and Orinocan semispecies of Drosophila paulistorum. RNA sequencing (RNA-seq) establishes the foundation of the study, where we first evaluate the influence of two distinct alignment references. We discover the benefits of concatenating a de novo assembly instead of using the genome reference of a close relative. The bioinformatic pipeline handles the inclusion of D. melanogaster and D. willistoni, whose contribution assists in the search for previously studied speciation genes. Among the down- and upregulated subsets we see a diverse mix of general biological processes such as regulatory functions and transcription factors. In the end we uncover potential indications as to why the Amazonian seems to be the semispecies least compatible for producing hybrids. This study provides a competitive working framework for comparative RNA-seq studies between closely related species.
|
599 |
Computational biology approaches in drug repurposing and gene essentiality screening
Philips, Santosh, 20 June 2016
Indiana University-Purdue University Indianapolis (IUPUI)
The rapid innovations in biotechnology have led to an exponential growth of data and electronically accessible scientific literature. Within this enormous body of scientific data, knowledge can be exploited and novel discoveries can be made. In my dissertation, I have focused on novel molecular mechanisms and therapeutic discoveries from big data for complex diseases. It is evident today that complex diseases have many contributing factors, including genetic and environmental effects. The discovery of these factors is challenging and critical in personalized medicine. The increasing cost and time to develop new drugs pose a further challenge in effectively treating complex diseases. In this dissertation, we aim to demonstrate the use of existing data and literature as a potential resource for discovering novel therapies and repositioning existing drugs. The key to identifying novel knowledge is integrating information from decades of research across the different scientific disciplines to uncover interactions that are not explicitly stated. This puts critical information at the fingertips of researchers and clinicians, who can take advantage of this newly acquired knowledge to make informed decisions.
This dissertation utilizes computational biology methods to identify and integrate existing scientific data and literature resources in the discovery of novel molecular targets and drugs that can be repurposed. In chapter 1 of my dissertation, I extensively sifted through scientific literature and identified a novel interaction between Vitamin A and CYP19A1 that could lead to a potential increase in the production of estrogens. Further, in chapter 2, by exploring a microarray dataset from an estradiol gene sensitivity study, I was able to identify a potential novel anti-estrogenic indication for the commonly used urinary analgesic phenazopyridine. Both discoveries were experimentally validated in the laboratory. In chapter 3 of my dissertation, through the use of a manually curated corpus and machine learning algorithms, I identified and extracted genes that are essential for cell survival. These results highlight that novel knowledge with potential clinical applications can be discovered from existing data and literature by integrating information across various scientific disciplines.
|
600 |
Integrative approaches to single cell RNA sequencing analysis
Johnson, Travis Steele, 21 September 2020
No description available.
|