Global ETD Search

51	Data-Intensive Biocomputing in the Cloud
Meeramohideen Mohamed, Nabeel 25 September 2013
Next-generation sequencing (NGS) technologies have made it possible to rapidly sequence the human genome, heralding a new era of health-care innovations based on personalized genetic information. However, these NGS technologies generate data at a rate that far outstrips Moore's Law. As a consequence, analyzing this exponentially increasing data deluge requires enormous computational and storage resources, resources that many life science institutions do not have access to. As such, cloud computing has emerged as an obvious, but still nascent, solution. This thesis intends to investigate and design an efficient framework for running and managing large-scale data-intensive scientific applications in the cloud. Based on the lessons learned from our parallel implementation of a genome analysis pipeline in the cloud, we aim to provide a framework for users to run such data-intensive scientific workflows using a hybrid setup of client and cloud resources. We first present SeqInCloud, our highly scalable parallel implementation of a popular genetic variant pipeline called the Genome Analysis Toolkit (GATK), on the Windows Azure HDInsight cloud platform. Together with a parallel implementation of GATK on Hadoop, we evaluate the potential of using cloud computing for large-scale DNA analysis and present a detailed study on efficiently utilizing cloud resources for running data-intensive, life-science applications. Based on our experience from running SeqInCloud on Azure, we present CloudFlow, a feature-rich workflow manager for running MapReduce-based bioinformatic pipelines utilizing both client and cloud resources. CloudFlow, built on top of an existing MapReduce-based workflow manager called Cloudgene, provides unique features that are not offered by existing MapReduce-based workflow managers, such as enabling simultaneous use of client and cloud resources, automatic data-dependency handling between client and cloud resources, and the flexibility of implementing user-defined plugins for data transformations. In general, our work aims to increase the adoption of cloud resources for running data-intensive scientific workloads.
Master of Science
Cloud Computing, Next Generation Sequencing, MapReduce, GATK, Workflow
52	Microfluidics for Low Input Epigenomic Analysis and Its Application to Brain Neuroscience
Deng, Chengyu 06 January 2021
The epigenome carries dynamic information that controls gene expression and maintains cell identity during both disease and normal development. The inherent plasticity of the epigenome paves new avenues for developing diagnostic and therapeutic tools for human diseases. Microfluidic technology has improved the sensitivity and resolution of epigenomic analysis due to its outstanding ability to manipulate nanoliter-scale liquid volumes. In this thesis, I report three projects focusing on low-input, cell-type-specific and spatially resolved histone modification profiling on microfluidic platforms. First, I applied Microfluidic Oscillatory Washing-based Chromatin Immunoprecipitation followed by sequencing (MOWChIP-seq) to study the effect of culture dimensionality, hypoxia stress and bacterial infection on histone modification landscapes of brain tumor cells. I identified differentially marked regions between different culture conditions. Second, I adapted indexed ChIPmentation and introduced mu-CM, a low-input microfluidic device capable of performing 8 assays in parallel on different histone marks using as few as 20 cells in less than 7 hours. Last, I investigated the spatially resolved epigenome and transcriptome of neuronal and glial cells from coronal sections of adult mouse neocortex. I applied unsupervised clustering to identify distinct spatial patterns in the neocortex epigenome and transcriptome that were associated with central nervous system development. I demonstrated that our method is well suited for scarce samples, such as biopsy samples from patients in the context of precision medicine.
Doctor of Philosophy
Epigenetics is the study of alterations in organisms that are not caused by alteration of the genetic code. Epigenetic information plays a pivotal role during growth, aging and disease. Epigenetic information is dynamic and modifiable, and thus serves as an ideal target for various diagnostic and therapeutic strategies for human diseases. Microfluidics is a technology that manipulates liquids with extremely small volumes in miniaturized devices. Microfluidics has improved the sensitivity and resolution of epigenetic analysis. In this thesis, I report three projects focusing on low-input, cell-type-specific and spatially resolved histone modification profiling on microfluidic platforms. Histone modification is one type of epigenetic information and regulates gene expression. First, we studied the influence of culture condition and bacterial infection on the histone modification profile of brain tumor cells. Second, we introduced mu-CM, which combines a low-input microfluidic device with indexed ChIPmentation and is capable of performing 8 assays in parallel using as few as 20 cells. Last, we investigated spatial variations in the epigenome and transcriptome across adult mouse neocortex, the outer layer of the brain involved in higher-order functions such as cognition. I identified distinct spatial patterns associated with central nervous system development using a machine learning algorithm. Our method is well suited for studying scarce samples, such as cell populations isolated from patients in the context of precision medicine.
Microfluidics, Chromatin immunoprecipitation, next generation sequencing, histone modifications
53	Identifying and Analyzing Indel Variants in the Human Genome Using Computational Approaches
Hasan, Mohammad Shabbir 01 July 2019
Insertion and deletion (indel), a common form of genetic variation, has been shown to cause or contribute to human genetic diseases and cancer. Despite this importance and being the second most abundant variant type in the human genome, indels have not been studied as much as single nucleotide polymorphisms (SNPs). With the advance of next-generation sequencing technology, many indel calling tools have been developed. However, performance comparison of commonly used tools has shown that (1) the tools have limited power in identifying indels, leaving a significant number of indels undetected, and (2) there is significant disagreement among the indel sets produced by the tools. These findings indicate the necessity of improving the existing tools or developing new algorithms to achieve reliable and consistent indel calling results. Two indels are biologically equivalent if the resulting sequences are the same. Storing biologically equivalent indels as distinct entries in databases causes data redundancy and misleads downstream analysis. It is thus desirable to have a unified system for identifying and representing equivalent indels. This dissertation describes UPS-indel, a utility tool that creates a universal positioning system for indels so that equivalent indels can be uniquely determined by their coordinates in the new system. Results show that UPS-indel identifies more redundant indels than existing algorithms. When short reads are mapped to the reference genome, a significant number of them remain unmapped and are excluded from downstream analyses, thereby causing information loss in the subsequent variant calling. This dissertation describes Genesis-indel, a computational pipeline that explores the unmapped reads to identify missing novel indels. Results from analyzing sequence alignments of 30 breast cancer patients show that Genesis-indel identifies many novel indels that also show significant enrichment in oncogenes and tumor suppressor genes, demonstrating the importance of rescuing indels hidden in the unmapped reads in cancer and disease studies. Somatic mutations play a vital role in transforming healthy cells into cancer cells. Therefore, accurate identification of somatic mutations is essential. Many somatic mutation callers are available with different strengths and weaknesses. An ensemble approach integrating the power of the callers is warranted. This dissertation describes SomaticHunter, an ensemble of two callers, namely Platypus and VarDict. Results on synthetic tumor data show that for both SNPs and indels, SomaticHunter achieves recall comparable to the state-of-the-art somatic mutation callers and the highest precision, resulting in the highest F1 score.
Doctor of Philosophy
Insertion and deletion (indel), a common form of genetic variation in the human genome, is associated with genetic diseases and cancer. However, indels are heavily understudied due to experimental and computational challenges. This dissertation addresses the computational challenges in three aspects. First, the current approach of representing indels is ambiguous and causes significant database redundancy. A universal positioning system, UPS-indel, is proposed to represent equivalent indels unambiguously; the UPS-indel algorithm is theoretically proven to find all equivalent indels and is thus exhaustive. Second, a significant number of indels are hidden in DNA reads not mapped to the reference genome. Genesis-indel, a computational pipeline that explores the unmapped reads to identify novel indels that are initially missed, is developed. Genesis-indel has been shown to uncover indels that can be important genetic markers for breast cancer. Finally, mutations occurring in somatic cells play a vital role in transforming healthy cells into cancer cells. Therefore, accurate identification of somatic mutations is essential for a better understanding of cancer genomes. SomaticHunter, an ensemble of two sensitive variant callers, is developed. Simulation studies using whole genome and whole exome sequences have shown that SomaticHunter achieves recall comparable to state-of-the-art somatic mutation callers while delivering the highest precision and therefore the highest F1 score among all the callers compared.
Genetic Variants, Indel, Somatic Mutation, Next Generation Sequencing
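The equivalence criterion stated in the abstract of record 53 ("two indels are biologically equivalent if the resulting sequences are the same") can be illustrated with a brute-force check. This is only a sketch of the definition, not the UPS-indel algorithm, whose details the abstract does not give; the function names, the tuple layout and the toy reference sequence are all assumptions made for illustration.

```python
def apply_indel(ref: str, pos: int, deleted: str, inserted: str) -> str:
    """Apply one indel to `ref` at 0-based `pos`.

    `deleted` is the reference substring removed at `pos` (empty for a pure
    insertion); `inserted` is the sequence added there (empty for a deletion).
    """
    if ref[pos:pos + len(deleted)] != deleted:
        raise ValueError("deleted sequence does not match the reference")
    return ref[:pos] + inserted + ref[pos + len(deleted):]


def are_equivalent(ref: str, indel_a: tuple, indel_b: tuple) -> bool:
    """Two indels are equivalent if they produce the same resulting sequence."""
    return apply_indel(ref, *indel_a) == apply_indel(ref, *indel_b)


# Hypothetical toy reference containing a short tandem repeat.
ref = "GCACACT"
delete_ca_at_1 = (1, "CA", "")  # G[CA]CACT -> GCACT
delete_ac_at_2 = (2, "AC", "")  # GC[AC]ACT -> GCACT
delete_ct_at_5 = (5, "CT", "")  # GCACA[CT] -> GCACA

print(are_equivalent(ref, delete_ca_at_1, delete_ac_at_2))  # same result sequence
print(are_equivalent(ref, delete_ca_at_1, delete_ct_at_5))  # different result
```

The first pair differs in position and deleted bases yet yields the same sequence, which is the kind of redundancy the abstract says a database would otherwise store as distinct entries.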
54	Sequence capture as a tool to understand the genomic basis for adaptation in angiosperm and gymnosperm trees
Suren, Haktan 21 June 2017
Forest trees represent a unique group of organisms that combine ecological and economic importance. Owing to their random mating system and widespread geographical distribution, they harbor abundant genetic variation both within and among populations. Despite their importance, research on forest trees has been underrepresented, largely because of their large and complex genomes and scarce funding. However, recent climate change and associated problems such as insect outbreaks, diseases and stress-related damage have urged scientists to focus more on trees. Furthermore, advances in high-throughput sequencing technologies have allowed trees to be sequenced and used as reference genomes, which has provided a deeper understanding of the relationship between genotype and environment. Whole genome sequencing is still not possible for organisms with large genomes, including most tree species, and it is still not economically feasible for population genomic studies, which require sequencing hundreds of samples. To get around this problem, genomic reduction is required. Sequence capture is one of the genomic reduction techniques that enable the study of a subset of the DNA of interest. In this paper, our primary goal is to outline challenges, provide guidance about the utility of sequence capture in trees, and leverage such data in genome-wide association analyses to find the genetic variants that underlie complex, adaptive traits in spruce and pine, as well as poplar. The results of this research will help bridge the genomic information gap between trees and other organisms. Moreover, they will provide a better understanding of how genetic variation governs phenotype in trees, which will facilitate marker-assisted selection for improved traits and provide guidance for forest management strategies for reforestation to mitigate the effects of climate change.
Ph. D.
Forest trees, sequence capture, adaptation, next-generation sequencing
55	Sequenciamento e análise da variabilidade genética de vírus transmitidos por ácaros do gênero Brevipalpus no Brasil / Sequencing and analysis of the genetic variability of viruses transmitted by Brevipalpus mites in Brazil
Jesus, Camila Chabí de 20 January 2016
South America is most likely the center of diversity of Brevipalpus-transmitted viruses (BTV). Some of these BTV infect major crops of Brazilian agribusiness, such as citrus and coffee; passion fruit and several ornamental plants are affected as well. In the last decade the genomes of two of these viruses, Citrus leprosis virus C (CiLV-C) and Coffee ringspot virus (CoRSV), were sequenced, but knowledge of the genetic diversity and evolutionary processes of their populations is still scarce. Thus, the objective of this study was to molecularly characterize new isolates of BTV infecting citrus and coffee, reveal their phylogenetic relationships with known species of BTV, and assess the genetic variability of the CiLV-C population in Brazil. For the CiLV-C study, 47 samples of Citrus sinensis showing typical symptoms of citrus leprosis were collected in different Brazilian regions during 2011-2015. CiLV-C was detected by RT-PCR in all the collected samples, and four regions of the viral genome (p29, p15, IR and MP) of each isolate were sequenced. The sequences obtained were used to study the phylogeny and variability of the CiLV-C population in Brazil. The CiLV-C population was shown to have relatively low variability; however, two lineages within the species, named Cor and SJRP, were identified in this work. The complete genomes of one isolate of the SJRP lineage (CiLV-C SJRP) and of the tentative dichorhavirus CoRSV found in Limeira, SP, were obtained by small RNA (siRNA) sequencing. Each sequence was validated by sequencing fragments generated by RT-PCR using specific primers throughout the genome. CiLV-C SJRP has about 85% nucleotide identity with the genome of the type member of the genus Cilevirus and shows evidence of recombination with isolates of the Cor lineage, which is prevalent in Brazil. Overall, the genome of CoRSV isolate Limeira has more than 90% nucleotide identity with CoRSV Lavras, indicating that both isolates are members of the same tentative species of dichorhavirus.
Citrus leprosis, Coffee ringspot, Mites, Next generation sequencing
56	Sequenciamento e análise da variabilidade genética de vírus transmitidos por ácaros do gênero Brevipalpus no Brasil / Sequencing and analysis of the genetic variability of viruses transmitted by Brevipalpus mites in Brazil
Camila Chabí de Jesus 20 January 2016
Citrus leprosis, Coffee ringspot, Mites, Next generation sequencing
57	Towards next-generation sequencing-based identification of norovirus recognition elements and microfluidic array using phage display technology / Phage Display als Tool zur Next Generation Sequencing-basierten Identifizierung von Norovirus-Erkennungselementen und zur Entwicklung eines mikrofluidischen Arrays
Pahlke, Claudia 28 November 2017
Noroviruses are the major cause of acute viral gastroenteritis worldwide. Thus, rapid and reliable pathogen detection and control are crucial to avoid epidemic outbreaks. Peptides which bind to these viruses with high specificity and affinity could serve as small and stable recognition elements in biosensing applications for point-of-care diagnostics of noroviruses. They can be identified by screening large phage display libraries using the biopanning technique. In the present study, this method was applied to identify norovirus-binding peptide motifs. For this purpose, a biopanning based on column chromatography was established, and three rounds of selection were performed. After the second round, the cosmix-plexing recombination technique was implemented to enhance the chance of obtaining peptides with very high affinity. Biopanning data evaluation was based on next-generation sequencing (NGS), showing that this innovative method enables a detailed analysis of the complete sequence spectrum obtained during and after biopanning. Highly enriched motifs were characterized by a large proportion of the amino acids W, K, R, N, and F. Neighbourhood analysis was performed for selected motifs as an example, showing that the motifs FAT, RWN, and KWF possessed the fingerprints with the largest differences relative to the original library. This thesis thus presents next-generation sequencing-based analysis tools, which can now be transferred to any other biopanning project. The identified peptide motifs represent promising candidates for a future examination of their norovirus-specific binding. A new option for testing such phage-target interactions in the context of biopanning selections was studied in the second part of the thesis. For this purpose, a phage-based microarray was developed as a miniaturized binding assay. As a prerequisite, the different immobilization behaviour of phages on positively and negatively charged surfaces was studied, and a non-contact printing technique for bacteriophages was developed. Subsequently, the interaction of phages and antibodies directed against phage coat proteins was characterized in enzyme-linked immunosorbent assays, and the protocol was successfully transferred to the non-contact printed phage spots. At the proof-of-concept level, the phage array could finally be integrated into a microfluidic setup, showing a higher signal-to-background ratio relative to the static phage array. These results point the way towards a microfluidic phage array, allowing online monitoring, automation, and parallelisation of the phage array analysis.
phage display, norovirus, next-generation sequencing, microfluidic array
ddc:570, rvk:WF 4900
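The abstract of record 57 describes motifs that are enriched relative to the original library. One simple way to express such an enrichment is the ratio of a motif's relative frequency after selection to its relative frequency in the starting library. The sketch below assumes that definition; the thesis's actual NGS analysis tools are not described in the abstract, and the function names, the pseudocount and the toy peptide lists are hypothetical.

```python
from collections import Counter


def motif_frequencies(peptides, k=3):
    """Relative frequency of every length-k motif across a list of peptides."""
    counts = Counter()
    for peptide in peptides:
        for i in range(len(peptide) - k + 1):
            counts[peptide[i:i + k]] += 1
    total = sum(counts.values())
    return {motif: n / total for motif, n in counts.items()} if total else {}


def enrichment(selected, library, k=3, pseudocount=1e-6):
    """Ratio of motif frequency after selection to frequency in the library.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for motifs that never occur in the library sample.
    """
    sel = motif_frequencies(selected, k)
    lib = motif_frequencies(library, k)
    return {m: f / (lib.get(m, 0.0) + pseudocount) for m, f in sel.items()}


# Hypothetical toy data, not taken from the thesis.
library = ["ARWNG", "GKWFA", "AAGGA", "GGAAG"]
selected = ["RWNRW", "KRWNA"]

scores = enrichment(selected, library)
print(round(scores["RWN"], 2))  # RWN: 2/6 after selection vs 1/12 in the library
```

Motifs absent from the library sample receive very large ratios under this scheme, so in practice a minimum count threshold would probably be applied before ranking.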
58	Towards next-generation sequencing-based identification of norovirus recognition elements and microfluidic array using phage display technology
Pahlke, Claudia 07 November 2017
ddc:570
59	CONNECTING THE DOTS : Exploring gene contexts through knowledge-graph representations of gene-information derived from scientific literature Hellberg, Henrietta January 2023 Analyzing the data produced by next-generation sequencing technologies relies on access to information synthesized from previous research findings. The volume of data available in the literature is growing rapidly, and it is becoming increasingly necessary for researchers to use AI or other statistics-based approaches in the analysis of their datasets. In this project, knowledge graphs are explored as a tool for providing access to contextual gene information available in scientific literature. The explorative method described in this thesis is based on the implementation and comparison of two approaches to knowledge graph construction, a rule-based statistical approach and a neural-network and co-occurrence based approach, each applied to specific literature contexts. The results are presented both as a quantitative comparison between the approaches and as a qualitative expert evaluation of the quantitative result. The quantitative comparison suggested that contrasting knowledge graphs constructed with different approaches can provide valuable information for the interpretation and contextualization of key genes. It also demonstrated the limitations of some approaches, e.g. in terms of scalability and the volume and type of information that can be extracted. The results further suggested that metrics based on the overlap of nodes and edges, as well as metrics that leverage the global topology of graphs, are valuable for representing and comparing contextual information between knowledge graphs. 
The qualitative expert evaluation demonstrated that literature-derived knowledge graphs of gene information can be valuable tools for identifying research biases related to genes, and it also shed light on the challenges of biological entity normalization in knowledge graph development. In light of these findings, automatic knowledge-graph construction appears to be a promising approach for improving access to contextual information about genes in scientific literature. / Analyzing the large volumes of data produced by next-generation sequencing requires that researchers have access to, and can compile, information from previous research. As the amount of data available in the scientific literature grows, so does the need to use AI and other statistical methods to access this data during analysis. In this project, knowledge graphs are explored as a tool for making contextual gene information in scientific articles accessible. The explorative method described in this project is based on the implementation and comparison of two techniques for knowledge graph generation, a rule-based statistical method and a method based on neural networks and co-occurrence, each based on specific contexts in the literature. The result is presented both as a quantitative comparison between methods and through a qualitative expert evaluation based on the quantitative result. The quantitative comparison indicated that comparing knowledge graphs generated with different methods can contribute valuable information for the interpretation and contextualization of important genes. The result also revealed limitations of some methods, for example regarding scalability and the amount and type of information that can be extracted. 
It also indicated that metrics based on the overlap of nodes and edges, as well as metrics that take the global topology of graphs into account, can be useful for comparing biological knowledge graphs and for representing the differences between them. The result of the qualitative expert evaluation showed that knowledge graphs based on gene information extracted from scientific articles can be valuable tools for identifying research bias concerning genes, and highlighted important challenges regarding the normalization of biological entities in knowledge graph development. Based on these findings, automatic knowledge graph generation appears to be a promising approach for improving the accessibility of contextual gene information in scientific literature. Knowledge graph construction Information extraction Knowledge Graphs Next-Generation Sequencing Gene contextualization Computer and Information Sciences
60	CoenzymeQ10-associated gene mutations in South African patients with respiratory chain deficiencies / Lindi-Maryn Jonck Jonck, Lindi-Maryn January 2015 CoenzymeQ10 (CoQ10) functions as an electron carrier in mitochondria, transporting electrons from CI and CII to CIII in the respiratory chain (RC) for normal cellular energy (ATP) production. Mutations in genes of the complicated and not yet well understood CoQ10 biosynthesis pathway cause primary CoQ10 deficiency, a rare autosomal recessive mitochondrial disorder (MD) with heterogeneous clinical phenotypes. Although the major function of CoQ10 is to serve as an electron transfer molecule, it also has multiple metabolic functions, which can result in secondary CoQ10 deficiency. Five main clinical phenotypes are associated with CoQ10 deficiency, although distinct genotype-phenotype associations are still lacking owing to the limited molecular genetic diagnosis of most reported CoQ10 deficiency cases. In a South African patient cohort, a correlation was found between reduced CoQ10 levels in muscle tissue and deficient CII + III RC enzyme activities, the current indicators of potential CoQ10 deficiency. The aim of the study was therefore to identify nuclear-encoded mutations in genes associated with CoQ10 deficiencies in a cohort of South African patients diagnosed with respiratory chain deficiencies (RCDs). A high-throughput target enrichment strategy was performed to identify previously reported and/or possibly novel CoQ10-associated disease-causing variants using Ion Torrent next generation sequencing (NGS) and an in-house developed bioinformatics pipeline. The data obtained were compared with the clinical presentations of the patients to interpret the identified variants considered to be possibly pathogenic. Targeted genes associated with primary and secondary CoQ10 deficiency were successfully sequenced in 24 patients, identifying 16 possible disease-causing variants. 
Of these variants, three compound heterozygous variants were identified in three patients, in the genes ETFDH, COQ6 and COQ7, and were considered to be pathogenic according to the available data. Further validation supported the pathogenicity of at least two of these variants (ETFDH and COQ6). In conclusion, this study contributed to a better understanding of the aetiology of disease in a South African cohort of patients diagnosed with MDs. It also highlighted the valuable role of NGS for such investigations and identified areas in the biochemical and molecular diagnostic strategy where improvements could be made in the future. / MSc (Biochemistry), North-West University, Potchefstroom Campus, 2015 CoenzymeQ10 CoQ10 deficiency Mitochondrial disorder Respiratory chain deficiencies Next generation sequencing Bioinformatics
