1

Reconstruction and Local Recovery of Data from Synchronization Errors

Minshen Zhu (15334783) 21 April 2023

In this thesis we study the complexity of data recovery from synchronization errors, namely insertion and deletion (insdel) errors.

Insdel Locally Decodable Codes (Insdel LDCs) are error-correcting codes that admit super-efficient decoding algorithms even in the presence of many insdel errors. The study of such codes for Hamming errors has spanned several decades, whereas work on the insdel analogue had amounted to only a few papers before our work. This work initiates a systematic study of insdel LDCs, seeking to bridge this gap through designing codes and proving limitations. Our upper bounds essentially match those for Hamming LDCs in important ranges of parameters, even though insdel LDCs are more general than Hamming LDCs. Our main results are lower bounds that are exponentially stronger than the ones inherited from the Hamming LDCs. These results also have implications for the well-studied variant of relaxed LDCs. For this variant, besides showing the first results in the insdel setting, we also answer an open question for the Hamming variant by showing a strong lower bound.

In the trace reconstruction problem, the goal is to recover an unknown source string x ∈ {0,1}^n from random traces, which are obtained by hitting the source string with random deletions/insertions at a fixed rate. Mean-based algorithms are a class of reconstruction algorithms whose outputs depend only on the empirical estimates of individual bits. The number of traces needed for mean-based trace reconstruction has already been settled. We further study the performance of mean-based algorithms in a scenario where one wants to distinguish between two source strings parameterized by their edit distance, and we also provide explicit constructions of strings that are hard to distinguish. We further establish an equivalence to the Prouhet-Tarry-Escott problem from number theory, which ends up being an obstacle to constructing explicit hard instances against mean-based algorithms.
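
Illustrative aside (not from the thesis): the "mean-based" restriction can be made concrete in a few lines of Python. The sketch below assumes a deletion-only channel at rate q; a mean-based algorithm is only allowed to see the per-position empirical bit averages of the traces.

```python
import random

def random_trace(x, q=0.1):
    """Deletion channel: each bit of the source string survives independently with probability 1 - q."""
    return [b for b in x if random.random() > q]

def mean_statistics(x, num_traces=10_000, q=0.1):
    """Per-position empirical bit averages over zero-padded traces --
    the only information a mean-based reconstruction algorithm may use."""
    sums = [0.0] * len(x)
    for _ in range(num_traces):
        for j, b in enumerate(random_trace(x, q)):
            sums[j] += b
    return [s / num_traces for s in sums]

if __name__ == "__main__":
    x = [1, 0, 1, 1, 0, 0, 1, 0]   # toy source string
    print(mean_statistics(x))
```
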
2

Flexible and Data-Driven Modeling of 3D Protein Complex Structures

Charles W Christoffer (17482395) 30 November 2023

Proteins and their interactions with each other, with nucleic acids, and with other molecules are foundational to all known forms of life. The three-dimensional structures of these interactions are an essential component of a comprehensive understanding of how they function. Molecular-biological hypothesis formulation and rational drug design are both often predicated on a particular structure model of the molecule or complex of interest. While experimental methods capable of determining atomic-detail structures of molecules and complexes exist, such as the popular X-ray crystallography and cryo-electron microscopy, these methods require both laborious sample preparation and expensive instruments with limited throughput. Computational methods of predicting complex structures are therefore desirable if they can enable cheap, high-throughput virtual screening of the space of biological hypotheses. Many common biomolecular contexts have largely been blind spots for predictive modeling of complex structures. In this direction, docking methods are proposed to address extreme conformational change, nonuniform environments, and distance-geometric priors. Flex-LZerD deforms a flexible protein using a novel fitting procedure based on iterated normal mode decomposition and was shown to construct accurate complex models even when an initial input subunit structure exhibits extreme conformational differences from its bound state. Mem-LZerD constrains the docking search space by augmenting the geometric hashing data structure at the core of the LZerD algorithm and enabled membrane protein complexes to be efficiently and accurately modeled. Finally, atomic distance-based approaches developed during modeling competitions and collaborations with wet lab biologists were shown to effectively integrate domain knowledge into complex modeling pipelines.
3

HIGHLY ACCURATE MACROMOLECULAR STRUCTURE COMPLEX DETECTION, DETERMINATION AND EVALUATION BY DEEP LEARNING

Xiao Wang (17405185) 17 November 2023

In life sciences, the determination of macromolecular structures and their functions, particularly proteins and protein complexes, is of paramount importance, as these molecules play critical roles within cells. The specific physical interactions of macromolecules govern molecular and cellular functions, making the 3D structure elucidation of these entities essential for comprehending the mechanisms underlying life processes, diseases, and drug discovery. Cryo-electron microscopy (cryo-EM) has emerged as a promising experimental technique for obtaining 3D macromolecular structures. In the course of my research, I proposed CryoREAD, an innovative AI-based method for de novo DNA/RNA structure modeling. This novel approach represents the first fully automated solution for DNA/RNA structure modeling from cryo-EM maps at near-atomic resolution. However, as the resolution decreases, structure modeling becomes significantly more challenging. To address this challenge, I introduced Emap2sec+, a 3D deep convolutional neural network designed to identify protein secondary structures, RNA, and DNA information from cryo-EM maps at intermediate resolutions ranging from 5-10 Å. Additionally, I presented Alpha-EM-Multimer, a groundbreaking method for automatically building full protein complexes from cryo-EM maps at intermediate resolution. Alpha-EM-Multimer employs a diffusion model to trace the protein backbone and subsequently fits the AlphaFold-predicted single-chain structures to construct the complete protein complex. Notably, this method stands as the first to enable the modeling of protein complexes with more than 10,000 residues from cryo-EM maps at intermediate resolution, achieving an average TM-score of predicted protein complexes above 0.8, which closely approximates the native structure. Furthermore, I addressed the recognition of local structural errors in predicted and experimental protein structures by proposing DAQ, an evaluation approach for experimental protein structure quality that utilizes detection probabilities derived from cryo-EM maps via a pretrained multi-task neural network. In the pursuit of evaluating protein complexes generated through computational methods, I developed GNN-DOVE and DOVE, leveraging convolutional neural networks and graph neural networks to assess the accuracy of predicted protein complex structures. These advancements in cryo-EM-based structural modeling and evaluation methodologies hold significant promise for advancing our understanding of complex macromolecular systems and their biological implications.
4

Systems Modeling of host microbiome interactions in Inflammatory Bowel Diseases

Javier E Munoz (18431688) 24 April 2024

Crohn's disease and ulcerative colitis are chronic inflammatory bowel diseases (IBD) with a rising global prevalence, influenced by clinical and demographic factors. The pathogenesis of IBD involves complex interactions between gut microbiome dysbiosis, epithelial cell barrier disruption, and immune hyperactivity, which are poorly understood. This necessitates the development of novel approaches to integrate and model multiple clinical and molecular data modalities from patients, animal models, and in-vitro systems to discover effective biomarkers for disease progression and drug response. As sequencing technologies advance, the amount of molecular and compositional data from paired measurements of host and microbiome systems is exploding. While it has become routine to generate such rich, deep datasets, tools for their interpretation lag behind. Here, I present a computational framework for integrative modeling of microbiome multi-omics data titled Latent Interacting Variable Effects (LIVE) modeling. LIVE combines various types of microbiome multi-omics data using single-omic latent variables (LVs) into a structured meta-model to determine the most predictive combinations of multi-omics features for an outcome, patient group, or phenotype. I implemented and tested LIVE using publicly available metagenomic and metabolomic data sets from patients with Crohn's disease (CD) and ulcerative colitis (UC) in the PRISM and LLDeep cohorts. The findings show that LIVE reduced the number of feature interactions from the original datasets for CD to tractable numbers and facilitated prioritization of biological associations between microbes, metabolites, enzymes, clinical variables, and a disease status outcome. LIVE modeling makes a distinct and complementary contribution to current methods for integrating microbiome data to predict IBD status because of its flexibility to adapt to different types of microbiome multi-omics data, its scalability to large and small cohort studies via reliance on latent variables and dimensionality reduction, and the intuitive interpretability of the meta-model integrating -omic data types.

A novel application of the LIVE modeling framework addressed sex-based differences in UC. Men are 20% more likely to develop this condition and 60% more likely to progress to colitis-associated cancer compared to women. A possible explanation for this observation is differences in estrogen signaling between men and women, in which estrogen signaling may be protective against UC. Extracting causal insights into how gut microbes and metabolites regulate host estrogen receptor β (ERβ) signaling can facilitate the study of the gut microbiome's effects on ERβ's protective role against UC. Supervised LIVE models ERβ signaling using high-dimensional gut microbiome data while controlling for clinical covariates such as sex and disease status. LIVE models predicted an inhibitory effect on ER-UP and ER-DOWN signaling activities by pairs of gut microbiome features, generating a novel catalog of metabolites, microbial species, and their interactions capable of modulating ER. Two strongly positively correlated gut microbiome features, Ruminococcus gnavus with acesulfame and Eubacterium rectale with 4-methylcatechol, were prioritized as suppressors of ER-UP and ER-DOWN signaling activities.

An in-vitro experimental validation roadmap is proposed to study the synergistic relationships between metabolite and microbiota suppressors of ERβ signaling in the context of UC. Two in-vitro systems, HT-29 female colon cancer cells and female epithelial gut organoids, are described to evaluate the effect of the gut microbiome on ERβ signaling. A detailed experimental design is described for each system, including the selection of doses, treatments, metrics, potential interpretations, and limitations. This experimental roadmap aims to compare experimental conditions for studying the inhibitory effects of the gut microbiome on ERβ signaling and how they could elevate or reduce the risk of developing UC. The intuitive interpretability of the meta-model integrating -omic data types, in conjunction with the presented experimental validation roadmap, aims to transform an artificial intelligence-generated big-data hypothesis into testable experimental predictions.
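
To make the "single-omic latent variables feeding a structured meta-model" idea concrete, here is a minimal sketch under assumptions of my own choosing (PCA-derived latent variables and a logistic meta-model); the actual LIVE formulation may use different latent-variable and meta-model choices.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression

def fit_live_like_model(omic_blocks, y, n_lv=5):
    """Compress each omic block (samples x features) into latent variables,
    then fit a single meta-model on the stacked LVs to predict a phenotype y."""
    pcas, lv_blocks = {}, []
    for name, X in omic_blocks.items():
        pca = PCA(n_components=n_lv).fit(X)
        pcas[name] = pca
        lv_blocks.append(pca.transform(X))
    Z = np.hstack(lv_blocks)                       # samples x (n_omics * n_lv)
    meta = LogisticRegression(max_iter=5000).fit(Z, y)
    return pcas, meta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    blocks = {"metagenomics": rng.normal(size=(60, 200)),
              "metabolomics": rng.normal(size=(60, 120))}
    y = rng.integers(0, 2, size=60)                # e.g. disease status labels
    pcas, meta = fit_live_like_model(blocks, y)
    print(meta.coef_.shape)                        # (1, 10) -- one weight per latent variable
```
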
5

A Parallel Computing Approach for Identifying Retinitis Pigmentosa Modifiers in Drosophila Using Eye Size and Gene Expression Data

Chawin Metah (15361576) 29 April 2023

For many years, researchers have developed ways to diagnose degenerative disease in the retina by utilizing multiple gene analysis techniques. Retinitis pigmentosa (RP) can cause either partial or total blindness in adults. For that reason, it is crucial to find a way to pinpoint the causes in order to develop a proper medication or treatment. One of the common methods is genome-wide analysis (GWA). However, it cannot fully identify the genes that are indirectly related to changes in eye size. In this research, RNA sequencing (RNA-seq) analysis is used to link the phenotype to the genotype, creating a pool of candidate genes that might be associated with RP. This will support future research in finding a therapy or treatment for the disease in human adults.

Using the Drosophila Genetic Reference Panel (DGRP), a genetic reference panel of the fruit fly, two types of datasets are involved in this analysis: eye-size data and gene expression data with two replicates for each strain. This allows us to create a phenotype-genotype map. In other words, we are trying to trace the genes (genotype) associated with the RP disease model, guided by comparing eye sizes (phenotype). The basic idea of the algorithm is to discover the best replicate combination that maximizes the correlation between gene expression and eye size. Since there are 2^N possible replicate combinations, where N is the number of selected strains, the original sequential implementation was computationally intensive.

The original idea of finding the best replicate combination was proposed by Nguyen et al. (2022). In this research, however, we restructured the algorithm to distribute the task of finding the best replicate combination and run it in parallel. The implementation was done in the R programming language, utilizing the doParallel and foreach packages, and is able to execute on a multicore machine. The program was tested on both a laptop and a server, and the experimental results showed an outstanding improvement in execution time. For instance, while using 32 processes, the results showed up to a 95% reduction in execution time compared with the sequential version of the code. Furthermore, with the increased computational capability, we were able to explore and analyze more extreme eye-size lines using three eye-size datasets representing different phenotype models. This further improved the accuracy of the results, with the top candidate genes from all cases showing connections to RP.
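
The thesis implements the 2^N search in R with doParallel and foreach; the sketch below shows the same divide-and-conquer idea in Python (an analogue for illustration, not the authors' code): the replicate combinations are chunked across worker processes and each chunk is scored independently against the eye-size phenotype.

```python
import numpy as np
from itertools import product
from concurrent.futures import ProcessPoolExecutor

def score_chunk(args):
    """Score one chunk of replicate combinations for a single gene; return (best |r|, best combo)."""
    expr, eye_size, combos = args              # expr: (n_strains, 2), one column per replicate
    best = (-1.0, None)
    strain_idx = np.arange(expr.shape[0])
    for combo in combos:
        profile = expr[strain_idx, combo]      # pick one replicate per strain
        r = abs(np.corrcoef(profile, eye_size)[0, 1])
        if r > best[0]:
            best = (r, combo)
    return best

def best_replicate_combination(expr, eye_size, workers=8):
    """Exhaustive search over all 2^N replicate choices, split evenly across processes."""
    n = expr.shape[0]
    combos = [np.array(c) for c in product((0, 1), repeat=n)]
    chunks = np.array_split(np.arange(len(combos)), workers)
    jobs = [(expr, eye_size, [combos[i] for i in idx]) for idx in chunks]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return max(pool.map(score_chunk, jobs), key=lambda t: t[0])

if __name__ == "__main__":                      # guard required for process-based parallelism
    rng = np.random.default_rng(0)
    expr = rng.normal(size=(12, 2))             # 12 strains, 2 replicates each -> 4096 combinations
    eye_size = rng.normal(size=12)
    print(best_replicate_combination(expr, eye_size))
```
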
6

EXPLORING THE EFFECTS OF ANCESTRY ON INFERENCE AND IDENTITY USING BIOINFORMATICS

Noah C Herrick (16649334) 02 October 2023

Ancestry is a complex and layered concept, but it must be operationalized for its objective use in genetic studies. Critical decisions in research analyses, clinical practice, and forensic investigations are based on genetic ancestry inference. For example, in genetic association studies for clinical and applied research, investigators may need to isolate one population of interest from a worldwide dataset to avoid false positive results, or in human identification, ancestry inferences can help reveal the identity of unknown DNA evidence by narrowing down a suspect list. Many studies seek to improve ancestry inference for these reasons. The research presented here offers valuable resources for exploring and improving genetic ancestry inference and intelligence toward identity.

First, analysis of 'big data' in genomics is a resource-intensive task that requires optimization. Therefore, this research introduces Iliad, a suite of automated Snakemake workflows developed to give the research community an easy-to-learn, hands-off computational tool for genomic data processing of multiple data formats. Iliad can be installed and run on a Google Cloud Platform remote server instance in less than 20 minutes when using the provided installation code in the ReadTheDocs documentation. The workflows support raw data processing from various genetic data types including microarray, sequence, and compressed alignment data, as well as micro-workflows on variant call format (VCF) files to merge data or lift over variant positions. When compared to a similar workflow, Iliad completed processing one sample's raw paired-end sequence reads into a human-legible VCF file in 7.6 hours, three times faster than the other workflow. This suite of workflows is paramount for building reference population panels from human whole-genome sequence (WGS) data, which is useful in many research studies including imputation, ancestry estimation, and ancestry informative marker (AIM) discovery.

Second, there are persistent challenges in ancestry inference for individuals of the Middle East, especially with the use of AIMs. This research presents a population genomics study pertaining to the Middle East, novel population data from Lebanon (n=190), and an unsupervised genetic clustering approach with WGS data from the 1000 Genomes Project and Human Genome Diversity Project. These efforts at AIM discovery identified two single nucleotide polymorphisms (SNPs) based on their high allelic frequency differences between the Middle East and populations in Eurasia, namely Europe and South/Central Asia. These candidate AIMs were evaluated with the most current and comprehensive AIM panel to date, the VISAGE Enhanced Tool (ET), using an external validation set of Middle Eastern WGS data (n=137). Instead of relying on pre-defined biogeographic ancestry labels to confirm the accuracy of validation sample ancestry inference, this research produced a deep, unsupervised ADMIXTURE analysis on 3,469 worldwide WGS samples with nearly 2 million independent SNPs (r² < 0.1), which provided a genetic "ground truth". This resulted in 136/137 validation samples being classified as Middle Eastern and provided valuable insights into reference samples with varying co-ancestries that ultimately affect the classification of admixed individuals. Novel deep learning methods, specifically variational autoencoders, were introduced as an alternative to PCA for visualizing one hundred percent of the genetic variance found using these AIMs. The resulting ancestry space presents distinct population clusters and remains static for the projection of unknown samples, aiding ancestry inference and human identification.

Third, this research delves into a craniofacial study that advances key intelligence information about physical identity by exploring the relationship between dentition and facial morphology with an advanced phenotyping approach paired with robust dental parameters used in clinical practice. Cone-beam computed tomography (CBCT) imagery was used to analyze the hard and soft tissue of the face at the same time. Low-to-moderate partial correlations were observed in several comparisons of dentition and soft tissue segments. These results included partial correlations between: i) inter-molar width and soft tissue segments nearest the nasal aperture, the lower maxillary sinuses, and a portion of the upper cheek, and ii) lower incisor inclination and soft tissue segments overlapping the mentolabial fold. These results indicate that helpful intelligence information, potentially leading toward identity in forensic investigations, may be present where hard tissue structures are manifested in an observable way as a soft tissue phenotype. This research was a valuable preliminary study that paves the way toward the addition of facial hard tissue structures in combination with external soft tissue phenotypes to advance fundamental facial genetic research. Thus, CBCT scans greatly add to the current facial imagery landscape available for craniofacial research and provide hard and soft tissue data, each with measurable morphological variation among individuals. When paired with genetic association studies and functional biological experiments, this will ultimately lead to a greater understanding of the intricate coordination that takes place in facial morphogenesis and, in turn, guide clinical orthodontists to better treatment modalities with an emphasis on personalized medicine. Lastly, it aids intelligence methodologies when applied within the field of forensic anthropology.
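
As a simple illustration of the allele-frequency-difference criterion behind the AIM discovery described above (a generic sketch, not the study's actual pipeline, which also relied on ADMIXTURE-derived groupings), SNPs can be ranked by delta, the absolute alternate-allele frequency difference between two population groups:

```python
import numpy as np

def rank_candidate_aims(genotypes, labels, pop_a, pop_b, top=10):
    """Rank SNPs by delta = |p_A - p_B|, the absolute alternate-allele frequency
    difference between two population groups.
    genotypes: (samples, snps) array of 0/1/2 alternate-allele counts; labels: per-sample population."""
    labels = np.asarray(labels)
    p_a = genotypes[labels == pop_a].mean(axis=0) / 2.0
    p_b = genotypes[labels == pop_b].mean(axis=0) / 2.0
    delta = np.abs(p_a - p_b)
    order = np.argsort(delta)[::-1]            # highest-delta SNPs first
    return order[:top], delta[order[:top]]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    geno = rng.integers(0, 3, size=(200, 5000))
    labels = ["Middle East"] * 100 + ["Europe"] * 100
    idx, deltas = rank_candidate_aims(geno, labels, "Middle East", "Europe")
    print(idx, deltas)
```
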
7

Two Case Studies on the Use of Public Bioinformatics Data Toward Open-Access Research

Daphne Rae Krutulis (18414876) 20 April 2024

Open-access bioinformatics data enables accessible public health research for a variety of stakeholders, including teachers and low-resourced researchers. This project outlines two case studies utilizing open-access bioinformatics data sets and analysis software as proofs of concept for the types of research projects that can be adapted for workforce development purposes. The first case study is a spatial-temporal analysis of Lyme disease rates in the United States from 2008 to 2020 using freely available data from the United States Department of Agriculture and Centers for Disease Control and Prevention to determine how urbanization and other changes in land use have impacted Lyme disease rates over time. The second case study conducts a pangenome analysis using bacteriophage data from the Actinobacteriophage Database to determine conserved gene regions related to host specificity.
8

Identification and characterization of microRNAs which moderate neutrophil migration and acute inflammation

Alan Y Hsu (8912033) 09 September 2022

Neutrophils are the first cells recruited to an immune stimulus stemming from infection or sterile injury via a mixture of chemoattractant cues. In addition to eliminating pathogens, neutrophils coordinate the overall inflammation by activating and producing inflammatory signals in the tissue while modulating the activation of other immune cells, which in some cases leads to adverse tissue damage. Over-amplified or chronic neutrophil recruitment directly leads to autoimmune diseases including rheumatoid arthritis, diabetes, neurodegenerative diseases, and cancer. Dampening neutrophil recruitment is a strategy to intervene in neutrophil-orchestrated chronic inflammation. Despite intensive research over the past several decades, clinical studies targeting neutrophil migration have been largely unsuccessful, possibly due to the prominent redundancy of adhesion receptors and chemokines. Additional challenges lie in balancing the dampening of detrimental inflammation with preserving immunity. Neutrophils are terminally differentiated cells that are hard to study in cell culture. Mouse models are often used to study hematopoiesis, migration, and chemotaxis of neutrophils but are very labor intensive. To discover novel therapeutic targets that modulate neutrophil migration, we performed a neutrophil-specific microRNA (miRNA) overexpression screen in zebrafish and identified eight miRNAs as potent suppressors of neutrophil migration. We have generated transgenic zebrafish lines that overexpress these candidate miRNAs, in which we recapitulated the reduction in neutrophil motility and chemotaxis toward tissue injury or infection. Among those, we further characterized two miRNAs that had not been reported to regulate neutrophil migration, namely miR-722 and miR-199.

MiR-722 downregulates the transcript level of rac2 through binding to the rac2 3'UTR. Furthermore, miR-722-overexpressing larvae display improved outcomes in both sterile and bacterial systemic models, which correlates with a robust upregulation of anti-inflammatory cytokines in whole larvae and isolated neutrophils. MiR-722 protects zebrafish from lethal lipopolysaccharide challenge. In addition, overexpression of miR-722 reduced chemotaxis of human neutrophil-like cells, indicating that miR-722 is a potential agent to reduce inflammation in humans.

MiR-199 decreases neutrophil chemotaxis in zebrafish and human neutrophil-like cells. Intriguingly, in terminally differentiated neutrophils, miR-199 alters cell cycle-related pathways and directly suppresses cyclin-dependent kinase 2 (cdk2), whose known activity is restricted to cell cycle progression and cell differentiation. Inhibiting Cdk2, but not DNA replication, disrupts cell polarity and chemotaxis of zebrafish neutrophils without inducing cell death. Human neutrophil-like cells deficient in CDK2 fail to polarize and display altered signaling downstream of the formyl peptide receptor. Chemotaxis of primary human neutrophils is also reduced upon CDK2 inhibition. Furthermore, miR-199 overexpression or CDK2 inhibition significantly improves the outcome of lethal systemic inflammation challenges in zebrafish.

In summary, our results reveal previously unknown functions of these miRNAs and provide potential avenues to modulate neutrophil migration, as well as leading to the discovery of novel factors that can regulate this process. We have also discovered a non-classical role of CDK2 in regulating neutrophil migration, which provides directions for alleviating systemic inflammation and a better understanding of neutrophil biology.
9

A Machine Learning Model of Perturb-Seq Data for use in Space Flight Gene Expression Profile Analysis

Liam Fitzpatric Johnson (18437556) 27 April 2024

The genetic perturbations caused by spaceflight on biological systems tend to have a system-wide effect that is often difficult to deconvolute into individual signals with specific points of origin. Single-cell multi-omic data can provide a profile of the perturbational effects but does not necessarily indicate the initial point of interference within a network. The objective of this project is to take advantage of large-scale, genome-wide perturbational (Perturb-Seq) datasets by using them to pre-train a generalist machine learning model that is capable of predicting the effects of unseen perturbations in new data. Perturb-Seq datasets are large libraries of single-cell RNA sequencing data collected from CRISPR knockout screens in cell culture. The advent of generative machine learning algorithms, particularly transformers, makes it an ideal time to re-assess large-scale data libraries in order to grasp cell- and even organism-wide genomic expression motifs. By tailoring an algorithm to learn the downstream effects of genetic perturbations, we present a pre-trained generalist model capable of predicting the effects of multiple perturbations in combination, locating points of origin for perturbations in new datasets, predicting the effects of known perturbations in new datasets, and annotating large-scale network motifs. We demonstrate the utility of this model by identifying key perturbational signatures in RNA sequencing data from spaceflown biological samples from the NASA Open Science Data Repository.
