  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
811

Super-resolution 3D dot localization in Escherichia coli using a convolutional neural network

Hennig, Patrick January 2022 (has links)
No description available.
812

Little big data - extending plastid genome databases using marine planktonic metagenomes

Huber, Thomas M. January 2022 (has links)
No description available.
813

Bioinformatics analysis on the drug design supporting systems

Guszpit, Emilia January 2023 (has links)
This research project investigates the interactions of staurosporine, a potent kinase inhibitor, with 11 ligands, highlighting its role in drug design and bioinformatics. Focusing on the selectivity and promiscuity of staurosporine in binding to protein kinases, the study employs the MANORAA database for data extraction. A Python script was developed to automate the retrieval and organisation of data, particularly targeting ligands with known affinity numbers. This method efficiently structures complex biochemical information into a comprehensible format. The research culminated in the creation of a website that presents detailed data on staurosporine’s molecular interactions and binding affinities. This website can serve as a valuable tool for researchers, offering insights into the drug's mechanism of action and its implications in therapeutic applications. The study methods included Python scripting for data handling and API integration for efficient data extraction, emphasising the importance of computational tools in bioinformatics. The findings reveal significant insights into the binding dynamics of staurosporine, identifying conserved and variable regions in kinase binding pockets that influence drug efficacy. These results contribute to a deeper understanding of staurosporine's broad spectrum of kinase inhibition and provide a model for future research in drug-protein interaction analysis. This project underscores the significance of accessible data presentation in bioinformatics, facilitating advanced research and development in drug design.
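The data-organisation step described above can be sketched as follows. The record layout and field names (`ligand`, `affinity_nM`) are illustrative assumptions, not the actual MANORAA schema:

```python
# Hypothetical sketch of organising ligand-affinity records (as a
# database API might return them) into a display-ready table: records
# without a measured affinity are dropped and the rest sorted by
# potency. Field names are assumptions, not the real MANORAA schema.

def organise_ligands(records):
    """Keep only ligands with a known affinity and sort by potency."""
    known = [r for r in records if r.get("affinity_nM") is not None]
    return sorted(known, key=lambda r: r["affinity_nM"])

if __name__ == "__main__":
    raw = [
        {"ligand": "B", "affinity_nM": 150.0},
        {"ligand": "A", "affinity_nM": 2.3},
        {"ligand": "C", "affinity_nM": None},  # no measured affinity
    ]
    for row in organise_ligands(raw):
        print(f'{row["ligand"]}\t{row["affinity_nM"]} nM')
```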
814

Redescription Mining: Algorithms and Applications in Bioinformatics

Kumar, Deept 10 May 2007 (has links)
Scientific data mining purports to extract useful knowledge from massive datasets curated through computational science efforts, e.g., in bioinformatics, cosmology, geographic sciences, and computational chemistry. In the recent past, we have witnessed major transformations of these applied sciences into data-driven endeavors. In particular, scientists are now faced with an overload of vocabularies for describing domain entities. All of these vocabularies offer alternative and mostly complementary (sometimes, even contradictory) ways to organize information and each vocabulary provides a different perspective into the problem being studied. To further knowledge discovery, computational scientists need tools to help uniformly reason across vocabularies, integrate multiple forms of characterizing datasets, and situate knowledge gained from one study in terms of others. This dissertation defines a new pattern class called redescriptions that provides high level capabilities for reasoning across domain vocabularies. A redescription is a shift of vocabulary, or a different way of communicating the same information; redescription mining finds concerted sets of objects that can be defined in (at least) two ways using given descriptors. We present the CARTwheels algorithm for mining redescriptions by exploiting equivalences of partitions induced by distinct descriptor classes as well as applications of CARTwheels to several bioinformatics datasets. We then outline how we can build more complex data mining operations by cascading redescriptions to realize a story, leading to a new data mining capability called storytelling. Besides applications to characterizing gene sets, we showcase its uses in other datasets as well. 
Finally, we extend the core CARTwheels algorithm by introducing a theoretical framework, based on partitions, to systematically explore redescription space; generalizing from mining redescriptions (and stories) within a single domain to relating descriptors across different domains, to support complex relational data mining scenarios; and exploiting structure of the underlying descriptor space to yield more effective algorithms for specific classes of datasets. / Ph. D.
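As a toy illustration of the redescription idea (not the CARTwheels algorithm itself), two descriptor vocabularies can be scanned for descriptor pairs whose object sets coincide; all names and sets below are made up:

```python
# Toy redescription search: each vocabulary maps a descriptor name to
# the set of objects it covers. A redescription is a pair of
# descriptors, one from each vocabulary, whose object sets (nearly)
# coincide — two ways of "saying" the same object set. This is an
# illustrative sketch, not the CARTwheels algorithm.

def mine_redescriptions(vocab_a, vocab_b, min_jaccard=1.0):
    """Return (descriptor_a, descriptor_b, jaccard) triples above a threshold."""
    hits = []
    for name_a, set_a in vocab_a.items():
        for name_b, set_b in vocab_b.items():
            union = set_a | set_b
            if not union:
                continue
            jac = len(set_a & set_b) / len(union)
            if jac >= min_jaccard:
                hits.append((name_a, name_b, jac))
    return hits

if __name__ == "__main__":
    go_terms = {"GO:stress": {"g1", "g2", "g3"}}
    pathways = {"pw:heat_shock": {"g1", "g2", "g3"}, "pw:cycle": {"g4"}}
    print(mine_redescriptions(go_terms, pathways))
```

Lowering `min_jaccard` admits approximate redescriptions, which is closer to how mined patterns behave on noisy biological data.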
815

Assessing the Role of Clusters Derived from Large Sequence Similarity Networks for Gene Function Predictions

Vora, Parth Harish 29 May 2020 (has links)
Large scale genomic sequencing efforts have resulted in a massive inflow of raw sequence data. This raw data, when appropriately processed and analyzed, can provide insight to a trained biologist and aid in hypothesis-driven research. Given the time and resource requirements necessary for biological experiments, computational predictions of gene functions can aid in reducing a large list of candidate genes to a few promising targets. Various computational solutions have been proposed and developed for gene function prediction. These solutions utilize various forms of data, such as DNA/RNA/protein sequences, protein structures, interaction networks, literature mining, and a combination of these data sources. However, these methods do not always produce precise results as the underlying data sets used for training or modeling are quite sparse. We developed and used a massive sequence similarity network built over 108 million known protein sequences to aid in protein function prediction. Predictions are made through the alignment of query sequences to representative sequences for a given cluster derived from the massive sequence similarity network. Derived clusters aggregate information (particularly that from the Gene Ontology) from respective members, which we then consolidate through a novel weighted path method. We evaluate our method on four holdout datasets using CAFA evaluation metrics. Our results suggest that clustering significantly reduces the time and memory requirements, with a marginal impact on predictive power. At lower sequence similarity thresholds, our method outperforms other gold standard methods. / Master of Science / We often think of a protein as a nutritional requirement. However, proteins are far more than just food; they play countless and unappreciated roles in facilitating life.
These roles range from transporting nutrients in the body, synthesizing hormones, and functioning as enzymes that expedite chemical reactions, to serving as the scaffold for cells and tissues and protecting the body against foreign pathogens. On a molecular level, each protein is made up of chains of 20 different amino acids, just like a chain of beads, that are then folded to create a 3-dimensional structure. The variations in the ordering of amino acids result in different types of proteins. There are millions of genes across known life, and they perform different functions when translated into proteins. Nature has given us many proteins with interesting properties, and the low cost of sequencing their precursors (DNA) has resulted in large amounts of sequence data that is not yet associated with a function. Biological experiments to determine the function of a protein can be time-consuming and expensive. We built a massive network encompassing 108 million protein sequences based on sequence similarity. This ensures that we make use of as much data as possible to make better predictions. Specifically, our work focuses on utilizing this information about similar proteins to aid in predicting the functions of a protein given its sequence. It is based on the idea of guilt by association: if two proteins are similar in sequence, they perform similar functions. We show that using computationally efficient methods and large datasets, one can achieve fast and highly precise predictions.
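The guilt-by-association transfer described above might be sketched as follows; the scoring scheme is a simplified assumption, not the thesis's weighted path method over the Gene Ontology graph:

```python
# Hedged sketch of guilt-by-association function transfer: GO terms of
# cluster members are aggregated into per-term scores for the query,
# weighted by the query's similarity to the cluster representative.
# The scoring scheme here is illustrative only.
from collections import defaultdict

def predict_go_terms(similarity, member_annotations):
    """similarity: query-vs-representative identity in [0, 1];
    member_annotations: list of GO-term sets, one per cluster member."""
    scores = defaultdict(float)
    for terms in member_annotations:
        for term in terms:
            scores[term] += similarity / len(member_annotations)
    return dict(scores)

if __name__ == "__main__":
    # A term shared by all members scores the full similarity (0.8);
    # a term in half the members scores half of it (0.4).
    print(predict_go_terms(0.8, [{"GO:0003824"}, {"GO:0003824", "GO:0005515"}]))
```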
816

B-cell transcriptomic analyses in patients with CVID

Pousas Navarro, Anna Carolina January 2024 (has links)
Common variable immunodeficiency (CVID) encompasses a heterogeneous group of inborn errors of the immune system characterized by a failure in antibody production. The defective immune response in CVID subjects results in poor clearance of infectious agents and higher susceptibility to severe diseases, mainly caused by encapsulated extracellular bacteria in the respiratory tract. On top of that, autoimmunity and inflammatory complications are common clinical manifestations, leading to important impairment in the overall health status and lifespan of these patients. This study aimed to understand, at the genetic level, the particularities of the immune pathways in a CVID cohort in contrast to a healthy control group. To that end, transcriptomic analyses of bulk RNA-sequencing data from in vitro activated B cells were performed. Results from differentially expressed gene and enrichment analyses involving both naïve and memory cells unveiled the peculiarities of the expression network in the pathological B-cell environment. For instance, pathways involving mTORC1, MYC, E2F, and IL-2/STAT5 signaling were downregulated in activated naïve B cells from the CVID group, together with lower expression of genes related to metabolic processes. Nevertheless, the activated memory B cells revealed an opposite pattern: enrichment in genes related to cell metabolism, as well as enhancement of mTORC1, p53, STAT3, and MYC targets. Markers of inflammation such as type I interferons and complement, immunosurveillance, and the cellular stress response, the latter represented by processes relating to unfolded protein responses, apoptosis, and autophagy, were found over-represented in all activated B cells, naïve and memory, in the CVID group.
In summary, these results could indicate major problems in the germinal center reactions of secondary lymphoid organs, causing a defective transition from naïve to memory and long-lived plasma cells in patients with CVID, but further studies are needed to validate these assumptions. Finally, since epigenetic regulators were also found to be more highly expressed in the disease group, the genetic signature alone may not determine the disease's fate. If future research could determine how environmental factors influence the disease phenotype, a personalized and perhaps curative approach for these patients could become a reality.
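A minimal sketch of the kind of over-representation test that typically underlies such enrichment analyses (the abstract does not specify the exact test used) is a hypergeometric tail probability:

```python
# Hypergeometric over-representation test: the probability of drawing
# k or more pathway genes among n differentially expressed genes, out
# of N genes total of which K belong to the pathway. The gene numbers
# in the demo are made up. This is a generic sketch, not the thesis's
# specific enrichment pipeline.
from math import comb

def hypergeom_pval(N, K, n, k):
    """P(X >= k) when drawing n genes from N, of which K are in the pathway."""
    return sum(comb(K, i) * comb(N - K, n - i)
               for i in range(k, min(K, n) + 1)) / comb(N, n)

if __name__ == "__main__":
    # e.g. 10 of 500 DE genes fall in a 100-gene pathway (20,000-gene genome)
    print(hypergeom_pval(20000, 100, 500, 10))
```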
817

A framework for single-cell morphological data

Frey, Benjamin January 2024 (has links)
In this thesis, I present a comprehensive framework for the analysis of single-cell (SC) morphological data, specifically focusing on the Cell Painting assay. SC technologies have revolutionized biological research by enabling high-throughput and high-content screening at the cellular level. Here, the computational challenges and opportunities associated with SC morphological profiling are explored, leveraging both traditional tools like CellProfiler and advanced deep learning methods such as DeepProfiler. This study investigates the potential of SC morphological data to uncover cellular heterogeneity and identify distinct subpopulations within complex datasets. To attain this goal, various feature extraction, normalization, and filtering techniques are employed, followed by unsupervised and supervised learning methods to analyze the extracted features. The results demonstrate the effectiveness of the deep-learning model DeepProfiler in capturing intricate cellular features, outperforming the traditional method CellProfiler in most tasks, including mechanism of action predictions, by as much as 30% macro F1. This work also highlights the importance of efficient computational resources and robust data processing pipelines to handle the large-scale datasets typically generated in SC research. Additionally, I propose a combination of metrics, namely e-distance and the SC grit score, for evaluating perturbation strength and filtering morphological data. These metrics, in conjunction with advanced analysis tools such as UMAP and the introduced CellViewer, enhance the interpretability of results, offering a deeper insight into the morphological changes induced by various treatments and their subsequent biological implications.
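The e-distance mentioned above can be sketched roughly as follows; whether plain or squared Euclidean distances are used varies between formulations, so the choice below is an assumption:

```python
# Hedged sketch of the e-distance between two cell populations in
# morphological feature space: twice the mean between-group distance
# minus the two mean within-group distances. Larger values indicate a
# stronger separation of perturbed cells from controls.
from itertools import combinations, product
from math import dist

def mean_between(a, b):
    return sum(dist(x, y) for x, y in product(a, b)) / (len(a) * len(b))

def mean_within(a):
    pairs = list(combinations(a, 2))
    return sum(dist(x, y) for x, y in pairs) / len(pairs) if pairs else 0.0

def e_distance(a, b):
    return 2 * mean_between(a, b) - mean_within(a) - mean_within(b)

if __name__ == "__main__":
    controls = [(0.0, 0.0), (0.0, 0.0)]
    perturbed = [(10.0, 0.0), (10.0, 0.0)]
    print(e_distance(controls, perturbed))
```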
818

Towards More Robust Metagenome Profiling: Modeling and Analysis

Pusadkar, Vaidehi 07 1900 (has links)
With the large-scale metagenome sequencing data produced today, alignment-free metagenomic profiling approaches have demonstrated the effectiveness of Markov models in addressing the limitations of alignment-based techniques, particularly in handling unclassified reads. The development of POSMM (Python Optimized Standard Markov Model), employing the standard Markov model (SMM) algorithm, initially showcased competitive performance compared to tools such as Kraken2. However, when subjected to the simulated damage present in ancient metagenomic data, shortcomings emerged, leading to false positives or misclassified sequences that compromised overall classification accuracy. To address this problem, we developed a segmental genome model (SGM) algorithm based on generating an ensemble of models representing distinct classes of DNA segments in a genome. SGM incorporates a recursive segmentation and clustering approach to segregate regions of distinct composition in a microbial genome. An ensemble of higher-order Markov models is trained on the DNA clusters generated for each genome. A database of genome models, with each genome represented by multiple Markov models, is then queried to infer the origin of reads from a metagenome. SGM was benchmarked using diverse synthetic metagenome datasets of varying composition, read lengths, and error profiles. The comparative assessment showed that SGM consistently outperformed SMM. SGM brings significant advances in alignment-free profiling, offering a promising new avenue for metagenomic exploration through its integration into the next version of POSMM. Furthermore, leveraging the power of integrating alignment-free and alignment-based approaches, and highlighting the versatility and practicality of these methods in addressing critical public health challenges, we developed a statistical analysis and machine learning pipeline to identify candidate microbes associated with COVID-19.
This involved a meta-analysis of whole genome sequencing data from COVID-19 patients' samples and predictive modeling to discern distinctive microbial features. We improve and extend alignment-free metagenome profiling to raise the bar for profiling complex real-world samples.
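The core idea of alignment-free Markov-model read classification can be sketched as below; POSMM and SGM add genome segmentation, model ensembles, and proper smoothing beyond this toy version:

```python
# Toy alignment-free classifier: estimate fixed-order Markov
# transition probabilities from each reference genome, score a read by
# log-likelihood under each model, and assign it to the best-scoring
# genome. The unseen-transition floor is a crude stand-in for real
# smoothing.
from collections import defaultdict
from math import log

def train(seq, k=2):
    """Estimate order-k transition probabilities from a reference sequence."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(seq) - k):
        counts[seq[i:i + k]][seq[i + k]] += 1
    model = {}
    for ctx, nxt in counts.items():
        total = sum(nxt.values())
        model[ctx] = {base: n / total for base, n in nxt.items()}
    return model

def loglik(read, model, k=2, floor=1e-6):
    """Log-likelihood of a read, with a floor for unseen transitions."""
    return sum(log(model.get(read[i:i + k], {}).get(read[i + k], floor))
               for i in range(len(read) - k))

def classify(read, models):
    """Assign the read to the genome whose model scores it highest."""
    return max(models, key=lambda name: loglik(read, models[name]))
```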
819

Optimizing analysis pipelines for improved variant discovery

Highnam, Gareth Wei An 17 April 2014 (has links)
In modern genomics, all experiments begin data collection with sequencing and downstream alignment or assembly processing. As such, the development of reliable sequencing pipelines is hugely important as a foundation for any future analysis of that data. While much existing work has been done on enhancing the throughput and computational performance of such pipelines, there is still the question of accuracy. The rift in knowledge between speed and accuracy can be attributed to the more conceptually complex nature of what constitutes the measurement of accuracy. Unlike simply parsing logs of memory usage and CPU hours, accuracy requires experimental validation. Subsets of accuracy also arise when assessing alignment or variation around particular genomic features such as indels, copy number variants (CNVs), or microsatellite repeats. This work develops accuracy measurements for read alignment and variant calls, allowing the optimization of sequencing pipelines at all stages. The underlying hypothesis, then, is that different sequencing platforms and analysis software can be distinguished from each other in accuracy by both sample and the genomic variation of interest. As the term accuracy suggests, measuring alignment and variation recall requires comparison against a truth set, for which read library simulations and high-quality data from the Genome in a Bottle Consortium and the Illumina Omni array serve as references. In exploring the hypothesis, the measurements are built into a community resource to crowdsource the creation of a benchmarking repository for pipeline comparison. Results from pipelines promoted by this computational model are then wet-lab validated, with support for a hierarchy of pipeline performance. In particular, the construction of an accurate pipeline for genotyping microsatellite repeats is investigated, which is then used to create a database of human microsatellites.
Progress in this area is vital for the growth of sequencing in both clinical and research settings. For genomics research to fully translate to the bedside, the boom of new technology must be controlled by rational metrics and industry standardization. This project will address both of these issues, as well as contribute to the understanding of human microsatellite variation. / Ph. D.
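The comparison against a truth set can be sketched minimally as below; real benchmarks (e.g. against Genome in a Bottle) additionally normalise variant representation and restrict evaluation to high-confidence regions:

```python
# Minimal variant-call benchmarking sketch: calls and truth are sets
# of (chromosome, position, ref, alt) tuples; exact matches count as
# true positives, from which precision and recall follow. Real tools
# must also reconcile equivalent representations of the same variant.
def benchmark(calls, truth):
    """Return (precision, recall) of a call set against a truth set."""
    calls, truth = set(calls), set(truth)
    tp = len(calls & truth)
    precision = tp / len(calls) if calls else 0.0
    recall = tp / len(truth) if truth else 0.0
    return precision, recall

if __name__ == "__main__":
    calls = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T")}
    truth = {("chr1", 100, "A", "G"), ("chr1", 300, "G", "A")}
    print(benchmark(calls, truth))  # one shared call out of two on each side
```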
820

Exploring the performance of Conformal Prediction on Chemical Properties and Its Influencing Factors

Chen, Yuhang January 2024 (has links)
Machine learning has gained much attention and has been extended to the field of drug discovery. However, due to the uncertainties of the dataset, predictions should be quantitatively analyzed. Conformal prediction is a powerful method for quantifying these uncertainties, producing, for a predefined confidence level, an interval within which the true target is anticipated to fall. This paper aims to explore the effects of different chemical representations of SMILES structures used for training (chemical descriptors, Morgan fingerprints), machine learning algorithms (k-nearest neighbor, support vector machine, random forest, extreme gradient boosting, and artificial neural network), and different normalization methods (k-nearest neighbor, Mondrian regression) on the conformal prediction results. We find that Morgan fingerprints outperform chemical descriptors, and that Mondrian regression outperforms k-nearest neighbor normalization at one or several confidence levels in terms of coverage and the mean, median, and standard deviation of the output interval. None of the investigated machine learning methods clearly outperforms the others. Conformal predictive systems, an alternative form of conformal prediction, were also investigated to explore their usefulness in drug discovery.
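The interval construction behind conformal regression can be sketched as split (inductive) conformal prediction; the normalised variants discussed above (k-nearest neighbor, Mondrian) rescale each residual by a per-example difficulty estimate, which this sketch omits:

```python
# Split conformal regression sketch: absolute residuals on a held-out
# calibration set yield a quantile q; every point prediction is then
# widened to [y_pred - q, y_pred + q], giving ~(1 - alpha) coverage
# under exchangeability.
import math

def conformal_interval(cal_residuals, y_pred, alpha=0.1):
    """Return a (lo, hi) interval with approximately (1 - alpha) coverage."""
    n = len(cal_residuals)
    rank = math.ceil((n + 1) * (1 - alpha))  # conformal quantile rank
    q = sorted(cal_residuals)[min(rank, n) - 1]
    return y_pred - q, y_pred + q

if __name__ == "__main__":
    residuals = [0.1 * i for i in range(1, 11)]  # |y_true - y_pred| on calibration set
    print(conformal_interval(residuals, 5.0, alpha=0.1))
```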
