691

The Development of Image Processing Algorithms in Cryo-EM

Rui Yan (6591728) 15 May 2019 (has links)
Cryo-electron microscopy (cryo-EM) has been established as the leading imaging technique for structural studies of samples ranging from small proteins to whole cells at the molecular level. Advances in cryo-EM now provide unique insights into a wide variety of biological processes in a close-to-native, hydrated state at near-atomic resolution, and developments in computational approaches have contributed significantly to these achievements. This dissertation presents new approaches to image processing problems in cryo-EM, including tilt series alignment evaluation; simultaneous determination of sample thickness, tilt, and electron mean free path based on the Beer-Lambert law; Model-Based Iterative Reconstruction (MBIR) of tomographic data; minimization of objective lens astigmatism during instrument alignment; and defocus- and magnification-dependent astigmatism of TEM images. The overall goal of these methodological developments is to improve 3D reconstruction in cryo-EM and enable more detailed structural characterization.
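For reference, the Beer-Lambert relation mentioned above connects transmitted intensity to specimen thickness and the electron mean free path; a standard form for a specimen tilted by angle α (the dissertation's exact parameterization may differ) is:

```latex
I(\alpha) = I_0 \exp\!\left(-\frac{t}{\Lambda \cos\alpha}\right)
\qquad\Longrightarrow\qquad
t = -\Lambda \cos\alpha \,\ln\frac{I(\alpha)}{I_0}
```

where I_0 is the incident intensity, t the thickness, and Λ the electron mean free path; fitting measured log-intensity ratios across a tilt series allows thickness, tilt, and mean free path to be estimated jointly.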
692

Developing new methods for estimating population divergence times from sequence data

Svärd, Karl January 2021 (has links)
Methods for estimating past demographic events are powerful tools for gaining insight into otherwise hidden population histories. Human genetic data are a valuable resource for this purpose, since patterns of variation carry information about the evolutionary forces and historical events that generated them. There is, however, a lack of methods in the field that use this information to its full extent, which is why this project developed a set of new alternatives for estimating demographic events. The work is based on modifying the purely sequence-based method TTo (Two-Two-outgroup) for estimating the divergence time of two populations. The modifications consist of using beta distributions to model the polymorphic diversity of the ancestral population in order to increase the maximum possible sample size. The project resulted in two implemented methods: TT-beta and a partial variant of MM. TT-beta produced estimates in the same range as TTo and showed that the use of beta distributions has real potential. Only a partial implementation of MM was possible, but it also showed promise, including the ability to use varying sample sizes to estimate demographic quantities.
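To illustrate the central idea — summarizing ancestral polymorphic diversity with a beta distribution so that larger samples stay tractable — here is a minimal sketch; the function and parameters are illustrative and not taken from the thesis:

```python
import numpy as np
from scipy import stats

def derived_allele_count_probs(a, b, n):
    """Probability that k of n sampled lineages carry the derived allele when
    the ancestral allele frequency is modeled as Beta(a, b).

    Integrating the binomial sampling probability over the beta distribution
    gives the beta-binomial law, which avoids tracking every polymorphic
    configuration explicitly and so scales to larger sample sizes.
    """
    k = np.arange(n + 1)
    return stats.betabinom.pmf(k, n, a, b)

# Example: expected sample frequency spectrum for 20 lineages under an
# assumed (purely illustrative) ancestral diversity shape.
print(derived_allele_count_probs(0.5, 0.5, 20))
```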
693

Cluster-Based Analysis Of Retinitis Pigmentosa Candidate Modifiers Using Drosophila Eye Size And Gene Expression Data

James Michael Amstutz (10725786) 01 June 2021 (has links)
The goal of this thesis is to algorithmically identify candidate modifiers of retinitis pigmentosa (RP) to help improve therapy and prediction for this genetic disorder, which can lead to a complete loss of vision. Earlier research (Chow et al., 2016) focused on the genetic contributors to RP by looking for correlations between genetic modifiers and phenotypic variation in female Drosophila melanogaster (fruit flies). In contrast to the genome-wide association analysis carried out in Chow et al.'s work, this study proposes using a K-means clustering algorithm on RNA expression data to better understand which genes best exhibit characteristics of the RP degenerative model. Validating the algorithm's effectiveness in identifying suspected genes takes priority over their classification.

This study investigates the linear relationship between Drosophila eye size and gene expression in order to gather statistically significant, strongly correlated genes from the clusters with abnormally high or low eye sizes. The clustering algorithm is implemented in the R scripting language, and supplemental information details the steps of the computational process. Running the mean eye size and expression data of 18,140 genes across 171 female Drosophila strains through the proposed algorithm in its four variations identified 140 suspected candidate modifiers of retinal degeneration. Although none of the top candidate genes found in this study matched Chow's candidates, they were all statistically significant and strongly correlated, with several showing links to RP. These results may continue to improve as more of the 140 suspected genes are annotated using identical or comparative approaches.
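The thesis implements this workflow in R; as a rough illustration of the cluster-then-correlate idea described above, here is a Python sketch (the data shapes, thresholds, and variable names are assumptions, not the thesis code):

```python
import numpy as np
from scipy import stats
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
expression = rng.normal(size=(171, 500))                  # strains x genes (toy subset)
eye_size = rng.normal(loc=1000.0, scale=50.0, size=171)   # mean eye size per strain

# Cluster strains by mean eye size so extreme-phenotype clusters stand out.
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(
    eye_size.reshape(-1, 1))
smallest = min(range(4), key=lambda c: eye_size[labels == c].mean())
mask = labels == smallest

# Within the abnormal cluster, keep genes whose expression is strongly and
# significantly correlated with eye size (candidate modifiers).
candidates = []
for gene in range(expression.shape[1]):
    r, p = stats.pearsonr(expression[mask, gene], eye_size[mask])
    if abs(r) > 0.5 and p < 0.05:
        candidates.append(gene)
print(len(candidates), "candidate modifier genes (toy data)")
```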
694

Building the Interphase Nucleus: A study on the kinetics of 3D chromosome formation, temporal relation to active transcription, and the role of nuclear RNAs

Abramo, Kristin N. 28 July 2020 (has links)
Following the determination of the one-dimensional sequence of human DNA, much effort has been directed toward microscopy and molecular techniques for learning about the spatial organization of chromatin in the 3D cell. The development of these powerful tools has enabled high-resolution, genome-wide analysis of chromosome structure under many different conditions. In this thesis, I focus on how the organization of interphase chromatin is established and maintained following mitosis. Mitotic chromosomes are folded into helical loop arrays, creating short and condensed chromosomes, while interphase chromosomes are decondensed and folded into a number of structures at different length scales, ranging from loops between CTCF sites, enhancers, and promoters, to topologically associating domains (TADs) and larger compartments. While chromatin organization in these two very different states is well defined, the transition from the mitotic to the interphase chromatin state is not well understood. The aim of this thesis is to determine how interphase chromatin is organized following mitotic chromosome decondensation and to interrogate factors potentially responsible for driving the transition. First, I determine the temporal order in which CTCF loops, TADs, and compartments re-form as cells exit mitosis, revealing a unique structure at the anaphase-telophase transition that has not been observed before. Second, I test the role of transcription in the reformation of 3D chromosome structure and show that active transcription is not required for the formation of most interphase chromatin features; instead, I propose that transcription relies on the proper formation of these structures. Finally, I show that RNA in the interphase nucleus can be degraded with only slight consequences for overall chromatin organization, suggesting that once interphase chromatin structures are achieved, they are stable and RNA is only required to reduce the mixing of active and inactive compartments. Together, these studies further our understanding of how interphase structures form, how these structures relate to the functional activities of the interphase cell, and how stable chromatin structures are over time.
695

A Computational Study of the Mechanism for F1-ATPase Inhibition by the Epsilon Subunit

Thomson, Karen J. January 2013 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / The multi-protein complex F0F1 ATP synthase has been of great interest in the fields of microbiology and biochemistry because of the ubiquitous use of ATP as a biological energy source. Efforts to better understand this complex have been made through structural determination of its segments based on NMR and crystallographic data. Some experiments have provided useful data, while others have raised more questions, especially when structures and functions are compared between bacteria and species with chloroplasts or mitochondria. The epsilon subunit is thought to play a significant role in the regulation of ATP synthesis and hydrolysis, yet the exact pathway is unknown because of the experimental difficulty of obtaining data along the transition pathway. Given starting- and end-point protein crystal structures, the transition pathway of the epsilon subunit was examined through computer simulation. The purpose of this investigation is to determine the likelihood of one proposed mechanism for the involvement of the epsilon subunit in ATP regulation in bacterial species such as E. coli.
696

Data analysis and creation of epigenetics database

Desai, Akshay A. 21 May 2014 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / This thesis aims to create a pipeline for analyzing DNA methylation data and a data model structured to store the pipeline's analysis results. In addition to storing the results, the model is designed to hold information that helps researchers draw meaningful epigenetic conclusions from the results made available. Current major epigenetics resources such as PubMeth, MethyCancer, MethDB, and NCBI's Epigenomics database fail to provide a holistic view of epigenetics. They provide datasets produced with different analysis techniques, which raises the important issue of data integration. These resources also fail to include numerous factors defining the epigenetic nature of a gene, and some struggle to keep their stored data up to date, which has diminished their validity and coverage of epigenetics data. In this thesis we tackle a major branch of epigenetics: DNA methylation. As a case study to demonstrate the effectiveness of our pipeline, we used raw stage-wise DNA methylation and expression data for lung adenocarcinoma (LUAD) from the TCGA data repository. The pipeline helped us identify progressive methylation patterns across different stages of LUAD and highlighted several key genes with potential as drug targets. Along with the results of the methylation analysis pipeline, we combined data from online resources such as the KEGG, GO, UCSC, and BioGRID databases, which helped us overcome the shortcomings of existing data collections and present a more complete resource for studying DNA methylation.
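The abstract does not spell out how "progressive methylation patterns" were defined; a minimal sketch of one way to flag probes whose methylation trends monotonically with tumor stage (thresholds and names are assumptions, not the thesis pipeline) might look like this:

```python
import numpy as np
from scipy import stats

def progressive_probes(beta_by_stage, min_rho=0.3, alpha=0.05):
    """Flag methylation probes whose beta values trend with tumor stage.

    beta_by_stage: dict mapping an ordered stage label (e.g. "1".."4") to an
    (n_samples, n_probes) array of methylation beta values for that stage.
    Returns indices of probes with a significant Spearman trend across stages.
    """
    stages = sorted(beta_by_stage)
    stage_codes = np.concatenate(
        [np.full(beta_by_stage[s].shape[0], i) for i, s in enumerate(stages)])
    betas = np.vstack([beta_by_stage[s] for s in stages])
    hits = []
    for probe in range(betas.shape[1]):
        rho, p = stats.spearmanr(stage_codes, betas[:, probe])
        if p < alpha and abs(rho) >= min_rho:
            hits.append(probe)
    return hits
```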
697

System biology modeling : the insights for computational drug discovery

Huang, Hui January 2014 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Traditional treatment-strategy development for diseases involves identifying target proteins related to disease states and interfering with these proteins using drug molecules. Computational drug discovery and virtual screening of thousands of chemical compounds have accelerated this process. This thesis presents a comprehensive framework for computational drug discovery using systems biology approaches, and it consists of two main parts: disease biomarker identification and disease treatment discovery. The first part focuses on biomarker identification for human diseases in the post-genomic era, with an emphasis on systems biology approaches such as protein interaction networks. There are two major types of biomarkers: a diagnostic biomarker is expected to detect a given disease in an individual with both high sensitivity and specificity, while a predictive biomarker serves to predict drug response before treatment is started. Both are essential before any treatment is sought for a patient. In this part, we first studied how the coverage of disease genes, the quality of protein interactions, and gene-ranking strategies affect the identification of disease genes. Second, we addressed the challenge of constructing a central database to collect system-level data such as protein interactions and pathways. Finally, we built a case study of biomarker identification using diabetes as an example. The second part of the thesis addresses how to find treatments after disease identification. It focuses on computational drug repositioning because of its low cost, few translational issues, and other benefits. First, we described how to implement literature-mining approaches to build a disease-protein-drug connectivity map and demonstrated its superior performance compared to existing applications. Second, we presented a drug-protein directionality database, which fills the gap left by the lack of alternatives to the experimental CMap in the computational drug discovery field. We also extended correlation-based ranking algorithms by including the underlying topology among proteins. Finally, we demonstrated how to study drug repositioning beyond the genomic level, moving from one dimension to two by using clinical side effects as prediction features.
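As context for the correlation-based ranking mentioned above, a connectivity-map-style ranking can be sketched as follows; the function and data layout are illustrative assumptions, not the thesis implementation:

```python
import numpy as np
from scipy import stats

def rank_drugs_by_signature(disease_signature, drug_signatures):
    """Rank candidate repositioning drugs against a disease expression signature.

    disease_signature: per-gene disease-vs-normal expression changes.
    drug_signatures:   dict drug_name -> per-gene drug-induced changes,
                       over the same ordered gene list.
    Drugs whose signatures anti-correlate with the disease signature are the
    most promising, so the list is sorted from most negative correlation up.
    """
    scores = {
        drug: stats.spearmanr(disease_signature, sig).correlation
        for drug, sig in drug_signatures.items()
    }
    return sorted(scores.items(), key=lambda item: item[1])
```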
698

Unsupervised Detection of Interictal Epileptiform Discharges in Routine Scalp EEG : Machine Learning Assisted Epilepsy Diagnosis

Shao, Shuai January 2023 (has links)
Epilepsy affects more than 50 million people, making it one of the most prevalent neurological disorders, and it has a high impact on the quality of life of those suffering from it. However, 70% of epilepsy patients can live seizure-free with proper diagnosis and treatment. Patients are evaluated using scalp EEG recordings, which are cheap and non-invasive. The diagnostic yield is low, however, and qualified personnel must process large amounts of data to assess patients accurately. MindReader is an unsupervised classifier that detects spectral anomalies and generates a hypothesis of the underlying patient state over time. The aim is to highlight abnormal, potentially epileptiform states, which could expedite the analysis of patients and let qualified personnel verify the results. It was used to evaluate 95 scalp EEG recordings from healthy adults and adult patients with epilepsy. Interictal epileptiform discharges (IEDs) occurring in the samples had been retroactively annotated, along with patient state and maneuvers performed by personnel, to enable characterization of the classifier's detection performance. Performance was slightly worse than previous benchmarks on pediatric scalp EEG recordings, with drops of 7% in specificity and 33% in sensitivity. Electrode positioning and the partial spatial extent of events had a notable impact on performance; however, no correlation between annotated disturbances and reduced performance could be found. Additional exploratory analysis was performed on serialized intermediate data to evaluate the analysis design. Hyperparameters and electrode-montage options were exposed to optimize the average Matthews correlation coefficient (MCC) per electrode per patient on a subset of the patients with epilepsy. An increased window length and a reduced amount of training, along with a common-average montage, proved most successful. The Euclidean distance of cumulative spectra (ECS), a metric suited to spectral analysis, and analogous L2 and L1 loss functions were implemented, of which ECS further improved the average performance across all samples. Four additional analyses featuring new time-frequency transforms and multichannel convolutional autoencoders were evaluated; an analysis using the continuous wavelet transform (CWT) and a convolutional autoencoder performed best, with an average MCC of 0.19 and 56.9% sensitivity at approximately 13.9 false positives per minute.
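For reference, the Matthews correlation coefficient optimized above is a standard binary-detection metric; a minimal implementation is shown below (the thesis's per-electrode, per-patient windowing and aggregation are not described in the abstract and are not reproduced here):

```python
import numpy as np

def matthews_corrcoef(y_true, y_pred):
    """Matthews correlation coefficient for binary detections (1 = IED window).

    Ranges from -1 to 1, with 0 meaning chance-level agreement; it stays
    informative on heavily imbalanced data, which is why it suits rare-event
    detection such as interictal discharges.
    """
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    tn = np.sum(~y_true & ~y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    denom = np.sqrt(float((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

print(matthews_corrcoef([0, 1, 1, 0, 1], [0, 1, 0, 0, 1]))
```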
699

VISUAL ANALYTICS OF BIG DATA FROM MOLECULAR DYNAMICS SIMULATION

Catherine Jenifer Rajam Rajendran (5931113) 03 February 2023 (has links)
Protein malfunction can cause human disease, which makes proteins targets in the process of drug discovery. In-depth knowledge of how a protein functions contributes greatly to understanding the mechanisms of these diseases. Protein function is determined by protein structure and its dynamic properties. Protein dynamics refers to the constant physical movement of atoms in a protein, which may result in transitions between different conformational states; these conformational transitions are critically important for proteins to function. Understanding protein dynamics can help us understand and interfere with conformational states and transitions, and thus with the function of the protein. If we can understand the mechanism of a protein's conformational transition, we can design molecules to regulate this process and thereby regulate protein function for new drug discovery. Protein dynamics can be simulated by molecular dynamics (MD) simulations.

The MD simulation data generated are spatio-temporal and therefore very high-dimensional. Analyzing them requires distinguishing the various atomic interactions within a protein by interpreting their 3D coordinate values. Because the data are enormous, the essential step is to develop more efficient algorithms that reduce dimensionality and user-friendly visualization tools that reveal patterns and trends not usually attainable by traditional data-processing methods. Given the typically allosteric, long-range nature of the interactions that lead to large conformational transitions, pinpointing the underlying forces and pathways responsible for a global conformational transition at the atomic level is very challenging. To address these problems, various analytical techniques were applied to the simulation data through a new program called Probing Long-distance Interactions by Tapping into Paired-Distances (PLITIP), which contains a set of new tools based on the analysis of paired distances; these remove the interference of the protein's own translation and rotation and can therefore capture absolute changes within the protein.

First, we developed a tool called Decomposition of Paired Distances (DPD). This tool generates a matrix of all paired residue distances from the simulation data. Because this paired-distance matrix is not subject to the interference of translation or rotation of the protein, it captures absolute changes within the protein. The matrix is then decomposed by DPD using Principal Component Analysis (PCA) to reduce dimensionality and capture the largest structural variation. To showcase how DPD works, we analyzed two protein systems, HIV-1 protease and 14-3-3σ, both of which display large structural changes and conformational transitions in their MD simulation trajectories. In both cases the largest structural variation and conformational transition were captured by the first principal component. In addition, structural clustering and ranking of representative frames by their PC1 values revealed the long-distance nature of the conformational transition and identified key candidate regions that might be responsible for the large conformational transitions.

Second, to facilitate identification of the long-distance path, we developed a tool called Pearson Coefficient Spiral (PCP), which generates and visualizes Pearson coefficients measuring the linear correlation between any two residue pairs. PCP allows users to fix one residue pair and examine how its change correlates with other residue pairs.

Third, we developed a set of visualization tools that generate paired atomic distances for the shortlisted candidate residues and capture significant interactions among them. The first is the Residue Interaction Network Graph for Paired Atomic Distances (NG-PAD), which not only generates paired atomic distances for the shortlisted candidate residues but also displays significant interactions as a network graph for convenient visualization. The second, the Chord Diagram for Interaction Mapping (CD-IP), maps the interactions onto protein secondary-structure elements to further narrow down important interactions. The third, Distance Plotting for Direct Comparison (DP-DC), plots any two paired distances of the user's choice, at either the residue or the atomic level, to facilitate identification of similar or opposite patterns of distance change over simulation time. Together, the PLITIP tools enabled us to identify critical residues contributing to the large conformational transitions in both HIV-1 protease and 14-3-3σ.

Besides this major project, a side project developing tools to study protein pseudo-symmetry is also reported. Symmetry has been proposed to provide protein stability, opportunities for allosteric regulation, and even functionality. This tool helps answer why proteins deviate from perfect symmetry and how to quantify that deviation.
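The DPD workflow described above — a per-frame matrix of paired residue distances decomposed across frames by PCA — can be sketched briefly; the array shapes, names, and toy data below are illustrative assumptions, not code from PLITIP:

```python
import numpy as np
from scipy.spatial.distance import pdist
from sklearn.decomposition import PCA

def paired_distance_pca(coords, n_components=2):
    """coords: (n_frames, n_residues, 3) array of C-alpha coordinates.

    Pairwise residue distances are invariant to rigid-body translation and
    rotation, so PCA on them captures only internal structural variation.
    Returns per-frame projections (PC1 can rank/cluster frames) and the model.
    """
    features = np.array([pdist(frame) for frame in coords])  # (n_frames, n_pairs)
    pca = PCA(n_components=n_components)
    return pca.fit_transform(features), pca

# Toy usage: 100 frames of a 50-residue protein.
rng = np.random.default_rng(1)
trajectory = rng.normal(size=(100, 50, 3))
projections, model = paired_distance_pca(trajectory)
print(projections.shape, model.explained_variance_ratio_)
```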
700

Exploring DeepSEA CNN and DNABERT for Regulatory Feature Prediction of Non-coding DNA

Stachowicz, Jacob January 2021 (has links)
Prediction and understanding of the regulatory effects of non-coding DNA is an extensive research area in genomics. Convolutional neural networks have been used successfully in the past to predict regulatory features, making chromatin-feature predictions based solely on non-coding DNA sequences. Non-coding DNA shares various similarities with human spoken language, which makes language models such as the transformer attractive candidates for deciphering the non-coding DNA "language". This thesis investigates how well the transformer model, usually used for NLP problems, predicts chromatin features from genome sequences compared with convolutional neural networks. More specifically, the CNN DeepSEA, which is used for regulatory feature prediction based on non-coding DNA, is compared with the transformer DNABERT. The study further explores the impact of different parameters and training strategies on performance, and other models (DeeperDeepSEA and DanQ) are compared on the same tasks to give a broader point of reference. Lastly, the same experiments are conducted on modified versions of the dataset in which the labels cover different amounts of the DNA sequence; this could prove beneficial to the transformer model, which can capture long-range dependencies in natural language problems. The replication of DeepSEA was successful and gave results similar to the original model. The experiments used for DeepSEA were also conducted on DNABERT, DeeperDeepSEA, and DanQ; all models were trained on different datasets and their results compared. Lastly, a prediction-voting mechanism combining the outputs of DeepSEA and DNABERT was implemented, which gave better results than either model individually. The results showed that DeepSEA performed slightly better than DNABERT with respect to AUC ROC. A Wilcoxon signed-rank test showed that, even though the two models achieved similar AUC ROC scores, there is a statistically significant difference between the distributions of their predictions. This suggests the models treat the dataset differently, which may be why combining their predictions gives good results. Due to the time required to train the computationally heavy DNABERT, the best hyperparameters and training strategies for the model were not found, only improved upon. The datasets used in this thesis were heavily imbalanced, something that needs to be addressed in future projects. This project serves as a good continuation of the paper "Whole-genome deep-learning analysis identifies contribution of non-coding mutations to autism risk", which uses the DeepSEA model to learn more about how specific mutations correlate with autism spectrum disorder.
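The prediction-voting scheme is not detailed in the abstract; a simple probability-averaging ensemble plus the Wilcoxon comparison of the two models' predictions could look like the sketch below (names, shapes, and the averaging choice are assumptions):

```python
import numpy as np
from scipy.stats import wilcoxon
from sklearn.metrics import roc_auc_score

def mean_label_auc(probs, y_true):
    """Mean ROC AUC over chromatin-feature labels (columns of y_true)."""
    return float(np.mean([roc_auc_score(y_true[:, j], probs[:, j])
                          for j in range(y_true.shape[1])]))

def vote_and_compare(p_deepsea, p_dnabert, y_true):
    """Average the two models' probabilities as one plausible voting scheme,
    and run a paired Wilcoxon signed-rank test on the prediction values
    themselves (the comparison described in the abstract)."""
    p_vote = (p_deepsea + p_dnabert) / 2.0
    _, p_value = wilcoxon(p_deepsea.ravel(), p_dnabert.ravel())
    return (mean_label_auc(p_deepsea, y_true),
            mean_label_auc(p_dnabert, y_true),
            mean_label_auc(p_vote, y_true),
            p_value)
```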
