Global ETD Search

491	Applying Evolutionary Computation and Ensemble Approaches to Protein Contact Map and Protein Function Determination Chapman, Samuel D. 13 January 2017 (has links) Proteins are important biological molecules that perform many different functions in an organism. They are composed of sequences of amino acids that play a large part in determining both their structure and function. In turn, the structures of proteins are related to their functions. Using computational methods for protein study is a popular approach, offering the possibility of being faster and cheaper than experimental methods. These software-based methods are able to take information such as the protein sequence and other empirical data and output predictions such as protein structure or function. In this work, we have developed a set of computational methods for protein structure prediction and protein function prediction. For protein structure prediction, we use the evolution of logic circuits to produce logic circuit classifiers that predict the protein contact map of a protein based on high-dimensional feature data. The diversity of the evolved logic circuits allows for the creation of ensembles of classifiers, and the answers from these ensembles are combined to produce more accurate answers. We also apply a number of ensemble algorithms to our results. Our protein function prediction work is based on the use of six existing computational protein function prediction methods, of which four were optimized for use on a benchmark dataset, along with two others developed by collaborators. We used a similar ensemble framework, combining the answers from the six methods into an ensemble using an algorithm, CONS, that we helped develop. Our contact map prediction study demonstrated that it was possible to evolve logic circuits for this purpose, and that ensembles of the classifiers improved performance.
The results fell short of state-of-the-art methods, and additional ensemble algorithms failed to improve the performance. However, the method was also able to work as a feature detector, discovering salient features from the high-dimensional input data, a computationally intractable problem. In our protein function prediction work, the combination of methods similarly led to a robust ensemble. The CONS ensemble, while not performing as well as the best individual classifier in absolute terms, was nevertheless very close in terms of performance. More intriguingly, there were many specific cases where it performed better than any single method, indicating that this ensemble provided valuable information not captured by any single method. To our knowledge, this is the first time the evolution of logic circuits has been used in any bioinformatics problem, and it is expected that as the method becomes more developed, results will improve. It is also expected that the feature-detection aspect of this method can be used in other studies. The function prediction study also marks, to our knowledge, the most comprehensive ensemble classification for protein function prediction. Finally, we expect that the ensemble classification methods used and developed in our protein structure and function work here will pave the way towards stronger ensemble predictors in the future.
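The ensemble idea in the abstract above, combining the answers of several classifiers into one prediction, can be illustrated with a simple majority vote over binary contact predictions. This is a minimal sketch only: the function name, the toy inputs, and the voting rule are assumptions made for illustration, since the abstract does not state how the evolved-circuit ensembles or CONS combine their answers.

```python
from typing import List


def majority_vote(predictions: List[List[int]]) -> List[int]:
    """Combine binary predictions (one list per classifier) by simple majority."""
    if not predictions:
        raise ValueError("need at least one classifier")
    n_items = len(predictions[0])
    if any(len(p) != n_items for p in predictions):
        raise ValueError("all classifiers must predict the same items")
    combined = []
    for i in range(n_items):
        votes = sum(p[i] for p in predictions)
        # Predict a contact only when strictly more than half of the classifiers agree.
        combined.append(1 if 2 * votes > len(predictions) else 0)
    return combined


# Three hypothetical classifiers voting on four residue pairs (1 = contact).
ensemble = majority_vote([[1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 0]])
```

With an even number of classifiers, a tie is resolved as "no contact" under this rule; a weighted vote would be a natural alternative when classifiers differ in reliability.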
492	A hierarchical spherical radial quadrature algorithm for multilevel GLMMs, GSMMs, and gene pathway analysis Gagnon, Jacob A 01 January 2010 (has links) The first part of my thesis is concerned with estimation for longitudinal data using generalized semi-parametric mixed models and multilevel generalized linear mixed models for a binary response. Likelihood based inferences are hindered by the lack of a closed form representation. Consequently, various integration approaches have been proposed. We propose a spherical radial integration based approach that takes advantage of the hierarchical structure of the data, which we call the 2 SR method. Compared to Pinheiro and Chao’s multilevel Adaptive Gaussian quadrature [37], our proposed method has an improved time complexity with the number of functional evaluations scaling linearly in the number of subjects and in the dimension of random effects per level. Simulation studies show that our approach has similar to better accuracy compared to Gauss Hermite Quadrature (GHQ) and has better accuracy compared to PQL especially in the variance components. The second part of my thesis is concerned with identifying differentially expressed gene pathways/gene sets. We propose a logistic kernel machine to model the gene pathway effect with a binary response. Kernel machines were chosen since they account for gene interactions and clinical covariates. Furthermore, we established a connection between our logistic kernel machine with GLMMs allowing us to use ideas from the GLMM literature. For estimation and testing, we adopted Clarkson’s spherical radial approach [6] to perform the high dimensional integrations. For estimation, our performance in simulation studies is comparable to better than Bayesian approaches at a much lower computational cost. As for testing of the genetic pathway effect, our REML likelihood ratio test has increased power compared to a score test for simulated non-linear pathways. 
Additionally, our approach has three main advantages over previous methodologies: (1) our testing approach is self-contained rather than competitive, (2) our kernel machine approach can model complex pathway effects and gene-gene interactions, and (3) we test for the pathway effect adjusting for clinical covariates. Motivation for our work is the analysis of an Acute Lymphocytic Leukemia data set where we test for the genetic pathway effect and provide confidence intervals for the fixed effects.
493	A mathematical growth model of the viral population in early HIV-1 infections Giorgi, Elena Edi 01 January 2011 (has links) In this thesis we develop a mathematical model to describe HIV-1 evolution during the first stages of infection (approximately the first 40–60 days after onset), when one can assume exponential growth and random accumulation of mutations under neutral drift. We analyze the Hamming distance (HD) distribution under different models (synchronous and asynchronous) in the absence of selection and recombination. In the second part of the thesis, we introduce recombination and develop a combinatorial approach to estimate the new HD distribution. We conclude by describing a T statistic to test for significant differences between the HD of two genetic samples, which we derive using U-statistics.
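The Hamming distance (HD) distribution named in the abstract above can be computed empirically for a set of aligned sequences by counting differing positions in every pair. The sketch below assumes equal-length, pre-aligned sequences; the function names and the toy sample are hypothetical, and it shows only the empirical pairwise distribution, not the thesis's growth model or test statistic.

```python
from collections import Counter
from itertools import combinations
from typing import Counter as CounterType, Sequence


def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_hd_distribution(seqs: Sequence[str]) -> CounterType[int]:
    """Map each observed Hamming distance to the number of sequence pairs showing it."""
    return Counter(hamming_distance(a, b) for a, b in combinations(seqs, 2))


# Toy sample of three aligned sequences (hypothetical data).
sample = ["ACGT", "ACGA", "TCGA"]
distribution = pairwise_hd_distribution(sample)
```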
494	Towards large-scale validation of protein flexibility using rigidity analysis Jagodzinski, Filip 01 January 2012 (has links) Proteins are dynamic molecules involved in virtually every chemical process in our bodies. Understanding how they flex and bend provides fundamental insights into their functions. At the atomic level, protein motion cannot be observed using existing experimental methods. To gain insights into these motions, simulation methods are used. However, such simulations are computationally expensive. Rigidity analysis is a fast, graph-based alternative to molecular simulations that gives information about the flexibility properties of molecules modeled as mechanical structures. Due to the lack of convenient tools for curating protein data, the usefulness of rigidity analysis has been demonstrated on only a handful of proteins to infer several of their biophysical properties. Previous studies also relied on heuristics to determine which choice of modeling options of important stabilizing interactions allowed for extracting relevant biological observations from rigidity analysis results. Thus, there is no agreed-upon choice of modeling of stabilizing interactions that is validated with experimental data. In this thesis we make progress towards large-scale validation of protein flexibility using rigidity analysis. We have developed the KINARI software to test the predictive power of using rigidity analysis to infer biophysical properties of proteins. We develop new tools for curating protein data files and for generating biological functional forms and crystal lattices of molecules. We show that rigidity analysis of these biological assemblies provides structural and functional information that would be missed if only the unprocessed data of protein structures were analyzed.
To provide a proof of concept that rigidity analysis can be used to perform fast evaluation of in silico mutations that may not be easy to perform in vitro, we have developed KINARI-Mutagen. Finally, we perform a systematic study in which we vary how hydrogen bonds and hydrophobic interactions are modeled when constructing a mechanical framework of a protein. We propose a general method to evaluate how varying the modeling of these important inter-atomic interactions affects the degree to which rigidity parameters correlate with experimental stability data. Bioinformatics | Computer science
495	Modeling the life span of red blood cells Shrestha, Rajiv Prakash 01 January 2012 (has links) The subject of red blood cell (RBC) survival has been discussed in the medical literature for nearly a hundred years. There has been a large amount of experimental work on RBC survival, but the supporting analysis consisted mostly of a number of more or less ad hoc models for the RBC lifespan distribution. In this context, this dissertation makes four key contributions based on the biotin-tagged RBC survival data from healthy subjects: 1. We provide a theory of RBC survival supported by appropriate analysis. Specifically, we apply non-linear mixed effects (NLME) analysis to study the population level and individual level variation in several characteristics of RBC survival, based on random sample survival data. The general approach can be used for data obtained by several different experimental methods. 2. We present a unified analysis of RBC survival data obtained using RBCs labeled at multiple densities of biotin, thus exhibiting, for the first time, the dependence of the estimated RBC survival characteristics on the biotin labeling density. Our results suggest that low-density biotinylation of RBCs does not have a significant effect on RBC survival. 3. We show that, using NLME analysis results from a reference population database, good accuracy in the estimation of clinically relevant parameters from random sample survival data can be achieved with only 2-point or 3-point optimized measurement schedules. 4. We present an argument that RBC survival results obtained from radioactive chromium labeling of RBCs may not be reliable with currently used analysis methods. The analysis presented in the dissertation can potentially be used to study RBC survival in a broad range of clinical applications such as drug efficacy, quality of stored blood, and the development of protocols for the management of anemia. Biomedical engineering | Bioinformatics
496	Development and validation of accelerometer-based activity classification algorithms for older adults: A machine learning approach Sasaki, Jeffer Eidi 01 January 2014 (has links) Machine learning algorithms to classify activity type from wearable accelerometers are important to improve our understanding of the relationship between physical activity (PA) and risk for physical disability in older adults. Therefore, the main objective of this dissertation was to develop and evaluate machine learning algorithms to predict activity type and intensity in older adults from a commercially available accelerometer (ActiGraph GT3X+). In Study 1, we developed machine learning algorithms to classify activity type and intensity from raw accelerometer data in older adults. Thirty-five older adults performed an activity routine comprised of different activities (5 min/activity) while wearing three ActiGraph GT3X+ activity monitors (dominant hip, wrist, and ankle) and a portable metabolic system. Accelerometer and steady-state metabolic data were used to develop artificial neural network, random forest, and support vector machine algorithms (ANNLab, RFLab, and SVMLab) to predict activity type and intensity in older adults using 20 s classification intervals. Classification accuracy of the models in detecting five activity categories ranged from 87% (ANNLab hip, RFLab hip, and SVMLab hip) to 96% (SVMLab wrist). The biases and root mean squared errors (RMSE) for predicted METs ranged from -0.01 MET (RMSE: 0.54 MET) for the RFLab wrist algorithm to 0.02 MET (RMSE: 0.67 MET) for the ANNLab hip algorithm. Study 2 evaluated the performance of the RFLab and SVMLab algorithms for predicting activity type in free-living conditions. Fifteen participants from Study 1 were observed for 2-3 h in their free-living environment while wearing three ActiGraph GT3X+ activity monitors (dominant hip, wrist, and ankle).
The RFLab and SVMLab algorithms were applied to hip, wrist, and ankle accelerometer data to classify five activity categories. Direct observation of activity type and duration served as criterion measures to evaluate percent correct classification rates of the algorithms. Correct classification rates ranged from 49% (SVMLab hip, SVMLab wrist, and RFLab wrist) to 55% (SVMLab ankle). New RF and SVM algorithms were developed using free-living accelerometer data (RFFL and SVMFL) and different classification intervals were also applied. Correct classification of activity types for the RFFL and SVMFL ranged from 53% (SVMFL wrist, 5 s classification intervals) to 71% (SVMFL ankle, 30 s classification intervals). Overall correct classification rates of up to 76% (RFFL hip and RFFL ankle, 30 s classification intervals) were achieved when classifying only three activity categories. Our machine learning algorithms accurately predict activity type from accelerometer data in older adults under 'laboratory conditions' but not in free-living conditions. We were able to improve free-living classification accuracy using algorithms developed under free-living conditions. Further refinement of the algorithms is required for achieving sufficient accuracy in classifying activity type in free-living older adults. Gerontology | Kinesiology | Bioinformatics
497	Mathematical Analysis of Ca2+ Cycling and Protein Turnover in Muscle Fibers Unknown Date (has links) Scientists have used computers to model a plethora of systems since the rise of the computer age. The most common use of modeling is to analyze an existing system. With a complete model of an existing system, the parameters can be changed and the results can be observed without the need to set up an experimental apparatus. For biological systems, the parameters are determined by the choice of biological species, which may limit the kinds of experiments that can be conducted. For example, if doubling a muscle fiber size is hypothesized to affect the performance of that fiber, this cannot be tested experimentally because nature may not produce a fiber of that size. In previous work, conducted in collaboration between Professor Kinsey and Professor Locke, reaction-diffusion models were developed to determine the diffusion-controlled regime in aerobic muscle metabolism by analyzing reaction rates and the spatial concentrations of intermolecular species, as well as determining how spatial locations of mitochondria affect these concentrations (Dasika et al., 2011; Pathi et al., 2011). Weisz (1973) proposed that all biological systems are designed to function within the constraints imposed by the geometry and the subsequent reaction-diffusion system so as not to be limited by diffusion constraints. Because of the complexity of a muscle fiber, the different model types span a plethora of ranges that can be modeled. The aerobic muscle metabolism that was previously modeled has large diffusion distances, large diffusion coefficients and moderate reaction rates, while the calcium cycling has small diffusion distances, large diffusion coefficients, and extremely fast reaction rates. The protein synthesis has large diffusion distances, small diffusion coefficients and slow reaction rates.
This variation in scales is very difficult to capture in one comprehensive model, so assumptions are made when modeling the specific area of interest. The functional unit of the muscle is a sarcomere, which is on the order of microns with millisecond response times for calcium transients. Muscle fibers are on the order of millimeters with time scales ranging from minutes for diffusion to weeks for mitochondria replication and nuclear recruitment (Pathi et al., 2011). The two aspects of the dissertation presented here deal separately with force production at the functional unit level and muscle size and organization at the fiber level. The sarcomere is surrounded by the Sarcoplasmic Reticulum (S.R.), which is the calcium store. The S.R. runs the entire length of the sarcomere. When a muscle is pulsed with electricity, calcium is released from the S.R. near the z-line (end caps of the sarcomere). Calcium then diffuses along the length of the sarcomere toward the m-line (center line of the sarcomere) while reacting (binding) with other chemical species. Down the length of the sarcomere the calcium is pumped back into the S.R. via Sarcoplasmic/Endoplasmic Reticulum Calcium (SERCA) pumps. These pumps provide the major route for calcium sequestration in the sarcomere. While calcium is in the sarcomere it can bind to the buffers: troponin-C, parvalbumin, and ATP. These are the major components of the calcium transient model. The entire cycle can occur at frequencies anywhere from 20 Hz to 200 Hz (Kenneth et al., 2010; Rome et al., 1996). For the nuclear model, only the Myonuclear Domain (MND) is modeled along with mRNA and proteins. It is known that the nucleus contains DNA, which is transcribed into mRNA. The mRNA leaves the nucleus and attaches to a ribosome for translation. The translation process produces proteins, which are the building blocks for cellular growth in muscle cells. Muscle cells are among the few cell types in the human body that are multi-nucleated.
Many research groups believe that in order for muscle cells to grow, more nuclei need to be recruited to the cell. Other groups, however, have demonstrated induced muscle growth without recruiting nuclei (Rehfeldt, 2007). / A Dissertation submitted to the Department of Chemical and Biomedical Engineering in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester, 2015. / February 5, 2015. / Calcium Transients, mRNA, Muscle, Myonuclear Domain, Protein, Reaction-Diffusion / Includes bibliographical references. / Bruce Locke, Professor Directing Dissertation; Bryant Chase, University Representative; Stephen Kinsey, Committee Member; Teng Ma, Committee Member; Ravindran Chella, Committee Member. Biophysics Bioinformatics Biochemistry
498	Comparative mRNA Expression Analysis Leveraging Known Biochemical Interactions Unknown Date (has links) We present two studies incorporating existing biological knowledge into differential gene expression analysis that attempt to place the results within a broader biological context. The studies investigate breast cancer health disparity between differing ethnic groups by comparing gene expression levels in tumor samples from patients from different ethnic populations. We incorporate existing knowledge by making comparisons not just between individual genes, but between sets of related genes and networks of interacting genes. In the first study, a comparison is made between mRNA expression patterns in Asian and Caucasian American breast cancer samples in an attempt to better understand why there are significantly lower breast cancer incidence and mortality rates in Asian Americans compared to Caucasian Americans. In the second study, the expression levels of genes related to drug and xenobiotic metabolizing enzymes (DXME) are compared between African, Asian, and Caucasian American breast cancer patients. The expression of genes related to these enzymes has been found to significantly affect drug clearance and the onset of drug resistance. Both studies found differentially expressed genes and pathways that may be associated with health disparities between the three ethnic populations. A thorough investigation of the literature was made in order to understand the context in which these differences in gene expression could affect the development and progression of breast tumors, and to identify genes and pathways that may be differentially expressed between the ethnic groups in general but not associated with breast cancer. Many of the relevant differences in gene expression were found to be linked to factors such as diet and differences in body composition. 
The process of finding relevant pathways and sets of interacting genes to inform comparative mRNA expression analysis can be laborious and time consuming. The literature is expanding at an exponential rate, and there is little hope for research groups to be able to keep up with all of the latest research. It is becoming more common for journals to require authors to make their results available in public databases, but many results concerning biochemical interactions are only accessible in unstructured text. Extracting relationships and interactions from the biological literature using techniques from machine learning and natural language processing is an important and growing field of research. To gain a better understanding of this field, we participated in the BioCreative VI Track 4 challenge, which involved classifying PubMed abstracts that contain examples of protein-protein interactions that are affected by a mutation. We discuss the model we developed and the lessons learned while participating in the competition. The problem of acquiring sufficient quantities of quality labeled data is a great obstacle preventing the improvement of performance. We present a web application we are developing to streamline the annotation of entity-entity interactions in text. It makes use of a database of known interactions to locate passages that are likely to be relevant and offers a simple and concise user interface to minimize the cognitive burden on the annotator. / A Dissertation submitted to the Department of Statistics in partial fulfillment of the requirements for the degree of Doctor of Philosophy. / Spring Semester 2018. / April 10, 2018. / Cancer Health Disparity, Gene Expression, Protein-Protein Interactions, Text Annotation, Text Mining / Includes bibliographical references. / Jinfeng Zhang, Professor Directing Dissertation; Qing-Xiang Sang, University Representative; Wei Wu, Committee Member; Xufeng Niu, Committee Member. 
Bioinformatics Statistics Computer science
499	Adaptive balancing of exploitation with exploration to improve protein structure prediction Brunette, TJ 01 January 2011 (has links) The most significant impediment for protein structure prediction is the inadequacy of conformation space search. Conformation space is too large and the energy landscape too rugged for existing search methods to consistently find near-optimal minima. Conformation space search methods thus have to focus exploration on a small fraction of the search space. The ability to choose appropriate regions, i.e. regions that are highly likely to contain the native state, critically impacts the effectiveness of search. Choosing where to explore requires information, with higher quality information resulting in better choices. Most current search methods are designed to work in as many domains as possible, which leads to less accurate information because of the need for generality. However, most domains provide unique and accurate information. To best utilize domain-specific information, search needs to be customized for each domain. The first contribution of this thesis customizes search for protein structure prediction, resulting in significantly more accurate protein structure predictions. Unless information is perfect, mistakes will be made, and search will focus on regions that do not contain the native state. How search recovers from mistakes is critical to its effectiveness. To recover from mistakes, this thesis introduces the concept of adaptive balancing of exploitation with exploration. Adaptive balancing of exploitation with exploration allows search to use information only to the extent to which it guides exploration toward the native state. Existing methods of protein structure prediction rely on information from known proteins. Currently, this information is from either full-length proteins that share similar sequences, and hence have similar structures (homologs), or from short protein fragments.
Homologs and fragments represent two extremes on the spectrum of information from known proteins. Significant additional information can be found between these extremes. However, current protein structure prediction methods are unable to use information between fragments and homologs because it is difficult to identify the correct information from the enormous amount of incorrect information. This thesis makes it possible to use information between homologs and fragments by adaptively balancing exploitation with exploration in response to an estimate of template protein quality. My results indicate that integrating the information between homologs and fragments significantly improves protein structure prediction accuracy, resulting in several proteins predicted with <1 angstrom RMSD resolution. Bioinformatics | Computer science
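The accuracy figure quoted in the abstract above is an RMSD (root-mean-square deviation) between predicted and native atom positions. The sketch below computes a plain coordinate RMSD over matched atoms, assuming the two structures are already superimposed; structure comparisons normally perform an optimal superposition first, which is omitted here. The function name and the two-atom example are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(model: Sequence[Point], native: Sequence[Point]) -> float:
    """Root-mean-square deviation between matched atoms, in the units of the input."""
    if len(model) != len(native) or not model:
        raise ValueError("structures must be non-empty and the same length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(model, native):
        # Accumulate the squared Euclidean distance between each matched atom pair.
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(model))


# Hypothetical two-atom structures; the second atom is displaced by 2 angstroms.
example = rmsd(
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 2.0)],
)
```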
500	Transcriptome of Mycobacterium riyadhense in an in vitro Infection Model Alwajeeh, Hanouf 08 1900 (has links) Mycobacterium is a genus characterized by its unique mycomembrane layer, which enhances its pathogenicity, causing notorious infections such as tuberculosis and leprosy in humans. Some pathogenic mycobacteria are part of the Mycobacterium tuberculosis complex (MTBC), while others are predominantly environmental and belong to the class of non-tuberculous mycobacteria (NTM). Some NTM are also opportunistic pathogens causing infections mostly in immunocompromised individuals. In this study, we focus on a recently discovered species of NTM known as M. riyadhense, originally isolated from a patient with TB-like symptoms in Riyadh. With prepublication access to the completely assembled and fully annotated genomes of M. riyadhense, we wanted to study the gene expression of M. riyadhense after establishing an infection model using a murine macrophage cell line. We performed transcriptomic analysis of M. riyadhense upon infection using RAW264.7 murine macrophages to determine the hallmarks of differentially expressed (DE) genes at early infection time points. Most DE genes observed belong to one of the crucial secretion systems known as ESX-1. Most genes were highly upregulated during 12 hours of infection, particularly esxA and esxB, which encode the ESAT-6 and CFP-10 secretion proteins. These substrates are essential for the virulence and pathogenicity of Mycobacterium tuberculosis (MTB). In addition, we observed downregulation of WhiB5, a transcriptional regulator that is a well-known controller of Mycobacterium tuberculosis virulence and reactivation and that regulates genes encoding the constituents of two type VII secretion systems, namely, ESX-2 and ESX-4. We have also identified other genes of as yet unknown function that are highly upregulated during early infection and need functional characterization in future follow-up studies.
Overall, we have established an in vitro cell infection model for M. riyadhense that can be used to study host-pathogen crosstalk during infection processes in tubercle bacilli. Transcriptomics Mycobacteriology Bioinformatics