691 |
Feature Learning as a Tool to Identify Existence of Multiple Biological Patterns
Patsekin, Aleksandr 13 June 2018
This paper introduces a novel approach for assessing multiple patterns in biological imaging datasets. The tool is intended to infer the most probable structure of an image dataset made up of biological patterns not encountered during model training. It has two major parts: (1) a feature learning and extraction pipeline and (2) subsequent clustering with estimation of the number of classes. The feature-learning part includes two deep-learning techniques and a feature quantitation pipeline as a benchmark method. Clustering includes three non-parametric methods. K-means clustering is employed for validation and hypothesis testing by comparing results with the provided ground truth. The most appropriate methods and hyper-parameters were suggested to achieve maximum clustering quality. A convolutional autoencoder demonstrated the most stable and robust results: an entropy-based V-measure of 0.9759 on a dataset of the classes used for training and 0.9553 on a dataset of completely novel classes.
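The entropy-based V-measure quoted in this abstract is the harmonic mean of homogeneity and completeness. The following is a minimal standard-library sketch of that metric, not the thesis code; the function names and the toy label lists are illustrative assumptions.

```python
from collections import Counter
from math import log


def _entropy(labels):
    """Shannon entropy (natural log) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def _conditional_entropy(target, given):
    """H(target | given) for two equal-length label sequences."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(truth, predicted):
    """Harmonic mean of homogeneity and completeness (equal weighting assumed)."""
    h_truth, h_pred = _entropy(truth), _entropy(predicted)
    # Homogeneity: each cluster should contain members of a single class.
    homogeneity = 1.0 if h_truth == 0 else 1.0 - _conditional_entropy(truth, predicted) / h_truth
    # Completeness: each class should be assigned to a single cluster.
    completeness = 1.0 if h_pred == 0 else 1.0 - _conditional_entropy(predicted, truth) / h_pred
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


# Toy labels (hypothetical): cluster ids are arbitrary, so a relabelled
# perfect clustering still scores as a perfect match.
score = v_measure([0, 0, 1, 1], [1, 1, 0, 0])
```

Because the score depends only on how labels co-occur, it can compare K-means output against ground truth without matching cluster ids to class names first.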
|
692 |
Developing RNA diagnostics for studying healthy human ageing
Sood, Sanjana January 2017
Developing strategies to cope with the increase in the ageing population and in age-related chronic diseases is one of society's biggest challenges. The characteristics of the ageing process show significant inter-individual variation. Building genomic signatures that could account for variation in health outcomes with age may facilitate early prognosis of individual age-correlated diseases (e.g. cancer, coronary artery disease and dementia) and help in developing better targeted treatments provided years in advance of acquiring disabling symptoms of these diseases. The aim of this thesis was to explore methods for diagnosing molecular features of human ageing. In particular, we utilise multi-platform transcriptomics, independent clinical data and classification methods to evaluate which human tissues demonstrate a reproducible molecular signature for age and which clinical phenotypes correlate with these new RNA biomarkers.
|
693 |
The investigation of type-specific features of the copper coordinating AA9 proteins and their effect on the interaction with crystalline cellulose using molecular dynamics studies
Moses, Vuyani January 2018
AA9 proteins are metallo-enzymes which are crucial for the early stages of cellulose degradation. They have been suggested to cleave the glycosidic bonds linking cellulose through the use of their Cu2+ coordinating active site. AA9 proteins possess different regioselectivities depending on the cleavage they produce and, as a result, are grouped accordingly. Type 1 AA9 proteins cleave at the C1 carbon of cellulose, Type 2 AA9 proteins cleave at the C4 carbon, and Type 3 AA9 proteins cleave at either the C1 or the C4 carbon. The steric congestion of the AA9 active site has been proposed to be a contributor to the observed regioselectivity. As such, a bioinformatics characterisation of type-specific sequence and structural features was performed. Initially, AA9 protein sequences were obtained from the Pfam database and a multiple sequence alignment was performed. The sequences were phylogenetically characterised and grouped into their respective types, and sub-groups were identified. A selection analysis was performed on the AA9 LPMO types to determine the selective pressure acting on AA9 protein residues. Motif discovery was then performed to identify conserved sequence motifs in AA9 proteins. Once type-specific sequence features were identified, structural mapping was performed to assess possible effects on substrate interaction. Physicochemical property analysis was also performed to assess biochemical differences between AA9 LPMO types. Molecular dynamics (MD) simulations were then employed to dynamically assess the consequences of the discovered type-specific features on AA9-cellulose interaction. Due to the absence of AA9-specific force field parameters, MD simulations were not readily applicable. As a result, Potential Energy Surface (PES) scans were performed to evaluate the force field parameters for the AA9 active site using the PM6 semi-empirical approach and least-squares fitting.
A Type 1 AA9 active site was constructed from the crystal structure 4B5Q, encompassing only the Cu2+ coordinating residues, the Cu2+ ion and two water residues. Due to the similarity in AA9 active sites, the Type 1 force field parameters were validated on all three AA9 LPMO types. Two MD simulations for each AA9 LPMO type were conducted using two separate Lennard-Jones parameter sets. Once completed, the MD trajectories were analysed for various features including the RMSD, RMSF, radius of gyration, coordination during simulation, hydrogen bonding, secondary structure conservation and overall protein movement. Force field parameters were successfully evaluated and validated for AA9 proteins. MD simulations of AA9 proteins revealed unique type-specific binding modes of AA9 active sites to cellulose. These binding modes were characterised by unique type-specific loops which were present in Type 2 and Type 3 AA9 proteins but not in Type 1 AA9 proteins. The loops were found to result in steric congestion that affects how the Cu2+ ion interacts with cellulose. As a result, Cu2+ binding to cellulose was observed for Type 1 but not for Type 2 and Type 3 AA9 proteins. In this study, force field parameters have been evaluated for the Type 1 active site of AA9 proteins, and these parameters were evaluated on all three types and on cellulose binding. Future work will focus on identifying the nature of the reactive oxygen species and performing QM/MM calculations to elucidate the reactive mechanism of all three AA9 LPMO types.
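Two of the trajectory metrics named in this abstract, RMSD and radius of gyration, have simple closed forms. The sketch below assumes coordinates are plain (x, y, z) tuples that have already been superimposed on the reference; it is an illustration, not the analysis code used in the thesis, and the function names are hypothetical.

```python
from math import sqrt


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two pre-aligned coordinate sets."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate sets must be non-empty and the same length")
    # Sum squared displacement over every atom and every Cartesian component.
    total = sum((a - b) ** 2 for pa, pb in zip(coords_a, coords_b) for a, b in zip(pa, pb))
    return sqrt(total / len(coords_a))


def radius_of_gyration(coords, masses=None):
    """Mass-weighted radius of gyration; equal masses are assumed if none are given."""
    masses = masses or [1.0] * len(coords)
    total_mass = sum(masses)
    # Centre of mass, one Cartesian component at a time.
    centre = [sum(m * p[i] for m, p in zip(masses, coords)) / total_mass for i in range(3)]
    weighted = sum(m * sum((p[i] - centre[i]) ** 2 for i in range(3)) for m, p in zip(masses, coords))
    return sqrt(weighted / total_mass)


# Hypothetical two-atom frames: the second frame is shifted by 1 unit along z.
frame_0 = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
frame_1 = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
shift = rmsd(frame_0, frame_1)
```

In practice a trajectory would be fitted to a reference frame before computing RMSD; that alignment step is deliberately left out of this sketch.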
|
694 |
Development of a minimally invasive molecular biomarker for early detection of lung cancer
Perez-Rogers, Joseph 24 March 2017
The diagnostic evaluation of ever smokers with pulmonary nodules represents a growing clinical challenge due to the implementation of lung cancer screening. The high false-positive rate of screening frequently results in the use of unnecessary invasive procedures in patients who are ultimately diagnosed as benign, clearly highlighting the need for additional diagnostic approaches. We previously derived and validated a bronchial epithelial gene-expression biomarker to detect lung cancer in ever smokers. However, bronchoscopy is not always chosen as a diagnostic modality. Given that bronchial and nasal epithelial gene-expression are similarly altered by cigarette smoke exposure, we sought to determine if cancer-associated gene-expression might also be detectable in the more readily accessible nasal epithelium.
Nasal epithelial brushings were prospectively collected from ever smokers undergoing diagnostic evaluation for lung cancer in the AEGIS-1 (n=375) and AEGIS-2 (n=130) clinical trials and gene-expression profiled using microarrays. The computational framework used to discover biomarkers in these data was formalized and implemented in an open-source R-package.
We identified 535 genes in the nasal epithelium of AEGIS-1 patients whose expression was associated with lung cancer status. Using matched bronchial gene-expression data from a subset of these patients, we found significantly concordant cancer-associated gene-expression alterations between the two airway sites. A nasal lung cancer classifier derived in the AEGIS-1 cohort that combined clinical factors and nasal gene-expression had significantly higher AUC (0.81) and sensitivity (0.91) than the clinical-factor model alone in independent samples from the AEGIS-2 cohort. These results support the idea that the airway epithelial field of lung cancer-associated injury extends to the nose and demonstrate the potential of using nasal gene-expression as a non-invasive biomarker for lung cancer detection.
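The AUC and sensitivity figures reported above are standard classifier metrics. As a rough illustration only (the labels, scores and threshold below are invented, and this is not the authors' R package), both can be computed from predicted scores as follows.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    # Ties between a positive and a negative score count as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def sensitivity(labels, scores, threshold):
    """Fraction of true positives scored at or above the decision threshold."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    if not pos:
        raise ValueError("no positive samples")
    return sum(s >= threshold for s in pos) / len(pos)


# Hypothetical data: 1 = cancer, 0 = benign; scores are model outputs.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
auc = roc_auc(labels, scores)
sens = sensitivity(labels, scores, threshold=0.5)
```

AUC is threshold-free, whereas sensitivity depends on the chosen cut-off, which is why the two numbers are reported separately.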
The framework for deriving this biomarker was generalized and implemented in an open-source R-package. The package provides a computational pipeline to compare biomarker development strategies using microarray data. The results from this pipeline can be used to highlight the optimal model development parameters for a given dataset leading to more robust and accurate models. This package provides the community with a novel and powerful tool to facilitate biomarker discovery in microarray data.
|
695 |
Determination of specificity and affinity of the Lactose permease (LacY) protein of Escherichia coli through application of molecular dynamics simulation
Lutimba, Stuart January 2018
Proteins are essential in all living organisms. They are involved in various critical activities and are also structural components of cells and tissues. Lactose permease, a membrane protein, has become a prototype for the major facilitator superfamily and utilises an existing electrochemical proton gradient to shuttle galactoside sugars into the cell. It therefore exists in two principal states, exposing the internal binding site to either side of the membrane. Previous studies have suggested that protonation precedes substrate binding, but it is still unclear why this has to occur for substrate binding to take place. This study therefore aimed to bridge this gap and to determine the chemical characteristics of the transport pathway. Molecular dynamics simulation methods and specialised simulation hardware were employed to elucidate the dependency of substrate binding on the protonation state of Lactose permease. Protein models that differed in their conformation as well as their protonation states were defined from their respective X-ray structures. Targeted molecular dynamics was implemented to drive the substrate to the binding site, and umbrella sampling was used to define the free energy of the transport pathway. The results suggest that the requirement of protonation for sugar binding is due to the switch-like mechanism of Glu325 in the residue-residue interaction (His322 and Glu269), which leads to sugar binding only in the protonated state of LacY. Furthermore, the free energy profile of the sugar transport pathway was lower only in the protonated state, which indicates stability of sugar binding in the protonated state.
|
696 |
Genes involved in inflammation are within celiac disease risk loci show differential mRNA expression
Tahseen Yahia Keelani, Ahlam January 2018
Celiac disease (CD) is a chronic autoimmune disease caused by the consumption of gluten in genetically predisposed individuals. Celiac patients develop many clinical features, including weight loss, diarrhea, and intestinal damage, and, if left untreated, CD patients may face an increased risk of malignancies.
Materials and methods: 403 patients were admitted to the study. These patients were divided into three groups: celiac cases, controls, and latent celiac cases. Gene expression analysis was performed on intestinal biopsies and blood samples (leukocytes) using a quantitative PCR technique. The second section of the study examined the effect of the PRODH enzyme on Drosophila melanogaster intestines. To achieve that, PRODH enzyme and different amino acids were added to the fly food. One-way ANOVA and Wilcoxon tests were applied to identify the significant genes.
Results: Most of the differentially expressed genes in celiac disease are involved in the inflammatory response. However, many genes have significantly altered expression in the latent celiac group but not in the CD group. These genes are CXCL1, IL15RA, IL2RB, MAPK11, and TGM2. They are involved in the TNF signaling pathway and in inflammatory cytokines. In celiac disease there is a significant alteration in PRODH expression in the intestines, and the addition of PRODH enzyme to glutamine has a similar effect on intestinal gene expression as gluten does.
Conclusion: We can conclude that non-HLA genes are important in activating the immune system, increasing the proline level, and developing the clinical features of celiac disease. Secondly, proline metabolism has an important role in tumor suppression and in augmenting tumor growth, which makes it an important therapeutic target in tumor therapy.
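The abstract names one-way ANOVA as one of the tests used to find significant genes across the three groups. A minimal sketch of the F statistic for one gene is shown below; the expression values are made up for illustration, and a real analysis would also need a p-value from the F distribution and multiple-testing correction, which are omitted here.

```python
def one_way_anova_f(groups):
    """F statistic: between-group mean square over within-group mean square."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean, weighted by group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical expression values for one gene in three groups
# (celiac cases, controls, latent celiac cases).
f_stat = one_way_anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [3.0, 4.0, 5.0]])
```

The function raises `ZeroDivisionError` if every group is perfectly constant, since the within-group variance is then zero.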
|
697 |
Subgroup Analysis of Patients with Hepatocellular Carcinoma: A Quest for Statistical Algorithms for Tissue Classification Problem
Ong, Vy Quoc 06 November 2018
Hepatocellular carcinoma (HCC) is the most common type of liver cancer. It is the third leading cause of cancer death worldwide and the ninth leading cause of cancer mortality in the United States. People with hepatitis B or C are considered to be at high risk for this kind of cancer. HCC patients with a remarkably poor prognosis and low survival rates commonly have intra-hepatic metastases, either tumor thrombi in the portal vein or intra-hepatic spread. It is uncommon for them to die of extra-hepatic metastases. Identifying metastatic HCC has therefore become both vital and clinically challenging in efforts to provide timely therapeutic intervention and improve the survival rate of patients who suffer from this disease.
To date, studies seeking an accurate molecular profiling model have aimed to identify these patients in advance for better treatment or intervention. One approach has been to focus on identifying individual candidate genes characterizing metastatic HCC. Another direction has been to find a global genome-scale solution by using microarray technology to obtain a gene expression profile for this carcinoma. Among the research following the latter was that of Qing-Hai Ye et al., Nature Medicine, Volume 9, Number 4, April 2003. They applied cDNA microarray-based gene expression profiling with compound covariate predictors to binary classification tasks among primary HCC, metastatic HCC, and metastasis-free HCC on a dataset of 87 observations and 9984 features taken from 40 hepatitis B-positive Chinese patients. Notably, a robust 153-gene model was generated that successfully distinguished tumor-thrombi-in-the-portal-vein samples from metastasis-free samples. However, they acknowledged that distinguishing primary tumor samples from their matched metastatic lesions was still a challenge. In this molecular signature, a gene named osteopontin, a secreted phosphoprotein, served as the lead gene in diagnosing HCC metastasis.
The analysis is based on the metastatic status of HCC, which is clinically predetermined. However, the class definition needs to be validated to investigate whether the data are sufficient to support the three predefined classes. We will use statistical clustering algorithms to validate the defined classes. After that, we will conduct variable selection to find markers, that is, genes that are differentially expressed among the clinical groups validated in this research. Next, using the compound markers found by this research, we will develop a statistical model to predict a new patient's HCC type for intervention. The generalization performance of the prediction model will be evaluated via a cross-validation test. This study aims to build a highly accurate model that gives a better classification of the aforementioned clinical groups of HCC and thus improves the rate of predicting metastatic patients.
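The cross-validation test mentioned at the end of this abstract can be sketched as a k-fold split over sample indices. The function name and the choice of contiguous, unshuffled folds are assumptions made for illustration; the abstract does not state which cross-validation scheme is used.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k contiguous folds of near-equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional sample each.
        stop = start + base + (1 if fold < extra else 0)
        test = list(range(start, stop))
        train = list(range(0, start)) + list(range(stop, n_samples))
        yield train, test
        start = stop


# With the 87 observations mentioned in the abstract and a hypothetical k of 10,
# every sample lands in exactly one test fold.
fold_sizes = [len(test) for _, test in k_fold_indices(87, 10)]
```

Since the 87 observations come from 40 patients, a careful analysis would probably shuffle the samples and keep all samples from one patient in the same fold; this sketch does neither.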
|
698 |
Geometric Approaches for Modeling Movement Quality: Applications in Motor Control and Therapy
January 2016
abstract: There has been tremendous technological advancement in the past two decades. Faster computers and improved sensing devices have broadened the research scope in computer vision. With these developments, assessing the quality of human actions is considered an important problem that needs to be tackled. Movement quality assessment finds a wide range of applications in motor control, health care, rehabilitation and physical therapy. Home-based interactive physical therapy requires the ability to monitor, inform and assess the quality of everyday movements. Obtaining labeled data from trained therapists or experts is the main limitation, since it is both expensive and time-consuming.
Motivated by recent studies in motor control and therapy, in this thesis an existing computational framework is used to assess balance impairment and disease severity in people suffering from Parkinson's disease. The framework uses high-dimensional shape descriptors of the reconstructed phase space of the subjects' center of pressure (CoP) tracings while they perform dynamical postural shifts. The performance of the framework is evaluated using a dataset collected from 43 healthy subjects and 17 subjects impaired by Parkinson's disease, and it outperforms other methods, such as dynamical shift indices and the use of chaotic invariants, in the assessment of balance impairment.
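A common way to reconstruct a phase space from a scalar time series such as a CoP tracing is time-delay embedding. The abstract does not say which construction the framework uses, so the sketch below is an assumption, and `dim` and `tau` are hypothetical parameter names for the embedding dimension and the delay.

```python
def delay_embed(series, dim, tau):
    """Return time-delay vectors (x[t], x[t + tau], ..., x[t + (dim - 1) * tau])."""
    if dim < 1 or tau < 1:
        raise ValueError("dim and tau must be positive integers")
    span = (dim - 1) * tau
    # A series shorter than the embedding window yields an empty list.
    return [tuple(series[t + k * tau] for k in range(dim)) for t in range(len(series) - span)]


# Hypothetical one-dimensional CoP samples embedded in three dimensions.
points = delay_embed([0, 1, 2, 3, 4], dim=3, tau=1)
```

Shape descriptors of the kind the abstract mentions would then be computed on the resulting point cloud rather than on the raw series.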
In this thesis, an unsupervised method is also proposed that measures the movement quality of simple actions like sit-to-stand and dynamic posture shifts by modeling the deviation of a given movement from an ideal movement path in the configuration space, i.e. the quality of movement is directly related to its similarity to the ideal trajectory between the start and end pose. The S^1xS^1 configuration space was used to model the interaction of two joint angles in sit-to-stand actions, and the R^2 space was used to model the subject's CoP while performing dynamic posture shifts for application in movement quality estimation. / Dissertation/Thesis / Masters Thesis Electrical Engineering 2016
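One way to make "deviation from an ideal path in S^1xS^1" concrete is to treat the two joint angles as a point on a flat torus and take the shortest straight path between the start and end pose as the ideal. That choice of ideal path, the sampling resolution, and all function names below are assumptions for illustration, not the thesis method.

```python
from math import pi, sqrt


def wrap(angle):
    """Map an angle difference (radians) to the interval [-pi, pi)."""
    return (angle + pi) % (2 * pi) - pi


def torus_distance(p, q):
    """Geodesic distance on the flat torus S^1 x S^1 between two joint-angle pairs."""
    return sqrt(wrap(p[0] - q[0]) ** 2 + wrap(p[1] - q[1]) ** 2)


def ideal_path(start, end, steps=100):
    """Sample the shortest straight path on the torus from start to end."""
    d0, d1 = wrap(end[0] - start[0]), wrap(end[1] - start[1])
    return [(start[0] + d0 * k / steps, start[1] + d1 * k / steps) for k in range(steps + 1)]


def mean_deviation(trajectory, steps=100):
    """Average distance from each observed pose to the nearest sampled ideal pose."""
    path = ideal_path(trajectory[0], trajectory[-1], steps)
    return sum(min(torus_distance(p, q) for q in path) for p in trajectory) / len(trajectory)


# Hypothetical sit-to-stand trajectory of two joint angles (radians).
deviation = mean_deviation([(0.0, 0.0), (0.5, 0.7), (1.0, 1.0)])
```

A lower mean deviation would indicate a movement closer to the assumed ideal; the same functions with plain Euclidean distance would cover the R^2 case used for CoP data.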
|
699 |
Machine Learning Methods for Diagnosis, Prognosis and Prediction of Long-term Treatment Outcome of Major Depression
January 2017
abstract: Major Depression, clinically called Major Depressive Disorder, is a mood disorder that affects about one eighth of the population in the US and is projected to be the second leading cause of disability in the world by the year 2020. Recent advances in biotechnology have enabled us to collect a great variety of data which could potentially offer a deeper understanding of the disorder as well as advance personalized medicine.
This dissertation focuses on developing methods for three different aspects of predictive analytics related to the disorder: automatic diagnosis, prognosis, and prediction of long-term treatment outcome. The data used for each task have their specific characteristics and pose unique problems. Automatic diagnosis of melancholic depression is made on the basis of metabolic profiles and micro-array gene expression profiles, where missing values and strong empirical correlation between the variables are not unusual. To deal with these problems, a method of generating a representative set of features is proposed. Prognosis is made on data collected from rating scales and questionnaires, which consist mainly of categorical and ordinal variables and thus favor decision tree based predictive models. Decision tree models are notorious for overfitting. A decision tree pruning method is proposed that overcomes the greedy nature and reliance on heuristics inherent in traditional decision tree pruning approaches. The method is further extended to prune Gradient Boosting Decision Trees and tested on the task of prognosis of treatment outcome. Follow-up studies evaluating the long-term effect of the treatments on patients usually measure patients' depressive symptom severity monthly, so the actual time of relapse is upper bounded by the observed time of relapse. To resolve such uncertainty in the response, a general loss function where the hypothesis could take different forms is proposed to predict the risk of relapse in situations where only an interval for the time of relapse can be derived from the observed data. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2017
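When relapse is only known to fall between two monthly visits, the response is interval-censored. The dissertation's own loss function is not given in the abstract; the sketch below substitutes a textbook interval-censored negative log-likelihood under a constant-hazard (exponential) model, with `rate` and the example intervals as illustrative assumptions.

```python
from math import exp, inf, log


def interval_nll(rate, intervals):
    """Negative log-likelihood of relapse intervals (left, right] under a constant hazard.

    Each interval needs left < right; right = inf means no relapse was observed
    by the last visit at time `left` (right-censoring).
    """
    if rate <= 0:
        raise ValueError("rate must be positive")
    total = 0.0
    for left, right in intervals:
        # Survival function of the exponential model: S(t) = exp(-rate * t).
        survive_left = exp(-rate * left)
        survive_right = 0.0 if right == inf else exp(-rate * right)
        # Probability that relapse happens inside the interval.
        total -= log(survive_left - survive_right)
    return total


# Hypothetical follow-up data in months: relapse between visits 2 and 3,
# and one patient still relapse-free at month 6.
loss = interval_nll(0.1, [(2.0, 3.0), (6.0, inf)])
```

Minimising this quantity over `rate` (or over the parameters of a richer model that outputs a per-patient rate) gives a risk estimate that uses only the interval information.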
|
700 |
Network and Multi-Omics Analyses of Arabidopsis Arogenate Dehydratase Knock-Out and Over-Expression Mutants
Hixson, Kim Kathleen 11 July 2018
Arogenate dehydratases (ADTs) are enzymes found within the aromatic amino acid pathway. They are responsible for catalyzing the final step in phenylalanine (Phe) biosynthesis in vascular plants. While being essential for protein production in all living systems, Phe is additionally the starting precursor of a multitude of secondary metabolites produced in the phenylpropanoid pathway. Our group discovered that by knocking out ADT isoenzymes in Arabidopsis thaliana, measurable reductions in lignin levels can be achieved in stem tissue. This finding provides the opportunity to study potential mechanisms related to lignin biosynthesis and could have implications in bioengineering applications where alterations in lignin level might be desired.
Any alteration to a gene family as important as that of the ADTs imparts plant-wide biomolecular changes. Because of this, it is important to know not only that lignin is reduced but also that optimal plant function is maintained, or to understand how it has been changed, in order to mitigate any undesirable effects. Here we utilized a multitude of analytical platforms and data analysis techniques on both ADT knock-out (KO) and over-expression (OE) lines. By using both KO and OE lines we could validate our findings, as KO and OE mutants of the same enzyme or enzymes typically show converse biomolecular abundance changes. As a systems-level understanding was desired, we utilized a multi-omics strategy (metabolomics, transcriptomics and proteomics).
Identified metabolites showed which metabolites and metabolite classes were most affected. Major KEGG-defined pathway changes were identified at the transcript and protein enzyme family level. Integration of all omics data revealed which enzymatic reactions were most correlated with observed metabolite abundance changes. Network and clustering algorithms identified patterns of molecular change between metabolites, transcripts and proteins, and these patterns were further correlated to reveal possible post-transcriptional regulatory processes involved in lignin biosynthesis.
Taken altogether, these data informed us of how ADT alterations affect the entire biomolecular system of Arabidopsis and also revealed targets for future studies aimed at elucidating further how lignin biosynthesis is regulated at the post-transcriptional and translational levels.
|