41 |
Framtidens biomarkörer : En prioritering av proteinerna i det humana plasmaproteomet [Biomarkers of the future: a prioritisation of the proteins in the human plasma proteome]
Antonsson, Elin; Eulau, William; Fitkin, Louise; Johansson, Jennifer; Levin, Fredrik; Lundqvist, Sara; Palm, Elin. January 2019
In this report, we rank candidate protein biomarkers according to several criteria for use in Olink Proteomics’ protein panels. We started from a list compiled through the Human Plasma Proteome Project (HPPP) and processed it in several ways to obtain the final results. We compared the list with the protein catalogs of Olink and its competitors, and identified diseases beyond Olink’s coverage along with the proteins linked to them. We also created a scoring system to facilitate the detection of good biomarkers. From this, we conclude that Olink should focus on proteins that its competitors have in their catalogs and on proteins that occur in many pathways and are linked to many diseases. With each of the methods used, we identified a number of proteins that we recommend Olink investigate further.
|
42 |
Method for recognizing local descriptors of protein structures using Hidden Markov Models
Björkholm, Patrik. January 2008
Being able to predict the sequence-structure relationship in proteins will extend the scope of many bioinformatics tools that rely on structure information. Here we use hidden Markov models (HMMs) to recognize and pinpoint the location in target sequences of local structural motifs (local descriptors of protein structure, LDPS). These substructures are composed of three or more segments of amino acid backbone structure that are close to each other in space but not necessarily along the amino acid sequence. We were able to align descriptors to their proper locations in 41.1% of the cases when using models built solely from amino acid information. Using models that also incorporated secondary structure information, we were able to assign 57.8% of the local descriptors to their proper locations. Performance improved further when a profile was threaded through the hidden Markov models together with the secondary structure: with this material we were able to assign 58.5% of the descriptors to their proper locations. Hidden Markov models were thus shown to be able to locate LDPS in target sequences, and accuracy increases when secondary structure and the profile for the target sequence are used in the models.
|
43 |
The role of RFX-target genes in neurodevelopmental and psychiatric disorders
Ganesan, Abhishekapriya. January 2021
Neurodevelopmental disorders such as autism spectrum disorder (ASD) and psychiatric disorders such as schizophrenia (SCZ) represent a large spectrum of disorders that manifest through cognitive and behavioural problems. ASD and SCZ are both highly heritable, and some phenotypic similarities between them have sparked an interest in understanding their genetic commonalities. The genetics of both disorders exhibit significant heterogeneity. Developments in genomics and systems biology continually increase our understanding of these disorders. Recently, pathogenic genetic variants in the regulatory factor X (RFX) family of transcription factors have been identified in a number of ASD cases. In this thesis, common genetic variants and expression patterns are studied for genes identified as having a conserved promoter X-Box motif region, a binding site of RFX factors. Significant common variants identified through expression quantitative trait loci (eQTLs) and genome-wide association studies (GWAS) are mapped to the regulatory regions of these genes and analysed for putative enrichment. In addition, single-cell RNA sequencing data are used to examine enrichment of cell types with high X-Box gene expression in the developing human cortex. Through the study, genes that have eQTLs or SNPs in the genomic regulatory regions of the X-Box genes have been identified. While there were no eQTLs or GWAS SNPs in the X-Box motifs themselves, some common variants were found in the X-Box promoter regions. According to hypergeometric tests and the resulting p-values, all of these distributions are statistically under-enriched. Further, the major cell types in the cortical region with increased expression of the X-Box genes, and the most expressed genes among these enriched cell types, have been identified. Of the 11 cell types, seven were found to be enriched for X-Box genes, and many of the most expressed genes in these cell types were similar.
A further study into the cell types and genes identified, along with additional systems biological data analysis, could reveal a larger list of X-Box genes involved in ASD and SCZ and the specific roles of these genes.
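The hypergeometric test mentioned in this abstract can be illustrated with a short sketch. It computes lower-tail (under-enrichment) and upper-tail (over-enrichment) p-values using only the Python standard library. The function names and all counts are hypothetical and chosen for illustration; they are not taken from the thesis.

```python
from math import comb

def hypergeom_pmf(k, N, K, n):
    """P(X = k): k hits among n draws without replacement
    from a population of N items that contains K hits."""
    return comb(K, k) * comb(N - K, n - k) / comb(N, n)

def under_enrichment_pvalue(k, N, K, n):
    """Lower-tail p-value P(X <= k); small values mean fewer hits than expected."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(0, k + 1))

def over_enrichment_pvalue(k, N, K, n):
    """Upper-tail p-value P(X >= k); small values mean more hits than expected."""
    return sum(hypergeom_pmf(i, N, K, n) for i in range(k, min(K, n) + 1))

# Hypothetical counts: 2000 regulatory regions in total, 200 of which carry
# a significant common variant; 100 of the regions belong to the gene set
# of interest, and only 3 of those carry a variant (10 expected by chance).
p_under = under_enrichment_pvalue(k=3, N=2000, K=200, n=100)
print(f"under-enrichment p-value: {p_under:.4g}")
```

A small lower-tail p-value indicates that the gene set contains fewer variant-carrying regions than random sampling would be expected to produce.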
|
44 |
Collective analysis of multiple high-throughput gene expression datasets
Abu Jamous, Basel. January 2015
Modern technologies have resulted in the production of numerous high-throughput biological datasets. However, the development of capable computational methods has not kept pace with the generation of new high-throughput datasets. Amongst the most popular biological high-throughput datasets are gene expression datasets (e.g. microarray datasets). This work addresses this gap by proposing a suite of computational methods that can analyse multiple gene expression datasets collectively. The focal method in this suite is the unification of clustering results from multiple datasets using external specifications (UNCLES). This method applies clustering separately to multiple heterogeneous datasets that measure the expression of the same set of genes, and then combines the resulting partitions in accordance with one of two types of external specifications: type A identifies the subsets of genes that are consistently co-expressed in all of the given datasets, while type B identifies the subsets of genes that are consistently co-expressed in one subset of datasets while being poorly co-expressed in another subset of datasets. This extends the types of questions that can be addressed by computational methods, because existing clustering, consensus clustering, and biclustering methods cannot address these objectives. Moreover, to assist in setting some of the parameters required by UNCLES, the M-N scatter plots technique is proposed. These methods, and less mature versions of them, have been validated and applied to numerous real datasets from the biological contexts of budding yeast, bacteria, human red blood cells, and malaria. In collaboration with biologists, these applications have led to various biological insights. In yeast, the role of the poorly understood gene CMR1 in the yeast cell cycle has been further elucidated.
Also, a novel subset of poorly understood yeast genes has been discovered with an expression profile consistently negatively correlated with the well-known ribosome biogenesis genes. Bacterial data analysis has identified two clusters of negatively correlated genes. Analysis of data from human red blood cells has produced some hypotheses regarding the regulation of the pathways producing such cells. On the other hand, malarial data analysis is still at a preliminary stage. Taken together, this thesis provides an original integrative suite of computational methods which scrutinise multiple gene expression datasets collectively to address previously unresolved questions, and provides the results and findings of many applications of these methods to real biological datasets from multiple contexts.
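The type A objective described above (genes consistently co-expressed in all datasets) can be illustrated with a deliberately simplified sketch. This is not the UNCLES algorithm, whose details the abstract does not give. It assumes each dataset has already been clustered into a hard partition and simply groups genes that share a cluster in every dataset. The function name, gene names and labels are hypothetical.

```python
def consistently_coclustered(partitions, min_size=2):
    """Return gene sets whose members fall in the same cluster in every dataset.

    partitions: list of dicts, one per dataset, mapping gene -> cluster label.
    Only genes present in all datasets are considered.
    """
    genes = set(partitions[0])
    for partition in partitions[1:]:
        genes &= set(partition)

    groups = {}
    for gene in sorted(genes):
        # The tuple of labels across datasets identifies a co-clustered group.
        signature = tuple(partition[gene] for partition in partitions)
        groups.setdefault(signature, []).append(gene)

    return [set(members) for members in groups.values() if len(members) >= min_size]

# Hypothetical clustering results from two datasets over the same five genes.
dataset1 = {"g1": 0, "g2": 0, "g3": 1, "g4": 1, "g5": 0}
dataset2 = {"g1": "a", "g2": "a", "g3": "b", "g4": "a", "g5": "a"}

print(consistently_coclustered([dataset1, dataset2]))
```

Here g1, g2 and g5 share a cluster in both datasets, whereas g3 and g4 are together only in the first dataset and are therefore excluded.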
|
45 |
Clustering biological data using a hybrid approach : Composition of clusterings from different features
Keller, Jens. January 2008
Clustering of data is a well-researched topic in computer science, and many approaches have been designed for different tasks. In biology, many of these approaches are hierarchical, and the result is usually represented as a dendrogram, e.g. a phylogenetic tree. However, many non-hierarchical clustering algorithms are also well established in biology. The approach in this thesis is based on such common algorithms. The algorithm implemented as part of this thesis uses a non-hierarchical graph clustering algorithm to compute a hierarchical clustering in a top-down fashion. It performs the graph clustering iteratively, with a previously computed cluster as the input set. The innovation is that it focuses on a different feature of the data in each step and clusters the data according to that feature. Common hierarchical approaches in biology cluster, for example, a set of genes according to the similarity of their sequences; the clustering then reflects a partitioning of the genes by sequence similarity. The approach introduced in this thesis instead uses many features of the same objects. In biology, for instance, these features can be similarities of sequences, of gene expression, or of motif occurrences in the promoter region. As part of this thesis, not only the algorithm itself was implemented and evaluated, but also a complete software package with a graphical user interface. The software was implemented as a framework that provides the basic functionality, with the algorithm as a plug-in extending the framework. The software is meant to be extended in the future by integrating a set of algorithms and analysis tools for clustering and analysing data, not necessarily related to biology. The thesis deals with topics in biology, data mining and software engineering and is divided into six chapters. The first chapter gives an introduction to the task and the biological background.
It gives an overview of common clustering approaches and explains the differences between them. Chapter two shows the idea behind the new clustering approach and points out differences and similarities between it and common clustering approaches. The third chapter discusses the aspects concerning the software, including the algorithm. It illustrates the architecture and analyses the clustering algorithm. After the implementation the software was evaluated, which is described in the fourth chapter, pointing out observations made due to the use of the new algorithm. Furthermore this chapter discusses differences and similarities to related clustering algorithms and software. The thesis ends with the last two chapters, namely conclusions and suggestions for future work. Readers who are interested in repeating the experiments which were made as part of this thesis can contact the author via e-mail, to get the relevant data for the evaluation, scripts or source code.
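The top-down idea described in this abstract, clustering by one feature and then re-clustering each resulting cluster by the next feature, can be sketched as follows. This is a minimal illustration, not the thesis's implementation: connected components of a similarity graph stand in for the unspecified non-hierarchical graph clustering algorithm, and the gene names, features and threshold are hypothetical.

```python
def connected_components(items, similar):
    """Stand-in graph clustering: components of the graph in which
    two items are joined by an edge whenever similar(a, b) is true."""
    items = list(items)
    unvisited = set(items)
    clusters = []
    for start in items:
        if start not in unvisited:
            continue
        unvisited.remove(start)
        stack, component = [start], [start]
        while stack:
            a = stack.pop()
            # Collect neighbours first so the set is not modified while iterating.
            neighbours = [b for b in unvisited if similar(a, b)]
            for b in neighbours:
                unvisited.remove(b)
                component.append(b)
                stack.append(b)
        clusters.append(sorted(component))
    return clusters

def hybrid_clustering(items, features):
    """Split items by features[0], then split each cluster by features[1],
    and so on. Returns nested lists, one nesting level per feature."""
    if not features:
        return sorted(items)
    first, rest = features[0], features[1:]
    return [hybrid_clustering(cluster, rest)
            for cluster in connected_components(items, first)]

# Hypothetical data: gene -> (sequence family, expression level).
genes = {
    "geneA": ("fam1", 1.0),
    "geneB": ("fam1", 1.2),
    "geneC": ("fam1", 5.0),
    "geneD": ("fam2", 5.1),
}
same_family = lambda a, b: genes[a][0] == genes[b][0]
similar_expression = lambda a, b: abs(genes[a][1] - genes[b][1]) < 0.5

tree = hybrid_clustering(genes, [same_family, similar_expression])
print(tree)
```

The first level separates the genes by sequence family; the second level splits the fam1 cluster by expression level, so geneC ends up apart from geneA and geneB even though all three share a family.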
|
46 |
Timing of chromosomal alterations during tumour development
Viklund, Björn. January 2017
During cancer development, tumour cells accumulate many somatic point mutations and copy number alterations. It is not unusual for affected genes to have a copy number that differs from the usual two. Owing to the loss of DNA repair mechanisms, cells can mutate independently of each other, which gives rise to different subclones within the tumour. A tumour cell that gains an advantage in cell division speed over its competing neighbours will, together with its daughter cells, eventually make up a large portion of the tumour. All the mutations that the subclone’s most recent common ancestor acquired before the expansion are shared across the subclone. In this project, we have developed a method that uses mutation frequencies from publicly available whole genome sequencing data to quantify the number of competing subclones in a sample and to determine the timing of its copy number duplications. This method could be developed further into an extension of regular copy number analysis. A heterogeneous tumour can grow faster and be more resistant to treatment. It is therefore important to learn more about cancer development and to gain a better understanding of the order in which copy number alterations occur.
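One common way to reason about the timing of a duplication from mutation counts is sketched below. It is not necessarily the method developed in the thesis. The assumptions are: a constant mutation rate per DNA copy, clonal mutations only, and a known multiplicity for each mutation (in practice inferred from allele frequencies). Mutations acquired before the duplication on the duplicated allele are then present on two copies, and all later mutations on one copy. The function name and the counts are hypothetical.

```python
def duplication_time(n_double, n_single, configuration):
    """Estimate when a duplication occurred, as a fraction t of the mutational
    time before the most recent common ancestor (0 = earliest, 1 = latest).

    n_double: clonal mutations present on two copies in the segment.
    n_single: clonal mutations present on one copy in the segment.
    configuration: "2+0" (copy-neutral loss of heterozygosity) or
                   "2+1" (single-copy gain).
    With per-copy rate r: n_double ~ r*t in both configurations.
    """
    if n_double + n_single == 0:
        raise ValueError("no mutations observed in the segment")
    if configuration == "2+0":
        # After the event two copies mutate: n_single ~ 2*r*(1 - t).
        return 2 * n_double / (2 * n_double + n_single)
    if configuration == "2+1":
        # Before: r*t on the non-duplicated allele; after: three copies mutate.
        # Hence n_single ~ r*t + 3*r*(1 - t).
        return 3 * n_double / (2 * n_double + n_single)
    raise ValueError("unsupported configuration: " + configuration)

# Hypothetical counts for a segment with a single-copy gain.
print(duplication_time(n_double=40, n_single=220, configuration="2+1"))
```

With noisy counts the estimate can fall slightly outside the interval from 0 to 1, so a real analysis would also need confidence intervals and a correction for tumour purity.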
|
47 |
A comparative validation of the human variant simulator SIMdrom
Ånäs, Sofia. January 2017
The past decade’s progress in next-generation sequencing has drastically decreased the price of whole genome and exome sequencing, making it available as a clinical tool for diagnosing patients with genetic disease. However, finding a disease-causing mutation among millions of non-pathogenic variants in a patient’s genome is not an easy task. Algorithms that find the variants clinicians should investigate more closely are therefore needed and are constantly being developed. To test these algorithms, a software tool called SIMdrom has been developed to simulate test data. In this project, the simulated data are validated through comparison with real genetic data to ensure that they are suitable for use as test data. Ensuring the data’s reliability and finding possible improvements can facilitate the development of algorithms for finding disease-causing mutations. This in turn could lead to better diagnostic possibilities for clinicians. When simulated data are visualized together with real genomes using principal component analysis, they cluster near their real counterparts, which shows that the simulated data resemble the real genomes. Simulated exomes also performed well when used as part of one of three training sets for the classifier in the Prioritization of Exome Data by Image Analysis study, where they performed second best after an in-house data set consisting of real exomes. To conclude, the SIMdrom simulated data perform well in both parts of this project. Additional tests of their validity should include testing against larger real data sets, and a possible improvement would be to implement a simulation option for spiking in noise.
|
48 |
Predicting gene expression using artificial neural networks
Lindefelt, Lisa. January 2002
Today, one of the major aims in bioinformatics is to gain a complete understanding of gene function and of the systems behind gene regulation. Regulatory relationships among genes appear to be complex, since transcriptional control is the result of complex networks interpreting a variety of inputs. It is therefore essential to develop analytical tools that can detect complex genetic relationships. This project examines whether the data mining technique of artificial neural networks (ANNs) can detect regulatory relationships between genes. As an initial step towards finding regulatory relationships with ANNs, the goal of this project is to train an ANN to predict the expression of an individual gene. The genes predicted are the nuclear receptor PPAR-γ and the insulin receptor. Predictions for each of the two target genes were made using different gene expression datasets as input to the ANN. The results for PPAR-γ indicate that its expression cannot be predicted under the conditions of this experiment. The results for the insulin receptor indicate that the use of ANNs for predicting the expression of an individual gene cannot be ruled out.
|
49 |
Using nuclear receptor interactions as biomarkers for metabolic syndrome
Hettne, Kristina. January 2003
Metabolic syndrome is reaching epidemic proportions, especially in developed countries. Each risk factor component of the syndrome independently increases the risk of developing coronary artery disease. The risk factors are obesity, dyslipidemia, hypertension, type 2 diabetes, insulin resistance, and microalbuminuria. Nuclear receptors are a family of receptors that has recently received a lot of attention because of their possible involvement in metabolic syndrome. Putting the receptors into context with their co-factors and ligands may reveal therapeutic targets not found by studying the receptors alone. Therefore, in this thesis, interactions between genes in nuclear receptor pathways were analysed with the goal of investigating whether these interactions can supply leads to biomarkers for metabolic syndrome. Gene expression data from metabolic syndrome donors in the BioExpress database were analysed with the APRIORI algorithm (Agrawal et al. 1993) for generating and mining association rules. No association rules were found to function as biomarkers for metabolic syndrome, but the resulting rules show that the data mining technique successfully found associations between genes in signaling pathways.
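The association rule mining step named in this abstract can be illustrated with a compact sketch of the Apriori idea: frequent itemsets are built level by level, and rules are then derived from them by confidence. How the thesis encoded expression data as items is not stated, so the "gene_up" items, the samples and the thresholds below are hypothetical.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    current = {frozenset([item]) for t in transactions for item in t}
    frequent, k = {}, 1
    while current:
        level = {}
        for candidate in current:
            support = sum(candidate <= t for t in transactions) / n
            if support >= min_support:
                level[candidate] = support
        frequent.update(level)
        k += 1
        # Join step: unite frequent (k-1)-itemsets into k-itemset candidates.
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {c for c in joined
                   if all(frozenset(s) in level for s in combinations(c, k - 1))}
    return frequent

def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    rules = []
    for itemset, support in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, size)):
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, support, confidence))
    return rules

# Hypothetical samples, each described by its set of up-regulated genes.
samples = [
    {"geneA_up", "geneB_up", "geneC_up"},
    {"geneA_up", "geneB_up"},
    {"geneA_up", "geneB_up", "geneC_up"},
    {"geneB_up"},
]
frequent = apriori(samples, min_support=0.5)
for lhs, rhs, support, confidence in association_rules(frequent, min_confidence=1.0):
    print(sorted(lhs), "->", sorted(rhs), f"support={support:.2f}")
```

The pruning step relies on the fact that every subset of a frequent itemset is itself frequent, which is also why the lookup of the antecedent's support in `association_rules` always succeeds.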
|
50 |
Characterization of immune infiltrate in early breast cancer based on a multiplex imaging method
Zacharouli, Markella-Achilleia. January 2020
Breast cancer is the most common type of cancer among women worldwide. Multiple studies have reported on the role of tumor-immune interactions and on the mechanisms that the immune system uses to combat tumor cells. Therapies based on the immune response are evolving over time, but more research is required to understand and identify the patterns and relationships within the tumor microenvironment. This study aims to characterize immune cell expression patterns using a multiplex method and to investigate how different subpopulations in breast cancer patients’ tissue samples correlate with clinicopathological characteristics. The results indicate that there is an association between immune cell composition and clinicopathological characteristics: estrogen receptor status (ER+/ER-), progesterone receptor status (PR+/PR-), grade (I, II, III; a measure of how closely the cancer cells resemble normal cells), menopausal status, tumor size, nodal status, HR status and HER2 status. However, validation in a larger patient population is required to evaluate the role of immune infiltration as a predictive or prognostic biomarker in early breast cancer.
|