561 |
Structural characterization of overrepresented / Lysholm, Fredrik, January 2008
Background: Over the last decades, protein sequencing projects have produced vast amounts of sequence information, which enables studies of sequential patterns. One of the best-known efforts to chart short peptide sequences is the Prosite pattern data bank. While sequential patterns such as those in Prosite have proved very useful for classifying protein families and functions, structural analysis may provide additional information and possibly crucial clues to protein folding. Today the PDB, the main repository for protein structures, contains more than 50,000 entries, which makes structural protein studies feasible.
Result: Strongly folded pentapeptides, defined as pentapeptides that retain a specific conformation in several significantly structurally different proteins, were extracted from the PDB and studied. Several groups of such pentapeptides were found. Possibly the best defined is the "double Cys" group, in which two cysteines are separated by two residues (CXXCX|XCXXC); these pentapeptides were found to form backbone loops in which the two cysteines may form a Cys-Cys (disulfide) bridge. Other structural motifs were found in helices and in sheets, such as "ECSAM" and "TIKIW", respectively.
Conclusion: Structural analysis of pentapeptides and other oligopeptides can yield much information. Some pentapeptides are clearly more likely than others to adopt a specific fold, and there are many strongly folded pentapeptides. By incorporating such patterns into a protein folding model, such as the hydrophobic-polar (HP) model, improvements in speed and accuracy can be obtained. Comparing the structural conformations of important overrepresented pentapeptides can also help identify and refine both structural data banks such as SCOP and sequential pattern data banks such as Prosite.
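The first computational step behind this kind of survey, enumerating pentapeptides and picking out those that match the "double Cys" pattern, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: sequences are plain one-letter strings, the function names are hypothetical, and the structural comparison against PDB coordinates that actually defines a "strongly folded" pentapeptide is not shown.

```python
import re
from collections import Counter

# "Double Cys" pentapeptides: C-X-X-C-X or X-C-X-X-C, where X is any residue letter.
DOUBLE_CYS = re.compile(r"^(C[A-Z]{2}C[A-Z]|[A-Z]C[A-Z]{2}C)$")


def pentapeptides(sequence):
    """Yield every overlapping 5-residue window of a protein sequence."""
    for i in range(len(sequence) - 4):
        yield sequence[i:i + 5]


def count_pentapeptides(sequences):
    """Count pentapeptide occurrences across a collection of sequences."""
    counts = Counter()
    for seq in sequences:
        counts.update(pentapeptides(seq.upper()))
    return counts


def double_cys_windows(sequences):
    """Return {pentapeptide: count} for windows matching the double-Cys pattern."""
    counts = count_pentapeptides(sequences)
    return {p: n for p, n in counts.items() if DOUBLE_CYS.match(p)}
```

For example, the sequence `MKCAACGT` contains the CXXC motif `CAAC`, which yields two matching windows, `KCAAC` (XCXXC) and `CAACG` (CXXCX). Deciding whether a pentapeptide is *over*represented would additionally require an expected count, for instance from residue frequencies.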
|
562 |
Computational analyses of biological sequences – applications to antibody-based proteomics and gene family characterization / Lindskog, Mats, January 2005
<p>Following the completion of the human genome sequence, post-genomic efforts have shifted the focus towards the analysis of the encoded proteome. Several systematic proteomics approaches have emerged, for instance antibody-based proteomics initiatives, in which antibodies are used to functionally explore the human proteome. One such effort is HPR (the Swedish Human Proteome Resource), where affinity-purified polyclonal antibodies are generated and subsequently used for protein expression and localization studies in normal and diseased tissues. The antibodies are directed towards protein fragments, PrESTs (Protein Epitope Signature Tags), which are selected based on criteria favourable for subsequent laboratory procedures.</p><p>This thesis describes the development of novel software (Bishop) to facilitate the selection of suitable protein fragments and to ensure high-throughput processing of selected target proteins. The majority of proteins were successfully processed by this approach; however, the design strategy resulted in a number of fall-outs. These comprised alternative splice variants, as well as proteins exhibiting high sequence similarity to other human proteins, and alternative strategies were developed to process them. The strategy for handling alternative splice variants included the development of additional software and was validated by comparing the immunohistochemical staining patterns obtained with antibodies generated towards the same target protein. Proteins with high sequence similarity were processed by assembling human proteins into clusters according to their pairwise sequence identities. Each cluster was represented by a single PrEST located in the region of highest sequence similarity among all cluster members, so that one PrEST represented the entire cluster. 
This strategy was validated by Western blot analysis, in which antibodies directed to such cluster-specific PrESTs identified all proteins within a cluster. In addition, PrEST design success rates were evaluated for more than 4,000 genes.</p><p>Several genomes other than human have been completed; at the time of writing, more than 300 genomes were fully sequenced. Following the release of the genome of the model tree black cottonwood (<i>Populus trichocarpa</i>), a bioinformatic analysis identified previously unknown cellulose synthases (CesAs), revealing a total of 18 CesA family members. These genes are thought to have arisen from several rounds of genome duplication. This number is considerably higher than in previously studied plant genomes, which contain only ten CesA family members. Moreover, the identification of corresponding orthologous ESTs from the closely related hybrid aspen (<i>P</i>. <i>tremula x tremuloides</i>) for two pairs of CesAs suggests that these genes are actively transcribed. This indicates that a number of paralogs have retained their function following extensive genome duplication events in the tree’s evolutionary history.</p>
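Assembling proteins into clusters from their pairwise sequence identities can be illustrated as single-linkage clustering with a union-find structure. This is a sketch only: the abstract does not state the identity cut-off or the linkage rule that was used, so the `threshold=0.8` default, the single-linkage choice and the protein IDs in the example are hypothetical.

```python
def cluster_by_identity(proteins, identities, threshold=0.8):
    """Single-linkage clustering of proteins by pairwise sequence identity.

    proteins   -- iterable of protein identifiers
    identities -- dict mapping (id_a, id_b) -> fractional pairwise identity
    threshold  -- hypothetical cut-off; two proteins at or above it are linked
    """
    parent = {p: p for p in proteins}

    def find(p):
        # Path halving keeps the union-find trees shallow.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for (a, b), identity in identities.items():
        if identity >= threshold:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two clusters

    clusters = {}
    for p in parent:
        clusters.setdefault(find(p), []).append(p)
    # Deterministic output: members sorted, clusters sorted.
    return sorted(sorted(members) for members in clusters.values())
```

With identities `P1–P2 = 0.92`, `P2–P3 = 0.85` and `P3–P4 = 0.40`, proteins P1, P2 and P3 end up in one cluster (P1 and P3 are linked through P2) and P4 stays alone. Under single linkage, a cluster can therefore contain members that are not directly similar to each other, which is one reason a real design would also check where the shared high-similarity region lies.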
|
563 |
Development of a hierarchical k-selecting clustering algorithm – application to allergy / Malm, Patrik, January 2007
<p>The objective of this Master’s thesis was to develop, implement and evaluate an iterative procedure for hierarchical clustering with good overall performance that also merges features of certain previously described algorithms into a single integrated package. The resulting tool was then applied to an allergen IgE-reactivity data set. The final algorithm uses a hierarchical approach that illustrates the emergence of patterns in the data. At each level of the hierarchical tree, a partitional clustering method divides the data into k groups, where k is chosen by applying cluster validation techniques. The cross-reactivity analysis performed with the new algorithm largely reproduces the cluster formations anticipated in the allergen data, strengthening results obtained in previous studies on the subject. Notably, though, certain unexpected findings presented in the earlier analysis were aggregated differently by the novel clustering package, and more in line with phylogenetic and protein family relationships.</p>
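The combination described above, a partitional split at every tree level with k chosen by a validation index, can be sketched as follows. The abstract names neither the partitional method nor the validation technique, so this sketch assumes k-means (with deterministic farthest-point initialisation) and the mean silhouette width; `max_k`, `min_size` and `min_score` are hypothetical stopping parameters, not values from the thesis.

```python
import math


def kmeans(points, k, iterations=50):
    """Lloyd's k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next center: the point farthest from its nearest existing center.
        centers.append(max(points, key=lambda p: min(math.dist(p, c) for c in centers)))
    labels = []
    for _ in range(iterations):
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old center if a cluster happens to empty out.
            new_centers.append(tuple(sum(xs) / len(members) for xs in zip(*members))
                               if members else centers[j])
        if new_centers == centers:
            break
        centers = new_centers
    return labels


def _mean_dist(points, labels, i, label):
    """Mean distance from point i to the other members of cluster `label`."""
    ds = [math.dist(points[i], q) for j, q in enumerate(points)
          if labels[j] == label and j != i]
    return sum(ds) / len(ds) if ds else None


def silhouette(points, labels):
    """Mean silhouette width in [-1, 1]; higher means better-separated clusters."""
    scores = []
    for i in range(len(points)):
        a = _mean_dist(points, labels, i, labels[i])
        b = min(_mean_dist(points, labels, i, other)
                for other in set(labels) if other != labels[i])
        if a is None or max(a, b) == 0:
            scores.append(0.0)  # singleton or degenerate case
        else:
            scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def k_selecting_tree(points, max_k=4, min_size=4, min_score=0.5):
    """Top-down tree: each node is split with the k that maximises the silhouette."""
    points = [tuple(p) for p in points]
    best = None
    if len(points) >= min_size:
        for k in range(2, min(max_k, len(points) - 1) + 1):
            labels = kmeans(points, k)
            if len(set(labels)) < 2:
                continue
            score = silhouette(points, labels)
            if best is None or score > best[0]:
                best = (score, k, labels)
    if best is None or best[0] < min_score:
        return {"points": points}  # leaf: no convincing partition
    _, k, labels = best
    children = [[p for p, lab in zip(points, labels) if lab == j] for j in range(k)]
    return {"k": k, "children": [k_selecting_tree(c, max_k, min_size, min_score)
                                 for c in children if c]}
```

On two well-separated groups of four 2-D points, the root is split with k = 2 and each group becomes a leaf, because no split of a tight group reaches the silhouette cut-off. Letting the validation index pick k at every node, rather than fixing it globally, is what allows the tree to show patterns emerging at different levels.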
|
566 |
Rare Sidechain Conformations in Proteins and DNA / Hintze, Bradley Joel, January 2015
<p>Medical advances often come as a result of understanding the underlying mechanisms of life. Life, in this sense, happens at various scales, and a very complex and interesting one is the molecular scale. Understanding life’s mechanistic details at this level will provide the most promising therapies for modern ailments. Because structure and function are closely related, knowledge of macromolecular structure provides invaluable insight into molecular mechanism.</p><p>A major tool for obtaining structural information at the molecular scale is X-ray crystallography. Such experiments produce an electron density map from which a model is built. Building such a model is difficult, especially at low resolution, where detailed features in the electron density deteriorate and become hard to interpret. Many advances in the field have greatly eased the model-building task; at high resolution it has even become automated. Nevertheless, human inspection is still required to obtain a correct solution.</p><p>The largest boon to model building has been the application of structural knowledge. Prominent examples are bond and dihedral angles. We often know what is absolutely not allowed, and often convince ourselves that we know everything that is allowed. This work focuses on the fuzzy border between allowed and disallowed. The hypothesis is that rare structural conformations exist, but that great care must be taken in modeling them.</p><p>This work has two major components: rotamers (protein sidechain conformations) and Hoogsteen base pairing in DNA.</p><p>I first describe methods used to gain empirical knowledge about rotamers and how that knowledge is used in model validation. Part of this knowledge concerns rotamer-dependent bond angle deviations. I describe how the observation and quantitation of these deviations is used in a novel set of restraints for protein structure refinement. 
To provide structural context to rare rotamers, I describe where and why some occur.</p><p>My DNA work has focused on Hoogsteen base pairing. I describe a collaborative survey of existing Hoogsteen base pairs in the PDB. Lessons learned during the survey led to the other DNA topic, the detection and correction of mismodeled purines. I identified Hoogsteens in the PDB mismodeled as Watson-Crick base pairs. This work underscores that Hoogsteens are extremely rare but nonetheless do occur.</p><p>The fuzzy borderland between allowed and disallowed is a strange place filled with the most interesting structural features. My work here has focused on this area, bringing into view many rare conformations. Going forward we need to ensure that conformational frequency is taken into account during model building, refinement, and validation.</p> / Dissertation
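Rotamers are described by sidechain chi dihedral angles; for most residues, chi1 is measured over the N, CA, CB and CG atoms. The sketch below computes a dihedral angle from four coordinates and assigns it to one of three coarse 120° bins labelled p, t and m (near +60°, 180° and -60°). The function names and fixed bin boundaries are simplifying assumptions for illustration; the empirical rotamer knowledge described above relies on observed chi-angle distributions rather than fixed cut-offs.

```python
import math


def _sub(a, b):
    return [x - y for x, y in zip(a, b)]


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


def dihedral(p0, p1, p2, p3):
    """Dihedral angle in degrees, in [-180, 180], defined by four atom positions."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    norm = math.sqrt(_dot(b1, b1))
    b1 = [c / norm for c in b1]
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, [_dot(b0, b1) * c for c in b1])
    w = _sub(b2, [_dot(b2, b1) * c for c in b1])
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))


def chi_bin(angle):
    """Coarse staggered-well label for a chi angle (hypothetical fixed bins)."""
    a = angle % 360  # map to [0, 360)
    if a < 120:
        return "p"   # near +60
    if a < 240:
        return "t"   # near 180
    return "m"       # near 300, i.e. -60
```

For instance, a chi1 of -65° falls in the `m` bin. A conformation that lands in a sparsely populated region of the empirical distribution is exactly the kind of rare case the work argues must be modeled with care rather than rejected outright.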
|
567 |
Clustering biological data using a hybrid approach: Composition of clusterings from different features / Keller, Jens, January 2008
Clustering of data is a well-researched topic in computer science, and many approaches have been designed for different tasks. In biology, many of these approaches are hierarchical, and the result is usually represented as a dendrogram, e.g. a phylogenetic tree. However, many non-hierarchical clustering algorithms are also well established in biology, and the approach in this thesis builds on such common algorithms. The algorithm implemented as part of this thesis uses a non-hierarchical graph clustering algorithm to compute a hierarchical clustering in a top-down fashion. It performs the graph clustering iteratively, using a previously computed cluster as the input set. The innovation is that each step focuses on a different feature of the data and clusters the data according to that feature. Common hierarchical approaches in biology cluster, for example, a set of genes according to the similarity of their sequences, so that the clustering reflects a partitioning of the genes by sequence similarity. The approach introduced in this thesis instead uses many features of the same objects; in biology these could be, for instance, similarities of sequences, of gene expression, or of motif occurrences in the promoter region.
As part of this thesis, not only was the algorithm itself implemented and evaluated, but also a complete software package with a graphical user interface. The software was implemented as a framework providing the basic functionality, with the algorithm as a plug-in extending the framework. The software is meant to be extended in the future, integrating a set of algorithms and analysis tools for clustering and analysing data, not necessarily biological.
The thesis deals with topics in biology, data mining and software engineering and is divided into six chapters. The first chapter introduces the task and the biological background. It gives an overview of common clustering approaches and explains the differences between them. Chapter two presents the idea behind the new clustering approach and points out differences and similarities between it and common clustering approaches. The third chapter discusses the software aspects, including the algorithm; it illustrates the architecture and analyses the clustering algorithm. The evaluation of the software is described in the fourth chapter, which points out observations made when using the new algorithm and discusses differences and similarities to related clustering algorithms and software. The thesis ends with two chapters on conclusions and suggestions for future work. Readers interested in repeating the experiments made as part of this thesis can contact the author by e-mail to obtain the relevant evaluation data, scripts or source code.
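The core idea, re-clustering each previously found cluster with a different feature at every level of a top-down hierarchy, can be sketched briefly. The abstract does not name the graph clustering algorithm, so this sketch substitutes connected components of a thresholded similarity graph; the feature functions, the `threshold=0.5` default and the gene names are hypothetical.

```python
def components(items, similar):
    """Connected components of the graph whose edges join similar items."""
    remaining = list(items)
    clusters = []
    while remaining:
        stack = [remaining.pop(0)]
        component = []
        while stack:
            node = stack.pop()
            component.append(node)
            neighbours = [x for x in remaining if similar(node, x)]
            for x in neighbours:
                remaining.remove(x)
            stack.extend(neighbours)
        clusters.append(sorted(component))
    return clusters


def feature_hierarchy(items, features, threshold=0.5):
    """Top-down hierarchy: level d re-clusters each parent cluster using features[d].

    features -- list of similarity functions f(a, b) -> float, one per level
    Returns nested lists; a leaf is a sorted list of items.
    """
    if not features or len(items) < 2:
        return sorted(items)
    similarity = features[0]
    children = components(items, lambda a, b: similarity(a, b) >= threshold)
    return [feature_hierarchy(c, features[1:], threshold) for c in children]
```

For example, with sequence similarity as the first feature and expression similarity as the second, genes that share a sequence family are first grouped together and then split by whether they are also co-expressed. Swapping in a different graph clustering method only changes `components`, which mirrors the plug-in design described above.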
|
568 |
Loop Prediction and Homology Modeling with High Resolution / Xu, Tianchuan, January 2020
The three-dimensional (3D) structure of a protein is essential guidance for structure-based drug discovery. Robust homology modeling with atomic-level accuracy requires reliable loop predictions. Here, a novel hierarchical protocol of the Protein Local Optimization Program (PLOP) is designed to produce sub-2-angstrom predictions of loop regions in homology modeling. Dramatic improvements in both speed and accuracy have been achieved by implementing specially designed clustering and an adaptive loop closure algorithm. Four prediction rounds are designed for homology modeling as the high-level protocol of PLOP, which allows later rounds to use the educated guesses of backbone atom positions and the hydrogen bonding information inherited from previous rounds, contributing additional prediction accuracy. The success of PLOP has been demonstrated on four different data sets, concentrating mainly on homology modeling of antibody H3 loops. A GPU-accelerated sampling algorithm and deep learning models are also implemented, and they produce promising predictions that can serve as input templates for PLOP in the context of homology modeling.
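Loop prediction accuracy such as the sub-2-angstrom target above is commonly reported as a root-mean-square deviation (RMSD) between predicted and reference atom positions. A minimal sketch, assuming both coordinate lists are already expressed in the same frame and list matching atoms in the same order; any superposition step, and the exact atom set used in the thesis, are not shown, and the helper names are hypothetical.

```python
import math


def rmsd(predicted, reference):
    """Root-mean-square deviation, in the coordinates' units (e.g. angstroms)."""
    if len(predicted) != len(reference) or not predicted:
        raise ValueError("need two equally long, non-empty coordinate lists")
    squared = sum(math.dist(p, r) ** 2 for p, r in zip(predicted, reference))
    return math.sqrt(squared / len(predicted))


def is_sub_2_angstrom(predicted, reference):
    """True if the prediction meets the sub-2-angstrom loop accuracy target."""
    return rmsd(predicted, reference) < 2.0
```

Because RMSD averages squared deviations, a single badly placed atom in a short loop can push an otherwise good prediction over the 2-angstrom line, which is why the same metric is usually reported per loop rather than pooled across a data set.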
|
569 |
Contribution of Retrotransposons to Breast Cancer Malignancy / Raplee, Isaac D., 25 April 2019
The components contributing to cancer progression, especially the transition from early to invasive disease, are unknown. Consequently, the biological reasons why some patients diagnosed with atypia and ductal carcinoma in situ (DCIS) never progress to invasive breast cancer are unclear. The “one gene at a time” approach does not sufficiently predict progression. To elucidate early-stage progression to invasive ductal cancer, the expression signatures of transcripts and transposable elements were profiled in micropunched samples of formalin-fixed, paraffin-embedded (FFPE) tissue. Because most patient samples prepared for RNA-seq analysis are acquired from archived FFPE tissue collections, which have low RNA quality, a bioinformatics pipeline for analyzing poor-quality, short reads (>36 nts) from RNA-seq data was created to compare the most common tools for alignment and differential expression. The pipeline analytics revealed that the STAR alignment software outperformed the others. Furthermore, the comparison revealed that DESeq2 and edgeR (with the estimateDisp function applied) both perform well when analyzing more than 12 replicates. Transcriptome analysis revealed progressive diversification into known oncogenic pathways and a few novel biochemical pathways, in addition to antiviral and interferon activation. Furthermore, the transposable element (TE) signature during early stages of breast cancer progression indicated long terminal repeats (LTRs) as the most abundant class among differentially expressed TEs. LTRs belong to endogenous retroviruses (ERVs), a subclass of TEs. The retroviral and innate immune response activity observed in DCIS indirectly corroborates the increase in ERV expression at this pre-malignant stage. Finally, to demonstrate the potential role of TEs in the transition from pre-malignant to malignant breast cancer, we used pharmacological approaches to alter global TE expression and inhibit retrotransposition activity in control and breast cancer cell lines.
It was expected that dysregulation of TEs would be associated with increased invasiveness and growth. However, our results indicated that the DNA methyltransferase inhibitor 5-azacytidine (AZA) consistently retarded cell migration and growth. While unexpected, these findings corroborate recent studies suggesting that AZA may induce an interferon response in cancer via increased ERV expression. This body of work illustrates the importance of understanding the bioinformatics methods used in RNA-seq analysis of common clinical samples. These studies suggest that TEs have potential as biomarkers of disease progression and as a novel therapeutic approach to investigate in additional model systems.
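What a differential-expression comparison ultimately reports for a transcript or TE family is a fold change between conditions. The sketch below shows only the simplest version of that idea: library-size normalisation to counts per million (CPM) followed by a log2 ratio with a pseudocount. It is deliberately not DESeq2 or edgeR, which fit negative-binomial models with their own normalisation (median-of-ratios and TMM, respectively) and test for significance; the feature names in the example are hypothetical.

```python
import math


def cpm(counts):
    """Counts per million: scale raw read counts by total library size."""
    total = sum(counts.values())
    return {feature: n * 1_000_000 / total for feature, n in counts.items()}


def log2_fold_change(case_counts, control_counts, pseudocount=1.0):
    """Per-feature log2 ratio of CPM values; the pseudocount avoids log(0)."""
    case, control = cpm(case_counts), cpm(control_counts)
    return {
        f: math.log2((case[f] + pseudocount) / (control[f] + pseudocount))
        for f in case.keys() & control.keys()
    }
```

If an LTR family accounts for 250,000 of 1,000,000 reads in a DCIS library but 125,000 of 1,000,000 in a control, its log2 fold change is 1 (a doubling). With low-quality FFPE libraries and few replicates, such raw ratios are noisy, which is why the choice of dispersion-aware tools matters.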
|
570 |
Tissue-dependent analysis of common and rare genetic variants for Alzheimer's disease using multi-omics data / Patel, Devanshi, 21 January 2021
Alzheimer’s disease (AD) is a complex neurodegenerative disease characterized by progressive memory loss and caused by a combination of genetic, environmental, and lifestyle factors. AD susceptibility is highly heritable (58–79%), but only about one third of the AD genetic component is accounted for by common variants discovered through genome-wide association studies (GWAS). Rare variants may contribute to some of the unexplained heritability of AD and have been shown to contribute to large gene expression changes across tissues, but conventional analytical approaches are challenged by low statistical power, even for large sample sizes. Recent expression quantitative trait locus (eQTL) studies have shown that changes in gene expression could play a key role in the pathogenesis of AD. However, regulation of gene expression is context-specific (e.g., tissue and cell type), motivating a context-dependent approach to achieve more precise and statistically significant associations. To address these issues, I applied a strategy to identify new AD risk or protective rare variants by examining mutations occurring only in cases or only in controls, observing that different mutations in the same gene, or variable doses of a mutation, may result in distinct dementias. I also evaluated the impact of rare variation on expression at the gene and gene-pathway levels in blood and brain tissue, which strengthened the rare variant findings with functional evidence and revealed a large immune and inflammatory component to AD. Lastly, I identified cell-type-specific eQTLs in blood and brain tissue to explain the underlying genetic associations of common variants in AD, and discovered additional evidence for the role of myeloid cells in AD risk, as well as potential novel blood and brain AD biomarkers. 
Collectively, these findings further explain the genetic basis of AD risk and provide insight about mechanisms leading to this disorder. / 2022-01-21T00:00:00Z
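The case-only/control-only strategy described above can be illustrated with a simple set-based filter: a variant carried only by cases is a candidate risk variant, and one carried only by controls is a candidate protective variant. This is a minimal sketch under assumptions; the data layout, variant and sample IDs are hypothetical, and a real analysis would also apply genotype-quality, frequency and relatedness filters that are not shown.

```python
def exclusive_variants(carriers, cases, controls):
    """Split variants into those carried only by cases or only by controls.

    carriers -- dict mapping variant ID -> set of sample IDs carrying it
    cases, controls -- sets of sample IDs
    Returns (case_only, control_only) as sorted lists of variant IDs.
    """
    case_only, control_only = [], []
    for variant, samples in carriers.items():
        in_cases = bool(samples & cases)
        in_controls = bool(samples & controls)
        if in_cases and not in_controls:
            case_only.append(variant)      # candidate risk variant
        elif in_controls and not in_cases:
            control_only.append(variant)   # candidate protective variant
    return sorted(case_only), sorted(control_only)
```

Variants seen in both groups are dropped by this filter; with rare variants, each exclusive variant is typically observed in very few carriers, which is why the abstract pairs this strategy with expression evidence rather than relying on association statistics alone.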
|