Global ETD Search

1	Towards improving the accuracy of GenTHREADER alignments Tress, Michael January 2002 (has links) No description available. 547 Remotely related protein sequences
2	Construction of Distributed Method for Analyzing a Large Number of Sequence Data: Using Influenza A Virus Protein Sequences as Examples Tu, Guo-Hua 01 November 2010 (has links) Abstract Analyzing the eight genomic protein segments of influenza A virus could provide a better understanding of this virus. With advances in computer technology, numerous influenza A virus protein sequences have become available in various internet data banks. However, analyzing a large number of protein sequences is cumbersome, so new algorithmic tools are needed. This study used a distributed method to develop protein sequence clustering analysis software in the Java programming language. The software splits a large number of protein sequences downloaded from NCBI into several files. Because these individual files are processed at the same time, the time needed for comparison and analysis is reduced. Finally, we used the PRIMER 5 program to analyze these individual files and produce MDS and UPGMA similarity analysis charts. The charts indicated high homology among the genomic protein segments of influenza A virus from 1997 to 2006. The analysis also showed that the genomic protein segments of influenza A virus are similar across Asian countries. However, the similarity between Asian countries and China is not significant. Analysis by host showed that the genomic protein segments of influenza A virus are highly similar in species such as birds, chickens, ducks and pigs. Therefore, our data strongly support the possibility that influenza A viruses can cross species to infect humans. Influenza A Virus Protein Sequences Distributed
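The split-then-process-concurrently idea in the abstract above can be sketched as follows. This is only an illustration under stated assumptions: the thesis software was written in Java and its actual design is not described here, so the function names (`parse_fasta`, `split_into_chunks`, `compare_all`), the round-robin split, and the naive position-by-position identity measure are all hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records, header, parts = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(parts)))
            header, parts = line[1:], []
        else:
            parts.append(line)
    if header is not None:
        records.append((header, "".join(parts)))
    return records


def split_into_chunks(records, n_chunks):
    """Distribute records round-robin into n_chunks lists (hypothetical split rule)."""
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return chunks


def identity(a, b):
    """Fraction of matching positions over the shorter sequence (no alignment)."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    return sum(x == y for x, y in zip(a, b)) / n


def compare_chunk(chunk, reference):
    """Score every sequence in one chunk against the reference sequence."""
    return [(header, identity(seq, reference)) for header, seq in chunk]


def compare_all(records, reference, n_chunks=4):
    """Split the records, score the chunks concurrently, and merge the results."""
    chunks = split_into_chunks(records, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        results = pool.map(compare_chunk, chunks, [reference] * n_chunks)
    return [item for part in results for item in part]
```

Threads are used here only to keep the sketch short; for CPU-bound comparison on real data, separate processes or machines would be the more likely choice.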
3	Assessing the Role of Clusters Derived from Large Sequence Similarity Networks for Gene Function Predictions Vora, Parth Harish 29 May 2020 (has links) Large scale genomic sequencing efforts have resulted in a massive inflow of raw sequence data. This raw data, when appropriately processed and analyzed, can provide insight to a trained biologist and aid in hypothesis-driven research. Given the time and resource requirements necessary for biological experiments, computational predictions of gene functions can aid in reducing a large list of candidate genes to a few promising targets. Various computational solutions have been proposed and developed for gene function prediction. These solutions utilize various forms of data, such as DNA/RNA/protein sequences, protein structures, interaction networks, literature mining, and a combination of these data sources. However, these methods do not always produce precise results as the underlying data sets used for training or modeling are quite sparse. We developed and used a massive sequence similarity network built over 108 million known protein sequences to aid in protein function prediction. Predictions are made through the alignment of query sequences to representative sequences for a given cluster derived from the massive sequence similarity network. Derived clusters aggregate information (particularly that from the Gene Ontology) from respective members, which we then consolidate through a novel weighted path method. We evaluate our method on four holdout datasets using CAFA evaluation metrics. Our results suggest that clustering significantly reduces the time and memory requirements, with a marginal impact on predictive power. At lower sequence similarity thresholds, our method outperforms other gold standard methods. / Master of Science / We often think of a protein as a nutritional requirement. However, proteins are far more than just food; they play countless and unappreciated roles in facilitating life.
They transport nutrients in the body, synthesize hormones, function as enzymes that expedite chemical reactions, serve as the scaffold for cells and tissues, and protect the body against foreign pathogens. On a molecular level, each protein is made up of chains of 20 different amino acids, just like a chain of beads, that are then folded to create a 3-dimensional structure. The variations in the ordering of amino acids result in different types of proteins. There are millions of genes across known life, and they perform different functions when translated into proteins. Nature has given us many proteins with interesting properties, and the low cost of sequencing their precursors (DNA) has resulted in large amounts of sequence data that is not yet associated with a function. Biological experiments to determine the function of a protein can be time consuming and expensive. We built a massive network encompassing 108 million protein sequences based on sequence similarity. This ensures that we make use of as much data as possible to make better predictions. Specifically, our work focuses on utilizing this information about similar proteins to aid in predicting the functions of a protein given its sequence. It is based on the idea of guilt by association, such that if two proteins are similar in sequence, they perform similar functions. We show that using computationally efficient methods and large datasets, one can achieve fast and highly precise predictions. bioinformatics computational biology protein sequences
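The guilt-by-association idea in the abstract above can be illustrated with a small sketch. It is not the thesis's weighted path method, which the abstract does not specify; it assumes a much simpler rule in which each Gene Ontology term is scored by the fraction of cluster members that carry it, scaled by the query's similarity to the cluster representative. The function names, the example accessions, and the 0.5 cutoff are hypothetical.

```python
from collections import Counter


def cluster_term_scores(member_annotations):
    """Score each GO term by the fraction of cluster members annotated with it."""
    n = len(member_annotations)
    if n == 0:
        return {}
    counts = Counter()
    for terms in member_annotations.values():
        counts.update(set(terms))  # count each term at most once per member
    return {term: count / n for term, count in counts.items()}


def predict_terms(query_similarity, member_annotations, min_score=0.5):
    """Transfer cluster terms to a query, weighted by its similarity (0..1)
    to the cluster representative; keep terms scoring at least min_score."""
    scores = cluster_term_scores(member_annotations)
    weighted = {term: s * query_similarity for term, s in scores.items()}
    kept = [(term, s) for term, s in weighted.items() if s >= min_score]
    return sorted(kept, key=lambda item: (-item[1], item[0]))


# Illustrative cluster: member IDs are made up; the GO IDs are real terms
# (catalytic activity, protein binding, membrane) used only as examples.
cluster = {
    "P1": ["GO:0003824", "GO:0005515"],
    "P2": ["GO:0003824"],
    "P3": ["GO:0003824", "GO:0016020"],
}
predictions = predict_terms(0.8, cluster)
```

Here only the term shared by all three members survives the cutoff, which shows how aggregation over a cluster suppresses annotations carried by a minority of members.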
4	Computer analysis of molecular sequences Parsons, Jeremy David January 1993 (has links) No description available. 572.8 DNA sequence data; Protein sequences
5	Studies on L-Lactate dehydrogenase genes and protein structure of Iguana iguana and molecular phylogenetics among reptiles Hsu, Che-Hsiung 05 July 2002 (has links) L-Lactate dehydrogenases (LDH) are "housekeeping" enzymes. The LDH isozymes are known to be very stable and slow-evolving, which makes them a suitable model for elucidating the phylogenetic relationships among various species. The cDNA sequences of LDH-A4 (muscle) and LDH-B4 (heart) from the green iguana (Iguana iguana) were determined. Isozyme electrophoresis demonstrated that there are two isoforms of the Iguana iguana LDH-B isozyme, called LDH-B and LDH-b. The protein structures of the LDH-B and LDH-b monomers were constructed; the two proteins have the same structure except for 5 differing amino acids. The phylogenetic relationships between the green iguana and other vertebrates whose LDH cDNA sequences were published previously were analyzed with the phylogenetic tree construction programs Mega2 and Phylip. The results for lizards indicate that Iguana iguana is closer to Sceloporus woodi than to Sceloporus undulatus. The divergence times between Iguana iguana and Sceloporus woodi and between Iguana iguana and Sceloporus undulatus were estimated at about 98 and 188 million years, respectively. The mitochondrial DNA sequences (12S, 16S and ND1) of these three lizards were also analyzed, and the results were consistent with traditional phylogenetics, in which species in the same genus are closer. The unexpected finding from the analysis of vertebrate LDH isozymes, that Iguana iguana and Sceloporus woodi, which belong to different genera, are closer than Sceloporus woodi and Sceloporus undulatus within the same genus, remains to be confirmed. protein sequences evolution iguana reptile LDH
6	Bioinformatics and Handwriting/Speech Recognition: Unconventional Applications of Similarity Search Tools Jensen, Kyle, Stephanopoulos, Gregory 01 1900 (has links) This work introduces two unconventional applications for sequence alignment algorithms outside the domain of bioinformatics: handwriting recognition and speech recognition. In each application we treat data samples, such as the path of a handwritten pen stroke, as a protein sequence and use the FastA sequence alignment tool to classify unknown data samples, such as a written character. That is, we handle the handwriting and speech recognition problems like the protein annotation problem: given a sequence of unknown function, we annotate the sequence via sequence alignment. This approach achieves classification rates of 99.65% and 93.84% for handwriting and speech recognition, respectively. In addition, we provide a framework for applying sequence alignment to a variety of other non-traditional problems. / Singapore-MIT Alliance (SMA) Machine learning bioinformatics amino acids protein sequences sequence alignment FastA voice dynamic programming handwriting
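The approach in the abstract above, encoding a data sample as a letter sequence and labelling it by alignment against known examples, can be sketched as follows. This is a toy illustration under explicit assumptions: it does not call FastA, it substitutes a plain Needleman-Wunsch global alignment score, and the eight-letter direction alphabet, the scoring values, and all function names are hypothetical choices, not the authors' encoding.

```python
import math

# Eight letters, one per 45-degree direction bin (an arbitrary, assumed mapping).
ALPHABET = "ACDEFGHI"


def encode_stroke(points):
    """Map each consecutive pen movement to a letter by its direction of travel."""
    letters = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        angle = math.atan2(y1 - y0, x1 - x0) % (2 * math.pi)
        bin_index = int(round(angle / (math.pi / 4))) % 8
        letters.append(ALPHABET[bin_index])
    return "".join(letters)


def alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment score, computed row by row."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def classify(query, labelled):
    """Return the label of the labelled sequence that aligns best with the query.

    `labelled` is a list of (sequence, label) pairs.
    """
    best = max(labelled, key=lambda item: alignment_score(query, item[0]))
    return best[1]


# Two reference strokes: a left-to-right line and a bottom-to-top line.
references = [
    (encode_stroke([(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]), "horizontal"),
    (encode_stroke([(0, 0), (0, 1), (0, 2), (0, 3), (0, 4)]), "vertical"),
]
# A mostly horizontal stroke that turns upward at the end.
unknown = encode_stroke([(0, 0), (1, 0), (2, 0), (3, 0), (3, 1)])
label = classify(unknown, references)
```

A real system would use a substitution matrix that scores neighbouring directions as near-matches, much as protein alignment tools score chemically similar amino acids.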
7	Sequence Analysis And Design Of Immunogens From The Stem Domain Of Influenza Hemagglutinin Bommakanti, Gayathri 07 1900 (has links) (PDF) Influenza is an important respiratory pathogen that infects several million people each year. Currently available flu vaccines have to be updated regularly in order to be effective as the virus changes its composition by antigenic drift and shift. Most of the antibody response generated by these vaccines is strain specific as it is directed against the head domain (HA1) of HA. The HA2 subunit of hemagglutinin is highly conserved and immunogens designed from this subunit are likely to provide protection against multiple strains of the virus. However, expression of HA2 alone in the absence of HA1 resulted in a protein that took up the low pH conformation of HA. Our goal was to design immunogens from HA2 that would fold into the neutral pH form. Sequence analysis of a large number of HA protein sequences was carried out to identify conserved and exposed regions on HA. Several peptide and protein constructs were designed from the stem region of HA. These proteins were expressed in bacteria and purified proteins were used to immunize mice. Immunized mice were challenged with a lethal dose of virus to test for efficacy of the immunogen. Using this approach, stem domain constructs of HA were successfully designed and shown to take up the neutral pH form. These immunogens were also shown to be capable of providing broad range protection. Residues involved in the low pH induced conformational change of HA were identified from studies on HA2 derived peptides. Influenza Vaccine Immunity Influenza Hemagglutinin Respiratory Infection Stem Cell Research Immunogens Hemagglutinin Protein Sequences Immunology
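The step of identifying conserved regions across many HA sequences, mentioned in the abstract above, can be illustrated with a per-column conservation measure. The abstract does not say which measure the author used; this sketch assumes Shannon entropy over the columns of an already aligned set of equal-length sequences, and the function names and the toy sequences are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (in bits) of the residues in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def conserved_positions(aligned, max_entropy=0.0):
    """Return the 0-based column indices whose entropy is at most max_entropy.

    `aligned` must be a list of equal-length, pre-aligned sequences. Gap
    characters, if present, are treated like any other symbol.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    return [
        i
        for i in range(length)
        if column_entropy([seq[i] for seq in aligned]) <= max_entropy
    ]


# Toy alignment (made-up sequences, not real HA data).
toy_alignment = ["GLFGA", "GLFGA", "GIFGA", "GLYGA"]
fully_conserved = conserved_positions(toy_alignment)
```

With the default threshold of zero, only columns where every sequence has the same residue are reported; raising `max_entropy` admits columns with minor variation. Surface exposure, the abstract's second criterion, needs structural data and is outside this sketch.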
8	Integrating protein annotations for the in silico prioritization of putative drug target proteins in malaria Mpangase, Phelelani Thokozani 15 May 2013 (has links) Current anti-malarial methods have been effective in reducing the number of malarial cases. However, these methods do not completely block the transmission of the parasite. Research has shown that repeated use of the current anti-malarial drugs, which include artemisinin-based drug combinations, might be toxic to humans. There have also been reports of an emergence of artemisinin-resistant parasites. Finding anti-malarial drugs through the drug discovery process takes a long time and failure results in a great financial loss. The failure of drug discovery projects can be partly attributed to the improper selection of drug targets. There is thus a need for an effective way of identifying and validating new potential malaria drug targets for entry into the drug discovery process. The availability of the genome sequences for the Plasmodium parasite, human host and the Anopheles mosquito vector has facilitated post-genomic studies on malaria. Proper utilization of this data, in combination with computational biology and bioinformatics techniques, could aid in the in silico prioritization of drug targets. This study was aimed at extensively annotating the protein sequences from the Plasmodium parasites, H. sapiens and A. gambiae with data from different online databases in order to create a resource for the prioritization of drug targets in malaria. Essentiality, assay feasibility, resistance, toxicity, structural information and druggability were the main target selection criteria which were used to collect data for protein annotations. The data was used to populate the Discovery resource (http://malport.bi.up.ac.za/) for the in silico prioritization of potential drug targets. A new version of the Discovery system, Discovery 2.0 (http://discovery.bi.up.ac.za/), has been developed using Java.
The system contains new and automatically updated data as well as improved functionalities. The new data in Discovery 2.0 includes UniProt accessions, gene ontology annotations from the UniProt-GOA project, pathways from Reactome and Malaria Parasite Metabolic Pathways databases, protein-protein interactions data from IntAct as well as druggability data from the DrugEBIlity resource hosted by ChEMBL. Users can access the data by searching with a protein identifier, UniProt accession, protein name or through the advanced search which lets users filter protein sequences based on different protein properties. The results are organized in a tabbed environment, with each tab displaying different protein annotation data. A sample investigation using a previously proposed malarial target, S-adenosyl-L-homocysteine hydrolase, was carried out to demonstrate the different categories of data available in Discovery 2.0 as well as to test if the available data is sufficient for assessment and prioritization of drug targets. The study showed that using the annotation data in Discovery 2.0, a protein can be assessed, in a species comparative manner, on the potential of being a drug target based on the selection criteria mentioned here. However, supporting data from literature is also needed to further validate the findings. / Dissertation (MSc)--University of Pretoria, 2012. / Biochemistry / unrestricted Anti-malarial drugs Plasmodium parasites H. sapiens A. gambiae Protein sequences UCTD
9	MoRFs A Dataset of Molecular Recognition Features Mohan, Amrita 26 July 2006 (has links) Submitted to the faculty of the Bioinformatics Graduate Program in partial fulfillment of the requirements for the degree Master of Science in the School of Informatics, Indiana University December 2005 / The last decade has witnessed numerous proteomic studies which have predicted and successfully confirmed the existence of extended structurally flexible regions in protein molecules. Parallel to these advancements, the last five years of structural bioinformatics has also experienced an explosion of results on molecular recognition and its importance in protein-protein interactions. This work provides an extension to past and ongoing research efforts by looking specifically at the “flexibility and disorder” found in protein sequences involved in molecular recognition processes and known as Molecular Recognition Elements or Molecular Recognition Features (MoREs or MoRFs, as we call them). MoRFs are relatively short (10–70 residues in length), loosely structured protein regions within longer sequences that are largely disordered in nature. Interestingly, upon binding to other proteins, these MoRFs are able to undergo disorder-to-order transition. Thus, in our interpretation, MoRFs could serve as potential binding sites, and this binding to another protein lends a functional advantage to the whole protein complex by enabling interaction with their physiological partner. There are at least three basic types of MoRFs: those that form α-helical structures upon binding, those that form β-strands (in which the peptide forms a β-sheet with additional β-strands provided by the protein partner), and those that form irregular structures when bound.
Our proposed names for these structures are α-MoRF (also known as α-MoRE, alpha helical molecular recognition feature/element), β-MoRF (beta sheet molecular recognition feature/element), and I-MoRF (Irregular molecular recognition feature/element), respectively. The results presented in this work suggest that functionally significant residual structure can exist in MoRF regions prior to the actual binding event. We also demonstrate profound conformational preferences within MoRF regions for α-helices. We believe that the results from this study will improve our understanding of protein-protein interactions, especially those related to molecular recognition, and may pave the way for future work on the development of protein binding site predictions. We hope that the conclusions of this work demonstrate that, within only a few years of its conception, intrinsic protein disorder has gained wide-scale importance in the field of protein-protein interactions and can be strongly associated with molecular recognition. protein sequences Molecular Recognition Features Molecular Recognition Elements protein-protein interactions intrinsic protein disorder bioinformatics
10	STUDY OF THE RELATIONSHIP BETWEEN Mus musculus PROTEIN SEQUENCES AND THEIR BIOLOGICAL FUNCTIONS Seth, Pawan 08 August 2007 (has links) No description available. sequence pairs PROTEIN PROTEIN SEQUENCES print EV No count ontology SEQUENCES
