Spelling suggestions: "subject:"biology -- data processing -- 3research"" "subject:"biology -- data processing -- 1research""
1 |
Optimizing hydropathy scale to improve IDP prediction and characterizing IDPs' functionsHuang, Fei January 2014 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Intrinsically disordered proteins (IDPs) are flexible proteins without defined 3D structures. Studies show that IDPs are abundant in nature and actively involved in numerous biological processes. Two crucial subjects in the study of IDPs lie in analyzing IDPs’ functions and identifying them. We thus carried out three projects to better understand IDPs. In the 1st project, we propose a method that separates IDPs into different function groups. We used the approach of CH-CDF plot, which is based the combined use of two predictors and subclassifies proteins into 4 groups: structured, mixed, disordered, and rare. Studies show different structural biases for each group. The mixed class has more order-promoting residues and more ordered regions than the disordered class. In addition, the disordered class is highly active in mitosis-related processes among others. Meanwhile, the mixed class is highly associated with signaling pathways, where having both ordered and disordered regions could possibly be important. The 2nd project is about identifying if an unknown protein is entirely disordered. One of the earliest predictors for this purpose, the charge-hydropathy plot (C-H plot), exploited the charge and hydropathy features of the protein. Not only is this algorithm simple yet powerful, its input parameters, charge and hydropathy, are informative and readily interpretable. We found that using different hydropathy scales significantly affects the prediction accuracy. Therefore, we sought to identify a new hydropathy scale that optimizes the prediction. This new scale achieves an accuracy of 91%, a significant improvement over the original 79%. In our 3rd project, we developed a per-residue C-H IDP predictor, in which three hydropathy scales are optimized individually. This is to account for the amino acid composition differences in three regions of a protein sequence (N, C terminus and internal). We then combined them into a single per-residue predictor that achieves an accuracy of 74% for per-residue predictions for proteins containing long IDP regions.
|
2 |
Context specific text mining for annotating protein interactions with experimental evidencePandit, Yogesh 03 January 2014 (has links)
Indiana University-Purdue University Indianapolis (IUPUI) / Proteins are the building blocks in a biological system. They interact with other proteins to make unique biological phenomenon. Protein-protein interactions play a valuable role in understanding the molecular mechanisms occurring in any biological system. Protein interaction databases are a rich source on protein interaction related information. They gather large amounts of information from published literature to enrich their data. Expert curators put in most of these efforts manually. The amount of accessible and publicly available literature is growing very rapidly. Manual annotation is a time consuming process. And with the rate at which available information is growing, it cannot be dealt with only manual curation. There need to be tools to process this huge amounts of data to bring out valuable gist than can help curators proceed faster. In case of extracting protein-protein interaction evidences from literature, just a mere mention of a certain protein by look-up approaches cannot help validate the interaction. Supporting protein interaction information with experimental evidence can help this cause. In this study, we are applying machine learning based classification techniques to classify and given protein interaction related document into an interaction detection method. We use biological attributes and experimental factors, different combination of which define any particular interaction detection method. Then using predicted detection methods, proteins identified using named entity recognition techniques and decomposing the parts-of-speech composition we search for sentences with experimental evidence for a protein-protein interaction. We report an accuracy of 75.1% with a F-score of 47.6% on a dataset containing 2035 training documents and 300 test documents.
|
Page generated in 0.0837 seconds