基於資料科學方法之巨量蛋白質功能預測 / Applying Data Science to High-throughput Protein Function Prediction

Biological data have grown explosively since the completion of the Human Genome Project and the arrival of next-generation sequencing, and protein sequences are among the gene products being discovered in large numbers. Annotating protein function through wet-lab experiments is extremely time-consuming, so many proteins have known sequences but unknown functions. Predicting likely functions computationally before the experiments helps biologists formulate hypotheses and prioritize experiments, which speeds up protein function annotation. Gene Ontology (GO) is a widely used classification for describing the functions and properties of gene products. It is divided into three branches, Biological Process, Cellular Component, and Molecular Function, each of which is a hierarchical tree composed of labels known as GO terms. Protein function prediction means predicting a protein's GO terms from its sequence, so it can be treated as a multi-label classification problem. We propose a machine learning prediction framework based on sequence homology that can also incorporate protein family information, and we design several voting methods to resolve the multi-label prediction problem.
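
The abstract does not spell out the voting mechanisms, so the following is only a minimal sketch of one plausible scheme, similarity-weighted voting over homologous hits, and not the thesis's actual method. The function name `vote_go_terms`, the `threshold` parameter, the similarity scores, and the protein IDs are all hypothetical; the GO identifiers are used purely as sample labels.

```python
from collections import defaultdict


def vote_go_terms(hits, annotations, threshold=0.5):
    """Score GO terms for a query protein from its homologous hits.

    hits:        list of (protein_id, similarity) pairs, similarity > 0
                 (assumed to come from some sequence-homology search).
    annotations: dict mapping protein_id -> set of GO term IDs.
    threshold:   minimum share of the total similarity weight a term
                 needs in order to be predicted (hypothetical parameter).

    Returns a dict {GO term: vote share} for the predicted terms.
    """
    # Only hits with known annotations are allowed to vote.
    total = sum(sim for pid, sim in hits if pid in annotations)
    if total == 0:
        return {}

    # Each annotated hit adds its similarity to every term it carries.
    votes = defaultdict(float)
    for pid, sim in hits:
        for term in annotations.get(pid, ()):
            votes[term] += sim

    # Multi-label output: keep every term whose share passes the threshold.
    return {t: v / total for t, v in votes.items() if v / total >= threshold}


# Illustrative data only: three hypothetical homologs of one query.
hits = [("P1", 0.9), ("P2", 0.6), ("P3", 0.5)]
annotations = {
    "P1": {"GO:0003677", "GO:0005634"},
    "P2": {"GO:0003677"},
    "P3": {"GO:0005737"},
}
predicted = vote_go_terms(hits, annotations)
print(predicted)
```

With these sample inputs, only `GO:0003677` collects at least half of the total similarity weight, so it is the only predicted label. Lowering `threshold` trades precision for recall, which is one way different voting variants could be compared.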

Identifier: oai:union.ndltd.org:CHENGCHI/G0104753013
Creators: 劉義瑋, Liu, Yi-Wei
Publisher: 國立政治大學 (National Chengchi University)
Source Sets: National Chengchi University Libraries
Language: English
Detected Language: English
Type: text
Rights: Copyright © nccu library on behalf of the copyright holders
