Large Scale Parallel Inference of Protein and Protein Domain families

Protein domains are recurring, independent segments of proteins. The combinatorial arrangement of domains is at the root of the functional and structural diversity of proteins. Several methods have been developed to infer protein domain decomposition and domain family clustering from sequence information alone. MkDom2 is one of those methods. MkDom2 infers domain families in a greedy fashion: families are inferred one after the other, producing both a delineation of domains on proteins and a clustering of those domains into families. MkDom2 is instrumental in building the ProDom database. The exponential growth of the number of sequences to process has rendered MkDom2 obsolete: it would now take several years to compute a new release of ProDom. We present a new algorithm, MPI_MkDom2, allowing computation of several families at once across a distributed computing platform. MPI_MkDom2 is an asynchronous distributed algorithm that manages load balancing to ensure efficient platform usage; it ensures the creation of a non-overlapping partitioning of the whole protein set. A new proximity measure is defined to assess the effect of the parallel computation on the result. We also propose a second algorithm, MPI_mkDom3, allowing the simultaneous computation of a clustering of protein domains as well as of full proteins sharing the same domain decomposition.

Identifier: oai:union.ndltd.org:CCSD/oai:tel.archives-ouvertes.fr:tel-00682495
Date: 28 September 2011
Creators: Rezvoy, Clément
Publisher: Ecole normale supérieure de lyon - ENS LYON
Source Sets: CCSD theses-EN-ligne, France
Language: English
Detected Language: English
Type: PhD thesis
