Global ETD Search

1	Modeling Evolutionary Constraints and Improving Multiple Sequence Alignments using Residue Couplings
Hossain, K.S.M. Tozammel, 16 November 2016, Ph. D.
Residue coupling in protein families has received much attention as an important indicator for predicting protein structures and revealing functional insight into proteins. Existing coupling methods identify largely pairwise couplings and express couplings over amino acid combinations, which does not yield a mechanistic explanation. Most of these methods take as input a multiple protein sequence alignment, most likely a resultant alignment that better exposes couplings and is obtained by manually tweaking an alignment constructed by a classical alignment algorithm. Classical alignment algorithms focus primarily on capturing conservation and may not fully reveal couplings in the alignment. In this dissertation, we propose methods for capturing both pairwise and higher-order couplings in protein families. Our methods provide mechanistic explanations for couplings using physicochemical properties of amino acids and discernibility between orders. We also investigate a method for mining frequent episodes, called coupled patterns, in an alignment produced by a classical algorithm for proteins, and for exploiting the coupled patterns to improve the quality of the alignment in terms of how well it exposes couplings. We demonstrate the effectiveness of our proposed methods on a large collection of sequence datasets for protein families.
Proteins are biomolecules made up of amino acids. A chain of amino acids (a.k.a. a protein sequence) forms the primary structure of a protein, and the folding of this chain gives rise to a more complex 3D structure, the natural state of a protein. It is through these structures that proteins perform their various activities. To preserve these activities, evolution allows only those changes in protein sequences that do not disrupt the overall structures and functions of proteins. Coupling is an evolutionary phenomenon that helps proteins preserve their structures and functions. Two or more amino acid positions are coupled if changes of amino acids at one position are compensated by changes at the other position(s). In this thesis, we propose a set of probabilistic methods for modeling such couplings between two or more positions. Our methods identify the most probable couplings in a set of protein sequences and express them with probabilistic graphical models (a powerful and interpretable framework), which can be used for answering questions related to protein structures, functions, and protein synthesis. Using this notion of coupling, we also develop a method for improving the quality of multiple protein sequence alignment, a widely used tool for protein sequence analysis. We evaluate our methods on a large collection of sequence datasets for protein families, and the results substantiate the efficacy of our methods.
Keywords: residue coupling; multiple sequence alignment; graphical models; pattern set mining
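As a rough illustration of what it means for two alignment positions to be coupled, the sketch below scores a pair of alignment columns by mutual information. This is a generic co-variation measure chosen here for illustration only; it is not the probabilistic graphical-model method of the dissertation. The toy alignment and the function names `column` and `mutual_information` are hypothetical.

```python
from collections import Counter
from math import log2


def column(alignment, j):
    """Return the residues found at position j of every aligned sequence."""
    return [seq[j] for seq in alignment]


def mutual_information(alignment, i, j):
    """Mutual information (in bits) between alignment columns i and j.

    Assumption: residue frequencies are plain empirical counts, with no
    pseudocounts, sequence weighting, or gap handling.
    """
    n = len(alignment)
    col_i, col_j = column(alignment, i), column(alignment, j)
    count_i = Counter(col_i)
    count_j = Counter(col_j)
    count_ij = Counter(zip(col_i, col_j))

    mi = 0.0
    for (a, b), count in count_ij.items():
        p_ab = count / n
        p_a = count_i[a] / n
        p_b = count_j[b] / n
        mi += p_ab * log2(p_ab / (p_a * p_b))
    return mi


# Hypothetical toy alignment: positions 0 and 2 change together
# (K pairs with E, R pairs with D); position 1 is fully conserved.
toy_alignment = ["KAE", "KAE", "RAD", "RAD"]

coupled = mutual_information(toy_alignment, 0, 2)    # co-varying pair
conserved = mutual_information(toy_alignment, 0, 1)  # conserved column
print(coupled, conserved)
```

In this toy case the co-varying pair scores higher than the pair involving the conserved column, which carries no information about its partner. A pairwise score of this kind says nothing about higher-order couplings or about which physicochemical properties drive them, which are the gaps the abstract says the dissertation addresses.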
2	Méthodes hybrides parallèles pour la résolution de problèmes d'optimisation combinatoire : application au clustering sous contraintes / Parallel hybrid methods for solving combinatorial optimization problems: application to clustering under constraints
Ouali, Abdelkader, 03 July 2017
Combinatorial optimization problems have become the focus of much scientific research because of their importance in solving academic problems and real problems encountered in engineering and industry. Solving these problems with exact methods is often impractical because of the prohibitive processing time these methods would require to reach the optimal solution(s). In this thesis, we were interested in the algorithmic context of solving combinatorial problems and in the modeling context of these problems. At the algorithmic level, we explored hybrid methods, which excel in their ability to make exact and approximate methods cooperate in order to quickly produce high-quality solutions. At the modeling level, we worked on the specification and exact resolution of complex pattern set mining problems, studying in particular how they scale to large databases.
On the one hand, we proposed a first parallelization of the DGVNS algorithm, called CPDGVNS, which explores in parallel the different clusters provided by the tree decomposition while sharing the best solution found, following a master-worker model. Two other strategies, called RADGVNS and RSDGVNS, were proposed to improve the frequency with which intermediate solutions are exchanged between the different processes. Experiments carried out on difficult combinatorial problems show the suitability and effectiveness of our parallel methods.
On the other hand, we proposed a hybrid approach combining Integer Linear Programming (ILP) techniques and pattern mining. Our approach is complete and takes advantage of both the general ILP framework (which provides a high level of flexibility and expressiveness) and specialized heuristics for data exploration and extraction (which improve computing time). In addition to the general framework for pattern set mining, we studied two problems in particular: conceptual clustering and the tiling problem. The experiments carried out showed the contribution of our proposal relative to constraint-based approaches and specialized heuristics.
Keywords: metaheuristics; variable neighborhood search method; parallel methods; tree decomposition; combinatorial optimization problem; cost function networks; master-worker model; conceptual clustering; tiling problem; n-ary constraints; integer linear programming; heuristics; pattern mining; pattern set mining; declarative framework
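To make the tiling problem mentioned above more concrete, the sketch below uses the definition commonly found in the pattern mining literature (the abstract itself does not define it): a tile is an itemset together with every transaction that contains it, and a k-tiling is a set of k tiles chosen to cover as many 1-cells of a binary transaction database as possible. The code is an exhaustive search that is only workable on tiny inputs; it is not the ILP-based hybrid approach of the thesis, and the function names and toy database are hypothetical.

```python
from itertools import combinations


def cover(db, itemset):
    """Indices of the transactions that contain every item of itemset."""
    return [t for t, row in enumerate(db) if itemset <= row]


def tile_cells(db, itemset):
    """Set of (transaction, item) cells covered by the tile of itemset."""
    return {(t, item) for t in cover(db, itemset) for item in itemset}


def best_k_tiling(db, k):
    """Exhaustively search for k tiles covering the most cells.

    Assumption: the database is small enough to enumerate every itemset
    and every combination of k tiles, which is exponential in general.
    """
    items = sorted(set().union(*db))
    candidates = [
        frozenset(c)
        for size in range(1, len(items) + 1)
        for c in combinations(items, size)
    ]
    # Keep only itemsets that occur in at least one transaction.
    tiles = [(s, tile_cells(db, s)) for s in candidates]
    tiles = [(s, cells) for s, cells in tiles if cells]

    best, best_area = (), 0
    for combo in combinations(tiles, k):
        # Overlapping cells are counted once: union, then size.
        area = len(set().union(*(cells for _, cells in combo)))
        if area > best_area:
            best, best_area = tuple(s for s, _ in combo), area
    return best, best_area


# Hypothetical toy database: each transaction is a set of items.
toy_db = [frozenset("ab"), frozenset("ab"), frozenset("abc"), frozenset("c")]
tiling, area = best_k_tiling(toy_db, 2)
print(sorted(sorted(s) for s in tiling), area)
```

The exponential enumeration in this sketch is the kind of cost that motivates pairing a declarative model such as ILP with specialized mining heuristics, as the abstract describes.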
