Global ETD Search

11	Type- and Workload-Aware Scheduling of Large-Scale Wide-Area Data Transfers
Kettimuthu, Rajkumar, 02 October 2015
No description available.
Keywords: Computer Science
12	Planejamento, gerenciamento e análise de dados de microarranjos de DNA para identificação de biomarcadores de diagnóstico e prognóstico de cânceres humanos / Planning, management and analysis of DNA microarray data aiming at discovery of biomarkers for diagnosis and prognosis of human cancers
Simões, Ana Carolina Quirino, 12 May 2009
In this PhD thesis, we present our strategies for developing a mathematical and computational environment for the large-scale analysis of gene expression data obtained with DNA microarray technology. The analyses focused mainly on identifying molecular markers for the diagnosis and prognosis of human cancers. We report the results of several analyses carried out in this environment, which led to the development of a computational tool for the automatic annotation of DNA microarray platforms and of a tool for tracking data analyses performed in the R environment. We also applied eXtreme Programming (XP) to plan and manage the gene expression analysis projects. All datasets were obtained by our collaborators using two different microarray platforms: the first is enriched in non-coding regions of the human genome, particularly intronic regions; the second represents exonic regions of human genes. With the first platform, we evaluated gene expression profiles of human prostate and kidney tumors; applying SAM (Significance Analysis of Microarrays) to the prostate tumor data yielded a set of 49 sequences proposed as potential prognostic biomarkers for prostate tumors. With the second platform, we profiled the transcripts expressed in sarcomas, epidermoid carcinomas, and head and neck epidermoid carcinomas. The sarcoma analyses identified a set of 12 genes associated with local aggressiveness and metastasis, and the head and neck epidermoid carcinoma analyses identified 7 genes associated with lymph node metastasis.
Keywords: Large-scale data analysis; Cancer; Classifiers; Molecular markers; DNA microarrays; eXtreme Programming
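The SAM analysis above ranks genes by a relative-difference score, d(i) = (mean of class 1 - mean of class 2) / (s(i) + s0), where s(i) is a pooled, gene-specific standard error and s0 is a small positive constant. The following is a minimal Python sketch of that score for two classes of samples, not the thesis's pipeline: s0 is fixed by hand here (SAM chooses it from the data), the permutation-based false discovery rate estimate is omitted, and the expression matrix is hypothetical.

```python
import math
from statistics import mean


def sam_d(group1, group2, s0=0.1):
    """SAM-style relative difference d for one gene.

    group1, group2: expression values of the gene in each class.
    s0: small positive constant added to the denominator (fixed here as an
        assumption; SAM itself estimates it from the data).
    """
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean(group1), mean(group2)
    # Pooled within-class sum of squares
    ss = sum((x - m1) ** 2 for x in group1) + sum((x - m2) ** 2 for x in group2)
    a = (1 / n1 + 1 / n2) / (n1 + n2 - 2)
    s = math.sqrt(a * ss)  # gene-specific standard error
    return (m1 - m2) / (s + s0)


def rank_genes(matrix, labels, s0=0.1):
    """Return (gene index, d) pairs sorted by |d|, largest first."""
    scores = []
    for i, row in enumerate(matrix):
        g1 = [v for v, lab in zip(row, labels) if lab == 1]
        g2 = [v for v, lab in zip(row, labels) if lab == 2]
        scores.append((i, sam_d(g1, g2, s0)))
    return sorted(scores, key=lambda t: abs(t[1]), reverse=True)


# Hypothetical toy data: 3 genes x 6 samples, classes 1 and 2
expr = [
    [5.0, 5.2, 4.9, 7.1, 7.3, 6.9],
    [3.0, 3.1, 2.9, 3.0, 3.2, 2.8],
    [8.0, 7.5, 8.2, 6.0, 6.1, 5.9],
]
labels = [1, 1, 1, 2, 2, 2]
print(rank_genes(expr, labels))
```

A negative d means the gene is expressed at a lower level in class 1. In SAM, genes with the largest |d| are the candidates whose significance is then assessed against permuted data.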
13	Barcoded DNA Sequencing for Parallel Protein Detection
Dezfouli, Mahya, January 2015
The work presented in this thesis describes methods developed to integrate and accurately interpret DNA barcodes in order to enable large-scale omics analyses. The main objective is to enable multiplexed, high-throughput proteomic measurements through DNA barcoding and massively parallel sequencing. The thesis is based on four scientific papers that address three main aims: (i) preparing reagents for large-scale affinity proteomics, (ii) presenting technical advances in barcoding systems for parallel protein detection, and (iii) addressing challenges in the analysis of complex sequencing data. In the first part, bio-conjugation of antibodies is assessed at significantly downscaled reagent quantities. This allows affinity binders to be selected without requiring them to be available in large amounts or free of amine-containing buffers and stabilizers (Paper I). This is followed by DNA barcoding of antibodies using minimal reagent quantities. The procedure also enables efficient removal of free residual DNA from the barcoded antibodies, improving the sensitivity and accuracy of subsequent measurements (Paper II). Because the approach is performed in solid phase on magnetic beads, the set-up lends itself to high-throughput automation. The applicability of the prepared bio-conjugates for parallel protein detection is then demonstrated in different types of standard immunoassays (Papers I and II). In the second part, immuno-sequencing (I-Seq) is presented as a method for DNA-mediated protein detection using barcoded antibodies. I-Seq detected clinically relevant proteins in human blood plasma through parallel DNA readout (Paper II). The methodology is further developed to track antibody-antigen interaction events on suspension bead arrays encapsulated in barcoded emulsion droplets (Paper III). This method, called compartmentalized immuno-sequencing (cI-Seq), can perform specific detection with paired antibodies and can provide details on joint recognition events. Recent technical progress in DNA sequencing has increased interest in large-scale studies that analyze larger numbers of samples in parallel. The third part of this thesis addresses challenges in large-scale sequencing analysis. The decoding of a large DNA-barcoded dataset is presented, aimed at phase-defined sequence investigation of canine MHC loci in over 3000 samples (Paper IV). The analysis revealed new single nucleotide variants and a notable number of novel haplotypes for exon 2 of DLA-DRB1. Taken together, this thesis demonstrates emerging applications of barcoded sequencing in protein and DNA detection. Improvements in barcoding systems for assay parallelization, deconvolution of antigen-antibody interactions, sequence variant analysis, and large-scale data interpretation could help biomedical studies reach a deeper understanding of biological processes. The methods developed here may therefore help advance large-scale omics investigations, particularly in the promising field of DNA-mediated proteomics, enabling highly multiplexed studies of many samples at notably improved molecular resolution.
Keywords: DNA barcoding; antibody labeling; antibody oligonucleotide bio-conjugation; DNA-assisted proteomics; immuno-sequencing (I-Seq); droplet-based system; large-scale data analysis
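Decoding barcoded sequencing data, as in the canine MHC study above, begins by assigning each read to its sample through the barcode it carries. The sketch below illustrates generic barcode demultiplexing in Python, not the thesis's decoding pipeline. It assumes that each read starts with a fixed-length barcode, that the barcodes come from a hypothetical sample sheet, and that one mismatch (Hamming distance) is tolerated; unmatched or ambiguous reads are set aside.

```python
from collections import defaultdict


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes, max_mismatch=1):
    """Assign reads to samples by their leading barcode.

    reads: iterable of read sequences (barcode assumed at the 5' end).
    barcodes: dict mapping barcode sequence -> sample name.
    Returns (dict sample -> list of trimmed reads, list of unassigned reads).
    """
    length = len(next(iter(barcodes)))  # all barcodes assumed equal length
    assigned = defaultdict(list)
    unassigned = []
    for read in reads:
        prefix = read[:length]
        hits = [sample for bc, sample in barcodes.items()
                if hamming(prefix, bc) <= max_mismatch]
        if len(hits) == 1:
            assigned[hits[0]].append(read[length:])  # strip the barcode
        else:
            unassigned.append(read)  # no match, or ambiguous match
    return dict(assigned), unassigned


# Hypothetical sample sheet and reads
sheet = {"ACGTAC": "sample_1", "TTGCAA": "sample_2"}
reads = ["ACGTACGGGTTT", "ACGTTCCCCAAA", "TTGCAATTTTTT", "GGGGGGAAAAAA"]
samples, rejected = demultiplex(reads, sheet)
print(samples)
print(rejected)
```

Tolerating one mismatch is only safe when every pair of barcodes differs in at least three positions; otherwise a single sequencing error can make a read match two samples, which this code treats as ambiguous.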
14	Contributions to large-scale data processing systems / Contributions aux systèmes de traitement de données à grande échelle
Caneill, Matthieu, 05 February 2018
This thesis covers the topic of large-scale data processing systems, and more precisely three complementary approaches: the design of a system to predict computer failures by analyzing monitoring data; the routing of data in a real-time system by looking at correlations between message fields to favor locality; and finally a novel framework for designing data transformations using directed graphs of blocks. Through the Smart Support Center project, we design a scalable architecture to store time series reported by monitoring engines, which constantly check the health of computer systems. We use this data to make predictions and detect potential problems before they arise. We then dive into routing algorithms for stream processing systems and develop a layer that routes messages more efficiently by avoiding hops between machines. For that purpose, we identify in real time the correlations that appear in the fields of these messages, such as hashtags and their geolocation, for example in the case of tweets. We use these correlations to create routing tables that favor the co-location of the actors handling these messages. Finally, we present λ-blocks, a novel programming framework for defining data processing jobs without writing code, by instead creating graphs of blocks of code. The framework is fast and comes with batteries included: block libraries, plugins, and APIs to extend it. It can also manipulate computation graphs for optimization, analysis, verification, or any other purpose.
Keywords: Large-scale data processing; Programming abstractions; Smart Support Center; Components; Locality; Lambda blocks; 004
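The routing idea above, colocating the actors that handle correlated field values, can be illustrated with a small Python sketch. Everything here is a simplifying assumption rather than the thesis's algorithm: messages are (hashtag, location) pairs, correlations are plain co-occurrence counts over one window, locations are hash-partitioned across workers, and each hashtag is routed to the worker that already handles the location it co-occurs with most often, so both lookups for a typical message stay on one machine. The function names and sample data are hypothetical.

```python
from collections import Counter
import zlib


def stable_hash(key, n_workers):
    """Deterministic hash partitioning (Python's hash() is salted per run)."""
    return zlib.crc32(key.encode("utf-8")) % n_workers


def build_routing_table(messages, n_workers):
    """Map each hashtag to the worker of its most frequent co-occurring location."""
    pair_counts = Counter(messages)  # (hashtag, location) -> count
    best = {}
    for (tag, loc), count in pair_counts.items():
        if tag not in best or count > best[tag][1]:
            best[tag] = (loc, count)
    return {tag: stable_hash(loc, n_workers) for tag, (loc, _) in best.items()}


def route(message, table, n_workers):
    """Worker for the hashtag actor: routing-table entry if known, else hashing."""
    tag, _ = message
    return table.get(tag, stable_hash(tag, n_workers))


# Hypothetical window of observed messages
window = [("#football", "Paris"), ("#football", "Paris"), ("#football", "Lyon"),
          ("#cycling", "Alps"), ("#cycling", "Alps")]
N = 4
table = build_routing_table(window, N)
# The hashtag actor now sits on the same worker as its dominant location actor
print(route(("#football", "Paris"), table, N) == stable_hash("Paris", N))
```

Deriving hashtag placement from location placement is just one simple way to encode a correlation in a routing table; a real stream processor would also have to recompute the table as correlations drift and keep the load balanced across workers.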
16	Passage à l'échelle pour la visualisation interactive exploratoire de données : approches par abstraction et par déformation spatiale / Addressing scaling challenges in interactive exploratory visualization with abstraction and spatial distortion
Richer, Gaëlle, 26 November 2019
Interactive visualization is an essential tool for exploring, understanding, and analyzing data. However, efficiently exploring large or complex datasets raises two fundamental challenges. The visual challenge stems from the limits of human perception and cognition and from limited screen space; the computational challenge stems from the memory and processing limits of standard computers. In this thesis, we present techniques addressing both scalability issues in several interactive visualization applications. To address visual scalability, we present a versatile spatial-distortion approach for the linked emphasis of subsets of items across multiple views, and an abstract, multiscale representation based on parallel coordinates. Spatial distortion aims to compensate for the weakened effect of highlighting when it is applied to small visual elements. Multiscale abstraction simplifies the representation while still providing interactive access to detail on demand, by pre-aggregating the data at several levels of detail. To address computational scalability and explore datasets of over a billion items at interactive rates, we use pre-computation and on-the-fly computation on a remote distributed infrastructure. We present a system for multidimensional data exploration in which interactions and the abstract representation comply with a budget on the number of visual items, which in turn provides a theoretical bound on the interaction latencies caused by client-server network transfer. With the same goal, we compare several geometric data-reduction strategies for reconstructing density maps of large point sets.
Keywords: Information visualization; Large-scale data; Human-computer interface; Data abstraction; Scalability; Spatial distortion
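Pre-aggregating data at several levels of detail, as described for the multiscale parallel coordinates, can be sketched for a single axis as a pyramid of histograms. This Python sketch is an illustrative assumption, not the thesis's data structure: values in [0, 1) are binned once at the finest resolution, each coarser level is obtained by summing adjacent pairs of bins, and a hypothetical `level_for_budget` helper picks the most detailed level that fits a budget on the number of visual items, so a zoom level can be served without rescanning the raw data.

```python
def build_pyramid(values, finest_bins=8):
    """Return histograms from finest to coarsest; the bin count halves per level.

    values: numbers in [0, 1); finest_bins must be a power of two.
    """
    assert finest_bins > 0 and finest_bins & (finest_bins - 1) == 0
    hist = [0] * finest_bins
    for v in values:
        hist[min(int(v * finest_bins), finest_bins - 1)] += 1
    levels = [hist]
    while len(hist) > 1:
        # Merge adjacent bin pairs to get the next coarser level
        hist = [hist[i] + hist[i + 1] for i in range(0, len(hist), 2)]
        levels.append(hist)
    return levels


def level_for_budget(levels, budget):
    """Pick the most detailed level that uses at most `budget` bins."""
    for hist in levels:
        if len(hist) <= budget:
            return hist
    return levels[-1]


data = [0.05, 0.1, 0.3, 0.35, 0.4, 0.7, 0.9, 0.95]
levels = build_pyramid(data, finest_bins=8)
for level in levels:
    print(level)
print(level_for_budget(levels, budget=4))
```

Because every level is derived from the finest one, the pyramid costs one pass over the data plus roughly as much memory again as the finest histogram.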
17	Simulation générique et contribution à l'optimisation de la robustesse des systèmes de données à large échelle / Generic simulation and contribution to the robustness optimization of large-scale data storage systems
Gougeaud, Sebastien, 11 May 2017
The capacity of data storage systems keeps growing and now reaches the exabyte scale, which has a real impact on their robustness: the more disks a system contains, the more likely a failure becomes, and the time needed to rebuild a disk is proportional to its capacity. Simulation makes it possible to test new mechanisms under near-real conditions and to predict their behavior. We propose the Open and Generic data Storage system Simulation tool (OGSSim), which handles the heterogeneity and large size of current systems. Its modular design treats each storage technology, placement scheme, and computation model as a building block that can be added and combined to configure the simulation as precisely as needed. Since robustness is a critical parameter for these systems, we use declustered RAID to distribute the reconstruction of a failed disk's data. We propose the Symmetric Difference of Source Sets (SD2S) algorithm, which builds the placement scheme by shifting data blocks. The shift offset is derived from the proximity between the logical source sets of the blocks on a physical disk. To evaluate SD2S, we compared it with the CRUSH method without replicas. SD2S creates the placement scheme faster, in both normal and failure modes, and with a lower memory cost (zero in normal mode). Furthermore, in the case of a double failure, SD2S preserves part, if not all, of the data.
Keywords: Simulation; Large-scale data storage; Robustness; Hard disk drives; Solid-state drives
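To make the declustered RAID vocabulary above concrete, the following Python sketch builds a simple shift-based layout, computes each physical disk's source set (the logical stripes it holds a block of), uses the size of the symmetric difference between two source sets as a distance, and counts how many blocks each surviving disk must supply to rebuild a failed one. This is only an illustration under stated assumptions (a fixed step per row of stripes, a prime number of disks, hypothetical parameters, no parity arithmetic); it is not SD2S, whose offset selection is described above only at a high level.

```python
def layout(n_disks, width, rows):
    """Hypothetical shift-based declustered layout (illustration only).

    Row r uses step k = r + 1: stripe t of that row occupies disks
    t, t + k, t + 2k, ... (mod n_disks). If n_disks is prime and k < n_disks,
    these disks are distinct.
    """
    assert width <= n_disks and rows <= n_disks - 1
    stripes = []
    for r in range(rows):
        step = r + 1
        for t in range(n_disks):
            stripes.append([(t + j * step) % n_disks for j in range(width)])
    return stripes


def source_sets(stripes, n_disks):
    """For each physical disk, the set of logical stripes it holds a block of."""
    sets = [set() for _ in range(n_disks)]
    for s, disks in enumerate(stripes):
        for d in disks:
            sets[d].add(s)
    return sets


def distance(sets, a, b):
    """Size of the symmetric difference of two disks' source sets."""
    return len(sets[a] ^ sets[b])


def rebuild_reads(stripes, failed):
    """Blocks read from each surviving disk to rebuild the failed disk."""
    reads = {}
    for disks in stripes:
        if failed in disks:
            for d in disks:
                if d != failed:
                    reads[d] = reads.get(d, 0) + 1
    return reads


stripes = layout(n_disks=7, width=3, rows=3)
sets = source_sets(stripes, 7)
print(distance(sets, 0, 1))  # symmetric-difference distance between disks 0 and 1
print(rebuild_reads(stripes, failed=0))
```

With these particular parameters, rebuilding disk 0 reads the same number of blocks from every surviving disk, which is the load-spreading effect declustered RAID aims for; a non-declustered RAID group would instead read everything from the few disks in the failed disk's group.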
18	Statistical and Machine Learning Approaches For Visualizing and Analyzing Large-Scale Simulation Data
Hazarika, Subhashis, January 2019
No description available.
Keywords: Computer Science; Statistics; visualization; statistical modeling; probability distributions; multivariate statistics; copula; information theory; neural networks; explainable artificial intelligence; scientific simulation; large scale data; visual analytics
19	Recommandation Pair-à-Pair pour Communautés en Ligne à Grande Echelle / Peer-to-Peer Recommendation for Large-scale Online Communities
Draidi, Fady, 09 March 2012
Recommendation systems (RS) and peer-to-peer (P2P) systems are complementary in easing large-scale data sharing: RS filter and personalize users' queries, while P2P makes it possible to build decentralized large-scale data sharing systems. However, many challenges must be overcome to build scalable, reliable, and efficient RS on top of a P2P infrastructure. In this work, we focus on large-scale online communities, where users rate the content they explore and keep high-quality content related to their topics of interest in their local workspace. Our goal is to provide a novel and efficient P2P-RS for this context. We exploit users' topics of interest (automatically extracted from their content and ratings) and social data (friendship and trust) to construct and maintain a social P2P overlay and to generate recommendations. The thesis addresses several related issues. First, we design a scalable P2P-RS, called P2Prec, by combining collaborative filtering and content-based filtering. We then propose building and maintaining a dynamic P2P overlay using different gossip protocols. Our experimental results show that P2Prec achieves good recall with acceptable query processing load and network traffic. Second, we consider a more complex infrastructure in order to build and maintain a social P2P overlay, called F2Frec, which exploits social relationships between users. In this infrastructure, we combine content-based and social-based filtering to obtain a scalable P2P-RS that yields high-quality, reliable recommendations. Through an extensive performance evaluation, we show that F2Frec increases recall, as well as the trust and confidence in the results, with acceptable overhead. Finally, we describe the P2P-RS prototype we implemented to validate our proposal based on P2Prec and F2Frec.
Keywords: P2P system; Recommendation system (RS); Online communities; Social networks; Information retrieval; Large-scale data management
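P2Prec maintains its overlay with gossip protocols. The Python sketch below shows the general pattern of a similarity-driven gossip round, in the spirit of topology-construction protocols such as T-Man, and is not P2Prec's actual protocol. Each peer keeps a bounded view of neighbors, periodically merges views with one random neighbor, and both peers keep the candidates whose topic-of-interest sets are most similar to their own, measured here with Jaccard similarity. The peers, topics, view size, and bootstrap ring are hypothetical.

```python
import random


def jaccard(a, b):
    """Similarity of two topic sets (0 when both are empty)."""
    return len(a & b) / len(a | b) if a | b else 0.0


class Peer:
    def __init__(self, pid, topics, view_size=3):
        self.pid = pid
        self.topics = set(topics)
        self.view_size = view_size
        self.view = set()  # ids of known neighbors

    def select(self, candidates, peers):
        """Keep the view_size candidates most similar to this peer (ties by id)."""
        candidates = candidates - {self.pid}
        ranked = sorted(candidates,
                        key=lambda q: (-jaccard(self.topics, peers[q].topics), q))
        self.view = set(ranked[: self.view_size])


def gossip_round(peers, rng):
    """Every peer merges views with one random neighbor; both re-select."""
    for p in peers.values():
        if not p.view:
            continue
        q = peers[rng.choice(sorted(p.view))]
        merged = p.view | q.view | {p.pid, q.pid}
        p.select(merged, peers)
        q.select(merged, peers)


# Hypothetical community: peer ids and their topics of interest
topics = {0: {"jazz", "blues"}, 1: {"jazz", "soul"}, 2: {"rock"},
          3: {"rock", "metal"}, 4: {"blues", "jazz"}, 5: {"metal"}}
peers = {pid: Peer(pid, t) for pid, t in topics.items()}
ring = sorted(peers)
for i, pid in enumerate(ring):  # bootstrap: each peer knows its ring successor
    peers[pid].view = {ring[(i + 1) % len(ring)]}
rng = random.Random(42)
for _ in range(10):
    gossip_round(peers, rng)
for pid in ring:
    print(pid, sorted(peers[pid].view))
```

Keeping views bounded is what makes this pattern scale: each round costs every peer a constant amount of communication, while similar peers gradually find each other.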
20	Machine Learning based Protein Sequence to (un)Structure Mapping and Interaction Prediction
Iqbal, Sumaiya, 09 August 2017
Proteins are the fundamental macromolecules within a cell that carry out most biological functions. The computational study of protein structure and function using machine learning and data analytics is essential to advancing life-science research, given the fast growth of biological data and the complexity of analyzing it to obtain meaningful insights. We extend the mapping of a protein's primary sequence beyond its structure to its disordered components, known as intrinsically disordered proteins or regions (IDPs/IDRs), and to the associated dynamics, which help explain complex interactions within a cell that would otherwise remain obscure. The objective of this dissertation is to develop effective machine learning tools that predict disordered proteins, their properties and dynamics, and their interaction paradigms by systematically mining and analyzing large-scale biological data. We propose a robust framework that predicts disordered proteins from sequence information alone, using an optimized support vector machine (SVM) with a radial basis function (RBF) kernel. Through appropriate reasoning, we highlight the structure-like behavior of IDPs in disease-associated complexes. Further, we develop a fast and effective predictor of the accessible surface area (ASA) of protein residues, a useful structural property that describes a protein's exposure to its partners, using regularized regression with a third-degree polynomial kernel function and a genetic algorithm. As a key outcome of this research, we introduce a novel method to extract the position-specific energy (PSEE) of protein residues by modeling pairwise thermodynamic interactions and the hydrophobic effect. PSEE is found to be an effective feature for distinguishing the enthalpy gain of a protein's folded state from the neutral state of unstructured proteins. Moreover, we study transient peptide-protein interactions, which involve the induced folding of short peptides through disorder-to-order conformational changes as they bind an appropriate partner. Using the stacked generalization ensemble technique, we develop a suite of predictors that identify, from protein sequence, the residue patterns of peptide-recognition domains that recognize and bind peptide motifs and phosphopeptides carrying post-translational modifications (PTMs) of amino acids, which are responsible for critical human diseases. Biologically relevant case studies demonstrate the potential of the developed tools for discovering new knowledge.
Keywords: Machine Learning; Large-Scale Data Analysis; Bioinformatics; Intrinsically Disordered Protein; Predictor Framework; Protein-Protein Interaction; Artificial Intelligence and Robotics; Computational Biology; Computer Sciences; Databases and Information Systems
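The disorder predictor above is described as an SVM with an RBF kernel. As a reminder of what that kernel computes, the following Python sketch evaluates K(x, x') = exp(-γ‖x - x'‖²) and the resulting kernel (Gram) matrix for a few hypothetical per-residue feature vectors. The features and the value of γ are assumptions, and training the SVM itself is left to a library.

```python
import math


def rbf_kernel(x, y, gamma=0.5):
    """Gaussian RBF kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def gram_matrix(rows, gamma=0.5):
    """Pairwise kernel values; symmetric, with ones on the diagonal."""
    return [[rbf_kernel(a, b, gamma) for b in rows] for a in rows]


# Hypothetical feature vectors (e.g. windowed residue properties)
features = [[0.2, 1.0, 0.0], [0.1, 0.9, 0.1], [1.5, -0.3, 2.0]]
for row in gram_matrix(features):
    print([round(v, 3) for v in row])
```

Nearby feature vectors get kernel values close to 1 and distant ones values close to 0, with γ controlling how quickly similarity decays; in scikit-learn, the same kernel is selected with `SVC(kernel="rbf", gamma=...)`.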
