Simulation numérique et approche orientée connaissance pour la découverte de nouvelles molécules thérapeutiques / Numeric simulation and knowledge-oriented approach for the discovery of new therapeutic molecules

Ghemtio Wafo, Léo Aymar 07 May 2010
L’innovation thérapeutique progresse traditionnellement par la combinaison du criblage expérimental et de la modélisation moléculaire. En pratique, cette dernière approche est souvent limitée par la pénurie de données expérimentales, particulièrement les informations structurales et biologiques. Aujourd'hui, la situation a complètement changé avec le séquençage à haut débit du génome humain et les avancées réalisées dans la détermination des structures tridimensionnelles des protéines. Cette détermination permet d’avoir accès à une grande quantité de données pouvant servir à la recherche de nouveaux traitements pour un grand nombre de maladies. À cet égard, les approches informatiques permettant de développer des programmes de criblage virtuel à haut débit offrent une alternative ou un complément aux méthodes expérimentales qui font gagner du temps et de l’argent dans la découverte de nouveaux traitements. Cependant, la plupart de ces approches souffrent des mêmes limitations. Le coût et la durée des temps de calcul pour évaluer la fixation d'une collection de molécules à une cible, qui sont considérables dans le contexte du haut débit, ainsi que la précision des résultats obtenus sont les défis les plus évidents dans le domaine. Le besoin de gérer une grande quantité de données hétérogènes est aussi particulièrement crucial. Pour surmonter les limitations actuelles du criblage virtuel à haut débit et ainsi optimiser les premières étapes du processus de découverte de nouveaux médicaments, j’ai mis en place une méthodologie innovante permettant, d’une part, de gérer une masse importante de données hétérogènes et d’en extraire des connaissances et, d’autre part, de distribuer les calculs nécessaires sur les grilles de calcul comportant plusieurs milliers de processeurs, le tout intégré à un protocole de criblage virtuel en plusieurs étapes.
L’objectif est la prise en compte, sous forme de contraintes, des connaissances sur le problème posé afin d’optimiser la précision des résultats et les coûts en termes de temps et d’argent du criblage virtuel / Therapeutic innovation has traditionally benefited from the combination of experimental screening and molecular modelling. In practice, however, the latter is often limited by the shortage of structural and biological information. Today, the situation has completely changed with the high-throughput sequencing of the human genome and the advances made in determining the three-dimensional structures of proteins. This gives access to an enormous amount of data that can be used to search for new treatments for a large number of diseases. In this respect, computational approaches have been used for high-throughput virtual screening (HTVS) and offer an alternative or a complement to experimental methods, saving time and money in the discovery of new treatments. However, most of these approaches suffer from the same limitations. One of these is the cost and the computing time required to estimate the binding of all the molecules of a large data bank to a target, which can be considerable in a high-throughput context. The accuracy of the results is another obvious challenge in the field. The need to manage a large amount of heterogeneous data is also particularly crucial. To overcome the current limitations of HTVS and to optimize the first stages of the drug discovery process, I set up an innovative methodology with two advantages. First, it makes it possible to manage a large mass of heterogeneous data and to extract knowledge from it. Second, it distributes the necessary calculations over a grid computing platform containing several thousand processors. The whole methodology is integrated into a multiple-step virtual screening funnel.
The aim is to take into account, in the form of constraints, the available knowledge about the problem in order to optimize the accuracy of the results and the costs, in time and money, of the various stages of high-throughput virtual screening.
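The funnel idea can be illustrated with a minimal Python sketch. Cheap, knowledge-derived constraints run first, and the expensive binding estimate runs only on the molecules that survive them. The `Molecule` fields, the Lipinski-style thresholds and `mock_docking_score` are hypothetical stand-ins, not the thesis's actual protocol or scoring function. In a real HTVS run, the last stage would be docking jobs distributed over the grid.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Molecule:
    name: str
    mol_weight: float  # g/mol
    h_donors: int      # number of hydrogen-bond donors


# A stage is a predicate: a molecule survives the stage if it returns True.
Stage = Callable[[Molecule], bool]


def run_funnel(library: Iterable[Molecule], stages: list[Stage]) -> list[Molecule]:
    """Apply the stages in order; each stage only sees the survivors of the previous one."""
    survivors = list(library)
    for stage in stages:
        survivors = [m for m in survivors if stage(m)]
    return survivors


calls = {"dock": 0}  # counts how often the expensive step actually runs


def mock_docking_score(m: Molecule) -> float:
    """Hypothetical stand-in for an expensive docking job (lower is better)."""
    calls["dock"] += 1
    return -0.01 * m.mol_weight


stages: list[Stage] = [
    lambda m: m.mol_weight <= 500,           # knowledge-based constraint (cheap)
    lambda m: m.h_donors <= 5,               # knowledge-based constraint (cheap)
    lambda m: mock_docking_score(m) < -3.0,  # binding estimate (expensive)
]

library = [
    Molecule("a", 320.0, 2),
    Molecule("b", 650.0, 1),
    Molecule("c", 410.0, 7),
    Molecule("d", 480.0, 3),
]
hits = run_funnel(library, stages)
print([m.name for m in hits], calls["dock"])  # ['a', 'd'] 2
```

The design point is the ordering of the stages, from cheapest to most expensive. In this toy library only two of the four molecules ever reach the costly scoring step.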

InGriDE: um ambiente integrado e extensível de desenvolvimento para computação em grade / InGriDE: an integrated and extensible development environment for grid computing

Eduardo Leal Guerra 07 May 2007
Recentes avanços proporcionaram às grades computacionais um bom nível de maturidade. Esses sistemas têm sido implantados em ambientes de produção de qualidade na comunidade de pesquisa acadêmica e vêm despertando um grande interesse da indústria. Entretanto, desenvolver aplicações para essas infra-estruturas heterogêneas e distribuídas ainda é uma tarefa complexa e propensa a erros. As iniciativas de facilitar essa tarefa resultaram, na maioria dos casos, em ferramentas não integradas e baseadas em características específicas de cada grade computacional. O presente trabalho tem como objetivo minimizar a dificuldade de desenvolvimento de aplicações para a grade através da construção de um ambiente integrado e extensível de desenvolvimento (IDE) para computação em grade chamado InGriDE. O InGriDE fornece um conjunto único de ferramentas compatíveis com diferentes sistemas de middleware, desenvolvidas baseadas na interface de programação Grid Application Toolkit (GAT). O conjunto de funcionalidades do InGriDE foi desenvolvido com base na plataforma Eclipse que, além de fornecer um arcabouço para construção de IDEs, facilita a extensão do conjunto inicial de funcionalidades. Para validar a nossa solução, utilizamos em nosso estudo de caso o middleware InteGrade, desenvolvido no nosso grupo de pesquisa. Os resultados obtidos nesse trabalho mostraram a viabilidade de fornecer independência de middleware para IDEs através do uso de uma interface genérica de programação como o GAT. Além disso, os benefícios obtidos com o uso do Eclipse como arcabouço para construção de IDEs indicam que os recursos fornecidos por esse tipo de arcabouço atendem de forma eficiente as necessidades inerentes ao processo de desenvolvimento de aplicações para a grade. / Computational grids have evolved considerably over the past few years. These systems have been deployed in production environments in the academic research community and have attracted growing interest from industry.
However, developing applications for heterogeneous and distributed infrastructures is still a complex and error-prone process. In most cases, the initiatives to facilitate this task resulted in isolated, middleware-specific tools. This work aims to minimize the difficulty of developing grid applications by building an integrated and extensible development environment for grid computing, called InGriDE. InGriDE provides a single set of tools, compatible with different middleware systems, based on the Grid Application Toolkit (GAT). We developed the InGriDE set of features on the Eclipse platform, which provides both a framework for building IDEs and the possibility to extend the initial set of features. To validate our solution, we used the InteGrade middleware, developed in our research group, as our case study. The results of our work showed the viability of providing middleware independence to IDEs through the use of a generic application programming interface such as GAT. Moreover, the benefits of using Eclipse as a framework for building IDEs indicate that this kind of framework efficiently satisfies the requirements inherent to the grid application development process.
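The middleware independence described above comes down to one idea: program against a single generic interface and plug in one adaptor per middleware. The Python sketch below illustrates only that pattern. `JobService`, `InteGradeAdaptor`, `LocalAdaptor`, `ADAPTORS` and `ide_run_action` are hypothetical names. They do not reproduce the actual GAT API or InGriDE's Eclipse plug-ins.

```python
from abc import ABC, abstractmethod


class JobService(ABC):
    """Hypothetical middleware-neutral job API, in the spirit of GAT (not its real API)."""

    @abstractmethod
    def submit(self, executable: str, args: list[str]) -> str:
        """Submit a job and return an identifier for it."""


class InteGradeAdaptor(JobService):
    def submit(self, executable: str, args: list[str]) -> str:
        # A real adaptor would translate this call into InteGrade requests.
        return f"integrade:{executable}"


class LocalAdaptor(JobService):
    def submit(self, executable: str, args: list[str]) -> str:
        # Placeholder back end, e.g. for testing inside the IDE.
        return f"local:{executable}"


# Registry of available back ends; the IDE picks one by name (e.g. from a config).
ADAPTORS: dict[str, type] = {"integrade": InteGradeAdaptor, "local": LocalAdaptor}


def ide_run_action(middleware: str, executable: str, args: list[str]) -> str:
    """IDE-side code: depends only on JobService, never on a concrete middleware."""
    service: JobService = ADAPTORS[middleware]()
    return service.submit(executable, args)


print(ide_run_action("integrade", "blast", ["-query", "seq.fa"]))
```

With this layout, supporting another middleware means registering one more adaptor. The IDE-side code does not change.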

Simulace distribuovaných systémů / Distributed Systems Simulation

Ďuriš, Anton January 2021
This thesis focuses on modeling distributed systems using Petri nets. Distributed systems are increasingly being implemented in applications and computing systems, where their task is to ensure sufficient performance and stability for a large number of users. When modeling a distributed system, the stochastic behavior of Petri nets is important because it yields more realistic simulations. Therefore, this thesis focuses mainly on timed Petri nets. The theoretical part summarizes distributed systems, their properties, types and available architectures, as well as Petri nets, their representation, types and principle of operation. In the practical part, two models were implemented: a horizontally scaled web application divided into several services with a distributed database, and a large grid computing system, more precisely the BOINC platform with the Folding@home project. Both models were implemented using the PetNetSim Python library. The goal of this thesis is to perform simulations on the created models for different scenarios of their behavior.
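To make the notion of a timed Petri net concrete, here is a small hand-written discrete-event simulator in Python. It is not the PetNetSim API. It assumes deterministic firing delays; a stochastic model would instead sample each delay, for example with `random.expovariate`. It also assumes that every transition has at least one input arc. The example models a single server processing three queued requests.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Transition:
    name: str
    inputs: dict[str, int]   # place -> tokens consumed when the firing starts
    outputs: dict[str, int]  # place -> tokens produced when the firing completes
    delay: float             # deterministic firing duration (assumption)


def enabled(t: Transition, marking: dict[str, int]) -> bool:
    return all(marking.get(p, 0) >= n for p, n in t.inputs.items())


def simulate(marking: dict[str, int], transitions: list[Transition], horizon: float):
    """Event-driven run of a timed Petri net.

    Assumes every transition has at least one input arc; otherwise it would
    be enabled forever and the inner loop would never stop.
    """
    marking = dict(marking)
    pending = []  # min-heap of (completion_time, sequence_number, transition)
    log = []      # (completion_time, transition_name) for every completed firing
    now, seq = 0.0, 0
    while True:
        # Start every firing that is enabled at the current time.
        fired = True
        while fired:
            fired = False
            for t in transitions:
                if enabled(t, marking):
                    for p, n in t.inputs.items():
                        marking[p] -= n
                    heapq.heappush(pending, (now + t.delay, seq, t))
                    seq += 1
                    fired = True
        if not pending or pending[0][0] > horizon:
            return marking, log
        # Advance the clock to the next completion and deposit its output tokens.
        now, _, t = heapq.heappop(pending)
        for p, n in t.outputs.items():
            marking[p] = marking.get(p, 0) + n
        log.append((now, t.name))


serve = Transition("serve", {"queue": 1, "server_idle": 1}, {"done": 1, "server_idle": 1}, 2.0)
final, log = simulate({"queue": 3, "server_idle": 1}, [serve], horizon=100.0)
print(final, log)
```

Each request occupies the single server token for 2 time units. The three requests therefore complete at t = 2, 4 and 6.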

Workshop Mensch-Computer-Vernetzung (Human-Computer Networking Workshop)

Hübner, Uwe 15 October 2003
Workshop Mensch-Computer-Vernetzung, held 14-17 April 2003 in Löbsal (near Meißen)

Economic scheduling in Grid computing using Tender models

Bsoul, Mohammad January 2007
Economic scheduling needs to be considered for Grid computing environments because it gives resource providers an incentive to supply their resources. Moreover, it enforces efficient use of resources, because users have to pay for their use. Tendering is a suitable model for Grid scheduling because users start the negotiations to find suitable resources for executing their jobs. Furthermore, users specify their job requirements in their requests, and resources therefore reply with bids based on the cost of taking on the job and the availability of their processors. In this thesis, a framework for economic Grid scheduling using tendering is proposed. The framework entities (users, brokers and resources) employ the tender/contract-net model to negotiate prices and deadlines. Brokers act on behalf of users. During the negotiations, the entities aim to maximise their performance, which is measured by a number of metrics. In order to evaluate the entities' performance under different scenarios, a Java-based simulator called MICOSim, supporting event-driven simulation of economic Grid scheduling, is presented. MICOSim can simulate more than one hundred entities faster than real time. The evaluation shows that, when jobs are submitted statically, users who want to increase the job success rate and pay less for executing their jobs should consider received prices to select the most appropriate bids, while users who want to improve the average job satisfaction rate should consider either the received completion time or both price and completion time. The best broker strategy is the one that does not take job deadlines into account in the bids it sends to job owners. Finally, the resource strategy that considers the price to decide whether to reply to a request is superior to other resource strategies.
The only exception is when this strategy is used with a price that is too low. In dynamic submission, however, there is only a small difference between the performance of the different user strategies. The evaluation also shows that broker strategies perform best when the revenue they target from users is reasonable. Thus, the broker's aim should be to receive reasonable revenue (neither too low nor too high) for acting on behalf of users. The results also show that strategy performance is influenced by the behaviour of other entities, such as the submission time of user jobs. Finally, the characteristics of entities affect the performance of strategies. For example, the two user strategies that consider the received completion time, or both price and completion time, to decide whether to accept a broker bid perform similarly. This is because there are resources with various prices, from cheap to expensive, as well as resources that do not care about the price paid for execution. Hence, the price threshold does not have a large effect on performance.
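The user-side bid selection strategies compared above choose by price, by completion time, or by both. They can be sketched as follows. This is a minimal Python illustration, not MICOSim's Java code. The `Bid` fields, the budget and deadline filters, and the equal-weight normalised score used for the combined strategy are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Bid:
    bidder: str
    price: float
    completion_time: float  # completion time promised in the bid


def select_bid(bids, strategy, budget=float("inf"), deadline=float("inf"), w_price=0.5) -> Optional[Bid]:
    """Choose one bid from a tender round, or None if no bid is acceptable."""
    acceptable = [b for b in bids if b.price <= budget and b.completion_time <= deadline]
    if not acceptable:
        return None
    if strategy == "price":
        key = lambda b: b.price
    elif strategy == "time":
        key = lambda b: b.completion_time
    elif strategy == "both":
        # Normalise each criterion by its maximum so the weights are comparable.
        max_p = max(b.price for b in acceptable) or 1.0
        max_t = max(b.completion_time for b in acceptable) or 1.0
        key = lambda b: w_price * b.price / max_p + (1 - w_price) * b.completion_time / max_t
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return min(acceptable, key=key)


bids = [Bid("r1", 10.0, 50.0), Bid("r2", 4.0, 120.0), Bid("r3", 7.0, 60.0)]
for s in ("price", "time", "both"):
    print(s, select_bid(bids, s).bidder)
```

With these made-up bids, each strategy picks a different resource: r2 by price, r1 by completion time, and r3 as the compromise.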

Epidémiologie moléculaire et métagénomique à haut débit sur la grille / Molecular epidemiology and high-throughput metagenomics on the grid

Doan, Trung-Tung 17 December 2012
Résumé indisponible (French abstract unavailable) / This thesis focuses on the study and development of bioinformatics platforms and tools on the grid. The second objective is to develop applications in molecular epidemiology and metagenomics based on these tools and platforms. Based on a study of existing bioinformatics platforms and tools, we propose our solution: a platform and a portal for molecular epidemiology and high-throughput metagenomics on the grid. The main idea of our platform is to simplify the submission of jobs to the grid via pilot jobs (generic jobs that can control and launch many real tasks) and the PULL model (tasks are retrieved and executed automatically). Other platforms take similar approaches, but ours focuses on simplicity and on saving time in job submission. The bioinformatics tools chosen for deployment on the platform are popular tools that can be used in many bioinformatics analyses. We integrate a workflow engine into the platform so that users can carry out their analyses more easily. Our platform can be seen as a generalized system applicable to both epidemiological surveillance and metagenomics; two use cases were deployed and tested on the grid. The first use case monitors bird flu. Its approach is to federate influenza virus sequence data and provide a portal with tools on the grid to analyze these data. The second use case applies the power of the grid to the analysis of high-throughput amplicon sequencing data. In this case, we demonstrate the efficiency of the grid by using our platform to gridify an existing application, whose original version performs much worse than the gridified one.
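The pilot-job PULL model can be illustrated with a few threads standing in for pilot jobs running on grid worker nodes. Each pilot repeatedly asks a central queue for a real task and terminates when none is left. This is a minimal single-machine sketch, and `run_pilots` and its parameters are hypothetical. A real platform would fetch tasks from a remote task server rather than from an in-process `queue.Queue`.

```python
import queue
import threading


def run_pilots(tasks, n_pilots, execute):
    """Start n_pilots generic pilot workers that PULL real tasks until none remain.

    Returns {task: (pilot_id, result)}.
    """
    task_queue = queue.Queue()
    for task in tasks:
        task_queue.put(task)
    results = {}
    lock = threading.Lock()

    def pilot(pilot_id):
        while True:
            try:
                task = task_queue.get_nowait()  # PULL: the pilot asks for work
            except queue.Empty:
                return  # queue drained: the pilot job ends
            output = execute(task)
            with lock:
                results[task] = (pilot_id, output)

    pilots = [threading.Thread(target=pilot, args=(i,)) for i in range(n_pilots)]
    for p in pilots:
        p.start()
    for p in pilots:
        p.join()
    return results


results = run_pilots([f"seq{i}" for i in range(10)], n_pilots=3, execute=str.upper)
print(len(results), results["seq3"][1])
```

The user submits only the generic pilots. Which pilot runs which task is decided at run time by whoever pulls first, so nothing has to be scheduled in advance.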

Armazenamento distribuído de dados e checkpointing de aplicações paralelas em grades oportunistas / Distributed data storage and checkpointing of parallel applications in opportunistic grids

Camargo, Raphael Yokoingawa de 04 May 2007
Grades computacionais oportunistas utilizam recursos ociosos de máquinas compartilhadas para executar aplicações que necessitam de um alto poder computacional e/ou trabalham com grandes quantidades de dados. Mas a execução de aplicações paralelas computacionalmente intensivas em ambientes dinâmicos e heterogêneos, como grades computacionais oportunistas, é uma tarefa difícil. Máquinas podem falhar, ficar inacessíveis ou passar de ociosas para ocupadas inesperadamente, comprometendo a execução de aplicações. Um mecanismo de tolerância a falhas que dê suporte a arquiteturas heterogêneas é um importante requisito para estes sistemas. Neste trabalho, analisamos, implementamos e avaliamos um mecanismo de tolerância a falhas baseado em checkpointing para aplicações paralelas em grades computacionais oportunistas. Este mecanismo permite o monitoramento de execuções e a migração de aplicações entre nós heterogêneos da grade. Mas além da execução, é preciso gerenciar e armazenar os dados gerados e utilizados por estas aplicações. Desejamos uma infra-estrutura de armazenamento de dados de baixo custo e que utilize o espaço livre em disco de máquinas compartilhadas da grade. Devemos utilizar somente os ciclos ociosos destas máquinas para armazenar e recuperar dados, de modo que um sistema de armazenamento distribuído que as utilize deve ser redundante e tolerante a falhas. Para resolver o problema do armazenamento de dados em grades oportunistas, projetamos, implementamos e avaliamos o middleware OppStore. Este middleware provê armazenamento distribuído e confiável de dados, que podem ser acessados de qualquer máquina da grade. As máquinas são organizadas em aglomerados, que são conectados por uma rede peer-to-peer auto-organizável e tolerante a falhas. Dados são codificados em fragmentos redundantes antes de serem armazenados, de modo que arquivos podem ser reconstruídos utilizando apenas um subconjunto destes fragmentos. 
Finalmente, para lidar com a heterogeneidade dos recursos, desenvolvemos uma extensão ao protocolo de roteamento em redes peer-to-peer Pastry. Esta extensão adiciona balanceamento de carga e suporte à heterogeneidade de máquinas ao protocolo Pastry. / Opportunistic computational grids use idle resources from shared machines to execute applications that need large amounts of computational power and/or deal with large amounts of data. However, executing computationally intensive parallel applications in dynamic and heterogeneous environments, such as opportunistic grids, is a daunting task. Machines may fail, become inaccessible, or change from idle to occupied unexpectedly, compromising application execution. A fault tolerance mechanism that supports heterogeneous architectures is an important requirement for such systems. In this work, we analyze, implement and evaluate a checkpointing-based fault tolerance mechanism for parallel applications running on opportunistic grids. The mechanism monitors application execution and allows the migration of applications between heterogeneous nodes of the grid. Besides application execution, however, it is necessary to manage the data generated and used by those applications. We want a low-cost data storage infrastructure that uses the unused disk space of shared grid machines. The system should use the machines to store and recover data only during their idle periods, which requires the system to be redundant and fault-tolerant. To solve the data storage problem in opportunistic grids, we designed, implemented and evaluated the OppStore middleware. This middleware provides reliable distributed storage for application data, which can be accessed from any machine in the grid. The machines are organized in clusters connected by a self-organizing and fault-tolerant peer-to-peer network. During storage, data is encoded into redundant fragments, allowing the original file to be reconstructed using only a subset of those fragments.
Finally, to deal with resource heterogeneity, we developed an extension to the Pastry peer-to-peer routing substrate, enabling heterogeneity-aware load-balancing message routing.
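The abstract does not say which redundant coding OppStore uses. The Python sketch below illustrates the general principle of rebuilding a file from a subset of its fragments with the simplest erasure code, a single XOR parity fragment. Data is split into k fragments plus one parity fragment, and any k of the k+1 fragments are enough. Production systems typically use stronger codes, such as Reed-Solomon or Rabin's information dispersal algorithm, which tolerate several lost fragments.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal fragments (zero-padded) plus one XOR parity fragment."""
    size = -(-len(data) // k)  # ceiling division
    padded = data.ljust(size * k, b"\0")
    frags = [padded[i * size:(i + 1) * size] for i in range(k)]
    return frags + [reduce(xor_bytes, frags)]


def decode(fragments: list, k: int, length: int) -> bytes:
    """Rebuild the original bytes when at most one of the k+1 fragments is None."""
    missing = [i for i, f in enumerate(fragments) if f is None]
    if len(missing) > 1:
        raise ValueError("a single-parity code tolerates only one lost fragment")
    fragments = list(fragments)
    if missing and missing[0] < k:
        # XOR of all surviving fragments (including parity) equals the lost data fragment.
        fragments[missing[0]] = reduce(xor_bytes, [f for f in fragments if f is not None])
    return b"".join(fragments[:k])[:length]  # drop the zero padding


data = b"checkpoint of a parallel application"
frags = encode(data, k=4)
frags[2] = None  # e.g. the machine holding this fragment is no longer idle
print(decode(frags, k=4, length=len(data)) == data)
```

Storing the k+1 fragments on different machines is what allows a file to survive the loss of any single machine, at a storage overhead of only 1/k.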


ELBIO RENATO TORRES ABIB 03 August 2004
[pt] O problema de escalonamento de tarefas divisíveis consiste em determinar como uma carga a ser processada deve ser dividida entre processadores e em que ordem cada fração de carga será enviada a cada processador. Considera-se o escalonamento em redes estrela com computadores e enlaces heterogêneos. Nesta dissertação são propostas formulações originais deste problema como modelos de programação linear inteira mista, assim como um novo algoritmo de complexidade O(n) para a solução ótima de um caso especial. Além disso, também são propostas duas novas heurísticas para o problema, que permitem a elaboração de bons escalonamentos para instâncias de grande porte em um reduzido tempo de processamento. / [en] The problem of divisible job scheduling consists of determining how to divide the data to be processed among processors and in which order each fraction should be sent to them. In this dissertation, we consider the divisible load scheduling problem in star networks with heterogeneous computers and links. Original mixed integer linear programming formulations of this problem are proposed, as well as a new algorithm with complexity O(n) that finds the optimal solution for a special case. We also propose two new heuristics that quickly produce good schedules for large-scale instances.
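The Python sketch below gives some intuition for the problem by computing the classical closed-form schedule for a single-round star network with linear costs. The master sends the fractions sequentially in a given order, each worker starts computing once its fraction has arrived, and the fractions are chosen so that all workers finish at the same time. This textbook O(n) recurrence assumes that every worker participates and that the sending order is fixed. It is not necessarily the special case or the algorithm of the dissertation.

```python
def star_fractions(c: list[float], w: list[float], load: float = 1.0):
    """Single-round divisible load on a star, workers served in the given order.

    c[i]: time to send one unit of load to worker i (link cost)
    w[i]: time for worker i to compute one unit of load
    Returns (fractions summing to 1, makespan) with equal finish times.
    """
    raw = [1.0]
    for i in range(len(c) - 1):
        # Equal finish times: alpha_i * w_i = alpha_{i+1} * (c_{i+1} + w_{i+1})
        raw.append(raw[i] * w[i] / (c[i + 1] + w[i + 1]))
    total = sum(raw)
    alpha = [r / total for r in raw]
    makespan = load * alpha[0] * (c[0] + w[0])
    return alpha, makespan


def finish_times(alpha, c, w, load=1.0):
    """Recompute each worker's finish time, to check the equal-finish property."""
    sent, out = 0.0, []
    for a, ci, wi in zip(alpha, c, w):
        sent += a * load * ci  # the master's link is busy until `sent`
        out.append(sent + a * load * wi)
    return out


alpha, T = star_fractions([0.5, 1.0, 2.0], [3.0, 1.0, 4.0])
print(alpha, T, finish_times(alpha, [0.5, 1.0, 2.0], [3.0, 1.0, 4.0]))
```

Each fraction depends only on the previous one, so a single pass followed by a normalisation gives the whole schedule in O(n).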

Praktické uplatnění technologií data mining ve zdravotních pojišťovnách / Practical applications of data mining technologies in health insurance companies

Kulhavý, Lukáš January 2010
This thesis focuses on data mining technology and its possible practical use in health insurance companies. The thesis defines the term data mining and its relation to the term knowledge discovery in databases. Among other things, data mining is explained through the methodologies that describe the individual phases of the knowledge discovery process (CRISP-DM, SEMMA). The thesis also covers possible practical applications, technologies and products available on the market (both free and commercial). An introduction to the main data mining methods and specific algorithms (decision trees, association rules, neural networks and other methods) serves as the theoretical basis for the practical applications, which use real data from real health insurance companies. These applications search for the causes of increased remittances and predict churn. I solved these tasks in the freely available systems Weka and LISP-Miner. The objective is to introduce and demonstrate the capabilities of data mining on this type of data, as well as the capabilities of the Weka and LISP-Miner systems in solving tasks according to the CRISP-DM methodology. The last part of the thesis is devoted to cloud and grid computing in conjunction with data mining. It offers insight into the possibilities of these technologies and their benefits for data mining. The possibilities of cloud computing are presented on the Amazon EC2 system, and grid computing can be used through the Weka Experimenter interface.
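Of the methods listed, association rules are the easiest to show in a few lines. The brute-force Python sketch below computes support and confidence for single-item rules X -> Y over a toy set of hypothetical insured-person records. Tools such as Weka (Apriori) or LISP-Miner (GUHA procedures) search much larger rule spaces far more efficiently.

```python
from itertools import combinations


def support(itemset: frozenset, transactions: list[frozenset]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent: frozenset, consequent: frozenset, transactions) -> float:
    """Estimated P(consequent | antecedent)."""
    return support(antecedent | consequent, transactions) / support(antecedent, transactions)


def rules(transactions, min_support, min_confidence):
    """Brute-force rules X -> Y between single items (fine for tiny data only)."""
    items = sorted(set().union(*transactions))
    found = []
    for a, b in combinations(items, 2):
        for x, y in ((a, b), (b, a)):
            s = support(frozenset({x, y}), transactions)
            if s >= min_support:
                conf = confidence(frozenset({x}), frozenset({y}), transactions)
                if conf >= min_confidence:
                    found.append((x, y, s, conf))
    return found


# Hypothetical records: the kinds of care billed for each insured person.
records = [
    frozenset({"diabetes", "ophthalmology"}),
    frozenset({"diabetes", "ophthalmology", "cardiology"}),
    frozenset({"diabetes"}),
    frozenset({"cardiology"}),
]
print(rules(records, min_support=0.5, min_confidence=0.6))
```

Support filters out rare co-occurrences. Confidence then measures how reliably the antecedent predicts the consequent. Here, "ophthalmology -> diabetes" holds in every record where ophthalmology appears.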

Energy consumption optimization of parallel applications with iterations using CPU frequency scaling / Optimisation de la consommation énergétique des applications parallèles avec des itérations en réduisant la fréquence des processeurs

Fanfakh, Ahmed Badri Muslim 17 October 2016
Au cours des dernières années, l'informatique “green” est devenue un sujet important dans le calcul intensif. Cependant, les plates-formes informatiques continuent de consommer de plus en plus d'énergie en raison de l'augmentation du nombre de noeuds qui les composent. Afin de minimiser les coûts d'exploitation de ces plates-formes, de nombreuses techniques ont été étudiées ; parmi celles-ci figure le changement dynamique de la fréquence des processeurs (DVFS en anglais). Il permet de réduire la consommation d'énergie d'un CPU en abaissant sa fréquence. Cependant, cela augmente le temps d'exécution de l'application. Par conséquent, il faut trouver un seuil qui donne le meilleur compromis entre la consommation d'énergie et la performance d'une application. Cette thèse présente des algorithmes développés pour optimiser la consommation d'énergie et les performances des applications parallèles avec des itérations synchrones et asynchrones sur des clusters ou des grilles. Les modèles de consommation d'énergie et de performance proposés pour chaque type d'application parallèle permettent de prédire le temps d'exécution et la consommation d'énergie d'une application pour toutes les fréquences disponibles. La contribution de cette thèse peut être divisée en trois parties. Tout d'abord, il s'agit d'optimiser le compromis entre la consommation d'énergie et les performances des applications parallèles avec des itérations synchrones sur des clusters homogènes. Deuxièmement, nous avons adapté les modèles de performance énergétique aux plates-formes hétérogènes dans lesquelles chaque noeud peut avoir des spécifications différentes telles que la puissance de calcul, la consommation d'énergie, différentes fréquences de fonctionnement ou encore des latences et des bandes passantes réseaux différentes. L'algorithme d'optimisation de la fréquence CPU a également été modifié en fonction de l'hétérogénéité de la plate-forme.
Troisièmement, les modèles et l'algorithme d'optimisation de la fréquence CPU ont été complètement repensés pour prendre en considération les spécificités des algorithmes itératifs asynchrones. Tous ces modèles et algorithmes ont été appliqués sur des applications parallèles utilisant la bibliothèque MPI et ont été exécutés avec le simulateur SimGrid ou sur la plate-forme Grid'5000. Les expériences ont montré que les algorithmes proposés sont plus efficaces que les méthodes existantes. Ils n’introduisent qu’un faible surcoût et ne nécessitent pas de profilage au préalable car ils sont exécutés au cours du déroulement de l’application. / In recent years, green computing has become an important topic in the supercomputing research domain. However, computing platforms are still consuming more and more energy due to the increase in the number of nodes composing them. To minimize the operating costs of these platforms, many techniques have been used. Dynamic voltage and frequency scaling (DVFS) is one of them. It can be used to reduce the power consumption of the CPU while computing, by lowering its frequency. However, lowering the frequency of a CPU may increase the execution time of the application running on that processor. Therefore, the frequency that gives the best trade-off between the energy consumption and the performance of an application must be selected. This thesis presents the algorithms developed to optimize the energy consumption and the performance of synchronous and asynchronous message passing applications with iterations running over clusters or grids.
The energy consumption and performance models for each type of parallel application predict its execution time and energy consumption for any selected frequency, according to the characteristics of both the application and the architecture executing it. The contribution of this thesis can be divided into three parts. First, optimizing the trade-off between the energy consumption and the performance of message passing applications with synchronous iterations running over homogeneous clusters. Second, adapting the energy and performance models to heterogeneous platforms where each node can have different specifications, such as computing power, energy consumption, available frequency gears, or network latency and bandwidth. The frequency scaling algorithm was also modified to suit the heterogeneity of the platform. Third, the models and the frequency scaling algorithm were completely rethought to take into consideration the asynchronism in communication and computation. All these models and algorithms were applied to message passing applications with iterations and evaluated on either the SimGrid simulator or the Grid’5000 platform. The experiments showed that the proposed algorithms are efficient and outperform existing methods such as the energy and delay product. They also introduce a small runtime overhead and work online without any training or profiling.
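The energy/performance trade-off can be sketched with a deliberately simplified single-node model in Python. The assumptions are not the thesis's exact models, and each one is labelled in the code:

- Computation time scales with f_max/f.
- Communication time does not depend on the frequency.
- Dynamic power scales with the cube of the frequency.
- The selected frequency maximises normalised performance minus normalised energy.

All numbers in the example are made up.

```python
def predict(scale, t_comp, t_comm, p_dyn, p_static):
    """Predicted (time, energy) when the CPU runs at f_max / scale.

    Assumptions (illustrative only): computation time grows linearly with
    `scale`, communication time is frequency-independent, dynamic power falls
    with the cube of the frequency (so dynamic energy falls with scale**2),
    and static power is paid for the whole run.
    """
    time = t_comp * scale + t_comm
    energy = p_dyn * t_comp / scale**2 + p_static * time
    return time, energy


def best_frequency(freqs, t_comp, t_comm, p_dyn, p_static):
    """Pick the gear maximising normalised performance minus normalised energy."""
    f_max = max(freqs)
    t_ref, e_ref = predict(1.0, t_comp, t_comm, p_dyn, p_static)

    def tradeoff(f):
        t, e = predict(f_max / f, t_comp, t_comm, p_dyn, p_static)
        return t_ref / t - e / e_ref

    return max(freqs, key=tradeoff)


gears = [1.2, 1.6, 2.0, 2.5]  # GHz, hypothetical
best = best_frequency(gears, t_comp=10.0, t_comm=2.0, p_dyn=20.0, p_static=4.0)
print(best)
```

With these illustrative numbers, the 2.0 GHz gear wins. It keeps about 83% of the performance for about 75% of the energy, while the lower gears leave a smaller margin between normalised performance and normalised energy.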

