1 |
Desenvolvimento de um ambiente de computação voluntária baseado em computação ponto-a-ponto / Development of an volunteer computing environment based in peer-to-peer computingSantiago, Caio Rafael do Nascimento 13 March 2015 (has links)
As necessidades computacionais de experimentos científicos muitas vezes exigem computadores potentes. Uma forma alternativa de obter esse processamento é aproveitar o processamento ocioso de computadores pessoais de modo voluntário. Essa técnica é conhecida como computação voluntária e possui grande potencial na ajuda aos cientistas. No entanto existem diversos fatores que podem reduzir sua eficiência quando aplicada a experimentos científicos complexos, por exemplo, aqueles que envolvem processamento de longa duração, uso de dados de entrada ou saída muito grandes, etc. Na tentativa de solucionar alguns desses problemas surgiram abordagens que aplicam conceitos de computação ponto-a-ponto. Neste projeto foram especificados, desenvolvidos e testados um ambiente e um escalonador de atividades que aplicam conceitos de computação ponto-a-ponto à execução de workflows com computação voluntária. Quando comparado com a execução local de atividades e com a computação voluntária tradicional houve melhoras em relação ao tempo de execução (até 22% de redução quando comparada com a computação voluntária tradicional nos testes mais complexos) e em alguns casos também houve uma redução no consumo de banda de upload do servidor de até 62%. / The computational needs of scientific experiments often require powerful computers. One alternative way to obtain this processing power is taking advantage of the idle processing of personal computers as volunteers. This technique is known as volunteer computing and has great potential in helping scientists. However, there are several issues which can reduce the efficiency of this approach when applied to complex scientific experiments, such as, the ones with long processing time, very large input or output data, etc. In an attempt to solve these problems some approaches based on P2P concepts arisen. In this project a workflow execution environment and a scheduler of activities were specified, developed and tested applying P2P concepts in the workflows execution using volunteer computing. When compared with the local execution of activities and traditional volunteer computing was the execution time was improved (until 22% of reduction when compared with the traditional volunteer computing in the most complex tests) and in some cases there was also a reduction of the server upload bandwidth use of until 62%.
|
2 |
Desenvolvimento de um ambiente de computação voluntária baseado em computação ponto-a-ponto / Development of an volunteer computing environment based in peer-to-peer computingCaio Rafael do Nascimento Santiago 13 March 2015 (has links)
As necessidades computacionais de experimentos científicos muitas vezes exigem computadores potentes. Uma forma alternativa de obter esse processamento é aproveitar o processamento ocioso de computadores pessoais de modo voluntário. Essa técnica é conhecida como computação voluntária e possui grande potencial na ajuda aos cientistas. No entanto existem diversos fatores que podem reduzir sua eficiência quando aplicada a experimentos científicos complexos, por exemplo, aqueles que envolvem processamento de longa duração, uso de dados de entrada ou saída muito grandes, etc. Na tentativa de solucionar alguns desses problemas surgiram abordagens que aplicam conceitos de computação ponto-a-ponto. Neste projeto foram especificados, desenvolvidos e testados um ambiente e um escalonador de atividades que aplicam conceitos de computação ponto-a-ponto à execução de workflows com computação voluntária. Quando comparado com a execução local de atividades e com a computação voluntária tradicional houve melhoras em relação ao tempo de execução (até 22% de redução quando comparada com a computação voluntária tradicional nos testes mais complexos) e em alguns casos também houve uma redução no consumo de banda de upload do servidor de até 62%. / The computational needs of scientific experiments often require powerful computers. One alternative way to obtain this processing power is taking advantage of the idle processing of personal computers as volunteers. This technique is known as volunteer computing and has great potential in helping scientists. However, there are several issues which can reduce the efficiency of this approach when applied to complex scientific experiments, such as, the ones with long processing time, very large input or output data, etc. In an attempt to solve these problems some approaches based on P2P concepts arisen. In this project a workflow execution environment and a scheduler of activities were specified, developed and tested applying P2P concepts in the workflows execution using volunteer computing. When compared with the local execution of activities and traditional volunteer computing was the execution time was improved (until 22% of reduction when compared with the traditional volunteer computing in the most complex tests) and in some cases there was also a reduction of the server upload bandwidth use of until 62%.
|
3 |
Adequação da computação intensiva em dados para ambientes desktop grid com uso de MapReduce / Adequacy of intensive data computing to desktop grid environment with using of mapreduceAnjos, Julio Cesar Santos dos January 2012 (has links)
O surgimento de volumes de dados na ordem de petabytes cria a necessidade de desenvolver-se novas soluções que viabilizem o tratamento dos dados através do uso de sistemas de computação intensiva, como o MapReduce. O MapReduce é um framework de programação que apresenta duas funções: uma de mapeamento, chamada Map, e outra de redução, chamada Reduce, aplicadas a uma determinada entrada de dados. Este modelo de programação é utilizado geralmente em grandes clusters e suas tarefas Map ou Reduce são normalmente independentes entre si. O programador é abstraído do processo de paralelização como divisão e distribuição de dados, tolerância a falhas, persistência de dados e distribuição de tarefas. A motivação deste trabalho é aplicar o modelo de computação intensiva do MapReduce com grande volume de dados para uso em ambientes desktop grid. O objetivo então é investigar os algoritmos do MapReduce para adequar a computação intensiva aos ambientes heterogêneos. O trabalho endereça o problema da heterogeneidade de recursos, não tratando neste momento a volatilidade das máquinas. Devido às deficiências encontradas no MapReduce em ambientes heterogêneos foi proposto o MR-A++, que é um MapReduce com algoritmos adequados ao ambiente heterogêneo. O modelo do MR-A++ cria uma tarefa de medição para coletar informações, antes de ocorrer a distribuição dos dados. Assim, as informações serão utilizadas para gerenciar o sistema. Para avaliar os algoritmos alterados foi empregada a Análise 2k Fatorial e foram executadas simulações com o simulador MRSG. O simulador MRSG foi construído para o estudo de ambientes (homogêneos e heterogêneos) em larga escala com uso do MapReduce. O pequeno atraso introduzido na fase de setup da computação é compensado com a adequação do ambiente heterogêneo à capacidade computacional das máquinas, com ganhos de redução de tempo de execução dos jobs superiores a 70 % em alguns casos. / The emergence of data volumes in the order of petabytes creates the need to develop new solutions that make possible the processing of data through the use of intensive computing systems, as MapReduce. MapReduce is a programming framework that has two functions: one called Map, mapping, and another reducing called Reduce, applied to a particular data entry. This programming model is used primarily in large clusters and their tasks are normally independent. The programmer is abstracted from the parallelization process such as division and data distribution, fault tolerance, data persistence and distribution of tasks. The motivation of this work is to apply the intensive computation model of MapReduce with large volume of data in desktop grid environments. The goal then is to investigate the intensive computing in heterogeneous environments with use MapReduce model. First the problem of resource heterogeneity is solved, not treating the moment of the volatility. Due to deficiencies of the MapReduce model in heterogeneous environments it was proposed the MR-A++; a MapReduce with algorithms adequated to heterogeneous environments. The MR-A++ model creates a training task to gather information prior to the distribution of data. Therefore the information will be used to manager the system. To evaluate the algorithms change it was employed a 2k Factorial analysis and simulations with the simulant MRSG built for the study of environments (homogeneous and heterogeneous) large-scale use of MapReduce. The small delay introduced in phase of setup of computing compensates with the adequacy of heterogeneous environment to computational capacity of the machines, with gains in the run-time reduction of jobs exceeding 70% in some cases.
|
4 |
Adequação da computação intensiva em dados para ambientes desktop grid com uso de MapReduce / Adequacy of intensive data computing to desktop grid environment with using of mapreduceAnjos, Julio Cesar Santos dos January 2012 (has links)
O surgimento de volumes de dados na ordem de petabytes cria a necessidade de desenvolver-se novas soluções que viabilizem o tratamento dos dados através do uso de sistemas de computação intensiva, como o MapReduce. O MapReduce é um framework de programação que apresenta duas funções: uma de mapeamento, chamada Map, e outra de redução, chamada Reduce, aplicadas a uma determinada entrada de dados. Este modelo de programação é utilizado geralmente em grandes clusters e suas tarefas Map ou Reduce são normalmente independentes entre si. O programador é abstraído do processo de paralelização como divisão e distribuição de dados, tolerância a falhas, persistência de dados e distribuição de tarefas. A motivação deste trabalho é aplicar o modelo de computação intensiva do MapReduce com grande volume de dados para uso em ambientes desktop grid. O objetivo então é investigar os algoritmos do MapReduce para adequar a computação intensiva aos ambientes heterogêneos. O trabalho endereça o problema da heterogeneidade de recursos, não tratando neste momento a volatilidade das máquinas. Devido às deficiências encontradas no MapReduce em ambientes heterogêneos foi proposto o MR-A++, que é um MapReduce com algoritmos adequados ao ambiente heterogêneo. O modelo do MR-A++ cria uma tarefa de medição para coletar informações, antes de ocorrer a distribuição dos dados. Assim, as informações serão utilizadas para gerenciar o sistema. Para avaliar os algoritmos alterados foi empregada a Análise 2k Fatorial e foram executadas simulações com o simulador MRSG. O simulador MRSG foi construído para o estudo de ambientes (homogêneos e heterogêneos) em larga escala com uso do MapReduce. O pequeno atraso introduzido na fase de setup da computação é compensado com a adequação do ambiente heterogêneo à capacidade computacional das máquinas, com ganhos de redução de tempo de execução dos jobs superiores a 70 % em alguns casos. / The emergence of data volumes in the order of petabytes creates the need to develop new solutions that make possible the processing of data through the use of intensive computing systems, as MapReduce. MapReduce is a programming framework that has two functions: one called Map, mapping, and another reducing called Reduce, applied to a particular data entry. This programming model is used primarily in large clusters and their tasks are normally independent. The programmer is abstracted from the parallelization process such as division and data distribution, fault tolerance, data persistence and distribution of tasks. The motivation of this work is to apply the intensive computation model of MapReduce with large volume of data in desktop grid environments. The goal then is to investigate the intensive computing in heterogeneous environments with use MapReduce model. First the problem of resource heterogeneity is solved, not treating the moment of the volatility. Due to deficiencies of the MapReduce model in heterogeneous environments it was proposed the MR-A++; a MapReduce with algorithms adequated to heterogeneous environments. The MR-A++ model creates a training task to gather information prior to the distribution of data. Therefore the information will be used to manager the system. To evaluate the algorithms change it was employed a 2k Factorial analysis and simulations with the simulant MRSG built for the study of environments (homogeneous and heterogeneous) large-scale use of MapReduce. The small delay introduced in phase of setup of computing compensates with the adequacy of heterogeneous environment to computational capacity of the machines, with gains in the run-time reduction of jobs exceeding 70% in some cases.
|
5 |
Adequação da computação intensiva em dados para ambientes desktop grid com uso de MapReduce / Adequacy of intensive data computing to desktop grid environment with using of mapreduceAnjos, Julio Cesar Santos dos January 2012 (has links)
O surgimento de volumes de dados na ordem de petabytes cria a necessidade de desenvolver-se novas soluções que viabilizem o tratamento dos dados através do uso de sistemas de computação intensiva, como o MapReduce. O MapReduce é um framework de programação que apresenta duas funções: uma de mapeamento, chamada Map, e outra de redução, chamada Reduce, aplicadas a uma determinada entrada de dados. Este modelo de programação é utilizado geralmente em grandes clusters e suas tarefas Map ou Reduce são normalmente independentes entre si. O programador é abstraído do processo de paralelização como divisão e distribuição de dados, tolerância a falhas, persistência de dados e distribuição de tarefas. A motivação deste trabalho é aplicar o modelo de computação intensiva do MapReduce com grande volume de dados para uso em ambientes desktop grid. O objetivo então é investigar os algoritmos do MapReduce para adequar a computação intensiva aos ambientes heterogêneos. O trabalho endereça o problema da heterogeneidade de recursos, não tratando neste momento a volatilidade das máquinas. Devido às deficiências encontradas no MapReduce em ambientes heterogêneos foi proposto o MR-A++, que é um MapReduce com algoritmos adequados ao ambiente heterogêneo. O modelo do MR-A++ cria uma tarefa de medição para coletar informações, antes de ocorrer a distribuição dos dados. Assim, as informações serão utilizadas para gerenciar o sistema. Para avaliar os algoritmos alterados foi empregada a Análise 2k Fatorial e foram executadas simulações com o simulador MRSG. O simulador MRSG foi construído para o estudo de ambientes (homogêneos e heterogêneos) em larga escala com uso do MapReduce. O pequeno atraso introduzido na fase de setup da computação é compensado com a adequação do ambiente heterogêneo à capacidade computacional das máquinas, com ganhos de redução de tempo de execução dos jobs superiores a 70 % em alguns casos. / The emergence of data volumes in the order of petabytes creates the need to develop new solutions that make possible the processing of data through the use of intensive computing systems, as MapReduce. MapReduce is a programming framework that has two functions: one called Map, mapping, and another reducing called Reduce, applied to a particular data entry. This programming model is used primarily in large clusters and their tasks are normally independent. The programmer is abstracted from the parallelization process such as division and data distribution, fault tolerance, data persistence and distribution of tasks. The motivation of this work is to apply the intensive computation model of MapReduce with large volume of data in desktop grid environments. The goal then is to investigate the intensive computing in heterogeneous environments with use MapReduce model. First the problem of resource heterogeneity is solved, not treating the moment of the volatility. Due to deficiencies of the MapReduce model in heterogeneous environments it was proposed the MR-A++; a MapReduce with algorithms adequated to heterogeneous environments. The MR-A++ model creates a training task to gather information prior to the distribution of data. Therefore the information will be used to manager the system. To evaluate the algorithms change it was employed a 2k Factorial analysis and simulations with the simulant MRSG built for the study of environments (homogeneous and heterogeneous) large-scale use of MapReduce. The small delay introduced in phase of setup of computing compensates with the adequacy of heterogeneous environment to computational capacity of the machines, with gains in the run-time reduction of jobs exceeding 70% in some cases.
|
6 |
On using Desktop Grid Computing in software industryCederström, Andreas January 2010 (has links)
Context. When dealing with large data sets and heavy calculations the common solution is clusters, supercomputers or Grids of these two. However, there are ways of gaining large computational power by utilizing the unused cycles of regular home or office computers, this are referred to as Desktop Grids. Objectives. In this study we review the current field of solutions for open source Desktop Grid computing capable of dealing with a heterogeneous set of clients and dynamic size of the Desktop Grid. We investigate current use, interest of use and priority of key attributes of Desktop Grids. Finally we want to show how time effective Desktop Grids are compared to execution on a single machine and in the process show effort needed to setup a Desktop Grid and start computing. The overall purpose of this study is to provide a path for industry organizations to take when taking the first step into Desktop Grid computing. Methods. We use a systematic review to collect information of existing open source Desktop Grid solutions. Studies are selected based on inclusion criterions and a quality assessment. A survey questioner is used to assess industry usage, interest and prioritization of attributes of Desktop Grids. We will conduct an experiment to show execution speedup as well as setup effort. Results. We found ten open source Desktop Grids fulfilling our requirements. The survey shows that Desktop Grids is used to a very little extent within industry while a majority of the participants state that there is an interest for Desktop Grids. As result of the experiment, we can say that we achieved very high speedup and that effort needed to setup a Desktop Grid is about 40 hours for one person with no prior experience to the selected Desktop Grid system. Conclusions. We conclude that industry organizations have a possible need for Desktop Grids but in order to be more successful, Desktop Grid developers must put more effort into areas as automated testing and code compilation.
|
7 |
An Investigation of Run-time Operations in a Heterogeneous Desktop Grid Environment: The Texas Tech University Desktop Grid Case StudyPerez, Jerry Felix 01 January 2013 (has links)
The goal of the dissertation study was to evaluate the existing DG scheduling algorithm. The evaluation was developed through previously explored simulated analyses of DGs performed by researchers in the field of DG scheduling optimization and to improve the current RT framework of the DG at TTU. The author analyzed the RT of an actual DG,
thereby enabling other investigators to compare theoretical results with the results of this dissertation case study.
Two statistical methods were used to formulate and validate predictive models: multiple linear regression and graphical exploratory data analysis techniques. Using both statistical methods, the author was able to determine that the theoretical model was able to predict the significance of four independent variables of resource fragmentation,
computational volatility, resource management, and grid job scheduling on the dependent variables quality of service and job performance affecting RT. After an experimental case study analysis of the DG variables, the author identified the best DG resources to perform
optimization of run-time performance of DG at TTU. The projected outcome of this investigation is the improved job scheduling techniques of the DG at TTU.
|
8 |
A Desktop Grid Computing Approach for Scientific Computing and VisualizationConstantinescu-Fuløp, Zoran January 2008 (has links)
<p>Scientific Computing is the collection of tools, techniques, and theories required to solve on a computer, mathematical models of problems from science and engineering, and its main goal is to gain insight in such problems. Generally, it is difficult to understand or communicate information from complex or large datasets generated by Scientific Computing methods and techniques (computational simulations, complex experiments, observational instruments etc.). Therefore, support of Scientific Visualization is needed, to provide the techniques, algorithms, and software tools needed to extract and display appropriately important information from numerical data.</p><p>Usually, complex computational and visualization algorithms require large amounts of computational power. The computing power of a single desktop computer is insufficient for running such complex algorithms, and, traditionally, large parallel supercomputers or dedicated clusters were used for this job. However, very high initial investments and maintenance costs limit the availability of such systems. A more convenient solution, which is becoming more and more popular, is based on the use of nondedicated desktop PCs in a Desktop Grid Computing environment. Harnessing idle CPU cycles, storage space and other resources of networked computers to work together on a particularly computational intensive application does this. Increasing power and communication bandwidth of desktop computers provides for this solution.</p><p>In a desktop grid system, the execution of an application is orchestrated by a central scheduler node, which distributes the tasks amongst the worker nodes and awaits workers’ results. An application only finishes when all tasks have been completed. The attractiveness of exploiting desktop grids is further reinforced by the fact that costs are highly distributed: every volunteer supports her resources (hardware, power costs and internet connections) while the benefited entity provides management infrastructures, namely network bandwidth, servers and management services, receiving in exchange a massive and otherwise unaffordable computing power. The usefulness of desktop grid computing is not limited to major high throughput public computing projects. Many institutions, ranging from academics to enterprises, hold vast number of desktop machines and could benefit from exploiting the idle cycles of their local machines.</p><p>In the work presented in this thesis, the central idea has been to provide a desktop grid computing framework and to prove its viability by testing it in some Scientific Computing and Visualization experiments. We present here QADPZ, an open source system for desktop grid computing that have been developed to meet the above presented needs. QADPZ enables users from a local network or Internet to share their resources. It is a multi-platform, heterogeneous system, where different computing resources from inside an organization can be used. It can be used also for volunteer computing, where the communication infrastructure is the Internet. QADPZ supports the following native operating systems: Linux, Windows, MacOS and Unix variants. The reason behind natively supporting multiple operating systems, and not only one (Unix or Windows, as other systems do), is that often, in real life, this kind of limitation restricts very much the usability of desktop grid computing.</p><p>QADPZ provides a flexible object-oriented software framework that makes it easy for programmers to write various applications, and for researchers to address issues such as adaptive parallelism, fault-tolerance, and scalability. The framework supports also the execution of legacy applications, which for different reasons could not be rewritten, and that makes it suitable for other domains as business. It also supports low-level programming languages as C/C++ or high-level language applications, (e.g. Lisp, Python, and Java), and provides the necessary mechanisms to use such applications in a computation. Consequently, users with various backgrounds can benefit from using QADPZ. The flexible object-oriented structure and the modularity allow facile improvements and further extensions to other programming languages.</p><p>We have developed a general-purpose runtime and an API to support new kinds of high performance computing applications, and therefore to benefit from the advantages offered by desktop grid computing. This API directly supports the C/C++ programming language. We have shown how distributed computing extends beyond the master-worker paradigm (typical for such systems) and provided QADPZ with an extended API that supports in addition lightweight tasks and parallel computing (using the message passing paradigm - MPI). This extends the range of applications that can be used to already existing MPI based applications - e.g. parallel numerical solvers used in computational science, or parallel visualization algorithms.</p><p>Another restriction of existing systems, especially middleware based, is that each resource provider needs to install a runtime module with administrator privileges. This poses some issues regarding data integrity and accessibility on providers computers. The QADPZ system tries to overcome this by allowing the middleware module to run as a non-privileged user, even with restricted access, to the local system.</p><p>QADPZ provides also low-level optimizations, such as on-the-fly compression and encryption for communication. The user can choose from different algorithms, depending on the application, improving both the communication overhead imposed by large data transfers and keeping privacy of the data. The system goes further, by providing an experimental, adaptive compression algorithm, which can transparently choose different algorithms to improve the application. QADPZ support two different protocols (UDP and TCP/IP) in order to improve the efficiency of communication.</p><p>Free source code allows its flexible installations and modifications based on the particular needs of research projects and institutions. In addition to being a very powerful tool for computationally intensive research, the open sourceness makes QADPZ a flexible educational platform for numerous smallsize student projects in the areas of operating systems, distributed systems, mobile agents, parallel algorithms, etc. Open source software is a natural choice for modern research as well, because it encourages effectively integration, cooperation and boosting of new ideas.</p><p>This thesis proposes also an improved conceptual model (based on the master-worker paradigm), which makes contributions in several directions: pull vs. push work-units, pipelining of work-units, more work-units sent at a time, adaptive number of workers, adaptive time-out interval for work-units, and multithreading. We have also demonstrated that the use of desktop grids should not be limited to only master-worker applications, but it can be used for more fine-grained parallel Scientific Computing and Visualization applications, by performing some specific experiments. This thesis makes supplementary contributions: a hierarchical taxonomy of the main existing desktop grids, and an adaptive compression algorithm for remote visualization. QADPZ has also pioneered autonomic computing approach for desktop grids and presents specific self-management features: self-knowledge, self-configuration, selfoptimization and self-healing. It is worth to mention that to the present the QADPZ has over a thousand users who have download it (since July, 2001 when it has been uploaded to sourceforge.net), and many of them use it for their daily tasks (see the appendix). Many of the results have been published or are in course of publishing as it can be seen from the references.</p>
|
9 |
A Desktop Grid Computing Approach for Scientific Computing and VisualizationConstantinescu-Fuløp, Zoran January 2008 (has links)
Scientific Computing is the collection of tools, techniques, and theories required to solve on a computer, mathematical models of problems from science and engineering, and its main goal is to gain insight in such problems. Generally, it is difficult to understand or communicate information from complex or large datasets generated by Scientific Computing methods and techniques (computational simulations, complex experiments, observational instruments etc.). Therefore, support of Scientific Visualization is needed, to provide the techniques, algorithms, and software tools needed to extract and display appropriately important information from numerical data. Usually, complex computational and visualization algorithms require large amounts of computational power. The computing power of a single desktop computer is insufficient for running such complex algorithms, and, traditionally, large parallel supercomputers or dedicated clusters were used for this job. However, very high initial investments and maintenance costs limit the availability of such systems. A more convenient solution, which is becoming more and more popular, is based on the use of nondedicated desktop PCs in a Desktop Grid Computing environment. Harnessing idle CPU cycles, storage space and other resources of networked computers to work together on a particularly computational intensive application does this. Increasing power and communication bandwidth of desktop computers provides for this solution. In a desktop grid system, the execution of an application is orchestrated by a central scheduler node, which distributes the tasks amongst the worker nodes and awaits workers’ results. An application only finishes when all tasks have been completed. The attractiveness of exploiting desktop grids is further reinforced by the fact that costs are highly distributed: every volunteer supports her resources (hardware, power costs and internet connections) while the benefited entity provides management infrastructures, namely network bandwidth, servers and management services, receiving in exchange a massive and otherwise unaffordable computing power. The usefulness of desktop grid computing is not limited to major high throughput public computing projects. Many institutions, ranging from academics to enterprises, hold vast number of desktop machines and could benefit from exploiting the idle cycles of their local machines. In the work presented in this thesis, the central idea has been to provide a desktop grid computing framework and to prove its viability by testing it in some Scientific Computing and Visualization experiments. We present here QADPZ, an open source system for desktop grid computing that have been developed to meet the above presented needs. QADPZ enables users from a local network or Internet to share their resources. It is a multi-platform, heterogeneous system, where different computing resources from inside an organization can be used. It can be used also for volunteer computing, where the communication infrastructure is the Internet. QADPZ supports the following native operating systems: Linux, Windows, MacOS and Unix variants. The reason behind natively supporting multiple operating systems, and not only one (Unix or Windows, as other systems do), is that often, in real life, this kind of limitation restricts very much the usability of desktop grid computing. QADPZ provides a flexible object-oriented software framework that makes it easy for programmers to write various applications, and for researchers to address issues such as adaptive parallelism, fault-tolerance, and scalability. The framework supports also the execution of legacy applications, which for different reasons could not be rewritten, and that makes it suitable for other domains as business. It also supports low-level programming languages as C/C++ or high-level language applications, (e.g. Lisp, Python, and Java), and provides the necessary mechanisms to use such applications in a computation. Consequently, users with various backgrounds can benefit from using QADPZ. The flexible object-oriented structure and the modularity allow facile improvements and further extensions to other programming languages. We have developed a general-purpose runtime and an API to support new kinds of high performance computing applications, and therefore to benefit from the advantages offered by desktop grid computing. This API directly supports the C/C++ programming language. We have shown how distributed computing extends beyond the master-worker paradigm (typical for such systems) and provided QADPZ with an extended API that supports in addition lightweight tasks and parallel computing (using the message passing paradigm - MPI). This extends the range of applications that can be used to already existing MPI based applications - e.g. parallel numerical solvers used in computational science, or parallel visualization algorithms. Another restriction of existing systems, especially middleware based, is that each resource provider needs to install a runtime module with administrator privileges. This poses some issues regarding data integrity and accessibility on providers computers. The QADPZ system tries to overcome this by allowing the middleware module to run as a non-privileged user, even with restricted access, to the local system. QADPZ provides also low-level optimizations, such as on-the-fly compression and encryption for communication. The user can choose from different algorithms, depending on the application, improving both the communication overhead imposed by large data transfers and keeping privacy of the data. The system goes further, by providing an experimental, adaptive compression algorithm, which can transparently choose different algorithms to improve the application. QADPZ support two different protocols (UDP and TCP/IP) in order to improve the efficiency of communication. Free source code allows its flexible installations and modifications based on the particular needs of research projects and institutions. In addition to being a very powerful tool for computationally intensive research, the open sourceness makes QADPZ a flexible educational platform for numerous smallsize student projects in the areas of operating systems, distributed systems, mobile agents, parallel algorithms, etc. Open source software is a natural choice for modern research as well, because it encourages effectively integration, cooperation and boosting of new ideas. This thesis proposes also an improved conceptual model (based on the master-worker paradigm), which makes contributions in several directions: pull vs. push work-units, pipelining of work-units, more work-units sent at a time, adaptive number of workers, adaptive time-out interval for work-units, and multithreading. We have also demonstrated that the use of desktop grids should not be limited to only master-worker applications, but it can be used for more fine-grained parallel Scientific Computing and Visualization applications, by performing some specific experiments. This thesis makes supplementary contributions: a hierarchical taxonomy of the main existing desktop grids, and an adaptive compression algorithm for remote visualization. QADPZ has also pioneered autonomic computing approach for desktop grids and presents specific self-management features: self-knowledge, self-configuration, selfoptimization and self-healing. It is worth to mention that to the present the QADPZ has over a thousand users who have download it (since July, 2001 when it has been uploaded to sourceforge.net), and many of them use it for their daily tasks (see the appendix). Many of the results have been published or are in course of publishing as it can be seen from the references.
|
10 |
Revisiter les grilles de PCs avec des technologies du Web et le Cloud computing / Re-examaning the Desktop Grids with Web Technologies and Cloud ComputingAbidi, Leila 03 March 2015 (has links)
Le contexte de cette thèse est à l’intersection des contextes des grilles de calculs, des nouvelles technologies du Web ainsi que des Clouds et des services à la demande. Depuis leur avènement au cours des années 90, les plates-formes distribuées, plus précisément les systèmes de grilles de calcul (Grid Computing), n’ont pas cessé d’évoluer permettant ainsi de susciter multiple efforts de recherche. Les grilles de PCs ont été proposées comme une alternative aux super-calculateurs par la fédération des milliers d’ordinateurs de bureau. Les détails de la mise en oeuvre d’une telle architecture de grille, en termes de mécanismes de mutualisation des ressources, restent très difficile à cerner. Parallèlement, le Web a complètement modifié notre façon d’accéder à l’information. Le Web est maintenant une composante essentielle de notre quotidien. Les équipements ont, à leur tour, évolué d’ordinateurs de bureau ou ordinateurs portables aux tablettes, lecteurs multimédias, consoles de jeux, smartphones, ou NetPCs. Cette évolution exige d’adapter et de repenser les applications/intergiciels de grille de PCs qui ont été développés ces dernières années. Notre contribution se résume dans la réalisation d’un intergiciel de grille de PCs que nous avons appelé RedisDG. Dans son fonctionnement, RedisDG reste similaire à la plupart des intergiciels de grilles de calcul, c’est-à-dire qu’il est capable d’exécuter des applications sous forme de «sacs de tâches» dans un environnement distribué, assurer le monitoring des noeuds, valider et certifier les résultats. L’innovation de RedisDG, réside dans l’intégration de la modélisation et la vérification formelles dans sa phase de conception, ce qui est non conventionnel mais très pertinent dans notre domaine. Notre approche consiste à repenser les grilles de PCs à partir d’une réflexion et d’un cadre formel permettant de les développer, de manière rigoureuse et de mieux maîtriser les évolutions technologiques à venir. / The context of this work is at the intersection of grid computing, the new Web technologies and the Clouds and services on demand contexts. Desktop Grid have been proposed as an alternative to supercomputers by the federation of thousands of desktops. The details of the implementation of such an architecture, in terms of resource sharing mechanisms, remain very hard. Meanwhile, the Web has completely changed the way we access information. The equipment, in turn, have evolved from desktops or laptops to tablets, smartphones or NetPCs. Our approach is to rethink Desktop Grids from a reflexion and a formal framework to develop them rigorously and better control future technological developments. We have reconsidered the interactions between the traditional components of a Desktop Grid based on the Web technology, and given birth to RedisDG, a new Desktop Grid middelware capable to operate on small devices, ie smartphones, tablets like the more traditional devicves (PCs). Our system is entirely based on the publish-subscribe paradigm. RedisDG is developped with Python and uses Redis as advanced key-value cache and store.
|
Page generated in 0.0532 seconds