  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
71

Genetic Programming in Mathematica

Suleman, Hussein 01 1900
Genetic programming (GP) has traditionally been implemented in LISP, but there is a slow migration towards faster languages like C++. The choice of implementation language is dictated not only by the speed of the platform but also by the desirability of such an implementation. With a large number of scientists migrating to scientifically oriented programming languages like Mathematica, Mathematica provides an ideal testbed for GP. This study attempted to implement GP on a Mathematica platform, exploiting Mathematica's unique capabilities. Wherever possible, optimizations were applied to drive the GP algorithm towards realistic goals. At an early stage it was noted that the standard GP algorithm could be sped up significantly by parallelisation and the distribution of processing. This was incorporated into the algorithm, using known techniques and Mathematica-specific knowledge.
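The abstract's remark that GP can be sped up by distributing the processing can be illustrated with a small sketch. It is written in Python rather than Mathematica, and everything in it (the tuple encoding of expressions, the `target` function, the names `evaluate`, `fitness`, and `parallel_fitness`) is a hypothetical stand-in, not the thesis's implementation. It shows only why the fitness step parallelises well: each individual is scored independently of the others.

```python
from concurrent.futures import ThreadPoolExecutor
import operator

# Binary operators allowed in the toy expression trees (an assumption).
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def target(x):
    """Hypothetical function the population is trying to match: x^2 + x."""
    return x * x + x


def evaluate(expr, x):
    """Evaluate an expression tree: 'x', a number, or (op, left, right)."""
    if expr == "x":
        return x
    if isinstance(expr, (int, float)):
        return expr
    op, left, right = expr
    return OPS[op](evaluate(left, x), evaluate(right, x))


def fitness(expr, samples):
    """Sum of squared errors against the target; lower is better."""
    return sum((evaluate(expr, x) - target(x)) ** 2 for x in samples)


def parallel_fitness(population, samples, workers=4):
    """Score every individual independently, one task per individual."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda expr: fitness(expr, samples), population))


population = ["x", ("*", "x", "x"), ("+", ("*", "x", "x"), "x"), ("+", "x", 1)]
samples = list(range(-2, 3))
scores = parallel_fitness(population, samples)
```

Threads are used here only to show the structure. For CPU-bound fitness functions in Python, worker processes or separate machines would be needed to see a real speed-up.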
72

Programmer Cognition in Explicit Coordination Modeling: Understanding the Design of Complex Human Interaction and Coordination

Lin, Sirong 22 December 2011
Parallel thinking is a mindset that enables computer scientists to think about and implement systems that allow activities to happen concurrently. This mindset is needed in designing and implementing a wide range of computer systems involving coordinated components (e.g., parallel, distributed, and multi-user systems). Whether the coordinated component is human or computer, the underlying issue is to imagine coordination between these components and to manage the distribution and reintegration of coordinated work. The rapid development of multi-core technologies has drawn attention back to parallelism. Ubiquitous and pervasive computing further brings parallelism into the everyday experiences of non-computer scientists. Designing and developing for ubiquitous parallelism becomes an essential and heavy responsibility for every software designer and developer. This situation creates a new standard for everyone working in the computing field: simply understanding the techniques and algorithms in parallel and distributed computing that support parallel computing resources is not enough; the ability to create support for parallel human activities is also needed. Therefore, the need to train CS students to have a "parallel thinking" mindset is more urgent than ever. This doctoral work approaches the pedagogy of parallel thinking by teaching CS students to model coordination for parallel human activities explicitly. Although most participants started with an undeveloped imagination for human coordination, they were able to improve by focusing on coordination issues in the context of a class. The research method was to study a semester-long experimental class in the Department of Computer Science at Virginia Tech through a qualitative design-based research approach. Multiple types of data were collected using methodological triangulation to maximize validity.
The data analysis process was guided by Grounded Theory (GT) through a systematic set of procedures. The outcomes provide a rich, thick, and detailed description of how CS students conceptualize and approach parallel thinking. The research contributes to CS education, the programmer cognition literature, and computer-supported collaborative system design and development by elaborating and analyzing various challenges in coordinated system creation, and by making suggestions about pedagogical solutions and the design of software infrastructure and tools. / Ph. D.
73

Applying Source Level Auto-Vectorization to Aparapi Java

Albert, Frank Curtis 19 June 2014
Ever since chip manufacturers hit the power wall preventing them from increasing processor clock speed, there has been an increased push towards parallelism for performance improvements. This parallelism comes in the form of both data parallel single instruction multiple data (SIMD) instructions and parallel compute cores in both central processing units (CPUs) and graphics processing units (GPUs). While these hardware enhancements offer potential performance gains, programs must be rewritten to take advantage of them in order to see any performance improvement. Some lower level languages that compile directly to machine code already take advantage of the data parallel SIMD instructions, but higher level interpreted languages often do not. Java, one of the most popular programming languages in the world, still does not include support for these SIMD instructions. In this thesis, we present a vector library that implements all of the major SIMD instructions in functions that are accessible to Java through JNI function calls. This brings the benefits of general purpose SIMD functionality to Java. This thesis also works with the data parallel Aparapi Java extension to bring these SIMD performance improvements to programmers who use the extension without any additional effort on their part. Aparapi already provides programmers with an API that allows them to declare certain sections of their code parallel. These parallel sections are then run on OpenCL capable hardware with a fallback path in the Java thread pool to ensure code reliability. This work takes advantage of the known independence of the parallel sections of code to automatically modify the Java thread pool fallback path to include the vectorization library, using an auto-vectorization tool created for this work.
When the code is not vectorizable, the auto-vectorizer tool is still able to offer performance improvements over the default fallback path through an improved looped implementation that executes the same code with less overhead. Experiments conducted in this work show that, for all 10 benchmarks tested, the auto-vectorization tool produced an implementation that beat the default Aparapi fallback path. In addition, this improved fallback path even outperformed the GPU implementation for several of the benchmarks tested. / Master of Science
74

Language Constructs for Safe Parallel Programming on Multi-Cores

Östlund, Johan January 2016
The last decade has seen the transition from single-core processors to multi-cores and many-cores. This move has by and large shifted the responsibility from chip manufacturers to programmers to keep up with ever-increasing expectations on performance. In the single-core era, improvements in hardware capacity could immediately be leveraged by an application: faster machine - faster program. In the age of the multi-cores, this is no longer the case. Programs must be written in specific ways to utilize available parallel hardware resources. Programming language support for concurrent and parallel programming is poor in most popular object-oriented programming languages. Shared memory, threads and locks is the most common concurrency model provided. Threads and locks are hard to understand, error-prone and inflexible; they break encapsulation - the very foundation of the object-oriented approach. This makes it hard to break large complex problems into smaller pieces which can be solved independently and composed to make a whole. Ubiquitous parallelism and object-orientation, seemingly, do not match. Actors, or active objects, have been proposed as a concurrency model better fit for object-oriented programming than threads and locks. Asynchronous message passing between actors each with a logical thread of control preserves encapsulation as objects themselves decide when messages are executed. Unfortunately most implementations of active objects do not prevent sharing of mutable objects across actors. Sharing, whether on purpose or by accident, exposes objects to multiple threads of control, destroying object encapsulation. In this thesis we show techniques for compiler-enforced isolation of active objects, while allowing sharing and zero-copy communication of mutable data in the cases where it is safe to do so. 
We also show how the same techniques that enforce isolation can be utilized internal to an active object to allow data race-free parallel message processing and data race-free structured parallel computations. This overcomes the coarse-grained nature of active object parallelism without compromising safety. / UPMARC
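As a rough illustration of the actor model described in this abstract, the sketch below shows an active object that owns its state and changes it only in response to messages taken from its own mailbox. It is a plain Python sketch under assumptions: the class `CounterActor`, its message names, and `demo` are hypothetical, and nothing here enforces isolation at compile time, which is the thesis's actual contribution.

```python
import queue
import threading


class CounterActor:
    """An active object: private state, a mailbox, and its own thread."""

    def __init__(self):
        self._count = 0                      # touched only by the actor's thread
        self._mailbox = queue.Queue()        # asynchronous message queue
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def send(self, message, reply_to=None):
        """Enqueue a message and return immediately (asynchronous send)."""
        self._mailbox.put((message, reply_to))

    def _run(self):
        # The actor decides when each message is processed: one at a time,
        # in mailbox order, so no lock around _count is needed.
        while True:
            message, reply_to = self._mailbox.get()
            if message == "increment":
                self._count += 1
            elif message == "get":
                reply_to.put(self._count)    # reply with a value, not a reference to state
            elif message == "stop":
                return


def demo():
    actor = CounterActor()
    for _ in range(1000):
        actor.send("increment")
    reply = queue.Queue()
    actor.send("get", reply)
    result = reply.get(timeout=5)
    actor.send("stop")
    return result
```

Because the only message payload here is an immutable integer, no mutable object is shared across actors. Passing a mutable list in a message would reintroduce exactly the sharing problem the abstract describes.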
75

Uma proposta de escalonamento distribuído para exploração de paralelismo na programação em lógica / A distributed scheduler proposal for exploration of parallelism in logic programming

Costa, Cristiano Andre da January 1998
This work presents a hierarchical scheduling model for exploiting Independent AND parallelism and OR parallelism in logic programming. The model uses granularity information generated by GRANLOG (Granularity Analyzer for Logic Programming) to aid the scheduler. A detailed study of parallel logic programming environments is presented, followed by a comparison highlighting the main characteristics of each one. Scheduling in general is also described, with greater emphasis on dynamic scheduling. The main advantages and disadvantages of each scheduler are shown.
The proposed model is named DSLP (Distributed Scheduler for Logic Programming) and performs the scheduling in two phases. First the OR Phase is executed, in which all OR parallelism is exploited. Then the AND Phase begins, in which Independent AND parallelism is exploited. The proposed scheduling strategy uses complexity information generated by GRANLOG to determine the work to be exported, as well as the overload level of the nodes. To validate the work, a prototype using the Parallel Virtual Machine environment was implemented. The prototype is a simulator of Prolog programs and implements the AND phase of the scheduling.
78

Some formal characteristics of parallel speech in Kambera

Asplund, Leif January 2017
In all languages of the eastern Indonesian island of Sumba, parallel speech is used in ritual contexts. This study investigates some characteristics of parallel speech in Kambera. The analysis of the structural types of couplets, the units of parallel speech, uses all the couplets found in Kapita (1987) together with some additional materials, including materials from other Sumbanese languages. The rest of the investigation uses only a sample of 100 couplets of the most common type from Kapita (1987), here loosely defined as parallel speech units, and is limited to formal (non-semantic) features which connect the two lines. The features investigated are the number of syllables, words, and stresses, and syntactic structure. The discussion addresses, among other questions, whether the existence of non-parallel lines incorporating parallel pairs should be recognized. The conclusion summarizes the results on the different varieties of couplets and the formal connection between their two lines.
79

Parallelisierung von Algorithmen zur Nutzung auf Architekturen mit Teilwortparallelität / Parallelization of Algorithms for using on Architectures with Subword Parallelism

Schaffer, Rainer 12 October 2010
Technological progress allows the implementation of increasingly complex processor architectures on a single chip. One trend of recent years is the implementation of more and more execution units on one chip. This creates new challenges for mapping algorithms onto such architectures, because all execution units should be used efficiently while the algorithm executes. The focus of this dissertation is exploiting the parallelism of processor arrays with subword parallelism. Such architectures allow parallel execution on several levels. Therefore, an algorithm mapping strategy was developed with a particular focus on subword parallelism. This mapping strategy is based on the methods of processor array design. Processor arrays are regularly arranged processor elements that communicate only with their neighboring elements. Data input and output are handled by the processor elements on the border of the array. Each processor element can have several functional units, which execute the computational operations of the algorithm. Subword parallelism is the ability to split the data path of a functional unit into several narrower data paths for the parallel processing of data with a smaller word width.
The developed mapping strategy is subdivided into two steps, "Preprocessing" and "Multi-Level Modified Copartitioning" (MMC), where MMC also names the method used in that step. Preprocessing alters the algorithm so that the altered algorithm can be mapped quickly and efficiently onto the target architecture. For this purpose, an optimization problem was developed that determines the parameters for the transformation of the algorithm step by step. Multi-Level Modified Copartitioning is used to adapt the algorithm to the target architecture step by step. Furthermore, the mapping method allows the exploitation of the local registers in the processor elements and the adaptation of the algorithm to the memory architecture to which the processor array is connected. The first level of the MMC transforms an algorithm with operations on single data items into an algorithm with subword-parallel operations. The second copartitioning level adapts the algorithm to the local registers and to the processor array. Further copartitioning levels can be used to adapt the algorithm to the memory architecture.
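The idea of subword parallelism mentioned in this abstract can be sketched in software. The example below assumes a 32-bit data path split into four 8-bit lanes and adds all four lanes with one wide addition, masking so that carries never cross a lane boundary. The lane layout and the names `pack`, `unpack`, and `subword_add` are illustrative assumptions. The sketch says nothing about the dissertation's mapping strategy itself.

```python
LANES = 4                 # number of subwords per word (assumption)
LANE_BITS = 8             # width of each subword in bits (assumption)
HIGH = 0x80808080         # top bit of every 8-bit lane
LOW = 0x7F7F7F7F          # lower seven bits of every lane
WORD_MASK = 0xFFFFFFFF    # emulate a 32-bit data path


def pack(values):
    """Pack four 8-bit values into one 32-bit word, lane 0 in the low byte."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & 0xFF) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (i * LANE_BITS)) & 0xFF for i in range(LANES)]


def subword_add(a, b):
    """Add four 8-bit lanes at once, each wrapping modulo 256.

    The lower seven bits of every lane are added with one wide addition;
    the result fits in eight bits per lane, so no carry reaches the next
    lane. The top bit of each lane is then restored with XOR.
    """
    low_sum = (a & LOW) + (b & LOW)
    return (low_sum ^ ((a ^ b) & HIGH)) & WORD_MASK


a = pack([1, 200, 255, 16])
b = pack([2, 100, 1, 240])
lanes = unpack(subword_add(a, b))
```

Each lane wraps around independently, which is what distinguishes this from an ordinary 32-bit addition, where an overflow in one byte would carry into the next.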
80

On the automated compilation of UML notation to a VLIW chip multiprocessor

Stevens, David January 2013
With more and more cores available within architectures, the process of extracting implicit and explicit parallelism from applications to fully utilise these cores is becoming complex. Implicit parallelism extraction is performed by intelligent software and hardware sections of tool chains, although these reach their theoretical limit rather quickly. For this reason, a method of allowing explicit parallelism to be performed as fast as possible has been investigated. This method enables application developers to create and synchronise parallel sections of an application at a finer-grained level than previously possible, so that smaller sections of code can be executed in parallel while still reducing overall execution time. Alongside explicit parallelism, the high-level design of applications destined for multicore systems was also investigated. As systems get larger, it becomes more difficult to design and track the full life-cycle of development. One method used to ease this process is a graphical design process that visualises the high-level designs of such systems. One drawback of graphical design is the explicit nature in which systems are required to be generated. This was investigated and, using concepts already in use in text-based programming languages, the generation of platform-independent models that can be specialised to multiple hardware architectures was developed. The explicit parallelism was performed using hardware elements for thread management, which resulted in speed-ups of over 13 times compared with threading libraries executed in software on commercially available processors. This allowed applications with large data-dependent sections to be parallelised in small sections within the code, resulting in a decrease in overall execution time.
The modelling concepts saved between 40% and 50% of the time and effort required to generate platform-specific models, while incurring an overhead of up to 15% of the execution cycles of models designed for specific architectures.
