81

On the automated compilation of UML notation to a VLIW chip multiprocessor

Stevens, David January 2013 (has links)
With more and more cores available within architectures, the process of extracting implicit and explicit parallelism from applications to fully utilise those cores is becoming complex. Implicit parallelism extraction is performed through intelligent software and hardware sections of tool chains, although these reach their theoretical limits rather quickly. For this reason, a method of allowing explicit parallelism to be expressed as fast as possible has been investigated. This method enables application developers to create and synchronise parallel sections of an application at a finer-grained level than previously possible, so that smaller sections of code can be executed in parallel while still reducing overall execution time. Alongside explicit parallelism, the high-level design of applications destined for multicore systems was also investigated. As systems grow larger, it becomes more difficult to design and track the full development life-cycle. One method used to ease this process is a graphical design process that visualises the high-level design of such systems. One drawback of graphical design is the explicit way in which systems must be generated; this was investigated, and, using concepts already in use in text-based programming languages, a method was developed for generating platform-independent models that can be specialised to multiple hardware architectures. The explicit parallelism was implemented using hardware elements to perform thread management, resulting in speed-ups of over 13 times compared with threading libraries executed in software on commercially available processors. This allowed applications with large data-dependent sections to be parallelised in small sections within the code, decreasing overall execution time.
The modelling concepts saved 40-50% of the time and effort required to generate platform-specific models, while incurring an overhead of at most 15% in execution cycles compared with models designed directly for specific architectures.
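The fine-versus-coarse grain trade-off described above can be sketched with ordinary software threading. The following is a minimal illustration only, not the thesis's hardware thread-management mechanism: the same computation is split into a few large parallel sections or many small ones, and it is the per-section creation and synchronisation cost that decides whether the finer grain pays off.

```python
from concurrent.futures import ThreadPoolExecutor

def parallel_sum(data, n_chunks):
    """Sum `data` by splitting it into n_chunks sections executed in
    parallel. More chunks = finer grain = more per-section creation
    and synchronisation overhead."""
    size = max(1, len(data) // n_chunks)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor() as pool:
        return sum(pool.map(sum, chunks))

data = list(range(1000))
coarse = parallel_sum(data, 4)     # 4 large parallel sections
fine = parallel_sum(data, 64)      # many small parallel sections
```

With software threading libraries, the overhead of the many-chunk version typically cancels much of the benefit; the thesis's point is that hardware-managed thread creation and synchronisation shrinks this overhead enough to make such small sections profitable.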
82

Simultaneous partitioning and modeling : a framework for learning from complex data

Deodhar, Meghana 11 October 2010 (has links)
While a single learned model is adequate for simple prediction problems, it may not be sufficient to represent heterogeneous populations that difficult classification or regression problems often involve. In such scenarios, practitioners often adopt a "divide and conquer" strategy that segments the data into relatively homogeneous groups and then builds a model for each group. This two-step procedure usually results in simpler, more interpretable and actionable models without any loss in accuracy. We consider prediction problems on bi-modal or dyadic data with covariates, e.g., predicting customer behavior across products, where the independent variables can be naturally partitioned along the modes. A pivoting operation can now result in the target variable showing up as entries in a "customer by product" data matrix. We present a model-based co-clustering framework that interleaves partitioning (clustering) along each mode and construction of prediction models to iteratively improve both cluster assignment and fit of the models. This Simultaneous CO-clustering And Learning (SCOAL) framework generalizes co-clustering and collaborative filtering to model-based co-clustering, and is shown to be better than independently clustering the data first and then building models. Our framework applies to a wide range of bi-modal and multi-modal data, and can be easily specialized to address classification and regression problems in domains like recommender systems, fraud detection and marketing. Further, we note that in several datasets not all the data is useful for the learning problem and ignoring outliers and non-informative values may lead to better models. We explore extensions of SCOAL to automatically identify and discard irrelevant data points and features while modeling, in order to improve prediction accuracy. 
Next, we leverage the multiple models provided by the SCOAL technique to address two prediction problems on dyadic data, (i) ranking predictions based on their reliability, and (ii) active learning. We also extend SCOAL to predictive modeling of multi-modal data, where one of the modes is implicitly ordered, e.g., time series data. Finally, we illustrate our implementation of a parallel version of SCOAL based on the Google Map-Reduce framework and developed on the open source Hadoop platform. We demonstrate the effectiveness of specific instances of the SCOAL framework on prediction problems through experimentation on real and synthetic data.
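The alternation at the heart of SCOAL can be illustrated with the simplest possible per-co-cluster model, a constant mean. This is a sketch of the general idea only, not the authors' implementation; the cluster counts, the mean-only model, and the fixed iteration count are simplifications chosen for brevity.

```python
import numpy as np

def scoal_means(Z, k, l, n_iter=20, seed=0):
    """Alternate between fitting one model per (row, col) co-cluster
    (here just the block mean) and reassigning rows/columns to the
    clusters whose models fit them best."""
    rng = np.random.default_rng(seed)
    m, n = Z.shape
    r = rng.integers(k, size=m)        # row cluster assignments
    c = rng.integers(l, size=n)        # column cluster assignments
    for _ in range(n_iter):
        # the "model" for each co-cluster: the mean of its block
        M = np.array([[Z[np.ix_(r == i, c == j)].mean()
                       if (r == i).any() and (c == j).any() else 0.0
                       for j in range(l)] for i in range(k)])
        # reassign each row to the row cluster minimising its error
        r = np.stack([((Z - M[i][c]) ** 2).sum(axis=1)
                      for i in range(k)]).argmin(axis=0)
        # then each column to its best column cluster
        c = np.stack([((Z - M[r, j][:, None]) ** 2).sum(axis=0)
                      for j in range(l)]).argmin(axis=0)
    return r, c, M

# Block-structured toy matrix: two row groups x two column groups
A = np.full((6, 6), 1.0)
A[:3, :3] = 5.0
A[3:, 3:] = 5.0
r, c, M = scoal_means(A, 2, 2)
pred = M[r][:, c]                      # model-based reconstruction
```

Replacing the block mean with a regression on covariates gives the full SCOAL setting described above.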
83

Complementing user-level coarse-grain parallelism with implicit speculative parallelism

Ioannou, Nikolas January 2012 (has links)
Multi-core and many-core systems are the norm in contemporary processor technology and are expected to remain so for the foreseeable future. Parallel programming is, thus, here to stay and programmers have to endorse it if they are to exploit such systems for their applications. Programs using parallel programming primitives like PThreads or OpenMP often exploit coarse-grain parallelism, because it offers a good trade-off between programming effort and performance gain. Some parallel applications show limited or no scaling beyond a number of cores. Given the abundant number of cores expected in future many-cores, several cores would remain idle in such cases while execution performance stagnates. This thesis proposes using cores that do not contribute to performance improvement for running implicit fine-grain speculative threads. In particular, we present a many-core architecture and protocols that allow applications with coarse-grain explicit parallelism to further exploit implicit speculative parallelism within each thread. We show that complementing parallel programs with implicit speculative mechanisms offers significant performance improvements for a large and diverse set of parallel benchmarks. Implicit speculative parallelism frees the programmer from the additional effort to explicitly partition the work into finer and properly synchronized tasks. Our results show that, for a many-core comprising 128 cores supporting implicit speculative parallelism in clusters of 2 or 4 cores, performance improves on top of the highest scalability point by 44% on average for the 4-core cluster and by 31% on average for the 2-core cluster. We also show that this approach often leads to better performance and energy efficiency compared to existing alternatives such as Core Fusion and Turbo Boost. Moreover, we present a dynamic mechanism to choose the number of explicit and implicit threads, which performs within 6% of the static oracle selection of threads.
To improve energy efficiency, processors allow Dynamic Voltage and Frequency Scaling (DVFS), which enables changing their performance and power consumption on the fly. We evaluate the amenability of the proposed explicit-plus-implicit threads scheme to traditional power management techniques for multithreaded applications and identify room for improvement. We thus augment prior schemes and introduce a novel multithreaded power management scheme that accounts for implicit threads and aims to minimize the Energy Delay² product (ED²). Our scheme comprises two components: a “local” component that tries to adapt to the different program phases on a per-explicit-thread basis, taking into account implicit thread behavior, and a “global” component that augments the local components with information regarding inter-thread synchronization. Experimental results show a reduction in ED² of 8% compared to having no power management, with an average reduction in power of 15% that comes at a minimal loss of performance of less than 3% on average.
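As a quick illustration of the ED² objective used by the power management scheme: squaring the delay biases the metric toward performance, so saving power must not cost too much time. The operating points below are hypothetical numbers invented for the sketch, not measurements from the thesis.

```python
def ed2(energy_joules, delay_seconds):
    """Energy * Delay^2: the squared delay term means a config that
    saves power but runs much slower scores worse than a faster,
    hungrier one."""
    return energy_joules * delay_seconds ** 2

# Hypothetical DVFS operating points: (average power in W, runtime in s)
points = {"low": (10.0, 2.0), "mid": (18.0, 1.3), "high": (40.0, 1.0)}
# energy = power * runtime; pick the point minimising ED^2
best = min(points, key=lambda k: ed2(points[k][0] * points[k][1],
                                     points[k][1]))
```

Here the "mid" point wins: "low" is cheap in power but its doubled runtime is punished quadratically, while "high" burns too much energy for its speed gain.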
84

The Effect of Psychometric Parallelism among Predictors on the Efficiency of Equal Weights and Least Squares Weights in Multiple Regression

Zhang, Desheng 05 1900 (has links)
There are several conditions for applying equal weights as an alternative to least squares weights. Psychometric parallelism, one of these conditions, has been suggested as a necessary and sufficient condition for equal-weights aggregation. The purpose of this study is to investigate the effect of psychometric parallelism among predictors on the efficiency of equal weights and least squares weights. Target correlation matrices with 10,000 cases were simulated so that the matrices had varying degrees of psychometric parallelism. Five hundred samples were drawn from each population at each of six observation-to-predictor ratios: 5/1, 10/1, 20/1, 30/1, 40/1, and 50/1. Efficiency is interpreted as the accuracy and the predictive power estimated by the weighting methods. Accuracy is defined by the deviation between the population R² and the sample R². Predictive power refers to the population cross-validated R² and the population mean square error of prediction. The findings indicate there is no statistically significant relationship between the level of psychometric parallelism and the accuracy of least squares weights. In contrast, the correlation between the level of psychometric parallelism and the accuracy of equal weights is significantly negative. The minimum p value of the χ² test for psychometric parallelism among predictors required for equal weights to outperform least squares weights differs across conditions: the higher the number of predictors, the higher the minimum p value; the higher the ratio of observations to predictors, the higher the minimum p value; and the higher the magnitude of intercorrelations among predictors, the lower the minimum p value. This study demonstrates that the most frequently used levels of significance, 0.05 and 0.01, are no longer the only p values for testing the null hypothesis of psychometric parallelism among predictors when replacing least squares weights with equal weights.
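The contrast between the two weighting methods can be reproduced in a few lines. The sketch below simulates an idealised psychometrically parallel population (equal true weights, equal error variances, hypothetical sample sizes) at the 5/1 observation-to-predictor ratio; it is illustrative only, not the study's simulation design.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n_train, n_test = 6, 30, 5000        # observation:predictor ratio 5/1

# Psychometrically parallel predictors: equal true weights on
# standardised predictors, equal error variance.
X_tr = rng.standard_normal((n_train, p))
X_te = rng.standard_normal((n_test, p))
beta = np.ones(p)
y_tr = X_tr @ beta + rng.standard_normal(n_train)
y_te = X_te @ beta + rng.standard_normal(n_test)

def r2(y, yhat):
    return 1 - ((y - yhat) ** 2).sum() / ((y - y.mean()) ** 2).sum()

b_ls, *_ = np.linalg.lstsq(X_tr, y_tr, rcond=None)  # least squares weights
b_eq = np.ones(p)                                   # equal weights

r2_train_ls, r2_train_eq = r2(y_tr, X_tr @ b_ls), r2(y_tr, X_tr @ b_eq)
r2_test_ls, r2_test_eq = r2(y_te, X_te @ b_ls), r2(y_te, X_te @ b_eq)
```

Least squares always wins on the training sample by construction; under parallelism with a small sample, the estimated weights overfit, so the equal weights hold up well out of sample, which is the study's scenario.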
85

Political parallelism of media content

Rabitsch Adamčíková, Jitka January 2015 (has links)
The aim of this thesis is a critical analysis of the political influence of Andrej Babiš through his ownership of the Czech daily Mladá fronta DNES. The purpose of the work is to examine political content in the Czech media, especially in relation to elections, viewed through the lens of political parallelism and complemented by an introduction to political communication. The methodological outline is followed by an analytical part using quantitative content analysis, which focuses on the presentation of four political parties in the daily. The thesis tests whether the media portrayal of competing political parties during the 2014 Czech local elections was influenced by the fact that the entrepreneur and politician Andrej Babiš owns Mladá fronta DNES.
86

Analysis of Interface Automata with On-Demand Replication

Daniel, Jakub January 2013 (has links)
An interface automaton is a model of software component behaviour based on finite state machines. It describes a component's provided interface (the usage it supports) and its required interface (its usage of other components). A considerable number of components can be used in parallel, with no bound on the level of parallelism, but the model need not attempt to capture such unboundedness explicitly. An alternative approach is to allow the level of parallelism to be incremented on demand. This thesis analyses such on-demand replication at a theoretical level and proposes a final form of the replication operation, allowing models to express an arbitrary level of parallelism in certain parts of their behaviour.
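A toy version of the replication idea: treat an interface automaton as a labelled transition system and compose n interleaved copies of it, one per parallel client. This sketch ignores the input/output distinction and the compatibility checks of real interface automata; the lock example and the bracketed copy indices are invented for illustration.

```python
from itertools import product

# Toy automaton: states, initial state, and transitions labelled with
# actions ('?' marks an action on the provided interface).
lock = {
    "states": {"free", "held"},
    "init": "free",
    "trans": {("free", "acquire?"): "held", ("held", "release?"): "free"},
}

def replicate(a, n):
    """Interleaving composition of n copies of automaton `a`: each
    copy advances independently, modelling n unsynchronised parallel
    clients. States of the result are n-tuples of the copies' states."""
    states = set(product(a["states"], repeat=n))
    trans = {}
    for s in states:
        for i in range(n):
            for (q, act), q2 in a["trans"].items():
                if s[i] == q:
                    t = s[:i] + (q2,) + s[i + 1:]
                    trans[(s, f"{act}[{i}]")] = t
    return {"states": states, "init": (a["init"],) * n, "trans": trans}

two = replicate(lock, 2)   # two parallel clients of the lock
```

On-demand replication amounts to increasing n only when an additional parallel client is actually needed, instead of building the unbounded model up front.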
87

An alternative model for concurrent programming in Lua

Skyrme, Alexandre Rupert Arpini 23 July 2008 (has links)
The popularization of multi-core processors and of technologies such as hyper-threading indicates a different approach to the evolution of processors. This new approach brings about an increased interest in concurrent programming and in the exploitation of parallelism to achieve better performance. However, the concurrent programming models now in use are subject to recurring criticism, which stimulates the development of alternative proposals. This work presents a critical analysis of preemptive multithreading with shared memory, a widely used model for concurrent programming, and briefly summarizes studies that deal with alternatives for concurrent programming. It then proposes a model for concurrent programming structured with the Lua programming language and describes its main characteristics and advantages. Finally, it presents the results of an evaluation of several aspects of a library developed to implement the proposed model.
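Alternatives to preemptive multithreading with shared memory, such as the model this thesis proposes for Lua, are commonly built on message passing between isolated threads of execution. The following is a rough sketch of that style in another language, not the thesis's library or its API: each worker owns its own state and communicates only through channels.

```python
import threading
import queue

def worker(inbox, outbox):
    """Each 'process' owns its state (total); no memory is shared,
    so no locks are needed -- communication happens only through
    message channels."""
    total = 0
    while True:
        msg = inbox.get()
        if msg is None:          # sentinel: report result and stop
            outbox.put(total)
            return
        total += msg

inbox, outbox = queue.Queue(), queue.Queue()
t = threading.Thread(target=worker, args=(inbox, outbox))
t.start()
for x in [1, 2, 3]:
    inbox.put(x)                 # send work as messages
inbox.put(None)
t.join()
result = outbox.get()
```

Because the worker's state is reachable only via its inbox, the data races and lock-ordering problems criticised in the shared-memory model cannot arise by construction.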
88

Improving performance on NUMA systems

Lepers, Baptiste 24 January 2014 (has links)
Modern multicore systems are based on a Non-Uniform Memory Access (NUMA) design. In a NUMA system, cores are grouped into a set of nodes.
Each node has a memory controller and is interconnected with other nodes using high-speed interconnect links. Efficiently exploiting such architectures is notoriously complex for programmers. Two key objectives on NUMA multicore machines are to limit as much as possible the number of remote memory accesses (i.e., accesses from one node to another) and to avoid contention on memory controllers and interconnect links. These objectives can be achieved by implementing application-level optimizations or by implementing application-agnostic heuristics. However, in many cases, existing profilers do not provide enough information to help programmers implement application-level optimizations, and existing application-agnostic heuristics fail to address contention issues. The contributions of this thesis are twofold. First, we present MemProf, a profiler that allows programmers to choose and implement efficient application-level optimizations for NUMA systems. MemProf builds temporal flows of interactions between threads and objects, which help programmers understand why and which memory objects are accessed remotely. We evaluate MemProf on Linux on three different machines and show how it helps us choose and implement efficient optimizations, unlike existing profilers. These optimizations provide significant performance gains (up to 2.6x) while requiring very lightweight modifications (10 lines of code or less). Second, we present Carrefour, an application-agnostic memory management algorithm. Contrary to existing heuristics, Carrefour focuses on traffic contention on memory controllers and interconnect links. Carrefour provides significant performance gains (up to 3.3x) and always performs better than existing heuristics.
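An application-agnostic heuristic in the spirit of Carrefour can be caricatured as a per-page decision rule driven by access counters. The three policies below (migrate, replicate, interleave) are the standard NUMA toolbox, but the decision logic and the read-ratio threshold here are invented for illustration and are not the algorithm evaluated in the thesis.

```python
def choose_policy(accesses_per_node, read_ratio):
    """Pick a page placement policy from hypothetical hardware
    counters: per-node access counts and the fraction of reads."""
    active = [a for a in accesses_per_node if a > 0]
    if not active:
        return "leave"        # page not accessed: nothing to do
    if len(active) == 1:
        return "migrate"      # one node dominates: move the page there
    if read_ratio >= 0.95:    # invented threshold
        return "replicate"    # read-mostly and shared: copy per node
    return "interleave"       # write-shared: spread pages round-robin
```

Migration removes remote accesses, replication removes them for read-mostly pages at the cost of memory, and interleaving trades remote accesses for reduced contention on any single memory controller, which is Carrefour's focus.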
89

Static analysis for the parallelization of single assignment languages for distributed memory systems

Nakashima, Raul Junji 24 September 2001 (has links)
This work describes static compiler analysis techniques, based on linear algebra and linear programming, for optimizing the distribution of forall loops and of array elements in programs written in the SISAL programming language for distributed-memory parallel machines. In the alignment phase, we use hyperplane alignment to identify the portions of different arrays that must be distributed together. The partitioning phase breaks the computation and its data into independent parts, as evenly as possible, using two affine functions: a data decomposition function and a computation decomposition function. The last phase, mapping, distributes the computation elements onto the available processing elements by means of a set of inequalities.
These techniques were implemented in a SISAL compiler; they can be applied unchanged to other single-assignment languages and, with the addition of dependence analysis, to imperative languages as well.
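The mapping phase's set of inequalities has a simple closed form in the common special case of a block distribution. The helper below is an illustrative sketch of that case only, not the compiler's actual mapping functions.

```python
def block_owner(i, n, p):
    """Processor owning element i of an n-element array that is
    block-distributed over p processors. Equivalent to solving the
    inequalities  rank*size <= i < (rank+1)*size  for rank."""
    size = -(-n // p)              # ceil(n / p) elements per block
    return i // size

def local_indices(rank, n, p):
    """Array elements stored locally on a given processor."""
    size = -(-n // p)
    return range(rank * size, min((rank + 1) * size, n))

# 10 elements over 3 processors: blocks of size 4, 4 and 2
owners = [block_owner(i, 10, 3) for i in range(10)]
```

Aligning two arrays then amounts to choosing decompositions so that elements used together get the same owner, which keeps the forall loop iterations free of remote accesses.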
90

Design of a Java-like language for a parallel environment - JAPAR

Traina, Antônio Fernando 10 March 2000 (has links)
With the increasing number of computer users, new tools have been developed to improve the efficiency of computers and to make automated resources available to those users. More recently, machines connected in computer networks and the Internet phenomenon have made tools specific to this kind of use necessary. Among the main responses to these needs is the Java language, which has been attracting users in both the scientific and commercial communities.
At the same time, as computer networks become more popular, problems concerning network access and connections have emerged, making it necessary to look for alternative ways of handling networked computer systems. Parallel architectures and languages are among the proposed solutions. These tools are still in an experimental phase, and further study and research are needed to confirm their feasibility. In this work we investigate the application of parallelism concepts in languages for networks, in particular the Java language. The aim is to study possible approaches for exploring the parallel-language paradigm in Java environments. A survey of the main parallel languages available in the literature is presented, in order to identify the best solutions they propose. The work also presents an investigation of the Java tools available on the market. Finally, a new environment is proposed that makes the best resources of the Java language available to users, exploiting the best solutions found in the literature.
