261 |
Adaptive tiling algorithm based on highly correlated picture regions for the HEVC standard / Algoritmo de tiling adaptativo baseado em regiões altamente correlacionadas de um quadro para o padrão de codificação de vídeos de alta eficiência
Silva, Cauane Blumenberg, January 2014
This Master's thesis proposes an adaptive algorithm that dynamically chooses suitable tile partitions for intra- and inter-predicted frames in order to reduce the impact on coding efficiency that arises from such partitioning. Tiles are novel parallelism-oriented tools of the High Efficiency Video Coding (HEVC) standard that divide the frame into independent rectangular regions which can be processed in parallel. To enable this parallelism, tiles break the coding dependencies across their boundaries, which harms coding efficiency. The impact can be even higher if tile boundaries split highly correlated picture regions, because most coding tools use context information during the encoding process. Hence, the proposed algorithm clusters highly correlated picture regions inside the same tile to reduce the coding efficiency impact inherent in the use of tiles. To locate the highly correlated picture regions in an informed way, image characteristics and encoding information are analyzed, generating partitioning maps that serve as the algorithm's input. Based on these maps, the algorithm locates the natural context breaks of the picture and defines the tile boundaries at these regions. This way, the dependency breaks caused by tile boundaries match the natural context breaks of the picture, minimizing the coding efficiency losses caused by the use of tiles. The proposed adaptive tiling algorithm, in some cases, provides over 0.4% and over 0.5% of BD-rate savings for intra- and inter-predicted frames respectively, when compared to uniform-spaced tiles, an approach that does not consider the picture context when defining the tile partitions.
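To make the idea concrete, here is a minimal sketch (with hypothetical names and a hypothetical cost heuristic, not the algorithm from the thesis) of choosing vertical tile boundaries at the least-active picture columns so that tile borders fall on natural context breaks:

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // Hypothetical sketch: given a per-CTU-column activity measure (e.g.
    // gradient energy), place the numTiles - 1 vertical tile boundaries at
    // the least-active columns, so tile borders coincide with the frame's
    // natural context breaks. Assumes numTiles >= 1.
    std::vector<std::size_t> pickTileBoundaries(const std::vector<double>& activity,
                                                std::size_t numTiles) {
        std::vector<std::size_t> cols(activity.size());
        for (std::size_t i = 0; i < cols.size(); ++i) cols[i] = i;
        // Candidate columns sorted by increasing activity (weakest correlation first).
        std::sort(cols.begin(), cols.end(), [&](std::size_t a, std::size_t b) {
            return activity[a] < activity[b];
        });
        const std::size_t nBounds = std::min(numTiles - 1, cols.size());
        std::vector<std::size_t> bounds(cols.begin(), cols.begin() + nBounds);
        std::sort(bounds.begin(), bounds.end());  // boundaries in left-to-right order
        return bounds;
    }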
|
262 |
Idiom-driven innermost loop vectorization in the presence of cross-iteration data dependencies in the HotSpot C2 compiler / Idiomdriven vektorisering av inre loopar med databeroenden i HotSpots C2 kompilator
Sjöblom, William, January 2020
This thesis presents a technique for automatic vectorization of innermost single-statement loops with a cross-iteration data dependence, achieved by analyzing data-flow to recognize frequently recurring program idioms. Recognition is carried out by matching the circular SSA data-flow found around the loop body's φ-function against several primitive patterns, forming a tree representation of the relevant data-flow that is then pruned down to a single parameterized node. This node provides a high-level specification of the data-flow idiom at hand, which guides an algorithmic replacement applied to the intermediate representation. The versatility of the technique is shown by presenting an implementation that supports vectorization of both a limited class of linear recurrences and prefix sums, where the latter shows how the technique generalizes to intermediate representations with memory state in SSA form. Finally, a thorough performance evaluation is presented, showing the effectiveness of the vectorization technique.
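For concreteness, the prefix-sum idiom the recognizer targets looks like the first loop below: the value of sum flows around the loop through the φ-function, creating the cross-iteration dependence. The blocked rewrite after it is one way such a loop can be made vector-friendly; this is an illustrative sketch, not the C2 transformation itself.

    #include <algorithm>
    #include <cstddef>
    #include <vector>

    // The scalar idiom: `sum` carries a cross-iteration dependence, i.e. the
    // value circulating through the loop header's phi-function in SSA form.
    // Assumes out.size() >= in.size().
    void prefixSumScalar(const std::vector<int>& in, std::vector<int>& out) {
        int sum = 0;
        for (std::size_t i = 0; i < in.size(); ++i) {
            sum += in[i];
            out[i] = sum;
        }
    }

    // One vector-friendly rewrite: compute a block-local prefix sum (amenable
    // to log-step in-register shifts), then carry each block's total forward.
    void prefixSumBlocked(const std::vector<int>& in, std::vector<int>& out,
                          std::size_t block = 8) {
        int carry = 0;
        for (std::size_t base = 0; base < in.size(); base += block) {
            const std::size_t end = std::min(base + block, in.size());
            int local = 0;
            for (std::size_t i = base; i < end; ++i) {
                local += in[i];
                out[i] = carry + local;
            }
            carry += local;
        }
    }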
|
263 |
Detektor plagiátů textových dokumentů / Text document plagiarism detector
Kořínek, Lukáš, January 2021
This diploma thesis is concerned with research on available methods of plagiarism detection and with the design and implementation of such a detector. Its primary aim is to detect plagiarism within academic works and theses issued at BUT. The detector uses sophisticated preprocessing algorithms to store documents in its own corpus (document database). The implemented comparison algorithms are designed for parallel execution on graphics processing units; they compare a single subject document against all other documents within the corpus in the shortest time possible, enabling near real-time detection while maintaining acceptable output quality.
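The abstract does not spell out the comparison algorithm, so the following is only a hedged sketch of one common approach, with hypothetical names: documents reduced to sets of hashed word n-grams, and the subject document scored against every corpus document independently, which is exactly the shape of work that maps well onto a GPU (one thread block per corpus document).

    #include <cstdint>
    #include <unordered_set>
    #include <vector>

    // Jaccard similarity of two fingerprint sets (hashed word n-grams).
    double jaccard(const std::unordered_set<std::uint64_t>& a,
                   const std::unordered_set<std::uint64_t>& b) {
        if (a.empty() && b.empty()) return 0.0;
        std::size_t inter = 0;
        for (std::uint64_t h : a) inter += b.count(h);
        return static_cast<double>(inter) /
               static_cast<double>(a.size() + b.size() - inter);
    }

    // Each corpus document is scored independently; on a GPU this loop
    // becomes a grid of thread blocks, one per document.
    std::vector<double> scoreAgainstCorpus(
            const std::unordered_set<std::uint64_t>& subject,
            const std::vector<std::unordered_set<std::uint64_t>>& corpus) {
        std::vector<double> scores(corpus.size());
        for (std::size_t d = 0; d < corpus.size(); ++d)
            scores[d] = jaccard(subject, corpus[d]);
        return scores;
    }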
|
264 |
Paralelizace ultrazvukových simulací pomocí akcelerátoru Intel Xeon Phi / Parallelisation of Ultrasound Simulations on Intel Xeon Phi Accelerator
Vrbenský, Andrej, January 2015
Nowadays, the simulation of ultrasound acoustic waves has a wide range of practical uses, one of which is simulation in realistic tissue media, successfully applied in medicine. Several software applications are dedicated to performing such simulations; k-Wave is one of them. The simulation itself is computationally very demanding, which leaves room for exploring new speed-up methods. In this master's thesis, we propose a way to speed up the simulation by parallelizing it on the Intel Xeon Phi accelerator. The accelerator contains a large number of cores and an extra-wide vector unit, and is therefore well suited to parallelization and vectorization. The implementation uses OpenMP 4.0, which brings new options such as explicit vectorization. Results were measured in extensive experiments.
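As a minimal illustration of the OpenMP 4.0 features mentioned (offload to the accelerator plus explicit vectorization), not the actual k-Wave kernels:

    #include <cstddef>

    // Minimal sketch, not the k-Wave code: OpenMP 4.0 `target` offloads the
    // loop to the Xeon Phi, and `simd` requests explicit vectorization.
    void scaleAndAdd(float* p, const float* q, float alpha, std::size_t n) {
        #pragma omp target map(tofrom: p[0:n]) map(to: q[0:n])
        #pragma omp parallel for simd
        for (std::size_t i = 0; i < n; ++i)
            p[i] += alpha * q[i];
    }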
|
265 |
A la recherche de la haute performance pour les codes de calcul et la visualisation scientifique / Searching for the highest performance for simulation codes and scientific visualization
Colin de Verdière, Guillaume, 16 October 2019
This thesis aims to demonstrate that, in a high-performance computing (HPC) context, algorithms and coding cannot be envisioned without taking into account the hardware at the core of supercomputers, since those machines evolve dramatically over time. After setting out a few definitions relating to scientific codes and parallelism, we show that analyzing the different generations of supercomputers used at CEA over the past 30 years brings out a number of attention points and best practices for code developers. Based on several experiments, we show how to aim for code performance suited to supercomputers, and how to pursue portable and even extreme performance in the world of massive parallelism, with or without GPUs. We explain that graphical post-processing software and hardware follow the same parallelism principles as large scientific codes, which requires mastering a global view of the simulation chain. Last, we describe the trends and constraints that will be imposed on the design of future exascale-class supercomputers, which will in turn impact the development of the next generations of scientific codes.
|
266 |
Evaluation of Machine Learning Primitives on a Digital Signal Processor
Engström, Vilhelm, January 2020
Modern handheld devices rely on specialized hardware for evaluating machine learning algorithms. This thesis investigates the feasibility of using the digital signal processor, which is part of the device's modem, as an alternative to this specialized hardware. Memory management techniques and implementations for evaluating the machine learning primitives convolutional, max-pooling and fully connected layers are proposed. The implementations are evaluated by the degree to which they utilize the available hardware units. New instructions for packing data and facilitating instruction pipelining are suggested and evaluated. The results show that convolutional and fully connected layers are well suited to the processor used. The aptness of the convolutional layer depends on the kernel being applied with a stride of 1, as larger strides cause hardware utilization to plummet. Max-pooling layers, while not ill-suited, are the most limited in terms of hardware utilization. The proposed instructions are shown to have positive effects on the throughput of the implementations.
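The stride effect is easy to see in a scalar model of the convolutional layer. The sketch below (illustrative, not the thesis code) shows that vectorizing across outputs loads in[o * stride + k] for consecutive o, which is a unit-stride vector load only when the stride is 1:

    #include <cstddef>
    #include <vector>

    // 1-D convolution; assumes stride >= 1 and in.size() >= kernel.size().
    // When vectorizing over the output index o, the loads in[o * stride + k]
    // for consecutive o are contiguous only for stride == 1; larger strides
    // force strided/gather loads that leave vector lanes underused.
    std::vector<float> conv1d(const std::vector<float>& in,
                              const std::vector<float>& kernel,
                              std::size_t stride) {
        const std::size_t nOut = (in.size() - kernel.size()) / stride + 1;
        std::vector<float> out(nOut, 0.0f);
        for (std::size_t o = 0; o < nOut; ++o)
            for (std::size_t k = 0; k < kernel.size(); ++k)
                out[o] += in[o * stride + k] * kernel[k];
        return out;
    }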
|
267 |
Solutions parallèles pour les grands problèmes de valeurs propres issus de l'analyse de graphe / Parallel solutions for large-scale eigenvalue problems arising in graph analytics
Fender, Alexandre, 13 December 2017
Graphs, or networks, are mathematical structures that represent relations between elements. These systems can be analyzed to extract information about their overall structure or about individual components. The analysis of networks often results in problems of high complexity; at large scale, the exact solution is prohibitively expensive to compute. Fortunately, this is an area where iterative approximation methods can be employed to find accurate estimations. Historical methods suitable for a small number of variables do not scale to the large, sparse matrices arising in graph applications; therefore, the design of scalable, efficient, and reliable solvers remains an essential problem. Simultaneously, the emergence of parallel architectures such as the GPU has brought remarkable improvements in both computing power and energy efficiency. In this dissertation, we focus on solving large eigenvalue problems arising in network analytics with the goal of efficiently utilizing parallel architectures. We revisit spectral graph analysis theory and propose novel parallel algorithms and implementations. Experimental results indicate substantial improvements on real and large applications in the context of ranking and clustering problems, such as community detection and popularity indicators.
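As an example of the kind of iterative approximation method involved, here is a hedged sketch of power iteration for a PageRank-style ranking score; the CSR layout and the damping factor are common conventions assumed here, not details taken from the thesis:

    #include <cstddef>
    #include <vector>

    // Sparse matrix in CSR form (rowPtr has n + 1 entries).
    struct Csr {
        std::vector<std::size_t> rowPtr, colIdx;
        std::vector<double> val;
    };

    // Power iteration converges to the dominant eigenvector of the damped
    // transition matrix: the PageRank-style scores. The inner row loop is
    // independent per row, which is what a GPU implementation parallelizes.
    std::vector<double> powerIteration(const Csr& A, std::size_t n,
                                       int iters = 100, double damping = 0.85) {
        std::vector<double> x(n, 1.0 / n), y(n);
        for (int it = 0; it < iters; ++it) {
            for (std::size_t i = 0; i < n; ++i) {
                double s = 0.0;
                for (std::size_t j = A.rowPtr[i]; j < A.rowPtr[i + 1]; ++j)
                    s += A.val[j] * x[A.colIdx[j]];
                y[i] = damping * s + (1.0 - damping) / n;  // teleportation term
            }
            x.swap(y);
        }
        return x;
    }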
|
268 |
Návrh síťových aplikací na platformě NetCOPE / Design of Network Applications for a NetCOPE Platform
Hank, Andrej, January 2009
Monitoring and securing multigigabit networks at speeds of 1-100 Gb/s requires hardware acceleration. The NetCOPE platform for rapid development of network applications uses a hardware acceleration card with FPGA technology by means of hardware/software co-design. Increasing the performance of the platform's software part depends on parallel processing, so that applications can take advantage of multiple processor cores. This thesis analyses the NetCOPE platform architecture and the possibilities of parallelizing classic network applications, and creates models of concurrent access to data on the NetCOPE platform that utilize multiple processor cores. These models are subsequently implemented as extensions to the platform's Linux system drivers, and userspace libraries are created to provide a simple interface through which applications can use the new features. Several optimizations are performed to achieve high throughput, and the results are measured with purpose-built testing tools.
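One way to picture such a concurrent-access model is flow-hash steering into per-core queues, so each core consumes its own packet stream without locks. The names and the hash below are hypothetical illustrations, not the NetCOPE driver interface:

    #include <cstdint>
    #include <vector>

    struct Packet { std::uint32_t srcIp, dstIp; std::uint16_t srcPort, dstPort; };

    // Hash the flow tuple so all packets of one flow land on the same core.
    std::size_t queueFor(const Packet& p, std::size_t nCores) {
        std::uint32_t h = p.srcIp ^ p.dstIp ^
                          ((std::uint32_t(p.srcPort) << 16) | p.dstPort);
        h ^= h >> 16; h *= 0x45d9f3b; h ^= h >> 16;  // simple integer mixing
        return h % nCores;
    }

    // Steer a received burst into per-core queues; each core then processes
    // its own queue independently, with no shared locks on the fast path.
    void dispatch(const std::vector<Packet>& burst,
                  std::vector<std::vector<Packet>>& perCoreQueues) {
        for (const Packet& p : burst)
            perCoreQueues[queueFor(p, perCoreQueues.size())].push_back(p);
    }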
|
269 |
An Internal Representation for Adaptive Online Parallelization
Rehme, Koy D., 29 May 2009
Future computer processors may have tens or hundreds of cores, increasing the need for efficient parallel programming models. The nature of multicore processors will present applications with the challenge of diversity: a variety of operating environments, architectures, and data will be available, and the compiler will have no foreknowledge of the environment until run time. Adaptive Online Parallelization (ADOPAR) is a unifying framework that attempts to overcome diversity by separating the discovery and the packaging of parallelism. Scheduling for execution may then occur at run time, when diversity may best be resolved. This work presents a compact representation of parallelism based on the task graph programming model, tailored especially for ADOPAR and for regular and irregular parallel computations. Task graphs can be unmanageably large for fine-grained parallelism. Rather than representing each task individually, similar tasks are grouped into task descriptors. From these, a task descriptor graph may be formed, with relationship descriptors forming the edges of the graph. While even highly irregular computations often have structure, previous representations have chosen to restrict what can be easily represented, thus limiting full exploitation by the back end. Therefore, in this work, task and relationship descriptors have been endowed with instantiation functions (methods of descriptors that act as factories) so that the front end has a full range of expression when describing the task graph. The representation uses descriptors to express a full range of regular and irregular computations in a very flexible and compact manner, and it also allows for dynamic optimization and transformation, which assists ADOPAR in its goal of overcoming various forms of diversity. We have successfully implemented this representation using new compiler intrinsics, allowing ADOPAR schedulers to operate on the described task graph for parallel execution, and we demonstrate the low code size overhead and the necessity of native schedulers.
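A hedged sketch of the representation's central idea follows: instead of storing one node per fine-grained task, a descriptor stands for a whole family of similar tasks and carries an instantiation function (a factory) that the scheduler calls at run time to materialize task i. All names are illustrative, not ADOPAR's actual interface.

    #include <cstddef>
    #include <functional>
    #include <string>
    #include <vector>

    struct Task { std::string kind; std::size_t index; };

    // One descriptor node stands for `count` similar tasks; the instantiation
    // function materializes task i on demand at run time.
    struct TaskDescriptor {
        std::string kind;
        std::size_t count;
        std::function<Task(std::size_t)> instantiate;
    };

    // Edges of the task descriptor graph: task i of one descriptor depends on
    // task dep(i) of another, expressed as a function rather than stored edges.
    struct RelationshipDescriptor {
        std::size_t fromDesc, toDesc;
        std::function<std::size_t(std::size_t)> dep;
    };

    // Example factory: a family of per-row update tasks described compactly.
    TaskDescriptor makeRowTasks(std::size_t rows) {
        return { "row-update", rows,
                 [](std::size_t i) { return Task{ "row-update", i }; } };
    }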
|
270 |
Scalable Extraction and Visualization of Scientific Features with Load-Balanced ParallelismXu, Jiayi January 2021 (has links)
No description available.
|