Global ETD Search

51	Syntéza a optimalizace polymorfních obvodů / Polymorphic circuits synthesis and optimization Crha, Adam January 2021 (has links) Tato práce se zabývá metodami logické syntézy a optimalizací pro polymorfní obvody. V práci jsou jak diskutovány existující metody pro konvenční obvody, tak i představeny nové metody, aplikovatelné na polymorfní elektroniku. Hlavním přínosem práce je představení nových metod optimalizace a logické syntézy pro polymorfní obvody. Přesto, že v minulých letech byly představeny metody pro návrh polymorfních obvodů, jsou tyto metody založené na evolučních technikách nebo nejsou dobře škálovatelné. Z toho vyplývá, že stále neexistuje stabilní metodika pro návrh složitějších polymorfních obvodů. Tato práce představuje zejména reprezentaci polymorgních obvodů a metodiku pro jejich návrh založenou na And-Inverter grafech. Na polymorfní obvody reprezentované pomocí AIG je možné aplikovat známé techniky jako například přepisování [rewriting]. Nasazením techniky přepisování na polymorfní AIG získáme obvod, obsahující polymorfní prvky uvnitř obvodu, a je možné dosáhnout značných úspor prostředků, které mohou být sdíleny mezi dvěma funkcemi současně. Ověření návrhové metodiky pro polymorfní obvody bylo provedeno nad sadou veřejně dostupných obvodů, čímž je demonstrována efektivita metodiky.
52	Modelování silových účinků působících na dopravní a manipulační zařízení s cílem jejich optimalizace / Modelling of Load Impacts Acting on Transport and Handling Equipments with the Aim of their Optimization Šťastný, Antonín January 2015 (has links) This PhD thesis deals with employment of the state of the art methods of mathematical optimization and structural analysis in the field of load carrying steel structures of handling devices. The goal of the thesis is to compile a methodology which enables generating of optimal dimensions of conceptually designed load carrying parts of handling devices. The proposed methodology is composed of sub-methods which are successively applied to find an optimal configuration of structure according to a chosen criterion. The methodology incorporates sub-methods such as Design of Experiments, parametric finite-element modelling, the state of the art computational methods for stability assessment, mathematical approximation methods and state of the art optimization schemes based of both, heuristic and gradient principle. Recommendations from Eurocode 3 are used to introduce imperfections to the finite element model in order to perform the nonlinear buckling analysis. The practical part of this thesis is focused on optimization of welded beams. The principle of the methodology is in detail explained and demonstrated on an example of lifting spreader beam of load carrying capacity of 20 tons. The proposed methodology is practically realized by an algorithm created in Matlab software. Matlab is also utilized to implement some sub-methods including mathematical optimization schemes. Both, gradient and heuristic optimization algorithms are used for comparison and mutual verification. Structural analysis is performed by means of parametrical finite-element models which are built in the Ansys Parametric Design Language (APDL). The methodology takes into account buckling, which is inherent to thin walled structures under compressive load. The buckling analysis is performed by means of both, linear and non-linear procedures in Ansys. The output of the algorithm is an optimized configuration of the structure, which minimizes the objective function and complies with all requirements implemented in the form of design constraints.
53	Improving Energy Efficiency of Network-on-Chips Using Emerging Wireless Technology and Router Optimizations DiTomaso, Dominic F. 25 July 2012 (has links) No description available. Computer Engineering Network-on-Chip Wireless Interconnects Channel Buffers Crossbars Power-efficient NoC CMP Router Optimizations Energy efficient
54	Resource-efficient and fast Point-in-Time joins for Apache Spark : Optimization of time travel operations for the creation of machine learning training datasets / Resurseffektiva och snabba Point-in-Time joins i Apache Spark : Optimering av tidsresningsoperationer för skapande av träningsdata för maskininlärningsmodeller Pettersson, Axel January 2022 (has links) A scenario in which modern machine learning models are trained is to make use of past data to be able to make predictions about the future. When working with multiple structured and time-labeled datasets, it has become a more common practice to make use of a join operator called the Point-in-Time join, or PIT join, to construct these datasets. The PIT join matches entries from the left dataset with entries of the right dataset where the matched entry is the row whose recorded event time is the closest to the left row’s timestamp, out of all the right entries whose event time occurred before or at the same time of the left event time. This feature has long only been a part of time series data processing tools but has recently received a new wave of attention due to the rise of the popularity of feature stores. To be able to perform such an operation when dealing with a large amount of data, data engineers commonly turn to large-scale data processing tools, such as Apache Spark. However, Spark does not have a native implementation when performing these joins and there has not been a clear consensus by the community on how this should be achieved. This, along with previous implementations of the PIT join, raises the question: ”How to perform fast and resource efficient Pointin- Time joins in Apache Spark?”. To answer this question, three different algorithms have been developed and compared for performing a PIT join in Spark in terms of resource consumption and execution time. These algorithms were benchmarked using generated datasets using varying physical partitions and sorting structures. Furthermore, the scalability of the algorithms was tested by running the algorithms on Apache Spark clusters of varying sizes. The results received from the benchmarks showed that the best measurements were achieved by performing the join using Early Stop Sort-Merge Join, a modified version of the regular Sort-Merge Join native to Spark. The best performing datasets were the datasets that were sorted by timestamp and primary key, ascending or descending, using a suitable number of physical partitions. Using this new information gathered by this project, data engineers have been provided with general guidelines to optimize their data processing pipelines to be able to perform more resource-efficient and faster PIT joins. / Ett vanligt scenario för maskininlärning är att träna modeller på tidigare observerad data för att för att ge förutsägelser om framtiden. När man jobbar med ett flertal strukturerade och tidsmärkta dataset har det blivit vanligare att använda sig av en join-operator som kallas Point-in-Time join, eller PIT join, för att konstruera dessa datauppsättningar. En PIT join matchar rader från det vänstra datasetet med rader i det högra datasetet där den matchade raden är den raden vars registrerade händelsetid är närmaste den vänstra raden händelsetid, av alla rader i det högra datasetet vars händelsetid inträffade före eller samtidigt som den vänstra händelsetiden. Denna funktionalitet har länge bara varit en del av datahanteringsverktyg för tidsbaserad data, men har nyligen fått en ökat popularitet på grund av det ökande intresset för feature stores. För att kunna utföra en sådan operation vid hantering av stora mängder data vänder sig data engineers vanligvis till storskaliga databehandlingsverktyg, såsom Apache Spark. Spark har dock ingen inbyggd implementation för denna join-operation, och det finns inte ett tydligt konsensus från Spark-rörelsen om hur det ska uppnås. Detta, tillsammans med de tidigare implementationerna av PIT joins, väcker frågan: ”Vad är det mest effektiva sättet att utföra en PIT join i Apache Spark?”. För att svara på denna fråga har tre olika algoritmer utvecklats och jämförts med hänsyn till resursförbrukning och exekveringstid. För att jämföra algoritmerna, exekverades de på genererade datauppsättningar med olika fysiska partitioner och sorteringstrukturer. Dessutom testades skalbarheten av algoritmerna genom att köra de på Spark-kluster av varierande storlek. Resultaten visade att de bästa mätvärdena uppnåddes genom att utföra operationen med algoritmen early stop sort-merge join, en modifierad version av den vanliga sort-merge join som är inbyggd i Spark, med en datauppsättning som är sorterad på tidsstämpel och primärnyckel, antingen stigande eller fallande. Fysisk partitionering av data kunde även ge bättre resultat, men det optimala antal fysiska partitioner kan variera beroende på datan i sig. Med hjälp av denna nya information som samlats in av detta projekt har data engineers försetts med allmänna riktlinjer för att optimera sina databehandlings-pipelines för att kunna utföra mer resurseffektiva och snabbare PIT joins Apache Spark Point-in-Time ASOF Join Optimizations Time travel Apache Spark Point-in-Time ASOF Join Optimeringar Tidsresning Software Engineering Programvaruteknik
55	Runtime specialization for heterogeneous CPU-GPU platforms Farooqui, Naila 27 May 2016 (has links) Heterogeneous parallel architectures like those comprised of CPUs and GPUs are a tantalizing compute fabric for performance-hungry developers. While these platforms enable order-of-magnitude performance increases for many data-parallel application domains, there remain several open challenges: (i) the distinct execution models inherent in the heterogeneous devices present on such platforms drives the need to dynamically match workload characteristics to the underlying resources, (ii) the complex architecture and programming models of such systems require substantial application knowledge and effort-intensive program tuning to achieve high performance, and (iii) as such platforms become prevalent, there is a need to extend their utility from running known regular data-parallel applications to the broader set of input-dependent, irregular applications common in enterprise settings. The key contribution of our research is to enable runtime specialization on such hybrid CPU-GPU platforms by matching application characteristics to the underlying heterogeneous resources for both regular and irregular workloads. Our approach enables profile-driven resource management and optimizations for such platforms, providing high application performance and system throughput. Towards this end, this research: (a) enables dynamic instrumentation for GPU-based parallel architectures, specifically targeting the complex Single-Instruction Multiple-Data (SIMD) execution model, to gain real-time introspection into application behavior; (b) leverages such dynamic performance data to support novel online resource management methods that improve application performance and system throughput, particularly for irregular, input-dependent applications; (c) automates some of the programmer effort required to exercise specialized architectural features of such platforms via instrumentation-driven dynamic code optimizations; and (d) proposes a specialized, affinity-aware work-stealing scheduling runtime for integrated CPU-GPU processors that efficiently distributes work across all CPU and GPU cores for improved load balance, taking into account both application characteristics and architectural differences of the underlying devices. Dynamic instrumentation Dynamic compilation GPU computing Heterogeneous computing Profile-guided optimizations Program analysis Workload characterization Compiler Runtime Multicore CUDA OpenCL SIMD
56	Une approche dynamique pour l'optimisation des communications concurrentes sur réseaux hautes performance Brunet, Elisabeth 08 December 2008 (has links) Cette thèse cherche à optimiser les communications des applications de calcul intensif s'exécutant sur des grappes de PC. En raison de l'usage massif de processeurs multicoeurs, il est désormais impératif de gérer un grand nombre de flux de communication concurrents. Nous avons mis en évidence et analysé les performances décevantes des solutions actuelles dans un tel contexte. Nous avons ainsi proposé une architecture de communication centrée sur l'arbitrage de l'accès aux matériels. Son originalité réside dans la dissociation de l'activité de l'application de celle des cartes réseaux. Notre modèle exploite l'intervalle de temps introduit entre le dépot des requêtes de communication et la disponibilité des cartes réseaux pour appliquer des optimisations de manière opportuniste. NewMadeleine implémente ce concept et se révèle capable d'exploiter les réseaux les plus performants du moment. Des tests synthétiques et portages d'implémentations caractéristiques de MPI ont permis de valider l'architecture proposée. / The aim of this thesis is to optimize the communications of high performance applications, in the context of clusters computing. Given the massive use of multicore architectures, it is now crucial to handle a large number of concurrent communication flows. We highlighted and analyzed the shortcomings of existing solutions. We therefore designed a new way to schedule communication flows by focusing on the activity of the network cards. Its novelty consists in untying the activity of applications from that of the network cards. Our model takes advantage of the delay that exists between the deposal of the communication requests and the moment when the network cards become idle in order to apply some opportunistic optimizations. NewMadeleine implements this model, thus making possible to exploit last generation high speed networks. The approach of NewMadeleine is not only validated by synthetical tests but also by real applications. Informatique Calcul intensif Grappes de machines Communications haute performance Ordonnancement Optimisation MPI Réseau Parallélisme Computer science High performance computing Cluster computing High speed communication Scheduling Optimizations MPI Networking Parallelism
57	Conception d'algorithmes hybrides pour l'optimisation de l'énergie mémoire dans les systèmes embarqués et de fonctions multimodales / Design of hybrid algorithms for memory energy optimization in embedded systems and multimodal functions Idrissi Aouad, Maha 04 July 2011 (has links) La mémoire est considérée comme étant gloutonne en consommation d'énergie, un problème sensible, particulièrement dans les systèmes embarqués. L'optimisation globale de fonctions multimodales est également un problème délicat à résoudre du fait de la grande quantité d'optima locaux de ces fonctions. Dans ce mémoire, je présente différents nouveaux algorithmes hybrides et distribués afin de résoudre ces deux problèmes d'optimisation. Ces algorithmes sont comparés avec les méthodes classiques utilisées dans la littérature et les résultats obtenus sont encourageants. En effet, ces résultats montrent une réduction de la consommation d'énergie en mémoire d'environ 76% jusqu'à plus de 98% sur nos programmes tests, d'une part. D'autre part, dans le cas de l'optimisation globale de fonctions multimodales, nos algorithmes hybrides convergent plus souvent vers la solution optimale globale. Des versions distribuées et coopératives de ces nouveaux algorithmes hybrides sont également proposées. Elles sont, par ailleurs, plus rapides que leurs versions séquentielles respectives. / Résumé en anglais : Memory is considered to be greedy in energy consumption, a sensitive issue, especially in embedded systems. The global optimization of multimodal functions is also a difficult problem because of the large number of local optima of these functions. In this thesis report, I present various new hybrid and distributed algorithms to solve these two optimization problems. These algorithms are compared with conventional methods used in the literature and the results obtained are encouraging. Indeed, these results show a reduction in memory energy consumption by about 76% to more than 98% on our benchmarks on one hand. On the other hand, in the case of global optimization of multimodal functions, our hybrid algorithms converge more often to the global optimum solution. Distributed and cooperative versions of these new hybrid algorithms are also proposed. They are more faster than their respective sequential versions. Systèmes embarqués autonomes Algorithmes hybrides Basse énergie Fonctions multimodales Gestion mémoire Optimisations Autonomus embedded systems Hybrid algorithms Low-energy Multimodal functions Memory management Optimizations.
58	Sub-Polyhedral Compilation using (Unit-)Two-Variables-Per-Inequality Polyhedra Upadrasta, Ramakrishna 13 March 2013 (has links) (PDF) The goal of this thesis is to design algorithms that run with better complexity when compiling or parallelizing loop programs. The framework within which our algorithms operate is the polyhedral model of compilation which has been successful in the design and implementation of complex loop nest optimizers and parallelizing compilers. The algorithmic complexity and scalability limitations of the above framework remain one important weakness. We address it by introducing sub-polyhedral compilation by using (Unit-)Two-Variable-Per-Inequality or (U)TVPI Polyhedra, namely polyhedrawith restricted constraints of the type ax_{i}+bx_{j}\le c (\pm x_{i}\pm x_{j}\le c). A major focus of our sub-polyhedral compilation is the introduction of sub-polyhedral scheduling, where we propose a technique for scheduling using (U)TVPI polyhedra. As part of this, we introduce algorithms that can be used to construct under-aproximations of the systems of constraints resulting from affine scheduling problems. This technique relies on simple polynomial time algorithms to under approximate a general polyhedron into (U)TVPI polyhedra. The above under-approximation algorithms are generic enough that they can be used for many kinds of loop parallelization scheduling problems, reducing each of their complexities to asymptotically polynomial time. We also introduce sub-polyhedral code-generation where we propose algorithms to use the improved complexities of (U)TVPI sub-polyhedra in polyhedral code generation. In this problem, we show that the exponentialities associated with the widely used polyhedral code generators could be reduced to polynomial time using the improved complexities of (U)TVPI sub-polyhedra. The above presented sub-polyhedral scheduling techniques are evaluated in an experimental framework. For this, we modify the state-of-the-art PLuTo compiler which can parallelize for multi-core architectures using permutation and tiling transformations. We show that using our scheduling technique, the above under-approximations yield polyhedra that are non-empty for 10 out of 16 benchmarks from the Polybench (2.0) kernels. Solving the under-approximated system leads to asymptotic gains in complexity, and shows practically significant improvements when compared to a traditional LP solver. We also verify that code generated by our sub-polyhedral parallelization prototype matches the performance of PLuTo-optimized code when the under-approximation preserves feasibility. [INFO:INFO_OH] Computer Science/Other [INFO:INFO_OH] Informatique/Autre Affine scheduling Approximation algorithms Compiler optimizations Compilers Loop transformations Optimization Parallelism Asymptotic complexity Code generation
59	Taking architecture and compiler into account in formal proofs of numerical programs Nguyen, Thi Minh Tuyen 11 June 2012 (has links) (PDF) On some recently developed architectures, a numerical program may give different answers depending on the execution hardware and the compilation. These discrepancies of the results come from the fact that each floating-point computation is calculated with different precisions. The goal of this thesis is to formally prove properties about numerical programs while taking the architecture and the compiler into account. In order to do that, we propose two different approaches. The first approach is to prove properties of floating-point programs that are true for multiple architectures and compilers. This approach states the rounding error of each floating-point computation whatever the environment and the compiler choices. It is implemented in the Frama-C platform for static analysis of C code. The second approach is to prove behavioral properties of numerical programs by analyzing their compiled assembly code. We focus on the issues and traps that may arise on floating-point computations. Direct analysis of the assembly code allows us to take into account architecture- or compiler-dependent features such as the possible use of extended precision registers. It is implemented above the Why platform for deductive verification [INFO:INFO_OH] Computer Science/Other [INFO:INFO_OH] Informatique/Autre Floating-point arithmetic Numerical programs Static analysis Compile-time optimizations The Why platform The Frama-C platform
60	Automatic Optimization of Geometric Multigrid Methods using a DSL Approach Vasista, Vinay V January 2017 (has links) (PDF) Geometric Multigrid (GMG) methods are widely used in numerical analysis to accelerate the convergence of partial differential equations solvers using a hierarchy of grid discretizations. These solvers find plenty of applications in various fields in engineering and scientific domains, where solving PDEs is of fundamental importance. Using multigrid methods, the pace at which the solvers arrive at the solution can be improved at an algorithmic level. With the advance in modern computer architecture, solving problems with higher complexity and sizes is feasible - this is also the case with multigrid methods. However, since hardware support alone cannot achieve high performance in execution time, there is a need for good software that help programmers in doing so. Multiple grid sizes and recursive expression of multigrid cycles make the task of manual program optimization tedious and error-prone. A high-level language that aids domain experts to quickly express complex algorithms in a compact way using dedicated constructs for multigrid methods and with good optimization support is thus valuable. Typical computation patterns in a GMG algorithm includes stencils, point-wise accesses, restriction and interpolation of a grid. These computations can be optimized for performance on modern architectures using standard parallelization and locality enhancement techniques. Several past works have addressed the problem of automatic optimizations of computations in various scientific domains using a domain-specific language (DSL) approach. A DSL is a language with features to express domain-specific computations and compiler support to enable optimizations specific to these computations. Halide and PolyMage are two of the recent works in this direction, that aim to optimize image processing pipelines. Many computations like upsampling and downsampling an image are similar to interpolation and restriction in geometric multigrid methods. In this thesis, we demonstrate how high performance can be achieved on GMG algorithms written in the PolyMage domain-specific language with new optimizations we added to the compiler. We also discuss the implementation of non-trivial optimizations, on PolyMage compiler, necessary to achieve high parallel performance for multigrid methods on modern architectures. We realize these goals by: • introducing multigrid domain-specific constructs to minimize the verbosity of the algorithm specification; • storage remapping to reduce the memory footprint of the program and improve cache locality exploitation; • mitigating execution time spent in data handling operations like memory allocation and freeing, using a pool of memory, across multiple multigrid cycles; and • incorporating other well-known techniques to leverage performance, like exploiting multi-dimensional parallelism and minimizing the lifetime of storage buffers. We evaluate our optimizations on a modern multicore system using five different benchmarks varying in multigrid cycle structure, complexity and size, for two-and three-dimensional data grids. Experimental results show that our optimizations: • improve performance of existing PolyMage optimizer by 1.31x; • are better than straight-forward parallel and vector implementations by 3.2x; • are better than hand-optimized versions in conjunction with optimizations by Pluto, a state-of-the-art polyhedral source-to-source optimizer, by 1.23x; and • achieve up to 1.5$\times$ speedup over NAS MG benchmark from the NAS Parallel Benchmarks. (The speedup numbers are Geometric means over all benchmarks) Geometric Multigrid Methods Automatic Optimization Geometric Multigrid Method Compiler Optimizations PolyMage Compiler DSL Approach Automatic Optimization Domain-specific Language Approach GMG Algorithm Polyhedral Optimization Storage Optimization Computer Science

Search results