61

Contribuição para a otimização de traços de concreto utilizados na produção de blocos estruturais / Contribution to the optimization of concrete mixes used in the production of structural blocks

Felipe, Alexsandro dos Santos [UNESP] 01 February 2010 (has links) (PDF)
Universidade Estadual Paulista (UNESP) / Given the substantial growth of structural masonry in Brazil, many concrete block factories have needed to optimize their production processes, since more ambitious projects demand tighter quality control. This study proposes to improve the production of these concrete artifacts through simple optimizations that reduce cost and ensure efficient production at the plant. Studying in depth the many parameters that define a dry concrete mix, such as cohesion, texture, compaction energy, and axial compressive strength, phenomena that are all interdependent, becomes very complex if attempted in a single work. However, a study that collects the findings of several authors makes it easier to build a research contribution to mix proportioning for dry concretes, particularly for the manufacture of structural blocks. In this study, some equipment commonly used to manufacture these concrete artifacts was adapted for the laboratory, enabling a direct correlation between cylindrical specimens and the blocks. One adaptation standardizes the compaction energy delivered by mini-Proctor test equipment, thereby simulating the vibro-press machine.
Other correlations, such as cohesion and compressive strength, were also obtained in the laboratory, reducing the constant interferences of the plant's production process observed in several other studies. On this basis, the results could be assessed with good confidence. The study was conducted in three stages, always seeking the highest compacted dry specific mass of the aggregate mixture. The first stage used only two aggregates (fine sand and fine gravel), as commonly used at the plant. The second stage added coarse sand and stone powder to correct the lack of strength promoted by the high amount of fine sand from... (Complete abstract: click electronic access below.)
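The first-stage objective, finding the aggregate blend with the highest compacted dry specific mass, can be pictured with a small numerical sketch. The sample densities and the piecewise-linear model below are invented placeholders, not measurements from the thesis; they only show the shape of the search for an optimum blend.

```python
# Hypothetical illustration: pick the fine-sand fraction that maximizes
# the compacted dry density of a two-aggregate (sand + gravel) blend.
# The sample points and interpolation model are invented for this sketch.

measured = {  # fine-sand mass fraction -> compacted dry density (kg/m^3)
    0.0: 1850.0, 0.2: 1940.0, 0.4: 1990.0, 0.6: 1965.0, 0.8: 1900.0, 1.0: 1820.0,
}

def interpolate(x: float) -> float:
    """Piecewise-linear interpolation between the measured blend points."""
    pts = sorted(measured.items())
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("fraction outside [0, 1]")

# Sweep candidate blends and keep the densest one.
best = max((interpolate(f / 100), f / 100) for f in range(101))
print(f"max density {best[0]:.0f} kg/m^3 at fine-sand fraction {best[1]:.2f}")
```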
62

Cálculo do equilíbrio sólido-líquido e ajuste de parâmetros para modelos termodinâmicos em misturas binárias e ternárias de ácidos graxos, seus ésteres e triacilgliceróis / Solid-liquid equilibria calculation and parameters determination in thermodynamic models for binary and ternary mixtures of fatty acids, esters and triacylglycerols

Rocha, Stella Alonso 09 September 2011 (has links)
Advisor: Reginaldo Guirardello / Doctoral thesis - Universidade Estadual de Campinas, Faculdade de Engenharia Química / Abstract: This theoretical and computational work studies and applies optimization techniques, combined with convexity analysis, to calculate solid-liquid equilibria. The problem is formulated as a non-linear program based on minimizing the Gibbs energy of the system, implemented in GAMS together with Microsoft Excel, and applied to binary and ternary mixtures of natural origin composed of fatty acids, triacylglycerols, and ethyl and methyl esters. The phases were described with two thermodynamic models: the solid phase by a modification of the Slaughter and Doherty equation, and the liquid phase by the 2-suffix Margules model. The Margules model was used in two forms: asymmetric Margules, in which the interaction parameters differ, and symmetric Margules, in which they are taken as equal; the Wilson model was also applied. Experimental equilibrium data from the literature were used for comparison, the equilibrium curves were calculated, and the corresponding model parameters were fitted. The results are presented as phase diagrams and fitted parameters, obtained by least-squares minimization, and show satisfactory agreement with the experimental data. Comparing the results with one another also revealed characteristic behaviors for some classes of compounds. / Doctorate / Chemical Process Development / Doctor of Chemical Engineering
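For orientation, the models named above have standard textbook forms; the equations below are the conventional ones, not necessarily the exact modified variants used in the thesis. The classical solid-liquid equilibrium condition and the two Margules forms read:

```latex
% Classical solid-liquid equilibrium relation for component i,
% with melting temperature T_{m,i} and enthalpy of fusion \Delta H_{m,i}:
\ln\!\left(x_i \gamma_i\right) =
  -\frac{\Delta H_{m,i}}{R}\left(\frac{1}{T}-\frac{1}{T_{m,i}}\right)

% Two-suffix (symmetric) Margules model for a binary mixture:
\frac{G^E}{RT} = A\,x_1 x_2, \qquad
\ln\gamma_1 = A\,x_2^2, \qquad
\ln\gamma_2 = A\,x_1^2

% Asymmetric (two-parameter) Margules form:
\frac{G^E}{RT} = x_1 x_2\left(A_{21}x_1 + A_{12}x_2\right), \qquad
\ln\gamma_1 = x_2^2\left[A_{12} + 2\left(A_{21}-A_{12}\right)x_1\right]
```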
63

Multiplatform Game Development Using the Unity Engine

Jašek, Roman January 2014 (has links)
This thesis explores game development for multiple platforms using the Unity 3D engine. It examines various aspects of multiplatform development on desktop computers and mobile devices. Emphasis is placed on analyzing the development of games for several platforms at once and the problems associated with this approach. The thesis presents ways of solving the problems that arise with this approach using the tools provided by Unity 3D. The analysis focuses mainly on improving game performance using methods available on all the platforms selected for testing. These improvements include ways to reduce the work done per scene and, conversely, to increase the number of frames rendered per second while preserving the same visual quality. The thesis also offers a look back at the history of this industry and predictions about its future direction.
64

Robust and distributed top-n frequent-pattern mining with SAP BW accelerator

Lehner, Wolfgang, Legler, Thomas, Schaffner, Jan, Krüger, Jens 22 April 2022 (has links)
Mining for association rules and frequent patterns is a central activity in data mining. However, most existing algorithms are only moderately suitable for real-world scenarios. Most strategies use parameters like minimum support, for which it can be very difficult to choose a suitable value for an unknown dataset. Since most untrained users are unable or unwilling to set such technical parameters, we address the problem of replacing the minimum-support parameter with top-n strategies. We start by extending a top-n implementation of the ECLAT algorithm to improve its performance with heuristic search-strategy optimizations. Moreover, real-world datasets are often distributed, and modern database architectures are switching from expensive SMPs to cheaper shared-nothing blade servers; thus most mining queries require distribution handling. Since partitioning can be forced by user-defined semantics, transforming the data is often forbidden. We therefore developed an adaptive top-n frequent-pattern mining algorithm that simplifies mining on real distributions by relaxing some requirements on the results. We first combine the PARTITION and TPUT algorithms to handle distributed top-n frequent-pattern mining, and then extend this new algorithm for distributions with real-world data characteristics. For frequent-pattern mining algorithms, even distributions are an important precondition, and tiny partitions can cause performance bottlenecks. Hence, we implemented an approach called MAST that defines a minimum absolute-support threshold: MAST prunes patterns that have little chance of reaching the global top-n result set and would incur high computing costs. In total, our approach simplifies frequent-pattern mining for real customer scenarios and datasets, which may make it accessible to entirely new user groups. Finally, we present results of our algorithms run on the SAP NetWeaver BW Accelerator with standard and real business datasets.
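As a reader's aid, the shift from a minimum-support parameter to a top-n criterion with an absolute-support floor can be sketched in a few lines of Python. This toy is single-machine and brute-force over tiny itemsets; it is not the paper's ECLAT/TPUT-based distributed implementation, and the names are invented.

```python
# Toy top-n frequent-itemset mining with a MAST-like absolute-support
# floor: rare patterns never compete for the top-n slots.
from itertools import combinations
from collections import Counter

def topn_patterns(transactions, n=5, max_size=2, mast=2):
    counts = Counter()
    for t in transactions:
        items = sorted(set(t))
        for size in range(1, max_size + 1):
            counts.update(combinations(items, size))
    # MAST-style pruning: drop patterns below the absolute threshold.
    frequent = [(p, c) for p, c in counts.items() if c >= mast]
    # Top-n criterion replaces a user-chosen minimum support.
    return sorted(frequent, key=lambda pc: -pc[1])[:n]

txns = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
for pattern, support in topn_patterns(txns):
    print(pattern, support)
```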
65

Effect of Whole-Body Kinematics on ACL Strain and Knee Joint Loads and Stresses during Single-Leg Cross Drop and Single-Leg Landing from a Jump

Sadeqi, Sara 11 July 2022 (has links)
No description available.
66

Optimization of a welding gun use case by using a time-based ergonomics evaluation method

Mora Quiles, Elia January 2022 (has links)
Nowadays, virtual simulations are commonly used to solve problems regarding worker well-being and productivity in manufacturing companies. However, when a solution is sought for one of these two objectives, the other usually becomes secondary. To address this problem, the Ergonomics in Production Platform (EPP) has been developed within research efforts at the University of Skövde; through the use of optimizations, it can find solutions in which both objectives are taken into account. To address worker well-being, EPP makes use of a digital human modelling (DHM) tool. DHM tools are often used to evaluate simulations focused on human-machine interaction. However, as these tools evolve and become able to reproduce complete motions, where previously they considered only static frames, new methods are needed to assess risk factors such as time and to prevent the occurrence of musculoskeletal disorders (MSDs). To assist in the development of EPP optimizations for simulations carried out in DHM tools, the time-based observational method RAMP was implemented, specifically the posture-related criteria of RAMP II. Following the Design and Creation research methodology, a welding-gun case study located in China, provided by Volvo Cars, was used to evaluate the results of the optimizations carried out with EPP. For this case study, a manikin family of 10 members representing key cases of the Asian population was created. The task was then recreated in IPS IMMA, where the 10 cases interacted with 3 welding guns to weld different spots on a workpiece. The analysis consisted of two distinct phases in which the results of RAMP II implemented in EPP could be evaluated. The first phase analyzed the initial results of three different trajectories for all members of the family. The second phase optimized one of the trajectories analyzed in the previous phase to find the welding-gun angle that most improved the results of the worst case from the first analysis. Three factors were evaluated in this phase: the RAMP II results versus the new angle, the RAMP II results versus the results of other methods, and the effect of productivity versus worker well-being. The results showed that welding angles of 116° and 80° improved the RAMP II criteria values for the most disadvantaged manikin in the welding task. At the same time, it was observed that the higher the percentage of value-added time, the higher the risk obtained in the analysis, worsening the worker's well-being.
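The trade-off explored in the second phase can be pictured with a small sketch: sweep the welding-gun angle and score each candidate with a weighted sum of a well-being risk term and a productivity term. Both scoring functions below are invented placeholders, not RAMP II or EPP, and the weights are arbitrary.

```python
# Hypothetical single-variable sweep: trade off (made-up) ergonomic
# risk against (made-up) cycle time when choosing a welding-gun angle.
import math

def risk(angle_deg: float) -> float:        # placeholder well-being risk
    return abs(math.sin(math.radians(angle_deg - 100))) * 10

def cycle_time(angle_deg: float) -> float:  # placeholder productivity cost
    return 1.0 + abs(angle_deg - 90) / 180

def score(angle_deg: float, w_risk: float = 0.7, w_time: float = 0.3) -> float:
    return w_risk * risk(angle_deg) + w_time * cycle_time(angle_deg)

best = min(range(0, 181), key=score)
print(f"best angle: {best} deg, score {score(best):.3f}")
```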
67

Composable, Sound Transformations for Nested Recursion and Loops

Kirshanthan Sundararajah (16647885) 26 July 2023 (has links)
Programs that use loops to operate over arrays and matrices are generally known as regular programs. These programs appear in critical applications such as image processing, differential equation solvers, and machine learning. Over the past few decades, extensive research has been done on composing, verifying, and applying scheduling transformations like loop interchange and loop tiling for regular programs. As a result, we have general frameworks such as the polyhedral model to handle transformations for loop-based programs. Similarly, programs that use recursion and loops to manipulate pointer-based data structures are known as irregular programs. Irregular programs also appear in essential applications such as scientific simulations, data mining, and graphics rendering. However, there is no analogous framework for recursive programs. In the last decade, although many scheduling transformations have been developed for irregular programs, they are ad hoc in various aspects, such as being developed for a specific application and lacking portability. This dissertation examines principled ways to handle scheduling transformations for recursive programs through a unified framework, resulting in performance enhancement.

Finding principled approaches to optimize irregular programs at compile time is a long-standing problem. We specifically focus on scheduling transformations that reorder a program's operations to improve performance by enhancing locality and exploiting parallelism. In the first part of this dissertation, we present PolyRec, a unified general framework that can compose and apply scheduling transformations to nested recursive programs and reason about the correctness of composed transformations. PolyRec is a first-of-its-kind unified general transformation framework for irregular programs consisting of nested recursion and loops. It is built on solid theoretical foundations from the world of automata and transducers and provides a fundamentally novel way to think about recursive programs and scheduling transformations for them. The core idea is designing mechanisms that strike a balance between the expressivity of representing the set of dynamic instances of computations, transformations, and dependences and the decidability of checking the correctness of composed transformations. We use multi-tape automata and transducers to represent the set of dynamic instances of computations and transformations, respectively. These machines are similar to, yet more expressive than, their classical single-tape counterparts. While properties that are decidable for classical machines are in general undecidable for multi-tape machines, we have proven that they are decidable for the class of machines we consider, and we present algorithms to verify them. These machines therefore provide the building blocks to compose and verify scheduling transformations for nested recursion and loops. The crux of the PolyRec framework is its regular string-based representation of dynamic instances, which allows instances to be ordered lexicographically, identically to their execution order. All the transformations considered in PolyRec require different orderings of these strings, representable with only additive changes to the strings.

Loop transformations such as skewing require performing arithmetic on the representation of dynamic instances. In the second part of this dissertation, we explore this space of transformations by introducing skewing to nested recursion. Skewing plays an essential role in producing easily parallelizable loop nests from seemingly difficult ones with dependences carried across loops. Including skewing for nested recursion in PolyRec requires significant extensions to the representation of dynamic instances and transformations to facilitate performing arithmetic using strings. First, we prove that the machines that represent the transformations are still composable. Then we prove that the representation of dependences and the algorithm that checks the correctness of composed transformations hold with minimal changes. Our new extended framework is known as UniRec, since it resembles the unimodular transformations for perfectly nested loops, which consider any combination of the primary transformations interchange, reversal, and skewing. UniRec opens possibilities for producing newly composed transformations for nested recursion and loops and verifying their correctness. We claim that UniRec completely subsumes the unimodular framework for loop transformations, since nested recursion is more general than loop nests.
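The string-based view of dynamic instances can be made concrete with a toy sketch, which is not PolyRec's actual multi-tape machinery: instances of a two-deep nest are encoded as strings whose lexicographic order equals execution order, and interchange becomes a rewrite of those strings.

```python
# Toy rendering of the core idea: encode each dynamic instance of a
# nest as a string ordered lexicographically in execution order, and
# express a scheduling transformation as a string rewrite. PolyRec
# itself uses multi-tape automata and transducers; this is a sketch.

# Dynamic instances of: for i in 0..2: for j in 0..2: S(i, j)
instances = [f"{i}{j}" for i in range(3) for j in range(3)]
assert instances == sorted(instances)  # lexicographic = execution order

def interchange(word: str) -> str:
    """Swap the two index letters: schedules the j-loop outermost."""
    return word[1] + word[0]

# The transformed schedule is the lexicographic order of the rewritten
# strings; legality would be checked against the dependences.
new_order = sorted(instances, key=interchange)
print(new_order)  # ['00', '10', '20', '01', '11', '21', '02', '12', '22']
```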
68

Improving the performance of GPU-accelerated spatial joins

Hrstic, Dusan Viktor January 2017 (has links)
Data collisions have been widely studied in various fields of science and industry. Combining the CPU and GPU for processing spatial joins has been broadly accepted due to the increased speed of computation. This should redirect efforts in GPGPU research from straightforward porting of applications to establishing principles and strategies that allow efficient mapping of computation to graphics hardware. Since threads execute instructions while using the hardware resources available, this report analyzes and examines the impact of different thread organizations on spatial join performance. New perspectives on, and solutions to, the problems of thread organization and warp scheduling may also encourage others to program on the GPU. The aim of this project is to examine the impact of different thread organizations in spatial join processing. The relationships between the items in the datasets are examined by counting the number of collisions their join produces, in order to understand how different approaches influence performance. Performance benchmarking, analysis, and measurement of different approaches to thread organization are investigated in order to find the most time-efficient solution, which is the purpose of the conducted work. The report presents the results obtained when different thread techniques are used to optimize the computational speed of the spatial join algorithms. Two GPU algorithms are compared: one implementing the thread techniques and one non-optimized solution. The GPU times are compared with execution times on the CPU, and the GPU implementations are verified by checking that their collision counters match those of the CPU counterpart. In the analysis, the implementations are discussed and compared with each other. The difference between the algorithm implementing thread techniques and the non-optimized one is around 80% in favour of the former, which is also around 56 times faster than the spatial join on the CPU.
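As a rough illustration of the computation being accelerated, the sketch below counts colliding pairs with a uniform-grid spatial join on the CPU. The grid scheme and names are illustrative assumptions; the report's contribution concerns how such work is organized across GPU threads and warps.

```python
# Minimal uniform-grid spatial join on the CPU: count pairs of
# axis-aligned boxes (x0, y0, x1, y1) that overlap.
from collections import defaultdict
from itertools import product

def cells(box, cell=1.0):
    """Grid cells touched by a box."""
    x0, y0, x1, y1 = box
    return product(range(int(x0 // cell), int(x1 // cell) + 1),
                   range(int(y0 // cell), int(y1 // cell) + 1))

def overlaps(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def count_collisions(boxes, cell=1.0):
    grid = defaultdict(list)
    for i, box in enumerate(boxes):
        for c in cells(box, cell):
            grid[c].append(i)
    seen = set()  # avoid double-counting pairs sharing several cells
    for bucket in grid.values():
        for i in range(len(bucket)):
            for j in range(i + 1, len(bucket)):
                pair = (bucket[i], bucket[j])
                if pair not in seen and overlaps(boxes[pair[0]], boxes[pair[1]]):
                    seen.add(pair)
    return len(seen)

boxes = [(0, 0, 1.5, 1.5), (1, 1, 2, 2), (3, 3, 4, 4)]
print(count_collisions(boxes))  # boxes 0 and 1 overlap -> 1
```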
69

Taking architecture and compiler into account in formal proofs of numerical programs / Preuves formelles de programmes numériques en prenant en compte l'architecture et le compilateur

Nguyen, Thi Minh Tuyen 11 June 2012 (has links)
On recent architectures, a numerical program may give different answers depending on the hardware and the compiler. These discrepancies arise because each floating-point computation may be carried out at a different precision. The goal of this thesis is to formally prove properties of programs operating on floating-point numbers while taking the architecture and the compiler into account. To do so, we propose two different approaches. The first approach is to prove properties of floating-point programs that hold across multiple architectures and compilers. It bounds the rounding error of each floating-point computation regardless of the hardware environment and compiler choices, and is implemented in the Frama-C platform for static analysis of C code. The second approach is to prove behavioral properties of numerical programs by analyzing their compiled assembly code. We focus on the issues and traps that may arise in floating-point computations. Direct analysis of the assembly code allows us to take into account architecture- or compiler-dependent features such as the possible use of extended-precision registers. It is implemented on top of the Why platform for deductive verification.
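The kind of discrepancy at stake can be reproduced in a few lines of Python. The example below is a constructed illustration, not taken from the thesis: rounding every operation to binary32 gives a different answer from computing in a wider format, as an extended-precision register would, and rounding once at the end (double rounding).

```python
# Double-rounding demo: strict binary32 arithmetic vs. computing in a
# wider format (here binary64, standing in for an extended-precision
# register) followed by a single final rounding to binary32.
import struct

def to_f32(x: float) -> float:
    """Round a Python float (binary64) to the nearest binary32 value."""
    return struct.unpack('f', struct.pack('f', x))[0]

a, b, c = 1.0, 2.0**-24, 2.0**-49  # chosen so b is exactly half an ulp of 1.0f

# Round after every operation, as strict single-precision hardware would.
strict = to_f32(to_f32(a + b) + c)      # tie rounds to even -> 1.0

# Keep intermediates wide, round once at the end (x87-style behavior).
extended = to_f32(a + b + c)            # 1.0000001192092896

print(strict, extended, strict == extended)  # 1.0 1.0000001192092896 False
```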
70

Τεχνικές μεταγλωττιστών και αρχιτεκτονικές επεξεργαστών για στατιστικές και δυναμικές εφαρμογές / Compiler techniques and processor architectures for static and dynamic applications

Αλαχιώτης, Νικόλαος 19 July 2010 (has links)
Modern applications have ever-greater needs for processing power in order to execute in less time. Satisfying these timing constraints requires the development of optimized design techniques. The subject of this thesis is the development of architectures and compiler techniques aimed at feeding the processor with data from the memory hierarchy faster. a) Methodology for accelerating matrix-matrix multiplication (MMM). A new methodology based on the standard MMM algorithm is presented, achieving improved performance by focusing on data locality (both temporal and spatial). The methodology finds the schedule that conforms to the optimal memory management, taking into account data locality and the sizes of the memory-hierarchy levels. The schedule used at the tile level differs from the one at the element level, having better data locality suited to the memory hierarchy. Its exploration time is short because all non-optimal solutions are discarded; it searches only over the number of tiling levels to find the best tile size for each cache level. Compared with the best existing related work, which we implemented, performance gains of up to 55% are observed. b) Methodology for an efficient Fast Fourier Transform (FFT). A new methodology is presented that achieves improved FFT performance by minimizing the number of data accesses. It fully exploits the production and consumption of the FFT butterfly results, data reuse, and the symmetry of the twiddle factors. The optimal schedule is found by taking into account both the number of registers and the cache size of each level, searching only over the number of FFT tiling levels. The compile time and source-code size are very small compared with the state-of-the-art FFT library, FFTW. The proposed methodology achieves performance gains of up to 63% over FFTW. c) Architectures for memory management. A decoupled processor architecture is presented with a memory hierarchy consisting only of scratch-pad memories and a main memory. This architecture exploits both the benefits of scratch-pad memories and the parallelism between address computation and application data processing. Compared with a MIPS architecture with cache and with scratch-pad memory hierarchies, it shows higher normalized performance; experimental results show performance increases of up to 3.7 times. Further research on scratch-pad memories is then presented: an architecture that can provide information about the exact data contents of the scratch-pad during execution and can also perform all the operations necessary for placing new data blocks in the scratch-pad. In this way, temporal locality that occurs randomly and cannot be identified by the compiler is exploited. Compared with MIPS architectures with cache and with scratch-pad memories, it again shows higher normalized performance; experimental results show increases of up to 5 times over existing scratch-pad architectures and 2.5 times over cache architectures.
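The data-locality idea behind the first methodology can be sketched with a cache-blocked (tiled) matrix multiply, written here in plain Python for clarity. The tile size and loop order below are illustrative; the thesis derives schedules and tile sizes from the actual memory hierarchy.

```python
# Cache-blocked (tiled) matrix-matrix multiply: operate on T x T tiles
# so each tile of A, B, and C stays resident in a cache level while it
# is reused. Sketch of the access pattern only; real tile sizes come
# from the cache-level capacities.

def mmm_tiled(A, B, n, T=2):
    C = [[0.0] * n for _ in range(n)]
    for ii in range(0, n, T):           # tile row of C
        for kk in range(0, n, T):       # tile of A / tile row of B
            for jj in range(0, n, T):   # tile column of C
                for i in range(ii, min(ii + T, n)):
                    for k in range(kk, min(kk + T, n)):
                        a = A[i][k]     # reused across the whole j loop
                        for j in range(jj, min(jj + T, n)):
                            C[i][j] += a * B[k][j]
    return C

n = 4
A = [[i + j for j in range(n)] for i in range(n)]
B = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
assert mmm_tiled(A, B, n) == [[float(x) for x in row] for row in A]
```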
