  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
21

Performance evaluation of multithreading in a Diameter Credit Control Application

Åkesson, Gustav, Rantzow, Pontus January 2010
Moore's law states that the amount of computational power available at a given cost doubles every 18 months, and indeed, for the past 20 years there has been tremendous development in microprocessors. In the last few years, however, Moore's law has been a subject of debate: to manage heat issues, processor manufacturers have begun favoring multicore processors, which means parallel computation has become necessary to fully utilize the hardware. Software therefore has to be written with multiprocessing in mind, and writing parallel software introduces a whole new set of problems. Over the last couple of years, the demands on telecommunication systems have increased, and multiprocessor servers have become a necessity to meet them. Applications must fully utilize the hardware, and one such application is the Diameter Credit Control Application (DCCA). The DCCA uses the Diameter networking protocol, and its purpose is to provide a framework for real-time charging, for instance to grant or deny a user's request for a specific network activity and to account for any use of that network resource. This thesis investigates whether it is possible to develop a DCCA that achieves linear scaling, and the potential pitfalls in developing a scalable DCCA server. The assumption is based on the observation that the DCCA server's connections have little to nothing in common (i.e. little or no synchronization), so introducing more processors should give linear scaling. To investigate whether a DCCA server's performance scales linearly, a prototype was developed. Throughout its development, continuous performance analysis was conducted to see what affected performance and server scalability in a multiprocessor DCCA environment.
As the results show, quite a few factors besides synchronization and independent connections affected the scalability of the DCCA prototype. The prototype did not always achieve linear scaling. However, even where scaling was not linear, certain design decisions gave a considerable performance increase when more processors were introduced.
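The notion of linear scaling used in this abstract can be made concrete with the standard speedup and parallel-efficiency ratios. The sketch below is only an illustration: the function names, the 10% tolerance, and any run times passed in are hypothetical and are not taken from the thesis.

```python
def speedup(t_single: float, t_parallel: float) -> float:
    """Ratio of the single-processor run time to the multi-processor run time."""
    return t_single / t_parallel


def efficiency(t_single: float, t_parallel: float, processors: int) -> float:
    """Speedup divided by processor count; 1.0 means perfectly linear scaling."""
    return speedup(t_single, t_parallel) / processors


def scales_linearly(t_single: float, timings: dict, tolerance: float = 0.1) -> bool:
    """True if every measured efficiency stays within `tolerance` of 1.0.

    `timings` maps a processor count to the run time measured with that many
    processors. The tolerance is an assumed threshold, not a standard value.
    """
    return all(
        efficiency(t_single, t_parallel, processors) >= 1.0 - tolerance
        for processors, t_parallel in timings.items()
    )
```

With made-up timings, `scales_linearly(8.0, {2: 4.0, 4: 2.0})` holds because each doubling of processors halves the run time, while `{2: 4.0, 4: 3.0}` fails at four processors.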
22

Characterizing The Vulnerability Of Parallelism To Resource Constraints

Vivekanand, V 01 1900
No description available.
23

Co-projeto hardware/software para cálculo de fluxo ótico / Software/hardware co-design for the optical flow calculation

Tiago Mendonça Lobo 17 June 2013
The calculation of motion vectors is used in many processes in computer vision. Problems such as establishing collision routes and camera movement (egomotion) use these vectors as input to complex algorithms that demand many computational resources and consequently consume more energy. Optical flow is an approximation of the field generated by the motion vectors. For mobile, low-power applications, however, using general-purpose computers becomes infeasible. An embedded system is defined as a computer designed for a specific purpose related to the application in which it is inserted. The main objective of this work was to build an embedded-system module that computes optical flow. A dedicated hardware/software co-design was developed and implemented on Cyclone II and Stratix IV FPGAs to prototype the system. The implementation of a project that assists the detection and measurement of movement is thus important not only as a standalone application but also as a basis for developing other applications such as tracking, video compression, and collision prediction.
24

Development and evaluation of a multiprocessing structural vibration algorithm

Morel, Michael January 1988
No description available.
25

Μελέτη και εφαρμογή της θεωρίας της Decomposability στην εκτίμηση υπολογιστικών συστημάτων / An application of the theory of Decomposability to a computer system performance evaluation problem

Νικολακόπουλος, Αθανάσιος Ν. 31 July 2012
The purpose of this diploma dissertation is to study the theory of Near Complete Decomposability (NCD) and to apply it to the performance analysis of a computer system whose modeling with traditional techniques leads to a prohibitively large state space. First, we present the fundamental points of the theory as established mathematically by Courtois in his classic monograph (Courtois, 1977). We then model a hypothetical service station (R) of a multiprocessing computer system, which executes at most K jobs at any time. R has a finite buffer, and its duty is to collect and combine the subtasks of each job and store the result in memory. The usual techniques for modeling this “task buffer” lead to a model with an extremely large state space. We instead construct a lumped version of the original model which, under fairly realistic conditions, enjoys the NCD property. We justify this property both intuitively and mathematically. Through a series of measurements in the Matlab environment, we then confirm that the NCD model achieves a high-quality estimate of the steady-state probabilities and of a number of other useful metrics, at significantly lower computational cost than the original model. Furthermore, using the NCD model significantly increases our ability to interpret the dynamic behavior of the system as it moves towards statistical equilibrium. Finally, we make a number of “educated guesses” about possible classes of systems that could be analyzed with a methodology similar to the one followed in this dissertation.
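As a rough illustration of what nearly complete decomposability means for a Markov chain, the sketch below measures how weakly the blocks of a partitioned transition matrix are coupled: for each state it sums the probability of leaving that state's block and takes the maximum. The function name, the 4-state matrix, and the partition are all hypothetical; they are not the model from the dissertation, and how small the value must be for the NCD approximation to work well depends on the system.

```python
def coupling_degree(P, blocks):
    """Largest total probability, over all states, of jumping out of the state's block.

    P is a row-stochastic transition matrix given as a list of rows; `blocks`
    is a partition of the state indices, e.g. [[0, 1], [2, 3]]. A value that is
    small compared with the within-block probabilities suggests the chain is
    nearly completely decomposable with respect to that partition.
    """
    block_of = {state: b for b, states in enumerate(blocks) for state in states}
    return max(
        sum(p for j, p in enumerate(row) if block_of[j] != block_of[i])
        for i, row in enumerate(P)
    )


# Hypothetical chain: two tightly knit pairs of states with weak cross-links.
P = [
    [0.70, 0.29, 0.01, 0.00],
    [0.30, 0.69, 0.00, 0.01],
    [0.00, 0.02, 0.58, 0.40],
    [0.01, 0.00, 0.49, 0.50],
]
BLOCKS = [[0, 1], [2, 3]]
```

For this made-up matrix the strongest cross-block coupling comes from state 2, which leaves its block with probability 0.02.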
26

Parallelize Automated Tests in a Build and Test Environment

Durairaj, Selva Ganesh January 2016
This thesis investigates ways to reduce the total testing time and the waiting times when running multiple automated test cases in a test framework. The “Automated Test Framework”, developed by Axis Communications AB, is used to write functional tests that test both the hardware and the software of a resource. This thesis considers the functional tests that test the software. In the current infrastructure, tests are executed sequentially and resources are allocated using a First In First Out scheduling algorithm. From the user’s point of view, it is inefficient to wait many hours to run tests that take a few minutes to execute. The thesis consists of two main parts: (1) identify a plugin that suits the framework and executes the tests in parallel, which reduces the overall execution time of the tests, and (2) analyze various scheduling algorithms in order to address the resource allocation problem that arose from limited resource availability while the tests were run in parallel. Distributing multiple tests across several resources and executing them in parallel improves the test strategy and thereby reduces the overall execution time of test suites. Case studies were created to emulate the problematic scenarios in the company, and sample tests were written that reflect the real tests in the framework. Due to the complexity of the current architecture and the limited resources available for running tests in parallel, a simulator was developed with the identified plugin on a multi-core computer, with each core simulating a resource. Multiple tests were run using the simulator to explore and assess whether the overall execution time of the tests could be reduced. While parallelism was being achieved in running the automated tests, resource allocation became a problem, since only limited resources are available to run parallel tests.
To address this problem, scheduling algorithms were considered. A prototype was developed to mimic the behaviour of a scheduling plugin, and the scheduling algorithms were implemented in it. Sets of values were given as input to the prototype and tested with the scenarios described in the case studies. The results from the prototype were used to analyze the impact of the various scheduling algorithms on reducing the waiting times of the tests. The combined use of the simulator and the scheduler prototype helped in understanding how to minimize the total time spent on testing and improve the resource allocation process.
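The waiting-time problem described in this abstract can be sketched for a single resource. The example compares First In First Out order with shortest-job-first order; the latter is chosen here purely for illustration, since the abstract does not name the scheduling algorithms that were analyzed, and the durations are invented.

```python
def waiting_times(durations):
    """Waiting time of each test when tests run back to back in the given order."""
    waits = []
    elapsed = 0
    for duration in durations:
        waits.append(elapsed)  # a test waits for everything queued before it
        elapsed += duration
    return waits


def average_wait(durations):
    """Mean waiting time over all tests in the queue."""
    waits = waiting_times(durations)
    return sum(waits) / len(waits)


# Hypothetical queue, in minutes: one long test submitted ahead of two short ones.
fifo_order = [240, 5, 3]
shortest_first_order = sorted(fifo_order)
```

Under FIFO the two short tests wait 240 and 245 minutes; running the shortest tests first cuts their waits to 0 and 3 minutes, at the cost of delaying the long test by 8 minutes.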
27

Bounds For Scheduling In Non-Identical Uniform Multiprocessor Systems

Darera, Vivek N 06 1900
With multiprocessors and multicore processors becoming ubiquitous, focus has shifted from research on uniprocessors to research on multiprocessors. Results derived for the uniprocessor case unfortunately do not always extend directly to the multiprocessor case. This necessitates a paradigm shift in the approach used to design and analyse the behaviour of such processors. In real-time systems, that is, systems characterised by explicit timing constraints, analysis and performance guarantees are more important, as failure is unacceptable. Scheduling algorithms used in real-time systems have to be carefully designed, as the performance of the system depends critically on them. Efficient tests for determining whether a set of tasks can be feasibly scheduled on such a computing system using a particular scheduling algorithm thus assume importance. Traditionally, the ‘task utilization’ parameter has been used for devising such tests. Utilization-based tests for the Earliest Deadline First (EDF) and Rate Monotonic (RM) scheduling algorithms are known and well understood for uniprocessor systems. In our work, we derive limits on similar bounds for the multiprocessor case. Our work differs from previous literature in that we explore the case when the individual processors constituting the multiprocessor need not be identical. Each processor in such a system is characterised by a capacity, or speed, and the time taken by a task to execute on a processor is inversely proportional to the speed of that processor. Such instances may arise during system upgrades, when faster processors may be added to the system, making it a non-identical multiprocessor, or during processor design, when the different cores on the chip may have different processing power to handle dynamic workloads.
We derive results for the partitioned paradigm of multiprocessor scheduling, that is, when tasks are partitioned among the processors and interprocessor migration after a part of execution is completed is not allowed. Results are derived for both fixed-priority algorithms (RM) and dynamic-priority algorithms (EDF) used on individual processors. A maximum and a minimum limit on the bounds for a ‘greedy’ class of algorithms are established, since the actual value of the bound depends on the algorithm that allocates the tasks. We also derive the utilization bound of an algorithm whose bound is close to the upper limit in both cases. We find that an expression for the utilization bound can be obtained when EDF is used as the uniprocessor scheduling algorithm, but when RM is the uniprocessor scheduling algorithm, an O(mn) algorithm is required to find the utilization bound, where m is the number of tasks in the system and n is the number of processors. Knowledge of such bounds allows us to carry out very fast schedulability tests, with the limitation that the tests are sufficient but not necessary to ensure schedulability. We also compare the value of the bounds with those achievable in ‘equivalent’ identical multiprocessor systems and find that the performance guarantees provided by the non-identical multiprocessor system are far higher than those offered by the equivalent identical system.
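The partitioned setting described in this abstract can be illustrated with a minimal sketch. It assumes independent, preemptive periodic tasks with deadlines equal to their periods, and utilizations measured against a unit-speed processor, so that EDF on a processor of speed s can schedule a task set exactly when the utilizations sum to at most s. First-fit is used only as one example of a greedy allocation; the function names and numbers are hypothetical, and the code does not compute the utilization bounds derived in the thesis.

```python
def first_fit_edf(utilizations, speeds):
    """Greedy first-fit partitioning onto processors that each run EDF locally.

    Each task goes to the first processor whose total assigned utilization
    stays at or below its speed. Returns one processor index per task, or
    None if some task fits on no processor (this greedy pass fails).
    """
    load = [0.0] * len(speeds)
    assignment = []
    for u in utilizations:
        for i, speed in enumerate(speeds):
            if load[i] + u <= speed + 1e-12:  # small slack for float rounding
                load[i] += u
                assignment.append(i)
                break
        else:
            return None  # no processor had room for this task
    return assignment


def rm_utilization_bound(task_count):
    """Classic Liu and Layland sufficient bound for RM on one unit-speed processor."""
    return task_count * (2 ** (1 / task_count) - 1)
```

For example, `first_fit_edf([0.5, 1.2, 0.4, 0.3], [1.0, 2.0])` places the tasks on processors 0, 1, 0 and 1, whereas three tasks of utilization 0.9 cannot be partitioned onto two unit-speed processors.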
28

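Validation et mise en oeuvre de la synchronisation dans un système multiprocesseur à mémoire dupliquée / Validation and implementation of synchronization in a multiprocessor system with duplicated memory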

Latapie, Guy 14 November 1980
We present the main tools for specifying and implementing synchronization mechanisms in a uniprocessor and then a multiprocessor system. We define and illustrate the evolution rules of Petri nets and detail the various analysis methods they allow. We propose a derived model called Petri nets with individualized tokens. We describe a synchronization algorithm and propose an implementation of this algorithm based on Petri nets with individualized tokens.
29

Architecture and Compiler Support for Leakage Reduction Using Power Gating in Microprocessors

Roy, Soumyaroop 31 August 2010
Power gating is a technique commonly used for runtime leakage reduction in digital CMOS circuits. In microprocessors, power gating can be implemented by using sleep transistors to selectively deactivate circuit modules when they are idle during program execution. In this dissertation, a framework for power gating arithmetic functional units in embedded microprocessors with architecture and compiler support is proposed. During compile time, program regions are identified where one or more functional units are idle, and sleep instructions are inserted into the code so that those units can be put to sleep during program execution. Subsequently, when the need for them is detected during the instruction decode stage, they are woken up with the help of hardware control signals. For a set of benchmarks from the MiBench suite, leakage energy savings of 27% and 31% are achieved (based on a 70 nm PTM model) in the functional units of a processor, modeled on the ARM architecture, with and without floating point units, respectively. Further, the impact of traditional performance-enhancing compiler optimizations on the amount of leakage savings obtained with this framework is studied through analysis and simulations. Based on the observations, a leakage-aware compilation flow is derived that improves the effectiveness of this framework. It is observed that, through the use of various compiler optimizations, additional savings of around 15% and even up to 9X leakage energy savings in individual functional units are possible. Finally, in the context of multi-core processors supporting multithreading, three different microarchitectural techniques, for different multithreading schemes, are investigated for state-retentive power gating of register files. In an in-order core, when a thread gets blocked due to a memory stall, the corresponding register file can be placed in a low leakage state.
When the memory stall gets resolved, the register file is activated so that it may be accessed again. The overhead due to wake-up latency is completely hidden in two of the schemes, while it is hidden for the most part in the third. Experimental results on multiprogrammed workloads comprised of SPEC 2000 integer benchmarks show that, in an 8-core processor executing 64 threads, the average leakage savings in the register files, modeled in FreePDK 45 nm MTCMOS technology, are 42% in coarse-grained multithreading, while they are between 7% and 8% in fine-grained and simultaneous multithreading. The contributions of this dissertation represent a significant advancement in the quest for reducing leakage energy consumption in microprocessors with minimal degradation in performance.
30

HdSC: modelagem de alto nível para simulação nativa de plataformas com suporte ao desenvolvimento de HdS / HdSC: high-level modeling for native simulation of platforms with support for HdS development

Prado, Bruno Otávio Piedade 08 1900
Recent advances in computer systems technology have enabled the rise of innovative devices, such as modern touch-sensitive cell phones and tablets. Properly managing these various interfaces requires Hardware-dependent Software (HdS), which is responsible for controlling and accessing these devices. Beyond this complex arrangement of components, the multiprocessing paradigm has been adopted to increase platform performance and meet the growing demand for more integrated features. Because of the system design productivity gap, both industry and academia have been researching more efficient processes to build and simulate increasingly complex systems. The premise of state-of-the-art work is to use models with a high level of abstraction and precision that let the designer evaluate the system quickly, without having to rely on slow and complex models based on instruction set simulators (ISS). This work defines a set of constructors for modeling processor-based platforms, with support for HdS development and component reusability, techniques for static estimation of simulated time in a native simulation environment, and support for multiprocessor platforms. Experiments were carried out with input/output-intensive, compute-intensive, and multiprocessed applications, yielding an average performance speedup on the order of 1,000 times and timing estimates with an average error below 3%, compared with a reference platform based on ISS.
