51

A Multiprocessor Platform Based on FPGA Technology Targeted for a Driver Vigilance Monitoring Device

Moussa, Wafik January 2009 (has links)
Medical devices that process images or audio, or that execute complex AI algorithms, can run more efficiently and meet real-time requirements if the parallelism in those algorithms is exploited. In this research, a methodology is proposed that exploits the flexibility and short design cycle of FPGAs (Field Programmable Gate Arrays) to achieve this target. Hardware/software co-design and dynamic partitioning allow the optimization of the multiprocessor platform's design parameters and of the software code targeting each core so as to meet real-time constraints. This is demonstrated in practice by building a real-life driver vigilance monitoring system based on the extraction and evaluation of visual cues. The application drives the whole design process to prove its effectiveness. An algorithm was developed to detect the driver's eye state (open or closed); it is applied to consecutive captured frames to evaluate the driver's vigilance, which is measured by the duration of eye closure. This video processing application is then targeted to run on a multi-core FPGA-based processing platform using the proposed methodology. Results were very good both on the Grimace Face Database and when operating the system on a live face. In operation, eye closure must be detected in two consecutive frames before an alarm is raised, which decreases the probability of false alarms. The timing analysis performed proved the importance of parallelism in meeting the performance constraints. FPGA technology proved to be a very powerful prototyping tool for the design of complex multiprocessor systems. The flexible FPGA technology, coupled with hardware/software co-design, provided the means to explore the design space and reach decisions that satisfy the design constraints with minimum time investment and cost.
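The two-consecutive-detections rule described above is easy to make concrete. Below is a minimal sketch of that alarm logic, assuming a per-frame `eye_closed` classifier; the classifier and frame source are hypothetical stand-ins for the FPGA vision pipeline, not the thesis's implementation.

```python
# Minimal sketch of the alarm logic described above: an alarm fires only
# when eye closure is detected in two consecutive frames, so a single
# spurious detection cannot trigger it. The eye_closed() classifier and
# the frame source are hypothetical placeholders for the vision pipeline.

def vigilance_monitor(frames, eye_closed):
    """Yield True (alarm) or False for each frame in the stream."""
    consecutive_closed = 0
    for frame in frames:
        if eye_closed(frame):
            consecutive_closed += 1
        else:
            consecutive_closed = 0
        yield consecutive_closed >= 2  # alarm after two consecutive closures

# Example with a stubbed classifier: frames are labelled True when the
# eye is detected as closed.
labels = [False, True, False, True, True, True]
alarms = list(vigilance_monitor(labels, eye_closed=lambda f: f))
assert alarms == [False, False, False, False, True, True]
```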
52

Hardware/Software Deadlock Avoidance for Multiprocessor Multiresource System-on-a-Chip

Lee, Jaehwan 23 November 2004 (has links)
This thesis describes fast and deterministic deadlock avoidance methods that are easily applicable to real-time MultiProcessor System-on-a-Chip (MPSoC) design. The thesis first proves the correctness of the previously proposed Parallel Deadlock Detection Algorithm (PDDA) and analyzes the run-time complexity of its hardware implementation, the Deadlock Detection Unit (DDU). The DDU has a worst-case run-time of O(min(m,n)), where m and n are the numbers of resources and processes, respectively. The thesis also provides a detailed explanation and mathematical analysis of PDDA and the DDU, along with examples, as well as extensive performance comparisons among PDDA in software, the DDU, and an O(m × n) deadlock detection algorithm; the DDU is 100× or more faster than software implementations of deadlock detection algorithms. Second, the thesis proposes a novel deadlock avoidance algorithm and its hardware implementation, the Deadlock Avoidance Unit (DAU), which provides very fast, automatic deadlock avoidance in an MPSoC with multiple single-instance resources. The DAU avoids deadlock by refusing any grant or request that would lead to a deadlock. If a livelock arises from an attempt to avoid deadlock, the DAU asks one of the processes involved to release its resource(s) so that the livelock is also resolved. We simulated two synthetic applications that benefit from the DAU and demonstrated that it avoids deadlock approximately 300× faster than an equivalent software implementation. Finally, the thesis proposes the novel Parallel Banker's Algorithm (PBA), a parallelized version of the Banker's Algorithm, and its hardware implementation, the PBA Unit (PBAU), which provides fast, automatic deadlock avoidance for systems with multiple-instance resources. The run-time complexity of the PBA is O(n), with a best case of O(1). The PBAU is about 1000× faster than the Banker's Algorithm in software and, in a particular example, achieves a 19% speed-up of application execution time. We believe that these approaches initiate a paradigm shift in deadlock solutions for MPSoC, from software-only solutions to hardware/software partitioned ones that offload part of the burden imposed on processors to a low-cost, fast hardware IP core exploiting full parallelism.
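For context, the safety check at the heart of the Banker's Algorithm — the computation that PBA parallelizes and the PBAU casts into hardware — can be sketched sequentially as follows. This is the textbook algorithm, not the thesis's parallel hardware design.

```python
def is_safe(available, allocation, need):
    """Sequential Banker's Algorithm safety check: return True if some
    ordering lets every process finish from the current state.
    available[j]     -- free instances of resource j
    allocation[i][j] -- instances of j held by process i
    need[i][j]       -- further instances of j that process i may request
    """
    work = list(available)
    finished = [False] * len(allocation)
    progress = True
    while progress:
        progress = False
        for i, row in enumerate(need):
            # A process can finish if its remaining need fits in 'work'.
            if not finished[i] and all(n <= w for n, w in zip(row, work)):
                # When it finishes, it returns everything it holds.
                work = [w + a for w, a in zip(work, allocation[i])]
                finished[i] = True
                progress = True
    return all(finished)

# Example: one free instance of each of two resources. A request is
# granted only if the state after granting remains safe.
print(is_safe(available=[1, 1],
              allocation=[[1, 0], [0, 1]],
              need=[[0, 1], [1, 0]]))  # True: P0 then P1 can finish
```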
53

Multiprocessor Scheduling with Availability Constraints

Grigoriu, Liliana May 2010 (has links)
We consider the problem of scheduling a given set of tasks on multiple processors with predefined periods of unavailability, with the aim of minimizing the maximum completion time. Since this problem is strongly NP-hard, polynomial approximation algorithms are studied for its solution. Among these, the best known are LPT (largest processing time first) and Multifit, with their variants. We give a Multifit-based algorithm, FFDL Multifit, which has optimal worst-case performance in the class of polynomial algorithms for same-speed processors with at most two downtimes on each machine, and for uniform processors with at most one downtime on each machine, assuming that P ≠ NP. Our algorithm finishes within 3/2 times the maximum of the end of the last downtime and the end of the optimal schedule. This bound is asymptotically tight in the class of polynomial algorithms assuming that P ≠ NP. For same-speed processors with at most k downtimes on each machine, our algorithm finishes within (3/2 + 1/(2k)) times the end of the last downtime or the end of the optimal schedule. For problems where the optimal schedule ends after the last downtime, and when the downtimes represent fixed jobs, the maximum completion time of FFDL Multifit is within 3/2 or (3/2 + 1/(2k)) of the optimal maximum completion time. We also give an LPT-based algorithm, LPTX, which matches the performance of FFDL Multifit for same-speed processors with at most one downtime on each machine, and is thus optimal in the class of polynomial algorithms for this case. LPTX differs from LPT in that it uses a specific order of processors to assign tasks if two processors become available at the same time. For a similar problem, where there is at most one downtime on each machine and no more than half of the machines are shut down at the same time, we show that a bound of 2 obtained in previous work for LPT is asymptotically tight in the class of polynomial algorithms assuming that P ≠ NP.
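For background, the classic LPT rule that LPTX refines sorts tasks by decreasing processing time and assigns each to the processor that becomes available earliest. The sketch below is the plain LPT heuristic without downtimes; LPTX's specific processor ordering for ties is the thesis's contribution and is not reproduced here.

```python
import heapq

def lpt_schedule(processing_times, num_processors):
    """Classic LPT: sort tasks by decreasing processing time and assign
    each to the processor with the earliest finish time so far.
    Returns the makespan. Downtimes are not modelled in this sketch."""
    # Heap of (current finish time, processor id).
    loads = [(0.0, p) for p in range(num_processors)]
    heapq.heapify(loads)
    for t in sorted(processing_times, reverse=True):
        finish, p = heapq.heappop(loads)
        heapq.heappush(loads, (finish + t, p))
    return max(finish for finish, _ in loads)

print(lpt_schedule([7, 5, 4, 3, 3, 2], num_processors=2))  # -> 12.0
```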
54

Analysis and optimization of global interconnects for many-core architectures

Balakrishnan, Anant 02 December 2010 (has links)
The objective of this thesis is to develop circuit-aware interconnect technology optimization for network-on-chip based many-core architectures. The dimensions of global interconnects in many-core chips are optimized for maximum bandwidth density and minimum delay, taking into account network-on-chip router latency and the size effects of copper. The optimal dimensions thus obtained are used to characterize different network-on-chip topologies in terms of wiring area utilization, maximum core-to-core channel width, aggregate chip bandwidth, and worst-case latency. Finally, the advantages of many-core many-tier chips are evaluated for different network-on-chip topologies. The area occupied by a router within a core is shown to be the bottleneck to achieving higher performance in network-on-chip based architectures.
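For context, the wire delay that such dimension optimization trades against bandwidth density is commonly modelled with the first-order distributed RC (Elmore) approximation. The following is standard background, not a formula taken from the thesis:

```latex
% First-order (Elmore) delay of a distributed RC line of length L,
% with r and c the resistance and capacitance per unit length
% (standard background material, not a formula from the thesis):
\tau \approx 0.38\, r\, c\, L^{2}, \qquad r = \frac{\rho}{W\,T}
```

Narrowing the wire width W raises r directly and, through the size effects of copper, raises the resistivity ρ itself, so delay degrades faster than ideal scaling suggests; this is the trade-off against bandwidth density behind the optimal dimensions above.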
55

An Implementation of Cross Architecture Procedure Call

Laeeq, Khan M 06 1900 (has links)
Indian Institute of Science / Workstations are ideally suited for computing jobs that require an interactive environment, because they are basically single-user machines and hence provide consistent response times. Another factor is the availability of peripheral devices such as mice and light pens, which render workstations more user-friendly for interactive jobs. However, workstations are not suitable for highly compute-intensive jobs, as they are basically uniprocessor machines operating at moderate frequencies. For such work, large mainframes or supercomputers are more suitable, but interactive use of these machines is not economically feasible, and devices such as mice are not usually available on them. A typical application program is partly interactive and partly compute-intensive, and hence requires the features of both workstations and supercomputers. We have implemented a Cross Architecture Procedure Call (CAPC) model whose purpose is to make supercomputers available to workstation users as compute servers. In this method, a workstation user marks those procedures of an application program that are to be executed on a remote mainframe or supercomputer connected to the workstation by a network. These procedures are compiled into machine code for the computer on which they are to execute. A special-purpose loader loads the procedures onto the appropriate machines, and they then execute on the remote machines at the appropriate times without any further modification of the source code. Typically, a user will execute compute-intensive procedures on the supercomputer and interactive parts on the workstation, thus utilizing both machines most efficiently. In our method, both local and remote procedures use standard subroutine call instructions, unlike RPC. Local and remote subroutines share a common virtual address space (physically distributed over many machines), so global and pointer variables can be used and parameters can be passed by reference with complete transparency. Arbitrary nesting of remote and local procedures is also possible. In our prototype implementation, we used an IBM PC (8088 processor operating at 4.7 MHz) as the workstation and a MAGNUM-1 (68030 processor operating at 25 MHz) as the compute server. As the IBM PC has no virtual memory hardware (essential for our architecture), we simulated a virtual memory management system for that machine in software. Our "network" is an RS-232C connection between the two machines, using COTPL (Connection Oriented Transport Provider for Local Communications) operating at 9600 baud. To test the system, we also implemented the required compiler for a simple language (a subset of Pascal, PL/0) that produces code for the 8088 and the 68030, as well as a special loader. The system has been completely implemented and tested with several programs, and a thorough performance study has been made. The system is found to accelerate applications by as much as 2.8 times in the best cases.
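The core idea — mark a procedure once, then invoke it with an ordinary call while it executes on another machine — can be illustrated with a toy sketch. The Python decorator below is only a conceptual analogue: CAPC itself works at the machine-code level with a shared virtual address space, so it needs no stubs or argument marshalling.

```python
# Toy illustration of the CAPC idea: procedures marked "remote" run
# elsewhere, while call sites remain ordinary calls. This is a
# conceptual Python analogue only -- CAPC itself operates at the
# machine-code level with a shared virtual address space.

REMOTE_REGISTRY = {}

def remote(func):
    """Mark a procedure for execution on the compute server."""
    REMOTE_REGISTRY[func.__name__] = func

    def dispatch(*args):
        # A real system would route this call over the network; here we
        # merely simulate the indirection with a local lookup.
        return REMOTE_REGISTRY[func.__name__](*args)
    return dispatch

@remote
def heavy_compute(n):
    return sum(i * i for i in range(n))  # the compute-intensive part

print(heavy_compute(10))  # an ordinary call site -> 285
```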
56

Constraint-based real-time scheduling for process control

Song, Jianping 23 November 2010 (has links)
This research addresses real-time task scheduling in industrial process control. It centers on a constraint-based scheduler built on MSP.RTL, a tool for real-time multiprocessor scheduling problems with a wide variety of timing constraints. This dissertation extends previous work in two broad directions: improving the tool itself, and broadening its application domain to include wired and wireless industrial process control. For the tool itself, we propose enhancements to MSP.RTL in three steps. In the first step, we modify the data structure representing the temporal constraint graph, cutting memory usage in half. In the second step, we model the search problem as a constraint satisfaction problem (CSP) and use backmarking and conflict-directed backjumping to speed up the search. In the third step, we recast the search entirely in terms of constraint satisfaction programming; as a result, we can efficiently apply existing CSP techniques such as look-ahead, backjumping, and consistency checking. Compared to the various ad hoc heuristics used in the original version, the new approach is more systematic and powerful. To exercise the new MSP.RTL tool, we acquired an updated version of the Boeing 777 Integrated Airplane Information Management System (AIMS). This new benchmark problem is more complicated than the one used with the original tool, in that data communications are described as messages and a message can have multiple senders and receivers. The new MSP.RTL tool successfully solved the new benchmark problem, whereas the old tool could not. To apply real-time scheduling in industrial process control, we pursue two directions. First, we apply the improved tool to traditional wired process control: it has been used successfully to solve the block assignment problem in Fieldbus networks, where each block comprising the control system is assigned to a specific device so that certain metrics of the system are optimized. Second, we address wireless industrial control, which has recently received much attention. We experimented with the tool to schedule communications on a simulated wireless industrial network. To integrate the scheduler into real wireless process control systems, we are building an experimental platform based on the WirelessHART standard. WirelessHART, the first open wireless standard for process control, defines a time-synchronized MAC layer that is ideal for real-time process control. We have implemented a prototype WirelessHART stack on Freescale JM128 toolkits and built demo applications on top of it. Even with the scheduler regulating communications in a wireless process control system, it may still happen that communication cannot be established over an inferior wireless link within the expected period. To handle this type of failure, we propose to make the control modules aware of the unreliability of wireless links, that is, to make them adapt to varying link qualities. PID (Proportional, Integral, Derivative) modules are the most widely used control modules. We developed PIDPlus, an enhanced PID algorithm that copes with lost inputs and outputs; PIDPlus has been shown to drastically improve the stability of the control loop when wireless communication is unreliable.
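As background, the discrete PID loop that PIDPlus extends is sketched below, with a simple guard for a lost wireless sample. The guard is only an illustrative assumption in the spirit of the abstract; the actual PIDPlus modifications are described in the dissertation and are not reproduced here. The gains kp, ki, kd are likewise illustrative.

```python
# Standard discrete PID loop (a sketch of what PIDPlus extends; the
# dissertation's actual PIDPlus algorithm is not reproduced here).

def pid_controller(kp, ki, kd, dt):
    integral, prev_error = 0.0, 0.0

    def step(setpoint, measurement):
        nonlocal integral, prev_error
        if measurement is None:
            # Wireless sample lost: hold the last contributions rather
            # than integrating a stale error (a simple guard in the
            # spirit of PIDPlus, not its actual algorithm).
            return kp * prev_error + ki * integral
        error = setpoint - measurement
        integral += error * dt
        derivative = (error - prev_error) / dt
        prev_error = error
        return kp * error + ki * integral + kd * derivative

    return step

control = pid_controller(kp=1.2, ki=0.3, kd=0.05, dt=0.5)
print(control(10.0, 7.5))   # normal update
print(control(10.0, None))  # lost wireless sample
```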
57

Design and development of a shared distributed memory system for a chip multiprocessor (CMP)

Αδαμίδης, Ανδρέας 09 February 2009 (has links)
The purpose of this master's thesis is the design and development of a distributed shared memory system as part of the SiScape multiprocessor architecture. Because of the architecture's irregular structure, the memory system, and in particular the second-level cache that enables its operation, had to be designed and developed from scratch in order to meet the architecture's requirements. The design of the second-level cache was described in the VHDL hardware description language.
58

Control system architectures for distributed manipulators and modular robots

Thatcher, Terence W. January 1987 (has links)
This thesis outlines the evolution of computer hardware and software architectures suitable for the programming and control of modular robots and distributed manipulators. Fundamental aspects of automating manufacturing functions are considered, and the use of flexible machines, constructed from components of a family of mechanical modules and associated control system elements, is proposed. Many features of these flexible machines can be identified with those of conventional industrial robots; however, they represent a broader class of manufacturing machine, inasmuch as the industrial user defines the kinematics and dynamics of the manipulator. Such flexible machines can be referred to as "modular robots" or, where the mechanical modules are arranged in concurrently operating but mechanically decoupled groups, as "distributed manipulators". The main body of the work reported centred on the design of a family of computer control system elements that can serve a range of distributed manipulator and modular robot forms. These control system elements, whose cost is commensurate with the size and complexity of the manipulator's mechanical configuration, necessarily share many features with robot controllers, but must in addition provide reconfigurability, programmability, and adequate control performance for the considerable array of manipulator configurations that can be constructed.
59

Improved Heuristics for Partitioned Multiprocessor Scheduling Based on Rate-Monotonic Small-Tasks

Müller, Dirk; Werner, Matthias 01 November 2012 (has links)
Partitioned preemptive EDF scheduling is very similar to bin packing, but there is a subtle difference. The probability of schedulability at a given total utilization has previously been studied empirically. Here, we show an approach to closed-form formulae for the problem, starting with n = 3 tasks on m = 2 processors.
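The bin-packing connection can be made concrete: under partitioned preemptive EDF, each processor is a bin of capacity 1, because uniprocessor EDF schedules any task set whose total utilization is at most 1. A minimal first-fit-decreasing sketch of that reduction follows (illustrative only; not the paper's improved heuristic):

```python
def ffd_partition(utilizations, num_processors):
    """First-fit decreasing partitioning for EDF: each processor is a
    bin of capacity 1.0, since uniprocessor EDF is optimal for total
    utilization <= 1. Returns per-processor assignments, or None if
    this heuristic fails to place some task."""
    bins = [[] for _ in range(num_processors)]
    loads = [0.0] * num_processors
    for u in sorted(utilizations, reverse=True):
        for i in range(num_processors):
            if loads[i] + u <= 1.0:
                bins[i].append(u)
                loads[i] += u
                break
        else:
            return None  # no processor can accommodate this task
    return bins

print(ffd_partition([0.6, 0.5, 0.4, 0.3, 0.2], num_processors=2))
# -> [[0.6, 0.4], [0.5, 0.3, 0.2]]
```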
60

Multiprocessor scheduling in the presence of link contention delays

Macey, Benjamin January 2004 (has links)
[Truncated abstract] Parallel computing is recognised today as an important tool in the solution of a wide variety of computationally intensive problems, problems which were previously considered intractable. While it offers the promise of vastly increased performance, parallel computing introduces additional complexities not encountered in sequential processing. One of these is the scheduling problem, in which the individual tasks making up a parallel program are assigned to the processors of the parallel architecture. The objective is to minimise execution time while preserving the precedence relations between the tasks. Scheduling is of vital importance, since a poor task schedule can undo any potential gains from the parallelism present in the application. Inappropriate scheduling can result in the hardware being used inefficiently or, worse, the program running slower in parallel than on a single processor. The scheduling problem is one of the more difficult problems facing the parallel programmer; in fact, it is NP-complete in the general case. As a result, a large number of heuristic methods with sub-optimal performance but polynomial, rather than exponential, time complexity have been proposed. To simplify their algorithms, researchers have restricted the problem by making assumptions about the parallel architecture or by imposing limitations on the task graph representing the parallel program. The evolution of the task scheduling problem has involved the gradual relaxation of these restrictions. A major change occurred when the assumption of zero inter-processor communication costs was removed, driven by the increasing popularity of distributed-memory message-passing multiprocessors.
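As a concrete illustration of the problem described here, a basic list scheduler assigns each task, in topological order, to the processor on which it can start earliest, charging a communication delay when a predecessor ran on a different processor. This sketch is illustrative background, not an algorithm from the thesis.

```python
# Minimal list-scheduling sketch: tasks with precedence constraints and
# inter-processor communication delays are greedily assigned to the
# processor that lets each task start earliest.

def list_schedule(tasks, deps, comm, num_procs):
    """tasks: {name: duration}; deps: {name: [predecessors]};
    comm: delay added when a predecessor ran on a different processor.
    Tasks must be given in a topological order. Returns the makespan."""
    proc_free = [0.0] * num_procs   # when each processor frees up
    placed = {}                     # task -> (processor, finish time)
    for t, dur in tasks.items():
        best = None
        for p in range(num_procs):
            # Earliest start respects predecessor finishes plus any
            # cross-processor communication delay.
            ready = max(
                [placed[d][1] + (comm if placed[d][0] != p else 0.0)
                 for d in deps.get(t, [])],
                default=0.0,
            )
            start = max(ready, proc_free[p])
            if best is None or start < best[1]:
                best = (p, start)
        p, start = best
        placed[t] = (p, start + dur)
        proc_free[p] = start + dur
    return max(finish for _, finish in placed.values())

tasks = {"a": 2, "b": 3, "c": 2, "d": 1}
deps = {"c": ["a"], "d": ["b", "c"]}
print(list_schedule(tasks, deps, comm=1.0, num_procs=2))  # -> 5.0
```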
