31

OpenMPBench : An Open-Source Benchmark for Multiprocessor Based Embedded Systems

Liang, Yuchen, Iqbal, Syed Muhammad Zeeshan January 2010
OpenMPBench is a new open-source benchmark for multiprocessor-based embedded systems. It comprises a set of parallel implementations of seven classical algorithms that cover different computing features of general-purpose processors. Performance data, including tables and figures, is provided to guide potential users in evaluating the design of multiprocessor-based embedded systems. The seven applications fall into four categories: Automation and Industry Control (Bitcount, SUSAN, BASICMATH), Network (Patricia, Dijkstra), Office (Stringsearch), and Security (SHA). Among them, Bitcount and Dijkstra each involve more than one parallel application, implemented for different functions or using different strategies. Bitcount consists of three parallel applications (parallel Bitcnt_1, parallel Bitstring, and parallel Bitcnts) that implement bit counting with different strategies. Three parallel applications are implemented for Dijkstra: one for the all-pairs shortest paths problem, and two for the single-source shortest paths problem, using a single-queue strategy and a multiple-queue strategy respectively. Stringsearch consists of Pratt-Boyer-Moore, case-sensitive Boyer-Moore-Horspool, case-insensitive Boyer-Moore-Horspool, and Boyer-Moore-Horspool (case-insensitive with accented character translation) implementations. The source code of the sequential versions of these applications was downloaded from MiBench, as was the standard output based on x86-linux. For OpenMPBench, all parallel applications have been implemented in ANSI C using POSIX threads. All libraries used by the implementations are based on the GNU standard library. The development environment is Ubuntu 9.04 with the 2.6.28-generic Linux kernel, the GCC 4.2.4 compiler, and the Emacs 22.1 editor. Given the available hardware, a workstation with 8 processors, shipped with Ubuntu 4.2.4, was selected as the experiment environment. Ubuntu is a free GNU/Linux distribution that provides the full GNU standard library and has GCC installed by default. In conclusion, we consider this experiment environment suitable for simulating multiprocessor-based embedded systems.
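The idea behind a parallel bit-counting kernel can be sketched as follows. This is only an illustration of splitting the input across workers and summing the partial counts; it is written in Python rather than the ANSI C with POSIX threads used by the thesis, and the function names and the chunking strategy are assumptions, not the OpenMPBench code. Because of Python's global interpreter lock, it shows the decomposition rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def count_bits(values):
    """Count the set bits in a sequence of non-negative integers."""
    return sum(bin(v).count("1") for v in values)


def parallel_count_bits(values, workers=4):
    """Split the input into contiguous chunks and count each chunk in a worker thread."""
    if not values:
        return 0
    # Ceiling division so that every element lands in exactly one chunk.
    chunk = (len(values) + workers - 1) // workers
    parts = [values[i:i + chunk] for i in range(0, len(values), chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each worker returns a partial count; the partial counts are summed.
        return sum(pool.map(count_bits, parts))
```

The partial counts are independent, so no synchronization is needed until the final sum, which is the property that makes this kind of kernel easy to parallelize.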
32

A comparison of sequencing formulations in a constraint generation procedure for avionics scheduling

Boberg, Jessika January 2017
This thesis compares different mixed integer programming (MIP) formulations for sequencing of tasks in the context of avionics scheduling. Sequencing is a key concern in many discrete optimisation problems, and there are numerous ways of accomplishing sequencing with different MIP formulations. A scheduling tool for avionic systems has previously been developed in a collaboration between Saab and Linköping University. This tool includes a MIP formulation of the scheduling problem in which one of the model components has the purpose of sequencing tasks. In this thesis, this sequencing component is replaced with other MIP formulations in order to study whether the computational performance of the scheduling tool can be improved. The formulations are tested on different scheduling instances and objective functions, and their success is judged by the computational time of the entire avionic scheduling model. The results show that the choice of MIP formulation has a considerable impact on the computational performance and that a significant improvement can be achieved by choosing the most suitable one.
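The abstract does not name the formulations it compares. As one common textbook example of a sequencing formulation, the big-M disjunctive constraints use a binary variable per task pair to decide which task goes first. The sketch below only checks those constraints for given start times; the function names, the data layout, and the choice of this particular formulation are assumptions for illustration, not the formulations from the thesis.

```python
from itertools import product


def respects_big_m(starts, durations, order, big_m):
    """Check the big-M disjunctive constraints for every listed task pair.

    order[(i, j)] == 1 means task i precedes task j; 0 means j precedes i.
    """
    for (i, j), y in order.items():
        # "i finishes before j starts", switched off when y == 0.
        if starts[i] + durations[i] > starts[j] + big_m * (1 - y):
            return False
        # "j finishes before i starts", switched off when y == 1.
        if starts[j] + durations[j] > starts[i] + big_m * y:
            return False
    return True


def non_overlapping(starts, durations, big_m):
    """True if some assignment of the binary order variables satisfies all pairs."""
    pairs = [(i, j) for i in starts for j in starts if i < j]
    return any(
        respects_big_m(starts, durations, dict(zip(pairs, ys)), big_m)
        for ys in product((0, 1), repeat=len(pairs))
    )
```

A MIP solver would search over the start times and the binary variables together; the quality of that search depends heavily on how tight the chosen formulation is, which is the kind of effect the thesis measures.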
33

A Multi-core Testbed on Desktop Computer for Research on Power/Thermal Aware Resource Management

Dierivot, Ashley 06 June 2014
Our goal is to develop a flexible, customizable, and practical multi-core testbed based on an Intel desktop computer that can be used to assist theoretical research on power/thermal aware resource management in the design of computer systems. By integrating different modules, i.e. thread mapping/scheduling, processor/core frequency and voltage variation, temperature/power measurement, and run-time performance collection, into a systematic and unified framework, our testbed can bridge the gap between theoretical study and practical implementation. The effectiveness of our system was validated using appropriately selected benchmarks. The importance of this research is that it complements current theoretical research by validating theoretical results in practical scenarios that are closer to real-world conditions. In addition, by studying the discrepancies between the results of theoretical studies and their application in the real world, the research also aids in identifying new research problems and directions.
34

ByteSTM: Java Software Transactional Memory at the Virtual Machine Level

Mahmoud Mohamedin, Mohamed Ahmed 21 March 2012
As chip vendors increasingly manufacture a new generation of multi-processor chips called multicores, improving software performance requires exposing greater concurrency in software. Since code that must run sequentially is often the result of the need for synchronization, the synchronization abstraction has a significant effect on program performance. Lock-based synchronization, the most widely used synchronization method, suffers from programmability, scalability, and composability challenges. Transactional memory (TM) is an emerging synchronization abstraction that promises to alleviate the difficulties with lock-based synchronization. With TM, code that reads or writes shared memory objects is organized as transactions, which execute speculatively. When two transactions conflict (e.g., read/write, write/write), one of them is aborted while the other commits, yielding (the illusion of) atomicity. Aborted transactions are restarted after rolling back the changes made to objects. In addition to a simple programming model, TM provides performance comparable to lock-based synchronization. Software transactional memory (STM) implements TM entirely in software, without any special hardware support, and is usually implemented as a library or supported by a compiler or a virtual machine. In this thesis, we present ByteSTM, a virtual machine-level Java STM implementation. ByteSTM implements two STM algorithms, TL2 and RingSTM, and transparently supports implicit transactions. Program bytecode is automatically modified to support transactions: memory load/store bytecode instructions automatically switch to transactional mode when a transaction starts, and switch back to normal mode when the transaction successfully commits. Being implemented at the VM level, ByteSTM accesses memory directly and uses absolute memory addresses to handle memory uniformly. Moreover, it avoids Java garbage collection (which has a negative impact on STM performance) by manually allocating and recycling memory for transactional metadata. ByteSTM uses field-based granularity and stores transactional metadata in the thread header instead of the slower Java ThreadLocal abstraction. We conducted experimental studies comparing ByteSTM with other state-of-the-art Java STMs, including Deuce, ObjectFabric, Multiverse, DSTM2, and JVSTM, on a set of micro-benchmarks and macro-benchmarks. Our results reveal that ByteSTM's transactional throughput improvement over competitors ranges from 20% to 75% on micro-benchmarks and from 36% to 100% on macro-benchmarks. / Master of Science
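The speculate, validate, and retry cycle described above can be sketched in a few lines. This is a deliberately simplified illustration in Python, loosely in the spirit of version-clock STMs; it is not ByteSTM, TL2, or RingSTM. The names `TVar`, `Transaction`, and `atomically` are hypothetical, and a single global lock guards reads and commits, which a real STM would avoid.

```python
import threading


class Conflict(Exception):
    """Raised when a transaction sees data newer than its snapshot."""


class TVar:
    """A shared variable tagged with the version of its last committed write."""

    def __init__(self, value):
        self.value = value
        self.version = 0


_clock = 0                       # global version clock
_commit_lock = threading.Lock()  # simplification: one lock for reads and commits


class Transaction:
    def __init__(self):
        self.snapshot = _clock   # global version observed at start
        self.reads = set()
        self.writes = {}         # buffered writes: TVar -> new value

    def read(self, var):
        if var in self.writes:   # read-your-own-writes
            return self.writes[var]
        with _commit_lock:
            value, version = var.value, var.version
        if version > self.snapshot:
            raise Conflict()
        self.reads.add(var)
        return value

    def write(self, var, value):
        self.writes[var] = value  # nothing is visible until commit

    def commit(self):
        global _clock
        with _commit_lock:
            # Abort if anything we read was overwritten after we started.
            if any(v.version > self.snapshot for v in self.reads):
                raise Conflict()
            _clock += 1
            for var, value in self.writes.items():
                var.value = value
                var.version = _clock


def atomically(fn):
    """Run fn(tx) repeatedly until it commits without a conflict."""
    while True:
        tx = Transaction()
        try:
            result = fn(tx)
            tx.commit()
            return result
        except Conflict:
            continue
```

Buffering writes means an aborted transaction has nothing to roll back; STMs that write in place instead keep an undo log, which is closer to the rollback wording in the abstract.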
35

Compilation and Generation of Multi-Processor on a Chip Real-Time Embedded Systems

Klingler, Randall S. 10 July 2007
Current FPGA technology has advanced to the point that useful embedded Systems-on-Programmable-Chip (SoPCs) can now be designed. The Real Time Processor (RTP) project leverages the advances in FPGA technology with a system architecture that is customizable to specific real-time applications. The design and implementation of the framework for architecting such a system from ANSI-C code is presented. The Small Device C Compiler (SDCC) was retargeted to the RTP architecture and extended to produce a generator directive file. The RTPGen hardware generator was created to consume the directive file and produce a highly customized top-level structural VHDL file that can be synthesized and programmed onto an FPGA such as the Xilinx Spartan-3. Thus, an application-specific multiprocessor real-time embedded system is realized from ANSI-C code.
36

Bat Intelligent Hunting Optimization with Application to Multiprocessor Scheduling

Kim, Hyun Soo January 2010
No description available.
37

A Novel Configurable Benchmarking System for Multi-core Architectures

Panda, Amayika 20 September 2011
No description available.
38

Methods for Creating and Exploiting Data Locality

Wallin, Dan January 2006
The gap between processor speed and memory latency has led to the use of caches in the memory systems of modern computers. Programs must use the caches efficiently and exploit data locality for maximum performance. Multiprocessors, built from many processing units, are becoming commonplace not only in large servers but also in smaller systems such as personal computers. Multiprocessors require careful data locality optimizations, since accesses from other processors can lead to invalidations and false sharing cache misses. This thesis explores hardware and software approaches for creating and exploiting temporal and spatial locality in multiprocessors. We propose the capacity prefetching technique, which efficiently reduces the number of cache misses but avoids false sharing by distinguishing cache lines involved in communication from non-communicating cache lines at run-time. Prefetching techniques often lead to increased coherence and data traffic. The new bundling technique avoids one of these drawbacks by reducing the coherence traffic in multiprocessor prefetchers. This is especially important in snoop-based systems, where the coherence bandwidth is a scarce resource. Most of the studies have been performed on advanced scientific algorithms. This thesis demonstrates that a cc-NUMA multiprocessor with hardware data migration and replication optimizations efficiently exploits the temporal locality in such codes. We further present a method of parallelizing a multigrid Gauss-Seidel partial differential equation solver, which creates temporal locality at the expense of increased communication. Our conclusion is that on modern chip multiprocessors, it is more important to optimize algorithms for data locality than to avoid communication, since communication can take place using a shared cache.
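False sharing, mentioned above, arises when two processors write to different data that happen to sit in the same cache line. The sketch below shows the address arithmetic involved and the usual padding remedy. The 64-byte line size and all function names are assumptions for illustration; the code does not model the prefetching or bundling techniques proposed in the thesis.

```python
CACHE_LINE_BYTES = 64  # assumed line size; real hardware varies


def cache_line(address, line_bytes=CACHE_LINE_BYTES):
    """Index of the cache line that contains a byte address."""
    return address // line_bytes


def may_false_share(addr_a, addr_b, line_bytes=CACHE_LINE_BYTES):
    """True when two distinct addresses fall in the same cache line."""
    return addr_a != addr_b and cache_line(addr_a, line_bytes) == cache_line(addr_b, line_bytes)


def padded_offsets(count, item_bytes, line_bytes=CACHE_LINE_BYTES):
    """Offsets for per-thread items, each rounded up to a whole number of lines."""
    stride = -(-item_bytes // line_bytes) * line_bytes  # ceiling to a line multiple
    return [i * stride for i in range(count)]
```

With 8-byte counters packed back to back, neighbouring counters share a line; placing them at the padded offsets gives each thread its own line at the cost of unused space.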
39

Scheduling Tasks on Heterogeneous Chip Multiprocessors with Reconfigurable Hardware

Teller, Justin Stevenson 31 July 2008
No description available.
40

Performance prediction for dynamic voltage and frequency scaling

Miftakhutdinov, Rustam Raisovich 28 October 2014
This dissertation proves the feasibility of accurate runtime prediction of processor performance under frequency scaling. The performance predictors developed in this dissertation allow processors capable of dynamic voltage and frequency scaling (DVFS) to improve their performance or energy efficiency by dynamically adapting chip or core voltages and frequencies to workload characteristics. The dissertation considers three processor configurations: the uniprocessor capable of chip-level DVFS, the private cache chip multiprocessor capable of per-core DVFS, and the shared cache chip multiprocessor capable of per-core DVFS. Depending on processor configuration, the presented performance predictors help the processor realize 72–85% of average oracle performance or energy efficiency gains.
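To make the idea of predicting performance under frequency scaling concrete, a widely used first-order model splits execution time into a part that scales with core frequency and a memory-bound part that does not. The sketch below uses that simplified model. It is an assumption chosen for illustration and is not one of the predictors developed in the dissertation; the function names and the caller-supplied power function are likewise hypothetical.

```python
def predict_time(core_time, memory_time, base_freq, new_freq):
    """First-order model: core time scales with 1/f, memory time stays fixed."""
    if new_freq <= 0:
        raise ValueError("frequency must be positive")
    return core_time * (base_freq / new_freq) + memory_time


def best_frequency(core_time, memory_time, base_freq, candidates, power):
    """Pick the candidate frequency that minimises energy = power(f) * time(f)."""
    return min(
        candidates,
        key=lambda f: power(f) * predict_time(core_time, memory_time, base_freq, f),
    )
```

Under this model a memory-bound workload loses little time at a lower frequency, so scaling down saves energy; the hard part in practice is estimating the two time components at run time, which is what a performance predictor has to supply.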
