• Refine Query
  • Source
  • Publication year
  • to
  • Language
  • 20
  • 8
  • 4
  • 2
  • 2
  • 1
  • 1
  • 1
  • Tagged with
  • 45
  • 45
  • 45
  • 45
  • 14
  • 11
  • 9
  • 9
  • 8
  • 7
  • 7
  • 7
  • 7
  • 6
  • 6
  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
1

Design and Implementation of SoC Hardware-Software Co-design Platform

Leong, Mun-kit 14 February 2008 (has links)
Reconfigurable supercomputing has been used by many high-performance computer systems to accelerate the processing speed. Thus, it is the present trend to use the microprocessor to combine with reconfigurable FPGA as the embedded system platform. However, the hardware-software co-design and integration of embedded system become great challenges of the designer. Beside this, the communication between hardware and software is crucial for the system to be operated effectively. Our concept consists of the design of FPGA configuration, described in I-Link hardware/software integration, improve the communication among the hardware and software. Besides, by using command packet method, we put the data to multi-hardware through hardware management unit (HMU). While system is operated, The Boot Loader will set up TCB and HCB data structure through PSP. The PSP can be regarded as the important reference segment of messages switching among system and hardware/software. The HMU has data buffering and management ability which can let the processes more easy and smooth. We successfully accomplish a hardware-software integrated system in HSCP, which is developed in our laboratory. The basic components of our platform include ARM7TDMI CPU, memory and Altera ACEK 1K-100 of FPGA. By using ARM-code, we also preliminary accomplish the Boot Loader, HW Constructor and self-developed embedded system. Finally, we make use of a large amount of multiplication operation and matrix summation to verify the feasibility of this system architecture.
2

Towards An Automated Approach to Hardware/Software Decomposition

Qin, Shengchao, He, Jifeng, Chin, Wei Ngan 01 1900 (has links)
We propose in this paper an algebraic approach to hard-ware/software partitioning in Verilog Hardware Description Language (HDL). We explore a collection of algebraic laws for Verilog programs, from which we design a set of syntax-based algebraic rules to conduct hardware/software partitioning. The co-specification language and the target hardware and software description languages are specific subsets of Verilog. Through this, we confirm successful verification for the correctness of the partitioning process by an algebra of Verilog. Facilitated by Verilog’s rich features, we have also successfully studied hw/sw partitioning for environment-driven systems. / Singapore-MIT Alliance (SMA)
3

System Modeling and Design Refinement in ForSyDe

Sander, Ingo January 2003 (has links)
Advances in microelectronics allow the integration of more andmore functionality on a single chip. Emerging system-on-a-chiparchitectures include a large amount of heterogeneous componentsand are of increasing complexity. Applications using thesearchitectures require many low-level details in order to yield anefficient implementation. On the other hand constanttime-to-market pressure on electronic systems demands a shortdesign process that allows to model a system at a highabstraction level, not taking low-level implementation detailsinto account. Clearly there is a significant abstraction gapbetween an ideal model for specification and another one forimplementation. This abstraction gap has to be addressed bymethodologies for electronic system design. This thesis presents the ForSyDe (Formal System Design)methodology, which has been developed with the objective to movesystem design to a higher level of abstraction and to bridge theabstraction gap by transformational design refinement. ForSyDe isbased on carefully selected formal foundations. The initialspecification model uses a synchronous model of computation,which separates communication from computation and has anabstract notion of time. ForSyDe uses the concept of processconstructors to implement the synchronous model, to allow fordesign transformation and the mapping of a refined model onto thetarget architecture. The specification model is refined into adetailed implementation model by the stepwise application ofwell-defined design transformation rules. These rules are eithersemantic preserving or they inflict a design decision modifyingthe semantics. These design decisions are used to introduce thelow-level implementation details that are needed for an efficientimplementation. The implementation model is mapped onto thecomponents of the target architecture. At present ForSyDe modelscan be mapped onto VHDL or C/C++ in order to allow commercialtools to generate custom hardware or sequential software. Thethesis uses a digital equalizer to illustrate the concepts andpotential of ForSyDe. Electronic System Design, Hardware/Software Co-Design,Electrical Engineering
4

System Modeling and Design Refinement in ForSyDe

Sander, Ingo January 2003 (has links)
<p>Advances in microelectronics allow the integration of more andmore functionality on a single chip. Emerging system-on-a-chiparchitectures include a large amount of heterogeneous componentsand are of increasing complexity. Applications using thesearchitectures require many low-level details in order to yield anefficient implementation. On the other hand constanttime-to-market pressure on electronic systems demands a shortdesign process that allows to model a system at a highabstraction level, not taking low-level implementation detailsinto account. Clearly there is a significant abstraction gapbetween an ideal model for specification and another one forimplementation. This abstraction gap has to be addressed bymethodologies for electronic system design.</p><p>This thesis presents the ForSyDe (Formal System Design)methodology, which has been developed with the objective to movesystem design to a higher level of abstraction and to bridge theabstraction gap by transformational design refinement. ForSyDe isbased on carefully selected formal foundations. The initialspecification model uses a synchronous model of computation,which separates communication from computation and has anabstract notion of time. ForSyDe uses the concept of processconstructors to implement the synchronous model, to allow fordesign transformation and the mapping of a refined model onto thetarget architecture. The specification model is refined into adetailed implementation model by the stepwise application ofwell-defined design transformation rules. These rules are eithersemantic preserving or they inflict a design decision modifyingthe semantics. These design decisions are used to introduce thelow-level implementation details that are needed for an efficientimplementation. The implementation model is mapped onto thecomponents of the target architecture. At present ForSyDe modelscan be mapped onto VHDL or C/C++ in order to allow commercialtools to generate custom hardware or sequential software. Thethesis uses a digital equalizer to illustrate the concepts andpotential of ForSyDe.</p><p>Electronic System Design, Hardware/Software Co-Design,Electrical Engineering</p>
5

A New System Architecture for Heterogeneous Compute Units

Asmussen, Nils 09 August 2019 (has links)
The ongoing trend to more heterogeneous systems forces us to rethink the design of systems. In this work, I study a new system design that considers heterogeneous compute units (general-purpose cores with different instruction sets, DSPs, FPGAs, fixed-function accelerators, etc.) from the beginning instead of as an afterthought. The goal is to treat all compute units (CUs) as first-class citizens, enabling (1) isolation and secure communication between all types of CUs, (2) a direct interaction of all CUs, removing the conventional CPU from the critical path, and (3) access to operating system (OS) services such as file systems and network stacks for all CUs. To study this system design, I am using a hardware/software co-design based on two key ideas: 1) introduce a new hardware component next to each CU used by the OS as the CUs' common interface and 2) let the OS kernel control applications remotely from a different CU. The hardware component is called data transfer unit (DTU) and offers the minimal set of features to reach the stated goals: secure message passing and memory access. The OS is called M³ and runs its kernel on a dedicated CU and runs the OS services and applications on the remaining CUs. The kernel is responsible for establishing DTU-based communication channels between services and applications. After a channel has been set up, services and applications communicate directly without involving the kernel. This approach allows to support arbitrary CUs as aforementioned first-class citizens, ranging from fixed-function accelerators to complex general-purpose cores.
6

ADAPT : architectural and design exploration for application specific instruction-set processor technologies

Shee, Seng Lin, Computer Science & Engineering, Faculty of Engineering, UNSW January 2007 (has links)
This thesis presents design automation methodologies for extensible processor platforms in application specific domains. The work presents first a single processor approach for customization; a methodology that can rapidly create different processor configurations by the removal of unused instructions sets from the architecture. A profile directed approach is used to identify frequently used instructions and to eliminate unused opcodes from the available instruction pool. A coprocessor approach is next explored to create an SoC (System-on-Chip) to speedup the application while reducing energy consumption. Loops in applications are identified and accelerated by tightly coupling a coprocessor to an ASIP (Application Specific Instruction-set Processor). Latency hiding is used to exploit the parallelism provided by this architecture. A case study has been performed on a JPEG encoding algorithm; comparing two different coprocessor approaches: a high-level synthesis approach and our custom coprocessor approach. The thesis concludes by introducing a heterogenous multi-processor system using ASIPs as processing entities in a pipeline configuration. The problem of mapping each algorithmic stage in the system to an ASIP configuration is formulated. We proposed an estimation technique to calculate runtimes of the configured multiprocessor system without running cycle-accurate simulations, which could take a significant amount of time. We present two heuristics to efficiently search the design space of a pipeline-based multi ASIP system and compare the results against an exhaustive approach. In our first approach, we show that, on average, processor size can be reduced by 30%, energy consumption by 24%, while performance is improved by 24%. In the coprocessor approach, compared with the use of a main processor alone, a loop performance improvement of 2.57x is achieved using the custom coprocessor approach, as against 1.58x for the high level synthesis method, and 1.33x for the customized instruction approach. Energy savings are 57%, 28% and 19%, respectively. Our multiprocessor design provides a performance improvement of at least 4.03x for JPEG and 3.31x for MP3, for a single processor design system. The minimum cost obtained using our heuristic was within 0.43% and 0.29% of the optimum values for the JPEG and MP3 benchmarks respectively.
7

Low-cost Hardware Profiling of Run-time and Energy in FPGA Soft Processors

Aldham, Mark 11 August 2011 (has links)
Field Programmable Gate Arrays (FPGAs) are a reconfigurable hardware platform which enable the acceleration of software code through the use of custom-hardware circuits. Complex systems combining processors with programmable logic require partitioning to decide which code segments to accelerate. This thesis provides tools to help determine which software code sections would most benefit from hardware acceleration. A low-overhead profiling architecture, called LEAP, is proposed to attain real-time profiles of an FPGA-based processor. LEAP is designed to be extensible for a variety of profiling tasks, three of which are investigated and implemented to identify candidate software for acceleration. 1) Cycle profiling determines the most time-consuming functions to maximize speedup. 2) Cache stall profiling detects memory-intensive code; large memory overheads reduce the benefits of acceleration. 3) Energy consumption profiling detects energy-inefficient code through the use of an instruction-level power database to minimize the system's energy consumption.
8

Enabling Hardware/Software Co-design in High-level Synthesis

Choi, Jongsok 21 November 2012 (has links)
A hardware implementation can bring orders of magnitude improvements in performance and energy consumption over a software implementation. Hardware design, however, can be extremely difficult. High-level synthesis, the process of compiling software to hardware, promises to make hardware design easier. However, compiling an entire software program to hardware can be inefficient. This thesis proposes hardware/software co-design, where computationally intensive functions are accelerated by hardware, while remaining program segments execute in software. The work in this thesis builds a framework where user-designated software functions are automatically compiled to hardware accelerators, which can execute serially or in parallel to work in tandem with a processor. To support multiple parallel accelerators, new multi-ported cache designs are presented. These caches provide low-latency high-bandwidth data to further improve the performance of accelerators. An extensive range of cache architectures are explored, and results show that certain cache architectures significantly outperform others in a processor/accelerator system.
9

Low-cost Hardware Profiling of Run-time and Energy in FPGA Soft Processors

Aldham, Mark 11 August 2011 (has links)
Field Programmable Gate Arrays (FPGAs) are a reconfigurable hardware platform which enable the acceleration of software code through the use of custom-hardware circuits. Complex systems combining processors with programmable logic require partitioning to decide which code segments to accelerate. This thesis provides tools to help determine which software code sections would most benefit from hardware acceleration. A low-overhead profiling architecture, called LEAP, is proposed to attain real-time profiles of an FPGA-based processor. LEAP is designed to be extensible for a variety of profiling tasks, three of which are investigated and implemented to identify candidate software for acceleration. 1) Cycle profiling determines the most time-consuming functions to maximize speedup. 2) Cache stall profiling detects memory-intensive code; large memory overheads reduce the benefits of acceleration. 3) Energy consumption profiling detects energy-inefficient code through the use of an instruction-level power database to minimize the system's energy consumption.
10

Enabling Hardware/Software Co-design in High-level Synthesis

Choi, Jongsok 21 November 2012 (has links)
A hardware implementation can bring orders of magnitude improvements in performance and energy consumption over a software implementation. Hardware design, however, can be extremely difficult. High-level synthesis, the process of compiling software to hardware, promises to make hardware design easier. However, compiling an entire software program to hardware can be inefficient. This thesis proposes hardware/software co-design, where computationally intensive functions are accelerated by hardware, while remaining program segments execute in software. The work in this thesis builds a framework where user-designated software functions are automatically compiled to hardware accelerators, which can execute serially or in parallel to work in tandem with a processor. To support multiple parallel accelerators, new multi-ported cache designs are presented. These caches provide low-latency high-bandwidth data to further improve the performance of accelerators. An extensive range of cache architectures are explored, and results show that certain cache architectures significantly outperform others in a processor/accelerator system.

Page generated in 0.0489 seconds