Global ETD Search

11	High Data Rate Signal Processing Architectures and Compilation Strategies for Scalable, Multi-Gigabit Digital Systems. Nybo, Daniel Alexander. 12 April 2024. In this study we present a high-performance computing architecture and hardware acceleration strategy for a heterogeneous multi-gigabit computing system. The system architecture integrates a BeeGFS distributed file system, capable of achieving 80 Gbps of sustained write throughput across five nodes, essential for managing the high data volumes generated by a 25 high performance computer (HPC) compute cluster. To ensure operational efficiency and scalability, the tasks performed on the Linux compute cluster consisting of 30 nodes are automated using Ansible, facilitating seamless deployment, management, and updates. We present compilation strategies for a hardware-accelerated Polyphase Filter Bank (PFB) channelization routine optimized for Xilinx UltraScale+ FPGAs, capable of simultaneously processing 2048 channels per 12 input streams. This setup shows the efficiency of High-Level Synthesis of FPGA-based signal processing in handling demanding data analysis tasks. We also present the implementation and verification of a 1.6 Gsps Direct Memory Access (DMA) transfer from DDR4 memory to a modern Radio Frequency System on Chip (RFSoC) digital-to-analog converter. The combination of a high-throughput file system, streamlined automation, and advanced signal processing capabilities shows the system's ability to meet the needs of complex, real-time data analysis and processing applications, advancing the field of computational research. InfiniBand; remote direct memory access; distributed file systems; GPU direct storage; BeeGFS; polyphase filter bank; high level synthesis; DSP; radio astronomy; Engineering
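For readers unfamiliar with the polyphase filter bank (PFB) channelization mentioned in entry 11, the following is a minimal plain-Python sketch of a critically sampled PFB. It is an illustration only, not the thesis's HLS implementation: the function names, the Hann-windowed sinc prototype filter, and the small sizes (8 channels, 4 taps per branch, rather than the 2048 channels in the abstract) are all assumptions chosen for readability.

```python
import cmath
import math


def prototype_filter(num_channels, taps_per_branch):
    """Hann-windowed sinc low-pass filter (an assumed, illustrative design)."""
    length = num_channels * taps_per_branch
    coeffs = []
    for n in range(length):
        # Position relative to the filter centre, in units of one channel period.
        x = (n - (length - 1) / 2) / num_channels
        sinc = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
        hann = 0.5 - 0.5 * math.cos(2 * math.pi * n / (length - 1))
        coeffs.append(sinc * hann)
    return coeffs


def pfb_channelize(samples, num_channels, taps_per_branch):
    """Return one num_channels-point spectrum per block of num_channels inputs."""
    coeffs = prototype_filter(num_channels, taps_per_branch)
    frame_len = num_channels * taps_per_branch
    spectra = []
    for start in range(0, len(samples) - frame_len + 1, num_channels):
        frame = samples[start:start + frame_len]
        # Polyphase step: weight the frame, then fold it onto num_channels branches.
        folded = [
            sum(frame[p * num_channels + m] * coeffs[p * num_channels + m]
                for p in range(taps_per_branch))
            for m in range(num_channels)
        ]
        # Direct DFT across the branches; a real design would use an FFT here.
        spectra.append([
            sum(folded[m] * cmath.exp(-2j * math.pi * k * m / num_channels)
                for m in range(num_channels))
            for k in range(num_channels)
        ])
    return spectra
```

Feeding the sketch a complex tone centred on one channel should concentrate the output power in that channel's bin, which is the property that makes the PFB preferable to a plain windowed FFT for channelization.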
12	Toward Highly-efficient GPU-centric Networking / Mot Högeffektiva GPU-centrerade Nätverk. Girondi, Massimo. January 2024. Graphics Processing Units (GPUs) are emerging as the most popular accelerator for many applications, powering the core of Machine Learning applications and many computing-intensive workloads. GPUs have typically been considered as accelerators, with Central Processing Units (CPUs) in charge of the main application logic, data movement, and network connectivity. In these architectures, the input and output data of network-based GPU-accelerated applications typically traverse the CPU and the Operating System network stack multiple times, getting copied across the system main memory. These copies increase application latency and require expensive CPU cycles, reducing the power efficiency of systems and increasing the overall response times. These inefficiencies become more important in latency-bounded deployments, or at high throughput, where copy times could easily inflate the response time of modern GPUs. The main contribution of this dissertation is towards a GPU-centric network architecture, allowing GPUs to initiate network transfers without the intervention of CPUs. We focus on commodity hardware, using NVIDIA GPUs and Remote Direct Memory Access over Converged Ethernet (RoCE) to realize this architecture, removing the need for highly homogeneous clusters and ad-hoc designed network architectures, as required by many other similar approaches. By porting some rdma-core posting routines to the GPU runtime, we can saturate a 100-Gbps link without any CPU cycle, reducing the overall system response time while increasing the power efficiency and improving the application throughput. The second contribution concerns the analysis of Clockwork, a state-of-the-art inference serving system, showing the limitations imposed by controller-centric, CPU-mediated architectures. We then propose an alternative architecture to this system based on an RDMA transport, and we study some performance gains that such a system would introduce. An integral component of an inference system is to account for and track user flows, and to distribute them across multiple worker nodes. Our third contribution aims to understand the challenges of Connection Tracking applications running at 100 Gbps, in the context of a Stateful Load Balancer running on commodity hardware. Low-Latency Internet Services; Packet Processing; Network Functions Virtualization; Middle Boxes; Commodity Hardware; Multi-Hundred-Gigabit-Per-Second; Low-Level Optimization; Graphics Processing Units; Inference Serving; Remote Direct Memory Access; Communication Systems; Computer Systems
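The connection tracking described in entry 12 (remembering which worker each user flow was sent to, so that later packets of the same flow reach the same worker) can be illustrated with a small sketch. This is not the design studied in the dissertation: the class name, the 5-tuple flow key, and the round-robin choice for new flows are hypothetical, and a 100 Gbps implementation would need a far more careful data structure than a Python dictionary.

```python
from typing import Dict, List, Tuple

# Flow key: source IP, source port, destination IP, destination port, protocol.
FiveTuple = Tuple[str, int, str, int, str]


class StatefulLoadBalancer:
    """Toy connection tracker: pins each flow to the worker chosen on first sight."""

    def __init__(self, workers: List[str]) -> None:
        if not workers:
            raise ValueError("at least one worker is required")
        self.workers = list(workers)
        self.connections: Dict[FiveTuple, str] = {}
        self._next = 0  # round-robin cursor (an assumed policy)

    def route(self, flow: FiveTuple) -> str:
        """Return the worker for this flow, creating a table entry if needed."""
        worker = self.connections.get(flow)
        if worker is None:
            worker = self.workers[self._next % len(self.workers)]
            self._next += 1
            self.connections[flow] = worker
        return worker

    def close(self, flow: FiveTuple) -> None:
        """Forget a finished flow so the table does not grow without bound."""
        self.connections.pop(flow, None)
```

The per-flow table is what makes the balancer stateful: every packet requires a lookup, and every new flow requires an insertion, which is where the difficulty at high packet rates comes from.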
13	Framework pro hardwarovou akceleraci 400Gb sítí / Framework for Hardware Acceleration of 400Gb Networks. Hummel, Václav. January 2017. The NetCOPE framework has proven itself as a viable framework for rapid development of hardware-accelerated wire-speed network applications using Network Functions Virtualization (NFV). To meet the current and future requirements of such applications, the NetCOPE platform has to catch up with the upcoming 400 Gigabit Ethernet; otherwise, it may become obsolete in the following years. Catching up with 400 Gigabit Ethernet brings many challenges and requires a completely different way of thinking. Multiple network packets have to be processed in each clock cycle, which requires a new processing concept. Advanced memory management is used to ensure constant memory complexity with respect to the number of DMA channels without any impact on performance. Thanks to that, even more than 256 completely independent DMA channels are feasible with current technology. A lot of effort was made to make the framework as generic as possible, allowing deployment of 400 Gigabit Ethernet and beyond. Emphasis is put on communication between the framework and the host computer via PCI Express. Multiple Ethernet ports are also considered. The proposed system is prepared to be deployed on the COMBO family of cards, used as a reference platform.
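The statement in entry 13 that multiple packets must be processed each clock cycle follows from simple arithmetic, sketched below. The 20 bytes of per-frame overhead (8-byte preamble and start delimiter plus the 12-byte minimum inter-frame gap) are standard Ethernet figures; the 200 MHz clock used in the note after the code is a hypothetical value for illustration, not one taken from the thesis.

```python
# Per-frame overhead on the wire: preamble/SFD (8 bytes) + minimum inter-frame gap (12 bytes).
ETHERNET_OVERHEAD_BYTES = 8 + 12


def packets_per_second(line_rate_bps, frame_bytes):
    """Maximum frame rate on a link when every frame has the given size."""
    bits_on_wire = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return line_rate_bps / bits_on_wire


def packets_per_cycle(line_rate_bps, frame_bytes, clock_hz):
    """Average number of frames arriving per clock cycle of the processing logic."""
    return packets_per_second(line_rate_bps, frame_bytes) / clock_hz
```

For minimum-size 64-byte frames at 400 Gb/s, `packets_per_second(400e9, 64)` is about 595 million packets per second, so logic clocked at the assumed 200 MHz would have to accept roughly three packets per cycle.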
14	Ověření vybraných komunikačních rozhraní procesoru TC275 / Verification of selected communication interfaces on TRICORE TC275. Šebesta, Patrik. January 2015. This diploma thesis deals with the setup of peripheral modules of the TC275 processor from the AURIX family developed by Infineon. The processor's QSPI peripheral module implements SPI communication, configured as the bus master and supported by the processor's DMA module. The DMA module periodically services the transmit and receive shift buffers of the QSPI, which are connected to a slave analog-to-digital converter IC, the CIC751. Another peripheral module covered is MultiCAN. The drivers were written using only the basic header files with register definitions for the TC275, which are part of the TriCore Free Entry Tool Chain IDE used to create the drivers.
15	The Buffer - direktåtkomst av minnesbuffer för ljudspår / The Buffer - Direct access of an audio memory buffer. Pettersson, Erik. January 2020. Modular synthesizers became a major commercial success in the 1960s and regained attention in the 2010s, probably in connection with a do-it-yourself (DIY) movement. A sampler is an instrument that exists both as a standalone device and as a control-voltage-driven module within modular synthesis. Commonly, certain aspects of playback can be controlled with control voltage, for example the playback speed. Something neither I nor my supervisor has seen before is direct access via control voltage to memory pointers in an audio buffer for sampler modules. I therefore implemented The Buffer, a sampler module in a virtual modular synthesis environment, VCV Rack. In this work I investigated two research questions: the relationship between the input voltage to the module and the resulting sound, and the maximum memory range that the control voltage can address such that the pointers sweep consecutively through every frame of the audio track. For the latter I explored two possible answers, a theoretical maximum and one based on my implementation. I also conducted a user study on a more subjective basis to get an indication of the module's usability. virtual modular synthesizer; direct memory access; vcv rack; voltage controlled; sound design; direct access; memory buffer; Signal Processing; Interaction Technologies; Other Engineering and Technologies
16	Closed-loop control and data-recording of a modular-multilevel converter (MMC). Su, Longgang. January 2022. Modular multilevel converters (MMCs) are the preferred converter solution in flexible ac transmission systems (FACTS) and high-voltage direct current (HVDC) applications. This is due to the high quality of the voltage and current signals, lower overall losses, and fewer problems with switching-related EMI. However, without an efficient and fast data recording system, the sampled data from current and voltage measurement boards can cause long latencies in the control system and make it difficult to analyze the operation of MMCs. In this thesis, a field-programmable gate array (FPGA)-based closed-loop control and high-speed data recording system is developed for a low-power single-phase MMC prototype. In the prototype, a data-transmission scheme based on the RS485 (TIA/EIA-485) standard exists. This protocol offers a robust solution for transmitting data in noisy environments. A direct memory access (DMA) scheme is utilized to transmit sampled data from the programmable logic (PL) to the processing subsystem (PS) in the Zynq-7000 SoC. Moreover, an asymmetric multiprocessing (AMP) mechanism was implemented on the two processor cores in the PS. The first processor controls the power transmission to and from the power grid, and the second processor runs the Ethernet application to transmit sampled data to the computer using MATLAB. For the closed-loop control of this MMC prototype, a phase-locked loop (PLL), a proportional resonant (PR) current controller, and an energy control loop for capacitor voltage balancing and control are implemented. The results showed that the output power of this single-phase MMC prototype is under control and each sub-module capacitor voltage is balanced and charged to the desired value. The sampled data can be recorded from the computer through the implemented data recording system at 25.6 Mbps. Moreover, a dynamic oscilloscope function is developed in MATLAB using this online data recording scheme. Modular multilevel converters (MMCs); field-programmable gate array (FPGA); direct memory access (DMA); energy closed-loop control; data recording; Ethernet; Electrical and Electronic Engineering
