Global ETD Search

1	A Low Overhead Transport Protocol for Linux Networking Chiu, Shih-Yang 29 July 2008 (has links) In this thesis, a low overhead transport protocol for Linux networking is proposed. This proposed protocol is motivated by the observation that the transport protocol for Linux requires more than a single memory copy for both the receiving and transmission of a packet, which is essentially redundant. In this thesis, we show that all but one of the memory copies can be eliminated, thus reducing the number of memory copies to one and only one, which is in general referred to as zero copy. zero-copy transport protocol low overhead
2	Low-Overhead Isolation Cells for Low-Power Multipliers Wu, Zong-Lin 30 July 2009 (has links) With the rapid progress in manufacturing technology, the chip design is more and more complicated day by day. As a result, the circuit design with standard cell library becomes more significant. Standard cell is universally applied to cell-based design and the designer can complete their design quickly by using of the elements in standard cell library through cell-based design flow. Therefore, it is indispensable for VLSI design to utilize standard cell library for circuit design. Moreover, the low power design is getting increasingly important in the circuit design. Therefore, we design the cells with particular function and add them into the standard cell library so that the low power design can be more well-designed. In this thesis, we design and and the transmission gate into the standard cell library. In addition, we design two types of standard cells with TSMC 0.13£gm technology: a low-overhead latch and a modified transmission-gate based full adder. They are applied to design different low power multipliers with cell-based design flow and full custom design flow. Experimental results show that our proposed standard cells can reduce the power consumption of the entire multiplier efficiently. low overhead isolation cell standard cell low-power multipliers
3	Low Overhead Ethernet Communication for Open MPI on Linux Clusters Hoefler, Torsten, Reinhardt, Mirko, Mietke, Frank, Mehlan, Torsten, Rehm, Wolfgang 20 July 2006 (has links) (PDF) This paper describes the basic concepts of our solution to improve the performance of Ethernet Communication on a Linux Cluster environment by introducing Reliable Low Latency Ethernet Sockets. We show that about 25% of the socket latency can be saved by using our simplified protocol. Especially, we put emphasis on demonstrating that this performance benefit is able to speed up the MPI level communication. Therefore we have developed a new BTL component for Open MPI, an open source MPI-2 implementation which offers with its Modular Component Architecture a nearly ideal environment to implement our changes. Microbenchmarks of MPI collective and Point-to-Point operations were performed. We see a performance improvement of 8% to 16% for LU and SP implementations of the NAS parallel benchmark suite which spends a significant amount of time in the MPI. Practical application tests with Abinit, an electronic structure calculation program, show that the runtime of be nearly halved on a 4 node system. Thus we show evidence that our new Ethernet communication protocol is able to increase the speedup of parallel applications considerably. Low Overhead Communication Open MPI Optimized MPI ddc:004 Ethernet LINUX MPI
4	Low Overhead Ethernet Communication for Open MPI on Linux Clusters Hoefler, Torsten, Reinhardt, Mirko, Mietke, Frank, Mehlan, Torsten, Rehm, Wolfgang 20 July 2006 (has links) This paper describes the basic concepts of our solution to improve the performance of Ethernet Communication on a Linux Cluster environment by introducing Reliable Low Latency Ethernet Sockets. We show that about 25% of the socket latency can be saved by using our simplified protocol. Especially, we put emphasis on demonstrating that this performance benefit is able to speed up the MPI level communication. Therefore we have developed a new BTL component for Open MPI, an open source MPI-2 implementation which offers with its Modular Component Architecture a nearly ideal environment to implement our changes. Microbenchmarks of MPI collective and Point-to-Point operations were performed. We see a performance improvement of 8% to 16% for LU and SP implementations of the NAS parallel benchmark suite which spends a significant amount of time in the MPI. Practical application tests with Abinit, an electronic structure calculation program, show that the runtime of be nearly halved on a 4 node system. Thus we show evidence that our new Ethernet communication protocol is able to increase the speedup of parallel applications considerably. info:eu-repo/classification/ddc/004 ddc:004 Ethernet LINUX MPI Low Overhead Communication Open MPI Optimized MPI
5	Performance Optimisation of Discrete-Event Simulation Software on Multi-Core Computers / Prestandaoptimering av händelsestyrd simuleringsmjukvara på flerkärniga datorer Kaeslin, Alain E. January 2016 (has links) SIMLOX is a discrete-event simulation software developed by Systecon AB for analysing logistic support solution scenarios. To cope with ever larger problems, SIMLOX's simulation engine was recently enhanced with a parallel execution mechanism in order to take advantage of multi-core processors. However, this extension did not result in the desired reduction in runtime for all simulation scenarios even though the parallelisation strategy applied had promised linear speedup. Therefore, an in-depth analysis of the limiting scalability bottlenecks became necessary and has been carried out in this project. Through the use of a low-overhead profiler and microarchitecture analysis, the root causes were identified: atomic operations causing a high communication overhead, poor locality leading to translation lookaside buffer thrashing, and hot spots that consume significant amounts of CPU time. Subsequently, appropriate optimisations to overcome the limiting factors were implemented: eliminating the expensive operations, more efficient handling of heap memory through the use of a scalable memory allocator, and data structures that make better use of caches. Experimental evaluation using real world test cases demonstrated a speedup of at least 6.75x on an eight-core processor. Most cases even achieve a speedup of more than 7.2x. The various optimisations implemented further helped to lower run times for sequential execution by 1.5x or more. It can be concluded that achieving nearly linear speedup on a multi-core processor is possible in practice for discrete-event simulation. / SIMLOX är en kommersiell mjukvara utvecklad av Systecon AB, vars huvudsakliga funktion är en händelsestyrd simuleringskärna för analys av underhållslösningar för komplexa tekniska system. För hantering av stora problem så används parallellexekvering för simuleringen, vilket i teorin borde ge en nästan linjär skalning med antal trådar. Prestandaförbättringen som observerats i praktiken var dock ytterst begränsad, varför en ordentlig analys av skalbarheten har gjorts i detta projekt. Genom användandet av ett profileringsverktyg med liten overhead och mikroarkitektur-analys, så kunde orsakerna hittas: atomiska operationer som skapar mycket overhead för kommunikation, dålig lokalitet ger fragmentering vid översättning till fysiska adresser och dåligt utnyttjande av TLB-cachen, och vissa flaskhalsar som kräver mycket CPU-kraft. Därefter implementerades och testade optimeringar för att undvika de identifierade problem. Testade lösningar inkluderar eliminering av dyra operationer, ökad effektivitet i minneshantering genom skalbara minneshanteringsalgoritmer och implementation av datastrukturer som ger bättre lokalitet och därmed bättre användande av cache-strukturen. Verifiering på verkliga testfall visade på uppsnabbningar på åtminstone 6.75 gånger på en processor med 8 kärnor. De flesta fall visade på en uppsnabbning med en faktor större än 7.2. Optimeringarna gav även en uppsnabbning med en faktor på åtminstone 1.5 vid sekventiell exekvering i en tråd. Slutsatsen är därmed att det är möjligt att uppnå nästan linjär skalning med antalet kärnor för denna typ av händelsestyrd simulering. cache hierarchy caches communication overhead data structures discrete-event simulation heap memory linear speedup logistic support low-overhead profiler memory allocator microarchitecture microarchitecture analysis multi-core optimisation parallel execution profiler runtime scalability scalability bottlenecks scalable memory allocator simulation translation lookaside buffer translation lookaside buffer thrashing TLB Computer Sciences Datavetenskap (datalogi)
6	Advanced EM/Power Side-Channel Attacks and Low-overhead Circuit-level Countermeasures Debayan Das (11178318) 27 July 2021 (has links) <div>The huge gamut of today’s internet-connected embedded devices has led to increasing concerns regarding the security and confidentiality of data. To address these requirements, most embedded devices employ cryptographic algorithms, which are computationally secure. Despite such mathematical guarantees, as these algorithms are implemented on a physical platform, they leak critical information in the form of power consumption, electromagnetic (EM) radiation, timing, cache hits and misses, and so on, leading to side-channel analysis (SCA) attacks. Non-profiled SCA attacks like differential/correlational power/EM analysis (DPA/CPA/DEMA/CEMA) are direct attacks on a single device to extract the secret key of an encryption algorithm. On the other hand, profiled attacks comprise of building an offline template (model) using an identical device and the attack is performed on a similar device with much fewer traces.</div><div><br></div><div>This thesis focusses on developing efficient side-channel attacks and circuit-level low-overhead generic countermeasures. A cross-device deep learning-based profiling power side-channel attack (X-DeepSCA) is proposed which can break the secret key of an AES-128 encryption engine running on an Atmel microcontroller using just a single power trace, thereby increasing the threat surface of embedded devices significantly. Despite all these advancements, most works till date, both attacks as well as countermeasures, treat the crypto engine as a black box, and hence most protection techniques incur high power/area overheads.</div><div><br></div><div>This work presents the first white-box modeling of the EM leakage from a crypto hardware, leading to the understanding that the critical correlated current signature should not be passed through the higher metal layers. To achieve this goal, a signature attenuation hardware (SAH) is utilized, embedding the crypto core locally within the lower metal layers so that the critical correlated current signature is not passed through the higher metals, which behave as efficient antennas and its radiation can be picked up by a nearby attacker. Combination of the 2 techniques – current-domain signature suppression and local lower metal routing shows >350x signature attenuation in measurements on our fabricated 65nm test chip, leading to SCA resiliency beyond 1B encryptions, which is a 100x improvement in both EM and power SCA protection over the prior works with comparable overheads. Moreover, this is a generic countermeasure and can be utilized for any crypto core without any performance degradation.</div><div><br></div><div>Next, backed by our physics-level understanding of EM radiation, a digital library cell layout technique is proposed which shows >5x reduction in EM SCA leakage compared to the traditional digital logic gate layout design. Further, exploiting the magneto-quasistatic (MQS) regime of operation for the present-day CMOS circuits, a HFSS-based framework is proposed to develop a pre-silicon EM SCA evaluation technique to test the vulnerability of cryptographic implementations against such attacks during the design phase itself.</div><div><br></div><div>Finally, considering the continuous growth of wearable and implantable devices around a human body, this thesis also analyzes the security of the internet-of-body (IoB) and proposes electro-quasistatic human body communication (EQS-HBC) to form a covert body area network. While the traditional wireless body area network (WBAN) signals can be intercepted even at a distance of 5m, the EQS-HBC signals can be detected only up to 0.15m, which is practically in physical contact with the person. Thus, this pioneering work proposing EQS-HBC promises >30x improvement in private space compared to the traditional WBAN, enhancing physical security. In the long run, EQS-HBC can potentially enable several applications in the domain of connected healthcare, electroceuticals, augmented and virtual reality, and so on. In addition to these physical security guarantees, side-channel secure cryptographic algorithms can be augmented to develop a fully secure EQS-HBC node.</div> Circuits and Systems Microelectronics and Integrated Circuits Computer System Security Electromagnetic Security Hardware Security Mixed-Signal Circuit design for security Generic Low-overhead countermeasure Signature Attenuation Hardware STELLAR X-DeepSCA

1

Page generated in 0.0658 seconds