221

SEU-Induced Persistent Error Propagation in FPGAs

Morgan, Keith S. 06 July 2006 (has links) (PDF)
This thesis introduces a new way to characterize the dynamic SEU cross section of an FPGA design in terms of its persistent and non-persistent components. An SEU in the persistent cross section results in a permanent interruption of service until reset, whereas an SEU in the non-persistent cross section causes only a temporary interruption that in some cases can be tolerated. Techniques are introduced for measuring and characterizing both cross sections for an arbitrary FPGA design. Furthermore, the circuit components belonging to the persistent and non-persistent cross sections can be determined statically. Functional error mitigation techniques can leverage this identification to improve the reliability of some applications at lower cost by focusing mitigation on just the persistent cross section. The reliability of a practical signal processing application in use at Los Alamos National Laboratory was improved by nearly two orders of magnitude, with a theoretical resource savings of over 53% compared to traditional comprehensive mitigation techniques such as full TMR.
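For orientation, the decomposition described above can be written compactly; the symbols below (N for event counts, Φ for particle fluence) are illustrative and are not taken from the thesis itself:

\[
  \sigma = \frac{N_{\text{events}}}{\Phi}, \qquad
  \sigma_{\text{dynamic}} = \sigma_{\text{persistent}} + \sigma_{\text{non-persistent}}.
\]

Mitigation focused on only the persistent component then targets the term responsible for service loss until reset, which is how partial mitigation can recover most of the reliability benefit at a fraction of the cost of full TMR.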
222

Positron annihilation lifetime spectroscopy at a superconducting electron accelerator

Wagner, A., Anwand, W., Attallah, A.G., Dornberg, G., Elsayed, M., Enke, Dirk, Hussein, A.E.M., Krause-Rehberg, R., Liedke, M.O., Potzger, K., Trinh, T.T. 25 April 2023 (has links)
The Helmholtz-Zentrum Dresden-Rossendorf operates a superconducting linear accelerator for electrons with energies up to 35 MeV and average beam currents up to 1.6 mA. The electron beam is employed to produce several secondary beams, including bremsstrahlung X-rays, neutrons, and positrons. After moderation, the secondary positron beam feeds the Monoenergetic Positron Source (MePS), where positron annihilation lifetime spectroscopy (PALS) and positron annihilation Doppler-broadening experiments in materials science are performed in parallel. The adjustable repetition rate of the continuous-wave electron beam allows the pulse separation to be matched to the positron lifetime in the sample under study. The energy of the positron beam can be set between 0.5 keV and 20 keV to perform depth-resolved defect spectroscopy and porosity studies, especially for thin films.
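As a rough illustration of the pulse-separation matching mentioned above (a sketch with assumed numbers, not a description of the MePS timing system), the repetition rate is bounded by requiring that the pulse separation 1/f span several expected positron lifetimes:

# Hypothetical helper; the separation factor of 10 and the 2 ns example
# lifetime are assumptions for illustration, not MePS parameters.
def max_repetition_rate(positron_lifetime_s: float, separation_factor: float = 10.0) -> float:
    """Highest repetition rate (Hz) whose pulse separation is still at least
    `separation_factor` times the expected positron lifetime."""
    return 1.0 / (separation_factor * positron_lifetime_s)

print(f"{max_repetition_rate(2e-9):.2e} Hz")  # assumed 2 ns lifetime -> 5.00e+07 Hz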
223

Hardware accelerators for post-quantum cryptography and fully homomorphic encryption

Agrawal, Rashmi 16 January 2023 (has links)
With the monetization of user data, data breaches have become very common. In the past five years, there were more than 7000 data breaches involving theft of the personal information of billions of people. In 2020 alone, the global average cost per data breach was $3.86 million, and this number rose to $4.24 million in 2021. The need for maintaining data security and privacy is therefore becoming increasingly critical. Over the years, various data encryption schemes, including RSA, ECC, and AES, have been used to provide data security and privacy. However, these schemes are vulnerable to quantum computers with their enormous processing power. As quantum computers are expected to become mainstream in the near future, post-quantum secure encryption schemes are required. To this end, through NIST's standardization efforts, code-based and lattice-based encryption schemes have emerged as plausible ways forward. Both code-based and lattice-based encryption schemes enable public key cryptosystems, key exchange mechanisms, and digital signatures. In addition, lattice-based encryption schemes support fully homomorphic encryption (FHE), which enables computation on encrypted data. Over the years, there have been several efforts to design efficient FPGA-based and ASIC-based solutions for accelerating code-based and lattice-based encryption schemes. The conventional code-based McEliece cryptosystem uses a binary Goppa code, which has a good code rate and error correction capability but suffers from high encoding and decoding complexity. Moreover, the generated public key is several MB in size, leading to cryptosystem designs that cannot be accommodated on low-end FPGAs. In lattice-based encryption schemes, large polynomial ring operations form the core compute kernel and remain a key challenge for many hardware designers. Extending support for large modular arithmetic operations on an FPGA while keeping latency and hardware resource utilization low requires substantial design effort. Moreover, prior FPGA solutions for lattice-based FHE accelerate only basic FHE primitives for impractical parameter sets, without support for the bootstrapping operation that is critical to building real-time privacy-preserving applications. Similarly, prior ASIC proposals for FHE that include bootstrapping are heavily memory bound, leading to long execution times and underutilized compute resources, and they cost millions of dollars. To respond to these challenges, in this dissertation we focus on the design of efficient hardware accelerators for code-based and lattice-based public key cryptosystems (PKC). For code-based PKC, we propose a fully parameterized en/decryption co-processor based on a new variant of the McEliece cryptosystem. This co-processor takes advantage of the non-binary Orthogonal Latin Square Code (OLSC) to achieve lower computational complexity along with a smaller key size than the binary Goppa code. Our FPGA-based implementation of the co-processor is ∼3.5× faster than an existing classic McEliece cryptosystem implementation. For lattice-based PKC, we propose a co-processor that implements large polynomial ring operations. It uses a fully pipelined NTT polynomial multiplier to perform fast polynomial multiplications. We also propose a highly optimized Gaussian noise sampler capable of drawing millions of high-precision samples per second.
Through an FPGA-based implementation of this lattice-based PKC co-processor, we achieve a speedup of 6.5× while using 5× fewer hardware resources than state-of-the-art implementations. Leveraging our work on lattice-based PKC, we explore the design of hardware accelerators that perform FHE operations using the Cheon-Kim-Kim-Song (CKKS) scheme. We first perform an in-depth architectural analysis of the various FHE operations in the CKKS scheme to explore ways to accelerate an end-to-end FHE application. For this analysis, we develop a custom architecture modeling tool, SimFHE, to measure the compute and memory bandwidth requirements of hardware-accelerated CKKS. Our analysis using SimFHE reveals that, without a prohibitively large cache, all FHE operations exhibit low arithmetic intensity (<1 op/byte). To address the resulting memory bottleneck, we propose several memory-aware design (MAD) techniques, including caching and algorithmic optimizations, to reduce the memory requirements of CKKS-based application execution. We show that our MAD techniques can yield an ASIC design that is at least 5-10× cheaper than the large-cache proposals while being only ∼2-3× slower. We also design FAB, an FPGA-based accelerator for bootstrappable FHE. FAB, for the first time, accelerates bootstrapping (along with the basic FHE primitives) on an FPGA for a secure and practical parameter set. FAB tackles the memory-bound nature of bootstrappable FHE through judicious datapath modification, smart operation scheduling, and on-chip memory management techniques that maximize overall FHE compute throughput. FAB outperforms all prior CPU/GPU work by 9.5× to 456× and provides practical performance for our target application: secure training of logistic regression models. / 2025-01-16T00:00:00Z
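Since the NTT polynomial multiplier is the core kernel named in this abstract, a minimal software sketch may help make the idea concrete. The toy parameters below (prime q = 17, length n = 8, primitive 8th root of unity 9) and the function names are illustrative assumptions; they are unrelated to the thesis's pipelined hardware design or its actual parameter sets.

def ntt(a, q, root):
    """In-place iterative Cooley-Tukey NTT over Z_q. len(a) must be a power of
    two and `root` a primitive len(a)-th root of unity modulo the prime q."""
    n = len(a)
    j = 0
    for i in range(1, n):                      # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j |= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:                         # butterfly passes
        w_len = pow(root, n // length, q)
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + length // 2):
                u, v = a[k], a[k + length // 2] * w % q
                a[k] = (u + v) % q
                a[k + length // 2] = (u - v) % q
                w = w * w_len % q
        length <<= 1
    return a

def poly_mul_mod(a, b, q, root):
    """Multiply polynomials a, b in Z_q[x]/(x^n - 1) via forward NTT,
    pointwise product, and inverse NTT (inverse root plus 1/n scaling)."""
    n = len(a)
    fa, fb = ntt(a[:], q, root), ntt(b[:], q, root)
    fc = [x * y % q for x, y in zip(fa, fb)]
    inv_root, inv_n = pow(root, -1, q), pow(n, -1, q)
    c = ntt(fc, q, inv_root)
    return [x * inv_n % q for x in c]

# Toy parameters (far smaller than any real lattice scheme): q = 17 is prime,
# n = 8 divides q - 1, and 9 is a primitive 8th root of unity mod 17.
print(poly_mul_mod([1, 2, 0, 0, 0, 0, 0, 0], [3, 4, 0, 0, 0, 0, 0, 0], 17, 9))
# -> [3, 10, 8, 0, 0, 0, 0, 0], i.e. (1 + 2x)(3 + 4x) = 3 + 10x + 8x^2 (mod 17)

A pipelined hardware NTT evaluates one butterfly stage per pipeline step so that a new coefficient pair can enter the datapath every clock cycle; the loops above only mirror the arithmetic.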
224

A Data Sorting Hardware Accelerator on FPGA

Liu, Boyan January 2020 (has links)
In recent years, with the rise of big data applications, efficiency has become more important for data processing, and simple sorting methods require higher stability and efficiency in large-scale scenarios. This thesis explores topics related to hardware acceleration of data sorting networks for massive inputs or data streams, which leads to three different design approaches: running the whole data processing in software (sorting and merging on a PC), a combination of the PC and a field-programmable gate array (FPGA) platform (hardware sorting with software merging), and a fully hardware solution (sorting and merging on the FPGA). Parallel hardware sorters have been proposed before, but they do not consider that the loading and off-loading of data is often serial in nature. In this analysis, we explore an insertion-sort solution that can sort data in the same clock cycle as it is written to the sorter and compare it with standard parallel sorters. The main contributions of this thesis are techniques for accelerating the sorting of large data streams, comprising a fully software design, a hardware/software co-design, and a fully hardware design on a reconfigurable FPGA platform. The results of the experiments mostly meet our predictions, and we show that insertion sort implemented in hardware can improve data processing speed for small input data sizes. / The rise of applied big data in recent years has made efficiency more important in data processing. Simple sorting methods require higher stability and efficiency in large-scale scenarios. This thesis examines topics related to hardware acceleration of data sorting networks with massive or streaming input, which leads to three different design approaches: running the data processing entirely in software (sorting and merging on a PC), a combination of a PC and a field-programmable gate array (FPGA) platform (hardware sorting with software merging), and a hardware-only solution (sorting and merging on the FPGA). Parallel hardware sorters have been proposed before, but they usually do not take into account that input and output data are often serial in nature. In this thesis we examine an insertion-sort solution that can sort input data in the same clock cycle as it is read in, and compare it with some standard parallel sorters. The most important contributions of this thesis are techniques for accelerating the sorting of large data streams, involving an implementation entirely in software, a HW/SW co-design solution, and an implementation entirely in hardware on a reconfigurable FPGA platform. The results of the experiments mostly meet our predictions, and we show that insertion sort implemented in hardware can improve data processing speed for small data series.
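The single-cycle insertion idea described above can be sketched behaviourally. The class below is an illustrative software model (names and interface are assumptions, not the thesis's RTL); the linear scan stands in for comparisons that a hardware sorter performs in parallel across all register cells within one clock cycle.

class InsertionSorter:
    """Behavioural model of a hardware insertion sorter: each write models one
    clock cycle in which every register cell simultaneously decides whether to
    keep its value, take its neighbour's value (shift), or load the input."""
    def __init__(self, depth: int):
        self.regs = [None] * depth      # register cells, kept in sorted order
        self.count = 0

    def write(self, value):
        assert self.count < len(self.regs), "sorter is full"
        pos = 0
        while pos < self.count and self.regs[pos] <= value:
            pos += 1                    # in hardware these comparisons happen in parallel
        self.regs[pos + 1:self.count + 1] = self.regs[pos:self.count]
        self.regs[pos] = value
        self.count += 1

    def read_sorted(self):
        return self.regs[:self.count]

s = InsertionSorter(depth=8)
for x in [5, 1, 4, 2]:
    s.write(x)                          # one "cycle" per arriving value
print(s.read_sorted())                  # [1, 2, 4, 5]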
225

Speeding Up Social Entrepreneurship: Improving the Sustainability of the Accelerator Program

DE VRIES, MARTE January 2018 (has links)
In the past decade, a new entrepreneurial phenomenon aimed at seeding start-up companies has emerged across the globe: the social enterprise (SE) accelerator program. These accelerators focus on scaling social entrepreneurs by accelerating their journey to market. Various actors, such as business reporters, entrepreneurs, and angel investors, have expressed skepticism about the viability of the accelerator model. To examine this question of sustainability, this thesis studied the revenue models of SE accelerators. Four semi-structured interviews were conducted with experts working at SE accelerators in Stockholm. These four accelerators identified partnerships, government institutions, and philanthropy and donations as their revenue sources. Consulting contracts, equity shares, and fees were not used by these four but were discussed as potential revenue streams. All respondents emphasized the importance of revenue model diversification and were working on strategies to act on this. Diversifying the revenue models of SE accelerators will increase their sustainability. This might be a first step away from a focus on monetary gain and towards a society where businesses are created to do good.
226

Part A: Thermal and Electrical Behaviour of Thin Metal Films; Part B: Implementation Accelerator System

Beatty, Denis Clyde 08 1900 (has links)
Part A: The preliminary investigation of the thermal and electrical behaviour of thin metal films gives evidence (Part I) that several mechanisms are responsible for the change of resistance as the temperature increases from room temperature to 500°C. First, grain growth occurs, giving a characteristic decrease in resistance. Second, agglomerates form as the grains continue to grow, especially for the thinner Al and Cr films; this effect tends to increase the resistance, and a mathematical model is proposed to explain the results qualitatively. Third, what appears to be an electromigration effect occurs. This latter point provided the incentive for a study of the effects of electromigration in thin aluminum films (Part II). The results of this study are comparable to those obtained by other workers, except that the interpretation of the direction of electromigration in Al is reversed. One possible explanation for the difference in the direction of migration is the interpretation of marker motion. A mathematical model is also proposed for electromigration, in which both the effect of the applied electric field and the effect of electron collisions with the ions are taken into consideration. It was found that the effect of electron collisions with the ions on the migration of ions could be expressed in terms of an exponential function of the square of the electron-ion collision relaxation time. / Thesis / Master of Engineering (ME)
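For context (this is the standard phenomenological picture, not necessarily the thesis's own model), the electromigration driving force on an ion is usually written as the sum of a direct electrostatic term and an "electron wind" term from momentum transfer in electron-ion collisions:

\[
  F = Z^{*} e E, \qquad Z^{*} = Z_{\text{direct}} + Z_{\text{wind}}, \qquad E = \rho j ,
\]

where \(Z^{*}\) is the effective charge number, \(\rho\) the resistivity, and \(j\) the current density; the sign of \(Z^{*}\) sets the direction of migration, which is exactly the point on which the thesis's interpretation differs from earlier work.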
227

Development of a Low Energy Ion Mass Spectrometer

Karapetsas, Spyridon 02 1900 (has links)
The interaction mechanisms of an ion beam with a solid target are identified. Basic parameters associated with ion scattering, charge neutralization, inelastic energy losses, and secondary ion production are described. Low-energy (1-20 keV) experimental studies on these topics are reviewed. A low-energy ion mass spectrometer is described. The ion beam is generated by an existing keV ion accelerator and is directed to a newly constructed UHV target chamber. The energy and angular distributions of the backscattered particles are measured with a hemispherical electrostatic analyser and a channeltron detector. A high-precision goniometer allows target rotation about two perpendicular axes by angles of 180° and 90° with an accuracy and repeatability of 0.1°. The interaction chamber is bakeable to 250°C and was designed for an ultimate pressure of 10^-11 torr. The data acquisition system scans the energy spectrum automatically so that the radiation dosage at the target is equalized for all channels. / Thesis / Master of Engineering (MEngr)
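As background for the ion-scattering parameters mentioned above (a standard binary-collision result, not a formula quoted from the thesis), the energy of a projectile of mass \(M_1\) and incident energy \(E_0\) scattered through angle \(\theta\) off a surface atom of mass \(M_2 \ge M_1\) is

\[
  \frac{E_1}{E_0} =
  \left( \frac{\cos\theta + \sqrt{(M_2/M_1)^2 - \sin^2\theta}}{1 + M_2/M_1} \right)^{2},
\]

which is what allows the energy spectrum of backscattered ions to be read as a mass spectrum of the target surface.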
228

Acceleration of Machine-Learning Pipeline Using Parallel Computing

Erickson, Xavante January 2021 (has links)
Researchers from Lund have conducted research on classifying images in three different categories (faces, landmarks, and objects) from EEG data [1]. The researchers used SVMs (support vector machines) to classify between the three categories [2, 3]. The scripts written for this computation had the potential to be heavily parallelized and could potentially be optimized to complete the computations much faster. The scripts were originally written in MATLAB, which is proprietary software and not the most popular language for machine learning. The aim of this project is to translate the MATLAB code of the aforementioned Lund project to Python and perform code optimization and parallelization in order to reduce the execution time. With much of data science transitioning to Python as well, a key part of this project was understanding the differences between MATLAB and Python and how to translate MATLAB code to Python. With the exception of the preprocessing scripts, all the original MATLAB scripts were translated to Python. The translated Python scripts were optimized for speed and parallelized to decrease the execution time even further. Two major parallel implementations of the Python scripts were made: one using the Ray framework to compute in the cloud [4], and one using the Accelerator, a framework for computing with local threads [5]. After translation, the code was tested against the original results and profiled for key mistakes, for example functions that took an unnecessarily long time to execute. After optimization, the single-threaded script was twelve times faster than the original MATLAB script. The final execution times were around 12-15 minutes; compared to the benchmark of 48 hours, this is about 200 times faster. The benchmark of the original code used fewer iterations than the researchers did, decreasing the computation time from a week to 48 hours. The results of the project highlight the importance of learning and teaching basic profiling of slow code. While not fully considered in this project, complexity analysis of code is important as well. Future work includes a deeper complexity analysis at both a high and a low level, since a high-level language such as Python relies heavily on modules built on low-level code. Future work also includes an in-depth analysis of the NumPy source code, as the current code relies heavily on NumPy, which has been shown to be a bottleneck in this project. / Computers are a central and unavoidable part of many people's everyday lives today. The advances made in machine learning have made it almost as important in everyday life as computers themselves. With the incredible progress made in machine learning, it has begun to be used to try to interpret brain signals, in the hope of creating a BCI (brain-computer interface). Researchers at Lund University conducted an experiment in which they tried to categorize brain signals using machine learning. The researchers attempted to classify between three different categories: objects, faces, and landmarks. One of the larger challenges of the project was that it took a very long time to compute on an ordinary computer, around a week. The task of this project was to try to improve and speed up the computation time of the code. The project translated the code to be improved from the MATLAB programming language to Python, and made use of profiling, clusters, and an acceleration tool.
With the help of profiling, parts of the code that run slowly can be located and improved so the code runs faster; in short, it is an optimization tool. A cluster is a collection of computers that can be used to compute larger problems collectively in order to increase computation speed. This project used a framework called Ray, which made it possible to run the computations on a cluster owned by Ericsson. An acceleration tool called the Accelerator was also implemented, separately from the Ray implementation of the code. The Accelerator uses only local processors to parallelize a problem, as opposed to using several computers. The biggest advantage of the Accelerator is that it keeps track of what has and has not been computed and saves all results automatically. Because the Accelerator keeps track of everything, it can reuse old results in new computations if old code is run again. Reusing old results avoids the computation time it would take to recompute code that has already been computed. This project improved the computation speed to be over two hundred times faster than before. With both Ray and the Accelerator an improvement of over two hundred times was seen, with the best results from the Accelerator at around two hundred and fifty times faster. It should be mentioned, however, that the best results from the Accelerator were achieved on a good server processor. A good server processor is a large investment, while a cluster service only charges for the time used, which can be cheaper in the short term. If the computing power is needed often, however, a server processor can be more economical in the long run. A two-hundred-fold improvement can have large consequences if a similar speedup can be achieved for BCI in general; brain signals could potentially be interpreted closer to real time, which could be used to control devices or electronics. The results of this project also showed that NumPy, a common computation library in Python, slowed the code down with the default settings it ships with. NumPy made the code slower by using multiple processor threads, even in a multi-threaded environment where manual parallelization had already been done. NumPy turned out to be slower for both the multi-threaded and the single-threaded implementations, which suggests that NumPy can slow code down in general, something many are unaware of. After manually adjusting the environment variables that NumPy comes with, the code was more than three times as fast as before. / Xavante Erickson ORCID-id: 0009-0000-6316-879X
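The environment-variable adjustment mentioned above typically looks like the following; this is a common way to keep NumPy's BLAS backend single-threaded under external parallelization (a sketch of the general technique, not necessarily the exact variables or values used in the thesis).

import os

# Pin BLAS/OpenMP thread counts BEFORE importing NumPy, so each externally
# managed worker (e.g. a Ray task or an Accelerator job) runs single-threaded
# NumPy and processor threads are not oversubscribed.
for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS",
            "MKL_NUM_THREADS", "NUMEXPR_NUM_THREADS"):
    os.environ.setdefault(var, "1")

import numpy as np  # must come after the variables are set

print(np.dot(np.ones((512, 512)), np.ones((512, 512))).shape)  # (512, 512)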
229

Non-linear effects in the ATLAS track-counting luminosity measurement

Gautam, Daniel January 2023 (has links)
In this thesis the linearity of the ATLAS track-counting luminosity measurement is studied using two different sets of Monte Carlo simulated crossings of proton-proton bunches. A primary high-momentum, or hard, interaction must be chosen for the Monte Carlo simulation. The first of the two sets is simulated using Z→µµ as the primary hard scatter in the bunch crossings, while the second set is simulated with a single-neutrino particle gun as the primary hard scatter. The luminosity can be determined by track counting from the relationship between the number of reconstructed charged-particle tracks and the number of proton-proton interactions per bunch crossing in the ATLAS detector. The relationship between the two is theoretically linear but is affected by non-linear effects from the presence of fake tracks and the reduced tracking efficiency at large µ. The linearity is studied and compared for eight different sets of track selection criteria called working points. Four of the working points were used during Run 2 of the Large Hadron Collider and four are introduced for Run 3. It is found that the use of the physical hard scatter, Z→µµ, in the Monte Carlo generation results in the appearance of tracks at all interaction rates, to a degree that does not agree with experiment. The use of the single-neutrino particle gun for the simulation of hard-scatter interactions is found to be more suitable for the track-counting studies. Two of the working points introduced for Run 3, called TightModHighPtStrictLumi and TightModFullEtaHighPtStrictLumi, are found to outperform the rest of the working points. / In this thesis the linearity of a luminosity measurement method called track counting, used at the ATLAS detector, is studied. The linearity is studied for two different sets of simulated proton-proton collisions. The collisions are produced with Monte Carlo simulations. The first set is simulated using Z→µµ as the most energetic interaction in every event, while the second set is instead simulated using a high-energy neutrino particle in every event. With track counting, the luminosity is determined from the relationship between the number of reconstructed charged-particle tracks and the number of proton-proton interactions per bunch crossing in the ATLAS detector. In theory the relationship between the two is linear, but the track-counting method is affected by non-linear effects such as falsely reconstructed tracks and reduced efficiency at large µ values. The linearity is studied and compared for eight different sets of track criteria, called working points. Four working points were previously used during Run 2 of the Large Hadron Collider, while four working points are introduced for Run 3. Using the set of collisions simulated with Z→µµ as the most energetic interaction results in tracks at all µ values to a degree that does not agree with expectations. Using the neutrino particle as the most energetic interaction when simulating events turns out to be more suitable for track-counting studies. Two of the working points introduced for Run 3 turn out to perform better than the others; these working points are named TightModHighPtStrictLumi and TightModFullEtaHighPtStrictLumi.
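For reference (standard luminosity bookkeeping rather than a formula quoted from the thesis), the per-bunch luminosity and the track-counting estimate are related as

\[
  \mathcal{L}_b = \frac{\mu \, f_r}{\sigma_{\text{inel}}}, \qquad
  \langle N_{\text{trk}} \rangle \approx k \, \mu ,
\]

where \(\mu\) is the mean number of inelastic interactions per bunch crossing, \(f_r\) the LHC revolution frequency, \(\sigma_{\text{inel}}\) the inelastic cross section, and \(k\) the average number of selected tracks per interaction; fake tracks and efficiency losses add \(\mu\)-dependent corrections to the linear term, which is the non-linearity studied here.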
230

Improving polarizing neutron optics by introducing 11B4C as interlayers

Falk, Martin January 2023 (has links)
In this report, the effects of adding 11B4C interlayers to Fe/Si multilayers are studied. Fe/Si multilayers are commonly used for neutron polarization at large research facilities, and improving their polarizing properties would improve their efficiency. To study this, DC magnetron sputtering was used to make different sets of samples in which the interlayer thickness, period thickness, number of periods, and layer thickness ratio were varied; steel was also tested instead of iron in the multilayers. The samples were then examined with a series of characterization techniques to determine how the different growth parameters affected their properties. X-ray diffraction (XRD) and selected area electron diffraction (ED) were used to study the crystal structure of the samples. X-ray reflectometry (XRR) was used for fitting layer thicknesses and interface widths, and also to compare reflectivities. Elastic recoil detection analysis (ERDA) was used to study composition changes between the samples. Vibrating sample magnetometry (VSM) gave information about how the magnetization changed between samples. Transmission electron microscopy (TEM) visualized the structure of the samples. Finally, polarized neutron reflectometry (PNR) was performed at the Institut Laue-Langevin (ILL), revealing the actual polarization of the samples. The measurements showed that for a sample with 40 periods, a period thickness of approximately 16 Å, and an iron-to-silicon thickness ratio of around 0.5, using 1 Å thick 11B4C interlayers improved the polarization between the Bragg peaks by 60 % and at the angle of the spin-up peak by 130 %. The results also indicate improved polarization for samples with more or thicker periods. Using low-carbon steel instead of iron gave poor results for thin layers but showed promise for thicker layers due to good reflectivity results; further testing is required.
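For reference (the standard definition used in polarized neutron reflectometry, not a value or formula specific to this report), the polarizing efficiency quoted above is obtained from the spin-up and spin-down reflectivities as

\[
  P = \frac{R^{+} - R^{-}}{R^{+} + R^{-}},
\]

so raising \(R^{+}\) or suppressing \(R^{-}\) between the multilayer Bragg peaks directly increases the reported polarization improvement.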
