Global ETD Search

21	Structure-based Optimizations for Sparse Matrix-Vector Multiply Belgin, Mehmet 16 January 2011 (has links) This dissertation introduces two novel techniques, OSF and PBR, to improve the performance of Sparse Matrix-vector Multiply (SMVM) kernels, which dominate the runtime of iterative solvers for systems of linear equations. SMVM computations that use sparse formats typically achieve only a small fraction of peak CPU speeds because they are memory bound due to their low flops:byte ratio, they access memory irregularly, and exhibit poor ILP due to inefficient pipelining. We particularly focus on improving the flops:byte ratio, which is the main limiter on performance, by exploiting recurring structures or sub-structures in matrices. Our techniques also support micro-architecture level optimizations to further improve performance. Operation Stacking Framework (OSF) stacks problems in large ensemble computations, which run the same sparse kernel using an identical matrix structure, such that they share a single copy of the indexing information to significantly reduce memory bandwidth usage. OSF provides performance improvements of up to 1.94x on an AMD Opteron compared to the CSR method. We validate performance results using hardware event counters, which demonstrate significantly improved cache and pipeline utilization. Pattern-based Representation (PBR) exploits recurring block nonzero patterns by generating custom code for each recurring block pattern. In this way, no indexing data for individual nonzero elements are read from memory, reducing the overall size of the indices by up to 98%. Our code generator emits highly tuned codes that utilize SSE vectorization and software prefetching. PBR accurately identifies a block size that achieves optimal or near-optimal performance using a linear multiple regression performance model. 
On recent multicore machines, PBR provides performance improvements of up to 3.4x sequentially and 5x in parallel, compared to the CSR method. The PBR library we provide converts matrices at runtime, allowing our method to be used as a drop-in replacement for existing methods. We compare PBR's overhead relative to its benefits and show that PBR is beneficial for many applications that repetitively call the SMVM kernel for the same matrix structure. / Ph. D. Code Generators Vectorization Sparse SpMV SMVM Matrix Vector Multiply PBR OSF thread pool parallel SpMV
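As a rough illustration of the CSR baseline that entry 21 compares against, and of the index-sharing idea behind operation stacking, a minimal sketch follows. It is not the dissertation's code: the function names `csr_spmv` and `stacked_csr_spmv` and the list-based layout are assumptions made for this example, and none of the tuning described in the abstract (SSE vectorization, prefetching, code generation) is attempted.

```python
def csr_spmv(row_ptr, col_idx, values, x):
    """Compute y = A @ x for a matrix A stored in CSR form.

    row_ptr[i]:row_ptr[i + 1] delimits the nonzeros of row i;
    col_idx[k] is the column of the k-th stored nonzero.
    """
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y


def stacked_csr_spmv(row_ptr, col_idx, stacked_values, stacked_x):
    """Multiply several matrices that share one sparsity structure.

    Hypothetical illustration of index sharing: each column index is
    read once and reused for every problem in the stack.
    """
    n_problems = len(stacked_values)
    n_rows = len(row_ptr) - 1
    ys = [[0.0] * n_rows for _ in range(n_problems)]
    for i in range(n_rows):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[k]  # one index read serves all stacked problems
            for p in range(n_problems):
                ys[p][i] += stacked_values[p][k] * stacked_x[p][j]
    return ys


# 3x3 example: [[1, 0, 2], [0, 3, 0], [4, 0, 5]]
row_ptr = [0, 2, 3, 5]
col_idx = [0, 2, 1, 0, 2]
values_a = [1.0, 2.0, 3.0, 4.0, 5.0]
values_b = [2.0, 4.0, 6.0, 8.0, 10.0]  # same structure, different values
y_single = csr_spmv(row_ptr, col_idx, values_a, [1.0, 1.0, 1.0])
y_stacked = stacked_csr_spmv(
    row_ptr, col_idx, [values_a, values_b], [[1.0, 1.0, 1.0], [1.0, 2.0, 3.0]]
)
```

In the plain CSR loop every multiply-add needs one column index loaded from memory. In the stacked variant the same index is amortized over all problems in the ensemble, which is the kind of flops:byte improvement the abstract describes.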
22	Multiobjective Shape Optimization of Linear Elastic Structures Considering Multiple Loading Conditions (Dealing with Mean Compliance Minimization problems) SHIMODA, Masatoshi, AZEGAMI, Hideyuki, SAKURAI, Toshiaki 15 July 1996 (has links) No description available. Optimum Design Computational Mechanics Finite Element Method Structural Analysis Domain Optimization Multiobjective Optimization Pareto Solution Traction Method Multiply Connected Domain
23	On Post's embedding problem and the complexity of lossy channels Chambart, Pierre 29 September 2011 (has links) (PDF) Lossy channel systems were originally introduced to model communication protocols. They gave rise to a complexity class which remained poorly understood for a long time. In this thesis we study some of the most important gaps. In particular, we provide matching upper and lower bounds for the time complexity. Then we describe a new proof tool: the Post Embedding Problem (PEP), which is a simple problem, closely related to the Post Correspondence Problem, and complete for this complexity class. Finally, we study PEP, its variants and the languages of solutions of PEP, for which we provide complexity results and proof tools such as pumping lemmas. [INFO:INFO_OH] Computer Science/Other [INFO:INFO_OH] Informatique/Autre Lossy channel systems Model-checking Multiply recursive function Post's embedding problem
24	複数荷重を考慮した線形弾性体の多目的形状最適化(平均コンプライアンス最小化問題を例として) / Multiobjective Shape Optimization of Linear Elastic Structures Considering Multiple Loading Conditions (Taking the Mean Compliance Minimization Problem as an Example) 下田, 昌利, Shimoda, Masatoshi, 畔上, 秀幸, Azegami, Hideyuki, 桜井, 俊明, Sakurai, Toshiaki 02 1900 (has links) No description available. Optimum Design Computational Mechanics Finite-Element Method Structural Analysis Domain Optimization Multiobjective Optimization Pareto Solution Traction Method Multiply Connected Domain
25	IMPLEMENTATION OF A NOVEL INTEGRATED DISTRIBUTED ARITHMETIC AND COMPLEX BINARY NUMBER SYSTEM IN FAST FOURIER TRANSFORM ALGORITHM Bowlyn, Kevin Nathaniel 01 December 2017 (has links) This research focuses on a novel integrated approach for computing and representing complex numbers as a single entity, without the use of any dedicated multiplier, when calculating the fast Fourier transform (FFT) algorithm, using the Distributed Arithmetic (DA) technique and the Complex Binary Number System (CBNS). The FFT algorithm is one of the most widely used techniques in Digital Signal Processing (DSP) applications in the fields of science, engineering, and mathematics. DA is a technique for computing the inner (dot) product of two vectors without any dedicated multipliers. Dedicated multipliers are fast, but they consume a large amount of hardware and are quite costly. The DA multiplication process is accomplished by shifting and adding only. In today's technology, complex numbers are computed using a divide-and-conquer approach in which each complex number is divided into two parts: the real and the imaginary. The CBNS technique, however, allows each complex addition and multiplication to be computed in a single step instead of two. With the combined DA-CBNS approach for computing the FFT algorithm, the dedicated multipliers are replaced with a DA system that uses a ROM-based memory for storing the twiddle factor 'wn' values, and the complex arithmetic operations are represented as a single entity, not two, with the CBNS approach. This architectural design was implemented in the very high speed integrated circuit (VHSIC) hardware description language (VHDL) using the Xilinx ISE Design Suite, version 14.2. 
This computer-aided tool allows the design to be synthesized to the logic gate level so that it can be implemented on a Field Programmable Gate Array (FPGA) device. The VHDL code used to build this architecture was downloaded to a Nexys 4 DDR Artix-7 FPGA board for further testing and analysis. This novel technique used no dedicated multipliers and required half the number of complex arithmetic computations needed for calculating an FFT structure compared with the traditional approach. Finally, for a 32-bit, 8-point DA-CBNS FFT structure, the proposed architecture showed a 32% area reduction, 41% power reduction, 59% reduction in run-time, 42% reduction in logic gate cost, and 66% increase in speed. For a 28-bit, 16-point DA-CBNS FFT structure, area, power consumption, run-time, and logic gate cost were also reduced by approximately 30%, 37%, 60%, and 39%, respectively, with an increase in speed of approximately 67%, when compared to the traditional approach that employs dedicated multipliers and computes its complex arithmetic as two separate entities: the real and the imaginary. Complex Binary Number System (CBNS) Digital Signal Processing (DSP) Distributed Arithmetic (DA) Fast Fourier Transform (FFT) Multiply and Accumulate (MAC)
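The shift-and-add inner product that entry 25 attributes to distributed arithmetic can be sketched in software as follows. This is a simplified illustration under stated assumptions, not the thesis's VHDL design: it handles only real, unsigned integer inputs of a fixed bit width, it does not model CBNS or twiddle factors, and the names `build_da_lut` and `da_inner_product` are hypothetical.

```python
def build_da_lut(coeffs):
    """Precompute the ROM contents: entry `addr` holds the sum of the
    coefficients whose position bit is set in `addr`."""
    n = len(coeffs)
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << n)
    ]


def da_inner_product(coeffs, xs, bits):
    """Inner product sum(coeffs[k] * xs[k]) using lookups, shifts and adds.

    Assumes every x in xs is an unsigned integer below 2**bits and that
    the coefficients are fixed integers known in advance.
    """
    lut = build_da_lut(coeffs)
    acc = 0
    for b in range(bits):
        # Gather bit b of every input into one ROM address.
        addr = 0
        for k, x in enumerate(xs):
            addr |= ((x >> b) & 1) << k
        # Weight the looked-up partial sum by 2**b via a shift.
        acc += lut[addr] << b
    return acc


coeffs = [3, -1, 4, 2]
xs = [5, 7, 0, 9]
result = da_inner_product(coeffs, xs, bits=4)
```

The loop body contains no multiplication of a coefficient by an input: the table lookup replaces it, at the cost of a ROM with 2^n entries for n coefficients.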
26	Using Class Pass Intervention (CPI) to Decrease Disruptive Behavior in Children Zuniga, Andrea N. 07 March 2019 (has links) Previous research has shown that disruptive behavior can impair students’ academic success (Pierce, Reid, & Epstein, 2004) and increase teachers’ stress levels (Westling, 2010). Class Pass Intervention (CPI) is a Tier 2 intervention designed to decrease disruptive behavior and increase academic engagement; however, research on the effects of CPI has so far been limited to typically developing elementary and high school students with escape- and attention-maintained problem behaviors. Therefore, the purpose of this study was to replicate and extend previous research on the effects of CPI on problem behavior and academic engagement with students whose problem behavior was multiply-maintained. The study used a multiple baseline design to assess experimental control. In the current study, CPI led to a decrease in problem behavior and an increase in academic engagement for two students with ADHD and one student at risk of ADHD, all of whom engaged in problem behavior maintained by escape, access to attention, or both. In addition, in a social validity assessment, teachers and students rated the intervention as effective and easy to use, respectively. academic engagement multiply-maintained behaviors negative reinforcement positive reinforcement Social and Behavioral Sciences
27	Redundant Number Systems for Optimising Digital Signal Processing Performance in Field Programmable Gate Array Kamp, William Hermanus Michael January 2010 (has links) Speeding up addition is the key to faster digital signal processing (DSP). This can be achieved by exploiting the properties of redundant number systems. Their expanded symbol (digit) alphabet gives them multiple representations for most values. Utilising redundant representations at the output of an adder permits addition to be performed without carry-propagation, yielding fast, constant time performance irrespective of the word length. A resource efficient implementation of this fast adder structure is developed that re-purposes the fast carry logic of low-cost field programmable gate arrays (FPGAs). Experiments confirm constant time addition and show that it outperforms binary ripple carry addition at word lengths of greater than 44 bits in a Xilinx Spartan 3 FPGA and 24 bits in an Altera Cyclone III FPGA. Redundancy also provides other properties that can be exploited for performance gain. Some redundant representations will have more zero-symbols than others. These maximise the opportunities to exploit the multiplicative absorbing and additive identity properties of zero that when exercised reduce superfluous calculations. A serial recoding algorithm is developed that generates a redundant representation for a specified value with as few nonzero symbols as possible. Unlike previously published methods, it accepts a wide specification of number systems including those with irregularly spaced symbol alphabets. A Markov analysis and analysis of the elementary cycles in the formulated state machine provides average and worst case measures for the tested number system. Typically, the average number of non-zero symbols is less than a third and the worst case is less than a half. Further to the increase in zero-symbols, zero-dominance is proposed as a new property of redundant number representations. 
It promotes a set of representations that have uniquely positioned zero-symbols, in a Pareto-optimal sense. This set covers all representations of a value and is used to select representations to optimise the calculation of a dot-product. The dot-product or vector-multiply is a fundamental operation in DSP, since it is employed in filtering, correlation and convolution. The nonzero partial products can be packed together, substantially reducing the calculation time. The application of redundant number systems provides a two-fold benefit. Firstly, the number of nonzero partial products is reduced. Secondly, a novel opportunity is identified to use the representations in the zero-dominant set to optimise the packing further, gaining an extra 18% improvement. An implementation of the proposed dot-product with partial product packing is developed for a Cyclone II FPGA. It outperforms a quad-multiplier binary implementation in throughput by 50%. Redundant number systems excel at increasing performance in particular DSP subsystems, those that are numerically intensive and consist of considerable accumulation. The conversion back to a binary result is the performance bottleneck in the DSP algorithm, taking a time proportional to a binary adder. Therefore, redundant number systems are best utilised when this conversion cost can be amortised over many fast redundant additions, which is typical in many DSP and communications applications. Redundant number systems Field programmable gate array FPGA Digital signal processing DSP Zero dominant set ZDS Hamming weight partial product packing dot product multiply accumulate
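To make the idea in entry 27 of recoding a value into a redundant representation with few nonzero symbols concrete, the sketch below uses the non-adjacent form (NAF), a standard signed-digit recoding over the symbols {-1, 0, 1}. This is a swapped-in textbook technique chosen for illustration, not the serial recoding algorithm developed in the thesis, which the abstract says accepts a much wider range of number systems. The function names are hypothetical.

```python
def naf(n):
    """Return the non-adjacent form of a non-negative integer n as a list
    of digits in {-1, 0, 1}, least significant digit first."""
    digits = []
    while n != 0:
        if n & 1:
            # Choose +1 or -1 so that the remaining value is divisible by 4,
            # which forces the next digit to be zero.
            d = 2 - (n % 4)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def from_digits(digits):
    """Evaluate a radix-2 signed-digit string (least significant first)."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


seven = naf(7)                        # binary 111 has three nonzero bits
nonzero_count = sum(1 for d in seven if d != 0)
```

Each zero digit corresponds to a partial product that need not be computed in a shift-and-add multiplication, which is why representations with fewer nonzero symbols reduce work in a dot product.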
28	Produtos torcidos e variedades conformemente planas / Warped products and conformally flat manifolds. Bonfim, Paula Gonçalves Correia 25 February 2015 (has links) Coordenação de Aperfeiçoamento de Pessoal de Nível Superior - CAPES / In this work, we present concepts and results that lead to the construction of examples of complete, locally conformally flat manifolds of nonpositive sectional curvature obtained by Brozos-Vázquez, García-Río and Vázquez-Lorenzo in [4]. To this end, we study product manifolds equipped with a multiply warped metric. / Neste trabalho nos propomos a apresentar conceitos e resultados que nos guiem para a construção de exemplos de variedades completas localmente conformemente planas de curvatura seccional não positiva, obtidos por Brozos-Vázquez, García-Río e Vázquez-Lorenzo em [4]. Para isso, foi feito um estudo sobre variedades produto equipadas com uma métrica torcida múltipla. 
Produtos torcidos múltiplos Curvatura seccional não positiva Multiply warped products Locally conformally flat manifolds Nonpositive sectional curvature CIENCIAS EXATAS E DA TERRA::MATEMATICA
29	Métrica produto torcido e variedades de curvatura negativa / Warped product metric and manifolds of negative curvature Santos, Aderval Alves dos 16 April 2015 (has links) Conselho Nacional de Pesquisa e Desenvolvimento Científico e Tecnológico - CNPq / This work, based on the article by M. Brozos-Vázquez, E. García-Río and R. Vázquez-Lorenzo, aims to construct examples of complete, locally conformally flat manifolds of negative curvature by means of warped product and multiply warped product structures. The warped product was first introduced by Bishop and O’Neill, who modified the structure of the Riemannian product to obtain new manifolds of negative curvature. / Este trabalho, baseado no artigo de M. Brozos-Vázquez, E. García-Río e R. Vázquez-Lorenzo, tem como objetivo construir exemplos de variedades localmente conformemente flat completas de curvatura negativa por meio de produto torcido e estrutura de produto torcido múltiplo. 
Os produtos torcidos foram introduzidos primeiramente por Bishop e O’Neill, que modificaram a estrutura do produto Riemanniano na obtenção de novas variedades de curvatura negativa. Produto torcido Produto torcido multiplo Conformemente flat Curvatura negativa Warped product Multiply warped product Conformally flat Negative curvature MATEMATICA::GEOMETRIA E TOPOLOGIA ALGEBRA::GEOMETRIA ALGEBRICA
30	Interactions of slow multiply charged ions with large, free radiosensitizing metallic nanoparticles / Interaction d'ions multichargés lents avec des nanoparticules métalliques radiosensibilisantes Mika, Arkadiusz 19 December 2017 (has links) Cette thèse est consacrée à l'étude de l'interaction d'ions multichargés avec des particules métalliques de taille nanométrique. Ce travail a eu pour but d'étudier les processus fondamentaux ainsi que d'éclairer leur rôle comme radio-sensibilisants dans le traitement de cancer par hadronthérapie. Le nouveau dispositif développé dans ce cadre consiste en une source d'agrégats de type magnétron, d'une chambre de dépôt afin de permettre la caractérisation de la taille des nanoparticules neutres par analyse microscopique, et d'un spectromètre de masse par temps de vol capable de détecter des systèmes positivement chargés jusqu'à une masse de 50 000 ua. Les études de collisions ont été réalisées avec des agrégats de Bi (2 nm ; 200 atomes) et de Ag (6 nm ; 5000 atomes). Dans les deux cas, le processus de capture multiélectronique crée un système multichargé. Dans le cas du Bi, une grande partie fragmente par la fission asymétrique émettant des petits fragments. Dans le cas des particules plus grandes (Ag), les systèmes multichargés ne fragmentent pas, par contre des petits fragments sont aussi observés mais ils sont le produit de la pulvérisation de la nano-surface lors de collisions pénétrantes. En perspective, des expériences seront réalisées avec des nanoparticules métalliques fonctionnalisées ainsi que le comptage des électrons émis lors de la collision. / This thesis presents a study of the interaction of multiply charged ions with metallic nano-sized particles both in the context of fundamental processes and possible applications as radiosensitizers in nanoparticle-enhanced hadrontherapy. 
For this purpose, a new experimental set-up has been constructed, based on a magnetron-discharge cluster source, a deposition chamber for analyzing the size of neutral nanoparticles with AFM and TEM techniques, and a time-of-flight mass spectrometer able to detect positively charged particles with masses up to 50 000 amu. Collision studies were performed with Bi clusters (2 nm in diameter, 200 atoms) as well as Ag nanoparticles (6 nm, 5000 atoms). In both cases, multi-electron capture leads to the formation of multiply charged systems. In the Bi case, a large fraction fragments by asymmetric fission, emitting small singly charged fragments. In the case of the larger Ag nanoparticles, the multiply charged systems are stable. However, small fragments are formed due to sputtering of the nano-surface in penetrating collisions. Future experiments will be performed with functionalized metal nanoparticles and will aim to count the number of electrons emitted after ion collisions. Milieux dilués et optique fondamentale Nanoparticules métalliques Capture multi-électronique Fission asymétrique Metallic nanoparticles Multiply charged ions Time-of-flight mass spectrometry Asymmetric fission Sputtering Radiosensitizers Hadrontherapy
