Global ETD Search

151	Performance Analysis of Cache-Aware Multicore Parallelization with Application to Optimization Theory Stensen, Kristoffer January 2012 In previous work, a cache-aware sparse matrix multiplication for linear programming interior point methods was proposed. The serial implementations achieved speedups ranging from 1.2 to 108.0 over the implementation in GLPK, an open-source linear programming solver. In this work, the same ideas and data structures are used to develop a cache-aware version of the sparse Cholesky decomposition as it is implemented in GLPK. The serial implementation achieves a speedup of up to 2.5 on the problem set considered. The matrix multiplication and Cholesky decomposition are analysed using performance counters on both an AMD-based and an Intel-based system. The analysis shows that the applied blocking techniques reduce the number of floating point operations performed, and that for some problems this effect contributes more to the speedup than the achieved cache utilization. ntnudaim:7758 MTDT datateknikk Komplekse datasystemer
152	Scalability Modeling for Optimal Provisioning of Data Centers in Telenor: A better balance between under- and over-provisioning Rygg, Knut Helge January 2012 The scalability of an information system describes the relationship between system capacity and system size. This report studies the scalability of Microsoft Lync Server 2010 in order to provide guidelines for provisioning hardware resources. Optimal provisioning is required to reduce both deployment and operational costs while keeping an acceptable service quality. All Lync servers in the test setup are virtualized using VMware ESXi 5.0, and the system runs on a Cisco Unified Computing System (UCS) platform. The scenario is a typical hosted Lync deployment with external users only and telephone integration. While several companies offer hosted virtual Lync deployments and the Cisco UCS platform has a large market share, Microsoft's capacity planning guides don't provide help for such a deployment scenario or hardware platform. This report consequently fills an information gap. The scalability is determined by using empirical measurements with different workloads and system sizes. The workload is determined by the number of Lync end-users, and the system size varies from 1/4 to 4/4 of a Cisco UCS blade server. The results show linear scaling in the range of 1/4 to 4/4 blade servers. The processor is found to be the main bottleneck resource in this deployment. The mean opinion score (MOS) as well as the front end server utilization are the best metrics for monitoring service quality. ntnudaim:6400 MTDT datateknikk Komplekse datasystemer
153	Terrain Rendering Techniques for the HPC-Lab Snow Simulator Babington, Kjetil January 2012 This thesis presents a technique for GPU-based terrain rendering and the changes made to the HPC-lab snow simulator to integrate the new terrain rendering technique into the simulator. Our novel terrain rendering technique combines ideas from existing terrain rendering techniques such as CDLOD and Geometry Clipmaps into a hybrid method. The terrain rendering works on patches of quads that are tessellated using hardware tessellation based on the level of detail needed. The tessellated patches are then displaced using a vertex texture fetch of the heightmap in the tessellation shader. The implemented GPU terrain rendering technique is then added to the HPC-lab snow simulator, and changes to the simulator are implemented to facilitate the new terrain rendering technique: all of the old GLSL shaders are updated to the newest standard, the code structure is changed, and the collision detection of the snow simulator is updated to accommodate changes made to the terrain. The results from our benchmarks show that the tessellation pipeline can support terrain triangle counts of over 16 million triangles while maintaining a stable frame rate of over 1400 FPS. When used in combination with the simulator, the implementation is still able to achieve frame rates that are vastly greater than the old implementation in the snow simulator. The visual results achieved from using Perlin noise give the simulator a more realistic feel without degrading the performance of the implementation. Suggestions for further improvements are also included. ntnudaim:7613 MTDT datateknikk Komplekse datasystemer
154	Implementing a Heterogeneous Multi-Core Prototype in an FPGA Rusten, Leif Tore, Sortland, Gunnar Inge January 2012 Since the mid-1980s processor performance growth has been remarkable, with an annual growth of about 52%. Methods such as architectural enhancements exploiting ILP and frequency scaling have been effective at increasing performance, but are now limited by their diminishing returns and the power wall. Heterogeneous processors look promising as an alternative source of continued growth, but research on heterogeneous software is made difficult because heterogeneous hardware is in low supply. This thesis covers the design and implementation of a heterogeneous processor called SHMAC and its framework. The flexibility of the delivered system allows rapid exploration of both the hardware and software sides of heterogeneous processor research questions. The system is intended for research at CARD at NTNU. Two processor tiles and a set of additional tiles for extended functionality are provided, yielding a wide range of possible hardware setups in the delivered framework. Using a Xilinx Virtex 6 we were able to implement 40 integer cores or 16 floating-point cores. ntnudaim:7315 MTDT datateknikk Komplekse datasystemer
155	Video Games: Game AI Jensen, Remy January 2010 The goal of this project was to learn the industry standards for what constitutes good and challenging game AI. The author reviewed literature on the topic and had personal correspondence in which the research questions were answered by professionals within the field. The availability of open literature and the openness of the professionals really helped with understanding the industry standards for game AI. Using the information from the research, a prototype system adhering to the industry standard was made with the intention of expanding it into an experimental prototype using unorthodox techniques to achieve the appearance of intelligence. Through practical application of the methods learned, it became apparent that certain theoretical ideas were not optimally compatible with the provided framework, and a redesign of the conflicting module was necessary. The system implemented performed well within the industry standards, but no experimental prototyping of unorthodox methods could be made with the system due to lack of time to implement such features. A couple of optimality tweaks have been discovered since the end of the implementation phase of the project, and the author will keep these in mind when continuing work on the system in the future. ntnudaim:5299 SIF2 datateknikk Komplekse datasystemer
156	Computerised Methods and Device for Intuitive Use of the Human Hand for Touching and Re-shaping of Three-Dimensional Virtual Objects Neshaug, Vegar January 2010 Today, the use of the mouse and keyboard input devices to interface with the computer is common almost regardless of the purpose for which the computer is used. This is no less true for users who work daily in three-dimensional modelling. Because of this, there is a significant threshold for individuals beginning three-dimensional modelling before they are able to form even the most rudimentary of objects. A possible approach to lowering this threshold is to bridge the gap between traditional sculpting and virtual sculpting by utilizing the full range of human hand motion when interfacing with the computer. This work reviews the status of Human-Computer Interaction devices and methods, and sets out to design and implement a low-cost data glove prototype. A polygon mesh deformation method is developed which demonstrates the functionality of the glove and system design. The work constitutes a useful platform for further academic work in this field. ntnudaim:5597 SIF2 datateknikk Komplekse datasystemer
157	WoolPlot: A Visual Wool Profiler Hemmen, Peter January 2011 Task-based programming involves creating tasks, which can be run independently of each other, and letting the run-time system schedule the tasks on the underlying architecture. Wool is a new library for task-based programming created at SICS in Sweden. To assist a developer who is using Wool to parallelize a program, as well as the scientists who are actually developing Wool, a profiler which shows what happened in a computation can be very helpful. In this project we modify the Wool library to print more data about its computations. When the output is given to a Java application also developed in this project, the Java application produces a graphical representation of the execution. Each worker thread is visualized separately, with spawns, steals, leaps, critical path and CPU usage information included at a position corresponding to when the events actually occurred. The profiler, which we have named WoolPlot, is put to the test using a few real-world benchmarks, as well as some created especially for this project. The benchmarks show that WoolPlot works well when describing the distinct events such as steals and spawns. The reporting on the CPU load is too inaccurate to be sufficient for all practical uses. The overhead of the profiler is estimated to be between 3% and 6%. ntnudaim:6045 MTDT datateknikk Komplekse datasystemer
158	Parallel Methods for Projection on Strongly Curved Surfaces Chelliah, Joel Eelaraj January 2011 Using the parallel architecture of the graphics processing unit for general purpose programming has become increasingly common in recent years. The process of creating a mathematically correct transformation of a scene for curved stereoscopic projection is a very expensive task, which would greatly benefit from a massively parallel solution implemented on the GPU. In this thesis, we first investigate two different methods for obtaining a mathematically correct transformation of images intended for stereoscopic projection on strongly curved surfaces. One method revolves around transforming a pre-rendered image, pixel by pixel, while the other method applies the transformation to the projection of the vertices in the scene before they are rendered as an image. We then develop massively parallel solutions for both these methods on the GPU, striving to reach a real-time rate for the stereoscopic projection of the transformed images. We test both methods for different problem areas, and compare the results to map their strengths and weaknesses. From the obtained results, we conclude that they are both useful in different areas. The vertex transformation performs poorly when the number of vertices in the scene is very high, but for a moderate number of vertices it achieves excellent results, even for exceptionally large image resolutions. The pixel transformation is far less affected by the number of vertices in the scene; however, its performance declines rapidly as we increase the size of the image. Both methods were able to execute in real time for relevant problem sizes. ntnudaim:6190 MTDT datateknikk Komplekse datasystemer
159	A PCI Express communication interface for DMP camera arrays Lye, Tor Arne January 2010 Distributed Multimedia Plays (DMP) is a virtual collaboration system intended to provide real time audiovisual communication between multiple users. The system will produce near-natural picture and sound quality. This report explores the requirements of a camera interface unit for DMP. This device interfaces with several image sensors and allows them to communicate on a common serial communications channel based on the PCI Express standard. The noteworthy features of PCI Express are outlined, and the standard is compared to the alternative Aurora communications protocol. A functional prototype of a camera interface system has been implemented in VHDL and synthesised for an FPGA. The theoretical performance of this system is analysed and its suitability for use with DMP is evaluated. The results show that real time performance is possible with this architecture using a single PCI Express lane. As PCI Express is originally an internal computer bus, a simple test has been performed in order to determine whether it can be employed as an external communications interface. The results show that reliable communication is possible across distances of 1.5 meters or more. ntnudaim:5536 SIF2 datateknikk Komplekse datasystemer
160	Evaluating the Influence of Network Structure on Boolean Networks and Cellular Automata Hvaal, Harald January 2010 While there have been many papers on the qualities of Boolean networks and of Cellular Automata separately, little work has been done on comparing these networks to each other. Network parameters such as input count and choice of Boolean functions are often fixed in preparation for the experiments, with little regard to what effect that choice has. In this paper a broader overview is given of how the choice of network structure and network parameters affects the behavior of the network. Metrics such as iterations until stabilization (intermediary state count) and complexity of network behavior over time (functional complexity) are proposed and evaluated for a set of 15 different network configurations. CA networks are observed to have much less functional complexity than BN, and in general BN seems to have more potential for complex behavior. It is also observed that the functional complexity decreases for increasing values of dimension count/input count. ntnudaim:5362 SIF2 datateknikk Komplekse datasystemer
