31 |
Dynamic Selection of MPI Intra-copy Routines Based on Program Characteristics. Borg, Øystein Lauen, January 2006
<p>The Message Passing Interface (MPI) has become a de facto standard for parallel programming. The ultimate goal of parallel processing is high performance, which motivates a highly optimized MPI implementation. When an application calls an MPI communication routine, data is copied between user memory and the memory areas managed by the MPI library. The speed of this transfer depends on a multitude of factors, including the architecture, the amount of data, the data layout and whether the data is referenced right before or after a transfer. There are numerous ways to copy data from one location to another, and their characteristics combined with the data properties yield different efficiency. The information needed to select the best way to copy data is only available during application execution. In this Master's thesis, we present and implement a method to improve the performance of parallel applications by dynamically performing a close-to-optimal selection of intra-copy routines within an MPI implementation. Our method detects loops of MPI calls and exploits loop predictability to time their performance while varying the routine selections. In order to obtain a good routine selection reasonably fast, a global optimization heuristic, simulated annealing, is used. In particular, our solution method is employed within Scali MPI Connect (SMC), an MPI implementation providing 35 different intra-copy routines. Through various benchmarks, it is observed that our method introduces low overhead and finds a good selection fast, thus reducing the execution time of the given benchmark. In benchmarks where the difference between an optimal routine selection and the standard selection within SMC allows it, a bandwidth improvement of 40% is observed.</p>
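The selection idea in the abstract above, simulated annealing over per-call routine choices, can be illustrated with a minimal sketch. This is not the thesis implementation or any SMC interface: `anneal_selection`, `toy_cost`, the linear cooling schedule and the "one routine index per slot" encoding are all assumptions made for illustration, and `toy_cost` merely stands in for a measured loop time.

```python
import math
import random


def anneal_selection(cost, n_slots, n_routines, steps=2000, t0=1.0, seed=0):
    """Search for a low-cost choice of one routine index per slot.

    `cost` is any callable that scores a selection (lower is better);
    in a real system it would be a timing measurement (assumption).
    """
    rng = random.Random(seed)
    current = [0] * n_slots                      # start from a default selection
    current_cost = cost(current)
    best, best_cost = current[:], current_cost
    for step in range(steps):
        temp = t0 * (1.0 - step / steps) + 1e-9  # linear cooling (assumed schedule)
        candidate = current[:]
        # Perturb one slot: pick a random routine for a random slot.
        candidate[rng.randrange(n_slots)] = rng.randrange(n_routines)
        delta = cost(candidate) - current_cost
        # Always accept improvements; accept regressions with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = current[:], current_cost
    return best, best_cost


def toy_cost(selection):
    """Hypothetical stand-in for a measured loop time."""
    target = [3, 7, 1]
    return float(sum(abs(s - t) for s, t in zip(selection, target)))


best, best_cost = anneal_selection(toy_cost, n_slots=3, n_routines=35)
```

Tracking the best selection seen so far means the returned cost is never worse than that of the starting selection, even if the search ends in a worse state.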
|
32 |
Bandwidth-Aware Prefetching in Chip Multiprocessors. Grannæs, Marius, January 2006
<p>Chip Multiprocessors (CMPs) are an increasingly popular architecture, and a growing number of vendors now offer CMP solutions. The shift to CMP architectures from uniprocessors is driven by the increasing complexity of cores, the processor-memory performance gap, limitations in ILP and increasing power requirements. Prefetching is a successful technique commonly used in high-performance processors to hide latency. In a CMP, prefetching offers new opportunities and challenges, as current uniprocessor heuristics will need adaptation or redesign to integrate with CMPs. In this thesis, I look at the state of the art in prefetching and CMP architecture. I conduct experiments on how unmodified uniprocessor prefetching heuristics perform in a CMP. In addition, I propose a new prefetching scheme based on bandwidth monitoring and prediction through performance counters, suited for embedded CMP systems. This new prefetching scheme has been simulated with SimpleScalar. It offers lower bandwidth usage (up to 47.8 %), while retaining most of the performance gains from prefetching for low-accuracy prefetching heuristics.</p>
|
33 |
Visualization of water surface using GPU. Gustavsen, Jostein; Harkestad, Dan Lewi, January 2006
<p>Several methods for simulating a body of water and a water surface have been investigated. A method by Layton & van de Panne based on a simplification of the Navier-Stokes equations was selected. A number of simplifications were made to increase the performance of the method, and it was implemented on the programmable graphical processing unit (GPU) using the Jacobi method to solve the linear equations. A conjugate gradient solver was also implemented on the GPU. The performance of the methods was measured and recorded.</p>
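For readers unfamiliar with the Jacobi method mentioned above, a minimal CPU-side sketch follows. It is not the thesis's GPU code: the function name `jacobi`, the dense list-of-lists matrix and the fixed iteration count are assumptions for illustration. Each iteration computes every unknown from the previous iterate only, which is what makes the method easy to parallelize.

```python
def jacobi(a, b, iterations=100):
    """Approximate the solution of a x = b by Jacobi iteration.

    Assumes `a` is square with non-zero diagonal; convergence is
    guaranteed when `a` is strictly diagonally dominant.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iterations):
        # Every new component is computed from the old vector x only.
        x = [
            (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
            for i in range(n)
        ]
    return x


# Small diagonally dominant system whose exact solution is x = [1, 2].
solution = jacobi([[4.0, 1.0], [1.0, 3.0]], [6.0, 7.0])
```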
|
34 |
User Interface for 3D Visualization with Emphasis on Combined Voxel and Surface Representation: Design Report. Lyngset, Runar Ylvisåker, January 2006
<p>The thesis presents a user interface design aimed at the scenario where a dual representation of a volume is desired, in order to emphasize certain parts of the volume using surface graphics while the rest of the volume is rendered using direct volume rendering techniques. A typical situation in which this configuration can prove useful is when studying images acquired for medical purposes. Sometimes the user wants to identify and represent an organ using an opaque surface in an otherwise partly opaque visualization of the volume data set. The design is based on the visualization library VTK along with Trolltech Qt, a C++ GUI toolkit. The choice of VTK as the visualization library was made after evaluating similar systems. The report includes a state-of-the-art chapter, the requirements for the system, the system design, and the results achieved after implementing the design.</p>
|
35 |
Benchmarking Catastrophic Forgetting in Neural Networks. Moe-Helgesen, Ole-Marius, January 2006
<p>Catastrophic forgetting is a behavior seen in artificial neural networks (ANNs) when new information overwrites old in such a way that the old information is no longer usable. Since this happens very rapidly in ANNs, it leads both to major practical problems and to problems in using the artificial networks as models of the human brain. In this thesis I approach the problem from the practical viewpoint and attempt to provide rules, guidelines, datasets and analysis methods that can help researchers better analyze new ANN models in terms of catastrophic forgetting and thus lead to better solutions. I suggest two methods of analysis that measure the overlap between input patterns in the input space. I show strong indications that these measurements can predict whether a back-propagation network will retain information better or worse. I also provide source code implemented in Matlab for analyzing datasets, both with the newly suggested measurements and with existing ones, and for running experiments measuring catastrophic forgetting.</p>
|
36 |
Weighted Pattern Matching with PWMs on FPGAs. Krutådal, Lars Karsten, January 2006
<p>This paper has presented a solution to an FPGA-based PWM matcher in the form of the so-called FPWM Prototype, using the hardware facilities on the Cray XD1 supercomputer. The prototype implementation currently runs as a single core on a single node of the Cray, and provides a theoretical PWM matching capability roughly 15 times greater than a contemporary Pentium M general-purpose CPU. Theoretical and empirical data regarding performance and resource consumption for this implementation have been provided. A method for increasing the speedup to a theoretical maximum of 480x has also been described, using a multi-core implementation on a single chip. This theoretical limit could potentially be attained with today's hardware, but would require certain compromises with regard to bit resolution and PWM length in order to fit on the FPGA. A full-scale implementation providing the capabilities required by many of today's algorithms would most likely not reach this speed, but as the FPGA currently installed on the Cray is also available in a larger variant (the Virtex-4 family), it is reasonable to assume that such an implementation could indeed be feasible on contemporary hardware. A method for using several nodes on the Cray XD1 transparently to the user application, in order to further increase performance, has also been described. However, as theoretical performance estimation on such hardware is a highly inexact science, and empirical measurements could not be performed at this time due to the state of the prototype, no estimates have been provided for this method. While some of the original goals were attained, other parts of the project could be considered a failure. Due to a number of implementation problems, a working FPWM was not available in time for use with the two other projects mentioned in the introduction, involving hardware acceleration of the Gibbs Sampling and MEME algorithms. The main problem with the cooperation between these projects was that it relied on the FPWM being finished and working before the work involving it could begin, which turned out to be much harder and to take much longer than first envisioned. The planned empirical measurements of the performance boost for these algorithms are therefore not yet available.</p>
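As background for the abstract above, PWM (position weight matrix) matching in software amounts to sliding a window over a sequence and summing one weight per position. The sketch below is a plain illustration of that idea, not the FPWM design: the function name `pwm_scan`, the list-of-dicts matrix layout, the integer weights and the threshold are assumptions.

```python
def pwm_scan(sequence, pwm, threshold):
    """Return (start, score) for every window scoring at least `threshold`.

    `pwm` is a list with one dict per motif position, mapping each
    nucleotide to its weight (assumed layout).
    """
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        # Sum the weight of the observed nucleotide at each motif position.
        score = sum(pwm[k][sequence[start + k]] for k in range(width))
        if score >= threshold:
            hits.append((start, score))
    return hits


# Toy two-column matrix that favors the motif "AC".
toy_pwm = [
    {"A": 2, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 2, "G": 0, "T": 0},
]
hits = pwm_scan("GACA", toy_pwm, threshold=3)
```

Every window is scored independently of the others, which is the property a hardware matcher can exploit by evaluating positions in parallel.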
|
37 |
Discovery of approximate composite motifs in biological sequences. Valebjørg, Vetle Søraas, January 2006
<p>Mapping the regulatory system in living organisms is a great challenge, and many methods have been created during the last 15 years to solve this problem. The biological processes are, however, more flexible and complex than first thought, and many of the methods lack the ability to imitate this exactly. The new method devised here is not a complete solution to this situation, but poses an innovative solution for finding approximate composite patterns in a set of sequences. Motifs are read from any third-party tool, represented as either {A,C,G,T}, IUPAC or PWMs, and weighted with significance and support as an estimate of how important the patterns are. Finding combinations with both high significance and high support can reveal important properties preserved in the sequences. Based on this, the algorithm uses a branch-and-bound approach to traverse every combination while preserving the best solutions to this multi-objective optimization problem in a Pareto front. The best patterns found are investigated further by applying different statistical and experimental methods to better support their significance. The three most important tests done on the TransCompel dataset were (i) to measure the predicted patterns against known sites based on nucleotide correlation, (ii) to find the frequency of motifs participating in the combinations, so that the best could be studied manually, and (iii) to compare different tests when the significance was based on real background sequences instead of the uniform distribution. Some of the results found were low, but still similar to the accuracy provided by other known methods that have been tested with the same methods. The test results can be biased by the parameters used, by a test set that is too simple and restrictive, or by faulty predictions made on the dataset tested. More testing and tuning of parameters might result in better predictions. However, the different tests still proved this method to be a valuable tool in composite motif discovery.</p>
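The Pareto front mentioned above keeps every candidate that no other candidate beats on both objectives at once. The sketch below illustrates only that filtering step with a simple quadratic scan; it is not the thesis's branch-and-bound algorithm, and the name `pareto_front` and the `(significance, support)` tuples with made-up values are assumptions.

```python
def pareto_front(points):
    """Return the points not dominated by any other point.

    Each point is a (significance, support) tuple and both values are
    maximized. A point is dominated if a different point is at least
    as good in both coordinates.
    """
    front = []
    for p in points:
        dominated = any(
            q[0] >= p[0] and q[1] >= p[1] and q != p for q in points
        )
        if not dominated:
            front.append(p)
    return front


# Hypothetical (significance, support) scores for five motif combinations.
candidates = [(1, 5), (2, 4), (3, 3), (2, 2), (1, 1)]
front = pareto_front(candidates)
```

In this toy input, `(2, 2)` and `(1, 1)` are dropped because `(3, 3)` is at least as good on both objectives, while the remaining three points each trade one objective against the other.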
|
38 |
Automatic recognition of unwanted behavior. Løvlie, Erik Sundnes, January 2006
<p>The use of video surveillance in public areas is ever increasing. With that increase, it becomes impractical to continue using humans to view and respond to the surveillance video streams, due to the massive amount of information that must be processed. If one hopes to use surveillance to avoid personal injuries, damage to property and so forth, instead of merely as a forensic tool after the fact, humans must be replaced by artificial intelligence. This thesis examines the whole process of recognizing unwanted human behaviors from videos taken by surveillance cameras. An overview of the state of the art in automated security and human behavior recognition is given. Algorithms for motion detection and tracking are described and implemented. The motion detection algorithm uses background subtraction, and can deal with large amounts of random noise. It also detects and removes cast shadows. The tracking algorithm uses a spatial occupancy overlap test between the predicted positions of tracked objects and current foreground blobs. Merges and splits are handled by grouping and ungrouping objects and recovering the trajectory using the distance between predicted position and foreground blobs. Behaviors that are unwanted in most public areas are discussed, and a set of such concrete behaviors is described. New algorithms for recognizing chasing/fleeing scenarios and people lying on the floor are then presented. A real-time intelligent surveillance system capable of recognizing chasing/fleeing scenarios and people lying on the floor has been implemented, and results from analyzing real video sequences are presented. The thesis concludes with a discussion of the advantages and disadvantages of the presented algorithms, and suggestions for future research.</p>
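The spatial overlap test described above can be pictured as intersecting each tracked object's predicted bounding box with the bounding boxes of the current foreground blobs. The sketch below is only one plausible reading under that assumption, not the thesis's algorithm: the axis-aligned `(x_min, y_min, x_max, y_max)` box format and the names `overlap_area` and `match_blobs` are hypothetical.

```python
def overlap_area(a, b):
    """Area of intersection of two boxes given as (x_min, y_min, x_max, y_max)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    return width * height if width > 0 and height > 0 else 0


def match_blobs(predicted, blobs):
    """Map each track id to the indices of blobs overlapping its predicted box."""
    return {
        track_id: [i for i, blob in enumerate(blobs) if overlap_area(box, blob) > 0]
        for track_id, box in predicted.items()
    }


# Two tracked objects and two foreground blobs in the current frame.
predicted = {"a": (0, 0, 10, 10), "b": (20, 20, 30, 30)}
blobs = [(5, 5, 15, 15), (40, 40, 50, 50)]
matches = match_blobs(predicted, blobs)
```

Under this reading, a track matching no blob (like `"b"` here) or several tracks matching one blob are the cases where the grouping and distance-based recovery described in the abstract would come into play.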
|
39 |
En parallell løsning av cellulær utvikling i maskinvare / A True Parallel Approach to Cellular Artificial Development in Hardware. Øyan, Mats Jørgen, January 2006
<p>Today's electronic circuits are generally developed with a top-down design strategy. As circuits grow larger and more complex, designing them also becomes a larger and harder task. To cope with this increasing complexity, new design methods have been introduced, some of them inspired by nature. This type of hardware can, for example, use evolution or cell structures, or try to simulate intelligence. A widely used platform for biologically inspired hardware is the FPGA. Because of a number of limitations of the FPGA, the computer group at IDI, NTNU has introduced a virtual FPGA called Sblock. This platform removes some of the limitations of working directly on an FPGA and consists of a matrix of cells. Each cell has a functionality and a development process that changes the functionality of the cell according to certain rules. Previously, an implementation of an Sblock matrix was made with the development process in a central co-processor. The goal of this thesis is to implement and test an Sblock matrix in which the development process is also placed in parallel in each Sblock, and to find out how large a matrix can be built with the available hardware. The results show that the Sblock matrix scales well and that the hardware used increases linearly with the number of Sblocks in the matrix. The new implementation works correctly, but some instructions have been removed and some added relative to the earlier implementation.</p>
|
40 |
Anvendelse av biologisk inspirerte metoder i musikk / Applying biological inspired Methods in Music. Otteren, Per Kåre Hollund, January 2006
<p>Many well-known methods exist for imitating human reasoning or exhibiting artificial intelligence. An interesting application area for such methods is music, since it is an art form without clear rules or absolute truths. Without rules and truths, there is also no single correct way to make music. The work described in this report consists of building a system for interpreting musical input from a performer. Neural networks are tried for this purpose. A neural network can perform this task by being trained to recognize which chords are being played. The system is tested by connecting it to the program Happysound (developed in the project assignment of autumn 2005), a program that uses fuzzy logic to generate musical solos. Connecting a neural network to it yields a system that takes in accompaniment from a fellow player and plays solos that fit it. The results presented show that neural networks are a good method for musical purposes, but that there are limits on how many different chord representations a single neural network can recognize. Finally, an extension for overcoming these limitations is proposed.</p>
|