  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.

Distributed Hydrological Modeling Using Soil Depth Estimated from Landscape Variable Derived with Enhanced Terrain Analysis

Tesfa, Teklu K. 01 May 2010
The spatial patterns of land surface and subsurface characteristics determine the spatial heterogeneity of hydrological processes. Soil depth is one of these characteristics and an important input parameter required by distributed hydrological models that explicitly represent spatial heterogeneity. Soil is related to topography and land cover due to the role played by topography and vegetation in affecting soil-forming processes. The research described in this dissertation addressed the development of statistical models that predict the soil depth pattern over the landscape; derivation of new topographic variables evaluated using both serial and parallel algorithms; and evaluation of the impacts of detailed soil depth representation on simulations of stream flow and soil moisture. The dissertation is comprised of three papers. In paper 1, statistical models were developed to predict soil depth pattern over the watershed based on topographic and land cover variables. Soil depth was surveyed at locations selected to represent the topographic and land cover variation at the Dry Creek Experimental Watershed, near Boise, Idaho. Explanatory variables were derived from a digital elevation model and remote sensing imagery for regression to the field data. Generalized Additive and Random Forests models were developed to predict soil depth over the watershed. The models were able to explain about 50% of the soil depth spatial variation, which is an important improvement over the soil depth extracted from the SSURGO national soil database. In paper 2, definitions of the new topographic variables derived in the effort to model soil depth, and serial and Message Passing Interface parallel implementations of the algorithms for their evaluation are presented. The parallel algorithms enhanced the processing speed of large digital elevation models as compared to the serial recursive algorithms initially developed. 
In paper 3, the impact of spatially explicit soil depth information on simulations of stream flow and soil moisture was evaluated against soil depth derived from the SSURGO soil database. The Distributed Hydrology Soil Vegetation Model was applied using an automated parameter optimization technique, with all input parameters the same except soil depth. Stream flow was less affected by the detailed soil depth information, while the simulation of soil moisture was slightly improved by the detailed representation of soil depth.

Algorithm Adaptation and Optimization of a Novel DSP Vector Co-processor

Karlsson, Andréas January 2010
The Division of Computer Engineering at Linköping University is currently researching the possibility of creating a highly parallel DSP platform that can keep up with the computational needs of upcoming standards for various applications at low cost and low power consumption. The architecture is called ePUMA and combines a general RISC DSP master processor with eight SIMD co-processors on a single chip. The master processor will act as the main processor for general tasks and execution control, while the co-processors will accelerate computing-intensive and parallel DSP kernels. This thesis investigates the performance potential of the co-processors by implementing matrix algebra kernels for QR decomposition, LU decomposition, matrix determinant and matrix inverse that run on a single co-processor. The kernels are then evaluated to find possible problems with the co-processors' microarchitecture and to suggest solutions to them. The evaluation shows that the performance potential is very good, but a few problems have been identified that cause significant overhead in the kernels. Pipeline mismatches, which occur because different instructions have different pipeline lengths, cause pipeline hazards, and the current solution to this does not allow effective use of the pipeline. In some cases the single-port memories cause bottlenecks, but the thesis suggests that the situation could be greatly improved by using buffered memory write-back. Also, the lack of register forwarding makes kernels with many data dependencies run unnecessarily slowly.

Analyzing hybrid architectures for massively parallel graph analysis

Ediger, David 08 April 2013
The quantity of rich, semi-structured data generated by sensor networks, scientific simulation, business activity, and the Internet grows daily. The objective of this research is to investigate architectural requirements for emerging applications in massive graph analysis. Using emerging hybrid systems, we will map applications to architectures and close the loop between software and hardware design in this application space. Parallel algorithms and specialized machine architectures are necessary to handle the immense size and rate of change of today's graph data. To highlight the impact of this work, we describe a number of relevant application areas ranging from biology to business and cybersecurity. With several proposed architectures for massively parallel graph analysis, we investigate the interplay of hardware, algorithm, data, and programming model through real-world experiments and simulations. We demonstrate techniques for obtaining parallel scaling on multithreaded systems using graph algorithms that are orders of magnitude faster and larger than the state of the art. The outcome of this work is a proposed hybrid architecture for massive-scale analytics that leverages key aspects of data-parallel and highly multithreaded systems. In simulations, the hybrid systems incorporating a mix of multithreaded, shared memory systems and solid state disks performed up to twice as fast as either homogeneous system alone on graphs with as many as 18 trillion edges.

Algorithms for large graphs

Das Sarma, Atish 01 July 2010
No description available.

Objektų sekimo vaizde algoritmų įgyvendinimo LPLM įrenginiu tyrimas / Investigation of Object Tracking Algorithms Based on FPGA

Sledevič, Tomyslav 26 July 2012
The aim of this master's thesis is to investigate object tracking methods and to implement object tracking algorithms in field-programmable gate array (FPGA) devices for real-time execution. The aim is achieved in three stages. An analytical review of object tracking methods is performed, covering the algorithms and the possibilities for implementing them on FPGAs. The object tracking algorithms are implemented in VHDL, distributed over one or several FPGA chips operating synchronously, and run in real time. The implemented algorithms are investigated and the results are analyzed. The stability of object tracking is investigated by changing the illumination level, background complexity, object color, movement speed, distance to the camera and rotation angle. The thesis consists of 69 pages of text without appendices, 72 figures, 70 bibliographical entries and 3 appendices.

A Complexity Theory for VLSI

Thompson, C. D. 01 August 1980
The established methodologies for studying computational complexity can be applied to the new problems posed by very large-scale integrated (VLSI) circuits. This thesis develops a "VLSI model of computation" and derives upper and lower bounds on the silicon area and time required to solve the problems of sorting and discrete Fourier transformation. In particular, the area $A$ and time $T$ taken by any VLSI chip using any algorithm to perform an $N$-point Fourier transform must satisfy $AT^2 \ge c\,N^2 \log^2 N$, for some fixed $c > 0$. A more general result for both sorting and Fourier transformation is that $AT^{2x} = \Omega(N^{1+x} \log^{2x} N)$ for any $x$ in the range $0 < x < 1$. Also, the energy dissipated by a VLSI chip during the solution of either of these problems is at least $\Omega(N^{3/2} \log N)$. The tightness of these bounds is demonstrated by the existence of nearly optimal circuits for both sorting and Fourier transformation. The circuits based on the shuffle-exchange interconnection pattern are fast but large: $T = O(\log^2 N)$ for Fourier transformation, $T = O(\log^3 N)$ for sorting; both have area $A$ of at most $O(N^2 / \log^{1/2} N)$. The circuits based on the mesh interconnection pattern are slow but small: $T = O(N^{1/2} \log\log N)$, $A = O(N \log^2 N)$.
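As a rough check of how close the quoted circuits come to the lower bound, one can substitute their area and time figures into $AT^2$ for the Fourier transform. The arithmetic below only combines the asymptotic expressions stated in the abstract, taking the upper bounds at face value; it is not a derivation from the thesis itself.

```latex
% Lower bound from the abstract: A T^2 = \Omega(N^2 \log^2 N)

% Shuffle-exchange circuit (Fourier transform):
%   A = O(N^2 / \log^{1/2} N), \quad T = O(\log^2 N)
\[
  A T^2 = O\!\left(\frac{N^2}{\log^{1/2} N}\right) \cdot O\!\left(\log^4 N\right)
        = O\!\left(N^2 \log^{7/2} N\right)
\]
% This exceeds the lower bound by a factor of at most \log^{3/2} N.

% Mesh circuit:
%   A = O(N \log^2 N), \quad T = O(N^{1/2} \log\log N)
\[
  A T^2 = O\!\left(N \log^2 N\right) \cdot O\!\left(N (\log\log N)^2\right)
        = O\!\left(N^2 \log^2 N \, (\log\log N)^2\right)
\]
% This exceeds the lower bound by a factor of at most (\log\log N)^2.
```

Both designs are therefore within a polylogarithmic factor of the $AT^2$ bound, which is consistent with the abstract's description of them as nearly optimal.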

High performance computing for irregular algorithms and applications with an emphasis on big data analytics

Green, Oded 22 May 2014
Irregular algorithms such as graph algorithms, sorting, and sparse matrix multiplication present numerous programming challenges, including scalability, load balancing, and efficient memory utilization. In this age of Big Data we face additional challenges, since the data is often streaming at high velocity and we wish to make near real-time decisions for real-world events. For instance, we may wish to track Twitter for the pandemic spread of a virus. Analyzing such data sets requires combining algorithmic optimizations with the use of massively multithreaded architectures, accelerators such as GPUs, and distributed systems. My research focuses on designing new analytics and algorithms for the continuous monitoring of dynamic social networks. Achieving high performance for irregular algorithms such as Social Network Analysis (SNA) is challenging, as the instruction flow is highly data dependent and requires domain expertise. The rapid changes in the underlying network necessitate understanding real-world graph properties such as the small-world property, shrinking network diameter, power-law distribution of edges, and the rate at which updates occur. These properties, with respect to a given analytic, can help design load-balancing techniques, avoid wasteful (redundant) computations, and create streaming algorithms. In the course of my research I have considered several parallel programming paradigms for a wide range of multithreaded platforms: x86, NVIDIA's CUDA, Cray XMT2, SSE-SIMD, and Plurality's HyperCore. These unique programming models require examination of parallel programming at multiple levels: algorithmic design, cache efficiency, fine-grain parallelism, memory bandwidth, data management, load balancing, scheduling, control flow models and more. This thesis deals with these issues and more.

Fast Algorithms for Mining Co-evolving Time Series

Li, Lei 01 September 2011
Time series data arise in many applications, from motion capture, environmental monitoring, and temperatures in data centers to physiological signals in health care. In this thesis, I focus on the theme of learning and mining large collections of co-evolving sequences, with the goal of developing fast algorithms for finding patterns, summarization, and anomalies. In particular, this thesis answers the following recurring challenges for time series: 1. Forecasting and imputation: How to forecast and recover missing values in time series data? 2. Pattern discovery and summarization: How to identify the patterns in the time sequences that would facilitate further mining tasks such as compression, segmentation and anomaly detection? 3. Similarity and feature extraction: How to extract compact and meaningful features from multiple co-evolving sequences that will enable better clustering and similarity queries of time series? 4. Scale up: How to handle large data sets on modern computing hardware? We develop models to mine time series with missing values, to extract compact representations from time sequences, to segment the sequences, and to do forecasting. For large-scale data, we propose algorithms for learning time series models, in particular Linear Dynamical Systems (LDS) and Hidden Markov Models (HMM). We also develop a distributed algorithm for finding patterns in large web-click streams. The thesis presents special models and algorithms that incorporate domain knowledge. For motion capture, we describe natural motion stitching and occlusion filling for human motion. In particular, we provide a metric for evaluating the naturalness of motion stitching, based on which we choose the best stitching. Thanks to domain knowledge (body structure and bone lengths), our algorithm is capable of recovering occlusions in mocap sequences with better accuracy and over longer missing periods. We also develop an algorithm for forecasting thermal conditions in a warehouse-sized data center. The forecast will help us control and manage the data center in an energy-efficient way, which can save a significant percentage of electric power consumption in data centers.

Analysis of synchronizations in greedy-scheduled executions and applications to efficient generation of pseudorandom numbers in parallel / Análise de sincronizações em execuções por escalonamento guloso e aplicações para geração eficiente de números pseudoaleatórios em paralelo / Analyse des synchronisations dans un programme parallèle ordonnancé par vol de travail applications à la génération déterministe de nombres pseudo-aléatoires

Mor, Stefano Drimon Kurz January 2015
We present two contributions to the field of parallel programming. The first contribution is theoretical: we introduce SIPS analysis, a novel approach to estimating the number of synchronizations performed during the execution of a parallel algorithm. Based on the concept of logical clocks, it allows us, on the one hand, to derive new bounds on the number of synchronizations, in expectation, and, on the other hand, to design more efficient parallel programs by dynamic adaptation of the granularity. The second contribution is pragmatic: we present an efficient parallelization strategy for pseudorandom number generation that is independent of the number of concurrent processes participating in a computation. As an alternative to the use of one sequential generator per process, we introduce a generic API called Par-R, which is designed and analyzed using SIPS. Its main characteristic is the use of a sequential generator that can perform a "jump-ahead" directly from one number to another at an arbitrary distance within the pseudorandom sequence. Thanks to SIPS, we show that, in expectation, within an execution scheduled by work stealing of a "very parallel" program (whose depth, or critical path, is small compared to the work, or number of operations), these jump operations are rare. Par-R is compared with the parallel pseudorandom number generator DotMix, written for the Cilk Plus dynamic multithreading platform. The theoretical overhead of Par-R compares favorably to DotMix's overhead, which is confirmed experimentally, and Par-R does not require a fixed underlying generator.
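To make the "jump-ahead" idea concrete, here is a minimal sketch using a linear congruential generator (LCG), for which advancing by k positions can be computed in O(log k) steps by repeated squaring of the affine update. This is only an illustration of a generator with jump-ahead; it is not the Par-R API and is not taken from the thesis. The function names and the constants (`A`, `C`, `M`) are assumptions chosen for the example.

```python
from typing import Tuple

# Illustrative 64-bit LCG parameters (assumed for this sketch,
# not taken from the thesis).
A = 6364136223846793005
C = 1442695040888963407
M = 2 ** 64


def step(x: int) -> int:
    """Advance the generator by one position: x -> (A*x + C) mod M."""
    return (A * x + C) % M


def jump_coefficients(k: int) -> Tuple[int, int]:
    """Return (a_k, c_k) such that k steps map x to (a_k*x + c_k) mod M."""
    a_k, c_k = 1, 0      # identity map, i.e. zero steps
    a, c = A, C          # affine map for 2**i steps, starting at i = 0
    while k > 0:
        if k & 1:
            # Compose the current power-of-two map with the accumulated map.
            a_k, c_k = (a * a_k) % M, (a * c_k + c) % M
        # Square the map: f(f(x)) = a*(a*x + c) + c.
        a, c = (a * a) % M, (a * c + c) % M
        k >>= 1
    return a_k, c_k


def jump_ahead(x: int, k: int) -> int:
    """Return the state reached from x after k steps, in O(log k) time."""
    a_k, c_k = jump_coefficients(k)
    return (a_k * x + c_k) % M
```

In a work-stealing setting, a jump of this kind would let a task that is stolen continue from the correct position in the sequence without replaying the intermediate numbers, which is why the rarity of such operations matters for the overhead.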
