Global ETD Search

11	Fast Split Arithmetic Encoder Architectures and Perceptual Coding Methods for Enhanced JPEG2000 Performance Varma, Krishnaraj M. 11 April 2006 JPEG2000 is a wavelet transform based image compression and coding standard. It provides superior rate-distortion performance when compared to the previous JPEG standard. In addition, JPEG2000 provides four dimensions of scalability: distortion, resolution, spatial, and color. These superior features make JPEG2000 ideal for use in power and bandwidth limited mobile applications like urban search and rescue. Such applications require a fast, low power JPEG2000 encoder to be embedded on the mobile agent. This embedded encoder also needs to provide superior subjective quality for low bitrate images. This research addresses these two aspects of enhancing the performance of JPEG2000 encoders. The JPEG2000 standard includes a perceptual weighting method based on the contrast sensitivity function (CSF). Recent literature shows that perceptual methods based on subband standard deviation are also effective in image compression. This research presents two new perceptual weighting methods that combine information from both the human contrast sensitivity function and the standard deviation within a subband or code-block. These two new sets of perceptual weights are compared to the JPEG2000 CSF weights. The results indicate that our new weights performed better than the JPEG2000 CSF weights for high frequency images. Weights based solely on subband standard deviation are shown to perform worse than JPEG2000 CSF weights for all images at all compression ratios. Embedded block coding, EBCOT tier-1, is the most computationally intensive part of the JPEG2000 image coding standard. Past research on fast EBCOT tier-1 hardware implementations has concentrated on cycle efficient context formation. These pass-parallel architectures require that JPEG2000's three mode switches be turned on.
While turning on the mode switches allows the arithmetic encoding of each coding pass to run independently of the others (and thus in parallel), it also disrupts the probability estimation engine of the arithmetic encoder, thus sacrificing coding efficiency for improved throughput. In this research a new fast EBCOT tier-1 design is presented: it is called the Split Arithmetic Encoder (SAE) process. The proposed process exploits concurrency to obtain improved throughput while preserving coding efficiency. The SAE process is evaluated using three methods: clock cycle estimation, a multithreaded software implementation, and a field programmable gate array (FPGA) hardware implementation. All three methods achieve throughput improvement; the hardware implementation exhibits the largest speedup, as expected. A high speed, task-parallel, multithreaded software architecture for EBCOT tier-1 based on the SAE process is proposed. SAE was implemented in software on two shared-memory architectures: a PC using hyperthreading and a multi-processor non-uniform memory access (NUMA) machine. The implementation adopts appropriate synchronization mechanisms that preserve the algorithm's causality constraints. Tests show that the new architecture is capable of improving throughput by as much as 50% on the NUMA machine and by as much as 19% on a PC with two virtual processing units. A high speed, multirate FPGA implementation of the SAE process is also proposed. The mismatch between the rate of production of data by the context formation (CF) module and the rate of consumption of data by the arithmetic encoder (AE) module is studied in detail. Appropriate choices for FIFO sizes and FIFO write and read capabilities are made based on the statistics obtained from test runs of the algorithm. Using a fast CF module, this implementation was able to achieve as much as 120% improvement in throughput. / Ph. D.
FPGA hardware; Multithreaded software; Split Arithmetic Encoder; EBCOT; JPEG2000; Perceptual weighting
12	Entwicklung des Kommunikationsteilsystems für ein objektorientiertes, verteiltes Betriebssystem / Development of the Communication Subsystem for an Object-Oriented, Distributed Operating System Becher, Mike 09 November 1998 This thesis covers the development of a communication subsystem for the experimental system CHEOPS that enables inter-object communication between objects on the same system or on different systems. The starting points are an available implementation of an Ethernet driver for the WD80x3 card family under MS-DOS, a required ability to communicate with UNIX processes, and the protocol families usable there. The thesis deals with the analysis and design of the Ethernet driver and the Internet protocol family for CHEOPS, as well as their implementation, resulting in a minimal base system. In addition, a first draft of a network interface, to be developed further or completed later, is proposed and substantiated by an example implementation. ARP; Address Resolution Protocol; CHEOPS; Ethernet device; Ethernet driver; Ethernet; ICMP; IP defragmentation algorithm; IP routing algorithm; IP; IPv4; Internet Control Message Protocol; Internet Protocol Version 4; Internet Protocol; inter-object communication; loopback device; loopback driver; loopback; message-based Internet Protocol; message-based communication; message; multithreaded ARP; multithreaded Ethernet driver; multithreaded ICMP; multithreaded IP; multithreaded loopback driver; multithreaded UDP; OOP; object-oriented programming; object-oriented Ethernet driver; object-oriented loopback driver; object-oriented ARP; object-oriented ICMP; object-oriented IP; object-oriented network interface; object-oriented UDP; UDP; User Datagram Protocol; multithreaded Ethernet device driver; multithreaded loopback device driver; ddc:004
13	Micro-Network Processor : A Processor Architecture for Implementing NoC Routers Martin Rovira, Julia, Manuel Fructoso Melero, Francisco January 2007 Routers are probably the most important component of a NoC, as the performance of the whole network is driven by the routers’ performance. The cost of the whole network in terms of area will also be minimised if the router design is kept small. A new application specific processor architecture for implementing NoC routers, called µNP (Micro-Network Processor), is proposed in this master thesis. The aim is to offer a solution that trades off the high performance of routers implemented in hardware against the high level of flexibility that could be achieved by loading packet-routing software onto a GPP. Therefore, a study including the design of a hardware based router and a GPP based router has been conducted. In this project the first version of the µNP has been designed, and a complete instruction set, along with some sample programs, is also proposed. The results show that, in the best case for all implementation options, µNP was 7.5 times slower than the hardware based router. It was also more than 100 times faster than the GPP based router, while keeping almost the same degree of flexibility for routing purposes within a NoC. System on Chip NoC Router Packet Switching, Network Processor Application Specific Processor, Multithreaded Processor Performance Evaluation Electronics Elektronik
15	Study and design of a manycore architecture with multithreaded processors for dynamic embedded applications / Etude et mise en œuvre d’une architecture multiprocesseur constituée de ressources de calculs multitâches pour les systèmes embarqués Bechara, Charly 08 December 2011 Embedded systems are becoming more complex and require ever more processing power. They must be able to adapt to the rapid evolution of high-end embedded applications, which are characterized by computation-intensive workloads (on the order of TOPS: tera-operations per second) and a high level of parallelism. Moreover, since the dynamism of these applications is becoming more significant, powerful computing solutions must be designed accordingly. Exploiting this dynamism efficiently balances the load between the computing resources, which greatly improves overall performance. To tackle the challenges of these future high-end, massively parallel, dynamic embedded applications, we have designed the AHDAM architecture, which stands for “Asymmetric Homogeneous with Dynamic Allocator Manycore architecture”. The architecture can process applications with large data sets by using multithreaded processors to efficiently hide the processors' stall time caused by the many accesses to external memory. In addition, it exploits the parallelism of the applications at multiple levels so that they are accelerated efficiently on dedicated resources, improving overall performance. The AHDAM architecture handles the dynamism of these applications by using a central controller to dynamically balance the load between its computing resources and increase their utilization rate. The AHDAM architecture has been evaluated using a relevant embedded application from the telecommunication domain, a software radio application called “spectrum radio-sensing”. With 136 cores running at 500 MHz, the AHDAM architecture reaches a peak performance of 196 GOPS and meets the computation requirements of the application. Multicore; MPSoC; Multithreaded processors; Embedded systems; Dynamic applications; Simulation
17	Vérification à l'exécution de spécifications décentralisées hiérarchiques / Runtime Verification of Hierarchical Decentralized Specifications El hokayem, Antoine 18 December 2018 Runtime Verification (RV) is a lightweight formal method that consists in verifying that a run of a system is correct with respect to a specification. The specification formalizes the expected behavior of the system, typically using logics or finite-state machines. While RV deals comprehensively with monolithic systems, multiple challenges arise when scaling existing approaches to decentralized systems, that is, systems with multiple components and no central observation point. We focus particularly on three challenges: managing partial information, separating monitor deployment from the monitoring process itself, and reasoning about decentralization in a modular and hierarchical way. We present the notion of a decentralized specification, wherein multiple specifications are provided for separate parts of the system. Decentralized specifications provide various advantages, such as modularity and realistic monitor synthesis from complex specifications. We also present a general monitoring algorithm for decentralized specifications, and a general data structure to encode automata execution with partial observations. We develop the THEMIS tool, which provides a platform for designing decentralized monitoring algorithms, metrics for algorithms, simulation to better understand the algorithms, and reproducible experiments. We illustrate the approach with two applications. First, we use decentralized specifications to perform a worst-case analysis, and to adapt, compare, and simulate three existing decentralized monitoring algorithms on both a real example of a user interface (the Chiron graphical user interface) and randomly generated traces and specifications. Second, we use decentralized specifications to check various specifications in a smart apartment: behavioral correctness of the apartment sensors, detection of specific user activities (known as activities of daily living, ADL), and composition of properties of the previous types. Furthermore, we elaborate on utilizing decentralized specifications for the decentralized online monitoring of multithreaded programs. We first expand on the limitations of existing tools and approaches when meeting the challenges introduced by concurrency, and argue that concurrency needs to be taken into account by considering partial orders in traces. We detail the description of such concurrency areas in a single program execution, and provide a general approach that allows existing RV techniques to be reused. In our setting, monitors are deployed within specific threads, and only exchange information upon reaching synchronization regions defined by the program itself. By using the existing synchronization, we reduce the additional overhead and interference caused by synchronizing, at the cost of a delay in determining the verdict. Decentralized monitoring; Decentralized specifications; Runtime verification; Multithreaded programs; Formal methods; Smart homes; 004
18	Analyzing hybrid architectures for massively parallel graph analysis Ediger, David 08 April 2013 The quantity of rich, semi-structured data generated by sensor networks, scientific simulation, business activity, and the Internet grows daily. The objective of this research is to investigate architectural requirements for emerging applications in massive graph analysis. Using emerging hybrid systems, we will map applications to architectures and close the loop between software and hardware design in this application space. Parallel algorithms and specialized machine architectures are necessary to handle the immense size and rate of change of today's graph data. To highlight the impact of this work, we describe a number of relevant application areas ranging from biology to business and cybersecurity. With several proposed architectures for massively parallel graph analysis, we investigate the interplay of hardware, algorithm, data, and programming model through real-world experiments and simulations. We demonstrate techniques for obtaining parallel scaling on multithreaded systems using graph algorithms that are orders of magnitude faster and larger than the state of the art. The outcome of this work is a proposed hybrid architecture for massive-scale analytics that leverages key aspects of data-parallel and highly multithreaded systems. In simulations, the hybrid systems incorporating a mix of multithreaded, shared memory systems and solid state disks performed up to twice as fast as either homogeneous system alone on graphs with as many as 18 trillion edges. Data intensive computing; Computer architectures; Cray XMT; Streaming graph algorithms; Multithreaded graph algorithms; Computer algorithms; Graph algorithms; Parallel algorithms
19	Supporting Selective Formalism in CSP++ with Process-Specific Storage Gumtie, Alicia 14 September 2012 Communicating Sequential Processes (CSP) is a formal language whose primary purpose is to model and verify concurrent systems. The CSP++ toolset was created to embody the concept of selective formalism by making machine-readable CSPm specifications both executable (through the automatic synthesis of C++ source) and extensible (by allowing the integration of C++ user-coded functions). However, these user-coded functions were limited by their inability to share data with each other, which meant that their application was constrained to solving simple problems in isolation. We extend CSP++ by providing user-coded functions in the same CSP process with safe access to a shared storage area, similar in concept and API to Pthreads' thread-local storage, enabling cooperation between them and granting them the ability to undertake more complex tasks without breaking the formalism of the underlying specification. This feature's utility is demonstrated in our line-following robot case study. CSP; CSP++; selective formalism; code generation; formal methods; code synthesis; concurrency; software development; multithreaded applications; thread-local storage; line following; threading; robots
20	A high-performance framework for analyzing massive complex networks Madduri, Kamesh 08 July 2008 Graphs are a fundamental and widely-used abstraction for representing data. We can analytically study interesting aspects of real-world complex systems such as the Internet, social systems, transportation networks, and biological interaction data by modeling them as graphs. Graph-theoretic and combinatorial problems are also pervasive in scientific computing and engineering applications. In this dissertation, we address the problem of analyzing large-scale complex networks that represent interactions between hundreds of thousands to billions of entities. We present SNAP, a new high-performance computational framework for efficiently processing graph-theoretic queries on massive datasets. Graph analysis is computationally very different from traditional scientific computing, and solving massive graph-theoretic problems on current high performance computing systems is challenging for several reasons. First, real-world graphs are often characterized by a low diameter and unbalanced degree distributions, and are difficult to partition on parallel systems. Second, parallel algorithms for solving graph-theoretic problems are typically memory intensive, and the memory accesses are fine-grained and highly irregular. The primary contributions of this dissertation are the design and implementation of novel parallel graph algorithms for traversal, shortest paths, and centrality computations, optimized for the small-world network topology, and high-performance multithreaded architectures and multicore servers. SNAP (Small-world Network Analysis and Partitioning) is a modular, open-source framework for the exploratory analysis and partitioning of large-scale networks.
With SNAP, we demonstrate the capability to process massive graphs with billions of vertices and edges, and achieve up to two orders of magnitude speedup over state-of-the-art network analysis approaches. We also design a new parallel computing benchmark for characterizing the performance of graph-theoretic problems on high-end systems; study data representations for dynamic graph problems on parallel systems; and apply algorithms in SNAP to solve real-world problems in social network analysis and systems biology. Parallel computing; Graph algorithms; Multithreaded algorithms; Complex networks; Graph analysis framework; Graph theory; Data processing; Network analysis (Planning); Combinatorial analysis
