1

Client-side data caching in mobile computing environments /

Xu, Jianliang. January 2002 (has links)
Thesis (Ph. D.)--Hong Kong University of Science and Technology, 2002. / Includes bibliographical references (leaves 146-158). Also available in electronic version. Access restricted to campus users.
2

Spiking Neural Network with Memristive Based Computing-In-Memory Circuits and Architecture

Nowshin, Fabiha January 2021 (has links)
In recent years, neuromorphic computing systems have achieved considerable success due to their ability to process data much faster, and with much less power, than traditional Von Neumann computing architectures. There are two main types of Artificial Neural Networks (ANNs): the Feedforward Neural Network (FNN) and the Recurrent Neural Network (RNN). In this thesis we first study the types of RNNs and then move on to Spiking Neural Networks (SNNs). SNNs are an improved version of ANNs that mimic biological neurons closely through the emission of spikes, which offers significant power and energy advantages for data-intensive applications by allowing spatio-temporal information processing. On the other hand, emerging non-volatile memory (eNVM) technology is key to emulating neurons and synapses for in-memory computation in neuromorphic hardware. One eNVM technology in particular, the memristor, has received wide attention due to its scalability, compatibility with CMOS technology, and low power consumption. In this work we develop a spiking neural network that incorporates an inter-spike interval encoding scheme to convert the incoming input signal to spikes and uses a memristive crossbar to carry out in-memory computing operations. We develop a novel input and output processing engine for our network and demonstrate its spatio-temporal information processing capability. We demonstrate an accuracy of 100% with our design through a small-scale hardware simulation for digit recognition and an accuracy of 87% in software through MNIST simulations. / M.S. / In recent years, neuromorphic computing systems have achieved considerable success due to their ability to process data much faster, and with much less power, than traditional Von Neumann computing architectures. Artificial Neural Networks (ANNs) are models that mimic biological neurons, where artificial neurons, or neurodes, are connected via synapses, similar to the nervous system in the human body. There are two main types of ANNs: the Feedforward Neural Network (FNN) and the Recurrent Neural Network (RNN). In this thesis we first study the types of RNNs and then move on to Spiking Neural Networks (SNNs). SNNs are an improved version of ANNs that mimic biological neurons closely through the emission of spikes, which offers significant power and energy advantages for data-intensive applications by allowing spatio-temporal information processing. On the other hand, emerging non-volatile memory (eNVM) technology is key to emulating neurons and synapses for in-memory computation in neuromorphic hardware. One eNVM technology in particular, the memristor, has received wide attention due to its scalability, compatibility with CMOS technology, and low power consumption. In this work we develop a spiking neural network that incorporates an inter-spike interval encoding scheme to convert the incoming input signal to spikes and uses a memristive crossbar to carry out in-memory computing operations. We demonstrate the accuracy of our design through a small-scale hardware simulation for digit recognition and demonstrate an accuracy of 87% in software through MNIST simulations.
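To make the two building blocks named in the abstract concrete, here is a minimal illustrative sketch, not the thesis's actual design: inter-spike interval (ISI) encoding maps larger input amplitudes to shorter intervals between spikes, and an idealized memristive crossbar computes a matrix-vector product via Ohm's and Kirchhoff's laws. All function names, sizes, and parameter values below are hypothetical.

```python
import numpy as np

def isi_encode(signal, t_min=1.0, t_max=10.0, n_spikes=5):
    """Map each input amplitude to an inter-spike interval: larger
    amplitudes give shorter intervals, i.e. more frequent spikes."""
    lo, hi = signal.min(), signal.max()
    s = (signal - lo) / (hi - lo + 1e-12)        # normalize to [0, 1]
    isi = t_max - s * (t_max - t_min)            # per-channel interval
    return np.outer(isi, np.arange(1, n_spikes + 1))  # spike times per channel

def crossbar_mvm(conductances, voltages):
    """Idealized memristive crossbar: by Ohm's and Kirchhoff's laws the
    current collected on each column is the dot product of the input
    voltage vector with that column's conductances (I = V @ G)."""
    return voltages @ conductances

rng = np.random.default_rng(0)
x = rng.random(8)                                # toy analog input, 8 channels
G = rng.uniform(1e-6, 1e-4, size=(8, 4))         # conductances in siemens
spikes = isi_encode(x)                           # (8, 5) spike-time matrix
currents = crossbar_mvm(G, x)                    # (4,) column currents
print(spikes.shape, currents)
```

In a hardware realization the column currents would be thresholded by spiking neuron circuits; here the dot product alone illustrates why the crossbar performs the computation in memory, with no weight movement.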
3

Enabling Full-Fledged Parallelism on Intermittently Powered Computing

Akhunov, Khakim 24 June 2024 (has links)
Energy-harvesting batteryless devices draw power from ambient sources such as radio waves, sunlight, and vibration. However, the sporadic availability of ambient energy causes frequent power failures, forcing these systems to operate intermittently, and the resulting computation interruptions violate forward progress and memory consistency. State-of-the-art work has proposed several mature approaches to intermittent computing that provide both application termination guarantees and consistent, idempotent results. Some solutions use so-called just-in-time (JIT) checkpoints, where dedicated hardware constantly monitors the available energy and warns the system when the energy level in the energy buffer reaches a critical point, indicating a potential power failure before which the system must back up its architectural state. Other solutions place checkpoints in the program code at compile time, based on the energy consumption of the code executed between checkpoints; a power failure can then occur at any time during execution, and the computation recovers from the most recent checkpoint. Instead of explicitly placing checkpoints, a third set of solutions has software developers split the application into failure-atomic tasks that directly manipulate non-volatile memory. The common requirement in task-based intermittent programming is to keep the energy consumption of each task within the capacity of the energy buffer.

While efficient, these solutions target off-the-shelf single-core ultra-low-power microcontrollers (MCUs) with limited flexibility and performance. Such MCUs are energy-efficient and ideal for low-cost tasks. Contemporary compute- and data-intensive, parallelizable applications, however, demand the execution of high-cost tasks on edge devices, because wirelessly sending large amounts of raw sensor data to offload the intensive tasks to the cloud is too energy-inefficient, especially for energy-harvesting devices. Four critical limitations prevent the use of advanced multicore devices and emerging technologies for the efficient execution of modern applications on ultra-low-power batteryless edge devices. First, in existing systems, programmers must exploit the underlying parallelism manually by interacting directly with low-power accelerators, which is cumbersome. Programmable general-purpose multicore platforms provide the highest degree of flexibility, but the intermittent computing community has overlooked them so far: existing intermittent computing runtimes neither support parallelism nor provide language constructs to express parallelizable code blocks. Second, the availability of energy and the strength of the incoming power affect an intermittent system's cyclical charging and discharging. When the incoming power is strong enough, the device charges rapidly and spends more time on computation; low input power forces the system to spend more time collecting energy than computing. To respond to ambient power dynamics and increase throughput, existing works have proposed workload, accuracy, voltage, frequency, and computational-unit scaling techniques. These solutions, however, operate on a fixed hardware configuration, and the target systems are limited by the performance of a single-core processor without exploiting the available degrees of application parallelism. Third, existing low-power multicore platforms are not designed for intermittent computing. Their internal non-volatile flash memories are unsuitable for intermittent computing because of their high energy requirements, low speed, and limited write endurance. The only way to exploit current low-power multicore platforms for intermittent computing is to introduce an external non-volatile memory, such as FRAM. However, this configuration is very inefficient compared to embedded FRAM: its significant energy overhead makes backup and recovery operations expensive. Finally, using emerging memories such as MRAM as external non-volatile memory allows processing-in-memory (PIM) for data-intensive computations, eliminating unnecessary data movement and enabling data-level parallelism. While inherently idempotent, such in-memory computation is hard to integrate into traditional MCU-based intermittent systems; what is missing is an effective way to maintain data flow and computation in a power-failure-resilient manner.

In this thesis, we tackle these limitations. In Chapter 3, we introduce AdaMICA, an intermittent computing runtime that supports parallel intermittent multicore computing and provides the full flexibility of programmable general-purpose cores. AdaMICA adaptively switches to the best multicore configuration for the dynamic input power, allowing an intermittent system to benefit from workload parallelization, increasing system throughput and decreasing end-to-end delay while respecting energy availability. Chapter 4 presents PEARL, a power- and energy-aware multicore intermittent computing approach that enables, for the first time, the efficient adaptation of common off-the-shelf low-power multicore microcontroller platforms to the intermittent computing paradigm. PEARL features a novel backup policy that significantly reduces the number of accesses to non-volatile memory on multicore platforms, uses multicore power-aware adaptation to adjust the underlying hardware architecture, and exploits energy awareness to transition an intermittent system to an ultra-low-power mode that retains memory content. In Chapter 6, we address an emerging non-volatile memory, CRAM (Computational RAM), presenting PiMCo and LUTIC, novel programmable CRAM-based in-memory coprocessors that facilitate the power-failure-resilient execution of parallelizable computational loads; the coprocessors are pluggable into, and controlled by, a general-purpose MCU via a standard communication protocol. In Chapter 7, we propose Viadotto, a novel adaptive intermittent computing system that bridges the gap between existing MCU-based intermittent systems and the emerging compute-in-memory paradigm. Viadotto introduces a high-level programming model, supported by its compiler, software library, and power-failure-resilient memory controller, that hides the detailed low-level logic operations and data-flow management in CRAM from programmers, and it adapts by controlling data-level parallelism with respect to the ambient power level. In essence, this thesis addresses several pivotal challenges in enabling full-fledged parallelism on ultra-low-power batteryless devices, taking a significant step towards the efficient deployment of modern complex applications on energy-harvesting systems.
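As a concrete illustration of the task-based intermittent-computing model described above (not of AdaMICA, PEARL, or any system from the thesis), the following sketch emulates failure-atomic tasks in Python: a file stands in for non-volatile FRAM/MRAM, a write-then-rename makes each commit atomic, and an emulated power failure simply restarts the in-flight task, which is safe because state is committed only after a task completes. Task names and the failure probability are placeholders.

```python
import os, pickle, random

NV_FILE = "nv_state.pkl"  # hypothetical stand-in for FRAM/MRAM

def nv_load():
    # Recover the last committed state, or start fresh on first boot.
    if os.path.exists(NV_FILE):
        with open(NV_FILE, "rb") as f:
            return pickle.load(f)
    return {"next_task": 0, "acc": 0}

def nv_commit(state):
    # Write-then-rename makes the commit atomic: a failure mid-write
    # leaves the previously committed state intact.
    tmp = NV_FILE + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, NV_FILE)

def task_sense(s):  s["acc"] += 1; return s            # read a (fake) sensor
def task_filter(s): s["acc"] *= 2; return s            # process the sample
def task_send(s):   print("tx:", s["acc"]); return s   # transmit the result

TASKS = [task_sense, task_filter, task_send]

while True:
    state = nv_load()                     # resume from the committed state
    if state["next_task"] >= len(TASKS):
        break                             # application finished
    if random.random() < 0.3:             # emulated power failure:
        continue                          # the in-flight task re-executes
    done = TASKS[state["next_task"]](dict(state))  # work on a volatile copy
    done["next_task"] += 1
    nv_commit(done)                       # commit only after the task ends
```

Because each task reads only committed state and its effects become visible only at commit, re-executing an interrupted task is idempotent; this is exactly the property that lets the real runtimes keep the energy cost of each task within the energy buffer and still guarantee forward progress.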
4

An initial operating system adaptation heuristic for Swap Cluster Max (SCM)

Somanathan, Muthuveer, January 2008 (has links)
Thesis (M.S.)--University of Texas at El Paso, 2008. / Title from title screen. Vita. CD-ROM. Includes bibliographical references. Also available online.
5

Parallel distributed-memory particle methods for acquisition-rate segmentation and uncertainty quantifications of large fluorescence microscopy images

Afshar, Yaser 17 October 2016 (has links)
Modern fluorescence microscopy modalities, such as light-sheet microscopy, are capable of acquiring large three-dimensional images at high data rates. This creates a bottleneck in the computational processing and analysis of the acquired images, as the rate of acquisition outpaces the speed of processing. Moreover, images can be so large that they do not fit in the main memory of a single computer. Another issue is the information loss during image acquisition due to limitations of the optical imaging system. Analysis of the acquired images may therefore find multiple solutions (or no solution) due to imaging noise, blurring, and other uncertainties introduced during acquisition. In this thesis, we address the processing-time and memory issues by developing a distributed parallel algorithm for the segmentation of large fluorescence-microscopy images. The method is based on the versatile Discrete Region Competition algorithm (Cardinale et al., 2012), which has previously proven useful in microscopy image segmentation. The present distributed implementation decomposes the input image into smaller sub-images that are distributed across multiple computers. Using network communication, the computers orchestrate the collective solving of the global segmentation problem. This not only enables segmentation of large images (we test images of up to 10^10 pixels) but also accelerates segmentation to match the time scale of image acquisition. Such acquisition-rate image segmentation is a prerequisite for the smart microscopes of the future and enables online data inspection and interactive experiments. Second, we estimate the segmentation uncertainty on large images that do not fit in the main memory of a single computer. We therefore develop a distributed parallel algorithm for efficient Markov-chain Monte Carlo Discrete Region Sampling (Cardinale, 2013). The parallel algorithm provides a measure of segmentation uncertainty in a statistically unbiased way, approximating the posterior probability densities over the high-dimensional space of segmentations around the previously found segmentation. / Modern fluorescence microscopy modalities, such as light-sheet microscopy, allow the acquisition of high-resolution, three-dimensional images. This leads to a bottleneck in the processing and analysis of the acquired images, since the acquisition rate exceeds the data-processing rate. In addition, these images can be so large that they exceed the memory capacity of a single computer. On top of this comes the information loss during image acquisition that results from the limitations of the optical imaging system. Image noise, blur, and other measurement uncertainties can cause analysis algorithms to find several solutions, or none at all, for an image-processing task. In this thesis, we develop a distributed, parallel algorithm for the segmentation of memory-intensive fluorescence-microscopy images. The method is based on the versatile Discrete Region Competition algorithm (Cardinale et al., 2012), which has already proven useful for the segmentation of microscopy images in other applications. The procedure presented here divides the input image into smaller sub-images, which are distributed across the memories of several computers. The global segmentation problem is coordinated through network communication. This permits the segmentation of very large images, and we demonstrate the application of the algorithm to images with up to 10^10 pixels. In addition, the segmentation speed is increased to become comparable to the acquisition rate of the microscope. This is a basic prerequisite for the smart microscopes of the future, and it enables online inspection of the acquired data as well as interactive experiments. We determine the uncertainty of the segmentation algorithm when it is applied to images whose size exceeds the memory of a single computer. To this end, we develop a distributed, parallel algorithm for efficient Markov-chain Monte Carlo Discrete Region Sampling (Cardinale, 2013). This algorithm quantifies the segmentation uncertainty in a statistically unbiased manner by approximating the posterior probability density over the high-dimensional space of segmentations in the neighborhood of the previously found segmentation.
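To illustrate the decomposition idea, not the thesis's actual implementation of Discrete Region Competition, the following mpi4py sketch splits an image into row slabs across ranks and exchanges one-row halos with neighbors. This is the communication pattern a distributed segmentation needs so that a purely local update near a sub-image boundary sees the same data it would on the undivided image. The image size and the 3-point filter are placeholder assumptions.

```python
# Run with e.g.: mpiexec -n 4 python halo_sketch.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Each rank owns a horizontal slab of a toy 64 x 32 image.
local = np.random.default_rng(rank).random((64 // size, 32))

up   = rank - 1 if rank > 0 else MPI.PROC_NULL
down = rank + 1 if rank < size - 1 else MPI.PROC_NULL

# Send my last row down and receive the row just above my slab;
# send my first row up and receive the row just below my slab.
halo_top = comm.sendrecv(local[-1], dest=down, source=up)
halo_bot = comm.sendrecv(local[0], dest=up, source=down)
if halo_top is None: halo_top = local[0]    # replicate at the image border
if halo_bot is None: halo_bot = local[-1]

# With consistent halos, a local 3-point mean along rows matches the
# result that a single machine would compute on the whole image.
padded = np.vstack([halo_top[None, :], local, halo_bot[None, :]])
smoothed = (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0
print(f"rank {rank}: slab {local.shape} -> smoothed {smoothed.shape}")
```

A real region-competition step would exchange label and contour information rather than raw pixels, but the owner-computes slab layout and neighbor exchange shown here are what let the segmentation scale to images that exceed a single computer's memory.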
