1 |
Towards the Inference, Understanding, and Reasoning on Edge Devices. Ma, Guoqing. 10 May 2023.
This thesis explores the potential of edge devices in three applications: indoor localization, urban traffic prediction, and multi-modal representation learning. For indoor localization, we propose a reliable data transmission network and a robust data processing framework that combine visible light communications with machine learning to enhance the intelligence of smart buildings. For urban traffic prediction, we propose a deep network enhanced with dynamic spatial and temporal origin-destination features that uses a graph convolutional network to learn a low-dimensional representation for each city region and predict its in-traffic and out-traffic simultaneously. For multi-modal representation learning, we propose using dynamic contexts to uniformly model visual and linguistic causalities, introducing a novel dynamic-contexts-based similarity metric that measures the relevance among images by the correlation of their potential causes and effects.
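To make the graph-convolutional component concrete, the sketch below shows a generic two-layer GCN that propagates region features over a normalized region-adjacency matrix to learn a low-dimensional embedding per region, then regresses in-traffic and out-traffic jointly. All names and dimensions are hypothetical; the thesis's dynamic origin-destination feature enhancement is not reproduced here.

import torch
import torch.nn as nn

class TrafficGCN(nn.Module):
    def __init__(self, in_dim, hidden_dim=64):
        super().__init__()
        self.w1 = nn.Linear(in_dim, hidden_dim)
        self.w2 = nn.Linear(hidden_dim, hidden_dim)
        self.head = nn.Linear(hidden_dim, 2)  # [in_traffic, out_traffic]

    def forward(self, x, a_norm):
        # x: (regions, in_dim) region features; a_norm: normalized adjacency,
        # e.g. D^-1/2 (A + I) D^-1/2, shape (regions, regions).
        h = torch.relu(a_norm @ self.w1(x))
        h = torch.relu(a_norm @ self.w2(h))   # low-dimensional region embedding
        return self.head(h)                   # per-region flow predictions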
To enhance distributed training on edge devices, we introduce a new system called Distributed Artificial Intelligence Over-the-Air (AirDAI), in which clients train locally on their raw data and send only trained outputs, such as model parameters, back to a central server for aggregation. To aid the development of AirDAI in wireless communication networks, we propose a general system design and an accompanying simulator that can be tailored to specific wireless channels and system-level configurations. Experiments confirm the effectiveness and efficiency of the proposed design, and an analysis of the effects of wireless environments is presented to facilitate future implementations and updates.
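The aggregation step can be pictured with a toy simulation in the spirit of an over-the-air system: clients transmit their locally trained parameters simultaneously, the channel superposes the signals, and the server divides the noisy sum by the number of clients to estimate the average. The AWGN channel model and all names below are illustrative assumptions, not the thesis's actual simulator or protocol.

import numpy as np

rng = np.random.default_rng(0)
num_clients, model_dim = 10, 1000

# Stand-ins for locally trained model parameters on each client.
local_params = [rng.normal(size=model_dim) for _ in range(num_clients)]

def over_the_air_aggregate(params, snr_db=20.0):
    # All clients transmit at once; the channel sums their signals.
    superposed = np.sum(params, axis=0)
    signal_power = np.mean(superposed ** 2)
    noise_power = signal_power / (10 ** (snr_db / 10.0))
    noise = rng.normal(scale=np.sqrt(noise_power), size=superposed.shape)
    # Server-side estimate of the federated average from the noisy sum.
    return (superposed + noise) / len(params)

global_params = over_the_air_aggregate(local_params)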
Finally, this thesis proposes FedForest to address the communication and computation limitations of heterogeneous edge networks. FedForest optimizes the global network by distilling knowledge from aggregated sub-networks; the sub-network sampling process is differentiable, and model size is used as an additional constraint to extract a new sub-network for the subsequent local optimization process. FedForest significantly reduces server-to-client communication and local device computation costs compared to conventional algorithms while maintaining the performance of the benchmark Top-K sparsification method. It can thus accelerate the deployment of large-scale deep learning models on edge devices.
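For reference, the Top-K sparsification baseline mentioned above keeps only the largest-magnitude entries of each model update before transmission. A minimal sketch of that baseline (not of FedForest itself, whose differentiable sub-network sampling is more involved):

import torch

def top_k_sparsify(update: torch.Tensor, k_ratio: float = 0.01) -> torch.Tensor:
    # Keep the k largest-magnitude entries of the update; zero the rest.
    flat = update.flatten()
    k = max(1, int(k_ratio * flat.numel()))
    _, idx = torch.topk(flat.abs(), k)
    sparse = torch.zeros_like(flat)
    sparse[idx] = flat[idx]
    return sparse.view_as(update)

Only the k surviving values and their indices need to be sent to the server, which is what makes the method a standard communication-cost baseline.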
2 |
A Study on Fault-tolerance of Deep Neural Networks for Embedded Systems. Malekzadeh, Elaheh. January 2021.
Deep learning is replacing many traditional data processing methods in computer vision, speech recognition, natural language processing, and many more diverse end applications. Until only a few years ago, using deep learning networks for inference required large amounts of computational resources such as memory, processing power, and energy, and deploying computationally expensive deep neural networks on embedded devices with limited capabilities was far from trivial. In recent years, however, deep learning has been finding its way into the world of embedded devices. Embedded systems such as Internet of Things (IoT) devices, phones, and even components in cars are being equipped with deep neural networks. This raises interesting challenges for both embedded designers and deep learning scientists in closing the gap between the two domains. This thesis discusses some of the challenges involved in deploying deep learning on embedded systems, as well as some of the available solutions and frameworks. Moreover, focusing on the safety and fault-tolerance aspects of embedded systems, the tolerance of deep neural networks against faults was investigated using an experiment-based research strategy. A fault injection framework was designed and implemented that targets deep neural networks defined in PyTorch. The framework was used to perform fault injection experiments on a small deep learning network. The experiments showed that the impact of a fault on network accuracy varies with the type of layer targeted. Worst-case faults were identified, and several architectural modifications to the network were examined to improve the fault tolerance of the neural network under study. / Deep learning is replacing many traditional data processing methods in computer vision, speech recognition, natural language processing, and several other end applications. Until a few years ago, using deep learning networks for inference required large amounts of computational resources such as memory, processing power, and energy. It is not trivial to deploy computationally expensive deep neural networks on embedded devices with limited capacity. In recent years, however, deep learning has reached the world of embedded devices. Embedded systems such as Internet of Things (IoT) devices, phones, and even components in cars are being equipped with deep neural networks. This brings interesting challenges for both embedded-system designers and deep learning researchers in closing the gap between these two domains. This thesis discusses some of the challenges of deploying deep learning on embedded systems, as well as available solutions and frameworks. The fault tolerance of deep neural networks is investigated. A fault injection framework was created targeting deep neural networks built with PyTorch. The framework was then used to perform fault injection experiments on a small deep learning network. The experiments showed how fault injections affected the network's accuracy in different ways depending on which layer of the network was targeted. The worst kind of fault injection was identified, and various architectural modifications of the deep neural network were examined to improve its fault tolerance.
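As an illustration of the kind of fault such a framework can inject, the sketch below flips a single bit in one float32 weight of a PyTorch model; the interface and names are hypothetical, not the thesis's actual framework.

import struct
import torch

def flip_bit(value: float, bit: int) -> float:
    # Reinterpret the float32 as an integer, XOR one bit, reinterpret back.
    packed = struct.unpack('<I', struct.pack('<f', value))[0]
    return struct.unpack('<f', struct.pack('<I', packed ^ (1 << bit)))[0]

@torch.no_grad()
def inject_weight_fault(model: torch.nn.Module, layer_name: str,
                        index: tuple, bit: int) -> None:
    # Corrupt one weight of the named layer in place.
    weight = dict(model.named_parameters())[layer_name + '.weight']
    weight[index] = flip_bit(float(weight[index]), bit)

For a model with a layer registered as 'fc1', inject_weight_fault(model, 'fc1', (0, 0), bit=30) would flip the most significant exponent bit of one weight, the kind of high-impact fault such experiments tend to surface as worst cases.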
3 |
From Edge Computing to Edge Intelligence: exploring novel design approaches to intelligent IoT applications. Antonini, Mattia. 11 June 2021.
The Internet of Things (IoT) has deeply changed how we interact with our world. Today, smart homes, self-driving cars, connected industries, and wearables are just a few mainstream applications in which IoT plays the role of enabling technology. When IoT became popular, Cloud Computing was already a mature technology able to deliver the computing resources needed to execute heavy tasks (e.g., data analytics, storage, and AI) on data coming from IoT devices, so practitioners designed and implemented their applications around this approach. After a few years of hype, however, cloud-centric approaches began to show their main limitations in connecting many devices to remote endpoints: high latency, bandwidth usage, big data volumes, reliability, privacy, and so on. At the same time, a few new distributed computing paradigms emerged and gained attention. Among them, Edge Computing shifts the execution of applications to the edge of the network (the partition of the network physically close to data sources) and improves on the Cloud Computing paradigm. Its success has been fostered by new powerful embedded computing devices able to satisfy the ever-increasing computing requirements of many IoT applications. Given this context, how can next-generation IoT applications take advantage of the opportunity offered by Edge Computing to shift processing from the cloud toward the data sources and exploit ever-more-powerful devices? This thesis provides the ingredients and guidelines for practitioners to foster the migration from cloud-centric to novel distributed design approaches for IoT applications at the edge of the network, addressing the issues of the cloud-centric approach. This requires designing the processing pipeline of applications around the system requirements and the constraints imposed by embedded devices. To make this process smoother, the transition is split into steps: first offloading the processing (including the Artificial Intelligence algorithms) to the edge of the network, then distributing the computation across multiple edge devices and even closer to the data sources based on system constraints, and finally optimizing the processing pipeline and AI models to run efficiently on the target IoT edge devices. Each step has been validated by delivering a real-world IoT application that fully exploits the novel approach. This paradigm shift leads the way toward the design of Edge Intelligence IoT applications that efficiently and reliably execute Artificial Intelligence models at the edge of the network.
4 |
Efficient Processing of Convolutional Neural Networks on the Edge: A Hybrid Approach Using Hardware Acceleration and Dual-Teacher Compression. Alhussain, Azzam. 01 January 2024.
This dissertation addresses the challenge of accelerating Convolutional Neural Networks (CNNs) for edge computing in computer vision applications by developing specialized hardware solutions that maintain high accuracy and perform real-time inference. Building on open-source hardware design frameworks such as FINN and HLS4ML, the research focuses on hardware acceleration, model compression, and efficient implementation of CNN algorithms on AMD SoC-FPGAs using High-Level Synthesis (HLS) to optimize resource utilization and improve the throughput-per-watt of FPGA-based AI accelerators over traditional fixed-logic chips such as CPUs, GPUs, and other edge accelerators. The dissertation introduces a novel CNN compression technique, "Two-Teachers Net," which uses PyTorch FX graph mode to train an 8-bit quantized student model via knowledge distillation from two teacher models, improving the accuracy of the compressed model by 1-2% over existing solutions for edge platforms. The method can be applied to any CNN model and image classification dataset and integrates seamlessly into existing AI hardware and software optimization toolchains, including Vitis-AI, OpenVINO, TensorRT, and ONNX, without architectural adjustments. This provides a scalable solution for deploying high-accuracy CNNs on low-power edge devices across applications such as autonomous vehicles, surveillance systems, robotics, healthcare, and smart cities.
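The dual-teacher distillation idea can be sketched as a loss that blends cross-entropy on the labels with temperature-softened KL terms from both teachers. The temperature and weighting below are illustrative assumptions, not the exact "Two-Teachers Net" formulation.

import torch
import torch.nn.functional as F

def dual_teacher_kd_loss(student_logits, t1_logits, t2_logits, labels,
                         T=4.0, alpha=0.7, beta=0.5):
    # Supervised term on the ground-truth labels.
    ce = F.cross_entropy(student_logits, labels)
    # Soft targets from each teacher at temperature T (Hinton-style KD).
    log_p_s = F.log_softmax(student_logits / T, dim=1)
    kd1 = F.kl_div(log_p_s, F.softmax(t1_logits / T, dim=1),
                   reduction='batchmean') * T * T
    kd2 = F.kl_div(log_p_s, F.softmax(t2_logits / T, dim=1),
                   reduction='batchmean') * T * T
    # beta balances the two teachers; alpha balances KD against CE.
    return (1 - alpha) * ce + alpha * (beta * kd1 + (1 - beta) * kd2)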
5 |
Exploration and Evaluation of RNN Models on Low-Resource Embedded Devices for Human Activity Recognition / Undersökning och utvärdering av RNN-modeller på resurssvaga inbyggda system för mänsklig aktivitetsigenkänning. Björnsson, Helgi Hrafn; Kaldal, Jón. January 2023.
Human activity data is typically represented as time series, and RNNs, often with LSTM cells, are commonly used for recognition in this field. However, RNNs and LSTM-RNNs are often too resource-intensive for real-time applications on resource-constrained devices. This thesis project was carried out at Wrlds AB, Stockholm. At Wrlds, all machine learning runs in the cloud, but the company has been working to run its AI algorithms on its embedded devices. The main task of this project was to investigate alternative network structures that minimize the size of the networks applied to human activity data. This thesis investigates FastGRNN, a deep learning algorithm developed by Microsoft researchers, for classifying human activity on resource-constrained devices. The FastGRNN algorithm was compared with the state-of-the-art RNN architectures LSTM, GRU, and simple RNN in terms of accuracy, classification time, memory usage, and energy consumption. The research is limited to implementing the FastGRNN algorithm on Nordic SoCs using their SDK and TensorFlow Lite Micro. The results show that the proposed network achieves accuracy similar to LSTM networks while being considerably smaller and faster, making it a promising solution for human activity recognition on embedded devices with limited computational resources, and one that merits further investigation. / Human activity recognition data is usually represented as time series, where an RNN model with an LSTM architecture is most often the obvious way to go. However, this architecture is very resource-demanding for real-time applications, which causes problems on resource-constrained hardware. This thesis project was carried out in collaboration with Wrlds Technologies AB. At Wrlds, machine learning models run in the cloud and locally on mobile phones, and Wrlds has now set out to run models directly on small embedded systems. The thesis evaluates FastGRNN, a neural network architecture developed by Microsoft for use on resource-constrained hardware. The FastGRNN algorithm was compared with other state-of-the-art architectures: LSTM, GRU, and a simple RNN. Accuracy, classification time, memory usage, and energy consumption were used to compare the variants. The work evaluates the FastGRNN algorithm only on Nordic SoCs, using their SDK and TensorFlow Lite Micro. The results show that the evaluated network performs similarly to an LSTM network while being considerably smaller and therefore faster, making FastGRNN promising for human activity recognition on embedded systems with limited compute capacity.
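For context, the FastGRNN cell (Kusupati et al., NeurIPS 2018) is compact because its update gate and candidate state share the same input and hidden weight matrices, with two trained scalars blending the old and new state. A minimal PyTorch sketch of the published update equations (initialization values are illustrative):

import torch
import torch.nn as nn

class FastGRNNCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super().__init__()
        # W and U are shared by the gate and the candidate state,
        # which is what makes the cell smaller than a GRU or LSTM.
        self.W = nn.Linear(input_size, hidden_size, bias=False)
        self.U = nn.Linear(hidden_size, hidden_size, bias=False)
        self.b_z = nn.Parameter(torch.zeros(hidden_size))
        self.b_h = nn.Parameter(torch.zeros(hidden_size))
        self.zeta = nn.Parameter(torch.tensor(1.0))  # trained scalar
        self.nu = nn.Parameter(torch.tensor(-4.0))   # trained scalar

    def forward(self, x_t, h_prev):
        pre = self.W(x_t) + self.U(h_prev)        # shared pre-activation
        z = torch.sigmoid(pre + self.b_z)          # update gate
        h_tilde = torch.tanh(pre + self.b_h)       # candidate state
        zeta = torch.sigmoid(self.zeta)
        nu = torch.sigmoid(self.nu)
        # h_t = (zeta * (1 - z) + nu) * h_tilde + z * h_prev
        return (zeta * (1.0 - z) + nu) * h_tilde + z * h_prev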
6 |
Lite-Agro: Integrating Federated Learning and TinyML on IoAT-Edge for Plant Disease Classification. Dockendorf, Catherine April. 05 1900.
Lite-Agro studies applications of TinyML to pear (Pyrus communis) tree disease identification and explores a hardware implementation on an ESP32 microcontroller. The study uses the DiaMOS Pear dataset to learn from leaf images whether a leaf is healthy, classifying each leaf into the curl, healthy, spot, or slug categories. The system is designed as a low-cost, light-duty edge detection solution that compares models such as InceptionV3, Xception, EfficientNetB0, and MobileNetV2. The work also investigates integration with federated learning frameworks and provides an introduction to federated averaging algorithms.
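As background for that last point, federated averaging (FedAvg, McMahan et al., 2017) combines client models weighted by local dataset size. A generic NumPy sketch, not Lite-Agro's actual code:

import numpy as np

def federated_average(client_weights, client_sizes):
    # Weighted average of client parameter vectors, one weight per client,
    # proportional to how much local data each client trained on.
    total = sum(client_sizes)
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

# Three toy clients with flattened model parameters:
clients = [np.ones(5), 2 * np.ones(5), 3 * np.ones(5)]
sizes = [100, 50, 50]
global_model = federated_average(clients, sizes)  # every entry is 1.75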
7 |
An Intelligent UAV Platform For Multi-Agent Systems. Taashi Kapoor. 21 April 2022.
This thesis presents work and simulations on the use of Artificial Intelligence for real-time perception and real-time anomaly detection using the computer and sensors onboard an Unmanned Aerial Vehicle (UAV). One goal of this research is to develop a highly accurate, high-performance computer vision system that can serve as a framework for object detection, obstacle avoidance, motion estimation, 3D reconstruction, and vision-based GPS-denied path planning. The method developed and presented in this thesis integrates software and hardware techniques to reach optimal performance for real-time operations.
This thesis also presents a solution for real-time anomaly detection using neural networks to further the safety and reliability of UAV operations. Real-time telemetry data from different sensors are used to predict failures before they occur. Together, these systems form the framework behind the Intelligent UAV platform, which can be rapidly adapted to different use cases thanks to its modular design and onboard suite of sensors.
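One common way to realize this kind of telemetry-based failure prediction is to score windows of sensor data by the reconstruction error of an autoencoder trained only on nominal flights; the sketch below assumes a Keras-style model object and hypothetical names, not the thesis's actual architecture.

import numpy as np

def anomaly_scores(autoencoder, windows):
    # Mean squared reconstruction error per telemetry window; a window
    # the model cannot reconstruct well departs from nominal behavior.
    recon = autoencoder.predict(windows)  # assumed Keras-style API
    err = (windows - recon) ** 2
    return err.reshape(len(windows), -1).mean(axis=1)

def flag_anomalies(scores, threshold):
    # Threshold is typically a high percentile of scores on nominal data.
    return scores > threshold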
8 |
ENERGY EFFICIENT EDGE INFERENCE SYSTEMS. Soumendu Kumar Ghosh. 07 August 2023.
Deep Learning (DL)-based edge intelligence has garnered significant attention in recent years due to the rapid proliferation of the Internet of Things (IoT), embedded, and intelligent systems, collectively termed edge devices. Sensor data streams acquired by these edge devices are processed by a Deep Neural Network (DNN) application that runs on the device itself or in the cloud. However, the high computational complexity and energy consumption of DNNs often limit their deployment on these edge inference systems, which have limited compute, memory, and energy resources. Furthermore, high costs, strict application latency demands, data privacy and security constraints, and the absence of reliable edge-cloud network connectivity heavily impact edge application efficiency in the case of cloud-assisted DNN inference. Performance and energy efficiency, alongside application accuracy, are therefore of utmost importance in these edge inference systems. To facilitate energy-efficient edge inference systems running computationally complex DNNs, this dissertation makes three key contributions.

The first contribution adopts a full-system approach to Approximate Computing, a design paradigm that trades a small degradation in application quality for significant energy savings. Within this context, we present the foundational concepts of AxIS, the first approximate edge inference system that jointly optimizes its constituent subsystems, yielding substantial energy benefits compared to optimizing each subsystem individually. To illustrate the efficacy of this approach, we demonstrate multiple versions of an approximate smart camera system that executes various DNN-based unimodal computer vision applications, showcasing how the sensor, memory, compute, and communication subsystems can all be synergistically approximated for energy-efficient edge inference.

Building on this foundation, the second contribution extends AxIS to multimodal AI, harnessing data from multiple sensor modalities to impart human-like cognitive and perceptual abilities to edge devices. By exploring optimization techniques across sensor modalities and subsystems, this research reveals the impact of synergistic modality-aware optimizations on system-level accuracy-efficiency (AE) trade-offs, culminating in SysteMMX, the first AE-scalable cognitive system that enables efficient multimodal inference at the edge. To illustrate the practicality and effectiveness of this approach, we present an in-depth case study of a multimodal system that leverages RGB and depth sensor modalities for image segmentation tasks.

The final contribution focuses on optimizing the performance of an edge-cloud collaborative inference system through intelligent DNN partitioning and computation offloading. We delve into distributed inference across edge devices and cloud servers, unveiling the challenges of finding the partitioning point in a DNN that yields significant inference latency speedup. To address these challenges, we introduce PArtNNer, a platform-agnostic and adaptive DNN partitioning framework that dynamically adapts to changes in communication bandwidth and cloud server load. Unlike existing approaches, PArtNNer does not require pre-characterization of the underlying edge computing platforms, making it a versatile and efficient solution for real-world edge-cloud scenarios.
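To see the optimization underneath, consider the static version of the partitioning problem: given per-layer latency profiles and activation sizes, choose the split point that minimizes edge compute plus upload plus cloud compute time. The sketch below is this textbook formulation under assumed profiling data; PArtNNer's contribution is precisely that it adapts online without such pre-characterization, so the code is illustrative only.

def best_partition_point(edge_latency, cloud_latency, upload_bytes,
                         bandwidth_bps):
    # Layers [0, k) run on the edge, layers [k, n) in the cloud;
    # upload_bytes[k] is the activation size sent when splitting at k.
    n = len(edge_latency)
    best_k, best_cost = 0, float('inf')
    for k in range(n + 1):
        transfer = upload_bytes[k] / bandwidth_bps if k < n else 0.0
        cost = sum(edge_latency[:k]) + transfer + sum(cloud_latency[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k, best_cost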

Overall, this thesis provides novel insights, innovative techniques, and intelligent solutions to enable energy-efficient AI at the edge. The contributions presented herein serve as a solid foundation for future researchers to build upon, driving innovation and shaping the trajectory of research in edge AI.