Global ETD Search

101	Road Segmentation and Optimal Route Prediction using Deep Neural Networks and Graphs / Vägsegmentering och förutsägelse av optimala rutter genom djupa neurala nätverk och grafer Ossmark, Viktor January 2021 Observing the earth from above is a great way of understanding our world better. From space, many complex patterns and relationships on the ground can be identified through high-quality satellite data. The quality and availability of this data, in combination with recent advances in deep learning techniques, allow us to find these patterns more effectively than ever. In this thesis, we analyze satellite imagery using deep neural networks in an attempt to find road networks in different cities around the world. Once we have located the road networks in the cities, we represent them as graphs and apply Dijkstra's shortest-path algorithm to find optimal routes within these networks. The ability to use satellite imagery efficiently for near real-time road detection and optimal route prediction has many possible applications, especially from a humanitarian and commercial point of view. For example, in the humanitarian realm, the frequency of natural disasters is unfortunately increasing due to climate change, and the need for real-time emergency mapping for relief organisations in the case of a severe flood or similar event is growing. The state-of-the-art deep neural network models that are implemented, compared and contrasted for this task are mainly based on the U-net and ResNet architectures. Before introducing these architectures, however, the reader is given a comprehensive introduction to the theoretical background of deep neural networks in order to lay out the mathematical groundwork clearly. The final results demonstrate overall strong model performance across different metrics and data sets, with the highest obtained IoU score being approximately 0.7 for the segmentation task. 
For some models we can also see a high degree of similarity between the predicted optimal paths and the ground truth optimal paths. / Att betrakta jorden från ovan är ett bra tillvägagångsätt för att förstå vår egen värld bättre. Från rymden, många komplexa mönster och samband på marken går att urskilja genom hög-upplöst satellitdata. Kvalitén och tillgängligheten av denna data, i kombination med de senaste framstegen inom djupa inlärningstekniker, möjliggör oss att hissa dessa mönster mer effektivt än någonsin. I denna avhandling kommer vi analysera satellitbilder med hjälp av djupa neurala nätverk i ett försök att hitta nätverk av vägar i olika städer runtom i världen. Efter vi har lokaliserat dessa nätverk av vägar så kommer vi att representera nätverken som grafer och använda oss av Dijkstras algoritm för att hitta optimala rutter inom dessa nätverk. Att ha förmågan att kunna effektivt använda sig av satellitbilder för att i nära realtid kunna identifiera vägar och optimala rutter har många möjliga applikationer. Speciellt ur ett humant och kommersiellt perspektiv. Exempelvis, inom det humanitära området, så ökar dessvärre frekvensen av naturkatastrofer på grund av klimatförändringar och därmed är behovet av nödkartläggning i realtid för hjälporganisationer större än någonsin. En effektiv nödkartläggning skulle exempelvis kunna underlätta enormt vid en allvarlig översvämning eller dylikt. Dem toppmoderna djupa neurala nätverksmodellerna som kommer implementeras, jämföras och nyanseras för denna uppgift är i huvudsak baserad på U-net och ResNet arkitekturerna. Innan vi presenterar dessa arkitekturer i denna avhandling så kommer läsaren att få en omfattande teoretisk bakgrund till djupa neurala nätverk för att tydligt formulera dem matematiska grundpelarna. Dem slutgiltiga resultaten visar övergripande stark prestanda för samtliga av våra modeller. Både på olika datauppsättningar samt utvärderingsmått. 
Den högste IoU poängen som uppnås är cirka 0,7 och vi kan även se en hög grad av likhet mellan vissa av våra förutsagda optimala rutter och mark sanningens optimala rutter. statistics applied mathematics machine learning computer vision deep learning satellite data remote sensing image segmentation optimal routing deep neural networks graphs AI statistik tillämpad matematik maskininlärning djup inlärning datorseende satellitdata bildsegmentering optimala rutter grafer AI Other Mathematics Annan matematik
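The route-finding step described in entry 101 (road network as a graph, then Dijkstra's shortest-path algorithm) can be sketched roughly as follows. This is a minimal illustration, not the thesis's implementation: the dictionary-of-dictionaries graph layout, the function name `dijkstra`, and the toy road network are all assumptions made for the example.

```python
import heapq


def dijkstra(graph, source, target):
    """Return (cost, path) of the cheapest route from source to target.

    `graph` maps each node to a dict {neighbour: non-negative edge weight}.
    This layout is an assumption for the sketch, not taken from the thesis.
    """
    dist = {source: 0.0}
    prev = {}
    queue = [(0.0, source)]
    visited = set()
    while queue:
        d, node = heapq.heappop(queue)
        if node in visited:
            continue  # stale queue entry, a shorter route was already settled
        visited.add(node)
        if node == target:
            break
        for neighbour, weight in graph.get(node, {}).items():
            nd = d + weight
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                prev[neighbour] = node
                heapq.heappush(queue, (nd, neighbour))
    if target not in dist:
        return float("inf"), []  # target is unreachable
    # Walk the predecessor links back from the target to rebuild the route.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Hypothetical toy road network: nodes are junctions, weights are road lengths.
roads = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"A": 1.0, "C": 2.0, "D": 5.0},
    "C": {"A": 4.0, "B": 2.0, "D": 1.0},
    "D": {"B": 5.0, "C": 1.0},
}
cost, route = dijkstra(roads, "A", "D")
```

In a segmentation pipeline, the graph would presumably be extracted from the predicted road mask first; that extraction step is outside the scope of this sketch.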
102	Basil-GAN / Basilika-GAN Risberg, Jonatan January 2022 Developments in computer vision have sought to design deep neural networks which, trained on a large set of images, are able to generate high-quality artificial images that share semantic qualities with the original image set. A pivotal shift was made with the introduction of the generative adversarial network (GAN) by Goodfellow et al. Building on the work by Goodfellow, more advanced models using the same idea have shown great improvements in terms of both image quality and data diversity. GAN models generate images by feeding samples from a vector space into a generative neural network. The structure of these so-called latent vector samples has been shown to correspond to semantic similarities of their corresponding generated images. In this thesis the DCGAN model is trained on a novel data set consisting of image sequences of the growth process of basil plants from germination to harvest. We evaluate the trained model by comparing the DCGAN performance on benchmark data sets such as MNIST and CIFAR10 and conclude that the model trained on the basil plant data set achieved similar results compared to the MNIST data set and better results in comparison to the CIFAR10 data set. To argue for the potential of using more advanced GAN models, we compare the results from the DCGAN model with the contemporary StyleGAN2 model. We also investigate the latent vector space produced by the DCGAN model and confirm, in accordance with previous research, that the DCGAN model is able to generate a latent space with data-specific semantic structures. For the DCGAN model trained on the data set of basil plants, the latent space is able to distinguish images of early-stage basil plants from late-stage plants in the growth phase. Furthermore, utilizing the sequential semantics of the basil plant data set, an attempt at generating an artificial growth sequence is made using linear interpolation. 
Finally we present an unsuccessful attempt at visualising the latent space produced by the DCGAN model using a rudimentary approach at inverting the generator network function. / Utvecklingen inom datorseende har syftat till att utforma djupa neurala nätverk som tränas på en stor mängd bilder och kan generera konstgjorda bilder av hög kvalitet med samma semantiska egenskaper som de ursprungliga bilderna. Ett avgörande skifte skedde när Goodfellow et al. introducerade det generativa adversariella nätverket (GAN). Med utgångspunkt i Goodfellows arbete har flera mer avancerade modeller som använder samma idé uppvisat stora förbättringar när det gäller både bildkvalitet och datamångfald. GAN-modeller genererar bilder genom att mata in vektorer från ett vektorrum till ett generativt neuralt nätverk. Strukturen hos dessa så kallade latenta vektorer visar sig motsvara semantiska likheter mellan motsvarande genererade bilder. I detta examensarbete har DCGAN-modellen tränats på en ny datamängd som består av bildsekvenser av basilikaplantors tillväxtprocess från groning till skörd. Vi utvärderar den tränade modellen genom att jämföra DCGAN-modellen mot referensdataset som MNIST och CIFAR10 och drar slutsatsen att DCGAN tränad på datasetet för basilikaväxter uppnår liknande resultat jämfört med MNIST-dataset och bättre resultat jämfört med CIFAR10-datasetet. För att påvisa potentialen av att använda mer avancerade GAN-modeller jämförs resultaten från DCGAN-modellen med den mer avancerade StyleGAN2-modellen. Vi undersöker också det latenta vektorrum som produceras av DCGAN-modellen och bekräftar att DCGAN-modellen i enlighet med tidigare forskning kan generera ett latent rum med dataspecifika semantiska strukturer. För DCGAN-modellen som tränats på datamängden med basilikaplantor lyckas det latenta rummet skilja mellan bilder av basilikaplantor i tidiga stadier och sena stadier av plantor i tillväxtprocessen. 
Med hjälp av den sekventiella semantiken i datamängden för basilikaväxter gjörs dessutom ett försök att generera en artificiell tillväxtsekvens med hjälp av linjär interpolation. Slutligen presenterar vi ett misslyckat försök att visualisera det latenta rummet som produceras av DCGAN-modellen med hjälp av ett rudimentärt tillvägagångssätt för att invertera den generativa nätverksfunktionen. GAN mathematical statistics deep neural networks generative models latent space exploration sequential data GAN matematisk statistik djupa neurala nätverk generativa modeller utforskning av latenta rum sekventiell data Other Mathematics Annan matematik
103	Some phenomenological investigations in deep learning Baratin, Aristide 12 1900 Les remarquables performances des réseaux de neurones profonds dans de nombreux domaines de l'apprentissage automatique au cours de la dernière décennie soulèvent un certain nombre de questions théoriques. Par exemple, quels mécanismes permettent à ces réseaux, qui ont largement la capacité de mémoriser entièrement les exemples d'entraînement, de généraliser correctement à de nouvelles données, même en l'absence de régularisation explicite ? De telles questions ont fait l'objet d'intenses efforts de recherche ces dernières années, combinant analyses de systèmes simplifiés et études empiriques de propriétés qui semblent être corrélées à la performance de généralisation. Les deux premiers articles présentés dans cette thèse contribuent à cette ligne de recherche. Leur but est de mettre en évidence et d'étudier des mécanismes de biais implicites permettant à de larges modèles de prioriser l'apprentissage de fonctions "simples" et d'adapter leur capacité à la complexité du problème. Le troisième article aborde le problème de l'estimation de l'information mutuelle en haute dimension, en mettant à profit l'expressivité et la scalabilité des réseaux de neurones profonds. Il introduit et étudie une nouvelle classe d'estimateurs, dont il présente plusieurs applications en apprentissage non supervisé, notamment à l'amélioration des modèles neuronaux génératifs. / The striking empirical success of deep neural networks in machine learning raises a number of theoretical puzzles. For example, why can they generalize to unseen data despite their capacity to fully memorize the training examples? Such puzzles have been the subject of intense research efforts in the past few years, which combine rigorous analysis of simplified systems with empirical studies of phenomenological properties shown to correlate with generalization. 
The first two articles presented in this thesis contribute to this line of work. They highlight and discuss mechanisms that allow large models to prioritize learning 'simple' functions during training and to adapt their capacity to the complexity of the problem. The third article of this thesis addresses the long-standing problem of estimating mutual information in high dimension by leveraging the scalability of neural networks. It introduces and studies a new class of estimators and presents several applications in unsupervised learning, especially for enhancing generative models. Apprentissage statistique réseaux de neurones profonds théorie de l'apprentissage information mutuelle modèles génératifs machine learning deep neural networks statistical learning theory mutual information generative models
104	Computational auditory scene analysis and robust automatic speech recognition Narayanan, Arun 14 November 2014 No description available. Engineering Computer Science Automatic speech recognition noise robustness computational auditory scene analysis binary masking ratio masking mask estimation deep neural networks acoustic modeling speech separation speech enhancement noisy ASR CHiME-2 Aurora-4
105	Deep Learning for Sensor Fusion Howard, Shaun Michael 30 August 2017 No description available. Computer Science Artificial Intelligence deep learning sensor fusion deep neural networks advanced driver assistance systems automated driving multi-stream neural networks feedforward multilayer perceptron recurrent gated recurrent unit long-short term memory camera radar
106	Flight search engine CPU consumption prediction Tao, Zhaopeng January 2021 The flight search engine is a technology used in the air travel industry. It allows the traveler to search for and book the best flight options, such as combinations of flights, while keeping the best services, options, and price. The computation for a flight search query can be very intensive given its parameters and complexity. The goal of the project is to predict the computation cost of flight search queries for a new flight search engine product when parameters change and optimizations are applied. The problem of flight search cost prediction is a regression problem. We propose to solve the problem by delimiting it based on its business logic and meaning. Our problem has data defined as a graph, which is why we have chosen a Graph Neural Network. We have investigated multiple pretraining strategies for the evaluation of node embeddings on a real-world regression task, including using a line graph for the training. The embeddings are used for downstream regression tasks. Our work is based on state-of-the-art Machine Learning, Deep Learning, and Graph Neural Network methods. We conclude that for some business use cases, the predictions are suitable for production use. In addition, tree ensemble boosting methods produce negative predictions, which further degrade the R2 score by 4% because of the business meaning. The Deep Neural Network outperformed the best-performing Machine Learning methods by 8% to 12% of R2 score. The Deep Neural Network also outperformed the Deep Neural Network with pretrained node embeddings from the Graph Neural Network methods by 11% to 17% R2 score. The Deep Neural Network achieved 93%, 81%, and 63% R2 score for each task, in order of increasing difficulty. 
The training time ranges from 1 hour for Machine Learning models, through 2 to 10 hours for Deep Learning models, to 8 to 24 hours for Deep Learning models for tabular data trained end to end with Graph Neural Network layers. The inference time is around 15 minutes. Finally, we found that using a Graph Neural Network for the node regression task does not outperform the Deep Neural Network. / Flygsökmotor är en teknik som används inom flygresebranschen. Den gör det möjligt för resenären att söka och boka de bästa flygalternativen, t.ex. kombinationer av flygningar med bästa service, alternativ och pris. Beräkningen av en flygsökning kan vara mycket intensiv med tanke på dess parametrar och komplexitet. Projektets mål är att förutsäga beräkningskostnaden för flygsökfrågor för en ny produkt för flygsökmotor när parametrar ändras och optimeringar görs. Problemet med att förutsäga kostnaderna för flygsökning är ett regressionsproblem. Vi föreslår att man löser problemet genom att avgränsa det utifrån dess affärslogik och innebörd. Vårt problem har data som definieras som en graf, vilket är anledningen till att vi har valt Graph Neural Network. Vi har undersökt flera förträningsstrategier för utvärdering av nodinbäddning när det gäller en regressionsuppgift från den verkliga världen, bland annat genom att använda ett linjediagram för träningen. Inbäddningarna används för regressionsuppgifter i efterföljande led. Vårt arbete bygger på några toppmoderna metoder för maskininlärning, djupinlärning och grafiska neurala nätverk. Vi drar slutsatsen att förutsägelserna är lämpliga för produktionsanvändning i vissa fall. Dessutom ger förutsägelserna från trädens ensemble av boostingmetoder negativa förutsägelser som ytterligare försämrar R2poängen med 4% på grund av affärsmässiga betydelser. Deep Neural Network överträffade de mest effektiva metoderna för maskininlärning med 8 till 12% av R2poängen. 
Det djupa neurala nätverket överträffade också det djupa neurala nätverket med förtränad node embedding från metoderna för grafiska neurala nätverk med 11 till 17% av R2poängen. Deep Neural Network uppnådde 93, 81 och 63% R2poäng för varje uppgift med stigande svårighetsgrad. Träningstiden varierar från 1 timme för maskininlärningsmodeller, 2 till 10 timmar för djupinlärningsmodeller och 8 till 24 timmar för djupinlärningsmodeller för tabelldata som tränats från början till slut med grafiska neurala nätverkslager. Inferenstiden är cirka 15 minuter. Slutligen fann vi att användningen av Graph Neural Network för uppgiften om regression av noder inte överträffar Deep Neural Network. Flight Search Engine Deep Neural Networks Tabular Data Regression Machine Learning Flight Schedule Embedding Node Embedding Graph Embedding Line Graph Embedding Sökmotor för flygresor Djupa neurala nätverk Tabulära data Regression Maskininlärning Inbäddning av flygschema Inbäddning av noder Inbäddning av grafer Inbäddning av linjediagram Computer and Information Sciences Data- och informationsvetenskap
107	Fusion pour la séparation de sources audio / Fusion for audio source separation Jaureguiberry, Xabier 16 June 2015 La séparation aveugle de sources audio dans le cas sous-déterminé est un problème mathématique complexe dont il est aujourd'hui possible d'obtenir une solution satisfaisante, à condition de sélectionner la méthode la plus adaptée au problème posé et de savoir paramétrer celle-ci soigneusement. Afin d'automatiser cette étape de sélection déterminante, nous proposons dans cette thèse de recourir au principe de fusion. L'idée est simple : il s'agit, pour un problème donné, de sélectionner plusieurs méthodes de résolution plutôt qu'une seule et de les combiner afin d'en améliorer la solution. Pour cela, nous introduisons un cadre général de fusion qui consiste à formuler l'estimée d'une source comme la combinaison de plusieurs estimées de cette même source données par différents algorithmes de séparation, chaque estimée étant pondérée par un coefficient de fusion. Ces coefficients peuvent notamment être appris sur un ensemble d'apprentissage représentatif du problème posé par minimisation d'une fonction de coût liée à l'objectif de séparation. Pour aller plus loin, nous proposons également deux approches permettant d'adapter les coefficients de fusion au signal à séparer. La première formule la fusion dans un cadre bayésien, à la manière du moyennage bayésien de modèles. La deuxième exploite les réseaux de neurones profonds afin de déterminer des coefficients de fusion variant en temps. Toutes ces approches ont été évaluées sur deux corpus distincts : l'un dédié au rehaussement de la parole, l'autre dédié à l'extraction de voix chantée. Quelle que soit l'approche considérée, nos résultats montrent l'intérêt systématique de la fusion par rapport à la simple sélection, la fusion adaptative par réseau de neurones se révélant être la plus performante. 
/ Underdetermined blind source separation is a complex mathematical problem that can be satisfactorily resolved for some practical applications, provided that the right separation method has been selected and carefully tuned. In order to automate this selection process, we propose in this thesis to resort to the principle of fusion, which has been widely used in the related field of classification yet is still only marginally exploited in source separation. Fusion consists in combining several methods to solve a given problem instead of selecting a single one. To do so, we introduce a general fusion framework in which a source estimate is expressed as a linear combination of estimates of this same source given by different separation algorithms, each source estimate being weighted by a fusion coefficient. For a given task, fusion coefficients can then be learned on a representative training dataset by minimizing a cost function related to the separation objective. To go further, we also propose two ways to adapt the fusion coefficients to the mixture to be separated. The first one expresses the fusion of several non-negative matrix factorization (NMF) models in a Bayesian fashion similar to Bayesian model averaging. The second one aims at learning time-varying fusion coefficients by means of deep neural networks. All proposed methods have been evaluated on two distinct corpora. The first one is dedicated to speech enhancement while the other deals with singing voice extraction. Experimental results show that fusion outperforms simple selection in all considered cases, with the best results being obtained by adaptive time-varying fusion with neural networks. 
Sélection de modèles Combinaison de modèles Séparation de sources audio Rehaussement de la parole Factorisation en matrices non-négatives Inférence variationnelle bayésienne Moyennage bayésien de modèles Réseaux de neurones profonds Model selection Model combination Audio source separation Speech enhancement Non-negative matrix factorization (NMF) Variational Bayesian inference Bayesian model averaging Deep neural networks
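The general fusion idea in entry 107 (a source estimate written as a weighted combination of estimates from several separation algorithms, with the weights learned by minimizing a cost on training data) can be illustrated with a small sketch. Everything specific here is an assumption for illustration: the squared-error cost, the restriction to two estimates whose coefficients sum to one, and the function names `fuse` and `learn_two_way_coefficient`. The thesis's actual cost function and learning procedure are not given in the abstract.

```python
def fuse(estimates, coefficients):
    """Weighted sum of several estimates of the same source signal."""
    length = len(estimates[0])
    return [
        sum(c * est[t] for c, est in zip(coefficients, estimates))
        for t in range(length)
    ]


def learn_two_way_coefficient(est_a, est_b, target):
    """Least-squares weight a for a*est_a + (1 - a)*est_b against target.

    Assumes a squared-error cost and exactly two estimates (illustrative
    simplification). Setting the derivative of
    sum((target - est_b - a*(est_a - est_b))**2) to zero gives the
    closed form used below.
    """
    diff = [a - b for a, b in zip(est_a, est_b)]
    resid = [t - b for t, b in zip(target, est_b)]
    denom = sum(d * d for d in diff)
    if denom == 0.0:
        return 0.5  # identical estimates: every weight gives the same output
    return sum(r * d for r, d in zip(resid, diff)) / denom


# Hypothetical training example: one estimate overshoots, the other is silent.
reference = [1.0, 2.0, 3.0]
estimate_a = [2.0, 4.0, 6.0]
estimate_b = [0.0, 0.0, 0.0]
a = learn_two_way_coefficient(estimate_a, estimate_b, reference)
fused = fuse([estimate_a, estimate_b], [a, 1.0 - a])
```

With more than two algorithms, or with time-varying coefficients as in the neural-network variant the abstract mentions, the weights would no longer have this one-line closed form.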
108	Improving Brain Tumor Segmentation using synthetic images from GANs Nijhawan, Aashana January 2021 Artificial intelligence (AI) has attracted a great deal of hype for several years, and increasingly so in the field of diagnostic medical imaging. AI-based diagnoses have shown improvements in detecting the smallest abnormalities present in tumors and lesions. This can tremendously help public healthcare. Hospitals hold a large amount of biomedical imaging data, but only a small amount is available for research due to data and privacy protection. Manually segmenting tumors in magnetic resonance imaging (MRI) scans can be quite expensive and time-consuming. This segmentation and classification requires high precision and is usually performed by medical experts who follow clinical medical standards. Because of the small amount of data, machine learning models trained on it tend to overfit. With advancing deep learning techniques it is possible to generate images using Generative Adversarial Networks (GANs). GANs have garnered considerable attention for their power to produce realistic-looking images, videos, and audio. This thesis aims to use the synthetic images generated by progressive growing GANs (PGGAN) along with real images to perform segmentation on brain tumor MRI. The idea is to investigate whether the addition of this synthetic data improves the segmentation significantly or not. To analyze the quality of the images produced by the PGGAN, the Multi-Scale Structural Similarity Index Measure (MS-SSIM) and Sliced Wasserstein Distance (SWD) are recorded. To examine the segmentation performance, Dice Similarity Coefficient (DSC) and accuracy scores are observed. To inspect whether the improved performance from synthetic images is significant or not, a parametric paired t-test and a non-parametric permutation test are used. 
The results show that adding synthetic images to the real images gives a significant improvement in most cases compared with using only real images. However, this addition of synthetic images makes the model uncertain. The models' robustness is tested using training-free uncertainty estimation of neural networks. Image segmentation Generative Adversarial Networks GANs Computer Vision synthetic images generator discriminator uncertainty estimation deep neural networks U-net PGGAN Medical Image Processing Medicinsk bildbehandling Computer and Information Sciences Data- och informationsvetenskap Probability Theory and Statistics Sannolikhetsteori och statistik
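Entry 108 reports segmentation quality with the Dice Similarity Coefficient (DSC). As a rough reminder of what that metric computes, here is a minimal sketch for binary masks stored as nested lists. The function name `dice_coefficient`, the list-of-rows mask layout, and the convention of returning 1.0 for two empty masks are assumptions for this example, not details from the thesis.

```python
def dice_coefficient(pred, truth):
    """Dice similarity coefficient 2*|A and B| / (|A| + |B|) for binary masks.

    `pred` and `truth` are equally shaped lists of rows with 0/1 entries.
    """
    pred_flat = [bool(v) for row in pred for v in row]
    truth_flat = [bool(v) for row in truth for v in row]
    # Count pixels marked as foreground in both masks.
    intersection = sum(p and t for p, t in zip(pred_flat, truth_flat))
    total = sum(pred_flat) + sum(truth_flat)
    if total == 0:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2.0 * intersection / total


# Hypothetical 2x3 masks: two predicted pixels, two true pixels, one shared.
predicted = [[1, 1, 0],
             [0, 0, 0]]
ground_truth = [[1, 0, 0],
                [1, 0, 0]]
score = dice_coefficient(predicted, ground_truth)  # 2*1 / (2 + 2) = 0.5
```

A score of 1.0 means the masks coincide exactly and 0.0 means they share no foreground pixels.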
109	Finding the QRS Complex in a Sampled ECG Signal Using AI Methods / Hitta QRS komplex in en samplad EKG signal med AI metoder Skeppland Hole, Jeanette Marie Victoria January 2023 This study aimed to explore the application of artificial intelligence (AI) and machine learning (ML) techniques in implementing a QRS detector for ambulatory electrocardiography (ECG) monitoring devices. Three ML models, namely long short-term memory (LSTM), convolutional neural network (CNN), and multilayer perceptron (MLP), were compared and evaluated using the MIT-BIH arrhythmia database (MITDB) and the MIT-BIH noise stress test database (NSTDB). The MLP model consistently outperformed the other models, achieving high accuracy in R-peak detection. However, when tested on noisy data, all models faced challenges in accurately predicting R-peaks, indicating the need for further improvement. To address this, the study emphasized the importance of iteratively refining the input data configurations for achieving accurate R-peak detection. By incorporating both the MITDB and NSTDB during training, the models demonstrated improved generalization to noisy signals. This iterative refinement process allowed for the identification of the best models and configurations, consistently surpassing existing ML-based implementations and outperforming the current ECG analysis system. The MLP model, without shifting segments and utilizing both datasets, achieved an outstanding accuracy of 99.73 % in R-peak detection. This accuracy exceeded values reported in the literature, demonstrating the superior performance of this approach. Furthermore, the shifted MLP model, which considered temporal dependencies by incorporating shifted segments, showed promising results with an accuracy of 99.75 %. It exhibited enhanced accuracy, precision, and F1-score compared to the other models, highlighting the effectiveness of incorporating shifted segments. 
For future research, it is important to address challenges such as overfitting and validate the models on independent datasets. Additionally, continuous refinement and optimization of the input data configurations will contribute to further advancements in ECG signal analysis and improve the accuracy of R-peak detection. This study underscores the potential of ML techniques in enhancing ECG analysis, ultimately leading to improved cardiac diagnostics and better patient care. / Syftet med denna studie var att utforska användningen av AI- och ML-tekniker för att implementera en QRS-detektor i EKG-övervakningsenheter. Tre olika ML-modeller, LSTM, CNN och MLP jämfördes och utvärderades med hjälp av MITDB och NSTDB. Resultaten visade att MLP-modellen konsekvent presterade bättre än de andra modellerna och uppnådde hög noggrannhet vid detektion av R-toppar i EKG-signalen. Trots detta stötte alla modeller på utmaningar när de testades på brusig realtidsdata, vilket indikerade behovet av ytterligare förbättringar. För att hantera dessa utmaningar betonade studien vikten av att iterativt förbättra konfigurationen av indata för att uppnå noggrann detektering av R toppar. Genom att inkludera både MITDB och NSTDB under träningen visade modellerna förbättrad förmåga att generalisera till brusiga signaler. Denna iterativa process möjliggjorde identifiering av de bästa modellerna och konfigurationerna, vilka konsekvent överträffade befintliga ML-baserade implementeringar och presterade bättre än den nuvarande EKG-analysystemet. MLP-modellen, utan användning av skiftade segment och med båda databaserna, uppnådde en imponerande noggrannhet på 99,73 % vid detektion av R-toppar. Denna noggrannhet överträffade tidigare studier och visade på den överlägsna prestandan hos denna metod. Dessutom visade den skiftade MLP-modellen, som inkluderade skiftade segment för att beakta tidsberoenden, lovande resultat med en noggrannhet på 99,75 %. 
Modellen uppvisade förbättrad noggrannhet, precision och F1-score jämfört med de andra modellerna, vilket betonar vikten av att inkludera skiftade segment. För framtida studier är det viktigt att hantera utmaningar som överanpassning och att validera modellerna med oberoende datamängder. Dessutom kommer en kontinuerlig förfining och optimering av konfigurationen av indata att bidra till ytterligare framsteg inom EKG-signalanalys och förbättrad noggrannhet vid detektion av R-toppar. Denna studie understryker potentialen hos ML-modeller för att förbättra EKG-analysen och därigenom bidra till förbättrad diagnostik av hjärtsjukdomar och högre kvalitet inom patientvården. ECG ECG-analysis QRS detector Artificial Intelligence Machine Learning Deep neural networks Long short-term memory Convolutional neural network Multilayer perceptron EKG EKG-analys QRS detektor Artificiell intelligens Maskininlärning Djupa neurala nätverk Long short-term memory Convolutional neural network Multilayer perceptron Physical Sciences Fysik
110	ENERGY EFFICIENT EDGE INFERENCE SYSTEMS Soumendu Kumar Ghosh (14060094) 07 August 2023 Deep Learning (DL)-based edge intelligence has garnered significant attention in recent years due to the rapid proliferation of the Internet of Things (IoT), embedded, and intelligent systems, collectively termed edge devices. Sensor data streams acquired by these edge devices are processed by a Deep Neural Network (DNN) application that runs on the device itself or in the cloud. However, the high computational complexity and energy consumption of processing DNNs often limit their deployment on these edge inference systems due to limited compute, memory and energy resources. Furthermore, high costs, strict application latency demands, data privacy, security constraints, and the absence of reliable edge-cloud network connectivity heavily impact edge application efficiency in the case of cloud-assisted DNN inference. Inevitably, performance and energy efficiency are of utmost importance in these edge inference systems, aside from the accuracy of the application. To facilitate energy-efficient edge inference systems running computationally complex DNNs, this dissertation makes three key contributions. The first contribution adopts a full-system approach to Approximate Computing, a design paradigm that trades off a small degradation in application quality for significant energy savings. Within this context, we present the foundational concepts of AxIS, the first approximate edge inference system that jointly optimizes the constituent subsystems leading to substantial energy benefits compared to optimization of the individual subsystem. 
To illustrate the efficacy of this approach, we demonstrate multiple versions of an approximate smart camera system that executes various DNN-based unimodal computer vision applications, showcasing how the sensor, memory, compute, and communication subsystems can all be synergistically approximated for energy-efficient edge inference. Building on this foundation, the second contribution extends AxIS to multimodal AI, harnessing data from multiple sensor modalities to impart human-like cognitive and perceptual abilities to edge devices. By exploring optimization techniques for multiple sensor modalities and subsystems, this research reveals the impact of synergistic modality-aware optimizations on system-level accuracy-efficiency (AE) trade-offs, culminating in the introduction of SysteMMX, the first AE scalable cognitive system that allows efficient multimodal inference at the edge. To illustrate the practicality and effectiveness of this approach, we present an in-depth case study centered around a multimodal system that leverages RGB and Depth sensor modalities for image segmentation tasks. The final contribution focuses on optimizing the performance of an edge-cloud collaborative inference system through intelligent DNN partitioning and computation offloading. We delve into the realm of distributed inference across edge devices and cloud servers, unveiling the challenges associated with finding the optimal partitioning point in DNNs for significant inference latency speedup. To address these challenges, we introduce PArtNNer, a platform-agnostic and adaptive DNN partitioning framework capable of dynamically adapting to changes in communication bandwidth and cloud server load. 
Unlike existing approaches, PArtNNer does not require pre-characterization of underlying edge computing platforms, making it a versatile and efficient solution for real-world edge-cloud scenarios. Overall, this thesis provides novel insights, innovative techniques, and intelligent solutions to enable energy-efficient AI at the edge. The contributions presented herein serve as a solid foundation for future researchers to build upon, driving innovation and shaping the trajectory of research in edge AI. Computer vision Energy-efficient computing Deep learning Edge AI deep learning at IoT edge collaborative AI Edge inference embedded systems (ES) deep neural networks (DNNs) Object detection and classification Approximate Computing Approximate Systems energy efficiency Accuracy - Efficiency trade-off Multimodal Deep Learning
