Multi-Scale Task Dynamics in Transfer and Multi-Task Learning : Towards Efficient Perception for Autonomous Driving / Flerskalig Uppgiftsdynamik vid Överförings- och Multiuppgiftsinlärning : Mot Effektiv Perception för Självkörande Fordon

Ekman von Huth, Simon. January 2023.
Autonomous driving technology has the potential to revolutionize the way we think about transportation and its impact on society. Perceiving the environment is a key aspect of autonomous driving, which involves multiple computer vision tasks. Multi-scale deep learning has dramatically improved performance on many computer vision tasks, but its practical use in autonomous driving is limited by the resources available in embedded systems. Multi-task learning offers a solution to this problem by allowing more compact deep learning models that share parameters between tasks. However, not all tasks benefit from being learned together. One way of avoiding task interference during training is to learn tasks in sequence, with each task providing useful information for the next, a scheme that builds on transfer learning. Multi-task and transfer dynamics are both concerned with the relationships between tasks, but have previously only been studied separately. This Master's thesis investigates how different computer vision tasks relate to each other in the context of multi-task and transfer learning, using a state-of-the-art efficient multi-scale deep learning model. Through an experimental research methodology, performance on semantic segmentation, depth estimation, and object detection was evaluated on the Virtual KITTI 2 dataset in multi-task and transfer learning settings. In addition, transfer learning with a frozen encoder was compared to constrained encoder fine-tuning, to uncover the effects of fine-tuning on task dynamics. The results suggest that findings from previous work regarding semantic segmentation and depth estimation in multi-task learning generalize to multi-scale learning on autonomous driving data. Further, no statistically significant correlation was found between multi-task learning dynamics and transfer learning dynamics. An analysis of the transfer learning results indicates that some tasks may be more sensitive to fine-tuning than others, suggesting that transferring with a frozen encoder captures only a subset of the complexities involved in transfer relationships. Object detection is observed to negatively impact performance on the other tasks during multi-task learning, but may be a valuable task to transfer from due to its lower annotation costs. Possible avenues for future work include applying the methodology to real-world datasets and exploring ways of utilizing the presented findings for more efficient perception algorithms. / Autonomous driving technology has the potential to revolutionize transportation and its impact on society. Autonomous driving involves several computer vision tasks, which are best solved by deep neural networks that learn to interpret images at multiple scales. Limitations of embedded hardware, however, require techniques such as multi-task and sequential learning to reduce the network's footprint, where sequential learning builds on transfer learning. The dynamics behind both multi-task learning and transfer learning can largely be attributed to the relationships between tasks, yet previous studies have only examined these dynamics separately. This Master's thesis investigates the relationships between computer vision tasks from the perspective of both multi-task and transfer learning. An experimental research methodology was used to compare and investigate three computer vision tasks on the Virtual KITTI 2 dataset. The results strengthen previous research and suggest that earlier findings generalize to multi-scale networks and autonomous driving data. The results show no significant correlation between multi-task and transfer dynamics. Finally, the results indicate that some task pairs place higher demands than others on adapting the network after transfer.
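The architectural pattern at the heart of this abstract, a shared encoder with task-specific heads, is easy to sketch. The following minimal PyTorch example is an illustrative assumption (the layer sizes, head designs, and optimizer are placeholders, not the thesis's actual multi-scale model); it shows both multi-task parameter sharing and the frozen-encoder transfer setting:

```python
import torch
import torch.nn as nn

class MultiTaskNet(nn.Module):
    """Minimal sketch: one shared encoder, one head per task."""
    def __init__(self, num_classes=14):
        super().__init__()
        # Shared encoder: its parameters are reused by every task head.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
        )
        # Task-specific heads: semantic segmentation and depth estimation.
        self.seg_head = nn.Conv2d(64, num_classes, kernel_size=1)
        self.depth_head = nn.Conv2d(64, 1, kernel_size=1)

    def forward(self, x):
        features = self.encoder(x)
        return self.seg_head(features), self.depth_head(features)

model = MultiTaskNet()

# Transfer-learning variant: freeze the (pre-trained) encoder so that
# only the new task head is updated, as in the frozen-encoder setting.
for param in model.encoder.parameters():
    param.requires_grad = False
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)
```

In the multi-task setting both heads would be trained jointly against a weighted sum of task losses; in the transfer setting the encoder weights come from a source task and only the new head (or, under constrained fine-tuning, a limited portion of the encoder as well) is updated.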

3D OBJECT DETECTION USING VIRTUAL ENVIRONMENT ASSISTED DEEP NETWORK TRAINING

Ashley S Dale. 07 January 2021.
An RGBZ synthetic dataset consisting of five object classes in a variety of virtual environments and orientations was combined with a small sample of real-world image data and used to train the Mask R-CNN (MR-CNN) architecture in a variety of configurations. When the MR-CNN architecture was initialized with MS COCO weights and the heads were trained with a mix of synthetic and real-world data, F1 scores improved in four of the five classes: the average maximum F1 score across all classes and all epochs for the networks trained with synthetic data was F1* = 0.91, compared to F1 = 0.89 for the networks trained exclusively with real data, and the standard deviation of the maximum mean F1 score for synthetically trained networks was σ*_F1 = 0.015, compared to σ_F1 = 0.020 for the networks trained exclusively with real data. Varying the backgrounds in the synthetic data was shown to have negligible impact on F1 scores, opening the door to abstract backgrounds and minimizing the need for intensive synthetic data fabrication. When the MR-CNN architecture was initialized with MS COCO weights and depth data was included in the training data, the network was shown to rely heavily on the initial convolutional input to feed features into the network, the image depth channel was shown to influence mask generation, and the image color channels were shown to influence object classification. A set of latent variables for a subset of the synthetic dataset was generated with a Variational Autoencoder, then analyzed using Principal Component Analysis and Uniform Manifold Approximation and Projection (UMAP). The UMAP analysis showed no meaningful distinction between real-world and synthetic data, and a small bias towards clustering based on image background.
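The latent-space analysis described above can be sketched in a few lines. The snippet below is a hedged illustration with random placeholder latents standing in for the VAE codes (the array shapes, labels, and the use of the umap-learn and scikit-learn packages are assumptions, not the thesis's exact pipeline):

```python
import numpy as np
from sklearn.decomposition import PCA
import umap  # pip install umap-learn

# Placeholder latent codes; in the thesis these come from a VAE trained
# on the synthetic subset. Here: 1000 samples of 128-dim latents, plus
# a binary flag marking real vs. synthetic origin (illustrative only).
latents = np.random.randn(1000, 128)
is_synthetic = np.random.randint(0, 2, size=1000)

# Linear structure first: principal components of the latent space.
pca = PCA(n_components=2)
latents_pca = pca.fit_transform(latents)
print("explained variance:", pca.explained_variance_ratio_)

# Nonlinear structure: a 2-D UMAP embedding. Overlapping clusters of
# real and synthetic points would indicate no meaningful separation,
# which is the outcome the abstract reports.
embedding = umap.UMAP(n_components=2, random_state=42).fit_transform(latents)
```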

Machine Learning for Speech Forensics and Hypersonic Vehicle Applications

Emily R Bartusiak. 06 December 2022.
Synthesized speech may be used for nefarious purposes, such as fraud, spoofing, and misinformation campaigns. We present several speech forensics methods based on deep learning to protect against such attacks. First, we use a convolutional neural network (CNN) and transformers to detect synthesized speech. Then, we investigate closed-set and open-set speech synthesizer attribution. We use a transformer to attribute a speech signal to its source (i.e., to identify the speech synthesizer that created it). Additionally, we show that our approach separates different known and unknown speech synthesizers in its latent space, even though it has not seen any of the unknown speech synthesizers during training. Next, we explore machine learning for an objective in the aerospace domain.

Compared to conventional ballistic vehicles and cruise vehicles, hypersonic glide vehicles (HGVs) exhibit unprecedented abilities. They travel faster than Mach 5 and maneuver to evade defense systems and hinder prediction of their final destinations. We investigate machine learning for identifying different HGVs and a conic reentry vehicle (CRV) based on their aerodynamic state estimates. We also propose an HGV flight phase prediction method. Inspired by natural language processing (NLP), we model flight phases as "words" and HGV trajectories as "sentences." Next, we learn a "grammar" from the HGV trajectories that describes their flight phase transition patterns. Given "words" from the initial part of an HGV trajectory and the "grammar," we predict future "words" in the "sentence" (i.e., future HGV flight phases in the trajectory). We demonstrate that this approach successfully predicts future flight phases for HGV trajectories, especially in scenarios with limited training data. We also show that it can be used in a transfer learning scenario to predict flight phases of HGV trajectories that exhibit new maneuvers and behaviors never seen before during training.
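The "words/sentences/grammar" analogy lends itself to a small sketch. The bigram model below is an illustrative stand-in for the thesis's NLP-inspired method (the phase names and trajectories are hypothetical placeholders); it learns phase-transition counts from training trajectories and predicts the most likely next flight phase:

```python
from collections import Counter, defaultdict

# Illustrative flight-phase "sentences": each trajectory is a sequence
# of phase tokens (the phase names here are hypothetical placeholders).
trajectories = [
    ["boost", "glide", "pull-up", "glide", "dive"],
    ["boost", "glide", "glide", "dive"],
    ["boost", "glide", "pull-up", "dive"],
]

# Learn a bigram "grammar": counts of phase -> next-phase transitions.
transitions = defaultdict(Counter)
for traj in trajectories:
    for current, nxt in zip(traj, traj[1:]):
        transitions[current][nxt] += 1

def predict_next_phase(prefix):
    """Predict the most likely next phase given the observed prefix."""
    counts = transitions.get(prefix[-1])
    return counts.most_common(1)[0][0] if counts else None

print(predict_next_phase(["boost"]))  # -> "glide" (3 of 3 transitions)
```

A longer context window (trigrams or a learned sequence model) would capture richer transition patterns, which is in the spirit of the transfer scenario the abstract describes for trajectories with previously unseen maneuvers.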
