31
OBJECT DETECTION USING VISION TRANSFORMED EFFICIENTDET
Shreyanil Kar (16285265), 30 August 2023
This research presents a novel approach to object detection that integrates Vision Transformers (ViT) into the EfficientDet architecture. Computer vision, a branch of artificial intelligence, focuses on the interpretation and analysis of visual data. Recent advances in deep learning, particularly convolutional neural networks (CNNs), have significantly improved the accuracy and efficiency of computer vision systems. Object detection, a widely studied application within computer vision, involves the identification and localization of objects in images.
The ViT backbone, renowned for its success in image classification and natural language processing tasks, employs self-attention mechanisms to capture global dependencies in input images. However, ViT's capability to capture fine-grained details and context information is limited. To address this limitation, the integration of ViT into the EfficientDet architecture is proposed. EfficientDet is recognized for its efficiency and accuracy in object detection. By combining the strengths of ViT and EfficientDet, the proposed integration enhances the network's ability to capture fine-grained details and context information: it leverages ViT's global dependency modeling alongside EfficientDet's efficient object detection framework, resulting in highly accurate and efficient performance. Notable object detection frameworks used in industry, such as RetinaNet, EfficientNet, and EfficientDet, primarily employ convolution.
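The mechanism the abstract leans on, self-attention over image patches, can be illustrated with a short PyTorch sketch. This is a minimal illustration of a ViT-style encoder block, not the thesis's actual network; the patch size, embedding width, and head count are assumed values.

```python
import torch
import torch.nn as nn

class PatchSelfAttention(nn.Module):
    """One ViT-style block: embed patches, let every patch attend to all others."""
    def __init__(self, patch=16, dim=256, heads=8):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)  # patchify
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                               # x: (B, 3, H, W)
        t = self.embed(x).flatten(2).transpose(1, 2)    # (B, n_patches, dim)
        n = self.norm(t)
        out, _ = self.attn(n, n, n)  # global receptive field in a single step
        return t + out               # residual connection

feats = PatchSelfAttention()(torch.randn(1, 3, 224, 224))
print(feats.shape)                   # torch.Size([1, 196, 256])
```

Because every patch attends to every other patch, a single block models image-wide dependencies, which is the "global" capability the integration borrows from ViT.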
Experimental evaluations were conducted using the PASCAL VOC 2007 and 2012 datasets, widely acknowledged benchmarks for object detection. The integrated ViT-EfficientDet model achieved an impressive mean Average Precision (mAP) score of 86.27% when tested on the PASCAL VOC 2007 dataset, demonstrating its superior accuracy. These results underscore the potential of the proposed integration for real-world applications.
In conclusion, the research introduces a novel integration of Vision Transformers into the EfficientDet architecture, yielding significant improvements in object detection performance. By combining ViT's ability to capture global dependencies with EfficientDet's efficiency and accuracy, the proposed approach offers enhanced object detection capabilities. Future research directions may explore additional datasets and evaluate the performance of the proposed framework across various computer vision tasks.
32
EXAMINATION OF A PRIORI SIMULATION PROCESS ESTIMATION ON STRUCTURAL ANALYSIS CASE
Matthew R Spinazzola (14221838), 07 December 2022
In the field of Engineering Analysis and Simulation, part simplification is often used to reduce the computational time and requirements of finite element solvers. Reducing the complexity of the model through simplification introduces error into the analysis, the amount of which depends on the engineering scenario, CAD model, and method of simplification. Expert analysts use their experience and understanding to mitigate this error through intelligent selection of the simplification method; however, there is no formalized system of selection. Artificial Intelligence, specifically through the use of Machine Learning algorithms, has been explored as a method of capturing and automating this informal knowledge. One existing method found success but explored only Computational Fluid Dynamics simulations, without validating the approach on other kinds of engineering analysis cases. This study attempts to validate this a priori method in a new situation and directly compare the results between studies. To accomplish this, a new CAD assembly model database of over 300 simplified and non-simplified examples was generated. The models were then subjected to a Structural Analysis simulation, where analysis data could be generated and stored. Finally, a regression neural network was used to create Machine Learning models that predict analysis result errors. This study examines how minimal a neural network architecture can be while still making predictions with accuracy comparable to that of the previous studies.
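As a rough illustration of the final step described above, the sketch below trains a deliberately small regression network to map features of a simplified model to a predicted analysis error. The feature count, layer sizes, and synthetic data are hypothetical stand-ins, not the study's actual setup.

```python
import torch
import torch.nn as nn

# Hypothetical inputs: 12 geometry/simplification features per example;
# target = measured error of the simplified structural analysis result.
x = torch.randn(300, 12)      # ~300 examples, matching the database size above
y = torch.randn(300, 1)

model = nn.Sequential(        # intentionally minimal architecture
    nn.Linear(12, 16), nn.ReLU(),
    nn.Linear(16, 1),         # predicted analysis result error
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

for _ in range(200):          # plain regression training loop
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
```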
33
Enhanced 3D Object Detection And Tracking In Autonomous Vehicles: An Efficient Multi-modal Deep Fusion Approach
Priyank Kalgaonkar (10911822), 03 September 2024
This dissertation delves into a significant challenge for Autonomous Vehicles (AVs): achieving efficient and robust perception under adverse weather and lighting conditions. Systems that rely solely on cameras face difficulties with visibility over long distances, while radar-only systems struggle to recognize features like stop signs, which are crucial for safe navigation in such scenarios.

To overcome this limitation, this research introduces a novel deep camera-radar fusion approach using neural networks. This method ensures reliable AV perception regardless of weather or lighting conditions. Cameras, similar to human vision, are adept at capturing rich semantic information, whereas radars can penetrate obstacles like fog and darkness, similar to X-ray vision.

The thesis presents NeXtFusion, an innovative and efficient camera-radar fusion network designed specifically for robust AV perception. Building on the efficient single-sensor NeXtDet neural network, NeXtFusion significantly enhances object detection accuracy and tracking. A notable feature of NeXtFusion is its attention module, which refines critical feature representation for object detection, minimizing information loss when processing data from both cameras and radars.

Extensive experiments conducted on large-scale datasets such as Argoverse, Microsoft COCO, and nuScenes thoroughly evaluate the capabilities of NeXtDet and NeXtFusion. The results show that NeXtFusion excels in detecting small and distant objects compared to existing methods. Notably, NeXtFusion achieves a state-of-the-art mAP score of 0.473 on the nuScenes validation set, outperforming competitors like OFT by 35.1% and MonoDIS by 9.5%.

NeXtFusion's excellence extends beyond mAP scores. It also performs well in other crucial metrics, including mATE (0.449) and mAOE (0.534), highlighting its overall effectiveness in 3D object detection. Visualizations of real-world scenarios from the nuScenes dataset processed by NeXtFusion provide compelling evidence of its capability to handle diverse and challenging environments.
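The abstract does not spell out NeXtFusion's internals, but the role of an attention module in camera-radar fusion can be sketched along the following lines; the module structure, channel widths, and gating scheme here are illustrative assumptions, not the dissertation's architecture.

```python
import torch
import torch.nn as nn

class AttnFusion(nn.Module):
    """Gate radar features with channel attention before adding them to camera features."""
    def __init__(self, cam_ch=256, radar_ch=64):
        super().__init__()
        self.radar_proj = nn.Conv2d(radar_ch, cam_ch, 1)    # align channel widths
        self.gate = nn.Sequential(                          # per-channel trust weights
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * cam_ch, cam_ch, 1),
            nn.Sigmoid(),
        )

    def forward(self, cam_feat, radar_feat):
        radar = self.radar_proj(radar_feat)
        w = self.gate(torch.cat([cam_feat, radar], dim=1))  # (B, C, 1, 1)
        return cam_feat + w * radar   # radar refines, rather than replaces, camera cues

fused = AttnFusion()(torch.randn(1, 256, 50, 50), torch.randn(1, 64, 50, 50))
```

The gating keeps the camera stream intact while letting the network learn, per channel, how much of the radar signal to admit, one way to limit the information loss the abstract mentions.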
34
Ambient Temperature Estimation: Exploring Machine Learning Models for Ambient Temperature Estimation Using Mobile's Internal Sensors
Omar, Alfakir, January 2024
Ambient temperature poses a significant challenge to the performance of mobile phones, impacting their internal thermal flow and increasing the likelihood of overheating, leading to a compromised user experience. Knowledge of the ambient temperature around a mobile phone is crucial, as it helps engineers correlate external factors with internal factors that might affect the phone's performance under various conditions. Notably, these devices lack dedicated sensors to measure ambient temperature independently, underscoring the need for innovative solutions to estimate it accurately. In response to this challenge, our research investigates the feasibility of estimating ambient temperature using machine learning algorithms based on data from internal thermal sensors in Sony mobile phones. Through comprehensive data collection and analysis, custom datasets were constructed to simulate different use-case scenarios, including CPU workloads, camera operation, and GPU tasks. These scenarios introduced varying levels of thermal disturbance, providing a robust basis for evaluating model performance. Feature engineering played a pivotal role in ensuring that the models could effectively interpret the internal thermal dynamics and correlate them with the ambient temperature. The results demonstrate that while simpler models like Linear Regression offer computational efficiency, they fall short in scenarios with complex thermal patterns. In contrast, deep learning models, particularly those incorporating time series analysis, showed superior accuracy and robustness. The Attention-LSTM model, in particular, excelled in generalizing across diverse and novel thermal conditions, although its complexity poses challenges for on-device deployment. This research underscores the importance of selecting appropriate sensors and incorporating a wide range of training scenarios to enhance model performance. It also highlights the potential of advanced machine learning techniques to provide solutions for ambient temperature estimation, thereby contributing to more effective thermal management in mobile devices.
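A minimal sketch of an Attention-LSTM regressor of the kind described, assuming a window of internal sensor readings as input; the sensor count, window length, and layer sizes are invented for illustration and are not Sony's configuration.

```python
import torch
import torch.nn as nn

class AttnLSTM(nn.Module):
    """LSTM over a window of sensor readings, with attention pooling over time."""
    def __init__(self, n_sensors=8, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(n_sensors, hidden, batch_first=True)
        self.score = nn.Linear(hidden, 1)     # attention score per time step
        self.head = nn.Linear(hidden, 1)      # ambient temperature estimate

    def forward(self, x):                     # x: (B, T, n_sensors)
        h, _ = self.lstm(x)                   # (B, T, hidden)
        a = torch.softmax(self.score(h), dim=1)
        return self.head((a * h).sum(dim=1))  # attention-weighted summary

# 60 time steps from 8 internal thermal sensors -> one ambient estimate each.
estimate = AttnLSTM()(torch.randn(4, 60, 8))
```

The attention weights let the model emphasize the parts of the window least disturbed by workload-induced heating, which is the intuition behind pairing attention with time series models here.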
35
AI on the Edge with CondenseNeXt: An Efficient Deep Neural Network for Devices with Constrained Computational Resources
Priyank Kalgaonkar (10911822), 05 August 2021
The research presented in this thesis proposes a neoteric variant of deep convolutional neural network architecture, CondenseNeXt, designed specifically for ARM-based embedded computing platforms with constrained computational resources. CondenseNeXt is an improved version of CondenseNet, the baseline architecture whose roots can be traced back to ResNet. CondenseNeXt replaces the group convolutions in CondenseNet with depthwise separable convolutions and introduces group-wise pruning, a model compression technique, to remove redundant and insignificant elements that either are irrelevant or do not affect the performance of the network. Cardinality, a new dimension added to the existing spatial dimensions, and a class-balanced focal loss function, with a weighting factor inversely proportional to the number of samples per class, have been incorporated into the design of CondenseNeXt to relieve the harsh effects of pruning. Furthermore, extensive analyses of this novel CNN architecture were performed on three benchmark image datasets, CIFAR-10, CIFAR-100, and ImageNet, by deploying the trained weights onto an ARM-based embedded computing platform, the NXP BlueBox 2.0, for real-time image classification. The outputs were observed in real time in the RTMaps Remote Studio console to verify the correctness of the predicted classes. CondenseNeXt achieves state-of-the-art image classification performance on the three benchmark datasets, including CIFAR-10 (4.79% top-1 error), CIFAR-100 (21.98% top-1 error), and ImageNet (7.91% single-model, single-crop top-5 error), with up to a 59.98% reduction in forward FLOPs compared to CondenseNet. CondenseNeXt can also achieve a final trained model size of 2.9 MB, albeit at the cost of a 2.26% accuracy loss. It thus performs image classification on ARM-based computing platforms, without requiring CUDA-enabled GPU support, with outstanding efficiency.
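Two ingredients named above are easy to sketch in PyTorch: a depthwise separable convolution (a depthwise convolution followed by a 1x1 pointwise convolution) and the class-balanced weighting (1 − β)/(1 − β^n_c) used by class-balanced focal loss. The sizes and class counts below are illustrative, not CondenseNeXt's actual configuration.

```python
import torch
import torch.nn as nn

def depthwise_separable(in_ch, out_ch, k=3):
    return nn.Sequential(
        nn.Conv2d(in_ch, in_ch, k, padding=k // 2, groups=in_ch),  # depthwise
        nn.Conv2d(in_ch, out_ch, kernel_size=1),                   # pointwise
    )

y = depthwise_separable(32, 64)(torch.randn(1, 32, 56, 56))  # -> (1, 64, 56, 56)

# Class-balanced weights: rarer classes receive larger loss weights.
beta = 0.999
samples_per_class = torch.tensor([5000.0, 500.0, 50.0])      # made-up counts
weights = (1 - beta) / (1 - beta ** samples_per_class)
weights = weights / weights.sum() * len(samples_per_class)   # normalize
```

The depthwise/pointwise split cuts parameters and FLOPs roughly by a factor of the kernel area relative to a standard convolution, which is where much of the claimed efficiency comes from.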
36
GPS-Free UAV Geo-Localization Using a Reference 3D Database
Karlsson, Justus, January 2022
The goal of this thesis has been global geo-localization using only visual input and a 3D database for reference. In recent years, Convolutional Neural Networks (CNNs) have seen huge success in the task of classifying images. The flattened tensors at the final layers of a CNN can be viewed as vectors describing different input image features. Two networks were trained so that satellite and aerial images taken from different views of the same location had similar feature vectors, and so that images taken from different locations had different feature vectors. After training, the position of a given aerial image can then be estimated by finding the satellite image whose feature vector is most similar to that of the aerial image. A previous method called Where-CNN was used as a baseline model. Batch-hard triplet loss, the Adam optimizer, and a different CNN backbone were tested as possible augmentations to this method. The models were trained on 2640 different locations in Linköping and Norrköping, and then tested on a sequence of 4411 query images along a path in Jönköping. The search region had 1449 different locations constituting a total area of 24 km². In Top-1% accuracy, there was a significant improvement over the baseline, increasing from 61.62% to 88.62%. The environment was modeled as a Hidden Markov Model to filter the sequence of guesses, and the Viterbi algorithm was then used to find the most probable path. This filtering procedure reduced the average error along the path from 2328.0 m to just 264.4 m for the best model; the baseline had an average error of 563.0 m after filtering. A few different 3D methods were also tested. One drawback was that no pretrained weights existed for these models, as opposed to the 2D models, which were pretrained on the ImageNet dataset. The best 3D model achieved a Top-1% accuracy of 70.41%; it should be noted that the best 2D model without any pretraining achieved a lower Top-1% accuracy of 49.38%. In addition, a 3D method for efficiently performing convolution on sparse 3D data was presented. Compared to the straightforward method, it was almost 2.5 times faster while still having comparable accuracy for individual query predictions. While there was a significant improvement over the baseline, it was not significant enough to provide reliable and accurate localization for individual images. For global navigation, using the entire Earth as the search space, the information in a 2D image might not be enough to be uniquely identifiable. However, the 3D CNN techniques tested did not improve on the results of the pretrained 2D models. The use of more data and experimentation with different 3D CNN architectures is a direction in which further research would be exciting.
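The retrieval step, matching an aerial query's feature vector against one vector per satellite tile, reduces to a nearest-neighbor search. A hedged sketch follows; the embedding dimension and the stand-in features are invented, and only the 1449-location search-region size is taken from the abstract.

```python
import torch
import torch.nn.functional as F

# Stand-ins for the outputs of the two trained CNNs (one per view).
query = F.normalize(torch.randn(1, 128), dim=1)    # aerial image embedding
db = F.normalize(torch.randn(1449, 128), dim=1)    # one embedding per satellite tile

sims = query @ db.T                                # cosine similarity to every tile
best = sims.argmax(dim=1)                          # single-image position estimate
top1pct = sims.topk(k=1449 // 100, dim=1).indices  # candidates for Top-1% accuracy
# In the thesis, a sequence of such per-image guesses is then smoothed with a
# Hidden Markov Model and the Viterbi algorithm to recover the most probable path.
```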
37
Exploring feasibility of reinforcement learning flight route planning
Wickman, Axel, January 2021
This thesis explores and compares traditional and reinforcement learning (RL) methods of performing 2D flight path planning in 3D space. A wide overview of natural, classic, and learning approaches to planning is given, in conjunction with a review of some general recurring problems and tradeoffs that appear within planning. This general background then serves as a basis for motivating different possible solutions for this specific problem. These solutions are implemented, together with a testbed in the form of a parallelizable simulation environment. This environment makes use of random world generation and physics combined with an aerodynamical model. An A* planner, a local RL planner, and a global RL planner are developed and compared against each other in terms of performance, speed, and general behavior. An autopilot model is also trained and used both to measure flight feasibility and to constrain the planners to followable paths. All planners were partially successful, with the global planner exhibiting the highest overall performance. The RL planners were also found to be more reliable in terms of both speed and followability because of their ability to leave difficult decisions to the autopilot. From this it is concluded that machine learning in general, and reinforcement learning in particular, is a promising future avenue for solving the problem of flight route planning in dangerous environments.
38
Constrained optimization for machine learning: algorithms and applications
Gallego-Posada, Jose, 06 1900
The widespread deployment of increasingly capable machine learning models has resulted in mounting pressures to enhance the robustness, safety, and fairness of such models, often arising from regulatory and ethical considerations. Further, the implementation of artificial intelligence solutions in real-world applications is limited by their current inability to guarantee compliance with industry standards and governmental regulations. Current standard pipelines for developing machine learning models embrace a "build now, fix later" mentality, retrofitting safety measures as afterthoughts. This continuous incurrence of technical debt hinders the progress of the field in the long term.
Constrained optimization offers a conceptual framework accompanied by algorithmic tools for reliably enforcing complex properties on machine learning models. This thesis calls for a paradigm shift in which constraints constitute an integral part of the model development process, aiming to produce machine learning models that are inherently secure by design.
This thesis provides a holistic perspective on the use of constrained optimization in deep learning tasks. We shall explore i) the need for constrained formulations, ii) the advantages afforded by the constrained optimization standpoint and iii) the algorithmic challenges arising in the solution of such problems. We present several case-studies illustrating the application of constrained optimization techniques to popular machine learning problems.
In Contribution I, we advocate for the use of constrained formulations in machine learning. We argue that it is preferable to handle interpretable regularizers via explicit constraints, rather than using additive penalties, especially when dealing with non-convex models. We consider the training of sparse models with L0-regularization and demonstrate that i) it is possible to find feasible, well-performing solutions to large-scale problems with non-convex constraints; and that ii) the constrained approach can avoid the costly trial-and-error tuning inherent to penalty-based techniques.
Contribution II expands on the previous contribution by imposing explicit constraints on the compression rate achieved by Implicit Neural Representations, a class of models that aim to efficiently store data (such as an image) within a neural network's parameters. In this work we concentrate on the interplay between the model size, its representational capacity, and the required training time. Rather than restricting the model size to a fixed budget (one that complies with the required compression rate), we train an overparametrized, sparse model with compression-rate constraints. This allows us to exploit the power of larger models to achieve better reconstructions, faster, without having to commit to their undesirable compression rate.
Contribution III showcases the advantages of constrained formulations in a realistic model sparsity application with non-differentiable fairness-related constraints. The performance of pruned neural networks degrades unevenly across data sub-groups, thus requiring the use of mitigation techniques. We propose a formulation that imposes constraints on changes in the model accuracy in each sub-group, in contrast to prior work which considers constraints based on surrogate metrics (such as the sub-group loss). We address the non-differentiability and stochasticity challenges posed by our proposed constraints, and demonstrate that our method scales reliably to optimization problems involving large models and hundreds of sub-groups.
In Contribution IV, we focus on the dynamics of gradient-based Lagrangian optimization, a popular technique for solving the non-convex constrained problems arising in deep learning. The adversarial nature of the min-max Lagrangian game makes it prone to oscillatory or unstable behaviors. Based on ideas from the PID control literature, we propose an algorithm for updating the Lagrange multipliers which yields robust, stable training dynamics. This contribution lays the groundwork for practitioners to adopt and implement constrained approaches confidently in diverse real-world applications.
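The baseline dynamics this contribution improves upon can be sketched as simultaneous gradient descent on the parameters and projected gradient ascent on the multiplier. The sketch below shows that plain ascent rule on a toy problem; it is not the proposed PID-style update, and the toy objective, constraint, and step sizes are arbitrary choices.

```python
import torch

x = torch.randn(10, requires_grad=True)  # primal variables
lam = torch.zeros(1)                      # Lagrange multiplier, kept >= 0

f = lambda x: (x ** 2).sum()              # toy objective
g = lambda x: 1.0 - x.mean()              # constraint g(x) <= 0, i.e. mean(x) >= 1

eta_x, eta_lam = 0.05, 0.1
for _ in range(2000):
    L = f(x) + lam.item() * g(x)          # Lagrangian (multiplier held fixed)
    grad_x, = torch.autograd.grad(L, x)
    with torch.no_grad():
        x -= eta_x * grad_x                           # descent on the primal
        lam = (lam + eta_lam * g(x)).clamp_min(0.0)   # projected ascent on the dual

print(f(x).item(), g(x).item(), lam.item())  # converges toward f=10, g=0, lam=20
```

Because the multiplier only grows while the constraint is violated, the two players pull against each other, and on harder problems this tug-of-war is exactly what produces the oscillatory behavior the contribution sets out to stabilize.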
In Contribution V, we provide an overview of Cooper: a library for Lagrangian-based constrained optimization in PyTorch. This open-source library implements all the core contributions presented in the preceding chapters and integrates seamlessly with the PyTorch framework. We developed Cooper with the goal of making constrained optimization techniques readily available to machine learning researchers and practitioners.