31 |
OBJECT DETECTION USING VISION TRANSFORMED EFFICIENTDET
Shreyanil Kar (16285265) 30 August 2023
<p>This research presents a novel approach for object detection by integrating Vision Transformers (ViT) into the EfficientDet architecture. The field of computer vision, encompassing artificial intelligence, focuses on the interpretation and analysis of visual data. Recent advancements in deep learning, particularly convolutional neural networks (CNNs), have significantly improved the accuracy and efficiency of computer vision systems. Object detection, a widely studied application within computer vision, involves the identification and localization of objects in images.</p>
<p>The ViT backbone, renowned for its success in image classification and natural language processing tasks, employs self-attention mechanisms to capture global dependencies in input images. However, ViT’s capability to capture fine-grained details and context information is limited. To address this limitation, the integration of ViT into the EfficientDet architecture is proposed. EfficientDet is recognized for its efficiency and accuracy in object detection. By combining the strengths of ViT and EfficientDet, the proposed integration enhances the network’s ability to capture fine-grained details and context information. It leverages ViT’s global dependency modeling alongside EfficientDet’s efficient object detection framework, resulting in highly accurate and efficient performance. Noteworthy object detection frameworks utilized in the industry, such as RetinaNet, EfficientNet, and EfficientDet, primarily employ convolution.</p>
<p>Experimental evaluations were conducted using the PASCAL VOC 2007 and 2012 datasets, widely acknowledged benchmarks for object detection. The integrated ViT-EfficientDet model achieved an impressive mean Average Precision (mAP) score of 86.27% when tested on the PASCAL VOC 2007 dataset, demonstrating its superior accuracy. These results underscore the potential of the proposed integration for real-world applications.</p>
<p>In conclusion, the research introduces a novel integration of Vision Transformers into the EfficientDet architecture, yielding significant improvements in object detection performance. By combining ViT’s ability to capture global dependencies with EfficientDet’s efficiency and accuracy, the proposed approach offers enhanced object detection capabilities. Future research directions may explore additional datasets and evaluate the performance of the proposed framework across various computer vision tasks.</p>
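<p>The self-attention mechanism mentioned above can be illustrated with a minimal sketch. It assumes identity query, key, and value projections and three hypothetical two-dimensional patch embeddings; the names <code>self_attention</code> and <code>patches</code> are illustrative and are not taken from the thesis, whose actual ViT-EfficientDet backbone is not described in this abstract.</p>

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(tokens):
    """Scaled dot-product self-attention over a list of token vectors.

    Assumption: queries, keys, and values are the tokens themselves
    (identity projections); a real ViT layer learns these projections.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    weights = []
    for query in tokens:
        # Every token scores every other token, which is what lets
        # attention model dependencies across the whole image.
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in tokens]
        weights.append(softmax(scores))
    outputs = [
        [sum(w * value[j] for w, value in zip(row, tokens)) for j in range(dim)]
        for row in weights
    ]
    return outputs, weights


# Hypothetical patch embeddings, for illustration only.
patches = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, attn = self_attention(patches)
```

<p>Each row of <code>attn</code> is a probability distribution over all patches, so every output token is a weighted mixture of the whole input rather than of a local neighbourhood, as it would be with convolution.</p>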
|
32 |
EXAMINATION OF A PRIORI SIMULATION PROCESS ESTIMATION ON STRUCTURAL ANALYSIS CASE
Matthew R Spinazzola (14221838) 07 December 2022
<p>In the field of Engineering Analysis and Simulation, part simplification is often used to reduce the computational time and requirements of finite element solvers. Reducing the complexity of the model through simplification introduces error into the analysis, the amount of which depends on the engineering scenario, CAD model, and method of simplification. Expert Analysts use their experience and understanding to mitigate the error in analysis through intelligent selection of the simplification method; however, there is no formalized system of selection. Artificial Intelligence, specifically through the use of Machine Learning algorithms, has been explored as a method of capturing and automating this informal knowledge. One existing method that found success explored only Computational Fluid Dynamics simulations, without validating the method on other kinds of engineering analysis cases. This study attempts to validate this a priori method on a new situation and directly compare the results between studies. To accomplish this, a new CAD Assembly model database of over 300 simplified and non-simplified examples was generated. Afterwards, the models were subjected to a Structural Analysis simulation, from which analysis data were generated and stored. Finally, a Regression Neural Network was used to create Machine Learning models that predict analysis result errors. This study examines how minimal a neural network architecture can be while still making predictions with an accuracy comparable to that of the previous studies. </p>
|
33 |
Enhanced 3D Object Detection And Tracking In Autonomous Vehicles: An Efficient Multi-modal Deep Fusion Approach
Priyank Kalgaonkar (10911822) 03 September 2024
<p dir="ltr">This dissertation delves into a significant challenge for Autonomous Vehicles (AVs): achieving efficient and robust perception under adverse weather and lighting conditions. Systems that rely solely on cameras face difficulties with visibility over long distances, while radar-only systems struggle to recognize features like stop signs, which are crucial for safe navigation in such scenarios.</p><p dir="ltr">To overcome this limitation, this research introduces a novel deep camera-radar fusion approach using neural networks. This method ensures reliable AV perception regardless of weather or lighting conditions. Cameras, similar to human vision, are adept at capturing rich semantic information, whereas radars can penetrate obstacles like fog and darkness, similar to X-ray vision.</p><p dir="ltr">The thesis presents NeXtFusion, an innovative and efficient camera-radar fusion network designed specifically for robust AV perception. Building on the efficient single-sensor NeXtDet neural network, NeXtFusion significantly enhances object detection accuracy and tracking. A notable feature of NeXtFusion is its attention module, which refines critical feature representation for object detection, minimizing information loss when processing data from both cameras and radars.</p><p dir="ltr">Extensive experiments conducted on large-scale datasets such as Argoverse, Microsoft COCO, and nuScenes thoroughly evaluate the capabilities of NeXtDet and NeXtFusion. The results show that NeXtFusion excels in detecting small and distant objects compared to existing methods. Notably, NeXtFusion achieves a state-of-the-art mAP score of 0.473 on the nuScenes validation set, outperforming competitors like OFT by 35.1% and MonoDIS by 9.5%.</p><p dir="ltr">NeXtFusion’s excellence extends beyond mAP scores. It also performs well in other crucial metrics, including mATE (0.449) and mAOE (0.534), highlighting its overall effectiveness in 3D object detection. 
Visualizations of real-world scenarios from the nuScenes dataset processed by NeXtFusion provide compelling evidence of its capability to handle diverse and challenging environments.</p>
|
34 |
Ambient Temperature Estimation: Exploring Machine Learning Models for Ambient Temperature Estimation Using Mobile’s Internal Sensors
Omar, Alfakir January 2024
Ambient temperature poses a significant challenge to the performance of mobile phones, impacting their internal thermal flow and increasing the likelihood of overheating, leading to a compromised user experience. The knowledge about the ambient temperature in mobile phones is crucial as it assists engineers in correlating external factors with internal factors that might affect the mobile's performance under various conditions. Notably, these devices lack dedicated sensors to measure ambient temperature independently, underscoring the need for innovative solutions to estimate it accurately. In response to this challenge, our research investigates the feasibility of estimating ambient temperature using machine-learning algorithms based on data from internal thermal sensors in Sony mobile phones. Through comprehensive data collection and analysis, custom datasets were constructed to simulate different use-case scenarios, including CPU workloads, camera operation, and GPU tasks. These scenarios introduced varying levels of thermal disturbance, providing a robust basis for evaluating model performance. Feature engineering played a pivotal role in ensuring that the models could effectively interpret the internal thermal dynamics and correlate them with the ambient temperature. The results demonstrate that while simpler models like Linear Regression offer computational efficiency, they fall short in scenarios with complex thermal patterns. In contrast, deep learning models, particularly those incorporating time series analysis, showed superior accuracy and robustness. The Attention-LSTM model, in particular, excelled in generalizing across diverse and novel thermal conditions, although its complexity poses challenges for on-device deployment. This research underscores the importance of selecting appropriate sensors and incorporating a wide range of training scenarios to enhance model performance. 
It also highlights the potential of advanced machine learning techniques for ambient temperature estimation, thereby contributing to more effective thermal management in mobile devices.
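As a rough illustration of the Linear Regression baseline mentioned above, the following sketch fits a single-feature least-squares line that maps one internal sensor reading to ambient temperature. The readings are synthetic values invented for this example, not data from the thesis, and the function name fit_line is hypothetical; the thesis's actual features and models are not specified in the abstract.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic, illustrative readings in degrees Celsius (not measured data).
internal_sensor = [30.0, 32.0, 34.0, 36.0]
ambient = [20.0, 21.0, 22.0, 23.0]

slope, intercept = fit_line(internal_sensor, ambient)
predicted = slope * 35.0 + intercept  # estimate for an unseen sensor reading
```

A model of this kind is cheap enough to run on-device, which matches the computational-efficiency point in the abstract, but a single straight line cannot follow the time-dependent thermal patterns that the abstract says call for sequence models.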
|
35 |
AI on the Edge with CondenseNeXt: An Efficient Deep Neural Network for Devices with Constrained Computational Resources
Priyank Kalgaonkar (10911822) 05 August 2021
The research presented in this thesis proposes a novel variant of deep convolutional neural network architecture, CondenseNeXt, designed specifically for ARM-based embedded computing platforms with constrained computational resources. CondenseNeXt is an improved version of CondenseNet, the baseline architecture whose roots can be traced back to ResNet. CondenseNeXt replaces group convolutions in CondenseNet with depthwise separable convolutions and introduces group-wise pruning, a model compression technique, to prune (remove) redundant and insignificant elements that either are irrelevant or do not affect the performance of the network upon disposition. Cardinality, a new dimension in addition to the existing spatial dimensions, and a class-balanced focal loss function, a weighting factor inversely proportional to the number of samples, have been incorporated into the design of CondenseNeXt’s algorithm in order to relieve the harsh effects of pruning. Furthermore, extensive analyses of this novel CNN architecture were performed on three benchmarking image datasets, CIFAR-10, CIFAR-100 and ImageNet, by deploying the trained weights onto an ARM-based embedded computing platform, NXP BlueBox 2.0, for real-time image classification. The outputs are observed in real time in RTMaps Remote Studio’s console to verify the correctness of the classes being predicted. CondenseNeXt achieves state-of-the-art image classification performance on three benchmark datasets, CIFAR-10 (4.79% top-1 error), CIFAR-100 (21.98% top-1 error) and ImageNet (7.91% single model, single crop top-5 error), and up to 59.98% reduction in forward FLOPs compared to CondenseNet. CondenseNeXt can also achieve a final trained model size of 2.9 MB, though at the cost of a 2.26% loss in accuracy. It thus performs image classification on ARM-based computing platforms with outstanding efficiency, without requiring a CUDA-enabled GPU.
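To make the effect of swapping group convolutions for depthwise separable convolutions concrete, the sketch below counts weights for a single layer under each scheme, ignoring biases. The layer sizes (64 input channels, 128 output channels, 3x3 kernel, 4 groups) are hypothetical values chosen for illustration and are not the CondenseNeXt configuration.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a dense k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def group_conv_params(c_in, c_out, k, groups):
    """Weights in a grouped convolution: each filter sees c_in / groups channels."""
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return k * k * (c_in // groups) * c_out


def depthwise_separable_params(c_in, c_out, k):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes channels
    return depthwise + pointwise


# Hypothetical layer sizes, for illustration only.
dense = standard_conv_params(64, 128, 3)
grouped = group_conv_params(64, 128, 3, groups=4)
separable = depthwise_separable_params(64, 128, 3)
```

For these sizes the counts are 73,728 (dense), 18,432 (grouped), and 8,768 (depthwise separable). The savings reported in the abstract also depend on pruning and on the full network design, so this arithmetic shows only the per-layer mechanism.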
|
36 |
GPS-Free UAV Geo-Localization Using a Reference 3D Database
Karlsson, Justus January 2022
The goal of this thesis has been global geolocalization using only visual input and a 3D database for reference. In recent years Convolutional Neural Networks (CNNs) have seen huge success in the task of classifying images. The flattened tensors at the final layers of a CNN can be viewed as vectors describing different input image features. Two networks were trained so that satellite and aerial images taken from different views of the same location had feature vectors that were similar. The networks were also trained so that images taken from different locations had different feature vectors. After training, the position of a given aerial image can then be estimated by finding the satellite image with a feature vector that is the most similar to that of the aerial image. A previous method called Where-CNN was used as a baseline model. Batch-Hard triplet loss, the Adam optimizer, and a different CNN backbone were tested as possible augmentations to this method. The models were trained on 2640 different locations in Linköping and Norrköping. The models were then tested on a sequence of 4411 query images along a path in Jönköping. The search region had 1449 different locations constituting a total area of 24 km². In Top-1% accuracy, there was a significant improvement over the baseline, increasing from 61.62% accuracy to 88.62%. The environment was modeled as a Hidden Markov Model to filter the sequence of guesses. The Viterbi algorithm was then used to find the most probable path. This filtering procedure reduced the average error along the path from 2328.0 m to just 264.4 m for the best model. Here the baseline had an average error of 563.0 m after filtering. A few different 3D methods were also tested. One drawback was that no pretrained weights existed for these models, as opposed to the 2D models, which were pretrained on the ImageNet dataset. The best 3D model achieved a Top-1% accuracy of 70.41%. 
It should be noted that the best 2D model without any pretraining achieved a lower Top-1% accuracy of 49.38%. In addition, a 3D method for efficiently doing convolution on sparse 3D data was presented. Compared to the straightforward method, it was almost 2.5 times faster while still having comparable accuracy at individual query prediction. While there was a significant improvement over the baseline, it was not large enough to provide reliable and accurate localization for individual images. For global navigation, using the entire Earth as search space, the information in a 2D image might not be enough to be uniquely identifiable. However, the 3D CNN techniques tested did not improve on the results of the pretrained 2D models. Using more data and experimenting with different 3D CNN architectures are promising directions for further research.
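The filtering step described above, in which a Hidden Markov Model is decoded with the Viterbi algorithm, can be sketched as follows. This is a toy version with three candidate locations and four query frames; the transition and emission probabilities are made-up numbers for illustration, and how the thesis derives them from feature-vector similarity is not stated in the abstract.

```python
import math


def viterbi(log_init, log_trans, log_emit):
    """Most probable state sequence for an HMM, computed in log space.

    log_init[s]        : log prior of starting in state s
    log_trans[p][s]    : log probability of moving from state p to state s
    log_emit[t][s]     : log likelihood of observation t given state s
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    back = []
    for t in range(1, len(log_emit)):
        new_score, pointers = [], []
        for s in range(n_states):
            # Best predecessor for state s at time t.
            prev = max(range(n_states), key=lambda p: score[p] + log_trans[p][s])
            pointers.append(prev)
            new_score.append(score[prev] + log_trans[prev][s] + log_emit[t][s])
        score = new_score
        back.append(pointers)
    # Backtrack from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


def logs(rows):
    return [[math.log(p) for p in row] for row in rows]


# Illustrative numbers only: three locations on a line, four query frames.
init = [math.log(1 / 3)] * 3
trans = logs([[0.80, 0.19, 0.01], [0.10, 0.80, 0.10], [0.01, 0.19, 0.80]])
emit = logs([[0.7, 0.2, 0.1], [0.3, 0.2, 0.5], [0.7, 0.2, 0.1], [0.7, 0.2, 0.1]])

raw_guesses = [max(range(3), key=lambda s: frame[s]) for frame in emit]
filtered = viterbi(init, trans, emit)
```

Here the per-frame guesses are `[0, 2, 0, 0]`, with the second frame jumping to a distant location, while the decoded path is `[0, 0, 0, 0]`. The transition model penalizes implausible jumps, which is the mechanism by which sequence filtering can reduce the average error along a path.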
|
37 |
Exploring feasibility of reinforcement learning flight route planning / Undersökning av använding av förstärkningsinlärning för flyruttsplannering
Wickman, Axel January 2021
This thesis explores and compares traditional and reinforcement learning (RL) methods of performing 2D flight path planning in 3D space. A wide overview of natural, classic, and learning approaches to planning is given in conjunction with a review of some general recurring problems and tradeoffs that appear within planning. This general background then serves as a basis for motivating different possible solutions for this specific problem. These solutions are implemented, together with a testbed in the form of a parallelizable simulation environment. This environment makes use of random world generation and physics combined with an aerodynamic model. An A* planner, a local RL planner, and a global RL planner are developed and compared against each other in terms of performance, speed, and general behavior. An autopilot model is also trained and used both to measure flight feasibility and to constrain the planners to followable paths. All planners were partially successful, with the global planner exhibiting the highest overall performance. The RL planners were also found to be more reliable in terms of both speed and followability because of their ability to leave difficult decisions to the autopilot. From this it is concluded that machine learning in general, and reinforcement learning in particular, is a promising future avenue for solving the problem of flight route planning in dangerous environments.
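For readers unfamiliar with the classical baseline, the following is a minimal A* sketch on a 4-connected occupancy grid with unit step costs and a Manhattan-distance heuristic. It is a simplification chosen for illustration: the thesis plans flight paths in a simulated 3D world with an aerodynamic model, and its actual cost function and state space are not given in the abstract. The grid and the function name astar are hypothetical.

```python
import heapq


def astar(grid, start, goal):
    """Shortest 4-connected path on a grid; cells with value 1 are obstacles.

    Returns the path as a list of (row, col) cells, or None if unreachable.
    """
    rows, cols = len(grid), len(grid[0])

    def heuristic(cell):
        # Manhattan distance: admissible for unit-cost 4-connected moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(heuristic(start), 0, start)]
    came_from = {}
    best_cost = {start: 0}
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = [cell]
            while cell in came_from:
                cell = came_from[cell]
                path.append(cell)
            return path[::-1]
        if cost > best_cost[cell]:
            continue  # stale heap entry; a cheaper route was already found
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                new_cost = cost + 1
                if new_cost < best_cost.get(nxt, float("inf")):
                    best_cost[nxt] = new_cost
                    came_from[nxt] = cell
                    heapq.heappush(open_heap, (new_cost + heuristic(nxt), new_cost, nxt))
    return None


# Hypothetical world: a wall with a single gap on the right-hand side.
world = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
route = astar(world, (0, 0), (2, 0))
blocked = astar([[0, 0], [1, 1], [0, 0]], (0, 0), (2, 0))
```

The planner must detour through the only gap, giving a nine-cell route, and returns `None` when the wall is complete. A grid planner of this kind has no notion of whether an aircraft can actually follow the path, which is the gap the autopilot-constrained planners in the thesis are meant to address.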
|