121

ENHANCING PRECISION OF OBJECT DETECTORS: BRIDGING CLASSIFICATION AND LOCALIZATION GAPS FOR 2D AND 3D MODELS

NIRANJAN RAVI (7013471) 03 June 2024 (has links)
<p dir="ltr">Artificial Intelligence (AI) has revolutionized and accelerated significant advancements in various fields such as healthcare, finance, education, agriculture and the development of autonomous vehicles. We are rapidly approaching Level 5 Autonomy due to recent developments in autonomous technology, including self-driving cars, robot navigation, smart traffic monitoring systems, and dynamic routing. This success has been made possible due to Deep Learning technologies and advanced Computer Vision (CV) algorithms. With the help of perception sensors such as Camera, LiDAR and RADAR, CV algorithms enable a self-driving vehicle to interact with the environment and make intelligent decisions. Object detection lays the foundations for various applications, such as collision and obstacle avoidance, lane detection, pedestrian and vehicular safety, and object tracking. Object detection has two significant components: image classification and object localization. In recent years, enhancing the performance of 2D and 3D object detectors has spiked interest in the research community. This research aims to resolve the drawbacks associated with localization loss estimation of 2D and 3D object detectors by addressing the bounding box regression problem, addressing the class imbalance issue affecting the confidence loss estimation, and finally proposing a dynamic cross-model 3D hybrid object detector with enhanced localization and confidence loss estimation.</p><p dir="ltr">This research aims to address challenges in object detectors through four key contributions. In the first part, we aim to address the problems associated with the image classification component of 2D object detectors. Class imbalance is a common problem associated with supervised training. Common causes are noisy data, a scene with a tiny object surrounded by background pixels, or a dense scene with too many objects. These scenarios can produce many negative samples compared to positive ones, affecting the network learning and reducing the overall performance. We examined these drawbacks and proposed an Enhanced Hard Negative Mining (EHNM) approach, which utilizes anchor boxes with 20% to 50% overlap and positive and negative samples to boost performance. The efficiency of the proposed EHNM was evaluated using Single Shot Multibox Detector (SSD) architecture on the PASCAL VOC dataset, indicating that the detection accuracy of tiny objects increased by 3.9% and 4% and the overall accuracy improved by 0.9%. </p><p dir="ltr">To address localization loss, our second approach investigates drawbacks associated with existing bounding box regression problems, such as poor convergence and incorrect regression. We analyzed various cases, such as when objects are inclusive of one another, two objects with the same centres, two objects with the same centres and similar aspect ratios. During our analysis, we observed existing intersections over Union (IoU) loss and its variant’s failure to address them. We proposed two new loss functions, Improved Intersection Over Union (IIoU) and Balanced Intersection Over Union (BIoU), to enhance performance and minimize computational efforts. Two variants of the YOLOv5 model, YOLOv5n6 and YOLOv5s, were utilized to demonstrate the superior performance of IIoU on PASCAL VOC and CGMU datasets. With help of ROS and NVIDIA’s devices, inference speed was observed in real-time. Extensive experiments were performed to evaluate the performance of BIoU on object detectors. 
The evaluation results indicated MASK_RCNN network trained on the COCO dataset, YOLOv5n6 network trained on SKU-110K and YOLOv5x trained on the custom e-scooter dataset demonstrated 3.70% increase on small objects, 6.20% on 55% overlap and 9.03% on 80% overlap.</p><p dir="ltr">In the earlier parts, we primarily focused on 2D object detectors. Owing to its success, we extended the scope of our research to 3D object detectors in the later parts. The third portion of our research aims to solve bounding box problems associated with 3D rotated objects. Existing axis-aligned loss functions suffer a performance gap if the objects are rotated. We enhanced the earlier proposed IIoU loss by considering two additional parameters: the objects’ Z-axis and rotation angle. These two parameters aid in localizing the object in 3D space. Evaluation was performed on LiDAR and Fusion methods on 3D KITTI and nuScenes datasets.</p><p dir="ltr">Once we addressed the drawbacks associated with confidence and localization loss, we further explored ways to increase the performance of cross-model 3D object detectors. We discovered from previous studies that perception sensors are volatile to harsh environmental conditions, sunlight, and blurry motion. In the final portion of our research, we propose a hybrid 3D cross-model detection network (MAEGNN) equipped with MaskedAuto Encoders 14 (MAE) and Graph Neural Networks (GNN) along with earlier proposed IIoU and ENHM. The performance evaluation on MAEGNN on the KITTI validation dataset and KITTI test set yielded a detection accuracy of 69.15%, 63.99%, 58.46% and 40.85%, 37.37% on 3D pedestrians with overlap of 50%. This developed hybrid detector overcomes the challenges of localization error and confidence estimation and outperforms many state-of-art 3D object detectors for autonomous platforms.</p>
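The abstract does not spell out the formulations of the proposed IIoU and BIoU losses, so the sketch below only illustrates the standard IoU measure and the GIoU loss it builds on, i.e. the kind of overlap-based bounding box regression loss these proposals refine. A minimal NumPy sketch with illustrative box coordinates, not the thesis's implementation.

```python
import numpy as np

def iou_and_union(box_a, box_b):
    """Intersection over Union for axis-aligned boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return (inter / union if union > 0 else 0.0), union

def giou_loss(box_a, box_b):
    """GIoU loss: 1 - IoU + |C \\ (A U B)| / |C|, with C the smallest enclosing box.
    Unlike plain 1 - IoU, it still gives a useful signal when the boxes do not
    overlap, one of the failure cases loss functions in this line of work target."""
    iou, union = iou_and_union(box_a, box_b)
    cx1, cy1 = min(box_a[0], box_b[0]), min(box_a[1], box_b[1])
    cx2, cy2 = max(box_a[2], box_b[2]), max(box_a[3], box_b[3])
    c_area = (cx2 - cx1) * (cy2 - cy1)
    return 1.0 - iou + (c_area - union) / c_area

pred   = np.array([50.0, 50.0, 150.0, 150.0])
target = np.array([80.0, 80.0, 200.0, 180.0])
print(giou_loss(pred, target))
```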
122

Image Analysis and Deep Learning for Applications in Microscopy

Ishaq, Omer January 2016 (has links)
Quantitative microscopy deals with the extraction of quantitative measurements from samples observed under a microscope. Recent developments in microscopy systems, sample preparation and handling techniques have enabled high-throughput biological experiments resulting in large amounts of image data, at biological scales ranging from subcellular structures such as fluorescently tagged nucleic acid sequences to whole organisms such as zebrafish embryos. Consequently, methods and algorithms for automated quantitative analysis of these images have become increasingly important. These methods range from traditional image analysis techniques to the use of deep learning architectures. Many biomedical microscopy assays result in fluorescent spots. Robust detection and precise localization of these spots are two important, albeit sometimes overlapping, areas for the application of quantitative image analysis. We demonstrate the use of popular deep learning architectures for spot detection and compare them against more traditional parametric model-based approaches. Moreover, we quantify the effect of pre-training and of changes in training set size on detection performance. Thereafter, we determine the potential of training deep networks on synthetic and semi-synthetic datasets and compare them with networks trained on manually annotated real data. In addition, we present a two-alternative forced-choice tool for assisting in the manual annotation of real image data. For spot localization, we parallelize a popular compressed sensing based localization method and evaluate its performance in conjunction with different optimizers, noise conditions and spot densities. We also investigate its sensitivity to different point spread function estimates. Zebrafish is an important model organism, attractive for whole-organism image-based assays in drug discovery campaigns. The effect of drug-induced neuronal damage may be expressed in the form of zebrafish shape deformation. First, we present an automated method for accurate quantification of tail deformations in multi-fish micro-plate wells using image analysis techniques such as illumination correction, segmentation, generation of branch-free skeletons of partial tail segments and their fusion to generate complete tails. Later, we demonstrate the use of a deep learning-based pipeline for classifying micro-plate wells as either drug-affected or negative controls, achieving competitive performance, and compare the deep learning results against those from traditional image analysis approaches.
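As a concrete counterpart to the comparison between deep networks and parametric model-based detectors, here is a minimal Laplacian-of-Gaussian spot-detection baseline using scikit-image. The parameter values and the synthetic test frame are illustrative assumptions, not the data or settings used in the thesis.

```python
import numpy as np
from skimage.feature import blob_log

def detect_spots(image, min_sigma=1.0, max_sigma=4.0, threshold=0.05):
    """Detect bright, roughly Gaussian spots with a Laplacian-of-Gaussian filter.

    Returns an (N, 3) array of (row, col, sigma); sigma * sqrt(2) approximates
    the spot radius in pixels.
    """
    img = image.astype(np.float64)
    img = (img - img.min()) / (img.max() - img.min() + 1e-12)   # normalize to [0, 1]
    return blob_log(img, min_sigma=min_sigma, max_sigma=max_sigma,
                    num_sigma=8, threshold=threshold)

# Synthetic example: a noisy 64x64 frame with two Gaussian spots.
rng = np.random.default_rng(0)
frame = rng.normal(0.0, 0.02, size=(64, 64))
yy, xx = np.mgrid[0:64, 0:64]
for (cy, cx) in [(20, 30), (45, 10)]:
    frame += np.exp(-((yy - cy) ** 2 + (xx - cx) ** 2) / (2 * 1.5 ** 2))
print(detect_spots(frame))
```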
123

Packaging Demand Forecasting in Logistics using Deep Neural Networks

Bachu, Yashwanth January 2019 (has links)
Background: Logistics play a vital role in supply chain management, and logistics operations depend on the availability of packaging material for packing the goods to be shipped. Forecasting packaging material demand over a long horizon helps the organization plan to meet that demand. This thesis investigates the use of Deep Neural Networks (DNNs) on time-series data for long-term forecasting. Objectives: The study identifies DNNs that have been used for forecasting packaging demand and for similar problems with data comparable to that available at the organization (Volvo), identifies the best-practiced approach for long-term forecasting, and combines that approach with the selected DNNs. The end objective of the thesis is to suggest the best DNN model for packaging demand forecasting. Methods: An experiment is conducted to evaluate the DNN models selected for demand forecasting. Three models were selected through a preliminary systematic literature review; a second systematic literature review was performed in parallel to identify metrics for evaluating model performance. Results from the preliminary literature review were instrumental in designing the experiment. Results: All three models forecast reasonably well and, given the type and amount of historical data available for training, differ only slightly on the selected performance measures. Comparisons are made using the measures identified in the literature review. To better understand the impact of batch size on model performance, the three models were each trained with two different batch sizes. Conclusions: The proposed models produce usable forecasts of packaging demand for planning the next 52 weeks (about one year). The results show that by adopting DNNs, reliable packaging demand can be forecasted from time-series data on packaging material. The CNN-LSTM combination performs better than the respective individual models by a small margin. Extending the forecasting to a finer granularity of the supply chain (individual suppliers and plants) would benefit the organization by controlling inventory and avoiding excess stock.
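The abstract names a CNN-LSTM combination but does not give its architecture or hyperparameters. The following is a minimal Keras sketch of such a hybrid forecasting the next 52 weeks from a univariate history; the synthetic series, layer sizes and training settings are illustrative assumptions, not the thesis's setup or Volvo's data.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Toy weekly demand series; in practice this would be the packaging demand history.
series = np.sin(np.arange(400) / 8.0) + np.random.normal(0, 0.1, 400)

def make_windows(x, lookback=52, horizon=52):
    """Slice a univariate series into (lookback -> horizon) training pairs."""
    X, y = [], []
    for i in range(len(x) - lookback - horizon + 1):
        X.append(x[i:i + lookback])
        y.append(x[i + lookback:i + lookback + horizon])
    return np.array(X)[..., None], np.array(y)

X, y = make_windows(series)

# CNN front-end extracts local patterns, the LSTM models longer-range dynamics,
# and a dense head emits the 52-week forecast in one shot.
model = keras.Sequential([
    layers.Input(shape=(52, 1)),
    layers.Conv1D(32, kernel_size=3, activation="relu", padding="causal"),
    layers.MaxPooling1D(2),
    layers.LSTM(64),
    layers.Dense(52),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
print(model.predict(X[-1:]).shape)   # (1, 52) -> the next 52 weeks
```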
124

Watermarking in Audio using Deep Learning

Tegendal, Lukas January 2019 (has links)
Watermarking is a technique used to mark the ownership of media such as audio or images by embedding a watermark, e.g. copyright information, into the media. A good watermarking method performs this embedding without affecting the quality of the media. Recent methods for watermarking in images use deep learning to embed and extract the watermark. In this thesis, we investigate watermarking in the audible frequencies of audio using deep learning. More specifically, we try to create a watermarking method for audio that is robust to noise in the carrier and that allows the embedded watermark to be extracted from the audio after it has been played over the air. The proposed method consists of two deep convolutional neural networks trained end-to-end on music with simulated noise. Experiments show that the proposed method successfully creates watermarks robust to simulated noise with moderate quality reductions, but it is not robust to the real-world noise introduced after playing and recording the audio over the air.
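The thesis's exact networks are not described in the abstract; the sketch below only illustrates the general two-network, end-to-end scheme it outlines: a convolutional encoder that adds a small watermark residual to the audio and a convolutional decoder that recovers the bits from a noisy copy. Layer sizes, the 0.01 embedding strength and the noise model are illustrative assumptions.

```python
import torch
import torch.nn as nn

class WatermarkEncoder(nn.Module):
    """Mixes a bit-string watermark into an audio segment via small perturbations."""
    def __init__(self, n_bits=32, n_samples=16384):
        super().__init__()
        self.expand = nn.Linear(n_bits, n_samples)
        self.net = nn.Sequential(
            nn.Conv1d(2, 16, 9, padding=4), nn.ReLU(),
            nn.Conv1d(16, 16, 9, padding=4), nn.ReLU(),
            nn.Conv1d(16, 1, 9, padding=4), nn.Tanh(),
        )

    def forward(self, audio, bits):
        # audio: (B, 1, n_samples), bits: (B, n_bits) in {0, 1}
        msg = self.expand(bits).unsqueeze(1)             # broadcast the bits over time
        residual = self.net(torch.cat([audio, msg], 1))  # small additive watermark
        return audio + 0.01 * residual

class WatermarkDecoder(nn.Module):
    """Recovers the bit-string from (possibly noisy) watermarked audio."""
    def __init__(self, n_bits=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(1, 16, 9, stride=4, padding=4), nn.ReLU(),
            nn.Conv1d(16, 32, 9, stride=4, padding=4), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1), nn.Flatten(),
            nn.Linear(32, n_bits),
        )

    def forward(self, audio):
        return self.net(audio)   # one logit per watermark bit

enc, dec = WatermarkEncoder(), WatermarkDecoder()
audio = torch.randn(4, 1, 16384)
bits = torch.randint(0, 2, (4, 32)).float()
marked = enc(audio, bits)
noisy = marked + 0.005 * torch.randn_like(marked)        # simulated channel noise
loss = nn.functional.binary_cross_entropy_with_logits(dec(noisy), bits)
loss.backward()                                          # end-to-end training signal
```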
125

Attributed Multi-Relational Attention Network for Fact-checking URL Recommendation

You, Di 11 July 2019 (has links)
To combat fake news, researchers have mostly focused on detecting fake news, while journalists have built and maintained fact-checking sites (e.g., Snopes.com and Politifact.com). However, the dissemination of fake news has been greatly amplified by social media sites, and these fact-checking sites have not been fully utilized. To overcome these problems and complement existing methods against fake news, this thesis proposes a deep-learning-based fact-checking URL recommender system to mitigate the impact of fake news on social media sites such as Twitter and Facebook. In particular, the proposed framework consists of a multi-relational attentive module and a heterogeneous graph attention network that learn the complex semantic relationships between user-URL pairs, user-user pairs, and URL-URL pairs. Extensive experiments on a real-world dataset show that the proposed framework outperforms seven state-of-the-art recommendation models, achieving at least a 3-5.3% improvement.
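The abstract does not detail the attributed multi-relational attention architecture; the following PyTorch sketch only illustrates the underlying idea of relation-specific attention over a node's neighbours (user-user, user-URL, URL-URL) followed by a simple user-URL score. All dimensions, names and the random inputs are illustrative assumptions.

```python
import torch
import torch.nn as nn

class RelationAttention(nn.Module):
    """Aggregates a node's neighbours under several relation types with learned
    attention weights, producing a summary vector for recommendation scoring."""
    def __init__(self, dim=64, n_relations=3):
        super().__init__()
        self.rel_proj = nn.ModuleList([nn.Linear(dim, dim) for _ in range(n_relations)])
        self.attn = nn.Linear(2 * dim, 1)

    def forward(self, node, neighbors_by_relation):
        # node: (dim,), neighbors_by_relation: list of (n_i, dim) tensors
        messages, scores = [], []
        for proj, neigh in zip(self.rel_proj, neighbors_by_relation):
            h = proj(neigh)                                        # relation-specific transform
            s = self.attn(torch.cat([node.expand_as(h), h], -1))   # attention logits
            messages.append(h)
            scores.append(s)
        h = torch.cat(messages, 0)
        w = torch.softmax(torch.cat(scores, 0), dim=0)
        return (w * h).sum(0)                                      # attention-weighted summary

def recommend_score(user_vec, url_vec):
    """Rank a fact-checking URL for a user by the dot product of their summaries."""
    return torch.dot(user_vec, url_vec)

model = RelationAttention()
user = torch.randn(64)
# Neighbour embeddings under three relation types (user-user, user-URL, URL-URL).
neighbors = [torch.randn(5, 64), torch.randn(3, 64), torch.randn(4, 64)]
user_summary = model(user, neighbors)
print(recommend_score(user_summary, torch.randn(64)))
```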
126

3D Visualization of MPC-based Algorithms for Autonomous Vehicles

Sörliden, Pär January 2019 (has links)
The area of autonomous vehicles is an interesting research topic, popular in both research and industry worldwide. Linköping University is no exception, and some of its research is based on using Model Predictive Control (MPC) for autonomous vehicles: MPC is used to plan a path for and control the autonomous vehicles, and different methods (for example deep learning or likelihood) are used to calculate collision probabilities for the obstacles. These are very complex algorithms, and it is not always easy to see how they work. Therefore, it is interesting to study whether a visualization tool, in which the algorithms are presented in a three-dimensional way, can be useful for understanding them and for developing them. This project consisted of implementing such a visualization tool using a 3D library and evaluating it both analytically and empirically. The evaluation showed positive results: the proposed tool is helpful when developing algorithms for autonomous vehicles, but some aspects of the algorithms would need more research on how they could be visualized. This concerns the neural networks, which were shown to be difficult to visualize, especially given the available data; more information about the internal variables of the network would be needed to visualize them well.
127

Skin lesion segmentation and classification using deep learning

Unknown Date (has links)
Melanoma, a severe and life-threatening skin cancer, is commonly misdiagnosed or left undiagnosed. Advances in artificial intelligence, particularly deep learning, have enabled the design and implementation of intelligent solutions to skin lesion detection and classification from visible light images, which are capable of performing early and accurate diagnosis of melanoma and other types of skin diseases. This work presents solutions to the problems of skin lesion segmentation and classification. The proposed classification approach leverages convolutional neural networks and transfer learning. Additionally, the impact of segmentation (i.e., isolating the lesion from the rest of the image) on the performance of the classifier is investigated, leading to the conclusion that there is an optimal region between “dermatologist segmented” and “not segmented” that produces best results, suggesting that the context around a lesion is helpful as the model is trained and built. Generative adversarial networks, in the context of extending limited datasets by creating synthetic samples of skin lesions, are also explored. The robustness and security of skin lesion classifiers using convolutional neural networks are examined and stress-tested by implementing adversarial examples. / Includes bibliography. / Thesis (M.S.)--Florida Atlantic University, 2018. / FAU Electronic Theses and Dissertations Collection
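A minimal sketch of the transfer-learning recipe the abstract describes: an ImageNet-pretrained convolutional backbone whose classification head is replaced and fine-tuned for lesion classification. The choice of ResNet-18, the binary melanoma/benign head and the random batch are illustrative assumptions, not the thesis's setup.

```python
import torch
import torch.nn as nn
from torchvision import models, transforms

# Start from an ImageNet-pretrained backbone and replace the classification head,
# the usual transfer-learning recipe for small medical-imaging datasets.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in backbone.parameters():
    p.requires_grad = False                              # freeze the feature extractor
backbone.fc = nn.Linear(backbone.fc.in_features, 2)      # e.g. melanoma vs. benign

# Preprocessing used when loading real lesion images.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

optimizer = torch.optim.Adam(backbone.fc.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a random batch standing in for lesion images.
images, labels = torch.randn(8, 3, 224, 224), torch.randint(0, 2, (8,))
optimizer.zero_grad()
loss = criterion(backbone(images), labels)
loss.backward()
optimizer.step()
```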
128

Parallel Distributed Deep Learning on Cluster Computers

Unknown Date (has links)
Deep Learning is an increasingly important subdomain of artificial intelligence. Deep Learning architectures, artificial neural networks characterized by having both a large breadth of neurons and a large depth of layers, benefit from training on Big Data. The size and complexity of the model, combined with the size of the training data, make the training procedure very computationally and temporally expensive. Accelerating the training procedure of Deep Learning using cluster computers faces many challenges, ranging from distributed optimizers to the large communication overhead specific to a system with off-the-shelf networking components. In this thesis, we present a novel synchronous data-parallel distributed Deep Learning implementation on HPCC Systems, a cluster computer system. We discuss research that has been conducted on the distribution and parallelization of Deep Learning, as well as the concerns relating to cluster environments. Additionally, we provide case studies that evaluate and validate our implementation. / Includes bibliography. / Thesis (M.S.)--Florida Atlantic University, 2018. / FAU Electronic Theses and Dissertations Collection
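The thesis's implementation targets HPCC Systems (an ECL-based cluster platform), so the sketch below only illustrates the synchronous data-parallel idea itself, here simulated in PyTorch: each worker computes gradients on its own data shard, the gradients are averaged, and a single synchronized update is applied so all replicas stay identical. Worker simulation, model and learning rate are illustrative assumptions.

```python
import copy
import torch
import torch.nn as nn

def sync_data_parallel_step(model, shards, lr=0.1):
    """One synchronous data-parallel step: every (simulated) worker computes
    gradients on its own shard, gradients are averaged (the all-reduce step),
    and one synchronized update is applied to the shared model."""
    criterion = nn.MSELoss()
    grads = []
    for x, y in shards:                        # each shard plays the role of one worker
        replica = copy.deepcopy(model)         # replicas start from identical weights
        loss = criterion(replica(x), y)
        loss.backward()
        grads.append([p.grad.clone() for p in replica.parameters()])

    with torch.no_grad():
        for i, p in enumerate(model.parameters()):
            avg = torch.stack([g[i] for g in grads]).mean(0)   # average worker gradients
            p -= lr * avg                                      # synchronized SGD update
    return model

model = nn.Linear(4, 1)
shards = [(torch.randn(16, 4), torch.randn(16, 1)) for _ in range(4)]
sync_data_parallel_step(model, shards)
```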
129

Using Deep Learning Semantic Segmentation to Estimate Visual Odometry

Unknown Date (has links)
In this research, image segmentation and real-time visual odometry estimation are addressed, and two main contributions are made to this field. First, a new image segmentation and classification algorithm named DilatedU-NET is introduced. This deep learning based algorithm is able to process seven frames per second and achieves over 84% accuracy on the Cityscapes dataset. Secondly, a new method to estimate visual odometry is introduced. Using the KITTI benchmark dataset as a baseline, the visual odometry error was larger than could be accurately measured; however, the robust frame rate made up for this, with the method able to process 15 frames per second. / Includes bibliography. / Thesis (M.S.)--Florida Atlantic University, 2018. / FAU Electronic Theses and Dissertations Collection
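The DilatedU-NET architecture itself is not given in the abstract; the toy network below only illustrates the general ingredients its name suggests, an encoder-decoder with skip connections and dilated convolutions in the bottleneck. Layer counts and sizes are illustrative, chosen for quick experimentation rather than Cityscapes-level accuracy.

```python
import torch
import torch.nn as nn

class TinyDilatedUNet(nn.Module):
    """A toy U-Net-style segmentation network with dilated convolutions in the
    bottleneck; not the DilatedU-NET from the thesis, only the general idea."""
    def __init__(self, n_classes=19):            # Cityscapes has 19 evaluation classes
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(32, 32, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        # Dilated convolutions enlarge the receptive field without further downsampling.
        self.bottleneck = nn.Sequential(
            nn.Conv2d(32, 64, 3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv2d(64, 64, 3, padding=4, dilation=4), nn.ReLU())
        self.up = nn.ConvTranspose2d(64, 32, 2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(32, n_classes, 1))

    def forward(self, x):
        skip = self.enc(x)
        x = self.bottleneck(self.down(skip))
        x = self.up(x)
        return self.dec(torch.cat([x, skip], dim=1))   # skip connection, then per-pixel logits

net = TinyDilatedUNet()
print(net(torch.randn(1, 3, 128, 256)).shape)   # -> (1, 19, 128, 256)
```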
130

Encoder-decoder neural networks

Kalchbrenner, Nal January 2017 (has links)
This thesis introduces the concept of an encoder-decoder neural network and develops architectures for the construction of such networks. Encoder-decoder neural networks are probabilistic conditional generative models of high-dimensional structured items such as natural language utterances and natural images. Encoder-decoder neural networks estimate a probability distribution over structured items belonging to a target set conditioned on structured items belonging to a source set. The distribution over structured items is factorized into a product of tractable conditional distributions over individual elements that compose the items. The networks estimate these conditional factors explicitly. We develop encoder-decoder neural networks for core tasks in natural language processing and natural image and video modelling. In Part I, we tackle the problem of sentence modelling and develop deep convolutional encoders to classify sentences; we extend these encoders to models of discourse. In Part II, we go beyond encoders to study the longstanding problem of translating from one human language to another. We lay the foundations of neural machine translation, a novel approach that views the entire translation process as a single encoder-decoder neural network. We propose a beam search procedure to search over the outputs of the decoder to produce a likely translation in the target language. Besides known recurrent decoders, we also propose a decoder architecture based solely on convolutional layers. Since the publication of these new foundations for machine translation in 2013, encoder-decoder translation models have been richly developed and have displaced traditional translation systems both in academic research and in large-scale industrial deployment. In services such as Google Translate these models process in the order of a billion translation queries a day. In Part III, we shift from the linguistic domain to the visual one to study distributions over natural images and videos. We describe two- and three-dimensional recurrent and convolutional decoder architectures and address the longstanding problem of learning a tractable distribution over high-dimensional natural images and videos, where the likely samples from the distribution are visually coherent. The empirical validation of encoder-decoder neural networks as state-of-the-art models of tasks ranging from machine translation to video prediction has a two-fold significance. On the one hand, it validates the notions of assigning probabilities to sentences or images and of learning a distribution over a natural language or a domain of natural images; it shows that a probabilistic principle of compositionality, whereby a high-dimensional item is composed from individual elements at the encoder side and whereby a corresponding item is decomposed into conditional factors over individual elements at the decoder side, is a general method for modelling cognition involving high-dimensional items; and it suggests that the relations between the elements are best learnt in an end-to-end fashion as non-linear functions in distributed space. On the other hand, the empirical success of the networks on the tasks characterizes the underlying cognitive processes themselves: a cognitive process as complex as translating from one language to another that takes a human a few seconds to perform correctly can be accurately modelled via a learnt non-linear deterministic function of distributed vectors in high-dimensional space.
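As an illustration of the beam search procedure mentioned above, here is a minimal, framework-independent sketch: it only assumes a step function that scores possible next tokens for a given prefix, which in a real system would be the decoder network. The toy step function and its probabilities are assumptions for demonstration.

```python
import math

def beam_search(step_fn, start_token, end_token, beam_width=4, max_len=20):
    """Generic beam search over a decoder.

    step_fn(prefix) must return a list of (token, log_prob) continuations for the
    given prefix; the search keeps the beam_width highest-scoring partial outputs.
    """
    beams = [([start_token], 0.0)]            # (token sequence, total log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq[-1] == end_token:
                finished.append((seq, score))
                continue
            for tok, logp in step_fn(seq):
                candidates.append((seq + [tok], score + logp))
        if not candidates:
            break
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    finished.extend(beams)
    return max(finished, key=lambda c: c[1])

# Toy "decoder": prefers to emit 'b' after 'a', then stop.
def toy_step(prefix):
    if prefix[-1] == "a":
        return [("b", math.log(0.7)), ("c", math.log(0.3))]
    return [("<eos>", math.log(0.9)), ("b", math.log(0.1))]

print(beam_search(toy_step, start_token="a", end_token="<eos>"))
```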
