101

Visual Question Answering in the Medical Domain

Sharma, Dhruv 21 July 2020 (has links)
Medical images are extremely difficult to comprehend for a person without expertise. The limited number of practitioners across the globe often face fatigue due to the high number of cases. This fatigue, physical and mental, can induce human errors during diagnosis. In such scenarios, an additional opinion can help boost the confidence of the decision-maker. Thus, it becomes crucial to have a reliable Visual Question Answering (VQA) system that can provide a "second opinion" on medical cases. However, most VQA systems in use today cater to real-world problems and are not specifically tailored for handling medical images. Moreover, a VQA system for medical images must contend with the limited amount of training data available in this domain. In this thesis, we develop a deep learning-based model for VQA on medical images that takes these challenges into account. Our MedFuseNet system aims at maximizing learning with minimal complexity by breaking the problem into simpler tasks and weaving everything together to predict the answer. We tackle two types of answer prediction: categorization and generation. We conduct an extensive set of both quantitative and qualitative analyses to evaluate the performance of MedFuseNet. Our results show that MedFuseNet outperforms other state-of-the-art methods available in the literature for these tasks. / Master of Science / Medical images are extremely difficult to comprehend for a person without expertise. The limited number of practitioners across the globe often face fatigue due to the high number of cases. This fatigue, physical and mental, can induce human errors during diagnosis. In such scenarios, an additional opinion can help boost the confidence of the decision-maker. Thus, it becomes crucial to have a reliable Visual Question Answering (VQA) system that can provide a "second opinion" on medical cases. However, most VQA systems in use today cater to real-world problems and are not specifically tailored for handling medical images. In this thesis, we propose an end-to-end deep learning-based system, MedFuseNet, for predicting the answer to an input query associated with an image. We cater to closed-ended as well as open-ended question-answer pairs. We conduct an extensive analysis to evaluate the performance of MedFuseNet. Our results show that MedFuseNet outperforms other state-of-the-art methods available in the literature for these tasks.
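
To make the setup above concrete, here is a minimal sketch of a medical VQA model that treats answer prediction as categorization: an image-feature branch and a question branch are fused and classified over a fixed answer vocabulary. This is an illustrative baseline only; the encoders, the fusion step, and all dimensions are assumptions, not the actual MedFuseNet architecture.

```python
# Illustrative medical VQA sketch: answer prediction as classification.
# All architecture choices below are assumptions, not MedFuseNet's design.
import torch
import torch.nn as nn

class SimpleMedVQA(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=300, hidden_dim=512,
                 img_feat_dim=2048, num_answers=500):
        super().__init__()
        # Question branch: embed tokens, summarize with an LSTM.
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        # Image branch: project pre-extracted CNN features (e.g. ResNet pool5).
        self.img_proj = nn.Linear(img_feat_dim, hidden_dim)
        # Fuse by element-wise product, then classify over the answer vocabulary.
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, num_answers))

    def forward(self, img_feats, question_tokens):
        _, (h, _) = self.lstm(self.embed(question_tokens))
        q = h[-1]                         # (batch, hidden_dim) question summary
        v = torch.relu(self.img_proj(img_feats))
        return self.classifier(q * v)     # logits over candidate answers

model = SimpleMedVQA()
logits = model(torch.randn(2, 2048), torch.randint(0, 5000, (2, 12)))
print(logits.shape)  # torch.Size([2, 500])
```

An answer-generation variant would replace the final classifier with a sequence decoder over answer tokens.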
102

Synthesizing Realistic Data for Vision Based Drone-to-Drone Detection

Yellapantula, Sudha Ravali 15 July 2019 (has links)
In this thesis, we aimed to build a robust UAV (drone) detection algorithm through which one drone could detect another drone in flight. Though this is a straightforward object detection problem, the biggest challenge we faced for drone detection was the limited amount of drone imagery available for training. To address this issue, we used Generative Adversarial Networks, CycleGAN to be precise, to generate realistic-looking fake images that were indistinguishable from real data. CycleGAN is a classic example of an image-to-image translation technique, and we applied it to our setting, transforming synthetic images from one domain into another domain containing real data. The model, once trained, was capable of generating realistic-looking images from synthetic data without the presence of real images. Following this, we employed a state-of-the-art object detection model, YOLO (You Only Look Once), to build a drone detection model trained on the generated images. Finally, the performance of this model was compared against different datasets in order to evaluate it. / Master of Science / In recent years, technologies like Deep Learning and Machine Learning have seen many rapid developments. Among their many applications, object detection is one of the most widely used and well-established problems. In our thesis, we deal with a scenario in which we have a swarm of drones and our aim is for one drone to recognize another drone in its field of vision. As no drone image dataset was readily available, we explored different ways of generating realistic data to address this issue. Finally, we proposed a solution that generates realistic images using Deep Learning techniques and trained an object detection model on them, evaluating how well it performed against other models.
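
The core of the data-generation step is CycleGAN's cycle-consistency objective: translating an image to the other domain and back should reconstruct it. The sketch below shows that loss with toy stand-in generators; the real CycleGAN pairs this with adversarial losses and uses much deeper ResNet/U-Net generators, so treat this as a hedged illustration rather than the thesis's implementation.

```python
# Minimal sketch of CycleGAN's cycle-consistency loss, the core idea used
# to translate synthetic drone renders into realistic-looking images.
# The tiny generators below are placeholders, not real CycleGAN networks.
import torch
import torch.nn as nn

def cycle_consistency_loss(G, F, real_x, real_y, lam=10.0):
    """G: synthetic -> real domain, F: real -> synthetic domain."""
    l1 = nn.L1Loss()
    # Translating to the other domain and back should reconstruct the input.
    loss = l1(F(G(real_x)), real_x) + l1(G(F(real_y)), real_y)
    return lam * loss

# Toy generators standing in for real U-Net/ResNet generators.
G = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.Tanh())
F = nn.Sequential(nn.Conv2d(3, 3, 3, padding=1), nn.Tanh())
x = torch.rand(4, 3, 64, 64)   # synthetic drone renders
y = torch.rand(4, 3, 64, 64)   # real photographs
print(cycle_consistency_loss(G, F, x, y).item())
```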
103

Collaborative Path Planning and Control for Ground Agents Via Photography Collected by Unmanned Aerial Vehicles

Wood, Sami Warren 24 June 2022 (has links)
Natural disasters damage infrastructure and create significant obstacles to humanitarian aid efforts. Roads may become unusable, hindering or halting efforts to provide food, water, shelter, and life-saving emergency care. Finding a safe route during a disaster is especially difficult because, as the disaster unfolds, the usability of roads and other infrastructure can change quickly, rendering most navigation services useless. With the proliferation of cheap cameras and unmanned aerial vehicles [UAVs], the rapid collection of aerial data after a natural disaster has become increasingly common. This data can be used to quickly appraise the damage to critical infrastructure, which can help solve navigational and logistical problems that arise after the disaster. This work focuses on a framework in which a UAV is paired with an unmanned ground vehicle [UGV]. The UAV follows the UGV with a downward-facing camera and helps the ground vehicle navigate the flooded environment. This work makes several contributions: a simulation environment is created to allow for automated data collection in hypothetical disaster scenarios. The simulation environment uses real-world satellite and elevation data to emulate natural disasters such as floods. The environment partially simulates the dynamics of the UAV and UGV, allowing agents to explore during hypothetical disasters. Several semantic image segmentation models are tested for efficacy in identifying obstacles and creating cost maps for navigation within the environment, as seen by the UAV. A deep homography model incorporates temporal relations across video frames to stitch cost maps together. A weighted version of a navigation algorithm is presented to plan a path through the environment. The synthesis of these modules leads to a novel framework wherein a UAV may guide a UGV safely through a disaster area. / Master of Science / Damage to infrastructure after a natural disaster can make navigation a major challenge. Imagine a hurricane has hit someone's house; they are hurt and need to go to the hospital. A traditional GPS navigation system, or even their memory, may not work because many roads could be impassable. However, if the GPS could be quickly updated as to which roads were not flooded, it could still be used to navigate and avoid hazards. While the system presented is designed to work with a self-driving vehicle, it could easily be extended to give directions to a human. The goal of this work is to provide a system, based on aerial photography, that could be used as a replacement for GPS. The advantage of this system is that flooded or damaged infrastructure can be identified and avoided in real time. The system could even identify other possible routes from the photography, such as driving across a field to reach higher ground. Like a GPS, the system works automatically, tracking a user's position and suggesting turns, aiding navigation. A contribution of this work is a simulation of the environment designed in a video game engine. The game engine creates a video game world that can be flooded and used to test the new navigation system. The video game environment is used to train an artificial intelligence computer model to identify hazards and create routes that avoid them. The system could be used in a real-world disaster after training in a video game world.
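
As an illustration of the path-planning step, the sketch below runs a uniform-cost (Dijkstra-style) search over a small hand-made cost map of the kind a segmentation module might output, where flooded cells carry a high traversal cost. The grid, the costs, and the 4-connectivity are assumptions for demonstration, not the thesis's actual weighted algorithm.

```python
# Weighted path planning over a 2D cost map (high cost = flooded/blocked).
# Grid size, costs, and 4-connectivity are illustrative assumptions.
import heapq

def plan(cost_map, start, goal):
    rows, cols = len(cost_map), len(cost_map[0])
    dist = {start: 0.0}
    prev = {}
    pq = [(0.0, start)]
    while pq:
        d, node = heapq.heappop(pq)
        if node == goal:
            break
        if d > dist.get(node, float("inf")):
            continue  # stale queue entry
        r, c = node
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + cost_map[nr][nc]  # traversal cost from segmentation
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    prev[(nr, nc)] = node
                    heapq.heappush(pq, (nd, (nr, nc)))
    # Walk predecessors back from the goal to reconstruct the route.
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]

water, road = 100.0, 1.0
cost_map = [[road, water, road],
            [road, water, road],
            [road, road,  road]]
print(plan(cost_map, (0, 0), (0, 2)))  # routes around the flooded column
```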
104

Towards a Resource Efficient Framework for Distributed Deep Learning Applications

Han, Jingoo 24 August 2022 (has links)
Distributed deep learning has achieved tremendous success in solving scientific problems in research and discovery over the past years. Deep learning training is quite challenging because it requires training on large-scale, massive datasets, especially with graphics processing units (GPUs) in the latest high-performance computing (HPC) supercomputing systems. HPC architectures exhibit different performance trends in training throughput from those in existing studies. Multiple GPUs and high-speed interconnects are used for distributed deep learning on HPC systems. Extant distributed deep learning systems are designed for non-HPC systems without considering efficiency, leading to under-utilization of expensive HPC hardware. In addition, increasing resource heterogeneity has a negative effect on resource efficiency in distributed deep learning methods, including federated learning. Thus, it is important to address the increasing demand for both high performance and high resource efficiency in distributed deep learning systems, including the latest HPC systems and federated learning systems. In this dissertation, we explore and design novel methods and frameworks to improve the resource efficiency of distributed deep learning training. We address the following five topics: performance analysis of deep learning on supercomputers, GPU-aware deep learning job scheduling, topology-aware virtual GPU training, heterogeneity-aware adaptive scheduling, and a token-based incentive algorithm. In the first part (Chapter 3), we focus on analyzing the performance trends of distributed deep learning on the latest HPC systems, such as the Summitdev supercomputer at Oak Ridge National Laboratory. We provide insights through a comprehensive performance study of how deep learning workloads affect the performance of HPC systems with large-scale parallel processing capabilities. In the second part (Chapter 4), we design and develop a novel deep learning job scheduler, MARBLE, which considers the efficiency of GPU resources based on the non-linear scalability of GPUs in a single node and improves GPU utilization by sharing GPUs among multiple deep learning training workloads. The third part of this dissertation (Chapter 5) proposes TOPAZ, a topology-aware virtual GPU training system specifically designed for distributed deep learning on recent HPC systems. In the fourth part (Chapter 6), we explore an innovative holistic federated learning scheduler that employs a heterogeneity-aware adaptive selection method to improve resource efficiency and accuracy, coupled with resource usage profiling and accuracy monitoring to achieve multiple goals. In the fifth part of this dissertation (Chapter 7), we focus on how to provide incentives to participants according to their contribution to the performance of the final federated model, with tokens used as a means of paying for the services of the participants and the training infrastructure. / Doctor of Philosophy / Distributed deep learning is widely used for solving critical scientific problems with massive datasets. However, to accelerate scientific discovery, resource efficiency is also important for deployment on real-world systems, such as high-performance computing (HPC) systems. Deploying existing deep learning applications on these distributed systems may lead to underutilization of HPC hardware resources.
In addition, extreme resource heterogeneity has negative effects on distributed deep learning training. However, much of the prior work has not focused on the specific challenges of distributed deep learning on HPC systems and heterogeneous federated systems in terms of optimizing resource utilization. This dissertation addresses the challenges of improving the resource efficiency of distributed deep learning applications through performance analysis of deep learning on supercomputers, GPU-aware deep learning job scheduling, topology-aware virtual GPU training, and heterogeneity-aware adaptive federated learning scheduling and incentive algorithms.
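
As a toy illustration of scalability-aware scheduling in the spirit of MARBLE's premise (GPUs scale non-linearly within a node), the sketch below allocates GPUs to a job only while the marginal speedup stays above a threshold. The throughput profiles and the threshold are invented numbers, not measurements or logic from the dissertation.

```python
# Scalability-aware GPU allocation sketch: give each job more GPUs only
# while the marginal throughput gain justifies it. The profiles below
# are made-up samples/sec figures, not measurements from the thesis.
profiles = {
    "resnet": [100, 180, 230, 250],  # throughput at 1..4 GPUs
    "bert":   [60, 115, 160, 190],
}

def best_allocation(profile, min_gain=1.2):
    """Stop adding GPUs once the speedup over the previous count < min_gain."""
    gpus = 1
    for n in range(2, len(profile) + 1):
        if profile[n - 1] / profile[n - 2] < min_gain:
            break
        gpus = n
    return gpus

for job, profile in profiles.items():
    # Leftover GPUs in the node can then be shared with other jobs.
    print(job, "->", best_allocation(profile), "GPU(s)")
```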
105

A Comparison of Image Classification with Different Activation Functions in Balanced and Unbalanced Datasets

Zhang, Moqi 04 June 2021 (has links)
When the novel coronavirus (COVID-19) outbreak began to ring alarm bells worldwide, rapid, efficient diagnosis was critical to the emergency response. The limited capacity of medical systems and the increasing number of daily cases pushed researchers to investigate automated models. The use of deep neural networks to help doctors make the correct diagnosis has dramatically reduced the pressure on the healthcare system. Improving diagnosis networks depends not only on the network structure design but also on activation function performance. To identify an optimal activation function, this study investigates the correlation between activation function selection and image classification performance on balanced and imbalanced datasets. Our analysis evaluates various network architectures on both commonly used and novel datasets and presents a comprehensive analysis of ten widely used activation functions. The experimental results show that the swish and softplus functions enhance the classification ability of state-of-the-art networks. Finally, this thesis compares neural networks using the ten activation functions, analyzes their pros and cons, and puts forward detailed suggestions on choosing appropriate activation functions in future work. / Master of Science / When the novel coronavirus (COVID-19) outbreak began to ring alarm bells worldwide, rapid, efficient diagnosis was critical to the emergency response. Manual diagnosis of chest X-rays by radiologists is time-consuming and costly. Compared with traditional diagnostic techniques, an artificial intelligence medical system can simultaneously analyze and diagnose hundreds of medical images and quickly deliver high-precision, high-efficiency results. Machines excel at learning new things and never sleep; if they can take over some tasks, they can significantly relieve the pressure on the medical system and buy time for medical practitioners to concentrate on researching new technologies. A critical decision unit of such an intelligent diagnosis system is the activation function. Therefore, this work provides an in-depth evaluation and comparison of traditional, widely used activation functions with emerging ones, which helps to improve the accuracy of the most advanced diagnostic models on the COVID-19 image dataset. The results of this study also summarize the pros and cons of various activation functions and provide suggestions for future work.
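
For reference, the two activation functions the study highlights have simple closed forms: swish(x) = x · sigmoid(x), and softplus(x) = ln(1 + e^x), a smooth approximation of ReLU. A minimal sketch of both:

```python
# Definitions of the two activations the study found most effective.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def swish(x):
    # Smooth, non-monotonic near zero; approaches ReLU for large |x|.
    return x * sigmoid(x)

def softplus(x):
    # Smooth approximation of ReLU; log1p improves numerical accuracy.
    return np.log1p(np.exp(x))

x = np.array([-2.0, 0.0, 2.0])
print(swish(x))     # approx. [-0.238  0.     1.762]
print(softplus(x))  # approx. [ 0.127  0.693  2.127]
```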
106

Capsule Networks: Framework and Application to Disentanglement for Generative Models

Moghimi, Zahra 30 June 2021 (has links)
Generative models are one of the most prominent components of unsupervised learning, with a plethora of applications in domains such as image-to-image translation, video prediction, and the generation of synthetic data where accessing real data is expensive, unethical, or privacy-compromising. One of the main challenges in designing a generative model is creating a disentangled representation of the generative factors, which gives control over various characteristics of the generated data. Since the architecture of variational autoencoders is centered around latent variables and their objective function directly governs the generative factors, they are a natural choice for creating a more disentangled representation. However, these architectures generate samples that are blurry and of lower quality than those of other state-of-the-art generative models such as generative adversarial networks. Thus, we attempt to increase the disentanglement of latent variables in variational autoencoders without compromising generated image quality. In this thesis, a novel generative model based on capsule networks and a variational autoencoder is proposed. Motivated by the concept of capsule neural networks and their vectorized outputs, these structures are employed to create a disentangled representation of latent features in variational autoencoders. In particular, the proposed structure, called CapsuleVAE, utilizes a capsule encoder whose vector outputs translate to latent variables in a meaningful way. It is shown that CapsuleVAE generates results that are sharper and more diverse based on the FID score and a metric inspired by the inception score. Furthermore, two different methods for training CapsuleVAE are proposed, and the generated results are investigated. In the first method, an objective function with regularization is proposed, and the optimal regularization hyperparameter is derived. In the second method, called sequential optimization, a novel training technique for CapsuleVAE is proposed and the results are compared to the first method. Moreover, a novel metric for measuring disentanglement in latent variables is introduced. Based on this metric, it is shown that the proposed CapsuleVAE creates more disentangled representations. In summary, our proposed generative model enhances the disentanglement of latent variables, which helps the model generalize well to new tasks and gives more control over the generated data. Our model also increases generated image quality, addressing a common disadvantage of variational autoencoders. / Master of Science / Generative models are algorithms that, given a large enough initial dataset, create data points (such as images) similar to the initial dataset from random input numbers. These algorithms have various applications in different fields, such as generating synthetic healthcare data, generating wireless systems data under extreme or rare conditions, producing high-resolution, colorful images from grey-scale photos or sketches, and, in general, generating synthetic data for applications where obtaining real data is expensive, inaccessible, unethical, or privacy-compromising. Some generative models create a representation of the data and divide it into several "generative factors". Researchers have shown that a better data representation is one where the generative factors are "disentangled", meaning that each generative factor is responsible for only one particular feature in the generated data.
Unfortunately, creating a model with disentangled generative factors usually sacrifices image quality. In this work, we design a generative model that enhances the disentanglement of generative factors without compromising the quality of the generated images. To do so, we employ capsule networks in the architecture of the generative model. Capsule networks are algorithms that classify input information into different categories. We show that by using capsule networks, our generative model achieves higher quality in the generated images and creates a more disentangled representation of the generative factors.
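
As background for the regularized objective mentioned above, a common way to trade reconstruction quality for disentanglement in a VAE is to weight the KL term of the ELBO (the beta-VAE formulation). The sketch below shows that generic form; it is a stand-in assumption, not the specific objective or hyperparameter derivation of CapsuleVAE.

```python
# Generic VAE objective with a disentanglement-weighted KL term
# (beta-VAE style). A stand-in, not CapsuleVAE's derived objective.
import torch
import torch.nn.functional as F

def vae_loss(recon_x, x, mu, logvar, beta=4.0):
    # Reconstruction term: how well decoded samples match the input.
    recon = F.mse_loss(recon_x, x, reduction="sum")
    # KL term: pushes latents toward an isotropic Gaussian prior;
    # beta > 1 trades reconstruction quality for more disentanglement.
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon + beta * kl

x = torch.rand(8, 784)
recon_x = torch.rand(8, 784)
mu, logvar = torch.zeros(8, 16), torch.zeros(8, 16)
print(vae_loss(recon_x, x, mu, logvar).item())
```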
107

Optimizing Urban Traffic Management Through AI with Digital Twin Simulation and Validation

Sioldea, Daniel 08 1900 (has links)
The number of vehicles on the road continuously increases, revealing a lack of robust and effective traffic management systems in urban settings. Urban traffic makes up a substantial portion of the total traffic problem, and current traffic light architecture noticeably limits traffic flow. This thesis focuses on developing an artificial intelligence-based smart traffic management system using a double duelling deep Q network (DDDQN), validated through a user-controlled 3D simulation to determine the system's effectiveness. This work leverages current fisheye camera installations to present a system that can be deployed with little intrusion into existing infrastructure. The challenges surrounding large computer vision datasets, and the challenges and limitations of fisheye cameras, are discussed. The data and conditions required to replicate these features in a simulated environment are identified. Finally, a baseline traffic flow and traffic light phase model is created using camera data from the City of Hamilton. A DDDQN optimization algorithm that reduces individual traffic light queue lengths and wait times is developed using the SUMO traffic simulator. The algorithm is trained on different maps and then deployed onto a large map of various streets in the City of Hamilton. The algorithm is tested through a user-controlled driving simulator, showing excellent performance over long routes. / Thesis / Master of Applied Science (MASc)
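
For orientation, a DDDQN combines double Q-learning's target computation with a dueling network head that decomposes Q(s, a) into a state value V(s) and action advantages A(s, a). The sketch below shows only the dueling head; the state encoding (e.g., queue lengths and current phase) and the action set are illustrative assumptions, not the thesis's exact design.

```python
# Dueling Q-network head at the core of a DDDQN agent:
# Q(s, a) = V(s) + A(s, a) - mean_a A(s, a).
# State/action sizes are placeholders for a traffic-light control state.
import torch
import torch.nn as nn

class DuelingQNet(nn.Module):
    def __init__(self, state_dim=16, n_actions=4, hidden=128):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(state_dim, hidden), nn.ReLU())
        self.value = nn.Linear(hidden, 1)              # V(s)
        self.advantage = nn.Linear(hidden, n_actions)  # A(s, a)

    def forward(self, state):
        h = self.trunk(state)
        v, a = self.value(h), self.advantage(h)
        # Subtracting the mean advantage keeps V and A identifiable.
        return v + a - a.mean(dim=1, keepdim=True)

# The "double" part (not shown) selects the argmax action with the online
# network but evaluates it with a target network when forming TD targets.
net = DuelingQNet()
q = net(torch.randn(2, 16))   # Q-values for, e.g., 4 signal phases
print(q.shape)  # torch.Size([2, 4])
```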
108

Attention-based LSTM network for rumor veracity estimation of tweets

Singh, J.P., Kumar, A., Rana, Nripendra P., Dwivedi, Y.K. 12 August 2020 (has links)
Twitter has become a fertile place for rumors, as information can spread to a large number of people immediately. Rumors can mislead public opinion, weaken social order, decrease the legitimacy of government, and pose a significant threat to social stability. Therefore, timely detection and debunking of rumors are urgently needed. In this work, we propose an Attention-based Long Short-Term Memory (LSTM) network that uses tweet text along with thirteen different linguistic and user features to distinguish rumor from non-rumor tweets. The performance of the proposed Attention-based LSTM model is compared with several conventional machine learning and deep learning models. The proposed model achieved an F1-score of 0.88 in classifying rumor and non-rumor tweets, which is better than the state-of-the-art results. The proposed system can reduce the impact of rumors on society, lessen losses of life and money, and build firm user trust in social media platforms.
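
A minimal sketch of this kind of architecture, assuming dimensions and a simple concatenation of the thirteen auxiliary features (the paper's exact wiring may differ): per-timestep LSTM states are pooled with learned attention weights, concatenated with the handcrafted features, and classified as rumor vs. non-rumor.

```python
# Attention-over-LSTM rumor classifier sketch. Dimensions and the
# handling of the 13 auxiliary features are assumptions.
import torch
import torch.nn as nn

class AttnLSTMClassifier(nn.Module):
    def __init__(self, vocab=20000, embed=100, hidden=64, n_feats=13):
        super().__init__()
        self.embed = nn.Embedding(vocab, embed)
        self.lstm = nn.LSTM(embed, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)            # scores each timestep
        self.out = nn.Linear(hidden + n_feats, 2)   # rumor vs non-rumor

    def forward(self, tokens, features):
        states, _ = self.lstm(self.embed(tokens))          # (B, T, H)
        weights = torch.softmax(self.attn(states), dim=1)  # (B, T, 1)
        context = (weights * states).sum(dim=1)            # attention pooling
        return self.out(torch.cat([context, features], dim=1))

model = AttnLSTMClassifier()
logits = model(torch.randint(0, 20000, (4, 30)), torch.randn(4, 13))
print(logits.shape)  # torch.Size([4, 2])
```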
109

Camera-based Recovery of Cardiovascular Signals from Unconstrained Face Videos Using an Attention Network

Deshpande, Yogesh Rajan 22 June 2023 (has links)
This work addresses the problem of recovering the morphology of blood volume pulse (BVP) information from a video of a person's face. Video-based remote plethysmography methods have shown promising results in estimating vital signs such as heart rate and breathing rate. However, recovering instantaneous pulse rate signals is still a challenge for the community, because most previous methods concentrate on capturing the temporal average of the cardiovascular signals. In contrast, we present an approach in which BVP signals are extracted with a focus on recovering the signal morphology as a generalized form for the computation of physiological metrics. We also place emphasis on allowing natural movements by the subject. Furthermore, our system is capable of extracting individual BVP instances with sufficient signal detail to facilitate candidate re-identification. These improvements result in part from the incorporation of a robust skin-detection module into the overall imaging-based photoplethysmography (iPPG) framework. We present extensive experimental results using the challenging UBFC-Phys dataset and the well-known COHFACE dataset. The source code is available at https://github.com/yogeshd21/CVPM-2023-iPPG-Paper. / Master of Science / In this work, we study and recover human health-related metrics and the physiological signals at the core of such metrics. A well-known form of physiological signal is the ECG (electrocardiogram); in our research we work with BVP (blood volume pulse) signals. We propose a deep learning-based model for the non-invasive retrieval of human physiological signals from face videos. Most state-of-the-art models try to recover averaged cardiac-pulse metrics such as heart rate and breathing rate without focusing on the details of the recovered physiological signal. Physiological signals like BVP have details such as the systolic peak, diastolic peak, and dicrotic notch, and these signals also have applications in domains such as human mental health and emotional stimuli studies. Hence, this work focuses on retrieving the morphology of such physiological signals and presents both quantitative and qualitative results. An efficient attention-based deep learning model is presented, and the scope for re-identification using the retrieved signals is also explored. Along with components such as a skin-detection module, our proposed architecture also outperforms state-of-the-art models on two very challenging datasets, UBFC-Phys and COHFACE. The source code is available at https://github.com/yogeshd21/CVPM-2023-iPPG-Paper.
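
For context, the classic first step in imaging PPG, which the attention network in this work learns to improve upon, is to average a color channel over detected skin pixels frame by frame to obtain a raw BVP trace. The sketch below uses a crude fixed-threshold skin rule purely as an assumption; the thesis's robust skin-detection module is learned, not rule-based.

```python
# Simplified imaging-PPG baseline: mean green-channel intensity over
# skin pixels per frame yields a raw blood-volume-pulse trace.
# The RGB skin-threshold rule is a crude illustrative assumption.
import numpy as np

def raw_bvp_trace(frames):
    """frames: (T, H, W, 3) uint8 RGB video clip."""
    trace = []
    for frame in frames:
        r = frame[..., 0].astype(float)
        g = frame[..., 1].astype(float)
        b = frame[..., 2].astype(float)
        # Crude skin rule; a learned skin-detection module would replace this.
        skin = (r > 95) & (g > 40) & (b > 20) & (r > g) & (r > b)
        trace.append(g[skin].mean() if skin.any() else 0.0)
    trace = np.array(trace)
    return trace - trace.mean()  # remove the DC component

clip = np.random.randint(0, 256, size=(100, 64, 64, 3), dtype=np.uint8)
print(raw_bvp_trace(clip).shape)  # (100,)
```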
110

Grounding deep models of visual data

Bargal, Sarah Adel 21 February 2019 (has links)
Deep models are state-of-the-art for many computer vision tasks including object classification, action recognition, and captioning. As Artificial Intelligence systems that utilize deep models are becoming ubiquitous, it is also becoming crucial to explain why they make certain decisions: Grounding model decisions. In this thesis, we study: 1) Improving Model Classification. We show that by utilizing web action images along with videos in training for action recognition, significant performance boosts of convolutional models can be achieved. Without explicit grounding, labeled web action images tend to contain discriminative action poses, which highlight discriminative portions of a video’s temporal progression. 2) Spatial Grounding. We visualize spatial evidence of deep model predictions using a discriminative top-down attention mechanism, called Excitation Backprop. We show how such visualizations are equally informative for correct and incorrect model predictions, and highlight the shift of focus when different training strategies are adopted. 3) Spatial Grounding for Improving Model Classification at Training Time. We propose a guided dropout regularizer for deep networks based on the evidence of a network prediction. This approach penalizes neurons that are most relevant for model prediction. By dropping such high-saliency neurons, the network is forced to learn alternative paths in order to maintain loss minimization. We demonstrate better generalization ability, an increased utilization of network neurons, and a higher resilience to network compression. 4) Spatial Grounding for Improving Model Classification at Test Time. We propose Guided Zoom, an approach that utilizes spatial grounding to make more informed predictions at test time. Guided Zoom compares the evidence used to make a preliminary decision with the evidence of correctly classified training examples to ensure evidence-prediction consistency, and otherwise refines the prediction. We demonstrate accuracy gains for fine-grained classification. 5) Spatiotemporal Grounding. We devise a formulation that simultaneously grounds evidence in space and time, in a single pass, using top-down saliency. We visualize the spatiotemporal cues that contribute to a deep recurrent neural network’s classification/captioning output. Based on these spatiotemporal cues, we are able to localize segments within a video that correspond with a specific action, or phrase from a caption, without explicitly optimizing/training for these tasks.
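
As an illustration of the guided-dropout idea in contribution 3, the sketch below ranks hidden units by a saliency proxy (activation times gradient) and zeroes the top-scoring ones, forcing the network to learn alternative paths. The saliency measure and the drop fraction are illustrative assumptions, not the exact formulation in the thesis.

```python
# Guided-dropout sketch: zero the most prediction-relevant hidden units
# so the network must learn alternative paths. The activation*gradient
# saliency proxy and the drop fraction are illustrative choices.
import torch

def guided_dropout(activations, gradients, drop_frac=0.1):
    """Zero the most salient units of a (batch, units) hidden layer."""
    saliency = (activations * gradients).abs()
    k = max(1, int(drop_frac * activations.shape[1]))
    _, idx = saliency.topk(k, dim=1)    # per-example top-k salient units
    mask = torch.ones_like(activations)
    mask.scatter_(1, idx, 0.0)          # drop high-saliency neurons
    return activations * mask

acts = torch.rand(4, 256)
grads = torch.randn(4, 256)
print((guided_dropout(acts, grads) == 0).sum(dim=1))  # 25 dropped per row
```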
