111

Collaborative Path Planning and Control for Ground Agents Via Photography Collected by Unmanned Aerial Vehicles

Wood, Sami Warren 24 June 2022 (has links)
Natural disasters damage infrastructure and create significant obstacles to humanitarian aid efforts. Roads may become unusable, hindering or halting efforts to provide food, water, shelter, and life-saving emergency care. Finding a safe route during a disaster is especially difficult because, as the disaster unfolds, the usability of roads and other infrastructure can change quickly, rendering most navigation services useless. With the proliferation of cheap cameras and unmanned aerial vehicles (UAVs), the rapid collection of aerial data after a natural disaster has become increasingly common. This data can be used to quickly appraise the damage to critical infrastructure, which can help solve navigational and logistical problems that may arise after the disaster. This work focuses on a framework in which a UAV is paired with an unmanned ground vehicle (UGV). The UAV follows the UGV with a downward-facing camera and helps the ground vehicle navigate the flooded environment. This work makes several contributions: a simulation environment is created to allow for automated data collection in hypothetical disaster scenarios. The simulation environment uses real-world satellite and elevation data to emulate natural disasters such as floods. The environment partially simulates the dynamics of the UAV and UGV, allowing agents to explore during hypothetical disasters. Several semantic image segmentation models are tested for efficacy in identifying obstacles and creating cost maps for navigation within the environment, as seen by the UAV. A deep homography model incorporates temporal relations across video frames to stitch cost maps together. A weighted version of a navigation algorithm is presented to plan a path through the environment. The synthesis of these modules leads to a novel framework wherein a UAV may guide a UGV safely through a disaster area. / Master of Science / Damage to infrastructure after a natural disaster can make navigation a major challenge. Imagine a hurricane has hit someone's house; they are hurt and need to go to the hospital. Using a traditional GPS navigation system or even their memory may not work, as many roads could be impassable. However, if the GPS could be quickly updated as to which roads were not flooded, it could still be used to navigate and avoid hazards. While the system presented is designed to work with a self-driving vehicle, it could easily be extended to give directions to a human. The goal of this work is to provide a system, based on aerial photography, that could be used as a replacement for a GPS. The advantage of this system is that flooded or damaged infrastructure can be identified and avoided in real time. The system could even identify other possible routes by using photography, such as driving across a field to reach higher ground. Like a GPS, the system works automatically, tracking a user's position and suggesting turns, aiding navigation. A contribution of this work is a simulation of the environment designed in a video game engine. The game engine creates a video game world that can be flooded and used to test the new navigation system. The video game environment is used to train an artificial intelligence computer model to identify hazards and create routes that would avoid them. The system could be used in a real-world disaster following training in a video game world.
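The planning step can be illustrated with a hedged sketch: assuming the segmentation-derived cost map is a grid of per-cell traversal costs, a weighted A*-style search (the "weighted version of a navigation algorithm" is assumed here to be of this family; function names, weights, and the toy flooded column are illustrative, not the thesis's implementation) finds a low-cost route around high-cost cells.

```python
# Minimal weighted A* over a per-cell cost map (illustrative sketch only).
import heapq
import numpy as np

def weighted_astar(cost_map, start, goal, weight=2.0):
    """Grid planner; weight > 1 biases the search toward the goal (weighted A*)."""
    rows, cols = cost_map.shape
    h = lambda c: np.hypot(c[0] - goal[0], c[1] - goal[1])   # straight-line heuristic
    best_g = {start: 0.0}
    parent = {start: None}
    frontier = [(weight * h(start), start)]
    while frontier:
        _, cell = heapq.heappop(frontier)
        if cell == goal:
            break
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (cell[0] + dr, cell[1] + dc)
            if 0 <= nxt[0] < rows and 0 <= nxt[1] < cols:
                ng = best_g[cell] + cost_map[nxt]            # traversal cost from the map
                if ng < best_g.get(nxt, np.inf):
                    best_g[nxt], parent[nxt] = ng, cell
                    heapq.heappush(frontier, (ng + weight * h(nxt), nxt))
    path, cell = [], goal                                    # reconstruct the route
    while cell is not None:
        path.append(cell)
        cell = parent.get(cell)
    return path[::-1]

cost = np.ones((50, 50))
cost[10:40, 25] = 100.0                                      # a "flooded" column to avoid
print(weighted_astar(cost, (0, 0), (49, 49))[:5])
```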
112

Camera-based Recovery of Cardiovascular Signals from Unconstrained Face Videos Using an Attention Network

Deshpande, Yogesh Rajan 22 June 2023 (has links)
This work addresses the problem of recovering the morphology of blood volume pulse (BVP) information from a video of a person's face. Video-based remote plethysmography methods have shown promising results in estimating vital signs such as heart rate and breathing rate. However, recovering instantaneous pulse-rate signals is still a challenge for the community, because most previous methods concentrate on capturing the temporal average of the cardiovascular signals. In contrast, we present an approach in which BVP signals are extracted with a focus on the recovery of the signal morphology as a generalized form for the computation of physiological metrics. We also place emphasis on allowing natural movements by the subject. Furthermore, our system is capable of extracting individual BVP instances with sufficient signal detail to facilitate candidate re-identification. These improvements have resulted in part from the incorporation of a robust skin-detection module into the overall imaging-based photoplethysmography (iPPG) framework. We present extensive experimental results using the challenging UBFC-Phys dataset and the well-known COHFACE dataset. The source code is available at https://github.com/yogeshd21/CVPM-2023-iPPG-Paper. / Master of Science / In this work we study and recover human-health-related metrics and the physiological signals at the core of their derivation. A well-known form of physiological signal is the ECG (electrocardiogram); in our research we work with BVP (blood volume pulse) signals. We propose a deep-learning-based model for non-invasive retrieval of human physiological signals from face videos. Most state-of-the-art models try to recover averaged cardiac-pulse metrics such as heart rate and breathing rate without focusing on the details of the recovered physiological signal. Signals like BVP contain details such as the systolic peak, diastolic peak, and dicrotic notch, and they also have applications in domains such as mental health and emotional-stimuli studies. Hence, this work focuses on retrieving the morphology of such physiological signals and presents both quantitative and qualitative results. An efficient attention-based deep learning model is presented, and the scope for re-identification using the retrieved signals is also explored. Together with key components such as the skin-detection module, the proposed architecture outperforms state-of-the-art models on two very challenging datasets, UBFC-Phys and COHFACE. The source code is available at https://github.com/yogeshd21/CVPM-2023-iPPG-Paper.
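For orientation, the sketch below shows a conventional iPPG baseline rather than the attention network described above: it averages the green channel over a pre-detected skin region in each frame and band-pass filters the trace to the cardiac band. The function name, band limits, and synthetic input are assumptions for illustration only.

```python
# Classical iPPG baseline sketch (not the thesis's attention network).
import numpy as np
from scipy.signal import butter, filtfilt

def bvp_from_frames(frames, skin_masks, fps=30.0, band=(0.7, 3.0)):
    """frames: (T, H, W, 3) video array; skin_masks: (T, H, W) boolean skin maps."""
    trace = np.array([f[..., 1][m].mean() for f, m in zip(frames, skin_masks)])
    trace = (trace - trace.mean()) / (trace.std() + 1e-8)          # normalize
    b, a = butter(3, [band[0] / (fps / 2), band[1] / (fps / 2)], btype="band")
    return filtfilt(b, a, trace)                                    # band-limited pulse trace

# Synthetic example: a 1.2 Hz "pulse" buried in sensor noise.
T, fps = 300, 30.0
t = np.arange(T) / fps
fake = (128 + 2 * np.sin(2 * np.pi * 1.2 * t)[:, None, None, None]
        + np.random.randn(T, 32, 32, 3)).astype(np.float32)
masks = np.ones((T, 32, 32), dtype=bool)
print(bvp_from_frames(fake, masks, fps).shape)                      # (300,)
```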
113

Attention-based LSTM network for rumor veracity estimation of tweets

Singh, J.P., Kumar, A., Rana, Nripendra P., Dwivedi, Y.K. 12 August 2020 (has links)
Yes / Twitter has become a fertile place for rumors, as information can spread to a large number of people immediately. Rumors can mislead public opinion, weaken social order, decrease the legitimacy of government, and pose a significant threat to social stability. Therefore, timely detection and debunking of rumors are urgently needed. In this work, we propose an Attention-based Long Short-Term Memory (LSTM) network that uses tweet text together with thirteen different linguistic and user features to distinguish rumor from non-rumor tweets. The performance of the proposed Attention-based LSTM model is compared with several conventional machine learning and deep learning models. The proposed model achieves an F1-score of 0.88 in classifying rumor and non-rumor tweets, which is better than the state-of-the-art results. The proposed system can reduce the impact of rumors on society, lessen losses of life and money, and build firm user trust in social media platforms.
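A minimal sketch of this kind of architecture is given below, assuming PyTorch: an LSTM over tweet tokens with additive attention, whose pooled context vector is concatenated with the thirteen handcrafted linguistic/user features before a binary classifier. All layer sizes and names are illustrative, not those of the paper.

```python
# Attention-based LSTM rumor classifier, illustrative dimensions only.
import torch
import torch.nn as nn

class AttentionLSTMRumor(nn.Module):
    def __init__(self, vocab_size, embed_dim=100, hidden=64, n_extra_feats=13):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.attn = nn.Linear(hidden, 1)                 # additive attention scores
        self.clf = nn.Linear(hidden + n_extra_feats, 2)  # rumor vs. non-rumor

    def forward(self, token_ids, extra_feats):
        h, _ = self.lstm(self.embed(token_ids))          # (B, T, hidden)
        weights = torch.softmax(self.attn(h), dim=1)     # (B, T, 1) attention weights
        context = (weights * h).sum(dim=1)               # attention-weighted summary
        return self.clf(torch.cat([context, extra_feats], dim=1))

model = AttentionLSTMRumor(vocab_size=20000)
logits = model(torch.randint(1, 20000, (8, 40)), torch.randn(8, 13))
print(logits.shape)                                      # torch.Size([8, 2])
```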
114

Improving Hydrologic Connectivity Delineation Based on High-Resolution DEMs and Geospatial Artificial Intelligence

Wu, Di 01 August 2024 (has links) (PDF)
Hydrological connectivity is crucial for understanding and managing water resources, ecological processes, and landscape dynamics. High-Resolution Digital Elevation Models (HRDEMs) derived from Light Detection and Ranging (LiDAR) data offer unprecedented detail and accuracy in representing terrain features, making them invaluable for mapping hydrological networks and analyzing landscape connectivity. However, challenges persist in accurately delineating flow networks, identifying flow barriers, and optimizing computational efficiency, particularly in large-scale applications and complex terrain conditions. This dissertation addresses these challenges through a comprehensive exploration of advanced techniques in deep learning, spatial analysis, and parallel computing. A common practice is to breach the elevation of roads near drainage crossing locations to remove flow barriers; these locations, however, are often unavailable or of variable quality. Thus, developing a reliable drainage crossing dataset is essential for improving HRDEMs for hydrographic delineation. Deep learning models were developed for classifying images that contain the locations of flow barriers. Based on HRDEMs and aerial orthophotos, different Convolutional Neural Network (CNN) models were trained and compared to assess their effectiveness in image classification in four different watersheds across the U.S. Midwest. The results show that most deep learning models consistently achieve accuracies above 90%. The CNN model with a batch size of 16, a learning rate of 0.01, 100 epochs, and the HRDEM as the sole input feature exhibits the best performance, with 93% accuracy. The addition of aerial orthophotos and their derived spectral indices contributes little to, or even worsens, the model's accuracy. Transferability assessments across geographic regions show promising potential of the best-fit model for broader applications, albeit with accuracies that vary with hydrographic complexity. Based on identified drainage crossing locations, Drainage Barrier Processing (DBP), such as HRDEM excavation, is employed to remove the flow barriers. However, there is a gap in quantitatively assessing the impact of DBP on HRDEM-derived flowlines, especially at finer scales. HRDEM-derived flowlines generated with different flow direction algorithms were therefore evaluated using a framework developed to measure the effects of flow barrier removal. The results show that the primary factor influencing flowline quality is the presence of flow accumulation artifacts. Quality issues also stem from differences between natural and artificial flow paths, unrealistic flowlines in flat areas, complex canal networks, and ephemeral drainageways. Notably, the improvement achieved by DBP is demonstrated to be more than 6%, showcasing its efficacy in reducing the impact of flow barriers on hydrologic connectivity. To overcome the computational intensity and speed up data processing, the efficiency of parallel computing techniques for GeoAI and hydrological modeling was evaluated. The performance of CPU parallel processing on High-Performance Computing (HPC) systems was compared with serial processing on desktop computers and with GPU processing using Graphics Processing Units (GPUs). Results demonstrated substantial performance enhancements with GPU processing, particularly in accelerating computationally intensive tasks such as deep learning-based feature detection and hydrological modeling.
However, efficiency trends exhibit nonlinear patterns influenced by factors such as communication overhead, task distribution, and resource contention. In summary, this dissertation presents a GeoAI-Hydro framework that significantly advances the quality of hydrological connectivity modeling. By integrating deep learning for accurate flow barrier identification, employing DBP to enhance flowline quality, and utilizing parallel computing to address computational demands, the framework offers a robust solution for high-quality hydrological network mapping and analysis. It paves the way for contributions to more effective water resource management, ecological conservation, and landscape planning.
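The drainage-crossing classifier described above can be sketched as a small single-channel CNN over normalized HRDEM patches. The sketch below is an assumption-laden illustration (its architecture, patch size, and labels are not the dissertation's); only the reported batch size of 16 and learning rate of 0.01 are carried over.

```python
# Illustrative single-channel CNN for labeling HRDEM patches as containing a
# drainage crossing or not; all architectural choices here are assumptions.
import torch
import torch.nn as nn

class DrainageCrossingCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(64, 2)          # crossing / no crossing

    def forward(self, dem_patch):             # dem_patch: (B, 1, H, W) normalized elevation
        return self.head(self.features(dem_patch).flatten(1))

model = DrainageCrossingCNN()
opt = torch.optim.SGD(model.parameters(), lr=0.01)      # learning rate from the reported setup
patches = torch.randn(16, 1, 128, 128)                  # one batch of 16 synthetic patches
labels = torch.randint(0, 2, (16,))
loss = nn.CrossEntropyLoss()(model(patches), labels)
loss.backward()
opt.step()
```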
115

Grounding deep models of visual data

Bargal, Sarah Adel 21 February 2019 (has links)
Deep models are state-of-the-art for many computer vision tasks, including object classification, action recognition, and captioning. As Artificial Intelligence systems that utilize deep models become ubiquitous, it is also becoming crucial to explain why they make certain decisions: grounding model decisions. In this thesis, we study: 1) Improving Model Classification. We show that by utilizing web action images along with videos in training for action recognition, significant performance boosts of convolutional models can be achieved. Without explicit grounding, labeled web action images tend to contain discriminative action poses, which highlight discriminative portions of a video's temporal progression. 2) Spatial Grounding. We visualize spatial evidence of deep model predictions using a discriminative top-down attention mechanism, called Excitation Backprop. We show how such visualizations are equally informative for correct and incorrect model predictions, and highlight the shift of focus when different training strategies are adopted. 3) Spatial Grounding for Improving Model Classification at Training Time. We propose a guided dropout regularizer for deep networks based on the evidence of a network prediction. This approach penalizes neurons that are most relevant for model prediction. By dropping such high-saliency neurons, the network is forced to learn alternative paths in order to maintain loss minimization. We demonstrate better generalization ability, increased utilization of network neurons, and higher resilience to network compression. 4) Spatial Grounding for Improving Model Classification at Test Time. We propose Guided Zoom, an approach that utilizes spatial grounding to make more informed predictions at test time. Guided Zoom compares the evidence used to make a preliminary decision with the evidence of correctly classified training examples to ensure evidence-prediction consistency, and otherwise refines the prediction. We demonstrate accuracy gains for fine-grained classification. 5) Spatiotemporal Grounding. We devise a formulation that simultaneously grounds evidence in space and time, in a single pass, using top-down saliency. We visualize the spatiotemporal cues that contribute to a deep recurrent neural network's classification/captioning output. Based on these spatiotemporal cues, we are able to localize segments within a video that correspond to a specific action, or a phrase from a caption, without explicitly optimizing/training for these tasks.
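The guided-dropout idea in item 3 can be sketched roughly as follows, assuming PyTorch and using gradient × activation as a stand-in for the Excitation Backprop relevance used in the thesis; the drop fraction, layer choice, and names are illustrative.

```python
# Sketch of saliency-guided dropout: zero out the units most relevant to the
# current prediction so the network must learn alternative evidence paths.
import torch
import torch.nn as nn

def guided_dropout_mask(activations, relevance, drop_frac=0.1):
    """Zero the top `drop_frac` most relevant units per sample."""
    k = max(1, int(drop_frac * relevance.shape[1]))
    idx = relevance.topk(k, dim=1).indices            # highest-relevance units
    mask = torch.ones_like(activations)
    mask.scatter_(1, idx, 0.0)
    return activations * mask

feat = nn.Sequential(nn.Linear(128, 256), nn.ReLU())
head = nn.Linear(256, 10)
x, y = torch.randn(32, 128), torch.randint(0, 10, (32,))

a = feat(x)                                            # intermediate activations (B, 256)
score = head(a).gather(1, y[:, None]).sum()            # class scores to explain
grad = torch.autograd.grad(score, a, retain_graph=True)[0]
relevance = (grad * a).detach()                        # gradient x activation saliency proxy
logits = head(guided_dropout_mask(a, relevance))       # forward pass with dropped units
nn.CrossEntropyLoss()(logits, y).backward()            # train against the harder evidence
```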
116

Handling Label Sparsity: A Semi-Unsupervised Approach / Über den Umgang mit spärlichen Annotationen: Ein halb-unüberwachter Ansatz

Davidson, Padraig January 2025 (has links) (PDF)
In the realm of computer science, the acquisition of large-scale datasets has been greatly facilitated by advancements in sensor technologies and the ubiquity of the Internet. This influx of data has led to the widespread adoption of Deep Learning (DL) techniques for various data processing tasks. However, achieving high-quality predictions using DL approaches is challenging, primarily due to two factors: the substantial annotation requirements and the need for comprehensive coverage of data aspects, which are often impractical or impossible to meet. In practical scenarios, there is a demand for an approach that can classify data with minimal annotated samples for a defined number of categories while also identifying unknown and unlabeled categories. For instance, while a beekeeper can collect vast amounts of (temporal) high-resolution data from an apiary, creating a comprehensive dataset for honey bees is challenging due to their complexity. Similarly, in laboratory settings, inferring cell types from tissue samples can be challenging due to the presence of unknown or rare cell types, despite the availability of large and accurate reference datasets. Recently, Semi-Unsupervised Learning (SuSL) has emerged as a promising paradigm that leverages both labeled and unlabeled data to enhance DL model performance. However, the current Multi-Layer Perceptron (MLP) implementation of SuSL has limitations when applied to real-world processes or organisms, such as honey bees. This thesis addresses these limitations by proposing novel methods to enhance the practical applicability of SuSL: (1A) Establishing a generalized encoder-decoder framework for SuSL, enabling seamless integration with various neural network architectures without manual feature extraction. (1B) Investigating critical parameters and dataset characteristics to understand their impact on prediction tasks in a standardized setup. (2) Addressing dataset imbalance in both labeled and unlabeled subsets. (3) Modeling diverse tabular input data while maintaining a compact representation within the network for downstream tasks. (4) Efficiently processing raw sensor data, including time-dependent sensor relations, without manual feature extraction. In addition to theoretical developments and benchmarking results, this thesis demonstrates the real-world applicability of the proposed SuSL implementation through two case studies in biology: (1) Analysis of a billion-tick dataset from ground-level observations of honey bees: from backend to explorative data analysis to SuSL usage. (2) Application of SuSL on newly recorded and unlabeled data using a large reference atlas for cell type annotation: from data integration to SuSL application. Through these contributions, this research aims to advance the practical utility of SuSL in real-world scenarios, particularly in domains with complex and heterogeneous data.
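As a rough illustration of the setting only (not the thesis's encoder-decoder framework), the sketch below trains a classifier whose output layer has heads for the labeled classes plus spare heads for categories that only ever appear unlabeled; labeled samples receive cross-entropy, while unlabeled samples receive an entropy term over all heads so that novel categories can claim the spare outputs. All sizes, the loss weighting, and the data are assumptions.

```python
# Toy semi-unsupervised classification sketch: known classes + spare "unknown" heads.
import torch
import torch.nn as nn
import torch.nn.functional as F

n_known, n_unknown = 5, 3            # labeled classes vs. spare heads for novel classes
net = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, n_known + n_unknown))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

x_lab, y_lab = torch.randn(32, 20), torch.randint(0, n_known, (32,))
x_unl = torch.randn(128, 20)         # may contain categories never seen labeled

for _ in range(100):
    opt.zero_grad()
    sup = F.cross_entropy(net(x_lab), y_lab)                 # supervised part (known classes)
    p = F.softmax(net(x_unl), dim=1)
    ent = -(p * p.clamp_min(1e-8).log()).sum(1).mean()       # encourage confident assignments
    (sup + 0.5 * ent).backward()                             # weighting is an assumption
    opt.step()
```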
117

Learning to Recognize Actions with Weak Supervision / Reconnaissance d'actions de manière faiblement supervisée

Chesneau, Nicolas 23 February 2018 (has links)
With the rapid growth of digital video content, automatic video understanding has become an increasingly important task. Video understanding spans several applications such as web-video content analysis, autonomous vehicles, and human-machine interfaces (e.g., Kinect). This thesis makes contributions addressing two major problems in video understanding: webly-supervised action detection and human action localization. Webly-supervised action recognition aims to learn actions from video content on the internet, with no additional supervision. We propose a novel approach in this context, which leverages the synergy between visual video data and the associated textual metadata to learn event classifiers with no manual annotations. Specifically, we first collect a video dataset with queries constructed automatically from textual descriptions of events, prune irrelevant videos using text and video data, and then learn the corresponding event classifiers. We show the importance of both main steps of our method, i.e., query generation and data pruning, with quantitative results. We evaluate this approach in the challenging setting where no manually annotated training set is available, i.e., EK0 in the TrecVid challenge, and show state-of-the-art results on the MED 2011 and 2013 datasets. In the second part of the thesis, we focus on human action localization, which involves recognizing actions that occur in a video, such as "drinking" or "phoning", as well as their spatial and temporal extent. We propose a new person-centric framework for action localization that tracks people in videos and extracts full-body human tubes, i.e., spatio-temporal regions localizing actions, even in the case of occlusions or truncations. The motivation is two-fold. First, it allows us to handle occlusions and camera viewpoint changes when localizing people, as it infers full-body localization. Second, it provides a better reference grid for extracting action information than standard human tubes, i.e., tubes which frame visible parts only. This is achieved by training a novel human part detector that scores visible parts while regressing full-body bounding boxes, even when they lie outside the frame. The core of our method is a convolutional neural network which learns part proposals specific to certain body parts. These are then combined to detect people robustly in each frame. Our tracking algorithm connects the image detections temporally to extract full-body human tubes. We evaluate our new tube extraction method on a recent challenging dataset, DALY, showing state-of-the-art results.
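The tube-building step at the end of that pipeline can be sketched in isolation: per-frame person boxes are linked across time by overlap to form spatio-temporal tubes. The part-proposal network and full-body regression are not reproduced here; the threshold and names below are illustrative.

```python
# Linking per-frame person boxes into spatio-temporal "tubes" via IoU (sketch only).
def iou(a, b):
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-8)

def link_tubes(detections_per_frame, iou_thresh=0.3):
    """detections_per_frame: list over frames of [x1, y1, x2, y2] boxes."""
    tubes = []                                       # each tube: list of (frame_idx, box)
    for t, boxes in enumerate(detections_per_frame):
        for box in boxes:
            # extend the live tube whose last box overlaps this one the most
            best, best_iou = None, iou_thresh
            for tube in tubes:
                last_t, last_box = tube[-1]
                if last_t == t - 1 and iou(last_box, box) > best_iou:
                    best, best_iou = tube, iou(last_box, box)
            if best is not None:
                best.append((t, box))
            else:
                tubes.append([(t, box)])             # start a new tube
    return tubes

frames = [[[10, 10, 50, 100]], [[12, 11, 52, 101]], [[15, 12, 55, 103]]]
print(len(link_tubes(frames)), "tube(s)")            # expect 1 tube spanning 3 frames
```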
118

Coherent Nonlinear Raman Microscopy and the Applications of Deep Learning & Pattern Recognition Methods to the Extraction of Quantitative Information

Abdolghader, Pedram 16 September 2021 (has links)
Coherent Raman microscopy (CRM) is a powerful nonlinear optical imaging technique based on contrast via Raman-active molecular vibrations. CRM has been used in domains ranging from biology to medicine to geology in order to provide quick, sensitive, chemical-specific, and label-free 3D sectioning of samples. The Raman contrast is usually obtained by combining two ultrashort-pulse input beams, known as Pump and Stokes, whose frequency difference is adjusted to the Raman vibrational frequency of interest. CRM can be used in conjunction with other imaging modalities such as second harmonic generation, fluorescence, and third harmonic generation microscopy, resulting in a multimodal imaging technique that can capture a massive amount of data. Two fundamental elements are crucial in CRM. The first is a laser source that is broadband, stable, rapidly tunable, and low in noise. The second is a strategy for image analysis that can handle the denoising and material classification issues arising in the relatively large datasets obtained by CRM techniques. Stimulated Raman Scattering (SRS) microscopy is a subset of CRM techniques, and this thesis is devoted entirely to it. Although Raman imaging based on a single vibrational resonance can be useful, non-resonant background signals and overlapping bands in SRS can impair contrast and chemical specificity. Tuning over the Raman spectrum is therefore crucial for target identification, which necessitates the use of a broadband and easily tunable laser source. Although supercontinuum generation in a nonlinear fibre could provide extended tunability, it is typically not viable for some CRM techniques, specifically SRS microscopy. Signal acquisition schemes in SRS microscopy are focused primarily on detecting a tiny modulation transfer between the Pump and Stokes input laser beams; as a result, a very low-noise source is required. The most important component in hyperspectral SRS microscopy is therefore a low-noise broadband laser source. The second problem in SRS microscopy is the poor signal-to-noise ratio (SNR) encountered in some situations, which can be caused by low target-molecule concentrations in the sample and/or scattering losses in deep-tissue imaging, for example. Furthermore, in some SRS imaging applications (e.g., in vivo), fast imaging, low input laser power, or short integration time is required to prevent sample photodamage, typically resulting in low-contrast (low-SNR) images. Low-SNR images also typically suffer from poorly resolved spectral features. Various denoising techniques have been used to date for image improvement; however, to enable averaging, these often require either prior knowledge of the noise source or numerous images of the same field of view (under better observing conditions), which may result in images with lower spatial-spectral resolution. Sample segmentation, or converting a 2D hyperspectral image into a chemical concentration map, is also a critical issue in SRS microscopy. Raman vibrational bands in heterogeneous samples are likely to overlap, necessitating the use of chemometrics to separate and segment them. We address the aforementioned issues in SRS microscopy in this thesis. To begin, we demonstrate that a supercontinuum light source based on all-normal-dispersion (ANDi) fibres generates a stable broadband output with very low incremental source noise. The ANDi fibre output's noise power spectral density was evaluated, and its applicability in hyperspectral SRS microscopy applications was shown.
This demonstrates the potential of ANDi fibre sources for broadband SRS imaging as well as their ease of implementation. Second, we demonstrate a deep learning model and unsupervised machine-learning algorithm for rapid and automated denoising and segmentation of SRS images, based on a ten-layer convolutional autoencoder: UHRED (Unsupervised Hyperspectral Resolution Enhancement and De-noising). UHRED is trained in an unsupervised manner using only a single ("one-shot") hyperspectral image, with no requirement for training on high-quality (ground-truth) labelled datasets or images.
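A rough, assumption-laden sketch of an autoencoder of this flavour is shown below: a small convolutional encoder-decoder fit to a single noisy hyperspectral stack, whose reconstruction serves as the denoised output. The depth, channel counts, loss, and training length are illustrative, not the published UHRED architecture.

```python
# One-shot convolutional autoencoder for a hyperspectral stack (illustrative only).
import torch
import torch.nn as nn

class HyperspectralAE(nn.Module):
    def __init__(self, n_bands=32):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(n_bands, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(),
        )
        self.dec = nn.Sequential(
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, n_bands, 3, padding=1),
        )

    def forward(self, cube):            # cube: (1, bands, H, W), spectral axis as channels
        return self.dec(self.enc(cube))

noisy = torch.randn(1, 32, 64, 64)      # stand-in for one noisy SRS hyperspectral stack
model = HyperspectralAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(200):                    # "one-shot": fit to this single image only
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(noisy), noisy)
    loss.backward()
    opt.step()
denoised = model(noisy).detach()        # reconstruction used as the denoised stack
```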
119

Semantic Segmentation For Free Drive-able Space Estimation

Gallagher, Eric 02 October 2020 (has links)
Autonomous vehicles need precise information about the drive-able space in order to navigate safely. In recent years, deep learning and semantic segmentation have attracted intense research; it is a rapidly evolving field that continues to provide excellent results, and deep learning has emerged as a powerful tool in many applications. The aim of this study is to develop a deep learning system to estimate the free drive-able space. Building on state-of-the-art deep learning techniques, semantic segmentation is used to replace the need for highly accurate maps, which are expensive to license. Free drive-able space is defined as the drive-able space on the correct side of the road that can be reached without a collision with another road user or pedestrian. A state-of-the-art deep network is trained with a custom dataset in order to learn complex driving decisions. Motivated by good results, further deep learning techniques are applied to measure distance from monocular images. The findings demonstrate the power of deep learning techniques in complex driving decisions. The results also indicate the economic and technical feasibility of semantic segmentation over expensive high-definition maps.
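A hedged sketch of the inference side is given below, assuming PyTorch/torchvision: a generic pretrained DeepLabV3 model produces per-pixel class labels, from which a crude free-space mask is taken. The model choice, class index, and input are placeholders; the thesis's custom-trained network and dataset are not reproduced.

```python
# Generic semantic-segmentation inference sketch; the "drive-able" class index is a placeholder.
import torch
from torchvision import models, transforms
from PIL import Image

model = models.segmentation.deeplabv3_resnet50(weights="DEFAULT").eval()
prep = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = Image.new("RGB", (640, 480))                  # substitute a real road frame here
with torch.no_grad():
    out = model(prep(img).unsqueeze(0))["out"][0]   # (classes, H, W) logits
labels = out.argmax(0)                              # per-pixel class map
free_space = labels == 0                            # placeholder index for "drive-able"
print(free_space.float().mean().item(), "fraction of pixels flagged drive-able")
```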
120

FROM SEEING BETTER TO UNDERSTANDING BETTER: DEEP LEARNING FOR MODERN COMPUTER VISION APPLICATIONS

Tianqi Guo (12890459) 17 June 2022 (has links)
In this dissertation, we document a few of our recent attempts at bridging the gap between fast-evolving deep learning research and the vast industry needs for dealing with computer vision challenges. More specifically, we developed novel deep-learning-based techniques for the following application-driven computer vision challenges: image super-resolution with quality restoration, motion estimation by optical flow, object detection for shape reconstruction, and object segmentation for motion tracking. These four topics cover the computer vision hierarchy from the low level, where digital images are processed to restore missing information for better human perception, to the middle level, where certain objects of interest are recognized and their motions analyzed, and finally to the high level, where the scene captured in the video footage is interpreted for further analysis. In the process of building whole-package, ready-to-deploy solutions, we center our efforts on designing and training the most suitable convolutional neural networks for the particular computer vision problem at hand. Complementary procedures for data collection, data annotation, post-processing of network outputs tailored for specific application needs, and deployment details are also discussed where necessary. We hope our work demonstrates the applicability and versatility of convolutional neural networks for real-world computer vision tasks on a broad spectrum, from seeing better to understanding better.
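As one generic example from the lowest level of that hierarchy (not the dissertation's models), the sketch below shows an SRCNN-style network that refines a bicubically upscaled frame; all layer sizes are illustrative.

```python
# Tiny SRCNN-style super-resolution network: residual refinement of a bicubic upscale.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinySRNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 64, 9, padding=4), nn.ReLU(),
            nn.Conv2d(64, 32, 1), nn.ReLU(),
            nn.Conv2d(32, 3, 5, padding=2),
        )

    def forward(self, lr_image, scale=2):
        up = F.interpolate(lr_image, scale_factor=scale, mode="bicubic",
                           align_corners=False)
        return up + self.body(up)          # residual correction of the upscaled frame

net = TinySRNet()
low_res = torch.rand(1, 3, 64, 64)
print(net(low_res).shape)                  # torch.Size([1, 3, 128, 128])
```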
