Global ETD Search

111	Camera-based Recovery of Cardiovascular Signals from Unconstrained Face Videos Using an Attention Network Deshpande, Yogesh Rajan 22 June 2023 This work addresses the problem of recovering the morphology of blood volume pulse (BVP) information from a video of a person's face. Video-based remote plethysmography methods have shown promising results in estimating vital signs such as heart rate and breathing rate. However, recovering the instantaneous pulse rate signals is still a challenge for the community. This is because most previous methods concentrate on capturing the temporal average of the cardiovascular signals. In contrast, we present an approach in which BVP signals are extracted with a focus on the recovery of the signal morphology as a generalized form for the computation of physiological metrics. We also place emphasis on allowing natural movements by the subject. Furthermore, our system is capable of extracting individual BVP instances with sufficient signal detail to facilitate candidate re-identification. These improvements have resulted in part from the incorporation of a robust skin-detection module into the overall imaging-based photoplethysmography (iPPG) framework. We present extensive experimental results using the challenging UBFC-Phys dataset and the well-known COHFACE dataset. The source code is available at https://github.com/yogeshd21/CVPM-2023-iPPG-Paper. / Master of Science / In this work we study and recover human health-related metrics and the physiological signals from which such metrics are derived. A well-known form of physiological signal is the ECG (electrocardiogram) signal; for our research we work with BVP (blood volume pulse) signals. We propose a deep-learning-based model for non-invasive retrieval of human physiological signals from human face videos.
Most state-of-the-art models recover averaged cardiac-pulse metrics such as heart rate and breathing rate without focusing on the details of the recovered physiological signal. Physiological signals like BVP have details such as the systolic peak, diastolic peak, and dicrotic notch, and these signals also have applications in domains such as the study of human mental health and of emotional stimuli. Hence, with this work we focus on retrieving the morphology of such physiological signals and present quantitative as well as qualitative results. An efficient attention-based deep learning model is presented, and the scope for re-identification using the retrieved signals is also explored. Together with significant components such as a skin detection model, our proposed architecture also shows better performance than state-of-the-art models on two very challenging datasets, UBFC-Phys and COHFACE. The source code is available at https://github.com/yogeshd21/CVPM-2023-iPPG-Paper. Deep Learning Remote Photoplethysmograph (iPPG) Biometrics
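The distinction drawn above between an averaged heart rate and an instantaneous pulse rate can be illustrated with a minimal sketch. It assumes that the systolic peak times of a recovered BVP signal are already available in seconds; the function name and the input format are hypothetical and are not taken from the thesis or its repository.

```python
from typing import List


def instantaneous_pulse_rate(peak_times_s: List[float]) -> List[float]:
    """Return one pulse-rate value (beats per minute) per inter-beat interval.

    peak_times_s is assumed to hold the times, in seconds, of consecutive
    systolic peaks of a BVP signal (a hypothetical input format).
    """
    rates = []
    for earlier, later in zip(peak_times_s, peak_times_s[1:]):
        interval = later - earlier  # inter-beat interval in seconds
        if interval <= 0:
            raise ValueError("peak times must be strictly increasing")
        rates.append(60.0 / interval)  # 60 seconds per minute
    return rates
```

An averaged heart rate would collapse these values into a single number, whereas the per-interval values retain beat-to-beat variation, which is only recoverable if the signal morphology (and therefore the peak positions) is preserved.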
112	Grounding deep models of visual data Bargal, Sarah Adel 21 February 2019 Deep models are state-of-the-art for many computer vision tasks including object classification, action recognition, and captioning. As Artificial Intelligence systems that utilize deep models become ubiquitous, it is also becoming crucial to explain why they make certain decisions, that is, to ground model decisions. In this thesis, we study: 1) Improving Model Classification. We show that by utilizing web action images along with videos in training for action recognition, significant performance boosts of convolutional models can be achieved. Without explicit grounding, labeled web action images tend to contain discriminative action poses, which highlight discriminative portions of a video’s temporal progression. 2) Spatial Grounding. We visualize spatial evidence of deep model predictions using a discriminative top-down attention mechanism, called Excitation Backprop. We show how such visualizations are equally informative for correct and incorrect model predictions, and highlight the shift of focus when different training strategies are adopted. 3) Spatial Grounding for Improving Model Classification at Training Time. We propose a guided dropout regularizer for deep networks based on the evidence of a network prediction. This approach penalizes neurons that are most relevant for model prediction. By dropping such high-saliency neurons, the network is forced to learn alternative paths in order to maintain loss minimization. We demonstrate better generalization ability, an increased utilization of network neurons, and a higher resilience to network compression. 4) Spatial Grounding for Improving Model Classification at Test Time. We propose Guided Zoom, an approach that utilizes spatial grounding to make more informed predictions at test time.
Guided Zoom compares the evidence used to make a preliminary decision with the evidence of correctly classified training examples to ensure evidence-prediction consistency, and otherwise refines the prediction. We demonstrate accuracy gains for fine-grained classification. 5) Spatiotemporal Grounding. We devise a formulation that simultaneously grounds evidence in space and time, in a single pass, using top-down saliency. We visualize the spatiotemporal cues that contribute to a deep recurrent neural network’s classification/captioning output. Based on these spatiotemporal cues, we are able to localize segments within a video that correspond with a specific action, or phrase from a caption, without explicitly optimizing/training for these tasks. Computer science Deep learning Grounding Visual data
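The idea in item 3 of the abstract above, dropping the neurons with the highest saliency, can be sketched as follows. This is a simplified stand-in and not the thesis's regularizer: the saliency scores are assumed to be given (the thesis derives evidence with Excitation Backprop), and the deterministic top-k rule and the function name are assumptions made for illustration.

```python
from typing import List


def drop_most_salient(activations: List[float],
                      saliency: List[float],
                      k: int) -> List[float]:
    """Zero the k activations whose saliency scores are highest.

    Assumption: saliency[i] scores how relevant neuron i was to the
    current prediction; how it is computed is outside this sketch.
    """
    if len(activations) != len(saliency):
        raise ValueError("activations and saliency must have equal length")
    if k < 0:
        raise ValueError("k must be non-negative")
    # Indices ordered from most to least salient.
    ranked = sorted(range(len(saliency)),
                    key=lambda i: saliency[i], reverse=True)
    dropped = set(ranked[:k])
    return [0.0 if i in dropped else a for i, a in enumerate(activations)]
```

With the most relevant units silenced during training, the remaining units must carry the prediction, which is the mechanism the abstract credits for the wider utilization of network neurons.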
113	Improving Hydrologic Connectivity Delineation Based on High-Resolution DEMs and Geospatial Artificial Intelligence Wu, Di 01 August 2024 Hydrological connectivity is crucial for understanding and managing water resources, ecological processes, and landscape dynamics. High-Resolution Digital Elevation Models (HRDEMs) derived from Light Detection and Ranging (LiDAR) data offer unprecedented detail and accuracy in representing terrain features, making them invaluable for mapping hydrological networks and analyzing landscape connectivity. However, challenges persist in accurately delineating flow networks, identifying flow barriers, and optimizing computational efficiency, particularly in large-scale applications and complex terrain conditions. This dissertation addresses these challenges through a comprehensive exploration of advanced techniques in deep learning, spatial analysis, and parallel computing. A common practice is to breach the elevation of roads near drainage crossing locations to remove flow barriers; however, data on these locations are often unavailable or of variable quality. Thus, developing a reliable drainage crossing dataset is essential to improve the HRDEMs for hydrographic delineation. Deep learning models were developed for classifying images that contain the locations of flow barriers. Based on HRDEMs and aerial orthophotos, different Convolutional Neural Network (CNN) models were trained and compared to assess their effectiveness in image classification in four different watersheds across the U.S. Midwest. The results show that most deep learning models can consistently achieve over 90% accuracy. The CNN model with a batch size of 16, a learning rate of 0.01, 100 epochs, and the HRDEM as the sole input feature exhibits the best performance, with 93% accuracy. Adding aerial orthophotos and their derived spectral indices has no significant effect on, or even worsens, the model’s accuracy.
Transferability assessments across geographic regions show the promising potential of the best-fit model for broader applications, albeit with varying accuracies influenced by hydrography complexity. Based on identified drainage crossing locations, Drainage Barrier Processing (DBP), such as HRDEM excavation, is employed to remove the flow barriers. However, there is a gap in quantitatively assessing the impact of DBP on HRDEM-derived flowlines, especially at finer scales. HRDEM-derived flowlines generated with different flow direction algorithms were evaluated by developing a framework to measure the effects of flow barrier removal. The results show that the primary factor influencing flowline quality is the presence of flow accumulation artifacts. Quality issues also stem from differences between natural and artificial flow paths, unrealistic flowlines in flat areas, complex canal networks, and ephemeral drainageways. Notably, the improvement achieved by DBP is demonstrated to be more than 6%, showcasing its efficacy in reducing the impact of flow barriers on hydrologic connectivity. To overcome the computational intensity and speed up data processing, the efficiency of parallel computing techniques for GeoAI and hydrological modeling was evaluated. The performance of CPU parallel processing on High-Performance Computing (HPC) systems was compared with serial processing on desktop computers and with processing on Graphics Processing Units (GPUs). Results demonstrated substantial performance enhancements with GPU processing, particularly in accelerating computationally intensive tasks such as deep learning-based feature detection and hydrological modeling. However, efficiency trends exhibit nonlinear patterns influenced by factors such as communication overhead, task distribution, and resource contention. In summary, this dissertation presents a GeoAI-Hydro framework that significantly advances the quality of hydrological connectivity modeling.
By integrating deep learning for accurate flow barrier identification, employing DBP to enhance flowline quality, and utilizing parallel computing to address computational demands, the framework offers a robust solution for high-quality hydrological network mapping and analysis. It paves the way for contributions to more effective water resource management, ecological conservation, and landscape planning. Deep learning GeoAI HRDEM HPC Hydrography LiDAR
114	Learning to Recognize Actions with Weak Supervision / Reconnaissance d'actions de manière faiblement supervisée Chesneau, Nicolas 23 February 2018 With the rapid growth of digital video content, automatic video understanding has become an increasingly important task. Video understanding spans several applications such as web-video content analysis, autonomous vehicles, and human-machine interfaces (e.g., Kinect). This thesis makes contributions addressing two major problems in video understanding: webly-supervised action detection and human action localization. Webly-supervised action recognition aims to learn actions from video content on the internet, with no additional supervision. We propose a novel approach in this context, which leverages the synergy between visual video data and the associated textual metadata to learn event classifiers with no manual annotations. Specifically, we first collect a video dataset with queries constructed automatically from textual descriptions of events, prune irrelevant videos using text and video data, and then learn the corresponding event classifiers. We show the importance of both main steps of our method, i.e., query generation and data pruning, with quantitative results. We evaluate this approach in the challenging setting where no manually annotated training set is available, i.e., EK0 in the TrecVid challenge, and show state-of-the-art results on the MED 2011 and 2013 datasets. In the second part of the thesis, we focus on human action localization, which involves recognizing actions that occur in a video, such as "drinking" or "phoning", as well as their spatial and temporal extent. We propose a new person-centric framework for action localization that tracks people in videos and extracts full-body human tubes, i.e., spatio-temporal regions localizing actions, even in the case of occlusions or truncations. The motivation is two-fold. First, it allows us to handle occlusions and camera viewpoint changes when localizing people, as it infers full-body localization. Second, it provides a better reference grid for extracting action information than standard human tubes, i.e., tubes which frame visible parts only. This is achieved by training a novel human part detector that scores visible parts while regressing full-body bounding boxes, even when they lie outside the frame. The core of our method is a convolutional neural network which learns part proposals specific to certain body parts. These are then combined to detect people robustly in each frame. Our tracking algorithm connects the image detections temporally to extract full-body human tubes. We evaluate our new tube extraction method on a recent challenging dataset, DALY, showing state-of-the-art results. Attributes Recognition Machine learning Optimization Statistics Deep learning 510
115	Coherent Nonlinear Raman Microscopy and the Applications of Deep Learning & Pattern Recognition Methods to the Extraction of Quantitative Information Abdolghader, Pedram 16 September 2021 Coherent Raman microscopy (CRM) is a powerful nonlinear optical imaging technique based on contrast via Raman-active molecular vibrations. CRM has been used in domains ranging from biology to medicine to geology in order to provide quick, sensitive, chemical-specific, and label-free 3D sectioning of samples. The Raman contrast is usually obtained by combining two ultrashort pulse input beams, known as Pump and Stokes, whose frequency difference is adjusted to the Raman vibrational frequency of interest. CRM can be used in conjunction with other imaging modalities such as second harmonic generation, fluorescence, and third harmonic generation microscopy, resulting in a multimodal imaging technique that can capture a massive amount of data. Two fundamental elements are crucial in CRM. First, a laser source which is broadband, stable, rapidly tunable, and low in noise. Second, a strategy for image analysis that can handle denoising and material classification issues in the relatively large datasets obtained by CRM techniques. Stimulated Raman Scattering (SRS) microscopy is a subset of CRM techniques, and this thesis is devoted entirely to it. Although Raman imaging based on a single vibrational resonance can be useful, non-resonant background signals and overlapping bands in SRS can impair contrast and chemical specificity. Tuning over the Raman spectrum is therefore crucial for target identification, which necessitates the use of a broadband and easily tunable laser source. Although supercontinuum generation in a nonlinear fibre could provide extended tunability, it is typically not viable for some CRM techniques, specifically in SRS microscopy.
Signal acquisition schemes in SRS microscopy are focused primarily on detecting a tiny modulation transfer between the Pump and Stokes input laser beams. As a result, a very low-noise source is required. The primary and most important component in hyperspectral SRS microscopy is a low-noise broadband laser source. The second problem in SRS microscopy is a poor signal-to-noise ratio (SNR) in some situations, which can be caused, for example, by low target-molecule concentrations in the sample and/or scattering losses in deep-tissue imaging. Furthermore, in some SRS imaging applications (e.g., in vivo), fast imaging, low input laser power, or short integration time is required to prevent sample photodamage, typically resulting in low-contrast (low-SNR) images. Low-SNR images also typically suffer from poorly resolved spectral features. Various de-noising techniques have been used to date in image improvement. However, to enable averaging, these often require either previous knowledge of the noise source or numerous images of the same field of view (under better observing conditions), which may result in the image having lower spatial-spectral resolution. Sample segmentation, or converting a 2D hyperspectral image to a chemical concentration map, is also a critical issue in SRS microscopy. Raman vibrational bands in heterogeneous samples are likely to overlap, necessitating the use of chemometrics to separate and segment them. We address the aforementioned issues in SRS microscopy in this thesis. To begin, we demonstrate that a supercontinuum light source based on all normal dispersion (ANDi) fibres generates a stable broadband output with very low incremental source noise. The ANDi fibre output's noise power spectral density was evaluated, and its applicability in hyperspectral SRS microscopy applications was shown. This demonstrates the potential of ANDi fibre sources for broadband SRS imaging as well as their ease of implementation.
Second, we demonstrate a deep learning neural net model and unsupervised machine-learning algorithm for rapid and automated de-noising and segmentation of SRS images based on a ten-layer convolutional autoencoder: UHRED (Unsupervised Hyperspectral Resolution Enhancement and De-noising). UHRED is trained in an unsupervised manner using only a single (“one-shot”) hyperspectral image, with no requirements for training on high quality (ground truth) labelled data sets or images. SRS microscopy Nonlinear microscopy Fibre Laser source Deep learning Machine Learning Pattern Recognitions by deep learning
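The abstract above states that the Pump and Stokes frequency difference is tuned to the Raman vibrational frequency of interest. A minimal sketch of that relationship, expressed in wavenumbers, follows. The function name and the example wavelengths are illustrative assumptions and are not values reported in the thesis.

```python
def raman_shift_cm1(pump_nm: float, stokes_nm: float) -> float:
    """Return the Pump-Stokes frequency difference as a Raman shift in cm^-1.

    A wavelength in nanometres converts to a wavenumber in cm^-1 as
    1e7 / wavelength_nm, so the shift is the difference of the two wavenumbers.
    """
    if pump_nm <= 0 or stokes_nm <= 0:
        raise ValueError("wavelengths must be positive")
    return 1e7 / pump_nm - 1e7 / stokes_nm


# Illustrative wavelengths only (assumed, not from the thesis).
shift = raman_shift_cm1(800.0, 1040.0)
```

Tuning either beam changes the shift, which is why the abstract calls for a broadband, easily tunable source when scanning across a Raman spectrum.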
116	Semantic Segmentation For Free Drive-able Space Estimation Gallagher, Eric 02 October 2020 Autonomous vehicles need precise information about the drive-able space in order to navigate safely. In recent years, deep learning and semantic segmentation have attracted intense research. It is a rapidly evolving field that continues to provide excellent results. Research has shown that deep learning is emerging as a powerful tool in many applications. The aim of this study is to develop a deep learning system to estimate the free drive-able space. Building on state-of-the-art deep learning techniques, semantic segmentation will be used to replace the need for highly accurate maps, which are expensive to license. Free drive-able space is defined as the drive-able space on the correct side of the road that can be reached without a collision with another road user or pedestrian. A state-of-the-art deep network will be trained with a custom dataset in order to learn complex driving decisions. Motivated by good results, further deep learning techniques will be applied to measure distance from monocular images. The findings demonstrate the power of deep learning techniques in complex driving decisions. The results also indicate the economic and technical feasibility of semantic segmentation over expensive high-definition maps. ddc:004 Deep learning
117	FROM SEEING BETTER TO UNDERSTANDING BETTER: DEEP LEARNING FOR MODERN COMPUTER VISION APPLICATIONS Tianqi Guo (12890459) 17 June 2022 In this dissertation, we document a few of our recent attempts at bridging the gap between fast-evolving deep learning research and the vast industry needs for dealing with computer vision challenges. More specifically, we developed novel deep-learning-based techniques for the following application-driven computer vision challenges: image super-resolution with quality restoration, motion estimation by optical flow, object detection for shape reconstruction, and object segmentation for motion tracking. Those four topics cover the computer vision hierarchy from the low level, where digital images are processed to restore missing information for better human perception, to the middle level, where certain objects of interest are recognized and their motions are analyzed, and finally to the high level, where the scene captured in the video footage is interpreted for further analysis. In the process of building the whole package of ready-to-deploy solutions, we center our efforts on designing and training the most suitable convolutional neural networks for the particular computer vision problem at hand. Complementary procedures for data collection, data annotation, post-processing of network outputs tailored for specific application needs, and deployment details will also be discussed where necessary. We hope our work demonstrates the applicability and versatility of convolutional neural networks for real-world computer vision tasks on a broad spectrum, from seeing better to understanding better. Computer vision Deep learning cell tracking cell segmentation super-resolution
118	A SYSTEMATIC STUDY OF SPARSE DEEP LEARNING WITH DIFFERENT PENALTIES Xinlin Tao (13143465) 25 April 2023 Deep learning has been the driving force behind many successful data science achievements. However, the deep neural network (DNN) that forms the basis of deep learning is often over-parameterized, leading to training, prediction, and interpretation challenges. To address this issue, it is common practice to apply an appropriate penalty to each connection weight, limiting its magnitude. This approach is equivalent to imposing a prior distribution on each connection weight from a Bayesian perspective. This project offers a systematic investigation into the selection of the penalty function or prior distribution. Specifically, under the general theoretical framework of posterior consistency, we prove that consistent sparse deep learning can be achieved with a variety of penalty functions or prior distributions. Examples include amenable regularization penalties (such as MCP and SCAD), spike-and-slab priors (such as mixture Gaussian distribution and mixture Laplace distribution), and polynomial decayed priors (such as the Student-t distribution). Our theory is supported by numerical results. Deep learning Statistics not elsewhere classified Network Compression Sparse Deep Learning Nonlinear feature selection Posterior Consistency
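The two penalties named in the abstract above, MCP and SCAD, have standard closed forms for a single weight. The sketch below writes them out in their commonly cited textbook forms. The parameter names lam, gamma, and a, and the default a = 3.7, follow common usage in the literature and are assumptions here; the thesis may use a different parameterization.

```python
def mcp_penalty(weight: float, lam: float, gamma: float) -> float:
    """Minimax concave penalty (MCP) for one connection weight."""
    if lam < 0 or gamma <= 0:
        raise ValueError("lam must be non-negative and gamma positive")
    t = abs(weight)
    if t <= gamma * lam:
        # Quadratically tapered region near zero.
        return lam * t - t * t / (2.0 * gamma)
    # Constant beyond gamma * lam: large weights are not penalized further.
    return 0.5 * gamma * lam * lam


def scad_penalty(weight: float, lam: float, a: float = 3.7) -> float:
    """Smoothly clipped absolute deviation (SCAD) penalty for one weight."""
    if lam < 0 or a <= 2:
        raise ValueError("lam must be non-negative and a greater than 2")
    t = abs(weight)
    if t <= lam:
        return lam * t  # behaves like the L1 penalty for small weights
    if t <= a * lam:
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    return lam * lam * (a + 1.0) / 2.0  # constant for large weights
```

Both penalties grow like an L1 penalty near zero and then flatten, so small weights are pushed toward zero while large weights are left nearly unshrunk, which is the property that makes them candidates for sparse networks.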
119	TOWARD ROBUST AND INTERPRETABLE GRAPH AND IMAGE REPRESENTATION LEARNING Juan Shu (14816524) 27 April 2023 Although deep learning models continue to gain momentum, their robustness and interpretability have always been a big concern because of the complexity of such models. In this dissertation, we studied several topics on the robustness and interpretability of convolutional neural networks (CNNs) and graph neural networks (GNNs). We first identified the structural problem of deep convolutional neural networks that leads to adversarial examples and defined DNN uncertainty regions. We also argued that the generalization error, the large-sample theoretical guarantee established for DNNs, cannot adequately capture the phenomenon of adversarial examples. Secondly, we studied dropout in GNNs, which is an effective regularization approach to prevent overfitting. In contrast to a CNN, a GNN usually has a shallow structure because a deep GNN normally sees performance degradation. We studied different dropout schemes and established a connection between dropout and over-smoothing in GNNs. We therefore developed layer-wise compensation dropout, which allows a GNN to go deeper without suffering performance degradation. We also developed a heteroscedastic dropout which effectively deals with a large number of missing node features due to heavy experimental noise or privacy issues. Lastly, we studied the interpretability of graph neural networks. We developed a self-interpretable GNN structure that denoises useless edges or features, leading to a more efficient message-passing process. The GNN prediction and explanation accuracy were boosted compared with baseline models. Deep learning Applied statistics Dropout Interpretable Machine Learning Robustness Missing data
120	Analysis and Comparison of Distributed Training Techniques for Deep Neural Networks in a Dynamic Environment / Analys och jämförelse av distribuerade tränings tekniker för djupa neurala nätverk i en dynamisk miljö Gebremeskel, Ermias January 2018 Deep learning models' prediction accuracy tends to improve with the size of the model. The implication is that the amount of computational power needed to train models is continuously increasing. Distributed deep learning training tries to address this issue by spreading the computational load onto several devices. In theory, distributing computation onto N devices should give an N-fold performance improvement. Yet in reality the improvement is rarely N-fold, due to communication and other overheads. This thesis studies the communication overhead incurred when distributing deep learning training. Hopsworks is a platform designed for data science. The purpose of this work is to explore a feasible way of deploying distributed deep learning training on a shared cluster and to analyze the performance of different distributed deep learning algorithms to be used on this platform. The findings of this study show that bandwidth-optimal communication algorithms like ring all-reduce scale better than many-to-one communication algorithms like parameter server, but are less fault tolerant. Furthermore, the system usage statistics collected revealed a network bottleneck when training is distributed on multiple machines. This work also shows that it is possible to run MPI on a Hadoop cluster by building a prototype that orchestrates resource allocation, deployment, and monitoring of MPI-based training jobs. Even though the experiments did not cover different cluster configurations, the results are still relevant in showing what considerations need to be made when distributing deep learning training. deep learning large scale distributed deep learning data parallelism Computer Sciences
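The scaling difference between ring all-reduce and a parameter server reported above can be illustrated with a simple communication-volume model. The sketch assumes synchronous training, one full gradient exchange per step, and a single unsharded parameter server; these assumptions and the function names are for illustration and do not describe the experimental setup of the thesis.

```python
def ring_allreduce_bytes_per_worker(n_workers: int, model_bytes: float) -> float:
    """Bytes each worker sends in one ring all-reduce of a model_bytes buffer.

    The ring algorithm runs a reduce-scatter followed by an all-gather; each
    phase moves (n_workers - 1) chunks of size model_bytes / n_workers.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    if n_workers == 1:
        return 0.0  # nothing to exchange with a single worker
    return 2.0 * (n_workers - 1) / n_workers * model_bytes


def parameter_server_bytes_at_server(n_workers: int, model_bytes: float) -> float:
    """Bytes crossing a single parameter server's link in one step.

    Assumption: every worker pushes full gradients and pulls full parameters.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")
    return 2.0 * n_workers * model_bytes
```

Under this model the per-worker traffic of the ring stays below twice the model size regardless of the number of workers, while the traffic at a single parameter server grows linearly with the number of workers, which is consistent with the many-to-one bottleneck the abstract describes.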

Search results