Global ETD Search

481	Deep Convolutional Neural Networks for Real-Time Single Frame Monocular Depth Estimation Schennings, Jacob January 2017 Vision-based active safety systems have become increasingly common in modern vehicles, where they estimate the depth of objects ahead for autonomous driving (AD) and advanced driver-assistance systems (ADAS). In this thesis a lightweight deep convolutional neural network performing real-time depth estimation on single monocular images is implemented and evaluated. Many of the vision-based automatic brake systems in modern vehicles only detect pre-trained object types such as pedestrians and vehicles. These systems fail to detect general objects such as road debris and roadside obstacles. In stereo vision systems the problem is resolved by calculating a disparity image from the stereo image pair to extract depth information. The distance to an object can also be determined using radar and LiDAR systems. By using this depth information the system performs necessary actions to avoid collisions with objects that are determined to be too close. However, these systems are also more expensive than a regular mono camera system and are therefore not very common in the average consumer car. By implementing robust depth estimation in mono vision systems the benefits from active safety systems could be utilized by a larger segment of the vehicle fleet. This could drastically reduce traffic accidents related to human error and possibly save many lives. The network architecture evaluated in this thesis is more lightweight than other CNN architectures previously used for monocular depth estimation. The proposed architecture is therefore preferable for computationally constrained systems. The network solves a supervised regression problem during the training procedure in order to produce a pixel-wise depth estimation map. 
The network was trained using a sparse ground truth image with spatially incoherent and discontinuous data and output a dense spatially coherent and continuous depth map prediction. The spatially incoherent ground truth posed a problem of discontinuity that was addressed by a masked loss function with regularization. The network was able to predict a dense depth estimation on the KITTI dataset with close to state-of-the-art performance. deep learning machine learning mono vision system lightweight CNN convolutional neural network depth estimation lidar kitti vehicle camera mono camera camera real-time real time ad autonomous driving adas advanced driver assistance systems mono depth computer vision regression pixel-wise pixel wise object detection general object detection pedestrian detection vehicle detection supervised learning supervised tensorflow python keras opencv autoliv
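The masked loss mentioned in the abstract of entry 481 can be illustrated with a minimal NumPy sketch. This is not the thesis's loss function. It assumes that a ground-truth value of 0 marks a pixel without a LiDAR measurement, that the data term is a mean squared error over valid pixels, and that the regularizer is a first-difference smoothness penalty. The name `masked_depth_loss` and the parameter `smooth_weight` are hypothetical.

```python
import numpy as np

def masked_depth_loss(pred, target, smooth_weight=0.1):
    """Mean squared error over valid pixels plus a smoothness penalty.

    Assumptions (illustrative, not taken from the thesis):
    - target == 0 marks pixels that have no ground-truth depth;
    - the regularizer penalizes first differences of the prediction;
    - inputs are 2-D arrays of at least 2 x 2 pixels.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)

    # Only pixels with a measurement contribute to the data term.
    mask = target > 0
    n_valid = mask.sum()
    if n_valid == 0:
        data_term = 0.0
    else:
        data_term = np.sum(((pred - target) ** 2)[mask]) / n_valid

    # The smoothness term acts on every pixel, labelled or not, which is
    # what pushes the network towards a dense, continuous depth map.
    dx = np.diff(pred, axis=1)
    dy = np.diff(pred, axis=0)
    smooth_term = np.mean(np.abs(dx)) + np.mean(np.abs(dy))

    return data_term + smooth_weight * smooth_term
```

With `smooth_weight=0`, changing the prediction at an unlabelled pixel leaves the loss unchanged, which is the point of the mask: the sparse ground truth never penalizes pixels it says nothing about.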
482	Incorporating Scene Depth in Discriminative Correlation Filters for Visual Tracking Stynsberg, John January 2018 Visual tracking is a computer vision problem where the task is to follow a target through a video sequence. Tracking has many important real-world applications in several fields such as autonomous vehicles and robot vision. Since visual tracking does not assume any prior knowledge about the target, it faces different challenges such as occlusion, appearance change, background clutter and scale change. In this thesis we try to improve the capabilities of tracking frameworks using discriminative correlation filters by incorporating scene depth information. We utilize scene depth information on three main levels. First, we use raw depth information to segment the target from its surroundings enabling occlusion detection and scale estimation. Second, we investigate different visual features calculated from depth data to decide which features are good at encoding geometric information available solely in depth data. Third, we investigate handling missing data in the depth maps using a modified version of the normalized convolution framework. Finally, we introduce a novel approach for parameter search using genetic algorithms to find the best hyperparameters for our tracking framework. Experiments show that depth data can be used to estimate scale changes and handle occlusions. In addition, visual features calculated from depth are more representative when they are combined with color features. It is also shown that utilizing normalized convolution improves the overall performance in some cases. Lastly, the usage of genetic algorithms for hyperparameter search leads to accuracy gains as well as some insights on the performance of different components within the framework. 
Tracking Visual Deep Learning Machine Learning CNN Convolutional Neural Network Unsupervised Learning Clustering Genetic Algorithms Features Visual features Channel Coding RGBD Scene Depth Map Kinect Discriminative Correlation Filters SRDCF DCF Spatial Spatially Regularized Hyperparameter Search Occlusion Detection Handling Kalman Filters Normalized Convolution Bayesian Gaussian Mixture Scale Estimation Conjugate Gradient Linkoping Sweden Visuell Följning Särdrag Djupa Faltningsnätverk Maskininlärning Djup Inlärning Genetiska Algoritmer Klustring Djup RGBD Linköping Sverige
483	Medical image captioning based on Deep Architectures / Medicinsk bild textning baserad på Djupa arkitekturer Moschovis, Georgios January 2022 Diagnostic Captioning is described as “the automatic generation of a diagnostic text from a set of medical images of a patient collected during an examination” [59] and it can assist inexperienced doctors and radiologists to reduce clinical errors or help experienced professionals increase their productivity. In this context, tools that would help medical doctors produce higher quality reports in less time could be of high interest for medical imaging departments, as well as significantly impact deep learning research within the biomedical domain, which makes it particularly interesting for both industry practitioners and researchers. In this work, we attempted to develop Diagnostic Captioning systems, based on novel Deep Learning approaches, to investigate to what extent Neural Networks are capable of performing medical image tagging, as well as automatically generating a diagnostic text from a set of medical images. Towards this objective, the first step is concept detection, which boils down to predicting the relevant tags for X-ray images, whereas the ultimate goal is caption generation. To this end, we further participated in the ImageCLEFmedical 2022 evaluation campaign, addressing both the concept detection and the caption prediction tasks by developing baselines based on Deep Neural Networks, including image encoders, classifiers and text generators, in order to get a quantitative measure of my proposed architectures’ performance [28]. My contribution to the evaluation campaign, as part of this work and on behalf of NeuralDynamicsLab¹ group at KTH Royal Institute of Technology, within the school of Electrical Engineering and Computer Science, ranked 4th in the former and 5th in the latter task [55, 68] among 12 groups included within the top-10 best performing submissions in both tasks. 
/ Diagnostisk textning avser automatisk generering från en diagnostisk text från en uppsättning medicinska bilder av en patient som samlats in under en undersökning och den kan hjälpa oerfarna läkare och radiologer, minska kliniska fel eller hjälpa erfarna yrkesmän att producera diagnostiska rapporter snabbare [59]. Därför kan verktyg som skulle hjälpa läkare och radiologer att producera rapporter av högre kvalitet på kortare tid vara av stort intresse för medicinska bildbehandlingsavdelningar, såväl som leda till inverkan på forskning om djupinlärning, vilket gör den domänen särskilt intressant för personer som är involverade i den biomedicinska industrin och djupinlärningsforskare. I detta arbete var mitt huvudmål att utveckla system för diagnostisk textning, med hjälp av nya tillvägagångssätt som används inom djupinlärning, för att undersöka i vilken utsträckning automatisk generering av en diagnostisk text från en uppsättning medicinska bilder är möjlig. Mot detta mål är det första steget konceptdetektering som går ut på att förutsäga relevanta taggar för röntgenbilder, medan slutmålet är bildtextgenerering. Jag deltog i ImageCLEF Medical 2022-utvärderingskampanjen, där jag deltog med att ta itu med både konceptdetektering och bildtextförutsägelse för att få ett kvantitativt mått på prestandan för mina föreslagna arkitekturer [28]. Mitt bidrag, där jag representerade forskargruppen NeuralDynamicsLab² , där jag arbetade som ledande forskningsingenjör, placerade sig på 4:e plats i den förra och 5:e i den senare uppgiften [55, 68] bland 12 grupper som ingår bland de 10 bästa bidragen i båda uppgifterna. 
Artificial Neural Networks Deep Learning Speech and language technology Natural Language Processing (NLP) Deep networks Generative deep networks Convolutional neural networks (CNN) Text generation Information retrieval Diagnostic captioning Image captioning concept prediction classification image encoders transformers Encoder-Decoder architecture abstractive summarization Neurala nätverk Djup inlärning Tal-och språkteknologi naturlig språkbehandling djup neurala nätverk generativa djupa nätverk konvolutionella neurala nätverk Textgenerering Informationssökning Diagnostisk textning Bildtextning konceptförutsägelse klassificering bildkodare transformatorer kodaravkodararkitektur abstrakt sammanfattning Computer and Information Sciences Data- och informationsvetenskap
484	Towards meaningful and data-efficient learning : exploring GAN losses, improving few-shot benchmarks, and multimodal video captioning Huang, Gabriel 09 1900 (has links) Ces dernières années, le domaine de l’apprentissage profond a connu des progrès énormes dans des applications allant de la génération d’images, détection d’objets, modélisation du langage à la réponse aux questions visuelles. Les approches classiques telles que l’apprentissage supervisé nécessitent de grandes quantités de données étiquetées et spécifiques à la tâches. Cependant, celles-ci sont parfois coûteuses, peu pratiques, ou trop longues à collecter. La modélisation efficace en données, qui comprend des techniques comme l’apprentissage few-shot (à partir de peu d’exemples) et l’apprentissage self-supervised (auto-supervisé), tentent de remédier au manque de données spécifiques à la tâche en exploitant de grandes quantités de données plus “générales”. Les progrès de l’apprentissage profond, et en particulier de l’apprentissage few-shot, s’appuient sur les benchmarks (suites d’évaluation), les métriques d’évaluation et les jeux de données, car ceux-ci sont utilisés pour tester et départager différentes méthodes sur des tâches précises, et identifier l’état de l’art. Cependant, du fait qu’il s’agit de versions idéalisées de la tâche à résoudre, les benchmarks sont rarement équivalents à la tâche originelle, et peuvent avoir plusieurs limitations qui entravent leur rôle de sélection des directions de recherche les plus prometteuses. De plus, la définition de métriques d’évaluation pertinentes peut être difficile, en particulier dans le cas de sorties structurées et en haute dimension, telles que des images, de l’audio, de la parole ou encore du texte. 
Cette thèse discute des limites et des perspectives des benchmarks existants, des fonctions de coût (training losses) et des métriques d’évaluation (evaluation metrics), en mettant l’accent sur la modélisation générative - les Réseaux Antagonistes Génératifs (GANs) en particulier - et la modélisation efficace des données, qui comprend l’apprentissage few-shot et self-supervised. La première contribution est une discussion de la tâche de modélisation générative, suivie d’une exploration des propriétés théoriques et empiriques des fonctions de coût des GANs. La deuxième contribution est une discussion sur la limitation des few-shot classification benchmarks, certains ne nécessitant pas de généralisation à de nouvelles sémantiques de classe pour être résolus, et la proposition d’une méthode de base pour les résoudre sans étiquettes en phase de testing. La troisième contribution est une revue sur les méthodes few-shot et self-supervised de détection d’objets , qui souligne les limites et directions de recherche prometteuses. Enfin, la quatrième contribution est une méthode efficace en données pour la description de vidéo qui exploite des jeux de données texte et vidéo non supervisés. / In recent years, the field of deep learning has seen tremendous progress for applications ranging from image generation, object detection, language modeling, to visual question answering. Classic approaches such as supervised learning require large amounts of task-specific and labeled data, which may be too expensive, time-consuming, or impractical to collect. Data-efficient methods, such as few-shot and self-supervised learning, attempt to deal with the limited availability of task-specific data by leveraging large amounts of general data. Progress in deep learning, and in particular, few-shot learning, is largely driven by the relevant benchmarks, evaluation metrics, and datasets. They are used to test and compare different methods on a given task, and determine the state-of-the-art. 
However, due to being idealized versions of the task to solve, benchmarks are rarely equivalent to the original task, and can have several limitations which hinder their role of identifying the most promising research directions. Moreover, defining meaningful evaluation metrics can be challenging, especially in the case of high-dimensional and structured outputs, such as images, audio, speech, or text. This thesis discusses the limitations and perspectives of existing benchmarks, training losses, and evaluation metrics, with a focus on generative modeling—Generative Adversarial Networks (GANs) in particular—and data-efficient modeling, which includes few-shot and self-supervised learning. The first contribution is a discussion of the generative modeling task, followed by an exploration of theoretical and empirical properties of the GAN loss. The second contribution is a discussion of a limitation of few-shot classification benchmarks, which is that they may not require class semantic generalization to be solved, and the proposal of a baseline method for solving them without test-time labels. The third contribution is a survey of few-shot and self-supervised object detection, which points out the limitations and promising future research for the field. Finally, the fourth contribution is a data-efficient method for video captioning, which leverages unsupervised text and video datasets, and explores several multimodal pretraining strategies. 
self-supervised learning few-shot classification few-shot object detection low-data learning object detection instance segmentation representation learning residual network visual transformer Faster R-CNN DETR parametric adversarial divergence generative adversarial network variational auto-encoder maximum-likelihood structured prediction optimal discriminator mutual information implicit generative model multimodal pretraining dense video captioning cross-attention YouCook2 HowTo-100M Youtube-8M Recipe-1M Pascal VOC MSCOCO LVIS mutual information neural estimation apprentissage auto-supervisé classification few-shot détection d'objets few-shot apprentissage efficace en données segmentation en instances apprentissage de représentation réseau résiduel transformer visual divergences antagonistes paramétriques auto-encodeur variationnel maximum de vraisemblance prédiction structurée discriminateur optimal information mutuelle modèle génératif implicite pré-apprentissage multi-modal description dense de vidéo attention croisée ResNet ViT GAN VAE MINE
485	Arquitectura de un sistema de geo-visualización espacio-temporal de actividad delictiva, basada en el análisis masivo de datos, aplicada a sistemas de información de comando y control (C2IS) Salcedo González, Mayra Liliana 03 April 2023 (has links) [ES] La presente tesis doctoral propone la arquitectura de un sistema de Geo-visualización Espaciotemporal de actividad delictiva y criminal, para ser aplicada a Sistemas de Comando y Control (C2S) específicamente dentro de sus Sistemas de Información de Comando y Control (C2IS). El sistema de Geo-visualización Espaciotemporal se basa en el análisis masivo de datos reales de actividad delictiva, proporcionado por la Policía Nacional Colombiana (PONAL) y está compuesto por dos aplicaciones diferentes: la primera permite al usuario geo-visualizar espaciotemporalmente de forma dinámica, las concentraciones, tendencias y patrones de movilidad de esta actividad dentro de la extensión de área geográfica y el rango de fechas y horas que se precise, lo cual permite al usuario realizar análisis e interpretaciones y tomar decisiones estratégicas de acción más acertadas; la segunda aplicación permite al usuario geo-visualizar espaciotemporalmente las predicciones de la actividad delictiva en periodos continuos y cortos a modo de tiempo real, esto también dentro de la extensión de área geográfica y el rango de fechas y horas de elección del usuario. Para estas predicciones se usaron técnicas clásicas y técnicas de Machine Learning (incluido el Deep Learning), adecuadas para el pronóstico en multiparalelo de varios pasos de series temporales multivariantes con datos escasos. 
Las dos aplicaciones del sistema, cuyo desarrollo se muestra en esta tesis, están realizadas con métodos novedosos que permitieron lograr estos objetivos de efectividad a la hora de detectar el volumen y los patrones y tendencias en el desplazamiento de dicha actividad, mejorando así la conciencia situacional, la proyección futura y la agilidad y eficiencia en los procesos de toma de decisiones, particularmente en la gestión de los recursos destinados a la disuasión, prevención y control del delito, lo cual contribuye a los objetivos de ciudad segura y por consiguiente de ciudad inteligente, dentro de arquitecturas de Sistemas de Comando y Control (C2S) como en el caso de los Centros de Comando y Control de Seguridad Ciudadana de la PONAL. / [CA] Aquesta tesi doctoral proposa l'arquitectura d'un sistema de Geo-visualització Espaitemporal d'activitat delictiva i criminal, per ser aplicada a Sistemes de Comandament i Control (C2S) específicament dins dels seus Sistemes d'informació de Comandament i Control (C2IS). El sistema de Geo-visualització Espaitemporal es basa en l'anàlisi massiva de dades reals d'activitat delictiva, proporcionada per la Policia Nacional Colombiana (PONAL) i està composta per dues aplicacions diferents: la primera permet a l'usuari geo-visualitzar espaitemporalment de forma dinàmica, les concentracions, les tendències i els patrons de mobilitat d'aquesta activitat dins de l'extensió d'àrea geogràfica i el rang de dates i hores que calgui, la qual cosa permet a l'usuari fer anàlisis i interpretacions i prendre decisions estratègiques d'acció més encertades; la segona aplicació permet a l'usuari geovisualitzar espaciotemporalment les prediccions de l'activitat delictiva en períodes continus i curts a mode de temps real, això també dins l'extensió d'àrea geogràfica i el rang de dates i hores d'elecció de l'usuari. 
Per a aquestes prediccions es van usar tècniques clàssiques i tècniques de Machine Learning (inclòs el Deep Learning), adequades per al pronòstic en multiparal·lel de diversos passos de sèries temporals multivariants amb dades escasses. Les dues aplicacions del sistema, el desenvolupament de les quals es mostra en aquesta tesi, estan realitzades amb mètodes nous que van permetre assolir aquests objectius d'efectivitat a l'hora de detectar el volum i els patrons i les tendències en el desplaçament d'aquesta activitat, millorant així la consciència situacional , la projecció futura i l'agilitat i eficiència en els processos de presa de decisions, particularment en la gestió dels recursos destinats a la dissuasió, prevenció i control del delicte, la qual cosa contribueix als objectius de ciutat segura i per tant de ciutat intel·ligent , dins arquitectures de Sistemes de Comandament i Control (C2S) com en el cas dels Centres de Comandament i Control de Seguretat Ciutadana de la PONAL. / [EN] This doctoral thesis proposes the architecture of a Spatiotemporal Geo-visualization system of criminal activity, to be applied to Command and Control Systems (C2S) specifically within their Command and Control Information Systems (C2IS). 
The Spatiotemporal Geo-visualization system is based on the massive analysis of real data of criminal activity, provided by the Colombian National Police (PONAL), and is made up of two different applications: the first allows the user to dynamically geo-visualize, in space and time, the concentrations, trends and mobility patterns of this activity within the extent of the geographic area and the range of dates and times required, which allows the user to carry out analyses and interpretations and make more accurate strategic action decisions; the second application allows the user to geo-visualize, in space and time, the predictions of criminal activity over short, continuous periods approximating real time, also within the extent of the geographic area and the range of dates and times of the user's choice. For these predictions, classical techniques and Machine Learning techniques (including Deep Learning) were used, suitable for multistep multiparallel forecasting of multivariate time series with sparse data. The two applications of the system, whose development is shown in this thesis, are built with innovative methods that made it possible to achieve these effectiveness objectives when detecting the volume, patterns and trends in the movement of this activity, thus improving situational awareness, future projection, and the agility and efficiency of decision-making processes, particularly in the management of the resources allocated to the deterrence, prevention and control of crime, which contributes to the objectives of a safe city and therefore of a smart city, within architectures of Command and Control Systems (C2S) as in the case of the Citizen Security Command and Control Centers of the PONAL. / Salcedo González, ML. (2023). Arquitectura de un sistema de geo-visualización espacio-temporal de actividad delictiva, basada en el análisis masivo de datos, aplicada a sistemas de información de comando y control (C2IS) [Tesis doctoral]. 
Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/192685 Temporal data space Predictive geo-visualisation Command and control systems (C2S) Efficiency improvement Decision-making Future projection Real-time systems Sparse data Multivariate time series Forecasting of criminal activity Smart city Dynamic geo-visualisation of data Situational awareness Mobility of criminal activity Vector autoregressive (VAR) Multilayer perceptron (MLP) Espacio temporal de datos Geo-visualización predictiva Sistemas de mando y control (C2S) Mejora de la eficiencia Toma de decisiones Proyección futura Sistemas de tiempo real Datos dispersos Pronóstico multipaso y multiparalelo Series temporales multivariantes Pronóstico de actividad delictiva Ciudad segura Ciudad inteligente Geo-visualización dinámica de datos Conciencia situacional Movilidad de actividad delictiva INGENIERÍA TELEMÁTICA
486	Evaluation of Target Tracking Using Multiple Sensors and Non-Causal Algorithms Vestin, Albin, Strandberg, Gustav January 2019 Today, the main research field for the automotive industry is to find solutions for active safety. In order to perceive the surrounding environment, tracking nearby traffic objects plays an important role. Validation of the tracking performance is often done in staged traffic scenarios, where additional sensors, mounted on the vehicles, are used to obtain their true positions and velocities. The difficulty of evaluating the tracking performance complicates its development. An alternative approach studied in this thesis is to record sequences and use non-causal algorithms, such as smoothing, instead of filtering to estimate the true target states. With this method, validation data for online, causal, target tracking algorithms can be obtained for all traffic scenarios without the need for extra sensors. We investigate how non-causal algorithms affect the target tracking performance using multiple sensors and dynamic models of different complexity. This is done to evaluate real-time methods against estimates obtained from non-causal filtering. Two different measurement units, a monocular camera and a LIDAR sensor, and two dynamic models are evaluated and compared using both causal and non-causal methods. The system is tested in two single object scenarios where ground truth is available and in three multi object scenarios without ground truth. Results from the two single object scenarios show that tracking using only a monocular camera performs poorly since it is unable to measure the distance to objects. Here, a complementary LIDAR sensor improves the tracking performance significantly. The dynamic models are shown to have a small impact on the tracking performance, while the non-causal application gives a distinct improvement when tracking objects at large distances. 
Since the sequence can be reversed, the non-causal estimates are propagated from more certain states when the target is closer to the ego vehicle. For multiple object tracking, we find that correct associations between measurements and tracks are crucial for improving the tracking performance with non-causal algorithms. evaluation target tracking multiple sensors non-causal smoother smoothing tracking vehicle tracking camera lidar estimate estimation prediction vehicle dynamics sensor fusion real-time tracking extended kalman filter filter validation validation position estimation velocity estimation dynamic model model complexity multi object tracking multiple object tracking single object tracking data association tracking fundamentals iterated kalman filter track management gnn global nearest neighbour mahalanobis mahalanobis distance performance evaluation differential gps dgps roi ego several sensors sensors rmse root mean square error invertible motion anti-causal motion anti-causal tracking constant velocity gnn imu tfs two filter smoother ekf rts radar inertial measurement unit nonlinear nonlinear systems mono camera monocular camera noise model tracking performance fixed interval smoothing m/n logic centralized fusion non-causal object tracker car tracking car dynamics automotive active safety object tracking automotive industry thesis master reverse dynamics reverse tracking reverse sequence sequence tracking data propagation ground truth estimating ground truth additional sensors mounted sensors true estimates environment comparison algorithm independent targets overlapping measurements occluded track switch improve lower uncertainty more certain state process noise covariance sampling image sprt adas cnn cv pdf track target ego tracker tentative track observation online tracking offline tracking online offline recorded sequences robust self driving self-driving car traffic trajectory true state scenario scenarios future accurate output advanced driver 
assistance systems non-linear complex noise pedestrian truck bus maneuvering vehicles processed measurement frame state correction probability density function tuning likelihood transition measurement motion model recursion gaussian approximation distribution linear jacobian multiplicative noise ratio ad hoc ad hoc state space approach backward auction euclidean distance statistical threshold gating association margin normalize covariance matrix fusion confirmed rejected tentative history absolute error modular ego motion parameters variables logg hardware specification fused causal factorization independent uncorrelated transform moving rotation translation oncoming overtaking Control Engineering Reglerteknik
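The contrast between causal filtering and non-causal smoothing described in entry 486 can be sketched with a linear Kalman filter followed by a Rauch-Tung-Striebel (RTS) backward pass. This is a simplified illustration, not the thesis's tracker: it assumes a linear model with known matrices `F`, `H`, `Q` and `R`, whereas the keywords indicate that the thesis works with extended Kalman filters and nonlinear models. The function names are hypothetical.

```python
import numpy as np

def kalman_filter(zs, F, H, Q, R, x0, P0):
    """Causal forward pass; returns filtered and predicted moments."""
    n, d = len(zs), x0.shape[0]
    xf = np.zeros((n, d)); Pf = np.zeros((n, d, d))  # filtered
    xp = np.zeros((n, d)); Pp = np.zeros((n, d, d))  # predicted
    x, P = x0, P0
    for k, z in enumerate(zs):
        # Predict with the motion model.
        x = F @ x
        P = F @ P @ F.T + Q
        xp[k], Pp[k] = x, P
        # Correct with the measurement available at step k.
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ (z - H @ x)
        P = (np.eye(d) - K @ H) @ P
        xf[k], Pf[k] = x, P
    return xf, Pf, xp, Pp

def rts_smoother(xf, Pf, xp, Pp, F):
    """Non-causal backward pass over a recorded sequence."""
    xs, Ps = xf.copy(), Pf.copy()
    for k in range(len(xf) - 2, -1, -1):
        # The smoother gain passes information from step k+1 back to k.
        G = Pf[k] @ F.T @ np.linalg.inv(Pp[k + 1])
        xs[k] = xf[k] + G @ (xs[k + 1] - xp[k + 1])
        Ps[k] = Pf[k] + G @ (Ps[k + 1] - Pp[k + 1]) @ G.T
    return xs, Ps

# Usage: constant-velocity target observed through noisy positions
# (all numeric values are arbitrary illustration choices).
dt = 0.1
F = np.array([[1.0, dt], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q = 0.01 * np.eye(2)
R = np.array([[1.0]])
rng = np.random.default_rng(0)
zs = (0.5 * dt * np.arange(50) + rng.normal(0.0, 1.0, 50)).reshape(-1, 1)
xf, Pf, xp, Pp = kalman_filter(zs, F, H, Q, R, np.zeros(2), 10.0 * np.eye(2))
xs, Ps = rts_smoother(xf, Pf, xp, Pp, F)
```

At the last time step the smoothed and filtered estimates coincide, since no later measurements exist. At earlier steps the smoothed covariance is never larger than the filtered one, which is why offline smoothing can serve as reference data for an online tracker.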
