Global ETD Search

31	Predicting the future high-risk SARS-CoV-2 variants with deep learning Chen, NingNing 04 July 2022 SARS-CoV-2 has plagued the world since 2019 with the continuous emergence of new variants, resulting in repeated waves of outbreaks. Although countermeasures such as vaccination campaigns have been undertaken worldwide, the virus has mutated to escape the immune system, threatening public health. To win the race with the virus and ultimately end the pandemic, we have to stay one step ahead by predicting how SARS-CoV-2 might evolve and defeating it at the beginning of a new wave. Hence, we propose a deep learning-based framework that first builds a deep learning model to shape the fitness landscape of the virus and then uses a genetic algorithm to predict the high-risk variants that might appear in the future. By combining a pre-trained protein language model and structure modeling, the model is trained in a supervised way, predicting the viral transmissibility and the ability to escape eight antibodies simultaneously. The previous virus evolution trajectory can be largely recovered by our model, with high correlation to sampling time. Novel mutations predicted by our model show high antibody escape in in silico simulation and overlap with the mutations developed in previously infected patients. Overall, our scheme can provide insights into the evolution of SARS-CoV-2 and hopefully guide the development of vaccination and increase preparedness. SARS-CoV-2 deep learning
32	Estimation of Predictive Uncertainty in the Supervised Segmentation of Magnetic Resonance Imaging (MRI) Diffusion Images Using Deep Ensemble Learning / ESTIMATING PREDICTIVE UNCERTAINTY IN DEEP LEARNING SEGMENTATION FOR DIFFUSION MRI McCrindle, Brian January 2021 With the desired deployment of Artificial Intelligence (AI), concerns over whether AI can “communicate” why it has made its decisions are of particular importance. In this thesis, we utilize predictive entropy (PE) as a surrogate for predictive uncertainty and report it for various test-time conditions that alter the testing distribution. This is done to evaluate the potential for PE to indicate when users should trust or distrust model predictions under dataset shift or out-of-distribution (OOD) conditions, two scenarios that are prevalent in real-world settings. Specifically, we trained an ensemble of three 2D-UNet architectures to segment synthetically damaged regions in fractional anisotropy scalar maps, a widely used diffusion metric to indicate microstructural white-matter damage. Baseline ensemble statistics report that the true positive rate, false negative rate, false positive rate, true negative rate, Dice score, and precision are 0.91, 0.091, 0.23, 0.77, 0.85, and 0.80, respectively. Test-time PE was reported before and after the ensemble was exposed to increasing geometric distortions (OOD), adversarial examples (OOD), and decreasing signal-to-noise ratios (dataset shift). We observed that even though PE shows a strong negative correlation with model performance for increasing adversarial severity (ρAE = −1), this correlation is not seen under distortion or SNR conditions (ρD = −0.26, ρSNR = −0.30). 
However, the PE variability (PE-Std) between individual model predictions was shown to be a better indicator of uncertainty, as strong negative correlations between model performance and PE-Std were seen during geometric distortions and adversarial examples (ρD = −0.83, ρAE = −1). Unfortunately, PE fails to report large absolute uncertainties during these conditions, thus restricting the analysis to correlative relationships. Finally, determining an uncertainty threshold between “certain” and “uncertain” model predictions was seen to be heavily dependent on model calibration. For augmentation conditions close to the training distribution, a single threshold could be hypothesized. However, caution must be taken if such a technique is clinically applied, as model miscalibration could nullify such a threshold for samples far from the distribution. To ensure that PE or PE-Std could be used more broadly for uncertainty estimation, further work must be completed. / Thesis / Master of Applied Science (MASc) Deep Learning MRI mTBI Segmentation
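As a rough illustration of the two quantities named in the abstract above, the following minimal sketch computes an ensemble predictive entropy (PE) and the spread of per-member entropies (PE-Std) for a binary segmentation. The function names, the flat list-of-probabilities input layout, and the exact definition of PE-Std as a population standard deviation across members are assumptions made for this example; the thesis itself may define or aggregate them differently.

```python
import math
from statistics import mean, pstdev


def binary_entropy(p, eps=1e-12):
    """Shannon entropy (in nats) of a Bernoulli probability p."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def ensemble_pe(member_probs):
    """Mean per-pixel entropy of the ensemble-averaged foreground probability.

    member_probs: one list per ensemble member, each holding that member's
    foreground probability for every pixel (hypothetical layout).
    """
    n_pixels = len(member_probs[0])
    mean_probs = [mean(m[i] for m in member_probs) for i in range(n_pixels)]
    return mean(binary_entropy(p) for p in mean_probs)


def ensemble_pe_std(member_probs):
    """Standard deviation, across members, of each member's mean entropy."""
    per_member = [mean(binary_entropy(p) for p in m) for m in member_probs]
    return pstdev(per_member)


# Three hypothetical members that disagree on a two-pixel image.
probs = [[0.9, 0.2], [0.6, 0.1], [0.8, 0.4]]
print(ensemble_pe(probs), ensemble_pe_std(probs))
```

Under these assumptions, PE is largest when the averaged probability sits near 0.5, while PE-Std is zero whenever all members are equally confident, whether or not they agree on the label.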
33	Developing Deep Learning Tools in Earthquake Detection and Phase Picking Mai, Hao 31 August 2023 With the rapid growth of seismic data volumes, traditional automated processing methods, which have been in use for decades, face increasing challenges in handling these data, especially in noisy environments. Deep learning (DL) methods, due to their ability to handle large datasets and perform well in complex scenarios, offer promising solutions to these challenges. When I started my Ph.D. degree, although a sizeable number of researchers were beginning to explore the application of deep learning in seismology, almost no one was involved in the development of much-needed automated data annotation tools and deep learning training platforms for this field. In other rapidly evolving fields of artificial intelligence, such automated tools and platforms are often a prerequisite and critical to advancing the development of deep learning. Motivated by this gap, my Ph.D. research focuses on creating these essential tools and conducting critical investigations in the field of earthquake detection and phase picking using DL methods. The first research chapter introduces QuakeLabeler, an open-source Python toolbox that facilitates the efficient creation and management of seismic training datasets. This tool aims to address the laborious process of producing training labels in the vast amount of seismic data available today. Building on this foundational tool, the second research chapter presents Blockly Earthquake Transformer (BET), a deep learning platform that provides an interactive dashboard for efficient customization of deep learning phase pickers. BET aims to optimize the performance of seismic event detection and phase picking by allowing easy customization of model parameters and providing extensions for transfer learning and fine-tuning. 
The third and final research chapter investigates the performance of DL pickers by examining the effect of training data size and deployment settings on phase picking accuracy. This investigation provides insight into the optimal size of training datasets, the suitability of DL pickers for new target regions, and the impact of various factors on training and on model performance. Through the development of these tools and investigations, this thesis contributes to the application of DL in seismology, paving the way for more efficient seismic data processing, customizable model creation, and a better understanding of DL model performance in earthquake detection and phase-picking tasks. Earthquake Detection Seismology Deep Learning
34	Deep Learning on the Edge: Model Partitioning, Caching, and Compression Fang, Yihao January 2020 With the recent advancement in deep learning, there has been increasing interest in applying deep learning algorithms to mobile edge devices (e.g. wireless access points, mobile phones, and self-driving vehicles). Such devices are closer to end-users and data sources than cloud data centers, so deep learning on the edge has several merits: it can 1) reduce communication overhead (e.g. latency), 2) preserve data privacy (e.g. not leaking sensitive information to cloud service providers), and 3) promote autonomy without the need for continuous network connectivity. However, it also comes with a trade-off: deep learning on the edge often results in lower prediction accuracy or longer inference time. How to optimize such a trade-off has drawn a lot of attention among the machine learning and systems research communities. Those communities have explored three main directions to solve the problem: partitioning, caching, and compression. Deep learning model partitioning works in distributed and parallel computing by leveraging computation units (e.g. edge nodes and end devices) of different capabilities to achieve the best of both worlds (accuracy and latency), but the inference time of partitioning is nevertheless lower bounded by the smallest of inference times on edge nodes (or end devices). In contrast, model caching is not limited by such a lower bound. There are two trends of studies in caching: 1) caching the prediction results on the edge node or end device, and 2) caching a partition or less complex model on the edge node or end device. Caching the prediction results usually compromises accuracy, since a mapping function (e.g. a hash function) from the inputs to the cached results often cannot match a complex function given by a full-size neural network. 
On the other hand, caching a model's partition does not sacrifice accuracy, if we employ a proper partition selection policy. Model compression reduces deep learning model size by e.g. pruning neural network edges or quantizing network parameters. A reduced model has a smaller size and fewer operations to compute on the edge nodes or end device. However, compression usually sacrifices prediction accuracy in exchange for shorter inference time. In this thesis, our contributions to partitioning, caching, and compression are covered with experiments on state-of-the-art deep learning models. In partitioning, we propose TeamNet based on competitive and selective learning schemes. Experiments using MNIST and CIFAR-10 datasets show that on Raspberry Pi and Jetson TX2 (with TensorFlow), TeamNet shortens neural network inference as much as 53% without compromising predictive accuracy. In caching, we propose CacheNet, which caches low-complexity models on end devices and high-complexity (or full) models on edge or cloud servers. Experiments using CIFAR-10 and FVG have shown on Raspberry Pi, Jetson Nano, and Jetson TX2 (with TensorFlow Lite and NCNN), CacheNet is 58-217% faster than baseline approaches that run inference tasks on end devices or edge servers alone. In compression, we propose the logographic subword model for compression in machine translation. Experiments demonstrate that in the tasks of English-Chinese/Chinese-English translation, logographic subword model reduces training and inference time by 11-77% with Theano and Torch. We demonstrate our approaches are promising for applying deep learning models on the mobile edge. / Thesis / Doctor of Philosophy (PhD) / Edge artificial intelligence (EI) has attracted much attention in recent years. EI is a new computing paradigm where artificial intelligence (e.g. deep learning) algorithms are distributed among edge nodes and end devices of computer networks. 
There are many merits in EI such as shorter latency, better privacy, and autonomy. These advantages motivate us to contribute to EI by developing intelligent solutions including partitioning, caching, and compression. Deep Learning Edge Artificial Intelligence
35	End-to-end Optical Music Recognition Beyond Staff-Level Transcription Ríos-Vila, Antonio 04 July 2024 Optical Music Recognition (OMR) is a research field that studies how to computationally read the music notation in documents and store it in a structured digital format. Traditional OMR approaches are usually organized as a multi-stage pipeline: (i) image preprocessing, which addresses issues related to the scanning process and paper quality; (ii) symbol segmentation and classification, where the elements of the image are detected and labeled; (iii) music notation reconstruction, a postprocessing phase of the recognition process; and (iv) result encoding, where the recognized elements are stored in a suitable symbolic format. These systems achieve competitive recognition rates at the cost of relying on heuristics tailored to the cases for which they were designed. Consequently, scalability becomes a major limitation, since a new set of heuristics must be designed for each collection or notation type. Another drawback of these traditional approaches is the need for detailed labeling, often obtained manually. Because each symbol is recognized individually, the exact position of each one is required, along with its corresponding musical label. The integration of Deep Learning (DL) into the OMR field has marked a turning point toward the adoption of holistic, or end-to-end, systems. These systems, built on artificial intelligence and deep neural networks, treat the segmentation and classification of musical symbols as a unified process instead of splitting it into multiple discrete stages. 
This methodology allows feature extraction and classification to be learned simultaneously, eliminating the need to develop and tune specific procedures for each task. The key to this approach is the use of datasets composed of score images and their corresponding transcriptions, which removes the need to mark the exact position of each symbol. This advance significantly simplifies the music transcription process, since the features relevant for classification are learned directly from the data, without detailed manual labeling of individual elements. The end-to-end processing paradigm has been analyzed in recent research. These works assume that a dedicated preprocessing phase has already segmented the staves in the scores, and they focus on retrieving sequences of musical symbols from staff images. In this setting, Convolutional Recurrent Neural Networks (CRNN) are the most popular solution. In them, the convolutional component extracts meaningful features from the images, while the recurrent layers interpret these features as sequences of musical symbols. Current OMR results have shown high accuracy in transcribing music scores, even in the most complex cases. These advances make it possible to set more ambitious goals. One notable line of work is universal OMR. A universal music transcription system is one capable of transcribing the content of any music document. 
This means that, regardless of the characteristics and notation of the document, the model is able to transcribe it in a suitable notation and generate its digital version. Universal OMR is an ideal model for several reasons. The first is practical: it eases the work of end users, who currently need specific tools for each type of music score. Producing a universal transcriber would allow these programs to be consolidated into generic tools covering the full spectrum of user needs, reducing the cost of processing and maintaining music documents. From a scientific point of view, this technique would unlock the potential of machine learning models to read and interpret music documents, since they would do so from generic knowledge. This achievement would make it possible to address more complex tasks that need this information but go beyond it, such as detecting authorship patterns, estimating the difficulty of a score, or classifying scores by period. However, the state of the art in OMR cannot yet address this goal because of a series of limitations. This thesis proposes work that advances the state of the art in OMR toward that goal. First, it proposes contributions to complete OMR systems, which are not able to export their results in formats compatible with the most common musicological tools. Once a complete OMR system is obtained, it proposes work on Aligned Music Notation & Lyrics Transcription and on polyphony, relevant challenges that the literature has not addressed because of their difficulty. In this way, through adaptations of current systems, the state of the art on these topics is advanced. 
Finally, the thesis addresses segmentation-free systems for transcribing music pages, freeing OMR models from their sequential segment-then-transcribe structure. Specifically, the research focuses on the Sheet Music Transformer, a transcription model based on state-of-the-art technologies that obtains the transcription of a score directly from the image of its page. / This paper is part of the project I+D+i PID2020-118447RA-I00 (MultiScore), funded by MCIN/AEI/10.13039/501100011033. The first author is supported by grants ACIF/2021/356 and CIBEFP/2022/19 from the “Programa I+D+i de la Generalitat Valenciana”. Deep Learning Optical Music Recognition
36	Reevaluating the Ventral and Lateral Temporal Neural Pathways in Face Processing: Deep Learning Insights into Face Identity and Facial Expression Mechanisms Schwartz, Emily January 2024 Thesis advisor: Stefano Anzellotti / There has been much debate over how the functional organization of vision develops. Contemporary theories that are inspired by analyzing neural data with machine learning models have led to new insights in understanding brain organization. Given the evolutionary importance of face perception and the specialized mechanisms that have evolved to support evaluating it, examining faces offers a unique way to study a dedicated mechanism that shares much of its organization in ventral and lateral neural pathways with other social stimuli, and provides insight into a more general principle of the organization of social perception. According to a classical view of face perception (Bruce and Young, 1986; Haxby, Hoffman, and Gobbini, 2000), face identity and facial expression recognition are performed by separate neural substrates (ventral and lateral temporal face-selective regions, respectively). However, recent studies challenge this view, showing that expression valence can also be decoded from ventral regions (Skerry and Saxe, 2014; Li, Richardson, and Ghuman, 2019) and identity from lateral regions (Anzellotti and Caramazza, 2017). These recent findings have inspired the formulation of an alternative hypothesis. From a computational perspective, it may be possible to process face identity and facial expression jointly by disentangling information for the two properties. This hypothesis was tested using deep convolutional neural network (DCNN) models as a proof of principle. This is followed by evaluating the representational content of static face stimuli within ventral and lateral temporal face-selective regions using intracranial electroencephalography (iEEG). 
This is then extended to investigating the representation content of dynamic faces within these regions using functional magnetic resonance imaging (fMRI). The results reported here as well as the reviewed literature may help to support the reevaluation of the roles the ventral and lateral temporal neural pathways play in processing socially-relevant stimuli. / Thesis (PhD) — Boston College, 2024. / Submitted to: Boston College. Graduate School of Arts and Sciences. / Discipline: Psychology and Neuroscience. deep learning face perception neuroimaging
37	A Naturalistic Driving Study for Lane Change Detection and Personalization Lakhkar, Radhika Anandrao 05 January 2023 Driver Assistance and Autonomous Driving features are becoming nearly ubiquitous in new vehicles. The intent of the Driver Assistance features is to assist the driver in making safer decisions. The intent of Autonomous Driving features is to execute vehicle maneuvers, without human intervention, in a safe manner. The overall goal of Driver Assistance and Autonomous Driving features is to reduce accidents, injuries, and deaths with a comforting driving experience. However, different drivers can react differently to advanced automated driving technology. It is therefore important to consider and improve the adaptability of these advances based on driver behavior. In this thesis, a human-centric approach is adopted in order to provide an enriching driving experience. The thesis investigates the natural behavior of drivers when changing lanes in terms of preferences of vehicle kinematics parameters using a real-world driving dataset collected as part of the Second Strategic Highway Research Program (SHRP2). The SHRP2 Naturalistic Driving Study (NDS) set is mined for lane change events. This work develops a way to detect reliable lane changing instances from a huge NDS dataset with more than 5,400,000 data files. The lane changing instances are distinguished from noisy and erroneous data by using machine vision lane tracking system variables such as left lane marker probability and right lane marker probability. We have shown that detected lane changing instances can be validated using only vehicle kinematics data. Kinematic vehicle parameters such as vehicle speed, lateral displacement, lateral acceleration, steering wheel angle, and lane change duration are then extracted and examined from time series data to characterize these lane-changing instances for a given driver. 
We have shown how these vehicle kinematic parameters change and exhibit patterns during lane change maneuvers for a specific driver. The thesis shows the limitations of analyzing vehicle kinematic parameters separately and develops a novel metric, Lane Change Dynamic Score(LCDS) that shows the collective effect of these vehicle kinematic parameters. LCDS is used to classify each lane change and thereby different driving styles. / Master of Science / The current tendency of car manufacturers is to create vehicles that will offer the user the most comfortable ride possible. The user experience is given a lot of attention to ensure it is up to par. With technological advancements, we are moving closer to an era in which automobiles perform many functions autonomously. However, different drivers may react differently to highly automated driving technologies. Therefore, adapting to different driving styles is critical to increasing the acceptance of autonomous vehicle features. In this work, we examine one of the stressful maneuvers of lane changes. The analysis of various drivers' lane-changing behaviors and the value of personalization are the main subjects of this study based on actual driving scenarios. To achieve this, we have provided an algorithm to identify occurrences of lane-changing from real driving trip data files. Following that, we investigated parameters such as lane change duration, vehicle speed, displacement, acceleration, and steering wheel angle when changing lanes. We have demonstrated the patterns and changes in these vehicle kinematic characteristics that occur when a particular driver performs lane change operations. The thesis shows the limitations of analyzing vehicle kinematic parameters separately and develops a novel metric, Lane Change Dynamic Score(LCDS) that shows the collective effect of these vehicle kinematic parameters. LCDS is used to classify each lane change and thereby different driving styles. 
Lane Change Personalization Deep Learning
38	Deep Learning-Driven Modeling of Dynamic Acoustic Sensing in Biomimetic Soft Robotic Pinnae Chakrabarti, Sounak 02 October 2024 Bats possess remarkably sophisticated biosonar systems that seamlessly integrate the physical encoding of information through intricate ear motions with the neural extraction and processing of sensory information. While previous studies have endeavored to mimic the pinna (outer ear) dynamics of bats using fixed deformation patterns in biomimetic soft-robotic sonar heads, such physical approaches are inherently limited in their ability to comprehensively explore the vast actuation pattern space that may enable bats to adaptively sense across diverse environments and tasks. To overcome these limitations, this thesis presents the development of deep regression neural networks capable of predicting the beampattern (acoustic radiation pattern) of a soft-robotic pinna as a function of its actuator states. The pinna model geometry is derived from a tomographic scan of the right ear of the greater horseshoe bat (Rhinolophus ferrumequinum). Three virtual actuators are incorporated into this model to simulate a range of shape deformations. For each unique actuation pattern producing a distinct pinna shape conformation, the corresponding ultrasonic beampattern is numerically estimated using a frequency-domain boundary element method (BEM) simulation, providing ground truth data. Two neural network architectures, a multilayer perceptron (MLP) and a radial basis function network (RBFN) based on von Mises functions, were evaluated for their ability to accurately reproduce these numerical beampattern estimates as a function of the spherical coordinates azimuth and elevation. Both networks demonstrate comparably low errors in replicating the beampattern data. However, the MLP exhibits significantly higher computational efficiency, reducing training time by 7.4 seconds and inference time by 0.7 seconds compared to the RBFN. 
The superior computational performance of deep neural network models in inferring biomimetic pinna beampatterns from actuator states enables an extensive exploration of the vast actuation pattern space to identify pinna actuation patterns optimally suited for specific biosonar sensing tasks. This simulation-based approach provides a powerful framework for elucidating the functional principles underlying the dynamic shape adaptations observed in bat biosonar systems. / Master of Science / The aim is to understand how bats can dynamically change the shape of their outer ears (pinnae) to optimally detect sounds in different environments and for different tasks. Previous studies tried to mimic bat ear motions using fixed deformation patterns in robotic ear models, but this approach is limited. Instead, this thesis uses deep learning neural networks to predict how changing the shape of a robotic bat pinna model affects its acoustic beampattern (how it radiates and receives sound). The pinna geometry is based on a 3D scan of a greater horseshoe bat ear, with three virtual "actuators" to deform the shape. For many different actuator patterns deforming the pinna, the resulting beampattern is calculated using computer simulations. Neural networks (a multilayer perceptron and a radial basis function network) are trained on this data to accurately predict the beampattern from the actuator states. The multilayer perceptron network is found to be significantly more computationally efficient for this task. This neural network-based approach allows rapid exploration of the vast range of possible pinna actuations to identify optimal shapes for specific biosonar sensing tasks, shedding light on principles of dynamic ear shape control in bats. biosonar deep learning digital twin
39	Robot Motions that Mitigate Uncertainty Toubeh, Maymoonah 23 October 2024 This dissertation addresses the challenge of robot decision making in the presence of uncertainty, specifically focusing on robot motion decisions in the context of deep learning-based perception uncertainty. The first part of this dissertation introduces a risk-aware framework for path planning and assignment of multiple robots and multiple demands in unknown environments. The second part introduces a risk-aware motion model for searching for a target object in an unknown environment. To illustrate practical application, consider a situation such as disaster response or search-and-rescue, where it is imperative for ground vehicles to swiftly reach critical locations. Afterward, an agent deployed at a specified location must navigate inside a building to find a target, whether it is an object or a person. In the first problem, the terrain information is only available as an aerial georeferenced image frame. Semantic segmentation of the aerial images is performed using Bayesian deep learning techniques, creating a cost map for the safe navigation of ground robots. The proposed framework also accounts for risk at a further level, using conditional value at risk (CVaR), for making risk-aware assignments between the source and goal. When the robot reaches its destination, the second problem addresses the object search task using a proposed machine learning-based intelligent motion model. A comparison of various motion models, including a simple greedy baseline, indicates that the proposed model yields more risk-aware and robust results. All in all, considering uncertainty in both systems leads to demonstrably safer decisions. / Doctor of Philosophy / Scientists need to demonstrate that robots are safe and reliable outside of controlled lab environments for real-world applications to be viable. 
This dissertation addresses the challenge of robot decision-making in the face of uncertainty, specifically focusing on robot motion decisions in the context of deep learning-based perception uncertainty. Deep learning (DL) refers to using large hierarchical structures, often called neural networks, to approximate semantic information from input data. The first part of this dissertation introduces a risk-aware framework for path planning and assignment of multiple robots and multiple demands in unknown environments. Path planning involves finding a route from the source to the goal, while assignment focuses on selecting source-goal paths to fulfill all demands. The second part introduces a risk-aware motion model for searching for a target object in an unknown environment. Being risk-aware in both cases means taking uncertainty into account. To illustrate practical application, consider a situation such as disaster response or search-and-rescue, where it is imperative for ground vehicles to swiftly reach critical locations. Afterward, an agent deployed at a specified location must navigate inside a building to find a target, whether it is an object or a person. In this dissertation, deep learning is used to interpret image inputs for two distinct robot systems. The input to the first system is an aerial georeferenced image; the second is an indoor scene. After the images are interpreted by deep learning, they undergo further processing to extract information about uncertainty. The information about the image and the uncertainty is used for later processing. In the first case, we use both a traditional path planning method and a novel path assignment method to assign one path from each source to a demand location. In the second case, a motion model is developed using image data, uncertainty, and position in relation to the anticipated target. Several potential motion models are compared for analysis. 
All in all, considering uncertainty in both systems leads to demonstrably safer decisions. Uncertainty Robot Motion Deep Learning
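To make the conditional value at risk (CVaR) idea mentioned in the abstract above concrete, here is a minimal sketch of an empirical CVaR over sampled path costs, used to pick between two candidate paths. The function names, the sample-based estimator, the confidence level, and the toy cost samples are all assumptions for illustration; the dissertation's actual assignment formulation is not given in this listing.

```python
import math


def cvar(costs, alpha=0.9):
    """Empirical CVaR: mean of the worst (1 - alpha) fraction of cost samples."""
    if not costs:
        raise ValueError("costs must be non-empty")
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    ordered = sorted(costs)
    # Size of the tail; rounding guards against float noise such as 3.0000000004.
    k = max(1, math.ceil(round((1.0 - alpha) * len(ordered), 9)))
    tail = ordered[-k:]
    return sum(tail) / len(tail)


def pick_risk_aware_path(path_costs, alpha=0.9):
    """Return the name of the path whose sampled costs have the lowest CVaR."""
    return min(path_costs, key=lambda name: cvar(path_costs[name], alpha))


# Hypothetical cost samples, e.g. drawn from an uncertain cost map.
samples = {
    "A": [1, 1, 1, 1, 10],  # cheap on average, occasionally very costly
    "B": [3, 3, 3, 3, 3],   # slightly costlier on average, no bad tail
}
print(pick_risk_aware_path(samples, alpha=0.8))
```

In this toy case a risk-neutral planner comparing mean costs would prefer path A (2.8 versus 3), whereas the CVaR criterion at level 0.8 prefers path B because it penalizes A's rare high-cost outcome.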
40	Integrating Multiple Modalities into Deep Learning Network McNeil, Patrick 01 January 2017 Deep learning networks in the literature traditionally only used a single input modality (or data stream). Integrating multiple modalities into deep learning networks with the goal of correlating extracted features was a major issue. Traditional methods involved treating each modality separately and then writing custom code to combine the extracted features. Current solutions for small numbers of modalities (three or fewer) showed there are multiple architectures for modality integration. With an increase in the number of modalities, the “curse of dimensionality” affects the performance of the system. The research showed current methods for larger scale integrations required separate, custom created modules with another integration layer outside the deep learning network. These current solutions neither scale well nor provide good generalized performance. This research report studied architectures using multiple modalities and the creation of a scalable and efficient architecture. deep learning multiple modality Computer Sciences
