61 |
Visualization design for improving layer-wise relevance propagation and multi-attribute image classification
Huang, Xinyi, 01 December 2021 (has links)
No description available.
62 |
Accelerating the Computation and Design of Nanoscale Materials with Deep Learning
Ryczko, Kevin, 03 December 2021 (has links)
In this article-based thesis, we cover applications of deep learning to different problems in condensed matter physics, where the goal is to either accelerate the computation or design of a nanoscale material. We first motivate and introduce how machine learning methods can be used to accelerate traditional condensed matter physics calculations. In addition, we discuss what designing a material means, and how it has been previously done. We then consider the fundamentals of electronic structure and conventional calculations, which include density functional theory (DFT), density functional perturbation theory (DFPT), quantum Monte Carlo (QMC), and electron transport with tight binding. In addition, we cover the basics of deep learning. Afterwards, we discuss six articles. The first five articles are dedicated to accelerating the computation of nanoscale materials. In Article 1, we use convolutional neural networks to predict energies for diatomic molecules modelled with a Lennard-Jones potential and density functional theory energies of hexagonal lattices with and without defects. In Article 2, we use extensive deep neural networks to represent density functional theory energy functionals for electron gases by using the electron density as input, and bypass the Kohn-Sham equations by using the external potential as input. In addition, we use deep convolutional inverse graphics networks to map the external potential directly to the electron density. In Article 3, we use voxel deep neural networks (VDNNs) to map electron densities to kinetic energy densities and functional derivatives of the kinetic energies for graphene lattices. We also use VDNNs to calculate an electron density from a direct minimization calculation and introduce a Monte Carlo-based solver that avoids taking a functional derivative altogether. In Article 4, we use a deep learning framework to predict the polarization, dielectric function, Born effective charges, longitudinal optical-transverse optical splitting, Raman tensors, and Raman spectra for two crystalline systems. In Article 5, we use VDNNs to map DFT electron densities to QMC energy densities for graphene systems, and compute the energy barrier associated with forming a Stone-Wales defect. In Article 6, we design a graphene-based quantum transducer that can physically split valley currents by controlling the pn-doping of the lattice sites. The design is guided by a neural network that operates on a pristine lattice and outputs a lattice with pn-doping such that valley currents are optimally split. Lastly, we summarize the thesis and outline future work.
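The density-to-energy mappings described in Articles 1-3 can be pictured with a toy example. The following is a hedged sketch, not the thesis's code: a small convolutional network that maps a discretised 2D electron density to a scalar energy. The grid resolution, layer widths, and training target are illustrative assumptions.

```python
# Illustrative sketch: a small CNN mapping a 2D electron-density grid to a scalar
# energy, in the spirit of the density-to-energy mappings described above.
import torch
import torch.nn as nn

class DensityToEnergyCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),           # global average pool over the grid
        )
        self.head = nn.Linear(32, 1)           # scalar energy prediction

    def forward(self, density):                # density: (batch, 1, H, W)
        x = self.features(density).flatten(1)
        return self.head(x)                    # (batch, 1) predicted energies

model = DensityToEnergyCNN()
density = torch.rand(4, 1, 64, 64)             # four toy density grids
energy = model(density)
loss = nn.functional.mse_loss(energy, torch.zeros(4, 1))  # would be DFT energies in practice
loss.backward()
```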
63 |
Analýza časových řad s využitím hlubokého učení / Time series analysis using deep learning
Hladík, Jakub, January 2018 (has links)
The aim of this thesis was to create a tool for time-series prediction based on deep learning. The first part of the work gives a brief description of deep learning and compares it to classical machine learning. The next section contains a brief analysis of some tools that are already used for time-series forecasting. The last part focuses on the analysis of the problem as well as on the actual creation of the program.
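As a concrete illustration of this kind of tool (a hedged sketch, not the thesis's program), a recurrent network can be trained to predict the next value of a series from a sliding window of past values; the window length and layer sizes below are arbitrary choices.

```python
# Minimal sketch of deep-learning-based time-series prediction: an LSTM reads a
# window of past values and predicts the value that follows the window.
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    def __init__(self, hidden_size=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, 1)

    def forward(self, window):                  # window: (batch, steps, 1)
        output, _ = self.lstm(window)
        return self.head(output[:, -1, :])      # predict the next value after the window

model = LSTMForecaster()
series = torch.sin(torch.linspace(0, 20, 120))           # toy series
windows = series.unfold(0, 24, 1)[:-1].unsqueeze(-1)      # sliding 24-step windows
targets = series[24:].unsqueeze(-1)                       # value following each window
loss = nn.functional.mse_loss(model(windows), targets)
loss.backward()
```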
64 |
Digital Twin Coaching for Edge Computing Using Deep Learning Based 2D Pose Estimation
Gámez Díaz, Rogelio, 15 April 2021 (has links)
In these challenging times caused by COVID-19, technology that leverages the potential of Artificial Intelligence (AI) can help people cope with the pandemic, for example people who want to keep exercising while in quarantine. We also find another opportunity in the widespread adoption of mobile smart devices, which makes complex AI models accessible to the average user.
Taking advantage of this situation, we propose a Smart Coaching experience on the Edge with our Digital Twin Coaching (DTC) architecture. Since the general population is advised to work from home, sedentarism has become prevalent. Coaching is a positive force in exercising, but keeping physical distance while exercising is a significant problem. A Smart Coach can help in this scenario because it relies on smart devices instead of direct contact with another person. Some researchers have worked on Smart Coaching, but their systems often involve complex devices such as RGB-Depth cameras, making them cumbersome to use. Our approach is one of the first to focus on everyday smart devices, like smartphones, to solve this problem.
Digital Twin Coaching can be defined as a virtual system designed to help people improve in a specific field and is a powerful tool if combined with edge technology. The DTC architecture has six characteristics that we try to fulfill: adaptability, compatibility, flexibility, portability, security, and privacy.
Since there was no dataset of coach-trainee videos, we collected training data from 10 subjects using a 2D pose estimation model. To use this information effectively, the most critical pre-processing step was synchronization, which aligns the coach's and the trainee's poses to overcome the trainee's action lag while performing the routine in real time.
We trained a light neural network called “Pose Inference Neural Network” (PINN) to serve as a fine-tuning mechanism. With this trained network, we improved the generalist 2D pose estimation model while keeping the time complexity relatively unaffected. We also propose an Angular Pose Representation to compare the trainee's and coach's stances that accounts for differences in body proportions between people.
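The appeal of an angle-based representation is that joint angles do not depend on limb length. The sketch below illustrates this idea under stated assumptions: the keypoint names and the joint triplet are hypothetical, not the thesis's exact definition.

```python
# Illustrative angle-based pose representation: angles computed at joints from 2D
# keypoints are invariant to body proportions.
import numpy as np

def joint_angle(a, b, c):
    """Angle at keypoint b (radians) formed by segments b->a and b->c."""
    u, v = np.asarray(a) - np.asarray(b), np.asarray(c) - np.asarray(b)
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def angular_pose(keypoints, triplets):
    """keypoints: dict name -> (x, y); triplets: list of (a, b, c) joint names."""
    return np.array([joint_angle(keypoints[a], keypoints[b], keypoints[c])
                     for a, b, c in triplets])

# Two people with different limb lengths in the same stance give the same angles.
short = {"shoulder": (0, 0), "elbow": (1, 0), "wrist": (1, 1)}
tall  = {"shoulder": (0, 0), "elbow": (2, 0), "wrist": (2, 2)}
triplets = [("shoulder", "elbow", "wrist")]
print(angular_pose(short, triplets), angular_pose(tall, triplets))  # both ~pi/2
```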
For the PINN model, we use Random Search Optimization to find the best configuration; the configurations tested used 1, 2, 3, 4, 5, and 10 layers. We chose the 2-Layer Neural Network (2-LNN) configuration because it was the fastest to train and predict while providing a fair tradeoff between performance and resource consumption. Using frame synchronization in pre-processing, we improved the test loss (Mean Squared Error) by 76% when training the 2-LNN. The PINN improved the R2 score of the PoseNet model by at least 15% and at most 93%, depending on the configuration, while adding only about 4 seconds (roughly 2% of the total) to the average processing time. Finally, the usability test results showed that our Proof of Concept application, DTCoach, was considered easy to learn and convenient to use, although some participants mentioned that they would like more features and improved clarity before using the app frequently.
We hope DTCoach can help people stay more active, especially in quarantine, as the application can serve as a motivator. Since it can be run on modern smartphones, it can quickly be adopted by many people.
65 |
Shape-Tailored Invariant Descriptors for Segmentation
Khan, Naeemullah, 11 1900 (has links)
Segmentation is one of the first steps in the human visual system that helps us see the world around us; humans pre-attentively segment scenes into regions of unique texture in around 10-20 ms. In this thesis, we address the problem of segmentation by grouping dense pixel-wise descriptors. Our work is based on the fact that human vision has both a feed-forward and a feedback loop: low-level features are used to refine high-level features in the forward pass, and higher-level information is used to refine the low-level features in the backward pass. Most vision algorithms rely only on the feed-forward loop, where low-level features are used to construct and refine high-level features, and lack the feedback loop. We introduce "Shape-Tailored Local Descriptors", in which high-level information (the region approximation) is used to update the low-level features, i.e., the descriptors, and the low-level descriptor information is in turn used to update the segmentation regions. Shape-Tailored Local Descriptors are dense local descriptors tailored to an arbitrarily shaped region, aggregating data only within the region of interest. Since the segmentation, i.e., the regions, is not known a priori, we pose a joint problem over the Shape-Tailored Local Descriptors and the segmentation (regions).
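As an illustration of "aggregating data only within the region of interest", the following minimal sketch smooths an image channel while never mixing information across the region boundary. It is a toy masked-diffusion stand-in for, not a reproduction of, the thesis's PDE-based descriptor construction; the mask, channel, and number of iterations are arbitrary.

```python
# Toy region-restricted smoothing: each pixel is averaged only with neighbours
# that lie inside the same region, so the descriptor never mixes data across the
# region boundary.
import numpy as np

def region_restricted_smoothing(channel, mask, steps=50):
    out = channel.astype(float).copy()
    for _ in range(steps):
        acc = np.zeros_like(out)
        cnt = np.zeros_like(out)
        for dy, dx in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
            shifted = np.roll(out, (dy, dx), axis=(0, 1))
            shifted_mask = np.roll(mask, (dy, dx), axis=(0, 1))
            valid = mask & shifted_mask          # neighbour must be in the same region
            acc += np.where(valid, shifted, 0.0)
            cnt += valid
        out = np.where(mask & (cnt > 0), (out + acc) / (1 + cnt), out)
    return out

channel = np.random.rand(64, 64)
mask = np.zeros((64, 64), dtype=bool)
mask[8:40, 8:40] = True                           # an arbitrarily shaped region of interest
descriptor_channel = region_restricted_smoothing(channel, mask)
```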
Furthermore, since natural scenes consist of multiple objects that may exhibit different visual textures at different scales, we propose a multi-scale approach to segmentation. We used both a set of discrete scales and a continuum of scales in our experiments, and both resulted in state-of-the-art performance.
Lastly, we look into the nature of the features selected: we try handcrafted color and gradient channels, and we also introduce an algorithm to incorporate the learning of optimal descriptors into segmentation approaches. In the final part of this thesis we introduce techniques for unsupervised learning of descriptors for segmentation. This eliminates a key problem of deep learning methods, which need huge amounts of training data: the optimal descriptors are learned on the fly during segmentation, without any training data.
66 |
RNA Sequence Classification Using Secondary Structure Fingerprints, Sequence-Based Features, and Deep Learning
Sutanto, Kevin, 12 March 2021 (has links)
RNAs are involved in many facets of biological processes, including but not limited to controlling and inhibiting gene expression, enabling transcription and translation from DNA to proteins, disease processes such as cancer, and virus-host interactions. As such, useful applications may arise from studies and analyses involving RNAs, such as detecting cancer by measuring the abundance of specific RNAs, detecting and identifying infections involving RNA viruses, identifying the origins of and relationships between RNA viruses, and identifying potential targets when designing novel drugs.
Extracting sequences from RNA samples is usually not a major limitation anymore thanks to sequencing technologies such as RNA-Seq. However, accurately identifying and analyzing the extracted sequences is often still the bottleneck when it comes to developing RNA-based applications.
Like proteins, functional RNAs are able to fold into complex structures in order to perform specific functions throughout their lifecycle. This suggests that structural information can be used to identify or classify RNA sequences, in addition to the sequence information of the RNA itself. Furthermore, a strand of RNA may have more than one possible structural conformation it can fold into, and it is also possible for a strand to form different structures in vivo and in vitro. However, past studies that utilized secondary structure information for RNA identification have relied on a single predicted secondary structure for each RNA sequence, despite this one-to-many relationship between an RNA strand and its possible secondary structures. Therefore, we hypothesized that using a representation that includes the multiple possible secondary structures of an RNA may improve classification performance.
We propose and build a pipeline that produces secondary structure fingerprints for a given RNA sequence, taking into account the aforementioned multiple possible secondary structures of a single RNA. Using this pipeline, we explore and develop different types of secondary structure fingerprints. One type of fingerprint serves as a high-level topological representation of the RNA structure, while another represents matches with common known RNA secondary structure motifs that we curated from databases and the literature. Next, to test our hypothesis, the different fingerprints are used with deep learning on different datasets, alone and together with various sequence-based features, to investigate how the secondary structure fingerprints affect classification performance.
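To make the fingerprint idea concrete, the toy sketch below reduces several candidate dot-bracket structures of one RNA to a single fixed-length vector. The candidate structures and the two simple features (base-pair counts and hairpin-loop counts) are illustrative assumptions, not the thesis's curated motif library or its actual fingerprint definition.

```python
# Toy fingerprint over multiple candidate secondary structures of one RNA,
# given in dot-bracket notation.
def hairpin_count(structure):
    """Count hairpin loops: an unpaired stretch directly enclosed by a base pair."""
    count, i = 0, 0
    while i < len(structure):
        if structure[i] == "(":
            j = i + 1
            while j < len(structure) and structure[j] == ".":
                j += 1
            if j < len(structure) and structure[j] == ")" and j > i + 1:
                count += 1
            i = j
        else:
            i += 1
    return count

def fingerprint(structures):
    """Aggregate simple structural features over all candidate structures."""
    pairs = [s.count("(") for s in structures]
    loops = [hairpin_count(s) for s in structures]
    return [min(pairs), max(pairs), sum(pairs) / len(pairs),
            min(loops), max(loops), sum(loops) / len(loops)]

candidates = ["((((....))))", "(((......)))", "((..((...))..))"]
print(fingerprint(candidates))   # one fixed-length vector summarising all candidates
```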
Finally, by analyzing our findings, we also propose approaches that can be adopted by future studies to further improve our secondary structure fingerprints and classification performance.
67 |
FMRI IMAGE REGISTRATION USING DEEP LEARNING
Zeledon Lostalo, Emilia Maria, 01 December 2019 (has links)
fMRI imaging is considered key to understanding the brain and the mind, and for this reason it has been the subject of tremendous research connecting different disciplines. The intrinsic complexity of processing and analyzing this 4-D data has been approached from virtually every computational perspective, with a growing trend toward including artificial intelligence. One critical step in the fMRI pipeline is image registration. A deep network model based on fully convolutional neural networks and spatial transformer networks with a self-learning strategy was proposed to implement a fully deformable image registration algorithm. Publicly available fMRI datasets with images from real-life subjects were used for training, testing, and validating the model. Model performance was compared with the ANTs deformable registration method, with good results suggesting that deep learning can be used successfully to advance the field, in effect studying the brain with brain-inspired strategies.
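The core mechanism of such a self-learning (unsupervised) deformable registration model can be sketched as follows. This is a generic 2D illustration of the spatial-transformer warping step, not the thesis's actual network; the displacement field here would in practice be predicted by the fully convolutional network.

```python
# Sketch of spatial-transformer warping for unsupervised deformable registration:
# a dense displacement field warps the moving image, and a similarity loss between
# the warped and fixed images drives learning.
import torch
import torch.nn.functional as F

def warp(moving, displacement):
    """moving: (B, 1, H, W); displacement: (B, 2, H, W) in pixel units."""
    batch, _, height, width = moving.shape
    ys, xs = torch.meshgrid(torch.arange(height), torch.arange(width), indexing="ij")
    grid_x = (xs + displacement[:, 0]) / (width - 1) * 2 - 1    # normalise to [-1, 1]
    grid_y = (ys + displacement[:, 1]) / (height - 1) * 2 - 1
    grid = torch.stack([grid_x, grid_y], dim=-1)                # (B, H, W, 2), x then y
    return F.grid_sample(moving, grid, align_corners=True)

fixed = torch.rand(1, 1, 64, 64)
moving = torch.rand(1, 1, 64, 64)
displacement = torch.zeros(1, 2, 64, 64, requires_grad=True)   # predicted by a CNN in practice
similarity_loss = F.mse_loss(warp(moving, displacement), fixed)
similarity_loss.backward()                                     # gradients flow to the field
```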
68 |
High Precision Deep Learning-Based Tabular Data Extraction
Jiang, Ji Chu, 21 January 2021 (has links)
Advancements in AI methodologies and computing power enable automation and propel the Industry 4.0 phenomenon. Information and data are digitized more than ever; millions of documents are processed every day, fueled by the growth of institutions, organizations, and their supply chains. Processing documents is a time-consuming, laborious task; therefore, automating data processing is highly important for optimizing supply-chain efficiency across all industries. Document analysis for data extraction is an impactful field, and this thesis aims to achieve the vital steps of an ideal data extraction pipeline. Data is often stored in tables, since they are a structured format in which the user can easily associate values and attributes. Tables can contain vital information such as specifications, dimensions, and costs; therefore, table analysis and recognition in documents is a cornerstone of data extraction.
This thesis applies deep learning methodologies to automate the two main problems within table analysis for data extraction: table detection and table structure detection. Table detection is identifying and localizing the boundaries of the table. The output of the table detection model is fed into the table structure detection model for structural analysis; therefore, the table detection model must have high localization performance, otherwise errors propagate through the rest of the data extraction pipeline. Our table detection improves bounding box localization by incorporating a Kullback–Leibler loss function that measures the divergence between the probability distributions of the ground-truth and predicted bounding boxes, and by adding a voting procedure to the non-maximum suppression step to produce better-localized merged bounding box proposals. This model improved table detection precision by 1.2% while achieving the same recall as other state-of-the-art models on the public ICDAR2013 dataset, and achieved state-of-the-art results of 99.8% precision on the ICDAR2017 dataset. Furthermore, our model showed large improvements especially at higher intersection over union (IoU) thresholds: at 95% IoU, an improvement of 10.9% can be seen for the ICDAR2013 dataset and an improvement of 8.4% for the ICDAR2017 dataset.
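For context, one common way to realise a KL-based box regression loss is to have the network predict a mean and a log-variance per coordinate, treating the ground truth as a Dirac delta and the prediction as a Gaussian; the sketch below follows that usual formulation and is not necessarily the exact loss used in this thesis.

```python
# Hedged sketch of a KL-divergence bounding-box regression loss: the KL between a
# Dirac ground truth and a predicted Gaussian reduces to a variance-weighted
# regression term plus a log-variance penalty.
import torch

def kl_box_loss(pred_mean, pred_log_var, target):
    """All tensors: (N, 4) box coordinates; pred_log_var = log(sigma^2)."""
    squared_error = (target - pred_mean) ** 2
    return (torch.exp(-pred_log_var) * squared_error / 2 + pred_log_var / 2).mean()

pred_mean = torch.rand(8, 4, requires_grad=True)
pred_log_var = torch.zeros(8, 4, requires_grad=True)   # output of a second regression head
target = torch.rand(8, 4)
loss = kl_box_loss(pred_mean, pred_log_var, target)
loss.backward()
```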
Table structure detection is recognizing the internal layout of a table. Oftentimes researchers approach this by detecting the rows and columns. However, to correctly map the location of each individual cell's data in the semantic extraction step, the rows and columns have to be combined into a matrix, which introduces additional sources of error. Instead, we propose a model that directly detects each individual cell. Our model is an ensemble of state-of-the-art components: Hybrid Task Cascade as the detector and dual ResNeXt101 backbones arranged in a CBNet architecture. Because there is a lack of quality labeled data for table cell structure detection, we hand-labeled the ICDAR2013 dataset, and we wish to establish a strong baseline for this dataset. Our model was compared with other state-of-the-art models that excel at table or table structure detection, and it yielded a precision of 89.2% and a recall of 98.7% on the ICDAR2013 cell structure dataset.
69 |
3D Object Detection for Advanced Driver Assistance Systems
Demilew, Selameab, 29 June 2021 (has links)
Robust and timely perception of the environment is an essential requirement of all autonomous and semi-autonomous systems. This necessity has been the main factor behind the rapid growth and adoption of LiDAR sensors within the ADAS sensor suite. In this thesis, we develop a fast and accurate 3D object detector that converts raw point clouds collected by LiDARs into sparse occupancy cuboids to detect cars and other road users using deep convolutional neural networks. The proposed pipeline reduces the runtime of PointPillars by 43% and performs on par with other state-of-the-art models. We do not gain improvements in speed by compromising the network's complexity and learning capacity but rather through the use of an efficient input encoding procedure. In addition to rigorous profiling on three different platforms, we conduct a comprehensive error analysis and recognize principal sources of error among the predicted attributes.
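The input encoding mentioned above can be pictured with the toy sketch below, which discretises a raw point cloud into a sparse set of occupied voxels; the voxel size and spatial limits are illustrative assumptions, not the exact parameters used in the thesis.

```python
# Toy sparse occupancy encoding: LiDAR points are binned into voxels and only the
# indices of occupied voxels are kept.
import numpy as np

def occupancy_voxels(points, voxel_size=(0.2, 0.2, 0.2),
                     limits=((0, 70), (-40, 40), (-3, 1))):
    """points: (N, 3) array of x, y, z in metres -> unique occupied voxel indices."""
    points = np.asarray(points, dtype=float)
    lower = np.array([lim[0] for lim in limits])
    upper = np.array([lim[1] for lim in limits])
    inside = np.all((points >= lower) & (points < upper), axis=1)
    indices = ((points[inside] - lower) / np.array(voxel_size)).astype(int)
    return np.unique(indices, axis=0)            # sparse set of occupied cuboids

cloud = np.random.uniform([0, -40, -3], [70, 40, 1], size=(100_000, 3))
occupied = occupancy_voxels(cloud)
print(occupied.shape)                            # (num_occupied_voxels, 3)
```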
Even though point clouds adequately capture the 3D structure of the physical world, they lack the rich texture information present in color images. In light of this, we explore the possibility of fusing the two modalities with the intent of improving detection accuracy. We present a late fusion strategy that merges the classification head of our LiDAR-based object detector with semantic segmentation maps inferred from images. Extensive experiments on the KITTI 3D object detection benchmark demonstrate the validity of the proposed fusion scheme.
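As a toy illustration of late fusion at the classification level (an assumption about the general idea, not the thesis's exact scheme), detector class scores can be combined with segmentation evidence sampled where a detection projects into the image.

```python
# Toy late fusion: weighted combination of LiDAR detector class scores with
# image-based semantic-segmentation probabilities at the detection location.
import numpy as np

def fuse_scores(detector_scores, seg_probability, weight=0.5):
    """Both arrays: (num_boxes, num_classes); returns fused class confidences."""
    return weight * detector_scores + (1 - weight) * seg_probability

lidar_scores = np.array([[0.7, 0.2, 0.1]])       # e.g. car / pedestrian / cyclist
image_seg = np.array([[0.9, 0.05, 0.05]])        # segmentation evidence at the box
print(fuse_scores(lidar_scores, image_seg))
```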
70 |
Reconfigurable Snapshot HDR Imaging Using Coded Masks
Alghamdi, Masheal M., 10 July 2021 (has links)
High Dynamic Range (HDR) image acquisition from a single image capture, also known as snapshot HDR imaging, is challenging because the bit depths of camera sensors are far from sufficient to cover the full dynamic range of the scene. Existing HDR techniques focus either on algorithmic reconstruction or hardware modification to extend the dynamic range. In this thesis, we propose a joint design for snapshot HDR imaging by devising a spatially varying modulation mask in the hardware combined with a deep learning algorithm to reconstruct the HDR image.
In this approach, we achieve a reconfigurable HDR camera design that does not require custom sensors, and instead can be reconfigured between HDR and conventional mode with very simple calibration steps. We demonstrate that the proposed hardware-software solution offers a flexible, yet robust, way to modulate per-pixel exposures, and the network requires little knowledge of the hardware to faithfully reconstruct the HDR image. Comparative analysis demonstrated that our method outperforms the state-of-the-art in terms of visual perception quality.
We leverage transfer learning to overcome the lack of sufficiently large HDR datasets. We show how transferring from a different large-scale task (image classification on ImageNet) leads to considerable improvements in HDR reconstruction.
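To make the acquisition model concrete, the toy sketch below simulates a spatially varying modulation mask that gives neighbouring pixels different effective exposures within a single capture; the 2x2 exposure pattern and the clipping model are illustrative assumptions rather than the calibrated optical mask used in the thesis.

```python
# Toy coded-exposure capture: a small per-pixel exposure pattern is tiled over the
# scene and the result is clipped to the sensor's range, so one shot samples
# several exposure levels. A network would then reconstruct the HDR image.
import numpy as np

def coded_exposure_capture(hdr_scene, exposure_pattern, full_well=1.0):
    height, width = hdr_scene.shape
    tiles = (height // exposure_pattern.shape[0] + 1, width // exposure_pattern.shape[1] + 1)
    mask = np.tile(exposure_pattern, tiles)[:height, :width]
    return np.clip(hdr_scene * mask, 0.0, full_well), mask

scene = np.random.exponential(scale=0.5, size=(64, 64))     # toy scene with a wide range
pattern = np.array([[1.0, 0.25], [0.0625, 0.5]])            # four exposure levels per 2x2 block
ldr_capture, mask = coded_exposure_capture(scene, pattern)
# A CNN would be trained to map (ldr_capture, mask) back to the latent HDR scene.
```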