1 |
Residual Capsule Network
Bhamidi, Sree Bala Shruthi 08 1900
Indiana University-Purdue University Indianapolis (IUPUI) / Convolutional Neural Networks (CNNs) have brought substantial improvements to the field of machine learning, but they come with their own set of drawbacks. Capsule Networks address the limitations of CNNs and have shown great improvement by computing the pose and transformation of the image. Deeper networks are more powerful than shallow networks but also more difficult to train. Residual Networks ease training and have shown that good accuracy can be achieved at considerable depth. We present the Residual Capsule Network and the 3-Level Residual Capsule Network, a framework that combines the strengths of Residual Networks and Capsule Networks. The conventional convolutional layer in the Capsule Network is replaced by skip connections, as in Residual Networks, to decrease the complexity of the baseline Capsule Network and the seven-ensemble Capsule Network. We trained our models on the MNIST and CIFAR-10 datasets and observed a significant decrease in the number of parameters compared to the baseline models.
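The skip-connection idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the thesis's architecture: `residual_block` and `scale_by_half` are hypothetical names, and a toy scaling function stands in for a learned convolutional transform. The sketch only shows that a residual unit outputs its input plus a correction.

```python
from typing import Callable, List

Vector = List[float]


def residual_block(x: Vector, transform: Callable[[Vector], Vector]) -> Vector:
    """Return x + transform(x), i.e. an identity skip connection around transform."""
    fx = transform(x)
    if len(fx) != len(x):
        # An identity skip only works when the transform preserves the input size.
        raise ValueError("transform must preserve the input size")
    return [xi + fi for xi, fi in zip(x, fx)]


def scale_by_half(x: Vector) -> Vector:
    # Hypothetical stand-in for a learned transform (assumption, not from the thesis).
    return [0.5 * xi for xi in x]


out = residual_block([1.0, 2.0, 4.0], scale_by_half)
print(out)
```

If the transform outputs zeros, the block reduces to the identity, which is one common explanation for why such blocks are easier to stack deeply.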
|
2 |
Contextual Recurrent Level Set Networks and Recurrent Residual Networks for Semantic Labeling
Le, Ngan Thi Hoang 01 May 2018
Semantic labeling is becoming increasingly popular among researchers in computer vision and machine learning. Many applications, such as autonomous driving, tracking, indoor navigation, augmented reality systems, semantic searching, and medical imaging, are on the rise, requiring more accurate and efficient segmentation mechanisms. In recent years, deep learning approaches based on Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) have emerged as the dominant paradigm for solving many problems in computer vision and machine learning. The main focus of this thesis is to investigate robust approaches that can tackle challenging semantic labeling tasks, including semantic instance segmentation and scene understanding. In the first approach, we convert the classic variational Level Set method into a learnable deep framework by proposing a novel definition of contour evolution named Recurrent Level Set (RLS). The proposed RLS employs Gated Recurrent Units to solve the energy minimization of a variational Level Set functional. The curve deformation process in RLS is formulated as a hidden state evolution procedure and is updated by minimizing an energy functional composed of fitting forces and contour length. We show that, by sharing the convolutional features in a fully end-to-end trainable framework, RLS can be extended to Contextual Recurrent Level Set (CRLS) Networks to address the problem of semantic segmentation in the wild. The experimental results show that our proposed RLS improves both computational time and segmentation accuracy over the classic variational Level Set-based methods, whereas the fully end-to-end CRLS system achieves competitive performance compared to state-of-the-art semantic segmentation approaches on the PASCAL VOC 2012 and MS COCO 2014 databases.
The second proposed approach, Contextual Recurrent Residual Networks (CRRN), inherits the merits of both sequence learning and residual learning in order to simultaneously model long-range contextual information and learn powerful visual representations within a single deep network. Our proposed CRRN deep network consists of three parts corresponding to sequential input data, sequential output data, and hidden state, as in a recurrent network. Each unit in the hidden state is designed as a combination of two components: a context-based component via sequence learning and a visual-based component via residual learning. That is, each hidden unit in our proposed CRRN simultaneously (1) learns long-range contextual dependencies via a context-based component, in which the relationship between the current unit and the previous units is modeled as sequential information under an undirected cyclic graph (UCG), and (2) provides a powerful encoded visual representation via a residual component, which contains blocks of convolution and/or batch normalization layers equipped with an identity skip connection. Furthermore, unlike previous scene labeling approaches [1, 2, 3], our method not only exploits the long-range context and visual representation but is also formed under a fully end-to-end trainable system that effectively leads to the optimal model. In contrast to other existing deep learning networks, which are based on pretrained models, our fully end-to-end CRRN is trained completely from scratch. The experiments are conducted on four challenging scene labeling datasets, i.e., SiftFlow, CamVid, Stanford Background, and SUN, and compared against various state-of-the-art scene labeling methods.
|
3 |
Wide Activated Separate 3D Convolution for Video Super-Resolution
Yu, Xiafei 18 December 2019
Video super-resolution (VSR) aims to recover a realistic high-resolution (HR) frame from its corresponding center low-resolution (LR) frame and several neighbouring supporting frames. The neighbouring supporting LR frames can provide extra information to help recover the HR frame. However, these frames are not aligned with the center frame due to the motion of objects. With the rapid development of neural networks, many video super-resolution methods based on deep learning have recently been proposed. Most of these methods use motion estimation and compensation models as preprocessing to handle the spatio-temporal alignment problem. The accuracy of these motion estimation models is therefore critical for predicting the high-resolution frames. Inaccurate motion compensation leads to artifacts and blur, which in turn damage the recovery of high-resolution frames. We propose an effective wide activated separate 3-dimensional (3D) Convolutional Neural Network (CNN) for video super-resolution to overcome the drawbacks of relying on motion compensation models. Separate 3D convolution factorizes the 3D convolution into convolutions in the spatial and temporal domains, which benefits the optimization of the spatial and temporal convolution components. Our method can therefore capture temporal and spatial information of the input frames simultaneously without an additional motion estimation and compensation model. Moreover, the experimental results demonstrate the effectiveness of the proposed wide activated separate 3D CNN.
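The factorization described above can be made concrete with a small parameter-count sketch. The abstract does not give layer sizes or the order of the two convolutions, so the channel counts, the 3x3x3 kernel, the spatial-then-temporal order, and the function names below are all assumptions chosen for illustration.

```python
def conv3d_params(c_in: int, c_out: int, kt: int, kh: int, kw: int, bias: bool = True) -> int:
    """Number of learnable parameters in a 3D convolution with a kt x kh x kw kernel."""
    weights = c_in * c_out * kt * kh * kw
    return weights + (c_out if bias else 0)


def separate_conv3d_params(c_in: int, c_mid: int, c_out: int,
                           kt: int, kh: int, kw: int, bias: bool = True) -> int:
    """Parameters of a spatial (1 x kh x kw) convolution followed by a temporal (kt x 1 x 1) one."""
    spatial = conv3d_params(c_in, c_mid, 1, kh, kw, bias)
    temporal = conv3d_params(c_mid, c_out, kt, 1, 1, bias)
    return spatial + temporal


# Assumed sizes: 64 channels throughout, 3 x 3 x 3 kernel.
full = conv3d_params(64, 64, 3, 3, 3)
separate = separate_conv3d_params(64, 64, 64, 3, 3, 3)
print(full, separate)
```

Under these assumed settings the factorized pair uses 49,280 parameters against 110,656 for the full 3D kernel.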
|
4 |
TOWARDS AN UNDERSTANDING OF RESIDUAL NETWORKS USING NEURAL TANGENT HIERARCHY
Yuqing Li (10223885) 06 May 2021
Deep learning has become an important toolkit for data science and artificial intelligence. In contrast to its practical success across a wide range of fields, theoretical understanding of the principles behind the success of deep learning has been a matter of controversy. Optimization, as an important component of theoretical machine learning, has attracted much attention. The optimization problems arising in deep learning are often non-convex and non-smooth, which makes it challenging to locate the global optima. In practice, however, global convergence of first-order methods such as gradient descent can be guaranteed for deep neural networks. In particular, gradient descent yields zero training loss in polynomial time for deep neural networks despite the non-convex nature of the problem. Another mysterious phenomenon is the compelling performance of the Deep Residual Network (ResNet). Not only does training ResNet require weaker conditions, but the use of residual connections even enables first-order methods to train neural networks with an order of magnitude more layers. The advantages arising from residual connections remain to be discovered.
In this thesis, we demystify these two phenomena in turn. First, we contribute to a further understanding of gradient descent. The core of our analysis is the neural tangent hierarchy (NTH), which captures the gradient descent dynamics of deep neural networks. A recent work introduced the Neural Tangent Kernel (NTK) and proved that the limiting NTK describes the asymptotic behavior of neural networks trained by gradient descent in the infinite-width limit. The NTH improves on the NTK in two ways: (i) it can directly study the time variation of the NTK for neural networks; (ii) it extends the result to non-asymptotic settings. Moreover, by applying the NTH to ResNet with a smooth and Lipschitz activation function, we reduce the requirement on the layer width m with respect to the number of training samples n from quartic to cubic, obtaining a state-of-the-art result. Second, we extend our scope of analysis to structural properties of deep neural networks. By making fair and consistent comparisons between the fully-connected network and ResNet, we strongly suggest that the particular skip-connection architecture of ResNet is the main reason for its triumph over the fully-connected network.
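For readers unfamiliar with the NTK, the standard definition can be written out as follows. The notation is an assumption made here for illustration, since the abstract fixes none: a scalar-output network $f(x;\theta)$, training pairs $(x_i, y_i)$ for $i = 1, \dots, n$, and gradient flow on a squared loss.

```latex
% Assumed setup (not stated in the abstract): squared loss, gradient flow.
L(\theta) = \frac{1}{2} \sum_{i=1}^{n} \bigl( f(x_i;\theta) - y_i \bigr)^2,
\qquad
\frac{d\theta_t}{dt} = -\nabla_\theta L(\theta_t).

% Neural Tangent Kernel at time t:
K_t(x, x') = \bigl\langle \nabla_\theta f(x;\theta_t),\, \nabla_\theta f(x';\theta_t) \bigr\rangle.

% By the chain rule, the network output evolves as
\frac{d}{dt} f(x;\theta_t) = -\sum_{i=1}^{n} K_t(x, x_i) \bigl( f(x_i;\theta_t) - y_i \bigr).
```

In the infinite-width limit the kernel $K_t$ stays at its initial value, which is the asymptotic regime the abstract refers to; the time variation of $K_t$ at finite width is the quantity the abstract says the NTH studies directly.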
|
5 |
A comparative evaluation of 3d and spatio-temporal deep learning techniques for crime classification and prediction
Matereke, Tawanda Lloyd January 2021
Magister Scientiae - MSc / This research is a comparative evaluation of 3D and spatio-temporal deep learning methods for crime classification and prediction using the Chicago crime dataset, which has 7.29 million records collected from 2001 to 2020. In this study, crime classification experiments are carried out using two 3D deep learning algorithms, i.e., the 3D Convolutional Neural Network and the 3D Residual Network. The crime classification models are evaluated using accuracy, F1 score, Area Under the Receiver Operating Characteristic curve (AUROC), and Area Under the Precision-Recall Curve (AUCPR). The effect of spatial grid resolution on the performance of the classification models is also evaluated during training, validation and testing.
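Two of the metrics named above, accuracy and F1 score, can be sketched in a few lines. This is a simplified binary-label illustration with made-up labels, not the study's multi-class evaluation code; the function names and the sample data are assumptions.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the given positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are zero or undefined.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Made-up labels for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred))
```

For a multi-class problem such as crime-type classification, the per-class F1 values would typically be averaged, but the abstract does not say which averaging scheme was used.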
|
6 |
LSTM Neural Networks for Detection and Assessment of Back Pain Risk in Manual Lifting
Thomas, Brennan January 2021
No description available.
|
7 |
Precipitation Nowcasting using Residual Networks
Vega Ezpeleta, Emilio January 2018
The aim of this paper is to investigate whether rainfall prediction (nowcasting) can successfully be made using a deep learning approach. The inputs to the networks are different spatiotemporal variables, including forecasts from a numerical weather prediction (NWP) model. The results indicate that these networks have some predictive power and could be used in real applications. Another interesting empirical finding relates to the use of transfer learning from an unrelated domain instead of random initialization: using pretrained parameters resulted in better convergence and overall performance than random initialization of the parameters.
|
8 |
Residual Capsule Network
Sree Bala Shrut Bhamidi (6990443) 13 August 2019
Convolutional Neural Networks (CNNs) have brought substantial improvements to the field of machine learning, but they come with their own set of drawbacks. Capsule Networks address the limitations of CNNs and have shown great improvement by computing the pose and transformation of the image. Deeper networks are more powerful than shallow networks but also more difficult to train. Residual Networks ease training and have shown that good accuracy can be achieved at considerable depth. We present the Residual Capsule Network and the 3-Level Residual Capsule Network, a framework that combines the strengths of Residual Networks and Capsule Networks. The conventional convolutional layer in the Capsule Network is replaced by skip connections, as in Residual Networks, to decrease the complexity of the baseline Capsule Network and the seven-ensemble Capsule Network. We trained our models on the MNIST and CIFAR-10 datasets and observed a significant decrease in the number of parameters compared to the baseline models.
|
9 |
Verifikace osob podle hlasu bez extrakce příznaků / Speaker Verification without Feature Extraction
Lukáč, Peter January 2021
Speaker verification is a field that is continually being modernized and improved as it tries to meet the demands placed on it in application areas such as authorization systems, forensic analysis, etc. Improvements are driven by advances in deep learning, the creation of new training and test datasets, and various speaker verification competitions and workshops. In this thesis, we explore models for speaker verification without feature extraction. Using raw audio waveforms as model inputs simplifies input processing, which lowers computational and memory requirements and reduces the number of hyperparameters needed to create features from recordings, which affect the results. Currently, models without feature extraction do not match the results of models with feature extraction. We experiment with modern techniques on baseline models and try to improve their accuracy. These experiments significantly improved the results of the baseline models, but we still did not reach the results of the improved model with feature extraction. The improvement is nevertheless sufficient for us to create a fusion with that model. Finally, we discuss the achieved results and propose improvements based on them.
|
10 |
Towards meaningful and data-efficient learning: exploring GAN losses, improving few-shot benchmarks, and multimodal video captioning
Huang, Gabriel 09 1900
In recent years, the field of deep learning has seen tremendous progress in applications ranging from image generation, object detection, and language modeling to visual question answering. Classic approaches such as supervised learning require large amounts of task-specific and labeled data, which may be too expensive, time-consuming, or impractical to collect. Data-efficient methods, such as few-shot and self-supervised learning, attempt to deal with the limited availability of task-specific data by leveraging large amounts of general data. Progress in deep learning, and in particular in few-shot learning, is largely driven by the relevant benchmarks, evaluation metrics, and datasets. They are used to test and compare different methods on a given task and to determine the state of the art. However, because they are idealized versions of the task to solve, benchmarks are rarely equivalent to the original task and can have several limitations that hinder their role of identifying the most promising research directions.
Moreover, defining meaningful evaluation metrics can be challenging, especially in the case of high-dimensional and structured outputs, such as images, audio, speech, or text. This thesis discusses the limitations and perspectives of existing benchmarks, training losses, and evaluation metrics, with a focus on generative modeling, in particular Generative Adversarial Networks (GANs), and on data-efficient modeling, which includes few-shot and self-supervised learning. The first contribution is a discussion of the generative modeling task, followed by an exploration of theoretical and empirical properties of the GAN loss. The second contribution is a discussion of a limitation of few-shot classification benchmarks, namely that they may not require class semantic generalization to be solved, and the proposal of a baseline method for solving them without test-time labels. The third contribution is a survey of few-shot and self-supervised object detection, which points out the limitations and promising future research directions for the field. Finally, the fourth contribution is a data-efficient method for video captioning, which leverages unsupervised text and video datasets and explores several multimodal pretraining strategies.
|