121 |
Multimodal Deep Learning for Multi-Label Classification and Ranking Problems
Dubey, Abhishek January 2015
In recent years, deep neural network models have been shown to outperform many state-of-the-art algorithms. One reason is that unsupervised pretraining of multi-layered deep neural networks learns better features, which in turn improves many supervised tasks. These models not only automate feature extraction but also provide robust features for various machine learning tasks. However, unsupervised pretraining and feature extraction with multi-layered networks are applied only to the input features, not to the output. The performance of many supervised learning algorithms (or models) depends on how well they handle output dependencies [Dembczyński et al., 2012]. Adapting standard neural networks to handle these output dependencies for specific types of problems has been an active area of research [Zhang and Zhou, 2006, Ribeiro et al., 2012].
Inference on multimodal data, on the other hand, is considered a difficult problem in machine learning, and ‘deep multimodal neural networks’ have recently shown significant results [Ngiam et al., 2011, Srivastava and Salakhutdinov, 2012]. These models have been shown to perform very well on several problems, such as classification with complete or missing modality data and generation of the missing modality. In this work, we consider three nontrivial supervised learning tasks, listed in order of increasing output complexity: (i) multi-class classification (MCC), (ii) multi-label classification (MLC) and (iii) label ranking (LR). Multi-class classification predicts one class for every instance, multi-label classification predicts more than one class for every instance, and label ranking assigns a rank to each label for every instance. Work in this field has centered on formulating new error functions that force the network to identify the output dependencies.
The aim of our work is to adapt neural networks to handle feature extraction (dependencies) for the output implicitly in the network structure, removing the need for hand-crafted error functions. We show that multimodal deep architectures can be adapted to these types of problems (or data) by treating labels as one of the modalities. This also brings unsupervised pretraining to the output as well as the input. We show that, on the data sets we considered and under various metrics, these models outperform not only standard deep neural networks but also standard adaptations of neural networks for the individual domains. The advantage of our models over other models grows as the complexity of the output/problem increases.
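To make the difference between the three output types in the abstract above concrete, the following is a minimal sketch of how a target vector might be encoded for each task. The label set and the helper names (`LABELS`, `mcc_target`, `mlc_target`, `lr_target`) are hypothetical and chosen only for illustration; they are not taken from the thesis.

```python
import numpy as np

# Hypothetical label set, used only for illustration.
LABELS = ["cat", "dog", "bird", "fish"]


def mcc_target(label):
    """Multi-class classification: exactly one class per instance (one-hot)."""
    y = np.zeros(len(LABELS), dtype=int)
    y[LABELS.index(label)] = 1
    return y


def mlc_target(labels):
    """Multi-label classification: any subset of classes per instance (multi-hot)."""
    y = np.zeros(len(LABELS), dtype=int)
    for label in labels:
        y[LABELS.index(label)] = 1
    return y


def lr_target(ordered_labels):
    """Label ranking: a rank for every label (1 = most relevant)."""
    y = np.zeros(len(LABELS), dtype=int)
    for rank, label in enumerate(ordered_labels, start=1):
        y[LABELS.index(label)] = rank
    return y


print(mcc_target("dog"))                           # one entry set
print(mlc_target(["cat", "bird"]))                 # several entries set
print(lr_target(["bird", "cat", "fish", "dog"]))   # every entry carries a rank
```

Each step from MCC to MLC to LR adds structure to the target vector, which is one way to read the abstract's phrase "increasing complexity of the output".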
|
122 |
Sequential modeling, generative recurrent neural networks, and their applications to audio
Mehri, Soroush 12 1900
No description available.
|
123 |
Deep Learning for Whole Slide Image Cytology: A Human-in-the-Loop Approach
Rydell, Christopher January 2021
With cancer being one of the leading causes of death globally, and with oral cancers being among the most common types of cancer, it is of interest to conduct large-scale oral cancer screening among the general population. Deep Learning can be used to make this possible despite the medical expertise required for early detection of oral cancers. A bottleneck of Deep Learning is the large amount of data required to train a good model. This project investigates two topics: certainty calibration, which aims to make a machine learning model produce more reliable predictions, and Active Learning, which aims to reduce the amount of data that needs to be labeled for Deep Learning to be effective. In the investigation of certainty calibration, five different methods are compared, and the best method is found to be Dirichlet calibration. The Active Learning investigation studies a single method, Cost-Effective Active Learning, but it is found to produce poor results with the given experiment setting. These two topics inspire the further development of the cytological annotation tool CytoBrowser, which is designed with oral cancer data labeling in mind. The proposed evolution integrates into the existing tool a Deep Learning-assisted annotation workflow that supports multiple users.
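As an illustration of what "certainty calibration" means in the abstract above, the sketch below computes the expected calibration error (ECE), a commonly used measure of the gap between a model's stated confidence and its actual accuracy. This is an assumption made for illustration: the abstract does not say which metric the thesis uses, and the function name and binning scheme here are hypothetical.

```python
import numpy as np


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average gap between confidence and accuracy over confidence bins.

    confidences: predicted probability of the chosen class, one per sample.
    correct:     1 if the prediction was right, 0 otherwise.
    """
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (confidences > lo) & (confidences <= hi)
        if in_bin.any():
            gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by the fraction of samples in the bin
    return ece


# A model that claims 90% confidence but is right only 3 times out of 4.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]))
```

A calibration method such as the Dirichlet calibration named in the abstract would aim to adjust the predicted probabilities so that a gap of this kind shrinks.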
|
124 |
Vizuální systém pro detekci obsazenosti parkoviště pomocí hlubokých neuronových sítí / Visual Car-Detection on the Parking Lots Using Deep Neural Networks
Stránský, Václav January 2017
The concept of smart cities is inherently connected with efficient parking solutions based on knowledge of individual parking space occupancy. The subject of this thesis is the design and implementation of a robust system for analyzing parking space occupancy from a multi-camera system in which the cameras' fields of view may overlap. The system is designed and implemented in the Robot Operating System (ROS), and its core consists of two separate classifiers. The more accurate but slower option is detection by a deep neural network. A fast response is provided by a less accurate motion classifier based on a background model. The system can run in real time on a graphics card as well as on a processor. The success rate of the system on a test data set from real operation exceeds 95 %.
|
125 |
Zlepšování systému pro automatické hraní hry Starcraft II v prostředí PySC2 / Improving Bots Playing Starcraft II Game in PySC2 Environment
Krušina, Jan January 2018
The aim of this thesis is to create an automated system for playing the real-time strategy game Starcraft II. Supervised learning from replays and reinforcement learning techniques are used to improve the bot's behavior. The proposed system should be capable of playing the whole game using the PySC2 framework for machine learning. The performance of the bot is evaluated against the game's built-in scripted AI.
|
126 |
Automatické hodnocení anglické výslovnosti nerodilých mluvčích / Automatic Pronunciation Evaluation of Non-Native English Speakers
Gazdík, Peter January 2019
Computer-Assisted Pronunciation Training (CAPT) is becoming increasingly popular. However, the accuracy of existing CAPT systems is still quite low. This diploma thesis therefore focuses on improving existing methods for automatic pronunciation evaluation at the segmental level. The first part describes common techniques for this task. We then propose a system based on two approaches. Finally, the experiments performed show a significant improvement over the reference system.
|
127 |
Hluboké neuronové sítě / Deep Neural Networks
Habrnál, Matěj January 2014
The thesis addresses the topic of Deep Neural Networks, in particular methods from the field of Deep Learning that are used to initialize the weights and to drive the learning process itself within Deep Neural Networks. Attention is also given to the basic theory of classical Neural Networks, which is important for a comprehensive understanding of the topic. The aim of this work is to determine the optimal set of optional parameters of the algorithms for image recognition tasks of various complexity levels by experimenting with an application, created for this purpose, that applies Deep Neural Networks. The thesis also includes an evaluation and analysis of the results and the lessons learned from experimenting with classical and Deep Neural Networks.
|
128 |
Novel Instances and Applications of Shared Knowledge in Computer Vision and Machine Learning Systems
Synakowski, Stuart R. January 2021
No description available.
|
129 |
Improving the Robustness of Deep Neural Networks against Adversarial Examples via Adversarial Training with Maximal Coding Rate Reduction / Förbättra Robustheten hos Djupa Neurala Nätverk mot Exempel på en Motpart genom Utbildning för motståndare med Maximal Minskning av Kodningshastigheten
Chu, Hsiang-Yu January 2022
Deep learning is one of the hottest scientific topics at the moment. Deep convolutional networks can solve various complex tasks in the field of image processing. However, adversarial attacks have been shown to be able to fool deep learning models. An adversarial attack is carried out by applying specially designed perturbations to the input image of a deep learning model. The perturbations are almost imperceptible to the human eye, but can fool classifiers into making wrong predictions. In this thesis, adversarial attacks and methods to improve deep learning models' robustness against adversarial samples were studied. Five different adversarial attack algorithms were implemented. These attack algorithms included white-box attacks and black-box attacks, targeted attacks and non-targeted attacks, and image-specific attacks and universal attacks. The adversarial attacks generated adversarial examples that resulted in a significant drop in classification accuracy. Adversarial training is one commonly used strategy to improve the robustness of deep learning models against adversarial examples. It is shown that adversarial training can provide an additional regularization benefit beyond that provided by using dropout. Adversarial training is performed by incorporating adversarial examples into the training process. Traditionally, cross-entropy loss is used as the loss function during this process. In order to improve the robustness of deep learning models against adversarial examples, in this thesis we propose two new methods of adversarial training by applying the principle of Maximal Coding Rate Reduction. The Maximal Coding Rate Reduction loss function maximizes the coding rate difference between the whole data set and the sum of each individual class. We evaluated the performance of different adversarial training methods by comparing the clean accuracy, adversarial accuracy and local Lipschitzness.
It was shown that adversarial training with the Maximal Coding Rate Reduction loss function yields a more robust network than the traditional adversarial training method.
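The Maximal Coding Rate Reduction objective mentioned in the abstract above, the coding rate of the whole data set minus the weighted sum of the per-class coding rates, can be sketched in NumPy as follows. This follows the commonly published form of the objective and is only an assumption about what the thesis builds on: the thesis's two adversarial training methods are not reproduced here, and the function names and the value of `eps` are hypothetical.

```python
import numpy as np


def coding_rate(Z, eps=0.5):
    """R(Z) = 1/2 * logdet(I + d / (m * eps^2) * Z Z^T) for features Z of shape (d, m)."""
    d, m = Z.shape
    _, logdet = np.linalg.slogdet(np.eye(d) + (d / (m * eps**2)) * (Z @ Z.T))
    return 0.5 * logdet


def rate_reduction(Z, labels, eps=0.5):
    """Coding rate of all features minus the size-weighted sum of per-class coding rates."""
    labels = np.asarray(labels)
    m = Z.shape[1]
    per_class = 0.0
    for c in np.unique(labels):
        Zc = Z[:, labels == c]
        per_class += (Zc.shape[1] / m) * coding_rate(Zc, eps)
    return coding_rate(Z, eps) - per_class


# Two classes on orthogonal axes versus two classes collapsed onto one axis.
separated = np.array([[1.0, 1.0, 0.0, 0.0],
                      [0.0, 0.0, 1.0, 1.0]])
collapsed = np.array([[1.0, 1.0, 1.0, 1.0],
                      [0.0, 0.0, 0.0, 0.0]])
y = [0, 0, 1, 1]
print(rate_reduction(separated, y))  # positive: classes occupy different directions
print(rate_reduction(collapsed, y))  # approximately zero: classes are indistinguishable
```

When used as a training loss, the negative of `rate_reduction` would be minimized, which pushes features of different classes toward different subspaces while keeping each class compact.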
|
130 |
The Role of Temporal Fine Structure in Everyday Hearing
Agudemu Borjigin (12468234) 28 April 2022
This thesis aims to investigate how one fundamental component of the inner-ear (cochlear) response to all sounds, the temporal fine structure (TFS), is used by the auditory system in everyday hearing. Although it is well known that neurons in the cochlea encode the TFS through exquisite phase locking, how this initial/peripheral temporal code contributes to everyday hearing and how its degradation contributes to perceptual deficits are foundational questions in auditory neuroscience and clinical audiology that remain unresolved despite extensive prior research. This is largely because the conventional approach to studying the role of TFS involves performing perceptual experiments with acoustic manipulations of stimuli (such as sub-band vocoding), rather than direct physiological or behavioral measurements of TFS coding, and hence is intrinsically limited. The present thesis addresses these gaps in three parts: 1) developing assays that can quantify TFS coding at the individual level, 2) comparing individual differences in TFS coding to differences in speech-in-noise perception across a range of real-world listening conditions, and 3) developing deep neural network (DNN) models of speech separation/enhancement to complement the individual-difference approach. By comparing behavioral and electroencephalogram (EEG)-based measures, Part 1 of this work identified a robust test battery that measures TFS processing in individual humans. Using this battery, Part 2 subdivided a large sample of listeners (N=200) into groups with “good” and “poor” TFS sensitivity. A comparison of speech-in-noise scores under a range of listening conditions between the groups revealed that good TFS coding reduces the negative impact of reverberation on speech intelligibility, and leads to reduced reaction times, suggesting lessened listening effort.
These results raise the possibility that cochlear implant (CI) sound coding strategies could be improved by attempting to provide usable TFS information, and that these individualized TFS assays can also help predict listening outcomes in reverberant, real-world listening environments. Finally, the DNN models (Part 3) introduced significant improvements in speech quality and intelligibility, as evidenced by all acoustic evaluation metrics and test results from CI listeners (N=8). These models can be incorporated as “front-end” noise-reduction algorithms in hearing assistive devices, as well as complement other approaches by serving as a research tool to help generate and rapidly sub-select the most viable hypotheses about the role of TFS coding in complex listening scenarios.
|