141 |
Semantic Segmentation of Oblique Views in a 3D-Environment. Tranell, Victor. January 2019.
This thesis presents and evaluates methods to semantically segment 3D models via rendered 2D views: the 2D views are segmented separately and then merged. The thesis evaluates three merge strategies, two classification architectures, how many views should be rendered, and how those views should be arranged. The results are evaluated both quantitatively and qualitatively and compared with the current classifier at Vricon presented in [30]. The conclusion is that this method yields a performance gain. The best model uses two views and attains an accuracy of 90.89%, compared with the 84.52% achieved by the single-view network from [30]. The best nine-view system achieved an accuracy of 87.72%. The difference in accuracy between the two-view and nine-view systems is attributed to the higher-quality mesh on the sunny side of objects, which is typically the south side. The thesis provides a proof of concept, and there are still many areas where the system can be improved; one of them is the extraction of training data, which would seemingly have a large impact on performance.
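As a concrete illustration of the merge step, here is a minimal sketch that averages per-view class scores over the views in which each mesh face is visible. It assumes per-view softmax scores have already been projected onto mesh faces; the thesis's actual merge strategies may differ, and all names are illustrative.

```python
# A minimal sketch of score-level merging of per-view segmentations, assuming
# per-pixel class scores from each rendered view have already been projected
# onto mesh faces. Names (n_faces, n_classes, view_scores, view_visibility)
# are illustrative, not from the thesis.
import numpy as np

def merge_views(view_scores, view_visibility):
    """Average class scores over the views in which each face is visible.

    view_scores:     (n_views, n_faces, n_classes) softmax scores per view
    view_visibility: (n_views, n_faces) boolean mask, True if the face is
                     visible in that view
    Returns one class label per mesh face.
    """
    vis = view_visibility[..., None].astype(float)   # broadcastable mask
    summed = (view_scores * vis).sum(axis=0)         # sum scores over views
    counts = np.maximum(vis.sum(axis=0), 1.0)        # avoid division by zero
    return (summed / counts).argmax(axis=-1)         # most likely class

# Toy usage: 2 views, 4 faces, 3 classes
scores = np.random.rand(2, 4, 3)
visible = np.ones((2, 4), dtype=bool)
labels = merge_views(scores, visible)
```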
|
142 |
Representation of spatial transformations in deep neural networks. Lenc, Karel. January 2017.
This thesis investigates the properties and abilities of a variety of computer vision representations with respect to spatial geometric transformations. Our approach is to employ machine learning methods to characterize the behaviour of existing image representations empirically, and to apply deep learning to new computer vision tasks where the underlying spatial information is of importance. The results further the understanding of modern computer vision representations, such as convolutional neural networks (CNNs) in image classification and object detection, and enable their application to new domains such as local feature detection. Because our theoretical understanding of CNNs remains limited, we investigate two key mathematical properties of representations: equivariance (how transformations of the input image are encoded) and equivalence (whether two representations, for example two different parameterizations, layers or architectures, share the same visual information). A number of methods for establishing these properties empirically are proposed. These methods reveal interesting aspects of the representations' structure, including clarifying at which layers in a CNN geometric invariances are achieved and how various CNN architectures differ. We identify several predictors of geometric and architectural compatibility, and demonstrate direct applications to structured-output regression. Local covariant feature detection has been difficult to approach with machine learning techniques. We propose the first fully general formulation for learning local covariant feature detectors, which casts detection as a regression problem and enables the use of powerful regressors such as deep neural networks. The derived covariance constraint can be used to automatically learn which visual structures provide stable anchors for local feature detection. We support these ideas theoretically and show that existing detectors can be derived in this framework. Additionally, in cooperation with Imperial College London, we introduce a novel large-scale dataset for the evaluation of local detectors and descriptors. It is suitable for training and testing modern local features, and comes with strictly defined evaluation protocols for descriptors in several tasks such as matching, retrieval and verification. The importance of pixel-wise image geometry for object detection had been unclear, as the best results were obtained by combining CNNs with cues from image segmentation. We propose a detector which uses constant region proposals and, although these approximate objects poorly, we show that a bounding box regressor using intermediate convolutional features can recover sufficiently accurate bounding boxes, demonstrating that the required geometric information is contained in the CNN itself. Combined with other improvements, we obtain an excellent and fast detector that processes an image using only the CNN.
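The equivariance analysis can be illustrated with a small empirical test: fit a linear map so that features of the transformed image are predicted from features of the original, and measure the residual. This is a hedged sketch of the general idea, not the thesis's exact procedure; all names are illustrative.

```python
# A minimal sketch of an empirical equivariance test, assuming precomputed
# feature vectors: feats[i] for image i and feats_t[i] for the same image
# after a fixed geometric transformation g. The linear map M_g is fit by
# least squares; a low residual suggests the representation is (linearly)
# equivariant to g. All names are illustrative.
import numpy as np

def equivariance_residual(feats, feats_t):
    """Fit M_g with feats_t ~ feats @ M_g and return the relative error."""
    M, *_ = np.linalg.lstsq(feats, feats_t, rcond=None)
    err = np.linalg.norm(feats @ M - feats_t) / np.linalg.norm(feats_t)
    return M, err

# Toy usage: 200 images, 64-dim features, a perfectly equivariant toy case
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 64))
M_true = rng.standard_normal((64, 64))
_, err = equivariance_residual(X, X @ M_true)   # err is ~0 here
```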
|
143 |
Exploiting diversity for efficient machine learning. Geras, Krzysztof Jerzy. January 2018.
A common practice for solving machine learning problems is to consider each problem in isolation, starting from scratch every time a new learning problem is encountered or a new model is proposed. This is a perfectly feasible approach when the problems are sufficiently easy or, if the problems are hard, when a large amount of resources, in terms of both training data and computation, is available. Although this naive approach has been the main focus of machine learning research for decades and has had a lot of success, it becomes infeasible when the problem is too hard relative to the available resources. When using a complex model in this naive approach, it is necessary to collect large data sets (if that is possible at all) to avoid overfitting, and hence also necessary to use large computational resources to handle the increased amount of data, first during training to process a large data set and then at test time to execute a complex model. An alternative to treating each learning problem independently is to leverage related data sets and the computation encapsulated in previously trained models. This decreases the amount of data necessary to reach a satisfactory level of performance and, consequently, improves the achievable accuracy and decreases training time. Our attack on this problem is to exploit diversity: in the structure of the data set, in the features learnt, and in the inductive biases of different neural network architectures. In the setting of learning from multiple sources, we introduce multiple-source cross-validation, which gives an unbiased estimator of the test error when the data set is composed of data from multiple sources and the data at test time come from a new, unseen source. We also propose new estimators of the variance of standard k-fold cross-validation and of multiple-source cross-validation, which have lower bias than previously known ones. To improve unsupervised learning, we introduce scheduled denoising autoencoders, which learn a more diverse set of features than the standard denoising autoencoder. This is due to their training procedure, which starts with a high level of noise, when the network learns coarse features, and then gradually lowers the noise, allowing the network to learn finer, more local features. A connection between this training procedure and curriculum learning is also drawn. We develop the idea of learning a diverse representation further by explicitly incorporating the goal of obtaining a diverse representation into the training objective. The proposed model, the composite denoising autoencoder, learns multiple subsets of features focused on modelling variations in the data set at different levels of granularity. Finally, we introduce model blending, a variant of model compression in which the two models, the teacher and the student, are both strong models but differ in their inductive biases. As an example, we train convolutional networks under the guidance of bidirectional long short-term memory (LSTM) networks, making the convolutional network more accurate than the LSTM network at no extra cost at test time.
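A minimal sketch of the multiple-source cross-validation splitting scheme, assuming each training example carries a source tag: every fold holds out all data from one source, mimicking test data that come from a new, unseen source. Names are illustrative.

```python
# A minimal sketch of multiple-source cross-validation splitting, assuming
# every example is tagged with the source it came from. Each fold holds out
# all data from one source, so the test fold always plays the role of an
# unseen source. Names are illustrative, not from the thesis.
import numpy as np

def multiple_source_cv_splits(sources):
    """Yield (train_idx, test_idx) pairs, one per distinct source."""
    sources = np.asarray(sources)
    for s in np.unique(sources):
        test = np.where(sources == s)[0]
        train = np.where(sources != s)[0]
        yield train, test

# Toy usage: 6 examples from 3 sources
for train_idx, test_idx in multiple_source_cv_splits(["a", "a", "b", "b", "c", "c"]):
    pass  # train on train_idx, estimate error on the held-out source test_idx
```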
|
144 |
Image context for object detection, object context for part detection. Gonzalez-Garcia, Abel. January 2018.
Objects and parts are crucial elements for achieving automatic image understanding. The goal of the object detection task is to recognize and localize all the objects in an image; similarly, semantic part detection attempts to recognize and localize object parts. This thesis makes four contributions. The first two make object detection more efficient by using active search strategies guided by image context; the last two involve parts. One explores the emergence of parts in neural networks trained for object detection, while the other improves part detection by adding object context. First, we present an active search strategy for efficient object class detection. Modern object detectors evaluate a large set of windows with a window classifier. Instead, our search sequentially chooses which window to evaluate next based on all the information gathered so far. This results in a significant reduction in the number of window evaluations needed to detect the objects in the image. We guide the search using image context and the classifier score. In our second contribution, we extend this active search to jointly detect pairs of object classes that appear close together in the image, exploiting the valuable information that one class can provide about the location of the other. This leads to an even further reduction in the number of necessary evaluations for the smaller, more challenging classes. In the third contribution, we study whether semantic parts emerge in convolutional neural networks trained for different visual recognition tasks, especially object detection. We perform two quantitative analyses that provide a deeper understanding of the networks' internal representation by investigating the responses of their filters. Moreover, we explore several connections between discriminative power and semantics, which provides further insight into the role of semantic parts in the network. Finally, the last contribution is a part detection approach that exploits object context: we complement part appearance with the object's appearance, its class, and the expected relative location of the parts inside it. We significantly outperform approaches that use part appearance alone on this challenging task.
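The active search loop can be sketched as follows. This is a deliberately simplified stand-in: the Gaussian neighbour update is an illustrative substitute for the thesis's learned context model, and all names are hypothetical.

```python
# A simplified sketch of active window search: instead of scoring every
# candidate window, repeatedly evaluate the window currently believed most
# promising, and use its observed score to update beliefs about nearby
# windows. The Gaussian update below is an illustrative stand-in for the
# context model in the thesis; all names are hypothetical.
import numpy as np

def active_search(windows, classifier, budget, sigma=50.0):
    """windows: (n, 4) array of [x, y, w, h] boxes; classifier(box) -> score."""
    n = len(windows)
    belief = np.zeros(n)                 # current promise of each window
    evaluated = np.zeros(n, dtype=bool)
    detections = []
    centers = windows[:, :2] + windows[:, 2:] / 2.0
    for _ in range(budget):              # far fewer evaluations than n
        i = int(np.argmax(np.where(evaluated, -np.inf, belief)))
        score = classifier(windows[i])
        evaluated[i] = True
        detections.append((windows[i], score))
        # context update: windows near a high-scoring one become more promising
        dist = np.linalg.norm(centers - centers[i], axis=1)
        belief += score * np.exp(-dist ** 2 / (2 * sigma ** 2))
    return detections
```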
|
145 |
Identification of Individuals from Ears in Real World Conditions. Hansley, Earnest Eugene. 05 April 2018.
A number of researchers have shown that ear recognition is a viable alternative to more common biometrics such as fingerprint, face, and iris: the ear is relatively stable over time, capturing it is non-invasive, it is expressionless, and both its geometry and shape vary significantly among individuals. Researchers have taken different approaches to advance ear recognition. Some have improved existing algorithms, some have developed algorithms from scratch for recognizing individuals by their ears, and some have taken algorithms tried and tested for another purpose, e.g., face recognition, and applied them to ear recognition. These approaches have produced a number of effective, state-of-the-art methods for identifying individuals by their ears. However, most ear recognition research has used ear images captured in an ideal setting: near-perfect lighting, ears in the same position for each subject, and no earrings, hair occlusions, or anything else that could block the view of the entire ear.
For ear recognition to be practical, current approaches must be improved. Ear recognition must move beyond ideal settings and demonstrate effectiveness in unconstrained environments reflective of real-world conditions. Ear recognition approaches must scale to large groups of people, and they should demonstrate effectiveness across a diverse population.
This dissertation advances ear recognition from ideal settings to real-world settings. We devised an ear recognition framework that outperformed state-of-the-art recognition approaches on the most challenging sets of publicly available ear images and on the most voluminous set of unconstrained ear images we are aware of. We developed a Convolutional Neural Network-based solution for ear normalization and description, designed a two-stage landmark detector, and fused learned and handcrafted descriptors. Using our framework, we identified individuals wearing earrings and with other occlusions, such as hair. The results suggest that our framework can be a gateway to identification of individuals in real-world conditions.
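One way to picture the descriptor fusion is score-level blending of a learned and a handcrafted descriptor, as in the hedged sketch below; the framework's exact fusion scheme may differ, and all names and dimensions are illustrative.

```python
# A minimal sketch of fusing a learned (CNN) descriptor with a handcrafted
# one for ear matching, assuming both are already extracted. Score-level
# fusion with a weight alpha is one simple choice; the dissertation's exact
# scheme may differ, and all names and dimensions are illustrative.
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def match_score(cnn_a, cnn_b, hand_a, hand_b, alpha=0.5):
    """Cosine similarities of both descriptor types, blended by alpha."""
    s_cnn = float(cnn_a @ cnn_b)
    s_hand = float(hand_a @ hand_b)
    return alpha * s_cnn + (1.0 - alpha) * s_hand

# Toy usage: 128-dim learned and 59-dim handcrafted descriptors
a_cnn, b_cnn = normalize(np.random.rand(128)), normalize(np.random.rand(128))
a_h, b_h = normalize(np.random.rand(59)), normalize(np.random.rand(59))
score = match_score(a_cnn, b_cnn, a_h, b_h)   # higher = more likely same ear
```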
|
146 |
Object Detection using deep learning and synthetic data. Lidberg, Love. January 2018.
This thesis investigates how synthetic data can be used when training convolutional neural networks to detect flags with threatening symbols. The synthetic data consisted of rendered 3D flags with different textures, plus flags cut out from real images. Training on synthetic data alone achieved an accuracy above 80%, compared with the 88% achieved by a data set containing only real images. The highest accuracy was achieved by combining real and synthetic data, showing that synthetic data can be used as a complement to real data. Attempts to improve accuracy further using generative adversarial networks did not yield encouraging results.
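A minimal sketch of the real-plus-synthetic data mixing, assuming two labelled datasets of (image, label) pairs; the mixing ratio and all names are illustrative, not from the thesis.

```python
# A minimal sketch of mixing real and synthetic training data, assuming two
# lists of (image, label) pairs. Plain concatenation with shuffling is one
# simple choice; the fraction and names are illustrative.
import random

def mix_datasets(real, synthetic, synthetic_fraction=0.5, seed=0):
    """Return a shuffled training list with the requested synthetic share."""
    # number of synthetic samples needed for the target fraction
    k = int(len(real) * synthetic_fraction / max(1e-9, 1.0 - synthetic_fraction))
    rng = random.Random(seed)
    combined = list(real) + rng.sample(list(synthetic), min(k, len(synthetic)))
    rng.shuffle(combined)
    return combined
```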
|
147 |
Estudo e implementação de um sistema IEEE 802.11g empregando o conceito de Software Defined Radio. Perez Junior, José Antonio Gonzalez. January 2017.
Advisor: Prof. Dr. Carlos Eduardo Capovilla / Master's dissertation - Universidade Federal do ABC, Programa de Pós-Graduação em Engenharia Elétrica, 2017. / With the evolution of communication technology and the constant need for high data-transfer rates, wireless communication has become the main and favorite medium for a wide range of applications. By combining agility, performance, and ease of installation, it is often found in control systems, audio and television systems, internet access, etc. However, due to channel imperfections and noise, this communication requires efficient modulation and adequate protection against errors in data transmission. The IEEE 802.11g standard, present in practically all modern communication systems and widely diffused through the networks known as WiFi, emerges as a fitting solution, since it combines robust and efficient techniques such as OFDM modulation and convolutional coding. Given the digital concept and the dynamic behavior of wireless communication, SDR (Software Defined Radio) becomes an interesting and powerful tool, allowing the simulation and implementation of transceivers for several applications on a single device. This master's project therefore studies and tests an IEEE 802.11g wireless communication system using an SDR device, focusing on efficient, low-cost systems, to interface the physical medium with the digital signal processing environment.
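The OFDM modulation at the core of IEEE 802.11g can be sketched compactly. The snippet below builds one OFDM symbol from QPSK data using the 802.11a/g numerology (64-point IFFT, 48 data subcarriers, 16-sample cyclic prefix); pilot insertion, convolutional coding, and interleaving are omitted, so this is illustrative rather than the project's implementation.

```python
# A minimal sketch of 802.11a/g-style OFDM modulation: 48 QPSK symbols on
# the data subcarriers of a 64-point IFFT, with a 16-sample cyclic prefix.
# Pilots, coding, and interleaving are deliberately omitted.
import numpy as np

N_FFT, N_CP = 64, 16
# the 48 data subcarriers: bins -26..26 excluding DC (0) and pilots (+-7, +-21)
DATA_BINS = np.array([k for k in range(-26, 27)
                      if k not in (0, -21, -7, 7, 21)])

def ofdm_symbol(qpsk):
    """Map 48 QPSK symbols to one time-domain OFDM symbol with cyclic prefix."""
    spec = np.zeros(N_FFT, dtype=complex)
    spec[DATA_BINS % N_FFT] = qpsk                 # negative bins wrap around
    time = np.fft.ifft(spec) * np.sqrt(N_FFT)      # to time domain
    return np.concatenate([time[-N_CP:], time])    # prepend cyclic prefix

# Toy usage: 96 random bits -> 48 Gray-free QPSK symbols -> 80 samples
bits = np.random.randint(0, 2, (48, 2))
qpsk = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)
tx = ofdm_symbol(qpsk)                             # complex baseband samples
```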
|
148 |
Chinese Text Classification Based On Deep Learning. Wang, Xutao. January 2018.
Text classification has long been a central task in natural language processing, especially now that data volumes are growing rapidly with the development of the internet. The recurrent neural network (RNN) is one of the most popular methods for natural language processing because its recurrent architecture gives it the ability to process serialized information. Meanwhile, the convolutional neural network (CNN) has shown its ability to extract features from visual imagery. This paper combines the advantages of RNNs and CNNs in a model called BLSTM-C for Chinese text classification. BLSTM-C begins with a bidirectional long short-term memory (BLSTM) layer, a special kind of RNN, which produces a sequence output based on both past and future context. It then feeds this sequence to a CNN layer, which extracts features from the BLSTM output. We evaluate the BLSTM-C model on several tasks, such as sentiment classification and category classification, and the results show the model's remarkable performance on these text tasks.
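A minimal PyTorch sketch of the BLSTM-C idea described above: a bidirectional LSTM produces a context-aware sequence, a 1-D convolution extracts local features from it, and pooling plus a linear layer produce the class scores. Hyperparameters are illustrative, not taken from the paper.

```python
# A minimal sketch of the BLSTM-C architecture: BLSTM -> Conv1d -> max-pool
# -> linear classifier. All hyperparameters are illustrative.
import torch
import torch.nn as nn

class BLSTMC(nn.Module):
    def __init__(self, vocab_size, n_classes, emb=128, hidden=128, filters=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb)
        self.blstm = nn.LSTM(emb, hidden, batch_first=True, bidirectional=True)
        self.conv = nn.Conv1d(2 * hidden, filters, kernel_size=3, padding=1)
        self.fc = nn.Linear(filters, n_classes)

    def forward(self, tokens):                        # tokens: (batch, seq_len)
        x = self.embed(tokens)                        # (batch, seq, emb)
        x, _ = self.blstm(x)                          # (batch, seq, 2*hidden)
        x = torch.relu(self.conv(x.transpose(1, 2)))  # (batch, filters, seq)
        x = x.max(dim=2).values                       # global max-pool over time
        return self.fc(x)                             # (batch, n_classes)

# Toy usage: a batch of 4 sentences of length 32 over a 5000-word vocabulary
model = BLSTMC(vocab_size=5000, n_classes=10)
logits = model(torch.randint(0, 5000, (4, 32)))
```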
|
149 |
A Unified Framework based on Convolutional Neural Networks for Interpreting Carotid Intima-Media Thickness Videos. January 2016.
Cardiovascular disease (CVD) is the leading cause of mortality yet is largely preventable; the key to prevention is identifying at-risk individuals before adverse events occur. For predicting individual CVD risk, carotid intima-media thickness (CIMT), a noninvasive ultrasound method, has proven valuable, offering several advantages over the CT coronary artery calcium score. However, each CIMT examination includes several ultrasound videos, and interpreting each video involves three operations: (1) selecting three end-diastolic ultrasound frames (EUF) in the video, (2) localizing a region of interest (ROI) in each selected frame, and (3) tracing the lumen-intima interface and the media-adventitia interface in each ROI to measure CIMT. These operations are tedious, laborious, and time-consuming, a serious limitation that hinders the widespread use of CIMT in clinical practice. To overcome this limitation, this paper presents a new system to automate CIMT video interpretation. Our extensive experiments demonstrate that the proposed system significantly outperforms state-of-the-art methods. The superior performance is attributable to our unified framework based on convolutional neural networks (CNNs), coupled with an informative image representation and effective post-processing of the CNN outputs, each uniquely designed for one of the three operations above. / Master's Thesis, Computer Science, 2016.
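The three operations can be pictured as a skeletal pipeline, shown below. Each stage would be backed by its own CNN in the actual system; the stubs only show the data flow, and every name is hypothetical.

```python
# A skeletal sketch of the three-operation CIMT pipeline. Each stage takes a
# CNN as a callable; the bodies only show the data flow, and every name is
# hypothetical rather than from the paper.
import numpy as np

def select_end_diastolic_frames(video, frame_cnn):
    """Op 1: score each frame with a CNN, keep the three best EUF candidates."""
    scores = np.array([frame_cnn(frame) for frame in video])
    return list(np.argsort(scores)[-3:])

def localize_roi(frame, roi_cnn):
    """Op 2: return an (x, y, w, h) region of interest from the ROI network."""
    return roi_cnn(frame)

def measure_cimt(roi_patch, interface_cnn):
    """Op 3: trace lumen-intima and media-adventitia interfaces, return CIMT."""
    lumen_intima, media_adventitia = interface_cnn(roi_patch)
    return float(np.mean(media_adventitia - lumen_intima))
```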
|
150 |
A Computational Approach to Relative Image Aesthetics. January 2016.
Computational visual aesthetics has recently become an active research area. Existing state-of-the-art methods formulate this as a binary classification task in which a given image is predicted to be beautiful or not. In many applications, such as image retrieval and enhancement, it is more important to rank images by their aesthetic quality than to categorize them into two bins. Furthermore, in such applications it may be that all images belong to the same category, so determining an aesthetic ranking is more appropriate. To this end, this work formulates the novel problem of ranking images with respect to their aesthetic quality. A new data set of image pairs with relative labels is constructed by carefully selecting images from the popular AVA data set. Unlike in aesthetics classification, no single threshold determines the ranking order of the images across the entire data set.
This problem is addressed with a deep neural network trained on image pairs using principles from relative learning. Results show that this relative training procedure allows the network to rank images with higher accuracy than a state-of-the-art network trained on the same images using binary labels. Further analysis shows that training on image pairs yields better aesthetic features than training on the same number of individually binary-labelled images.
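The relative training procedure can be sketched with a pairwise margin ranking loss: a shared network scores each image of a pair, and the loss pushes the more aesthetic image's score above the other's. The backbone and margin below are illustrative, not from the thesis.

```python
# A minimal sketch of relative learning for aesthetics: a shared scorer and
# a margin ranking loss over image pairs. The tiny backbone, image size, and
# margin are illustrative stand-ins for the network in the thesis.
import torch
import torch.nn as nn

scorer = nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 256),
                       nn.ReLU(), nn.Linear(256, 1))   # toy aesthetic scorer
loss_fn = nn.MarginRankingLoss(margin=0.1)

better = torch.rand(8, 3, 64, 64)     # images labelled as more aesthetic
worse = torch.rand(8, 3, 64, 64)      # their less aesthetic counterparts
s1, s2 = scorer(better).squeeze(1), scorer(worse).squeeze(1)
target = torch.ones(8)                # +1 means s1 should exceed s2
loss = loss_fn(s1, s2, target)        # backpropagated during training
loss.backward()
```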
Additionally, an attempt is made to enhance the system's performance by incorporating saliency information. Given an image, humans tend to fixate on particular parts that subconsciously intrigue them. I therefore tried to utilize saliency information both stand-alone and in combination with the global and local aesthetic features, in two separate sets of experiments. In both cases, a standard saliency model is chosen and the generated saliency maps are convolved with the images before passing them to the network, giving higher importance to the salient regions. The resulting saliency images are used either independently or together with the global and local features to train the network. Empirical results show that the saliency-related aesthetic features may already be learnt by the network as a subset of the global features from automatic feature extraction, making the additional saliency module redundant. / Master's Thesis, Computer Science, 2016.
|