Spelling suggestions: "subject:"convolutional beural networks"" "subject:"convolutional aneural networks""
41 |
Representation of spatial transformations in deep neural networksLenc, Karel January 2017 (has links)
This thesis addresses the problem of investigating the properties and abilities of a variety of computer vision representations with respect to spatial geometric transformations. Our approach is to employ machine learning methods for finding the behaviour of existing image representations empirically and to apply deep learning to new computer vision tasks where the underlying spatial information is of importance. The results help to further the understanding of modern computer vision representations, such as convolutional neural networks (CNNs) in image classification and object detection and to enable their application to new domains such as local feature detection. Because our theoretical understanding of CNNs remains limited, we investigate two key mathematical properties of representations: equivariance (how transformations of the input image are encoded) and equivalence (how two representations, for example two different parameterizations, layers or architectures share the same visual information). A number of methods to establish these properties empirically are proposed. These methods reveal interesting aspects of their structure, including clarifying at which layers in a CNN geometric invariances are achieved and how various CNN architectures differ. We identify several predictors of geometric and architectural compatibility. Direct applications to structured-output regression are demonstrated as well. Local covariant feature detection has been difficult to approach with machine learning techniques. We propose the first fully general formulation for learning local covariant feature detectors which casts detection as a regression problem, enabling the use of powerful regressors such as deep neural networks. The derived covariance constraint can be used to automatically learn which visual structures provide stable anchors for local feature detection. We support these ideas theoretically, and show that existing detectors can be derived in this framework. Additionally, in cooperation with Imperial College London, we introduce a novel large-scale dataset for evaluation of local detectors and descriptors. It is suitable for training and testing modern local features, together with strictly defined evaluation protocols for descriptors in several tasks such as matching, retrieval and verification. The importance of pixel-wise image geometry for object detection is unknown as the best results used to be obtained with combination of CNNs with cues from image segmentation. We propose a detector which uses constant region proposals and, while it approximates objects poorly, we show that a bounding box regressor using intermediate convolutional features can recover sufficiently accurate bounding boxes, demonstrating that the required geometric information is contained in the CNN itself. Combined with other improvements, we obtain an excellent and fast detector that processes an image only with the CNN.
|
42 |
Exploiting diversity for efficient machine learningGeras, Krzysztof Jerzy January 2018 (has links)
A common practice for solving machine learning problems is currently to consider each problem in isolation, starting from scratch every time a new learning problem is encountered or a new model is proposed. This is a perfectly feasible solution when the problems are sufficiently easy or, if the problem is hard when a large amount of resources, both in terms of the training data and computation, are available. Although this naive approach has been the main focus of research in machine learning for a few decades and had a lot of success, it becomes infeasible if the problem is too hard in proportion to the available resources. When using a complex model in this naive approach, it is necessary to collect large data sets (if possible at all) to avoid overfitting and hence it is also necessary to use large computational resources to handle the increased amount of data, first during training to process a large data set and then also at test time to execute a complex model. An alternative to this strategy of treating each learning problem independently is to leverage related data sets and computation encapsulated in previously trained models. By doing that we can decrease the amount of data necessary to reach a satisfactory level of performance and, consequently, improve the accuracy achievable and decrease training time. Our attack on this problem is to exploit diversity - in the structure of the data set, in the features learnt and in the inductive biases of different neural network architectures. In the setting of learning from multiple sources we introduce multiple-source cross-validation, which gives an unbiased estimator of the test error when the data set is composed of data coming from multiple sources and the data at test time are coming from a new unseen source. We also propose new estimators of variance of the standard k-fold cross-validation and multiple-source cross-validation, which have lower bias than previously known ones. To improve unsupervised learning we introduce scheduled denoising autoencoders, which learn a more diverse set of features than the standard denoising auto-encoder. This is thanks to their training procedure, which starts with a high level of noise, when the network is learning coarse features and then the noise is lowered gradually, which allows the network to learn some more local features. A connection between this training procedure and curriculum learning is also drawn. We develop further the idea of learning a diverse representation by explicitly incorporating the goal of obtaining a diverse representation into the training objective. The proposed model, the composite denoising autoencoder, learns multiple subsets of features focused on modelling variations in the data set at different levels of granularity. Finally, we introduce the idea of model blending, a variant of model compression, in which the two models, the teacher and the student, are both strong models, but different in their inductive biases. As an example, we train convolutional networks using the guidance of bidirectional long short-term memory (LSTM) networks. This allows to train the convolutional neural network to be more accurate than the LSTM network at no extra cost at test time.
|
43 |
Image context for object detection, object context for part detectionGonzalez-Garcia, Abel January 2018 (has links)
Objects and parts are crucial elements for achieving automatic image understanding. The goal of the object detection task is to recognize and localize all the objects in an image. Similarly, semantic part detection attempts to recognize and localize the object parts. This thesis proposes four contributions. The first two make object detection more efficient by using active search strategies guided by image context. The last two involve parts. One of them explores the emergence of parts in neural networks trained for object detection, whereas the other improves on part detection by adding object context. First, we present an active search strategy for efficient object class detection. Modern object detectors evaluate a large set of windows using a window classifier. Instead, our search sequentially chooses what window to evaluate next based on all the information gathered before. This results in a significant reduction on the number of necessary window evaluations to detect the objects in the image. We guide our search strategy using image context and the score of the classifier. In our second contribution, we extend this active search to jointly detect pairs of object classes that appear close in the image, exploiting the valuable information that one class can provide about the location of the other. This leads to an even further reduction on the number of necessary evaluations for the smaller, more challenging classes. In the third contribution of this thesis, we study whether semantic parts emerge in Convolutional Neural Networks trained for different visual recognition tasks, especially object detection. We perform two quantitative analyses that provide a deeper understanding of their internal representation by investigating the responses of the network filters. Moreover, we explore several connections between discriminative power and semantics, which provides further insights on the role of semantic parts in the network. Finally, the last contribution is a part detection approach that exploits object context. We complement part appearance with the object appearance, its class, and the expected relative location of the parts inside it. We significantly outperform approaches that use part appearance alone in this challenging task.
|
44 |
Object Detection using deep learning and synthetic dataLidberg, Love January 2018 (has links)
This thesis investigates how synthetic data can be utilized when training convolutional neural networks to detect flags with threatening symbols. The synthetic data used in this thesis consisted of rendered 3D flags with different textures and flags cut out from real images. The synthetic data showed that it can achieve an accuracy above 80% compared to 88% accuracy achieved by a data set containing only real images. The highest accuracy scored was achieved by combining real and synthetic data showing that synthetic data can be used as a complement to real data. Some attempts to improve the accuracy score was made using generative adversarial networks without achieving any encouraging results.
|
45 |
A Unified Framework based on Convolutional Neural Networks for Interpreting Carotid Intima-Media Thickness VideosJanuary 2016 (has links)
abstract: Cardiovascular disease (CVD) is the leading cause of mortality yet largely preventable, but the key to prevention is to identify at-risk individuals before adverse events. For predicting individual CVD risk, carotid intima-media thickness (CIMT), a noninvasive ultrasound method, has proven to be valuable, offering several advantages over CT coronary artery calcium score. However, each CIMT examination includes several ultrasound videos, and interpreting each of these CIMT videos involves three operations: (1) select three enddiastolic ultrasound frames (EUF) in the video, (2) localize a region of interest (ROI) in each selected frame, and (3) trace the lumen-intima interface and the media-adventitia interface in each ROI to measure CIMT. These operations are tedious, laborious, and time consuming, a serious limitation that hinders the widespread utilization of CIMT in clinical practice. To overcome this limitation, this paper presents a new system to automate CIMT video interpretation. Our extensive experiments demonstrate that the suggested system significantly outperforms the state-of-the-art methods. The superior performance is attributable to our unified framework based on convolutional neural networks (CNNs) coupled with our informative image representation and effective post-processing of the CNN outputs, which are uniquely designed for each of the above three operations. / Dissertation/Thesis / Masters Thesis Computer Science 2016
|
46 |
A Computational Approach to Relative Image AestheticsJanuary 2016 (has links)
abstract: Computational visual aesthetics has recently become an active research area. Existing state-of-art methods formulate this as a binary classification task where a given image is predicted to be beautiful or not. In many applications such as image retrieval and enhancement, it is more important to rank images based on their aesthetic quality instead of binary-categorizing them. Furthermore, in such applications, it may be possible that all images belong to the same category. Hence determining the aesthetic ranking of the images is more appropriate. To this end, a novel problem of ranking images with respect to their aesthetic quality is formulated in this work. A new data-set of image pairs with relative labels is constructed by carefully selecting images from the popular AVA data-set. Unlike in aesthetics classification, there is no single threshold which would determine the ranking order of the images across the entire data-set.
This problem is attempted using a deep neural network based approach that is trained on image pairs by incorporating principles from relative learning. Results show that such relative training procedure allows the network to rank the images with a higher accuracy than a state-of-art network trained on the same set of images using binary labels. Further analyzing the results show that training a model using the image pairs learnt better aesthetic features than training on same number of individual binary labelled images.
Additionally, an attempt is made at enhancing the performance of the system by incorporating saliency related information. Given an image, humans might fixate their vision on particular parts of the image, which they might be subconsciously intrigued to. I therefore tried to utilize the saliency information both stand-alone as well as in combination with the global and local aesthetic features by performing two separate sets of experiments. In both the cases, a standard saliency model is chosen and the generated saliency maps are convoluted with the images prior to passing them to the network, thus giving higher importance to the salient regions as compared to the remaining. Thus generated saliency-images are either used independently or along with the global and the local features to train the network. Empirical results show that the saliency related aesthetic features might already be learnt by the network as a sub-set of the global features from automatic feature extraction, thus proving the redundancy of the additional saliency module. / Dissertation/Thesis / Masters Thesis Computer Science 2016
|
47 |
Content Detection in Handwritten DocumentsJanuary 2018 (has links)
abstract: Handwritten documents have gained popularity in various domains including education and business. A key task in analyzing a complex document is to distinguish between various content types such as text, math, graphics, tables and so on. For example, one such aspect could be a region on the document with a mathematical expression; in this case, the label would be math. This differentiation facilitates the performance of specific recognition tasks depending on the content type. We hypothesize that the recognition accuracy of the subsequent tasks such as textual, math, and shape recognition will increase, further leading to a better analysis of the document.
Content detection on handwritten documents assigns a particular class to a homogeneous portion of the document. To complete this task, a set of handwritten solutions was digitally collected from middle school students located in two different geographical regions in 2017 and 2018. This research discusses the methods to collect, pre-process and detect content type in the collected handwritten documents. A total of 4049 documents were extracted in the form of image, and json format; and were labelled using an object labelling software with tags being text, math, diagram, cross out, table, graph, tick mark, arrow, and doodle. The labelled images were fed to the Tensorflow’s object detection API to learn a neural network model. We show our results from two neural networks models, Faster Region-based Convolutional Neural Network (Faster R-CNN) and Single Shot detection model (SSD). / Dissertation/Thesis / Masters Thesis Computer Science 2018
|
48 |
Image Representation using Attribute-GraphsPrabhu, Nikita January 2016 (has links) (PDF)
In a digital world of Flickr, Picasa and Google Images, developing a semantic image represen-tation has become a vital problem. Image processing and computer vision researchers to date, have used several di erent representations for images. They vary from low level features such as SIFT, HOG, GIST etc. to high level concepts such as objects and people.
When asked to describe an object or a scene, people usually resort to mid-level features such as size, appearance, feel, use, behaviour etc. Such descriptions are commonly referred to as the attributes of the object or scene. These human understandable, machine detectable attributes have recently become a popular feature category for image representation for various vision tasks. In addition to image and object characteristics, object interactions and back-ground/context information and the actions taking place in the scene form an important part of an image description. It is therefore, essential, to develop an image representation which can e ectively describe various image components and their interactions.
Towards this end, we propose a novel image representation, termed Attribute-Graph. An Attribute-Graph is an undirected graph, incorporating both local and global image character-istics. The graph nodes characterise objects as well as the overall scene context using mid-level semantic attributes, while the edges capture the object topology and the actions being per-formed.
We demonstrate the e ectiveness of Attribute-Graphs by applying them to the problem of image ranking. Since an image retrieval system should rank images in a way which is compatible with visual similarity as perceived by humans, it is intuitive that we work in a human understandable feature space. Most content based image retrieval algorithms treat images as a set of low level features or try to de ne them in terms of the associated text. Such a representation fails to capture the semantics of the image. This, more often than not, results in retrieved images which are semantically dissimilar to the query. Ranking using the proposed attribute-graph representation alleviates this problem.
We benchmark the performance of our ranking algorithm on the rPascal and rImageNet datasets, which we have created in order to evaluate the ranking performance on complex
queries containing multiple objects. Our experimental evaluation shows that modelling images as Attribute-Graphs results in improved ranking performance over existing techniques.
|
49 |
Classification Performance of Convolutional Neural NetworksMattsson, Niklas January 2016 (has links)
The purpose of this thesis is to determine the performance of convolutional neural networks in classifications per millisecond, not training or accuracy, for the GTX960 and the TegraX1. This is done through varying parameters of the convolutional neural networks and using the Python framework Theano's function profiler to measure the time taken for different networks. The results show that increasing any parameter of the convolutional neural network also increases the time required for the classification of an image. The parameters do not punish the network equally, however. Convolutional layers and their depth have a far bigger negative impact on the network's performance than fully-connected layers and the amount of neurons in them. Additionally, the time needed for training the networks does not appear to correlate with the time needed for classification.
|
50 |
Analyse fine 2D/3D de véhicules par réseaux de neurones profonds / 2D/3D fine-grained analysis of vehicles using deep neural networksChabot, Florian 28 June 2017 (has links)
Les travaux développés dans cette thèse s’intéressent à l’analyse fine des véhicules à partir d’une image. Nous définissons le terme d’analyse fine comme un regroupement des concepts suivants : la détection des véhicules dans l’image, l’estimation de leur point de vue (ou orientation), la caractérisation de leur visibilité, leur localisation 3D dans la scène et la reconnaissance de leur marque et de leur modèle. La construction de solutions fiables d’analyse fine de véhicules laisse place à de nombreuses applications notamment dans le domaine du transport intelligent et de la vidéo surveillance.Dans ces travaux, nous proposons plusieurs contributions permettant de traiter partiellement ou complètement cette problématique. Les approches mises en oeuvre se basent sur l’utilisation conjointe de l’apprentissage profond et de modèles 3D de véhicule. Dans une première partie, nous traitons le problème de reconnaissance de marques et modèles en prenant en compte la difficulté de la création de bases d’apprentissage. Dans une seconde partie, nous investiguons une méthode de détection et d’estimation du point de vue précis en nous basant sur l’extraction de caractéristiques visuelles locales et de la cohérence géométrique. La méthode utilise des modèles mathématiques uniquement appris sur des données synthétiques. Enfin, dans une troisième partie, un système complet d’analyse fine de véhicules dans le contexte de la conduite autonome est proposé. Celui-ci se base sur le concept d’apprentissage profond multi-tâches. Des résultats quantitatifs et qualitatifs sont présentés tout au long de ce manuscrit. Sur certains aspects de l’analyse fine de véhicules à partir d’une image, ces recherches nous ont permis de dépasser l’état de l’art. / In this thesis, we are interested in fine-grained analysis of vehicle from an image. We define fine-grained analysis as the following concepts : vehicle detection in the image, vehicle viewpoint (or orientation) estimation, vehicle visibility characterization, vehicle 3D localization and make and model recognition. The design of reliable solutions for fine-grained analysis of vehicle open the door to multiple applications in particular for intelligent transport systems as well as video surveillance systems. In this work, we propose several contributions allowing to address partially or wholly this issue. Proposed approaches are based on joint deep learning technologies and 3D models. In a first section, we deal with make and model classification keeping in mind the difficulty to create training data. In a second section, we investigate a novel method for both vehicle detection and fine-grained viewpoint estimation based on local apparence features and geometric spatial coherence. It uses models learned only on synthetic data. Finally, in a third section, a complete system for fine-grained analysis is proposed. It is based on the multi-task concept. Throughout this report, we provide quantitative and qualitative results. On several aspects related to vehicle fine-grained analysis, this work allowed to outperform state of the art methods.
|
Page generated in 0.0697 seconds