591 |
Novel Image Representations and Learning Tasks / January 2017
abstract: Computer vision as a field has gone through significant changes in the last decade. The field has seen tremendous success in designing learning systems with hand-crafted features and in using representation learning to extract better features. This dissertation studies some novel approaches to representation learning and task learning.
Multiple-instance learning, which is a generalization of supervised learning, is one example of task learning that is discussed. In particular, a novel non-parametric k-NN-based multiple-instance learning method is proposed and shown to outperform other existing approaches. This solution is applied effectively to a diabetic retinopathy pathology detection problem.
For representation learning, the generality of neural features is investigated first. This investigation leads to some critical understanding of, and results on, feature generality among datasets. The possibility of learning from a mentor network instead of from labels is then investigated. Distillation of dark knowledge is used to efficiently mentor a small network from a pre-trained large mentor network. These studies help in understanding representation learning with smaller and compressed networks. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2017
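The mentoring step described in this abstract can be illustrated with a generic soft-target distillation loss. This is a minimal sketch, not the dissertation's implementation: the function names, the temperature value and the toy logits are all assumptions made for illustration.

```python
import numpy as np

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives softer probabilities."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, mentor_logits, temperature=4.0):
    """Cross-entropy between softened mentor and student outputs, averaged over the batch."""
    p_mentor = softmax(mentor_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return float(-(p_mentor * np.log(p_student + 1e-12)).sum(axis=-1).mean())

# Hypothetical logits for one sample and three classes.
mentor = np.array([[8.0, 2.0, 0.5]])
aligned = distillation_loss(mentor, mentor)                         # student agrees with mentor
mismatched = distillation_loss(np.array([[0.5, 2.0, 8.0]]), mentor)  # student disagrees
```

A student whose outputs match the mentor's obtains the lower loss, so minimizing this quantity pulls the small network toward the mentor's softened predictions without using ground-truth labels.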
|
592 |
Towards Learning Representations in Visual Computing Tasks / January 2017
abstract: The performance of most visual computing tasks depends on the quality of the features extracted from the raw data. An insightful feature representation increases the performance of many learning algorithms by exposing the underlying explanatory factors of the output for the unobserved input. A good representation should also handle anomalies in the data, such as missing samples and noisy input caused by undesired, external factors of variation, and should reduce data redundancy. Over the years, many feature extraction processes have been invented to produce good representations of raw images and videos.
The feature extraction processes can be categorized into three groups. The first group contains processes that are hand-crafted for a specific task. Hand-engineering features requires the knowledge of domain experts and manual labor. However, the feature extraction process is interpretable and explainable. The next group contains the latent-feature extraction processes. While the original feature lies in a high-dimensional space, the relevant factors for a task often lie on a lower-dimensional manifold. Latent-feature extraction employs hidden variables to expose the underlying data properties that cannot be directly measured from the input. Latent-feature methods impose a specific structure, such as sparsity or low rank, on the derived representation through sophisticated optimization techniques. The last category is that of deep features. These are obtained by passing raw input data, with minimal pre-processing, through a deep network whose parameters are computed by iteratively minimizing a task-based loss.
In this dissertation, I present four pieces of work where I create and learn suitable data representations. The first task employs hand-crafted features to perform clinically relevant retrieval of diabetic retinopathy images. The second task uses latent features to perform content-adaptive image enhancement. The third task ranks a pair of images based on their aesthetics. The goal of the last task is to capture localized image artifacts in small datasets with patch-level labels. For the last two tasks, I propose novel deep architectures and show significant improvement over the previous state-of-the-art approaches. A suitable combination of feature representations augmented with an appropriate learning approach can increase performance for most visual computing tasks. / Dissertation/Thesis / Doctoral Dissertation Computer Science 2017
|
593 |
Deep Active Learning Explored Across Diverse Label Spaces / January 2018
abstract: Deep learning architectures have been widely explored in computer vision and have demonstrated commendable performance in a variety of applications. A fundamental challenge in training deep networks is the requirement of large amounts of labeled training data. While gathering large quantities of unlabeled data is cheap and easy, annotating the data is an expensive process in terms of time, labor and human expertise. Thus, developing algorithms that minimize the human effort in training deep models is of immense practical importance. Active learning algorithms automatically identify salient and exemplar samples from large amounts of unlabeled data and can add maximal information to supervised learning models, thereby reducing the human annotation effort in training machine learning models. The goal of this dissertation is to fuse ideas from deep learning and active learning and design novel deep active learning algorithms. The proposed learning methodologies explore diverse label spaces to solve different computer vision applications. Three major contributions have emerged from this work: (i) a deep active framework for multi-class image classification, (ii) a deep active model with and without label correlation for multi-label image classification and (iii) a deep active paradigm for regression. Extensive empirical studies on a variety of multi-class, multi-label and regression vision datasets corroborate the potential of the proposed methods for real-world applications. Additional contributions include: (i) a multimodal emotion database consisting of recordings of facial expressions, body gestures, vocal expressions and physiological signals of actors enacting various emotions, (ii) four multimodal deep belief network models and (iii) an in-depth analysis of the effect of transfer of multimodal emotion features between source and target networks on classification accuracy and training time. These related contributions help comprehend the challenges involved in training deep learning models and motivate the main goal of this dissertation. / Dissertation/Thesis / Doctoral Dissertation Electrical Engineering 2018
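The idea of picking the most informative unlabeled samples can be sketched with plain uncertainty sampling. The abstract does not state which selection criterion the dissertation uses, so entropy-based ranking is an assumption here, as are the function names and the toy probabilities.

```python
import numpy as np

def predictive_entropy(probs):
    """Shannon entropy of each row of class probabilities (higher means more uncertain)."""
    p = np.clip(np.asarray(probs, dtype=float), 1e-12, 1.0)
    return -(p * np.log(p)).sum(axis=1)

def select_for_annotation(probs, budget):
    """Indices of the `budget` most uncertain unlabeled samples, most uncertain first."""
    scores = predictive_entropy(probs)
    return np.argsort(-scores, kind="stable")[:budget]

# Hypothetical softmax outputs of a multi-class model on four unlabeled images.
pool_probs = np.array([
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # nearly uniform, most uncertain
    [0.70, 0.20, 0.10],
    [0.50, 0.49, 0.01],
])
chosen = select_for_annotation(pool_probs, budget=2)  # indices 1 and 2
```

In an active learning loop the chosen samples would be sent to a human annotator, added to the labeled set, and the model retrained before the next selection round.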
|
594 |
Design and Mining of Health Information Systems for Process and Patient Care Improvement / January 2018
abstract: In healthcare facilities, health information systems (HISs) are used to serve different purposes. The radiology department adopts multiple HISs in managing their operations and patient care. In general, the HISs that touch radiology fall into two categories: tracking HISs and archive HISs. Electronic Health Records (EHR) is a typical tracking HIS, which tracks the care each patient receives at multiple encounters and facilities. Archive HISs are typically specialized databases to store large-size data collected as part of the patient care. A typical example of an archive HIS is the Picture Archive and Communication System (PACS), which provides economical storage and convenient access to diagnostic images from multiple modalities. How to integrate such HISs and best utilize their data remains a challenging problem due to the disparity of HISs as well as high-dimensionality and heterogeneity of the data. My PhD dissertation research includes three inter-connected and integrated topics and focuses on designing integrated HISs and further developing statistical models and machine learning algorithms for process and patient care improvement.
Topic 1: Design of super-HIS and tracking of quality of care (QoC). My research developed an information technology that integrates multiple HISs in radiology, and proposed QoC metrics defined upon the data that measure various dimensions of care. The DDD assisted the clinical practices and enabled an effective intervention for reducing lengthy radiologist turnaround times for patients.
Topic 2: Monitoring and change detection of QoC data streams for process improvement. With the super-HIS in place, high-dimensional data streams of QoC metrics are generated. I developed a statistical model for monitoring high-dimensional data streams that integrates Singular Value Decomposition (SVD) and process control. The algorithm was applied to QoC metrics data and additionally extended to another application, the monitoring of traffic data in communication networks.
Topic 3: Deep transfer learning of archive HIS data for computer-aided diagnosis (CAD). The novelty of the CAD system is the development of a deep transfer learning algorithm that combines the ideas of transfer learning and multi-modality image integration under the deep learning framework. Our system achieved high accuracy in breast cancer diagnosis compared with conventional machine learning algorithms. / Dissertation/Thesis / Doctoral Dissertation Industrial Engineering 2018
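To make the combination of SVD and process control in Topic 2 concrete, the sketch below projects a high-dimensional stream onto its leading singular directions and raises an alarm when a Hotelling-style T² statistic exceeds an empirical control limit. The abstract does not specify the monitoring statistic, so T² with a quantile limit is an assumption, and all names and synthetic data are hypothetical.

```python
import numpy as np

def fit_monitor(reference, n_components=2, quantile=0.99):
    """Learn an SVD subspace and a T^2 control limit from in-control reference data."""
    mean = reference.mean(axis=0)
    centered = reference - mean
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    basis = vt[:n_components].T                              # (d, k) leading right singular vectors
    variances = s[:n_components] ** 2 / (len(reference) - 1)  # variance along each direction
    t2 = ((centered @ basis) ** 2 / variances).sum(axis=1)
    limit = np.quantile(t2, quantile)                        # empirical control limit
    return mean, basis, variances, limit

def t2_statistic(samples, mean, basis, variances):
    """T^2 of new samples measured inside the learned subspace."""
    scores = (samples - mean) @ basis
    return (scores ** 2 / variances).sum(axis=1)

# Synthetic in-control stream: 20 metrics driven by 2 latent factors plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
loadings = rng.normal(size=(2, 20))
reference = latent @ loadings + 0.1 * rng.normal(size=(500, 20))
mean, basis, variances, limit = fit_monitor(reference)

# A large shift along the first latent factor should trigger alarms.
shifted = reference[:5] + 8.0 * loadings[0]
alarms = t2_statistic(shifted, mean, basis, variances) > limit
```

Working in the low-dimensional SVD subspace keeps the control chart usable when the number of metrics is large relative to the number of observations.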
|
595 |
Learning Transferable Data Representations Using Deep Generative Models / January 2018
abstract: Machine learning models convert raw data in the form of video, images, audio, text, etc. into feature representations that are convenient for computational processing. Deep neural networks have proven to be very efficient feature extractors for a variety of machine learning tasks. Generative models based on deep neural networks introduce constraints on the feature space to learn transferable and disentangled representations. Transferable feature representations help in training machine learning models that are robust across different distributions of data. For example, with the application of transferable features in domain adaptation, models trained on a source distribution can be applied to data from a target distribution even though the distributions may be different. In style transfer and image-to-image translation, disentangled representations allow for the separation of style and content when translating images.
This thesis examines learning transferable data representations in novel deep generative models. The Semi-Supervised Adversarial Translator (SAT) utilizes adversarial methods and cross-domain weight sharing in a neural network to extract transferable representations. These transferable interpretations can then be decoded into the original image or a similar image in another domain. The Explicit Disentangling Network (EDN) utilizes generative methods to disentangle images into their core attributes and then segments sets of related attributes. The EDN can separate these attributes by controlling the flow of information using a novel combination of losses and network architecture. This separation of attributes allows precise modifications to specific components of the data representation, boosting the performance of machine learning tasks. The effectiveness of these models is evaluated across domain adaptation, style transfer, and image-to-image translation tasks. / Dissertation/Thesis / Masters Thesis Computer Science 2018
|
596 |
Study of Knowledge Transfer Techniques For Deep Learning on Edge Devices / January 2018
abstract: With the emergence of the edge computing paradigm, many applications such as image recognition and augmented reality need to perform machine learning (ML) and artificial intelligence (AI) tasks on edge devices. Most AI and ML models are large and computationally heavy, whereas edge devices are usually equipped with limited computational and storage resources. Such models can be compressed and reduced in order to be placed on edge devices, but they may lose their capability and may not generalize and perform as well as large models. Recent works used knowledge transfer techniques to transfer information from a large network (termed teacher) to a small one (termed student) in order to improve the performance of the latter. This approach seems promising for learning on edge devices, but a thorough investigation of its effectiveness is lacking.
The purpose of this work is to provide an extensive study of the performance (in terms of both accuracy and convergence speed) of knowledge transfer, considering different student-teacher architectures, datasets and techniques for transferring knowledge from teacher to student.
A good performance improvement is obtained by transferring knowledge from both the intermediate layers and the last layer of the teacher to a shallower student. But other architectures and transfer techniques do not fare so well, and some of them even have a negative impact on performance. For example, a smaller and shorter network trained with knowledge transfer on Caltech 101 achieved a significant accuracy improvement of 7.36% and converged 16 times faster than the same network trained without knowledge transfer. On the other hand, a smaller network that is thinner than the teacher network performed worse, with an accuracy drop of 9.48% on Caltech 101, even with knowledge transfer. / Dissertation/Thesis / Masters Thesis Computer Science 2018
|
597 |
Deep Reinforcement Learning for Cavity Filter Tuning / Larsson, Hannes / January 2018
In this Master's thesis, the option of using deep reinforcement learning for cavity filter tuning has been explored. Several reinforcement learning algorithms have been explained and discussed, and then the deep deterministic policy gradient algorithm has been used to solve a simulated filter tuning problem. Both the filter environment and the reinforcement learning agent were implemented, with the filter environment making use of existing circuit models. The reinforcement learning agent learned how to tune filters with four poles and one transmission zero, or eight tunable screws in total. A comparison was also made between constant exploration noise and exploration noise decaying over time, together with different maximum episode lengths. For the particular noise used here, decaying exploration noise was shown to be better than constant noise, and a maximum length of 100 steps was shown to be better than 200 for the eight-screw filter.
|
598 |
Deep Understanding of Urban Mobility from Cityscape Webcams / Zhang, Shanghang / 01 May 2018
Deep understanding of urban mobility is of great significance for many real-world applications, such as urban traffic management and autonomous driving. This thesis develops deep learning methodologies to extract vehicle counts from streaming real-time video captured by multiple low-resolution web cameras and to construct maps of traffic density in a city environment; in particular, we focus on cameras installed in the Manhattan borough of NYC. The large-scale videos from these web cameras have low spatial and temporal resolution, high occlusion, large perspective, and variable environment conditions, causing most existing methods to lose their efficacy. To overcome these challenges, the thesis develops several techniques: 1. a block-level regression model with a rank constraint to map the dense image features into vehicle densities; 2. a deep multi-task learning framework based on fully convolutional neural networks to jointly learn vehicle density and vehicle count; 3. deep spatio-temporal networks for vehicle counting to incorporate temporal information of the traffic flow; and 4. multi-source domain adaptation mechanisms with adversarial learning to adapt the deep counting model to multiple cameras. To train and validate the proposed system, we have collected a large-scale webcam traffic dataset, CityCam, that contains 60 million frames from 212 webcams installed at key intersections of NYC. Of these, 60,000 frames have been annotated with rich information, leading to about 900,000 annotated objects. To the best of our knowledge, it is the first and largest webcam traffic dataset with such a large number of elaborate annotations. The proposed methods are integrated into the CityScapeEye system, which has been extensively evaluated and compared to existing techniques on different counting tasks and datasets, with experimental results demonstrating the effectiveness and robustness of CityScapeEye.
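The link between vehicle density and vehicle count in technique 2 can be illustrated as follows: if each annotated vehicle contributes one unit of mass to a density map, the count is the sum of the map, and a joint objective can penalize both errors. This is a hedged sketch under those assumptions; the Gaussian kernel, the loss weighting and all names are hypothetical and are not taken from the thesis.

```python
import numpy as np

def count_from_density(density_map):
    """Vehicle count is the total mass of the density map."""
    return float(np.sum(density_map))

def density_from_points(points, shape, sigma=2.0):
    """Ground-truth density: one unit-mass Gaussian blob per annotated vehicle centre."""
    ys, xs = np.mgrid[0:shape[0], 0:shape[1]]
    density = np.zeros(shape, dtype=float)
    for py, px in points:
        blob = np.exp(-((ys - py) ** 2 + (xs - px) ** 2) / (2.0 * sigma ** 2))
        density += blob / blob.sum()  # normalize so each vehicle adds exactly 1
    return density

def multitask_loss(pred_density, true_density, count_weight=0.01):
    """Pixel-wise density error plus a weighted squared count error."""
    density_term = np.mean((pred_density - true_density) ** 2)
    count_term = (count_from_density(pred_density) - count_from_density(true_density)) ** 2
    return float(density_term + count_weight * count_term)

# Three hypothetical vehicle centres (row, column) in a 48x64 frame.
vehicles = [(10, 12), (30, 40), (31, 44)]
truth = density_from_points(vehicles, shape=(48, 64))
estimated_count = count_from_density(truth)
```

Learning a density map instead of detecting individual vehicles is one way to cope with the low resolution and heavy occlusion mentioned above, since overlapping vehicles still add their mass to the map.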
|
599 |
Convolutional neural network reliability on an APSoC platform: a traffic-sign recognition case study / Confiabilidade de uma rede neural convolucional em uma plataforma APSoC: um estudo para reconhecimento de placas de trânsito / Lopes, Israel da Costa / January 2017
Deep learning has a plethora of applications in computer vision, speech recognition, natural language processing and other areas of commercial interest. Computer vision, in turn, has many applications in distinct areas, ranging from entertainment to relevant and critical applications. Face recognition and manipulation (Snapchat) and object description in pictures (OneDrive) are examples of entertainment applications. Industrial inspection, medical diagnostics, object recognition in images captured by satellites (used in rescue and defense missions), autonomous cars and Advanced Driver-Assistance Systems (ADAS) are examples of relevant and critical applications. Some of the most important integrated circuit companies around the world, such as Xilinx, Intel and Nvidia, are betting on dedicated platforms for accelerating the training and deployment of deep learning and other computer vision algorithms for autonomous cars and ADAS due to their high computational requirements. Thus, implementing a deep learning system that achieves high performance at a low cost in area utilization and power consumption is a big challenge. Besides, electronic equipment for the automotive industry must be reliable even under radiation effects, manufacturing defects and aging effects, since a system failure can lead to a car accident. Thus, an automatic generator of VHSIC (Very High Speed Integrated Circuit) Hardware Description Language (VHDL) for Convolutional Neural Networks (CNNs) was developed to reduce the design time associated with implementing deep learning algorithms in hardware.
As a case study, a CNN was trained with the Convolutional Architecture for Fast Feature Embedding (Caffe) framework to classify 6 traffic-sign classes, achieving an average accuracy of about 89.8% on the German Traffic-Sign Recognition Benchmark (GTSRB) dataset, which contains traffic-sign images in complex scenarios. This CNN was implemented on a Zynq-7000 All-Programmable System-on-Chip (APSoC), achieving about 313 Frames Per Second (FPS) on 32x32-normalized images, with the APSoC consuming only 2.057 W, while an embedded Graphics Processing Unit (GPU), in its minimum operation mode, consumes 10 W.
The reliability of the proposed CNN was investigated by random accumulated (piled-up) fault injection by emulation in the configuration bits of the APSoC Programmable Logic (PL), achieving 80.5% reliability under Single-Bit Upset (SBU), where both critical Silent Data Corruptions (SDCs) and time-outs (cases in which the system did not respond within the expected time) were considered. Regarding multiple faults, the reliability of the proposed CNN decreases exponentially with the number of piled-up faults. Hence, the reliability of the proposed CNN must be increased by using hardening techniques during the design flow.
|
600 |
Machine Learning Models for High-dimensional Biomedical Data / January 2018
abstract: Recent technological advances enable the collection of various complex, heterogeneous and high-dimensional data in biomedical domains. The increasing availability of high-dimensional biomedical data creates a need for new machine learning models for effective data analysis and knowledge discovery. This dissertation introduces several unsupervised and supervised methods to help understand the data, discover patterns and improve decision making. All the proposed methods can generalize to other industrial fields.
The first topic of this dissertation focuses on data clustering. Data clustering is often the first step in analyzing a dataset without label information. Clustering high-dimensional data with mixed categorical and numeric attributes remains a challenging yet important task. A clustering algorithm based on tree ensembles, CRAFTER, is proposed to tackle this task in a scalable manner.
The second part of this dissertation aims to develop data representation methods for genome sequencing data, a special type of high-dimensional data in the biomedical domain. The proposed data representation method, Bag-of-Segments, can summarize the key characteristics of the genome sequence into a small number of features with good interpretability.
The third part of this dissertation introduces an end-to-end deep neural network model, GCRNN, for time series classification with emphasis on both the accuracy and the interpretation. GCRNN contains a convolutional network component to extract high-level features, and a recurrent network component to enhance the modeling of the temporal characteristics. A feed-forward fully connected network with the sparse group lasso regularization is used to generate the final classification and provide good interpretability.
The last topic centers on dimensionality reduction methods for time series data. A good dimensionality reduction method is important for the storage, decision making and pattern visualization of time series data. The CRNN autoencoder is proposed not only to achieve low reconstruction error, but also to generate discriminative features. A variational version of this autoencoder has great potential for applications such as anomaly detection and process control. / Dissertation/Thesis / Doctoral Dissertation Industrial Engineering 2018
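The sparse group lasso regularization mentioned for GCRNN can be written as a mix of an L1 term and a group-wise L2 term. The sketch below uses the common textbook form of the penalty; the mixing parameter `alpha`, the strength `lam` and the grouping are assumptions, since the abstract does not give the exact formulation used in the dissertation.

```python
import numpy as np

def sparse_group_lasso(weights, groups, lam=1.0, alpha=0.5):
    """lam * (alpha * ||w||_1 + (1 - alpha) * sum_g sqrt(|g|) * ||w_g||_2)."""
    w = np.asarray(weights, dtype=float)
    l1_term = np.abs(w).sum()  # encourages individual zero weights
    group_term = sum(np.sqrt(len(idx)) * np.linalg.norm(w[idx]) for idx in groups)
    return float(lam * (alpha * l1_term + (1.0 - alpha) * group_term))

# Hypothetical weights: the second group is entirely zero, so it adds no penalty.
weights = np.array([3.0, 4.0, 0.0, 0.0])
groups = [[0, 1], [2, 3]]
penalty = sparse_group_lasso(weights, groups)
```

The group term can switch off whole groups of inputs while the L1 term prunes individual weights inside the remaining groups, which is what makes the final classifier easier to interpret.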
|