About

The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations. Our metadata is collected from universities around the world. If you manage a university, consortium, or country archive and want to be added, details can be found on the NDLTD website.
111

Margin learning in spiking neural networks

Brune, Rafael 15 December 2017 (has links)
No description available.
112

Efficient deterministic approximate Bayesian inference for Gaussian process models

Bui, Thang Duc January 2018 (has links)
Gaussian processes are powerful nonparametric distributions over continuous functions that have become a standard tool in modern probabilistic machine learning. However, the applicability of Gaussian processes in the large-data regime and in hierarchical probabilistic models is severely limited by analytic and computational intractabilities. It is, therefore, important to develop practical approximate inference and learning algorithms that can address these challenges. To this end, this dissertation provides a comprehensive and unifying perspective on pseudo-point-based deterministic approximate Bayesian learning for a wide variety of Gaussian process models, which connects previously disparate strands of the literature, greatly extends them and allows new state-of-the-art approximations to emerge. We start by building a posterior approximation framework based on Power Expectation Propagation for Gaussian process regression and classification. This framework relies on a structured approximate Gaussian process posterior based on a small number of pseudo-points, which are judiciously chosen to summarise the actual data and enable tractable and efficient inference and hyperparameter learning. Many existing sparse approximations are recovered as special cases of this framework, and can now be understood as performing approximate posterior inference using a common approximate posterior. Critically, extensive empirical evidence suggests that the new approximation methods arising from this unifying perspective outperform existing approaches in many real-world regression and classification tasks. We explore extensions of this framework to Gaussian process state-space models, Gaussian process latent variable models and deep Gaussian processes, which also unify many recently developed approximation schemes for these models. Several mean-field and structured approximate posterior families for the hidden variables in these models are studied. We also discuss several methods for approximate uncertainty propagation in recurrent and deep architectures based on Gaussian projection, linearisation, and simple Monte Carlo. The benefits of the unified inference and learning frameworks for these models are illustrated in a variety of real-world state-space modelling and regression tasks.
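The pseudo-point idea described above is easiest to see in the regression setting. The following is a minimal NumPy sketch of a standard inducing-point (pseudo-point) GP predictive distribution, of the kind the abstract's framework generalises; it is an illustration of the general family, not the thesis's Power-EP method, and the kernel, data and hyperparameter values are placeholders.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between rows of a and b."""
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

def sparse_gp_predict(X, y, Z, Xs, noise_var=0.1):
    """Predictive mean/variance of a pseudo-point GP approximation.

    X, y: training data; Z: M << N pseudo-inputs; Xs: test inputs.
    Uses the standard inducing-point predictive equations.
    """
    Kzz = rbf(Z, Z) + 1e-6 * np.eye(len(Z))    # jitter for stability
    Kzx = rbf(Z, X)
    Kzs = rbf(Z, Xs)
    # Sigma = Kzz + noise^-2 Kzx Kxz: the M pseudo-points summarise all N data
    Sigma = Kzz + Kzx @ Kzx.T / noise_var
    Sigma_inv_Kzs = np.linalg.solve(Sigma, Kzs)
    mean = (Sigma_inv_Kzs.T @ (Kzx @ y)) / noise_var
    Kzz_inv_Kzs = np.linalg.solve(Kzz, Kzs)
    var = (rbf(Xs, Xs).diagonal()
           - np.einsum('ij,ij->j', Kzs, Kzz_inv_Kzs)
           + np.einsum('ij,ij->j', Kzs, Sigma_inv_Kzs))
    return mean, var

# Toy usage: N = 200 observations summarised by M = 10 pseudo-points.
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (200, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
Z = np.linspace(-3, 3, 10)[:, None]
mu, var = sparse_gp_predict(X, y, Z, np.linspace(-3, 3, 50)[:, None])
```

Inference and hyperparameter learning then cost O(NM^2) instead of the O(N^3) of exact GP regression, which is what makes the large-data regime tractable.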
113

Error Pattern Recognition Using Machine Learning

Zhendong, Wang January 2018 (has links)
Mobile networks use automated continuous integration to secure new technologies, which must reach high quality and remain backwards compatible. The machinery needs to be constantly improved to meet the high demands that exist today and will evolve in the future. When testing products at large scale in a telecommunication environment, many parameters may be causing an error. Machine learning can help to assign troubleshooting labels and identify problematic areas in the test environment. In this thesis project, different modeling approaches are applied step-wise. First, both the TF-IDF (term frequency-inverse document frequency) method and topic modeling are applied to construct variables. Since the TF-IDF method generates high-dimensional variables in this case, principal component analysis (PCA) is applied to reduce the dimensionality. The results of this part are evaluated using different criteria. After the variable construction, two semi-supervised models, label propagation and label spreading, are applied to assign troubleshooting labels. In both algorithms, a weight matrix measuring the similarities between different cases needs to be constructed. Two different methods for building the weight matrix are tested separately: a Gaussian kernel and the nearest-neighbor method. Different hyperparameters in the two algorithms are experimented with, to select the configuration that returns the best results. After the optimal model is selected, the unlabeled data is divided up in different proportions for fitting the model, to test whether the proportion of unlabeled data affects the result of semi-supervised learning in our case. The classification results from the modeling part are examined using three classical measures: accuracy, precision and recall. In addition, random-permutation cross-validation is applied for the evaluation.
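A condensed version of this pipeline can be written directly in scikit-learn, the kind of tooling the abstract describes. The corpus, label values and hyperparameters below are placeholders; the real test logs, dimensionality and tuning are specific to the thesis.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import PCA
from sklearn.semi_supervised import LabelPropagation, LabelSpreading

# Placeholder corpus of test-log texts; -1 marks unlabeled cases.
logs = ["timeout waiting for node restart", "config mismatch on interface",
        "timeout on handshake", "unexpected reboot during upgrade"]
labels = np.array([0, 1, -1, -1])

# 1) TF-IDF features, then PCA to control the dimensionality.
X = TfidfVectorizer().fit_transform(logs).toarray()
X = PCA(n_components=3).fit_transform(X)

# 2) Semi-supervised label assignment; the weight matrix is built either
#    from a Gaussian (RBF) kernel or from k nearest neighbors.
rbf_model = LabelSpreading(kernel='rbf', gamma=20, alpha=0.2).fit(X, labels)
knn_model = LabelPropagation(kernel='knn', n_neighbors=2).fit(X, labels)

print(rbf_model.transduction_)   # labels inferred for all cases
```

The `gamma`, `alpha` and `n_neighbors` values are exactly the kind of hyperparameters the thesis tunes experimentally.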
114

Um método para deduplicação de metadados bibliográficos baseado no empilhamento de classificadores / A method for bibliographic metadata deduplication based on stacked generalization

Borges, Eduardo Nunes January 2013 (has links)
Duplicated bibliographic metadata are records that correspond to semantically equivalent bibliographic references, i.e., references that describe the same publication. Identifying duplicated bibliographic metadata in one or more digital libraries is an essential task to ensure the quality of services such as search, navigation, and content recommendation. Although many metadata standards have been proposed, they do not completely solve interoperability problems, because even when there is a mapping between different metadata schemas, there may be variations in how the content is represented. Most approaches proposed to identify duplicated records apply one or more functions to the content of certain fields in order to capture the similarity between records. However, this requires choosing a threshold that defines whether two records are sufficiently similar to be considered semantically equivalent or duplicated. Recent studies treat record deduplication as a data classification problem, in which a predictive model is trained to estimate the real-world object to which a record refers. The main goal of this thesis is the development of an effective and automatic method to identify duplicated bibliographic metadata by combining multiple supervised classifiers, without any human intervention in the setting of similarity thresholds. Similarity functions designed specifically for the digital-library context, and with low computational cost, are applied to the training set. The scores returned by these functions are used to train multiple heterogeneous classification models, i.e., models produced by learning algorithms of several types: tree-based, rule-based, artificial neural networks and probabilistic models. The learned classifiers are combined through a stacked-generalization strategy to improve the deduplication result using the heterogeneous knowledge acquired individually by each learning algorithm. The final classification model is applied to the candidate matching pairs returned by an efficient two-level blocking strategy. The proposed solution is based on the hypothesis that stacking supervised classifiers can improve the quality of deduplication compared with other combination strategies. The experimental evaluation shows that the hypothesis is confirmed when the proposed method is compared with choosing the single best classifier and with majority voting. The impact of classifier diversity on the stacking result and the failure cases of the proposed method are also analyzed.
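A minimal sketch of the stacking step with scikit-learn, assuming each candidate record pair has already been reduced to a vector of similarity scores. The feature names and data here are hypothetical; the thesis defines its own digital-library-specific similarity functions and its own blocking.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier

# Each row: similarity scores for one candidate pair, e.g.
# [title_sim, author_sim, venue_sim, year_match]; 1 = duplicate, 0 = distinct.
X = np.array([[0.95, 0.90, 0.80, 1.0], [0.20, 0.10, 0.40, 0.0],
              [0.88, 0.75, 0.60, 1.0], [0.15, 0.30, 0.20, 0.0]])
y = np.array([1, 0, 1, 0])

# Heterogeneous base classifiers (tree, probabilistic, neural), combined by a
# meta-classifier trained on their out-of-fold predictions (stacking).
stack = StackingClassifier(
    estimators=[('tree', DecisionTreeClassifier()),
                ('nb', GaussianNB()),
                ('mlp', MLPClassifier(max_iter=2000)),
                ('rf', RandomForestClassifier(n_estimators=50))],
    final_estimator=LogisticRegression(),
    cv=2)
stack.fit(X, y)
print(stack.predict([[0.90, 0.85, 0.70, 1.0]]))  # likely a duplicate
```

The meta-classifier learns how much to trust each base model, which is what removes the need for a hand-picked similarity threshold.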
115

Classify-normalize-classify : a novel data-driven framework for classifying forest pixels in remote sensing images / Classifica-normaliza-classifica : uma nova abordagem para classificar pixels de floresta em imagens de sensoriamento remoto

Souza, César Salgado Vieira de January 2017 (has links)
Monitoring natural environments and their changes over time requires the analysis of large amounts of image data, often collected by orbital remote sensing platforms. However, variations in the observed signals due to changing atmospheric conditions often result in a shift of the data distribution for different dates and locations, making it difficult to discriminate between classes in a dataset built from several images. This work introduces a novel supervised classification framework, called Classify-Normalize-Classify (CNC), to alleviate this data-shift issue. The proposed scheme uses a two-classifier approach. The first classifier is trained on non-normalized top-of-atmosphere reflectance samples to discriminate between pixels belonging to a class of interest (COI) and pixels from other categories (e.g. forest vs. non-forest). At test time, the COI's multivariate median signal, estimated from the first classifier's segmentation, is subtracted from the image, anchoring the data distribution of different images to the same reference. Then, a second classifier, trained to minimize the classification error on COI median-centered samples, is applied to the median-normalized test image to produce the final binary segmentation. The proposed methodology was tested on deforestation detection using bitemporal Landsat 8 OLI images over the Amazon rainforest. Experiments with top-of-atmosphere multispectral reflectance images showed that the CNC framework mapped deforestation more accurately than a single classifier run on the surface reflectance images provided by the United States Geological Survey (USGS). Accuracies from the proposed framework also compared favorably with the benchmark deforestation masks of the PRODES program.
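The two-classifier normalization scheme can be sketched as follows. The random-forest choice, band count and synthetic data are illustrative assumptions, not the exact models of the thesis.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
# Synthetic stand-in: 6-band pixels, class 1 = class of interest (e.g. forest).
X_raw = rng.normal(size=(500, 6))
y = (X_raw[:, 0] > 0).astype(int)
coi_median = np.median(X_raw[y == 1], axis=0)

clf1 = RandomForestClassifier(n_estimators=50).fit(X_raw, y)               # raw TOA
clf2 = RandomForestClassifier(n_estimators=50).fit(X_raw - coi_median, y)  # centered

def cnc_predict(image):
    """Classify-Normalize-Classify on one image of shape (H, W, bands)."""
    pixels = image.reshape(-1, image.shape[-1])
    mask = clf1.predict(pixels).astype(bool)      # 1) rough COI segmentation
    if not mask.any():                            # fall back if no COI found
        mask[:] = True
    median = np.median(pixels[mask], axis=0)      # 2) per-band COI median
    return clf2.predict(pixels - median).reshape(image.shape[:2])  # 3) final

print(cnc_predict(rng.normal(size=(8, 8, 6))).shape)  # (8, 8) binary mask
```

The median subtraction is what anchors each test image to the reference frame the second classifier was trained in.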
116

Social training : aprendizado semi supervisionado utilizando funções de escolha social / Social-Training: Semi-Supervised Learning Using Social Choice Functions

Alves, Matheus January 2017 (has links)
Given the huge quantity of data currently being generated, only a small portion of it can be manually labeled by human experts. This is a common challenge for machine learning applications. Semi-supervised learning addresses this problem by exploiting unlabeled data alongside labeled data. However, if only a limited quantity of labeled examples is available, the performance of the machine learning task (e.g., classification) can be unsatisfactory. Many solutions address this issue by using an ensemble of classifiers, since this increases classifier diversity. Algorithms such as co-training and tri-training use multiple views of the data or multiple learning algorithms to improve the classification of unlabeled instances through simple majority agreement. There are also approaches that extend this idea and adopt less trivial voting processes to define the labels, such as weighted majority voting. Nevertheless, these solutions require a certain confidence level in a label before using it for training. Hence, not all available information is used: information associated with low confidence levels is disregarded completely. This work proposes an approach called social-training, which uses all the information available in the semi-supervised learning task. Multiple heterogeneous classifiers are trained on the labeled data and generate diverse classifications for the same unlabeled instances. Social-training then aggregates these results into a single label by means of social choice functions that perform rank aggregation over the instances. The solution addresses binary classification. The results show that working with the full ranking, i.e., labeling all unlabeled instances, reduces the classification error on several of the UCI data sets used.
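For intuition, here is a small sketch of the rank-aggregation idea using a Borda-style social choice function: each classifier ranks the unlabeled instances by its positive-class score, the ranks are summed, and every instance receives a label from the aggregate ranking. The classifiers' scores and the positive-fraction heuristic are placeholder assumptions, not the thesis's exact functions.

```python
import numpy as np

def borda_labels(score_matrix, positive_fraction=0.5):
    """Aggregate per-classifier scores into binary labels via Borda count.

    score_matrix: shape (n_classifiers, n_instances); each row holds the
    positive-class scores one classifier assigns to the unlabeled instances.
    """
    # Rank instances within each classifier: higher score -> more Borda points.
    ranks = score_matrix.argsort(axis=1).argsort(axis=1)
    borda = ranks.sum(axis=0)                      # aggregate ranking
    n_pos = int(positive_fraction * score_matrix.shape[1])
    labels = np.zeros(score_matrix.shape[1], dtype=int)
    labels[np.argsort(-borda)[:n_pos]] = 1         # top of ranking -> positive
    return labels

# Three heterogeneous classifiers scoring five unlabeled instances:
scores = np.array([[0.9, 0.2, 0.6, 0.1, 0.7],
                   [0.8, 0.3, 0.7, 0.2, 0.6],
                   [0.7, 0.1, 0.9, 0.3, 0.5]])
print(borda_labels(scores, positive_fraction=0.4))  # -> [1 0 1 0 0]
```

Because every instance appears somewhere in the full ranking, no prediction is discarded for having low confidence, which is the key contrast with majority-vote self-labeling.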
117

Robots that Anticipate Pain: Anticipating Physical Perturbations from Visual Cues through Deep Predictive Models

January 2017 (has links)
abstract: To ensure system integrity, robots need to proactively avoid any unwanted physical perturbation that may cause damage to the underlying hardware. In this thesis work, we investigate a machine learning approach that allows robots to anticipate impending physical perturbations from perceptual cues. In contrast to other approaches that require knowledge about sources of perturbation to be encoded before deployment, our method is based on experiential learning. Robots learn to associate visual cues with subsequent physical perturbations and contacts. In turn, these extracted visual cues are used to predict potential future perturbations acting on the robot. To this end, we introduce a novel deep network architecture that combines multiple sub-networks for dealing with robot dynamics and perceptual input from the environment. We present a self-supervised approach for training the system that does not require any labeling of training data. Extensive experiments in a human-robot interaction task show that a robot can learn to predict physical contact by a human interaction partner without any prior information or labeling. Furthermore, the network successfully predicts physical contact from depth-stream input, traditional video input, or both modalities combined. / Dissertation/Thesis / Masters Thesis Computer Science 2017
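As a rough structural illustration only (not the thesis's actual architecture), a multimodal predictive network of this kind pairs a visual encoder with an encoder for the robot's own state, fuses them, and outputs a contact probability; the self-supervised label is simply whether contact occurred a short time later. All layer sizes and input shapes below are invented for the sketch.

```python
import torch
import torch.nn as nn

class PerturbationPredictor(nn.Module):
    """Toy fusion network: depth frames + robot state -> contact logit."""
    def __init__(self, state_dim=7):
        super().__init__()
        self.vision = nn.Sequential(                  # perceptual sub-network
            nn.Conv2d(1, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.dynamics = nn.Sequential(                # robot-dynamics sub-network
            nn.Linear(state_dim, 32), nn.ReLU())
        self.head = nn.Sequential(nn.Linear(32 + 32, 64), nn.ReLU(),
                                  nn.Linear(64, 1))   # logit of future contact

    def forward(self, depth, state):
        z = torch.cat([self.vision(depth), self.dynamics(state)], dim=1)
        return self.head(z)

# Self-supervised target: did contact occur within the next k frames?
model = PerturbationPredictor()
depth = torch.randn(4, 1, 64, 64)   # batch of depth frames
state = torch.randn(4, 7)           # e.g. joint angles
target = torch.randint(0, 2, (4, 1)).float()
loss = nn.BCEWithLogitsLoss()(model(depth, state), target)
loss.backward()
```

The self-supervision comes from the robot's own force/contact sensing: past experience provides the labels, so no human annotation is needed.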
118

Supervised and Ensemble Classification of Multivariate Functional Data: Applications to Lupus Diagnosis

January 2018 (has links)
abstract: This dissertation investigates the classification of systemic lupus erythematosus (SLE) in the presence of non-SLE alternatives, while developing novel curve-classification methodologies with wide-ranging applications. Functional data representations of plasma thermogram measurements and the corresponding derivative curves provide predictors yet to be investigated for SLE identification. Functional nonparametric classifiers form the methodological basis, which is used herein to develop a) the family of ESFuNC segment-wise curve classification algorithms and b) per-pixel ensembles based on logistic regression and the fused LASSO. The proposed methods achieve test-set accuracy rates as high as 94.3%, while returning information about the regions of the temperature domain that are critical for discriminating between the populations. The analyses suggest that derivative-based information contributes significantly to the improved classification performance relative to recently published studies on SLE plasma thermograms. / Dissertation/Thesis / Doctoral Dissertation Applied Mathematics 2018
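The "per-pixel ensemble" idea can be sketched simply: fit one weak classifier at each point of the temperature grid and pool their votes. This toy version uses plain logistic regression on synthetic thermograms; the thesis's actual methods add fused-LASSO regularization and the derivative curves.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
# Synthetic thermograms: 40 curves sampled at 25 temperature points.
grid = np.linspace(45, 90, 25)                   # degrees C, placeholder range
y = rng.integers(0, 2, 40)                       # 1 = SLE, 0 = non-SLE
X = rng.normal(size=(40, 25)) + 0.8 * y[:, None] * np.exp(-(grid - 70)**2 / 50)

# One tiny classifier per temperature point ("pixel").
models = [LogisticRegression().fit(X[:, [t]], y) for t in range(X.shape[1])]

def ensemble_proba(curves):
    """Average the per-pixel class probabilities across the temperature grid."""
    probs = [m.predict_proba(curves[:, [t]])[:, 1] for t, m in enumerate(models)]
    return np.mean(probs, axis=0)

print(ensemble_proba(X[:5]).round(2))
```

Inspecting which per-pixel models carry the most weight is one way such an ensemble reveals the temperature regions that matter for discrimination.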
119

A comparative study of text classification models on invoices : The feasibility of different machine learning algorithms and their accuracy

Ekström, Linus, Augustsson, Andreas January 2018 (has links)
Text classification is becoming more important for companies in a world where an increasing amount of digital data is made available. The aim of this thesis is to investigate whether five different machine learning algorithms can be used to automate the classification of invoice data, and to determine which achieves the highest accuracy. In a later stage, algorithms are combined in an attempt to achieve better results. N-grams are used, and results are compared in terms of total classification accuracy for each algorithm. The Python library scikit-learn, which implements the chosen algorithms, was used. Data was collected and generated to represent the fields extracted from real invoices. The results of this thesis show that it is possible to use machine learning for this type of problem. The highest-scoring algorithm (LinearSVC from scikit-learn) classifies 86% of all samples correctly, a margin of 16 percentage points above the acceptable level of 70%.
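The core of such a pipeline is compact in scikit-learn. The sketch below pairs word n-grams with the winning LinearSVC; the invoice texts and categories are placeholders, and the thesis's real data, preprocessing and other four algorithms are not reproduced here.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Placeholder invoice line items and their account categories.
texts = ["office chairs x4", "laser printer toner", "consulting services march",
         "desk lamp", "cloud hosting fee", "travel reimbursement taxi"]
labels = ["furniture", "supplies", "services", "furniture", "services", "travel"]

# Unigrams + bigrams, as in the thesis's n-gram comparison.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
clf.fit(texts, labels)
print(clf.predict(["printer paper", "taxi to airport"]))
```

Swapping the final estimator (e.g. for naive Bayes or a random forest) is all that is needed to compare algorithms on the same features.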
120

Using supervised learning algorithms to model the behavior of Road Weather Information System sensors

Axelsson, Tobias January 2018 (has links)
Trafikverket, the agency in charge of state road maintenance in Sweden, operates a number of so-called Road Weather Information Systems (RWIS). The main purpose of these stations is to give winter road maintenance workers the information needed to decide when roads should be plowed and/or salted. Each RWIS has a number of sensors that make road-weather-related measurements every 30 minutes. One of the sensors is dug into the road, which can cause traffic disturbances and be costly for Trafikverket; other RWIS sensors fail occasionally. This project aims at modeling a set of RWIS sensors using supervised machine learning algorithms. The sensors of interest are Optic Eye, the Track Ice Road Sensor (TIRS) and the DST111. Optic Eye measures precipitation type and precipitation amount. Both TIRS and DST111 measure road surface temperature; the difference is that the former is dug into the road, while the DST111 measures road surface temperature from a distance via an infrared laser. A supervised learning algorithm trained to model a given sensor's measurement may only use measurements made by the other sensors as input features. Measurements made by TIRS may not be used as input when modeling other sensors, since the goal is to see whether TIRS can be removed. The following input features may also be used for training: road friction, road surface condition and timestamp. Scikit-learn was used as the machine learning software in this project. An experimental approach was chosen: a pre-determined set of supervised algorithms were compared using different numbers of the most relevant input features and different hyperparameter settings. Prior to modeling, a data preparation process was conducted in which observations with suspected or definite errors were removed, and the timestamp feature was transformed into two new features: month and hour. The results show that precipitation type was best modeled by a Classification And Regression Tree (CART) on scikit-learn default settings, achieving a test-set macro-F1 of 0.46 and an accuracy of 0.84 using road surface condition, road friction, DST111 road surface temperature, hour and month as input features. Precipitation amount was best modeled by k-Nearest Neighbors (kNN): with k = 64 and road friction as the only input feature, a test-set MSE of 0.31 was attained. TIRS road surface temperature was best modeled by a Multi-Layer Perceptron (MLP) with 64 hidden nodes, using DST111 road surface temperature, road surface condition, road friction, month, hour and precipitation type as input features, achieving a test-set MSE of 0.88. DST111 road surface temperature was best modeled by a Random Forest on scikit-learn default settings with road surface condition, road friction, month, precipitation type and hour as input features, achieving a test-set MSE of 10.16.
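A sketch of the reported best models, with the hyperparameters taken from the abstract. The feature matrix here is synthetic, since the real RWIS data belongs to Trafikverket.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(3)
n = 500
# Placeholder columns: [friction, surface_condition, dst111_temp, month, hour]
X = np.column_stack([rng.uniform(0, 1, n), rng.integers(0, 5, n),
                     rng.uniform(-20, 10, n), rng.integers(1, 13, n),
                     rng.integers(0, 24, n)])
precip_type = rng.integers(0, 3, n)              # e.g. none / rain / snow
precip_amount = rng.exponential(0.5, n)
tirs_temp = X[:, 2] + rng.normal(0, 1, n)        # TIRS tracks DST111 closely

# Best models per target, as reported in the abstract:
cart = DecisionTreeClassifier().fit(X, precip_type)        # CART, default settings
knn = KNeighborsRegressor(n_neighbors=64).fit(X[:, [0]], precip_amount)  # friction only
mlp = MLPRegressor(hidden_layer_sizes=(64,), max_iter=2000).fit(
    np.column_stack([X, precip_type]), tirs_temp)          # 64 hidden nodes

print(cart.predict(X[:3]), knn.predict(X[:3, [0]]).round(2))
```

The MSE gap between the TIRS model (0.88) and the DST111 model (10.16) suggests that the contact sensor is far easier to reconstruct from the remaining sensors than the infrared one.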
