  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
41

Discovering compositional structure /

Harrison, Matthew T. January 2005 (has links)
Thesis (Ph.D.)--Brown University, 2005. / Vita. Thesis advisor: Stuart Geman. Includes bibliographical references (leaves 7-9, 31-33, 64-68, 107-107, 131-132, 155-157, 267-268). Also available online.
42

Learning in spectral clustering /

Shortreed, Susan, January 2006 (has links)
Thesis (Ph. D.)--University of Washington, 2006. / Vita. Includes bibliographical references (p. 167-170).
43

Anomaly Detection Through Statistics-Based Machine Learning For Computer Networks

Zhu, Xuejun. January 2006 (has links) (PDF)
Dissertation (PhD)--University of Arizona, Tucson, Arizona, 2006.
44

Asymptotics of Gaussian Regularized Least-Squares

Lippert, Ross, Rifkin, Ryan 20 October 2005 (has links)
We consider regularized least-squares (RLS) with a Gaussian kernel. We prove that if we let the Gaussian bandwidth $\sigma \rightarrow \infty$ while letting the regularization parameter $\lambda \rightarrow 0$, the RLS solution tends to a polynomial whose order is controlled by the relative rates of decay of $\frac{1}{\sigma^2}$ and $\lambda$: if $\lambda = \sigma^{-(2k+1)}$, then, as $\sigma \rightarrow \infty$, the RLS solution tends to the $k$th order polynomial with minimal empirical error. We illustrate the result with an example.
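The Gaussian-kernel RLS solution the abstract analyzes has the familiar closed form $f(x) = \sum_i \alpha_i k(x, x_i)$ with $\alpha = (K + \lambda I)^{-1} y$ (one common convention; the paper's normalization may differ). A minimal numpy sketch, with illustrative data and parameter values that are not taken from the paper:

```python
import numpy as np

def gaussian_kernel(x, z, sigma):
    # k(x, z) = exp(-(x - z)^2 / (2 sigma^2)) for 1-D inputs
    return np.exp(-(x[:, None] - z[None, :]) ** 2 / (2 * sigma ** 2))

def rls_fit(x, y, sigma, lam):
    # alpha = (K + lambda I)^{-1} y
    K = gaussian_kernel(x, x, sigma)
    return np.linalg.solve(K + lam * np.eye(len(x)), y)

def rls_predict(x_train, alpha, x_new, sigma):
    return gaussian_kernel(x_new, x_train, sigma) @ alpha

# small regularization: the fit nearly interpolates the training data;
# the paper's regime instead couples lam = sigma**-(2k+1) and lets sigma grow
x = np.linspace(-1.0, 1.0, 8)
y = x ** 2
alpha = rls_fit(x, y, sigma=0.3, lam=1e-6)
yhat = rls_predict(x, alpha, x, sigma=0.3)
```

In the regime studied in the paper one would instead sweep `sigma` upward with `lam = sigma ** -(2 * k + 1)` and watch the fitted function approach the degree-`k` least-squares polynomial.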
45

Multi-cue visual tracking: feature learning and fusion

Lan, Xiangyuan 10 August 2016 (has links)
As an important and active research topic in the computer vision community, visual tracking is a key component in many applications ranging from video surveillance and robotics to human-computer interaction. In this thesis, we propose new appearance models based on multiple visual cues and address several research issues in feature learning and fusion for visual tracking. Feature extraction and feature fusion are the two key modules for constructing the appearance model of the tracked target from multiple visual cues. Feature extraction aims to extract informative features for the visual representation of the tracked target, and many hand-crafted feature descriptors capturing different types of visual information have been developed. However, since large appearance variations, e.g., occlusion and illumination changes, may occur during tracking, the target samples may be contaminated or corrupted, and the extracted raw features may fail to capture the intrinsic properties of the target appearance. Moreover, without explicitly imposing discriminability, the extracted features may suffer from the background distraction problem. To extract uncontaminated, discriminative features from multiple visual cues, this thesis proposes a novel robust joint discriminative feature learning framework that is capable of 1) simultaneously and optimally removing corrupted features and learning reliable classifiers, and 2) exploiting the consistent and feature-specific discriminative information of multiple features. In this way, the features and classifiers learned from potentially corrupted tracking samples can be better utilized for target representation and foreground/background discrimination. As implied by the Data Processing Inequality, fusion at the feature level preserves more information than fusion at the classifier level. In addition, not all visual cues/features are reliable, so combining all of them may not yield better tracking performance.
As such, it is more reasonable to dynamically select and fuse multiple visual cues for visual tracking. Based on these considerations, this thesis proposes a novel joint sparse representation model in which feature selection, fusion, and representation are performed optimally in a unified framework. By taking advantage of sparse representation, unreliable features are detected and removed while reliable features are fused at the feature level for target representation. To capture the non-linear similarity of features, the model is further extended to perform feature fusion in kernel space. Experimental results demonstrate the effectiveness of the proposed model. Since different visual cues extracted from the same object should share some commonalities in their representations, while each feature should also retain some diversity reflecting its complementary role in appearance modeling, another important problem in feature fusion is how to learn the commonality and diversity in the fused representations of multiple visual cues so as to enhance tracking accuracy. Unlike existing multi-cue sparse trackers, which consider only the commonalities among the sparsity patterns of multiple visual cues, this thesis proposes a novel multiple sparse representation model for multi-cue visual tracking that jointly exploits the underlying commonalities and diversities of different visual cues by decomposing multiple sparsity patterns. Moreover, this thesis introduces a novel online multiple metric learning method to efficiently and adaptively incorporate the appearance proximity constraint, which ensures that the learned commonalities of multiple visual cues are more representative. Experimental results on tracking benchmark videos and other challenging videos show that the proposed tracker performs better than existing sparsity-based trackers and other state-of-the-art trackers.
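The thesis's joint sparse representation model is multi-cue and more elaborate than can be reproduced here, but the single-cue building block — coding a sample over a dictionary with an $\ell_1$ penalty — can be sketched with iterative soft-thresholding (ISTA). The dictionary and signal below are illustrative placeholders, not tracking features:

```python
import numpy as np

def soft_threshold(v, t):
    # proximal operator of the l1 norm
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(D, x, lam, steps=200):
    # minimize 0.5 * ||x - D c||_2^2 + lam * ||c||_1
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    c = np.zeros(D.shape[1])
    for _ in range(steps):
        c = soft_threshold(c + D.T @ (x - D @ c) / L, lam / L)
    return c

# trivial identity dictionary: the sparse code is just the shrunk signal,
# which makes the soft-thresholding effect easy to see
D = np.eye(4)
x = np.array([3.0, 0.0, 0.0, 0.0])
c = ista(D, x, lam=0.5)
```

The joint model in the thesis couples such codes across cues (shared and cue-specific sparsity patterns), which ISTA alone does not capture.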
46

Predictive analytics for classification of immigration visa applications: a discriminative machine learning approach

Vegesana, Sharmila January 1900 (has links)
Master of Science / Department of Computer Science / William Hsu / This work focuses on the data science challenge of predicting the decision on past immigration visa applications using supervised machine learning for classification. I describe an end-to-end approach that first prepares historical data for supervised inductive learning, trains various discriminative models, and evaluates these models using simple statistical validation methods. The H-1B visa allows employers in the United States to temporarily employ foreign nationals in various specialty occupations that require a bachelor's degree or higher in the specific specialty, or its equivalent. These specialty occupations often include, but are not limited to: medicine, health, journalism, and areas of science, technology, engineering, and mathematics (STEM). Every year the United States Citizenship and Immigration Services (USCIS) grants a current maximum of 85,000 visas, even though the number of applicants far surpasses this cap, and the selection process is claimed to be a lottery. The dataset used for this experimental research project contains all petitions made under this visa cap from 2011 to 2016. This project aims to use discriminative machine learning techniques to classify these petitions and predict the "case status" of each petition based on various factors. Exploratory data analysis is also performed to determine the top employers, the locations that most appeal to foreign nationals under this visa cap, and the job roles with the highest numbers of foreign workers. I apply supervised inductive learning algorithms such as Gaussian Naïve Bayes, Logistic Regression, and Random Forests to identify the most probable factors for H-1B visa certification and compare the results to determine the best predictive model for this testbed.
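One of the classifiers named above, Gaussian Naïve Bayes, is simple enough to sketch from scratch: fit a per-class Gaussian to each feature independently, then score by log posterior. The two-cluster toy data below stands in for the visa features, which are not reproduced in this listing:

```python
import numpy as np

class GaussianNaiveBayes:
    # per-class Gaussian likelihoods with conditionally independent features
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.priors = np.array([np.mean(y == c) for c in self.classes])
        self.means = np.array([X[y == c].mean(axis=0) for c in self.classes])
        # small floor on the variance avoids division by zero
        self.vars = np.array([X[y == c].var(axis=0) + 1e-9 for c in self.classes])
        return self

    def predict(self, X):
        # log posterior up to a constant: log prior + sum of log Gaussian densities
        ll = -0.5 * (((X[:, None, :] - self.means) ** 2) / self.vars
                     + np.log(2 * np.pi * self.vars)).sum(axis=2)
        return self.classes[np.argmax(np.log(self.priors) + ll, axis=1)]

X = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
              [3.0, 3.1], [3.2, 3.0], [3.1, 3.2]])
y = np.array([0, 0, 0, 1, 1, 1])
model = GaussianNaiveBayes().fit(X, y)
preds = model.predict(np.array([[0.1, 0.1], [3.1, 3.1]]))
```

In practice the thesis would use library implementations; this sketch only shows the mechanics of the generative scoring rule.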
47

Algoritmo AdaBoost robusto ao ruído : aplicação à detecção de faces em imagens de baixa resolução / Noise robust AdaBoost algorithm : applying to face detection in low resolution images

Fernandez Merjildo, Diego Alonso, 1982- 12 June 2013 (has links)
Advisor: Lee Luan Ling / Dissertation (Master's) - Universidade Estadual de Campinas, Faculdade de Engenharia Elétrica e de Computação / Abstract (translated from the Portuguese): This work proposes a modified AdaBoost algorithm that minimizes the overfitting caused by noisy training samples. To this end, the weight distribution is updated based on a fragmentation of the training error, which allows misclassified samples to be reweighted effectively at each error-rate level. The developed algorithm is then applied to face detection, using Multiscale Block Local Binary Patterns (MB-LBP) as characteristic features to build a cascade of classifiers. Experimental results show that the proposed algorithm is simple and efficient, with advantages over classical AdaBoost algorithms in terms of greater generalization capacity, overfitting prevention, and higher hit rates on low-resolution images / Abstract: This work proposes a modification to the AdaBoost algorithm applied to face detection. We first present the approaches used in face detection, highlighting the success of appearance-based methods. We then focus on the AdaBoost algorithm, its performance, and the improvements reported in the published literature. Despite the indisputable success of boosting algorithms, they are highly sensitive to noisy samples. To avoid overfitting noisy samples, we consider the error rate as divided into fragmentary errors and introduce a factor based on misclassified samples to update the weight distribution in the training procedure.
Furthermore, the developed algorithm is applied to the face detection procedure, using Multiscale Block Local Binary Patterns (MB-LBP) for feature extraction together with a cascade of classifiers. The experimental results show that including a factor based on the frequency of misclassified samples is simple and efficient, with advantages over classical AdaBoost algorithms: better generalization ability, overfitting prevention, and higher hit rates in low-resolution images / Master's / Telecommunications and Telematics / Master of Electrical Engineering
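For reference, the classical AdaBoost weight update that this thesis modifies can be sketched with decision stumps. This is the standard algorithm only; the fragmented-error variant proposed in the dissertation is not reproduced, and the 1-D data are illustrative:

```python
import numpy as np

def stump_predict(x, thr, sign):
    # weak learner: sign * (+1 if x > thr else -1)
    return sign * np.where(x > thr, 1, -1)

def adaboost_fit(x, y, rounds=3):
    n = len(x)
    w = np.full(n, 1.0 / n)                     # uniform initial weights
    xs = np.sort(x)
    thresholds = (xs[:-1] + xs[1:]) / 2          # candidate split points
    model = []
    for _ in range(rounds):
        best = None
        for thr in thresholds:                   # pick stump with lowest weighted error
            for sign in (1, -1):
                err = w[stump_predict(x, thr, sign) != y].sum()
                if best is None or err < best[0]:
                    best = (err, thr, sign)
        err, thr, sign = best
        err = min(max(err, 1e-12), 1 - 1e-12)    # avoid log(0)
        alpha = 0.5 * np.log((1 - err) / err)
        # classical reweighting: up-weight mistakes, down-weight correct samples
        w = w * np.exp(-alpha * y * stump_predict(x, thr, sign))
        w = w / w.sum()
        model.append((alpha, thr, sign))
    return model

def adaboost_predict(model, x):
    agg = sum(a * stump_predict(x, t, s) for a, t, s in model)
    return np.where(agg >= 0, 1, -1)

x = np.array([-2.0, -1.0, -0.5, 0.5, 1.0, 2.0])
y = np.array([-1, -1, -1, 1, 1, 1])
model = adaboost_fit(x, y, rounds=3)
```

It is exactly the exponential reweighting line above that the thesis replaces with an update driven by fragmentary error rates, so that noisy samples are not amplified without bound.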
48

GAINING INSIGHTS INTO TOURMALINE-BEARING LOCALITIES WITH MACHINE LEARNING ALGORITHMS

Williams, Jason Ryan 01 September 2021 (has links)
Machine learning algorithms can be used to analyze large datasets and to identify relationships and patterns that might otherwise be missed by more traditional scientific and statistical approaches. The aim of this study is to evaluate the ability of machine learning algorithms to classify mineral systems and provide insights into the geological processes operating on Earth. This study examines the potential of machine learning algorithms as interpretive tools for identifying geological processes, and additional approaches are implemented to predict how geological processes may have evolved at tourmaline-bearing localities in the United States. Tourmaline mineral occurrence data for localities in the United States were retrieved from mineral databases, and exploratory machine learning algorithms, such as market basket analysis and hierarchical clustering, were used to identify geological and geochemical processes. Common geological processes operating in sedimentary, igneous, metamorphic, and hydrothermal systems were all identified based on the presence of diagnostic mineral assemblages such as actinolite-wollastonite-dravite in metamorphic rocks or microcline-schorl-beryl in igneous deposits. Several iterations of supervised machine learning algorithms were used, with models incorporating different combinations of mineral occurrence data, environmental data, and geological process labels, in order to learn how to predict the geologic evolution of tourmaline-bearing localities. A test dataset was generated by randomly selecting locations within the United States and assigning a mineralogy to each site using interpolation methods. Decision tree and random forest algorithms were then both used to classify the randomly generated test dataset. Cross-validation approaches show that the decision trees likely performed better when classifying the test dataset.
The results discussed throughout this study highlight how machine learning algorithms can be effective and accurate supplementary tools for characterizing tourmaline-bearing deposits. The models discussed in this paper were able to classify different geological processes with over 90% accuracy, and they were able to predict how geological processes evolved at different tourmaline-bearing localities with an estimated 70% accuracy. The most accurate classification of tourmaline-bearing localities occurred when analyzing deposits that were subjected to higher temperatures and pressures, which in turn generate more distinct mineralogies that allow machine learning algorithms to identify patterns with greater confidence. The analysis of tourmaline localities associated with low-temperature hydrothermal and sedimentary environments results in much more error-prone classifiers, which can be attributed to a lack of tourmaline-bearing sedimentary deposits in mineral databases and to the fact that sedimentary deposits can record processes from multiple geologic environments that may or may not be related. The strengths and limitations of the trained models are detailed throughout this paper.
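The market basket analysis mentioned above amounts to counting how often mineral pairs co-occur across localities and keeping pairs above a support threshold. A minimal sketch, with invented toy assemblages (loosely echoing the diagnostic associations named in the abstract, not the study's actual database):

```python
from itertools import combinations

def pair_support(transactions, min_support):
    # support = fraction of localities whose assemblage contains a given mineral pair
    n = len(transactions)
    counts = {}
    for t in transactions:
        for pair in combinations(sorted(set(t)), 2):
            counts[pair] = counts.get(pair, 0) + 1
    return {p: c / n for p, c in counts.items() if c / n >= min_support}

# hypothetical localities, each listed as its observed mineral assemblage
localities = [
    {"actinolite", "wollastonite", "dravite"},   # metamorphic-style assemblage
    {"microcline", "schorl", "beryl"},           # igneous-style assemblage
    {"actinolite", "dravite", "quartz"},
    {"microcline", "schorl", "quartz"},
]
rules = pair_support(localities, min_support=0.5)
```

Frequent pairs such as actinolite-dravite or microcline-schorl then serve as candidate diagnostic associations for a geological process.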
49

Performance Enhancement Schemes and Effective Incentives for Federated Learning

Wang, Yuwei 16 November 2021 (has links)
The advent of artificial intelligence applications demands massive amounts of data to support the training of machine learning models. Traditional machine learning schemes require central processing of large volumes of data that may contain sensitive patterns such as user location, personal information, or transaction history. Federated Learning (FL) has been proposed to complement traditional centralized methods: multiple local models are trained and then aggregated on a central cloud server. However, the performance of FL needs further improvement, since its accuracy is not on par with traditional centralized machine learning approaches. Furthermore, due to the possibility of privacy leakage, not enough clients are willing to participate in the FL training process. Common practice is an evenly weighted aggregation of the uploaded local models, assuming that each node of the network contributes equally to advancing the global model, which is unfair to the owners of higher-contribution models. This thesis focuses on three aspects of improving the whole federated learning pipeline: client selection; reputation-enabled weight aggregation; and an incentive mechanism. For client selection, a reputation score consisting of evaluation metrics is introduced to eliminate poorly performing model contributions. This scheme improves on the original implementation by up to 10% for non-IID datasets. We also reduce the training time of the selection scheme by roughly 27.7% compared to the baseline implementation. Then, a reputation-enabled weighted aggregation of the local models for distributed learning is proposed: the contribution of a local model, and hence its aggregation weight, is evaluated and determined by its reputation score, formulated as above.
A numerical comparison of the proposed methodology, which assigns different aggregation weights based on the accuracy of each model, against a baseline that uses standard average aggregation weights shows an accuracy improvement of 17.175% over the baseline for non-independent and identically distributed (non-IID) scenarios in an FL network of 100 participants. Last but not least, for the incentive mechanism, participants can be rewarded based on data quality, data quantity, reputation, and resource allocation. In this thesis, we adopt a reputation-aware reverse auction that was earlier proposed to recruit dependable participants for mobile crowdsensing campaigns, and modify that incentive to adapt it to an FL setting where user utility is defined as a function of the payment assigned by the central server and the user's service cost, such as battery and processor usage. Through numerical results, we show that: 1) the proposed incentive can improve user utilities compared to the baseline approaches; 2) platform utility can be maintained at a value close to that under the baselines; and 3) the overall test accuracy of the aggregated global model can even improve slightly.
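The reputation-enabled pipeline described above combines client selection with reputation-weighted aggregation. A minimal sketch of that aggregation step, with flattened parameter vectors and reputation values that are purely illustrative:

```python
import numpy as np

def reputation_weighted_aggregate(local_params, reputations, min_reputation=0.0):
    # 1) client selection: drop models whose reputation is too low
    # 2) aggregate the rest, weighting each by its normalized reputation
    #    (plain FedAvg-style aggregation would use equal weights instead)
    params = np.asarray(local_params, dtype=float)
    rep = np.asarray(reputations, dtype=float)
    keep = rep > min_reputation
    w = rep[keep] / rep[keep].sum()
    return (w[:, None] * params[keep]).sum(axis=0)

# three clients' flattened model parameters; the third looks unreliable
clients = [[1.0, 1.0], [3.0, 3.0], [100.0, -100.0]]
reps = [1.0, 3.0, 0.0]          # zero reputation -> excluded at selection time
global_params = reputation_weighted_aggregate(clients, reps)
```

In a real FL round the reputation score would be computed from the thesis's evaluation metrics rather than supplied by hand, and each "parameter vector" would be the client's full flattened model update.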
50

APIC: A method for automated pattern identification and classification

Goss, Ryan Gavin January 2017 (has links)
Machine Learning (ML) is a transformative technology at the forefront of many modern research endeavours. The technology is generating a tremendous amount of attention from researchers and practitioners, providing new approaches to solving complex classification and regression tasks. While concepts such as Deep Learning have existed for many years, the computational power for realising the utility of these algorithms in real-world applications has only recently become available. This dissertation investigated the efficacy of a novel, general method for deploying ML in a variety of complex tasks, in which feature selection, data-set labelling, model definition, and training processes were determined automatically. Models were developed in an iterative fashion and evaluated using both training and validation data sets. The proposed method was evaluated on three distinct case studies describing complex classification tasks that often require significant input from human experts. The results demonstrate that the proposed method matches, and often outperforms, less general, comparable methods designed specifically for each task. Feature selection, data-set annotation, model design, and training processes were optimised by the method, producing less complex, comparably accurate classifiers with a lower dependency on computational power and human expert intervention. In chapter 4, the proposed method demonstrated improved efficacy over comparable systems, automatically identifying and classifying complex application protocols traversing IP networks. In chapter 5, the proposed method was able to discriminate between normal and anomalous traffic, maintaining accuracy in excess of 99% while reducing false alarms to a mere 0.08%. Finally, in chapter 6, the proposed method discovered more optimal classifiers than those implemented by comparable methods, with classification scores rivalling those achieved by state-of-the-art systems.
The findings of this research concluded that developing a fully automated, general method, exhibiting efficacy in a wide variety of complex classification tasks with minimal expert intervention, was possible. The method and various artefacts produced in each case study of this dissertation are thus significant contributions to the field of ML.
