  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
72

Essays on wage dispersion

Davies, Stuart January 1999
No description available.
73

Learning matrix and functional models in high-dimensions

Balasubramanian, Krishnakumar 27 August 2014
Statistical machine learning methods provide a principled framework for extracting meaningful information from noisy, high-dimensional data sets. A significant feature of such procedures is that the inferences made are statistically significant, computationally efficient, and scientifically meaningful. In this thesis we make several contributions to such statistical procedures. Our contributions are two-fold.

We first address prediction and estimation problems in non-standard situations. We show that even with no access to labeled samples, one can still consistently estimate the error rate of predictors and train predictors with respect to a given (convex) loss function. We next propose an efficient procedure for prediction with large output spaces that scales logarithmically in the dimensionality of the output space. We further propose an asymptotically optimal procedure for sparse multi-task learning when the tasks share a joint support, showing consistency of the proposed method and deriving rates of convergence.

We next address the problem of learning meaningful representations of data. We propose a method for learning sparse representations that takes into account the structure of the data space and demonstrate how it enables one to obtain meaningful features, establishing sample complexity results for the proposed approach. We then propose a model-free feature selection procedure and establish its sure-screening property in the high-dimensional regime. Furthermore, we show that with a slight modification, the approach proposed for sparse multi-task learning enables one to obtain sparse representations for multiple related tasks simultaneously.
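The abstract does not spell out its sparse multi-task estimator, but a common formulation of multi-task learning with a joint (shared) support is L2,1-regularized regression, whose row-wise penalty zeroes entire feature rows across all tasks at once. The following minimal numpy sketch solves it by proximal gradient descent on synthetic data; the data and all names are illustrative, not taken from the thesis.

```python
import numpy as np

def l21_prox(W, t):
    """Proximal operator of t * sum_j ||W[j, :]||_2: row-wise soft-thresholding.
    It zeroes whole rows at once, enforcing a support shared across tasks."""
    norms = np.maximum(np.linalg.norm(W, axis=1, keepdims=True), 1e-12)
    return W * np.maximum(1.0 - t / norms, 0.0)

def multitask_lasso(X, Y, lam, iters=500):
    """Proximal gradient descent on 0.5*||XW - Y||_F^2 + lam * L2,1(W)."""
    lr = 1.0 / np.linalg.norm(X, ord=2) ** 2   # 1 / Lipschitz constant of the gradient
    W = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(iters):
        W = l21_prox(W - lr * (X.T @ (X @ W - Y)), lr * lam)
    return W

rng = np.random.default_rng(0)
n, d, tasks = 200, 10, 3
X = rng.normal(size=(n, d))
W_true = np.zeros((d, tasks))
W_true[:3] = rng.normal(size=(3, tasks))   # only features 0-2 matter, in every task
Y = X @ W_true + 0.1 * rng.normal(size=(n, tasks))

W_hat = multitask_lasso(X, Y, lam=10.0)
row_norms = np.linalg.norm(W_hat, axis=1)
print(np.round(row_norms, 3))
```

On this toy problem the three informative feature rows keep large norms while the noise rows are shrunk toward zero, recovering the shared support.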
74

Clustering of Questionnaire Based on Feature Extracted by Geometric Algebra

Tachibana, Kanta, Furuhashi, Takeshi, Yoshikawa, Tomohiro, Hitzer, Eckhard, Pham, Minh Tuan January 2008
Session ID: FR-G2-2 / Joint 4th International Conference on Soft Computing and Intelligent Systems and 9th International Symposium on Advanced Intelligent Systems, September 17-21, 2008, Nagoya University, Nagoya, Japan
75

Development of a Cottonseed Dehulling Process to Yield Intact Seed Meats

Nunneley, Jacob Lawrence 02 October 2013
With recent genetic advances in the development of gossypol-free cotton varieties, there is interest in retrieving undamaged, dehulled cottonseed kernels for the development of new food and feed products. Current methods used to dehull cottonseed provide a low turnout of the undamaged kernels desired for new market niches. The first objective of the described work was to develop a process for dehulling fuzzy cottonseed that renders a high percentage of undamaged seed meats. A series of methods was tested and optimized to identify the suite of processes that provided the highest yields. The final process included steam conditioning, cracking and dehulling using roller mills, and finally separating kernels from hull material using a roller separator and an air aspirator. Reintroducing un-dehulled seed to the roller mills for a second pass significantly increased the final yield of undamaged seed meats. Lab-scale tests show that yields of 65% to 70% can be obtained using this process, a significant increase over conventional dehulling, which typically yields less than 5% undamaged kernels.

The second objective of the research was to integrate components of the lab-scale milling process into a continuous-flow, pilot-scale system. The performance of the milling system with and without steam conditioning was evaluated. Pilot-scale, continuous-flow tests resulted in undamaged kernel yields of 67.9 ± 3.0% (mean ± 95% confidence interval) during wet milling, comparable to the results of initial batch processing and far exceeding whole-kernel yields from current milling techniques. During dry milling, the efficiency of the system in extracting all possible kernel material was 68 ± 2.9%, but most of the resulting kernel material was broken fragments between 3.35 mm and 0.706 mm in diameter.
76

A support vector machine model for pipe crack size classification

Miao, Chuxiong Unknown Date
Classifying pipe cracks by size from their pulse-echo ultrasonic signals is difficult but highly significant for the defect evaluation required in pipe testing and maintenance decision making. For this thesis, a binary Support Vector Machine (SVM) classifier, which divides pipe cracks into two categories, large and small, was developed using collected ultrasonic signals. To improve the performance of this SVM classifier in terms of reducing test errors, we first combined the Sequential Backward Selection and Sequential Forward Selection schemes for input feature reduction. Second, we used a data-dependent kernel instead of the Gaussian kernel as the kernel function in the SVM classifier. Third, as the classic grid-search method for SVM parameter selection is time-consuming, this work proposes a Kernel Fisher Discriminant Ratio (KFD Ratio) that makes it possible to select parameters for the SVM classifier more quickly.
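The thesis's exact pipeline is not reproduced in this abstract, but one of its ingredients, Sequential Forward Selection, is easy to sketch: greedily add the feature that most improves a wrapper criterion. The numpy sketch below uses a cheap nearest-centroid training accuracy as a stand-in for the SVM test-error criterion; the data, the criterion, and all names are illustrative assumptions.

```python
import numpy as np

def centroid_accuracy(X, y, feats):
    """Training accuracy of a nearest-centroid rule on a feature subset
    (a cheap stand-in for the SVM error criterion used in the thesis)."""
    Xs = X[:, feats]
    c0, c1 = Xs[y == 0].mean(0), Xs[y == 1].mean(0)
    pred = (np.linalg.norm(Xs - c1, axis=1) < np.linalg.norm(Xs - c0, axis=1)).astype(int)
    return (pred == y).mean()

def forward_select(X, y, k):
    """Sequential Forward Selection: repeatedly add the single feature
    whose inclusion gives the best criterion value, until k are chosen."""
    chosen = []
    while len(chosen) < k:
        best_f = max((f for f in range(X.shape[1]) if f not in chosen),
                     key=lambda f: centroid_accuracy(X, y, chosen + [f]))
        chosen.append(best_f)
    return chosen

rng = np.random.default_rng(0)
n = 100
y = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, 6))
X[:, 2] += 3 * y          # feature 2 is strongly informative (toy "crack size" cue)
X[:, 4] += 2 * y          # feature 4 is weakly informative

sel = forward_select(X, y, 2)
print(sel)
```

Sequential Backward Selection, the scheme combined with SFS in the thesis, is the mirror image: start from all features and greedily drop the least useful one.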
77

Graphical Models: Modeling, Optimization, and Hilbert Space Embedding

Zhang, Xinhua January 2010
Over the past two decades graphical models have been widely used as powerful tools for compactly representing distributions, while kernel methods have been used extensively to build rich representations. This thesis aims to combine graphical models with kernels to produce compact models with rich representational abilities.

Graphical models are a powerful underlying formalism in machine learning. Their graph-theoretic properties provide both an intuitive, modular interface for modelling interacting factors and a data structure that facilitates efficient learning and inference. Their probabilistic nature ensures the global consistency of the whole framework and allows a convenient interface between models and data. Kernel methods, on the other hand, provide an effective means of representing rich classes of features for general objects while allowing efficient search for the optimal model. Recently, kernels have been used to characterize distributions by embedding them into high-dimensional feature spaces. Interestingly, graphical models again decompose this characterization and lead to novel, direct ways of comparing distributions based on samples. Among the many uses of graphical models and kernels, this thesis is devoted to the following four areas.

Conditional random fields for multi-agent reinforcement learning. Conditional random fields (CRFs) are graphical models for modelling the probability of labels given the observations. They have traditionally been trained on a set of observation-label pairs, under the assumption that, conditioned on the training data, the label sequences of different training examples are independent and identically distributed (iid). We extended the use of CRFs to a class of temporal learning algorithms, namely policy-gradient reinforcement learning (RL). Now the labels are no longer iid: they are actions that update the environment and affect the next observation. From an RL point of view, CRFs provide a natural way to model joint actions in a decentralized Markov decision process. They define how agents can communicate with each other to choose the optimal joint action. We tested our framework on a synthetic network alignment problem, a distributed sensor network, and a road traffic control system. Using tree sampling by Hamze & de Freitas (2004) for inference, the RL methods employing CRFs clearly outperform those which do not model the proper joint policy.

Bayesian online multi-label classification. Gaussian density filtering (GDF) provides fast and effective inference for graphical models (Maybeck, 1982). Based on this natural online learner, we propose a Bayesian online multi-label classification (BOMC) framework which learns a probabilistic model of the linear classifier. The training labels are incorporated to update the posterior of the classifiers via a graphical model similar to TrueSkill (Herbrich et al., 2007), and inference is based on GDF with expectation propagation. Using samples from the posterior, we label the test data by maximizing the expected F-score. Our experiments on the Reuters1-v2 dataset show that BOMC delivers significantly higher macro-averaged F-scores than state-of-the-art online maximum-margin learners such as LaSVM (Bordes et al., 2005) and passive-aggressive online learning (Crammer et al., 2006). The online nature of BOMC also allows us to efficiently use large amounts of training data.

Hilbert space embedding of distributions. Graphical models are also an essential tool in kernel measures of independence for non-iid data. Traditional information theory often requires density estimation, which makes it ill-suited for statistical estimation. Motivated by the fact that distributions often appear in machine learning via expectations, we can characterize the distance between distributions in terms of distances between means, especially means in reproducing kernel Hilbert spaces, known as kernel embeddings. Under this framework, undirected graphical models further allow us to factorize the kernel embedding onto cliques, which yields efficient measures of independence for non-iid data (Zhang et al., 2009). We show the effectiveness of this framework for ICA and sequence segmentation, and a number of further applications and research questions are identified.

Optimization in maximum margin models for structured data. Maximum margin estimation for structured data, e.g. (Taskar et al., 2004), is an important task in machine learning where graphical models also play a key role. These are special cases of regularized risk minimization, for which bundle methods (BMRM, Teo et al., 2007) and the closely related SVMStruct (Tsochantaridis et al., 2005) are state-of-the-art general-purpose solvers. Smola et al. (2007b) proved that BMRM requires O(1/ε) iterations to converge to an ε-accurate solution, and we further show that this rate hits the lower bound. By exploiting the structure of the objective function, we devised an algorithm for the structured loss which converges to an ε-accurate solution in O(1/√ε) iterations. This algorithm originates from Nesterov's optimal first-order methods (Nesterov, 2003, 2005b).
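A concrete instance of comparing distributions via their RKHS mean embeddings is the (biased) estimator of the squared Maximum Mean Discrepancy, MMD² = ||μ_P − μ_Q||² in the RKHS. The sketch below shows it on synthetic data; this is the general sample-based technique, not the clique-factorized estimator of Zhang et al. (2009), and the data are illustrative.

```python
import numpy as np

def rbf(X, Y, gamma=1.0):
    """Gaussian RBF kernel matrix k(x, y) = exp(-gamma * ||x - y||^2)."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd2(X, Y, gamma=1.0):
    """Biased estimate of squared MMD: the RKHS distance between the
    mean embeddings of the two samples, computed purely from kernels."""
    return rbf(X, X, gamma).mean() + rbf(Y, Y, gamma).mean() - 2 * rbf(X, Y, gamma).mean()

rng = np.random.default_rng(1)
P1 = rng.normal(0.0, 1.0, size=(200, 2))
P2 = rng.normal(0.0, 1.0, size=(200, 2))   # drawn from the same distribution as P1
Q  = rng.normal(1.5, 1.0, size=(200, 2))   # shifted distribution

mmd_same = mmd2(P1, P2)
mmd_diff = mmd2(P1, Q)
print(round(mmd_same, 4), round(mmd_diff, 4))
```

Matched samples give a near-zero estimate while the shifted sample gives a clearly larger one, which is exactly what makes the embedding usable as a sample-based two-sample and independence statistic.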
78

Integrating Sequence and Structure for Annotating Proteins in the Twilight Zone: A Machine Learning Approach

Isye Arieshanti Unknown Date
Determining protein structure and function experimentally is both costly and time-consuming. Transferring function-related protein annotations with homology-based methods is relatively straightforward for proteins that share sequence identity of more than 40%. However, many proteins lie in the "twilight zone", where sequence similarity with any other protein is very weak even though the protein is structurally similar to several others. Such cases require methods capable of using and exploiting both sequence and structural similarity; understanding how such methods can and should be designed is the focus of this study.

In this thesis, models that use both sequence and structure features are applied to two protein prediction problems that are particularly challenging when relying on sequence alone. Enzyme classification benefits from both kinds of features because, on one hand, enzymes can have identical function with limited sequence similarity while, on the other hand, proteins with similar folds may have disparate enzyme class annotations. This thesis shows that the full integration of protein sequence and structure-related features (via the use of kernels) automatically places proteins with similar biological properties closer together, leading to superior classification accuracy using Support Vector Machines.

Disulfide bonds link residues that sit close in a protein structure but may appear distant in sequence, so sequence similarity reflecting such structural properties is very hard to detect. Structural similarity suffices for accurate prediction of disulfide bonds, but such information is very scarce, and predictors that rely on protein structure are not nearly as useful as those operating on sequence alone. This thesis proposes a novel approach based on Kernel Canonical Correlation Analysis that uses structural features during training only. It does so by finding sequence representations that correlate with the structural features essential for a disulfide bond. The resulting representations enable high prediction accuracy for a range of disulfide-bond problems. The proposed model thus taps the advantages of structural features without requiring protein structure to be available at prediction time. The merits of this approach should apply to a number of open protein structure prediction problems.
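The kernelized CCA of the thesis is not reproduced here, but its core idea, finding directions in one view that maximally correlate with a second view available only at training time, can be sketched with plain linear CCA on synthetic two-view data (the data, the regularizer, and all names are illustrative assumptions):

```python
import numpy as np

def cca_first_pair(X, Y, reg=1e-3):
    """First pair of canonical directions for views X and Y (linear CCA;
    the thesis uses the kernelized variant, KCCA, with structural features
    forming the second view during training only)."""
    X = X - X.mean(0)
    Y = Y - Y.mean(0)
    n = len(X)
    Cxx = X.T @ X / n + reg * np.eye(X.shape[1])
    Cyy = Y.T @ Y / n + reg * np.eye(Y.shape[1])
    Cxy = X.T @ Y / n
    # whiten each view, then take the top singular pair of the cross-covariance
    Wx = np.linalg.inv(np.linalg.cholesky(Cxx))
    Wy = np.linalg.inv(np.linalg.cholesky(Cyy))
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy.T)
    a = Wx.T @ U[:, 0]   # direction in the X view
    b = Wy.T @ Vt[0]     # direction in the Y view
    return a, b, s[0]    # s[0] is the top canonical correlation

rng = np.random.default_rng(0)
z = rng.normal(size=(500, 1))                      # shared latent factor
X = np.hstack([z, rng.normal(size=(500, 3))])      # "sequence" view: factor + noise
Y = np.hstack([-z, rng.normal(size=(500, 2))])     # "structure" view: factor + noise

a, b, corr = cca_first_pair(X, Y)
print(round(corr, 3))
```

CCA recovers the shared factor with a canonical correlation near 1; at prediction time only the X-view projection `X @ a` is needed, mirroring how the thesis discards structural features after training.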
79

An implementation of kernelization via matchings

Xiao, Dan. January 2004
Thesis (M.S.)--Ohio University, March 2004. Title from PDF title page. Includes bibliographical references (leaves 51-55).
80

Kernel-based clustering and low rank approximation

Zhang, Kai. January 2008
Thesis (Ph.D.)--Hong Kong University of Science and Technology, 2008. Includes bibliographical references (leaves 88-98). Also available in electronic version.
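The thesis text is not available in this listing, but kernel-based clustering is commonly paired with low-rank approximation via the Nyström method, which reconstructs an n × n kernel matrix from m sampled landmark columns so the full matrix never has to be formed for downstream use. A minimal numpy sketch under that assumption:

```python
import numpy as np

def rbf(A, B, gamma=0.5):
    """Gaussian RBF kernel matrix between the rows of A and B."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def nystrom_approx(X, m, seed=0):
    """Rank-m Nystrom approximation K_hat = C W^+ C^T built from m random
    landmark points; only an n x m slice of the kernel matrix is computed."""
    idx = np.random.default_rng(seed).choice(len(X), size=m, replace=False)
    C = rbf(X, X[idx])            # n x m: kernel columns at the landmarks
    W = C[idx]                    # m x m: kernel block among the landmarks
    return C @ np.linalg.pinv(W) @ C.T

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2))
K = rbf(X, X)                     # exact kernel matrix, for error measurement only
errs = {m: np.linalg.norm(K - nystrom_approx(X, m)) / np.linalg.norm(K)
        for m in (10, 50)}
print({m: round(e, 3) for m, e in errs.items()})
```

The relative error drops as more landmarks are used; in kernel clustering the approximation (or its factor `C @ pinv(W)`) replaces `K`, cutting cost from O(n²) toward O(nm).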
