121

Cost-Sensitive Classification Methods for the Detection of Smuggled Nuclear Material in Cargo Containers

Webster, Jennifer B 16 December 2013 (has links)
Classification problems arise in many different parts of life, from sorting machine parts to diagnosing a disease. Humans make these classifications by utilizing vast amounts of data, filtering observations for useful information, and then making a decision based on a subjective level of cost/risk of classifying objects incorrectly. This study investigates the translation of the human decision process into a mathematical problem in the context of a border security problem: How does one find special nuclear material being smuggled inside large cargo crates while balancing the cost of invasively searching suspect containers against the risk of allowing radioactive material to escape detection? This may be phrased as a classification problem in which one classifies cargo containers into two categories: those containing a smuggled source and those containing only innocuous cargo. This task presents numerous challenges, e.g., the stochastic nature of radiation and the low signal-to-noise ratio caused by background radiation and cargo shielding. In the course of this work, we break the analysis of this problem into three major sections: the development of an optimal decision rule, the choice of the most useful measurements or features, and the sensitivity of the developed algorithms to physical variations. This includes an examination of how accounting for the cost/risk of a decision affects the formulation of our classification problem. Ultimately, a support vector machine (SVM) framework with F-score feature selection is developed to provide nearly optimal classification given a constraint on the reliability of detection provided by our algorithm. In particular, this can decrease the fraction of false positives by an order of magnitude over current methods. The proposed method also takes into account the relationship between measurements, whereas current methods treat detectors independently of one another.
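To make the F-score selection step concrete, the sketch below ranks features by the Chen-Lin F-score and then trains an SVM whose class weights encode an asymmetric search-versus-miss cost. The data, the five-feature cutoff, and the 50:1 cost ratio are all invented for illustration; this is not the detector developed in the thesis.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))    # 20 synthetic candidate detector features
y = (X[:, 0] + 0.5 * X[:, 3] + rng.normal(scale=2.0, size=500) > 2.5).astype(int)

def f_score(X, y):
    # Chen-Lin F-score of each feature for a binary problem.
    pos, neg = X[y == 1], X[y == 0]
    num = (pos.mean(0) - X.mean(0)) ** 2 + (neg.mean(0) - X.mean(0)) ** 2
    den = pos.var(0, ddof=1) + neg.var(0, ddof=1)
    return num / (den + 1e-12)

top = np.argsort(f_score(X, y))[::-1][:5]     # keep the 5 best features
print("selected features:", top)

# Asymmetric costs: a missed source (false negative) is assumed far more
# expensive than an unnecessary manual search (false positive).
Xtr, Xte, ytr, yte = train_test_split(X[:, top], y, random_state=0)
clf = SVC(kernel="rbf", class_weight={0: 1.0, 1: 50.0}).fit(Xtr, ytr)
print("test accuracy: %.3f" % clf.score(Xte, yte))
```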
122

Active Control Strategies for Chemical Sensors and Sensor Arrays

Gosangi, Rakesh 16 December 2013 (has links)
Chemical sensors are generally used as one-dimensional devices, where one measures the sensor's response at a fixed setting, e.g., infrared absorption at a specific wavelength, or conductivity of a solid-state sensor at a specific operating temperature. In many cases, additional information can be extracted by modulating some internal property (e.g., temperature, voltage) of the sensor. However, this additional information comes at a cost (e.g., sensing time, power consumption), so offline optimization techniques (such as feature-subset selection) are commonly used to identify a subset of the most informative sensor tunings. An alternative to offline techniques is active sensing, where the sensor tunings are adapted in real time based on the information obtained from previous measurements. Prior work in domains such as vision, robotics, and target tracking has shown that active sensing can schedule agile sensors to manage their sensing resources more efficiently than passive sensing, and to balance sensing costs against performance. Inspired by this prior work, in this dissertation we developed active sensing algorithms that address three different computational problems in chemical sensing. First, we consider the problem of classification with a single tunable chemical sensor. We formulate the classification problem as a partially observable Markov decision process, and solve it with a myopic algorithm. At each step, the algorithm estimates the utility of each sensing configuration as the difference between the expected reduction in Bayesian risk and the sensing cost, and selects the configuration with maximum utility. We evaluated this approach on simulated Fabry-Perot interferometers (FPIs), and experimentally validated it on metal-oxide (MOX) sensors. Our results show that the active sensing method obtains better classification performance than passive sensing methods, and is also more robust to additive Gaussian noise in the sensor measurements. Second, we consider the problem of estimating the concentrations of the constituents in a gas mixture using a tunable sensor. We formulate this multicomponent-analysis problem as one of probabilistic state estimation, where each state represents a different concentration profile. We maintain a belief distribution that assigns a probability to each profile, and update the distribution by incorporating the latest sensor measurements. To select the sensor's next operating configuration, we use a myopic algorithm that chooses the configuration expected to best reduce the uncertainty in the future belief distribution. We validated this approach on both simulated and real MOX sensors. The results again demonstrate improved estimation performance and robustness to noise. Lastly, we present an algorithm that extends active sensing to sensor arrays. This algorithm borrows concepts from feature-subset selection to enable an array of tunable sensors to operate collaboratively for the classification of gas samples. The algorithm constructs an optimized action vector at each sensing step, which contains a separate operating configuration for each sensor in the array. When dealing with sensor arrays, one needs to account for the correlation among sensors. To this end, we developed two objective functions, weighted Fisher scores and dynamic mutual information, which quantify the discriminatory information and redundancy of a given action vector with respect to the measurements already acquired.
Once again, we validated the approach on simulated FPI arrays and experimentally tested it on an array of MOX sensors. The results show improved classification performance and robustness to additive noise.
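The myopic selection rule described above can be sketched in a few lines: keep a belief over gas classes, score each configuration by its expected Bayes-risk reduction minus its sensing cost, measure with the best one, and update the belief. The Gaussian sensor model, the costs, and the stopping threshold below are all invented for illustration; they stand in for the FPI and MOX models used in the dissertation.

```python
import numpy as np

rng = np.random.default_rng(1)
K, C = 3, 5                        # gas classes, sensor configurations
means = rng.normal(size=(C, K))    # hypothetical class-conditional response means
sigma = 0.4                        # measurement noise (assumed Gaussian)
costs = np.full(C, 0.02)           # per-measurement sensing cost

def bayes_risk(b):
    # Bayes risk of a belief under 0/1 loss.
    return 1.0 - b.max()

def likelihood(c, z):
    # p(z | class k, configuration c) for every class k.
    return np.exp(-0.5 * ((z - means[c]) / sigma) ** 2)

true_k = 2
belief = np.full(K, 1.0 / K)
for step in range(10):
    utilities = []
    for c in range(C):
        # Expected posterior risk of measuring with configuration c,
        # approximated with one representative measurement per class.
        exp_risk = 0.0
        for k in range(K):
            post = belief * likelihood(c, means[c, k])
            post /= post.sum()
            exp_risk += belief[k] * bayes_risk(post)
        utilities.append(bayes_risk(belief) - exp_risk - costs[c])
    c = int(np.argmax(utilities))                    # most useful tuning
    z = means[c, true_k] + rng.normal(scale=sigma)   # take the measurement
    belief *= likelihood(c, z)
    belief /= belief.sum()
    if bayes_risk(belief) < 0.05:                    # confident enough to stop
        break
print("decided class:", int(belief.argmax()), "belief:", np.round(belief, 3))
```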
123

Contributions to generic visual object categorization

Fu, Huanzhang 14 December 2010 (has links) (PDF)
This thesis is dedicated to the active research topic of generic Visual Object Categorization (VOC), which can be widely used in many applications such as video indexation and retrieval, video monitoring, security access control, automobile driving support, etc. Due to many realistic difficulties, it is still considered to be one of the most challenging problems in computer vision and pattern recognition. In this context, we have proposed in this thesis our contributions, especially concerning the two main components of the methods addressing VOC problems, namely feature selection and image representation. Firstly, an Embedded Sequential Forward feature Selection algorithm (ESFS) has been proposed for VOC. Its aim is to select the most discriminant features for obtaining good categorization performance. It is mainly based on the commonly used sub-optimal search method Sequential Forward Selection (SFS), which relies on the simple principle of incrementally adding the most relevant features. However, ESFS not only adds the most relevant features incrementally in each step but also merges them in an embedded way, thanks to the concept of combined mass functions from evidence theory, which also offers the benefit of a computational cost much lower than that of the original SFS. Secondly, we have proposed novel image representations to model the visual content of an image, namely Polynomial Modeling and Statistical Measures based Image Representation, called PMIR and SMIR respectively. They overcome the main drawback of the popular "bag of features" method, which is the difficulty of fixing the optimal size of the visual vocabulary. They have been tested along with our proposed region-based features and SIFT. Two different fusion strategies, early and late, have also been considered to merge information from different "channels" represented by the different types of features. Thirdly, we have proposed two approaches for VOC relying on sparse representation, including a reconstructive method (R_SROC) as well as a reconstructive and discriminative one (RD_SROC). Indeed, the sparse representation model was originally used in signal processing as a powerful tool for acquiring, representing and compressing high-dimensional signals. Thus, we have proposed to adapt these interesting principles to the VOC problem. R_SROC relies on the intuitive assumption that an image can be represented by a linear combination of training images from the same category. Therefore, the sparse representations of images are first computed by solving the ℓ1-norm minimization problem and then used as new feature vectors for images to be classified by traditional classifiers such as SVM. To improve the discrimination ability of the sparse representation to better fit the classification problem, we have also proposed RD_SROC, which adds a discrimination term, such as the Fisher discrimination measure or the output of an SVM classifier, to the standard sparse representation objective function in order to learn a reconstructive and discriminative dictionary. Moreover, we have also proposed to combine the reconstructive and discriminative dictionary and the adapted pure reconstructive dictionary for a given category so that the discrimination power can be further increased. The efficiency of all the methods proposed in this thesis has been evaluated on popular image datasets including SIMPLIcity, Caltech101 and Pascal2007.
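As a rough illustration of the reconstructive idea behind R_SROC, the sketch below codes a test sample as a sparse linear combination of training samples (the ℓ1 problem approximated here with scikit-learn's Lasso) and then classifies by per-class reconstruction residual, the classic SRC rule; the thesis instead feeds the sparse codes to a classifier such as an SVM. All data are synthetic stand-ins for image descriptors.

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n_per_class, dim = 30, 60
# Two synthetic categories standing in for image descriptors.
A = np.vstack([rng.normal(loc=0.0, size=(n_per_class, dim)),
               rng.normal(loc=1.5, size=(n_per_class, dim))])
labels = np.repeat([0, 1], n_per_class)
test = rng.normal(loc=1.5, size=dim)          # a sample drawn like class 1

# Sparse code: min ||test - D a||^2 + alpha * ||a||_1, with the training
# samples as the columns of the dictionary D.
coder = Lasso(alpha=0.05, max_iter=5000).fit(A.T, test)
code = coder.coef_

# Classify by the smallest per-class reconstruction residual (SRC rule).
residuals = [np.linalg.norm(test - A[labels == k].T @ code[labels == k])
             for k in (0, 1)]
print("predicted class:", int(np.argmin(residuals)))
```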
124

Machine Learning and Graph Theory Approaches for Classification and Prediction of Protein Structure

Altun, Gulsah 22 April 2008 (has links)
Recently, many methods have been proposed for classification and prediction problems in bioinformatics. One of these problems is protein structure prediction. Machine learning approaches and new algorithms have been proposed to solve this problem. Among the machine learning approaches, Support Vector Machines (SVMs) have attracted a lot of attention due to their high prediction accuracy. Since protein data consists of sequence and structural information, another widely used approach for modeling this structured data is to use graphs. In computer science, graph theory has been widely studied; however, it has only recently been applied to bioinformatics. In this work, we introduce new algorithms based on statistical methods, graph theory concepts and machine learning for the protein structure prediction problem. A new statistical method based on z-scores is introduced for seed selection in proteins. A new method based on finding common cliques in protein data for feature selection is also introduced, which reduces noise in the data. We also introduce new binary classifiers for the prediction of structural transitions in proteins. These new binary classifiers achieve much higher accuracy than current traditional binary classifiers.
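The z-score seed-selection idea reduces to flagging candidates whose scores are extreme relative to the background distribution. A minimal sketch, with hypothetical alignment scores and a 3-sigma cutoff chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
scores = rng.normal(loc=5.0, scale=1.0, size=200)   # background score distribution
scores[[10, 42, 97]] += 6.0                         # a few genuinely strong seeds

z = (scores - scores.mean()) / scores.std(ddof=1)   # standardize against background
seeds = np.flatnonzero(z > 3.0)                     # keep 3-sigma outliers as seeds
print("selected seed positions:", seeds)
```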
125

REGION-COLOR BASED AUTOMATED BLEEDING DETECTION IN CAPSULE ENDOSCOPY VIDEOS

2014 June 1900 (has links)
Capsule Endoscopy (CE) is a unique technique for facilitating non-invasive and practical visualization of the entire small intestine, and it has attracted a substantial body of work aimed at improving it. Much of this effort has gone into software algorithms that identify clinically important frames in CE videos. This thesis presents a computer-assisted method that performs automated detection of CE video frames containing bleeding. Specifically, a methodology is proposed to classify the frames of CE videos into bleeding and non-bleeding frames. It is a Support Vector Machine (SVM) based supervised method which classifies the frames on the basis of color features derived from image regions; image regions are characterized by statistical features. From a pool of 15 candidate features, an exhaustive feature-subset search is performed to obtain the best subset: the combination of features with the highest bleeding-discrimination ability as determined by three performance metrics, accuracy, sensitivity and specificity. Also, a ground-truth label annotation method is proposed in order to partially automate the delineation of bleeding regions for training the classifier. The method produced promising results, with sensitivity and specificity values up to 94%. All experiments were performed separately for the RGB and HSV color spaces. Experimental results show that the combination of the mean values of the red and green planes is the best feature subset in RGB (Red-Green-Blue) color space, and the combination of the mean values of all three planes is the best feature subset in HSV (Hue-Saturation-Value).
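With only 15 candidates, exhaustive subset search is feasible: there are 2^15 − 1 non-empty subsets to score. The sketch below runs the same idea on a smaller, synthetic six-feature pool, scoring each subset with a cross-validated SVM; plain accuracy stands in for the thesis's three metrics, and none of the data are real endoscopy statistics.

```python
from itertools import combinations

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

rng = np.random.default_rng(4)
n = 300
X = rng.normal(size=(n, 6))       # 6 synthetic region-color features
y = (X[:, 0] + X[:, 2] + rng.normal(scale=1.0, size=n) > 0).astype(int)

best_subset, best_acc = None, 0.0
for r in range(1, X.shape[1] + 1):            # every subset size
    for subset in combinations(range(X.shape[1]), r):
        acc = cross_val_score(SVC(kernel="rbf"),
                              X[:, list(subset)], y, cv=3).mean()
        if acc > best_acc:
            best_subset, best_acc = subset, acc
print("best subset:", best_subset, "cv accuracy: %.3f" % best_acc)
```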
127

Effective and Efficient Optimization Methods for Kernel Based Classification Problems

Tayal, Aditya January 2014 (has links)
Kernel methods are a popular choice for solving a number of problems in statistical machine learning. In this thesis, we propose new methods for two important kernel-based classification problems: 1) learning from highly unbalanced large-scale datasets and 2) selecting a relevant subset of input features for a given kernel specification. The first problem is known as the rare class problem, which is characterized by a highly skewed or unbalanced class distribution. Unbalanced datasets can introduce significant bias in standard classification methods. In addition, due to the growth of data in recent years, large datasets with millions of observations have become commonplace. We propose an approach that addresses both the bias and the computational complexity of rare class problems, by optimizing the area under the receiver operating characteristic curve and by using a rare-class-only kernel representation, respectively. We justify the proposed approach theoretically and computationally. Theoretically, we establish an upper bound on the difference between selecting a hypothesis from a reproducing kernel Hilbert space and from a hypothesis space which can be represented using a subset of kernel functions. This bound shows that for a fixed number of kernel functions, it is optimal to first include functions corresponding to rare class samples. We also discuss the connection of a subset kernel representation with the Nyström method for a general class of regularized loss minimization methods. Computationally, we illustrate that the rare class representation produces statistically equivalent test error results on highly unbalanced datasets compared to using the full kernel representation, but with significantly better time and space complexity. Finally, we extend the method to rare-class ordinal ranking, and apply it to a recent public competition problem in health informatics. The second problem studied in the thesis is known in the literature as the feature selection problem. Embedding feature selection in kernel classification leads to a non-convex optimization problem. We specify a primal formulation and solve the problem using a second-order trust-region algorithm. To improve efficiency, we use the two-block Gauss-Seidel method, breaking the problem into a convex support vector machine subproblem and a non-convex feature selection subproblem. We reduce the possibility of saddle-point convergence and improve solution quality by sharing an explicit functional margin variable between block iterates. We illustrate how our algorithm improves upon state-of-the-art methods.
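A toy sketch of the pairing above: expand the hypothesis only over kernel functions centred on rare-class points, then fit the weights by minimizing a pairwise hinge surrogate for AUC. The data, RBF width, step size, and subsampling scheme are all invented; the thesis's actual formulation and solver are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(5)
n_neg, n_pos, d = 2000, 40, 5                # heavily unbalanced classes
Xn = rng.normal(size=(n_neg, d))             # majority (negative) class
Xp = rng.normal(loc=1.0, size=(n_pos, d))    # rare (positive) class

def rbf(A, B, gamma=0.5):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Kernel representation restricted to rare-class centres.
Kp = rbf(Xp, Xp)     # positives against the centres
Kn = rbf(Xn, Xp)     # negatives against the centres

w = np.zeros(n_pos)
lr, lam = 0.01, 1e-3
for epoch in range(300):
    idx = rng.choice(n_neg, size=200, replace=False)  # cheap negative subsample
    sp, sn = Kp @ w, Kn[idx] @ w
    # Pairwise hinge: every positive should outscore every negative by 1.
    viol = (sp[:, None] - sn[None, :]) < 1.0
    grad = (Kn[idx].T @ viol.sum(0) - Kp.T @ viol.sum(1)) / viol.size
    w -= lr * (grad + lam * w)

sp, sn = Kp @ w, Kn @ w
print("training AUC: %.3f" % (sp[:, None] > sn[None, :]).mean())
```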
128

Feature selection and hierarchical classifier design with applications to human motion recognition

Freeman, Cecille January 2014 (has links)
The performance of a classifier is affected by a number of factors, including the classifier type, the input features and the desired output. This thesis examines the impact of feature selection and classification-problem division on classification accuracy and complexity. Proper feature selection can reduce classifier size and improve classifier performance by minimizing the impact of noisy, redundant and correlated features. Noisy features can cause false associations between the features and the classifier output. Redundant and correlated features increase classifier complexity without adding information. Output selection, or classification-problem division, describes the division of a large classification problem into a set of smaller problems. Problem division can improve accuracy by allocating more resources to more difficult class divisions and by enabling the use of more specific feature sets for each sub-problem. The first part of this thesis presents two methods for creating feature-selected hierarchical classifiers. The feature-selected hierarchical classification method jointly optimizes the features and the classification-tree design using genetic algorithms. The multi-modal binary tree (MBT) method performs the class division and feature selection sequentially and tolerates misclassifications in the higher nodes of the tree. This yields a piecewise separation for classes that cannot be fully separated with a single classifier. Experiments show that the accuracy of MBT is comparable to other multi-class extensions, but with lower test time. Furthermore, the accuracy of MBT is significantly higher on multi-modal data sets. The second part of this thesis focuses on input feature selection measures. A number of filter-based feature-subset evaluation measures are evaluated with the goal of assessing their performance with respect to specific classifiers. Although many feature selection measures have been proposed in the literature, it is unclear which are appropriate for use with different classifiers. Sixteen common filter-based measures are tested on 20 real and 20 artificial data sets, the latter designed to probe for specific feature selection challenges. The strengths and weaknesses of each measure are discussed with respect to the specific feature selection challenges in the artificial data sets, correlation with classifier accuracy, and the ability to identify known informative features. The results indicate that the best filter measure is classifier-specific. K-nearest-neighbours classifiers work well with subset-based RELIEF, correlation feature selection or conditional mutual information maximization, whereas Fisher's interclass separability criterion and conditional mutual information maximization work better for support vector machines. Based on the results of the feature selection experiments, two new filter-based measures are proposed based on conditional mutual information maximization, which performs well but cannot identify dependent features in a set and does not include a check for correlated features. Both new measures explicitly check for dependent features, and the second measure also includes a term to discount correlated features. Both measures correctly identify known informative features in the artificial data sets and correlate well with classifier accuracy.
The final part of this thesis examines feature selection for time-series data, using it to identify important time windows or key frames in the series. Time-series feature selection is combined with the MBT algorithm to create classification trees for time-series data. The feature-selected MBT algorithm is tested on two human motion recognition tasks: full-body human motion recognition from joint-angle data and hand-gesture recognition from electromyography data. Results indicate that the feature-selected MBT achieves high classification accuracy on the time-series data while maintaining a short test time.
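Conditional mutual information maximization, the measure the two new criteria build on, is simple to sketch: greedily pick the feature whose information about the label, conditioned on each already-selected feature, remains largest. Below is a minimal version on synthetic binary features (real features would first be discretized); note how the duplicate of an already-selected feature is correctly passed over.

```python
import numpy as np
from sklearn.metrics import mutual_info_score

rng = np.random.default_rng(6)
n = 1000
y = rng.integers(0, 2, size=n)
f0 = np.where(rng.random(n) < 0.1, 1 - y, y)    # strongly informative
f1 = f0.copy()                                  # exact duplicate of f0
f2 = rng.integers(0, 2, size=n)                 # pure noise
f3 = np.where(rng.random(n) < 0.3, 1 - y, y)    # weakly informative
X = np.column_stack([f0, f1, f2, f3])

def cond_mi(x, y, z):
    # I(x; y | z) = sum_v p(z=v) * I(x; y | z=v), for discrete arrays.
    return sum((z == v).mean() * mutual_info_score(x[z == v], y[z == v])
               for v in np.unique(z))

def cmim(X, y, k):
    # Greedy CMIM: keep, for each feature, the smallest conditional MI with
    # the label given any already-selected feature, and pick the largest.
    score = np.array([mutual_info_score(X[:, f], y) for f in range(X.shape[1])])
    selected = []
    for _ in range(k):
        f = int(np.argmax(score))
        selected.append(f)
        score[f] = -np.inf
        for g in range(X.shape[1]):
            if np.isfinite(score[g]):
                score[g] = min(score[g], cond_mi(X[:, g], y, X[:, f]))
    return selected

print("selected features:", cmim(X, y, 2))   # expect [0, 3]: duplicate skipped
```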
129

Pattern recognition in cybersecurity

Vashaee, Ali January 2014 (has links)
The tremendous growth of accessible online images (Web images) provokes the need to perform accurate image ranking for applications like cyber-security. Feature extraction is an important step in image ranking procedures due to its direct impact on final categorization and ranking performance. The goal of this study is to analyse state-of-the-art feature spaces in order to evaluate their efficiency in the object recognition context and the image ranking framework for cyber-security applications. Experiments show that the HOG and GIST feature descriptors exhibit high ranking performance; however, these features are not rotation and scale invariant. In order to obtain more reliable image ranking systems based on these feature spaces, we propose two methods. In the first method (PrMI) we focus on improving the invariance property of the ranking system while maintaining its ranking performance. In this method, a rotation-invariant feature descriptor derived from HOG (RIHOG) is used in a top-down searching technique to cover the scale variation of the objects in the images. The proposed method (PrMI) not only provides robustness against geometrical transformations of objects but also provides high ranking performance close to that of HOG. It is also computationally efficient, with complexity around O(n). In the second proposed method (PrMII) we focus on the ranking performance while maintaining the invariance property of the ranking system. Objects are localized in a scale-invariant fashion in a region-covariance feature space, and are then described using HOG and GIST features. Finally, to better evaluate the proposed method, we compare it with existing research in the similar domain (CBIR) on Caltech-256. The proposed methods provide the highest ranking performance in comparison with the methods implemented in this study, and with some of the CBIR methods evaluated on the Caltech-256 dataset in previous works.
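Both methods start from a HOG description of the image. A quick sketch of extracting that descriptor with scikit-image follows; the image and the parameter choices are placeholders, and RIHOG itself (the rotation-invariant derivative proposed in PrMI) is not a library routine.

```python
import numpy as np
from skimage.feature import hog

image = np.random.default_rng(7).random((128, 128))  # stand-in grayscale image
descriptor = hog(image,
                 orientations=9,           # gradient-orientation bins
                 pixels_per_cell=(8, 8),
                 cells_per_block=(2, 2),
                 block_norm="L2-Hys")
print("HOG descriptor length:", descriptor.shape[0])
```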
130

New tools for unsupervised learning

Xiao, Ying 12 January 2015 (has links)
In an unsupervised learning problem, one is given an unlabelled dataset and hopes to find some hidden structure; the prototypical example is clustering similar data. Such problems often arise in machine learning and statistics, but also in signal processing, theoretical computer science, and any number of quantitative scientific fields. The distinguishing feature of unsupervised learning is that there are no privileged variables or labels which are particularly informative, and thus the greatest challenge is often to differentiate between what is relevant or irrelevant in any particular dataset or problem. In the course of this thesis, we study a number of problems which span the breadth of unsupervised learning. We make progress in Gaussian mixtures, independent component analysis (where we solve the open problem of underdetermined ICA), and we formulate and solve a feature selection/dimension reduction model. Throughout, our goal is to give finite sample complexity bounds for our algorithms -- these are essentially the strongest type of quantitative bound that one can prove for such algorithms. Some of our algorithmic techniques turn out to be very efficient in practice as well. Our major technical tool is tensor spectral decomposition: tensors are generalisations of matrices, and often allow access to the "fine structure" of data. Thus, they are often the right tools for unravelling the hidden structure in an unsupervised learning setting. However, naive generalisations of matrix algorithms to tensors run into NP-hardness results almost immediately, and thus to solve our problems, we are obliged to develop two new tensor decompositions (with robust analyses) from scratch. Both of these decompositions are polynomial time, and can be viewed as efficient generalisations of PCA extended to tensors.
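The tensor spectral decompositions mentioned above generalize the matrix power method. The toy sketch below builds a symmetric third-order tensor with an orthogonal decomposition and recovers its components by standard tensor power iteration with deflation; in the applications discussed in the thesis, such tensors would be estimated from data moments, and the new decompositions developed there handle the harder non-orthogonal, noisy cases.

```python
import numpy as np

rng = np.random.default_rng(8)
d, k = 8, 3
V, _ = np.linalg.qr(rng.normal(size=(d, k)))   # orthonormal components
lams = np.array([3.0, 2.0, 1.0])
T = sum(l * np.einsum("i,j,k->ijk", v, v, v) for l, v in zip(lams, V.T))

def top_eigenpair(T, n_starts=10, n_iters=100):
    # Tensor power iteration: u <- T(I, u, u), normalized; restart from
    # several random points and keep the largest eigenvalue found.
    best_u, best_lam = None, -np.inf
    for _ in range(n_starts):
        u = rng.normal(size=T.shape[0])
        u /= np.linalg.norm(u)
        for _ in range(n_iters):
            u = np.einsum("ijk,j,k->i", T, u, u)
            u /= np.linalg.norm(u)
        lam = np.einsum("ijk,i,j,k->", T, u, u, u)
        if lam > best_lam:
            best_u, best_lam = u, lam
    return best_u, best_lam

for _ in range(k):
    u, lam = top_eigenpair(T)
    print("recovered eigenvalue: %.3f" % lam)
    T = T - lam * np.einsum("i,j,k->ijk", u, u, u)   # deflate and repeat
```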
