281

Quantifying the Trenches: Machine Learning Applied to NFL Offensive Lineman Valuation

Pyne, Sean 01 January 2017 (has links)
There are 32 teams in the National Football League, all competing to be the best by creating the strongest roster possible. The problem of evaluating talent has created extreme competition between teams in the form of a rookie draft and a fiercely competitive veteran free agent market. The difficulty with player evaluation is due to the noise associated with measuring a particular player’s value. The intent of this paper is to create an algorithm for identifying pricing inefficiencies in these player markets. In particular, this paper focuses on the veteran free agent market for offensive linemen in the NFL. NFL offensive linemen are difficult to evaluate empirically because of the significant amount of noise present, owing to the inability to measure a lineman’s performance directly. The algorithm first uses a machine learning technique, k-means cluster analysis, to generate a comparative set of offensive linemen. Then, using that set of comparable offensive linemen, the algorithm flags any lineman whose earnings vary significantly from those of his peers. In this fashion, the algorithm provides relative valuations for particular offensive linemen.
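
The abstract names k-means cluster analysis followed by a within-cluster earnings comparison. The sketch below illustrates that two-step idea in Python with scikit-learn; the feature columns, the number of clusters, and the two-standard-deviation flagging threshold are assumptions for illustration, not values from the thesis.

```python
# Sketch of the comparable-set idea: cluster linemen on performance-related
# features, then flag salaries that deviate strongly from their cluster.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

def flag_salary_outliers(features, salaries, n_clusters=8, z_thresh=2.0):
    """features: (n_players, n_features) array; salaries: (n_players,) array."""
    X = StandardScaler().fit_transform(features)
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X)
    flags = np.zeros(len(salaries), dtype=bool)
    for c in range(n_clusters):
        idx = labels == c
        mu, sigma = salaries[idx].mean(), salaries[idx].std()
        if sigma > 0:
            # Flag linemen whose pay differs from their comparables by > z_thresh sigmas
            flags[idx] = np.abs(salaries[idx] - mu) > z_thresh * sigma
    return labels, flags
```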
282

Online intrusion detection design and implementation for SCADA networks

Wang, Hongrui 25 April 2017 (has links)
The standardization and interconnection of supervisory control and data acquisition (SCADA) systems has exposed these systems to cyber attacks. Intrusion detection system (IDS) design is an effective method for improving the security of SCADA systems. However, traditional IDS design in industrial networks mainly exploits predefined rules, which need to be complemented and developed to adapt to the big data scenario. This thesis therefore aims to design a novel anomaly-based hierarchical online intrusion detection system (HOIDS) for SCADA networks based on machine learning algorithms, and to implement the idea of anomaly-based intrusion detection on a testbed. The theoretical design of HOIDS utilizes a server-client topology while keeping clients distributed for global protection, achieving a high detection rate with minimal network impact. We implement accurate models of normal-abnormal binary detection and multi-attack identification based on logistic regression and a quasi-Newton optimization algorithm using the Broyden-Fletcher-Goldfarb-Shanno (BFGS) approach. The detection system can accelerate detection through information-gain-based feature selection or principal component analysis based dimension reduction. By evaluating our system on the KDD99 dataset and industrial control system datasets, we demonstrate that our design is highly scalable, efficient and cost effective for securing SCADA infrastructures. Besides the theoretical IDS design, a testbed is modified and implemented for SCADA network security research. It simulates the working environment of SCADA systems, with functions for data collection and analysis for intrusion detection. The testbed is implemented to be more flexible and extensible than existing related testbeds. In the testbed, the Bro network analyzer is introduced to support research on anomaly-based intrusion detection, and the procedures of both signature-based and anomaly-based intrusion detection using the Bro analyzer are presented. In addition, a generic Linux-based host is used as the container for different network functions, and a human machine interface (HMI) together with the supervising network is set up to simulate the control center. The testbed does not implement a large number of traffic generation methods, but still provides useful examples of generating normal and abnormal traffic. The testbed can also be modified or expanded in future work on SCADA network security. / Graduate
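
The detection pipeline the abstract describes, feature selection or dimension reduction followed by logistic regression fitted with a quasi-Newton method, can be sketched as follows. Here scikit-learn's L-BFGS solver stands in for the BFGS approach named above and mutual information stands in for information gain, so this is an illustrative approximation rather than the HOIDS implementation.

```python
# Illustrative detection pipeline: information-gain-style feature selection,
# then logistic regression fitted with a quasi-Newton (L-BFGS) solver.
# Loading and label-encoding the KDD99 data is outside the scope of this sketch.
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import LogisticRegression

def build_detector(k_features=20):
    return make_pipeline(
        StandardScaler(),
        SelectKBest(mutual_info_classif, k=k_features),    # feature selection step
        LogisticRegression(solver="lbfgs", max_iter=1000),  # quasi-Newton optimization
    )

# detector = build_detector().fit(X_train, y_train)  # y: normal vs. attack classes
# y_pred = detector.predict(X_test)
```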
283

Domain adaptation for classifying disaster-related Twitter data

Sopova, Oleksandra January 1900 (has links)
Master of Science / Department of Computing and Information Sciences / Doina Caragea / Machine learning is the subfield of artificial intelligence that gives computers the ability to learn without being explicitly programmed, as defined by Arthur Samuel, the American pioneer in the field of computer gaming and artificial intelligence who was born in Emporia, Kansas. Supervised machine learning is focused on building predictive models given labeled training data. Data may come from a variety of sources, for instance, social media networks. In our research, we use Twitter data, specifically user-generated tweets about disasters such as floods, hurricanes, terrorist attacks, etc., to build classifiers that could help disaster management teams identify useful information. A supervised classifier trained on data (training data) from a particular domain (i.e. disaster) is expected to give accurate predictions on unseen data (testing data) from the same domain, assuming that the training and test data have similar characteristics. Labeled data is not easily available for a current target disaster. However, labeled data from a prior source disaster is presumably available and can be used to learn a supervised classifier for the target disaster. Unfortunately, the source disaster data and the target disaster data may not share the same characteristics, and the classifier learned from the source may not perform well on the target. Domain adaptation techniques, which use unlabeled target data in addition to labeled source data, can be used to address this problem. We study single-source and multi-source domain adaptation techniques, using a Naïve Bayes classifier. Experimental results on Twitter datasets corresponding to six disasters show that domain adaptation techniques improve the overall performance compared to basic supervised learning classifiers. Domain adaptation is crucial for many machine learning applications, as it enables the use of unlabeled data in domains where labeled data is not available.
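
As a rough illustration of single-source adaptation with a Naïve Bayes classifier, the sketch below trains on labeled source-disaster tweets and then folds in confidently pseudo-labeled target-disaster tweets. The self-training scheme, the confidence threshold, and the vectorizer settings are assumptions for illustration, not the specific techniques studied in the thesis.

```python
# Minimal self-training-style domain adaptation with Naive Bayes:
# train on labeled source tweets, pseudo-label confident target tweets, retrain.
import numpy as np
import scipy.sparse as sp
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

def adapt_naive_bayes(source_texts, source_labels, target_texts, conf=0.9):
    vec = CountVectorizer(min_df=2)
    Xs = vec.fit_transform(source_texts)      # labeled source-disaster tweets
    Xt = vec.transform(target_texts)          # unlabeled target-disaster tweets
    clf = MultinomialNB().fit(Xs, source_labels)
    proba = clf.predict_proba(Xt)
    confident = proba.max(axis=1) >= conf     # keep only high-confidence pseudo-labels
    if confident.any():
        X_aug = sp.vstack([Xs, Xt[confident]])
        y_aug = np.concatenate([source_labels, clf.predict(Xt[confident])])
        clf = MultinomialNB().fit(X_aug, y_aug)  # retrain on source + pseudo-labeled target
    return vec, clf
```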
284

Predicting the concentration of residual methanol in industrial formalin using machine learning / Forutspå koncentrationen av resterande metanol i industriell formalin med hjälp av maskininlärning

Heidkamp, William January 2016 (has links)
In this thesis, a machine learning approach was used to develop a predictive model for residual methanol concentration in industrial formalin produced at the Akzo Nobel factory in Kristinehamn, Sweden. The MATLAB™ computational environment, supplemented with the Statistics and Machine Learning Toolbox™ from MathWorks, was used to test various machine learning algorithms on the formalin production data from Akzo Nobel. The Gaussian Process Regression algorithm was found to provide the best results and was used to create the predictive model. The model was compiled into a stand-alone application with a graphical user interface using the MATLAB Compiler™.
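
The thesis worked in MATLAB; the following is an analogous Gaussian Process Regression sketch in Python with scikit-learn. The kernel choice and the placeholder process-variable inputs are assumptions for illustration, not the model from the thesis.

```python
# Analogous GPR setup: standardize the process variables, then fit a Gaussian
# process with an RBF kernel plus a white-noise term for measurement noise.
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=1e-2)
model = make_pipeline(
    StandardScaler(),
    GaussianProcessRegressor(kernel=kernel, normalize_y=True, random_state=0),
)
# model.fit(X_process_variables, y_residual_methanol)  # placeholder inputs/target
# y_pred = model.predict(X_new)
```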
285

A Machine Learning Approach to Determine Oyster Vessel Behavior

Frey, Devin 16 December 2016 (has links)
A support vector machine (SVM) classifier was designed to replace a previous classifier which predicted oyster vessel behavior in the public oyster grounds of Louisiana. The SVM classifier predicts vessel behavior (docked, poling, fishing, or traveling) based on each vessel’s speed and either net speed or movement angle. The data from these vessels was recorded by a Vessel Monitoring System (VMS), and stored in a PostgreSQL database. The SVM classifier was written in Python, using the scikit-learn library, and was trained by using predictions from the previous classifier. Several validation and parameter optimization techniques were used to improve the SVM classifier’s accuracy. The previous classifier could classify about 93% of points from July 2013 to August 2014, but the SVM classifier can classify about 99.7% of those points. This new classifier can easily be expanded with additional features to further improve its predictive capabilities.
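
The abstract states that the classifier is an SVM written in Python with scikit-learn and improved through parameter optimization; a minimal sketch of that setup, with an assumed RBF kernel and an illustrative parameter grid, could look like this:

```python
# Illustrative SVM pipeline for the four vessel behaviors, with a small
# grid search standing in for the parameter optimization the abstract mentions.
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import GridSearchCV

behaviors = ["docked", "poling", "fishing", "traveling"]

pipe = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
param_grid = {"svc__C": [1, 10, 100], "svc__gamma": ["scale", 0.1, 1.0]}
search = GridSearchCV(pipe, param_grid, cv=5)

# search.fit(X_features, y_behavior)  # X: e.g. [speed, movement_angle] per VMS point
# predictions = search.predict(X_new)
```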
286

Embodied Visual Object Recognition / Förkroppsligad objektigenkänning

Wallenberg, Marcus January 2017 (has links)
Object recognition is a skill we as humans often take for granted. Due to our formidable object learning, recognition and generalisation skills, it is sometimes hard to see the multitude of obstacles that need to be overcome in order to replicate this skill in an artificial system. Object recognition is also one of the classical areas of computer vision, and many ways of approaching the problem have been proposed. Recently, visually capable robots and autonomous vehicles have increased the focus on embodied recognition systems and active visual search. These applications demand that systems can learn and adapt to their surroundings, and arrive at decisions in a reasonable amount of time, while maintaining high object recognition performance. This is especially challenging due to the high dimensionality of image data. In cases where end-to-end learning from pixels to output is needed, mechanisms designed to make inputs tractable are often necessary for less computationally capable embodied systems. Active visual search also means that mechanisms for attention and gaze control are integral to the object recognition procedure. Therefore, the way in which attention mechanisms should be introduced into feature extraction and estimation algorithms must be carefully considered when constructing a recognition system. This thesis describes work done on the components necessary for creating an embodied recognition system, specifically in the areas of decision uncertainty estimation, object segmentation from multiple cues, adaptation of stereo vision to a specific platform and setting, problem-specific feature selection, efficient estimator training and attentional modulation in convolutional neural networks. Contributions include the evaluation of methods and measures for predicting the potential uncertainty reduction that can be obtained from additional views of an object, allowing for adaptive target observations. Also, in order to separate a specific object from other parts of a scene, it is often necessary to combine multiple cues such as colour and depth in order to obtain satisfactory results. Therefore, a method for combining these using channel coding has been evaluated. In order to make use of three-dimensional spatial structure in recognition, a novel stereo vision algorithm extension along with a framework for automatic stereo tuning have also been investigated. Feature selection and efficient discriminant sampling for decision tree-based estimators have also been implemented. Finally, attentional multi-layer modulation of convolutional neural networks for recognition in cluttered scenes has been evaluated. Several of these components have been tested and evaluated on a purpose-built embodied recognition platform known as Eddie the Embodied. / Embodied Visual Object Recognition / FaceTrack
287

Semi-supervised and Self-evolving Learning Algorithms with Application to Anomaly Detection in Cloud Computing

Pannu, Husanbir Singh 12 1900 (has links)
Semi-supervised learning (SSL) is the most practical approach to classification among machine learning algorithms. It is similar to the human way of learning and thus has great applications in text/image classification, bioinformatics, artificial intelligence, robotics, etc. Labeled data is hard to obtain in real-life experiments and may require human experts with experimental equipment to mark the labels, which can be slow and expensive. But unlabeled data is easily available in the form of web pages, data logs, images, audio, video files and DNA/RNA sequences. SSL uses large amounts of unlabeled data and few labeled examples to build better classifying functions, which achieves higher accuracy and requires less human effort. Thus it is of great empirical and theoretical interest. We contribute two SSL algorithms: (i) adaptive anomaly detection (AAD) and (ii) hybrid anomaly detection (HAD), which are self-evolving and very efficient at detecting anomalies in large-scale and complex data distributions. Our algorithms are capable of modifying an existing classifier by both retiring old data and adding new data. This characteristic enables the proposed algorithms to handle massive and streaming datasets where other existing algorithms fail and run out of memory. As an application to semi-supervised anomaly detection, and for experimental illustration, we have implemented a prototype of the AAD and HAD systems and conducted experiments in an on-campus cloud computing environment. Experimental results show that the detection accuracy of both algorithms improves as they evolve, and they can achieve 92.1% detection sensitivity and 83.8% detection specificity, which makes them well suited for anomaly detection in large and streaming datasets. We compared our algorithms with two popular SSL methods: (i) subspace regularization and (ii) an ensemble of Bayesian sub-models and decision tree classifiers. Our contributed algorithms are easy to implement and significantly better in terms of space, time complexity and accuracy than these two methods for semi-supervised anomaly detection.
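
The following is not the AAD or HAD algorithm itself, but a small sketch of the self-evolving behaviour the abstract describes: a detector that retires old samples and absorbs new ones as data streams in. The sliding-window size and the use of a One-Class SVM are assumptions for illustration only.

```python
# Sketch of a self-evolving anomaly detector: a bounded window retires old
# samples automatically, and the model is refit as new samples arrive.
from collections import deque
import numpy as np
from sklearn.svm import OneClassSVM

class EvolvingAnomalyDetector:
    def __init__(self, window=5000, nu=0.05):
        self.buffer = deque(maxlen=window)   # old data is retired when the window fills
        self.model = OneClassSVM(nu=nu, kernel="rbf", gamma="scale")

    def update(self, new_samples):
        """Add new (mostly unlabeled) samples and refit on the current window."""
        self.buffer.extend(new_samples)      # each sample is a feature vector
        self.model.fit(np.asarray(self.buffer))

    def is_anomalous(self, x):
        return self.model.predict(np.asarray(x).reshape(1, -1))[0] == -1
```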
288

Machine learning in systems biology at different scales : from molecular biology to ecology

Aderhold, Andrej January 2015 (has links)
Machine learning has been a source of continuous methodological advances in the field of computational learning from data. Systems biology has profited in various ways from machine learning techniques, but in particular from network inference, i.e. the learning of interactions given observed quantities of the involved components or data that stem from interventional experiments. Originally this domain of systems biology was confined to the inference of gene regulation networks, but it has recently expanded to other levels of organization of biological and ecological systems. The application to species interaction networks in a varying environment is of especially mounting importance for improving our understanding of the dynamics of species extinctions, invasions, and population behaviour in general. The aim of this thesis is to present an extensive study of various state-of-the-art machine learning techniques applied to a genetic regulation system in plants, and to expand and modify some of these methods to infer species interaction networks in an ecological setting. The first study attempts to improve our knowledge of circadian regulation in the plant Arabidopsis thaliana from the viewpoint of machine learning, and gives suggestions on which methods are best suited for inference, how the data should be processed and modelled mathematically, and what quality of network learning can be expected by doing so. To achieve this, I generate a rich and realistic synthetic data set that is used for various studies under consideration of different effects and method setups. The best method and setup is applied to real transcriptional data, which leads to a new hypothesis about the circadian clock network structure. The ecological study is focused on the development of two novel inference methods that exploit a common principle from transcriptional time series, namely that expression profiles over time can be temporally heterogeneous. A corresponding concept in a two-dimensional spatial domain is that species interaction dynamics can be spatially heterogeneous, i.e. can change in space depending on the environment and other factors. I demonstrate the expansion from the one-dimensional time domain to the two-dimensional spatial domain, introduce two distinct space segmentation schemes, and consider species dispersion effects with spatial autocorrelation. The two novel methods display a significant improvement in species interaction inference compared to competing methods and show high confidence in learning the spatial structure of different species neighbourhoods or environments.
289

Autotuning wavefront patterns for heterogeneous architectures

Mohanty, Siddharth January 2015 (has links)
Manual tuning of applications for heterogeneous parallel systems is tedious and complex. Optimizations are often not portable, and the whole process must be repeated when moving to a new system, or sometimes even to a different problem size. Pattern based parallel programming models were originally designed to provide programmers with an abstract layer, hiding tedious parallel boilerplate code, and allowing a focus on only application specific issues. However, the constrained algorithmic model associated with each pattern also enables the creation of pattern-specific optimization strategies. These can capture more complex variations than would be accessible by analysis of equivalent unstructured source code. These variations create complex optimization spaces. Machine learning offers well established techniques for exploring such spaces. In this thesis we use machine learning to create autotuning strategies for heterogeneous parallel implementations of applications which follow the wavefront pattern. In a wavefront, computation starts from one corner of the problem grid and proceeds diagonally like a wave to the opposite corner in either two or three dimensions. Our framework partitions and optimizes the work created by these applications across systems comprising multicore CPUs and multiple GPU accelerators. The tuning opportunities for a wavefront include controlling the amount of computation to be offloaded onto GPU accelerators, choosing the number of CPU and GPU threads to process tasks, tiling for both CPU and GPU memory structures, and trading redundant halo computation against communication for multiple GPUs. Our exhaustive search of the problem space shows that these parameters are very sensitive to the combination of architecture, wavefront instance and problem size. We design and investigate a family of autotuning strategies, targeting single and multiple CPU + GPU systems, and both two and three dimensional wavefront instances. These yield an average of 87% of the performance found by offline exhaustive search, with up to 99% in some cases.
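
As context for the tuning problem, a plain sequential version of the two-dimensional wavefront pattern looks like the sketch below: cells on the same anti-diagonal are mutually independent, which is exactly the parallelism the autotuner partitions across CPUs and GPU accelerators. The recurrence used here is illustrative, not an application from the thesis.

```python
# Sequential 2D wavefront: each cell depends on its upper and left neighbours,
# so the grid is swept anti-diagonal by anti-diagonal from one corner.
import numpy as np

def wavefront_2d(rows, cols, cost):
    """cost: (rows, cols) array; returns cumulative wavefront values."""
    grid = np.zeros((rows, cols))
    for d in range(rows + cols - 1):                  # sweep anti-diagonals
        for i in range(max(0, d - cols + 1), min(rows, d + 1)):
            j = d - i                                 # all cells with i + j == d are independent
            up   = grid[i - 1, j] if i > 0 else 0.0
            left = grid[i, j - 1] if j > 0 else 0.0
            grid[i, j] = cost[i, j] + min(up, left)
    return grid
```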
290

Exploiting Application Characteristics for Efficient System Support of Data-Parallel Machine Learning

Cui, Henggang 01 May 2017 (has links)
Large scale machine learning has many characteristics that can be exploited in the system designs to improve its efficiency. This dissertation demonstrates that the characteristics of the ML computations can be exploited in the design and implementation of parameter server systems, to greatly improve the efficiency by an order of magnitude or more. We support this thesis statement with three case study systems, IterStore, GeePS, and MLtuner. IterStore is an optimized parameter server system design that exploits the repeated data access pattern characteristic of ML computations. The designed optimizations allow IterStore to reduce the total run time of our ML benchmarks by up to 50×. GeePS is a parameter server that is specialized for deep learning on distributed GPUs. By exploiting the layer-by-layer data access and computation pattern of deep learning, GeePS provides almost linear scalability from single-machine baselines (13× more training throughput with 16 machines), and also supports neural networks that do not fit in GPU memory. MLtuner is a system for automatically tuning the training tunables of ML tasks. It exploits the characteristic that the best tunable settings can often be decided quickly with just a short trial time. By making use of optimization-guided online trial-and-error, MLtuner can robustly find and re-tune tunable settings for a variety of machine learning applications, including image classification, video classification, and matrix factorization, and is over an order of magnitude faster than traditional hyperparameter tuning approaches.
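
A much-simplified sketch of MLtuner's short-trial idea follows, assuming a generic train_step callback: each candidate tunable setting is run for a brief trial and the setting with the best trial loss wins. The trial budget and candidate list are placeholders, and MLtuner's actual optimization-guided search is considerably more sophisticated than this exhaustive loop.

```python
# Pick a tunable setting from short trials instead of full training runs.
def pick_tunable(train_step, candidates, trial_steps=100):
    """train_step(setting, step) -> loss; returns the best-performing setting."""
    best_setting, best_loss = None, float("inf")
    for setting in candidates:
        loss = None
        for step in range(trial_steps):       # short trial per candidate setting
            loss = train_step(setting, step)
        if loss is not None and loss < best_loss:
            best_setting, best_loss = setting, loss
    return best_setting

# Example: choose a learning rate with a hypothetical SGD step function.
# best_lr = pick_tunable(run_sgd_step, candidates=[0.1, 0.01, 0.001])
```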
