  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
201

Computational Analysis of Metabolomic Toxicological Data Derived from NMR Spectroscopy

Kelly, Benjamin J. 26 May 2009 (has links)
No description available.
202

Sparse Multinomial Logistic Regression via Approximate Message Passing

Byrne, Evan Michael 14 October 2015 (has links)
No description available.
203

Multiple-Instance Feature Ranking

Latham, Andrew C. 26 January 2016 (has links)
No description available.
204

A Genetic Algorithm Approach to Feature Selection for Computer Aided Detection of Lung Nodules

Sprague, Matthew J. January 2016 (has links)
No description available.
205

Applied Machine Learning: A case study in machine learning in the paper industry

Sjögren, Anton, Quan, Baiwei January 2022 (has links)
With the rapid advancement of hardware and software technologies, machine learning has been pushed to the forefront of business value generating technologies. More and more businesses start to invest in machine learning to keep up with those that have already benefited from it. A local paper processing business is looking to improve upon the estimation of each order's runtime on the machines by leveraging the machine learning technologies. Traditionally, the predictions are done by experienced planners, but the actual runtimes do not always match the predictions. This thesis conducted an investigation about whether a machine learning model could be built to produce better estimations on behalf of the local business. By following a well-defined machine learning workflow in combination with Microsoft's AutoML model builder and data processing techniques, the result shows that predictions made by the machine learning model are able to perform better than the human made ones within an accepted margin.
206

Graph theory applications in the energy sector: From the perspective of electric utility companies

Espinosa, Kristofer, Vu, Tam January 2020 (has links)
Graph theory is a mathematical study of objects and their pairwise relations, also known as nodes and edges. The birth of graph theory is often considered to take place in 1736 when Leonhard Euler tried to solve a problem involving seven bridges of Königsberg in Prussia. In more recent times, graphs has caught the attention of companies from many industries due to its power of modelling and analysing large networks. This thesis investigates the usage of graph theory in the energy sector for a utility company, in particular Fortum whose activities consist of, but not limited to, production and distribution of electricity and heat. The output of the thesis is a wide overview of graph-theoretic concepts and their applications, as well as an evaluation of energy-related use-cases where some concepts are put into deeper analysis. The chosen use-case within the scope of this thesis is feature selection for electricity price forecasting. Feature selection is a process for reducing the number of features, also known as input variables, typically before a regression model is built to avoid overfitting and to increase model interpretability. Five graph-based feature selection methods with different points of view are studied. Experiments are conducted on realistic data sets with many features to verify the legitimacy of the methods. One of the data sets is owned by Fortum and used for forecasting the electricity price, among other important quantities. The obtained results look promising according to several evaluation metrics and can be used by Fortum as a support tool to develop prediction models. In general, a utility company can likely take advantage graph theory in many ways and add value to their business with enriched mathematical knowledge. / Grafteori är ett matematiskt område där objekt och deras parvisa relationer, även kända som noder respektive kanter, studeras. 
Grafteorins födsel anses ofta ha ägt rum år 1736 när Leonhard Euler försökte lösa ett problem som involverade sju broar i Königsberg i Preussen. På senare tid har grafer fått uppmärksamhet från företag inom flera branscher på grund av dess kraft att modellera och analysera stora nätverk. Detta arbete undersöker användningen av grafteori inom energisektorn för ett allmännyttigt företag, närmare bestämt Fortum, vars verksamhet består av, men inte är begränsad till, produktion och distribution av el och värme. Arbetet resulterar i en bred genomgång av grafteoretiska begrepp och deras tillämpningar inom både allmänna tekniska sammanhang och i synnerhet energisektorn, samt ett fallstudium där några begrepp sätts in i en djupare analys. Den valda fallstudien inom ramen för arbetet är variabelselektering för elprisprognostisering. Variabelselektering är en process för att minska antalet ingångsvariabler, vilket vanligtvis genomförs innan en regressions- modell skapas för att undvika överanpassning och öka modellens tydbarhet. Fem grafbaserade metoder för variabelselektering med olika ståndpunkter studeras. Experiment genomförs på realistiska datamängder med många ingångsvariabler för att verifiera metodernas giltighet. En av datamängderna ägs av Fortum och används för att prognostisera elpriset, bland andra viktiga kvantiteter. De erhållna resultaten ser lovande ut enligt flera utvärderingsmått och kan användas av Fortum som ett stödverktyg för att utveckla prediktionsmodeller. I allmänhet kan ett energiföretag sannolikt dra fördel av grafteori på många sätt och skapa värde i sin affär med hjälp av berikad matematisk kunskap
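One simple member of the graph-based feature selection family the thesis surveys can be sketched as follows: treat features as nodes, connect pairs whose correlation exceeds a threshold, and keep one representative per connected component. The feature names, data, and 0.9 threshold below are illustrative, not taken from Fortum's data set or from the five methods the thesis actually studies:

```python
# Sketch of graph-based feature selection via a feature-correlation graph.
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def select_by_graph(features, threshold=0.9):
    """Keep one feature per connected component of the correlation graph."""
    names = list(features)
    adj = {n: set() for n in names}
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            if abs(pearson(features[u], features[v])) > threshold:
                adj[u].add(v)
                adj[v].add(u)
    kept, seen = [], set()
    for n in names:
        if n in seen:
            continue
        stack = [n]              # explore the whole component of n
        while stack:
            m = stack.pop()
            if m in seen:
                continue
            seen.add(m)
            stack.extend(adj[m])
        kept.append(n)           # one representative per component
    return kept

feats = {
    "price_lag1":        [1, 2, 3, 4, 5],
    "price_lag1_scaled": [2, 4, 6, 8, 10],  # redundant rescaled copy
    "temperature":       [5, 3, 8, 1, 9],
}
selected = select_by_graph(feats)  # one price feature survives, plus temperature
```

Dropping one of each strongly correlated pair is exactly the kind of redundancy reduction that helps a downstream price-forecasting regression avoid overfitting.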
207

Learning from Incomplete High-Dimensional Data

Lou, Qiang January 2013 (has links)
Data sets with irrelevant and redundant features and large fraction of missing values are common in the real life application. Learning such data usually requires some preprocess such as selecting informative features and imputing missing values based on observed data. These processes can provide more accurate and more efficient prediction as well as better understanding of the data distribution. In my dissertation I will describe my work in both of these aspects and also my following up work on feature selection in incomplete dataset without imputing missing values. In the last part of my dissertation, I will present my current work on more challenging situation where high-dimensional data is time-involving. The first two parts of my dissertation consist of my methods that focus on handling such data in a straightforward way: imputing missing values first, and then applying traditional feature selection method to select informative features. We proposed two novel methods, one for imputing missing values and the other one for selecting informative features. We proposed a new method that imputes the missing attributes by exploiting temporal correlation of attributes, correlations among multiple attributes collected at the same time and space, and spatial correlations among attributes from multiple sources. The proposed feature selection method aims to find a minimum subset of the most informative variables for classification/regression by efficiently approximating the Markov Blanket which is a set of variables that can shield a certain variable from the target. I present, in the third part, how to perform feature selection in incomplete high-dimensional data without imputation, since imputation methods only work well when data is missing completely at random, when fraction of missing values is small, or when there is prior knowledge about the data distribution. 
We define the objective function of the uncertainty margin-based feature selection method to maximize each instance's uncertainty margin in its own relevant subspace. In optimization, we take into account the uncertainty of each instance due to the missing values. The experimental results on synthetic and 6 benchmark data sets with few missing values (less than 25%) provide evidence that our method can select the same accurate features as the alternative methods which apply an imputation method first. However, when there is a large fraction of missing values (more than 25%) in data, our feature selection method outperforms the alternatives, which impute missing values first. In the fourth part, I introduce my method handling more challenging situation where the high-dimensional data varies in time. Existing way to handle such data is to flatten temporal data into single static data matrix, and then applying traditional feature selection method. In order to keep the dynamics in the time series data, our method avoid flattening the data in advance. We propose a way to measure the distance between multivariate temporal data from two instances. Based on this distance, we define the new objective function based on the temporal margin of each data instance. A fixed-point gradient descent method is proposed to solve the formulated objective function to learn the optimal feature weights. The experimental results on real temporal microarray data provide evidence that the proposed method can identify more informative features than the alternatives that flatten the temporal data in advance. / Computer and Information Science
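As a toy illustration of the central theme, scoring features on incomplete data without imputation, the snippet below ranks features using only their observed entries. The score (between-class mean gap over pooled spread) is a simple stand-in for the dissertation's uncertainty-margin objective, not its exact formulation, and the data is invented:

```python
# Rank features on incomplete data without imputing: missing entries (None)
# are simply skipped when computing each feature's class-separation score.

def score_feature(values, labels):
    """Higher score = better class separation on the observed entries only."""
    pos = [v for v, y in zip(values, labels) if v is not None and y == 1]
    neg = [v for v, y in zip(values, labels) if v is not None and y == 0]
    gap = abs(sum(pos) / len(pos) - sum(neg) / len(neg))
    spread = (max(pos + neg) - min(pos + neg)) or 1.0  # guard zero spread
    return gap / spread

labels = [1, 1, 1, 0, 0, 0]
features = {
    "informative": [0.9, 0.8, None, 0.1, 0.2, 0.15],  # separates the classes
    "noise":       [0.5, None, 0.4, 0.6, 0.5, None],  # does not
}
ranking = sorted(features, key=lambda f: score_feature(features[f], labels),
                 reverse=True)
```

Note what this avoids: no missing value is ever filled in, so the score cannot be biased by a wrong imputation, which is the failure mode the dissertation observes when more than 25% of the data is missing.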
208

Solar flare prediction using advanced feature extraction, machine learning and feature selection

Ahmed, Omar W., Qahwaji, Rami S.R., Colak, Tufan, Higgins, P.A., Gallagher, P.T., Bloomfield, D.S. 03 1900 (has links)
Yes / Novel machine-learning and feature-selection algorithms have been developed to study: (i) the flare-prediction capability of magnetic feature (MF) properties generated by the recently developed Solar Monitor Active Region Tracker (SMART); and (ii) which of SMART's MF properties are most significantly related to flare occurrence. Spatio-temporal association algorithms are developed to associate MFs with flares from April 1996 to December 2010, in order to differentiate flaring and non-flaring MFs and enable the application of machine-learning and feature-selection algorithms. A machine-learning algorithm is applied to the associated data sets to determine the flare-prediction capability of all 21 SMART MF properties. The prediction performance is assessed using standard forecast verification measures and compared with that of one of the industry's standard machine-learning-based technologies for flare prediction, Automated Solar Activity Prediction (ASAP). The comparison shows that the combination of SMART MFs with machine learning has the potential to achieve more accurate flare prediction than ASAP. Feature-selection algorithms are then applied to determine the MF properties most related to flare occurrence. It is found that a reduced set of 6 MF properties can achieve a similar degree of prediction accuracy to the full set of 21 SMART MF properties.
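The "standard forecast verification measures" mentioned above are computed from a 2x2 contingency table of forecasts against observed flares. A sketch of two widely used measures (the counts below are invented, not taken from the SMART/ASAP comparison):

```python
# Forecast verification from a 2x2 contingency table:
# hits, misses, false alarms, and correct rejections.

def verification(hits, misses, false_alarms, correct_neg):
    pod = hits / (hits + misses)                 # probability of detection
    far = false_alarms / (hits + false_alarms)   # false alarm ratio
    pofd = false_alarms / (false_alarms + correct_neg)
    tss = pod - pofd                             # True Skill Statistic
    return pod, far, tss

# Hypothetical counts over a verification period.
pod, far, tss = verification(hits=40, misses=10, false_alarms=20,
                             correct_neg=130)
```

A TSS of 1 would be a perfect forecast and 0 no better than chance, which is why such skill scores, rather than raw accuracy, are the usual basis for comparing flare-prediction systems.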
209

Unsupervised Learning for Feature Selection: A Proposed Solution for Botnet Detection in 5G Networks

Lefoane, Moemedi, Ghafir, Ibrahim, Kabir, Sohag, Awan, Irfan U. 01 August 2022 (has links)
Yes / The world has seen exponential growth in the deployment of Internet of Things (IoT) devices. In recent years, connected IoT devices have surpassed the number of connected non-IoT devices, and their number continues to grow as they become a critical component of national infrastructure. The characteristics and inherent limitations of IoT devices make them attractive targets for hackers and cyber criminals, and botnet attacks are among the most serious threats on the Internet today. This article proposes pattern-based feature selection methods as part of a machine learning (ML) based botnet detection system. Specifically, two methods are proposed: the first is based on the most dominant pattern feature values and the second on Maximal Frequent Itemset (MFI) mining. The proposed feature selection approach uses Gini Impurity (GI) and an unsupervised clustering method to select the most influential features automatically. The evaluation results show that the proposed methods improve the performance of the detection system: the best-performing models achieve a True Positive Rate (TPR) of 100% and a False Positive Rate (FPR) of 0%. In addition, the proposed methods reduce the computational cost of the system, as evidenced by its detection speed.
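A minimal sketch of Gini-Impurity-based feature scoring, since the article uses GI to rank features: for each candidate feature, split the records at the feature's median and measure the impurity decrease of the labels. The median split, flow-feature names, and data below are invented simplifications, not the article's method or data set:

```python
# Score features by Gini Impurity decrease at a median split.

def gini(labels):
    """Gini impurity of a label list: 1 - sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def gi_score(values, labels):
    """Impurity decrease from splitting at the median (higher = more useful)."""
    cut = sorted(values)[len(values) // 2]
    left = [y for v, y in zip(values, labels) if v < cut]
    right = [y for v, y in zip(values, labels) if v >= cut]
    n = len(labels)
    weighted = (len(left) / n) * gini(left) + (len(right) / n) * gini(right)
    return gini(labels) - weighted

labels = [1, 1, 1, 0, 0, 0]           # 1 = botnet flow, 0 = benign (synthetic)
pkts   = [900, 850, 920, 40, 55, 60]  # packet counts: separate the classes well
ttl    = [64, 63, 64, 64, 63, 64]     # TTL values: barely informative
scores = {"pkts": gi_score(pkts, labels), "ttl": gi_score(ttl, labels)}
```

In the article this kind of impurity-based ranking is combined with unsupervised clustering so that the influential features are chosen automatically rather than by a fixed cutoff.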
210

Flight Data Processing Techniques to Identify Unusual Events

Mugtussids, Iossif B. 26 June 2000 (has links)
Modern aircraft are capable of recording hundreds of parameters during flight. This not only facilitates the investigation of an accident or serious incident, but also provides the opportunity to use the recorded data to predict future aircraft behavior. It is believed that, by analyzing the recorded data, one can identify precursors to hazardous behavior and develop procedures to mitigate problems before they actually occur. Because of the enormous amount of data collected during each flight, it becomes necessary to identify the segments of data that contain useful information. The objective is to distinguish between typical data points, which are present in the majority of flights, and unusual data points, which are found in only a few flights. This distinction is achieved using classification procedures. In this dissertation, the application of classification procedures to flight data is investigated. A Bayesian classifier is proposed that tries to identify the flight from which a particular data point came. If that flight is identified with a high level of confidence, it can be concluded that the data point is unusual within the investigated flights. The Bayesian classifier uses the overall and conditional probability density functions together with a priori probabilities to make a decision. Estimating probability density functions is a difficult task in multiple dimensions. Because many of the recorded signals (features) are redundant, highly correlated, or very similar in every flight, feature selection techniques are applied to identify the signals with the most discriminatory power. In the limited amount of data available to this research, twenty-five features were identified as the set exhibiting the best discriminatory power. Additionally, the number of signals is reduced by applying feature generation techniques to similar signals.
To make the approach applicable in practice, when many flights are considered, a very efficient and fast sequential data clustering algorithm is proposed. The order in which the samples are presented to the algorithm is fixed according to the probability density function value. Accuracy and reduction level are controlled using two scalar parameters: a distance threshold value and a maximum compactness factor. / Ph. D.
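The sequential clustering step can be sketched as follows, assuming a plain nearest-centre rule with a distance threshold; the dissertation's maximum-compactness-factor control and its PDF-based sample ordering are omitted here, and the points are invented:

```python
# One-pass sequential clustering: visit samples in a fixed order and attach
# each to the nearest existing cluster centre if it lies within a distance
# threshold, otherwise open a new cluster.
import math

def sequential_cluster(points, threshold):
    centres = []      # running cluster centres (incremental means)
    counts = []       # members per cluster, for the mean updates
    assignments = []
    for p in points:
        best, best_d = None, math.inf
        for i, c in enumerate(centres):
            d = math.dist(p, c)
            if d < best_d:
                best, best_d = i, d
        if best is not None and best_d <= threshold:
            counts[best] += 1
            # incremental mean update of the chosen centre
            centres[best] = tuple(c + (x - c) / counts[best]
                                  for c, x in zip(centres[best], p))
            assignments.append(best)
        else:
            centres.append(p)   # open a new cluster at this sample
            counts.append(1)
            assignments.append(len(centres) - 1)
    return assignments, centres

pts = [(0, 0), (0.1, 0.1), (5, 5), (5.1, 4.9), (0.2, 0.0)]
cluster_ids, centres = sequential_cluster(pts, threshold=1.0)
```

Because every sample is compared only against the current cluster centres, the pass is fast enough to reduce many flights' worth of data before the more expensive density estimation and classification steps.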
