Global ETD Search

541	Extension of Machine Learning Model for Dynamic Risk Analysis Seifert, Björn January 2021 In this study, a model was developed for predicting the next week's alarm codes from the past week's alarm codes. The model used alarm data from the location and its surroundings. It was tuned using hyperparameter optimization, which resulted in a model that performed better than previous models used on this data set. The effect of adding weather data was then evaluated: it improved performance for some alarm codes without compromising performance for the majority of the others, giving an overall improvement. The weather data consisted of temperature, precipitation, cloud coverage, air pressure, and wind direction and speed. Two labeling methods were trialed for the weather data. The first used, for each type of data, the closest weather station. The second used the ten closest weather stations within 100 km. The final model, using weather data labeled with the second method, had a micro-averaged precision of 0.90, a micro-averaged recall of 0.86, a macro-averaged precision of 0.80 and a macro-averaged recall of 0.77. Machine Learning Engineering and Technology Teknik och teknologier
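As a reading aid for the micro and macro averages quoted in entry 541, the following is a minimal sketch of how the two kinds of average differ for multi-label predictions. It assumes each week is labeled with a set of alarm codes; the function name `micro_macro_scores` and the toy codes "A" and "B" are hypothetical and do not come from the thesis.

```python
def micro_macro_scores(true_sets, pred_sets):
    """Micro- and macro-averaged precision and recall for multi-label data.

    Each element of true_sets / pred_sets is the set of labels for one sample.
    """
    labels = sorted(set().union(*true_sets, *pred_sets))
    tp = {label: 0 for label in labels}
    fp = {label: 0 for label in labels}
    fn = {label: 0 for label in labels}

    for truth, pred in zip(true_sets, pred_sets):
        for label in labels:
            if label in pred and label in truth:
                tp[label] += 1
            elif label in pred:
                fp[label] += 1
            elif label in truth:
                fn[label] += 1

    def ratio(num, den):
        # Convention assumed here: an undefined ratio counts as 0.0.
        return num / den if den else 0.0

    total_tp, total_fp, total_fn = sum(tp.values()), sum(fp.values()), sum(fn.values())
    return {
        # Micro: pool the counts over all labels, then take one ratio.
        "micro_precision": ratio(total_tp, total_tp + total_fp),
        "micro_recall": ratio(total_tp, total_tp + total_fn),
        # Macro: take one ratio per label, then average with equal weight.
        "macro_precision": sum(ratio(tp[l], tp[l] + fp[l]) for l in labels) / len(labels),
        "macro_recall": sum(ratio(tp[l], tp[l] + fn[l]) for l in labels) / len(labels),
    }


# Toy data: code "B" is rare and never predicted.
truth = [{"A"}, {"A"}, {"A", "B"}, {"B"}]
preds = [{"A"}, {"A"}, {"A"}, {"A"}]
scores = micro_macro_scores(truth, preds)
```

Because the macro average weights every code equally, a rare code that is predicted poorly pulls it below the micro average, which is one plausible reading of the gap between the 0.90/0.86 and 0.80/0.77 figures above.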
542	Machine Learning Identification of Protein Properties Useful for Specific Applications Khamis, Abdullah M. 31 March 2016 Proteins play critical roles in the cellular processes of living organisms. It is therefore important to identify and characterize the key properties associated with their functions. Correlating a protein's structural and sequence properties, and the physicochemical properties of its amino acids (aa), with its functions could identify some of the critical factors governing specific functionality. We point out that not all functions of even well-studied proteins are known. This, together with the huge increase in the number of newly discovered and predicted proteins, makes it challenging to experimentally characterize the whole spectrum of possible functions for all proteins of interest. Consequently, the use of computational methods has become more attractive. Here we address two questions. The first is how to use protein aa sequence and physicochemical properties to characterize a family of proteins. The second is how to use the domains of transcription factor (TF) proteins to enhance the accuracy of predicting TF DNA binding sites (TFBSs). To address the first question, we developed a novel method that uses a computational representation of proteins based on characteristics of different protein regions (N-terminal, M-region and C-terminal), combined with the properties of protein aa sequences. We show that this description provides important biological insight into the characterization of protein functional groups. Using feature selection techniques, we identified key properties of proteins that allow for very accurate characterization of different protein families. We demonstrated the efficiency of our method by applying it to a number of antimicrobial peptide families.
To address the second question we developed another novel method that uses a combination of aa properties of DNA binding domains of TFs and their TFBS properties to develop machine learning models for predicting TFBSs. Feature selection is used to identify the most relevant characteristics of the aa for such modeling. In addition to reducing the number of required models to only 14 for several hundred TFs, the final prediction accuracy of our models appears dramatically better than with other methods. Overall, we show how to efficiently utilize properties of proteins in deriving more accurate solutions for two important problems of computational biology and bioinformatics. Machine Learning feature selection protein properties Bioinformatics
543	Evaluating and enhancing the security of cyber physical systems using machine learning approaches Sharma, Mridula 08 April 2020 The main aim of this dissertation is to address the security issues of the physical layer of Cyber Physical Systems. Network security is first assessed using a 5-level Network Security Evaluation Scheme (NSES). It is then enhanced using a novel Intrusion Detection System (IDS) designed using Supervised Machine Learning. Defined as a complete architecture, this framework includes a complete packet analysis of the radio traffic of the Routing Protocol for Low-Power and Lossy Networks (RPL). A dataset of 300 different simulations of an RPL network is defined for normal traffic, hello flood attack, DIS attack, increased version attack and decreased rank attack. The IDS is a multi-model detector that efficiently detects known as well as new attacks. The model is analyzed with cross-validation as well as with new data from a similar network. It achieved 99% accuracy on the known attacks and 85% accuracy on the new attack. / Graduate CPS Supervised Machine Learning RPL Feature Selection
544	Automation of Medical Underwriting by Appliance of Machine Learning / AUTOMATISERING AV FÖRSÄKRINGSMEDICINSK UTREDNING GENOM TILLÄMPNING AV MASKININLÄRNING Rosén, Henrik January 2020 One of the most important fields regarding growth and development for most organizations today is digitalization, or digital transformation. The offering of technological solutions to enhance existing, or create new, processes or products is emerging. It is therefore of great importance that organizations continuously assess the potential of applying new technical solutions to their existing processes. For example, a well-implemented AI solution for automating an existing process is likely to contribute considerable business value. Medical underwriting for individual insurance, which is the process considered in this project, is all about risk assessment based on the individual's medical record. Such a task appears well suited for automation by a machine learning based application, which would thereby contribute substantial business value. However, to properly replace a manual decision-making process, no important information may be excluded, which is rather challenging because a considerable fraction of the information in the medical records consists of unstructured text data. In addition, the underwriting process is extremely sensitive to mistakes in which insurance is needlessly approved in cases where an enhanced risk of future claims can be assessed. Three algorithms, Logistic Regression, XGBoost and a Deep Learning model, were evaluated on training data consisting of the medical records' structured data from categorical and numerical answers, the text data as TF-IDF observation vectors, and a combination of both subsets of features.
XGBoost was the best-performing classifier according to the key metric, a pAUC over an FPR from 0 to 0.03. There is no question about the substantial importance of not disregarding any type of information from the medical records when developing machine learning classifiers to predict medical underwriting outcomes. Under a very risk-conservative and performance-pessimistic approach, the best-performing classifier managed, when only the group of the youngest children (50% of the sample) was considered, to recall close to 50% of all standard-risk applications at a false positive rate of 2%, when both structured and text data were used. Even though the structured data accounts for most of the explanatory ability, it becomes clear that including the text data as TF-IDF observation vectors makes the difference needed to potentially generate a positive net present value for an implementation of the model. Mathematics Matematik
545	Caracterización de música según emociones y complejidad, utilizando RNN-LSTM y teoría de la información, para analizar sus efectos sobre la empatía hacia el dolor Peña Peña, Leonardo Ismael January 2019 Thesis submitted for the degree of Electrical Civil Engineer (Ingeniero Civil Eléctrico) / Empathy in humanity is a fundamental element in building a just society. Empathy can in turn be modulated by different factors, such as an individual's emotional state. Music, as a trigger of emotions in humans, is therefore able to modulate empathy. At the same time, it is postulated that the complexity of music, together with an individual's capacity to perceive its different degrees of complexity, could also modulate the brain's empathic response. This work proposes different measures for evaluating the emotion and the complexity of certain musical pieces. This information, together with the design of an experiment that uses it, is made available to psychological research on the effect of music on people's empathic response. Regarding emotions, an approach is presented that uses machine learning, specifically an RNN-LSTM, to predict the emotions that music evokes in a subject while listening. That work obtained 0.8 for the average of the test errors.
Regarding complexity, different information measures, such as first-order entropy, conditional entropy and normalized entropy, are applied to different repertoires of classical music from the 17th and 18th centuries. A qualitative analysis then evaluates which measure, applied to which aspect of each repertoire's scores, best represents complexity in music. The result is conditional entropy, which ranks Bach's "The Well-Tempered Clavier" as the most complex repertoire and "the barbershop quartets" as the least complex. In the proposed experiment, these features are used and an EEG is recorded while subjects listen to the characterized music and view images with and without pain content, in addition to answering questionnaires related to empathy and music. With this information, the aim is to verify whether any correlation exists between the features extracted from the music and the empathic response to pain. In summary, this work fundamentally seeks to contribute engineering tools to research on how music affects people's empathic response. Information theory Music Emotions Machine learning
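The three information measures named in entry 545 can be illustrated on a toy symbol sequence. The sketch below uses the textbook definitions, with normalized entropy taken as first-order entropy divided by the logarithm of the alphabet size. The thesis may define or apply the measures differently, and the note sequences are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(symbols):
    """First-order Shannon entropy, in bits, of a sequence of hashable symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def conditional_entropy(symbols):
    """H(X_t | X_{t-1}) estimated as H(X_{t-1}, X_t) - H(X_{t-1})."""
    pairs = list(zip(symbols, symbols[1:]))
    return entropy(pairs) - entropy(symbols[:-1])


def normalized_entropy(symbols):
    """First-order entropy divided by its maximum, log2 of the alphabet size."""
    alphabet_size = len(set(symbols))
    return entropy(symbols) / log2(alphabet_size) if alphabet_size > 1 else 0.0


# A strictly alternating melody: each note fully determines the next one.
alternating = list("CGCGCGCG")
h1 = entropy(alternating)
h_cond = conditional_entropy(alternating)
h_norm = normalized_entropy(list("CCGG"))
```

In this toy case the first-order entropy is high because both notes are equally frequent, while the conditional entropy is zero because the sequence is perfectly predictable from the previous note. That contrast shows how the two measures can rank the same material differently.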
546	MixUp as Directional Adversarial Training: A Unifying Understanding of MixUp and Adversarial Training Perrault Archambault, Guillaume 29 April 2020 This thesis aims to contribute to the field of neural networks by improving upon the performance of a state-of-the-art regularization scheme called MixUp, and by contributing to the conceptual understanding of MixUp. MixUp is a data augmentation scheme in which pairs of training samples and their corresponding labels are mixed using linear coefficients. Without label mixing, MixUp becomes a more conventional scheme: input samples are moved but their original labels are retained. Because samples are preferentially moved in the direction of other classes, we refer to this method as directional adversarial training, or DAT. We show that under two mild conditions, MixUp asymptotically converges to a subset of DAT. We define untied MixUp (UMixUp), a superset of MixUp wherein training labels are mixed with different linear coefficients from those of their corresponding samples. We show that under the same mild conditions, untied MixUp converges to the entire class of DAT schemes. Motivated by the understanding that UMixUp is both a generalization of MixUp and a scheme possessing adversarial-training properties, we experiment with different datasets and loss functions to show that UMixUp provides improved performance over MixUp. In short, we present a novel interpretation of MixUp as belonging to a class highly analogous to adversarial training, and on this basis we introduce a simple generalization which outperforms MixUp. Adversarial training MixUp Neural Networks Machine Learning
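The three schemes described in entry 546 differ only in how the labels are combined, which a few lines of code can make concrete. This is a minimal sketch of the definitions as worded in the abstract, not the thesis's implementation. The function names, the plain-list representation and the fixed coefficients are assumptions (in the original MixUp formulation the coefficient is drawn from a Beta distribution).

```python
def _mix(a, b, lam):
    """Convex combination lam * a + (1 - lam) * b, element by element."""
    return [lam * u + (1.0 - lam) * v for u, v in zip(a, b)]


def mixup(x1, y1, x2, y2, lam):
    """MixUp: inputs and one-hot labels share the same coefficient."""
    return _mix(x1, x2, lam), _mix(y1, y2, lam)


def untied_mixup(x1, y1, x2, y2, lam_x, lam_y):
    """Untied MixUp: the labels get their own coefficient lam_y."""
    return _mix(x1, x2, lam_x), _mix(y1, y2, lam_y)


def directional_adversarial(x1, y1, x2, lam):
    """No label mixing: the input moves toward x2, the label of x1 is kept."""
    return _mix(x1, x2, lam), list(y1)


# Two toy samples from different classes (hypothetical values).
xa, ya = [0.0, 0.0], [1.0, 0.0]
xb, yb = [2.0, 4.0], [0.0, 1.0]

mixed_x, mixed_y = mixup(xa, ya, xb, yb, lam=0.75)
dat_x, dat_y = directional_adversarial(xa, ya, xb, lam=0.75)
```

In this sketch, calling `untied_mixup` with `lam_y = 1.0` returns the same pair as `directional_adversarial`, and calling it with `lam_y = lam_x` returns the same pair as `mixup`. The convergence results stated in the abstract concern training behavior and are not demonstrated by this snippet.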
547	Prediction of disease spread phenomena in large dynamic topology with application to malware detection in ad hoc networks Nadra M Guizani (8848631) 18 May 2020 Data-driven prediction techniques are applied in a broad range of applications such as bioinformatics, disease spread, and mobile intrusion detection, to name a few. With the rapid emergence of online technologies, numerous techniques for collecting and storing data for prediction-based analysis have been proposed in the literature. With the growing size of the global population, the spread of epidemics is increasing at an alarming rate. Consequently, public and private health care officials are in dire need of technological solutions for managing epidemics. Most existing syndromic surveillance and disease detection systems deal with a small portion of a real dataset. From the communication network perspective, the results reported in the literature generally deal with commonly known network topologies. The scalability of a disease detection system is a real challenge when it comes to modeling and predicting disease spread across a large population or large-scale networks. In this dissertation, we address this challenge by proposing a hierarchical aggregation approach that classifies dynamic disease spread phenomena at different scalability levels. Specifically, we present a finite state model (SEIR-FSM) for predicting disease spread. The model operates at three different levels of data aggregation and accordingly predicts disease spread at various scales. We present experimental results of this model for different disease spread behaviors at all levels of granularity. Subsequently, we present a mechanism for mapping the population interaction network model to a wireless mobile network topology. The objective is to analyze the phenomena of malware spread based on vulnerabilities.
The goal is to develop and evaluate a wireless mobile intrusion detection system that uses a Hidden Markov model in connection with the FSM disease spread model (HMM-FSM). Subsequently, we propose a software-based architecture that acts as a network function virtualization (NFV) to combat malware spread in IoT-based networks, taking advantage of the NFV infrastructure's potential to provide new security solutions for IoT environments. We propose a scalable and generalized IDS that uses a Recurrent Neural Network Long Short Term Memory (RNN-LSTM) learning model to predict malware attacks in a timely manner, so that the NFV can deploy the appropriate countermeasures. The analysis uses the susceptible (S), exposed (E), infected (I), and resistant (R) (SEIR) model to capture the dynamics of the spread of the malware attack and subsequently provide a patching mechanism for the network. Our analysis focuses primarily on the feasibility and the performance evaluation of the proposed NFV RNN-LSTM model. Computer Engineering machine learning-based decentralized networks
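For readers unfamiliar with the SEIR compartments named in entry 547, the following is a minimal sketch of the generic textbook SEIR dynamics, advanced with a forward-Euler step. It is not the dissertation's finite state model (SEIR-FSM) or its HMM and RNN-LSTM components. The parameter names `beta`, `sigma` and `gamma` and all numeric values are illustrative assumptions.

```python
def seir_step(s, e, i, r, beta, sigma, gamma, dt=1.0):
    """Advance the S, E, I, R compartments by one forward-Euler step.

    beta: contact (infection) rate, sigma: rate at which exposed nodes become
    infected, gamma: rate at which infected nodes become resistant.
    """
    n = s + e + i + r
    newly_exposed = beta * s * i / n * dt
    newly_infected = sigma * e * dt
    newly_resistant = gamma * i * dt
    return (
        s - newly_exposed,
        e + newly_exposed - newly_infected,
        i + newly_infected - newly_resistant,
        r + newly_resistant,
    )


def simulate(state, steps, **params):
    """Return the list of states visited, starting from the initial state."""
    history = [state]
    for _ in range(steps):
        state = seir_step(*state, **params)
        history.append(state)
    return history


# Hypothetical network of 1000 nodes, 10 of them initially infected.
history = simulate((990.0, 0.0, 10.0, 0.0), steps=50, beta=0.5, sigma=0.2, gamma=0.1)
final_s, final_e, final_i, final_r = history[-1]
```

Each step only moves nodes between compartments, so the total population stays constant. In the malware setting described above, the R compartment would correspond to patched nodes.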
548	Community Detection in Social Networks: Multilayer Networks and Pairwise Covariates Huang, Sihan January 2020 Community detection is one of the most fundamental problems in network study. The stochastic block model (SBM) is arguably the most studied model for network data, with various estimation methods developed and their community detection consistency results established. Due to its stringent assumptions, SBM may not be suitable for many real-world problems. In this thesis, we present two approaches that incorporate extra information compared with the vanilla SBM to improve community detection performance and suit practical applications. One approach is to stack multilayer networks that are composed of multiple single-layer networks with a common community structure. Numerous methods have been proposed based on spectral clustering, but most rely on optimizing an objective function while the associated theoretical properties remain largely unexplored. We focus on the 'early fusion' method, whose target is to minimize the spectral clustering error of the weighted adjacency matrix (WAM). We derive the optimal weights by studying the asymptotic behavior of the eigenvalues and eigenvectors of the WAM. We show that the eigenvector of the WAM converges to a normal distribution, and that the clustering error is monotonically decreasing with the eigenvalue gap. This fact reveals the intrinsic link between eigenvalues and eigenvectors, and thus the algorithm minimizes the clustering error by maximizing the eigenvalue gap. The numerical study shows that our algorithm significantly outperforms other state-of-the-art methods, especially when the signal-to-noise ratios of the layers vary widely. Our algorithm also yields higher accuracy on the S&P 1500 stocks dataset than competing models.
The other approach we propose is to consider heterogeneous connection probabilities to remove the strong assumption that all nodes in the same community are stochastically equivalent, which may not be suitable for practical applications. We introduce a pairwise covariates-adjusted stochastic block model (PCABM), a generalization of SBM that incorporates pairwise covariates information. We study the maximum likelihood estimates of the coefficients for the covariates as well as the community assignments. It is shown that both the coefficient estimates of the covariates and the community assignments are consistent under suitable sparsity conditions. Spectral clustering with adjustment (SCWA) is introduced to fit PCABM efficiently. Under certain conditions, we derive the error bound of community estimation under SCWA and show that it is community detection consistent. PCABM compares favorably with the SBM or degree-corrected stochastic block model under a wide range of simulated and real networks when covariate information is accessible. Statistics Machine learning Social networks Algorithms--Evaluation
549	Churn Prediction Åkermark, Alexander, Hallefält, Mattias January 2019 Churn analysis is an important tool for companies, as it can reduce the costs related to customer churn. Churn prediction is the process of identifying users before they churn. This is done by applying methods to collected data in order to find patterns that can help predict new churners in the future. The objective of this report is to identify churners using surveys collected from different golf clubs, their members and guests. This was accomplished by testing several different supervised machine learning algorithms in order to find the different classes and to see which supervised algorithms are most suitable for this kind of data. The margin of success was to have a greater accuracy than the percentage of the majority class in the dataset. The data was processed using label encoding, one-hot encoding and principal component analysis, and was split into 10 folds, 9 training folds and 1 testing fold, ensuring cross-validation when iterated 10 times with the test and training folds rearranged. Each algorithm processed the training data to create a classifier, which was tested on the test data. The classifiers used for the project were K nearest neighbours, support vector machine, multi-layer perceptron, decision trees and random forest. The different classifiers generally had an accuracy of around 72%, and the best classifier, random forest, had an accuracy of 75%. All the classifiers had an accuracy above the margin of success. K-folding, confusion matrices, classification reports and other internal cross-validation techniques were applied to the data to ensure the quality of the classifier. The project was a success, although there is a strong belief that the bottleneck for the project was the quality of the data, in that new legislation on collecting and storing data results in redundant and faulty data.
/ Churn analysis is an important tool for companies since it can reduce the costs related to customer churn. Churn forecasting is the process of identifying users before they churn; this is done by applying methods to collected data to find patterns that are helpful when forecasting future users. The objective of this report is to identify churners using surveys collected from golf clubs and their customers and guests. This is achieved by testing several different supervised machine learning algorithms to compare which algorithm fits best. The margin for success was to have a higher accuracy than the percentage of the dominant class in the dataset. The data was processed with label encoding, one-hot encoding and principal component analysis and was divided into 10 parts, 9 training parts and 1 test part, to ensure cross-validation. Each algorithm processed the training data to create a classifier, which was then tested on the test data. The classifiers used for the project include K nearest neighbours, support vector machine, multi-layer perceptron, decision trees and random forest. The different classifiers had a general accuracy of around 72%, the best being random forest with an accuracy of 75%. All classifiers had an accuracy above the margin that was set. K-folding, confusion matrices, classification reports and other internal cross-validation techniques were used to ensure the quality of the classifier. The project was successful, but there is a suspicion that the bottleneck for the project lay in the quality of the data, with regard to the conditions of new legislation on the collection and storage of data, which lead to redundant and incorrect information. Churn Prediction Machine Learning Computer Systems Datorsystem
550	Stress-Aware Personalized Road Navigation System Mandorah, Obai 16 December 2019 Driving can be a stressful task, especially under congested conditions. Several studies have shown a positive correlation between stress and aggressive behaviour behind the wheel, leading to accidents. One common way to minimize stress while driving is to avoid highly congested roads. However, not all drivers respond in the same way to heavy traffic or other road conditions. For instance, some drivers may prefer congested routes to longer ones to minimize travel time. Increasingly, drivers are employing Advanced Traveller Information Systems while commuting to both familiar and unfamiliar destinations, not just to obtain information on how to reach a certain endpoint, but to acquire real-time data on the state of the roads and avoid undesired traffic conditions. In this thesis, we propose an Advanced Traveller Information System that personalizes the driver's route using their road preferences and measures their physiological signals during the trip to assess mental stress. The system then links road attributes, such as number of lanes, speed limit, and traffic severity, with the driver's stress levels, and uses machine learning to predict their stress levels on similar roads. Routes that contribute to high levels of stress can therefore be avoided in future trips. The average accuracy of the proposed stress level prediction model is 76.11%. Routing Stress Machine Learning Heart Rate Variability

Search results