541

Automation of Medical Underwriting by Appliance of Machine Learning / AUTOMATISERING AV FÖRSÄKRINGSMEDICINSK UTREDNING GENOM TILLÄMPNING AV MASKININLÄRNING

Rosén, Henrik January 2020 (has links)
One of the most important fields for growth and development in most organizations today is digitalization, or digital transformation. The offering of technological solutions to enhance existing, or create new, processes or products is emerging. It is therefore of great importance that organizations continuously assess the potential of applying new technical solutions to their existing processes. For example, a well-implemented AI solution for automating an existing process is likely to contribute considerable business value.

Medical underwriting for individual insurance, the process considered in this project, is risk assessment based on the individual's medical record. Such a task appears well suited for automation by a machine-learning-based application, which would thereby contribute substantial business value. However, to properly replace a manual decision-making process, no important information may be excluded, which becomes challenging because a considerable fraction of the information in the medical records consists of unstructured text data. In addition, the underwriting process is extremely sensitive to mistakes, in particular to unnecessarily approving insurance applications where an elevated risk of future claims can be assessed.

Three algorithms, logistic regression, XGBoost, and a deep learning model, were evaluated on training data consisting of the medical records' structured data from categorical and numerical answers, the text data as TF-IDF observation vectors, and a combination of both subsets of features. XGBoost was the best-performing classifier according to the key metric, a pAUC over an FPR from 0 to 0.03. There is no question about the substantial importance of not disregarding any type of information from the medical records when developing machine learning classifiers to predict medical underwriting outcomes.

Under a very risk-conservative and performance-pessimistic approach, the best-performing classifier managed, when considering only the group of youngest children (50% of the sample), to recall close to 50% of all standard-risk applications at a false positive rate of 2%, when both structured and text data were considered. Even though the structured data accounts for most of the explanatory power, it is clear that the inclusion of the text data as TF-IDF observation vectors makes the difference needed for an implementation of the model to potentially generate a positive net present value.
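The key metric above, a partial AUC over the false-positive-rate band [0, 0.03], can be sketched in plain Python. This is a minimal illustrative implementation, not the thesis's code; the helper names and toy data are invented:

```python
def roc_points(y_true, scores):
    """ROC curve as (FPR, TPR) points, sweeping the decision threshold."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    pos = sum(y_true)
    neg = len(y_true) - pos
    pts = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(order):
        j = i
        # group tied scores so they move the curve in a single step
        while j < len(order) and scores[order[j]] == scores[order[i]]:
            tp += y_true[order[j]]
            fp += 1 - y_true[order[j]]
            j += 1
        pts.append((fp / neg, tp / pos))
        i = j
    return pts

def partial_auc(y_true, scores, max_fpr=0.03):
    """Trapezoidal area under the ROC curve on [0, max_fpr], normalized so 1.0 is perfect."""
    area = 0.0
    pts = roc_points(y_true, scores)
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        if x0 >= max_fpr:
            break
        if x1 > max_fpr:  # cut the last segment at the FPR cap
            y1 = y0 + (y1 - y0) * (max_fpr - x0) / (x1 - x0)
            x1 = max_fpr
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area / max_fpr
```

Restricting the area to a narrow FPR band is what encodes the risk-conservative stance: only classifier behavior at very low false-positive rates counts toward the score.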
542

Caracterización de música según emociones y complejidad, utilizando RNN-LSTM y teoría de la información, para analizar sus efectos sobre la empatía hacia el dolor

Peña Peña, Leonardo Ismael January 2019 (has links)
Memoria para optar al título de Ingeniero Civil Eléctrico / La empatía en la humanidad es un elemento fundamental para construir una sociedad justa. A su vez la empatía puede ser modulada por diferentes factores, como la emoción que tiene un individuo. Por ende la música, como detonante de emociones en el humano, es capaz de modular la empatía. Al mismo tiempo, se postula que la complejidad que tiene la música, en conjunto con la capacidad que tiene un individuo para percibir diferentes grados de complejidad de ésta, podría modular también, en el cerebro, la respuesta empática que tienen las personas. Se propone en este trabajo diferentes medidores para evaluar la emoción y la complejidad que tienen ciertas piezas musicales. Esta información se pone a disposición, junto con el diseño de un experimento que las utiliza, a la investigación psicológica acerca de el efecto de la musica en la respuesta empática de las personas. En cuanto a las emociones, se presenta un enfoque que utiliza aprendizaje de máquinas, específicamente RNN-LSTM para la predicción de las emociones que evoca la música en un sujeto mientras la escucha. En dicho trabajo se obtuvo 0.8 en el promedio de los errores de test. Por otro lado, en lo referente a las complejidades, se aplican a diferentes repertorios de música clásica de los siglos XVII y XVIII, diferentes medidas de la información, tales como la entropía de primer orden, la entropía condicional y entropía normalizada, para luego, en base a un análisis cualitativo, evaluar qué medida, aplicada a que aspecto de las partituras de cada repertorio, es el que mejor representa la complejidad en la música, resultando que es la entropía condicional, la cual posiciona a "El clavecín bien temperado" de Bach como el repertorio más complejo y a "Los cuartetos de barbería" como el menos complejo. 
En el experimento propuesto se toman dichas características y se realiza un EEG mientras los sujetos escuchan la música caracterizada y ven imágenes con y sin contenido de dolor, además de que responden cuestionarios relacionados a la empatía y a la música. Con esta información se espera verificar la existencia de algún tipo de correlación entre las características extraídas de la música y la respuesta empática hacia el dolor. En síntesis, este trabajo intenta fundamentalmente aportar herramientas ingenieriles a la investigación acerca de cómo afecta la música en la respuesta empática de las personas.
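The information measures named above, first-order entropy and conditional entropy of a symbol sequence (e.g. a stream of pitches from a score), can be estimated from counts. A minimal sketch, assuming symbols are hashable tokens and using bigram statistics:

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (bits) of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def first_order_entropy(seq):
    """H(X): entropy of the symbol distribution, ignoring order."""
    return entropy(Counter(seq))

def conditional_entropy(seq):
    """H(X_{t+1} | X_t) = H(X_t, X_{t+1}) - H(X_t), estimated from bigram counts."""
    bigrams = Counter(zip(seq, seq[1:]))
    firsts = Counter(seq[:-1])
    return entropy(bigrams) - entropy(firsts)
```

A fully predictable alternation such as A-B-A-B has zero conditional entropy even though its first-order entropy is maximal, which is exactly why conditional entropy can separate busy-but-predictable music from genuinely complex writing.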
543

MixUp as Directional Adversarial Training: A Unifying Understanding of MixUp and Adversarial Training

Perrault Archambault, Guillaume 29 April 2020 (has links)
This thesis aims to contribute to the field of neural networks by improving upon the performance of a state-of-the-art regularization scheme called MixUp, and by contributing to the conceptual understanding of MixUp. MixUp is a data augmentation scheme in which pairs of training samples and their corresponding labels are mixed using linear coefficients. Without label mixing, MixUp becomes a more conventional scheme: input samples are moved but their original labels are retained. Because samples are preferentially moved in the direction of other classes, we refer to this method as directional adversarial training, or DAT. We show that under two mild conditions, MixUp asymptotically converges to a subset of DAT. We define untied MixUp (UMixUp), a superset of MixUp wherein training labels are mixed with linear coefficients different from those of their corresponding samples. We show that under the same mild conditions, untied MixUp converges to the entire class of DAT schemes. Motivated by the understanding that UMixUp is both a generalization of MixUp and a scheme possessing adversarial-training properties, we experiment with different datasets and loss functions to show that UMixUp improves performance over MixUp. In short, we present a novel interpretation of MixUp as belonging to a class highly analogous to adversarial training, and on this basis we introduce a simple generalization which outperforms MixUp.
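The mixing operations are simple to state concretely. A sketch under the usual formulation, with MixUp drawing one Beta-distributed coefficient shared by sample and label, and untied MixUp letting the label coefficient differ (function names here are illustrative, not from the thesis):

```python
import random

def mixup(x1, y1, x2, y2, alpha=0.2, rng=random):
    """Classic MixUp: one coefficient lam mixes both inputs and one-hot labels."""
    lam = rng.betavariate(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y

def untied_mixup(x1, y1, x2, y2, lam_x, lam_y):
    """UMixUp: the label coefficient is untied from the sample coefficient."""
    x = [lam_x * a + (1 - lam_x) * b for a, b in zip(x1, x2)]
    y = [lam_y * a + (1 - lam_y) * b for a, b in zip(y1, y2)]
    return x, y
```

Setting lam_y = 1 while 0 < lam_x < 1 recovers the no-label-mixing scheme described above: the sample moves toward the other class while keeping its original label, which is the directional-adversarial-training behavior.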
544

Prediction of disease spread phenomena in large dynamic topology with application to malware detection in ad hoc networks

Nadra M Guizani (8848631) 18 May 2020 (has links)
Prediction techniques based on data are applied in a broad range of applications such as bioinformatics, disease spread, and mobile intrusion detection, just to name a few. With the rapid emergence of online technologies, numerous techniques for collecting and storing data for prediction-based analysis have been proposed in the literature. With the growing size of the global population, the spread of epidemics is increasing at an alarming rate. Consequently, public and private health care officials are in dire need of technological solutions for managing epidemics. Most existing syndromic surveillance and disease detection systems deal with only a small portion of a real dataset. From the communication network perspective, the results reported in the literature generally deal with commonly known network topologies. Scalability of a disease detection system is a real challenge when it comes to modeling and predicting disease spread across a large population or large-scale networks. In this dissertation, we address this challenge by proposing a hierarchical aggregation approach that classifies a dynamic disease spread phenomenon at different scalability levels. Specifically, we present a finite state model (SEIR-FSM) for predicting disease spread; the model manifests itself at three different levels of data aggregation and accordingly makes predictions of disease spread at various scales. We present experimental results of this model for different disease spread behaviors on all levels of granularity. Subsequently, we present a mechanism for mapping the population interaction network model to a wireless mobile network topology. The objective is to analyze the phenomenon of malware spread based on vulnerabilities. The goal is to develop and evaluate a wireless mobile intrusion detection system that uses a Hidden Markov model in connection with the FSM disease spread model (HMM-FSM).
Subsequently, we propose a software-based architecture that acts as a network function virtualization (NFV) to combat malware spread in IoT-based networks, taking advantage of the NFV infrastructure's potential to provide new security solutions for IoT environments against malware attacks. We propose a scalable and generalized IDS that uses a recurrent neural network with long short-term memory (RNN-LSTM) to predict malware attacks in a timely manner, so that the NFV can deploy the appropriate countermeasures. The analysis utilizes the susceptible (S), exposed (E), infected (I), and resistant (R) (SEIR) model to capture the dynamics of the spread of the malware attack and subsequently provide a patching mechanism for the network. Our analysis focuses primarily on the feasibility and the performance evaluation of the proposed NFV RNN-LSTM model.
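The SEIR dynamics underlying both the epidemic and malware models can be written as a discrete-time update over population fractions. A minimal sketch with illustrative parameter values, not the dissertation's calibrated model:

```python
def seir_step(s, e, i, r, beta=0.3, sigma=0.2, gamma=0.1):
    """One discrete-time SEIR update over population fractions.

    beta:  infection rate (S -> E on contact with I)
    sigma: incubation rate (E -> I)
    gamma: recovery/patching rate (I -> R)
    """
    new_exposed = beta * s * i
    new_infected = sigma * e
    new_recovered = gamma * i
    return (s - new_exposed,
            e + new_exposed - new_infected,
            i + new_infected - new_recovered,
            r + new_recovered)

def simulate(steps=100, i0=0.01, **rates):
    """Iterate the SEIR update from a small initial infected fraction."""
    state = (1.0 - i0, 0.0, i0, 0.0)
    history = [state]
    for _ in range(steps):
        state = seir_step(*state, **rates)
        history.append(state)
    return history
```

In the malware setting, R is the patched population, so gamma stands in for the rate at which the NFV-driven patching mechanism cleans infected nodes.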
545

Community Detection in Social Networks: Multilayer Networks and Pairwise Covariates

Huang, Sihan January 2020 (has links)
Community detection is one of the most fundamental problems in network study. The stochastic block model (SBM) is arguably the most studied model for network data, with different estimation methods developed and their community detection consistency results unveiled. Due to its stringent assumptions, SBM may not be suitable for many real-world problems. In this thesis, we present two approaches that incorporate extra information compared with vanilla SBM to help improve community detection performance and suit applications. One approach is to stack multilayer networks, which are composed of multiple single-layer networks with a common community structure. Numerous methods have been proposed based on spectral clustering, but most rely on optimizing an objective function while the associated theoretical properties remain largely unexplored. We focus on the 'early fusion' method, whose target is to minimize the spectral clustering error of the weighted adjacency matrix (WAM). We derive the optimal weights by studying the asymptotic behavior of eigenvalues and eigenvectors of the WAM. We show that the eigenvector of the WAM converges to a normal distribution, and that the clustering error is monotonically decreasing in the eigenvalue gap. This fact reveals the intrinsic link between eigenvalues and eigenvectors, and thus the algorithm minimizes the clustering error by maximizing the eigenvalue gap. The numerical study shows that our algorithm outperforms other state-of-the-art methods significantly, especially when the signal-to-noise ratios of layers vary widely. Our algorithm also yields more accurate results for the S&P 1500 stocks dataset than competing models. The other approach we propose is to consider heterogeneous connection probabilities, removing the strong assumption that all nodes in the same community are stochastically equivalent, which may not hold in practical applications.
We introduce a pairwise covariates-adjusted stochastic block model (PCABM), a generalization of SBM that incorporates pairwise covariates information. We study the maximum likelihood estimates of the coefficients for the covariates as well as the community assignments. It is shown that both the coefficient estimates of the covariates and the community assignments are consistent under suitable sparsity conditions. Spectral clustering with adjustment (SCWA) is introduced to fit PCABM efficiently. Under certain conditions, we derive the error bound of community estimation under SCWA and show that it is community detection consistent. PCABM compares favorably with the SBM or degree-corrected stochastic block model under a wide range of simulated and real networks when covariate information is accessible.
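The spectral mechanics behind the weighted-adjacency approach above can be illustrated in miniature: aggregate layers into one weighted matrix, take its leading eigenvectors, and split communities by the sign pattern of the second eigenvector. A pure-Python sketch on a toy two-block matrix (weights and sizes invented for illustration):

```python
import math
import random

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def top_eigvec(A, n_iter=300, orth_to=None, seed=7):
    """Power iteration for a leading eigenvector, optionally re-orthogonalized
    against a previously found eigenvector (simple deflation)."""
    rng = random.Random(seed)
    v = [rng.random() for _ in A]
    for _ in range(n_iter):
        if orth_to is not None:
            proj = sum(a * b for a, b in zip(v, orth_to))
            v = [a - proj * b for a, b in zip(v, orth_to)]
        v = matvec(A, v)
        n = math.sqrt(sum(x * x for x in v))
        v = [x / n for x in v]
    lam = sum(a * b for a, b in zip(v, matvec(A, v)))
    return lam, v

# Toy weighted adjacency: strong within-block weights, weak across-block weights.
within, across = 1.0, 0.2
W = [[within if (i < 3) == (j < 3) else across for j in range(6)] for i in range(6)]

lam1, v1 = top_eigvec(W)                 # leading eigenvector: roughly uniform
lam2, v2 = top_eigvec(W, orth_to=v1)     # second eigenvector: community signs
labels = [0 if x < 0 else 1 for x in v2]
```

The gap between the community eigenvalue and the noise eigenvalues controls how stable this sign pattern is under perturbation, which is the quantity the optimal layer weighting maximizes.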
546

Churn Prediction

Åkermark, Alexander, Hallefält, Mattias January 2019 (has links)
Churn analysis is an important tool for companies as it can reduce the costs related to customer churn. Churn prediction is the process of identifying users before they churn; this is done by applying methods to collected data in order to find patterns that can be helpful when predicting new churners in the future. The objective of this report is to identify churners with the use of surveys collected from different golf clubs, their members, and guests. This was accomplished by testing several different supervised machine learning algorithms in order to find the different classes and to see which supervised algorithms are most suitable for this kind of data. The margin of success was to achieve a greater accuracy than the percentage of the majority class in the dataset. The data was processed using label encoding, one-hot encoding, and principal component analysis, and was split into 10 folds, 9 training folds and 1 testing fold, ensuring cross-validation when iterated 10 times while rearranging the test and training folds. Each algorithm processed the training data to create a classifier, which was tested on the test data. The classifiers used for the project were k-nearest neighbours, support vector machine, multi-layer perceptron, decision trees, and random forest. The different classifiers generally had an accuracy of around 72%, and the best classifier, random forest, had an accuracy of 75%. All the classifiers had an accuracy above the margin of success. K-folding, confusion matrices, classification reports, and other internal cross-validation techniques were performed on the data to ensure the quality of the classifier. The project was a success, although there is a strong belief that the bottleneck for the project was the quality of the data, in terms of new legislation on collecting and storing data that results in redundant and faulty data. / Churn analysis is an important tool for companies as it can reduce the costs related to customer churn. Churn prediction is the process of identifying users before they churn; this is done by applying methods to collected data to find patterns that help predict future churners. The objective of this report is to identify churners using surveys collected from golf clubs and their members and guests. This was achieved by testing several different supervised machine learning algorithms to compare which algorithm fits best. The margin of success was to achieve a higher accuracy than the share of the dominant class in the dataset. The data was processed using label encoding, one-hot encoding, and principal component analysis, and was split into 10 parts, 9 for training and 1 for testing, to ensure cross-validation. Each algorithm processed the training data to create a classifier, which was then tested on the test data. The classifiers used for the project include k-nearest neighbours, support vector machine, multi-layer perceptron, decision trees, and random forest. The classifiers generally had an accuracy of around 72%; the best was random forest with an accuracy of 75%. All classifiers had an accuracy above the stated margin of success. K-folding, confusion matrices, classification reports, and other internal cross-validation techniques were used to ensure the quality of the classifier. The project was a success, although it is suspected that the bottleneck for the project was the quality of the data, given new legislation on collecting and storing data that results in redundant and faulty data.
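The 10-fold scheme described above — shuffle, split into 10 folds, train on 9 and test on 1, rotating the test fold — can be sketched as follows (helper name illustrative):

```python
import random

def kfold_splits(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) index pairs for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)          # shuffle once, deterministically
    folds = [idx[i::k] for i in range(k)]     # k roughly equal folds
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

Each sample lands in exactly one test fold, so averaging the k accuracy scores yields an estimate that uses every observation for both training and testing.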
547

Stress-Aware Personalized Road Navigation System

Mandorah, Obai 16 December 2019 (has links)
Driving can be a stressful task, especially under congestion conditions. Several studies have shown a positive correlation between stress and aggressive behaviour behind the wheel, leading to accidents. One common way to minimize stress while driving is to avoid highly congested roads. However, not all drivers show the same response towards high traffic situations or other road conditions. For instance, some drivers may prefer congested routes to longer ones to minimize travel time. Increasingly, drivers are employing Advanced Traveller Information Systems while commuting to both familiar and unfamiliar destinations, not just to obtain information on how to reach a certain endpoint, but to acquire real-time data on the state of the roads and avoid undesired traffic conditions. In this thesis, we propose an Advanced Traveller Information System that personalizes the driver’s route using their road preferences and measures their physiological signals during the trip to assess mental stress. The system then links road attributes, such as number of lanes, speed limit, and traffic severity, with the driver’s stress levels. Then, it uses machine learning to predict their stress levels on similar roads. Hence, routes that contribute to high-levels of stress can therefore be avoided in future trips. The average accuracy of the proposed stress level prediction model is 76.11%.
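One simple way to realize the prediction step described above — mapping road attributes such as number of lanes, speed limit, and traffic severity to a stress level learned from past trips — is a nearest-neighbour vote. A minimal sketch with invented attribute vectors; this is not the thesis's actual model (which reports 76.11% average accuracy):

```python
import math
from collections import Counter

def predict_stress(query, history, k=3):
    """Majority vote among the k past road segments closest in attribute space.

    history: list of (attribute_vector, stress_label) pairs from earlier trips.
    """
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Attribute vectors: (lanes, normalized speed limit, traffic severity) — illustrative only.
history = [
    ((2, 0.5, 0.90), "high"), ((3, 0.6, 0.80), "high"), ((2, 0.4, 0.85), "high"),
    ((4, 1.0, 0.10), "low"),  ((3, 0.9, 0.20), "low"),  ((4, 0.8, 0.15), "low"),
]
```

A routing engine could then penalize candidate edges whose predicted label is "high", steering future trips away from stress-inducing roads.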
548

Rational Design Inspired Application of Natural Language Processing Algorithms to Red Shift mNeptune684

Parkinson, Scott 26 March 2021 (has links)
Recent innovations and progress in machine learning algorithms from the Natural Language Processing (NLP) community have motivated efforts to apply these models and concepts to proteins. The representations generated by trained NLP models have been shown to capture important semantic and structural understanding of proteins encompassing biochemical and biophysical properties, among other key concepts. In turn, these representations have demonstrated application to protein engineering tasks including mutation analysis and design of novel proteins. Here we use this NLP paradigm in a protein engineering effort to further red shift the emission wavelength of the red fluorescent protein mNeptune684 using only a small number of functional training variants ('Low-N' scenario). The collaborative nature of this thesis with the Department of Chemistry and Biomolecular Sciences explores using these tools and methods in the rational design process.
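A common first step when treating proteins as language — and a likely ingredient of any such pipeline, though not taken from this thesis — is tokenizing an amino-acid sequence into overlapping k-mer "words" before feeding it to an NLP-style model. A minimal sketch:

```python
def kmer_tokenize(sequence, k=3):
    """Split an amino-acid sequence into overlapping k-mers (protein 'words')."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def vocabulary(sequences, k=3):
    """Collect the sorted k-mer vocabulary across a corpus of protein sequences."""
    return sorted({tok for seq in sequences for tok in kmer_tokenize(seq, k)})
```

Trained language models then map these token streams to dense representations, which is where the semantic and structural understanding described above resides; the red-shift modeling itself is far beyond this snippet.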
549

STATISTICAL MODELING OF SHIP AIRWAKES INCLUDING THE FEASIBILITY OF APPLYING MACHINE LEARNING

Unknown Date (has links)
Airwakes are shed behind the ship's superstructure and represent a highly turbulent and rapidly distorting flow field. This flow field severely affects the pilot's workload and thus helicopter shipboard operations. A relatively complete description requires both the one-point statistics of the autospectrum and the two-point statistics of coherence (normalized cross-spectrum). Recent advances primarily involve generating databases of flow velocity points through experimental and computational fluid dynamics (CFD) investigations, numerically computing autospectra along with a few cases of cross-spectra and coherences, and developing a framework for extracting interpretive models of autospectra in closed form from a database, along with an application of this framework to study downwash effects. By comparison, relatively little is known about coherences. In fact, even the basic expressions of cross-spectra and coherences for the three components of homogeneous isotropic turbulence (HIT) vary from one study to another, and the related literature is scattered and piecemeal. Accordingly, this dissertation begins with a unified account of all the cross-spectra and coherences of HIT from first principles. Then, it presents a framework for constructing interpretive coherence models of an airwake from a database on the basis of perturbation theory. For each velocity component, the coherence is represented by a separate perturbation series in which the basis function, the first term on the right-hand side of the series, is the corresponding coherence for HIT. The perturbation series coefficients are evaluated by satisfying the theoretical constraints and fitting a curve in a least-squares sense to a set of numerically generated coherence points from a database. Although not tested against a specific database, the framework has a mathematical basis.
Moreover, for assumed values of perturbation series constants, coherence results are presented to demonstrate how coherences of airwakes and such flow fields compare to those of HIT. / Includes bibliography. / Dissertation (Ph.D.)--Florida Atlantic University, 2020. / FAU Electronic Theses and Dissertations Collection
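For reference, the two-point statistic at issue is the standard (squared) coherence between velocity components measured at two points, built from the cross-spectrum and the two autospectra; in the usual notation:

```latex
\gamma_{uv}^{2}(f) \;=\; \frac{\left| S_{uv}(f) \right|^{2}}{S_{uu}(f)\, S_{vv}(f)},
\qquad 0 \le \gamma_{uv}^{2}(f) \le 1 ,
```

where S_uu and S_vv are the one-point autospectra and S_uv the two-point cross-spectrum; a value of 1 indicates perfectly linearly related signals at frequency f, and HIT supplies closed-form baselines for these quantities that the perturbation series above corrects toward the measured airwake.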
550

CONNECTED MULTI-DOMAIN AUTONOMY AND ARTIFICIAL INTELLIGENCE: AUTONOMOUS LOCALIZATION, NETWORKING, AND DATA CONFORMITY EVALUATION

Unknown Date (has links)
The objective of this dissertation work is the development of a solid theoretical and algorithmic framework for three of the most important aspects of autonomous/artificial-intelligence (AI) systems, namely data quality assurance, localization, and communications. In the era of AI and machine learning (ML), data reign supreme. During learning tasks, we need to ensure that the training data set is correct and complete. During operation, faulty data need to be discovered and dealt with to protect from (potentially catastrophic) system failures. With our research in data quality assurance, we develop new mathematical theory and algorithms for outlier-resistant decomposition of high-dimensional matrices (tensors) based on L1-norm principal-component analysis (PCA). L1-norm PCA has been proven to be resistant to irregular data points and will drive critical real-world AI learning and autonomous systems operations in the future. At the same time, one of the most important tasks of autonomous systems is self-localization. In GPS-deprived environments, localization becomes a fundamental technical problem. State-of-the-art solutions frequently utilize power-hungry or expensive architectures, making them difficult to deploy. In this dissertation work, we develop and implement a robust, variable-precision localization technique for autonomous systems based on direction-of-arrival (DoA) estimation theory, which is cost- and power-efficient. Finally, communication between autonomous systems is paramount for mission success in many applications. In the era of 5G and beyond, smart spectrum utilization is key. In this work, we develop physical (PHY) and medium-access-control (MAC) layer techniques that autonomously optimize spectrum usage and minimize intra- and inter-network interference. / Includes bibliography. / Dissertation (Ph.D.)--Florida Atlantic University, 2020. / FAU Electronic Theses and Dissertations Collection
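The outlier resistance of L1-norm PCA comes from maximizing an L1 rather than L2 projection objective. For the rank-1 case, one known exact approach is to search binary sign vectors — maximize ||Xb||_2 over b in {-1, +1}^N and take w = Xb / ||Xb|| — which is tractable for small N. A pure-Python sketch on toy data (exhaustive search for illustration only; this is not the dissertation's algorithm for tensors):

```python
import itertools
import math

def l1_pca_rank1(samples):
    """Exact rank-1 L1-PCA by exhaustive search over sign vectors.

    samples: list of d-dimensional points (the columns of the data matrix X).
    Maximizes ||X b||_2 over b in {-1, +1}^N; the maximizing direction
    X b / ||X b|| also maximizes the L1 objective sum_n |w . x_n|.
    """
    d = len(samples[0])
    best_norm, best_vec = -1.0, None
    for signs in itertools.product((-1.0, 1.0), repeat=len(samples)):
        vec = [sum(s * x[j] for s, x in zip(signs, samples)) for j in range(d)]
        n = math.sqrt(sum(v * v for v in vec))
        if n > best_norm:
            best_norm, best_vec = n, vec
    return [v / best_norm for v in best_vec]
```

Because each sample enters the L1 objective only linearly through |w . x_n|, a single gross outlier cannot dominate the recovered direction the way its squared norm dominates an L2 covariance.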
