701

Machine Learning for Financial Crisis Prediction

Voskamp, Joseph January 2024 (has links)
We investigate potential applications of machine-learning models in financial crisis prediction. We aim to identify crises one or two years ahead of their start dates by recognizing trends in a variety of economic variables. We look at two different datasets of banking crises, as well as currency and inflation crises. For consistency in analysis, we manually construct the crisis variables for the years 2017-2020. By analyzing the models in both cross-validation and forecasting experiments, we show that machine-learning models can outperform logistic regression in financial crisis prediction. We employ a Shapley value framework to mitigate the black-box nature of the machine-learning models. We show that the global economic climate is of vital importance in identifying banking and currency crises, while wages are the most important predictor of inflation crises. We then investigate the nonlinear relationships between the predictors and their Shapley values to further understand the driving forces behind the model predictions. / Thesis / Master of Science (MSc)
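The Shapley value framework mentioned above can be illustrated with a toy sketch. This is not the thesis's implementation: the feature names are invented, the "model" is a simple linear crisis score, and the exact enumeration over feature coalitions is only feasible for a handful of predictors.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction: average each feature's
    marginal contribution over all coalitions, with features outside a
    coalition set to their baseline value."""
    names = list(x)
    n = len(names)
    phi = {}
    for f in names:
        others = [g for g in names if g != f]
        total = 0.0
        for k in range(n):
            for S in combinations(others, k):
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                with_f = {g: (x[g] if g in S or g == f else baseline[g]) for g in names}
                without_f = {g: (x[g] if g in S else baseline[g]) for g in names}
                total += weight * (predict(with_f) - predict(without_f))
        phi[f] = total
    return phi

# Hypothetical linear "crisis score": for a linear model, each Shapley
# value recovers coefficient times deviation from the baseline.
model = lambda f: 0.5 * f["credit_growth"] + 2.0 * f["global_slowdown"]
phi = shapley_values(model, {"credit_growth": 2.0, "global_slowdown": 1.0},
                     {"credit_growth": 0.0, "global_slowdown": 0.0})
```

The efficiency property holds by construction: the values sum to the difference between the prediction at `x` and at the baseline.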
702

Enhancing Integrated Logistic Support through Machine Learning-Driven Optimization : Predicting component category based on part number

Mian, Faizan January 2024 (has links)
Accurate categorization of components based on part numbers is a fundamental task for reliability engineers, essential for assessing and improving the reliability of systems. Manual categorization is labor-intensive and prone to errors, highlighting the need for an automated approach. This thesis presents a machine learning classifier designed to predict component categories from part numbers, with the goal of enhancing the efficiency of Integrated Logistic Support. The proposed solution utilizes TF-IDF vectorization combined with classifiers such as Multinomial Naive Bayes, Stochastic Gradient Descent and Linear Support Vector Classifier, enabling the model to effectively analyse and categorize part numbers. An interactive graphical user interface facilitates user input and provides immediate predictions, thereby streamlining the categorization process for reliability engineers. This ML-driven tool not only reduces the manual effort required, but also enhances the precision of component categorization, leading to better reliability assessments and system evaluations. The research demonstrates the potential of ML in automating complex engineering tasks and suggests pathways for future enhancements, including the integration of additional component attributes and validation in diverse real-world scenarios. The ultimate goal is to create a robust tool that can be widely adopted in the field of reliability engineering, thereby optimizing workflows and improving overall system reliability.
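The part-number classification idea can be sketched with a self-contained stand-in: raw character-trigram counts with Multinomial Naive Bayes replace the thesis's TF-IDF pipeline, and the part numbers and categories below are made up for illustration.

```python
from collections import Counter, defaultdict
import math

def ngrams(part, n=3):
    """Character n-grams with start/end markers, so prefixes like 'RES-'
    carry signal about the component category."""
    s = f"^{part}$"
    return [s[i:i + n] for i in range(len(s) - n + 1)]

class PartNumberNB:
    """Multinomial Naive Bayes over character trigrams (Laplace smoothing)."""
    def fit(self, parts, labels):
        self.counts = defaultdict(Counter)
        self.totals = Counter()
        self.priors = Counter(labels)
        self.vocab = set()
        for p, y in zip(parts, labels):
            grams = ngrams(p)
            self.counts[y].update(grams)
            self.totals[y] += len(grams)
            self.vocab.update(grams)
        return self

    def predict(self, part):
        V = len(self.vocab)
        n_total = sum(self.priors.values())
        best, best_lp = None, float("-inf")
        for y in self.priors:
            lp = math.log(self.priors[y] / n_total)
            for g in ngrams(part):
                lp += math.log((self.counts[y][g] + 1) / (self.totals[y] + V))
            if lp > best_lp:
                best, best_lp = y, lp
        return best

# Hypothetical training data: prefix conventions encode the category.
clf = PartNumberNB().fit(
    ["RES-1001", "RES-2210", "CAP-0047", "CAP-1100"],
    ["resistor", "resistor", "capacitor", "capacitor"])
```

An unseen part number sharing a known prefix, e.g. `clf.predict("RES-3300")`, is pulled toward the class whose trigram distribution it matches.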
703

Federated Machine Learning Architectures for Image Classification

Albahaca, Juan January 2024 (has links)
In this thesis, we explore a new method for binary image classification of semiconductor components using federated learning at Mycronic AB, enabling model training on Pick and Place (PnP) machines without centralizing sensitive data. Initially, we set a baseline by choosing a suitable Convolutional Neural Network (CNN) architecture, implementing data preprocessing methods, and optimizing various hyperparameters. We then assess various federated learning algorithms to manage the inherent statistical heterogeneity in distributed datasets. Our approach is validated using a real-world dataset annotated by Mycronic, confirming that our findings are applicable to real industrial scenarios.
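The canonical baseline among federated learning aggregation rules is Federated Averaging (FedAvg). The abstract does not name the algorithms assessed, so the following is only a minimal sketch of the aggregation step, with model weights flattened to plain lists.

```python
def fedavg(client_weights, client_sizes):
    """Federated Averaging aggregation: the server combines client model
    weights as a weighted mean, weighting each client by its local
    dataset size; raw images never leave the clients."""
    total = sum(client_sizes)
    n_params = len(client_weights[0])
    return [sum(w[i] * s for w, s in zip(client_weights, client_sizes)) / total
            for i in range(n_params)]

# Two clients: the one with the larger dataset (30 vs 10 samples)
# pulls the global weights toward its local solution.
global_w = fedavg([[1.0, 2.0], [3.0, 4.0]], [30, 10])
```

Statistical heterogeneity shows up here when client datasets differ in distribution, so the locally optimal weight vectors diverge and their average can drift; the algorithms the thesis assesses target exactly that failure mode.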
704

Enhancing PointNet: New Aggregation Functions and Contextual Normalization

Isaksson Jonek, Markus January 2024 (has links)
The PointNet architecture is a foundational deep learning model for 3D point clouds, solving classification and segmentation tasks. We hypothesize that the full potential of PointNet has not been reached and is greatly restrained by a single Max pooling layer. First, this thesis introduces new and more complex learnable aggregation functions. Secondly, a novel normalization technique, Context Normalization, is proposed for further feature extraction. Context Normalization is similar to Batch Normalization but independently normalizes each point cloud within a mini-batch and always uses dynamic statistics. The experiments show that replacing Max pooling with Principal Neighborhood Aggregation (PNA) increased classification accuracy from 73.3% to 78.7% on an SO(3) augmented version of the ModelNet40 dataset. Combining PNA with Context Normalization further increased accuracy to 84.6%.
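The idea of replacing the single Max pooling readout with several aggregators can be sketched as follows. This is a simplification: it concatenates mean, max, min, and population standard deviation per channel, and omits the degree scalers of the full PNA formulation as well as the learnable parts.

```python
import math

def pna_aggregate(point_feats):
    """PNA-style readout over a point cloud: instead of max pooling a
    single statistic per feature channel, concatenate mean, max, min,
    and std, so the global descriptor keeps more of the distribution."""
    n, d = len(point_feats), len(point_feats[0])
    out = []
    for j in range(d):
        col = [p[j] for p in point_feats]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        out.extend([mean, max(col), min(col), std])
    return out

# Two points with two feature channels -> four statistics per channel.
feats = pna_aggregate([[1.0, 10.0], [3.0, 20.0]])
```

Context Normalization as described above would sit before this readout, standardizing each point cloud's features using that cloud's own per-channel statistics rather than mini-batch statistics.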
705

Soar CGFs that learn inductively : a hybrid autonomous approach based on a modified Naive Bayes learning algorithm

Chia, Chien Wei 01 October 2003 (has links)
No description available.
706

Hilbert Space Filling Curve (HSFC) Nearest Neighbor Classifier

Reeder, John 01 January 2005 (has links)
The Nearest Neighbor algorithm is one of the simplest and oldest classification techniques. A given collection of historic data (training data) of known classification is stored in memory. The classification of an unknown instance (test data) is then predicted from the stored knowledge by finding the classification of its nearest neighbor. For example, when an instance from the test set is presented to the nearest neighbor classifier, its nearest neighbor in the training set, in terms of some distance metric, is found, and its classification is predicted to be that of the neighbor. This classifier is known as the 1-NN (one-nearest-neighbor) classifier. An extension is the k-NN classifier, which follows the same principle but finds k (k > 1) neighbors and takes the classification represented by the majority of those neighbors. The implementation of the nearest neighbor classifier is effortless: simply store the training data and their classifications. The drawback appears when a test instance is presented for classification: the distance from the test pattern to every point in the training set must be computed, and the required computation is proportional to the number of training points (N), which becomes expensive when N is large. The purpose of this thesis is to reduce the computational complexity of the testing phase of the nearest neighbor classifier by using the Hilbert Space Filling Curve (HSFC). The HSFC NN classifier was implemented, and its accuracy and computational complexity are compared to those of the original NN classifier to test the validity of using the HSFC in classification.
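The core of the approach can be sketched in two dimensions: map each training point to its distance along a Hilbert curve, sort once, and answer queries by comparing only against curve neighbors. This is an approximation of the thesis's classifier, not its implementation; the true nearest neighbor can occasionally sit elsewhere on the curve, which is the accuracy trade-off the thesis evaluates.

```python
from bisect import bisect_left

def xy2d(n, x, y):
    """Map 2-D grid coordinates to a distance along the Hilbert curve of
    an n x n grid (n a power of two); classic bit-twiddling conversion."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:              # rotate the quadrant frame
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d

class HilbertNN:
    """Approximate 1-NN: sort training points by Hilbert index once,
    then resolve each query against only its neighbors on the curve,
    replacing an O(N) distance scan with an O(log N) search."""
    def __init__(self, points, labels, n=256):
        self.n = n
        self.data = sorted((xy2d(n, x, y), lab)
                           for (x, y), lab in zip(points, labels))
        self.keys = [k for k, _ in self.data]

    def predict(self, x, y):
        q = xy2d(self.n, x, y)
        i = bisect_left(self.keys, q)
        candidates = self.data[max(0, i - 1): i + 2]
        return min(candidates, key=lambda kv: abs(kv[0] - q))[1]

# Hypothetical toy data: two "A" points near the origin, one "B" far away.
clf = HilbertNN([(10, 10), (12, 9), (200, 220)], ["A", "A", "B"])
```

The curve's locality preservation is what makes this work: points close in the plane usually receive close curve indices, so the candidate window around the query's index tends to contain the true nearest neighbor.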
707

Teaching Robots using Interactive Imitation Learning

Jonnavittula, Ananth 28 June 2024 (has links)
As robots transition from controlled environments, such as industrial settings, to more dynamic and unpredictable real-world applications, the need for adaptable and robust learning methods becomes paramount. In this dissertation we develop Interactive Imitation Learning (IIL) based methods that allow robots to learn from imperfect demonstrations. We achieve this by incorporating human factors such as the quality of their demonstrations and the level of effort they are willing to invest in teaching the robot. Our research is structured around three key contributions. First, we examine scenarios where robots have access to high-quality human demonstrations and abundant corrective feedback. In this setup, we introduce an algorithm called SARI (Shared Autonomy across Repeated Interactions), that leverages repeated human-robot interactions to learn from humans. Through extensive simulations and real-world experiments, we demonstrate that SARI significantly enhances the robot's ability to perform complex tasks by iteratively improving its understanding and responses based on human feedback. Second, we explore scenarios where human demonstrations are suboptimal and no additional corrective feedback is provided. This approach acknowledges the inherent imperfections in human teaching and aims to develop robots that can learn effectively under such conditions. We accomplish this by allowing the robot to adopt a risk-averse strategy that underestimates the human's abilities. This method is particularly valuable in household environments where users may not have the expertise or patience to provide perfect demonstrations. Finally, we address the challenge of learning from a single video demonstration. This is particularly relevant for enabling robots to learn tasks without extensive human involvement. We present VIEW (Visual Imitation lEarning with Waypoints), a method that focuses on extracting critical waypoints from video demonstrations. 
By identifying key positions and movements, VIEW allows robots to efficiently replicate tasks with minimal training data. Our experiments show that VIEW can significantly reduce both the number of trials required and the time needed for the robot to learn new tasks. The findings from this research highlight the importance of incorporating advanced learning algorithms and interactive methods to enhance the robot's ability to operate autonomously in diverse environments. By addressing the variability in human teaching and leveraging innovative learning strategies, this dissertation contributes to the development of more adaptable, efficient, and user-friendly robotic systems. / Doctor of Philosophy / Robots are becoming increasingly common outside manufacturing facilities. In these unstructured environments, people might not always be able to give perfect instructions or might make mistakes. This dissertation explores methods that allow robots to learn tasks by observing human demonstrations, even when those demonstrations are imperfect. First, we look at scenarios where humans can provide high-quality demonstrations and corrections. We introduce an algorithm called SARI (Shared Autonomy across Repeated Interactions). SARI helps robots get better at tasks by learning from repeated interactions with humans. Through various experiments, we found that SARI significantly improves the robot's ability to perform complex tasks, making it more reliable and efficient. Next, we explore scenarios where the human demonstrations are not perfect, and no additional corrections are given. This approach takes everyday scenarios into account, where people might not have the time or expertise to provide perfect instructions. By designing a method that assumes humans might make mistakes, we can create robots that can learn safely and effectively. This makes the robots more adaptable and easier to use for a diverse group of people. 
Finally, we tackle the challenge of teaching robots from a single video demonstration. This method is particularly useful because it requires less involvement from humans. We developed VIEW (Visual Imitation lEarning with Waypoints), a method that helps robots learn tasks by focusing on the most important parts of a video demonstration. By identifying key points and movements, VIEW allows robots to quickly and efficiently replicate tasks with minimal training. This method significantly reduces the time and effort needed for robots to learn new tasks. Overall, this research shows that by using advanced learning techniques and interactive methods, we can create robots that are more adaptable, efficient, and user-friendly. These robots can learn from humans in various environments and become valuable assistants in our daily lives.
708

Towards a fully automated extraction and interpretation of tabular data using machine learning

Hedbrant, Per January 2019 (has links)
Motivation: A challenge for researchers at CBCS is the ability to efficiently manage the different data formats that frequently change. This handling includes importing data into a common format, regardless of the output of the various instruments used. Significant amounts of time are spent on manual pre-processing, converting from one format to another. There are commercial solutions available for this process, but to our knowledge, all of them require prior generation of templates to which data must conform, and none uses pattern recognition to locate and automatically recognise data structures in a spreadsheet.

Problem definition: The desired solution is a self-learning Software-as-a-Service (SaaS) for automated recognition and loading of data stored in arbitrary formats. The aim of this study is three-fold: A) investigate whether unsupervised machine learning methods can be used to label different types of cells in spreadsheets; B) investigate whether a hypothesis-generating algorithm can be used to label different types of cells in spreadsheets; C) advise on choices of architecture and technologies for the SaaS solution.

Method: A pre-processing framework is built that can read and pre-process any type of spreadsheet into a feature matrix. Different datasets are read and clustered, and the usefulness of reducing the dimensionality is investigated. A hypothesis-driven algorithm is built and adapted to two of the data formats CBCS uses most frequently. Choices of architecture and technologies for the SaaS solution are discussed, including system design patterns, web development framework and database.

Result: The reading and pre-processing framework is in itself a valuable result, due to its general applicability. No satisfying results are found with the mini-batch K-means clustering method. When data is read from only one format, the dimensionality can be reduced from 542 to around 40 dimensions. The hypothesis-driven algorithm can consistently interpret the format it is designed for; more work is needed to make it more general.

Implication: The study contributes to the desired solution in the short term through the hypothesis-generating algorithm, and in a more generalisable way through the unsupervised learning approach. It also contributes by initiating a conversation around the system design choices.
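The step of turning a spreadsheet into a feature matrix can be illustrated with a minimal cell featurizer. The five features below are invented for illustration; the actual framework extracts a far richer feature set (542 dimensions before reduction).

```python
def cell_features(value):
    """Featurize one spreadsheet cell for clustering: type flag, length,
    character-class ratios, and an empty-cell flag. A sheet's cells then
    stack into the kind of feature matrix a clustering method consumes."""
    s = str(value)
    try:
        float(s)
        is_num = 1.0
    except ValueError:
        is_num = 0.0
    return [
        is_num,                                     # parses as a number
        float(len(s)),                              # raw length
        sum(c.isalpha() for c in s) / max(len(s), 1),  # letter ratio
        sum(c.isdigit() for c in s) / max(len(s), 1),  # digit ratio
        1.0 if s.strip() == "" else 0.0,            # empty cell
    ]

# A header cell, an ID-like cell, a measurement, and an empty cell
# produce clearly separated feature vectors.
matrix = [cell_features(v) for v in ["Sample ID", "A42", 3.14, ""]]
```

On features like these, header cells, identifiers, and numeric data fall into different regions of feature space, which is what a clustering method needs in order to label cell types without supervision.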
709

Statistical analysis and automatic interpretation of oil logs using high performance computing

Bruned, Vianney 18 October 2018 (has links)
In this thesis, we investigate the automation of the identification and characterization of geological strata using well logs. Within a single well, geological strata are determined through segmentation of the logs, which can be treated as multivariate time series. Identifying the same strata across different wells of one oil field requires correlation methods for time series; we propose a new global well-correlation method using multiple sequence alignment algorithms from bioinformatics. Determining the mineralogical composition and the proportion of fluids within a geological formation amounts to an ill-posed inverse problem. Current methods rely on experts' choices, selecting a combination of minerals for a given stratum. Because the model's likelihood cannot be computed, an approximate Bayesian computation (ABC) approach, assisted by a density-based clustering algorithm, is used to characterize the mineral composition of the geological layer. The clustering step is necessary to deal with the identifiability issue of the minerals. Finally, the workflow is tested on a case study.
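The ABC idea (keep parameter draws whose simulated data land close to the observation, since the likelihood itself cannot be evaluated) can be sketched with rejection sampling. The mixing-fraction setup below is a toy stand-in for the mineralogy inverse problem, not the thesis's model.

```python
import random

def abc_rejection(observed, simulate, prior_sample, distance, eps,
                  n_draws=20000, seed=0):
    """ABC rejection sampling: draw parameters from the prior, simulate
    data, and accept draws whose simulation falls within eps of the
    observation. Accepted draws approximate the posterior."""
    rng = random.Random(seed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sample(rng)
        if distance(simulate(theta, rng), observed) < eps:
            accepted.append(theta)
    return accepted

# Toy inverse problem: recover a mixing fraction theta from one noisy
# measurement, with a uniform prior and Gaussian measurement noise.
post = abc_rejection(
    observed=0.7,
    simulate=lambda t, rng: t + rng.gauss(0, 0.05),
    prior_sample=lambda rng: rng.uniform(0, 1),
    distance=lambda a, b: abs(a - b),
    eps=0.02,
)
mean_theta = sum(post) / len(post)
```

The tolerance `eps` trades accuracy for acceptance rate; the density-based clustering the thesis adds would then group accepted draws into the distinct mineral combinations that explain the data equally well, addressing identifiability.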
710

Gene fusions in cancer: Classification of fusion events and regulation patterns of fusion pathway neighbors

Hughes, Katelyn 05 May 2016 (has links)
Cancer is a leading cause of death worldwide, resulting in an estimated 600,000 mortalities and 1.6 million new cases in the US alone in 2015. Gene fusions, hybrid genes formed from two originally separate genes, are known drivers of cancer. However, gene fusions have also been found in healthy cells due to routine errors in replication. This project aims to understand the role of gene fusions in cancer. Specifically, we seek to achieve two goals: first, to develop a computational method that predicts whether a gene fusion event is associated with a cancer or a healthy sample; second, to use this information to determine and characterize the molecular mechanisms behind gene fusion events. Recent studies have attempted to address these problems, but without explicitly considering that overlapping fusion events occur in both cancer and healthy cells. Here, we address this problem using FUsion Enriched Learning of CANcer Mutations (FUELCAN), a semi-supervised model that initially marks all overlapping fusion events as unlabeled. The model is trained on the known cancer and healthy samples and tested on the unlabeled dataset. Unlabeled data points are classified as associated with healthy or cancer samples, and the top 20 are moved back into the training set; the process continues until all have been classified. Three datasets were analyzed, from Acute Lymphoblastic Leukemia (ALL), breast cancer and colorectal cancer. We obtained similar results for both supervised and semi-supervised classification. To improve our model, we assessed the functional landscape of gene fusion events and observed that the pathway neighbors of both gene fusion partners are differentially expressed in each cancer dataset. The significant neighbors are also shown to have direct connections to cancer pathways and functions, indicating that these gene fusions are important for cancer development.
Future directions include applying the acquired transcriptomic knowledge to our machine learning algorithm, counting transcription factors and kinases within the gene fusion events and their neighbors and assessing the differences between upstream and downstream effects within the pathway neighbors.
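The train / label-the-most-confident / retrain loop described for FUELCAN can be sketched generically. The scorer below (a one-feature nearest-centroid model with a margin confidence) and the toy values are stand-ins; only the loop structure mirrors the description above.

```python
def self_train(labeled, unlabeled, k=20):
    """Self-training loop: fit on the labeled set, score the unlabeled
    pool, promote the k most confident predictions into training with
    their predicted labels, and repeat until the pool is empty."""
    labeled, pool = list(labeled), list(unlabeled)
    while pool:
        # "fit": per-class centroid of the single feature
        groups = {}
        for x, y in labeled:
            groups.setdefault(y, []).append(x)
        cents = {y: sum(v) / len(v) for y, v in groups.items()}
        # "score": confidence = distance to the farthest centroid minus
        # distance to the nearest (larger margin = more confident)
        scored = []
        for x in pool:
            ranked = sorted(cents, key=lambda y: abs(x - cents[y]))
            margin = abs(x - cents[ranked[-1]]) - abs(x - cents[ranked[0]])
            scored.append((margin, x, ranked[0]))
        scored.sort(reverse=True)
        for _, x, y in scored[:k]:
            labeled.append((x, y))
        promoted = {x for _, x, _ in scored[:k]}
        pool = [x for x in pool if x not in promoted]
    return labeled

# Toy 1-D scores: values near 0 resemble the healthy seed, near 10 the
# cancer seed; two unlabeled points are promoted per round (k=2).
out = self_train([(0.0, "healthy"), (10.0, "cancer")], [1.0, 2.0, 9.0, 8.5], k=2)
```

Promoting only the highest-margin points each round matters: early, confident labels refine the centroids before the harder, ambiguous points are judged.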
