1 |
Automated Machine Learning for Time Series Forecasting. Rosenberger, Daniel (26 April 2022)
Time series forecasting has become a common problem in day-to-day applications, and various machine learning algorithms have been developed to tackle it. Finding the model that forecasts best on a given dataset can be time-consuming, as multiple algorithms and hyperparameter configurations must be examined. This problem can be addressed with automated machine learning, an approach that automates all steps required for developing a machine learning model, including finding the best algorithm and hyperparameter configuration. This study develops an automated machine learning pipeline focused on finding the best forecasting model for a given dataset. This includes choosing forecasting algorithms that cover a wide range of tasks and identifying the best method for finding the best model among them. Finally, the pipeline is tested on a variety of datasets to evaluate its performance on time series data with different characteristics. (A minimal code sketch of such a pipeline follows the table of contents below.)
Abstract
List of Figures
List of Tables
List of Abbreviations
List of Symbols
1. Introduction
2. Theoretical Background
2.1. Machine Learning
2.2. Automated Machine Learning
2.3. Hyperparameter Optimization
2.3.1. Model-Free Methods
2.3.2. Bayesian Optimization
3. Time Series Forecasting Algorithms
3.1. Time Series Data
3.2. Baselines
3.2.1. Naive Forecast
3.2.2. Moving Average
3.3. Linear Regression
3.4. Autoregression
3.5. SARIMAX
3.6. XGBoost
3.7. LSTM Neural Network
4. Automated Machine Learning Pipeline
4.1. Data Preparation
4.2. Model Selection
4.3. Hyperparameter Optimization Method
4.3.1. Sequential Model-Based Algorithm Configuration
4.3.2. Tree-structured Parzen Estimator
4.3.3. Comparison of Bayesian Optimization Hyperparameter Optimization Methods
4.4. Pipeline Structure
5. Testing on external Datasets
5.1. Beijing PM2.5 Pollution
5.2. Perrin Freres Monthly Champagne Sales
6. Testing on internal Datasets
6.1. Deutsche Telekom Call Count
6.1.1. Comparison of Bayesian Optimization and Random Search
6.2. Deutsche Telekom Call Setup Time
7. Conclusion
Bibliography
A. Details Search Space
B. Pipeline Results - Predictions
C. Pipeline Results - Configurations
D. Pipeline Results - Experiment Details
E. Deutsche Telekom Data Usage Permissions
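To make the model-selection step concrete, the following is a minimal, hypothetical sketch of an AutoML-style search for a forecasting model: a few candidate algorithms and randomly sampled hyperparameter configurations are scored on a hold-out horizon, and the best-scoring configuration is returned. The candidate models, search strategy, and scoring choices here are illustrative assumptions, not the pipeline developed in the thesis.

```python
# Illustrative sketch of automated model selection for time series forecasting.
import numpy as np

def naive_forecast(history, horizon):
    # Repeat the last observed value over the forecast horizon.
    return np.repeat(history[-1], horizon)

def moving_average_forecast(history, horizon, window):
    # Forecast every step with the mean of the last `window` observations.
    return np.repeat(history[-window:].mean(), horizon)

def autoregressive_forecast(history, horizon, lags):
    # Fit AR coefficients by least squares, then forecast recursively.
    X = np.column_stack([history[i:len(history) - lags + i] for i in range(lags)])
    y = history[lags:]
    coef, *_ = np.linalg.lstsq(np.column_stack([np.ones(len(X)), X]), y, rcond=None)
    buf = list(history[-lags:])
    preds = []
    for _ in range(horizon):
        preds.append(coef[0] + np.dot(coef[1:], buf[-lags:]))
        buf.append(preds[-1])
    return np.array(preds)

def search_best_model(series, horizon=12, n_trials=30, seed=0):
    # Hold out the last `horizon` points and score candidate configurations by MAE.
    rng = np.random.default_rng(seed)
    train, test = series[:-horizon], series[-horizon:]
    candidates = [("naive", {})]
    for _ in range(n_trials):
        candidates.append(("moving_average", {"window": int(rng.integers(2, 25))}))
        candidates.append(("autoregression", {"lags": int(rng.integers(1, 13))}))
    best = None
    for name, params in candidates:
        if name == "naive":
            pred = naive_forecast(train, horizon)
        elif name == "moving_average":
            pred = moving_average_forecast(train, horizon, **params)
        else:
            pred = autoregressive_forecast(train, horizon, **params)
        mae = np.abs(pred - test).mean()
        if best is None or mae < best[0]:
            best = (mae, name, params)
    return best

if __name__ == "__main__":
    t = np.arange(200)
    series = 10 + 0.05 * t + 2 * np.sin(2 * np.pi * t / 12) \
        + np.random.default_rng(1).normal(0, 0.3, 200)
    print(search_best_model(series))
```

A real pipeline of this kind would typically replace the single hold-out split with rolling-origin evaluation and replace random search with a Bayesian optimizer, but the overall loop over algorithms and configurations is the same.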
|
2 |
Machine Learning and Multivariate Statistics for Optimizing Bioprocessing and Polyolefin Manufacturing. Agarwal, Aman (07 January 2022)
Chemical engineers have routinely used computational tools for modeling, optimizing, and debottlenecking chemical processes. Because of the advances in computational science over the past decade, multivariate statistics and machine learning have become an integral part of the computerization of chemical processes. In this research, we look into using multivariate statistics, machine learning tools, and their combinations through a series of case studies, including one with a successful industrial deployment of machine learning models for fermentation. We use both a commercial software tool, Aspen ProMV, and Python to demonstrate the feasibility of these computational tools.
This work demonstrates a novel application of ensemble-based machine learning methods in bioprocessing, particularly for predicting the fermenter type in a fermentation process (to allow for successful data integration) and predicting the onset of foaming. We apply two ensemble frameworks, Extreme Gradient Boosting (XGBoost) and Random Forest (RF), to build classification and regression models. Excessive foaming can interfere with the mixing of reactants and lead to problems such as reduced effective reactor volume, microbial contamination, product loss, and increased reaction time. Physical modeling of foaming is an arduous process, as it requires estimating foam height, which is dynamic in nature and varies across processes.
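As an illustration of the ensemble classification setup described above, the sketch below trains XGBoost and Random Forest classifiers on a synthetic stand-in for the fermentation data; the features, labels, and hyperparameters are assumptions for demonstration only.

```python
# Illustrative comparison of two ensemble classifiers for foaming-onset prediction.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score
from xgboost import XGBClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 12))  # stand-ins for e.g. temperature, pH, agitation, exhaust pressure
y = (X[:, 3] + 0.5 * X[:, 7] + rng.normal(scale=0.5, size=2000) > 0.8).astype(int)  # foaming label

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0, stratify=y)

models = {
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=0),
    "xgboost": XGBClassifier(n_estimators=300, max_depth=4, learning_rate=0.1,
                             eval_metric="logloss", random_state=0),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(name, "F1:", round(f1_score(y_te, model.predict(X_te)), 3))
```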
In addition to foaming prediction, we extend our work to controlling and preventing foaming through data-driven, ad hoc addition of antifoam, using exhaust differential pressure as an indicator of foaming. We use large-scale real fermentation data for six types of sporulating microorganisms to predict foaming across multiple strains and build exploratory, time-series-driven antifoam profiles for four different fermenter types. To predict the antifoam addition from this large multivariate dataset (about half a million instances across 163 batches), we use TPOT (Tree-based Pipeline Optimization Tool), an automated genetic-programming algorithm, to select the best pipeline from 600 candidate pipelines. Our antifoam profiles decrease hourly volume retention by over 53% for a specific fermenter; a decrease in hourly volume retention leads to an increase in fermentation product yield.
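The snippet below sketches how TPOT's genetic-programming search might be invoked for such a regression task; the file name, target column, and search budget are hypothetical, not the configuration used in the study.

```python
# Sketch of a TPOT pipeline search for an antifoam-addition regression target.
import pandas as pd
from sklearn.model_selection import train_test_split
from tpot import TPOTRegressor

df = pd.read_csv("fermentation_batches.csv")   # hypothetical dataset
X = df.drop(columns=["antifoam_added"])        # hypothetical target column
y = df["antifoam_added"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# population_size x generations bounds how many candidate pipelines are evaluated.
tpot = TPOTRegressor(generations=10, population_size=60, cv=5,
                     scoring="neg_mean_absolute_error",
                     random_state=42, verbosity=2, n_jobs=-1)
tpot.fit(X_tr, y_tr)
print(tpot.score(X_te, y_te))
tpot.export("best_antifoam_pipeline.py")       # writes the winning pipeline as Python code
```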
We also study two different cases associated with the manufacturing of polyolefins, particularly LDPE (low-density polyethylene) and HDPE (high-density polyethylene). Through these cases, we showcase the usage of machine learning and multivariate statistical tools to improve process understanding and enhance the predictive capability for process optimization.
By using indirect measurements such as temperature profiles, we demonstrate the viability of such measures for the prediction of polyolefin quality parameters, anomaly detection, and statistical monitoring and control of the chemical processes associated with an LDPE plant. We use dimensionality reduction, visualization tools, and regression analysis to achieve our goals. Using advanced analytical tools and a combination of algorithms such as PCA (Principal Component Analysis), PLS (Partial Least Squares), and Random Forest, we identify predictive models that can be used to create inferential schemes.
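A minimal sketch of this kind of latent-variable workflow is shown below, assuming simulated temperature-profile data: PCA scores for monitoring and visualization, and a PLS model as the inferential predictor of a quality parameter.

```python
# Illustrative PCA + PLS workflow on simulated stand-ins for temperature profiles.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cross_decomposition import PLSRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 40))                              # e.g. reactor temperature readings
y = X[:, :5].sum(axis=1) + rng.normal(scale=0.3, size=500)  # e.g. a polymer quality parameter

Xs = StandardScaler().fit_transform(X)

# PCA scores for visualization and anomaly detection (e.g. score plots, T^2 charts).
pca = PCA(n_components=3).fit(Xs)
scores = pca.transform(Xs)
print("explained variance:", pca.explained_variance_ratio_.round(3))

# PLS regression as an inferential model for the quality parameter.
pls = PLSRegression(n_components=3).fit(Xs, y)
print("PLS R^2:", round(pls.score(Xs, y), 3))
```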
Soft sensors are widely used for online monitoring and real-time prediction of process variables. In one of our cases, we use advanced machine learning algorithms to predict the polymer melt index, which is crucial in determining product quality. We use real industrial data from one of the leading chemical engineering companies in the Asia-Pacific region to build a predictive model for an HDPE plant. Lastly, we show an end-to-end workflow for deep learning on both industrial and simulated polyolefin datasets.
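The following is a hedged sketch of a soft-sensor-style melt-index model on placeholder data; the process variables, model choice, and chronological validation split are assumptions rather than the industrial HDPE model built in the work.

```python
# Illustrative melt-index soft sensor trained on simulated process variables.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(1)
n = 3000
X = rng.normal(size=(n, 8))  # stand-ins for e.g. H2/ethylene ratio, reactor T, pressure, catalyst feed
melt_index = 2.0 + 0.8 * X[:, 0] - 0.3 * X[:, 2] + rng.normal(scale=0.1, size=n)

# Chronological split: a soft sensor should be validated on later data, not a random shuffle.
split = int(0.8 * n)
model = GradientBoostingRegressor(random_state=0).fit(X[:split], melt_index[:split])
pred = model.predict(X[split:])
print("MAE on held-out period:", round(mean_absolute_error(melt_index[split:], pred), 4))
```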
Thus, using these five cases, we explore the usage of advanced machine learning and multivariate statistical techniques in the optimization of chemical and biochemical processes. The recent advances in computational hardware allow engineers to design such data-driven models, which enhances their capacity to effectively and efficiently monitor and control a process. We showcase that even non-expert chemical engineers can implement such machine learning algorithms with ease using open-source or commercially available software tools. / Doctor of Philosophy / Most chemical and biochemical processes are equipped with advanced probes and connectivity sensors that collect large amounts of data on a daily basis. It is critical to manage and utilize the significant amount of data collected from the start and throughout the development and manufacturing cycle. Chemical engineers have routinely used computational tools for modeling, designing, optimizing, debottlenecking, and troubleshooting chemical processes. Herein, we present different applications of machine learning and multivariate statistics using industrial datasets.
This dissertation also includes a deployed industrial solution to mitigate foaming in commercial fermentation reactors as a proof-of-concept (PoC). Our antifoam profiles are able to decrease volume loss by over 53% for a specific fermenter. Throughout this dissertation, we demonstrate applications of several techniques like ensemble methods, automated machine learning, exploratory time series, and deep learning for solving industrial problems. Our aim is to bridge the gap from industrial data acquisition to finding meaningful insights for process optimization.
|
3 |
Optimalizace hyperparametrů v systémech automatického strojového učení / Hyperparameter optimization in AutoML systems. Pešková, Klára (January 2019)
In the last few years, as processing data has become part of everyday life in different areas of human activity, automated machine learning systems designed to help with the process of data mining are on the rise. Various metalearning techniques are integrated into these systems to help researchers with machine learning tasks, including recommending the right method to use or the sequence of steps to take, and finding its optimal hyperparameter configuration. In this thesis, we propose metalearning algorithms and techniques for hyperparameter optimization, for narrowing the intervals of hyperparameters, and for recommending a machine learning method for a never-before-seen dataset. We designed two AutoML systems in which these metalearning techniques are implemented, proposed an extensive set of experiments to evaluate the algorithms, and present the results.
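As a toy illustration of the method-recommendation idea, the sketch below picks a method for an unseen dataset by nearest-neighbor search over simple dataset meta-features; the meta-features, stored results, and similarity measure are assumptions, not the metalearning algorithms proposed in the thesis.

```python
# Toy metalearning recommender: suggest a method for a new dataset based on
# the best-known methods of the most similar previously seen datasets.
import numpy as np

# Meta-features of previously processed datasets: (n_rows, n_features, class_balance)
known_meta = np.array([
    [1_000,  10, 0.50],
    [50_000, 300, 0.05],
    [5_000,  20, 0.45],
    [80_000, 500, 0.10],
])
# Best-performing method recorded for each of those datasets.
known_best = ["random_forest", "gradient_boosting", "random_forest", "linear_svm"]

def recommend(meta_new, k=1):
    # Scale each meta-feature to [0, 1] so row counts do not dominate the distance.
    stacked = np.vstack([known_meta, meta_new])
    lo, hi = stacked.min(axis=0), stacked.max(axis=0)
    scaled = (stacked - lo) / np.where(hi > lo, hi - lo, 1.0)
    dists = np.linalg.norm(scaled[:-1] - scaled[-1], axis=1)
    nearest = np.argsort(dists)[:k]
    return [known_best[i] for i in nearest]

print(recommend(np.array([60_000, 350, 0.08])))  # -> ['gradient_boosting'] for this toy setup
```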
|
4 |
Automatic Feature Extraction for Human Activity Recognition on the Edge. Cleve, Oscar; Gustafsson, Sara (January 2019)
This thesis evaluates two methods for automatic feature extraction to classify accelerometer data from periodic and sporadic human activities. The first method selects features using individual hypothesis tests, combined in this study with a correlation filter; the second uses a random forest classifier as an embedded feature selector. Both methods start from the same pool of automatically generated time series features, and a decision tree classifier performs the human activity recognition task in both cases. The possibility of running the developed model on a processor with limited computing power was taken into consideration when selecting methods for evaluation. The classification results showed that the random forest method was good at prioritizing among features: with 23 features selected, it achieved a macro-average F1 score of 0.84 and a weighted-average F1 score of 0.93. The first method only reached a macro-average F1 score of 0.40 and a weighted-average F1 score of 0.63 when using the same number of features. In addition to classification performance, this thesis also studies the potential business benefits that automation of feature extraction can bring.
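The sketch below mirrors that comparison on synthetic data: univariate hypothesis tests plus a correlation filter versus random forest feature importances, each feeding a decision tree classifier scored with macro and weighted F1. The dataset, correlation threshold, and feature count are placeholders, not the accelerometer features used in the thesis.

```python
# Illustrative comparison of two feature-selection strategies for a tree classifier.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=3000, n_features=100, n_informative=15,
                           n_classes=4, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0, stratify=y)
n_keep = 23

# Method 1: univariate F-tests, then drop features highly correlated with an already kept one.
ranked = np.argsort(SelectKBest(f_classif, k="all").fit(X_tr, y_tr).scores_)[::-1]
kept = []
for idx in ranked:
    if all(abs(np.corrcoef(X_tr[:, idx], X_tr[:, j])[0, 1]) < 0.9 for j in kept):
        kept.append(idx)
    if len(kept) == n_keep:
        break

# Method 2: embedded selection via random forest feature importances.
rf = RandomForestClassifier(n_estimators=300, random_state=0).fit(X_tr, y_tr)
rf_top = np.argsort(rf.feature_importances_)[::-1][:n_keep]

for name, cols in [("hypothesis test + correlation filter", kept), ("random forest", rf_top)]:
    tree = DecisionTreeClassifier(random_state=0).fit(X_tr[:, cols], y_tr)
    pred = tree.predict(X_te[:, cols])
    print(name, "macro F1:", round(f1_score(y_te, pred, average="macro"), 2),
          "weighted F1:", round(f1_score(y_te, pred, average="weighted"), 2))
```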
|
5 |
A Comparison of AutoML Hyperparameter Optimization Tools for Tabular Data. Pokhrel, Prativa (02 May 2023)
No description available.
|
6 |
Maximizing the performance of point cloud 4D panoptic segmentation using AutoML technique / Maximera prestandan för punktmoln 4D panoptisk segmentering med hjälp av AutoML-teknik. Ma, Teng (January 2022)
Environment perception is crucial to autonomous driving. Panoptic segmentation and object tracking are two challenging tasks, and their combination, 4D panoptic segmentation, has recently drawn researchers' attention. In this work, we implement 4D panoptic LiDAR segmentation (4D-PLS) on Volvo datasets and provide a pipeline covering data preparation, model building, and model optimization. The main contributions of this work include: (1) building the Volvo datasets; (2) adopting a 4D-PLS model improved by hyperparameter optimization (HPO). We annotate point cloud data collected from Volvo CE and take a supervised learning approach, employing a deep neural network (DNN) to extract features from the point cloud data. On the basis of the 4D-PLS model, we employ Bayesian optimization to find the best hyperparameters for our data and improve model performance within a small training budget.
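A budget-limited hyperparameter search of this kind could look like the sketch below, which uses Optuna's TPE sampler as a stand-in Bayesian-style optimizer and a dummy objective; the hyperparameters, ranges, and objective are assumptions, since the actual 4D-PLS training loop is not shown here.

```python
# Sketch of a small-budget hyperparameter search with Optuna.
import optuna

def objective(trial):
    # Hypothetical hyperparameters one might tune for a segmentation DNN.
    lr = trial.suggest_float("learning_rate", 1e-4, 1e-1, log=True)
    batch_size = trial.suggest_categorical("batch_size", [2, 4, 8])
    weight_decay = trial.suggest_float("weight_decay", 1e-6, 1e-2, log=True)
    # Placeholder score; in practice this would train the model briefly and
    # return a validation metric such as segmentation or panoptic quality.
    return (0.6 - abs(lr - 0.01)) + 0.01 * batch_size - 10 * weight_decay

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=20)   # small trial budget
print(study.best_params, round(study.best_value, 4))
```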
|
7 |
Predicting Consumer Purchase Behavior using Automatic Machine Learning: A case study in online purchase flows / Prediktering av Konsumentbeteenden med Automatisk Maskininlärning: En fallstudie i onlinebaserade köpflöden. Sandström, Olle (January 2022)
Online payment purchase flows are designed to be as effective and smooth as possible with regard to the user experience. The user is at the center of this process and, to a certain degree, decides whether the purchase will eventually be placed. What is left to the payment provider is enabling an effective purchase flow in which information needs to be collected for various purposes. To design these purchase flows as efficiently as possible, this research investigates whether and how consumer purchase behavior can be predicted: which algorithms perform best at modeling the outcome, and which underlying features can be used to do so? The features are graded by their importance to see how, and how much, they affect the best-performing model. To investigate consumer behavior, the task was set up as a supervised binary classification problem modeling the outcome of user purchase sessions: either a session results in a purchase or it does not. Several automatic machine learning (also referred to as automated machine learning) frameworks were considered before H2O AutoML was chosen because of its historical performance on other supervised binary classification problems. The dataset contained information from user sessions relating to the consumer, the transaction, and the time when the purchase was initiated. These variables were in either numerical or categorical format and were evaluated using the SHAP importance metric as well as an aggregated SHAP summary plot, which describes how features affect the model. The results show that the Distributed Random Forest algorithm performed best, improving accuracy by 26 percentage points over an undersampled baseline of 50% when predicting whether a session will be converted into a purchase. Furthermore, two of the most important features according to the model were categorical features related to the intersection of consumer and transaction information, and another time-based categorical variable also proved important in the model's predictions. The research also shows that automatic machine learning has come a long way in the pre-processing of variables, enabling model developers to deploy these kinds of machine learning solutions more efficiently. The results echo earlier findings confirming the possibility of predicting consumer purchase behavior and, in particular, the outcome of a consumer session in a purchase flow. This implies that payment providers could hypothetically use these kinds of insights and predictions when developing their flows to cater to specific groups of consumers individually, enabling a more efficient and personalized payment flow.
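A minimal sketch of the H2O AutoML setup described above follows; the file name, column names, and undersampling step are hypothetical stand-ins for the session data.

```python
# Sketch of training an H2O AutoML leaderboard on binary purchase-conversion data.
import h2o
from h2o.automl import H2OAutoML

h2o.init()
frame = h2o.import_file("purchase_sessions.csv")     # hypothetical, pre-undersampled to ~50/50
frame["converted"] = frame["converted"].asfactor()   # binary target: purchase vs. no purchase

train, test = frame.split_frame(ratios=[0.8], seed=1)
features = [c for c in frame.columns if c != "converted"]

aml = H2OAutoML(max_models=20, seed=1, sort_metric="AUC")
aml.train(x=features, y="converted", training_frame=train)

print(aml.leaderboard.head())                        # DRF, GBM, GLM, stacked ensembles, ...
print(aml.leader.model_performance(test).accuracy())
```

Feature effects of the leading model can then be inspected with SHAP-based importance tools, as described in the abstract.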
|
8 |
The Allure of Automated Machine Learning Services: How SMEs and non-expert users can benefit from AutoML frameworks. Lux Dryselius, Felix (January 2023)
This study investigates how small and medium-sized enterprises (SMEs) and other resource-constrained organisations can utilise automated machine learning (AutoML) to lessen the hurdles associated with machine learning model development. This is achieved by comparing the performance, cost of usage, usability, and documentation of machine learning models developed through two AutoML frameworks: Vertex AI on Google Cloud™ and the open-source library AutoGluon, developed by Amazon Web Services. The study also presents a roadmap and a time plan that resource-constrained enterprises can use to guide the development of machine learning solutions implemented through AutoML frameworks. The results show that AutoML frameworks are easy to use and capable of generating machine learning models. However, performance is not guaranteed, and machine learning projects utilising AutoML frameworks still necessitate substantial development effort. Furthermore, the limiting factor in model performance is often training data quality, which AutoML frameworks do not address.
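For the AutoGluon side of such a comparison, a minimal tabular run might look like the sketch below; the CSV paths, label column, and time budget are assumptions, not the study's configuration.

```python
# Sketch of a basic AutoGluon tabular training run with a fixed time budget.
from autogluon.tabular import TabularDataset, TabularPredictor

train_data = TabularDataset("train.csv")   # hypothetical CSV with a "label" column
test_data = TabularDataset("test.csv")

predictor = TabularPredictor(label="label", eval_metric="accuracy").fit(
    train_data,
    time_limit=600,           # cap total training time (seconds)
    presets="best_quality",   # trade training cost for accuracy
)
print(predictor.leaderboard(test_data))
print(predictor.evaluate(test_data))
```

Capping `time_limit` is one simple way a resource-constrained team can keep compute cost predictable while still letting the framework try multiple model families.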
|