11 |
Prediction of battery lifetime using early cycle data : A data driven approach
Enholm, Isabelle; Valfridsson, Olivia, January 2022 (has links)
Laboratory tests are performed to determine battery degradation due to charging and discharging of batteries (cycling). This is done as part of quality assurance in battery production, since a certain amount of degradation corresponds to the end of the battery lifetime. Currently, this requires a significant amount of cycling. Thus, if the number of cycles required can be decreased, the time and costs for battery degradation testing can be reduced. The aim of this thesis is therefore to create a model for predicting battery lifetime using early cycle data. Further, to assist planning regarding the scale of cycle testing, this study aims to examine the impact of implementing such a prediction model in production. To examine which data-driven model should be used to predict battery lifetime at the company, extensive feature engineering is performed, where measurements from specific cycles are used, inspired by the previous work of Severson et al. (2019) and Fei et al. (2021). Two models are then examined: Linear Regression with Elastic Net and Support Vector Regression. To investigate the extent to which an implementation of such a model can affect battery testing capacity, two scenarios are compared: the current cycle testing at the company, and an implementation of the prediction model. The comparison examines the time required for battery testing and the number of machines needed to cycle the batteries (cyclers). Based on the results obtained, the data-driven model that should be implemented is a Support Vector Regression model with features relating to different battery cycling phases or measurements, such as the charge process, temperature, and capacity. It can also be shown that if a battery lifetime prediction model is implemented, it can reduce the time and number of cyclers required for testing by approximately 93% compared to traditional testing.
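The model comparison described above can be sketched with scikit-learn; this is a minimal illustration on synthetic data, not the authors' actual pipeline or engineered features:

```python
# Minimal sketch: compare an Elastic Net regression and a Support Vector
# Regression by cross-validated mean absolute error. The synthetic data
# stands in for engineered early-cycle features and battery lifetime.
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_regression(n_samples=120, n_features=20, noise=10.0, random_state=0)

models = {
    "elastic_net": make_pipeline(StandardScaler(), ElasticNet(alpha=0.5, l1_ratio=0.5)),
    "svr": make_pipeline(StandardScaler(), SVR(kernel="rbf", C=10.0)),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
    print(f"{name}: MAE = {-scores.mean():.1f}")
```

The hyper-parameter values here (alpha, l1_ratio, C) are placeholders; in practice they would be tuned by cross-validation.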
|
12 |
Sparse Ridge Fusion For Linear Regression
Mahmood, Nozad, 01 January 2013 (has links)
For a linear regression, the traditional technique deals with the case where the number of observations n exceeds the number of predictor variables p (n > p). When n < p, the classical method fails to estimate the coefficients. This thesis provides a solution to the problem for the case of correlated predictors. A new regularization and variable selection method is proposed under the name Sparse Ridge Fusion (SRF). In the case of highly correlated predictors, simulated examples and a real data set show that the SRF consistently outperforms the lasso, the elastic net, and the S-Lasso; the results also show that the SRF can select more predictor variables than the sample size n, whereas the lasso can select at most n variables.
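The lasso's saturation at n selected variables in the n < p setting, which motivates methods like the SRF, can be illustrated with scikit-learn. The SRF itself is not implemented here; this only contrasts the lasso and elastic net baselines on synthetic data:

```python
# Hedged illustration (not the SRF method): with p > n, the lasso solution
# selects at most n variables, while the elastic net, whose ridge component
# handles correlated predictors, can retain more than n.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, ElasticNet

n, p = 30, 100
X, y = make_regression(n_samples=n, n_features=p, n_informative=50, random_state=1)

lasso = Lasso(alpha=1.0, max_iter=50000).fit(X, y)
enet = ElasticNet(alpha=1.0, l1_ratio=0.2, max_iter=50000).fit(X, y)

n_lasso = int(np.sum(lasso.coef_ != 0))
n_enet = int(np.sum(enet.coef_ != 0))
print(f"lasso selected {n_lasso} of {p} variables, elastic net selected {n_enet}")
```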
|
13 |
Contributions to the Interface between Experimental Design and Machine Learning
Lian, Jiayi, 31 July 2023 (has links)
In data science, machine learning methods, such as deep learning and other AI algorithms, have been widely used in many applications. These machine learning methods often have complicated model structures with a large number of model parameters and a set of hyper-parameters. Moreover, these machine learning methods are data-driven in nature. Thus, it is not easy to provide a comprehensive evaluation of the performance of these machine learning methods with respect to data quality and the hyper-parameters of the algorithms. In the statistical literature, design of experiments (DoE) is a set of systematic methods to effectively investigate the effects of input factors in complex systems. Few works focus on the use of DoE methodology for evaluating the quality assurance of AI algorithms, even though an AI algorithm is naturally a complex system. An understanding of the quality of Artificial Intelligence (AI) algorithms is important for confidently deploying them in real applications such as cybersecurity, healthcare, and autonomous driving. In this proposal, I aim to develop a set of novel methods on the interface between experimental design and machine learning, providing a systematic framework for using DoE methodology for AI algorithms.
This proposal contains six chapters. Chapter 1 provides a general introduction to design of experiments, machine learning, and surrogate modeling. Chapter 2 focuses on investigating the robustness of AI classification algorithms by conducting a comprehensive set of mixture experiments. Chapter 3 proposes a so-called Do-AIQ framework for using DoE to evaluate an AI algorithm's quality assurance. I establish a design-of-experiments framework to construct an efficient space-filling design in a high-dimensional constrained space and develop an effective surrogate model using an additive Gaussian process to enable the quality assessment of AI algorithms. Chapter 4 introduces a framework to generate continual learning (CL) datasets for cybersecurity applications. Chapter 5 presents a variable selection method under a cumulative exposure model for time-to-event data with time-varying covariates. Chapter 6 provides the summary of the entire dissertation. / Doctor of Philosophy / Artificial intelligence (AI) techniques, including machine learning and deep learning algorithms, are widely used in various applications in the era of big data. While these algorithms have impressed the public with their remarkable performance, their underlying mechanisms are often highly complex and difficult to interpret. As a result, it becomes challenging to comprehensively evaluate the overall performance and quality of these algorithms. Design of Experiments (DoE) offers a valuable set of tools for studying and understanding the underlying mechanisms of complex systems, thereby facilitating improvements. DoE has been successfully applied in diverse areas such as manufacturing, agriculture, and healthcare, where it has played a crucial role in enhancing processes and ensuring high quality. However, few works focus on the use of DoE methodology for evaluating the quality assurance of AI algorithms, where an AI algorithm can naturally be considered a complex system.
This dissertation aims to develop innovative methodologies on the interface between experimental design and machine learning. The research conducted in this dissertation can serve as practical tools to use DoE methodology in the context of AI algorithms.
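The generic DoE-plus-surrogate workflow the proposal builds on can be sketched as follows, assuming a SciPy/scikit-learn stack. A standard RBF Gaussian process stands in for the additive Gaussian process of Chapter 3, and the response is a hypothetical stand-in for an AI algorithm's quality metric:

```python
# Sketch of the DoE-plus-surrogate workflow: a space-filling (Latin
# hypercube) design over a factor space, then a Gaussian process surrogate
# fit to the observed response at the design points.
import numpy as np
from scipy.stats import qmc
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# 40-run space-filling design in a 3-dimensional factor space [0, 1]^3.
sampler = qmc.LatinHypercube(d=3, seed=0)
design = sampler.random(n=40)

# Hypothetical response surface standing in for an AI algorithm's accuracy
# as a function of data quality and hyper-parameter factors.
response = np.sin(design[:, 0] * np.pi) + 0.5 * design[:, 1] - 0.2 * design[:, 2]

surrogate = GaussianProcessRegressor(kernel=RBF(length_scale=0.5)).fit(design, response)
pred, std = surrogate.predict([[0.5, 0.5, 0.5]], return_std=True)
print(f"predicted response: {pred[0]:.3f} +/- {std[0]:.3f}")
```

The surrogate can then be queried cheaply anywhere in the factor space, which is what makes the subsequent quality assessment tractable.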
|
14 |
Regularisation and variable selection using penalized likelihood / Régularisation et sélection de variables par le biais de la vraisemblance pénalisée
El anbari, Mohammed, 14 December 2011 (has links)
We are interested in variable selection in linear regression models. This research is motivated in particular by recent developments in genomics, proteomics, biomedical imaging, signal processing, image processing, marketing, and other fields. We study this problem from both frequentist and Bayesian viewpoints. In a frequentist framework, we propose methods to deal with the problem of variable selection when the number of variables may be much larger than the sample size, possibly with additional structure in the predictor variables, such as high correlations or an ordering between successive variables. The theoretical performance of the proposed methods is investigated; we prove that, under regularity conditions, the proposed estimators possess good statistical properties, such as sparsity oracle inequalities, variable selection consistency, and asymptotic normality. In a Bayesian framework, we propose a global approach for Bayesian variable selection in regression, built on Zellner's g-priors, in an approach similar but not identical to that of Liang et al. (2008); our choice requires no calibration. We pay special attention to two calibration-free hierarchical Zellner's g-priors: the first is the Jeffreys prior, which is not location invariant; the second avoids this problem by considering only models containing at least one variable. The practical performance of the proposed methods is illustrated through numerical experiments on simulated and real-world datasets, comparing the Bayesian and frequentist regularisation approaches in a low-information setting where the number of variables is almost equal to the number of observations.
|
15 |
Utvärdering av maskininlärningsmodeller för riktad marknadsföring inom dagligvaruhandeln / Evaluation of machine learning methods for direct marketing within the FMCG trade
Sundström, Ebba; Goodbrand Skagerlind, Valentin, January 2020 (has links)
Companies within the FMCG trade often use database marketing to customize offers to each customer, thereby strengthening customer relationships and increasing sales. For a long time, logistic regression has been the preferred machine learning method to predict which offer to present to each customer. This study evaluates a machine learning model based on logistic regression and stepwise selection on customer data from one of Sweden's larger companies within the FMCG trade. The model is then compared to another model based on the elastic net method, which is a regularized regression method. The models are tested on five different products from the company's assortment and are based on about fifty variables describing the customers' sociodemographic factors and purchasing history in the company's stores. The models are evaluated using a confusion matrix and values for their Accuracy, Balanced Accuracy, Precision, Recall, and F1-score. Furthermore, the models are evaluated from the perspectives of business value, customer relations, and sustainability. The study showed that the logistic regression model with stepwise selection had an average Precision of 23 percent. When the elastic net method was used, Precision increased by approximately 7 percentage points on average across all models. This may be because some of the parameters in the stepwise selection model take on excessively large values, and because stepwise selection chooses a subset of features that is not optimal for predicting consumer behaviour. It was also noted that customers generally seemed content with the offers they received, but were dissatisfied if they felt misunderstood by the company.
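An elastic-net-penalized logistic regression evaluated with a confusion matrix and Precision, as in the study, can be sketched with scikit-learn; the synthetic data below merely stands in for the company's customer data:

```python
# Sketch: elastic-net logistic regression (saga solver) evaluated on a
# held-out test set with a confusion matrix and Precision. Synthetic data
# stands in for the ~50 sociodemographic and purchase-history variables.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, precision_score

X, y = make_classification(n_samples=500, n_features=50, n_informative=10,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

clf = LogisticRegression(penalty="elasticnet", solver="saga", l1_ratio=0.5,
                         C=1.0, max_iter=5000).fit(X_tr, y_tr)
y_hat = clf.predict(X_te)
print(confusion_matrix(y_te, y_hat))
print(f"precision: {precision_score(y_te, y_hat):.2f}")
```

Unlike stepwise selection, the l1 part of the penalty shrinks unhelpful coefficients exactly to zero, which is the regularization effect the study credits for the improved Precision.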
|
16 |
Regularisation and variable selection using penalized likelihood.
El anbari, Mohammed, 14 December 2011 (has links) (PDF)
We are interested in variable selection in linear regression models. This research is motivated by recent developments in microarrays, proteomics, brain imaging, and other fields. We study this problem from both frequentist and Bayesian viewpoints. In a frequentist framework, we propose methods to deal with the problem of variable selection when the number of variables is much larger than the sample size, possibly with additional structure in the predictor variables, such as high correlations or an ordering between successive variables. The theoretical performance of the proposed methods is investigated; we prove that, under regularity conditions, the proposed estimators possess good statistical properties, such as sparsity oracle inequalities, variable selection consistency, and asymptotic normality. In a Bayesian framework, we propose a global noninformative approach for Bayesian variable selection. In this thesis, we pay special attention to two calibration-free hierarchical Zellner's g-priors. The first is the Jeffreys prior, which is not location invariant. The second avoids this problem by considering only models containing at least one variable. The practical performance of the proposed methods is illustrated through numerical experiments on simulated and real-world datasets, with a comparison between the Bayesian and frequentist approaches under a low-information constraint when the number of variables is almost equal to the number of observations.
|
17 |
Prediction with Penalized Logistic Regression : An Application on COVID-19 Patient Gender based on Case Series Data
Schwarz, Patrick, January 2021 (has links)
The aim of the study was to evaluate different types of logistic regression to find the optimal model to predict the gender of hospitalized COVID-19 patients. The models were based on COVID-19 case series data from Pakistan, using a set of 18 explanatory variables, of which patient age and BMI were numerical and the rest were categorical variables expressing symptoms and previous health issues. The compared models were: a logistic regression using all variables; a logistic regression using stepwise variable selection with 4 explanatory variables; a logistic Ridge regression model; a logistic Lasso regression model; and a logistic Elastic Net regression model. Based on several metrics assessing the goodness of fit of the models, and an evaluation of predictive power using the area under the ROC curve, the Elastic Net model that used only the Lasso penalty had the best result and was able to predict 82.5% of the test cases correctly.
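Comparing penalized logistic models by cross-validated area under the ROC curve can be sketched as follows; this is an illustration on synthetic data, not the study's case series:

```python
# Sketch: rank ridge-, lasso-, and elastic-net-penalized logistic
# regressions by mean cross-validated ROC AUC. Synthetic binary-outcome
# data with 18 features mirrors the study's setup only in shape.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=400, n_features=18, random_state=0)

candidates = {
    "ridge": LogisticRegression(penalty="l2", solver="saga", max_iter=5000),
    "lasso": LogisticRegression(penalty="l1", solver="saga", max_iter=5000),
    "elastic_net": LogisticRegression(penalty="elasticnet", solver="saga",
                                      l1_ratio=0.5, max_iter=5000),
}
for name, model in candidates.items():
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: AUC = {auc:.3f}")
```

Note that an elastic net with l1_ratio=1.0 reduces to the pure Lasso penalty, which is the configuration the study found best.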
|
18 |
Predicting deliveries from suppliers : A comparison of predictive models
Sawert, Marcus, January 2020 (has links)
In the highly competitive environment that companies find themselves in today, it is key to have a well-functioning supply chain. For manufacturing companies, having a good supply chain depends on having functioning production planning. Production planning tries to fulfill demand while considering the resources available. This is complicated by the uncertainties that exist, such as uncertainty in demand, in manufacturing, and in supply. Several methods and models have been created to deal with production planning under uncertainty, but they often overlook the complexity of supply uncertainty by treating it as purely stochastic. To improve these models, a prediction based on earlier data regarding the supplier or item could be used to see when a delivery is likely to arrive. This study compared different predictive models to see which is best suited for this purpose. Historical data on earlier deliveries was gathered from a large international manufacturing company and preprocessed before being used in the models. The target value the models were to predict was the actual delivery time from the supplier. The data was then tested with the following four regression models in Python: Linear regression, Ridge regression, Lasso, and Elastic net. The results were calculated by cross-validation and presented in the form of the mean absolute error together with the standard deviation. The results showed that the Elastic net was the overall best performing model, and that linear regression performed worst.
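The comparison described above, four regression models scored by cross-validated mean absolute error, can be sketched in Python with scikit-learn, with synthetic data standing in for the historical delivery records:

```python
# Sketch of the four-model comparison: linear, Ridge, Lasso, and Elastic
# net regression, each scored by 5-fold cross-validated MAE (mean and
# standard deviation), as in the study's evaluation.
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression, Ridge, Lasso, ElasticNet
from sklearn.model_selection import cross_val_score

# Synthetic stand-in: features about the supplier/item, target = delivery time.
X, y = make_regression(n_samples=300, n_features=15, noise=5.0, random_state=0)

models = {
    "linear": LinearRegression(),
    "ridge": Ridge(alpha=1.0),
    "lasso": Lasso(alpha=0.1),
    "elastic_net": ElasticNet(alpha=0.1, l1_ratio=0.5),
}
for name, model in models.items():
    scores = -cross_val_score(model, X, y, cv=5, scoring="neg_mean_absolute_error")
    print(f"{name}: MAE = {scores.mean():.2f} (sd {scores.std():.2f})")
```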
|
19 |
Exploring relevant features associated with measles nonvaccination using a machine learning approach
Olaya Bucaro, Orlando, January 2020 (has links)
Measles is resurging around the world, and large outbreaks have been observed in several parts of the world. In 2019, the Philippines suffered a major measles outbreak, partly due to low immunization rates in certain parts of the population. There is currently limited research on how to effectively identify and reach pockets of unvaccinated individuals. This thesis aims to find important factors associated with non-vaccination against measles using a machine learning approach, using data from the 2017 Philippine National Demographic and Health Survey. In the analyzed sample (n = 4006), 74.84% of children aged 9 months to 3 years had received their first dose of measles vaccine, and 25.16% had not. A logistic regression with all 536 candidate features was fit with the regularized regression method Elastic Net, which is capable of automatically selecting relevant features. The final model consists of 32 predictors, related to access to and contact with healthcare, region of residence, wealth, education, religion, ethnicity, sanitary conditions, the ideal number of children, the husband's occupation, the age and weight of the child, and features relating to pre- and postnatal care. The total accuracy of the final model is 79.02% [95% confidence interval: (76.37%, 81.5%)], sensitivity: 97.73%, specificity: 23.41%, and area under the receiver operating characteristic curve: 0.81. The results indicate that socioeconomic differences determine measles vaccination to a degree. However, the difficulty in classifying non-vaccinated children using only health and demographic characteristics, reflected in the low specificity, suggests that factors not available in the analyzed data, possibly vaccine hesitancy, could have a large effect on measles non-vaccination. Based on the results, efforts should be made to ensure access to facility-based delivery for all mothers regardless of socioeconomic status, to improve measles vaccination rates in the Philippines.
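The automatic feature selection that the thesis relies on, where the elastic net shrinks the coefficients of irrelevant predictors exactly to zero, can be sketched with scikit-learn; the data and settings below are illustrative stand-ins, not the survey data:

```python
# Sketch: elastic-net logistic regression as a feature selector on an
# imbalanced binary outcome (synthetic stand-in for vaccinated vs. not).
# Predictors with coefficients shrunk to zero drop out of the final model;
# sensitivity and specificity are read off the confusion matrix.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

X, y = make_classification(n_samples=1000, n_features=100, n_informative=15,
                           weights=[0.25, 0.75], random_state=0)
clf = LogisticRegression(penalty="elasticnet", solver="saga", l1_ratio=0.9,
                         C=0.1, max_iter=10000).fit(X, y)

selected = np.flatnonzero(clf.coef_[0])
print(f"{len(selected)} of {X.shape[1]} features retained")

tn, fp, fn, tp = confusion_matrix(y, clf.predict(X)).ravel()
print(f"sensitivity: {tp / (tp + fn):.2f}, specificity: {tn / (tn + fp):.2f}")
```

A high l1_ratio and small C push the fit toward sparsity; the class imbalance in the synthetic data mirrors why a model can reach high sensitivity on the majority class while specificity on the minority class stays low, as the thesis observed.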
|
20 |
Authentication of Complex Botanical Materials by Chemometrics and Chemical Profiling
Chen, Zewei, 25 May 2021 (has links)
No description available.
|