1 |
Defining and predicting fast-selling clothing options
Jesperson, Sara, January 2019
This thesis aims to define fast-selling clothing options and to predict them using only a few weeks of sales data as input. The data used for this project contain daily sales and intake quantities for seasonal options with sales starting in 2016-2018, provided by the department store chain Åhléns. A definition is found that describes fast-selling clothing options as those having sold a certain percentage of their intake after a fixed number of days; an alternative definition based on cluster affiliation is shown to be less effective. Two predictive models are tested: a probabilistic classifier and a k-nearest neighbor classifier using Euclidean distance. The probabilistic model consists of three steps: transformation, clustering, and classification. The time series are transformed with B-splines to reduce dimensionality, so that each time series is represented by a vector containing its length and B-spline coefficients. To improve the quality of the predictions, the B-spline vectors are clustered with a Gaussian mixture model, and every cluster is assigned one of the two labels, fast-selling or ordinary, dividing the clusters into two disjoint sets: one containing the fast-selling clusters and the other the ordinary clusters. Lastly, the time series to be predicted are assumed to be Laplace distributed around a B-spline; using the probability distributions provided by the clustering, the posterior probability of each class is used to classify the new observations. In the transformation step, the number of knots for the B-splines is chosen by cross-validation, and the Gaussian mixture models from the clustering step are evaluated with the Bayesian information criterion (BIC). The predictive performance of both classifiers is evaluated with accuracy, precision, and recall.
The probabilistic model outperforms the k-nearest neighbor model, with considerably higher accuracy, precision, and recall. The performance of each model improves when more data are used to make the predictions, most markedly for the probabilistic model.
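The pipeline described above can be sketched as follows. This is an illustrative reconstruction, not the thesis code: the knot count, cluster count, toy sales curves, and all function names are assumptions, and a simple posterior threshold over cluster labels stands in for the Laplace-based classification step.

```python
import numpy as np
from scipy.interpolate import make_lsq_spline
from sklearn.mixture import GaussianMixture

def spline_features(series, n_knots=4, degree=3):
    """Represent a sales series by its length plus least-squares B-spline coefficients."""
    t = np.linspace(0.0, 1.0, len(series))
    interior = np.linspace(0.0, 1.0, n_knots + 2)[1:-1]      # uniform interior knots
    knots = np.r_[[0.0] * (degree + 1), interior, [1.0] * (degree + 1)]
    spl = make_lsq_spline(t, series, knots, k=degree)
    return np.r_[len(series), spl.c]

# toy cumulative-sales curves: "ordinary" (steady) vs "fast-selling" (early surge)
rng = np.random.default_rng(0)
slow = [np.cumsum(rng.poisson(1.0, 60)).astype(float) for _ in range(40)]
fast = [np.cumsum(rng.poisson(np.linspace(6.0, 1.0, 60))).astype(float) for _ in range(40)]
X = np.array([spline_features(s) for s in slow + fast])
y = np.array([0] * 40 + [1] * 40)                            # 1 = fast-selling

# cluster the B-spline vectors, then label each cluster by its majority class
gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
members = gmm.predict(X)
cluster_label = {k: int(np.mean(y[members == k]) > 0.5) for k in range(2)}

def classify(series):
    """Classify via the posterior probability of belonging to a fast-selling cluster."""
    post = gmm.predict_proba(spline_features(series)[None, :])[0]
    p_fast = sum(p for k, p in enumerate(post) if cluster_label[k] == 1)
    return int(p_fast > 0.5)
```

In a fuller version, the number of knots would be chosen by cross-validation and the number of mixture components by BIC (`gmm.bic(X)`), as the abstract describes.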
|
2 |
Prediction Performance of Survival Models
Yuan, Yan, January 2008
Statistical models are often used to predict future random variables. There are two types of prediction: point prediction and probabilistic prediction. Prediction accuracy is quantified by performance measures, which are typically based on loss functions. We study estimators of these performance measures, namely the prediction error for point predictors and performance scores for probabilistic predictors. The focus of this thesis is to assess the prediction performance of survival models that analyze censored survival times. To accommodate censoring, we extend the inverse probability of censoring weighting (IPCW) method so that arbitrary loss functions can be handled. We also develop confidence interval procedures for these performance measures.
We compare model-based, apparent-loss-based, and cross-validation estimators of prediction error under model misspecification and variable selection, for absolute relative error loss (in chapter 3) and misclassification error loss (in chapter 4). Simulation results indicate that cross-validation procedures typically produce reliable point estimates and confidence intervals, whereas model-based estimates are often sensitive to model misspecification. The methods are illustrated in two medical contexts in chapter 5. The apparent-loss-based and cross-validation estimators of performance scores for probabilistic predictors are discussed and illustrated with an example in chapter 6. We also make connections for performance.
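The core IPCW idea can be illustrated with a short sketch. This is not the thesis code: the helper names are made up, ties and the left limit of G are ignored, and the censoring survivor function is estimated with a plain Kaplan-Meier fit. Uncensored observations are reweighted by 1/G(t), compensating for censored ones, which receive zero weight.

```python
import numpy as np

def km_censoring_survival(times, events):
    """Kaplan-Meier estimate of the censoring survivor function G(t) = P(C > t).
    Censoring plays the role of the event here, so the indicator is 1 - events."""
    order = np.argsort(times)
    t, c = times[order], 1 - events[order]
    at_risk = len(t) - np.arange(len(t))      # subjects still at risk at each time
    g = np.cumprod(1.0 - c / at_risk)         # product-limit estimate
    def G(s):
        idx = np.searchsorted(t, s, side="right") - 1
        return 1.0 if idx < 0 else max(g[idx], 1e-12)
    return G

def ipcw_error(pred, times, events, loss):
    """IPCW estimate of expected loss: uncensored subjects (events == 1) are
    weighted by 1 / G(observed time); censored subjects get weight zero."""
    G = km_censoring_survival(times, events)
    w = events / np.array([G(ti) for ti in times])
    return float(np.sum(w * loss(pred, times)) / len(times))

# absolute relative error, as in chapter 3
are = lambda p, t: np.abs(p - t) / t
```

A quick sanity check: with no censoring all weights equal one, so the estimate reduces to the ordinary mean loss.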
|
3 |
Data-driven decision support in digital retailing
Sweidan, Dirar, January 2023
In the digital era, with the advent of artificial intelligence, digital retailing has emerged as a notable shift in commerce. It empowers e-tailers with data-driven insights and predictive models to navigate a variety of challenges, driving informed decision-making and strategy formulation. While predictive models in general are fundamental for data-driven decisions, this thesis focuses on binary classifiers, studied in two real-world problems, each with its own particular properties. Specifically, when binary decisions are based on predictions, relying solely on predicted class labels is insufficient because classification accuracy varies, and different mistakes carry different costs, which affects utility. To confront these challenges, probabilistic predictions, often unexplored or uncalibrated, are a promising alternative to class labels. Therefore, machine learning modelling and calibration techniques are explored, employing benchmark data sets alongside empirical studies grounded in industrial contexts. These studies analyse predictions and their associated probabilities across diverse data segments and settings. As a proof of concept, the thesis finds that some algorithms produce inherently calibrated probabilities, while others become reliable once their probabilities are calibrated. In both cases, the thesis concludes that using only the top predictions, those with the highest probabilities, increases precision and minimises false positives, and that adopting well-calibrated probabilities is a powerful alternative to mere class labels. Consequently, transforming probabilities into reliable confidence values through classification with a rejection option opens a pathway in which confident, reliable predictions take centre stage in decision-making.
This enables e-tailers to form distinct strategies based on these predictions and optimise their utility. The thesis highlights the value of calibrated models and probabilistic prediction and emphasises their significance in enhancing decision-making. The findings have practical implications for e-tailers leveraging data-driven decision support. Future research should focus on an automated system that prioritises predictions with high, well-calibrated probabilities while discarding others, and that optimises utility based on the costs and gains associated with the different prediction outcomes, to enhance decision support for e-tailers.
The thesis is part of the industrial graduate school in digital retailing (INSiDR) at the University of Borås and is funded by the Swedish Knowledge Foundation.
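A minimal sketch of the two central ideas above, probability calibration and classification with a rejection option, might look as follows. The data set, model choice, and the 0.8 confidence threshold are illustrative assumptions, not values from the thesis.

```python
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# calibrate the classifier's probabilities with isotonic regression on held-out folds
clf = CalibratedClassifierCV(RandomForestClassifier(random_state=0),
                             method="isotonic", cv=5)
clf.fit(X_tr, y_tr)

# rejection option: only act on predictions whose calibrated confidence is high
proba = clf.predict_proba(X_te)
conf = proba.max(axis=1)
threshold = 0.8
accept = conf >= threshold
labels = np.where(accept, proba.argmax(axis=1), -1)   # -1 = abstain / reject
```

Rejected predictions (labelled -1) can be routed to a fallback strategy, so that only confident, reliable predictions drive the binary decision, which is the pathway the abstract describes.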
|