Global ETD Search

1	Models for fitting correlated non-identical Bernoulli random variables with applications to an airline data problem Perez Romo Leroux, Andres January 2021 Our research deals with devising models for fitting non-identical dependent Bernoulli variables and using these models to predict future Bernoulli trials. We focus on modelling and predicting random Bernoulli response variables that meet all of the following conditions: (1) each observed and each future response corresponds to a Bernoulli trial; (2) the trials are non-identical, with possibly different probabilities of occurrence; (3) the trials are mutually correlated, with an underlying complex trial-cluster correlation structure that also allows trials within clusters to be partitioned into groups, so that within-cluster group-level correlation is reflected in the correlation structure; (4) the probability of occurrence and the correlation structure for both observed and future trials can depend on a set of observed covariates. A number of approaches meeting some of the above conditions are present in the current literature. Our research expands on existing statistical and machine learning methods. We propose three extensions to existing models that make use of the above conditions. Each proposed method brings specific advantages for dealing with correlated binary data. The proposed models allow within-cluster trial grouping to be reflected in the correlation structure. We partition sets of trials into groups that are either explicitly estimated or implicitly inferred. Explicit groups arise from the determination of common covariates; inferred groups arise from imposing mixture models. The main motivation of our research is to model and further understand the potential of introducing group-level correlations among binary trials. 
In a number of applications, it can be beneficial to use models that allow for these types of trial groupings, both for improved predictions and for a better understanding of the behavior of trials. The first model extension builds on the Multivariate Probit model. This model makes use of covariates and other information from former trials to determine explicit trial groupings and predict the occurrence of future trials. We call this the Explicit Groups model. The second model extension uses mixtures of univariate Probit models. This model predicts the occurrence of current trials using estimators of parameters supporting mixture models for the observed trials. We call this the Inferred Groups model. Our third method extends WL2Boost, a gradient-descent-based boosting algorithm that allows for correlation of binary outcomes. We refer to our extension of this algorithm as GWL2Boost. Bernoulli trials are divided into observed and future trials, with all trials having associated known covariate information. We apply our methodology to the problem of predicting the set and total number of passengers who will not show up on commercial flights, using covariate information and past passenger data. The models and algorithms are evaluated with regard to their capacity to predict future Bernoulli responses. We compare the proposed models against a set of competing existing models and algorithms using available airline passenger no-show data. We show that our proposed algorithm extension GWL2Boost outperforms top existing algorithms and models that assume independence of binary outcomes in various prediction metrics. Keywords: Statistics; Applied case study; Binary group correlation; Correlated binary data; Gradient descent boosting; Machine learning; Multivariate probit model
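The conditions listed in this abstract (non-identical marginal probabilities, correlation within groups of trials) can be illustrated with a small simulation. The sketch below is not the thesis's Explicit Groups, Inferred Groups, or GWL2Boost method; it is a minimal latent-Gaussian (probit-style) construction under assumed values. The function name simulate_no_shows, the probabilities, the group labels, and the single within-group correlation rho are all hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, pvariance


def simulate_no_shows(probs, groups, rho, n_draws=20000, seed=0):
    """Draw correlated, non-identical Bernoulli trials (hypothetical helper).

    probs  : marginal no-show probability of each passenger
    groups : group label of each passenger; passengers in the same group
             share a latent factor, so their latent correlation is rho
    """
    rng = random.Random(seed)
    # Trial i occurs when its latent standard normal falls below Phi^{-1}(p_i),
    # so its marginal probability stays p_i whatever rho is.
    cutoffs = [NormalDist().inv_cdf(p) for p in probs]
    labels = sorted(set(groups))
    draws = []
    for _ in range(n_draws):
        # One shared standard normal per group, one private normal per trial.
        shared = {g: rng.gauss(0.0, 1.0) for g in labels}
        row = []
        for cutoff, g in zip(cutoffs, groups):
            latent = sqrt(rho) * shared[g] + sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            row.append(1 if latent < cutoff else 0)
        draws.append(row)
    return draws


# Assumed example: six passengers in three booking groups.
probs = [0.05, 0.10, 0.10, 0.20, 0.15, 0.30]
groups = [0, 0, 0, 1, 1, 2]
correlated = simulate_no_shows(probs, groups, rho=0.6)
independent = simulate_no_shows(probs, groups, rho=0.0)

totals_corr = [sum(row) for row in correlated]
totals_ind = [sum(row) for row in independent]
rates = [sum(col) / len(correlated) for col in zip(*correlated)]
print("empirical no-show rates:", [round(r, 3) for r in rates])
print("variance of total, correlated:", round(pvariance(totals_corr), 3))
print("variance of total, independent:", round(pvariance(totals_ind), 3))
```

In this toy setting the marginal rates are the same in both runs, but the total number of no-shows is more variable when trials within a group are positively correlated. This gives one intuition for why a model that assumes independent outcomes can misjudge predictions of totals.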
2	MODELING LARGE-SCALE CROSS EFFECT IN CO-PURCHASE INCIDENCE: COMPARING ARTIFICIAL NEURAL NETWORK TECHNIQUES AND MULTIVARIATE PROBIT MODELING Yang, Zhiguo 01 January 2015 This dissertation examines cross-category effects in consumer purchases from the big data and analytics perspectives. It uses data from the Nielsen Consumer Panel and Scanner databases for its investigations. With big data analytics it becomes possible to examine the cross effects of many product categories on each other. The number of categories whose cross effects are studied is called category scale, or just scale, in this dissertation: the larger the category scale, the more categories whose cross effects are studied. This dissertation extends research on models of cross effects by (1) examining the performance of the multivariate probit (MVP) model across category scale; (2) customizing artificial neural network (ANN) techniques for large-scale cross effect analysis; (3) examining the performance of ANN across scale; and (4) developing a conceptual model of spending habits as a source of cross effect heterogeneity. The results provide researchers and managers with new knowledge about using the two techniques in large category scale settings. The computational resources required by MVP models grow exponentially with scale, so MVP models are more severely limited by available computing power than ANN models are. In our experiments with Nielsen data at scales 4, 8, 16, and 32, MVP models could not be estimated for baskets with 16 or more categories. ANN models, on the other hand, could be calibrated at both scales 16 and 32. Surprisingly, the predictive results of ANN models exhibit an inverted-U relationship with scale. As an ancillary result, we provide a method for determining the existence and extent of non-linear own- and cross-category effects on the likelihood of purchase of a category using ANN models. 
Besides our empirical studies, we draw on the mental budgeting model and the impulsive spending literature to conceptualize consumer spending habits as a source of heterogeneity in the cross-effect context. Finally, after a discussion of conclusions and limitations, the dissertation closes with open questions for future research. Keywords: Cross category; Co-purchase; Large scale analysis; Multivariate probit model; Artificial neural network; Business Intelligence; Management Information Systems; Marketing
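The abstract does not spell out why MVP estimation becomes infeasible as scale grows. As a rough illustration only, using general facts about multivariate probit models rather than the dissertation's own cost analysis, the sketch below tabulates how the problem grows with the number of categories K: a basket over K categories has 2^K possible purchase patterns, the latent correlation matrix has K(K-1)/2 free entries, and the probability of each observed basket is a K-dimensional normal integral. The helper name mvp_problem_size is hypothetical.

```python
from math import comb


def mvp_problem_size(k):
    """Size of a multivariate probit problem with k binary categories."""
    return {
        "categories": k,
        # Each category is bought or not, so a basket has 2**k patterns.
        "basket_patterns": 2 ** k,
        # Free off-diagonal entries of a k x k correlation matrix.
        "correlation_parameters": comb(k, 2),
        # The probability of one basket is an integral over k latent normals.
        "integral_dimension": k,
    }


# The four scales reported in the abstract.
for k in (4, 8, 16, 32):
    print(mvp_problem_size(k))
```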
