  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
41

Purchase behaviour analysis in the retail industry using Generalized Linear Models / Analys av köpbeteende inom detaljhandeln med hjälp av generaliserade linjära modeller

Karlsson, Sofia January 2018 (has links)
This master thesis uses applied mathematical statistics to analyse purchase behaviour based on customer data of the Swedish brand Indiska. The aim of the study is to build a model that can help predict the sales quantities of different product classes, identify which factors are the most significant in the different models, and furthermore create an algorithm that can provide suggested product combinations in the purchasing process. Generalized linear models with a negative binomial distribution are applied to retrieve the predicted sales quantity. Moreover, conditional probability is used in the algorithm, resulting in a product recommendation engine based on the calculated conditional probability that the suggested combinations are purchased. From the findings, it can be concluded that all variables considered in the models (original price, purchase month, colour, cluster, purchase country and channel) are significant for the predicted outcome of the sales quantity for each product class. Furthermore, by using conditional probability and historical sales data, an algorithm can be constructed which creates recommendations of product combinations of either one or two products that can be bought together with an initial product that a customer shows interest in.
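The conditional-probability recommendation step described in this abstract can be sketched in a few lines of Python. This is an illustrative reconstruction, not the thesis code: the basket data and item names are invented, and the score is simply the empirical conditional probability count(item and other) / count(item).

```python
from collections import Counter
from itertools import combinations

def build_recommender(transactions):
    """Build a co-purchase recommender from historical baskets:
    score(other | item) = count(item and other) / count(item)."""
    single = Counter()
    pair = Counter()
    for basket in transactions:
        items = set(basket)
        single.update(items)
        # count each unordered item pair once per basket
        pair.update(frozenset(p) for p in combinations(sorted(items), 2))

    def recommend(item, top_k=2):
        scores = {
            other: pair[frozenset((item, other))] / single[item]
            for other in single
            if other != item
        }
        return sorted(scores, key=scores.get, reverse=True)[:top_k]

    return recommend

# invented baskets for illustration
baskets = [
    {"scarf", "tunic"},
    {"scarf", "tunic", "candle"},
    {"scarf", "tunic"},
    {"scarf", "candle"},
]
recommend = build_recommender(baskets)
print(recommend("scarf", top_k=1))  # tunic appears in 3 of the 4 scarf baskets
```

Given an initial product a customer shows interest in, the engine returns the products most often bought with it, ranked by conditional probability.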
42

On the Efficiency of Designs for Linear Models in Non-regular Regions and the Use of Standard Designs for Generalized Linear Models

Zahran, Alyaa R. 16 July 2002 (has links)
The design of an experiment involves selecting the levels of one or more factors in order to optimize one or more criteria, such as prediction variance or parameter variance criteria. Good experimental designs will have several desirable properties. Typically, one cannot achieve all the ideal properties in a single design; there are therefore frequently several good designs, and choosing among them involves tradeoffs. This dissertation contains three components centered on optimal design: developing a new graphical evaluation technique, discussing designs for non-regular regions for first-order models with interaction in the two- and three-factor case, and using standard designs in the case of generalized linear models (GLMs). The Fraction of Design Space (FDS) technique is proposed as a new graphical evaluation technique that addresses good prediction. The technique comprises two tools that give the researcher more detailed information by quantifying the fraction of the design space where the scaled prediction variance is less than or equal to any pre-specified value. The FDS technique complements Variance Dispersion Graphs (VDGs) to give the researcher more insight into a design's prediction capability. Several standard designs are studied with both methods, VDG and FDS. Many standard designs are constructed for a factor space that is either a p-dimensional hypercube or hypersphere, with any point inside or on the boundary of the shape a candidate design point. However, economic or practical constraints may restrict factor settings and result in an irregular experimental region. For the two- and three-factor case with one corner of the cuboidal design space excluded, three sensible alternative designs are proposed and compared, and the properties of these designs and their relative tradeoffs are discussed. Optimal experimental designs for GLMs depend on the values of the unknown parameters.
Several solutions to this parameter dependence of the optimality function have been suggested in the literature; however, they are often unrealistic in practice. The behavior of factorial designs, the well-known standard designs of the linear case, is studied for the GLM case, and conditions under which these designs have high G-efficiency are formulated. / Ph. D.
43

Linking Streamflow Trends with Land Cover Change in a Southern US Water Tower

Miele, Alexander 21 December 2023 (has links)
Characterizing streamflow trends is important for water resources management. Streamflow conditions, and trends thereof, are critical drivers of all aspects of stream geomorphology, sediment and nutrient transport, and ecological processes. Using the non-parametric modified Mann-Kendall test, we analyzed streamflow trends from 1996 to 2022 for the Southern Appalachian (SA) region of the U.S. The forested uplands of the SA receive high amounts of rain and act as a "water tower" for the surrounding lowland area, both of which have experienced higher than average population growth and urban development. For a total of 127 USGS gages with continuous streamflow measurements, we also evaluated precipitation and land change rates and patterns within the upstream contributing areas. Statistical methods (i.e., generalized linear models) were then used to assess linkages between land cover change (LCC) and streamflow trends. Our results show that 42 drainage areas are experiencing increasing trends in their precipitation, and 1 is experiencing a negative trend. A total of 71 drainage areas are experiencing increasing trends in their annual streamflow minimums, maximums, medians, or variability, with some experiencing changes in multiple measures. Our models suggest that agricultural expansion is associated with increasing minimum streamflow trends, but increasing precipitation is also positively linked. With this information, water managers can identify which areas are experiencing changes in streamflow amounts from LCC or precipitation and apply this in planning and predictions. / Master of Science / Water availability is important for resources management. Streamflow is a measure of available surface water and an important component of the hydrological cycle. Using the non-parametric modified Mann-Kendall test, we analyzed streamflow trends from 1996 to 2022 for the Southern Appalachian (SA) region of the U.S.
The forested uplands of the SA receive high amounts of rain and act as a "water tower" for the surrounding lowland area, both of which have experienced higher than average population growth and city expansion. For a total of 127 USGS gages with continuous streamflow measurements, we also evaluated precipitation and land cover change rates within the area upstream of each gage (its drainage or contributing area). Statistical methods (i.e., generalized linear models) were then used to assess linkages between land cover change (LCC) and streamflow trends. Our results show that 42 drainage areas are experiencing increasing trends in their precipitation, and 1 is experiencing a negative trend. A total of 71 drainage areas are experiencing increasing trends in their annual streamflow minimums, maximums, medians, or variability, with some experiencing changes in multiple measures. Our models suggest that agricultural expansion is associated with increasing minimum streamflow trends, but increasing precipitation is also positively linked. With this information, water managers can identify which areas are experiencing changes in streamflow amounts from LCC or precipitation and apply this in planning and predictions.
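The core of the trend test used in this study can be sketched as follows. This is a minimal version of the plain Mann-Kendall test (no tie correction and no autocorrelation adjustment; the "modified" test the thesis uses additionally corrects the variance of S for serial correlation). The flow series is invented for illustration.

```python
import math

def mann_kendall(x):
    """Plain Mann-Kendall trend test statistic S and normal score Z.

    S counts concordant minus discordant pairs; under the null of no
    trend (and no ties), Var(S) = n(n-1)(2n+5)/18.
    """
    n = len(x)
    s = sum(
        (x[j] > x[i]) - (x[j] < x[i])
        for i in range(n - 1)
        for j in range(i + 1, n)
    )
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)   # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z

flows = [10, 12, 11, 14, 15, 17, 16, 19, 21, 20]  # made-up annual flows
s, z = mann_kendall(flows)
print(s, round(z, 2))  # a positive S and Z indicate an increasing trend
```

A Z score beyond roughly ±1.96 would indicate a trend significant at the 5% level; the gage series in the study would be run through the tie- and autocorrelation-corrected version of this test.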
44

Regression Analysis for Zero Inflated Population Under Complex Sampling Designs

Paneru, Khyam Narayan 20 December 2013 (has links)
No description available.
45

Estimating Veterans' Health Benefit Grants Using the Generalized Linear Mixed Cluster-Weighted Model with Incomplete Data

Deng, Xiaoying January 2018 (has links)
The poverty rate among veterans in the US has increased over the past decade, according to the U.S. Department of Veterans Affairs (2015). Thus, it is crucial that veterans who live below the poverty level receive sufficient benefit grants. A study on prudently managing health benefit grants for veterans may help government and policy-makers make appropriate decisions and investments. The purpose of this research is to find an underlying group structure in the veterans' benefit grants dataset and then estimate the benefit grants sought using incomplete data. The generalized linear mixed cluster-weighted model, based on mixture models, is carried out by grouping similar observations into the same cluster. Finally, the estimates of veterans' benefit grants sought will provide a reference for future public policies. / Thesis / Master of Science (MSc)
46

Understanding Scaled Prediction Variance Using Graphical Methods for Model Robustness, Measurement Error and Generalized Linear Models for Response Surface Designs

Ozol-Godfrey, Ayca 23 December 2004 (has links)
Graphical summaries are becoming important tools for evaluating designs; the need to compare designs in terms of their prediction variance properties advanced this development. A recent graphical tool, the Fraction of Design Space (FDS) plot, calculates the fraction of the design space where the scaled prediction variance (SPV) is less than or equal to a given value. In this dissertation we adapt FDS plots to study three specific design problems: robustness to model assumptions, robustness to measurement error, and design properties for generalized linear models (GLMs). The dissertation presents a graphical method for examining design robustness related to SPV values, using FDS plots to compare designs across a number of potential models in a pre-specified model space; scaling the FDS curves by the G-optimal bounds of each model helps compare designs on the same model scale. FDS plots are also adapted for comparing designs under the GLM framework. Since parameter estimates need to be specified, robustness to parameter misspecification is incorporated into the plots, and binomial and Poisson examples are used to study several scenarios. The third section involves a special type of response surface design, mixture experiments, and deals with adapting FDS plots for two types of measurement error that can arise from inaccurate measurement of the individual mixture component amounts. The last part of the dissertation covers mixture experiments for the GLM case and examines the prediction properties of mixture designs using the adapted FDS plots. / Ph. D.
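The FDS idea itself is simple to sketch. The toy below computes the FDS curve value for a first-order model on the 2x2 factorial, where X'X = 4I and the scaled prediction variance reduces in closed form to SPV(x1, x2) = 1 + x1^2 + x2^2; the design space is sampled by Monte Carlo. This is an illustrative assumption-laden example, not code from the dissertation.

```python
import random

def spv(point):
    """Scaled prediction variance N * x'(X'X)^{-1} x for a first-order
    model on the 2^2 factorial; with X'X = 4I this is 1 + x1^2 + x2^2."""
    x1, x2 = point
    return 1 + x1**2 + x2**2

def fds(v, n_samples=20000, seed=0):
    """Fraction of the [-1, 1]^2 design space where SPV <= v."""
    rng = random.Random(seed)
    pts = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(n_samples)]
    return sum(spv(p) <= v for p in pts) / n_samples

print(fds(3.0))  # SPV peaks at 3 in the corners, so the whole space qualifies
```

Plotting v against fds(v) over a grid of v values gives the FDS curve; a design whose curve stays low over most of the space has better overall prediction properties.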
47

Semiparametric Regression Methods with Covariate Measurement Error

Johnson, Nels Gordon 06 December 2012 (has links)
In public health, biomedical, epidemiological, and other applications, collected data are often measured with error. When mismeasured data are used in a regression analysis, failing to account for the measurement error can lead to incorrect inference about the relationships between the covariates and the response. We investigate measurement error in the covariates of two types of regression models. For each, we propose a fully Bayesian approach that treats the variable measured with error as a latent variable to be integrated over, and a semi-Bayesian approach that uses a first-order Laplace approximation to marginalize the variable measured with error out of the likelihood. The first model is the matched case-control study for analyzing clustered binary outcomes. We develop low-rank thin plate splines for the case where a variable measured with error has an unknown, nonlinear relationship with the response. In addition to the semi- and fully Bayesian approaches, we propose a third approach using expectation-maximization to detect both parametric and nonparametric relationships between the covariates and the binary outcome. We assess the performance of each method via simulation in terms of mean squared error and mean bias, and illustrate each method on a perturbed example of a 1-4 matched case-control study. The second regression model is the generalized linear model (GLM) with unknown link function. Usually, the link function is chosen by the user based on the distribution of the response variable, often to be the canonical link. However, when covariates are measured with error, incorrect inference resulting from the error can be compounded by an incorrect choice of link function. We assess the performance of the semi- and fully Bayesian methods via simulation in terms of mean squared error, and illustrate each method on the Framingham Heart Study dataset.
The simulation results for both regression models support that the fully Bayesian approach is at least as good as the semi-Bayesian approach for adjusting for measurement error, particularly when the distribution of the variable measured with error and the distribution of the measurement error are misspecified. / Ph. D.
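The problem these correction methods address can be demonstrated with a tiny simulation of classical additive measurement error: regressing on the noisy covariate attenuates the slope toward zero by the reliability ratio var(x) / (var(x) + var(u)). All data below are simulated for illustration; this is not the dissertation's simulation design.

```python
import random

random.seed(42)
n, true_slope = 5000, 2.0
x_true = [random.gauss(0, 1) for _ in range(n)]
noise = [random.gauss(0, 1) for _ in range(n)]           # classical error, var 1
y = [true_slope * x + random.gauss(0, 0.1) for x in x_true]
x_obs = [x + u for x, u in zip(x_true, noise)]           # what we actually observe

def ols_slope(x, y):
    """Ordinary least-squares slope of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

naive = ols_slope(x_obs, y)
# attenuation: E[naive] = true_slope * var(x)/(var(x)+var(u)) = 2.0 * 0.5 = 1.0
print(round(naive, 2))
```

The naive slope sits near half the true value, which is exactly the kind of bias the latent-variable and Laplace-approximation approaches above are built to remove.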
48

Long-term Benefits of Extracurricular Activities on Socioeconomic Outcomes and Their Trends in 1988-2012

Long, Thomas Carl 09 November 2015 (has links)
Across the country, budget cuts to education have resulted in decreased funds available for extracurricular activities. This trend in policy may have a significant impact on future outcomes, as reflected in student success measures. Using two datasets collected over the last two decades, the researcher assessed the relationship between participation in extracurricular activities and future socioeconomic outcomes in respondents' lives, including post-secondary education, full-time employment status, and income. Two existing large-scale longitudinal studies of U.S. secondary students, the National Education Longitudinal Study of 1988 (NELS:88) and the Education Longitudinal Study of 2002 (ELS:2002), served as data sources; as these surveys were conducted about a decade apart, the information they yielded was suitable for meeting the study aims. Generalized linear models, such as multiple regression and logistic regression analyses, were fitted with sample weights applied to examine the impacts of extracurricular activity participation on the aforementioned outcome measures. The implications of the findings, including the comparison of results from the two datasets collected at different time points, were interpreted with respect to school budget policy. Results from NELS:88 and ELS:2002 were also compared to evaluate trends in the characteristics and performance of U.S. high school students during the 1988-2012 period. / Ph. D.
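The sample-weighted regression fits mentioned in this abstract can be illustrated with the closed-form weighted least-squares estimator. This is a generic stdlib sketch with invented numbers, not the study's actual model or data (the real analyses used survey weights from NELS:88/ELS:2002 and included logistic models).

```python
def wls_slope(x, y, w):
    """Closed-form weighted least-squares slope and intercept:
    minimize sum_i w_i * (y_i - a - b*x_i)^2."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted means
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    b = sxy / sxx
    return b, my - b * mx

# toy data: x = a participation measure, y = an outcome index,
# w = survey-style sample weights (the last case is down-weighted to zero)
slope, intercept = wls_slope([0, 1, 2, 3], [1, 3, 5, 100], [1, 1, 1, 0])
print(slope, intercept)  # the zero-weight outlier is ignored: 2.0, 1.0
```

With all weights equal this reduces to ordinary least squares; in survey analysis the weights make the sample representative of the target population.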
49

Data-Driven Methods for Modeling and Predicting Multivariate Time Series using Surrogates

Chakraborty, Prithwish 05 July 2016 (has links)
Modeling and predicting multivariate time series data has been of prime interest to researchers for many decades. Traditionally, time series prediction models have focused on finding attributes that have consistent correlations with the target variable(s). However, diverse surrogate signals, such as news data and Twitter chatter, are increasingly available and can provide real-time information, albeit with inconsistent correlations. Intelligent use of such sources can lead to early and real-time warning systems such as Google Flu Trends. Furthermore, the target variables of interest, such as public health surveillance data, can be noisy, so models built for such data sources should be flexible as well as adaptable to changing correlation patterns. In this thesis we explore various methods of using surrogates to generate more reliable and timely forecasts for noisy target signals. We primarily investigate three key components of the forecasting problem, viz. (i) short-term forecasting, where surrogates can be employed in a now-casting framework; (ii) long-term forecasting, where surrogates act as forcing parameters to model system dynamics; and (iii) robust drift models that detect and exploit 'changepoints' in the surrogate-target relationship to produce robust models. We explore various 'physical' and 'social' surrogate sources to study these sub-problems, primarily to generate real-time forecasts for endemic diseases. On the modeling side, we employed matrix factorization and generalized linear models to detect short-term trends, and explored various Bayesian sequential analysis methods to model long-term effects. Our research indicates that, in general, a combination of surrogates can lead to more robust models. Interestingly, our findings indicate that under specific scenarios particular surrogates can decrease overall forecasting accuracy, providing an argument for 'good data' over 'big data'. / Ph. D.
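The changepoint idea in component (iii) can be sketched crudely: compare the surrogate-target correlation in adjacent windows and flag times where it shifts sharply. This is a toy heuristic with invented series, far simpler than the Bayesian drift models the thesis develops, but it shows the kind of event such models detect.

```python
def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def drift_points(surrogate, target, window=4, threshold=1.5):
    """Flag indices where the surrogate-target correlation in the windows
    just before and just after index t differs by more than `threshold`."""
    flags = []
    for t in range(window, len(target) - window):
        before = pearson(surrogate[t - window:t], target[t - window:t])
        after = pearson(surrogate[t:t + window], target[t:t + window])
        if abs(after - before) > threshold:
            flags.append(t)
    return flags

# toy series: the surrogate tracks the target, then abruptly inverts
target = list(range(1, 13))
surrogate = [1, 2, 3, 4, 5, 6, 6, 5, 4, 3, 2, 1]
print(drift_points(surrogate, target))  # flags cluster around the inversion
```

A forecasting system would respond to such flags by re-weighting or refitting the surrogate-target relationship rather than trusting the stale correlation.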
50

Statistical Methods for Dating Collections of Historical Documents

Tilahun, Gelila 31 August 2011 (has links)
The problem in this thesis was originally motivated by the Documents of Early England Data Set (DEEDS). The central problem with these medieval documents is the lack of methods to assign accurate dates to those documents which bear no date. With the problems of the DEEDS documents in mind, we present two methods to impute missing features of texts. In the first method, we suggest a new class of metrics for measuring distances between texts and then show how to combine the distances between texts using statistical smoothing. This method can be adapted to settings where the features of the texts are ordered or unordered categorical variables (as in the case of, for example, authorship attribution problems). In the second method, we estimate the probability of occurrence of words in texts using the nonparametric regression technique of local polynomial fitting with kernel weights applied to generalized linear models. We combine the estimated probabilities of occurrence of the words of a text to estimate the probability of occurrence of the text as a function of its feature, the feature in this case being the date at which the text was written. The application of our methods to the DEEDS documents, and the results, are presented.
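The first method, combining text distances with statistical smoothing, can be caricatured in a few lines: estimate an undated text's year as a kernel-weighted average of the years of dated texts, weighted by similarity. The distance metric (one minus word-set Jaccard similarity), the triangular kernel, and the tiny corpus below are all invented for illustration and are much cruder than the metrics developed in the thesis.

```python
def jaccard(a, b):
    """Jaccard similarity of the word sets of two texts."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / len(wa | wb)

def estimate_date(undated, dated_corpus, bandwidth=0.5):
    """Kernel-smoothed date estimate: weight each dated text by a
    triangular kernel of its distance (1 - Jaccard) to the undated text."""
    weights, total = 0.0, 0.0
    for text, year in dated_corpus:
        d = 1 - jaccard(undated, text)
        w = max(0.0, 1 - d / bandwidth)  # texts beyond the bandwidth get weight 0
        weights += w
        total += w * year
    return total / weights if weights else None

corpus = [
    ("grant of land by the king to the abbey", 1150),
    ("grant of land to the abbey by charter", 1160),
    ("lease of a tenement in the city", 1300),
]
print(round(estimate_date("grant of land by charter to the abbey", corpus)))
```

The dissimilar lease text falls outside the kernel bandwidth and contributes nothing, so the estimate is pulled toward the dates of the two similar charters.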
