91

Monitoring portfolio weights by means of the Shewhart method

Mohammadian, Jeela January 2010 (has links)
The distribution of asset returns may be subject to structural breaks. These breaks may result in changes of the optimal portfolio weights. For a portfolio investor, the ability to detect any systematic changes in the optimal portfolio weights in a timely manner is of great interest. In this master thesis work, the use of the Shewhart method as a method for detecting a sudden parameter change, the implied change in the multivariate portfolio weights, and its performance are reviewed.
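
As a concrete illustration of this kind of monitoring scheme, the sketch below applies Shewhart-style three-sigma limits to rolling estimates of global minimum-variance portfolio weights. This is a minimal sketch only: the two-asset returns, the window lengths, and the choice of GMV weights are assumptions made for the example, not details taken from the thesis.

```python
import numpy as np

def gmv_weights(returns):
    """Global minimum-variance weights w = S^{-1} 1 / (1' S^{-1} 1)."""
    cov = np.cov(returns, rowvar=False)
    inv_one = np.linalg.solve(cov, np.ones(cov.shape[0]))
    return inv_one / inv_one.sum()

def shewhart_monitor(returns, window=60, calibration=20, k=3.0):
    """Rolling GMV weight estimates with Shewhart k-sigma control limits,
    calibrated on the first `calibration` in-control estimates."""
    n = returns.shape[0]
    estimates = np.array([gmv_weights(returns[t - window:t])
                          for t in range(window, n)])
    mu = estimates[:calibration].mean(axis=0)          # in-control level
    sigma = estimates[:calibration].std(axis=0, ddof=1)
    out = np.abs(estimates - mu) > k * sigma           # signal matrix
    alarms = np.where(out.any(axis=1))[0] + window     # times outside limits
    return estimates, alarms

rng = np.random.default_rng(0)
# Simulated returns with a covariance break after observation 250.
pre = rng.multivariate_normal([1e-3, 8e-4], [[1e-4, 2e-5], [2e-5, 9e-5]], 250)
post = rng.multivariate_normal([1e-3, 8e-4], [[1e-4, 8e-5], [8e-5, 9e-5]], 100)
est, alarms = shewhart_monitor(np.vstack([pre, post]))
print("first alarms at observations:", alarms[:5])
```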
92

Monotonic and Semiparametric Regression for the Detection of Trends in Environmental Quality Data

Hussian, Mohamed January 2005 (has links)
Natural fluctuations in the state of the environment can long conceal or distort important trends in the human impact on our ecosystems. Accordingly, there is increasing interest in statistical normalisation techniques that can clarify the anthropogenic effects by removing meteorologically driven fluctuations and other natural variation in time series of environmental quality data. This thesis shows that semi- and nonparametric regression methods can provide effective tools for applying such normalisation to collected data. In particular, it is demonstrated how monotonic regression can be utilised in this context. A new numerical algorithm for this type of regression can accommodate two or more discrete or continuous explanatory variables, which enables simultaneous estimation of a monotonic temporal trend and correction for one or more covariates that have a monotonic relationship with the response variable under consideration. To illustrate the method, a case study of mercury levels in fish is presented, using body length and weight as covariates. Semiparametric regression techniques enable trend analyses in which a nonparametric representation of temporal trends is combined with parametrically modelled corrections for covariates. Here, it is described how such models can be employed to extract trends from data collected over several seasons, and this procedure is exemplified by discussing how temporal trends in the load of nutrients carried by the Elbe River can be detected while adjusting for water discharge and other factors. In addition, it is shown how semiparametric models can be used for joint normalisation of several time series of data.
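
The numerical algorithm developed in the thesis handles two or more covariates simultaneously; as a simple one-dimensional stand-in, the sketch below first applies a parametric covariate correction and then fits a monotonically decreasing temporal trend with scikit-learn's IsotonicRegression. The simulated mercury/body-length data and the two-step adjustment are assumptions made for illustration.

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

rng = np.random.default_rng(1)
n = 200
year = np.sort(rng.uniform(1980, 2005, n))
length = rng.normal(40, 5, n)  # covariate, e.g. fish body length (cm)
# Simulated mercury level: decreasing trend + covariate effect + noise.
level = 2.0 - 0.03 * (year - 1980) + 0.02 * length + rng.normal(0, 0.15, n)

# Step 1 (parametric correction): regress out the covariate,
# in the spirit of a semiparametric model.
beta = np.polyfit(length, level, 1)[0]
adjusted = level - beta * (length - length.mean())

# Step 2: fit a monotonically decreasing trend to the adjusted series.
iso = IsotonicRegression(increasing=False)
trend = iso.fit_transform(year, adjusted)
print("estimated change 1980 -> 2005:", trend[-1] - trend[0])
```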
93

Minimizing the expected opportunity loss by optimizing the ordering of shipping methods in e-Commerce using Machine Learning

Ay, Jonatan, Azrak, Jamil January 2022 (has links)
The shopping industry is changing rapidly as technology advances. This is especially true for the online industry, where consumers can nowadays shop for much of what they need over the internet. In order to make the shopping experience as smooth as possible, companies develop their sites and checkouts to be as frictionless as possible. In this thesis, the shipping module of Klarna's checkout was analyzed and different models were created to understand how the likelihood of a customer finalizing a purchase (the conversion rate) could be improved. The shipping module consists of a number of shipping methods along with shipping carriers. Currently, there is no logic to sort the different shipping methods/carriers other than a static ordering for all customers. The ordering of the shipping methods and carriers is what was investigated in this thesis. Hence, the core problem is to understand how the opportunity loss could be minimized by a different ordering of the shipping methods, where the opportunity loss is derived from the reduction in conversion rate between the control group (the current setup) and a new model. To achieve this, a dataset was prepared and features were engineered in such a way that the same training and test datasets could be used with all algorithms. The features were engineered using a point-in-time concept so that no target leakage would be present. The target was a plain concatenation of shipping method and shipping carrier. Finally, three different methods for tackling this multiclass classification problem were investigated, namely Logistic Regression, Extreme Gradient Boosting and an Artificial Neural Network. The aim of these algorithms is to create a learner, trained on a given dataset, that is able to predict the combination of shipping method and carrier given a certain set of features. By the end of the investigation, it was concluded that using a model to predict the most relevant shipping method (plus carrier) for the customer made a positive difference to the conversion rate and, in turn, to sales. The overall accuracy was 65.09% for the Logistic Regression, 71.61% for the Extreme Gradient Boosting and 70.88% for the Artificial Neural Network. Once the models were trained, they were used in a back-simulation (a proxy for an A/B test) on a validation set to see the effect on the conversion rate. Here, the results showed that the conversion rate was 84.85% for the Logistic Regression model, 84.95% for the Extreme Gradient Boosting and 85.02% for the Artificial Neural Network. The control group, a random sample under the current logic, had a conversion rate of 84.21%. Thus, implementing the Artificial Neural Network would increase Klarna's sales by about 6.5 SEK per session.
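
A minimal sketch of the multiclass setup described above, using one of the three investigated algorithms (logistic regression) and ranking classes by predicted probability to produce an ordering. The feature names, class labels and synthetic data are hypothetical; Klarna's actual features and targets are not public.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(2)
n = 5000
# Hypothetical point-in-time features: basket value, customer age, urban flag.
X = np.column_stack([rng.gamma(2.0, 300.0, n),
                     rng.integers(18, 80, n),
                     rng.integers(0, 2, n)])
# Hypothetical target: concatenation "method_carrier", four classes.
classes = np.array(["home_dhl", "pickup_dhl", "home_postnord", "pickup_budbee"])
y = classes[rng.integers(0, 4, n)]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("accuracy:", accuracy_score(y_te, model.predict(X_te)))

# Ordering shown to the customer: classes ranked by predicted probability.
proba = model.predict_proba(X_te[:1])[0]
print("suggested order:", model.classes_[np.argsort(proba)[::-1]])
```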
94

A Non-Gaussian Limit Process with Long-Range Dependence

Gaigalas, Raimundas January 2004 (has links)
This thesis, consisting of three papers and a summary, studies topics in the theory of stochastic processes related to long-range dependence. Much recent interest in such probabilistic models has its origin in measurements of Internet traffic data, where typical characteristics of long memory have been observed. As a macroscopic feature, long-range dependence can be mathematically studied using certain scaling limit theorems.

Using such limit results, two different scaling regimes for Internet traffic models have been identified earlier. In one of these regimes traffic at large scales can be approximated by long-range dependent Gaussian or stable processes, while in the other regime the rescaled traffic fluctuates according to stable "memoryless" processes with independent increments. In Paper I a similar limit result is proved for a third scaling scheme, emerging as an intermediate case of the other two. The limit process here turns out to be a non-Gaussian and non-stable process with long-range dependence.

In Paper II we derive a representation for the latter limit process as a stochastic integral of a deterministic function with respect to a certain compensated Poisson random measure. This representation enables us to study some further properties of the process. In particular, we prove that the process at small scales behaves like a Gaussian process with long-range dependence, while at large scales it is close to a stable process with independent increments. Hence, the process can be regarded as a link between these two processes of completely different nature.

In Paper III we construct a class of processes locally behaving as Gaussian and globally as stable processes and including the limit process obtained in Paper I. These processes can be chosen to be long-range dependent and are potentially suitable as models in applications with distinct local and global behaviour. They are defined using stochastic integrals with respect to the same compensated Poisson random measure as used in Paper II.
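
The representation in Paper II is a stochastic integral of a deterministic function against a compensated Poisson random measure. The sketch below simulates the simplest instance of such an object, I = ∫ f(s) dÑ(s) for a homogeneous Poisson process on [0, T], which has mean zero and variance λ∫f². The exponential kernel and the one-dimensional homogeneous setting are illustrative assumptions; the measure and kernel used in the papers are different.

```python
import numpy as np

def compensated_poisson_integral(f, rate, t_max, rng):
    """Simulate I = sum_i f(S_i) - rate * int_0^T f(s) ds, the integral of
    f against a compensated homogeneous Poisson random measure on [0, T]."""
    n_jumps = rng.poisson(rate * t_max)
    jump_times = rng.uniform(0.0, t_max, n_jumps)
    grid = np.linspace(0.0, t_max, 10_000)
    compensator = rate * np.trapz(f(grid), grid)
    return f(jump_times).sum() - compensator

rng = np.random.default_rng(3)
f = lambda s: np.exp(-s)  # an illustrative deterministic kernel
samples = [compensated_poisson_integral(f, rate=5.0, t_max=10.0, rng=rng)
           for _ in range(2000)]
# Mean should be ~0 and variance ~ rate * int f^2 = 5 * 0.5 = 2.5.
print("mean:", np.mean(samples), " var:", np.var(samples))
```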
96

On Methods for Real Time Sampling and Distributions in Sampling

Meister, Kadri January 2004 (has links)
This thesis is composed of six papers, all dealing with the issue of sampling from a finite population. We consider two different topics: real time sampling and distributions in sampling. The main focus is on Papers A–C, where a somewhat special sampling situation referred to as real time sampling is studied. Here a finite population passes or is passed by the sampler. There is no list of the population units available, and for every unit the sampler must decide whether or not to sample it when he/she meets the unit. We focus on the problem of finding suitable sampling methods for the described situation, and some new methods are proposed. In general, we try to avoid sampling units close to each other too often, i.e. we sample with negative dependencies. Here the correlations between the inclusion indicators, called sampling correlations, play an important role. The new methods are evaluated by means of a simulation study and asymptotic calculations. We study the new methods mainly in comparison with standard Bernoulli sampling, using the sample mean as an estimator of the population mean. Assuming a stationary population model with decreasing autocorrelations, we have found the form of the nearly optimal sampling correlations by asymptotic calculations, under some restrictions on the sampling correlations. We gain most in efficiency using methods that give negatively correlated indicator variables, such that the correlation sum is small and the sampling correlations are equal for units up to lag m apart and zero afterwards. Since the proposed methods are based on sequences of dependent Bernoulli variables, an important part of the study is devoted to the problem of how to generate such sequences. The correlation structure of these sequences is also studied. The remainder of the thesis consists of three diverse papers, Papers D–F, where distributional properties in survey sampling are considered. In Paper D the concern is with unified statistical inference, where both the model for the population and the sampling design are taken into account when considering the properties of an estimator. In this paper the framework of the sampling design as a multivariate distribution is used to outline two-phase sampling. In Paper E, we give probability functions for different sampling designs, such as conditional Poisson, Sampford and Pareto designs, and discuss methods for sampling by using the probability function of a sampling design. Paper F focuses on the design-based distributional characteristics of the π-estimator and its variance estimator. We give formulae for the higher-order moments and cumulants of the π-estimator, together with formulae for the design-based variance of the variance estimator and the covariance of the π-estimator and its variance estimator.
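
A minimal sketch of the idea of sampling with negatively correlated inclusion indicators: a stationary two-state Markov chain gives marginal inclusion probability p and a chosen negative lag-1 correlation, as a simple stand-in for the methods proposed in the thesis. Whether the variance of the sample mean actually drops depends on the population autocorrelation, so the population here is simulated as an AR(1) series in which nearby units carry similar values.

```python
import numpy as np

def dependent_bernoulli(n, p, rho, rng):
    """Stationary 0/1 Markov chain with marginal P(X=1) = p and
    lag-1 correlation rho (requires rho >= -p/(1-p))."""
    p_11 = p + rho * (1 - p)   # P(X_t = 1 | X_{t-1} = 1)
    p_01 = p * (1 - rho)       # P(X_t = 1 | X_{t-1} = 0)
    x = np.empty(n, dtype=int)
    x[0] = rng.random() < p
    for t in range(1, n):
        x[t] = rng.random() < (p_11 if x[t - 1] else p_01)
    return x.astype(bool)

rng = np.random.default_rng(4)
# AR(1) population: units close to each other are similar.
eps = rng.normal(0.0, 1.0, 1000)
pop = np.empty(1000)
pop[0] = eps[0]
for i in range(1, 1000):
    pop[i] = 0.9 * pop[i - 1] + eps[i]

def variance_of_sample_mean(rho, reps=2000, p=0.3):
    means = [pop[dependent_bernoulli(len(pop), p, rho, rng)].mean()
             for _ in range(reps)]
    return np.var(means)

print("Bernoulli sampling:       ", variance_of_sample_mean(rho=0.0))
print("negative lag-1 dependence:", variance_of_sample_mean(rho=-0.3))
```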
97

Latent variable models for longitudinal twin data

Dominicus, Annica January 2006 (has links)
Longitudinal twin data provide important information for exploring sources of variation in human traits. In statistical models for twin data, unobserved genetic and environmental factors influencing the trait are represented by latent variables. In this way, trait variation can be decomposed into genetic and environmental components. With repeated measurements on twins, latent variables can be used to describe individual trajectories, and the genetic and environmental variance components are assessed as functions of age. This thesis contributes to statistical methodology for analysing longitudinal twin data by (i) exploring the use of random change point models for modelling variance as a function of age, (ii) assessing how nonresponse in twin studies may affect estimates of genetic and environmental influences, and (iii) providing a method for hypothesis testing of genetic and environmental variance components. The random change point model, in contrast to linear and quadratic random effects models, is shown to be very flexible in capturing variability as a function of age. Approximate maximum likelihood inference through first-order linearization of the random change point model is contrasted with Bayesian inference based on Markov chain Monte Carlo simulation. In a set of simulations based on a twin model for informative nonresponse, it is demonstrated how the effect of nonresponse on estimates of genetic and environmental variance components depends on the underlying nonresponse mechanism. This thesis also reveals that the standard procedure for testing variance components is inadequate, since the null hypothesis places the variance components on the boundary of the parameter space. The asymptotic distribution of the likelihood ratio statistic for testing variance components in classical twin models is derived, resulting in a mixture of chi-square distributions. Statistical methodology is illustrated with applications to empirical data on cognitive function from a longitudinal twin study of aging.
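
For the simplest boundary case, testing a single variance component against zero, the classical result is that the likelihood ratio statistic is asymptotically a 50:50 mixture of a point mass at zero (χ²₀) and χ²₁; the mixtures derived in the thesis for twin models are more involved, but this gives the flavour. A sketch of the corrected p-value computation:

```python
from scipy.stats import chi2

def boundary_lrt_pvalue(lrt_stat):
    """P-value under the 0.5*chi2(0) + 0.5*chi2(1) mixture that applies
    when testing one variance component equal to zero, i.e. a null
    hypothesis on the boundary of the parameter space."""
    if lrt_stat <= 0:
        return 1.0
    return 0.5 * chi2.sf(lrt_stat, df=1)

# Naive chi2(1) reference vs. the boundary-corrected mixture:
stat = 2.71
print("naive chi2(1) p:", chi2.sf(stat, df=1))       # ~0.10
print("mixture p      :", boundary_lrt_pvalue(stat))  # ~0.05
```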
98

Estimation of wood fibre length distributions from censored mixture data

Svensson, Ingrid January 2007 (has links)
The motivating forestry background for this thesis is the need for fast, non-destructive, and cost-efficient methods to estimate fibre length distributions in standing trees in order to evaluate the effect of silvicultural methods and breeding programs on fibre length. The usage of increment cores is a commonly used non-destructive sampling method in forestry. An increment core is a cylindrical wood sample taken with a special borer, and the methods proposed in this thesis are especially developed for data from increment cores. Nevertheless the methods can be used for data from other sampling frames as well, for example for sticks with the shape of an elongated rectangular box.

This thesis proposes methods to estimate fibre length distributions based on censored mixture data from wood samples. Due to sampling procedures, wood samples contain cut (censored) and uncut observations. Moreover the samples consist not only of the fibres of interest but of other cells (fines) as well. When the cell lengths are determined by an automatic optical fibre-analyser, there is no practical possibility to distinguish between cut and uncut cells or between fines and fibres. Thus the resulting data come from a censored version of a mixture of the fine and fibre length distributions in the tree. The methods proposed in this thesis can handle this lack of information.

Two parametric methods are proposed to estimate the fine and fibre length distributions in a tree. The first method is based on grouped data. The probabilities that the length of a cell from the sample falls into different length classes are derived, the censoring caused by the sampling frame taken into account. These probabilities are functions of the unknown parameters, and ML estimates are found from the corresponding multinomial model.

The second method is a stochastic version of the EM algorithm based on the individual length measurements. The method is developed for the case where the distributions of the true lengths of the cells at least partially appearing in the sample belong to exponential families. The cell length distribution in the sample and the conditional distribution of the true length of a cell at least partially appearing in the sample given the length in the sample are derived. Both these distributions are necessary in order to use the stochastic EM algorithm. Consistency and asymptotic normality of the stochastic EM estimates is proved.

The methods are applied to real data from increment cores taken from Scots pine trees (Pinus sylvestris L.) in Northern Sweden and further evaluated through simulation studies. Both methods work well for sample sizes commonly obtained in practice.
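
A stripped-down sketch of the grouped-data idea: class probabilities are written as functions of the unknown parameters and a multinomial likelihood is maximised. The lognormal length distribution is an assumption for illustration, and the censoring and fines components that the thesis actually handles are omitted here.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import lognorm

rng = np.random.default_rng(5)
true_lengths = lognorm.rvs(s=0.4, scale=2.5, size=3000, random_state=rng)
edges = np.array([0.0, 1.0, 2.0, 3.0, 4.0, np.inf])  # length classes (mm)
counts, _ = np.histogram(true_lengths, bins=edges)

def neg_loglik(theta):
    s, scale = np.exp(theta)                 # positivity via log-parameters
    cdf = lognorm.cdf(edges, s=s, scale=scale)
    probs = np.clip(np.diff(cdf), 1e-12, None)  # class probabilities
    return -np.sum(counts * np.log(probs))   # multinomial log-likelihood

res = minimize(neg_loglik, x0=np.log([0.5, 2.0]), method="Nelder-Mead")
print("estimated (s, scale):", np.exp(res.x))  # should be close to (0.4, 2.5)
```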
99

Perturbed Renewal Equations with Non-Polynomial Perturbations

Ni, Ying January 2010 (has links)
This thesis deals with a model of a nonlinearly perturbed continuous-time renewal equation with non-polynomial perturbations. The characteristics, namely the defect and moments, of the distribution function generating the renewal equation are assumed to have expansions with respect to a non-polynomial asymptotic scale: $\{\varphi_{\mathbf{n}}(\varepsilon) = \varepsilon^{\mathbf{n} \cdot \mathbf{w}}, \mathbf{n} \in \mathbf{N}_0^k\}$ as $\varepsilon \to 0$, where $\mathbf{N}_0$ is the set of non-negative integers, $\mathbf{N}_0^k \equiv \mathbf{N}_0 \times \cdots \times \mathbf{N}_0$, $1 \leq k < \infty$, with the product taken $k$ times, and $\mathbf{w}$ is a $k$-dimensional parameter vector that satisfies certain properties. For the one-dimensional case, i.e., $k=1$, this model reduces to the model of a nonlinearly perturbed renewal equation with polynomial perturbations, which is well studied in the literature. The goal of the present study is to obtain the exponential asymptotics for the solution to the perturbed renewal equation in the form of exponential asymptotic expansions and present possible applications.

The thesis is based on three papers which study successively the model stated above. Paper A investigates the two-dimensional case, i.e. where $k=2$. The corresponding asymptotic exponential expansion for the solution to the perturbed renewal equation is given. The asymptotic results are applied to an example of the perturbed risk process, which leads to diffusion approximation type asymptotics for the ruin probability. Numerical experimental studies on this example of a perturbed risk process are conducted in paper B, where Monte Carlo simulations are used to study the accuracy and properties of the asymptotic formulas. Paper C presents the asymptotic results for the more general case where the dimension $k$ satisfies $1 \leq k < \infty$, which are applied to the asymptotic analysis of the ruin probability in an example of perturbed risk processes with this general type of non-polynomial perturbations. All the proofs of the theorems stated in paper C are collected in its supplement: paper D.
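
The object being perturbed here is a renewal equation $x(t) = h(t) + \int_0^t x(t-s)\,f(s)\,ds$. As a minimal sketch, the code below solves one numerically by discretising the convolution. The exponential inter-arrival density is an assumption chosen only because, with $h = 1 - F$, the exact solution is identically 1, so the output can be checked; the thesis works with perturbed, possibly defective distribution functions rather than this clean case.

```python
import numpy as np

def solve_renewal(h, f_density, t_max, dt):
    """Discretize x(t) = h(t) + int_0^t x(t-s) f(s) ds on a grid,
    using a left-endpoint rule for the convolution integral."""
    t = np.arange(0.0, t_max, dt)
    f = f_density(t) * dt
    x = np.empty_like(t)
    for i in range(len(t)):
        # sum_{j=1..i} x(t_{i-j}) f(t_j) dt, built from already-computed x.
        conv = np.dot(x[:i][::-1], f[1:i + 1]) if i > 0 else 0.0
        x[i] = h(t[i]) + conv
    return t, x

# Check case: F exponential with rate 1, h = 1 - F => x(t) = 1 for all t.
rate = 1.0
t, x = solve_renewal(h=lambda u: np.exp(-rate * u),
                     f_density=lambda u: rate * np.exp(-rate * u),
                     t_max=10.0, dt=0.01)
print("x(10) should be ~1:", x[-1])
```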
100

Approximations of Bayes Classifiers for Statistical Learning of Clusters

Ekdahl, Magnus January 2006 (has links)
It is rarely possible to use an optimal classifier. Often the classifier used for a specific problem is an approximation of the optimal classifier. Methods are presented for evaluating the performance of an approximation in the model class of Bayesian Networks. In particular, a performance bound for the class conditional independence approximation is sharpened.

The class conditional independence approximation is connected to the minimum description length principle (MDL), which is connected to Jeffreys' prior through commonly used assumptions. One algorithm for unsupervised classification is presented and compared against other unsupervised classifiers on three data sets. / Report code: LiU-TEK-LIC 2006:11.
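
The class conditional independence approximation is exactly the naive Bayes classifier. As an illustration of approximating the optimal classifier, the sketch below fits naive Bayes to data whose within-class features are deliberately correlated, and compares it with quadratic discriminant analysis, which is the exact Bayes rule under the Gaussian class-conditional model assumed in this synthetic example.

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

rng = np.random.default_rng(6)
n = 2000
y = rng.integers(0, 2, n)
# Within-class features are correlated, so the class conditional
# independence assumption behind naive Bayes is deliberately violated.
cov = [[1.0, 0.7], [0.7, 1.0]]
X = np.array([rng.multivariate_normal([2.0 * c, 0.0], cov) for c in y])

nb = GaussianNB().fit(X[:1500], y[:1500])
qda = QuadraticDiscriminantAnalysis().fit(X[:1500], y[:1500])  # exact Bayes here
print("naive Bayes accuracy:", (nb.predict(X[1500:]) == y[1500:]).mean())
print("Bayes (QDA) accuracy:", (qda.predict(X[1500:]) == y[1500:]).mean())
```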
