1 |
Distribution-Free Confidence Intervals for Difference and Ratio of Medians
Price, Robert M.; Bonett, Douglas G. 01 December 2002
The classic nonparametric confidence intervals for a difference or ratio of medians assume that the distributions of the response variable or the log-transformed response variable have identical shapes in each population. Asymptotic distribution-free confidence intervals for a difference and ratio of medians are proposed which do not require identically shaped distributions. The new asymptotic methods are easy to compute and simulation results show that they perform well in small samples.
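As a rough illustration of an interval for a difference of medians that does not assume identically shaped distributions, the sketch below uses a percentile bootstrap. This is a swapped-in, generic technique, not the asymptotic method proposed by Price and Bonett; the function name, the sample data and all parameter defaults are hypothetical.

```python
import random
import statistics


def bootstrap_median_diff_ci(x, y, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap interval for median(x) - median(y).

    Illustrative only: a generic resampling approach, NOT the
    asymptotic interval described in the abstract above.
    """
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        # Resample each group independently, with replacement.
        xb = rng.choices(x, k=len(x))
        yb = rng.choices(y, k=len(y))
        diffs.append(statistics.median(xb) - statistics.median(yb))
    diffs.sort()
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = max(lo_idx, int((1 - alpha / 2) * n_boot) - 1)
    return diffs[lo_idx], diffs[hi_idx]


# Hypothetical right-skewed samples with different shapes.
x = [12, 15, 11, 30, 14, 13, 45, 16, 12, 18]
y = [5, 6, 4, 7, 20, 5, 6, 8, 5, 9]
lower, upper = bootstrap_median_diff_ci(x, y)
print(f"approximate 95% interval for the median difference: [{lower}, {upper}]")
```

With samples this small the bootstrap interval can be coarse, which is one reason purpose-built asymptotic intervals such as those in the paper are of interest.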
|
2 |
Bayesian Inference on Mixed-effects Models with Skewed Distributions for HIV longitudinal Data
Chen, Ren 01 January 2012
Statistical models have greatly improved our understanding of the pathogenesis of HIV-1 infection and have guided the treatment of AIDS patients and the evaluation of antiretroviral (ARV) therapies. Although various statistical modeling and analysis methods have been applied to estimate the parameters of HIV dynamics via mixed-effects models, these methods commonly assume that the random errors and random effects are normally distributed. This assumption may lack robustness against departures from normality and may therefore lead to misleading or biased inference. Moreover, some covariates, such as CD4 cell count, are often measured with substantial error. Bivariate clustered (correlated) data are also commonly encountered in HIV dynamic studies, and such data sets often exhibit skewness and heavy tails. In the literature, there has been considerable interest in using practical computational methods to compare proposed models of HIV dynamics, to accommodate skewness (in univariate outcomes) and covariate measurement errors, or to handle skewness in multivariate outcomes observed in longitudinal studies. However, few studies have addressed these issues simultaneously.
One way to incorporate skewness is to use a more general distribution family that allows flexible distributional assumptions for the random effects and the model random errors, yielding robust parameter estimates. In this research, we developed Bayesian hierarchical models in which skewness was incorporated through skew-elliptical (SE) distributions, and all inferences were carried out with a Bayesian approach via Markov chain Monte Carlo (MCMC). Two real data sets from HIV/AIDS clinical trials were used to illustrate the proposed models and methods.
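To make the idea of replacing a normal assumption with a skewed one more concrete, the sketch below draws from a standard skew-normal distribution, one well-known member of the skew-elliptical family, using the stochastic representation Y = delta*|Z0| + sqrt(1 - delta^2)*Z1 with independent standard normal Z0 and Z1. It is an illustration only, not the dissertation's model or its WinBUGS code; the function names and the choice delta = 0.95 are assumptions.

```python
import math
import random


def sample_skew_normal(delta, n, seed=0):
    """Draw n standard skew-normal values with skewness parameter delta.

    Uses Y = delta*|Z0| + sqrt(1 - delta^2)*Z1, where Z0 and Z1 are
    independent N(0, 1). delta = 0 recovers the ordinary normal.
    """
    if not -1.0 < delta < 1.0:
        raise ValueError("delta must lie strictly between -1 and 1")
    rng = random.Random(seed)
    scale = math.sqrt(1.0 - delta * delta)
    return [
        delta * abs(rng.gauss(0.0, 1.0)) + scale * rng.gauss(0.0, 1.0)
        for _ in range(n)
    ]


def sample_skewness(values):
    """Moment-based sample skewness (third central moment / sd^3)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


skewed = sample_skew_normal(delta=0.95, n=20000)   # hypothetical setting
symmetric = sample_skew_normal(delta=0.0, n=20000)
print("skewness with delta=0.95:", round(sample_skewness(skewed), 3))
print("skewness with delta=0.00:", round(sample_skewness(symmetric), 3))
```

In a hierarchical model, the same construction can be applied to random effects or errors, so that the half-normal term |Z0| acts as a latent variable sampled within MCMC.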
This dissertation explored three topics. First, under an SE distribution assumption, we compared models with different time-varying viral decay rate functions. The effect of skewness on model fit was also evaluated, as were the associations between the decay rates estimated from the best-fitting model and clinically relevant variables such as baseline HIV viral load, CD4 cell count and long-term response status. Second, using joint modeling within a Bayesian approach, we simultaneously addressed a skewed outcome and a covariate process with measurement errors. We also investigated how the estimated parameters changed under linear, nonlinear and semiparametric mixed-effects models. Third, to accommodate clustering within subjects as well as the correlation between bivariate measurements, such as CD4 and CD8 cell counts measured during ARV therapy, we investigated bivariate linear mixed-effects models with skewed distributions. We proposed extending the underlying normality assumption to an SE distribution assumption. The impact of different distributions in the SE family on model fit was also evaluated and compared.
Real data sets from AIDS clinical trial studies were used to illustrate the methodologies proposed for the three topics and to compare candidate models with different distribution specifications. The results may be important for HIV/AIDS studies by providing guidance for better understanding virologic responses to antiretroviral treatment. Although this research is motivated by HIV/AIDS studies, the basic concepts of the methods developed here can be applied more broadly in other fields, as long as the relevant technical specifications are met. In addition, the proposed methods can be easily implemented with the publicly available WinBUGS package, which makes our approach accessible to practicing statisticians in the field.
|
3 |
Amélioration de la dissémination de données biaisées dans les réseaux structurés / Improving skewed data dissemination in structured overlays
Antoine, Maeva 23 September 2015
Many distributed systems face the problem of load imbalance between machines. With the advent of Big Data, large datasets whose values are often highly skewed are produced by heterogeneous sources and must often be processed in real time. It is therefore necessary to adapt to variations in the size, content and source of the incoming data.
In this thesis, we focus on RDF data, a format of the Semantic Web. We propose a novel approach to improve data distribution, based on the use of several order-preserving hash functions. This allows an overloaded peer to independently modify its hash function in order to reduce the interval of values it is responsible for. More generally, there exist almost as many load balancing strategies as there are different systems. We show that many load balancing schemes are composed of the same basic elements, and that only the implementation and interconnection of these elements vary. Based on this observation, we describe the concepts behind building a common API to implement any load balancing strategy independently of the rest of the code. Implemented on our distributed storage system, the API has a minimal impact on the business code and allows the developer to change one part of a strategy without modifying the other components. We also show how modifying some parameters can lead to significant improvements in the results.
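The following sketch illustrates, under simplifying assumptions, why the choice of order-preserving hash function matters when keys are skewed. It compares a plain linear mapping with a mapping built from the empirical distribution of a small key sample. It is not the thesis's algorithm or API: the function names, the numeric keys and the four-peer layout are all hypothetical.

```python
import bisect


def linear_hash(value, lo, hi):
    """Order-preserving map of a value in [lo, hi) onto [0, 1)."""
    return (value - lo) / (hi - lo)


def quantile_hash(value, sorted_sample):
    """Order-preserving (non-decreasing) map based on a sample's empirical CDF."""
    rank = bisect.bisect_right(sorted_sample, value)
    return rank / (len(sorted_sample) + 1)


def peer_loads(keys, hash_fn, n_peers):
    """Count keys per peer when peers own equal slices of [0, 1)."""
    loads = [0] * n_peers
    for key in keys:
        peer = min(int(hash_fn(key) * n_peers), n_peers - 1)
        loads[peer] += 1
    return loads


# Hypothetical skewed key set: most values are small.
keys = [x ** 3 for x in range(100)]
sample = sorted(keys[::10])  # a small sample of the observed keys

loads_linear = peer_loads(keys, lambda k: linear_hash(k, 0, 10 ** 6), 4)
loads_quantile = peer_loads(keys, lambda k: quantile_hash(k, sample), 4)
print("linear hash loads:  ", loads_linear)
print("quantile hash loads:", loads_quantile)
```

Both mappings keep keys in their natural order, so range queries still touch contiguous peers; only the way the value space is divided among peers changes.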
|
4 |
Bayesian Models for the Analyzes of Noisy Responses From Small Areas: An Application to Poverty Estimation
Manandhar, Binod 26 April 2017
We implement small area estimation (SAE) techniques to study consumption, a welfare indicator used to assess poverty, with data from the 2003-2004 Nepal Living Standards Survey (NLSS-II) and the 2001 census. NLSS-II has detailed information on consumption, but it can give estimates only at the stratum level or higher. While population variables are available for all households in the census, they do not include information on consumption; the survey, however, does contain the 'population' variables. We combine these two sets of data to provide estimates of poverty indicators (incidence, gap and severity) for small areas (wards, village development committees and districts). Consumption is the aggregate of all food and non-food items consumed. In the welfare survey, respondents are asked to recall all information about consumption throughout the reference year. Such data are therefore likely to be noisy, possibly because of response or recall errors. The consumption variable is continuous and positively skewed, so a statistician might use a logarithmic transformation, which can reduce skewness and help meet the normality assumption required for model building. However, this can be problematic because back-transformation may produce inaccurate estimates and makes interpretation difficult. Without using the logarithmic transformation, we develop hierarchical Bayesian models to link the survey to the census. In our models for consumption, we incorporate the 'population' variables as covariates. First, we assume that consumption is noiseless and model it under three scenarios: the exponential distribution, the gamma distribution and the generalized gamma distribution. Second, we assume that consumption is noisy, and we fit the generalized beta distribution of the second kind (GB2) to consumption.
We consider three more scenarios of GB2: a mixture of exponential and gamma distributions, a mixture of two gamma distributions, and a mixture of two generalized gamma distributions. We note that the models for noisy responses are difficult to fit because they have non-identifiable parameters. For each scenario, after fitting two hierarchical Bayesian models (with and without area effects), we show how to select the most plausible model, and we perform a Bayesian data analysis on Nepal's poverty data. We show how to predict the poverty indicators for all wards, village development committees and districts of Nepal (a big data problem) by combining the survey data with the census. This is a computationally intensive problem because Nepal has about four million households, of which only about four thousand are in the survey, and there is no record linkage between households in the survey and the census. Finally, we perform empirical studies to assess the quality of our survey-census procedure.
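Poverty incidence, gap and severity are commonly formalized as the Foster-Greer-Thorbecke (FGT) indices with exponents 0, 1 and 2. The sketch below computes them directly from a list of household consumption values, assuming this standard definition is the one intended; the function name, the poverty line and the sample values are hypothetical and are not taken from the NLSS-II data.

```python
def fgt_index(consumption, poverty_line, alpha):
    """Foster-Greer-Thorbecke index of order alpha.

    alpha = 0: incidence (share of households below the line)
    alpha = 1: gap (mean relative shortfall, zero for the non-poor)
    alpha = 2: severity (mean squared relative shortfall)
    """
    if poverty_line <= 0:
        raise ValueError("poverty_line must be positive")
    if not consumption:
        raise ValueError("consumption must not be empty")
    total = 0.0
    for y in consumption:
        if y < poverty_line:  # only poor households contribute
            total += ((poverty_line - y) / poverty_line) ** alpha
    return total / len(consumption)


# Hypothetical consumption values for four households, poverty line 100.
households = [50, 100, 150, 200]
incidence = fgt_index(households, 100, 0)
gap = fgt_index(households, 100, 1)
severity = fgt_index(households, 100, 2)
print(incidence, gap, severity)
```

In a Bayesian small area analysis, such indices would typically be evaluated on each posterior predictive draw of census-household consumption and then summarized per area.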
|