41

Bayesian Inference on Longitudinal Semi-continuous Substance Abuse/Dependence Symptoms Data

Xing, Dongyuan 16 September 2015 (has links)
Substance use data such as alcohol drinking often contain a high proportion of zeros. In studies examining alcohol consumption in college students, for instance, many students may not drink during the study period, resulting in a large number of zero observations. Zero-inflated continuous data, also called semi-continuous data, typically consist of a mixture of a degenerate distribution at the origin (zero) and a right-skewed, continuous distribution for the positive values. Ignoring the extreme non-normality in semi-continuous data may lead to substantially biased estimates and inference. Longitudinal or repeated measures of semi-continuous data present special challenges in statistical inference because of the correlation among repeated measures on the same subject. Linear mixed-effects models (LMM) with a normality assumption, which are routinely used to analyze correlated continuous outcomes, are inapplicable to semi-continuous outcomes. Data transformation, such as a log transformation, is typically used to correct the non-normality. However, log-transformed data, after the addition of a small constant to handle zeros, may not successfully approximate the normal distribution because of the spike caused by the zeros in the original observations. In addition, data transformation should be avoided because (i) it usually provides reduced information on the underlying data-generating mechanism; (ii) it makes results difficult to interpret on the transformed scale; and (iii) it may cause re-transformation bias. Two-part mixed-effects models, with one component modeling the probability of being zero and one modeling the intensity of nonzero values, have been developed over the last ten years to analyze longitudinal semi-continuous data. However, a log transformation is still needed for the right-skewed nonzero continuous values in two-part modeling. In this research, we developed Bayesian hierarchical models in which the extreme non-normality in longitudinal semi-continuous data, caused by the spike at zero and the right skewness, was accommodated using skew-elliptical (SE) distributions, and all inferences were carried out in a Bayesian framework via Markov chain Monte Carlo (MCMC). Substance abuse/dependence data, including alcohol abuse/dependence symptoms (AADS) data and marijuana abuse/dependence symptoms (MADS) data from a longitudinal observational study, were used to illustrate the proposed models and methods. This dissertation explored three topics. First, we presented a one-part LMM with a skew-normal (SN) distribution under the Bayesian framework and applied it to the AADS data. The association of AADS with the serotonin transporter gene polymorphism (5-HTTLPR) and baseline covariates was analyzed. The results from the proposed model were compared with those from LMMs with normal, gamma, and log-normal (LN) distributional assumptions. Simulation studies were conducted to evaluate the performance of the proposed models. We concluded that the LMM with the SN distribution not only provides the best model fit based on the Deviance Information Criterion (DIC), but also offers a more intuitive and convenient interpretation of results, because it models the response variable on its original scale. Second, we proposed a flexible two-part mixed-effects model with skew distributions, including the skew-t (ST) and SN distributions, for the right-skewed nonzero values in Part II of the model under a Bayesian framework.
The proposed model was illustrated with the longitudinal AADS data, and results from models with ST, SN, and normal distributions were compared under different random-effects structures. Simulation studies were conducted to evaluate the performance of the proposed models. Third, multivariate (bivariate) correlated semi-continuous data are also commonly encountered in clinical research. For instance, alcohol use and marijuana use may be observed in the same subject, and underlying common factors may induce dependence between them. There is very limited literature on multivariate analysis of semi-continuous data. We proposed a Bayesian approach to analyze bivariate semi-continuous outcomes by jointly specifying a logistic mixed-effects model for the zero inflation in each response and a bivariate linear mixed-effects model (BLMM) for the positive values, linked through a correlated random-effects structure. Multivariate skew distributions, including the ST and SN distributions, were used to relax the normality assumption in the BLMM. The proposed models were illustrated with an application to the longitudinal AADS and MADS data. A simulation study was conducted to evaluate the performance of the proposed models.
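
For readers unfamiliar with the two-part formulation, the sketch below illustrates the likelihood structure on a single set of observations: a Bernoulli component for the zero/nonzero indicator and a skew-normal component for the positive values. It is a minimal, frequentist-style illustration using scipy's skew-normal, not the dissertation's Bayesian hierarchical model; all function and parameter names here are assumptions.

```python
import numpy as np
from scipy.stats import skewnorm

def two_part_loglik(y, p_nonzero, loc, scale, shape):
    """Log-likelihood of semi-continuous observations y under a simple
    two-part model: P(y = 0) = 1 - p_nonzero, and positive values follow a
    skew-normal(loc, scale, shape) distribution (illustrative only)."""
    y = np.asarray(y, dtype=float)
    ll = np.where(
        y == 0,
        np.log1p(-p_nonzero),                                            # Part I: zero spike
        np.log(p_nonzero) + skewnorm.logpdf(y, shape, loc=loc, scale=scale),  # Part II: positives
    )
    return ll.sum()

# toy usage: roughly 60% zeros plus right-skewed positive values
rng = np.random.default_rng(0)
zeros = rng.random(200) < 0.6
positives = np.abs(skewnorm.rvs(4, loc=1.0, scale=2.0, size=200, random_state=0)) + 0.01
y = np.where(zeros, 0.0, positives)
print(two_part_loglik(y, p_nonzero=0.4, loc=1.0, scale=2.0, shape=4.0))
```

In the dissertation's setting, the two parts are tied together with subject-level random effects and fitted by MCMC; the sketch only shows how the zero spike and the skewed positive part contribute separately to the likelihood.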
42

Využití metody paralelního sekvenování při stanovování zešikmení X inaktivace / Use of massive parallel sequencing in determination of skewed X inactivation

Veselková, Tereza January 2016 (has links)
Skewed X chromosome inactivation has often been studied as a possible factor influencing the manifestation of X-linked diseases in heterozygous women, yet the association between phenotype and degree of skewing remains unclear for most disorders. Current work relies mostly on methods based on methylation-sensitive restriction to determine the X-inactivation pattern, chiefly the HUMARA assay, which examines the methylation profile of the AR gene. However, those methods have known disadvantages, and new methodological approaches are therefore still being sought. We used DNA isolated from whole blood, and in some cases also buccal swabs, to assess X-inactivation patterns in 54 women using methylation-based methods at the AR, CNKSR2 and RP2 loci. A transcription-based assay using massive parallel sequencing and the polymorphisms LAMP2 c.156A>T, IDS c.438C>T and ABCD1 c.1548G>A was employed to evaluate skewing of X inactivation in the 32 of these women whose samples were available for RNA extraction. Partly thanks to almost no stutter during PCR, the RP2 locus was the most informative in our study (71 % of women), and approximately the same proportion of women (69 %) were informative for the HUMARA assay. However, when comparing the results of these two methods we found a difference greater than 10 % in...
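
As a concrete illustration of how allele-specific read counts from massive parallel sequencing can be turned into a degree of skewing, the sketch below computes the fraction of transcripts attributable to the more highly expressed allele at an informative heterozygous site. This is a generic calculation, not the thesis's analysis pipeline; the example counts are invented.

```python
def skewing_ratio(ref_reads: int, alt_reads: int) -> float:
    """X-inactivation skewing estimate: the proportion of transcripts carrying
    the more highly expressed allele (0.5 = random inactivation, 1.0 = complete skew)."""
    total = ref_reads + alt_reads
    if total == 0:
        raise ValueError("no informative reads at this site")
    return max(ref_reads, alt_reads) / total

# e.g. an expressed heterozygous variant such as LAMP2 c.156A>T (counts are hypothetical)
print(skewing_ratio(ref_reads=480, alt_reads=120))  # 0.8 -> an 80:20 inactivation pattern
```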
43

Amélioration de la dissémination de données biaisées dans les réseaux structurés / Improving skewed data dissemination in structured overlays

Antoine, Maeva 23 September 2015 (has links)
Many distributed systems face the problem of load imbalance between machines. With the advent of Big Data, large datasets whose values are often highly skewed are produced by heterogeneous sources, frequently to be processed in real time. It is therefore necessary to adapt to variations in the size, content, and source of the incoming data. In this thesis, we focus on RDF data, a format of the Semantic Web. We propose a novel approach to improve data distribution based on the use of several order-preserving hash functions. This allows an overloaded peer to independently modify its hash function in order to reduce the interval of values it is responsible for. More generally, to address the load imbalance issue, there exist almost as many load balancing strategies as there are different systems. We show that many load balancing schemes are composed of the same basic elements, and that only the implementation and interconnection of these elements vary. Based on this observation, we describe the concepts behind building a common API to implement any load balancing strategy independently from the rest of the code. Implemented on our distributed storage system, the API has a minimal impact on the business code and allows the developer to change one part of a strategy without modifying the other components. We also show how modifying some parameters can lead to significant improvements in the results.
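
A minimal sketch of the idea behind per-peer, order-preserving hashing: every peer maps keys to [0, 1) with a monotone function and, when overloaded, narrows the sub-interval its hash maps into, so that part of its former range can be taken over by a neighbour. The names and the linear rescaling below are assumptions for illustration, not the thesis's actual API.

```python
def order_preserving_hash(key: str, width: int = 8) -> float:
    """Map a string to [0, 1) while preserving lexicographic order
    (keys are compared on their first `width` bytes)."""
    padded = key.encode("utf-8")[:width].ljust(width, b"\x00")
    return int.from_bytes(padded, "big") / 256 ** width

class Peer:
    """A peer responsible for the key range [low, high); it can shrink the
    image of its hash function to shed part of its load (illustrative only)."""
    def __init__(self, low: float, high: float):
        self.low, self.high = low, high

    def local_hash(self, key: str) -> float:
        # Rescale the global order-preserving hash into this peer's interval.
        return self.low + (self.high - self.low) * order_preserving_hash(key)

    def shed_load(self, fraction: float) -> None:
        # Keep only the lower (1 - fraction) part of the interval; the remainder
        # can be picked up by a neighbouring peer.
        self.high = self.low + (self.high - self.low) * (1 - fraction)

p = Peer(0.25, 0.50)
print(p.local_hash("http://example.org/resource"))
p.shed_load(0.4)
print(p.low, p.high)  # interval reduced from [0.25, 0.50) to [0.25, 0.40)
```

Because the hash is order-preserving, range queries over RDF terms still map to contiguous intervals after a peer rescales its function, which is the property the thesis exploits.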
44

Four Essays on Risk Assessment with Financial Econometrics Models

Castillo, Brenda 25 July 2022 (has links)
This thesis includes four essays on risk assessment with financial econometrics models. The first chapter provides Monte Carlo evidence on the efficiency gains obtained in GARCH-based estimations of VaR and ES by incorporating dependence information through copulas and subsequently using full maximum likelihood (FML) estimates. First, individual return series are considered; in this case, the efficiency gain stems from exploiting the relationship with another return series using a copula model. Second, portfolio return series, obtained as a linear combination of return series related through a copula model, are considered; in this case, the efficiency gain stems from using FML estimates instead of two-stage maximum likelihood estimates. Our results show that, in these situations, using copula models and FML leads to a substantial reduction in the mean squared error of the VaR and ES estimates (around 50% when there is a medium degree of dependence between returns) and a notable improvement in the performance of backtesting procedures. Chapter 2 then analyzes the impact of the COVID-19 pandemic on the conditional variance of stock returns. In this work, we look at this effect from a global perspective, employing series of major stock market and sector indices. We use Hansen's skewed-t distribution with an EGARCH model extended to control for sudden changes in volatility, and we examine the COVID-19 effect on the VaR. Our results show a significant sudden upward shift in the variance of the return distribution after the announcement of the pandemic, which must be properly accounted for to obtain reliable measures for financial risk management. In chapter 3, we assess VaR and ES estimates assuming different models for standardised returns, such as Cornish-Fisher and Gram-Charlier polynomial expansions, and well-known parametric densities such as the normal, the skewed Student-t family of Zhu and Galbraith (2010), and the Johnson distribution. This chapter aims to check whether models based on polynomial expansions outperform the parametric ones. We carry out the model performance comparison in two stages: first, a backtesting analysis for VaR and ES, and second, a loss function approach. The backtesting results in our empirical exercise suggest that all distributions but the normal perform quite well in VaR and ES estimation. Regarding the loss function analysis, we conclude that the Cornish-Fisher expansion usually outperforms the others in VaR estimation, while the Johnson distribution provides the best ES estimates in most cases, although the differences among all distributions (excluding the normal) are not large. Finally, chapter 4 assesses whether accounting for asymmetry and tail dependence in return distributions may help to identify more profitable investment strategies in asset portfolios. Three copula models are used to parameterize the multivariate distribution of returns: Gaussian, C-Vine and R-Vine copulas. Using data on equities and ETFs from the US market, we find evidence that, for portfolios of 48 constituents or fewer, the R-Vine copula is able to produce more profitable portfolios than both the C-Vine and Gaussian copulas. However, for portfolios of 100 assets, the performance of the R- and C-Vine copulas is quite similar, with both outperforming the Gaussian copula.
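
To make the risk measures concrete, the following sketch filters a return series with a plain GARCH(1,1) recursion and converts the one-day-ahead volatility forecast into parametric VaR and ES. It is a simplified illustration under stated assumptions (hypothetical parameter values, normal rather than skewed-t innovations, no copula), not the estimation procedure used in the thesis.

```python
import numpy as np
from scipy.stats import norm

def garch11_sigma2(returns, omega, alpha, beta):
    """GARCH(1,1) conditional-variance recursion; returns the one-step-ahead forecast."""
    sigma2 = np.empty(len(returns) + 1)
    sigma2[0] = np.var(returns)                      # initialise at the sample variance
    for t in range(len(returns)):
        sigma2[t + 1] = omega + alpha * returns[t] ** 2 + beta * sigma2[t]
    return sigma2[-1]

def var_es_normal(sigma, level=0.99):
    """One-day parametric VaR and ES (reported as positive losses) for
    zero-mean normal returns with standard deviation sigma."""
    z = norm.ppf(1 - level)
    var = -sigma * z
    es = sigma * norm.pdf(z) / (1 - level)
    return var, es

rng = np.random.default_rng(1)
r = rng.normal(0, 0.01, size=1000)                   # simulated daily returns
sigma_next = np.sqrt(garch11_sigma2(r, omega=1e-6, alpha=0.05, beta=0.90))
print(var_es_normal(sigma_next, level=0.99))
```

In the thesis, the innovations are instead drawn from skewed or copula-linked distributions and the parameters are estimated by (full) maximum likelihood; the conversion of a volatility forecast into VaR and ES follows the same pattern.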
45

Large-Scale Testing of Passive Force Behavior for Skewed Bridge Abutments with Gravel and Geosynthetic Reinforced Soil (GRS) Backfills

Fredrickson, Amy 01 July 2015 (has links) (PDF)
Correct understanding of passive force behavior is particularly key to lateral evaluations of bridges because plastic deformation of the soil backfill is vital to the dissipation of earthquake energy and thermally induced stresses in abutments. Only recently have studies investigated the effects of skew on passive force. Numerical modeling and a handful of skewed abutment tests performed in sand backfill have found reduced passive force with increasing skew, but prior to this study no skewed tests had been performed in gravel or Geosynthetic Reinforced Soil (GRS) backfills. The goal of this study was to better understand passive force behavior in non-skewed and skewed abutments with gravel and GRS backfills. Prior to this study, passive pressures in a GRS integrated approach had not been investigated, and gravel backfills also lack extensive passive force tests. Large-scale testing was performed with non-skewed and 30° skewed abutment configurations. Two tests were performed at each skew angle, one with unconfined gravel backfill and one with GRS backfill, for a total of four tests. The test abutment backwall was 11 ft (3.35 m) wide, non-skewed, and 5.5 ft (1.68 m) high, and it was loaded laterally into the backfill. However, due to actuator loading constraints, all tests except the non-skewed unconfined gravel test were performed to a backfill height of 3.5 ft (1.07 m). The passive force results for the unconfined gravel test were scaled to a 3.5 ft (1.07 m) height for comparison. Test results in both sets of backfills confirmed previous findings that there is a significant reduction in passive force with skewed abutment configurations. The reduction factor was 0.58 for the gravel backfill and 0.63 for the GRS backfill, compared to the predicted reduction factor of 0.53 for a 30° skew. These results are within the scatter of previous skewed testing, but could indicate that slightly higher reduction factors may be applicable for gravel backfills. Both backfills exhibited greater passive strength than sand backfills due to the increased internal friction angle and unit weight. The GRS backfill had reduced initial stiffness and reached only 79% to 87% of the passive force developed by the unreinforced gravel backfill. This reduction was considered to be a result of reduced interface friction due to the geotextile. Additionally, the GRS behaved more linearly than the unreinforced soil. This backfill elasticity is favorable in the GRS-Integrated Bridge System (GRS-IBS) abutment configuration because it allows thermal movement without developing excessive induced stresses in the bridge superstructure.
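
For clarity, the skew reduction factor reported above is simply the ratio of the peak passive force developed by the skewed abutment to that of the corresponding non-skewed abutment. A minimal arithmetic sketch follows; the peak-force values are hypothetical and chosen only to reproduce the reported 0.58 factor.

```python
def skew_reduction_factor(peak_force_skewed: float, peak_force_non_skewed: float) -> float:
    """Ratio of the peak passive force at a given skew angle to the non-skewed peak."""
    return peak_force_skewed / peak_force_non_skewed

# hypothetical peak forces (kips) for a 30-degree skewed vs. non-skewed gravel backfill
print(round(skew_reduction_factor(290.0, 500.0), 2))  # 0.58
```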
46

A Dual-Port Data Cache with Pseudo-Direct Mapping Function

Gade, Arul Sandeep 07 May 2005 (has links)
Conventional on-chip (L1) data caches such as Direct-Mapped (DM) caches and 2-way Set-Associative Caches (SAC) have been widely used for high-performance uni- and multi-processors. Unfortunately, these schemes suffer from high conflict misses since more than one address is mapped onto the same cache line. To reduce conflict misses, much research has gone into developing different cache architectures such as the 2-way Skewed-Associative cache (Skew cache). The 2-way Skew cache has a hardware complexity equivalent to that of a 2-way SAC and a miss rate approaching that of a 4-way SAC. However, the reduction in miss rate achievable with a Skew cache is limited by the confined space available to disperse the conflicting accesses over small memory banks. This research proposes a dual-port data cache called the Pseudo-Direct Cache (PDC) to minimize conflict misses by dispersing addresses effectively over a single memory bank. Our simulation results show that the PDC reduces those misses significantly compared to conventional L1 caches and also achieves 10-15% lower miss rates than a 2-way Skew cache. The SimpleScalar simulator was used for these simulations with the SPEC95FP benchmark programs; similar results were also observed with the SPEC2000FP benchmark programs. Simulations with CACTI 3.0 were performed to evaluate the hardware implications of the PDC relative to the Skew cache. These results show that the PDC has a hardware complexity similar to that of a 2-way SAC and a 4-15% better AMAT than a 2-way Skew cache. The PDC also reduces execution cycles significantly.
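
To illustrate the mapping idea involved, the sketch below contrasts a direct-mapped index with the kind of per-way, XOR-scrambled index functions used in a 2-way skewed-associative cache, where two addresses that conflict in one way need not conflict in the other. This follows the generic skewed-associative scheme; it is not the dissertation's Pseudo-Direct Cache design, and the particular hash choices are assumptions.

```python
SETS = 256            # cache lines per way (power of two)
OFFSET_BITS = 6       # 64-byte cache lines

def direct_mapped_index(addr: int) -> int:
    """Classic direct-mapped / set-associative index: the low-order line bits."""
    return (addr >> OFFSET_BITS) % SETS

def skewed_index(addr: int, way: int) -> int:
    """Per-way index for a 2-way skewed-associative cache: each way XORs a
    different slice of the tag into the index so conflicts rarely coincide."""
    line = addr >> OFFSET_BITS
    index, tag = line % SETS, line // SETS
    if way == 0:
        return (index ^ tag) % SETS
    return (index ^ (tag >> 3) ^ (tag << 2)) % SETS

# two addresses that collide under direct mapping are dispersed by the skewed hashes
a, b = 0x10000, 0x20000
print(direct_mapped_index(a), direct_mapped_index(b))   # 0 0 -> same line: conflict
print(skewed_index(a, 0), skewed_index(b, 0))           # 4 8 -> different lines in way 0
```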
47

Essays on Fine Structure of Asset Returns, Jumps, and Stochastic Volatility

Yu, Jung-Suk 22 May 2006 (has links)
There has been an ongoing debate about the choice of the most suitable model among a variety of model specifications and parameterizations. The first dissertation essay investigates whether asymmetric leptokurtic return distributions such as Hansen's (1994) skewed t-distribution, combined with GARCH specifications, can outperform mixed GARCH-jump models such as Maheu and McCurdy's (2004) GARJI model, which incorporates an autoregressive conditional jump intensity parameterization in the discrete-time framework. I find that the more parsimonious GJR-HT model is superior to mixed GARCH-jump models. Likelihood-ratio (LR) tests, information criteria such as AIC, SC, and HQ, and Value-at-Risk (VaR) analysis confirm that GJR-HT is one of the most suitable model specifications, giving us both a better fit to the data and parsimony of parameterization. The benefits of estimating GARCH models using asymmetric leptokurtic distributions are more substantial for highly volatile series such as emerging stock markets, which have a higher degree of non-normality. Furthermore, Hansen's skewed t-distribution also provides an excellent risk management tool, as evidenced by the VaR analysis. The second dissertation essay provides a variety of empirical evidence that stochastic volatility is redundant for S&P 500 index returns when it is combined with infinite-activity pure Lévy jump models, and that stochastic volatility is important for reducing pricing errors for S&P 500 index options regardless of the jump specification. This finding is important because recent studies have shown that stochastic volatility in a continuous-time framework provides an excellent fit for financial asset returns when combined with finite-activity Merton-type compound Poisson jump-diffusion models. The second essay also shows that the stochastic volatility with jumps (SVJ) and extended variance-gamma with stochastic volatility (EVGSV) models perform almost equally well for option pricing, which strongly implies that the type of Lévy jump specification is not an important factor in enhancing model performance once stochastic volatility is incorporated. In the second essay, I compute option prices via an improved Fast Fourier Transform (FFT) algorithm that uses characteristic functions to match equally spaced log-strike grids to each moneyness and maturity of the actual market option prices.
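
For reference, Hansen's (1994) skewed t-density for a standardized return z, with degrees of freedom η > 2 and skewness λ ∈ (−1, 1), is usually written in the form below; this is the standard textbook parameterization, quoted here for context rather than taken from the dissertation.

```latex
f(z \mid \eta,\lambda)=
\begin{cases}
bc\left(1+\dfrac{1}{\eta-2}\left(\dfrac{bz+a}{1-\lambda}\right)^{2}\right)^{-\frac{\eta+1}{2}}, & z < -a/b,\\[1.5ex]
bc\left(1+\dfrac{1}{\eta-2}\left(\dfrac{bz+a}{1+\lambda}\right)^{2}\right)^{-\frac{\eta+1}{2}}, & z \ge -a/b,
\end{cases}
\qquad
a=4\lambda c\,\frac{\eta-2}{\eta-1},\quad
b^{2}=1+3\lambda^{2}-a^{2},\quad
c=\frac{\Gamma\!\left(\frac{\eta+1}{2}\right)}{\sqrt{\pi(\eta-2)}\,\Gamma\!\left(\frac{\eta}{2}\right)}.
```

In GARCH applications this density is applied to the standardized residual z_t = ε_t/σ_t, with λ controlling the asymmetry and η the tail thickness.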
48

Bayesian Models for the Analyses of Noisy Responses From Small Areas: An Application to Poverty Estimation

Manandhar, Binod 26 April 2017 (has links)
We implement techniques of small area estimation (SAE) to study consumption, a welfare indicator used to assess poverty, in the 2003-2004 Nepal Living Standards Survey (NLSS-II) and the 2001 census. NLSS-II has detailed information on consumption, but it can give estimates only at the stratum level or higher. While population variables are available for all households in the census, they do not include information on consumption; the survey nonetheless includes the 'population' variables. We combine these two sets of data to provide estimates of poverty indicators (incidence, gap and severity) for small areas (wards, village development committees and districts). Consumption is the aggregate of all food and all non-food items consumed. In the welfare survey, respondents are asked to recall all information about consumption throughout the reference year; such data are therefore likely to be noisy, possibly due to response or recall errors. The consumption variable is continuous and positively skewed, so a statistician might use a logarithmic transformation, which can reduce skewness and help meet the normality assumption required for model building. However, this can be problematic since back-transformation may produce inaccurate estimates and there are difficulties in interpretation. Without using the logarithmic transformation, we develop hierarchical Bayesian models to link the survey to the census. In our models for consumption, we incorporate the 'population' variables as covariates. First, we assume that consumption is noiseless, and it is modeled using three scenarios: the exponential distribution, the gamma distribution and the generalized gamma distribution. Second, we assume that consumption is noisy, and we fit the generalized beta distribution of the second kind (GB2) to consumption. We consider three more scenarios of the GB2: a mixture of exponential and gamma distributions, a mixture of two gamma distributions, and a mixture of two generalized gamma distributions. We note that there are difficulties in fitting the models for noisy responses because these models have non-identifiable parameters. For each scenario, after fitting two hierarchical Bayesian models (with and without area effects), we show how to select the most plausible model, and we perform a Bayesian data analysis on Nepal's poverty data. We show how to predict the poverty indicators for all wards, village development committees and districts of Nepal (a big-data problem) by combining the survey data with the census. This is a computationally intensive problem because Nepal has about four million households, with about four thousand households in the survey, and there is no record linkage between households in the survey and the census. Finally, we perform empirical studies to assess the quality of our survey-census procedure.
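
For context, the generalized beta distribution of the second kind (GB2) mentioned above has the following density in its standard parameterization (quoted for reference, with scale b and shape parameters a, p, q, and B(p, q) the beta function):

```latex
f(y \mid a,b,p,q)=\frac{a\,y^{ap-1}}{b^{ap}\,B(p,q)\left(1+(y/b)^{a}\right)^{p+q}},\qquad y>0.
```

The generalized gamma arises as a limiting case of the GB2, and the gamma and exponential as special cases of the generalized gamma, which is why the abstract treats these distributions as nested modeling scenarios for the consumption variable.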
49

Bayesian inference on quantile regression-based mixed-effects joint models for longitudinal-survival data from AIDS studies

Zhang, Hanze 17 November 2017 (has links)
In HIV/AIDS studies, viral load (the number of copies of HIV-1 RNA) and CD4 cell counts are important biomarkers of the severity of viral infection, disease progression, and treatment evaluation. Recently, joint models, which can reduce bias and improve the efficiency of estimates, have been developed to assess the longitudinal process, the survival process, and the relationship between them simultaneously. However, the majority of joint models are based on mean regression, which concentrates only on the mean effect of the outcome variable conditional on certain covariates. In fact, in HIV/AIDS research, the mean effect may not always be of interest. Additionally, if obvious outliers or heavy tails exist, a mean regression model may lead to non-robust results. Moreover, data features such as left-censoring caused by the limit of detection (LOD), covariates with measurement errors, and skewness mean that the analysis of such complicated longitudinal and survival data still poses many challenges, and ignoring these features may result in biased inference. Compared to the mean regression model, the quantile regression (QR) model belongs to a robust model family: it can give a full scan of covariate effects at different quantiles of the response and may be more robust to extreme values. QR is also more flexible, since the distribution of the outcome does not need to be strictly specified through particular parametric assumptions. These advantages have led QR to receive increasing attention in diverse areas. To the best of our knowledge, few studies have focused on QR-based joint models applied to longitudinal-survival data with multiple features. Thus, in this dissertation research, we developed three QR-based joint models via a Bayesian inferential approach: (i) QR-based nonlinear mixed-effects joint models for longitudinal-survival data with multiple features; (ii) QR-based partially linear mixed-effects joint models for longitudinal data with multiple features; and (iii) QR-based partially linear mixed-effects joint models for longitudinal-survival data with multiple features. The proposed joint models are applied to analyze the Multicenter AIDS Cohort Study (MACS) data. Simulation studies are also implemented to assess the performance of the proposed methods under different scenarios. Although this is a biostatistical methodology study, some interesting clinical findings were also obtained.
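
Bayesian quantile regression is most often operationalized through the check-function loss, equivalently an asymmetric Laplace working likelihood; the sketch below shows that loss and the corresponding log-density for a single quantile level tau. It is a generic illustration of the QR building block, not the dissertation's mixed-effects joint-model specification.

```python
import numpy as np

def check_loss(u, tau):
    """Koenker-Bassett check function rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

def ald_logpdf(y, mu, sigma, tau):
    """Log-density of the asymmetric Laplace distribution commonly used as a
    working likelihood in Bayesian quantile regression."""
    return np.log(tau * (1 - tau) / sigma) - check_loss((np.asarray(y) - mu) / sigma, tau)

# the tau-th regression quantile minimizes the summed check loss
y = np.array([1.2, 0.4, 2.8, 0.9])
print(check_loss(y - 1.0, tau=0.5).sum())            # median-regression loss at mu = 1.0
print(ald_logpdf(y, mu=1.0, sigma=0.5, tau=0.25).sum())
```

In the joint models described above, mu would be replaced by a (non)linear predictor with subject-level random effects shared with the survival submodel, and the posterior explored by MCMC.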
50

Classification Analysis Techniques for Skewed Class

Chyi, Yu-Meei 12 February 2003 (has links)
Existing classification analysis techniques (e.g., decision tree induction, backpropagation neural networks, k-nearest neighbor classification) generally exhibit satisfactory classification effectiveness when dealing with data with a non-skewed class distribution. However, real-world applications (e.g., churn prediction and fraud detection) often involve decision outcomes with highly skewed class distributions (e.g., 2% churners and 98% non-churners). Such a highly skewed class distribution problem, if not properly addressed, would imperil the resulting learning effectiveness and might result in a "null" prediction system that simply assigns every instance the majority decision class of the training instances (e.g., predicting all customers as non-churners). In this study, we extended the multi-classifier class-combiner approach and proposed a clustering-based multi-classifier class-combiner technique to address the highly skewed class distribution problem in classification analysis. In addition, we proposed four distance-based methods for selecting a subset of instances of the majority decision class in order to lower the degree of skewness in a data set. Using two real-world datasets (mortality prediction for burn patients and customer loyalty prediction), empirical results suggest that the proposed clustering-based multi-classifier class-combiner technique generally outperformed the traditional multi-classifier class-combiner approach and the four distance-based methods. Keywords: Data Mining, Classification Analysis, Skewed Class Distribution Problem, Decision Tree Induction, Multi-classifier Class-combiner Approach, Clustering-based Multi-classifier Class-combiner Approach
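
A rough sketch of the general idea: cluster the majority class, draw a balanced, cluster-stratified subsample for each base learner, and combine the resulting classifiers by majority vote. This is an illustrative reconstruction of a clustering-based class-combiner, not the exact technique proposed in the thesis; the scikit-learn components and parameter choices are assumptions.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier

def train_class_combiner(X, y, n_learners=5, n_clusters=8, random_state=0):
    """Train several decision trees, each on all minority instances plus an
    equal-sized, cluster-stratified sample of the majority class."""
    rng = np.random.default_rng(random_state)
    minority, majority = (y == 1), (y == 0)            # assume label 1 is the rare class
    X_min, X_maj = X[minority], X[majority]
    clusters = KMeans(n_clusters=n_clusters, n_init=10,
                      random_state=random_state).fit_predict(X_maj)
    learners = []
    for _ in range(n_learners):
        per_cluster = max(1, len(X_min) // n_clusters)  # keep the subsample roughly balanced
        idx = np.concatenate([
            rng.choice(np.flatnonzero(clusters == c),
                       size=min(per_cluster, int(np.sum(clusters == c))),
                       replace=False)
            for c in range(n_clusters)
        ])
        X_bal = np.vstack([X_min, X_maj[idx]])
        y_bal = np.concatenate([np.ones(len(X_min)), np.zeros(len(idx))])
        learners.append(DecisionTreeClassifier(random_state=random_state).fit(X_bal, y_bal))
    return learners

def predict_combined(learners, X):
    """Combine the base learners by simple majority vote."""
    votes = np.mean([clf.predict(X) for clf in learners], axis=0)
    return (votes >= 0.5).astype(int)
```

Clustering the majority class before sampling keeps each balanced training set representative of the majority class's structure, which is what distinguishes this scheme from plain random undersampling.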
