541 |
Road Safety Assessment of U.S. States: A Joint Frontier and Neural Network Modeling Approach Egilmez, Gokhan 24 September 2013 (has links)
No description available.
|
542 |
Three Essays on Residential Land Development Wrenn, Douglas Harvey, II 19 December 2012 (has links)
No description available.
|
543 |
ON SOME INFERENTIAL ASPECTS FOR TYPE-II AND PROGRESSIVE TYPE-II CENSORING Volterman, William D. 10 1900 (has links)
This thesis investigates nonparametric inference under multiple independent samples with various modes of censoring, and also presents results concerning Pitman Closeness under Progressive Type-II right censoring. For the nonparametric inference with multiple independent samples, the case of Type-II right censoring is first considered. Two extensions to this are then discussed: doubly Type-II censoring, and Progressive Type-II right censoring. We consider confidence intervals for quantiles, prediction intervals for order statistics from a future sample, and tolerance intervals for a population proportion. Benefits of using multiple samples over one sample are discussed. For each of these scenarios, we consider simulation as an alternative to exact calculations. In each case we illustrate the results with data from the literature. Furthermore, we consider two problems concerning Pitman Closeness and Progressive Type-II right censoring. We derive simple explicit formulae for the Pitman Closeness probabilities of the order statistics to population quantiles. Various tables are given to illustrate these results. We then use the Pitman Closeness measure as a criterion for determining the optimal censoring scheme for samples drawn from the exponential distribution. A general result is conjectured, and demonstrated in special cases. / Doctor of Philosophy (PhD)
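The distribution-free intervals referenced above rest on a simple binomial argument. The sketch below is not taken from the thesis and shows only the single-sample, uncensored case: it computes the exact coverage probability of the confidence interval formed by two order statistics for a population quantile. Under Type-II right censoring the same calculation applies whenever the upper order statistic is among those observed.

```python
# Illustrative sketch (not from the thesis): coverage probability of the
# distribution-free confidence interval [X_(i), X_(j)] for the p-th quantile,
# based on a complete sample of size n from a continuous distribution.
from scipy.stats import binom

def quantile_ci_coverage(n: int, i: int, j: int, p: float) -> float:
    """P(X_(i) <= xi_p <= X_(j)) = sum_{k=i}^{j-1} C(n,k) p^k (1-p)^(n-k)."""
    return float(sum(binom.pmf(k, n, p) for k in range(i, j)))

if __name__ == "__main__":
    # Coverage of [X_(2), X_(9)] for the median when n = 10: about 0.9785.
    print(round(quantile_ci_coverage(n=10, i=2, j=9, p=0.5), 4))
```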
|
544 |
Statistical methods for variant discovery and functional genomic analysis using next-generation sequencing data Tang, Man 03 January 2020 (links)
The development of high-throughput next-generation sequencing (NGS) techniques produces massive amounts of data, allowing the identification of biomarkers for early disease diagnosis and driving the transformation of most disciplines in biology and medicine. Greater effort is needed to develop novel, powerful, and efficient tools for NGS data analysis. This dissertation focuses on modeling "omics" data in various NGS applications, with a primary goal of developing novel statistical methods to identify sequence variants, find transcription factor (TF) binding patterns, and decode the relationship between TFs and gene expression levels. Accurate and reliable identification of sequence variants, including single nucleotide polymorphisms (SNPs) and insertion-deletion polymorphisms (INDELs), plays a fundamental role in NGS applications. Existing methods for calling these variants often make the simplifying assumption of positional independence and fail to leverage the dependence of genotypes at nearby loci induced by linkage disequilibrium. We propose vi-HMM, a hidden Markov model (HMM)-based method for calling SNPs and INDELs in mapped short-read data. Simulation experiments show that, under various sequencing depths, vi-HMM outperforms existing methods in terms of sensitivity and F1 score. When applied to human whole-genome sequencing data, vi-HMM demonstrates higher accuracy in calling SNPs and INDELs. One important NGS application is chromatin immunoprecipitation followed by sequencing (ChIP-seq), which characterizes protein-DNA relations through genome-wide mapping of TF binding sites. Multiple TFs, binding to DNA sequences, often show complex binding patterns, which indicate how TFs with similar functionalities work together to regulate the expression of target genes. To help uncover the transcriptional regulation mechanism, we propose a novel nonparametric Bayesian method to detect the clustering pattern of multiple-TF bindings from ChIP-seq datasets. A simulation study demonstrates that our method performs best with regard to precision, recall, and F1 score, in comparison to traditional methods. We also apply the method to real data and observe several TF clusters that have been recognized previously in mouse embryonic stem cells. Recent advances in ChIP-seq and RNA sequencing (RNA-Seq) technologies provide more reliable and accurate characterization of TF binding sites and gene expression measurements, which serve as a basis for studying the regulatory functions of TFs on gene expression. We propose a log-Gaussian Cox process with a wavelet-based functional model to quantify the relationship between TF binding site locations and gene expression levels. Through a simulation study, we demonstrate that our method performs well, especially with large sample sizes and small variance. It also shows a remarkable ability to distinguish real local features in the function estimates. / Doctor of Philosophy / The development of high-throughput next-generation sequencing (NGS) techniques produces massive amounts of data and drives innovation in biology and medicine. Greater effort is needed to develop novel, powerful, and efficient tools for NGS data analysis. In this dissertation, we mainly focus on three problems closely related to NGS and its applications: (1) how to improve variant calling accuracy, (2) how to model transcription factor (TF) binding patterns, and (3) how to quantify the contribution of TF binding to gene expression.
We develop novel statistical methods to identify sequence variants, find TF binding patterns, and explore the relationship between TF binding and gene expression. We expect our findings to be helpful in promoting a better understanding of disease causality and facilitating the design of personalized treatments.
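For readers unfamiliar with the HMM machinery that vi-HMM builds on, the following sketch shows generic Viterbi decoding over per-position hidden states. The states, transition and emission probabilities, and observation encoding below are invented for the illustration and do not reproduce vi-HMM's actual model.

```python
# Hypothetical sketch of Viterbi decoding over genomic positions.  The states
# and probabilities are illustrative only and do not reproduce vi-HMM.
import numpy as np

def viterbi(log_trans, log_emit, log_init, obs):
    """Most probable hidden-state path for an observation sequence.

    log_trans: (S, S) log transition matrix, log_emit: (S, O) log emission
    matrix, log_init: (S,) log initial distribution, obs: list of observation
    indices.  Returns the decoded state sequence.
    """
    S = log_trans.shape[0]
    T = len(obs)
    dp = np.full((T, S), -np.inf)
    back = np.zeros((T, S), dtype=int)
    dp[0] = log_init + log_emit[:, obs[0]]
    for t in range(1, T):
        for s in range(S):
            scores = dp[t - 1] + log_trans[:, s]
            back[t, s] = int(np.argmax(scores))
            dp[t, s] = scores[back[t, s]] + log_emit[s, obs[t]]
    path = [int(np.argmax(dp[-1]))]
    for t in range(T - 1, 0, -1):
        path.append(back[t, path[-1]])
    return path[::-1]

if __name__ == "__main__":
    # Two toy states (0 = reference, 1 = variant) and two observation symbols
    # (0 = read agrees with the reference, 1 = read disagrees).
    log_trans = np.log([[0.99, 0.01], [0.10, 0.90]])
    log_emit = np.log([[0.95, 0.05], [0.30, 0.70]])
    log_init = np.log([0.99, 0.01])
    obs = [0, 0, 1, 1, 1, 0, 0]
    print(viterbi(log_trans, log_emit, log_init, obs))
```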
|
545 |
Semiparametric Bayesian Approach using Weighted Dirichlet Process Mixture For Finance Statistical Models Sun, Peng 07 March 2016 (has links)
The Dirichlet process mixture (DPM) has been widely used as a flexible prior in the nonparametric Bayesian literature, and the weighted Dirichlet process mixture (WDPM) can be viewed as an extension of DPM that relaxes model distribution assumptions. However, WDPM requires specifying weight functions and can incur extra computational burden. In this dissertation, we develop more efficient and flexible WDPM approaches under three research topics. The first is semiparametric cubic spline regression, where we adopt a nonparametric prior for the error terms in order to automatically handle heterogeneity of measurement errors or an unknown mixture distribution. The second provides an innovative way to construct the weight function and illustrates several desirable properties and the computational efficiency of this weight under a semiparametric stochastic volatility (SV) model. The last develops a WDPM approach for the Generalized AutoRegressive Conditional Heteroskedasticity (GARCH) model (as an alternative to the SV model) and proposes a new model evaluation approach for GARCH that produces easier-to-interpret results than the canonical marginal likelihood approach.
In the first topic, the response variable is modeled as the sum of three parts. The first part is a linear function of covariates that enter the model parametrically. The second part is an additive nonparametric model: covariates whose relationships to the response variable are unclear are included nonparametrically using Lancaster and Šalkauskas bases. The third part consists of error terms whose means and variances are assumed to follow nonparametric priors. We therefore call our model a dual-semiparametric regression, because nonparametric ideas are used both for modeling the mean and for the error terms. Instead of assuming that all error terms follow the same prior, as in DPM, our WDPM provides multiple candidate priors for each observation to select from with certain probabilities. Each probability (or weight) is modeled through relevant predictive covariates using a Gaussian kernel, as sketched below. We propose several different WDPMs using different weights that depend on distance in the covariates. We provide efficient Markov chain Monte Carlo (MCMC) algorithms and compare our WDPMs to the parametric model and the DPM model in terms of Bayes factors in both simulation and empirical studies.
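A minimal sketch of such covariate-distance weights follows, assuming that the weight an observation assigns to each candidate prior is a normalized Gaussian kernel in the predictive covariates; the bandwidth, the candidate-prior locations, and the normalization are assumptions of this illustration rather than the dissertation's exact specification.

```python
# Hypothetical illustration: Gaussian-kernel weights assigning each
# observation a probability over candidate priors based on covariate distance.
# Bandwidth and candidate locations are invented for the example.
import numpy as np

def wdpm_weights(x, centers, bandwidth=1.0):
    """Row i gives the weights of observation x[i] over the candidate priors
    located at `centers`; rows sum to one."""
    d2 = (x[:, None, :] - centers[None, :, :]) ** 2   # squared coordinate gaps
    kern = np.exp(-d2.sum(axis=2) / (2.0 * bandwidth ** 2))
    return kern / kern.sum(axis=1, keepdims=True)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(5, 2))                        # 5 observations, 2 covariates
    centers = np.array([[-1.0, 0.0], [1.0, 0.0]])      # 2 candidate priors
    print(np.round(wdpm_weights(x, centers), 3))
```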
In the second topic, we propose an innovative way to construct the weight function for WDPM and apply it to the SV model. The SV model is used for time series data where the constant-variance assumption is violated. One essential issue is to specify the distribution of the conditional return. We assume a WDPM prior for the conditional return and propose a new way to model the weights. Our approach has several advantages, including computational efficiency compared to the weight constructed using the Gaussian kernel. We list six properties of the proposed weight function and provide proofs of them. Because of the additional Metropolis-Hastings steps introduced by the WDPM prior, we establish conditions that ensure the uniform geometric ergodicity of the transition kernel in our MCMC. Due to the existence of zero values in asset price data, our SV model is semiparametric: we employ a WDPM prior for non-zero values and a parametric prior for zero values.
In the third topic, we develop a WDPM approach for GARCH-type models and compare different types of weight functions, including the innovative method proposed in the second topic. The GARCH model can be viewed as an alternative to SV for analyzing daily stock price data where the constant-variance assumption does not hold. While the response variable of our SV models is the transformed log return (based on a log-square transformation), GARCH models the log return itself directly. This means that, theoretically speaking, we are able to predict stock returns using GARCH models, whereas this is not feasible with SV models, because SV models ignore the sign of log returns and provide predictive densities only for the squared log return. Motivated by this property, we propose a new model evaluation approach, called back-testing return (BTR), particularly for GARCH. The BTR approach produces model evaluation results that are easier to interpret than the marginal likelihood, and it is straightforward to draw conclusions about model profitability by applying it. Since the BTR approach is only applicable to GARCH, we also illustrate how to properly calculate the marginal likelihood to make comparisons between GARCH and SV. Based on our MCMC algorithms and model evaluation approaches, we have conducted a large number of model fits to compare models in both simulation and empirical studies. / Ph. D.
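For readers unfamiliar with the GARCH(1,1) recursion referred to above, the following self-contained sketch simulates returns from a Gaussian GARCH(1,1) and evaluates the corresponding log-likelihood at plugged-in parameters. It does not reproduce the dissertation's Bayesian WDPM-GARCH or its BTR evaluation; all numbers are simulated for illustration.

```python
# Illustrative sketch only: a Gaussian GARCH(1,1) recursion and its
# log-likelihood on simulated data.
import numpy as np

def simulate_garch(omega, alpha, beta, n, seed=0):
    rng = np.random.default_rng(seed)
    r = np.empty(n)
    sigma2 = omega / (1.0 - alpha - beta)   # start at the unconditional variance
    for t in range(n):
        r[t] = np.sqrt(sigma2) * rng.standard_normal()
        sigma2 = omega + alpha * r[t] ** 2 + beta * sigma2
    return r

def garch_loglik(params, r):
    """Gaussian log-likelihood of returns r under a GARCH(1,1) model."""
    omega, alpha, beta = params
    sigma2 = np.var(r)                      # a common initialization choice
    ll = 0.0
    for ret in r:
        ll += -0.5 * (np.log(2 * np.pi * sigma2) + ret ** 2 / sigma2)
        sigma2 = omega + alpha * ret ** 2 + beta * sigma2
    return ll

if __name__ == "__main__":
    r = simulate_garch(omega=0.1, alpha=0.05, beta=0.9, n=1000)
    print(round(garch_loglik((0.1, 0.05, 0.9), r), 2))
```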
|
546 |
[pt] DIVERSIDADE DO LUCRO ENTRE AS PEQUENAS EMPRESAS BRASILEIRAS: O MERCADO DE CRÉDITO COMO UM DE SEUS POSSÍVEIS DETERMINANTES / [en] PROFIT DIVERSITY AMONG BRAZILIAN SMALL FIRMS: CREDIT MARKET AS A DETERMINANT CRISTINE CAMPOS DE XAVIER PINTO 04 December 2003 (has links)
[pt] Uma característica de alguns conta-próprias e dos empregadores com até cinco empregados brasileiros é sua baixa produtividade e lucratividade. Além disso, entre estas pequenas empresas existe uma grande diversidade de níveis de lucro, algumas com patamares de lucro altos quando comparadas às demais e outras com lucros irrisórios. A maioria dos estudiosos associa esta baixa produtividade das pequenas empresas brasileiras ao mercado de crédito em que elas atuam. Segundo eles, as imperfeições no mercado de crédito impedem que estas pequenas produções consigam obter o volume de capital necessário à execução dos projetos mais eficientes. Além disso, como algumas empresas enfrentam restrições creditícias mais severas que outras, elas podem ter produtividades diferentes e, consequentemente, níveis de lucro diferentes. Ao observar os dados fornecidos pelo IBGE em 1997 sobre estas pequenas produções, percebe-se que de fato elas operam em patamares de lucro diferentes, algumas com lucros bem acima da média e outras com lucros negativos, e utilizam volumes de capital muito diferentes em seu processo produtivo. Este resultado é estranho, uma vez que as pequenas empresas atuam nos mesmos mercados, sendo tomadoras de preços, e provavelmente usam a mesma tecnologia; portanto, segundo a teoria microeconômica convencional, teriam que empregar o mesmo volume de capital em seu processo produtivo e obter o mesmo nível de lucro. Nesta dissertação, propõe-se uma abordagem que relaciona o mercado de crédito à diversidade de lucro entre as pequenas empresas brasileiras, sendo as imperfeições creditícias um dos possíveis determinantes para a lucratividade e produtividade destas empresas. / [en] The self-employed and employers with no more than five employees in Brazil are well known for their low productivity and low profitability. These small firms show a great variety of profit levels, with some having profits above the mean and others having negative profits. In the majority of studies, low profitability and low productivity are associated with imperfections in the credit market. The majority of these enterprises do not obtain sufficient capital to invest in the most productive projects because of restrictions in the credit market. When we work with the data on small employers and the self-employed in Brazil, we see that profit levels differ among these firms and that they employ different amounts of capital in their production. This result is not expected, because small firms in Brazil participate in the same markets and probably have the same technology. Thus, according to microeconomic theory, they should have the same profit level and use the same amount of capital in their production. This paper tries to infer whether credit constraints are the only factor that affects small enterprises' profits in Brazil.
|
547 |
Statistics for diffusion processes with low and high-frequency observations Chorowski, Jakub 11 November 2016 (has links)
Diese Dissertation betrachtet das Problem der nichtparametrischen Schätzung der Diffusionskoeffizienten eines ein-dimensionalen und zeitlich homogenen Itô-Diffusionsprozesses. Dabei werden verschiedene diskrete Sampling Regimes untersucht. Im ersten Teil zeigen wir, dass eine Variante des von Gobet, Hoffmann und Reiß konstruierten Niedrigfrequenz-Schätzers auch im Fall von zufälligen Beobachtungszeiten verwendet werden kann. Wir beweisen, dass der Schätzer optimal im Minimaxsinn und adaptiv bezüglich der Verteilung der Beobachtungszeiten ist. Außerdem wenden wir die Lepski-Methode an, um einen Schätzer zu erhalten, der zusätzlich adaptiv bezüglich der Sobolev-Glattheit des Drift- und Volatilitätskoeffizienten ist. Im zweiten Teil betrachten wir das Problem der Volatilitätsschätzung für äquidistante Beobachtungen. Im Fall eines stationären Prozesses mit kompaktem Zustandsraum erhalten wir einen Schätzer, der sowohl bei hochfrequenten als auch bei niedrigfrequenten Beobachtungen die optimale Minimaxrate erreicht. Die Konstruktion des Schätzers beruht auf spektralen Methoden. Im Fall von niedrigfrequenten Beobachtungen ist die Analyse des Schätzers ähnlich wie diejenige in der Arbeit von Gobet, Hoffmann und Reiß. Im hochfrequenten Fall hingegen finden wir die Konvergenzraten durch lokale Mittelwertbildung und stellen dabei eine Verbindung zum Hochfrequenzschätzer von Florens-Zmirou her. In der Analyse unseres universalen Schätzers benötigen wir scharfe obere Schranken für den Schätzfehler von Funktionalen der Occupation time für unstetige Funktionen. Wir untersuchen eine auf Riemannsummen basierende Approximation der Occupation time eines stationären, reversiblen Markov-Prozesses und leiten obere Schranken für den quadratischen Fehler her. Im Fall von Diffusionsprozessen erhalten wir Konvergenzraten für Sobolev-Funktionen. / In this thesis, we consider the problem of nonparametric estimation of the diffusion coefficients of a scalar time-homogeneous Itô diffusion process from discrete observations under various sampling assumptions. In the first part, the low-frequency estimation method proposed by Gobet, Hoffmann and Reiß is modified to cover the case of random sampling times. The estimator is shown to be optimal in the minimax sense and adaptive to the sampling distribution. Moreover, Lepski's method is applied to adapt to the unknown Sobolev smoothness of the drift and volatility coefficients. In the second part, we address the problem of volatility estimation from equidistant observations without a predefined frequency regime. In the case of a stationary diffusion with compact state space and boundary reflection, we introduce a universal estimator that attains the minimax optimal convergence rates for both low- and high-frequency observations. Being based on the spectral method, the low-frequency analysis is similar to the study conducted by Gobet, Hoffmann and Reiß. On the other hand, the derivation of the convergence rates in the high-frequency regime requires local averaging of the low-frequency estimator, which makes it mimic the behaviour of the classical high-frequency estimator introduced by Florens-Zmirou. The analysis of the universal estimator requires tight upper bounds on the estimation error of the occupation time functional for non-continuous functions. In the third part of the thesis, we thus consider the Riemann-sum approximation of the occupation time functional of a stationary, time-reversible Markov process. Upper bounds on the mean squared estimation error are provided. In the case of diffusion processes, convergence rates for Sobolev regular functions are obtained.
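To make the Riemann-sum approximation of the occupation time functional concrete, here is a small illustrative sketch. The Ornstein-Uhlenbeck path, the step sizes, and the discontinuous test function are stand-ins chosen for the example and are not taken from the thesis.

```python
# Illustrative sketch: Riemann-sum approximation of the occupation time
# functional  int_0^T f(X_t) dt  from discrete observations of a diffusion.
# An Ornstein-Uhlenbeck path stands in for the stationary process; the test
# function f is an indicator, i.e. a non-continuous integrand.
import numpy as np

def ou_path(T, dt, theta=1.0, sigma=1.0, seed=1):
    rng = np.random.default_rng(seed)
    n = int(T / dt)
    x = np.empty(n + 1)
    x[0] = 0.0
    for i in range(n):
        x[i + 1] = x[i] - theta * x[i] * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    return x

if __name__ == "__main__":
    T, dt = 100.0, 0.001
    path = ou_path(T, dt)
    f = lambda x: (x > 0.5).astype(float)      # discontinuous test function

    # Fine-grid reference value of the occupation time functional.
    reference = f(path[:-1]).sum() * dt

    # Riemann-sum estimate from every k-th observation (lower frequency).
    k = 100                                     # observation distance k * dt
    coarse = path[::k]
    estimate = f(coarse[:-1]).sum() * (k * dt)
    print(round(reference, 3), round(estimate, 3))
```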
|
548 |
The Propagation-Separation Approach / theoretical study and application to magnetic resonance imaging Becker, Saskia 16 May 2014 (has links)
Lokal parametrische Modelle werden häufig im Kontext der nichtparametrischen Schätzung verwendet. Bei einer punktweisen Schätzung der Zielfunktion können die parametrischen Umgebungen mithilfe von Gewichten beschrieben werden, die entweder von den Designpunkten oder (zusätzlich) von den Beobachtungen abhängen. Der Vergleich von verrauschten Beobachtungen in einzelnen Punkten leidet allerdings unter einem Mangel an Robustheit. Der Propagations-Separations-Ansatz von Polzehl und Spokoiny [2006] verwendet daher einen Multiskalen-Ansatz mit iterativ aktualisierten Gewichten. Wir präsentieren hier eine theoretische Studie und numerische Resultate, die ein besseres Verständnis des Verfahrens ermöglichen. Zu diesem Zweck definieren und untersuchen wir eine neue Strategie für die Wahl des entscheidenden Parameters des Verfahrens, der Adaptationsbandweite. Insbesondere untersuchen wir ihre Variabilität in Abhängigkeit von der unbekannten Zielfunktion. Unsere Resultate rechtfertigen eine Wahl, die unabhängig von den jeweils vorliegenden Beobachtungen ist. Die neue Parameterwahl liefert für stückweise konstante und stückweise beschränkte Funktionen theoretische Beweise der Haupteigenschaften des Algorithmus. Für den Fall eines falsch spezifizierten Modells führen wir eine spezielle Stufenfunktion ein und weisen eine punktweise Fehlerschranke im Vergleich zum Schätzer des Algorithmus nach. Des Weiteren entwickeln wir eine neue Methode zur Entrauschung von diffusionsgewichteten Magnetresonanzdaten. Unser neues Verfahren (ms)POAS basiert auf einer speziellen Beschreibung der Daten, die eine zeitgleiche Glättung bezüglich der gemessenen Positionen und der Richtungen der verwendeten Diffusionsgradienten ermöglicht. Für den kombinierten Messraum schlagen wir zwei Distanzfunktionen vor, deren Eignung wir mithilfe eines differentialgeometrischen Ansatzes nachweisen. Schließlich demonstrieren wir das große Potential von (ms)POAS auf simulierten und experimentellen Daten. / In statistics, nonparametric estimation is often based on local parametric modeling. For pointwise estimation of the target function, the parametric neighborhoods can be described by weights that depend on design points or on observations. As it turned out, the comparison of noisy observations at single points suffers from a lack of robustness. The Propagation-Separation Approach by Polzehl and Spokoiny [2006] overcomes this problem by using a multiscale approach with iteratively updated weights. The method has been successfully applied to a large variety of statistical problems. Here, we present a theoretical study and numerical results, which provide a better understanding of this versatile procedure. For this purpose, we introduce and analyse a novel strategy for the choice of the crucial parameter of the algorithm, namely the adaptation bandwidth. In particular, we study its variability with respect to the unknown target function. This justifies a choice independent of the data at hand. For piecewise constant and piecewise bounded functions, this choice enables theoretical proofs of the main heuristic properties of the algorithm. Additionally, we consider the case of a misspecified model. Here, we introduce a specific step function, and we establish a pointwise error bound between this function and the corresponding estimates of the Propagation-Separation Approach. Finally, we develop a method for the denoising of diffusion-weighted magnetic resonance data, which is based on the Propagation-Separation Approach. 
Our new procedure, called (ms)POAS, relies on a specific description of the data, which enables simultaneous smoothing in the measured positions and with respect to the directions of the applied diffusion-weighting magnetic field gradients. We define and justify two distance functions on the combined measurement space, where we follow a differential geometric approach. We demonstrate the capability of (ms)POAS on simulated and experimental data.
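As a rough, simplified illustration of the iteratively updated weights described above, the sketch below runs a propagation-separation style iteration on one-dimensional Gaussian data: location weights define a growing neighbourhood, and adaptation weights compare current local estimates. The kernels, penalty, and parameter values are chosen for the example and do not reproduce the procedure analysed in the thesis.

```python
# Simplified, hypothetical sketch of a propagation-separation style iteration
# for 1-D Gaussian data.  Kernels, penalty, and parameters are illustrative.
import numpy as np

def ps_smooth(y, sigma2=1.0, lam=4.0, h_max=20.0, n_iter=8):
    x = np.arange(len(y), dtype=float)
    theta = y.copy()                       # initial pointwise estimates
    n_loc = np.ones_like(y)                # effective local sample sizes
    h = 1.0
    for _ in range(n_iter):
        dist2 = (x[:, None] - x[None, :]) ** 2
        w_loc = np.maximum(1.0 - dist2 / h ** 2, 0.0)     # location kernel
        penalty = n_loc[:, None] * (theta[:, None] - theta[None, :]) ** 2 / (2 * sigma2)
        w_ad = np.exp(-penalty / lam)                     # adaptation kernel
        w = w_loc * w_ad
        theta = (w @ y) / w.sum(axis=1)    # weighted local averages
        n_loc = w.sum(axis=1)
        h = min(1.25 * h, h_max)           # grow the bandwidth each iteration
    return theta

if __name__ == "__main__":
    rng = np.random.default_rng(2)
    truth = np.repeat([0.0, 2.0, -1.0], 50)               # piecewise constant signal
    y = truth + rng.normal(scale=0.5, size=truth.size)
    est = ps_smooth(y, sigma2=0.25)
    print(round(float(np.abs(est - truth).mean()), 3))    # mean absolute error
```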
|
549 |
ROBUST INFERENCE FOR HETEROGENEOUS TREATMENT EFFECTS WITH APPLICATIONS TO NHANES DATA Ran Mo (20329047) 10 January 2025 (has links)
Estimating the conditional average treatment effect (CATE) using data from the National Health and Nutrition Examination Survey (NHANES) provides valuable insights into the heterogeneous impacts of health interventions across diverse populations, facilitating public health strategies that consider individual differences in health behaviors and conditions. However, estimating CATE with NHANES data faces challenges often encountered in observational studies, such as outliers, heavy-tailed error distributions, skewed data, model misspecification, and the curse of dimensionality. To address these challenges, this dissertation presents three consecutive studies that thoroughly explore robust methods for estimating heterogeneous treatment effects.
The first study introduces an outlier-resistant estimation method by incorporating M-estimation, replacing the \(L_2\) loss in the traditional inverse propensity weighting (IPW) method with a robust loss function. To assess the robustness of our approach, we investigate its influence function and breakdown point. Additionally, we derive the asymptotic properties of the proposed estimator, enabling valid inference for the proposed outlier-resistant estimator of CATE.
The method proposed in the first study relies on a symmetry assumption that is commonly required by standard outlier-resistant methods. To remove this assumption while maintaining unbiasedness, the second study employs the adaptive Huber loss, which dynamically adjusts the robustification parameter based on the sample size to achieve an optimal tradeoff between bias and robustness. The robustification parameter is derived explicitly from theoretical results, making it unnecessary to rely on time-consuming data-driven methods for its selection. We also derive concentration and Berry-Esseen inequalities to precisely quantify the convergence rates as well as finite-sample performance.
In both previous studies, the propensity scores were estimated parametrically, which is sensitive to model misspecification. The third study extends the robust estimator from our first project by plugging in a kernel-based nonparametric estimate of the propensity score with sufficient dimension reduction (SDR). Specifically, we adopt robust minimum average variance estimation (rMAVE) for the central mean space under the potential outcome framework. Together with higher-order kernels, the resulting CATE estimation gains enhanced efficiency.
In all three studies, theoretical results are derived, and confidence intervals are constructed for inference based on these findings. The properties of the proposed estimators are verified through extensive simulations. Additionally, applying these methods to NHANES data validates the estimators' ability to handle diverse and contaminated datasets, further demonstrating their effectiveness in real-world scenarios.
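As a rough, hypothetical sketch of the kind of robust IPW-based CATE estimation described in the first study (the thesis's actual estimator, loss calibration, and inference procedure are not reproduced here), the example below applies a local Huber M-estimator to the IPW pseudo-outcome at a single covariate value. The propensity model, kernel, bandwidth, and the deliberately generous Huber cut-off, chosen loosely in the spirit of the second study's adaptive calibration to limit the bias an asymmetric pseudo-outcome induces, are all illustrative assumptions.

```python
# Hypothetical, simplified sketch of robust IPW-based CATE estimation: a local
# Huber M-estimator applied to the IPW pseudo-outcome at a covariate value x0.
# Propensity model, kernel, bandwidth, and cut-off are illustrative only.
import numpy as np
from scipy.optimize import minimize_scalar

def huber(u, c):
    a = np.abs(u)
    return np.where(a <= c, 0.5 * u ** 2, c * a - 0.5 * c ** 2)

def cate_at(x0, X, T, Y, e, bandwidth=0.5, cutoff=10.0):
    """Robust local CATE estimate at covariate value x0 (scalar covariate)."""
    pseudo = (T / e - (1 - T) / (1 - e)) * Y              # IPW pseudo-outcome
    w = np.exp(-((X - x0) ** 2) / (2 * bandwidth ** 2))   # Gaussian kernel weights
    objective = lambda tau: float(np.sum(w * huber(pseudo - tau, cutoff)))
    return minimize_scalar(objective).x

if __name__ == "__main__":
    rng = np.random.default_rng(3)
    n = 2000
    X = rng.uniform(-1, 1, n)
    e = 1.0 / (1.0 + np.exp(-X))                          # true propensity score
    T = rng.binomial(1, e)
    tau = 1.0 + X                                         # true CATE equals 1 + x
    Y = X + T * tau + rng.standard_t(df=3, size=n)        # heavy-tailed noise
    print(round(float(cate_at(0.5, X, T, Y, e)), 2))      # compare with 1 + 0.5 = 1.5
```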
|
550 |
Essays on Modern Econometrics and Machine Learning Keilbar, Georg 16 June 2022 (has links)
Diese Dissertation behandelt verschiedene Aspekte moderner Ökonometrie und Machine Learnings. Kapitel 2 stellt einen neuen Schätzer für die Regressionsparameter in einem Paneldatenmodell mit interaktiven festen Effekten vor. Eine Besonderheit unserer Methode ist die Modellierung der factor loadings durch nichtparametrische Funktionen. Wir zeigen die root-NT-Konvergenz sowie die asymptotische Normalverteilung unseres Schätzers. Kapitel 3 betrachtet die rekursive Schätzung von Quantilen mit Hilfe des stochastic gradient descent (SGD) Algorithmus mit Polyak-Ruppert Mittelwertbildung. Der Algorithmus ist rechnerisch und Speicher-effizient verglichen mit herkömmlichen Schätzmethoden. Unser Fokus ist die Untersuchung des nichtasymptotischen Verhaltens, indem wir eine exponentielle Wahrscheinlichkeitsungleichung zeigen. In Kapitel 4 stellen wir eine neue Methode zur Kalibrierung von conditional Value-at-Risk (CoVaR) basierend auf Quantilregression mittels Neural Networks vor. Wir modellieren systemische Spillovereffekte in einem Netzwerk von systemrelevanten Finanzinstituten. Eine Out-of-Sample Analyse zeigt eine klare Verbesserung im Vergleich zu einer linearen Grundspezifikation. Im Vergleich mit bestehenden Risikomaßen eröffnet unsere Methode eine neue Perspektive auf systemisches Risiko. In Kapitel 5 modellieren wir die gemeinsame Dynamik von Kryptowährungen in einem nicht-stationären Kontext. Um eine Analyse in einem dynamischen Rahmen zu ermöglichen, stellen wir eine neue vector error correction model (VECM) Spezifikation vor, die wir COINtensity VECM nennen. / This thesis focuses on different aspects of the union of modern econometrics and machine learning. Chapter 2 considers a new estimator of the regression parameters in a panel data model with unobservable interactive fixed effects. A distinctive feature of the proposed approach is to model the factor loadings as a nonparametric function. We show that our estimator is root-NT-consistent and asymptotically normal, and that it reaches the semiparametric efficiency bound under the assumption of i.i.d. errors.
Chapter 3 is concerned with the recursive estimation of quantiles using the stochastic gradient descent (SGD) algorithm with Polyak-Ruppert averaging. The algorithm offers a computationally and memory efficient alternative to the usual empirical estimator. Our focus is on studying the nonasymptotic behavior by providing exponentially decreasing tail probability bounds under minimal assumptions. In Chapter 4 we propose a novel approach to calibrate the conditional value-at-risk (CoVaR) of financial institutions based on neural network quantile regression. We model systemic risk spillover effects in a network context across banks by considering the marginal effects of the quantile regression procedure. An out-of-sample analysis shows great performance compared to a linear baseline specification, signifying the importance that nonlinearity plays for modelling systemic risk. A comparison to existing network-based risk measures reveals that our approach offers a new perspective on systemic risk. In Chapter 5 we aim to model the joint dynamics of cryptocurrencies in a nonstationary setting. In particular, we analyze the role of cointegration relationships within a large system of cryptocurrencies in a vector error correction model (VECM) framework. To enable analysis in a dynamic setting, we propose the COINtensity VECM, a nonlinear VECM specification accounting for a varying system-wide cointegration exposure.
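A minimal sketch of the recursive quantile estimator discussed in Chapter 3 is given below: a single pass of SGD on the pinball (check) loss with a Polyak-Ruppert running average of the iterates. The step-size schedule and burn-in are illustrative choices, not the thesis's exact specification.

```python
# Minimal sketch of recursive quantile estimation via SGD with Polyak-Ruppert
# averaging.  Step-size schedule and burn-in are illustrative choices only.
import numpy as np

def sgd_quantile(stream, tau=0.5, gamma=0.6, burn_in=100):
    """One pass over `stream`; returns (last iterate, averaged iterate)."""
    theta = 0.0
    avg, count = 0.0, 0
    for n, x in enumerate(stream, start=1):
        step = n ** (-gamma)
        # Negative subgradient of the pinball loss at the current iterate.
        direction = (tau - 1.0) if x < theta else tau
        theta += step * direction
        if n > burn_in:
            count += 1
            avg += (theta - avg) / count     # running Polyak-Ruppert average
    return theta, avg

if __name__ == "__main__":
    rng = np.random.default_rng(4)
    data = rng.normal(size=100_000)
    last, averaged = sgd_quantile(data, tau=0.9)
    # Compare with the 0.9-quantile of N(0,1), roughly 1.2816.
    print(round(last, 3), round(averaged, 3))
```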
|