11

Choosing a data frequency to forecast the quarterly yen-dollar exchange rate

Cann, Benjamin 03 October 2016
Potentially valuable information about the underlying data generating process of a dependent variable is often lost when an independent variable is transformed to match the sampling frequency of the dependent variable. With the mixed data sampling (MIDAS) technique and increasingly available high-frequency data, the question of choosing an optimal sampling frequency becomes apparent. We use financial data and the MIDAS technique to estimate thousands of regressions and forecasts at the quarterly, monthly, weekly, and daily sampling frequencies. Model fit and forecast performance measures are calculated for each estimation and used to generate summary statistics for each sampling frequency so that comparisons can be made between frequencies. Our regression models contain an autoregressive component and five additional independent variables, and are estimated with lag length specifications that incrementally increase up to five years of lags. Each regression is used to produce rolling one- and two-step-ahead static forecasts of the quarterly yen-U.S. dollar spot exchange rate. Our results suggest that including high frequency variables may be favourable for closer modeling of the underlying data generating process, but not necessarily for increased forecasting performance.
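To make the mixed-frequency setup concrete, the following Python sketch fits a MIDAS-style regression with an exponential Almon lag polynomial that aggregates a daily regressor into a quarterly equation. The simulated series, the weight parameterisation, and the lag length are illustrative assumptions only, not the specification used in the thesis.

```python
# Minimal MIDAS-style sketch: a quarterly target regressed on its own lag plus
# an exponential-Almon weighted sum of K daily observations (nowcasting-style).
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
days_per_quarter, n_quarters, K = 66, 120, 66   # assumed trading days per quarter
x_daily = rng.standard_normal(days_per_quarter * n_quarters)

def almon_weights(theta1, theta2, K):
    j = np.arange(1, K + 1)
    z = theta1 * j + theta2 * j ** 2
    w = np.exp(z - z.max())                      # guard against overflow
    return w / w.sum()

# Simulate a quarterly series driven by weighted daily observations (demo only).
true_w = almon_weights(0.05, -0.003, K)
y = np.zeros(n_quarters)
for t in range(1, n_quarters):
    daily_block = x_daily[t * days_per_quarter - K: t * days_per_quarter][::-1]
    y[t] = 0.3 * y[t - 1] + 0.8 * daily_block @ true_w + 0.1 * rng.standard_normal()

def residuals(params):
    c, rho, beta, th1, th2 = params
    w = almon_weights(th1, th2, K)
    res = []
    for t in range(1, n_quarters):
        daily_block = x_daily[t * days_per_quarter - K: t * days_per_quarter][::-1]
        res.append(y[t] - (c + rho * y[t - 1] + beta * daily_block @ w))
    return np.asarray(res)

fit = least_squares(residuals, x0=[0.0, 0.1, 0.1, 0.0, 0.0])
print("estimated (const, AR, beta, theta1, theta2):", np.round(fit.x, 3))
```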
12

Essays in Financial Econometrics

Jeong, Dae Hee 14 January 2010
I consider continuous time asset pricing models with stochastic differential utility that incorporate decision makers' concern with ambiguity about the true probability measure. To identify and estimate key parameters of the models, I use a novel econometric methodology developed recently by Park (2008) for statistical inference on continuous time conditional mean models. The methodology imposes only the condition that the pricing error is a continuous martingale to achieve identification, and obtains consistent and asymptotically normal estimates of the unknown parameters. Under a representative agent setting, I empirically evaluate alternative preference specifications, including a multiple-prior recursive utility. My empirical findings are summarized as follows. Relative risk aversion is estimated at around 1.5-5.5 with ambiguity aversion and 6-14 without ambiguity aversion. Relatedly, the estimated ambiguity aversion is both economically and statistically significant, and including ambiguity aversion clearly lowers relative risk aversion. The elasticity of intertemporal substitution (EIS) is higher than 1, around 1.3-22 with ambiguity aversion, and quite high without ambiguity aversion. The identification of the EIS appears to be fairly weak, as observed by many previous authors, though other aspects of my empirical results seem quite robust. Next, I develop an approach to testing for the martingale property in a continuous time framework. The approach yields various test statistics that are consistent against a wide class of nonmartingale semimartingales. A novel aspect of my approach is the use of a time change defined by the inverse of the quadratic variation of the semimartingale that is to be tested for the martingale hypothesis. Under this time change, a continuous semimartingale reduces to Brownian motion if and only if it is a continuous martingale; this follows immediately from the celebrated theorem of Dambis, Dubins and Schwarz. To test for a martingale, I may therefore check whether the given process becomes Brownian motion after the time change. I use several existing tests for multivariate normality to test whether the time-changed process is indeed Brownian motion. I provide asymptotic theory for my test statistics under the assumption that the sampling interval decreases while the time horizon expands. Stationarity of the underlying process is not assumed, so the results are also applicable to nonstationary processes. A Monte Carlo study shows that the tests perform very well for a wide range of realistic alternatives and have superior power to other discrete time tests.
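The time-change argument can be illustrated numerically: time-change a simulated continuous martingale by the inverse of its realized quadratic variation and check whether the resulting increments look Gaussian, as the theorem of Dambis, Dubins and Schwarz predicts. The sketch below is a simplified univariate illustration, not the thesis's actual test statistics; the volatility path and the normality test used are assumptions.

```python
# Time-change a simulated continuous martingale by its quadratic variation and
# test whether the resulting increments are Gaussian (illustrative sketch only).
import numpy as np
from scipy.stats import normaltest

rng = np.random.default_rng(1)
n, dt = 200_000, 1e-4

# Martingale with time-varying volatility: dX = sigma_t dW.
sigma = 1.0 + 0.5 * np.sin(np.linspace(0, 20, n))
dW = rng.standard_normal(n) * np.sqrt(dt)
X = np.concatenate([[0.0], np.cumsum(sigma * dW)])

# Realized quadratic variation; sample X each time it increases by delta.
qv = np.concatenate([[0.0], np.cumsum(np.diff(X) ** 2)])
delta = qv[-1] / 500
idx = np.searchsorted(qv, delta * np.arange(1, 500))
increments = np.diff(X[idx])

# Under the martingale hypothesis the time-changed increments are ~ N(0, delta).
stat, pval = normaltest(increments / np.sqrt(delta))
print(f"normality test p-value: {pval:.3f}")
```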
13

Comparison of Imputation Methods for Mixed Data Missing at Random

Heidt, Kaitlyn 01 May 2019
A statistician's job is to produce statistical models. When these models are precise and unbiased, we can apply them to new data appropriately. However, when data sets have missing values, the assumptions of statistical methods are violated and the results are biased. The statistician's objective is to implement methods that produce unbiased and accurate results. Research on missing data is becoming popular as modern methods that produce unbiased and accurate results emerge, such as MICE, implemented in the R statistical software. Using real data, we compare four common imputation methods from the MICE package in R at different levels of missingness. The results were compared, in terms of regression coefficients and adjusted R^2 values, with those obtained from the complete data set. The CART and PMM methods consistently performed better than the OTF and RF methods. The procedures were repeated on a second sample of real data and the same conclusions were drawn.
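The comparison in the thesis is run with the mice package in R; the sketch below is a loose Python analogue of that workflow, amputing a complete dataset, imputing it with a linear and a tree-based chained imputer (roughly analogous to norm- and CART-type methods), and comparing the resulting regression coefficients with those from the complete data. The data, missingness mechanism, and imputer settings are illustrative assumptions.

```python
# Python analogue of comparing imputation methods via downstream regression
# coefficients (the thesis itself compares four methods of R's mice package).
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.tree import DecisionTreeRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
n, p = 500, 4
X_full = rng.standard_normal((n, p))
y = X_full @ np.array([1.0, -0.5, 0.25, 0.0]) + rng.standard_normal(n)

# Remove ~20% of covariate values (MCAR here for simplicity; the thesis
# considers data missing at random).
X_miss = X_full.copy()
X_miss[rng.random(X_miss.shape) < 0.20] = np.nan

imputers = {
    "linear (norm-like)": IterativeImputer(random_state=0),
    "tree (CART-like)": IterativeImputer(estimator=DecisionTreeRegressor(max_depth=5),
                                         random_state=0),
}

ref = LinearRegression().fit(X_full, y).coef_
print("complete-data coefficients:", np.round(ref, 3))
for name, imp in imputers.items():
    X_imp = imp.fit_transform(X_miss)
    coef = LinearRegression().fit(X_imp, y).coef_
    print(f"{name:20s} coefficients:", np.round(coef, 3))
```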
14

Imputation multiple par analyse factorielle : Une nouvelle méthodologie pour traiter les données manquantes / Multiple imputation using principal component methods : A new methodology to deal with missing values

Audigier, Vincent 25 November 2015
Cette thèse est centrée sur le développement de nouvelles méthodes d'imputation multiples, basées sur des techniques d'analyse factorielle. L'étude des méthodes factorielles, ici en tant que méthodes d'imputation, offre de grandes perspectives en termes de diversité du type de données imputées d'une part, et en termes de dimensions de jeux de données imputés d'autre part. Leur propriété de réduction de la dimension limite en effet le nombre de paramètres estimés. Dans un premier temps, une méthode d'imputation simple par analyse factorielle de données mixtes est détaillée. Ses propriétés sont étudiées, en particulier sa capacité à gérer la diversité des liaisons mises en jeu et à prendre en compte les modalités rares. Sa qualité de prédiction est éprouvée en la comparant à l'imputation par forêts aléatoires. Ensuite, une méthode d'imputation multiple pour des données quantitatives basée sur une approche Bayésienne du modèle d'analyse en composantes principales est proposée. Elle permet d'inférer en présence de données manquantes y compris quand le nombre d'individus est petit devant le nombre de variables, ou quand les corrélations entre variables sont fortes. Enfin, une méthode d'imputation multiple pour des données qualitatives par analyse des correspondances multiples (ACM) est proposée. La variabilité de prédiction des données manquantes est reflétée via un bootstrap non-paramétrique. L'imputation multiple par ACM offre une réponse au problème de l'explosion combinatoire limitant les méthodes concurrentes dès lors que le nombre de variables ou de modalités est élevé. / This thesis proposes new multiple imputation methods that are based on principal component methods, which were initially used for exploratory analysis and visualisation of continuous, categorical and mixed multidimensional data. The study of principal component methods for imputation, never previously attempted, offers the possibility to deal with many types and sizes of data. This is because the number of estimated parameters is limited due to dimensionality reduction. First, we describe a single imputation method based on factor analysis of mixed data. We study its properties and focus on its ability to handle complex relationships between variables, as well as infrequent categories. Its high prediction quality is highlighted with respect to the state-of-the-art single imputation method based on random forests. Next, a multiple imputation method for continuous data using principal component analysis (PCA) is presented. This is based on a Bayesian treatment of the PCA model. Unlike standard methods based on Gaussian models, it can still be used when the number of variables is larger than the number of individuals and when correlations between variables are strong. Finally, a multiple imputation method for categorical data using multiple correspondence analysis (MCA) is proposed. The variability of prediction of missing values is introduced via a non-parametric bootstrap approach. This helps to tackle the combinatorial issues which arise from the large number of categories and variables. We show that multiple imputation using MCA outperforms the best current methods.
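As a flavour of the principal-component imputation underlying these methods, here is a minimal iterative-PCA single-imputation sketch in Python. It omits the Bayesian and bootstrap machinery that turns this into proper multiple imputation, and the rank, convergence rule, and toy data are assumptions for illustration.

```python
# Minimal iterative-PCA imputation sketch (single imputation only; illustrative).
import numpy as np

def iterative_pca_impute(X, rank=2, n_iter=200, tol=1e-6):
    """Fill missing entries of X with a rank-`rank` PCA reconstruction."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    col_means = np.nanmean(X, axis=0)
    X_hat = np.where(missing, col_means, X)      # start from mean imputation
    for _ in range(n_iter):
        mu = X_hat.mean(axis=0)
        U, s, Vt = np.linalg.svd(X_hat - mu, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank] + mu
        X_new = np.where(missing, low_rank, X)   # keep observed values fixed
        if np.max(np.abs(X_new - X_hat)) < tol:
            break
        X_hat = X_new
    return X_hat

# Toy example: correlated Gaussian data with 25% of entries removed at random.
rng = np.random.default_rng(3)
Z = rng.standard_normal((100, 2))
X = np.column_stack([Z[:, 0],
                     Z[:, 0] + 0.1 * Z[:, 1],
                     -Z[:, 0] + 0.1 * rng.standard_normal(100)])
X_miss = X.copy()
X_miss[rng.random(X.shape) < 0.25] = np.nan
err = np.abs(iterative_pca_impute(X_miss) - X)[np.isnan(X_miss)]
print("max imputation error:", np.round(err.max(), 3))
```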
15

混合連續與間斷資料之馬式距離的穩健估計 / Robust estimation of the Mahalanobis distance for multivariate data mixed with continuous and discrete variables

任嘉珩, Jen, Chia Heng Unknown Date
本研究採用Lee 和Poon 所提出的隱藏常態變數模型來估計混合連續與間斷型變數之參數，並估計其馬式距離。此外，並利用穩健估計來估計混合型資料參數及其馬式距離，可在有離群值時解決最大概似估計的不穩定。 / Poon and Lee (1987) applied a normal latent variable model to deal with parameter estimation for data mixed with continuous and discrete variables, and Bedrick et al. (2000) used this idea to evaluate the Mahalanobis distance. In this thesis, we extend a similar idea to robustly estimate the parameters of multivariate data mixed with continuous and discrete variables under the same model. Furthermore, we evaluate the Mahalanobis distance, which can be used to measure similarity. The proposed method can overcome the unreliability of the MLE when there are outliers in the data.
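The difference between classical and robust Mahalanobis distances can be illustrated in Python with a minimum covariance determinant fit; this sketch uses continuous data only and does not reproduce the latent normal treatment of mixed continuous and discrete variables described above.

```python
# Classical vs. robust Mahalanobis distances on contaminated data (illustrative).
import numpy as np
from sklearn.covariance import EmpiricalCovariance, MinCovDet

rng = np.random.default_rng(4)
X = rng.multivariate_normal([0, 0], [[1, 0.6], [0.6, 1]], size=200)
X[:10] += 8                                   # plant a cluster of outliers

d_classical = EmpiricalCovariance().fit(X).mahalanobis(X)
d_robust = MinCovDet(random_state=0).fit(X).mahalanobis(X)

# The robust distances flag the planted outliers much more clearly,
# because the MCD fit is not dragged toward them.
print("classical distances of outliers:", np.round(d_classical[:10].mean(), 1))
print("robust    distances of outliers:", np.round(d_robust[:10].mean(), 1))
```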
16

Méthodes de réduction de dimension pour la construction d'indicateurs de qualité de vie / Dimension reduction methods to construct quality of life indicators

Labenne, Amaury 20 November 2015
L’objectif de cette thèse est de développer et de proposer de nouvelles méthodes de réduction de dimension pour la construction d’indicateurs composites de qualité de vie à l’échelle communale. La méthodologie statistique développée met l’accent sur la prise en compte de la multidimensionnalité du concept de qualité de vie, avec une attention particulière sur le traitement de la mixité des données (variables quantitatives et qualitatives) et l’introduction des conditions environnementales. Nous optons pour une approche par classification de variables et pour une méthode multi-tableaux (analyse factorielle multiple pour données mixtes). Ces deux méthodes permettent de construire des indicateurs composites que nous proposons comme mesure des conditions de vie à l’échelle communale. Afin de faciliter l’interprétation des indicateurs composites construits, une méthode de sélection de variables de type bootstrap est introduite en analyse factorielle multiple. Enfin, nous proposons la méthode hclustgeo de classification d’observations qui intègre des contraintes de proximité géographique afin de mieux appréhender la spatialité des phénomènes mis en jeu. / The purpose of this thesis is to develop and suggest new dimension reduction methods to construct composite indicators on a municipal scale. The developed statistical methodology highlights the consideration of the multi-dimensionality of the quality of life concept, with particular attention on the treatment of mixed data (quantitative and qualitative variables) and the introduction of environmental conditions. We opt for a variable clustering approach and for a multi-table method (multiple factor analysis for mixed data). These two methods allow us to build composite indicators that we propose as a measure of living conditions at the municipal scale. In order to facilitate the interpretation of the resulting composite indicators, we introduce a method for selecting variables based on a bootstrap approach in multiple factor analysis. Finally, we propose the hclustgeo clustering method for observations, which integrates geographical proximity constraints into the clustering procedure in order to better capture the spatial nature of the phenomena involved.
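The geographical-constraint idea behind hclustgeo can be sketched by mixing a feature-space dissimilarity with a geographic one before hierarchical clustering. The toy data, the normalisation, and the mixing weight alpha below are assumptions for illustration, not the thesis's implementation.

```python
# Hierarchical clustering with a geographical constraint: mix feature and
# spatial dissimilarities before clustering (illustrative sketch of the idea).
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(5)
n = 60
coords = rng.uniform(0, 10, size=(n, 2))          # municipality coordinates (toy)
features = rng.standard_normal((n, 3))            # quality-of-life indicators (toy)

D_feat = pdist(features)
D_geo = pdist(coords)
# Normalise so the two dissimilarities are comparable, then mix with alpha.
D_feat, D_geo = D_feat / D_feat.max(), D_geo / D_geo.max()

alpha = 0.3                                       # weight on geographic proximity
D_mixed = (1 - alpha) * D_feat + alpha * D_geo

# Ward-type clustering on the mixed dissimilarity (applied here to a
# non-Euclidean dissimilarity, as the constrained-clustering idea requires).
labels = fcluster(linkage(D_mixed, method="ward"), t=4, criterion="maxclust")
print("cluster sizes:", np.bincount(labels)[1:])
```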
17

Nonparametric kernel estimation methods for discrete conditional functions in econometrics

Elamin, Obbey Ahmed January 2013
This thesis studies the mixed-data-type kernel estimation framework for models of discrete dependent variables, known as kernel discrete conditional functions. The conventional parametric multinomial logit (MNL) model is compared with the mixed-data-type kernel conditional density estimator in Chapter 2. A new kernel estimator for discrete time single state hazard models is developed in Chapter 3 and named the discrete time “external kernel hazard” estimator. The discrete time (mixed) proportional hazard estimators are then compared empirically with the discrete time external kernel hazard estimator in Chapter 4. The work in Chapter 2 estimates a labour force participation decision model using cross-section data from the UK labour force survey in 2007. The work in Chapter 4 estimates a hazard rate for job vacancies in weeks, using data from the Lancashire Careers Service (LCS) for the period from March 1988 to June 1992. Evidence from the vast literature on female labour force participation and on job-market random matching theory is used to examine the empirical results of the estimators. The parametric estimators are constrained by restrictive assumptions regarding the link function of the discrete dependent variable and the dummy variables of the discrete covariates. Adding interaction terms improves the performance of the parametric models but introduces other risks, such as multicollinearity, greater singularity of the data matrix, and a more complicated maximum likelihood computation. The mixed-data-type kernel estimation framework, on the other hand, shows outstanding performance compared with the conventional parametric estimation methods. The kernel functions used for the discrete variables, including the dependent variable, substantially improve the performance of the kernel estimators. The kernel framework uses very few assumptions about the functional form of the variables in the model and relies on the right choice of kernel functions in the estimator. The results of the kernel conditional density show that female education level and fertility have a strong impact on women's propensity to work and to be in the labour force. The kernel conditional density estimator captures more heterogeneity among the women in the sample than the MNL model, owing to the restrictive parametric assumptions of the latter. The (mixed) proportional hazard framework, on the other hand, fails to capture the effect of job-market tightness on the job-vacancy hazard rate and produces inconsistent results when the assumptions regarding the distribution of the unobserved heterogeneity are changed. The external kernel hazard estimator overcomes those problems and produces results consistent with job-market random matching theory. The results in this thesis are useful for nonparametric estimation research in econometrics and in labour economics.
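A small illustration of a mixed-data-type kernel conditional density, here using statsmodels' conditional multivariate KDE on simulated data; the variables and their types are made up for the example and do not correspond to the thesis's labour force or vacancy data.

```python
# Mixed continuous/discrete kernel conditional density (illustrative sketch).
import numpy as np
from statsmodels.nonparametric.kernel_density import KDEMultivariateConditional

rng = np.random.default_rng(6)
n = 400
education = rng.integers(0, 4, n)                 # ordered discrete covariate (toy)
children = rng.integers(0, 3, n)                  # ordered discrete covariate (toy)
wage = rng.standard_normal(n)                     # continuous covariate (toy)
# Binary "participation" outcome depending on the covariates.
participate = (0.8 * education - 0.9 * children + 0.5 * wage
               + rng.standard_normal(n) > 0).astype(float)

kde = KDEMultivariateConditional(
    endog=[participate], exog=[education, children, wage],
    dep_type="u", indep_type="ooc", bw="normal_reference")

# Estimated conditional density of participation = 1 given
# education = 3, no children, and average wage.
p1 = kde.pdf(endog_predict=np.array([[1.0]]),
             exog_predict=np.array([[3, 0, 0.0]]))
print("estimated conditional density at participate=1:", np.round(p1, 3))
```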
18

Clustering of Unevenly Spaced Mixed Data Time Series / Klustring av ojämnt fördelade tidsserier med numeriska och kategoriska variabler

Sinander, Pierre, Ahmed, Asik January 2023
This thesis explores the feasibility of clustering mixed data and unevenly spaced time series for customer segmentation. The proposed method implements the Gower dissimilarity as the local distance function in dynamic time warping to calculate dissimilarities between mixed data time series. The time series are then clustered with k-medoids and the clusters are evaluated with the silhouette score and t-SNE. The study further investigates the use of a time warping regularisation parameter. It is derived that implementing time as a feature has the same effect as penalising time warping, and therefore time is implemented as a feature whose weight is equivalent to a regularisation parameter. The results show that the proposed method successfully identifies clusters in customer transaction data provided by Nordea. Furthermore, the results show a decrease in the silhouette score as the regularisation parameter increases, suggesting that the time at which a transaction occurred might not be of relevance to the given dataset. However, due to the method’s high computational complexity, it is limited to relatively small datasets, and therefore a need exists for a more scalable and efficient clustering technique. / Denna uppsats utforskar klustring av ojämnt fördelade tidsserier med numeriska och kategoriska variabler för kundsegmentering. Den föreslagna metoden implementerar Gower-dissimilaritet som avståndsfunktionen i dynamic time warping för att beräkna dissimilaritet mellan tidsserierna. Tidsserierna klustras sedan med k-medoids och klustren utvärderas med silhouette score och t-SNE. Studien undersökte vidare användningen av en regulariseringsparameter. Det härleddes att implementering av tid som en egenskap hade samma effekt som att bestraffa dynamic time warping, och därför implementerades tid som en egenskap där dess vikt är ekvivalent med en regulariseringsparameter. Resultaten visade att den föreslagna metoden lyckades identifiera kluster i transaktionsdata från Nordea. Vidare visades det att silhouette score minskade då regulariseringsparametern ökade, vilket antyder att tiden då en transaktion sker inte är relevant för den givna datan. Det visade sig ytterligare att metoden är begränsad till relativt små dataset på grund av dess höga beräkningskomplexitet, och därför finns det behov av att utforska en mer skalbar och effektiv klustringsteknik.
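A condensed sketch of the dissimilarity computation described above: the Gower dissimilarity as the local cost inside dynamic time warping, computed for toy mixed-type transaction series. The resulting matrix could then be passed to any k-medoids implementation; the variable types, ranges, and series lengths are assumptions for illustration.

```python
# Gower local distance inside dynamic time warping for mixed-type time series
# (illustrative sketch; the dissimilarity matrix would feed a k-medoids step).
import numpy as np

def gower(a, b, num_idx, cat_idx, num_ranges):
    """Gower dissimilarity between two mixed-type observations."""
    d_num = np.abs(a[num_idx] - b[num_idx]) / num_ranges          # scaled numeric part
    d_cat = (a[cat_idx] != b[cat_idx]).astype(float)              # simple matching part
    return (d_num.sum() + d_cat.sum()) / (len(num_idx) + len(cat_idx))

def dtw_gower(s1, s2, num_idx, cat_idx, num_ranges):
    """Dynamic time warping with the Gower dissimilarity as local cost."""
    n, m = len(s1), len(s2)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = gower(s1[i - 1], s2[j - 1], num_idx, cat_idx, num_ranges)
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# Toy transactions: [amount, category], with the category encoded as an integer.
rng = np.random.default_rng(7)
series = []
for _ in range(4):
    length = rng.integers(5, 9)                                   # uneven lengths
    series.append(np.column_stack([rng.uniform(0, 100, length),
                                   rng.integers(0, 3, length)]))

num_idx, cat_idx, num_ranges = np.array([0]), np.array([1]), np.array([100.0])
dist = np.zeros((len(series), len(series)))
for i in range(len(series)):
    for j in range(i + 1, len(series)):
        dist[i, j] = dist[j, i] = dtw_gower(series[i], series[j],
                                            num_idx, cat_idx, num_ranges)
print(np.round(dist, 3))
```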
