121

Model selection

Hildebrand, Annelize 11 1900 (has links)
In developing an understanding of real-world problems, researchers build mathematical and statistical models. Various model selection methods exist that can be used to obtain a mathematical model that best describes the real-world situation in some specified sense. These methods assess the merits of competing models by concentrating on a particular criterion; each selection method is associated with its own criterion and is named accordingly. The better-known ones include Akaike's Information Criterion, Mallows' Cp and cross-validation. The value of the criterion is calculated for each candidate model, and the model corresponding to the minimum value of the criterion is selected as the "best" model. / Mathematical Sciences / M. Sc. (Statistics)
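To make the criterion-minimisation idea concrete, the sketch below scores a few competing regression models with Akaike's Information Criterion and keeps the one with the smallest value. It is an editor's illustration only: the data, the polynomial candidates and the Gaussian AIC formula are assumptions, not material from the thesis.

```python
import numpy as np

# Minimal sketch: score competing polynomial models with AIC and keep the minimum.
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 60)
y = 1.0 + 2.0 * x - 1.5 * x**2 + rng.normal(scale=0.1, size=x.size)

def aic(y, yhat, k):
    """Gaussian AIC: n*log(RSS/n) + 2k, where k counts estimated coefficients."""
    n = y.size
    rss = np.sum((y - yhat) ** 2)
    return n * np.log(rss / n) + 2 * k

scores = {}
for degree in range(1, 6):                      # candidate models of increasing complexity
    coefs = np.polyfit(x, y, degree)            # least-squares fit
    yhat = np.polyval(coefs, x)
    scores[degree] = aic(y, yhat, degree + 1)   # +1 for the intercept

best = min(scores, key=scores.get)
print(scores, "-> selected degree:", best)
```

The same loop structure applies to any of the criteria named above; only the scoring function changes.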
122

國中生數學自我概念、自我效能與成就關係之探討: 以PISA2003香港資料為例 / The Relationship among Self-Concept, Self-Efficacy, and Performance in Mathematics: The PISA 2003 Hong Kong Data

盧玟伶, Lu, Wen-Ling Unknown Date (has links)
本研究目的，在利用PISA 2003資料庫為例，分辨數學自我概念、自我效能與數學成就關係之模式的建構。本研究選香港為研究對象，以參加PISA 2003的4402名香港的15歲學生為樣本來進行本研究。本研究運用探索性因素分析(EFA)檢視自我概念與自我效能之測量指標的信效度。分析結果顯示，「自我概念」與「自我效能」的測量模式的建構達良好的信效度。另一研究結果顯示，學生數學自我概念對數學成就之間沒有直接的影響效果，但會透過數學自我效能此中介變項，而產生對數學成就的間接影響效果。此外，在雙交叉驗證方面，顯示研究二組樣本具有交叉效度，研究模式之接受性均相當高。 / The purpose of this study was to examine the relationship among self-concept, self-efficacy, and performance in mathematics, using the PISA 2003 Hong Kong data as an example. The sample comprised the 4402 15-year-old Hong Kong students who participated in PISA 2003. Exploratory factor analysis was used to identify sound measurement models of self-concept and self-efficacy in PISA 2003, and the results showed that these measurement models had high reliability and validity. A further result showed that mathematics self-concept had no direct effect on mathematics achievement, but it had an indirect effect through the mediating variable of mathematics self-efficacy. Double cross-validation also showed that the two subsamples exhibited cross-validity, indicating that the research model is highly acceptable.
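The mediation finding above (no direct effect, an indirect effect through self-efficacy) can be illustrated with a minimal regression-based sketch. This is an editor's illustration on simulated data, not the structural equation model actually fitted in the study; the variable names, effect sizes and error scales are assumptions.

```python
import numpy as np

# Minimal mediation sketch: indirect effect = a * b, direct effect = c'.
rng = np.random.default_rng(1)
n = 4402
self_concept = rng.normal(size=n)
self_efficacy = 0.6 * self_concept + rng.normal(scale=0.8, size=n)   # path a
achievement = 0.5 * self_efficacy + rng.normal(scale=1.0, size=n)    # path b, no direct path

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x (with intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

a = ols_slope(self_concept, self_efficacy)                 # concept -> efficacy
X = np.column_stack([np.ones(n), self_concept, self_efficacy])
_, c_prime, b = np.linalg.lstsq(X, achievement, rcond=None)[0]
print(f"direct c' = {c_prime:.3f}, indirect a*b = {a*b:.3f}")
```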
123

兩種正則化方法用於假設檢定與判別分析時之比較 / A comparison between two regularization methods for discriminant analysis and hypothesis testing

李登曜, Li, Deng-Yao Unknown Date (has links)
在統計學上，高維度常造成許多分析上的問題，如進行多變量迴歸的假設檢定時，當樣本個數小於樣本維度時，其樣本共變異數矩陣之反矩陣不存在，使得檢定無法進行，本文研究動機即為在進行兩群多維常態母體的平均數檢定時，所遇到的高維度問題，並引發在分類上的研究，試圖尋找解決方法。本文研究目的為在兩種不同的正則化方法中，比較何者在檢定與分類上表現較佳。本文研究方法為以 Warton 與 Friedman 的正則化方法來分別進行檢定與分類上的分析，根據其檢定力與分類錯誤的表現來判斷何者較佳。由分析結果可知，兩種正則化方法並沒有絕對的優劣，須視母體各項假設而定。 / High dimensionality causes many problems in statistical analysis. For instance, consider the testing of hypotheses about multivariate regression models. Suppose that the dimension of the multivariate response is larger than the number of observations, then the sample covariance matrix is not invertible. Since the inverse of the sample covariance matrix is often needed when computing the usual likelihood ratio test statistic (under normality), the matrix singularity makes it difficult to implement the test. The singularity of the sample covariance matrix is also a problem in classification when the linear discriminant analysis (LDA) or the quadratic discriminant analysis (QDA) is used. Different regularization methods have been proposed to deal with the singularity of the sample covariance matrix for different purposes. Warton (2008) proposed a regularization procedure for testing, and Friedman (1989) proposed a regularization procedure for classification. Is it true that Warton's regularization works better for testing and Friedman's regularization works better for classification? To answer this question, some simulation studies are conducted and the results are presented in this thesis. It is found that neither regularization method is superior to the other.
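A minimal sketch of the kind of ridge-type covariance regularization discussed above: when the dimension exceeds the sample size the sample covariance matrix is singular, but shrinking it toward a scaled identity makes it invertible. The shrinkage weight, data dimensions and shrinkage target below are the editor's assumptions, not the exact procedures of Warton (2008) or Friedman (1989).

```python
import numpy as np

# When p > n the sample covariance S is singular, but (1-g)*S + g*(tr(S)/p)*I is not.
rng = np.random.default_rng(2)
n, p = 20, 50                       # fewer observations than dimensions
X = rng.normal(size=(n, p))
S = np.cov(X, rowvar=False)         # p x p sample covariance, rank <= n-1, hence singular

print("rank of S:", np.linalg.matrix_rank(S))   # < p, so S has no inverse

gamma = 0.3                          # shrinkage weight (illustrative)
S_reg = (1 - gamma) * S + gamma * (np.trace(S) / p) * np.eye(p)
S_inv = np.linalg.inv(S_reg)        # now well defined, usable in tests or discriminant rules
print("regularized covariance inverted, condition number:",
      round(np.linalg.cond(S_reg), 1))
```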
124

MODELING HETEROTACHY IN PHYLOGENETICS

Zhou, Yan 04 1900 (has links)
Il a été démontré que l’hétérotachie, variation du taux de substitutions au cours du temps et entre les sites, est un phénomène fréquent au sein de données réelles. Échouer à modéliser l’hétérotachie peut potentiellement causer des artéfacts phylogénétiques. Actuellement, plusieurs modèles traitent l’hétérotachie : le modèle à mélange des longueurs de branche (MLB) ainsi que diverses formes du modèle covarion. Dans ce projet, notre but est de trouver un modèle qui prenne efficacement en compte les signaux hétérotaches présents dans les données, et ainsi améliorer l’inférence phylogénétique. Pour parvenir à nos fins, deux études ont été réalisées. Dans la première, nous comparons le modèle MLB avec le modèle covarion et le modèle homogène grâce aux tests AIC et BIC, ainsi que par validation croisée. A partir de nos résultats, nous pouvons conclure que le modèle MLB n’est pas nécessaire pour les sites dont les longueurs de branche diffèrent sur l’ensemble de l’arbre, car, dans les données réelles, les signaux hétérotaches qui interfèrent avec l’inférence phylogénétique sont généralement concentrés dans une zone limitée de l’arbre. Dans la seconde étude, nous relaxons l’hypothèse que le modèle covarion est homogène entre les sites, et développons un modèle à mélanges basé sur un processus de Dirichlet. Afin d’évaluer différents modèles hétérogènes, nous définissons plusieurs tests de non-conformité par échantillonnage postérieur prédictif pour étudier divers aspects de l’évolution moléculaire à partir de cartographies stochastiques. Ces tests montrent que le modèle à mélanges covarion utilisé avec une loi gamma est capable de refléter adéquatement les variations de substitutions tant à l’intérieur d’un site qu’entre les sites. Notre recherche permet de décrire de façon détaillée l’hétérotachie dans des données réelles et donne des pistes à suivre pour de futurs modèles hétérotaches. Les tests de non-conformité par échantillonnage postérieur prédictif fournissent des outils de diagnostic pour évaluer les modèles en détails. De plus, nos deux études révèlent la non spécificité des modèles hétérogènes et, en conséquence, la présence d’interactions entre différents modèles hétérogènes. Nos études suggèrent fortement que les données contiennent différents caractères hétérogènes qui devraient être pris en compte simultanément dans les analyses phylogénétiques. / Heterotachy, substitution rate variation across sites and time, has been shown to be a frequent phenomenon in real data. Failure to model heterotachy could potentially cause phylogenetic artefacts. Currently, there are several models for handling heterotachy: the mixture branch length model (MBL) and several variant forms of the covarion model. In this project, our objective is to find a model that efficiently handles heterotachous signals in the data and thereby improves phylogenetic inference. In order to achieve our goal, two separate studies were conducted. In the first study, we compare the MBL, covarion and homotachous models using AIC, BIC and cross-validation. Based on our results, we conclude that the MBL model, in which sites have different branch lengths along the entire tree, is over-parameterized: real data indicate that the heterotachous signals which interfere with phylogenetic inference are generally limited to a small area of the tree. In the second study, we relax the assumption that the covarion parameters are homogeneous over sites and develop a mixture covarion model using a Dirichlet process. 
In order to evaluate different heterogeneous models, we design several posterior predictive discrepancy tests to study different aspects of molecular evolution using stochastic mappings. The posterior predictive discrepancy tests demonstrate that the covarion mixture +Γ model is able to adequately model the substitution variation within and among sites. Our research permits a detailed view of heterotachy in real datasets and gives directions for future heterotachous models. The posterior predictive discrepancy tests provide diagnostic tools to assess models in detail. Furthermore, both of our studies reveal the non-specificity of heterogeneous models. Our studies strongly suggest that different heterogeneous features in the data should be handled simultaneously.
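The posterior predictive discrepancy tests mentioned above follow a generic recipe: draw parameters from the posterior, simulate replicated data, and compare a discrepancy statistic on the replicates with its observed value. The sketch below illustrates that recipe on a deliberately simple toy model (a Poisson rate with a conjugate Gamma posterior, checked for over-dispersion); it is an editor's illustration and does not reproduce the covarion models or stochastic mappings used in the thesis.

```python
import numpy as np

# Generic posterior predictive discrepancy check on a toy Poisson model.
# Discrepancy: the variance/mean ratio, probing over-dispersion the model cannot capture.
rng = np.random.default_rng(3)
y_obs = rng.negative_binomial(n=5, p=0.5, size=200)      # "real" data, over-dispersed

# Conjugate Gamma(a, b) posterior for a Poisson rate, with a vague Gamma(1, 0.01) prior
a_post, b_post = 1 + y_obs.sum(), 0.01 + y_obs.size

def discrepancy(y):
    return y.var() / y.mean()

d_obs = discrepancy(y_obs)
d_rep = []
for _ in range(2000):
    lam = rng.gamma(a_post, 1 / b_post)                  # posterior draw of the rate
    y_rep = rng.poisson(lam, size=y_obs.size)            # replicated data set
    d_rep.append(discrepancy(y_rep))

ppp = np.mean(np.array(d_rep) >= d_obs)                  # posterior predictive p-value
print(f"observed var/mean = {d_obs:.2f}, posterior predictive p = {ppp:.3f}")
```

A p-value near 0 or 1 flags an aspect of the data the fitted model fails to reproduce, which is the diagnostic role these tests play in the abstract above.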
125

Méthodes de Bootstrap pour les modèles à facteurs

Djogbenou, Antoine A. 07 1900 (has links)
Cette thèse développe des méthodes bootstrap pour les modèles à facteurs qui sont couramment utilisés pour générer des prévisions depuis l'article pionnier de Stock et Watson (2002) sur les indices de diffusion. Ces modèles tolèrent l'inclusion d'un grand nombre de variables macroéconomiques et financières comme prédicteurs, une caractéristique utile pour inclure diverses informations disponibles aux agents économiques. Ma thèse propose donc des outils économétriques qui améliorent l'inférence dans les modèles à facteurs utilisant des facteurs latents extraits d'un large panel de prédicteurs observés. Il est subdivisé en trois chapitres complémentaires dont les deux premiers en collaboration avec Sílvia Gonçalves et Benoit Perron. Dans le premier article, nous étudions comment les méthodes bootstrap peuvent être utilisées pour faire de l'inférence dans les modèles de prévision pour un horizon de h périodes dans le futur. Pour ce faire, il examine l'inférence bootstrap dans un contexte de régression augmentée de facteurs où les erreurs pourraient être autocorrélées. Il généralise les résultats de Gonçalves et Perron (2014) et propose puis justifie deux approches basées sur les résidus : le block wild bootstrap et le dependent wild bootstrap. Nos simulations montrent une amélioration des taux de couverture des intervalles de confiance des coefficients estimés en utilisant ces approches comparativement à la théorie asymptotique et au wild bootstrap en présence de corrélation sérielle dans les erreurs de régression. Le deuxième chapitre propose des méthodes bootstrap pour la construction des intervalles de prévision permettant de relâcher l'hypothèse de normalité des innovations. Nous y proposons des intervalles de prédiction bootstrap pour une observation h périodes dans le futur et sa moyenne conditionnelle. Nous supposons que ces prévisions sont faites en utilisant un ensemble de facteurs extraits d'un large panel de variables. Parce que nous traitons ces facteurs comme latents, nos prévisions dépendent à la fois des facteurs estimés et les coefficients de régression estimés. Sous des conditions de régularité, Bai et Ng (2006) ont proposé la construction d'intervalles asymptotiques sous l'hypothèse de Gaussianité des innovations. Le bootstrap nous permet de relâcher cette hypothèse et de construire des intervalles de prédiction valides sous des hypothèses plus générales. En outre, même en supposant la Gaussianité, le bootstrap conduit à des intervalles plus précis dans les cas où la dimension transversale est relativement faible car il prend en considération le biais de l'estimateur des moindres carrés ordinaires comme le montre une étude récente de Gonçalves et Perron (2014). Dans le troisième chapitre, nous suggérons des procédures de sélection convergentes pour les régressions augmentées de facteurs en échantillons finis. Nous démontrons premièrement que la méthode de validation croisée usuelle est non-convergente mais que sa généralisation, la validation croisée «leave-d-out» sélectionne le plus petit ensemble de facteurs estimés pour l'espace généré par les vrais facteurs. Le deuxième critère dont nous montrons également la validité généralise l'approximation bootstrap de Shao (1996) pour les régressions augmentées de facteurs. Les simulations montrent une amélioration de la probabilité de sélectionner parcimonieusement les facteurs estimés comparativement aux méthodes de sélection disponibles. 
L'application empirique revisite la relation entre les facteurs macroéconomiques et financiers, et l'excès de rendement sur le marché boursier américain. Parmi les facteurs estimés à partir d'un large panel de données macroéconomiques et financières des États-Unis, les facteurs fortement corrélés aux écarts de taux d'intérêt et les facteurs de Fama-French ont un bon pouvoir prédictif pour les excès de rendement. / This thesis develops bootstrap methods for factor models, which have been widely used for generating forecasts since the seminal paper of Stock and Watson (2002) on diffusion indices. These models allow the inclusion of a large set of macroeconomic and financial variables as predictors, a useful feature for incorporating the diverse information available to economic agents. My thesis develops econometric tools that improve inference in factor-augmented regression models driven by a few unobservable factors estimated from a large panel of observed predictors. It is subdivided into three complementary chapters. The first two chapters are joint papers with Sílvia Gonçalves and Benoit Perron. In the first chapter, we study how bootstrap methods can be used to make inference in h-step forecasting models which generally involve serially correlated errors. It thus considers bootstrap inference in a factor-augmented regression context where the errors could potentially be serially correlated. This generalizes results in Gonçalves and Perron (2013) and makes the bootstrap applicable to forecasting contexts where the forecast horizon is greater than one. We propose and justify two residual-based approaches, a block wild bootstrap (BWB) and a dependent wild bootstrap (DWB). Our simulations document improvement in coverage rates of confidence intervals for the coefficients when using BWB or DWB relative to both asymptotic theory and the wild bootstrap when serial correlation is present in the regression errors. The second chapter provides bootstrap methods for prediction intervals which allow the normality assumption on the innovations to be relaxed. We propose bootstrap prediction intervals for an observation h periods into the future and its conditional mean. We assume that these forecasts are made using a set of factors extracted from a large panel of variables. Because we treat these factors as latent, our forecasts depend on both the estimated factors and the estimated regression coefficients. Under regularity conditions, Bai and Ng (2006) proposed the construction of asymptotic intervals under Gaussianity of the innovations. The bootstrap allows us to relax this assumption and to construct valid prediction intervals under more general conditions. Moreover, even under Gaussianity, the bootstrap leads to more accurate intervals in cases where the cross-sectional dimension is relatively small, as it reduces the bias of the ordinary least squares estimator as shown in a recent paper by Gonçalves and Perron (2014). The third chapter proposes two consistent model selection procedures for factor-augmented regressions in finite samples. We first demonstrate that the usual cross-validation is inconsistent, but that a generalization, leave-d-out cross-validation, selects the smallest basis of estimated factors for the space spanned by the true factors. The second proposed criterion is a generalization of the bootstrap approximation of the squared error of prediction of Shao (1996) to factor-augmented regressions, which we also show to be consistent. 
Simulation evidence documents improvements in the probability of selecting the smallest set of estimated factors relative to the usually available methods. An illustrative empirical application analyzes the relationship between expected stock returns and macroeconomic and financial factors extracted from a large panel of U.S. macroeconomic and financial data. Our new procedures select factors that correlate heavily with interest rate spreads and with the Fama-French factors; these factors have strong predictive power for excess returns.
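The residual-based schemes described above can be illustrated with a stripped-down example: a single observed regressor stands in for the estimated factors, and a block wild bootstrap perturbs the residuals block by block so that their serial dependence is roughly preserved. This is an editor's sketch under those simplifying assumptions (sample size, block length and error process are invented), not the procedure justified in the thesis.

```python
import numpy as np

# Block wild bootstrap for a regression slope with serially correlated errors.
rng = np.random.default_rng(4)
T, block = 200, 10
x = rng.normal(size=T)
e = np.convolve(rng.normal(size=T + 4), np.ones(5) / 5, mode="valid")  # correlated errors
y = 1.0 + 0.5 * x + e

X = np.column_stack([np.ones(T), x])
beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta_hat

boot = []
for _ in range(999):
    v = np.repeat(rng.normal(size=T // block), block)    # one multiplier per block
    y_star = X @ beta_hat + resid * v                     # residuals perturbed block-wise
    boot.append(np.linalg.lstsq(X, y_star, rcond=None)[0][1])

lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"slope = {beta_hat[1]:.3f}, 95% bootstrap interval = [{lo:.3f}, {hi:.3f}]")
```

Keeping the multiplier constant within each block is what distinguishes this scheme from the ordinary wild bootstrap, which draws one multiplier per observation.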
126

Modelos de agrupamento e classificação para os bairros da cidade do Rio de Janeiro sob a ótica da Inteligência Computacional: Lógica Fuzzy, Máquinas de Vetores Suporte e Algoritmos Genéticos / Clustering and classification models for the neighborhoods of the city of Rio de Janeiro from the perspective of Computational Intelligence: Fuzzy Logic, Support Vector Machine and Genetic Algorithms

Natalie Henriques Martins 19 June 2015 (has links)
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior / From 2011 onwards, events with major repercussions for the city of Rio de Janeiro have taken place and will continue to take place, such as the United Nations Rio+20 conference and sporting events of worldwide importance (the FIFA World Cup, the Olympic and Paralympic Games). These events attract financial resources to the city, as well as generating jobs, infrastructure improvements and real-estate appreciation, for both land and buildings. When choosing a residential property in a given neighborhood, buyers evaluate not only the property itself but also the urban amenities available in the area. In this context, it was possible to define a qualitative linguistic interpretation of the neighborhoods of the city of Rio de Janeiro by integrating three Computational Intelligence techniques for the assessment of benefits: Fuzzy Logic, Support Vector Machines and Genetic Algorithms. The database was built from web sources and government institutes, covering the cost of residential properties and the benefits and weaknesses of the city's neighborhoods. Fuzzy Logic was first implemented as an unsupervised clustering model, using Ellipsoidal Rules via the Extension Principle with the Mahalanobis Distance, inferentially forming linguistically labeled groups (Good, Fair and Poor) according to twelve urban characteristics. From this grouping, a Support Vector Machine integrated with Genetic Algorithms was used as a supervised method to search for and select the smallest subset of the clustering variables that best classifies the neighborhoods (principle of parsimony). Analysis of the error rates allowed the best classification model with a reduced variable space to be chosen, resulting in a subset containing information on: HDI, number of bus lines, educational institutions, average value per square meter, open-air spaces, entertainment venues and crime. The modelling that combined the three Computational Intelligence techniques ranked the neighborhoods of Rio de Janeiro with acceptable error rates, supporting decision making in the purchase and sale of residential properties. As for public transport in the city in question, it was apparent that the road network is still the priority.
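The supervised stage described above, selecting a parsimonious subset of the clustering variables that still classifies the neighborhoods well, can be sketched as follows. Synthetic data stand in for the twelve urban indicators, and an exhaustive search over small feature subsets stands in for the genetic algorithm; the classifier settings are the editor's assumptions rather than those used in the thesis.

```python
from itertools import combinations
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for 12 neighborhood indicators and 3 quality labels.
X, y = make_classification(n_samples=160, n_features=12, n_informative=4,
                           n_classes=3, n_clusters_per_class=1, random_state=0)

best = (0.0, None)
for k in (2, 3):                                   # small subsets only, for brevity
    for subset in combinations(range(X.shape[1]), k):
        model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
        score = cross_val_score(model, X[:, list(subset)], y, cv=5).mean()
        if score > best[0]:
            best = (score, subset)                 # keep the best-scoring small subset

print("best accuracy %.3f with features %s" % best)
```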
127

Covariate Model Building in Nonlinear Mixed Effects Models

Ribbing, Jakob January 2007 (has links)
Population pharmacokinetic-pharmacodynamic (PK-PD) models can be fitted using nonlinear mixed effects modelling (NONMEM). This is an efficient way of learning about drugs and diseases from data collected in clinical trials. Identifying covariates which explain differences between patients is important to discover patient subpopulations at risk of sub-therapeutic or toxic effects and for treatment individualization. Stepwise covariate modelling (SCM) is commonly used to this end. The aim of the current thesis work was to evaluate SCM and to develop alternative approaches. A further aim was to develop a mechanistic PK-PD model describing fasting plasma glucose, fasting insulin, insulin sensitivity and beta-cell mass.

The lasso is a penalized estimation method performing covariate selection simultaneously to shrinkage estimation. The lasso was implemented within NONMEM as an alternative to SCM and is discussed in comparison with that method. Further, various ways of incorporating information and propagating knowledge from previous studies into an analysis were investigated. In order to compare the different approaches, investigations were made under varying, replicated conditions. In the course of the investigations, more than one million NONMEM analyses were performed on simulated data. Due to selection bias the use of SCM performed poorly when analysing small datasets or rare subgroups. In these situations, the lasso method in NONMEM performed better, was faster, and additionally validated the covariate model. Alternatively, the performance of SCM can be improved by propagating knowledge or incorporating information from previously analysed studies and by population optimal design.

A model was also developed on a physiological/mechanistic basis to fit data from three phase II/III studies on the investigational drug, tesaglitazar. This model described fasting glucose and insulin levels well, despite heterogeneous patient groups ranging from non-diabetic insulin resistant subjects to patients with advanced diabetes. The model predictions of beta-cell mass and insulin sensitivity were well in agreement with values in the literature.
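The lasso-based covariate selection described above can be illustrated, in a greatly simplified form, on an ordinary linear model: the penalty shrinks most covariate coefficients to exactly zero, and the remaining non-zero coefficients define the selected covariate model. The sketch below is an editor's illustration on simulated data with invented covariate names; it uses scikit-learn's LassoCV rather than the NONMEM implementation developed in the thesis.

```python
import numpy as np
from sklearn.linear_model import LassoCV
from sklearn.preprocessing import StandardScaler

# Penalized covariate selection on a plain linear surrogate of a PK parameter.
rng = np.random.default_rng(5)
n = 80
covariates = ["age", "weight", "crcl", "sex", "smoker", "cyp_status"]  # invented names
X = rng.normal(size=(n, len(covariates)))
clearance = 1.0 + 0.8 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=n)

X_std = StandardScaler().fit_transform(X)
lasso = LassoCV(cv=5, random_state=0).fit(X_std, clearance)   # penalty chosen by cross-validation

selected = [name for name, coef in zip(covariates, lasso.coef_) if abs(coef) > 1e-6]
print("selected covariates:", selected)
```

The cross-validated choice of the penalty is what lets the lasso select and validate the covariate model in a single step, in contrast with the stepwise procedure.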
129

Analysis of Spatial Performance of Meteorological Drought Indices

Patil, Sandeep 1986- 14 March 2013 (has links)
Meteorological drought indices are commonly calculated from climatic stations that have long-term historical data and then converted to a regular grid using spatial interpolation methods. The gridded drought indices are mapped to aid decision making by policy makers and the general public. This study analyzes the spatial performance of interpolation methods for meteorological drought indices in the United States based on data from the Co-operative Observer Network (COOP) and United States Historical Climatology Network (USHCN) for different months, climatic regions and years. An error analysis was performed using cross-validation and the results were compared for the nine climate regions that comprise the United States. Errors are generally higher in regions and months dominated by convective precipitation. Errors are also higher in regions like the western United States that are dominated by mountainous terrain. Higher errors are consistently observed in the southeastern U.S., especially in Florida. Interpolation errors are generally higher in summer than in winter. The accuracy of different drought indices was also compared. The Standardized Precipitation and Evapotranspiration Index (SPEI) tends to have lower errors than the Standardized Precipitation Index (SPI) in seasons with significant convective precipitation. This is likely because SPEI uses both precipitation and temperature data in its calculation, whereas SPI is based solely on precipitation. There are also variations in interpolation accuracy based on the network that is used. In general, COOP is more accurate than USHCN because the COOP network has a higher density of stations. USHCN is a subset of the COOP network that comprises high-quality stations with a long and complete record. However, the difference in accuracy is not as significant as the difference in spatial density between the two networks. For multiscalar SPI, USHCN performs better than COOP because the stations tend to have a longer record. The ordinary kriging method (with optimal function fitting) performed better than Inverse Distance Weighted (IDW) methods (power parameters 2.0 and 2.5) in all cases and therefore it is recommended for interpolating drought indices. However, ordinary kriging only provided a statistically significant improvement in accuracy for the Palmer Drought Severity Index (PDSI) with the COOP network. Therefore, it can be concluded that IDW is a reasonable method for interpolating drought indices, but optimal ordinary kriging provides some improvement in accuracy. The most significant factor affecting the spatial accuracy of drought indices is seasonality (precipitation climatology), and this holds true for almost all regions of the U.S. for 1-month SPI and SPEI. The high-quality USHCN network gives better interpolation accuracy with 6-, 9- and 12-month SPI, and variation in errors amongst the different SPI time scales is minimal. The difference between networks is also significant for PDSI. Although the absolute magnitude of the differences between interpolation with COOP and USHCN is small, the accuracy of interpolation with COOP is much more spatially variable than with USHCN.
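The error analysis described above rests on a simple idea: predict each station from its neighbors, compare the prediction with the value actually observed there, and summarize the errors. The sketch below applies that leave-one-out scheme to inverse-distance-weighted interpolation on simulated station data; the locations, the toy index field and the power parameter are the editor's assumptions, not the COOP/USHCN data or the exact settings of the study.

```python
import numpy as np

# IDW interpolation of a toy drought-index field, scored by leave-one-out cross-validation.
rng = np.random.default_rng(6)
n = 120
lon, lat = rng.uniform(-100, -80, n), rng.uniform(30, 45, n)
spi = np.sin(lon / 5.0) + np.cos(lat / 5.0) + rng.normal(scale=0.2, size=n)  # toy index field

def idw(xs, ys, vals, x0, y0, power=2.0):
    """Predict a value at (x0, y0) from stations (xs, ys, vals) with weights 1/d^power."""
    d = np.hypot(xs - x0, ys - y0)
    w = 1.0 / np.maximum(d, 1e-9) ** power
    return np.sum(w * vals) / np.sum(w)

errors = []
for i in range(n):                                 # leave one station out at a time
    mask = np.arange(n) != i
    pred = idw(lon[mask], lat[mask], spi[mask], lon[i], lat[i])
    errors.append(pred - spi[i])

print(f"IDW leave-one-out RMSE: {np.sqrt(np.mean(np.square(errors))):.3f}")
```

Substituting a kriging predictor for `idw` and comparing the resulting RMSEs is the essence of the method comparison reported in the abstract.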
130

Modelling Implied Volatility of American-Asian Options : A Simple Multivariate Regression Approach

Radeschnig, David January 2015 (has links)
This report focuses on implied volatility for American-style Asian options, and on a least-squares approximation method as a way of estimating its magnitude. Asian option prices are calculated/approximated using Quasi-Monte Carlo simulations and least-squares regression, where a known volatility is used as input. A regression tree then empirically builds a database of regression vectors for the implied volatility based on the simulated option prices. The mean squared errors between input and estimated volatilities are then compared using a five-fold cross-validation test as well as the non-parametric Kruskal-Wallis hypothesis test of equal distributions. The study results in a proposed semi-parametric model for estimating implied volatilities from options. The user must, however, be aware that this model may suffer from estimation bias and should therefore be used with caution.
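The Quasi-Monte Carlo pricing step mentioned above can be sketched for the simpler European-exercise case: simulate low-discrepancy paths of the underlying, average each path, and discount the mean payoff. The parameters below are invented, and the American-exercise feature and the subsequent volatility regression are not reproduced; the sketch only shows how a Sobol sequence replaces pseudo-random draws in the simulation.

```python
import numpy as np
from scipy.stats import norm, qmc

# QMC pricing of a European-exercise arithmetic-average Asian call under GBM.
s0, strike, rate, vol, T, steps = 100.0, 100.0, 0.03, 0.25, 1.0, 12
dt = T / steps

sampler = qmc.Sobol(d=steps, scramble=True, seed=7)
u = sampler.random_base2(m=14)                          # 2^14 low-discrepancy points
z = norm.ppf(np.clip(u, 1e-12, 1 - 1e-12))              # map to normals, guard endpoints

log_paths = np.log(s0) + np.cumsum((rate - 0.5 * vol**2) * dt
                                   + vol * np.sqrt(dt) * z, axis=1)
avg_price = np.exp(log_paths).mean(axis=1)              # arithmetic average along each path
payoff = np.maximum(avg_price - strike, 0.0)
price = np.exp(-rate * T) * payoff.mean()
print(f"QMC estimate of the Asian call price: {price:.4f}")
```

Repeating this pricing over a grid of known input volatilities is what generates the training data for the regression step described in the abstract.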
