151 |
Flying in the Academic Environment: An Exploratory Panel Data Analysis of CO2 Emission at KTH. Artman, Arvid, January 2024 (has links)
In this study, a panel data set of flights made by employees at the Royal Institute of Technology (KTH) in Sweden is analyzed using generalized linear modeling approaches, with the aim of building models that predict quarterly CO2 emissions and the number of flights well for a year not included in the model estimation. A Zero-inflated Gamma regression model is fitted to the CO2 emission variable, and a Zero-inflated Negative Binomial regression model is used for the number of flights. To build the models, cross-validation is performed with the observations from 2018 as the training set and the observations from the following year, 2019, as the test set. Variables are added one at a time, each time selecting the variable that best improves prediction on the test set (whether included in the count model or the zero-inflation model), until an added variable turns out insignificant at the 5% significance level in the estimated model. In addition to the variables in the data, three lags of the dependent variables (CO2 emissions and flights) were included, as well as transformed versions of the continuous variables and a random intercept for each of the categorical variables indicating quarter and department at KTH. Neither model selected through the cross-validation process turned out to be particularly good at predicting the values for the upcoming year, but a number of variables were shown to have a statistically significant association with the respective dependent variable.
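For readers who want a concrete picture of this selection scheme, a minimal sketch is given below: fit on 2018, score on 2019, and greedily add the variable that most improves out-of-sample prediction. It assumes one pandas data frame per year, uses scikit-learn's PoissonRegressor as a stand-in for the zero-inflated Gamma and negative binomial models of the thesis, and replaces the significance-based stopping rule with a simpler one (stop when the test error no longer improves); all names are illustrative, not taken from the thesis.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import PoissonRegressor
from sklearn.metrics import mean_squared_error

def forward_select(train: pd.DataFrame, test: pd.DataFrame,
                   target: str, candidates: list[str]) -> list[str]:
    """Greedy forward selection scored on a held-out test year."""
    remaining = list(candidates)        # copy so the caller's list is untouched
    selected: list[str] = []
    best_err = np.inf
    while remaining:
        scores = {}
        for var in remaining:
            cols = selected + [var]
            model = PoissonRegressor(max_iter=1000).fit(train[cols], train[target])
            scores[var] = mean_squared_error(test[target], model.predict(test[cols]))
        var, err = min(scores.items(), key=lambda kv: kv[1])
        if err >= best_err:             # stop once the test error no longer improves
            break
        selected.append(var)
        remaining.remove(var)
        best_err = err
    return selected
```

With `train` and `test` set to the 2018 and 2019 observations, the returned list mimics the order in which variables would enter the model in the procedure described above.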
|
152 |
[pt] ESTIMAÇÕES NÃO PARAMÉTRICAS DE CURVAS DE JUROS: CRITÉRIO DE SELEÇÃO DE MODELO, FATORES DETERMINANTES DE DESEMPENHO E BID-ASK SPREAD / [en] NON-PARAMETRIC ESTIMATIONS OF INTEREST RATE CURVES: MODEL SELECTION CRITERION, PERFORMANCE DETERMINANT FACTORS AND BID-ASK SPREAD. ANDRE MONTEIRO D ALMEIDA MONTEIRO, 11 June 2002 (has links)
[pt] Esta tese investiga a estimação de curvas de juros sob o ponto de vista de métodos não-paramétricos. O texto está dividido em dois blocos. O primeiro investiga a questão do critério utilizado para selecionar o método de melhor desempenho na tarefa de interpolar a curva de juros brasileira em uma dada amostra. Foi proposto um critério de seleção de método baseado em estratégias de re-amostragem do tipo leave-k-out cross validation, onde 1 ≤ k ≤ K e K é função do número de contratos observados a cada curva da amostra. Especificidades do problema reduzem o esforço computacional requerido, tornando o critério factível. A amostra tem freqüência diária: janeiro de 1997 a fevereiro de 2001. O critério proposto apontou o spline cúbico natural - utilizado como método de ajuste perfeito aos dados - como o método de melhor desempenho. Considerando a precisão de negociação, este spline mostrou-se não viesado. A análise quantitativa de seu desempenho identificou, contudo, heterocedasticidades nos erros simulados. A partir da especificação da variância condicional destes erros e de algumas hipóteses, foi proposto um esquema de intervalo de segurança para a estimação de taxas de juros pelo spline cúbico natural, empregado como método de ajuste perfeito aos dados. O backtest sugere que o esquema proposto é consistente, acomodando bem as hipóteses e aproximações envolvidas. O segundo bloco investiga a estimação da curva de juros norte-americana construída a partir dos contratos de swaps de taxas de juros dólar-Libor pela Máquina de Vetores Suporte (MVS), parte do corpo da Teoria do Aprendizado Estatístico. A pesquisa em MVS tem obtido importantes avanços teóricos, embora ainda sejam escassas as implementações em problemas reais de regressão. A MVS possui características atrativas para a modelagem de curva de juros: é capaz de introduzir já na estimação informações a priori sobre o formato da curva e sobre aspectos da formação das taxas e liquidez de cada um dos contratos a partir dos quais ela é construída. Estas últimas são quantificadas pelo bid-ask spread (BAS) de cada contrato. A formulação básica da MVS é alterada para assimilar diferentes valores do BAS sem que as propriedades dela sejam perdidas. É dada especial atenção ao levantamento de informação a priori para seleção dos parâmetros da MVS a partir do formato típico da curva. A amostra tem freqüência diária: março de 1997 a abril de 2001. Os desempenhos fora da amostra de diversas especificações da MVS foram confrontados com aqueles de outros métodos de estimação. A MVS foi o método que melhor controlou o trade-off entre viés e variância dos erros.

/ [en] This thesis investigates interest rate curve estimation from a non-parametric point of view. The text is divided into two parts. The first focuses on the criterion used to select the best-performing method for interpolating the Brazilian interest rate curve in a given sample. A selection criterion is proposed that measures out-of-sample performance through leave-k-out cross-validation resampling strategies applied to all curves in the sample, where 1 ≤ k ≤ K and K is a function of the number of contracts observed in each curve. Particularities of the problem substantially reduce the required computational effort, making the proposed criterion feasible. The sample is daily, from January 1997 to February 2001. The proposed criterion selected the natural cubic spline, used as a perfect-fitting estimation method. Considering the trading precision of the rates, the spline proved to be unbiased. However, quantitative analysis of its performance determinants revealed heteroskedasticity in the out-of-sample errors. From a specification of the conditional variance of these errors and a few assumptions, a security-interval scheme is proposed for interest rates estimated by the perfect-fitting natural cubic spline. A backtest showed that the proposed scheme is consistent, accommodating well the assumptions and approximations involved. The second part estimates the US dollar-Libor interest rate swap curve using the Support Vector Machine (SVM), a method from Statistical Learning Theory. SVM research has achieved important theoretical results, but implementations on real regression problems remain scarce. The SVM has attractive characteristics for interest rate curve modeling: it can introduce, directly in the estimation, a priori information about the shape of the curve and about liquidity and price formation aspects of the contracts from which the curve is built. The latter information is quantified by the bid-ask spread (BAS) of each contract. The basic SVM formulation is modified to incorporate different bid-ask spread values without losing its properties. Special attention is given to extracting a priori information from the typical shape of the swap curve for use in SVM parameter selection. The sample is daily, from March 1997 to April 2001. The out-of-sample performance of several SVM specifications is compared with that of other estimation methods. The SVM achieved the best control of the trade-off between bias and variance of the out-of-sample errors.
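As an illustration of the resampling idea described in this abstract, the sketch below scores a perfect-fitting natural cubic spline on one day's curve by leaving out one observed contract at a time (the leave-1-out special case); averaging this error over all curves in the sample would rank the method against competitors. The function names and the choice to always keep the two endpoint contracts are assumptions made for the illustration, not details taken from the thesis.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def loo_error(maturities: np.ndarray, rates: np.ndarray) -> float:
    """Mean squared leave-one-out error of a natural cubic spline on one curve.

    maturities are assumed sorted and strictly increasing; the endpoint
    contracts are always kept so the left-out point stays inside the fit range.
    """
    errors = []
    for i in range(1, len(maturities) - 1):
        keep = np.delete(np.arange(len(maturities)), i)
        spline = CubicSpline(maturities[keep], rates[keep], bc_type="natural")
        errors.append(rates[i] - spline(maturities[i]))
    return float(np.mean(np.square(errors)))
```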
|
153 |
Klientų duomenų valdymas bankininkystėje / Client data management in banking. Žiupsnys, Giedrius, 09 July 2011 (has links)
Darbas apima banko klientų kredito istorinių duomenų dėsningumų tyrimą. Pirmiausia nagrinėjamos banko duomenų saugyklos, siekiant kuo geriau perprasti bankinius duomenis. Vėliau naudojant banko duomenų imtis, kurios apima kreditų grąžinimo istoriją, siekiama įvertinti klientų nemokumo riziką. Tai atliekama adaptuojant algoritmus bei programinę įrangą duomenų tyrimui, kuris pradedamas nuo informacijos apdorojimo ir paruošimo. Paskui pritaikant įvairius klasifikavimo algoritmus, sudarinėjami modeliai, kuriais siekiama kuo tiksliau suskirstyti turimus duomenis, nustatant nemokius klientus. Taip pat siekiant įvertinti kliento vėluojamų mokėti paskolą dienų skaičių pasitelkiami regresijos algoritmai bei sudarinėjami prognozės modeliai. Taigi darbo metu atlikus numatytus tyrimus, pateikiami duomenų vitrinų modeliai, informacijos srautų schema. Taip pat nurodomi klasifikavimo ir prognozavimo modeliai bei algoritmai, geriausiai įvertinantys duotas duomenų imtis. / This work analyses regularities in the bank's historical credit data on clients. First, the bank's data repositories are examined in order to understand the banking data as well as possible. Then, using bank data sets that describe credit repayment histories, the aim is to assess clients' insolvency risk. Data-mining algorithms and software are adapted for this purpose, starting with information preprocessing and preparation. Various classification algorithms are then applied to build models that classify the available data as accurately as possible and identify insolvent clients. In addition to classification, regression algorithms are studied and prediction models are built to estimate how many days a client will be late in repaying a loan. Once the planned analyses have been carried out, data mart models and an information flow diagram are presented, together with the classification and prediction models and algorithms that give the best results for the given data sets.
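As a rough illustration of the kind of model comparison described (not the software or algorithms actually used in the thesis), the sketch below scores two candidate classifiers for flagging insolvent clients with 5-fold cross-validation; a regression model for the number of days a payment is late could be compared the same way with a regression scorer. All names are illustrative.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def compare_classifiers(X: np.ndarray, y_insolvent: np.ndarray) -> dict[str, float]:
    """Return the mean cross-validated ROC AUC for each candidate model."""
    models = {
        "logistic_regression": LogisticRegression(max_iter=1000),
        "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    }
    return {
        name: cross_val_score(model, X, y_insolvent, cv=5, scoring="roc_auc").mean()
        for name, model in models.items()
    }
```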
|
154 |
Validation croisée et pénalisation pour l'estimation de densité / Cross-validation and penalization for density estimation. Magalhães, Nelo, 26 May 2015 (has links)
Cette thèse s'inscrit dans le cadre de l'estimation d'une densité, considéré du point de vue non-paramétrique et non-asymptotique. Elle traite du problème de la sélection d'une méthode d'estimation à noyau. Celui-ci est une généralisation, entre autres, du problème de la sélection de modèle et de la sélection d'une fenêtre. Nous étudions des procédures classiques, par pénalisation et par rééchantillonnage (en particulier la validation croisée V-fold), qui évaluent la qualité d'une méthode en estimant son risque. Nous proposons, grâce à des inégalités de concentration, une méthode pour calibrer la pénalité de façon optimale pour sélectionner un estimateur linéaire et prouvons des inégalités d'oracle et des propriétés d'adaptation pour ces procédures. De plus, une nouvelle procédure rééchantillonnée, reposant sur la comparaison entre estimateurs par des tests robustes, est proposée comme alternative aux procédures basées sur le principe d'estimation sans biais du risque. Un second objectif est la comparaison de toutes ces procédures du point de vue théorique et l'analyse du rôle du paramètre V pour les pénalités V-fold. Nous validons les résultats théoriques par des études de simulations. / This thesis takes place in the density estimation setting, from a nonparametric and nonasymptotic point of view. It concerns the statistical algorithm selection problem, which generalizes, among others, the problems of model selection and bandwidth selection. We study classical procedures, such as penalization and resampling procedures (in particular V-fold cross-validation), which evaluate an algorithm by estimating its risk. Thanks to concentration inequalities, we provide an optimal penalty for selecting a linear estimator, and we prove oracle inequalities and adaptivity properties for these procedures. Moreover, a new resampling procedure, based on comparing estimators by means of robust tests, is introduced as an alternative to procedures relying on the unbiased risk estimation principle. A second goal of this work is to compare these procedures from a theoretical point of view and to understand the role of V in V-fold penalization. We validate the theoretical results through simulation studies.
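The following sketch illustrates the basic V-fold selection problem studied here: choose a kernel bandwidth by held-out log-likelihood over V folds (scikit-learn's default score for KernelDensity). It is only the classical procedure that the thesis takes as a starting point, not the penalized or robust-test procedures it proposes, and the grid of candidate bandwidths is an arbitrary choice for the example.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

def select_bandwidth(x: np.ndarray, V: int = 5) -> float:
    """Pick a Gaussian-kernel bandwidth by V-fold cross-validated log-likelihood."""
    grid = GridSearchCV(
        KernelDensity(kernel="gaussian"),
        {"bandwidth": np.logspace(-2, 1, 30)},   # candidate bandwidths (assumed grid)
        cv=V,
    )
    grid.fit(x.reshape(-1, 1))
    return grid.best_params_["bandwidth"]
```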
|
155 |
Improving Efficiency of Prevention in Telemedicine / Zlepšování účinnosti prevence v telemedicíně. Nálevka, Petr, January 2010 (has links)
This thesis employs data-mining techniques and modern information and communication technology to develop methods that may improve the efficiency of prevention-oriented telemedical programs. In particular, it uses the ITAREPS program as a case study and demonstrates that an extension of the program based on the proposed methods may significantly improve the program's efficiency. ITAREPS itself is a state-of-the-art telemedical program operating since 2006. It has been deployed in 8 different countries around the world, and in the Czech Republic alone it has helped prevent schizophrenic relapse in over 400 participating patients. The outcomes of this thesis are widely applicable not just to schizophrenic patients but also to other psychotic or non-psychotic diseases that follow a relapsing course and satisfy certain preconditions defined in this thesis. Two main areas of improvement are proposed. First, this thesis studies various temporal data-mining methods to improve relapse prediction based on the history of diagnostic data. Second, the latest telecommunication technologies are used to improve the quality of the gathered diagnostic data directly at the source.
|
156 |
臺灣地區的人口推估研究 / The study of population projection: a case study in Taiwan area. 黃意萍, Unknown Date (has links)
台灣地區的人口隨著生育率及死亡率的雙重下降而呈現快速老化,其中生育率的降低影響尤為顯著。民國50年時,台灣平均每位婦女生育5.58個小孩,到了民國70年卻只生育1.67個小孩,去年(民國90年)生育率更創歷年新低,只有1.4。死亡率的下降可由平均壽命的延長看出,民國75年時男性為70.97歲,女性為75.88歲;到了民國90年,男性延長到72.75歲,女性延長到78.49歲。由於生育率的變化幅度高於死亡率,對人口結構的影響較大,因此本文分成兩個部份,主要在研究台灣地區15至49歲婦女生育率的變化趨勢,再將研究結果用於台灣地區未來人口總數及其結構的預測。
本研究第一部分是生育率的研究，引進Gamma函數、Gompertz函數、Lee-Carter法三種模型及單一年齡組個別估計法，以民國40年至84年(西元1951年至1995年)的資料為基礎，民國85年至89年(西元1996年至2000年)資料為檢測樣本，比較模型的優劣，尋求較適合台灣地區生育率的模型，再以最合適的模型預測民國91年至140年(西元2002年至2051年)的生育率。第二部分是人口推估，採用人口變動要素合成方法(Cohort Component Projection Method)推估台灣地區未來50年的人口總數及其結構，其中生育率採用上述最適合台灣地區的模型、死亡率則引進國外知名的Lee-Carter法及SOA法(Society of Actuaries)，探討人口結構，並與人力規劃處的結果比較之。 / Both the fertility rate and the mortality rate have experienced dramatic decreases in recent years. As a result, population aging has become one of the major concerns in the Taiwan area, and the proportion of the elderly (age 65 and over) has increased rapidly from 2.6% in 1965 to 8.8% in 2001. The decrease in the fertility rate is especially significant. For example, the total fertility rate was 5.58 in 1961 and then dropped dramatically to 1.67 in 1981 (1.4 in 2001), a reduction of almost 70% within 20 years.
The goal of this paper is to study population aging in the Taiwan area, in particular the fertility pattern. The first part explores fertility models and decides which is the most suitable based on age-specific fertility rates in Taiwan. The models considered are the Gamma function, the Gompertz function, the Lee-Carter method and individual age-group estimation. We use the data from 1951 to 1995 as pilot data and 1996 to 2000 as test data to judge which model fits best. The second part projects the Taiwan population for the next 50 years, i.e. 2002-2051. The projection method used is the cohort component projection method, assuming the population in the Taiwan area is closed. We also compare our projection results to those of the Council for Economic Planning and Development, Executive Yuan of the Republic of China.
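To make the cohort component idea concrete, here is a toy single-year projection step for a closed female population, assuming age-specific survival and fertility schedules are given. The sex ratio at birth and the neglect of infant mortality are simplifying assumptions for illustration and are not taken from the study.

```python
import numpy as np

def project_one_year(pop_f: np.ndarray, survival: np.ndarray,
                     fertility: np.ndarray, srb: float = 1.05) -> np.ndarray:
    """One cohort-component step for a closed female population.

    pop_f[a]     -- females aged a
    survival[a]  -- probability of surviving from age a to a + 1
    fertility[a] -- births per woman aged a
    srb          -- sex ratio at birth (male births per female birth)
    """
    nxt = np.zeros_like(pop_f, dtype=float)
    nxt[1:] = pop_f[:-1] * survival[:-1]       # survivors move up one age group
    births = float(np.sum(pop_f * fertility))  # total births from this year's mothers
    nxt[0] = births / (1.0 + srb)              # female newborns (infant mortality ignored)
    return nxt
```

Iterating this step 50 times, with fertility taken from the selected fertility model and survival from the mortality model, yields a projection of the kind compared above.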
|
157 |
以部分法修正地理加權迴歸 / A conditional modification to geographically weighted regression. 梁穎誼 (Leong, Yin Yee), Unknown Date (has links)
在二十世紀九十年代,學者提出地理加權迴歸(Geographically Weighted Regression;簡稱GWR)。GWR是一個企圖解決空間非穩定性的方法。此方法最大的特性,是模型中的迴歸係數可以依空間的不同而改變,這也意味著不同的地理位置可以有不同的迴歸係數。在係數的估計上,每個觀察值都擁有一個固定環寬,而估計值可以由環寬範圍內的觀察值取得。然而,若變數之間的特性不同,固定環寬的設定可能會產生不可靠的估計值。
為了解決這個問題,本文章提出CGWR(Conditional-based GWR)的方法嘗試修正估計值,允許各迴歸變數有不同的環寬。在估計的程序中,CGWR運用疊代法與交叉驗證法得出最終的估計值。本文驗證了CGWR的收斂性,也同時透過電腦模擬比較GWR, CGWR與local linear法(Wang and Mei, 2008)的表現。研究發現,當迴歸係數之間存有正相關時,CGWR比其他兩個方法來的優異。最後,本文使用CGWR分析台灣高齡老人失能資料,驗證CGWR的效果。 / Geographically weighted regression (GWR), first proposed in the 1990s, is a modelling technique used to deal with spatial non-stationarity. The main characteristic of GWR is that it allows regression coefficients to vary across space, and so the values of the parameters can vary depending on locations. The parameters for each location can be estimated by observations within a fixed range (or bandwidth). However, if the parameters differ considerably, the fixed bandwidth may produce unreliable or even unstable estimates.
To deal with the estimation of greatly varying parameter values, we propose Conditional-based GWR (CGWR), where a different bandwidth is selected for each independent variable. The bandwidths for the independent variables are derived via an iteration algorithm using cross-validation. In addition to showing the convergence of the algorithm, we also use computer simulation to compare the proposed method with the basic GWR and a local linear method (Wang and Mei, 2008). We found that the CGWR outperforms the other two methods if the parameters are positively correlated. In addition, we use elderly disability data from Taiwan to demonstrate the proposed method.
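For orientation, the sketch below shows the fixed-bandwidth core that both GWR and the proposed CGWR share: at a target location, observations are weighted by a spatial kernel and a weighted least-squares fit gives the local coefficients. CGWR would repeat this with a separate, cross-validated bandwidth per covariate; that extension, the Gaussian kernel, and all names here are assumptions for the illustration rather than details from the paper.

```python
import numpy as np

def local_coefficients(X: np.ndarray, y: np.ndarray, coords: np.ndarray,
                       target: np.ndarray, bandwidth: float) -> np.ndarray:
    """Weighted least-squares fit at one location (basic GWR, single bandwidth)."""
    dist = np.linalg.norm(coords - target, axis=1)
    w = np.exp(-0.5 * (dist / bandwidth) ** 2)    # Gaussian spatial weights
    Xw = X * w[:, None]                           # equivalent to W @ X with W diagonal
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)    # (X'WX)^{-1} X'Wy
```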
|
158 |
High angular resolution diffusion-weighted magnetic resonance imaging: adaptive smoothing and applications. Metwalli, Nader, 07 July 2010
Diffusion-weighted magnetic resonance imaging (MRI) has allowed unprecedented non-invasive mapping of brain neural connectivity in vivo by means of fiber tractography applications. Fiber tractography has emerged as a useful tool for mapping brain white matter connectivity prior to surgery or in an intraoperative setting. The advent of high angular resolution diffusion-weighted imaging (HARDI) techniques in MRI for fiber tractography has allowed mapping of fiber tracts in areas of complex white matter fiber crossings. Raw HARDI images, as a result of elevated diffusion-weighting, suffer from depressed signal-to-noise ratio (SNR) levels. The accuracy of fiber tractography is dependent on the performance of the various methods extracting dominant fiber orientations from the HARDI-measured noisy diffusivity profiles. These methods will be sensitive to and directly affected by the noise. In the first part of the thesis this issue is addressed by applying an objective and adaptive smoothing to the noisy HARDI data via generalized cross-validation (GCV) by means of the smoothing splines on the sphere method for estimating the smooth diffusivity profiles in three dimensional diffusion space. Subsequently, fiber orientation distribution functions (ODFs) that reveal dominant fiber orientations in fiber crossings are then reconstructed from the smoothed diffusivity profiles using the Funk-Radon transform. Previous ODF smoothing techniques have been subjective and non-adaptive to data SNR. The GCV-smoothed ODFs from our method are accurate and are smoothed without external intervention facilitating more precise fiber tractography.
Diffusion-weighted MRI studies in amyotrophic lateral sclerosis (ALS) have revealed significant changes in diffusion parameters in ALS patient brains. With the need for early detection of possibly discrete upper motor neuron (UMN) degeneration signs in patients with early ALS, a HARDI study is applied in order to investigate diffusion-sensitive changes reflected in the diffusion tensor imaging (DTI) measures axial and radial diffusivity as well as the more commonly used measures fractional anisotropy (FA) and mean diffusivity (MD). The hypothesis is that there would be added utility in considering axial and radial diffusivities which directly reflect changes in the diffusion tensors in addition to FA and MD to aid in revealing neurodegenerative changes in ALS. In addition, applying adaptive smoothing via GCV to the HARDI data further facilitates the application of fiber tractography by automatically eliminating spurious noisy peaks in reconstructed ODFs that would mislead fiber tracking.
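The GCV criterion used for the adaptive smoothing above can be written compactly for any linear smoother y_hat = S(lambda) y. The sketch below uses a generic basis-plus-roughness-penalty smoother as a stand-in; the spherical smoothing-spline basis of the thesis would supply the matrices B and P, and all names here are illustrative.

```python
import numpy as np

def gcv_score(y: np.ndarray, B: np.ndarray, P: np.ndarray, lam: float) -> float:
    """Generalized cross-validation score for a penalized linear smoother.

    y: noisy observations; B: basis matrix; P: roughness penalty; lam: smoothing level.
    GCV(lam) = n * RSS / (n - trace(S))^2, where S is the hat (smoother) matrix.
    """
    S = B @ np.linalg.solve(B.T @ B + lam * P, B.T)   # hat matrix of the smoother
    resid = y - S @ y
    n = len(y)
    return n * float(np.sum(resid ** 2)) / (n - float(np.trace(S))) ** 2

# Choose lam by minimizing gcv_score over a grid, then smooth the data with that lam.
```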
|
159 |
Three Essays on Application of Semiparametric Regression: Partially Linear Mixed Effects Model and Index Model / Drei Aufsätze über Anwendung der Semiparametrischen Regression: Teilweise Lineares Gemischtes Modell und Index Modell. Ohinata, Ren, 03 May 2012 (has links)
No description available.
|
160 |
Méthode non-paramétrique des noyaux associés mixtes et applications / Non-parametric method of mixed associated kernels and applications. Libengue Dobele-kpoka, Francial Giscard Baudin, 13 June 2013 (has links)
Nous présentons dans cette thèse l'approche non-paramétrique par noyaux associés mixtes, pour les densités à supports partiellement continus et discrets. Nous commençons par rappeler d'abord les notions essentielles d'estimation par noyaux continus (classiques) et noyaux associés discrets. Nous donnons la définition et les caractéristiques des estimateurs à noyaux continus (classiques) puis discrets. Nous rappelons aussi les différentes techniques de choix de paramètres de lissage et nous revisitons les problèmes de supports ainsi qu'une résolution des effets de bord dans le cas discret. Ensuite, nous détaillons la nouvelle méthode d'estimation de densités par les noyaux associés continus, lesquels englobent les noyaux continus (classiques). Nous définissons les noyaux associés continus et nous proposons la méthode mode-dispersion pour leur construction, puis nous illustrons ceci sur les noyaux associés non-classiques de la littérature, à savoir bêta et sa version étendue, gamma et son inverse, gaussien inverse et sa réciproque, le noyau de Pareto ainsi que le noyau lognormal. Nous examinons par la suite les propriétés des estimateurs qui en sont issus, plus précisément le biais, la variance et les erreurs quadratiques moyennes ponctuelles et intégrées. Puis, nous proposons un algorithme de réduction de biais que nous illustrons sur ces mêmes noyaux associés non-classiques. Des études par simulations sont faites sur trois types d'estimateurs à noyaux lognormaux. Par ailleurs, nous étudions les comportements asymptotiques des estimateurs de densité à noyaux associés continus. Nous montrons d'abord les consistances faibles et fortes ainsi que la normalité asymptotique ponctuelle. Ensuite nous présentons les résultats des consistances faibles et fortes globales en utilisant les normes uniformes et L1. Nous illustrons ceci sur trois types d'estimateurs à noyaux lognormaux. Par la suite, nous étudions les propriétés minimax des estimateurs à noyaux associés continus. Nous décrivons d'abord le modèle puis nous donnons les hypothèses techniques avec lesquelles nous travaillons. Nous présentons ensuite nos résultats minimax tout en les appliquant sur les noyaux associés non-classiques bêta, gamma et lognormal. Enfin, nous combinons les noyaux associés continus et discrets pour définir les noyaux associés mixtes. De là, les outils d'unification d'analyses discrètes et continues sont utilisés pour montrer les différentes propriétés des estimateurs à noyaux associés mixtes. Une application sur un modèle de mélange des lois normales et de Poisson tronquées est aussi donnée. Tout au long de ce travail, nous choisissons le paramètre de lissage uniquement avec la méthode de validation croisée par les moindres carrés.

/ We present in this thesis the non-parametric approach using mixed associated kernels for densities whose supports are partially continuous and discrete. We first recall the essential concepts of classical continuous and discrete kernel density estimators, giving their definitions and characteristics. We also recall the various techniques for choosing smoothing parameters, and we revisit the support problems as well as a resolution of the edge effects in the discrete case. Then we describe a new method of continuous associated kernels for estimating densities with bounded support, which includes the classical continuous kernel method. We define the continuous associated kernels and propose the mode-dispersion method for their construction. Moreover, we illustrate this on the non-classical associated kernels of the literature, namely beta and its extended version, gamma and its inverse, inverse Gaussian and its reciprocal, the Pareto kernel and the lognormal kernel. We subsequently examine the properties of the derived estimators, specifically the bias, the variance, and the pointwise and integrated mean squared errors. Then we propose a bias-reduction algorithm that we illustrate on these same non-classical associated kernels. Simulation studies are performed on three types of lognormal kernel estimators. We also study the asymptotic behavior of the continuous associated kernel density estimators: we first show pointwise weak and strong consistency as well as asymptotic normality, and then present global weak and strong consistency results using the uniform and L1 norms, again illustrated on three types of lognormal kernel estimators. Subsequently, we study the minimax properties of the continuous associated kernel estimators: we describe the model, state the technical assumptions under which we work, and present our minimax results, applied to the non-classical beta, gamma and lognormal associated kernels. Finally, we combine continuous and discrete associated kernels to define the mixed associated kernels. Using tools that unify discrete and continuous analysis, we establish the various properties of the mixed associated kernel estimators. An application to a mixture model of normal and truncated Poisson distributions is also given. Throughout this work, the smoothing parameter is chosen solely by the least-squares cross-validation method.
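As one concrete instance of a non-classical associated kernel mentioned above, the sketch below evaluates a gamma-kernel density estimate on the half line, with a Chen-type parameterization assumed for the illustration; in the thesis the smoothing parameter h would be chosen by least-squares cross-validation. The function name and parameterization details are assumptions, not the thesis's exact construction.

```python
import numpy as np
from scipy.stats import gamma

def gamma_kde(x_eval: np.ndarray, data: np.ndarray, h: float) -> np.ndarray:
    """Gamma-kernel density estimate on [0, inf) with smoothing parameter h."""
    dens = np.empty_like(x_eval, dtype=float)
    for j, x in enumerate(x_eval):
        # gamma kernel associated with evaluation point x: shape x/h + 1, scale h,
        # evaluated at the observations and averaged
        dens[j] = gamma.pdf(data, a=x / h + 1.0, scale=h).mean()
    return dens
```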
|