221 |
Bootstrap procedures for dynamic factor analysis. Zhang, Guangjian. 12 September 2006.
No description available.
|
222 |
The Effects of Agriculture on Canada's Major Watersheds. Ramunno, Daniel. 10 1900.
Water contamination is one of the major environmental issues that degrades the water quality of watersheds. It negatively affects drinking water and aquatic wildlife, which can in turn harm human health. Several institutions collected water samples from four of Canada's major watersheds and counted the bacteria in each sample. The data used in this paper were taken from one of these institutions and analysed to investigate whether agricultural waste affects the water quality of these four watersheds. It was found that agricultural waste produced by nearby farms significantly degrades the water quality of three of these watersheds. Principal component analysis was also performed on these data, showing that they can be expressed in terms of a single variable without losing much information. The bootstrap distributions of the principal component analysis parameters were estimated, and the sampling distributions of these parameters were found to be stable. There was also evidence that the variables in the data are not normally distributed and that not all of them are independent. / Master of Science (MSc)
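The bootstrap-of-PCA step mentioned in this abstract can be illustrated with a short sketch: a generic nonparametric bootstrap of the share of variance carried by the first principal component. The two-variable data set below is simulated stand-in data, not the thesis's watershed measurements:

```python
import numpy as np

def bootstrap_pc1_variance(X, n_boot=1000, seed=0):
    """Bootstrap the proportion of variance explained by the first
    principal component, resampling rows of X with replacement."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    stats = np.empty(n_boot)
    for b in range(n_boot):
        Xb = X[rng.integers(0, n, size=n)]
        # Eigenvalues of the sample covariance matrix, ascending order
        eigvals = np.linalg.eigvalsh(np.cov(Xb, rowvar=False))
        stats[b] = eigvals[-1] / eigvals.sum()  # largest eigenvalue's share
    return stats

# Illustrative data: two strongly correlated variables
rng = np.random.default_rng(1)
z = rng.normal(size=200)
X = np.column_stack([z + 0.1 * rng.normal(size=200),
                     z + 0.1 * rng.normal(size=200)])
dist = bootstrap_pc1_variance(X)
print(dist.mean(), dist.std())
```

A tight spread of `dist` is the kind of evidence the abstract describes when it calls the sampling distributions of the PCA parameters "stable".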
|
223 |
TIME SERIES BLOCK BOOTSTRAP APPLICATION AND EFFECT OF AGGREGATION AND SYSTEMATIC SAMPLING. Kim, Hang. January 2018.
In this dissertation, we review the basic properties of the bootstrap and its application to time series. We then apply the parametric bootstrap to three simulated normal i.i.d. samples and the nonparametric bootstrap to four real-life series of financial returns. Among time series bootstrap methods, we focus on the block bootstrap and investigate how to select a suitable block length for the AR(1) model. We propose a new blocking rule, the Combinatorially-Augmented Block Bootstrap (CABB), and compare it with the existing block bootstrap on simulated i.i.d. samples, AR(1) time series, and real-life examples. Both methods perform equally well in estimating AR(1) coefficients, but CABB produces a smaller standard deviation in our simulated and empirical studies. We also study two procedures for collecting time series: (i) aggregation of a flow variable and (ii) systematic sampling of a stock variable. For both procedures, we derive theorems that give exact equations for the $m$-aggregated and $m^{th}$ systematically sampled series of the original AR(1) model. We then evaluate block bootstrap estimation of the parameters of the ARMA(1,1) and AR(1) models using aggregated and systematically sampled series. Simulation and real-data analyses show that in some cases the block bootstrap estimate of the MA(1) parameter of the ARMA(1,1) model in aggregated series outperforms the estimate obtained without the bootstrap. In an extreme case of stock price movement, which is close to a random walk, the block bootstrap estimate using systematically sampled series is closer to the true parameter, defined as the parameter calculated by the theorem; specifically, the block bootstrap estimate of the AR(1) parameter using the systematically sampled series is closer to phi(n) than the estimate based on the MLE for the AR(1) model.
Future research problems include a theoretical investigation of CABB and the effectiveness of the block bootstrap in other time series analyses, such as nonlinear or VAR models. / Statistics
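The block bootstrap this work builds on can be sketched in a few lines. Below is the standard overlapping moving-block scheme applied to an AR(1) series, not the CABB rule proposed in the dissertation; the simulated series, coefficient, and block length are illustrative assumptions:

```python
import numpy as np

def ar1_coef(x):
    """Lag-1 least-squares estimate of the AR(1) coefficient."""
    x = x - x.mean()
    return np.dot(x[:-1], x[1:]) / np.dot(x[:-1], x[:-1])

def moving_block_bootstrap(x, block_len, n_boot=500, seed=0):
    """Resample overlapping blocks of length block_len with replacement,
    concatenate them into pseudo-series, and re-estimate the AR(1)
    coefficient on each pseudo-series."""
    rng = np.random.default_rng(seed)
    n = len(x)
    n_blocks = int(np.ceil(n / block_len))
    starts = np.arange(n - block_len + 1)  # all admissible block starts
    est = np.empty(n_boot)
    for b in range(n_boot):
        chosen = rng.choice(starts, size=n_blocks, replace=True)
        series = np.concatenate([x[s:s + block_len] for s in chosen])[:n]
        est[b] = ar1_coef(series)
    return est

# Simulated AR(1) series with phi = 0.6
rng = np.random.default_rng(2)
x = np.empty(1000)
x[0] = 0.0
for t in range(1, 1000):
    x[t] = 0.6 * x[t - 1] + rng.normal()
boot = moving_block_bootstrap(x, block_len=25)
print(boot.mean(), boot.std())
```

Joining independently chosen blocks breaks the dependence at block boundaries, which is why block length matters: too short a block biases the estimate toward zero, too long a block leaves too few distinct blocks to resample.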
|
224 |
Bandwidth Selection Concerns for Jump Point Discontinuity Preservation in the Regression Setting Using M-smoothers and the Extension to Hypothesis Testing. Burt, David Allan. 31 March 2000.
Most traditional parametric and nonparametric regression methods operate under the assumption that the true function is continuous over the design space. For methods such as ordinary least squares polynomial regression and local polynomial regression the functional estimates are constrained to be continuous. Fitting a function that is not continuous with a continuous estimate will have practical scientific implications as well as important model misspecification effects. Scientifically, breaks in the continuity of the underlying mean function may correspond to specific physical phenomena that will be hidden from the researcher by a continuous regression estimate. Statistically, misspecifying a mean function as continuous when it is not will result in an increased bias in the estimate.
One recently developed nonparametric regression technique that does not constrain the fit to be continuous is the jump-preserving M-smooth procedure of Chu, Glad, Godtliebsen & Marron (1998), 'Edge-preserving smoothers for image processing', Journal of the American Statistical Association 93(442), 526-541. Chu et al.'s (1998) M-smoother is defined in such a way that the noise about the mean function is smoothed out while jumps in the mean function are preserved. Before the jump-preserving M-smoother can be used in practice, the choice of the bandwidth parameters must be addressed. The jump-preserving M-smoother requires two bandwidth parameters, h and g, which determine the amount of noise that is smoothed out as well as the size of the jumps that are preserved. If these parameters are chosen haphazardly, the resulting fit can exhibit worse bias properties than traditional regression methods that assume a continuous mean function. Currently there are no automatic bandwidth selection procedures available for the jump-preserving M-smoother of Chu et al. (1998).
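The role of the two bandwidths can be sketched with a simplified iteration in the spirit of an edge-preserving M-smoother: a Gaussian kernel in x (bandwidth h) combined with a Gaussian weight on residuals (bandwidth g), so observations on the far side of a jump get negligible weight. This is an illustrative stand-in, not the exact estimator of Chu et al. (1998):

```python
import numpy as np

def m_smooth(x, y, h, g, n_iter=20):
    """Iteratively reweighted local mean: kernel weights in x (bandwidth h)
    times Gaussian weights on residuals (bandwidth g). Points whose y-value
    sits across a jump from the current fit receive near-zero weight,
    so the jump is preserved while the noise is smoothed out."""
    fit = y.astype(float).copy()
    for _ in range(n_iter):
        new = np.empty_like(fit)
        for i in range(len(x)):
            wx = np.exp(-0.5 * ((x - x[i]) / h) ** 2)
            wy = np.exp(-0.5 * ((y - fit[i]) / g) ** 2)
            w = wx * wy
            new[i] = np.sum(w * y) / np.sum(w)
        fit = new
    return fit

# Step function with noise: jump of size 1 at x = 0.5
rng = np.random.default_rng(3)
x = np.linspace(0, 1, 200)
y = (x > 0.5).astype(float) + 0.1 * rng.normal(size=200)
fit = m_smooth(x, y, h=0.05, g=0.3)
print(fit[:5].mean(), fit[-5:].mean())
```

With g much larger than the jump size, the residual weights no longer screen out the far side of the jump and the fit blurs the discontinuity, which is exactly the bandwidth-selection tension the dissertation addresses.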
One of the main objectives of this dissertation is to develop an automatic, data-driven bandwidth selection procedure for Chu et al.'s (1998) M-smoother. We present two bandwidth selection procedures: the first is a crude rule-of-thumb method, and the second is a more sophisticated direct plug-in method. Our bandwidth selection procedures are modeled after the methods of Chu et al. (1998), with two significant modifications that make the methods robust to possible jump points.
Another objective of this dissertation is to provide a nonparametric hypothesis test, based on Chu et al.'s (1998) M-smoother, to test for a break in the continuity of an underlying regression mean function. Our proposed hypothesis test is nonparametric in the sense that the mean function away from the jump point(s) is not required to follow a specific parametric model. In addition the test does not require the user to specify the number, position, or size of the jump points in the alternative hypothesis as do many current methods. Thus the null and alternative hypotheses for our test are: H0: The mean function is continuous (i.e. no jump points) vs. HA: The mean function is not continuous (i.e. there is at least one jump point).
Our testing procedure takes the form of a critical bandwidth hypothesis test. The test statistic is essentially the largest bandwidth that allows Chu et al.'s (1998) M-smoother to satisfy the null hypothesis, and the significance of the test is calculated via a bootstrap method. The test is currently in the experimental stage of its development. In this dissertation we outline the steps required to calculate it and assess its power in a small simulation study. Future work, such as a faster calculation algorithm, is required before the testing procedure becomes practical for the general user. / Ph. D.
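The bootstrap significance calculation can be illustrated with a much-simplified analogue: a crude jump statistic (the largest gap between one-sided local means) whose null distribution is approximated by resampling residuals from a null fit. Everything here is an illustrative assumption; the dissertation's actual statistic is the critical bandwidth of the M-smoother, and its null fit is not a constant mean:

```python
import numpy as np

def jump_stat(x, y, h):
    """Largest gap between one-sided local means over candidate jump
    locations: large values suggest a discontinuity."""
    gaps = []
    for c in x[5:-5]:
        left = y[(x < c) & (x > c - h)]
        right = y[(x >= c) & (x < c + h)]
        if len(left) > 2 and len(right) > 2:
            gaps.append(abs(right.mean() - left.mean()))
    return max(gaps)

def bootstrap_jump_test(x, y, h=0.1, n_boot=100, seed=0):
    """Bootstrap p-value: compare the observed statistic with statistics
    computed on residuals resampled under a (crude) continuous null fit."""
    rng = np.random.default_rng(seed)
    t_obs = jump_stat(x, y, h)
    resid = y - y.mean()  # null model: constant mean (simplification)
    t_null = np.array([jump_stat(x, y.mean() + rng.choice(resid, size=len(y)), h)
                       for _ in range(n_boot)])
    return (1 + np.sum(t_null >= t_obs)) / (n_boot + 1)

rng = np.random.default_rng(4)
x = np.linspace(0, 1, 150)
y_jump = (x > 0.5) * 1.0 + 0.1 * rng.normal(size=150)  # H0 false
y_flat = 0.1 * rng.normal(size=150)                     # H0 true
p_jump = bootstrap_jump_test(x, y_jump)
p_flat = bootstrap_jump_test(x, y_flat)
print(p_jump, p_flat)
```

The small p-value for the jump series and the larger one for the flat series mirror the null and alternative hypotheses stated above.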
|
225 |
Rates and dates: Evaluating rhythmicity and cyclicity in sedimentary and biomineral records. Dexter, Troy Anthony. 05 June 2011.
It is important to evaluate periodic fluctuations in environment or climate recorded through time, both to better understand Earth's history and to develop ideas about what the future may hold. Numerous proxies can demonstrate and analyze these environmental patterns across time scales, from sequence-stratigraphic bundles of transgressive-regressive cycles that record eustatic changes in global sea level, to the geochemical composition of a skeleton that records fluctuations in ocean temperature through the life of the biomineralizing organism. This study examines some of the methods by which environmental fluctuations recorded at different time scales can be analyzed. The first project examines how extrabasinal orbital forcing (i.e., Milankovitch cycles) can be tested in the rock record. To distinguish these patterns, computer-generated carbonate rock records were simulated and the resulting outcrops tested using common methods. The simulations were built on eustatic sea-level fluctuations with periods similar to those demonstrated in the rock record, while incorporating the many factors that affect the resulting rock composition, such as tectonics, subsidence, and erosion. The results demonstrated that substantially large sea-level fluctuations, such as those that occur when the planet is in an icehouse state, are necessary to produce recognizable and preservable patterns; otherwise the signal is overwhelmed by other depositional factors. The second project examines the temporal distribution of the bivalve Semele casali from Ubatuba Bay, Brazil, using amino acid racemization (AAR) calibrated with ¹⁴C radiometric dates. This data set is one of the largest ever compiled and demonstrates that surficial shell assemblages in the area have very long residence times, extending back 10,000 years.
The area has experienced very little change in sea level, so the AAR ratios, which are highly temperature dependent, could be calibrated across sites ranging from 10 to 53 meters in water depth. Long time scales of dated shells provide an opportunity to study climate fluctuations such as the El Niño Southern Oscillation. The third project describes a newly developed method for estimating growth rates in organisms, using closely related species from similar environments and assessing error with a jackknife-corrected parametric bootstrap. As geochemical analyses become more precise while using less material, data can be collected through the skeleton of a biomineralizing organism, revealing information about environmental shifts at sub-annual scales. In such studies, the organism's growth rate strongly affects the interpretation of results, yet growth rates are difficult to ascertain, particularly in fossil specimens. The method removes the need for direct measures of growth rate, and even the most conservative estimates of growth rate are useful in constraining the age ranges of intra-skeletal geochemical studies, thus elucidating the likely time period under analysis. Overall, this study assesses how periodic environmental fluctuations at greatly varying time scales can be used to evaluate our understanding of Earth processes using rigorous quantitative strategies. / Ph. D.
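The jackknife correction used in the third project can be sketched generically. The code below shows the textbook leave-one-out jackknife bias correction, not the authors' full growth-rate procedure; the growth-rate numbers are invented for illustration. Applied to the plug-in (biased) variance, the jackknife correction recovers the unbiased sample variance exactly, which makes it a convenient check:

```python
import numpy as np

def jackknife_correct(data, estimator):
    """Leave-one-out jackknife: estimate the bias of an estimator and
    return the bias-corrected estimate with its jackknife standard error."""
    n = len(data)
    theta = estimator(data)
    loo = np.array([estimator(np.delete(data, i)) for i in range(n)])
    bias = (n - 1) * (loo.mean() - theta)
    se = np.sqrt((n - 1) / n * np.sum((loo - loo.mean()) ** 2))
    return theta - bias, se

# Hypothetical growth-rate measurements (mm/year) from related species
rates = np.array([2.1, 2.4, 1.9, 2.6, 2.2, 2.0, 2.3])
# np.var is the plug-in variance, biased by a factor (n-1)/n; the
# jackknife correction recovers the unbiased sample variance exactly.
corrected, se = jackknife_correct(rates, np.var)
print(corrected, se)
```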
|
226 |
Confidence Interval Estimation for Distribution Systems Power Consumption by Using the Bootstrap Method. Cugnet, Pierre. 17 July 1997.
The objective of this thesis is to estimate, for a distribution network, confidence intervals containing the values of nodal hourly power consumption and nodal maximum power consumption per customer where they are not measured. The values of nodal hourly power consumption are needed in both operational and planning stages to carry out load flow studies. The values of nodal maximum power consumption per customer are used to solve planning problems such as transformer sizing. Confidence interval estimation was preferred to point estimation because it takes into consideration the large variability of the consumption values. A computationally intensive statistical technique, namely the bootstrap method, is utilized to estimate these intervals. It allows us to replace idealized model assumptions about the load distributions with model-free analyses.
Two studies have been carried out. The first uses the original nonparametric bootstrap method to calculate a 95% confidence interval for nodal hourly power consumption, estimated for a given node and a given hour of the year. The second makes use of the parametric bootstrap method to infer a 95% confidence interval for nodal maximum power consumption per customer, estimated for a given node and a given month. Simulation results obtained on a real data set are presented and discussed. / Master of Science
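The nonparametric bootstrap interval of the first study can be sketched with the standard percentile method; the consumption readings below are simulated stand-ins, not the thesis's metered data:

```python
import numpy as np

def percentile_ci(sample, stat=np.mean, n_boot=2000, alpha=0.05, seed=0):
    """Nonparametric bootstrap: resample with replacement, recompute the
    statistic, and take empirical percentiles as the confidence interval."""
    rng = np.random.default_rng(seed)
    n = len(sample)
    boots = np.array([stat(sample[rng.integers(0, n, size=n)])
                      for _ in range(n_boot)])
    return np.percentile(boots, [100 * alpha / 2, 100 * (1 - alpha / 2)])

# Hypothetical hourly consumption readings (kW) at one node and one hour;
# a lognormal shape stands in for the skewed load distribution
rng = np.random.default_rng(5)
load = rng.lognormal(mean=1.0, sigma=0.5, size=120)
lo, hi = percentile_ci(load)
print(lo, hi)
```

Because the percentiles are taken from the resampled statistics themselves, no distributional form has to be assumed for the loads, which is the "model-free" property the abstract emphasizes.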
|
227 |
Imputação AMMI Bootstrap Não-paramétrico em dados multiambientais / Nonparametric bootstrap AMMI imputation in multi-environment data. Silva, Maria Joseane Cruz da. 20 January 2017.
In multi-environment trials, recommending the highest-yielding genotypes and identifying stable genotypes are of central importance to plant breeders. When a genotype is missing in one or more environments, however, this process becomes difficult, because it relies on statistical methods that require a complete data matrix. Since 1976, mathematicians and statisticians have studied ways of handling missing values in multi-environment data, seeking methods that estimate the missing units precisely and without loss of information. This study therefore proposes a new imputation method based on the AMMI methodology, using nonparametric bootstrap resampling of the matrix of genotype-by-environment (G × E) interaction means: the nonparametric bootstrap AMMI imputation model (IAMMI-BNP). The simulation studies used the S. of Ravenshoe - Mt Pandanus - QLD (14,420) provenance data set of Eucalyptus grandis, collected in Australia in 1983. To obtain precise estimates of the missing values, two simulation studies were considered. The first used 2,000 row-wise resamplings of the G × E interaction matrix under two percentages of missing data (10% and 20%). The second used 200 resamplings of the matrix with 10% missing data and three IAMMI-BNP models: IAMMI0-BNP, which includes only the main effects of the AMMI model, and IAMMI1-BNP and IAMMI2-BNP, which add one and two multiplicative axes of the AMMI model, respectively. Overall, according to the comparison methods, the proposed imputation method yielded imputed values close to the original ones in both simulation studies. In the studies with 10% missing data, the method performed best with the IAMMI2-BNP model (two multiplicative axes). The Wilcoxon signed-rank test showed that the imputed values did not affect the mean estimates, indicating that the mean of the imputed data in each environment was statistically similar to the original mean.
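The AMMI-based imputation idea can be sketched with a simple EM-style SVD scheme: fill the missing cells, fit additive main effects plus a few multiplicative (SVD) interaction axes, replace the missing cells with fitted values, and iterate. This is a generic EM-AMMI sketch, not the IAMMI-BNP procedure itself (which adds bootstrap resampling), and the G × E matrix below is simulated:

```python
import numpy as np

def ammi_fit(Y, k):
    """Additive main effects plus k multiplicative (SVD) interaction axes."""
    mu = Y.mean()
    g = Y.mean(axis=1) - mu            # genotype main effects
    e = Y.mean(axis=0) - mu            # environment main effects
    resid = Y - mu - g[:, None] - e[None, :]
    U, s, Vt = np.linalg.svd(resid, full_matrices=False)
    inter = (U[:, :k] * s[:k]) @ Vt[:k]
    return mu + g[:, None] + e[None, :] + inter

def em_ammi_impute(Y, mask, k=2, n_iter=50):
    """EM-style imputation: fill missing cells, refit AMMI, update, repeat.
    mask is True where a cell is treated as missing."""
    Z = Y.copy()
    Z[mask] = Y[~mask].mean()          # crude initial fill
    for _ in range(n_iter):
        fit = ammi_fit(Z, k)
        Z[mask] = fit[mask]
    return Z

# Hypothetical 10 x 6 genotype-by-environment means with a rank-1 interaction
rng = np.random.default_rng(6)
g = rng.normal(size=10); e = rng.normal(size=6)
u = rng.normal(size=10); v = rng.normal(size=6)
Y = 5.0 + g[:, None] + e[None, :] + np.outer(u, v) + 0.05 * rng.normal(size=(10, 6))
mask = np.zeros_like(Y, dtype=bool); mask[0, 0] = True  # pretend this cell is lost
Z = em_ammi_impute(Y, mask, k=1)
print(Y[0, 0], Z[0, 0])  # the imputed value should be close to the true cell
```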
|
228 |
On testing genetic covariance via the mean cross-products ratio / Teste da covariância genética via razão de produtos cruzados médios. Silva, Anderson Rodrigo da. 17 July 2015.
When a genetic factor is studied for more than one response variable, estimates of the genetic covariances are essential, especially in breeding programs. In a genetic covariance analysis, genetic and residual mean cross-products are obtained. Stochastically, quantifying the magnitude of the joint variation of two response variables due to the genetic effect relative to the variation due to the residual effect may allow inferences about the significance of the associated genetic covariance. This study presents tests of significance for genetic covariance along two lines: tests that take into account both the genetic and environmental effects, and tests that consider only the genetic information. The first line comprises tests based on the mean cross-products ratio, via nonparametric bootstrap resampling and Monte Carlo simulation of Wishart matrices. The second line comprises tests based on adaptations of Wilks' and Pillai's statistics for evaluating the independence of two sets of variables. For the first type of test, empirical distributions under the null hypothesis, i.e., null genetic covariance, were built and analyzed graphically. In addition, the exact distribution of the mean cross-products ratio obtained from normally distributed variables with zero mean and finite variance was examined. Writing computational algorithms in the R language to perform the proposed tests was also an objective of this study. Only under certain conditions does the probability density function of the product of two Gaussian random variables approximate a normal curve; therefore, studying the distribution of a mean cross-products ratio as a quotient of two Gaussian variables is not suitable. Tests based on the mean cross-products ratio are related both to the value of the genetic covariance and to its magnitude relative to the residual covariance. Both approaches (bootstrap and simulation) are more sensitive than the tests based only on genetic information. The performance of the tests based on the mean cross-products ratio is related to the quality of the original data set in terms of the MANOVA assumptions, and the test statistic does not depend on estimating the matrix of genetic covariances ΣG. The adaptations of Wilks' and Pillai's statistics can be used to test the genetic covariance; their approximation to a χ²₁ distribution was checked, and the accuracy of their inferences is related to the quality of G.
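The Monte Carlo Wishart approach to building a null distribution can be sketched as follows. This is a generic version under assumed variances and degrees of freedom, not the exact test statistic developed in the thesis: under H0 (zero genetic covariance), mean cross-product matrices are drawn as Wishart matrices with a diagonal scale, and the observed cross-product is compared with the simulated off-diagonal elements:

```python
import numpy as np
from scipy.stats import wishart

def mcp_null_pvalue(observed_mcp, var1, var2, df, n_sim=10000, seed=0):
    """Monte Carlo p-value for a genetic mean cross-product between two
    traits under H0: zero genetic covariance. Null draws are off-diagonal
    elements of Wishart(df, diag(var1, var2)) matrices divided by df."""
    scale = np.diag([var1, var2])
    W = wishart.rvs(df=df, scale=scale, size=n_sim, random_state=seed)
    null = W[:, 0, 1] / df
    return np.mean(np.abs(null) >= abs(observed_mcp))

# Hypothetical traits with genetic variances 4.0 and 9.0 and df = 20
p_small = mcp_null_pvalue(0.1, 4.0, 9.0, df=20)  # weak cross-product
p_large = mcp_null_pvalue(5.0, 4.0, 9.0, df=20)  # strong cross-product
print(p_small, p_large)
```

The weak cross-product is consistent with the null (large p), while the strong one is far out in the tail of the simulated Wishart off-diagonals (small p).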
|
230 |
Portfolio s maximálním výnosem / Maximum Return Portfolio. Palko, Maximilián. January 2019.
The classical method of portfolio selection is based on minimizing the variability of the portfolio. The Law of Large Numbers tells us that, over a long enough investment horizon, it should suffice to invest in the asset with the highest expected return, which will eventually outperform any other portfolio. In this thesis we suggest portfolio creation methods that construct Maximum Return Portfolios. These methods are based on finding the asset with the maximal expected return; this way we avoid the problem of estimation errors in expected returns. Two of the methods are selected based on the results of a simulation analysis, then tested on real stock data and compared with the S&P 500 index. The results suggest that our portfolios could have real-world applications, mainly because they proved significantly better than the index over a 10-year investment horizon.
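The selection rule behind a Maximum Return Portfolio can be sketched in a few lines; the return series, drifts, and horizon below are simulated assumptions for illustration, not the thesis's data or its full methodology:

```python
import numpy as np

def max_return_asset(returns):
    """Index of the asset with the highest historical mean return."""
    return int(np.argmax(returns.mean(axis=0)))

# Simulated monthly returns for 4 assets over 10 years; asset 2 is given
# the highest drift by a wide margin (all numbers are illustrative)
rng = np.random.default_rng(7)
drifts = np.array([0.001, 0.002, 0.012, 0.003])
returns = drifts + 0.02 * rng.normal(size=(120, 4))
best = max_return_asset(returns)
growth_best = np.prod(1 + returns[:, best])    # buy-and-hold growth factor
growth_ew = np.prod(1 + returns.mean(axis=1))  # equal-weight benchmark
print(best, growth_best, growth_ew)
```

Over a long horizon the compounding of the highest-drift asset dominates the diversified benchmark, which is the Law-of-Large-Numbers intuition the abstract appeals to; the thesis's contribution lies in how the maximal-return asset is identified despite estimation error.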
|