121 |
Genetic analysis of body weight at different ages in the Grootfontein Merino Stud
Nemutandani, Khetho Ratshilumela, January 2017
Body weight is considered an important trait for the selection of replacement animals in both wool
and mutton sheep. Knowledge of the genetic variance of each trait and covariances among traits is
essential for effective genetic evaluation and improvement programs. Breeding values for
performance traits should be estimated as accurately as possible. This could be
achieved by fitting the most appropriate statistical model, which accounts for all known non-genetic
effects, as well as correctly partitioning the genetic variance into its various sources. The aim of this
study was to identify the most appropriate models for estimation of breeding values for body weights
recorded at different ages in Merino sheep. Various statistical procedures, including uni- and
multivariate linear models employing restricted maximum likelihood methods, random regression and
repeatability models were evaluated. The dataset used in this study comprises body weight data
recorded at different ages in the Grootfontein Merino stud from 1968 to 2012. Birth weight was
recorded for 7794 males and 8317 females. The
univariate direct heritability of body weight tended to increase with age. Direct heritability
estimates were 0.20 ± 0.03 for birth weight, 0.16 ± 0.02 for weaning weight, 0.51 ± 0.04 for 15-month
body weight and 0.40 ± 0.05 for 3-year adult body weight. Maternal heritability estimates were 0.11 ±
0.02 for birth weight, 0.04 ± 0.01 for weaning weight and 0.08 ± 0.02 for 15-month body weight. The
genetic correlation between direct and maternal effects was negative for all weights where it was
included and ranged from -0.95 ± 0.14 for 6-month body weight to -0.28 ± 0.09 for birth weight. The
repeatability model including direct and maternal genetic effects, without splines, was the most
appropriate repeatability model for estimation of genetic parameters for body weight. The accuracy of
the estimated breeding values was determined using Spearman rank correlations and the number and
proportion of common animals in the Top 10% and Top 1% lists. The comparison of estimated
breeding values for body weights obtained with univariate, multivariate and repeatability models
revealed that the multivariate model was the most efficient method due to the high accuracies
obtained with this procedure. These results will be implemented when estimating breeding values for
body weights for the animals in the Merino reference population during the development phase of a
suitable SNP key to be used in genomic selection for body weight in South African Merino sheep. / Dissertation (MSc (Agric))--University of Pretoria, 2017. / Animal and Wildlife Sciences / MSc (Agric) / Unrestricted
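The two comparison measures named in the abstract, Spearman rank correlation and the overlap of Top 10% / Top 1% lists, can be sketched as follows. This is a minimal illustration, not the dissertation's code: the function names and the dictionary layout (animal ID mapped to estimated breeding value) are assumptions made for the example.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def top_overlap(ebv_a, ebv_b, fraction):
    """Number and proportion of animals shared by the two top lists.

    ebv_a and ebv_b map animal ID -> estimated breeding value
    (hypothetical layout chosen for this sketch).
    """
    k = max(1, round(len(ebv_a) * fraction))
    top_a = set(sorted(ebv_a, key=ebv_a.get, reverse=True)[:k])
    top_b = set(sorted(ebv_b, key=ebv_b.get, reverse=True)[:k])
    common = len(top_a & top_b)
    return common, common / k
```

Calling `top_overlap(ebv_model_1, ebv_model_2, 0.10)` would give the count and share of animals that both models place in their Top 10% lists.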
|
122 |
Robust A-optimal Subsampling for Massive Data Robust Linear Regression
Ziting Tang (8081000), 05 December 2019
This thesis is concerned with massive data analysis via robust A-optimally efficient non-uniform subsampling. Motivated by the fact that massive data often contain outliers and that uniform sampling is not efficient, we give numerous sampling distributions by minimizing the sum of the component variances of the subsampling estimate. These sampling distributions are robust against outliers. Massive data pose two computational bottlenecks: the data exceed a computer’s storage space, and computation takes too long. Both bottlenecks can be addressed simultaneously by selecting a subsample as a surrogate for the full sample and completing the data analysis on it. We develop our theory in a typical setting for robust linear regression in which the estimating functions are not differentiable. For an arbitrary sampling distribution, we establish consistency of the subsampling estimate for both fixed and growing dimension (as high dimensionality is common in massive data). We prove asymptotic normality for fixed dimension. We discuss the A-optimal scoring method for fast computing. We conduct large simulations to evaluate the numerical performance of our proposed A-optimal sampling distribution. Real data applications are also performed.
|
123 |
Time-varying linear prediction as a base for an isolated-word recognition algorithm
McMillan, David Evans, 17 November 2012
There is a vast amount of research being done in the area of voice recognition. A large portion of this research concentrates on developing algorithms that will yield higher accuracy rates, such as algorithms based on dynamic time warping, vector quantization, and other mathematical methods [12][21][15].
This research investigates the feasibility of using linear prediction (LP) with time-varying parameters as a base for a voice recognition algorithm. First the development of an anti-aliasing filter is discussed with some results from the filter hardware realization included. Then a brief discussion of LP is presented and a method for time-varying LP is derived from this discussion. A comparison between time-varying and segmentation LP is made and a description of the developed algorithm that tests time-varying LP as a recognition technique is given. The evaluation is conducted with the developed algorithm configured for speaker-dependent and speaker-independent isolated-word recognition.
The conclusion drawn from this research is that this particular technique of voice recognition is very feasible as a base for a voice recognition algorithm. With the incorporation of other techniques, a complete algorithm can conceivably be developed that will yield very high accuracy rates. Recommendations for algorithm improvements are given along with other techniques that might be added to make a complete recognition algorithm. / Master of Science
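For readers unfamiliar with LP, the sketch below shows conventional linear prediction with fixed coefficients on a single frame, using the autocorrelation method and the Levinson-Durbin recursion. It is not the time-varying formulation derived in the thesis; the function names are chosen for this illustration only.

```python
def autocorrelation(signal, max_lag):
    """Autocorrelation sums r[0..max_lag] of one analysis frame."""
    n = len(signal)
    return [sum(signal[t] * signal[t + lag] for t in range(n - lag))
            for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations; returns (coefficients, error).

    The predictor is x_hat[t] = sum_k a[k] * x[t - 1 - k].
    """
    a = []
    error = r[0]
    for m in range(order):
        # Reflection coefficient for stage m + 1.
        acc = r[m + 1] - sum(a[k] * r[m - k] for k in range(m))
        k_m = acc / error
        # Update the earlier coefficients, then append the new one.
        a = [a[k] - k_m * a[m - 1 - k] for k in range(m)] + [k_m]
        error *= 1.0 - k_m * k_m
    return a, error
```

A time-varying scheme replaces each constant coefficient with a function of time, so the recursion above would serve only as the fixed-coefficient baseline.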
|
124 |
Ill-conditioned information matrices and the generalized linear model: an asymptotically biased estimation approach
Marx, Brian D., January 1988
In the regression framework of the generalized linear model (Nelder and Wedderburn (1972)), iterative maximum likelihood parameter estimation is employed via the method of scoring. This iterative procedure involves a key matrix, the information matrix. Ill-conditioning of the information matrix can be responsible for making many desirable properties of the parameter estimates unattainable. Some asymptotically biased alternatives to maximum likelihood estimation are put forth which alleviate the detrimental effects of near-singular information. Notions of ridge estimation (Hoerl and Kennard (1970a) and Schaefer (1979)), principal component estimation (Webster et al. (1974) and Schaefer (1986)), and Stein estimation (Stein (1960)) are extended into a regression setting utilizing any one of an entire class of response distributions. / Ph. D.
|
125 |
Sequential robust response surface strategy
DeFeo, Patrick A., January 1988
General Response Surface Methodology involves the exploration of some response variable which is a function of other controllable variables. Many criteria exist for selecting an experimental design for the controllable variables. A good choice of a design is one that may not be optimal in a single sense, but rather near optimal with respect to several criteria. This robust approach lends itself well to strategies that involve sequential or two-stage experimental designs.
An experimenter who fits a first order regression model for the response often fears the presence of curvature in the system. Experimental designs can be chosen such that the experimenter who fits a first order model will have a high degree of protection against potential model bias from the presence of curvature. In addition, designs can also be selected such that the experimenter will have a high chance of detecting curvature in the system. A lack of fit test is usually performed for detection of curvature in the system. Ideally, an experimenter desires good detection capabilities along with good protection capabilities.
An experimental design criterion that incorporates both detection and protection capabilities is the A₂* criterion. This criterion is used to select the designs which maximize the average noncentrality parameter of the lack of fit test among designs with a fixed bias. The first order rotated design class is a new class of designs that offers an improvement in terms of the A₂* criterion over standard first order factorial designs. In conjunction with a sequential experimental strategy, a class of second order rotated designs is easily constructed by augmenting the first order rotated designs. These designs allow for estimation of second order model terms when a significant lack of fit is observed.
Two other closely related design criteria that incorporate both detection and protection capabilities are the J_PCA and J_PCMAX criteria. J_PCA considers the average mean squared error of prediction for a first order model over a region where the detection capabilities of the lack of fit test are not strong. J_PCMAX considers the maximum mean squared error of prediction over the region where the detection capabilities are not strong. The J_PCA and J_PCMAX criteria are used within a sequential strategy to select first order experimental designs that perform well in terms of the mean squared error of prediction when it is likely that a first order model will be employed. These two criteria are also adopted for nonsequential experiments for the evaluation of first order model prediction performance. For these nonsequential experiments, second order designs are used and constructed based upon J_PCA and J_PCMAX for first order model properties and D₂-efficiency and D-efficiency for second order model properties. / Ph. D.
|
126 |
Unbiased Estimation for the Contextual Effect of Duration of Adolescent Height Growth on Adulthood Obesity and Health Outcomes via Hierarchical Linear and Nonlinear Models
Carrico, Robert, 22 May 2012
This dissertation has multiple aims in studying hierarchical linear models in biomedical data analysis. In Chapter 1, the novel idea of studying the durations of adolescent growth spurts as a predictor of adulthood obesity is defined, established, and illustrated. The concept of contextual effects modeling is introduced in this first section as we study the secular trend of adulthood obesity and how this trend is mitigated by the durations of individual adolescent growth spurts and the secular average length of adolescent growth spurts. It is found that individuals with longer periods of fast height growth in adolescence are more prone to having favorable BMI profiles in adulthood. In Chapter 2 we study the estimation of contextual effects in a hierarchical generalized linear model (HGLM). We simulate data and study the effects of using the higher level group sample mean as the estimate for the true mean versus using an Empirical Bayes (EB) approach (Shin and Raudenbush 2010). We study this comparison for logistic, probit, log-linear, ordinal and nominal regression models. We find that in general the EB estimate yields a parameter estimate much closer to the true value, except for cases with very small variability in the upper level, where the situation is more complicated and there is likely no need for contextual effects analysis. In Chapter 3 the HGLM studies are made clearer with large-scale simulations. These large-scale simulations are shown for logistic regression and probit regression models for binary outcome data. With repetition we are able to establish coverage percentages of the confidence intervals of the true contextual effect. Coverage percentages show the percentage of simulations that have confidence intervals containing the true parameter values. Results confirm observations from the preliminary simulations in the previous section of this paper, and an accompanying example of adulthood hypertension shows how these results can be used in an application.
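The coverage percentage described above can be illustrated with a much simpler stand-in than an HGLM: the sketch below simulates a normal mean with known standard deviation and counts how often a z-interval contains the true value. The model, function name, and parameters are assumptions made for this illustration and are not taken from the dissertation.

```python
import random
from statistics import NormalDist


def coverage(true_mean, sigma, n, n_sims, level=0.95, seed=1):
    """Share of simulated confidence intervals that contain true_mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z * sigma / n ** 0.5          # known-sigma z-interval
    hits = 0
    for _ in range(n_sims):
        sample_mean = sum(rng.gauss(true_mean, sigma) for _ in range(n)) / n
        # The interval sample_mean +/- half_width covers true_mean
        # exactly when the estimate lies within half_width of it.
        if abs(sample_mean - true_mean) <= half_width:
            hits += 1
    return hits / n_sims
```

With a well-calibrated interval the returned share should sit close to the nominal level; a share well below it signals a biased estimate or an understated standard error.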
|
127 |
Modelos não lineares e lineares generalizados para avaliação da germinação de sementes de milho e soja / Non-linear and linear generalized models for evaluation of the germination of corn and soybean seeds
Amorim, Deoclecio Jardim, 24 January 2019
Among the most studied characteristics in the seed industry and germplasm banks, physiological potential stands out, since seeds of higher physiological quality allow quick and uniform seedling emergence and, consequently, stand establishment. The objective of this research was to evaluate the germination of corn (Zea mays L.) and soybean (Glycine max (L.) Merrill) seeds using nonlinear and generalized linear models. The corn cultivars used were AS 1633 PRO3, 2B587 RR, 2A401PW, AL Bandeirante and BRS 4103, and the soybean cultivars were DS59716 IPRO, CD2737 RR, CD251 RR, CD2820 IPRO and CD2857 RR, all from the 2016/17 crop. The germination of 20 seeds with four replicates per cultivar was evaluated by the primary root emission (protrusion) test. Germinated seeds were counted at regular intervals of 6, 12 and 24 hours up to 204 hours, with protrusion of the primary root ≥ 2 mm as the germination criterion. The data were arranged as cumulative percentage over time and as the proportion of viable seeds in each time interval tested, given by a sequence of Bernoulli trials. The cumulative percentage data were modeled by the nonlinear Gompertz curve and the four-parameter Hill function, and the proportion data were evaluated by generalized linear models testing the link functions probit, logit and complementary log-log. The corn cultivars with the highest germination speed were AL Bandeirante and BRS 4103. For soybean, the best results were observed for the cultivars CD251 RR and CD2737 RR.
The methodologies agreed on the classification of the physiological quality of the cultivars. The Gompertz curve had the better fit and allowed practical applications for the study of germination, establishing a new parameter for comparing different seed lots. Generalized linear models constitute a robust methodology for evaluating the germination of seeds of different lots and agricultural species, allowing estimation of any germination time and of uniformity. / Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
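As a rough illustration of the model components named in the abstract, the sketch below writes out one common parametrization of the Gompertz curve and the inverses of the three link functions. The parametrization, parameter names, and helper names are assumptions for this example; the dissertation may use a different form.

```python
from math import exp, log
from statistics import NormalDist


def gompertz(t, upper, b, c):
    """Cumulative germination (%) at time t: upper * exp(-b * exp(-c * t))."""
    return upper * exp(-b * exp(-c * t))


def gompertz_inflection_time(b, c):
    """Time of fastest germination, where the curve reaches upper / e."""
    return log(b) / c


# Inverse link functions: map a linear predictor eta to a probability.
def inv_logit(eta):
    return 1.0 / (1.0 + exp(-eta))


def inv_probit(eta):
    return NormalDist().cdf(eta)


def inv_cloglog(eta):
    return 1.0 - exp(-exp(eta))
```

The logit and probit inverses are symmetric around a probability of 0.5, while the complementary log-log inverse is asymmetric, which is why the three links can rank differently when fitted to the same proportion data.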
|
128 |
Discrepancy-based algorithms for best-subset model selection
Zhang, Tao, 01 May 2013
The selection of a best-subset regression model from a candidate family is a common problem that arises in many analyses. In best-subset model selection, we consider all possible subsets of regressor variables; thus, numerous candidate models may need to be fit and compared. One of the main challenges of best-subset selection arises from the size of the candidate model family: specifically, the probability of selecting an inappropriate model generally increases as the size of the family increases. For this reason, it is usually difficult to select an optimal model when best-subset selection is attempted based on a moderate to large number of regressor variables.
Model selection criteria are often constructed to estimate discrepancy measures used to assess the disparity between each fitted candidate model and the generating model. The Akaike information criterion (AIC) and the corrected AIC (AICc) are designed to estimate the expected Kullback-Leibler (K-L) discrepancy. For best-subset selection, both AIC and AICc are negatively biased, and the use of either criterion will lead to overfitted models. To correct for this bias, we introduce a criterion AICi, which has a penalty term evaluated from Monte Carlo simulation. A multistage model selection procedure AICaps, which utilizes AICi, is proposed for best-subset selection.
In the framework of linear regression models, the Gauss discrepancy is another frequently applied measure of proximity between a fitted candidate model and the generating model. Mallows' conceptual predictive statistic (Cp) and the modified Cp (MCp) are designed to estimate the expected Gauss discrepancy. For best-subset selection, Cp and MCp exhibit negative estimation bias. To correct for this bias, we propose a criterion CPSi that again employs a penalty term evaluated from Monte Carlo simulation. We further devise a multistage procedure, CPSaps, which selectively utilizes CPSi.
In this thesis, we consider best-subset selection in two different modeling frameworks: linear models and generalized linear models. Extensive simulation studies are compiled to compare the selection behavior of our methods and other traditional model selection criteria. We also apply our methods to a model selection problem in a study of bipolar disorder.
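For reference, the standard criteria that the thesis takes as its starting point can be written as short functions for a Gaussian linear model. This sketch covers only AIC, AICc and Mallows' Cp in their textbook forms; it does not implement the simulation-based AICi or CPSi penalties proposed in the thesis, and the function and argument names are assumptions.

```python
from math import log


def aic(rss, n, k):
    """AIC for a Gaussian linear model, up to an additive constant.

    k counts every estimated parameter: regression coefficients
    (including the intercept) plus the error variance.
    """
    return n * log(rss / n) + 2 * k


def aicc(rss, n, k):
    """Small-sample corrected AIC; requires n > k + 1."""
    return aic(rss, n, k) + 2 * k * (k + 1) / (n - k - 1)


def mallows_cp(rss, n, p, sigma2_full):
    """Mallows' Cp with p regression coefficients in the candidate model.

    sigma2_full is the error-variance estimate from the largest model.
    """
    return rss / sigma2_full - n + 2 * p
```

In a best-subset search each function would be evaluated for every candidate subset and the subset with the smallest value retained, which is the setting in which the abstract reports the negative bias.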
|
129 |
Modelos lineares parciais aditivos generalizados com suavização por meio de P-splines / Generalized additive partial linear models with P-splines smoothing
Holanda, Amanda Amorim, 03 May 2018
In this work we present the generalized partial linear models with one continuous explanatory variable treated nonparametrically and the generalized additive partial linear models with at least two continuous explanatory variables treated in such a way. P-splines are used to describe the relationship between the response and the continuous explanatory variables. The penalized likelihood functions, penalized score functions and penalized Fisher information matrices are then derived to obtain the penalized maximum likelihood estimates by combining the backfitting (Gauss-Seidel) algorithm with the Fisher scoring iterative method for the two types of model. In addition, we present procedures for estimating the smoothing parameter as well as the effective degrees of freedom. Finally, for the purpose of illustration, the proposed models are fitted to real data sets.
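The penalty that distinguishes P-splines from unpenalized B-spline regression is a finite-difference penalty on adjacent basis coefficients. The sketch below builds the difference matrix and evaluates the penalty term; it omits the B-spline basis and the scoring and backfitting iterations, the scaling convention for the penalty may differ from the one used in the thesis, and all names are illustrative assumptions.

```python
def difference_matrix(n_coef, order=2):
    """Rows apply the order-th finite difference to a coefficient vector."""
    rows = [[1.0 if i == j else 0.0 for j in range(n_coef)]
            for i in range(n_coef)]          # start from the identity
    for _ in range(order):
        # Difference consecutive rows: row i becomes rows[i + 1] - rows[i].
        rows = [[b - a for a, b in zip(rows[i], rows[i + 1])]
                for i in range(len(rows) - 1)]
    return rows


def penalty(coefs, smoothing, order=2):
    """P-spline roughness penalty: smoothing * ||D coefs||^2."""
    d = difference_matrix(len(coefs), order)
    diffs = [sum(w * c for w, c in zip(row, coefs)) for row in d]
    return smoothing * sum(v * v for v in diffs)
```

With a second-order penalty, coefficients that lie on a straight line incur no penalty, so increasing the smoothing parameter pulls the fitted curve toward a linear trend.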
|