201

Implementing SAE Techniques to Predict Global Spectacles Needs

Zhang, Yuxue January 2023 (has links)
This study applies Small Area Estimation (SAE) techniques to improve the accuracy of predicting global needs for assistive spectacles. It compares a range of predictive models: Linear Regression (LR), Empirical Best Linear Unbiased Prediction (EBLUP), hierarchical generalized linear models (the R package hglm) with a Conditional Autoregressive (CAR) structure, and Generalized Linear Mixed Models (GLMM). In the final phase, the prediction of global spectacle needs involves several essential steps, including random-effects simulation, extraction of coefficients from the GLMM estimates, and log-linear modeling. The investigation takes a multi-faceted approach, incorporating area-level modeling, spatial correlation analysis, and the relative standard error, to assess their impact on predictive accuracy. The GLMM consistently shows the lowest Relative Standard Error (RSE) values, close to zero, indicating precise but potentially overfit results. The hglm model with CAR presents a narrower RSE range, typically below 25%, reflecting good accuracy, although with a larger number of outliers. LR performs similarly to EBLUP, with RSE values reaching around 50% in some scenarios and varying slightly across contexts. These findings underscore the trade-offs between precision and robustness across the models, especially at finer geographical levels and for countries not included in the initial sample.
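The area-level models named above are commonly written in Fay-Herriot form; a hedged sketch of that formulation and of the relative standard error used to compare the models (the notation is illustrative, not taken from the thesis):

\hat{\theta}_i^{\,dir} = \theta_i + e_i, \qquad \theta_i = x_i^{\top}\beta + v_i, \qquad v_i \sim N(0,\sigma_v^2), \; e_i \sim N(0,\psi_i),

with the EBLUP \tilde{\theta}_i = \gamma_i\,\hat{\theta}_i^{\,dir} + (1-\gamma_i)\,x_i^{\top}\hat{\beta}, \quad \gamma_i = \sigma_v^2/(\sigma_v^2+\psi_i), \qquad \mathrm{RSE}(\tilde{\theta}_i) = 100 \times \mathrm{SE}(\tilde{\theta}_i)/\tilde{\theta}_i \ (\%).

Here \hat{\theta}_i^{\,dir} is the direct estimate for area i, x_i the area-level covariates, and \psi_i the known sampling variance; the RSE in percent is the quantity contrasted across GLMM, hglm-CAR, LR and EBLUP above.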
202

Optimierung von Messinstrumenten im Large-scale Assessment

Hecht, Martin 21 July 2015 (has links)
Messinstrumente stellen in der wissenschaftlichen Forschung ein wesentliches Element zur Erkenntnisgewinnung dar. Das Besondere an Messinstrumenten im Large-scale Assessment in der Bildungsforschung ist, dass diese normalerweise für jede Studie neu konstruiert werden und dass die Testteilnehmer verschiedene Versionen des Tests bekommen. Hierbei ergeben sich potentielle Gefahren für die Akkuratheit und Validität der Messung. Um solche Gefahren zu minimieren, sollten (a) die Ursachen für Verzerrungen der Messung und (b) mögliche Strategien zur Optimierung der Messinstrumente eruiert werden. Deshalb wird in der vorliegenden Dissertation spezifischen Fragestellungen im Rahmen dieser beiden Forschungsanliegen nachgegangen. / Measurement instruments are essential elements in the acquisition of knowledge in scientific research. Special features of measurement instruments in large-scale assessments of student achievement are their frequent reconstruction and the use of different test versions. Here, threats to the accuracy and validity of the measurement may emerge. To minimize such threats, (a) sources of potential bias of measurement and (b) strategies to optimize measurement instruments should be explored. Therefore, the present dissertation investigates several specific topics within these two research areas.
203

Hur påverkar avrundningar tillförlitligheten hos parameterskattningar i en linjär blandad modell?

Stoorhöök, Li, Artursson, Sara January 2016 (has links)
Previous studies show that blood pressure in pregnant women falls during the second trimester and then rises later in pregnancy. High blood pressure during pregnancy carries health risks, which makes blood pressure measurements relevant. Uncertainty arises, however, because healthcare staff handle the measurements differently: some round the readings and others do not, which can make the blood pressure trajectory difficult to interpret. The thesis analyses a dataset of blood pressure values from pregnant women by fitting nine linear mixed-effects regression models. A simulation study is then carried out to examine how measurement problems caused by rounding affect parameter estimates and model selection in a linear mixed model. The conclusion is that rounding of the blood pressure values does not affect the type I error but does affect the power. This, however, poses no problem for the continued analysis of the blood pressure values in the real dataset.
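A minimal simulation sketch of the rounding question, assuming a single random-intercept model fitted with statsmodels; the model, the 5 mmHg rounding grid and all variable names are illustrative assumptions, not the nine models of the thesis:

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2016)
n_subj, n_obs = 100, 6
subj = np.repeat(np.arange(n_subj), n_obs)
week = np.tile(np.arange(n_obs, dtype=float), n_subj)
b = rng.normal(0.0, 5.0, n_subj)                         # random intercept per woman
bp = 110.0 + 0.8 * week + b[subj] + rng.normal(0.0, 4.0, subj.size)

def slope_and_se(values):
    # fit a random-intercept linear mixed model and return the fixed slope for week with its SE
    df = pd.DataFrame({"bp": values, "week": week, "id": subj})
    fit = smf.mixedlm("bp ~ week", df, groups=df["id"]).fit()
    return fit.params["week"], fit.bse["week"]

print("exact readings:   slope %.3f (SE %.3f)" % slope_and_se(bp))
print("rounded readings: slope %.3f (SE %.3f)" % slope_and_se(5 * np.round(bp / 5)))

Repeating the comparison over many simulated datasets would give the kind of type I error and power contrast discussed in the abstract.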
204

Design and analysis of sugarcane breeding experiments: a case study / Delineamento e análise de experimentos de melhoramento com cana de açúcar: um estudo de caso

Santos, Alessandra dos 26 May 2017 (has links)
One purpose of breeding programs is the selection of the best test lines. The accuracy of selection can be improved by using an optimal design and models that fit the data well. This is not easy, especially in large experiments which assess more than one hundred lines without the possibility of replication owing to limited material, limited area and high costs; the many parameters of the complex variance structure must then be estimated from the limited number of replicated check varieties. The main objectives of this thesis were to model 21 sugarcane trials provided by the "Centro de Tecnologia Canavieira" (CTC, a Brazilian sugarcane company) and to evaluate the design employed, which uses a large number of unreplicated test lines (new varieties) and systematically replicated check (commercial) lines. The linear mixed model was used to identify the three major components of spatial variation in the plot errors and the competition effects at the genetic and residual levels. The test lines were treated as random effects and the check lines as fixed, because they come from different processes. Single and joint analyses were developed because the trials could be grouped into two types: (i) one longitudinal data set (two cuts) and (ii) five regional groups of experiments (each group a region with three sites). In a study of alternative designs, a fixed trial size was assumed to evaluate the efficiency of the type of unreplicated design employed in these 21 trials compared with spatially optimized unreplicated designs, p-rep designs with checks, and a spatially optimized p-rep design. To investigate models and designs, four simulation studies assessed mainly (i) the fitted model under competition effects at the genetic level, (ii) the accuracy of estimation in the separate versus joint analysis, (iii) the relation between sugarcane lodging and negative residual correlation, and (iv) design efficiency. The main information obtained from the simulation studies was: the number of times the fitting algorithm converged; the variance parameter estimates; the correlations between the direct genetic EBLUPs and the true direct genetic effects; the assertiveness of selection, or average similarity, where similarity was measured as the percentage of the 30 test lines with the highest direct genetic EBLUPs that are among the true 30 best test lines (generated); and the heritability estimates or the genetic gain. / Um dos propósitos dos programas de melhoramento genético é a seleção de novos clones melhores (novos materiais). A acurácia de seleção pode ser melhorada usando delineamentos ótimos e modelos bem ajustados. Porém, descobrir isso não é fácil, especialmente, em experimentos grandes que possuem mais de cem clones sem a possibilidade de repetição devido à limitação de material, área e custos elevados, dadas as poucas repetições de parcelas com variedades comerciais (testemunhas) e o número de parâmetros de complexa variância estrutural que necessitam ser assumidos. Os principais objetivos desta tese foram modelar 21 experimentos de cana de açúcar fornecidos pelo Centro de Tecnologia Canavieira (CTC - empresa brasileira de cana de açúcar) e avaliar o delineamento empregado, o qual usa um número grande de clones não repetidos e testemunhas sistematicamente repetidas. O modelo linear misto foi usado, identificando três principais componentes de variação espacial nos erros de parcelas e efeitos de competição, em nível genético e residual.
Os clones foram assumidos de efeitos aleatórios e as testemunhas de efeitos fixos, pois vieram de processos diferentes. As análises individuais e conjuntas foram desenvolvidas neste material pois os experimentos puderam ser agrupados em dois tipos: (i) um delineamento longitudinal (duas colheitas) e (ii) cinco grupos de experimentos (cada grupo uma região com três locais). Para os estudos de delineamentos, um tamanho fixo de experimento foi assumido para se avaliar a eficiência do delineamento não replicado (empregado nesses 21 experimentos) com os não replicados otimizado espacialmente, os parcialmente replicados com testemunhas e os parcialmente replicados otimizado espacialmente. Quatro estudos de simulação foram feitos para avaliar i) os modelos ajustados, sob condições de efeito de competição em nível genético, ii) a acurácia das estimativas vindas dos modelos de análise individual e conjunta; iii) a relação entre tombamento da cana e a correlação residual negativa, e iv) a eficiência dos delineamentos. Para concluir, as principais informações utilizadas nos estudos de simulação foram: o número de vezes que o algoritmo convergiu; a variância na estimativa dos parâmetros; a correlação entre os EBLUPs genético direto e os efeitos genéticos reais; a assertividade de seleção ou a semelhança média, sendo semelhança medida como a porcentagem dos 30 clones com os maiores EBLUPS genético e os 30 melhores verdadeiros clones; e a estimativa da herdabilidade ou do ganho genético.
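A small sketch of the selection-similarity measure described above (the percentage of the 30 test lines with the highest direct genetic EBLUPs that are among the 30 truly best lines); the function and argument names are illustrative:

import numpy as np

def selection_similarity(eblup, true_effect, k=30):
    # indices of the k largest predicted and true direct genetic effects
    top_pred = set(np.argsort(eblup)[-k:])
    top_true = set(np.argsort(true_effect)[-k:])
    return len(top_pred & top_true) / k   # share in [0, 1]

Averaged over the simulated datasets, this quantity corresponds to the assertiveness of selection, or average similarity, reported in the simulation studies.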
205

Estimation robuste de courbes de consommation électrique moyennes par sondage pour de petits domaines en présence de valeurs manquantes / Robust estimation of mean electricity consumption curves by sampling for small areas in presence of missing values

De Moliner, Anne 05 December 2017 (has links)
Dans cette thèse, nous nous intéressons à l'estimation robuste de courbes moyennes ou totales de consommation électrique par sondage en population finie, pour l'ensemble de la population ainsi que pour des petites sous-populations, en présence ou non de courbes partiellement inobservées. En effet, de nombreuses études réalisées dans le groupe EDF, que ce soit dans une optique commerciale ou de gestion du réseau de distribution par Enedis, se basent sur l'analyse de courbes de consommation électrique moyennes ou totales, pour différents groupes de clients partageant des caractéristiques communes. L'ensemble des consommations électriques de chacun des 35 millions de clients résidentiels et professionnels Français ne pouvant être mesurées pour des raisons de coût et de protection de la vie privée, ces courbes de consommation moyennes sont estimées par sondage à partir de panels. Nous prolongeons les travaux de Lardin (2012) sur l'estimation de courbes moyennes par sondage en nous intéressant à des aspects spécifiques de cette problématique, à savoir l'estimation robuste aux unités influentes, l'estimation sur des petits domaines, et l'estimation en présence de courbes partiellement ou totalement inobservées. Pour proposer des estimateurs robustes de courbes moyennes, nous adaptons au cadre fonctionnel l'approche unifiée d'estimation robuste en sondages basée sur le biais conditionnel proposée par Beaumont (2013). Pour cela, nous proposons et comparons sur des jeux de données réelles trois approches : l'application des méthodes usuelles sur les courbes discrétisées, la projection sur des bases de dimension finie (Ondelettes ou Composantes Principales de l'Analyse en Composantes Principales Sphériques Fonctionnelle en particulier) et la troncature fonctionnelle des biais conditionnels basée sur la notion de profondeur d'une courbe dans un jeu de données fonctionnelles. Des estimateurs d'erreur quadratique moyenne instantanée, explicites et par bootstrap, sont également proposés. Nous traitons ensuite la problématique de l'estimation sur de petites sous-populations. Dans ce cadre, nous proposons trois méthodes : les modèles linéaires mixtes au niveau unité appliqués sur les scores de l'Analyse en Composantes Principales ou les coefficients d'ondelettes, la régression fonctionnelle et enfin l'agrégation de prédictions de courbes individuelles réalisées à l'aide d'arbres de régression ou de forêts aléatoires pour une variable cible fonctionnelle. Des versions robustes de ces différents estimateurs sont ensuite proposées en déclinant la démarche d'estimation robuste basée sur les biais conditionnels proposée précédemment. Enfin, nous proposons quatre estimateurs de courbes moyennes en présence de courbes partiellement ou totalement inobservées. Le premier est un estimateur par repondération par lissage temporel non paramétrique adapté au contexte des sondages et de la non réponse et les suivants reposent sur des méthodes d'imputation. Les portions manquantes des courbes sont alors déterminées soit en utilisant l'estimateur par lissage précédemment cité, soit par imputation par les plus proches voisins adaptée au cadre fonctionnel ou enfin par une variante de l'interpolation linéaire permettant de prendre en compte le comportement moyen de l'ensemble des unités de l'échantillon. Des approximations de variance sont proposées dans chaque cas et l'ensemble des méthodes sont comparées sur des jeux de données réelles, pour des scénarios variés de valeurs manquantes.
/ In this thesis, we address the problem of robust estimation of mean or total electricity consumption curves by sampling in a finite population for the entire population and for small areas. We are also interested in estimating mean curves by sampling in presence of partially missing trajectories. Indeed, many studies carried out in the French electricity company EDF, for marketing or power grid management purposes, are based on the analysis of mean or total electricity consumption curves at a fine time scale, for different groups of clients sharing some common characteristics. Because of privacy issues and financial costs, it is not possible to measure the electricity consumption curve of each customer, so these mean curves are estimated using samples. In this thesis, we extend the work of Lardin (2012) on mean curve estimation by sampling by focusing on specific aspects of this problem such as robustness to influential units, small area estimation and estimation in presence of partially or totally unobserved curves. In order to build robust estimators of mean curves, we adapt the unified approach to robust estimation in finite population proposed by Beaumont et al. (2013) to the context of functional data. To that purpose we propose three approaches: application of the usual method for real variables on discretised curves, projection on Functional Spherical Principal Components or on a wavelet basis, and thirdly functional truncation of conditional biases based on the notion of depth. These methods are tested and compared to each other on real datasets, and Mean Squared Error estimators are also proposed. Secondly, we address the problem of small area estimation for functional means or totals. We introduce three methods: unit-level linear mixed models applied on the scores of functional principal components analysis or on wavelet coefficients, functional regression, and aggregation of individual curve predictions by functional regression trees or functional random forests. Robust versions of these estimators are then proposed by following the approach to robust estimation based on conditional bias presented before. Finally, we suggest four estimators of mean curves by sampling in presence of partially or totally unobserved trajectories. The first estimator is a reweighting estimator where the weights are determined using a temporal nonparametric kernel smoothing adapted to the context of finite population and missing data, and the other ones rely on imputation of missing data. Missing parts of the curves are determined either by using the smoothing estimator presented before, or by nearest neighbours imputation adapted to functional data, or by a variant of linear interpolation which takes into account the mean trajectory of the entire sample. Variance approximations are proposed for each method and all the estimators are compared to each other on real datasets for various missing data scenarios.
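As a rough sketch of this framework (generic design-based notation, not the thesis's exact formulas), the Horvitz-Thompson estimator of the mean consumption curve and a conditional-bias-based robust correction can be written as:

\hat{\mu}(t) = \frac{1}{N}\sum_{i \in s} \frac{Y_i(t)}{\pi_i}, \qquad \hat{\mu}_R(t) = \hat{\mu}(t) + \frac{1}{N}\sum_{i \in s}\bigl\{\psi\bigl(B_i(t)\bigr) - B_i(t)\bigr\},

where \pi_i is the inclusion probability of unit i, B_i(t) its estimated conditional bias at instant t, and \psi a truncation (e.g. Huber-type) function; the three approaches compared above differ essentially in how this truncation is applied to whole curves rather than to scalar values.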
206

Modelagem espacial, temporal e longitudinal: diferentes abordagens do estudo da leptospirose urbana / Space, time and longitudinal modeling: different approaches for the urban leptospirosis study

Tassinari, Wagner de Souza January 2009 (has links)
(...) O objetivo desta tese foi modelar os fatores de risco associados à ocorrência de leptospirose urbana em diferentes contextos, com especial atenção para aspectos espaciais e temporais. Foram utilizadas técnicas de modelagem tais como modelos generalizados aditivos e mistos. Também explorou-se técnicas de detecção de aglomerados espaço-temporais. (...) / Leptospirosis, a disease caused by pathogenic spirochetes of the genus Leptospira, is one of the most widespread zoonoses in the world and a major public health problem associated with lack of sanitation and poverty. It is endemic in Brazil, and surveillance data show that outbreaks of leptospirosis occur as cyclical annual epidemics during rainfall periods. The aim of this thesis was to model the risk factors associated with the occurrence of leptospirosis in different urban contexts, with particular attention to spatial and temporal aspects. We used modeling techniques such as generalized additive and mixed models, and techniques for detecting space-time clusters were also explored. The thesis prioritized the use of free software: R, the Ubuntu Linux operating system, LaTeX, and SaTScan (not open source, but free). It was prepared in the form of three articles. The first article presents a spatio-temporal analysis of the occurrence of leptospirosis cases in Rio de Janeiro between 1997 and 2002. Using space-time cluster detection (the "outbreaks" method), only the clusters occurring in 1997 and 1998 were statistically significant. Generalized linear mixed models were used to evaluate the risk factors associated with cases belonging to outbreaks as opposed to endemic cases. Cases belonging to outbreaks are associated with the occurrence of rainfall over 4 mm (OR 3.71; 95% CI 1.83-7.51). There were no significant associations with socioeconomic covariates; in other words, whether endemic or epidemic, leptospirosis occurs in the same population. The second and third articles examined a seroprevalence survey and a seroconversion cohort conducted in the Pau da Lima community, Salvador, Bahia. In both, generalized additive models were used to fit the exposure variables at the individual and peridomicile level, as well as to estimate the spatial area of leptospirosis risk. The significant variables were gender, age, presence of rats in the peridomicile, domicile near a refuse collection point or an open sewer, and domicile altitude above sea level. The studies show that individual and contextual variables explain much of the spatial variability of leptospirosis, but there remain factors that were not measured in the studies and that should be investigated. The risk maps of seroprevalence and seroconversion show distinct regions where the spatial effect is significantly different from the global average. There is still a lack of closer integration between the professionals who develop and operate GIS, epidemiologists and biostatisticians; such integration would be an important advance, enabling the development and use of these techniques in support of public health. The study of prevalence and incidence in endemic areas, in the context of leptospirosis, is complex and still developing. Bringing together specialists from several areas of knowledge (e.g., clinicians, epidemiologists, geographers, biologists, statisticians, engineers) is essential to advance knowledge about the disease and its relationship to social and environmental inequality, and to contribute to the creation of efficient and effective measures to control endemic diseases.
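A hedged sketch of the kind of spatial generalized additive model referred to above for the seroprevalence and seroconversion outcomes (the symbols are illustrative):

\operatorname{logit}\Pr(Y_i = 1) = x_i^{\top}\beta + f(\mathrm{lon}_i, \mathrm{lat}_i),

where Y_i indicates infection for individual i, x_i collects the individual and peridomicile covariates (gender, age, rats, open sewer, altitude) and f is a smooth spatial surface; the regions where f departs significantly from the global average delimit the risk areas shown in the maps.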
207

Modelagem de dados longitudinais aplicada a uma coorte de pacientes hipertensos resistentes / Modeling of longitudinal data applied to a cohort of resistant hypertensive patients

Magnanini, Monica Maria Ferreira January 2010 (has links)
A hipertensão arterial é um dos mais importantes fatores de risco para o desenvolvimento das complicações cardiovasculares, cerebrovasculares e renais. Embora seja facilmente detectável, o controle dos níveis tensionais constitui um enorme desafio da saúde pública. O objetivo desta tese foi analisar dados longitudinais de uma coorte de pacientes hipertensos resistentes. Em estudos longitudinais, o principal foco de interesse é na mudança ocorrida ao longo do tempo; seja ela avaliada como tempo até o evento ou como medidas repetidas tomadas durante o período de acompanhamento. O presente trabalho foi organizado em três artigos onde foram apresentadas essas duas abordagens. No primeiro artigo, foi realizada uma Análise de Sobrevida, tendo como desfecho eventos cardiovasculares fatais e não fatais, em mulheres hipertensas da coorte. Foi verificado que, para atingir o objetivo de diminuir a morbidade e a mortalidade cardiovascular nessa população, as decisões deveriam ser baseadas no controle da pressão de vigília obtida na Monitorização Ambulatorial da Pressão Arterial (MAPA) e não no controle da pressão de consultório. No segundo artigo, foram usadas as medidas da pressão arterial (PA) obtidas na MAPA em sua forma resumida usual (médias de PA 24h, vigília e noturna). Os pacientes hipertensos pseudorresistentes apresentaram trajetória ascendente, indicando a necessidade de acompanhamento desses pacientes a intervalos inferiores a um ano. Além disso, não foi observada redução dos valores do índice de massa corporal e da circunferência da cintura nesses pacientes. O terceiro artigo abordou a evolução temporal dos valores do descenso noturno pressórico nos pacientes da coorte, além de estimar as probabilidades brutas de transição entre as categorias do descenso noturno, em MAPAs sucessivas. Apesar de não ultrapassar o limite de normalidade de 10 por cento, houve uma queda acentuada nos valores percentuais do descenso noturno dos pacientes dippers ao longo do tempo. A probabilidade estimada de permanência no estado dipper foi de 52 por cento, enquanto que no estado non dipper esse valor foi de 46 por cento. Nesses dois artigos foram usados Modelos Aditivos Generalizados Mistos, que incorporam efeitos aleatórios, uma vez que a variação intra-paciente foi expressiva. A incorporação de métodos estatísticos mais sofisticados faz jus à qualidade e custo de coleta das informações longitudinais. Com base nesses três artigos, concluiu-se que o uso da MAPA é primordial no acompanhamento de pacientes hipertensos resistentes, pois permite detectar as variações ao longo do tempo na evolução clínica. / Hypertension is one of the most important risk factors for cardiovascular, cerebrovascular and renal diseases. While it is easy to detect, blood pressure control is a major public health challenge. The objective of this thesis was to analyze longitudinal data from a cohort of resistant hypertensive patients. In longitudinal studies, the main focus of interest is on change over time, evaluated either as time-to-event or as repeated measures taken during follow-up. The thesis was organized in three articles presenting these two approaches. In the first article, using the survival approach, we modeled the time free of fatal and nonfatal cardiovascular events in hypertensive women of the cohort. It was found that, to achieve the goal of decreasing cardiovascular morbidity and mortality in this population, decisions should be based on the control of daytime Ambulatory Blood Pressure (ABP) and not on the control of office blood pressure. In the second article, blood pressure (BP) measurements from ABPM were used in their usual summary form (24-hour, daytime and nighttime means). Pseudoresistant hypertensive patients showed an upward trajectory, indicating the need to monitor them more than once a year. Moreover, there was no reduction in body mass index and waist circumference values in these patients. The third article dealt with the temporal evolution of nocturnal blood pressure dipping in the cohort patients and estimated the crude probabilities of transition between nocturnal dip categories in successive ABPM exams. Although the normality limit of 10% was not exceeded, a sharp drop in nocturnal dip values was observed in dipper patients over time. The estimated probability of remaining in the dipper status was 52%, whereas for the non-dipper status it was 46%. In these two articles, generalized additive mixed models incorporating random effects were used, since the intra-patient variation was substantial. The incorporation of more sophisticated statistical methods is justified by the quality and cost of collecting longitudinal information. Based on these three articles, it was concluded that the use of ABPM is essential in monitoring patients with resistant hypertension, since it allows changes in the clinical course to be detected over time.
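A brief sketch, under assumed notation, of the two tools mentioned above: a generalized additive mixed model for a repeated ABPM measure and the crude transition-probability estimate between dipping categories.

g\bigl(E[y_{ij}]\bigr) = f(t_{ij}) + x_{ij}^{\top}\beta + b_i, \qquad b_i \sim N(0, \sigma_b^2), \qquad \hat{p}_{kl} = \frac{n_{kl}}{\sum_{m} n_{km}},

where f is a smooth function of follow-up time, b_i the patient-level random effect accounting for the large intra-patient variation, and n_{kl} the number of observed transitions from dipping category k to category l between successive ABPM exams (e.g. \hat{p}_{\mathrm{dipper,dipper}} \approx 0.52 as reported above).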
209

Assessing the robustness of genetic codes and genomes

Sautié Castellanos, Miguel 06 1900 (has links)
Deux approches principales existent pour évaluer la robustesse des codes génétiques et des séquences de codage. L'approche statistique est basée sur des estimations empiriques de probabilité calculées à partir d'échantillons aléatoires de permutations représentant les affectations d'acides aminés aux codons, alors que l'approche basée sur l'optimisation repose sur le pourcentage d'optimisation, généralement calculé en utilisant des métaheuristiques. Nous proposons une méthode basée sur les deux premiers moments de la distribution des valeurs de robustesse pour tous les codes génétiques possibles. En se basant sur une instance polynomiale du Problème d'Affectation Quadratique, nous proposons un algorithme vorace exact pour trouver la valeur minimale de la robustesse génomique. Pour réduire le nombre d'opérations de calcul des scores et de la borne supérieure de Cantelli, nous avons développé des méthodes basées sur la structure de voisinage du code génétique et sur la comparaison par paires des codes génétiques, entre autres. Pour calculer la robustesse des codes génétiques naturels et des génomes procaryotes, nous avons choisi 23 codes génétiques naturels, 235 propriétés d'acides aminés, ainsi que 324 procaryotes thermophiles et 418 procaryotes non thermophiles. Parmi nos résultats, nous avons constaté que bien que le code génétique standard soit plus robuste que la plupart des codes génétiques, certains codes génétiques mitochondriaux et nucléaires sont plus robustes que le code standard aux troisièmes et premières positions des codons, respectivement. Nous avons observé que l'utilisation des codons synonymes tend à être fortement optimisée pour amortir l'impact des changements d'une seule base, principalement chez les procaryotes thermophiles. / There are two main approaches to assess the robustness of genetic codes and coding sequences. The statistical approach is based on empirical estimates of probabilities computed from random samples of permutations representing assignments of amino acids to codons, whereas the optimization-based approach relies on the optimization percentage, usually computed using metaheuristics. We propose a method based on the first two moments of the distribution of robustness values for all possible genetic codes. Based on a polynomially solvable instance of the Quadratic Assignment Problem, we also propose an exact greedy algorithm to find the minimum value of the genome robustness. To reduce the number of operations for computing the scores and Cantelli's upper bound, we developed methods based on the genetic code neighborhood structure and on pairwise comparisons between genetic codes, among others. For assessing the robustness of natural genetic codes and genomes, we chose 23 natural genetic codes, 235 amino acid properties, as well as 324 thermophilic and 418 non-thermophilic prokaryotes. Among our results, we found that although the standard genetic code is more robust than most genetic codes, some mitochondrial and nuclear genetic codes are more robust than the standard code at the third and first codon positions, respectively. We also observed that synonymous codon usage tends to be highly optimized to buffer the impact of single-base changes, mainly in thermophilic prokaryotes.
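For reference, the one-sided Chebyshev (Cantelli) inequality on which the upper bound mentioned above is presumably based: for a robustness score X with mean \mu and variance \sigma^2,

\Pr(X - \mu \geq \lambda) \;\leq\; \frac{\sigma^2}{\sigma^2 + \lambda^2}, \qquad \lambda > 0,

so the first two moments of the robustness distribution over all possible codes are enough to bound how exceptional an observed code's robustness is, without enumerating or sampling the codes themselves.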
210

Estimation non-paramétrique adaptative pour des modèles bruités / Nonparametric adaptive estimation in measurement error models

Mabon, Gwennaëlle 26 May 2016 (has links)
Dans cette thèse, nous nous intéressons au problème d'estimation de densité dans le modèle de convolution. Ce cadre correspond aux modèles avec erreurs de mesures additives, c'est-à-dire que nous observons une version bruitée de la variable d'intérêt. Pour mener notre étude, nous adoptons le point de vue de l'estimation non-paramétrique adaptative qui repose sur des procédures de sélection de modèle développées par Birgé & Massart ou sur les méthodes de Lepski. Cette thèse se divise en deux parties. La première développe des méthodes spécifiques d'estimation adaptative quand les variables d'intérêt et les erreurs sont des variables aléatoires positives. Ainsi nous proposons des estimateurs adaptatifs de la densité ou encore de la fonction de survie dans ce modèle, puis de fonctionnelles linéaires de la densité cible. Enfin nous suggérons une procédure d'agrégation linéaire. La deuxième partie traite de l'estimation adaptative de densité dans le modèle de convolution lorsque la loi des erreurs est inconnue. Dans ce cadre il est supposé qu'un échantillon préliminaire du bruit est disponible ou que les observations sont disponibles sous forme de données répétées. Les résultats obtenus pour des données répétées dans le modèle de convolution permettent d'élargir cette méthodologie au cadre des modèles linéaires mixtes. Enfin cette méthode est encore appliquée à l'estimation de la densité de somme de variables aléatoires observées avec du bruit. / In this thesis, we are interested in nonparametric adaptive estimation problems of density in the convolution model. This framework matches additive measurement error models, which means we observe a noisy version of the random variable of interest. To carry out our study, we follow the paradigm of model selection developed by Birgé & Massart or criteria based on Lepski's method. The thesis is divided into two parts. In the first one, the main goal is to build adaptive estimators in the convolution model when both the random variables of interest and the errors are distributed on the nonnegative real line. Thus we propose adaptive estimators of the density along with the survival function, then of linear functionals of the target density. This part ends with a linear density aggregation procedure. The second part of the thesis deals with adaptive estimation of density in the convolution model when the error distribution is unknown and the variables are distributed on the real line. To make this problem identifiable, we assume that either a preliminary sample of the noise is at hand or the observations are available as repeated data. We can then derive adaptive estimation under mild assumptions on the noise distribution. This methodology is then applied to linear mixed models and to the problem of density estimation of a sum of random variables when the latter are observed with an additive noise.
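A sketch of the convolution model and of the standard deconvolution kernel estimator underlying this line of work (generic notation, not the thesis's specific estimators): the observations are Y_j = X_j + \varepsilon_j, so that f_Y = f_X * f_\varepsilon, and when the error characteristic function \varphi_\varepsilon is known,

\hat{f}_X(x) = \frac{1}{2\pi}\int e^{-itx}\,\frac{\hat{\varphi}_Y(t)}{\varphi_\varepsilon(t)}\,\varphi_K(th)\,dt, \qquad \hat{\varphi}_Y(t) = \frac{1}{n}\sum_{j=1}^{n} e^{itY_j},

where \varphi_K is the Fourier transform of a kernel and h a bandwidth; the adaptive procedures select the model or bandwidth from the data, and the second part replaces \varphi_\varepsilon by an estimate built from the preliminary noise sample or from the repeated observations.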
