Global ETD Search

71	Implementing SAE Techniques to Predict Global Spectacles Needs Zhang, Yuxue January 2023 This study applies Small Area Estimation (SAE) techniques to improve the accuracy of predictions of global needs for assistive spectacles. The research employs a range of predictive models, including Linear Regression (LR), Empirical Best Linear Unbiased Prediction (EBLUP), hglm (from the R package) with Conditional Autoregressive (CAR), and Generalized Linear Mixed Models (GLMM). In the final phase, the prediction of global spectacle needs involves several essential steps: random effects simulation, coefficient extraction from GLMM estimates, and log-linear modeling. The investigation takes a multi-faceted approach, incorporating area-level modeling, spatial correlation analysis, and relative standard error, to assess their impact on predictive accuracy. The GLMM consistently displays the lowest Relative Standard Error (RSE) values, close to zero, indicating precise but potentially overfit results. Conversely, the hglm with CAR model presents a narrower RSE range, typically below 25%, reflecting greater accuracy; however, it contains a higher number of outliers. LR performs similarly to EBLUP, with RSE values reaching around 50% in certain scenarios and varying slightly across contexts. These findings underscore the trade-offs between precision and robustness across these models, especially for finer geographical levels and countries not included in the initial sample. small area estimation area-level model generalized linear mixed models Conditional Autoregressive spatial correlation spectacle needs assistive products auxiliary data hglm relative standard error simulation Probability Theory and Statistics Sannolikhetsteori och statistik
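The relative standard error used above to compare the models can be illustrated with a short sketch. It assumes the usual definition, RSE = 100 × standard error / estimate; the area names, estimates and variances are hypothetical and are not taken from the thesis.

```python
import math


def relative_standard_error(estimate, variance):
    """Return the RSE in percent: 100 * sqrt(variance) / |estimate|."""
    if estimate == 0:
        raise ValueError("RSE is undefined for a zero estimate")
    return 100.0 * math.sqrt(variance) / abs(estimate)


# Hypothetical area-level estimates (share of people needing spectacles)
# paired with the estimated variance of each estimate.
areas = {
    "area_a": (0.40, 0.0016),
    "area_b": (0.20, 0.0100),
}

rse_by_area = {
    name: relative_standard_error(est, var) for name, (est, var) in areas.items()
}

for name, rse in rse_by_area.items():
    print(f"{name}: RSE = {rse:.1f}%")
```

A smaller RSE means the estimate is more precise relative to its size, which is the sense in which the abstract compares the models.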
72	Optimierung von Messinstrumenten im Large-scale Assessment Hecht, Martin 21 July 2015 (has links) Messinstrumente stellen in der wissenschaftlichen Forschung ein wesentliches Element zur Erkenntnisgewinnung dar. Das Besondere an Messinstrumenten im Large-scale Assessment in der Bildungsforschung ist, dass diese normalerweise für jede Studie neu konstruiert werden und dass die Testteilnehmer verschiedene Versionen des Tests bekommen. Hierbei ergeben sich potentielle Gefahren für die Akkuratheit und Validität der Messung. Um solche Gefahren zu minimieren, sollten (a) die Ursachen für Verzerrungen der Messung und (b) mögliche Strategien zur Optimierung der Messinstrumente eruiert werden. Deshalb wird in der vorliegenden Dissertation spezifischen Fragestellungen im Rahmen dieser beiden Forschungsanliegen nachgegangen. / Measurement instruments are essential elements in the acquisition of knowledge in scientific research. Special features of measurement instruments in large-scale assessments of student achievement are their frequent reconstruction and the usage of different test versions. Here, threats for the accuracy and validity of the measurement may emerge. To minimize such threats, (a) sources for potential bias of measurement and (b) strategies to optimize measuring instruments should be explored. Therefore, the present dissertation investigates several specific topics within these two research areas. 
Psychologie Bildungsforschung Optimierung Kontexteffekte Bearbeitungszeit Blockpaarbalance Designeffekte Designbalancierung Generalisierte lineare Mischmodelle IQB IRT Lineare Mischmodelle Messmodell Messinstrument Multi-Matrix Design Parameterschätzung Parameterverzerrung Positionsbalance Rasch-Modell Testhefteffekte Testheftschwierigkeit Testheftleichtigkeit Testdesign Unvollständige Blockdesigns Positionseffekte Schulleistungsstudien context effects optimization linear mixed models generalized linear mixed models response time educational assessment cluster pair balance design effects design balance item response theory measurement model measurement instrument parameter estimation parameter bias position balance psychology Rasch model 1PL model booklet effects booklet difficulty booklet easiness test design booklet design incomplete block design multiple matrix sampling position effects large-scale assessment 150 Psychologie 11 Psychologie CM 3000 CM 3200 ddc:150
73	Hur påverkar avrundningar tillförlitligheten hos parameterskattningar i en linjär blandad modell? / How does rounding affect the reliability of parameter estimates in a linear mixed model? Stoorhöök, Li, Artursson, Sara January 2016 Previous studies show that blood pressure in pregnant women falls during the second trimester and then rises later in pregnancy. High blood pressure during pregnancy can entail health risks, which makes blood pressure measurements relevant. However, uncertainty arises because different healthcare staff handle blood pressure measurements in different ways. Some staff round the measured values and others do not, which can make the blood pressure trajectory difficult to interpret. The thesis analyzes a dataset of blood pressure values from pregnant women by estimating nine different linear regression models with mixed effects. A simulation study is then carried out to examine how measurement problems caused by rounding affect parameter estimates and model selection in a linear mixed model. The conclusion is that rounding of blood pressure values does not affect the type I error but does affect power. This does not, however, pose a problem for further analysis of the blood pressure values in the real dataset. Coarsening Heaping Likelihood ratio test (LRT) Linear Mixed Models Mean square error (MSE) Measurement error Power Reliability Rounding Type I-error Validity Avrundning Förgrovning Hopning Likelihood ratio test (LRT) Linjär blandad modell Medelkvadratfel (MSE) Mätfel Reliabilitet Styrka Typ 1-fel Validitet
74	Structural equation models applied to quantitative genetics / Modelos de equações estruturais aplicados à genética quantitativa Cerqueira, Pedro Henrique Ramos 03 September 2015 Causal models have been used in different areas of knowledge to understand the causal associations between variables. Over the past decades, the number of studies using these models has grown considerably, especially those related to biological systems, where studying and learning causal relationships among traits is essential for predicting the consequences of interventions in such systems. Graph analysis (GA) and structural equation modeling (SEM) are tools used to explore such associations. While GA allows searching for causal structures that express qualitatively how variables are causally connected, fitting an SEM with a known causal structure allows the magnitude of causal effects to be inferred. SEM can also be viewed as multiple regression models in which response variables can be explanatory variables for others. In quantitative genetics studies, SEM aims to study the direct and indirect genetic effects associated with individuals through information related to them beyond the observed characteristics, such as kinship relations. Those studies typically assume linear relationships among traits. However, in some scenarios nonlinear relationships can be observed, which makes that assumption unsuitable. To overcome this limitation, this work proposes a mixed-effects polynomial structural equation model, of second or higher degree, to model those nonlinear relationships. Two studies were developed: a simulation and an application to real data. The first study involved the simulation of 50 data sets with a fully recursive causal structure involving three characteristics, in which linear and nonlinear causal relations between them were allowed. The second study involved the analysis of traits related to dairy cows of the Holstein breed. 
Phenotypic relationships between traits were calving difficulty, gestation length and also the proportion of perionatal death. We compare the model of multiple traits and polynomials structural equations models, under different polynomials degrees in order to assess the benefits of the SEM polynomial of second or higher degree. For some situations the inappropriate assumption of linearity results in poor predictions of the direct, indirect and total of the genetic variances and covariance, either overestimating, underestimating, or even assign opposite signs to covariances. Therefore, we conclude that the inclusion of a polynomial degree increases the SEM expressive power. / Modelos causais têm sido muitos utilizados em estudos em diferentes áreas de conhecimento, a fim de compreender as associações ou relações causais entre variáveis. Durante as últimas décadas, o uso desses modelos têm crescido muito, especialmente estudos relacionados à sistemas biológicos, uma vez que compreender as relações entre características são essenciais para prever quais são as consequências de intervenções em tais sistemas. Análise do grafo (AG) e os modelos de equações estruturais (MEE) são utilizados como ferramentas para explorar essas relações. Enquanto AG nos permite buscar por estruturas causais, que representam qualitativamente como as variáveis são causalmente conectadas, ajustando o MEE com uma estrutura causal conhecida nos permite inferir a magnitude dos efeitos causais. Os MEE também podem ser vistos como modelos de regressão múltipla em que uma variável resposta pode ser vista como explanatória para uma outra característica. Estudos utilizando MEE em genética quantitativa visam estudar os efeitos genéticos diretos e indiretos associados aos indivíduos por meio de informações realcionadas aos indivíduas, além das característcas observadas, como por exemplo o parentesco entre eles. 
Neste contexto, é tipicamente adotada a suposição que as características observadas são relacionadas linearmente. No entanto, para alguns cenários, relações não lineares são observadas, o que torna as suposições mencionadas inadequadas. Para superar essa limitação, este trabalho propõe o uso de modelos de equações estruturais de efeitos polinomiais mistos, de segundo grau ou seperior, para modelar relações não lineares. Neste trabalho foram desenvolvidos dois estudos, um de simulação e uma aplicação a dados reais. O primeiro estudo envolveu a simulação de 50 conjuntos de dados, com uma estrutura causal completamente recursiva, envolvendo 3 características, em que foram permitidas relações causais lineares e não lineares entre as mesmas. O segundo estudo envolveu a análise de características relacionadas ao gado leiteiro da raça Holandesa, foram utilizadas relações entre os seguintes fenótipos: dificuldade de parto, duração da gestação e a proporção de morte perionatal. Nós comparamos o modelo misto de múltiplas características com os modelos de equações estruturais polinomiais, com diferentes graus polinomiais, a fim de verificar os benefícios do MEE polinomial de segundo grau ou superior. Para algumas situações a suposição inapropriada de linearidade resulta em previsões pobres das variâncias e covariâncias genéticas diretas, indiretas e totais, seja por superestimar, subestimar, ou mesmo atribuir sinais opostos as covariâncias. Portanto, verificamos que a inclusão de um grau de polinômio aumenta o poder de expressão do MEE. Amostrador de Gibbs Bayesian inference Gado leiteiro da raça Holandesa Genética quantitativa Gibbs sampler Holstein dairy cattle Inferência bayesiana Linear mixed models Modelos de equações estruturais Modelos lineares mistos Modelos mistos multi característicos Multiple trait mixed models Polynomial regression Quantitative genetics Regressão polinomial Structural equation models
75	Estimation robuste de courbes de consommation électrique moyennes par sondage pour de petits domaines en présence de valeurs manquantes / Robust estimation of mean electricity consumption curves by sampling for small areas in presence of missing values De Moliner, Anne 05 December 2017 Dans cette thèse, nous nous intéressons à l'estimation robuste de courbes moyennes ou totales de consommation électrique par sondage en population finie, pour l'ensemble de la population ainsi que pour des petites sous-populations, en présence ou non de courbes partiellement inobservées. En effet, de nombreuses études réalisées dans le groupe EDF, que ce soit dans une optique commerciale ou de gestion du réseau de distribution par Enedis, se basent sur l'analyse de courbes de consommation électrique moyennes ou totales, pour différents groupes de clients partageant des caractéristiques communes. L'ensemble des consommations électriques de chacun des 35 millions de clients résidentiels et professionnels Français ne pouvant être mesurées pour des raisons de coût et de protection de la vie privée, ces courbes de consommation moyennes sont estimées par sondage à partir de panels. Nous prolongeons les travaux de Lardin (2012) sur l'estimation de courbes moyennes par sondage en nous intéressant à des aspects spécifiques de cette problématique, à savoir l'estimation robuste aux unités influentes, l'estimation sur des petits domaines, et l'estimation en présence de courbes partiellement ou totalement inobservées. Pour proposer des estimateurs robustes de courbes moyennes, nous adaptons au cadre fonctionnel l'approche unifiée d'estimation robuste en sondages basée sur le biais conditionnel proposée par Beaumont (2013). 
Pour cela, nous proposons et comparons sur des jeux de données réelles trois approches : l'application des méthodes usuelles sur les courbes discrétisées, la projection sur des bases de dimension finie (Ondelettes ou Composantes Principales de l'Analyse en Composantes Principales Sphériques Fonctionnelle en particulier) et la troncature fonctionnelle des biais conditionnels basée sur la notion de profondeur d'une courbe dans un jeu de données fonctionnelles. Des estimateurs d'erreur quadratique moyenne instantanée, explicites et par bootstrap, sont également proposés. Nous traitons ensuite la problématique de l'estimation sur de petites sous-populations. Dans ce cadre, nous proposons trois méthodes : les modèles linéaires mixtes au niveau unité appliqués sur les scores de l'Analyse en Composantes Principales ou les coefficients d'ondelettes, la régression fonctionnelle et enfin l'agrégation de prédictions de courbes individuelles réalisées à l'aide d'arbres de régression ou de forêts aléatoires pour une variable cible fonctionnelle. Des versions robustes de ces différents estimateurs sont ensuite proposées en déclinant la démarche d'estimation robuste basée sur les biais conditionnels proposée précédemment. Enfin, nous proposons quatre estimateurs de courbes moyennes en présence de courbes partiellement ou totalement inobservées. Le premier est un estimateur par repondération par lissage temporel non paramétrique adapté au contexte des sondages et de la non réponse et les suivants reposent sur des méthodes d'imputation. Les portions manquantes des courbes sont alors déterminées soit en utilisant l'estimateur par lissage précédemment cité, soit par imputation par les plus proches voisins adaptée au cadre fonctionnel ou enfin par une variante de l'interpolation linéaire permettant de prendre en compte le comportement moyen de l'ensemble des unités de l'échantillon. 
Des approximations de variance sont proposées dans chaque cas et l'ensemble des méthodes sont comparées sur des jeux de données réelles, pour des scénarios variés de valeurs manquantes. / In this thesis, we address the problem of robust estimation of mean or total electricity consumption curves by sampling in a finite population for the entire population and for small areas. We are also interested in estimating mean curves by sampling in presence of partially missing trajectories. Indeed, many studies carried out in the French electricity company EDF, for marketing or power grid management purposes, are based on the analysis of mean or total electricity consumption curves at a fine time scale, for different groups of clients sharing some common characteristics. Because of privacy issues and financial costs, it is not possible to measure the electricity consumption curve of each customer, so these mean curves are estimated using samples. In this thesis, we extend the work of Lardin (2012) on mean curve estimation by sampling by focusing on specific aspects of this problem such as robustness to influential units, small area estimation and estimation in presence of partially or totally unobserved curves. In order to build robust estimators of mean curves we adapt the unified approach to robust estimation in finite population proposed by Beaumont et al. (2013) to the context of functional data. To that purpose we propose three approaches: application of the usual method for real variables on discretised curves, projection on Functional Spherical Principal Components or on a wavelet basis, and functional truncation of conditional biases based on the notion of depth. These methods are tested and compared to each other on real datasets and Mean Squared Error estimators are also proposed. Secondly we address the problem of small area estimation for functional means or totals. 
We introduce three methods: unit level linear mixed model applied on the scores of functional principal components analysis or on wavelet coefficients, functional regression and aggregation of individual curves predictions by functional regression trees or functional random forests. Robust versions of these estimators are then proposed by following the approach to robust estimation based on conditional bias presented before. Finally, we suggest four estimators of mean curves by sampling in presence of partially or totally unobserved trajectories. The first estimator is a reweighting estimator where the weights are determined using a temporal non parametric kernel smoothing adapted to the context of finite population and missing data and the other ones rely on imputation of missing data. Missing parts of the curves are determined either by using the smoothing estimator presented before, or by nearest neighbours imputation adapted to functional data or by a variant of linear interpolation which takes into account the mean trajectory of the entire sample. Variance approximations are proposed for each method and all the estimators are compared to each other on real datasets for various missing data scenarios. Arbres de régression Biais conditionnels Données fonctionnelles Données manquantes Estimation sur petits domaines Estimateurs à noyau Forêts aléatoires Modèles linéaires mixtes Plus proches voisins Robustesse Sondage Conditional bias Functional data Kernel estimators Missing data Linear mixed models Nearest neighbours Random forests Regression trees Robustness Small area estimation Survey sampling 510
76	Modelagem espacial, temporal e longitudinal: diferentes abordagens do estudo da leptospirose urbana / Space, time and longitudinal modeling: different approaches for the urban leptospirosis study Tassinari, Wagner de Souza January 2009 (...) O objetivo desta tese foi modelar os fatores de risco associados à ocorrência de leptospirose urbana em diferentes contextos, com especial atenção para aspectos espaciais e temporais. Foram utilizadas técnicas de modelagem tais como, modelos generalizados aditivos e mistos. Também explorou-se técnicas de detecção de aglomerados espaço-temporais. (...) / Leptospirosis, a disease caused by pathogenic spirochetes of the genus Leptospira, is one of the most widespread zoonoses in the world and is considered a major public health problem associated with lack of sanitation and poverty. It is endemic in Brazil; surveillance data show that outbreaks of leptospirosis occur as cyclical annual epidemics during periods of rainfall. The aim of this thesis was to model the risk factors associated with the occurrence of leptospirosis in different urban contexts, with particular attention to spatial and temporal aspects. Modeling techniques such as generalized additive and mixed models were used. Techniques for detecting space-time clusters were also explored. This thesis prioritized the use of free software: R, the Ubuntu Linux operating system, LaTeX and SaTScan (which is free but not open source). The thesis was prepared in the form of three articles. The first article presents a spatio-temporal analysis of the occurrence of leptospirosis cases in Rio de Janeiro between 1997 and 2002. Using the space-time cluster ("outbreak") detection method, only the clusters that occurred in 1997 and 1998 were statistically significant. 
Generalized Linear Mixed Models were used to evaluate the risk factors associated with the occurrence of outbreak cases compared with endemic cases. The cases belonging to the outbreaks are associated with the occurrence of rainfall over 4 mm (OR, 3.71; 95% CI, 1.83 - 7.51). There were no significant associations with socioeconomic covariates; in other words, whether endemic or epidemic, leptospirosis occurs in the same population. The second and third articles examined a seroprevalence survey and a seroconversion cohort conducted in the Pau da Lima community, Salvador, Bahia. In both, Generalized Additive Models were used to fit the exposure variables in both the individual and peridomicile contexts, as well as to estimate the spatial area of leptospirosis risk. The significant variables were: gender, age, presence of rats in the peridomicile, domicile near a trash collection point or an open sewer, and domicile altitude above sea level. The studies show that individual and contextual variables explain much of the spatial variability of leptospirosis, but there are still factors that were not measured and that should be investigated. The risk maps of seroprevalence and seroconversion show distinct regions where the spatial effect is significantly different from the global average. There is still a lack of robust integration between the professionals who develop and operate GIS, epidemiologists and biostatisticians. Such integration would represent an important advance, enabling the development and use of these techniques in support of Public Health. The study of prevalence and incidence in endemic areas, in the leptospirosis context, is very complex and still developing. 
Bringing together specialists from several areas of human knowledge (e.g., clinicians, epidemiologists, geographers, biologists, statisticians, engineers) is essential to advance knowledge about the disease and its relationship to social inequality and the environment, as well as to contribute to the creation of efficient and effective measures to control endemic diseases. Epidemiologia Ambiental Analise Espacial Modelos Lineares Generalizados Mistos Modelos Aditivos Generalizados Environmental Epidemiology Spatial Analysis Generalized Linear Mixed Models Generalized additive models Leptospirose/epidemiologia Leptospirose/transmissão Distribuição Espacial da População Zonas Urbanas -Modelos Lineares Conglomerados Espaço-Temporais Prevalência Fatores de Risco Brasil
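The odds ratio and interval reported in this abstract for rainfall over 4 mm (OR 3.71; 95% CI 1.83 - 7.51) can be sanity-checked with a small sketch. It assumes a Wald-type interval that is symmetric on the log scale with z of about 1.96, which the abstract does not state; the variable names are illustrative.

```python
import math

# Values reported in the abstract (rainfall over 4 mm).
odds_ratio, ci_low, ci_high = 3.71, 1.83, 7.51
z = 1.96  # approximate 97.5th percentile of the standard normal

# Assumption: Wald interval, i.e. log(OR) +/- z * SE on the log scale.
log_se = (math.log(ci_high) - math.log(ci_low)) / (2 * z)

# Under that assumption the geometric mean of the limits recovers the OR.
or_from_ci = math.sqrt(ci_low * ci_high)

print(f"implied SE of log(OR): {log_se:.3f}")
print(f"OR implied by the interval: {or_from_ci:.2f}")
```

If the implied value agrees with the reported point estimate after rounding, the interval is consistent with a log-scale Wald construction; a mismatch would suggest a different interval method.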
77	Structural equation models applied to quantitative genetics / Modelos de equações estruturais aplicados à genética quantitativa Pedro Henrique Ramos Cerqueira 03 September 2015 Causal models have been used in different areas of knowledge to understand the causal associations between variables. Over the past decades, the number of studies using these models has grown considerably, especially those related to biological systems, where studying and learning causal relationships among traits is essential for predicting the consequences of interventions in such systems. Graph analysis (GA) and structural equation modeling (SEM) are tools used to explore such associations. While GA allows searching for causal structures that express qualitatively how variables are causally connected, fitting an SEM with a known causal structure allows the magnitude of causal effects to be inferred. SEM can also be viewed as multiple regression models in which response variables can be explanatory variables for others. In quantitative genetics studies, SEM aims to study the direct and indirect genetic effects associated with individuals through information related to them beyond the observed characteristics, such as kinship relations. Those studies typically assume linear relationships among traits. However, in some scenarios nonlinear relationships can be observed, which makes that assumption unsuitable. To overcome this limitation, this work proposes a mixed-effects polynomial structural equation model, of second or higher degree, to model those nonlinear relationships. Two studies were developed: a simulation and an application to real data. The first study involved the simulation of 50 data sets with a fully recursive causal structure involving three characteristics, in which linear and nonlinear causal relations between them were allowed. The second study involved the analysis of traits related to dairy cows of the Holstein breed. 
Phenotypic relationships between traits were calving difficulty, gestation length and also the proportion of perionatal death. We compare the model of multiple traits and polynomials structural equations models, under different polynomials degrees in order to assess the benefits of the SEM polynomial of second or higher degree. For some situations the inappropriate assumption of linearity results in poor predictions of the direct, indirect and total of the genetic variances and covariance, either overestimating, underestimating, or even assign opposite signs to covariances. Therefore, we conclude that the inclusion of a polynomial degree increases the SEM expressive power. / Modelos causais têm sido muitos utilizados em estudos em diferentes áreas de conhecimento, a fim de compreender as associações ou relações causais entre variáveis. Durante as últimas décadas, o uso desses modelos têm crescido muito, especialmente estudos relacionados à sistemas biológicos, uma vez que compreender as relações entre características são essenciais para prever quais são as consequências de intervenções em tais sistemas. Análise do grafo (AG) e os modelos de equações estruturais (MEE) são utilizados como ferramentas para explorar essas relações. Enquanto AG nos permite buscar por estruturas causais, que representam qualitativamente como as variáveis são causalmente conectadas, ajustando o MEE com uma estrutura causal conhecida nos permite inferir a magnitude dos efeitos causais. Os MEE também podem ser vistos como modelos de regressão múltipla em que uma variável resposta pode ser vista como explanatória para uma outra característica. Estudos utilizando MEE em genética quantitativa visam estudar os efeitos genéticos diretos e indiretos associados aos indivíduos por meio de informações realcionadas aos indivíduas, além das característcas observadas, como por exemplo o parentesco entre eles. 
Neste contexto, é tipicamente adotada a suposição que as características observadas são relacionadas linearmente. No entanto, para alguns cenários, relações não lineares são observadas, o que torna as suposições mencionadas inadequadas. Para superar essa limitação, este trabalho propõe o uso de modelos de equações estruturais de efeitos polinomiais mistos, de segundo grau ou seperior, para modelar relações não lineares. Neste trabalho foram desenvolvidos dois estudos, um de simulação e uma aplicação a dados reais. O primeiro estudo envolveu a simulação de 50 conjuntos de dados, com uma estrutura causal completamente recursiva, envolvendo 3 características, em que foram permitidas relações causais lineares e não lineares entre as mesmas. O segundo estudo envolveu a análise de características relacionadas ao gado leiteiro da raça Holandesa, foram utilizadas relações entre os seguintes fenótipos: dificuldade de parto, duração da gestação e a proporção de morte perionatal. Nós comparamos o modelo misto de múltiplas características com os modelos de equações estruturais polinomiais, com diferentes graus polinomiais, a fim de verificar os benefícios do MEE polinomial de segundo grau ou superior. Para algumas situações a suposição inapropriada de linearidade resulta em previsões pobres das variâncias e covariâncias genéticas diretas, indiretas e totais, seja por superestimar, subestimar, ou mesmo atribuir sinais opostos as covariâncias. Portanto, verificamos que a inclusão de um grau de polinômio aumenta o poder de expressão do MEE. Amostrador de Gibbs Gado leiteiro da raça Holandesa Genética quantitativa Inferência bayesiana Modelos de equações estruturais Modelos lineares mistos Modelos mistos multi característicos Regressão polinomial Bayesian inference Gibbs sampler Holstein dairy cattle Linear mixed models Multiple trait mixed models Polynomial regression Quantitative genetics Structural equation models
78	Assessing the robustness of genetic codes and genomes Sautié Castellanos, Miguel 06 1900 (has links) Deux approches principales existent pour évaluer la robustesse des codes génétiques et des séquences de codage. L'approche statistique est basée sur des estimations empiriques de probabilité calculées à partir d'échantillons aléatoires de permutations représentant les affectations d'acides aminés aux codons, alors que l'approche basée sur l'optimisation repose sur le pourcentage d’optimisation, généralement calculé en utilisant des métaheuristiques. Nous proposons une méthode basée sur les deux premiers moments de la distribution des valeurs de robustesse pour tous les codes génétiques possibles. En se basant sur une instance polynomiale du Problème d'Affectation Quadratique, nous proposons un algorithme vorace exact pour trouver la valeur minimale de la robustesse génomique. Pour réduire le nombre d'opérations de calcul des scores et de la borne supérieure de Cantelli, nous avons développé des méthodes basées sur la structure de voisinage du code génétique et sur la comparaison par paires des codes génétiques, entre autres. Pour calculer la robustesse des codes génétiques naturels et des génomes procaryotes, nous avons choisi 23 codes génétiques naturels, 235 propriétés d'acides aminés, ainsi que 324 procaryotes thermophiles et 418 procaryotes non thermophiles. Parmi nos résultats, nous avons constaté que bien que le code génétique standard soit plus robuste que la plupart des codes génétiques, certains codes génétiques mitochondriaux et nucléaires sont plus robustes que le code standard aux troisièmes et premières positions des codons, respectivement. Nous avons observé que l'utilisation des codons synonymes tend à être fortement optimisée pour amortir l'impact des changements d'une seule base, principalement chez les procaryotes thermophiles. / There are two main approaches to assess the robustness of genetic codes and coding sequences. 
The statistical approach is based on empirical estimates of probabilities computed from random samples of permutations representing assignments of amino acids to codons, whereas the optimization-based approach relies on the optimization percentage, frequently computed by using metaheuristics. We propose a method based on the first two moments of the distribution of robustness values for all possible genetic codes. Based on a polynomially solvable instance of the Quadratic Assignment Problem, we also propose an exact greedy algorithm to find the minimum value of the genome robustness. To reduce the number of operations for computing the scores and Cantelli’s upper bound, we developed methods based on the genetic code neighborhood structure and pairwise comparisons between genetic codes, among others. For assessing the robustness of natural genetic codes and genomes, we have chosen 23 natural genetic codes, 235 amino acid properties, as well as 324 thermophilic and 418 non-thermophilic prokaryotes. Among our results, we found that although the standard genetic code is more robust than most genetic codes, some mitochondrial and nuclear genetic codes are more robust than the standard code at the third and first codon positions, respectively. We also observed that the synonymous codon usage tends to be highly optimized to buffer the impact of single-base changes, mainly in thermophilic prokaryotes. Quadratic assignment problem Cantelli’s upper bound Generalized linear mixed models Genetic code Hydrophobicity Thermophiles Codon usage bias Problème d'affectation quadratique Borne supérieure de Cantelli Code génétique Hydrophobicité Thermophiles Biais d'utilisation des codons
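For reference, the one-sided Chebyshev (Cantelli) inequality that underlies a bound built from the first two moments can be written as follows. The symbols are assumptions made for illustration: X is the robustness score of a randomly chosen genetic code with mean \mu and variance \sigma^2, and x_0 is the score of the code under study, with lower scores taken to mean greater robustness. How the thesis applies the bound in detail is not stated in the abstract.

```latex
\[
  P\left(X - \mu \le -\lambda\right) \;\le\; \frac{\sigma^{2}}{\sigma^{2} + \lambda^{2}},
  \qquad \lambda > 0 .
\]
% Setting \lambda = \mu - x_0 for a code whose score x_0 lies below the mean:
\[
  P\left(X \le x_0\right) \;\le\; \frac{\sigma^{2}}{\sigma^{2} + (\mu - x_0)^{2}},
  \qquad x_0 < \mu .
\]
```

The bound needs only the mean and variance of the score distribution, so it avoids drawing random samples of codes, at the cost of being conservative.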
79	Modélisation et optimisation de la réponse à des vaccins et à des interventions immunothérapeutiques : application au virus Ebola et au VIH / Modeling and optimizing the response to vaccines and immunotherapeutic interventions : application to Ebola virus and HIV Pasin, Chloé 30 October 2018 (has links) Les vaccins ont été une grande réussite en matière de santé publique au cours des dernières années. Cependant, le développement de vaccins efficaces contre les maladies infectieuses telles que le VIH ou le virus Ebola reste un défi majeur. Cela peut être attribué à notre manque de connaissances approfondies en immunologie et sur le mode d'action de la mémoire immunitaire. Les modèles mathématiques peuvent aider à comprendre les mécanismes de la réponse immunitaire, à quantifier les processus biologiques sous-jacents et à développer des vaccins fondés sur un rationnel scientifique. Nous présentons un modèle mécaniste de la dynamique de la réponse immunitaire humorale après injection d'un vaccin Ebola basé sur des équations différentielles ordinaires. Les paramètres du modèle sont estimés par maximum de vraisemblance dans une approche populationnelle qui permet de quantifier le processus de la réponse immunitaire et ses facteurs de variabilité. En particulier, le schéma vaccinal n'a d'impact que sur la réponse à court terme, alors que des différences significatives entre des sujets de différentes régions géographiques sont observées à plus long terme. Cela pourrait avoir des implications dans la conception des futurs essais cliniques. Ensuite, nous développons un outil numérique basé sur la programmation dynamique pour optimiser des schémas d'injections répétées. En particulier, nous nous intéressons à des patients infectés par le VIH sous traitement mais incapables de reconstruire leur système immunitaire. Des injections répétées d'un produit immunothérapeutique (IL-7) sont envisagées pour améliorer la santé de ces patients. 
Le processus est modélisé par un modèle de Markov déterministe par morceaux et des résultats récents de la théorie du contrôle impulsionnel permettent de résoudre le problème numériquement à l'aide d'une suite itérative. Nous montrons dans une preuve de concept que cette méthode peut être appliquée à un certain nombre de pseudo-patients. Dans l'ensemble, ces résultats s'intègrent dans un effort de développer des méthodes sophistiquées pour analyser les données d'essais cliniques afin de répondre à des questions cliniques concrètes. / Vaccines have been one of the most successful developments in public health in recent years. However, a major challenge still lies in developing effective vaccines against infectious diseases such as HIV or Ebola virus. This can be attributed to our lack of deep knowledge of immunology and of the mode of action of immune memory. Mathematical models can help to understand the mechanisms of the immune response, to quantify the underlying biological processes and eventually to develop vaccines based on a solid rationale. First, we present a mechanistic model, based on ordinary differential equations, for the dynamics of the humoral immune response following Ebola vaccine immunizations. The parameters of the model are estimated by likelihood maximization in a population approach, which makes it possible to quantify the process of the immune response and its factors of variability. In particular, the vaccine regimen is found to affect only the short-term response, while significant differences between subjects from different geographic locations are found in the longer term. This could have implications for the design of future clinical trials. Then, we develop a numerical tool based on dynamic programming for optimizing the schedule of repeated injections. In particular, we focus on HIV-infected patients who are under treatment but unable to restore their immune system. 
Repeated injections of an immunotherapeutic product (IL-7) are considered for improving the health of these patients. The process is modeled by a piecewise deterministic Markov model, and recent results from impulse control theory allow the problem to be solved numerically with an iterative sequence. We show in a proof of concept that this method can be applied to a number of pseudo-patients. Altogether, these results are part of an effort to develop sophisticated methods for analyzing data from clinical trials to answer concrete clinical questions. Modélisation mécaniste Equations différentielles ordinaires Maximisation de la vraisemblance Modèles linéaires mixtes Contrôle optimal Programmation dynamique Vaccin Ebola Réponse immunitaire VIH Immunothérapie Mechanistic modeling Ordinary differential equations Likelihood maximization Linear mixed models Optimal control Piecewise deterministic Markov processes Dynamic programming Vaccine Ebola Immune response HIV Immunotherapy
80	Modélisation conjointe de trajectoire socioprofessionnelle individuelle et de la survie globale ou spécifique / Joint modeling of individual socio-professional trajectory and overall or cause-specific survival Karimi, Maryam 06 June 2016 (has links) Appartenir à une catégorie socio-économique moins élevée est généralement associé à une mortalité plus élevée pour de nombreuses causes de décès. De précédentes études ont déjà montré l’importance de la prise en compte des différentes dimensions des trajectoires socio-économiques au cours de la vie. L’analyse des trajectoires professionnelles constitue une étape importante pour mieux comprendre ces phénomènes. L’enjeu pour mesurer l’association entre les parcours de vie des trajectoires socio-économiques et la mortalité est de décomposer la part respective de ces facteurs dans l’explication du niveau de survie des individus. La complexité de l’interprétation de cette association réside dans la causalité bidirectionnelle qui la sous-tend: Les différentiels de mortalité sont-ils dus à des différentiels d’état de santé initial influençant conjointement la situation professionnelle et la mortalité, ou l’évolution professionnelle influence-t-elle directement l’état de santé puis la mortalité? Les méthodes usuelles ne tiennent pas compte de l’interdépendance des changements de situation professionnelle et de la bidirectionnalité de la causalité qui conduit à un biais important dans l’estimation du lien causal entre situation professionnelle et mortalité. Par conséquent, il est nécessaire de proposer des méthodes statistiques qui prennent en compte des mesures répétées (les professions) simultanément avec les variables de survie. 
Cette étude est motivée par la base de données Cosmop-DADS qui est un échantillon de la population salariée française. Le premier objectif de cette thèse était d’examiner l’ensemble des trajectoires professionnelles avec une classification professionnelle précise, au lieu d’utiliser un nombre limité d’états dans un parcours professionnel qui a été considéré précédemment. A cet effet, nous avons défini des variables dépendantes du temps afin de prendre en compte différentes dimensions des trajectoires professionnelles, à travers des modèles dits de "life-course", à savoir critical period, accumulation model et social mobility model, et nous avons mis en évidence l’association entre les trajectoires professionnelles et la mortalité par cause en utilisant ces variables dans un modèle de Cox. Le deuxième objectif a consisté à intégrer les épisodes professionnels comme un sous-modèle longitudinal dans le cadre des modèles conjoints pour réduire le biais issu de l’inclusion des covariables dépendantes du temps endogènes dans le modèle de Cox. Nous avons proposé un modèle conjoint pour les données longitudinales nominales et des données de risques concurrents dans une approche basée sur la vraisemblance. En outre, nous avons proposé une approche de type méta-analyse pour résoudre les problèmes liés au temps des calculs dans les modèles conjoints appliqués à l’analyse des grandes bases de données. Cette approche consiste à combiner les résultats issus d’analyses effectuées sur les échantillons stratifiés indépendants. Dans la même perspective de l’utilisation du modèle conjoint sur les grandes bases de données, nous avons proposé une procédure basée sur l’avantage computationnel de la régression de Poisson. Cette approche consiste à trouver les trajectoires types à travers les méthodes de la classification, et d’appliquer le modèle conjoint sur ces trajectoires types. / Being in a low socioeconomic position is associated with increased mortality risk from various causes of death. 
Previous studies have already shown the importance of considering different dimensions of socioeconomic trajectories across the life course. Analyses of professional trajectories constitute a crucial step toward better understanding the association between socioeconomic position and mortality. The main challenge in measuring this association is then to decompose the respective share of these factors in explaining the survival level of individuals. The complexity lies in the bidirectional causality underlying the observed associations: Are mortality differentials due to differences in initial health conditions that jointly influence employment status and mortality, or does the professional trajectory directly influence health conditions and then mortality? Standard methods do not consider the interdependence of changes in occupational status or the bidirectional causal effect underlying the observed association, which leads to substantial bias in estimating the causal link between professional trajectory and mortality. Therefore, it is necessary to propose statistical methods that simultaneously consider repeated measurements (careers) and survival variables. This study was motivated by the Cosmop-DADS database, which is a sample of the French salaried population. The first aim of this dissertation was to consider whole professional trajectories with an accurate occupational classification, instead of using a limited number of stages during the life course and a simple occupational classification, as has been done previously. 
For this purpose, we defined time-dependent variables to capture different life-course dimensions, namely critical period, accumulation model and social mobility model, and we highlighted the association between professional trajectories and cause-specific mortality using the defined variables in a Cox proportional hazards model. The second aim was to incorporate the employment episodes as a longitudinal sub-model within the joint model framework to reduce the bias resulting from the inclusion of internal time-dependent covariates in the Cox model. We proposed a joint model for longitudinal nominal outcomes and competing risks data in a likelihood-based approach. In addition, we proposed an approach mimicking meta-analysis to address the computation-time problems of joint models applied to large datasets, by extracting independent stratified samples from the large dataset, applying the joint model to each sample and then combining the results. With the same objective, namely fitting the joint model to large-scale data, we proposed a procedure based on the computational advantage of the Poisson regression model. This approach consists of finding representative trajectories by means of clustering methods and then applying the joint model to these representative trajectories. Modèles conjoints Données longitudinales Risques concurrents Risque cause-spécifique Modèle de Cox Algorithme EM Maximum de vraisemblance Régression de Poisson Effets aléatoires Joint models Generalized linear mixed models Longitudinal data Competing risks Cause-specific hazard Cox model EM algorithm Maximum likelihood Poisson regression Random effects
