Global ETD Search

31	Modelos de fronteira estocástica: uma abordagem bayesiana / Stochastic frontier models: a bayesian approach Cespedes, Juliana Garcia 24 July 2008 (has links) A firma é o principal agente econômico para a produção e distribuição de bens e serviços. Seu constante investimento em melhorias e o aperfeiçoamento de sua capacidade produtiva, visando tornar-se cada vez mais eficiente, transforma-se em um determinante central do bem estar econômico da sociedade. O processo de medir a ineficiência de firmas baseia-se em análises de fronteiras, onde a ineficiência é medida como a distância entre os pontos observados da variável resposta e a função de produção, custo ou lucro verdadeiras, dependendo do modelo assumido para descrever a variável resposta. Existe uma variedade de formas funcionais para essas funções e algumas vezes é difícil julgar qual delas deve ser escolhida, visto que a forma verdadeira é desconhecida e pode ser somente aproximada. Em geral, na literatura, dados de produção são analisados assumindo-se modelos multiplicativos que impõem a restrição de que a produção é estritamente positiva e utiliza-se a transformação logarítmica para linearizar o modelo. Considera-se que o logaritmo do produto dada a ineficiência técnica tem distribuição contínua, independentemente de os dados serem contínuos ou discretos. 
A tese divide-se em dois artigos: o primeiro utiliza a inferência bayesiana para estimar a eficiência econômica de firmas utilizando os modelos de fronteira estocástica de custo com forma funcional flexível Fourier, que asseguram um bom ajuste para a fronteira, sendo fundamental para o cálculo da ineficiência econômica; o segundo artigo propõe os modelos generalizados de fronteira estocástica, baseando-se nos modelos lineares generalizados mistos com a abordagem bayesiana, para quantificar a ineficiência técnica de firmas (medida de incerteza) utilizando a variável resposta na escala original e distribuições pertencentes à família exponencial para a variável resposta dada a medida de ineficiência. / The firm is the main economic agent in the production and distribution of goods and services. Its constant investment in improvements and in enhancing its productive capacity, with the aim of becoming ever more efficient, is a central determinant of society's economic welfare. Measuring the inefficiency of firms is based on frontier analysis, in which inefficiency is measured as the distance between the observed values of the response variable and the true production, cost or profit function, depending on the model assumed for the response variable. There is a variety of functional forms for these functions, and it is sometimes difficult to judge which should be chosen, since the true form is unknown and can only be approximated. In the literature, production data are generally analyzed assuming multiplicative models, which impose the restriction that production is strictly positive, and the logarithmic transformation is used to linearize the model. The logarithm of output, given the technical inefficiency, is assumed to have a continuous distribution, regardless of whether the data are continuous or discrete. 
The thesis consists of two papers. The first uses Bayesian inference to estimate the economic efficiency of firms through stochastic cost frontier models with the Fourier flexible functional form, which ensures a good fit for the frontier, an essential requirement for calculating economic inefficiency. The second proposes generalized stochastic frontier models, based on generalized linear mixed models under the Bayesian approach, to quantify the technical inefficiency of firms (a measure of uncertainty), using the response variable on its original scale and exponential-family distributions for the response variable given the inefficiency measure. Análise de Fourier Bayesian inference Econometria Economia - Eficiência Economic efficiency Fourier cost function Generalized linear mixed models. Modelos lineares Programação estocástica. Stochastic frontier models Technical efficiency
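The log-linear setting that the abstract describes for production data is commonly written as below. This is a generic textbook sketch, not the thesis's own specification: the symbols $y_i$ (output of firm $i$), $\mathbf{x}_i$ (log inputs), $\boldsymbol{\beta}$, $v_i$ (symmetric noise) and $u_i$ (one-sided inefficiency) are assumed notation.

```latex
\ln y_i = \mathbf{x}_i^{\top}\boldsymbol{\beta} + v_i - u_i,
\qquad u_i \ge 0,
\qquad \mathrm{TE}_i = \exp(-u_i) \in (0, 1]
```

For a cost frontier the inefficiency term enters with the opposite sign ($+\,u_i$), since inefficiency raises cost above the frontier.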
33	Modelos lineares mistos e generalizados mistos em estudos de adaptação local e plasticidade fenotípica de Euterpe edulis / Linear mixed models and generalized mixed models applied in studies of local adaptation and phenotypic plasticity of Euterpe edulis Ezequiel Abraham López Bautista 18 June 2014 (has links) Este trabalho objetivou a avaliação da presença de plasticidade fenotípica e de adaptação local de três procedências de palmiteiro: Ombrófila Densa, Estacional Semidecidual e Restinga, em três locais no Estado de São Paulo: Parque Estadual da Ilha do Cardoso, Parque Estadual de Carlos Botelho e Estação Ecológica dos Caetetus, em ensaios de adaptação no estabelecimento (ou de semeadura) e de adaptação em juvenis (ou de crescimento). Os conjuntos de dados foram analisados utilizando estruturas de grupos de experimentos, com efeitos cruzados e aninhados. As variáveis relacionadas com a massa de matéria seca das plantas, nos dois ensaios, foram analisadas usando a abordagem de modelos lineares de efeitos mistos, por meio da incorporação de fatores de efeito aleatório, e fazendo uso do método da máxima verossimilhança restrita (REML) para estimação dos componentes de variância associados a tais fatores com um menor viés. Por outro lado, para a proporção de sementes germinadas, no ensaio de adaptação no estabelecimento, a análise estatística foi realizada a partir da abordagem dos modelos lineares generalizados mistos, sob a pressuposição de que a variável segue uma distribuição binomial, com função de ligação logito. O método da pseudo-verossimilhança foi empregado para obtenção da solução das equações de verossimilhança. Os resultados mostraram que as plantas originadas de sementes dos três biomas avaliados apresentaram um comportamento plástico, para todos os caracteres avaliados no ensaio de adaptação no estabelecimento. 
Com relação ao ensaio de adaptação em juvenis, a característica de plasticidade foi verificada somente para a massa de matéria seca da folha em plantas provenientes do bioma Estacional Semidecidual. A característica de adaptação local, apresentou-se de forma evidente no ensaio de adaptação no estabelecimento. Estes resultados evidenciaram que em cada local avaliado, as plantas originadas das sementes de diferentes procedências apresentaram um comportamento diferenciado nos caracteres relacionados à massa de matéria seca, podendo em alguns casos, tratar-se de adaptação local. Concluiu-se que os locais Carlos Botelho e Ilha do Cardoso são os mais favoráveis para a germinação das sementes de sua mesma procedência. / The aim of this work was to evaluate the presence of phenotypic plasticity and local adaptation in three provenances of the palm species Euterpe edulis (Atlantic Rainforest, Seasonally Dry Forest and Restinga Forest), in permanent plots located in three forest types of São Paulo State, Brazil (Parque Estadual da Ilha do Cardoso, Parque Estadual de Carlos Botelho and Estação Ecológica dos Caetetus), in seedling establishment and juvenile growth experiments. The data sets were analyzed as groups of experiments, with crossed and nested effects. The variables related to plant dry mass in both assays were analyzed using the linear mixed model (LMM) approach, incorporating random-effect factors and using restricted maximum likelihood (REML) to estimate the variance components associated with these factors with less bias. The proportion of germinated seeds in the seedling establishment assay, on the other hand, was analyzed using the generalized linear mixed model (GLMM) approach, under the assumption that the variable follows a binomial distribution, with a logit link function. The pseudo-likelihood (PL) method was used to obtain the numerical solution of the likelihood equations. 
The results showed that plants grown from seeds of the three biomes evaluated exhibited plastic behavior for all traits assessed in the seedling establishment assay. In the juvenile adaptation assay, phenotypic plasticity was observed only for leaf dry mass in plants from the Seasonally Dry Forest biome. Local adaptation was clearly observed in the seedling establishment assay. These results showed that, at each site evaluated, plants originating from seeds of different provenances behaved differently in the traits related to dry mass, which in some cases may reflect local adaptation. It was concluded that the Carlos Botelho and Ilha do Cardoso sites are the most favorable for the germination of seeds of their own provenance. Adaptação local Análise de grupos de experimentos Modelos lineares generalizados mistos Modelos lineares mistos Plasticidade fenotípica Generalized linear mixed models Joint analysis from agronomical essays Linear mixed models Local adaptation Phenotypic plasticity
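The binomial GLMM with logit link mentioned in this abstract has the generic form sketched below. The notation is assumed for illustration and is not taken from the thesis: $y_{ij}$ is the number of germinated seeds out of $n_{ij}$ sown, $\mathbf{x}_{ij}$ and $\mathbf{z}_{ij}$ are fixed- and random-effect design vectors, and $\mathbf{G}$ is the random-effects covariance matrix.

```latex
y_{ij} \mid \mathbf{u} \sim \mathrm{Binomial}(n_{ij}, \pi_{ij}),
\qquad
\log\frac{\pi_{ij}}{1 - \pi_{ij}} = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + \mathbf{z}_{ij}^{\top}\mathbf{u},
\qquad
\mathbf{u} \sim \mathcal{N}(\mathbf{0}, \mathbf{G})
```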
34	Implementing SAE Techniques to Predict Global Spectacles Needs Zhang, Yuxue January 2023 (has links) This study applies Small Area Estimation (SAE) techniques to improve the accuracy of predictions of the global need for assistive spectacles. It employs a range of predictive models: Linear Regression (LR), Empirical Best Linear Unbiased Prediction (EBLUP), hglm (from the R package) with a Conditional Autoregressive (CAR) structure, and Generalized Linear Mixed Models (GLMM). In the final phase, the prediction of global spectacle needs involves several steps, including random effects simulation, coefficient extraction from the GLMM estimates, and log-linear modeling. The investigation incorporates area-level modeling, spatial correlation analysis, and relative standard error to assess their impact on predictive accuracy. The GLMM consistently shows the lowest Relative Standard Error (RSE) values, close to zero, indicating precise but potentially overfitted results. The hglm model with CAR, by contrast, shows a narrower RSE range, typically below 25%, reflecting greater accuracy, although it produces a higher number of outliers. LR performs similarly to EBLUP, with RSE values reaching around 50% in certain scenarios and varying slightly across contexts. These findings underscore the trade-offs between precision and robustness across these models, especially for finer geographical levels and countries not included in the initial sample. small area estimation area-level model generalized linear mixed models Conditional Autoregressive spatial correlation spectacle needs assistive products auxiliary data hglm relative standard error simulation Probability Theory and Statistics Sannolikhetsteori och statistik
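The relative standard error used to compare the models is conventionally the standard error expressed as a percentage of the estimate. A minimal sketch under that assumption follows; the function name, the area names and the numbers are hypothetical and do not come from the thesis.

```python
def relative_standard_error(estimate: float, standard_error: float) -> float:
    """Return the RSE in percent: 100 * SE / |estimate|."""
    if estimate == 0:
        raise ValueError("RSE is undefined for a zero estimate")
    return 100.0 * standard_error / abs(estimate)


# Hypothetical area-level results (illustrative only): name -> (estimate, SE)
areas = {"area_A": (0.40, 0.02), "area_B": (0.25, 0.10)}

rse_by_area = {
    name: relative_standard_error(est, se) for name, (est, se) in areas.items()
}

for name, rse in rse_by_area.items():
    print(f"{name}: RSE = {rse:.1f}%")
```

A very small RSE signals a precise estimate, but, as the abstract notes for the GLMM, precision alone does not rule out overfitting.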
35	Avaliação de técnicas de diagnóstico para a análise de dados com medidas repetidas / Evaluation of diagnostic techniques for the analysis of data with repeated measures Kurusu, Ricardo Salles 26 April 2013 (has links) Dentre as possíveis propostas encontradas na literatura estatística para analisar dados oriundos de estudos com observações correlacionadas, estão os modelos condicionais e os modelos marginais. Diversas técnicas têm sido propostas para a análise de diagnóstico nesses modelos. O objetivo deste trabalho é apresentar algumas das técnicas de diagnóstico disponíveis para os dois tipos de modelos e avaliá-las por meio de estudos de simulação. As técnicas apresentadas também foram aplicadas em um conjunto de dados reais. / Conditional and marginal models are among the possibilities in statistical literature to analyze data from studies with correlated observations. Several techniques have been proposed for diagnostic analysis in these models. The objective of this work is to present some of the diagnostic techniques available for both modeling approaches and to evaluate them by simulation studies. The presented techniques were also applied in a real dataset. conditional models diagnostic techniques equações de estimação generalizadas generalized estimating equations generalized linear mixed models hierarchical generalized linear models linear mixed models marginal models medidas repetidas modelos condicionais modelos lineares generalizados mistos modelos lineares mistos modelos marginais repeated measures técnicas de diagnóstico
36	Modelagem espacial, temporal e longitudinal: diferentes abordagens do estudo da leptospirose urbana / Space, time and longitudinal modeling: different approaches for the urban leptospirosis study Tassinari, Wagner de Souza January 2009 (has links) Made available in DSpace on 2011-05-04T12:42:00Z (GMT). No. of bitstreams: 0 Previous issue date: 2009 / (...) O objetivo desta tese foi modelar os fatores de risco associados à ocorrência de leptospirose urbana em diferentes contextos, com especial atenção para aspectos espaciais e temporais. Foram utilizadas técnicas de modelagem tais como, modelos generalizados aditivos e mistos. Também explorou-se técnicas de detecção de aglomerados espaço-temporais. (...) / Leptospirosis, a disease caused by pathogenic spirochetes of the genus Leptospira, is one of the most widespread zoonoses in the world and is considered a major public health problem associated with poverty and lack of sanitation. It is endemic in Brazil, where surveillance data show that outbreaks of leptospirosis occur as cyclical annual epidemics during rainy periods. The aim of this thesis was to model the risk factors associated with the occurrence of urban leptospirosis in different contexts, with particular attention to spatial and temporal aspects. Modeling techniques such as generalized additive models and mixed models were used. Techniques for detecting space-time clusters were also explored. The thesis prioritized the use of free software: R, the Ubuntu Linux operating system, LaTeX and SaTScan (which is free but not open source). It was prepared in the form of three articles. The first article presents a spatio-temporal analysis of the occurrence of leptospirosis cases in Rio de Janeiro between 1997 and 2002. Using a method for detecting space-time clusters ("outbreaks"), only the clusters that occurred in 1997 and 1998 were statistically significant. 
Generalized Linear Mixed Models were used to evaluate the risk factors associated with cases belonging to outbreaks relative to endemic cases. Cases belonging to outbreaks were associated with rainfall above 4 mm (OR, 3.71; 95% CI, 1.83-7.51). There were no significant associations with socioeconomic covariates; in other words, whether endemic or epidemic, leptospirosis occurs in the same population. The second and third articles examined a seroprevalence survey and a seroconversion cohort conducted in the Pau da Lima community, Salvador, Bahia. In both, Generalized Additive Models were used to fit the exposure variables at both the individual and peridomicile levels, as well as to estimate the spatial area of leptospirosis risk. The significant variables were gender, age, presence of rats in the peridomicile, residence near a trash collection point or an open sewer, and altitude of the residence above sea level. The studies show that individual and contextual variables explain much of the spatial variability of leptospirosis, but that there remain unmeasured factors that should be investigated. The risk maps for seroprevalence and seroconversion show distinct regions where the spatial effect differs significantly from the global average. A more robust integration is still lacking among the professionals who develop and operate GIS, epidemiologists and biostatisticians. Such integration would be an important advance, enabling the development and use of these techniques in support of public health. The study of prevalence and incidence in endemic areas, in the context of leptospirosis, is very complex and still developing. 
Bringing together specialists from several areas of knowledge (e.g., clinicians, epidemiologists, geographers, biologists, statisticians, engineers) is essential to advance knowledge about the disease and its relationship to social inequality and the environment, as well as to contribute to the creation of efficient and effective measures to control endemic diseases. Epidemiologia Ambiental Analise Espacial Modelos Lineares Generalizados Mistos Modelos Aditivos Generalizados Environmental Epidemiology Spatial Analysis Generalized Linear Mixed Models Generalized additive models Leptospirose/epidemiologia Leptospirose/transmissão Distribuição Espacial da População Zonas Urbanas -Modelos Lineares Conglomerados Espaço-Temporais Prevalência Fatores de Risco Brasil
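The odds ratio reported in this abstract (OR 3.71; 95% CI 1.83 to 7.51) can be read on the log scale, where a Wald-type interval is symmetric around the point estimate. The sketch below assumes the interval is of that type, which the abstract does not state; the function names are hypothetical.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def se_log_or_from_ci(lower: float, upper: float, z: float = Z_95) -> float:
    """Standard error of log(OR) implied by a Wald interval on the log scale."""
    return (math.log(upper) - math.log(lower)) / (2.0 * z)


def or_from_ci(lower: float, upper: float) -> float:
    """Geometric midpoint of the interval, i.e. the OR a Wald interval is centred on."""
    return math.exp((math.log(lower) + math.log(upper)) / 2.0)


# Interval reported in the abstract for rainfall above 4 mm
se_log_or = se_log_or_from_ci(1.83, 7.51)
implied_or = or_from_ci(1.83, 7.51)

print(f"implied OR = {implied_or:.2f}, SE of log(OR) = {se_log_or:.2f}")
```

The geometric midpoint of the reported interval is close to the reported odds ratio, which is consistent with, but does not prove, a Wald-type interval.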
38	Assessing the robustness of genetic codes and genomes Sautié Castellanos, Miguel 06 1900 (has links) Deux approches principales existent pour évaluer la robustesse des codes génétiques et des séquences de codage. L'approche statistique est basée sur des estimations empiriques de probabilité calculées à partir d'échantillons aléatoires de permutations représentant les affectations d'acides aminés aux codons, alors que l'approche basée sur l'optimisation repose sur le pourcentage d’optimisation, généralement calculé en utilisant des métaheuristiques. Nous proposons une méthode basée sur les deux premiers moments de la distribution des valeurs de robustesse pour tous les codes génétiques possibles. En se basant sur une instance polynomiale du Problème d'Affectation Quadratique, nous proposons un algorithme vorace exact pour trouver la valeur minimale de la robustesse génomique. Pour réduire le nombre d'opérations de calcul des scores et de la borne supérieure de Cantelli, nous avons développé des méthodes basées sur la structure de voisinage du code génétique et sur la comparaison par paires des codes génétiques, entre autres. Pour calculer la robustesse des codes génétiques naturels et des génomes procaryotes, nous avons choisi 23 codes génétiques naturels, 235 propriétés d'acides aminés, ainsi que 324 procaryotes thermophiles et 418 procaryotes non thermophiles. Parmi nos résultats, nous avons constaté que bien que le code génétique standard soit plus robuste que la plupart des codes génétiques, certains codes génétiques mitochondriaux et nucléaires sont plus robustes que le code standard aux troisièmes et premières positions des codons, respectivement. Nous avons observé que l'utilisation des codons synonymes tend à être fortement optimisée pour amortir l'impact des changements d'une seule base, principalement chez les procaryotes thermophiles. / There are two main approaches to assess the robustness of genetic codes and coding sequences. 
The statistical approach is based on empirical estimates of probabilities computed from random samples of permutations representing assignments of amino acids to codons, whereas the optimization-based approach relies on the optimization percentage, usually computed with metaheuristics. We propose a method based on the first two moments of the distribution of robustness values over all possible genetic codes. Based on a polynomially solvable instance of the Quadratic Assignment Problem, we also propose an exact greedy algorithm to find the minimum value of genome robustness. To reduce the number of operations needed to compute the scores and Cantelli’s upper bound, we developed methods based on, among others, the neighborhood structure of the genetic code and pairwise comparisons between genetic codes. To assess the robustness of natural genetic codes and genomes, we chose 23 natural genetic codes, 235 amino acid properties, and 324 thermophilic and 418 non-thermophilic prokaryotes. Among our results, we found that although the standard genetic code is more robust than most genetic codes, some mitochondrial and nuclear genetic codes are more robust than the standard code at the third and first codon positions, respectively. We also observed that synonymous codon usage tends to be highly optimized to buffer the impact of single-base changes, mainly in thermophilic prokaryotes. Quadratic assignment problem Cantelli’s upper bound Generalized linear mixed models Genetic code Hydrophobicity Thermophiles Codon usage bias Problème d'affectation quadratique Borne supérieure de Cantelli Code génétique Hydrophobicité Thermophiles Biais d'utilisation des codons
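Cantelli's inequality bounds a one-sided tail probability using only the mean and variance, which is why a method built on the first two moments can use it. The sketch below shows the inequality itself; how the thesis applies it to robustness scores is not described in the abstract, so the toy numbers and the "lower score means more robust" convention are assumptions.

```python
def cantelli_upper_bound(variance: float, deviation: float) -> float:
    """Cantelli's bound: P(X - mean >= d) <= var / (var + d**2) for d > 0.

    By symmetry of the argument, the same bound holds for P(X - mean <= -d).
    """
    if variance < 0:
        raise ValueError("variance must be non-negative")
    if deviation <= 0:
        raise ValueError("deviation must be positive")
    return variance / (variance + deviation ** 2)


# Hypothetical moments of the score distribution over all codes (illustrative only)
mean_score = 10.0
var_score = 4.0
observed_score = 4.0  # assumed convention: a lower score means a more robust code

# Upper bound on the fraction of codes scoring at or below the observed code
bound = cantelli_upper_bound(var_score, mean_score - observed_score)
print(f"at most {bound:.0%} of codes can score this low")
```

Unlike an estimate from random samples of permutations, the bound holds for any distribution with the given mean and variance, at the cost of being conservative.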
39	Modélisation conjointe de trajectoire socioprofessionnelle individuelle et de la survie globale ou spécifique / Joint modeling of individual socio-professional trajectory and overall or cause-specific survival Karimi, Maryam 06 June 2016 (has links) Appartenir à une catégorie socio-économique moins élevée est généralement associé à une mortalité plus élevée pour de nombreuses causes de décès. De précédentes études ont déjà montré l’importance de la prise en compte des différentes dimensions des trajectoires socio-économiques au cours de la vie. L’analyse des trajectoires professionnelles constitue une étape importante pour mieux comprendre ces phénomènes. L’enjeu pour mesurer l’association entre les parcours de vie des trajectoires socio-économiques et la mortalité est de décomposer la part respective de ces facteurs dans l’explication du niveau de survie des individus. La complexité de l’interprétation de cette association réside dans la causalité bidirectionnelle qui la sous-tend : les différentiels de mortalité sont-ils dus à des différentiels d’état de santé initial influençant conjointement la situation professionnelle et la mortalité, ou l’évolution professionnelle influence-t-elle directement l’état de santé puis la mortalité ? Les méthodes usuelles ne tiennent pas compte de l’interdépendance des changements de situation professionnelle et de la bidirectionnalité de la causalité qui conduit à un biais important dans l’estimation du lien causal entre situation professionnelle et mortalité. Par conséquent, il est nécessaire de proposer des méthodes statistiques qui prennent en compte des mesures répétées (les professions) simultanément avec les variables de survie. 
Cette étude est motivée par la base de données Cosmop-DADS qui est un échantillon de la population salariée française. Le premier objectif de cette thèse était d’examiner l’ensemble des trajectoires professionnelles avec une classification professionnelle précise, au lieu d’utiliser un nombre limité d’états dans un parcours professionnel qui a été considéré précédemment. A cet effet, nous avons défini des variables dépendantes du temps afin de prendre en compte différentes dimensions des trajectoires professionnelles, à travers des modèles dits de "life-course", à savoir critical period, accumulation model et social mobility model, et nous avons mis en évidence l’association entre les trajectoires professionnelles et la mortalité par cause en utilisant ces variables dans un modèle de Cox. Le deuxième objectif a consisté à intégrer les épisodes professionnels comme un sous-modèle longitudinal dans le cadre des modèles conjoints pour réduire le biais issu de l’inclusion des covariables dépendantes du temps endogènes dans le modèle de Cox. Nous avons proposé un modèle conjoint pour les données longitudinales nominales et des données de risques concurrents dans une approche basée sur la vraisemblance. En outre, nous avons proposé une approche de type méta-analyse pour résoudre les problèmes liés au temps des calculs dans les modèles conjoints appliqués à l’analyse des grandes bases de données. Cette approche consiste à combiner les résultats issus d’analyses effectuées sur les échantillons stratifiés indépendants. Dans la même perspective de l’utilisation du modèle conjoint sur les grandes bases de données, nous avons proposé une procédure basée sur l’avantage computationnel de la régression de Poisson. Cette approche consiste à trouver les trajectoires types à travers les méthodes de la classification, et d’appliquer le modèle conjoint sur ces trajectoires types. / Being in a low socioeconomic position is associated with increased mortality risk from various causes of death. 
Previous studies have already shown the importance of considering different dimensions of socioeconomic trajectories across the life course. Analyzing professional trajectories is a crucial step toward better understanding the association between socioeconomic position and mortality. The main challenge in measuring this association is to decompose the respective share of these factors in explaining individuals' survival. The complexity lies in the bidirectional causality underlying the observed associations: are mortality differentials due to differences in initial health conditions that jointly influence employment status and mortality, or does the professional trajectory directly influence health conditions and then mortality? Standard methods do not account for the interdependence of changes in occupational status or for the bidirectional causality underlying the observed association, which leads to substantial bias in estimating the causal link between professional trajectory and mortality. It is therefore necessary to propose statistical methods that consider repeated measurements (occupations) simultaneously with survival variables. This study was motivated by the Cosmop-DADS database, which is a sample of the French salaried population. The first aim of this dissertation was to consider entire professional trajectories with a precise occupational classification, instead of the limited number of stages over the life course and the simple occupational classification considered previously. 
For this purpose, we defined time-dependent variables to capture different life-course dimensions, namely the critical period, accumulation and social mobility models, and we highlighted the association between professional trajectories and cause-specific mortality by using these variables in a Cox proportional hazards model. The second aim was to incorporate the employment episodes as a longitudinal sub-model within the joint model framework, to reduce the bias resulting from the inclusion of internal (endogenous) time-dependent covariates in the Cox model. We proposed a joint model for longitudinal nominal outcomes and competing risks data in a likelihood-based approach. In addition, we proposed an approach resembling meta-analysis to address the computation-time problems of joint models applied to large datasets: independent stratified samples are extracted from the large dataset, the joint model is fitted to each sample, and the results are then combined. With the same objective of fitting joint models to large-scale data, we proposed a procedure based on the computational advantage of Poisson regression. This approach consists of finding representative trajectories by means of clustering methods and then applying the joint model to these representative trajectories. Modèles conjoints Données longitudinales Risques concurrents Risque cause-spécifique Modèle de Cox Algorithme EM Maximum de vraisemblance Régression de Poisson Effets aléatoires Joint models Generalized linear mixed models Longitudinal data Competing risks Cause-specific hazard Cox model EM algorithm Maximum likelihood Poisson regression Random effects
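The abstract says that results from independent stratified samples are combined but does not say how. One common way to combine independent estimates of the same coefficient is fixed-effect inverse-variance pooling, sketched below purely as an assumption; the function name and the numbers are hypothetical.

```python
from typing import Sequence, Tuple


def pool_inverse_variance(
    estimates: Sequence[float], standard_errors: Sequence[float]
) -> Tuple[float, float]:
    """Fixed-effect pooling: weight each estimate by 1 / SE**2.

    Returns the pooled estimate and its standard error.
    Assumes the per-sample estimates are independent.
    """
    if len(estimates) != len(standard_errors) or not estimates:
        raise ValueError("need equally sized, non-empty inputs")
    weights = [1.0 / se ** 2 for se in standard_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = (1.0 / total_weight) ** 0.5
    return pooled, pooled_se


# Hypothetical coefficient estimates from two stratified samples (illustrative only)
sample_estimates = [0.30, 0.50]
sample_ses = [0.10, 0.10]

pooled, pooled_se = pool_inverse_variance(sample_estimates, sample_ses)
print(f"pooled estimate = {pooled:.2f} (SE {pooled_se:.3f})")
```

Because each sample is fitted separately, the expensive joint-model likelihood never has to be evaluated on the full dataset at once.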
40	Wiederholungen in Texten / segmentieren und klassifizieren mit vollständigen Substringfrequenzen Golcher, Felix 16 December 2013 (has links) Diese Arbeit untersucht vollständige Zeichenkettenfrequenzverteilungen natürlichsprachiger Texte auf ihren linguistischen und anwendungsbezogenen Gehalt. Im ersten Teil wird auf dieser Datengrundlage ein unüberwachtes Lernverfahren entwickelt, das Texte in Morpheme zerlegt. Die Zerlegung geht von der Satzebene aus und verwendet jegliche vorhandene Kontextinformation. Es ergibt sich ein sprachunabhängiger Algorithmus, der die gefundenen Morpheme teilweise zu Baumstrukturen zusammenordnet. Die Evaluation der Ergebnisse mit Hilfe statistischer Modelle ermöglicht die Identifizierung auch kleiner Performanzunterschiede. Diese sind einer linguistischen Interpretation zugänglich. Der zweite Teil der Arbeit besteht aus stilometrischen Untersuchungen anhand eines Textähnlichkeitsmaßes, das ebenfalls auf vollständigen Zeichenkettenfrequenzen beruht. Das Textähnlichkeitsmaß wird in verschiedenen Varianten definiert und anhand vielfältiger stilometrischer Fragestellungen und auf Grundlage unterschiedlicher Korpora ausgewertet. Dabei ist ein wiederholter Vergleich mit der Performanz bisheriger Forschungsansätze möglich. Die Performanz moderner Maschinenlernverfahren kann mit dem hier vorgestellten konzeptuell einfacheren Verfahren reproduziert werden. Während die Segmentierung in Morpheme ein lokaler Vorgang ist, besteht Stilometrie im globalen Vergleich von Texten. Daher bietet die Untersuchung dieser zwei unverbunden scheinenden Fragestellungen sich gegenseitig ergänzende Perspektiven auf die untersuchten Häufigkeitsdaten. Darüber hinaus zeigt die Diskussion der rezipierten Literatur zu beiden Themen ihre Verbindungen durch verwandte Konzepte und Denkansätze auf. 
Aus der Gesamtheit der empirischen Untersuchungen zu beiden Fragestellungen kann abgeleitet werden, dass den längeren und damit selteneren Zeichenketten wesentlich mehr Informationsgehalt innewohnt, als in der bisherigen Forschung gemeinhin angenommen wird. / This thesis investigates the linguistic and application-specific content of complete character substring frequency distributions of natural language texts. On this basis, the first part develops an unsupervised learning algorithm for segmenting text into morphemes. The segmentation starts from the sentence level and uses all available context information. The result is a language-independent algorithm which partly arranges the morphemes it finds into tree-like structures. Evaluating the output with advanced statistical modelling makes it possible to identify even very small performance differences. These are accessible to linguistic interpretation. The second part of the thesis consists of stylometric investigations by means of a text similarity measure also rooted in complete substring frequency statistics. The similarity measure is defined in different variants and evaluated for various stylometric tasks and on the basis of diverse corpora. In most of the case studies the presented method can be compared with publicly available performance figures from previous research. The high performance of modern machine learning methods is reproduced by the considerably simpler algorithm developed in this thesis. While the segmentation into morphemes is a local process, stylometry consists in the global comparison of texts. For this reason, investigating these two seemingly unconnected problems offers complementary perspectives on the explored frequency data. The discussion of the received literature on both subjects additionally shows how they are connected through related concepts and approaches. 
It can be deduced from the totality of the empirical studies on text segmentation and stylometry conducted in this thesis that long and rare character sequences contain considerably more information than assumed in previous research. Stilometrie Morphologische Induktion lineare gemischte Modelle generalisierte lineare gemischte Modelle logarithmische Transformation Stylometry Morphological Induction linear mixed models generalized linear mixed models logarithmic transformation 430 Deutsch und verwandte Sprachen ES 965 ddc:430
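To make the notion of a complete character substring frequency distribution in the abstract above concrete, here is a minimal sketch that counts every contiguous substring of a short text. It is a naive quadratic enumeration written only for illustration; the abstract does not describe how the thesis computes these frequencies, and the function name is hypothetical.

```python
from collections import Counter


def substring_frequencies(text):
    """Count occurrences of every contiguous, non-empty substring of text.

    Naive illustration: a text of length n has n * (n + 1) / 2 substring
    occurrences, so this is only practical for short inputs.
    """
    counts = Counter()
    n = len(text)
    for start in range(n):
        for end in range(start + 1, n + 1):
            counts[text[start:end]] += 1
    return counts


if __name__ == "__main__":
    freqs = substring_frequencies("abab")
    # Longer substrings are rarer than the short ones they contain.
    for substring, count in sorted(freqs.items()):
        print(repr(substring), count)
```

Even in this toy input the longer substrings occur less often than the shorter ones they contain, which is the kind of long, rare sequence the abstract's conclusion refers to.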

Search results