Global ETD Search

131	Estudo do valor adaptativo anual de fêmeas da raça Nelore utilizando modelos de regressão aleatória / Pessoa, Matilde da Conceição. January 2011 (has links) Orientador: Henrique Nunes de Oliveira / Banca: Marcilio Dias Silveira da Mota / Banca: Maria Eugênia Zerlotti Mercadante / Resumo: O Objetivo deste trabalho foi avaliar o valor adaptativo anual para possível utilização como critério de seleção para a eficiência reprodutiva de fêmeas da raça Nelore. Foram estudadas medidas de valor adaptativo do 4º ao 13º ano de permanência no rebanho de 21.610 fêmeas. Os valores adaptativos anuais foram calculados com base na capacidade de sobrevivência e no número de crias deixado ano após ano. O modelo de melhor ajuste aos dados, segundo os critérios adotados, foi o de 5ª ordem para a tendência média da população, 5ª ordem para o efeito genético aditivo direto e 3ª ordem para efeito de ambiente permanente de animal. O modelo heterogêneo com 10 classes foi o mais adequado na modelagem da das variâncias residuais. As herdabilidades para valor adaptativo anual aumentaram com a idade dos animais (0,05 a 0,55). As correlações entre os valores adaptativos em diferentes idades foram baixas nas idades menores e altas entre as idades adultas. A tendência genética para valor adaptativo anual foi realizada com base nos valores genéticos preditos referentes às medidas adaptativas do 4º (Pti4), 8º(Pti8) e 13º(Pti13) ano de idade. Como critério de comparação foram utilizadas as características idade ao primeiro parto (Ipp) e stayability (Stay). As associações entre os valores genéticos preditos das características foram feitas utilizando a correlação de Pearson e porcentagem de touros coincidentes. Estimativas de herdabilidade para Ipp, Stay1 e Stay2 foram respectivamente 0,12, 0,33 e 0,40. As tendências genéticas indicaram que houve ganhos para Pti4 e Pti13 e, para Pti8 as médias dos valores genéticos se mantiveram quase que constantes com o passar dos anos. 
As associações entre os valores genéticos indicaram maior associação entre valores genéticos preditos para valor adaptativo medido no 4º ano e valores genéticos preditos para as características Ipp e Stay / Abstract: The objective of this study was to evaluate annual fitness as a selection criterion for the reproductive efficiency of Nelore cows. Annual fitness measures from the 4th to the 13th year in the herd were studied for 21,610 females. Annual fitness was calculated from survival and the number of offspring left year after year. The best-fitting model, according to the criteria adopted, was of 5th order for the population mean trend, 5th order for the direct additive genetic effect and 3rd order for the animal permanent environmental effect. The heterogeneous model with 10 classes was the most appropriate for modeling the residual variances. Heritability estimates for annual fitness increased with the age of the animals (0.05 to 0.55). Correlations between fitness at different ages were low at younger ages and high among adult ages. The genetic trend for annual fitness was based on predicted breeding values for the fitness measures at the 4th (Pti4), 8th (Pti8) and 13th (Pti13) years of age. Age at first calving (Ipp) and stayability (Stay) were used as comparison traits. Associations between predicted breeding values of the traits were assessed using the Pearson correlation and the percentage of bulls in common. Heritability estimates for Ipp, Stay1 and Stay2 were 0.12, 0.33 and 0.40, respectively. The genetic trends indicated gains for Pti4 and Pti13, whereas for Pti8 the average breeding values remained almost constant over the years. The strongest association was found between predicted breeding values for fitness measured in the 4th year and those for the traits Ipp and Stay / Mestre Bovino de corte - Melhoramento genetico. Reprodução animal. Evolução. Beef cattle. 
Longitudinal data. Evolution. Reproduction.
132	Fouille de données billettiques pour l'analyse de la mobilité dans les transports en commun / Analysis of Mobility in Public Transport Systems Through Machine Learning Applied to Ticketing Log Data Briand, Anne-Sarah 05 December 2017 (has links) Les données billettiques sont de plus en plus utilisées pour l'analyse de la mobilité dans les transports en commun. Leur richesse spatiale et temporelle ainsi que leur volume, en font un bon matériel pour une meilleure compréhension des habitudes des usagers, pour prédire les flux de passagers ou bien encore pour extraire des informations sur les événements atypiques (ou anomalies), correspondant par exemple à un accroissement ou à une baisse inhabituelle du nombre de validations enregistrées sur le réseau.Après une présentation des travaux ayant été menés sur les données billettiques, cette thèse s'est attachée à développer de nouveaux outils de traitement de ces données. Nous nous sommes particulièrement intéressés à deux challenges nous semblant non encore totalement résolus dans la littérature : l'aide à la mise en qualité des données et la modélisation et le suivi des habitudes temporelles des usagers.Un des principaux challenges de la mise en qualité des données consiste en la construction d'une méthodologie robuste qui soit capable de détecter des plages de données potentiellement problématique correspondant à des situations atypiques et ce quel que soit le contexte (jour de la semaine, vacances, jours fériés, ...). Pour cela une méthodologie en deux étapes a été déployée, à savoir le clustering pour la détermination du contexte et la détection d'anomalies. L'évaluation de la méthodologie proposée a été entreprise sur un jeu de données réelles collectées sur le réseau de transport en commun rennais. 
En croisant les résultats obtenus avec les événements sociaux et culturels de la ville, l'approche a permis d'évaluer l'impact de ces événements sur la demande en transport, en termes de sévérité et d'influence spatiale sur les stations voisines.Le deuxième volet de la thèse concerne la modélisation et le suivi de l'activité temporelle des usagers. Un modèle de mélange de gaussiennes a été développé pour partitionner les usagers dans les clusters en fonction des heures auxquelles ils utilisent les transports en commun. L'originalité de la méthodologie proposée réside dans l'obtention de profils temporels continus pour décrire finement les routines temporelles de chaque groupe d'usager. Les appartenance aux clusters ont également été croisées avec les données disponibles sur les usagers (type de carte) en vue d'obtenir une description plus précise de chaque cluster. L'évolution de l'appartenance aux clusters au cours des années a également été analysée afin d'évaluer la stabilité de l'utilisation des transports d'une année sur l'autre. / Ticketing logs are being increasingly used to analyse mobility in public transport. The spatial and temporal richness as well as the volume of these data make them useful for understanding passenger habits and predicting origin-destination flows. Information on the operations carried out on the transportation network can also be extracted in order to detect atypical events (or anomalies), such as an unusual increase or decrease in the number of validations.This thesis focuses on developing new tools to process ticketing log data. We are particularly interested in two challenges that seem to be not yet fully resolved in the literature: help with data quality as well as the modeling and monitoring of passengers' temporal habits.One of the main challenges in data quality is the construction of a robust methodology capable of detecting atypical situations in any context (day of the week, holidays, public holidays, etc.). 
To this end, a two-step methodology was deployed: clustering for context estimation, followed by anomaly detection. The proposed methodology was evaluated on a real dataset collected on the Rennes public transport network. By cross-referencing the results with the social and cultural events of the city, it is possible to assess the impact of these events on transport demand, in terms of severity and spatial influence on neighboring stations. The second part of the thesis focuses on modeling and tracking the temporal activity of passengers. A Gaussian mixture model is proposed to partition passengers into clusters according to the hours at which they use public transport. The originality of the methodology compared to existing approaches lies in obtaining continuous time profiles that finely describe the temporal routines of each passenger cluster. Cluster memberships are also cross-referenced with passenger data (card type) to obtain a more accurate description of each cluster. The evolution of cluster membership over the years has also been analyzed in order to study how the use of transport evolves. Apprentissage statistique Données spatiales Données longitudinales Masse de données Suivi temporel Statistical learning Spatial data Longitudinal data Mass data Time tracking
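Editorial sketch for record 132: the abstract describes partitioning passengers with a Gaussian mixture model over the hours at which they travel. The Python below is only a minimal illustration of that idea, assuming a one-dimensional mixture fitted by expectation-maximization to a list of validation hours. The function name `fit_gmm_1d`, the sample hours and the starting means are hypothetical, and the sketch ignores both the circular nature of clock time and the per-passenger structure of the thesis model.

```python
import math


def fit_gmm_1d(xs, means, n_iter=50):
    """Fit a 1-D Gaussian mixture by EM, starting from the given means."""
    k = len(means)
    means = list(means)
    variances = [1.0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            dens = [
                w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
                for w, m, v in zip(weights, means, variances)
            ]
            total = sum(dens)
            resp.append([d / total for d in dens])
        # M-step: update weights, means and variances from the responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            weights[j] = nj / len(xs)
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var_j, 1e-6)  # floor avoids a degenerate component
    return weights, means, variances


# Hypothetical validation hours: a morning group and an evening group.
hours = [7.5, 8.0, 8.25, 8.5, 7.75, 17.5, 18.0, 18.25, 17.75, 18.5]
weights, means, variances = fit_gmm_1d(hours, means=[6.0, 20.0])
```

With well-separated groups such as these, the fitted means settle on the two group averages; real ticketing data would need more components and a careful choice of starting values.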
133	Estudo genético e quantitativo da contagem de células somáticas em bubalinos leiteiros / Mendoza-Sánchez, Geovanny. January 2007 (has links) Resumo: Considerando-se que a contagem de células somáticas (CCS) de amostras de leite é um valioso indicador da saúde do úbere de búfalas, foi desenvolvido este trabalho com o objetivo de estimar a relação existente entre a CCS e a produção de leite (PL). Foram analisadas informações de 9404 amostras de controles de CCS e PL, referentes a 2198 lactações de animais da raça Murrah com idades entre 2 e 15 anos, filhas de 187 reprodutores, que ocorreram entre os anos 1997 e 2005. Para quantificar as perdas de PL em relação à CCS, nas análises de variância para a variável PL, foram incluídos no modelo os efeitos fixos de fazenda, ordem e ano de parto e estação do parto o escore da contagem de células somáticas (ECCS) como covariável, o efeito de animal dentro da fazenda foi considerado como aleatório. Para a estimação de parâmetros genéticos para a CCS, utilizaram-se "test day models", a média da contagem de células somáticas na lactação (CCSt270) e a produção de leite aos 270 dias (PL270); os componentes de (co) variância foram estimados usando método de máxima verossimilhança restrita. A CCS de cada mês da lactação foi considerada como uma característica distinta, em análises uni e bicaracterísticas, o modelo incluiu como efeitos aleatórios, o genético aditivo direto e de ambiente permanente e residual. Além disso, foram considerados como efeitos fixos: grupo de contemporâneos, número de controle e idade da vaca ao parto como covariável (efeito linear e quadrático). Para a CCSt nos diferentes meses, os grupos de contemporâneos foram definidos como... 
/ Abstract: Considering that the somatic cell count (SCC) of milk samples is a valuable indicator of udder health in buffaloes, this work was developed with the objective of estimating the relationship between SCC and milk yield (MY). Information from 9404 test-day records of SCC and MY was analyzed. The data comprised 2198 lactations of Murrah animals aged between 2 and 15 years, daughters of 187 sires, recorded between 1997 and 2005. To quantify the decrease in MY in relation to SCC, the analysis of variance for MY included in the model a random effect of animal nested within farm, the fixed effects of farm, parity order, year of calving and season of calving, and the somatic cell count score (SCCE) as a covariate. Genetic parameters for SCC were estimated using test-day models, the average somatic cell count in the lactation (SCCt270) and milk yield at 270 days (MY270); the (co)variance components were estimated by restricted maximum likelihood (MTDFREML). The SCC of each month of lactation was considered a distinct trait in single- and two-trait analyses. The model included direct additive genetic, permanent environmental (for SCCt270 and MY270) and residual random effects. The fixed effects were contemporary group, test-day number and age of cow at calving as a covariate (linear and quadratic effects). For SCCt, contemporary groups were defined as herd-year-month of the test day, and for SCCt270 and MY270 as herd-year-season of calving. All effects were found to influence the expression of SCCE. For first-parity females, no relationship between MY and SCC was found. The results indicated that the largest decreases were observed in females with more than one parity. This category... 
/ Orientador: Humberto Tonhati / Coorientador: Lenira El Faro Zadra / Coorientador: Mario Fernando Ceron Munoz / Banca: Danísio Prado Munari / Banca: Lucia Galvão de Albuquerque / Banca: Vera Lúcia Cardoso / Banca: Maria Eugênia Zerlotti Mercadante / Doutor Bufalo - Aspectos genéticos. Celulas. Mastite. Genetic evaluation. Buffaloes. Variance components. Longitudinal data. Heritability. Mastitis.
134	Estimativas de (co) varância genética de pesos do nascimento até a maturidade em rebanhos da raça Nelore usando modelos de regressão aleatória e de características múltiplas / Boligon, Arione Augusti. January 2008 (has links) Resumo: Foram estimados parâmetros genéticos para pesos do nascimento à idade adulta de animais da raça Nelore por meio de análises uni, bi e multicaracterísticas e modelos de regressão aleatória. Os dados utilizados são de animais nascidos de 1975 e 2002, provenientes de 8 fazendas participantes do Programa de Melhoramento Genético da Raça Nelore (PMGRN). Os pesos foram obtidos do nascimento aos 8 anos de idade. Nas análises uni, bi e multicaracterísticas foram utilizados pesos em idades padrão como nascimento, desmama, ano, sobreano e aos 2, 3 e 5 anos de idade. Também foram realizadas análises utilizando o peso mais próximo aos 4,5 anos de idade como indicativo de peso adulto, considerando uma única medida a partir de 2, 3 e 4 anos de idade ou como registros repetidos de pesos a partir dessas mesmas idades. Nas análises de regressão aleatória, foram utilizados pesos de fêmeas do nascimento aos 8 anos de idade, considerando como variáveis independentes polinômios de Legendre da idade na data da pesagem. A variância residual foi modelada por meio de classes variando de 1 a 5. Foram utilizados 8 modelos de coeficientes de regressão aleatória para os efeitos direto e materno de animal, e de ambiente permanente de animal e materno. O modelo multicaracterística, incluindo registros de pesos ao desmame e à seleção é o mais indicado para a avaliação genética de pesos pós-desmama. 
Em avaliações genéticas para a característica de peso adulto, o emprego de modelos de repetibilidade, considerando pesos a partir de 3 anos de idade, seria o mais adequado em relação à utilização de medida única... / Abstract: Weight records of Nelore cattle from birth to mature age were analyzed using univariate, bivariate, multivariate and random regression models. Records of Nelore cattle born from 1975 to 2002, from 8 herds participating in the Nelore Cattle Breeding Program (NCBP), were used. The weights were obtained from birth to 8 years of age. Weights at birth, weaning, yearling, 18 months and 2, 3 and 5 years of age were analyzed using univariate, bivariate and multivariate models. In addition, the weight closest to 4.5 years of age was analyzed as an indicator of mature weight, considering either a single record or repeated records obtained from 2, 3 and 4 years of age. For the random regression models, weights of females from birth to 8 years of age were used. Direct and maternal genetic variances and animal and maternal permanent environmental variances were modeled by random regression on Legendre polynomials of age at recording, with orders of fit from 3 to 6 and a total of 8 models. Residual variances were modeled by a step function with 1 to 5 classes. The multivariate model including weight records at weaning and at selection age is the most suitable for genetic evaluation of post-weaning weights. For genetic evaluation of mature weight, using repeated records obtained from 3 years of age is better than using a single record per animal. The random regression models were able to model changes of variances with age adequately, with parameter estimates similar to those obtained by multivariate analyses. 
The model with direct and maternal genetic effects and animal and maternal permanent environmental effects fitted by quartic, cubic, sixth-order and cubic polynomials, respectively, with residual variances modeled by 5 classes, was the most adequate to describe the covariance structure of the data... / Orientadora: Lucia Galvão de Albuquerque / Coorientadora: Maria Eugênia Zerlotti Mercadante / Banca: Henrique Nunes de Oliveira / Banca: Cláudia Cristina Paro de Paz / Mestre Nelore (Zebu) - Melhoramento genético. Growth curve. Longitudinal data. Mature weight. Random regression model.
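Editorial sketch for record 134: random regression on Legendre polynomials of age first rescales age to the interval [-1, 1] and then evaluates the polynomials there. The Python below is only an illustration of how those covariates might be computed. The function name `legendre_covariates` is hypothetical, "order" is taken here to mean the number of polynomial terms, and the polynomials are left unnormalized, whereas the software used in the thesis may apply a normalizing constant.

```python
def legendre_covariates(age, age_min, age_max, order):
    """Return [P_0(x), ..., P_{order-1}(x)] for age rescaled to x in [-1, 1]."""
    # Standardize age so that age_min maps to -1 and age_max maps to +1.
    x = -1.0 + 2.0 * (age - age_min) / (age_max - age_min)
    values = [1.0, x]
    for n in range(1, order - 1):
        # Bonnet recurrence: (n + 1) P_{n+1}(x) = (2n + 1) x P_n(x) - n P_{n-1}(x)
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[:order]


# Covariates for a weight recorded at 4 years on a birth-to-8-years scale.
mid_age = legendre_covariates(4.0, 0.0, 8.0, 3)   # x = 0
# Covariates at the upper end of the age range.
max_age = legendre_covariates(8.0, 0.0, 8.0, 4)   # x = 1
```

Each animal's random regression coefficients multiply these covariates, which is what lets the genetic and permanent environmental variances change smoothly with age.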
135	Modelos de transição para dados binários / Transition models for binary data Idemauro Antonio Rodrigues de Lara 31 October 2007 (has links) Dados binários ou dicotômicos são comuns em muitas áreas das ciências, nas quais, muitas vezes, há interesse em registrar a ocorrência, ou não, de um evento particular. Por outro lado, quando cada unidade amostral é avaliada em mais de uma ocasião no tempo, tem-se dados longitudinais ou medidas repetidas no tempo. é comum também, nesses estudos, se ter uma ou mais variáveis explicativas associadas às variáveis respostas. As variáveis explicativas podem ser dependentes ou independentes do tempo. Na literatura, há técnicas disponíveis para a modelagem e análise desses dados, sendo os modelos disponíveis extensões dos modelos lineares generalizados. O enfoque do presente trabalho é dado aos modelos lineares generalizados de transição para a análise de dados longitudinais envolvendo uma resposta do tipo binária. Esses modelos são baseados em processos estocásticos e o interesse está em modelar as probabilidades de mudanças ou transições de categorias de respostas dos indivíduos no tempo. A suposição mais utilizada nesses processos é a da propriedade markoviana, a qual condiciona a resposta numa dada ocasião ao estado na ocasião anterior. Assim, são revistos os fundamentos para se especificar tais modelos, distinguindo-se os casos estacionário e não-estacionário. O método da máxima verossimilhança é utilizado para o ajuste dos modelos e estimação das probabilidades. Adicionalmente, apresentam-se testes assintóticos para comparar tratamentos, baseados na razão de chances e na diferença das probabilidades de transição. Outra questão explorada é a combinação do modelo de efeitos aleatórios com a do modelo de transição. Os métodos são ilustrados com um exemplo da área da saúde. 
Para esses dados, o processo é considerado estacionário de ordem dois e o teste proposto sinaliza diferença estatisticamente significativa a favor do tratamento ativo. Apesar de ser uma abordagem inicial dessa metodologia, verifica-se, que os modelos de transição têm notável aplicabilidade e são fontes para estudos e pesquisas futuras. / Binary or dichotomous data are common in many fields of science in which there is interest in recording whether or not a particular event occurs. When each sampling unit is evaluated on more than one occasion, we have longitudinal data or repeated measures over time. It is also common in longitudinal studies to have one or more explanatory variables associated with the response variables, which can be time-dependent or time-independent. The literature offers many approaches to modeling and analyzing these data, the available models being extensions of generalized linear models. This work focuses on generalized linear transition models for analyzing longitudinal data with a binary response. Such models are based on stochastic processes, and the aim is to model the probabilities of change, or transitions, between response categories of individuals over time. The most commonly used assumption in these processes is the Markov property, under which the response on a given occasion depends on the state on the immediately preceding occasion. We therefore review the fundamentals needed to specify these models, distinguishing between the stationary and non-stationary cases. The maximum likelihood method is used to fit the models and estimate the probabilities. Furthermore, we present asymptotic tests to compare treatments based on the odds ratio and on the difference of transition probabilities. We also explore the combination of a random-effects model with a transition model. The methods are illustrated with an example from the health field. For these data, the process is considered stationary of order two and the proposed test indicates a statistically significant difference in favor of the active treatment. 
Although this work is an initial approach to the methodology, transition models show notable applicability and are a source for future studies and research. Análise de dados longitudinais Modelos lineares generalizados Processos estocásticos Verossimilhança Analysis of longitudinal data Generalized linear model Likelihood Stochastic processes
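Editorial sketch for record 135: under a stationary first-order Markov assumption with no covariates, the maximum likelihood estimate of each transition probability is the observed count of that transition divided by the number of times the starting state was occupied. The Python below illustrates only this simplest case. The function name `transition_mle` and the example sequences are hypothetical, and the thesis itself works with generalized linear transition models, covariates and a process of order two.

```python
from collections import Counter


def transition_mle(sequences):
    """Estimate P(curr | prev) for binary sequences by row-normalized counts."""
    counts = Counter()
    for seq in sequences:
        # Count every consecutive (previous, current) pair within a subject.
        for prev, curr in zip(seq, seq[1:]):
            counts[(prev, curr)] += 1
    probs = {}
    for prev in (0, 1):
        total = counts[(prev, 0)] + counts[(prev, 1)]
        for curr in (0, 1):
            # NaN marks a starting state that was never observed.
            probs[(prev, curr)] = counts[(prev, curr)] / total if total else float("nan")
    return probs


# Hypothetical binary responses for three subjects over four occasions.
subjects = [[0, 0, 1, 1], [0, 1, 1, 0], [1, 1, 1, 1]]
probs = transition_mle(subjects)
```

Fitting this separately to each treatment group would give the transition probabilities whose odds ratio or difference the abstract's asymptotic tests compare.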
136	Modelos lineares mistos para análise de dados longitudinais bivariados provenientes de ensaios agropecuários / Linear mixed models in the analysis bivariate longitudinal data from agricultural essays Simone Silmara Werner Gurgel do Amaral 19 September 2013 (has links) Em estudos longitudinais, repetidas observações de uma mesma variável resposta são coletadas na mesma unidade experimental, em diferentes ocasiões. Como diferentes observações são realizadas na mesma unidade, espera-se que estas sejam correlacionadas, e que exista uma heterogeneidade de variâncias nas diferentes ocasiões. Dados longitudinais multivariados são obtidos quando um conjunto de diferentes variáveis respostas são mensuradas na mesma unidade experimental repetidas vezes ao longo do tempo; nesse caso, além da correlação entre observações realizadas na mesma unidade experimental, deve-se considerar também a correlação entre diferentes variáveis respostas. Uma forma de analisar dados longitudinais bivariados é empregar um modelo misto para cada uma das variáveis respostas e uni-los em um modelo misto bivariado especificando a distribuição conjunta para os efeitos aleatórios. As estimativas dos parâmetros desta distribuição comum podem ser usadas para avaliar a relação entre as diferentes respostas. Para exemplificar a utilização da técnica, foram utilizados dados de armazenamento de leite UAT. Os modelos lineares mistos bivariados foram ajustados por meio do software SAS e a análise gráfica foi realizada por meio do software R. Para seleção dos modelos empregou-se os Critérios de Informação de Akaike (AIC) e Bayesiano (BIC), e o teste da razão de verossimilhanças para comparação de modelos encaixados. A utilização do modelo linear misto bivariado permitiu modelar a heterogeneidade de variâncias entre ocasiões e a correlação entre diferentes medidas na mesma unidade experimental, bem como a correlação entre as variáveis respostas. 
/ In longitudinal studies, repeated measurements of a response variable are taken on the same experimental unit over time. Since the observations are made on the same experimental unit, the repeated measurements are expected to be correlated and the variances to be heterogeneous across occasions. Multivariate longitudinal data are obtained when a set of different response variables is measured on the same experimental unit repeatedly over time; in this case, in addition to the correlation between observations on the same experimental unit, the correlation between the different response variables must also be considered. One way to analyze bivariate longitudinal data is to use a mixed model for each response variable and join them in a bivariate mixed model by specifying a joint distribution for the random effects. Parameter estimates of this joint distribution may be used to evaluate the relationship between the responses. To illustrate the technique, UHT milk storage data were used. The bivariate linear mixed models were fitted using SAS software and the graphical analysis was done in R. For model selection, the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) were used, and the likelihood ratio test was used to compare nested models. The bivariate linear mixed model made it possible to model the heterogeneity of variances across occasions, the correlation between different measurements on the same experimental unit and the correlation between the response variables. Armazenamento de produtos Correlação Dados longitudinais Heterogeneidade de variâncias Modelos mistos bivariados Bivariate mixed models Correlation Heteroscedasticity Longitudinal data Product storage
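Editorial sketch for record 136: the model selection tools named in the abstract all derive from the maximized log-likelihood. The Python below shows the textbook definitions only. The function names and the log-likelihood values are hypothetical, and how the parameter count and sample size are defined (for example under restricted maximum likelihood) depends on the software, which the abstract does not specify.

```python
import math


def aic(loglik, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (smaller is better)."""
    return 2 * n_params - 2 * loglik


def bic(loglik, n_params, n_obs):
    """Bayesian Information Criterion: k ln(n) - 2 ln L (smaller is better)."""
    return n_params * math.log(n_obs) - 2 * loglik


def lrt_statistic(loglik_reduced, loglik_full):
    """Likelihood ratio statistic for two nested models."""
    return 2 * (loglik_full - loglik_reduced)


# Hypothetical fits: a reduced model with 5 parameters nested in a full model with 7.
aic_reduced = aic(-105.0, 5)
aic_full = aic(-100.0, 7)
lrt = lrt_statistic(-105.0, -100.0)
```

The likelihood ratio statistic is usually referred to a chi-square distribution with degrees of freedom equal to the difference in the number of parameters, here 2.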
137	Bayesian Inference on Longitudinal Semi-continuous Substance Abuse/Dependence Symptoms Data Xing, Dongyuan 16 September 2015 Substance use data such as alcohol drinking often contain a high proportion of zeros. In studies examining alcohol consumption in college students, for instance, many students may not drink in the studied period, resulting in a number of zeros. Zero-inflated continuous data, also called semi-continuous data, typically consist of a mixture of a degenerate distribution at the origin (zero) and a right-skewed, continuous distribution for the positive values. Ignoring the extreme non-normality in semi-continuous data may lead to substantially biased estimates and inference. Longitudinal or repeated measures of semi-continuous data present special challenges in statistical inference because of the correlation among the repeated measures on the same subject. Linear mixed-effects models (LMM) with a normality assumption, which are routinely used to analyze correlated continuous outcomes, are inapplicable to semi-continuous outcomes. Data transformation such as log transformation is typically used to correct the non-normality in data. However, log-transformed data, after the addition of a small constant to handle zeros, may not successfully approximate the normal distribution due to the spike caused by the zeros in the original observations. In addition, the reasons that data transformation should be avoided include: (i) transforming usually provides reduced information on an underlying data generation mechanism; (ii) data transformation causes difficulty in regard to interpretation of the transformed scale; and (iii) it may cause re-transformation bias. Two-part mixed-effects models, with one component modeling the probability of being zero and one modeling the intensity of nonzero values, have been developed over the last ten years to analyze longitudinal semi-continuous data. 
However, log transformation is still needed for the right-skewed nonzero continuous values in the two-part modeling. In this research, we developed Bayesian hierarchical models in which the extreme non-normality in the longitudinal semi-continuous data caused by the spike at zero and right skewness was accommodated using the skew-elliptical (SE) distribution, and all inferences were carried out through a Bayesian approach via Markov chain Monte Carlo (MCMC). The substance abuse/dependence data, including alcohol abuse/dependence symptoms (AADS) data and marijuana abuse/dependence symptoms (MADS) data from a longitudinal observational study, were used to illustrate the proposed models and methods. This dissertation explored three topics. First, we presented a one-part LMM with the skew-normal (SN) distribution under a Bayesian framework and applied it to the AADS data. The association between AADS and the serotonin transporter gene polymorphism (5-HTTLPR) and baseline covariates was analyzed. The results from the proposed model were compared with those from LMMs with normal, Gamma and LN distributional assumptions. Simulation studies were conducted to evaluate the performance of the proposed models. We concluded that the LMM with SN distribution not only provides the best model fit based on the Deviance Information Criterion (DIC), but also offers more intuitive and convenient interpretation of results, because it models the original scale of the response variable. Second, we proposed a flexible two-part mixed-effects model with skew distributions, including skew-t (ST) and SN distributions, for the right-skewed nonzero values in Part II of the model under a Bayesian framework. The proposed model was illustrated with the longitudinal AADS data, and the results from models with ST, SN and normal distributions were compared under different random-effects structures. Simulation studies were conducted to evaluate the performance of the proposed models. 
Third, multivariate (bivariate) correlated semi-continuous data are also commonly encountered in clinical research. For instance, the alcohol use and marijuana use may be observed in the same subject and there might be underlying common factors to cause the dependence of alcohol and marijuana uses. There is very limited literature on multivariate analysis of semi-continuous data. We proposed a Bayesian approach to analyze bivariate semi-continuous outcomes by jointly modeling a logistic mixed-effects model on zero-inflation in either response and a bivariate linear mixed-effects model (BLMM) on the positive values through a correlated random-effects structure. Multivariate skew distributions including ST and SN distributions were used to relax the normality assumption in BLMM. The proposed models were illustrated with an application to the longitudinal AADS and MADS data. A simulation study was conducted to evaluate the performance of the proposed models. Two-part mixed-effects model Substance abuse/dependence symptoms data Bayesian analysis Skewed distributions Semi-continuous longitudinal data Biostatistics
138	Mönster som leder till sjukfrånvaro : Sekvensanalys på longitudinella data / Patterns that lead to sick leave : Sequence analysis on longitudinal data Jesperson, Sara, Johansson, Sara January 2017 (has links) Sjukfrånvaro innebär en kostnad för både arbetsgivare och arbetstagare. För en anonym fullgrossist är detta ett problem på en av deras lagerlokaler, där sjukfrånvaron är hög. Uppsatsen syftar till att identifiera intressanta mönster över tid som leder till sjukfrånvaro genom att analysera data från företagets lönesystem och tidssystem. Datamaterialet är longitudinellt och för att upptäcka mönster som leder till sjukfrånvaro används sekvensanalys. För att generera de sekventiella mönstren används algoritmen cSPADE då den möjliggör att tidsbegränsningar kan anges för sekvenserna. Relevansen hos de genererade sekvenserna utvärderas med tre intressemått: support, konfidens och lift. Tre separata analyser genomförs där olika antal variabler används, beroende på om de förändras över tid eller har ett konstant värde, och för dessa analyser aggregeras data veckovis. De vanligaste händelserna som leder till sjukfrånvaro hos expeditörer är olika anställningstider, kön och födelseår. Några dagars sjukfrånvaro under en vecka, det vill säga mellan 8 och 40 timmar, är mer förekommande bland expeditörerna jämfört med kortare respektive längre sjukfrånvaro. Det går att konstatera att mönster med tidigare sjukfrånvaro ofta leder till fortsatt sjukfrånvaro. Uppsatsen belyser även de problem som uppstår inom sekvensanalys, till exempel att konstanta variabler överskuggar de icke-konstanta variablerna i de genererade sekvenserna. Detta händer när variabler som förändras över tiden används i kombination med variabler som har konstanta värden, något som kan förekomma i longitudinella datamaterial. / Absence due to sickness results in a cost to both employers and employees. 
For an unnamed wholesaler this is a problem at one of their warehouses, where the rate of sick leave is high. The aim of this thesis is to identify interesting patterns over time that lead to sick leave by analyzing data from the company's payroll system and their attendance system. The data is longitudinal and to detect the patterns that lead to sick leave, sequence analysis is used. To generate the sequential patterns the algorithm cSPADE is used since it allows time constraints to be specified for the sequences. The relevance of the generated sequences is evaluated with three interest measures: support, confidence and lift. Three separate analyses are performed where different variables are used, depending on whether they change over time or have a constant value, and for these analyses the data is aggregated weekly. The most common events that lead to sick leave for the employees are different duration of employment, gender and birth year. A few days sick leave during a week, namely between 8 and 40 hours, is more common among the employees compared to shorter and longer sick leave. It can be noted that the pattern of previous sick leave usually leads to continued sick leave. The thesis also highlights the problems that arise in sequence analysis, for example that the constant variables overshadow the non-constant variables in the resulting sequences. This happens when variables that change over time are used in combination with variables that have a constant value, which may occur in longitudinal data. Sequence analysis Longitudinal data cSPADE Sick leave Sekvensanalys Longitudinella data cSPADE Sjukfrånvaro Probability Theory and Statistics Sannolikhetsteori och statistik
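Editorial sketch for record 138: the three interest measures named in the abstract can be illustrated for a single rule "antecedent leads to consequent". The Python below treats each record as an unordered set of events, which is a simplification: cSPADE works on ordered sequences with time constraints, and that ordering is not reproduced here. The function name `rule_measures` and the event labels are hypothetical.

```python
def rule_measures(records, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(records)
    n_ante = sum(1 for r in records if antecedent <= r)
    n_cons = sum(1 for r in records if consequent <= r)
    n_both = sum(1 for r in records if (antecedent | consequent) <= r)
    if n == 0 or n_ante == 0 or n_cons == 0:
        raise ValueError("antecedent and consequent must each occur at least once")
    support = n_both / n                # share of records containing both sides
    confidence = n_both / n_ante        # share of antecedent records that also hold the consequent
    lift = confidence / (n_cons / n)    # confidence relative to the consequent's base rate
    return support, confidence, lift


# Hypothetical employee records, one set of observed events per employee.
records = [
    {"earlier_absence", "sick_leave"},
    {"earlier_absence"},
    {"sick_leave"},
    {"earlier_absence", "sick_leave"},
    set(),
]
support, confidence, lift = rule_measures(records, {"earlier_absence"}, {"sick_leave"})
```

A lift above 1 means the consequent is more frequent among records containing the antecedent than in the data overall, which is how a pattern such as earlier absence preceding sick leave would be flagged as interesting.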
139	Time Trends and Predictors of Initiation for Cigarette and Waterpipe Smoking Among Jordanian School Children: Irbid, 2008-2011 McKelvey, Karma L, PhD 23 June 2014 Smoking prevalence among adolescents in the Middle East remains high while rates of smoking have been declining among adolescents elsewhere. The aims of this research were to (1) describe patterns of cigarette and waterpipe (WP) smoking, (2) identify determinants of WP smoking initiation, and (3) identify determinants of cigarette smoking initiation in a cohort of Jordanian school children. In this cohort of school children in Irbid, Jordan (age ≈ 12.6 at baseline), the first aim (N=1,781) described time trends in smoking behavior, age at initiation, and changes in frequency of smoking from 2008-2011 (grades 7 – 10). The second aim (N=1,243) identified determinants of WP initiation among WP-naïve students, and the third aim (N=1,454) identified determinants of cigarette smoking initiation among cigarette-naïve participants. Determinants of initiation were assessed with generalized mixed models. All analyses were stratified by gender. Baseline prevalence of current smoking (cigarettes or WP) for boys and girls was 22.9% and 8.7%, respectively. Prevalence of ever and current smoking of any type, cigarette smoking, WP smoking, and dual cigarette/WP smoking was higher in boys than in girls each year (p These studies reveal intensive smoking patterns at early ages among Jordanian youth in Irbid, characterized by a predominance of WP smoking. WP may be a vehicle for tobacco dependence and subsequent cigarette uptake. The sizeable incidence of WP and cigarette initiation among students of both sexes points to a need for culturally relevant smoking prevention interventions. Gender-specific factors, refusal skills, and smoking cessation of both WP and cigarettes for youth and their parents/teachers would be important components of such initiatives. 
cigarette; cohort; initiation; Jordan; longitudinal; school children; smoking; time trends; waterpipe; Epidemiology; Public Health
140	Modelagem para probabilidade de frutificação em café Arábica baseado em ocupação de metâmeros / Modeling for probability of fruiting in Arabica coffee based on occupancy metameres Luiza Yoko de Barros 10 February 2017 Este é um trabalho proveniente de vários encontros com pesquisadores da Embrapa de Campinas, cujo objetivo principal se centrou em investigar a probabilidade de frutificação em árvores de Café Arábica. Para isso foram analisados 20 bancos de dados para árvores de Café Arábica de um cultivar localizado no Instituto Agronômico do Paraná em dois momentos distintos (Junho de 2010 e em Novembro - Dezembro de 2010) e consideradas as seguintes variáveis: "OCUPAÇÃO" (quadrática ou retangular), "ESPAÇAMENTO" (6.000 plantas/ha e 10.000 plantas/ha) ambas relativas ao cultivo das plantas e "TAMANHO DO ENTRENÓ" (medição em cm de uma partição definida da planta). Na busca de um modelo representativo de tal fenômeno, foram estudadas paralelamente tópicos relativos a alometria e assimetria dessas mesmas plantas, os quais permitiram modelar determinadas associações entre algumas estruturas como largura e comprimento de folhas. Os modelos ajustados apresentaram uma grande significância para a variável "ESPAÇAMENTO" nos dois tempos estudados, enquanto que a variável "OCUPAÇÃO" foi significativa apenas no segundo tempo e variável "TAMANHO DO ENTRENÓ" não foi significativa para nenhum dos tempos. A metodologia adotada para investigar essa probabilidade se deu através dos modelos de regressão logístico. Com o intuito de agregar a variável "TEMPO", juntou-se os dois bancos de dados em diferentes tempos e, baseado na metodologia de modelos mistos, obteve-se um modelo ajustado com retas paralelas, onde apenas a variável "ESPAÇAMENTO" foi considerada significativa. 
/ This work is based on several meetings with Embrapa researchers from Campinas, in an integrated study with professors from the ESALQ Department of Statistics and Agronomic Experimentation, whose main objective was to investigate the probability of fruiting in Arabica coffee trees. To this end, 20 data sets were analyzed for coffee trees of a cultivar located at the Agronomic Institute of Paraná at two different times (June 2010 and November - December 2010), considering the following variables: "OCCUPATION" (quadratic or rectangular) and "SPACING" (6,000 plants/ha and 10,000 plants/ha), both related to plant cultivation, and "INTERNODE SIZE" (measured in cm on a defined partition of the plant). In the search for a representative model of this phenomenon, topics related to the allometry and asymmetry of these same plants were studied in parallel, which made it possible to model certain associations between structures such as leaf width and length. The fitted models showed high significance for the variable "SPACING" at both times studied, whereas the variable "OCCUPATION" was significant only at the second time and the variable "INTERNODE SIZE" was not significant at either time. The methodology used to investigate this probability was based on logistic regression models. To incorporate the variable "TIME", the two data sets from the different times were combined and, based on mixed-model methodology, a fitted model with parallel lines was obtained, in which only the variable "SPACING" was considered significant.
Café Arábica; Dados longitudinais; Metâmero; Modelos mistos; Regressão logística; Arabica Coffee; Logistic regression; Longitudinal data; Metamer; Mixed models
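The "parallel lines" result in the abstract above can be read as a logistic model in which the spacing effect is the same at both measurement times, so the two time-specific lines differ only by a shift on the logit scale. The following is a minimal sketch of that idea only: the coefficient values are hypothetical (no estimates are reported in the abstract), and the random effects of the mixed model are omitted.

```python
import math


def logit(p: float) -> float:
    """Log-odds of a probability."""
    return math.log(p / (1.0 - p))


def fruiting_probability(
    dense_spacing: int,
    second_time: int,
    intercept: float = 0.5,        # hypothetical value
    spacing_effect: float = -0.8,  # hypothetical value
    time_effect: float = 0.3,      # hypothetical value
) -> float:
    """Fruiting probability from a logistic model with no spacing-by-time interaction.

    dense_spacing: 1 for 10,000 plants/ha, 0 for 6,000 plants/ha.
    second_time:   1 for the second measurement time, 0 for the first.
    """
    eta = intercept + spacing_effect * dense_spacing + time_effect * second_time
    return 1.0 / (1.0 + math.exp(-eta))


for second_time in (0, 1):
    p_sparse = fruiting_probability(0, second_time)
    p_dense = fruiting_probability(1, second_time)
    gap = logit(p_dense) - logit(p_sparse)
    print(f"time={second_time} p_sparse={p_sparse:.3f} p_dense={p_dense:.3f} logit_gap={gap:.3f}")
```

Because the model has no interaction term, the logit gap between the two spacings is identical at both times, which is what makes the fitted lines parallel.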
