Global ETD Search

61	Efeito de parâmetros ambientais na migração de baleias-jubarte (Megaptera novaeangliae) entre Mar de Scotia e Banco dos Abrolhos / Effect of environmental parameters in the migration of humpback whales (Megaptera novaeangliae) between Scotia Sea and Abrolhos Bank
Abras, Daniela Rodrigues. 24 February 2015.
Exogenous factors, such as photoperiod, sea surface temperature and prey abundance, and endogenous factors, such as circadian and circannual cycles and metabolic changes, are known initiators of migratory movements. This work aims to establish the main parameters that initiate the migration of humpback whales. Photoperiod, the Southern Oscillation Index (SOI), sea surface temperature (SST), chlorophyll-a concentration and krill density were analyzed in relation to the maximum number of individuals sighted and the duration of the reproductive season. Photoperiod proved to be the main factor influencing the migration from Antarctica to Abrolhos, while the return migration, besides photoperiod, also appears to be influenced by other factors such as sea surface temperature and the amount of prey available in the previous summer. The higher the krill density, the greater the maximum number of individuals sighted and the longer the reproductive season. The SOI was found to influence the reproductive cycle of krill: in the generalized linear model (GLM), negative SOI values corresponded to higher krill density and positive values to lower krill density. High SST values were negatively correlated with krill density, with the number of whales sighted and with the time spent in the breeding area, indicating that Antarctic warming imposes unfavorable conditions for the reproductive season of the whales.
Keywords: Humpback whale; Photoperiod; Migration trigger; Krill; Generalized linear models
62	Modelagem de mortalidade natural e superdispersão em dados entomológicos / Modelling natural mortality and overdispersion in entomologic data
Urbano, Mariana Ragassi. 24 May 2012.
When fitting dose-response models to data from entomological bioassays, it is often necessary to take account of natural mortality and/or overdispersion. The standard approach to handling natural mortality is Abbott's formula, which allows for a constant underlying mortality rate and, combined with the binomial model, gives the standard natural mortality model. Standard overdispersion models include the beta-binomial and logistic-normal models, discrete mixtures and the use of a heterogeneity factor. As an alternative to the standard natural mortality model, with or without the heterogeneity factor, we extend it by including a random effect in the linear predictor to better accommodate the overdispersion. Parameter estimates for this new model were obtained with two algorithms: Newton-Raphson and EM. Model fit was checked with half-normal probability plots with simulated envelopes, and models were compared using the likelihood ratio test and the AIC. Effective doses were then estimated. All procedures were implemented in R. Three data sets from entomological assays were analyzed; for all three, the natural mortality model with a random effect gave a significantly better fit than the standard procedures generally used.
Keywords: Algorithms; Bioassays; Data analysis; Generalized linear models; Mixed models; Natural mortality
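The standard natural mortality model mentioned in entry 62 can be illustrated with a minimal Python sketch. It assumes a logit link and a linear dose effect, neither of which is stated in the abstract; the function and parameter names (`mortality_prob`, `c`, `beta0`, `beta1`) are hypothetical, and the random-effect extension proposed in the thesis is not implemented here.

```python
import math

def logistic(eta):
    """Inverse logit link (assumed here; the abstract does not name the link)."""
    return 1.0 / (1.0 + math.exp(-eta))

def mortality_prob(dose, c, beta0, beta1):
    """Total death probability: natural mortality c, plus the treatment
    effect acting on the fraction (1 - c) that survives natural mortality."""
    return c + (1.0 - c) * logistic(beta0 + beta1 * dose)

def abbott_corrected(p_observed, c):
    """Abbott's formula: mortality attributable to the treatment alone."""
    return (p_observed - c) / (1.0 - c)

def binomial_loglik(doses, deaths, totals, c, beta0, beta1):
    """Binomial log-likelihood of the natural mortality model,
    up to an additive constant that does not depend on the parameters."""
    ll = 0.0
    for x, y, n in zip(doses, deaths, totals):
        p = mortality_prob(x, c, beta0, beta1)
        ll += y * math.log(p) + (n - y) * math.log(1.0 - p)
    return ll
```

Applying `abbott_corrected` to the output of `mortality_prob` recovers the treatment-only probability, which is one way to see that the two expressions describe the same model.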
63	Modelos não lineares e lineares generalizados para avaliação da germinação de sementes de milho e soja / Non-linear and linear generalized models for evaluation of the germination of corn and soybean seeds
Amorim, Deoclecio Jardim. 24 January 2019.
Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)
Among the characteristics most studied in the seed industry and in germplasm banks, physiological potential stands out, since seeds of higher physiological quality give rapid and uniform seedling emergence and, consequently, stand establishment. The objective of this research was to evaluate the germination of corn (Zea mays L.) and soybean (Glycine max (L.) Merrill) seeds using nonlinear and generalized linear models. The corn cultivars used were AS 1633 PRO3, 2B587 RR, 2A401PW, AL Bandeirante and BRS 4103, and the soybean cultivars were DS59716 IPRO, CD2737 RR, CD251 RR, CD2820 IPRO and CD2857 RR, all from the 2016/17 crop. The germination of 20 seeds with four replicates per cultivar was evaluated by the primary root emission (protrusion) test. Germinated seeds were counted at regular intervals of 6, 12 and 24 hours up to 204 hours, with protrusion of the primary root ≥ 2 mm as the germination criterion. The data were arranged both as cumulative percentage over time and as the proportion of viable seeds in each time interval tested, given by a sequence of Bernoulli trials. The cumulative percentage data were modeled with the nonlinear Gompertz curve and the four-parameter Hill function, and the proportion data were evaluated with generalized linear models, testing the probit, logit and complementary log-log link functions. The corn cultivars with the highest germination speed were AL Bandeirante and BRS 4103. For soybean, the best results were observed for cultivars CD251 RR and CD2737 RR. The methodologies agreed in their classification of the physiological quality of the cultivars. The Gompertz curve fitted better and allowed practical applications in the study of germination, establishing a new parameter for comparing different seed lots. Generalized linear models constitute a robust methodology for evaluating the germination of seeds from different lots and agricultural species, allowing any germination time, as well as uniformity, to be estimated.
Keywords: Generalized linear models; Germination; Speed of germination; Non-linear models
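The three link functions and the Gompertz curve named in entry 63 can be written down in a few lines of Python. This is only a sketch: the Gompertz parametrization `a * exp(-b * exp(-k * t))` is one common form and is an assumption (the abstract does not give the one used), and all function names are hypothetical.

```python
import math
from statistics import NormalDist

def inv_logit(eta):
    """Inverse of the logit link."""
    return 1.0 / (1.0 + math.exp(-eta))

def inv_probit(eta):
    """Inverse of the probit link: the standard normal CDF."""
    return NormalDist().cdf(eta)

def inv_cloglog(eta):
    """Inverse of the complementary log-log link."""
    return 1.0 - math.exp(-math.exp(eta))

def gompertz(t, a, b, k):
    """Cumulative germination at time t (assumed parametrization):
    a = upper asymptote, b = displacement, k = rate."""
    return a * math.exp(-b * math.exp(-k * t))

def gompertz_time_to_fraction(p, b, k):
    """Time at which the curve reaches fraction p (0 < p < 1) of its asymptote."""
    return math.log(b / -math.log(p)) / k
```

Inverting the curve, as in `gompertz_time_to_fraction`, is one way to read off a germination time such as the time to 50% of the final germination under the assumed parametrization.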
65	Homogeneïtat d'estil en El Tirant Lo Blanc [Homogeneity of style in Tirant lo Blanc]
Riba Civil, Alexandre. 20 September 2002.
This Ph.D. thesis tackles the problem of the homogeneity of style in Tirant lo Blanc through the statistical analysis of stylistic features that are measurable but rarely consciously controlled by the author. The goal is to determine whether the style of the book is homogeneous and, if it is not, to find stylistic boundaries. Tirant lo Blanc is the main work in Catalan literature, a chivalry book hailed as 'the best book of its kind in the world' by Cervantes in Don Quixote, and is considered to be the first modern novel in Europe. There has been an intense and long-lasting debate about its authorship, originating from conflicting information in its first edition: the dedicatory letter states that Joanot Martorell takes sole responsibility for writing the book, whereas the colophon states that the last quarter of the book was written by Martí Joan de Galba after the death of Martorell. Neither of the two candidate authors left any text comparable to the one under study, so discriminant analysis cannot be used to classify the chapters by author. The majority opinion among medievalists leans towards the single-authorship hypothesis, although there is a rather strong dissenting minority.
In the first part of the thesis we describe the problems that stylometry deals with, summarize some statistical techniques useful for the quantitative analysis of literary style, and review the state of the art of the authorship attribution problem in Tirant lo Blanc. The database built to quantify style is described as well. The analysis starts with graphical, Statistical Process Control and Correspondence Analysis techniques. To obtain maximum likelihood estimates of one or more change points in normal, binomial or multinomial sequences, we propose a practical method based on fitting generalized linear models. A model-based cluster method for polytomous data, which groups the rows of a contingency table, is proposed too. We analyze the evolution of the diversity of the vocabulary used in the book through twelve different diversity indices. Following the lead of the extensive stylometry literature, we use word length and the use of function words to estimate the change point and to attribute a style to each of the 489 chapters of the book. The use of letters, in spite of being less useful, reinforces the evidence found with the units previously cited. Sentence length and chapter length were not useful for determining a style boundary in Tirant.
The statistical analysis consistently detects a sudden change in style somewhere between chapters 371 and 382, which can hardly be attributed to the plot, even though a few chapters at the end have a style similar to the ones before that boundary. This coexistence of the two styles after the change point probably reinforces the theory that a second author added chapters to a practically finished original. It is important to remark that, even though the statistical analysis supports the existence of two authors, it is not up to us to exclude the possibility that the stylistic boundary found could be explained otherwise.
Keywords: categorical data; stylometry; diversity index; generalized linear models; change-point analysis; word length
1209. Statistics 311 60 80 82
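The idea of a maximum likelihood change point in a binomial sequence, as described in entry 65, can be sketched in Python. This is not the thesis's GLM-based procedure: it is a direct profile-likelihood search for a single change point, with hypothetical names (`estimate_change_point`, `successes`, `trials`) and made-up counts in the usage note.

```python
import math

def _segment_loglik(successes, trials):
    """Maximised binomial log-likelihood of a segment sharing one rate.
    Binomial coefficients are omitted: they are the same for every split."""
    y, n = sum(successes), sum(trials)
    if y == 0 or y == n:
        return 0.0  # fitted rate is 0 or 1, so the log-likelihood is 0
    p = y / n
    return y * math.log(p) + (n - y) * math.log(1.0 - p)

def estimate_change_point(successes, trials):
    """Return (k, loglik) where items [0, k) share one rate and items
    [k, end) share another, with k chosen to maximise the likelihood."""
    best_k, best_ll = None, -math.inf
    for k in range(1, len(successes)):
        ll = (_segment_loglik(successes[:k], trials[:k])
              + _segment_loglik(successes[k:], trials[k:]))
        if ll > best_ll:
            best_k, best_ll = k, ll
    return best_k, best_ll
```

For example, with illustrative counts `[10, 11, 9, 10, 30, 29, 31, 30]` out of 100 trials each, the search places the change point at index 4, where the rate jumps from about 0.10 to about 0.30.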
66	Recursive Residuals and Model Diagnostics for Normal and Non-Normal State Space Models
Frühwirth-Schnatter, Sylvia. January 1994.
Model diagnostics for normal and non-normal state space models is based on recursive residuals which are defined from the one-step ahead predictive distribution. Routine calculation of these residuals is discussed in detail. Various tools of diagnostics are suggested to check e.g. for wrong observation distributions and for autocorrelation. The paper also covers such topics as model diagnostics for discrete time series, model diagnostics for generalized linear models, and model discrimination via Bayes factors. (author's abstract)
Series: Forschungsberichte / Institut für Statistik
67	A Method For Robust Design Of Products Or Processes With Categorical Response
Erdural, Serkan. 01 December 2006.
In industrial processes, decreasing variation while achieving the targets is very important. For manufacturers, finding optimal settings of product and process parameters that are capable of producing desired results under great conditions is crucial. In most cases, the quality response is measured on a continuous scale. In some cases, however, the desired quality response may be qualitative (categorical). There are many effective methods for designing robust products and processes through industrial experimentation when the response variable is continuous, but the methods proposed so far in the literature for robust design with categorical response variables have various limitations. This study offers a simple and effective method for the analysis of categorical response data for robust product or process design. The method handles both location and dispersion effects to explore robust settings in an effective way. It is illustrated on two cases: a foam molding process design and an iron-casting process design.
QA Analysis 299.6-433
68	En applicering av generaliserade linjära modeller på interndata för operativa risker. [An application of generalized linear models to internal data for operational risks]
Bengtsson Ranneberg, Emil; Hägglund, Mikael. January 2015.
The objective of this Master's thesis is to identify and analyze explanatory variables that affect operational losses. This is achieved by applying generalized linear models and selecting a number of explanatory variables based on the attributes of the company's units. An operational loss is a rare event and, as a result, there is a limited amount of internal data. Generalized linear models use a range of statistical tools that make it possible to analyze all the available internal data and give reliable estimates although the data are scarce. By performing two separate and independent analyses, it is possible to identify and analyze various unit attributes and their impact on the loss frequency and on the loss severity. The loss frequency is modeled with a Poisson distribution. The loss severity is modeled with a Tweedie distribution that is based on a semi-parametric distribution. To analyze the total cost of operational losses for a single unit with certain attributes, the frequency model and the severity model are combined into one common model. The results show that the geographical location of the unit, the size of the unit, the income per working hour, the working experience of the employees and the internal rating of the unit are all attributes that affect the cost of operational losses.
Keywords: Generalized linear models; Operational risk; Internal data; Explanatory variables
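The combination of a frequency model and a severity model described in entry 68 can be illustrated as follows. This Python sketch assumes log links for both models and relies on the independence of frequency and severity stated in the abstract, under which the expected total cost is the product of the expected number of losses and the expected loss size. The coefficient vectors and the `exposure` argument are hypothetical, not taken from the thesis.

```python
import math

def linear_predictor(coefs, x):
    """Dot product of a coefficient vector and a covariate vector."""
    return sum(b * xi for b, xi in zip(coefs, x))

def expected_total_cost(x, freq_coefs, sev_coefs, exposure=1.0):
    """Expected total loss for a unit with covariates x.

    freq_coefs: coefficients of the frequency GLM (log link assumed).
    sev_coefs:  coefficients of the severity GLM (log link assumed).
    exposure:   hypothetical exposure measure scaling the frequency.
    """
    expected_count = exposure * math.exp(linear_predictor(freq_coefs, x))
    expected_size = math.exp(linear_predictor(sev_coefs, x))
    return expected_count * expected_size
```

With log links the two linear predictors simply add on the log scale, so each attribute's effect on the total cost is the product of its effects on frequency and severity.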
69	Statistical Methods for Dating Collections of Historical Documents
Tilahun, Gelila. 31 August 2011.
The problem in this thesis was originally motivated by problems presented by the Documents of Early England Data Set (DEEDS). The central problem with these medieval documents is the lack of methods to assign accurate dates to those documents which bear no date. With the problems of the DEEDS documents in mind, we present two methods to impute missing features of texts. In the first method, we suggest a new class of metrics for measuring distances between texts. We then show how to combine the distances between the texts using statistical smoothing. This method can be adapted to settings where the features of the texts are ordered or unordered categorical variables (as in the case of, for example, authorship assignment problems). In the second method, we estimate the probability of occurrence of words in texts using nonparametric regression techniques: local polynomial fitting with kernel weights applied to generalized linear models. We combine the estimated probabilities of occurrence of the words of a text to estimate the probability of occurrence of the text as a function of its feature, the feature in this case being the date on which the text was written. The application of our methods to the DEEDS documents and the results are presented.
Keywords: Kernel; Dating documents; Shingle; Correspondence distance; Smoothing; Generalized linear models; Logistic regression; Local polynomial regression
0581 0463 0800
70	Bayesian model estimation and comparison for longitudinal categorical data
Tran, Thu Trung. January 2008.
In this thesis, we address issues of model estimation for longitudinal categorical data and of model selection for these data with missing covariates. Longitudinal survey data capture the responses of each subject repeatedly through time, allowing for the separation of variation in the measured variable of interest across time for one subject from the variation in that variable among all subjects. Questions concerning persistence, patterns of structure, interaction of events and stability of multivariate relationships can be answered through longitudinal data analysis. Longitudinal data require special statistical methods because they must take into account the correlation between observations recorded on one subject. A further complication in analysing longitudinal data is accounting for the non-response or drop-out process. Potentially, the missing values are correlated with variables under study and hence cannot be totally excluded.
Firstly, we investigate a Bayesian hierarchical model for the analysis of categorical longitudinal data from the Longitudinal Survey of Immigrants to Australia. Data for each subject are observed on three separate occasions, or waves, of the survey. One of the features of the data set is that observations for some variables are missing for at least one wave. A model for the employment status of immigrants is developed by introducing, at the first stage of a hierarchical model, a multinomial model for the response; subsequent terms are then introduced to explain wave and subject effects. To estimate the model, we use the Gibbs sampler, which allows missing data for both the response and explanatory variables to be imputed at each iteration of the algorithm, given some appropriate prior distributions. After accounting for significant covariate effects in the model, results show that the relative probability of remaining unemployed diminished with time following arrival in Australia.
Secondly, we examine the Bayesian model selection techniques of the Bayes factor and Deviance Information Criterion for our regression models with missing covariates. Computing Bayes factors involves computing the often complex marginal likelihood p(y | model), and various authors have presented methods to estimate this quantity. Here, we take the approach of path sampling via power posteriors (Friel and Pettitt, 2006). The appeal of this method is that for hierarchical regression models with missing covariates, a common occurrence in longitudinal data analysis, it is straightforward to calculate and interpret, since integration over all parameters, including the imputed missing covariates and the random effects, is carried out automatically with minimal added complexities of modelling or computation. We apply this technique to compare models for the employment status of immigrants to Australia.
Finally, we also develop a model choice criterion based on the Deviance Information Criterion (DIC), similar to Celeux et al. (2006), but which is suitable for use with generalized linear models (GLMs) when covariates are missing at random. We define three different DICs: the marginal, where the missing data are averaged out of the likelihood; the complete, where the joint likelihood for response and covariates is considered; and the naive, where the likelihood is found assuming the missing values are parameters. These three versions have different computational complexities. We investigate through simulation the performance of these three different DICs for GLMs consisting of normally, binomially and multinomially distributed data with missing covariates having a normal distribution. We find that the marginal DIC and the estimate of the effective number of parameters, pD, have desirable properties, appropriately indicating the true model for the response under differing amounts of missingness of the covariates. We find that the complete DIC is generally inappropriate in this context, as it is extremely sensitive to the degree of missingness of the covariate model. Our new methodology is illustrated by analysing the results of a community survey.
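For readers unfamiliar with the quantities in entry 70, the basic DIC and the effective number of parameters pD can be computed from posterior draws as sketched below. This Python example uses the standard definitions (pD is the mean deviance minus the deviance at the posterior mean, and DIC is the deviance at the posterior mean plus twice pD) for a toy normal model with known variance and no missing covariates. It does not implement the marginal, complete or naive variants developed in the thesis, and all names are hypothetical.

```python
import math
from statistics import fmean

def deviance_normal(y, mu, sigma=1.0):
    """Deviance (-2 * log-likelihood) of i.i.d. normal data with mean mu."""
    return sum(math.log(2.0 * math.pi * sigma ** 2) + ((yi - mu) / sigma) ** 2
               for yi in y)

def dic(y, mu_draws, sigma=1.0):
    """Return (DIC, pD) computed from posterior draws of the mean mu."""
    # Posterior mean of the deviance.
    d_bar = fmean([deviance_normal(y, m, sigma) for m in mu_draws])
    # Deviance evaluated at the posterior mean of the parameter.
    d_hat = deviance_normal(y, fmean(mu_draws), sigma)
    p_d = d_bar - d_hat
    return d_hat + 2.0 * p_d, p_d
```

In this toy model pD equals the number of observations times the variance of the draws divided by sigma squared, which gives a quick sanity check on any implementation.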
