Global ETD Search

31	Essays on Non-Stationary Panel Analysis Surdeanu, Laura 09 January 2014 Thesis carried out at the Department of Econometrics, Statistics and Spanish Economy / This thesis consists of three self-contained essays on non-stationary panel data. We propose novel approaches to both cointegration and unit root analysis in panel data models. The main contribution of this thesis is allowing for the presence of cross-section dependence through the specification of an approximate common factor model. Early studies assumed that the time series in the panel were either independent or that cross-section dependence could be controlled by including time effects. In macroeconomic, microeconomic and financial applications, cross-section dependence is a recurrent rather than a rare characteristic, and it is usually caused by the presence of common shocks (oil price shocks or financial crises) or the existence of local productivity spillover effects. Ignoring these factors can lead to spurious statistical inference. More precisely, in the case of unit root testing, unaccounted cross-section dependence might lead one to conclude that the panel data are I(0) stationary when in fact they might be I(1) non-stationary. Similarly, panel data cointegration test statistics might indicate that there are more cointegrating relations than actually exist. Recent studies have therefore proposed several alternatives to overcome this limitation. One popular approach is a factor structure applied to the error process, an approach that we employ throughout this thesis. In the first essay we extend the univariate Carrion-i-Silvestre, Kim and Perron (2009) GLS-based unit root tests with multiple structural breaks to panel data. The proposed statistics are general enough to allow for cross-section dependence and multiple structural breaks in both the level and the trend of the units of the panel. We evaluate the finite-sample properties of these statistics via Monte Carlo simulations. 
Our simulation study shows that the panel tests perform well, especially when the structural breaks are known. We apply these statistics to a panel of annual data covering the period 1870-2008 for 19 OECD countries. We find strong evidence in favor of I(0) stationarity when we apply the unit root tests to the idiosyncratic component. However, the empirical analysis also shows that the I(1) non-stationarity of real per capita GDP is captured by the common factor. In the second essay we propose a test statistic to determine the cointegration rank of VAR processes, both in a unit-by-unit analysis and in a panel data framework. Cross-section dependence is accounted for through the specification of a common factor model, which covers situations where there is cointegration across the cross-section dimension. We perform a Monte Carlo experiment to investigate the small-sample properties of the proposed panel statistic, and the simulation results indicate good performance of the tests in terms of empirical size and power. We show that in some cases not accounting for common factors when they are present can lead to overestimating the cointegrating rank. We apply our proposed tests in two empirical applications using the variables involved in the money demand equation and the monetary exchange model. The money demand model detects two stochastic trends, while the monetary exchange model detects three stochastic trends. In the third essay of this dissertation we investigate the cointegration relation between output, physical capital, human capital, public capital and labor for 17 Spanish regions observed over the period 1964-2000. The novelty of our approach is that we allow for cross-section dependence between the members of the panel using a common factor model. 
This is of interest because it allows unobservable variables (technological progress, total factor productivity) to be proxied by the common factors, something that has not been widely addressed in the literature. To determine whether the variables are cointegrated, we employ two different techniques at the panel level. More precisely, we compare the statistics from the single-equation method of Westerlund (2008) and Banerjee and Carrion-i-Silvestre (2011, 2013) with those from the VAR framework of Carrion-i-Silvestre and Surdeanu (2011). Moreover, using the VAR method, we identify at least one common cointegrating relation among output, physical capital, human capital, public capital and labor. Finally, we use several estimators to estimate the long-run relation between these variables. 311 - Statistics
32	Experimental design applied to the selection of samples and sensors in multivariate calibration Ferré Baldrich, Joan 24 February 1998 Multivariate calibration models relate instrumental responses (e.g. spectra) of a set of calibration samples to the quantities of chemical or physical variables such as analyte concentrations, or indexes (e.g. octane number in fuels). This relationship is used to predict these quantities from the instrumental responses of new unknown samples measured in the same manner. Prediction using multivariate calibration models is becoming a common step in analytical procedures. Therefore, the ability of the model to give precise and unbiased predictions has a decisive influence on the quality of the analytical result. It is important that the calibration samples and sensors be carefully selected so that the models can properly represent the phenomenon under study and assure the quality of the predictions. We have studied the selection of calibration samples from a list of candidate samples in principal component regression (PCR) and the selection of wavelengths in classical least squares (CLS). The underlying basis has been given by experimental design theory. 
In PCR, the minimum number of calibration samples is selected using the instrumental responses of the candidate samples. The analyte concentration only needs to be determined in the selected samples. Different uses of the D-criterion have also been proposed. In CLS, different criteria for wavelength selection have been interpreted from the point of view of experimental design using the confidence hyperellipsoid of the predicted concentrations. The criteria have also been critically reviewed according to their effect on precision, accuracy and trueness (which are reviewed following ISO definitions). Based on experimental design theory, new guidelines for sensor selection have been given. Moreover, a new method has been proposed for detecting and reducing bias in the predictions for unknown samples analyzed using CLS. Conclusions: 1. Optimality criteria derived from experimental design in MLR have been applied to select calibration wavelengths in CLS and the minimum number of calibration samples in MLR and PCR from the instrumental responses or principal component scores of a list of candidates. These criteria are an alternative (and/or a complement) to the experimenter's subjective criterion. The models built with the points selected by the proposed criteria had a smaller variance of the coefficients or concentrations and better predictive ability than the models built with randomly selected samples. 2. The D-criterion has been successfully used for selecting calibration samples in PCR and MLR, for selecting a reduced set of samples to assess the validity of PCR models before standardization, and for selecting wavelengths in CLS from the matrix of sensitivities. D-optimal calibration samples generally give PCR and MLR models with better predictive ability than calibration samples selected randomly or using the Kennard-Stone algorithm. 3. Optimization algorithms are needed to find the optimal subsets of I points from a list of N candidates. 
Fedorov's algorithm, the Kennard-Stone algorithm and genetic algorithms were used here. 4. The confidence ellipsoid of the estimated concentrations and experimental design theory provide the framework for interpreting the effect of the sensors selected with these criteria on the prediction results of the model and for deriving new guidelines for wavelength selection. 5. The efficacy of the selection criteria in CLS based on the calibration matrix requires that there be no bias in the response at the selected sensors. The quality of the data must be checked before a wavelength selection method is used. 6. The net analyte signal (NAS) is important for understanding the quantification process in CLS and the propagation of errors to the predicted concentrations. Diagnostics such as sensitivity, selectivity and net analyte signal regression plots (NASRP), which are based on the NAS for each particular analyte, have been used. The norm of the NAS has been found to be related to the prediction error. 7. The NASRP is a tool for graphically detecting whether the measured response of the unknown sample follows the calculated model. The estimated concentration is the slope of the straight line fitted to the points in this plot. Sensors with bias can be detected, and the sensors that best follow the model can be selected, using the Error Indicator function and a moving window method. Experimental design Multivariate calibration 311 512 543
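The abstract above names the Kennard-Stone algorithm as one of the ways to pick calibration samples from a candidate list. As a rough illustration only, the textbook form of that rule can be sketched as follows: start from the two candidates that are farthest apart, then repeatedly add the candidate whose distance to its nearest already-selected point is largest. The function name, the Euclidean metric and the toy score values are assumptions made for this sketch; they are not taken from the thesis.

```python
from math import dist


def kennard_stone(points, k):
    """Return the indices of k candidates chosen by the Kennard-Stone rule."""
    n = len(points)
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and the number of candidates")
    # Seed the selection with the two candidates that are farthest apart.
    first, second = max(
        ((i, j) for i in range(n) for j in range(i + 1, n)),
        key=lambda pair: dist(points[pair[0]], points[pair[1]]),
    )
    selected = [first, second]
    # min_dist[i] is the distance from candidate i to its nearest selected point.
    min_dist = [min(dist(p, points[first]), dist(p, points[second])) for p in points]
    while len(selected) < k:
        remaining = [i for i in range(n) if i not in selected]
        # Add the candidate that is least well covered by the current selection.
        nxt = max(remaining, key=lambda i: min_dist[i])
        selected.append(nxt)
        min_dist = [min(d, dist(p, points[nxt])) for d, p in zip(min_dist, points)]
    return selected


# Hypothetical two-dimensional scores for five candidate samples.
scores = [(0.0, 0.0), (1.0, 0.0), (0.5, 0.1), (10.0, 0.0), (5.0, 5.0)]
chosen = kennard_stone(scores, 3)
print(chosen)
```

The rule spreads the selected samples over the candidate space without looking at any model, which is why the abstract treats it as a baseline against the D-criterion rather than as an optimality criterion in its own right.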
33	Análisis de citas en publicaciones de usuarios de bibliotecas universitarias, El. Estudio de las tesis doctorales en informática de la Universidad Politécnica de Cataluña, 1996-1998. Urbano Salido, Cristóbal 13 June 2000 Studies of information use and of users' information-seeking behaviour provide a set of indicators that are of great value in library evaluation. Although user studies have become increasingly important in library evaluation in Spain, few projects and studies have applied bibliometric methods to the works and publications produced by library users. This study pursued two main objectives: 1) to study the foundations of citation analysis and its application to user studies in university libraries, based on the publications those users produce; 2) to try out some of the techniques described through a case study carried out in the libraries of the UPC. The experimental work is based on doctoral theses from the Department of Computer Architecture and the Department of Computer Languages and Systems. After statistical processing of 54 theses and 6807 bibliographic references, several bibliometric indicators of the cited literature were obtained and compared with data from the ISI Journal Citation Report, with the library catalogue, with library usage statistics, and with the literature on bibliometric studies in computer science. Education Sciences 02 - Library science. Documentation 311 - Statistics
34	Modelos lineales generalizados geoestadísticos basados en distancias Melo Martínez, Oscar Orlando 23 July 2013 This thesis combines the distance-based method with generalized linear mixed models in both the spatial and the spatio-temporal setting. Using distances yields good predictions and lower variability across the space or space-time of the study region, which in turn supports better decisions in the problems of interest. An alternative method is proposed for fitting a beta-type response variable with variable dispersion using Euclidean distances between individuals. The unknown parameters of the proposed model are estimated by maximum likelihood, and the main properties of these estimators are presented. In addition, statistical inference on the parameters is carried out using approximations obtained from the asymptotic normality of the maximum likelihood estimator; diagnostics and the prediction of a new observation are developed, and the missing-data problem is studied using the proposed methodology. Next, an alternative solution is proposed for problems such as the prevalence of Loa loa, using Euclidean distances between individuals; a spatial generalized linear mixed model is described that incorporates general distance or dissimilarity measures that can be applied to explanatory variables. In this case, the parameters of the proposed model are estimated by Markov chain Monte Carlo (MCMC) maximum likelihood. A spatial beta linear mixed model with variable dispersion is also formulated, again using MCMC maximum likelihood. The proposed method is used in situations where the response variable is a ratio or proportion related to certain explanatory variables. 
To this end, an approach is developed using spatial generalized linear mixed models with the Box-Cox transformation in the precision model. The parameters are therefore optimized for both the spatial mean model and the spatial variable-dispersion model. In addition, statistical inference on the parameters is carried out using approximations obtained from the asymptotic normality of the maximum likelihood estimator. Model diagnostics and the prediction of new observations are also developed. Finally, the method is illustrated using clay and magnesium contents. Additionally, the distance-based model for spatio-temporal prediction using generalized linear models is described. The parameters of the proposed model are estimated by the generalized estimating equations method, and statistical inference on the parameters uses approximations obtained from the asymptotic normality of the maximum likelihood estimator. Model diagnostics and the prediction of new observations are also developed. The proposed methodology is applied to the standardized number of armed actions per 1000 km2 by the irregular groups FARC-EP and ELN in the departments of Colombia between 2003 and 2009. Finally, a spatial autoregressive generalized linear mixed model using the distance-based method is presented. This model includes both spatial and temporal lags between vectors of stationary state variables. The spatial dynamics of panel-type econometric data are used to estimate the proposed model; its parameters are estimated by MCMC maximum likelihood. 
This chapter also discusses the interaction between temporal and spatial stationarity, and the impulse responses for the proposed model are derived, which naturally depend on the model's temporal and spatial dynamics. / In the context of regression with a beta-type response variable, we propose a new method that links two methodologies: a distance-based model and a beta regression with variable dispersion. The proposed model is useful for situations where the response variable is a rate, a proportion or parts per million. This variable is related to a mixture of continuous and categorical explanatory variables. We present its main statistical properties and some measures for selecting the most predictive dimensions in the model. Furthermore, the prediction of a new observation and the problem of missing data are also developed. Using the proposed model, mutual funds are analyzed employing the Gower distance for both the mean model and the variable dispersion model. We also present a new method based on distances, which allows the modeling of continuous and non-continuous random variables through distance-based spatial generalized linear mixed models (SGLMMs). The parameters are estimated using Markov chain Monte Carlo (MCMC) maximum likelihood. The method is illustrated through the analysis of the variation in the prevalence of Loa loa among a sample of village residents in Cameroon, where the explanatory variables included elevation, together with the maximum normalized-difference vegetation index (NDVI) and the standard deviation of NDVI calculated from repeated satellite scans over time. Additionally, we propose a beta spatial linear mixed model with variable dispersion using MCMC. An approach to SGLMMs using the Box-Cox transformation in the precision model is developed. Thus, the parameter optimization process is carried out for both the spatial mean model and the spatial variable dispersion model. 
Statistical inference on the parameters is performed using approximations obtained from the asymptotic normality of the maximum likelihood estimator. Diagnostics and the prediction of a new observation are also developed. This model is illustrated using clay and magnesium contents. In addition, we present a solution to problems where the response variable is a count, a rate or a binary (dichotomous) variable, using a refined distance-based generalized linear space-time-autoregressive model with space-time-autoregressive disturbances. This model may also contain additional spatial exogenous variables as well as time exogenous variables. The parameters are estimated by the space-time generalized estimating equations (GEE) method, and a measure of goodness-of-fit is presented. The best linear unbiased predictor is also presented for prediction purposes. An application to the standardized number of armed actions per 1000 km2 by the rebel groups FARC-EP and ELN in different departments of Colombia from 2003 to 2009 illustrates the proposed methodology. Finally, a spatial generalized linear mixed autoregressive model using the distance-based method is defined, including spatial as well as temporal lags between vectors of stationary state variables. Although the structural parameters are not fully identified in this model, contemporaneous spatial lag coefficients may be identified by exogenous state variables. Dynamic spatial panel data econometrics is used to estimate our proposed model, and the parameters are estimated using MCMC maximum likelihood. We also discuss the interaction between temporal and spatial stationarity, and we derive the impulse responses for our model, which naturally depend upon the temporal and spatial dynamics of the model. Experimental Sciences and Mathematics 311 - Statistics
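The abstract above mentions the Gower distance as the dissimilarity used when the explanatory variables mix continuous and categorical types. A minimal sketch of the standard, unweighted Gower distance between two records follows, assuming no missing values: numeric variables contribute their absolute difference divided by the observed range, and categorical variables contribute 0 on a match and 1 on a mismatch. The function name, the `ranges` argument and the two example records are hypothetical and are not data from the thesis.

```python
def gower_distance(a, b, ranges):
    """Unweighted Gower distance between two records of mixed type.

    ranges[k] is the observed range (max - min) of numeric variable k,
    or None when variable k is categorical.
    """
    if not (len(a) == len(b) == len(ranges)) or not ranges:
        raise ValueError("records and ranges must have the same non-zero length")
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            # Categorical variable: simple matching.
            total += 0.0 if x == y else 1.0
        elif r > 0:
            # Numeric variable: range-normalised absolute difference.
            total += abs(x - y) / r
        # A numeric variable with zero range carries no information and adds 0.
    return total / len(ranges)


# Made-up records: (a numeric attribute, a categorical attribute).
record_a = (1.0, "equity")
record_b = (2.0, "bond")
d = gower_distance(record_a, record_b, ranges=(4.0, None))
print(d)
```

In a distance-based model, a matrix of such pairwise distances would then be turned into coordinates (for example by principal coordinates analysis) that enter the regression as predictors; that step is not shown here.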
35	Lleis d'escala i complexitat estructural de les infraestructures tecnològiques. Els sistemes biològics com a analogia pel disseny i optimització del transport i distribució de l'energia elèctrica Horta Bernús, Ricard 25 April 2014 Within the paradigm of sustainability, this thesis aims to provide views complementary to those of conventional engineering in order to put forward tools that, on the one hand, help find solutions to improve the design of high-voltage power lines and, on the other hand, optimize the networks they form. Regarding the improvement of high-voltage line design, Scaling Laws and Gravity Models are used to provide new variables for forecasting the electricity demand of a region. Modifications are proposed to the vector diagram of a transmission line's operation, known as the Perrine-Baum diagram, to incorporate the new variables, with the aim of making this diagram an application tool in real projects. Regarding electric power networks, this work applies the analysis tools provided by Complex Network theory to study their topology and spatial features in order to suggest better-optimized designs. 311 - Statistics
36	Modelado y resolución de problemas de secuenciación en contexto JIT/DS mediante BDP Cano Pérez, Alberto 20 June 2014 This document develops the doctoral thesis entitled "Modelado y resolución de problemas de secuenciación en contexto JIT/DS mediante BDP". A literature review explains the problems under study and frames them within sequencing in production systems. We then describe these problems: the MMSP-W (Mixed-Model Sequencing Problem with Work overload Minimization), the BFSP (Blocking Flow Shop Problem) and the ORV (Output Rate Variation Problem). Next, resolution procedures based on BDP (Bounded Dynamic Programming) are proposed to solve them. Finally, the results of the proposed procedures are compared with others based on, for example, linear programming or heuristics. 004 - Computer science 311 - Statistics 51 - Mathematics
37	Mixed models and point processes Serra Saurina, Laura 22 November 2013 The main objective of this thesis is to model the occurrence of wildfires and, in particular, knowing the factors with the most influence, to evaluate how they are distributed in space and time. The thesis has three major goals. First, we analyse whether the data follow a particular pattern or behave randomly. Second, because the distribution of fires varies over time, a model that includes the temporal component is used. Finally, we analyse the fires that burn areas larger than a given extent (50 ha, 100 ha or 150 ha); even though they represent a small percentage of all fires, they account for a high percentage of the area burned and cause important environmental damage. The results presented may contribute to the prevention and management of wildfires. In addition, the methodology used in this work can be useful for determining the factors that help a fire become a large wildfire. 311 - Statistics 517 - Analysis 630 - Forestry. Arboriculture
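The first goal in the abstract above is to check whether event locations follow a pattern or behave randomly. The abstract does not say which test is used; one common first check, shown here purely as an assumed stand-in, is the quadrat index of dispersion: divide the region into equal cells, count events per cell, and compare the variance of the counts with their mean. Under complete spatial randomness the ratio is close to 1, clustered patterns push it above 1, and regular patterns push it below 1. The unit-square region, the grid size and the coordinates are invented for the sketch.

```python
from statistics import mean, variance


def quadrat_counts(points, nx, ny):
    """Count points of the unit square falling in each cell of an nx-by-ny grid."""
    counts = [0] * (nx * ny)
    for x, y in points:
        # Clamp so that a coordinate of exactly 1.0 falls in the last cell.
        col = min(int(x * nx), nx - 1)
        row = min(int(y * ny), ny - 1)
        counts[row * nx + col] += 1
    return counts


def dispersion_index(counts):
    """Variance-to-mean ratio of the quadrat counts (sample variance)."""
    return variance(counts) / mean(counts)


# Invented locations: four events packed into one corner versus one per cell.
clustered = [(0.10, 0.10), (0.12, 0.10), (0.10, 0.15), (0.14, 0.12)]
regular = [(0.25, 0.25), (0.75, 0.25), (0.25, 0.75), (0.75, 0.75)]
print(dispersion_index(quadrat_counts(clustered, 2, 2)))
print(dispersion_index(quadrat_counts(regular, 2, 2)))
```

With so few points the index is only a toy; a real analysis would use many more events and a formal test, and the result depends on the grid chosen.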
38	Métodos estadísticos para tratar incertidumbre en estudios de asociación genética: aplicación a CNVs y SNPs imputados Subirana Cachinero, Isaac 18 September 2014 In recent years, a large number of genetic variants of different kinds have been discovered, from the simplest, which indicate a change in a single nucleotide (SNPs), to more complex ones concerning the number of copies of a segment of the DNA chain (CNVs). Although many other variants exist, such as inversions and microsatellites, this thesis focuses on SNPs and CNVs, since they are the two types of variant most often analysed in genetic epidemiology studies. In many situations, the methods for analysing the effect of SNPs or CNVs on disease are well established. In some cases, however, SNPs and CNVs are observed with uncertainty. For example, the genotype of a SNP is sometimes not observed directly but imputed. Likewise, the copy number of a CNV is established indirectly from the quantitative signal of its probe. This calls for appropriate “non-standard” statistical methods for studying association with imputed SNPs or CNVs while incorporating this uncertainty. Different strategies have been described in the literature for association studies between a genetic variant measured with uncertainty and a response variable: (i) the Naive strategy and (ii) the strategy known as Dosage. Roughly speaking, the first does not take the uncertainty into account, while the second does so approximately. This doctoral thesis proposes and describes analytically statistical models for genetic data measured with uncertainty that overcome the limitations of the existing methods. 
It is shown that these models incorporate the uncertainty appropriately in the likelihood function. In addition, numerical algorithms have been written to maximize the likelihood function efficiently, so that hundreds of thousands of genetic variants can be analysed (studies known as GWAS, Genome Wide Association Studies). The proposed model can handle different types of response variable: binary (presence or absence of a given disease), quantitative (blood cholesterol level) or censored (time to relapse). Techniques have been designed to analyse genetic variants not only individually but also in pairs simultaneously (interactions). All of this has been implemented in a set of structured functions integrated into R, a free software environment in common use in genetic epidemiology. Part of the functions' code has also been written in C++ so that the computations are much faster. The result is an R package called CNVassoc, together with an extensive user manual with numerous examples and instructions (vignette). The articles that make up this thesis are the following: • “Accounting for uncertainty when assessing association between copy number and disease: a latent class model”, which presents and describes the proposed model. • “Genetic association analysis and meta-analysis of imputed SNPs in longitudinal studies”, which extends the model to the analysis of imputed SNPs in studies with a time-to-event response (longitudinal studies). • “Interaction association analysis of imputed SNPs in case control and longitudinal studies”, which applies the model to interactions between pairs of imputed SNPs in case-control and longitudinal studies. 
• “CNVassoc: Association analysis of CNV data using R”, which describes the package developed and implemented in R together with its vignette. / In recent years, a large number of genetic variants have been discovered, from the simplest ones, which indicate a change in a single nucleotide (SNPs), to much more complex ones, which are repetitions of a segment of the DNA chain (CNVs). Although other genetic variants exist, such as microsatellites and inversions, this thesis focuses on SNPs and CNVs, since these are by far the most analyzed variants. In many cases, the methods for analyzing the effect of SNPs or CNVs on a disease are well established. However, in some cases SNPs and CNVs are measured with uncertainty. For example, sometimes the genotype of a SNP has not been directly observed but has been imputed instead. Similarly, the number of copies of a CNV is established indirectly from the quantitative signal of a designed probe. Appropriate “non-standard” statistical methods are therefore needed to study association with imputed SNPs or CNVs while incorporating this uncertainty. Several strategies have been described in the literature for association studies between a genetic variant measured with uncertainty and a response: (i) the Naive strategy and (ii) a strategy known as Dosage. Broadly speaking, the first does not take uncertainty into account, while the second does so only approximately. In this thesis, a statistical method is proposed to deal with genetic data measured with uncertainty and to overcome the limitations of existing methods. The method is described analytically and incorporates the uncertainty properly in the model likelihood. Numerical algorithms have also been built to maximize the likelihood efficiently, so that hundreds of thousands of variants can be analyzed in a reasonable time (GWAS, Genome-Wide Association Studies).
All of this has been implemented in a set of structured, integrated functions for R, free software that is very popular in genetic epidemiology. Part of the code has also been written in C++ to speed up the computations. The proposed method supports quantitative, binary and time-to-event responses, covering the most common designs in genetic association studies: case-control studies, quantitative traits and longitudinal studies. The method has also been adapted to perform interaction analysis (epistasis). Ciències Experimentals i Matemàtiques 311 - Estadística
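The contrast drawn in the abstract above between the Naive strategy, the Dosage strategy and a likelihood that carries the uncertainty can be illustrated with a minimal Python sketch. It is not the CNVassoc implementation: the function names, the additive mean `alpha + beta * g` and the normal response are assumptions chosen for illustration, and the mixture form (each individual's density averaged over its genotype or copy-number probabilities) is only one common way of writing a latent class likelihood.

```python
import numpy as np

def normal_logpdf(y, mean, sd):
    """Log-density of a normal distribution, written out explicitly."""
    return -0.5 * np.log(2 * np.pi * sd**2) - (y - mean) ** 2 / (2 * sd**2)

def naive_call(probs):
    """Naive strategy: keep the most probable class and ignore uncertainty."""
    return probs.argmax(axis=1)

def dosage(probs):
    """Dosage strategy: replace the class by its expected value."""
    return probs @ np.arange(probs.shape[1])

def mixture_loglik(y, probs, alpha, beta, sd):
    """Hypothetical latent class log-likelihood for a quantitative trait.

    probs[i, g] is the probability that individual i belongs to class g
    (an imputed genotype or a copy number). The mean for class g is assumed
    to be alpha + beta * g, an illustrative additive model.
    """
    classes = np.arange(probs.shape[1])
    means = alpha + beta * classes                       # shape (G,)
    dens = np.exp(normal_logpdf(y[:, None], means[None, :], sd))  # (n, G)
    return float(np.sum(np.log(np.sum(probs * dens, axis=1))))

# Illustrative data: three individuals, three possible classes (0, 1, 2).
y = np.array([1.0, 2.1, 2.9])
probs = np.array([[0.90, 0.10, 0.00],
                  [0.10, 0.60, 0.30],
                  [0.00, 0.45, 0.55]])
calls = naive_call(probs)      # hard calls, uncertainty discarded
doses = dosage(probs)          # expected class, uncertainty averaged
ll = mixture_loglik(y, probs, alpha=1.0, beta=1.0, sd=0.5)
```

When every row of `probs` puts all its mass on one class, the mixture log-likelihood reduces to the ordinary log-likelihood of the called class, which is a convenient sanity check for a sketch like this.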
39	Aplicació dels models de Thrustone i de Bradely-Terry a l’anàlisi de dades “ranking” obtingudes de mesures de preferència en escala ipsativa. Girabent i Farrés, Montserrat 04 November 2013 This research arose from an interest in measuring individuals' preferences when they are asked to order a list of options, whether behaviours or objects, thereby producing ranking data. The individual is thus forced to establish an order among their preferences, which gives rise to what is known as an ordinal ipsative measurement scale. Compared with a normative scale such as Likert, this type of measure has the advantage of reducing the likelihood of the well-known “acquiescence bias” problem and of eliminating the “halo and horn” effect. On the other hand, the main characteristic of the response vector is that its components always sum to the same constant, which complicates the analysis of the data. The first objective was to review the statistical models for analyzing ranking data measured on an ipsative scale that provide information about the discriminal process. The second was to extend these models to cases with repeated measures of individuals' choices regarding their preferences and/or with covariates describing characteristics of the individuals themselves or of the alternatives to be ordered. The first theory underpinning the use of an ipsative scale is Thurstone's law of comparative judgment (1927), which postulates that when an individual is asked to make a judgment, a discriminal process takes place on the psychological continuum. The interest lies in this unobserved continuous scale, which reveals the preference profile in terms of the order of, and distance between, the options. The methodology evaluated for finding solutions on a continuous interval scale rested on two approaches. The first, developed by Böckenholt's group (1991-2006), is based on the classical models developed by Thurstone in 1931.
In this approach the ranked observations are expressed as differences between the latent variables underlying each of the items being compared. Imposing the restrictions proposed by Maydeu-Olivares (2005) on the covariance matrix then yields a special case of a structural equation model (SEM). This model makes it possible to estimate the means of the latent variables, which correspond to the position of each option on the continuous interval scale. However, the solution depends on the latent variables being normally distributed, and the algorithm fails to find a solution beyond a certain number of options. Moreover, the model cannot handle repeated measures. The second approach comprises the work of Dittrich's group (1998-2012), based on the Bradley-Terry models (BTM) of 1952. BTMs assume that each paired judgment follows a binomial distribution. Working directly with the contingency table, the likelihood function can then be expressed as a general log-linear model (LLBTM). It is from this second model, and its extensions for covariates, that we propose the extension to repeated measures. The methodological proposals were tested both on simulated data and on two real examples from health sciences education. One studies the learning-style preferences (Canfield test) of medical students; the other assesses whether physiotherapy students' opinions of self-learning activities differ before and after carrying them out. Conclusions: • The difference between the Thurstone and Bradley-Terry approaches lies in the distribution assumed in the likelihood function. • The LLBTM model allows modifications to its conditions of application, each of which gives rise to one of the extensions of the model that incorporate covariates.
• The LLBTM model allows an extension in which the comparisons between options are not independent, giving rise to the models for repeated measures. / The research focuses on measuring individual preferences when people are asked to sort a list of options, thus obtaining ranking data. The subject is therefore forced to establish an order among their preferences, resulting in what is known as an ordinal ipsative measurement scale. The advantage of this type of measure over a normative measurement scale such as Likert is that it reduces the likelihood of the problem known as "acquiescence bias" and removes the "halo and horn" effect. However, the statistical analysis is difficult because the response vector always sums to a constant. The objectives were to review the statistical models for analyzing preferences measured on an ipsative scale, which give information about the discriminal process, and to extend these models to cases with repeated measures and/or covariates. The law of comparative judgment (Thurstone, 1927) postulates that this discriminal process takes place on a psychological continuum. This continuous scale makes it possible to find the distance between the options. The methodology evaluated is based on two approaches. The first is the work of Böckenholt's group (1991-2006), based on the classical models developed by Thurstone in 1931. It expresses the ranking data as differences between the latent variables underlying each of the items being compared. Imposing the Maydeu-Olivares (2005) restrictions on the covariance matrix then yields a special case of a structural equation model, which estimates the means of the latent variables; these correspond to the position of each option on the continuous interval scale. The solution, however, depends on the normality of the latent variables. In addition, the model does not allow for repeated measurements.
The second approach is the work of Dittrich's group (1998-2012), based on the Bradley-Terry model (1952), which assumes a binomial distribution for the paired comparisons. The likelihood function can thus be expressed as a general log-linear model (LLBTM). The extension we developed builds on the LLBTM. The aim of the first applied study was to determine the learning-style preferences of medical students. The purpose of the second study was to assess whether physiotherapy students' opinions about self-learning activities differ before and after performing them. Conclusions: • The difference between the Thurstone and Bradley-Terry approaches lies in the distribution assumed in the likelihood function. • The LLBTM allows modifications to its conditions of application that give rise to extensions incorporating covariates and handling repeated measures. Ciències Experimentals i Matemàtiques 311 - Estadística
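To make the Bradley-Terry idea in the abstract above concrete, the following Python sketch breaks each ranking into its implied paired comparisons and estimates one "worth" parameter per option, under the standard model in which option i is preferred to option j with probability worth_i / (worth_i + worth_j). It uses a simple fixed-point (minorization-maximization) iteration rather than the log-linear formulation described in the thesis, treats the pairs derived from one ranking as independent (a simplification), and all function names and data are hypothetical.

```python
import numpy as np

def rankings_to_wins(rankings, n_items):
    """Count paired preferences: wins[i, j] = times item i is ranked above j.

    Each ranking lists item indices from most to least preferred.
    """
    wins = np.zeros((n_items, n_items))
    for ranking in rankings:
        for pos, better in enumerate(ranking):
            for worse in ranking[pos + 1:]:
                wins[better, worse] += 1
    return wins

def fit_bradley_terry(wins, n_iter=500, tol=1e-12):
    """Estimate Bradley-Terry worths by a fixed-point (MM) iteration.

    Assumes every item is preferred at least once and that the comparison
    graph is connected; otherwise the estimates are not well defined.
    """
    n = wins.shape[0]
    worth = np.ones(n) / n
    n_comparisons = wins + wins.T          # comparisons made per pair
    total_wins = wins.sum(axis=1)
    for _ in range(n_iter):
        pair_sums = worth[:, None] + worth[None, :]
        denom = (n_comparisons / pair_sums).sum(axis=1)
        updated = total_wins / denom
        updated /= updated.sum()           # fix the scale: worths sum to 1
        converged = np.max(np.abs(updated - worth)) < tol
        worth = updated
        if converged:
            break
    return worth

# Illustrative rankings of three options (0, 1, 2) by five respondents.
rankings = [[0, 1, 2], [0, 2, 1], [1, 0, 2], [2, 1, 0], [0, 1, 2]]
wins = rankings_to_wins(rankings, n_items=3)
worth = fit_bradley_terry(wins)
```

The estimated worths place the options on a common scale, so both their order and the distances between them can be read off, which is the kind of preference profile the abstract describes.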
40	Economic evaluation in health research: cohort simultation and applications Pérez Álvarez, Nuria 23 July 2014 Resources available for health care are limited, so it is necessary to rationalize their consumption and prioritise their allocation to the options with better health outcomes and greater economic sustainability. For that reason, economic analyses are increasingly included in medical research as an instrument for evaluating different therapeutic strategies. In this thesis, cost and health outcome are evaluated both separately and jointly to compare therapeutic strategies for treating diseases in several specific health areas. The challenge was to adapt and implement the methods so that they reflect the health issue being assessed. The analyses require data, and the main sources are clinical studies (prospective or retrospective) and simulation models. Simulation avoids experimenting directly on the system of interest: it takes less time, costs less, and avoids any danger that the experiment itself might cause. However, simulated data are always an approximation of real data. Real data from a clinical trial were used to assess a program promoting adherence to antiretroviral treatment in HIV-infected patients. A decision tree was used to study the cost per health gain, measured by means of clinical and health-related quality of life outcomes. A Spanish cohort of postmenopausal women and their possible osteoporotic fractures was simulated to assess the cost-effectiveness of two treatments for the prevention of vertebral and non-vertebral fractures. Simulation by means of a Markov model required simplifying the disease evolution and the related events into a finite number of health states and the probabilities of moving from one state to another as time goes on.
Markov models were adapted to reflect that the risk of suffering an event can change over time. This analytical model was applied to elucidate whether co-receptor testing is cost-effective for determining a patient's suitability to benefit from an antiretroviral treatment that includes maraviroc. All HIV strains require binding to CD4 plus at least one of the two co-receptors CCR5 or CXCR4 to enter human cells. Some HIV strains can use both co-receptors, and some individuals have a mixture of strains. Only patients with exclusively CCR5-tropic HIV are considered eligible to use the CCR5 antagonist maraviroc. A budget impact analysis was performed to assess the economic effects of introducing eculizumab for treating paroxysmal nocturnal hemoglobinuria. Direct and indirect costs of treating this disease were estimated and reported from the perspective of the health care system and from the societal perspective. Most published clinical studies focus on measuring health in terms of efficacy and/or safety. Sometimes, however, health and well-being cannot be quantified by direct measurement. Here, the burden of disease for osteoporotic women who may suffer fractures was calculated at the individual level and presented in terms of disability-adjusted life years (DALYs). Few burden-of-disease studies are available, and even fewer for the Spanish population or based on individual characteristics. Pharmacoeconomic studies can be useful in rationalizing health resources, and both budget impact analyses and new health measures are complementary tools. The work performed in this thesis is a good example of applying and adapting methods to answer real clinical questions. / Resources allocated to health are currently limited, and their consumption needs to be rationalized; spending should be prioritized on options that deliver greater well-being and are economically sustainable.
For this reason, more and more clinical studies include economic parameters as an instrument for choosing between different therapeutic strategies. In this thesis, both cost and health response are studied separately and jointly to compare different strategies for treating diseases in different health areas. The challenge lies in adapting and implementing the methodology needed to follow up these health problems. The analyses require data, most of which come from clinical studies, whether prospective or retrospective, or from simulation models. Using simulated data avoids experimenting directly on the system of interest, which reduces the time, the cost and the experimental risk involved in obtaining results. On the other hand, simulation is only an approximation of real data. First, a program promoting adherence to antiretroviral treatment in HIV-positive patients is evaluated with real data collected in a clinical trial. The cost per unit of health gain, measured with clinical and quality-of-life parameters, is represented by means of a decision tree. Next, the simulation of a cohort of Spanish postmenopausal women and their possible bone fractures makes it possible to compare, in terms of cost-effectiveness, two treatments used to prevent vertebral and non-vertebral fractures. Markov models make it possible to simulate the course of the disease using a finite number of events that represent the possible health states and the probability that a patient changes state. Adapting the Markov models so that the risk of suffering an event varies over time makes it possible to determine whether HIV co-receptor tests are cost-effective for deciding whether a patient can benefit from antiretroviral treatment with maraviroc.
HIV strains must bind to CD4 and to at least one of the two possible co-receptors, CCR5 or CXCR4, to enter the cell. Some HIV viruses use both co-receptors, and some individuals carry a mixture of HIV strains. The patients who can benefit from maraviroc are those infected only by virus that uses the CCR5 co-receptor. Finally, a budget impact analysis quantifies the economic cost of introducing eculizumab to treat paroxysmal nocturnal hemoglobinuria. In this case, direct and indirect costs were estimated and are reported from the perspective of the health system and from the perspective of society. Most published clinical studies focus on measuring health in terms of efficacy and/or safety; often, however, health and well-being cannot be quantified directly. In this case, the burden of osteoporosis in postmenopausal women was quantified by means of disability-adjusted life years (DALYs) calculated from individual-level data. Few burden-of-disease studies are currently available, and fewer still for the Spanish population and with individual data. Pharmacoeconomic studies are useful for allocating resources, and budget impact studies and the development of new measures for quantifying health and well-being are complementary tools. The work carried out in this thesis is a good example of applying and adapting statistics to answer different questions of current clinical relevance. 311 - Estadística
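The Markov cohort approach described in the abstract above, with a finite set of health states and transition probabilities that may change over time, can be sketched in Python as follows. This is a minimal illustration and not the model used in the thesis: the three states, every probability, cost and utility value, the discount rate and the function names are hypothetical assumptions.

```python
import numpy as np

def make_transitions(base_risk, yearly_increase, relative_risk=1.0):
    """Return a function giving the transition matrix for cycle t.

    Hypothetical states: 0 = well, 1 = post-fracture, 2 = dead.
    The fracture risk grows with time, so the chain is time-dependent.
    """
    def transition_at(t):
        p_event = min(1.0, (base_risk + yearly_increase * t) * relative_risk)
        p_death = 0.02                      # assumed background mortality
        return np.array([
            [1.0 - p_event - p_death, p_event, p_death],
            [0.0, 1.0 - 2 * p_death, 2 * p_death],
            [0.0, 0.0, 1.0],
        ])
    return transition_at

def run_cohort(transition_at, state_costs, state_utils, n_cycles, discount=0.03):
    """Propagate a cohort and accumulate discounted costs and QALYs."""
    dist = np.zeros(len(state_costs))
    dist[0] = 1.0                           # everyone starts in the first state
    total_cost, total_qaly = 0.0, 0.0
    for t in range(n_cycles):
        factor = 1.0 / (1.0 + discount) ** t
        total_cost += factor * float(dist @ state_costs)
        total_qaly += factor * float(dist @ state_utils)
        matrix = transition_at(t)
        if not np.allclose(matrix.sum(axis=1), 1.0):
            raise ValueError("each row of the transition matrix must sum to 1")
        dist = dist @ matrix
    return total_cost, total_qaly, dist

def icer(cost_new, qaly_new, cost_old, qaly_old):
    """Incremental cost-effectiveness ratio: extra cost per extra QALY."""
    return (cost_new - cost_old) / (qaly_new - qaly_old)

utils = np.array([0.85, 0.60, 0.0])         # assumed utility per state
cost_none, qaly_none, _ = run_cohort(
    make_transitions(0.05, 0.005), np.array([0.0, 5000.0, 0.0]), utils, 20)
cost_drug, qaly_drug, final = run_cohort(
    make_transitions(0.05, 0.005, relative_risk=0.6),
    np.array([400.0, 5400.0, 0.0]), utils, 20)
ratio = icer(cost_drug, qaly_drug, cost_none, qaly_none)
```

Passing a function of the cycle index, rather than a single fixed matrix, is one simple way to let the risk of an event vary over time, as the abstract describes.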

Search results