Global ETD Search

231	Analyse intégrative de données de grande dimension appliquée à la recherche vaccinale / Integrative analysis of high-dimensional data applied to vaccine research Hejblum, Boris 06 March 2015 Les données d’expression génique sont reconnues comme étant de grande dimension, et nécessitant l’emploi de méthodes statistiques adaptées. Mais dans le contexte des essais vaccinaux, d’autres mesures, comme par exemple les mesures de cytométrie en flux, sont également de grande dimension. De plus, ces données sont souvent mesurées de manière longitudinale. Ce travail est bâti sur l’idée que l’utilisation d’un maximum d’information disponible, en modélisant les connaissances a priori ainsi qu’en intégrant l’ensemble des différentes données disponibles, améliore l’inférence et l’interprétabilité des résultats d’analyses statistiques en grande dimension. Tout d’abord, nous présentons une méthode d’analyse par groupe de gènes pour des données d’expression génique longitudinales. Ensuite, nous décrivons deux analyses intégratives dans deux études vaccinales. La première met en évidence une sous-expression des voies biologiques d’inflammation chez les patients ayant un rebond viral moins élevé à la suite d’un vaccin thérapeutique contre le VIH. La deuxième étude identifie un groupe de gènes lié au métabolisme lipidique dont l’impact sur la réponse à un vaccin contre la grippe semble régulé par la testostérone, et donc lié au sexe. Enfin, nous introduisons un nouveau modèle de mélange de distributions skew t à processus de Dirichlet pour l’identification de populations cellulaires à partir de données de cytométrie en flux disponible notamment dans les essais vaccinaux. En outre, nous proposons une stratégie d’approximation séquentielle de la partition a posteriori dans le cas de mesures répétées.
Ainsi, la reconnaissance automatique des populations cellulaires pourrait permettre à la fois une avancée pratique pour le quotidien des immunologistes ainsi qu’une interprétation plus précise des résultats d’expression génique après la prise en compte de l’ensemble des populations cellulaires. / Gene expression data is recognized as high-dimensional data that needs specific statistical tools for its analysis. But in the context of vaccine trials, other measures, such as flow-cytometry measurements, are also high-dimensional. In addition, such measurements are often repeated over time. This work is built on the idea that using as much of the available information as possible, by modeling prior knowledge and integrating all data at hand, will improve the inference and the interpretation of biological results from high-dimensional data. First, we present an original methodological development, Time-course Gene Set Analysis (TcGSA), for the analysis of longitudinal gene expression data, taking into account prior biological knowledge in the form of predefined gene sets. Second, we describe two integrative analyses of two different vaccine studies. The first study reveals lower expression of inflammatory pathways consistently associated with lower viral rebound following an HIV therapeutic vaccine. The second study highlights the role of a testosterone-mediated group of genes linked to lipid metabolism in sex differences in immunological response to a flu vaccine. Finally, we introduce a new model-based clustering approach for the automated treatment of cell populations from flow-cytometry data, namely a Dirichlet process mixture of skew t-distributions, with a sequential posterior approximation strategy for dealing with repeated measurements. Hence, the automatic recognition of the cell populations could allow a practical improvement of the daily work of immunologists as well as a better interpretation of gene expression data after taking into account the frequency of all cell populations.
Analyse intégrée Analyse par groupe de gènes Bayesien non paramétrique Connaissance a priori Cytométrie en flux Dimorphisme sexuel Distribution skew t Données de grande dimension Fenêtrage automatisé Grippe Génomique Modèle de mélange Processus de Dirichlet Vaccin VIH Automated gating Dirichlet process Flow cytometry Flu Gene set analysis High-dimensional data HIV Integrative analysis Mixture model Nonparametric Bayesian Prior knowledge Sexual dimorphism Skew t-distribution Statistical genomics Vaccine
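As a rough illustration of the Dirichlet process idea mentioned in record 231, the sketch below draws a random partition from a Chinese restaurant process, which is the prior over partitions that a Dirichlet process mixture induces. It is not the thesis's skew t mixture model or its sequential posterior approximation. The function name `sample_crp_partition` and the parameters `alpha` and `seed` are hypothetical names chosen for this example.

```python
import random


def sample_crp_partition(n_items, alpha, seed=0):
    """Draw cluster labels for n_items from a Chinese restaurant process.

    alpha > 0 is the concentration parameter (an assumption of this sketch):
    larger values tend to produce more clusters.
    """
    rng = random.Random(seed)
    labels = []
    counts = []  # counts[k] = number of items currently assigned to cluster k
    for _ in range(n_items):
        # An existing cluster k is chosen with weight counts[k],
        # a brand-new cluster with weight alpha.
        weights = counts + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(counts):
            counts.append(1)  # open a new cluster
        else:
            counts[k] += 1
        labels.append(k)
    return labels, counts


if __name__ == "__main__":
    labels, counts = sample_crp_partition(n_items=20, alpha=1.0, seed=42)
    print("labels:", labels)
    print("cluster sizes:", counts)
```

The number of clusters is not fixed in advance; it grows with the data. That property is what makes this family of priors attractive when the number of cell populations is unknown.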
232	Hard and fuzzy block clustering algorithms for high dimensional data / Algorithmes de block-clustering dur et flou pour les données en grande dimension Laclau, Charlotte 14 April 2016 Notre capacité grandissante à collecter et stocker des données a fait de l'apprentissage non supervisé un outil indispensable qui permet la découverte de structures et de modèles sous-jacents aux données, sans avoir à étiqueter les individus manuellement. Parmi les différentes approches proposées pour aborder ce type de problème, le clustering est très certainement le plus répandu. Le clustering suppose que chaque groupe, également appelé cluster, est distribué autour d'un centre défini en fonction des valeurs qu'il prend pour l'ensemble des variables. Cependant, dans certaines applications du monde réel, et notamment dans le cas de données de dimension importante, cette hypothèse peut être invalidée. Aussi, les algorithmes de co-clustering ont-ils été proposés: ils décrivent les groupes d'individus par un ou plusieurs sous-ensembles de variables au regard de leur pertinence. La structure des données finalement obtenue est composée de blocs communément appelés co-clusters. Dans les deux premiers chapitres de cette thèse, nous présentons deux approches de co-clustering permettant de différencier les variables pertinentes du bruit en fonction de leur capacité à révéler la structure latente des données, dans un cadre probabiliste d'une part et basée sur la notion de métrique, d'autre part. L'approche probabiliste utilise le principe des modèles de mélanges, et suppose que les variables non pertinentes sont distribuées selon une loi de probabilité dont les paramètres sont indépendants de la partition des données en cluster. L'approche métrique est fondée sur l'utilisation d'une distance adaptative permettant d'affecter à chaque variable un poids définissant sa contribution au co-clustering.
D'un point de vue théorique, nous démontrons la convergence des algorithmes proposés en nous appuyant sur le théorème de convergence de Zangwill. Dans les deux chapitres suivants, nous considérons un cas particulier de structure en co-clustering, qui suppose que chaque sous-ensemble d'individus et décrit par un unique sous-ensemble de variables. La réorganisation de la matrice originale selon les partitions obtenues sous cette hypothèse révèle alors une structure de blocks homogènes diagonaux. Comme pour les deux contributions précédentes, nous nous plaçons dans le cadre probabiliste et métrique. L'idée principale des méthodes proposées est d'imposer deux types de contraintes : (1) nous fixons le même nombre de cluster pour les individus et les variables; (2) nous cherchons une structure de la matrice de données d'origine qui possède les valeurs maximales sur sa diagonale (par exemple pour le cas des données binaires, on cherche des blocs diagonaux majoritairement composés de valeurs 1, et de 0 à l’extérieur de la diagonale). Les approches proposées bénéficient des garanties de convergence issues des résultats des chapitres précédents. Enfin, pour chaque chapitre, nous dérivons des algorithmes permettant d'obtenir des partitions dures et floues. Nous évaluons nos contributions sur un large éventail de données simulées et liées a des applications réelles telles que le text mining, dont les données peuvent être binaires ou continues. Ces expérimentations nous permettent également de mettre en avant les avantages et les inconvénients des différentes approches proposées. Pour conclure, nous pensons que cette thèse couvre explicitement une grande majorité des scénarios possibles découlant du co-clustering flou et dur, et peut être vu comme une généralisation de certaines approches de biclustering populaires. 
/ With the growing amount of available data, unsupervised learning has become an important tool for discovering underlying patterns without the need to label instances manually. Among the different approaches proposed to tackle this problem, clustering is arguably the most popular. Clustering is usually based on the assumption that each group, also called a cluster, is distributed around a center defined in terms of all features, whereas in some real-world applications dealing with high-dimensional data this assumption may be false. To this end, co-clustering algorithms were proposed to describe clusters by the subsets of features that are most relevant to them. The resulting latent structure of the data is composed of blocks usually called co-clusters. In the first two chapters, we describe two co-clustering methods that differentiate the relevance of features according to their capability of revealing the latent structure of the data, in both probabilistic and distance-based frameworks. The probabilistic approach uses the mixture model framework, where the irrelevant features are assumed to follow a probability distribution that is independent of the co-clustering structure. The distance-based (also called metric-based) approach, on the other hand, relies on an adaptive metric in which each variable is assigned a weight that defines its contribution to the resulting co-clustering. From the theoretical point of view, we show the global convergence of the proposed algorithms using Zangwill's convergence theorem. In the last two chapters, we consider a special case of co-clustering where, contrary to the original setting, each subset of instances is described by a unique subset of features, resulting in a diagonal structure of the initial data matrix. As for the first two contributions, we consider both probabilistic and metric-based approaches.
The main idea of the proposed contributions is to impose two different kinds of constraints: (1) we set the number of row clusters equal to the number of column clusters; (2) we seek a structure of the original data matrix that has the maximum values on its diagonal (for instance, for binary data, we look for diagonal blocks composed of ones with zeros outside the main diagonal). The proposed approaches enjoy the convergence guarantees derived from the results of the previous chapters. Finally, we present both hard and fuzzy versions of the proposed algorithms. We evaluate our contributions on a wide variety of synthetic and real-world benchmark binary and continuous data sets related to text mining applications and analyze the advantages and drawbacks of each approach. To conclude, we believe that this thesis explicitly covers a vast majority of the possible scenarios arising in hard and fuzzy co-clustering and can be seen as a generalization of some popular biclustering approaches. Classification Flou Classification croisée Modèle de mélange Approche métrique Modèle à bloc latent Données sparses Données binaires Classification de document Théorème de Zangwill Sélection de variable Données en grande dimension Algorithme Clustering Fuzzy Co-clustering Mixture model Metric approach Latent block model Sparse data Binary data Document clustering Zangwill theorem Feature selection High dimensional data Algorithm 004
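The two constraints described in record 232 for binary data can be made concrete with a small scoring function. The sketch below assumes a hard partition of rows and columns into the same number of clusters. It counts the cells that agree with an ideal block-diagonal pattern: ones inside diagonal blocks, zeros elsewhere. The function name `diagonal_block_score` is hypothetical, and this is only an illustrative criterion, not the estimation algorithm developed in the thesis.

```python
def diagonal_block_score(matrix, row_labels, col_labels):
    """Count cells of a binary matrix that match a block-diagonal pattern.

    A cell (i, j) lies in a diagonal block when row i and column j carry the
    same cluster label. It matches the pattern if it is 1 inside a diagonal
    block or 0 outside of one.
    """
    score = 0
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            in_diagonal_block = row_labels[i] == col_labels[j]
            if (value == 1) == in_diagonal_block:
                score += 1
    return score


if __name__ == "__main__":
    data = [
        [1, 1, 0, 0],
        [1, 1, 0, 0],
        [0, 0, 1, 1],
    ]
    cols = [0, 0, 1, 1]
    # A partition that matches the block structure scores every cell.
    print(diagonal_block_score(data, [0, 0, 1], cols))
    # Moving the second row to the wrong cluster lowers the score.
    print(diagonal_block_score(data, [0, 1, 1], cols))
```

A hard co-clustering procedure could, in principle, alternate between updating row labels and column labels to increase such a score. The fuzzy variants mentioned in the abstract would replace the hard labels with membership degrees.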
233	Computational study of Formation and Development of Liquid Jets in Low Injection Pressure Conditions. Focus on urea-water solution injection for exhaust gas aftertreatment. Marco Gimeno, Javier 23 October 2023 [ES] La creciente preocupación sobre el efecto de la emisión de gases nocivos provenientes de motores de combustión interna alternativos (ICE) a la atmósfera ha llevado a los gobiernos a lo ancho del planeta a limitar la cantidad de dichas emisiones, particularmente en Europa a través de las normas EURO. La dificultad en cumplir dichas limitaciones ha llevado a la industria automovilística a cambiar el foco de motores de encendido por compresión (CI) o provocado (SI) hacia la electrificación o los combustibles libres de carbono. Sin embargo, esta transición no se puede llevar a cabo de manera sencilla en el corto y medio plazo, mientras que combustibles libres de carbono como el Hidrógeno (H2) o el Amoniaco (NH3) siguen produciendo algunos contaminantes como los Óxidos de Nitrógeno (NOx), con los cuales hay que lidiar. Estas emisiones pueden ser particularmente dañinas para el ser humano ya que incrementan el riesgo de cáncer de pulmón. La Reducción Catalítica Selectiva (SCR) ha demostrado ser una tecnología eficaz para la reducción de este contaminante en particular. A través de una inyección de una Solución de Urea-Agua, junto con la energía térmica de los gases de escape, se genera una cantidad suficiente de NH3 capaz de neutralizar los indeseados NOx en un catalizador de reducción. Con la inclusión de los SCR en automóviles ligeros además de su presencia tradicional en automóviles pesados, los SCR han sido el foco de la comunidad científica para mejorar el entendimiento de su principio de actuación, y mejorar su eficiencia en un entorno legislativo en el que los límites de emisión se han estrechado enormemente.
Esta Tesis intenta ser parte de ese esfuerzo científico en caracterizar el proceso de inyección de UWS en su totalidad a través de un entorno computacional. El presente estudio tiene como objetivo proveer de un mejor entendimiento del proceso de atomización y degradación sufrido por los chorros de UWS. Las dinámicas no estacionarias que se dan lugar en la zonas cercana del chorro, añadido a la gran influencia de las características internas del inyector sobre el desarrollo del spray hacen que los métodos experimentales sean complicados para poder entender dicho proceso. Por otro lado, la Mecánica de Fluidos Computacional (CFD) presenta una alternativa. Para el propósito de esta Tesis, el CFD ha sido utilizado para caracterizar los sprays de SCR. Se intenta desarrollar y seleccionar los modelos más apropiados a chorros de baja velocidad, y establecer un conocimiento Una vez adquiridos dichos métodos, los mecanismos principales de rotura del chorro y de degradación de la urea se han analizan. En ese sentido, el uso de técnicas experimentales podrían ser sustituídos en el futuro para esta aplicación. Los métodos CFD son validados tanto en el campo cercano como en el lejano. Para el campo cercano, el tratamiento multi-fase se lleva a cabo a través de métodos de Modelo de Mezclas, o el método Volume-Of-Fluid. A través de ellos, la caracterización hidráulica de dos reconstrucciones del inyector de UWS se lleva a cabo. Subsiguientes análisis se llevan a cabo sobre las dinámicas de rotura de la vena líquida, descubriendo que mecanismos rigen el proceso. El estudio de campo lejano usa un Discrete Droplet Model (DDM) para lidiar con las fases líquidas y gaseosas. En él, la evaporación del agua y el proceso de termólisis de la urea han sido considerados y comparados con resultados experimentales con el fin de obtener una metodología fiel para su caracterización. 
Todo el conocimiento adquirido se aplica más tarde a un Close-Coupled SCR, en el cual condiciones de trabajo realista han sido consideradas. Además, una herramienta llamada Maximum Entropy Principle (MEP) es presentada. Por tanto, esta Tesis aporta una metodología valiosa capaz de predecir tanto el campo cercano como el lejano de chorros de UWS de una manera precisa. / [CA] La creixent preocupació sobre el efecte de l'emissió de gasos nocius provenients the motors de Combustió Interna Alternatius (ICE) a l'atmosfera ha dut als governs de tot el planeta a limitar la quantitat d'aquestes emisions, particularment a Europa mitjant les normes EURO. La dificultat de complir aquestes limitacions ha portat a l'industria automovilística a cambiar el focus de motors d'encedut per compresió (CI) o provocat (SI) cap a la electrificació o els combustibles lliures de carbó. No obstant això, aquesta transició no es pot dur a terme de manera senzilla , mentres que els combustibles lliures de carbó como l'Hidrogen (H2 ) o l'Amoniac (NH3 ) seguirien produint contaminants como els Óxids de Nitrogen (NOx ), amb els quals n'hi ha que bregar. Estes emissions poden ser particularment nocives per a l'esser humà ja que incrementen el risc de càncer de pulmó. La Reducció Catalítica Selectiva (SCR) ha demostrat ser una tecnología eficaç per a la reducció d'este contaminant en particular. Mitjançant una injecció d'una Solució D'Urea i Aigua, junt a l'energía térmica dels gasos d'fuita, es pot generar una quantitat suficiente de NH 3 capaç de neutralitzar els indesitjats NO x a un catalitzador de reducció. Amb l'inclusió dels SCR en automòvils lleugers a més de la seua tradicional presència en automòvils pesats, els SCR han segut el foc per a mijorar l'enteniment del seu principi d'actuació, i mijorar la seua eficiencia. Este estudi té como a objectiu proveir d'un mijor entenement del procés d'atomizació y degradació patit pels dolls de UWS. 
Les dinàmiques no estacionaries que es donen lloc en la zona propenca al doll, afegit a la gran influència de les característiques internes del injector sobre el desentroll de l'esprai, fan que els métods experimentals siguen complicats d'aplicar per entendre dit procés. Per un altre costat, la Mecànica de Fluïts Computacional (CFD) supon una alternativa que té certes avantatges. Per al propòsit d'esta Tesi, el CFD ha sigut utilitzat com la principal metodología per a caracteritzar elsesprais de SCR. Per mitjà de dits métodes, la Tesi vol desentrollar i seleccionar els models més apropiats que mitjos s'adapten a sprays de baixa velocitat, i establir un coneiximent per a posteriors estudis desentrollats sobre la mateixa temàtica. Una volta adquirits dits métodes, els mecanismes principals de trencament del doll, així com els de degradació de l'urea en amoníac s'analitzaran. En aquest sentit, l'us de técniques experimentals podría no ser utilitzat més en el futur per aquesta aplicació.Els métods CFD son aplicats i validats tant el el camp propenc com en el llunyà. Per al camp propenc, el tractament multi-component es porta a terme a través de métodes Eulerians-Eulerians, com el Model de Mescles, o el métode Volume-Of-Fluid. La caracterització hidràulica de dos reconstruccions de l'injector es porta a terme, els resultats del qual són comparats amb resultats experimentals. Subsegüents anàlisis es porten a terme sobre les dinàmiques de trencament de la vena líquida, descobrint qué mecanismes regeixen el procés. L'estudi de camp llunyà usa un Discrete Droplet Model (DDM) per a bregar en la fase líquida i gaseosa. En ell, l'evaporació del aigua y el procés de termòlisis de l'urea han sigut considerats i comparats amb el resultats experimentals amb la finalitat d'obtindre una metodología fidel per a la seua caracterització. Tot el coneixement obtingut s'aplica més tard a un Close-Coupled SCR, en el qual condicions de treball realistes han sigut considerades. 
Dels resultats obtinguts dels distints estudis, una ferramenta adicional anomenada Maximum Entropy Principle (MEP), capaç de predir el fenomen d'atomització dels doll de UWS sense la necessitat de realitzar simulacions del camp propenc, es presentat. Per tant, esta Tesi aporta una metodología capaç de predir tant el camp proper como el llunyà d'una manera precisa. / [EN] The increasing awareness of the effect of emitting harmful gases from Internal Combustion Engines (ICE) into the atmosphere has driven governments across the globe to limit these emissions, particularly in Europe through the EURO norms. The difficulty of meeting such limits has driven the automotive industry to shift from traditional Compression Ignited (CI) or Spark Ignited (SI) engines toward electrification or carbon-free fuels. Nonetheless, this transition will not be easy in the short and medium term, and carbon-free fuels such as Hydrogen (H2) and Ammonia (NH3) will keep producing certain pollutants, such as Nitrogen Oxides (NOx), that must be dealt with. These emissions can be particularly hazardous for humans, increasing the risk of developing lung cancer. Selective Catalytic Reduction (SCR) is an effective technology for reducing this specific ICE contaminant. An injection of a Urea-Water Solution (UWS), together with the thermal energy of the combustion gases, can generate a sufficient amount of NH3 to neutralize the unwanted NOx in a catalyst. With the fitting of SCR systems in light-duty applications, in addition to their traditional presence in heavy-duty usage, SCR has become a focus of research aimed at understanding its working principle and improving its efficiency. This Thesis aims to contribute to that scientific effort by characterizing the whole UWS injection process within a computational framework. The present research aims to provide a better understanding of the atomization and degradation processes undergone by UWS sprays.
The transient dynamics taking place in the near-field region, together with the strong influence of the inner-injector characteristics on the development of the spray, make it challenging for experimental approaches to provide such knowledge. Computational Fluid Dynamics (CFD) provides an alternative that has certain advantages. For this Thesis it has been adopted as the main methodology for characterizing SCR sprays. The Thesis seeks to develop and select the models that best suit low-velocity sprays. With the methods that best predict these sprays, the main jet breakup mechanisms and the urea-to-ammonia transformation are analyzed. In that way, experimental techniques could be avoided for such applications. CFD is applied and validated in both the near-field and far-field regions. For the near-field, multi-component flows are treated through Eulerian-Eulerian methods such as the Mixture Model or the Volume-Of-Fluid method. Through them, a hydraulic characterization of two reconstructions of the UWS injector is performed, with results compared with experimental data. Further analysis is done on the jet-to-droplet dynamics, assessing which mechanisms drive the process. The far-field analysis uses a Discrete Droplet Model (DDM) for dealing with the gas and liquid phases. In it, the evaporation of water and the thermolysis of urea are considered and again compared with experimental results to obtain a faithful methodology for their characterization. All the acquired knowledge is then applied to a commercial Close-Coupled SCR, in which realistic working conditions are considered. From the results obtained in the several studies, an additional tool based on the Maximum Entropy Principle (MEP), capable of predicting the UWS spray atomization phenomenon without the need to perform near-field simulations, is provided.
Accordingly, this Thesis provides a valuable methodology capable of accurately predicting the near-field and far-field dynamics, validated against experimental results from the literature. Additionally, the MEP tool can be used independently in computational and experimental works to predict the performance of UWS atomizers. The work carried out represents a significant leap in the application of CFD tools to the prediction of low-velocity sprays. / Javier Marco Gimeno has been funded through a grant from the Government of Generalitat Valenciana with reference ACIF/2020/259 and financial support from the European Union. These same institutions, Government of Generalitat Valenciana and The European Union, supported through a grant for pre-doctoral stays out of the Comunitat Valenciana with reference CIBEFP/2021/11 the research carried out during the stay at Energy Systems, Argonne National Laboratory, United States of America. / Marco Gimeno, J. (2023). Computational study of Formation and Development of Liquid Jets in Low Injection Pressure Conditions. Focus on urea-water solution injection for exhaust gas aftertreatment [Tesis doctoral]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/198699 Solución de Urea-Agua (UWS) Internal Combustion Engines (ICE) Tratamiento de gases de escape Computational Fluid Dynamics (CFD) Reynolds-Averaged Navier-Stokes (RANS) Large Eddy Simulation (LES) Volume-Of-Fluid (VOF) Mixture Model (MM) Discrete Droplet Model (DDM) Maximum Entropy Principle (MEP) Flujo Interno Flujo Externo Acoplamiento INGENIERIA AEROESPACIAL
234	Analysis of survey data in the presence of non-ignorable missing-data and selection mechanisms Hammon, Angelina 04 July 2023 Diese Dissertation beschäftigt sich mit Methoden zur Behandlung von nicht-ignorierbaren fehlenden Daten und Stichprobenverzerrungen – zwei häufig auftretenden Problemen bei der Analyse von Umfragedaten. Beide Datenprobleme können die Qualität der Analyseergebnisse erheblich beeinträchtigen und zu irreführenden Inferenzen über die Population führen. Daher behandle ich innerhalb von drei verschiedenen Forschungsartikeln Methoden, die eine Durchführung von sogenannten Sensitivitätsanalysen in Bezug auf Missing- und Selektionsmechanismen ermöglichen und dabei auf typische Survey-Daten angewandt werden können. Im Rahmen des ersten und zweiten Artikels entwickele ich Verfahren zur multiplen Imputation von binären und ordinalen Mehrebenen-Daten, welche es zulassen, einen potenziellen Missing Not at Random (MNAR) Mechanismus zu berücksichtigen. In unterschiedlichen Simulationsstudien konnte bestätigt werden, dass die neuen Imputationsmethoden in der Lage sind, in allen betrachteten Szenarien unverzerrte sowie effiziente Schätzungen zu liefern. Zudem konnte ihre Anwendbarkeit auf empirische Daten aufgezeigt werden. Im dritten Artikel untersuche ich ein Maß zur Quantifizierung und Adjustierung von nicht ignorierbaren Stichprobenverzerrungen in Anteilswerten, die auf der Basis von nicht-probabilistischen Daten geschätzt wurden. Es handelt sich hierbei um die erste Anwendung des Index auf eine echte nicht-probabilistische Stichprobe abseits der Forschergruppe, die das Maß entwickelt hat. Zudem leite ich einen allgemeinen Leitfaden für die Verwendung des Index in der Praxis ab und validiere die Fähigkeit des Maßes, vorhandene Stichprobenverzerrungen korrekt zu erkennen.
Die drei vorgestellten Artikel zeigen, wie wichtig es ist, vorhandene Schätzer auf ihre Robustheit hinsichtlich unterschiedlicher Annahmen über den Missing- und Selektionsmechanismus zu untersuchen, wenn es Hinweise darauf gibt, dass die Ignorierbarkeitsannahme verletzt sein könnte, und stellen erste Lösungen zur Umsetzung bereit. / This thesis deals with methods for the appropriate handling of non-ignorable missing data and sample selection, two common challenges in survey data analysis. Both issues can dramatically affect the quality of analysis results and lead to misleading inferences about the population. Therefore, in three research articles, I address methods for performing so-called sensitivity analyses with regard to the missing-data and selection mechanisms that are usable with typical survey data. In the first and second articles, I provide novel procedures for the multiple imputation of binary and ordinal multilevel data that allow for a potential Missing Not at Random (MNAR) mechanism. Various simulation studies considering different data scenarios demonstrated that the methods produce unbiased and efficient estimates. Moreover, I show their applicability to empirical data. In the third article, I investigate a measure to quantify and adjust for non-ignorable selection bias in proportions estimated from non-probabilistic data. In doing so, I provide the first application of the suggested index to a real non-probability sample outside its original research group. In addition, I derive general guidelines for its usage in practice, and validate the measure’s performance in properly detecting selection bias.
The three presented articles highlight the necessity to assess the sensitivity of estimates towards different assumptions about the missing-data and selection mechanism if it seems realistic that the ignorability assumption might be violated, and provide first solutions to enable such robustness checks for specific data situations. Missing Not at Random Multiple Imputation Fully conditional specification Mehrebenen Daten Selektionsmodell Selection Not at Random Stichprobenverzerrung Nicht-probabilistische Stichprobe Pattern-mixture Modell Sensitivitätsanalyse Missing Not at Random Multiple imputation Fully conditional specification Multilevel data Selection model Selection Not at Random Selection bias Non-probability sample Pattern-mixture model Sensitivity analysis 300 Sozialwissenschaften ddc:300 ddc:519
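The kind of sensitivity analysis described in record 234 can be illustrated with a deliberately simple pattern-mixture calculation for a single binary survey item. The sketch assumes that the odds of a positive answer among nonrespondents equal the observed odds multiplied by a user-chosen `odds_ratio`. A value of 1 corresponds to an ignorable mechanism, and other values represent MNAR departures. This is a hypothetical toy example, not the multiple imputation procedures or the selection bias index studied in the thesis. The function name and all parameters are assumptions of this sketch.

```python
def adjusted_proportion(n_obs_success, n_obs, n_missing, odds_ratio):
    """Overall proportion under a simple pattern-mixture assumption.

    The odds of success among the n_missing nonrespondents are assumed to be
    odds_ratio times the odds observed among the n_obs respondents.
    Requires 0 < n_obs_success < n_obs so that the observed odds are finite.
    """
    p_obs = n_obs_success / n_obs
    odds_missing = odds_ratio * p_obs / (1.0 - p_obs)
    p_missing = odds_missing / (1.0 + odds_missing)
    return (n_obs * p_obs + n_missing * p_missing) / (n_obs + n_missing)


if __name__ == "__main__":
    # Scan a grid of departures from ignorability and report the estimate.
    for odds_ratio in (0.25, 0.5, 1.0, 2.0, 4.0):
        estimate = adjusted_proportion(60, 100, 50, odds_ratio)
        print(f"odds ratio {odds_ratio:>4}: estimated proportion {estimate:.3f}")
```

If the substantive conclusion stays the same across the plausible range of `odds_ratio`, the estimate can be considered robust to the assumed departures. If it changes, the ignorability assumption matters for the analysis.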
235	CURE RATE AND DESTRUCTIVE CURE RATE MODELS UNDER PROPORTIONAL ODDS LIFETIME DISTRIBUTIONS FENG, TIAN January 2019 Cure rate models, introduced by Boag (1949), are very commonly used for modelling lifetime data involving long-term survivors. Applications of cure rate models can be seen in biomedical science, industrial reliability, finance, manufacturing, demography and criminology. In this thesis, cure rate models are discussed under a competing cause scenario, with the assumption of proportional odds (PO) lifetime distributions for the susceptibles, and statistical inferential methods are then developed based on right-censored data. In Chapter 2, a flexible cure rate model is discussed by assuming that the number of competing causes for the event of interest follows the Conway-Maxwell (COM) Poisson distribution, and that the corresponding lifetimes of non-cured or susceptible individuals can be described by a PO model. This provides a natural extension of the work of Gu et al. (2011), who had considered a geometric number of competing causes. Under right censoring, maximum likelihood estimators (MLEs) are obtained using the expectation-maximization (EM) algorithm. An extensive Monte Carlo simulation study is carried out for various scenarios, and model discrimination between some well-known cure models like geometric, Poisson and Bernoulli is also examined. The goodness-of-fit and model diagnostics of the model are also discussed. A cutaneous melanoma dataset example is used to illustrate the models as well as the inferential methods. Next, in Chapter 3, the destructive cure rate models, introduced by Rodrigues et al. (2011), are discussed under the PO assumption. Here, the initial number of competing causes is modelled by a weighted Poisson distribution with special focus on exponentially weighted Poisson, length-biased Poisson and negative binomial distributions.
Then, a damage distribution is introduced for the number of initial causes which do not get destroyed. An EM-type algorithm for computing the MLEs is developed. An extensive simulation study is carried out for various scenarios, and model discrimination between the three weighted Poisson distributions is also examined. All the models and methods of estimation are evaluated through a simulation study. A cutaneous melanoma dataset example is used to illustrate the models as well as the inferential methods. In Chapter 4, frailty cure rate models are discussed under a gamma frailty wherein the initial number of competing causes is described by a Conway-Maxwell (COM) Poisson distribution and the lifetimes of non-cured individuals can be described by a PO model. The detailed steps of the EM algorithm are then developed for this model and an extensive simulation study is carried out to evaluate the performance of the proposed model and the estimation method. A cutaneous melanoma dataset as well as a simulated dataset are used for illustrative purposes. Finally, Chapter 5 outlines the work carried out in the thesis and also suggests some problems of further research interest. / Thesis / Doctor of Philosophy (PhD) Cure rate models Mixture model Long-term survivors COM-Poisson distribution Weighted Poisson distribution EM algorithm Right censoring Non-informative censoring Profile likelihood Asymptotic variances and covariances Maximum likelihood estimation Likelihood-ratio test Exponential distribution Proportional odds model Weibull distribution Log-logistic distribution Gamma distribution Mixture of chi-square Akaike Information Criterion (AIC) Bayesian Information Criterion (BIC) Model discrimination Monte Carlo simulations Goodness-of-fit test Cutaneous melanoma
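The competing-cause construction in record 235 has a compact closed form in its simplest special case. If the number of competing causes is Poisson with mean theta and each cause has failure distribution F(t), the population survival function is exp(-theta * F(t)) and the cure fraction is exp(-theta). The sketch below uses that Poisson case with a log-logistic lifetime, one of the distributions listed in the record's keywords. The COM-Poisson, destructive and frailty extensions studied in the thesis are not implemented here. The function names and parameter names are hypothetical.

```python
import math


def loglogistic_survival(t, scale, shape):
    """Survival function of a log-logistic lifetime: 1 / (1 + (t / scale)^shape)."""
    return 1.0 / (1.0 + (t / scale) ** shape)


def population_survival(t, theta, scale, shape):
    """Population survival with a Poisson(theta) number of competing causes.

    S_pop(t) = exp(-theta * F(t)), where F(t) = 1 - S(t) is the failure
    probability of a single cause by time t.
    """
    failure_prob = 1.0 - loglogistic_survival(t, scale, shape)
    return math.exp(-theta * failure_prob)


def cure_fraction(theta):
    """Limit of S_pop(t) as t grows: the probability of zero competing causes."""
    return math.exp(-theta)


if __name__ == "__main__":
    theta, scale, shape = 1.5, 2.0, 3.0
    for t in (0.0, 1.0, 2.0, 5.0, 50.0):
        print(f"t = {t:>5}: S_pop = {population_survival(t, theta, scale, shape):.4f}")
    print(f"cure fraction = {cure_fraction(theta):.4f}")
```

The survival curve levels off at the cure fraction instead of decaying to zero. That plateau is the feature that distinguishes cure rate models from standard survival models.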
236	Regression modeling with missing outcomes : competing risks and longitudinal data / Contributions aux modèles de régression avec réponses manquantes : risques concurrents et données longitudinales Moreno Betancur, Margarita 05 December 2013 (has links) Les données manquantes sont fréquentes dans les études médicales. Dans les modèles de régression, les réponses manquantes limitent notre capacité à faire des inférences sur les effets des covariables décrivant la distribution de la totalité des réponses prévues sur laquelle porte l'intérêt médical. Outre la perte de précision, toute inférence statistique requiert qu'une hypothèse sur le mécanisme de manquement soit vérifiée. Rubin (1976, Biometrika, 63:581-592) a appelé le mécanisme de manquement MAR (pour les sigles en anglais de « manquant au hasard ») si la probabilité qu'une réponse soit manquante ne dépend pas des réponses manquantes conditionnellement aux données observées, et MNAR (pour les sigles en anglais de « manquant non au hasard ») autrement. Cette distinction a des implications importantes pour la modélisation, mais en général il n'est pas possible de déterminer si le mécanisme de manquement est MAR ou MNAR à partir des données disponibles. Par conséquent, il est indispensable d'effectuer des analyses de sensibilité pour évaluer la robustesse des inférences aux hypothèses de manquement. Pour les données multivariées incomplètes, c'est-à-dire, lorsque l'intérêt porte sur un vecteur de réponses dont certaines composantes peuvent être manquantes, plusieurs méthodes de modélisation sous l'hypothèse MAR et, dans une moindre mesure, sous l'hypothèse MNAR ont été proposées. En revanche, le développement de méthodes pour effectuer des analyses de sensibilité est un domaine actif de recherche. 
Le premier objectif de cette thèse était de développer une méthode d'analyse de sensibilité pour les données longitudinales continues avec des sorties d'étude, c'est-à-dire, pour les réponses continues, ordonnées dans le temps, qui sont complètement observées pour chaque individu jusqu'à la fin de l'étude ou jusqu'à ce qu'il sorte définitivement de l'étude. Dans l'approche proposée, on évalue les inférences obtenues à partir d'une famille de modèles MNAR dits « de mélange de profils », indexés par un paramètre qui quantifie le départ par rapport à l'hypothèse MAR. La méthode a été motivée par un essai clinique étudiant un traitement pour le trouble du maintien du sommeil, durant lequel 22% des individus sont sortis de l'étude avant la fin. Le second objectif était de développer des méthodes pour la modélisation de risques concurrents avec des causes d'évènement manquantes en s'appuyant sur la théorie existante pour les données multivariées incomplètes. Les risques concurrents apparaissent comme une extension du modèle standard de l'analyse de survie où l'on distingue le type d'évènement ou la cause l'ayant entrainé. Les méthodes pour modéliser le risque cause-spécifique et la fonction d'incidence cumulée supposent en général que la cause d'évènement est connue pour tous les individus, ce qui n'est pas toujours le cas. Certains auteurs ont proposé des méthodes de régression gérant les causes manquantes sous l'hypothèse MAR, notamment pour la modélisation semi-paramétrique du risque. Mais d'autres modèles n'ont pas été considérés, de même que la modélisation sous MNAR et les analyses de sensibilité. Nous proposons des estimateurs pondérés et une approche par imputation multiple pour la modélisation semi-paramétrique de l'incidence cumulée sous l'hypothèse MAR. En outre, nous étudions une approche par maximum de vraisemblance pour la modélisation paramétrique du risque et de l'incidence sous MAR. 
Enfin, nous considérons des modèles de mélange de profils dans le contexte des analyses de sensibilité. Un essai clinique étudiant un traitement pour le cancer du sein de stade II avec 23% des causes de décès manquantes sert à illustrer les méthodes proposées. / Missing data are a common occurrence in medical studies. In regression modeling, missing outcomes limit our capability to draw inferences about the covariate effects of medical interest, which are those describing the distribution of the entire set of planned outcomes. In addition to losing precision, the validity of any method used to draw inferences from the observed data will require that some assumption about the mechanism leading to missing outcomes holds. Rubin (1976, Biometrika, 63:581-592) called the missingness mechanism MAR (for “missing at random”) if the probability of an outcome being missing does not depend on missing outcomes when conditioning on the observed data, and MNAR (for “missing not at random”) otherwise. This distinction has important implications regarding the modeling requirements to draw valid inferences from the available data, but generally it is not possible to assess from these data whether the missingness mechanism is MAR or MNAR. Hence, sensitivity analyses should be routinely performed to assess the robustness of inferences to assumptions about the missingness mechanism. In the field of incomplete multivariate data, in which the outcomes are gathered in a vector for which some components may be missing, MAR methods are widely available and increasingly used, and several MNAR modeling strategies have also been proposed. On the other hand, although some sensitivity analysis methodology has been developed, this is still an active area of research. 
The first aim of this dissertation was to develop a sensitivity analysis approach for continuous longitudinal data with drop-outs, that is, continuous outcomes that are ordered in time and completely observed for each individual up to a certain time-point, at which the individual drops out so that all the subsequent outcomes are missing. The proposed approach consists in assessing the inferences obtained across a family of MNAR pattern-mixture models indexed by a so-called sensitivity parameter that quantifies the departure from MAR. The approach was prompted by a randomized clinical trial investigating the benefits of a treatment for sleep-maintenance insomnia, from which 22% of the individuals had dropped out before the end of the study. The second aim was to build on the existing theory for incomplete multivariate data to develop methods for competing risks data with missing causes of failure. The competing risks model is an extension of the standard survival analysis model in which failures from different causes are distinguished. Strategies for modeling competing risks functionals, such as the cause-specific hazards (CSH) and the cumulative incidence function (CIF), generally assume that the cause of failure is known for all patients, but this is not always the case. Some methods for regression with missing causes under the MAR assumption have already been proposed, especially for semi-parametric modeling of the CSH. But other useful models have received little attention, and MNAR modeling and sensitivity analysis approaches have never been considered in this setting. We propose a general framework for semi-parametric regression modeling of the CIF under MAR using inverse probability weighting and multiple imputation ideas. Also under MAR, we propose a direct likelihood approach for parametric regression modeling of the CSH and the CIF. Furthermore, we consider MNAR pattern-mixture models in the context of sensitivity analyses. 
In the competing risks literature, a starting point for methodological developments for handling missing causes was a stage II breast cancer randomized clinical trial in which 23% of the deceased women had missing cause of death. We use these data to illustrate the practical value of the proposed approaches. Données manquantes; Données longitudinales; Risques concurrents; Régression; Réponses manquantes; Sorties d'étude; Cause d'évènement manquante; Imputation multiple; Estimateurs pondérés; Maximum de vraisemblance; Modèle de mélange de profils; Analyse de sensibilité; Modèle linéaire mixte; Fonction d'incidence cumulée; Risque cause-spécifique; Pseudo-valeurs / Missing data; Longitudinal data; Competing risks; Regression; Missing outcomes; Drop-out; Missing cause of failure; Multiple imputation; Inverse probability weighting; Direct likelihood; Pattern-mixture model; Sensitivity analysis; Linear mixed model; Cumulative incidence function; Cause-specific hazard; Pseudo-values
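To make the idea of a sensitivity parameter in entry 236 concrete, here is a minimal toy sketch. It is an assumed delta-adjustment scheme, not the dissertation's pattern-mixture models: the unobserved outcomes of drop-outs are taken to have a mean equal to the observed mean shifted by delta, and the overall mean is the mixture of the two patterns weighted by their proportions. The function names and any numbers passed to them are hypothetical.

```python
def pattern_mixture_mean(observed, n_missing, delta):
    """Overall mean under a two-pattern mixture with a delta shift.

    observed  : outcomes actually recorded
    n_missing : number of individuals whose outcome is missing
    delta     : assumed difference between the mean of the missing
                outcomes and the mean of the observed ones
                (delta = 0 means the missing resemble the observed)
    """
    n_obs = len(observed)
    mean_obs = sum(observed) / n_obs
    prop_missing = n_missing / (n_obs + n_missing)
    # Mixture over the two patterns: observed and missing.
    return (1.0 - prop_missing) * mean_obs + prop_missing * (mean_obs + delta)

def sensitivity_curve(observed, n_missing, deltas):
    """Estimated overall mean for each candidate value of delta."""
    return [(d, pattern_mixture_mean(observed, n_missing, d)) for d in deltas]
```

Calling `sensitivity_curve` over a range of delta values shows how far the conclusion moves as the assumed departure grows, which is the spirit of a sensitivity analysis; covariates, repeated measures and variance estimation are all omitted here.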
