Global ETD Search

111	Bayesian modeling of neuropsychological test scores Du, Mengtian 06 October 2021 (has links) In this dissertation we propose novel Bayesian methods for analyzing patterns of neuropsychological testing. We first focus on situations in which the goal of the analysis is to discover risk factors of cognitive decline using longitudinal assessment of test scores. Variable selection in the Bayesian setting is still challenging, particularly for analysis of longitudinal data. We propose a novel approach to selection of the fixed effects in mixed effect models that combines a backward selection algorithm and a metric based on the posterior credible intervals of the model parameters. The heuristic of this approach is to search for those parameters that are most likely to be different from zero based on their posterior credible intervals, without requiring ad hoc approximations of model parameters or informative prior distributions. We show via a simulation study that this approach produces more parsimonious models than other popular criteria such as the Bayesian deviance information criterion. We then apply this approach to test the hypothesis that genotypes of the APOE gene have different effects on the rate of cognitive decline of participants in the Long Life Family Study. In the second part of the dissertation we shift focus to the analysis of neuropsychological tests administered using emerging digital technologies. The challenge of analyzing these data is that, for each study participant, the test is a data stream that records time and spatial coordinates of the digitally executed test, and the goal is to extract informative univariate summary variables that can be used for analysis. Toward this goal, we propose a novel application of Bayesian Hidden Markov Models to analyze digitally recorded Trail Making Tests. Applying the Hidden Markov Model enables us to perform automatic segmentation of the digital data stream and allows us to extract meaningful metrics that correlate Trail Making Test performance with other cognitive and physical function test scores. We show that the extracted metrics provide information in addition to the traditionally used scores. / 2023-10-06T00:00:00Z Biostatistics Bayesian hierarchical models Bayesian variable selection Credible intervals Mixed effects models
112	Advancing Bechhofer's Ranking Procedures to High-dimensional Variable Selection Gu, Chao 01 September 2021 (has links) No description available. Mathematics Statistics Ranking Procedure Variable Selection High dimensional Analysis Regression Analysis Coverage Probability
113	Scoring pour le risque de crédit : variable réponse polytomique, sélection de variables, réduction de la dimension, applications / Scoring for credit risk : polytomous response variable, variable selection, dimension reduction, applications Vital, Clément 11 July 2016 (has links) The objective of this thesis was to explore the subject of scoring as it is used in the banking world, and more precisely to control credit risk. The diversification and globalization of the banking business in the second half of the twentieth century led to the introduction of regulations that require banks to hold enough capital to cover the risk they take. These regulations also require banks to model several risk indicators, among them the probability of default: for a given loan, the probability that the client will be unable to repay the amount owed. Modeling this indicator requires defining a variable of interest called the risk criterion, which distinguishes the "good clients" from the "bad clients". In more formal statistical terms, we want to model a variable with values in {0,1} by a set of explanatory variables. This problem is usually treated as a scoring problem. Scoring consists in defining functions, called scoring functions, that condense the information contained in the explanatory variables into a real-valued score. The goal of such a function is to induce the same ordering of the observations as the posterior probability of the model, so that the observations that have a high probability of being "good" have a high score, and those that have a high probability of being "bad" (and thus a high risk for the bank) have a low score. Performance criteria such as the ROC curve and the AUC quantify how relevant the ordering produced by the scoring function is. The reference method for obtaining scoring functions is logistic regression, which we present here. A major issue in credit scoring is variable selection. Banks have access to large databases that gather all the information they hold on their clients, both sociodemographic and behavioral, and not all of these variables help explain the risk criterion. To select variables, we chose the Lasso method, which applies a constraint on the coefficients of the model so that the least significant coefficients are set to zero. We applied the Lasso method to linear regression and logistic regression. We also considered an extension of the Lasso method called Group Lasso on logistic regression, which selects groups of variables rather than individual variables. Then, we considered the case in which the response variable is not binary but polytomous, that is to say with more than two response levels. The first step in this new context was to extend the definition of scoring from the binary case to the polytomous case. We then presented regression methods adapted to this case: a generalization of binary logistic regression, semi-parametric methods, and an application of the Lasso principle to polytomous logistic regression. Finally, the last chapter applies some of the methods presented in this manuscript to real data sets from the bank, to see how they meet the actual needs of the company. Scoring Credit risk Polytomous regression Variable selection Lasso
114	Heritability Estimation in High-dimensional Mixed Models : Theory and Applications. / Estimation de l'héritabilité dans les modèles mixtes en grande dimension : théorie et applications. Bonnet, Anna 05 December 2016 (has links) We study statistical methods to estimate the heritability of a biological trait, which is the proportion of the variation of this trait that can be explained by genetic factors. First, we propose to study the heritability of quantitative traits using high-dimensional sparse linear mixed models. We investigate the theoretical properties of the maximum likelihood estimator for the heritability and we show that it is a consistent estimator and that it satisfies a central limit theorem with a closed-form expression for the asymptotic variance. This result, supported by an extended numerical study, shows that the variance of our estimator is strongly affected by the ratio between the number of observations and the size of the random genetic effects. More precisely, when the number of observations is small compared to the size of the genetic effects (which is often the case in genetic studies), the variance of our estimator is very large. This motivated the development of a variable selection method in order to capture the genetic variants that are most involved in the phenotypic variations and to provide more accurate heritability estimates. We then propose a variable selection method adapted to high-dimensional settings and we show that, depending on the number of genetic variants actually involved in the phenotypic variations, called causal variants, it may or may not be a good idea to include a variable selection step before estimating heritability. The last part of this thesis is dedicated to heritability estimation for binary data, in order to study the proportion of genetic factors involved in complex diseases. We propose to study the theoretical properties of the method developed by Golan et al. (2014) for case-control data, which is very efficient in practice. Our main result is the proof of the consistency of their heritability estimator. Heritability Mixed models High dimension Variable selection
115	Statistical methods for transcriptomics: From microarrays to RNA-seq Tarazona Campos, Sonia 30 March 2015 (has links) Transcriptomics studies the expression level of genes under different experimental conditions in order to identify the genes associated with a given phenotype, as well as the regulatory relationships between genes. Omics data are characterized by containing information on thousands of variables in a sample with few observations. The most common high-throughput technologies for measuring the expression level of thousands of genes simultaneously are microarrays and, more recently, RNA sequencing (RNA-seq). This thesis deals with the evaluation, adaptation and development of statistical models for the analysis of gene expression data, whether estimated with microarrays or with RNA-seq. The study is approached with both univariate and multivariate tools and methods. / Tarazona Campos, S. (2014). Statistical methods for transcriptomics: From microarrays to RNA-seq [Doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/48485 / Thesis / Extraordinary Doctoral Thesis Awards Biostatistics Bioinformatics Variable selection Non-parametric statistical methods Differential expression Microarrays RNA-seq Transcriptomics Statistics and Operations Research
116	Applications of Time to Event Analysis in Clinical Data Xu, Chenjia 12 1900 (has links) Indiana University-Purdue University Indianapolis (IUPUI) / Survival analysis has broad applications in diverse research areas. In this dissertation, we consider an innovative application of the survival analysis approach to phase I dose-finding design and the modeling of multivariate survival data. In the first part of the dissertation, we apply time to event analysis in an innovative dose-finding design. To account for the unique features of a new class of oncology drugs, T-cell engagers, we propose a phase I dose-finding method incorporating systematic intra-subject dose escalation. We use a survival analysis approach to analyze intra-subject dose-escalation data and to identify the maximum tolerated dose. We evaluate the operating characteristics of the proposed design through simulation studies and compare it to existing methodologies. The second part of the dissertation focuses on multivariate survival data with semi-competing risks. Time-to-event data from the same subject are often correlated. In addition, semi-competing risks are sometimes present with correlated events, when a terminal event can censor other non-terminal events but not vice versa. We use a semiparametric frailty model to account for the dependence between correlated survival events and semi-competing risks, and adopt a penalized partial likelihood (PPL) approach for parameter estimation. In addition, we investigate methods for variable selection in semiparametric frailty models and propose a double penalized partial likelihood (DPPL) procedure for variable selection of fixed effects in frailty models. We consider two penalty functions, the least absolute shrinkage and selection operator (LASSO) and the smoothly clipped absolute deviation (SCAD) penalty. The proposed methods are evaluated in simulation studies and illustrated using data from the Indianapolis-Ibadan Dementia Project. Dose-finding Frailty model Semi-competing risk Survival analysis Time to event Variable selection
117	Statistical Modeling Method for Efficiency Improvement of Industrial Processes / 生産プロセス効率化のための統計的モデリング手法 Kim, Sanghong 24 March 2014 (has links) Kyoto University / 0048 / New system, course-based doctorate / Doctor of Engineering / 甲第18311号 / 工博第3903号 / 新制||工||1599 (University Library) / 31169 / Department of Chemical Engineering, Graduate School of Engineering, Kyoto University / (Chief examiner) Professor 長谷部伸治, Professor 大嶋正裕, Professor 宮原稔 / Conferred under Article 4, Paragraph 1 of the Degree Regulations / Doctor of Philosophy (Engineering) / Kyoto University / DFAM Soft-sensor Statistical model Input variable scaling Input variable selection Inferential control Just-in-time model 500
118	(Ultra-)High Dimensional Partially Linear Single Index Models for Quantile Regression Zhang, Yuankun 30 October 2018 (has links) No description available. Statistics Nonparametric modeling Single-index models Quantile regression Splines Variable selection High dimensionality
119	A Comparison of Variable Selection Methods for Modeling Human Judgment Carter, Kristina A. 05 June 2019 (has links) No description available. Psychology Quantitative Psychology judgment analysis lens model analysis variable selection random forest judgment modeling
120	On Analysis of Sufficient Dimension Reduction Models An, Panduan 04 June 2019 (has links) No description available. Mathematics Statistics Sufficient dimension reduction central subspace central mean subspace monotonicity variable selection hypothesis testing nonparametric
