Global ETD Search

71	自變數增加對岭估計的影響分析萬世卿, Wan, Shin Chin Unknown Date (has links) 在最小平方估計中，當自變數間有共線性關係時，參數估計的變異變大，使得參數估計值不穩定。解決共線性對參數估計所造成影響的方法有很多，岭估計就是其中之一。在岭估計中，為了偵測出對岭估計有影響力的自變數，本文仿照Schall-Dunne的處理方式，推導出類似的Cook統計量及AP估計量，並且提出以Kullback-Leibler對稱散度來偵測對岭估計有影響力自變數。最後用"加拿大金融市場"與"員工對主管滿意度調查"的兩個實例，來說明本文所提出對岭估計有影響力自變數之偵測方法。岭估計共線性 Cook統計量 AP統計量 Kullback-Leibler 對稱散度變異膨脹因子廣義變異膨脹因子 Ridge regression
72	Estimation de fonctions de régression : sélection d'estimateurs ridge, étude de la procédure PLS1 et applications à la modélisation de la signature génique du cancer du poumon / Estimation of regression functions : ridge estimators selection, study of PLS1 procedure and applications on modelling the genetic signature of lung cancer Binard, Carole 04 May 2016 (has links) Cette thèse porte sur l’estimation d'une fonction de régression fournissant la meilleure relation entredes variables pour lesquelles on possède un certain nombre d’observations. Une première partie portesur une étude par simulation de deux méthodes automatiques de sélection du paramètre de laprocédure d'estimation ridge. D'un point de vue plus théorique, on présente et compare ensuite deuxméthodes de sélection d'un multiparamètre intervenant dans une procédure d'estimation d'unefonction de régression sur l'intervalle [0,1]. Dans une deuxième partie, on étudie la qualité del'estimateur PLS1, d'un point de vue théorique, à travers son risque quadratique et, plus précisément,le terme de variance dans la décomposition biais/variance de ce risque. Enfin, dans une troisièmepartie, une étude statistique sur données réelles est menée afin de mieux comprendre la signaturegénique de cellules cancéreuses à partir de la signature génique des sous-types cellulaires constituantle stroma tumoral associé / This thesis deals with the estimation of a regression function providing the best relationship betweenvariables for which we have some observations. In a first part, we complete a simulation study fortwo automatic selection methods of the ridge parameter. From a more theoretical point of view, wethen present and compare two selection methods of a multiparameter, that is used in an estimationprocedure of a regression function on [0,1]. In a second part, we study the quality of the PLS1estimator through its quadratic risk and, more precisely, the variance term in its bias/variancedecomposition. In a third part, a statistical study is carried out in order to explain the geneticsignature of cancer cells thanks to the genetic signatures of cellular subtypes which compose theassociated tumor stroma Estimateurs ridge par morceaux Fonction de régression Microdissection biologique in silico Procédure PLS Régression ridge Risque quadratique Sélection d'estimateurs Signature génique Simulations Piecewise ridge estimators Regression fuction In silico bilogical microdissection PLS1 procedure Ridge regression Quadratic risk Estimator selection Genetic signature Simulations
73	Statistical and Machine Learning for assessment of Traumatic Brain Injury Severity and Patient Outcomes Rahman, Md Abdur January 2021 (has links) Traumatic brain injury (TBI) is a leading cause of death in all age groups, causing society to be concerned. However, TBI diagnostics and patient outcomes prediction are still lacking in medical science. In this thesis, I used a subset of TBIcare data from Turku University Hospital in Finland to classify the severity, patient outcomes, and CT (computerized tomography) as positive/negative. The dataset was derived from the comprehensive metabolic profiling of serum samples from TBI patients. The study included 96 TBI patients who were diagnosed as 7 severe (sTBI=7), 10 moderate (moTBI=10), and 79 mild (mTBI=79). Among them, there were 85 good recoveries (Good_Recovery=85) and 11 bad recoveries (Bad_Recovery=11), as well as 49 CT positive (CT. Positive=49) and 47 CT negative (CT. Negative=47). There was a total of 455 metabolites (features), excluding three response variables. Feature selection techniques were applied to retain the most important features while discarding the rest. Subsequently, four classifications were used for classification: Ridge regression, Lasso regression, Neural network, and Deep learning. Ridge regression yielded the best results for binary classifications such as patient outcomes and CT positive/negative. The accuracy of CT positive/negative was 74% (AUC of 0.74), while the accuracy of patient outcomes was 91% (AUC of 0.91). For severity classification (multi-class classification), neural networks performed well, with a total accuracy of 90%. Despite the limited number of data points, the overall result was satisfactory. TBI (Traumatic brain injury) Metabolites Glasgow coma scale Severity Patient outcomes CT positive /negative Random Forest Boruta Lasso regression Ridge regression Neural network Deep learning. Social Sciences Interdisciplinary
74	Accelerating bulk material property prediction using machine learning potentials for molecular dynamics : predicting physical properties of bulk Aluminium and Silicon / Acceleration av materialegenskapers prediktion med hjälp av maskininlärda potentialer för molekylärdynamik Sepp Löfgren, Nicholas January 2021 (has links) In this project machine learning (ML) interatomic potentials are trained and used in molecular dynamics (MD) simulations to predict the physical properties of total energy, mean squared displacement (MSD) and specific heat capacity for systems of bulk Aluminium and Silicon. The interatomic potentials investigated are potentials trained using the ML models kernel ridge regression (KRR) and moment tensor potentials (MTPs). The simulations using these ML potentials are then compared with results obtained from ab-initio simulations using the gold standard method of density functional theory (DFT), as implemented in the Vienna ab-intio simulation package (VASP). The results show that the MTP simulations reach comparable accuracy compared to the DFT simulations for the properties total energy and MSD for Aluminium, with errors in the orders of magnitudes of meV and 10-5 Å2. Specific heat capacity is not reasonably replicated for Aluminium. The MTP simulations do not reasonably replicate the studied properties for the system of Silicon. The KRR models are implemented in the most direct way, and do not yield reasonably low errors even when trained on all available 10000 time steps of DFT training data. On the other hand, the MTPs require only to be trained on approximately 100 time steps to replicate the physical properties of Aluminium with accuracy comparable to DFT. After being trained on 100 time steps, the trained MTPs achieve mean absolute errors in the orders of magnitudes for the energy per atom and force magnitude predictions of 10-3 and 10-1 respectively for Aluminium, and 10-3 and 10-2 respectively for Silicon. At the same time, the MTP simulations require less core hours to simulate the same amount of time steps as the DFT simulations. In conclusion, MTPs could very likely play a role in accelerating both materials simulations themselves and subsequently the emergence of the data-driven materials design and informatics paradigm. machine learning moment tensor potential kernel ridge regression molecular dynamics density functional theory materials science data-driven materials design maskininlärning molekylärdynamik täthetsfunktionalteori materialvetenskap datadriven materialdesign Physical Sciences Fysik Computer Sciences Datavetenskap (datalogi)
75	Pénalisation et réduction de la dimension des variables auxiliaires en théorie des sondages / Penalization and data reduction of auxiliary variables in survey sampling Shehzad, Muhammad Ahmed 12 October 2012 (has links) Les enquêtes par sondage sont utiles pour estimer des caractéristiques d'une populationtelles que le total ou la moyenne. Cette thèse s'intéresse à l'étude detechniques permettant de prendre en compte un grand nombre de variables auxiliairespour l'estimation d'un total.Le premier chapitre rappelle quelques définitions et propriétés utiles pour lasuite du manuscrit : l'estimateur de Horvitz-Thompson, qui est présenté commeun estimateur n'utilisant pas l'information auxiliaire ainsi que les techniques decalage qui permettent de modifier les poids de sondage de facon à prendre encompte l'information auxiliaire en restituant exactement dans l'échantillon leurstotaux sur la population.Le deuxième chapitre, qui est une partie d'un article de synthèse accepté pourpublication, présente les méthodes de régression ridge comme un remède possibleau problème de colinéarité des variables auxiliaires, et donc de mauvais conditionnement.Nous étudions les points de vue "model-based" et "model-assisted" dela ridge regression. Cette technique qui fournit de meilleurs résultats en termed'erreur quadratique en comparaison avec les moindres carrés ordinaires peutégalement s'interpréter comme un calage pénalisé. Des simulations permettentd'illustrer l'intérêt de cette technique par compar[a]ison avec l'estimateur de Horvitz-Thompson.Le chapitre trois présente une autre manière de traiter les problèmes de colinéaritévia une réduction de la dimension basée sur les composantes principales. Nousétudions la régression sur composantes principales dans le contexte des sondages.Nous explorons également le calage sur les moments d'ordre deux des composantesprincipales ainsi que le calage partiel et le calage sur les composantes principalesestimées. Une illustration sur des données de l'entreprise Médiamétrie permet deconfirmer l'intérêt des ces techniques basées sur la réduction de la dimension pourl'estimation d'un total en présence d'un grand nombre de variables auxiliaires / Survey sampling techniques are quite useful in a way to estimate population parameterssuch as the population total when the large dimensional auxiliary data setis available. This thesis deals with the estimation of population total in presenceof ill-conditioned large data set.In the first chapter, we give some basic definitions that will be used in thelater chapters. The Horvitz-Thompson estimator is defined as an estimator whichdoes not use auxiliary variables. Along with, calibration technique is defined toincorporate the auxiliary variables for sake of improvement in the estimation ofpopulation totals for a fixed sample size.The second chapter is a part of a review article about ridge regression estimationas a remedy for the multicollinearity. We give a detailed review ofthe model-based, design-based and model-assisted scenarios for ridge estimation.These estimates give improved results in terms of MSE compared to the leastsquared estimates. Penalized calibration is also defined under survey sampling asan equivalent estimation technique to the ridge regression in the classical statisticscase. Simulation results confirm the improved estimation compared to theHorvitz-Thompson estimator.Another solution to the ill-conditioned large auxiliary data is given in terms ofprincipal components analysis in chapter three. Principal component regression isdefined and its use in survey sampling is explored. Some new types of principalcomponent calibration techniques are proposed such as calibration on the secondmoment of principal component variables, partial principal component calibrationand estimated principal component calibration to estimate a population total. Applicationof these techniques on real data advocates the use of these data reductiontechniques for the improved estimation of population totals Sondage Colinéarité Régression ridge Calage pénalisé Estimateur assisté par un modèle Estimateur basé sur un modèle Estimateur de Horvitz-Thompson Calage sur composantes principales Survey sampling Multicollinearity Ridge regression Penalized calibration Model-based estimator Model-assisted estimator Horvitz-Thompson estimator Principal component calibration 519

Page generated in 0.055 seconds