1 |
A Moving-window penalization method and its applications
Bao, Minli, 01 August 2017
Genome-wide association studies (GWAS) have played an important role in identifying genetic variants underlying complex human traits. However, their success is hindered by weak effects at causal variants and noise at non-causal variants. Penalized regression can be applied to such problems, but GWAS data have a specific feature: consecutive genetic markers are usually highly correlated because of linkage disequilibrium.
This thesis introduces a moving-window penalization method for GWAS that smooths the effects of consecutive SNPs. Simulation studies indicate that the moving-window penalized method yields more true positive findings. The practical utility of the method is demonstrated on the Genetic Analysis Workshop 16 (GAW 16) Rheumatoid Arthritis data.
Next, the moving-window penalty is applied to generalized linear models; we call this approach the smoothed lasso (SLasso). Coordinate descent algorithms are derived in detail for both quadratic and logistic loss, and asymptotic properties are discussed. Building on SLasso, we then propose a two-stage method called MW-Ridge. Simulation results show that, although SLasso provides more true positive findings than the lasso, it has a side effect: it also selects more unrelated noise variables. MW-Ridge can eliminate this side effect, achieving high true positive rates and low false detection rates. The applicability of the methods to real data is illustrated with the GAW 16 Rheumatoid Arthritis data.
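The abstract does not give the exact form of the moving-window penalty. As a minimal sketch, assuming a quadratic loss and a penalty that adds a ridge-type term on the differences between coefficients of markers at most `window` positions apart, each coordinate descent step has a closed form: soft-threshold the partial-residual correlation plus lam2 times the sum of the neighbouring coefficients, then divide by a_j + lam2 * m_j (m_j = number of neighbours). The function `moving_window_lasso` and its parameters are hypothetical, not the thesis's implementation.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Lasso soft-thresholding operator S(z, gamma) = sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * max(abs(z) - gamma, 0.0)


def moving_window_lasso(X, y, lam1, lam2, window=1, n_iter=500, tol=1e-8):
    """Coordinate descent for an ASSUMED moving-window objective:

        (1/2n)||y - X b||^2 + lam1 * sum_j |b_j|
          + (lam2/2) * sum_{j<k, k-j<=window} (b_j - b_k)^2

    The last term pulls together coefficients of nearby markers.
    """
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.astype(float).copy()          # y - X @ beta, with beta starting at 0
    col_sq = (X ** 2).sum(axis=0) / n       # a_j = x_j' x_j / n
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            # markers inside the moving window around position j
            lo, hi = max(0, j - window), min(p, j + window + 1)
            neigh = [k for k in range(lo, hi) if k != j]
            s_j, m_j = beta[neigh].sum(), len(neigh)
            # correlation of x_j with the partial residual (beta_j added back)
            z_j = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = soft_threshold(z_j + lam2 * s_j, lam1) / (col_sq[j] + lam2 * m_j)
            if new != beta[j]:
                resid -= X[:, j] * (new - beta[j])   # keep the residual in sync
                max_change = max(max_change, abs(new - beta[j]))
                beta[j] = new
        if max_change < tol:
            break
    return beta
```

With `lam2 = 0` the update reduces to the ordinary lasso coordinate update, which makes it easy to compare the two penalties on the same simulated data.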
The SLasso and MW-Ridge approaches are then generalized to multivariate response data, which can be transformed into univariate response data. The causal variants need not be the same for different response variables. We found that, whether the causal variants were fully matched or only 60% matched across responses, MW-Ridge consistently outperformed the lasso, detecting all true positives with lower false detection rates.
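The abstract states that multivariate responses can be transformed into a univariate problem but does not say how. One standard construction, shown here as an assumption rather than the thesis's exact transform, stacks the responses as y = vec(Y) with the block-diagonal design I_q ⊗ X. Each response then gets its own coefficient block, so the causal variants are free to differ between responses. The helper name `stack_responses` is hypothetical.

```python
import numpy as np


def stack_responses(X, Y):
    """Turn an n x q multivariate-response problem into one univariate problem.

    Assumed construction (block-diagonal stacking): y_stack = vec(Y) and
    X_stack = kron(I_q, X), so coefficient block k belongs to response k.
    """
    n, q = Y.shape
    y_stack = Y.T.reshape(-1)            # column k of Y becomes the k-th block of y
    X_stack = np.kron(np.eye(q), X)      # (n*q) x (p*q) block-diagonal design
    return X_stack, y_stack
```

Any univariate penalized fit can then be run on `(X_stack, y_stack)`; a moving-window penalty would need to be restricted to neighbours within the same coefficient block so that the last marker of one response is not smoothed with the first marker of the next.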
|
2 |
Markers Of Alcohol Use Disorder Outpatient Treatment Outcome: Prediction Modeling Of Day One Treatment
Schaubhut, Geoffrey J, 01 January 2020
ABSTRACT
Background: Alcohol use disorders (AUD) affect health and wellbeing and have broad societal costs (Bouchery, Harwood, Sacks, Simon, & Brewer, 2011; Rehm et al., 2009; Sudhinaraset, Wigglesworth, Takeuchi, & Tsuker, 2016). Although treatments have existed for decades, they have limited success and are expensive to administer. Understanding which factors best predict who will benefit most from treatment therefore remains a worthwhile goal. Prior attempts to identify factors associated with positive treatment outcome are limited by their methodology, including statistical methods that lead to poor predictive power in new samples. This study uses a data-driven approach to clarify the predictors of AUD treatment success (Objective 1), accompanied by a theory-driven analysis of whether psychological distress mediates treatment outcomes (Objective 2).

Methods: One hundred forty-five patients seeking treatment for alcohol use problems at the Day One Intensive Outpatient Treatment Program (part of UVM Medical Center) between June 2011 and June 2012 were examined. Variables were extracted through chart review and categorized using the Bronfenbrenner Ecological Model. First, 20% of the sample was set aside for model testing, and the remaining 80% was used to fit an elastic net regularized linear regression with 10-fold cross-validation. Models were evaluated on the set-aside sample to estimate out-of-sample prediction, and repeated models were compared to ensure generalizability. Next, a theoretical model was tested in which psychological distress mediates the relationship between individual predictors and treatment outcome.

Results: The models developed with the elastic net approach were consistent in model strength (mean = 0.32, standard deviation = 0.03), with the number of included variables ranging from 14 to 31.
Across the models, 15 variables occurred in more than 75% of the models, and an additional 7 variables were included in 25% to 75% of the models. Some of the strongest predictors were treatment non-compliance (β = -0.92), ASI Alcohol Composite (β = 0.63), treatment dosage (β = -0.36), and readiness to change (β = -0.95). The theory-driven mediation analysis identified several strong direct predictors of the frequency of alcohol use at outcome, including readiness to change (β = -0.59), initial frequency of alcohol use (β = 0.27), and access to a primary care physician (β = -2.20). However, none of the mediation pathways (testing psychological variables) differed significantly from the direct models.

Conclusions: This study used both data-driven and theory-driven methods to examine factors affecting AUD treatment. The data-driven methods identified several predictors of outcome that can guide treatment efforts within Day One IOP treatment and may generalize to other abstinence-based treatment settings. For example, the results support focusing on treatment attendance and using motivational interviewing to enhance readiness to change. Demographic variables such as age and gender, which have been shown to predict treatment outcome in small studies without cross-validation, were not selected by the elastic net regression. This is suspected to reflect overfitting in those prior studies, which underscores the importance of generalizable statistical methods for understanding predictors of treatment outcome. This notion is supported by the theory-driven model, which did not yield a strong model of treatment success. Taken together, the results support the use of rigorous analytic techniques to guide future theory.
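The Methods describe a 20% holdout, an elastic net fitted on the remaining 80% with 10-fold cross-validation, and repeated models whose selected variables are compared. A minimal NumPy sketch of that workflow follows, using a coordinate-descent elastic net with the glmnet-style objective and mixing parameter `alpha`. The mixing value, the lambda grid, and out-of-sample R² as the evaluation metric are assumptions (the abstract does not say which measure "model strength" refers to), and all function names are hypothetical.

```python
import numpy as np


def elastic_net_cd(X, y, lam, alpha, n_iter=1000, tol=1e-7):
    """Coordinate descent for
        (1/2n)||y - X b||^2 + lam * (alpha * ||b||_1 + (1 - alpha)/2 * ||b||_2^2).
    Expects standardized columns of X and a centred y (no intercept term).
    """
    n, p = X.shape
    beta = np.zeros(p)
    resid = y.astype(float).copy()
    col_sq = (X ** 2).sum(axis=0) / n
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            z = X[:, j] @ resid / n + col_sq[j] * beta[j]
            new = np.sign(z) * max(abs(z) - lam * alpha, 0.0)   # L1 part
            new /= col_sq[j] + lam * (1 - alpha)                # L2 part
            delta = new - beta[j]
            if delta != 0.0:
                resid -= X[:, j] * delta
                beta[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta


def cv_elastic_net(X, y, lambdas, alpha=0.5, k=10, seed=0):
    """Choose lambda by k-fold cross-validated mean squared error."""
    folds = np.array_split(np.random.default_rng(seed).permutation(len(y)), k)
    cv_mse = []
    for lam in lambdas:
        errs = []
        for f in folds:
            train = np.setdiff1d(np.arange(len(y)), f)
            b = elastic_net_cd(X[train], y[train], lam, alpha)
            errs.append(np.mean((y[f] - X[f] @ b) ** 2))
        cv_mse.append(np.mean(errs))
    return lambdas[int(np.argmin(cv_mse))]


def fit_with_holdout(X, y, lambdas, alpha=0.5, test_frac=0.2, seed=0):
    """Hold out test_frac of the rows, tune on the rest, report test R^2."""
    idx = np.random.default_rng(seed).permutation(len(y))
    n_test = int(round(test_frac * len(y)))
    test, train = idx[:n_test], idx[n_test:]
    mu, sd = X[train].mean(axis=0), X[train].std(axis=0)
    sd[sd == 0] = 1.0                                   # guard constant columns
    Xtr, Xte = (X[train] - mu) / sd, (X[test] - mu) / sd
    y_mean = y[train].mean()
    ytr = y[train] - y_mean
    lam = cv_elastic_net(Xtr, ytr, lambdas, alpha)
    beta = elastic_net_cd(Xtr, ytr, lam, alpha)
    pred = Xte @ beta + y_mean
    r2 = 1 - np.sum((y[test] - pred) ** 2) / np.sum((y[test] - y[test].mean()) ** 2)
    return beta, lam, r2


def selection_frequency(X, y, lambdas, n_repeats=10):
    """Share of repeated holdout fits in which each coefficient is non-zero."""
    counts = np.zeros(X.shape[1])
    for s in range(n_repeats):
        beta, _, _ = fit_with_holdout(X, y, lambdas, seed=s)
        counts += beta != 0
    return counts / n_repeats
```

Repeating the fit over different holdout splits is what produces per-variable inclusion rates of the kind summarized in the Results (for example, variables occurring in more than 75% of the models).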
|
3 |
Exploring relevant features associated with measles nonvaccination using a machine learning approach
Olaya Bucaro, Orlando, January 2020
Measles is resurging, and large outbreaks have been observed in several parts of the world. In 2019 the Philippines suffered a major measles outbreak, partly due to low immunization rates in certain parts of the population. There is currently limited research on how to identify and effectively reach pockets of unvaccinated individuals. This thesis aims to find important factors associated with non-vaccination against measles using a machine learning approach and data from the 2017 Philippine National Demographic and Health Survey. In the analyzed sample (n = 4006), 74.84% of children aged 9 months to 3 years had received their first dose of measles vaccine, and 25.16% had not. A logistic regression with all 536 candidate features was fitted using the elastic net, a regularized regression method that automatically selects relevant features. The final model contains 32 predictors, related to access to and contact with healthcare, region of residence, wealth, education, religion, ethnicity, sanitary conditions, the ideal number of children, the husband's occupation, the child's age and weight, and pre- and postnatal care. The final model has an overall accuracy of 79.02% [95% confidence interval: (76.37%, 81.5%)], a sensitivity of 97.73%, a specificity of 23.41%, and an area under the receiver operating characteristic curve of 0.81. The results indicate that socioeconomic differences partly determine measles vaccination. However, the difficulty of classifying non-vaccinated children from health and demographic characteristics alone, reflected in the low specificity, suggests that factors not available in the analyzed data, possibly vaccine hesitancy, could have a large effect on measles non-vaccination. Based on the results, efforts should be made to ensure access to facility-based delivery for all mothers regardless of socioeconomic status, to improve measles vaccination rates in the Philippines.
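The reported sensitivity and specificity appear to treat vaccinated children as the positive class, which is consistent with the abstract linking the low specificity to difficulty classifying non-vaccinated children. Under that assumption, accuracy is a prevalence-weighted mix of the two rates, which shows why 79.02% accuracy is only modestly above the 74.84% that a classifier labelling every child as vaccinated would reach. The sketch assumes the evaluation set has the same class balance as the full sample; the function names are hypothetical.

```python
def classification_metrics(tp, fn, fp, tn):
    """Accuracy, sensitivity and specificity from confusion-matrix counts.

    Assumption: 'positive' = vaccinated, 'negative' = not vaccinated.
    """
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),   # vaccinated children classified as vaccinated
        "specificity": tn / (tn + fp),   # unvaccinated children classified as unvaccinated
    }


def accuracy_from_rates(sensitivity, specificity, prevalence):
    """Accuracy implied by the two rates when a share `prevalence` is positive."""
    return sensitivity * prevalence + specificity * (1 - prevalence)


# Majority-class baseline: predict "vaccinated" for every child.
baseline = accuracy_from_rates(1.0, 0.0, 0.7484)
# Accuracy implied by the reported rates at the sample prevalence.
implied = accuracy_from_rates(0.9773, 0.2341, 0.7484)
```

With these inputs `implied` is about 0.790, close to the reported 79.02%, while `baseline` is 0.7484; most of the accuracy comes from the majority class.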
|
4 |
Offline Reinforcement Learning for Downlink Link Adaption: A study on dataset and algorithm requirements for offline reinforcement learning. / Offline Reinforcement Learning for downlink link adaptation: A study on the requirements for a dataset and algorithm for offline reinforcement learning
Dalman, Gabriella, January 2024
This thesis studies offline reinforcement learning as an optimization technique for downlink link adaptation, one of many control loops in radio access networks. The work examines how the quality of pre-collected datasets affects learning, in terms of how well the data cover the state-action space and whether they were collected by an expert policy. Data quality is evaluated by training three algorithms: Deep Q-networks, Critic regularized regression, and Monotonic advantage re-weighted imitation learning. Performance is measured for each combination of algorithm and dataset, and each algorithm's need for hyperparameter tuning and its sample efficiency are studied. The results showed Critic regularized regression to be the most robust, because it learned well from every dataset used in the study and did not require extensive hyperparameter tuning. Deep Q-networks required careful hyperparameter tuning, but paired with the expert data they reached rewards as high as those of the agents trained with Critic regularized regression. Monotonic advantage re-weighted imitation learning needed data from an expert policy to reach a high reward. In summary, offline reinforcement learning can succeed in a telecommunications use case such as downlink link adaptation. Critic regularized regression was the preferred algorithm because it performed well with all three datasets presented in the thesis. / This thesis studies offline reinforcement learning as an optimization technique for downlink link adaptation, which is one of many control loops in radio access networks. The work examines the impact of the quality of pre-collected datasets, in terms of how much the data cover the state-action space and whether they were collected by an expert policy. Data quality is evaluated by training three different algorithms: Deep Q-networks, Critic regularized regression, and Monotonic advantage re-weighted imitation learning.
Performance is measured for each combination of algorithm and dataset, and their need for hyperparameter tuning and their data efficiency are studied. The results showed that Critic regularized regression was the most robust, since it managed to learn a great deal from all the datasets used in the study and did not require extensive hyperparameter tuning. Deep Q-networks required careful hyperparameter tuning, and together with expert data they reached the highest performance of all agents in the study. Monotonic advantage re-weighted imitation learning needed data from an expert policy to learn the problem successfully. The most successful dataset was the expert data. In summary, offline reinforcement learning can be successful in telecommunications, specifically in downlink link adaptation. Critic regularized regression was the preferred algorithm because it was stable and performed well with all three datasets presented in the thesis.
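Critic regularized regression trains the policy by behaviour cloning on the logged actions, weighted by a function of the critic's advantage estimate, so logged actions that the critic judges worse than the current policy are down-weighted or dropped. The NumPy sketch below shows that weighting for discrete actions (for example, a hypothetical choice among modulation and coding schemes in link adaptation). Critic training is omitted, and the function names, `beta`, and `max_weight` are illustrative assumptions rather than the settings used in the thesis.

```python
import numpy as np


def crr_weights(q_values, policy_probs, actions, mode="exp", beta=1.0, max_weight=20.0):
    """Per-sample CRR weights for discrete actions.

    q_values:     (batch, n_actions) critic estimates Q(s, .)
    policy_probs: (batch, n_actions) current policy pi(. | s)
    actions:      (batch,) action indices logged in the offline dataset
    """
    idx = np.arange(len(actions))
    v = (policy_probs * q_values).sum(axis=1)           # V(s) = E_{a'~pi} Q(s, a')
    adv = q_values[idx, actions] - v                     # advantage of the logged action
    if mode == "binary":
        return (adv > 0).astype(float)                   # keep only better-than-policy actions
    return np.minimum(np.exp(adv / beta), max_weight)    # exponential weights, clipped


def crr_actor_loss(q_values, policy_probs, actions, **kw):
    """Weighted behaviour cloning loss: -mean(f(A) * log pi(a | s))."""
    w = crr_weights(q_values, policy_probs, actions, **kw)
    logp = np.log(policy_probs[np.arange(len(actions)), actions] + 1e-12)
    return -np.mean(w * logp)
```

Because the policy is only ever fitted to actions present in the dataset, this style of update avoids querying the critic on unseen actions for the actor target, which is one plausible reason it tolerates non-expert data better than plain Q-learning.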
|
5 |
Development of generic multivariate analysis methods for product quality monitoring / Development of multivariate analysis methods for the product quality prediction
Melhem, Mariam, 20 November 2017
The microelectronics industry is a competitive field, permanently confronted with several challenges. To evaluate the manufacturing steps, quality tests are applied. Since these tests are discontinuous, an equipment failure can cause a degradation of product quality. Alarms can be triggered to indicate problems. On the other hand, a large quantity of equipment data obtained from sensors is available. Alarm management, interpolation of quality measurements, and reduction of equipment data are necessary. Our work consists of developing generic multivariate analysis methods that aggregate all the available information on the equipment to predict product quality while taking into account the quality of the different manufacturing steps. Based on the pattern recognition principle, we proposed an approach to predict the number of products remaining to be produced before performance losses related to customer specifications, as a function of the equipment health indices. Our approach also makes it possible to isolate the equipment responsible for degradation. In addition, a methodology based on regularized regression is developed to predict product quality while taking into account the correlation and dependency relationships existing in the process. A model for alarm management is built in which criticality and similarity indices are proposed. The alarm data are then used to predict product rejection. An application to industrial data from STMicroelectronics is provided. / The microelectronics industry is a highly competitive field, constantly confronted with several challenges. To evaluate the manufacturing steps, quality tests are applied during and at the end of production.
As these tests are discontinuous, a defect or failure of the equipment can cause a deterioration in product quality and a loss in manufacturing yield. Alarms are triggered to indicate problems, but periodic alarms can be triggered repeatedly, resulting in alarm floods. On the other hand, a large quantity of equipment data obtained from sensors is available. Alarm management, interpolation of quality measurements, and reduction of correlated equipment data are therefore required. This work aims to develop generic multivariate analysis methods that aggregate all the available information (equipment health indicators, alarms) to predict product quality while taking into account the quality of the various manufacturing steps. Based on the pattern recognition principle, data from the degradation trajectory are compared with the health indices of failing equipment. The objective is to predict the remaining number of products before the loss of performance related to customer specifications, and to isolate the equipment responsible for the degradation. In addition, regularized regression-based methods are used to predict product quality while taking into account the existing correlation and dependency relationships in the process. A model for alarm management is constructed in which criticality and similarity indices are proposed. The alarm data are then used to predict product scrap. An application to industrial data from STMicroelectronics is provided.
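The abstract describes comparing the current degradation trajectory with those of failing equipment to predict how many products remain before specifications are violated. One simple way to realize this, shown as a hypothetical sketch and not the thesis's method, is similarity-based matching: slide the recent health-index window along each historical run-to-failure trajectory, keep the best-matching position, and average the remaining lengths of the k closest matches with inverse-distance weights. It assumes one health-index value per processed product and that each history ends where the specification was lost.

```python
import numpy as np


def remaining_products(current, library, k=3):
    """Similarity-based estimate of products left before a spec violation.

    current: 1-D array, most recent health-index values of the monitored equipment
    library: list of 1-D arrays, full health-index histories of equipment that
             later failed; each history ends where the specification was lost
    """
    w = len(current)
    candidates = []  # (distance, remaining products) for the best window of each history
    for hist in library:
        best = None
        for start in range(len(hist) - w + 1):
            d = np.linalg.norm(hist[start:start + w] - current)
            remaining = len(hist) - (start + w)   # products processed after the window
            if best is None or d < best[0]:
                best = (d, remaining)
        if best is not None:                      # skip histories shorter than the window
            candidates.append(best)
    candidates.sort(key=lambda c: c[0])
    top = candidates[:k]
    weights = np.array([1.0 / (d + 1e-9) for d, _ in top])   # closer match, larger weight
    rul = np.array([r for _, r in top], dtype=float)
    return float(weights @ rul / weights.sum())
```

The same best-match distances could also serve to flag which equipment's trajectory resembles known failures, loosely mirroring the isolation objective, although how the thesis does this is not stated in the abstract.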
|