  • About
  • The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
    Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
451

Statistical Support Algorithms for Clinical Decisions and Prevention of Genetic-related Heart Disease

Sotero, Charity Faith Gallemit 16 August 2018 (has links)
Drug-induced long QT syndrome (diLQTS) can lead to seemingly healthy patients experiencing cardiac arrest, specifically Torsades de Pointes (TdP), which may lead to death. Clinical decision support systems (CDSS) assist in safer prescribing of drugs, in part by issuing alerts that warn of a drug's potential harm. LQTS may be either genetic or acquired. Thirteen distinct genetic mutations have already been identified for hereditary LQTS. Since hereditary and acquired LQTS share similar clinical symptoms, it is reasonable to assume that both have some genetic component. The goal of this study is to identify genetic risk markers for diLQTS and TdP. These markers will be used to develop a statistical DSS for clinical applications and prevention of genetic-related heart disease. We use data from a genome-wide association study conducted by the Pharmacogenomics of Arrhythmia Therapy subgroup of the Pharmacogenetics Research Network, focused on subjects with a history of diLQTS or TdP after taking medication. The data were made available for general research use by the National Center for Biotechnology Information (NCBI). The data consist of 831 total patients, of whom 172 are diLQTS/TdP cases. Out of 620,901 initial markers, variable screening by a preliminary t-test (α = 0.01) produced a feasible set of 5,754 markers associated with diLQTS, which were used to build predictive models. The methods considered were ensemble logistic regression, elastic net, random forests, artificial neural networks, and linear discriminant analysis. Using all 5,754 markers, accuracy ranged from 76.84% to 90.29%, with artificial neural networks as the most accurate model. Finally, variable importance algorithms were applied to the ensemble logistic regression, elastic net, and random forests methods to extract a subset of genetic markers suitable for the proposed DSS. Using a subset of 61 markers, accuracy ranged from 76.59% to 87.00%, with ensemble logistic regression as the most accurate model. Using a subset of 22 markers, accuracy ranged from 74.24% to 82.87%, with a single-hidden-layer neural network (built on the markers extracted from the ensemble bagged logistic model) as the most accurate model.
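The screening-then-model pipeline described above can be sketched as follows. This is a minimal illustration, not the thesis code: the genotype matrix, labels, and model settings are placeholders, with sizes only loosely matching the study (831 patients; markers screened by a t-test at α = 0.01).

```python
import numpy as np
from scipy.stats import ttest_ind
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegressionCV
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n, p = 831, 5000                                    # stand-in sizes (study: 831 patients)
X = rng.integers(0, 3, size=(n, p)).astype(float)   # genotypes coded 0/1/2 (placeholder)
y = rng.integers(0, 2, size=n)                      # 1 = diLQTS/TdP case (placeholder)

# Step 1: univariate t-test screening at alpha = 0.01
_, pvals = ttest_ind(X[y == 1], X[y == 0], axis=0)
X_scr = X[:, pvals < 0.01]

# Step 2: fit candidate classifiers on the screened markers and compare accuracy
X_tr, X_te, y_tr, y_te = train_test_split(X_scr, y, test_size=0.25, random_state=0)
models = [("random forest", RandomForestClassifier(n_estimators=500, random_state=0)),
          ("elastic net logistic", LogisticRegressionCV(penalty="elasticnet",
                                                        solver="saga", l1_ratios=[0.5],
                                                        max_iter=5000))]
for name, clf in models:
    clf.fit(X_tr, y_tr)
    print(name, "accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```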
452

Nonlinearity Detection Using Penalization-Based Principle

Cong, Jie 08 September 2018 (has links)
When constructing a statistical model, nonlinearity detection has always been an interesting and difficult problem. To balance the precision of parametric modeling against the robustness of nonparametric modeling, semi-parametric methods have shown very good performance. A specific example, spline fitting, estimates nonlinear patterns well. However, as the number of spline bases grows, the method generates a large number of parameters to estimate, especially in the multidimensional case. It has been proposed in the literature to treat the additional slopes of the spline bases as random terms, so that those slopes can be controlled by a single variance term. The semi-parametric model then becomes a linear mixed-effects problem.

Data of large dimension have become a serious computational burden, especially when it comes to nonlinearity, so a good dimension-reduction technique is needed. LASSO-type penalties perform very well in linear regression: the traditional LASSO adds a constraint on the slopes, and parameters can be shrunk to 0. Here we extend that method to semi-parametric spline fitting, making it possible to reduce the dimension of the nonlinearity. The problem of nonlinearity detection is thereby transformed into a model selection problem. The penalty is placed on the variance terms that control nonlinearity in each dimension. As the penalty bound changes, variance terms can be shrunk to 0; when a variance term is reduced to 0, the nonlinear part of that dimension is removed from the model. AIC/BIC criteria are used to choose the final model. This approach is challenging because formal testing is nearly impossible due to the boundary situation.

The method is further extended to the generalized additive model. Quasi-likelihood is adopted to simplify the problem, making it similar to the partially linear additive case. LASSO-type penalties are again placed on the variance components of each dimension, making dimension reduction possible for the nonlinear terms. Conditional AIC/BIC is used to select the model.

The dissertation consists of five parts.

In Chapter 1, we give a thorough literature review covering semi-parametric modeling, penalized spline fitting, linear mixed-effects modeling, variable selection methods, and generalized nonparametric modeling.

In Chapter 2, the model construction is explained in detail for the single-dimension case, including derivation of the iteration procedures, discussion of computational techniques, simulation studies with power analysis, and discussion of other parameter estimation methods.

In Chapter 3, the model is extended to the multidimensional case. In addition to model construction, derivation of the iteration procedures, computational discussion, and simulation studies, we analyze a real data example using plasma beta-carotene data from a nutritional study. The results show the advantage of nonlinearity detection.

In Chapter 4, generalized additive modeling is considered, with a focus on the two most commonly used distributions, the Bernoulli and the Poisson. The model is constructed using quasi-likelihood, and two iteration methods are introduced. Simulation studies are performed for both distributions in the one-dimensional and multidimensional cases, and a real data example uses the Pima Indian diabetes dataset. The results again show the advantage of nonlinearity detection.

In Chapter 5, possible future work is discussed, including more complicated covariance structures for the random terms, dimension reduction for linearity and nonlinearity simultaneously, bootstrap methods that account for model selection, and higher-degree p-spline setups.
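The core idea, a penalty that can shrink the nonlinear part of a spline fit toward zero, can be illustrated with a toy one-dimensional P-spline. The sketch below uses a ridge-type penalty on the spline slopes as a simple stand-in for the variance-component penalty in the thesis; knot placement, the λ grid, and the AIC form are illustrative choices, not the dissertation's algorithm.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
x = np.sort(rng.uniform(0, 1, n))
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.3, n)    # truly nonlinear signal

# Truncated-line spline basis: fixed part [1, x], penalized part (x - kappa_k)_+
knots = np.quantile(x, np.linspace(0.05, 0.95, 15))
Xf = np.column_stack([np.ones(n), x])
Z = np.maximum(x[:, None] - knots[None, :], 0.0)
C = np.hstack([Xf, Z])

best = None
for lam in [1e-4, 1e-2, 1.0, 1e2, 1e8]:              # lam -> inf reduces fit to a line
    D = np.diag([0.0, 0.0] + [lam] * len(knots))     # penalize only the spline slopes
    H = C @ np.linalg.solve(C.T @ C + D, C.T)        # smoother ("hat") matrix
    fit = H @ y
    df = np.trace(H)                                 # effective degrees of freedom
    aic = n * np.log(np.sum((y - fit) ** 2) / n) + 2 * df
    if best is None or aic < best[0]:
        best = (aic, lam, df)
print("chosen lambda:", best[1], "effective df:", round(best[2], 2))
```

With a genuinely nonlinear signal the AIC picks a moderate λ (effective df well above 2); for a linear signal it drifts to the huge λ, which is the sketch's analogue of shrinking the variance term to 0 and dropping the nonlinear part.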
453

Community Detection| Fundamental Limits, Methodology, and Variational Inference

Zhang, Ye 21 August 2018 (has links)
Network analysis has become one of the most active research areas over the past few years. A core problem in network analysis is community detection. In this thesis, we investigate it under the Stochastic Block Model and the Degree-corrected Block Model from three different perspectives: 1) the minimax rates of the community detection problem, 2) rate-optimal and computationally feasible algorithms, and 3) computational and theoretical guarantees of variational inference for community detection.
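As a point of reference for the algorithmic side, here is a minimal spectral approach to community detection under a two-block Stochastic Block Model; this is a generic baseline, not the thesis's variational method, and all sizes and edge probabilities are invented.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
n, k = 200, 2
labels = rng.integers(0, k, n)
P = np.where(labels[:, None] == labels[None, :], 0.15, 0.03)  # within/between probs
A = (rng.uniform(size=(n, n)) < P).astype(float)
A = np.triu(A, 1); A = A + A.T                                # symmetric, no self-loops

# Spectral clustering: k-means on the leading eigenvectors of the adjacency matrix
vals, vecs = np.linalg.eigh(A)
U = vecs[:, np.argsort(-np.abs(vals))[:k]]
est = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(U)
agree = max(np.mean(est == labels), np.mean(est != labels))   # up to label swap (k = 2)
print("fraction correctly clustered:", agree)
```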
454

Reduction of Confidence Interval Length for Small-Normal Data Sets Utilizing Bootstrap and Conformal Prediction Methods

Chavarria, Pablo C. 16 November 2018 (has links)
It is common practice to invoke a t-confidence interval for estimating the mean of a small data set with an assumed Normal distribution. These t-intervals are known to be wide, to account for the lack of information. This thesis focuses on exploring ways to reduce the length of the interval while preserving the level of confidence. Simulated small normal data sets are used to analyze a combination of Bootstrap and Conformal Prediction methods, while investigating measures of spread and shape, such as standard deviation, kurtosis, excess CS kurtosis, and skewness, to create a criterion for when this combination of methodologies will greatly reduce the interval length. The goal is to use the insight simulated data offer in order to apply the method to real-world data. If time permits, the theory behind the results will be explored further.
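The bootstrap half of the proposed combination can be illustrated directly; the sketch below compares a classical t-interval against a percentile bootstrap interval for the mean of a simulated small normal sample. The conformal step and the spread-based selection criterion from the thesis are not shown, and the bootstrap interval's shorter length comes with no coverage guarantee at this sample size.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
x = rng.normal(10, 2, size=8)          # a small "normal" data set
alpha = 0.05

# Classical t-interval for the mean
m, se = x.mean(), stats.sem(x)
t_lo, t_hi = stats.t.interval(1 - alpha, df=len(x) - 1, loc=m, scale=se)

# Percentile bootstrap interval for the mean
boot = np.array([rng.choice(x, size=len(x), replace=True).mean()
                 for _ in range(10000)])
b_lo, b_hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])

print("t-interval length:        ", t_hi - t_lo)
print("bootstrap interval length:", b_hi - b_lo)
```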
455

Predicting Four-Year Graduation| A Sequential Modeling Approach

Sims, Michael S. 16 November 2018 (has links)
Because the California State Universities have had four-year graduation rates among freshman students below 20% over the last few years, the Graduation Initiative 2025 has been deployed. This initiative aims to increase graduation rates to 40% while eliminating opportunity and achievement gaps. A significant part of this is examining the success of first-time freshmen (FTF) and predicting whether or not they will graduate in a timely fashion. To this end, a natural classification problem is identified: who in the FTF cohort will graduate in four years or less (class instance = 1) versus more than four years (class instance = 0), the latter including students who did not graduate. In this paper, using Area Under the Curve (AUC) as our performance metric, we construct classification models that quickly identify students at risk of not graduating in a timely fashion. Furthermore, we construct models cumulatively, term by term, where each successive model includes student data from matriculation to the end of a given term. This approach allows a university to find an optimal time to deploy possible intervention programs. Optimal here means having a model with high AUC as early in the student's academic career as possible, so that an at-risk student is identified early and the value of the university's intervention is maximized. We compare a variety of classification algorithms, such as Logistic Regression, Random Forest, and XGBoost, to see which model yields the highest AUC. We also provide insight on interpretation, specifically identifying the effect each covariate has on the response. This approach is unique because it not only identifies the problem but also serves as part of the solution.
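The cumulative, term-by-term modeling scheme can be sketched as follows, with simulated student data in which later terms carry more signal; the feature blocks, model choice, and sizes are all placeholders, not the paper's actual covariates.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
n, terms, feats_per_term = 1000, 4, 5
blocks = [rng.normal(size=(n, feats_per_term)) for _ in range(terms)]
signal = sum(b[:, 0] * (t + 1) for t, b in enumerate(blocks))  # later terms more informative
y = (signal + rng.normal(size=n) > 0).astype(int)              # 1 = graduates in four years

# One model per term, each trained on all data observed up to that term
for t in range(terms):
    X = np.hstack(blocks[: t + 1])
    auc = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                          cv=5, scoring="roc_auc").mean()
    print(f"through term {t + 1}: cross-validated AUC = {auc:.3f}")
```

The "optimal" deployment term is then the earliest term whose AUC is already acceptably high.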
456

Comprehensive Risk Stratification Model for Prognostication and Assisting with Therapeutic Decision-Making for Multiple Myeloma Patients

Song, Brian 01 November 2018 (has links)
The goal of this research is to improve current risk stratification models for multiple myeloma by developing a novel statistical decision algorithm. The increase in precision would assist in providing optimal treatments for multiple myeloma patients depending on the risk of progression at the time of diagnosis: if progression is imminent, risk-adapted therapy would be a serious option. A large amount of data supplied by multiple clinics was gathered to obtain better prognoses. The data are available from the Synapse website under the Multiple Myeloma DREAM Challenge. Although both genomic variation data and gene expression data were available, the study used the latter in conjunction with general patient data. Preliminary work showed that the microarray data were not standardized among the different clinics, so additional preprocessing was required before aggregating all data for a comprehensive investigation. An Accelerated Failure Time model is used to screen out insignificant variables for easier processing, reducing 17,308 markers to 4,503. A combination of random forest models and likelihood ratio tests is then used to further reduce the set of potentially significant biomarkers. The remaining biomarkers are used in multiple statistical models to determine the one that best represents the data. The efficacy of the model is checked by training on two clinics and predicting the third; the average and standard deviation of the resulting statistics are used to validate the model's consistency across clinics. We show that an improvement over current risk stratification models can be obtained.
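The AFT screening stage might look like the following sketch, which fits one single-marker Weibull AFT model per marker and keeps markers with small p-values. It assumes the lifelines package (and its current summary indexing); the data are simulated stand-ins, not the DREAM Challenge data, and the thesis's exact screening threshold is not specified here.

```python
import numpy as np
import pandas as pd
from lifelines import WeibullAFTFitter

rng = np.random.default_rng(5)
n, p = 300, 50                          # stand-in sizes (study: 17,308 markers screened)
X = rng.normal(size=(n, p))
risk = 0.8 * X[:, 0] - 0.6 * X[:, 1]    # two truly prognostic "markers"
T = rng.exponential(np.exp(-risk))      # progression times: higher risk, shorter time
E = rng.uniform(size=n) < 0.7           # ~70% observed events, rest censored

kept = []
for j in range(p):                      # one single-marker AFT model per marker
    df = pd.DataFrame({"T": T, "E": E, "x": X[:, j]})
    aft = WeibullAFTFitter().fit(df, duration_col="T", event_col="E")
    if aft.summary.loc[("lambda_", "x"), "p"] < 0.01:
        kept.append(j)
print("markers passing the AFT screen:", kept)
```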
457

Robust Experimental Designs for fMRI with an Uncertain Design Matrix

January 2014 (has links)
abstract: Obtaining high-quality experimental designs to optimize statistical efficiency and data quality is quite challenging for functional magnetic resonance imaging (fMRI). The primary fMRI design issue is the selection of the best sequence of stimuli based on a statistically meaningful optimality criterion. Previous studies have provided guidance and powerful computational tools for obtaining good fMRI designs, but those results are mainly for basic experimental settings with simple statistical models. In this work, a type of modern fMRI experiment is considered, in which the design matrix of the statistical model depends not only on the selected design but also on the experimental subject's probabilistic behavior during the experiment. The design matrix is thus uncertain at the design stage, making it difficult to select good designs. By taking this uncertainty into account, a very efficient approach for obtaining high-quality fMRI designs is developed in this study. The proposed approach is built upon an analytical result and an efficient computer algorithm. Case studies show that the proposed approach can outperform an existing method in both computing time and the quality of the obtained designs. / Dissertation/Thesis / Masters Thesis Statistics 2014
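One generic way to handle an uncertain design matrix is to average a design criterion over draws of the subject's random behavior, as in the toy search below. The binary stimulus sequence, response probability, and D-criterion are illustrative simplifications (a real fMRI model would convolve events with a hemodynamic response function), and random search stands in for the thesis's more efficient algorithm.

```python
import numpy as np

rng = np.random.default_rng(6)
n_scans, n_draws = 60, 200
p_respond = 0.8                         # subject registers a stimulus with prob 0.8

def expected_d_criterion(seq):
    """Average log-det information over random subject behavior."""
    vals = []
    for _ in range(n_draws):
        events = seq * (rng.uniform(size=n_scans) < p_respond)  # missed events drop out
        X = np.column_stack([np.ones(n_scans), events])          # toy design matrix
        vals.append(np.linalg.slogdet(X.T @ X)[1])
    return np.mean(vals)

# Random search over binary stimulus sequences (a stand-in for a smarter algorithm)
best = max((rng.integers(0, 2, n_scans) for _ in range(300)),
           key=expected_d_criterion)
print("best sequence found:", "".join(map(str, best)))
```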
458

Anomaly Detection in Categorical Datasets with Artificial Contrasts

January 2016 (has links)
abstract: An anomaly is a deviation from the normal behavior of a system, and anomaly detection techniques try to identify unusual instances based on deviation from the normal data. In this work, I propose a machine-learning algorithm, referred to as Artificial Contrasts, for anomaly detection in categorical data in which neither the dimension, the specific attributes involved, nor the form of the pattern is known a priori. I use the Random Forest (RF) technique as an effective learner for artificial contrasts. RF is a powerful algorithm that can handle relationships among attributes in high-dimensional data and detect anomalies while providing probability estimates for risk decisions. I apply the model to two simulated data sets and one real data set. The model detected anomalies with very high accuracy. Finally, by comparing the proposed model with other models in the literature, I demonstrate its superior performance. / Dissertation/Thesis / Masters Thesis Industrial Engineering 2016
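The artificial-contrasts construction can be sketched directly: permute each column independently to destroy inter-attribute structure, train an RF to separate real from artificial rows, and flag real rows that the forest scores as artificial-looking. The data and planted anomalies below are invented for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(7)
n = 500
# Categorical data with a dependency: column 1 usually equals column 0
c0 = rng.integers(0, 4, n)
c1 = np.where(rng.uniform(size=n) < 0.9, c0, rng.integers(0, 4, n))
X = np.column_stack([c0, c1])
X[:5, 1] = (X[:5, 0] + 2) % 4           # plant a few pattern-breaking anomalies

# Artificial contrasts: permute each column independently, destroying dependencies
X_art = np.column_stack([rng.permutation(X[:, j]) for j in range(X.shape[1])])
data = np.vstack([X, X_art])
label = np.r_[np.ones(n), np.zeros(n)]  # 1 = real, 0 = artificial contrast

rf = RandomForestClassifier(n_estimators=300, random_state=0)
rf.fit(data, label)
# Real rows that look artificial (low P(real)) violate the learned structure
score = rf.predict_proba(X)[:, 1]
print("most anomalous rows:", np.argsort(score)[:5])
```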
459

Essays on the Identification and Modeling of Variance

January 2018 (has links)
abstract: In the presence of correlation, generalized linear models cannot be employed to obtain regression parameter estimates. To appropriately address the extra variation due to correlation, methods to estimate and model this additional variation are investigated. A general form of the mean-variance relationship is proposed which incorporates the canonical parameter. The two variance parameters are estimated using the generalized method of moments, negating the need for a distributional assumption. The estimated mean-variance relation is applied to clustered data and implemented in an adjusted generalized quasi-likelihood approach through an adjustment to the covariance matrix. In the presence of significant correlation in hierarchically structured data, the adjusted generalized quasi-likelihood model shows improved performance for random effect estimates. In addition, submodels addressing deviations in skewness and kurtosis are provided to jointly model the mean, variance, skewness, and kurtosis; these additional models identify covariates influencing the third and fourth moments. A cutoff for trimming the data is provided, which improves parameter estimation and model fit. For each topic, findings are demonstrated through comprehensive simulation studies and numerical examples, including data on children's morbidity in the Philippines, adolescent health from the National Longitudinal Study of Adolescent to Adult Health, and proteomic assays for breast cancer screening. / Dissertation/Thesis / Doctoral Dissertation Statistics 2018
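A toy version of the moment-based idea, estimating a power-law mean-variance relation Var(y) = φ μ^θ without a distributional assumption, is sketched below. This is a simple moment-matching illustration, not the thesis's GMM estimator or its canonical-parameter form; the means are treated as known.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 2000
mu = rng.uniform(1, 10, n)                        # known/fitted means
phi_true, theta_true = 1.5, 1.3                   # Var(y) = phi * mu**theta
y = mu + rng.normal(0, np.sqrt(phi_true * mu ** theta_true))

# Moment identity: E[(y - mu)^2] = phi * mu^theta,
# so regress log squared residuals on log mu
r2 = (y - mu) ** 2
A = np.column_stack([np.ones(n), np.log(mu)])
coef, *_ = np.linalg.lstsq(A, np.log(r2), rcond=None)
# log r2 has mean log(phi * mu^theta) + E[log chi2_1]; correct the known bias (~ -1.27)
bias = np.mean(np.log(rng.chisquare(1, 100000)))
print("theta estimate:", coef[1], " phi estimate:", np.exp(coef[0] - bias))
```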
460

Statistical analysis of the effect of metformin use on endometrial cancer incidence in patients with type 2 diabetes

Marttila, M. (Mikko) 15 June 2016 (has links)
The use of metformin, a common orally administered first-line therapy for type 2 diabetes, has been associated with a lowered risk of endometrial cancer in observational studies. However, contradictory evidence has emerged in recent studies, along with discoveries of methodological flaws in some earlier ones. The goal of this analysis was to obtain additional, methodologically sound evidence on the relationship between metformin use and endometrial cancer incidence. A population-based cohort of women aged 40 or over with a new diagnosis of type 2 diabetes in 1996–2011 was assembled from Finnish registry sources. The primary measure of exposure to metformin was the subject's ever-use status. A full cohort analysis was performed using Poisson regression, along with a nested case-control analysis with up to 20 controls per case, in which the effect of the cumulative number of defined daily doses (DDD) of medication was estimated. Both analyses were adjusted for patient age and duration of diabetes as well as the use of statins, insulin, and other anti-diabetic medication. In the full cohort analysis a total of 89,871 patients were followed, with a mean follow-up of 5.5 years and a median of 4.6 years. During follow-up, 580 cases of type 1 endometrial cancer and 57 cases of type 2 or 3 endometrial cancer were observed. The incidence rate of type 1 endometrial cancer in the diabetic cohort was 117.7 per 100,000 person-years. In the nested case-control analysis, metformin ever-use was associated with an increased risk of type 1 endometrial cancer (HR: 1.26; 95% CI: 1.03–1.54), and a slight trend of increasing risk with the cumulative dose of metformin was also observed. However, this result may be confounded by body mass index (BMI), for which data were not available: BMI is known to be associated with an increased risk of endometrial cancer and may be associated with the use of metformin. Regardless, the results of this study do not support the hypothesis that metformin use lowers the risk of endometrial cancer in patients with type 2 diabetes.
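The full-cohort analysis has a standard form: a Poisson regression for case counts with log person-years as an offset. The sketch below simulates data loosely calibrated to the reported numbers (baseline incidence 117.7 per 100,000 person-years; HR 1.26 for ever-use) and recovers the rate ratio; it assumes the statsmodels package, and the covariates are a small placeholder subset of the study's adjustments.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

rng = np.random.default_rng(9)
n = 5000
df = pd.DataFrame({
    "metformin": rng.integers(0, 2, n),          # ever-use indicator
    "age": rng.uniform(40, 85, n),
    "followup": rng.uniform(1, 10, n),           # person-years at risk
})
base = 117.7 / 100_000                           # type 1 incidence from the abstract
rate = base * 1.26 ** df["metformin"]            # HR 1.26 for ever-users, as reported
df["cases"] = rng.poisson(rate * df["followup"])

# Poisson regression with log person-years as offset models the incidence rate
model = smf.glm("cases ~ metformin + age", data=df,
                family=sm.families.Poisson(),
                offset=np.log(df["followup"])).fit()
print(np.exp(model.params["metformin"]))         # estimated rate ratio
```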
