About
The Global ETD Search service is a free service for researchers to find electronic theses and dissertations. This service is provided by the Networked Digital Library of Theses and Dissertations.
Our metadata is collected from universities around the world. If you manage a university/consortium/country archive and want to be added, details can be found on the NDLTD website.
161

On testing for the Cox model using resampling methods

Fang, Jing, 方婧 January 2007 (has links)
published_or_final_version / abstract / Statistics and Actuarial Science / Master / Master of Philosophy
162

Modelling multivariate survival data using semiparametric models

李友榮, Lee, Yau-wing. January 2000 (has links)
published_or_final_version / Statistics and Actuarial Science / Master / Master of Philosophy
163

Estimating Potency in Bioassay in the Presence of Autocorrelated Errors

Maurer, Brian Alan, 1954- January 1982 (has links)
No description available.
164

Machine Learning Methods for Personalized Medicine Using Electronic Health Records

Wu, Peng January 2019 (has links)
This dissertation develops methods for estimating personalized treatments using machine learning algorithms that leverage information from electronic health records (EHRs). Current guidelines for medical decision making rely largely on data from randomized controlled trials (RCTs) studying average treatment effects. However, because RCTs are usually conducted under specific inclusion/exclusion criteria, they may be inadequate for making individualized treatment decisions in real-world settings. Large-scale EHRs provide an opportunity to pursue the goals of personalized medicine and to learn individualized treatment rules (ITRs) that depend on patient-specific characteristics from real-world patient data. Conversely, since EHRs document treatment prescriptions in the real world, transferring information from EHRs to RCTs, if done appropriately, could improve the precision and generalizability of ITRs. Furthermore, EHR data usually include text notes or similarly unstructured fields, so topic modeling techniques can be adapted to engineer features.
In the first part of this work, we address challenges with EHRs and propose a machine learning approach based on matching techniques (referred to as M-learning) to estimate optimal ITRs from EHRs. This new learning method uses matching instead of the inverse probability weighting common to many existing methods for estimating ITRs, in order to more accurately assess individuals' responses to alternative treatments and to alleviate confounding. Matching-based value functions are proposed to compare matched pairs under a unified framework in which various types of outcomes for measuring treatment response (including continuous, ordinal, and discrete outcomes) are easily accommodated. We establish the Fisher consistency and convergence rate of M-learning. Through extensive simulation studies, we show that M-learning outperforms existing methods when propensity scores are misspecified or when unmeasured confounders are present in certain scenarios. At the end of this part, we apply M-learning to estimate optimal personalized second-line treatments for type 2 diabetes patients, aiming at better glycemic control or fewer major complications, using EHRs from New York Presbyterian Hospital (NYPH).
In the second part, we propose a new domain adaptation method to learn ITRs by incorporating information from EHRs. Unless we assume no unmeasured confounding in the EHRs, we cannot directly learn the optimal ITR from the combined EHR and RCT data. Instead, we first pre-train "super" features from the EHRs that summarize physicians' treatment decisions and patients' observed benefits in the real world, and which are likely to be informative of the optimal ITRs. We then augment the feature space of the RCT and learn the optimal ITRs stratified by these features using RCT patients only. We adopt Q-learning and a modified matched-learning algorithm for estimation, present theoretical justifications, and conduct simulation studies to demonstrate the performance of the proposed method. Finally, we apply the method to transfer information learned from EHRs of type 2 diabetes (T2D) patients to improve the learning of individualized insulin therapies from an RCT.
In the last part of this work, we extend the M-learning method proposed in the first part to learn ITRs using interpretable features extracted from EHR documentation of medications and ICD diagnosis codes. We use a latent Dirichlet allocation (LDA) model to extract latent topics and weights as features for learning ITRs. The method reduces confounding in observational studies by matching treated and untreated individuals, and improves treatment optimization by augmenting the feature space with clinically meaningful LDA-based features. We apply the method to extract LDA-based features from EHR data collected in the NYPH clinical data warehouse to study optimal second-line treatment for T2D patients. Using cross-validation, we show that the estimated ITRs outperform uniform treatment strategies (i.e., assigning insulin or another class of oral organic compounds to all individuals), and that including topic modeling features leads to a greater reduction in post-treatment complications.
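As a rough illustration of the matched-pairs idea behind M-learning described above, the following sketch matches each treated patient to the nearest untreated patient on covariates, labels both members of a pair with the treatment whose recipient fared better, and fits a weighted linear classifier as the treatment rule. This is a simplified stand-in on synthetic data, not the dissertation's implementation; the matching scheme, weighting, and classifier choice here are illustrative assumptions only.

```python
# Minimal matched-pairs ITR sketch (illustrative, not the M-learning code):
# 1-nearest-neighbor matching on covariates, pair-level "winner" labels,
# outcome-gap weights, and a linear SVM as the treatment rule.
import numpy as np
from sklearn.neighbors import NearestNeighbors
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.normal(size=(n, p))                      # patient covariates (e.g., EHR features)
A = rng.integers(0, 2, size=n)                   # observed treatment (0/1)
Y = X[:, 0] * (2 * A - 1) + rng.normal(size=n)   # outcome, larger is better

treated, control = np.where(A == 1)[0], np.where(A == 0)[0]
nn = NearestNeighbors(n_neighbors=1).fit(X[control])
_, idx = nn.kneighbors(X[treated])
pairs = list(zip(treated, control[idx.ravel()]))

# Within each matched pair, the "preferred" treatment is the one whose
# recipient had the better outcome; weight each pair by its outcome gap.
feats, labels, weights = [], [], []
for i, j in pairs:
    better = i if Y[i] > Y[j] else j
    for k in (i, j):
        feats.append(X[k])
        labels.append(A[better])
        weights.append(abs(Y[i] - Y[j]))

clf = SVC(kernel="linear").fit(np.array(feats), labels, sample_weight=weights)
print("fraction of patients assigned treatment 1:", (clf.predict(X) == 1).mean())
```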
165

Extração de minúcias em imagens de impressões digitais / Minutiae extraction in fingerprint images

Casado, Ricardo Salvino 17 June 2008 (has links)
This work describes a method for minutiae extraction in fingerprint images based on the classical image binarization approach. The method comprises three modules: a preprocessing module to increase visual discrimination, a minutiae extraction module, and a post-processing module to remove false minutiae. The fingerprint images were obtained from the FVC2004 (Fingerprint Verification Competition) database, which includes both synthetic and real images. Test validation used the quantitative measures of sensitivity and specificity. The implemented software produced its best results on the synthetic images, followed by the images acquired with an optical sensor. The images obtained with a thermal sensor showed considerably worse results than those of the other databases because they contain more noise.
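For readers unfamiliar with the classical pipeline this abstract describes, the sketch below shows one common way to implement it: Otsu binarization, skeletonization, and crossing-number minutiae detection. It is a generic simplification, not the author's software; the dark-ridge assumption and the scikit-image functions used here are choices of this sketch.

```python
# Classical binarization pipeline sketch: binarize, thin to a skeleton,
# then apply the crossing-number (CN) test at each ridge pixel.
# CN == 1 marks a ridge ending; CN == 3 marks a bifurcation.
import numpy as np
from skimage.filters import threshold_otsu
from skimage.morphology import skeletonize

def minutiae(image):
    binary = image < threshold_otsu(image)       # assumes ridges are dark
    skel = skeletonize(binary).astype(int)       # one-pixel-wide ridge map
    endings, bifurcations = [], []
    for y in range(1, skel.shape[0] - 1):
        for x in range(1, skel.shape[1] - 1):
            if not skel[y, x]:
                continue
            # 8 neighbours in clockwise order around (y, x)
            nb = [skel[y-1, x-1], skel[y-1, x], skel[y-1, x+1], skel[y, x+1],
                  skel[y+1, x+1], skel[y+1, x], skel[y+1, x-1], skel[y, x-1]]
            cn = sum(abs(nb[i] - nb[(i + 1) % 8]) for i in range(8)) // 2
            if cn == 1:
                endings.append((x, y))
            elif cn == 3:
                bifurcations.append((x, y))
    return endings, bifurcations
```

A real system would add the post-processing stage the abstract mentions, pruning false minutiae near the image border and spurs shorter than a few pixels.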
167

Sequential Designs for Individualized Dosing in Phase I Cancer Clinical Trials

Mao, Xuezhou January 2014 (has links)
This dissertation presents novel sequential dose-finding designs that adjust for inter-individual pharmacokinetic variability in phase I cancer clinical trials. Unlike most traditional dose-finding designs, whose primary goal is to determine a maximum safe dose, the goal of the proposed designs is to estimate a patient-specific dosing function such that patients' responses achieve a target safety level. Extending a single-compartment pharmacokinetic model, we first postulate a linear model relating the area under the concentration-time curve to dose and predicted clearance. We propose a repeated least squares procedure that sequentially determines dose according to each patient's ability to metabolize the drug. To guarantee consistent estimation of the individualized dosing function at the end of a trial, we apply repeated least squares subject to a consistency constraint based on an eigenvalue theory for stochastic linear regression. We empirically determine the convergence rate of the eigenvalue constraint using a real data set from an irinotecan study in colorectal carcinoma patients, and calibrate the procedure to minimize a loss function that accounts for the dosing costs of study subjects and future patients. Compared to traditional body-surface-area and equation-based dosing methods, simulation results demonstrate that the repeated least squares procedure controls dosing cost and allows precise estimation of the dosing function. Furthermore, to enhance the generality and robustness of the dose-finding designs, we generalize the linear association to a nonlinear relationship between the response and a linear combination of dose and predicted clearance. We propose a two-stage sequential design, the semiparametric link-adapted recursion, which individualizes dose assignments while adapting to an unknown nonlinear link function connecting the response to dose and predicted clearance. The repeated least squares design with the eigenvalue constraint serves as the first stage; the second stage recursively applies an iterative semiparametric least squares approach to estimate the dosing function and determine the dose for the next patient. The simulation results demonstrate that, first, the performance of the repeated least squares design with the eigenvalue constraint is acceptably robust to model misspecification; second, since its performance is close to that of the repeated least squares procedure under parametric models, the semiparametric link-adapted recursion does not sacrifice much estimation accuracy to gain robustness against model misspecification; and last, compared to the repeated least squares procedure, the semiparametric link-adapted recursion can significantly improve dosing costs and estimation precision under semiparametric models.
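A toy version of the sequential dose-individualization loop described above might look like the following, under an assumed log-linear model relating AUC to dose and predicted clearance. The coefficients, target, and safety bounds are invented for illustration, and the eigenvalue-based consistency constraint of the actual design is omitted here.

```python
# Sequential dosing sketch (assumed model, not the dissertation's design):
# posit log(AUC) = b0 + b1*log(dose) + b2*log(clearance) + noise,
# refit least squares after each patient, and invert the fitted model
# to choose the next patient's dose toward a target log(AUC).
import numpy as np

rng = np.random.default_rng(1)
true_b = np.array([0.5, 1.0, -0.8])   # hypothetical true coefficients
target_log_auc = 1.2                  # hypothetical safety target
X_hist, y_hist = [], []
dose = 1.0                            # conservative starting dose

for patient in range(50):
    log_cl = rng.normal()             # patient's predicted log-clearance
    if len(y_hist) >= 5:              # burn-in before individualizing
        b = np.linalg.lstsq(np.array(X_hist), np.array(y_hist), rcond=None)[0]
        if abs(b[1]) > 1e-6:          # invert the model for the target response
            dose = np.exp((target_log_auc - b[0] - b[2] * log_cl) / b[1])
            dose = float(np.clip(dose, 0.1, 10.0))   # crude safety bounds
    row = [1.0, np.log(dose), log_cl]
    log_auc = true_b @ row + 0.3 * rng.normal()      # simulated response
    X_hist.append(row)
    y_hist.append(log_auc)

print("final fitted coefficients:", np.round(b, 2))
```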
168

Data-Driven Methods for Identifying and Validating Shorter Symptom Criteria Sets: The Case for DSM-5 Substance Use Disorders

Raffo, Cheryl January 2018 (has links)
In psychiatry, the Diagnostic and Statistical Manual of Mental Disorders (DSM) is the standard classification system clinicians use to diagnose disorders. The DSM provides criteria sets: quantifiable, directly observable measures or symptoms associated with each disorder. For classification, a minimum number of criteria must be observed; once this threshold is met, a disorder is considered present. For some disorders, the DSM also provides a dimensional classification in which severity increases with the number of criteria observed (i.e., None, Mild, Moderate, and Severe). While DSM criteria sets are the primary assessment mechanism clinicians use in psychiatric disease classification, some criteria sets may have too many items, making them problematic and/or inefficient in clinical and research settings. In addition, psychiatric disorders are inherently latent constructs with no direct visual or biological observation available, which makes validating psychiatric diagnoses difficult. This dissertation proposes and applies two empirical statistical methods to address lengthy criteria sets and the validation of diagnoses.
The first proposal is a data-driven method, packaged as a SAS macro, that systematically identifies subsets of criteria and associated cut-offs (i.e., diagnostic short-forms) that yield diagnoses as similar as possible to those from the full criteria set. The motivating example is alcohol use disorder (AUD), a type of substance use disorder (SUD) in the DSM-5; AUD is diagnosed when two or more of its 11 possible criteria are observed. Relying on data from the National Epidemiologic Survey on Alcohol and Related Conditions (NESARC-III), the new methodology identifies diagnostic short-forms for AUD by: (1) maximizing the association between the sum scores of all 11 criteria and newly constructed subscales from subsets of criteria, (2) optimizing the similarity of AUD prevalence between the current DSM-5 rule and newly constructed diagnostic short-forms, (3) maximizing the sensitivity and specificity of the short-forms against the current DSM-5 rule, and (4) minimizing differences in short-form accuracy across chosen covariates.
The second method introduces external validators of disorder into the process of identifying and validating short-forms. Each step in the first methodology uses some comparison (i.e., maximizing correlation, sensitivity, or specificity) with the current DSM rule, assuming the DSM is the best diagnostic target. However, that method does not itself assess the validity of the criteria-based definition; it relies on the validity of the original diagnosis. In the second methodology, we no longer assume the validity of the current DSM rule and instead use external validators (antecedent, concurrent, and predictive) as the target when identifying short-forms. The method is again applied to AUD, with NESARC-III as the data source. Rather than use the binary yes/no diagnosis, we use the DSM's dimensional classification framework to identify and validate subsets and associated severity cut-offs (i.e., dimensional short-forms) in a systematic way. Because using each external validator separately could make it difficult to reach a consensus across validators, our methodology combines them into a single summary measure, derived by factor analysis, called the external composite validator (ECV). Using NESARC-III and following principles of convergent validity, we identify dimensional short-forms that relate to the ECV in theoretically justified ways. Specifically, we obtain nested subsets of the original criteria set that (1) maximize the association between the ECV and newly constructed subscales from subsets of criteria and (2) have associated severity cut-offs that maximally discriminate on the ECV based on R-squared.
Substance use disorders in the DSM-5 include alcohol use disorder (AUD), nicotine use disorder (NUD), and drug use disorders (DUDs). Each substance is associated with a single underlying SUD construct, with the same 11 diagnostic criteria and the same diagnostic classifications used across substances. Cannabis and non-medical prescription opioids are two examples of DUDs, and both have recently been identified as major public health priorities. Because of their diagnostic similarity to AUD in the DSM-5, these substances were ideal for further testing our methodologies. Using NESARC data on criteria for cannabis use disorder (CUD) and opioid use disorder (OUD), we applied the diagnostic short-forms that accurately replicated AUD to these substances, and also applied the methods to each substance separately. Overall, the new methodology identified shorter criteria sets for AUD, CUD, and OUD that yielded highly accurate diagnoses compared to the current DSM (i.e., high sensitivity and specificity). In particular, excluding the criteria "Neglected major roles to use" and/or "Activities given up to use" produced no marked change in the ability to diagnose or measure severity the same way as the DSM-5. When applying the method for identifying the most valid dimensional short-forms using external validators, we found severity cut-points that differ from the current DSM-5, and the cut-points differed across AUD, OUD, and CUD. Dimensional short-forms with as few as 7 criteria for AUD, CUD, and OUD demonstrated the same or better validity as using all 11 criteria. We discuss the implications of these findings, propose recommendations for future DSM revisions, and review the limitations and future extensions of each proposed methodology.
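To make the short-form search concrete, the sketch below enumerates criteria subsets and cut-offs on simulated symptom data and scores each candidate against a full-set "2 of 11" rule by sensitivity and specificity. This is a toy version of steps (2) and (3) only, not the SAS macro described in the abstract, and the simulated criterion prevalence is arbitrary.

```python
# Toy short-form search: for each subset of criteria and each cut-off,
# diagnose from the subset and score it against the full 11-criterion rule.
import itertools
import numpy as np

rng = np.random.default_rng(2)
criteria = (rng.random((1000, 11)) < 0.15).astype(int)  # simulated symptoms
full_dx = criteria.sum(axis=1) >= 2                     # DSM-5-style rule

best = None
for k in (7, 8, 9):                                     # candidate subset sizes
    for subset in itertools.combinations(range(11), k):
        for cutoff in range(1, k + 1):
            short_dx = criteria[:, list(subset)].sum(axis=1) >= cutoff
            sens = (short_dx & full_dx).sum() / full_dx.sum()
            spec = (~short_dx & ~full_dx).sum() / (~full_dx).sum()
            score = min(sens, spec)                      # balance both errors
            if best is None or score > best[0]:
                best = (score, subset, cutoff, sens, spec)

score, subset, cutoff, sens, spec = best
print(f"best short-form: criteria {subset}, cut-off {cutoff}, "
      f"sensitivity {sens:.3f}, specificity {spec:.3f}")
```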
169

Statistical Learning Methods for Personalized Medicine

Qiu, Xin January 2018 (has links)
The theme of this dissertation is the development of simple, interpretable individualized treatment rules (ITRs) using statistical learning methods to assist personalized decision making in clinical practice. Considerable heterogeneity in treatment response is observed among individuals with mental disorders, so administering an individualized treatment rule based on patient-specific characteristics offers an opportunity to tailor treatment strategies and improve response. Black-box machine learning methods for estimating ITRs may produce treatment rules with optimal benefit but little transparency or interpretability. Barriers to implementing personalized treatments in clinical psychiatry include a lack of evidence-based, clinically interpretable ITRs, a lack of diagnostic measures to evaluate candidate ITRs, a lack of power to detect treatment modifiers from a single study, and a lack of reproducibility of treatment rules estimated from single studies. This dissertation has three parts that tackle these barriers: (1) methods to estimate the best linear ITR with guaranteed performance among the class of linear rules; (2) a tree-based method that improves the performance of a linear ITR fitted from the overall sample and identifies subgroups with a large benefit; and (3) an integrative learning approach that combines information across trials to provide an integrative ITR with improved efficiency and reproducibility.
In the first part of the dissertation, we propose a machine learning method to estimate optimal linear individualized treatment rules from data collected in single-stage randomized controlled trials (RCTs). In clinical practice, an informative and practically useful treatment rule should be simple and transparent. However, because simple rules are likely to be far from optimal, effective methods to construct them must guarantee performance, in terms of yielding the best clinical outcome (highest reward) among the class of simple rules under consideration. It is also important to evaluate the benefit of the derived rules in the whole sample and in pre-specified subgroups (e.g., vulnerable patients). To achieve both goals, we propose a robust machine learning algorithm that replaces the zero-one loss with an authentic approximation loss (the ramp loss) for value maximization, referred to as asymptotically best linear O-learning (ABLO), which estimates a linear treatment rule that is guaranteed to achieve the optimal reward among the class of all linear rules. We then develop a diagnostic measure and an inference procedure to evaluate the benefit of the obtained rule and to compare it with rules estimated by other methods. We provide theoretical justification for the proposed method and its inference procedure, and we demonstrate via simulations its superior performance over existing methods. Lastly, we apply the proposed method to the Sequenced Treatment Alternatives to Relieve Depression (STAR*D) trial on major depressive disorder (MDD) and show that the estimated optimal linear rule provides a large benefit for mildly and severely depressed patients but manifests a lack of fit for moderately depressed patients.
The second part of the dissertation is motivated by the real-data analysis in the first part, where the global linear rule estimated by ABLO from the overall sample performed inadequately in the subgroup of moderately depressed patients. We therefore aim to derive a simple, interpretable piecewise linear ITR that maintains certain optimality and improves benefit in subgroups of patients as well as in the overall sample. We propose a tree-based robust learning method that estimates optimal piecewise linear ITRs and identifies subgroups of patients with a large benefit, simultaneously capturing qualitative and quantitative interactions through a tree model referred to as the composite interaction tree (CITree). Extensive simulation studies show improved performance over existing methods on both the overall sample and subgroups. Lastly, we fit CITree to the Research Evaluating the Value of Augmenting Medication with Psychotherapy (REVAMP) trial for treating major depressive disorder, where we identified both qualitative and quantitative interactions and subgroups of patients with a large benefit.
The third part addresses the low power for identifying and replicating ITRs that results from the small sample sizes of single randomized controlled trials. We develop a novel integrative learning method that synthesizes evidence across trials to provide an integrative ITR with improved efficiency and reproducibility. Our method does not require all studies to collect a common set of variables, so it can combine information from ITRs identified in randomized controlled trials with heterogeneous sets of baseline covariates collected from different domains at different resolutions. Depending on the research goal, integrative learning can enhance a high-resolution ITR by borrowing information from coarsened ITRs, or improve a coarsened ITR using a high-resolution ITR. With a simple modification, the proposed integrative learning can also improve the estimation of ITRs for studies with blockwise missing feature variables. Extensive simulation studies show improved performance over existing methods in which only single-trial ITRs are used to learn personalized treatment rules. Lastly, we apply the proposed method to RCTs of major depressive disorder and other comorbid mental disorders and find that, by combining information from two studies, the integrated ITR achieves greater benefit and improved efficiency compared to single-trial rules or a universal non-personalized treatment rule.
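The ramp-loss idea mentioned in the first part can be illustrated with a generic reward-weighted classification sketch: maximize reward-weighted agreement with a linear rule by minimizing a truncated hinge (ramp) loss. This is a minimal construction in the spirit of O-learning with a ramp loss, not the ABLO algorithm itself; the penalty, optimizer, and simulated data are assumptions of the sketch.

```python
# Reward-weighted linear rule via ramp loss (generic sketch, not ABLO).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(3)
n, p = 300, 4
X = rng.normal(size=(n, p))
A = rng.choice([-1, 1], size=n)                   # randomized treatment
R = 1 + X[:, 0] * A + 0.5 * rng.normal(size=n)    # reward, larger is better
R = R - R.min()                                   # keep weights nonnegative

def ramp(u):
    # ramp loss: hinge loss truncated at 2, i.e. min(max(0, 1 - u), 2)
    return np.minimum(np.maximum(0.0, 1.0 - u), 2.0)

def objective(w):
    margin = A * (X @ w[:p] + w[p])               # rule's agreement with A
    return np.mean(R * ramp(margin)) + 0.01 * np.sum(w[:p] ** 2)

res = minimize(objective, np.zeros(p + 1), method="Nelder-Mead",
               options={"maxiter": 5000, "xatol": 1e-6})
print("linear rule coefficients:", np.round(res.x[:p], 2))
```

Because the ramp loss bounds each observation's contribution, the fitted rule is less sensitive to extreme rewards than a hinge-loss fit would be, which is the robustness property the abstract alludes to.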
170

Statistical Methods for Integrated Cancer Genomic Data Using a Joint Latent Variable Model

Drill, Esther January 2018 (has links)
Inspired by TCGA (The Cancer Genome Atlas), we explore multimodal genomic datasets with integrative methods using a joint latent variable approach. We use iCluster+, an existing clustering method for integrative data, to identify potential subtypes within TCGA sarcoma and mesothelioma tumors, and across a large cohort of 33 different TCGA cancer datasets. For classification, motivated by the goal of improving the prediction of platinum resistance in the treatment of high-grade serous ovarian cancer (HGSOC), we propose a novel integrative method, iClassify, which performs classification using a joint latent variable model. iClassify provides effective data integration and classification while handling heterogeneous data types, and it offers a natural framework for incorporating covariate risk factors and examining genomic driver by covariate risk factor interactions. Feature selection is performed through a thresholding parameter that combines both latent variable and feature coefficients. We demonstrate increased classification accuracy over methods that assume a homogeneous data type, such as linear discriminant analysis and penalized logistic regression, as well as improved feature selection. We apply iClassify to a TCGA cohort of HGSOC patients with three types of genomic data and platinum response data. This methodology has broad applications beyond predicting treatment outcomes and disease progression in cancer, including predicting prognosis and diagnosis in other diseases with major public health implications.
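As a loose illustration of the joint latent variable idea, the sketch below stacks standardized features from two simulated genomic platforms, extracts shared latent factors, and classifies a response from those factors. It substitutes off-the-shelf PCA and logistic regression for the iClassify model, so it should be read as an analogy rather than the method itself.

```python
# Joint-latent-factor analogy (stand-in tools, not the iClassify model):
# standardize each platform, stack, extract shared factors, classify.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(4)
n = 150
latent = rng.normal(size=(n, 3))                          # shared latent factors
expr = latent @ rng.normal(size=(3, 100)) + rng.normal(size=(n, 100))
methyl = latent @ rng.normal(size=(3, 80)) + rng.normal(size=(n, 80))
y = (latent[:, 0] + 0.5 * rng.normal(size=n) > 0).astype(int)  # response

# Standardize each platform separately so neither dominates, then stack.
Z = np.hstack([StandardScaler().fit_transform(expr),
               StandardScaler().fit_transform(methyl)])
factors = PCA(n_components=3).fit_transform(Z)            # joint latent space
clf = LogisticRegression().fit(factors, y)
print("training accuracy:", round(clf.score(factors, y), 3))
```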
