331

Predicting Neighborhood-Level Recidivism and Residential Status of Sexual Offenders within the Context of Social Disorganization Theory

Freedman, Daniel Brian 17 December 2010 (has links)
No description available.
332

A Predictive Time-to-Event Modeling Approach with Longitudinal Measurements and Missing Data

Zhu, Lili January 2019 (has links)
An important practical problem in survival analysis is predicting the time to a future event, such as the death or failure of a subject. For medical decision making, it is important to investigate how predictor variables, including repeated measurements on the same subjects, affect the future time-to-event. Such prediction is particularly challenging because the future values of the predictor variables are unknown and may vary dynamically over time. In this dissertation, we consider a predictive approach based on modeling the forward intensity function. To handle the practical difficulty of missing data in longitudinal measurements, and to accommodate observations at irregularly spaced time points, we propose a smoothed composite likelihood approach for estimation. The forward intensity function intrinsically incorporates the future dynamics of the predictor variables that affect the stochastic occurrence of the future event; the proposed framework is therefore advantageous and parsimonious, requiring no separate modeling step for the stochastic mechanism of the predictor variables. Our theoretical analysis establishes the validity of the forward intensity modeling approach and the smoothed composite likelihood method. To model the parameters as continuous functions of time, we introduce the penalized B-spline method into the proposed approach. Extensive simulations and real-data analyses demonstrate the promising performance of the proposed predictive approach. / Statistics
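The forward intensity idea — the probability of an event within a future horizon, given survival so far — can be illustrated with a minimal discrete-time sketch. The hazard function below is a made-up toy, not the dissertation's estimated model:

```python
import math

def forward_event_prob(hazard, t, tau):
    """P(event in (t, t+tau] | survival to t) for a discrete-time
    hazard sequence: 1 - prod_{s=t+1}^{t+tau} (1 - hazard(s))."""
    surv = 1.0
    for s in range(t + 1, t + tau + 1):
        surv *= 1.0 - hazard(s)
    return 1.0 - surv

# Toy hazard that rises with time (illustrative assumption only).
h = lambda s: 1 - math.exp(-0.02 * s)

# Chance of the event within the next 3 periods, given survival to t = 5.
print(forward_event_prob(h, t=5, tau=3))
```

The point of the forward formulation is visible here: the prediction needs only the hazard path over the future horizon, not a separate model of how the covariates themselves evolve.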
333

Improving the accuracy of statistics used in de-identification and model validation (via the concordance statistic) pertaining to time-to-event data

Caetano, Samantha-Jo January 2020 (has links)
Time-to-event data is very common in medical research, so clinicians and patients need its analysis to be accurate: it is often used to interpret disease screening results, inform treatment decisions, and identify at-risk patient groups (e.g., by sex, race, or gene expression). This thesis tackles three statistical issues pertaining to time-to-event data. The first issue arose from an Institute for Clinical and Evaluative Sciences lung cancer registry data set that was de-identified by censoring patients at an earlier date, resulting in an underestimate of the observed times of censored patients. Five methods were proposed to account for the underestimation incurred by de-identification. A subsequent simulation study compared the effectiveness of each method in reducing bias and mean squared error, as well as in improving the coverage probabilities of four different Kaplan-Meier (KM) estimates. The simulation results demonstrated that situations with relatively large numbers of censored patients required methodology with larger perturbation. In these scenarios, the fourth proposed method (which perturbed censored times such that they were censored in the final year of study) yielded estimates with the smallest bias, the smallest mean squared error, and the largest coverage probability. Alternatively, when there were smaller numbers of censored patients, any manipulation of the altered data set worsened the accuracy of the estimates. The second issue arises when investigating model validation via the concordance (c) statistic. The c-statistic was originally intended to measure the accuracy of statistical models that assess the risk associated with a binary outcome: it estimates the proportion of patient pairs in which the patient with the higher predicted risk experienced the event. This definition cannot be uniquely extended to time-to-event outcomes, so many proposals have been made.
The second project developed a parametric c-statistic which assumes the true survival times are exponentially distributed, so as to invoke the memoryless property. A simulation study compared it against two other time-to-event c-statistics under three different definitions of concordance in the time-to-event setting. The proposed c-statistic yielded the smallest bias in the presence of censoring, even when the exponential parametric assumption did not hold; it appears to be the most robust to censored data, and is therefore recommended for validating prediction models applied to censored data. The third project developed and assessed an empirical time-to-event c-statistic derived by estimating the survival times of censored patients via the EM algorithm. A simulation study was conducted for various sample sizes, censoring levels, and correlation rates. A non-parametric bootstrap was employed, and the mean and standard error of the bias of four different time-to-event c-statistics were compared, including the new empirical EM c-statistic. The newly developed c-statistic yielded the smallest mean bias and standard error in all simulated scenarios; it appears to be the most appropriate for estimating the concordance of a time-to-event model, and is likewise recommended for validating prediction models applied to censored data. / Thesis / Doctor of Philosophy (PhD)
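The pairwise notion of concordance described in this abstract can be made concrete with a minimal Harrell-style estimator for censored data. This is a generic sketch on toy data, not the thesis's proposed parametric or EM-based c-statistics:

```python
def harrell_c(times, events, risks):
    """Harrell's c-statistic: among comparable pairs (the earlier
    time must be an observed event), count pairs where the higher
    predicted risk failed first; ties in risk count one half."""
    conc, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # Comparable only if subject i is observed to fail
            # strictly before subject j's recorded time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    conc += 1.0
                elif risks[i] == risks[j]:
                    conc += 0.5
    return conc / comparable

times  = [2, 4, 5, 7]          # observed or censored follow-up times
events = [1, 1, 0, 1]          # 1 = event observed, 0 = censored
risks  = [0.9, 0.6, 0.5, 0.2]  # model-predicted risk scores
print(harrell_c(times, events, risks))  # perfectly concordant here -> 1.0
```

Censoring is exactly what makes this definition non-unique: pairs where the earlier time is censored are simply dropped, which is one source of the bias the thesis's alternative c-statistics aim to reduce.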
334

Automatic Question Answering and Knowledge Discovery from Electronic Health Records

Wang, Ping 25 August 2021 (has links)
Electronic Health Record (EHR) data contain comprehensive longitudinal patient information, usually stored in databases either as multi-relational structured tables or as unstructured text, e.g., clinical notes. EHRs are a useful resource for assisting doctors' decision making; however, they also present many unique challenges that limit efficient use of the valuable information, such as large data volume, heterogeneous and dynamic information, medical term abbreviations, and noise caused by misspelled words. This dissertation focuses on the development and evaluation of advanced machine learning algorithms to address the following research questions: (1) how to seek answers from EHRs for clinical-activity-related questions posed in human language without the assistance of database and natural language processing (NLP) domain experts; (2) how to discover underlying relationships among different events and entities in structured tabular EHRs; and (3) how to predict when a medical event will occur and estimate its probability from a patient's previous medical information. First, to automatically retrieve answers to natural language questions from the structured tables in an EHR, we study the question-to-SQL generation task, generating the SQL query corresponding to the input question. We propose a translation-edit model driven by a language generation module and an editing module. This model automatically translates clinical-activity-related questions to SQL queries, so that doctors only need to pose their questions in natural language to get the answers they need. We also create a large-scale dataset for question answering on tabular EHRs to simulate a more realistic setting. Our performance evaluation shows that the proposed model is effective in handling the unique challenges of clinical terminology, such as abbreviations and misspelled words.
Second, to automatically identify answers to natural language questions from unstructured clinical notes in EHRs, we propose to query a knowledge base constructed from fine-grained document-level expert annotations of clinical records for various NLP tasks. We first create a dataset for clinical knowledge base question answering with two parts: a clinical knowledge base and question-answer pairs. An attention-based aspect-level reasoning model is developed and evaluated on the new dataset. Our experimental analysis shows that it is effective in identifying answers and also allows us to analyze the impact of different answer aspects in predicting correct answers. Third, we focus on discovering underlying relationships among different entities (e.g., patient, disease, medication, and treatment) in tabular EHRs, which can be formulated as a link prediction problem in the graph domain. We develop a self-supervised learning framework for better representation learning of entities across a large corpus, and also consider local contextual information for the downstream link prediction task. We demonstrate the effectiveness, interpretability, and scalability of the proposed model on a healthcare network built from tabular EHRs. It is also successfully applied to link prediction problems in a variety of domains, such as e-commerce, social networks, and academic networks. Finally, to dynamically predict the occurrence of multiple correlated medical events, we formulate the problem as a temporal (multiple time-point), multi-task learning problem using a tensor representation. We propose an algorithm to jointly and dynamically predict several survival problems at each time point, optimized with the Alternating Direction Method of Multipliers (ADMM). The model allows us to consider both the dependencies between different tasks and the correlations of each task at different time points.
We evaluate the proposed model on two real-world applications and demonstrate its effectiveness and interpretability. / Doctor of Philosophy / Healthcare is an important part of our lives. Due to the recent advances of data collection and storing techniques, a large amount of medical information is generated and stored in Electronic Health Records (EHR). By comprehensively documenting the longitudinal medical history information about a large patient cohort, this EHR data forms a fundamental resource in assisting doctors' decision making including optimization of treatments for patients and selection of patients for clinical trials. However, EHR data also presents a number of unique challenges, such as (i) large-scale and dynamic data, (ii) heterogeneity of medical information, and (iii) medical term abbreviation. It is difficult for doctors to effectively utilize such complex data collected in a typical clinical practice. Therefore, it is imperative to develop advanced methods that are helpful for efficient use of EHR and further benefit doctors in their clinical decision making. This dissertation focuses on automatically retrieving useful medical information, analyzing complex relationships of medical entities, and detecting future medical outcomes from EHR data. In order to retrieve information from EHR efficiently, we develop deep learning based algorithms that can automatically answer various clinical questions on structured and unstructured EHR data. These algorithms can help us understand more about the challenges in retrieving information from different data types in EHR. We also build a clinical knowledge graph based on EHR and link the distributed medical information and further perform the link prediction task, which allows us to analyze the complex underlying relationships of various medical entities. 
In addition, we propose a temporal multi-task survival analysis method to dynamically predict multiple medical events at the same time and identify the most important factors leading to the future medical events. By handling these unique challenges in EHR and developing suitable approaches, we hope to improve the efficiency of information retrieval and predictive modeling in healthcare.
335

Two Applied Economics Essays: Trade Duration in U.S. Fresh Fruit and Vegetable Imports & Goods-Time Elasticity of Substitution in Household Food Production for SNAP participants and nonparticipants

Rudi, Jeta 08 August 2012 (has links)
The first study investigates the factors that impact the duration of U.S. fresh fruit and vegetable imports. We employ both survival analysis (Kaplan-Meier estimates and the Cox proportional hazards model) and count data models. Our results indicate that SPS treatment requirements positively impact the duration of trade, while new market access has the opposite effect. Other factors typically included in trade duration models (such as GDP, transportation costs, and tariff rates) were also investigated. We also employ a probit model to understand the factors affecting the probability that a country selects into exporting fresh fruits and vegetables to the United States. The second study estimates the goods-time elasticity of substitution for Food Stamp/SNAP participants versus nonparticipants. We find that the elasticity of substitution for SNAP participants is not statistically different from zero, indicating that SNAP participants have a Leontief production function in household food production: increasing the amount of SNAP benefits paid to participants will not lead to more food production if the time households dedicate to food preparation remains unchanged. This finding extends the analysis of Baral, Davis and You (2011) and offers insights for SNAP-related policies. / Master of Science
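The Kaplan-Meier estimator used in the first essay can be sketched in a few lines of plain Python. The duration data below are invented for illustration, not the paper's trade dataset:

```python
def kaplan_meier(times, events):
    """Return (t, S(t)) pairs: at each distinct event time t, the
    survival estimate is multiplied by (1 - d/n), where d = events
    at t and n = subjects still at risk just before t."""
    order = sorted(range(len(times)), key=lambda i: times[i])
    n = len(times)
    s, out, at_risk, i = 1.0, [], n, 0
    while i < n:
        t = times[order[i]]
        d = r = 0
        while i < n and times[order[i]] == t:  # group tied times
            d += events[order[i]]
            r += 1
            i += 1
        if d:  # survival drops only at observed event times
            s *= 1 - d / at_risk
            out.append((t, s))
        at_risk -= r  # censored subjects leave the risk set too
    return out

# Years an exporter stayed in the market; 0 = still exporting (censored).
durations = [1, 2, 2, 3, 5, 5]
observed  = [1, 1, 0, 1, 1, 0]
print(kaplan_meier(durations, observed))
```

Censored spells reduce the risk set without forcing the survival curve down, which is why survival methods suit trade duration data better than simple spell averages.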
336

Multi-Platform Molecular Data Integration and Disease Outcome Analysis

Youssef, Ibrahim Mohamed 06 December 2016 (has links)
One of the most common measures of clinical outcome is survival time. Accurately linking cancer molecular profiling with survival outcomes advances the clinical management of cancer. However, existing survival analysis relies heavily on statistical evidence from a single level of data, without paying much attention to the integration of interacting multi-level data and the underlying biology. Advances in genomic techniques provide unprecedented power to characterize cancer tissue more completely than before, opening the opportunity to design biologically informed, integrative approaches to survival analysis. Many cancer tissues have been profiled for gene expression levels and genomic variants (such as copy number alterations, sequence mutations, DNA methylation, and histone modification). However, it is not clear how to integrate gene expression and genetic variants to achieve better prediction and understanding of cancer survival. To address this challenge, we propose two approaches to data integration that both biologically and statistically boost feature selection for proper detection of the true predictors of survival. The first approach is data-driven yet biologically informed. Consistent with the biological hierarchy from DNA to RNA, we prioritize each survival-relevant feature with two separate scores, predictive and mechanistic. With mRNA expression levels as the focus, predictive features are those mRNAs whose variation in expression is associated with the survival outcome, and mechanistic features are those mRNAs whose variation in expression is associated with genomic variants (copy number alterations (CNAs) in this study). Further, we propose simultaneously integrating information from both the predictive model and the mechanistic model through our new approach GEMPS (Gene Expression as a Mediator for Predicting Survival).
Applied to two cancer types (ovarian cancer and glioblastoma multiforme), our method achieved better prediction power than peer methods, and gene set enrichment analysis confirms that the genes utilized for the final survival analysis are biologically important and relevant. The second approach is a generic mathematical framework to biologically regularize the Cox proportional hazards model widely used in survival analysis. We propose a penalty function that both links the mechanistic model to the clinical model and reflects the downstream regulatory effect of the genomic variants on the mRNA expression levels of their target genes. Fast and efficient optimization principles such as coordinate descent and majorization-minimization are adopted to infer the coefficients of the Cox model predictors. Through this model, we extend the regulator-target gene relationship to a new one: the regulator-target-outcome relationship of a disease. Assessed via a simulation study and analysis of two real cancer data sets, the proposed method showed better performance in selecting the true predictors and achieving better survival prediction, and it gives insightful and meaningful interpretability to the selected model thanks to the biological linking of the mechanistic and clinical models. Other important forms of clinical outcome are monitoring angiogenesis (the formation of new blood vessels necessary for a tumor to nourish itself and sustain its existence) and assessing therapeutic response. This can be done through dynamic imaging, in which a series of images at different time instances is acquired for a specific tumor site after injection of a contrast agent. Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) is a noninvasive tool to examine tumor vasculature patterns based on accumulation and washout of the contrast agent.
DCE-MRI gives an indication of tumor vasculature permeability, which in turn reflects tumor angiogenic activity; observing this activity over time can reveal the tumor's drug responsiveness and the efficacy of the treatment plan. However, due to the limited resolution of imaging scanners, a partial-volume effect (PVE) occurs: signals from two or more tissues combine into a single image concentration value within a pixel, leading to inaccurate estimates of the pharmacokinetic parameters. A multi-tissue compartmental modeling (CM) technique supported by convex analysis of mixtures (CAM) is used to mitigate the PVE by clustering pixels and constructing a simplex whose vertices are of a single compartment type. CAM uses the identified pure-volume pixels to estimate the kinetics of the tissues under investigation. We propose an enhanced version of CAM-CM that identifies pure-volume pixels more accurately by considering the neighborhood effect on each pixel and by using a barycentric coordinate system both to identify more pure-volume pixels and to test those identified by CAM-CM. Tested on simulated DCE-MRI data, the enhanced CAM-CM achieved better performance in terms of accuracy and reproducibility. / Ph. D.
337

Cure Rate Models with Nonparametric Form of Covariate Effects

Chen, Tianlei 02 June 2015 (has links)
This thesis focuses on the development of spline-based hazard estimation models for cure rate data. Such data arise in survival studies with long-term survivors, so the population consists of susceptible and non-susceptible sub-populations, the latter termed "cured". Modeling both the cure probability and the hazard function of the susceptible sub-population is of practical interest. Here we propose two smoothing-spline-based models falling respectively into the popular classes of two-component mixture cure rate models and promotion time cure rate models. Under the two-component mixture cure rate framework, Wang, Du and Liang (2012) developed a nonparametric model in which the covariate effects on both the cure probability and the hazard component are estimated by smoothing splines. Our first development falls under the same framework but estimates the hazard component with an accelerated failure time model instead of the proportional hazards model of Wang, Du and Liang (2012); the new model has a better practical interpretation. The promotion time cure rate model, motivated by a simplified biological interpretation of cancer metastasis, was first proposed only a few decades ago, yet it has quickly become a competitor to the mixture models. Our second development aims to provide a nonparametric alternative to the existing parametric and semiparametric promotion time models. / Ph. D.
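In the two-component mixture cure model described above, the population survival combines the cure probability with the susceptible group's survival, S_pop(t) = pi + (1 - pi) * S_u(t). A minimal numeric sketch, using a toy exponential susceptible-group survival rather than the thesis's spline estimates:

```python
import math

def mixture_cure_survival(t, cure_prob, hazard_rate):
    """S_pop(t) = pi + (1 - pi) * S_u(t): with probability pi the
    subject is cured (never fails); otherwise survival follows an
    exponential with the given hazard rate (a toy assumption)."""
    s_u = math.exp(-hazard_rate * t)
    return cure_prob + (1 - cure_prob) * s_u

# As t grows, population survival plateaus at the cure fraction,
# which is the signature of long-term survivors in the data.
for t in (0, 5, 50):
    print(t, mixture_cure_survival(t, cure_prob=0.3, hazard_rate=0.4))
```

The plateau is why ordinary survival models misfit cure rate data: they force survival toward zero, whereas the mixture structure lets it level off at pi.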
338

Three Essays on Agricultural Trade Policy

Ning, Xin 27 November 2019 (has links)
This dissertation consists of three essays examining the impacts of Sanitary and Phytosanitary (SPS) measures on agricultural trade. The first essay estimates the impact of the 2003 Bovine Spongiform Encephalopathy (BSE) outbreak in the US on Japanese beef imports. I develop a source-differentiated demand system of fresh/chilled and frozen beef imports augmented with endogenous smooth transition functions. Results suggest that over one-half of the estimated income, own-price, and cross-price elasticities reached a new regime in the post-BSE period, in which the competitive relationship and substitutability between US and Australian beef exports changed significantly. The second essay develops a product-line structural gravity model to estimate the trade flow effects of SPS measures flagged as specific trade concerns in the World Trade Organization's (WTO's) SPS Committee meetings, for the top 30 agricultural trading countries and four major product sectors. Our findings are striking and call attention to the need for a deeper understanding of the impacts of SPS measures on WTO members' agricultural trade: SPS trade concern measures reduce exporters' agricultural trade by 67%, on average, during periods in which the concerns were active. There is significant heterogeneity in the trade effect of SPS measures, with average estimated ad valorem equivalent (AVE) tariffs ranging from 33% to 106%. The AVE effect of SPS concern measures maintained by the US is estimated at 42%, less than half the AVE effect of measures imposed by the European Union and less than a third of that imposed by China. China's avian influenza restrictions and ractopamine restrictions on pork exports are estimated to be the most prohibitive, with AVE effects of 120.3% and 88.9%, respectively.
The third essay develops a discrete-time duration model to examine the extent to which these SPS concern measures affect the hazard rate of US agri-food exports during the 1995-2016 period. Results show that SPS concern measures raise the hazard rate of US agri-food exports by 2.1%~15.3%, causing the predicted hazard rate to increase from 21.8% to a range of 23.6%~27.9%. This effect is heterogeneous across agricultural sectors, with the most substantial effects occurring in US exports of meat, fruits, and vegetables. / Doctor of Philosophy / This dissertation consists of three essays on Sanitary and Phytosanitary (SPS) measures and their impacts on agricultural trade. The first essay estimates the impact of the 2003 Bovine Spongiform Encephalopathy (BSE) outbreak in the US on Japanese beef imports. Using a source-differentiated demand system of fresh/chilled and frozen beef imports embedded with endogenous smooth transition functions, we find that over one-half of the estimated income, own-price, and cross-price elasticities changed remarkably, moving the Japanese beef import market to a new regime in the post-BSE period in which the substitution and competition relationships between the US and Australia changed. The second essay develops a product-line structural gravity model to estimate the trade effects of SPS measures flagged as concerns in the WTO's SPS Committee meetings for the top 30 agricultural trading countries, covering four major product sectors. Results show that the trade effects of SPS concern measures are negative and significant, with average estimated AVE tariffs ranging from 33% to 106%. The AVE effect of SPS concern measures maintained by the US is estimated at 42%, less than half the AVE effect of measures imposed by the European Union and less than a third of that imposed by China. China's avian influenza restrictions and ractopamine restrictions on the production and export of pork products are estimated to be the most prohibitive, with AVE effects of 120.3% and 88.9%, respectively. The third essay applies a discrete-time duration model to examine the extent to which SPS concern measures affect the hazard rate of US agri-food exports over 1995-2016. Results show that SPS concern measures raise the hazard rate of US agri-food exports by 2.1%~15.3%, causing the predicted hazard rate to increase from 21.8% to a range of 23.6%~27.9%. This effect is heterogeneous across agricultural sectors, with the most substantial effects occurring in US exports of meat, fruits, and vegetables.
339

Do seasoned offerings improve the performance of issuing firms? Evidence from China

Zhang, D., Wu, Yuliang, Ye, Q., Liu, J. 2018 August 1923 (has links)
Yes / This study provides new evidence, based on survival analysis methods, that the performance of issuing firms varies by issue type. Our non-parametric results show that firms raising capital through rights issues, and notably through cash offers, experience a greater risk of delisting following issuance than those issuing convertible bonds. Our Cox model analyses demonstrate that plain equity issues, in contrast to convertible issues, are subject to different degrees of regulatory discipline, obligations, and incentives that shape their survival trajectories. Further, high ownership concentration, agency issues intrinsic to equity offerings, weak shareholder protection, and the development of corporate ownership, governance, and control at the time of an offer markedly influence post-issue survival. Plain equity issues, notably cash offers, are strongly linked with the agency costs of free cash flows. A large and truly independent board, allied to a separation of CEO and chairman powers, acts as a primary restraint on managers' self-interested behaviour. Such a cohesive governance mechanism can restrain rent-seeking in the firm's fundraising initiative. These observations hold when we take into account information available before, at the time of, and after an issue, demonstrating the robustness of our findings.
340

Suicide prediction among older adults using Swedish National Registry Data : A machine learning approach using survival analysis

Karlsson, Oliver, Von Hacht, Wilhelm January 2024 (has links)
This thesis investigates the predictive modeling of suicidal behavior among older adults in Sweden through the application of machine learning and survival analysis to data from Swedish National Registries. The research utilizes longitudinal and static data characteristics, including prescribed psychopharmaceuticals, medical conditions, and sociodemographic factors. By incorporating various survival analysis models such as the Cox Proportional Hazards Model (CoxPH), Random Survival Forest (RSF), and Gradient Boosting Survival Analysis (GBSA) with sequential machine learning techniques such as Long Short-Term Memory (LSTM) networks, the study aims to enhance the predictive accuracy of suicidal tendencies among the elderly. The thesis concludes that combining these techniques provides insights into the risk of suicide attempts and completed suicides, while also underscoring the challenges associated with modeling national registry data and demonstrating a novel approach to addressing suicidal behavior.
