Global ETD Search

201	Data analysis and multiple imputation for two-level nested designs Bailey, Brittney E. 25 October 2018 No description available. Biostatistics Statistics biostatistics nested data cluster randomized trial clinical trial missing data semiparametric binary outcome multilevel model mixed effects multiple imputation predictive mean matching
202	Methodologies for Missing Data with Range Regressions Stoll, Kevin Edward 24 April 2019 No description available. Statistics Missing Data Missing Response Nonparametric Range Regression Nonparametric Range Regression Propensity Score Ascendancy Average Rank Propensity Stratification Regression Bootstrap Missing at Random Double-Robust Consistency Almost Sure
203	Data Classification System Based on Combination Optimized Decision Tree : A Study on Missing Data Handling, Rough Set Reduction, and FAVC Set Integration Lu, Xuechun January 2023 Data classification is a novel data analysis technique that involves extracting valuable information with potential utility from databases. It has found extensive applications in various domains, including finance, insurance, government, education, transportation, and defense. There are several methods available for data classification, with decision tree algorithms being one of the most widely used. These algorithms are based on instance-based inductive learning and offer advantages such as rule extraction, low computational complexity, and the ability to highlight important decision attributes, leading to high classification accuracy. According to statistics, decision tree algorithms are among the most widely utilized data mining algorithms. To address these challenges, a decision tree algorithm is employed to solve classification problems. However, the existing decision tree algorithm exhibits limitations such as low calculation efficiency and multi-valued bias. Therefore, a data classification system based on an optimized decision tree algorithm written in Python and a data storage system based on PostgreSQL were developed. The proposed algorithm surpasses traditional classification algorithms in terms of dimensionality reduction, attribute selection, and scalability. Ultimately, a combined optimization decision tree classifier system is introduced, which exhibits superior performance compared to the widely used ID3 algorithm. The improved decision tree algorithm has both theoretical and practical significance for data mining applications. Missing data handling Rough set reduction FAVC Set ID3 Computer and Information Sciences
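The "multi-valued bias" mentioned in the abstract of entry 203 can be illustrated with a small sketch. It is not the thesis's algorithm: the toy data and the attribute names `row_id` and `weather` are hypothetical, and the gain-ratio correction shown is the well-known C4.5-style normalization, used here only to show the bias and one common way to reduce it.

```python
import math
from collections import Counter, defaultdict


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(items)
    return -sum((n / total) * math.log2(n / total) for n in Counter(items).values())


def information_gain(values, labels):
    """ID3 split criterion: label entropy minus weighted entropy after the split."""
    groups = defaultdict(list)
    for v, y in zip(values, labels):
        groups[v].append(y)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def gain_ratio(values, labels):
    """C4.5-style correction: divide the gain by the entropy of the split itself."""
    split_info = entropy(values)
    return information_gain(values, labels) / split_info if split_info > 0 else 0.0


# Hypothetical toy data: row_id is unique per row and carries no general signal.
labels = ["yes", "yes", "no", "no", "yes", "no", "yes", "no"]
row_id = list(range(8))
weather = ["sun", "sun", "rain", "rain", "sun", "rain", "sun", "sun"]

ig_row, ig_weather = information_gain(row_id, labels), information_gain(weather, labels)
gr_row, gr_weather = gain_ratio(row_id, labels), gain_ratio(weather, labels)
print(f"information gain: row_id={ig_row:.3f} weather={ig_weather:.3f}")
print(f"gain ratio:       row_id={gr_row:.3f} weather={gr_weather:.3f}")
```

On this toy data plain information gain ranks the many-valued `row_id` first, while the gain ratio ranks `weather` first. Whether the thesis uses this or a different correction is not stated in the abstract.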
204	Temporally-Embedded Deep Learning Model for Health Outcome Prediction Boursalie, Omar January 2021 Deep learning models are increasingly used to analyze health records to model disease progression. Two characteristics of health records present challenges to developers of deep learning-based medical systems. First, the veracity of the estimation of missing health data must be evaluated to optimize the performance of deep learning models. Second, the currently most successful deep learning diagnostic models, called transformers, lack a mechanism to analyze the temporal characteristics of health records. In this thesis, these two challenges are investigated using a real-world medical dataset of longitudinal health records from 340,143 patients over ten years called MIIDD: McMaster Imaging Information and Diagnostic Dataset. To address missing data, the performance of imputation models (mean, regression, and deep learning) was evaluated on a real-world medical dataset. Next, techniques from adversarial machine learning were used to demonstrate how imputation can have a cascading negative impact on a deep learning model. Then, the strengths and limitations of evaluation metrics from the statistical literature (qualitative, predictive accuracy, and statistical distance) to evaluate deep learning-based imputation models were investigated. This research can serve as a reference to researchers evaluating the impact of imputation on their deep learning models. To analyze the temporal characteristics of health records, a new model was developed and evaluated called DTTHRE: Decoder Transformer for Temporally-Embedded Health Records Encoding. DTTHRE predicts patients' primary diagnoses by analyzing their medical histories, including the elapsed time between visits. The proposed model successfully predicted patients' primary diagnosis in their final visit with improved predictive performance (78.54 +/- 0.22%) compared to existing models in the literature. 
DTTHRE also increased the training examples available from limited medical datasets by predicting the primary diagnosis for each visit (79.53 +/- 0.25%) with no additional training time. This research contributes towards the goal of disease predictive modeling for clinical decision support. / Dissertation / Doctor of Philosophy (PhD) / In this thesis, two challenges using deep learning models to analyze health records are investigated using a real-world medical dataset. First, an important step in analyzing health records is to estimate missing data. We investigated how imputation can have a cascading negative impact on a deep learning model's performance. A comparative analysis was then conducted to investigate the strengths and limitations of evaluation metrics from the statistical literature to assess deep learning-based imputation models. Second, the most successful deep learning diagnostic models to date, called transformers, lack a mechanism to analyze the temporal characteristics of health records. To address this gap, we developed a new temporally-embedded transformer to analyze patients' medical histories, including the elapsed time between visits, to predict their primary diagnoses. The proposed model successfully predicted patients' primary diagnosis in their final visit with improved predictive performance (78.54 +/- 0.22%) compared to existing models in the literature. Health informatics Machine learning Deep learning Electronic health records Transformer Temporal Encodings Embeddings Imputation Missing data Model checking Multiple imputation Evaluation metrics Imaging Computed tomography X-ray
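The abstract of entry 204 compares imputation models using, among other metrics, predictive accuracy. A minimal sketch of that style of evaluation follows, assuming synthetic data rather than MIIDD: known values are hidden, filled in by mean imputation and by a one-variable regression, and scored with RMSE against the hidden truth. The variables `age` and `bp`, the linear relationship between them, and the helper names are all hypothetical.

```python
import math
import random
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def rmse(pred, truth):
    """Root mean squared error between imputed and true values."""
    return math.sqrt(mean((p - t) ** 2 for p, t in zip(pred, truth)))


# Synthetic records (an assumption, not real patient data).
rng = random.Random(0)
age = [rng.uniform(20, 80) for _ in range(200)]
bp = [90 + 0.6 * a + rng.gauss(0, 5) for a in age]

# Hide 40 known bp values so the imputations can be scored against the truth.
held = sorted(rng.sample(range(200), 40))
hidden = set(held)
obs = [i for i in range(200) if i not in hidden]
truth = [bp[i] for i in held]

# Mean imputation: every missing value gets the observed mean.
mean_pred = [mean(bp[i] for i in obs)] * len(held)

# Regression imputation: predict bp from age using the observed rows only.
slope, intercept = fit_line([age[i] for i in obs], [bp[i] for i in obs])
reg_pred = [intercept + slope * age[i] for i in held]

rmse_mean, rmse_reg = rmse(mean_pred, truth), rmse(reg_pred, truth)
print(f"RMSE mean imputation: {rmse_mean:.2f}, regression imputation: {rmse_reg:.2f}")
```

Because the synthetic `bp` depends on `age` by construction, regression imputation scores better here. That outcome is a property of the toy data and says nothing about the thesis's results.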
205	Missing Data Treatments in Multilevel Latent Growth Model: A Monte Carlo Simulation Study Jiang, Hui 25 September 2014 No description available. Education Statistics
206	Assessment of Soil Corrosion in Underground Pipelines via Statistical Inference Yajima, Ayako 10 September 2015 No description available. Civil Engineering Soil corrosion Corrosion assessment ECDA ILI Reliability Gaussian mixture models Clustering analysis Missing data analysis truncated distribution Generalized exponential distribution Bayesian inference MCMC
207	Modeling Smooth Time-Trajectories for Camera and Deformable Shape in Structure from Motion with Occlusion Gotardo, Paulo Fabiano Urnau 28 September 2010 No description available. Artificial Intelligence Computer Science Electrical Engineering Mathematics Motion Pictures Robots structure from motion matrix factorization missing data camera trajectory shape trajectory
208	A Monte Carlo Study of Missing Data Treatments for an Incomplete Level-2 Variable in Hierarchical Linear Models Kwon, Hyukje 20 July 2011 No description available. Educational Tests and Measurements missing data treatment listwise deletion mean substitution EM multiple imputation inclusive restrictive bias RMSE confidence interval HLM intercepts- and slopes- as-outcomes
209	Navigating the Risks of Dark Data : An Investigation into Personal Safety Gautam, Anshu January 2023 With the exponential proliferation of data, there has been a surge in data generation from diverse sources, including social media platforms, websites, mobile devices, and sensors. However, not all data is readily visible or accessible to the public, leading to the emergence of the concept known as "dark data." This type of data can exist in structured or unstructured formats and can be stored in various repositories, such as databases, log files, and backups. The reasons behind data being classified as "dark" can vary, encompassing factors such as limited awareness, insufficient resources or tools for data analysis, or a perception of irrelevance to current business operations. This research employs a qualitative research methodology incorporating audio/video recordings and personal interviews to gather data, aiming to gain insights into individuals' understanding of the risks associated with dark data and their behaviors concerning the sharing of personal information online. Through the thematic analysis of the collected data, patterns and trends in individuals' risk perceptions regarding dark data become evident. The findings of this study illuminate the multiple dimensions of individuals' risk perceptions and their influence on attitudes towards sharing personal information in online contexts. These insights provide valuable understanding of the factors that shape individuals' decisions concerning data privacy and security in the digital era. 
By contributing to the existing body of knowledge, this research offers a deeper comprehension of the interplay between dark data risks, individuals' perceptions, and their behaviors pertaining to online information sharing. The implications of this study can inform the development of strategies and interventions aimed at fostering informed decision-making and ensuring personal safety in an increasingly data-centric world. Dark data Hidden data Big data Unstructured data Missing data Privacy Cybersecurity Personal data Data storage Consumer protection Information Systems, Social aspects
210	Methodological Issues in Design and Analysis of Studies with Correlated Data in Health Research Ma, Jinhui 04 1900 Correlated data with complex association structures arise from longitudinal studies and cluster randomized trials. However, some methodological challenges in the design and analysis of such studies or trials have not been overcome. In this thesis, we address three of the challenges: 1) Power analysis for population based longitudinal study investigating gene-environment interaction effects on chronic disease: For longitudinal studies with interest in investigating the gene-environment interaction in disease susceptibility and progression, rigorous statistical power estimation is crucial to ensure that such studies are scientifically useful and cost-effective since human genome epidemiology is expensive. However, conventional sample size calculations for longitudinal study can seriously overestimate the statistical power due to overlooking the measurement error, unmeasured etiological determinants, and competing events that can impede the occurrence of the event of interest. 2) Comparing the performance of different multiple imputation strategies for missing binary outcomes in cluster randomized trials: Though researchers have proposed various strategies to handle missing binary outcome in cluster randomized trials (CRTs), comprehensive guidelines on the selection of the most appropriate or optimal strategy are not available in the literature. 3) Comparison of population-averaged and cluster-specific models for the analysis of cluster randomized trials with missing binary outcome: Both population-averaged and cluster-specific models are commonly used for analyzing binary outcomes in CRTs. However, little attention has been paid to their accuracy and efficiency when analyzing data with missing outcomes. 
The objective of this thesis is to provide researchers recommendations and guidance for future research in handling the above issues. / Doctor of Philosophy (PhD) longitudinal study cluster randomized trials correlated data missing data imputation sample size calculation Clinical Trials Survival Analysis Clinical Trials
