1

Assessing Affordability of Fruits and Vegetables in the Brazos Valley-Texas

Lotade-Manje, Justus 2011 December
The burden of obesity-related illness, which disproportionately affects low-income households and historically disadvantaged racial and ethnic groups, is a leading public health issue in the United States. In addition, previous research has documented differences in eating behavior and dietary intake between racial and ethnic groups, as well as between urban and rural residents. The coexistence of diet-related disparities and diet-related health conditions has therefore become a major focus of research and policy. Researchers have hypothesized that differences in eating behavior originate from differing levels of access to and affordability of healthy food options, such as fresh fruits and vegetables. Therefore, this dissertation examines the affordability of fresh produce in the Brazos Valley of Texas. This study uses information on produce prices collected by taking a census of food stores in a large regional area through the method of ground-truthing. These data are combined with responses to a contemporaneous health assessment survey. Key innovations include the construction of price indices based on economic theory, testing the robustness of results to different methods of price imputation, and employing spatial econometric techniques. In the first part of the analysis, I evaluate the socioeconomic and geographical factors associated with the affordability of fresh fruits and vegetables. The results based on Ordinary Least Squares (OLS) regression show that, except for housing values (measured as the median value of owner-occupied units) and store type, most factors do not have significant effects on the prices of these food items. In addition, the sizes and signs of the coefficients vary greatly across items. We find that consumers who pay higher premiums for fresh produce reside in rural areas and in neighborhoods with a high proportion of minorities. We then assess how our results are influenced by different imputation methods used to account for missing prices. The results reveal that the impacts of the factors examined are similar regardless of the imputation method. Finally, we investigate the presence of spatial relationships between prices at particular stores and those at competing stores in their neighborhoods. The spatial estimation results based on Maximum Likelihood (ML) indicate a weak spatial correlation between the prices at stores located near each other. Stores selling vegetables display a certain level of spatial autocorrelation between the prices at a particular store and those at its neighboring competitors, whereas stores selling fruits do not exhibit such a relationship.
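A central robustness check in this abstract is re-estimating the price regressions under different treatments of missing prices. The sketch below illustrates that idea only in miniature: the data are synthetic, the variable names are made up, and the two imputation schemes (overall mean versus within store type) are chosen for brevity rather than taken from the dissertation.

```python
# Hedged sketch: comparing OLS price regressions under two simple imputation
# schemes, in the spirit of the robustness checks described above. All data,
# column names, and imputation choices here are illustrative, not the study's.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
stores = pd.DataFrame({
    "rural": rng.integers(0, 2, n),                # 1 = rural neighborhood
    "median_home_value": rng.normal(150, 40, n),   # in $1,000s
    "supermarket": rng.integers(0, 2, n),          # store type indicator
})
# Illustrative price process: rural and non-supermarket stores charge more
price = (2.0 + 0.3 * stores["rural"] - 0.002 * stores["median_home_value"]
         - 0.25 * stores["supermarket"] + rng.normal(0, 0.2, n))
price[rng.random(n) < 0.2] = np.nan               # ~20% of prices unobserved

def fit_ols(imputed_price):
    X = sm.add_constant(stores)
    return sm.OLS(imputed_price, X).fit()

# Scheme A: overall mean imputation
fit_a = fit_ols(price.fillna(price.mean()))
# Scheme B: impute within store type (supermarket vs. other)
fit_b = fit_ols(price.fillna(price.groupby(stores["supermarket"]).transform("mean")))

print(pd.DataFrame({"mean_imp": fit_a.params, "by_type_imp": fit_b.params}))
```

Comparing the two columns of coefficients side by side is the spirit of the robustness exercise; the study itself works with theory-based price indices and spatial maximum likelihood estimation rather than this toy OLS comparison.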
2

AI applications on healthcare data / AI tillämpningar på Hälsovårdsdata

Andersson, Oscar; Andersson, Tim January 2021
The purpose of this research is to get a better understanding of how different machine learning algorithms perform under different amounts of data corruption. This is important since data corruption is a pervasive issue in data collection and thus, by extension, in any work that relies on the collected data. The questions we examined were: Which feature is the most important? How significant is the correlation between features? Which algorithms should be used given the data available? And how much noise (inaccurate or unhelpful captured data) is acceptable? The study first introduces AI in healthcare, data missingness, and the machine learning algorithms used in the study. In the method section, we give a recommended workflow for handling data with machine learning in mind. The results show that when a dataset is filled with random values, the run time of the algorithms increases since many patterns are lost. Randomly removing values also caused less of a problem than first anticipated, since we ran multiple trials, evening out any problems caused by the lost values. Lastly, imputation is a preferred way of handling missing data since it retains much of the dataset's structure, although one has to keep in mind whether the imputation is done on categorical or numerical values. However, there is no easy "best fit" for any dataset, and it is hard to give a concrete answer when choosing a machine learning algorithm. Nevertheless, since it is easy to simply plug and play with many algorithms, we would recommend trying different ones before deciding which fits a project best.
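The kind of experiment described, degrading a dataset and measuring how an algorithm copes, can be sketched as follows. This is a hedged illustration: the dataset is synthetic, only one classifier (a random forest) and one imputation strategy (mean imputation) are used, and the missingness rates are arbitrary; the thesis evaluates several algorithms and strategies on real healthcare data.

```python
# Hedged sketch of the experiment described: measure how a classifier's accuracy
# degrades as values are randomly removed and then mean-imputed. The dataset,
# model choice, and missingness rates are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

for miss_rate in [0.0, 0.1, 0.3, 0.5]:
    X_corrupt = X_tr.copy()
    mask = rng.random(X_corrupt.shape) < miss_rate   # randomly remove values
    X_corrupt[mask] = np.nan
    imputer = SimpleImputer(strategy="mean")         # simple numeric imputation
    model = RandomForestClassifier(random_state=0)
    model.fit(imputer.fit_transform(X_corrupt), y_tr)
    acc = accuracy_score(y_te, model.predict(imputer.transform(X_te)))
    print(f"missing rate {miss_rate:.0%}: test accuracy {acc:.3f}")
```

Plotting accuracy against the missingness rate gives the kind of degradation curve the thesis discusses when asking how much noise is acceptable.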
3

Multiple Imputation Methods for Nonignorable Nonresponse, Adaptive Survey Design, and Dissemination of Synthetic Geographies

Paiva, Thais Viana January 2014
This thesis presents methods for multiple imputation that can be applied to missing data and data with confidential variables. Imputation is useful for missing data because it results in a data set that can be analyzed with complete-data statistical methods. The missing data are filled in by values generated from a model fit to the observed data. The model specification will depend on the observed data pattern and the missing data mechanism. For example, when the reason the data are missing is related to the outcome of interest, that is, nonignorable missingness, we need to alter the model fit to the observed data to generate the imputed values from a different distribution. Imputation is also used for generating synthetic values for data sets with disclosure restrictions. Since the synthetic values are not actual observations, they can be released for statistical analysis. The interest is in fitting a model that approximates well the relationships in the original data, keeping the utility of the synthetic data, while preserving the confidentiality of the original data. We consider applications of these methods to data from the social sciences and epidemiology.

The first method is for imputation of multivariate continuous data with nonignorable missingness. Regular imputation methods have been used to deal with nonresponse in several types of survey data. However, in some of these studies, the assumption of missing at random is not valid since the probability of missing depends on the response variable. We propose an imputation method for multivariate data sets when there is nonignorable missingness. We fit a truncated Dirichlet process mixture of multivariate normals to the observed data under a Bayesian framework to provide flexibility. With the posterior samples from the mixture model, an analyst can alter the estimated distribution to obtain imputed data under different scenarios. To facilitate that, I developed an R application that allows the user to alter the values of the mixture parameters and visualize the imputation results automatically. I demonstrate this process of sensitivity analysis with an application to the Colombian Annual Manufacturing Survey. I also include a simulation study to show that the correct complete data distribution can be recovered if the true missing data mechanism is known, thus validating that the method can be meaningfully interpreted for sensitivity analysis.

The second method uses the imputation techniques for nonignorable missingness to implement a procedure for adaptive design in surveys. Specifically, I develop a procedure that agencies can use to evaluate whether or not it is effective to stop data collection. This decision is based on utility measures that compare the data collected so far with potential follow-up samples. The options are assessed by imputation of the nonrespondents under different missingness scenarios considered by the analyst. The variation in the utility measures is compared to the cost induced by the follow-up sample sizes. We apply the proposed method to the 2007 U.S. Census of Manufactures.

The third method is for imputation of confidential data sets with spatial locations using disease mapping models. We consider data that include fine geographic information, such as census tract or street block identifiers. This type of data can be difficult to release as public use files, since fine geography provides information that ill-intentioned data users can use to identify individuals. We propose to release data with simulated geographies, so as to enable spatial analyses while reducing disclosure risks. We fit disease mapping models that predict areal-level counts from attributes in the file, and sample new locations based on the estimated models. I illustrate this approach using data on causes of death in North Carolina, including evaluations of the disclosure risks and analytic validity that can result from releasing synthetic geographies.
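The first method's sensitivity-analysis idea, fit a flexible mixture to the observed values and then deliberately perturb it before imputing, can be caricatured as below. This sketch substitutes a finite Gaussian mixture for the truncated Dirichlet process mixture, works in one dimension, and uses a mean shift delta as the analyst-chosen nonignorability parameter; all of these are simplifying assumptions, not the thesis's actual model or software.

```python
# Much-simplified sketch of the sensitivity-analysis idea: fit a mixture to the
# observed values, then impute the missing ones from a *shifted* version of the
# fitted distribution to represent a nonignorable mechanism. A finite Gaussian
# mixture stands in for the thesis's truncated Dirichlet process mixture, and
# the shift parameter delta is an analyst-chosen assumption.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0, 1, 700), rng.normal(4, 0.5, 300)])
missing = rng.random(y.size) < 0.3            # 30% of values unobserved
y_obs = y[~missing]

gm = GaussianMixture(n_components=2, random_state=1).fit(y_obs.reshape(-1, 1))

def impute(delta, n_missing, n_draws=5):
    """Draw several imputations after shifting every component mean by delta."""
    imputations = []
    for _ in range(n_draws):
        comps = rng.choice(gm.n_components, size=n_missing, p=gm.weights_)
        means = gm.means_.ravel()[comps] + delta
        sds = np.sqrt(gm.covariances_.ravel()[comps])
        imputations.append(rng.normal(means, sds))
    return imputations

for delta in [0.0, 0.5, 1.0]:                 # delta = 0 mimics ignorable missingness
    completed_means = [np.mean(np.concatenate([y_obs, imp]))
                       for imp in impute(delta, missing.sum())]
    print(f"delta={delta}: completed-data mean ≈ {np.mean(completed_means):.2f}")
```

Varying delta and watching the completed-data estimate move is the essence of the sensitivity analysis the abstract describes.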
4

Statistical Tools for Efficient Confirmation of Diagnosis in Patients with Suspected Primary Central Nervous System Vasculitis

Brooks, John 27 April 2023
The management of missing data is a major concern in classification model generation in all fields, but it poses a particular challenge in situations where only a small quantity of sparse data is available. In the field of medicine, this is not an uncommon problem. While widely used methodologies like logistic regression can, with minor modifications and potentially much labor, provide reasonable insights from the larger and less sparse datasets anticipated when analyzing the diagnosis of common conditions, there is a multitude of rare conditions of interest. Primary angiitis of the central nervous system (PACNS) is a rare but devastating entity that, given its range of presenting symptoms, can be suspected in a variety of circumstances. It unfortunately continues to be a diagnosis that is hard to make. Aside from some general frameworks, there is no rigorously defined diagnostic approach, as there is for other, more common neuroinflammatory conditions like multiple sclerosis. Instead, clinicians currently rely on experience and clinical judgement to guide the reasonable exclusion of potential inciting entities and mimickers. In effect, this results in a small quantity of heterogeneous data that may not be optimally suited to more traditional classification methodology (e.g., logistic regression) without substantial contemplation and justification of appropriate data cleaning and preprocessing. It is therefore challenging to develop and analyze systematic approaches that could direct clinicians in a way that standardizes patient care. In this thesis, a machine learning approach was presented to derive quantitatively justified insights into the factors that are most important to consider during the diagnostic process for conditions like PACNS. Modern categorization techniques (i.e., random forests and support vector machines) were used to generate diagnostic models identifying cases of PACNS, from which key elements of diagnostic importance could be identified. A novel variant of a random forest (RF) approach was also demonstrated as a means of managing missing data in a small sample, a significant problem encountered when exploring data on rare conditions without clear diagnostic frameworks. A reduced need to hypothesize the reasons for missingness when generating and applying the novel variant was discussed. The application of such tools to diagnostic model generation for PACNS and other rare and/or emerging diseases, and their use in providing objective feedback, was explored. This primarily centered around a structured assessment of how to prioritize testing to rapidly rule out conditions that require alternative management, and it could be used to support future guidelines to optimize the care of these patients. The material presented herein had three components. The first centered around the example of PACNS. It described, in detail, an example of a relevant medical condition and explored why the data are both rare and sparse. Furthermore, the reasons for the sparsity are heterogeneous or non-monotonic (i.e., not conducive to modelling with a single model). This component concluded with a search for candidate variables to diagnose the condition, by means of a scoping review, for subsequent comparative demonstration of the proposed novel variant of random forest construction. The second component discussed machine learning model development and simulated data with varying degrees and patterns of missingness to demonstrate how the models could be applied to data with properties similar to those expected of PACNS-related data. Finally, the described techniques were applied to separate a subset of patients with suspected PACNS from those with diagnosed PACNS using institutional data, and future study was proposed to expand upon and ultimately verify these insights. Further development of the novel random forest approach was also discussed.
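The thesis's novel random forest variant is not reproduced here; as a hedged illustration of the setting it targets, the sketch below uses a standard scikit-learn random forest fed median-imputed features plus missingness indicator columns, on a deliberately small synthetic dataset with non-random missingness. The feature names (csf_wbc, angiography_score, biopsy_score), the sample size, and the pipeline itself are hypothetical choices, not the thesis's method.

```python
# Illustrative sketch only: a generic way to let a random forest use missingness
# itself as signal, by adding missing-value indicator columns before imputation.
# This is *not* the novel RF variant developed in the thesis; it only shows the
# kind of small, sparse setting the thesis targets. Feature names are hypothetical.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(7)
n = 60                                           # small sample, as with rare diseases
X = pd.DataFrame({
    "csf_wbc": rng.normal(10, 5, n),             # hypothetical test results
    "angiography_score": rng.normal(2, 1, n),
    "biopsy_score": rng.normal(1, 1, n),
})
y = (X["angiography_score"] + X["biopsy_score"] + rng.normal(0, 1, n) > 3).astype(int)
# Tests are often simply not ordered, so values go missing non-randomly
X.loc[rng.random(n) < 0.4, "biopsy_score"] = np.nan

pipe = make_pipeline(
    SimpleImputer(strategy="median", add_indicator=True),  # keeps a missingness flag
    RandomForestClassifier(n_estimators=200, random_state=0),
)
print("CV accuracy:", cross_val_score(pipe, X, y, cv=5).mean().round(3))
```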
5

Chasing Shadows : An Anthropological Expedition of the Hunt for Olle Högbom

Andersson, Viktor January 2024
This essay explores the mysterious disappearance of Olle Högbom from an anthropological perspective. It uses theories of hauntology, ruinology, and simulacra to examine how Olle's absence continues to affect society. The study involves a thematic analysis of online forums and qualitative interviews with Olle’s sister, contrasting public speculation with family narratives, and highlights the enduring presence of Olle in collective memory, illustrating how unresolved disappearances influence society, memory, and everyday life. This anthropological investigation into missing persons provides insights into how spectral presences shape cultural and social dynamics. Employing a blend of ethnographic interviews, content analysis, pictures, and autoethnography, this study paints an intimate portrait of relationships with the absent and examines the liminality of Olle’s existence. Autoethnography in combination with multimodality carries the potential to unearth the unknown and paint an intimate understanding of absence. Olle’s absence is depicted in the first chapter and partially in the third chapter, by presenting an autoethnographic account of the experience of forming relationships with the absent.
6

Examining Random-Coefficient Pattern-Mixture Models for Longitudinal Data with Informative Dropout

Bishop, Brenden 07 December 2017
No description available.
7

Multiple imputation for marginal and mixed models in longitudinal data with informative missingness

Deng, Wei 07 October 2005
No description available.
8

Statistical Methods for the Analysis of Mass Spectrometry-based Proteomics Data

Wang, Xuan 2012 May
Proteomics plays an important role in the systems-level understanding of biological function. Mass spectrometry (MS)-based proteomics has become the tool of choice for identifying and quantifying the proteome of an organism. In the most widely used bottom-up approach to MS-based high-throughput quantitative proteomics, complex mixtures of proteins are first subjected to enzymatic cleavage, and the resulting peptide products are separated based on chemical or physical properties and then analyzed using a mass spectrometer. The three fundamental challenges in the analysis of bottom-up MS-based proteomics are as follows: (i) identifying the proteins that are present in a sample; (ii) aligning different samples on elution (retention) time, mass, peak area (intensity), and so on; and (iii) quantifying the abundance levels of the identified proteins after alignment. Each of these challenges requires knowledge of the biological and technological context that gives rise to the observed data, as well as the application of sound statistical principles for estimation and inference. In this dissertation, we present a set of statistical methods for bottom-up proteomics addressing protein identification, alignment, and quantification. We describe a fully Bayesian hierarchical modeling approach to peptide and protein identification on the basis of MS/MS fragmentation patterns in a unified framework. Our major contribution is to allow for dependence among the list of top candidate peptide-spectrum matches (PSMs), which we accomplish with a Bayesian multiple-component mixture model incorporating decoy search results and joint estimation of the accuracy of a list of peptide identifications for each MS/MS fragmentation spectrum. We also propose an objective criterion for evaluating the false discovery rate (FDR) associated with a list of identifications at the peptide level, which results in more accurate FDR estimates than existing methods like PeptideProphet. Several alignment algorithms have been developed using different warping functions. However, all existing alignment approaches lack a useful metric for scoring an alignment between two data sets and hence a quantitative score for how good an alignment is. Our alignment approach uses "anchor points" to align all the individual scans in the target sample and provides a framework to quantify the alignment, that is, to assign a p-value to a set of aligned LC-MS runs to assess the correctness of the alignment. After alignment using our algorithm, the p-values from Wilcoxon signed-rank tests on elution (retention) time, m/z, and peak area become non-significant. Quantitative mass spectrometry-based proteomics involves statistical inference on protein abundance, based on the intensities of each protein's associated spectral peaks. However, typical mass spectrometry-based proteomics data sets have substantial proportions of missing observations, due at least in part to censoring of low intensities. This complicates intensity-based differential expression analysis. We outline a statistical method for protein differential expression based on a simple binomial likelihood. By modeling peak intensities as binary, in terms of presence/absence, we enable the selection of proteins not typically amenable to quantitative analysis; e.g., "one-state" proteins that are present in one condition but absent in another. In addition, we present an analysis protocol that combines quantitative and presence/absence analysis of a given data set in a principled way, resulting in a single list of selected proteins with a single associated FDR.
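The presence/absence idea can be illustrated with a toy per-protein comparison of detection frequencies followed by FDR control. The test used here (Fisher's exact), the simulated detection counts, and the Benjamini-Hochberg correction are stand-ins chosen for brevity; the dissertation develops its own binomial-likelihood model and FDR evaluation rather than this recipe.

```python
# Hedged sketch of a presence/absence comparison: for each protein, compare how
# often a peak is detected in two conditions, then control the FDR across proteins.
# The detection counts are simulated and the test choice is an illustrative stand-in.
import numpy as np
from scipy.stats import fisher_exact
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(3)
n_proteins, n_per_group = 500, 12
# Detection probabilities: most proteins similar across groups, a few "one-state"
p_a = rng.uniform(0.2, 0.9, n_proteins)
p_b = p_a.copy()
p_b[:25] = 0.05                                   # present in A, nearly absent in B

pvals = []
for pa, pb in zip(p_a, p_b):
    detected_a = rng.binomial(n_per_group, pa)    # runs where a peak is observed
    detected_b = rng.binomial(n_per_group, pb)
    table = [[detected_a, n_per_group - detected_a],
             [detected_b, n_per_group - detected_b]]
    pvals.append(fisher_exact(table)[1])

reject, _, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(f"proteins selected at 5% FDR: {reject.sum()} (25 truly differential)")
```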
9

Why do some civilian lives matter more than others? Exploring how the quality, timeliness and consistency of data on civilian harm affects the conduct of hostilities for civilians caught in conflict.

Lee, Amra January 2019
Normatively, protecting civilians from the conduct of hostilities is grounded in the Geneva Conventions and the UN Security Council protection of civilians agenda, which celebrate their 70th and 20th anniversaries, respectively, in 2019. Previous research focuses heavily on the protection of civilians through peacekeeping, whereas this research focuses on 'non-armed' approaches to enhancing civilian protection in conflict. Prior research and experience reveal a high level of missingness and variation in the available data on civilian harm in conflict. Where civilian harm is considered in the peace and conflict literature, it is predominantly through a securitized lens of understanding insurgent recruitment strategies and, more recently, counter-insurgent strategies aimed at winning 'hearts and minds'. Through a structured, focused comparison of four case studies, the correlation between the availability of quality, timely, and consistent data on civilian harm and its effect on the conduct of hostilities will be reviewed and potential confounders identified. Following this, the hypothesized causal mechanism will be process-traced through the pathway case of Afghanistan. The findings and analysis from both methods identify support for the theory and its refinement, with important nuances in the factors conducive to quality, timely, and consistent data collection on civilian harm in armed conflict.
10

A tale of two applications: closed-loop quality control for 3D printing, and multiple imputation and the bootstrap for the analysis of big data with missingness

Wenbin Zhu (12226001) 20 April 2022
1. A Closed-Loop Machine Learning and Compensation Framework for Geometric Accuracy Control of 3D Printed Products

Additive manufacturing (AM) systems enable direct printing of three-dimensional (3D) physical products from computer-aided design (CAD) models. Despite the many advantages that AM systems have over traditional manufacturing, one of their significant limitations that impedes their wide adoption is geometric inaccuracies, or shape deviations between the printed product and the nominal CAD model. Machine learning for shape deviations can enable geometric accuracy control of 3D printed products via the generation of compensation plans, which are modifications of CAD models informed by the machine learning algorithm that reduce deviations in expectation. However, existing machine learning and compensation frameworks cannot accommodate deviations of fully 3D shapes with different geometries. The feasibility of existing frameworks for geometric accuracy control is further limited by resource constraints in AM systems that prevent the printing of multiple copies of new shapes.

We present a closed-loop machine learning and compensation framework that can improve geometric accuracy control of 3D shapes in AM systems. Our framework is based on a Bayesian extreme learning machine (BELM) architecture that leverages data and deviation models from previously printed products to transfer deviation models, and more accurately capture deviation patterns, for new 3D products. The closed-loop nature of compensation under our framework, in which past compensated products that do not adequately meet dimensional specifications are fed into the BELMs to re-learn the deviation model, enables the identification of effective compensation plans and satisfies resource constraints by printing only one new shape at a time. The power and cost-effectiveness of our framework are demonstrated with two validation experiments that involve different geometries for a Markforged Metal X AM machine printing 17-4 PH stainless steel products. As demonstrated in our case studies, our framework can reduce shape inaccuracies by 30% to 60% (depending on a shape's geometric complexity) in at most two iterations, with three training shapes and one or two test shapes for a specific geometry involved across the iterations. We also perform an additional validation experiment using a third geometry to establish the capabilities of our framework for prospective shape deviation prediction of 3D shapes that have never been printed before. This third experiment indicates that choosing one suitable class of past products for prospective prediction and model transfer, instead of including all past printed products with different geometries, could be sufficient for obtaining deviation models with good predictive performance. Ultimately, our closed-loop machine learning and compensation framework provides an important step towards accurate and cost-efficient deviation modeling and compensation for fully 3D printed products using a minimal number of printed training and test shapes, and thereby can advance AM as a high-quality manufacturing paradigm.

2. Multiple Imputation and the Bootstrap for the Analysis of Big Data with Missingness

Inference can be a challenging task for Big Data. Two significant issues are that Big Data frequently exhibit complicated missing data patterns, and that the complex statistical models and machine learning algorithms typically used to analyze Big Data do not have convenient quantification of uncertainties for estimators. These two difficulties have previously been addressed using multiple imputation and the bootstrap, respectively. However, it is not clear how multiple imputation and bootstrap procedures can be effectively combined to perform statistical inferences on Big Data with missing values. We investigate a practical framework for the combination of multiple imputation and bootstrap methods. Our framework is based on two principles: distribution of multiple imputation and bootstrap calculations across parallel computational cores, and the quantification of sources of variability involved in bootstrap procedures that use subsampling techniques via random effects or hierarchical models. This framework effectively extends the scope of existing methods for multiple imputation and the bootstrap to a broad range of Big Data settings. We perform simulation studies for linear and logistic regression across Big Data settings with different rates of missingness to characterize the frequentist properties and computational efficiencies of the combinations of multiple imputation and the bootstrap. We further illustrate how effective combinations of multiple imputation and the bootstrap for Big Data analyses can be identified in practice by means of both the simulation studies and a case study on COVID infection status data. Ultimately, our investigation demonstrates how the flexible combination of multiple imputation and the bootstrap under our framework can enable valid statistical inferences in an effective manner for Big Data with missingness.
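A minimal sketch of combining resampling with imputation, under the simplifying assumptions of single mean imputation, a one-covariate OLS model, and a plain nonparametric bootstrap, is shown below; the thesis's framework is considerably richer (multiple imputation proper, parallel computation, and hierarchical variance models for subsampled bootstraps), so this toy version is only meant to fix ideas.

```python
# Minimal "bootstrap, then impute" sketch: resample the data, impute each
# resample, refit the model, and use the spread of estimates to quantify
# uncertainty. Mean imputation and a single covariate are simplifying
# assumptions, not the thesis's procedure.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 500
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
x_obs = x.copy()
x_obs[rng.random(n) < 0.25] = np.nan              # 25% of the covariate missing

def fit_slope(xv, yv):
    xv = np.where(np.isnan(xv), np.nanmean(xv), xv)   # single mean imputation
    return sm.OLS(yv, sm.add_constant(xv)).fit().params[1]

boot_slopes = []
for _ in range(500):
    idx = rng.integers(0, n, n)                   # bootstrap resample of rows
    boot_slopes.append(fit_slope(x_obs[idx], y[idx]))

print(f"slope ≈ {np.mean(boot_slopes):.2f} "
      f"(bootstrap SE ≈ {np.std(boot_slopes, ddof=1):.3f})")
```

In the thesis, each of these bootstrap-and-impute fits would itself be distributed across parallel cores and combined with proper multiple imputation, which is what makes the overall variance accounting nontrivial.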
