1

Assessing Affordability of Fruits and Vegetables in the Brazos Valley-Texas

Lotade-Manje, Justus 2011 December
The burden of obesity-related illness, which disproportionately affects low-income households and historically disadvantaged racial and ethnic groups, is a leading public health issue in the United States. In addition, previous research has documented differences in eating behavior and dietary intake between racial and ethnic groups, as well as between urban and rural residents. The coexistence of diet-related disparities and diet-related health conditions has therefore become a major focus of research and policy. Researchers have hypothesized that differences in eating behavior originate from differing levels of access to and affordability of healthy food options, such as fresh fruits and vegetables. Therefore, this dissertation examines the affordability of fresh produce in the Brazos Valley of Texas. This study uses information on produce prices collected by taking a census of food stores in a large regional area through the method of ground-truthing. These data are combined with responses to a contemporaneous health assessment survey. Key innovations include the construction of price indices based on economic theory, testing the robustness of results to different methods of price imputation, and employing spatial econometric techniques. In the first part of the analysis, I evaluate the socioeconomic and geographical factors associated with the affordability of fresh fruits and vegetables. The results based on Ordinary Least Squares (OLS) regression show that, except for housing values (measured as the median value of owner-occupied units) and store type, most factors do not have significant effects on the prices of these food items. In addition, the sizes and signs of the coefficients vary greatly across items. We find that consumers who pay higher premiums for fresh produce reside in rural areas and in neighborhoods with a high proportion of minorities. We then assess how our results are influenced by different imputation methods used to account for missing prices. The results reveal that the impacts of the factors examined are similar regardless of the imputation method. Finally, we investigate the presence of spatial relationships between prices at particular stores and those at competing stores in their neighborhoods. The spatial estimation results based on Maximum Likelihood (ML) indicate a weak spatial correlation between the prices at stores located near each other. Stores selling vegetables display a certain level of spatial autocorrelation between the prices at a particular store and those at its neighboring competitors, whereas stores selling fruits do not exhibit such a relationship.
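A central robustness check in this abstract is re-estimating the price regressions under different treatments of missing prices. The sketch below illustrates that idea only in miniature: the data are synthetic, the variable names are made up, and the two imputation schemes (overall mean versus within store type) are chosen for brevity rather than taken from the dissertation.

```python
# Hedged sketch: comparing OLS price regressions under two simple imputation
# schemes, in the spirit of the robustness checks described above. All data,
# column names, and imputation choices here are illustrative, not the study's.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
stores = pd.DataFrame({
    "rural": rng.integers(0, 2, n),                # 1 = rural neighborhood
    "median_home_value": rng.normal(150, 40, n),   # in $1,000s
    "supermarket": rng.integers(0, 2, n),          # store type indicator
})
# Illustrative price process: rural and non-supermarket stores charge more
price = (2.0 + 0.3 * stores["rural"] - 0.002 * stores["median_home_value"]
         - 0.25 * stores["supermarket"] + rng.normal(0, 0.2, n))
price[rng.random(n) < 0.2] = np.nan               # ~20% of prices unobserved

def fit_ols(imputed_price):
    X = sm.add_constant(stores)
    return sm.OLS(imputed_price, X).fit()

# Scheme A: overall mean imputation
fit_a = fit_ols(price.fillna(price.mean()))
# Scheme B: impute within store type (supermarket vs. other)
fit_b = fit_ols(price.fillna(price.groupby(stores["supermarket"]).transform("mean")))

print(pd.DataFrame({"mean_imp": fit_a.params, "by_type_imp": fit_b.params}))
```

Comparing the two columns of coefficients side by side is the spirit of the robustness exercise; the study itself works with theory-based price indices and spatial maximum likelihood estimation rather than this toy OLS comparison.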
2

AI applications on healthcare data / AI tillämpningar på Hälsovårdsdata

Andersson, Oscar; Andersson, Tim January 2021
The purpose of this research is to get a better understanding of how different machine learning algorithms perform under different amounts of data corruption. This is important since data corruption is a pervasive issue in data collection and thus, by extension, in any work that relies on the collected data. The questions we examined were: Which feature is the most important? How significant is the correlation between features? Which algorithms should be used given the data available? And how much noise (inaccurate or unhelpful captured data) is acceptable? The study first introduces AI in healthcare, data missingness, and the machine learning algorithms used in the study. In the method section, we give a recommended workflow for handling data with machine learning in mind. The results show that when a dataset is filled with random values, the run time of the algorithms increases since many patterns are lost. Randomly removing values also caused less of a problem than first anticipated, since we ran multiple trials, evening out any problems caused by the lost values. Lastly, imputation is a preferred way of handling missing data since it retains much of the dataset's structure, although one has to keep in mind whether the imputation is done on categorical or numerical values. However, there is no easy "best fit" for any dataset, and it is hard to give a concrete answer when choosing a machine learning algorithm. Nevertheless, since it is easy to simply plug and play with many algorithms, we would recommend trying different ones before deciding which fits a project best.
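The kind of experiment described, degrading a dataset and measuring how an algorithm copes, can be sketched as follows. This is a hedged illustration: the dataset is synthetic, only one classifier (a random forest) and one imputation strategy (mean imputation) are used, and the missingness rates are arbitrary; the thesis evaluates several algorithms and strategies on real healthcare data.

```python
# Hedged sketch of the experiment described: measure how a classifier's accuracy
# degrades as values are randomly removed and then mean-imputed. The dataset,
# model choice, and missingness rates are illustrative assumptions.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(42)
X, y = make_classification(n_samples=1000, n_features=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

for miss_rate in [0.0, 0.1, 0.3, 0.5]:
    X_corrupt = X_tr.copy()
    mask = rng.random(X_corrupt.shape) < miss_rate   # randomly remove values
    X_corrupt[mask] = np.nan
    imputer = SimpleImputer(strategy="mean")         # simple numeric imputation
    model = RandomForestClassifier(random_state=0)
    model.fit(imputer.fit_transform(X_corrupt), y_tr)
    acc = accuracy_score(y_te, model.predict(imputer.transform(X_te)))
    print(f"missing rate {miss_rate:.0%}: test accuracy {acc:.3f}")
```

Plotting accuracy against the missingness rate gives the kind of degradation curve the thesis discusses when asking how much noise is acceptable.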
3

Multiple Imputation Methods for Nonignorable Nonresponse, Adaptive Survey Design, and Dissemination of Synthetic Geographies

Paiva, Thais Viana January 2014
This thesis presents methods for multiple imputation that can be applied to missing data and data with confidential variables. Imputation is useful for missing data because it results in a data set that can be analyzed with complete-data statistical methods. The missing data are filled in by values generated from a model fit to the observed data. The model specification will depend on the observed data pattern and the missing data mechanism. For example, when the reason the data are missing is related to the outcome of interest, that is, nonignorable missingness, we need to alter the model fit to the observed data to generate the imputed values from a different distribution. Imputation is also used for generating synthetic values for data sets with disclosure restrictions. Since the synthetic values are not actual observations, they can be released for statistical analysis. The interest is in fitting a model that approximates well the relationships in the original data, keeping the utility of the synthetic data, while preserving the confidentiality of the original data. We consider applications of these methods to data from the social sciences and epidemiology.

The first method is for imputation of multivariate continuous data with nonignorable missingness. Regular imputation methods have been used to deal with nonresponse in several types of survey data. However, in some of these studies, the assumption of missing at random is not valid since the probability of missing depends on the response variable. We propose an imputation method for multivariate data sets when there is nonignorable missingness. We fit a truncated Dirichlet process mixture of multivariate normals to the observed data under a Bayesian framework to provide flexibility. With the posterior samples from the mixture model, an analyst can alter the estimated distribution to obtain imputed data under different scenarios. To facilitate that, I developed an R application that allows the user to alter the values of the mixture parameters and visualize the imputation results automatically. I demonstrate this process of sensitivity analysis with an application to the Colombian Annual Manufacturing Survey. I also include a simulation study to show that the correct complete data distribution can be recovered if the true missing data mechanism is known, thus validating that the method can be meaningfully interpreted for sensitivity analysis.

The second method uses the imputation techniques for nonignorable missingness to implement a procedure for adaptive design in surveys. Specifically, I develop a procedure that agencies can use to evaluate whether or not it is effective to stop data collection. This decision is based on utility measures that compare the data collected so far with potential follow-up samples. The options are assessed by imputation of the nonrespondents under different missingness scenarios considered by the analyst. The variation in the utility measures is compared to the cost induced by the follow-up sample sizes. We apply the proposed method to the 2007 U.S. Census of Manufactures.

The third method is for imputation of confidential data sets with spatial locations using disease mapping models. We consider data that include fine geographic information, such as census tract or street block identifiers. This type of data can be difficult to release as public use files, since fine geography provides information that ill-intentioned data users can use to identify individuals. We propose to release data with simulated geographies, so as to enable spatial analyses while reducing disclosure risks. We fit disease mapping models that predict areal-level counts from attributes in the file, and sample new locations based on the estimated models. I illustrate this approach using data on causes of death in North Carolina, including evaluations of the disclosure risks and analytic validity that can result from releasing synthetic geographies.
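The first method's sensitivity-analysis idea, fit a flexible mixture to the observed values and then deliberately perturb it before imputing, can be caricatured as below. This sketch substitutes a finite Gaussian mixture for the truncated Dirichlet process mixture, works in one dimension, and uses a mean shift delta as the analyst-chosen nonignorability parameter; all of these are simplifying assumptions, not the thesis's actual model or software.

```python
# Much-simplified sketch of the sensitivity-analysis idea: fit a mixture to the
# observed values, then impute the missing ones from a *shifted* version of the
# fitted distribution to represent a nonignorable mechanism. A finite Gaussian
# mixture stands in for the thesis's truncated Dirichlet process mixture, and
# the shift parameter delta is an analyst-chosen assumption.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
y = np.concatenate([rng.normal(0, 1, 700), rng.normal(4, 0.5, 300)])
missing = rng.random(y.size) < 0.3            # 30% of values unobserved
y_obs = y[~missing]

gm = GaussianMixture(n_components=2, random_state=1).fit(y_obs.reshape(-1, 1))

def impute(delta, n_missing, n_draws=5):
    """Draw several imputations after shifting every component mean by delta."""
    imputations = []
    for _ in range(n_draws):
        comps = rng.choice(gm.n_components, size=n_missing, p=gm.weights_)
        means = gm.means_.ravel()[comps] + delta
        sds = np.sqrt(gm.covariances_.ravel()[comps])
        imputations.append(rng.normal(means, sds))
    return imputations

for delta in [0.0, 0.5, 1.0]:                 # delta = 0 mimics ignorable missingness
    completed_means = [np.mean(np.concatenate([y_obs, imp]))
                       for imp in impute(delta, missing.sum())]
    print(f"delta={delta}: completed-data mean ≈ {np.mean(completed_means):.2f}")
```

Varying delta and watching the completed-data estimate move is the essence of the sensitivity analysis the abstract describes.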
4

Statistical Tools for Efficient Confirmation of Diagnosis in Patients with Suspected Primary Central Nervous System Vasculitis

Brooks, John 27 April 2023
The management of missing data is a major concern in classification model generation in all fields, but it poses a particular challenge in situations where only a small quantity of sparse data is available. In the field of medicine, this is not an uncommon problem. While widely used methodologies like logistic regression can, with minor modifications and potentially much labor, provide reasonable insights from the larger and less sparse datasets anticipated when analyzing the diagnosis of common conditions, there is a multitude of rare conditions of interest. Primary angiitis of the central nervous system (PACNS) is a rare but devastating entity that, given its range of presenting symptoms, can be suspected in a variety of circumstances. It unfortunately continues to be a diagnosis that is hard to make. Aside from some general frameworks, there is no rigorously defined diagnostic approach, as there is for other, more common neuroinflammatory conditions like multiple sclerosis. Instead, clinicians currently rely on experience and clinical judgement to guide the reasonable exclusion of potential inciting entities and mimickers. In effect, this results in a small quantity of heterogeneous data that may not be optimally suited to more traditional classification methodology (e.g., logistic regression) without substantial contemplation and justification of appropriate data cleaning and preprocessing. It is therefore challenging to develop and analyze systematic approaches that could direct clinicians in a way that standardizes patient care. In this thesis, a machine learning approach was presented to derive quantitatively justified insights into the factors that are most important to consider during the diagnostic process for conditions like PACNS. Modern categorization techniques (i.e., random forests and support vector machines) were used to generate diagnostic models identifying cases of PACNS, from which key elements of diagnostic importance could be identified. A novel variant of a random forest (RF) approach was also demonstrated as a means of managing missing data in a small sample, a significant problem encountered when exploring data on rare conditions without clear diagnostic frameworks. A reduced need to hypothesize the reasons for missingness when generating and applying the novel variant was discussed. The application of such tools to diagnostic model generation for PACNS and other rare and/or emerging diseases, and their use in providing objective feedback, was explored. This primarily centered around a structured assessment of how to prioritize testing to rapidly rule out conditions that require alternative management, and it could be used to support future guidelines to optimize the care of these patients. The material presented herein had three components. The first centered around the example of PACNS. It described, in detail, an example of a relevant medical condition and explored why the data are both rare and sparse. Furthermore, the reasons for the sparsity are heterogeneous or non-monotonic (i.e., not conducive to modelling with a single model). This component concluded with a search for candidate variables to diagnose the condition, by means of a scoping review, for subsequent comparative demonstration of the proposed novel variant of random forest construction. The second component discussed machine learning model development and simulated data with varying degrees and patterns of missingness to demonstrate how the models could be applied to data with properties similar to those expected of PACNS-related data. Finally, the described techniques were applied to separate a subset of patients with suspected PACNS from those with diagnosed PACNS using institutional data, and future study was proposed to expand upon and ultimately verify these insights. Further development of the novel random forest approach was also discussed.
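The thesis's novel random forest variant is not reproduced here; as a hedged illustration of the setting it targets, the sketch below uses a standard scikit-learn random forest fed median-imputed features plus missingness indicator columns, on a deliberately small synthetic dataset with non-random missingness. The feature names (csf_wbc, angiography_score, biopsy_score), the sample size, and the pipeline itself are hypothetical choices, not the thesis's method.

```python
# Illustrative sketch only: a generic way to let a random forest use missingness
# itself as signal, by adding missing-value indicator columns before imputation.
# This is *not* the novel RF variant developed in the thesis; it only shows the
# kind of small, sparse setting the thesis targets. Feature names are hypothetical.
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(7)
n = 60                                           # small sample, as with rare diseases
X = pd.DataFrame({
    "csf_wbc": rng.normal(10, 5, n),             # hypothetical test results
    "angiography_score": rng.normal(2, 1, n),
    "biopsy_score": rng.normal(1, 1, n),
})
y = (X["angiography_score"] + X["biopsy_score"] + rng.normal(0, 1, n) > 3).astype(int)
# Tests are often simply not ordered, so values go missing non-randomly
X.loc[rng.random(n) < 0.4, "biopsy_score"] = np.nan

pipe = make_pipeline(
    SimpleImputer(strategy="median", add_indicator=True),  # keeps a missingness flag
    RandomForestClassifier(n_estimators=200, random_state=0),
)
print("CV accuracy:", cross_val_score(pipe, X, y, cv=5).mean().round(3))
```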
5

Chasing Shadows : An Anthropological Expedition of the Hunt for Olle Högbom

Andersson, Viktor January 2024
This essay explores the mysterious disappearance of Olle Högbom from an anthropological perspective. It uses theories of hauntology, ruinology, and simulacra to examine how Olle's absence continues to affect society. The study involves a thematic analysis of online forums and qualitative interviews with Olle’s sister, contrasting public speculation with family narratives, and highlights the enduring presence of Olle in collective memory, illustrating how unresolved disappearances influence society, memory, and everyday life. This anthropological investigation into missing persons provides insights into how spectral presences shape cultural and social dynamics. Employing a blend of ethnographic interviews, content analysis, pictures, and autoethnography, this study paints an intimate portrait of relationships with the absent and examines the liminality of Olle’s existence. Autoethnography in combination with multimodality carries the potential to unearth the unknown and paint an intimate understanding of absence. Olle’s absence is depicted in the first chapter and partially in the third chapter, by presenting an autoethnographic account of the experience of forming relationships with the absent.
6

Examining Random-Coefficient Pattern-Mixture Models for Longitudinal Data with Informative Dropout

Bishop, Brenden 07 December 2017
No description available.
7

Multiple imputation for marginal and mixed models in longitudinal data with informative missingness

Deng, Wei 07 October 2005
No description available.
8

Statistical Methods for the Analysis of Mass Spectrometry-based Proteomics Data

Wang, Xuan 2012 May
Proteomics plays an important role in the systems-level understanding of biological function. Mass spectrometry (MS)-based proteomics has become the tool of choice for identifying and quantifying the proteome of an organism. In the most widely used bottom-up approach to MS-based high-throughput quantitative proteomics, complex mixtures of proteins are first subjected to enzymatic cleavage, and the resulting peptide products are separated based on chemical or physical properties and then analyzed using a mass spectrometer. The three fundamental challenges in the analysis of bottom-up MS-based proteomics are as follows: (i) identifying the proteins that are present in a sample; (ii) aligning different samples on elution (retention) time, mass, peak area (intensity), and so on; and (iii) quantifying the abundance levels of the identified proteins after alignment. Each of these challenges requires knowledge of the biological and technological context that gives rise to the observed data, as well as the application of sound statistical principles for estimation and inference. In this dissertation, we present a set of statistical methods for bottom-up proteomics addressing protein identification, alignment, and quantification. We describe a fully Bayesian hierarchical modeling approach to peptide and protein identification on the basis of MS/MS fragmentation patterns in a unified framework. Our major contribution is to allow for dependence among the list of top candidate peptide-spectrum matches (PSMs), which we accomplish with a Bayesian multiple-component mixture model incorporating decoy search results and joint estimation of the accuracy of a list of peptide identifications for each MS/MS fragmentation spectrum. We also propose an objective criterion for evaluating the false discovery rate (FDR) associated with a list of identifications at the peptide level, which results in more accurate FDR estimates than existing methods like PeptideProphet. Several alignment algorithms have been developed using different warping functions. However, all existing alignment approaches lack a useful metric for scoring an alignment between two data sets and hence a quantitative score for how good an alignment is. Our alignment approach uses "anchor points" to align all the individual scans in the target sample and provides a framework to quantify the alignment, that is, to assign a p-value to a set of aligned LC-MS runs to assess the correctness of the alignment. After alignment using our algorithm, the p-values from Wilcoxon signed-rank tests on elution (retention) time, m/z, and peak area become non-significant. Quantitative mass spectrometry-based proteomics involves statistical inference on protein abundance, based on the intensities of each protein's associated spectral peaks. However, typical mass spectrometry-based proteomics data sets have substantial proportions of missing observations, due at least in part to censoring of low intensities. This complicates intensity-based differential expression analysis. We outline a statistical method for protein differential expression based on a simple binomial likelihood. By modeling peak intensities as binary, in terms of presence/absence, we enable the selection of proteins not typically amenable to quantitative analysis; e.g., "one-state" proteins that are present in one condition but absent in another. In addition, we present an analysis protocol that combines quantitative and presence/absence analysis of a given data set in a principled way, resulting in a single list of selected proteins with a single associated FDR.
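The presence/absence idea can be illustrated with a toy per-protein comparison of detection frequencies followed by FDR control. The test used here (Fisher's exact), the simulated detection counts, and the Benjamini-Hochberg correction are stand-ins chosen for brevity; the dissertation develops its own binomial-likelihood model and FDR evaluation rather than this recipe.

```python
# Hedged sketch of a presence/absence comparison: for each protein, compare how
# often a peak is detected in two conditions, then control the FDR across proteins.
# The detection counts are simulated and the test choice is an illustrative stand-in.
import numpy as np
from scipy.stats import fisher_exact
from statsmodels.stats.multitest import multipletests

rng = np.random.default_rng(3)
n_proteins, n_per_group = 500, 12
# Detection probabilities: most proteins similar across groups, a few "one-state"
p_a = rng.uniform(0.2, 0.9, n_proteins)
p_b = p_a.copy()
p_b[:25] = 0.05                                   # present in A, nearly absent in B

pvals = []
for pa, pb in zip(p_a, p_b):
    detected_a = rng.binomial(n_per_group, pa)    # runs where a peak is observed
    detected_b = rng.binomial(n_per_group, pb)
    table = [[detected_a, n_per_group - detected_a],
             [detected_b, n_per_group - detected_b]]
    pvals.append(fisher_exact(table)[1])

reject, _, _, _ = multipletests(pvals, alpha=0.05, method="fdr_bh")
print(f"proteins selected at 5% FDR: {reject.sum()} (25 truly differential)")
```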
9

Why do some civilian lives matter more than others? Exploring how the quality, timeliness and consistency of data on civilian harm affects the conduct of hostilities for civilians caught in conflict.

Lee, Amra January 2019
Normatively, protecting civilians from the conduct of hostilities is grounded in the Geneva Conventions and the UN Security Council protection of civilians agenda, which celebrate their 70th and 20th anniversaries, respectively, in 2019. Previous research focuses heavily on the protection of civilians through peacekeeping, whereas this research focuses on 'non-armed' approaches to enhancing civilian protection in conflict. Prior research and experience reveal a high level of missingness and variation in the available data on civilian harm in conflict. Where civilian harm is considered in the peace and conflict literature, it is predominantly through a securitized lens of understanding insurgent recruitment strategies and, more recently, counter-insurgent strategies aimed at winning 'hearts and minds'. Through a structured, focused comparison of four case studies, the correlation between the availability of quality, timely, and consistent data on civilian harm and its effect on the conduct of hostilities will be reviewed and potential confounders identified. Following this, the hypothesized causal mechanism will be process-traced through the pathway case of Afghanistan. The findings and analysis from both methods identify support for the theory and its refinement, with important nuances in the factors conducive to quality, timely, and consistent data collection on civilian harm in armed conflict.
10

A tale of two applications: closed-loop quality control for 3D printing, and multiple imputation and the bootstrap for the analysis of big data with missingness

Wenbin Zhu (12226001) 20 April 2022
1. A Closed-Loop Machine Learning and Compensation Framework for Geometric Accuracy Control of 3D Printed Products

Additive manufacturing (AM) systems enable direct printing of three-dimensional (3D) physical products from computer-aided design (CAD) models. Despite the many advantages that AM systems have over traditional manufacturing, one of their significant limitations that impedes their wide adoption is geometric inaccuracies, or shape deviations between the printed product and the nominal CAD model. Machine learning for shape deviations can enable geometric accuracy control of 3D printed products via the generation of compensation plans, which are modifications of CAD models informed by the machine learning algorithm that reduce deviations in expectation. However, existing machine learning and compensation frameworks cannot accommodate deviations of fully 3D shapes with different geometries. The feasibility of existing frameworks for geometric accuracy control is further limited by resource constraints in AM systems that prevent the printing of multiple copies of new shapes.

We present a closed-loop machine learning and compensation framework that can improve geometric accuracy control of 3D shapes in AM systems. Our framework is based on a Bayesian extreme learning machine (BELM) architecture that leverages data and deviation models from previously printed products to transfer deviation models, and more accurately capture deviation patterns, for new 3D products. The closed-loop nature of compensation under our framework, in which past compensated products that do not adequately meet dimensional specifications are fed into the BELMs to re-learn the deviation model, enables the identification of effective compensation plans and satisfies resource constraints by printing only one new shape at a time. The power and cost-effectiveness of our framework are demonstrated with two validation experiments that involve different geometries for a Markforged Metal X AM machine printing 17-4 PH stainless steel products. As demonstrated in our case studies, our framework can reduce shape inaccuracies by 30% to 60% (depending on a shape's geometric complexity) in at most two iterations, with three training shapes and one or two test shapes for a specific geometry involved across the iterations. We also perform an additional validation experiment using a third geometry to establish the capabilities of our framework for prospective shape deviation prediction of 3D shapes that have never been printed before. This third experiment indicates that choosing one suitable class of past products for prospective prediction and model transfer, instead of including all past printed products with different geometries, could be sufficient for obtaining deviation models with good predictive performance. Ultimately, our closed-loop machine learning and compensation framework provides an important step towards accurate and cost-efficient deviation modeling and compensation for fully 3D printed products using a minimal number of printed training and test shapes, and thereby can advance AM as a high-quality manufacturing paradigm.

2. Multiple Imputation and the Bootstrap for the Analysis of Big Data with Missingness

Inference can be a challenging task for Big Data. Two significant issues are that Big Data frequently exhibit complicated missing data patterns, and that the complex statistical models and machine learning algorithms typically used to analyze Big Data do not have convenient quantification of uncertainties for estimators. These two difficulties have previously been addressed using multiple imputation and the bootstrap, respectively. However, it is not clear how multiple imputation and bootstrap procedures can be effectively combined to perform statistical inferences on Big Data with missing values. We investigate a practical framework for the combination of multiple imputation and bootstrap methods. Our framework is based on two principles: distribution of multiple imputation and bootstrap calculations across parallel computational cores, and the quantification of sources of variability involved in bootstrap procedures that use subsampling techniques via random effects or hierarchical models. This framework effectively extends the scope of existing methods for multiple imputation and the bootstrap to a broad range of Big Data settings. We perform simulation studies for linear and logistic regression across Big Data settings with different rates of missingness to characterize the frequentist properties and computational efficiencies of the combinations of multiple imputation and the bootstrap. We further illustrate how effective combinations of multiple imputation and the bootstrap for Big Data analyses can be identified in practice by means of both the simulation studies and a case study on COVID infection status data. Ultimately, our investigation demonstrates how the flexible combination of multiple imputation and the bootstrap under our framework can enable valid statistical inferences in an effective manner for Big Data with missingness.
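A minimal sketch of combining resampling with imputation, under the simplifying assumptions of single mean imputation, a one-covariate OLS model, and a plain nonparametric bootstrap, is shown below; the thesis's framework is considerably richer (multiple imputation proper, parallel computation, and hierarchical variance models for subsampled bootstraps), so this toy version is only meant to fix ideas.

```python
# Minimal "bootstrap, then impute" sketch: resample the data, impute each
# resample, refit the model, and use the spread of estimates to quantify
# uncertainty. Mean imputation and a single covariate are simplifying
# assumptions, not the thesis's procedure.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(11)
n = 500
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)
x_obs = x.copy()
x_obs[rng.random(n) < 0.25] = np.nan              # 25% of the covariate missing

def fit_slope(xv, yv):
    xv = np.where(np.isnan(xv), np.nanmean(xv), xv)   # single mean imputation
    return sm.OLS(yv, sm.add_constant(xv)).fit().params[1]

boot_slopes = []
for _ in range(500):
    idx = rng.integers(0, n, n)                   # bootstrap resample of rows
    boot_slopes.append(fit_slope(x_obs[idx], y[idx]))

print(f"slope ≈ {np.mean(boot_slopes):.2f} "
      f"(bootstrap SE ≈ {np.std(boot_slopes, ddof=1):.3f})")
```

In the thesis, each of these bootstrap-and-impute fits would itself be distributed across parallel cores and combined with proper multiple imputation, which is what makes the overall variance accounting nontrivial.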
