851 |
ESSAYS ON SCALABLE BAYESIAN NONPARAMETRIC AND SEMIPARAMETRIC MODELS. Chenzhong Wu (18275839) 29 March 2024 (has links)
<p dir="ltr">In this thesis, we delve into the exploration of several nonparametric and semiparametric econometric models within the Bayesian framework, highlighting their applicability across a broad spectrum of microeconomic and macroeconomic issues. Positioned in the big data era, where data collection and storage expand at an unprecedented rate, the complexity of economic questions we aim to address is similarly escalating. This dual challenge necessitates leveraging increasingly large datasets, thereby underscoring the critical need for designing flexible Bayesian priors and developing scalable, efficient algorithms tailored for high-dimensional datasets.</p><p dir="ltr">The initial two chapters, Chapters 2 and 3, are dedicated to crafting Bayesian priors suited for environments laden with a vast array of variables. These priors, alongside their corresponding algorithms, are optimized for computational efficiency, scalability to extensive datasets, and, ideally, distributability. We aim for these priors to accommodate varying levels of dataset sparsity. Chapter 2 assesses nonparametric additive models, employing a smoothing prior alongside a band matrix for each additive component. Utilizing the Bayesian backfitting algorithm significantly alleviates the computational load. In Chapter 3, we address multiple linear regression settings by adopting a flexible scale mixture of normal priors for coefficient parameters, thus allowing data-driven determination of the necessary amount of shrinkage. The use of a conjugate prior enables a closed-form solution for the posterior, markedly enhancing computational speed.</p><p dir="ltr">The subsequent chapters, Chapters 4 and 5, pivot towards time series dataset modeling and Bayesian algorithms. A semiparametric modeling approach dissects the stochastic volatility in macro time series into persistent and transitory components, with the latter component addressing outliers. 
Utilizing a Dirichlet process mixture prior for the transitory part and a collapsed Gibbs sampling algorithm, we devise a method capable of efficiently processing over 10,000 observations and 200 variables. Chapter 4 introduces a simple univariate model, while Chapter 5 presents comprehensive Bayesian VARs. Our algorithms, more efficient and effective in managing outliers than existing ones, are adept at handling extensive macro datasets with hundreds of variables.</p>
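The closed-form conjugate update that Chapter 3 exploits can be illustrated with a standard normal linear model. The sketch below assumes a known noise variance and a simple isotropic normal prior rather than the thesis's scale-mixture prior; it only shows why conjugacy makes the posterior cheap to compute.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated regression data: y = X @ beta + noise
n, p = 200, 3
X = rng.normal(size=(n, p))
beta_true = np.array([2.0, 0.0, -1.0])
sigma2 = 0.25                      # noise variance (assumed known here)
y = X @ beta_true + rng.normal(scale=np.sqrt(sigma2), size=n)

# Conjugate normal prior beta ~ N(0, tau2 * I) gives a closed-form posterior:
#   Sigma_post = (X'X / sigma2 + I / tau2)^{-1}
#   mu_post    = Sigma_post @ X'y / sigma2
tau2 = 10.0
Sigma_post = np.linalg.inv(X.T @ X / sigma2 + np.eye(p) / tau2)
mu_post = Sigma_post @ (X.T @ y) / sigma2

print(np.round(mu_post, 2))  # close to beta_true; the prior shrinks it slightly toward 0
```

The whole posterior is two matrix operations, which is what makes conjugate formulations scale so well relative to generic MCMC.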
|
852 |
Spatial association in archaeology. Development of statistical methodologies and computer techniques for spatial association of surface, lattice and point processes, applied to prehistoric evidence in North Yorkshire and to the Heslerton Romano-British site. Kelly, Michael A. January 1986 (has links)
The thesis investigates the concepts of archaeological spatial
association within the context of both site and regional data sets.
The techniques of geophysical surveying, surface distribution
collection and aerial photography are described and discussed.
Several new developments of technique are presented as well as a
detailed discussion of the problems of data presentation and
analysis.
The quantitative relationships between these data sets are
explored by modelling them as operands and describing association in
terms of operators. Both local and global measures of association
are considered with a discussion as to their relative merits.
Methods for the spatial association of regional lattice and point
processes are developed. A detailed discussion of distance based
spatial analysis techniques is presented.
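As a hypothetical illustration of a distance-based association measure (not the thesis's own formulation), one can compare mean nearest-neighbour distances between two point patterns: an associated pattern sits much closer to the reference sites than an unrelated random one.

```python
import numpy as np

rng = np.random.default_rng(1)

def mean_nn_distance(a, b):
    """Mean distance from each point of pattern a to its nearest point of b."""
    d = np.linalg.norm(a[:, None, :] - b[None, :, :], axis=2)
    return d.min(axis=1).mean()

# Pattern A: 50 random "sites" in the unit square
a = rng.uniform(size=(50, 2))
# Pattern B1: points jittered around A (spatially associated)
b_assoc = a + rng.normal(scale=0.01, size=a.shape)
# Pattern B2: unrelated random points (no association)
b_rand = rng.uniform(size=(50, 2))

d_assoc = mean_nn_distance(a, b_assoc)
d_rand = mean_nn_distance(a, b_rand)
print(d_assoc < d_rand)  # associated pattern lies much closer: True
```

In practice such an observed statistic would be compared against a null distribution obtained by repeatedly relocating one pattern at random.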
|
853 |
Mechanism for management decision-making at industrial enterprises : master's thesis. Lababidi, M. R. January 2021 (has links)
Making an optimal management decision is one of the most difficult responsibilities of enterprise managers: as the uncertainty and the number of independent variables of the problem being solved grow, decisions become more complex, requiring reliable methods that help managers make more reasoned choices among alternative courses of action. The aim of the master's thesis is to develop theoretical and methodological approaches to improving the efficiency of management decision-making processes on the basis of mathematical and statistical methods. The thesis examines the importance of using mathematical and statistical methods as an information-analytical basis for choosing and making an optimal decision. Research and methodological literature and publicly available financial statements of organizations were used as sources. The thesis proposes an algorithm, developed by the author, for management decision-making at industrial enterprises; its main element is the use of mathematical and statistical methods as an information-analytical basis for choosing and making an optimal decision, which makes it possible to increase the efficiency of the management decision-making process at industrial enterprises.
|
854 |
The role of model implementation in neuroscientific applications of machine learning. Abe, Taiga January 2024 (has links)
In modern neuroscience, large scale machine learning models are becoming increasingly critical components of data analysis. Despite the accelerating adoption of these large scale machine learning tools, there are fundamental challenges to their use in scientific applications that remain largely unaddressed. In this thesis, I focus on one such challenge: variability in the predictions of large scale machine learning models relative to seemingly trivial differences in their implementation.
Existing research has shown that the performance of large scale machine learning models (more so than that of traditional models such as linear regression) is meaningfully entangled with design choices such as the hardware components, operating system, software dependencies, and random seed that the corresponding model depends upon. Within the bounds of current practice, there are few ways of controlling this kind of implementation variability across the broad community of neuroscience researchers (making data analysis less reproducible), and little understanding of how data analyses might be designed to mitigate these issues (making data analysis unreliable). This dissertation will present two broad research directions that address these shortcomings.
First, I will describe a novel, cloud-based platform for sharing data analysis tools reproducibly and at scale. This platform, called NeuroCAAS, enables developers of novel data analyses to precisely specify an implementation of their entire data analysis, which can then be used automatically by any other user on custom built cloud resources. I show that this approach is able to efficiently support a wide variety of existing data analysis tools, as well as novel tools which would not be feasible to build and share outside of a platform like NeuroCAAS.
Second, I conduct two large-scale studies on the behavior of deep ensembles. Deep ensembles are a class of machine learning model which uses implementation variability to improve the quality of model predictions; in particular, by aggregating the predictions of deep networks over stochastic initialization and training. Deep ensembles simultaneously provide a way to control the impact of implementation variability (by aggregating predictions across random seeds) and also to understand what kind of predictive diversity is generated by this particular form of implementation variability. I present a number of surprising results that contradict widely held intuitions about the performance of deep ensembles as well as the mechanisms behind their success, and show that in many aspects, the behavior of deep ensembles is similar to that of an appropriately chosen single neural network. As a whole, this dissertation presents novel methods and insights focused on the role of implementation variability in large scale machine learning models, and more generally upon the challenges of working with such large models in neuroscience data analysis. I conclude by discussing other ongoing efforts to improve the reproducibility and accessibility of large scale machine learning in neuroscience, as well as long term goals to speed the adoption and reliability of such methods in a scientific context.
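The variance-reducing effect of averaging predictions across random seeds can be sketched without training any networks. In the toy below, simulated noisy logits stand in for ensemble members; the averaged-probability ensemble is never worse in negative log-likelihood than the average member, by Jensen's inequality.

```python
import numpy as np

rng = np.random.default_rng(2)

def nll(probs, labels):
    """Mean negative log-likelihood of the true labels."""
    return -np.mean(np.log(probs[np.arange(len(labels)), labels]))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Toy stand-in for a deep ensemble: each "seed" perturbs the same base
# logits, mimicking variability from stochastic initialization and training.
n, k, seeds = 500, 5, 10
labels = rng.integers(k, size=n)
base = rng.normal(size=(n, k))
base[np.arange(n), labels] += 1.5          # signal toward the true class
members = softmax(base[None] + rng.normal(scale=1.0, size=(seeds, n, k)))

ens_nll = nll(members.mean(axis=0), labels)         # average the probabilities
mean_member_nll = np.mean([nll(m, labels) for m in members])
print(ens_nll <= mean_member_nll)  # guaranteed by Jensen's inequality: True
```

This guarantee says nothing about *why* real deep ensembles help, which is exactly the question the studies above interrogate.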
|
855 |
Statistical Methods for Structured Data: Analyses of Discrete Time Series and Networks. Palmer, William Reed January 2023 (has links)
This dissertation addresses three problems of applied statistics involving discrete time series and network data. The three problems are (1) finding and analyzing community structure in directed networks, (2) capturing changes in dynamic count-valued time series of COVID-19 daily deaths, and (3) inferring the edges of an implicit network given noisy observations of a multivariate point process on its nodes. We use tools of spectral clustering, state-space models, Bayesian hierarchical modeling and variational inference to address these problems. Each chapter presents and discusses statistical methods for the given problem. We apply the methods to simulated and real data to both validate them and demonstrate their limitations.
In Chapter 1 we consider a directed spectral method for community detection that utilizes a graph Laplacian defined for non-symmetric adjacency matrices. We give the theoretical motivation behind this directed graph Laplacian and demonstrate its connection to an objective function that reflects a notion of how communities of nodes in directed networks should behave. Applying the method to directed networks, we compare the results to an approach using a symmetrized version of the adjacency matrices. A simulation study with a directed stochastic block model shows that directed spectral clustering can succeed where the symmetrized approach fails, and we find interesting and informative differences between the two approaches in the application to Congressional cosponsorship data.
In Chapter 2 we propose a generalized non-linear state-space model for count-valued time series of COVID-19 fatalities. To capture the dynamic changes in daily COVID-19 death counts, we specify a latent state process that involves second order differencing and an AR(1)-ARCH(1) model. These modeling choices are motivated by the application and validated by model assessment. We consider and fit a progression of Bayesian hierarchical models under this general framework. Using COVID-19 daily death counts from New York City's five boroughs, we evaluate and compare the considered models through predictive model assessment. Our findings justify the elements included in the proposed model. The proposed model is further applied to time series of COVID-19 deaths from the four most populous counties in Texas. These model fits illuminate dynamics associated with multiple dynamic phases and show the applicability of the framework to localities beyond New York City.
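A generative sketch of the kind of latent process described (a second-order-differenced trend with AR(1)-ARCH(1) innovations driving Poisson counts) might look as follows. The parameter values are illustrative, not those estimated in the chapter.

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative parameters: AR coefficient phi, ARCH(1) parameters a0, a1.
T, phi, a0, a1 = 300, 0.5, 0.01, 0.3
mu = np.zeros(T)
mu[0] = mu[1] = 2.0                          # starting log-intensity
eta, e_prev = 0.0, 0.0
for t in range(2, T):
    sigma2_t = a0 + a1 * e_prev**2           # ARCH(1) conditional variance
    e_t = rng.normal(scale=np.sqrt(sigma2_t))
    eta = phi * eta + e_t                    # AR(1) innovation process
    mu[t] = 2 * mu[t - 1] - mu[t - 2] + eta  # second-order differencing
    e_prev = e_t

# Count-valued observations (log-intensity clipped for numerical safety)
y = rng.poisson(np.exp(np.clip(mu, -10, 10)))
print(y[:10])
```

Forward simulation like this is also how such a model's fit is typically assessed: simulated count paths are compared against the observed series.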
In Chapter 3 we consider the task of inferring the connections between noisy observations of events. In our model-based approach, we consider a generative process incorporating latent dynamics that are directed by past events and the unobserved network structure. This process is based on a leaky integrate-and-fire (LIF) model from neuroscience for aggregating input and triggering events (spikes) in neural populations. Given observation data we estimate the model parameters with a novel variational Bayesian approach, specifying a highly structured and parsimonious approximation for the conditional posterior distribution of the process's latent dynamics. This approach allows for fully interpretable inference of both the model parameters of interest and the variational parameters. Moreover, it is computationally efficient in scenarios when the observed event times are not too sparse.
We apply our methods in a simulation study and to recorded neural activity in the dorsomedial frontal cortex (DMFC) of a rhesus macaque. We assess our results based on ground truth, model diagnostics, and spike prediction for held-out nodes.
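A minimal leaky integrate-and-fire (LIF) simulation conveys the generative backbone referred to above: a membrane state leaks toward rest, accumulates input, and emits an event on crossing a threshold. Parameters are illustrative, not those of the network model in the chapter.

```python
import numpy as np

rng = np.random.default_rng(7)

# Illustrative LIF parameters: time step, membrane time constant,
# threshold, and reset value.
T, dt, tau_m, v_th, v_reset = 1000, 1.0, 20.0, 1.0, 0.0
v, spikes = 0.0, []
drive = 0.06 + 0.02 * rng.standard_normal(T)   # noisy external input
for t in range(T):
    v += dt * (-v / tau_m + drive[t])          # leaky integration
    if v >= v_th:                              # threshold crossing
        spikes.append(t)                       # record the event ("spike")
        v = v_reset                            # reset after the event
print(len(spikes))
```

In the model-based setting above, the unobserved network structure shapes each node's `drive` through past events on its neighbours, and inference runs in the opposite direction, from observed event times back to parameters.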
|
856 |
A Tale of Two Paradoxes: Reconciling Selection Bias, Collider Bias, and the Birth Weight Paradox. Levy, Natalie S. January 2023 (has links)
Unexpected findings that contradict well-established relationships between exposures and outcomes are often referred to as “paradoxes” in the epidemiologic literature. For example, the “birth weight paradox” refers to the observed protective association between smoking during pregnancy and infant mortality among low birth weight infants. A recent body of literature suggests that this and several other well-known epidemiologic paradoxes can be attributed to collider bias. Collider bias results from conditioning on a variable that is caused by the exposure or shares a common cause with the exposure, and that is caused by the outcome or shares a common cause with the outcome. Several recent epidemiology textbooks and methodological studies further suggest that collider bias is the graphical representation of selection bias, implying that the two biases are synonymous.
This structural approach to bias is conceptually very useful for defining, describing, and identifying selection bias, but it introduces paradoxes of its own due to contradictory conclusions in the selection and collider bias methodologic literatures about their likely impact on study results in terms of magnitude, direction, and strata affected. Resolving these discrepancies is essential for our theoretical understanding of the relationship between selection and collider bias and has important practical implications for how we teach epidemiology, design studies, and evaluate and quantify the potential effects of bias on our results. For example, while patterns of collider bias coincide qualitatively with the birth weight paradox, the magnitude of collider bias would have to be substantial to reverse the sign of the association, contrary to prevailing beliefs that collider bias only minimally affects our results.
To date, the plausibility of collider bias as an explanation for the birth weight paradox has not been empirically evaluated using data in which the paradox is observed. Taken together, these inconsistencies and contradictions suggest that our understanding of selection bias and collider bias remains incomplete. The overarching goal of this dissertation was to advance the theoretical and quantitative understanding of the impact of collider bias on study results to clarify the relationship between selection and collider bias. I began by systematically reviewing the methodologic literature on selection and collider bias. I found that selection bias and collider bias are increasingly treated as synonyms, but that conclusions about the magnitude and direction of selection and collider bias, the stratum affected, and the conditions under which the effects of each type of bias were evaluated are highly inconsistent.
This suggested that divergent findings about the impact of selection and collider bias might be resolved by considering the impact of collider bias under a broader set of circumstances. I used microsimulations grounded in the sufficient component cause model to examine collider bias not under the null; interrogate why multiplicative interaction appeared central to the impact of collider bias; and clarify which stratum or strata are affected by collider bias. I identified clear patterns for the magnitude, direction, and strata affected by collider bias and successfully reconciled discrepancies with the selection bias literature. This work also enabled me to interrogate both the causal mechanisms and mathematical principles that underlie collider bias, which revealed how collider bias leads to non-exchangeability and when stratifying on a collider results in bias.
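The basic collider phenomenon that such microsimulations interrogate can be reproduced in a few lines. This toy continuous version (not the sufficient-component-cause setup used in the dissertation) shows two independent causes becoming associated once we select on their common effect.

```python
import numpy as np

rng = np.random.default_rng(4)

# X and Y are independent causes of collider C; conditioning on C
# induces a spurious X-Y association.
n = 200_000
x = rng.normal(size=n)
y = rng.normal(size=n)                # independent of x by construction
c = x + y + rng.normal(scale=0.5, size=n)

r_marginal = np.corrcoef(x, y)[0, 1]
selected = c > 1.0                    # "selecting on the collider"
r_selected = np.corrcoef(x[selected], y[selected])[0, 1]

print(round(r_marginal, 2), round(r_selected, 2))
# marginal correlation is ~0; within the selected stratum it turns negative
```

Intuitively, among selected units a high value of X "explains away" the need for a high Y, producing the negative association without any causal link between them.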
Finally, I applied this deeper understanding of the mechanisms underlying collider bias to empirically evaluate the plausibility of collider bias as an explanation for the birth weight paradox. Using microsimulations parameterized with 2015 National Center for Health Statistics Cohort Linked Birth-Infant Mortality, I identified scenarios that successfully reproduced the paradox and all observed relationships between smoking during pregnancy, infant mortality, and low birth weight. These findings strengthen the evidence for the role of collider bias in producing the paradox and shed light on the potential magnitude of unmeasured confounding and direct effects of smoking and low birth weight on infant mortality that may be required for the observed magnitude of the paradox to arise.
This work clarifies that almost all selection bias is collider bias; that the effects of collider bias vary in magnitude and direction; that selecting on a collider always leads to bias, but this bias may not occur in the stratum that coincides with our analytical sample; and that collider bias may resolve the birth weight paradox, but is unlikely to explain all epidemiologic paradoxes.
|
857 |
<b>Understanding Online Media Reactions to Significant Price Increases for Eggs</b> Sachina Kida (16898778) 25 April 2024 (has links)
<p dir="ltr">Retail prices for eggs surged in the U.S. from early 2022 to mid-2023. Eggs matter to a wide range of people because of their nutritional benefits and their cost relative to other protein sources, so rapidly increasing egg prices pose risks to many households. Using social media listening data, we analyzed the relationship between egg prices and online and social media attention, and the relationship between egg prices and online and social media sentiment. Our findings suggest that egg prices are associated with the sentiment of the public as expressed in online media. However, the relationship between egg prices and online and social media attention is complex when comparing the timing of increased concern with the timing of online news media coverage. Importantly, by leveraging a regression-discontinuity-in-time design, we show that online and social media conversations about eggs and egg prices tend to increase after a rapid rise in online news coverage and, similarly, tend to decrease after a rapid decline in such coverage. This research also provides an example of how the total number of statements and the sentiment scores from social media listening data can be used to capture people’s attention levels and overall sentiment, and how these change over time.</p>
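A regression discontinuity in time can be sketched as two local linear fits meeting at a cutoff date. The series below is simulated, with an assumed jump on the day news coverage spikes; nothing here uses the study's actual data.

```python
import numpy as np

rng = np.random.default_rng(5)

# Hypothetical daily series of online mentions with a jump of 30 at the
# day intense news coverage begins (t0).
t = np.arange(120)
t0 = 60
mentions = 50 + 0.1 * t + 30 * (t >= t0) + rng.normal(scale=3, size=t.size)

def linear_fit_at(ts, ys, at):
    """Fit a line to (ts, ys) and evaluate it at time `at`."""
    slope, intercept = np.polyfit(ts, ys, 1)
    return slope * at + intercept

pre = t < t0
jump = (linear_fit_at(t[~pre], mentions[~pre], t0)
        - linear_fit_at(t[pre], mentions[pre], t0))
print(round(jump, 1))  # recovers a discontinuity near the true jump of 30
```

The estimated discontinuity at the cutoff is the quantity of interest; in an applied analysis one would also restrict the fit to a bandwidth around the cutoff and check robustness to that choice.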
|
858 |
Evaluation of fiber-matrix interfacial shear strength in fiber reinforced plastics. Sabat, Philippe Jacques January 1985 (has links)
The role of the interphase in fiberglass reinforced composites was studied by a combination of theoretical analysis, mechanical tests, and several high-resolution analytical techniques. The interphase was varied in composition by using epoxy and polyester matrix polymers with and without added coupling agents, as well as four fiber surface modifications. Different coupling agents on the fibers were shown to change the fiber tensile strength markedly. Filament wound unidirectional composites were tested in short beam "shear." Corresponding samples were fabricated by embedding one to seven fibers in the center of polymer dogbone specimens that were tested in tension to determine critical fiber lengths. Those values were used in a new theoretical treatment (that combines stress gradient shear-lag theory with Weibull statistics) to evaluate "interfacial shear strengths". The fact that results did not correlate with the short beam data was examined in detail via a combination of polarized light microscopy, electron microscopy (SEM) and spectroscopy (XPS or ESCA) and mass spectrometry (SIMS). When the single fiber specimens were unloaded, a residual birefringent zone was measured and correlated with composite properties, as well as with SIMS and SEM analysis that identified changes in the locus of interphase failure. Variations in the interphase had dramatic effects upon composite properties, but it appears that there may be an optimum level of fiber-matrix adhesion depending upon the properties of both fiber and matrix. Fiber-fiber interactions were elucidated by combining tensile tests on multiple fiber dogbone specimens with high-resolution analytical techniques. In general, this work exemplifies a multidisciplinary approach that promises to help understand and characterize the structure and properties of the fiber-matrix interphase, and to optimize the properties of composite materials. / Master of Science
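The critical-fiber-length idea behind the single-fiber tests is often summarized by the classical Kelly-Tyson force balance, tau = sigma_f * d / (2 * l_c). The thesis's own treatment goes further (stress-gradient shear-lag theory with Weibull statistics), but the baseline calculation, with illustrative rather than measured values, is simple:

```python
# Kelly-Tyson force balance: the interfacial shear stress needed to load a
# fiber of diameter d to its strength sigma_f over the critical length l_c.
def interfacial_shear_strength(sigma_f_mpa: float, d_um: float, l_c_mm: float) -> float:
    """Kelly-Tyson estimate of interfacial shear strength in MPa."""
    d_mm = d_um / 1000.0
    return sigma_f_mpa * d_mm / (2.0 * l_c_mm)

# Illustrative E-glass-like values: strength 2000 MPa, diameter 14 um,
# measured critical length 0.35 mm (all hypothetical).
tau = interfacial_shear_strength(2000.0, 14.0, 0.35)
print(round(tau, 1))  # -> 40.0 (MPa)
```

A shorter measured critical length implies a stronger interface, which is why embedded-single-fiber fragmentation data can be converted into an interfacial strength estimate at all.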
|
859 |
Causal Inference in the Face of Assumption Violations. Yuki Ohnishi (18423810) 26 April 2024 (has links)
<p dir="ltr">This dissertation advances the field of causal inference by developing methodologies in the face of assumption violations. Traditional causal inference methodologies hinge on a core set of assumptions, which are often violated in the complex landscape of modern experiments and observational studies. This dissertation proposes novel methodologies designed to address the challenges posed by single or multiple assumption violations. By applying these innovative approaches to real-world datasets, this research uncovers valuable insights that were previously inaccessible with existing methods. </p><p><br></p><p dir="ltr">First, three significant sources of complications in causal inference that are increasingly of interest are interference among individuals, nonadherence of individuals to their assigned treatments, and unintended missing outcomes. Interference exists if the outcome of an individual depends not only on its assigned treatment, but also on the assigned treatments for other units. It commonly arises when limited controls are placed on the interactions of individuals with one another during the course of an experiment. Treatment nonadherence frequently occurs in human subject experiments, as it can be unethical to force an individual to take their assigned treatment. Clinical trials, in particular, typically have subjects that do not adhere to their assigned treatments due to adverse side effects or intercurrent events. Missing values also commonly occur in clinical studies. For example, some patients may drop out of the study due to the side effects of the treatment. Failing to account for these considerations will generally yield unstable and biased inferences on treatment effects even in randomized experiments, but existing methodologies lack the ability to address all these challenges simultaneously. We propose a novel Bayesian methodology to fill this gap. 
</p><p><br></p><p dir="ltr">My subsequent research further addresses one of the limitations of the first project: a set of assumptions about interference structures that may be too restrictive in some practical settings. We introduce the concept of the "degree of interference" (DoI), a latent variable capturing the interference structure. This concept allows for handling arbitrary, unknown interference structures to facilitate inference on causal estimands. </p><p><br></p><p dir="ltr">While randomized experiments offer a solid foundation for valid causal analysis, people are also interested in conducting causal inference using observational data due to the cost and difficulty of randomized experiments and the wide availability of observational data. Nonetheless, using observational data to infer causality requires us to rely on additional assumptions. A central assumption is that of <i>ignorability</i>, which posits that the treatment is randomly assigned based on the variables (covariates) included in the dataset. While crucial, this assumption is often debatable, especially when treatments are assigned sequentially to optimize future outcomes. For instance, marketers typically adjust subsequent promotions based on responses to earlier ones and speculate on how customers might have reacted to alternative past promotions. This speculative behavior introduces latent confounders, which must be carefully addressed to prevent biased conclusions. </p><p dir="ltr">In the third project, we investigate these issues by studying sequences of promotional emails sent by a US retailer. We develop a novel Bayesian approach for causal inference from longitudinal observational data that accommodates noncompliance and latent sequential confounding. </p><p><br></p><p dir="ltr">Finally, we formulate the causal inference problem for privatized data. 
In the era of digital expansion, the secure handling of sensitive data poses an intricate challenge that significantly influences research, policy-making, and technological innovation. As the collection of sensitive data becomes more widespread across academic, governmental, and corporate sectors, addressing the complex balance between making data accessible and safeguarding private information requires the development of sophisticated methods for analysis and reporting, which must include stringent privacy protections. Currently, the gold standard for maintaining this balance is differential privacy. </p><p dir="ltr">Local differential privacy is a differential privacy paradigm in which individuals first apply a privacy mechanism to their data (often by adding noise) before transmitting the result to a curator. The noise added for privacy introduces additional bias and variance into subsequent analyses, so it is of great importance for analysts to account for the privacy noise when drawing valid inferences.</p><p dir="ltr">In this final project, we develop methodologies to infer causal effects from locally privatized data under randomized experiments. We present frequentist and Bayesian approaches and discuss the statistical properties of the estimators, such as consistency and optimality, under various privacy scenarios.</p>
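Under local differential privacy, a simple difference-in-means treatment-effect estimate remains unbiased after Laplace privatization, at the cost of inflated variance. A sketch with illustrative values of epsilon and the effect size (not the thesis's estimators):

```python
import numpy as np

rng = np.random.default_rng(6)

# Each subject's outcome is bounded in [0, 1] and privatized locally with
# Laplace noise before being sent to the analyst.
n, epsilon, tau_true = 50_000, 1.0, 0.2
z = rng.integers(2, size=n)                        # randomized treatment
y = np.clip(0.4 + tau_true * z + rng.normal(scale=0.1, size=n), 0, 1)

scale = 1.0 / epsilon                              # sensitivity 1 for [0, 1]
y_priv = y + rng.laplace(scale=scale, size=n)      # local privatization

# Difference in means on the privatized outcomes: still unbiased for the
# average treatment effect, because the added noise has mean zero.
tau_hat = y_priv[z == 1].mean() - y_priv[z == 0].mean()
print(round(tau_hat, 2))
```

The mean-zero noise cancels in expectation, but its variance (2/epsilon squared per report) propagates into the standard error, which is exactly the bias-variance accounting that privacy-aware inference must perform.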
|
860 |
Stumpage price expectations: an empirical analysis of nonindustrial private landowners in the Mid-Atlantic states. Lawrence, Gerald D. January 1985 (has links)
Numerous empirical studies outside of forestry have analyzed the role of price expectations in different decision processes. The use of price expectations in forestry research, by contrast, is a relatively new field of endeavor. Past forestry studies have typically ignored or given only cursory treatment to the role of price expectations.
This study provides a review of studies in forestry that have attempted to incorporate price expectations into model formulations. Models are then developed to explain the short-run harvest, and long-run regeneration expenditure decisions by the non-industrial private forest owner, incorporating different distributed lag formulations to account for price expectations.
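One standard distributed-lag formulation of price expectations is the adaptive-expectations scheme, equivalent to a geometrically declining (Koyck) lag on past prices. A sketch with an illustrative adjustment parameter:

```python
# Adaptive expectations: the expected price is revised by a fraction lam
# of the last forecast error. The value of lam here is illustrative.
def adaptive_expectation(prices, lam=0.4, initial=None):
    """Return the path of P^e_t where P^e_t = P^e_{t-1} + lam*(P_{t-1} - P^e_{t-1})."""
    expected = prices[0] if initial is None else initial
    path = []
    for p in prices:
        path.append(expected)              # expectation held entering period t
        expected = expected + lam * (p - expected)
    return path

# Stumpage price jumps from 100 to 120; expectations adjust only gradually.
path = adaptive_expectation([100.0, 100.0, 120.0, 120.0, 120.0], lam=0.5)
print(path)  # -> [100.0, 100.0, 100.0, 110.0, 115.0]
```

Because the revision is geometric, current expectations are a weighted sum of all past prices with weights lam*(1-lam)^k, which is what makes the scheme estimable as a distributed lag.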
The estimated models for the short-run harvest decision, using cross-sectional non-aggregated data, indicate that price expectations play a significant role in this decision process. Price expectations should therefore be incorporated in some form (i.e., through different formulations of distributed lags) to properly specify such models. Estimated models for the long-run regeneration expenditure decision indicate a weak link between economic variables and the regeneration decision.
For both types of models, the estimated coefficients for personal characteristics of landowners are in general insignificant, indicating the lack of influence that personal characteristics have on these decision processes. / Master of Science
|